Introduction

Measurement of auditory evoked potentials has proven to be a powerful physiological technique for evaluating auditory system performance, especially in non-verbal subjects not specifically trained for hearing tests. Evoked potential measures are particularly attractive for tests with odontocete cetaceans (e.g., dolphins and porpoises). Because these animals are relatively large, most species are not routinely kept under human care, and the time, cost, and required expertise have often prevented psychophysical testing. Odontocetes have a large, highly evolved brain, enlarged auditory centers, and an auditory nerve that is pronounced relative to most other mammalian species (Ridgway 2000). These properties contribute to large amplitude auditory evoked potentials that are relatively easy to measure using surface electrodes. Evoked potentials have been measured in a variety of odontocetes using stimuli such as broadband clicks, tone pips, multi-tone stimuli, and amplitude modulated tones (e.g., Cook et al. 2006; Dolphin 1996; Dolphin et al. 1995; Mooney et al. 2006; Nachtigall et al. 2005; Popov et al. 1992, 2005; Ridgway et al. 1981; Supin and Popov 1995; Szymanski et al. 1999; Yuen et al. 2005). Although evoked responses to clicks and tone pips are generally large, narrow band stimuli such as multi-tone signals and sinusoidal amplitude modulated (SAM) tones are often preferred for frequency-specific threshold estimates (e.g., Finneran and Houser 2006; Nachtigall et al. 2005; Popov et al. 2005; Supin et al. 2001; Yuen et al. 2005).

SAM tones produce a rhythmic evoked response, called the envelope following response (EFR), that follows the envelope of the SAM stimulus. The EFR is created when sound stimuli are presented sufficiently fast that transient evoked responses overlap and reach a steady state (Stapells et al. 1984; Supin et al. 2001). The EFR follows the stimulus waveform periodicity and therefore has a fundamental frequency equal to the SAM tone modulation frequency. This feature permits signal analysis in the frequency domain and statistical analyses to objectively determine the presence or absence of an evoked response (Dobie and Wilson 1989, 1996).

The modulation rate transfer function (MRTF) describes the relationship between EFR amplitude and phase and the SAM tone modulation frequency. MRTFs have been measured in a number of odontocetes, including the bottlenose dolphin (Dolphin et al. 1995; Supin and Popov 1995), Risso’s dolphin (Mooney et al. 2006), beluga (Dolphin et al. 1995; Klishin et al. 2000), beaked whale (Cook et al. 2006), and false killer whale (Dolphin et al. 1995). Marine mammal MRTFs are qualitatively similar to those measured in humans and other terrestrial mammals (Stapells et al. 1984; Kuwada et al. 1986; Dolphin and Mountain 1992). Mammalian MRTF amplitudes exhibit minima and maxima, presumably caused by constructive and destructive interference of evoked potentials arriving from spatially separated generation sites (Stapells et al. 1984); however, the general shape of the MRTF is that of a low pass filter and there is some upper modulation frequency above which EFR amplitudes progressively decline (Stapells et al. 1984; Kuwada et al. 1986). This upper cutoff frequency is related to the ability of the auditory system to respond to rapid changes in the envelope of a sound, called temporal resolution (Viemeister 1979; Viemeister and Plack 1993). Temporal resolution may be assessed psychophysically by measuring SAM tone modulation depth thresholds as a function of modulation rate (Viemeister 1979; Viemeister and Plack 1993). Supin and Popov (1995) showed good linearity between EFR amplitudes and SAM tone modulation depth in dolphins, indicating that EFR MRTFs could also be used to assess temporal resolution. Purcell and John (2004) demonstrated that the maximum modulation rate for detectable EFRs in humans was correlated with psychophysical measures of maximum perceived threshold, which also indicates that EFR measurements could serve as an objective method of assessing temporal resolution.

Knowledge of the MRTF is also important for auditory threshold assessment using the EFR. Since the particular modulation frequency will affect the dynamic range of EFR measurements, some care must be taken to ensure an appropriate modulation frequency for threshold testing via EFR methods. For species that have not been previously tested, estimating the MRTF prior to EFR threshold measurements is prudent to ensure the use of an appropriate modulation rate (e.g., Cook et al. 2006; Mooney et al. 2006).

EFR phase angles also vary as a function of the modulation frequency. Measurements of EFR phase angles at different modulation frequencies may be used to estimate group delays, and thus evoked response generation sites, associated with EFRs (Rickards and Clark 1984). Bottlenose dolphin and beluga group delays calculated from EFR phase angles (Dolphin et al. 1995; Supin and Popov 1995) suggest three distinct generation sites (cortex, brainstem, and periphery) in contrast to the two major sites postulated for humans (cortex, brainstem) (Kuwada et al. 1986; Purcell and John 2004).

Published MRTFs for odontocetes were measured in subjects for whom auditory thresholds were either known to be or assumed to be within normal limits (Dolphin et al. 1995; Klishin et al. 2000; Supin and Popov 1995). The present paper reports the results of a study to examine modulation rate transfer functions in two bottlenose dolphins: one with “normal” hearing and the other with mild to moderate hearing loss (ASLHA 2004) at frequencies above 40 kHz. The objectives were to determine the degree to which moderate hearing loss affected the MRTF amplitude and phase and to examine the suitability of frequency spectra from click-evoked potentials to predict MRTF amplitude and phase. Modulation rate transfer functions, derived with SAM tones of different carrier frequencies, were compared across carrier frequencies and subjects. The spectral amplitude and phase from click-evoked potentials were also measured and compared to the MRTFs of both subjects.

Methods

Subjects

The subjects were two bottlenose dolphins: BLU (female, 40 years, 200 kg) and WEN (male, 21 years, 210 kg). Both subjects were housed in netted enclosures in San Diego Bay, California. The protocols followed in this study were approved by the Institutional Animal Care and Use Committee of the Biosciences Division, SSC San Diego, and were in compliance with all applicable U.S. Department of Defense guidelines for the care of laboratory animals.

Both subjects had extensive prior experience with psychophysical and evoked potential tests. Prior behavioral and EFR measurements for BLU revealed high-frequency hearing loss beginning around 40 kHz (Finneran and Houser 2006). An audiogram in quiet conditions did not exist for WEN; however, behavioral hearing thresholds in San Diego Bay and EFR thresholds in-air revealed good high-frequency hearing to approximately 140 kHz (Finneran and Houser 2006). Figure 1 shows sensation levels (dB re: EFR threshold) for the SAM stimuli used in this study. At 10, 20, and 30 kHz there were no significant differences in subjects’ thresholds (one-way ANOVA, P > 0.05); however, BLU’s thresholds at 50 and 60 kHz were approximately 40 dB above WEN’s.

Fig. 1 Sensation levels for 130 dB re 1 μPa stimuli for BLU and WEN. Error bars represent standard errors. There were no significant differences in thresholds at 10, 20, and 30 kHz. BLU’s thresholds at 50 and 60 kHz were approximately 40 dB above WEN’s

Evoked potential measurements

Tests were conducted in air, with the subjects resting on foam mats. Sound stimuli were generated, and evoked potentials were digitized, by a multifunction data acquisition card (National Instruments PCI-MIO-16E-1) residing in a personal computer. A piezoelectric transducer embedded in a silicone suction cup (a “jawphone”) was positioned over the lower left jaw and used to present sounds to the subject (Brill et al. 2001; Møhl et al. 1999; Moore et al. 1995). Gold-plated cup electrodes embedded in suction cups were used to non-invasively measure evoked responses. Electrodes were coupled to the skin surface using conductive paste. Responses were differentially measured between the non-inverting electrode located at the vertex, approximately 5 cm posterior to the blowhole, and the inverting electrode positioned contralateral to the jawphone near the external auditory meatus. A biopotential amplifier (Grass IP-511) filtered (see below) and amplified (×10⁵) the electrode signals. Each measurement was based on the average of 500 epochs. Epochs with peak instantaneous voltage exceeding 20 μV were excluded from the averaging. Evoked responses were digitized at 15 or 20 kHz. The AEP measuring system and jawphone calibration techniques are described in detail in Finneran and Houser (2006).

EFR measurements used 62 ms SAM tones with a 1 ms cosine envelope rise and fall. Stimulus sensation levels for each subject are shown in Fig. 1 as a function of carrier frequency. Stimuli were presented intermittently at the rate of approximately 14/s. Carrier frequencies were 20, 30, 40, 50, and 60 kHz. Modulation rates were varied in 50-Hz steps from 100 to 1,000 Hz (except 317 Hz was substituted for 300 Hz to avoid electrical line noise), 100-Hz steps from 1,100 to 2,000 Hz, and 500-Hz steps from 2,500 to 5,000 Hz. Most EFR measurements used a bioamplifier bandpass filter setting of 0.3–3 kHz; however, some additional measurements were made using 0.1–3 kHz and 0.3–10 kHz filters for the lower (100–500 Hz) and upper (1,800–5,000 Hz) frequencies, respectively. Evoked responses were analyzed over a 60 ms time period centered on the response, yielding a 16.7-Hz resolution for frequency domain analysis. The presence or absence of a response was determined using a statistical technique: if the magnitude-squared coherence exceeded the critical value based on α = 0.01 and 20 subaverages, the response was considered to be detected (Dobie 1993; Dobie and Wilson 1989, 1996).
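The detection criterion described above can be illustrated with a minimal sketch. It assumes the commonly cited noise-only critical value for magnitude-squared coherence (MSC), 1 − α^(1/(N−1)) for N subaverages; whether this exact expression was used in the study is an assumption. The function names, the synthetic 600 Hz response, and the noise level are all hypothetical.

```python
import cmath
import math
import random


def fourier_coefficient(samples, fs, freq):
    """Complex DFT coefficient of `samples` (sampled at fs Hz) at `freq` Hz."""
    n = len(samples)
    return sum(v * cmath.exp(-2j * math.pi * freq * k / fs)
               for k, v in enumerate(samples)) / n


def magnitude_squared_coherence(coeffs):
    """MSC across subaverages: |sum X_i|^2 / (N * sum |X_i|^2), in [0, 1]."""
    n = len(coeffs)
    return abs(sum(coeffs)) ** 2 / (n * sum(abs(c) ** 2 for c in coeffs))


def msc_critical_value(alpha, n_sub):
    """Noise-only critical value, 1 - alpha**(1/(N-1)) (assumed formula)."""
    return 1.0 - alpha ** (1.0 / (n_sub - 1))


# Synthetic demonstration; every value below is hypothetical.
random.seed(0)
fs, f_mod, n_sub = 15000.0, 600.0, 20
n_samples = round(0.060 * fs)  # 60 ms analysis window

coeffs = []
for _ in range(n_sub):
    # One subaverage: unit-amplitude response at f_mod plus Gaussian noise.
    sub = [math.sin(2 * math.pi * f_mod * k / fs) + random.gauss(0.0, 1.0)
           for k in range(n_samples)]
    coeffs.append(fourier_coefficient(sub, fs, f_mod))

msc = magnitude_squared_coherence(coeffs)
critical = msc_critical_value(0.01, n_sub)
detected = msc > critical
print(f"MSC = {msc:.3f}, critical = {critical:.3f}, detected = {detected}")
```

MSC approaches 1 when the subaverages share a consistent amplitude and phase at the modulation frequency, and approaches 0 when their phases are scattered, which is why it serves as an objective presence/absence test.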

EFR amplitudes and phase angles were corrected for the bioamplifier frequency response; phase measurements were also corrected for the time delay between stimulus onset and analysis window start. Since sounds were delivered via a jawphone attached to the subject, no phase correction was applied for sound propagation delay. Measured EFR phase angles were unwrapped by adding ±2π rad if the phase difference between adjacent values exceeded ±π rad. Linear regression was performed on the phase versus modulation frequency data and the slope of each regression line, Δθ/Δf_m, was used to calculate the group delay T_d:

$$ T_{d} = \frac{\Delta\theta/\Delta f_{m}}{2\pi}, \tag{1} $$

where Δθ/Δf_m has units of rad/Hz and T_d is expressed in seconds.
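The unwrapping and regression steps leading to Eq. 1 can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: the helper names are hypothetical, the phase values are synthesized from an assumed pure delay of 3.5 ms, and phase is treated as a lag that grows with modulation frequency.

```python
import math


def wrap(theta):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(theta), math.cos(theta))


def unwrap(phases):
    """Add multiples of 2*pi so adjacent values differ by at most pi."""
    out = [phases[0]]
    for p in phases[1:]:
        step = p - out[-1]
        step -= 2 * math.pi * round(step / (2 * math.pi))
        out.append(out[-1] + step)
    return out


def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def group_delay(freqs_hz, phases_rad):
    """Eq. 1: T_d = (delta theta / delta f_m) / (2*pi), in seconds."""
    return slope(freqs_hz, unwrap(phases_rad)) / (2 * math.pi)


# Hypothetical phase lags for a pure 3.5 ms delay, wrapped as a measurement would be.
true_delay = 0.0035
freqs = list(range(100, 1001, 50)) + list(range(1100, 1601, 100))
measured = [wrap(2 * math.pi * f * true_delay) for f in freqs]
estimate = group_delay(freqs, measured)
print(f"estimated group delay = {estimate * 1e3:.2f} ms")
```

Because only the slope enters Eq. 1, any constant phase offset in the data has no effect on the estimated delay.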

Click-evoked potentials were generated using a 100 μs rectangular pulse applied to the jawphone at a rate of approximately 75/s. This produced a double-pulsed sound pressure signature, as measured underwater using a calibrated hydrophone (B&K 8105) at a distance of 15 cm from the jawphone [see Finneran and Houser (2006)]. Figure 2 shows the resulting sound pressure spectrum level. The −10 and −20 dB bandwidths were 55–112 kHz and 20–120 kHz, respectively. Click-evoked responses were averaged over 500 epochs. Six averaged responses, recorded over multiple days (BLU = 3, WEN = 6), were analyzed from each subject. Frequency analysis was performed using a 13 ms (rectangular) window.

Fig. 2 Underwater sound pressure spectrum levels measured 15 cm from jawphone excited with 100 μs rectangular click. The frequency resolution is 1 kHz

Results

Figure 3 shows the EFR instantaneous voltages recorded for BLU using a 30 kHz carrier frequency and modulation rates from 100 to 2,000 Hz. Figure 4 presents the corresponding amplitude spectra. These data (Figs. 3, 4) are representative in most respects: a delay of approximately 4 ms occurred between the stimulus onset and the visible evoked response waveform, confirming physiological origin of the responses rather than stimulus artifact. Evoked responses had fundamental frequencies corresponding to the SAM modulation frequency and, in many cases, additional amplitude peaks corresponding to the second and third harmonics of the modulation frequency. Responses were generally not detected with our measurement system when the modulation rate exceeded 2,000–2,500 Hz.

Fig. 3 EFR instantaneous voltages measured for BLU using a 30 kHz carrier frequency. The numbers by each series indicate the modulation rate, from 100 to 2,000 Hz. No responses were detected at 800 Hz or above 2,000 Hz

Fig. 4 EFR amplitude spectra corresponding to the waveforms in Fig. 3. The frequency resolution is 16.7 Hz. The numbers by each series indicate the modulation rate, from 100 to 2,000 Hz. No responses were detected at 800 Hz or above 2,000 Hz

Figures 5 and 6 present the MRTF amplitudes (in dB) and phase angles, respectively, for BLU and WEN at each carrier frequency. Amplitudes and phase angles are only displayed for detected EFRs. Mean values are reported for modulation frequencies tested more than once. MRTF amplitudes at each carrier frequency for BLU and WEN are offset to improve clarity (to preserve the amplitude relationship, the same offset was applied to BLU’s and WEN’s data at each carrier frequency). Phase values were also offset (individually) for clarity. For each set of phase data, linear regressions were performed over the modulation frequency range 100–1,600 Hz where the data points exhibited good linearity (r² > 0.99 for all regressions). Group delays calculated from the regression line slopes were 3.4, 3.5, 3.5, 3.6, and 3.6 ms (median = 3.5 ms; mean ± SD = 3.5 ± 0.08 ms) for BLU and 3.6, 4.4, 3.5, 3.4, and 3.4 ms (median = 3.5 ms; mean ± SD = 3.7 ± 0.42 ms) for WEN at 20, 30, 40, 50, and 60 kHz, respectively. All group delays were between 3.4 and 3.6 ms except WEN at 30 kHz (4.4 ms).
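As a quick arithmetic check, the summary statistics quoted above can be recomputed from the five group delays listed for each subject. The sketch assumes the reported SD is the sample (n − 1) standard deviation, which is the variant that reproduces the quoted values; the variable names are illustrative only.

```python
import statistics

# Group delays (ms) as reported in the text for 20, 30, 40, 50, and 60 kHz.
delays_ms = {
    "BLU": [3.4, 3.5, 3.5, 3.6, 3.6],
    "WEN": [3.6, 4.4, 3.5, 3.4, 3.4],
}

summary = {}
for name, values in delays_ms.items():
    summary[name] = (
        statistics.median(values),
        statistics.mean(values),
        statistics.stdev(values),  # sample SD (n - 1 denominator), assumed
    )
    med, mean, sd = summary[name]
    print(f"{name}: median = {med:.1f} ms, mean ± SD = {mean:.1f} ± {sd:.2f} ms")
```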

Fig. 5 MRTF amplitudes (in dB) for BLU and WEN. Data series pairs (BLU and WEN) at each carrier frequency have been offset for clarity

Fig. 6 MRTF phase angles for BLU and WEN. Data series have been individually offset for clarity. The solid and dashed lines are linear regressions applied to the phase data from 100 to 1,600 Hz for BLU and WEN, respectively

Figure 7 shows the upper cutoff frequencies, defined as the modulation frequency corresponding to an MRTF amplitude of −20 dB re the maximum amplitude. Upper cutoff frequencies ranged from 1,400 to 1,940 Hz for the carrier frequencies tested. Mean values (±SD) were 1,670 ± 173 Hz and 1,830 ± 131 Hz for BLU and WEN, respectively.
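One way to extract a −20 dB upper cutoff from a sampled MRTF is sketched below. The text does not state how the crossing was located between tested modulation rates, so linear interpolation in dB between the two bracketing points, and the choice of the highest-frequency crossing, are assumptions. The function name and the amplitude values are hypothetical and are not data from the study.

```python
def upper_cutoff(freqs_hz, amps_db, drop_db=20.0):
    """Highest frequency where the interpolated MRTF crosses max - drop_db.

    Returns None if the highest-frequency point is still above the criterion.
    """
    threshold = max(amps_db) - drop_db
    # Index of the last point at or above the criterion.
    last = max(i for i, a in enumerate(amps_db) if a >= threshold)
    if last == len(amps_db) - 1:
        return None
    f0, f1 = freqs_hz[last], freqs_hz[last + 1]
    a0, a1 = amps_db[last], amps_db[last + 1]
    # Linear interpolation (in dB) between the two bracketing points.
    return f0 + (threshold - a0) / (a1 - a0) * (f1 - f0)


# Hypothetical MRTF amplitudes (dB re an arbitrary reference).
freqs = [600, 1000, 1200, 1400, 1600, 1800, 2000]
amps = [0.0, -1.0, -2.0, -5.0, -10.0, -18.0, -30.0]
cutoff = upper_cutoff(freqs, amps)
print(f"upper cutoff = {cutoff:.0f} Hz")
```

Searching from the high-frequency end avoids mistaking a local MRTF minimum at a lower modulation rate for the cutoff.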

Fig. 7 Upper cutoff frequencies for BLU and WEN defined by the −20 dB re maximum amplitude points

Figures 8 and 9 show click-evoked potential (a) waveforms, (b) amplitude spectra, and (c) phase spectra, for BLU and WEN, respectively. In Figs. 8a and 9a, the upward direction represents a positive difference between the non-inverting and inverting electrodes. All six (averaged) click-evoked responses are superimposed to help identify trends. Initial positive peaks occurred around 1.7 ms, with the largest trough occurring at approximately 3.9 ms. Linear regressions were performed on the phase angle versus frequency data over the frequency range 1,000–1,500 Hz (just after the abrupt phase shifts near 900–1,000 Hz); mean group delays calculated from the regression slopes were 3.5 and 3.6 ms for BLU and WEN, respectively. Mean upper cutoff frequencies (based on the −20 dB amplitude re maximum) calculated from click-evoked potential spectra were 1,810 and 1,880 Hz for BLU and WEN, respectively, higher than those measured from SAM stimuli (1,670 and 1,830 Hz for BLU and WEN, respectively).

Fig. 8 Click-evoked potentials measured from BLU. a Time waveforms of six individual click-evoked potentials. b Amplitude and c phase spectra for the click waveforms shown in a. The dashed lines show the amplitude and phase spectra for the 20 kHz MRTF for comparison

Fig. 9 Click-evoked potentials measured from WEN. a Time waveforms of six individual click-evoked potentials. b Amplitude and c phase spectra for the click waveforms shown in a. The dashed lines show the amplitude and phase spectra for the 20 kHz MRTF for comparison

Discussion

Envelope following response amplitudes

Plots of EFR amplitude as a function of modulation frequency were similar in shape to those observed in prior investigations of odontocetes such as the bottlenose dolphin (Dolphin et al. 1995; Supin and Popov 1995), beluga (Dolphin et al. 1995; Klishin et al. 2000), beaked whale (Cook et al. 2006), and Risso’s dolphin (Mooney et al. 2006). Although the specific locations of maxima and minima varied somewhat with carrier frequency, peaks near 550–600 Hz and 1,000–1,200 Hz were common; at higher carrier frequencies a third peak was visible near 1,400–1,700 Hz as well. These locations, as well as an additional peak suggested by the limited data at 2,500 Hz, agree closely with results previously presented by Supin and Popov (1995) for Tursiops. For BLU, the peak at 550–600 Hz generally had the largest amplitude; for WEN the peak near 1,000 Hz was often the largest, though in most cases the differences were within 2–3 dB. The maxima/minima are believed to result from interference between potentials generated from multiple sources (Supin and Popov 1995), so the specific locations and amplitudes of maxima and minima may be related more to electrode position than to real differences in EFRs (Supin and Popov 1995).

Upper cutoff frequencies exhibited more variation (several hundred Hz), both with carrier frequency and between subjects, than those reported by Supin and Popov (1995). Although BLU’s high-frequency hearing loss did not substantially affect the MRTFs, upper cutoff frequencies were generally lower (about 11%) than those of WEN. The observed differences seem unlikely to have resulted simply from differences in stimulus sensation level, since sensation levels were similar at 20 and 30 kHz, where upper cutoffs differed by ∼200 Hz, and sensation levels were very different at 50 kHz, where upper cutoff frequencies were nearly identical. Mean values for upper cutoff frequency (across all carrier frequencies) for BLU and WEN were not significantly different (paired t test, P > 0.05). Responses were reliably detected with our measurement system up to 2,000–2,500 Hz, with occasional detections up to 3,000 Hz. Although higher modulation frequencies may have evoked responses as well, these were not significantly different from the measurement system noise according to the statistical approach used in this study (magnitude-squared coherence with α = 0.01 and 20 subaverages).

EFR amplitudes were similar between WEN and BLU, regardless of the modulation frequency tested. Sensation levels at the 40, 50, and 60 kHz carrier frequencies were 20, 40, and 40 dB lower for BLU than for WEN, yet the EFR amplitudes at those carrier frequencies were approximately the same. The similarities in response amplitudes may be related to the SAM stimuli sound pressure levels (SPLs); the particular SPL was chosen to produce detectable responses across a wide range of modulation rates, including those rates that are not optimal. The use of a lower SPL would have reduced the dynamic range of the measurement and prevented detection of the MRTF local minima. However, the relatively large SPLs may have resulted in a significant spread of energy along the basilar membrane and neurological recruitment, resulting in relatively large EFR amplitudes. It is possible that lower SPLs would have produced lower EFR amplitudes in BLU compared to WEN; however, this would have prevented estimating the MRTF with any precision.

Overall, BLU’s high frequency hearing loss did not substantially affect the shapes or amplitudes of the MRTFs, despite large differences in stimulus sensation levels. Supin and Popov (1995) also measured MRTFs at different sensation levels and saw no substantial differences; however, in the present study the changes in sensation level occurred from pre-existing hearing loss in one of the subjects. The factors producing loss of sensitivity at high frequencies may result in other adverse effects on the auditory system, such as concomitant reductions in temporal and spectral pattern processing (Divenyi et al. 2005; Fitzgibbons and Gordon-Salant 2001, 2004), so the comparisons made here are worthwhile. The results of the present study also mean that measurement of the MRTF will not necessarily reveal high frequency hearing loss and that this technique is not a suitable stand-alone technique to assess auditory sensitivity.

Envelope following response phase angles

Phase angles changed linearly with modulation frequency from 100 to approximately 1,600–1,800 Hz for both subjects at all carrier frequencies. This indicates a constant time delay for EFRs, and thus a consistent location of sources, over this range of modulation frequencies. Group delays calculated from the slope of the phase data were consistently between 3.4 and 3.6 ms, except for WEN at 30 kHz, where the delay was 4.4 ms. The reason for this difference is unknown; there is nothing unique in WEN’s EFR thresholds or MRTF amplitudes at 30 kHz. The 30 kHz data were collected during the same session, immediately before the 40 kHz data, using the same electrode placement. Calculated group delays from 100 to 1,600 Hz were slightly lower than those previously reported for Tursiops (∼4 ms, Supin and Popov 1995; ∼5 ms for modulation rates between 80 and 250 Hz, Dolphin et al. 1995). The lower calculated delays may have been a result of the jawphone sound source and monotic stimulation as opposed to direct field or freefield stimulation. Additionally, carrier frequencies used by Dolphin et al. (1995) were several octaves lower (≤4 kHz) than those used here. Group delays for carrier frequencies of 4, 5.6, and 8 kHz were not specifically provided by Supin and Popov (1995), so direct comparisons to group delays obtained at those frequencies cannot be made. If differences in the peripheral acoustic pathways exist for low frequencies (<5 kHz) and high frequencies (>20 kHz), then differences in group delays might result from frequency specific propagation along these pathways. Prior estimates of the dependency of cochlear traveling wave velocities on stimulus frequency in dolphins suggest that progressively lower frequencies increase the latency of evoked responses (Popov and Supin 2001). EFR latencies decrease with increasing stimulus level (Stapells et al. 1984), but it is unlikely that differences between group delays in this study and those reported by Dolphin et al. (1995) are related to differences in received sound levels, since both studies featured stimuli at comparable SPLs.

Supin and Popov (1995) reported similar EFR phase results for Tursiops; however, the results of the present study conflict sharply with those of Dolphin et al. (1995), who reported three distinct group delays for Tursiops: 20.1 ms from 18 to 80 Hz, 5.7 ms from 80 to 250 Hz, and 1.3 ms for frequencies >250 Hz. Although the present study did not use sufficiently low frequencies to reveal the suspected cortical generator with group delay ∼20 ms, the phase data and resulting group delays over the frequency range 100–1,300 Hz differ dramatically. Although group delays will depend, to some extent, on electrode position (Supin and Popov 1995), the present study used the same electrode montage as Dolphin et al. (1995); thus, the discrepancy is most likely a result of the modulation frequency spacing used by Dolphin et al. (1995) being too large to resolve ±2π rad phase ambiguities.
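The role of modulation frequency spacing in resolving ±2π rad ambiguities can be illustrated with a short sketch. For a pure delay T_d, adjacent phase values differ by 2π·T_d·Δf_m, so unwrapping recovers the true slope only when Δf_m < 1/(2·T_d); with coarser spacing the apparent delay is shifted by a multiple of 1/Δf_m. The delay and spacings below are hypothetical and are not taken from either study; the function name is illustrative.

```python
import math


def apparent_delay(true_delay_s, spacing_hz, f_start=100.0, n_points=8):
    """Group delay recovered from wrapped phases sampled every `spacing_hz`."""
    freqs = [f_start + i * spacing_hz for i in range(n_points)]
    wrapped = [math.atan2(math.sin(2 * math.pi * f * true_delay_s),
                          math.cos(2 * math.pi * f * true_delay_s))
               for f in freqs]
    # Unwrap: keep each adjacent phase step within +/- pi.
    unwrapped = [wrapped[0]]
    for p in wrapped[1:]:
        step = p - unwrapped[-1]
        step -= 2 * math.pi * round(step / (2 * math.pi))
        unwrapped.append(unwrapped[-1] + step)
    # Least-squares slope (rad/Hz) converted to seconds.
    mf = sum(freqs) / n_points
    mp = sum(unwrapped) / n_points
    slope = (sum((f - mf) * (p - mp) for f, p in zip(freqs, unwrapped))
             / sum((f - mf) ** 2 for f in freqs))
    return slope / (2 * math.pi)


true_delay = 0.0035                          # hypothetical 3.5 ms delay
max_spacing = 1.0 / (2.0 * true_delay)       # about 143 Hz
fine = apparent_delay(true_delay, 50.0)      # spacing below the limit
coarse = apparent_delay(true_delay, 200.0)   # spacing above the limit
print(f"limit = {max_spacing:.0f} Hz, fine = {fine * 1e3:.2f} ms, "
      f"coarse = {coarse * 1e3:.2f} ms")
```

In this example the 50 Hz spacing recovers the 3.5 ms delay, whereas the 200 Hz spacing yields an apparent delay offset by 1/Δf_m = 5 ms, which shows how a sparse set of modulation rates can produce a misleading group delay.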

Click-evoked potentials

Click-evoked potential amplitude and phase spectra were similar to MRTF spectra and featured the same general low-pass nature and constant slope phase over the modulation frequency range of about 300–2,000 Hz. The locations of minima/maxima were similar but not exactly the same. These data indicate that frequency spectra of click-evoked potentials may be useful as proxies for MRTFs in situations where time and access are limited. Application of these proxies should be limited to the general shape of the MRTF rather than specific locations of maxima and minima and caution should be used in applying the click-evoked potential spectra to predict EFR amplitudes at specific modulation frequencies. Differences between click-evoked potential spectra and the MRTF may exist in part because of the level of the click stimulus. Prior comparisons between click-evoked potential spectra and MRTFs found that clicks emitted at lower stimulus levels, i.e., closer to threshold, produced spectra that were most similar in nature to the MRTF (Supin and Popov 1995).