Introduction

The compressive nonlinearity of signal processing in the auditory system compresses a very large range of sound intensities (up to 120 dB, i.e., a factor of 10¹² in power), making this entire range accessible not only for perception but also for analysis. Compressive nonlinearity occurs at the level of the cochlea (rev. Robles and Ruggero 2001; Cooper 2004) and manifests itself in the responses of auditory nerve fibers (rev. Cooper 2004). The compressive nonlinearity is a property of the “active” cochlear mechanism, which functions as positive feedback through outer hair cells: these cells sense vibrations of the basilar membrane and return enhanced vibrations back to the membrane. The gain of the active mechanism is level dependent; the higher the level of the acoustic signal, the lower the gain. As a result, the range of vibration of the basilar membrane is compressed compared to the input signal range.

In common laboratory animals (e.g., cats and guinea pigs), compressive nonlinearity is prominent in the basal part of the cochlea, thus indicating the effectiveness of the active cochlear mechanism at the upper end of the hearing frequency range. For outer hair cells in vitro, an upper frequency limit of electromotility of 79 kHz has been found (Frank et al. 1999). However, it is not known whether this mechanism is able to function in vivo at frequencies above 80 kHz, which are audible to animals with unique hearing abilities, such as odontocetes (toothed whales, including dolphins and porpoises) (Au 1993; Supin et al. 2001; Au and Hastings 2008). The answer to this question may add to the knowledge of fundamental hearing mechanisms.

Indirect evidence of the efficiency of the active mechanism has been provided by data on frequency tuning of hearing in odontocetes. In the upper part of the frequency range (approximately 100 kHz and higher), the quality of a frequency-tuned auditory filter was assessed to be up to 35 in the bottlenose dolphin, Tursiops truncatus (Supin et al. 1993, 2001; Popov et al. 1996), 20 in the common dolphin, Delphinus delphis (Popov and Klishin 1998), and almost 50 in the beluga whale, Delphinapterus leucas (Klishin et al. 2000; Sysueva et al. 2014) and the porpoises Phocoena phocoena and Neophocaena phocaenoides (Popov et al. 2006). These estimates are several times higher than auditory filter qualities in humans described by a formula by Glasberg and Moore (1990). Assuming that high-frequency tuning is afforded by the active mechanism, the high qualities listed above indicate efficiency of the active mechanism at high frequencies.

However, several investigations have suggested lower estimates of frequency-tuned auditory filter qualities for odontocetes (Johnson 1971; Au and Moore 1990; Finneran et al. 2002; Lemonds et al. 2011, 2012). Although explanations of the disagreement have been presented (Sysueva et al. 2014), further evidence for the efficiency of the active mechanism in the high-frequency hearing of odontocetes is desirable.

Data on compressive nonlinearity may provide additional evidence for participation of the active mechanism in high-frequency hearing of odontocetes, because the nonlinearity is a feature of the active mechanism. Research on hearing in humans has shown that cochlear compression can be revealed and measured without invasive experimentation on the cochlea using a psychophysical approach. This approach is based on comparison of the masking effects of on-frequency and low-frequency maskers. A feature of the cochlear compressive nonlinearity is that responses to frequencies close to the characteristic frequency (CF) are nonlinear, whereas responses to frequencies below the CF are linear (Robles et al. 1986). Because responses to both the signal and the on-frequency masker are equally subjected to compression, the on-frequency masker level at threshold (MLT) depends linearly on the signal level, with a slope of approximately 1 dB/dB. Responses to low-frequency maskers, in contrast, are not compressed at the place where the signal is represented. Therefore, when the signal level is varied, less variation in the low-frequency masker level is necessary to reach the masked threshold. The difference between the variations in on- and low-frequency MLTs indicates the compression slope for the response to the signal (Oxenham and Plack 1977; Nelson et al. 2001; Lopez-Poveda et al. 2003).
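The reasoning above can be illustrated with a minimal sketch. It assumes an idealized response that grows at a constant rate of c dB/dB near the CF and at 1 dB/dB below the CF, and a masked threshold reached at a fixed masker-to-signal response ratio. The function names, the value c = 0.5, and the 0-dB criterion are illustrative assumptions, not parameters taken from the study.

```python
def masker_level_at_threshold(signal_level_db, c, masker_compressed, criterion_db=0.0):
    """Masker level (dB) at which the masker response exceeds the signal
    response by criterion_db.

    Idealized model (assumption): response in dB = growth rate * level in dB.
    """
    signal_response = c * signal_level_db            # signal is at CF: compressed
    masker_growth = c if masker_compressed else 1.0  # low-frequency masker: linear
    return (signal_response + criterion_db) / masker_growth


def mlt_slope(c, masker_compressed, lo_db=60.0, hi_db=100.0):
    """Slope (dB/dB) of the MLT-vs-signal-level function between two levels."""
    mlt_lo = masker_level_at_threshold(lo_db, c, masker_compressed)
    mlt_hi = masker_level_at_threshold(hi_db, c, masker_compressed)
    return (mlt_hi - mlt_lo) / (hi_db - lo_db)


on_slope = mlt_slope(c=0.5, masker_compressed=True)     # on-frequency masker
low_slope = mlt_slope(c=0.5, masker_compressed=False)   # low-frequency masker
print(on_slope, low_slope)
```

Under these assumptions the on-frequency slope is 1 dB/dB regardless of c, whereas the low-frequency slope equals c, which is why the latter can serve as an estimate of the compression slope.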

The same approach can be used for investigations of compressive nonlinearity in odontocetes using auditory evoked potentials (AEPs). Similar to the psychophysical method, the AEP method is suitable for threshold measurement. The AEP method has demonstrated its effectiveness for research on hearing in odontocetes and allows rapid audiometric measurements (Supin et al. 2001). In particular, to produce robust rhythmic AEPs known as the rate-following response (RFR), rhythmic trains of short pips may be used (Supin and Popov 2007). In the present study, this method was used to measure MLTs in experiments aimed at revealing compressive nonlinearity. MLTs were measured for test signals of frequencies from 45 to 90 kHz, where bottlenose dolphins have high hearing sensitivity (Au 1993; Supin et al. 2001).

Materials and methods

Subject and facilities

The subject was a male bottlenose dolphin, Tursiops truncatus, with a body length of 275 cm and body mass of 250 kg, housed at the Utrish Marine Station of the Russian Academy of Sciences on the Black Sea coast. The animal was housed in a round seawater tank 6 m in diameter and 1.5 m in depth. The care and use of the animal were in compliance with the Guidelines of the Russian Ministry of Education and Science on the use of animals in biomedical research.

Test stimuli and maskers

The test stimuli were tone pip trains digitally generated at an update rate of 512 kHz. The trains were 16 ms long and contained 16 pips presented at a rate of 1000 pips s⁻¹. The pip trains were presented at a rate of 20 s⁻¹. Each pip in the train included 8 cycles of a carrier enveloped by a cosine function (Fig. 1a). The carrier frequency was 45, 64, or 90 kHz. Therefore, the overall duration of each pip ranged from 0.18 ms at a 45-kHz carrier to 0.09 ms at a 90-kHz carrier, and its equivalent rectangular duration ranged from 0.07 ms at a 45-kHz carrier to 0.035 ms at a 90-kHz carrier. At all carrier frequencies, the pip spectra (Fig. 1b) had an equivalent rectangular bandwidth of 0.2 oct.
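As an illustration of the stimulus timing described above, the following sketch synthesizes one pip train. It is not the authors' LabVIEW program: the raised-cosine form of the envelope and the function names are assumptions, and 8 carrier cycles per pip is assumed because that value matches the stated pip durations of 0.18 ms at 45 kHz and 0.09 ms at 90 kHz.

```python
import math

UPDATE_RATE_HZ = 512_000   # D/A update rate stated in the text
TRAIN_MS = 16              # pip-train duration
PIP_RATE_HZ = 1000         # pips per second within a train


def make_pip(carrier_hz, n_cycles=8):
    """One pip: carrier multiplied by a raised-cosine envelope (assumed shape)."""
    n = round(n_cycles / carrier_hz * UPDATE_RATE_HZ)
    return [0.5 * (1 - math.cos(2 * math.pi * i / n))
            * math.sin(2 * math.pi * carrier_hz * i / UPDATE_RATE_HZ)
            for i in range(n)]


def make_train(carrier_hz, n_cycles=8):
    """16-ms train with one pip starting every millisecond."""
    n_total = UPDATE_RATE_HZ * TRAIN_MS // 1000   # 8192 samples
    period = UPDATE_RATE_HZ // PIP_RATE_HZ        # 512 samples between pip onsets
    pip = make_pip(carrier_hz, n_cycles)
    train = [0.0] * n_total
    for start in range(0, n_total, period):
        train[start:start + len(pip)] = pip
    return train


def pip_duration_ms(carrier_hz, n_cycles=8):
    return len(make_pip(carrier_hz, n_cycles)) / UPDATE_RATE_HZ * 1000


train_64k = make_train(64_000)
print(len(train_64k), pip_duration_ms(45_000), pip_duration_ms(90_000))
```

With these assumptions each pip occupies well under a tenth of the 1-ms inter-pip period, so most of the train consists of silent pauses.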

Fig. 1

Examples of waveforms and spectra of the signal and maskers. a Waveforms of pips of a 64-kHz carrier frequency (the first two pips of a train are presented); 1: electronic signal activating the transducer, 2: acoustic signal; the records are shifted relative to one another for better visibility. The acoustic waveform was delayed relative to the electronic waveform by the sound propagation time of 0.67 ms. b Electronic and acoustic frequency spectra of one pip. c Spectrum of a 100-ms sample of noise centered at 64 kHz (on-frequency masker). d The same for a 100-ms sample of noise centered at 32 kHz (1-oct low-frequency masker). In b to d: El electronic signal activating the transducer, Ac acoustic signal

The stimulus waveforms and spectra were the same as previously described in detail (Popov et al. 2016, 2017). Trains of short pips were used as the test stimuli because they produced a rhythmic AEP sequence, referred to as the rate-following response (RFR), more effectively than narrow-band sinusoidally modulated tones (Supin and Popov 2007).

The maskers were band-limited noises. A noise bandwidth of 0.2 oct was set by a fourth-order Butterworth filter (Fig. 1c, d). The noise band was centered either at the same center frequency as that of the stimulus (on-frequency noise) or at a frequency below that of the stimulus (low-frequency noise). It has been shown (Oxenham and Plack 1977; Nelson et al. 2001; Lopez-Poveda et al. 2003) that in humans, nonlinearity manifests in full measure when the low-frequency masker is as much as 1 oct below the signal. Frequency tuning in dolphins is more acute than in humans (see Introduction); nevertheless, for confidence, we tested low-frequency maskers across a range of 1 oct in steps of 0.25 oct. Thus, the frequencies of the low-frequency maskers were 0.25, 0.5, 0.75, or 1 oct below the stimulus frequency.
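The masker center frequencies implied by these octave shifts can be computed as in the sketch below. The function names are hypothetical, and placing the band edges symmetrically on a logarithmic frequency axis is an assumption; the text specifies only the 0.2-oct bandwidth.

```python
import math


def masker_center_frequencies(signal_hz, shifts_oct=(0.0, -0.25, -0.5, -0.75, -1.0)):
    """Center frequencies for the on-frequency (0 oct) and low-frequency maskers."""
    return [signal_hz * 2 ** shift for shift in shifts_oct]


def band_edges(center_hz, bandwidth_oct=0.2):
    """Lower and upper band edges, assumed symmetric in octaves around the center."""
    half = bandwidth_oct / 2
    return center_hz * 2 ** (-half), center_hz * 2 ** half


centers_64k = masker_center_frequencies(64_000)
low_edge, high_edge = band_edges(64_000)
print([round(f) for f in centers_64k], round(low_edge), round(high_edge))
```

For a 64-kHz signal, for example, the masker 1 oct below is centered at 32 kHz, which corresponds to the noise shown in Fig. 1d.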

Acoustic measurements

Acoustic signals and maskers were monitored before and after several experiments by positioning a receiving hydrophone (see Instrumentation below) next to the animal’s head. The spectra of the acoustic signals and maskers did not precisely reproduce those of the electronic signals because of the frequency response of the transducer and, probably, reverberation. However, their equivalent rectangular bandwidths were maintained at 0.2–0.22 oct (Fig. 1).

The sound pressure level (SPL) of the pip trains was specified in dB root-mean-square (RMS) re 1 µPa over the 16-ms pip-train duration. The RMS was computed across the entire train duration because, at the pip rate of 1000 s⁻¹, the RFR features nearly complete energy summation over both the pips and the inter-pip pauses of the train. Therefore, threshold estimates specified in RMS over the entire train duration agree with those provided by other methods (Supin and Popov 2007). The SPL of the masking noise was specified in dB RMS re 1 µPa. Local sound levels around the animal’s head varied within a range of ±2.5 dB.
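The level convention described above can be sketched as follows. The function name and the toy pressure values are illustrative assumptions; the only elements taken from the text are the RMS computation over the entire train and the 1-µPa reference.

```python
import math


def spl_db_re_1upa(pressure_samples_pa):
    """RMS level in dB re 1 µPa of pressure samples given in pascals."""
    mean_square = sum(p * p for p in pressure_samples_pa) / len(pressure_samples_pa)
    rms_upa = math.sqrt(mean_square) * 1e6   # convert Pa to µPa
    return 20 * math.log10(rms_upa)


# Toy example: a constant 1-Pa pressure, and the same pressure present
# during only 1/16 of the analysis window (silent for the rest).
continuous = [1.0] * 160
sparse = ([1.0] + [0.0] * 15) * 10
level_continuous = spl_db_re_1upa(continuous)
level_sparse = spl_db_re_1upa(sparse)
print(level_continuous, level_sparse)
```

Because the pauses are included in the average, the level specified over the whole train is lower than the level within an individual pip by 10·log10 of the duty cycle (about 12 dB for the 1/16 duty cycle of this toy example).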

Evoked potential recording

Brain potentials were picked up through surface disk electrodes (see Instrumentation below). The active electrode was positioned at the vertex, 7 cm behind the blowhole and above the water surface. The reference electrode was positioned on the dorsal fin, above the water surface. Brain potentials were fed through shielded cables to a balanced amplifier with an 80-dB gain and a frequency passband of 100–3000 Hz. The amplified potentials were digitized at a sampling rate of 16 kHz.

To extract responses from the brain potentials, the processing routine extracted 25-ms epochs in synchrony with the onset of the test pip trains. The epochs were averaged on-line. A 16-ms segment of the averaged records (from the 5th to the 21st millisecond after the onset of the signal pip train) containing the RFR to the pip-train stimulus was fast Fourier transformed on-line to obtain the response frequency spectrum. With the 16-ms analysis window and 16-kHz sampling rate (256 samples in the window), the spectral resolution was 62.5 Hz. The magnitude of the 1-kHz spectral peak was taken as a measure of the RFR magnitude.
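The arithmetic of the analysis window, and the reading of the 1-kHz component, can be sketched as below. This is a plain single-bin discrete Fourier transform written for illustration, not the authors' on-line routine; the function name, the amplitude scaling, and the synthetic test record are assumptions.

```python
import cmath
import math

FS_HZ = 16_000                       # sampling rate stated in the text
WINDOW_MS = 16                       # analysis window stated in the text
N = FS_HZ * WINDOW_MS // 1000        # 256 samples in the window
RESOLUTION_HZ = FS_HZ / N            # 62.5 Hz between spectral components


def dft_magnitude(samples, freq_hz):
    """Amplitude of the spectral component nearest to freq_hz (N-sample window)."""
    k = round(freq_hz / RESOLUTION_HZ)   # bin index; 1 kHz falls in bin 16
    acc = sum(x * cmath.exp(-2j * math.pi * k * n / N)
              for n, x in enumerate(samples[:N]))
    return 2 * abs(acc) / N              # scaled so a unit sinusoid gives 1.0


# Synthetic averaged record (assumption): a 0.5-µV, 1-kHz response.
record_uv = [0.5 * math.sin(2 * math.pi * 1000 * n / FS_HZ) for n in range(N)]
rfr_magnitude_uv = dft_magnitude(record_uv, 1000)
print(N, RESOLUTION_HZ, rfr_magnitude_uv)
```

Because 1 kHz is an exact multiple of the 62.5-Hz resolution, the response frequency falls on a spectral component and its magnitude is read without leakage into the neighboring components.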

Experimental design

During experimentation, the subject remained in the housing tank. The water level in the tank was lowered to 50 cm. The animal was supported by a stretcher so that the dorsal part of the body and the blowhole were above the water surface. The stretcher was made of fishing net and was transparent to sound. The animal was not anaesthetized or tranquilized.

The transducer that played both the test signals and the masking noise was immersed in the water at a depth of 25 cm, 1 m in front of the animal’s head. To reduce sound reflection from the tank wall, the wall in front of the animal’s head was covered by sound-absorbing material.

The MLT was assessed by recording the RFR to the test tone pips. Within every trial, brain potentials were averaged on-line. To find the MLT, the masker level was varied trial-by-trial according to an adaptive (staircase) one-up–one-down procedure, while the signal level was kept constant during the measurement run. The adaptive procedure required an on-line decision about the presence or absence of the response. For this decision, an arbitrary criterion was applied: a record was considered response-present when the 1-kHz peak in the response spectrum was more than twice as high as any of the spectral components within the adjacent range from 0.75 to 1.25 kHz. On-line averaging continued until either the response-present criterion was met (the trial was assessed as response-present) or the criterion was not met but all spectral components within the range of 0.75–1.25 kHz were below 0.05 µV (the trial was assessed as response-absent). With this rule, 100–500 traces were generally averaged to collect one record. At the test signal presentation rate of 20 trains/s, each record took ~5–25 s. Masker levels were varied trial-by-trial in 5-dB steps. If the response was detected according to the criterion specified above, the next masker level was increased by 5 dB; if the response was absent, the next masker level was decreased by 5 dB. The masker levels at four reversal points (transitions from masker level increase to decrease and vice versa) were averaged, and this value was taken as the MLT for the particular measurement run, i.e., for a particular signal level. Signal levels were varied run-by-run in 5-dB steps, so that an MLT was found for every signal level. The family of MLTs thus obtained yielded a function of MLT vs. signal level. Each function was collected five times, and the obtained functions were averaged to obtain the final MLT-vs.-signal level function.
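The adaptive procedure can be sketched as follows. This is a simplified simulation rather than the authors' LabVIEW routine: the function names, the deterministic "observer", the starting level, and the `max_trials` safeguard are assumptions, and the stopping rule for on-line averaging is not modeled.

```python
def response_present(peak_uv, neighbours_uv):
    """Criterion from the text: the 1-kHz peak is more than twice as high as
    every spectral component in the adjacent 0.75-1.25 kHz range."""
    return peak_uv > 2 * max(neighbours_uv)


def staircase_mlt(detects_response, start_db, step_db=5.0, n_reversals=4,
                  max_trials=200):
    """One-up-one-down staircase on masker level; returns the mean reversal level.

    detects_response(masker_db) -> True if the RFR is present at that level.
    """
    level = start_db
    last_direction = 0
    reversals = []
    for _ in range(max_trials):
        # Response present: raise the masker; response absent: lower it.
        direction = 1 if detects_response(level) else -1
        if last_direction and direction != last_direction:
            reversals.append(level)          # direction changed at this level
            if len(reversals) == n_reversals:
                return sum(reversals) / n_reversals
        last_direction = direction
        level += direction * step_db
    raise RuntimeError("staircase did not reach the requested reversals")


# Hypothetical observer: response present whenever the masker is below 107.5 dB.
mlt = staircase_mlt(lambda masker_db: masker_db < 107.5, start_db=95.0)
print(mlt)
```

With 5-dB steps the track alternates between the two levels that bracket the true threshold, so the mean of the reversal levels falls midway between the highest response-present level and the lowest response-absent level.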

Instrumentation

Both signals and maskers were digitally synthesized by a standard personal computer using a custom-made program (Virtual Instrument) designed using LabVIEW software (National Instruments, Austin, TX, USA). The synthesized signals and maskers were digital-to-analog converted by a D/A channel of an NI USB-6251 acquisition board (National Instruments). To amplify and attenuate the signal and maskers, a custom-made low-noise amplifier-attenuator with a 200 kHz passband and 50 Ω output impedance was used.

Both test signals and maskers were played, depending on frequency, through either a B&K 8104 (Bruel & Kjaer, Naerum, Denmark) or an ITC-1032 transducer (International Transducer Corporation, Santa Barbara, CA, USA). The playback channel was calibrated by a receiving hydrophone (B&K 8103, Bruel & Kjaer) using a custom-made amplifier with a 40-dB gain and a 200-kHz passband based on an AD820 chip (Analog Devices, Norwood, MA, USA).

Brain potentials were picked up using F-E5G 10-mm gold-plated disk electrodes (Grass Technologies, Warwick, RI, USA) and an LP511 brain potential amplifier (Grass Technologies). The amplified signals were digitized with a 16-bit analog-to-digital converter, which was one of the A/D channels of the NI USB-6251 acquisition board.

Processing of the digitized brain potentials was performed using a custom-made program (Virtual Instrument) designed using LabVIEW software (National Instruments).

Results

Rate-following response waveform

Baseline (no masker) RFR records are exemplified in Fig. 2. Test stimuli of high levels (up to 85 dB SPL) provoked a robust RFR that appeared as a sequence of AEP waves at a rate of 1 kHz. The RFR amplitude was stimulus-level dependent; the lower the level was, the lower the RFR amplitude. At a threshold stimulus level (55 dB SPL in Fig. 2), the RFR disappeared in noise, thus allowing the assessment of the response threshold. The RFR amplitude dependence on the stimulus level is illustrated by the frequency spectra (Fig. 3). The thresholds found in such a way were 70 dB at a signal frequency of 45 kHz, 55 dB at a signal frequency of 64 kHz, and 65 dB SPL at a signal frequency of 90 kHz.

Fig. 2

Baseline (no masker) RFR records (waveforms). Stimulus frequency 64 kHz. Stimulus SPLs are indicated next to the records; St stimulus envelope

Fig. 3

Frequency spectra of the waveforms presented in Fig. 2. Stimulus SPLs are indicated next to the spectra

Rate-following response in the masking background

The RFRs in masking noise were recorded at stimulus levels from 30 to 90 dB above the response threshold in quiet. Signal levels less than 30 dB above threshold were not used because MLT estimates for near-threshold stimuli could be ambiguous: if the absolute threshold fluctuates by even a few dB, the RFR amplitude could be influenced by both the controlled variation of the masker level and the uncontrolled fluctuation of the absolute threshold. Signal levels higher than 90 dB above threshold were not used because the MLTs of low-frequency maskers might be reached only at fatiguing masker levels.

The RFR records obtained in the masking background are exemplified in Fig. 4 (waveforms) and Fig. 5 (frequency spectra). Test stimuli of the exemplified level (125 dB SPL, 70 dB above absolute threshold) provoked a robust RFR. During the 16-ms stimulation time, the RFR amplitude slightly decayed, thus demonstrating the adaptation effect described previously (Popov et al. 2016, 2018). However, the response was obvious during the entire time of stimulation unless it was suppressed by the masker. An increase in the masker level resulted in a response reduction to disappearance. In the example presented in Figs. 4 and 5, the response was still visible at a masker level of 105 dB and disappeared at a masker level of 110 dB SPL. According to the accepted criteria (see Experimental design), the record in the noise background of 105 dB was assessed as response-present; the record in the noise of 110 dB was assessed as response-absent. In this particular case, the MLT was estimated to be 107.5 dB.

Fig. 4

RFR records to stimuli of 64 kHz in a masking background of 64 kHz. The stimulus SPL is 125 dB, and masker SPLs are indicated next to the records; St stimulus envelope

Fig. 5

Frequency spectra of the waveforms presented in Fig. 4. Masker SPLs are indicated next to the spectra

Masker level at threshold as a function of the signal level

MLTs as functions of the signal level were found for three signal frequencies, specifically 45, 64, and 90 kHz. For each of these signal frequencies, the functions were obtained at five noise-center frequencies: the on-frequency (the same as the signal frequency) and four low-frequency maskers (0.25, 0.5, 0.75, and 1 oct below the signal). The resulting functions are presented in Fig. 6. Slopes of these functions with 95% confidence ranges are listed in Table 1. Features of all the functions are as follows:

Fig. 6

MLT dependences on the stimulus level. Stimulus frequencies: a 45 kHz, b 64 kHz, c 90 kHz. Masker frequencies are indicated in the legends in oct re. stimulus frequency (0 oct indicates on-frequency masker, and negative values indicate low-frequency maskers)

Table 1 Regression slopes for MLT
  1. All the functions can be successfully approximated by straight lines (R² not less than 0.980), either within the entire range of the signal level variation (100 to 165 dB SPL for 45 kHz, 85–145 dB SPL for 64 kHz) or above a breakpoint (95–150 dB SPL for 90 kHz).

  2. For on-frequency maskers (0-oct shift re. signal frequency), the regression slopes were 1.1 dB/dB with a 95% confidence range as low as 0.05–0.07 dB/dB at all tested signal frequencies.

  3. For low-frequency maskers 1 oct below the signal, the regression slopes were 0.53 to 0.57 dB/dB with a 95% confidence range as low as 0.03 to 0.05 dB/dB.

  4. For low-frequency maskers 0.25–0.75 oct below the signal, the regression slopes had intermediate values, from 0.53 to 0.77 dB/dB.
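Regression slopes and R² values of the kind summarized above can be obtained by ordinary least squares, as in the following sketch. The data points are synthetic and generated only for illustration (a 0.55-dB/dB line with ±0.5-dB alternating deviations); they are not measurements from this study, and the function name is hypothetical.

```python
def linear_regression(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; returns slope,
    intercept, and the coefficient of determination R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Synthetic example (NOT study data): signal levels in 5-dB steps and
# MLTs following a 0.55-dB/dB line with small alternating deviations.
signal_db = [85 + 5 * i for i in range(13)]                 # 85 ... 145 dB SPL
mlt_db = [20 + 0.55 * s + (0.5 if i % 2 == 0 else -0.5)
          for i, s in enumerate(signal_db)]
slope, intercept, r_squared = linear_regression(signal_db, mlt_db)
print(round(slope, 3), round(r_squared, 4))
```

A slope near 1 dB/dB obtained this way would correspond to the on-frequency case, and a slope well below 1 dB/dB to a low-frequency masker.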

Discussion

According to the hypothesis of compressive nonlinearity, the slopes of MLT-vs.-signal level functions were expected to be approximately 1 dB/dB for on-frequency maskers and less than 1 dB/dB for low-frequency maskers. In general, the experimental data obtained in the present study fit this expectation.

For the on-frequency maskers, the obtained regression slopes were 1.1 dB/dB. The excess of 0.1 dB/dB over the expected value of 1.0 dB/dB exceeded the 95% confidence ranges. The nature of this excess is not yet clear. As a preliminary hypothesis, it may be assumed that the range of on-frequency MLTs fell within a range of the output-vs.-input function (Glasberg and Moore 2000) that featured stronger compression than the signal range. Irrespective of the validity of this explanation, it is important that for on-frequency maskers, the slopes were never less than 1.0 dB/dB.

The low-frequency maskers 1 oct below the signal featured slopes about half those for the on-frequency maskers. These data may be explained by assuming that responses to the signals were subjected to compression at a compression slope of approximately 0.5 dB/dB, whereas effects of low-frequency maskers were not compressed or were compressed less than responses to the signals.

The estimates of the compression slope obtained in the present study (0.53–0.57 dB/dB) were higher (i.e., compression was less prominent) than the estimates of 0.16–0.17 dB/dB obtained in psychophysical measurements in humans by Oxenham and Plack (1977), the 0.2–0.4 dB/dB obtained by Nelson et al. (2001) and Lopez-Poveda et al. (2003), or the compression slope of 0.28 dB/dB predicted by a model by Glasberg and Moore (2000) for 50–80-dB stimuli at a maximum gain of the active cochlear mechanism of 60 dB (the lower the compression slope is, the stronger the compression). A possible explanation of this disagreement is that in the psychophysical experiments in humans, several precautions were taken to avoid effects that reduced the compression rate. These effects could include lateral suppression and off-frequency listening. To avoid lateral suppression, the forward masking paradigm, rather than simultaneous masking, was used with the assumption that suppression stops almost instantly with the offset of the suppressor (Oxenham and Plack 1977). To avoid off-frequency listening, noise was applied in the expected off-frequency listening region (Oxenham and Plack 1977) or signals minimally exceeding the baseline threshold were used (Nelson et al. 2001; Lopez-Poveda et al. 2003). In the latter case, MLT was measured by variation in the gap between the masker and signal. None of these precautions could be taken in experiments that exploit pip-train signals. Forward masking could not be used because of the rather long duration of the signals; therefore, only simultaneous masking was possible. The rather wide spectrum of the pip-train signals did not allow the expected region of off-frequency listening to be delineated and a noise to be applied in this region without affecting the signal itself. Notably, in psychophysical experiments, a compression slope of approximately 0.5 dB/dB was found for simultaneous masking (Bacon et al. 1999).

Another source of the rather weak compression (i.e., the relatively high compression slope) might be insufficient frequency spacing between the signal and low-frequency maskers. Larger frequency spacing could not be used in our experiments because, at larger spacing, the masker levels necessary to reach the masked threshold were too high. Nevertheless, a clear indication of compression was obtained, since the MLT depended on the signal level at a rate markedly less than 1 dB/dB for low-frequency maskers.

Thus, the data presented herein demonstrate cochlear compression in vivo at frequencies of up to 90 kHz. Assuming that compressive nonlinearity is a property of the active cochlear mechanism, these data may be considered a manifestation of the contribution of the active mechanism at the tested sound frequencies, i.e., up to 90 kHz. The active cochlear mechanism may contribute to the high discriminative capabilities of the dolphin’s auditory system.