1 Introduction

Cetaceans are acoustic specialists that rely heavily on sound for communication, navigation, hunting, foraging, and protection (Mann et al. 2000). Individual fitness can be compromised if ocean noise has a negative impact on these basic survival abilities. When one sound interferes with the ability to detect, discriminate, or recognize another sound, auditory masking occurs.

1.1 Critical Ratios and Critical Bands

Early studies focused on measuring critical bands (CBs) and critical ratios (CRs) and how the spectral density of noise within a limited bandwidth (e.g., one-third octave) affected masked thresholds. CRs have become a standard metric for describing and predicting auditory masking due to their relative simplicity. CRs can be calculated by

$$ \mathrm{CR}={L}_{\mathrm{S}}-{L}_{\mathrm{N}} $$
(13.1)

where $L_{\mathrm{S}}$ is the signal sound pressure level (SPL) at threshold (in dB re 1 μPa) and $L_{\mathrm{N}}$ is the spectral density level of the noise (in dB re 1 μPa²/Hz). The accuracy of CRs in predicting masked tonal thresholds in environmental noise is limited, however, primarily because CRs assume that the noise is Gaussian (G) and that masking is limited to a narrow band of noise centered on the signal's frequency (Au and Moore 1990). In non-G noise, CRs have been shown to vary by as much as 22 dB (Fig. 13.1).
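As a worked illustration of Eq. 13.1, the following minimal Python sketch computes a CR and then reuses it to predict a masked threshold at another noise level. The numerical values are hypothetical and are not data from this study. The bandwidth helper applies Fletcher's equal-power assumption, a common way to relate CRs to critical bands; it is included here as an assumption and is not taken from the text above.

```python
def critical_ratio_db(signal_threshold_db: float, noise_psd_db: float) -> float:
    """CR = L_S - L_N, with L_S in dB re 1 uPa and L_N in dB re 1 uPa^2/Hz."""
    return signal_threshold_db - noise_psd_db


def predicted_threshold_db(cr_db: float, noise_psd_db: float) -> float:
    """Rearranged Eq. 13.1: L_S = CR + L_N (only valid where the CR assumptions hold)."""
    return cr_db + noise_psd_db


def fletcher_bandwidth_hz(cr_db: float) -> float:
    """Equal-power (Fletcher) bandwidth implied by a CR: 10^(CR/10) Hz."""
    return 10.0 ** (cr_db / 10.0)


# Hypothetical example: a tone is just detectable at 125 dB re 1 uPa
# in noise with a spectral density level of 95 dB re 1 uPa^2/Hz.
cr = critical_ratio_db(125.0, 95.0)
print(f"CR = {cr:.1f} dB")
print(f"Predicted threshold in 85-dB noise = {predicted_threshold_db(cr, 85.0):.1f} dB re 1 uPa")
print(f"Equal-power bandwidth = {fletcher_bandwidth_hz(cr):.0f} Hz")
```

Because the prediction simply adds a constant to the noise level, it cannot capture the 22-dB spread across noise types described above.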

Fig. 13.1 Critical ratios (+SD) and spectrograms for seven different noise types. The bandwidth of each spectrogram is 6–14 kHz. Data are from Branstetter et al. (2013).

1.2 Comodulation Masking Release

In addition to the spectrum level of noise, the time-domain features of noise also affect auditory masking. When noise is amplitude modulated (AM) across frequency regions (i.e., comodulated), a release from masking known as the comodulation masking release (CMR) occurs. Several studies have demonstrated CMR in odontocetes using synthetic noise and natural noise sources (Branstetter and Finneran 2008; Erbe 2008; Trickey et al. 2011; Branstetter et al. 2013). Lower AM rates result in a more salient CMR (Branstetter and Finneran 2008). Amplitude modulation must be coherent across auditory filters (Branstetter et al. 2013); thus, noise bandwidths must exceed a critical band for CMR to occur. As a result, time-domain noise metrics in addition to the pressure spectral density (PSD) are needed to describe and predict auditory masking from different noise types.

1.3 Study Goals

Experiments were conducted to quantify the relationship between specific noise metrics and masked-detection thresholds. Several noise types (biological, anthropogenic, and synthesized noise) at four different spectral-density levels (85, 90, 95, and 100 dB re 1 μPa²/Hz) were used to measure masked-detection thresholds for a 10-kHz tonal signal. Statistical models were then used to identify the noise metrics related to auditory masking.

2 Participants

Three Atlantic bottlenose dolphins (Tursiops truncatus) participated. All participants had normal hearing at the frequencies tested. The study followed a protocol approved by the Institutional Animal Care and Use Committee of the Biosciences Division, Space and Naval Warfare Systems Center Pacific and all applicable US Department of Defense guidelines for the care of laboratory animals.

2.1 Behavioral Hearing Tests

Participants were trained to position on an underwater bite plate and whistle in response to a 10-kHz tone (tone trial) or remain silent if no tone was present (catch trial). A one-down one-up adaptive-staircase procedure (Levitt 1971) was used to estimate thresholds at the 50% correct level. Noise was continuously played (from the same projector as the signal) for the duration of the threshold estimation procedure. A complete description of the testing procedure can be found in Branstetter et al. (2013).
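The logic of a one-down one-up staircase can be sketched in a few lines of Python. This is a simplified illustration, not the procedure used in the study: catch trials are omitted, and the step size, starting level, number of reversals, and the simulated logistic listener are all hypothetical choices. The tone level drops after each detection and rises after each miss, so the track oscillates around the level detected 50% of the time.

```python
import math
import random
from typing import Callable


def one_down_one_up(respond: Callable[[float], bool], start_db: float,
                    step_db: float, n_reversals: int) -> float:
    """Run a 1-down/1-up staircase and return the mean level at the reversals."""
    level = start_db
    last_direction = 0  # 0 = no trial yet, -1 = stepped down, +1 = stepped up
    reversal_levels = []
    while len(reversal_levels) < n_reversals:
        # Detection -> make the next tone quieter; miss -> make it louder.
        direction = -1 if respond(level) else +1
        if last_direction != 0 and direction != last_direction:
            reversal_levels.append(level)  # the track changed direction here
        last_direction = direction
        level += direction * step_db
    return sum(reversal_levels) / len(reversal_levels)


def logistic_listener(threshold_db: float, slope_db: float,
                      rng: random.Random) -> Callable[[float], bool]:
    """Hypothetical listener whose detection probability is 0.5 at threshold_db."""
    def respond(level_db: float) -> bool:
        p_detect = 1.0 / (1.0 + math.exp(-(level_db - threshold_db) / slope_db))
        return rng.random() < p_detect
    return respond


listener = logistic_listener(threshold_db=100.0, slope_db=1.0, rng=random.Random(0))
estimate = one_down_one_up(listener, start_db=120.0, step_db=2.0, n_reversals=30)
print(f"Estimated 50% threshold: {estimate:.1f} dB re 1 uPa")
```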

Seven noise types were used as maskers (Fig. 13.1). Five of the noise types were field recordings: snapping shrimp, rain, boat, pile saw, and ice squeaks. The remaining two noise types, G and comodulated noise, were synthesized (see Branstetter and Finneran 2008). All noise types were band-pass filtered (6–14 kHz) to produce a flat spectrum. Noise level was an independent variable and varied from 80 to 100 dB re 1 μPa²/Hz in 5-dB increments. Complete details of the noise recordings and synthesis can be found in Branstetter et al. (2013).
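For a masker with a flat spectrum, the spectral density level and the total level within a band differ by 10 log10 of the bandwidth in hertz. The short Python sketch below applies that standard relation to the 6–14 kHz masker band at the five nominal levels; it assumes an ideally flat spectrum with sharp band edges, which real recordings only approximate.

```python
import math


def band_level_db(spectral_density_db: float, bandwidth_hz: float) -> float:
    """Total level in a flat-spectrum band: density level + 10*log10(bandwidth)."""
    return spectral_density_db + 10.0 * math.log10(bandwidth_hz)


MASKER_BANDWIDTH_HZ = 14_000.0 - 6_000.0  # 6-14 kHz band-pass filter

for density_db in (80, 85, 90, 95, 100):
    total_db = band_level_db(density_db, MASKER_BANDWIDTH_HZ)
    print(f"{density_db} dB re 1 uPa^2/Hz -> {total_db:.1f} dB re 1 uPa across 6-14 kHz")
```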

2.2 Noise Metrics

The noise metrics, listed in Table 13.1, fall into three categories: (1) the waveform, (2) the frequency spectrum, and (3) the envelope of the temporal waveform.

Table 13.1 Metrics and abbreviations used in the statistical models

An additional metric, the comodulation index (CI), was designed to measure the degree to which a noise sample is comodulated (i.e., amplitude modulation is correlated across frequency regions). To calculate the CI, noise is first band-pass filtered into a signal (S) band (9.5–10.5 kHz), a low-frequency (LF) band (8.5–9.5 kHz), and a high-frequency (HF) band (10.5–11.5 kHz). The bandwidth of the filters approximates the auditory filter bandwidth at these frequencies (Branstetter and Finneran 2008). The Hilbert envelope is extracted from the output of each filter and the DC component is subtracted. The magnitude-squared coherence (MSC) is then calculated between the S and LF envelopes and again between the S and HF envelopes, resulting in two 1-dimensional (1-D) arrays of MSC as a function of frequency (Fig. 13.2). The CI is the single largest MSC value found in either array, regardless of frequency.
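The processing stages described above can be sketched with NumPy and SciPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the sampling rate, the fourth-order Butterworth filters, the zero-phase filtering, and the Welch-style averaging over overlapping 100-ms segments are illustrative choices, and the sinusoidally modulated test noise is a hypothetical stand-in for a comodulated masker.

```python
import numpy as np
from scipy.signal import butter, coherence, hilbert, sosfiltfilt

FS = 48_000  # sampling rate in Hz (assumed for this sketch)


def band_envelope(noise, lo_hz, hi_hz, fs=FS, order=4):
    """Band-pass filter the noise, take its Hilbert envelope, and remove the DC component."""
    sos = butter(order, [lo_hz, hi_hz], btype="bandpass", fs=fs, output="sos")
    band = sosfiltfilt(sos, noise)      # zero-phase filtering
    envelope = np.abs(hilbert(band))    # magnitude of the analytic signal
    return envelope - envelope.mean()


def comodulation_index(noise, fs=FS, segment_s=0.1):
    """Largest magnitude-squared coherence between the S-band envelope and
    the LF-band or HF-band envelope, regardless of frequency."""
    s_env = band_envelope(noise, 9_500, 10_500, fs)
    lf_env = band_envelope(noise, 8_500, 9_500, fs)
    hf_env = band_envelope(noise, 10_500, 11_500, fs)
    nperseg = int(round(segment_s * fs))  # 100-ms analysis segments
    _, msc_lf = coherence(s_env, lf_env, fs=fs, nperseg=nperseg)
    _, msc_hf = coherence(s_env, hf_env, fs=fs, nperseg=nperseg)
    return float(max(msc_lf.max(), msc_hf.max()))


# Hypothetical test noises: 2 s of Gaussian noise, and the same noise with a
# 20-Hz amplitude modulation applied to all frequencies at once.
rng = np.random.default_rng(0)
t = np.arange(2 * FS) / FS
gaussian = rng.standard_normal(t.size)
comodulated = gaussian * (1.0 + np.sin(2.0 * np.pi * 20.0 * t))

ci_gaussian = comodulation_index(gaussian)
ci_comodulated = comodulation_index(comodulated)
print(f"CI, Gaussian noise:    {ci_gaussian:.2f}")
print(f"CI, comodulated noise: {ci_comodulated:.2f}")
```

Because the CI is a maximum over many frequency bins, short recordings with few averaged segments will bias it upward even for Gaussian noise, so the number and length of segments should be held constant when comparing noise types.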

Fig. 13.2 Processing stages to calculate the comodulation index (CI). (a) Noise is band-pass filtered into a signal (S) band, a low-frequency (LF) band, and a high-frequency (HF) band (waveforms in b, c, and d, respectively). The Hilbert envelope is extracted from each band of noise (thick lines in b, c, and d, respectively). The magnitude-squared coherence (MSC) is calculated between the Hilbert envelopes from the S and LF bands and again from the S and HF bands. (e) MSC as a function of frequency for three noise types. Each function is the average of five 100-ms segments. Each noise type has two functions because the S band is compared with both the LF and HF bands. Noise that is comodulated has a higher MSC at the lower frequencies. The CI was calculated by selecting the largest MSC for a given noise type regardless of frequency.

2.3 Statistical Models

Multiple-regression models were constructed in the statistical language R (R Development Core Team 2012). Noise metrics (Table 13.1) were modeled as explanatory variables to evaluate their relationship to the resulting masked thresholds. Models were simplified by fitting a maximal model and then removing the nonsignificant explanatory variables one at a time (stepwise deletion).
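The idea of stepwise deletion can be illustrated with a small Python analogue. This is not the authors' R code: the ordinary-least-squares fit, the 0.05 significance criterion, the rule of dropping the least significant predictor first, and the predictor names in the usage example are all assumptions made for the sketch.

```python
import numpy as np
from scipy import stats


def ols_p_values(X, y):
    """Fit y = X @ beta by least squares; return beta and two-sided t-test p-values."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = n - p
    sigma2 = residuals @ residuals / dof          # residual variance
    covariance = sigma2 * np.linalg.inv(X.T @ X)  # covariance of the estimates
    t_values = beta / np.sqrt(np.diag(covariance))
    return beta, 2.0 * stats.t.sf(np.abs(t_values), dof)


def backward_elimination(predictors, y, alpha=0.05):
    """Start from the maximal model and repeatedly drop the least significant
    predictor until every remaining predictor has p <= alpha."""
    kept = list(predictors)
    while kept:
        X = np.column_stack([np.ones(len(y))] + [predictors[name] for name in kept])
        _, p_values = ols_p_values(X, y)
        p_predictors = p_values[1:]               # skip the intercept
        worst = int(np.argmax(p_predictors))
        if p_predictors[worst] <= alpha:
            break                                 # all remaining terms are significant
        kept.pop(worst)
    return kept


# Hypothetical data: thresholds depend on "psd" and "ci" but not on "kurtosis".
rng = np.random.default_rng(1)
n = 200
metrics = {
    "psd": rng.uniform(80.0, 100.0, n),
    "ci": rng.uniform(0.0, 1.0, n),
    "kurtosis": rng.standard_normal(n),
}
thresholds = 1.1 * metrics["psd"] - 10.0 * metrics["ci"] + rng.standard_normal(n)
print(backward_elimination(metrics, thresholds))
```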

3 Results

An exponential decay function including both PSD and CI proved to be the most parsimonious, best-fit model:

$$ y={b}_1\,\mathrm{PSD}+{b}_2\,{e}^{-\mathrm{CI}/{b}_3} $$
(13.2)

where $y$ is the predicted threshold and $b_1$, $b_2$, and $b_3$ are parameter estimates. Figure 13.3 displays masked thresholds fit with the exponential decay model, in which $b_1$ = 1.13, $b_2$ = 32.84, and $b_3$ = 0.24. The data points are masked thresholds from 3 participants and 12 different noise types, collected over 6 years; the surface plot shows the model predictions. Analysis of the residual errors demonstrates that the two-parameter (PSD and CI) model produces much better fits than CR predictions while remaining simple and parsimonious.
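To make the behavior of Eq. 13.2 concrete, the sketch below evaluates the model with the reported parameter estimates. It assumes that the exponent is negative, i.e., that the CI term is b2·exp(−CI/b3), which is what an exponential decay with positive b2 and b3 requires. The PSD and CI inputs are arbitrary illustrative values, not measurements from the study.

```python
import math

# Parameter estimates reported for Eq. 13.2.
B1, B2, B3 = 1.13, 32.84, 0.24


def predicted_masked_threshold(psd_db: float, ci: float,
                               b1: float = B1, b2: float = B2, b3: float = B3) -> float:
    """y = b1*PSD + b2*exp(-CI/b3); the minus sign is an assumption (decay)."""
    return b1 * psd_db + b2 * math.exp(-ci / b3)


# Illustrative inputs: one noise level, increasing degrees of comodulation.
for ci in (0.2, 0.4, 0.6, 0.8, 1.0):
    y = predicted_masked_threshold(90.0, ci)
    print(f"PSD = 90 dB, CI = {ci:.1f} -> predicted threshold = {y:.1f} dB re 1 uPa")
```

Under this reading, the predicted threshold rises linearly with PSD and falls with CI, steeply at first and then more slowly, so most of the modeled release from masking occurs at low to moderate CI values.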

Fig. 13.3 Model fits (surface plot) and masked thresholds (data points). PSD, pressure spectral density.

4 Discussion

A simple two-parameter model including both the PSD and CI appears to explain the bulk of the masked-threshold data in this study. The relationship between thresholds and PSD is linear, whereas the relationship between thresholds and CI appears to follow a decelerating trajectory. Mitigating the effects of auditory masking depends on our ability to describe and predict masking in a wide range of conditions. Predictions based on CRs (or other spectrum-based measurements) are an important first step, but the predictions are limited to the noise type in which the CRs were estimated (i.e., G noise). Time-domain metrics related to noise must be included to improve masked-threshold predictions.