
1 Introduction

Auditory perception may be defined as the ability to detect, interpret, and attach meaning to sounds. For marine mammals, auditory perception plays a critical role in a variety of acoustically mediated behaviors, such as communication, foraging, social interactions, and avoidance of predators. Auditory perception can play an important role in detecting objects in the environment, discriminating between objects, and identifying the location of objects. Auditory perception is also a key component in auditory scene analysis—i.e., segregating a mixture of sounds from a complex natural environment into “auditory streams” produced from individual sources and attending to those streams of interest (Bregman 1990).

Although perception involves many other factors beyond merely hearing or detecting sounds, sound detection is a required element for perception. As with many other processes, sound detection may be adversely affected by the presence of noise. Because auditory perception plays a key role in so many vital tasks, noise that adversely affects sound perception could ultimately result in fitness consequences to the individual.

This chapter focuses on two of the most common manifestations of the effects of noise on sound detection: auditory masking and noise-induced threshold shifts. Masking can be described as a reduction in the ability to hear a sound caused by the presence of another sound. A noise-induced threshold shift is a reduction in auditory sensitivity following a noise exposure. Both masking and threshold shifts have the effect of reducing an animal’s auditory sensitivity over some frequency bandwidth, with the key distinction between the two that masking essentially occurs during the noise exposure, while a threshold shift persists after cessation of the noise. Because both processes are heavily influenced by the function of the peripheral auditory system, we begin with a brief overview of the anatomy and function of the ear in marine mammals, followed by individual discussions of masking and noise-induced threshold shifts. The relevant literature in each area is reviewed and synthesized to present the current understanding of these phenomena in marine mammals. Finally, some conclusions are presented and directions for future research proposed.

2 The Peripheral Auditory System in Marine Mammals

As in terrestrial mammals, the peripheral auditory system of marine mammals includes the external (outer) ear, middle ear, and inner ear. The external ear includes the pinnae (if present), the external auditory meatus (ear canal), and the tympanic membrane. The external ears of marine mammals exhibit a variety of adaptations from their terrestrial ancestors. The pinnae are absent in all cetaceans, and the external auditory meatus appears to be vestigial in most cetaceans (Ridgway 1999). The pinna is small in otariid pinnipeds, but absent in phocids, odobenids, and sirenians (Nummela 2008b). For echolocating odontocetes, high-frequency sounds are received through specialized fatty tissues in the lower jaws that offer a path to the ear (Ketten 2000; Nummela 2008b; Popov et al. 2008); these structures may therefore also be considered part of the external ear in these species. The ear of delphinoid cetaceans, unlike other species including Physeteridae, Kogiidae, and Ziphiidae, is suspended in the enlarged, air-filled peribullar space by fibrous bands with no bony connection to the skull (Ketten 2000). This suspension acoustically isolates each ear from the skull (McCormick et al. 1970).

The middle ear includes three small bones, the malleus, incus, and stapes that link the tympanic membrane to the fluid-filled cochlea of the inner ear. In odontocetes, the ossicular chain is more massive than in land mammals, but also stiffer, resulting in the middle ear apparatus being tuned to a higher frequency (Ketten 2000). In delphinoid cetaceans the malleus is not in direct contact with the tympanic membrane, but there is a large tympanic ligament that contacts the malleus. In mysticetes, the ossicles are also massive but apparently lack the stiffening elements, suggesting a lower frequency response (Ketten 2000). The middle ear ossicles are enlarged in sirenians, phocids, and odobenids; however, otariid middle ear ossicles are of similar size to terrestrial carnivores (Nummela 2008a, b).

Vibrations of the stapes are transmitted to the basilar membrane and organ of Corti located within the cochlea. The organ of Corti contains four rows of delicate mechanosensory hair cells: three rows of outer hair cells and one row of inner hair cells. Motion of the stapes causes fluid motion within the cochlea, which results in displacement of the basilar membrane, and deflection of the hair cell stereocilia. The inner hair cells generate neural impulses when their ciliary bundles are deflected, and thus provide the main neural output from the cochlea to the brain. In contrast, the outer hair cells have a motor function, and change their shape and stiffness in response to neural signals from the brain. The outer hair cells may therefore influence the mechanics of the cochlea, and form part of an active mechanical preamplifier which enhances the performance of the auditory system (de Boer and Nuttall 2010).

The mechanical properties of the basilar membrane vary along the length of the cochlea, from high stiffness near the base (where the stapes is attached), to lower stiffness at the apex. This results in a frequency-dependent vibration pattern of the basilar membrane with the basal portion responding best to high frequencies and the apical portion responding best to lower frequencies. For any specific location on the basilar membrane, there will be some frequency that produces a maximum vibration amplitude; lower frequencies will still displace the membrane (though with smaller amplitude) and higher frequencies will produce very little displacement at that location. Different populations of inner hair cells thus respond preferentially to different frequencies, depending on the physical position of the hair cell along the length of the basilar membrane. An inner hair cell is thus said to be “tuned” to a certain frequency, called the characteristic frequency, depending upon its location along the basilar membrane; hair cells near the cochlear base have higher characteristic frequencies than those located near the apex. The frequency-dependent basilar membrane motion and hair cell tuning therefore result in a frequency-to-place mapping within the cochlea. This mechanism is often referred to as the auditory filter, since, for a given nerve fiber, the cochlea performs band-pass filtering.

Hair cell tuning arises from two mechanisms: a passive component arising from the mechanical properties of the basilar membrane, and an active component that arises from outer hair cell motility. The passive component results in relatively broad tuning while the active component “sharpens” tuning by increasing the vibration amplitude over a narrow range of frequencies. As the received sound pressure level (SPL) increases, the relative contributions between the active and passive processes change, with the passive process becoming more dominant. The result is a broadening of hair cell tuning, or auditory filter width, at higher sound levels (Anderson et al. 1971; Moore and Glasberg 2003).

Compared to terrestrial species, the inner ears of marine mammals are functionally analogous, but differ in the contact with bones of the skull (fibrous suspension or bony connection), cochlear dimensions, basilar membrane length, thickness, and stiffness, hair cell densities, and innervation. This results in species-dependent parameters for the audible frequency range.

In summary, the sensation of hearing in marine mammals results from sound conducted via the head to the cochlea. In many species the conduction chain is via the external and middle ear, while in delphinoid cetaceans experimental data suggest that transmission of sound is via the fat body of the lower jaw directly to the stapes or inner ear (McCormick et al. 1970). In all species, vibration of the basilar membrane causes deflection of the inner hair cell stereocilia and the generation of neural impulses. Although there are many species-specific differences and significant peripheral auditory system adaptations from land mammals, the inner ears of marine mammals are functionally analogous to those of land mammals, with the most substantial differences concerning the frequency range of hearing. As in land mammals, the complex, frequency-specific vibration patterns of the basilar membrane, the tuning characteristics of the hair cells, and the role of the outer hair cells in active cochlear amplification have a profound impact on the perception of sound. These factors also figure prominently in the discussion of auditory masking and noise-induced threshold shifts.

3 Auditory Masking

Auditory masking occurs when one sound (usually called noise) interferes with the detection, discrimination, or recognition of another sound (usually called the signal). Although well-studied in humans, only basic auditory masking studies related to signal detection have been performed on marine mammals due to animal availability and the difficulties associated with training an animal to perform a psychophysical hearing test. Of the few masking experiments performed on marine mammals, most are of the type where the animal is required to detect a tonal signal in the presence of another tone or broadband Gaussian noise. The results of these experiments can usually be explained within the framework of the power spectrum model of masking (described in detail below) and represent an important first step in understanding auditory masking in these animals. More recent experiments using complex and realistic sounds (both signal and noise) suggest that descriptions of auditory masking in marine mammals, like in humans, cannot be reduced to metrics exclusively related to frequency and SPL. At the very minimum, the temporal patterns of sounds, as well as the location of the sounds relative to each other, also play important roles in describing how two or more sounds are segregated in a complex auditory scene (for similar phenomena in anurans and birds, see Chaps. 6 and 8).

3.1 Signal Detection in Noise

3.1.1 Tone-on-Tone Masking

A bottlenose dolphin’s (Tursiops truncatus) ability to detect a tonal signal (the “probe”) in the presence of another tonal signal (the “masker”) was first investigated by Johnson (1971). In this experiment, behavioral thresholds for a 70 kHz probe tone were estimated in the presence of a masking tone where the frequency and SPL of the masker were independent variables. The masking pattern was similar to what is found in humans in that (1) more masking occurred when the probe and masker frequencies were similar, (2) lower masker frequencies had a greater masking effect than higher masker frequencies, and (3) higher SPL maskers masked a broader range of frequencies than lower SPL maskers. As with humans, when the masker and probe frequencies were very similar, detection thresholds actually decreased rather than increased (Fig. 10.1). In humans, this threshold decrease was associated with the perception of “beats.” Presumably, when both the probe and masker tones fall within a single auditory filter, listeners no longer perceive two tones, but instead a single amplitude-modulated tone with a modulation rate equal to the frequency difference between the tones. The dolphin, like humans, might have also perceived beats and used this cue for signal detection.

Fig. 10.1 Two-tone masking [adapted from Johnson (1971)]. The vertical line indicates the frequency of the probe tone. Symbols indicate the threshold of the probe tone in the presence of the masker tone at various frequencies and SPLs. The 80 dB re 1 μPa masker was repeated with different results; this difference apparently reflected learning by the dolphin
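The beat cue described above can be illustrated numerically. The following is a minimal sketch, assuming two equal-amplitude tones; the 69.9 kHz masker frequency and the function names are hypothetical choices for illustration and are not taken from Johnson (1971). It uses the identity cos A + cos B = 2 cos((A − B)/2) cos((A + B)/2), so the sum of the two tones never exceeds a slowly varying envelope that repeats at the difference frequency.

```python
import math


def two_tone(t, f1_hz, f2_hz):
    """Sum of two equal-amplitude tones at time t (seconds)."""
    return math.cos(2 * math.pi * f1_hz * t) + math.cos(2 * math.pi * f2_hz * t)


def beat_envelope(t, f1_hz, f2_hz):
    """Slow envelope of the two-tone sum; it repeats |f1 - f2| times per second."""
    return 2.0 * abs(math.cos(math.pi * (f1_hz - f2_hz) * t))


def beat_rate_hz(f1_hz, f2_hz):
    """Modulation (beat) rate, equal to the frequency difference of the tones."""
    return abs(f1_hz - f2_hz)


# 70 kHz probe (from the text) and a hypothetical 69.9 kHz masker
probe_hz, masker_hz = 70_000.0, 69_900.0
print(beat_rate_hz(probe_hz, masker_hz), "Hz beat rate")
```

With these assumed values, the two tones would be heard as a single tone near 70 kHz whose amplitude fluctuates 100 times per second.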

Neurophysiological techniques have also been used to measure frequency tuning curves in a number of odontocetes using the tone-on-tone masking paradigm (Popov et al. 1996; Supin and Popov 1986). In these studies, the addition of a tonal masker was found to suppress the evoked response to a tonal probe much in the same way that tonal maskers affect the detectability of tones in psychophysical experiments. For short duration tone-pip stimuli, masker frequencies below the tone-pip frequency produced a tuning curve with an average slope of 52 dB/octave. For masker frequencies higher than the tone-pip frequency, the average slope of the tuning curve was 96 dB/octave, almost twice as steep as that of lower frequency maskers. A common feature of the above studies is that lower frequency maskers appear to have a greater masking effect on higher frequency tones than vice versa. This result is directly related to basilar membrane mechanics discussed earlier. When the basilar membrane is excited by two or more tones of different frequencies, the traveling wave of the lower frequency tone will propagate through the higher frequency regions thus causing a greater masking effect on the higher frequency even when the frequency separation is relatively large.

3.1.2 Critical Bands and Critical Ratios

Fletcher (1940) conducted a series of seminal experiments with human listeners that have been repeated with several animal species including a few odontocetes and pinnipeds. Using a band-widening paradigm, Fletcher discovered that thresholds for a tonal signal centered in band-limited Gaussian noise increased proportionally with the bandwidth of noise, but only up to a certain “critical bandwidth.” Noise bandwidths beyond this critical bandwidth no longer contributed to the masking of the signal. To account for this result, Fletcher envisioned the auditory system behaving as a series of continuously overlapping band-pass filters, where masking only occurred if the signal and the masker were within a common auditory filter or critical bandwidth (CB). Because of this relationship, the bandwidth of a hypothetical auditory filter can be estimated by simply measuring tonal thresholds in broadband noise, since only the noise within an auditory filter centered on the signal will effectively mask the signal. If the power spectral density of the noise, N, and the power of the signal at threshold, S_th, are known, the CB is given by

$$ \Delta F_{\text{CB}} = \frac{S_{\text{th}}}{K \cdot N}, $$
(10.1)

where ΔF_CB is the CB and K is a constant. If K is assumed to equal 1, the equation simplifies to

$$ \Delta F_{\text{CR}} = \frac{S_{\text{th}}}{N}, $$
(10.2)

where ΔF_CR is called the critical ratio (CR). The CR expressed as a frequency level, in dB re 1 Hz, is calculated by subtracting the noise pressure spectral density level (L_N, in dB re 1 μPa²/Hz) from the signal SPL at threshold (L_S, in dB re 1 μPa):

$$ L_{\text{CR}} = L_{\text{S}} - L_{\text{N}} . $$
(10.3)

For example, a CR of 20 dB re 1 Hz (equivalent to 100 Hz) states that the signal must be 20 dB greater than the noise spectral density level of the masker to be detected. This simple metric is most commonly used to predict masking effects of noise found in a marine mammal’s environment (e.g., anthropogenic noise, see Chap. 14). Compared to the band-widening technique used to estimate CBs, CRs require only a fraction of the time and effort with respect to data collection. As a result, CRs have become a standard first step toward understanding auditory masking in many marine mammal species.
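The arithmetic behind Eqs. 10.1 to 10.3 can be sketched in a few lines. This is an illustrative example only; the function names and the 80 dB noise level are assumptions and do not come from any cited study.

```python
import math


def critical_band_hz(signal_power, noise_psd, k=1.0):
    """Eq. 10.1: bandwidth from threshold signal power and noise power
    spectral density (linear units). With k = 1 this is Eq. 10.2."""
    return signal_power / (k * noise_psd)


def critical_ratio_db(signal_level_db, noise_psd_level_db):
    """Eq. 10.3: L_CR = L_S - L_N, in dB re 1 Hz."""
    return signal_level_db - noise_psd_level_db


def cr_db_to_hz(cr_db):
    """Convert a critical ratio in dB re 1 Hz to an equivalent bandwidth in Hz."""
    return 10.0 ** (cr_db / 10.0)


def masked_threshold_db(noise_psd_level_db, cr_db):
    """Predicted tonal threshold in Gaussian-like noise: L_S = L_N + L_CR."""
    return noise_psd_level_db + cr_db


# A CR of 20 dB re 1 Hz in noise at a hypothetical 80 dB re 1 uPa^2/Hz
print(cr_db_to_hz(20.0))                # equivalent bandwidth in Hz
print(masked_threshold_db(80.0, 20.0))  # predicted threshold in dB re 1 uPa
```

Such a prediction applies only to the extent that the masker resembles the broadband Gaussian noise used to measure the CR.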

Critical ratios for several odontocete cetaceans demonstrate a similar pattern of masking in which more masking occurs at high frequencies, presumably because of the increasing bandwidth of auditory filters at higher frequencies (Fig. 10.2). CRs appear flat for signal frequencies of 1 kHz and below. Critical ratios for pinnipeds also demonstrate an increase as a function of signal frequency for both underwater and airborne sounds (Fig. 10.3). CRs and CBs for both odontocetes and pinnipeds suggest that auditory filter bandwidths increase as a function of the center frequency of the filter.

Fig. 10.2 Critical ratios measured in different odontocete species

Fig. 10.3 Critical ratios from different pinniped species

3.1.3 The Power Spectrum Model of Masking and the Auditory Filter

Fletcher’s (1940) original concept of an auditory filter bank developed into what is now referred to as the power spectrum model (PSM) of auditory masking (Patterson and Moore 1986). The model makes the following assumptions:

  (1) The auditory system can be modeled as a series of continuously overlapping band-pass filters.

  (2) Only the spectral components of a noise masker that are within a filter centered on the signal frequency will effectively mask the signal.

  (3) Signal detection is accomplished by monitoring an energy detector at the output of the filter centered on the signal. More energy will be present in a signal-plus-noise interval than a noise-alone interval.

  (4) Signal thresholds are proportional to the noise power that passes through a single auditory filter. Noise is represented by its long-term spectrum.

Formally, the PSM can be expressed as:

$$ P_{s} = K\int\limits_{ - \infty }^{\infty } {N(f)W(f){\text{d}}f,} $$
(10.4)

where P_s is the power of the signal at threshold, N(f) is the noise power spectral density, and W(f) is a weighting function described by the shape of the auditory filter. Auditory filter shapes have been derived for bottlenose dolphins (Finneran et al. 2002a; Lemonds 1999) and a beluga (Delphinapterus leucas, Finneran et al. 2002a) using a behavioral response, notched-noise masking paradigm (Patterson 1976). An assumption is made that the auditory filter shape can be estimated by a simple rounded-exponential (roex) function with a limited number of free parameters. In both Finneran et al. (2002a) and Lemonds (1999), a two-parameter roex(p,r) function was used:

$$ W(g) = \left( {1 - r} \right)\left( {1 + pg} \right)\,e^{ - pg} + r $$
(10.5)

where g is the normalized frequency deviation [g = |f − f_o|/f_o, where f is frequency and f_o is the signal frequency], and p and r are adjustable parameters. Common features of the auditory filters are that bandwidths increase with both increased noise level and increased center frequency. The relationship between bandwidth and center frequency of the filter can be described by the quality factor, Q:

$$ Q = f_{o} / \Delta f, $$
(10.6)

where f_o is the frequency of the signal and Δf is the filter bandwidth. For many mammals, the entire auditory periphery can be reasonably approximated using the same value for Q (constant-Q filters). Auditory filter Q values tend to vary depending on the methodology used to estimate thresholds. For example, Q values of 2.2 and 12.3 were estimated for a bottlenose dolphin using CB and CR techniques, respectively (Au and Moore 1990).
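Equations 10.4 to 10.6 can be tied together with a short numerical sketch. The example below assumes a filter that is symmetric about the signal frequency, a flat (white) noise spectrum, and integration limited to g ≤ 0.8 on each side; the parameter values (p = 25, r = 0, a 20 kHz signal, an 80 dB noise level) are hypothetical and are not fitted values from the cited studies.

```python
import math


def roex(g, p, r=0.0):
    """Eq. 10.5: roex(p, r) filter weight at normalized frequency deviation g."""
    return (1.0 - r) * (1.0 + p * g) * math.exp(-p * g) + r


def equivalent_rectangular_bandwidth_hz(f0_hz, p, r=0.0, g_max=0.8, steps=4000):
    """Area under a symmetric roex filter (trapezoidal rule), expressed in Hz.
    The integration limit g_max is an assumption of this sketch."""
    h = g_max / steps
    area = 0.5 * (roex(0.0, p, r) + roex(g_max, p, r))
    area += sum(roex(i * h, p, r) for i in range(1, steps))
    return 2.0 * area * h * f0_hz  # factor 2: lower and upper filter skirts


def psm_threshold_db(noise_psd_level_db, f0_hz, p, r=0.0, k=1.0):
    """Eq. 10.4 for flat noise: P_s = K * N0 * (filter area), returned in dB."""
    erb = equivalent_rectangular_bandwidth_hz(f0_hz, p, r)
    return 10.0 * math.log10(k) + noise_psd_level_db + 10.0 * math.log10(erb)


def quality_factor(f0_hz, bandwidth_hz):
    """Eq. 10.6: Q = f_o / delta_f."""
    return f0_hz / bandwidth_hz


f0 = 20_000.0  # hypothetical signal frequency, Hz
erb = equivalent_rectangular_bandwidth_hz(f0, p=25.0)
print(round(erb), "Hz bandwidth, Q =", round(quality_factor(f0, erb), 2))
print(round(psm_threshold_db(80.0, f0, p=25.0), 2), "dB predicted threshold")
```

For r = 0 and a large p, the filter area approaches the closed form 4 f_o / p, which offers a quick check on the numerical integration.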

High Q values reflect narrow filter bandwidths which result in enhanced frequency resolution, with the trade-off of compromised temporal resolution. Auditory filter banks for bottlenose dolphins and belugas have properties where frequency resolution is best at lower frequencies while temporal resolution is better at higher frequencies (Fig. 10.4). This may not be the case for smaller porpoises. Tuning curves derived from electrophysiological measurements suggest at least two species of porpoises (Phocoena phocoena and Neophocaena phocaenoides asiaeorientalis) have auditory filter banks with relatively constant bandwidths across frequencies (Popov et al. 2006). Such a filter bank may allow for enhanced frequency resolution at the cost of compromised temporal resolution. A recent re-evaluation of critical ratio data suggests that the auditory filter bank of the bottlenose dolphin might be better modeled as a constant-Q filter bank for frequencies below 40 kHz and a constant-bandwidth filter bank for frequencies above 40 kHz (Lemonds et al. 2011).
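The hybrid filter bank suggested for the bottlenose dolphin can be sketched as a simple piecewise rule. In the example below, the Q value of 10 and the assumption that the bandwidth above the crossover equals the bandwidth at the crossover are illustrative choices only; they are not parameters reported by Lemonds et al. (2011).

```python
def filter_bandwidth_hz(center_hz, q=10.0, crossover_hz=40_000.0):
    """Bandwidth of a filter bank that is constant-Q below the crossover
    frequency and constant-bandwidth at and above it (hypothetical parameters)."""
    if center_hz <= 0:
        raise ValueError("center frequency must be positive")
    if center_hz < crossover_hz:
        return center_hz / q      # constant Q: bandwidth grows with frequency
    return crossover_hz / q       # constant bandwidth above the crossover


for f_hz in (5_000.0, 20_000.0, 40_000.0, 80_000.0, 120_000.0):
    print(f_hz, filter_bandwidth_hz(f_hz))
```

Under these assumptions, the filters widen in proportion to frequency up to 40 kHz and then stay fixed, so a pure constant-Q model would predict wider filters at the highest frequencies than this hybrid does.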

Fig. 10.4 Roex auditory filter banks for (a) Tursiops truncatus, (b) Delphinapterus leucas, and (c) Phocoena phocoena

Modeling the auditory periphery proves useful not only for describing auditory masking; auditory filter banks can also be used to model other hearing phenomena such as discrimination and recognition abilities during passive hearing and echolocation (Au et al. 2009; Branstetter et al. 2007; Roitblat et al. 1993). Figure 10.4 displays roex(p,r) auditory filter banks constructed for three odontocete species: bottlenose dolphins (Lemonds 1999), belugas (Finneran et al. 2002a), and harbor porpoises (Popov et al. 2006). Filter bandwidths for these three species predict that critical ratios at higher frequencies should be highest for the dolphin and lowest for the harbor porpoise, which is consistent with the empirical findings for odontocetes in Fig. 10.2.

3.2 Masking with Complex Stimuli

3.2.1 Comodulation Masking Release

The use of simple but well-defined stimuli in masking experiments has proven useful in elucidating the underlying mechanisms of the auditory system. For example, the power spectrum model of masking, which is based almost exclusively on experiments using pure tones and Gaussian noise stimuli, can adequately describe most of the masking results discussed thus far in this chapter. This is not surprising since most of these experiments were conducted using pure tones and Gaussian noise. However, sounds marine mammals encounter in their natural environment are likely to be more complex than pure tones and Gaussian noise. Models derived from simple stimuli may be limited in their ability to generalize to environmental noise. For example, one of the primary assumptions of the PSM is that only noise within a CB centered on a signal contributes to the masking of that signal. However, if the noise is coherently amplitude modulated (comodulated noise) across frequency regions, a release from masking relative to a Gaussian masker of the same pressure spectral density occurs for noise bandwidths greater than a CB; i.e., more total noise power results in less masking. This phenomenon is known as comodulation masking release (CMR) and has been demonstrated in anurans (Chap. 6), birds (Chap. 8), and several mammalian species (Bee et al. 2007; Nelken et al. 2001; Pressnitzer et al. 2001), including humans (Hall et al. 1990) and the bottlenose dolphin (Branstetter and Finneran 2008). (For a discussion of potential CMR in insects see Chap. 3). Figure 10.5 displays masked threshold patterns for both Gaussian and comodulated noise within a standard band-widening paradigm (Fletcher 1940). Consistent with the PSM, thresholds for Gaussian noise increase up to a specific bandwidth (the CB) and then asymptote because noise at frequencies beyond the CB no longer contributes to the masking of the signal. A similar pattern emerges for comodulated noise for masker bandwidths less than the CB. 
However, there is a monotonic decrease in thresholds for masker bandwidths greater than the CB. The release from masking is substantial (17 dB at the largest bandwidth) and is beyond the capability of the PSM to explain. Although several explanations for CMR have been proposed, numerous studies suggest that the auditory system compares temporal envelopes between an auditory filter centered on the signal and flanking auditory filters (Hall et al. 1984; McFadden 1988). The addition of a tonal signal to comodulated noise decreases the modulation depth in the signal channel, thus reducing the envelope correlation between the signal and flanking bands. The presence or absence of a tonal signal can be determined by comparing envelope correlation across frequency channels (Hall et al. 1984).
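The envelope-comparison account of CMR can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not a model from the cited studies: each noise band is treated as a carrier multiplied by the same slowly varying Gaussian modulator, so its envelope is the magnitude of that modulator, and a tone at the carrier frequency of the signal band adds a constant to the complex envelope of that band. The smoothing constant, tone amplitude, and function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slow_modulator(n_samples, smoothing=0.95, seed=0):
    """Low-pass filtered Gaussian noise, scaled to unit RMS, shared by all bands."""
    rng = random.Random(seed)
    m, out = 0.0, []
    for _ in range(n_samples):
        m = smoothing * m + (1.0 - smoothing) * rng.gauss(0.0, 1.0)
        out.append(m)
    rms = math.sqrt(sum(v * v for v in out) / n_samples)
    return [v / rms for v in out]


def band_envelope(modulator, tone_amplitude=0.0, tone_phase=0.0):
    """Envelope of a comodulated band, optionally with a tone at the carrier."""
    tone = complex(tone_amplitude * math.cos(tone_phase),
                   tone_amplitude * math.sin(tone_phase))
    return [abs(m + tone) for m in modulator]


modulator = slow_modulator(20_000)
flanking_band = band_envelope(modulator)
r_noise = pearson(flanking_band, band_envelope(modulator))
r_tone = pearson(flanking_band, band_envelope(modulator, tone_amplitude=3.0))
print("envelope correlation, noise alone:", round(r_noise, 3))
print("envelope correlation, tone added: ", round(r_tone, 3))
```

In this sketch the envelopes are perfectly correlated when only comodulated noise is present, and the correlation falls sharply once the tone is added to the signal band, which is the cue the across-channel comparison is thought to exploit.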

Fig. 10.5 (a) Masking patterns for Gaussian and comodulated noise (adapted from Branstetter and Finneran 2008) and (b) critical ratios from three different noise types (data calculated from Trickey et al. 2011)

The extent to which ocean noise is comodulated has not been fully investigated; however, at least two studies suggest CMR may play a role in auditory masking for environmental noise that marine mammals encounter. Erbe (2008) estimated detection thresholds for a beluga using pure tones and beluga vocalization signals with Gaussian, ice-cracking, underwater bubble generator, and propeller noise types. Thresholds for ice-cracking noise, which is comodulated, were at least 6 dB lower than those for the other, uncomodulated noise types (Erbe 2008). A similar release from masking was found for bottlenose dolphins detecting a 10 kHz pure tone in snapping shrimp noise (Trickey et al. 2011), which is also comodulated. CRs measured in Gaussian noise overestimated masked thresholds in snapping shrimp noise, primarily because CRs assume that only noise within a single auditory filter contributes to masking.

Additional studies, initially using realistic signals and maskers and then using controlled stimuli, are needed to determine not only the masking patterns for realistic sounds, but also the mechanisms that govern these masking patterns. If environmental noise is similar to Gaussian noise, the PSM can provide accurate predictions. However, if natural noise is not Gaussian, additional mechanisms yet unknown will need to be determined before accurate predictions can be made.

3.2.2 Spatially Separated Sound Sources

In realistic acoustic environments with multiple sound sources, detecting a biologically relevant signal in noise depends not only on the physical attributes of the signal and noise, but also on the location of the signal and noise relative to each other and to the listener’s position and orientation. In humans, where research on this topic is more extensive, the relative position of sound sources can act as one of the most salient cues in segregating multiple sounds in a complex auditory scene (Bregman 1990), and can lead to a spatial release from masking (SRM). Many types of ocean noise (e.g., vessel noise, industrial sites) are emitted from directional sources that can be well off-axis from a biologically relevant signal. In such situations, masking predictions based only on the CR may overestimate the amount of actual masking.

Au and Moore (1984) measured hearing thresholds for pure tones emitted from an on-axis transducer while Gaussian noise was emitted from a second transducer that varied in position in both the horizontal and vertical planes. Although the authors intended to measure the dolphin’s receiving beam pattern, their data are also an example of a spatial release from masking. Figure 10.6 displays threshold values relative to when the noise source was directly in front of the animal (i.e., the position where most masking occurs).

Fig. 10.6 Spatial release from masking (i.e., receiving beam patterns) for the bottlenose dolphin (adapted from Au and Moore 1984)

Levels at off-axis positions represent the amount of SRM. Off-axis noise positions produced less masking and the effect was stronger at higher frequencies. Au and Moore (1984) were interested in the receiving beam pattern for processing echolocation signals, and as a result, only tested frequencies of 30 kHz and above and only at angles in front of the animal. Lower frequencies associated with communication were not tested, although if the trend that lower frequencies exhibit less SRM holds true, communication signals will likely be more susceptible to masking than sonar signals. Furthermore, noise locations behind the animal will likely result in an even larger SRM. Additional studies using lower frequencies are therefore warranted.

SRM for airborne sounds has been studied with a harbor seal (Phoca vitulina) and California sea lion (Zalophus californianus) using an approach similar to that of Au and Moore (1984), except that the noise transducer’s position was held constant at the on-axis position and the position of the signal transducer varied in the horizontal plane (Fig. 10.7, Holt and Schusterman 2007). Because detection thresholds will vary as a function of position even without masking noise, Holt and Schusterman (2007) used a metric called the masking level difference (MLD) to account for unmasked threshold differences:

$$ \text{MLD} = \left( M_{q} - M_{0} \right) - \left( U_{q} - U_{0} \right), $$
(10.7)

where U_0 and U_q are the unmasked thresholds at 0° and q°, respectively, and M_0 and M_q are the masked thresholds at 0° and q°, respectively. Overall, the results suggest that signals are better detected when they are separated in spatial location from the noise, although the relationships between threshold, frequency, and noise angular position were inconsistent across these two species. The difference in MLD patterns may be related to differences in external ear (i.e., pinnae) morphology between these species or to individual differences between the subjects.

Fig. 10.7 Masking level differences for the harbor seal (Phoca vitulina) and the California sea lion (Zalophus californianus)
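As a worked illustration of Eq. 10.7, the following sketch computes an MLD from four thresholds. The threshold values are hypothetical and are not data from Holt and Schusterman (2007); the function name is likewise an assumption.

```python
def masking_level_difference(masked_q, masked_0, unmasked_q, unmasked_0):
    """Eq. 10.7: MLD = (M_q - M_0) - (U_q - U_0), all thresholds in dB."""
    return (masked_q - masked_0) - (unmasked_q - unmasked_0)


# Hypothetical thresholds (dB) with the signal moved from 0 degrees to q degrees
mld = masking_level_difference(masked_q=50.0, masked_0=60.0,
                               unmasked_q=22.0, unmasked_0=20.0)
print(mld)  # -12.0 with these made-up values
```

With the sign convention of Eq. 10.7 as written, a negative value indicates that the masked threshold fell with spatial separation by more than can be explained by the change in unmasked sensitivity alone.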

3.3 Echolocation

Of all the marine mammals, only odontocete cetaceans have conclusively demonstrated the ability to echolocate. Although their detection, discrimination, and recognition abilities have been well-studied, very little research has been conducted on their ability to echolocate in the presence of noise. What is known is that odontocetes appear to have the capability to modify their echolocation signal to compensate for noise levels. This was demonstrated when echolocation discrimination tasks were conducted in both San Diego Bay, California and Kaneohe Bay, Hawaii with the same beluga (Au et al. 1988). The ambient noise in both locations is dominated by snapping shrimp, although the noise spectral density levels in Kaneohe Bay were typically 15–20 dB greater than those of San Diego Bay. Beluga clicks recorded in San Diego Bay typically had peak–peak (p–p) source levels between 201 and 202 dB re 1 μPa, with peak frequencies typically between 40 and 60 kHz. However, in Kaneohe Bay, which possessed higher ambient noise levels, the beluga clicks had p–p source levels between 210 and 214 dB re 1 μPa, with peak frequencies between 100 and 120 kHz. Apparently, the animal increased the level and peak frequency of its emitted signal to compensate for the increased ambient noise in Kaneohe Bay. It is unclear, however, if the animal intentionally shifted the peak frequency of its signals to the higher end of the spectrum to avoid low-frequency masking. Odontocete echolocation signals show a strong positive correlation between amplitude and peak frequency (Au 1980), suggesting the frequency shift may have simply been a by-product of increasing the source level (see Chap. 7 for a similar discussion for bird songs).

3.4 Consequences of Auditory Masking

The most obvious consequence of auditory masking is a reduction in the distance at which an animal could detect a sound of interest. Because sound absorption is frequency-dependent, with low frequencies traveling farther than higher frequencies, low-frequency noise has the potential to affect marine mammals at larger distances compared to higher frequency noise. Consequently, the communication ranges of mysticetes that rely on very low-frequency sounds have likely been reduced (compared to preindustrial ranges), thus compromising the biological functions of these signals (Clark et al. 2009). Communication ranges of other marine mammals (e.g., odontocetes and pinnipeds) that utilize higher frequency sounds may be affected by auditory masking by higher frequency noise sources such as small boat engines and marine construction. For specific scenarios involving Gaussian-like noise sources, knowledge or estimates of the hearing threshold and CR for a species, along with the signal and noise properties, can be used to estimate the resulting detection range (e.g., Clark et al. 2009; Janik 2000). For more complex noise sources that may be comodulated, simple estimates based on Gaussian noise and the PSM will tend to over-estimate the masking effects of noise and under-estimate the range at which a particular signal can be detected.
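The detection-range estimate described above can be sketched as follows. The example assumes spherical spreading plus a linear absorption term, takes the detection threshold as the larger of the quiet hearing threshold and the CR-based masked threshold, and uses entirely hypothetical levels; it is not the method of Clark et al. (2009) or Janik (2000).

```python
import math


def received_level_db(source_level_db, range_m, alpha_db_per_km=0.0):
    """Source level minus spherical spreading (20 log10 r) and absorption loss."""
    return (source_level_db - 20.0 * math.log10(range_m)
            - alpha_db_per_km * range_m / 1000.0)


def detection_threshold_db(hearing_threshold_db, noise_psd_level_db, cr_db):
    """Larger of the quiet threshold and the masked threshold (L_N + L_CR)."""
    return max(hearing_threshold_db, noise_psd_level_db + cr_db)


def detection_range_m(source_level_db, hearing_threshold_db, noise_psd_level_db,
                      cr_db, alpha_db_per_km=0.0, max_range_m=1.0e6):
    """Largest range (m) at which the received level still reaches threshold."""
    threshold = detection_threshold_db(hearing_threshold_db,
                                       noise_psd_level_db, cr_db)
    if received_level_db(source_level_db, 1.0, alpha_db_per_km) < threshold:
        return 0.0
    if received_level_db(source_level_db, max_range_m, alpha_db_per_km) >= threshold:
        return max_range_m
    lo, hi = 1.0, max_range_m
    for _ in range(100):  # bisection on a logarithmic range axis
        mid = math.sqrt(lo * hi)
        if received_level_db(source_level_db, mid, alpha_db_per_km) >= threshold:
            lo = mid
        else:
            hi = mid
    return lo


quiet = detection_range_m(160.0, 60.0, 70.0, 30.0)
noisy = detection_range_m(160.0, 60.0, 80.0, 30.0)  # noise raised by 10 dB
print(round(quiet), "m versus", round(noisy), "m")
```

Under these assumptions, a 10 dB increase in noise spectral density level shrinks the detection range by a factor of about 3.2, which is the kind of range reduction the CR-based approach is used to estimate for Gaussian-like noise.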

Simple models for masking and animal communication range also typically do not include the compensatory mechanisms that animals use to communicate in suboptimal environments. For example, when humans communicate in noisy environments, we often increase speech amplitude, move closer together, read lips, turn our backs toward a noisy sound source, or simply leave the noisy area. Marine mammals appear to employ similar strategies, but little is known about their effectiveness or cost. If an animal is able to leave or avoid an area of potential masking, there may be associated metabolic costs that are yet to be determined. In many circumstances, leaving a zone of auditory masking may not be an option (e.g., pervasive low-frequency shipping noise). Some areas, such as feeding and breeding grounds, may be too important to leave. In these cases, an animal may attempt to compensate for the noise by increasing its signal amplitude while communicating (Holt et al. 2008; Parks et al. 2011), shifting signal frequencies (McDonald et al. 2009), or increasing its repetition rate or duration (Miller et al. 2000). Again, compensation may come with a cost, and its effectiveness is unknown. In other cases, consequences may be unavoidable and may include a decreased ability to maintain group cohesion, decreased ability to detect predators and prey, and decreased foraging and breeding success.

Detection of a sound only implies that the sound registered in the listener’s auditory system. If an animal can detect a signal but is unable to recognize or make sense of the information (e.g., humans detecting speech but not understanding it because of noise), the signal’s utility will be lost. The harmonic structure of odontocete whistles has a direction-dependent pattern (Branstetter et al. 2012) that has been hypothesized to convey information on location and direction of travel of the signaler (Lammers and Au 2003; Miller 2002). If odontocetes use the whistle harmonic structure to monitor the direction of travel of group members, masking may reduce the animal’s ability to maintain group cohesion when separated at larger distances. The potential effect would be to limit the distance between group members, and thus reduce the area covered during cooperative behaviors such as foraging.

4 Noise-Induced Threshold Shifts

Most adults living in industrialized countries have experienced a loss of hearing sensitivity, and eventual recovery, after exposure to high intensity sound at concerts, while operating firearms, or in the presence of industrial machinery or power tools. This phenomenon is called a noise-induced threshold shift (NITS), and is characterized as an increase in auditory threshold (loss of sensitivity) over some frequency range, that persists after the cessation of a noise exposure. The magnitude of a NITS generally decreases with increasing time after the noise exposure. If the hearing threshold returns to normal after some period of time, the NITS is called a temporary threshold shift (TTS). If, however, thresholds remain elevated after some extended period of time (typically 30 days), then the remaining amount of NITS is called a permanent threshold shift (PTS). The term compound threshold shift (CTS) is used to describe an initial NITS that only partially recovers, leaving some residual PTS; i.e., a CTS represents some combination of TTS and PTS (Ward 1997). Figure 10.8 illustrates the relationships between TTS, PTS, and CTS.

Fig. 10.8 Distinctions between TTS, PTS, and CTS

A NITS may result from a variety of mechanical and biochemical processes, including physical damage or distortion of the tympanic membrane and cochlear hair cell stereocilia, hair cell death resulting from oxidative stress, changes in cochlear blood flow, and swelling of cochlear nerve terminals from glutamate excitotoxicity (Henderson et al. 2006; Kujawa and Liberman 2009). Although the outer hair cells are the most prominent target for noise effects, severe noise exposures may also result in inner hair cell death and loss of auditory nerve fibers (Henderson et al. 2006). Recent studies in mice have also revealed that a TTS near the limits of reversibility, e.g., a 40 dB maximum TTS, measured 24 h after exposure via auditory brainstem response and compound action potential, may result in acute loss of afferent nerve terminals, delayed cochlear nerve degeneration, and permanently attenuated suprathreshold neural responses, despite complete recovery of auditory thresholds (Kujawa and Liberman 2009). These data suggest that there may be progressive consequences to noise exposure not revealed by conventional threshold testing.

A great deal of work has been done to characterize TTS and PTS in humans and other terrestrial mammals (reviewed by Clark 1991; Henderson and Hamernik 1986; Kryter 1973; Melnick 1991; Miller 1974; Quaranta et al. 1998; Ward 1997; see Chaps. 4 and 8 for reviews of TTS and PTS in fish and birds, respectively). The primary emphasis of these efforts has been to predict and mitigate human occupational hearing loss; thus, the exposure conditions studied have been those most often encountered in industrial or military settings: multi-hour exposure to broadband noise and exposure to impulse and impact noise. A goal of early human work was to relate the amount of TTS experienced at the end of an 8 h work day to the amount of PTS that would be experienced after many years of comparable daily exposures (e.g., Nixon and Glorig 1961). Although these efforts were not completely successful, and no clear predictive relationship has been found between TTS and PTS, much has been learned about the relationships between threshold shifts and exposure parameters such as SPL, duration, frequency, and duty cycle. It is also clear that larger exposures are necessary to produce PTS compared to TTS; thus, knowledge of TTS-inducing exposure levels can be used to mitigate the occurrence of PTS. For example, terrestrial mammal data have shown that a NITS less than 40 dB, measured 2–4 min after exposure, is not likely to result in PTS (e.g., Kryter et al. 1966).

TTS and PTS data from humans and terrestrial mammal models have been used to define safe limits for occupational noise exposure. For steady-state (i.e., nonimpulsive) noise exposures, current US regulations prescribe a maximum permissible exposure SPL of 90 dBA for an 8-h period; for each halving of exposure time, the permissible SPL increases by 5 dB, which is called a 5-dB exchange rate (29CFR1910.95 2009). The maximum permissible exposure to impulsive or impact noise is 140 dB re 20 μPa peak SPL (29CFR1910.95 2009).
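
The exchange-rate arithmetic above can be written out as a short worked example. The sketch below assumes the 90 dBA, 8-h criterion and the 5-dB exchange rate stated in the text; the function name and its parameters are illustrative and are not taken from the regulation.

```python
def permissible_hours(level_dba, criterion_dba=90.0, criterion_hours=8.0, exchange_db=5.0):
    """Permissible exposure time (h) when each `exchange_db` increase in level
    halves the allowed exposure time."""
    return criterion_hours / 2.0 ** ((level_dba - criterion_dba) / exchange_db)


print(permissible_hours(90.0))                   # criterion level: 8 h
print(permissible_hours(95.0))                   # one 5-dB step above: 4 h
print(permissible_hours(93.0, exchange_db=3.0))  # 3-dB exchange rate, for comparison: 4 h
```

The last call uses a 3-dB exchange rate purely for comparison; it shows that the exchange rate, not only the criterion level, determines how quickly the permissible duration shrinks as the level rises.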

Despite the wealth of knowledge accumulated via human and terrestrial mammal studies, the applicability of these data to marine mammals is limited. There are significant differences between the peripheral auditory systems of marine and terrestrial mammals and the sound transduction mechanisms in air and water, thus direct extrapolation of human noise exposure criteria to marine mammals is not practical. Also, the types of noise exposures most relevant for people (e.g., 8-h exposure to broadband noise) may not be relevant to marine mammals exposed to shorter duration, intermittent sources such as military sonars, pile driving, and seismic airguns. For these reasons, a number of TTS measurements have been conducted with marine mammals to determine noise exposure conditions necessary for TTS, and to predict those capable of causing PTS, in these animals.

4.1 Measuring NITS in Marine Mammals

Studies of NITS in marine mammals have focused on measuring TTS after exposure to relatively long duration, broadband noise (Kastak et al. 1999, 2005, 2007; Kastak and Schusterman 1996; Kastelein et al. 2011; Mooney et al. 2009a; Nachtigall et al. 2003, 2004; Popov et al. 2011), relatively short duration tones (Finneran et al. 2005, 2007c, 2010a, b; Finneran and Schlundt 2010; Mooney et al. 2009b; Ridgway et al. 1997; Schlundt et al. 2000), and single underwater impulses (Finneran et al. 2000, 2002b, 2003; Lucke et al. 2009). Subjects have consisted of bottlenose dolphins, belugas, a harbor porpoise (Phocoena phocoena), Yangtze finless porpoises (Neophocaena phocaenoides asiaeorientalis), California sea lions (Zalophus californianus), a harbor seal (Phoca vitulina), and a Northern elephant seal (Mirounga angustirostris).

The experimental approaches for TTS measurements in marine mammals are analogous to those used to measure TTS in terrestrial mammals. Tests begin with a pre-exposure hearing threshold measurement at one or more frequencies. This is followed by the fatiguing sound exposure—the sound that may cause TTS. Finally, post-exposure hearing thresholds are measured at one or more frequencies. The NITS at each frequency is typically defined as the difference (in decibels) between the post-exposure and pre-exposure thresholds at that frequency, though some studies (e.g., Mooney et al. 2009a, b) have used an average “baseline” threshold instead of the pre-exposure threshold. To assess the recovery of hearing after a NITS, and to verify that the shift was in fact temporary, post-exposure thresholds are typically measured several times, over a period that may extend for several days.

There have been no designed studies of PTS in marine mammals; however, Kastak et al. (2008) reported incomplete recovery of a 50-dB initial threshold shift in a harbor seal, resulting in 7–10 dB of PTS measured about 2 months after exposure.

4.2 Predicting the Onset of NITS

One of the goals of marine mammal TTS research has been to identify exposure levels that are just sufficient to cause a TTS. These exposure levels are often referred to as “onset TTS” levels, and have been widely used in environmental analyses to estimate the numbers of animals that may be adversely affected by human-generated noise (e.g., US Navy 2008). The first controlled TTS experiments in marine mammals used a 6-dB criterion to identify a measurable TTS (Ridgway et al. 1997; Schlundt et al. 2000); for this reason, a noise exposure sufficient to induce 6 dB of TTS has often been taken as the onset-TTS exposure level.

The onset of PTS in marine mammals has been estimated by assuming that a TTS greater than 40 dB has the potential to result in some PTS. Exposures sufficient to induce 40 dB of TTS are estimated from onset-TTS exposure levels and TTS growth rates (see Southall et al. 2007).
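
A minimal sketch of this extrapolation is given below. It assumes that TTS grows linearly with exposure level between the 6-dB onset criterion and 40 dB, which is a simplification; the onset level and growth rate used in the example are hypothetical placeholders, not values reported in this chapter.

```python
def pts_onset_level(tts_onset_level_db, growth_db_per_db, onset_tts_db=6.0, pts_tts_db=40.0):
    """Exposure level estimated to produce `pts_tts_db` of TTS.

    Assumes linear TTS growth (dB of TTS per dB of exposure level) above the
    onset-TTS exposure level.
    """
    additional_tts_db = pts_tts_db - onset_tts_db
    return tts_onset_level_db + additional_tts_db / growth_db_per_db


# Hypothetical inputs: onset-TTS level of 195 dB and growth of 2 dB of TTS per dB.
print(pts_onset_level(195.0, 2.0))  # 212.0
```

Because the growth rate appears in the denominator, a shallow assumed growth rate places the estimated PTS onset far above TTS onset, whereas a steep growth rate places it close to TTS onset.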

4.3 Parameters that Affect NITS

The major findings to arise from marine mammal TTS experiments parallel findings from terrestrial mammal experiments. As in terrestrial mammals, the most significant factors that affect hearing loss are the exposure SPL, exposure duration, exposure frequency, temporal pattern, and recovery time. In addition to those factors that affect the actual function of the subject’s auditory system, some additional parameters affect the amount of TTS that is measured. For example, the amount of TTS varies with frequency, so the specific hearing test frequency will influence the amount of TTS that is observed. Also, the methodology used to perform the hearing test has been found to affect the amount of TTS observed. The following sections discuss each of these factors individually and provide example data to illustrate what is currently known about TTS in marine mammals.

4.3.1 Hearing Test Method

Marine mammal hearing assessments are conducted using behavioral (i.e., psychophysical) or electrophysiological methods. For behavioral methods, subjects are trained to perform a specific action, such as vocalizing or pressing a paddle, in response to hearing test tones. Tone SPLs are manipulated and the subject’s responses tracked to estimate the threshold. Most TTS studies have used adaptive staircase paradigms, where the tone SPL is reduced after each detection and increased following a nondetection (Cornsweet 1962; Levitt 1971). The threshold is then estimated from the reversal points, where the tone SPL changes from increasing to decreasing or vice versa. During behavioral approaches it is also important to feature signal-absent trials, so that any changes to the subject’s response bias can be identified. Behavioral methods are straightforward to implement and the resulting data are easy to interpret. The amount of time required to obtain a behavioral threshold depends on the specific experimental paradigm. With a staircase procedure and multiple stimulus presentations within each reinforcement interval, behavioral thresholds can be obtained in as little as 2–4 min (Finneran et al. 2005; Schlundt et al. 2000); however, regardless of the specific behavioral test paradigm, initial subject training typically requires several months.
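
The staircase logic described above can be sketched as follows. This is a simplified, hypothetical one-down/one-up procedure with a fixed step size and an idealized, noise-free listener; it omits signal-absent trials and the variable step sizes often used in practice, and it does not reproduce the procedure of any cited study.

```python
def staircase_threshold(detects, start_db, step_db, n_reversals):
    """Run a 1-down/1-up staircase and average the levels at the reversals.

    `detects(level_db)` is a caller-supplied function returning True when the
    (real or simulated) subject reports hearing the tone.
    """
    level = start_db
    last_direction = None  # -1: level being lowered, +1: level being raised
    reversals = []
    while len(reversals) < n_reversals:
        # Lower the tone after a detection, raise it after a nondetection.
        direction = -1 if detects(level) else +1
        if last_direction is not None and direction != last_direction:
            reversals.append(level)  # the track changed direction at this level
        last_direction = direction
        level += direction * step_db
    return sum(reversals) / len(reversals)


# Idealized listener with a hypothetical 72-dB threshold.
estimate = staircase_threshold(lambda level: level >= 72.0,
                               start_db=90.0, step_db=2.0, n_reversals=6)
print(estimate)  # 71.0
```

With a deterministic listener the track simply oscillates between the two levels that bracket the threshold, so the estimate falls within one step of the true value; a real subject's responses are probabilistic, which is why several reversals are averaged.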

Electrophysiological approaches use passive electrodes placed on the head (Fig. 10.9) to record changes in the electroencephalogram (EEG) that are synchronized with the onset of a sound stimulus. These small voltages, on the order of microvolts, are called auditory evoked potentials (AEPs). To measure AEPs, relatively short duration (typically tens of milliseconds) stimuli are presented hundreds or thousands of times, and the resulting AEPs synchronously averaged, to reduce residual physiological background noise caused by breathing, head movement, eye movement, etc. Marine mammal TTS measurements have generally used amplitude modulated stimuli to produce a steady-state, harmonic AEP called the auditory steady-state response (ASSR) or envelope following response (EFR). The ASSR amplitude at the stimulus modulation rate is recorded as the stimulus SPL is manipulated. Thresholds are based on the lowest detectable response (e.g., Finneran et al. 2007c) or by fitting a curve to the ASSR-stimulus SPL graph and extrapolating to the zero-crossing point (e.g., Nachtigall et al. 2004). The most appropriate modulation rates vary across species; for odontocetes, frequencies around 1 kHz are optimal (e.g., Dolphin et al. 1995; Finneran et al. 2007b, 2009; Nachtigall et al. 2005, 2008; Popov et al. 2005; Schlundt et al. 2011; Supin and Popov 1995), while in pinnipeds, frequencies near 150–200 Hz have worked well (Mulsow and Reichmuth 2007, 2010; Mulsow et al. 2011a, b). Evoked potential thresholds may be obtained as quickly as behavioral thresholds and are not limited by the requirements to train subjects for behavioral testing; however, AEP methods, and especially the ASSR technique, tend to work better at relatively high frequencies. For dolphins, the ASSR method is most effective at frequencies of ~8 kHz and above; in sea lions, the ASSR has been successfully used at frequencies of 500 Hz and above.
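
The curve-fitting approach to threshold estimation mentioned above can be illustrated with a minimal sketch. It assumes a straight-line fit of ASSR amplitude against stimulus SPL, extrapolated to zero amplitude; the data values are hypothetical, and actual studies may restrict the fit to a subset of points or use other fitting functions.

```python
def extrapolated_threshold(spl_db, amplitude_uv):
    """Least-squares line through (SPL, ASSR amplitude) points, returning the
    SPL at which the fitted line crosses zero amplitude."""
    n = len(spl_db)
    mean_x = sum(spl_db) / n
    mean_y = sum(amplitude_uv) / n
    sxx = sum((x - mean_x) ** 2 for x in spl_db)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(spl_db, amplitude_uv))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -intercept / slope  # SPL where amplitude = 0


# Hypothetical input-output data (stimulus SPL in dB, ASSR amplitude in microvolts).
spl = [80.0, 90.0, 100.0, 110.0]
amp = [0.02, 0.06, 0.10, 0.14]
print(round(extrapolated_threshold(spl, amp), 1))  # 75.0
```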

Fig. 10.9 A bottlenose dolphin participating in an AEP-based hearing test. The electrodes are embedded in suction cups attached to the head, back, and dorsal fin

It is important to keep in mind that ASSR thresholds and behavioral thresholds are not equivalent. Behavioral testing is a cognitive task—the subject must hear the sound stimulus and make a decision whether to respond. The signal processing chain includes the auditory cortex and centrally located processing centers in the brain. In contrast, at the modulation rates typically employed in marine mammal threshold testing, the ASSR is composed of summed neuronal activity from many individual generators at locations ranging from the auditory nerve to the brainstem. In this sense, ASSR and behavioral thresholds provide different glimpses of the function of the auditory system. There is no reason to expect behavioral and ASSR thresholds to perfectly agree—and they normally do not, with ASSR thresholds typically 5–15 dB higher than behavioral thresholds (e.g., Finneran et al. 2007a; Mulsow et al. 2011b; Mulsow and Reichmuth 2010; Schlundt et al. 2007, 2008; Yuen et al. 2005). TTS results obtained with the two techniques may also differ. In the only direct comparison between TTS obtained from behavioral and ASSR threshold measurements (Finneran et al. 2007c), the ASSR technique consistently resulted in larger amounts of TTS and longer recovery times (Fig. 10.10). These data caution against pooling TTS data obtained with behavioral and ASSR methods and show that even after recovery of behavioral thresholds, some functions of the auditory system may still be adversely affected. This suggests that the ASSR technique may be a more sensitive indicator of auditory damage compared to psychophysical threshold testing.

Fig. 10.10 Comparison of TTS recovery, from the same exposure, measured using ASSR and behavioral methods (adapted from Finneran et al. 2007c)

4.3.2 Hearing Test Frequency

The specific hearing test frequency will also affect the amount of TTS that is observed. Studies of dolphins and belugas exposed to tones have shown that the maximum TTS does not occur at the exposure frequency, but normally at frequencies one-half to one octave above the exposure frequency (Finneran et al. 2007c; Schlundt et al. 2000). The spread of TTS from tonal exposures can thus extend over a broad frequency range; i.e., narrowband exposures can produce broadband (greater than one octave) TTS (Fig. 10.11). These findings match those from human and terrestrial mammal studies (e.g., McFadden 1986; Ward 1962). For octave band noise exposures, the upward spread of TTS, or “half-octave shift,” has not always been observed, with some pinniped studies showing the maximum TTS near the center frequency of the exposure (Kastak et al. 2005), and dolphin experiments showing the maximum TTS one-half octave above the center of the noise band (Mooney et al. 2009a). This result is also consistent with terrestrial mammal data, where the half-octave shift is most commonly associated with tonal noise exposures. The failure of broadband noise to produce an upward spread of TTS may also be related to the TTS magnitudes induced; as the exposure level increases, the activation area on the basilar membrane spreads more toward the basal end of the cochlea and thus affects higher frequencies to a greater extent (McFadden and Plattsmier 1983). At lower amounts of TTS, the activation pattern tends to be more symmetrical about the noise center frequency.

Fig. 10.11 Influence of hearing test frequency on the amount of TTS that is observed. For tonal exposures, the maximum TTS normally occurs one-half to one octave above the exposure frequency (adapted from Finneran et al. 2007c)

4.3.3 Recovery Time

Since TTS is a temporary phenomenon, the amount of TTS observed will be a function of the recovery time, i.e., the amount of time that has elapsed since the cessation of the noise exposure. For this reason, numeric subscripts are normally used to indicate the recovery time associated with a specific TTS measurement; e.g., TTS4 indicates a TTS measured 4 min after the exposure.

The amount of TTS normally decreases with increasing recovery time; however, the relationship is not necessarily monotonic, and it is common to see examples of delayed recovery, where the TTS may remain nearly constant for some time after the exposure (e.g., Finneran et al. 2007c; Popov et al. 2011). In many cases the recovery function is not linear with time, but approximately linear with the logarithm of time. In these cases, the recovery rates are often described by the slope of the recovery function; for dolphins, recovery rates between 1.5 and 2 dB per doubling of time have been measured when the initial shifts were ~5–15 dB (Finneran et al. 2007c; Mooney et al. 2009a; Nachtigall et al. 2004). For larger amounts of TTS, up to ~40 dB, recovery rates of 4–6 dB per doubling of time have been measured in a dolphin (Finneran et al. 2007c). For a sea lion, recovery rates from TTS12 of ~20–35 dB were ~2.5 dB per doubling of time (Kastak et al. 2007). Complex TTS recovery patterns have been observed in dolphins after exposure to 3-kHz tones (Finneran et al. 2010a). These curves often contained regions where TTS was linear with the logarithm of time, but also often contained regions with varying slopes. Double exponential functions used to fit human TTS recovery data (Keeler 1968; Patuzzi 1998) fit the dolphin recovery data and, for 3-kHz exposures with durations from 1 to 128 s, the recovery functions were described using TTS4 and recovery time only; i.e., recovery functions did not depend on the specific SPL and duration but only on the resulting TTS4 (Fig. 10.12; Finneran et al. 2010a). The extent to which this result may be extrapolated to other exposure conditions is unknown.
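
For the simple case in which recovery is linear with the logarithm of time, the recovery rates quoted above can be turned into a worked example. The sketch below assumes a single constant slope (in dB per doubling of time) starting from TTS4, which, as noted above, does not capture delayed recovery or the more complex patterns that have been observed; the function names and input values are illustrative only.

```python
import math


def tts_after(t_min, tts4_db, rate_db_per_doubling):
    """TTS (dB) remaining t_min minutes after exposure (t_min >= 4), assuming
    recovery that is linear in log2(time) from the 4-min value."""
    remaining = tts4_db - rate_db_per_doubling * math.log2(t_min / 4.0)
    return max(remaining, 0.0)  # thresholds do not recover past baseline here


def full_recovery_time_min(tts4_db, rate_db_per_doubling):
    """Post-exposure time (min) at which the log-linear model reaches 0 dB."""
    return 4.0 * 2.0 ** (tts4_db / rate_db_per_doubling)


# Hypothetical case: TTS4 = 12 dB recovering at 1.5 dB per doubling of time.
print(tts_after(8.0, 12.0, 1.5))          # 10.5 dB after 8 min
print(tts_after(64.0, 12.0, 1.5))         # 6.0 dB after 64 min
print(full_recovery_time_min(12.0, 1.5))  # 1024.0 min
```

Because each doubling of time removes a fixed number of decibels, the last few decibels of a shift take far longer to recover than the first few under this model.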

Fig. 10.12 TTS recovery after 3 kHz exposures, as a function of TTS4 and the logarithm of post-exposure time (in min). Symbols indicate the experimentally measured values for four dolphins. The color bar indicates TTS in dB (adapted from Finneran et al. 2010a)

4.3.4 Noise Sound Pressure Level

As in many other animal groups, the amount of TTS generally increases with the noise SPL; however, the relationship is not linear. Ward et al. (1976) defined “effective quiet” as the highest SPL that would not produce a significant TTS or affect recovery from a TTS produced by a prior, higher level exposure. For humans, effective quiet for octave band noise with center frequencies from 250 to 4,000 Hz is around 68–76 dBA (Ward et al. 1976). To date, there have been no studies performed to measure effective quiet in a marine mammal; however, we can estimate the upper limit for effective quiet by examining the lowest noise exposure SPLs that have resulted in measurable amounts of TTS. For dolphins, effective quiet must be less than 155–160 dB re 1 μPa, since this SPL produced TTS after only 30 min of exposure to broadband noise centered around 6–7 kHz (Mooney et al. 2009a; Nachtigall et al. 2004). For sea lions, harbor seals, and northern elephant seals, effective quiet must be less than 80 dB re 1 μPa, which produced TTS at 2.5 kHz after 22-min underwater exposures to octave band noise centered at 2.5 kHz (Kastak et al. 2005). For sea lions in air, effective quiet must be less than 94 dB re 20 μPa, which produced ~5 dB of TTS after only 25 min exposures to 2.5-kHz, octave band noise (Kastak et al. 2007).

At exposure levels above effective quiet, the amount of TTS increases with SPL in an accelerating fashion. This is illustrated in Fig. 10.13, which shows the increase, or growth, of TTS4 with increasing SPL in a dolphin exposed to short duration, 3-kHz tones (Finneran et al. 2010a). At low exposure SPLs, the amount of TTS is small and the growth curves have shallow slopes. At higher SPLs, the growth curves become steeper and approach linear relationships with the noise SPL. TTS growth curves for dolphins, harbor seals, sea lions, and northern elephant seals have been successfully fit by equations with the form:

$$ y(x) = a\log_{10} \left[ 1 + 10^{(x - b)/10} \right], \tag{10.8} $$

where y is the amount of TTS, x is the exposure level, and a and b are fitting parameters (Finneran et al. 2005, 2010a; Kastak et al. 2005, 2007). This particular function has an increasing slope when x < b and approaches linearity for x > b (Maslen 1981). The linear portion of the curve has a slope of a/10 and an x-intercept of b. TTS growth curves for dolphins have been shown to be frequency-dependent, with growth rates at 3 kHz of approximately 0.2–0.7 dB/dB, while those at higher frequencies have steeper slopes, such as 1.2 dB/dB at 20 kHz (Finneran et al. 2010a; Finneran and Schlundt 2010). The growth rate for a California sea lion tested in air was ~2.5 dB/dB at 2.5 kHz (Kastak et al. 2007).

Fig. 10.13 Growth of TTS4 as a function of SPL for a bottlenose dolphin exposed to 3 kHz tones. Vertical error bars indicate SD for the mean TTS4 in each exposure group. Horizontal error bars indicate the SD for the mean exposure SPLs in each group. The solid lines are functions with the form of Eq. (10.8) fit to the data (adapted from Finneran et al. 2010a)
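
The behavior of Eq. (10.8) can be checked numerically with a short sketch. The parameter values below (a = 4, b = 190) are hypothetical and chosen only to make the asymptotic slope of a/10 and the x-intercept of b easy to see; they are not fitted values from any of the cited studies.

```python
import math


def tts_growth(x, a, b):
    """Eq. (10.8): y = a * log10(1 + 10 ** ((x - b) / 10))."""
    return a * math.log10(1.0 + 10.0 ** ((x - b) / 10.0))


a, b = 4.0, 190.0  # hypothetical fitting parameters
for level in (170.0, 190.0, 210.0, 230.0):
    asymptote = max((a / 10.0) * (level - b), 0.0)  # straight line with slope a/10 through (b, 0)
    print(level, round(tts_growth(level, a, b), 2), round(asymptote, 2))
```

Well below b the predicted shift is close to zero, at x = b it equals a·log10(2), and well above b it converges on the straight line with slope a/10, consistent with the description above.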

4.3.5 Noise Duration

TTS also generally increases with noise duration; however, as with SPL, the growth functions are nonlinear. Growth functions relating TTS to the exposure duration are also accelerating functions, where the slope is shallow at low amounts of TTS (e.g., less than 10 dB) and becomes increasingly steep as the duration (and amount of TTS) increase. At low amounts of TTS, the functions for TTS growth with increasing exposure duration appear roughly linear (Finneran et al. 2010a; Mooney et al. 2009a), but approach linear behavior with the logarithm of time as the exposure duration and resulting amount of TTS increase. TTS growth functions based on exposure duration, up to about 20 dB of TTS4, have been successfully fit by Eq. (10.8) (Finneran et al. 2010a).

Terrestrial mammal data have shown that if the noise SPL is fixed and the exposure duration continually increased, the amount of TTS will eventually reach a plateau, where further increases in exposure duration do not result in additional threshold shift. This region is called asymptotic threshold shift (ATS). ATS has been hypothesized to represent the upper bound of PTS that could be produced by noise of a specific SPL, regardless of duration (Mills 1976). Exposure durations sufficient to induce ATS in terrestrial mammals have generally been at least 4–12 h (Mills 1976; Mills et al. 1979), much longer than the maximum exposure durations used with marine mammal testing (less than 1 h). As a result, ATS has not been observed in any marine mammals; however, given the similarities in cochlear function it is likely that similar patterns of TTS growth would be found in marine mammals, including regions of ATS. When ATS is taken into account, TTS growth with exposure duration is best described using exponential functions (Keeler 1968; Mills et al. 1979).

4.3.6 Sound Exposure Level and the “Equal Energy Rule”

Sound exposure is an “energy-like” metric, defined as the time integral, over the duration of the exposure, of the instantaneous sound pressure-squared (American National Standards Institute 1994); the term sound exposure level (SEL) refers to the sound exposure expressed in decibels, referenced to 1 μPa²·s in water or (20 μPa)²·s in air (American National Standards Institute 2011). For multiple or intermittent exposures, the cumulative SEL, defined as the total SEL calculated over the “on-time” of the noise exposure, is often used to characterize the exposure. SEL is linearly related to the SPL and logarithmically related to the exposure time, meaning that SEL will change on a 1:1 basis with SPL, and change by 3 dB for each doubling/halving of exposure time. For plane progressive waves, sound exposure is proportional to sound energy flux density, so the use of SEL is often described as an “equal-energy” rule, whereby exposures of equal energy are assumed to produce equal amounts of NITS, regardless of how that energy is distributed over time. Since the SEL changes by 3 dB for each doubling or halving of exposure duration, the use of SEL or an equal energy rule can also be described as a “3-dB exchange rate” for acoustic damage risk criteria. This means that the permissible noise exposure SPL will change by 3 dB with each doubling or halving of exposure time; e.g., an equal energy rule means that if the permissible exposure limit is 90 dB re 1 μPa for an 8-h exposure, the limit for a 4-h exposure would be 93 dB re 1 μPa.
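
These relationships can be illustrated with a minimal sketch. It assumes an exposure with constant (rms) SPL over its duration, so that SEL = SPL + 10 log10(T) with T in seconds, and it combines separate exposures by summing their sound exposures on an energy basis; the function names and example values are illustrative.

```python
import math


def sel_db(spl_db, duration_s):
    """SEL of a constant-SPL exposure lasting `duration_s` seconds."""
    return spl_db + 10.0 * math.log10(duration_s)


def cumulative_sel_db(sels_db):
    """Cumulative SEL of several exposures, summed on an energy basis."""
    return 10.0 * math.log10(sum(10.0 ** (s / 10.0) for s in sels_db))


one_ping = sel_db(160.0, 16.0)    # hypothetical 16-s exposure at 160 dB SPL
longer_ping = sel_db(160.0, 32.0) # same SPL, twice the duration
two_pings = cumulative_sel_db([one_ping, one_ping])
print(round(one_ping, 2), round(longer_ping, 2), round(two_pings, 2))
```

Doubling the duration and repeating the same exposure twice both raise the SEL by about 3 dB, which is why the equal energy rule treats the two cases as equivalent.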

Because threshold shifts depend on both the exposure SPL and duration, it has become convenient to use SEL as a single numeric value to characterize a noise exposure and to predict the amount of NITS. SEL has been shown to be an effective predictor of TTS, and has been useful in establishing acoustic damage risk criteria for marine mammals (Finneran et al. 2005, 2010a; Kastak et al. 2005, 2007; Mooney et al. 2009a). However, the marine mammal studies, like terrestrial mammal studies, have shown that the equal energy rule has limitations, and is most applicable to single, continuous exposures. As the exposure duration increases, the relationship between TTS and SEL also begins to break down. Specifically, duration has a more significant effect on TTS than what would be predicted on the basis of SEL alone (Finneran et al. 2010a; Kastak et al. 2005; Mooney et al. 2009a). This means that if two exposures have the same SEL but different durations, the exposure with the longer duration will tend to produce more TTS. For this reason, recent models for TTS in marine mammals have begun to treat TTS as a function of both exposure SPL and duration, representing TTS growth as a surface rather than a curve (e.g., Fig. 10.14; Finneran et al. 2010a; Kastak et al. 2007; Mooney et al. 2009a).

Fig. 10.14 TTS4 as a function of SPL and duration for 3-kHz tone exposures. Symbols represent individual TTS4 values measured in four dolphins. The color bar indicates TTS4 in dB (adapted from Finneran et al. 2010a)

The marine mammal data serve to emphasize that the equal energy rule is an over-simplification. The temporal pattern of noise exposure is known to affect the resulting threshold shift. It is also well known that the equal energy rule will over-estimate the effects of intermittent noise, since the quiet periods between noise exposures will allow some recovery of hearing compared to noise that is continuously present with the same total SEL (Ward 1997). However, despite its simplistic nature and obvious limitations, the equal energy rule continues to be a useful concept, since it highlights the need to consider both the noise amplitude and duration when predicting auditory effects. Early efforts to mitigate the effects of noise on marine mammals often neglected the noise duration and predicted zones of hearing loss based on the SPL alone. Predictive models have advanced significantly since then, and the use of SEL, while clearly not perfect, is simple, allows the effects of multiple noise sources to be combined in a meaningful way, and is reasonably accurate, especially when applied to a limited range of noise durations. The use of cumulative SEL for intermittent exposures also errs on the side of caution, since it will over-estimate the effects of intermittent sources.

4.3.7 Noise Frequency

For humans, TTS increases with increasing noise frequency, at least up to 2–6 kHz, which is near the range of best hearing sensitivity (Elliott and Fraser 1970; Miller 1974). Because of the similarities in inner ear structure/function, it seems logical that marine mammals would respond in a similar fashion; i.e., that animals would be more susceptible to TTS at frequencies where auditory sensitivity is higher. Most marine mammal TTS data, however, have been collected at relatively low frequencies, generally between 1 and 10 kHz. This frequency range contains some of the most intense anthropogenic sources, but is below the region of best sensitivity for many species. Early TTS data obtained at multiple noise frequencies in dolphins did not reveal significant differences in TTS onset at 3, 10, and 20 kHz, perhaps because of inter-subject differences in susceptibility or because TTS values were based on masked hearing thresholds (Schlundt et al. 2000). As a result, most acoustic impact criteria have used similar numeric thresholds for the onset of TTS, regardless of exposure frequency (e.g., Southall et al. 2007). More recent data, however, have revealed large differences (~15 dB) between TTS onset at 3 kHz compared to 20 kHz (Finneran and Schlundt 2010; Finneran et al. 2007c). TTS growth rates in dolphins have also been shown to increase with exposure frequency above 3 kHz, with the maximum growth rate, and lowest threshold for the onset of TTS, occurring near 14–28 kHz in dolphins (Finneran 2010; Finneran and Schlundt 2010). The occurrence of maximum TTS in an odontocete at a few tens of kilohertz, but not at the frequency of maximum sensitivity, is also supported by the data of Popov et al. (2011), who found higher susceptibility in the Yangtze finless porpoises at 32 kHz compared to higher frequencies.

These data demonstrate the need for frequency-specific criteria for noise susceptibility. For humans, susceptibility to noise across frequency is handled through the use of auditory weighting functions. Weighting functions describe a series of frequency-specific correction factors, or “weights,” that are added to noise levels to increase the calculated noise dose at frequencies where individuals are more susceptible, and to decrease the noise dose at frequencies where individuals are less susceptible. Human auditory weighting functions were derived from equal loudness contours and measures of subjective loudness level, not auditory sensitivity. For marine mammals, equal loudness levels have only recently been measured (Finneran and Schlundt 2011), and only in a single bottlenose dolphin. Auditory weighting functions derived from the equal loudness contours agree remarkably well with TTS onset values in dolphins exposed to short duration tones (Finneran 2010), and suggest that, in the absence of equal loudness level data for other species, the use of auditory sensitivity curves as weighting functions may provide a reasonable alternative.
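As a rough illustration of how such weights enter a noise dose calculation, the sketch below adds a weight (in dB) to each frequency band level and then sums the weighted bands on an energy basis. The band levels and weights are hypothetical placeholders, not values from any published marine mammal weighting function, and the function name is illustrative.

```python
import math


def weighted_level(band_levels_db, weights_db):
    """Add a per-band weight (dB) to each band level, then sum bands as energy."""
    if len(band_levels_db) != len(weights_db):
        raise ValueError("one weight is required per band")
    energy = sum(
        10.0 ** ((level + weight) / 10.0)
        for level, weight in zip(band_levels_db, weights_db)
    )
    return 10.0 * math.log10(energy)


# Hypothetical band levels (dB) for bands centred on 3, 10, and 20 kHz.
band_levels = [150.0, 150.0, 150.0]

# Hypothetical weights (dB): negative values de-emphasize bands where the
# listener is assumed to be less susceptible.
weights = [-15.0, -5.0, 0.0]

unweighted = weighted_level(band_levels, [0.0, 0.0, 0.0])
weighted = weighted_level(band_levels, weights)
```

With these placeholder values the weighted level is lower than the unweighted level, because energy in the two lower bands contributes less to the calculated dose.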

4.3.8 Temporal Pattern of Noise

Most marine mammal TTS experiments have featured single, continuous noise exposures or single impulses, and there have been only two studies designed to examine the effects of intermittency and temporal pattern on TTS (Finneran et al. 2010b; Mooney et al. 2009b). These studies have shown that TTS can accumulate across multiple exposures, but the resulting TTS will be less than the TTS from a single, continuous exposure with the same total SEL. This result is not surprising, since the equal energy rule is known to over-estimate the effects of intermittent noise because it does not account for recovery that may occur in the quiet intervals between noises. Finneran et al. (2010b) found that the modified power law model (Humes and Jesteadt 1989) fit the growth of TTS across multiple, short duration tonal noise exposures; however, it is unknown to what extent this model would fit other test conditions.

4.3.9 Impulse Noise

The term “impulse noise” is generally used to denote any short duration, high amplitude sound with relatively broad frequency content and relatively fast rise time. Common examples of impulsive sound sources would include impact pile driving, explosions, and seismic air guns. Terrestrial mammal studies of the auditory effects of impulse noise have revealed that impulse noise may be particularly hazardous to hearing, and that the variability associated with NITS measurements is higher when using impulsive fatiguing sources (Henderson and Hamernik 1986). In addition to the factors affecting NITS listed above, the rise time and number of impulses will also affect the resulting amount of NITS (Henderson and Hamernik 1986).

Very few TTS studies have been conducted with marine mammals exposed to impulsive noise sources. Finneran et al. (2000) exposed dolphins and a beluga to single impulses from an array of underwater sound projectors designed to produce pressure signatures resembling underwater explosions, but found no TTS after exposure to the highest level the device could produce (SEL = 179 dB re 1 μPa²·s). Similarly, no TTS was found in two California sea lions exposed to single impulses from an arc-gap transducer with SELs of 161–163 dB re 1 μPa²·s (Finneran et al. 2003). Finneran et al. (2012, 2011) also reported preliminary data showing no behavioral TTS in three bottlenose dolphins exposed to a sequence of 10 impulses, produced from a seismic air gun at an interval of 10 s/impulse. The cumulative SEL for the 10 impulses was ~176 dB re 1 μPa²·s. One of the three dolphins had also been exposed to 10 impulses with cumulative SEL of ~195 dB re 1 μPa²·s with no TTS (Finneran et al. 2011).

For impulse noise studies, measurable TTS has only been observed in a single beluga exposed to an impulse from a seismic watergun (Finneran et al. 2002b), and a single harbor porpoise exposed to an impulse from a seismic air gun (Lucke et al. 2009). The SEL necessary for the onset of TTS in the beluga was 186 dB re 1 μPa²·s, 9 dB lower than that required for TTS after exposure to a 1-s tone (Schlundt et al. 2000), which supports the idea that impulsive noise exposures are more hazardous than nonimpulsive exposures with the same energy. The exposure SEL required for onset TTS in the harbor porpoise was ~164 dB re 1 μPa²·s; however, the impulsive data are the only TTS data available at present for harbor porpoises, so there can be no impulsive/nonimpulsive comparison. At present, the relationship between exposure frequency content and the occurrence and frequency spread of impulse noise TTS is unclear. The TTS in the beluga and the harbor porpoise exposed to single impulses occurred at frequencies above the predominant energy in the exposures, suggesting an upwards shift in TTS as one would expect based on terrestrial mammal data (Finneran et al. 2002b; Lucke et al. 2009). It is also possible that the failure of air gun impulses to produce TTS in a dolphin at cumulative SELs higher than those producing TTS in a beluga exposed to a single impulse may be related to the frequency content of the exposures (Finneran et al. 2012, 2011).

4.4 Perceptual Consequences of NITS

Exposures required for the onset of TTS are relatively large; e.g., for dolphins exposed to short duration tones at 3 kHz, the SEL required for TTS is about 195 dB re 1 μPa²·s (Finneran et al. 2005; Schlundt et al. 2000). This means that for short or moderate duration exposures, relatively high SPLs are generally required to induce TTS in marine mammals. This in turn results in relatively small areas around a sound source where received levels may reach sufficient values to induce TTS, and even smaller regions where a PTS may occur. From this standpoint, a NITS may not be as significant to marine mammal populations as other potential effects, such as masking, which may occur at lower received SPLs and thus within larger areas around a sound source. However, from an individual animal’s perspective, a NITS could be a serious consequence, since the loss of hearing sensitivity associated with PTS is permanent and that associated with TTS could last for hours to days after the cessation of the noise. During this time, any activities that depended upon the animal’s hearing ability would be compromised to a degree determined by the extent and character of the hearing loss.

The consequences of a NITS will vary depending on the extent and frequency regime of the loss, the amount of time required for recovery, and the particular hair cell populations that are affected. For humans, the severity of hearing loss is normally described categorically as normal (0–15 dB hearing loss), slight (16–25 dB), mild (26–40 dB), moderate (41–55 dB), moderately severe (56–70 dB), severe (71–90 dB), and profound (91 dB or more) (Clark 1981). Although this scale is for humans, it gives an idea of the significance of various amounts of hearing loss to an animal; i.e., a NITS of 10 dB is a small amount of hearing loss, while 70 dB lies at the upper end of the moderately severe range, bordering on severe.
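The categorical scale above can be written as a simple lookup, shown here as a minimal sketch. The category boundaries are those listed in the text (Clark 1981); the function name is illustrative, and treating each upper bound as inclusive for non-integer values is an assumption made here. As noted, the scale was developed for humans and offers only a rough guide for other species.

```python
# Upper bound (dB, inclusive) of each category, in increasing order.
# Boundaries follow the human scale listed in the text (Clark 1981).
CATEGORY_UPPER_BOUNDS = [
    (15.0, "normal"),
    (25.0, "slight"),
    (40.0, "mild"),
    (55.0, "moderate"),
    (70.0, "moderately severe"),
    (90.0, "severe"),
]


def hearing_loss_category(loss_db):
    """Return the human hearing-loss category for a threshold shift in dB."""
    for upper_bound, label in CATEGORY_UPPER_BOUNDS:
        if loss_db <= upper_bound:
            return label
    return "profound"


small_shift = hearing_loss_category(10.0)   # falls in the "normal" range
larger_shift = hearing_loss_category(60.0)  # falls in the "moderately severe" range
```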

The most obvious consequence of a NITS is an increase in absolute threshold, which may arise from loss or damage to inner and/or outer hair cells. Elevated hearing thresholds would result in reduced detection ranges for sounds within the frequency range of loss, potentially affecting communication, navigation, and echolocation detection ranges during foraging. Damage or loss of outer hair cells would also reduce the active cochlear processes and cause a reduction in the compressive nonlinearity in the basilar membrane motion and a loss of frequency selectivity, which would broaden the excitation pattern along the basilar membrane (Moore 1998). Reduced frequency selectivity can in turn affect loudness perception, frequency discrimination, and the perception of complex sounds (Moore 1998). Abnormal frequency selectivity may also cause masking effects to be more pronounced in hearing-impaired listeners, especially when the masker and signal frequencies differ (Moore 1996). Hearing loss is often accompanied by a phenomenon called loudness recruitment, where the growth rate of loudness is higher in impaired ears compared to normal ears. This can cause an exaggerated sense of dynamic fluctuations in sounds, since the apparent loudness would change more dramatically than for a normal listener (Moore et al. 1996). Loudness recruitment could also result in an exaggerated sense of how fast a sound source is approaching or receding, since recruitment would result in a higher rate of change of loudness compared to an unimpaired ear. Unilateral hearing loss can result in abnormal binaural or spatial hearing, leading to difficulties in localizing sound sources and using spectral cues to identify sound sources within background noise, and making it more difficult to spatially separate the locations of sound sources amidst background noise (Moore 1996, 1998). Hearing loss can also affect temporal resolution, making it more difficult to follow the temporal structure of time-varying sounds.

5 Conclusions

Much progress has been made in understanding the function of the auditory system in marine mammals and the potential adverse effects of noise on the hearing of these animals. Much of the resulting data have shown that marine mammal ears are closely analogous to those of terrestrial mammals, a result of their possessing very similar inner ears. Detection of tones and complex sounds in Gaussian and comodulated noise, and measures of TTS, have revealed that auditory masking and noise-induced threshold shifts in marine mammals behave similarly to those in humans and terrestrial mammals. The most significant differences concern the specific noise exposures required for masking and threshold shift effects in the various marine mammal species, and the frequency patterns of those effects.

Almost all of our information concerning the effects of noise on marine mammal perception has come from controlled experiments on captive animals. In many cases, the studies involved complex psychoacoustic tasks with “expert” subjects, animals for whom much time and effort have been spent in behavioral conditioning for specific experimental paradigms. Although conducting psychophysical tasks with captive subjects is a time-consuming process that limits the maximum number of subjects for whom data can be obtained, many of the questions regarding perceptual effects of noise can only be answered in this fashion, and the degree to which stimuli can be controlled and manipulated cannot typically be matched in field studies.

Despite the progress made in understanding masking and noise-induced threshold shifts in marine mammals, many gaps remain in our understanding of how marine mammals perceive sound in noisy environments. Aside from extrapolations based on anatomical data, information on mysticete hearing is almost completely lacking. Almost no data on masking and echolocation exist, even though all odontocetes rely on echolocation to capture prey, navigate, and potentially detect predators. Almost all masking studies have employed Gaussian noise and have assumed that masking was restricted to a single auditory filter and that the noise could be represented by its spectral density. This metric ignores temporal fluctuations that appear to play a significant role in an animal’s ability to segregate a signal from noise. Identifying the proper noise metrics and gaining a better understanding of the auditory mechanisms that govern masking will help in making more accurate predictions about the effects of noise on communication. Data on noise-induced threshold shifts in marine mammals are available for only a few species, and for few individuals within these species. There also remain significant questions regarding the effects of exposure frequency, the rate of TTS growth and recovery after exposure to intermittent noise, the effects of single and multiple impulses, and the extent and manner in which TTS data can be extrapolated to other species.