8.1 Introduction

How do animals select out and organize auditory events from an acoustically complex environment? Research efforts aimed at addressing this question were pioneered by Albert Bregman, who carried out foundational experiments on the perceptual organization of sound in humans. Bregman’s book, Auditory Scene Analysis (Bregman 1990), presents a comprehensive overview of 25 years of human research that applied Gestalt principles to studies of human hearing.

Experimental work has revealed organizational principles in human hearing that may inform our understanding of auditory scene analysis by echolocation in bats and dolphins. Using pure tones, harmonic complexes, speech, and a variety of other acoustic stimuli, Bregman showed that human listeners perceptually organize sound stimuli into auditory streams. A classic example of Bregman’s experiments involves the presentation of pure tones that alternate between high and low frequencies. When the frequency separation between the tones is comparatively small (e.g., less than an octave) and the intervals between successive tones are comparatively long, a human listener typically reports hearing out individual tones in the pattern (Fig. 8.1a, upper panel). However, when the tone frequency separation increases and the time interval between them decreases, human listeners instead report hearing two streams of sounds, one high in pitch and the other low (Fig. 8.1a, lower panel). The spectrotemporal separation of tones required for a listener to hear out high- and low-frequency streams depends on details of the stimulus parameters and, to some extent, on the individual listener. Moreover, the perception of auditory streams tends to build up over time, indicating that auditory stream segregation depends on cognitive-perceptual processes. Such processes are likely to operate in a broad range of animal systems as well (Bee and Micheyl 2008; Fig. 8.1b); however, phenomenological reports, the dependent measure in many human auditory scene analysis studies, are not amenable to animal research. Further, the perceptual organization of simplified stimuli, such as tone sequences, holds little biological relevance to animals that rely on natural sounds for species-specific communication, territorial defense, foraging, and navigation.
An additional challenge to researchers who wish to understand the perceptual organization of sound in echolocating animals, such as bats and dolphins, is the animals’ active control over the timing and spectral content of their sonar signals, which directly affects the acoustic information that comprises their experience of an auditory scene.

Fig. 8.1
Frequency and temporal separation diagram in human and bat studies. (a) Schematic illustration of the time and frequency parameters that can influence auditory stream segregation in psychophysical studies of auditory scene analysis in humans (see Bregman 1990). Upper panel: Human listeners tend to report hearing out individual tones presented in a sequence when the sounds alternate in frequency with comparatively long interstimulus intervals. Lower panel: Human listeners tend to perceive two separate auditory streams (indicated by dashed lines encircling tone sequences) when presented with tones alternating in frequency with larger spectral separation and comparatively short interstimulus intervals. (b) Spectrograms of echolocation calls and echoes that may give rise to perceptual segregation of auditory streams in echolocating bats. Frequency (kHz) is plotted against time (ms). The echolocation signals of two different bats are displayed, one circled in red and the other circled in green. Solid lines encircle the calls and dashed lines encircle the echoes. The timing and frequency of echo returns may contribute to the bat’s perceptual analysis of auditory scenes

8.2 Characterizing Auditory Scenes of Echolocating Animals

The sensory world of an animal is acoustically complex and dynamic. From a barrage of auditory stimuli, echolocating animals face the challenge of detecting, sorting, grouping, and tracking biologically relevant signals to communicate with conspecifics, seek food, engage in courtship, avoid predators, and navigate in space. Sections 8.2.1 and 8.2.2 present an overview of the acoustic information that comprises the natural scene of bats and dolphins in their habitats in air and under water.

8.2.1 Bats

Echolocating bats live and forage in a variety of environments, including dense vegetation, open space, edges of forests, and close to water surfaces (Schnitzler et al. 2003). At night, vision is limited, and echolocation allows animals to orient and forage successfully using sound. Echolocating bats produce high-frequency sonar signals and listen to the returning echoes to determine the three-dimensional location and features of objects (Griffin 1958; Moss and Schnitzler 1995). Echo returns from the bat’s sonar signals come not only from targets of interest (e.g., food), but also from obstacles, such as trees, buildings, and other animals (Fig. 8.2a). Tracking echoes from isolated objects in open space is a comparatively simple task for the bat, but not one that it regularly encounters. Even for open space foragers, clusters of insects present the acoustic challenge of many overlapping echoes, from which a bat must select and pursue a single prey item at a time (Griffin et al. 1960). In addition, there may be other bats seeking food in the vicinity, also creating an acoustically complex mix of conspecifics’ sonar sounds and echoes from moving prey and obstacles. A bat hunting insects or fish over water must listen to echoes that reveal an object on the surface or water disturbances created by moving prey (Schnitzler et al. 1994; Kalko et al. 1998). In dense vegetation the bat’s auditory scene is far more complex: Echo returns from closely spaced shrubs, trees, branches, and food items create a cascade of echoes, arriving at the bat’s ears from different directions and distances (Moss and Surlykke 2010; Fig. 8.2b). Stationary and large obstacles produce relatively strong echoes, but may be separable from fluttering insect echoes, which contain rapid amplitude and spectral variation produced by moving wings (Schnitzler and Flieger 1983; von der Emde and Menne 1989; von der Emde and Schnitzler 1990; Fig. 8.3a).
If fruit hangs stationary amidst vegetation clutter, the bat must discriminate echo features from the fruit and nearby branches and leaves. Bats that take insects from substrate may use two streams of acoustic information, one from active echolocation and the other from passively listening to prey-generated signals (Barber et al. 2003).

Fig. 8.2
A bat producing echolocation calls and receiving echoes from various prey items and objects in the environment. (a) Complex acoustic environment for the echolocating bat. (Modified from Neuweiler, 1989.) A bat generates echolocation pulses and listens to returning echoes to track prey and avoid obstacles. This is a relatively simple task when there is only one bat and one target in open space. However, a bat often encounters several echolocating conspecifics/heterospecifics, pursues multiple targets, and forages in a cluttered environment. (b) A schematic illustrates the timing of echo returns from different objects and prey in the environment. (Adapted from Moss, C. F., & Surlykke, A. Auditory scene analysis by echolocation in bats. Journal of the Acoustical Society of America [2001] 110, 2207–2226; reprinted with permission from Acoustical Society of America.) The upper panel shows a cartoon of a bat pursuing insect prey in the vicinity of trees. The numbers mark selected instances when the bat produces an echolocation call and the insect’s position when the bat sonar signal ensonifies the insect. The middle panel shows echolocation call spectrograms generated by the bat. The echo delay from different objects in the environment is displayed in the lower panel for different phases of a bat’s prey capture sequence. The left y-axis shows the time before prey capture and the right y-axis shows the prey capture phases (search, approach, buzz). The x-axis shows echo delay. The color corresponds to the echoes reflected from different objects in the environment, tree a (red), tree b (blue) and tree c (green), and insect target (black)

Fig. 8.3
Echo recordings from various objects. (a) Echo recordings from four different insect species (von der Emde and Schnitzler 1990). The upper traces show spectrograms and the lower traces show oscillograms. The signal used in this study was a constant 83-kHz tone, which imitated the CF part of Rhinolophus ferrumequinum’s echolocation calls. The tested insect faced three different directions, 0°, 90°, and 180°, with 0° being head-on. All insects fluttered at 50 Hz. These four insect species are Deilephila: Deilephila elpenor, sphingid moth, Lepidoptera; Scotia: Scotia exclamationis, noctuid moth, Lepidoptera; Melolontha: Melolontha melolontha, scarabid beetle, Coleoptera; Tipula: Tipula oleracea, cranefly, Tipulidae, Diptera. (b) Echo recordings from a fluttering army worm moth facing four different directions. (Adapted from Moss, C. F., & Zagaeski, M. Acoustic information available to bats using frequency modulated sonar sounds for the perception of insect prey. Journal of the Acoustical Society of America [1994] 95, 2745–2756; reprinted with permission from Acoustical Society of America.) The upper trace of each panel shows the oscillogram and the lower trace of each panel shows the spectrogram. Each panel represents one direction that the moth was facing. The moth drawing in each panel indicates the angle of ensonification. (c) Echo recordings from various plants. (Adapted from Yovel et al. 2009). The upper left panel is from a field recording. The upper right panel is from a plastic model plant (a single elevation angle and five horizontal angles). The bottom left panel shows ensonification of a Ficus plant with decreasing leaf density from 36 angles around the plant. The bottom right panel shows the time signal, spectrogram, and spectrum of the emitted signal

To understand how an echolocating animal analyzes its acoustic environment, we begin with a review of the acoustic information carried by echoes returning from various objects, such as insect prey, fruit, flowers, and vegetation; experiments can then be designed to explore the animal’s discrimination and classification of these objects. Schnitzler et al. (1983) and Moss and Zagaeski (1994) recorded sonar echoes from fluttering insect prey, with the goal of characterizing the acoustic information that may be used by echolocating bats to detect and possibly discriminate prey (Fig. 8.3b). Acoustic “glints” in echoes from long constant frequency (CF) bat signals arise from beating wings of flying insects. The glints are characterized by spectral broadening and amplitude peaks that occur in each wingbeat cycle (Schnitzler et al. 1983), and may occur several times in a single echo, depending on the duration of the sonar signal and the insect’s wingbeat frequency. By contrast, echoes reflecting from brief frequency modulated (FM) calls provide the bat with an acoustic “snapshot,” a brief segment of an insect’s wingbeat cycle, because the duration of the FM bat’s signal is shorter than the wingbeat period of even the fastest fluttering insects (Moss and Zagaeski 1994). This means that FM bats must integrate echoes over time if they are to represent the changing profile of dynamic targets.
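The contrast between CF glints and FM snapshots follows from simple arithmetic on signal duration and wingbeat rate. The sketch below is illustrative only: the 50-Hz flutter rate is taken from the recordings in Fig. 8.3a, whereas the CF and FM call durations are assumed round numbers chosen for the example, not values reported in the studies cited above.

```python
def wingbeat_cycles_spanned(signal_duration_s, wingbeat_hz):
    """Number of wingbeat cycles covered by one sonar signal.

    Glints recur in each wingbeat cycle, so this is roughly the number
    of glints a single echo can carry.
    """
    return signal_duration_s * wingbeat_hz


WINGBEAT_HZ = 50.0       # flutter rate used in the Fig. 8.3a recordings
CF_DURATION_S = 0.060    # assumed duration of a long CF signal (illustrative)
FM_DURATION_S = 0.002    # assumed duration of a brief FM call (illustrative)

cf_cycles = wingbeat_cycles_spanned(CF_DURATION_S, WINGBEAT_HZ)
fm_cycles = wingbeat_cycles_spanned(FM_DURATION_S, WINGBEAT_HZ)

print(f"CF echo spans about {cf_cycles:.1f} wingbeat cycles")
print(f"FM echo spans about {fm_cycles:.2f} of one wingbeat cycle")
```

Under these assumptions the CF echo covers several complete wingbeat cycles, whereas the FM echo samples only a small fraction of one cycle, which is why an FM bat would need to combine information across successive echoes.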

Yovel et al. (2011) reviewed studies of sonar echoes from objects in the echolocating bat’s natural environment and proposed how this animal may classify complex sonar stimuli, such as vegetation. Researchers broadcast FM or CF signals, similar to bat echolocation calls, directed at objects from different angles and recorded the echoes. Yovel and colleagues described several models that can be used to classify echoes from different objects (Yovel et al. 2009; Fig. 8.3c). By using statistical models, such as discriminant function analysis (Stilz 2004), or a machine learning classifier (Yovel et al. 2008), it is possible to classify correctly most echoes reflected from different plants. Although statistical methods can correctly classify vegetation from sonar echoes, behavioral experiments must be carried out to explicitly study the animal’s perception and classification of natural objects.

8.2.2 Dolphins

Dolphins are widely distributed throughout the oceans of the world, and individual dolphin species can be found in a variety of environments, but they generally live close to plentiful sources of food (Benoit-Bird and Au 2009). Dolphin groups (pods) have been reported in shallow coastal and riverine environments, where observations tend to be easily made, and animals are accessible by boats or other water craft. For the most part dolphins are a noisy group of animals; they emit whistles, buzzes, clicks, squeals, and a host of other sounds. Dolphins produce broadband biosonar clicks, with energy in the frequency range of about 20–120 kHz, and the sound energy propagates forward from the animal’s head, transmitted from the nasal area of the forehead. Dolphin sonar transmission characteristics are described in other chapters of this volume (e.g., Fenton, Jensen, Kalko, & Tyack, Chap. 2; Simmons & Houser, Chap. 6) (Fig. 8.4).

Fig. 8.4
The dolphin echolocation click train consists of a series of emitted clicks (signals) that usually have an interclick interval exceeding the two-way travel time to the target of interest. The target echo will appear midway between the clicks in a train (plus a few milliseconds). A method of estimating the dolphin attention range is to split the time between two emitted clicks. (a) Echolocation emitted click and target echo (spheroid). The echo is about 2.5 times the duration of the click (about 85 μs). The two-way travel time has been removed for clarity. (b) This panel shows the concept of two-way travel time and the method by which target distance can be estimated from interclick interval (ICI). At a range of about 91 m, the ICI is about 1,400 times the click duration and about 60 times the echo duration
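The relation between target range and two-way travel time can be checked with a short calculation. This is a minimal sketch that assumes a nominal sound speed of 1,500 m/s in seawater (a value not stated in the caption) and treats the interclick interval as approximately equal to the two-way travel time, ignoring the few milliseconds of additional lag.

```python
SOUND_SPEED_SEAWATER_M_S = 1500.0  # assumed nominal value for seawater


def two_way_travel_time_s(range_m, sound_speed_m_s=SOUND_SPEED_SEAWATER_M_S):
    """Time for a click to reach a target at range_m and for the echo to return."""
    return 2.0 * range_m / sound_speed_m_s


def range_from_travel_time_m(travel_time_s, sound_speed_m_s=SOUND_SPEED_SEAWATER_M_S):
    """Target range implied by a given two-way travel time."""
    return sound_speed_m_s * travel_time_s / 2.0


CLICK_DURATION_S = 85e-6   # click duration quoted in the caption
TARGET_RANGE_M = 91.0      # range quoted in the caption

travel_time_s = two_way_travel_time_s(TARGET_RANGE_M)
ratio_to_click = travel_time_s / CLICK_DURATION_S

print(f"Two-way travel time at {TARGET_RANGE_M:.0f} m: {travel_time_s * 1e3:.1f} ms")
print(f"Travel time / click duration: {ratio_to_click:.0f}")
```

With these assumptions the two-way travel time at 91 m is roughly 121 ms, on the order of 1,400 click durations, in line with the figure given in the caption.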

The ocean abounds with flotsam and jetsam, and a dolphin must be able to use its biosonar to determine which echo returns are natural and which are not. Echoes from prey, surrounding obstacles, clutter, the ocean bottom, and the reflective under-surface of the water-air boundary return to the animal in the form of an acoustic cauldron, a mix from which the animal must perceive and extract information relevant to its survival in the natural environment. Research over the past 25 years has documented that dolphins can process complex echo information to determine size, shape, material composition, and other properties of objects (Nachtigall and Moore 1988; Au 1993; Harley et al. 2003), which suggests that dolphin biosonar supports natural auditory scene perception in these animals.

For echolocating dolphins foraging in open water, the ability to detect and track prey targets is of primary importance for capturing fish either alone or in cooperative feeding bouts. Dolphin biosonar is assumed to be a relatively short-range, high-resolution active sensing system, with an operating range on the order of hundreds of meters. In open water, free from reverberation and the interference of cluttering objects, dolphin detection range has been estimated using the noise-limited transient form of the sonar equation (Urick 1983; Au 1993). The detection range of engineered sonar is a function of several variables (Urick 1983):

$$ \mathrm{DT}=\mathrm{SL}-2\,\mathrm{TL}+\mathrm{TS}-\left(\mathrm{NL}-\mathrm{DI}\right), $$

where DT is the detection threshold, SL is the source level, TL is the transmission loss, TS is the target strength, NL is the background noise level, and DI is the receiver directivity index. As it relates to dolphin biosonar, DT, SL, and DI are biologically determined variables and must be estimated based on animal performance in psychoacoustic experiments (see Fig. 8.5; Au 1993, pp. 143–151 for overview; Au et al. 2007).
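To illustrate how a detection range might be obtained from the noise-limited sonar equation, the following sketch solves DT = SL − 2TL + TS − (NL − DI) for range by bisection. Two things here are assumptions rather than content from the studies cited: the one-way transmission loss model (spherical spreading, 20 log10 R, plus an optional linear absorption term) and every numerical value, which are placeholders chosen to give round numbers and are not measurements from any dolphin.

```python
import math


def transmission_loss_db(range_m, absorption_db_per_m=0.0):
    """One-way transmission loss (assumed model): spherical spreading plus absorption."""
    return 20.0 * math.log10(range_m) + absorption_db_per_m * range_m


def sonar_equation_rhs_db(sl, tl, ts, nl, di):
    """Right-hand side of the noise-limited sonar equation: SL - 2TL + TS - (NL - DI)."""
    return sl - 2.0 * tl + ts - (nl - di)


def detection_range_m(sl, ts, nl, di, dt, absorption_db_per_m=0.0,
                      min_range_m=1.0, max_range_m=10000.0):
    """Range at which the right-hand side of the sonar equation falls to DT."""
    def excess_db(range_m):
        tl = transmission_loss_db(range_m, absorption_db_per_m)
        return sonar_equation_rhs_db(sl, tl, ts, nl, di) - dt

    if excess_db(min_range_m) < 0.0 or excess_db(max_range_m) > 0.0:
        raise ValueError("threshold range lies outside the search interval")

    low, high = min_range_m, max_range_m
    for _ in range(60):  # bisection: excess decreases monotonically with range
        mid = 0.5 * (low + high)
        if excess_db(mid) > 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


# Placeholder values (dB), chosen only for illustration.
threshold_range_m = detection_range_m(sl=210.0, ts=-35.0, nl=105.0, di=20.0, dt=10.0)
print(f"Illustrative threshold detection range: {threshold_range_m:.1f} m")
```

Because the two-way loss grows as 40 log10 R under spherical spreading, a 12-dB change in any term on the right-hand side changes the predicted threshold range by a factor of about two in this simplified model.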

Fig. 8.5
Given the noise-limited form of the sonar equation, DT = SL − 2TL + TS − (NL − DI), the threshold detection range of a target can be estimated based on target strength. This figure shows a generalized detection model that predicts the threshold detection range of a fish by an echolocating dolphin in quiet and in noise created by snapping shrimp

Dolphin biosonar has been shown to be highly adaptive, and the animal has control over various aspects of the signal, such as source level, peak frequency, bandwidth, and beam geometry. A variety of environmental conditions, as well as task difficulty, animal age, and experience also influence the animal’s echolocation signals. The biosonar beam is not fixed in either range or cross section and can vary considerably (up to about 32°; Fig. 8.6) on a click-to-click basis (Moore et al. 2008). Even the spectral energy distribution within the beam may vary dynamically between echolocation clicks (Starkhammar et al. 2010).

Fig. 8.6
In this single dolphin click, energy in the 30–40, 40–50, and 50–60 kHz bands shows clustering in two different spatial regions based on the spectral magnitude of the band-limited frequency distribution of energy (color bar shows normalized SL). (Data from Moore et al. 2008, with permission)

Dolphin identification of objects is based on the spectral content of echoes produced by biosonar clicks. Echo spectra vary with an object’s material, size, shape, and whether it is solid or hollow, and dolphins can discriminate these properties using biosonar (Nachtigall and Moore 1988). Several experiments have investigated the ability of the dolphin to integrate, identify, and resolve various spectral cues within a target echo and the ability of dolphins to perceive and detect multiple echo returns (Vel’min and Dubrovskiy 1976; Moore et al. 1984; Au and Banks 1998). A temporal integration window in dolphin echo detection has been suggested and is termed the critical interval (Vel’min and Dubrovskiy 1976). Within this 265-μs interval, all the echo energy appears to be summed, whereas individual echo highlights outside this interval are not. However, Johnson and colleagues (1989) demonstrated that pulse pairs presented within this interval could be discriminated by temporal order, that is, a low-amplitude pulse followed by a high-amplitude pulse versus the reverse; Au and Pawloski (1989) speculated that the relevant cue for this discrimination may have been spectral rippling in echoes (Fig. 8.7).

Fig. 8.7
The stimuli used in a dolphin temporal order discrimination experiment were unequal-amplitude click pairs. Human listeners can discriminate these stimuli when the clicks are separated by only a few milliseconds. A spectral analysis of these stimuli shows that reversing the temporal order of the pairs has no effect on the power spectrum, and the discrimination was assumed to arise from phase sensitivity. For the dolphin, however, the cue for discrimination was asserted to be the ripple in the power spectrum. (a) The click-pair stimuli used by Johnson et al. (1989) with a large-amplitude click 200 μs before a small-amplitude click. (b) The resulting power spectra of the large click, the small click, and the ripple effect of combining the two clicks
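Both properties of the click-pair power spectrum noted in the caption, its insensitivity to temporal order and its ripple, can be reproduced with an idealized calculation. The sketch below assumes each click is an ideal impulse, so the pair has the spectrum a + b·exp(−j2πfΔ); the amplitude ratio is a hypothetical value, and only the 200-μs separation comes from the caption.

```python
import math


def click_pair_power(first_amp, second_amp, delay_s, freq_hz):
    """Power spectrum of two ideal impulses separated by delay_s.

    Equals |first_amp + second_amp * exp(-j * 2 * pi * f * delay)|**2.
    """
    phase = 2.0 * math.pi * freq_hz * delay_s
    real = first_amp + second_amp * math.cos(phase)
    imag = -second_amp * math.sin(phase)
    return real * real + imag * imag


DELAY_S = 200e-6                 # click separation from the caption
LARGE_AMP, SMALL_AMP = 1.0, 0.3  # hypothetical relative amplitudes

freqs_hz = [khz * 1000.0 for khz in range(20, 121)]  # 20-120 kHz in 1-kHz steps
large_first = [click_pair_power(LARGE_AMP, SMALL_AMP, DELAY_S, f) for f in freqs_hz]
small_first = [click_pair_power(SMALL_AMP, LARGE_AMP, DELAY_S, f) for f in freqs_hz]

# Reversing the order leaves the power spectrum unchanged.
max_difference = max(abs(p - q) for p, q in zip(large_first, small_first))

# Spectral peaks (and troughs) repeat every 1/delay in frequency.
ripple_spacing_hz = 1.0 / DELAY_S

print(f"Largest power difference between the two orders: {max_difference:.2e}")
print(f"Ripple spacing: {ripple_spacing_hz / 1000.0:.1f} kHz")
```

In this idealization the two orders differ only in phase, while the ripple spacing of 5 kHz depends solely on the 200-μs separation between the clicks.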

Helweg et al. (2003) examined complex multi-highlight echoes and found that a dolphin’s discrimination performance was high both when multiple complex echo highlights occurred within a single integration window and when these highlights were distributed across many integration intervals. These results, taken together, indicate that dolphins have the ability to isolate and process brief acoustic events of lower amplitude echo highlights, while rejecting higher amplitude highlight features, a process that is adaptive for discrimination in reverberant environments, which prevail in the animal’s natural ocean niche. Previous dolphin echolocation research advanced the notion that each emitted echolocation click is triggered by the preceding click’s echo return (i.e., emit a click, wait for the echo, emit the next click). A few other, less well-known observations indicated that at long detection ranges dolphins emitted closely timed packets of clicks. Ongoing investigations by Finneran (2013) and associates studying long-range dolphin echo detection and discrimination indicate that the dolphin can change its click emission strategy. As the target range is extended beyond 75–100 m, the animal may switch to packet emissions or may simply increase the repetition rate of the overall click train, so that emitted clicks and returning echoes overlap. These results further complicate the question of exactly how the animal integrates echo information, and they present exciting avenues of new research into how dolphins perceive the auditory scene via echolocation.

8.3 Studies of Auditory Scene Analysis in Echolocating Animals

Over the last several decades, echolocation research in bats and dolphins has detailed the acoustic cues used to localize and discriminate sonar targets (see Moss and Schnitzler 1995); however, there remains an incomplete understanding of the larger problem of auditory scene analysis, namely how echo features from the natural environment are perceptually organized in the animal’s sonar receiver. Auditory scene analysis in echolocating animals may involve the combination of passive listening (e.g., communication calls and sonar signals of conspecifics and other natural sounds in the habitats) and active sonar. Here we emphasize the perceptual organization of sounds generated through echolocation.

8.3.1 Bats

Some psychophysical studies have revealed components of echo perception in bats that contribute to auditory scene analysis. The greater horseshoe bat, Rhinolophus ferrumequinum, for example, produces long CF signals, preceded and followed by FM sweeps. This bat species exhibits Doppler shift compensation as it flies (Schnitzler 1968), adjusting the frequency of its sonar emissions with its flight velocity, to ensure that Doppler shifted echoes fall in the region of its highest hearing sensitivity and frequency selectivity (Long and Schnitzler 1975). Doppler shift compensation behavior allows the bat to hear amplitude and frequency modulations in CF echoes introduced by fluttering insect prey, and the greater horseshoe bat can discriminate small changes in wingbeat rate (von der Emde and Menne 1989). Moreover, this bat species can recognize fluttering insects from novel echoes, suggesting that it represents complex acoustic patterns as an auditory object (von der Emde and Schnitzler 1990; Fig. 8.8).
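The logic of Doppler shift compensation can be sketched with the standard two-way Doppler relation for a bat flying directly toward a stationary reflector, f_echo = f_emit (c + v)/(c − v). The numbers are assumptions for illustration: the 83-kHz reference frequency is borrowed from the stimulus described in Fig. 8.3a, and the sound speed in air and the flight speed are nominal values rather than measurements from the studies cited.

```python
SOUND_SPEED_AIR_M_S = 343.0  # assumed nominal sound speed in air


def echo_frequency_hz(emitted_hz, flight_speed_m_s, sound_speed_m_s=SOUND_SPEED_AIR_M_S):
    """Echo frequency heard by a bat flying straight toward a stationary target."""
    return emitted_hz * (sound_speed_m_s + flight_speed_m_s) / (sound_speed_m_s - flight_speed_m_s)


def compensated_emission_hz(reference_hz, flight_speed_m_s, sound_speed_m_s=SOUND_SPEED_AIR_M_S):
    """Emission frequency that returns the echo at the reference frequency."""
    return reference_hz * (sound_speed_m_s - flight_speed_m_s) / (sound_speed_m_s + flight_speed_m_s)


REFERENCE_HZ = 83000.0   # assumed reference frequency (stimulus value in Fig. 8.3a)
FLIGHT_SPEED_M_S = 5.0   # assumed flight speed (illustrative)

uncompensated_echo_hz = echo_frequency_hz(REFERENCE_HZ, FLIGHT_SPEED_M_S)
lowered_emission_hz = compensated_emission_hz(REFERENCE_HZ, FLIGHT_SPEED_M_S)
compensated_echo_hz = echo_frequency_hz(lowered_emission_hz, FLIGHT_SPEED_M_S)

print(f"Echo without compensation: {uncompensated_echo_hz / 1000.0:.2f} kHz")
print(f"Lowered emission:          {lowered_emission_hz / 1000.0:.2f} kHz")
print(f"Echo with compensation:    {compensated_echo_hz / 1000.0:.2f} kHz")
```

Under these assumptions, an uncompensated echo would return about 2.5 kHz above the reference frequency, and lowering the emission by a similar amount brings the echo back to the reference.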

Fig. 8.8
Illustration of playback experiment showing the choices of four greater horseshoe bats (Rhinolophus ferrumequinum, RF1–RF4) between different insect echoes (von der Emde and Schnitzler 1990). The bat chose between the echo of the insect, Tipula, turned with its side toward the bat (Tipula 90°) and the echo of another insect species facing in one of three different directions (0°, 90°, and 180°). Each bar in the horizontal histograms shows the percentage of trials the bat chose a certain echo playback. All four individual bats showed a 90 % preference for the echo to which they were trained (Tipula, 90°)

Simmons et al. (1990) conducted a series of psychophysical experiments on the FM bat, Eptesicus fuscus, which suggest that this species converts spectral information in sonar echoes from complex targets to represent the underlying spatial separation of closely spaced reflecting surfaces. In an echo playback experiment, bats were trained to discriminate two-glint echoes, separated by 100 μs, and a single-glint echo. The two-glint playback simulated the reflection from two surfaces, separated by ~1.75 cm and contained spectral notches at 10-kHz intervals, created by the 100 μs offset of its component echoes; the single-glint echo simulated the return from a point target. The delay and attenuation of the two-glint echoes remained fixed across trials, but the delay and attenuation of the single-glint echo changed. Bats showed an increase in errors when the delay of the single-glint target coincided with the arrival time of either the first or second echo of the two-glint target, and these errors also depended on the amplitude of the single-glint target, because time-intensity trading influenced the bat’s perception of the single-glint’s target range. The results of this study suggest that the bat converts the spectral information carried by the interference pattern of the overlapping echoes of the two-glint target into the underlying delay or distance separation of two reflecting surfaces.
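The correspondence between glint delay, spectral notch spacing, and surface separation described above can be verified numerically. This sketch assumes two equal-amplitude, same-polarity echoes (so that nulls fall at odd multiples of 1/(2Δt)), a nominal sound speed in air of 343 m/s, and a 20–100 kHz analysis band; none of these values is taken from the original study.

```python
SOUND_SPEED_AIR_M_S = 343.0  # assumed nominal sound speed in air


def notch_spacing_hz(glint_delay_s):
    """Frequency spacing between adjacent spectral notches of a two-glint echo."""
    return 1.0 / glint_delay_s


def notch_frequencies_hz(glint_delay_s, f_min_hz, f_max_hz):
    """Notch frequencies for two equal-amplitude, same-polarity glints (assumed)."""
    notches = []
    k = 0
    while True:
        freq = (2 * k + 1) / (2.0 * glint_delay_s)  # odd multiples of 1/(2*delay)
        if freq > f_max_hz:
            break
        if freq >= f_min_hz:
            notches.append(freq)
        k += 1
    return notches


def surface_separation_m(glint_delay_s, sound_speed_m_s=SOUND_SPEED_AIR_M_S):
    """Range separation of two surfaces whose echoes differ by glint_delay_s."""
    return sound_speed_m_s * glint_delay_s / 2.0


GLINT_DELAY_S = 100e-6  # two-glint separation used in the playback experiment

spacing_hz = notch_spacing_hz(GLINT_DELAY_S)
notches_hz = notch_frequencies_hz(GLINT_DELAY_S, 20000.0, 100000.0)  # assumed band
separation_m = surface_separation_m(GLINT_DELAY_S)

print(f"Notch spacing: {spacing_hz / 1000.0:.0f} kHz")
print("Notches (kHz):", [round(f / 1000.0) for f in notches_hz])
print(f"Surface separation: {separation_m * 100.0:.2f} cm")
```

The 100-μs offset yields notches every 10 kHz and a surface separation of about 1.7 cm with the assumed sound speed, consistent with the approximate 1.75 cm quoted in the text.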

When an echolocating bat forages in an acoustically complex environment, each sonar vocalization results in a cascade of echoes arriving from different directions and distances. Further, the bat’s position changes between successive echolocation calls and echoes. If the bat were to respond to single echoes, it would surely fail to intercept moving insects, particularly in the presence of obstacles, such as vegetation and other flying bats. Success requires that the bat integrate the features of echoes over time and use this information to plan appropriate motor behaviors for prey interception and obstacle avoidance. Moss and Surlykke (2001) studied the echolocating big brown bat’s ability to integrate delay information across echoes. In a two-alternative forced choice echo playback experiment, bats were required to discriminate between orderly sequences of echoes with decreasing or increasing delay and random sequences of echoes containing the same delay values (Fig. 8.9a, b). It was not surprising that big brown bats could successfully perform a task that required integration of echo delay information over time (Fig. 8.9c), but it was important to establish that the bat’s sonar receiver supports this basic operation required for auditory scene analysis.

Fig. 8.9
Experimental setup and performance of two big brown bats (Eptesicus fuscus) in an echo integration experiment. (From Moss, C. F., & Surlykke, A. Auditory scene analysis by echolocation in bats. Journal of the Acoustical Society of America [2001] 110, 2207–2226; reprinted with permission from the Acoustical Society of America.) (a) The setup of the echo integration experiment. The bat was trained to sit on the Y-shaped platform and produce echolocation calls into a 1/8-in. microphone (m). The echolocation signals were amplified, band-pass filtered, digitized, electronically delayed, attenuated, low-pass filtered, and broadcast back to the bat through a loudspeaker (s). (b) Two sets of delay patterns were used: sequential (S) echo playbacks and random (R) echo playbacks. (c) Performance of two bats (G-6 and M-6) trained in this experiment to discriminate sequential and random echo playback signals. The dashed line in each panel indicates 75 % correct, the criterion for a successful discrimination. Each block represents trials recorded on different days

A recent study uncovered an important feature of FM-bat sonar that would support auditory scene analysis in complex, cluttered environments, where echoes from vegetation could interfere with the perception of on-axis sonar prey. Bats actively lock the sonar beam pattern axis on selected targets (Ghose and Moss 2003; Surlykke et al. 2009), which results in off-axis echo returns from other objects. Bates et al. (2011) noted that off-axis echoes contain less energy at higher frequencies than on-axis echoes, due to the directional characteristics of the sonar beam, the head-related transfer function, and spherical spreading losses. This means that temporal registration of the fundamental and higher harmonics would be disrupted in the bat’s sonar receiver, due to time-intensity trading. In other words, the second harmonic of the echo would be represented in the bat’s sonar receiver at a longer delay than the fundamental, because weaker sounds evoke activity in the auditory system at longer latencies than stronger sounds (Simmons et al. 1990). Although the fundamental and harmonics arrive at the same time at the bat’s ear, the auditory system would register a temporal offset of the weaker higher harmonics. Bates et al. (2011) measured the bat’s range discrimination performance when the timing of the fundamental and second harmonic was electronically offset, and they found that the bat’s perception of distance was degraded. They interpret this finding to suggest that off-axis echoes from clutter objects would be defocused, which would minimize their masking effect on selected on-axis target echoes. In effect, on-axis targets are represented as sharp, and off-axis targets are represented as blurred.

Advances in technology have enabled real-time playbacks of modified sonar calls to simulate echoes from simple and complex targets at different distances. Although research using these methods has advanced our understanding of the resolution of biological sonar systems, the methods are not suitable for the study of auditory scene analysis. The microphones and loudspeakers that comprise these playback systems are echo reflectors that interfere with the animal’s perception of the simulated echoes of a complex environment by reducing the perceptual salience of phantom target echoes. Indeed, this realization led Cynthia Moss and colleagues to shift research efforts on the bat’s perception of complex auditory scenes from phantom target playback studies to quantitative analyses of adaptive sonar behaviors. The bat’s adaptive sonar behavior provides indirect information about its perception, because changes in sonar vocalizations indicate the information an animal has processed from echoes and what information it seeks. Therefore, studies of the bat’s adaptive sonar behavior provide a window to the animal’s perception of complex echo environments.

Bat echolocation engages adaptive sonar behaviors that contribute directly to accurate localization and tracking of objects. The features of sonar calls produced by a bat to probe its environment directly impact the information available to its acoustic imaging system. In turn, the bat’s perception of the auditory scene influences the features of subsequent sonar vocalizations. The bat’s adjustments in echolocation call parameters, such as duration, interval, direction, and spectrum, provide insight to the acoustic information used to solve the perceptual problem of sorting and tracking echoes arriving from different directions and distances.

The sonar beam patterns of echolocating bats are directional and vary with sound frequency and across species (Hartley and Suthers 1989). Nasal emitters produce complex beam patterns that are shaped by the noseleaf (Schnitzler and Grinnell 1977; Vanderelst et al. 2010). The sonar beam patterns of oral emitters typically show less complex spatial profiles but can contain more than one lobe. Laboratory studies have documented that oral echolocators accurately adjust the directional aim of sonar calls to sequentially inspect closely spaced objects (Moss et al. 2011). In one study that required the big brown bat to engage in both obstacle avoidance and prey capture, the animal sequentially pointed its sonar beam axis at the edges of a net opening to find its way through an obstacle and gain access to a food reward (Surlykke et al. 2009). This FM bat also adjusted the duration of its echolocation calls to avoid overlap between its sonar vocalizations and echoes from the objects it was inspecting. When the big brown bat shifted its sonar gaze to more distant objects, it tolerated overlap between calls and echoes from nearby obstacles (Surlykke et al. 2009).

In a target discrimination study, free-flying big brown bats exhibited similar adjustments in call direction and duration to sequentially inspect small tethered objects with different textures (Falk et al. 2011). Recordings of sonar returns from the textured objects showed echo-to-echo variation in spectrum, with different patterns of change for each of the textured stimuli. The larger the differences in echo-to-echo spectral profile between stimuli, the higher the bat’s target discrimination performance. This finding suggests that the bat listened to changes in echo profiles over time to perform the texture discrimination task. Further research is needed to better understand the bat’s perception of target texture through echolocation.

When bats fly in groups, they face the challenge of sorting their own calls and echoes from those of conspecifics, and they must adopt strategies to avoid sonar jamming (Ulanovsky et al. 2004; Gillam et al. 2007; Fig. 8.10a, b). Laboratory experiments showed that individual big brown bats adjusted the start and end frequencies of their FM sweeps, along with FM bandwidth, when they were foraging with another bat in a large flight room (Surlykke and Moss 2000; Chiu et al. 2009). The magnitude of the vocal adjustment depended on the baseline similarity of the bats’ calls (when they flew alone) and their spatial separation in the room: Bats with similar baseline call designs made adjustments in spectral call characteristics, and bats with different baseline call designs maintained spectral differences without adjusting their sonar signals. Bats made the largest adjustments in the spectral characteristics of their calls when they flew less than 1 m apart (Chiu et al. 2009). It is also noteworthy that bats in this competitive foraging situation sometimes went silent (ceased vocalizing for at least 200 ms), and this behavior appears to be driven at least in part by jamming avoidance. A bat showed the most silent behavior when its baseline call design was similar to that of its competitor and when the two bats flew less than 1 m apart (Chiu et al. 2008), a result that parallels the spectral adjustment data (Moss et al. 2011). It is possible that the silent bat listened passively to the calls produced by the vocalizing bat, and these signals contributed to this animal’s perception of the auditory scene.
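
The 200 ms criterion for silent behavior lends itself to a simple operational definition. The sketch below is hypothetical and is not the analysis pipeline of Chiu et al. (2008). It assumes that call onset times for one bat are available in milliseconds, and it approximates a silent period by the onset-to-onset interval, which slightly overestimates silence because call duration is ignored.

```python
from typing import List, Tuple

# Criterion stated in the text: no vocalization for at least 200 ms.
SILENCE_THRESHOLD_MS = 200.0


def find_silent_periods(
    call_onsets_ms: List[float],
    threshold_ms: float = SILENCE_THRESHOLD_MS,
) -> List[Tuple[float, float]]:
    """Return (start, end) pairs of consecutive call onsets separated by at
    least `threshold_ms`. Onset-to-onset spacing is a simplifying assumption;
    a stricter analysis would measure from call offset to the next onset."""
    onsets = sorted(call_onsets_ms)
    return [
        (earlier, later)
        for earlier, later in zip(onsets, onsets[1:])
        if later - earlier >= threshold_ms
    ]


if __name__ == "__main__":
    # Made-up onset times (ms) for illustration only.
    example_onsets = [0.0, 50.0, 100.0, 350.0, 400.0]
    print(find_silent_periods(example_onsets))
```

Counting such periods separately for each bat, and relating them to inter-bat distance and baseline call similarity, would be one way to quantify the pattern described above.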

Fig. 8.10

Echolocating bats adjust their call frequency to avoid signal jamming by conspecifics. (a) The bat (Tadarida brasiliensis) adjusted its call frequency in response to playback signals similar in frequency to its own calls. (From Gillam et al. (2007), with permission from Proceedings of the Royal Society of London.) The dashed line indicates the time at which the playback signals (a frequency of 24.3 kHz) were switched on. This is an exemplary trial from the recordings. (b) A recording from two big brown bats (Eptesicus fuscus) flying together. (Adapted from Surlykke, A., & Moss, C. F. Echolocation behavior of the big brown bat, Eptesicus fuscus, in the field and the laboratory. Journal of the Acoustical Society of America [2000] 108(5), 2419–2429; reprinted with permission from Acoustical Society of America.) Before 1.5 s, one bat was calling at a relatively low frequency (indicated by the triangle) and the other bat was calling at a relatively high frequency. After 1.5 s, the low-frequency bat stopped calling and the high-frequency bat lowered its call frequency.

Field experiments under multiple-bat flight conditions also indicated that bats change the temporal or spectral features of their echolocation calls in order to avoid signal interference from conspecifics. Past studies have demonstrated that when flying with conspecifics, several bat species (big brown bats, Eptesicus fuscus, and free-tailed bats, Tadarida brasiliensis and T. teniotis) shift their call frequencies either upward or downward to avoid possible call interference and overlap with neighboring individuals (Ulanovsky et al. 2004; Gillam et al. 2007; Bates et al. 2008). A playback experiment showed that T. brasiliensis raised the end frequency of the FM sweep in response to playback jamming signals whose frequencies were equal to the average end frequencies of this species’ sonar calls (Gillam et al. 2007; Fig. 8.10a). Both laboratory and field studies have identified a strategy in which the bat increases differences between its own sonar call design and those of conspecifics when flying in groups. This strategy presumably helps the bat segregate its own echolocation pulses from those of others in proximity.

Studies of the echolocating bat’s adaptive vocal behaviors provide a window to the acoustic information an animal has processed and the information it is seeking from its environment. Quantitative analysis of adaptive sonar behavior may therefore contribute to our understanding of auditory scene analysis by echolocation. It is important, however, to caution the reader that inferences made from adaptive sonar behavior are not direct measurements of perception, and other research methods, such as psychophysical tasks, can generate complementary data that serve to deepen our understanding of natural scene perception.

8.3.2 Dolphins

The dynamic quality of the dolphin echolocation signal, coupled with the animal’s ability to perceive and discriminate changes in the acoustic information contained within the echolocation frequency range, makes these signals well suited for auditory scene analysis. Although there is almost no direct information on the ability of the dolphin echolocation system to perform auditory scene analysis as described by Bregman (1990) for human listeners, one observation of auditory stream segregation in dolphins has been reported (Moore and Finneran 2011). We know that dolphins change signal level and frequency to overcome masking stimuli, and they shift signal frequency with age, presumably due to hearing loss (Moore et al. 2004; Kloepper et al. 2010). We know little or nothing about why or how multiple animals adjust their signals when echolocating during foraging because it is extremely difficult to monitor and record individual animals accurately in the open ocean.

When dolphins echolocate in either a detection or discrimination experiment, they emit click trains that comprise hundreds of dynamically changing signals. In the past it has been nearly impossible to determine exactly what causes the changes in individual emitted clicks in the train. However, using new high-speed phantom echo generators (PEGs; Finneran et al. 2010), experimenters can track each click in the train and precisely time when a target appears (detection) or when a target changes (discrimination). With this precision, it is possible to examine how the animal changes its signals in response to the changing target stimuli. These experiments are now underway and may yield a much better understanding of the signal emission strategies of echolocating dolphins. In one phantom target discrimination study, which followed the methods of Moss and Surlykke (2001) with an echolocating dolphin, a series of phantom echoes, representing a sphere target, was programmed to systematically approach, recede, or appear randomly along the range axis (Moore and Finneran 2011). Each emitted echolocation signal triggered a target echo return that was presented to the dolphin’s lower jaw via a jaw-phone (a suction-attached transducer at the acoustic window). The results clearly indicated discrimination between the systematically approaching/receding echo streams and the random stream, demonstrating both the ability to integrate information over time and short-term memory for acoustic events. This result parallels that reported by Moss and Surlykke (2001) for the echolocating bat. Other indirect evidence of stream segregation for auditory scene analysis in dolphins is suggested by their ability to perform echolocation delayed-match-to-sample (DMTS) tasks. Roitblat et al. (1990) reported an experiment of dolphin echolocation DMTS that required a blindfolded dolphin to correctly choose, from a selection of three objects, a previously presented object. The ability to perform this task at 90% or better clearly demonstrates the animal’s ability to engage short-term memory for complex target returns. These studies provide evidence that the dolphin biosonar system possesses, at the minimum, the rudimentary requirements for auditory scene analysis.

8.3.2.1 The Littoral Ocean (Noisy, Reverberant, and Cluttered)

Most dolphins live in and along the coastal regions in shallow waters, bays, estuaries, and riverine environments, which are very noisy, notoriously non-Gaussian and non-stationary (Urick 1975). Dolphins have evolved biosonar that is adapted for this noisy, reverberant, and highly cluttered environment.

Dolphin biosonar has been shown to be a highly refined acoustic sense that these animals use for detection and discrimination of targets (see Simmons & Houser, Chap. 6). For dolphins hunting prey in very shallow water, or prey that is buried beneath the ocean bottom, reverberation plus noise imposes limits on the perception of returning echoes. To be successful in the detection of targets, dolphins must be able to overcome competing returns from varying bottom composition and inhomogeneities, distortion due to thermal discontinuities in the medium, losses due to absorption in the ocean bottom, and clutter and other biological sources. This detection scenario is much more complicated than in open water (Moore 1997; Houser et al. 2005; Martin et al. 2005; see Fig. 8.11).

Fig. 8.11

The search strategy of a dolphin as it swims an underwater path around a red float located several hundred meters from a boat. (a) Source level in dB re 1 μPa and interclick interval in milliseconds for a target search. The arrows indicate the point at which the dolphin whistles, indicating detection of the target. (b) Virtual reality rendering, viewed from animal depth, of the target search path as a series of white dots. Data sensors included full three-dimensional position data (heading, pitch, roll, acceleration, angular rates, depth, and velocity). Data collected from the sensor pack are geo-located with a virtual rendering created after the search. (See Martin et al. 2005 and Houser et al. 2005.)

Resolving sonar targets in high-density clutter, such as fish camouflaged by inshore kelp forests, or the detection of moving prey, either down range of the main sonar beam or cross-range of the beam, is an important capability demonstrated by dolphins. It is noteworthy that even when the target is coplanar with the cluttering objects, the dolphin can detect the target when the target echo to clutter backscatter ratio is approximately 0.25 dBpkpk (the subscript pkpk refers to the peak-to-peak value of the emitted signal) (Au and Moore 1984). Altes et al. (2003) carried out a psychophysical study to test the hypothesis that dolphins combine echoes to improve signal detectability in noisy, cluttered, and reverberant conditions and to determine the best receiver model accounting for the dolphin’s performance. They point out that if a moving echolocating animal has the ability to sum echo samples from the same point (target) using different pulse-echo pairs, this ability could be electronically simulated as a synthetic aperture sonar process. In this study, an echolocating dolphin detected a 50-kHz, 80-μs sinusoid pulse presented at 7 m range in noise. A pulse was delivered for each echolocation click emitted by the animal. The detection threshold in noise as a function of the number of delivered pulses was determined for N = 1, 4, 8, and 16 pulses. They found that for the dolphin’s acoustic environment, the binary M-out-of-N detection model closely matched the dolphin’s detection performance, but the data were a poor fit for a linear or energy summation model. The binary M-out-of-N detector model is a basic building block for neural all-or-none signals (binary action potentials). The Altes et al. (2003) study supported an earlier premise that dolphins may use the ensemble of echo returns to discriminate target attributes (Moore and Pawloski 1990; Roitblat et al. 1990). Using only the echoes collected from a dolphin performing a DMTS task in the noise of Kaneohe Bay, Hawaii, these investigators found that a neural network that performed an averaging of returned echo spectra (see Fig. 8.12) could classify simple target shapes as well as the dolphin when echoes had a good signal-to-noise ratio.
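
The binary M-out-of-N detection rule mentioned above can be illustrated with a short calculation. The sketch below is a textbook formulation, not the receiver model fitted by Altes et al. (2003). It assumes that each of the N pulse-echo pairs yields an independent yes/no decision with the same single-pulse detection probability, and that the target is reported when at least M decisions are "yes". The function name and the example values of M and the per-pulse probability are hypothetical.

```python
from math import comb


def m_out_of_n_detection_prob(p_single: float, m: int, n: int) -> float:
    """Probability that at least m of n independent binary decisions are
    positive, each with probability p_single (binomial upper tail)."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie in [0, 1]")
    if not 0 <= m <= n:
        raise ValueError("m must satisfy 0 <= m <= n")
    return sum(
        comb(n, k) * p_single**k * (1.0 - p_single) ** (n - k)
        for k in range(m, n + 1)
    )


if __name__ == "__main__":
    # N values follow the pulse counts in the text; p_single and the choice
    # M = N // 2 (at least 1) are assumptions made purely for illustration.
    p_single = 0.5
    for n in (1, 4, 8, 16):
        m = max(1, n // 2)
        prob = m_out_of_n_detection_prob(p_single, m, n)
        print(f"N={n:2d}, M={m:2d}: P(detect)={prob:.4f}")
```

Unlike an energy summation model, this rule discards the amplitude of each return after the per-pulse decision, which is why it maps naturally onto all-or-none neural signals.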

Fig. 8.12

Time-aligned successive target echo spectra (glycerin-filled fuel bottle) suspended in front of a blindfolded dolphin (time between echoes removed). The target echo is seen to emerge from the background noise as the train of echoes progresses and can be seen in the center of the echo train (arrows indicate both echo number and frequency range; see Moore et al. 1991).
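
The way a stable target spectrum can emerge from noise when successive echo spectra are averaged may be illustrated with simulated data. The sketch below is a toy demonstration, not the neural network of Moore et al. (1991) or Roitblat et al. (1990). It assumes a fixed spectral template, additive independent Gaussian noise in each frequency bin, and hypothetical function names.

```python
import math
import random
from typing import List


def simulate_noisy_echo_spectra(template: List[float], n_echoes: int,
                                noise_sd: float, seed: int = 0) -> List[List[float]]:
    """Generate n_echoes copies of `template`, each with independent
    Gaussian noise added to every frequency bin (an assumed noise model)."""
    rng = random.Random(seed)
    return [[value + rng.gauss(0.0, noise_sd) for value in template]
            for _ in range(n_echoes)]


def average_spectra(spectra: List[List[float]]) -> List[float]:
    """Bin-by-bin mean of equal-length spectra."""
    if not spectra:
        raise ValueError("at least one spectrum is required")
    n_bins = len(spectra[0])
    if any(len(spectrum) != n_bins for spectrum in spectra):
        raise ValueError("all spectra must have the same number of bins")
    return [sum(spectrum[i] for spectrum in spectra) / len(spectra)
            for i in range(n_bins)]


def rms_error(estimate: List[float], template: List[float]) -> float:
    """Root-mean-square difference between an estimate and the template."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, template))
                     / len(template))


if __name__ == "__main__":
    # A flat 200-bin template keeps the example simple; real echo spectra
    # have structure that depends on the target.
    template = [0.0] * 200
    spectra = simulate_noisy_echo_spectra(template, n_echoes=64, noise_sd=1.0, seed=1)
    print("single echo RMS error:", rms_error(spectra[0], template))
    print("64-echo average RMS error:", rms_error(average_spectra(spectra), template))
```

Under this noise model the error of the averaged spectrum shrinks roughly with the square root of the number of echoes, which is one simple reason an ensemble of returns can support classification that a single noisy echo cannot.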

Au (1994) and Au et al. (1995) advanced other models that combined temporal and spectral information over an echolocation train in a neural network model with noisy echoes as exemplars. Branstetter et al. (2007) advanced these models of dolphin echo representation with one that incorporates both spectral and temporal resolution of the dolphin auditory system based on processes which have been demonstrated from dolphin psychoacoustic results. Although it is unreasonable to suggest that these models fully represent the underlying processes occurring in dolphin echo recognition, these investigators have applied these models to motivate further psychoacoustic investigations and to better understand the processes that may be at play in the wider arena of auditory scene analysis.

8.3.2.2 Tracking Prey in the Presence of Conspecifics

Dolphins are cooperative foragers that sometimes employ scout animals to locate prey. Different species use varying techniques to herd or crowd fish into a small confined area to allow individuals to catch them more easily. Field observations of spinner dolphins indicate a high degree of synchronization, with almost immediate transitions of discrete hunting behaviors between animals over large distances where water clarity and light levels (hunting at dusk or night) would make visual cues highly unlikely (Benoit-Bird and Au 2009). Although the capture of individual prey by feeding animals may be aided by bioluminescence (produced by living organisms in the water), recordings of hunting and feeding bouts suggested that between feeding bouts these dolphins were using click signals, not whistles, to coordinate behaviors between animal groups. Click signals (assumed to be biosonar related) would meet the requirements for this kind of inter-animal coordination of behavior: they are highly directional, have a wide bandwidth (and thus large information-carrying capacity), and allow selective communication between individual animals or groups of animals (Lammers et al. 2003). In addition, the dolphin’s impressive passive listening capability surely must play a pivotal role in monitoring conspecifics during social and cooperative hunting behaviors. Xitco and Roitblat (1996) conducted an experiment that demonstrated that echolocating dolphins could “eavesdrop” on the target echoes returning to conspecifics. They showed that the listening dolphin could perform a matching-to-sample task above chance by using a “champion’s” echolocation signals. Until more studies like the Benoit-Bird and Au (2009) and the Xitco and Roitblat (1996) investigations are conducted, we must continue to speculate on the auditory scene of echolocating dolphins and the behavioral dynamics of their hunting behavior in the wild.

8.4 Challenges and Future Direction for the Study of Auditory Scene Analysis in Bats and Dolphins

Although bats and dolphins are both echolocators, they operate under vastly different conditions, aerial and aquatic (see Madsen & Surlykke, Chap. 9), and the challenges we face in designing experiments to study their auditory perception are not the same. These two groups of animals exhibit different natural behaviors, which should be taken into consideration when designing studies of auditory scene analysis. Some dolphins and bats are cooperative foragers, and some search for food individually. While foraging in groups, both bats and dolphins face the challenge of sorting out their own calls from others. Echolocating bats employ several strategies, including shifts in call frequency, changes in call duration, and increased pulse intervals, to avoid signal jamming with conspecifics. Dolphins, animals that forage in large groups, produce highly directional, wide bandwidth click signals to hunt for fish, but the strategy they use to avoid signal interference is not yet clear. It is possible that group foragers may use the sonar signals produced by neighbors to guide their prey search and orientation, rather than develop a strategy to avoid interference between their own signals and those of others. Cooperative sonar behavior is a field with many open questions and opportunities for future research.

Both bat and dolphin researchers have studied echoes reflecting from a variety of objects and conducted behavioral experiments to investigate the animal’s ability to discriminate among different objects by echolocation. It has been demonstrated that both bats (Simmons et al. 1974; Habersetzer and Vogler 1983; Falk et al. 2011) and dolphins (Moore et al. 1984; Au and Pawloski 1992; Au 1993) can discriminate object structure and they can both integrate echo information over time (Moss and Surlykke 2001; Moore and Finneran 2011). Some researchers also have built models to understand how echolocators use biosonar to perceive their environment (Moore et al. 1991; Branstetter et al. 2007; Yovel et al. 2008). Modeling efforts in the field of echolocation present exciting challenges.

The sonar beam patterns produced by echolocating animals are directional, and therefore bats and dolphins can direct their sonar beam to inspect objects of interest. Studies of target range discrimination in bats suggest that resolution is highest along the central axis. Therefore, bats can maximize information from objects of interest and minimize clutter interference through directional control of the sonar beam (Bates et al. 2011). Although it has been shown that dolphins can detect objects in reverberation and heavy clutter (Au and Turl 1983), an open question remains as to exactly how dolphins use their highly directional sonar beam to minimize masking by interfering echoes.

In our review of psychophysical studies of sonar perception, we note the limitations of understanding auditory scene analysis by echolocation when the animal may be constrained to a limited repertoire of emitted signals, and listening to a mix of simulated and real echoes that can compromise the perceptual salience of the experimental setting. In addition, changes in the animal’s head aim that would normally result in large changes in echo features may not be represented in playback echoes and therefore fail to fully capture 3D elements of natural sonar objects. We have learned from psychophysical studies a great deal about the limits of the echolocating animal’s echo processing, but there remains much to understand about higher level perceptual processes that contribute to auditory scene analysis.

By contrast, adaptive motor studies are better suited to fully engage the animal in more natural behaviors in which it dynamically modulates its sonar calls in response to echo returns from the environment. Adaptive sonar behaviors are an integral component of echolocation systems that would be expected to feed into auditory scene analysis processes. However, adaptive motor studies have not provided a direct measure of the animal’s perception of a complex, natural environment. Instead, auditory perception can only be inferred from the animal’s adaptive motor behaviors. Future research on auditory scene analysis by echolocation must embrace the challenge of marrying the advantages of psychophysical and adaptive motor studies, taking creative new approaches to tap into an animal’s perception of its complex, 3D auditory world, while allowing it to engage in its natural behaviors.