1 Introduction

Most people reading this chapter will have had first-hand experience trying to converse in noisy social gatherings, such as in a popular bar or at a professional sporting event. Difficulty understanding speech in these sorts of social environments is aptly named the “cocktail party problem” (Cherry 1953) in the literature on human hearing and speech communication (reviewed in Bronkhorst 2000; McDermott 2009). Acoustic communication in noisy environments like cocktail parties places heavy demands on receivers’ abilities to detect and discriminate among signals (Chap. 2). However, communicating in noisy environments requires that receivers do more than merely detect a signal’s presence and determine whether two or more signals differ. The problem we and other animals have hearing in such environments—and indeed in any environment where there are multiple concurrent sound sources—stems ultimately from the physics of sound. Unlike objects in a visual scene, which can occlude light emitted or reflected by background objects, the sound sources in an “acoustic scene” generate sound pressure waves that add linearly to form a single, composite waveform that impinges on the ears. A primary function of the auditory system is to organize this sensory input into coherent perceptual representations of the various sound sources present in the environment (Bregman 1990; Yost et al. 2008). Perceptually organizing complex acoustic scenes requires that listeners decompose the composite waveform and assign its constituent parts to their correct sources. These latter perceptual tasks—variously referred to as “sound source segregation” (Brown and Cooke 1994), “sound source perception” (Yost et al. 2008), “auditory grouping” (Darwin and Carlyon 1995; Darwin 1997), “auditory streaming” (Shamma and Micheyl 2010), or “auditory object formation” (Griffiths and Warren 2004)—represent key elements of what has been more broadly termed “auditory scene analysis” (Bregman 1990). 
While issues of auditory perception related to signal detection (Chap. 2), sound pattern recognition (Gerhardt and Huber 2002), source localization (Christensen-Dalsgaard 2005), and noise (Brumm and Slabbekoorn 2005) have featured prominently in studies of animal acoustic communication, this is less so for auditory scene analysis (Hulse 2002; Bee and Micheyl 2008). Consequently, we are just beginning to uncover the mechanisms by which nonhuman animals perceptually organize complex acoustic scenes.

The perceptual challenges faced by breeding anurans parallel the difficulty we have following one person speaking in a noisy social gathering. Both their auditory systems and the diversity present in their vocal communication systems make frogs particularly interesting animal models for studying perceptual mechanisms for hearing and sound communication in noisy and acoustically complex environments (Feng and Ratnam 2000; Feng and Schul 2007). Vocal communication in frogs functions primarily in the context of reproduction, and it commonly takes place in social and physical environments characterized by multiple concurrent sound sources and high levels of biotic and abiotic background noise (Chap. 5; see also Narins and Zelick 1988; Ryan 2001; Gerhardt and Huber 2002; Narins et al. 2007; Wells 2007). Reproduction usually depends on a female frog’s ability to respond correctly to the advertisement signals of a conspecific male. Vocal signals also mediate agonistic interactions, enabling male frogs to estimate an opponent’s proximity, size, and fighting ability, and even to recognize him as a familiar individual (Gerhardt and Bee 2007). All of these behaviors require that receivers accomplish a number of key perceptual tasks that include detecting signals, recognizing them as conspecific calls, localizing their source, and discriminating among different signal variants. But how do frogs successfully complete these tasks in the cacophonous acoustic scene of a breeding chorus? How do they single out distinct sound sources? How do frogs perceptually organize complex acoustic scenes and solve their own cocktail-party-like problem? As we hope to make clear in this chapter, answering these questions remains both a fundamental challenge in understanding hearing and sound communication in frogs, as well as a primary goal of modern research on these remarkable little animals.

2 Frogs as Model Systems for Studies of Acoustic Communication in Noise

Anuran systematics is an area of very active research, and taxonomic nomenclature in this group has undergone some revisions and re-revisions in recent years (Frost et al. 2006; Pyron and Wiens 2011). In review chapters like this one, there is considerable potential for confusion when one species has multiple scientific names in the literature. In this chapter, we adopt the nomenclature of Pyron and Wiens (2011) and point out where we are using new names.

2.1 Experimental Methods

Frogs have long served as model organisms for investigating the mechanisms, function, and evolution of animal acoustic communication. Excellent reviews of anuran hearing and communication can be found in previous volumes edited by Fritzsch et al. (1988), Ryan (2001), and Narins et al. (2007), and in books by Gerhardt and Huber (2002) and Wells (2007). Three primary experimental approaches based on behavioral responses have been used successfully to study anuran hearing and communication. The two most commonly used methods involve assessing the animal’s natural behavior in response to playbacks of either natural or synthetic calls (Gerhardt 1992a). Female frogs in reproductive condition, as well as males defending calling sites or territories, exhibit positive phonotaxis toward speakers broadcasting conspecific calls (Gerhardt 1995). While there are studies of phonotaxis behavior in the field (e.g., Schwartz et al. 2001; Narins et al. 2003, 2005; Amézquita et al. 2005, 2006; Ursprung et al. 2009), most phonotaxis studies have been conducted with female subjects under controlled conditions in laboratory sound chambers that provide high levels of control over acoustic test environments. Generally, two types of phonotaxis test designs are used, and both involve presenting repeated sound stimuli in a systematic way that simulates one or more naturally calling individuals (Gerhardt 1995). In multiple-stimulus tests, subjects are presented with two or more alternating or overlapping stimuli, and experimenters assess the proportions of subjects choosing each competing alternative. If a proportion of females greater than that expected by chance responds to a particular stimulus, the interpretation is that they can discriminate among stimuli, localize the source of at least one of them, and have a preference for that kind of signal. Multiple-stimulus tests are sometimes referred to as choice tests (e.g., Bush et al. 2002) or discrimination tests (e.g., Ryan and Rand 2001). 
By far, the most common type of multiple-stimulus test in studies of anuran communication has been the two-alternative choice test, which pairs two alternating stimuli against each other. In the second main type of phonotaxis test, single-stimulus tests, the experimenter measures behavior (e.g., latency of approach) in response to a single stimulus. If females respond in a single-stimulus test by approaching the speaker, the interpretation is that they can detect the sound, recognize it as the call of an appropriate (or at least acceptable) mate, and localize it. Single-stimulus tests have also been referred to as no-choice tests (e.g., Bush et al. 2002) and recognition tests (e.g., Ryan and Rand 2001) in the literature. Another natural behavior important in experimental studies of anuran hearing and communication is the evoked vocal response (EVR) (Capranica 1965). When stimulated with playbacks of conspecific calls, male frogs commonly call back in response. While more common in studies of territorial aggression (e.g., Bee and Gerhardt 2002) and call site defence (e.g., Wagner 1989), several studies of hearing in noise (e.g., Narins 1982; Penna et al. 2005; Penna and Hamilton-West 2007) and perceptual organization (e.g., Simmons and Bean 2000) have measured the EVR as well.

A third but less common approach used to study frog hearing involves measuring prepulse inhibition, or reflex modification (Yerkes 1904; Hoffman and Ruppen 1996). In this psychophysical technique, the amplitude of a reflex (e.g., leg flexion) elicited by one stimulus (e.g., mild shock) is modified by prior presentation of a brief, neutral stimulus (e.g., a tone) that does not elicit the reflex by itself (reviewed in Simmons and Moss 1995). By varying the neutral stimulus (e.g., tone amplitude) and measuring the resulting changes in reflex inhibition, it is possible to assess the animal’s sensitivity to the neutral stimulus. The method is not currently used widely in research on anurans, but it has proven very useful as a way to probe perceptual organization in this group. Unfortunately, more traditional psychophysical methods involving classical or operant conditioning, as well as measures of other unconditioned responses (e.g., the galvanic skin response), have not been very successful tools in the study of frog hearing (reviewed in Simmons and Moss 1995; but see Elepfandt et al. 2000). The establishment of rigorous methods to study frog hearing based on conditioned responses would be a welcomed development in the field.

Two cautionary points about studies of hearing and sound communication in frogs are worth bearing in mind. First, a major disadvantage of natural responses to acoustic signals as behavioral assays (i.e., phonotaxis and the EVR) is the difficulty (perhaps impossibility) of distinguishing between signal detection and signal recognition, though it might be interesting or desirable to do so. Motivated subjects that fail to respond may do so because they did not detect the sounds or because the sounds were not recognized as conspecific signals. Likewise, animals may not behaviorally discriminate among signals even though they can perceive acoustic differences among them. Therefore, frog studies employing phonotaxis or the EVR provide useful information about “just meaningful differences,” but not about “just noticeable differences” (Nelson and Marler 1990). This general point, of course, potentially applies to all animal playback studies and is an important consideration when interpreting results.

Second, some frogs perceive the sounds of a chorus not as “noise” (to be ignored) but as a “signal” of interest. Both male and female frogs can be attracted by the sounds of breeding choruses (Gerhardt and Klump 1988a; Bee 2007a; Swanson et al. 2007; Christie et al. 2010). One advantage of this type of behavior is that it makes frogs ideal systems for studies of “soundscape orientation” (Slabbekoorn and Bouton 2008). The major disadvantage, however, is that the behavioral significance of chorus sounds as a signal can potentially confound their salience as a masker in experimental studies. In some cases, the lack of a behavioral response to a target signal in the presence of chorus noise could mean that the “masker” actually acted as a relatively more attractive signal. Therefore, it is important to assess the attractiveness of chorus noises or other sounds used as maskers or distractors in control experiments (Vélez and Bee 2010, 2011, 2013; Nityananda and Bee 2011; Vélez et al. 2012, 2013).

2.2 Anuran Breeding Choruses as Cocktail Parties

Three aspects of anuran communication are particularly relevant when thinking of frog breeding choruses as cocktail-party-like acoustic scenes. First, frog calls can be exceedingly loud. In a detailed study of sound pressure levels and sound pattern radiation in 21 species of frogs from North America, Gerhardt (1975) reported peak sound pressure levels (peak SPL) at a distance of 1 m ranging from about 90 to 110 dB (see also Loftus-Hills and Littlejohn 1971; Passmore 1981; Penna and Solís 1998). As careful practitioners of frog bioacoustics research will attest, reproducing frog calls with high fidelity (i.e., low noise, no distortion) at natural call amplitudes can pose a significant technological challenge.

Second, breeding choruses are noisy, multisource acoustic environments (Fig. 6.1). Choruses can easily include hundreds of loudly calling males (Murphy 2003) and can be heard from distances of up to 1–2 km (Griffin 1976; Arak 1983). The noise generated by the aggregation of signaling males can reach maximum levels of up to 90 dB SPL (re 20 μPa, RMS) in frequency regions corresponding to spectral components of advertisement calls (Narins 1982). Moreover, anuran choruses usually include multiple species, perhaps numbering a dozen or more in the tropics. Sometimes, though certainly not always, the advertisement calls of the different species composing mixed-species choruses occupy different regions of the frequency spectrum in a way suggesting “acoustic niche” partitioning (see Sect. 5.2.1 in Chap. 5).
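For readers less familiar with decibel notation, the chorus levels reported above can be related to physical sound pressures via the standard SPL definition, 20·log10(p/p0) with the airborne reference pressure p0 = 20 μPa. The short sketch below is our illustration, not part of the original studies:

```python
import math

def spl_db(pressure_pa: float, ref_pa: float = 20e-6) -> float:
    """Sound pressure level in dB re 20 uPa (standard airborne reference)."""
    return 20 * math.log10(pressure_pa / ref_pa)

def pressure_from_spl(spl: float, ref_pa: float = 20e-6) -> float:
    """Invert the SPL formula to recover RMS pressure in pascals."""
    return ref_pa * 10 ** (spl / 20)

# A 90 dB SPL chorus corresponds to an RMS pressure of roughly 0.63 Pa,
# more than 30,000 times the 20 uPa reference pressure.
print(round(pressure_from_spl(90), 3))  # → 0.632
```

The 20 dB spread in peak call amplitudes reported by Gerhardt (1975) thus corresponds to a tenfold range of sound pressures, which is one reason faithfully reproducing calls at natural amplitudes is technically demanding.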

Fig. 6.1

The multisource acoustic scenes of frog breeding choruses. Spectrograms (top panels) and waveforms (bottom panels) of (a) the sound of a frog breeding chorus and the vocalizations of (b) Hyla cinerea, (c) H. chrysoscelis, (d) Pseudacris maculata, and (e) Dendrobates histrionicus. The acoustic scene of the breeding chorus in (a) includes calls of green treefrogs (H. cinerea) with spectral energy around 0.85 kHz and between 2.5 and 3.0 kHz, Gulf Coast toads (Bufo valliceps) consisting of a series of pulses with spectral energy around 1.5 kHz, and an orthopteran insect with spectral energy between 8.5 and 9 kHz. Arrows highlight temporal and spectral acoustic elements that the auditory system must integrate sequentially and simultaneously

Third, frog calls typically are made up of temporal sequences of distinct sound elements (e.g., notes or pulses) with spectral energy (e.g., harmonics) occurring simultaneously across the frequency spectrum (Fig. 6.1). Information related to species identity, body size and fighting ability, aggressive motivation, physiological condition, genetic quality, and individual identity can be conveyed by the distinctive temporal and spectral properties of frog calls (Gerhardt 1992b; Gerhardt and Bee 2007). Of course, not all of the frog calls in a chorus are generated by potential mates or sexual competitors. Sometimes the call of most immediate concern may be that of a predatory frog species (Schwartz et al. 2000; Bernal et al. 2007). Decoding the acoustic information in calls and generating adaptive behavioral responses requires that receivers assign temporal and spectral elements of vocalizations to their correct source.

Importantly, conspecific and heterospecific frogs are rarely the only sources of noise in the local environment. In addition to other signaling animals (e.g., insects, Paez et al. 1993; Fig. 6.1a), anuran choruses often form in areas with high levels of abiotic noise from sound sources such as rivers and streams (Feng et al. 2002; Narins et al. 2004), waterfalls (Boeckle et al. 2009), rain (Penna et al. 2005), and wind (Penna et al. 2005). Sometimes abiotic “noise” may serve as a potent sound stimulus. West African reed frogs (Hyperolius nitidulus), for instance, exhibit negative phonotaxis in response to the sound of fire by fleeing and searching for cover (Grafe et al. 2002). With the increasing human population and the concomitant expansion of urban areas, anuran breeding habitats are also being invaded by anthropogenic noise (Chaps. 5 and 14).

2.3 Some Relevant Features of the Anuran Ear

2.3.1 Frequency Tuning in the Peripheral Auditory System

All auditory systems exhibit frequency selectivity or “tuning” that ultimately arises from spectral filtering by the periphery (e.g., insects: Chap. 3; birds: Chap. 8; mammals: Chap. 10). In frogs, airborne sounds with frequencies characteristic of acoustic signals are encoded by two anatomically distinct sensory papillae in the inner ears (reviewed in Capranica 1976; Zakon and Wilczynski 1988; Lewis and Narins 1999; Simmons et al. 2007). The amphibian papilla exhibits tonotopic organization and the auditory nerve fibers innervating this papilla have best excitatory frequencies ranging from as low as 80 Hz up to 600–1,600 Hz, depending on the species. In contrast, the basilar papilla lacks tonotopic organization and is sensitive to higher frequencies than the amphibian papilla. Although frogs tend to have somewhat wider filter bandwidths compared to other vertebrates, behavioral studies have also reported enhanced selectivity in frequency regions that match the spectral components of conspecific advertisement calls (Moss and Simmons 1986; Fay and Simmons 1999).

Neurophysiological studies have shown the tuning of the basilar papilla, and in some cases that of the amphibian papilla, to match closely one or more frequency components in conspecific advertisement calls in many anurans (reviewed in Zakon and Wilczynski 1988; Lewis and Narins 1999; Gerhardt and Schwartz 2001). The discovery of enhanced sensitivity to frequencies emphasized in conspecific signals gave rise to the view of the anuran peripheral auditory system as a “matched filter” (Capranica and Moffat 1983; Simmons 2013). Given the dispersion in frequency sometimes seen in multispecies frog choruses (Chap. 5), one primary function of a matched filter ear would be to reduce auditory masking by calls of heterospecific signalers and perhaps also abiotic noise. Unfortunately, behavioral audiograms exist for only a handful of species, and they offer limited support for the idea that frogs have their most sensitive hearing in the range of frequencies found in their own calls (Fig. 6.2). In green treefrogs (Hyla cinerea; Megela-Simmons et al. 1985) and African clawed frogs (Xenopus laevis; Elepfandt et al. 2000), audiograms measured using reflex modification and conditioning, respectively, exhibited increased sensitivity to frequencies emphasized in conspecific calls (Fig. 6.2a). This is not the case, however, for North American bullfrogs (Rana catesbeiana), which have harmonically rich calls with a bimodal spectrum characterized by two predominant peaks in acoustic energy, one centred between 200 and 300 Hz and the other between 1,200 and 1,600 Hz; there is very little acoustic energy in the range of 500–1,000 Hz (Capranica 1965; Bee and Gerhardt 2001). Using reflex modification, Megela-Simmons et al. (1985) measured broad, U-shaped behavioral audiograms with maximal sensitivity (thresholds of 10–20 dB SPL) in the range of 400–1,600 Hz (Fig. 6.2c). Sensitivity declined at rates of 26 dB/octave on the low end and 32 dB/octave on the high end.
Interestingly, the lowest thresholds of the two animals for which audiogram data are available were at 600 and 800 Hz, precisely in the range of frequencies between the two peaks of the advertisement call’s spectrum. Subsequent measures of the bullfrog’s masked threshold and critical ratio function also were not closely tied to the spectrum of the advertisement call (Simmons 1988b). These findings with bullfrogs do not strongly support the matched filter hypothesis. Additional studies measuring behavioral audiograms in a greater diversity of frogs in quiet and noisy conditions are still badly needed.

Fig. 6.2

Auditory sensitivity in relation to the spectral content of advertisement calls. Behavioral audiograms determined using reflex modification techniques (a, c) and the frequency spectra of advertisement calls (b, d) for (a, b) two green treefrogs (Hyla cinerea) and (c, d) two North American bullfrogs (Rana catesbeiana). As these two species illustrate, some frogs show increased hearing sensitivity to frequencies emphasized in conspecific calls (a, b) but others do not (c, d). Audiograms redrawn from Megela-Simmons et al. (1985) and reprinted with permission from the Acoustical Society of America

2.3.2 Directionality of the Peripheral Auditory System

In order to reproduce, female frogs must often locate males in structurally complex habitats characterized by dense vegetation and other obstacles. Males have to localize intruders into their territory or calling site in these same habitats. Moreover, it may often be necessary to localize other individuals under very low light levels (e.g., in a rainforest on a cloudy night). Therefore, sound localization has obvious fitness consequences for anurans and, not surprisingly, it has been investigated in several behavioral and physiological studies over the years (reviewed in Eggermont 1988; Rheinlaender and Klump 1988; Klump 1995; Gerhardt and Huber 2002; Christensen-Dalsgaard 2005; Christensen-Dalsgaard 2011). In most species, directional acuity in the horizontal plane is on the order of 10°–20°, though acuity may be considerably better in some species (Shen et al. 2008). Three-dimensional directional acuity is typically lower than that in azimuth (e.g., Passmore et al. 1984).

Two aspects of spatial hearing in frogs are worth bearing in mind. First, unlike mammalian ears, which are decoupled pressure receivers, anuran ears are pressure-gradient receivers. The two middle ears on opposite sides of the head are coupled through the mouth cavity and Eustachian tubes (Narins et al. 1988; Christensen-Dalsgaard 2005, 2011). This has important implications for directional hearing. Frogs are small animals, and the dominant sound frequencies present in most frog calls (e.g., in the range of 0.2–7 kHz) typically have wavelengths (about 172 cm and 5 cm, respectively) much longer than the width of most frogs’ heads. Consequently, interaural time and level differences are quite small at the external surfaces of the tympanic membranes, which sit flush with the side of the head in most species. However, the internal coupling of the two middle ears gives rise to inherent directionality due to the interaction of sounds arriving at both surfaces of the tympanic membranes. Second, there are extratympanic inputs into the anuran auditory system (reviewed in Mason 2007). Sounds that enter via extratympanic pathways, such as the floor of the mouth and the lungs, account for most of the sensitivity and directionality to low-frequency sounds (reviewed in Christensen-Dalsgaard 2005). As discussed below, spatial hearing may be important in some aspects of the perceptual organization of acoustic scenes in frogs, rendering the anuran auditory system a particularly interesting model for studies of hearing and communication in noisy environments.
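The wavelength values quoted above follow directly from λ = c/f. Taking the speed of sound in air to be roughly 343 m/s (our assumed value, not stated in the chapter), a quick check:

```python
SPEED_OF_SOUND_AIR = 343.0  # m/s at ~20 degrees C (assumed value)

def wavelength_cm(freq_hz: float) -> float:
    """Wavelength in centimetres of an airborne tone at the given frequency."""
    return 100 * SPEED_OF_SOUND_AIR / freq_hz

# The 0.2-7 kHz range of dominant call frequencies spans wavelengths of
# roughly 172 cm down to about 5 cm, both large relative to the width of
# a frog's head, which is why external interaural cues are so small.
print(round(wavelength_cm(200)))   # → 172
print(round(wavelength_cm(7000)))  # → 5
```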

3 Functional Consequences of Communicating in a Chorus

3.1 Decreased Signal Active Space

Active space is a fundamental concept in the study of communication referring to the spatial extent over which signals effectively elicit appropriate behavioral responses from receivers (Bradbury and Vehrencamp 2011). High noise levels and interfering sounds are among the many factors that influence the active space of acoustic signals. For frogs, this means receivers may only be able to hear and assess the calls of a few males in their immediate vicinity. One of the first studies to specifically address the problems imposed by chorus noise was conducted with green treefrogs by Gerhardt and Klump (1988b). In single-stimulus tests, females responded to male advertisement calls when the signal-to-noise ratio (SNR) was at least 0 dB. Based on this result and on previous field observations of the distances moved by females in the chorus (Gerhardt et al. 1987), they suggested that, in natural settings, female green treefrogs might sample just a few nearby males whose calls reach or exceed the background noise generated by the aggregation. Wollerman (1999) used two-alternative choice tests to estimate the SNR at which females of the hourglass treefrog, Dendropsophus ebraccatus (formerly Hyla ebraccata), responded to advertisement calls in the presence of chorus noise (Fig. 6.3a). Females were allowed to choose between two speakers separated by 180°; one speaker broadcast only chorus noise and the other speaker broadcast chorus noise and advertisement calls at SNRs of 0 dB, +1.5 dB, +3 dB, or +6 dB. The minimum SNR necessary to elicit differential approaches to the speaker broadcasting the call was between +1.5 and +3 dB (Fig. 6.3a). By considering the distribution and density of calling males in the chorus, Wollerman (1999) estimated that, at any given location in the chorus, females could recognize the calls of only the nearest male. If D. ebraccatus females were to exert mate choice under these conditions, they would have to move relatively large distances to sample several males. Field observations suggest that female D. ebraccatus move and sample up to seven males in a chorus before mating (Morris 1991).

Fig. 6.3

Signal recognition and discrimination in noise by females of the hourglass treefrog, Dendropsophus ebraccatus. Results of two-alternative choice experiments showing the numbers of females choosing (a) speakers broadcasting chorus noise alone versus chorus noise + signal at the indicated signal-to-noise ratios and (b) speakers broadcasting a call with a dominant frequency of either 2,960 or 3,240 Hz at the indicated signal-to-noise ratios. Asterisks indicate significant differences. Graphs redrawn from (a) Wollerman (1999) and (b) Wollerman and Wiley (2002) and reprinted with permission from Elsevier

More recently, Bee and Schwartz (2009) investigated the effects of a “chorus-shaped” masker (i.e., an artificial noise with the long-term spectrum of a chorus) on the ability of females of Cope’s gray treefrogs (Hyla chrysoscelis) to recognize advertisement calls. They compared three different methods of estimating “signal recognition thresholds,” which they operationally defined as the minimum signal level required to elicit reliable positive phonotaxis. Defined in this manner, the signal recognition threshold is conceptually analogous to the “speech reception threshold” common in studies of the human cocktail party problem (Plomp and Mimpen 1979a, b). In a single-stimulus experiment, the target signal (a conspecific advertisement call) was broadcast across different tests at one of nine different fixed levels ranging between 37 and 85 dB SPL in 6 dB steps. In a parallel two-alternative choice experiment, females were given a choice between the same target signal and a decoy signal (calls of the closely related eastern gray treefrog, Hyla versicolor). Across different tests, the two signals were broadcast at the same fixed levels used in the single-stimulus experiment. Estimates of signal recognition thresholds were calculated based on the proportion of females approaching the target signal, the latency to reach the source of the target signal, and each subject’s angular orientation relative to the target signal 20 cm from the subject’s initial position. In a third experiment, Bee and Schwartz (2009) used an adaptive tracking procedure to estimate signal recognition thresholds; in this procedure, thresholds are determined by adjusting the SNR up or down between consecutive tests depending on the subject’s response in a previous test (see Bee and Schwartz 2009 for a description). 
In the absence of noise, most estimates of signal recognition thresholds in all three experiments ranged between 35 and 42 dB SPL, similar to the threshold determined for the eastern gray treefrog by Beckers and Schul (2004). Signal recognition thresholds in noise typically increased to between 65 and 71 dB SPL, corresponding to threshold SNRs ranging between about −5 and +1 dB, values similar to, or slightly lower than, those obtained for green treefrogs (Gerhardt and Klump 1988a) and hourglass treefrogs (Wollerman 1999). These studies of gray treefrogs indicate that females can detect and recognize calls at much lower sound levels in quiet than in the presence of chorus-like noise. Consequently, background noise in a chorus can substantially reduce the potential active space of a male’s signal.
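Adaptive tracking procedures of the kind Bee and Schwartz (2009) employed can be illustrated with a simple one-up/one-down staircase. The starting SNR, step size, and stopping rule below are illustrative assumptions rather than the authors' actual parameters (see Bee and Schwartz 2009 for those):

```python
def staircase(responds_at, start_snr=6.0, step=3.0, n_reversals=6):
    """Simple 1-up/1-down staircase: lower the SNR after each positive
    response, raise it after each failure, and estimate threshold as the
    mean SNR at the reversal points. `responds_at` is a callable that
    simulates one phonotaxis test at a given SNR and returns True/False.
    """
    snr, last_direction, reversals = start_snr, None, []
    while len(reversals) < n_reversals:
        direction = -1 if responds_at(snr) else +1  # down on success, up on failure
        if last_direction is not None and direction != last_direction:
            reversals.append(snr)  # track direction changes (reversals)
        snr += direction * step
        last_direction = direction
    return sum(reversals) / len(reversals)  # threshold estimate in dB SNR

# Deterministic toy subject that responds whenever the SNR is at least -2 dB;
# the staircase converges to within half a step of that true threshold.
print(staircase(lambda snr: snr >= -2.0))  # → -1.5
```

Real subjects respond probabilistically rather than deterministically, so in practice such procedures converge on the SNR yielding a criterion response probability rather than a hard cutoff.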

While a few studies have investigated the impacts of anthropogenic noise on the signaling behaviors of male frogs (reviewed in Chap. 5), only one published study has investigated its impacts on signal active space (Bee and Swanson 2007; see Barrass 1985 for an unpublished account). Bee and Swanson (2007) recorded road traffic noise near highways adjacent to two wetlands where frogs breed. In single-stimulus phonotaxis experiments with Cope’s gray treefrog, they estimated signal recognition thresholds in three conditions: a no masker condition, a chorus-shaped noise masking condition, and a traffic-shaped noise masking condition. The traffic-shaped noise was a broadband noise with most of its energy below 1 kHz. Masked recognition thresholds were generally similar in the two masking conditions and about 25 dB higher than in the no-masker control condition. These results suggest that traffic noise could impose limitations on communication. An interesting study yet to be conducted is to compare female performance under the combined influences of chorus noise and anthropogenic noise. In addition, traffic noise could have even greater effects in species with advertisement calls having lower frequencies. However, given the numbers of dead frogs seen on roads during seasonal migrations into suitable breeding habitats, car tires may already do more damage to anuran populations than traffic noise ever will. While investigations into the effects of anthropogenic noise have become increasingly important in conservation biology in recent years (Barber et al. 2010; Brumm 2010), anthropogenic noise may be relatively far down on the list of threats to anurans living in a world with chytrid fungus, ranavirus, rampant habitat loss, invasive species, chemical toxins, and water pollution (reviewed in Semlitsch 2003).

3.2 Impaired Proximity Assessment

Male frogs maintain non-random spacing in choruses (Martof 1953; Gerhardt et al. 1989). One of the primary acoustic cues for doing so is the perceived amplitude of nearby neighbors’ calls (Wilczynski and Brenowitz 1988). When the amplitude of the calls of a neighbor or an intruder exceeds some threshold, males commonly switch from producing advertisement calls to aggressive calls (Rose and Brenowitz 1991). Lemon (1971) and Passmore and Telford (1981) hypothesized that neighboring male frogs avoid call overlap through precise call timing interactions to preserve their abilities to judge neighbors’ proximities. Schwartz (1987) found support for this prediction in field playback experiments with the eastern gray treefrog (H. versicolor), spring peepers (Pseudacris crucifer), and the yellow cricket treefrog (Dendropsophus microcephalus, formerly Hyla microcephala). He used a portable frog-call synthesizer that could be triggered to produce a call after various delays in response to a subject’s own calls. On different trials, and over a range of stimulus amplitudes, synthetic calls were triggered so that they either overlapped the subject’s calls or alternated with them in time. Males gave significantly more aggressive calls in the alternating condition compared to when stimulus calls overlapped the subjects’ calls. This result indicates that males cannot as accurately estimate a neighbor’s proximity when their calls overlap in time. Therefore, call alternation might function in allowing males to maintain optimal inter-male distances in the chorus. To our knowledge, no study has investigated the influences of overall background noise levels on aggressive behavior and the maintenance of inter-male spacing.

3.3 Impaired Source Localization

As described above, frogs tested in the laboratory are generally quite accurate at localizing sound sources. Therefore, one potential problem that frogs might be expected to encounter in the noisy, multisource environment of a breeding chorus involves increased difficulty localizing sources. Two studies of phonotaxis behavior in female frogs have investigated the effects of call overlap on source localization. Passmore and Telford (1981) showed that females of the painted reed frog (Hyperolius marmoratus) were equally good at localizing sources that broadcast either alternating or synchronous calls. Schwartz (1987) reported similar findings in his studies of eastern gray treefrogs, yellow cricket treefrogs, and spring peepers. Interestingly, there appear to have been no previous studies of frogs’ abilities to localize sounds in the presence of high levels of background noise (Feng and Schul 2007). Additional work is needed to assess the influences of chorus noise on the acuity of sound localization in anurans in order to determine the extent to which impaired localization constitutes part of the frog’s cocktail party problem.

3.4 Impaired Sound Pattern Recognition

High levels of background noise and overlapping signals can impact the ability of receivers to recognize spectral and temporal properties that identify conspecific calls. Consider, for example, the calls of the closely related eastern gray treefrog (H. versicolor) and Cope’s gray treefrog (H. chrysoscelis), which differ primarily in pulse rate. Under quiet conditions, females of both species are highly selective for calls with pulses produced at normal conspecific rates, and in two choice tests they practically never choose calls of the wrong species (Littlejohn et al. 1960; Gerhardt and Doherty 1988; Bush et al. 2002; Schul and Bush 2002; Bee 2008a; Nityananda and Bee 2011). In the presence of chorus-shaped noise, however, responsive females surprisingly choose the correct and incorrect calls in similar proportions at low SNRs (e.g., −9 dB), indicating that high noise levels potentially impair species recognition in choruses (Bee 2008a). High noise levels might also contribute to the mating mistakes females occasionally (though rarely) make in real chorus environments (Gerhardt et al. 1994). Call overlap from neighboring males is also a serious problem impairing recognition of conspecific calls in these frogs (Schwartz 1987; Marshall et al. 2006; Schwartz and Marshall 2006).

Schwartz (1987) used multiple-stimulus experiments to test the hypothesis that acoustic interference in the form of call overlap disrupts a female’s perception of temporal information critical for species discrimination. He tested females of the eastern gray treefrog, yellow cricket treefrog, and spring peeper. Males of the first two species produce pulsatile advertisement calls in which pulse timing information is important for sound pattern recognition. In contrast, male spring peepers produce advertisement calls lacking a pulsed structure and consisting instead of a single frequency-modulated tone (“peep”). In a four-alternative choice test, Schwartz (1987) gave females a choice among four stimuli presented from a four-speaker array, with adjacent speakers separated by 90° and oriented toward the centre of a circular arena. Two stimuli were presented as alternating calls broadcast from speakers on opposite sides of the arena (separated by 180°); the other two stimuli were broadcast as overlapping calls from the other pair of speakers (also separated by 180°). In different tests, the two overlapping calls were presented either “in-phase” (i.e., precisely synchronized) or “out-of-phase.” When overlapping calls were in-phase, females of all three species chose alternating and overlapping calls in similar proportions. In the out-of-phase conditions, however, the temporal structure of pulsed calls was no longer preserved. Female eastern gray treefrogs and yellow cricket treefrogs, the two species with pulsed calls, strongly preferred alternating to overlapping calls in the out-of-phase condition. In contrast, spring peepers, which do not produce pulsed advertisement calls, showed no preference between alternating calls and overlapping calls presented out of phase. These results supported the hypothesis that call overlap obscures temporal features of calls necessary for sound pattern recognition.
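The logic of the in-phase/out-of-phase manipulation can be sketched numerically: when two identical pulse trains sum precisely in phase, the silent gaps between pulses survive in the composite waveform, whereas a half-period offset fills every gap and flattens the envelope. The sketch below uses illustrative pulse and gap durations, not the species’ actual call parameters.

```python
def pulse_train(n_pulses, pulse_ms, gap_ms, offset_ms=0.0, fs=10000):
    """Unit-amplitude envelope of a pulsed call: 1.0 during pulses, 0.0 in gaps."""
    period = int(fs * (pulse_ms + gap_ms) / 1000)
    pulse = int(fs * pulse_ms / 1000)
    offset = int(fs * offset_ms / 1000)
    env = [0.0] * (offset + n_pulses * period)
    for i in range(n_pulses):
        start = offset + i * period
        for j in range(start, start + pulse):
            env[j] = 1.0
    return env

def modulation_depth(env):
    """(max - min)/(max + min): 1.0 = fully preserved gaps, 0.0 = flat envelope."""
    return (max(env) - min(env)) / (max(env) + min(env))

def summed_depth(a, b):
    """Modulation depth of two linearly summed envelopes (truncated to equal length)."""
    n = min(len(a), len(b))
    return modulation_depth([a[i] + b[i] for i in range(n)])

# Illustrative 10-ms pulses separated by 10-ms gaps (not species-specific values).
call = pulse_train(10, 10, 10)
in_phase = pulse_train(10, 10, 10)                    # overlapping, precisely synchronized
out_of_phase = pulse_train(10, 10, 10, offset_ms=10)  # pulses land in the other call's gaps

print(summed_depth(call, in_phase))      # 1.0: gaps coincide, pulse structure survives
print(summed_depth(call, out_of_phase))  # 0.0: gaps filled, temporal structure obscured
```

The sketch captures only the envelope arithmetic; real overlapping calls also interact in fine structure and spectrum.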

Given the findings of Schwartz (1987), it would seem advantageous for male eastern gray treefrogs (H. versicolor) to avoid call overlap with their neighbors. Surprisingly, however, mesocosm-scale experiments creating choruses of different group sizes in an artificial pond showed that groups larger than two males do not avoid call overlap (Schwartz et al. 2002). Instead, males of H. versicolor (and also its close relative, H. chrysoscelis) increase the number of pulses in their calls while reducing call rates in response to the calls of other individuals in the chorus and during playbacks of both calls and chorus noise (Wells and Taigen 1986; Klump and Gerhardt 1987; Schwartz et al. 2002; Love and Bee 2010; Ward et al. 2013b). Males usually maintain a constant “pulse effort” (number of pulses per call × call rate). Schwartz et al. (2008, 2013) have tested two hypotheses that might account for this interesting signaling behavior (Chap. 5). According to the “interference risk hypothesis” (Schwartz et al. 2001), the observed correlated changes in call duration and rate function to increase the chances that females will perceive enough unmasked pulses and interpulse intervals per call to allow call recognition. This hypothesis predicts that, whenever there is a risk of call overlap, producing longer calls at lower call rates should attract more females than producing shorter calls at higher call rates because, on average, the number of unobscured pulses and interpulse intervals should be higher with the former combination. To test this prediction, Schwartz et al. (2008) gave H. versicolor females a choice between short and long calls with equal pulse efforts, in which, on average, either 50 or 67 % of each call was overlapped by another call or by a burst of chorus-shaped noise. Their results strongly refuted the interference risk hypothesis.
Females showed no preferences for the longer alternatives under any of the circumstances in which overlapping calls or bursts of noise interfered with the temporal structure of the advertisement calls.
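The “pulse effort” trade-off described above is simple arithmetic: lengthening calls while slowing the call rate can leave the total number of pulses delivered per minute unchanged. A minimal sketch with hypothetical, not measured, parameter values:

```python
def pulse_effort(pulses_per_call, calls_per_min):
    """Pulse effort (Schwartz et al.): pulses per call x call rate,
    i.e. total pulses delivered per minute."""
    return pulses_per_call * calls_per_min

# A male responding to chorus noise typically lengthens calls while slowing down.
# These numbers are illustrative only:
quiet = pulse_effort(pulses_per_call=15, calls_per_min=12)  # 180 pulses/min
noisy = pulse_effort(pulses_per_call=30, calls_per_min=6)   # 180 pulses/min
print(quiet == noisy)  # True: pulse effort conserved despite the duration/rate shift
```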

More recently, Schwartz et al. (2013) have tested an alternative hypothesis (the “call detection hypothesis”) to explain the correlated changes in call duration and rate in male signaling behavior in the presence of noise and competing calls. The key prediction of this hypothesis is that longer calls are easier to detect in a noisy environment. This prediction is based on a fundamental feature of auditory processing known as temporal integration, the auditory system’s ability to combine acoustic information over time (reviewed in Brumm and Slabbekoorn 2005; Recanzone and Sutter 2008). It is well established that signal detection thresholds decrease (over a limited range) as a function of increasing signal duration (Heil and Neubauer 2003), and that other animals besides gray treefrogs also lengthen their signals as a function of ambient noise levels (e.g., Brumm et al. 2004). In quiet conditions, a minimum number of consecutive pulses (e.g., 3–6 in H. versicolor and 6–9 in H. chrysoscelis) is required to elicit positive phonotaxis from females of both gray treefrog species (Bush et al. 2002; Vélez and Bee 2011). These findings are generally consistent with temporal integration by neurons in the frog midbrain that require a minimum number of correct interpulse intervals before firing (Alder and Rose 1998; Edwards et al. 2002; Schwartz et al. 2010a). Importantly, different pulse-integrator neurons fire in response to different threshold numbers of pulses (Alder and Rose 1998; Edwards et al. 2002). Therefore, signals with more pulses are more likely to activate larger populations of pulse-integrator neurons. Might temporal integration by these or similar neurons provide an advantage to producing longer calls in noise? In a series of single-stimulus phonotaxis tests, Schwartz et al. (2013) used an adaptive tracking procedure (Bee and Schwartz 2009) to measure response thresholds of female H.
versicolor during broadcasts of calls of different durations (10, 20, 30, or 40 pulses) in the presence of chorus-shaped noise. In contrast to predictions of the call detection hypothesis, response thresholds did not vary as a function of call duration. These results are inconsistent with the hypothesis that changes in the signaling behaviors of male eastern gray treefrogs in acoustically cluttered environments function to take advantage of temporal integration in the receiver’s auditory system. At present, an entirely satisfactory functional explanation of this interesting signaling behavior remains elusive.
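The interval-counting midbrain neurons described by Alder and Rose (1998) can be caricatured as integrators that fire only after some threshold number of consecutive “correct” interpulse intervals, and that reset when an interval falls outside tolerance. The toy model below sketches that logic only; the target interval, tolerance, and threshold are illustrative assumptions, not measured values.

```python
def integrator_fires(intervals_ms, target_ms=20.0, tol_ms=4.0, threshold=4):
    """Toy interval-counting neuron: fires once `threshold` consecutive
    interpulse intervals fall within tolerance of the conspecific interval.
    A single aberrant interval (e.g., from an overlapping call) resets the count."""
    count = 0
    for iv in intervals_ms:
        if abs(iv - target_ms) <= tol_ms:
            count += 1
            if count >= threshold:
                return True
        else:
            count = 0  # reset: integration must restart from zero
    return False

clean = [20.0] * 8
print(integrator_fires(clean))              # True: enough consecutive correct intervals

# One overlapped pulse midway corrupts a single interval and resets integration:
corrupted = [20.0, 20.0, 20.0, 55.0, 20.0, 20.0, 20.0]
print(integrator_fires(corrupted))          # False: never 4 correct intervals in a row
```

This also illustrates why overlap is costly even when most pulses are unobscured: the reset rule makes recognition depend on uninterrupted runs, not total pulse count.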

3.5 Constraints on Mate Choice Preferences

Anurans have featured prominently in studies of sexual selection (Ryan 1991; Ryan and Rand 1993; Gerhardt and Huber 2002). Consistent with findings from numerous other taxa (reviewed in Andersson 1994), female frogs often exhibit directional preferences for various signal features, such as longer or louder calls, faster rates of calling, more complex calls, and lower sound frequencies (reviewed in Ryan and Keddy-Hector 1992; Gerhardt and Huber 2002). Much of what we currently know about the intraspecific mate choice preferences of female frogs comes from phonotaxis tests conducted under optimal listening conditions. However, chorus noise and call overlap are well known to render a male’s calls less attractive than they would be in quiet conditions (Schwartz and Wells 1983; Wells and Schwartz 1984). Current opinion holds that mate choice preferences identified in quiet laboratory sound chambers are usually constrained by the overlapping calls and high levels of background noise present in the natural environment.

The ability of call overlap and high noise levels to constrain mate choice preferences based on temporal call features is well illustrated by studies of gray treefrogs. In two-alternative choice tests conducted in quiet conditions, female eastern gray treefrogs (H. versicolor) and Cope’s gray treefrogs (H. chrysoscelis) exhibit strong, directional preferences for longer calls that contain more pulses (Gerhardt et al. 1996; Schwartz et al. 2001; Bee 2008b; Vélez et al. 2013; Ward et al. 2013b). Evidence from quantitative breeding studies of the former species indicates that preferences for longer calls provide females with indirect benefits associated with increased offspring fitness (Welch et al. 1998). The expression of preferences for longer calls, however, is constrained under acoustic conditions designed to simulate the “real world” acoustic environments of breeding choruses. In laboratory two-alternative choice tests conducted in the presence of chorus noise, females of both species exhibit reduced preferences for longer calls (Schwartz et al. 2001; Bee 2008b). Under some conditions, noise completely abolished or even reversed preferences for longer over shorter calls in H. chrysoscelis (Bee 2008b). Schwartz et al. (2001) extended their investigations of H. versicolor by testing female preferences for call duration under more realistic conditions. In a mesocosm-scale experiment, they created real choruses consisting of between four and eight males calling in an artificial pond constructed in a greenhouse. Male calling behavior was monitored while individual females were released from a holding cage at the centre of the pond and allowed to choose a mate. Overlapping calls were common. Logistic regression analyses revealed only weak discrimination by females based on differences in the average number of pulses in male calls. Results from a field experiment conducted in a natural chorus were similar. In that experiment, Schwartz et al.
(2001) placed eight speakers housed in cylindrical screen cages around the edge of a pond in which H. versicolor males called and formed a chorus. Typical calls contain between 8 and 28 pulses, with an average of about 16 pulses per call. The eight speakers at the pond broadcast synthetic calls that had 6, 9, 12, 15, 18, 21, 24, or 27 pulses. The results were clear: females discriminated strongly against the shortest, six-pulse call, but there was little evidence that females discriminated among calls with nine or more pulses. In summary, experiments with gray treefrogs conducted in the laboratory, in mesocosm choruses in artificial ponds, and in natural choruses indicate that high levels of background noise and call overlap impair the ability of females to exercise adaptive mate choice preferences for males that produce longer calls with more pulses.

Chorus noise can also impact the choices females make between calls with different spectral properties (Fig. 6.3b). In two-alternative choice tests conducted in the absence of noise, female hourglass treefrogs preferred calls with a dominant frequency of 2,960 Hz to those with a dominant frequency near the population mean of 3,240 Hz (Wollerman 1998; Wollerman and Wiley 2002). Because dominant frequency is inversely related to body size, preferences for lower-frequency calls might allow females to choose larger males (Wollerman 1998). In the presence of chorus noise, however, the preference for the low-frequency call was abolished at SNRs of +6 and +9 dB (Wollerman and Wiley 2002). Interestingly, the preference was actually reversed in favor of the higher-frequency call at an SNR of +3 dB (Fig. 6.3b). This result was interpreted as a possible change in the strategy used by females in very noisy conditions. At an ultimate level, shifting from a discrimination task to one of detecting the most common call produced by conspecific males (e.g., calls with near-average values) might allow females to avoid mating with heterospecific males when high levels of background noise introduce uncertainty about species identity (Wollerman and Wiley 2002). A possible proximate-level hypothesis for these data is that signal recognition thresholds in noise vary as a function of dominant frequency. Recall that Wollerman (1999) estimated an SNR for signal recognition between +1.5 and +3 dB for an average call with a dominant frequency of 3,240 Hz (Fig. 6.3a). If the masked recognition threshold for a 2,960 Hz call is higher, the observed change in female preference in favor of the 3,240 Hz call at a low SNR might be related to differences in the ability of females to detect or recognize the two calls in the presence of chorus noise.
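Because these arguments turn on signal-to-noise ratios expressed in decibels, it may help to recall the bookkeeping: SNR in dB is 20·log10 of the ratio of RMS sound pressures, which reduces to a simple difference when signal and masker levels are both already expressed in dB SPL. A minimal sketch (the 85/82 dB values are illustrative, not measurements from these studies):

```python
import math

def snr_db(signal_rms, noise_rms):
    """SNR in dB from RMS sound pressures (20*log10 of the pressure ratio)."""
    return 20.0 * math.log10(signal_rms / noise_rms)

# With levels already in dB SPL, SNR is just their difference. The call and
# chorus levels here are illustrative assumptions:
call_db, chorus_db = 85.0, 82.0
print(call_db - chorus_db)         # 3.0 dB: at the upper end of the estimated threshold
print(round(snr_db(2.0, 1.0), 2))  # 6.02: doubling sound pressure adds ~6 dB
```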

As a counterexample to studies of gray treefrogs and hourglass treefrogs, Schwartz and Gerhardt (1998) showed using two-alternative choice tests with another treefrog that the addition of chorus-like noise can also reveal significant preferences for calls differing in frequency that were not exhibited in quiet conditions. In that study, female spring peepers failed to discriminate behaviorally between two calls with different dominant frequencies in the absence of noise, but preferred a higher-frequency alternative when tested in the presence of artificial chorus noise. Multiunit recordings from the torus semicircularis of females demonstrated that this result was associated with a desensitization of the auditory system in response to loud noise (i.e., a threshold shift). At a high stimulus amplitude (85 dB SPL) without noise, calls of different frequency elicited similarly strong neural responses, likely due to broadening of eighth nerve tuning curves and rate saturation. However, isointensity neural response profiles became more peaked as stimulus amplitudes were reduced to 55 dB SPL. Interestingly, when noise simulating a loud chorus of males accompanied stimulus calls presented at 85 dB SPL, neural responses were more similar to those obtained at 55–65 dB SPL in quiet conditions. Moreover, such noisy conditions were the only ones in which neural response strength and behavioral discrimination of calls of different frequency were significantly associated for individual females.

3.6 Summary and Future Directions

The challenges associated with communicating in the acoustic scene of a breeding chorus impose functional consequences on receivers. To date, most of this work has examined the consequences for female frogs in the contexts of sound pattern recognition and choosing preferred mates. By comparison, we know less about how chorus noise and interfering signals impact source localization, not only in the horizontal and vertical planes, but also in terms of source proximity. An important direction for future studies will be to assess how frogs determine source location and proximity in noisy, multisource conditions. Another goal for future research will be to better understand the effects that noise has on the neural processing of communication sounds in relation to its impacts on perception (e.g., Schwartz and Gerhardt 1998). For example, auditory nerve fibers exhibit shifts in tone-evoked rate-level functions in noise that appear to function as a gain control mechanism allowing the auditory system to encode intensity information in the presence of noise (e.g., Narins 1987). Likewise, there is neurophysiological evidence from recordings in the frog midbrain to suggest noise can enhance the encoding of important information (e.g., amplitude modulation) through stochastic resonance (e.g., Bibikov 2002). Precisely how these sorts of neural phenomena correspond to a receiver’s perceptual experience in noise remains poorly understood. While the studies reviewed in this section illustrate the problems for communication posed by listening in breeding choruses, an important next step for future research will be to understand the potential solutions by which receivers overcome or ameliorate them. We turn to these issues in the next three sections.

4 Release from Auditory Masking

4.1 Shifts in Frequency Tuning Associated with Heterospecific Signalers

As discussed above (Sect. 6.2.3.1), there is evidence that in some frog species, the peripheral auditory system acts as a “filter” tuned to the spectral content of conspecific calls. As a result, the periphery can reduce the potential for auditory masking by filtering out frequencies in heterospecific calls. We return to this idea here to highlight how the labile nature of behavioral sensitivity to different sound frequencies in frogs may be related to hearing and communication in a chorus environment.

Males of Allobates femoralis (formerly Epipedobates femoralis), a neotropical frog common throughout Amazonia, defend territories on the forest floor against intrusion by other conspecific males; an important component of a defensive response is phonotaxis toward a calling intruder (Narins et al. 2003, 2005; Amézquita et al. 2005, 2006). In some geographic locations, but not others, A. femoralis occurs syntopically with another frog species, Ameerega trivittata (formerly E. trivittatus). Both species produce calls composed of a series of pulses, and A. trivittata calls have lower, but overlapping, frequencies that could potentially mask the frequency content of A. femoralis calls (Amézquita et al. 2005). There is no evidence to suggest the spectral content of the two species’ calls have diverged in sympatry (Amézquita et al. 2006). In a study of territorial behavior in A. femoralis, Amézquita et al. (2006) used a phonotaxis assay to generate behavioral “frequency-response curves” that measured the magnitude of response as a function of the carrier frequency of a synthetic call (Fig. 6.4). By comparing responses from male A. femoralis from populations that were sympatric and allopatric with A. trivittata, they tested the hypothesis that male responsiveness to calls with different frequencies is shaped in ways consistent with evolutionary shifts in frequency tuning in the auditory system. That is, they were interested in testing whether the frequency sensitivity of the auditory system is evolutionarily labile and can change in ways that reduce auditory masking by heterospecifics. The results unequivocally showed that the low-frequency tail of the behavioral frequency-response curve of A. femoralis was shifted toward higher frequencies in areas of sympatry, as expected if the tuning of the auditory system had shifted over evolutionary time to filter out the calls of A. trivittata (Fig. 6.4). More recently, Amézquita et al. 
(2011) showed that the signal recognition space, in both the spectral and temporal domains, is shaped in ways that reduce acoustic interference from heterospecific calls in a complex acoustic environment of ten vocally active species.

Fig. 6.4
figure 4

Putative shifts in frequency tuning in the Amazonian dendrobatid frog, Allobates femoralis. Males of this species produce a call composed of a series of frequency-modulated pulses sweeping upward between about 2.6 and 3.4 kHz. This species has a geographic distribution partially overlapping that of another frog, Ameerega trivittata. In A. trivittata, male calls have a slightly lower frequency range (e.g., 2.3–3.0 kHz) that partially overlaps the lowest frequencies in A. femoralis calls and thus represents a potential source of masking and interference in sympatric populations. The data depicted here represent behavioral frequency response curves showing the probability that males of A. femoralis responded to playbacks at the given frequency, interpolated from logistic regression analysis of binary responses. As illustrated here, the low-frequency end of behavioral frequency response curves was shifted to higher frequencies in (a, b) sympatric populations compared with (c, d) allopatric populations. Whether these shifts represent shifts in auditory tuning or shifts in behavioral decision rules remains an interesting and important question for future study. Redrawn from Amézquita et al. (2006) with permission from John Wiley and Sons

If the behavioral results of Amézquita et al. (2006) can be confirmed with physiological measures of frequency sensitivity, it would represent a very exciting confirmation that the “matched filter” tuning of frog auditory systems is evolutionarily labile in ways that can bring about a release from auditory masking in mixed-species choruses without concomitant (co-evolutionary) shifts in the spectral content of conspecific signals. At present, however, it is not possible to draw this conclusion without physiological measures of auditory tuning because the results appear equally consistent with shifts in either frequency tuning or behavioral decision rules (Chap. 2). That is, it is possible that the behavioral differences between sympatric and allopatric populations illustrated in Fig. 6.4 reflect some form of stimulus-specific behavioral plasticity, such as decreases in aggressive responsiveness (e.g., Bee 2003) or increases in aggressive response thresholds (e.g., Humfeld et al. 2009).

4.2 Shifts in Frequency Tuning Associated with Environmental Noise

Among the predicted evolutionary responses to high levels of abiotic background noise in a habitat are the use of a frequency channel free from the noise or of a different sensory modality altogether. Frogs exhibit both types of solutions in response to noise generated by fast-flowing water. The use of visual signals by frogs breeding in such habitats is well known (reviewed in Hödl and Amézquita 2001). The discovery that some frogs breeding near sources of loud water noise also communicate with ultrasonic signals is arguably the most important recent finding in studies of anuran acoustic communication (Feng et al. 2006; Arch et al. 2008, 2009). In some habitats, the frequency spectrum of sound generated by fast-flowing water is characterized by high levels of acoustic energy below 30 kHz, with most energy present at the very low end (e.g., 100 Hz) of this frequency range. An anuran acoustic communication system in this habitat might benefit from using ultrasonic frequencies to reduce auditory masking (see Sect. 5.3.1.1 in Chap. 5).

Sensitivity to ultrasound has been demonstrated in a few species of frogs, such as the concave-eared torrent frog, Odorrana tormota (formerly Amolops tormotus), the hole-in-the-head frog, Huia cavitympanum, and the large odorous frog, Rana livida. Ultrasonic signaling has been studied most extensively in the first two of these species. Recordings of evoked potentials and single units from the auditory midbrain of these two species reveal sensitivity to frequencies above 20 kHz (Feng et al. 2006; Arch et al. 2009). Doppler vibrometry measurements of the tympanic membrane of H. cavitympanum revealed a broad peak in sensitivity that extended into the ultrasonic frequency range (Arch et al. 2009). At a behavioral level, sensitivity to ultrasonic signals has been documented through playback experiments with males of both species. In field playback experiments, Arch et al. (2009) found that H. cavitympanum males respond antiphonally to playbacks of both audible (<20 kHz) and ultrasonic (>20 kHz) components of conspecific calls. Similarly, audible and ultrasonic components of calls also evoke vocal responses from O. tormota (Feng et al. 2006). In the latter species, ultrasonic hearing involves an apparently derived ability to close the Eustachian tubes, which are thought to be permanently open in most other frogs. The accompanying increase in the impedance of the middle ear both boosts tympanic vibrations for higher sound frequencies and lowers them for lower frequencies (Gridi-Papp et al. 2010). Also likely helpful for ultrasonic hearing are recessed tympana and unusually thin tympanic membranes (Feng et al. 2006; Feng and Narins 2008).

4.3 Spatial Release from Masking

In natural settings, including multisource social aggregations, sound sources are often spatially separated. Human listeners experience significant improvements in speech intelligibility when the sources of target speech and competing sources of noise are spatially separated (reviewed in Bronkhorst 2000). Compared to a “co-localized” condition, in which target speech and masking noise originate from the same location, a release from masking of about 6–10 dB is observed when signals and maskers are displaced in azimuth. Under binaural listening conditions, the major cues for this spatial release from masking in humans are an improvement in the signal-to-noise ratio at one ear and disparities in the interaural time and level differences of signals and maskers (Bronkhorst 2000). Spatial release from masking is not unique to humans and has been demonstrated in several birds (Chap. 8) and mammals (Chap. 10). Several studies have also investigated the effect of spatial separation between signals and noise in anurans.

Using two-alternative choice tests, Schwartz and Gerhardt (1989) measured the ability of female green treefrogs to detect and discriminate between advertisement calls and aggressive calls in the presence of broadband maskers that were either co-localized with the signals or separated from them by 45° or 90° in azimuth. Spatial separation led to observable improvements in the ability of females to detect calls, but not to discriminate between the two call types. Consistent with signal detection theory, this result suggests signal discrimination is a more difficult task than signal detection (Chap. 2). The magnitude of spatial unmasking was estimated to be about 3 dB. More recent studies of Cope’s gray treefrog (H. chrysoscelis) have investigated spatial release from masking in the presence of chorus-shaped noise. Using single-stimulus phonotaxis experiments, Bee (2007b) presented females with advertisement calls at SNRs between −12 and +12 dB (in 6 dB steps) in the presence of chorus-shaped noise. The target signal and masker were broadcast from two speakers that were either adjacent (angular separation of 7.5°) or spatially separated by 90°. Based on measures of normalized response latencies, Bee (2007b) estimated the magnitude of spatial unmasking to be on the order of 6–12 dB (Fig. 6.5). A more recent study using an adaptive tracking procedure to measure signal recognition thresholds in co-localized and separated (90°) conditions revealed about 4 dB of masking release in the separated condition (Fig. 6.5; Nityananda and Bee 2012). These estimated magnitudes of masking release are biologically relevant, as females are known to discriminate differences in SNR as small as 2 dB (Bee et al. 2012). Bee (2008a) found that spatial separation between signals and chorus-shaped noise also influenced the ability of female Cope’s gray treefrogs to discriminate between conspecific advertisement calls and those of the eastern gray treefrog (H.
versicolor), a closely related species that often breeds synchronously and syntopically with Cope’s gray treefrogs. It was recently shown that improved discrimination in the separated conditions resulted from better recognition of temporal sound patterns (Ward et al. 2013a). Spatial separation between sound sources may also influence intraspecific mate choice. Richardson and Lengagne (2010) recently showed that increased spatial separation between signals enhanced the ability of female European treefrogs, Hyla arborea, to discriminate between calls in the presence of background noise.
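The adaptive tracking procedures mentioned above are staircase methods: the signal level is lowered after each response and raised after each failure, and the threshold is estimated from the levels at which the track reverses direction. The sketch below implements a generic 1-up/1-down staircase, not the specific rules of Bee and Schwartz (2009); the step size, reversal count, and simulated subject are all assumptions for illustration.

```python
def staircase(responds, start_db=60.0, step_db=3.0, n_reversals=6):
    """Simple 1-up/1-down adaptive track: lower the signal level after a
    response, raise it after a failure; the threshold estimate is the mean
    of the last few reversal levels. `responds(level)` is the behavioral assay."""
    level, direction, reversals = start_db, -1, []
    while len(reversals) < n_reversals:
        new_dir = -1 if responds(level) else +1
        if new_dir != direction:        # track changed direction: record a reversal
            reversals.append(level)
            direction = new_dir
        level += new_dir * step_db
    return sum(reversals[-4:]) / 4      # average the last four reversal levels

# A deterministic stand-in subject with a true threshold of 48 dB:
print(round(staircase(lambda lv: lv >= 48.0), 1))  # converges near 48 (here 46.5)
```

With a real subject the response is probabilistic, so tracks are run repeatedly and averaged; the deterministic lambda here just makes the convergence visible.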

Fig. 6.5
figure 5

Spatial release from masking in Cope’s gray treefrog, Hyla chrysoscelis. a Schematic top view of a phonotaxis test arena (2 m diameter) showing the positions of speakers, signals, and noises in a co-localized condition (signal and noise broadcast from the same or immediately adjacent speakers) and a separated condition (signal and noise broadcast from two speakers separated by 90°). b Normalized latency to reach the target speaker as a function of signal-to-noise ratio (SNR) and angular separation between the target signal and the chorus-shaped masker. Normalized latencies were calculated relative to the latency to reach the target signal in reference conditions in the absence of masking noise; values equal to 1.0 represent latencies similar to those in reference conditions, whereas values close to 0 represent very slow behavioral responses or no response at all. Compared to the co-localized condition, normalized latencies were significantly higher in the separated (90°) condition at SNRs of −6 and 0 dB (asterisks). c Signal recognition thresholds, estimated using an adaptive tracking procedure, are significantly lower in the separated (90°) condition compared with the co-localized condition. Redrawn from (b) Bee (2007b) and (c) Nityananda and Bee (2012) and reprinted with permission from Elsevier

Spatial separation between sound sources also confers benefits when calls are masked by other temporally overlapping calls. As discussed earlier (Sect. 6.3.4), call overlap can degrade the temporal structure of pulsed calls, thereby hindering sound pattern recognition. When given a choice between overlapping calls broadcast from adjacent or spatially separated speakers, female eastern gray treefrogs discriminated in favor of the separated calls when the angle of separation was 120°, but not when it was 45° (Schwartz and Gerhardt 1995). Two other studies, however, have shown that spatial separation of sound sources may offer limited benefits. A spatial separation of 120° was insufficient for females of the yellow cricket treefrog to discriminate in favor of separated calls over adjacent calls (5°) that were presented such that temporal overlap degraded the pulse structure of the call (Schwartz 1993). In the two closely related gray treefrog species, H. versicolor and H. chrysoscelis, females showed strong preferences for conspecific calls when they alternated with heterospecific calls separated by 90° (Marshall et al. 2006). Interestingly, however, when the two calls overlapped, H. chrysoscelis females still showed a strong preference for the conspecific call, while H. versicolor females actually approached the speaker broadcasting the heterospecific call (Marshall et al. 2006). This study highlights the fact that sound source segregation mechanisms might operate differently even among closely related species (see also Vélez et al. 2012; Vélez and Bee 2013).

Spatial unmasking in anurans has also been investigated at a neurophysiological level in northern leopard frogs (Rana pipiens). Ratnam and Feng (1998) found that increasing the angular separation between the sources of a masking noise and an amplitude-modulated signal resulted in lower signal detection thresholds in neurons in the torus semicircularis. Similarly, Lin and Feng (2001) found evidence for spatial release from masking in the responses of both single auditory nerve fibers and torus semicircularis neurons. In the auditory nerve, the magnitude of spatial unmasking was about 3 dB, while that in midbrain neurons was on the order of 9 dB (Lin and Feng 2001). On the one hand, these results indicate that central neural processing enhances the effect of spatial separation measured at the periphery (Lin and Feng 2001, 2003). On the other hand, the magnitude of spatial unmasking at the periphery is similar to that seen in behavior, although this may reflect differences between the species tested (treefrogs versus leopard frogs). Additional work integrating behavioral and physiological measures of spatial unmasking in the same species will be required to resolve this issue.

These findings serve to emphasize that spatial release from masking may be one important mechanism by which frogs solve problems associated with breeding in choruses. Given heterogeneity in the spatial distributions of calling males in the habitat, it seems likely that receivers in a chorus often encounter situations in which signals of interest are well separated from other concurrent signals and dominant sources of background noise in the environment. Under such conditions, spatial release from masking may contribute to a listener’s ability to hear individual calls. As the studies above illustrate, there may sometimes be considerable variation in auditory processing strategies among species, emphasizing the need for rigorous comparative studies to understand the evolution of mechanisms for hearing in noisy natural settings.

4.4 Dip Listening and Comodulation Masking Release

A well-known feature of natural soundscapes, including those generated in frog breeding choruses, is that sound levels fluctuate through time; that is, they are amplitude modulated (Richards and Wiley 1980; Nelken et al. 1999; Vélez and Bee 2010). Furthermore, these amplitude modulations are often correlated through time across different regions of the frequency spectrum; that is, natural sounds are often comodulated (Klump 1996; Nelken et al. 1999). Psychophysical studies of a phenomenon called dip listening indicate that human listeners are often much better at detecting and recognizing target signals, including speech, when maskers fluctuate in amplitude compared to steady-state maskers with stationary envelopes. The magnitude of masking release due to dip listening commonly ranges between 5 dB and 20 dB, depending on the temporal properties of the maskers and the target signals, and has been attributed to the listener’s ability to catch brief “glimpses” of target signals at moments when the amplitude of the masker dips to low levels (Buus 1985; Gustafsson and Arlinger 1994; Bacon et al. 1998; Füllgrabe et al. 2006). Studies of a related phenomenon called comodulation masking release (CMR, reviewed in Verhey et al. 2003) indicate that listeners are also sensitive to spectro-temporal correlations in the fluctuating envelopes of masking noise. For example, when a tone signal is masked by a narrowband noise with a fluctuating envelope, the addition of a second narrowband noise at a remote frequency can produce several dB of masking release when its envelope is comodulated with that of the on-signal band compared to when it fluctuates independently. Dip listening has also been demonstrated in insects (Chap. 3) and CMR has been demonstrated in songbirds (Chap. 8), gerbils (Klump et al. 2001), and dolphins (Chap. 10). Might these processes also function in hearing and sound communication in frogs?
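The comodulated versus independent flanking-band manipulation at the heart of CMR experiments can be sketched in a few lines. This is an illustrative construction only, not the stimulus generation from any of the cited studies; the sample rate, band center frequencies, and envelope cutoff are assumed values:

```python
import numpy as np

fs = 20_000                      # sample rate in Hz (illustrative assumption)
t = np.arange(fs) / fs           # 1 s of time samples
rng = np.random.default_rng(1)

def lowpass_env(cut_hz):
    """A slowly fluctuating, non-negative envelope from lowpass-filtered noise."""
    spec = np.fft.rfft(rng.standard_normal(t.size))
    freqs = np.fft.rfftfreq(t.size, 1 / fs)
    spec[freqs > cut_hz] = 0.0               # keep only slow fluctuations
    e = np.fft.irfft(spec, t.size)
    return e - e.min()                       # shift so the envelope is >= 0

env = lowpass_env(10.0)                                       # shared envelope
on_band = env * np.sin(2 * np.pi * 1000 * t)                  # on-signal band
co_flank = env * np.sin(2 * np.pi * 2000 * t)                 # comodulated flank
ind_flank = lowpass_env(10.0) * np.sin(2 * np.pi * 2000 * t)  # independent flank
```

In a CMR paradigm, a tone at the on-signal band’s frequency would be detected against `on_band` plus either `co_flank` or `ind_flank`; masking release is the threshold difference between the two conditions.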

The sounds generated by frog choruses exhibit species-specific patterns of amplitude fluctuation (Vélez and Bee 2010). A recent study using single-stimulus tests has shown that female Cope’s gray treefrogs experience a release from masking by listening in the dips of sinusoidally amplitude modulated (SAM) chorus-shaped noise (Fig. 6.6, Vélez and Bee 2011). Interestingly, evidence for dip listening was found only in SAM maskers that fluctuated at slow rates [e.g., 0.625 (Fig. 6.6b), 1.25, and 2.5 Hz], for which signal recognition thresholds were about 2–4 dB lower than those obtained in the presence of a steady-state masker (Fig. 6.6d). At intermediate rates [e.g., 5 (Fig. 6.6c) to 20 Hz], signal recognition thresholds were not different from those measured in the presence of the steady-state masker (Fig. 6.6d). At faster rates of fluctuation (e.g., 40–80 Hz), SAM maskers caused increases in signal recognition thresholds of about 4–6 dB compared with steady-state maskers (Fig. 6.6d). Given that advertisement calls in Cope’s gray treefrogs have pulse rates of about 35–50 pulses/s, impaired recognition at faster rates of modulation is consistent with the idea that the temporal structure of the masker interfered with the subjects’ perception of the temporal pulse structure of the signal. The masking release seen at slow rates of masker modulation could be attributed to dip listening. Analyses of the target signals and maskers revealed that the maximum number of consecutive pulses fitting between the 6-dB down points of the sinusoidal modulation was 32 pulses in the most slowly fluctuating masker, decreasing exponentially to just one pulse in the most rapidly fluctuating maskers (Fig. 6.6e). Significant masking release was observed in the masking conditions for which the number of consecutive pulses occurring in a dip was nine pulses or more. In other words, females benefited from dip listening when they could catch acoustic glimpses of about nine pulses.
This result was consistent with parallel tests conducted in quiet showing the threshold number of consecutive pulses required to elicit positive phonotaxis was between six and nine pulses (Fig. 6.6e; Vélez and Bee 2011). Schwartz et al. (2013) have also demonstrated dip listening in the eastern gray treefrog. Together, these results suggest that the ability of female gray treefrogs to listen in the dips of amplitude modulated noise may be constrained by sensory mechanisms responsible for encoding temporal properties critical for species recognition, such as neurons in the midbrain that “count” interpulse intervals (Alder and Rose 1998; Edwards et al. 2002). Interestingly, however, not all frogs may benefit from dip listening. Parallel tests of call recognition in fluctuating noise with green treefrogs have so far failed to uncover evidence for dip listening (Vélez et al. 2012; Vélez and Bee 2013).
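The glimpse-size analysis described above reduces to simple arithmetic: a fully modulated sinusoidal envelope sits below its 6-dB down points (50% of maximum amplitude) for half of each modulation cycle, and pulse peaks spaced at the pulse period are counted within that window. A minimal sketch, assuming 100% modulation depth and the most favorable alignment of pulses and dips; the 45.5 pulses/s rate and 32-pulse signal follow the Vélez and Bee (2011) stimuli:

```python
def max_pulses_in_dip(mod_rate_hz, pulse_rate_hz=45.5, n_pulses=32):
    """Approximate the maximum number of consecutive pulse peaks whose maxima
    fall within one dip (between the 6-dB down points) of a fully modulated
    SAM masker.  Illustrative sketch, not the authors' analysis code."""
    dip_dur = 1.0 / (2.0 * mod_rate_hz)      # envelope < 50% for half a cycle
    pulse_period = 1.0 / pulse_rate_hz
    fit = int(dip_dur // pulse_period) + 1   # peaks that fit inside one dip
    return min(fit, n_pulses)                # capped by the 32-pulse signal
```

This reproduces the pattern reported above: all 32 pulses fit in the dips of the most slowly fluctuating (0.625 Hz) masker, while only a single pulse fits at the fastest rates.

```python
max_pulses_in_dip(0.625)  # → 32
max_pulses_in_dip(40.0)   # → 1
```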

Fig. 6.6

Dip listening in Cope’s gray treefrog, Hyla chrysoscelis. Waveforms of 3.2 s segments of the target signal (black) in the presence of chorus-shaped maskers (gray) representing a the steady-state control condition, and conditions in which maskers fluctuated sinusoidally (SAM) at rates of b 0.625 Hz and c 5 Hz. The target signal was a synthetic advertisement call composed of 32 pulses delivered at a rate of 45.5 pulses/s. The solid black line in b depicts the sine wave used to modulate the masker. The dashed lines in b and c illustrate the values at which the amplitude of the fluctuating maskers reaches 50 % of the maximum amplitude and mark the 6 dB down-points used to measure the number of pulses falling within dips of fluctuating maskers (e). d Threshold differences as a function of masker fluctuation rate; these differences are relative to the threshold measured in the control condition with a non-fluctuating, steady-state masker. The dashed line represents no-difference (i.e., 0 dB) from the control condition. e Maximum number of consecutive pulses falling in the dips of SAM maskers as a function of masker fluctuation rate. A pulse was considered to fall in a dip when its maximum amplitude fell between the 6 dB down points of the masker. Redrawn from Vélez and Bee (2011) and reprinted with permission from Elsevier

The anuran auditory system, with its two anatomically distinct sensory papillae for encoding airborne sounds, offers a superb model for studying CMR. The two sensory papillae can be considered separate “channels” for sensory input, providing a unique perspective on questions of within-channel versus across-channel mechanisms (Verhey et al. 2003). At a behavioral level, there is evidence to suggest that frogs exploit common envelope fluctuations across frequency channels in the recognition of advertisement calls. A release from masking of approximately 3 dB to 5 dB in the presence of comodulated maskers has been reported for females of H. chrysoscelis (Bee and Vélez 2008). At a physiological level, neural correlates of CMR have been documented for neurons in the auditory midbrain of the northern leopard frog (Goense and Feng 2012). Additional studies of the relative roles of within-channel and across-channel contributions to CMR, at both the behavioral and physiological levels, would make valuable contributions to our understanding of the mechanisms for hearing in noisy environments in anurans.

4.5 Summary and Future Directions

As illustrated in this section, evolution has equipped anurans with a number of tricks and tools to reduce the impact of masking noise in their environment. Interesting and important questions remain about the extent to which evolution can fine-tune the auditory system in ways that filter out the calls of heterospecifics. A goal for future studies should be to determine at a physiological level whether there are differences in frequency sensitivity between sympatric and allopatric populations of species having calls that spectrally overlap. Likewise, there are tantalizing data from cricket frogs (Acris crepitans) suggesting that population differences in auditory tuning may sometimes reflect the operation of natural selection arising from differences in habitat acoustics (Witte et al. 2005). Hypotheses about population differences in auditory tuning might be easily tested using portable instruments to generate audiograms based on the auditory brainstem response (e.g., Schrode et al. in press), or by using distortion product otoacoustic emissions (e.g., Meenderink et al. 2010). Another important question for future studies concerns the evolution of ultrasonic hearing in frogs. Have we already discovered all species that communicate using ultrasound, or do many more fascinating discoveries await the herpetologist equipped with the right equipment for recording and reproducing ultrasonic frequencies (Arch and Narins 2008)? Clearly, it will be necessary to investigate ultrasonic communication in a phylogenetic framework. Studies of spatial release from masking, dip listening, and CMR indicate that frogs and humans may exploit some of the same perceptual cues for listening in noisy environments.
Several features of the anuran auditory system – e.g., ears that function as pressure-gradient receivers, inner ears with multiple sensory papillae, and midbrain neurons that count interpulse intervals – make studies of the physiological mechanisms involved in achieving masking release particularly important (Bee 2012). Although the magnitude of masking release in frogs is often smaller than that observed in other vertebrates, frogs clearly benefit from exploiting spatial separation between signals and noise and temporal fluctuations in noise. Whether differences in magnitude between frogs and other vertebrates reflect real species differences or stem from differences in methodology remains an unanswered question. The use of more traditional psychophysical measures with frogs, or the development of entirely new techniques (e.g., Márquez et al. 2008) that measure phonotaxis behavior in stationary subjects (e.g., similar to trackballs and walking compensators used in many insect studies; Gerhardt and Huber 2002), might shed much needed light on this issue.

5 Auditory Scene Analysis

Acoustic scenes are often quite complex. They may comprise multiple sources concurrently producing sequences of spectrally rich sounds. Somehow, the auditory system has to make sense of this type of sensory input, and the processes by which it does are commonly studied under the rubrics of auditory scene analysis (Bregman 1990) or auditory grouping (Darwin and Carlyon 1995; Darwin 1997). The major question of interest concerns how the auditory system integrates the sounds generated by one source into a coherent representation (often termed an auditory stream or object) that is distinct from the sounds produced by other sources in the environment. In other words, how do auditory systems put together sounds that belong together and keep apart sounds that do not?

As discussed previously, frog calls often consist of discrete sound elements (e.g., notes or pulses) produced in sequence, and each element often has simultaneous sound energy across the frequency spectrum (Fig. 6.1). In this section, we take up the question of how receivers perceptually bind or integrate sounds arising from the same source. Two forms of perceptual integration can be distinguished (Bregman 1990): (1) the sequential integration of temporally separated sound elements produced by the same source (e.g., pulses, notes, and calls) and (2) the simultaneous integration of different components of the frequency spectrum of a sound originating from the same source (e.g., harmonics, formants). The human auditory system accomplishes sequential and simultaneous integration by exploiting a relatively small number of commonalities in the acoustic properties of sounds arising from a single source. Sounds produced by a given source are more likely to be grouped into one auditory stream when they share common acoustic properties (Bregman 1990; Darwin and Carlyon 1995; Darwin 1997; Carlyon 2004). In contrast, sounds with acoustic properties that differ substantially are more likely to be assigned to different auditory streams. In this section, we review the current state of knowledge concerning sequential and simultaneous integration in the context of hearing and sound communication in frogs. In addition, we review recent studies of frogs’ abilities to perceptually reconstruct auditory objects when signals are partially masked and to exploit schema-based cues in auditory grouping.

5.1 Sequential Integration

5.1.1 Auditory Streaming Based on Frequency Differences

The term auditory streaming is commonly used to refer to the ability to integrate sequences of sounds from one source into a coherent auditory stream that can be attended and followed through time (Bregman 1990; Carlyon 2004; Shamma and Micheyl 2010). Examples of auditory streaming at work are when we follow a melody line in polyphonic music or one person speaking in a noisy restaurant. Studies of auditory streaming in humans have made extensive use of simple sound sequences of two repeated, interleaved tones (A and B) differing in frequency or some other salient acoustic property (e.g., ABABAB…; Fig. 6.7a, b) (reviewed in Moore and Gockel 2002; Carlyon 2004). Frequency is an important cue for organizing such sequences. When the frequency separation (ΔF) between A and B tones is small, we hear a “trilled” sound jumping up and down in frequency (Fig. 6.7a). When ΔF is larger, something very different is perceived. The two tones perceptually “split” into different streams, one comprising all A tones (e.g., A–A–A–…) and one comprising all B tones (–B–B–B…), each perceived at half the rate of the original sequence (Fig. 6.7b). Tones similar in frequency are grouped together into a coherent stream, while tones different in frequency are assigned to different streams. Psychophysical studies using this or similar stimulus paradigms with fish, songbirds, and monkeys have revealed auditory streaming to be common among vertebrates (reviewed in Fay 2008). In humans, some of the same mechanisms responsible for ΔF-based streaming of tone sequences may underlie our abilities to exploit differences in F0 to assign concurrent voices to separate auditory streams (e.g., Brokx and Nooteboom 1982; Bird and Darwin 1998).

Fig. 6.7

Call timing and the utility of auditory streaming in frogs. a, b Schematic illustration (top spectrograms, bottom waveforms) of auditory streaming based on differences in frequency (ΔF) in the ABAB stimulus paradigm common in human psychoacoustic studies. a At small frequency separations between tones A and B, the tones are more likely to be perceptually “grouped” together to form one stream and listeners tend to hear a “trilled” sound jumping up and down in frequency (e.g., ABABABAB…). b In contrast, at large frequency separations, the A and B tones become perceptually separated into two different auditory streams and listeners tend to hear two tone sequences broadcast at half the rate, one comprising all A tones (e.g., A–A–A–A–…) and the other all B tones (e.g., –B–B–B–B…). c–f Schematic waveforms illustrating several types of call timing interactions that occur between neighboring male frogs in breeding choruses (see Chap. 5). Differences in frequency or other acoustic properties might allow receivers to perceptually separate interleaved calls (c, d, f) or notes (e) into different auditory streams. c–f Reprinted from Wells and Schwartz (2007) with permission from Springer

Auditory streaming based on frequency differences might be important for anuran communication, but this question has so far received little attention. Across species, male frogs calling in close proximity in a chorus exhibit a diversity of call timing behaviors to avoid obscuring the temporal structure of their calls (reviewed in Chap. 5). The end result is a sequence of temporally interleaved sounds arising from different sources (Fig. 6.7c–f). To what extent might receivers exploit individual differences or species differences in frequency (or other acoustic properties) to perceptually organize such sequences into coherent auditory streams that correspond to different sources? Nityananda and Bee (2011) took up this question in a study of Cope’s gray treefrog. In single-stimulus tests, they measured female responses to a target signal that simulated a conspecific call and consisted of a short pulse train with a pulse rate of 45.5 pulses/s (Fig. 6.8a). In the absence of other sounds, this signal elicited robust phonotaxis. The key to the experiment was that in some tests, the pulses of the target signal were temporally interleaved with a behaviorally neutral “distractor” consisting of a continuous pulse train (also 45.5 pulses/s). Each time the signal was presented with the distractor, the instantaneous pulse rate was effectively doubled to 91 pulses/s (Fig. 6.8a). Importantly, control tests had demonstrated that females were selective for the conspecific pulse rate of 45.5 pulses/s and discriminated strongly against signals with the faster pulse rate (see also Bush et al. 2002; Schul and Bush 2002). Therefore, Nityananda and Bee (2011) reasoned as follows. If females assigned target and distractor pulses to different auditory streams, then they should exhibit positive phonotaxis toward the attractive percept of the target signal. 
If, however, they perceptually “fused” or integrated pulses from the target and the distractor into the same stream, this should result in an unattractive percept based on pulse rate. When the target and distractor had the same carrier frequency (ΔF = 0 semitones), females exhibited little interest in the target (Fig. 6.8b). But on trials when the carrier frequency of the distractor was sufficiently far removed (e.g., ΔF ≥ 6 semitones) from that of the target, but still within the empirically determined hearing range, females exhibited phonotaxis toward the target (Fig. 6.8b). These data are consistent with the hypothesis that auditory streaming was possible based on differences in frequency. That improvements in performance were observed at frequency separations of 6 semitones was important, as this approximates the difference in frequency between the two spectral peaks present in conspecific advertisement calls and the dominant spectral peak in the calls of the synchronously and syntopically breeding American toad (Bufo americanus) (Fig. 6.8c).
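Frequency separations like those manipulated by Nityananda and Bee (2011) are defined on a logarithmic scale, with ΔF in semitones equal to 12·log2(f2/f1). A minimal helper illustrates the relationship; the 1.3 and 2.6 kHz values are the approximate spectral peaks of the H. chrysoscelis call:

```python
import math

def delta_f_semitones(f1_hz, f2_hz):
    """Frequency separation in semitones: 12 * log2(f2 / f1)."""
    return 12.0 * math.log2(f2_hz / f1_hz)
```

The two spectral peaks of the conspecific call are an octave (12 semitones) apart, and a 6-semitone step from the 1.3 kHz peak lands near 1.84 kHz (1300 × 2^0.5), consistent with the behaviorally important 6-semitone separation noted above.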

Fig. 6.8

Auditory streaming in Cope’s gray treefrog, Hyla chrysoscelis. a Spectrogram (top) and waveform (bottom) of the interleaved target signal and distractor pulse trains. When the two pulse trains are perceptually integrated into a single auditory stream, the resulting pulse rate is 91 pulses/s and unattractive to females. When the target signal and the distractor are segregated into two streams, the target signal has an attractive pulse rate of 45.5 pulses/s. b Normalized latency to reach the target speaker as a function of frequency separation (ΔF). Values equal to 1.0 represent latencies similar to those in reference conditions lacking a distractor, whereas values close to 0 represent very slow behavioral responses or no response at all. Compared to the condition in which both pulse trains had the same frequency (ΔF = 0 semitones), responses were significantly faster when the frequency separation was equal to or greater than six semitones (asterisks). c Spectrogram of the acoustic environment of a mixed-species chorus that included Cope’s gray treefrogs and several other frog species. Arrows depict the frequency range of the distractors when the target signal was the low spectral peak of the call (1.3 kHz, right arrow) and the high spectral peak of the call (2.6 kHz, left arrow). Redrawn from Nityananda and Bee (2011) and reprinted with permission from the authors

5.1.2 Common Spatial Origin

In humans, spatially related cues can be effective in sequential integration when there is only one sound source present in the environment (reviewed in Darwin 2008). Thus far, only a few studies have investigated the role of spatial cues in sequential integration in frogs. Currently available evidence from several studies suggests that frogs may be willing to group temporally separated call elements over fairly large spatial separations. For example, one study of the EVR in males of the Australian quacking frog (Crinia georgiana) indicated that receivers perceptually group sounds coming from opposite directions. In this species, males produce a multinote call that sounds very much like a quacking duck. During episodes of vocal competition with neighbors, males attempt to match the number of “quacks” in their neighbors’ calls. In a field playback test, Gerhardt et al. (2000) presented males with two sequential four-note calls from speakers separated by 180°. The timing of the two calls was such that they had the same overall temporal pattern as an eight-note call. Somewhat surprisingly, males responded to the playbacks as if they had heard a single eight-note call coming from one location, instead of two consecutive four-note calls coming from opposite directions. Males continued to show evidence of grouping the two four-note calls together even when the second call in the sequence was attenuated by 6 dB. In terms of auditory grouping, male quacking frogs are fairly permissive of spatial separation between sounds comprising behaviorally meaningful temporal sequences. As studies of túngara frogs (Engystomops pustulosus, formerly Physalaemus pustulosus) and Cope’s gray treefrogs (H. chrysoscelis) indicate, female frogs are also willing to group widely separated sounds.

Male túngara frogs produce a simple call consisting of a whine only, and a complex call consisting of a whine followed by one or more chucks (Fig. 6.9a). Female túngara frogs exhibit positive phonotaxis toward speakers broadcasting whines alone but not chucks alone (Ryan 1985; Ryan and Rand 1990). Based on these findings, Farris et al. (2002, 2005) tested the hypothesis that common spatial origin promotes the sequential integration of whines and chucks into coherent representations of complex calls. In a circular arena, they broadcast whines and chucks in the natural temporal sequence from either the same speaker (angular separation of 0°) or from two speakers separated by 45°, 90°, 135°, or 180° (Fig. 6.9b). Females significantly oriented toward the chuck in conditions in which the two components of the call were separated by 45°, 90°, and 135°, but not by 180° (Fig. 6.9c). These results provide strong evidence for auditory grouping by frogs, but they refute the hypothesis that common spatial origin is necessary for grouping signal components separated in time. Based on their results, Farris et al. (2002) suggested whines and chucks are weighted differently in making “what” and “where” decisions; information about species identity (“what”) is primarily encoded in the whine, while information about location (“where”) is primarily encoded in the chuck. More recently, Farris and Ryan (2011) demonstrated that females make relative comparisons and group whines and chucks in relatively closer proximity when multiple chucks are separated from a single whine.

Fig. 6.9

Sequential integration in túngara frogs, Engystomops pustulosus. a Spectrogram and waveform of a túngara frog complex call composed of a whine and a chuck. b Schematic illustration of the test arena and the positions of the speakers that broadcast the whine and chucks from a common spatial origin (0°) or from different spatial origins (45°, 90°, 135°, and 180°). c Angles at which females exited the test arena in response to each condition. Females showed significant orientation toward the whine alone but not toward the chuck alone. When the whine and the chuck were broadcast from different locations, orientation toward the chuck was significant at angular separations of 45°, 90°, and 135° but not 180°. Reprinted from Farris et al. (2002) with permission from Karger AG

Bee and Riemersma (2008) showed that common spatial origin is also not necessary for sequential integration in Cope’s gray treefrogs. Females of this species are highly selective for calls with conspecific pulse rates (approximately 35–50 pulses/s), which are about twice as fast as the pulse rate of calls produced by males of the eastern gray treefrog (H. versicolor). Bee and Riemersma (2008) presented females of H. chrysoscelis with two interleaved pulse sequences in which the pulses from each sequence were interdigitated. Each sequence had a pulse rate of 25 pulses/s (similar to H. versicolor calls), but the composite of both sequences combined had the preferred pulse rate of 50 pulses/s (as in conspecific calls). Hence, if the two sequences were perceptually integrated, the percept should have been one of an attractive conspecific call, whereas perceptual segregation should have resulted in the percept of two unattractive calls. On separate trials, the two interleaved sequences were separated by 0°, 45°, 90°, or 180°. The results showed that females were very permissive of spatial separation, and were able to integrate the two pulse trains even at a spatial separation of 180°. Together with results from quacking frogs and túngara frogs, these results suggest that common spatial origin may be a relatively weak acoustic cue for sequential integration in frogs. The permissiveness of sequential integration based on spatial cues may explain the failure of spatial separation to improve call recognition in some species (Schwartz 1993; Schwartz and Gerhardt 1995).
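The stimulus logic of Bee and Riemersma (2008), in which two slow pulse trains interdigitate to form a composite at the conspecific rate, can be sketched with pulse onset times. The 25 pulses/s rate follows the study; the half-period offset is the arrangement that doubles the composite rate, and the train length here is an arbitrary illustrative choice:

```python
def interleaved_onsets(rate_hz=25.0, n=8):
    """Onset times (s) of two pulse trains at rate_hz, the second offset by
    half a period; the merged, interdigitated train runs at twice rate_hz."""
    period = 1.0 / rate_hz
    train_a = [i * period for i in range(n)]
    train_b = [t + period / 2.0 for t in train_a]
    return sorted(train_a + train_b)

onsets = interleaved_onsets()
gaps = [b - a for a, b in zip(onsets, onsets[1:])]
```

Each component train has 40-ms inter-onset intervals (25 pulses/s, near the H. versicolor rate), while every gap in the merged train is 20 ms (50 pulses/s, the attractive conspecific rate), which is why perceptual integration versus segregation of the two trains predicts opposite behavioral outcomes.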

5.2 Simultaneous Integration

5.2.1 Harmonicity

Acoustic signals often contain harmonics that are integer multiples of the fundamental frequency (F0). Instead of hearing simultaneous pure tones at each harmonic frequency, we tend to hear harmonic complexes as single, fused sounds with one pitch corresponding to F0. Thus, our auditory systems group simultaneous sounds sharing a common F0 (i.e., “harmonicity”) into a single coherent percept. The role of harmonicity as an auditory grouping cue has been studied quite extensively in humans using vowel sounds and complex tones (reviewed in Darwin and Carlyon 1995; Darwin 1997; Carlyon and Gockel 2008). A slight “mistuning” of one harmonic can cause listeners to “hear out” that spectral component, or perceptually segregate it from the rest of the harmonic complex. Under these conditions, listeners hear two simultaneous sounds: one complex tone and one separate pure tone corresponding to the mistuned harmonic. Therefore, inharmonic relationships between the spectral components of concurrent sounds favor their segregation into different auditory streams. Not surprisingly, then, differences in F0 between simultaneously spoken sentences facilitate identification and recognition of target speech (e.g., Brokx and Nooteboom 1982; Bird and Darwin 1998).

Frogs are also sensitive to harmonicity. Using reflex modification, Simmons (1988a) demonstrated that thresholds for detecting a two-tone complex in noise were about 10 dB higher in green treefrogs when the two tones were inharmonically related than when they were harmonically related. These results confirmed that frogs process harmonic sounds differently than inharmonic ones. In a subsequent study of this species, Gerhardt et al. (1990) failed to find evidence that harmonicity influenced female preferences in phonotaxis experiments. When given a choice between harmonic and inharmonic synthetic calls, females chose randomly between the two alternatives. In the presence of background noise, there was no evidence for higher detectability of the harmonic alternative, nor any preference for the harmonic stimulus. In contrast to green treefrogs, two-alternative choice tests with the closely related barking treefrog (H. gratiosa) revealed that females of this species discriminate between harmonic and inharmonic synthetic calls (Bodnar 1996). Interestingly, females actually preferred inharmonic alternatives to harmonic ones when no frequency modulation (FM) was present in the signals. When FM was added, females preferred the harmonic alternative. Bodnar’s (1996) study also demonstrated that females were very sensitive to harmonicity, with a mistuning of one spectral component by 1.1 % sufficient for call discrimination. Studies of male frogs reveal similarly conflicting patterns. Simmons et al. (1993) used the EVR of male green treefrogs to investigate harmonicity as a call recognition cue in the laboratory. There were no significant differences in male vocal behavior (number of evoked calls and latency to first vocal response) in response to harmonic and inharmonic synthetic calls.
In contrast, Simmons and Bean (2000) conducted field experiments testing the EVR in North American bullfrogs and found that they could discriminate between harmonic calls and calls with one spectral component mistuned by 2.8 %.

5.2.2 Common Onsets and Offsets

Another cue that contributes to simultaneous integration in humans is onset and offset synchrony (reviewed in Darwin and Carlyon 1995; Darwin 1997). Spectral components that start and end at the same times are more commonly grouped together into one auditory object. In contrast, frequency components that start, or end, at sufficiently different times from the other components are usually assigned to different auditory objects. To our knowledge, no study of anurans has investigated the effects of onset/offset synchrony. Indeed, we are aware of only one study that has investigated the role of common onsets/offsets in the communication system of a nonhuman animal. Geissler and Ehret (2002) showed that synthetic pup wriggling calls with a harmonic having an asynchronous onset or offset reduced the probability that female mice (Mus domesticus) would respond appropriately. Similar studies should be conducted with male and female frogs to investigate the role of common onsets/offsets as an auditory grouping cue in anuran communication.

5.2.3 Common Spatial Origin

There is a general consensus that spatial cues play relatively minor roles in simultaneous integration in humans (reviewed in Darwin 2008). To date, only one study has investigated the role of common spatial origin in allowing frogs to integrate the simultaneous spectral components common in multiharmonic acoustic signals. Bee (2010) took advantage of the spectral preferences of female Cope’s gray treefrogs to test the hypothesis that common spatial origin promotes simultaneous integration. The advertisement calls of this species consist of pulses with two harmonically related spectral peaks centered around 1.1–1.4 and 2.2–2.8 kHz (Fig. 6.10a). Females will approach calls with only one or the other spectral peak, but they generally prefer calls having both spectral peaks (Gerhardt 2005; Gerhardt et al. 2007; Bee 2010). Using two-alternative choice tests, Bee (2010) offered females a choice between two calls that were either spatially coherent or incoherent and that alternated in time with each other from opposite sides of a test arena (Fig. 6.10a). In the spatially coherent alternative, both harmonics were broadcast simultaneously from the same speaker on one side of the arena. In the spatially incoherent alternative, each harmonic was broadcast from one speaker in a pair of speakers located on the opposite side of the arena from the spatially coherent call. Across different trials, the separation between the sources of the harmonics in the spatially incoherent alternative was 7.5°, 15°, 30°, or 60° (Fig. 6.10a). At all angular separations tested, females significantly (or nearly so) preferred the spatially coherent alternative. In fact, females preferred the spatially coherent alternative to the incoherent one in proportions not different from their preferences for calls with two spectral peaks over those with just one spectral peak (Fig. 6.10b). These results support the hypothesis that common spatial origin promotes simultaneous integration in gray treefrogs.

Fig. 6.10

Simultaneous integration in Cope’s gray treefrog, Hyla chrysoscelis. a Schematic representation of the circular test arena, the locations of the speakers, and the power spectra of the spatially coherent bimodal call (left) and the 1.1 kHz (top right) and 2.2 kHz (bottom right) unimodal calls. b Expected (squares) and observed (circles) proportions of subjects choosing each speaker when the harmonics of the spatially incoherent stimuli were separated by angles (θ) equal to 7.5°, 15°, 30°, or 60°. Expected proportions were based on results from two-alternative choice tests pairing a spatially coherent bimodal call against a unimodal call with the specified frequency. Redrawn from Bee (2010) and reprinted with permission from the American Psychological Association

5.3 Auditory Induction

In noisy environments, receivers may occasionally have to deal with signals that are partially masked by short, intermittent, loud sounds in the environment. While at a cocktail party, for instance, part of one person’s sentence may be momentarily masked by the loud cough or sneeze of a nearby guest. As studies of phonemic restoration (Warren 1970) have shown, our auditory system is quite good at perceptually reconstructing speech elements that are partially masked. Studies of a phenomenon known as the continuity illusion have generalized these results by showing, for example, that inserting a silent gap into a tone and then filling the gap with noise induces the perceptual illusion of an uninterrupted tone that continues through the noise (reviewed in King 2007). In both instances, it is as if our auditory system is able to “fill in” missing sound elements. Together, phonemic restoration and the continuity illusion are examples of auditory induction, which refers to the auditory system’s ability to reconstruct or restore masked or missing elements of sound (King 2007). Importantly, auditory induction is not uniquely human and has been demonstrated using vocalizations in songbirds (Braaten and Leary 1999; Seeba and Klump 2009) and monkeys (Miller et al. 2001; Petkov et al. 2003). In contrast, two recent studies failed to find strong evidence that frogs experience auditory induction (Schwartz et al. 2010b; Seeba et al. 2010).

Using two-alternative choice tests, Seeba et al. (2010) took advantage of female preferences for longer calls (Fig. 6.11a) to test the hypothesis that female Cope’s gray treefrogs (H. chrysoscelis) perceptually reconstruct discrete pulses of male advertisement calls. When females were given a choice between a call with a normal pulse structure and a call with silent gaps produced by removing groups of pulses, females unanimously chose the complete call over the “gap call” (Fig. 6.11b). Importantly, a “gap-filled call” created by filling the gaps with bursts of band-limited noise was chosen unanimously over a gap call (Fig. 6.11c). On the surface, this result seemed to support the auditory induction hypothesis. However, the question remained as to whether subjects actually perceived illusory pulses during the noise bursts. Seeba et al. (2010) reasoned that if females perceptually restored the missing pulses in the gap-filled call, then they should preferentially choose it (20 real pulses + 15 illusory pulses) over a shorter 20-pulse call, and it should be as attractive as a 35-pulse call of equivalent duration. These predictions were not supported. Females chose the longer gap-filled call and the shorter 20-pulse call in proportions not significantly different from chance expectations (Fig. 6.11d), but exhibited a significant preference for a 35-pulse call over the gap-filled call of equivalent duration (Fig. 6.11e). Schwartz et al. (2010b) similarly exploited female preferences for longer duration calls to investigate auditory induction in eastern gray treefrogs (H. versicolor), but they also found no evidence that females heard illusory pulses when real pulses were replaced with noise.

Fig. 6.11

A test of auditory induction in Cope’s gray treefrogs, Hyla chrysoscelis. Schematic waveforms and percentage of females choosing each alternative stimulus (A or B) in two-alternative choice tests. a Females preferred longer calls with more pulses (35 pulses vs. 20 pulses). b Females also preferred a 35-pulse call over an equivalent-duration call from which 15 pulses (3 groups of 5) had been removed (gap call). c Consistent with the auditory induction hypothesis, females preferred calls in which the gaps were filled with noise bursts (gap-filled call) over the gap call. As illustrated in (d, e), however, females did not perceive the gap-filled call as a call with 35 pulses. This conclusion follows because (d) females chose randomly between the gap-filled call and a shorter 20-pulse call, but (e) strongly preferred a 35-pulse call to a gap-filled call of equivalent overall duration. Redrawn from Seeba et al. (2010) and reprinted with permission from Elsevier

These results with gray treefrogs suggest the provisional conclusion that frogs, unlike birds and mammals, may be incapable of perceptually reconstructing signals momentarily interrupted by noise. Seeba et al. (2010) and Schwartz et al. (2010b) discuss various hypotheses for these negative results. One such hypothesis is that frogs cannot restore temporally discrete elements in pulsatile calls; these authors suggested that frogs might nevertheless be able to restore missing portions of continuous sounds, such as a long call note. Preliminary data from studies of auditory induction in túngara frogs, however, have so far provided little evidence that females are able to perceptually restore short segments that are deleted from the normally continuous whine portion of the call and filled with noise (AT Baugh, MA Bee, and MJ Ryan, unpublished data).

5.4 Schema-Based Auditory Grouping

Thus far, we have considered only forms of auditory scene analysis based on commonalities in grouping cues present in the spectral and temporal properties of sound elements composing vocal signals. But the formation of auditory groups can also occur based on a listener’s prior experiences or expectations in a process Bregman (1990) referred to as schema-based scene analysis. There is little a priori reason to limit schema-based analyses to experiential influences: evolution might also equip listeners with hard-wired schemas for analyzing acoustic scenes. To date, this general question has received little attention in frogs. Some frogs produce complex vocalizations comprising distinctly different sound elements that follow statistical rules of ordering (e.g., Larson 2004; Gridi-Papp et al. 2006). Schema-based auditory grouping might occur if receivers have evolved templates for call recognition that incorporate the same rules governing signal production. For example, the complex call of the túngara frog consists of a whine followed by one or more chucks due to the morphological constraints of call production (Gridi-Papp et al. 2006). Farris and Ryan (2011) have shown that female túngara frogs are sensitive to this temporal order when it comes to grouping whines and chucks together, indicating for the first time that schema-based auditory grouping might be important in frogs.

5.5 Summary and Future Directions

The rigorous study of auditory scene analysis is still in its infancy in frogs, yet scene analysis represents an important aspect of perceiving acoustic signals in complex and noisy acoustic environments (Bee 2012). Many important questions remain for future work. It will be particularly important to determine the spectral, temporal, and spatial cues that frogs use to perceptually organize acoustic scenes. For example, do other cues, besides frequency separation, promote the formation of multiple auditory streams? To what extent do multiple cues interact? Is auditory streaming necessary for receivers to make sense of the call timing interactions of nearby males? Much work also remains before we understand the importance of harmonicity as an auditory grouping cue in frogs. Previous results from studies of harmonicity in treefrogs and bullfrogs are important because they highlight two fundamental principles in the study of hearing and sound communication in frogs. First, receivers may sometimes be able to detect acoustic differences (here, in harmonic structure) that are not necessarily used in call discrimination tasks; studies of communication behaviors may not always tell the whole story about anuran hearing. Second, similar studies in multiple species, sometimes even closely related ones, can yield contrasting outcomes. These findings highlight the need for comparative studies of hearing in these animals. Questions about the importance of common onsets/offsets as an auditory grouping cue have yet to be addressed in frogs. Studies of the influence of onset/offset synchrony on call recognition, and its potentially correlated effects on combination-sensitive neurons in the auditory system (e.g., Fuzessery and Feng 1982), are needed to understand this potentially important auditory grouping cue.

The few presently available studies suggest some potentially interesting differences between humans and frogs in terms of the role spatial cues play in sequential and simultaneous integration. While spatial cues appear to play a limited role in simultaneous integration in humans, they may be important in frogs. In contrast, spatial cues appear to play a role during sequential integration in humans, but perhaps only a minor one in frogs. Efforts to understand how the frog’s pressure-gradient receiver contributes to exploiting spatial cues in the perceptual organization of acoustic scenes represent an exciting frontier in research on anuran hearing. An important direction for future studies should be to rigorously quantify the spatial heterogeneity present in frog choruses. Recent developments in microphone array technology provide the technological basis for doing so (Jones and Ratnam 2009; Bates et al. 2010). Similarly, it is presently unclear why frogs appear not to exhibit auditory induction. Until additional data from more frog species become available, however, conclusions about differences in auditory induction between humans, frogs, and other animals must remain provisional. Additional studies of schema-based auditory scene analysis in frogs are also badly needed.

6 Multimodal Cues

In dense social aggregations, or any time background noise levels are high, receivers may benefit from using multiple sensory modalities for communication. In human listeners, for instance, speech detection and intelligibility improve when speech sounds are accompanied by corresponding lip gestures (Grant and Seitz 2000; Schwartz et al. 2004). Indeed, environmental noise may generally select for the use of multimodal signals in animals (Hebets and Papaj 2005). Early work established that some frogs, such as the coqui frog (Eleutherodactylus coqui) and the white-lipped frog (Leptodactylus albilabris) of Puerto Rico, are sensitive to acoustic signals and coincident seismic signals produced when the vocal sac “thumps” the substrate (Lewis and Narins 1985; Lewis et al. 2001; see also Sect. 5.2.5 in Chap. 5). A few recent studies indicate that frogs may make more use of multimodal information than previously thought by showing that receivers of both sexes also attend to visual cues associated with the vocal sac (Narins et al. 2003, 2005; Rosenthal et al. 2004; Taylor et al. 2008, 2011; Taylor and Ryan 2013; Gomez et al. 2009; Richardson et al. 2010). Beyond demonstrating this sensitivity to visual cues, these studies firmly establish the use of robotics and video playbacks as important new tools in the study of hearing and sound communication in anurans. To the best of our knowledge, no published study has investigated whether reliance on visual cues associated with an inflating vocal sac improves the ability of frog receivers to solve cocktail-party-like problems. Determining the extent to which male and female frogs may be lip-reading (or vocal-sac-reading, to be more accurate) will be an important next step in understanding how frogs perceptually organize complex audio-visual scenes.

7 Conclusions

In this chapter, we have adopted what might be called a “cocktail party perspective” on anuran acoustic signal perception in noisy environments. Given that essentially modern frogs already hopped the earth while dinosaurs still roamed it (Wells 2007), the evolutionary success of this vertebrate lineage is a testament to their ability to overcome cocktail-party-like problems associated with breeding in noisy social environments. Experimental studies conducted over the last three decades, many of them in the last few years, have begun to uncover the mechanisms by which frogs cope with high levels of masking noise and competing sound sources in complex acoustic scenes. Current evidence suggests that frogs exploit some of the same spectral, temporal, and spatial cues that humans also use to achieve a release from auditory masking and to form perceptually coherent auditory groups. There is, however, evidence to suggest that at least some of these cues may differ in relative importance or be processed differently in frogs and humans. This is not surprising given some of the evolved differences between the peripheral and central auditory systems of amphibians and mammals. And it is precisely the evolutionary history of the vertebrate auditory system that makes the study of anuran acoustic signal perception in noisy environments so important. Many of the basic features involved in hearing in noisy, multisource environments probably arose early during the evolution of vertebrate hearing (Popper and Fay 1997; Fay and Popper 2000; Lewis and Fay 2004). But it is important to also bear in mind that some key features of the vertebrate auditory system have had multiple evolutionary origins (Manley et al. 2004). Tympanic hearing, for example, appears to have arisen independently in each major lineage of tetrapod vertebrates (Christensen-Dalsgaard and Carr 2008). Evolution by natural selection is well known for finding a diversity of solutions to common problems. 
Therefore, it is certainly not unreasonable to expect (and in fact, it would be unreasonable not to expect) that different vertebrate lineages possess different suites of hearing mechanisms, comprising some ancient ones inherited from a distant common ancestor, as well as some novel ones that have been derived or elaborated in a particular lineage since its divergence from other lineages. Sometimes solutions to cocktail-party-like problems may be evolutionarily homologous across taxa; other times, evolution may have created analogous solutions to the problem. This has profound implications for how we study animal acoustic communication (Bee and Micheyl 2008). The only way to examine both ancient, shared mechanisms and more recently derived novel mechanisms is to take a broad comparative approach to understand how animals in different lineages solve similar problems. Given the number of questions that remain concerning how frogs perceive sounds and acoustic signals in noisy environments, anuran amphibians will continue to be an important taxon for this line of comparative research on hearing and sound communication over the coming decades.