1 Introduction

Nearly all of the approximately 6630 known species of anurans (frogs and toads) engage in acoustic communication (reviewed in Gerhardt and Huber 2002; Kelley 2004; Ryan 2001; Wells 2007). In most species, vocalizations are produced primarily or exclusively by males, and they serve a variety of important functions in reproductive and social behavior. Female frogs listen to the vocalizations produced by males to select not just mates of their own species, but also mates of particularly high quality (Ryan and Rand 1993; Welch et al. 1998). Male frogs listen to vocalizations to determine the size, fighting ability, and individual identity of their competitive rivals (Bee et al. 2016). The overwhelming importance of acoustic communication, and hence the auditory system, in the behavior of anurans has made them important models for answering fundamental questions in animal behavior, evolutionary biology, and auditory neuroscience. In this article, we review research describing how one key feature of the anuran auditory system—their internally coupled ears—functions in the contexts of hearing and sound communication.

Many anurans communicate in environments that are both physically and acoustically complex, and they often do so at night under low-light conditions. Physical complexity exists because frogs communicate in habitats that can include water, aquatic vegetation, herbaceous plants, shrubs, and trees. Acoustic complexity exists because male frogs typically call at high amplitudes (Gerhardt 1975) in dense aggregations. The resulting “choruses” may consist of hundreds of calling individuals from a dozen or more species. Hence, frog choruses are characterized by high levels of background noise and acoustic clutter (Narins and Zelick 1988).

Fig. 1

Anatomy of internally coupled ears in hylid treefrogs. a An adult female of Cope’s gray treefrog, Hyla chrysoscelis, with a white arrow indicating the tympanum. b Schematic of the middle ear of the Pacific treefrog, Pseudacris (formerly Hyla) regilla, redrawn from Lombard and Straughan (1974). c Magnetic resonance imaging (MRI) scan of a Cope’s gray treefrog made with a 9.4-T magnet with 31-cm bore, redrawn from Bee (2015). Abbreviations: col columella (red), ec extracolumella (yellow), et Eustachian tube, mc mouth cavity, op operculum (green), opm opercularis muscle (brown), ss suprascapula, sc (inner ear) semicircular canal, t tympanum

A common behavior exhibited by both male and female frogs in response to hearing conspecific vocalizations is “phonotaxis,” a stereotyped behavioral approach toward individual calling males (Gerhardt 1995). Female frogs typically select their mate by exhibiting phonotaxis toward a calling male (Feng et al. 1976; Passmore et al. 1984; Rheinlaender et al. 1979), ultimately approaching very closely and even touching him to initiate mating. Male frogs commonly exhibit phonotaxis toward another nearby calling male as a key component of their aggressive response if the rival is perceived as a threat to possession of a calling site or territory (Bee 2003; Narins et al. 2003; Ursprung et al. 2009).

Accurate phonotaxis in dark, physically complex, and acoustically cluttered environments illustrates two key functions of the frog’s internally coupled ears: sound source localization and sound source segregation. Sound source localization refers to the auditory system’s ability to determine the position of a source in three-dimensional space. Sound source segregation (or “auditory scene analysis”; Bregman 1990) refers to the ability of the auditory system to parse the composite sound pressure wave generated by multiple, simultaneously active sources and assign its constituent parts to their correct source (Yost et al. 2008). The reproductive and social behaviors of frogs require that they accurately perform both tasks.

Here, we review research aimed at discovering how the internally coupled ears of frogs contribute to solving problems of sound source localization and segregation. We review anatomical, biophysical, neurophysiological, and behavioral studies in an attempt to link the structure and function of the internally coupled ears of frogs to the behavioral performance of individuals engaged in various localization and segregation tasks. Readers are referred to previous reviews of source localization (Christensen-Dalsgaard 2005, 2011; Gerhardt and Huber 2002; Rheinlaender and Klump 1988) and segregation (Bee 2012, 2015; Vélez et al. 2013) in frogs for additional information not covered here. We focus this review on treefrogs in the genus Hyla because they are the most thoroughly studied frogs in terms of source localization and segregation, and because their experimental tractability makes them promising models for future research on how animals with internally coupled ears localize and segregate sound sources.

Fig. 2

Directionality of the tympanum in Hyla versicolor. The plot shows vibration amplitude as a function of source incidence angle in azimuth in \(30^{\circ }\) steps (relative to the snout at \(0^{\circ }\)). The center of the plot corresponds to a vibration amplitude of 10 nm; distance between the concentric reference circles is 10 dB. Data are shown for three frequencies: 1080 Hz (blue circles), 1520 Hz (red squares), and 2200 Hz (green triangles). The tympanum’s frequency response is shown at each angle (solid black lines), and the response from \(60^{\circ }\) is re-plotted as a gray area behind each spectrum. Note that the greatest directionality is generally seen at frequencies intermediate between the two peaks of the bimodal frequency response of the tympanum (e.g., 1520 Hz) and that the two peaks correspond approximately to the lower peak (e.g., 1080 Hz) and upper peak (e.g., 2200 Hz) of conspecific advertisement calls. Redrawn from Jørgensen and Gerhardt (1991)

2 Anatomy

The ears typical of most modern anurans consist of tympana, auditory ossicles, air-filled middle ear cavities, and large, permanently open Eustachian tubes (Fig. 1). The tympanum is large and in most species consists of relatively undifferentiated skin and sits flush with the side of the head (Fig. 1a). Anurans have a single middle ear bone, the columella, which contacts the tympanum via a cartilaginous structure called the extracolumella (Fig. 1b). The columella is homologous to the mammalian stapes, but had a non-auditory function in the early tetrapods (Clack 1997). The air-filled middle ears of anurans are clearly internally coupled through the Eustachian tubes and mouth cavity (Fig. 1c) (Narins et al. 1988).

Although the early history of frogs is not particularly well documented in the fossil record (Roček 2000), the available data indicate that a functional, tympanic ear is an ancestral trait in the anurans. The Triassic proanuran Triadobatrachus had some anuran characteristics, but it also clearly had a different Bauplan from modern anurans (or is a larval form). Fossils are lacking between Triadobatrachus and the earliest, essentially modern anurans, Prosalirus and Vieraella, which are known from the early Jurassic. Moreover, the ear region is not well preserved in the earliest known anuran species. However, the slightly later Notobatrachus had a middle ear that resembles that of modern anurans (Báez and Basso 1996). This finding suggests that the anuran tympanic ear probably emerged in the Triassic or Permian (based on characteristics of their supposedly tympanic, amphibamid ancestors). Given that the major groups of tetrapod vertebrates diverged earlier, in the Carboniferous, the anuran tympanic ear arose independently of tympanic ears in other tetrapod lineages (Christensen-Dalsgaard and Carr 2008; Grothe and Pecka 2014; Schnupp and Carr 2009). The exact selection pressure driving the evolution of the tympanic ear in anurans is unknown. It may be that the origin of the tympanic ear reflected an early specialization for acoustic communication. Alternatively, the increased sensitivity and directionality at higher frequencies provided by a tympanic ear may have been adaptive in simply providing the animals with important new information about their environment.

Fig. 3

Directionality of the tympanum in Hyla chrysoscelis. The plot shows the transfer function of tympanum vibration velocity (color, dB re 1 mm/s/Pa) as a function of direction (x-axis, ipsilateral angles positive, frontal direction is 0) and frequency. a Lungs inflated. b Lungs deflated. c Lungs manually re-inflated. Note the two peaks of tympanum vibration near 1.4 and 2.5 kHz and the pronounced directionality between these two frequencies. Note also how the spectral peak at 1.4 kHz (red arrow) and strong directionality observed in the range of 1.6–1.9 kHz (black arrow) disappear when the lungs are deflated. The data depicted here are from a male frog. Reprinted from Journal of Comparative Physiology A, volume 200, M. S. Caldwell, N. Lee, K. M. Schrode, A. R. Johns, J. Christensen-Dalsgaard, M. A. Bee, “Spatial hearing in Cope’s gray treefrog: II. Frequency-dependent directionality in the amplitude and phase of tympanum vibrations,” pp. 285–304, Copyright (2014), with permission from Springer

3 Biophysics

In addition to transmission through the tympanum and well-developed middle ear structures (Fig. 1), sound can enter the frog ear through several different pathways. Evidence suggests that sound can also enter through the body wall, especially above the lungs, as well as through the nares, the mouth floor, and through extratympanic pathways that most likely involve sound-induced vibrations of the skull (i.e., bone conduction) transduced via the operculum or by coupling via the round window (Christensen-Dalsgaard 2005). Experiments on two hylid species, Pseudacris (formerly Hyla) regilla and Hyla versicolor, showed that a substantial part of the low-frequency auditory sensitivity remained even after the tympanum was removed, suggesting considerable extratympanic sensitivity in these species below 1 kHz (Lombard and Straughan 1974).

Biophysical measurements of tympanum vibrations have been undertaken using laser Doppler vibrometry in four Hyla species for which source localization and segregation have also been investigated: H. versicolor (Jørgensen and Gerhardt 1991), H. gratiosa (Jørgensen and Gerhardt 1991), H. cinerea (Michelsen et al. 1986), and H. chrysoscelis (Caldwell et al. 2014). In all four species, the tympanum vibration spectrum exhibits the inherent directionality expected for an ear that functions as a pressure difference receiver (Figs. 2, 3). The typical directional pattern, depicted in Fig. 2 with data from H. versicolor (Jørgensen and Gerhardt 1991), is ovoidal in shape, with a relatively steep gradient across the midline and a shallower gradient more lateral. In H. chrysoscelis, these ovoidal patterns of directionality are largely symmetrical about the transverse plane. As described later, this forward–rearward symmetry provides a simple biophysical explanation for the inability of females of this species to distinguish sounds coming from forward versus rearward directions in some behavioral tests of sound source localization (Caldwell and Bee 2014; Caldwell et al. 2014).

Estimates of the maximum directionality of the tympanum’s vibration amplitude vary with both frequency and the method of estimation (Table 1). The most common method of directionality estimation is to compute the vibration amplitude difference (VAD) as the difference between vibration amplitudes of the measured tympanum across different angles of sound incidence. In Hyla (and also in a ranid and an eleutherodactylid), the maximum VAD at frequencies emphasized in advertisement calls ranges from 3 to 10 dB, with most values near 5–6 dB (Table 1). Perhaps somewhat surprisingly, the best directionality is often observed at frequencies different from those emphasized in advertisement calls. In all Hyla species studied to date, the frequency response of the tympanum has a characteristic bimodal spectrum (Figs. 2, 3). The maximal directionality is often found at sound frequencies intermediate between the two peaks of the tympanum’s bimodal frequency response, as in other frogs (Christensen-Dalsgaard 2005), and is strongly influenced by the lung input (Fig. 3). For example, the maximum VAD measured at these intermediate, non-call frequencies in Hyla (and also in a ranid and an eleutherodactylid) ranges from 10 to 25 dB (Fig. 2; Table 1). Greater directionality at intermediate, non-call frequencies is closely tied to a large reduction in the sensitivity of the tympanum to these frequencies at some angles (e.g., between \({-}90^{\circ }\) and \(0^{\circ }\) in Fig. 2). Given greater directionality in the tympanum’s response at intermediate frequencies, one might expect better azimuthal localization at similar frequencies. As described below, however, behavioral studies show poorer localization in female H. versicolor at these intermediate frequencies compared with those present in conspecific advertisement calls (Jørgensen and Gerhardt 1991).
At present, we can only speculate about the reason for this apparent disconnect between the tympanum’s directionality and the animal’s performance in directional hearing tasks. Perhaps increased directionality at intermediate frequencies is the by-product of a reduction in sensitivity that functions to filter out frequencies emphasized in the calls of other species. Although not maximal, tympanum directionality at call frequencies is still robust and clearly provides the animal with the information required to localize and segregate sources of calls (Fig. 3).
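The VAD estimate introduced above can be illustrated with a few lines of Python. This is a minimal sketch, not the procedure used in the cited studies: the function name `max_vad_db`, the input format, and the amplitude values are hypothetical, and the sketch assumes that vibration amplitudes are given in linear units (e.g., nm) and that the maximum VAD is the difference between the most and least sensitive angles.

```python
import math


def max_vad_db(amplitude_by_angle):
    """Return the maximum vibration amplitude difference (dB) across angles.

    amplitude_by_angle maps sound incidence angle (degrees) to tympanum
    vibration amplitude in linear units (e.g., nm).
    """
    largest = max(amplitude_by_angle.values())
    smallest = min(amplitude_by_angle.values())
    # An amplitude ratio converts to decibels as 20 * log10(ratio).
    return 20.0 * math.log10(largest / smallest)


# Hypothetical amplitudes (nm) at a single frequency; not measured data.
example = {-90: 20.0, -30: 35.0, 0: 60.0, 30: 90.0, 90: 100.0}
print(f"maximum VAD: {max_vad_db(example):.1f} dB")
```

Applying the same function separately at each stimulus frequency would give a frequency-dependent directionality estimate of the kind summarized in Table 1.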

Table 1 Laser vibrometry measures of the maximum directionality of tympanum vibration amplitudes in response to acoustic stimuli presented from different sound incidence angles determined as the vibration amplitude difference (VAD) or the interaural vibration amplitude difference (IVAD)

We presently lack a detailed understanding of how interaural coupling and multiple input sources interact to create the directionality observed in the tympanum response in Hyla. So far, none of the biophysical experiments in Hyla have measured acoustical interaural coupling directly, but studies in ranid frogs (Feng 1980; Vlaming et al. 1984) have found an interaural gain of approximately \({-}\)6 dB, which can generate a maximal directional difference of approximately 10 dB (Christensen-Dalsgaard 2005). Since this is comparable to the maximal directionality in some of the hylids investigated, it is likely that their interaural coupling is also comparable. The low-frequency peak of the tympanum’s frequency response most likely is generated by input from the lung, at least in some species. Figure 3a shows an example of tympanum transfer functions from H. chrysoscelis (Caldwell et al. 2014). The bimodal spectrum has a low-frequency peak at 1.4 kHz, close to the low-frequency peak of the species’ advertisement call (1.25 kHz), and a high-frequency peak at 2.5 kHz, corresponding to the high-frequency peak of the call. The low-frequency peak coincides with the resonance frequency of body wall vibrations; it disappears when the lungs are deflated (Fig. 3b) and reappears when they are re-inflated (Fig. 3c) (see also Jørgensen and Gerhardt 1991). However, the influence of the lung input on the auditory coupling is not well understood in Hyla (or in any other group of frogs) and awaits further studies. In terms of acoustic input through the lung and other pathways, the biophysics of the internally coupled middle ears of frogs is far more complicated than the lizard middle ear, which can be modeled efficiently as a two-input system (Carr et al. 2016; Shaikh et al. 2016). Realistic models of auditory coupling in the treefrog ear will have to include the properties of the lung input, as well as other extratympanic inputs (Aertsen et al. 1986; Narins 2016).
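The step from an interaural gain of about -6 dB to a maximal directional difference of about 10 dB can be checked with a short calculation. The sketch below assumes an idealized two-input pressure difference receiver in which the direct and internally transmitted inputs reinforce each other fully for one direction and cancel as much as possible for the other. This simplification and the function name are assumptions made for illustration; they are not taken from the cited studies, and they ignore the lung and other extratympanic inputs discussed above.

```python
import math


def max_directional_difference_db(interaural_gain_db):
    """Upper bound on directionality (dB) for an idealized two-input ear.

    interaural_gain_db is the gain of the internal (contralateral) input
    relative to the direct input, in dB; it must be below 0 dB.
    """
    g = 10.0 ** (interaural_gain_db / 20.0)  # linear amplitude ratio
    if g >= 1.0:
        raise ValueError("interaural gain must be below 0 dB")
    # Best case: inputs reinforce (1 + g); worst case: inputs cancel (1 - g).
    return 20.0 * math.log10((1.0 + g) / (1.0 - g))


print(f"{max_directional_difference_db(-6.0):.1f} dB")
```

With a gain of -6 dB the internal input has roughly half the amplitude of the direct input, so the ratio of best to worst case is close to 1.5/0.5 = 3, or a little under 10 dB.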

Fig. 4

Directionality of auditory nerve responses in Hyla versicolor. The central figure in each panel is a polar plot showing spike rate at 10 dB above threshold as a function of azimuthal sound incidence angle for a a low-frequency fiber with a characteristic frequency of 395 Hz or b a high-frequency fiber with a characteristic frequency of 1705 Hz (Christensen-Dalsgaard, unpublished data). Recordings were made from the auditory nerve on the animal’s right side. In a, circular grid spacing is 30 spikes/s; in b it is 10 spikes/s. Surrounding each polar plot are peristimulus time histograms showing the relative magnitudes of responses as a function of azimuthal sound incidence angle (all on the scale indicated); below each histogram is a depiction of the stimulus waveform

4 Neurophysiology

4.1 Auditory nerve

Only a few studies of Hyla have investigated the processing of directional information by the nervous system. Pilot studies of H. versicolor have revealed strongly directional responses to tone bursts in auditory nerve fibers (Christensen-Dalsgaard 2004). Figure 4 shows data for two units, one with a low characteristic frequency (395 Hz) and one with a higher characteristic frequency (1705 Hz). In both units, a strongly lateralized response was observed, with a steep gradient across the midline. The directionality at the higher frequency of 1705 Hz broadly follows the ovoidal pattern observed for tympanum directionality in the same species (cf. 1580 Hz in Fig. 2 and 1705 Hz in Fig. 4). In contrast, the directionality at the low frequency of 395 Hz, which falls largely outside the range of the tympanum’s frequency response (see Fig. 2), also follows an ovoidal pattern that is almost certainly extratympanic in origin. In addition to variation in firing rate, auditory nerve fibers can encode directional information in their response latency. Auditory nerve responses to calls in H. cinerea, for example, show a direction-dependent time shift of several ms (Klump et al. 2004), which is probably caused by the decreased sensitivity from contralateral angles.

4.2 Hindbrain

Binaural processing in frogs begins in the first auditory nucleus of the brain, the dorsal medullary nucleus (DMN). To date, the responses of binaural cells in the DMN have only been investigated in ranid frogs. In Rana catesbeiana, 47 % of the DMN cells were binaural, and the most common of these were so-called EI cells (excited by input from one ear and inhibited by input from the other). A smaller number of EE cells (excitatory inputs from both ears) were also found in the DMN (Feng and Capranica 1976). Most EI cells were excited by input from the contralateral ear, and the contralateral and ipsilateral inputs exhibited almost identical frequency selectivity. The inhibition (usually from the ipsilateral ear) was only effective if the stimulus delivered to the inhibitory ear was leading. In another ranid frog, R. temporaria, it was shown that the response of the binaural cells was complex, either EI (again, ipsilateral inhibitory) or EE, and depended on the interaural time difference (ITD). In most cases, leading ipsilateral stimuli inhibited the response, but when lagging, ipsilateral stimulation could also be excitatory (Christensen-Dalsgaard and Kanneworff 2005). However, with free-field stimulation the response of the binaural cells clearly showed a sharpened directionality.

The binaural processing in the next auditory nucleus in the ascending auditory system, the superior olivary nucleus (SON), has only been studied in H. cinerea (Feng and Capranica 1978). In the SON, 42 % of units were binaural, which is approximately the same proportion as reported in the study of the DMN of R. catesbeiana (Feng and Capranica 1976). Of these binaural cells, most were EI cells that were inhibited by input to the ipsilateral ear, and a smaller fraction consisted of EE cells. The low-frequency EI cells were sensitive to interaural time differences; inhibition was more pronounced when ipsilateral stimuli led by up to 0.5 ms. However, no specialized temporal processing by coincidence detectors was reported.

4.3 Midbrain

The inferior colliculus (IC) of frogs is an important stage in the midbrain for processing vocalizations. It appears to serve as a sensory gateway to higher levels of the brain responsible for sensorimotor integration and motor control (Wilczynski and Ryan 2010). Binaural processing in the IC is strongly lateralized, again with the contralateral IC being excited and the ipsilateral IC inhibited by directional stimuli. Directionality at the level of the IC is sharpened by ipsilateral inhibition, either from the contralateral IC or from lower brain stem areas (Zhang et al. 1999). To date, there has been no demonstration of the representation of a space “map” in the IC of anurans, which instead appear to be similar to lizards in using a “meter” strategy, whereby differences in azimuth are encoded by differences in firing rates (Carr and Christensen-Dalsgaard 2015; Christensen-Dalsgaard 2005).

In the only study of directional processing in the Hyla IC, Schwartz and Gerhardt (1995) found that in H. versicolor, the direction-dependent differences in multiunit responses evoked by calls presented contralaterally versus ipsilaterally depended on their spatial separation and absolute sound level. At a separation of \(120^{\circ }\) symmetrical about the midline, the directional gain (contralateral–ipsilateral) ranged between 7.5 and 9.2 dB over absolute signal levels ranging between 63 and 83 dB. When the separation was only \(45^{\circ }\), directionality varied between 4.1 and 6.6 dB across the same range of absolute signal levels. Although these values are in line with magnitudes of directionality predicted from laser measurements of the tympanum in H. versicolor (Table 1; Jørgensen and Gerhardt 1991), they are larger than the difference in relative sound amplitude (3 dB) that was able to abolish any advantage of spatial separation between overlapping calls in parallel behavioral studies (see Fig. 10d below; Schwartz and Gerhardt 1995). Schwartz and Gerhardt (1995) discuss several possible hypotheses for differences between their behavioral and neurophysiological results.

Direct evidence for a role in source segregation of excitatory and inhibitory interactions in the frog IC has come from studies of free-field spatial release from masking in northern leopard frogs, R. pipiens (Lin and Feng 2001, 2003; Ratnam and Feng 1998). Spatial separation in azimuth between probe stimuli and noise maskers resulted in a maximum spatial release from masking of 2.9 dB, on average, in auditory nerve fibers, but 9.4 dB in IC neurons. Iontophoresis of bicuculline, a \(\hbox {GABA}_{\mathrm{A}}\) receptor antagonist, resulted in a large decrease in neural spatial release from masking that was closely tied to a more general degradation of direction sensitivity. Thus, neural inhibition is likely necessary for accurate source localization and source segregation. Interestingly, the binaural sensitivity of IC neurons can be modulated by forebrain stimulation, so as to bias the relative response of the left and right IC (Ponnath and Farris 2014). This forebrain modulation of sensitivity might be understood as an attentional selection mechanism that could be used in source segregation. Depending on input from the forebrain, one sound source or a group of sound sources in a particular hemifield of IC responsiveness might be selected or deselected. Similar inhibitory and modulatory processing remains to be investigated in Hyla.

Fig. 5

Simulated EI neuron response to free-field sound. In this model, contralateral is excitatory and ipsilateral is inhibitory when leading by up to 1 ms. The response was constructed by comparing each ipsilateral and contralateral spike in an auditory nerve recording (as in Fig. 9). The peristimulus time histograms and colored curves in the polar plot show the responses of right (red) and left (blue) EI neurons to simultaneous calls from both sides (Note the color coding of the y-axes for the two overlaid histograms; histograms at all angles are on the scale indicated). The black curve shows the actual nerve spike rate data (i.e., before EI processing) at a stimulus level of 70 dB SPL (Christensen-Dalsgaard, unpublished data)

4.4 Model of EI processing

Figure 5 shows a model of central EI processing using as input actual auditory nerve data from H. versicolor (Christensen-Dalsgaard, unpublished data). In the model, it is assumed that contralateral input is excitatory and ipsilateral input is inhibitory. This assumption is based on the physiological data from the SON of H. cinerea discussed above (Feng and Capranica 1978). Also, it is assumed that an ipsilateral input leading contralateral input by up to 1 ms suppresses the response in the neuron. In this simplified model, the input is simply the action potentials from the auditory nerve, and each spike train is compared to a spike train from its mirror location. As illustrated in Fig. 5, EI processing functions to sharpen the directional response, but the sharpening for most of the directional auditory nerve inputs is rather small, partly because the contralateral input already is reduced and delayed by the interaural coupling mechanism. At high stimulus levels, where the ipsilateral input is saturated, EI processing will be more important and might extend the useful dynamic range of the directional response. However, additional EE processing might be equally important, since it creates a direction-independent measure of stimulus amplitude, which may also be important in source localization and segregation, as well as in estimates of distance. In the worst possible case—two precisely synchronized calls from males spaced symmetrically around a receiver—the EI neurons on both sides of the receiver’s brain would be stimulated equally. However, even in this unlikely hypothetical scenario, comparison of the outputs of EI and EE neurons would still enable the receiver to determine that there were two callers present, rather than just one caller located close to the midline (e.g., directly in front of the receiver). 
Such a determination would be possible, for example, if the directionality was strong and ovoidal, and EI and EE neurons simply subtracted and summed spikes. For the case of two precisely synchronized callers spaced symmetrically around the receiver, EI neurons on each side of the brain would have the same response strength, say S (which just reflects the contralateral input for each neuron if the directionality is strong). The EE neurons in this case would have a response strength of 2S. In contrast, if there were only one caller positioned along the midline (e.g., directly in front of the receiver), the EI response on both sides would be close to zero, whereas the EE response would depend on the directionality of the ear and would most likely be close to S. Thus, the ratio of EI/EE responses could provide one metric that the central nervous system could use in source segregation. These ideas remain to be tested.
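The reasoning above can be made concrete with a toy calculation. The sketch below is an illustration under stated assumptions, not the model used to produce Fig. 5: the function names, the spike times, and the value chosen for S are hypothetical. The suppression rule treats any ipsilateral spike that leads a contralateral spike by 0 to 1 ms as vetoing that spike, and the EI and EE response strengths for the two scenarios are copied from the values stated in the text rather than derived from recordings.

```python
def ei_spike_count(contra_spikes_ms, ipsi_spikes_ms, window_ms=1.0):
    """Count contralateral spikes that survive ipsilateral inhibition.

    A contralateral spike at time t is suppressed if any ipsilateral spike
    occurred in the interval [t - window_ms, t] (assumed rule).
    """
    count = 0
    for t in contra_spikes_ms:
        suppressed = any(0.0 <= t - s <= window_ms for s in ipsi_spikes_ms)
        if not suppressed:
            count += 1
    return count


def ei_ee_ratio(ei_left, ei_right, ee):
    """Mean EI response across both sides divided by the EE response."""
    return 0.5 * (ei_left + ei_right) / ee


S = 100.0  # hypothetical response strength (spikes/s)

# Two synchronized callers placed symmetrically: EI = S on each side, EE = 2S.
two_callers = ei_ee_ratio(S, S, 2.0 * S)
# One caller on the midline: EI close to zero on both sides, EE close to S.
one_caller = ei_ee_ratio(0.0, 0.0, S)

print(two_callers, one_caller)
```

In this toy version the ratio separates the two scenarios even though the left and right EI responses are balanced in both, which is the point of the argument in the text.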

Table 2 Closed-loop behavioral measures of sound source localization acuity

5 Behavior: source localization

A major function of the frog’s internally coupled ears is to allow the animal to localize calling males. Treefrogs call from a variety of spatial positions in the available habitat, including from the surface of open water, from dense clusters of emergent aquatic vegetation, and, of course, from elevated positions in trees (Ptacek 1992). Consequently, treefrogs must be able to localize sound sources in the horizontal and vertical planes as well as estimate source distance. In this section, we review behavioral studies that have examined the performance of treefrogs in various localization tasks requiring them to determine the azimuth, elevation, and distance of a source of natural or synthetic models of advertisement calls (Table 2).

5.1 Azimuth

Behavioral studies of source localization in frogs have focused on performance in the horizontal plane (Rheinlaender and Klump 1988). This work has revealed that frogs do not merely lateralize sound sources (i.e., determine whether sound comes from the left or right side), nor do they simply move toward a source by scaling a pressure gradient in the sound field. Instead, they can discriminate between different angles of sound incidence, and they appear to localize sources in azimuth to within \(5^{\circ }{-}10^{\circ }\). In addition, they can localize sound frequencies having wavelengths more than an order of magnitude longer than their interaural distance, a capability arising from the internal coupling of the ears.

In a study of phonotaxis by females of the green treefrog (H. cinerea) and the barking treefrog (H. gratiosa), Feng et al. (1976) were the first to show empirically that frogs must use both ears to accurately localize a source in azimuth. Females of both species exhibited relatively directed paths toward a sound source broadcasting calls when they could use two ears. By applying a thin layer of silicone grease to one tympanum, Feng et al. (1976) could attenuate its input by 20–40 dB. When grease was applied to the left tympanum, frogs hopped or walked in tight circles to the right, and when it was applied to the right tympanum, they instead circled to the left. These data demonstrated unequivocally that frogs rely on binaural comparisons involving interaural differences in intensity, arrival time, or both for localizing sources in azimuth.

Rheinlaender et al. (1979) conducted much more extensive analyses of phonotaxis by females of the green treefrog (H. cinerea) in which they quantified its accuracy over a distance of 3 m in response to two different synthetic calls, both of which had the same gross temporal properties of a natural advertisement call. One synthetic call mimicked the frequency spectrum of natural calls, with equal-amplitude spectral peaks at 0.9, 2.7, and 3.0 kHz. The second consisted of only the 0.9 kHz spectral peak. To measure the accuracy of phonotaxis, Rheinlaender et al. (1979) quantified the angular error of subjects’ consecutive jumps relative to the position of the speaker. On average, jump error angles were \(16.1^{\circ }\) in response to the three-component call, though many females had much smaller jump error angles. Indeed, the mean jump error angle of the subject exhibiting the best performance was just \(4.3^{\circ }\). In addition, females exhibited head scanning behavior prior to about 25 % of jumps (see also Passmore et al. 1984). Following scanning, the mean head orientation angle relative to the speaker (\(8.4^{\circ }\)) was about half that of the mean jump error angle (\(16.1^{\circ }\)) (Fig. 6). Moreover, jumps that followed head scanning were associated with smaller mean jump error angles (\(11.8^{\circ }\)) compared with jumps that were not preceded by head scanning (\(17.6^{\circ }\)) (Fig. 6). Thus, females could localize calls with better accuracy than that indicated by jump angles, and accuracy improved with head scanning.
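A jump error angle of the kind described above can be computed from three positions in the horizontal plane. The following sketch is only an illustration: the function name, the coordinate convention (x, y in arbitrary units), and the example positions are hypothetical, and it assumes the error is the unsigned angle between the direction of the jump and the direction from the take-off point to the speaker.

```python
import math


def jump_error_angle(start, landing, speaker):
    """Unsigned angle (degrees) between a jump and the bearing to the speaker.

    Each argument is an (x, y) position in the horizontal plane.
    """
    jump_dir = math.atan2(landing[1] - start[1], landing[0] - start[0])
    target_dir = math.atan2(speaker[1] - start[1], speaker[0] - start[0])
    diff = math.degrees(jump_dir - target_dir)
    # Wrap into [-180, 180) so that, e.g., 350 degrees is treated as -10.
    diff = (diff + 180.0) % 360.0 - 180.0
    return abs(diff)


# Hypothetical positions: the frog jumps along the x-axis while the
# speaker lies 45 degrees to the left of that heading.
print(jump_error_angle((0.0, 0.0), (1.0, 0.0), (1.0, 1.0)))
```

Averaging this quantity over the consecutive jumps of an approach would give a per-subject mean comparable in form to the values reported in Table 2.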

An important result from Rheinlaender et al. (1979) was that the mean jump error angle was \(15.1^{\circ }\) in response to the synthetic call consisting of only the 0.9 kHz peak, compared with \(16.1^{\circ }\) for the three-component call. Female green treefrogs have interaural distances on the order of about 1–1.5 cm (Feng et al. 1976). Their ability to accurately localize a 0.9 kHz sound, which has a wavelength of about 38 cm at \(25\,^{\circ }\hbox {C}\), was the first definitive behavioral evidence to indicate that frogs must use a pressure difference mechanism to localize sounds, much like that described earlier for insects (Rheinlaender et al. 1979). Later work by Klump et al. (2004) extended the results of Rheinlaender et al. (1979) to show that, while green treefrogs can also accurately localize sounds consisting of just the higher frequencies present in calls, performance was not as good as when the 0.9 kHz peak was also present (Table 2).
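The wavelength figure quoted above, and its size relative to the interaural distance, can be verified with a short calculation. The sketch assumes the standard dry-air approximation for the speed of sound, c = 331.3 * sqrt(1 + T/273.15) m/s with T in degrees Celsius; the function names are hypothetical.

```python
import math


def speed_of_sound_m_per_s(temp_c):
    """Approximate speed of sound in dry air (assumed ideal-gas formula)."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)


def wavelength_cm(frequency_hz, temp_c):
    """Wavelength in centimeters for a tone in air at the given temperature."""
    return 100.0 * speed_of_sound_m_per_s(temp_c) / frequency_hz


lam = wavelength_cm(900.0, 25.0)
print(f"wavelength: {lam:.1f} cm")
# Ratio of wavelength to the 1-1.5 cm interaural distances cited in the text.
print(f"ratio to interaural distance: {lam / 1.5:.0f}x to {lam / 1.0:.0f}x")
```

The wavelength is therefore well over ten times the interaural distance, which is why external interaural time and level differences alone would be very small at this frequency.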

A subsequent study of source localization in the eastern gray treefrog (H. versicolor) by Jørgensen and Gerhardt (1991) integrated behavioral tests of phonotaxis in three dimensions with biophysical measurements of ear directionality using laser Doppler vibrometry. In response to a synthetic advertisement call having spectral peaks at 1.1 and 2.2 kHz, the mean horizontal jump error angle was \(19^{\circ }\). Both the median and mode jump error angles were smaller: The median was in the range of \(10^{\circ }{-}15^{\circ }\) and the mode was in the range of \(5^{\circ }{-}10^{\circ }\) (based on the binned histogram data in their Fig. 6). Thus, as in green treefrogs, localization in azimuth is almost certainly better than suggested by the mean jump error angle. Head scanning was not observed in Jørgensen and Gerhardt’s (1991) study of gray treefrogs, making the observed mean jump error angle of \(19^{\circ }\) most comparable to that of \(17.6^{\circ }\) reported for green treefrogs by Rheinlaender et al. (1979) on trials when females of that species did not engage in head scanning. Thus, head scanning is not necessary for treefrogs to localize sounds in azimuth.

Fig. 6

Measures of behavioral performance in closed-loop phonotaxis tests of source localization in azimuth in green treefrogs (Hyla cinerea). Histogram showing distributions of head orientation angles (\(\alpha )\) and jump error angles (\(\gamma )\) when females engaged in head scanning behavior. Insets show how head orientation angle and jump error angle were computed. Data are from Rheinlaender et al. (1979)

Another important finding from the study of Jørgensen and Gerhardt (1991) stems from a rather clever aspect of their experimental design. Each time the frog moved to a new position, the experimenters momentarily stopped playbacks, quickly computed a new attenuation setting to achieve a sound pressure level of 85 dB SPL at the frog’s new position (based on prior calibrated measurements) and then resumed stimulus broadcasts. By doing this each time the frog moved, they eliminated source location cues related to the gradient of sound pressure within the sound field. Nevertheless, the frogs still could localize the source as accurately as when gradient cues were available. These results confirmed that treefrogs do not merely move up a sound pressure gradient during phonotaxis to localize sources.
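
The level-compensation step amounts to simple arithmetic, sketched below. The calibration table, its values, and the nearest-point lookup are invented for illustration. The text specifies only the 85 dB SPL target and the use of prior calibrated measurements, so this should not be read as the procedure Jørgensen and Gerhardt (1991) actually implemented.

```python
TARGET_SPL_DB = 85.0  # level held constant at the frog's position (from the text)

# Hypothetical calibration: SPL (dB) measured at each distance (m) from the
# speaker with the attenuator set to 0 dB. Values are invented for illustration.
CALIBRATION = {0.5: 103.0, 1.0: 97.0, 2.0: 91.0, 3.0: 87.5}

def attenuation_for(distance_m):
    """Attenuator setting (dB) giving TARGET_SPL_DB at the nearest calibrated point."""
    nearest = min(CALIBRATION, key=lambda d: abs(d - distance_m))
    setting = CALIBRATION[nearest] - TARGET_SPL_DB
    if setting < 0:
        raise ValueError("speaker cannot reach the target level at this position")
    return setting

# As the frog approaches the speaker, more attenuation is applied, so the
# level at the frog's position stays constant and carries no gradient cue.
for distance in (3.0, 2.0, 1.0, 0.5):
    print(f"{distance} m -> attenuate by {attenuation_for(distance)} dB")
```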

More recently, Caldwell and Bee (2014) examined source localization in Cope’s gray treefrog (H. chrysoscelis), which is the sister species of the eastern gray treefrog (H. versicolor). Caldwell and Bee (2014) did not quantify jump error angles, as in the studies just described. Instead, they measured the angles between consecutive turns made by subjects exhibiting phonotaxis toward a synthetic advertisement call. The mean “path turn angle” was \(13.0^{\circ }\), although some females exhibited much more directed paths with smaller turn angles. In these experiments, females were tested in a circular arena with the speaker hidden out of sight behind an acoustically transparent but visually opaque wall. The mean error in the angle at which subjects first made contact with the wall of the circular arena, relative to the position of the hidden speaker, was \(6.9^{\circ }\). These estimates of path turn angle and error angle at the arena wall demonstrate an accuracy of localization in line with previous studies of green treefrogs (Rheinlaender et al. 1979) and eastern gray treefrogs (Jørgensen and Gerhardt 1991).

There is an important distinction to be made between closed-loop and open-loop tests of sound localization (see Klump 1995). The studies described up to this point used closed-loop tests to examine source localization accuracy. In a closed-loop phonotaxis test, subjects hear repeated presentations of the sound as they move through the sound field. This enables them to continually update their estimates of source location between consecutive movements, for example by head scanning or shifting body position, prior to moving toward the source following a subsequent sound presentation. Such updating from multiple positions in the sound field is eliminated in open-loop experiments. In an open-loop phonotaxis test, sound is switched off immediately when the subject makes its first rotational or translational movement, and the difference between the subject’s starting and ending positions, relative to the speaker, is used to determine localization acuity. To date, only two studies have used open-loop tests to investigate source localization in frogs, and both reported remarkably similar findings.

Fig. 7

Measures of behavioral performance in open-loop phonotaxis tests of source localization in azimuth in a barking treefrogs (Hyla gratiosa) and b Cope’s gray treefrog (H. chrysoscelis). Shown here are the mean orientation angles of subjects after making a translational or rotational movement relative to the position of a source of advertisement calls at sound incident angles in the frontal hemifield between \({-}45^{\circ }\) (left) and \({+}45^{\circ }\) (right). Redrawn from data in Klump and Gerhardt (1989) and Caldwell and Bee (2014)

In their study of the barking treefrog (H. gratiosa), Klump and Gerhardt (1989) demonstrated unequivocally that frogs possess true angle discrimination. A natural call was presented from various frontal angles between \({-}45^{\circ }\) (left) and \({+}45^{\circ }\) (right) in open-loop tests. As illustrated in Fig. 7, there was a linear relationship between the angle of sound incidence (relative to the frog’s snout at \(0^{\circ }\)) and the extent to which the frog turned in the same direction. These data provided a clear indication that the frogs could discriminate between different angles (and hence do better than mere lateralization). Interestingly, the errors associated with orientation were smaller when the sound came from slightly off axis versus from directly in front of the animal. Such a pattern in localization performance is predicted by the peripheral auditory system’s ovoidal pattern of inherent directionality (see Fig. 2).

For 25 years, the work by Klump and Gerhardt (1989) was the only study of source localization in frogs to use open-loop tests. Recently, Caldwell and Bee (2014) replicated and extended this work in their study of Cope’s gray treefrog (H. chrysoscelis). As the data in Fig. 7 demonstrate, the two species exhibit striking similarity in angle discrimination of frontal angles between \({-}45^{\circ }\) and \({+}45^{\circ }\). What Caldwell and Bee (2014) additionally found, however, is that angle discrimination deteriorated significantly beyond \(45^{\circ }\) lateral. Specifically, orientation errors increased dramatically as the sound was moved into the rear hemifield (see Fig. 2 in Caldwell and Bee (2014)). In fact, the data were consistent with the interpretation that, while angle discrimination is excellent in the frontal hemifield, the animals could not discriminate between forward and rearward angles under open-loop test conditions. For example, when the sound was presented from either \({+}30^{\circ }\) or \({+}150^{\circ }\) to the animal’s right side, they turned approximately the same amount (\(30^{\circ }\)) and direction (to the right). These data did not reflect limitations in turning ability. Rather, as discussed above, this result was predicted by the high degree of forward–rearward symmetry in the directionality of the tympanum’s vibration amplitude (Caldwell et al. 2014).
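
The front–back ambiguity can be illustrated with a deliberately simplified toy model. We assume, purely for illustration, that the directional cue depends only on the left–right component of the source direction, so that it is perfectly symmetric about the interaural axis. This is an idealization and not the measured tympanum response. Under that assumption, azimuths \(\theta\) and \(180^{\circ } - \theta\) produce identical cues.

```python
import math

def lateral_cue(azimuth_deg):
    """Toy directional cue that depends only on the left-right component."""
    return math.sin(math.radians(azimuth_deg))

def equivalent_frontal_angle(azimuth_deg):
    """Frontal angle (-90 to 90 deg) that produces the same toy cue."""
    return math.degrees(math.asin(lateral_cue(azimuth_deg)))

# Sources at +30 and +150 deg map onto the same frontal angle, as do
# sources at -45 and -135 deg.
for azimuth in (30, 150, -45, -135):
    print(azimuth, "->", round(equivalent_frontal_angle(azimuth), 1))
```

In this toy model a source at \({+}150^{\circ }\) is indistinguishable from one at \({+}30^{\circ }\), which matches the pattern of turning described above. It says nothing about how additional cues available in closed-loop conditions might resolve the ambiguity.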

To summarize, we now have closed-loop measures of localization in azimuth from several treefrog species (Table 2). While these studies have produced largely consistent results, all of them probably underestimate localization acuity. Closed-loop tests have ruled out the hypothesis that frogs localize sources simply by steering up a gradient in sound pressure level, and open-loop tests have ruled out simple lateralization, showing that frogs can discriminate between different azimuthal angles.

5.2 Elevation

Anurans generally lack external ear structures, like the pinnae of mammals or the asymmetric ears of barn owls, that could exploit or generate informative cues about the elevation of sound sources. Yet many frogs readily localize males calling from elevated perches. Only three studies have investigated localization of elevated sound sources in frogs (Gerhardt and Rheinlaender 1982; Jørgensen and Gerhardt 1991; Passmore et al. 1984). All of these studies have used closed-loop tests. Two of them have been of treefrogs in the genus Hyla. Because of this dearth of data on elevated source localization in frogs, we presently lack not only precise estimates of localization acuity in the vertical plane, but also a clear understanding of how localization in elevation is achieved, particularly with respect to the possible role of internally coupled ears and the multiple pathways of acoustic input to the auditory periphery.

Using a three-dimensional grid system, Gerhardt and Rheinlaender (1982) showed that females of the green treefrog (H. cinerea) readily locate an elevated source of synthetic calls \((0.9 + 2.7 + 3.0\,\hbox {kHz})\). This study did not quantify jump error angles. Of note, however, was their description of extensive head scanning behaviors. Subjects tended to make lateral head scanning movements with their jaw parallel to the ground or slightly elevated, even if they first had to twist their head or body sideways from a vertical or inclined position to do so. Indeed, extensive head scanning was noted by Gerhardt and Rheinlaender (1982) as one of the most prominent features of sound localization in elevation by green treefrogs (see also Passmore et al. 1984). Interestingly, however, this distinctive behavior was not exhibited by females of the eastern gray treefrog (H. versicolor) in a nearly identical experimental test of localization in elevation (Jørgensen and Gerhardt 1991).

Jørgensen and Gerhardt (1991) reported a vertical jump error angle of \(12^{\circ }\), which is smaller than the horizontal jump error angle of \(19^{\circ }\) they reported for the same species. This value is also in line with (or slightly smaller than) measures of horizontal jump error angles in other species (Table 2). Hence, localization of elevated sources was at least as good as localization in azimuth based on measures of jump error angles. As an additional measure of localization, Jørgensen and Gerhardt (1991) computed three-dimensional jump error angles, which take into account movement in both azimuth and elevation. In addition, performance was measured using synthetic calls differing in spectrum and compared with expectations derived from measurements of the frequency-dependent directionality of tympanum vibrations. The gray treefrog advertisement call has two prominent spectral peaks at approximately 1.1 and 2.2 kHz. Although the bimodal frequency response of the tympanum exhibited its greatest sensitivity at frequencies near the two spectral peaks of the call, the greatest directionality was observed at intermediate frequencies (\({\sim }1.4\,\hbox {kHz})\) (see Fig. 2). In tests using stimuli with a single spectral peak, three-dimensional jump error angles were paradoxically highest at 1.4 kHz (\(36^{\circ })\)—that is, at the frequency of greatest azimuthal directionality—compared with frequencies of 1.0 (\(24^{\circ }\)) and 2.0 kHz (\(30^{\circ }\)). Three-dimensional jump error angles were lowest (\(23^{\circ })\) in response to synthetic calls with a natural bimodal spectrum \((1.1 + 2.2\,\hbox {kHz})\) (Table 2). The results from Jørgensen and Gerhardt (1991), together with those of Gerhardt and Rheinlaender (1982), raise important, and as yet unanswered, questions about localization of elevated sources. What benefits (if any) to localization in elevation accrue from head scanning, and why do not all species do it? 
And why are jump error angles relatively higher at the frequencies of best azimuthal directionality at the periphery? Clearly more behavioral studies (including open-loop tests) of localization in elevation are needed to discover how frogs localize elevated sources and what, if any, role internally coupled ears might play in this behavior.

5.3 Distance

Studies of aggressive signaling between male frogs suggest the sound pressure level of a nearby neighbor’s calls can function as a cue for maintaining inter-male spacing in choruses (Brenowitz 1989; Wilczynski and Brenowitz 1988). Distance to a source influences female mating decisions in some frogs, and females choose the closer of two otherwise identical calls (Murphy 2008; Murphy and Gerhardt 2002). Indeed, it is commonplace in studies of frog communication to manipulate signal level as a proxy for source distance (Gerhardt et al. 2000b; Schwartz 1989; Wagner 1989). However, only one study has investigated the specific cues frogs might use to determine source distance.

Murphy (2008) tested three alternative hypotheses about the cues females of the barking treefrog (H. gratiosa) use to estimate the distance to a calling male. Somewhat surprisingly, his data suggested that none of the tested hypotheses were correct. There was no evidence that females used measurable degradation of the frequency spectrum due to excess attenuation of high frequencies to estimate source distance. (Temporal degradation was negligible over the distances tested.) Differences in the relative amplitude of calls at the female’s starting position did not determine her ultimate choice, nor did differences in the steepness of the sound gradient as she moved toward a source.

Having ruled out sound degradation, relative amplitude differences, and differences in sound gradient steepness as cues for distance estimation, Murphy (2008) hypothesized that females might use a more cognitively complex form of “acoustic triangulation” to estimate the distances to different calling males (Fig. 8). According to this hypothesis, a female monitors differences in the rates of change in the angles between herself and different calling males as she moves through the chorus. The angles for more distant males would change more slowly than those for males relatively closer to the female’s position (Fig. 8). Thus, Murphy’s (2008) hypothesis relies on the receiver maintaining accurate measures of the angular relationships between itself and sound sources in the environment as it moves. Notably, then, the acoustic triangulation hypothesis posits an important additional function of the frog’s internally coupled ears in estimating distances to sound sources. To date, no study has tested the acoustic triangulation hypothesis in frogs. Future efforts to do so will be important.
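
The geometry underlying the hypothesis can be sketched with a short calculation. The positions of the listener and the two sources below are hypothetical, and the code illustrates only the geometric principle. It is not a model of how frogs might actually compute or compare these angles.

```python
import math

def bearing_deg(listener, source):
    """Bearing (degrees) from the listener to the source in the x-y plane."""
    return math.degrees(math.atan2(source[1] - listener[1],
                                   source[0] - listener[0]))

def bearing_change(start, end, source):
    """Unsigned change in bearing to a source as the listener moves."""
    diff = bearing_deg(end, source) - bearing_deg(start, source)
    return abs((diff + 180.0) % 360.0 - 180.0)  # wrap to [0, 180]

# Hypothetical chorus: the listener moves 0.5 m along the x axis while a
# near male (1 m away) and a far male (4 m away) call from the same side.
start, end = (0.0, 0.0), (0.5, 0.0)
near_male, far_male = (0.0, 1.0), (0.0, 4.0)

change_near = bearing_change(start, end, near_male)
change_far = bearing_change(start, end, far_male)
print(f"near: {change_near:.1f} deg, far: {change_far:.1f} deg")
```

For the same movement, the bearing to the nearer source changes by several times more than the bearing to the farther source, which is the relationship the hypothesis proposes females could exploit.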

Fig. 8

Acoustic triangulation hypothesis for source distance estimation in frogs. Depicted here is the change in the angles (\(\theta _{1}\) and \(\theta _{2})\) between a female and two calling males as the female moves through the chorus environment. The hypothesis holds that females estimate source distance by attending to the rate at which these angles change as they move (Murphy 2008). Adapted from Murphy (2008), assessment of distance to potential mates by female barking treefrogs (Hyla gratiosa). Journal of Comparative Psychology 122, 264–273, with permission from the American Psychological Association

5.4 Effects of noise

Until recently no study had investigated the effects of noise on sound source localization in frogs. Caldwell and Bee (2014) conducted both closed-loop and open-loop tests with Cope’s gray treefrog (H. chrysoscelis) in quiet and in the presence of band-limited noise (0.5–4.5 kHz), which encompassed the frequencies in signals and spanned most of the species’ estimated hearing range (Hillery 1984; Schrode et al. 2014). Noise was presented from an overhead speaker to create a uniform level of noise across the extent of the test arena floor. Multiple signal-to-noise ratios (SNR) were tested, and all SNRs were chosen to be above previously reported behavioral response thresholds so as to eliminate confounds of masked signal detection. In open-loop tests, the presence of noise, as well as differences in SNR, had negligible effects on the accuracy of orientation in azimuth. In fact, orientation errors were slightly smaller, on average, in the presence of noise. Instead, noise caused an increase in the latency to orient. In quiet, subjects typically oriented after presentation of the very first call, but in noise, orientation occurred more often after presentation of two or three calls. These results are consistent with the idea that, by spending more time listening to the source before orienting in noise, subjects were able to localize it as accurately as they could in quiet. Largely consistent results were also obtained in closed-loop tests. Although at the lowest SNR tested (\(+\)3 dB), error angles at the arena wall were higher, and path lengths (and to some extent also travel time) were longer, the presence versus absence of noise and the SNR had small or negligible effects across most conditions. Notably, there were no discernable effects of noise on path turn angle. 
Together, these results with Cope’s gray treefrog suggest that, at suprathreshold SNRs, the anuran ear is relatively impervious to noise, at least when it originates from a direction other than the location of a source of signals (see also Penna et al. 2009). Additional studies of open-loop source localization in a greater diversity of spatialized noise contexts would provide valuable new information about the potential influence of noise on localization acuity.

6 Behavior: source segregation

Calling male frogs must compete with numerous sources of abiotic and biotic noise, including the calls of other nearby neighbors, and they exhibit a range of remarkable adaptations for doing so (Schwartz and Bee 2013). Yet ultimately, the task falls to receivers to perceptually sort out the complex mixture of sounds impinging on the ears in order to make sense of the auditory world (Bee 2012, 2015; Bee and Micheyl 2008; Vélez et al. 2013). In humans, the spatial relationships between multiple sound sources influence how receivers perceptually segregate sound mixtures into their correct sources (Darwin 2008). Current evidence suggests that spatial cues processed by the frog’s internally coupled ears can play multiple, but still poorly understood, roles in sound source segregation.

6.1 Spatial release from masking

In the presence of high background noise levels, listeners are susceptible to auditory masking. Exploiting spatial separation between signals and noise is one way to improve signal detection and recognition, resulting in spatial release from masking (Bronkhorst 2000). Humans achieve spatial release from masking by listening with the ear having the better SNR, as well as by processing available binaural cues. Evidence suggests frogs also experience spatial release from masking, and this no doubt stems, in part, from the processing of directional information provided by their internally coupled ears (Caldwell et al. 2016; Lin and Feng 2003).

Schwartz and Gerhardt (1989) were the first to demonstrate spatial release from masking in frogs. In closed-loop phonotaxis tests, female green treefrogs (H. cinerea) experienced about 3 dB of masking release when it came to responding to signals. No improvement was found in the separated conditions, however, when it came to the ability of subjects to discriminate between an attractive advertisement call and an unattractive aggressive call. According to Schwartz and Gerhardt (1989), the magnitude of masking release observed in their behavioral trials was predicted by the directionality of the species’ tympanum response demonstrated earlier by Michelsen et al. (1986). While this is possible, subsequent work with northern leopard frogs (R. pipiens) has shown that central auditory processing increases the magnitude of spatial release from masking relative to that measured in auditory nerve fibers (Lin and Feng 2001, 2003; Ratnam and Feng 1998).

More recent work on spatial release from masking has focused on Cope’s gray treefrogs (H. chrysoscelis). In closed-loop tests, Bee (2007) reported a spatial release of 6–12 dB when signals and noise were separated by \(90^{\circ }\) compared with a co-located condition. These estimates were based on differences in response latency. In a follow-up study, Nityananda and Bee (2012) measured signal recognition thresholds (sensu Bee and Schwartz 2009) under co-located and \(90^{\circ }\) spatially separated conditions. While there was considerable individual variation, on about 70 % of trials, thresholds were relatively lower in the separated condition, on average, by 4.5 dB (Fig. 9). (The grand mean across all trials was 3 dB, as reported for green treefrogs.) While perhaps small, release from masking on the order of 3–6 dB could have real influences on phonotaxis and mate choice in nature. Female treefrogs readily discriminate differences in signal level or SNR as small as 2–4 dB (Bee et al. 2012; Fellers 1979; Gerhardt 1987). Moreover, Bee (2008) and Ward et al. (2013) have shown that females of Cope’s gray treefrog are better able to discriminate between conspecific and heterospecific calls based on differences in pulse rate when signals are spatially separated from sources of noise. Taken together, these behavioral studies indicate that the directional hearing provided by internally coupled ears allows treefrogs to benefit from spatial separation between environmental sources of noise and signals of interest. Recent biophysical measurements of spatial release from masking in the response of the tympanum confirm that the inherent directionality of the frog’s internally coupled ears provides an important physical basis for segregating signals from spatially separated noise (Caldwell et al. 2016).
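
To give a sense of scale for these values, the sketch below converts a release from masking in decibels into a power ratio and, under an idealized assumption of spherical spreading (6 dB per doubling of distance), into an equivalent change in source distance. The spherical spreading assumption is ours and is only a rough guide, because sound propagation in natural habitats typically departs from it.

```python
def db_to_power_ratio(db):
    """Ratio of sound powers corresponding to a level difference in dB."""
    return 10.0 ** (db / 10.0)

def db_to_distance_ratio(db):
    """Distance ratio giving the same level change under spherical spreading."""
    return 10.0 ** (db / 20.0)

for release_db in (3.0, 4.5, 6.0):
    print(f"{release_db} dB: power x{db_to_power_ratio(release_db):.2f}, "
          f"distance x{db_to_distance_ratio(release_db):.2f}")
```

Under this idealization, a 3 dB release corresponds to roughly a doubling of signal power relative to the noise, and a 6 dB release is comparable to halving the distance to the signal source.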

Fig. 9

Spatial release from masking in Cope’s gray treefrog (Hyla chrysoscelis). Depicted here are masked signal recognition thresholds measured in the presence of “chorus-shaped noise” (i.e., noise having the long-term spectrum of natural breeding choruses) that was either co-located with the signal or separated from it by \(90^{\circ }\) around a circular test arena. Subjects exhibited spatial release from masking on approximately 70 % of trials (left), whereas differences in threshold were negligible on the remaining 30 % of trials (right). Redrawn from data in Nityananda and Bee (2012)

6.2 Auditory grouping

In acoustically cluttered environments, listeners face the challenge of parsing the composite sound pressure wave generated by multiple sources into elements that are correctly grouped together and assigned to their correct sources (Bregman 1990). Separate spectral components produced simultaneously by a given source (e.g., harmonics or formants) must be grouped together across frequency, and the sounds produced sequentially by a given source (e.g., notes, pulses, and syllables) must be grouped together through time. In humans, spatial information can be exploited to perform both simultaneous and sequential auditory grouping (Darwin 2008). The limited evidence available suggests that the processing of directional information provided by the frog’s internally coupled ears also contributes to both tasks, although perhaps not equally so.

In a phonotaxis study of Cope’s gray treefrogs (H. chrysoscelis), Bee (2010) found that females preferentially chose spatially coherent calls with both spectral peaks of the bimodal spectrum originating from the same location over an alternative in which the two peaks were spatially separated in azimuth. This was true even when the spatial separation between the two spectral peaks was just \(7.5^{\circ }\), the smallest separation tested. This finding directly supports our contention that most measures of localization in azimuth based on jump error angles from closed-loop studies underestimate this ability in frogs. In addition, the patterns of preferences observed in tests of spatially coherent versus separated stimuli did not differ from those expected from a series of control tests in which subjects chose between stimuli having a spatially coherent bimodal spectrum versus a unimodal spectrum. One interpretation of this finding is that subjects perceived the spatially incoherent bimodal call as if it were separate unimodal calls. There was little evidence to suggest that preferences for spatially coherent calls over spatially separated alternatives resulted from greater difficulty localizing spatially separated sources. Additional studies are needed to corroborate and generalize these results with Cope’s gray treefrogs.

Fig. 10

Spatial coherence as a cue for sequential auditory grouping in eastern gray treefrogs (Hyla versicolor) and Cope’s gray treefrogs (H. chrysoscelis). a Schematic illustration of how pairs of pulse trains, each with a pulse rate in the range of H. versicolor, were used as stimuli in two-alternative choice tests or single-stimulus, no-choice tests. The two versicolor-like pulse trains (e.g., 20 pulses/s) were temporally interleaved to create a single chrysoscelis-like pulse train (e.g., 40 pulses/s), but presented from spatially separated speakers. b–g Results from phonotaxis tests examining the effect of spatial separation on sequential auditory grouping in H. versicolor (b–d, after Schwartz and Gerhardt 1995) and H. chrysoscelis (e–g, after Bee and Riemersma 2008). b–d Depictions of speaker configurations used in the two-alternative choice tests of Schwartz and Gerhardt (1995) and their results plotted in the form of preference functions for various choice tests. e–g Depictions of speaker configurations used in the single-stimulus tests of Bee and Riemersma (2008) and their results showing response latencies and the proportions of subjects responding as a function of spatial separation. Reprinted from International Journal of Psychophysiology 95(2), M. A. Bee, “Treefrogs as animal models for research on auditory scene analysis and the cocktail party problem,” pp. 216–237, Copyright (2015), with permission from Elsevier

Sequential auditory grouping (i.e., auditory streaming) has been investigated in both the eastern gray treefrog (H. versicolor) and Cope’s gray treefrog (H. chrysoscelis) (Fig. 10). These two sister species both produce advertisement calls consisting of a series of discrete pulses. Pulse rate is a species-specific property, and it is about twice as fast in H. chrysoscelis compared with H. versicolor. Schwartz and Gerhardt (1995) and Bee and Riemersma (2008) took advantage of this twofold species difference in pulse rate, as well as the remarkable selectivity females have for conspecific pulse rates (Bush et al. 2002), to investigate the role of spatial coherence in sequential auditory grouping. In both studies, females were presented with two interleaved pulse trains, each having the slower pulse rate more typical of H. versicolor. By temporally interleaving the pulses of these two versicolor-like pulse trains, the experimenters could create a single, faster train of chrysoscelis-like pulses (Fig. 10a). The question in both studies was the following. If the two interleaved pulse trains are presented from different spatial locations, do receivers hear two calls of H. versicolor or one of H. chrysoscelis? If spatial separation promoted their perceptual segregation, then a H. versicolor percept was expected to emerge, which would be attractive to females of H. versicolor but unattractive to females of H. chrysoscelis (Fig. 10a). While several details of experimental protocol differed between the two studies (cf. Fig. 10b, e), the results were nevertheless consistent: Spatial separation had relatively weak influences on responses. In two-alternative choice tests, Schwartz and Gerhardt (1995) found that females preferentially approached one of the pair of pulse trains separated by \(120^{\circ }\) over either of the pulse trains in an alternative pair separated by \(5^{\circ }\). 
No preferences were observed, however, when the \(120^{\circ }\) angle of separation was reduced to \(45^{\circ }\) and the alternative pair was still separated by \(5^{\circ }\) (Fig. 10c). Moreover, the preference for \(120^{\circ }\) separation over \(5^{\circ }\) separation was abolished simply by attenuating by just 3 dB one of the two interleaved pulse trains separated by \(5^{\circ }\) (Fig. 10d). In their study of Cope’s gray treefrogs, Bee and Riemersma (2008) tested females in a series of single-stimulus (“no choice”) tests that measured responses to interleaved pulse trains separated by different angles. Analyses of response latencies (Fig. 10f) and the proportions of subjects responding (Fig. 10g) revealed that some females still approached one of the two interleaved pulse trains even when they came from opposite sides of a circular test arena (i.e., \(180^{\circ }\) separation), indicating that some females readily grouped the two interleaved pulse trains across large angular separations. Related work in other frogs is consistent with the general observation that frogs are willing to group sequentially produced sounds over large angles of spatial separation (Farris et al. 2002, 2005; Gerhardt et al. 2000a), though improved performance may be observed when they are forced to make relative comparisons to determine which sequential sounds are to be grouped together (Farris and Ryan 2011).
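
The logic of the interleaved stimulus design can be sketched as follows, using the example pulse rates of 20 and 40 pulses/s given in the caption of Fig. 10. The function name and the number of pulses are illustrative assumptions, and the sketch deals only with pulse timing. It does not represent pulse shape, amplitude, or spatial location.

```python
def pulse_onsets_ms(rate_hz, n_pulses, offset_ms=0.0):
    """Onset times (ms) of a regular pulse train starting at offset_ms."""
    period_ms = 1000.0 / rate_hz
    return [offset_ms + i * period_ms for i in range(n_pulses)]

# Two versicolor-like trains at 20 pulses/s, the second delayed by half a period.
train_a = pulse_onsets_ms(20, 5)                  # 0, 50, 100, ... ms
train_b = pulse_onsets_ms(20, 5, offset_ms=25.0)  # 25, 75, 125, ... ms

# If the two trains are grouped into one stream, the pulses alternate and
# the combined train has the faster, chrysoscelis-like pulse rate.
merged = sorted(train_a + train_b)
intervals = [later - earlier for earlier, later in zip(merged, merged[1:])]
merged_rate_hz = 1000.0 / intervals[0]
print(f"merged pulse rate: {merged_rate_hz:.0f} pulses/s")
```

Whether a listener hears the 40 pulses/s composite or two separate 20 pulses/s trains is precisely the perceptual question that the spatial separation manipulations were designed to address.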

7 Summary and future directions

Judging by anatomy, internally coupled ears are probably a general trait in all frog species with a tympanic ear, although only a few species have been investigated so far. The available data show that in frogs, internally coupled ears provide robust directional information at the periphery that is further processed and refined by central auditory mechanisms involving an interplay of excitation and inhibition. This directional information is no doubt important in communication, enabling female frogs to locate potential mates and male frogs to locate competitive rivals. Directional information also provides important cues for segregating the sounds of multiple callers amid high levels of background noise.

At present, we still lack a well-integrated understanding of how the structure and function of the frog’s internally coupled ears contribute to the animal’s perceptual performance in source localization and segregation. This lack of understanding arises from three sources that should become focal points for future research. First, there has been a tendency to reduce problems of sound source localization in frogs to localization in azimuth only. In contrast, frogs (in particular, treefrogs) solve problems that require them to localize sources in three dimensions: left–right (azimuth), up–down (elevation), and back–forth (distance). How might internally coupled ears contribute to solving more difficult, multidimensional problems of localization and segregation? Second, despite decades of research, we still lack well-integrated data across different levels of investigation. Forward progress will be made through efforts to explain the animal’s performance in various source localization and segregation tasks using anatomical, biophysical, neurophysiological, and behavioral data collected from the same species. To facilitate this integration, future research should coalesce around one species, or perhaps a small number of species, to examine in much greater depth the contribution of internally coupled ears to source localization and segregation. We focused this review on the genus Hyla, because these frogs provide excellent opportunities for a concerted approach integrating anatomy, biophysics, neurophysiology, and behavior to understand the biological mechanisms and function of internally coupled ears. Finally, we need new research efforts to quantitatively and computationally model precisely how the anatomy, biophysics, and neurophysiology related to the frog’s internally coupled ears contribute to source localization and segregation. 
This research would present significant opportunities for collaboration between biologists, physicists, and roboticists to not only model the system, but also to implement it in hardware and software that performs as well as animals with internally coupled ears in solving real-world problems of source localization and segregation. Basing this future work on the treefrog model currently represents the best opportunity for forward progress.