Introduction

Communication utilizes multiple sensory systems (Marler and Tenaza 1977; McGurk and MacDonald 1976; Rovner and Barth 1981; Hankison and Morris 2003; Narins et al. 2003). Channels of information transfer may work independently or in combination with other signaling pathways. Uni-modal, or single-channel, signaling relies on the neural integration of a single transmission modality, such as visual, acoustic, or olfactory (Stein et al. 2004). Signaling using multiple channels requires the simultaneous coordination of different sensory systems, all of which receive information independently (Movellan and McClelland 2001). Multimodal signaling may have substantial costs because it relies on the physiological and cognitive equipment necessary to integrate incoming signals into a single coherent message (Partan and Marler 2005). Despite its costs, the use of multiple modalities may reduce ambiguity for both the signaler and the receiver. Redundant signals reinforce the content of the message and thus offer insurance that very important signals will be accurately interpreted (Rowe and Guilford 1999; Partan and Marler 2005).

The effects of redundant multimodal signaling can be either equivalent to the responses elicited by each individual stimulus, or enhanced when cues are combined (Rowe 1999; Partan and Marler 2005). To discern equivalence from enhancement, two or more stimuli must be presented both independently and together, and the resulting response intensity quantified. Redundant, equivalent stimuli will evoke a similar response regardless of whether the cues are presented alone or simultaneously. However, if the response to combined cues is greater than their individual effects, the presence of multiple signals will increase the intensity of the receiver’s response. In female pigeons, Columba livia, for example, courtship response behavior was most intense when both visual and acoustic playbacks of male mating behavior were presented (Partan et al. 2005). Similarly, Narins et al. (2003) demonstrated in poison dart frogs (Epipedobates femoralis) that both a dynamic visual frog model and a simultaneous auditory stimulus were necessary to initiate aggressive male territory-defense behavior. Separating the effects of each modality can be difficult, especially in the field, because effects can be additive, multiplicative, or not perceptually independent (Rowe 1999). Even in the area of human perception, the full effect of each additional stimulus on the receiver is difficult to gauge (Aydin and Pearce 1997; Cohen 1997).

How individuals cognitively bind multiple sensory inputs to form a unified percept is a key question in neuroscience (Spence and Driver 2004). In humans, visual stimuli dominate acoustic stimuli, a phenomenon known as the “ventriloquism effect” (Thurlow and Jack 1973; Jack and Thurlow 1973; Slutsky and Recanzone 2001; Alais and Burr 2004). Although the ventriloquism effect has traditionally been thought of as vision capturing sound, the visual stimuli need not always predominate. Alais and Burr (2004) found that the ability of one modality to dominate other modalities depends upon the reliability of the primary stimulus to provide accurate information. The binding of two apparently coupled stimuli can persist despite spatial and temporal discrepancies in signal presentation. At small distances or time intervals, on the order of a few degrees of view or several hundred milliseconds, the discontinuity between the cues is imperceptible (Thurlow and Jack 1973; Slutsky and Recanzone 2001). The larger the discrepancy becomes, however, the more quickly and easily the two signals are identified as being independent of one another (Jack and Thurlow 1973; Slutsky and Recanzone 2001; Alais and Burr 2004).

Because of inherent difficulties in studying multimodal signals, few studies have attempted to identify the prevalence and limits of the ventriloquism effect in animals in their natural habitats (Hoy 2005). A notable exception is a study by Narins et al. (2005). Using a robotic frog model and a playback speaker, they were able to investigate the spatial and temporal binding of multimodal signals by male dart-poison frogs under natural conditions. Frogs responded more when the visual and acoustic stimuli were placed in close proximity to one another (2–12 cm). If, however, the distance separating the two stimuli was increased to 25–50 cm, the speaker was more likely to be approached, indicating acoustic stimuli dominated the frog’s response. Thus, akin to humans, spatial binding in animals may rely upon the more salient signaling modality to capture concurrently transmitted stimuli.

Territorial animals faced with the task of assessing the threat of potential intruders must accurately decipher environmental signals and cues (Naguib et al. 2004). Territories may be large and topographically complex, so a resident that relies on visual stimuli alone may fail to detect many intruders. Territorial birds are ideal subjects in which to study multimodal communication because playback of conspecific song reliably elicits defensive behavior (e.g., Chantrey and Workman 1984; Nowicki et al. 2002; Moulton et al. 2004). We focused on pied currawongs (Strepera graculina), territorial breeders that are often found in pairs during the breeding season (Recher 1976).

Our study had two aims: first, to test the cognitive ability of a territorial, forest-dwelling bird to bind two spatially disparate stimuli; and second to define the processing of these acoustic and visual cues as having either equivalent or enhanced effects when presented together. One of three treatments—audio alone, audio and visual close together, and audio and visual far apart—was presented to free-living subjects and the magnitude of the currawongs’ response was measured. The fourth possible treatment, a “visual only” treatment (i.e., presentation of the stationary model in the absence of any acoustic cues) was not conducted because previous studies highlighted the necessity of an auditory or visually dynamic stimulus to initially engage animals (Wells 1978; Chandler and Rose 1988). While this somewhat limits our inference, interpreting the three “possible” treatments is sufficient to potentially demonstrate both spatial binding and stimulus enhancement.

Given the three treatments, there are three potential outcomes (Fig. 1). The first possible result (Fig. 1a) would reveal signal enhancement by multimodal stimuli when compared to the uni-modal stimulus (acoustic only), but no difference in level of response for spatially disparate cues. This would be inconsistent with previous studies demonstrating the ability of animals to spatially bind distinctly different signals only when discrepancies are small (i.e., on the order of a few degrees; Narins et al. 2005). The second possible result (Fig. 1b) would illustrate a ventriloquism effect in the “close” treatment, because the response magnitude is greater than that obtained with the same stimuli in the “far” treatment. There would be no evidence of an enhancement effect from adding the second stimulus (“acoustic only” elicits responses similar to “close”). The third possible result (Fig. 1c), and our expected outcome, would be consistent with previous studies examining both spatial binding and multimodal signaling. Here, the response would be greatest in the “close” treatment alone, where the spatial discrepancy is small enough for the ventriloquism effect to operate.

Fig. 1 Potential results from experimental treatments where “response” is measured on an arbitrary scale. a There is response enhancement of multimodal signals, without evidence of spatial binding of bimodal stimuli. b There is no response enhancement by presenting multimodal signals, with evidence of spatial binding of bimodal stimuli. c There is both response enhancement of multimodal signals, and there is evidence of spatial binding of bimodal stimuli

While the characteristics of the response by a forest-dwelling bird to a territorial invasion have yet to be documented, we can make several logical predictions from what is understood about both risk assessment and territorial responses to conspecific calls. van der Veer (2002) found that Yellowhammers (Emberiza citrinella) that could see a predator (a stuffed hawk) would resume feeding behaviors sooner after hearing conspecific warning calls than individuals that could not visually locate the model. This suggests that individuals with more knowledge about a potential risk are less likely to exhibit prolonged or enhanced levels of vigilance. In response to conspecific calls, the territorial wren Troglodytes troglodytes will choose a higher song post (Mathevon and Aubin 1997). This permits improved projection of its own song, and likely also allows for better exchange of visual cues. Thus we predict that the least informative, and thus most ambiguous, of the three treatments (i.e., audio alone) will be characterized by increases in the distance of the subject from the stimuli, the variability of its position (which we quantify as the standard deviation in the distance from the stimulus), and the number of locomotion events, all of which reflect an overall increase in vigilance. We anticipate that spatial binding of visual and acoustic cues will increase the amount of information available concerning the likelihood of territorial invasion, and lead to a reduction in vigilance, whereas two unbound signals, which are less informative, will result in increased vigilance.

Methods

Field experiments were conducted in Booderee National Park (34°58′S, 150°41′E), Australian Capital Territory between 19 October and 1 November 2005. The majority of our playback experiments were conducted between 05:00 and 11:00 h, Australian Eastern Standard Time, although time of day was not a concern considering pied currawongs were active and vocalized throughout daylight hours (personal observations). We attempted to limit sites to forest fringes, where vegetation was sparse, to maximize visibility of the focal currawong; in some cases observations were truncated because the focal bird was out of sight.

To test the ability of subjects to associate a distinct, conspecific territorial intruder’s call with the physical presence of the intruder, a model was placed either close to or far from a single speaker (Sony SRS-77G). Both the model and the speaker were situated on a tripod 1 m above the ground. One of two latex models of the common American raven (Archie McPhee Toys, Seattle, Washington), painted to resemble a pied currawong, was used as the visual model. Currawong vocalizations from a commercial recording (Horton 2000) were used to make three different 16-bit, 44 kHz acoustic stimuli. All three of our playback exemplars were generated from two unique, yet similar, recordings. We edited the stimuli so that all playbacks had the same duration. Using the software Canary (Charif et al. 1995), we created three unique 5 s tracks of similar currawong vocalizations repeated every 20 s for 10 min. These three exemplars were both within the natural range of currawong calls and sufficiently variable to allow us to test the general hypothesis about how the localization of a vocal stimulus influences currawong response without increasing the likelihood of spurious exemplar-related effects. Exemplars were saved as AIF files and broadcast from an iPod (Apple Computer, Cupertino, CA, USA) through powered speakers (Sony SRS-77G; Sony Corporation, Tokyo). We standardized the peak amplitudes of the playbacks on each speaker to approximately 90 dB SPL measured 1 m in front of the speaker using a digital sound level meter (SPER Scientific 840029).

The three treatments and three playback exemplars were randomly presented. The first treatment (speaker only) was set up as a baseline for response and consisted of a single speaker broadcast into the forest. The second treatment (close) consisted of the speaker and the model separated by 1 m to test whether there was an enhanced effect by multimodal stimuli. A distance of 1 m was necessary to accommodate the legs of the tripods upon which the model and speakers were secured. The third treatment (far) was set up with the speaker and the model 15 m apart to determine if currawongs had the ability to perceptually bind two spatially displaced stimuli. In the forest habitat 15 m was the maximum distance consistently within the visual range of the observer, allowing for accurate scoring of currawong behavior. To avoid sensitization or habituation of subjects to experimental stimuli (Knight and Temple 1986; Shalter 1978), trials were conducted only once per location with a distance between locations that was well beyond the acoustic range of our playback. Over 30 playbacks per treatment were conducted in a random fashion. Within the 98 trials, 49 trials successfully recruited currawongs. Observations were retained for further analysis if the focal currawong remained within 25 m of either stimulus for more than 30 s (mean duration was 454 s), leaving a sample size of 42 focal birds. From these 42 trials, the sample sizes for each treatment were 10, 17, and 15 for speaker only, close, and far, respectively.

Four observers worked in groups of two and always conducted the same tasks. Observers sat or stood 25 m from the experimental setup to minimize possible effects caused by their presence. Every 15 s, one observer recorded time-sampled spatial measurements including horizontal distance to speaker and direct height of subject from the ground. Because it is inherently difficult for one observer to accurately estimate the horizontal distances of the subject from both the speaker and the model, we elected to use only the speaker as a reference point, as it was present in every treatment. From these data we calculated the direct distance of the focal currawong to the speaker by using the Pythagorean theorem (direct distance² = horizontal distance² + direct height²). A second observer used a cassette recorder (Sony TCM-200DV) to record a 10-min continuous focal observation (Martin and Bateson 1986) on the first currawong to enter a 25 m radius of either the speaker or the model. Our ethogram included the following behaviors: looking (each time the focal bird altered head orientation while remaining upright), flying, hopping (quickly jumped to a different location), walking, foraging, preening (moved its beak along its body), vocalizing, contacting model or speaker, out of sight, and other (behaviors not included in ethogram). Behaviors were considered to be mutually exclusive. Tape recordings were subsequently analyzed using JWatcher (version 1.0, Blumstein et al. 2006).
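The distance calculation described above can be sketched as follows. This is only an illustration, not the analysis script used in the study: the function names and the sample values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which form was used.

```python
import math
import statistics


def direct_distance(horizontal_m, height_m):
    """Straight-line distance (m) from bird to speaker via the Pythagorean theorem."""
    return math.hypot(horizontal_m, height_m)


def summarize_distances(samples):
    """Mean and SD of direct distance over the 15 s time samples of one trial.

    samples: list of (horizontal distance, height above ground) pairs in metres.
    """
    distances = [direct_distance(h, v) for h, v in samples]
    mean = statistics.mean(distances)
    # Assumption: sample SD (n - 1 denominator); undefined for a single sample.
    sd = statistics.stdev(distances) if len(distances) > 1 else 0.0
    return mean, sd


# Hypothetical time samples for one focal bird
samples = [(3.0, 4.0), (6.0, 8.0), (9.0, 12.0)]
mean_distance, sd_distance = summarize_distances(samples)
```

The two values returned per trial correspond to the spatial response variables analyzed later: the average direct distance to the speaker and its standard deviation.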

To estimate intra-observer reliability, each observer scored a single trial until intra-observer reliability exceeded 95%. For inter-observer reliability, each person scored the same trial until reliability exceeded 95%. For spatial measurements, each observer calibrated their distance estimates by judging height distances from approximately 25 m away and then measuring them by visually rotating the vertical distance to a horizontal position on the ground. To further reduce inter-observer error, the same two observers always focused on collecting focal animal samples and the other two observers focused on collecting distance estimates.

Statistical analyses compared both the spatial variation of the focal bird with respect to the speaker (average direct distance to the speaker and the standard deviation of the average direct distance to the speaker), and the behavioral responses [which we combined into categories of locomotion (fly, hop, and walk), relaxed (forage and preen), and vigilance (look)]. Vocalizations were omitted because, interestingly, they occurred rarely. We analyzed the behavioral dependent variables (locomotion, relaxed, vigilance) by examining how treatment influenced the number of bouts, the average duration of each bout, the standard deviation of the average number and duration of each bout, and the proportion of time in sight allocated to each behavioral response.
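As a rough sketch of how such behavioral summaries can be derived from a focal record, the example below groups scored behaviors into the three categories and computes the number of bouts, the mean and standard deviation of bout duration, and the proportion of time in sight. It is not the JWatcher procedure itself: the event format, the label names, the treatment of each scored event as one bout, and the sample record are all assumptions made for illustration.

```python
import statistics

# Category groupings as described in the text; label spellings are hypothetical
CATEGORIES = {
    "locomotion": {"fly", "hop", "walk"},
    "relaxed": {"forage", "preen"},
    "vigilance": {"look"},
}


def summarize(events, category):
    """Summarize one behavioral category from a focal record.

    events: list of (behavior, start_s, end_s) tuples, mutually exclusive in time.
    Assumption: every scored event counts as a separate bout.
    """
    durations = [end - start for b, start, end in events if b in CATEGORIES[category]]
    total = sum(end - start for _, start, end in events)
    out_of_sight = sum(end - start for b, start, end in events if b == "out_of_sight")
    in_sight = total - out_of_sight
    return {
        "n_bouts": len(durations),
        "mean_duration": statistics.mean(durations) if durations else 0.0,
        "sd_duration": statistics.stdev(durations) if len(durations) > 1 else 0.0,
        "prop_time_in_sight": sum(durations) / in_sight if in_sight > 0 else 0.0,
    }


# Hypothetical 60 s excerpt of a focal observation
events = [
    ("look", 0, 10),
    ("fly", 10, 14),
    ("out_of_sight", 14, 34),
    ("hop", 34, 36),
    ("forage", 36, 60),
]
locomotion = summarize(events, "locomotion")
```

Dividing by time in sight rather than total observation time keeps trials with long out-of-sight periods comparable with trials in which the bird stayed visible.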

There was variation in the number of conspecifics recruited, as well as whether focal currawongs were mobbed by heterospecifics. Heterospecifics were recruited by acoustic playbacks in all trials, regardless of the presence of currawongs. We used these as covariates in our analyses by fitting general linear models (in SPSS 11.0 for the Macintosh; SPSS, Inc. 2002) to determine whether each treatment explained significant variation in our dependent variables. For the general linear models, our independent variables included treatment, number of conspecifics, occurrence of mobbing, and all possible interactions between them. Data were transformed when necessary to meet assumptions of linear models. Post hoc comparisons to examine treatment differences were calculated using marginal mean values. Marginal means are the expected mean value after statistically controlling for variation in the dependent variable explained by other independent variables.

Results

Spatial binding

Treatment explained no significant variation in average duration of bouts (GLM, F 2, 30 = 1.228, P = 0.312), average standard deviation of bout duration (F 2, 30 = 1.007, P = 0.463), and proportion of time in sight spent in locomotion (F 2, 30 = 1.036, P = 0.441; Fig. 2a–c).

Fig. 2 Time allocation responses (average ± SD) of pied currawongs to three experimental treatments: speaker only, speaker + Close visual stimulus, speaker + Far visual stimulus. Dependent variables for each graph are as follows: a average duration of each locomotion bout; b average standard deviation in the duration of locomotion events; c proportion of time spent in locomotion; and d average number of locomotion events. In all cases, marginal mean values are plotted

While the model explaining variation in the number of locomotive events was not significant (F 11, 30 = 1.657, P = 0.133; Fig. 2d), a two-way (treatment × total number of conspecifics; F 2, 30 = 4.110, P = 0.026) and a three-way (treatment × mobbing × total number of conspecifics; F 2, 30 = 4.249, P = 0.024) interaction were significant. Pair-wise comparisons were significantly different between the treatment pairs of close and far (mean dif. = 25.292; P = 0.035), and speaker and far (mean dif. = −31.726; P = 0.005).

We found evidence of spatial binding in the standard deviation of direct distance to the speaker (F 11, 30 = 2.720, P = 0.015; Fig. 3a). Treatment alone was significant (F 2, 30 = 6.640, P = 0.004) and explained 30.7% of the variation (the model explained 49.9% of the total variation). Additionally, there were two significant two-way interactions (treatment × mobbing, F 2, 30 = 3.469, P = 0.044, partial eta-square = 0.188; treatment × total number of conspecifics, F 2, 30 = 4.583, P = 0.018, partial eta-square = 0.234).
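The effect sizes reported above can be checked against the F ratios with the standard identity partial eta-square = (F × df effect) / (F × df effect + df error). The sketch below applies it to the reported values; the function name is hypothetical, and taking 2 effect degrees of freedom for every treatment term (three treatments) is an assumption of this illustration.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-square recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df effect, df error); df effect = 2 for treatment terms is an assumption
reported = {
    "treatment": (6.640, 2, 30),
    "treatment x mobbing": (3.469, 2, 30),
    "treatment x conspecifics": (4.583, 2, 30),
    "full model": (2.720, 11, 30),
}

effect_sizes = {
    name: round(partial_eta_squared(f, df1, df2), 3)
    for name, (f, df1, df2) in reported.items()
}

for name, value in effect_sizes.items():
    print(f"{name}: {value:.3f}")
# prints 0.307, 0.188, 0.234 and 0.499 for the four terms, in that order
```

Under these assumptions the computed values agree with the reported 30.7% for treatment, 49.9% for the full model, and the two interaction effect sizes of 0.188 and 0.234.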

Fig. 3 Spatial responses (average ± SD) of pied currawongs to three experimental treatments: speaker only, speaker + Close visual stimulus, speaker + Far visual stimulus

Multimodal signaling

Spatially coupled visual and acoustic signals enhanced response. Treatment-dependent differences in the standard deviation of the direct distance that the focal currawong was from the speaker were seen after controlling for variation explained by mobbing and the presence of conspecifics. Models explaining variation in the average direct distance to speaker (F 11, 30 = 2.720, P = 0.034) and the standard deviation of the average direct distance to speaker (F 11, 30 = 1.657, P = 0.015) were significant (Fig. 3). Pair-wise comparisons, however, revealed treatments to be significantly different only for average standard deviation of direct distance to speaker (Fig. 3a). Focal subjects varied their position relative to the speaker in a similar manner (mean dif. = 0.170; P = 0.895) in both speaker only and far treatments, but were significantly less variable when they responded to the close treatment when compared with the speaker only (mean dif. = −2.916; P = 0.011) or the far treatment (mean dif. = −3.086; P = 0.034). Analysis of locomotor behavior and time allocation failed to produce any significant models capable of describing either spatial binding or multimodal signaling.

Discussion

Integrating the information encoded within multiple signaling modalities is an essential problem many species must solve. We found that pied currawongs were likely to spatially bind visual and acoustic stimuli, and that these two stimuli together enhanced the currawongs’ response.

To our knowledge, this study is the first of its kind to test the ability of a free-living bird to spatially bind two disparate stimuli. When acoustic and visual stimuli were placed in close proximity (1 m), the number of locomotive events by the focal currawong was low (i.e., the bird was less “vigilant”) relative to when the stimuli were presented a greater distance apart (Fig. 2d). This suggests a cognitive binding of the two stimuli when distance disparities are at a minimum. Similar significant differences were found when the standard deviation of distance from the speaker was compared for close and far treatments (Fig. 3a). Despite the simultaneous presentation of the visual and acoustic cues in both treatments, responses to stimuli were dissimilar. In close treatments, subjects appear to have associated the playback with the model, while in far treatments the greater distance between stimuli inhibited the spatial binding of the visual and acoustic cues.

Multimodal perception likely acts to enhance an individual’s response to territorial intrusion. When the two stimuli were presented close together, currawong responses were less spatially variable than when the stimuli were positioned far apart (Fig. 3a). The uni-modal treatment (speaker only) also resulted in greater variability in distance than the close treatment. This may, in part, reflect the currawongs’ attempts to locate the bird responsible for the calls. As with humans, currawongs apparently interpreted two simultaneously presented cues separated by a substantial distance as distinctly different, yet redundant, uni-modal stimuli (Jack and Thurlow 1973; Slutsky and Recanzone 2001; Alais and Burr 2004). Thus, the similar pattern of variation in response to speaker only and far treatments was expected. Such an outcome reflects the decreased response level characteristic of uni-modal signals, regardless of whether they are presented independently or simultaneously but spatially apart. These findings are consistent with earlier work by Chantrey and Workman (1984), which found that song and model together elicited longer periods of display and song response from territorial European robins (Erithacus rubecula) than the model alone. Figure 3b further illustrates the intensifying effect of multimodal signaling on currawong response to territorial intrusions, albeit in the absence of spatial binding.

In the biological context of territory defense, the ability of a resident animal to correctly assess the challenge created by an intruder may be enhanced by its adeptness in spatially binding multimodal cues. For example, in situations with multiple, simultaneous stimuli, such as a dense forest with many calling birds, spatial binding would give the defender the advantage of being able to focus on the intruder despite being inundated with multiple stimuli. This would be the avian equivalent of the “cocktail-party” phenomenon in which visual and acoustic stimuli enhance the ability of an observer to focus on the speaker despite being in a crowd of noisy people (Cherry 1953). Ranging, whereby individuals assess distance by quantifying the degree of distortion of acoustic cues, is also integral to successful territory defense (Naguib 1996). Thus, spatially binding the acoustic stimuli to the physical intruder may further enhance the ability of an individual to successfully protect its territory.

The results of this study demonstrate that pied currawongs have the cognitive ability to spatially bind multiple signals. Results also demonstrate that there is an enhanced effect of bimodal cues on territorial defense responses. While the expected results supporting the presence of spatial binding and multimodal signaling were not always found together, there is significant evidence to suggest that both occur. Future studies are necessary to investigate additional factors influencing the cognitive processing of multiple signaling events in non-humans. To understand the conditions for cross-modal integration in animals, the critical distance at which simultaneously presented bimodal stimuli are instead considered two separate uni-modal cues must be established. The potential importance of spatial binding as a component of ranging (e.g., Naguib and Wiley 2001) may also provide additional insight into the cognitive capacity of birds, and thus offer a more complete understanding of the complex processing events necessary for successful territory defense.