Introduction

An important question in the literature on the evolution of language is whether and how the calls of non-human primates refer to objects and events external to the caller. Empirically, this is usually assessed by investigating how recipients respond to conspecific calls originally given to specific external events (Cheney and Seyfarth 2007; Wheeler and Fischer 2012; Arnold and Zuberbühler 2013; Zuberbühler and Wittig 2011). The classic example is the vervet monkey alarm call system, in which receivers behave differently after hearing acoustically different alarm calls, for example by climbing a tree after hearing alarm calls produced to leopards but not after hearing alarm calls produced to snakes (Seyfarth et al. 1980a; Price and Fischer 2014). These basic findings have been replicated in many other species, including chimpanzees and bonobos (e.g., Slocombe and Zuberbühler 2005; Clay and Zuberbühler 2011), suggesting that extracting relatively context-specific information from others’ calls is widespread within the primate order.

Comparatively less research has been devoted to the psychological processes taking place in signalers. One important question is whether signalers actively try to direct a recipient’s attention to an external referent (referential communication) or whether this process is non-intentional, so that calls merely function as if they were referential (‘functional reference’). Under the second possibility, signalers need not intend to direct receivers’ attention to anything, as long as recipients can learn to form associations between sounds and external events (Zuberbühler 2000a; Rainey et al. 2004). Examining animals’ signaling intentions is inherently difficult, and little progress has been made on this question (Seyfarth and Cheney 2003). As a consequence, ‘functional reference’, that is, referential signaling without intention, has become the default model for non-human primate communication (Seyfarth and Cheney 2003; Seyfarth et al. 2010).

Potentially interesting exceptions are male Thomas langurs (Wich and de Vries 2006) or male blue monkeys (Papworth et al. 2008) that appear to take receivers’ awareness or proximity to danger into account when calling. In many species, including chickens and social mongooses, the production of alarm calls is dependent on the presence of a conspecific (Karakashian et al. 1988; Le Roux et al. 2008), or of a particular conspecific, like kin (Hoogland 1983; Cheney and Seyfarth 1985). In contrast, Thomas langurs and blue monkeys also seem to take into account more complex, relational information between the audience and the threat.

In chimpanzees, two previous studies have shown that alarm call production to camouflaged, deadly snakes (see Supplementary video 5) is sensitive to multifactorial audience effects. First, signalers were more likely to emit quiet alarm calls if receivers did not yet have information about the snake than when they did, that is, when signalers had witnessed receivers seeing the snake or hearing other individuals’ quiet alarm calls (Crockford et al. 2012). Second, while emitting both quiet and loud types of alarm calls, signalers showed signs of intentionality (sensu Bruner 1974) during call production, such as gaze alternation between the snake and the receiver and persistence in signaling until the goal was met (Schel et al. 2013a). The latter finding is particularly interesting in that the ‘goal’, defined as the point at which calling stopped, was associated with a decrease in receivers’ but not signalers’ risk. These results are consistent with the interpretation that quiet alarm call production is under some voluntary control, with evidence for at least the intention to change others’ behavior (termed first-order intentionality by Dennett 1983), and likely the intention to direct receivers’ attention to an external object, in this case a hidden threat.

Few studies have attempted to examine both the signaler’s intention and the recipient’s comprehension within the same call system. Although playback experiments show that many calls successfully direct receivers’ attention either to the signaler (mating calls: Ryan 1980; Mennill et al. 2002; aggressive calls: Bergman et al. 2003; affiliative calls: Cheney et al. 1995) or to an external object or event (Seyfarth et al. 1980a), it has not yet been demonstrated that these signals are, at the same time, produced with the explicit aim to direct another’s attention to a specific location. The connection between signaler intent and recipient comprehension must have been a vital step in the evolution of language, which has been discussed extensively in both the child development and ape gesture literature (see Bruner 1974; Tomasello 2008).

With this in mind, and given that chimpanzee quiet alarm call emission exhibits markers of voluntary and intentional production (Crockford et al. 2012; Schel et al. 2013a), we tested whether chimpanzees’ quiet alarm calls were capable of directing receivers’ attention to a signaler’s location. To address this, we carried out a field experiment contrasting quiet alarm calls, called ‘alert hoos’ (Crockford et al. 2012), with ‘rest hoos’. Hoos are a class of acoustically similar vocalizations that occur in several variants.

Compared with other calls in the chimpanzee vocal repertoire, hoos are among the most tonal, the quietest and the lowest in frequency (i.e., longest in wavelength) (Crockford 2005). Alert hoos are emitted during encounters with visually concealed threats, particularly camouflaged dangerous snakes or wire snares set by hunters (Crockford et al. 2012; Schel et al. 2013a). There is some uncertainty in the current literature as to whether the calls termed ‘alert hoos’ (Crockford et al. 2012; Fig. 1) and ‘soft huus’ (Schel et al. 2013a; Fig. 1) are the same call variant, as the relevant acoustic analyses have not yet been conducted. Hoo variants, however, are also produced in a range of non-predatory contexts, where they mainly function to re-establish contact without referring to an external event. For example, hoos are produced in the introduction and build-up phases of pant hoots, a long-distance call given in a number of contexts (Crockford 2005). Hoos are also produced at the start of travel (‘travel hoos’) or while resting (‘rest hoos’) (Gruber and Zuberbühler 2013); these have also been called ‘soft grunts’ and ‘extended grunts’ by Goodall (1986, p. 131). Both types are used in benign contact contexts when other individuals are relatively close (<100 m) and can be acoustically differentiated from each other (Gruber and Zuberbühler 2013). All hoo types are produced by both males and females and are heard daily, with the exception of alert hoos, which are heard roughly weekly.

Fig. 1

Spectrograms of hoo variants used as playback stimuli and originally produced in alert or rest contexts. y axis: frequency (Hz); x axis: time (s). a One long and one short rest hoo variant from adult male KZ; b short rest hoos from adult males ZF and KZ, adult female KW, subadult male PS and subadult female RE, respectively; c single alert hoos from adult males SQ and KZ, adult female KW, subadult male PS and subadult female OK, respectively; d natural series of alert hoos from adult female KW. Spectrograms correspond to ESM sound files 1, 2, 3 and 4

Under natural conditions, receivers of ‘alert hoos’ react by abandoning their activity to either cautiously approach or avoid the signaler (Crockford et al. 2012). Once the signaler is in view, recipients appear to use the signaler’s head orientation to detect the hidden threat, which is typically 2–15 m from the signaler (N = 108/111 cases for 33 subjects; Crockford et al. 2012). Natural responses to ‘rest hoos’ are considerably different, despite the acoustic similarity between the two hoo variants. After hearing a ‘rest hoo’, receivers usually continue resting or feeding but may respond with a vocalization, often also a ‘rest hoo’.

We designed a playback study to ascertain whether chimpanzees extracted different information from ‘alert’ compared with ‘rest’ hoos. Specifically, we expected subjects to look toward the speaker in both conditions but (a) to be more attentive, (b) to show more cautious behavior, and (c) to show more search behavior after ‘alert’ compared with ‘rest hoos.’ We measured attentiveness by the number of looks and looking duration toward the speaker, standard measures in playback experiments (Cheney and Seyfarth 2007; Zuberbühler and Wittig 2011). An additional sign of high attentiveness was when subjects squared their whole body toward the speaker. Cautious behavior was measured by the number of travel pauses and steps taken. While individuals might be motivated to pause to wait for the call provider in both conditions, we expected the number of steps taken to be greater when caution was not required, that is, in the rest condition. Finally, search behavior was measured by counting repeated changes in head position (presumably to scan the dense undergrowth near the speaker).

Methods

Study site and subjects

Subjects were wild chimpanzees of the Sonso community in Budongo Forest, Uganda (Reynolds 2005), which has been followed by human observers since 1990 and habituated since around 1995. Observations were made from February 2008 to August 2010 and in June–July 2011. Experiments were conducted from April to August 2010 and in June–July 2011. Of a total of 77 chimpanzees, we tested 12 subjects: 3 adult (>14 years) and 2 subadult (10–14 years) females, and 4 adult (>15 years) and 3 subadult (10–15 years) males. Subjects were selected based on their travel habits: we chose individuals that most commonly traveled in central parts of the territory, where travel paths were easiest to predict, so that multiple trials would be possible.

Selection of playback stimuli

Primate calls tend to be individually distinctive, and playback experiments have repeatedly shown that primates readily recognize the identity of callers across a range of soft to loud call types (Cheney and Seyfarth 2007). Hoos used as playback stimuli were recorded opportunistically from known individuals using Sennheiser MKH416 and MKH418 microphones and Marantz PMD 660 digital recorders. Digital sound files were saved in .wav format and used within 24 months of recording. After transferring the calls to a laptop computer, we used PRAAT software (Boersma and Weenink 2009) to select high-quality calls without overlap from other individuals and free of undesired background noise.

Playback stimuli consisted of one of the four different hoo exemplars (Fig. 1). Rest hoos are almost always produced as either short or long single hoos and thus were presented as a single hoo. Alert hoos are sometimes produced singly but more often as a series. To keep stimulus duration comparable and to reflect natural variation across hoo contexts, playback stimuli were either a single short or long rest hoo, a single alert hoo or a natural series of three alert hoos. Duration of playback stimuli was as follows (mean ± SD): alert hoo series = 3.95 ± 1.3 s, single alert hoos = 0.24 ± 0.05 s, single short rest hoo = 0.23 ± 0.04 s, single long rest hoo = 0.55 ± 0.11 s.

Alert hoo stimuli were recorded while the caller was looking at a rhinoceros viper, a Gaboon viper or a model viper (constructed from plaster cast, painted with colors and geometric patterns representative of each snake species, and then varnished; Fig. S1). Both long and short rest hoos were recorded while signalers had been resting (sitting or lying) for >1 min. Alert hoo series were selected to be similar in length and amplitude. Variation in the duration of all single hoos was controlled for in the statistical models. All stimuli were recorded at distances of 4–10 m from callers and were then calibrated to match the natural amplitude of each call type. Sound pressure levels at a distance of 1 m were (mean ± SD) 66 ± 5 dB for alert hoos and 70 ± 7 dB for rest hoos. Calls were stored on an Apple iPod and broadcast from a Nagra DSM speaker placed in a specially modified backpack that allowed unobstructed sound presentation.

Experimental design

The experiment was based on a within-subject design, with the aim that each subject was exposed to at least one of two alert hoo stimuli (single or series) and at least one of two rest hoo stimuli (short or long).

Playback experiments were carried out when subjects were alone or in small parties, as they walked past or rested within 5–10 m of a previously concealed speaker. Considerable care was taken to ensure that call providers could not hear their own calls. To this end, one observer followed the call provider and informed the experimenter via handheld radio when the call provider was >200 m from the subject, well beyond the acoustic range of either call type (<100 m).

Considerable care was also taken that the speaker and its operator were hidden in dense vegetation 5–10 m from the trail along which the subject was expected to travel. The speaker was positioned at an angle of 60–90° from the subject’s expected head orientation when walking along the trail. The experimenter, positioned 7–15 m from the subject, filmed the subject continuously with a Panasonic NV-GS 330 DV camera before (>10 s), during and after the playback. Although the speaker was placed in dense vegetation, we aimed to use a relatively open section of the path, allowing subjects to be filmed with minimal interference from vegetation. It was not always possible to predict a subject’s walking direction after the playback, so filming was sometimes restricted by undergrowth. Filming continued for as long as the subject remained visible and within 40 m of the speaker.

To avoid the confound that subjects might be more motivated to respond to the calls of certain individuals, such as bond partners, estrous females or more dominant individuals (e.g., Schel et al. 2013b), subjects heard the same call provider across trials where possible (see Table 1), and different subjects heard different call providers. Female call providers were never in estrus at the time their calls were played back. Twenty-six different stimuli from 6 call providers [1 adult and 1 subadult female (KW and RE); 3 adult and 1 subadult male (NK, KT, SQ and PS)] were used across the 12 subjects (Table 1). Seven stimuli were single alert hoos, eight were alert hoo series, nine were short rest hoos and four were long rest hoos. We chose call providers with relatively neutral relationships to subjects (ESM section C). The strength of social bonds was calculated using a ‘Composite Relationship Index’ following Crockford et al. (2013) (see ESM), and rank differences were determined from dominance matrices based on a standard criterion, the production of submissive pant grunt vocalizations (see Wittig et al. 2014 ESM for analyses).

Table 1 Playback stimuli heard by subjects

To avoid habituation to the playback stimuli, we broadcast hoos at a lower rate than that of naturally occurring hoos, and chimpanzees rarely heard playback stimuli on consecutive days. The order of presentation was counterbalanced across trials.

Data analysis

Coding of behavioral responses

Using VLC video software, CC extracted six behavioral variables frame by frame (25 frames per second) during the first 30 s from the onset of the first broadcast hoo. A second observer, Liran Samuni, independently blind-coded 15 of the 42 trials (35.7 %) for two behavioral variables to assess inter-rater reliability, which was good (Spearman’s correlation: 0.82 for Scans and 0.81 for Body Orientations).
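The inter-rater check above reports Spearman’s rank correlations. As a minimal sketch of that statistic, assuming tied counts receive average ranks (the text does not say how ties were handled), rho can be computed as the Pearson correlation of the two coders’ ranks. The function names and the example counts are hypothetical and are not the study’s data; in practice a library routine such as scipy.stats.spearmanr would normally be used.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks.

    Assumes neither coder gave the same value to every trial
    (zero rank variance would cause a division by zero).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical scan counts from two coders for the same five trials
coder_a = [3, 5, 2, 8, 4]
coder_b = [3, 6, 2, 7, 4]
print(round(spearman_rho(coder_a, coder_b), 3))
```

Because rho works on ranks, two coders who order the trials identically reach perfect agreement even if their raw counts differ slightly, as in this hypothetical example.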

Behavioral variables and predictions were as follows (see Table 2):

Table 2 Response variables, predictions and descriptive statistics from 42 playback trials
1. Number of looks to the speaker: the number of times the subject’s head turned and paused within a 30° arc of the speaker (prediction: alert hoo > rest hoo).

2. Total looking duration to the speaker: the total duration of all looks within a 30° arc of the speaker (prediction: alert hoo > rest hoo).

3. Number of body orientations to the speaker: chimpanzees usually monitor their environment by turning their head to look in the direction of a particular stimulus. Much less frequently, they also turn their whole body toward a particular stimulus (Crockford, personal observation). If the body remained oriented in the direction of travel, we treated this as an expression of low attention and did not record it. In contrast, a stationary position with the subject’s arms turned more than halfway from the direction of travel toward the speaker was considered an expression of high attention, and each such body turn toward the speaker was counted as one ‘high attention’ event. We predicted more high attention events in response to alert than to rest hoos.

4. Number of scans: the number of times the head direction changed while the subject was looking within a 45° arc of the speaker, to the left or right, up or down. Only abrupt changes in direction, indicated by a prior cessation of head movement, were counted. Changes in direction were scored from movement of the ears or eyebrow ridge rather than of the head in general, because these are prominent features with distinct edges whether viewed from frontal, lateral or posterior positions, and they do not move independently of the head. We expected subjects in the alert condition to scan more often, either to locate the signaler and gain precise threat location information from the signaler’s head and body orientation, as suggested for titi monkeys (Cäsar et al. 2012) and putty-nosed monkeys (Arnold and Zuberbühler 2013) after hearing predator alarm calls, or to search around the speaker to locate the threat themselves.

5. Number of pauses: halts in walking in which all four limbs stopped forward movement at the same time. We expected individuals to pause more when they needed to ascertain whether and where a nearby threat might be.

6. Number of steps: full forward or backward paces of a forelimb. In contrast to the other variables, we expected fewer steps in the alert than in the rest condition, because movement should be more cautious when a threat is nearby but not yet located.

Statistical analysis for playback experiments

To determine whether hoos produced in rest and alert contexts elicit different behavioral responses, we fitted a series of Generalized Linear Mixed Models (GLMM; Baayen 2008) in R version 3.0.2 (R Core Team 2013) using the function glmer of the package lme4 (Bates et al. 2014). Each model tested a different behavioral response variable, coded from the videos, against the same set of four fixed-effect predictor variables and four random factors. The predictor variables were as follows: (1) Call Context: the context of the playback stimulus (alert or rest hoo); (2) Signal Length: short or long, where ‘short’ calls were single alert hoos or short rest hoos, and ‘long’ calls were single long rest hoos or series of alert hoos (see Table 1 for the distribution of trials across subjects); (3) Subject Rank Relative to Call Provider: dominant or subordinate (9 dominant and 12 subordinate trials in each of the alert and rest hoo conditions); (4) the number of hoo playbacks already heard by the subject during the study (mean ± SD 2.7 ± 1.5, range 1–6, N = 12 subjects). Call Context was our main variable of interest and was treated as the test predictor, with the other three variables treated as control predictors. We included one interaction, Call Context (Alert vs. Rest) × Signal Length (Short vs. Long), because repeated alert hoos may provide a more urgent signal than single alert or rest hoos (Zuberbühler 2009). This interaction was not significant in any model, so it was removed and the models were rerun.

Because our experiments were based on a within-subjects design, with subject–call provider pairs tested more than once, the following random intercepts were included in the model: subject identity, call provider identity and dyad identity (subject and call provider). Playback stimulus was also included as a random factor, because some stimuli were used more than once (max = 3), although no subject heard the same playback stimulus twice. Only one random slope could be fitted (for Call Context within Call Provider), because for most combinations of fixed and random effects, at least one level of the random effect had fewer than two different values of the fixed effect (Schielzeth and Forstmeier 2009; Barr et al. 2013).

The six behavioral variables coded from the videos were all potential response variables. To reduce redundancy between correlated behavioral variables and to limit multiple testing, we conducted a factor analysis in R (using the function factanal) and selected the variable with the strongest loading on each of the three resulting factors (Factor 1: Scans; Factor 2: Body Orientation; Factor 3: Steps; see ESM section D for results). These three variables were tested as response variables in three separate models. Each model was fitted as a Poisson model, and because observation time differed between trials (mean ± SD 25.5 ± 6.1 s, range 10–30 s, N = 42 trials), log-transformed Observation Time was included as an offset term (McCullagh and Nelder 2008).
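As a sketch, the model structure described in this and the two preceding paragraphs can be written as a single equation for one response variable (Scans, Body Orientation or Steps) in trial i. The notation is ours rather than the authors’: y_i is the count, T_i the observation time in seconds, Context, Length and Rank are 0/1 indicators (the reference levels are an assumption), Prior is the number of playbacks already heard, and s(i), p(i), d(i) and k(i) index the subject, call provider, dyad and stimulus of trial i. The terms a, b, c and e are zero-mean normal random intercepts, and g is the random slope of Call Context within call provider (dropped in the Steps model). The Context × Length interaction is omitted because it was removed.

```latex
\begin{aligned}
y_i &\sim \operatorname{Poisson}(\mu_i), \\
\log \mu_i &= \log T_i + \beta_0 + \beta_1\,\mathrm{Context}_i + \beta_2\,\mathrm{Length}_i
  + \beta_3\,\mathrm{Rank}_i + \beta_4\,\mathrm{Prior}_i \\
&\quad + a_{s(i)} + b_{p(i)} + c_{d(i)} + e_{k(i)} + g_{p(i)}\,\mathrm{Context}_i, \\
\frac{\mu_i}{T_i} &= \exp\!\left(\beta_0 + \beta_1\,\mathrm{Context}_i + \beta_2\,\mathrm{Length}_i
  + \beta_3\,\mathrm{Rank}_i + \beta_4\,\mathrm{Prior}_i
  + a_{s(i)} + b_{p(i)} + c_{d(i)} + e_{k(i)} + g_{p(i)}\,\mathrm{Context}_i\right).
\end{aligned}
```

The last line shows why the log-time offset matters: its coefficient is fixed at 1, so the remaining terms describe a response rate per second, and trials filmed for 10 s and for 30 s become comparable.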

To ensure model convergence, we set the optimizer to ‘bobyqa’ and the maximum number of iterations to 10,000. One model, with Steps as the response variable, converged only after the random slope had been excluded. We checked model stability by excluding the levels of the random effects one at a time from the data, which indicated that no influential cases existed; this was done for models excluding the correlation between the random slope and intercept. Variance Inflation Factors (VIF; Field 2005) were derived using the function vif of the R package car (Fox and Weisberg 2011), applied to a standard linear model excluding the random effects; with a maximum VIF < 2, they did not indicate collinearity to be an issue. Tests of overdispersion gave no cause for concern, with dispersion parameters <1. Finally, because the three GLMMs constituted multiple tests of the same behavioral response, P values relating to the test predictor, Call Context, were Bonferroni-corrected, such that P values below 0.0167 (0.05/3) were considered significant.
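The Bonferroni threshold above is simply the family-wise alpha divided by the number of tests. The minimal sketch below computes it and applies it to the uncorrected P values reported in the Results; the dictionary labels are ours, and applying the threshold to the Steps rank effect mirrors how the Results treat that control predictor.

```python
# Minimal sketch of the Bonferroni threshold used for the three GLMMs.
ALPHA = 0.05   # family-wise error rate
N_TESTS = 3    # three GLMMs: Scans, Body Orientation, Steps


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold alpha / n_tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Uncorrected P values as reported in the Results section
reported_p = {
    "Scans: Call Context": 0.003,
    "Body Orientation: Call Context": 0.05,
    "Steps: Rank relative to Call Provider": 0.017,
}

threshold = bonferroni_threshold(ALPHA, N_TESTS)
# True if the P value falls strictly below the corrected threshold
significant = {label: p < threshold for label, p in reported_p.items()}

for label, is_sig in significant.items():
    verdict = "significant" if is_sig else "not significant"
    print(f"{label}: p = {reported_p[label]} -> {verdict} (threshold {threshold:.4f})")
```

The Steps rank effect (p = 0.017) sits just above 0.0167, which is why the Results describe it as a trend after correction.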

Results

In 42 trials, subjects could be filmed and their behavioral responses coded for more than 10 s (mean ± SD 25.5 ± 6.1 s, range 10–30 s; Table 2). Trials in which the subject could be filmed for ≤10 s were excluded (10 trials, all because subjects moved into areas obscured by dense undergrowth). Subjects looked toward the speaker in every trial (looking duration as a proportion of observation time, mean ± SD: alert context 47.7 ± 17 %, range 5–90 %; rest context 27.7 ± 29 %, range 1–93 %).

The test predictor (Call Context) significantly influenced the response variable (Table 3) in two of the three models, but only one model remained significant after Bonferroni correction. In the Scans model, we tested whether the number of scans was influenced by Call Context. As predicted, chimpanzees engaged in more head scanning in the direction of the playback speaker after hearing alert hoos than after hearing rest hoos (GLMM for Scanning: estimate = −0.40, SE = 0.14, z = −2.95, p = 0.003; Tables 2, 3a; Fig. 2a; Supplementary Videos 6 and 7).

Table 3 Influence of predictor variables on behavioral responses following the playback experiment
Fig. 2

Chimpanzee behavioral response rates during the first 30 s after broadcast of an alert or a rest hoo. Circles represent mean values per subject per condition. Dashed lines connect values of the same subject across conditions. a Number of head scans within 45° of the speaker, by hoo context; b number of body orientations to the speaker, by hoo context

In the Body Orientation model, we tested if the number of body orientations to the speaker was influenced by the test predictor (Call Context). Partially supporting our prediction, chimpanzees showed a tendency to engage in more body orientations to the speaker after hearing alert hoos compared with rest hoos (GLMM for Body Orientation: estimate = −1.1, SE = 0.52, z = −1.96, p = 0.05, Tables 2, 3b; Fig. 2b).

In the Steps model, only a control predictor, Rank of Subject Relative to Call Provider, showed a significant effect, which was reduced to a trend after Bonferroni correction. Chimpanzees took more steps when they were dominant rather than subordinate to the call provider (GLMM: estimate = −0.76, SE = 0.32, z = −2.48, p = 0.017; Table 3c; dominant to call provider: mean ± SD 14.4 ± 9.7 steps/observation time; subordinate to call provider: mean ± SD 8.3 ± 6.5 steps/observation time; Fig. 3). Note that this was mainly a between-subjects effect, as call provider was held constant within subjects across conditions. This result could also be an artifact of multiple testing, as only one test predictor was specified in the GLMMs and we therefore ran no full versus null model comparisons.
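Because the models use a log link, each reported estimate is a difference in log response rate, and exponentiating it gives a rate ratio. The sketch below does this for the three estimates reported above, with approximate Wald 95% intervals (estimate ± 1.96 × SE). The contrast labels are our reading of the text (for example, a negative Call Context estimate alongside more scans after alert hoos suggests that rest hoos are compared with alert hoos) and should be treated as assumptions. Because the estimates and SEs are rounded, the intervals are approximate and need not match the reported z and P values exactly.

```python
import math
from typing import Tuple


def rate_ratio(estimate: float, se: float, z_crit: float = 1.96) -> Tuple[float, float, float]:
    """Convert a log-link (Poisson) coefficient into a rate ratio with a Wald CI."""
    return (
        math.exp(estimate),
        math.exp(estimate - z_crit * se),
        math.exp(estimate + z_crit * se),
    )


# Estimates and SEs on the log scale, as reported in the Results.
# Contrast directions are inferred from the text, not stated by the authors.
results = {
    "Scans: rest vs. alert (assumed)": (-0.40, 0.14),
    "Body orientation: rest vs. alert (assumed)": (-1.1, 0.52),
    "Steps: subordinate vs. dominant (assumed)": (-0.76, 0.32),
}

for label, (est, se) in results.items():
    rr, lo, hi = rate_ratio(est, se)
    print(f"{label}: rate ratio {rr:.2f} (approx. 95% CI {lo:.2f} to {hi:.2f})")
```

For Scans, for example, exp(−0.40) ≈ 0.67, meaning that the model-estimated scan rate under the contrasted level is about two thirds of that under the reference level.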

Fig. 3

Step rate depending on subjects’ dominance rank relative to the call provider during the first 30 s after broadcast of an alert or a rest hoo. Circles represent mean values per subject per condition. Dashed lines connect values of the same subject across conditions. Dom dominant, Sub subordinate. As call providers were generally kept constant within subjects, this was essentially a between-subjects test

The interaction between Call Context and Signal Length was not significant in any model and was therefore removed (estimates are given with respect to rest hoos and short hoos; P values before Bonferroni correction: GLMM for Scanning: estimate = 0.10, SE = 0.35, z = 0.28, p = 0.78; GLMM for Body Orientation: estimate = −1.08, SE = 0.89, z = −1.21, p = 0.23; GLMM for Steps: estimate = 0.14, SE = 0.49, z = 0.30, p = 0.77).

No measured aspect of the behavioral response following the playback stimulus was affected by the other two predictor variables: the length of the playback stimulus and the number of previous playback stimuli heard throughout the testing period.

Discussion

Chimpanzees responded differently after hearing acoustically graded but distinguishable hoo variants, suggesting that they extracted information about the context associated with the different call variants, despite their acoustic similarity. Specifically, subjects engaged in significantly more scanning, indicative of search behavior in the direction of the speaker, after hearing alert compared with rest hoos. Signal duration did not show a significant effect in any of the models, indicating that call context rather than call duration was influencing behavioral responses.

It could be argued that the differential responses were simply arousal-driven, due to specific acoustic markers of threat, urgency or excitement in the two hoo types, such as differences in maximum fundamental frequency (supplementary audio files; Morton 1977; Owren and Rendall 2001). We find this explanation less plausible, however, for the following reasons. First, repetition of alert hoos, an acoustic feature expected in more urgent contexts (for review see Zuberbühler 2009), did not elicit greater attention to the speaker than single alert hoos. Second, the same call type in monkeys (Cheney et al. 1995; Cheney and Seyfarth 1999, 2007), or even the exact same call in chimpanzees (Wittig et al. 2014), can elicit either no attention or considerable attention to the speaker depending on the relationship between signaler and recipient (Engh et al. 2006), between signaler and a third party (Cheney and Seyfarth 1999; Bergman et al. 2003; Crockford et al. 2007), or on the receiver’s previous social interactions (Wittig et al. 2007a, b, 2014). This is the case even for calls with conspicuous acoustic features of threat and urgency, such as high-pitched, high-amplitude screams and barks (Wittig et al. 2014). In contrast, hoos are among the quietest, lowest-pitched and most tonal calls in the chimpanzee repertoire (Goodall 1986; Crockford 2005). Overall, it therefore seems unlikely that differences in scanning behavior were directly and inflexibly induced by acoustic features of urgency or excitement alone, without any assessment by the recipient of the contextual information conveyed by the call. In addition, because call providers were kept constant within subjects across conditions, differential interest in particular call providers cannot explain the differential responses. More likely, chimpanzees extracted contextual information from these calls, presumably through learned associations between call variants and their respective contexts, as has been suggested for other species (monkeys: Cheney and Seyfarth 1990, p. 151; Fischer 1998; Zuberbühler 2000a; hornbills: Rainey et al. 2004; see also Zuberbühler 2009; Seyfarth et al. 2010).

Across conditions, chimpanzees appeared motivated to search for the caller, even though all playback stimuli were quiet, unremarkable calls. In most trials, subjects looked toward the speaker or around the speaker area for several seconds (Table 2). This matches our observations of natural situations in which a chimpanzee first becomes aware of a new arrival in the vicinity, which can happen several times a day given the dense forest habitat and the fission–fusion social system of chimpanzees. That subjects could not immediately see the caller likely prompted continued search behavior, again as occurs in natural situations. After hearing either type of rest hoo, chimpanzees stopped searching after a few scans. In contrast, after hearing one or several alert hoos, subjects scanned significantly more in the direction of the speaker, suggesting that they were more motivated to see the call provider or to locate a threat close to the call provider. Alert hoos, compared with rest hoos, were thus more effective in drawing receivers’ attention to call providers and to the location of a threat. Attention drawn to call providers is by default also drawn to the location of the threat, because chimpanzees rarely produce alert hoos more than 15 m from the threat (N = 3/111 cases; Crockford et al. 2012).

Under natural conditions, alert hoos are given to a range of threats, particularly camouflaged, dangerous snakes (see Supplementary video 5), snares and fresh leopard scat. In natural observations, receivers cautiously approach signalers of alert hoos and then continue searching in the direction of the signaler’s gaze/head position (Supplementary video 5). This suggests that although the call conveys some information about the presence of a threat, it does not specify the exact cause of the disturbance, so that individuals must acquire additional contextual information (Wheeler and Fischer 2012), as has been demonstrated for other primate alarm call systems (Zuberbühler 2000b; Cäsar et al. 2012; Arnold and Zuberbühler 2013). Given how well camouflaged some of these threats are, additional contextual information acquired from observing the signaler’s orientation may be crucial for finding the threat (Arnold and Zuberbühler 2013; Supplementary video 5).

Few studies have attempted to examine production intention and comprehension within the same call system, a necessary step in determining whether signalers intend their calls to draw receivers’ attention to an external object or whether receivers’ attention is drawn without signaler intent. A limitation of this study is that there was no snake for receivers to find: although receivers engaged in more search behavior in the speaker area after hearing alert rather than rest hoos, we cannot state explicitly what subjects were searching for. We consider two possibilities.

First, receivers may have been motivated to detect signalers if acoustic features denoting alarm or distress were more evident in alert hoos than in rest hoos. However, as argued above, the acoustic features of the different hoo types are unlikely to be the main factor that influenced receiver responses.

Second, receivers may have been motivated to detect signalers because the alert hoo, but not the rest hoo, conveyed information about a threat. Playback studies on monkey alarm calls that include a looking or scanning variable show that alarm calls precipitate searching behavior, either toward the speaker (Fischer 1998; Cäsar et al. 2012; Arnold and Zuberbühler 2013) or in the direction expected for aerial or ground predators (Seyfarth et al. 1980b; Schel et al. 2010; Cäsar et al. 2012; Arnold and Zuberbühler 2013). Some studies show that looking toward the speaker, rather than up for aerial predators or down for ground predators, occurs more often after alarm call types given to a broad range of stimuli than after alarm call types given to a narrow category of stimuli (Cäsar et al. 2012; Arnold and Zuberbühler 2013). In such cases, Arnold and Zuberbühler (2013) suggest that receivers may be looking for additional cues from signalers, such as their orientation. Captive chimpanzees are known to change their behavior according to the body and head orientation of members of another species, humans (Kaminski et al. 2004). Thus, once receivers have seen signalers, the signalers' head orientation may then draw receivers' attention toward the threat itself.

One advantage of the alert hoo relating to a class of objects rather than to a specific object is flexibility in the external objects toward which signalers can direct receivers' attention. When signalers identify new types of hidden threat, such as snares, they use the same call as for existing hidden threats, such as snakes, and this study indicates that this call draws receivers' attention toward signalers.

One relevant question is what signalers might gain from recruiting receivers when they see a snake. Gaboon and rhinoceros vipers are not known to prey on chimpanzees and, in contrast to ambush predators such as leopards and pythons, chimpanzees do not mob vipers. Thus, recruiting others to these snakes does not appear to bring personal gain or to serve cooperative predator defense. Kin-directed benefits through social learning are one possibility, although calling also occurs when kin are not present (Crockford et al. 2012). Chimpanzees may also be interested in protecting other group members if group size is a key factor in resource defense. There is some evidence that larger chimpanzee communities can annex territory from smaller ones (Mitani et al. 2010), suggesting that protecting community members from deadly snake bites may be an adaptive strategy.

This study shows that chimpanzees extracted different information from alert hoos than from acoustically similar rest hoos. The most generous interpretation is that this information alerts receivers to the presence of a threat close to the signaler. Receivers were apparently motivated to locate the signaler, presumably because they require additional contextual information to locate and identify the threat. Previous studies show that signalers emit alert hoos not specifically to recruit receivers (for example, for mobbing) but to inform them, with some level of intent, about a hidden threat (Crockford et al. 2012; Schel et al. 2013a). Taking past and current evidence together, and although further testing is needed, chimpanzee alert hoos represent a plausible case in which signalers intend to transfer relatively specific information to receivers through a vocal signal.