Introduction

Being able to assess another’s attentional state during social interactions is an important skill, particularly in communicative interactions involving visual signals: a visual signal can only be successful if the other individual is attending. Although a large body of research has investigated non-human apes’ abilities in this domain, the findings are somewhat inconsistent as to which cues the subjects actually attend to. Observational studies of communicative interactions between conspecifics provide evidence that all great ape species adjust their communication to the body posture of potential recipients. That is, they use visual signals only when a recipient is facing them, or they even move into the recipient’s visual field (for chimpanzees: Liebal et al. 2004a; Tomasello et al. 1994, 1997; for bonobos: Pika et al. 2005; for gorillas: Genty et al. 2009; Pika et al. 2003; for orangutans: Liebal et al. 2006; Tempelmann and Liebal 2009).

Several experimental studies have been conducted to analyse further just how flexible apes’ understanding of another’s attentional state is and which cues they must attend to in order to determine it. One paradigm used extensively is the so-called requesting paradigm, in which the subject is positioned opposite one or several human experimenters whose attentional states vary. The question explored is whether the subject adjusts its begging behaviour to the attentional state of the human and/or is able to assess whom to beg from. Among the first researchers to use this paradigm were Povinelli and Eddy (1996), who presented chimpanzees with a choice of two humans. Across a series of conditions, the two experimenters adopted different attentional states; for example, one human was oriented towards the chimpanzee while the other faced away from it. The results indicated that the chimpanzees chose whom to beg from on the basis of body orientation rather than the orientation of the face or the status of the eyes. Povinelli and colleagues therefore concluded that chimpanzees know virtually nothing about the role of the eyes (or of face orientation) in visual perception.

Nevertheless, evidence is accumulating that at least chimpanzees differentiate a human’s attention on the basis of the status of the human’s eyes. Call and Tomasello (1994) and Gomez (1996) found that chimpanzees (and also some enculturated orangutans) were sensitive to the status of a human’s eyes (see also Kaminski et al. 2004). Hostetter et al. (2007) likewise demonstrated that chimpanzees tested in the requesting paradigm attended to the status of the eyes: the apes used auditory gestures more frequently when the human’s eyes were closed than when they were open. In Povinelli et al.’s (2003) study, chimpanzees adjusted their communicative behaviour to a human experimenter’s attentional state, that is, to where the human was looking (e.g. at the relevant food or at the ceiling).

Chimpanzees (other species are underrepresented in the aforementioned studies) therefore seem to be quite flexible in their understanding of another’s attentional state, an ability that seems to be grounded in a true understanding of another’s visual perspective (e.g. Hare et al. 2000, 2001; Okamoto-Barth et al. 2007). These results therefore contradict Povinelli et al.’s findings and conclusions. A recent study by Kaminski et al. (2004) that included several ape species (bonobos, chimpanzees and gorillas) has begun to provide the first findings that might help to resolve this contradiction and to clarify the factors that influence other ape species’ assessment of a human’s attentional state. In that study, using the requesting paradigm, the experimenter’s attentional state was varied systematically by manipulating two factors that might influence the apes’ behaviour: the orientation of the human’s face and the orientation of the human’s body. The apes begged for food from a human who adopted one of four combinations of face and body orientation, with the two factors varied independently. In contrast to Povinelli and Eddy (1996), subjects were presented with only one experimenter; thus, the experimental manipulation took place across trials. Interestingly, the authors found an interaction between the two main factors, face and body orientation, suggesting that both factors affected the apes’ communicative behaviour. When the experimenter’s body was oriented towards the subject, the apes attended to the orientation of the face and begged more when it was oriented towards them than when it was turned away. However, when the experimenter’s body was oriented away from them, this effect vanished and the orientation of the face no longer affected the apes’ begging behaviour. Indeed, the apes begged significantly less when the experimenter’s body was oriented away from them. Kaminski et al. (2004) suggest that this pattern may result from body and face orientation conveying two different types of information, which the apes interpret in a bivariate, hierarchical way. While body orientation may convey information about the experimenter’s general ability to hand over food, face orientation provides information about the experimenter’s actual attentional state. If this were the case, apes should attend to the orientation of the face only when the experimenter is in a position to hand over food, but then regardless of his body orientation. Furthermore, the authors did not detect any species difference regarding these findings.

In the current study, we tested Kaminski et al.’s hypothesis systematically by eliminating the apes’ need to pay attention to the human’s body orientation. We did so by using a feeding device with which the experimenter was able to deliver food independently of his body orientation. In addition, little is known about how ape species other than chimpanzees assess humans’ attentional states, which makes it difficult to draw conclusions about the evolution of this ability. We therefore also compared all great ape species systematically using the same paradigm.

Method

Subjects

Six orangutans (Pongo pygmaeus), four gorillas (Gorilla gorilla), 18 chimpanzees (Pan troglodytes) and five bonobos (Pan paniscus) participated in this experiment. There were 23 females and 10 males ranging in age from 4 to 35 years. Fourteen apes were human reared, 17 were mother reared, and the rearing history of two individuals was unknown. With one exception, all subjects were born in captivity (see Table 1 for an overview).

Table 1 Details of the subjects participating in the present study

All apes were housed at the Wolfgang Köhler Primate Research Centre at Leipzig Zoo (Germany), where they lived in groups of conspecifics. All apes had access to indoor and outdoor areas. During testing, the apes were fed according to their daily routine four times a day on a diet of fruit, vegetables and monkey chow; water was available ad libitum.

Apparatus

The apparatus (see Fig. 1) consisted of a transparent pipe (length: 20 cm), cut in half lengthwise, with a hole centred halfway along its length. This half-pipe was fixed to the uppermost edge of a tall upright board (height: 102.5 cm, width: 20 cm) so that the concave inner surface of the pipe acted as a receptacle. A segment was cut out of the board directly beneath the hole in the Perspex pipe. Attached to the hole on the underside of the pipe was a transparent tube (length: 69.5 cm, diameter: 4 cm), which acted as a chute, carrying rewards from the open pipe at the top of the board down to a Plexiglas box located 58.5 cm from the board. The Plexiglas box (height: 16.5 cm, depth: 20 cm, width: 14.3 cm) was fixed to a metal mesh at a height of 47.5 cm above the ground. This mesh formed the direct boundary of the apes’ cage. The side of the box fixed to the mesh was open, allowing a subject to reach into the box with its fingers.

Fig. 1 Apparatus and general experimental setup

The experimenter sat next to the apparatus (see Fig. 1). The design of the apparatus enabled the experimenter to deposit food pieces into the open pipe (henceforth called the “platform”) in full view of the subject. When the experimenter pushed the food into the hole in the platform, it travelled down the tube into the box beside the ape’s enclosure.

Warm up

The warm-up ensured that the subjects understood the mechanism of the apparatus and that the experimenter was able and willing to give them food independently of his body position. The experimenter offered food approximately every 10 s by pushing it through the tube in the apparatus, changing his body position randomly to show subjects that food could be given irrespective of body orientation. Each individual received a warm-up at the beginning of each testing day (2 min) and between sessions (1 min).

Experimental procedure

The experimenter (E) sat beside the apparatus, in front of the subject. During the experimental trials, E placed a piece of food on the platform in front of the subject and adopted the posture of one of four possible conditions (modelled after Kaminski et al. 2004):

  1. Face/Front: E’s body and face were oriented towards the subject.

  2. No face/Front: E’s body was oriented towards the subject, while his head was turned away from the subject.

  3. Face/Back: E’s body was turned away from the subject, and his face was oriented towards the subject.

  4. No face/Back: E’s body and face were turned away from the subject.

During each trial, E remained motionless, holding the respective position for 30 s. He then dropped the food through the tube, without changing his body orientation, so that it travelled to the subject. To keep subjects motivated to beg for the food, one to four filler trials followed each experimental trial. In the filler trials, E placed some food on the platform and delivered it to the subject immediately, without waiting for any requesting behaviour.

Each subject received four sessions (two per day), with four trials per session, resulting in up to 16 trials altogether. Each condition was presented once in each session, and the order of conditions was randomized across sessions. There was a break of approximately 1 min between the two sessions of a day, during which the experimenter left the testing room.
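The session structure described above can be sketched as a small scheduling routine. This is a minimal illustration under our reading of the procedure, not the script used in the study: the function name `build_schedule`, the trial tuples, the seeding and the use of Python’s `random` module are assumptions introduced here for illustration. Each session contains every condition exactly once in a shuffled order, and each experimental trial is followed by one to four filler trials.

```python
import random

# Condition labels as listed above (face orientation / body orientation).
CONDITIONS = ("Face/Front", "No face/Front", "Face/Back", "No face/Back")


def build_schedule(seed=None, n_sessions=4, sessions_per_day=2,
                   min_fillers=1, max_fillers=4):
    """Return one dict per session, holding its day, number and trial list.

    Hypothetical helper: every session contains each condition exactly once,
    in a freshly shuffled order, and every experimental trial is followed by
    a random number (min_fillers..max_fillers) of filler trials.
    """
    rng = random.Random(seed)  # seeded generator for reproducible schedules
    schedule = []
    for s in range(n_sessions):
        order = list(CONDITIONS)
        rng.shuffle(order)  # independent random order in every session
        trials = []
        for condition in order:
            trials.append(("experimental", condition))
            n_fillers = rng.randint(min_fillers, max_fillers)  # inclusive bounds
            trials.extend([("filler", None)] * n_fillers)
        schedule.append({
            "day": s // sessions_per_day + 1,
            "session": s + 1,
            "trials": trials,
        })
    return schedule


if __name__ == "__main__":
    for session in build_schedule(seed=1):
        conditions = [c for kind, c in session["trials"] if kind == "experimental"]
        print(session["day"], session["session"], conditions)
```

Fixing the seed makes a schedule reproducible, which is convenient when the same randomized order has to be prepared in advance for the experimenter.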

Data scoring

All experimental trials were video-recorded from three different perspectives using a splitter and were later coded by a second person. In contrast to Kaminski et al. (2004), who coded only five different behaviours, we coded the subjects’ complete behavioural repertoire using a deductive/inductive coding scheme, for several reasons. First, unlike Kaminski et al. (2004), we used a mesh instead of a Plexiglas panel between experimenter and subject; this gave subjects more degrees of freedom, which is why we could code behaviours that Kaminski et al. (2004) could not observe, such as “mesh shake”. Second, we looked in detail at the different modalities of signals (e.g. visual and auditory) and at whether these modalities were used differentially depending on the human’s attentional state; we therefore included the complete set of signals used by the individuals.

For analysis, we grouped the behaviours into three categories according to their sensory modality: unimodal visual, unimodal auditory and bimodal visual–auditory signals (see Table 2 for definitions).

Table 2 Definition of signals in the current study, classified according to their respective sensory modality

Reliability

For inter-observer reliability, roughly 20% of the video material was coded by a person unfamiliar with the hypothesis of the study. Reliability was excellent (Spearman r = 0.82, P < 0.001, n = 93).
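For readers unfamiliar with the reliability statistic, the following sketch shows how a Spearman rank correlation can be computed from two coders’ counts: values are converted to ranks (tied values receive the mean of their ranks), and Pearson’s correlation is then computed on those ranks. It assumes, hypothetically, that each observation pairs the two coders’ counts for one coded unit; the function names and example counts are illustrative, and the sketch does not reproduce the reported coefficient, which was computed on the study’s own data.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of equal length, n >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("rho is undefined when one coder's values are all tied")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical counts from two coders for the same six coded units.
coder_a = [3, 0, 5, 2, 2, 7]
coder_b = [4, 1, 5, 1, 2, 6]
print(round(spearman_rho(coder_a, coder_b), 3))
```

Computing Pearson’s r on average ranks, rather than using the shortcut formula based on squared rank differences, keeps the coefficient correct when coders produce tied counts, which is common with small behavioural frequencies.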

Results

To analyse whether the subjects’ behaviour differed between the four sessions, we conducted an ANOVA with condition and session as within-subject factors and species as a between-subject factor. For all species, session had no influence on the total number of behaviours in the respective condition. For the following analyses, we therefore pooled the data of all four trials of each condition.
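The pooling step can be illustrated as follows. This sketch assumes, hypothetically, that the coded data are stored as (subject, session, condition, modality, count) records and that pooling means summing counts across the four sessions for each subject, condition and modality, which is our reading of the caption of Table 3; the record layout, labels and function name are illustrative only.

```python
from collections import defaultdict


def pool_by_condition(records):
    """Sum signal counts across sessions for each subject, condition and modality.

    Each record is a hypothetical (subject, session, condition, modality, count)
    tuple; the session field is deliberately ignored so that the four trials of
    a condition (one per session) are pooled into a single total.
    """
    pooled = defaultdict(int)
    for subject, _session, condition, modality, count in records:
        pooled[(subject, condition, modality)] += count
    return dict(pooled)


# Hypothetical records for one subject (labels are illustrative only).
records = [
    ("ape01", 1, "Face/Front", "visual", 3),
    ("ape01", 2, "Face/Front", "visual", 2),
    ("ape01", 3, "Face/Front", "visual", 0),
    ("ape01", 4, "Face/Front", "visual", 4),
    ("ape01", 1, "No face/Back", "auditory", 1),
]
print(pool_by_condition(records))
```

Pooling of this kind is only defensible once the session factor has been shown to have no effect, as the ANOVA above indicates.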

Unimodal visual signals

First, we focused on the apes’ performance with regard to unimodal visual signals (see Table 3). These signals are especially interesting in the present context because their appropriate use presupposes an assessment of a potential recipient’s attentional state on the basis of its visual perspective. We conducted an ANOVA with body orientation (front vs. back) and face orientation (towards vs. turned away) as within-subject factors and species as a between-subject factor. It revealed a significant main effect of face orientation (F(1, 29) = 9.147, P = 0.005), with subjects producing more gestures when the experimenter’s face was oriented towards them than when it was turned away (see Fig. 2). Face orientation did not interact with body orientation (F(1, 29) = 0.621, P = 0.44) or with species (F(3, 29) = 2.427, P = 0.085). Body orientation had no main effect (F(1, 29) = 1.849, P = 0.184); however, it interacted with species (F(3, 29) = 5.593, P = 0.004). A post hoc comparison revealed that, overall, chimpanzees produced more unimodal visual signals when the front of the experimenter’s body was oriented towards them (M = 27.50, SD = 18.87) than when E presented his back (M = 18.94, SD = 17.28; Wilcoxon test, z = −3.466, P = 0.001). No other main effects or interactions were significant.
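The post hoc comparison above used a Wilcoxon signed-rank test; the following sketch illustrates its normal approximation for paired samples. It is an illustration of the technique only: the function names and example totals are hypothetical, no tie or continuity correction is applied to the variance (statistical packages typically apply one, so their z values may differ slightly), and the sign of z depends on which rank sum a package reports. Here z is based on the sum of positive ranks, so a positive z means that values in `x` tended to exceed the paired values in `y`.

```python
import math


def _average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank_z(x, y):
    """Wilcoxon signed-rank test for paired samples, normal approximation.

    Returns (w_plus, z): w_plus is the sum of the ranks of positive
    differences x - y. Zero differences are dropped before ranking, and no
    tie or continuity correction is applied.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (equal length)")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = _average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4                          # E[W+] under H0
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # SD of W+ under H0
    return w_plus, (w_plus - mean_w) / sd_w


# Hypothetical per-subject totals (front vs. back), for illustration only.
front = [30, 12, 25, 41, 8, 19]
back = [22, 12, 17, 35, 9, 11]
print(wilcoxon_signed_rank_z(front, back))
```

A paired, rank-based test suits this comparison because each subject contributes a score in both body orientations and the signal counts need not be normally distributed.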

Table 3 Mean number of signals of each modality produced in all trials of each respective condition (SD)

Fig. 2 Mean number of unimodal visual signals (SE) produced by subjects across conditions

Unimodal auditory and bimodal visual–auditory signals

Second, we analysed the use of unimodal auditory signals and bimodal visual–auditory signals (see Table 2) with regard to (1) their proportional use as initial signals (see Table 4 for mean values) and (2) their overall frequency (see Table 3). An ANOVA with body orientation (front vs. back) and face orientation (towards vs. turned away) as within-subject factors and species as a between-subject factor revealed no significant effect of any of the factors (F(1, 29); all P > 0.05) or of their interaction (F(3, 29); all P > 0.05). This was true for unimodal auditory signals (vocalizations) as well as for bimodal (visual–auditory) signals, and for both types of signals grouped together, regardless of whether their proportional use as initial signals or their overall frequency was analysed.

Table 4 Mean proportion of initially produced signals per condition (SD)

Discussion

The current study shows that for the bonobos, gorillas and orangutans, the experimenter’s face orientation was the only relevant factor in assessing his attentional state. This indicates that as soon as the experimenter is in a position to offer food, from whatever position, his body orientation no longer affects the subjects’ behaviour. The results for the chimpanzees, however, are slightly different. While the face was also the major factor for them, the experimenter’s body orientation also affected their behaviour. However, there was no interaction between the two factors. Contrary to the findings of Kaminski et al. (2004), which showed a hierarchical, bivariate relation between the two factors, in the current study face and body orientation influenced chimpanzees’ behaviour independently of each other.

At this point, we can only speculate about why chimpanzees behaved differently from the other great apes. Compared to other great ape species, chimpanzees have a more pronounced competitive group structure. Chimpanzees have a less egalitarian (e.g. Boesch et al. 2002) and more aggressive (Furuichi 1997) social system than bonobos. Most interesting in this context, chimpanzees are reported to have less relaxed feeding competition (e.g. Kano 1992), and Hare et al. (2007) demonstrated experimentally that they exhibit less social tolerance and are less cooperative during cofeeding than bonobos. Because chimpanzees potentially have to judge other group members’ attentional states more directly and quickly (e.g. to avoid punishment or to gain faster access to resources), it might be beneficial for them to use less precise but more quickly assessed features as initial indications of other individuals’ attentional states.

Nevertheless, our results clearly show that removing the restriction on the human’s ability to offer food eliminated the dominating effect of body orientation. This supports the hypothesis that body and face orientation are indeed interpreted hierarchically and that the apes’ attention to body orientation may mask their sensitivity to the experimenter’s attentional state. This explains the seemingly contradictory results of studies in which the experimenter’s body orientation varied as opposed to studies in which it did not.

That apes’ assessment of the experimenter’s ability to hand over food may mask their sensitivity to a human’s attention becomes apparent when we compare two recent studies. Hostetter et al. (2007) found that chimpanzees tested in the requesting paradigm not only use a human’s face as a relevant cue but also attend to the status of the eyes, distinguishing situations in which the experimenter’s eyes are closed from situations in which they are open. Kaminski et al. (2004), however, found no such effect, either for chimpanzees specifically or for the other great apes. The important difference between the two studies may be that Kaminski et al. (2004) placed the food on a table between the ape and the experimenter, whereas in the Hostetter et al. (2007) study, the food was offered directly from the experimenter’s hand. In line with our hypothesis, the apes’ assessment of the experimenter’s body orientation, and their evaluation of E’s ability to give them food, may have masked their sensitivity to the human’s attentional state in one study but not in the other. This hypothesis is also supported by the findings of Hattori et al. (2009), who found that capuchin monkeys are sensitive to the state of a human’s eyes in a requesting paradigm, but only when the desired food was offered directly from the human’s hand, not when it was placed on a table next to the experimenter.

Altogether, these findings might also indicate that apes have difficulty coping with triadic contexts in which they have to refer to a third entity (the food). That is, the recipient and the food have to be closely linked such that communication directed at the human automatically implies the delivery of food. In this sense, the communicative situation is reduced to a mainly dyadic context in which it is sufficient to attract the human’s attention to oneself, as the human’s behavioural repertoire is reduced to food delivery. Further studies will be necessary to investigate apes’ behaviour in triadic contexts, examining both their production and their comprehension of signals (see also Gomez 2005).

The present findings also demonstrate that all great ape species are able to differentiate a human’s attention on the basis of the orientation of the face. They thus suggest that the ability to assess other individuals’ attentional states via the orientation of the face might represent a skill already present in the last common ancestor of the great apes. This finding is especially interesting because great apes’ skills differ, particularly with regard to the more sophisticated cognitive skills of visual perspective taking. For example, compared to chimpanzees, bonobos and gorillas, orangutans are less skilled at using a human’s gaze as a referential cue (Okamoto-Barth et al. 2007). From the present study, we can conclude that this difference between the great ape species is not grounded in a general inability of orangutans to assess other individuals’ attentional states adequately. A further finding of the present study was that apes did not use bimodal visual–auditory or unimodal auditory signals differentially depending on the attentional state of the recipient; namely, they did not use them more frequently when the human was not attending visually. This is consistent with other studies using the food requesting paradigm (Kaminski et al. 2004; Liebal et al. 2004b) and with observations of interactions among conspecifics (for chimpanzees: Liebal et al. 2004a; for orangutans: Tempelmann and Liebal 2009; for gorillas: Genty et al. 2009). Chimpanzees and the other great apes use visual gestures preferentially when a conspecific or human is attending (e.g. Hostetter et al. 2001; Leavens et al. 2004, 2009); in the current study, however, they did not use unimodal auditory or bimodal signals to attract the human’s attention, contrary to what other authors have concluded from their findings (e.g. Hostetter et al. 2001, 2007; Leavens et al. 2004). We cannot say where these differences stem from. They may be due to the very large sample sizes used in those studies, which allowed the authors to detect significant behavioural differences that we could not identify.

In sum, by clearly signalling the human’s ability and motivation to offer food to the subjects, we eliminated the dominating influence of body orientation on the apes’ performance. As a consequence, the present study provides evidence that all great ape species judge a human’s attentional state on the basis of the face (and possibly even the eyes). Thus, the present study suggests that the ability to assess other individuals’ attentional states via the orientation of the face is a skill common to all great ape species, and it offers an explanation for the seemingly contradictory results of previous studies using this paradigm. Furthermore, it demonstrates the necessity of taking into account the triadic nature of the food requesting context.