Introduction

Whether nonhuman animals use gaze cues, alone or together with other body cues such as head orientation, as a window into others’ minds has been a key focus of comparative cognition research for the past few decades. Gaze-following, i.e., looking in the direction in which another individual’s eyes (and sometimes head and body) are oriented (Butterworth and Jarrett 1991; Call et al. 1998; Rosati and Hare 2009), is a basic skill and a prerequisite for more sophisticated forms of social cognition. In humans, gaze-following is reliably in place within the first few months of life (D’Entremont et al. 1997), and according to some, it is a key stepping stone towards language acquisition (Brooks and Meltzoff 2005). A wide range of animals have been shown to follow the gaze of a conspecific or a human (primates: Rosati and Hare 2009; non-primate mammals: Kaminski et al. 2005; Werhahn et al. 2016; birds: Watve et al. 2002; Bugnyar et al. 2004; Loretto et al. 2010; reptiles: Wilkinson et al. 2010; Simpson and O’Hara 2019).

Gaze-following can be driven by very different cognitive mechanisms (Butterworth and Jarrett 1991; Povinelli and Eddy 1996; Emery 2000). In its simplest form, gaze-following is a reflexive response triggered by an external stimulus (another individual’s gaze) without any understanding of the stimulus’s significance. The pervasiveness of gaze-following among vertebrates suggests that this reflexive response has an ancient origin. However, many species have shown a deeper understanding of gaze by flexibly exploiting observable gaze cues and even the unobservable psychological causes of gaze (Krupenye and Call 2019). Within the primate order, certain species of strepsirrhines (e.g., ring-tailed lemurs), platyrrhines (e.g., capuchin monkeys), cercopithecines (e.g., rhesus macaques), and hominids (e.g., bonobos and chimpanzees) have at least shown the capacity to flexibly exploit observable gaze cues (see review below). This raises the possibility that this higher-level understanding of gaze is a product of shared descent in primates (the continuity hypothesis). Alternatively, such a capacity could have evolved convergently in response to social complexity, because the species studied so far all tend to be highly social (the convergence hypothesis). The small apes are a key primate group for testing these two hypotheses, as (1) they are one of the least studied primate groups, (2) they are the apes most distantly related to humans, and (3) they live in relatively small social groups.

Below, we first review the major paradigms used to study what animals understand about seeing. We then summarize the current state of knowledge about gibbon social cognition.

What do nonhuman animals understand about seeing?

Many tasks have been developed to simulate social interactions that can tease apart the mechanisms underlying gaze-following. Based on the context of the social interaction, these tasks can be broadly categorized into three types: neutral, cooperative, and competitive.

A common type of neutral task, known as ‘geometric gaze-following’, extends the classic paradigm in which subjects follow gaze when a conspecific or a human demonstrator simply looks upward or aside into distant space. In these modified tasks, a subject encounters a distractor or a barrier as she turns her head to track the demonstrator’s line of sight. If the subject forms an expectation that the gaze is directed towards a target, she should look past the distractor or look around the barrier to locate the target (Povinelli and Eddy 1996; Tomasello et al. 1999). When this expectation is violated (e.g., the subject cannot locate a visible target), she may perform ‘double-looks’, looking back at the demonstrator and following the gaze a second time (Call et al. 1998). Evidence of geometric gaze-following has been documented in several major taxa, including great apes (Povinelli and Eddy 1996; Call et al. 1998; Tomasello et al. 1999; Bräuer et al. 2005; Okamoto-Barth et al. 2007; MacLean and Hare 2012), Old World monkeys (Scerif et al. 2004; Goossens et al. 2008), New World monkeys (Burkart and Heschl 2006; Amici et al. 2009), corvids (Bugnyar et al. 2004), and canids (Range and Virányi 2011; Met et al. 2014). Furthermore, gaze-following into distant space and geometric gaze-following seem to follow different developmental trajectories, suggesting that the mechanism underlying the latter is indeed more than a reflexive response (Schloegl et al. 2007; Range and Virányi 2011).

A classic cooperative task, known as the ‘object-choice task’, requires subjects to locate hidden food by following gaze and other social cues from an experimenter who sees or has seen where the food is (Anderson et al. 1995; Call et al. 1998). In theory, subjects should follow the cues if they understand what the experimenter can see (or has seen). In practice, subjects often receive a host of additional social cues (e.g., gestures), which confounds understanding gaze with understanding gestures (reviewed in Emery and Clayton 2009). In the few studies that examined gaze-only cues, performance was generally poor and inconsistent among primates (e.g., Anderson et al. 1995; Call et al. 1998; Burkart and Heschl 2006; Tan et al. 2014) and birds (Schloegl et al. 2008; von Bayern and Emery 2009). However, domestic cats and dogs, including puppies, are capable of using gaze-only cues in the object-choice task (Hare et al. 2002; Téglás et al. 2012; Pongrácz et al. 2018). This contrast has led to the idea that non-domesticated species struggle in the object-choice task because they fail to understand the cooperative communicative nature of the social cues (Hare 2001; but see Clark et al. 2019). Recently, new tasks have been developed to simulate a cooperative but non-communicative context (e.g., Grueneisen et al. 2017).

The most influential type of competitive task, known as the ‘food-competition task’, was developed to provide a more ecologically valid context (Hare et al. 2000, 2001). In the general setup, a conspecific or an experimenter competes with a subject for two pieces of food. Both pieces are visible to the subject, but only one can be seen by the competitor, who nonetheless has priority access to the food (by dominating the subject or by being able to take the food out of the subject’s reach). If the subject is sensitive to what the competitor can see, she should approach the piece that is invisible to the competitor.

When competing against a conspecific, a few primate species showed a preference for stealing the ‘invisible’ food (chimpanzees: Hare et al. 2000, 2001; Bräuer et al. 2007; Kaminski et al. 2008; long-tailed macaques: Overduin-de Vries et al. 2014; Tonkean macaques: Canteloup et al. 2016; tufted capuchins: Hare et al. 2003; common marmosets: Burkart and Heschl 2007). Chimpanzees and long-tailed macaques even passed trials in which it was impossible for them to simply react to the competitor’s behavior (i.e., ‘behavioral reading’), suggesting that their preference for the invisible food was based on an inference about the competitor’s unobservable perceptual state (for comparable evidence in corvids, see Clayton et al. 2007; Bugnyar 2011; Bugnyar et al. 2016).

Food-competition tasks with humans as competitors offer better control over the competitor’s behavior. This advantage has allowed researchers to pinpoint which kinds of information about others’ perceptual states subjects are able to exploit. Hare and colleagues (2006) showed that chimpanzees stole food from a human competitor when her body, head, and eyes were all facing away (i.e., body + head + eye) and when only her head and eyes were turned away (i.e., head + eye). Rhesus macaques are capable of using not only the competitor’s body and head orientations, but also eye orientation alone (Flombaum and Santos 2005). Some lemur species are able to exploit head + eye cues, but not the eye-only cue (Sandel et al. 2011; MacLean et al. 2013; Bray et al. 2014). Golden snub-nosed monkeys, however, did not take advantage of any of these cues (Tan et al. 2014). Finally, high sensitivity to the eye-only cue has also been found in dogs (Call et al. 2003), jackdaws (von Bayern and Emery 2009), and starlings (Carter et al. 2008).

Taken together, there is evidence that an increasing number of primate species are able to flexibly exploit observable gaze cues (and, in some cases, even the unobservable visual perception of others) in neutral, cooperative, or competitive contexts. This suggests that such a capacity might have evolved through shared descent within the primate order. To examine this continuity hypothesis, it is necessary to test the small apes.

What do we know about gibbon social cognition?

In general, gibbons’ socio-cognitive abilities remain largely unexplored (Cunningham and Mootnick 2009; Liebal 2016). However, the family Hylobatidae holds special value for understanding the evolution of social cognition. Gibbons are not only the least studied apes, but also the apes most distantly related to humans. From a phylogenetic perspective, studying gibbons is therefore critical for determining whether cognitive capacities demonstrated across great apes evolved even earlier, or whether they evolved uniquely in the great ape lineage (e.g., Amici et al. 2010). Meanwhile, the present socio-ecology of gibbons seems to predict limited socio-cognitive skills. Even though gibbons are able to form multi-male or multi-female groups (Malone et al. 2012; Hu et al. 2018), the size of these groups is limited: gibbons mainly live in two-adult groups with 3–6 individuals in total (Fuentes 2000), although monogamy is not obligate (Sommer and Reichard 2000). Furthermore, gibbons’ high level of arboreality might have forced them to rely on larger cues, such as head or body orientation, which are easier to spot through dense vegetation (Rosati and Hare 2009). As a result, gibbons might not have evolved much sensitivity to others’ perceptual states in the visual domain.

A few studies have investigated this question using geometric gaze-following and object-choice tasks. In neutral contexts, four gibbon species (Hylobates moloch, H. pileatus, H. lar, and Symphalangus syndactylus) were found to follow others’ gaze into distant space (Horton and Caldwell 2006; Yocom 2010; Liebal and Kaminski 2012). Except for two pileated gibbons, one of which was hand-raised by humans and thus more likely to have been exposed to human communicative cues during her development (Horton and Caldwell 2006), there remains no evidence for geometric gaze-following such as ‘double-looks’ (Liebal and Kaminski 2012) or gaze-following around barriers (Yocom 2010). In cooperative contexts, Inoue and colleagues (2004) first reported that one enculturated white-handed gibbon was able to follow not only body + head + eye-orientation cues, but also the eye-only cue. A recent study using the object-choice task tested a larger, non-enculturated sample of six gibbon species: H. muelleri, H. gabriellae, H. lar, Nomascus leucogenys, N. siki, and S. syndactylus (N = 11). The authors concluded that their subjects were able to locate hidden food by following body + head + eye-orientation cues from a human, having done so in 150 of 264 trials (p < 0.05, binomial test). However, this analysis was problematic, because all trials were treated as if they had been performed by a single subject. A re-analysis of their results (Table 1; Caspar et al. 2018) shows that, as a group, these gibbons did not follow the body + head + eye-orientation cues (t = 1.73, n = 11, p = 0.11, one-sample t test).
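The difference between the two analyses can be illustrated with a small Python sketch (standard library only; the helper names are hypothetical). It computes the exact one-tailed binomial p-value for the pooled count of 150 of 264 trials, which treats every trial as an independent observation. It also computes the two-tailed p-value for the per-subject one-sample t test reported above (t = 1.73 with n - 1 = 10 degrees of freedom). The t tail uses the closed-form series available for an even number of degrees of freedom. That series is only a convenience for avoiding external libraries, not a method used in either study.

```python
import math

def binom_upper_tail(k, n):
    """Exact one-tailed P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n

def t_two_tailed_p_even_df(t, df):
    """Two-tailed p-value of Student's t for an even number of degrees of
    freedom, using P(|T| < t) = sin(theta) * (1 + sum_k c_k cos(theta)^(2k))."""
    if df % 2 != 0:
        raise ValueError("this closed form only covers even df")
    theta = math.atan(abs(t) / math.sqrt(df))
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        # c_k = (1*3*...*(2k-1)) / (2*4*...*(2k)), built up incrementally
        term *= (2 * k - 1) / (2 * k) * math.cos(theta) ** 2
        total += term
    return 1.0 - math.sin(theta) * total

# Pooled analysis: all 264 trials treated as independent draws.
pooled_p = binom_upper_tail(150, 264)
# Per-subject analysis: one proportion per gibbon, so df = 11 - 1 = 10.
per_subject_p = t_two_tailed_p_even_df(1.73, 10)
print(f"pooled binomial p = {pooled_p:.3f}; per-subject t-test p = {per_subject_p:.3f}")
```

The pooled p-value falls well below 0.05, but the per-subject test does not. The pooled test overstates the evidence because it counts repeated trials from the same gibbon as independent observations.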

Table 1 Summary of individual performance in each experiment

Taken together, there remains no clear evidence that, in neutral and cooperative contexts, gibbons’ gaze-following goes beyond a reflexive response. Yet, they have never been tested in a competitive context. Here, we examined whether two gibbon species (Hoolock leuconedys and H. moloch) could steal food invisible to a human competitor using the competitor’s body-, head-, and eye-orientation cues (adapted from Flombaum and Santos 2005; Sandel et al. 2011). Subjects approached two food pieces with the competitor standing in between. In experiment 1, the competitor turned his body, head, and eyes sideways, so his profile was facing the subjects (i.e., the food piece in front of his body was being seen and thus contested, while the one behind him was uncontested). In experiment 2, the competitor’s body was facing the subjects, but he always turned his head towards the contested piece and his eyes were either open or closed depending on conditions.

The continuity hypothesis predicts that gibbons should show sensitivity to observable gaze cues. Specifically, subjects should prefer the uncontested food if they are sensitive to body-orientation (experiment 1) and head-orientation cues (experiment 2). Furthermore, if they rely on eye-orientation cues, subjects in experiment 2 should also choose the uncontested food more often when the competitor’s eyes are open than when they are closed.

In contrast, the convergence hypothesis predicts that, due to the relatively low level of sociality (and high level of arboreality) in gibbons, they should not have evolved such sensitivity. Instead, gaze-following in gibbons should remain a low-level stimulus–response process that is widely shared among vertebrates. Specifically, subjects should not show a preference for the uncontested food. If anything, they should instead prefer the contested food due to stimulus enhancement from the competitor’s gaze.

Experiment 1: body orientation

Methods

Subjects

Seven eastern hoolock gibbons (H. leuconedys) and three silvery gibbons (H. moloch) participated in the current study (total N = 10, 6M:4F, age in years = 15.97 ± 10.83). One silvery gibbon, Oula, stopped participating during the test, but we included her in some of our analyses.

All subjects were housed at the Gibbon Conservation Center (GCC) in Santa Clarita (CA, USA). They lived in pairs or small groups (except for Winston, who was housed alone). Subjects were tested alone in a compartment of their enclosure. Participation was voluntary, as subjects could quit testing by refusing to pick up the food we used to center them (see below for criteria). Subjects were fed multiple small meals per day. Testing took place between meals to ensure motivation. Water was available ad libitum. Subjects’ groupmate(s) were housed in a secondary compartment away from where testing took place to minimize potential interference.

Setup and materials

Two experimenters were involved: one (E1) played the competitor on one side of the enclosure, and the other (E2) centered the subject on the opposite side of the enclosure. By centering the apes at the beginning of the trial, we minimized side biases.

Two testing tables (55 × 55 × 120 cm), each with a sliding tabletop, were placed in front of the mesh approximately 1 m apart from each other. When the tabletops were retracted, they were out of reach of the subjects (approximately 1 m from the mesh). When the tabletops were slid towards the mesh, they came within reach (approximately 30 cm from the mesh).

We used slices of fresh bananas as food rewards and placed them directly on the tabletops (one slice per table). See an illustration of the setup in Figures S1 and S2 in the supplementary materials.

Procedure

The experiment consisted of three phases: introduction, pretest, and test.

Introduction

This phase introduced the testing tables to the subjects. E1 baited each tabletop with a banana slice. Meanwhile, E2 centered the subject by placing a small piece of banana on the floor just outside the mesh on the opposite side. A trial started when the subject picked up that piece and E1 simultaneously pushed both tabletops to the mesh. E1 then quickly left the testing area, walking 2 m away from the mesh, and faced away from the subject until the trial ended. The trial ended either (1) when the subject reached through the mesh to retrieve the food on one table or (2) when no attempt was made within 2 min. In the former case, the subject was allowed to pick up the food on the other table. In the latter case, the trial was repeated.

To proceed to the next phase, subjects needed to retrieve the food successfully in three consecutive trials. Subjects were deemed to have lost motivation when a trial was repeated three consecutive times or when they twice in a row refused to pick up the food piece from E2 within 5 min. In this case, testing resumed on a later day, and the count toward the passing criterion was restarted.

Pretest

This phase established a competitive relationship between E1 and the subject. The setup was identical to that of the introduction phase, but E1 behaved differently.

After baiting and pushing forward both tables, E1 remained behind one of the two tables facing the subject, placed his hand on the extended tabletop, and looked into the testing room. The table with E1 behind it was referred to as the contested table, and the other one as the uncontested table. We counterbalanced which table was contested and uncontested across trials. The subject was given 2 min to reach for one table.

If the subject reached for the contested table, E1 quickly retracted the table and removed the food piece. The subject was then allowed to freely explore and feed on the uncontested table until the end of the 2-min trial (while E1 remained still behind the contested table).

If the subject reached for the uncontested table, she could freely take the food from that table, and then, she was allowed to approach the contested table (and to be denied by E1) before the trial ended at the 2-min mark.

A subject had to choose the uncontested table first in seven of eight consecutive trials to proceed to the next phase (p = 0.035, one-tailed binomial test). We followed the same criteria as in the introduction phase to repeat a trial and to pause a session. See Table S1 for the total number of trials each individual needed to pass each criterion.
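Whether a criterion such as seven of eight clears p < 0.05 depends on whether the binomial test is one- or two-tailed. The Python sketch below (standard library only; the function name is hypothetical) assumes chance performance of 0.5 and computes both values exactly:

```python
import math

def binomial_tails(k, n):
    """Exact one-tailed P(X >= k) and two-tailed p for X ~ Binomial(n, 0.5).

    The two-tailed value doubles the smaller tail, which matches the usual
    definition when the null distribution is symmetric (chance = 0.5).
    """
    upper = sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n
    lower = sum(math.comb(n, i) for i in range(0, k + 1)) / 2**n
    return upper, min(1.0, 2 * min(upper, lower))

one_tailed, two_tailed = binomial_tails(7, 8)
print(f"7 of 8: one-tailed p = {one_tailed:.4f}, two-tailed p = {two_tailed:.4f}")
```

Seven of eight gives 9/256 ≈ 0.035 one-tailed and ≈ 0.070 two-tailed. The p < 0.05 threshold therefore holds only for a one-tailed test, which is appropriate here because the predicted direction (uncontested table) is specified in advance.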

Test

This phase required the subject to choose between the contested and the uncontested table. In this phase, E1 positioned himself in a novel way that the subject had not experienced in the introduction or pretest phase.

As shown in Fig. 1a, E1 now stood in the space between the two tables and turned 90° sideways. The table E1 was facing became the contested table, while the one behind E1 was the uncontested table. E1 used one hand to hold the edge of the sliding tabletop of the contested table to retract it if necessary.

Fig. 1

Setups of the competitor in experiment 1 (a), the eye-open condition (b), and the eye-closed condition (c) of experiment 2. Circles with bold and dashed outlines indicate the contested and the uncontested piece of food, respectively. Results of both experiments (d) show that subjects chose the uncontested piece above chance (dashed line).

If the subject chose the uncontested table, she was allowed to eat the food; E1 immediately removed the food on the contested table, and the trial ended. If the subject chose the contested table, E1 immediately removed that piece of food. Then, with a 50% chance, E1 also removed the uncontested piece of food; otherwise, E1 turned his back completely on the subject, allowing her to feed on the uncontested piece. This was designed to maintain subjects’ motivation (following Sandel et al. 2011).

Each subject completed 16 trials (except for Oula who stopped participating after five trials, and Win Bo who received 17 trials due to experimenter error). The relative location of the two tables was counterbalanced within and between subjects. Within a session, the experimenter did not face the same table for more than two consecutive trials. We followed the same criteria as pretest to repeat a trial or to pause a session. If a session was aborted, the subject would resume the same trial on another testing day.
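A schedule that satisfies both constraints (balanced sides and no side used more than twice in a row) can be generated by rejection sampling. The Python sketch below is a hypothetical illustration, not the authors' procedure. It assumes 16 trials split evenly between the left ('L') and right ('R') tables, shuffles that list, and redraws until no side occurs more than twice in a row:

```python
import random

def balanced_schedule(n_trials=16, max_run=2, seed=None):
    """Return a list of 'L'/'R' with equal counts and no run longer than max_run.

    Rejection sampling: shuffle a balanced list until the longest run of
    identical sides is short enough.
    """
    if n_trials % 2:
        raise ValueError("n_trials must be even to balance the two sides")
    rng = random.Random(seed)
    sides = ["L"] * (n_trials // 2) + ["R"] * (n_trials // 2)
    while True:
        rng.shuffle(sides)
        run = longest = 1
        for prev, cur in zip(sides, sides[1:]):
            run = run + 1 if cur == prev else 1
            longest = max(longest, run)
        if longest <= max_run:
            return list(sides)

schedule = balanced_schedule(seed=1)
print("".join(schedule))
```

Rejection sampling keeps every admissible sequence equally likely. With 16 trials, a valid draw comes up often enough that the loop finishes quickly.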

Coding and analysis

A choice was recorded when the subject reached her hand through the mesh in front of a table. All trials were coded live by a third experimenter, who also filmed the sessions. A randomly selected sub-sample of 18% of trials was coded by a second coder. Reliability was excellent for both the location of the contested table (Cohen’s Kappa = 1) and the subjects’ choices (Cohen’s Kappa = 1).
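Cohen's kappa corrects raw agreement between two coders for the agreement expected by chance from each coder's marginal frequencies: kappa = (p_o - p_e) / (1 - p_e). A minimal Python sketch follows (standard library only). The function name and the 'U'/'C' labels for uncontested/contested choices are hypothetical.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' categorical labels on the same trials."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of the two coders' marginal proportions,
    # summed over categories.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)

print(cohens_kappa(["U", "U", "C", "C"], ["U", "C", "C", "C"]))
```

In the usage line, the coders agree on three of four trials (p_o = 0.75). Their marginals imply p_e = 0.5, so kappa = 0.5. Identical codings give kappa = 1, as reported above.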

We used a Wilcoxon signed-rank test to assess whether subjects chose the uncontested table significantly above chance. We then ran a trial-by-trial analysis using a generalized linear mixed model (GLMM) with the outcome of each trial as a binary response (contested vs. uncontested) following a binomial distribution (via the ‘lme4’ package, Bates et al. 2015). The full model included subject sex (center coded; likewise in experiment 2) and trial number (z-transformed; likewise in experiment 2) as fixed predictors and subject identity as a random predictor. The null model included only the intercept and the random predictor. We first compared the full model to the null model using a likelihood ratio test (LRT, via the ‘car’ package, Fox and Weisberg 2011). We then obtained the coefficients of the final model using Wald z tests. Finally, we used a binomial test to examine subjects’ choices in the first trial.
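The predictor transformations mentioned above can be sketched as follows. This is an illustrative Python version (standard library only, function names hypothetical), not the authors' R code. Sex is dummy-coded and then mean-centered. Trial number is standardized using the sample SD, as R's `scale()` does. Centering both predictors means the GLMM intercept refers to an average subject at an average trial, which is what lets the intercept be read as an overall preference.

```python
import statistics

def center_code(labels, level):
    """Dummy-code `level` as 1 and all other labels as 0, then subtract the
    column mean so that the coded predictor averages zero."""
    dummy = [1.0 if label == level else 0.0 for label in labels]
    mean = statistics.fmean(dummy)
    return [d - mean for d in dummy]

def z_transform(values):
    """Standardize values to mean 0 and sample SD 1 (n - 1 denominator)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Illustration: the 6 males and 4 females of experiment 1 (one entry per
# subject for simplicity) and trial numbers 1-16.
sex_centered = center_code(["M"] * 6 + ["F"] * 4, level="M")
trial_z = z_transform(list(range(1, 17)))
print(sex_centered[0], sex_centered[-1])  # coded values for a male and a female
```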

If subjects were sensitive to the orientation of the competitor’s body (or head or eyes), they should show a preference for the uncontested table (i.e., a significant intercept term in GLMM).

Results

Subjects chose the uncontested table in 84.2% ± 11.7% (mean ± SD) of test trials (Fig. 1d; see Table 1 for individual performance). Gibbons as a group preferred the uncontested table significantly above chance (V = 55, p = 0.006; Wilcoxon signed-rank test).
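The sketch below implements a one-sample Wilcoxon signed-rank test of per-subject proportions against 0.5 in Python (standard library only; function name hypothetical). It uses the normal approximation with tie and continuity corrections, which is what R's `wilcox.test` falls back on when ties are present. Whether the authors used exactly this variant is an assumption, although it appears consistent with the reported p-values. The per-subject proportions in the usage line are hypothetical. V is the sum of the ranks of the positive differences, so with ten subjects its maximum is 10 × 11 / 2 = 55. The reported V = 55 therefore implies that every subject chose the uncontested table in more than half of their trials.

```python
import math

def wilcoxon_signed_rank(values, mu=0.5):
    """One-sample Wilcoxon signed-rank test of `values` against `mu`.

    Returns (V, two-sided p). Zero differences are dropped; p uses the normal
    approximation with tie and continuity corrections.
    """
    diffs = [v - mu for v in values if v != mu]
    n = len(diffs)
    if n == 0:
        raise ValueError("all values equal mu")
    # Rank |differences| from smallest to largest, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_correction = 0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for j in range(start, end + 1):
            ranks[order[j]] = average_rank
        t = end - start + 1
        tie_correction += t**3 - t
        start = end + 1
    v = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_correction / 48
    z = (abs(v - mean) - 0.5) / math.sqrt(var)
    p = min(1.0, math.erfc(z / math.sqrt(2)))  # 2 * (1 - Phi(z))
    return v, p

# Hypothetical per-subject proportions of uncontested choices (n = 10).
hypothetical = [0.75, 0.8125, 0.875, 0.9375, 1.0, 0.8125, 0.75, 0.875, 0.6, 0.9375]
v, p = wilcoxon_signed_rank(hypothetical)
print(v, round(p, 4))
```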

Model comparison revealed that the full model did not differ significantly from the null model (χ2 = 1.947, df = 2, p = 0.378). Thus, subject sex and trial number did not collectively influence subjects’ choices, and the null model was retained as the final model. The intercept of the final model was significant (z = 7.714, p < 0.001); its coefficient (on the log-odds scale) was 1.82, meaning that, overall, the odds of choosing the uncontested table were roughly six times the odds of choosing the contested one. Finally, eight of ten subjects took food from the uncontested table in the first trial (p = 0.055, one-tailed binomial test).
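Because the GLMM is fitted on the logit scale, its intercept is a log-odds. The Python sketch below (standard library only; the helper name is hypothetical) converts the reported intercepts into odds. It also gives the model-implied probability of choosing the uncontested table for a typical subject, i.e., one whose random effect is zero; this is a subject-specific probability, not a population average. For experiment 1, this gives odds of about 6.2 and a probability of about 0.86, close to the observed 84.2%. For the experiment 2 intercept (0.678), the odds are about 2.0 and the probability about 0.66, between the two observed condition means.

```python
import math

def intercept_to_odds_and_prob(beta0):
    """Convert a logit-scale intercept into odds and a probability."""
    odds = math.exp(beta0)
    prob = odds / (1.0 + odds)  # inverse logit
    return odds, prob

# Intercepts reported for experiment 1 (1.82) and experiment 2 (0.678).
for label, beta0 in [("experiment 1", 1.82), ("experiment 2", 0.678)]:
    odds, prob = intercept_to_odds_and_prob(beta0)
    print(f"{label}: odds = {odds:.2f}, P(uncontested) = {prob:.3f}")
```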

Discussion

Our subjects demonstrated a strong preference for the uncontested table, i.e., they relied on the body + head + eye-orientation cues to steal food from a competitor. This preference is similar to those observed in chimpanzees, rhesus macaques, and ring-tailed lemurs (Flombaum and Santos 2005; Hare et al. 2006; Sandel et al. 2011), supporting the continuity hypothesis for the evolution of primates’ sensitivity to observable cues.

Our results are inconsistent with the explanation that the subjects learned such a preference during the study, because (1) trial number had no effect on the subjects’ performance, and (2) they already showed the preference in the very first trial. It is also unlikely that the subjects learned this preference during the pretest, as the competitor’s body was oriented differently in the pretest and test phases. Nevertheless, we cannot exclude the possibility that gibbons had experienced cues such as body orientation prior to our study and/or used associative cues to determine the location of the uncontested table.

However, because the competitor gave a combination of body-, head-, and eye-orientation cues during the test, it remains unclear exactly which cue(s) the subjects used. Of particular interest is whether they were sensitive to head- or even eye-orientation cues, which are subtler but also more accurate than body orientation. Furthermore, to ensure that he could retract the tabletop in time if necessary, the competitor touched the contested table (but not the uncontested one), making local enhancement an alternative explanation for our results.

In the next experiment, we manipulated the competitor’s head and eye orientations while controlling for body orientation and local enhancement.

Experiment 2: head- and eye-orientation cues

In experiment 2, subjects could not base their choice on (1) the body orientation of the competitor, as his body always faced forward, or (2) local enhancement, as he touched both tables. The competitor turned his head towards the contested table; if subjects are sensitive to head-orientation cues, they should prefer the uncontested table. The competitor opened his eyes in the eye-open condition but closed them in the eye-closed condition; if subjects are sensitive to eye-orientation cues, they should show a stronger preference for the uncontested table in the former condition.

Methods

Subjects and setup

All subjects in experiment 1 except Oula participated in the current study (N = 9, 6M:3F). Oula was not willing to be separated during the study. The experimental setup was the same as in experiment 1.

Procedure

The current study had two phases: pretest and test. No introduction phase was run as all subjects had participated in experiment 1 recently.

Pretest

This phase was identical to that of experiment 1, with one exception. Because subjects had already passed the pretest in experiment 1, we loosened the passing criterion: subjects needed to choose the uncontested table either in five consecutive trials (p = 0.031, one-tailed binomial test) or in six of seven trials (p = 0.0625).
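The chance probabilities behind the two alternative criteria can be checked exactly with Python's `fractions` module (standard library only; the helper name is hypothetical). The sketch assumes each trial is an independent 50:50 choice. It computes the probability that a particular block of five trials is all uncontested, and the one-tailed probability of six or more uncontested choices in seven trials:

```python
from fractions import Fraction
from math import comb

def upper_tail_exact(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 1/2), returned as a fraction."""
    return Fraction(sum(comb(n, i) for i in range(k, n + 1)), 2**n)

# Criterion A: one particular block of five trials, all uncontested.
p_five_consecutive = Fraction(1, 2**5)
# Criterion B: at least six uncontested choices in seven trials.
p_six_of_seven = upper_tail_exact(6, 7)

for name, p in [("5 consecutive", p_five_consecutive), ("6 of 7", p_six_of_seven)]:
    print(f"{name}: p = {p} = {float(p):.4f}")
```

By this calculation, five consecutive choices give 1/32 ≈ 0.031, while six of seven gives 8/128 = 0.0625, slightly above the conventional 0.05 threshold. The 1/32 figure also applies only to a single pre-specified block. If a run of five is counted wherever it occurs in a longer sequence, the chance probability is higher.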

Test

E1 stood between the two tables with his body facing the testing room, holding and operating one sliding tabletop with each hand. E1 turned his head sideways towards one of the tables; that table was the contested table and the other was the uncontested table.

In the eye-open condition (Fig. 1b), E1 opened his eyes and looked at the contested piece of food (i.e., both head- and eye-orientation cues were available), and he reacted to the subject’s choice in the same way as in experiment 1. In the eye-closed condition (Fig. 1c), E1 closed his eyes while keeping his head turned towards the contested piece (i.e., only the head-orientation cue was available), and he remained still until the end of the trial (i.e., for 2 min), regardless of whether and which table the subject chose.

Subjects received 16 test trials per condition (32 in total). Subjects finished the two conditions on separate days. Which condition was administered first was counterbalanced between subjects.

Coding and analysis

We coded choices the same way as in experiment 1. A randomly selected sub-sample of 19% of trials was coded by a second coder. Reliability was very high for the location of the contested table (Cohen’s Kappa = 1) and the subjects’ choices (Cohen’s Kappa = 0.96).

We used Wilcoxon signed-rank tests to examine whether gibbons preferred the uncontested table significantly above chance. We constructed a full GLMM including subject sex, trial number, condition, condition order, a condition × condition order interaction, and a condition × trial number interaction as fixed predictors and subject identity as a random predictor. We compared the full model to a null model with only the intercept and the random predictor. We then obtained model coefficients using Wald z tests. Finally, we used binomial tests to examine first-trial performance.

To further compare subjects’ performance between experiments 1 and 2, we built another GLMM using the combined data from the two experiments. This combined model had subject sex, trial number, condition, and a condition × trial number interaction as fixed predictors and subject identity as a random predictor. Note that ‘condition’ became a three-level variable: body + head + eye cue (experiment 1), head + eye cue, and head-only cue (both from experiment 2). We first compared this model to a null model (with only the intercept and the random predictor). We then used LRTs to examine the overall effect of each fixed predictor. Because the effect of ‘condition’ was significant, we ran post hoc pairwise comparisons between its levels with Tukey correction (via the ‘emmeans’ package, Lenth 2016).

If subjects relied on head-orientation cues, they should show a preference for the uncontested table in both conditions. If they also relied on eye-orientation cues, they should show a stronger preference for the uncontested table in the head + eye condition than in the head-only condition. If they relied on body- and head-orientation cues independently, they should show a stronger preference for the uncontested table in experiment 1 than in either condition of experiment 2.

Results

Subjects chose the uncontested table in 70.1% ± 12.8% (mean ± SD) of trials in the eye-open condition and in 62.5% ± 7.2% of trials in the eye-closed condition (Fig. 1d; see Table 1 for individual responses).

For the data of experiment 2 alone, Wilcoxon signed-rank tests revealed that gibbons chose the uncontested table significantly above chance in both the eye-open and the eye-closed conditions (both conditions: V = 36, p = 0.014).

LRT revealed that the full model did not differ significantly from the null model (χ2 = 8.994, df = 6, p = 0.174), suggesting that the fixed predictors (condition, subject sex, trial number, and the interaction terms) did not collectively influence subjects’ choices between the two tables. The intercept of the null model was significant (z = 5.434, p < 0.001). The coefficient was 0.678, meaning that, regardless of condition, the odds of choosing the uncontested table were about twice the odds of choosing the contested one (e^0.678 ≈ 1.97).

One-tailed binomial tests revealed that seven of nine subjects chose the uncontested table in the first trial of the eye-open condition (p = 0.090), and six of nine did so in the eye-closed condition (p = 0.254).

When the data of both experiments were combined, the full model differed significantly from the null model (χ2 = 28.965, df = 6, p < 0.001). Condition and the intercept of the full model were significant (condition: χ2 = 20.051, df = 2, p < 0.001; intercept: χ2 = 78.068, df = 1, p < 0.001); no other predictor was significant. As predicted, post hoc Tukey tests showed that subjects’ preference for the uncontested table was stronger in experiment 1 than in either condition of experiment 2 (both p < 0.005).

Discussion

Our subjects showed a preference for the uncontested table in both conditions, indicating that they were sensitive to head-orientation cues in a competitive context. This level of sensitivity (62.5–70.1%) was comparable to that of other primate species tested in this paradigm, even though those species live in much larger social groups (chimpanzees, 66.7%, Hare et al. 2006; rhesus macaques, 72.7%, Flombaum and Santos 2005; ring-tailed lemurs, 63–69%, Sandel et al. 2011).

This result argues against local enhancement, because the competitor touched both tables. As in experiment 1, trial number had no effect on subjects’ choices, suggesting that the apes did not develop a preference over time. In addition, at least twice as many subjects chose the uncontested table as chose the contested table in the first trial. Although the binomial tests were not significant (likely due to low statistical power), the pattern was highly consistent with experiment 1. Our results therefore lend no support to the hypothesis that the subjects learned to use the head-orientation cue during testing.
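Why first-trial tests with nine or ten subjects have little power can be shown with a short Python sketch (standard library only; function names hypothetical). It assumes subjects choose independently. It finds the smallest number of uncontested choices that reaches one-tailed p < 0.05 against chance. It then computes how often that number would be reached if each subject truly chose the uncontested table with probability `p_true`; the value 0.70 is an assumption chosen for illustration, close to the eye-open group mean.

```python
import math

def binom_pmf(i, n, p):
    """Probability of exactly i successes in n Bernoulli(p) trials."""
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)

def critical_count(n, alpha=0.05):
    """Smallest k with one-tailed P(X >= k | chance = 0.5) < alpha."""
    for k in range(n + 1):
        if sum(binom_pmf(i, n, 0.5) for i in range(k, n + 1)) < alpha:
            return k
    return None  # no count reaches significance with this n

def power(n, p_true, alpha=0.05):
    """Probability of reaching the critical count when each subject
    independently chooses the uncontested table with probability p_true."""
    k = critical_count(n, alpha)
    if k is None:
        return 0.0
    return sum(binom_pmf(i, n, p_true) for i in range(k, n + 1))

# Nine subjects, assuming (hypothetically) a 70% true first-trial preference.
print(critical_count(9), round(power(9, 0.70), 3))
```

Under these assumptions, at least eight of nine subjects would need to choose the uncontested table in the first trial (nine of ten with ten subjects). With a true 70% preference, that happens only about one time in five.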

Subjects did not seem to rely on the eye-orientation cue, since we observed no difference between the eye-open and eye-closed conditions. Notably, subjects did pay attention to the competitor’s head in the same trials (i.e., they chose the uncontested table in the majority of trials), suggesting that this lack of sensitivity was not due to low motivation for food or inattention to the human competitor. This pattern is consistent with findings that nonhuman primates generally fail to pick up the eye-orientation cue regardless of context (e.g., Tomasello et al. 2007; MacLean et al. 2013; Tan et al. 2014; but see Flombaum and Santos 2005). Nonetheless, it is possible that our head cue overshadowed gibbons’ reliance on eye cues alone. Future studies should disentangle head + eye cues from eye cues alone (e.g., with the head facing the subject while the eyes are directed at the contested table).

Finally, subjects seemed capable of distinguishing body orientation from head orientation, as their preference for the uncontested table was significantly stronger in experiment 1 than in experiment 2. This result might reflect an order effect (although for most subjects the two experiments were conducted several months apart). It is also important to point out that in experiment 1 the two cues were aligned. Because body orientation is easier to detect while head orientation is more accurate, future research should determine which cue subjects prioritize when the two conflict (e.g., Hare et al. 2006).

General discussion

In two food-competition tasks, eastern hoolock gibbons and silvery gibbons successfully exploited information about what a competitor could and could not see, supporting the continuity hypothesis that primates share similar socio-cognitive skills for detecting other agents’ body cues, including sensitivity to head orientation. Gibbons stole food that the competitor could not see when he turned away his body (experiment 1) and his head (experiment 2). However, they failed to use the subtlest yet most accurate visual cue, head + eye orientation (although head orientation could have masked gibbons’ attention to eye cues alone). Our results cannot be explained by local enhancement, as the competitor touched both tables in experiment 2. It is also unlikely that the observed sensitivity to body- and head-orientation cues was learned during testing, because (1) it did not increase over time in either experiment, and (2) a majority of subjects already showed it in the first trial.

Our results do not necessarily indicate that subjects were capable of visual perspective taking, i.e., that they understood that their own perspective could differ from that of others and, as a result, that they could see something others could not (Flavell et al. 1978). First, it is possible that subjects had already learned how to respond to body- and head-orientation cues before the study. Their responses to social cues could have been the product of associative learning mechanisms, reflecting only a limited and inflexible understanding of these cues. Future research should therefore present subjects with a variety of cues to others’ attentional states, some of which should be unfamiliar to them. This is important for disentangling whether individuals can flexibly adapt to different social cues or whether their success is a product of associative learning prior to the study. Importantly, such cues can be not only behavioral (as in the current study) but also contextual (e.g., whether a food piece is covered by an occluder, Hare et al. 2006; whether a food container is opaque or transparent, Melis et al. 2006). If subjects respond consistently and flexibly to these cues, it is then reasonable to conclude that their sensitivity is more than a result of associative learning (e.g., Flombaum and Santos 2005).

Second, the current design could not rule out the ‘behavioral reading’ hypothesis, according to which subjects respond to observable behavioral cues rather than to unobservable mental states (Penn and Povinelli 2007; Heyes 2015). Future studies should adopt variants of the food-competition task in which no observable cue is available (e.g., competition with a human, Karg et al. 2015a; competition with a conspecific, Kaminski et al. 2008).

Our results differ from the negative results of other studies (Yocom 2010; Liebal and Kaminski 2012; Caspar et al. 2018). The first possible reason for this discrepancy is a species difference: 70% of our sample were eastern hoolock gibbons (Hoolock leuconedys), a species not represented in the other studies. However, the current literature does not suggest that this species lives in more complex social groups than the gibbon species in those studies. For example, one proxy for social complexity in gibbons is the percentage of groups that contain more than two adults. Only 5.4% of eastern hoolock gibbon groups were observed to have more than two adults (Chetry et al. 2011; Fan et al. 2011), compared with 8.3%, 26.7%, and 45.5% for pileated gibbons (Hylobates pileatus), white-handed gibbons (H. lar), and siamangs (Symphalangus syndactylus), respectively (Srikosamatara and Brockelman 1987; Morino 2012; Reichard et al. 2013). Furthermore, the eastern hoolock gibbons in our sample performed qualitatively similarly to the silvery gibbons (although our sample size did not allow a quantitative comparison between species).
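The argument about group composition amounts to ranking four percentages on a single, admittedly coarse proxy. The short Python sketch below does this with the values quoted in this paragraph only; it adds no data beyond what the text reports, and the dictionary keys are simply common names chosen for illustration.

```python
# Percentage of groups containing more than two adults, as quoted in the text.
multi_adult_pct = {
    "eastern hoolock gibbon": 5.4,
    "pileated gibbon": 8.3,
    "white-handed gibbon": 26.7,
    "siamang": 45.5,
}

# Rank species from least to most multi-adult groups on this proxy.
ranked = sorted(multi_adult_pct.items(), key=lambda kv: kv[1])
for species, pct in ranked:
    print(f"{species:<24}{pct:>5.1f}%")

print("lowest on this proxy:", ranked[0][0])  # eastern hoolock gibbon
```

On this proxy the eastern hoolock gibbon ranks lowest, which is the basis for doubting that a species difference in social complexity explains the discrepancy.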

The second, and in our opinion more plausible, reason is methodological. While previous studies presented apes with neutral and cooperative scenarios, our study presented gibbons with a competitive situation. Our results add to the growing literature showing that competitive contexts can have high ecological validity for eliciting cognitive abilities in nonhuman animals (Hare and Tomasello 2004; Herrmann and Tomasello 2006; von Bayern and Emery 2009; Karg et al. 2015b). However, competitive contexts are not always necessary to elicit socio-cognitive skills (MacLean and Hare 2012; Grueneisen et al. 2017). It is possible that a lack of statistical power (N = 4) led to the negative result in the gaze-following-around-barrier task in Yocom (2010). Gibbons did not show any ‘double-looks’ in Liebal and Kaminski (2012), but this measure might be less sensitive than the gaze-following-around-barrier task. For example, capuchin and spider monkeys followed gaze around barriers but did not show ‘double-looks’ (Amici et al. 2009). In addition, using conspecifics might increase the salience of non-competitive contexts, as there is evidence that siamangs selectively direct gestures and facial expressions toward conspecific partners who are attending (Liebal et al. 2004).

Qualitatively, the sensitivity to body- and head-orientation cues shown by our subjects seems comparable to that observed in similar tasks with chimpanzees (Hare et al. 2006), rhesus macaques (Flombaum and Santos 2005), and ring-tailed lemurs (Sandel et al. 2011). Nevertheless, gibbon societies contrast with those of the primates previously tested in these paradigms. Although small apes sometimes form larger, multi-adult groups (Malone et al. 2012; Hu et al. 2018), they mainly live in small groups (mean group size 3–6 individuals) that include a breeding pair. These results thus seem to support the continuity hypothesis that primates in general share a fundamental sensitivity to other agents’ body and facial features as a product of shared descent. They also seem to challenge the convergence hypothesis, suggesting that sensitivity to others’ observable cues did not evolve solely as a product of increased sociality (at least in apes), and they raise intriguing questions about the ultimate function of such sensitivity. First, in lemurs, such sensitivity is correlated with mean group size (MacLean et al. 2013). Does the same relationship between group size and sensitivity to attentional cues apply to monkeys and apes? Second, does sensitivity to head orientation depend on group size per se, or rather on the overall amount of social interaction with conspecifics, regardless of whether it occurs with one partner or several? Third, if the latter is the case, are gibbons more social than previously thought? These questions should be addressed by quantitative comparisons between gibbons and other primate species, to shed further light on the relationship between socio-ecology and the evolution of primate cognition.