Introduction

Living in social groups can reinforce the development and expansion of cognitive and emotional abilities underlying social competence (Social Brain Hypothesis—Dunbar and Shultz 2007; Schyns et al. 2009). Communication is essential for survival, cohesion, and coordination of a group. Signal complexity, in both execution and recognition (Schyns et al. 2009), parallels the evolution of social systems (Schmidt and Cohn 2001; Parr et al. 2005; Freeberg et al. 2012). Species living in tolerant societies and/or in sizable groups show a greater variability and flexibility in the use of communicative signals than those living in despotic societies and/or in small groups (Maestripieri 1999; McComb and Semple 2005; Parr et al. 2005; Ciani et al. 2012).

Play is recognized as one of the most sophisticated forms of social communication, particularly in highly complex mammalian societies (Fagen 1981; Pellis and Pellis 1996). In these societies, behavioural mechanisms have evolved to keep the play session well balanced. When players strongly differ in body mass and hierarchical rank, these mechanisms limit escalations into aggression (e.g. self-handicapping and role reversal: Pereira and Preisser 1998; Thompson 1998; see Video ESM1). In these cases, communicative signals maintain playful motivation. It has been suggested that one function of play behaviour is to learn how to decode such communicative signals (Fagen 1993; Burghardt 2005). The most intriguing question is how playmates discriminate between playful and serious intentions when it is well known that during play sessions, animals use behavioural patterns belonging to other functional contexts (Pellis and Pellis 1996, 1997; Palagi 2008).

In primates, the relaxed open-mouth display, or play face, is the typical facial expression occurring during play (van Hooff and Preuschoft 2003). The open-mouth display was interpreted as a ritualized version of biting, common during play fighting (van Hooff and Preuschoft 2003; Palagi 2006). The play face can be associated with a pant-like vocalization (Provine 2000), considered to be homologous to human laughter (van Hooff 1972; Davila Ross et al. 2009). For many decades, facial expressions have been considered as linked to specific internal states, a concept already well expressed by Darwin (1872), who argued that facial expressions are the inevitable counterpart of felt emotions. Emotions can be deemed as a mechanism that leads to proper behavioural responses to a wide range of both internal and environmental stimuli (Parr et al. 2005). Primates regularly communicate their emotions through facial expressions and vocalizations (Ekman 1993, 1997; Parr et al. 2005), but gestures are mainly restricted to humans and apes, where they are very frequent during play (Liebal et al. 2006; Pollick and de Waal 2007; Genty et al. 2009; Hobaiter and Byrne 2011a). Gestures are conventionally classified as “intentional” signals (Leavens et al. 2005), because they are linked to less evolutionarily urgent functions and are produced voluntarily by the sender (Call and Tomasello 2007; Arbib et al. 2008). Two main criteria define intentional signals: they must be (1) used in social contexts (Leavens et al. 2005) and (2) influenced by the attentional status of the observer (Bakeman and Adamson 1986; O’Neill 1996). Eye contact, body orientation, response waiting, and persistence are all critical features that must be considered to support the intentional nature of a communicative signal (Cartmill and Byrne 2011). All of these characteristics were detected in the gestural communication of great apes (Call and Tomasello 1994; Hare et al. 2000; Hostetter et al. 2001; Pika et al. 2003; Liebal et al. 2004; Poss et al. 2006; Pollick and de Waal 2007; Leavens et al. 1996, 2004a, b).

It must be noted that the boundary between intentional and emotional communication is less clear-cut than previously thought. Sherwood et al. (2004, 2005) demonstrated the presence of two different neuro-anatomical routes determining the emission of facial expressions: an involuntary “emotional” path (through the facial nucleus in the pons of the brainstem) and a voluntary “intentional” path (through activity in the facial representation area of the motor cortex). Moreover, recent neuro-anatomical and neurological studies in human and non-human primates indicated the presence of a tight connection between the intentional and emotional communication systems (Cattaneo and Pavesi 2014), even though the degree to which they intermingle for the emission of a given signal is still not known.

Although gestural communication in the great apes has been studied for a long time (Call and Tomasello 2007), a universally accepted operational definition of gesture is still lacking. In fact, gestures can be narrowly defined as communicative movements of hands, feet, or limbs, but frequently also facial expressions, body postures, locomotor patterns, and head movements are included. An important feature that differentiates gestures from other signals is the broad flexibility of their use and their disentanglement from specific behavioural contexts (Pollick and de Waal 2007; de Waal and van Hooff 1981).

In the African great apes, most gestures occur during social play (Tomasello et al. 1994, 1997; Pika et al. 2005; Genty et al. 2009; Genty and Byrne 2010), especially between immature subjects. Juveniles show a greater variability in the gestural repertoire compared to younger and older subjects (Tomasello et al. 1989; Hobaiter and Byrne 2011b). This age distribution mirrors that linking the frequency of social play to the age of the subjects (Power 2000; Burghardt 2005; Pellis and Pellis 2009). Hence, playful activity can be considered as a sort of training ground where new gestures can be expressed and tested for their effectiveness. What still remains unexplored, however, is how gestures and facial expressions are produced during play and how they are modified in the course of ontogeny. Due to their highly tolerant nature (Hare et al. 2007, 2012; Wobber et al. 2010a, b; Rilling et al. 2012) and high propensity to play (Palagi 2006; Palagi and Cordoni 2012) even as adults (Palagi and Paoli 2007), bonobos (Pan paniscus) are a good model species to test hypotheses about the use of communicative signals during play (Palagi 2008). The aim of our research was to explore differences and similarities in the use of gestures and facial expressions by evaluating whether and how specific features of play modify the use of these two communicative systems, characterized by different degrees of emotionality and intentionality.

Prediction 1

Due to the interactive nature of communicative signals (de Waal 2003), we expect higher levels of both gestures and facial expressions during social than solitary play.

Prediction 2

Play fighting, the roughest and riskiest version of social play, can escalate and degenerate into an aggressive encounter (Pellis and Pellis 1996). Therefore, due to the fine-tuning and de-escalating function of signals during social play (de Waal 2003; Waller and Dunbar 2005), we predict that both gestures and facial expressions reach their peak frequencies when animals engage in play fighting compared to other kinds of social play.

Prediction 3

It has been reported that the number of players involved in a session influences the use of signals (Hayaki 1985; Palagi 2008, 2009). Gestures are considered as intentional signals, and therefore, they must be directed towards a specific receiver (Liebal et al. 2006; Pollick and de Waal 2007; Genty et al. 2009; Hobaiter and Byrne 2011b). On the other hand, playful facial expressions can also be the outcome of a positive emotional internal state linked to the rewarding nature of play (Fredrickson 2004; Panksepp and Panksepp 2013). We expect to find a dichotomy in the use of the two communicative systems depending on the number of players (Prediction 3a) and the attentional state of the receiver (Prediction 3b). Gestures should be more frequent in dyadic than in polyadic interactions and when the receiver can visually perceive them, whereas such differences should not be present when considering playful facial expressions.

Prediction 4

Some authors hypothesized that adults learn by experience to intentionally select the most effective gestures to maximize their communicative potential and to limit the redundancy of gestures that immature subjects adopt as a “fail-safe” strategy (Repertoire Tuning hypothesis—Hobaiter and Byrne 2011b). If the phenomenon of “Repertoire Tuning” is also present during playful social interactions (a non-serious and non-functional context), we expect to find a negative correlation between the age of the subjects and the frequency and repertoire size of gestures they performed (Prediction 4a).

Adults, while playing roughly, must restrain themselves to avoid unintentionally harming younger playmates (e.g. self-handicapping process, Pellis and Pellis 2009; Power 2000). In adults, this could limit the gratification normally gained during playful activities (Power 2000), thus leading to a decrease in play faces, if they are mainly driven by emotions. In this case, we expect to find a negative correlation between the frequency of playful facial expressions and the age of players. If playful facial expressions in adults are mainly linked to an intentional component, a positive correlation between these signals and the age of subjects should be found (Prediction 4b).

Methods

The colony and the data collection

Behavioural data were collected during three months of naturalistic observations (August–October 2009) on the bonobo colony hosted at the Apenheul Primate Park (Apeldoorn, The Netherlands). The group composition included 12 subjects (see Table 1 for details).

Table 1 Composition of the bonobo colony (Pan paniscus) housed at the Apenheul Primate Park in the period of data collection

The bonobo area consisted of multiple interconnected indoor enclosures of about 230 m² overall and an outdoor naturalistic island of about 5000 m², among which the animals could freely move after the first feeding session (9.00) until the last feeding session (17.30). Just before the last feeding session, bonobos were separated into two groups with variable composition to spend the night in the indoor facilities and they were reunited the next morning just before the first feeding session. During the observation hours, bonobos were fed four times (9.00, 12.45, 15.00, and 17.30) and most of the food was scattered on the ground. Water was available ad libitum, and several environmental enrichments were provided. No stereotypic or aberrant behaviour characterized the study group. Observations were made over a 7-h period, encompassing morning and afternoon, 6 days a week. Play session data were collected over 502 h of observation using the all occurrences sampling method (Altmann 1974). Play sessions were video-recorded to describe in detail the use of communicative signals: facial expressions and gestures.

Operational definitions and statistics

We focussed our attention on facial expressions typical of play: play face (PF) and full play face (FPF) (Figure ESM1). In the PF, the mouth is opened and only the lower teeth are exposed, whereas in the FPF also the upper teeth are visible (Palagi 2006). In this study, we could not evaluate the emission of the pant-like vocalization that can accompany the full play face, due to the distance between the observers and the bonobos in the outdoor enclosure and to the glass dividing wall in the indoor enclosure. Therefore, we considered only the visual component of the play faces.

For gestures, we adopted the ethogram published by Pollick and de Waal (2007; see the Figures ESM2 and ESM3 and the Video ESM2) and integrated it with two head movements that had been previously described as having a communicative function within the Pan genus: head nod (i.e. repeated back and forth movement of the head—Hobaiter and Byrne 2011a; see the Video ESM4) and head shake (i.e. repeated horizontal movement of the head from side to side—van Hooff 1973). A list with the description and classification of gestures used in this study is provided in Table ESM1. Since the purpose of this research was not to investigate the possible meaning of gestural sequences, we considered all gestures as if they were emitted singly (e.g. if a bonobo displayed a “reach out” and then immediately a “slap ground”, we considered them as two single gestures). We divided the gestures into three categories according to the sensory modality they rely on: (1) visual gestures, based solely on visual information; (2) auditory gestures, based on sound production, and (3) tactile gestures, based on establishing a body contact with the recipient (Call and Tomasello 2007).

We measured the attentional state of the receiver by considering the head orientation (see Fig. 1 for a graphic presentation of the criteria adopted). These criteria were necessary only for visual gestures and facial expressions. We conventionally considered auditory gestures as perceived when the sender and the receiver were at a maximum distance of two metres. Tactile gestures simply needed body contact between sender and receiver.

Fig. 1

Scheme illustrating the criteria used to evaluate the attentional state of the receiver in relation to visual gestures and facial expressions. When the sender was in front of the receiver (i.e. within the range of its stereoscopic view), we considered visual gestures and facial expressions as perceived. When the receiver was facing away from the sender, we considered visual gestures and facial expressions as not perceived. All the doubt cases linked to lateral views were discarded from the analyses
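As an illustration only, the perception criteria described above can be written as a small decision rule. The sketch below is not the authors' software: the `Signal` record, the function name, and the orientation labels ("frontal", "away", "lateral") are hypothetical names introduced here, and the only values taken from the text are the two-metre limit for auditory gestures and the exclusion of doubtful lateral views.

```python
from dataclasses import dataclass
from typing import Optional

# Two metres: the maximum sender-receiver distance stated in the text
# for treating an auditory gesture as perceived.
MAX_AUDITORY_DISTANCE_M = 2.0


@dataclass
class Signal:
    """Hypothetical record for one coded signal."""
    modality: str                        # "visual", "facial", "auditory" or "tactile"
    receiver_view: Optional[str] = None  # "frontal", "away" or "lateral" (assumed labels)
    distance_m: Optional[float] = None   # sender-receiver distance in metres
    body_contact: bool = False           # True if sender touches receiver


def is_perceived(signal: Signal) -> Optional[bool]:
    """Return True/False, or None when the case is discarded from analysis."""
    if signal.modality in ("visual", "facial"):
        if signal.receiver_view == "frontal":
            return True
        if signal.receiver_view == "away":
            return False
        # Doubtful lateral views are excluded rather than classified.
        return None
    if signal.modality == "auditory":
        return (signal.distance_m is not None
                and signal.distance_m <= MAX_AUDITORY_DISTANCE_M)
    if signal.modality == "tactile":
        return signal.body_contact
    raise ValueError(f"unknown modality: {signal.modality!r}")
```

Returning `None` for lateral views keeps discarded cases distinguishable from signals coded as not perceived, so they can be dropped before any frequency is computed.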

For the data collection, a rigorous and repeatable observation protocol was developed by E.P.

Before starting systematic data collection, the observer (E.D.) underwent a training period (90 h). During the training phase (the trainer was E.P.), the same focal animal was followed by the observer and the trainer simultaneously, and the data were then compared. Training was over when the observations matched in 95 % of cases and when Cohen’s kappa was >0.80. Kappa coefficients were computed to assess the agreement for play face and full play face, and the gestures listed in Table ESM1. During the video-analysis, this procedure was replicated at monthly intervals in order to control for the interobserver reliability for each behavioural item considered. Cohen’s kappa was never <0.80.
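For readers unfamiliar with the agreement index used here, the following minimal sketch computes Cohen's kappa for two observers who coded the same events. It shows the standard formula, not the authors' own procedure; the function name and the example codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical codes."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must code the same, non-empty set of events")
    n = len(rater_a)
    # Observed proportion of events on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for ten events (PF = play face, FPF = full play face).
trainer = ["PF", "PF", "FPF", "none", "PF", "FPF", "none", "none", "PF", "FPF"]
observer = ["PF", "PF", "FPF", "none", "FPF", "FPF", "none", "none", "PF", "FPF"]
kappa = cohens_kappa(trainer, observer)
```

In this invented example the raters agree on nine of ten events, which gives a kappa of about 0.85 and would therefore pass a 0.80 threshold.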

For solitary play, we considered both sessions involving the manipulation of objects and those characterized by acrobatic locomotor patterns, such as pirouettes, somersaults, jumps, runs, and twists (Palagi 2014). A solitary play session started when an individual performed the first play behavioural pattern. If the bout started again after a delay of 10 s, it was counted as a new play session.

A single social play session started when one partner invited to play or directed any playful behaviour towards a group member and ended when the playmates ceased their activities or one of them moved away. If the bout started again after a delay of 10 s, it was counted as a new play session. To be included in the analysis, each play session had to last at least 10 s.
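The session rules above can be sketched as a simple segmentation of time-stamped bouts. This is a hypothetical illustration rather than the authors' coding software. It assumes that bouts are given as (start, end) times in seconds, that a pause shorter than 10 s continues the same session, and that a pause of exactly 10 s opens a new one; the text does not state how the boundary case was handled.

```python
GAP_S = 10.0           # a pause of at least this length opens a new session
MIN_DURATION_S = 10.0  # minimum duration for a session to be analysed


def segment_sessions(bouts, gap_s=GAP_S, min_duration_s=MIN_DURATION_S):
    """Group (start, end) bouts into sessions and drop sessions that are too short."""
    sessions = []
    for start, end in sorted(bouts):
        if end < start:
            raise ValueError("a bout cannot end before it starts")
        if sessions and start - sessions[-1][1] < gap_s:
            # Pause shorter than the gap (assumption): extend the current session.
            prev_start, prev_end = sessions[-1]
            sessions[-1] = (prev_start, max(prev_end, end))
        else:
            sessions.append((start, end))
    # Keep only sessions that reach the minimum duration.
    return [(s, e) for s, e in sessions if e - s >= min_duration_s]


# Hypothetical bouts, in seconds from the start of a recording.
example = segment_sessions([(0, 8), (12, 20), (35, 39), (60, 75)])
```

With these invented bouts, the first two are merged into one 20-s session, the 4-s bout is discarded as too short, and the last bout forms a second session.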

When a social play session was characterized by the absence of any kind of physical contact, that session was considered as locomotor–rotational play (LR-play) (Wilson and Kleiman 1974; Burghardt 2005; Tacconi and Palagi 2009). Depending on the patterns included in the session, we distinguished two types of contact play (C-play): rough and gentle play. A playful contact session was defined as “rough” when it included play fighting (i.e. fast, vigorous, and reiterated behavioural patterns including stamping, brusque rushing, dragging, kicking, tumbling, biting, and slapping). All the other sessions not including the patterns previously reported were labelled as “gentle”, mainly characterized by grab gentle, gentle touching, tickling, and finger wrestling (van Hooff 1973).
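The classification just described amounts to a two-step rule, sketched below under stated assumptions. The pattern labels are taken from the list in the text, but the function name, the returned category strings, and the representation of a session as a collection of pattern labels are hypothetical.

```python
# Patterns listed in the text as defining play fighting.
ROUGH_PATTERNS = {
    "stamping", "brusque rushing", "dragging", "kicking",
    "tumbling", "biting", "slapping",
}


def classify_social_play(patterns, physical_contact):
    """Classify one social play session from its patterns and contact status."""
    if not physical_contact:
        # No physical contact at all: locomotor-rotational play.
        return "LR-play"
    if ROUGH_PATTERNS & set(patterns):
        # At least one play-fighting pattern makes a contact session rough.
        return "rough C-play"
    return "gentle C-play"
```

For example, `classify_social_play(["tickling", "biting"], True)` falls in the rough category, because a single play-fighting pattern is enough under this reading of the definition.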

For each play session, the number and the identity of the playmates were recorded, thus permitting the distinction between dyadic (two players involved) and polyadic (more than two players involved, see the Video ESM3) play sessions.

Nonparametric statistics were used because of the small sample size and deviation from normality (Siegel and Castellan 1988). The Wilcoxon matched-pairs signed-rank test and the Friedman test (with the relative post hoc Dunnett test) were used to assess differences between the frequency of gestures and facial expressions during the different types of playful interactions (solitary vs social; rough vs gentle vs locomotor; dyadic vs polyadic). The Wilcoxon matched-pairs signed-rank test was also employed to test whether the attentional state of the receiver influenced the emission of signals recruiting different sensory modalities. The Spearman test was applied to check for possible correlations between the age of the subjects and the signals used.
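To make the repeated-measures comparison concrete, the sketch below computes the Friedman chi-square statistic from a subjects-by-conditions table. It is a minimal illustration and not the statistical package used in the study: it applies the textbook formula without a correction for ties and does not compute the exact p value. The function names and the example data are hypothetical.

```python
def midranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square for a list of per-subject rows (one value per condition)."""
    n = len(rows)      # number of subjects
    k = len(rows[0])   # number of conditions
    if n == 0 or k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need a rectangular table with at least two conditions")
    rank_sums = [0.0] * k
    for row in rows:
        # Ranks are assigned within each subject, across conditions.
        for j, r in enumerate(midranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical frequencies for four subjects in three play types
# (rough, gentle, locomotor); these are not data from the study.
table = [
    [9.0, 6.0, 1.0],
    [7.0, 5.0, 2.0],
    [8.0, 4.0, 0.5],
    [6.0, 3.0, 1.5],
]
chi2 = friedman_statistic(table)
```

Because every invented subject orders the three conditions identically, the statistic reaches its maximum for this design, n(k − 1) = 8.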

We made use of exact tests according to the threshold values suggested by Mundry and Fisher (1998). The analyses were two tailed, and the level of significance was set at 5 %.
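With samples this small, an exact p value can be obtained by enumerating every possible assignment of signs to the ranked differences. The sketch below shows that idea for the two-tailed Wilcoxon matched-pairs signed-rank test. It is an illustration under stated assumptions, not the routine used by the authors: zero differences are dropped, tied absolute differences receive mid-ranks, and full enumeration is practical only for roughly twenty pairs or fewer. All names and data are hypothetical.

```python
from itertools import product


def midranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = average
        i = j + 1
    return ranks


def exact_wilcoxon(x, y):
    """Return (T, n, two-tailed exact p) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = midranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_obs = min(w_plus, total - w_plus)
    # Under the null hypothesis each difference is equally likely to be positive
    # or negative, so all 2**n sign patterns have the same probability.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= t_obs + 1e-12:
            extreme += 1
    return t_obs, n, extreme / 2 ** n


# Hypothetical signal frequencies for six subjects; not data from the study.
social = [5.0, 3.0, 8.0, 4.0, 9.0, 2.0]
solitary = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0]
t_stat, n_pairs, p_value = exact_wilcoxon(social, solitary)
```

In this invented example all six differences point the same way, so T = 0 and only two of the 64 sign patterns are as extreme, giving a two-tailed p of 2/64 ≈ 0.031.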

Results

We recorded 2,883 solitary and 1,250 social play sessions: 69 involved only adults, 717 involved adults and immatures, and 464 involved only immatures. Within the play sessions, 2,029 playful facial expressions and 1,766 gestures were recorded.

All the subjects of the colony were involved in social play (mean_min = 240.15 ± 97.34 SE). Only nine subjects were observed to play solitarily (mean_min = 36.16 ± 22.21 SE).

Prediction 1

Both playful facial expressions (exact Wilcoxon’s: T = 0, N = 9, ties = 0, p = 0.002; Fig. 2a) and gestures (exact Wilcoxon’s: T = 0, N = 9, ties = 0, p = 0.00195; Fig. 2b) were more frequent during social than solitary play.

Fig. 2

Boxplots showing the distribution of playful facial expressions (a) and gestures (b) performed during solitary and social play. The box plots show the median and 25th and 75th percentiles; the whiskers indicate the values within 1.5 times the interquartile range, IQR. The open dot is an outlier more than 1.5 IQR from the rest of the scores

Prediction 2

The exact Friedman’s test showed a significant difference in the use of playful facial expressions (χ² = 9.75, N = 8, df = 2, p = 0.0047; Fig. 3a) and gestures (χ² = 9.25, N = 8, df = 2, p = 0.0079; Fig. 3b) across the different types of social play animals engaged in (rough, gentle, and locomotor play). Individuals made wider use of communicative signals during playful interactions involving physical contact between players (rough and gentle) compared to those interactions characterized only by locomotor patterns. The results of the Dunnett’s test are reported in the legend of Fig. 3, and only those animals (N = 8) that engaged in all three types of play were included in the analysis.

Fig. 3

Boxplots showing the distribution of playful facial expressions (a) and gestures (b) performed during the three different types of play considered. Rough and gentle play includes playmates’ physical contact. (Dunnett’s test—facial expressions: rough versus gentle: q = 1.06, ns; rough versus locomotory: q = 3, p < 0.01; gentle versus locomotory: q = 3.18, p < 0.01—gestures: rough versus gentle: q = 0.35, ns; rough versus locomotory: q = 2.75, p < 0.01; gentle versus locomotory: q = 3.53, p < 0.01). Only significant differences are reported in the figure

Prediction 3a

When considering the number of players involved in the same session, some differences between the two kinds of communicative signals emerged. We did not find any differences in the use of playful facial expressions as a function of the number of players (exact Friedman’s test: χ² = 2.8, N = 5, df = 2, p = 0.367; Fig. 4a); on the other hand, the exact Friedman’s test revealed a different use of gestural communication according to the number of players (χ² = 8.4, N = 5, df = 2, p = 0.008; Fig. 4b), with gestures being more frequent during dyadic compared to polyadic play sessions. The results of the Dunnett’s test are reported in the legend of Fig. 4, and only those animals (N = 5) that engaged in sessions of all three sizes (two, three, and four players) were included in the analysis.

Fig. 4

Boxplots showing the distribution of playful facial expressions (a) and gestures (b) performed as a function of number of players involved (two, three, and four). (Dunnett’s test—gestures: two versus three: q = 2.68, p < 0.05; two versus four: q = 2.85, p < 0.05; three versus four: q = 1.34, ns). Only significant differences are reported in the figure

Prediction 3b

The emission of visual signals varied according to the attentional state of the receiver, with senders performing playful facial expressions (exact Wilcoxon’s: T = 0, N = 10, ties = 1, p = 0.004; Fig. 5a) and visual gestures mostly when the receiver could see them (exact Wilcoxon’s: T = 0, N = 10, ties = 1, p = 0.004; Fig. 5b). Differently from gestures exclusively based on a visual modality, those enriched by either a tactile (exact Wilcoxon’s: T = 9, N = 10, ties = 0, p = 0.066) or an acoustic component (exact Wilcoxon’s: T = 4, N = 10, ties = 2, p = 0.065) did not show any statistical difference according to the attentional state of the receiver. We restricted these analyses to dyadic play interactions, in order to avoid any possible error linked to the presence of multiple receivers.

Fig. 5

Boxplots showing the distribution of playful facial expressions (a) and visual gestures (b) according to the visual attentional state of the receiver

Prediction 4

The frequency of playful facial expressions was negatively correlated with the age of the subjects (Spearman r_s = −0.851, N = 10, p = 0.002; Fig. 6); on the other hand, we found no age correlation either with the frequency (Spearman r_s = −0.365, N = 10, p = 0.300; Fig. 7) or with the number of different types of gestures performed, i.e. repertoire size (Spearman r_s = 0.541, N = 10, p = 0.106).

Fig. 6

Scatterplot showing the correlation between the age of the subjects and the frequency of playful facial expressions performed

Fig. 7

Scatterplot showing the correlation between the age of the subjects and the frequency of gestures performed

Discussion

Gestures and facial expressions were mainly performed during social rather than solitary play (Prediction 1 supported). This finding supports the hypothesis that they can be considered as adaptations produced to obtain a social purpose. Unlike gestures, playful facial expressions were also present during solitary play, in agreement with previous findings (Palagi 2008). This “private emotional expression” (van Hooff and Preuschoft 2003, p. 257) observed in bonobos and chimpanzees during solitary play may represent an external sign of gratification due to the rewarding nature of this behaviour, a mechanism apparently absent in monkeys (van Hooff and Preuschoft 2003; De Marco and Visalberghi 2007). An elucidating example describing the emotional nature of the play face was reported by Tanner and Byrne (1993). They observed a gorilla female repeatedly concealing her play face with her hand, apparently to avoid the possibility that group members could perceive it. This observation suggests that in some cases, play faces, probably due to their emotional component, are spontaneous and difficult to inhibit. The intentional act of hiding the face indicates that subjects may be aware of the message conveyed by play faces and of the consequences the message entails (e.g. triggering a social play session). The capacity for self-reflection or self-awareness typical of the great apes is probably the precursor of more complex forms of cognition in social communication.

During social play, we expected to find a higher frequency of gestures and facial expressions during rough compared to gentle and locomotor sessions (the latter not involving any kind of physical contact), because it is much more probable that a play-fighting session could degenerate into real aggression if the players do not accurately balance their actions (Palagi et al. 2007; Palagi 2009; Bekoff and Allen 2002). Therefore, to prevent the risk of being misinterpreted, individuals need to continuously communicate their playful intentions (Bekoff 1974, 1995; Flack et al. 2004). However, Prediction 2 was only partially supported. We found that for both rough and gentle play (involving physical contact), the frequency of playful facial expressions and gestures was significantly higher than during locomotor play, but we found no difference in the frequencies of communicative signals between the two categories of contact play. Therefore, it is body contact between playmates that determines a more frequent use of both types of communicative signals. Contact play sessions, compared to locomotor ones, are characterized by a deeper degree of trust between players because of the absence of a “safety distance”. Under these circumstances, communicating is prioritized and more urgent. Moreover, communication can be less efficient during locomotor play, as its main features are chasing, fleeing, and climbing, all patterns connoted by a lack of face-to-face interaction. Obviously, tactile gestures can only occur during contact play; however, this does not explain the higher frequency of gestures during this kind of playful interaction. Indeed, bonobos can choose the best option to communicate with their playmates. For example, during locomotor play, bonobos could use acoustic or visual gestures more frequently. The low rate of gestures recorded during locomotor play strongly suggests that this kind of activity does not necessarily require a constant fine-tuning through intentional communication.

The number of players involved in the same playful interaction modified the use of gestures, which were more frequent in dyadic than in polyadic sessions, but did not influence the frequency of playful facial expressions (Prediction 3a supported). Due to the traits distinguishing intentional communication, it is not surprising that gestures are predominantly performed when a single receiver is attending, even within the playful context. Indeed, the same gesture can be performed in different contexts, and in the same context, different gestures can be produced. The “means-ends disassociation” between gestures and contexts has been reported for the great ape species (Tomasello et al. 1994; Pika et al. 2005; Call and Tomasello 2007; Liebal 2007; Pika 2007a, b; Genty et al. 2009), including humans (Bates et al. 1979), and was interpreted as evidence of the intentional nature of this form of communication, typical of the Hominoidea superfamily. Therefore, the “meaning” of a gesture must be interpreted by the receiver by evaluating the environmental and social conditions in which it is produced (de Waal and van Hooff 1981; Pollick and de Waal 2007; Hobaiter and Byrne 2014). The high degree of freedom in the interpretation of a gesture can lead to a more selective use of this kind of signal. On the contrary, playful facial expressions are context specific and transmit an unequivocal positive message that cannot be misconceived. Due to their linkage with the positive emotional state experienced by the sender while playing (Seyfarth and Cheney 2003; Parr et al. 2005), within the primate order, playful facial expressions seem to extend the duration of the play session (children: Rothbart 1973; chimpanzees: Matsusaka 2004; orang-utans: Davila-Ross et al. 2011; gelada baboons: Mancini et al. 2013) and increase rates of affinitive behaviours (Palagi and Mancini 2011; Waller and Dunbar 2005). These features legitimize a broad use of playful facial expressions, independently of the number of playmates.

It was shown that great apes modify their gestures not only according to the presence/absence of an observer (audience effect—Call and Tomasello 1994; Leavens et al. 1996, 2004b; Hostetter et al. 2001), but also according to its attentional state, in particular the possibility for the receiver to visually perceive the gesture (Call and Tomasello 1994; Hare et al. 2000; Hostetter et al. 2001; Pika et al. 2003; Leavens et al. 2004b; Liebal et al. 2004; Poss et al. 2006; Pollick and de Waal 2007). Therefore, we evaluated how the attentional state of the receiver influences the emission of gestures and facial expressions during play. We limited this analysis to dyadic play sessions, where an unequivocal receiver was present. The results showed that visual gestures and playful facial expressions were more frequently used towards a visually attentive receiver (Prediction 3b partially supported), whereas the attentional state of the receiver did not influence the emission of acoustic or tactile gestures. Our data confirm previous findings (Tomasello et al. 1994; Pika et al. 2003; Hostetter et al. 2001; Leavens et al. 1996) for the use of gestures belonging to the three sensory categories and expand their validity to the play context. Unexpectedly, a visually attending receiver also increased the frequency of playful facial expressions by the sender. We interpret this result as further evidence of the intentional emission of facial expressions that can be used to communicate a positive playful mood to the visually attending playmate, probably with the purpose of balancing and maintaining the play session over time, as was demonstrated in previous studies (Rothbart 1973; Matsusaka 2004; Davila-Ross et al. 2011; Mancini et al. 2013). This finding is also in line with the evidence that in humans and great apes, different kinds of smile coexist and can be the outcome either of a genuine positive emotional state (Duchenne smile) or of a more manipulative cognitive process (non-Duchenne smile) (Darwin 1872; Ekman et al. 1990; Wild et al. 2003; Gervais and Wilson 2005; Davila-Ross et al. 2011). Research is needed to clarify whether the capacity to discriminate between spontaneous and volitional facial expressions is a prerogative of humans (Surakka and Hietanen 1998), or whether it is a shared feature that also extends to the great apes.

Focussing on the ontogeny of play communication, we found no correlation between the age of the subjects and the gestures performed, both in terms of frequency and repertoire (Prediction 4a not supported). Our results are only apparently in contrast with the hypothesis of “Repertoire Tuning” (Hobaiter and Byrne 2011b), which predicts a decrease of repertoire size and frequency of gestures with age. Unlike Hobaiter and Byrne (2011b; p. 830), who evaluated the influence of age on the use of gestures by pooling the contexts in which they occurred (play, grooming, feeding, agonistic, sexual, travelling, consortship, affiliating, and resting), we restricted our analysis to the playful context. Within the non-functional context of play, the need for selecting only the most efficient gestures is probably less critical, and the lack of inhibition typical of play behaviour also embraces the domain of gestural communication (Burghardt 2005). The neotenic tendencies (Brosnan 2010; Parker and McKinney 1999) typical of bonobos (Palagi 2006; Wobber et al. 2010a, b; Hare et al. 2007; Lieberman et al. 2007) can help to explain the use of gestures in play. Compared to chimpanzees, bonobos show high levels of playful activity into adulthood (Palagi and Cordoni 2012). Adult bonobos use the gestural communication format typical of the immature phase, both in terms of redundancy and repertoire variability.

In contrast to gestures, the frequency of playful facial expressions changed according to the age of the subjects. In particular, the frequency of playful facial expressions decreased with age (Prediction 4b supported). Immature bonobos are probably more emotionally involved than adults when playing. Adult play behaviour could be less rewarding because of the risks associated with this activity. To limit the risk of harming or frightening immature playmates, adults have to restrain themselves while playing. This self-handicapping strategy serves to maintain a playful mood, thus limiting intervention by the immature playmate’s mother, which could have negative consequences for the adult player. Moreover, play could be considered as an immature-oriented behaviour, in which adults have the mere function of amusing their young playmates, as has been reported for different human cultures (Power 2000).

On the whole, our results show that in bonobos, the use of facial expressions and gestures is strongly influenced by the characteristics of the playful session. Similarities and differences are probably shaped by the different degree of emotionality and/or intentionality characterizing these signals. There is an impressive use of communicative signals during playful interactions. Therefore, play can provide critical information to shed light on the ontogenetic and evolutionary pathways characterizing non-human and human communication. In particular, given the rewarding nature of this behaviour, play is a good field to explore the possible dichotomy between intentionally and emotionally driven signals.