What is a gesture?

Comparative researchers interested in the origins of human communication often study our closest relatives, nonhuman primates (hereafter: primates), to identify potential precursors to human language. Pre-linguistic children are often used as a ‘point of reference’ in comparative research because, like primates, they lack the ability to speak but use a variety of gestures from an early age, such as showing or requesting objects, waving goodbye, or pointing to objects in their environment (Bates et al. 1979; Butterworth 1998; Iverson and Goldin-Meadow 1998; Liszkowski et al. 2004). Consequently, the criteria used to define a gesture in pre-linguistic children have largely been adopted by studies investigating gestural communication in other primates (Leavens 2004; Leavens et al. 2005).

Central to the definition of a gesture is that it is used intentionally: it is a purposeful, voluntarily produced behavior directed at specific individuals to influence their behavior (Benga 2005). Therefore, a behavior is considered a gesture if it is produced in the presence of an audience, with initiators tailoring their gestures to the recipient’s attentional state, such that visual gestures are only used if the recipient is attending. Furthermore, if communicative attempts fail, signalers may persist and elaborate their gesture to elicit the recipient’s response (Leavens 2004; Leavens et al. 2005). Other researchers extend this definition and highlight that gestures, in contrast to actions, are motorically ineffective (Liebal and Call 2012): an action directly causes the recipient’s response (e.g., pulling someone moves the recipient to the intended location), whereas a gesture does not and instead involves the gesturer waiting for a response (e.g., pulling someone by the arm, then letting go and waiting for the other to follow) (Call and Tomasello 2007; Tomasello et al. 1989). Although gesture researchers commonly highlight that only intentionally produced behaviors are considered gestures (Bourjade et al. 2014; Hobaiter and Byrne 2011; Leavens and Hopkins 1998; Pika et al. 2003; Tomasello et al. 1994), there is currently no consensus regarding which and how many of these behavioral markers of intentional production are necessary to define a gesture (Liebal et al. 2013).

While research into human gestures often focusses on the visual modality (Kendon 2004; Liszkowski et al. 2012), comparative researchers additionally consider tactile and auditory gestures. Visual gestures comprise both manual gestures (e.g., ‘arm raise’) and body postures (e.g., ‘present back’). Unlike tactile gestures, such as ‘touch’ or ‘push’, visual gestures do not involve physical contact between the interacting individuals, and may, therefore, be referred to as “distant” signals. Auditory gestures are also “distant” signals and involve a sound (e.g., ‘belly slap’, ‘chest-beat’, or ‘hand clap’), which, in contrast to vocal utterances, is produced not by the vocal cords but by other body parts (Kalan and Rainey 2009; Pika et al. 2003). It is sometimes difficult to categorize gestures by their sensory modality, as some gestures are not ‘pure’ signals. For example, ‘throw object’ may include an auditory component if the object hits the ground, or it can be tactile if the object hits the recipient. Regarding the perception of this signal, the tactile and auditory components “overrule” the visual part, as this gesture can also be perceived by a visually inattentive individual (Liebal et al. 2004). There is also a tendency to preferentially consider a gesture tactile over auditory: ‘slap’, for example, is categorized as a tactile gesture, although it also contains auditory and visual components.

Furthermore, the distinction between gestures and other signal types (facial expressions, vocalizations) is often unclear. For example, like vocalizations, auditory gestures are acoustic signals, but they are not produced with the vocal cords. Facial expressions (also frequently labeled as “displays”) are also visual signals, but they are often distinguished from visual gestures and thus considered a different signal type. For example, research into the communication of apes differentiates between visual gestures and facial expressions (Call and Tomasello 2007; Pollick and de Waal 2007). Research with monkeys, however, often refers to such facial expressions as “facial gestures” (Maestripieri 1999), or focusses specifically on orofacial movements, which are often linked to the production of vocalizations, like ‘coo’ and ‘threat’ calls (Ghazanfar and Logothetis 2003) or sounds, such as ‘lip-smacks’ (Ghazanfar et al. 2012). This classification of signal types, which is based not on their sensory channels but on the cognitive skills involved in their use (e.g., voluntary production, sensitivity to the attentional state of the recipient), points to the traditional dichotomy between voluntarily produced, intentional gestures and apparently more reflexive, emotional facial expressions and vocalizations (Liebal and Oña 2018). This dichotomy, however, is increasingly challenged by studies suggesting that at least some vocalizations and facial expressions are voluntarily produced (Crockford et al. 2012; Scheider et al. 2016; Schel et al. 2013; Waller et al. 2015).

This already highlights that, across studies and species, there is great variability in how gestures are defined (Hobaiter and Byrne 2017; Liebal 2017). For example, there is little agreement about which body parts should be considered (e.g., manual gestures, any limb movements, body postures, head movements), and whether gestures should be labeled based on their structural properties or social function (e.g., ‘extend arm’ versus ‘reach’). Finally, studies differ in their levels of detail when differentiating between gesture types, in that, for example, some identify ‘foot stomp’ (Tomasello et al. 1994, 1997), while others differentiate between ‘stomp’, ‘stomp other’, ‘stomp 2-feet’ and ‘stomp 2-feet other’ (Hobaiter and Byrne 2011). Consequently, because of this variation in defining primate gestures, it is often difficult to compare findings across studies.

When do gestures emerge?

Most signals, regardless of signal type, are not present at birth and even innate forms of communication might only appear later in ontogeny (Rosati et al. 2014). Furthermore, the emergence of signals does not have to be limited to early years of development, since some signals, for example, those used in aggressive encounters or in sexual interactions, might only appear later in an individual’s lifetime. Likewise, earlier acquired signals might be subject to later modifications, even in adulthood, as a result of the individual’s cognitive, social and physical development, and/or the influence of its social and physical environment (Bard et al. 2017; Cartmill and Byrne 2007; Fröhlich et al. 2017; Hobaiter and Byrne 2011). This can either concern changes in a gesture’s form, as demonstrated for the ‘touch’ gesture in chimpanzees, occurring in 36 different variants (Bard et al. 2017), or its usage, as the very same gesture might be used for different functions depending on age, as shown for the ‘throw-back head’ gesture in siamangs (Liebal et al. 2004). While young individuals use it as an invitation for play, adult siamangs employ it to initiate sexual behavior. Finally, although an individual might not produce a certain signal (yet), it might be able to comprehend it. Nursery-reared chimpanzee infants, for example, first responded to others’ gestures before using gestures themselves to initiate an interaction (Bard et al. 2014). Thus, the ontogeny of gestural communication involves not only the emergence of an individual’s ability to produce a gesture, but also to use it in the correct social context, and to respond to it appropriately if they are the recipient of this gesture.

Importantly, most existing gesture research focusses on the production of gestures (Slocombe et al. 2011) and only more recently, recipients’ responses to such signals and the consequent behavioral outcomes have received increased attention (Graham et al. 2017; Hobaiter and Byrne 2017; Schneider et al. 2017). For the purpose of this paper, however, we mainly focus on the emergence and usage of gestures.

This already points to the complexity of this topic, as these different facets of gestural communication need to be considered when addressing the emergence of gestures and the mechanisms underlying their acquisition.

How are gestures acquired?

Previously, three major mechanisms underlying gesture acquisition have been proposed, each leading to different predictions regarding the degree of concordance of individual repertoires and developmental trajectories of gesture usage (see also Pika and Fröhlich 2018). First, genetic channeling (Genty et al. 2009) (also described as genetic transmission or phylogenetic ritualization) suggests that gestures are predominantly innate. Consequently, gestural repertoires should show little variability across individuals within and across groups or sites, but not necessarily across age groups. Thus, the innate, initially large and redundant gestural repertoire of younger individuals is increasingly “fine-tuned”, resulting in a subset of effective, regularly used gesture types in adults (Byrne et al. 2017; Hobaiter and Byrne 2011), while gesture forms in younger individuals should basically resemble those of adults.

Second, gestures could be acquired by some form of social learning (or social transmission), with observers acquiring parts of the behavioral repertoire of another individual (Whiten and Ham 1992). Of the different mechanisms that have been suggested to underlie social learning (Call and Carpenter 2002), imitation has received the most attention from researchers interested in primates’ gestural development. If primates learn gestures by imitating those of others, we would expect to find very similar individual repertoires within groups, but unlike in the case of genetic transmission, repertoires should differ across groups. In other words, the concordance of repertoires within groups should exceed concordance between different groups or sites. Infants are expected to learn from frequent interaction partners, particularly their mothers, and their repertoires should thus be characterized by a high degree of concordance, both regarding gesture form and usage. However, as gesture forms are not assumed to be innate, we might expect an increasing variability in the forms of one gesture type with the infant’s increasing age, as gesture use is adjusted to different interaction partners and social contexts.

A third mechanism, ontogenetic ritualization, involves the shaping of previously non-communicative behaviors into increasingly ritualized, communicative gestures, in repeated interactions with others (Call and Tomasello 2007; Tomasello 2008). For example, hitting each other frequently occurs in chimpanzee play. From this full-fledged behavior, chimpanzees may ritualize an ‘arm raise’ gesture that is used to initiate play instead of actually hitting their partner (Tomasello et al. 1989). This dyadic learning process has been modeled as taking place in repeated interactions (Arbib et al. 2014; Gasser et al. 2014), with gestures representing “abbreviations of full-fledged social actions” (Call and Tomasello 2007; Tomasello 2008). With increasing age, young individuals should acquire an increasing number of gesture types, reaching an asymptote in adults. Individual repertoires should be characterized by high degrees of variability concerning gesture types and specifically gesture forms, as the outcome of such a ritualization might be different for each dyad, which includes the occurrence of idiosyncratic gestures only used by single individuals. Unlike what is proposed for social learning, there should be little overlap of individual repertoires not only across, but also within groups and sites.
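The contrasting predictions about repertoire concordance can be made concrete with a small calculation. The following is a minimal sketch only: it assumes that concordance is measured as the Jaccard overlap between two individuals' sets of gesture types, which is one possible measure and not necessarily the one used in the studies cited here. The individuals, groups, and repertoires are hypothetical and serve purely as illustration.

```python
from itertools import combinations
from statistics import mean


def jaccard(a, b):
    """Overlap of two repertoires: shared gesture types / all gesture types."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def concordance(repertoires, groups):
    """Return (mean within-group overlap, mean between-group overlap)."""
    within, between = [], []
    for x, y in combinations(sorted(repertoires), 2):
        score = jaccard(repertoires[x], repertoires[y])
        (within if groups[x] == groups[y] else between).append(score)
    # NaN signals that no pair of that kind exists in the data.
    nan = float("nan")
    return (mean(within) if within else nan,
            mean(between) if between else nan)


# Hypothetical repertoires (assumption: invented for illustration only).
repertoires = {
    "A1": {"arm raise", "touch", "reach"},
    "A2": {"arm raise", "touch", "reach", "poke"},
    "B1": {"hand clap", "present back", "reach"},
    "B2": {"hand clap", "present back", "foot stomp"},
}
groups = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}

within, between = concordance(repertoires, groups)
print(f"within-group overlap:  {within:.3f}")
print(f"between-group overlap: {between:.3f}")
```

In this invented data set, within-group overlap exceeds between-group overlap, which is the pattern predicted under social learning. Uniformly high overlap within and between groups would correspond to the prediction for genetic channeling, and uniformly low overlap to the prediction for ontogenetic ritualization. Such scores describe a pattern of variability; on their own they do not identify the mechanism that produced it.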

In the following, we will discuss the evidence for and arguments against each of these theories.

What is the evidence?

To investigate if communicative repertoires are innate, a substantial body of research examined, for example, the effect of early social deprivation (Mitchell et al. 1966), the lack of sensory input (Winter et al. 1973), cross-fostering (Owren et al. 1992), or hybridization (Geissmann 1984) on the development of an individual’s communicative behavior. These types of studies, however, focused mostly on vocalizations. For gestures, it is reported that a human-raised gorilla, who never interacted with conspecifics, still produced the ‘chest-beat’ (Redshaw and Locke 1976), suggesting that this gesture represents a species-typical, innate behavior. Kummer (1997) described that after an adult female savanna baboon was transferred into a group of hamadryas baboons, she first responded with her species-typical behavior to the approaches of the male, but soon started to produce the hamadryas-specific behavior. However, these two studies did not specifically investigate how these individuals acquired their gestural repertoires.

This question was first addressed in more systematic ways in a series of studies by Michael Tomasello, Josep Call, and their colleagues (1985, 1989, 1994, 1997). They investigated the learning and use of gestures in captive chimpanzees, with focus on the variability of individual repertoires and their flexible use depending on the social context and the recipient’s behavior. They applied a cross-sectional design (but some individuals were observed repeatedly at different ages) to compare gesture use across different ages (14 months to almost 5 years) and groups. This research was later extended to other ape species, including siamangs and orangutans (Liebal et al. 2004, 2006) as well as gorillas and bonobos (Pika et al. 2003, 2005). Results showed that across species, there was considerable variability among individual repertoires, between age groups, and study sites. For example, individual repertoires of Sumatran orangutans (Pongo abelii) ranged between 6 and 19 gesture types (representing 21 and 66% of the total repertoire observed in this study) (Liebal et al. 2006). Across species, there were several instances of idiosyncratic gestures, which were only produced by single individuals and most likely indicate that apes are able to create novel gestures. Also, gestural repertoires increased with age, with a variety of gestures used in the play context, but decreased again in adults. Additionally, there was evidence for some group-specific gestures, which were used by the majority of individuals of one group, but not in other groups. Because of this high variability of gestural repertoires within and across groups as well as across age ranges, together with the occurrence of some idiosyncratic gestures, the authors concluded that gestures are unlikely to be genetically transmitted over generations (Tomasello et al. 1985), as this would have predicted much higher degrees of concordance across groups. 
Based on this evidence, it was proposed that gestures emerge from previously non-communicative behaviors, shaped in repeated interactions with other group members in the form of ontogenetic ritualization (Call and Tomasello 2007).

Richard Byrne, Cat Hobaiter, Emily Genty, et al., however, drew a different conclusion from their research. In a comprehensive analysis of several gorilla groups’ gestural repertoires, including individuals from captive and natural settings, Genty et al. (2009) found that gorillas use a species-specific gestural repertoire, with very little variability across groups. Like other studies with captive apes (Call and Tomasello 2007), they found that individual repertoires increased with age, with the numbers of gesture types dropping again in adults (Genty et al. 2009). Although they also observed some group-specific and idiosyncratic gestures, they explained their occurrence by varying housing conditions across zoos and incomplete data sets because of the limited visibility in the wild. Hobaiter and Byrne (2011) describe a very similar pattern for wild chimpanzees. Gestural repertoires varied between individuals and age classes, with no substantial evidence for idiosyncratic or group-specific gestures. Hobaiter and Byrne (2011) further emphasize that ontogenetic ritualization is very unlikely to underlie gesture acquisition in this species, since they were not able to identify the initial behaviors from which two seemingly ritualized gestures (‘reach’, ‘position’) emerged (although it is unclear to what extent a ritualized gesture needs to resemble the original action; Liebal and Call 2012). They concluded that chimpanzees use species-typical gestures, with most of their gestures shared with other great ape species, since many gesture types found in chimpanzees were also reported for gorillas and orangutans (Hobaiter and Byrne 2011). Together, their findings suggest that great apes do not acquire their gestures by ontogenetic ritualization, as this would have to result in a much higher degree of variability within and across species (Genty et al. 2009; Hobaiter and Byrne 2011).
Importantly, although they assume gesture forms to be innate, they emphasize that these gestures are intentionally produced and flexibly used in a variety of contexts. They further argue that the previously reported variability of individual gestural repertoires, specifically the occurrence of idiosyncratic gestures found by Tomasello et al., can most likely be explained by the short observation periods of these studies, and propose that the longer primates are observed, the more likely it is to capture their complete gestural repertoires, which consequently reduces the degree of observed variability.

However, an increasing number of longitudinal studies, in which great apes are observed from a very young age over several months or even years, offer alternative explanations (Bard et al. 2014, 2017; Fröhlich et al. 2016b, 2017; Graham et al. 2017; Halina et al. 2013; Schneider et al. 2012a, b). In contrast to previous studies, they consider longer periods of infancy, with focus on the development of gestural communication (e.g., onset and early use of gesturing; Bard 1992; Bard et al. 2014; Schneider et al. 2012a), and on specific behavioral contexts (e.g., locomotion-related interactions in mother–infant dyads; Fröhlich et al. 2016b; Halina et al. 2013). For example, Halina et al. (2013) studied captive bonobos in five zoos and differentiated nine different gesture types (two tactile, seven visual) in addition to a variety of actions that mothers and their infants used to initiate carrying events. Although there was some overlap of their repertoires, mothers and infants mostly used different gestures, which seemed to reflect their different roles in carry interactions. Repertoires varied not only between mothers and infants, but also across dyads and within the corresponding age class (mother or infant). These authors also reported that the form of almost all observed gestures closely resembled actions used to initiate carrying, indicating that gestures may have been ritualized from these actions (Halina et al. 2013). For example, the gesture ‘touch’ (the back or shoulder) is structurally similar to the action ‘gather’ (defined as “gather or turn the recipient toward oneself by applying pressure to their body”) (Halina et al. 2013). Based on these findings, Halina et al. (2013) concluded that, in bonobos, ontogenetic ritualization is the major mechanism underlying gesture acquisition in the context of carrying behavior.

Similarly, Fröhlich et al. (2016b) focused on carry initiations in the travel context in two communities of wild chimpanzees, at varying ages of the offspring (between 9 and 69 months of age). In addition to visual, tactile, and auditory gestures, they also considered two vocalizations that occurred during initiations of travel bouts. Like Halina et al. (2013), they found only little overlap in the gestural repertoires within and between the two study sites. Chimpanzee mothers initiated most joint travels and used a larger variety of gestures than their offspring, which is different from what was found in bonobos (Halina et al. 2013). While younger chimpanzees mostly applied actions and vocalizations to initiate carry events, with increasing age, they shifted to more gestural initiations (Fröhlich et al. 2016b). The variability of gestural repertoires within and between groups, as well as between mothers and their infants, questions whether genetic transmission is the major mechanism involved in gesture acquisition.

However, while Halina et al. (2013) conclude that ontogenetic ritualization is key to the emergence of gestures in the carry context, as behaviors become increasingly ritualized gestures with a stable communicative function, Fröhlich and colleagues (Fröhlich et al. 2016b; Pika and Fröhlich 2018) have come to a different conclusion. They argue that ontogenetic ritualization does not convincingly explain their findings, and propose a revised social negotiation theory (Pika and Fröhlich 2018), inspired by Plooij’s initial studies on wild chimpanzee infants’ communication (1978, 1984). Fröhlich et al. suggest that it is important to consider individual interactional experiences in such developmental processes, as “…gestures are the output of social shaping, shared understanding and mutual construction in real time by both interactants” (Fröhlich et al. 2016b). Thus, unlike ontogenetic ritualization, social negotiation does not require gestures to emerge from previously non-communicative actions, as they are used as “full-blown behaviors” from the beginning (Pika and Fröhlich 2018). In practice, however, the processes of ontogenetic ritualization and social negotiation are difficult to distinguish: even in longitudinal studies, it is not possible to exclude the possibility that parts of the shaping process have been missed (at least with the methods that have been used to date), while “full-blown” behaviors are much more salient and thus easier to capture.

Up to this point, we have contrasted research suggesting that genetic transmission is the major mechanism of gesture acquisition (Genty et al. 2009; Hobaiter and Byrne 2011) with studies highlighting that gestures are shaped in social interactions (Call and Tomasello 2007; Fröhlich et al. 2016b; Halina et al. 2013). Although the mechanisms suggested to underlie gesture acquisition are fundamentally different, the question we want to address in the following is whether these apparently contradicting accounts can be reconciled. For example, Bard et al. (2014) concluded that mechanisms of gesture acquisition might differ between gesture types. In a study with captive, nursery-reared chimpanzees, they found that 3.5-month-old infants already used requests to initiate tickle play. These gestures occurred significantly earlier than their other gestures, which only started to emerge around 6 months of age (another study of captive and mother-raised chimpanzees found that first gestures emerged at around 10 months of age; Schneider et al. 2012a). Many of the infants’ gestures were requests to play or groom, apart from ‘wrist present’ and ‘rump present’, which were used in negotiating rank relationships. Bard et al. (2014) argue that these two latter gestures do not represent ritualized signals. Instead, they are spontaneously produced “as emotional responses”, and therefore are most likely innate behaviors. In contrast, gestures to initiate play or grooming “…are co-constructed from meaningful social interactions […] through inter-active and inter-subjective processes based on shared communicative meaning” (Bard et al. 2014). This study shows that gesture acquisition in great apes is most likely explained by more than one mechanism, involving a complex interplay of both genetic and social factors (Gillespie-Lynch et al. 2014; Liebal and Call 2012; Perlman et al. 2014).

In another study that points to the complex relationship between genetic predispositions and social factors that influence the ontogeny of gestural communication, Schneider et al. (2012b) compared gestural repertoires of infants and their mothers in captive chimpanzees and bonobos. They found that both infants and mothers were more likely to share specific gesture types with individuals of their own age class (also with individuals of other zoos or of the corresponding other species) than infants would share with their mothers. This finding suggests that, first, it seems unlikely that infants learned their gestures by observing their mothers, as there was no overlap between the infants’ and their mothers’ repertoires (as we will discuss later in this paper). Second, the large overlap among infants, even across different study sites, seems to provide support for a genetic determination of their early repertoires. However, the fact that infants and mothers used rather different repertoires does not seem to fit this pattern, as we would expect to find little variability across individual repertoires and across different age groups when genetic transmission is assumed to underlie gesture acquisition. It is important to note, however, that this does not necessarily mean that repertoires may not vary over an individual’s lifetime, as genetic channeling predicts that individuals use different gestures, depending on their age and social role (Genty et al. 2009; Hobaiter and Byrne 2011). In other words, Schneider et al.’s (2012b) findings can be explained by the fact that infants need to achieve different social goals than their mothers, and therefore use a different gestural repertoire.
Together, this demonstrates that identifying mechanisms of gesture acquisition solely based on the extent of variability within and across age groups is problematic, if not misleading, as developmental challenges and social factors influence gestural repertoires and their usage.

To give an additional example, it seems that the onset and developmental pathways of gesture acquisition are influenced by the infants’ increasing degree of independence (from their mothers), resulting in their advanced motility and more varied social interactions outside the mother–infant dyad (Lembeck 2015; Lonsdorf et al. 2014; Schneider 2012; Schneider et al. 2012b). A longitudinal comparison of four nonhuman great ape species, covering the infants’ first 18 months of life, revealed that captive orangutans used their first gestures at around 15 months of age, at least 4 months later than the African great apes (Schneider et al. 2012a). This finding is interesting, as orangutan mother–offspring dyads form strong, long-lasting bonds (van Noordwijk and van Schaik 2009), with infants reaching ‘independence’ later than infants of the other great apes (Wich et al. 2004). Furthermore, across all great ape species, the proportion of visual and auditory gestures (signals which are effective over distance) used by the infants increased with age at the expense of tactile signals (which occur in close physical proximity) (Fröhlich et al. 2016a; Schneider et al. 2012a), supporting the idea that increased motility has an impact on the types of gestures used.

Very few studies have explicitly focused on investigating whether social learning is involved in gesture acquisition. The specific processes underlying social learning and imitation in particular, as well as the often incongruent definitions are fiercely debated (Dean et al. 2016; Galef 2013; Tramacere and Moore 2016). However, there appears to be consensus that, compared with human children, nonhuman apes show general difficulties in copying actions, especially novel ones (Bates and Byrne 2010; Subiaul 2016), although apes seem to recognize when they are being imitated (Haun and Call 2008; Nielsen et al. 2005). That being said, experimental studies have shown that some “enculturated” bonobos, chimpanzees and orangutans, who were raised in close contact with humans (Call and Tomasello 1996), are able to reproduce familiar and some more sophisticated actions (Call 2001; Custance et al. 1995; Miles et al. 1998; Myowa-Yamakoshi and Matsuzawa 1999; Tomasello et al. 1993). These inconsistent findings have led authors to distinguish between “simple imitation” of single, often familiar actions or outcomes, which apes are capable of, compared to “complex imitation” that requires complete and accurate execution of novel actions or outcomes, which has not been reported for apes (Arbib 2005; Subiaul 2016; Tramacere and Moore 2016).

To study mechanisms underlying gestural acquisition, Tomasello et al. (1997) trained two adult chimpanzees to use novel begging gestures to obtain food rewards. When re-introduced to their group, these chimpanzees used the newly acquired gestures to beg for food from a human. The remaining chimpanzees of the group, however, failed to produce the gestures, despite observing the trained chimpanzees receiving food in response to their begging. Tennie et al. (2012) extended this paradigm and trained a chimpanzee model to perform both novel and familiar gestures. However, except for one individual who imitated familiar (but not novel) gestures, there was no evidence that chimpanzees copied communicative gestures, regardless of whether they were novel or familiar actions (Tennie et al. 2012). This demonstrates that there is very limited evidence that non-enculturated, untrained chimpanzees copy either familiar or novel gestures from conspecifics. Thus, at least in this specific context of gesture acquisition, it seems that chimpanzees mostly fail to show even “simple imitation” of familiar manual actions.

Using a human demonstrator in a “do-as-I-do” paradigm, Byrne and Tanner (2006) repeatedly demonstrated several manual actions to an adult, zoo-housed female gorilla, without training her or rewarding her responses. These actions, such as ‘slap top of head’, ‘slap cheek’, and ‘rub stomach’, were selected based on the criteria that they were physically possible for the gorilla to perform, but were neither species-typical behaviors nor part of the female’s extensive repertoire of idiosyncratic gestures. The gorilla spontaneously started to use those gestures, but the replicated gestures were not exact copies of those performed by the human demonstrator. Moreover, as this female had previously been studied intensively over longer periods of time, closer inspection of existing video data revealed that all of these imperfect copies shown in response to the human’s gestures resembled actions that she had previously produced. Byrne and Tanner (2006), therefore, concluded that “…gestural imitation in great apes is based on facilitation of rare behaviors in their extensive and often idiosyncratic gestural repertoire (…) rather than on acquiring novel behaviors by imitation”.

Thus, while few studies have explicitly focused on investigating whether imitation is involved in gesture acquisition, current evidence tells us that great apes are unable to imitate novel gestures from other conspecifics or human demonstrators. However, it is important to point out that these existing studies focused on adult individuals (who already used an established gestural repertoire). Furthermore, it remains an open question whether they were incapable or not motivated to imitate others’ gestures (Tennie et al. 2012). Finally, this also points to a methodological challenge, as it is very difficult to prove that an apparently newly introduced action is indeed novel for the apes.

In summary, what is currently known about primate gestural acquisition stems from studies exclusively focusing on great apes, mostly observed in captive settings (Slocombe et al. 2011). Different conclusions are drawn across studies, with some highlighting the uniformity of gestural repertoires across individuals, groups, and species, pointing to inherited gestural repertoires, while others find high variability between gestural repertoires and evidence for ritualized gestures in specific contexts. Some studies, however, suggest that more than one mechanism is involved, with all or at least some gestures genetically transmitted and shaped in social interactions with others. How this “shaping” is proposed to take place—in the form of ontogenetic ritualization, co-construction, or social negotiation—differs across studies. The majority of studies, however, provide little evidence that imitation plays a significant role in the acquisition of gestures in nonhuman primates.

Furthermore, studies generally suggest that even if gestures are inherited, they are intentionally and flexibly used as means of communication, and are adjusted to the context of use and the behavior of the recipient. This highlights that it is not sufficient to merely look at the number of gestures constituting a repertoire, but that we need to investigate the use of, and response to, gestures to fully capture the developmental pathways of gestural communication (Graham et al. 2017; Schneider et al. 2017).

Before we turn to the discussion of why results are so inconsistent across studies and suggest a way forward for future studies, we want to turn to another form of visual communication—not by manual gestures or postures, but by facial expressions. While most studies into primate communication investigate only one signal type (Slocombe et al. 2011), comparing developmental trajectories of gestures and facial expressions, and studying the similarities and differences across these signal types, will enable us to better understand which (if any) of the underlying mechanisms are unique to the gestural modality.

Development of facial communication

In contrast to primate gestures, little is known about the proximate aspects of facial expressions (Liebal et al. 2013; Waller and Micheletta 2013), such as if and how their use is adjusted to the recipient’s attentional state (Liebal et al. 2004, 2006; Scheider et al. 2016; Waller et al. 2015). Even less is known about developmental trajectories of facial communication.

Isolation studies revealed that rhesus macaque infants showed ‘threat expressions’, ‘grimaces’ and ‘lip-smacking’, although they never had any contact with conspecifics (Hinde and Rowell 1962; Redican 1975). This suggests that facial expressions are most likely innate, as already proposed by Darwin (1872). In contrast, other studies indicate a more gradual development of facial expressions in nonhuman primate infants (Bard 2003, 2005; Chevalier-Skolnikoff 1982), resembling patterns of the development of facial expressions in humans (Camras et al. 2003; Sroufe 1997; Stenberg and Campos 1990).

For example, new-born chimpanzees smile during rapid eye movement (REM) sleep, indicating similar subcortical maturation processes during ontogeny as in human infants (Mizuno et al. 2006). Bard (2005) described a minimum of four different facial expressions within the first 42 days of life in chimpanzees, which were already similar in their appearance to the facial expressions of adult chimpanzees. Specifically, new-born chimpanzees raised with human caregivers show cry faces from the age of 5 days, and smile at the age of 11 days (Bard 2003). Pout faces were observed at the age of 17 days, different forms of angry faces from the age of 19 days, and laughter at the age of 37 days (Bard 2005). In a longitudinal study, Chevalier-Skolnikoff (1982) compared infants of several primate species including great apes (orangutans and gorillas), Old World monkeys (stump-tailed macaques and langurs), and humans, by applying Piaget’s six sensorimotor developmental stages to the facial communication of young primates (Piaget 1952). Across species, developmental patterns were similar, with facial expressions emerging in the first years of life, starting with more reflex-like facial movements, later followed by more voluntarily produced facial expressions. The first facial expressions included reflexive rooting, sucking and crying faces, followed by lip puckers in monkeys and apes, and various open-mouth expressions, like smiles and laughter. Unlike new-born monkeys, new-born apes (and humans) initially showed high frequencies of spontaneous, random facial movements. Monkey species, on the other hand, showed a faster development of facial expressions than apes and humans. By 6 months of age, monkeys used threat displays, while apes showed fear faces [please note that the primate fear face is more appropriately referred to as silent bared-teeth face, as its use is not limited to threatening situations (Waller and Dunbar 2005), but varies across species and social systems (Beisner and McCowan 2014; Preuschoft 2004)]. In human infants, fear faces were only observed after 8 months of age, together with expressions of anger, sadness and surprise. In contrast to monkeys, human and ape infants passed through two additional developmental stages, in which they performed novel facial movements like kissing and tongue protruding (emerging at 18 months in humans and at 3–4 years in apes). De Marco and Visalberghi (2007) studied captive tufted capuchins and found that facial expressions were fully absent at birth, but started to emerge at around 1 month of age. In contrast to rhesus macaques (Chevalier-Skolnikoff 1982), capuchin monkeys started to use lip-smacking very early, followed by the play face, silent bared-teeth display and variants of open-mouth faces (De Marco and Visalberghi 2007). In wild chimpanzees, play faces have been reported to emerge at 6 weeks (Plooij 1984) or 11 weeks of age (van Lawick-Goodall 1968a), followed by laughter at 12 weeks of age (van Lawick-Goodall 1968a). Around the same time, at 14 weeks of age, pout faces accompanied by whimpering sounds have been reported.

While these reports describe when facial expressions emerge in ontogeny, very little is known about how they are used in interactions with conspecifics. For example, van Lawick-Goodall (1968b) mentions the increasing role of visual signals in wild chimpanzees as soon as infants start to move away from their mothers, which seems to resemble developmental patterns of visual gesture use (Fröhlich et al. 2016a; Schneider et al. 2012a). Lembeck (2015) conducted a detailed analysis of the different facial movements (action units) that play faces are composed of across different ape species (chimpanzees, bonobos, and several hylobatid species), using modified versions of the human Facial Action Coding System (FACS) (Ekman and Friesen 1978) (chimpFACS, Vick et al. 2007; gibbonFACS, Waller et al. 2012). At 6 months of age, the prototypical version of this facial expression (Parr et al. 2007a, b), but also variations including other facial muscle movements, were already present across the different ape species (Lembeck 2015). However, in response to their infants’ play face, chimpanzee and bonobo mothers showed a greater variety of action units (or variants of play faces) during play than their infants, while the opposite pattern was found in hylobatids (Lembeck 2015). Ross et al. (2014) found that play faces of 12- to 15-month-old chimpanzees were frequently matched by their play partners. However, since these authors did not use a FACS-based analysis, it is not clear whether partners matched the exact facial configuration, or merely a variant of the play face.

Taken together, from the very little we know about the ontogeny of facial communication in primates, it seems that many facial expressions are present from an early age on, even in socially isolated individuals, and often within the infant’s first months of life. The onset of facial expressions seems to differ across species, with monkeys showing a faster developmental trajectory than apes. When facial expressions emerge for the first time, their form is already very similar to those facial expressions used by adults. However, systematic comparisons of changes in structural properties of one facial expression over an individual’s lifetime are currently missing. Facial expressions often occur in response to emotional events (play, threats, separation from the caregiver), but very little is known about their social use and whether they are adjusted to the recipients’ behavior.

What can developmental pathways of facial expressions tell us about mechanisms of gesture acquisition?

The direct comparison of findings of longitudinal studies on facial and gestural communication is difficult, since both signal types are usually studied separately from each other, using different theoretical and methodological approaches (Slocombe et al. 2011). Still, from the little we know, it seems that there is little variability across individual facial repertoires, indicating that they are most likely genetically determined. Furthermore, most facial expressions of apes emerge earlier than their first gestures, and they already seem to resemble the facial appearance of adults’ expressions. In other words, a shaping process in social interactions, e.g., via ontogenetic ritualization, seems less likely for facial expressions. This seems to be different from the developmental patterns found in humans: like great ape infants, human infants frequently use uncoordinated facial expressions, which they often use independently of specific contexts (Holodynski and Friedlmeier 2006). During a gradual process, these expressions are increasingly regulated by the caregiver, resulting in the children’s autonomous regulation of their emotions and corresponding facial expressions. Whether a similar process is present in nonhuman primates is currently unknown.

Taken together, it seems that primate facial expressions are more strongly genetically determined and less socially shaped than gestures, as within each species, there is very little variability in the structural properties and the usage of facial expressions, or across individual facial repertoires. However, given that there is neither a substantial data set of longitudinal studies nor systematic comparisons of the development of gestures and facial expressions in primates, this conclusion remains tentative.

Which conclusions can we draw from inconsistent findings on gesture acquisition?

We have demonstrated in this paper that findings on gesture acquisition in primates are rather inconsistent. While some studies find evidence for species-typical, inherited repertoires with little variability across individuals and even species (Genty et al. 2009; Hobaiter and Byrne 2011), others conclude that ontogenetic ritualization or similar processes, based on constructing or shaping gestures in social interactions with others, are major mechanisms of gesture acquisition (Fröhlich et al. 2016b; Halina et al. 2013; Tomasello et al. 1994).

Although some authors suggest that more than one mechanism is likely to be involved (Bard et al. 2014; Gasser and Arbib 2018), at first glance, it seems difficult to reconcile these different findings. However, we suggest that the varying findings across studies result from the different approaches of the two main gesture research groups rather than from true differences between species or between groups within one species. As our review showed, the “St. Andrews-team” around Richard Byrne including Cat Hobaiter, Erica Cartmill, and Emilie Genty largely supports a genetically determined gestural repertoire, while the “Leipzig-team” of Michael Tomasello and Josep Call, together with Simone Pika, Federico Rossano, Marta Halina, and Katja Liebal suggests that ontogenetic ritualization is the major mechanism underlying gesture development. How is it then possible to reconcile these drastically differing positions? Byrne et al. (2017) suggest that approaches “…that highlight the importance of social interactions in the development of gesturing […] are not incompatible with a phylogenetically ritualized set of available gesture types”. In the following, we take a slightly different perspective and argue that first, direct comparisons of the results of these two teams are difficult, because they use dissimilar methodological approaches, and second, each team focuses on different aspects of its results. Consequently, findings may appear more different from each other than they are.

Addressing the first issue, the “St. Andrews-team” studied several species, in captive and wild settings, over longer periods of time, but with relatively little longitudinal developmental data. The “Leipzig-team” almost exclusively worked with captive apes, and conducted both cross-sectional and longitudinal studies, with a focus on the emergence of first gestures and their use in social interactions. Both teams focus on the flexible and intentional use of gestures, but they differ, for example, in the number of criteria necessary to define a behavior as an intentionally produced gesture. While the “Leipzig-team” initially emphasized the signaler’s response-waiting and the flexibility of gesture usage as important markers for intentional use, the “St. Andrews-team” defined a behavior as intentional if it was characterized by at least one of the following criteria: audience checking, response-waiting, or persistence in communicative attempts (Genty et al. 2009; for a more detailed discussion, see Liebal et al. 2013). Regarding the second issue, we suggest that the findings of the two teams are not as different as they seem. For example, both teams report more or less substantial gestural repertoires with idiosyncratic gestures being either rare or absent, and some degree of variability across individuals. However, each team emphasizes different aspects of its findings. The “St. Andrews-team” appears to focus more on gestures shared across individuals (species-typical gestures) and explains the occurrence of variability across groups or individuals as well as idiosyncratic gestures by differences in housing or rearing conditions, or different sampling techniques. The “Leipzig-team”, on the other hand, tends to emphasize the variability of individual repertoires, and pays less attention to environmental factors (e.g., different housing conditions) and those gestures shared across groups and species.

The apparently different findings of the two teams may be easily reconciled by acknowledging that more than one single mechanism is involved in the development of primate gestural communication. Currently, each team is relatively limited by their specific perspective on primate gesture acquisition. If researchers interpret their findings and those of the corresponding other team as potentially two sides of the same coin, which are not necessarily mutually exclusive, this might be a first important step for a better understanding of how primates acquire their gestures.

Possible ways forward

The emerging picture of primate gesture acquisition is more complex than previously expected. Current findings point to an interplay of genetic underpinnings and social factors influencing the emergence and development of gestural communication. While some scholars propose that mechanisms differ depending on gesture type (Bard et al. 2014), others suggest that species-typical gestures are innate, but that their use and the reaction to gestures of others need to be learned (Genty et al. 2009; Hobaiter and Byrne 2011). Although it is possible to formulate predictions regarding the expected behavioral patterns for each mechanism (Pika and Fröhlich 2018), in reality, it is difficult to disentangle ontogenetic ritualization and genetic channeling in particular, as genetic transmission does not necessarily mean that gestures are “hard-wired” and not modifiable in their usage, and maybe even in their form.

In the following, we want to identify some additional gaps of knowledge or “blind spots” in gesture research, with the aim of inspiring some future research on the development of gestural communication in nonhuman primates.

First and most importantly, researchers need to agree on at least some core aspects of a gesture definition. This includes decisions about which criteria are necessary and sufficient to describe intentionality, whether gestures should be limited to movements of the hand/arms or should also include body postures and head movements, and which sensory gestural modalities (tactile, visual, auditory) should be differentiated. Although it seems an obvious suggestion, more species, particularly non-great ape species, need to be studied and systematically compared with regard to their gestural development, at different sites in both their natural habitats and captive settings.

Second, although longitudinal studies are very challenging regarding the time and logistics necessary to conduct them, they are urgently needed to investigate the emergence, use, and potential structural modification of gestures. If we want to answer the question whether some gestures are increasingly ritualized from non-communicative behaviors, it appears insufficient to only examine adult repertoires, as longitudinal studies are more likely to capture this transition into fully ritualized gestures. To systematically investigate this process, we need to develop methods to track changes in the gestures’ structural properties (Roberts et al. 2012), similar to the detailed coding schemes developed for facial movements of different primate species (Caeiro et al. 2013; Parr et al. 2010; Vick et al. 2007; Waller et al. 2012).

Third, if we assume that different mechanisms underlie gesture acquisition in primates, gestural development may take very different forms. We have already highlighted that some gestures might be genetically determined, while others are shaped in social interactions. An additional possibility is that different mechanisms play a role at different times in an individual’s life. For example, although there is no substantial evidence yet for gestures being acquired by imitation, it might well be that imitating others’ gestures only occurs in specific developmental stages, e.g., only in adults, but never in younger individuals. Along those lines, it is important to highlight that individual needs and purposes for communication might vary between different age classes, or males and females, respectively. This of course also impacts the corresponding degree of variability observed across different individuals.

Fourth, to understand patterns of gesture acquisition, the usage of, and response to, gestures should be considered, rather than limiting the scope to repertoire sizes and their composition. Regarding gesture usage, it has been shown that older chimpanzees use fewer, but more efficient single gestures, while younger individuals need to learn which gestures are most efficient and therefore frequently produce gesture sequences, often involving redundant gestures (Hobaiter and Byrne 2011). Furthermore, although chimpanzee infants use visual gestures from an early age on (Schneider et al. 2012a), it is currently unclear whether they adjust their use to others’ attentional state as found in older individuals (Call and Tomasello 2007). Very little is known about how young primates learn to respond appropriately to others’ gestures. For example, chimpanzee and bonobo infants already respond to gestures before they start to produce them (Bard et al. 2014; Graham et al. 2017). Infants of chimpanzees (as well as of other great apes) respond pervasively to gestures from their mothers and other conspecifics, while mothers are more “selective” in their responsiveness to both infants and other group members (Schneider et al. 2017). However, whether the mothers respond less because they are less motivated, or because the infants do not use the “appropriate” gesture, is currently unclear.

Gasser and Arbib (2018) use computational modeling as a different approach to develop more detailed theories about how gestures are acquired. They argue that even if innateness of gestures is assumed, some form of learning is always involved (e.g., when individuals learn how to use a gesture in the appropriate context), based on a model supporting both ontogenetic ritualization by mutual shaping and the “pruning” of innate gestures (Arbib et al. 2014). Their dyadic model of brain mechanisms supporting these learning processes is useful to better understand the roles of the interacting individuals and the neurobiological and cognitive foundations of this process (Gasser and Arbib 2018). Similar modeling approaches may inspire new ways of analyzing and interpreting data, e.g., when combining longitudinal studies with testing explicit hypotheses on how housing conditions or rearing history influence the development of individual gestural repertoires.

Finally, primate communication is mostly studied using a unimodal approach (Slocombe et al. 2011). Gestures, however, are just one of the different communicative means primates use (Hobaiter et al. 2017; Wilke et al. 2017). The comparison with developmental pathways of other communicative modalities, as we have illustrated in this paper for facial expressions, will help to identify the mechanisms underlying their acquisition and patterns of usage shared across these modalities, and those that are potentially unique to gestures. Together, this approach will lead to a better understanding of the complexity of developmental trajectories in primate communication.