Introduction

Gestural communication permeates practically every aspect of great apes’ social lives. Broadly defined as socially directed and mechanically ineffective bodily movements (e.g., Cartmill and Byrne 2007; Pika 2008; Hobaiter and Byrne 2011a), gestures occur in everyday communication across the full range of social contexts from meat-sharing and sex to joint travel and grooming and between all possible combinations of age-sex class relationships, for example: same-sex dyads during affiliation, social grooming, or travel (Goodall 1986; Pika and Mitani 2006; Douglas and Moscovice 2015); male-female dyads during consortship and mating (Hobaiter and Byrne 2012; Genty and Zuberbühler 2014) or mother-infant dyads in joint travel, food sharing, and social play (Plooij 1978; Bard 1992; Halina et al. 2013; Fröhlich et al. 2017).

Early descriptions of gesture use date back to the 1930s (e.g., Ladygina-Kohts and de Waal 2002) and were included in the first field studies of chimpanzees (van Lawick-Goodall 1968; Plooij 1978, 1979, 1984; Goodall 1986) and gorillas (Schaller 1963, 1965). Comparative gestural research was initially focused on great apes living in captive settings (chimpanzees, Pan troglodytes: Tomasello et al. 1985, 1989, 1994, 1997; gorillas, Gorilla gorilla: Tanner and Byrne 1996; Pika et al. 2003; bonobos, Pan paniscus: Pika et al. 2005; orangutans, Pongo abelii/pygmaeus: Liebal et al. 2006; Cartmill and Byrne 2007). These studies showed that great apes rely on gestures in their day-to-day intra-specific communication and possess extensive gestural repertoires (for review see: Call and Tomasello 2007). Great ape gestures qualify as intentional signals: irrespective of the species, methods, setting (field/captive), or research focus, across studies researchers find abundant evidence that gestures are regularly produced towards individual recipients in goal-oriented ways across a wide range of social contexts (e.g., Call and Tomasello 2007; Perlman et al. 2012; Bard et al. 2014b; Roberts et al. 2014a; Byrne et al. 2017; Fröhlich et al. 2017). For example, gesturing is adjusted to the visual orientation of the target recipient (e.g., Liebal et al. 2004b; Leavens et al. 2005b; Cartmill and Byrne 2007; Hobaiter and Byrne 2011a); signalers persist in, and sometimes elaborate, their gesturing until their goal is achieved (e.g., Leavens et al. 2005b; Cartmill and Byrne 2007; Hobaiter and Byrne 2011b; Roberts et al. 2014b); and gestures are characterized by a flexible relationship between signal and outcome (means-ends dissociation), implying that individual signalers are able to use different signals/gestures to achieve the same outcome/goal or a single gesture for several outcomes (Tomasello et al. 1994; Pika et al. 2003; Liebal et al. 2006; Hobaiter and Byrne 2014; Bard et al. 2017; Graham et al. 2018).

While the goal-oriented and flexible use of gestural signals by great apes is well established, less attention has been dedicated to the mechanisms underlying gesture acquisition and use during an individual’s lifetime. A thorough understanding of development is critical for deciphering to what extent communication depends on input from the social and physical-ecological environment (Liebal et al. 2013; Bard et al. 2014a; Pika and Fröhlich 2018). In a pioneering study at the first established chimpanzee field site, Gombe in Tanzania, Frans Plooij (1978) described a sequence of communicative development in chimpanzee infants. Following Plooij’s early work (1978, 1979), a number of studies explored gestural acquisition and development in captivity (Savage-Rumbaugh et al. 1977; Tomasello et al. 1985, 1989, 1994, 1997; Schneider et al. 2012a, b; Halina et al. 2013; Bard et al. 2014b). However, while captive studies provide opportunities for more fine-grained analyses, variation in the physical and social environments experienced by captive and wild apes may differently impact their behavior and development (Bard 1992; Tanner and Byrne 1996; Boesch 2007; Hobaiter and Byrne 2011a; Seyfarth and Cheney 2017). To understand to what extent communicative development incorporates input from a range of socio-ecological environments, findings generated in captivity should be complemented by those of populations living in their natural environment (Boesch 2007). Fortunately, the number of studies of gestures and gesturing in wild groups has also grown rapidly in recent years (e.g., Pika and Mitani 2006; Genty et al. 2009; Hobaiter and Byrne 2011a, b, 2012, 2014; Roberts et al. 2012, 2014a; Douglas and Moscovice 2015; Fröhlich et al. 2016a, b, 2017, 2018; Graham et al. 2016, 2018).

This review has three major objectives. First, we discuss how different operationalizations of the term ‘gesture’ have led to substantial variation between lines of gestural research. This variation makes direct comparability between studies challenging, but also highlights the importance of considering different perspectives in building a complete picture of gesture acquisition. Second, we review the breadth of recent research on the mechanisms that shape great ape gestural repertoires (i.e., ontogenetic and phylogenetic origins) and the individual and social factors that impact their use during development (i.e., ontogenetic trajectories). Third, we emphasize that more research on multimodality in relation to the wider socio-ecological context (e.g., habitat characteristics and social structure) is needed to achieve a more thorough understanding of communicative development in apes.

Problems with definitions: what is a ‘gesture’?

Despite decades of research, there remains no strict consensus on how to define a gesture. Many researchers would probably agree that gestures include socially directed, mechanically ineffective movements of the extremities (e.g., Tomasello et al. 1997; Pika 2008; Cartmill and Byrne 2010; Hobaiter and Byrne 2011a; Bard et al. 2014b; Fröhlich et al. 2016a). Given that signals (as opposed to cues) are defined in evolutionary biology as traits that have been under selection specifically for their communicative function (Maynard Smith and Harper 2003; Ruxton and Schaefer 2011), this definition has led to many ambiguities. For example, studies including ‘mechanical ineffectiveness’ in their definition seldom specify whether it refers to the form or the outcome of a gesture (Perlman et al. 2012). Moreover, studies vary in terms of whether gestures are restricted only to movements of the hand and fingers (Leavens and Hopkins 1998; Pollick and De Waal 2007; Leavens et al. 2010; Roberts et al. 2012, 2014a), include body postures and bodily movements (for example: bobbing, rocking; Tanner and Byrne 1996; Genty et al. 2009), only include actions qualified by criteria of first-order intentionality, or incorporate different sensory modalities beyond the visual channel (see also Liebal et al. 2018).

Traditionally, comparative psychologists dissociated animal gesture from signals used in dynamic social displays, which has led to confusion. In most recent studies on gestural communication, the ‘gestures’ described go beyond movements of the extremities to encompass those of the entire body or even static body postures (e.g., Genty et al. 2009; Hobaiter and Byrne 2011a; Halina et al. 2013; Bard et al. 2014b; Fröhlich et al. 2016a; although cf. Pollick and de Waal 2007; Roberts et al. 2012). The distinction of a gesture from a ‘display’ is only in the evidence for its intentional use. However, given that the criteria for intentional use are typically not considered or explored in ethological descriptions of displays, comparison across research fields and across taxa becomes problematic. Take, for example, the ‘leaf clip’ gesture used by chimpanzees; outside of gestural research, it is typically categorized as a ‘display’ (Nishida 1980; Matsumoto-Oda and Tomonaga 2005), but within gestural research as a manual gesture with clear evidence for intentional use (Hobaiter and Byrne 2011a, 2012). In the opposite case, the ‘Hand-clasp’ (a social signal used by chimpanzees in grooming) is often categorized as a gesture in ethology (Whiten et al. 1999; Pollick and de Waal 2007; Arbib et al. 2008; Bard et al. 2014b), but without any evidence for (or at least investigation of) its intentional use. What do we call a mechanically ineffective movement of the extremities that functions as a signal, but without evidence that it is goal-directed? A vocalization researcher would not label a chimpanzee vocal ‘Hoo’ signal differently depending on the mental state or internal cognitive processes of the signaler, but a gesture researcher might (Liebal et al. 2013).

The emphasis on intentional use as a key criterion of a gestural signal stems from the excitement generated by the early demonstrations that great ape gestures were the first intentional communicative signals described outside of human language (Hewes 1973; Plooij 1978; Leavens and Hopkins 1998; Tomasello 2008). Today, most gestural researchers require that every token of signal use, irrespective of its physical similarity to previous cases of gesturing, be accompanied by some evidence of intentional use to be classified as a ‘gesture’. So, the distinction between categorizing socially directed, goal-directed physical actions that meet the criteria for intentional gesture, and stereotyped and reflexive behavioral signals that do not (such as the mating displays of many birds), depends on our ability to detect intentional use. However, the detection and description of intentional signals remain the source of significant debate (Bar-On 2013; Moore 2015; Scott-Phillips 2015; Townsend et al. 2017). We have no access to a signaler’s internal cognitive processes, and instead are reliant on external behavioral indications that together suggest intentional behavior. These behavioral criteria for establishing intentional use typically refer to the signaler’s and/or recipient’s visual attention—whether that be moving to produce a signal within a recipient’s line of sight or visual monitoring of the recipient by the signaler during response waiting.

Here, we face another issue in the description of a signal as a ‘gesture’: modality. Gesture is still frequently considered to be a primarily visual mode of communication, perhaps because human gesture is generally investigated as action in visual space (Kendon 2004). However, gestures can be perceived through three sensory channels: vision, hearing, and touch. For those gestures with a salient, or even dominant, audible component (e.g., ‘slap object’ or ‘leaf clip’), it can be challenging to establish intentional signal use: the behavioral criteria rely on the recipient’s visual attention, whereas these gestures can be intentionally directed to recipients who are not visually attending. Gesture is an intrinsically multimodal (sensu: involving several sensory channels, see definition below) form of communication (Leavens and Hopkins 2005; Cartmill and Byrne 2007; Pollick et al. 2008; Fröhlich 2017; Hobaiter et al. 2017), but at present, the bias towards visual attention in the definition of intentional signal use likely impacts both the range of signals described as gestures, and our ability to detect intentional use in vocalizations and other signal types.

Problems with definitions: what is ‘a’ gesture?

Comparative psychologists have typically focused mainly on signal production in human and non-human primates—particularly great apes—and refer to signal categories such as vocalization, gesture, or facial expression as a ‘modality’ of communication. Multimodal signals are then described as the simultaneous or sequential integration of signals from at least two of the ‘modalities’, e.g., gesture and facial expression (Liebal et al. 2013). However, outside of great ape communication, the term ‘modality’ is typically used to refer to perception through the sensory channels of vision, touch, hearing, olfaction, etc. (Rowe 1999; Partan and Marler 2005). Behavioral ecologists, working across a much wider range of species and taxa, are interested in the ultimate function of complex signals and have typically focused on the senses employed to perceive signals (Partan and Marler 1999; Hebets and Papaj 2005). Here, multimodal signals are those that incorporate multiple sensory modalities.

A single gesture (for example a visual-audible ‘slap object’) thus contains multiple sensory ‘modalities’ from the perspective of a behavioral ecologist, but not from the perspective of a comparative psychologist (Hobaiter et al. 2017; Wilke et al. 2017; Fröhlich and van Schaik 2018). In contrast, a visual-silent gesture such as an ‘Arm wave’ combined with a (visual) facial expression would be classified as multimodal by a comparative psychologist, but unimodal (visual) by a behavioral ecologist (Marler 1961; Wilson 1976; Partan and Marler 1999). It is a mess (that we have contributed to). The different approaches contribute distinct and important parts of the picture, but the inconsistencies in the terminology make subsequent comparison of data on ‘multimodal’ communication across taxa problematic, impeding advances in understanding the mechanisms underlying animal communication. Comparative researchers in the field of primate communication often focus on the phylogeny of language-specific components such as intentionality and reference and have emphasized the combination of communicative categories, such as gestures and facial expressions, by arguing that individual signal types may have different underlying cognitive processes (Waller et al. 2013). However, the impact of novel findings in one field can be enhanced by realigning terminology with that of related research fields (e.g., behavioral ecology). Recent studies of chimpanzee communication have started to explore these distinctions. Multimodality in a single signal is ‘fixed’ (a chimpanzee cannot produce the audible components of a pant-hoot vocalization, without also producing the visible facial movements), while multimodality in signal combinations (the addition of a visual-audible vocalization to a visual-silent gesture) is optional and represents an opportunity for ‘flexible’ communication (Davila-Ross et al. 2015; Hobaiter et al. 2017; Wilke et al. 2017; Fröhlich and van Schaik 2018). 
Signal combinations enable signalers to adapt their signaling to a specific physical or social environment (Hobaiter and Byrne 2017; Wilke et al. 2017). This distinction between fixed and flexible combination of communicative units (i.e., signal categories as well as sensory components) presents a fascinating new area for testing the function of and cognitive prerequisites for different types of multimodal and multicomponent communication in great apes. In addition, the thorough study of multimodality also requires researchers to differentiate between the production and comprehension of signals (such as individual gestures, vocalizations, facial expressions) and signal combinations.
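The two uses of ‘multimodal’ contrasted above can be made concrete with a minimal sketch. It assumes a deliberately simplified coding scheme in which each signal is tagged with one signal category and a set of sensory channels; the class, the function names, and the example codings are hypothetical illustrations and do not reproduce any published coding protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One coded signal (hypothetical, simplified coding scheme)."""
    label: str
    category: str        # signal category, e.g. "gesture", "facial expression"
    channels: frozenset  # sensory channels through which it can be perceived


def multimodal_by_category(signals):
    """Comparative-psychology sense: at least two signal categories combined."""
    return len({s.category for s in signals}) >= 2


def multimodal_by_channel(signals):
    """Behavioral-ecology sense: at least two sensory channels involved."""
    channels = set()
    for s in signals:
        channels |= s.channels
    return len(channels) >= 2


# Example codings taken from the cases discussed in the text.
slap_object = Signal("slap object", "gesture", frozenset({"visual", "audible"}))
arm_wave = Signal("arm wave", "gesture", frozenset({"visual"}))
facial_expression = Signal("facial expression", "facial expression",
                           frozenset({"visual"}))

# A single visual-audible gesture: one category, two channels.
print(multimodal_by_category([slap_object]), multimodal_by_channel([slap_object]))

# A silent gesture plus a facial expression: two categories, one channel.
combo = [arm_wave, facial_expression]
print(multimodal_by_category(combo), multimodal_by_channel(combo))
```

Under these assumptions the same observation receives opposite labels depending on the definition applied, which is the comparability problem described above.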

Theories of gestural acquisition

The possible mechanisms of gesture acquisition are inextricably linked to the different ways that developmental trajectories in gesture use were investigated. Research on apes’ gestural acquisition has been ongoing for several decades (e.g., Plooij 1978; Arbib et al. 2008; Pika 2008; Liebal and Call 2012), with a special issue on the topic published over the past year (Bard et al. 2017; Byrne et al. 2017; Leavens et al. 2017; Kersken et al. 2018; Liebal et al. 2018; Pika and Fröhlich 2018; Tomasello and Call 2018; Arbib and Gasser in press), and so here, we provide only a brief overview.

Researchers initially differentiated between individual and social learning processes of gesture acquisition (reviewed in Liebal and Call 2012). Building on Plooij’s (1978) early descriptions of the ‘social negotiation’ of a behavior into a signal (which he termed ‘conventionalization’), Tomasello and colleagues developed the first formal hypothesis of gestural acquisition, termed ‘Ontogenetic Ritualization’ (OR). They adapted the ethological concept of signal evolution over phylogenetic time (‘ritualization’); in OR, the forms that gestures take derive directly from repeated social interactions in which individuals participate through an individual learning process (Tomasello 1990; Tomasello et al. 1994). A series of studies, all conducted in captivity, found indirect support for this hypothesis by reporting the presence of idiosyncratic gesture types (i.e., gesture types unique to single individuals) and greater levels of similarity within, as opposed to between, groups (Pika et al. 2003, 2005; Liebal et al. 2006; Halina et al. 2013). In contrast, any evidence for the acquisition of gestural signals by imitation, or group-specific socially learned gesture types remained negligible (Tomasello et al. 1989, 1997; Tanner and Byrne 1996; Byrne and Tanner 2006). Research in captive settings has shown that chimpanzee and bonobo infants share a considerably larger portion of their gestural repertoire with individuals of their age group than with their mothers, further indicating that mothers’ gestures are most likely not imitated (Schneider et al. 2012b).

Studies on great ape gestural communication in the wild (Genty et al. 2009; Hobaiter and Byrne 2011a) presented apparently contrasting evidence for the existence of genetically predisposed, species-specific gestural repertoires in great apes (Byrne et al. 2017). Finding an absence of idiosyncratic or group-specific gestures, significant overlap in species repertoires, and a strong effect of observation time on individual repertoire size, these studies concluded that the repertoire of signals available to great apes was phylogenetically ritualized, in a similar way to the repertoires of signals prevalent across animal and human communication (Hobaiter and Byrne 2011a). In addition to the mechanisms of OR, imitation, and genetic endowment, Perlman et al. (2012) proposed that on-line (‘real-time’) adaptation of action is involved in the acquisition of ape gestures. By studying travel coordination in a captive gorilla mother-infant pair, the authors concluded that ‘directive pushes’ are ‘molded to the physical affordances and social context of the moment of communication’. Bard et al. (2014b) examined gestural ontogeny in infant nursery-reared chimpanzees and found partial evidence for both learning and genetic endowment. Their results suggested that there are different modes of acquisition for different gesture types, with the bulk of gestures co-constructed as a result of social interactions. This premise was further explored in the studies of Fröhlich et al. (2016b, c, 2017) on the gestures that infant chimpanzees in two wild communities produce in interactions with their conspecifics. The authors found that social exposure and context play a substantial role for the gestural usage of young apes and proposed a revised theory of ‘social negotiation’ (Fröhlich et al. 2016c; Pika and Fröhlich 2018). 
The hypothesis states that gestures do not originate via shortening of a functional action sequence (contra the Ontogenetic Ritualization Hypothesis), but from the exchange of full-blown social behavior (i.e., an action produced in its complete, natural form). This exchange results in a mutual understanding that certain behaviors can carry distinct meanings linked to particular social contexts and are produced to achieve distinct goals (Fröhlich et al. 2016c; Pika and Fröhlich 2018).

Different perspectives on gesture and gestural ontogeny

Studies on the onset and development of gestural communication in great apes have been heavily influenced by the diverging definitions of ‘gesture’ used by the respective researchers. In the past decade, the debate about the acquisition of great ape gestures has pitted hypotheses that incorporate learning mechanisms and genetic predisposition against one another (Hobaiter and Byrne 2011a; Liebal and Call 2012). Here, we argue that the different theories could potentially be reconciled by reconsidering the perspectives taken on gestures and gesturing by the different groups of researchers as representing different levels of explanation (see also Liebal et al. 2018). For example, all groups of gesture researchers describe a gesture type (or category) ‘Touch’ that is common across all individuals (and indeed all ape species; Call and Tomasello 2007; Hobaiter and Byrne 2011a); this could be classified as a phylogenetically ritualized gesture. However, at the same time, the specific form of this gesture as produced by any one individual, or in any specific interaction, may vary substantially in the orientation of the signaler movement, or the location of contact to the recipient (Tanner and Byrne 1996; Perlman et al. 2012; Bard et al. 2017), showing ‘real time adaptation’ (Perlman et al. 2012) and/or ‘social negotiation’ (Pika and Fröhlich 2018) of the exact form in a specific interaction. Similarly, the gestural ‘repertoires’ of two individuals can be measured at a specific point in time or developmental stage and be found to differ dramatically (e.g., Schneider et al. 2012b; Fröhlich et al. 2017); but, over a lifetime, the available ‘repertoire’ of gestures expressed by the two individuals may be highly similar. We can also distinguish between the way in which an ape produces a gesture and the way in which a gesture is understood (Hobaiter and Byrne 2017).
Hence, depending on the level of explanation investigated, ‘a’ gesture or ‘a’ repertoire might refer to something fundamentally different.

As a result, the apparent differences in the nature of gesture acquisition may have emerged from a focus on different levels of explanation of the gestural system. Many species have a biologically available repertoire of signals. Similarly, we can ask the question: what are the available species-typical repertoires of gestures in great apes, or the set of family-typical gestures that members of all great ape species could produce or discriminate (Genty et al. 2009; Hobaiter and Byrne 2011a; Byrne et al. 2017)? However, in any one individual, and in any one specific communicative event, the use and expression of this available repertoire will vary. In human language, with its cultural diversity of sounds, words, and structures, our phonemes are rapidly channeled through early experience (Ruben 1997; Kuhl 2004). We are left with an individually and culturally specific subset of sounds with which we communicate on a day-to-day basis. Within these, the expression of these sounds in any specific instance of communication may again vary. Any two examples of even a single simple word produced by the same individual likely vary in tone, pitch, and emphasis (e.g., Scherer 1995).

As in any study of behavior, no single approach to the study of gesture is ‘correct’ in providing a more accurate explanation than others; a complete understanding of how gestural signals are acquired and deployed can only be achieved by incorporating different levels of explanation (Tinbergen 1963). In the study of available gestural repertoires, the focus lies on the study of gesture as a system (i.e., at the level of the available ‘tool-set’); in every ape population, we observe large overlap in the gesture types used, for example ‘Present’, ‘Reach’, ‘Touch’, or ‘Arm raise’. In the study of gesturing, the focus lies on the use of this system (i.e., at the level of the ‘tool-use’); here, each individual, as a consequence of her experiences and the socio-ecological environment, may use specific gesture types a little differently. A mother whose child is a little further away, or whose desire to travel is not urgent, may deploy a ‘Reach’ gesture to solicit her infant to approach and climb on. Another mother, or the same mother in a different situation, might employ a ‘Touch’, which itself may be deployed with varying force and duration, and to different points of contact on the recipient’s body. Moreover, signal production, communicative usage, and comprehension may all show different developmental pathways, which might be in turn suggestive of different cognitive prerequisites (Liebal et al. 2013). Here, interestingly, the variation in the physical-ecological and social environment in which captive and wild chimpanzees develop may have contributed to some of the variation in findings across studies. Even if the available forms of gesture types are vertically transmitted via genetic endowment, the selection of gesture types, and the appropriate use of and response to these gestures, may still be learned and affected by development.
In other words, although some components of gestures might withstand different rearing environments, others may vary with variation in socio-ecological experiences during development (Hobaiter and Byrne 2011b; Liebal et al. 2013; Fröhlich et al. 2017). The general form of the gesture ‘Arm raise’ (i.e., moving the hand and/or arm vertically above the shoulder) will be the same across social settings and even ape species (Hobaiter and Byrne 2011a; Graham et al. 2018; Kersken et al. 2018), but its fine-grained and contextual use (specific body parts and their orientation and the context in which it is used) might differ across developmental stages, social groups, and environmental settings (Perlman et al. 2012; Bard et al. 2017; Pika and Fröhlich 2018).

Available gestural repertoires: innate and family-typical

In recent studies on gestural communication in chimpanzees and gorillas, Byrne and colleagues (Genty et al. 2009; Hobaiter and Byrne 2011a) proposed that apes’ available gestural repertoires are biologically ‘hard-wired’ and mainly derived from genetic inheritance. A group of gesture researchers, based at the University of St Andrews, have identified an array of gesture types commonly found across ape species, providing evidence that large sections of these gestural repertoires are in fact family-typical (Genty et al. 2009; Cartmill and Byrne 2010; Hobaiter and Byrne 2011a; Graham et al. 2016). These species- and family-typical repertoires of gestures are consistent in basic form throughout development (for example ‘Arm raise’; Genty et al. 2009; Hobaiter and Byrne 2011a). However, they may be expressed differently by specific individuals, or in different instances of communication (for example, in the orientation of the arm and hand). While it remains possible that large species-typical repertoires of gestures could be acquired through social learning, ontogenetic ritualization, or even imitation, biological inheritance provides the most parsimonious explanation—particularly given the prevalence of genetically channeled repertoires of signals across other species, including humans (Kuhl 2003, 2004; Ruben 1997).

One criticism of the hypothesis of biological inheritance has been that, given the natural anatomical constraints, gestural repertoires across ape species will be inevitably similar irrespective of the presumed acquisition mechanism. All ape species share the same basic body plan, and there are limited possibilities to how you can move a body of this type. However, a recent exploration of chimpanzee gestures showed that only around 12% of the physically possible gesture forms were expressed in the chimpanzee repertoire (Hobaiter and Byrne 2017). Byrne and colleagues thus made a strong case for the notion that the majority of gesture types in the available ape repertoire are biologically inherited and, with an extensive overlap in repertoire across all great ape genera, their phylogenetic origin is then argued to be relatively old (Byrne et al. 2017).

When describing the available repertoire, it is challenging to discriminate different gesture types. For example, the gesture ‘Touch’, used as a label across many studies, may or may not include the gesture types: stroke, light touch, etc. (Hobaiter and Byrne 2011a). One recent study distinguished 36 forms of this single ‘gesture’ (Bard et al. 2017). Should we discriminate a ‘Hand shake’ from an ‘Arm shake’, an ‘Arm swing’ from a ‘Leg swing’? Again, there is no ‘correct’ approach. The appropriate level of discrimination depends on the question being asked. One approach employed by Hobaiter and Byrne (2017) is to use ape behavior to guide the process. If apes employ two ‘types’ of gesture to consistently achieve the same goal, we can make the case that—from the apes’ perspective—they are a single gesture ‘type’. After splitting gesture forms to a highly detailed level (resulting in 1005 possible gesture types), gestures were lumped into ‘types’ based on consistencies in the behavioral responses of recipients, resulting in a repertoire of 81 gesture types in chimpanzees. Another approach to discriminate gesture types, which also employs the apes’ behavior, is to classify the meaning of individual signals directly, as done by Bard et al. (2017). Here, rather than exploring consistent patterns of use, gesture meaning is deciphered for every single instance of gesture use.
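The logic of lumping finely split gesture forms by the consistency of recipient responses can be illustrated with a small sketch. This is not the procedure of Hobaiter and Byrne (2017); it is a simplified illustration under stated assumptions: each observation is coded as a hypothetical (action, body part, outcome) triple, a form counts as ‘consistent’ when its most frequent outcome reaches an arbitrary share threshold (`min_share`), and forms of the same action are merged only when they share that dominant outcome. The toy observations are invented for illustration only and are not data.

```python
from collections import Counter, defaultdict


def lump_forms(observations, min_share=0.6):
    """Merge body-part variants of an action that share a dominant outcome.

    observations: iterable of (action, body_part, outcome) triples.
    min_share: hypothetical threshold for calling an outcome 'consistent'.
    Returns {(action, outcome_label): [body parts lumped into that type]}.
    """
    # Tally the outcomes that followed each finely split form.
    outcomes = defaultdict(Counter)
    for action, body_part, outcome in observations:
        outcomes[(action, body_part)][outcome] += 1

    types = defaultdict(list)
    for (action, body_part), counts in outcomes.items():
        outcome, n = counts.most_common(1)[0]
        if n / sum(counts.values()) < min_share:
            # No consistent outcome: keep this form as its own type.
            outcome = f"unresolved/{body_part}"
        types[(action, outcome)].append(body_part)
    return {key: sorted(parts) for key, parts in types.items()}


# Invented toy observations (illustration only).
toy = (
    [("shake", "hand", "move away")] * 3
    + [("shake", "arm", "move away")] * 2
    + [("shake", "arm", "play")]
    + [("swing", "arm", "play")] * 2
    + [("swing", "leg", "follow")] * 2
)

types = lump_forms(toy)
for key, parts in types.items():
    print(key, parts)
```

In this toy case the ‘Hand shake’ and ‘Arm shake’ forms collapse into one type because both mostly precede the same outcome, whereas ‘Arm swing’ and ‘Leg swing’ stay separate. The result depends on the chosen threshold and on how outcomes are coded, which echoes the point that the appropriate level of discrimination depends on the question being asked.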

Gestural usage: shaped by interactional experiences

Evidence that the available gestural repertoires of great apes are largely innate (Byrne et al. 2017) does not prevent considerable modification of and flexibility in gestural usage throughout an individual’s lifetime (Hobaiter and Byrne 2011b; Pika and Fröhlich 2018). Previous studies in both captive (Tomasello et al. 1989, 1994, 1997; Schneider et al. 2012a, b; Bard et al. 2014b, 2017) and wild settings (van Lawick-Goodall 1968; Plooij 1978; Hobaiter and Byrne 2011b; Fröhlich et al. 2016b, c, 2017) suggested that the development of gesture usage in chimpanzee infants is linked to entering their social world and the opportunities it affords to interact with conspecifics. Given that communication takes place in a wide range of social and physical (ecological) environments, in many behavioral contexts, and over an individual’s lifetime, it is likely that individuals rely on input from their social environment before communicative skills fully manifest (Liebal et al. 2013). For instance, Bard et al. (2014b) examined gestural ontogeny in nursery-reared chimpanzees and suggested that the majority of gestures used by individuals emerge through “co-construction”, that is, through social interactions based on shared communicative meaning, which may differ as a function of context. In a study carried out in two communities of wild chimpanzees, Fröhlich et al. (2016c) found evidence for considerable inter-individual variation in the mothers’ gestural repertoires used to initiate joint travel with their offspring. Another study focusing on three different communicative contexts (food-sharing, joint travel, and social play) examined the role of social exposure, namely behavioral context, interaction rates, and maternal proximity, for infant gestural production (Fröhlich et al. 2017). Interestingly, the rate of previous interaction with conspecifics, but not with their mothers, had a positive effect on gestural frequency and repertoire.
Indeed, the number of gesture types used by infants (aged between 9 and 69 months) increased with the number of interaction partners in the previous month of life. The empirical link between social exposure and gestural performance suggests that learning via repeated social interactions shapes the communicative development of gesturing in young apes (see also Bard et al. 2014b). While the mother-infant relationship is critical for normal social development (Maestripieri 2009), early socialization in the wider social environment seems to be essential to develop social competency later in life (Parker and Asher 1987; Hamilton 2010). In sum, accumulating evidence from across different studies and sites suggests that communicative development is reliant on the infants’ early social environment (e.g., van Lawick-Goodall 1968; Hobaiter and Byrne 2011b; Fröhlich et al. 2017).

The developmental trajectory in gestural communication

In contrast to the mechanisms of gesture acquisition in great apes, the age of emergence and the developmental trajectory of gesturing have received less attention to date. Longitudinal studies of ontogenetic trajectories are still rare, especially for great apes living in their natural environments. In the first longitudinal study of chimpanzees at Gombe, Lawick-Goodall (1967; see also Goodall 1986) noted that non-vocal signals in the first few months of life are limited to variations in body contact for mother-infant coordination. Plooij (1978), later focusing on communicative development in the same community, observed a gradual transition towards goal-oriented and voluntary (‘illocutionary’) communication in chimpanzees between 9 and 12 months, in a similar manner to human infants (Bates et al. 1975, 1979). During this transition, chimpanzee infants gradually began to deploy intentionally communicative gestures to influence the behavior of conspecifics and to initiate interactions such as play and grooming.

In a captive setting, Schneider et al. (2012a) investigated gestural onset and the emergence of tactile, visual, and auditory gesturing across great ape genera (Gorilla, Pan, Pongo). As seen in wild chimpanzees, infants of the three African ape species (chimpanzee, bonobo, and lowland gorilla) started gesturing towards the end of their first year. Orang-utan infants showed a later onset, only starting to gesture at around 15 months of age, perhaps reflecting their slower life histories (Wich et al. 2004; van Noordwijk and van Schaik 2005). While tactile and visual gestures emerged at around the same time and were used in similar proportions in the first months of gesturing, auditory gestures emerged significantly later in the African ape genera and were not observed in the orang-utan infants studied (Schneider et al. 2012a). A study on chimpanzee infants reared by human caretakers in a peer-group nursery found that in interactions with caretakers, some gestures emerged even before the age of 9 months and at different ages for different contexts, suggesting that specific cognitive mechanisms played no major role (Bard et al. 2014b). Taking snap-shots of development at different stages in ontogeny, Tomasello et al. (1997) found that the gestural repertoire increased until the age of 5–6 years and decreased again afterwards, with variation in the use of gestures within different behavioral contexts. For example, gestural requests for grooming and food were used throughout development, the use of gestures in certain contexts vanished after infancy (e.g., nursing), and others typically emerged in older infants (for example, gestures for aggression and sex were mainly employed after reaching 3–4 years old; Tomasello et al. 1997). 
Some findings from captive settings have been complemented by studies on different chimpanzee communities in the wild, showing that young chimpanzees undergo a developmental shift from actions and tactile gestures to visual communication (Plooij 1978; Fröhlich et al. 2016c), and an increase in auditory communication with infant age (Fröhlich et al. 2016b). This incorporation of visual and auditory signals may reflect the infant apes’ increasing physical and social independence from their mothers. As young apes start to spend time out of physical contact with their mothers, there are more opportunities for non-contact communication.

Infant apes combine their gestures into sequences both as rapid series of gestures (without response waiting and sometimes overlapping) and as strings of gestures that include response waiting followed by further persistence in gesturing. Almost half of the gestural signals produced by infant apes are within rapid series of multiple gestures, but their use decreases throughout development (Hobaiter and Byrne 2011b). In contrast, the use of strings of gestures to persist in communication peaks in juvenile individuals, before again decreasing in maturity (Liebal et al. 2004a; Hobaiter and Byrne 2011b). Hobaiter and Byrne have suggested that the rapid series of gestures may be a mechanism through which young apes can explore their large repertoires, learning to employ the most efficient signals (“repertoire tuning”). One area that remains to be explored is whether and how gestural development interacts with vocal development and the ontogeny of other modes of communication (e.g., facial expressions, gaze, species-specific sexual signals). Enculturated great apes raised in human environments and taught to employ signs from American Sign Language continue to employ their naturalistic gestures (e.g., McCarthy et al. 2013). Further research on great apes living in different social groups and environments, implementing both cross-sectional and longitudinal study designs (e.g., Fröhlich et al. 2018), will help to shed light on the development of gesture as part of a multimodal system.

The effects of context and sex on early gestural communication

Previous research on gestural development suggests that social play is the major context of gesture usage in young African apes (Tomasello et al. 1997; Genty et al. 2009; Hobaiter and Byrne 2011a; Schneider et al. 2012a). Play interactions with peers and other ‘non-mother’ individuals may serve as an essential platform for experimentation, on which great apes can explore the effectiveness of intentional gestures that gain fundamental importance in their adult life (Fröhlich et al. 2016b). Feeding represents another important context in gestural development, with young apes regularly employing their gestures to solicit food transfers (Pika et al. 2003, 2005; Fröhlich et al. 2017), especially in orang-utans (Bard 1992; Schneider et al. 2012a). Both play and feeding contexts may incorporate communicative exchanges related to desirable objects (e.g., Pika and Zuberbühler 2008; Hobaiter et al. 2014). These represent ‘triadic’ interactions, involving a signaler, recipient, and a third entity—prerequisites for the development of referential communication (i.e., communicative acts referring to external entities or events; Leavens et al. 2005a; Pika 2012).

Recent studies of chimpanzee development have highlighted sex differences in the importance of early socialization (Murray et al. 2014). In the fission-fusion social structure characteristic of wild chimpanzees (Nishida 1968; Aureli et al. 2008), the mother can actively influence the offspring’s social environment through selective subgrouping (Lonsdorf et al. 2014a). From a very early age, infant male chimpanzees in particular seem to exploit these social opportunities: the number of males’ social partners increased with increasing age and distance from the mother (Lonsdorf et al. 2014a, b). These social differences are reflected in sex differences in infant chimpanzee gesturing. For example, male infants deployed more contact gestures than females to solicit play (Fröhlich et al. 2016b) and request food transfers (MF et al. unpubl. data) and, after controlling for age, used a larger variety of gesture types (Fröhlich et al. 2017).

Outstanding questions: multimodality in the ontogeny of ape communication

In the field of animal communication, developmental work has tended to focus on either the vocal or gestural modality independently, with the bulk of work on acquisition carried out on song learning in songbirds (e.g., Marler 1997; Brainard and Doupe 2002; Beecher and Brenowitz 2005). Studies of vocal development in birds and mammals have demonstrated that individual experiences accumulated through social interactions (e.g., responses of conspecifics) can play a substantial role by introducing new sounds into individuals’ repertoires and encouraging improvisation (Snowdon and Hausberger 1997). As discussed earlier, previous research has explored the developmental trajectories of different sensory modalities within ape gesturing (Schneider et al. 2012a; Fröhlich et al. 2016b, c). However, it is crucial to keep in mind that gestures represent part of apes’ larger repertoire of communicative signals, which includes vocalizations and facial expressions (Liebal et al. 2013). For a more thorough understanding, it is critical to investigate communicative development in a holistic fashion, across production modes and sensory modalities (Liebal et al. 2013; Hobaiter et al. 2017; Fröhlich and van Schaik 2018).

In primates, little is known about whether and how the developmental trajectories of multimodal signals (in which two or more components of different sensory modalities must be produced together in order to produce an individual signal) and multimodal signal combinations (in which two distinct signals, which incorporate different sensory modalities, are flexibly coupled) differ from unimodal signaling (Liebal et al. 2013; Bard et al. 2014b; Gillespie-Lynch et al. 2014). Some developmental research on multimodal integration has focused on audio-visual perception in human and non-human primates, whereas multimodal production remains understudied (reviewed in Partan 2013). Even less is known about the development of multimodal signal combinations (Fröhlich and van Schaik 2018). Early explorations of a multimodal or multi-signal approach to chimpanzee communication have found strong effects of age on signal choice (Hobaiter et al. 2017; Wilke et al. 2017), with a bias towards gestural communication in early infancy (e.g., Gillespie-Lynch et al. 2013; Hobaiter et al. 2017; Fröhlich et al. 2018). In light of an increasing body of work that demonstrates a substantial impact of social experience on socio-cognitive and communicative development (Snowdon and Hausberger 1997; Laporte and Zuberbühler 2011; Bard et al. 2014b; Fröhlich et al. 2017; Katsu et al. 2017), we should strive to understand the role of learning and social experience in both unimodal and multimodal signal production (see also Higham and Hebets 2013).

The ‘backup signal’ hypothesis, initially invoked by behavioral ecologists, implies that the different components of multimodal signals are redundant; that is, they individually elicit the same response in the receiver (Møller and Pomiankowski 1993; Partan and Marler 1999). Similarly, multimodal signal combinations might be part of a learning process in communicative development in which the immature ape learns to deploy context-appropriate communicative tactics by first using redundant signals sequentially and/or simultaneously (Liebal et al. 2013; Fröhlich and van Schaik 2018). Some support for this explanation comes from studies on chimpanzees. As described above, Hobaiter and Byrne (2011b) found that chimpanzees gradually shift from initially long and largely redundant gestural sequences to selecting more effective single gestures as adults. A recent study on joint travel initiation in mother-infant pairs suggested a developmental shift from multimodal (audible ‘hoo whimper’ vocalizations combined with visual-silent gestures) to unimodal signaling (visual-silent gestures only) in infant chimpanzees (Fröhlich et al. 2016c). There appear to be many more gestures in great ape repertoires than meanings for which they are used (Hobaiter and Byrne 2014; Graham et al. 2018); this redundancy may offer signalers the opportunity to select different sensory modalities in which to communicate similar information. However, the restricted range of meanings described might also result from how observers currently classify ‘meaning’ (for example: requiring a visible behavioral change by the recipient) rather than from a naturally constrained set of meanings (Hobaiter and Byrne 2014; Bard et al. 2017). Apart from the sensory modality in which information is transmitted, the type of information is also a key consideration; for example, bonobos employ gestural signals to differentiate the context in which an ambiguous vocal signal is used (Genty et al. 2014). 
In chimpanzees, while all vocalizations and some gestures convey information in the auditory modality, vocalizations (and possibly buttress-drumming gestures; Arcadi et al. 1998) also encode the identity of the signaler. Flexibility in the choice of signal or signal modality allows signalers to be selective in their use depending on the potential risk (or benefits) of ‘eavesdroppers’ acquiring the information being transmitted (Hobaiter and Byrne 2012; Hobaiter et al. 2017).

An alternative explanation for the combination of signals and modalities is proposed by the theories of refinement and complementarity (e.g., Partan and Marler 2005; Jacob et al. 2011; Genty et al. 2014; Hobaiter et al. 2017; Fröhlich and van Schaik 2018). Recent studies of chimpanzee and bonobo communication suggest that vocal and gestural signals are not used interchangeably. Chimpanzee gesture-vocal signal combinations were more likely to elicit a behavioral response than vocal signals produced alone, but not as compared to gestural signals produced alone (Wilke et al. 2017). Similarly, chimpanzees were more likely to switch to gesture-vocal combinations following the failure of a vocal signal but not a gestural one (Hobaiter et al. 2017).

If the different signal types or signal components of multimodal communication are combined in order to refine or complement a core message, then we would predict that single components and signals precede the use of more complex communication during development. However, substantial comparative work focusing on the ontogeny of multimodal production in nonhuman primates is needed to reveal what role multimodal signal combinations play throughout development, and across social roles, which also change across ape lifetimes.

Outstanding questions: the impact of the socio-ecological environment

Recently, it has been emphasized that conceptual frameworks must start to consider the impact of signal efficacy and receiver perception on animal multimodal communication (Higham and Hebets 2013), including prior social experience (Hebets and Vink 2007) and ecological factors (Munoz and Blumstein 2012). Despite growing evidence for the impact of social exposure on socio-cognitive and communicative development (Snowdon and Hausberger 1997; Laporte and Zuberbühler 2011; Leavens and Bard 2011; Bard et al. 2014b; Fröhlich et al. 2017; Katsu et al. 2017), the role of previous interactional experiences, not only for gestural signaling but for animal communication in general, remains poorly understood (see also Higham and Hebets 2013). Sociality is thought to favor communication via multiple channels, as the interacting individuals are spatially close enough to see, hear, smell, and/or touch each other (Partan and Marler 2005). Multimodality may thus be particularly suitable for the relatively short-distance communication typical of many group-living species of primates (Marler 1965). Fröhlich et al. (2018) explored the developmental trajectories of established behavioral markers of intentional communication in apes in a study of infants’ gestures, actions, vocalizations, and ‘bi-modal combinations’ (a gesture plus vocalization). The authors found that the use of audience checking, goal persistence, and sensitivity to the recipient’s visual orientation increased with infant age. However, context, interaction partner, and group membership (study site) also impacted the selection of signal types, as well as the behavioral markers of intentional communication, strongly suggesting that the social environment needs to be considered in studies of communicative development (Fröhlich et al. 2018).

Another, often neglected, factor in studies on the functionality of ape communication is the physical-ecological environment in which it occurs. The physical environment profoundly impacts the production, transmission, and reception of different modes of information (whether visual, auditory, or olfactory). Although environmental selection pressures on signal efficacy have mostly been discussed for non-primate communication (Partan 2013; Halfwerk et al. 2014; Halfwerk and Slabbekoorn 2015), a number of studies suggest that the physical environment should be considered more often in primate communication. Some aspects of the signaling environment may be obvious: dense vegetation, as found in rainforest habitats, presents a significant obstacle to the transmission of visual information. But rather than a cost, these barriers to information transmission may provide an opportunity. For example, as noted above, chimpanzees adjust their use of gestures to limit the transmission of information to a specific audience (Hobaiter and Byrne 2012; Byrne et al. 2017). The impact of the environment on long-distance acoustic signals, such as chimpanzee pant-hoots, has been suggested as a possible explanation for some group differences in the acoustic structure of vocalizations (Mitani et al. 1999). Conversely, variation in habitat acoustics was ruled out as a possible explanation for the variation in howler monkey (Alouatta spp.) loud call volume (Dunn et al. 2015). Other aspects of habitat variation may be more subtle. For example, buttress drumming by wild chimpanzees is a striking multimodal signal, in which the auditory components are thought to encode individual identity (Arcadi et al. 1998), which can be transmitted over 1 km even in dense rainforest. 
However, the production of these drumming signals (and perhaps the extent to which detailed information can be encoded within them) relies on the presence and distribution of large-buttressed tree species, which are typically present throughout primary rainforest but may be sparsely distributed or even absent in secondary or mountain-forest habitat. Although a study of limb laterality in chimpanzee gesturing found no effect of whether gesturing occurred in a terrestrial or arboreal setting (Hobaiter and Byrne 2013), a comparison of limb choice in the production of the same gesture types found greater flexibility in limb choice by the more arboreal orang-utan as compared to the more terrestrial chimpanzee (A. Knox et al. unpubl. data).

One feature unique to gestural, as opposed to vocal or facial, communication is the opportunity for individuals to flexibly deploy an alternative signal of a different modality within the same category of signal. For example, all vocalizations contain both audible and visual information, and facial expressions contain visual-silent information, but a signaler can choose to selectively deploy a silent-visual gesture (“Arm raise”), or a visual-audible gesture (“Knock object”), or even a visual-audible-tactile gesture (“Slap other”). Given the very large repertoires of signals, and the apparent redundancy in the meanings for which they are deployed (Hobaiter and Byrne 2014; Graham et al. 2018), gestures offer signalers the opportunity to adjust their signaling to accommodate moment-to-moment variation in the physical-ecological environment (as well as in their social one).

A key challenge is to accurately attribute differences in communicative patterns across individuals, social groups, and species of great apes to genetic, social, or physical-ecological factors. Recently, Fröhlich and van Schaik (2018) proposed that social and physical effects of the environment on ape multimodal communication could be teased apart by observing great apes living in different research settings. For example, orang-utans, as an arboreal species, are assumed to rely on tactile and auditory rather than visual signals, because of the restricted vision in their habitat compared to terrestrial species (Marler 1965; Liebal et al. 2006). Although generally considered “semi-solitary”, the Bornean (P. pygmaeus) and Sumatran (P. abelii) species are thought to differ substantially in sociability and behavioral variants (van Schaik et al. 1999, 2009). In contrast, in captive settings orang-utans are exposed to a highly social and (semi-) terrestrial lifestyle similar to that of chimpanzees and bonobos. These diverse settings provide the opportunity to examine to what extent the socio-ecological environment is linked to signal usage in non-human great apes. A key step in achieving this goal is increasing collaboration among researchers working on great ape communication, particularly where we are able to reconcile variation in terminology and approach.

Conclusion

In this review, we provide an overview of recent work on gestural ontogeny in great apes. We suggest that apparently disparate views on the fixed or flexible nature of ape repertoires may be largely reconciled by considering them to be different levels of explanation and that subtle differences in the use of terminology across studies and fields may be at the root of apparently contradictory findings. A gesture type may be species-typical, but its specific expression in day-to-day gesturing may be highly flexible. The ‘repertoire’ of two individuals may differ dramatically when measured over a month, or even a year, but may match when measured over a lifetime. While available repertoires appear largely innate and species-typical, inter-individual differences in gesture usage suggest an important role for learning, mirroring the current state of knowledge on primate vocalizations (Cheney and Seyfarth 2018). For any particular instance of gesturing, individual and social variables (including at least: partner identity, age, sex, rank, physical location, visual attention, social and biological relationship, and the presence of bystanders), as well as the behavioral context, determine which gestures are selected from the communicative tool set and how they are deployed. The increasing evidence for the impact of the social environment on gesturing and the need for further evidence on the impact of the physical-ecological environment on communication represent both a challenge and an opportunity for comparative studies of behavior and cognition.

To develop a more thorough understanding of the socio-ecological factors involved in gesture use, we can make use of an explicitly multimodal multicomponent approach. More holistic, comparative work focusing on the ontogeny of primate communication is needed to reveal what role multimodal signal combinations play in development. In turn, this might shed new light on the cognitive processes underlying ape communication, allowing us to gain more insight into the evolutionary continuity between non-human and human multimodal communication.