1 Introduction

Relatively little scientific attention has been directed towards the innovative use of animal signals in novel environmental contexts (Hopkins et al. 2007a, b). In particular, primate calls have long been characterized as inflexible, reflexive, biologically determined systems over which animals exert little to no voluntary control (e.g., Arbib et al. 2008; Hauser 1996; Smith 1977). Indeed, it is this alleged inflexibility of calls that is taken as evidence for various versions of the gestural theory of language origins (e.g., Arbib 2005; Arbib et al. 2008; Corballis 1991, 2002; Donald 1991; Hewes 1973). According to these theorists, it is only in the manual gestures of great apes, communicating in the visual modality, that we find evidence for levels of voluntariness in signalling that approach the voluntary control manifest in modern human speech.

Contrasting with this view are a variety of vocal origins theories, which either argue or assume that language evolved within the auditory–vocal modality (e.g., Deacon 1997; Dunbar 1996; Fitch 2000; Knight 2008; Zuberbühler 2005). Of particular relevance to our argument, Dunbar (1996) has postulated that, as social group sizes increased in hominid evolution, our ancestors developed “vocal grooming” to service affiliative relationships beyond the relatively limited numbers of individuals who could be physically groomed, given the chronic constraints on time. Among primates, species with larger group sizes tend to have larger vocal repertoires, with many researchers noting the strong relationship between measures of social complexity and complexity in call systems (e.g., Freeberg et al. 2012; McComb and Semple 2005). With respect to the origins of language, the unprecedented social complexity of our hominid ancestors is considered to have been an ecological determinant of increased vocal complexity in our lineage (Dunbar 1996).

Dunbar’s (1996) hypothesis was developed to address one of the many questions pertaining to the earliest origins of human language: what was its adaptive function? As Bateson (1972a, b) pointed out so many years ago, the function of language cannot be, in any simple sense, “to communicate”. In his words (Bateson 1972a, p. 411), “There is a general popular belief that in the evolution of [humans], language replaced the cruder [communicative] systems of the other animals. I believe this to be totally wrong”. Based on cybernetic and evolutionary principles, Bateson argued that if language had evolved to supplant the functions of communication, then humans’ non-verbal expressive repertoires would necessarily decay under the repeated scything of natural selection. Yet, in reality, humans have exquisitely subtle capacities for non-verbal expression, through facial expressions, dance, gestures, and systems of touch (e.g., massage techniques). Moreover, it is not the case that animal communication systems are, inherently, maladaptively primitive. Hence, language poses a deep paradox: it is used to communicate, but it has not functionally replaced non-symbolic modes of communication. The adaptive context in which language arose, thus, remains mysterious from a functional point of view: on the one hand, animals communicate perfectly well without language or speech; on the other hand, humans can also communicate effectively, with great impact, without speech, so what is language “for”? Dunbar’s (1996) insight echoes Bateson’s frequent refrain that communication, writ large, is about relationships (e.g., Bateson 1972b).

Dunbar’s theory arose from his study of gelada baboons; he noticed that these social animals, which live in very large, multilevel communities of several hundred individuals, spend up to 20 % of their waking time grooming each other (e.g., Dunbar 2010, for review). Grooming is, fundamentally, about relationships. Social grooming (allogrooming) is a universal feature of the social lives of primates (e.g., Smuts et al. 1987). Grooming is a significant component in the maintenance of friendly social relationships and in the resolution of conflict between erstwhile combatants (Aureli and de Waal 2000). Grooming is much more than merely a mechanism for maintaining skin and coat: there are profound endocrinological (hormonal) effects of receiving grooming from others (Keverne et al. 1989). Dunbar noticed that as group size increases among social primates, so does the amount of time devoted to grooming. Above a certain community group size (~150 individuals, in Dunbar’s thesis), the demands of maintaining relationships begin to conflict with other survival needs, such as time spent foraging. Dunbar postulated that affiliative intentions could, in this circumstance, be communicated via vocal–auditory means, leaving the hands free for foraging. Thus, Dunbar’s theory describes a functional replacement of the grasping and stroking manual actions deployed in grooming bouts to a call-mediated system of relationship maintenance.

Corballis has long been concerned with the functional neuroscience of manual activity and the cerebral, asymmetrical specializations for speech (e.g., Corballis 1991, 2002). For Corballis, it is no coincidence that (a) the vast majority of humans are right-handed (and therefore left-hemisphere dominant for manual action) and (b) the vast majority of humans are left-hemisphere dominant for speech. Corballis (2002) posits that, in the human lineage, the intentional control of manual gestures (and other manual actions) that is manifest among great apes, and was therefore presumably present in the last common ancestor of humans and the other great apes, was acquired by components of the oral cavity, from the lips to the tongue and, eventually, by the larynx (voice box). Indeed, for Corballis, phonemes (the constituent sound units of speech) are occult gestures. Thus, while Dunbar emphasizes the transition from manual grooming to speech, Corballis emphasizes a transition from manual gesturing to speech. Both theories ground the origins of human speech in intentional manual action.

In this chapter, we elaborate the multimodal theory of speech origins through exploration of an intriguing intersection between Corballis’s (2002) theory of the gestural origins of language and Dunbar’s (1996) theory of the origins of speech as the vocal maintenance of grooming/affiliative relationships in complex social environments. Rather than focusing exclusively on either a vocal or gestural origins view of language evolution, some researchers, including ourselves, posit various versions of multimodal (vocal–auditory; visual–gestural) hypotheses of language origins (e.g., Hopkins et al. 2007a, b; Hurford 2007; Leavens 2003; Leavens et al. 2004; McNeill 1992; Taglialatela et al. 2011). In contrast to the strictly gestural origin or vocal origin hypotheses, multimodal origin hypotheses posit that signalling in the vocal and gestural domains coevolved as a single signalling mechanism for intraspecific communication. After a brief review of call production, we will turn to several of the lines of evidence that support a multimodal origin of language.

2 Primate Calls

Calls are produced by air inhaled or expelled through the pharyngeal (oral or nasal) cavities. The primary mechanical engine for inhalations and exhalations is the diaphragm. The air stream produced can be modulated at numerous places in the laryngopharyngeal column, including vibrations at the vocal folds, and a variety of compressions of the airstream in the supralaryngeal (above the larynx) cavities. For example, the lips might be compressed during exhalation, creating a sputtering sound, or during inhalation, creating a kissing or squeaking kind of sound. In human speech, many different consonants are created by different parts of the tongue impacting against different parts of the hard and soft palates (Fitch 2000; Owren and Rendall 2001).

The preponderance of current opinion is that the primary human/non-human animal difference in the control of this articulatory apparatus is that humans display a unique and very high degree of voluntary control over both (a) the emission of sounds with vocal cord vibrations (a.k.a. “voicing”) and (b) the degree of modulation of the airstream in the supralaryngeal cavities (e.g., Fitch 2000; Owren and Rendall 2001; but see, e.g., Lemasson 2011; Owren et al. 2011; Snowdon 2009, for recent reviews of evidence for vocal plasticity in non-human primates). Both claims have been challenged by recent findings. With respect to the assumption that primates lack control over voiced calls, we found, for example, that some chimpanzees display an apparently voluntary extended grunt—a voiced call—to attract attention to themselves (Leavens et al. 2004; Russell et al. 2013; Taglialatela et al. 2012). In addition, a recent study of a gibbon demonstrated apparent voluntary control over the physical properties of the animal’s larynx (Koda et al. 2012). Thus, emerging evidence suggests that some apes do display some apparently voluntary control over voiced calls, in some circumstances (also see Owren et al. 2011).

However, here we are concerned with the second assumption, the idea that humans have a unique ability to voluntarily modulate calls in the supralaryngeal cavities. In a recent review, Owren and his colleagues (Owren et al. 2011) have suggested that other primates do display apparent voluntary control of mostly non-voiced calls. This conclusion is consistent with our own findings that chimpanzees display a spontaneous and manifest choice over the sensory modality of their signalling behaviour, in some experimental circumstances (Hopkins et al. 2007a, b; Leavens et al. 2004, 2010). Moreover, several other lines of evidence converge on the conclusion that great apes have voluntary control over some of their calls [see Hopkins et al. (2011), for a recent review].

3 Evidence from Attention-Getting Behaviour

Evidence supporting the idea of voluntary control over calling behaviour in great apes includes the tactical deployment of both calls and manual gestures by chimpanzees who are exposed to humans in experimentally manipulated states of visual attentiveness. Thus, chimpanzees will display attention-getting calls or other sounds, if an experimenter is facing away from them, but then switch to manual gestures or other visual signals when the experimenter turns to look directly at them (Bodamer and Gardner 2002; Hostetter et al. 2001; Krause and Fouts 1997; Leavens et al. 2004, 2010; McCarthy et al. 2013). Moreover, chimpanzees choose from qualitatively different categories of calls depending on the specific circumstances; if presented with a banana placed outside their cage, but no human, they display species-typical food calls, but if an inattentive human is also present with a banana, the apes display a variety of attention-getting behaviours, including a number of calls that have not been described in these kinds of contexts in wild great apes (Hopkins et al. 2007b). Captive apes frequently face a situation in which they can see desirable items (often, but not always, food), but are literally barred from directly reaching out and acquiring these items. Apes in these situations develop tactics for capturing the attention of any humans present and redirecting their attention to the desired entities, for example through pointing. These communicative tactics permit the apes to exert influence beyond the boundaries of their enclosures. Indeed, we have argued that these kinds of contexts, which we have termed the Referential Problem Space, are almost completely absent from the environments of wild apes (e.g., Leavens et al. 1996, 2005, 2008); wild chimpanzees are only rarely subject to situations in which their instrumental goals on distal objects, such as object retrieval, can only be met through the communicative manipulation of other chimpanzees [see Hobaiter et al. (2013), for rare examples of such contexts among wild chimpanzees]. In contrast, both captive apes and human infants face long daily epochs in which they are physically restrained, and in this context, in the Referential Problem Space, both apes and human children develop communicative tactics for the manipulation of social agents to meet their instrumental goals (Leavens et al. 1996, 2005, 2008). Thus, chimpanzees choose the modality of their signals in accordance with context-specific communicative demands, using auditory signals to capture the attention of visually inattentive humans.

4 Evidence from Attention-Getting Calls

There is considerable inter-individual variability in the attention-getting calls that chimpanzees use when they are soliciting the attention of humans (reviewed by Hopkins et al. 2010, 2011). Recently, Taglialatela et al. (2012) have demonstrated that offspring of captive chimpanzees tend to acquire and use the attention-getting calls of their mothers, and do so significantly more than siblings who are equally related to those mothers but were raised apart from them. Taglialatela et al. (2012) identified six attention-getting calls in their sample (see their Table 1, p. 499):

  • extended grunts (voiced, atonal sounds produced by the chimpanzees with an open mouth);

  • kisses (produced by inhaling air through pursed lips);

  • lip smacks (produced by placing upper and lower lips tightly together then pulling them apart quickly, making an audible “pop” sound);

  • pants (audible, rapid, rhythmic sequence of inhaling and exhaling);

  • raspberries (produced by blowing air out through pursed lips); and

  • teeth chomps (produced by clacking teeth together so that the hitting together of upper and lower jaws is audible).

For purposes of the present argument, note that only extended grunts appear to be voiced, whereas the other five call types are all produced by supralaryngeal modification of the airstream. The most significant aspect of these calls, from the standpoint of this chapter, is that, with the exception of the extended grunts, they are used both in the wild and captivity in association with grooming (e.g., Ghiglieri 1988; Goodall 1986; de Waal 1982). Although we currently lack the data to address this question directly, our impression is that these calls, when used in attention-getting contexts, are amplified versions of the softer calls used during grooming sessions by chimpanzees. In more recent work, Russell et al. (2013) demonstrated that some chimpanzees can be trained to display novel attention-getting calls; thus, not only is there a growing body of a posteriori evidence consistent with the view that attention-getting calls are socially learned, but this latest study also provides a direct, prospective experimental demonstration of this capacity in chimpanzees.

5 Evidence from Patchy Distribution of Calls

Another category of evidence for flexibility in calls is the emerging evidence for geographical differences in call repertoires (van Schaik et al. 2003; Wich et al. 2012). The presence of a call in some locations, and its absence in others, within the same species, suggests that there is a social, learned component to some calls. This is a different phenomenon from group-based geographical differences in the acoustic structures of calls that are, themselves, displayed across groups; such differences are well established among some birds (e.g., Barrington 1773; Darwin 1871) and have more recently been widely reported among primate species (e.g., Crockford et al. 2004; Green 1975; Marshall et al. 1999; Wich et al. 2008). There is increasing evidence that call repertoires differ geographically in two distinct ways: categorically different call repertoires, in which specific calls are present or absent in different populations, and contextually different uses of calls in different populations.

Among orangutans, for example, van Schaik, Wich and their colleagues have demonstrated that three calls (raspberries, kiss squeak with hands, and kiss squeak with leaves) are patchily distributed among disparate study populations (van Schaik et al. 2003; Wich et al. 2012). For example, raspberries, which are bilabial fricatives associated in this species with nest-building, are reportedly absent from four of six sites studied, but present at two sites, one on the island of Sumatra, the other on Borneo (van Schaik et al. 2003). Hence, these calls, made by expelling or inhaling air through slightly compressed lips, are modulated supralaryngeally; they are not automatic emissions tied to particular contexts in this species. More recent work has demonstrated that these calls are distributed independently of the genetic relatedness of individuals who display them (Wich et al. 2012). Wich et al. (2012, p. e36180) concluded that “[o]rang-utans occasionally invent calls with an arbitrary acoustic structure”.

6 Evidence from Language-Trained Apes

Early scientific attempts to teach apes to speak were largely ineffective. In the late nineteenth century, Garner (1896) reported that a chimpanzee could articulate the French word “feu”. Witmer (1909) reported that a chimpanzee named Peter could, with difficulty, articulate the word “Mama”, on demand. Similarly, Furness (1916) described an orangutan, also named Peter, that could articulate “Papa” and “cup”. Hayes and Hayes (1954) reported that a chimpanzee named Viki could articulate four words, “Papa”, “Mama”, “cup”, and “up”. These very modest findings underscore the apparent difficulty apes have in displaying speech, but they do also highlight that apes can produce novel articulations on demand.

Hopkins and Savage-Rumbaugh (1991) demonstrated that Kanzi, a language-trained bonobo, displayed a vocal repertoire that differed acoustically from those of other captive, but not language-trained, bonobos. More recently, Taglialatela et al. (2003) identified semantic categories in Kanzi’s idiosyncratic vocalizations. Thus, Kanzi displays substantial innovation in his use of vocal signals.

There are numerous and long-standing reports of apes smoking (e.g., Kearton 1925), and, more recently, Perlman et al. (2012) have documented the ability of another language-trained ape, the gorilla, Koko, to make sounds with such musical instruments as harmonicas and recorders. Recently, Kanzi has demonstrated the ability to inflate balloons by mouth (Daily Mail 2010). This body of evidence demonstrates that apes have voluntary control over their breathing apparatus, the engine for making sounds, and the fronts of their oral cavities.

7 Evidence from Oro-Facial Asymmetries

Many calls are typically accompanied by characteristic facial expressions. One tactic to assess asymmetries in cerebral function is to evaluate asymmetries in the facial expressions that accompany particular calls. For example, Hauser (1993) reported more rapid retraction of the lips on the left side of the faces of rhesus monkeys, compared to the right side, during emotionally aggressive facial expressions, implying right-hemisphere dominance in these facial expressions (see also Hook-Costigan and Rogers 1998). Among great apes, Hopkins and his colleagues have reported similar asymmetries, demonstrating apparent right-hemisphere dominance during emotional displays (e.g., Fernández-Carriba et al. 2002).

Some calls, however, are associated with oro-facial asymmetries in the opposite direction, implicating left-hemisphere dominance (Losin et al. 2008). In particular, as noted above, this class of calls is distinguished by use as attention-getting signals in ecologically novel, captive environments. Thus, the facial expressions associated with the calls that captive chimpanzees use to attract the attention of human experimenters who are looking away from them tend to display a strikingly different pattern of oro-facial asymmetry, compared to most other calling contexts.

Two of these calls are the raspberry and the extended grunt. While these calls have been reported in ape repertoires in the wild (chimpanzees: Goodall 1986; orangutans: van Schaik et al. 2003), they have not been reported to have an attention-getting function. Figure 1 depicts this pattern of left-hemisphere cerebral lateralization reported for the faces of chimpanzees displaying these attention-getting calls.

Fig. 1

The least squares mean facial asymmetry index (FAI) and 95 % confidence intervals for four calls, including two calls used in captive circumstances to capture the attention of humans (raspberry, extended grunt) and two calls not used in this specific, ecologically novel context (pant-hoot, food bark). See Losin et al. (2008) for complete method, but, in short, this technique involves measuring the areal asymmetries in the left and right sides of the oral cavity at the point of its maximum opening; thus, negative numbers reflect greater oral exposure on the left side of the face, implicating right-hemisphere dominance, and positive numbers, conversely, imply left-hemisphere dominance. Reprinted with permission from Losin et al. (2008, p. e2529; doi:10.1371/journal.pone.0002529.g002)
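
The sign convention in this caption can be illustrated with a minimal sketch. The caption gives only the direction of the index, not its formula, so the normalized difference used below, (right − left) / (right + left), is an assumption chosen to match that convention; Losin et al. (2008) should be consulted for the actual computation. The function names and the example areas are hypothetical, and "left" and "right" refer to the animal's own face.

```python
def facial_asymmetry_index(left_area: float, right_area: float) -> float:
    """Return an illustrative facial asymmetry index (FAI).

    ASSUMPTION: a normalized difference of the exposed mouth areas,
    (right - left) / (right + left). This is not taken from the chapter;
    it is chosen only because it reproduces the caption's sign convention.
    """
    total = left_area + right_area
    if total <= 0:
        raise ValueError("combined mouth area must be positive")
    return (right_area - left_area) / total


def implied_hemisphere(fai: float) -> str:
    """Map the sign of the index to the hemisphere it implicates.

    Negative values (more exposure on the left side of the face) implicate
    the right hemisphere; positive values implicate the left hemisphere.
    """
    if fai > 0:
        return "left"
    if fai < 0:
        return "right"
    return "none"


# Hypothetical measurements (arbitrary units), for illustration only.
fai = facial_asymmetry_index(left_area=60.0, right_area=40.0)
print(fai, implied_hemisphere(fai))
```

Under this assumed formula the index is bounded between −1 and 1, which makes values comparable across calls that differ in how widely the mouth opens.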

8 Evidence from the Neuro-Functional Foundations of Attention-Getting Calls

Positron emission tomography (PET) studies of chimpanzee brains during communication have revealed activation of the left inferior frontal gyrus (IFG), including regions identified in human brains as Broca’s area, among other areas (Taglialatela et al. 2008, 2011). Broca’s area has long been identified as a crucial component of humans’ ability to produce articulate speech. In the first of these studies, Taglialatela et al. (2008) reported that chimpanzees displayed activation of these anatomical homologues of human speech production areas during vocal and gestural communication, although the independent contributions of vocal and gestural signalling to this activation could not be identified. Subsequently, these same authors compared two chimpanzees who displayed gestures, but none of the attention-getting calls identified above, with two chimpanzees who did display these attention-getting sounds (Taglialatela et al. 2011). They found that the chimpanzees who displayed attention-getting calls also showed more activation in the left IFG, relative to the two chimpanzees who did not display attention-getting calls, suggesting a unique association of attention-getting calls with a region of the brain that, in humans, is devoted to intentional communication.

9 Evidence from Cerebral Asymmetries

It has been known for a long time that in human populations, Broca’s area and Wernicke’s area, critical for production and comprehension of speech, respectively, are usually larger in the left cerebral hemisphere than in the right cerebral hemisphere (e.g., Foundas et al. 1998; Geschwind and Levitsky 1968). Hopkins and his colleagues (Cantalupo and Hopkins 2001; Hopkins et al. 1998) and others (e.g., Gannon et al. 1998) have demonstrated that these “language” areas can be identified in the brains of great apes, and they are also asymmetrically larger, on average, in the left cerebral hemispheres of these close human relatives, although not every study of Broca’s area homologues in great ape brains finds this asymmetry, suggesting that the degree of asymmetry, here, is less robust in great apes than in humans (e.g., Meguerditchian et al. 2012; Schenker et al. 2010). In related work, there is some evidence that chimpanzees display a weak but significant right-hand bias for bimanual grooming, implicating a left-hemisphere dominance for this activity (e.g., Hopkins et al. 2007a).

10 Evidence from Comparative Neurobiological Studies

There are important cortical regions, nuclei and cranial nerves involved in oro-facial motor control and control of the vocal folds. Notably, the trigeminal, facial and hypoglossal nuclei directly innervate the muscles involved, and recent comparative studies in primates have shown that there are qualitative changes in their volume and architecture between humans and apes compared to monkeys. For example, Sherwood et al. (2005) compared the volume and grey level index (GLI) of these three nuclei in a sample of 47 species of primates and found that the facial nuclei of great apes and humans (after scaling for overall medulla size differences) were significantly larger than predicted for all primates. These authors suggested that these differences may be related to potential differences in oro-facial motor control associated with communication or emotional expressions. In a related study, Sherwood et al. (2004) examined the laminar distribution and density of Brodmann’s area 4 (BA4) in several catarrhine primate species including macaques, baboons, apes, and humans. The part of BA4 located within the ventral portion of the precentral gyrus has been implicated in oro-facial motor control. Humans and great apes showed relatively greater thickness within layer III and lower cell volume densities compared to the Old World monkeys. The lower cell densities were interpreted to suggest that there was greater spacing between neurons within the region, providing for greater cortical–cortical connectivity between BA4 and other brain regions. The collective findings suggest that there is enhanced neural representation of cortical control of the oro-facial musculature of chimpanzees, relative to other primates. We suggest that this increased cortical representation may allow chimpanzees and other great apes to learn new sounds such as the attention-getting sounds discussed in this chapter.

11 Summary of Evidence and Relation to Corballis and Dunbar

Thus, there is a class of calls displayed by chimpanzees that involve apparently voluntary control over the respiratory apparatus and over a variety of post-laryngeal modifications of the airstream, that are apparently amplified versions of sounds made during grooming, that display a reverse patterning of cerebral dominance compared with most vocalizations, and that are ontogenetically adapted for use in ecologically novel experimental contexts in which chimpanzees are dependent upon humans to act on the world for them (the Referential Problem Space). There are at least two theoretically significant aspects of this pattern of empirical results.

The first significant implication of this pattern of findings is that these sounds are amplified versions of sounds that chimpanzees make during grooming episodes. When chimpanzees groom each other, they might repetitively chomp their teeth, display low-level sputtering, smack their lips together or pant repeatedly. Dunbar (1996) has suggested that when social networks become too large for one-to-one grooming to support those networks, calling behaviour fulfils that role, and he sees in this postulate a possible socioecological mechanism that might have fostered oral communication in our hominid ancestors. The patterns we reviewed are consistent with Dunbar’s hypothesis: we find that even in the absence of natural selection, a relatively simple set of changes to chimpanzees’ ecological circumstances elicits remarkable innovation in call use, when these apes are dependent upon others to act on the world outside their cages. We propose that there is substantial, yet heretofore underappreciated, flexibility in the call systems of great apes, and we think that it is possibly no coincidence that the calls associated with grooming, which is crucial for developing and maintaining affiliative social relationships, are the calls that display the most flexibility in use. Grooming is used strategically, and therefore apparently intentionally, in great apes (e.g., Aureli and de Waal 2000; de Waal 1982). The flexibility in these call systems is manifested in the Referential Problem Space. We have previously argued that this socioecological circumstance, in which an organism is dependent upon another to act on the world for them, characterized the early developmental environments of our hominid ancestors, when babies began to be born too weak and helpless to cling to their mothers throughout the infancy period (Leavens et al. 2008, 2009). In contemporary chimpanzees, newborns are similarly weak and helpless, but rapidly develop the capacity to cling to their mothers during locomotion, and this occurs early in infancy; indeed, chimpanzees are capable of independent locomotion by approximately 5 months of age. In contrast, human babies lack this clinging capacity throughout their infancy period, and locomotor development is an extremely protracted process with a duration of several years (Adolph and Berger 2005).

The second aspect of theoretical interest is that these calls are, except for pants and extended grunts, modulated at the very top of the supralaryngeal cavity, specifically at the lips. This is consistent with the evolutionary scenario for language origins proposed by Corballis (2002); in his view, the evolution of language proceeded in our own lineage according to the following order: gestures from the hands to gestures of the mouth and then, finally, to occult gestures of the larynx in contemporary speech. It is, therefore, really quite remarkable that the flexibility in calls that we find in these close relatives of humans is largely manifested at the front of the mouth. We interpret this to be consistent with Corballis’s suggestion, and, moreover, we think that this supports the view that our hominid ancestors were preadapted for supralaryngeal modulation of calling, in the sense that Hauser et al. (2002) proposed that humans share a mosaic of communication characteristics with other mammals. Corballis’s long-standing concern with the left hemisphere as being preadapted for linguistic communication is, we think, supported by the evidence suggestive (a) of right-hand dominance for manual gestures in chimpanzees, particularly when the animals are simultaneously calling (Hopkins and Leavens 1998; Hopkins and Cantero 2003), (b) of right-hand dominance for bimanual grooming in chimpanzees (Hopkins et al. 2007a, b), and (c) of left-hemisphere dominance for speech in humans. This pattern supports the idea that the last common ancestor of great apes and humans was already left-hemisphere dominant for manual grooming, and that when the later Pleistocene growth in mean group size in the human lineage exerted the adaptive effects on relationship maintenance postulated by Dunbar (capping, in effect, the amount of time available for relationship maintenance through grooming), the left hemisphere was already preadapted for this affiliative function.

Thus, in modern apes, we find an unanticipated intersection between Dunbar’s (1996) gossip-as-grooming hypothesis and Corballis’s (2002) hand-to-mouth hypothesis. The former implies that grooming calls are those most readily adapted to new ecological circumstances, while the latter implies that the mouth is the next most flexible site for intentionally communicative signalling, after the hands. The evidence suggests that:

  (a) apes in ecologically novel circumstances tend to adapt grooming calls to novel ends, particularly when attempting to gain the attention of an otherwise inattentive social partner;

  (b) the intentionality of these attention-getting calls is well established, suggesting that the last common ancestor of apes and humans was preadapted for intentional signalling;

  (c) the left-hemisphere dominance associated with the production of these attention-getting calls presages the later left-hemisphere dominance for speech found for most humans;

  (d) the last common ancestor of humans and apes had substantial voluntary control over both their manual and oral gestures.

Hence, on the basis of these premises, we suggest that in the evolution of speech, voluntary control over significant aspects of both visual and auditory communication was already possessed by the last common ancestor of extant non-human and human apes. This ancestor was an ape that lived in the Late Miocene. If this is true, then the time available to develop the apparently uniquely human, rapid-fire, dynamic control over the larynx and the tongue is much longer than in most contemporary scenarios for the evolution of speech (e.g., Arbib 2005; Arbib et al. 2008; Corballis 1991, 2002). If the last common ancestor of humans and the other apes already had intentional control over the most rostral portion of the oral cavity, then there are approximately 6.5 million years in which to evolve the further specialized control over lingual and laryngeal structures evinced by our species. Others have noted the relative paucity of appropriate studies of wild apes to address the questions of intentional calling in great apes (e.g., Burling 1993; Owren et al. 2011; Zuberbühler 2005), but recent fieldwork on chimpanzees is beginning to demonstrate substantial apparent volitional control over their calls (see, e.g., Schel et al. 2013a, b). To the extent that this scenario is correct, the evolution of speech becomes more of an evolutionarily adaptive solution and less of a deus ex machina.