Abstract
The chapter reviews evidence for the bodily mimesis hypothesis, which states that the evolution of language was preceded by an adaptation for improved volitional control of the body, giving our ancestors advantages in the domains of imitation, empathy, and gestural communication. Much of this evidence is also shared by other gesture-first theories of language origins, but they face the problem of explaining the “switch” from a gestural (proto) language to a spoken one. The bodily mimesis hypothesis fares better with this objection, since it (a) emphasizes the non-conventionality and non-systematicity of bodily mimetic signaling, (b) posits a long biocultural spiral of conventionalization and adaptation for speech, and (c) insists that the transition to speech should be seen as only partial. Following Brown (2012), a cognitive–semiotic explanation can further be given as to why speech has eventually taken on increasingly higher communicative load: Vocalization is intrinsically less capable of iconic representation, and given a multimodal gestural–vocal communicative signal, the vocal element is bound to eventually take on the role of symbolic representation, involving higher levels of conventionality and systematicity.
Keywords
- Cognitive semiotics
- Conventionalization
- Gesture
- Iconicity
- Intersubjectivity
- Mimesis
- Multimodality
- Speech
- Symbols
1 Introduction
It is now generally accepted that the human capacity to imitate bodily actions far outstrips that of other animals, including apes (Custance et al. 1995; Call 2001). Another capacity, closely related to imitation, in which human beings excel, is intersubjectivity or empathy (Hurley and Chater 2005; Zlatev et al. 2008). Jointly, imitation and empathy function as springboards for the development of uniquely human capacities for intentional communication in childhood (Piaget 1962; Tomasello 1999; Zlatev 2013). Considerations such as these have given rise to the bodily mimesis hypothesis, stating that an adaptation for improved volitional control of the body gave our ancestors advantages in the domains of imitation, empathy, and (gestural) intentional communication. It is assumed that this paved the way for the evolution of language, with no other biological adaptations being required apart from improved vocal control (Donald 1991, 2001; Zlatev 2008a, b).
The first aim of this chapter is to spell out this hypothesis in some more detail and to sum up the empirical evidence in its favor. To some degree, both the hypothesis and the evidence for it overlap with so-called gesture-first theories of language origins (Hewes 1973; Corballis 2002, 2003; Arbib 2003, 2005), but there are some important differences, making bodily mimesis less vulnerable to the most common counterargument to gesture-first theories: Why are all current languages of hearing people predominantly spoken rather than gestural, like the signed languages of deaf communities?
The second aim of the chapter is therefore to elaborate on the possible transition from a predominantly mimetic form of communication to a predominantly symbolic one, using the vocal channel. The hurdle has appeared so great for conceptual as well as empirical reasons, namely the treatment of human language as a purely symbolic (“arbitrary”) code. It will be argued that the explanatory task appears different, and more manageable, if we instead acknowledge the inherently multimodal nature of linguistic communication, with differential roles for speech and gesture, and furthermore see speech itself not as completely arbitrary, but as having a considerable degree of sound symbolism (Ahlner and Zlatev 2010).
2 Bodily Mimesis
Donald (1991) initially proposed that bodily mimesis served a crucial role in evolution in his general theory of human cognitive–semiotic origins, defining mimesis as “the ability to produce conscious, self-initiated, representational acts that are intentional but not linguistic” (ibid: 168). In another characterization, he explicates that “it manifests in pantomime, imitation, gesturing, shared attention, ritualized behaviors, and many games. It is also the basis of skill rehearsal, in which a previous act is mimed, over and over, to improve it” (Donald 2001: 240). Crucially, it allowed a qualitatively new form of culture to emerge: “Mimesis served as a mode of cultural expression and solidified a group mentality, creating a cultural style that can still be recognized as typically human” (ibid: 261). Thus, mimesis is manifested in the evolution of the following cognitive–semiotic capacities or functions, in ways that are uniquely human.
(1) Functions of bodily mimesis:
- Learning: through imitation and teaching
- Skill: through conscious rehearsal
- Imagination and planning: through re-enactment
- Communication: through pantomime and other kinds of gesture
- Culture: through shared practices, concepts, and beliefs
What has made the bodily mimesis hypothesis attractive is that evidence from a number of different sources can be said to converge toward it. Donald (1991) appealed to the paleoanthropology, neuroscience, and gesture studies of his day. In addition, evidence from human ontogeny (Zlatev 2007), comparative psychology, “mirror neuron” neuroscience (Zlatev 2008b), and experimental semiotics (Brown 2012) has been argued to support the hypothesis as well. What follows is an updated summary of this supportive evidence.
2.1 Paleoanthropology
The hominin species with which bodily mimesis is most strongly associated is Homo ergaster, appearing about 1.8 mya in Africa, and the Asian version of this species, Homo erectus, attested between 1.5 and 0.1 mya: “the first universally accepted member of our own genus” (Fitch 2010: 265). The body size of H. erectus had at least doubled compared to the earlier australopithecines, and the brain size had increased even more, to almost modern proportions. The shape of the body had changed as well, giving rise to complete bipedalism, with the capacity for efficient long-distance running, which is highly adaptive for hunting and/or scouting (Cela-Conde and Ayala 2007). In terms of technology, there was a qualitative shift in style and complexity from the older Oldowan tools to the larger symmetrical hand axes of Acheulean technology, requiring considerable skill, practice, and pedagogy. These biological and cultural adaptations, including the domestication of fire from at least 400,000 years ago (Weiner et al. 1998), made migration to most parts of Eurasia possible.
Yet, it is not clear whether all these achievements coincided with the evolution of the vocal control necessary for speech. One possible marker of such control in the fossil record is an extended thoracic canal, needed for controlling breathing during speech (or singing). Based on earlier evidence, it was concluded that H. erectus still had a thoracic canal in the range of australopithecines (MacLarnon and Hewitt 1999). This has been contested on the basis of more recent and extensive evidence, suggesting that the species may have had a thoracic canal in the range of modern humans (Gómez-Olivencia et al. 2007). The debate continues, but the fact remains that, while H. erectus clearly must have had improved volitional control of the body and an unprecedented level of culture, there is no firm evidence for the simultaneous evolution of speech. Bodily mimesis thus stands as the likely basis for achievements that are both remarkable, compared to those of earlier hominins, and yet limited compared to those of Homo sapiens.
2.2 Mirror Neuron Systems
Gestural/bodily theories of language origins received a major boost with the discovery in the 1990s of so-called mirror neurons, which respond both to one’s own and to others’ hand movements. One argument for their relevance for language was that they were initially found in area F5 in the premotor cortex of the macaque brain, which appears to be homologous to the left inferior frontal gyrus of the human brain, corresponding to the well-known “Broca’s area” (Arbib 2003, 2005). Extensive studies, using various imaging methods, confirmed that BA 44 and 45 (≈Broca’s area) and BA 22, 39, 40 (≈Wernicke’s area) overlap extensively with the (extended) human “mirror neuron system” (MNS) and are activated in tasks involving action recognition, imitation, pantomime, and iconic gestures (Iacoboni 2008).
Early enthusiasm that this would be sufficient to explain both the neural mechanisms of language and its evolution (Rizzolatti and Arbib 1998) was, however, rather premature. Admittedly, there is a major gap between the “parity” of action recognition and that of shared symbolic meanings (Hurford 2004). In response to such criticism, Arbib (2003, 2005) proposed a more elaborated scenario for how the MNS was gradually extended over evolution from serving the function of action recognition (in monkeys), to “simple imitation” (in apes) and to “complex imitation” and pantomime in early Homo, to “protosign” and eventually to speech. Apart from the stage of “protosign,” consisting of “elements for the formation of compounds which can be paired with meanings in a more or less arbitrary fashion” (Arbib 2003: 195), the model is consistent with the bodily mimesis hypothesis (Zlatev 2008b). For example, BA 4 and BA 6 are not credited with being part of the human MNS, but they have been shown to activate during the perception and production of meaningless syllables (Wilson et al. 2004), and BA 44 and 45 likewise are differentially associated with speech. All this is consistent with the hypothesis that speech was only gradually recruited for intentional communication, “atop” older systems serving action, imitation, and gesture.
2.3 Comparative Psychology
One of the primary types of evidence used by Hewes (1973) in arguing for a gestural origin of language was the then-recent findings of relative success in “ape language” studies using a simple form of American Sign Language (ASL). The large controversies that surrounded these studies have made it clear that apes indeed have highly limited abilities to use manual signs compositionally and “declaratively” (i.e., to provide information rather than to request an action), but also that they are capable of learning manual and other forms of non-vocal signs and of using these flexibly, with close attention to the addressee’s state of attention (cf. Zlatev 2008a). These conclusions have also been confirmed by a number of naturalistic studies of spontaneous bodily communication in great apes, living both in the wild and in captivity (cf. Call and Tomasello 2007). Tomasello (2008: 54) summarizes the contrast between the vocal and gestural modalities in fairly categorical terms: “… primate gestures are individually learned and flexibly produced communicative acts. […] vocal displays are mostly unlearned, genetically fixed, emotionally urgent, involuntary, and inflexible. […] They are broadcast mostly indiscriminately.” Since extant great apes are our best approximate model for the last common ancestor (LCA) of hominins and apes, it is reasonable to assume that the LCA had similar skills and that gesture/bodily mimesis was therefore within its “zone of proximal evolution” (Donald 2001), unlike speech. While several researchers have argued that such an appraisal underestimates chimpanzee vocal capacities and their communicative functions (Slocombe and Zuberbuehler 2005), it seems clear that there is at least a quantitative if not qualitative difference between the flexibility, volitional control, and referentiality of ape gestures as opposed to vocalizations (Pika 2008). Thus again, producing signs with the body was more “at hand” than with the voice.
Looked at from the other direction, what are the main differences between ape and human cognition, leaving language aside? It has been popular for some time to downplay such differences (cf. Tallis 2011), but in a recent extensive review article, Vaesen (2012) examines the evidence from nine cognitive domains (including language) related to tool production and use and concludes that “striking differences between humans and great apes stand firm in eight out of nine of these domains” (ibid: 203). The seven non-linguistic domains in which human capacities clearly exceed those of apes according to this review are as follows: (a) hand–eye coordination, (b) causal reasoning, (c) functional representations (e.g., for tools), (d) executive control (e.g., inhibition and planning), (e) social learning (e.g., imitation), (f) teaching, and (g) social intelligence (e.g., passing false-belief tasks). Rather than considering one of these as the crucial difference, Vaesen concludes that “no individual cognitive trait can be singled out as the key trait differentiating humans from other animals” (ibid: 203). This claim is quite in line with the bodily mimesis hypothesis, since mimesis is polyfunctional. Indeed, there is a close correspondence between the functions associated with bodily mimesis under (1) and the features in Vaesen’s list given above, especially when the latter are grouped as (a) motoric, (b–d) cognitive, and (e–g) social–cognitive.
In such a manner, the bodily mimesis hypothesis of the origins of human uniqueness can help generalize over a number of findings from comparative psychology.
2.4 Gestures and Ontogeny
Several decades of extensive research on the spontaneous gestures of adults and their development in children have shown that gestures are ubiquitous in all human cultures and that they align temporally and semantically with speech, at least in adult language use (Kendon 2004; McNeill 2005). The explanations of these findings, however, differ. While McNeill (1985, 2005, 2012) considers speech and gesture (production) to be two parts of a single system, others point out that there are good reasons to regard them as two closely interacting, but distinct systems. The resolution of this controversy has direct implications for evolutionary hypotheses.
It is now generally accepted that gestures share semantic properties with what is being said and that speakers of different languages gesture somewhat differently, in ways that can be related to the semantics of the respective languages (Kita and Özyürek 2003). However, speakers also use gestures to represent objects and events iconically in ways that go beyond what is said and in ways that are similar across languages (Zlatev and Andrén 2009). This is consistent with a model of “the two qualitatively different representations [which] are adjusted with respect to each other and co-evolve” (Kita and Özyürek 2003: 30). Careful analyses have also shown that co-speech gestures synchronize with features of the interaction as a whole, including the responses of the addressee (Sikveland and Ogden 2012) and are thus not automatically tied to speech production itself.
The developmental evidence also appears to support an analysis in terms of two interacting systems rather than a completely inseparable speech–gesture bond of the kind that McNeill envisages. On the one hand, there is general agreement that there is close interaction between gesture and speech in language development (Volterra et al. 2005; Goldin-Meadow 1998; Andrén 2010). Still, it appears that both pointing and iconic gestures emerge prior to speech, at around 9–12 months, and play an essential role for the development of language (Bates et al. 1979; Liszkowski et al. 2012; Lock and Zukow-Goldring 2012). Speech and gesture become gradually integrated in ontogeny, with at least some analyses showing “a gradual specialization from unimodal forms of communication, less demanding in cognitive, social and semiotic terms, to multimodal patterns involving the coordination of specific gestures and vocalizations” (Murillo and Belinchón 2012: 31).
Of course, such apparent gestural primacy in ontogeny is not a strong argument for a corresponding primacy in evolution, since the old principle of “recapitulation” cannot be accepted without prior justification. Still, if gesture plays a scaffolding role for language in development, it is reasonable to suppose that it played an analogous role in evolution as well, since in both ontogeny and phylogeny, (a) bodily movement comes under volitional control earlier than vocalization, as argued in Sect. 2.3, and (b) gesture affords a greater degree of iconicity than speech.
The last point, i.e., the iconic (resemblance-based) relation between at least some gestures and their meanings, has been a rather controversial topic. Intuitively, communicating with the whole body should be easier than only with the voice when lacking a common language, since this is indeed what people do when they need to communicate in such cases. On the other hand, many gestures are conventionalized, and some researchers have even argued that iconicity plays hardly any role at all in gestural communication (Streeck 2009). This controversy can be in part resolved by turning to semiotics, where the topic of iconicity has been thoroughly investigated.
2.5 Semiotic Analysis and Experiments
Semiotics is the interdisciplinary field investigating commonalities and differences between different communicative systems, such as visual representations, speech, and gestures (in both spontaneous and artistic forms), and their dependence on and interaction with cognitive capacities including perception, movement, and consciousness (cf. Sonesson 1989). While traditional semiotics was based almost entirely on a form of conceptual analysis and was often quite speculative, modern approaches of experimental (Galantucci and Garrod 2010) and cognitive semiotics (Zlatev 2012) are considerably more empirical. It is the combination of conceptual (intuition-based) analysis and experimental validation that makes semiotics so useful in addressing controversial topics such as the iconicity of gestures.
First of all, it is important to recognize that iconicity and conventionality (as well as the third type of expression–meaning relation known as indexicality, which is contiguity-based) do not stand in a mutually exclusive relation, as pointed out by several of the classics of the field:
One of the most important features of Peirce’s semiotic classification is … that the difference between the three basic classes of signs is merely a difference in relative hierarchy. It is not the presence or absence of similarity or contiguity between the signans and signatum, nor the … habitual connection between both constituents which underlies the division of signs into icons, indices and symbols, but merely the predominance of one of these factors over the others. (Jakobson 1965: 26, my emphasis)
Furthermore, in his defense of the iconicity of pictures, Sonesson established a useful conceptual distinction between primary iconicity, where “the perception of an iconic ground obtaining between two things is one of the reasons for positing the existence of a sign function joining two things together as expression and content,” and secondary iconicity: “the knowledge about the existence of a sign function between two things […] is one of the reasons for the perception of an iconic ground between these same things” (Sonesson 1997: 741). The iconicity of a typical picture (Fig. 1a) is primary, whereas that of a more abstract representation such as that shown in Fig. 1b is secondary: Only when we are told that this represents, e.g., a man in a telephone booth playing a trombone, can we see the resemblance.
The question concerning gestures can now be reformulated along the lines of Jakobson (1965): Does iconicity “predominate” over conventionality, at least in some cases? And, in the style of Sonesson (1997): Is it of the primary kind? A recent experimental study by Fay et al. (2013) suggests positive answers to both questions. The researchers asked pairs of participants to play a game in which a “director” had to communicate 24 different concepts, divided into the categories emotion, action, and object, to a “matcher,” without using language, by one of three means: vocalization, gesture, or a combination of both. The results showed that in all cases, matching was above chance and that for the emotion class, the vocalization-only group managed fairly well (ca. 70%). However, (pantomimic) gestures with or without vocalization had a clear advantage, with success rates approaching ceiling level. The authors conclude that “gesture outperforms non-linguistic vocalization because it lends itself more naturally to the production of motivated signs” (ibid: 1). Since the game was played a number of times by each pair, a degree of simplification and conventionalization of the gestures occurred, but at no point did they become “arbitrary,” or their iconicity purely secondary. On the other hand, the success rates for vocalization-only increased considerably with use, suggesting that conventionalization played a stronger role for successful communication in that medium. This leads to an important conclusion: While both the bodily/gestural and vocal modalities can be used for signs that are fully conventionalized, to the extent of losing all traces of iconicity and indexicality and thus becoming “arbitrary,” the bodily/gestural modality is intrinsically more suited for motivated signs, while the vocal modality is less so.
This difference is crucial to explain both why bodily mimesis and gesture are advantageous for establishing a sign system initially and why with time there will be a shift toward the vocal modality, i.e., speech, as argued below.
3 But Why Speech?
The different kinds of evidence discussed in the previous section are supportive not only of the bodily mimesis hypothesis, but also of gesture-first theories of language evolution in general. The proposal of a “gestural stage” in language evolution has always been found appealing by some, but objectionable by others who have theorized about language origins. The major objection can be formulated tersely: Why speech? Even authors who are very well aware of the importance of gesture in human communication find this objection (nearly) “fatal” or “insuperable”:
The gestural theory has one nearly fatal flaw. Its sticking point has always been the switch that would have been needed to move from a visual language to an audible language. (Burling 2005: 123)
Several different lines of evidence, then, can be added up to support the hypothesis that the first step in the evolution towards linguistic expression was taken with the employment of visible action, or gesture, for referential expression. Yet, as has often been pointed out, this seemingly attractive hypothesis faces […] an insuperable problem: Languages are overwhelmingly spoken. (Kendon 2008: 12)
In his critical review of “gestural protolanguage theories,” Fitch (2010, Chap. 13) argues convincingly that appealing to ecological factors is not sufficient to explain the transition to speech, since “each posited advantage can be paired with a similar selective force that would oppose them” (ibid: 443). Communicating in the dark may be beneficial, but silent gesturing is clearly safer in an environment of extensive predation. Speech may be “freeing the hands” for other purposes while communicating, but then it “burdens the mouth,” making communication somewhat difficult and even dangerous during a common communal activity: eating. Analogously, vocal communication may free visual attention, but it burdens auditory attention, and furthermore, in all cultures, linguistic communication is predominantly conducted “face to face,” involving multimodal perception (Kendon 2004).
As Fitch points out, Hewes (1973) did not appeal to such factors but rather to what he then thought to be certain linguistic disadvantages of signed languages compared to speech: having a limited vocabulary, lacking duality of patterning, i.e., the equivalent of phonemes, and being slower. However, such claims have been disproved since then. As even the currently popular practice of parallel translation between spoken and signed languages shows, signed languages have the full linguistic functionality of spoken languages. This has made them a potent argument against an initial “gestural protolanguage”: If everything that can be said can be just as easily signed, then why turn to speech? Furthermore, as recent studies of emerging signed languages show, modern human beings are capable of spontaneously constructing a signed language from the pantomimic kind of gestures typical of bodily mimesis over the span of a few generations (Senghas et al. 2005; Sandler 2012).
The why-speech argument is indeed damaging to some proposals of gestural primacy, but not to all. On the one hand, proposals differ with respect to how exactly the “gestural protolanguage” is conceived of. Corballis sees it as “a form of signed language similar in principle, if not in detail, to the signed languages that are used today by the deaf” (Corballis 2003: 125). Arbib, it will be remembered, breaks up the evolutionary process in several stages, and preceding speech, there is “proto-sign: a manual-based communication system, breaking the fixed repertoire of primate vocalizations to yield a combinatorially open repertoire […] elements for the formation of compounds which can be paired with meanings in a more or less arbitrary fashion” (Arbib 2003: 195). Bodily mimesis, on the other hand, corresponds to neither: Its virtue (as well as its ultimate disadvantage) is that the type of signs (in the semiotic sense) that it gives rise to is precisely not conventionalized, arbitrary, and combinatorial (Zlatev 2008a).
Furthermore, very few if any of the proponents of gestural primacy in evolution view the transition to speech as a discrete “switch,” but rather as a process that was both gradual and, given the ubiquity of co-speech gesture, still remains only partial:
While human primates must have been at first better at transmitting information through gesture than through voice, at some point voice became the preferred vehicle. But what if this “point” was a transitional period of over half a million years, say, from the appearance of Homo erectus to that of archaic Homo sapiens? And what if, during all this time, humans regularly communicated bi-modally, only gradually shifting from a code that foregrounded gesture to one that foregrounded voice? (Collins 2013: 136)
In general, the less prelinguistic gestural communication is thought of as a “language,” and the less modern spoken languages are conceived of as purely vocal, the less problematic the why-speech argument appears. While it is indeed damaging for scenarios that frame the transition as one “from hand to mouth” (Corballis 2002), it is not so for scenarios stated in the much less idiomatic terms “from body to mouth and body” (Zlatev et al. 2010), that is, from whole-body communication supported by the human-specific capacity for bodily mimesis to the multimodal system of linguistic communication which we use today, involving both speech and gesture.
Thus, the typical counterargument against gesture-first theories is not in principle “fatal” or “insuperable” for the bodily mimesis hypothesis of human cognitive, and linguistic, origins. Still, a more explicit account of how and why the transition has taken place is due. In a recent doctoral dissertation, Brown sets herself this task precisely:
A major step in the evolutionary process by which human communication could have emerged has been proposed in the bodily mimesis hypothesis. … This ability provides a foundation from which symbolic communication can arise, but how such a transition would have taken place has not been fully examined. This thesis examines the gap between bodily mimesis and symbolic communication. (Brown 2012: 1)
Brown reviews different gesture-first theories of language origins and concludes, similarly to Fitch (2010), that those that posit some form of “switch” between an already conventionalized (proto) language and speech (e.g., Corballis 2002; Arbib 2005) fail to provide an adequate explanation for this switch. In addition to the issues discussed above, Brown argues that an intermediary stage of arbitrary gestures, e.g., corresponding to Arbib’s notion of “protosign,” would have minimized support for the stabilization of a conventional code: “the conventionalization process requires a rich and supportive communicative infrastructure in which novel arbitrary signs can be used … so that the intended form-meaning relationships could be correctly interpreted” (Brown 2012: 81). This conclusion is supported by computational models of language evolution, showing that the stabilization of a conventional language across a greater number of speakers requires either factors such as extensive corrective feedback or restricted context (neither of which is characteristic of actual communication) or support from parallel non-arbitrary signals.
While theories that posit that “multimodal referential communication was a combination of arbitrary and non-arbitrary representation from inception” (ibid: 116), such as that of McNeill (2012), avoid the need to explain any switch, they face complementary problems, since they both predict a stronger degree of speech–gesture unity than appears to be the case (cf. Sect. 2.4) and underestimate the degree of non-arbitrariness in speech.
By method of exclusion, Brown concludes that theories that propose a gradual and only partial transition from mimesis/gesture to speech (e.g., Zlatev 2008b; Collins 2013) are most plausible, but objects that they “do not provide a reason why one modality is now predominantly symbolic and not the other” (Brown 2012: 120), i.e., why speech has undergone a greater degree of conventionalization, showing less iconicity, than gesture.
The answer proposed by Brown is both simple and ingenious: “the vocal modality would have become predominantly symbolic because its lower non-arbitrary capacity increases the likelihood that vocalizations are perceived as arbitrary” (ibid: 134).
This conclusion is supported by the methods of experimental semiotics (cf. Sect. 2.5), showing that the gestural modality carries more “communicative load” than the vocal modality when communication is restricted to non-conventional signaling and furthermore that iconic gestures help the audience to interpret novel vocalizations as meaningful words, even when the latter are perceived as “arbitrary.” Supported by a combination of semiotic experimentation and computational modeling, Brown concludes that in multimodal gesture–vocalization communication, there will be an automatic pull toward increased arbitrariness with the need to communicate a larger and more diverse set of concepts and that this would take place in the vocal modality.
Taken together with the scenario suggested by Collins (2013) of a gradual shift of communicative load from gesture to speech over the duration of “over half a million years,” this gives a plausible answer to the why-speech question: Due to the diversification of hominin cultures, a less iconic (=more symbolic) code would have been beneficial, and since the vocal modality affords less iconicity than the manual/bodily one, it became naturally “recruited” to the task. The supposition that this took place from the emergence of H. erectus at 1.5 mya to H. sapiens at 0.2 mya gives more than sufficient time for the biological adaptations necessary for increased vocal control to take place. The answer is consistent with the evidence for bodily mimesis summarized earlier and with the increasing evidence for the partial non-arbitrariness of speech (Ahlner and Zlatev 2010).
4 Conclusions
This chapter reviewed some of the confirming evidence for the bodily mimesis hypothesis, much of which can also be brought in favor of gesture-first theories of language origins. Unlike some recent and well-known proposals of a “gestural protolanguage,” however, bodily mimesis is both a more general adaptation, since it concerns the volitional use of the body for purposes other than gestural communication as well, and less language-like. Hence, it was argued that it fares much better against the argument typically brought against gesture-first theories: how to explain the switch from a gestural (proto) language to a spoken one. It does so since (a) it emphasizes the non-conventionality and non-systematicity of bodily mimetic signaling, (b) it rejects the notion of a switch and instead posits a long biocultural spiral of conventionalization and adaptation for speech, and (c) it insists that the “transition,” which is possibly the wrong word, should be seen as only partial, given all the evidence for the adaptive role of gesture in language development and face-to-face communication.
What Brown’s theorizing and evidence add to this is a cognitive–semiotic explanation for why speech has, during this process, taken on an increasingly higher communicative load: Bodily movement and vocalization do not differ in their capacity to represent meaning purely conventionally, but vocalization is intrinsically less capable of doing so iconically. Given a multimodal gestural–vocal communicative signal, the vocal element is bound to be less iconic than the gestural and thus to differentiate more clearly between an extensive set of concepts, even when their referents are visually similar.
In sum, the transition from communication based on bodily mimesis to relatively “arbitrary” speech was made possible by the multimodal character of human communication, through a prolonged process of increased articulation and conventionalization, but without language cutting off its bodily roots.
References
Ahlner F, Zlatev J (2010) Cross-modal iconicity: a cognitive semiotic approach to sound symbolism. Sign Syst Stud 38(1/4):298–348
Andrén M (2010) Children’s gestures between 18 and 30 months. Media Tryck, Lund
Arbib M (2003) The evolving mirror system: a neural basis for language readiness. In: Christiansen M, Kirby S (eds) Language evolution. Oxford University Press, Oxford, pp 182–200
Arbib M (2005) From monkey-like action recognition to human language: an evolutionary framework for neurolinguistics. Behav Brain Sci 28:105–168
Bates E, Benigni L, Bretherton I, Camioni L, Volterra V (1979) The emergence of symbols: cognition and communication in infancy. Academic Press, New York
Brown JE (2012) The evolution of symbolic communication: an embodied perspective. PhD thesis. University of Edinburgh, Edinburgh
Burling R (2005) The talking ape. Oxford University Press, Oxford
Call J (2001) Body imitation in an enculturated orangutan (Pongo pygmaeus). Cybern Syst 32:97–119
Call J, Tomasello M (2007) The gestural communication of apes and monkeys. Lawrence Erlbaum, London
Cela-Conde CJ, Ayala FJ (2007) Human evolution: trails from the past. Oxford University Press, Oxford
Collins C (2013) Paleopoetics. The evolution of the literary imagination. Columbia University Press, New York
Corballis MC (2002) From hand to mouth: the origins of language. Princeton University Press, Princeton
Corballis MC (2003) From hand to mouth: the gestural origins of language. In: Christiansen M, Kirby S (eds) Language evolution. Oxford University Press, Oxford, pp 201–218
Custance D, Whiten A, Bard K (1995) Can young chimpanzees (Pan troglodytes) imitate arbitrary actions? Behav 132:837–859
Donald M (1991) Origins of the modern mind: three stages in the evolution of human culture. Harvard University Press, Cambridge
Donald M (2001) A mind so rare: the evolution of human consciousness. Norton, New York
Fay N, Arbib M, Garrod S (2013) How to bootstrap a human communication system. Cogn Sci 37(7):1356–1367
Fitch WT (2010) The evolution of language. Cambridge University Press, Cambridge
Galantucci B, Garrod S (2010) Experimental semiotics: a new approach for studying the emergence and the evolution of human communication. Interact Stud 11:1–13
Goldin-Meadow S (1998) The development of gesture and speech as an integrated system. Jossey-Bass, San Francisco
Gómez-Olivencia A, Carretero MJ, Arsuaga L, Rodríguez-García JL, García-González R, Martínez I (2007) Metric and morphological study of the upper cervical spine from the Sima de los Huesos site (Sierra de Atapuerca, Burgos, Spain). J Hum Evol 5:6–25
Hewes G (1973) Primate communication and the gestural origins of language. Curr Anthropol 14:5–24
Hurley S, Chater N (2005) Perspectives on imitation. From neuroscience to social science, vol I & II. MIT Press, Cambridge
Hurford JR (2004) Language beyond our grasp. In: Oller K, Griebel U, Plunkett K (eds) Evolution of communication systems: a comparative approach. Cambridge University Press, Cambridge, pp 297–313
Iacoboni M (2008) Mirroring people: the new science of how we connect with others. Farrar, Straus & Giroux, New York
Jakobson R (1965) Quest for the essence of language. Diogenes 13:21–38
Kendon A (2004) Gesture: visible action as utterance. Cambridge University Press, Cambridge
Kendon A (2008) Signs for language origins? Pub J Semiot 2(1):2–27
Kita S, Özyürek A (2003) What does cross-linguistic variation in semantic coordination of speech and gesture reveal?: Evidence for an interface representation of spatial thinking and speaking. J Mem Lang 48:16–32
Liszkowski U, Brown P, Callaghan T, Takada A, de Vos C (2012) A prelinguistic gestural universal of human communication. Cogn Sci 1–16
Lock A, Zukow-Goldring P (2012) Preverbal communication. In: Bremner J, Wachs T (eds) The Wiley-Blackwell handbook of infant development. Wiley-Blackwell, Oxford, pp 394–425
MacLarnon AM, Hewitt GP (1999) The evolution of human speech: the role of enhanced breathing control. Am J Phys Anthropol 109:341–363
McNeill D (1985) So you think gestures are nonverbal? Psychol Rev 92:350–371
McNeill D (2005) Gesture and thought. University of Chicago Press, Chicago
McNeill D (2012) How language began: gesture and speech in human evolution. Cambridge University Press, Cambridge
Murillo E, Belinchón M (2012) Gestural-vocal coordination. Gesture 12(1):16–39
Piaget J (1962) Play, dreams, and imitation in childhood. Norton, New York
Pika S (2008) What is the nature of gestural communication in great apes? In: Zlatev J, Racine T, Sinha C, Itkonen E (eds) The shared mind: perspectives on intersubjectivity. Benjamins, Amsterdam, pp 165–186
Rizzolatti G, Arbib M (1998) Language within our grasp. Trends Neurosci 21:188–194
Sandler W (2012) Dedicated gestures, and the emergence of sign language. Gesture 12(3):265–307
Senghas R, Senghas A, Pyers J (2005) The emergence of Nicaraguan sign language: questions of development, acquisition and evolution. In: Langer J, Parker S, Milbrath C (eds) Biology and knowledge revisited: from neurogenesis to psychogenesis. Lawrence Erlbaum, Mahwah, pp 287–306
Sikveland RO, Ogden R (2012) Holding gestures across turns. Gesture 12(2):166–199
Slocombe K, Zuberbuehler K (2005) Functionally referential communication in a chimpanzee. Curr Biol 15:1779–1784
Sonesson G (1989) Pictorial concepts. Inquiries into the semiotic heritage and its relevance for the analysis of the visual world. Aris/Lund University Press, Lund
Sonesson G (1997) The ecological foundations of iconicity. In: Rauch I, Carr GF (eds) Semiotics around the world: synthesis in diversity. Mouton de Gruyter, Berlin, pp 739–742
Streeck J (2009) Gesturecraft: the manufacture of meaning. Benjamins, Amsterdam
Tallis R (2011) Aping mankind. Neuromania, darwinitis and the misrepresentation of humanity. Acumen, Durham
Tomasello M (1999) The cultural origins of human cognition. Harvard University Press, Cambridge
Tomasello M (2008) The origins of human communication. MIT Press, Cambridge
Vaesen K (2012) The cognitive bases of human tool use. Behav Brain Sci 35:203–262
Volterra V, Caselli M, Capirci O, Pizzuto E (2005) Gesture and the emergence and development of language. In: Tomasello M, Slobin D (eds) Beyond nature-nurture: essays in honor of Elizabeth Bates. Lawrence Erlbaum, Mahwah, pp 3–40
Weiner S, Xu Q, Goldberg P, Liu J, Bar-Yosef O (1998) Evidence for the use of fire at Zhoukoudian, China. Science 281:251–253
Wilson SM, Saygin AP, Sereno MI, Iacoboni M (2004) Listening to speech activates motor areas involved in speech production. Nat Neurosci 7(7):701–712
Zlatev J (2007) Intersubjectivity, mimetic schemas and the emergence of language. Intellectica 46(2–3):123–152
Zlatev J (2008a) The coevolution of intersubjectivity and bodily mimesis. In: Zlatev J, Racine T, Sinha C, Itkonen E (eds) The shared mind: perspectives on intersubjectivity. Benjamins, Amsterdam, pp 215–244
Zlatev J (2008b) From proto-mimesis to language: evidence from primatology and social neuroscience. J Physiol—Paris 102:137–152
Zlatev J (2012) Cognitive semiotics: an emerging field for the transdisciplinary study of meaning. Pub J Semiot 4(1):2–24
Zlatev J (2013) The mimesis hierarchy of semiotic development: five stages of intersubjectivity in children. Pub J Semiot 4(2):47–70
Zlatev J, Andrén M (2009) Stages and transitions in children’s semiotic development. In: Zlatev J, Andrén M, Johansson-Falck M, Lundmark C (eds) Studies in language and cognition. Cambridge Scholars, Newcastle, pp 380–401
Zlatev J, Racine T, Sinha C, Itkonen E (2008) The shared mind: perspectives on intersubjectivity. Benjamins, Amsterdam
Zlatev J, Donald M, Sonesson G (2010) From body to mouth (and body). In: Smith A, Schouwstra M, deBoer B, Smith K (eds) The evolution of language. World Scientific, London, pp 527–528
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Zlatev, J. (2014). Bodily Mimesis and the Transition to Speech. In: Pina, M., Gontier, N. (eds) The Evolution of Social Communication in Primates. Interdisciplinary Evolution Research, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-319-02669-5_9
DOI: https://doi.org/10.1007/978-3-319-02669-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02668-8
Online ISBN: 978-3-319-02669-5
eBook Packages: Biomedical and Life Sciences (R0)