Trackways Reading and the Evolution of Symbolic Communication

Our ancestors were the only creatures on this planet that ever acquired the cognitive capacities needed to glean the remarkable amounts of socio-ecological information contained in trackways, which are trails of footprints and other traces created by the activities of animals in terrestrial environments. I make the strong claim herein that if we had not begun stepping in each other’s (and our own) old footprints to maintain personal bipedal safety when we became obligate bipeds around 4 million years ago, we would never have acquired the cognitive capacities required to “read” trackways or comprehend and invent the symbolic sign systems of gestural and spoken languages. I will present this argument in the form of a positive model rather than attempt to critique the relevant literature.

The model adheres to a pragmatic view of language (see Scott-Phillips (2014) and Ferretti (2014) for recent reviews). That is, the symbolic signs of language are better seen as clues, not codes, for they are extremely indeterminate or ambiguous when taken out of the narrative context of their overtly intentional “ostensive-inferential” usage. “Overtly intentional” is synonymous with “consciously intentional”: the producer intentionally makes it clear, or ostensive (which means “showing”), that she is aiming to communicate, so that the listener can come to know this and thus begin, equally intentionally, to interpret or infer the gist of the propositional argument, plan, or story the producer is presenting. In short, as Ferretti and his colleagues (Ferretti 2014; Ferretti et al. 2017) point out, all human discourse takes the socio-psychological form of a mutually intentional cooperative journey, and my evolution-of-language (or evolanguage) model dovetails nicely with their view.

Words and sentences, then, are just one part of the presentation of various clues, which often includes mimetic body language, depictive gestures, and expressive vocal tonalities. The content (meaning or “aboutness”) of the discourse is based on sharing perceptual here-and-now contact plus common elsewhere-and-when background knowledge (which includes knowledge of the conventional meaning of code-like symbols, when available) made up of both semantic and episodic or personally experienced memories and/or episodic future simulations in the minds of both producer(s) and interpreter(s). For example, if a friend and I exchange glances when a mutual acquaintance in the gathering we are part of begins to relate an often-repeated story, there is an extra level of mind reading going on between us, but the information intentionally transmitted depends on both of us personally remembering all the times we’ve heard that story before—which requires mental time travel.

I am therefore not convinced by evolanguage theories claiming all that needed to evolve in our originally ape-like cognitive makeup [Footnote 1] before protolanguages could begin to emerge was an extra level of theory of mind (Scott-Phillips 2014; Tomasello 2009, 2014). In short, I concur with theorists who maintain that some mental space-and-time-traveling [Footnote 2] capacity was initially required as well (Corballis 2014, 2017; Ferretti 2014; Suddendorf 2013). In their view the two capacities are as inextricably linked as the sides of a coin, and I agree; but I part company with them when it comes to their versions of what triggered the genetic/cultural coevolution of these fundamental capacities. In my evolanguage model they are manifestations of an unprecedented cognitive shift triggered by our Mid-Pliocene entry into the unique cultural niche of social trackways reiteration, so the first step in my model’s construction must explain why only our great ape lineage entered this niche.

The first portion of this article therefore presents a phylogenetic narrative of our earliest origins that explains why only our ancestors became non-tree-nesting obligate bipeds, because that is fundamentally why they began to step in each other’s footprints: to maintain bipedal safety. We then briefly revisit my social trackways theory (Shaw-Williams 2014), which explains how entering this unique cultural niche triggered the incremental evolution of an autobiographical or narrative self-awareness [Footnote 3], a sense of ourselves (and therefore others) as agents traveling from the past into the future. This new kind of metarepresentational self-awareness immediately began to affect all domains, through overtly intentional imaginary self-projection (consider the “what if” pretend play of children, built around selective imitation, or pretending to be in the bodies and minds of absentee adult experts), and functionally manifested as extra levels of theory of mind and mental space-and-time-travel capacities.

In that earlier article, I discussed how these new cognitive capacities enabled fast-track instrumental (showing how) intentional teaching and non-associative social learning through auto-rehearsal, as evidenced in our deep past by stone toolmaking and the carving of large mammals. Here we are interested in how these new cognitive capacities enabled overtly intentional exploratory navigation and communication of elsewhere-and-when or displaced socio-ecological information. Hence the second portion of this article begins by discussing intentional navigation (orienteering, in modern parlance), which enabled far more exploratory extractive foraging/provisioning excursions. I then argue that as our ancestors’ self-projecting cognitive capacities for planned future “missions” expanded, they began to use depictive trail markers to decrease the cognitive load on memory, using conventional or unnatural signs.

In the next section I suggest simple elsewhere-and-when participant/event narratives experienced by more exploratory band members began to be reenacted when back in the safety of the band, using increasingly conventional mimetic/gestural/depictive signs and mimetic vocalizations. Of course, the early existence of these trail markers and mimetic/gestural protolanguages is not demonstrable through direct evidence. My aim here is to supply indirectly supportive evidence as we follow the chronology of the model through these earlier portions, thus creating a narrative package (Currie and Sterelny 2017) which achieves plausibility because it is coherent with all the available evidence, both old and new. I will then summarize the model before concluding.

Towards a New Evolutionary Narrative of Human Origins

Assuming our earliest ancestors were very much like our closest genetic cousins, chimpanzees and bonobos, is a theoretically fatal error, for three reasons. Firstly, the question of why no other ape lineage went down the same cognitive evolutionary trail to language acquisition remains very difficult to answer. Secondly, it constrains the beginnings of our evolutionary narrative to a social intelligence and navigational ranging pattern based largely on competitive foraging for limited resources and travelling between certain idiosyncratically fruiting fig trees and other trees suitable for nesting in. Thirdly, recent paleo-evidence (Lovejoy et al. 2009a, b; Sayers et al. 2008, 2012) strongly indicates we did not evolve from suspensory climbing, mainly fruit-eating apes like chimpanzees and bonobos—conversely, they are secondarily derived from our ancient omnivorous stem lineage of Ardipithecans, beginning to split off around 8–7 mya.

Recent evidence also indicates our Late Miocene/Early Pliocene lineage possessed a different sociality and ecological foraging style from our extinct Pliocene ape relatives. In short, I think we should reject 20th-century standard evolutionary textbook (SET) phylogenetic models that assume the Pliocene anamensis/afarensis australopithecines were the direct ancestors of Pleistocene Homo (see Fig. 1). We now know they were proficient suspensory climbers, with long hooked fingers and very Gorilla-like shoulder blades (Alemseged et al. 2006; Green and Alemseged 2012). Their somewhat bipedal morphology developed because they became habitually wading sedge harvesters. This idea is robustly supported by Wrangham et al. (2009), who point out that modern apes become bipedal when wading in waterways to obtain sedge corms and rhizomes. However, like other SET theorists they still assume australopithecines were our direct ancestors, mainly because of their highly reduced canines [Footnote 4] and merely facultative bipedal morphology.

Fig. 1. These phylogenetic models do not include two other recently discovered Late Pliocene/Early Pleistocene Australopithecus sp. due to lack of room. Such rapid diversification is typical of herbivore lineages (Price et al. 2012). Notice that in the SET model the single Middle Pliocene omnivorous species Kenyanthropus is treated as a dead end, and the early 5.9 to 5.5 mya Australopithecus-type jaw and molars are ignored.

Most paleo-theorists agree all the australopithecines were very sexually dimorphic in body size, and remained arboreal suspensory climbers. I argue they possessed these traits because they were a more recent [Footnote 5], secondarily derived branch of the equally herbivorous, suspensory climbing, and sexually dimorphic Gorilla clade that split off from our stem lineage around 11 to 10 mya. The upshot is they possessed a sociality and foraging behavior that was very Gorilla-like as well, which is why, unlike our Early Pliocene Ardipithecan ancestors, they were never goaded by environmental change into becoming fully obligate, non-tree-nesting bipeds. The copious paleo-evidence that supports this unorthodox but theoretically plausible trophic niche-partitioning model of human origins is presented elsewhere (Shaw-Williams forthcoming). Space for reiteration is limited here, so we will turn to our relatively undisputed Late Miocene/Early Pliocene Ardipithecan ancestors (Haile-Selassie et al. 2016).

Old and new Mid-Miocene 17–9.5 mya evidence clearly indicates the 7.0–4.4 mya Ardipithecans [Footnote 6] were survivors of an extremely ancient primate lineage that became large bipedal-wading apes because they were omnivorous “shore-line” foragers. [Footnote 7] For a large wading ape with sensitive fingertips and a reasonably good precision grip, feeling out and capturing shallow water fauna is easy, and at least two major advantages are gained through habitual shoreline foraging. Firstly, highly nutritious aquatic/semiaquatic small fauna, fruiting bushes, and short fig trees, plus readily available aquatic flora (including the sweet corms of sedges) are present in staggering numbers in and around open wetlands; in fact all resources are richer than in any other ecological zone (Laden and Wrangham 2005). Secondly, large carnivores [Footnote 8] tend not to hunt in such areas because water wipes out scent trails, visibility is poor in tall grasses and shrubbery, and silent stalking is extremely difficult.

The upshot is our Mid-to-Late Miocene omnivorous ancestors never became suspensory climbers like all their secondarily derived, frugivorous/folivorous/sedgivorous great ape descendants. Although the Late Miocene/Early Pliocene Ardipithecans were facultative bipeds, the 4.4 mya female “Ardi” possessed chimpanzee-like feet, with equally abductable but more robust halluces (big toes). In the adducted position, they enabled a strong push-off for bipedal wading locomotion, and when abducted provided good support on swampy shoreline substrates and subvertical riparian trees. The still abductable toe indicates Ardipithecans were tree nesters, but they did not have the short weak thumbs and greatly elongated hook-like fingers of extant apes; that is, they remained “above branch clamberers” (Lovejoy et al. 2009a, b; White et al. 2015) like their omnivorous Mid-Miocene ancestors. Consequently, they never became quadrupedal knuckle-walkers, so kept their pliable wrists and larger thumbs. [Footnote 9] Therefore, besides being more adept at using simple tools and capturing small fauna, they could have carried more resources for longer distances than chimpanzees when wading, so they would have made good provisioners.

I think the sexual monomorphism and omnivorous trophic niche of the Ardipithecans indicate they were alloparenting social breeders, like another ancient stem species of extant primates, the omnivorous North African macaque, Macaca sylvanus or “Barbary ape” (so-named because it has only a vestigial tail). Barbary macaque alloparenting entails intimate long-term care (several hours a day) and transportation (on their backs) of a favorite infant, which may be kin or non-kin, by all older juveniles and adults who are not mothers tending their own infants, and infanticide does not occur (Small 1990). In both Barbary and Tibetan macaques all older band members use infants for tripartite “bridging,” that is, two adults simultaneously fondling the same infant, or a third male will hand an infant familiar to one of two males who are getting aggressive with each other. Subdominant males will hand the alpha male’s infant-familiar to him if he becomes aggressive towards them; in short, infants are used as “agonistic buffers” (Briana 2014). Significantly, like our Late Miocene lineage’s secondarily derived Pan descendants [Footnote 10], younger Asian macaque descendants of sylvanus that are far more frugivorous have lost this social-breeding/alloparenting behavior [Footnote 11], and infanticide does occur (Maestripieri 1998).

Social breeders are smarter and more adaptable to environmental change, due to a greater window for developmental learning from many socially tolerant adults, not just the mother (Burkart et al. 2009). Hence Ardipithecan socio-cognitive learning capacity was arguably greater than that of chimpanzees and bonobos, thus providing a social intelligence platform far more conducive to our ensuing predominantly cognitive evolution, and to our Mid-Pliocene entry into the social trackways-reiterating niche. Now we will turn to foraging behaviors integral to our Late Miocene omnivorous, alloparenting lifeways. Importantly, the logistics of bipedal wading would suggest Ardipithecan adults were provisioning alloparenting group members and mothers looking after the young back on shore [Footnote 12]: why carry noisy infants out into the water when foraging for wary aquatic/semiaquatic fauna?

Large catfish are easily caught by large bipedal-wading apes with precision-grip hands; in the Southern US this is called “noodling,” and even teenage girls can catch extremely large catfish. [Footnote 13] Furthermore, the Ardipithecus ramidus fossils were surrounded by large numbers of catfish skulls (WoldeGabriel et al. 2009) on a paleo-floodplain. This idea of very early provisioning gains further support because the only other nonhuman primates known to directly provision their young are social breeders and faunivores (for example, marmosets and tamarins). Besides, socially tolerant alloparenting primates like Barbary macaques exhibit passive provisioning: infants can often take food from their adult male minders. Bonobo males also allow infants to take their food, and even chimpanzee males will begrudgingly share meat with fellow hunters of colobus monkeys, as well as with females who are in estrus, through “tolerated theft.” The point here is alloparenting/passive-provisioning behavior would engender more complex socio-ecological navigation than exhibited by our ape descendants, even if provisioning was not yet overtly or “consciously” intentional.

Equally importantly, omnivores are inherently wider ranging (and less numerous, like carnivores) than frugivores or herbivores (Kuhn et al. 2016). They are often seminomadic, due to their abiding interest in all aquatic, semiaquatic, and terrestrial small fauna, many of which are highly mobile as well as only seasonally abundant (for example, spawning catfish, migrating birds). So, when times became tough in any locale our omnivorous Pliocene ancestors could often survive by expanding their ranges as well as their dietary choices—or shift altogether to friendlier environments. We now know that by the end of the Pliocene (officially 2.58 mya) some populations of our ancestors had already left Africa altogether, for simple stone tools and cut-marked bones from large mammals found last century in the upland watersheds of Northwest India have recently been reviewed and reliably dated to before 2.6 mya (Tudryn et al. 2016).

Additionally, since the 1.8 mya Dmanisi Homo skulls have smaller brain cases compared to 1.9 mya Homo erectus skulls found in Africa, they probably represent a much earlier colonization of Eurasia by less encephalized 2.3 mya Homo habilis or Homo rudolfensis. This is evidence of distinctive, exploratory travel capacities, for there is no evidence of chronologically corresponding “migration waves” of other African species just prior to 2.6 mya; that is, our ancestors were not just following their accustomed prey (O’Regan et al. 2011; Palombo 2013). There is also more support for the argument that Homo floresiensis were survivors of a very early coastal colonization of Southeast Asia by Homo habilis [Footnote 14], now that signs of their presence on the Island of Flores have been pushed back to around 700 kya (Brumm et al. 2016; Van Den Bergh et al. 2016).

The point here is that our Late Pliocene ancestors were already cognitively capable of intentionally shifting into novel landscapes and ecologies. In addition, there is the new evidence of flaked stone tools dated 3.3 mya (Harmand et al. 2015). They were found where the flat-faced skull and omnivorous Homo-like molars of 3.4 mya Kenyanthropus platyops were discovered (Leakey et al. 2001, 2012)—significantly, contemporary Australopithecus afarensis fossils have never been found there. And first evidence of the Homo lineage (a jaw portion with molars) is now dated back to the Late Pliocene, at 2.8 mya (Villmoare et al. 2015).

Last but certainly not least, there is the headless partial skeleton of 3.6 mya “Big Man,” discovered near numerous fossils of a group of Aust. afarensis individuals (Haile-Selassie et al. 2010). It was in a different fault zone, so was not an associated fossil, and possesses an extremely human-like shoulder (Green and Alemseged 2012) and torso, as well as long legs. The lack of craniodental data and its shoulder alone should make it impossible to assign it to the afarensis lineage, as was so adamantly done by its discoverers. I argue the morphology of Big Man belongs to an obligate biped—that is, one of our fully bipedal Mid-Pliocene ancestors that made the Laetolian fossilized footprint trails. Furthermore, his human-like morphological traits would have enabled accurate throwing and clubbing, a powerful precision grip and long-distance transport.

If we add all this new evidence to the 3.4 mya cut-marked bones from a cow-sized mammal (McPherron et al. 2010, 2011) then my earlier claims (Shaw-Williams 2011, 2014) concerning our “Laetolian” ancestors have been vindicated. These claims were: (1) some very H. habilis-like toolmaker, a non-tree-nesting, seminomadic forager, could have been extant in East Africa by 3.5 mya; (2) they were probably provisioning/alloparenting, omnivorous waterside foragers in and around upland wetlands and waterways like their Ardipithecan forebears; and (3) contemporary Australopithecus afarensis, clearly Gorilla-like arboreal suspensory climbers (Alemseged et al. 2006), were not the authors of the Laetoli Trackways. Yet SET theorists continue to state adamantly that the trackways belong to afarensis individuals, and cite this evidence to support their claim that afarensis were direct ancestors of Pleistocene Homo (e.g., Bennett et al. 2016).

The upshot is theories dealing with human evolution have been overly fixated on Oldowan culture, which began 2.6 mya. We must start looking at the Mid-Pliocene, a million years earlier, and recognize the extreme cognitive importance of becoming obligate bipeds. Other theorists have mentioned the cognitive salience of Oldowan long-distance transport of resources and tools (Jeffares 2014), claiming an associated embodied capacity for representing longer action sequences in the mind could trigger the evolution of mental time travel (Osvath and Gärdenfors 2007; Suddendorf 2013). In my model these embodied cognitive effects were already part of our much earlier facultative bipedal wading and alloparenting/passive provisioning behavior, and as such enabled the associative top-down executive control required to enter the social trackways reiterating niche—in short, they were necessary pre-adaptive cognitive traits but not sufficient for triggering the evolution of mental space and time travel.

I argue that what made the crucial cognitive difference was a new embodied vulnerability: for an obligate, non-tree-nesting biped, loss of the use of one limb threatens survival—so during every step forwards (or backwards) where one places each foot is something to be continually concerned about. This concern would inevitably trigger the evolution of more physical self-awareness, more top-down executive control—which is amply evidenced by the 3.66 mya Laetoli Trackways, as we shall see. To sum up: our facultatively bipedal ancestors had always been omnivorous waterside foragers, and were probably alloparenting social-breeders and passive provisioners long before 4 mya. Those omnivorous ancestors could only preserve their alloparenting, passive provisioning lifeways in the face of chronic and catastrophic Mid-Pliocene environmental changes by becoming non-tree-nesting obligate bipeds, in reaction to new socio-ecological navigation demands.

The Environmental Triggering of New Socio-Ecological Navigation Demands

The environmental mayhem that caused our Ardipithecan ancestors to become wide-ranging obligate bipeds began at 4.3 mya, when a super-volcano on the Ethiopian Heights spewed out enough volcanic ash to kill all standing trees throughout the Northern and Central Rift by covering their roots to a depth of 1.5 meters, thereby smothering them (WoldeGabriel et al. 2000). Such thick carpets of permeable ash create permanent savannahs because the roots of seedlings of water-dependent evergreen trees on higher ground can never reach the water table again (Barboni 2014). Only riparian trees bordering fluvial waterways can regrow, due to ash removal by water. Furthermore, volcanism remained chronic until another equally catastrophic event at 3.9 mya, and by then the predominantly savannah landscape we see today was fully established.

A great deal of biogeographical evidence reflects this relatively sudden switch to open wetland/savannah ecologies. The high percentage of closed forest species present before 4.3 mya had decreased immensely by 3.9–3.6 mya (Bobe et al. 2007). There is also the 3.9–3.5 mya influx of social breeding, pack-hunting canids from Asia (Rolland et al. 2015), the diversification of resident African carnivores into similarly social breeding, pack-hunting lions and hyenas, plus the extinction of many species of frugivorous colobine monkeys and diversification into terrestrial cercopithecids such as omnivorous baboons and sedge-eating giant geladas (Macho 2014). Even better evidence is the Upper Laetolian Beds dated 3.66 mya that recorded the trackways of innumerable animals as well as our ancestors: they show the upland wetland/savannah ecology of East Africa was basically the same as now (Leakey 1981; Leakey and Harris 1987).

Plants like papyrus sedges are remarkably good at surviving minor onslaughts of volcanic ash, because they get all the air they need from their hollow stems. Also, in open savannah wetlands sedges are extremely prolific, and are the most productive, fastest growing family of plants on earth (Wrangham et al. 2009), so like folivorous Gorilla the australopithecines never had to move very far from their riparian nesting trees to fill their bellies. They consequently became extremely prolific after 4.2 mya, and diversified into several new species (similarly to contemporary bovine herbivores). Omnivores have the lowest diversification and extinction rates (carnivores take the middle position), because they are buffered from environmental change by their capacity to adjust their dietary sources (Price et al. 2012). Significantly, australopithecines became extinct by the Mid-Pleistocene, unlike our ancestors.

Conversely, smaller, less mobile terrestrial and semiaquatic animals are wiped out, due to lung damage and smothering, even by such minor ash falls as those that are indelibly recorded by the Laetoli Trackways. Small aquatic animals do not fare any better, for shallow and placid waterways quickly become completely anaerobic due to microfaunal die-off and rotting detritus from dead plants. So, for our waterside-foraging 4.3 mya Ardipithecan omnivores the only option was to give up tree-nesting altogether and become far more exploratory, ranging widely from wetland to wetland across patches of relatively treeless savannah. I argue they could do this because they were already spending a great deal of time in the bipedal stance when foraging in waterways, and all that was required for full terrestrial bipedalism was the permanent adduction of their already robust big toe and the beginnings of a stiff and spring-like longitudinal arch (we modern humans are born with flat feet; in quite a few adults fully functional arches never develop).

After 40 years of controversy recent work has shown the Laetolians had a longitudinal arch and extended-leg, heel-first gait [Footnote 15] (Crompton et al. 2012). In short, the Laetoli Trackways prove that 600,000 years after the 4.3 mya super-volcano, our ancient omnivorous lineage of apes were fully obligate bipeds who were cognitively capable of continually stepping quite exactly in each other’s footprints. The flat, waterside terrain they were traversing was covered in 10 cm of volcanic ash, so our pre-Homo Mid- to Late Pliocene ancestors were still dealing with less major but chronic volcanically forced environmental effects. Meanwhile, overall aridity and seasonality were further exacerbated by the buildup of Northern Hemisphere glaciation [Footnote 16] that culminated in a climate crash just after 3.0 mya. We now know the 3.4 mya Kenyanthropians had culturally adapted by making sharp stone flakes used to carve up large mammals; and my bet is frequent opportunities for this behavior were created when large quadrupeds became mired in natural traps of quicksand created by highly permeable volcanic ash deposits washed into wetland basins.

Furthermore, the stone and bone evidence in Northwest India mentioned earlier shows the combination of chronic volcanic mayhem, Northern Hemisphere glaciation, and associated lower sea levels must have encouraged exploratory migrations of some of our Late Pliocene ancestors into Eurasia well before 2.6 mya, which marks the beginning of the Pleistocene and Oldowan culture in East Africa. To find out why they were already cognitively capable of such cultural adaptations to environmental change, we must revisit the Mid-Pliocene Laetolians and discuss my social trackways theory, but for reasons of space I will be brief.

The Social Trackways Theory

Instead of being psychologically constrained to the here-and-now, humans have the unique ability to disengage from the external world and turn our thoughts inwards, to that which we find personally-significant. Through mental simulation of our past, future, and the minds of others, we travel far beyond the observable…. (Andrews-Hanna 2012, p. 12)

In a nutshell, then, the social trackways theory explains how the evolution of this “unique ability” was triggered. All other terrestrial animals use scent trails to find unseen, unheard targets, including other conspecifics: recognizable scents are their major socio-ecological markers, used every day in every way. Originally we were shoreline foragers within closed forests, with a relatively weak sense of smell compared to other animals, but we were excellent visual pattern readers, like other apes. When we became non-tree-nesting obligate bipeds in and around open wetland/savannahs our noses were way above ground level, but our own and others’ readily recognizable patterns [Footnote 17] of footprints were more easily seen, and were ubiquitously recorded on shoreline beaches, and so became our unique social markers.

Trackways and footprints are depictive and indexical, far more decoupled from their referent than scent-trails. Scents are iconical and highly coupled, for they are just “chemical bits” of their authors left behind, and only last a couple of hours or so—which is why dogs so obsessively urinate on scent posts. Dogs can recognize their own scent (Bekoff 2001) but still get lost in unfamiliar environments, since they are not cognitively geared to follow any scent-trail that gets weaker. Furthermore, any scent is immediately wiped out by another stronger scent, a bit of rain, or just heavy dew. The upshot is much more durable combinations of trackways and changing patterns of footprints have a narratively generative structure, like symbolic sign systems of gestural and spoken languages. Hence our own recognizable trackways were behaviorally self-reflecting, recording what we had been doing, very often for as much as three or four days in the past, in reasonably settled weather conditions.

Therefore, due to daily embodied reiteration of their own and other conspecific trackways for bipedal safety and as recognizable wayfinding markers for socio-ecological navigation, incrementally our Mid-Pliocene ancestors began to acquire narrative minds. In other words, they were gaining a cognitive capacity for episodic personal memories and episodic future simulations, and therefore began autobiographically or narratively representing themselves as intentional agents continuously travelling from the past into the future. This unprecedented level of spatially and temporally projecting self-awareness manifested cognitively as extra theory of mind and mental space-and-time-travel capacities, which enabled intentional or conscious, top-down executive adjustment of past behaviors for the sake of achieving better ways of doing things in the future. Future-directed, more cooperative cultural adaptations began to increase fitness in all domains, thus creating powerful selective feedback for further entrenchment for these cognitive capacities.

To reiterate: when we became non-tree-nesting obligate bipeds, losing the use of one limb entailed extremely grave consequences. So, not only could we always see our feet pointing into the future (unlike quadrupeds), we had to be continually aware of where we were going to put them in that future. Furthermore, unlike scent trails, footprints are immediately directional: besides showing where their authors were in the past, they “point” to where they might be in the future. Being much longer lasting than scent trails, they are only useful for quickly finding targeted agents when you can ascertain how old they are; hence the tracking mind gets continually drawn into mentally representing other contexts (such as weather conditions when the prints were made) and envelopes of time—which obviously requires a cognitive capacity for mental space and time travel.

For example, imagine following your own trail of footprints made a couple of days before when fishing for trout in a river, because there is no marked-out trail, and you got there quite easily that way last time. As you do this, the visual patterns they make will remind you of a series of episodic, personally experienced events: where you stopped to make a few casts at an uninterested big fish, where you picked a few berries, then ran into the tracks of another fisherman who had got to the next pool just before you, so you turned around and went home early, feeling very disappointed. Put together, these episodic memories will form a short but autobiographical “day in the life of” narrative in your mind—that is, while you are following your old trackways, you are imaginatively projecting your present self into the past. Today you have purposefully got to the river much earlier, the big fish was still uninterested, but in the here-and-now you can see there are no fresh tracks of other fishermen at the next pool, so you know no one has fished it since you were last there. Silently sneaking up to the river’s edge, you are feeling excited, and imagining catching a fish in the future.

So there are very natural connections between using tracks for navigation and ecological inference: recalling episodic memories of personal experiences and using them to imagine future action sequences and outcomes, to guide exploratory behavior. The other side of this cognitive coin of autobiographical, episodic personal memory and future simulation is a capacity for self-inhibition in the present for the sake of future outcomes, something human toddlers and chimpanzees find extremely difficult. Here we can see possibilities for overtly intentional social cooperation in domains such as foraging and communication, not to mention the beginnings of reciprocity in provisioning, as well as more complex technologies, for innovations in social and technical (see Footnote 18) life require exploratory behavior guided by both imagination and inhibition.

The next section is focused on consciously (see Footnote 19) intentional navigation into and out of highly changeable and visually opaque wetlands, and exploratory foraging in new territories. We will then turn to the intimately related overtly intentional or “ostensive” (showing outwardly) transmission of displaced, elsewhere-and-when socio-ecological information to other minds—and the ensuing evolution of the first protolanguages, using more symbolic or conventional, culturally normative (agreed upon) signs.

Consciously Intentional Navigation or Orienteering

Orienteering is about continually figuring out where one is in the environment in relation to landmarks seen on or towards the surrounding horizon during planned excursions. The cognitive process is very dynamic, but quickly becomes automatized when we are youngsters. It requires self-projectively taking, in a continuous turn-and-turnabout manner, the first-party agential and third-party observational perspectives (Paul 2017), thus maintaining a “bird’s eye” global perspective on one’s progress through any landscape during the time envelope of any intentional journey. The switching agential and observational perspectives are also reiteratively mentally projected backwards and forwards between the imagined end point and the remembered starting point.

Basically, we “pretend” to be on a mountain top or tall building looking—that is, taking a line of sight—back to where we are now; then we imagine being in a position further along the way, and what that landmark will look like from our agential perspective when we get to that imagined third-person position in the landscape ahead of us, at some point in the future. Orienteering is wayfinding by continually switching perspectives up and down and/or sideways from imaginary third-party observational positions on visible landmarks on the lateral horizons (if any) back to where one is in the agential first-party present, physically and mentally awake and grounded in the subjectively embodied or sentient perspective of the self. In modern orienteering parlance, the former is called allocentric reckoning and the latter egocentric reckoning.

This is always the cognitive modus operandi when one is not following a previously marked-out trail, and can be very stressful when lost, or just not quite sure where one is—especially if night is falling. This is an important insight we will discuss further in the next section. Here the take-home message is that consciously intentional navigation would not be possible without some capacity for spatial and temporal imaginary self-projection—that is, mental space and time travel. Orienteering skills are especially crucial when traversing places where we must temporarily change our direction of travel, due to barriers such as unfordable rivers and unclimbable cliffs. In vast featureless flatlands with no visible landmarks on the far horizon, or in visually opaque environments covered in more than head-high identical vegetation (like papyrus sedges), one can only use the sun as a shifting directional landmark to maintain travel in a reasonably straight direction.

Importantly, if the sun is not visible due to heavy clouds or fog, in such environments even for modern hunters (when hunting ducks, for example) one’s own old footprints naturally become trail markers when homeward bound—if you get lost you can at least backtrack to where you started. I think this visual opaqueness of vast wetlands is the other major reason (besides maintaining bipedal safety) why reiterating our own and other band members’ recognizable footprints to and from home base became so habitual for our earliest fully bipedal, wetland-foraging ancestors. We need to remember here that our Mid-Pliocene forbears were at the very beginning of their cognitive evolutionary journey towards our modern brains and minds. They did not yet possess enough neural storage for topographical episodic memories, so could not find their way into new or recently transformed environments (wetlands and waterways are very changeable due to fluctuations in water levels) and back out again without continually reiterating their own recognizable trackways.

Taking notice of their own and other conspecifics’ trails of footprints became a major part of their culture, their daily lifeways, for two other reasons: firstly, being highly social, alloparenting primates, they were always interested in where band members (and strangers) were and what they were up to; secondly, doing so enabled optimal extractive foraging in previously unharvested territory—simply because their old footprints could show them where they (or someone else, as in my fishing story) had already been. In turn, this daily practice triggered the evolution of mental space and time travel, an incrementally increasing, overtly intentional capacity to explore new territory and find one’s way back home, then remember how to get back to newfound rich resource sites, thus incurring powerful selective feedback.

Since this capacity for more efficient foraging depended on episodic memories and future simulations of topographical and ecological features encountered during exploratory journeys, there was selection for more neural storage—that is, encephalization, a subject we will turn to later when we discuss evidence for our model. In the meantime, there was a simple way to decrease the cognitive load on memory during foraging excursions.

Emergence of Conventional Trail-Marking Signs

Any environment covered in thick, unchanging vegetation can become hard to recognize when returning later in the day, due to different angles and intensity of light and shadow as the sun changes position in the sky, or a sudden switch to cloudy conditions. In addition, one’s outgoing footprints can be wiped out by rising water levels or sandstorms, or not detectable on rocky substrates. Returning home at nightfall carrying bloody meat that is highly attractive to large predators is not a time in any foraging/provisioning excursion when one can afford to get temporarily lost, so modern hunters and gatherers will very often mark their trail in some way on their outward journey, to make homeward travel less cognitively demanding. And permanent trail marking is the norm in difficult country (Gatty 1998).

Trail marking, like orienteering, is the result of an introspective ostensive (consciously intentional) discourse between one’s present self (first-party/agential perspective) and elsewhere-and-when future self (third-party/observational perspective)—as well as the imagined future agential selves of other band members. This trail-marking behavior was probably reinforced by lucky accidents (for example, our own footprints) early in our cognitive evolution, but to be truly useful, the markers must be deliberately positioned to be visible from everyone’s outgoing and returning agential perspective. I think this cultural behavior of marking important trails, and making cache sites (for stone tools, say) easier to find, was a natural development from using our own more ephemeral footprints to find our way to and from the safety of the band. Furthermore, shapes of the first trail marks and the way they were laid out were probably influenced by footprints and trackways.

The most ubiquitous indexical “unnatural” sign used in modern human culture is the arrow sign. Easily drawn with a stick in softer substrates, or constructed on a flat surface with three sticks or a few small stones, it is extremely similar in shape to the most common and easily discerned footprints in nature: those of all cloven-hoofed mammals. Other markers are the “I” (as in trail-marking poles) or “X”, used to mark an important position in the landscape. Making these conventional “landmarks” reduced the cognitive load on everyone’s memory (Sterelny 2003), and I think this cultural practice constituted the earliest form of overtly intentional discourse that used unnatural signs. I also suspect the invention of these conventional signs scaffolded our evolutionary trajectory towards symbolic sign languages, which are essentially conventional, “agreed-upon” depictions in the air.

Green branches of shrubs and stems of sturdy sedges bent over to point to necessary changes in direction would mark out trails in areas covered in more than head-high vegetation. In very open arid landscapes small piles of stones would suffice, with arrows made of sticks or stones indicating significant sites or directional change. Crossed sticks stuck in the ground on a natural game trail or path through thick brush could be used to indicate a no-go zone, and a “trackway” of longer poles dotted across swampy terrain could signal a firm surface to walk on. These ostensive or overtly intentional signs are all geared to the context of the environment: they are an example of pragmatically coherent schematic discourse between past producers and future interpreters, using depictive signs as clues. Furthermore, as a conventional sign system they already have a simple syntactical structure, like the depictive natural sign system of footprints and trackways.

Trackways Reading and Interpreting the First Proto-Symbolic Languages

Producing and interpreting these first conventional signs required our burgeoning capacities for self-projective mind reading and mental space and time travel, and the whole cognitive process of intentionally using the natural signs of footprints and other traces to find the targeted agent that made them is based on similar self-projection into other bodies and minds in other spaces and times. A human tracker is continually simulating the targeted animal’s first-party agential perspective in the present while she is reading what the animal was up to when it was where she now is at that moment of trackway reading. Through this self-projective cognitive procedure, she can continuously read what state of mind the animal was in from the distance between footprints and the shape of the trackway. Short gaps between prints and a wandering trackway indicate that the animal was stopping to forage every few paces; large gaps and a straight trackway mean it was alarmed and moving fast—in which case it is advisable to stop tracking that animal.

I argue that the coevolution of this unique self-projective reading of each other’s trackways and production and interpretation of depictive trail-marking signs selectively scaffolded the development of our self-and-other-aware capacities for overtly intentional or ostensive discourse. Other powerful selective feedback loops were caused by increases in social complexity (co-operative punishment/ostracism of repeated offenders, for instance) when we started to live with a sense of time, of course, but there is no space here for an in-depth discussion. Our first plans and stories were probably communicated using both mimetic reenactment (like pretend play) and gestural depictive signs for indicating spatial direction and past and future tense (consider pointing, which wild apes cannot comprehend), with vocal mimicry of animal calls and intonations for emphasis, plus graphic depictions on suitable surfaces (like aboriginal sand drawings). Increasingly conventional depictions such as simple maps (consider circles for waterholes and a trail of dots resembling a hominin trackway) incrementally became gestural signs depicted in the air.

Crucially, to be able to tell a story one must have a story in mind; hence our capacity for discourse depends on our cognitive capacity to retain memories. Developmentally our semantic/episodic memory capacities for schematic/narrative discourse are physically linked to our rapidly expanding neural storage capacity when very young, and our peak of encephalization (brain volume relative to body size) occurs around four years of age. Childhood amnesia then ends as the autobiographical narrative, mimetic pretend play, and auto-rehearsal normally begin, along with the ability to relate our first coherent plans and narratives. So, if we accept that ontogeny often recapitulates phylogeny, then encephalization is reasonably good evidence for protolanguages.

At present the first evidence of encephalization is the skull of Homo rudolfensis, dated 2.3 mya. I therefore suspect the first overtly intentional trackways followers, orienteering navigators, conventional trail markers, and mimetic/depictive storytellers were the 2.8 mya Homo species that colonized Northwest India before 2.6 mya, and I predict that if we ever find any of their skulls, they will turn out to be a bit larger than that of 3.4 mya Kenyanthropus, our first stone toolmaker and carver of large mammals.

The First Overtly Intentional 2.8–2.6 Mya Hunter-Gatherers

In my model the advent of the first true hunter-gatherers with proto-symbolic communication was chronologically a million years earlier than envisaged by most other evolutionary theorists, for two reasons. The first is the Late Pliocene evidence of colonization of Northwest India combined with the fact that smaller-brained Dmanisi hominins in Georgia could sustain old-aged toothless individuals (Lordkipanidze et al. 2005). This evidence indirectly indicates that some modern hunter-gatherer-style, overtly intentional divisions of labor and provisioning/sharing of resources were well established by the Late Pliocene immigrations. The second reason coalesces around the following two considerations.

The first is the fact that modern hunters and gatherers are far more focused on learning about changes in the topography, weather patterns, and general ecology of their territories, the movements and condition of targeted species, and the psychology as well as skills of fellow band members (viewed as potential foraging partners) than on developing new technologies (Lee and DeVore 1968). The second is that the 2.3 mya evidence of encephalization is associated with very little technological advancement, since the ability to carve up large mammals and make stone tools emerged as early as 3.4 mya. What we need is more archaeological evidence of forward planning that requires mental time travel and ostensive cooperative communication, and I think there is a great deal of this very early evidence that has always constituted one of the major puzzles in paleoarchaeology.

I am referring to the numerous so-called “palimpsest sites” where there are lots of stone tools found with numerous bones of large mammals. The puzzle is this: very few (if any) of the associated bones show cut marks, while most of them bear the toothmarks of carnivores. The clue here is they were all situated in the shallow waters of shorelines and feeder streams of paleo-lakes (Plummer 2004; de la Torre 2016). Furthermore, the bones and tools are in the paleo-streambed, never on surrounding banks (Organista et al. 2017). There is a very simple explanation for these sites that strongly indicates forward planning in foraging practices: they were baiting stations for small omnivorous/carnivorous aquatic and semiaquatic fauna such as catfish (see Footnote 20), freshwater crayfish, eels, small crocodiles, turtles, and cane rats. That is why the remains of carnivore kills are so prevalent—it did not matter how old and putrefied bones were, only that they were reasonably nearby and had some meat scraps left on them.

In short, unlike many other paleo-theorists I argue our Late Pliocene/Early Pleistocene ancestors were not scavenging their meat from the kills of other predators; rather, they were using the skeletal remains of those kills as bait to attract the creatures they preferred to eat within range of their hands, clubs, and simple wooden spears. The size and lengthy chronological continuity of many of these palimpsest sites show they were frequently used, relatively permanent hubs of activity; paleo-sites where regular butchering is undoubtedly evidenced are very rare. However, there are also several “one-off” sites where very large mammals such as hippos and elephants were butchered where they were mired in swamps: their articulated still-vertical lower limbs are found in situ with stone tools nearby.

I think an easy way to cooperatively obtain large animals was to gather together in groups to intentionally herd them into known miring spots (natural traps of mud and quicksand mentioned previously) in a synchronized manner, club them to death or cut their throats when they were exhausted, then carve them up in an organized, communally self-and-other-aware fashion, like sports teams. After all, as bipedal stone-throwing, club-wielding waders we had every advantage over large quadrupedal herbivores in such situations. But I suspect that early in our evolution such practices were at first opportunistic and incrementally became overtly intentional—and prevalent only when preferred aquatic/semiaquatic resources were in short supply (after yet another fall of volcanic ash, say).

It is becoming widely accepted that our Early Pleistocene ancestors were waterside foragers obtaining very high quality shoreline faunal resources (such as shellfish, catfish, reptile and bird eggs, frogs) containing the fats and minerals necessary for healthy brain growth (Joordens et al. 2014; Stewart 2014; Russon et al. 2014: from a special issue of the Journal of Human Evolution). The only other equally rich source of these high-quality nutrients is the marrow and brains of large mammals, and there is plenty of evidence of skulls and bones smashed open with stones for their contents.

The upshot is I think our 2.8 mya ancestors were overtly intentional, seminomadic wetland/savannah explorers, with an economy built on wide-ranging extractive provisioning based on cooperative planned miring of the odd large mammal and the routine baiting of small aquatic/semiaquatic animals. This is nearly a million years before the 1.9 mya Erectines, who by then possessed brains double the size of chimpanzees, yet were still using standard Oldowan stone tools. The skull and teeth of 3.4 mya stone toolmaking Kenyanthropus are extremely similar in shape and structure to those of 2.3 mya H. rudolfensis (Lieberman 2001); in fact, the only obvious difference is the expansion of the skull of the latter. In terms of technology, the only discernible difference is that Oldowan stone flakes are being knapped rather than shaped using other large stones as anvils, as was the case at Lomekwi at 3.3 mya (Harmand et al. 2015).

In sum, then, I argue the extra neural storage exhibited by more encephalized 2.3 mya Homo rudolfensis was required for the ever-increasing amounts of useful socio-ecological information being gleaned from our own and other animal trackways, and all other signs, by our incrementally expanding, self-projecting schematic/narrative brains. Many of those signs were intentionally produced by other band members, and becoming increasingly unnatural or conventional. I think they incrementally became depictive hand signs in the air, probably culminating in the full establishment of symbolic sign languages by 1.6 mya, the beginning of the Acheulian era, and the full colonization of all warmer parts of the Old World. See Fig. 2 for a timeline of the major turning points in our evolutionary trajectory.

Fig. 2 Chronology and evidence of major developments in human evolution and associated environmental changes between 11.0 and 1.5 mya. Our timeline stops at 1.5 mya because very recent forensic evidence (Hlubik et al. 2017) shows fire was being centrally exploited over longish periods at that date. This cultural turning point led to a final surge in encephalization and spoken languages.

Figure 2 makes it clear that our ancient omnivorous lineage became fully bipedal, alloparenting social breeders in an incremental manner, starting from as early as 11 mya. Furthermore, our evolutionary trajectory towards exploiting our own trackways and our ensuing cognitive transition was incrementally driven by our originally omnivorous phenotypic plasticity interacting with a series of truly catastrophic and semipermanent environmental changes—and most of them appear to have been tectonic in origin.

But importantly I would also add, similarly to Anton Killin (2016), the possibility of very early, increasingly intentional vocalizations, such as ritualized or entrained communal chanting of mimetic sounds (like hunters imitating the calls of prey animals to entice them into approaching). As he points out, they were probably combined with the synchronized tapping of stones and sticks, and mimetic/gestural reenactment (“dancing/acting out”) of simple narratives. I think those first mimetic/depictive stories were mostly about important elsewhere-and-when participant/events and significant landmarks encountered during our planned quests or journeys. They would also have contained important socio-ecological information (Sugiyama 2001) for younger band members, in the guise of group knowledge or “folk biology” gleaned from the trackways of conspecifics, dangerous predators, and targeted prey animals. Drawing representations of their prints in the sand, miming their characteristic actions, and reproducing their vocalizations would have enabled telling these stories.

These mimetic modalities are still used in modern hunter-gatherer societies, because they add dramatic flavor, which makes storytelling more entertaining. Importantly, storytelling combined with chanting and music would increasingly become valued for entertainment as well as for transmitting socio-ecological information, especially when we acquired the habitual use of fire (e.g., Killin 2016). In fact, new ethnographic evidence has shown good storytellers have higher reproductive success, even more than expert hunters. In addition, bands with more skilled storytellers remain more cooperative and egalitarian during times of stress (Smith et al. 2017, in press; cited in Boyd 2017).

Hence better ways of communicating our socio-ecological stories, and the encephalization needed for more neural storage for remembering those stories, would have been selected for at both group and individual levels. I have ended my timeline with the advent of fire control, because the aforementioned selective feedback loops would certainly be amped up by the extra hours of “downtime” created by centralized fireside sociality. Other powerful selective feedback loops, of course, were created by encephalization itself. Obviously, increasingly larger skulls created difficulties giving birth, which caused selection for shorter gestation and longer childhood dependency, which demanded more cooperative alloparenting, especially from older females and juveniles. More provisioning would also have been required, which in turn demanded more efficient exploratory foraging practices.

With regard to the transition to linguistic communication, it is very likely that when spoken (and sung or whistled; see Footnote 21) utterances began to be combined with or substituted for the conventional handsigns of fully established Acheulian gestural languages around the fire at night, the invention of spoken languages was relatively quick. Unsurprisingly then, given the remarkable mnemonic efficiency of spoken words and phrases when used as conventional signs (names or “trail markers”) for important places, characters, and action-events, and the extra, more dependable nutrition available from cooked food and dried/smoke-preserved animal flesh, the final surge in encephalization in our lineage was both relatively large and of short duration. In fact, after the advent of full control of fire around 790 kya (Attwell et al. 2015 for review), the above powerful coevolutionary selective processes caused fully one-third of the encephalization that began at around 2.8 mya, resulting in neandertalis and sapiens by 300 kya (fossil evidence of sapiens is now dated 300 kya; Hublin et al. 2017).

Conclusions

The social trackways evolanguage model is robustly plausible because it is narratively coherent as well as sensitive to all the available paleo-archaeological evidence. Theoretically speaking, it is an orthodox niche-construction, gene/culture coevolutionary explanation for the incremental evolution of: (1) our unique possession of an autobiographical or narrative self-projecting awareness, manifesting as extra theory of mind and mental space and time travel; and (2) our resulting capacities for overtly intentional exploratory navigation and coherent, pragmatically cooperative schematic/narrative discourse, using our original mimetic/gestural natural modalities and more recent unnatural (culturally or normatively invented) syntactically structured sign systems built out of symbolic depictions, hand-signs, and sung, whistled, and spoken utterances.

The last sentence of my second introductory paragraph stated my evolanguage model dovetailed nicely with the views of Francesco Ferretti and his colleagues, so to fully maintain the schematic/narrative coherence of our discourse, they should have the next-to-last words:

At the basis of our hypothesis is the concept that the narrative foundation of language and its proto-discursive origin are closely related to the functioning of cognitive systems that allow individuals to identify a goal to move toward as well as to construct and keep the correct route in order to reach it. In other words, both the actual functioning of language and its evolutionary roots rely on processing devices governing navigation in space and time. (Ferretti 2014, p. 243)

In sum, then, psychologically the human mind just is a uniquely self-projecting, narratively tracking, wayfinding mind—and socially we view our whole lives, our communities, our everyday routines, and our conversations as overtly explorative and cooperative “what if” narrative journeys through space and time.