Research on the evolution of language is often framed in terms of sharp discontinuities in syntax and semantics between animal communication systems and human language as we know them. In recent years, several researchers of animal communication have advocated adopting a “pragmatics-first” approach to the origins of human language (henceforth “a pragmatics-first approach”). They have argued that our understanding of potentially telling continuities between linguistic and animal communication can be greatly enhanced by focusing on pragmatic – as opposed to semantic or syntactic – aspects of animal communication. Shifting focus to pragmatic aspects of animal communication would seem especially apt in light of the fact that some theorists have argued for specific pragmatic discontinuities between animal and human communication, side by side with widely acknowledged syntactic and semantic ones (e.g. Origgi and Sperber 2000, 2010; Tomasello 2008; Scott-Phillips 2015). However, proponents of pragmatics-first approaches have implicitly relied on two different conceptions of pragmatics – one “Carnapian”, the other “Gricean”. In Section 1, following Bar-On and Moore (2017) and Bar-On (2021), I distinguish the two conceptions and explain why neither conception can serve the purposes of those seeking to establish significant pragmatic continuities between animal and linguistic communication. In Section 2, I turn to a recent influential formal “semantic-pragmatic” analysis of monkey calls, due to Philippe Schlenker et al. (e.g. 2014, 2016a, b), which appears to improve on both Carnapian and Gricean analyses in certain ways. I argue that, appearances to the contrary, as it stands, this analysis cannot serve the purposes of pragmatics-first approaches any better than Carnapian analyses. This motivates the need for a conception of pragmatics that is intermediary between the Carnapian and the Gricean ones.
In Section 3, I illustrate the scope of this conception and its potential to illuminate pragmatic origins of linguistic communication, drawing on recent discussions of chimpanzee communication – both gestural and vocal. I conclude by briefly indicating what aspects of extant monkey call communication could potentially count as pragmatic according to intermediary pragmatics.

1 Two Conceptions of Pragmatics

The discussion in this section elaborates, with more care, the distinction between two conceptions of pragmatics introduced in Bar-On and Moore (2017) and Bar-On (2021), drawing on some recent discussions in the philosophy of language.

In their (2012) paper, after detailing various asymmetries between primates’ production of calls, on the one hand, and the interpretation of calls by receivers, on the other, Wheeler and Fischer conclude:

[A]ny continuities or parallels that exist between communication systems of humans and our extant primate relatives reside not in the ability of signal producers to transmit symbolically encoded information, but in the flexible, learned responses of receivers. (2012: 1990).

Primate call receivers, Wheeler and Fischer think, appear to engage in relatively sophisticated, context-dependent inferences about call significance. By contrast, primate call producers appear to have little voluntary control over their vocalizations. (Compare e.g. Fitch 2010: 189 f., Tomasello 2008: 16 f., and Cheney and Seyfarth 2003.) Wheeler and Fischer therefore suggest that those interested in the origins of human linguistic communication should shift their focus from semantic, quasi-symbolic aspects of calls (aka “functional reference”) to “pragmatics, the field of linguistics that examines the role of context in shaping the meaning of linguistic utterances” (2012: 203). Arnold and Zuberbühler likewise recommend adopting “a pragmatics approach to exploring how primates extract information from … highly ambiguous, though discrete, signals” (2013: 2). Similar considerations have also motivated Seyfarth and Cheney (2017) to turn to “sophisticated pragmatic inference” as the “foundation” upon which language was built (2017: 340), claiming that “animal communication constitutes a rich pragmatic system” and that “the ubiquity of pragmatics, combined with the relative scarcity of semantics and syntax, suggest that, as language evolved, semantics and syntax were built upon a foundation of sophisticated pragmatic inference” (op. cit., p. 340, emphases added).

However, to evaluate these or other authors’ “pragmatics-first” proposals, we must be clear on what aspects of animal communication they would count as pragmatic. Philosophical work on pragmatics can help us gain some clarity.

1.1 Carnapian Pragmatics

On a standard understanding (e.g. Bach 2004), semantics and pragmatics cover mutually exclusive domains. Semantic properties belong to forms as paired with meanings and are had by types of sentences (or signals) “independently of anybody’s act of [producing] them”, whereas pragmatic properties pertain to meaningful utterances as issued by producers in particular situations and interpreted by their audience (op. cit., 27 f.). Thus understood, semantic properties (like phonological and syntactic properties) are context-independent, whereas pragmatic properties are essentially context-dependent. However, as observed by Carnap (1942), when it comes to the use of natural languages, obtaining mappings between utterance types and their truth-conditions to generate “the proposition expressed” almost always requires fixing certain contextual parameters. For example, the truth of the sentence “It is raining” will depend (in part, of course) on the time and place of the utterance of the sentence (it may be true if uttered in London right now but false if uttered in New York, and so on). The truth of the sentence “I am hungry” will (in addition) depend on the identity of the speaker; it may be true if uttered by John before breakfast, false if uttered by Mary after lunch. And the truth of “You are late” will likewise partly depend on who “you” refers to in the context. Thus,

Carnapian Pragmatics:

The study of the variation of the truth-conditional contents of sentence (including signal) types with elements of the objective context of production (such as time, place, speaker/hearer).
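
The definition can be illustrated with a minimal sketch in Python, offered only as a toy model and not as part of Carnap's or Bach's apparatus. It treats the evaluation of a sentence type as a function of a narrow context alone. The class `NarrowContext`, the function `is_true`, and the fact tables `HUNGRY` and `RAINING` are all hypothetical names introduced for illustration; the facts simply mirror the examples given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NarrowContext:
    """Objective facts about an utterance; no speaker intentions are recorded."""
    speaker: str
    time: str
    place: str


# Hypothetical toy facts, chosen only to mirror the examples in the text.
HUNGRY = {("John", "before breakfast")}
RAINING = {("London", "now")}


def is_true(sentence_type: str, ctx: NarrowContext) -> bool:
    """Evaluate a sentence type relative to a narrow context."""
    if sentence_type == "I am hungry":
        # "I" is resolved to the speaker; the tense is resolved to the time.
        return (ctx.speaker, ctx.time) in HUNGRY
    if sentence_type == "It is raining":
        # Place and time are read off the context of utterance.
        return (ctx.place, ctx.time) in RAINING
    raise ValueError(f"no rule for sentence type: {sentence_type!r}")


john = NarrowContext(speaker="John", time="before breakfast", place="London")
mary = NarrowContext(speaker="Mary", time="after lunch", place="London")
print(is_true("I am hungry", john))  # True
print(is_true("I am hungry", mary))  # False
```

The same sentence type receives different truth values in the two contexts, and nothing in the context record represents what the speaker intends to communicate.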

The context-dependence relevant to Carnapian pragmatics is of a rather limited sort. It pertains to what is sometimes described as “narrow” context (Bach 1997), which “concerns information specifically relevant to determining the semantic values of indexicals and is limited to a short list of contextual parameters” – “objective facts”, such as “where and when the utterance takes place”, as well as the identity of the speaker or the hearer (Recanati 2002: 111). What is crucial about narrow (or “indexical”) context-dependence for present purposes is that it is independent of what – if anything – signal producers intend to communicate or are taken by receivers to intend; it is in this sense entirely intention-independent. Intention-independent context-dependence is ubiquitous. It is exhibited not only by linguistic utterances (as we saw above) but also by, e.g., baboon social grunts (Seyfarth and Cheney 2017) and by monkey calls (Fischer and Wheeler 2012), insofar as such signals are produced by particular individual signalers or addressed to particular receivers, at particular times, in particular places. Likewise for various vocalizations by non-primates (birds, prairie dogs, and suricates, among others), as well as a wide variety of insect signals – which are paradigm cases of code-like communication. A bee dance signals to observer bees the presence of nectar at a specific direction and distance from where the dance is performed, among other things. The loud high-pitched song belted out by a male cicada (or a mating flash produced by a firefly) signals that the producer of the song is ready to mate at this time. (See inter alia Maynard Smith and Harper 2003, Oller and Griebel 2008, and Fitch 2010: Ch. 4.)

Importantly, purely indexical, narrow context-dependence is exhibited not only by communicative signals (whether linguistic or nonlinguistic) but also by non-communicative (or so-called “natural”) signs (see below). For example, a deer track will signify the recent presence of deer at a particular point in a given path; tree rings indicate the age of this tree at a given time, and certain kinds of sneezes signify a virus that a particular individual has at the time of the sneeze; and so on. (Cf. Millikan 1984: 39–49, 116–117; 1995: 190; 2017: Ch. 11.) From the perspective of Carnapian pragmatics, in all these cases, receivers’ uptake of signals or signs can be seen as (narrowly) context-dependent. Thus, Carnapian context-dependence is not only intention-independent; it also has nothing specifically to do with communication.

Now, as mentioned earlier, some proponents of the pragmatics-first approach hope to make progress in understanding the evolution of language by shifting focus from the (allegedly) rigid production of calls to their seemingly more flexible, context-dependent interpretation by receivers (Wheeler and Fischer 2012: 199). However, if the context-dependence exhibited in primate call communication is limited to Carnapian (narrow) context-dependence, which is entirely ubiquitous, then this is simply insufficient for establishing any significant “continuities” or “parallels” between human and nonhuman communication. As noted above, all animals exhibit sensitivity to narrow context in their interpretation of even non-communicative signs. But establishing continuities apt to shed light on the pragmatic origins of human language surely requires more than pointing to the narrow context-dependence exhibited by all interpretation of signals or signs on the part of animals.

1.2 Gricean Pragmatics

In philosophy of language, the term “pragmatics” is often reserved for a rather different conception, due to Grice (1957, 1968). As is well known, Grice maintained that, in sharp contrast with natural signs, such as dark clouds, or rings on a tree trunk, or deer tracks, linguistic utterances are endowed with speaker meaning: they are produced by speakers who intend to communicate particular messages and rely on their audience’s drawing rational inferences about the very communicative intentions with which the utterances are produced. Grice’s specific analysis of speaker meaning has come under much scrutiny and criticism. But a key idea informing his view has been retained by many contemporary followers, who take human communication essentially to involve rational, metarepresentational understanding of communicators’ mental states (a “theory of mind”). As one author puts it (following Sperber and Wilson 1986/1995), linguistic utterances are designed “to make evident to an addressee the intention to make some thought(s) manifest to [the addressee]” (Carston 2015: 454), relying on hearers’ ability to make inferences about the speakers’ intentions, or “read their mind”. This idea – that linguistic communication is ostensive-inferential – is at the heart of what I refer to as the Gricean conception of pragmatics.

Gricean pragmatics:

The study of rationally evaluable communicative utterances issued by producers overtly (“ostensively”) and inferentially understood as such by their “mind-reading” interpreters.

In sharp contrast with Carnapian context-dependence, Gricean context-dependence is not merely intention-dependent; it is mind-reading-dependent. Gricean interpreters do not simply interpret utterances as communicative acts that are produced intentionally. As just explained, beginning with what a sentence produced in a given context means, they engage in inferential derivation of the speaker’s meaning – what the speaker is overtly (“ostensively”) trying to get across to them. In a way, then, Gricean pragmatic interpretation begins where Carnapian pragmatic interpretation leaves off. The output of the latter provides the starting point, or input, to the former. (See below, 2.1.) For a linguistic example, if a speaker says: “I am hungry”, a Carnapian interpreter will derive the proposition that it is the producer of the utterance who is hungry at the time of the utterance (as mandated by the relevant conventional linguistic rules). A Gricean interpreter, by contrast, will take it that the speaker intends to get some specific message across to them and may go on to derive what the speaker meant in producing the utterance, as the best explanation of why the speaker produced a sentence with that content (e.g., that she would like something to eat). What I here refer to as Gricean pragmatics thus has a proprietary and very limited domain: it only covers communicative interactions that require Gricean mind-reading, as involved, specifically, in ostensive-inferential communication. Gricean mind-reading, on this construal, is rather demanding: it requires speakers and hearers to have a conceptual understanding of mental states and make attributions thereof to each other.

What is the relevance of Gricean pragmatics to the interests of pragmatics-first approaches to the evolution of language? Proponents of a Gricean pragmatics-first approach would have it that only forms of animal communication that fall under the scope of Gricean pragmatics could possibly have bearing on our understanding of the origins of linguistic communication. Clearly, from the sheer fact that animal receivers often extract narrow contextual information from signals it does not follow that their doing so depends on their employment of Gricean mind-reading capacities. After all, as already mentioned, animals extract such information even from non-communicative natural signs, which – by hypothesis – in no way depend on communicative intentions or the attribution thereof. With that in mind, a proponent of a Gricean pragmatics-first approach might be highly skeptical of the relevance of, specifically, primate calls, to the evolution of linguistic communication – unless it could be shown that primates’ communication via calls in some way paves the way for ostensive-inferential communication.

Several researchers interested in the evolution of language have explicitly advocated a Gricean pragmatics-first approach. For example, Origgi and Sperber (2000) – who adopt the “post-Gricean” Relevance Theory expounded in Sperber and Wilson (1986/1995) – have argued that explaining the emergence of human linguistic communication requires a sharp departure from a “code model”. On a standard construal of the code model, producers issue signals that encode (context-dependent) messages – concerning the availability of food or mating opportunities, or the presence of danger, and so on, at a given time and location – which receivers then decode. Code-like communication clearly contrasts sharply with ostensive-inferential communication. Producers of coded signals issue them in direct response to environmental stimuli, without any concern for their audience; and receivers decode them without any concern for why the producers produced them. And the mechanisms for encoding and decoding signals (i.e., for pairing them with conveyed or interpreted messages) are either reflexive, automatic, or sub-personal, or else – if they involve any learning – are purely associative (e.g., Origgi and Sperber 2010; Scott-Phillips 2015: 5, 157). Crucially, the signals involved in code-like communication are issued and interpreted much like natural signs. They need not be treated as essentially communicative (or as in any way intention-dependent).

Like other proponents of the Gricean approach, Origgi and Sperber believe that all animal communication can be understood on the code model; but they think this model is entirely inadequate when it comes to explaining human linguistic communication, with its essential ostensive-inferential, mind-reading character. Given this character, they think that a form of non-human communication could shed light on the emergence of linguistic communication only if it could help explain the origins of the latter’s ostensive-inferential character. So they conclude that “language as we know it developed as an adaptation in a species already involved in [ostensive-]inferential communication, and therefore already capable of some serious degree of mind-reading” (2000: 159). In other words, on a (post-)Gricean pragmatics-first approach, our ancestors would have had to “go Gricean” before the emergence of language. But this idea seems highly problematic from an evolutionary perspective. The ancestral capacity for ostensive-inferential communication posited by Origgi and Sperber would be a rational capacity involving other-directed propositional communicative intentions, mentalistic metarepresentations, and a person-level concern with – and inferences about – others’ states of mind (Sperber 2000).
But then we would be facing a puzzle that seems entirely of a piece with the puzzle of language evolution itself, namely: How could a language-like psychological capacity (for recursive, propositional, and metarepresentational rational thought) have emerged where none existed before? In any event, by the (post-)Gricean standards, primate call communication, on the whole, does not remotely foreshadow ostensive-inferential communication; the interpretive context-sensitivity it exhibits appears to be of the wrong kind and can have no more direct relevance to the evolution of human language than any other forms of non-mind-reading, Carnapian contextual processing of signals, including natural signs. So adopting Origgi and Sperber’s view of things would mean giving up altogether on the idea that one could gain any specific insight into the origins of human communication by investigating primate vocal communication.

Some post-Griceans attempt to address this worry by claiming that Gricean communication is not very cognitively demanding, appearances to the contrary. Indeed, they are prepared to accept that not only very young children but even many existing animals could qualify as at least “minimally” Gricean communicators. (See e.g. Scott-Phillips 2015; Moore 2017 and 2018; Heintz and Scott-Phillips 2022.) My worry about minimally Gricean views is, very briefly, that the pragmatic capacities they credit existing nonhuman animals with may not really be minimal, insofar as they involve communicators’ possession of concepts of mental states, the ability to think about and have an interest in others’ mental states, and so on – that is, at least some form of Gricean mind-reading. Alternatively, if the posited capacities are indeed minimal – because, for example, the nonhuman producers are not said to engage in person-level rational-cooperative-ostensive production and their audience is not said to engage in rational-inferential interpretation – then it is not clear why we should think of them as already even minimally Gricean communicators.

1.3 “Intermediary” Pragmatics?

The Gricean and the Carnapian conceptions clearly have different implications for the explanatory relevance of behaviors such as primate alarm calls to the study of language evolution. Adopting a Gricean approach to pragmatics entails that a form of animal communication could only potentially help illuminate the origins of language if it exhibited mind-reading-dependent context-dependence (which primate communication via calls does not). So the Gricean approach would set the explanatory bar too high. By contrast, on a Carnapian approach, primates’ call communication would indeed be relevant to the evolution of language – simply in virtue of the (narrow) context-dependence of the interpretation of calls – but then so would all forms of animals’ context-dependent interpretation, including their interpretation of non-communicative (natural) signs. Adopting this approach would, then, seem to set the explanatory bar too low. What is needed, I submit, is an intermediary approach to pragmatics: one that identifies a kind of context-dependence that goes beyond mere Carnapian indexical (narrow) context-dependence yet falls short of Gricean mind-reading-dependent context-dependence. Such an approach could potentially allow us to regard some animal communicators as exhibiting pragmatic capacities relevant to the emergence of linguistic communication, even if no animal communicators engage in ostensive-inferential communication proper.

An intermediary approach faces a certain challenge, which we may dub “the Gricean Challenge”. It must allow us to identify aspects of nonhuman communicative behaviors that could have potentially moved nonhuman animals beyond Carnapian, code-like communication and closer to ostensive-inferential communication. Some theorists of language evolution have responded to this challenge by arguing that, when searching for the origins of language, we should be focusing on primates’ gestural communication. Thus, Michael Tomasello, who often highlights the distinctively ostensive-inferential character of human linguistic communication, has argued that, to understand how things could “move in the human direction”, we must identify the origins of “the underlying psychological infrastructure of human cooperative [Gricean] communication” (2008: 19 f., emphasis added). This underlying structure, he thinks, is entirely absent from primate vocal communication, with its failure to exhibit “[l]earning, flexibility and attention to the partner” (2008: 9). But he thinks at least hints of it are present in the gestural communication of chimpanzees. For this reason, Tomasello thinks of apes’ gestural communication as “the closest thing we have to a ‘missing link’” (2008: 29) between nonhuman and human forms of communication – “the best place to look for the [pragmatic] evolutionary roots” of human communication (2008: 15).

As I read him, Tomasello thinks that it is in the gestural domain that we may be able to identify precursors of what I shall later describe as psychologically mediated communication. This type of communication relies essentially on communicators’ capacity to respond to each other’s psychological states in issuing and interpreting their communicative signals. In Section 3, I will be offering my own take on the nature of such communication, arguing – contra Tomasello – that it can have (non-Gricean) precursors not only in the domain of primate gestures but also in the vocal domain. But my more immediate aim is to offer some (indirect) support for focusing – as does Tomasello – on the psychology of communication, when trying to understand the pragmatic origins of human linguistic communication. Toward that end, I turn to an evaluation of a recent formal linguistic analysis that utilizes ostensibly pragmatic principles to capture the character of monkey alarm calls. This analysis appears to offer an understanding of the context-dependence of monkey calls that is intermediate between the Carnapian and the Gricean. However, I argue that on closer examination it fails to do so, precisely because it fails to address psychological aspects of monkey communication. Our discussion will help sharpen the challenge facing pragmatics-first approaches to the evolution of language. It will also suggest how they might address it.

2 A Formal Linguistic Analysis of Monkey Calls

In this section, I critically evaluate a recent sophisticated formal linguistic analysis of monkey alarm calls. I argue that, when it comes to the purposes of a pragmatics-first approach – specifically, the goal of uncovering pragmatic precursors of human linguistic communication – this analysis fares no better than Carnapian analyses. Insofar as the formal analysis focuses exclusively on the contextual mapping of calls onto their truth-conditions, in complete abstraction from the psychology underlying call production and interpretation, it cannot help advance our understanding of the emergence of humans’ use of linguistic utterances.

On a standard paradigm, inspired by well-known studies of Vervet monkey alarm calls, the calls of some animal species (including many non-primate species) can be associated with fairly specific meanings, to do with types of predators – e.g. eagles vs. leopards vs. snakes. Such calls have been described as “functionally referential”, because they function to single out types of predators, in a way similar to the way words that refer to specific categories (such as “eagle”, “leopard”, “snake”) function. But the calls were also described as merely functionally referential, to highlight the fact that the calls of monkeys (and other species) are unlikely to be underwritten by cognitive capacities similar to or continuous with those of human speakers. This is claimed to be evidenced by the fact that Vervet call production appears to be inflexible and non-intentional (Seyfarth and Cheney 2003: 168).

In a series of recent articles, Schlenker et al. (e.g. 2014, 2015, 2016a, b), focusing on studies of monkey species other than Vervets – such as the Campbell’s, Diana and putty-nosed monkeys – offer an alternative semantic-pragmatic (henceforth “S-P”) analysis of monkey calls. They argue that the significance of at least some monkey calls is not best understood in terms of functional reference. This is because they think the data on the contextual variability of monkey calls justify taking their truth-conditions (or informational content) to be generated through the application of certain pragmatic principles. On the S-P analysis, the calls of these monkeys have more generic meanings than those claimed to be associated with Vervet alarm calls; and these meanings become more specific only through contextual “pragmatic enrichment”.

2.1 Pragmatic Enrichment and Wide Context-Dependence

To better understand this claim, we should consider what is involved in pragmatic enrichment. In philosophy of language, “pragmatic enrichment” refers to one of several types of interpretative processes that require going well beyond identifying relevant elements of the narrow context (such as when, where, or by whom an utterance was produced; see above, 1.1). These processes are needed any time an utterance contains semantically underdeterminate – as opposed to purely indexical – expressions. Here are some standard linguistic examples:

  • demonstratives: “That guy is obnoxious” (which guy?);

  • proper names: “John is on his way” (which John?);

  • definite descriptions: “The book is on the table” (which book/table? the only book/table in the house/room?);

  • ambiguities: “My friend is throwing a ball” (a round object used in games? a big party?);

  • possessives: “Rachel’s book” (the book she owns? the one she wrote?);

  • adjectival constructions: “red pen” (a pen that is red? a pen that produces red marks?).

“Semantic underdetermination” refers to the fact that the semantic properties of the relevant expressions – their so-called “conventional meanings”, as fixed by the rules of the language – are insufficient for determining their contribution to the informational or truth-conditional content of the sentences in which they occur (=‘the proposition expressed’). So their “semantic value” “varies from occurrence to occurrence just as the semantic value of indexicals does”; but the variation is a function of more than “some objective feature of the narrow context” (Recanati 2002: 111). Determining the content of an utterance that contains semantically underdeterminate expressions requires supplementation by what Recanati refers to (again, following Bach 1997) as the “wide” context (op. cit., 112). Whereas narrow context “concerns information specifically relevant to determining the semantic values of indexicals”, wide context is said to concern “any contextual information relevant to determining the speaker’s intention…”. In other words, unlike narrow context-dependence (see 1.1), wide context-dependence is intention-dependent – at least insofar as it requires converging on the speaker’s intended referents: whom the speaker is indicating when saying “That guy is obnoxious”, which John she is thinking of, what reference class she is using when describing John as tall, which store she is talking about when saying that John went to the store, and so on. You cannot know what the speaker has said unless you settle what she has in mind. For a hearer to ‘get’ what the speaker is saying when her utterance is semantically underdeterminate, the hearer must recover what the speaker ‘has in mind’.

However, Recanati is careful to note that

[determining] what the speaker has in mind [does not necessarily] involve an inference from premises concerning what the speaker can possibly intend by his utterance. Indeed, they need not involve any [reflective] inference at all (op. cit., 113 f., first emphasis added).

In other words, when it comes to semantic underdetermination, intention-dependence is not Gricean mind-reading-dependence.

Recanati usefully distinguishes in this connection between primary and secondary interpretive pragmatic processes, and correlatively, between different roles that can be played by the wide context. Like the fixation of the semantic values of indexicals, the determination of truth-conditional content of semantically underdeterminate expressions can be accomplished through primary pragmatic processes: these processes take place before the hearer identifies a determinate proposition that has been expressed; they are “pre-propositional” (Recanati 2004: 23). For example, if I say to you “John is walking very fast”, you cannot know what I have said (what proposition I expressed) unless you know which John I’m talking about (as well as the comparison class for fast walking). But you can know that without making any (mind-reading) inferences concerning my intentions when producing the utterance – for example, by simply following my gaze when I make the comment. (By contrast, to figure out why I’ve said that this guy, John, is walking very fast, you do need to venture a hypothesis about my state of mind – the motive or intention behind my saying what I said, my speaker meaning.) Primary pragmatic processes, Recanati suggests, are similar to perception, can be associative and automatic, rather than (properly) inferential, and can “take place at the subpersonal level” (2002: 114). By contrast, secondary pragmatic processes “are those that come into play to determine what the speaker means on the basis of what he says when what the speaker means goes beyond what he says – when he implies various things by saying what he says”, Gricean conversational implicatures being a prime example (op. cit., 114, emphasis added). Secondary pragmatic processes are thus “post-propositional” and inferential in a stronger sense of being “reflective”, at least available to rational evaluation, and take place at the personal level (ibid. and compare 2004: 2.1).

Now, as Recanati construes them, processes of enrichment are primary pragmatic processes. An example that will be relevant to our evaluation of the S-P analysis below is the following. Suppose I tell you: “The book is on the table”. The word “table” has such-and-such conditions of application. It applies to all objects of a certain sort. But, to understand what I said to you, you need to restrict the application of “table” to, say, the table in the living room (Recanati refers to this as “contextual strengthening”, which is a species of pragmatic enrichment; 2004: 25).

Simplifying quite a bit, then, we get the following picture (Fig. 1):

Fig. 1. Primary vs. secondary pragmatic processes

2.2 The S-P Analysis

Suppose we accept that it is futile to seek even rudimentary instances of Gricean mind-reading-dependent context-dependence in any forms of animal communication, given its essential reliance on secondary pragmatic processes, which require a capacity for reflective inference and complex mind-reading. Recanati’s pragmatic framework (2.1) opens up the possibility that we could nevertheless find precursors of at least certain primary linguistic (even if non-Gricean) processes of contextual interpretation. Given the S-P analysis of certain monkey calls in terms of pragmatic enrichment, it may appear that it allows us to identify precisely such precursors, and that the analysis therefore supports the aim of pragmatics-first approaches to language evolution. But these appearances, as I go on to argue, are misleading.

2.2.1 Monkey Calls

Male Campbell’s monkeys of the Taï Forest have a “non-predation-related call boom”; they also use the call krak “to raise leopard alerts and hok for raptor alerts” (Schlenker et al. 2016a: 897; and see 2016b for a fuller account). In addition, they have suffixed calls: krak-oo for weak, unspecified ground alerts and hok-oo for weak non-ground disturbances. Interestingly, Campbell’s monkeys on Tiwai Island – where leopards have not been around for a long time – use krak for unspecified alerts. An analysis that sought to associate call types directly with specific meanings might recommend positing a dialectal difference between Taï and Tiwai: same call, different meanings. Schlenker et al. (in inter alia 2014, 2016a, b), however, argue that this is an unparsimonious reading, and they propose instead that krak has a uniform, unlearned meaning in both populations: a “general alert meaning” that is enriched through the application of a “pragmatic” principle. The relevant principle says roughly that “more informative calls are normally preferred to less specific ones” (2016a: 894). More precisely:

The Informativity Principle:

If a speaker uttered a sentence S that competes with S’, if S’ is more informative than S, infer that S’ is false (otherwise the speaker would have uttered S’). (op. cit.)

So: when krak (= S) is produced in the Taï Forest, its meaning gets enriched so it indicates the presence of leopards, given that one of its more informative competitors (S’) – viz., either krak-oo (= “unspecific ground alert”) or hok (= “non-ground alert”) – was not produced. By contrast, when produced on Tiwai Island, krak “fails to be pragmatically enriched”, since “this would yield a useless meaning” (2016a: 899), given the absence of leopards there. In this way, the analysis posits “a ‘division of labor’ between call meaning, [pragmatic] rules of competition among calls, and non-trivial properties of the environmental context” (2017: 271).
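The enrichment just described can be illustrated with a toy model. The following is a minimal sketch under strong simplifying assumptions, not Schlenker et al.’s own formalism: the four situation labels, the representation of call meanings as sets of situations, and the function name `enrich` are all hypothetical, introduced only for illustration. The sketch treats a call as “more informative” when its literal meaning is a proper subset of another call’s meaning, and it withholds enrichment when the enriched meaning could never apply in the environment.

```python
# Toy model only: situation labels and call meanings are illustrative
# assumptions based on the glosses in the text, not the authors' formal semantics.
SITUATIONS = {"leopard", "raptor", "weak_ground", "weak_non_ground"}

# Literal (unenriched) meanings: the situations each call is compatible with.
LITERAL = {
    "krak": set(SITUATIONS),                 # general alert
    "hok": {"raptor", "weak_non_ground"},    # non-ground alert
    "krak-oo": {"weak_ground"},              # weak ground alert
    "hok-oo": {"weak_non_ground"},           # weak non-ground alert
}

# Environments: which situations can arise at all (assumed for illustration).
TAI = set(SITUATIONS)
TIWAI = SITUATIONS - {"leopard"}  # no leopards on Tiwai Island


def enrich(call, literal, environment):
    """Return the contextual meaning of `call` in `environment`."""
    meaning = literal[call]
    enriched = set(meaning)
    for other, other_meaning in literal.items():
        # A competitor is more informative if its meaning is a proper subset.
        if other != call and other_meaning < meaning:
            # Informativity: the competitor was not produced, so exclude it.
            enriched -= other_meaning
    usable = enriched & environment
    # If enrichment would yield a meaning that never applies here,
    # fall back to the literal meaning restricted to the environment.
    return usable if usable else meaning & environment


print(enrich("krak", LITERAL, TAI))    # leopard reading
print(enrich("krak", LITERAL, TIWAI))  # unspecified alert reading
```

On these assumptions, krak comes out as a leopard alert in the Taï environment and as an unspecified alert on Tiwai, with the same literal meaning in both cases. Note that the computation involves only operations on sets of situations; nothing in it represents a caller’s or receiver’s state of mind.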

Schlenker et al. (2016a, b) also apply their S-P analysis to the call system of putty-nosed monkeys – an arboreal species of monkeys, which (like Campbell’s monkeys) belong to the genus Cercopithecus. Putty-nosed males have a repertoire of three ‘loud’ call types that can carry over long distances: booms, pyows, and hacks. Booms are very rarely heard and occur in a wide range of contexts, whereas pyows and hacks are produced frequently. Eckardt and Zuberbühler (2004) suggested that these calls were functionally referential, just like the Vervet alarm calls, in effect serving as labels for leopards and eagles, respectively. However, following a series of playback studies combined with careful observations of the natural contexts in which these calls were used, this interpretation was reevaluated (Arnold and Zuberbühler 2006). First, while pyows were found to be generally produced in response to leopard stimuli and hacks to eagle stimuli, no one-to-one mapping between call type and predator type was observed. Secondly, both call types were observed to be produced in a variety of other contexts, many of them non-predatory. These findings suggested that the functional reference analysis of these calls was inappropriate (Arnold and Zuberbühler 2013).

Taking into account some of these observations, Schlenker et al. propose that (the call type) pyow has the content [general alert], but that (the call type) hack has the more informative content [(serious) non-ground (movement-related) event], of the sort appropriate to the presence of an eagle.Footnote 20 The Informativity Principle mentioned earlier could then be invoked to explain why, if a pyow is produced, call receivers can infer that there’s no (serious) non-ground movement-related threat around – since, if there were such a threat, a hack would be produced. Notably, both pyows and hacks are almost exclusively produced in sequences, most often as repetitions of one or other of the call types, or sometimes in combination, and they appear to indicate group movement.Footnote 21 The appropriateness of the call sequences cannot be predicted on the basis of the so-called “lexical” meanings of the discrete calls, regardless of whether the discrete calls are taken to have a specific (quasi-)referential or more generic meaning. (In other words, pyow-hack sequences could not plausibly be assigned the meaning [leopard and eagle] – or even [threat all around]. Nor could they be interpreted to mean [non-specific, (serious) non-ground movement-related alert].) For this reason, Arnold and Zuberbühler (2008, 2012, and elsewhere) treated pyow-hack sequences as semantically non-compositional idioms with a (holistic) meaning that is not composed out of the meanings of the discrete calls they combine. But Schlenker et al. find this analysis “unsatisfying” and propose, instead, that “these sequences have a weak literal meaning but that it is pragmatically enriched by an Urgency Principle” (2016a: 899):

The Urgency Principle:

If a call sequence S is triggered by a threat and contains calls that convey information about the location of the threat, no call that contains such information should be preceded by a call that does not. (2016b: 15)

Like the Informativity Principle, the Urgency Principle is a competition principle, which “mandates that calls that provide information about the nature/location of a threat must come before calls that don’t” (2016b: 33). Semantically speaking, a pyow-hack sequence will be taken to be indicative of some imminent non-ground movement-related event (with the meaning: [there’s impending movement of an attacking raptor or of the (arboreal) monkeys themselves]). But, given Urgency, if pyows precede hacks in given sequences, this licenses the “inference” that there is imminent group movement, rather than that there is a raptor about to attack (2016b: 34). This is because “[i]f a raptor were present, hacks would convey information about the location of the threat and hence (by the Urgency Principle) they should come before pyows”; and this “explains why pyow-hack sequences are indicative of group movement” (2016a: 900 f., emphasis added; but see below).
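The reasoning from call order to the group-movement reading can likewise be sketched in a few lines. This is again only an illustration under stated assumptions: the table `LOCATIONAL`, the function names, and the string labels for readings are hypothetical, and the sketch simply encodes the gloss given above (hacks carry locational information, pyows do not).

```python
# Toy model only: which call types carry information about the location of
# a threat. Assumption following the gloss in the text: hacks do, pyows do not.
LOCATIONAL = {"hack": True, "pyow": False}


def would_violate_urgency(sequence):
    """True if, were the sequence threat-triggered, it would break Urgency:
    some locational call is preceded by a non-locational call."""
    seen_non_locational = False
    for call in sequence:
        if LOCATIONAL[call]:
            if seen_non_locational:
                return True
        else:
            seen_non_locational = True
    return False


def interpret(sequence):
    """Toy receiver-side reading of a pyow/hack sequence."""
    if "hack" not in sequence:
        return "general alert"
    # Literal meaning with hacks: a non-ground movement-related event,
    # i.e. either an attacking raptor or movement of the group itself.
    if would_violate_urgency(sequence):
        # A threat-triggered sequence would have put hacks first,
        # so the raptor reading is excluded.
        return "group movement"
    return "raptor or group movement"


print(interpret(["pyow", "pyow", "hack", "hack"]))
print(interpret(["hack", "hack", "pyow"]))
```

On these assumptions, a sequence in which pyows precede hacks is read as indicating group movement, while a hack-initial sequence keeps the weaker disjunctive reading. As in the authors’ statement of the principle, the order of calls alone drives the result.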

2.3 The S-P Analysis as Revealing Merely ‘Functional’ Pragmatics

Here are a few observations regarding the S-P analysis:

(1) Whereas other accounts that focus on pragmatic aspects of primate calls (such as Wheeler and Fischer 2012) highlight asymmetries between call production and interpretation, the pragmatic principles posited by the S-P analysis are claimed to capture aspects of both production and interpretation.

(2) On the S-P analysis, producing and interpreting calls with specific contextual informational contents requires knowledge of the physical environment in which the calls are produced beyond the time and place and identity of the producer of the utterance. So, in terms of our earlier discussion, the relevant context-dependence is not of the narrow variety. And both call production and interpretation are said to rely on the use of general pragmatic rules for pragmatic enrichment similar to certain principles that are used in linguistic exchanges. This may appear to suggest that monkey call communication exhibits wide context-dependence.

(3) However, although Schlenker et al. acknowledge that “in linguistics, the Informativity Principle is usually taken to follow from humans’ ability to communicate cooperatively and to reconstruct the intentions of language users”, the S-P analysis “does not require [attributing] such mind-reading abilities” to monkeys (2016a: 895). So, in terms of our earlier discussion, the suggestion is that no secondary processes are involved.

(4) Nevertheless, a “key ingredient” in the analysis “is that the interpretation of a call or call sequence can be pragmatically enriched by competition with others” (2016a: 898), using “powerful mechanisms of competition among calls” (2016b: 3; 25). Schlenker et al. explicitly describe these in terms of “the device of monkey scalar implicatures”Footnote 22 (2015: 10, emphasis added).

The combination of (1)-(4) clearly suggests that the conception of pragmatics implicitly invoked by the S-P analysis goes beyond the Carnapian conception outlined earlier, with its purely narrow context-dependence. At the same time, the analysis is explicitly non-Gricean; like standard Carnapian analyses, it is designed to assign contextual truth-conditional content to calls – understood as “sentences” of “monkey language” – in complete abstraction from any communicative intentions and their attributions (2016a: 895 ff.). This may make it appear that the conception of pragmatics and of context-dependence employed by the S-P analysis is intermediary (in the sense of 1.3 above), since, to repeat, it posits a kind of context-dependence that goes beyond narrow Carnapian context-dependence and yet falls short of Gricean mind-reading-dependence. And the invocation of the idea of pragmatic enrichment, in particular, may make it seem like the analysis helps identify some specific pragmatic continuities between monkey call communication and human linguistic communication: continuities in wide context-dependence that do not require secondary processes. However, our earlier discussion of what is involved in linguistic pragmatic enrichment suggests that we need to tread more carefully.

To begin with, consider the fact that (as per (1)) Schlenker et al. take principles like Informativity to have both receiver and producer versions. On the receiver version: “if S’ is more informative than S, infer that S’ is false (otherwise S’ would have been produced)” (2016a: 899). On the producer version: “if a call C2 is more informative than a call C1, then whenever possible C2 should be preferred to C1” (ibid.). These formulations make it appear that the principles are designed to capture rules that monkey producers and interpreters, respectively, actually follow in some way. But, on the face of it, there is some room for skepticism here. To follow the relevant principle, it seems that monkey receivers would have to be able to consider and reject unproduced alternatives as not obtaining. This, in turn, suggests that they are capable of having negative propositional thoughts (of the form: [It’s not the case that S’]). But are we to suppose that, although monkey “sentences” – unlike linguistic sentences – do not have proper subject-predicate structure, and do not admit propositional negation, monkey receivers are nevertheless capable of thoughts that do? Put differently: What is required, cognitively speaking, for monkeys to reason about alternatives that do not obtain, so as to “infer” something about the alternative that does obtain? (Relevant here are arguments that thoughts about what is not the case are not available to even chimpanzees and very young children. E.g. Horn 1989, Bermudez 2003, Millikan 1984: Ch. 14; 2017: 80–83.) Turning to producers, active reasoning in accordance with the relevant Informativity Principle would seem to require evaluating and comparing the information that would be conveyed by alternative possible calls and opting for one over the others, based on their assessment of the situation. This, in turn, would seem to require some capacity for voluntary, flexible control over which call to produce. Yet, on a dominant view of primate communicative vocalizations, they constitute more or less automatic/reflexive responses to environmental and affective triggers that are not under voluntary and flexible – let alone rational-evaluative – control. (Cf. Wheeler and Fischer 2012: 197, Fitch 2010: Ch. 4, Tomasello 2008: Ch. 2.) Schlenker et al. have not themselves provided any reason to question this understanding.Footnote 23

Schlenker et al. could object that there is no need to attribute to either monkey receivers or monkey producers any propositional thoughts with negation or reflective reasoning about hypothetical alternatives along the above lines. They might suggest that the S-P pragmatic principles simply capture aspects of monkey call communication that are hard-wired. But this would just provide grist to the skeptic’s mill. After all, it is precisely the allegedly inflexible, reflexive-automatic, and entirely hard-wired, code-like character of monkey (and other) animal call communication that has led advocates of the Gricean approach to deny that it has any relevance to our understanding of the emergence of human linguistic communication. And it is precisely the hope of discovering that animal call use does not fit this stereotype that has motivated the move to pragmatics-first approaches.

Relatedly (concerning (2)), unless there is evidence of relevant continuities in underlying psychological processes, it is not clear how it can be appropriate to take monkeys to engage in pragmatic enrichment. To be clear, I am not objecting that Schlenker et al. have failed to do what no one can do, namely: peer into monkeys’ minds to measure directly the psychological processes they engage in. Rather, I am claiming that, by design, their analysis is not apt to probe evidence concerning whether monkey producers or interpreters actually deploy pragmatic principles in processing the calls. For, like Carnapian analyses, the S-P analysis is exclusively designed to generate mappings correlating calls with truth-conditions. It departs from more traditional Carnapian analyses in invoking certain pragmatic principles. What it does not do, however, is rely on observational and experimental evidence that directly targets the ways primates use their signals – how they react when a signal is ambiguous (where they look, what they attend to, etc.), how they modify signals in response to the perceived reactions of others, and so on. For this reason, it seems that by itself the S-P analysis could not move us any further in our understanding of the emergence of human linguistic communication than do Carnapian analyses (with their purely indexical context-dependence).

In general, just as the possibility of a semantic mapping of Vervet monkeys’ alarm calls – understood as acoustic types – to distinct predator categories tells us only that the calls are functionally referential, so the applicability of an analysis like the S-P analysis to a call system may tell us only that the system it describes is functionally pragmatic. Just as being functionally referential can mask significant discontinuities in meaning, so being functionally pragmatic can mask significant discontinuities in use. Whether the S-P analysis helps reveal interesting – and not merely functional – pragmatic continuities, I submit, depends on what is involved in the monkeys’ use of the relevant principles. And this cannot be settled simply by establishing the applicability of the S-P analysis to their calls. It requires observational and experimental evidence of a different sort.Footnote 24

When discussing linguistic scalar implicatures, Schlenker (2016) describes them as exemplifying “communicative rationality”. Yet clearly the applicability of the S-P analysis to monkey calls is insufficient to establish that monkeys’ call use is rational in the relevant sense (as in effect acknowledged in Schlenker et al. 2016b). Perhaps more importantly, it is also insufficient to establish the relevance of calls’ communicative character. For it seems that an S-P-like analysis could in principle apply to the interpretation of non-communicative, natural signs. (For an imaginary example: Suppose that, in a given environment, smoke is a natural sign of fire, and white smoke is a sign of chemical fire. Interpreters of those signs could use a principle very much like Informativity to infer the absence of chemical fire from the fact that the less informative sign [smoke] occurred and not the more informative sign [white smoke].) Relatedly, even when it comes to the interpretation of communicative calls, call interpreters need not in any way take the calls to be communicative. For, as Schlenker et al. note, “bystanders”, who are in no sense the intended audience of calls, could derive the calls’ informational content by “eavesdropping”, essentially treating the calls as though they were natural signs (2016b: 9). For all that has been said, the same may be true of monkey receivers.

However, if this is so, then the context-dependence captured by the S-P principles – just like Carnapian narrow context-dependence – cannot suffice by itself to support claims of human-nonhuman communicative-pragmatic continuities. As noted in (3), the context-dependence posited by S-P is supposed to be independent of Gricean mind-reading. But if the analysis is equally applicable to the interpretation of non-communicative signs, it would seem as though the context-dependence it posits is not only mind-reading-independent; it is also as entirely intention-independent as Carnapian context-dependence. But then it is not clear how the context-dependence posited by the S-P analysis would move communicators toward ostensive-inferential communication any more than Carnapian narrow context-dependence. The analysis could still provide a useful way of modeling the way monkey calls map onto the information they encode; but it would remain to be seen what reason – if any – there is to take the analysis to point to pragmatic precursors of human linguistic communication in monkeys’ call communication.

Now, Schlenker et al. at times disavow any commitment to specific continuities in underlying psychology between monkey call communication and human linguistic communication (e.g. 2014: 441, 2016a: 894). The present point is that, if this neutrality about processing mechanisms is taken at face value, their analysis can offer no support for pragmatics-first approaches as here construed. Advocates of these approaches, recall, propose that we should focus on legitimate pragmatic precursors of human linguistic communication in animal communication (rather than being distracted by significant discontinuities in syntactic structure and referential content). If all the S-P analysis aims to do is provide a formal – and merely functionally pragmatic – model that generates the right mappings between calls and truth-conditions, then it has nothing substantive to offer to proponents of this proposal. If our goal is to understand animals’ contextual use of communicative signals at least in part in order to determine whether such use reveals genuine pragmatic continuities with humans’ use of language, then the formal construction of signal-content mapping can only serve as a starting point (albeit an important one).

This brings us to (4). Side by side with professing complete neutrality on the question of underlying monkey-human psychological continuities, Schlenker et al. claim that monkeys’ call use involves the generation and derivation of scalar implicatures. And, in keeping with (3), they highlight the fact that they are relying on a “simple”, “neo-Gricean” understanding of scalar implicatures. They say:

One might object that the device of ‘scalar implicatures’ commits us to overly strong assumptions about a kind of ‘theory of mind’ in Putty-nosed monkeys. After all, Grice (1957) developed his original theory of implicatures within a framework in which addressees make use of complex principles of conversation to recover the intentions of the speaker. But … far less than a full theory of mind is required by the Informativity Principle. All that is needed is a principle by which a more informative sentence somehow blocks a less informative one. Thus if a sentence S0 is an alternative to a sentence S and is more informative than it, an utterance of S will lead to the inference that the utterance situation likely did not support S0. Knowledge of the Informativity Principle … is sufficient to derive this result, and no theory of mind needs to be posited. (2014: 440 ff., emphases added; and compare 2015: 10.)

Here they are making both a negative and a positive claim about monkeys’ use of scalar implicatures. Given our earlier discussion (in 2.1), the negative claim can perhaps be understood as follows. Contrary to Grice’s original analysis, the processes involved in generating and interpreting scalar implicatures are not secondary pragmatic processes that require person-level reasoning about others’ states of mind. The authors’ positive claim – that monkeys deploy “knowledge of the Informativity Principle” and an understanding that “a more informative sentence somehow blocks a less informative one” – suggests that they may take monkeys to engage in (something like) primary pragmatic processes. But this claim incurs specific commitments.

Adopting the less stringent (‘neo-Gricean’) view of the cognitive requirements on scalar implicatures is simply not sufficient for settling whether monkeys can be plausibly credited with engaging in the relevant processes.Footnote 25 We need some positive reason to suppose that monkeys satisfy even the neo-Gricean cognitive requirements – that, perhaps, monkeys engage in (non-mind-reading, primary) pragmatic processes that allow them to generate, compare, and evaluate alternative contents in terms of their informativity or urgency, to reject inappropriate alternatives, and so on.Footnote 26 Otherwise, Schlenker et al.’s reference to “monkeys’ device of scalar implicatures” would seem otiose, if not misleading. At best, the analysis could be said to point to the existence of “functional implicatures” in monkey call communication.

Summarizing: As I have presented them, pragmatics-first approaches to the evolution of language maintain that there are significant pragmatic continuities in some of the ways nonhuman animals use their communicative repertoires, despite important syntactic and semantic differences. My aim in this section was to highlight the fact that establishing the relevant pragmatic continuities requires going beyond establishing the formal applicability of pragmatic analyses to given forms of animal communication. In the absence of any commitment to specific psychological continuities in the use of the pragmatic principles posited by the S-P analysis – continuities in pragmatic processes (if only primary ones) – the analysis does not advance us beyond formal Carnapian analyses with their purely indexical (narrow) context-dependence. At the very least, the proponents of the analysis would need to supplement it with specific, empirically supported hypotheses about the mechanisms or processes that underlie monkeys’ use of pragmatic principles and about how these could have given rise to some of the (admittedly different) pragmatic mechanisms or processes involved in human communication. To be clear: I have not here questioned the descriptive-predictive adequacy of the S-P analysis or its cogency. Rather, I have questioned the usefulness of the analysis for pragmatics-first approaches, if we take at face value its proponents’ professed agnosticism about the psychology underlying monkey call communication.Footnote 27 If the S-P analysis is indeed compatible with there being no monkey-human continuities in underlying processing mechanisms – not even continuities in primary pragmatic processes – then the similarities revealed by the analysis would remain purely formal-functional.

3 The Gricean Challenge and Intermediary Pragmatics

Recall our Gricean Challenge (introduced in 1.3): To serve the purposes of a pragmatics-first approach, an account of the pragmatic origins of human linguistic communication ought to allow us to identify aspects of nonhuman communicative behaviors that could have potentially moved nonhuman animals beyond Carnapian, code-like communication and closer to ostensive-inferential communication. I believe we are now in a position to identify several desiderata on an adequate account. Such an account should identify nonhuman communicative behaviors that:

a. exhibit more than merely Carnapian narrow context-dependence, but also

b. manifest more than merely formal-functional pragmatic similarities of the sort established by the S-P analysis, and (relatedly)

c. depend on communicators treating signals as communicative, though

d. do not exemplify Gricean mind-reading-dependent context-dependence, and yet they

e. can be seen to foreshadow in some way Gricean ostensive-inferential communication.

In this section, I very briefly canvass recent evidence that some nonhuman primates engage in what I describe as psychologically mediated communication.Footnote 28 As I read this work, it implicitly relies on a genuinely intermediary conception of pragmatics, which focuses on context-dependence that is intention-dependent without being mind-reading-dependent. I conclude by noting that, properly understood, this conception could in principle also encompass monkey call communication of the sort discussed in the earlier sections of this paper.

3.1 Non-Gricean Psychologically Mediated Communication

I begin by returning to Tomasello’s discussion of ape gestures – in particular, his discussion of attention-getters; i.e. “such things as ground-slap, poke-at, and throw-stuff”, which “serve to attract the attention of the recipient” to the communicator’s behavioral display itself. Crucially, these behaviors exhibit a specific kind of context-dependence: “the ‘meaning’ or function of the communicative act as a whole resides not in the attention-getting gesture, but rather in the … display, which the individual knows the recipient must see in order to react appropriately” (ibid.). Attention-getting, as characterized by Tomasello, has a “two-tiered structure”: “[t]he communicator has some action he wants from the recipient … and to attain this he attempts to draw the recipient’s attention to something … in the expectation that if she looks where he wishes, she will do as he wishes” (2008: 29). Attention-getters are issued – and received – as communicative behaviors that are “adjusted in various ways for particular circumstances” (2008: 14) based, specifically, on communicators’ responsiveness to each other’s states of mind. The relevant behaviors are specifically directed at a particular audience to accomplish a particular goal. In that sense, they are intentionally produced – and received as such. Notably, in work that precedes Tomasello’s (2008), Liebal et al. (2004), and Hopkins et al. (2007), inter alia, document various attention-getting behaviors on the part of chimpanzees, including novel ones, that are designed to capture the attention of humans.Footnote 29

No doubt, deflationary readings of attention-getting behaviors may be available, which analyze them in terms of lower-level discrimination mechanisms. On the other hand, it does not seem unreasonable to regard attention-getting and related forms of primate communication as relying on mutual psychological responsiveness, on the assumption that such responsiveness does not depend on the full conceptual resources of Gricean mind-reading, with its reliance on mental state attribution.Footnote 30 The chimpanzees described in the works just cited – much like preverbal children – appear to be monitoring each other for what can be described as “psychological cues”: such as eye gaze, bodily posture, and other signs that attest to communicators’ states of mind, to ensure successful communication. Unlike Carnapian communicators, they deploy a form of psychological sensitivity in their communicative acts, relying on context to determine what potentially ambiguous gestures mean and how to modify them to get what they want from their audience, based on the audience’s perceived responses. However, unlike Gricean communicators, they are not concerned to convey or decipher speaker meaning. Chimpanzee producers do not (as far as the evidence shows) produce gestures intending, specifically, to get their audience to think about those very intentions, and chimpanzee receivers are not directly concerned with the motivation behind producers’ utterances. In short, on my proposed construal, chimpanzees’ gestural communication exhibits context-dependence that is intention-dependent without being Gricean mind-reading-dependent. Such communication contrasts with purely indexical (“Carnapian”) context-dependent communication, which is entirely intention-independent. The latter works by allowing interpreters to map signals onto content directly and even automatically. The same, I have argued, applies to call communication as represented by the S-P analysis, insofar as it is also designed to capture contextual mappings between calls and truth-conditions in complete independence from communicators’ underlying psychology.

Now, as noted, Tomasello thinks that chimpanzees’ vocal communication is code-like and thus entirely intention-independent. In more recent years, however, several researchers have tried to establish that chimpanzees and other great apes also engage in intention-dependent vocal – and not only gestural – communication. Although these authors agree that great apes should not be credited with Gricean mind-reading capacities, they nevertheless think that great apes meet criteria for (what I have described as) psychologically mediated communication. Along these lines, Schel et al. (2013) – following Crockford et al. (2015 and elsewhere) – have offered detailed evidence that, for example, chimpanzee snake calls are produced “tactically and target important individuals who are valuable to them … [were] often preceded by visual checking of the audience, accompanied with gaze alternations, and individuals were likely to persist in producing calls until all group members were safe from the ambush predator” suggesting “that call production is both socially directed and goal-directed” (2013: 8 f.). The experiments they survey were designed to establish, specifically, that calls were:

(i) used socially by examining sensitivity to the presence or absence of an audience and the composition of the audience; (ii) directed at recipients by examining audience checking and gaze alternation before and during calling; and (iii) goal directed by examining whether callers persisted in vocal production until all group members were safe from danger. (2013: 1)

And this, they think, supports the following conclusion:

[C]himpanzee vocalizations meet the same basic criteria for intentional signal usage which have been put forward for great ape and human infant gestures. … [O]ur results are inconsistent with the traditional notion of primate vocalizations being reflexively and unintentionally produced. (2013: 9, emphasis added)

Like Schel et al., Townsend et al. (2017) think we should set aside the question whether chimpanzees’ communication (whether vocal or gestural) reveals a capacity for Gricean mind-reading. They suggest that we should “exorcise Grice’s ghost” when trying to uncover evolutionary precursors of human linguistic communication, which means “avoiding the question of mental state attribution” and “focusing on behavioural markers of flexible and goal-directed communication”. And, like Schel et al., they believe that a more realistic goal is to seek to establish whether nonhuman animals meet non-Gricean criteria on “intentional communication”, such as acting with a goal that has content, producing voluntary, recipient-directed signals as a means for reaching the goal, and the signaling behavior changing the recipient’s behavior in ways conducive to realizing the goal.

I believe it is useful to understand these researchers as implicitly working with an intermediary conception of pragmatics. To a first approximation, intermediary pragmatics can be characterized on analogy with the way we earlier characterized Carnapian and Gricean pragmatics:

Intermediary Pragmatics:

The study of all non-Gricean psychologically mediated uses of signals: the production and apprehension of signals that have intersubjectively recognized communicative purposes and that essentially rely on animals’ sensitivities and responsiveness to each other’s states of mind, without requiring Gricean mind-reading capacities.

Intermediary pragmatics is intended to cover all intersubjective communicative interactions that essentially rely for their success on communicators’ production and apprehension of “psychological cues” – i.e. “behavioral markers” of goal-directedness, flexibility, persistence, audience attentiveness, and so on (as studied by the researchers just cited). The relevant interactions do not depend on producers or receivers thinking of each other’s states of mind as such or having any direct concern with what’s on each other’s mind – so they are not mind-reading-dependent. Nevertheless, I would argue that they are still directly relevant to meeting the Gricean challenge articulated earlier, insofar as they provide a credible steppingstone on the road to meaningful linguistic communication. An intermediary pragmatics-first approach to the origins of human linguistic communication should, accordingly, aim to determine which (if any) existing forms of animal communication fall under the scope of intermediary pragmatics and to what extent.

3.2 Revisiting Monkey Calls

We are now in a position to briefly revisit monkey call communication. As noted earlier, Tomasello (2008) and others (e.g. Burling 2005, Hurford 2007, Fitch 2010) dismiss the idea that primate calls could illuminate the emergence of human language. These authors in effect reason from the fact that primate communicative vocalizations are unlearned and that their form and structure are rigidly fixed (together with the alleged fact that primates have little voluntary control over call production), to the claim that their use lacks the necessary flexibility to qualify as psychologically mediated in the right way. However, this reasoning regarding apes’ vocal vs. gestural communication seems to turn on conflating features of apes’ signal repertoires, on the one hand, with features of the use they make of them in communicative episodes, on the other.Footnote 31 Both calls and gestures (as well as bodily postures and facial expressions) can have rigidly fixed, species-wide, structural features, and even (merely) functionally referential contextual meaning. But this leaves open the possibility that individuals can use unlearned signals in flexible ways that, moreover, betray other-directed communicative purposes and sensitivity to receivers’ uptake. To reiterate, what is relevant to the question of the intentional-psychological character of a form of communication is whether the production or interpretation of vocal signals essentially depends, specifically, on communicators’ sensitivity to each other’s psychological states (as revealed in the behaviors that accompany their vocal or gestural signals). And this cannot be determined by focusing exclusively on formal mappings between signal types, understood as elements of a system (the signal repertoire), on the one hand, and informational contents, on the other hand – independently of the psychology underlying communicators’ use of the signals.Footnote 32

Now, in their reevaluation of putty-nosed monkey calls, Arnold and Zuberbühler go to some length to explain that “[n]either ‘hacks’ nor ‘pyows’” exhibit the tight connection to the presence of an eliciting predator threat that is distinctive of functionally referential labels (2013: 1 f.). Indeed, they remark that the pyow call, especially, “appear[s] to function primarily as an attention-getter” (2013: 5, and 2012: 307). They recommend “a pragmatics[-first] approach” (2013: 2), motivated by their observations about how putty-nosed monkeys act when producing and receiving calls. When a male produces a pyow call, his body posture and other features of his demeanor serve to express aspects of his state of mind – whether he is alarmed or relaxed, if alarmed, how alarmed he is, what he is alarmed at, and so on. And, upon hearing a pyow, “listeners … attempt to acquire additional information about the behavior of the caller” (2013: 2, emphasis added) – provided they can observe him – rather than immediately reacting by reflexively engaging in a fixed pattern of anti-predator behavior. Females with visual access to the male will only chime in with their own alarm chirp calls if they observe the male’s alert body posture and his gaze fixated on some specific threat. And only then will other group members with no visual access to the male approach the threat and begin calling and mobbing. What I take this to illustrate is the potential mediating role of psychological cues in enriching ‘lean’ content of unlearned signals. An unlearned call with a rather generic meaning ([Watch out-something’s-up!]) can acquire more specific contextual content when produced (and received) in conjunction with various behaviors that cue receivers to callers’ psychological states – whether they are calm or agitated, what they are alarmed by, and so on.Footnote 33

If this description of the putty-nosed monkeys’ dynamic ‘division of communicative labor’ is correct, it at least suggests that monkeys consult psychological cues in their call communication. Moreover, their use of psychological cues may be sufficient to help fix a specific contextual content for a call such as pyow. For it can enable them to determine whether a given use of pyow is a call to flee from, or else to mob, a designated threatening predator – as opposed to being an invitation to group-movement. There may be no need for the monkeys to deploy, in addition, pragmatic principles such as Informativity or to employ scalar implicatures. At the same time, this reading of monkeys’ call communication would open up the possibility that some genuine pragmatic precursors could be found not only in the gestural communication of our closest primate relatives. Perhaps – as some proponents of the pragmatics-first approach have held – certain precursors of pragmatic communication can even be found after all in monkeys’ use of unlearned calls.Footnote 34