Introduction

What could language have to do with the development of a theory of mind? There are a number of different theoretical positions on this question. At one extreme, the answer is “nothing.” That is, on the language-as-conduit view, theory of mind is a development in social cognition, perhaps one shared across our close primate relatives, and perhaps beginning in preverbal infants as part of core knowledge. When a child learns language, these foundational ideas become expressible. If language is delayed, a child could still pass nonverbal tests as long as the knowledge system is developed through observation of the social world added to core knowledge. If language alone is lost, as in aphasia, the knowledge should remain intact. If language is tied up in adults performing dual tasks, the reasoning capacity about others’ beliefs should remain.

On a second, cultural view, theory of mind is a cultural development, enabled by discourse that the child hears about the mind as a cause of behavior. Infants would necessarily be at only a primitive stage, perhaps capable of grasping the intention of others towards goal states, but not the contents of others’ minds. Language, specifically conversation, teaches the child a theory about why people act the way they do, highlighting that discrepancies in expected behavior could result from mistakes, or ignorance. Language helps build the knowledge system, which could then, as in aphasia, survive language loss. However, language delay would imperil its development. Adults with language tied up in a dual task should be able to reason about others’ belief states since the knowledge has already been built.

On a third view, an individual’s language is a cognitive tool that assists in complex reasoning, rather like a mental scratch-pad. Learning the labels for mental states assists in this reasoning, as does the grammar, as it permits the construction of counterfactuals and conditional statements to express chains of thought that would be more awkward without language, but not impossible. For example, a succession of images might also work, or discourse of simple sentences that then chain together. Aphasic patients might be able to use some other means than language to reason, as might language-delayed children, though there should be a cost in efficiency. The development in typical children might be gradual: the greater the vocabulary and general syntactic skill, the better their theory of mind reasoning might be. It is easy to imagine that other factors such as short-term memory, and executive function, could play significant roles in this sensible, cognitive-efficiency view.

But why stop at sensible? On a fourth view, representing the opposite extreme from the conduit position, theory of mind is an outgrowth of language development. Language provides representational structures that scaffold belief reasoning: the semantics of intensional states arise through grammar. Infants and young children can perhaps predict goal-seeking behavior, but given syntactic development they can represent to themselves the complex propositions needed to predict and explain behavior. Language-delayed children would be impaired in this, even with tasks that make no linguistic demands. Aphasic individuals may still have sufficient access to the language faculty to so reason, but it is improbable. Adults with their language faculty occupied by another task should be unable to reason about false beliefs, as language is still needed to represent the complex propositions required for the reasoning.

Because space is limited, the focus here will be on false belief (henceforth FB) reasoning in particular, and the potential role of language in assisting that thinking in childhood. FB reasoning is sometimes considered the apex of reasoning about other minds in childhood, though even as adults we develop further nuanced understanding of other minds. In essence, FB reasoning is when the child comes to realize that an individual can believe something that is not true from the standpoint of “reality,” or consensual knowledge. For example, someone might believe that they lost their glasses, when their glasses are in their hand. Or, in a classic test, a character might believe she put her chocolate in the cupboard, when we know it was subsequently moved to the refrigerator. We explain the person’s searching in the cupboard by saying “she thinks her chocolate is in there.” It is at about the same time in life that children can recognize that they, too, might have a mistaken belief. Reviews of theory of mind developments earlier than false beliefs, and their possible links to language, can be found in de Villiers (2007).

Evidence for the Conduit View

Infants

Important evidence for the conduit view of language comes from the study of preverbal infants. Three different types of study have been done, each of them finding evidence from measures such as looking time or gaze direction that preverbal infants might be attending to another’s mental states. These are called implicit theory of mind tasks, because no behavioral decision is required. In the first type, infants in the second year of life or even younger have been shown to gaze for a more protracted time at events in which a human character acts in a way contrary to expectation, that is, a way that is not in keeping with the belief they should have formed (Onishi & Baillargeon, 2005). In particular, they are surprised when the character goes to a location where an object really is, when the character did not see it move there. In a second design, infants look expectantly at a location where a character should go, based on where that character believes something to be (Southgate, Senju, & Csibra, 2007). In a third design, very young children come to the assistance of another individual specifically if that person was not witness to how something works (Buttelmann, Carpenter, & Tomasello, 2009).

The question is, are these children acting on the basis of a belief attribution, or something simpler? A number of interesting alternatives have been proposed, the most reductive of which is that the infant is responding to some accidental but correlated feature of the set-up. Another class of theories suggests that the child is responsive to a behavioral rule, such as “people go to where they last saw something” (Perner & Ruffman, 2005). However, others (Baillargeon, Buttelmann, & Southgate, 2018) have defended the sophistication of children’s responses. A compromise solution by Southgate and Vernetti (2014) suggested that infants may in fact be able to follow an agent’s point of view, as young as 6 months, but do not yet contrast it with their own. The ability to put the two in contrast may only come later, perhaps when language is recruited. Finally, an important contribution comes from Apperly and Butterfill (2009) and Low and Watts (2013), who suggest that there may in fact be two systems for attending to others’ mental states, one fast and automatic, there from very early, and independent of language or culture, and the other reflective, slow, and perhaps contingent on language acquisition. These writers claim that the behavior of infants in these tasks might have a signature limitation that distinguishes it from the success of children on explicit false belief tasks at around age four. A possible signature limitation is that infants can attend to the direction of an intention, say to a location, but are as yet unable to represent the contents of another’s mental states, say that the object is a rock not a sponge (Low & Watts, 2013). However, at least a few experiments claim that infants can compute the contents of another’s beliefs (Scott & Baillargeon, 2009).

The research on infant theory of mind, crucial to the claim that this kind of mental activity can precede language skills, is fraught with uncertainty as to the studies’ replicability. There has been a general crisis of replicability in many areas of psychology, and this particular domain suffers from special difficulty given the age of the participants and the chance that unwitting clues could be transmitted in subtle experiments. It is even more difficult when the measure might be a second or two of differential looking time, and no other behavior is available to confirm its meaning. Unfortunately, there are dozens of failures reported across other labs and, as so often with failures to replicate, they then do not get published (Kulke, Von Duhn, Schneider, & Rakoczy, 2018; Rakoczy, 2012; but see Baillargeon et al., 2018). This is a rich area to watch, with some brilliant innovation, but the conclusions are far from clear.

Preschool Children

If infants can succeed on implicit false belief tasks, then the conduit view faces the problem of explaining the gap that occurs between infant success and the failure of 2- and 3-year-olds on explicit false belief tasks. The answer cannot be merely that language is required to follow the task instructions or the story, because tasks have been devised (Schick, de Villiers, de Villiers, & Hoffmeister, 2007; Woolfe, Want, & Siegal, 2002) that require very little language to succeed. However, the explicit tasks do require a decision, an overt response, which differentiates them from those procedures that depend on eyegaze, an implicit response. For that reason, the common argument for the gap between infants and 4-year-olds on tasks with explicit demands has been that the younger child does not have the executive function skills to convert the implicit idea driving eyegaze into one that can mobilize a decision to act, choose the response, and resist competing demands such as reality, or a response based on the child’s own beliefs. On this account, the real driving force of development for explicit tasks is said to be executive function skills.

There are strong findings linking executive function skills, particularly inhibitory control, to the development of false belief skills (Carlson & Moses, 2001; Hughes & Ensor, 2007). However, there are other findings in which the relationship is weak, or which suggest that executive function is but one skill that affects success and that, when pitted against other possibilities, its contribution is not unique. In particular, it can sometimes take second place as a predictor to language skills (de Villiers et al., 2015; Farrant, Mayberry, & Fletcher, 2012; Schick et al., 2007). Furthermore, the neuroscience findings reviewed in Saxe (2009) suggest a dissociation between inhibitory control and false belief understanding in patients with brain damage.

Aphasia

Finally, powerful evidence for the conduit view comes from a small sample of aphasic patients, that is, patients with significant loss of language who were tested on theory of mind. The first case, reported by Varley, Siegal, and Want (2001), was of a man, SA, with significant deficits in language, who nonetheless succeeded in passing a standard verbal false belief task. Varley argued that since language and theory of mind had dissociated, language is not recruited in adult reasoning about beliefs. However, questions have been raised about how much of SA’s language was lost if the directions could be followed (Baldo et al., 2005). The patient was given a verbal false belief task with reduced verbal demands. In addition, on many linguistic tasks, for example spoken word-picture matching and written word-picture matching, SA was still above chance, though grammar was impaired (Roche, 2018). Siegal and Varley (2006) reported on a second case study, MR, using a nonverbal task, and again found successful reasoning. In this case, the language tests showed impairment of the kinds of language skills normally said to be required for ToM reasoning, for example, understanding grammar. The third and perhaps most convincing case is patient PH, studied by Apperly, Samson, Carroll, Hussain, and Humphreys (2006), who was tested on a battery of language and ToM tasks, including tests of sentential complementation. Despite impairment on the language tasks, he made virtually no errors on first- and even second-order tests of FB reasoning, all nonverbal in design, but each requiring an explicit decision. Apperly et al. (2006) write: “... a complex task, it could be presented entirely nonverbally by establishing at the outset that on every trial (including a large number of filler and control trials) PH would be asked to judge where the searcher would search.” There is still an outstanding puzzle, especially for those of us who have worked with infants or profoundly language-delayed children: how were the task requirements conveyed? There is still research to be done, perhaps using implicit tasks with patients with aphasia. In addition, there remains the possibility that aphasia could leave the language faculty itself intact, and affect only the performance aspects, whether production or comprehension of speech (Linebarger, Schwartz, & Saffran, 1983). If the deep aspects of language remain, they could in theory be used to explain the preserved thinking in this and other domains (see e.g., mathematics, logic: Benn, Zheng, Wilkinson, Siegal, & Varley, 2012; Fedorenko & Varley, 2016). There are differences of opinion about the nature of the “deep” aspects: Carruthers (2002) suggests that Logical Form, the logical propositions that underlie the structures in language, is the basis for human-type thinking; for Hinzen (2013), syntax is what allows human thought.

Non-humans

Perhaps the obvious place to look for evidence of theory of mind in the absence of human language is among non-humans. The literature is too vast to do justice to in such a chapter, but the studies suffer from many of the same difficulties as the research with infants. The work with chimpanzees and other great apes has been interpreted in a variety of ways, some generous and some, as Dennett (1983) described it, by a “killjoy” hypothesis that attributes much less intentionality to the creature’s response. Tomasello and colleagues (Call & Tomasello, 2008; Krupenye, Kano, Hirata, Call, & Tomasello, 2016) have continued to support the proposition that chimpanzees can reason about others’ belief states: for example, they will predict where a fellow chimpanzee (or someone dressed as one; Krupenye et al., 2016) will go to fetch a food object when that individual did not see it moved. Others, such as Povinelli and Vonk (2004), argue that empirical work suggests that chimpanzees do not even understand that “seeing leads to knowing,” a precursor skill for belief reasoning. Andrews (2005) proposed that all such tests will be ambiguous, and that looking for signs that a chimpanzee seeks an explanation for an odd behavior might be a more fruitful approach.

Surprisingly, some tantalizing tests have been done with birds rather than apes, in particular members of the corvid family (jays, ravens, and crows). These birds prove highly sensitive to whether another creature was watching when they hide a cache of food, and clever experimentation suggests that they are monitoring the contingent behavior of a watching conspecific (Brecht, 2017). However, in one such experiment, scrub-jays did not take advantage of another’s false beliefs to hide food in a location that the observer was led to believe was inaccessible. There is social intelligence here, but it is limited. In almost all the animal work, it is unclear how constrained these results are to hoarding food, and therefore to a specialized evolutionary path unlike that of humans.

Evidence for the Cultural View

Typical Development

We are all, at least in part, other-mind blind, because “a small but important part of the universe is enclosed within the skin of each individual” (Skinner, 1963). We have privileged access to the contents of our own minds and conscious mental states, and therefore we can only learn to describe them from others teaching us, who can only judge from our behavior. Other-mind blindness puts us at a disadvantage in teaching children words that refer to things inside of them. We can only infer that they are in a certain state, say pain as opposed to mere discomfort, or happiness rather than excitement. Most writers on the subject of “private events” acknowledge that nevertheless, this is how we learn to interpret and describe the stimuli that lie inside our skins. On the cultural view, language about mental events has to be perceived (Footnote 1) for a child learning how to express those concepts in our culture. Infants and young children have wants and feelings, and practiced caregivers can “read” their behaviors and interpret them, providing food, or assistance, or comfort. Parents usually accompany their responses with explanations and labels, saying e.g., “Oh, so you want that set of keys?” or “Did you hurt your toe?” What this means is that such labeling can be subject to cultural variation. Not all cultures may label behavioral reactions in the same way, or provide the same labels for the supposed internal states. As a result, we each join a discourse defined by our particular culture (Nelson, 2005). In particular, when toddlers acquire internal state language they can talk about others’ feelings, preferences, desires, and perceptions. It makes new communication possible, allowing for example teasing, in addition to more positive aspects such as increased empathy (Dunn, 1988).

Through discourse, people develop a “folk psychology,” that is, a lay theory about how our own minds, and then the minds of other people, operate in the world and relate to observable behavior (Hutto, 2008). We hear behavior described and explained in mental terms. For example, people talk to us about someone trying to get an object that they want, and because they want it, they remember where they saw it last and go to the place they last saw it. As others talk about and interpret their inner worlds, children weave these accounts into their first psychological theories (Bartsch & Wellman, 1995; Bretherton & Beeghly, 1982; Dunn & Brophy, 2005; Nelson, 2005; Shatz, Wellman, & Silber, 1983).

The cultural theory highlights the importance of hearing words as labels for underlying mental states. Dunn (e.g., Dunn & Brophy, 2005) and Meins, Fernyhough, Arnott, Leekam, and Rosnay (2013) have shown how the frequency of mental talk in a child’s life influences their ability to pass false belief tasks. Some families engage in lots of mentalistic talk, and their children’s reasoning is advanced; others do much less. Interestingly, one of the major contributors is family size, in particular, the presence of siblings close in age. Children from larger families have shown an advantage in theory of mind development in several studies (e.g., Astington & Jenkins, 1995; Lewis, Freeman, Kyriakidou, Maridaki-Kassotaki, & Berridge, 1996; Peterson, 2001; Ruffman, Perner, Naito, Parkin, & Clements, 1998), particularly in the area of vocabulary about emotions.

At least in Western mainstream cultures (e.g., UK, USA, Australia, Germany), an important body of research on “mind-mindedness” has revealed that the degree to which parents use language about mental states contributes to children’s understanding of the mind (Dunn, Brown, Slomkowski, Tesla, & Youngblade, 1991; Meins et al., 2013; Perner, Ruffman, & Leekam, 1994; Ruffman, Slade, & Crowe, 2002). Parents also react to their child’s level of understanding, for example discussing more sophisticated mental states such as “remember” increasingly often once their infants become consistent gaze followers (Slaughter, Peterson, & Carpenter, 2009). Several studies have highlighted the active role played by children in shaping their own social environments (Dunn & Plomin, 1990). How do we show that the parents are not just responding to their child’s readiness to learn by speaking in a more complex way? Further research is needed to disentangle the contribution of the child’s own competence, as well as genetic overlap of child and parent.

Perhaps it is not just hearing mental state terms but hearing them integrated into causally connected conversation that counts. Individual differences in preschoolers’ rates of ToM development are linked to the richness of adult-child conversation involving explanations of mental states (Peterson & Slaughter, 2003; Slaughter & Peterson, 2012; Slaughter, Peterson, & Mackintosh, 2007). A recent study (Ebert, Peterson, Slaughter, & Weinert, 2017) looking at German and Australian families varying in SES replicated past studies in showing links between parents’ self-reported use of elaborated mentalistic conversation and children’s higher ToM scores.

Cultural and Typicality Differences

What matters most in the language input, and are there alternate routes depending on class and culture? There are some indications that propositional attitude reports are less common in the speech of some parents than others, worldwide. Several studies have found correlations between family socio-economic status (SES) and individual differences in false belief performance (e.g., Cole & Mitchell, 2000; Cutting & Dunn, 1999). Allen, de Villiers, and François (2001) investigated potential differences between white and African American parents from different socio-economic classes in the USA, using the fairly extensive computerized transcripts from Hall, Nagy, and Linn (1984), in CHILDES (MacWhinney & Snow, 1985), of parents and their 5-year-old children. Looking just at the frequency of mental verbs, Black working class parents produced proportionally fewer. However, Allen et al. argued that if one is assessing how rich a linguistic environment is, it is important to consider more than the frequency of, e.g., mental verbs; it is also necessary to consider the complexity of the contexts, since other aspects of the communicative context might compensate. For example, the Black working class children at age 5 talked much more than the other groups about communication, about who said what to whom. In doing so they used elaborate embedded language, but with no “mental” verb to be counted. Is this perhaps an important alternative route to sophisticated understanding? The use of mental terms with sentential complements is also rare among Mandarin-speaking parents and children (Snedeker & Li, 2000; Tardif & Wellman, 2000). However, Mandarin-speaking parents and children use sentential complement constructions for communication verbs (e.g., say, in Mandarin) more commonly and earlier in development than their English-speaking counterparts (Tardif & Wellman, 2000). Much more data are needed on this point cross-linguistically.

There is yet a different perspective one can take on these findings: perhaps children use the language around them as a further source of evidence about other minds. In working out the meanings of words such as think and know, children may become aware of social cues that might be obscure if they just paid attention to behavior itself (e.g., Harris, 2005; Nelson, 2005). This can happen through speech or Sign. Native signing deaf children are as likely as typically developing children to engage with their parents in conversations about non-present objects, events, and ideas (Meadow, Greenberg, Erting, & Carmichael, 1981). The intact false belief comprehension shown by native signing deaf children is consistent with this richer linguistic environment (Schick et al., 2007).

A non-signing or late-signing deaf child’s input might be impoverished not just because of hearing loss, but also because they have less to work with to build a theory of mind (Peterson & Siegal, 1995). But on the view that language merely adds to the evidence available, deaf children should, over a longer period of time, eventually accumulate the necessary evidence from observation of behavior. The evidence from Pyers and Senghas (2009) on adult users of the not-yet-fully-developed Nicaraguan Sign Language contradicts that assumption. These adults, from the first cohort of children entering the school, did not have access to more sophisticated signed input to guide them as younger children, and still did not have a rich enough language as adults, lacking mental terms in particular (Pyers, 2004). These deaf individuals in their twenties were still impaired on false belief tasks. This finding suggests that language matters for such reasoning, and plays a role above and beyond providing extra evidence for a theory about other minds.

Other Linguistic Devices

It could be argued that language is full of devices that carry perspective, such as pronouns (I, you) and spatial locatives (here, there) that indicate a speaker’s point of view. Other linguistic morphemes indicate the speaker’s predictions about a listener’s preparedness (a, the), that is, has the speaker mentioned this before to this individual? (Van Hout, Harrigan, & de Villiers, 2010). If the child is surrounded by such talk from the start, why does FB reasoning take so long to learn? In fact, research has failed to find strong connections between use or understanding of these devices and classic FB reasoning. Perhaps these devices offer evidence of difference in viewpoint, but not differences in truth, and it is only the latter that matter for FB reasoning (de Villiers, 2018).

In other languages, there are even more subtle devices for indicating epistemic state, such as evidentials. Evidentials grammatically mark an utterance for how the speaker knows what she is talking about: did she see it herself, or hear about it, or infer it from some clue? Research on Turkish, Bulgarian, Romani, and Tibetan as well as Korean has revealed the complex and sometimes protracted path of development of evidential morphology in children, and yet there is no compelling evidence that mastery of evidentials is linked to the onset of FB reasoning (Aksu-Koç & Alici, 2000; Aksu-Koç, Avci, Aydin, Sefer, & Yasa, 2005; de Villiers & Garfield, 2017; Kyuchukov & de Villiers, 2009; Papafragou, Li, Choi, & Han, 2007).

Evidence for the Cognitive View

Typical Development

On the cognitive view, vocabulary and general grammar development assist theory of mind reasoning. The theoretical arguments in this domain are rather broad, and we begin with the broadest: one might argue that language skill is a proxy for verbal intelligence. That is, perhaps advanced theory of mind skills rest on the child’s general intelligence, and language skills are one of the best ways to measure human intelligence. It might be argued that finding a correlation in development between language and theory of mind reflects common genetic influence on each of these cognitive domains. Against this view are the preliminary findings from Hughes and Cutting (1999) in their twin study, indicating that the genetic influence on theory of mind was largely independent from the genes involved in language ability. The language index used in this study was the set of verbal subtests from the Stanford-Binet Intelligence Scales (Thorndike, Hagen, & Sattler, 1986), a measure of general verbal facility or verbal intelligence, not communication skills.

Alternatively, language could be considered part of the cognitive tool-kit, not so much a reflection of intelligence as an instrument of reasoning and problem-solving. The more language a child has, the more the child can use this as a tool for reasoning: perhaps even controlling impulses and thus assisting executive function, or holding things in memory, or using chains of reasoning. All of these skills would help with explicit FB reasoning, a task where a child must follow a narrative, inhibit their own knowledge, and remember the events to predict a future action.

Several studies have found that vocabulary size predicts FB performance (Happé, 1995; Milligan, Astington, & Dack, 2007). In addition, the child’s general level of language measured on a standardized test has also been shown to be highly predictive of FB reasoning (Astington & Jenkins, 1999; see Milligan et al., 2007, and Farrar, Benigno, Tompkins, & Gage, 2017, for meta-analyses). There is also important new evidence from a recent study by Brooks and Meltzoff (2015), who tracked the continuity in the development of children throughout the stages from gaze-following in infancy at 10.5 months to explicit FB reasoning at 4.5 years of age. When the children were 2.5 years, their language was assessed by parental report, specifically looking at mental state vocabulary versus a matched list of non-mental state vocabulary. At the older age, children were also tested on the PPVT (a standard test of vocabulary). Controlling for their eventual verbal ability, the children’s gaze-following in infancy predicted their later mental state vocabulary, but not the matched non-mental vocabulary. The parent-reported mental state terms then predicted later ToM at 4.5 years.

Delayed Language

Language appears to be a powerful mechanism for the acquisition of explicit theory of mind skills in children with autism. Several studies show that performance on theory of mind tasks in children with autism is significantly related to both lexical knowledge (Dahlgren & Trillingsgaard, 1996; Happé, 1995; Leekam & Perner, 1991; Sparrevohn & Howie, 1995) and syntactic knowledge (Tager-Flusberg, 2000; Tager-Flusberg & Joseph, 2005; Tager-Flusberg & Sullivan, 1994).

Happé has argued (Happé, 1995) that children with autism may be able to use their language skills to “hack out” a solution, rather than using the routes to FB reasoning taken by typically developing children. That is, it is suggested that the dependency of theory of mind on language might be quite different for children with autism than for other children. Happé (1995) found that the threshold of language ability sufficient for passing such tasks is much higher in children with autism than in typically developing children. Those children on the spectrum with advanced language ability may pass false belief tasks using the scaffolding that these language skills provide. Tager-Flusberg and Joseph (2005) also argued that children with autism might miss out on securely establishing the precursors of belief reasoning, being delayed on such skills as shared attention, or sensitivity to others’ intentions. But those who are proficient at language might use this to scaffold their way into understanding the behavior of others.

Deaf children who are proficient native signers, especially those born to deaf parents who sign, show neither language nor theory of mind impairments, but other deaf children do (Schick et al., 2007). The consensus is that this is because of their delayed language. Peterson and Siegal (1995) found that only 50% of deaf children who were 8–13 years of age and born to hearing parents passed an unseen change-of-location task. Similarly, Russell et al. (1998) showed that non-signing deaf children aged 4;9 to 16;11 (years;months) passed a false belief task only 28% of the time. In the study by Peterson, Wellman, and Liu (2005), only a third of the late-signing deaf children aged 5.5–13.2 years could pass a false belief task, but the group showed a similar ranking of five different ToM tasks to that of hearing children, albeit at a much later age. These results are echoed in other studies (Courtin, 2000; Courtin & Melot, 1998; Gale, de Villiers, de Villiers, & Pyers, 1996).

Adult Dual Task Studies

Much of the research on language and theory of mind has been developmental research, and the general assumption made by the cultural approach in particular, in all its variants, is that if language is needed at all for theory of mind, it must be just a developmental requirement. Once a theory of mind is established, then surely adults can operate without language as an intermediary. However, those who hold the cognitive view might make the case that language is a tool for such reasoning in adults as well as children.

Dual task studies have explored the possibility that language serves as a tool in reasoning even when the task is nonverbal, such as watching a brief video and predicting the ending based on a character’s belief or ignorance. The methods for such a study were established by Hermer-Vazquez, Spelke, and Katsnelson (1999), who showed that adults could not reason about complex spatial arrays while shadowing a narrative. In contrast, a rhythmic shadowing task, previously calibrated against the verbal shadowing on a visual search task, did not disrupt that reasoning. Borrowing that design, Newton and de Villiers (2007) found that complex verbal shadowing, but not matched rhythmic shadowing, also disrupted adults’ ability to reason about an agent’s false beliefs. A true belief task, in which the only difference was that the character saw what happened and acted on that true belief, was not disrupted by either kind of shadowing. A follow-up study showed that shadowing non-English (Swahili) also disrupted FB reasoning, even though no meaning could have been extracted from the Swahili being shadowed (Newton, 2006).

However, the results and interpretation of adult shadowing studies continue to be mixed. Dungan and Saxe (2012) replicated the finding that adults were impaired in reasoning about beliefs while verbal shadowing, but complex rhythmic shadowing, calibrated to the skills of the individuals, also disrupted their reasoning. They therefore interpret the interference effect as due to a more general attentional disruption. Forgeot d’Arc and Ramus (2011) also found some disruption of FB reasoning for adults who were verbally shadowing, but since their participants were also affected in their causal reasoning, the authors rejected the possibility that the language disruption was specific to belief reasoning. Most recently, and surprisingly, Samuel, Durdevic, Legg, Lurz, and Clayton (2019) found that adults could succeed at FB reasoning while simultaneously engaged in verbal shadowing. However, their study used verbal shadowing of simple material, an 8-digit numeric sequence, whereas the earlier studies had used shadowing of a complex narrative from an audio book. Clearly there is more work to be done here to discover whether the shadowed material must reach some minimum level of complexity before it interferes. In addition, perhaps the complexity of the event matters. Events involving true beliefs prove trivial to follow (Dungan & Saxe, 2012; Newton & de Villiers, 2007), but apparently some causal events can be made as complex as those involved in FB reasoning (Forgeot d’Arc & Ramus, 2011).

Would adult success on an implicit, gaze-following task be impervious to any amount of verbal interference, which seems likely given the infant results? In principle this would appear to be a simple experimental question, were it not for the substantial difficulty several laboratories have had in getting adults to gaze consistently in the expected way in infant-style implicit false belief tasks (Kulke et al., 2018; Lin, 2009). More innovation and complexity might be required to engage adult participants’ attention in where balls get hidden!

A recent German study of a large number of typical adults using a complex structural equation model confirmed a significant contribution of language skills to a variety of theory of mind tasks that require reflective reasoning (Meinhardt-Injac, Daum, Meinhardt, & Persike, 2018). What is unclear is whether those skills are being recruited for that reasoning in the adults, or reflect the essential role played by language in the development of the knowledge about mind in childhood.

Evidence for the Representational View

Here evidence is reviewed that assesses whether specific syntactic achievements are necessary above and beyond the role of mental state vocabulary, rich discourse, and syntactic development in general.

Typical Development

The child’s own language appears to be a key underlying mechanism for mastery of explicit FB tasks (Astington & Baird, 2005; San Juan & Astington, 2012). The question concerns the role of complement structures, a subset of the aspects of language-as-cognitive tool that seems theoretically to have special utility in representing states of others’ minds. The special property that complements (1) have relative to adjunct clauses (2) is that the embedded proposition in a verb complement can be false:

  1. Arthur said that he finished the paper

  2. Arthur slept after he finished the paper

Complements thus allow the expression of, e.g., mistakes and lies, or false beliefs. The complement structure is unique and only occurs under certain verbs, exclusively communication and mental state verbs. These verbs allow mention of other possible worlds in which those propositions could be true, namely, worlds in the mind of the sentence subject. Finite complements are used to express what philosophers call propositional attitudes.

Not only can such sentences express false propositions as belonging to another’s mind or perspective, but they can also capture the particular construal of a referent that may not be known to others. For example, one person may know something under a particular description, such as “my birthday gift from my grandmother,” but a friend may just know it as “the green vase.” If the friend then breaks the vase, it is still true to say:

  3. Your friend broke your grandmother’s birthday gift to you

But it would be untrue to say:

  4. Your friend thought she broke your grandmother’s birthday gift to you.

Could other aspects of language play the role of complements? Specific vocabulary words exist, such as “deluded,” but the word alone fails to capture the specific content of a false belief:

  5. Sally was deluded

  6. Sally thought the pen was a candy cane.

Discourse can perhaps do the trick, though it depends on mastery of ellipses like “so” or “that”:

  7. The pen was not a candy cane
     Sally didn’t know that
     Sally didn’t think so

Most intriguingly, discourse does not easily allow for recursive embedding though verb complements do:

  8. The bridge was broken
     Sally didn’t think so
     Mary knew that

Adults in English (Hollebrandse, Hobbs, de Villiers, & Roeper, 2008; Hollebrandse & Roeper, 2014) do not easily see that discourse to be equivalent in meaning to:

  9. Mary knew that Sally didn’t think the bridge was broken.

In sum, the special advantage of complements for capturing mental states is everywhere in the literature on propositional attitudes. But how do complements play a role in establishing the concepts of mental state?

The first studies of this aspect of language in development showed that children begin using verbs such as think and know from an early age (Diessel & Tomasello, 2001; Shatz, 1994; Shatz et al., 1983) but their first uses may be less like expressions of propositional attitudes than like stereotyped forms, often self-referent, with narrow functions:

  10. I don’t know (used as an escape from questioning)

  11. I think it’s a dog (I think used as maybe).

Crucial for FB reasoning is the ability to describe someone else’s thoughts (Bartsch & Wellman, 1995). The very first expressions of third person propositional attitudes seem to emerge around 3 or 3.5 years in spontaneous speech, and occur more rarely, e.g., in Adam’s transcripts in the computerized transcripts of child language CHILDES (Brown, 1973; MacWhinney & Snow, 1985):

  12. Adam: She thought that was a tiger

  13. Adam: He thought I said something about window

However, in experimental settings when children are asked to understand these forms, consistent difficulty is revealed. For instance, de Villiers (1999) arranged scenarios in which characters made statements that were either lies or mistakes, such as:

  14. The woman said she found her slipper. But look, it was really a mouse.
      What did the woman say she found?

Three-year-olds consistently answer “mouse,” even though the answer is provided in the sentence and one can argue that no “mind reading” is necessary in the situation. Four-year-olds answer “slipper.” A longitudinal study of 3–4-year-olds by de Villiers and Pyers (2002) and a very large study of children aged 4–10 years in the standardization of the DELV assessment test have exposed the time course and uniformity of this development (de Villiers, Burns, & Pearson, 2003).

It would be natural to propose that children at 3 or 4 do not yet have the conceptual resources to consider others’ perspectives and mental worlds, leading to errors with false complements as a result of their failures to understand others’ false beliefs. However, the reverse seems to be true. In several studies, children have been shown to understand complements before they can pass false belief tasks (de Villiers & de Villiers, 2009; de Villiers & Pyers, 2002), suggesting, on the strongest reading, that such language is a prerequisite for FB reasoning.

The finding of a strong correlation between complement mastery and FB reasoning has been documented now in several different languages: English (de Villiers & Pyers, 2002), German (Perner, Sprung, Zauner, & Haider, 2003), Danish (Knüppel, Steensgaard, & Jensen de López, 2008), and ASL (Schick et al., 2007). Aksu-Koc et al. (2005) found that production of complements predicted FB reasoning in Turkish better than evidentials did. A particularly interesting case arises with the deaf adults in Nicaragua who, as children, learned a sign language from their peers while attending a school for the deaf established in the late 1970s. The sign language has been evolving in the hands, literally, of several generations of children over the past 40 years (Senghas, Senghas, & Pyers, 2014). Pyers (2004) asked whether the older signers, who learned a still-impoverished form of the sign language as children, were able to pass nonverbal false belief tasks. Using an elicitation task, Pyers found that signers who could express propositional contents under mental state verbs were able to do FB reasoning, but those who could not failed the false belief tasks even as adults (Pyers & Senghas, 2009).

However, there are some counter-instances to the claim that complement mastery precedes false belief understanding. In children learning Cantonese (Cheung, Chen, & Yeung, 2009; Cheung et al., 2004; Tardif, So, & Kaciroti, 2007), a language in which the surface markers of complementation are virtually non-existent and there is no wh-movement, the results are less clear. Tardif et al. (2007) reported a large longitudinal study of children learning Cantonese in Hong Kong, and though they found significant correlations between complement comprehension on the de Villiers and Pyers (2002) “memory for complements” task and false belief understanding, overall the children were surprisingly poor at the complement comprehension test, even at age 6. These findings partially echoed Cheung et al. (2004). Thus complements did not seem to be a prerequisite for FB reasoning in Cantonese. One complexity worth noting is that there is a special lexical item in both Cantonese and Mandarin that means “to think falsely,” and it seems as if the burden of representing false beliefs is carried more by this special lexicon than by syntax in such a language. There is much that remains to be puzzled out.

Teasing Apart Variables

In particular, is it general language (e.g., Slade & Ruffman, 2005) or a specific understanding of sentential complements (e.g., de Villiers & de Villiers, 2009) that is responsible for the breakthrough around age 4 in false belief understanding? The conclusions are equivocal, as not all the studies included both general language and complementation measures in the same investigation. In their meta-analysis, Milligan, Astington, and Dack (2007) found support for the claim that complementation was the more consistent predictor of false belief understanding, though the number of relevant studies was very limited.

Other studies since 2005 have found strong effects but may not have included both complementation and general syntax among their measures. For example, Low (2010) found that understanding sentential complements predicted standard ToM tasks in a cross-sectional sample of English-speaking children, once age, nonverbal ability and implicit false belief scores were controlled. Farrar et al. (2017) provide a useful meta-analysis of the studies to date that did compare the role of complements and general language as predictors of FB reasoning. In 10 of the 18 studies (56%) that compared both, the general language hypothesis was supported over and above the specific role of complements. These studies have used a wide variety of measures to assess “general language ability,” including receptive vocabulary and different measures of syntax development. However, six of these ten studies were for Cantonese and Korean. As mentioned, mental state verbs differ in Cantonese and Mandarin compared to English, in that the distinction between true and false beliefs is carried lexically, in the verb (see Tardif et al., 2007). Nevertheless, the majority of these studies tested complements with communication verbs (except for Cheung, 2006; study 2). Thus Farrar et al. argue that even these cross-linguistic studies can be used to evaluate the relative contribution of complementation and general language.

Longitudinal studies are very rare, but they can help identify the direction of influence between the variables, as well as control for initial FB reasoning. Two early studies came to conflicting results. de Villiers and Pyers (2002) studied a small group (N = 28) of children over a year in preschool, and tested them at four points on a battery of theory of mind and language tests. Though they had begun the study expecting that false belief understanding might be necessary for comprehending complementation, the reverse turned out to be the case. Once children acquired a systematic understanding of sentential complements, they also began to reliably pass false belief tasks.

Two larger longitudinal studies were rich enough to explore the relative contributions of vocabulary, general language, executive functioning and complements to FB reasoning in English-speaking preschool children. Farrant et al. (2012) added to the model the variable of maternal mindedness, predicting that variation in maternal input would predict children’s ability on sentence complements, which would then predict false belief understanding. Their sample included 91 typically developing Australian children studied twice across a year. Importantly, the effect of variation in maternal mental talk was completely mediated by the children’s own competence at sentential complements, which predicted their belief ability. Cognitive flexibility was a further predictor, and the direction of effect was that sentential complement mastery predicted this executive function index rather than vice versa.

The Farrant study did not use structural equation modeling for their longitudinal portion, and had a relatively small sample size for the number of variables. We had the opportunity to test a large sample of low-income children (N = 325) over the course of several years as part of a preschool curricular intervention study (Lonigan et al., 2015). The children had received a large battery of language, executive function, and theory of mind measures, and these were repeated several times over the course of the study, making this an ideal group to test competing models. The results of a preliminary structural equation model looking at executive function (inhibitory control), vocabulary, and sentential complements at Time 1 and Time 2 (approximately 8 months apart) showed significant direct effects of complements, vocabulary and inhibitory control at Time 1, on FB reasoning at Time 2. In addition, there were significant indirect effects of inhibitory control and vocabulary at Time 1 on FB reasoning at Time 2, mediated through complement understanding at Time 1 (Chen, 2013; de Villiers, de Villiers, Lindley, Chen, and the School Readiness Research Consortium, 2015).

Atypical Children

If complementation is needed for ToM reasoning, then children who have not mastered it due to language delays or difficulties should struggle with ToM. We know that children with Developmental Language Disorder (DLD) display primary difficulties in formal language including complementation (Steel, Rose, & Eadie, 2016; Tuller, Henry, Sizaret, & Barthez, 2012). They are reportedly delayed in ToM, though these delays may be more subtle than those attested in ASD (Andrés-Roqueta, Adrian, Clemente, & Katsos, 2013; Holmes, 2002; Tucker, 2004). Mastery of complements by children with DLD also relates to their success at ToM (Miller, 2001). The verbal demands of the ToM tests administered in the studies are not sufficient to account for their ToM performance, as researchers have used tasks that are minimally verbal and the children still show difficulties (Nilsson & López, 2016). Complements have proven predictive of performance on minimally verbal ToM tasks for both DLD (Durrleman, Burnel, & Reboul, 2017) and ASD (Durrleman et al., 2016; Durrleman & Franck, 2015), and for deaf children with language delay (Schick et al., 2007). Farrant et al. (2012) had a sample of 31 children with language delay in their study, and the results showed that sentential complements and cognitive flexibility both predicted false belief understanding in this population too.

Farrar et al. (2017) in their meta-analysis restrict attention to those studies in which both general language and complementation could be contrasted. They analyze eight studies of children with autism, deafness, or SLI, all of which indicate that language was associated with performance on false belief tasks. Complementation made an independent contribution in all of these studies except for two (Farrar et al., 2009; Lind & Bowler, 2009). In some of these populations, general language was also associated with false belief understanding. Thus, for the atypically developing children there was support for the complementation hypothesis, and Farrar et al. contend that language may be especially necessary for language-delayed children to succeed on false belief tasks.

Training Studies

The theory about sentential complements has the virtue of being falsifiable by means of experimental test, unlike many of the broader proposals. In particular, it is possible to test it via a causal intervention, namely an experimental manipulation in which one changes what children know about complementation, and see if the children improve on FB reasoning. That is, give the child the tool: does it help?

In the first such study, Hale and Tager-Flusberg (2003) took children who failed both a false belief and a sentential complement pretest and trained them in one of three conditions: direct FB reasoning, sentential complements, or relative clauses (a control group). Children trained in sentential complements were exposed only to communication verbs, allowing separation of the syntax of complementation from the lexical semantics of mental verbs. Children trained on either false belief or sentential complements significantly improved their performance on false belief tasks, whereas children trained in relative clauses showed no such improvement. What is unclear is whether the children trained on false belief directly did so without also understanding sentential complements, as the post-test arguably required them. In a second training study, Lohmann and Tomasello (2003) tested whether highlighting the nature of a deceptive object (say, a candle shaped like an apple) might also improve FB reasoning. It appeared that deceptive discourse without complementation per se could suffice (Lohmann & Tomasello, 2003), though the children who received the deceptive discourse training were close to mastering complementation even on the pretest. The training condition that included training in both discourse about deception and sentential complements led to the most improvement in false belief understanding. Mo, Su, Sabbagh, and Jiaming (2014) found that Mandarin-speaking preschool children trained on sentential complements with communication verbs showed improvement on FB reasoning, even without discourse about deception. However, they also found improvement in the conditions that used thought bubbles with representations of mistaken beliefs, despite the fact that those children did not improve on complementation. The possibility thus remains that children could succeed by other routes, as they did with direct training on false beliefs in Hale and Tager-Flusberg (2003).

In sum, training on complements of verbs of communication (Hale & Tager-Flusberg, 2003; Mo, Su, Sabbagh, & Jiaming, 2014) boosts theory of mind reasoning in typically developing children. The optimum training may be to use complements and also deceptive objects.

The participants in these training studies were not delayed for either language or ToM, and were instead children on the cusp of developing these skills anyway. For clinical purposes, it seems important to see if enhancing complementation can boost reasoning about others’ thoughts in populations where ToM and/or language is affected. This might be especially useful if atypical children show the most benefit from acquiring complements (Farrar et al., 2017). Recent work by Durrleman et al. (2019) provides the answer. In that study of French-speaking children, three groups were used, all of them chosen because they failed on pretests of both FB and sentential complementation. One group were young, typically developing children, as in the previous training studies described. A second group were children with DLD, or delayed language development, that is, cognitively typical in other respects. The third were children on the autism spectrum. The criteria for inclusion were that the participants did not yet pass complement understanding or false belief tests, though the children with DLD and autism were older, and had enough language to follow the tasks. The children were all given one of two interventions, delivered and automatically scored on an iPad: a vocabulary training app, or a specially designed app that trained communication verbs with sentential complements. After 2 or 3 sessions per week for 3–6 weeks of training, post-tests revealed a significant change in both sentential complementation and FB reasoning only in the children given the complement training. Importantly, the training was equally effective across the three participant groups, suggesting clinical usefulness. It remains to be discovered how long the gains last, though in this study they persisted at least until a second post-test several weeks later.

Conclusion

What does a neuroscientist need to know in this area? Theories abound about the role of language in FB reasoning, or what is standardly called theory of mind. Considering all the theories: can we distinguish them with existing data?

The cultural view that children learn from discourse about the mind can be subsumed under the representational view, in that most relevant discourse would include complements of mental states. The reverse is less clear, since children can learn from acquiring complements of communication verbs in training studies. However, no one has proposed that the information children receive in training studies is the only input they get: surely real life is simultaneously providing information that allows them to see analogies in usage across communication and mental verbs. Yet complement syntax does seem to play a critical role, even if in Chinese languages an essential cue is carried by the head of the complement, the verb “think falsely.”

The cognitive tool view is a broader version of the representational view, and there may be value in considering it as an extra perspective, e.g., the roles that language, even just labels, can play in inhibitory function or short-term memory. Without the syntax of complements, it falls short as an explanation.

The conduit view finds support in the cases of infant theory of mind and of aphasia. Both groups seem to be succeeding at complex theory of mind without language. The results for each have been challenged on methodological grounds, so neither case is resolved. The infants cannot tell us what they think, or what drives their looking, but if it is genuinely based on reading beliefs, the result must mean that the concepts are there before language. The aphasia cases reveal nothing about ontogenesis: any of the other views could be true about development, but perhaps the reasoning about beliefs can survive language loss. However, the failure on false belief of the late- or incomplete-language learning Nicaraguan signers contradicts the conduit view.

It is likely that each theory adds something to the account, and the research area continues to be highly fruitful and innovative. More work is needed on the effects of language delay or disorder, especially with better nonverbal tasks that definitively tap reasoning about the contents of belief states. There is more work needed on the possibility of two systems, one fast and instinctive, the other reflective and guided perhaps by linguistic reasoning. Additional cross-linguistic work is needed, including languages that express mental states in less common ways, and in varieties of Sign.