He turned out the light and went into Jem’s room. He would be there all night, and he would be there when Jem waked up in the morning.

– The final lines of To Kill a Mockingbird, by Harper Lee

When we read these lines, we are interpreting marks on a page and imbuing them with meaning. Beyond basic decoding and semantic processing, understanding a text requires that we put ourselves in the mindset of the character. Here, Atticus Finch’s son Jem is lying unconscious after an attack. The author does not explicitly report what Atticus feels or thinks. However, a skilled reader who takes Atticus’s perspective monitors and represents his mental and emotional states, interprets these lines as an expression of a father’s love and devotion to his son, and imagines what Atticus might be feeling and thinking at his son’s bedside.

As this example demonstrates, reading comprehension is multifaceted and requires numerous skills (Scarborough 2001). Furthermore, reading comprehension is a non-unitary construct and may occur differently in different circumstances (Duke 2005). Although there are several models that define and discuss language and reading comprehension (e.g., Gernsbacher 1990; Gough and Tunmer 1986; Kintsch 1988), we define reading comprehension as “the process of simultaneously extracting and constructing meaning through interaction and involvement with written language” (RAND Reading Study Group 2002, p. xiii). Indeed, regardless of the specifics of the models, one thing is clear: understanding extended written narratives requires that the reader both use the text and go beyond the text to make sense of the words and capture the meaning. Not surprisingly, decoding (e.g., Chen and Vellutino 1997; Hoover and Gough 1990; Kendeou et al. 2009), background knowledge (e.g., Grissmer et al. 2010), linguistic ability (e.g., Ouellette and Beers 2010), reading fluency (e.g., Kim et al. 2012), vocabulary (e.g., Ouellette 2006), inference-making (e.g., Kendeou et al. 2008), working memory (e.g., Cain et al. 2004), and motivation (e.g., Guthrie et al. 2007) all likely play a role in this complex task. A recent practice guide for educators on developing reading comprehension in school-age children focuses on academic language, phonological awareness, and decoding as foundational skills to foster in early readers (Foorman et al. 2017). Without these fundamental skills, students will be unable to construct meaning from text.

Yet a substantial amount of the variability in children’s reading comprehension performance remains unexplained. In some studies, variance in younger children’s reading comprehension appears to be more easily explained than variance among older children. As children’s preliminary reading skills become automatized, additional factors begin to enter the reading comprehension equation. For example, Ouellette and Beers’ (2009) model explained 75% of the variance for first graders but only 56% of the variance for sixth graders. Similarly, although Kim et al. (2012) found that 85% of the variance in reading comprehension could be explained for average word readers in first and second grades, only 66% of the variance could be explained once children were skilled word readers. Others, however, have found mixed results (Adlof et al. 2006; Foorman et al. 2015). For example, in Foorman et al.’s study of fourth to tenth graders, the variance accounted for ranged from 72% in sixth graders to 99.5% in ninth graders, with no clear pattern across grades. Thus, it appears that additional factors come into play after the initial stage of learning to decode. These factors will likely offer additional explanatory power in predicting the remaining variance in reading comprehension skills, especially at later grade levels.

Thus, despite our accumulated knowledge about the processes and factors involved in reading comprehension, we still have students whose progress stalls around third or fourth grade. Recent estimates show that 64% of fourth graders, including almost 80% of Black and Hispanic students, perform below proficient levels on standardized reading assessments (National Assessment of Educational Progress 2015). For reasons such as this, initiatives like the Campaign for Grade-Level Reading, supported by the Annie E. Casey Foundation, are working to improve third-grade reading (The Campaign for Grade-Level Reading 2017). This problem is not limited to the USA: almost one in five students in OECD countries does not reach baseline proficiency in reading (Program for International Student Assessment 2015).

Here, we argue that a potential missing piece in accounting for reading comprehension is the role of theory of mind. Theory of mind is implicated in the opening quotation because the reader must monitor and represent Atticus’s mental and emotional states, which enables the reader to infer his thoughts and feelings and so understand the text. For example, Harper Lee does not mention why Atticus wants to sit by his son’s bedside all night or describe explicitly how he is feeling, so readers must create a mental representation of his desires and emotions to appreciate the text’s broader meaning.

For clarity, we use the term reading comprehension to refer to the process of extracting meaning from written text. Most reading comprehension measures have children read a passage and answer multiple-choice comprehension questions, but others ask children to fill in a missing word in a sentence or passage instead. Importantly, one need not read per se to understand a story. This distinction is evident in models of reading comprehension that highlight the role of language comprehension (Hoover and Gough 1990; Scarborough 2001). Numerous studies support the claim that language comprehension predicts reading comprehension: children with poor oral language comprehension tend to have difficulty understanding written text (Hoover and Gough 1990; Kendeou et al. 2009; Kim 2017). Thus, some studies described here use listening rather than reading measures to avoid confounds with children’s decoding abilities. Throughout this review, we refer to these measures as listening comprehension to distinguish them from text-based measures. In these measures, children usually listen to a passage read aloud and then answer multiple-choice comprehension questions. Finally, the term narrative processing refers to how children process or understand a story spontaneously while listening or reading. This term refers specifically to narratives rather than expository passages and is measured using a variety of implicit reaction time or recall measures, as described below. Narrative processing is assumed to occur regardless of whether one is listening or reading and to underlie more explicit measures of comprehension.

Skilled adult readers appear to routinely monitor these kinds of mental states. They appear to form mental representations of characters’ emotional states, for example, expecting that a character whose actions resulted in a friend being fired would feel guilty (Gernsbacher et al. 1992). To some extent, children as young as 4 also seem to monitor the mental, emotional, or motivational states of protagonists (Diergarten and Nieding 2013; Fecica and O’Neill 2010; O’Neill and Shultis 2007). These representations appear to be formed during naturalistic narrative processing: adults and children take longer to process a sentence that is inconsistent with a character’s implied emotion than a sentence that is consistent with it. However, there may be individual variability in the extent to which children and adults represent characters’ mental states and in the specificity and automaticity of such representations. Variability in this skill is likely related to theory of mind and may affect reading comprehension (Cartwright 2015). That is, the ability to represent the mental states of characters will likely yield better reading comprehension for children and adults, a thesis this paper examines.

To present the case for a relationship between these variables, we first define theory of mind (ToM) operationally and then describe research on children’s narrative processing, because that processing precedes children’s understanding of written text and also appears to entail ToM. This examination highlights the close alignment between developments in narrative processing and children’s developing ToM abilities. Given that not all children progress at the same pace in either realm, we next consider research that links individual differences in narrative processing skills with individual differences in reading comprehension. Emerging empirical evidence shows that ToM is concurrently related to children’s reading comprehension and also predicts reading comprehension longitudinally. A summative evaluation of the research permits us to present a framework of the relationship between ToM and reading comprehension, which yields empirical predictions that can be tested in further research. Finally, we consider the possibility that the activity of reading itself might foster ToM development and discuss research findings that point to a possible bidirectional relationship.

Importantly, we do not aim to put forth a new theory of reading comprehension. Rather, we argue that ToM should be considered in the context of current theoretical approaches. We consider the relations between our framework and broader theories of reading comprehension after presenting it. Overall, we argue not that ToM explains all of reading comprehension but that current theoretical approaches may implicitly involve ToM and that its role should be investigated explicitly and empirically. Similarly, we do not argue that ToM is the only, or even necessarily the most important, predictor of reading comprehension. Indeed, as noted above, many factors contribute to reading comprehension, and ToM is only one of the skills individuals bring to the task of comprehension. We highlight ToM because we believe that this ability has historically been overlooked in reading comprehension research, perhaps partly because much of the research on ToM comes from the developmental psychology literature rather than the education literature. However, we encourage future research examining other overlooked predictors, such as causal reasoning and spatial abilities, that may play an important role in comprehension.

Furthermore, we do not claim that the idea that ToM may promote reading comprehension is entirely novel. Indeed, we rely on the work of a number of scholars who have recently begun investigating these variables empirically and offering theoretical explanations for why they might be linked. For example, Kim (2016, 2017), Boerma et al. (2017), and Atkinson et al. (2017) have all studied relations between ToM and reading comprehension. Much of this work has considered ToM as one of several variables predicting reading comprehension, a clearly valuable approach for understanding ToM within the context of competing predictors. However, by focusing solely on ToM in this review, we aim to highlight the exciting embryonic literature in this domain, point to the potentially unique and important role ToM plays, and spur further research addressing this issue.

What Is ToM and How Is It Studied?

Theory of mind (ToM) is defined as the understanding that other people have mental states that drive their actions and that these mental states can differ from those of the onlooker. For example, ToM is implicated when children understand sentences like “He thought that she knew he was leaving.” ToM also includes children’s reasoning about others’ perceptions, desires, and beliefs. Much of our understanding of human behavior relies on understanding people’s inner lives, in which intentions, purposes, and motivations trigger that behavior. ToM develops across childhood and is best conceptualized as an extended series of accomplishments that can be measured using a range of tasks at multiple levels of mental state understanding (Wellman and Liu 2004). Research in ToM has led to a rich, multifaceted literature in developmental psychology (for reviews see Flavell 2004; Wellman 2014); here, we give only an overview and share some key understandings from this very active area.

Notably, ToM is related to causal reasoning (Frye et al. 1995), as children must appreciate the causal mechanisms by which external events trigger mental or emotional responses. For example, by age 4, children are generally able to recognize that losing a favorite toy would make someone sad, whereas getting a gift would make someone happy (Pons et al. 2004). Causal reasoning may underlie this ability, but ToM is distinct in its focus on reasoning about mental states and emotions.

Even infants and toddlers show early evidence of ToM in their looking toward unexpected outcomes (e.g., Onishi and Baillargeon 2005) and in taking other individuals’ preferences into account in their giving behaviors (Repacholi and Gopnik 1997). However, the most dramatic development in explicit ToM begins during the preschool years. Around age 3, children exhibit an understanding of others’ basic visual perspectives. Flavell and his colleagues (Flavell et al. 1981; Masangkay et al. 1974) found that 3-year-olds recognize that someone else can see an object that is currently not visible to the child or, conversely, that the other person is unable to see an object that is currently visible to the child (Flavell 1988). Still, 3-year-olds’ understanding of others’ perceptions is tenuous. In a more complex task, children are shown a picture of a turtle laid flat on the table in front of them and asked whether the turtle looks right-side-up or upside-down to them (terms explained prior to testing [Masangkay et al. 1974]). Then the child is asked whether the turtle looks right-side-up or upside-down to the experimenter, who is sitting across the table. Here, 3-year-olds’ developing ToM is overtaxed: they do not yet understand that the picture looks different from the experimenter’s perspective. It is not until about 4 years of age that children begin to master this task consistently, recognizing that others’ mental representations of the world can differ from their own.

Young children’s ability to appreciate another’s visual perspective is important because it reflects their growing understanding of how another person’s state of knowledge may affect that person’s behavior. Indeed, performance on visual perspective-taking tasks is related to performance on a similar task assessing children’s understanding that someone can hold a false belief about the world (Flavell 1988). In the classic false belief task, an experimenter uses dolls or puppets to portray a scene in which a child named Maxi puts a chocolate bar in the cupboard and goes outside to play (Wimmer and Perner 1983). While Maxi is outside, his mother comes in and moves the chocolate bar from the cupboard to the drawer. Then the child is asked where Maxi will look for his chocolate when he comes back inside: in the cupboard or in the drawer. By age 4 or 5, most children recognize that Maxi will hold a false belief about the location of the chocolate bar and look where he left it, in the cupboard. However, most 3-year-olds respond based on reality rather than on Maxi’s false belief and report that he will look for the chocolate bar where it actually is, in the drawer. Here, as in the perspective-taking task reviewed above, 3-year-olds seem to be unable to take the perspective of a naïve or mistaken other, someone whose mental representation of the world differs from their own (Wellman and Liu 2004).

Although children seem to have a fairly firm grasp of false belief understanding and different visual perspectives around age 4, other aspects of ToM continue to develop throughout middle childhood. More complex measures, such as vignettes or short film clips followed by questions, tap into whether children can appropriately interpret non-literal situations that hinge on others’ mental states (Devine and Hughes 2013, 2016; Happé 1994; White et al. 2009). In one such story, a burglar who has just robbed a store drops his glove, and when a policeman calls after him to return it, the burglar gives himself up. Children are asked why the burglar confessed, given that the policeman just wanted to give him back his glove. Children under 6 have difficulty explaining this situation, even when they recognize that the policeman was surprised by the burglar’s actions. It is not until about 9 years of age that children describe the mental states behind the burglar’s behavior, explaining that the burglar thought that the policeman knew that he had robbed the store (O’Hare et al. 2009). Reasoning about others’ mental states, such as thinking and knowing, is an important aspect of ToM. Vignettes assessing children’s understanding of persuasion and sarcasm appear to be even more difficult, with children up to age 12 providing only partial explanations of these situations (O’Hare et al. 2009).

This sequence of ToM development from early through middle childhood is linked in that it represents children’s growing ability to understand increasingly complex issues of subjectivity (Wellman and Liu 2004). The early-developing ability to distinguish visual perspectives focuses on physical or spatial subjectivity: that two people can perceive the world differently due to their differing locations in space. As children get older, they come to understand mental subjectivity, or the idea of subjective-objective contrast in which a person can be ignorant or mistaken about what is objectively true in the world (i.e., false belief). ToM in middle childhood is characterized by more complex understandings, such as embedded subjectivity: that one person can have a subjective understanding of the world (which may or may not be accurate) and another person can have a subjective understanding of the first person’s subjective understanding (e.g., the burglar thought that the policeman knew that he had robbed the store). Although the milestones in ToM development each represent conceptually distinct understandings, they can be seen as forming a single continuum because they represent children’s broadening understanding of subjectivity (Wellman and Liu 2004).

A large body of research has explored antecedents of individual differences in the developmental progression of ToM (Hughes and Devine 2015). Language skills are a consistent predictor of ToM (Jenkins and Astington 1996; Lillard and Kavanaugh 2012; Milligan et al. 2007; Watson et al. 2001), as is the extent to which parents talk to children about mental states like wanting and thinking (Adrian et al. 2005; Meins et al. 2006; Ruffman et al. 2013). The number of siblings, particularly older siblings, is also predictive in some studies (Jenkins et al. 1996; Kennedy et al. 2015; McAlister and Peterson 2012; Perner et al. 1994; Ruffman et al. 1998).

There are also substantial individual differences in ToM development, and this variability predicts other outcomes, such as social competence (Devine et al. 2016), more positive peer relations (Banerjee et al. 2011; Caputi et al. 2011), and even academic achievement (Lecce et al. 2011). Thus, it seems that although typically developing children eventually understand others’ mental states, the pace at which they do so affects other aspects of their lives. Importantly, early research has begun to suggest that reading comprehension may also be an outcome related to children’s ToM.

How Do Children Process Narratives—Even Before Encountering Text?

Alignment Between ToM Development and Children’s Narrative Processing

The thesis we propose is that there may be a causal relationship between ToM and listening and reading comprehension. A first step toward evaluating this hypothesis is to determine whether there is a temporal alignment across childhood between ToM abilities and how children process narratives while listening or reading. The review below indicates that the ages at which children reach important ToM milestones are notably similar to the ages at which they seem to master significant narrative processing abilities. Fig. 1 summarizes this alignment. The top row includes milestones in ToM development, labeled ToM 1, 2, and 3; the middle row gives approximate ages for each achievement; and the bottom row includes significant achievements in children’s narrative processing abilities that seem to occur around the same time in development, labeled NP 1, 2, and 3 and described further below. Crucially, we do not argue that temporal alignment in any way demonstrates a causal link or even suggests that these abilities are uniquely related above and beyond potential third variables such as general cognitive development or working memory. However, we see temporal alignment as a necessary, although not sufficient, condition for exploring a potential causal relationship. Thus, this review provides a backdrop for our later discussion of preliminary studies linking individual differences in ToM and reading comprehension and for our proposed framework of a potential causal link between these variables.

Fig. 1 The alignment between the development of theory of mind and children’s narrative processing abilities across childhood

Narrative Processing 1 (NP 1) and Theory of Mind 1 (ToM 1)

Even at the earliest age at which narrative processing has been assessed, children as young as 3 appear to process narratives from the spatial perspective of the protagonist in a way that implicates ToM. Rall and Harris (2000) read children stories that included deictic verbs (come, go) that were either consistent or inconsistent with the protagonist’s perspective. Children might hear, “Cinderella was in the hall scrubbing the floor when her stepmother came in” vs. “… when her stepmother went in.” Children were then asked to retell the story. In their retellings, children often changed the inconsistent verb to be consistent, suggesting that they were experiencing the narrative from the protagonist’s perspective rather than from some outside perspective in which “went” would be just as reasonable as “came.” Notably, a later study by Ziegler et al. (2005) replicated this effect and showed that the verb correction was weaker when the story did not include an agentic protagonist (e.g., when the story was about a toy car instead of a person), suggesting that children take the characters’ perspective rather than simply anchoring to the characters’ physical location in the story. Tracking characters’ spatial perspective is represented as NP 1 in Fig. 1 and is in line with ToM 1 studies showing that 3-year-olds can understand others’ visual perspectives. By age 3, if not earlier, children seem to “see” stories from the spatial perspective of the protagonist.

Fecica and O’Neill (2010) showed that, at least by 4 years of age, spatial perspective taking appears to occur in real time during narrative processing. Prior research had shown that adult readers respond more quickly to questions referring to an item in close proximity to a character’s current location than to items farther away (Bower and Morrow 1990; Glenberg et al. 1987; see Zwaan and Radvansky 1998 for a review). For example, participants who read about a character putting his sweatshirt on, rather than taking it off, before running halfway around a lake responded more quickly to a question about the sweatshirt, presumably because they had put themselves in the shoes (and the sweatshirt) of the character (Glenberg et al. 1987). When the sweatshirt was left on the other side of the lake, participants responded more slowly because the sweatshirt was far away from their perspective in the story. Crucially, this study suggests that readers represent the narrative from the character’s perspective (i.e., on the other side of the lake) rather than simply encoding the spatial information presented in the narrative, in which case the sweatshirt’s distance from the character should not have affected response times.

Fecica and O’Neill (2010) extended this approach to children’s narrative processing. Children heard a story one sentence at a time and pressed a button to advance to the next sentence, yielding a measure of “processing time.” Children’s processing times were longer when the sentence said the character was walking rather than driving, presumably reflecting the difference in the actual time these two modes of movement would take. Apparently, children, like adults, engage in online processing of narratives, mentally simulating the actions from the character’s perspective. Notably, Fecica and O’Neill did not make claims about the role of ToM in this process but instead argued for an embodied representation of narrative processing. However, these data seem to suggest that theory of mind processes related to visual or spatial perspectives (ToM 1 in Fig. 1) may operate during narrative comprehension.

A final study indicates that children adopt an internal perspective, constructing the story from the protagonist’s spatial perspective rather than constructing the spatial relations from an external perspective. In line with work with adults (Bryant et al. 1992), Ziegler and Acquah (2013) showed that children responded most quickly and most accurately to prompts referring to objects that were in front of or behind a character in a story (in line with the canonical body axes of an upright observer) and more slowly to objects placed to the left or right of the character. If children simply encoded the spatial locations of the objects described in the narrative, they should have responded similarly to all objects regardless of their orientation in relation to the character (see Franklin and Tversky 1990 for more information about spatial framework theory).

Narrative Processing 2 (NP 2) and Theory of Mind 2 (ToM 2)

Adequate reading comprehension involves more than taking the spatial perspective of a character; it also requires an understanding of mental subjectivity. O’Neill and Shultis (2007) suggest that the ability to track a character’s mental perspective emerges around 4 or 5 years of age. This timing aligns closely with the emergence of children’s understanding of others’ mental representations of the world (ToM 2). In O’Neill and Shultis’s (2007) study, 3- and 5-year-olds were shown a set of farm toys and told a story in which a character in one location (e.g., the field) was thinking about another location (e.g., the barn). When asked to point to “the cow,” would children point to the particular cow the character was thinking about or to a cow that was physically nearby? Five-year-olds pointed to the cow the character was thinking about, suggesting that they followed the character’s mental perspective rather than focusing solely on the character’s current physical location. However, 3-year-olds tended to point to the cow that was physically nearby, suggesting that the ability to track a character’s mental perspective and follow their thoughts develops across the preschool years. Notably, 3-year-olds did track a character’s physical perspective in this paradigm: when the character physically moved to the new location, rather than only thinking about it, they chose the correct cow. This finding highlights the idea that spatial perspective taking (NP 1) emerges earlier than mental perspective taking (NP 2). This development seems to align with important advancements in ToM around age 4: the shift from simple visual perspective taking (ToM 1) to the ability to consider someone else’s mental representation of the world, which might differ from one’s own (ToM 2). Again, the close alignment in the developmental progression of these abilities is consistent with a possible causal role for ToM in narrative processing.

Noting a character’s psychological state may also be part of tracking a character’s mental perspective and understanding mental subjectivity. Fecica and O’Neill (2010) told 4-year-olds a story in which a character was either very excited or very reluctant about going somewhere. Processing times were longer when the character was reluctant than when the character was excited, suggesting that by age 4 children consider the character’s motivational state as they simulate the action of the narrative. In another study, 5-year-olds responded more quickly to sentences that described an emotion with the same valence as the implied emotional state of the character than to sentences that described an opposite-valence emotion (Diergarten and Nieding 2013), suggesting that at least the valence of the emotion is activated during narrative processing by age 5.

Related studies on children’s narrative production (storytelling), as opposed to narrative comprehension, show that between ages 3 and 5 children become markedly more likely to represent characters in their own stories as persons with complex mental states (Nicolopoulou and Richner 2007; Richner and Nicolopoulou 2001). Together, these findings suggest that around age 4 or 5, children begin to consider not just a character’s location and movements but also what the character is thinking and feeling. This timeline is consistent with a central aspect of ToM development that also occurs around this period: the ability to understand others’ subjective mental representations of the world.

Narrative Processing 3 (NP 3) and Theory of Mind 3 (ToM 3)

The preponderance of research in both ToM and narrative processing has focused on preschoolers, so the alignment we describe among older children has a smaller evidence base. However, children’s narrative processing abilities may continue to develop beyond the preschool years, just as ToM does (Devine and Hughes 2013, 2016; Happé 1994; White et al. 2009). By age 7, when most children are reading, they also show evidence of tracking characters’ goals, a vital process for understanding a narrative (NP 3). Nyhout (2015) had children listen to short stories on a computer in the processing time paradigm. Early in the story, a character’s goal is established (e.g., she wants to read her book). Later, children hear a sentence that is either consistent (e.g., she goes outside and reads her book) or inconsistent (e.g., she goes outside and jumps rope) with her previously stated goal. Children’s processing times were longer when the sentence was inconsistent than when it was consistent. Thus, by age 7, children seem to spontaneously represent goal information during narrative processing (NP 3). Although more research is needed to investigate whether younger children also track characters’ goals, the appearance of this ability by age 7 aligns with the development of more advanced ToM competencies, such as beginning to be able to explain non-literal situations like misunderstandings (ToM 3).

Summary of NP-ToM Links

Together, these studies show a developmental progression in children’s processing of narratives that aligns with the development of ToM abilities across childhood. Even very young children seem to take a character’s spatial perspective during narrative processing (NP 1), in line with the early-emerging ability to understand others’ subjective visual perspectives (ToM 1). O’Neill and Shultis (2007) demonstrate that understanding characters’ spatial perspectives is easier and emerges earlier in development than a more complex understanding of characters’ subjective mental states. Later, as children develop through the preschool and early elementary years, they begin to understand a story from a character’s mental perspective, taking psychological states, emotions, and thoughts into account (NP 2). These abilities appear to emerge during an important developmental period when later ToM competencies, such as understanding others’ beliefs, are also developing rapidly (ToM 2). Although the data are sparser, there is some emerging evidence of more advanced narrative understanding in middle childhood, such as representing goal information (NP 3), in line with advanced ToM development when children begin to understand more complex mental phenomena such as misunderstandings (ToM 3). However, as noted above, the rough temporal alignment between the development of ToM and narrative processing across childhood does not provide evidence that these variables are causally linked. Furthermore, it is not clear from our review that the ToM milestones described here precede the narrative processing milestones in development. In fact, to our knowledge, no published studies have included both ToM and narrative processing measures in order to examine their developmental progression, a gap that future research would do well to fill.

We next move beyond narrative processing alone to consider whether individual differences in children’s ToM and narrative processing abilities relate to measures of reading comprehension.

Evidence for Links Between ToM, Narrative Processing, and Reading Comprehension

Overall, children seem to gain basic ToM competencies across childhood (ToM 1, 2, and 3). However, the experience of taking a character’s perspective (NP 1, 2, and 3) may vary even for older children and adults. Some individuals may fully immerse themselves in a character’s perspective during narrative processing, whereas others may adopt the character’s perspective to a lesser extent. We hypothesize that these individual differences could be due to variable levels of ToM and may impact reading comprehension.

Narrative Processing Measures and Reading Comprehension

A few recent studies explicitly investigate the link between ToM and reading comprehension in typically developing children. Other studies do not label the constructs they examine as ToM, but their measures can be interpreted as such. These studies measure narrative processing (NP 1, 2, and 3), but may implicate ToM (ToM 1, 2, and 3). In one such study, Barnes et al. (2013) demonstrate how children’s skill in taking a protagonist’s spatial perspective during reading might promote comprehension. Nine- to 16-year-old children memorized a model of a marketplace containing several shops with different objects outside each shop. After learning the layout, children read a story about a protagonist moving around the marketplace. Reading was interrupted periodically, and children were asked to identify whether two objects from the market were from the same shop or from different shops. If children mentally follow the protagonist’s perspective, they should respond more quickly and accurately when the objects are from shops located near the protagonist than when they are from shops located far away. Indeed, the results showed this pattern, suggesting that children engage in perspective taking from the vantage point of the protagonist.

Barnes and her colleagues also measured reading comprehension independently of the marketplace task, using a passage comprehension test in which children read a sentence or paragraph and are asked to fill in a missing word (Woodcock and Johnson 1989). Do children who are better at taking the perspective of the protagonist also have better reading comprehension? Notably, the study’s design allowed the researchers to determine which specific aspects of this task were related to comprehension. Results showed that performance on some types of probes, such as those for elements mentioned explicitly in the text, was not related to children’s performance on the reading comprehension measure. However, children’s accuracy and speed on probes along the character’s path, that is, shops that the protagonist would have had to pass but that were not explicitly mentioned, did predict reading comprehension. Some children seem to have taken the perspective of the protagonist to a greater extent than others, immersing themselves in the narrative and following the protagonist through the marketplace in their minds while reading the text. Furthermore, this tendency was related to children’s reading comprehension ability on a separate test.

Importantly, the reading comprehension measure Barnes et al. (2013) used consisted of short passages that were not entirely or explicitly narrative. Furthermore, because the study was correlational, it leaves open the possibility that other correlates, such as working memory skills, may be driving the relation between the measures. Nyhout and O’Neill (2013) addressed this concern by including a clever control. Seven-year-olds heard a story about a character delivering cookies to four buildings and then tried to arrange a three-dimensional model representing the neighborhood based on the story. Children’s performance (number of buildings placed correctly) was positively correlated with a standardized test of narrative comprehension, in which children listened to stories and answered a series of comprehension questions (Neale 1999). Importantly, another group of children heard an expository description of the neighborhood instead of the story. These children performed worse on the model task than children who heard the story version. Furthermore, among children in the expository condition, performance on the model task was not related to narrative comprehension. This control allows for more confidence that the correlation between placing buildings correctly and comprehension in the narrative condition is not due to individual differences in related factors such as intelligence or memory.

Notably, both of these studies focus on taking a character’s physical or spatial perspective (NP 1). However, it is likely that measures of more advanced mental perspective taking (NP 2 and 3) would relate to comprehension more strongly. Unfortunately, to our knowledge, no published research has linked mental perspective taking during narrative processing (NP 2 and 3) to reading comprehension. It is also possible that measures of spatial perspective taking (NP 1), like those in Barnes et al. (2013) and Nyhout and O’Neill (2013), although theoretically distinct from measures of mental perspective taking (NP 2 and 3), would be at least moderately correlated with those more advanced mentalizing measures. Regardless, future research should investigate whether children’s tendency to follow a character’s mental perspective or to attribute emotions and goals to characters is an even stronger predictor of comprehension.

ToM Measures and Reading Comprehension

Several recent studies have linked standard ToM tasks (ToM 1, 2, and 3) to listening or reading comprehension during childhood. Kim (2016) assessed listening comprehension in 6- to 7-year-old children in South Korea using a series of literal and inferential questions about short narrative stories. Children’s ToM was measured using a standard false belief task and a second-order false belief task (i.e., “He thought that she thought …”), both aspects of ToM 2. ToM strongly predicted listening comprehension, above and beyond other factors such as vocabulary and inferencing skills (see also Kim 2015). More recently, Kim (2017) replicated this effect with second-graders in the USA for both listening and reading comprehension (see also Kim and Phillips 2014). In Kim (2017), listening comprehension was measured with three tasks, which were combined for the analyses: comprehension questions about narrative stories (Gillam and Pearson 2004), comprehension questions about expository passages (Leslie and Caldwell 2011; Woods and Moe 2011), and a task in which children listened to sentences and picked the matching picture (Carrow-Woolfolk 2011). Boerma et al. (2017) found that this effect is not limited to young children. They tested 3rd and 4th graders on reading comprehension, using a standardized test with comprehension questions about primarily narrative texts (Weekers, Groenen, Kleintjes, and Feenstra 2011), and on ToM, using a common vignette-based test for middle childhood (White et al. 2009), a measure of ToM 3. They found that performance on the ToM measure predicted children’s reading comprehension skills above and beyond expressive verbal ability and print exposure. These findings provide initial empirical evidence for the role ToM may play in children’s developing comprehension skills.

Although correlational findings are suggestive, further evidence comes from longitudinal studies. Atkinson et al. (2017) followed children across 2 years, from preschool into early elementary school. They found that children’s ToM at age 4, measured using two classic false belief tasks (ToM 2), predicted their performance at age 6 on a standardized measure of reading comprehension that included comprehension questions about one fiction and one non-fiction passage (Snowling et al. 2011), controlling for other variables such as non-verbal ability, decoding, linguistic comprehension, and executive function. Thus, ToM seems to precede comprehension developmentally rather than simply being related concurrently, suggesting a potential causal relation.

Notably, these studies all included measures of cognitive development such as non-verbal ability, verbal ability, vocabulary, and linguistic comprehension, and ToM predicted reading comprehension when controlling for those variables. Thus, ToM does not appear to be acting simply as an indicator of language or cognitive development. Rather, there appears to be something unique to ToM that predicts reading comprehension outcomes above and beyond other cognitive abilities. Next, we speculate on what might be unique about ToM that relates to reading comprehension, namely the representation of and inferencing about characters’ mental states.

A Framework of the Effect of Theory of Mind on Reading Comprehension

Together, the findings reviewed above provide emerging evidence that ToM may play an important role in children’s reading comprehension. However, these early studies leave open many questions, including why ToM might influence reading comprehension. Here, we lay out a framework hypothesizing a possible explanation. This framework makes novel and testable predictions about which types of comprehension should show a relationship with ToM, and under what circumstances.

Our framework suggests that increased ToM may lead to increased representation and monitoring of characters’ thoughts and emotions during narrative processing. When a reader can represent given information about a character’s emotional state or thoughts, the reader is in a position to engage in increased inferencing about these mental states when they are not explicitly stated. This in turn leads to improved reading comprehension (see Fig. 2). As children develop ToM abilities (see Fig. 1 for more detail), they become able to take another’s perspective and to apply that ability to reading comprehension. However, as described above, this process begins with listening comprehension. Considering how ToM links to reading comprehension may prove fruitful for understanding what children who have difficulty comprehending narratives may be lacking.

Fig. 2

The proposed framework for the effect of ToM on reading comprehension and related predictions

This proposed framework leads to several distinct and testable predictions. A first prediction is that the relationship between ToM and reading comprehension should be apparent for inferences about mental states and emotions but absent (or weaker) for literal statements and causal inferences. Some reading comprehension questions are literal, asking, for example, whether the character went to the grocery store before or after the carwash. Other questions require inferences about causal factors. For example, a question might ask why the back seat of the car was wet after going through the carwash when the text did not explicitly state that the window was down. According to this framework, neither of these question types should be strongly related to ToM. By contrast, some questions require inferences about mental states and emotions. Asking how the character felt after getting home from the carwash and realizing that the back seat was wet requires perspective taking, because the character’s feelings are not stated in the text.

Several scholars have proposed taxonomies of inferences that may be useful in identifying which types of inferences should be related to ToM. For example, Johnson and Johnson’s (1986) feelings-attitudes inferences should rely on ToM, whereas other types, like agent and action inferences, should be less affected. Similarly, in Trabasso’s (1981) framework, we would expect motivation and psychological cause inferences to uniquely implicate ToM. Finally, in Graesser et al.’s (1994) model, ToM should be needed for superordinate goal inferences (i.e., a goal that motivates an agent’s intentional action), character emotional reaction inferences, and state inferences related to knowledge and beliefs. Other models include several categories that may or may not involve mental states related to ToM. For example, in Frederiksen’s (1979) analysis, goal inferences, manner attributions, act inferences, and instrumental inferences could all potentially involve ToM, but only when they include cognitive or emotional content (e.g., an instrumental inference could concern the cause of a cognitive or emotional event or the cause of a physical event). These taxonomies all hint at the utility of ToM for understanding texts, but none explicitly makes that link.

By contrast, Kim (2016, 2017) made this link explicit and reported that ToM predicts comprehension even when controlling for general inferencing skills. The current framework predicts that a measure of mental state and emotion-based inferencing (e.g., Gernsbacher et al. 1992) would fully mediate the relationship between ToM and reading comprehension. That is, a measure of ToM should strongly predict a measure of inferences about mental states and emotions, which should in turn predict reading comprehension. Notably, although ToM is related to making inferences from text about mental states and emotions, the two are not identical constructs. Having ToM does not guarantee that a person will use this knowledge to make inferences from texts. Yet, in our framework, ToM is necessary for making those inferences. It is important to note that the prior studies finding a link between ToM and reading comprehension all used global measures of reading comprehension, whereas we would predict a stronger effect with more fine-grained measures, such as those focusing on mental state inferences. That the relationship has been found repeatedly with global measures may highlight the robustness of the link.

Furthermore, the step before inferencing in the current framework is the representation and monitoring of mental states during narrative processing. Thus, the framework predicts that an online measure of children’s spontaneous representation of characters’ mental states during narrative processing would also fully mediate the relationship between ToM and reading comprehension. That is, ToM should be related to the representation and monitoring of mental states during narrative processing, which leads to inferencing about mental states, and then to reading comprehension.

A second prediction of our proposed framework is that, because ToM concerns the minds of other animate creatures, it should be primarily useful to children in reading narrative texts. Accordingly, the framework predicts that the relation between ToM and reading comprehension should be apparent for narratives with characters and less so for expository texts. Similarly, texts vary in complexity, with some requiring minimal or basic representation of and inferencing about mental states, whereas others involve intricate mental state concepts that would rely more on ToM (Cartwright 2015). Thus, ToM is likely to be more predictive of reading comprehension for narratives with higher levels of text complexity. Furthermore, the line between narrative and expository texts is not always clear, and the varying role of individuals in different types of informational texts may explain some variability in the extent to which ToM predicts comprehension of expository texts. Indeed, some informational texts, like biographies, often employ a narrative structure; others may include substantial references to individuals, as in a news story about a proposed bill making its way through Congress. Such a text might require representation of and inferencing about the thoughts and emotions of both the politicians negotiating for their interests and the individuals whose lives would be affected by the bill’s passage. Notably, this prediction does not imply that entirely different subgroups of children struggle with narrative and expository comprehension. Many skills likely predict comprehension of all kinds of texts (e.g., background knowledge, working memory, and general inferencing skills). However, ToM should be differentially predictive depending on the extent to which a text includes narrative or character-based elements.

In sum, this new framework hypothesizes a possible explanation for the emerging empirical evidence of a relation between ToM and reading comprehension. It makes novel, falsifiable predictions that could move this area of research forward, whether by supporting or challenging the proposed relationship. In future research, experimental designs that intervene on ToM and test for differences in reading comprehension would be useful for establishing a causal link, should one exist. Many types of ToM training programs appear promising for improving these important skills (Hofmann et al. 2016). For example, more explicit conversations about others’ mental states (Lecce et al. 2014) and training in emotion understanding (Ornaghi et al. 2014) have been shown to improve children’s ToM. This framework predicts that an intervention that improves ToM would subsequently lead to increased representation and monitoring of characters’ thoughts and emotions, which would then lead to increased inferencing about these mental states, which in turn would lead to increased reading comprehension. If such a pattern were borne out by the evidence, one could be more confident that increased ToM causes better reading comprehension.

Relatedly, individual differences in ToM are predicted by language (Jenkins and Astington 1996; Lillard and Kavanaugh 2012; Milligan et al. 2007; Watson et al. 2001) and executive function skills (Henning, Spinath, and Aschersleben 2010; Hughes and Ensor 2005; Hughes 1998), both of which also predict reading comprehension (Cain et al. 2004; Cartwright 2002, 2007; Hoover and Gough 1990; Kendeou et al. 2009; Kim 2017). For example, executive function is implicated when readers integrate information across different sentences and sections of a text to construct inferences that go beyond what has been explicitly stated (Kintsch 1988). With respect to ToM, readers might need to use inhibitory control, working memory, and set-shifting skills to update their representation of a character’s changing emotional state. Models that examine ToM, language, and executive function simultaneously will be important for disentangling their interrelations and identifying their underlying causes.

Importantly, this proposed framework is not intended to be a complete theory or model of reading comprehension. Rather, it proposes a specific relation between one possible predictor, ToM, and reading comprehension. Our framework should be considered in light of broader theoretical approaches to reading comprehension. For example, our proposal is compatible with Rosenblatt’s Reader Response Theory in that the reader’s ToM knowledge contributes to deriving meaning and comprehension from text (Rosenblatt 1978). Similarly, we argue that ToM may be needed to understand what Bruner (1987) called the “landscape of consciousness” (i.e., characters’ thoughts and emotions), whereas comprehension of the “landscape of action” (i.e., literal or physical events) relies on other skills and requires minimal ToM. Our proposal is also in line with schema-theoretic views, such as that of Anderson and Pearson (1984), which hold that readers must make slot-filling inferences in the process of comprehension. ToM could be seen as a particular type of prior knowledge that supports the monitoring and representation of characters’ mental states, which are then used to make inferences about characters’ thoughts and emotions.

Our framework is especially well aligned with Kintsch’s Construction-Integration Model (Kintsch 1988). According to Kintsch, the situation model is a reader’s mental representation of a text’s meaning, including inferences, often based on prior knowledge, that are not explicit in the text. Although all levels of semantic processing are important for reading comprehension, the formation of an accurate situation model is the highest level of comprehension and often determines what readers later recall as the important information from a text. Many studies provide evidence that readers do indeed form a situation model of a narrative text during reading (see Zwaan and Radvansky 1998 for a review). For example, readers remember the essence of a situation they read about, rather than the specific wording and sentence structure (Bransford, Barclay, and Franks 1972). But despite the importance of situation model building for reading comprehension in Kintsch’s model, little published research has investigated the antecedents of individual differences in situation model building. In this context, ToM can be seen as potentially promoting the formation of rich situation models during reading, thus improving comprehension. Indeed, representing characters’ mental states and emotions is one important aspect of creating a situation model of a text. Children with more advanced ToM skills may be better able to represent and monitor characters’ mental states and then incorporate these representations into situation models, leading to improved comprehension. The idea that ToM can inform situation model building is also in line with the Reading Systems Framework (Perfetti 1999; Perfetti and Stafura 2013).

Additionally, we see our framework as aligned with embodied theories of language comprehension (e.g., Glenberg 1992; Zwaan 2004). These theories propose that low-level processes like perception and action influence how we perceive and process language, because we simulate the state of the world a text describes. This idea is exemplified by Fecica and O’Neill (2010), who found that children processed a sentence more quickly when a character was said to be driving somewhere rather than walking somewhere, presumably because they simulated the speed of these two actions. This study also highlights how ToM may inform embodied accounts of comprehension. When the researchers varied the character’s psychological state rather than the mode of transportation, children processed the sentence more quickly when the character was excited rather than reluctant to go somewhere, suggesting that they were simulating not only the movement but also the mental state of the character. Our framework proposes that ToM may influence children’s ability to engage in these types of simulations to accurately represent and monitor characters’ mental states, which would influence their ability to make inferences about those mental states, thus influencing reading comprehension.

Does Reading Improve ToM?

We argue here that ToM is implicated in reading comprehension. Yet other literature suggests that the activity of reading itself may promote ToM. For example, Lysaker and colleagues used case studies and content analyses of picture books to argue that reading may provide children with important information about how other people think and feel, thus improving what they call “social imagination,” a construct similar to ToM (Arvelo Alicea and Lysaker 2017; Lysaker and Arvelo Alicea 2017; Lysaker and Miller 2013; Lysaker and Nie 2017; Lysaker et al. 2016). Self-report data from eighth-grade students suggest that children may see reading as promoting social imagination (Ivey and Johnston 2013).

Indeed, correlational research suggests that the reading of fiction or narrative texts may be related to ToM. Mar et al. (2006) found that adults’ lifetime exposure to fiction positively predicted measures of ToM, empathy, and social ability, whereas non-fiction exposure was a negative predictor (see also Mar et al. 2009). The study used the Author Recognition Test (Stanovich and West 1989), which requires participants to check off, from a list of names, those they recognize as names of authors. Guessing is discouraged because participants are told that not all of the names are names of authors. One version lists authors of fiction and another lists authors of non-fiction, and participants’ scores on these versions are interpreted as measures of exposure to fiction and non-fiction, respectively. In this study, exposure to fiction was related to measures such as the Reading the Mind in the Eyes task (Baron-Cohen et al. 2001), in which participants see an image of a person’s eye region and must choose which of four possible mental states the person is experiencing. Exposure to non-fiction was negatively related to this task, suggesting that the relationship is specific to fiction exposure rather than to reading in general. A later study with preschoolers gave parents an adapted Author Recognition Test listing children’s authors and gave children standard false belief ToM tasks. In line with the adult findings, 4- to 6-year-olds’ exposure to storybooks predicted relatively more advanced ToM (Mar et al. 2010). These authors hypothesize that reading fiction may promote ToM because it provides readers with a simulated experience of social interactions (Mar and Oatley 2008). However, it remains an empirical question whether ToM abilities initially underpin narrative comprehension and only then become improved as a result of more reading.

Of course, because these studies measured fiction exposure and ToM concurrently, causality cannot be inferred: people with better ToM may be more drawn to fiction, perhaps partially because it is easier for them to comprehend, consistent with our framework. Other studies have used experimental designs to target a possible causal link. Although some studies have found that reading literary fiction results in an immediate improvement on ToM measures relative to non-literary fiction or non-fiction (Kidd and Castano 2013; Pino and Mazza 2016), more recent large-scale attempts to replicate these findings have found null results (Panero et al. 2016). Regardless, these studies examine the immediate effects of reading fiction; even if such immediate effects prove unreliable, the long-term effects of reading fiction over months or years could still be consequential for later ToM.

Potential evidence for a causal relation could come from longer interventions involving reading with children and investigating subsequent effects on ToM. In one such study, Lysaker et al. (2011) used a “relationally oriented reading instruction” intervention with second and third graders and found that ToM improved over an 8-week period. However, it is not clear whether reading per se, rather than other elements of the intervention, was responsible for the ToM boost. The training included an explicit focus on discussing emotional states, arguably the driving factor behind the improvements in ToM. Furthermore, because no control group was included, the gains in ToM might reflect improvements children would have made in the absence of the intervention.

Two recent data points may shed light on this chicken-and-egg problem. First, Boerma et al. (2017) found that the relationship between third and fourth graders’ reading comprehension and ToM held when controlling for a measure of children’s exposure to narrative books. Second, in a longitudinal design, Lecce et al. (2017) found a reciprocal relationship between ToM and reading comprehension, with ToM at age 9 predicting variance in reading comprehension at age 10, controlling for maternal education, verbal ability, and earlier reading comprehension. Together, these findings suggest, consistent with our framework, that the relationship between reading comprehension and ToM is not solely due to a causal relationship whereby children who spend more time reading or who have better reading comprehension improve more in ToM.

Regardless, the potential directionality of the relationship between ToM and reading comprehension should be examined in future experimental designs. According to our framework, training that improves ToM should also lead to improvements in reading comprehension of narrative texts. Similarly, if reading improves ToM, one would expect that an intervention to increase narrative reading should also improve ToM. Importantly, the relationship between developing ToM and reading may be mutually reinforcing rather than flowing in only one direction. This possibility is not in conflict with our framework; rather, we see it as a potential extension, and future research should examine this prospect. For example, cross-lagged designs could measure both reading and ToM at multiple time points and examine the pattern of predictive relations across development. Notably, the directionality of these relations may change over time, such that, for example, ToM in childhood may initially lead to better reading comprehension and then, later in life, larger amounts of reading could promote further ToM development. Again, these possibilities are not mutually exclusive, and both mechanisms may operate throughout development as well as concurrently.

Conclusion

This review suggests that ToM is related to both listening and reading comprehension during childhood. We have reviewed relevant literature and proposed a framework that offers a possible explanation for why such a link might exist. Notably, we have put forth the idea that ToM may improve reading comprehension as a plausible hypothesis, supported by the existing evidence but, to our knowledge, not yet rigorously tested. Rather than arguing that this hypothesis is necessarily true, we propose it here in order to call for more and better research into its validity. We end with three main points following from this discussion.

First, our framework provides a possible causal explanation for the link between ToM and reading comprehension: that increased ToM leads to increased representation of and inferencing about characters’ thoughts, motives, goals, and emotions, which then lead to increased reading comprehension. This framework offers testable predictions about the types of texts (narratives) and the types of comprehension (mental states and emotions) that should be most strongly linked to ToM.

Second, we argue that if future research supports the conjectures laid out here, it would suggest that improving ToM could positively impact students who struggle with narrative reading comprehension—especially in the case of students whose decoding skills are adequate. Putting yourself in someone else’s shoes may be essential to comprehending squiggles on a page that refer back to human behavior. However, current models of reading comprehension appear not to consider children’s ToM as a root cause for part of children’s reading comprehension problems (e.g., Foorman et al. 2017). Clearly, there are many factors that could contribute to the unexplained variability in reading comprehension outcomes and we do not propose that ToM is the only, or even the most important, one. However, the evidence described here suggests that its contribution may be significant and should be investigated further.

Finally, the research evidence and the proposed framework have important educational implications. If further evidence supports the framework, it would suggest that ToM development should be fostered throughout early and middle childhood, especially for children from low-SES backgrounds, who are more likely to fall behind in both reading comprehension and ToM (Dunn and Cutting 1999; Pears and Moses 2003; Shatz et al. 2003). In today’s educational climate, in which teachers are increasingly pressured to spend time only on competencies that will be assessed in standardized testing, interventions to improve ToM may be considered unnecessary. But to the extent that ToM is crucial for reading comprehension, early instruction that supports its development alongside academic content may prove especially effective, as well as having collateral benefits for children’s social behavior (Banerjee et al. 2011; Caputi et al. 2011; Devine et al. 2016).