All over the world, there are researchers and educators who are passionately dedicated to understanding reading comprehension and improving its teaching. Each generation adds to our knowledge as new discoveries are made and as earlier theories and practices are replaced or revised. There has perhaps never been a time when a richer, more diverse array of theoretical perspectives, instructional methods, and individual or group strategies has been available for research and application to this enduring problem.

Yet progress remains elusive. Reading comprehension is primarily what reading is for, and it is what public accountability tests attempt to measure. But in the USA, scores on the National Assessment of Educational Progress (NAEP) long-term tests indicate no statistically significant change for 17-year-olds since 1971, and statistically significant but small increases for 9- and 13-year-olds (NCES 2013). Results of the NAEP main assessment for 2015 likewise show statistically significant but small increases compared to 1992 for grades 4 and 8 (NCES 2015a) but a statistically significant, small decrease for grade 12 (NCES 2015b). Comprehension scores from the Progress in International Reading Literacy Study (PIRLS) for grade 4 indicate statistically significant increases between 2001 and 2011 for nine countries or educational systems (e.g., states or provinces), statistically significant decreases for four, and no statistically significant changes for the remaining six (Thompson et al. 2012). While local successes may be lauded, they are often impermanent; on a larger scale, appreciable and lasting improvement has been hard to come by.

Although the NAEP long-term tests were specifically designed for longitudinal comparisons, such comparisons need to be viewed cautiously due to changing student demographics, educational policies, and so on. But whatever the causes of stagnant progress on large-scale tests, constructive new directions for improvement seem likewise hard to come by. There is confoundingly little evidence that widely researched and applied comprehension programs and strategies of nearly the last 50 years have had substantial, lasting effects, including programs of phonics and fluency designed to facilitate comprehension (meta-analyzed in Suggate 2016). Our diverse array of approaches seems to have left us adrift, more bewildered than enlightened.

In perspective, however, this situation is not inconsistent with progress in science. The search for solutions in all scientific fields has been characterized by generations of trial and error, blind alleys and insightful advances, and frustrating formulation and reformulation. The history of science teaches us that growth in understanding comes only through openness to redirection and ardent perseverance.

A promising source of progress may be found in the changing world of cognitive theory. For more than 20 years, cognitive theory has been profoundly redirected by the embodied cognition movement, the consequences of which could be telling for reading researchers and educators. In the rest of this commentary, I will (1) introduce and trace embodied cognition theory and compare it to controversial aspects of other cognitive theories, (2) relate embodied cognition to the field of reading comprehension and its teaching, and (3) summarize some key studies as examples of its potential.

Brief Introduction to Embodied Cognition

The current popularity of embodied cognition began in the late 1990s. Embodied cognition was aptly summarized by Thelen et al. (2001, p. 1):

To say that cognition is embodied means that it arises from bodily interactions with the world. From this point of view, cognition depends on the kinds of experiences that come from having a body with particular perceptual and motor capacities that are inseparably linked and that together form the matrix within which memory, emotion, language, and all other aspects of life are meshed.

In embodied cognition, all mental representations occur in a specific, sensory modality (i.e., modality-specific rather than amodal). Continuity exists between perception and memory such that what we experience visually is mentally encoded as visual, what we experience auditorily is encoded as auditory, and so on (cf. Spivey 2007). Motor activity is also emphasized in embodied cognition, so the psychomotor actions of our bodies are part of our encoded cognitive experience, whether conscious or unconscious. In an early review of theories of embodied cognition, Wilson (2002, pp. 632–633) explained this point:

Consider the example of counting on one’s fingers. In its fullest form, this can be a set of crisp and large movements, unambiguously setting forth the different fingers as counters. But it can also be done more subtly, differentiating the positions of the fingers only enough to allow the fingers to keep track. To the observer, this may look like mere twitching. Imagine, then, that we push the activity inward still further, allowing only the priming of motor programs but no overt movement. If this kind of mental activity can be employed successfully to assist a task such as counting, a new vista of cognitive strategies opens up. Many centralized, allegedly abstract cognitive activities may in fact make use of sensorimotor functions in exactly this kind of covert way.

Theories of embodied cognition can be directly contrasted to contemporary theories in which perception in any sensory modality is converted to abstract, amodal, and arbitrary mental symbols analogous to computer codes. All mental operations are then carried out in this abstract code. In embodied cognition, there are no abstract mental codes, structures, or processes that are divorced from sensory experience; there is no deeper, amodal, common code that is the basis of mental activity. In fact, the contemporary embodied cognition movement arose largely as a rejection of theories based on abstract, amodal structures that could not account for growing behavioral and neuropsychological evidence. Such abstract structures include all forms of abstract propositions, schemata, nodes, and modules. As Glenberg (2015, p. 165) succinctly asserted, “We are embodied; nothing more.”

The idea that cognition is embodied and concrete rather than disembodied and abstract has a history from antiquity. Summaries are provided in Barsalou (1999, 2010), Glenberg (2015), Lakoff (2012), and, as directly applied to literacy and its teaching, Sadoski and Paivio (2001, 2013). Its current popularity in psychology can be dated from several key works published around the same time, including Glenberg (1997), Barsalou (1999), and Lakoff and Johnson (1999). There are similarities and differences in the specifics of each of these evolving approaches to embodiment, but we shall mainly concentrate on their commonalities especially as they apply to language comprehension.

One commonality in embodied theories is the rejection of any form of abstract, amodal mental representation, as noted earlier. Examples include the proposition and the schema. As we shall see, whatever their history in philosophy and psychology, the contemporary form of these constructs is heavily rooted in artificial intelligence (AI).

The Problem with Propositions

The mental proposition, like the related concept of the schema, is one of the most abstract and least observable scientific ideas ever construed. Originally, propositions were actual statements used by philosophers that could be verified as true or false according to the laws of reasoning (e.g., all humans are mortal; Socrates is a human; therefore, Socrates is mortal). These statements came to be abstracted into the symbolic calculus of mathematical set theory (e.g., all As are Bs; C is an A; therefore…), and later computerized for AI (e.g., 001 = 010:100 = 001::…). Note the increasing degree of abstraction from concrete statements that could be verified by real-world observation to purely abstract reasoning using arbitrary symbols.

Propositions in contemporary cognitive theories are not sentences with real words or any other form of symbol system that is available to our senses. A proposition is theorized to be the abstract “deep structure” meaning of a sentence that could possibly be expressed using various “surface structure” syntactic arrangements of symbols like words (e.g., creatures of the sea and marine fauna would emerge from a common abstract proposition). All sensory perception from any source is theoretically converted to this common propositional format. The proposition itself, however, exists only in some abstract form that is beyond all sensory perception. Like their historic predecessors, contemporary propositions can supposedly be verified as true or false, yes or no, on or off (i.e., digitally coded as 1 or 0). Versions used in AI are ultimately mathematical algorithms in computer code.

Some non-computer propositional schemes have been proposed for text analysis (e.g., Kintsch 1974; Turner and Green 1977). Propositions were proposed as a form of idea units, with each proposition representing an idea in a kind of case grammar shorthand. For example, the sentence A shark is a fish would be coded: ISA, SHARK, FISH. Propositions were identified and arranged into a propositional text base of micropropositions and macropropositions (i.e., hierarchical structures of micropropositions) according to elaborate rules.

This general notation scheme was developed by Kintsch and his colleagues, stressing that the basis of the individual proposition was a predicate-argument schema (van Dijk and Kintsch 1983). The predicate-argument schema was a mental frame where a verb (e.g., ISA) stood in relation to its subjects (e.g., SHARK), predicate nouns (e.g., FISH), and so on, or alternatively coded as FISH [SHARK] according to preference. More complex sentences involved more complex propositional structures reflecting compounding, subordination, and so on. Macrostructures reflected the global organization of the text and could be schema-based as well, including schemata for narratives and other rhetorical structures.
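To make the predicate-argument idea concrete, a proposition of this kind can be sketched as a small data record. The Python sketch below is a toy illustration only: the `Proposition` class and its `notation` method are hypothetical names introduced here, and the sketch does not reproduce the coding rules of Kintsch (1974) or Turner and Green (1977).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Proposition:
    """Toy predicate-argument record (hypothetical; not an actual coding system)."""
    predicate: str
    arguments: Tuple[str, ...]

    def notation(self) -> str:
        # Render in the comma-separated shorthand used in the text.
        return ", ".join((self.predicate,) + self.arguments)


# "A shark is a fish" coded as a single proposition.
shark_is_fish = Proposition("ISA", ("SHARK", "FISH"))
print(shark_is_fish.notation())  # ISA, SHARK, FISH
```

Note that such a record holds only arbitrary labels related to other arbitrary labels; nothing in it refers to perceptual experience of sharks or fish.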

The situation model, a mental model of the text which emerged from the propositional textbase along with propositions from long-term memory, was later added. An important advancement was that the situation model could be experienced in a nonverbal form such as mental imagery which was somehow generated by the propositional textbase (Kintsch 1988). How that occurs remains one of the more unresolved aspects of propositional coding schemes (Sadoski 1999).

Whatever the potential of these systems, hand-coded propositional analyses have not advanced and have rarely appeared in published studies of the comprehension of extended texts due to their complexity and practical limitations (Kintsch 2013). More recent advances now treat propositions as vectors of numbers, with each number indicating the degree to which the proposition is mathematically related to other propositions in multidimensional statistical space (e.g., Burgess and Lund 1997; Landauer and Dumais 1997). That is, this approach to text comprehension is distilling into disembodied AI systems involving complex mathematical algorithms (e.g., Hyperspace Analog to Language (HAL); Latent Semantic Analysis (LSA)).
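The sense in which one vector is "mathematically related" to another can be illustrated with cosine similarity, a measure commonly used with vector models such as LSA. The following Python sketch rests on stated assumptions: the three-dimensional vectors are made up for illustration, and real systems derive vectors of far higher dimensionality from large text corpora.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors (assumption): values are illustrative, not corpus-derived.
shark = [0.9, 0.1, 0.3]
fish = [0.8, 0.2, 0.4]
lamp = [0.1, 0.9, 0.2]

# With these toy values, "shark" is closer to "fish" than to "lamp".
print(cosine_similarity(shark, fish) > cosine_similarity(shark, lamp))  # True
```

In such a model, relatedness is purely a matter of numerical proximity among vectors; no sensory or motor information enters the computation.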

The Trouble with Schemata

Like the proposition, the schema remains a popular but misunderstood concept. Reviews of the schema literature have consistently questioned its theoretical definition, its empirical support, and the reification of the term among researchers and practitioners (e.g., Alba and Hasher 1983; Sadoski et al. 1991). These questions remain unanswered.

Like the proposition, the schema is not theorized to be knowledge that is concrete and perceptible. Rather, it is theorized as an abstract, amodal framework for perceptible knowledge that organizes that knowledge in and for use (i.e., instantiation). Consequently, most references to schema theory are heavily metaphorical (e.g., frames with slots to be filled).

Perhaps the most apt metaphor is one directly derived from its contemporary AI roots: Schemata are the software of the mind. Metaphorically, a schema is like a word processor. When you open your word processor, a framework of defaults appears including margins, font, spacing, and so on, before you type a word. These defaults apply automatically unless overridden. Likewise, a schema is theorized to have variables that are filled by default from memory unless they are overridden by text signals or reader strategies. This is the general basis of, as well as a serious limitation of, computational theories. Schemata are not simply like computer programs; in these theories, they are computer programs.
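The default-and-override logic of the software metaphor can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any theorist's actual model: the "restaurant" slots, their values, and the `instantiate` function are hypothetical and introduced here only for the example.

```python
from typing import Dict, Optional

# Hypothetical default slot values for a toy "restaurant" schema.
RESTAURANT_DEFAULTS: Dict[str, str] = {
    "seating": "table",
    "ordering": "from a menu",
    "payment": "after the meal",
}


def instantiate(
    defaults: Dict[str, str],
    text_signals: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Fill every slot with its default, then let text signals override."""
    filled = dict(defaults)
    filled.update(text_signals or {})
    return filled


# A text that mentions paying at the counter overrides only that slot.
result = instantiate(RESTAURANT_DEFAULTS, {"payment": "at the counter first"})
print(result["seating"], "|", result["payment"])  # table | at the counter first
```

The unmentioned slots keep their default values, just as a word processor keeps its default margins until the user changes them.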

Problems with schema theory permeate still deeper: There is no clarity or agreement about what form schemata take. Most accounts maintain that schemata are composed of propositions (e.g., Anderson and Pearson 1984) whereas others maintain that schemata are composed of subschemata down to the level of the atomic schema, the form of which is not specified (e.g., Rumelhart and Ortony 1977). In one of the most elaborate published examples, Rumelhart and Ortony illustrated a schema for a face where the schema was a network of nodes in mathematical locations in a Cartesian coordinate space. The nodes reflected subschemata with labels such as “eye,” and “mouth,” which were also mathematical locations in their own Cartesian coordinate subspaces. Each subschema comprised still smaller, similarly computational subschemata (e.g., “iris,” “pupil”) which could be reduced to an atomic schema which could not be further reduced. This definition seemed to imply that some schemata might have a spatial component, but no perceptual representations were included.

One candidate for the form of this atomic element is the proposition, which, as noted earlier, is an instantiation of the basic predicate-argument schema in some theories (Kintsch 1998). This perplexing chicken-egg circularity remains unresolved (Frank et al. 2008). Other computational theories have postulated multiple atomic units including abstract propositions, spatial images, and temporal strings (Anderson 1983). The concept of the mental model was popularized by Johnson-Laird (1983) as an alternative to schema and other formal systems. The mental model included both propositions and mental images, with mental images derived from propositional descriptions (cf. the situation model of Kintsch 1988). The computational architecture of mental models continues to be developed along these lines (Johnson-Laird 2013).

Perhaps the most fundamental problem for propositions and schemata is whether they can, in principle, ever account for comprehension—how can abstract, amodal, arbitrary symbols become meaningful through computational connections to other abstract, amodal, arbitrary symbols without something escaping the web and being grounded in the world of nonverbal reality (Sadoski and Paivio 2013; Searle 1980)? In practice, operational definitions are hard to come by except in computer programs or unwieldy hand coding systems, both of which have distinct inherent limitations including the modeling of mental images, motivation, emotion, and even meaning itself. We shall soon return to these issues as they apply to reading comprehension and its teaching, but we now return to the alternative offered by embodied cognition.

Embodied Cognition as a Fundamental Theoretical Alternative

As introduced earlier, contemporary embodied cognition emerged in the 1990s largely as a response to the assumptions of current AI-based cognitive theories. There are no abstract, amodal propositions in embodied theories. There are no schemata in embodied theories where the term is used in the abstract, amodal sense. All is physically grounded in our bodies and their interaction with the world including memory, language, and emotion. This is not merely a philosophical difference, but a difference based on behavioral results and a growing body of neuropsychological research that has practical importance.

As with all cognitive theories, embodied cognition maintains that knowledge is organized in and for use. The critical difference is that knowledge organization is based in the activity of the brain’s modal systems. The most commonly described form of embodied representation is the simulation. Simulation is the mental reenactment of sensory and motor (i.e., sensorimotor) experiences we have had in dealing with the world. For example, eating an ice cream cone would normally involve all our senses including seeing, hearing, taste, smell, and the haptic activities of holding the cone, licking the ice cream, chewing the cone, and so on. A full mental simulation of eating an ice cream cone would be a mental reenactment involving all of those sensory representations to some degree.

Mental imagery is the best known and researched form of simulation (e.g., Kosslyn et al. 2006; Paivio 1971, 1986, 2007; Sadoski and Paivio 2001, 2013). Researchers have traditionally studied visual imagery most prominently (e.g., Kosslyn 1980; Shepard and Cooper 1982) but these theoretical programs now include other modalities as well (e.g., Kosslyn et al. 2006). Moulton and Kosslyn (2009) propose that imagery is a basic form of simulation, and there is no difference between complex mental imagery and simulations in the form of the episodic scenarios we experience when we comprehend text—at that level, imagery and simulation are “joined at the hip” (p. 1278).

A central aspect of embodiment is motor activity. When cognition occurs, the body is often active in the covert way described earlier by Wilson (2002). For example, reading words that are associated with different parts of the body (e.g., lick, kick, pick) activates corresponding neurological motor systems (Pulvermüller 2005), and even the simple act of leaning forward has been shown to activate brain areas associated with positive motivation toward a goal (Harmon-Jones et al. 2011). The research teams of Glenberg and his colleagues and Zwaan and his colleagues have shown that performance in motor actions is related to first reading sentences in which the direction of the motor action is either consistent or inconsistent with the act. For example, moving the hand to or fro or clockwise or counterclockwise was faster or slower depending on first reading Close the drawer vs. Open the drawer (Glenberg and Kaschak 2002), or Dennis turned on the lamp vs. Eric turned down the volume (Zwaan and Taylor 2006). For possible limitations of this effect, see Papesh (2015).

Decades of neuropsychological studies have consistently shown that brain areas associated with nonverbal imagery systems are associated with processing concrete vs. abstract language (reviewed in Paivio 2007; Sadoski and Paivio 2013; meta-analyzed in Wang et al. 2010). Most studies have used words, phrases, and sentences for materials, but some have used extended texts. In a series of early studies, we found that students of various ages most frequently reported experiencing a mental image of the climax of literary short stories (Sadoski 1983, 1985; Sadoski et al. 1990). In a strong neurological confirmation of those results, Xu et al. (2005) used fMRI to observe brain activity while participants read Aesop’s fables. They found that brain activation in the early stages of reading was mostly, but not completely, in brain areas associated with language processing. However, the climactic outcomes of the stories showed additional brain activation far beyond areas that could be associated with language processing—areas associated with mental imagery, multimodal integration, and emotional responses.

In summary, embodied theories all share the assumption that cognition is basically concrete and modality-specific rather than abstract and amodal and also that comprehension and meaning are dependent on situational, imaginable contexts. This approach to meaning and comprehension is fundamentally different from AI-based theories.

Embodied Meaning as Comprehension

Perhaps the central deficiency of AI-based theories is the lack of a convincing account of comprehension and meaning. Publications on embodied cognition often cite Searle’s (1980) Chinese Room Problem. In this thought experiment, Searle imagined himself sitting in a room and having a string of Chinese characters slipped under the door (i.e., input). He cannot understand a word of Chinese, but by manually following a program for manipulating symbols and numerals (i.e., processing), he is able to produce a string of Chinese characters that fools those outside into believing he is literate in Chinese (i.e., output). Thus, Searle reasoned, the manipulation of abstract symbols by following rules does not produce language comprehension because he still had no understanding of Chinese. A strong rebuttal from AI advocates is that computer programs do understand—cognition is cognition whether it is carbon-based or silicon-based. A softer rebuttal is that AI can simulate certain aspects of human cognition and may be a useful tool in studying those aspects (for a full treatment of the Chinese Room Problem with rebuttals, see Cole 2015; see also Glenberg 2015; Harnad 1990; Sadoski 1999).

However, empirical testing shows that computer programs designed to simulate human comprehension have serious limitations. For example, Glenberg and Robertson (2000) tested multidimensional computer vectors from LSA against human participants’ ratings in determining the sensibility of sentences that varied in sensibility due to physical affordances (e.g., using a newspaper to protect one’s face from the wind vs. using a matchbook). Human participants found the distinctions trivially easy whereas LSA made no significant distinctions. The point is that disembodied computer vectors of word relationships could not comprehend situations that would be quite simple even for young children.

The point from the embodied perspective is that meaning is ultimately grounded in multisensory experience that includes vicarious, imaginative experience. Any nongenetic basis for understanding the world comes from actual experiences with the world. Embodied language comprehension is the simulated completion of acts of which the words are the first incipient motions. What is more, we can imaginatively engage in, and even be inspired by, reading works of fantasy, fiction, and nonfiction the content of which we may have perceived only in pictures, multimedia, or not at all. Humans engage imaginatively with texts in ways that computers were not designed to do. Whatever the value of computer simulations for robotics and other technological applications, this is a critical point especially for practitioners who deal with children and young adult learners daily.

Some Specific Embodied Theories

Extensive theories of embodied cognition have been developed, and several influential ones will be summarized here. One is that proposed by Lakoff and Johnson (1999) who started with three major findings of cognitive science: (1) the mind is inherently embodied, (2) thought is mostly unconscious, and (3) abstract concepts are largely metaphorical (p. 3). Such metaphors include the following: purposes are desired objects (e.g., seize an opportunity), categories are containers (e.g., sharks fall in the fish category), and time is motion (e.g., time marches on), among others. Research has determined that brain areas consistent with predicted metaphors are typically activated when a concept is evoked. As noted earlier, simply leaning forward activates brain areas associated with positive motivation, consistent with the desired object metaphor (Harmon-Jones et al. 2011). Even syllogistic reasoning, often thought to be the strongest case for abstract propositions, can be explained with the container metaphor. For example, if A is in B, and B is in C, then A is in C. Lakoff and Johnson proposed that embodied spatial and motor neuronal structures called image schemas organize thought unconsciously around such metaphors. Lakoff (2012) has now proposed a neural theory of thought and language based on this proposal.
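The containment inference mentioned above (if A is in B, and B is in C, then A is in C) follows a simple transitive pattern, which the Python sketch below makes explicit. It illustrates only the logical form of the inference, not the embodied or neural processing that Lakoff and Johnson propose; the `inside_of` mapping and the `is_in` function are hypothetical names used for this example.

```python
from typing import Dict, Optional

# Hypothetical containment facts: each item maps to its immediate container.
inside_of: Dict[str, Optional[str]] = {"A": "B", "B": "C", "C": None}


def is_in(item: str, container: str) -> bool:
    """Follow the chain of immediate containers outward from `item`."""
    current = inside_of.get(item)
    while current is not None:
        if current == container:
            return True
        current = inside_of.get(current)
    return False


print(is_in("A", "C"))  # True
print(is_in("C", "A"))  # False
```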

Barsalou (1999) theorized that knowledge is grounded in perceptual symbol systems derived from sensorimotor experience. Perceptual symbols are bottom-up sensorimotor input based on our selective attention to multimodal experiences in the physical world. Perceptual symbols form the elementary basis of knowledge and combine into simulations that are the embodied basis of concepts. Perceptual symbols can vary in complexity so that, for example, our selective attention can distinguish a cup with a handle from a cup without a handle in the visual and haptic modalities and thereby develop different simulators for different kinds of cups including coffee mugs, styrofoam cups, trophy cups, and so on, depending on situated contexts. In one sense, Barsalou’s theory can be seen as an effort to replace abstract constructs like the proposition and the schema with embodied constructs. Barsalou refers to this theory as grounded cognition rather than embodied cognition because grounding refers to the physical and social environment as well as the body proper (Matheson and Barsalou in press).

The general theory has been extended to address the distinction between verbal and nonverbal representations and their interactions. Barsalou et al. (2008) proposed the Language and Situated Simulation (LASS) theory, the acknowledged basis of which is Dual Coding Theory (DCT). Both theories maintain that both verbal and nonverbal simulations are embodied, and the connections between them account for the understanding of both concrete and abstract concepts (cf. Louwerse 2010).

Glenberg (1997) argued that traditional accounts of memory focused too much on the function of memory for passive storage and too little on the function of memory in guiding action. He proposed that memory evolved to guide and adapt action in the world and that embodied simulations in memory reflect bodily actions and their ability to mesh with the physical world. Hence, all our cognitive processes are based on sensorimotor and emotional processes which are grounded in our anatomy and physiology. The perception of relevant objects triggers embodied affordances for action. For example, a small child might perceive that squeezing under a chair affords a hiding place in hide-and-seek. An adult would not because adult bodies do not squeeze under chairs too well. This does not mean that adults do not understand a chair as a hiding place for a child, only that this understanding is based on simulations from memory, either real or imagined.

Glenberg and colleagues (e.g., Glenberg and Kaschak 2002) also have addressed the theoretical issue of language comprehension, as noted earlier. A central proposal in this theory is the indexical hypothesis which involves three processes (Glenberg and Gallese 2012). First, words are indexed to perceptual symbols as suggested by Barsalou (1999). Second, affordances are derived from the perceptual symbols, as with the chair example earlier. Third, the grammatical construction of sentences (i.e., who does or is what) is used to mesh the affordances into a coherent and dynamic simulation. For example, the sentence hang your coat on the chair and pour yourself a cup is more sensibly coherent than hang your coat on the cup and pour yourself a chair simply because of the affordances of chairs and cups. Abstract language is largely based on metaphor in this theory, similar to Lakoff and Johnson (1999, 2003).

This brief tour of embodied theoretical alternatives concludes with what is perhaps the oldest and most established embodied theory, DCT (Paivio 1971, 1986, 2007; Sadoski 2015; Sadoski and Paivio 2001, 2013). From its inception, a basic principle of DCT has been that all mental representations are modality-specific, sensorimotor, and symbolic. All sensory modalities have been included in the theory including the visual, auditory, gustatory, olfactory, and haptic (e.g., tactile, kinesthetic) modalities as well as felt emotions. Another basic principle of DCT from its inception has been the distinction between the verbal code and the nonverbal code, a distinction that is orthogonal to sense modality. The theory has been updated and expanded to include accounts of the evolution of the human mind (Paivio 2007), the mental lexicon (Paivio 2010), intelligence (Paivio 2014), and motor learning in various fields including medicine (e.g., Goolsby and Sadoski 2013; Sadoski and Sanders 2008). As noted earlier, DCT is acknowledged as a basis of most embodied theories, and all other embodied theories bear a strong kinship. Counter to some incomplete interpretations of DCT (e.g., Hald et al. 2016), embodied motor processes are inherent in DCT and sensorimotor representations such as haptics are potentially useful in language and literacy learning (Minogue and Jones 2006; Paivio 1971, 1986, 2007; Pouw et al. 2014; Sadoski and Paivio 2013).

The area in which DCT has been most elaborated is literacy. DCT was proposed as an embodied alternative to schema-based theories including Construction-Integration Theory (CI; Kintsch 1988, 1998) as an account of reading comprehension (Sadoski et al. 1991), a position that has only strengthened over time. DCT is a candidate for a unified theory of literacy (Sadoski and Paivio 2007, 2013) because it has been systematically extended to all aspects of literacy including decoding, comprehension, and response in reading as well as written composition and spelling, all under a common set of theoretical principles. DCT also has been the basis of many well-researched educational applications including multimedia learning (e.g., Jankowski and Decker 2013; Leopold and Mayer 2015; Mayer 2009).

Comparing Embodied Theories

Although this review focuses mainly on what theories of embodied cognition share, these theories are not identical. For example, Paivio (2007, chapter 5) compared and contrasted DCT with other embodied theories as well as disembodied theories, noting that none of them as yet adequately dealt with the structural and functional differences of the verbal-nonverbal distinction basic to DCT. These differences are important in predicting and explaining behavioral and neuropsychological results such as concreteness-abstractness effects in reading comprehension. We will return to this issue in a later section.

Differences between theories can be broadly classified as differences in theorized units and processes. For example, differences can be seen in the degree to which basic units are holistic or componential and in the processing necessary for the basic units to become activated and functioning. An example of a basic unit that was introduced earlier is the image schema for the container metaphor proposed by Lakoff. Theoretically, all instances of physical containment involve insides, outsides, and boundaries—the parts imply the whole and vice-versa in gestalt fashion. These spatial relations are derived from everyday bodily experiences such as putting things into containers and taking them out again. Our brains pattern these relations in various modalities including visual, motor, and so on. Language reflects and evokes these structures (e.g., prepositions like in, out) and can employ them metaphorically (e.g., having an idea in mind) even in domains as abstract as syllogistic reasoning (inferentially, if A is in B, and B is in C, then A is in C). Language comprehension heavily involves the evocation of basic image schemas which may be embedded (e.g., an idea is an object and the mind is its container). The description of an image schema based on spatial relations might seem suspiciously similar to the “face” schema presented earlier (Rumelhart and Ortony 1977), but evidence supports the hypothesis that image schemas are not learned by a process of abstraction over many instances but rather imposed by brain structure (Dodge and Lakoff 2005; Lakoff 2012).

The nature of units and processes in Barsalou’s grounded theory is somewhat different. Perceptual symbols are more componential than holistic, resulting from selective attention to aspects of embodied experiences. Perceptual symbols are then framed into simulators that produce coherent simulations of objects and events in relevant modalities. In the cup example used earlier, a simulator for a coffee cup with a handle might have two perceptual symbols, the cup and the handle, while a styrofoam cup without a handle might be one perceptual symbol. The complex process of arranging perceptual symbols into simulators and simulators into simulations is affected by situational information in the modal systems both from perception and memory. For example, different types of cups and cup behaviors are typically found at fine dining tables and at picnics. Language comprehension is the construction of a perceptual simulation of the meaning of an utterance or text (Barsalou 1999; Matheson and Barsalou in press).

Glenberg’s theory focuses prominently on the internalized motor activity associated with world objects and situations as effected by their affordances in various contexts. It therefore tends to be relatively more holistic in its view of basic units. In the hide-and-seek example used earlier, a child is more likely than an adult to perceive a chair as a hiding place. Chairs that are easier to squeeze under (e.g., chairs with long legs) are holistically different from chairs that are less easy to squeeze under (e.g., floor level recliners). This does not mean that chair legs could not be perceived as a separable component of chairs at some level, only that different holistic chairs would afford different behaviors when meshed with bodily abilities. Properties from the environment (e.g., the accessible space under a chair) are processed in terms of how the body can interact with that environment to reach a goal (e.g., hiding). That is, units and processes tend to apply at a more global level in this theory as situational constraints emerge and change. Language comprehension is seen as the successive transformation of patterns of possible embodied action (Glenberg 1997, 2015).

In DCT, units also tend to be more holistic but what counts as a “whole” is somewhat flexible. In the visual modality of the verbal code, units such as written letters and words are seen as wholes, but familiar written phrases can be holistic as well (e.g., rock ‘n’ roll). Likewise, in the auditory and haptic (i.e., articulatory) modalities of the verbal code, speech units can be phonemes, syllables, or word and phrase pronunciations. The haptic modality of the verbal code is used in Braille and fingerspelling. The nonverbal code tends to be even more flexibly holistic such that in the visual modality the whole could be a body with a face, a face, or an eye depending on mental perspective. Wholes are similarly fluid in the other modalities such as the sound of a noisy crowd or a close individual voice in that crowd. Cups, chairs, faces, and crowds are not encountered or represented in the abstract; they tend to be experienced and mentally represented in contextualized situations (e.g., cups at fine dining or picnics, different kinds of chairs, people at various distances). Language comprehension consists of associations within and between the units in the verbal and nonverbal codes in various modalities in a contextually constrained manner (Paivio 1971, 1986, 2007; Sadoski 2015; Sadoski and Paivio 2001, 2013).

We turn next to a brief discussion of applications of these theories to reading comprehension. However, we should note that (1) embodied theories continue to emerge and develop at a rapid rate, and (2) considerable controversy exists in the field regarding the desirability and place of disembodied representations vs. embodied representations as well as the nature of embodied cognition itself (e.g., Anderson 2003; Barsalou 2010; Dove 2010; Glenberg 2015; Goldinger et al. 2016; Kiefer and Pulvermüller 2012; Moulton and Kosslyn 2009; Paivio and Sadoski 2011; Psychonomic Bulletin and Review 2016; Wilson and Golonka 2013; Zwaan 2014).

Applications to Reading Comprehension

Since reading comprehension involves written discourse, embodied theories must address the issue of how written discourse evokes embodied simulations and ultimately how this might apply to education. This is an extraordinarily complex issue, and we will touch on only two key aspects of embodied reading comprehension here: context effects and concreteness-abstractness effects.

Context Effects

Context effects in written language are pervasive. Research on embodiment and language has focused almost exclusively on words and brief sentences (Zwaan 2014), but more extended texts involve context effects at many levels (e.g., text cohesion, discourse structures, coherence). Context effects at discourse levels seem inherent in Lakoff’s embodied theory, which evolved to explain the cognitive basis of metaphors that suffuse both everyday and specialized discourse (Lakoff and Johnson 2003). Familiar applications include employing metaphors to concretize abstractions that frame whole topics of discourse (e.g., the social ladder, the tree of evolution, the space-time rubber sheet). Reading comprehension educational programs based on this theory do not appear to have been extensively developed as yet, but several studies indicate that instructing students in understanding and using embodied metaphors enhances reading comprehension (e.g., Boers 2000; Wilson and Gibbs 2007). A particular implication of this theory may be teaching discourse structure metaphors (e.g., Jenson 1983; Sadoski 2009).

Context is also central in Glenberg’s embodied theory. In this theory, all comprehension involves the ability to effectively act in the world on the basis of our bodily affordances, the physical situation, and our goals. This typically includes real or imagined objects, spaces, and events within the bustling context provided by the world. In language contexts, action-based comprehension arises from mentally simulating the linguistic content in an embodied manner including action, perception, and emotion. In reading comprehension, however, the link between language symbols and embodied experience is usually more tenuous than in speech because of the absence of gestures, vocal prosody, and so on. Add to this the tasks associated with decoding print, and reading comprehension may be particularly difficult for struggling readers or second language learners (Glenberg 2011). Educational programs to improve reading comprehension based specifically on this theory using extended, event-based texts have been developed and tested (discussed later).

In DCT, context effects in reading comprehension are critical because contextual influences are critical to meaning. For example, heteronyms cannot even be decoded precisely without verbal context (e.g., minute detail, minute hand), and all polysemous words require verbal context for precise meaning (e.g., wedding ring, ring a bell, ring true). Nonverbal mental imagery forms an important aspect of context by representing world knowledge. The mental connections between language and the nonverbal world experience to which it refers provide embodied referents for the language and also serve to constrain images that are situationally appropriate. Both verbal and nonverbal contexts thereby serve to progressively refine the meaning of a text as it is read. Educational programs for reading comprehension based specifically on DCT or consistent with it have been developed and tested (discussed later).

As noted earlier, Barsalou’s grounded theory has been extended to language comprehension in the LASS theory, which is consistent with DCT with certain caveats (Barsalou et al. 2008). These caveats might be resolved with closer analysis. For example, the LASS theory holds that meaning is largely represented in the simulation system (i.e., nonverbal simulations) but that the linguistic system could include the representation of meaning in some contexts (Barsalou et al. 2008, p. 253). This is consistent with DCT. Barsalou et al. (p. 253 ff.) further assert that in LASS the simulation system performs deeper conceptual processing, whereas in DCT deep conceptual processing occurs in both systems. In DCT, however, linguistic forms alone (e.g., words) have no inherent meaning beyond superficial recognition; they are meaningful in the deeper conceptual sense only through verbal and nonverbal contextual connections (Paivio and Sadoski 2011; Sadoski 2015; Sadoski and Paivio 2013). For example, the linguistic form ring can take a variety of different meanings in the contexts of jewelry, bells, mental impressions, boxing, and more. Verbal contexts restrict and refine likely alternatives, and they referentially evoke nonverbal images to varying degrees as vital world knowledge. This is particularly true in reading comprehension where immediate physical contexts may offer no clues (cf. Glenberg 2011; Zwaan 2014). The overall point is that LASS may be consistent with DCT in explaining reading comprehension although some theoretical differences remain to be defined and tested.

Concreteness Effects

The concreteness or abstractness of language is an important issue in embodied theories of reading comprehension. Decades of extensively controlled research have shown that concrete language (e.g., cup of coffee) is comprehended and recalled better than abstract language (e.g., theory of mind) at the word, phrase, sentence, and extended text levels (Paivio 1971, 1986, 2007; Sadoski and Paivio 2013; Wang et al. 2010). But how can embodied theories account for the comprehension of abstract language, which seems to have no direct referent in embodied experience?

The embodied theories reviewed here offer various explanations. One way abstract language is embodied is through metaphor. Lakoff and Johnson (1999, 2003) maintain that we live by metaphors, proposing that image-based meaning is fundamental and abstract reasoning is a special case of image-based meaning using metaphor (e.g., image schemas such as the container metaphor). For example, the word theory is metaphorically understood as a structure with architecture—“theories are the invisible mansions of the mind” (Sadoski and Paivio 2007, p. 344). To ring true comes from the old practice of telling whether a coin was genuine or fake by the sound heard when it was tapped—a metaphoric idiom. Metaphor is now considered a central influence in thought and communication, and considerable empirical evidence shows that all language including metaphor is grounded in embodied imagery and simulated action (e.g., Barsalou et al. 2008; Gibbs 2008; Gibbs and Matlock 2008; Kiefer and Pulvermüller 2012; Lakoff 2012; Sadoski and Paivio 2013; Wilson and Gibbs 2007). Some suggest that a strict dichotomy between concrete and abstract concepts is questionable because abstract concepts are embedded in concrete situations without which they could be only vaguely understood, if at all (Kiefer and Trumpp 2012; Sadoski 2015).

Another way abstraction may be understood is through concrete exemplars (Barsalou 1999; Sadoski and Paivio 2013). For example, the abstract concept justice is often defined in abstract terms such as law, precedent, impartiality, and so on. However, these are abstractions themselves, and to say they are explained by verbal association with justice is circular. Association with more concrete terms such as judge, jury, and lawyer may provide some degree of nonverbal mental reference—an escape from the web of language. Ultimately, the meaning of justice, law, precedent, impartiality, judge, jury, lawyer, and so on, may lie on a foundation of contextually appropriate exemplars such as famous court cases, media accounts, personal experiences, and so on (Sadoski and Paivio 2013). Glenberg et al. (2008) showed that comprehending sentences describing the transfer of abstract information (e.g., “Anna delegates the responsibility to you”) evoked motor responses to the same extent as sentences describing the transfer of concrete objects—we may understand the abstract as if it were concrete.

The meaning of an abstract term ultimately may be all the verbal and nonverbal experiential associations we have for the term including emotional associations (e.g., Kousta et al. 2011; but see Paivio 2013, and Vigliocco et al. 2013). Even the meaning of highly abstract subjects such as mathematics may ultimately rely on concrete, embodied experiences (Brown et al. 2009; Kaschak et al. 2017; Kiefer and Trumpp 2012; Skemp 1987). However, considerable discussion exists regarding abstractions and embodiment (e.g., Dove 2010, 2016). For example, a quick answer to the question “Is titanium a metal?” does not necessarily call for any physical experience with titanium; a quick verbal association between the subordinate and superordinate terms would suffice, perhaps employing the container metaphor (Goldinger et al. 2016; Paivio 1986, 2007, 2010; Sadoski and Paivio 2001, 2013). The extent and depth of concreteness effects in reading comprehension are task-dependent to a degree.

In sum, the application of embodied theories to reading comprehension instruction largely involves the provision of learning contexts that are rich enough to evoke simulations of concrete experiences even when the subjects are relatively abstract. Theoretically consistent educational methods have been developed with success, and we next turn to several examples of those methods. However, we note that there are many other issues in reading to which embodied theories are relevant including decoding, vocabulary, grammar, multimedia learning, study skills, and more, each worthy of its own extensive review. Different embodied theories may have different perspectives in explaining these issues.

Examples of Embodied Educational Practices in Reading Comprehension

Educational practices in reading that could be considered embodied are not really new. Familiar historical examples include the object lesson of Pestalozzi, the multisensory methods of Montessori, the experiential education of Dewey, and many more (reviewed in Sadoski and Paivio 2013, chapter 2). However, the embodied perspective on cognition has introduced (or re-introduced) some exciting new possibilities. Several examples will be briefly reviewed.

The most extensively researched comprehension technique that derives directly from an embodied theory is the formation of multisensory mental images, either spontaneously or through instruction (Sadoski and Paivio 2001, 2013). Small-scale studies of effective instruction abound, but very few have been scaled up to the curriculum level for delivery to whole schools and districts. The Visualizing and Verbalizing (VV) technique, based on DCT principles and developed for comprehension instruction by Lindamood-Bell Learning Processes (Bell 1986, 2007), was successfully scaled up in a large urban school district. The VV instructional program systematically guides students to form multisensory mental images and describe them in increasing detail, beginning with pictures and moving on to words, sentences, and longer passages. The emphasis on associating language with multisensory mental images is a direct application of DCT to reading comprehension. Over 5 years, the program was implemented according to successful scaling-up criteria (Borman et al. 2003) including extensive buy-in, inservice training, on-site monitoring, and the use of special program materials that were scaffolded to standard materials including basal readers and content area textbooks. An independent evaluation study found statistically and educationally significant gains in all years and grades on a state-mandated reading comprehension test (Sadoski and Willson 2006). Programs that are based on specific, embodied theories can be used at scale with success.

An instructional program based in kinesthetic imagery was developed by Glenberg and his colleagues and entitled Moved by Reading (MBR; Glenberg et al. 2004). The technique involves a two-stage intervention in which children first read stories about a particular scenario. For example, one scenario involves a farm, complete with a farmer, farm equipment, and animals. During reading, children have access to toy models or moveable images of them on computer screens. Children read text segments and physically manipulate the objects to conform to the content (e.g., reading “The farmer drives the tractor into the barn” while physically moving the tractor into the barn). This technique involves indexing language to objects and actions by performing the described actions in a multimodal way. The next stage involves imagining the manipulations when the objects are not physically present. In an experimental test in which control groups read and re-read the texts with the objects visible but not physically moved, the treatments moving actual objects or their computerized counterparts produced large improvements in reading comprehension (d effect sizes approaching or exceeding 1.0). This embodied intervention has been modified for teaching reading comprehension to English Language Learners as Enhanced Moved by Reading to Accelerate Comprehension in English (EMBRACE; Glenberg et al. 2016).
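
For readers less familiar with the d metric, a common definition is sketched here. We assume the conventional standardized mean difference with a pooled standard deviation (Cohen's d); the exact variant used in the study is not stated in this review, and the symbols (T for treatment, C for control) are introduced only for illustration.

```latex
d = \frac{\bar{X}_{T} - \bar{X}_{C}}{s_{p}},
\qquad
s_{p} = \sqrt{\frac{(n_{T}-1)\,s_{T}^{2} + (n_{C}-1)\,s_{C}^{2}}{n_{T} + n_{C} - 2}}
```

Under this definition, a d near 1.0 would mean that the treatment group's mean comprehension score lay about one pooled standard deviation above the control group's mean.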

Block and her colleagues developed the Comprehension Process Motions (CPM) method for learning comprehension strategies (Block et al. 2008) based on DCT. CPM lessons teach children physical gestures that stand for the comprehension processes of main idea, drawing conclusions, clarifying, and so on (cf. Kaschak et al. 2017). For example, when something read is not clear, children place their palms together in front of their chests. When it becomes clear, they spread their hands apart to the sides with fingers splayed as if something opened up. In an experimental study, children were taught to use CPMs through teacher introduction and scaffolding as students internalized the strategies. A control group was taught the same comprehension strategies verbally without CPM gestures. Students were tested on standardized comprehension tests and criterion-referenced tests of drawing conclusions, clarifying, following a story’s plot, identifying writing patterns in nonfiction, and finding main ideas. Students receiving CPM instruction significantly outperformed controls on every measure. Using η² effect sizes, more than 70% of the variance in students’ achievement was attributable to CPM instruction on every measure. These are among the largest effect sizes in the literature of reading comprehension strategy instruction.
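
As a brief aid to interpreting the effect sizes just reported, the classical definition of eta squared is sketched here. Whether the study reported classical or partial eta squared is not stated in this review, so the formula should be read as an assumption; the sum-of-squares labels are illustrative.

```latex
\eta^{2} = \frac{SS_{\text{effect}}}{SS_{\text{total}}}
```

On this definition, a value above 0.70 would indicate that more than 70% of the total variability in scores on a measure was associated with the instructional condition.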

More examples could be cited. Techniques such as these offer promise for improving the reading comprehension of children in engaging ways. But why multimodal acts, overt or imagined, should embody meaning better than language alone is virtually impossible for single-coding, abstract knowledge theories to explain. New vistas of cognitive strategies are emerging.

Discussion: A Challenge and an Invitation

The challenge posed by embodied cognition is inescapable. A body of behavioral and neuropsychological knowledge has been developed that other theories cannot adequately address. This knowledge has redirected the field of cognitive psychology, and many leading researchers, even those versed in earlier theories, have responded positively. Besides the theorists cited earlier, Zwaan (2004) rejected amodal propositions and proposed an embodied “immersed experiencer” theory of comprehension (see Zwaan 2014 for an elaborated update). Indeed, Lakoff (2012, p. 733) wrote: “It may be hard to think back to a time before the idea of embodied cognition, but I was raised in that generation.” A field basic to reading has fundamentally changed.

Much scholarship in the field of reading might be seen as slow to entertain this change. The most recent edition of Theoretical Models and Processes of Reading (Alvermann et al. 2013), which runs to 1317 pages, had only two references to embodiment or embodied cognition in its index. Professional journal articles and book chapters on reading comprehension regularly cite theories based in abstract propositions and schemata, but a closer look shows that this reliance may be only superficial.

As noted earlier, articles that claim to be based in abstract, amodal theories rarely present propositional analyses or operationalized definitions of abstract schemata. Rather, the terminology of these theories may be used, but actual comprehension is operationalized as answering test questions, verbal recall, and other reliable means that might just as easily be interpreted by any theory of comprehension. Mental images in reading are not theoretically synonymous with situation models (Kintsch 1998) because situation models in that theory are coterminous with well-organized propositional text bases. Mental images might more accurately be called embodied simulations, nonverbal representations, multisensory mental models, or other theoretical terms that do not imply the existence of propositions. Nor is gist recall of a text sufficient evidence of an underlying propositional textbase. The point is that the popular use of the terminology of a theory is not the same as the acceptance of the principles of that theory, and the field seems to have intuitively understood the difference. Moreover, there may be serious risk in using theoretically inappropriate or imprecise terminology in developing theoretically based interventions because such imprecision lends confusion and misdirection to the field.

This does not mean that AI-based research programs cannot contribute to our knowledge of reading. Indeed, basic changes in the assumptions of many of these theories would render them tantamount to embodied theories—“Kintsch’s evolving schema-based theory of text comprehension can be described as disembodied DCT” (Paivio 2007, p. 122). Practitioners need all the help they can get in improving reading comprehension, but the popularity of AI-based theories for nearly 50 years seems to have left us without much progress. It seems time for something new, and embodied theories may provide a needed contribution. Part of what all this evidence may be showing us is how children actually make meaning, not how strategies more suitable for educated adults can be imposed on them. And maybe adults are not so different at making meaning at that.

This leads us to the invitation of embodied cognition. There has perhaps never been a time when the door was flung open to more exciting opportunities for researchers and teachers. Brand new techniques may be developed, or older ones revised. Are bodily sensations, overt or covert, a critical part of the meaning that language evokes in humans? Do bodily attitudes serve, besides as communicative signals to others, as signs to ourselves of meaningful comprehension which is less than complete without them? Can comprehension be deep and complete without the sensuous realization, in imagination, of objects and events, even as metaphors for abstractions? Will techniques like VV, MBR, CPM, and their as-yet-undiscovered kin help make reading more meaningful to young and struggling readers of new generations? These are matters for the next generation of researchers and practitioners in reading to explore—the door is open and you are invited in.