The failure of many children to become proficient readers is a persistent problem. This is especially true with regard to the ultimate goal of reading comprehension, where many children have significant difficulties (McNamara et al. 2007). According to the US Nation’s Report Card, 33 % of fourth graders and 25 % of eighth graders read below basic levels (National Center for Education Statistics 2010). In other countries, such as the Netherlands, a large number of elementary school children also read below the minimum required reading comprehension level (Periodic Survey of Educational Level 2007). Children who have reading comprehension difficulties are likely not only to struggle throughout their school career but also to experience difficulties with employment, social functioning in society, and other aspects of daily living (Snow 2002).

It may be that these comprehension problems are due in part to the limited attention paid within reading instruction to reading as a sensory experience (Van de Ven 2009). In fact, many reading strategies are focused on stimulating text-based processing, such as questioning and making summaries of a text (Johnson-Glenberg 2000). Bringing about a change in this approach requires that readers not solely rely on the words being read but use their visual (and ideally also other) senses to create non-verbal representations of the concepts and the relations in a text thereby increasing the likelihood that the text is adequately understood (Sadoski and Paivio 2004; Van de Ven 2009). Drawing upon cognitive theories of reading comprehension, this paper proposes several ways to improve text comprehension by using visualization strategies, taking into account the extent to which they involve one or multiple senses and ask for internal or external visualization processes.

The Reading Comprehension Process

According to cognitively based views of reading comprehension, successful reading comprehension depends on the construction of a coherent meaning-based mental representation of the situation described in a text (Kintsch 1988; Van den Broek 2010). This so-called situation model representation is gradually constructed by continuously updating information from different text dimensions (e.g., space, time, protagonist, causation, and intentionality; Zwaan et al. 1995) and integrating this information with the reader’s background knowledge as the text unfolds (Zwaan et al. 1995; Zwaan and Radvansky 1998). Readers often extend situation model representations with knowledge about the world, for example when textual information does not provide sufficient coherence. The result is a coherent and richly connected visuospatial representation of the situations and events that are described in a text, which enables readers to draw inferences. Situation models therefore constitute the level of text representation which is associated with deep processing. This contrasts with lower-order levels of text representation, that is, surface-level and text-base representations, which are restricted to only the actual textual information. In summary, a situation model refers to a non-linguistic mental representation of what the text is about instead of a representation of the text itself (Glenberg et al. 1987). Consequently, readers who construct a situation model are more likely to gain an in-depth understanding of a text.

However, moving beyond the word, phrase, or sentence level is a complex process for many readers. Situation model level comprehension is most likely acquired by proficient readers who are distinguished from struggling readers by their mastery and use of (meta)cognitive reading strategies (Dole et al. 1991; McNamara et al. 2007). According to Graesser et al. (1994), reading strategies play a prominent role in comprehension because readers use them to construct coherent mental representations and explanations of situations described in texts. Unfortunately, not all children spontaneously develop the required strategies and engage in strategic processes when encountering a text that is difficult to comprehend (Pressley and Allington 1999). There is, however, empirical evidence that reading comprehension strategies can be taught to help readers process texts more actively and deeply and, in turn, to more successfully understand text (Dole et al. 1991). In fact, there is now considerable consensus in the research literature regarding the importance of teaching reading comprehension strategies. In addition, in the past decades, a large number of reading strategies have been proposed and studied that are intended to improve situation model construction, such as identifying main ideas, prediction, summarizing information, and (guided) visualization (for an overview, see Trabasso and Bouchard 2002). However, regarding the practice of teaching reading comprehension, elementary school teachers still seem unsure about how to teach reading comprehension strategies and often test rather than teach comprehension by just concentrating on asking children questions about text content after reading, with little attention being devoted to the strategic aspects of processing and comprehending texts (Liang and Dole 2006).

Furthermore, it appears that, although strategies are part of reading instruction at the primary school level, only a restricted number of all possible strategies are actually used. Most often, the strategies originally developed by Palincsar and Brown (1984), that is, questioning, predicting, summarizing, and clarifying, form the ingredients for many comprehension instruction programs as these have been shown to promote good reading comprehension (e.g., McNamara et al. 2007; National Reading Panel 2000). Relatively little attention is paid to strategies encouraging readers to visualize text content. For example, in the Netherlands, only one out of five of the most widely used reading comprehension methods in elementary school contains a visualization component, such as drawing a picture that corresponds to text content (Stoeldraijer and Vernooy 2007). A recent analysis reports similar findings (Bos et al. 2013), suggesting that the role of visualization strategies in reading comprehension instruction has undergone little change in the last 5 years. When visualization is used in the classroom, this is often done in a guided imagery format in which teachers engage in the imaging process, modeling their own personal images for children to observe, with little or no emphasis on encouraging children to construct their own mental images (Hibbing and Rankin-Erickson 2003). However, according to many researchers (e.g., Anderson and Pearson 1984), a central factor in differentiating proficient from less proficient readers appears to be their ability to actually visualize text content themselves. In line with this, Hibbing and Rankin-Erickson (2003) argued that readers who lack the ability to create visual mental images when reading often experience comprehension problems.
The RAND Reading Study (Snow 2002) even concluded that a critical factor contributing to the comprehension success of proficient readers is their ability to mentally visualize the situation described in texts, suggesting that visualizing text content is necessary for adequate understanding of the situation described in a text.

In the present article, we also assume that visualizing while reading is an effective strategy to improve text comprehension and discuss the empirical evidence for this claim. Given the promising role that is assigned to visualization for improving reading comprehension, it is surprising not to see visualization used more often as a strategy in the practice of reading comprehension instruction. Perhaps this is due to an intriguing observation when looking into the literature on visualization and imagery as a strategy for improving reading comprehension: While research on visualization strategies for improving text comprehension flourished in the 1970s and 1980s, it is only recently, two decades later, that it has received renewed interest, at least partly due to the emergence of more sophisticated reading comprehension models such as the situation model theory (Zwaan and Radvansky 1998) and embodied theories of cognition (e.g., Barsalou 1999; Zwaan 1999). Thus far, however, these reading comprehension models have mainly addressed the role of visualization in the reading comprehension process at a theoretical level. Hardly any attempts have been made to translate the knowledge regarding visualization gained from these models to reading comprehension instruction and/or reading comprehension strategies to improve reading comprehension.

By drawing on both recent and more distant research on visualization strategies used in text comprehension research, the purpose of this article is to identify and categorize the different visualization approaches to encourage readers to build rich and elaborated situation models of texts. That is, we outline the different ways to support readers in creating internal (e.g., imagination) or external (e.g., drawing) non-linguistic representations, a discussion which, in line with research in this field, will mainly, but not exclusively, concentrate on constructing visual representations. In doing so, this article will by no means be exhaustive, nor does it intend to provide a comprehensive review of the available literature in the field. Rather, it serves as a contemporary collection, as well as a critical discussion, of the different ways visualization strategies can be used to encourage readers to build non-linguistic representations of text that could inform practice and provide a basis for further research.

Theoretical Model of Visualization

Dual Coding Theory

The ability to visualize text content has been defined in various ways in the fields of psychology and education. It is generally agreed to be the process of forming nonverbal representations (internal or external) of objects or events that are not physically present but are described in a text (Hibbing and Rankin-Erickson 2003; Sadoski and Paivio 2001). The role of making nonverbal representations to understand text has its origins primarily in Paivio’s dual coding theory (DCT; Clark and Paivio 1991). According to DCT, verbal and nonverbal information is processed in separate but interconnected mental subsystems in working memory, referred to as a dual-coding system, containing verbal and nonverbal representations. Verbal representations consist of words for objects or events (i.e., verbal code). For example, the word “hamburger” may evoke a series of verbal representations such as “made of meat” and “high in calories.” Nonverbal representations (i.e., nonverbal code) are composed of sensory images that to some extent retain the main perceptual features of whatever is being represented (Sadoski and Paivio 2001). In this sense, DCT is similar to Baddeley’s (2000) working memory model, which also distinguishes separate systems in working memory for processing verbal and nonverbal information. According to Baddeley (2000), working memory consists of two modality-specific systems, the phonological loop and the visuospatial sketchpad, as well as a central executive. The visuospatial sketchpad is dedicated to processing visual and spatial information, and the phonological loop is specialized in processing verbal and acoustic information. The central executive primarily functions to coordinate and integrate the information in these two systems. Thus, DCT and Baddeley’s model share the assumption of independent but interconnected modality-specific systems in working memory.
However, unlike Baddeley’s model, which assumes that integration of visual and verbal information occurs via the central executive, DCT allows for direct interaction between verbal and nonverbal systems. These interconnections facilitate the processing of visual and verbal information.

Another basic premise of DCT is that all mental representations in working memory retain some of the concrete sensory experiences from which they were derived (Sadoski and Paivio 2007). This means that verbal and nonverbal representations are formed based on information acquired through one or several senses. It follows that nonverbal representations are not limited to the visual modality, although the visual modality has received most attention from researchers (e.g., Gambrell and Bales 1986). For example, creating a visual mental image of an object such as a hamburger typically involves visual properties like its shape, color, and size. According to DCT, however, there is not only a mind’s eye but also a mind’s ear and other senses of the mind as well, as the creation of mental images is based on all sensory experiences (Sadoski and Paivio 2001). Sadoski and Paivio (2001) stated that “mental imagery is comprised of representations that refer to internal forms of information used in memory” (p. 43). Therefore, the sensations and memories one gets from reading in the absence of real experience pertaining to visual, auditory, haptic/kinesthetic, olfactory, and gustatory senses may be categorized as mental images (Sheehan 1972). Moreover, Algozzine and Douville (2004) argue that making mental images from text serves as a kind of multisensory blackboard or personal movie screen that aids in verbal and spatial tasks. In support of this, Long et al. (1989) showed that spontaneously created mental images when reading a text often include memories of sights, sounds, tastes, touch, smells, feelings, events, and stories that may be replayed in the mind. Several other studies have shown similar findings (for an overview, see Sadoski and Paivio 2004).
Hence, nonverbal representations constructed from texts (i.e., situation models) likely comprise information from all sensory modalities to reflect the referent situation as accurately as possible, which, ultimately, is stored in long-term memory.

From the perspective of DCT, readers thus process text through direct interconnections between modality-specific mental representations within verbal and multisensory nonverbal subsystems in working memory. Readers can switch from one sensory modality to another within the same subsystem (e.g., from reading to listening to words) or between subsystems (e.g., from reading to mental images) to arrive at an adequate understanding of a described situation. By processing information through the verbal and nonverbal channels rather than through a single channel, the verbal and nonverbal codes can act as a complementary form of representation. Hence, a richer situation model is likely to be constructed, which is supposed to enhance understanding. Furthermore, creating rich (multi)sensory nonverbal representations may activate other verbal or nonverbal representations in memory and build a deeper experience and understanding. Although some of a reader’s nonverbal representations for a particular story are shared with all readers, most of a reader’s private nonverbal representations are different from those of anyone else. Creating nonverbal representations related to the text and connecting them to one’s private sensory experiences and images makes the story personal for the reader. It not only encourages piecemeal active integration of textual information in working memory but also helps readers to make what they read concrete in their minds and to connect it to previous experiences stored in long-term memory (Sadoski and Paivio 2004). Moreover, by creating (multi)sensory representations, readers discover how the different segments of a text fit together in a three-dimensional situation, in which they can see, hear, feel, smell, and touch what the text describes (Zimmerman and Hutchins 2003).
This suggests that visualizing provides a good means to support the understanding process during reading by bridging information in the text and a reader’s mental representation (Sadoski and Paivio 2004).

Theories of Embodied Cognition

Similar to DCT, more recently developed theories of embodied cognition assume that all sensory modalities are involved in the process of constructing a mental representation. In fact, this is even one of the central tenets of embodied theories of cognition. According to the embodied view of cognition, cognitive or psychological processes are influenced and shaped by the body including body morphology, sensory systems, and motor systems as well as the body’s interaction with the surrounding world (Barsalou 2008; Glenberg 1997; Zwaan 1999). That is, cognitive processes are grounded in the same neural systems that govern direct perception and action. It follows that perceptual and action-related processes are tightly linked to each other as well as to higher-order cognitive processes such as language (Barsalou 1999). Thereby, in contrast to other theories of language comprehension such as DCT, no transduction process to a verbal code occurs but only a nonverbal mental representation is available. The primary focus of an embodied approach to reading is therefore not, as in DCT, to provide a comprehensive account of the processing of text–picture combinations, but rather to explain the nature and development of nonverbal mental representations underlying text comprehension. Consider, for example, a situation in which someone is catching a ball. During the actual experience (i.e., catching a ball), patterns of brain activation are formed across multiple modalities, which are then integrated into a multimodal representation in working memory (e.g., how a ball feels, looks, smells, the actions involved). Later on, when retrieving the stored experience from long-term memory during reading, the multimodal representation captured during the actual experience is reactivated to simulate how the brain represented perception and action (i.e., mental simulation). 
According to this account, off-line cognitive processes, such as when visualizing text content, are therefore body-based in the sense that people can vividly remember or mentally visualize an experience as if they actually (re)experience the event by drawing upon previously acquired multimodal experiences. Evidently, both working memory and long-term memory are involved in this process, as is the case in DCT. As Postma and Barsalou (2009, pp. 103–104) stated: “Notably, the mechanism of mental simulation […] is essentially the same mechanism as mental imagery, except that it operates both unconsciously and consciously, unlike mental imagery which is typically presumed conscious. […] Although long-term memory is central to mental simulations that run automatically and unconsciously, working memory becomes relevant once these simulations reach consciousness, and especially when the executive system manipulates them.”

A growing number of empirical studies provide strong support for theories positing the importance of embodied multimodal mental representations in comprehending language (for overviews, see Fischer and Zwaan 2008; Glenberg 2007). Accumulating evidence shows that readers activate visual representations corresponding to described perceptual features such as shape and orientation even when these properties are not explicitly mentioned in the text but are only implied in a sentence (Stanfield and Zwaan 2001; Zwaan et al. 2002). For example, when reading the sentence “He saw the eagle in the nest,” readers respond faster to a subsequently presented picture of an eagle that has its wings tucked in than a picture depicting an eagle with outstretched wings, suggesting that readers change their mental representation of the eagle depending on the context in which it is described. Moreover, perceptual simulation of a described action also appears to influence motor performance or vice versa. For example, readers may require a longer time to perform a motor response that is incongruent (e.g., movement towards the body) with an action implied in a text (e.g., “John closed the drawer”) (i.e., the action–sentence compatibility effect; Glenberg and Kaschak 2002).

Consistent with these behavioral demonstrations of the importance of mental simulation in language comprehension, neural imaging studies show that brain regions responsible for the activation of perceptual and motor simulations during language processing overlap with the regions activated during real-world perception and action (e.g., Pulvermüller 2005). Together with other studies, these findings suggest that the mental representation constructed during language processing likely involves perceptual and motor simulation of the described events and that such activations may play an important role in successful understanding. Helping readers to better understand text by drawing upon visualization therefore likely involves stimulating readers to activate perceptual and/or motor experiences stored in long-term memory that are relevant to the events described in the text. As people’s mental simulations of events described in a text seem to differ as a function of the amount of prior experience with the events, relevant experiences are more likely to be activated during reading, with some encouragement, if readers have already acquired, and hence stored, such experiences (Holt and Beilock 2006). Another interesting opportunity offered by embodied theories of cognition focused on connecting text information to bodily experiences is to provide readers with actual experiences during reading by, for example, physically enacting the situation described in the text (Glenberg 2011).

Based on these theoretical and empirical considerations, visualizing text content improves the possibility that readers accurately understand the situation described in a text. Although this is likely true for all readers, creating multisensory nonverbal representations in working memory of what is read appears to be a natural process especially for more proficient readers; less proficient readers do not necessarily automatically create nonverbal multisensory representations in response to text (Oakhill and Patel 1991). However, as shown in the remainder of this article, there are nowadays various ways that can be used to encourage readers to create nonverbal representations from text using one or multiple sensory modalities.

Visualization Strategies for Improving Reading Comprehension

According to Rapp and Kurby (2008), a key dichotomy can be identified when classifying visualization as a strategy to construct a nonverbal representation from text: internal and external visualization. With internal visualization, we refer to mentally visualizing textual information, that is, creating mental nonverbal images of the information presented in a text. This process of visualization occurs in people’s minds and can therefore by definition not be physically observed (i.e., mental imagery). External visualization, on the other hand, refers to nonverbal representations of text content that are available in the environment, like a drawing that depicts text content into a nonverbal representation. In this article, external visualization is referred to as a nonverbal physical representation of text content, which could either be created by the reader (i.e., reader-constructed) or by someone else like a teacher (other-constructed). With respect to reader-constructed external visualizations, we refer to a physical representation that is produced by readers to depict text content in visual or multimodal (i.e., visual and other senses) format and which is used to help them to construct an internal representation of text. Making a drawing of the situation described in the text can be seen as an example of this type of external visualization. Of course, some form of mental imagery must initially occur for such an external representation to occur. In this sense, internal and external visualization processes do not operate in isolation but are interconnected. Self-generated external visualizations are central to our discussion in “Theoretical Model of Visualization” and “Discussion” sections. We consider other-constructed external visualizations as an external aid which is presented to, instead of constructed by, readers to support the internal visualization process. 
This type of external visualization especially occurs when providing readers with illustrations to support the internal mental visualization process and will be primarily discussed next in “The Reading Comprehension Process” section. When discussing external visualization in the following sections, we indicate to which of these two types of external visualization the presented studies refer. Despite these differences, internal and external visualization have in common that they are model-focused strategies (McNamara et al. 2007). That is, internal and external visualization strategies are aimed at helping readers to go beyond the text. They are intended to encourage the construction of a situation model of the objects and their relations described in a text. With this nonverbal representation, readers are supported in finding key features and making inferences (Larkin and Simon 1987), which in turn is likely to help them to better understand the text than those who do not visualize textual information.

Another dimension along which visualization as a strategy for improving reading comprehension may differ is the multimodal character of the visualization process, that is, the extent to which visualization strategies encourage readers to construct a multisensory mental representation of the text they are reading. As indicated above, situation model representations not only involve the visual sense but also, at least to some extent, all of the other senses. Hence, situation models can be regarded as multisensory nonverbal representations of text. Although the emphasis of the majority of research on visualization as a reading strategy has been on creating nonverbal mental representations using the mind’s eye (i.e., visual sense), recently researchers have started to explore the potential of drawing upon other senses, primarily the involvement of sensorimotor modalities for improving reading comprehension (e.g., Glenberg 2010).

In the following, both the internal–external and the unimodal–multimodal dimensions serve as the organizing framework for discussing visualization strategies that have been studied and discussed in the literature as well as for providing promising new directions for improving reading comprehension. Crossing these two dimensions results in four types of visualization (e.g., internal and unimodal visualization). In the next sections, each of these types will be discussed in turn by describing the corresponding visualization strategies as well as their empirical evidence.

Internal and Unimodal Visualization

Numerous studies have investigated the relation between mental imagery and reading comprehension (e.g., Center et al. 1999). Especially in the late 1970s and 1980s, research on mental imagery in the context of improving reading comprehension flourished. This research has been based on the assumption that mental imagery is a knowledge representation system that readers can use to organize, integrate, and retrieve information from text, which in turn effectively improves readers’ understanding of text (Gambrell and Bales 1986). However, research reviews on mental imagery as a reading comprehension strategy in general are not consistently positive about its use (e.g., National Reading Panel 2000). In fact, results from the large body of research on mental imagery provide mixed findings, with some studies finding a positive relationship between mental imagery and reading comprehension and other studies finding no relationship. The variations in these findings may be explained by a wide range of differences between studies that make them incomparable in many respects, such as the age of the participants, length of the texts studied, type of text studied (narrative vs. expository), whether imagery was done sentence-by-sentence or all at once, whether or not the study was computer-based, and whether or not the text passages used were specifically written to evoke imagery. Perhaps the most important factor leading to the mixed findings is the amount of attention that is devoted to instructing readers on how to visualize during reading, which varied from instructions to make mental pictures to guided imagery using elaborated training sessions.

Instructing Readers to “Make a Picture in Your Head”

It is fascinating to note that many of the studies on inducing mental imagery did not give readers any instructions on how to create mental images that may enhance the text they are reading. In most of these studies, the researchers simply asked readers to make a picture in their heads during reading (e.g., Anderson and Kulhavy 1972; Gambrell 1982). It may not come as a surprise that only asking readers to make a picture in their head while reading a text has often failed to improve reading comprehension. For example, Anderson and Kulhavy (1972) required high school students to read a 200-word expository text and asked them either to read the text carefully or to read the text while making vivid mental images. Results showed no significant differences between the two groups on reading comprehension. Several other studies have confirmed these findings, investigating prose passages (Cramer 1980), middle school students (Gunston-Parks 1985), as well as poor comprehenders in third and fifth grade (Moore and Kirby 1988), sixth grade (Koskinen and Gambrell 1980), and eighth grade (Hayes and Readence 1982).

However, there are some studies providing evidence that simply prompting readers to create visual mental images during reading without further explicit instructions on how to do so was sufficient to improve reading comprehension. These studies typically took only one session to complete and used materials specifically designed for the study rather than regular classroom materials. Interestingly, almost all of the studies reporting positive findings were conducted with elementary school students using narrative texts. A more in-depth look into when the “make a picture in your head” instruction enhances reading comprehension in elementary school reveals that children’s age appears to be an important factor. Several studies report findings in line with the developmental imagery hypothesis (Levin and Pressley 1981), which states that children’s ability to benefit from imagery instructions improves over the elementary school years. Gambrell (1982), for example, showed that after instructions to “make a picture in your head” third graders, but not first grade students, reported twice as many facts and made twice as many accurate predictions about the text as a control group who was instructed to “think about what you read to help you remember.” However, other studies suggest that older students do not necessarily benefit from imagery instruction. In a study by Maher and Sullivan (1982), instructions to mentally create a picture while reading a text significantly improved the reading comprehension of fourth grade students, but had no significant effect on sixth grade students. They concluded that this is due to older students already having sufficient reading strategies, including mental imagery, to make them successful readers (see also Oakhill and Patel 1991; Pressley 1976).
Taken together, these findings suggest an optimum for using “make a picture in your head” instructions: younger students have too few processing resources to simultaneously read the text and make a visual mental image, whereas older elementary students as well as those beyond elementary school have already developed adequate text processing and mental imagery strategies and therefore do not benefit from an explicit instruction to make a picture in their head. Besides a reader’s age, another factor that has been shown to influence the effectiveness of the “make a picture in your head” instruction is a reader’s reading proficiency. Several studies have provided evidence that “make a picture in your head” instructions are especially helpful for readers who are identified as poor comprehenders (e.g., Gambrell et al. 1980).

To sum up, the research discussed thus far does not seem to support a clear relationship between the “make a picture in your head” instruction and reading comprehension. This mental imagery instruction is most likely to result in improved reading comprehension for elementary school children and readers identified as poor comprehenders. However, simply asking readers to mentally visualize during reading without explicit instruction on how to do this may have serious limitations. That is, it may only help readers who know how to mentally visualize but for whatever reason fail to do so spontaneously (i.e., production deficiency, Flavell 1970). In addition, there is a possibility that readers create mental images that do not fall within the context of the text and thus are unrelated to the passage being read, which hinders comprehension (McCallum and Moore 1999). Moreover, the non-specific instruction “make a picture in your head” largely ignores the dynamic nature of the reading comprehension process and therefore likely leads to a static image of a given situation or event. When reading a text, readers actively try to construct a rich interconnected situation model by integrating pieces of information from the text with each other and with prior knowledge stored in long-term memory, by monitoring their understanding, and by constantly updating and revising their current situation model (Zwaan and Radvansky 1998). In this sense, using more process-oriented instructions seems more in line with the way mental representations from text are constructed. Hibbing and Rankin-Erickson (2003), for example, suggest telling students to read a text and form images as if they were watching “television in their heads.” Nevertheless, this instruction has not been tested empirically yet and does not contain any guidance on how to create mental images. 
So, here too it is important that mental imagery instructions teach readers how they can create mental images of the situation described in a text.

Training Readers in How to Make Mental Pictures

Research on instructions that teach readers how to create mental pictures, thereby going beyond simply asking readers to “make a picture in your head,” has received a fair amount of attention in the literature. A look at this collection of studies shows that researchers have used various methods for instructing readers how to mentally image a text. Approaches vary, for instance, from a single training session lasting just several minutes (e.g., Rose et al. 1983) to multiple training sessions lasting 30 min each (e.g., Bourduin et al. 1993), from showing illustrations during the training session(s) (e.g., Pressley 1976) to not showing them (e.g., Rose et al. 1983), and from forming mental images sentence-by-sentence (e.g., Clark et al. 1984) to forming them after reading a complete text passage (e.g., Gambrell and Bales 1986).

In contrast to the “make a picture in your head” instruction, the majority of studies using a training procedure that teaches readers how to mentally visualize a text generally show positive findings across school levels (i.e., elementary, middle, high school, college). More specifically, using mental imagery training in a purely mental sense, that is, without external aids such as pictures, helps readers to mentally visualize text and enhances their ability to recall information, make inferences and predictions, and monitor their understanding (e.g., Gambrell and Bales 1986; Rose et al. 1983; for an exception, see Warner 1977). Although these findings seem straightforward in showing positive gains of imagery training, the types of training interventions and the task required for the control group vary considerably between studies. For example, Gambrell and Bales (1986) developed three highly imaginable sentences, two highly imaginable paragraphs, and four expository passages for training and testing fourth and fifth graders’ reading comprehension monitoring. The imagery condition consisted of a 30-min training session in which a trainer first modeled and then encouraged readers to make visual images of the text, gradually building up from sentences to whole passages. The trainer guided students with prompts to make images with full details such as the type of car and the type of road it was on. Readers in the comparison group were told to do whatever they could to understand and remember the story. Results showed that the use of imagery training was positively related to comprehension monitoring, at least when comparing an imagery group with a control condition receiving no explicit instruction on how to process the text.

Similar findings have been obtained regarding reading comprehension in a group of special education children (Clark et al. 1984). Clark et al. (1984) used a multi-component imagery instruction called read, imagine, describe, evaluate, repeat, which was implemented during seven instructional sessions each lasting about 45 min. This strategy required readers to form mental visual pictures and to describe these. As more text was read, the current mental pictures in the reader’s head had to be modified in congruence with the unfolding text, thereby directly addressing the dynamic nature of building situation models (Zwaan and Radvansky 1998). Results from a pre–posttest design showed comprehension gains for the imagery training group compared to the no-training group. Comparable results have been reported in more recent studies. Johnson-Glenberg (2000), for example, gave third to fifth graders a 10-week training in either a verbal strategy (i.e., summarizing, predicting, clarifying, and generating questions) or a visual strategy in which readers learned to mentally create and then describe pictures in their head. Compared to an untreated control group which solved anagrams, the visual and verbal strategy groups demonstrated significant gains on several reading comprehension measures.

Together, the findings reported above suggest that students engaging in imagery training for the purpose of improving reading comprehension outperform those who do not receive explicit instruction or training in the use of a reading comprehension strategy. These studies are not very convincing, however, as they compared groups receiving visualization training with groups that did not receive any strategy training. When more adequate comparison conditions are used, contrasting an imagery condition with other reading comprehension strategies such as a verbal strategy (Rose et al. 1983) or a text structuring strategy (Moore and Kirby 1988), empirical studies often find no significant difference in reading comprehension. This suggests that imagery training aimed at encouraging readers to make a visual mental image does not necessarily result in better reading comprehension performance than other reading strategies. Nevertheless, it is likely that pure mental imagery training has been used widely because of its advantages: no additional materials such as illustrations need to be prepared, and it requires only a short instructional period (usually a single session). A drawback is that some students may require more concrete guidance in imagery, for example by using illustrations to start from.

To improve imagery training effectiveness, it has been suggested to show the readers a picture about the text passage before stimulating them to create their own mental pictures (e.g., Hibbing and Rankin-Erickson 2003). The approach taken here is to support the construction of an internal visual representation of text by providing readers an external aid in the form of an illustration during imagery training. This idea was investigated in several experimental studies. For example, in a 20-min training session, Pressley (1976) showed 8-year-olds in the experimental group illustrations that were accurate depictions of the situations described in narrative sentences. Moreover, these children were told that a good way to remember things is “to make up pictures in your head” with the ultimate goal of creating a mental image like the illustration that was shown. After this training, children were prompted to make mental images of the text they were reading using a booklet that consisted of alternating text and blank pages. Children were encouraged to project their mental images on the blank pages, that is, make mental pictures of the situation described in the text on the location where normally real pictures alongside the text would be provided. Results showed that children who received the training were better able to answer questions about story content than those who were not trained in the imagery technique. Pressley’s method using illustrations to train mental imagery has been used successfully in various other studies in modified form (e.g., Peters and Levin 1986; Shriberg et al. 1982). According to Oakhill and Patel (1991), imagery training with illustrations is especially effective for poor comprehending readers. In their study, children aged 9–10 years with good and poor comprehension skills were shown illustrations of the sequence of events while they read narrative stories during three training sessions. 
The children were then instructed to form pictures in their mind of story events, and they discussed their images with the teacher. Poor comprehenders who received the training improved their ability to answer factual, descriptive, and inference questions about the stories relative to controls, whereas good comprehenders did not show the same improvement. However, it is difficult to attribute the improved comprehension performance of the imagery training group to a visualization strategy as verbalizing and visualizing strategies were used together in an intricate fashion during the imagery training.

The beneficial effects of providing illustrations to readers during imagery training to facilitate the construction of an internal visualization of text content are consistent with studies showing that text and illustrations together lead to better understanding of text. Gambrell and Jawitz (1993), for example, compared four groups of fourth graders who were either instructed to form mental images in their head or not and to read a text with or without accompanying illustrations. Results showed that those instructed to look at the illustrations and to form mental images outperformed all other groups on recall and comprehension. The authors concluded that the combination of these two strategies was especially effective, suggesting that mental imagery might be facilitated by using external visual representations. Similarly, in the domain of multimedia learning, a large number of studies provide evidence for such a multimedia effect, demonstrating that readers who simultaneously process verbal and visual representations are more likely to remember and understand the presented information, and do so with less cognitive investment, than with either verbal or visual representations alone (Mayer 2009). The theoretical rationale here, which is derived from DCT and Baddeley’s working memory model, is that by dividing the processing demands over verbal and visual information processing channels, working memory load is reduced, enabling readers to engage in meaning-based processing activities, which likely increases their understanding of the presented information (Clark and Paivio 1991; Mayer 2001).

However, showing static illustrations as an example of a mental image only provides readers with the end-product of the imagery process. According to Hibbing and Rankin-Erickson (2003), an alternative that also incorporates the dynamic character of the described situation, and hence the situation model constructed from it, is to show readers movie fragments or let the teacher make a drawing of the text to help illustrate the mental imaging process. They describe, for example, the W–R–W–R cycle, in which readers alternately watch a movie and read a narrative text, as a useful method to help especially poor comprehenders to create mental images from text. A similar sequence can be used with static illustrations and text. In both situations, the extra illustrations added to the training are likely to improve motivation and positive attitudes toward reading. They also may serve to provide readers with knowledge and experiences that they have not yet acquired or function as a vehicle to connect the text content to a reader’s prior knowledge (Hibbing and Rankin-Erickson 2003). However, these suggestions have not been tested empirically yet and may provide an interesting avenue for future research.

In sum, training readers in mental imagery generally shows positive results in terms of reading comprehension, especially when imagery training groups are compared to control groups receiving no reading comprehension strategy training and when illustrations are used and the constructed mental images are discussed with teachers. To further specify why and how imagery training is effective at promoting the construction of an internal visual representation of text, more research is needed that systematically investigates what the effective ingredients of visualization training are. In particular, research disentangling the individual contributions of visual and verbal training strategies could provide more insight into this issue. Also, investigating the effectiveness of imagery training using a fair control condition, that is, a control condition receiving training other than visualization training rather than no training, could improve our understanding of the effectiveness of imagery training.

Internal Multimodal Visualization

Evidently, a critical aspect characterizing the research on mental imagery discussed above is that the visualization instructions reflect a single modality imagery approach in which only the visual sense is involved when students are instructed to see a picture in their head. This reliance on only a single modality ignores the multisensory nature of the imagery process as well as the involvement of multiple sensory modalities in constructing a situation model of the events described in a text (Zwaan and Radvansky 1998). In fact, according to Long et al. (1989), mental imagery can occur in at least the following sensory modalities: visual, auditory, olfactory, gustatory, tactile, and kinesthetic perception. Prior imagery research thus seems to represent only a small portion of the range of possibilities that sensory modalities offer for using imagery to improve reading comprehension. Various studies have investigated the involvement of multiple modalities in the imagery process when reading a passage, but the empirical work on instruction involving multisensory imagery strategies is very limited.

In a study by Douville (1998), the effects of instruction with a multisensory imagery strategy, referred to as the sensory activation model (SAM), on fifth graders’ comprehension of text passages were investigated. The SAM strategy incorporates the activation of five sensory modalities, that is, visual, auditory, gustatory, kinesthetic, and olfactory, by, for example, using modified cloze procedures to evoke sensory characteristics from text, called sensory language (i.e., describing how something looks, sounds, tastes, smells, feels). The SAM strategy is thereby intended to help readers to self-construct elaborated images that comprise a range of senses. Students received explicit instruction in the SAM strategy for 4 weeks, consisting of observing teachers modeling how they created multisensory mental images using the SAM strategy followed by creating and then sharing their own multisensory images with fellow students. Finally, students engaged in a reading task in which they were required to construct multisensory images independently. Results showed that, compared to control groups who received no imagery instruction or who received imagery training that incorporated only the activation of the visual modality, students who had activated multiple sensory modalities using the SAM strategy constructed significantly more mental images in discussing the text with others and more often used rich and elaborated sensory vocabulary in a writing assignment. Qualitative interviews with students further showed that activating multiple sensory modalities caused readers to enjoy reading more, as indicated by statements of students describing imagery as “going to a movie in my head.” Similarly positive findings regarding the SAM strategy were obtained with second graders, showing that the SAM strategy resulted in more elaborated descriptive responses to reading as well as in more elaborated sensory vocabulary in a writing assignment (Douville and Boone 2003).

Although this suggests that training readers to evoke sensory information necessary for imaging text is effective for creating rich mental images (Algozzine and Douville 2004), the research is very descriptive and did not explicitly assess readers’ actual comprehension and/or the quality of the mental representation that was constructed. Investigating these issues in future research should make clear whether the SAM strategy could become a useful reading comprehension strategy to help readers understand text. So, although the research on the SAM strategy does not convincingly show that a multisensory imagery strategy helps readers to better comprehend text, the studies described above do show that there are opportunities to involve multiple modalities in constructing a nonverbal mental representation from text. Of course, exploring the involvement of multiple modalities to construct nonverbal representations from text in ways that effectively translate into comprehension benefits requires further research.

Other than training readers in activating all of the senses and then asking them to use this strategy when reading, Glenberg and colleagues (for an overview, see Glenberg 2011) argue that especially involving a motor component in the visual imagery process enhances readers’ understanding of text. In Glenberg’s two-stage reading comprehension intervention program, called moved by reading, children read short stories related to a particular scenario (e.g., farm, house). While reading the stories, readers have access to actual toy figures corresponding to the objects and characters in the scenario. In the first phase, readers use a physical manipulation (PM) strategy, that is, they use the toys to simulate the situation described in the sentences. In the second phase, the toy figures are removed and readers are asked to imagine manipulating toys while reading new stories from the scenario, which is referred to as imagined manipulation (IM). Although in all of their studies Glenberg and colleagues have investigated the PM and IM strategies simultaneously, both approaches are also likely interesting and useful as individual reading comprehension strategies. Moreover, they appear to represent two distinct aspects of the internal–external dimension: an external multimodal visualization approach (i.e., stage 1—PM: visual and motor manipulation) and an internal multimodal visualization approach (i.e., stage 2—IM: imagining visual and motor manipulation) proposed in this article. We therefore opted to discuss the PM and IM strategies individually in separate sections. The PM strategy will be discussed in the “External Multimodal Visualization” section. As the results involving the IM strategy fit with the current section, they will be discussed here.

The IM strategy teaches readers to imagine manipulating toys corresponding to objects and figures in a text. So, readers learn to imagine how they would interact with the toys to act out the text content. It has been demonstrated that children engaging in IM (after physical manipulation on different texts) show better performance on text comprehension measures, that is, free recall and cued recall tasks, compared to children who read or reread the text silently (Glenberg et al. 2004, 2011a; Marley et al. 2007). In other studies, no benefits of IM were observed for first and second graders; nevertheless, these same studies did show enhanced performance with IM for third graders (Marley et al. 2010, 2011), suggesting that, in line with the developmental imagery hypothesis, the effectiveness of the IM strategy depends at least in part on students’ age.

According to Glenberg (2011), the positive effects of IM can be explained by the fact that readers can encode and store information in both visual and motor modalities rather than having to rely on visual imagery alone. In addition, involving the motor modality in the imagery process allows readers to engage in predictive processing while reading as motor areas in the brain are associated with (motor) prediction. Moreover, the IM instruction to “imagine moving the characters like you just did” is much clearer and more specific than those typically used in visual imagery studies (“make a picture in your head”). Readers can easily make sense of this instruction by indexing the instruction to sensory (i.e., visual) and motor experiences they acquired before during physical manipulation.

In sum, the first attempts to apply a multimodal imagery strategy for improving reading comprehension seem promising in helping readers to better understand text. However, empirical studies in this area are scarce at the moment, and hence, the reported findings are preliminary and restricted in several ways. For example, the IM strategy has only been compared to a condition in which readers reread a text rather than for example receive training in another reading comprehension strategy, which does not provide a strong control condition. Furthermore, the IM strategy (Glenberg 2011) thus far has mainly been applied to narrative texts involving sentences describing concrete imaginable actions such as “The farmer pushes the hay through the hole” (see Glenberg et al. 2011b for an extension to mathematics). It is therefore unclear to what extent the IM strategy also effectively improves comprehension in other situations, for example in text not involving an action. Another issue that requires more investigation is the extent to which multimodal imagery instruction like sense evoking instructions (e.g., Algozzine and Douville 2004) or IM (Glenberg 2011) allows readers to process the text at a deeper level resulting in improved understanding of what the text is about (i.e., situation model). Thus far, empirical studies have only examined relatively low-level factors of cognitive processing such as the memory for text content and/or text elements. All in all, more empirical work is required to further investigate the potential of multimodal imagery strategies.

External Unimodal Visualization

Other than solely constructing internal visual images from a text, several researchers have investigated whether building a physical visual representation of a text influences text comprehension. A relatively popular strategy in this regard is asking readers to draw a picture that represents the event(s) depicted in a story, but as yet this strategy has often not been acknowledged as such in the classroom (Ainsworth et al. 2011). In the studies described below, drawings are referred to as an external visual representation that is completely provided by the reader (i.e., reader-constructed). Drawing to support text comprehension is a productive learning activity focused on building an accurate mental model from text. In particular, drawing supports the integration of verbal and pictorial representations into a coherent situation model because it is the situation model that forms the basis from which a drawing is constructed (Van Meter and Garner 2005). However, empirical evidence regarding the effectiveness of drawing to support reading comprehension is mixed (for an overview, see Van Meter and Garner 2005), which is likely due to several methodological variations across studies such as participants’ age (from first grade to college), text content (e.g., mathematics, science), and assessment of the level of understanding (from recall to deep comprehension).

Nevertheless, several tentative conclusions can be drawn from the body of drawing research. First, providing readers with instructional supports such as drawing prompts or guiding questions improves free recall performance, drawing accuracy, and problem-solving outcomes (Van Meter 2001; Van Meter et al. 2006), whereas drawing without additional supports shows no comprehension benefits (e.g., Leutner et al. 2009; Snowman and Cunningham 1975). That the drawing construction strategy benefits from drawing supports is consistent with findings showing that readers who produce the most accurate drawings also obtain higher posttest comprehension scores (e.g., Greene 1989). Apparently, drawing supports help readers to construct more accurate external visual representations, which in turn facilitates the acquisition of an internal representation of the text (Van Meter 2001). Importantly, however, most of the studies finding benefits of (supported) drawing only compared a drawing condition to a control condition that received no specific strategy instructions on how to process the text. It has only recently been shown that asking readers to construct a drawing rather than engaging in other reading comprehension strategies (i.e., main idea selection, summarizing) enhances understanding of scientific text (Leopold and Leutner 2012). Second, comprehension gains resulting from a drawing activity are most likely to be found on tests of deeper understanding (Van Meter and Garner 2005). For example, Van Meter et al. (2006) found that engaging in a drawing activity increased problem-solving comprehension scores, but had no effect on a multiple-choice recognition test. These findings are in line with the idea that transforming verbal input into a nonverbal representation stimulates the making of inferences, which in turn promotes the construction of a situation model (Larkin and Simon 1987). 
Thus far, however, the majority of drawing studies discussed above have focused on scientific text. To what extent the findings from these studies also hold for non-scientific (i.e., narrative) texts remains to be further investigated. In sum, whether or not drawing supports reading comprehension seems to depend on the level of support provided and the measures used.

Another way to construct an external visual representation from text is to offer readers “ready to use” elements they can use to make an accurate pictorial representation of story events. This approach contains both reader-constructed and other-constructed elements in the sense that readers themselves construct an external visual representation, but they do so using (partial) illustrations that are provided to them. Lesgold et al. (1977; 1975; experiment 2), for example, provided first grade students with plastic cut-out figures that they had to organize into an accurate illustration of a narrative story that was read to them. This illustration activity was found to improve factual story knowledge, cued recall, and free recall for narrative text irrespective of text length, text difficulty, and timing of the illustration activity (i.e., during or after the story) when compared to students engaging in an illustration activity that consisted of copying or coloring geometric forms. Importantly, the beneficial effects of the cut-out figure illustration activity were restricted: it only facilitated understanding of text if students received exactly the figures needed to create an illustration and did not have to select the appropriate figures from a random set before they could start constructing the illustration. Using similar materials, Rubman and Waters (2000) investigated whether visualizing narrative story events with cut-out figures on a magnetic story board would help readers to detect textual inconsistencies in stories. Third grade and sixth grade students read stories involving inconsistencies while they simultaneously used the cut-out figures to make an illustration of the story. Results showed that compared to a read-only group, the illustration activity enhanced inconsistency detection, story recall, and integration of text propositions, especially for less skilled readers in both third and sixth grade. 
So, using cut-out figures to help readers visually depict story events in an external representation seems a promising technique to improve readers’ understanding of text.

Johnson-Glenberg (2005) extended this work in a digital sense by investigating whether a web-based interactive visual strategy training (3D-Readers) can improve reading comprehension. Sixth grade and seventh grade poor comprehenders read science-oriented texts (combination of narrative and expository text) in a within-subjects design (control condition before experimental condition). In the control condition, students read texts and unscrambled anagrams embedded within them in three sessions. In the experimental condition, students built visual models of the texts they were reading using icons (e.g., lamp, turtle) which they could place in a model space via drag ’n drop to make their external pictorial representation of the text. For example, readers first read a sentence explaining that the color of an object is really the light waves that the object is reflecting, followed by a sentence like “A banana is reflecting…?.” Readers then had to drag the correct icons out of a toolbox in the right order, that is, first a flashlight which would emit white light, then a banana, and then rays with a yellow hue that bounce off of the banana. Results showed significant improvements on pre–posttest measures of comprehension for the interactive visualization condition compared to the control condition. In addition, readers’ rereading behavior changed, with more scroll-backs happening in the experimental condition, especially for readers having the lowest reading comprehension skills. Although the sample size in this study is limited, the findings open up a promising and largely unexplored area in research on improving reading comprehension. They suggest that computer-based activities focused on encouraging readers to build an external visual representation of text can support reading comprehension, at least for poor comprehenders.

Together, these findings suggest that using an external form of visualizing text content is beneficial because readers do not have to split their cognitive resources between visualizing text elements, holding them in working memory, and integrating these elements into a coherent representation in memory (Lesgold et al. 1975), which in contrast would be required when internally visualizing story events (i.e., make a picture in your mind). Hence, creating an external visualization of text content is expected to offload a reader’s mind as, consistent with DCT and Baddeley’s working memory model, cognitive processing demands can be divided over visual and verbal working memory storage systems (Baddeley 2000; Clark and Paivio 1991). Consequently, attention can be focused on critical elements as well as meaning-based activities such as integration of text information (Rubman and Waters 2000). This is at least true for narrative texts referring to concrete situations that are easily portrayed by simple cut-out figures on a background that does not change, as evidenced by the studies discussed above. To what extent more complex scenarios, such as when an object changes color or shape, which is often encountered in more scientific explanations of dynamic systems (e.g., lightning formation), are suitable to be visualized using cut-outs remains to be investigated. At the same time, it is important to realize that using a drag ’n drop methodology in a computer-based environment rather than paper-and-pencil visualization activities may introduce additional challenges. In a recent study, Schwamborn et al. (2010) found no beneficial effects of reading scientific text and generating an external visualization on the computer screen by moving and combining pictorial elements. 
They argued that this is likely due to unfamiliarity with the computer-based environment, which places unnecessary additional demands on cognitive processing, and to the fact that drag ’n drop activities constrain the translation of verbal into pictorial representations. Hence, further research is warranted to investigate the conditions under which different visualization strategies are most effective.

External Multimodal Visualization

To date, empirical research on constructing an external nonverbal representation depicting story events using multiple senses is scarce. In this section, we continue our discussion on the aforementioned work by Glenberg and colleagues (see Glenberg 2011) on the moved by reading intervention, which provides a good initial attempt to explore the possibilities of using a multimodal approach to improve reading comprehension. That is, we discuss another part of this intervention used by Glenberg, the PM strategy, which focuses on stimulating readers to create an external nonverbal representation of text content. This intervention, in which readers themselves construct a physical representation of the text, combines visualizing and motor activities specifically related to the story content, thereby drawing on both visual and motor modalities. It is the specific emphasis on motor activities that distinguishes this intervention from externally oriented visualization approaches using only the visual modality that do not provide explicit guidance on how to involve the motor modality (see “External Unimodal Visualization” section). In their first study, Glenberg et al. (2004) asked first and second grade children to individually read short stories about activities within a particular scenario (e.g., farm, house). Children in the PM condition had a set of toys such as a tractor and animals in front of them that they used to match to events described in the narrative. On reading critical sentences, which were indicated by a picture of a green traffic light after the sentence, children acted out the action described in the sentence using the toy figures. For example, on reading the sentence “Ben drives the tractor to the pumpkins,” children picked up the farmer toy and moved it to the tractor, which in turn they moved to the pumpkins. 
Results showed improved recall and inference-making abilities for children in the PM condition compared to those in a control condition that only reread the text. This study illustrates that the physical actions performed by the readers help them to link the words in the narrative to their physical referents, allowing them to make an accurate situation model that can be mentally manipulated and reasoned about, resulting in improved understanding.

In subsequent studies, these results have been extended in several ways showing that benefits of PM also materialize when working in triads (Glenberg et al. 2007) or mixed-age (first and third grade) pairs (Marley et al. 2011), using a narrated rather than a visual text presentation format (Marley et al. 2007, 2011), and when manipulation with real objects is replaced by manipulation on a computer screen with children moving an image of a toy with a computer mouse to match narrative events (Glenberg et al. 2011b). Interestingly, an approach drawing upon external representations produced by others, which is comparable to using external aids (i.e., illustrations) to support readers in constructing an internal mental representation of the text during the mental imagery process, also appears effective. That is, observing a teacher/instructor (Glenberg et al. 2007; Marley et al. 2007) or a fellow student (Marley et al. 2011) using the PM strategy to act out story content appears as effective as actually engaging in PM oneself. This suggests that it is not necessarily the actual act of manipulating toys but the activation of the appropriate motor actions and/or action repertoires in the motor system that is the effective ingredient of the PM strategy. These findings are consistent with a growing body of literature on the involvement of the mirror neuron system in observational learning tasks. On this account, observing someone else performing an action activates the same cortical circuits in the brain (i.e., the mirror neuron system) that are involved in executing that action oneself (Rizzolatti and Craighero 2004). Enhanced understanding by observing someone else manipulate objects has also been reported in other domains such as learning hand-manipulative tasks while observing video models (Ayres et al. 2009). From an instructional perspective, the above results imply that PM can be efficiently used during whole-class or small group lessons.

Despite the promising findings, the empirical work discussed above is restricted in the sense that it only used short narrative stories involving concrete situations requiring relatively simple actions. Hence, there are many opportunities to further explore the potential of the PM strategy. For example, it is as yet unclear whether similar results will be obtained with longer texts, when reading about multiple simultaneously occurring actions that need to be acted out, or when reading expository texts that do not necessarily involve concrete or familiar actions, such as when reading about the process of lightning formation. In addition, the extent to which the PM strategy will be useful for improving comprehension of text that does not concentrate on actions remains to be investigated. Moreover, the PM strategy employs a very specific way to involve the motor modality, that is, it focuses on moving toy objects from one location to another. Several other ways to involve the motor system during reading are possible. For example, when reading about earthquakes, readers might be provided with vibrating sensory input to experience the shaking situation described in the text and to connect this experience to their mental representation of the story.

Another issue that is not addressed in the studies by Glenberg and colleagues is the role that modalities other than the visual and motor modalities may play in trying to better understand text. In the past decade, however, evidence has been accumulating that modality-specific perceptual information (e.g., sound) is activated in linguistic tasks. The majority of studies making this claim have been conducted in the context of embodied theories of cognition, which assume that cognitive processes are grounded in the same neural systems that govern perception, sensations, and action (Barsalou 1999). Nevertheless, except for the Glenberg studies discussed above, hardly any research has tried to draw upon these insights for improving reading comprehension by simultaneously using multiple sensory modalities. Research on drama-based visualization methods, in which readers themselves construct an external visuospatial representation of the text, provides the only examples of how various sensory modalities might be recruited to enhance reading comprehension. Rose et al. (2000), for example, used a 10-week visualization training in which elementary school children read short narrative stories (stage 1) and visualized what they read in their heads and made illustrations of the scenes (stage 2). In the third and fourth stages of the training, children dramatized, that is, acted out and “replayed,” the scene via the sensations that story characters might experience (e.g., the heat of the sun on one’s skin after standing outside on a sunny day for a long time and the character’s motivations, emotions, and feelings) as if they were the story character themselves. Results showed higher factual (but not inferential) knowledge scores for the drama group compared to a control group receiving the standard reading curriculum (i.e., read-and-drill exercises).
Obviously, using the multisensory drama method offers readers the opportunity to explore the different senses involved in the story they read. The difficulty, however, lies in having teachers and students accurately dramatize what they read, as some teachers and students may not feel comfortable acting out a scene or may not know exactly how to do so. A more student-friendly, focused, and structured visualization instruction drawing upon multiple sensory modalities is hence preferable for most schools. Future research should therefore concentrate on exploring such possibilities.

In sum, only a few studies have investigated ways to depict story content in an external representation making use of multiple senses. Thus far, research has almost exclusively concentrated on the role of the motor and visual modalities for trying to improve reading comprehension. As the situation model readers construct during reading ideally involves information pertaining to all of the senses, an important task for future research is to find ways to encourage readers to incorporate other senses such as sounds, smells, and feelings into the external visualization process.

Discussion

In this article, we have presented four types of visualization strategies for improving reading comprehension, which vary in the extent to which they focus on the construction of an internal or external visual representation and/or their emphasis on the multimodal character of visualizing. The research reported above suggests that visualizing the situation and events described in a text can be an effective way to help readers better understand text. Especially when readers receive visualization instructions focusing on the construction process, that is, how to construct a (mental) image during reading, the chance is increased that an accurate mental representation of the events described in a text is constructed. Moreover, it seems important to guide readers during this process, for example by using prompts, to arrive at an accurate nonverbal representation of the text.

These conclusions are supported by relatively many studies, although the evidence is primarily based on the research on unimodal visualization instructions such as mentally representing story content using imagery training or creating a drawing from the situation described in the text with supporting prompts. In fact, this is not surprising, as the majority of research thus far has been conducted on these types of visualization. Although on a theoretical level both contemporary theories of embodied cognition (Barsalou 2008) and DCT (Sadoski and Paivio 2007) stress the importance of sensory and motor experiences for constructing mental representations, relatively little empirical work has been conducted on whether and how targeting multiple sensory modalities during reading effectively helps readers to visualize text content. The first empirical results in this emerging area of research are mixed. In particular, the studies focusing on a multisensory imagery strategy (SAM; Algozzine and Douville 2004) and an imagining manipulation strategy (IM strategy; Glenberg et al. 2004) thus far have provided no or only very weak support for the idea that encouraging readers to go beyond simply constructing a visual (mental) image of the text they are reading improves their understanding of text. Rather than requiring readers to evoke sensory information acquired during previous experiences, as was the central aspect in these approaches, endowing readers with multiple sensory and motor experiences during the reading process, such as when actually “acting out” text content (PM strategy; Glenberg et al. 2004), seems a more promising strategy to improve reading comprehension, but the evidence for this is thus far based on only a few studies. Further investigating how the visualization process involving multiple modalities could be optimally supported is therefore an important issue for future research. 
For example, it would be interesting to investigate whether more explicit and focused training procedures and/or additional support strategies on how to involve all of the senses in the reading process, as have proven successful in unimodal imagery studies, could improve outcomes of multisensory imagery approaches.

Together, these findings suggest that, at present, there is not (yet) very strong evidence that encouraging readers to go beyond simply constructing a visual (mental) image of the text they are reading, by requiring them to use or activate sensory and motor modalities, helps them to construct a rich (mental) representation of the events described in a text. Future research should therefore more extensively investigate the advances that can be made in the multisensory approach. Several issues that need to be considered in this endeavor are described below.

Another important conclusion is that visualizing text content can effectively improve text comprehension irrespective of whether the visualization process involves the construction of internal or external representations. It is, however, important to keep in mind that the ultimate goal of instructing readers (how) to visualize text events is to support them toward the automatic construction of mental (i.e., internal) images of the situation referred to in the text. Although thus far internal and external visualization strategies have been primarily used and tested separately from each other, both strategies could fulfill complementary roles in the visualization process. That is, by first requiring readers to depict text content in an external multimodal visualization and subsequently asking them to visualize the text content through mental imagery, readers are gradually introduced to the construction of a mental representation of the text. In fact, several studies described above have already started to explore such an approach, but results have been mixed so far and have only involved visual and motor modalities (e.g., Glenberg et al. 2004; Marley et al. 2007). Future research on factors such as which intermediate steps are crucial in going from an external to an internal visualization is needed to further investigate the instructional potential of this approach.

Directions for Future Research

On the basis of our discussion of the different visualization strategies to improve reading comprehension, several issues can be identified that require further investigation and/or provide promising new directions for future studies. Many of the discussed studies have asked readers to construct, or provided them with, a nonverbal representation of narrative text. It is, however, unclear to what extent creating and/or using nonverbal representations contributes to a better understanding of other types of texts, such as informational texts, which typically have a higher level of abstraction. Investigating this issue is interesting as research on processing more abstract information in various domains like problem solving and solving mathematics problems indicates that learning outcomes improve by involving multiple senses, although the majority of these demonstrations involve the motor modality (for an overview, see De Koning and Tabbers 2011). This strong emphasis on the motor modality is also apparent in the studies discussed above and might be due to the fact that, of all senses, motor information is the easiest to manipulate experimentally in the comprehension process, making it the most frequently tested educational application of theories of embodied cognition. Consequently, no definite conclusions can be drawn on which modality or modalities should be involved in the visualization process to optimally support reading comprehension.

Evidently, investigating to what extent and how other senses could be involved when constructing a visual representation of text content, and whether or not this leads to better comprehension of text, is an issue that needs to be studied in the coming years. For example, it would be interesting to ask readers to imagine the smell of the described situation or to provide readers with a certain scent to let them experience the described situation more vividly, in order to help them to construct nonverbal representations from text and hence improve their comprehension of text. In this respect, the theoretical underpinnings of this research also require close examination. At present, the work by Glenberg and colleagues on imagined and physical manipulation could be explained in terms of the activation of neurons responsible for both experiencing and representing direct perception and action (i.e., mirror neurons), which helps readers to construct a nonverbal representation based upon (re)activation of motor system information. However, such a mirror neuron explanation cannot account for any beneficial effects that might arise due to the involvement of other senses like smell. Trying to gain more insight into these issues is an important area for future research.

The research on visualization strategies to improve reading comprehension could also be improved from a methodological point of view. Studies differ considerably in the level of understanding that is measured, varying from assessing more shallow processing like literal recall of text information to deeper, meaning-based measures like inference making. In fact, the strong reliance on shallow processing measures observed in the studies above does not match theoretical assumptions made by theories of reading comprehension such as situation model theory. According to situation model theory, visualization strategies help readers to construct a deeper understanding of text by creating a non-verbal representation of text content. On this account, evaluating the effectiveness of visualization strategies is only meaningful when measures of deeper understanding are also used. Related to this point, it would be interesting to more directly investigate how the comprehension process occurs, that is, how readers come to their understanding. Although attention is paid to the visualization process itself during visualization training, studies typically tend to focus primarily on the outcomes of the training and/or comprehension process. They do not examine the intermediate trajectory from training to final assessment, that is, the cognitive activities and steps readers engage in to construct meaning out of the text they read. Yet, this could provide valuable information on which aspects of visualization training are effective and which aspects need refinement. Therefore, including measures of deeper comprehension and using process-oriented measures in future studies is necessary to provide more insight into the effectiveness of visualization strategies to improve reading comprehension.

Furthermore, the discussed studies have investigated the effectiveness of unimodal or multimodal visualization strategies, either trained or not, on reading comprehension compared to control groups, which usually engaged in unrelated tasks such as solving anagrams, reread the text, or did nothing at all. In only a few studies (e.g., Leopold and Leutner 2012), control groups were used in which readers received another strategy training, but this typically involved a verbal strategy like summarizing rather than another visualization strategy to which the effects of the visualization strategy could be compared. Moreover, according to DCT, representing text verbally may activate nonverbal representations and vice versa (Sadoski and Paivio 2004), suggesting that verbal strategies like verbal rehearsal may also help to bring about mental visualization effects. Therefore, verbal reading strategies may be a suboptimal control condition in (mental) visualization studies. A better alternative would be to use a visualization strategy as a control condition in visualization studies. However, there are as yet no studies adopting such an approach. For example, none of the studies compared the effectiveness of two different visualization strategies within the internal (e.g., mental imagery vs. drawing) and external (e.g., evoking vs. creating multiple sensory and motor experiences) representation dimension. Also, no direct comparisons were made between unimodal and multimodal visualization approaches (e.g., drawing vs. evoking multiple sensory and motor experiences). Given these suboptimal control conditions, the beneficial effects observed for visualization strategies could very well overestimate the actual effectiveness of visualization strategies. It is therefore not possible to draw clear conclusions on the effectiveness of visualization strategies in general, let alone the relative effectiveness of one visualization strategy over the other.
Likewise, it is as yet unclear whether it is more useful to use multimodal visualization strategies, unimodal visualization strategies, or a combination of them for improving reading comprehension. Evidently, taking up these challenges in future studies could improve our understanding of the most effective (combination of) visualization strategies.

The studies discussed above and the conclusions drawn from them are also restricted in the sense that they concern the construction of non-verbal representations in readers who have adequate decoding skills. Not much is known about the role of visualization strategies in supporting the reading comprehension process in readers who have poor decoding skills. One reason may be that, in line with DCT and more classic information processing models like the shared capacity theory (LaBerge and Samuels 1974) or the “bottleneck hypothesis” (Perfetti 1985), poor decoders need most of their attentional and working memory capacities to decipher the words in a text, leaving few cognitive resources for engaging in meaning-based activities to understand the text at a deeper level. Therefore, an important line for future research is to investigate the extent to which teaching visualization strategies to poor decoders is an appropriate way to improve their understanding of text. Moreover, enhanced comprehension processes arising from applying visualization strategies may even compensate for their lack of decoding skills. For example, top-down processes such as predictive reading or context-use strategies have been found to speed up word recognition (Kintsch 1998; Stanovich 1980). These issues need to be addressed in future studies in order to gain more insight into the interplay between visualization strategies, reading comprehension, and decoding skill.

Another issue that requires attention in future studies is related to the trainability of visualization strategies. An important distinction in theories of embodied cognition is that between non-conscious automatic simulations and conscious deliberate simulations implicated in language understanding (Barsalou 2008). The latter result from effortful cognitive activities in working memory to construct a non-verbal representation of text content. It is this type of visualization that other theories like DCT are concerned with and often refer to as mental imagery. Evidently, within this tradition, intervention approaches aimed at facilitating the construction of a non-verbal representation from text have logically concentrated on training readers in different visualization approaches to deliberately depict text in a non-verbal way. In contrast, the mental simulation process is assumed to occur at least partially unconsciously and automatically (Barsalou 2008; Postma and Barsalou 2009). In this sense, it could be argued that training readers in visualizing text content would not help them in creating non-verbal representations from text and should therefore not be pursued. Recently, however, empirical evidence has become available showing that large individual differences exist in the mental simulation of described situations and that these differences vary as a function of sex and spatial ability level (Wassenburg et al. 2013), suggesting that “automatic” mental simulations are more flexible than initially thought. An interesting question arising from this finding is whether it is possible to help readers via training to facilitate and/or improve upon the mental simulation process. A study investigating this issue is currently being conducted in our lab.
At the very least, helping readers for whom the mental simulation process occurs suboptimally to consciously apply visualization strategies could support these readers in developing accurate non-verbal representations of the described situation and hence in better understanding the text that is read.

In conclusion, this article has provided insight into the different types of visualization strategies available for improving reading comprehension that are firmly grounded in cognitive theories of reading comprehension such as situation model theory and embodied theories of cognition. These contemporary theories have been shown to offer useful theoretical clues to develop and use visualization strategies for improving reading comprehension. A critical analysis of the studies empirically testing these suggestions showed that both internal and external visualization strategies as well as visualization strategies drawing upon a single modality or multiple modalities are effective at improving reading comprehension and/or provide promising directions for future research. Drawing upon these insights and translating them to effective comprehension instruction approaches in future studies could certainly advance readers’ comprehension of text.