Visual displays play a part in many of our everyday experiences. The data presented in articles and manuscripts, the websites we peruse for entertainment, and the software tools we utilize in our occupations and educational activities are obvious examples. We encounter visual displays on our portable media devices, in printed materials, as presented by teachers and in instructional guides, from Internet sources, and from colleagues and co-workers in their presentations. The mapping tools we use for directions, our weather updates, our favorite news webpages, and statistical output from analytic software packages include visual displays we consult on a day-to-day basis. When these visual displays are well designed, we interpret them easily, interact with their contents efficiently, and enjoy their use. When they are poorly designed, we struggle to interpret them, complain about their legibility, and may even ignore them.

Thankfully, there is an extensive body of research that suggests how and when visual displays effectively promote learning. This work is now more relevant than ever; it is incredibly cheap and easy to produce high-quality multimedia messages that can be disseminated quickly to audiences worldwide. Empirical work in educational psychology, cognitive psychology, and the learning sciences has attempted to identify how and when visual displays can be effectively understood and utilized by viewers. The purpose of this article is to provide a detailed overview of the ways in which visual displays can affect cognitive processing. The article consists of four main sections. The first section provides a general overview of visual displays. The second section describes three basic forms of processing and how visual displays can be used to leverage these forms of processing to afford different kinds of inferences. We also discuss how displays can affect processing efficiency. The third section identifies areas for future research. The last section provides a brief set of implications about the design and use of visual displays with a specific focus on the ways in which the design of visual displays can enhance cognitive processing.

Types of Visual Displays: Semantic and Pictorial

An effective visual display as designed for educational purposes has two main functions: (1) to communicate important information and (2) to communicate relations about information via spatial arrangement (Larkin and Simon 1987; Robinson 1998). Based on this definition, and the kinds of displays that people commonly encounter, we focus here on two main types of visual displays: semantic and pictorial. The main way that semantic and pictorial visual displays differ is that they utilize distinctive conventions to communicate information. Semantic displays use symbols, whereas pictorial displays use images (Carney and Levin 2002). Figure 1 is a hierarchy-matrix that shows main types and sub-types of these visual displays: semantic (sequence, hierarchy, matrix) and pictorial (static and dynamic). These two main types of displays are compatible with frameworks provided in previous work, including taxonomies described by Robinson (1998) and Kiewra (2012). Semantic visual displays are also known as graphic organizers, which are a type of adjunct display (i.e., a display that is inserted into a text to communicate important information and its structure; Robinson 1998). We opt to use the term semantic visual display as we believe it is more intuitive and can be readily distinguished from pictorial visual displays. Other researchers have parsed and labeled visual displays in other ways, but our attention on these two types allows us to both build on previous work and to focus on how these types of displays can influence cognitive processing.

Fig. 1 Hierarchy-matrix of types of visual displays

Semantic Visual Displays

A semantic visual display communicates important information through symbols, most frequently text (i.e., words; Schnotz 2002). These symbols have meaning based on arbitrary social rules or conventions; that is, the symbols are not similar to the idea that they represent. For instance, a small furry mammal that has a tail and barks is symbolically represented as “dog” in English or “perro” in Spanish, based on social convention. The words themselves share no explicit similarity with the object they represent. Further, important information is arranged in a way that conveys relations between the symbols. For instance, a sequence display is a type of semantic display that uses space to communicate temporal relations (i.e., chronological ordering of steps, events, stages, or phases) among symbols. Consider that the steps in a sequence display that are closer in space are more proximal in time (more temporally contiguous), whereas steps that are farther away in space are more distal in time (less temporally contiguous).

There are several types of semantic visual displays including sequences, hierarchies/tree diagrams, and matrices (Kiewra 2012; Novick and Hurley 2001; Robinson 1998; Vekiri 2002; Winn 1991). A sequence uses arrows to show steps, events, stages, or phases in a temporally ordered process (see Fig. 2), such as the locations throughout the body to which blood flows. A hierarchy uses branches to display concepts on the basis of super- and subordinate conceptual levels (see Fig. 3). A matrix (see Table 1) is a two-dimensional table that is organized vertically by topic (e.g., artery or vein) and horizontally by category (e.g., thickness, elasticity, and function). Concept maps, a potential fourth type, display a concept’s critical elements with circles or boxes, which are linked by lines and labels that are used to establish associations (Nesbit and Adesope 2006; O’Donnell et al. 2002). Concept maps are not included in Fig. 1 because relations between and among ideas are established primarily by the propositional nature of the links between nodes rather than by their spatial arrangement, per se. Graphs are a potential fifth type (Shah and Hoeffner 2002), which we consider to be second-order semantic visual displays as they necessitate a higher level of abstraction to interpret their use of symbols (e.g., axes, lines, points) as representing other symbols (e.g., numeric values) that communicate important information (e.g., aggregate data).

Fig. 2 Sequence

Fig. 3 Hierarchy

Table 1 Matrix

Pictorial Visual Displays

A pictorial visual display communicates important information through images, often simplified into icons and illustrations. These icons have meaning based on their similarity to the ideas or concepts represented; specifically, they share common properties and associations with what they are intended to depict (e.g., an illustration of an engine or the water cycle; Schnotz 2002). Pictorial displays can vary with respect to physical and conceptual fidelity (Hollan et al. 1984). For instance, Fig. 4 shows two pictorial displays of the human heart (Butcher 2006). Figure 4a has physical fidelity; it depicts, in detail, the actual physical characteristics of the heart. In contrast, Fig. 4b has conceptual fidelity; it depicts a model of the system that mirrors but does not replicate the heart’s physical characteristics. Importantly for this discussion, a pictorial display can be static or dynamic. For instance, a pictorial display of the heart and blood flow can be depicted as a still frame with fixed arrows or can be depicted in motion by showing the flow of blood through various chambers following each heartbeat.

Fig. 4 Pictorial visual displays with physical and conceptual fidelity. From Butcher (2006)

Of course, a single display could have both semantic and pictorial visual features. The crucial idea is that different types of visual displays, whether they are semantic, pictorial, or some combination of each, constrain cognition in specific ways; that is, different displays afford or promote different kinds of inferences. For instance, a causal diagram can afford temporal inferences, whereas a matrix can afford relational inferences. Similarly, a static display can afford inferences about an object’s shape and its position in relation to other objects, whereas a dynamic display can afford inferences about transformations with respect to an object’s shape and its interrelations with other objects.

How Visual Displays Affect Cognitive Processing

A variety of models have attempted to describe how individuals interact with and learn from visual displays, including those described in Marr (1982), Kosslyn (1985), Clark and Paivio (1991), Schnotz (2002), Lowe and Boucheix (2008), and Mayer (2009). In this section, we describe four main ways, derived from the above cited work, in which visual displays affect cognitive processing: selection, organization, integration, and processing efficiency. For the first three, we draw upon Mayer’s select-organize-integrate (SOI) model (see Fig. 5; Mayer 2009). We (a) define each cognitive process, (b) explain how it affects inference, (c) indicate why it is important, (d) recommend how to promote it from an instructional design standpoint, and (e) provide evidence to support the recommendations. As a fourth consideration, we discuss the role of visual displays in processing efficiency.

Fig. 5 The select-organize-integrate model

Selection

Selection refers to focusing or directing attention to information in an instructional message. Selection affects inference by constraining what information is available to and active from memory. When a person selectively attends to a display, the contents and spatial arrangement are subsequently processed in memory to the exclusion of other information. Selecting important information is key because attention must be directed toward this information in order for it to be processed. Simply put, if attention is not allocated toward important information, it will not be consciously processed. Similarly, if attention is allocated toward interesting but unimportant information, those contents can disrupt the coherence of the main instructional message (e.g., Park et al. 2015; Sanchez and Wiley 2006).

From a design standpoint, signaling can help users to select important information (e.g., Mayer 2009). Signaling involves the use of cues to increase the salience of important information. Different ways to signal include the use of color, arrows, bold print, underlining, movement, luminance contrast (e.g., spotlight), and text segments (e.g., labels) (de Koning et al. 2009; Gutierrez et al. 2015; Lane 2015; Tversky et al. 2000).

Signaling makes important information, as compared to less important information, more perceptually salient, which facilitates selection. In an illustrative study of such effects (Ozcelik et al. 2010), participants viewed a static image of a turbofan jet engine with its parts labeled while they listened to a narration that explained how the engine functioned. Participants who viewed the instructional message that included signals (i.e., important terms were temporarily presented in red print at the point at which they were mentioned in the narration) directed more attention (as measured via eye-tracking) toward relevant information and located necessary information more efficiently and effectively. Similar effects have been obtained with participants who viewed cued or non-cued animations of the subsystems of the human circulatory system (de Koning et al. 2010). Cueing in this study involved the use of visual contrast to highlight aspects of the display (i.e., shading all elements in the animation except a relevant subsystem). Participants who received such cues looked more often and for longer amounts of time at the cued than non-cued content, as measured with eye-tracking methods. These studies, and others like them, demonstrate that signaling promotes the selection of important information (de Koning et al. 2009; Hinze et al. 2013).

According to the signaling principle, learning is facilitated when important information in an instructional message is highlighted because directing user attention toward important information supports subsequent processing of that information (Mayer 2013). This principle provides a rationale for using cues to increase the salience of important information to promote selection. However, while research has shown that signaling facilitates the selection of important information, signaling does not on its own necessarily promote learning. This important consideration was demonstrated in three experiments by Kriz and Hegarty (2007) that required participants to view static and dynamic diagrams that potentially included arrows signaling parts of a mechanical system (i.e., a flushing cistern). Although participants who received signals allocated more attention to display regions that were highlighted by the arrows than did participants who did not receive the signals (as measured with eye-tracking), participants in both conditions showed similar performance on comprehension measures. The fact that signaling failed to facilitate learning in this set of experiments underscores the additional, necessary value of two subsequent processes: organizing and integrating selected information. Although cueing may guide attention toward important information, merely attending to cued information does not necessarily lead to greater gains in learning than when cues are absent; what is crucially important is how individuals process the information once it has been selected. Thus, signaling is an early process that guides attention toward important information and affects subsequent comprehension and learning (Fiorella and Mayer 2015).

Organization

Organization involves inferring relations between and among pieces of information in an instructional message. Organization is important because encoding and storing facts and/or concepts on the basis of how they are associated with each other, and with prior knowledge, facilitates retrieval from long-term memory (Dunlosky et al. 2013). Visual displays can afford different types of organizational inferences. Next, we describe the types of organizational inferences that viewers may be encouraged to construct with visual displays.

The first is temporal inference, which refers to coding the chronological ordering of steps, events, stages, or phases, such as the transformation of a caterpillar into a butterfly. With respect to semantic displays, sequences in particular afford temporal inferences because important information is spatially organized in a way that conveys the temporal relations among steps. For instance, sequence diagrams can help students understand causal relations about science topics. In two experiments by McCrudden et al. (2009), participants read a text about how airplanes achieve lift (experiment 1) or about how astronauts develop kidney stones during space travel (experiment 2) and then either studied a sequence diagram or reread the text. In both experiments, participants who studied the sequence diagram had higher scores on measures of recall and transfer than did participants who merely reread the text. With respect to pictorial displays, static and dynamic images both afford temporal inferences. Changes to important elements of the image can be salient because they are spatially and temporally contiguous, which facilitates detection of these changes (Höffler and Leutner 2007; Lowe 2003; Plass et al. 2009). For example, static or dynamic images can be used to support temporal inferences about the steps involved in lightning formation by showing air movement over time and how those changes are related to the air’s proximity to the ground and clouds.

The second is hierarchical inference, which refers to coding the structural relations (i.e., superordinate and subordinate) between concepts, such as superordinate categories (e.g., furniture, vehicle), basic categories (e.g., chair, car), and subordinate categories (e.g., kitchen chair, sports car). With respect to semantic displays, hierarchies, tree diagrams, and even outlines afford hierarchical inferences because concepts are organized spatially according to their level within an ordered structure (Kiewra 2012). Research on concept learning and classification suggests that conceptual knowledge is organized hierarchically in memory in the form of superordinate, basic, and subordinate categories (e.g., Rosch et al. 1976). For example, consider the biological taxonomy for the superordinate category of fish. The basic categories of fish and their subordinate categories could include bass (sea bass, striped bass), trout (rainbow trout, steelhead trout), and salmon (blueback salmon, chinook salmon). Displaying the taxonomic structure of concepts can therefore facilitate hierarchical inferences (Jonassen et al. 1993). These inferences can help individuals think about and distinguish among the structural levels of the concepts. With respect to pictorial displays, it is unclear whether purely pictorial displays explicitly afford analogous hierarchical inferences, although it is reasonable to predict that adding images to a skeletal hierarchy display could facilitate them (as is sometimes seen with cladograms).
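As a rough illustration of the three structural levels, the fish taxonomy above can be written as a small nested data structure. This is only a sketch: the names TAXONOMY, LEVELS, path_to, and level_of are hypothetical and introduced here for illustration, and the code models the structure of a hierarchy display, not how viewers actually process one.

```python
# Hypothetical encoding of the fish taxonomy described in the text:
# superordinate category -> basic categories -> subordinate categories.
TAXONOMY = {
    "fish": {
        "bass": ["sea bass", "striped bass"],
        "trout": ["rainbow trout", "steelhead trout"],
        "salmon": ["blueback salmon", "chinook salmon"],
    }
}

# Level names follow the superordinate/basic/subordinate distinction.
LEVELS = ("superordinate", "basic", "subordinate")


def path_to(concept, tree=TAXONOMY):
    """Return the chain of categories from the top level down to `concept`,
    or None if the concept is not in the hierarchy."""
    for top, basics in tree.items():
        if concept == top:
            return [top]
        for basic, subordinates in basics.items():
            if concept == basic:
                return [top, basic]
            if concept in subordinates:
                return [top, basic, concept]
    return None


def level_of(concept):
    """Name the structural level of `concept` based on its depth."""
    path = path_to(concept)
    return LEVELS[len(path) - 1] if path else None


print(path_to("rainbow trout"))
print(level_of("trout"))
```

The depth of a concept in the structure plays the role that vertical position plays in a hierarchy display: both make a concept's level, and what it falls under, directly readable.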

The third is relational inference, which refers to coding comparisons between facts/concepts, such as inferring that arteries carry blood away from the heart, whereas veins carry blood to the heart (see Table 1). With respect to semantic displays, matrices in particular afford relational inferences, in part because the ideas to be compared can be presented in close proximity (Kauffman and Kiewra 2010). For instance, in Kiewra et al. (1999), participants read a text in isolation, or with outlines or matrices, and were assessed on relational learning of the presented contents (e.g., What is the relation between depth and diet? As fish swim deeper, the size of their diet increases). The matrix and outline displays led to greater relational learning than did presenting the text in isolation, and the matrices led to greater relational learning than did the outlines. Thus, matrices support relational inferences among important ideas.

Pictorial displays are particularly well-suited to afford relational inferences given they can depict the physical and conceptual fidelity of different objects and the visual-spatial arrangement of those objects in relation to each other (e.g., Butcher 2006). Given the diverse nature of pictorial displays, researchers have identified four kinds of coding involved in the comprehension of pictorial displays based on a viewer coding an object’s shape (intrinsic), an object’s relation to other objects (extrinsic), and whether the display is static or dynamic (Newcombe and Shipley 2012). These codings represent ways in which pictorial displays affect cognitive processing. Intrinsic-static coding involves inferring an object’s shape through a static image, such as an image of the human heart. Similarly, intrinsic-dynamic coding involves inferring an object’s shape and changes to the object through a dynamic image, such as an animation of the heart beating. Extrinsic-static coding involves inferring an object’s position in relation to other objects through static images, such as a map that shows a city’s location in relation to other cities (i.e., man-made features) and naturally created geographical features (e.g., lake, mountains). Similarly, extrinsic-dynamic coding involves inferring changes to an object’s position in relation to other objects through dynamic images, such as viewing a cheetah pursue a herd of antelope and how their spatial relations change over time during the pursuit. Intrinsic and extrinsic coding can happen at or near the same time; for instance, it is possible to perceive an object, and its relations to other objects, quite quickly during everyday processing. 
Thus, intrinsic-static and intrinsic-dynamic displays afford relational inferences within objects (e.g., structural and functional relations among the heart’s atria and ventricles), whereas extrinsic-static and extrinsic-dynamic displays afford relational inferences between objects (e.g., structural and functional relations between the heart and lungs).
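The four kinds of coding can be read as a two-by-two classification: intrinsic versus extrinsic, crossed with static versus dynamic. The sketch below makes that crossing explicit. The class and function names are hypothetical and chosen only for illustration; the example displays are the ones mentioned in the text.

```python
from enum import Enum


class Scope(Enum):
    INTRINSIC = "intrinsic"  # an object's own shape and parts
    EXTRINSIC = "extrinsic"  # an object's relation to other objects


class Motion(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"


def coding_label(scope, motion):
    """Combine the two dimensions into one of the four coding names."""
    return f"{scope.value}-{motion.value}"


def relational_inference(scope):
    """Intrinsic codings afford inferences within objects;
    extrinsic codings afford inferences between objects."""
    return "within objects" if scope is Scope.INTRINSIC else "between objects"


# Example displays taken from the text, keyed by their two dimensions.
EXAMPLES = {
    (Scope.INTRINSIC, Motion.STATIC): "image of the human heart",
    (Scope.INTRINSIC, Motion.DYNAMIC): "animation of the heart beating",
    (Scope.EXTRINSIC, Motion.STATIC): "map of a city relative to other cities",
    (Scope.EXTRINSIC, Motion.DYNAMIC): "cheetah pursuing a herd of antelope",
}

for (scope, motion), display in EXAMPLES.items():
    print(coding_label(scope, motion), "|", relational_inference(scope), "|", display)
```

Note that the kind of relational inference depends only on the intrinsic/extrinsic dimension; the static/dynamic dimension determines whether change over time is part of what is coded.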

It is important to note that it is possible for an individual to generate more than one type of inference from the same display. For example, a dynamic visual display of the heart pumping may show (a) changes to the heart, arteries, lungs, and veins over time, which would involve intrinsic-dynamic coding and temporal inferences, and may show (b) the relations of the heart to other organs such as how heartbeat affects artery size, which would involve extrinsic-dynamic coding and relational inferences. Similarly, a hybrid semantic display could afford multiple types of inferences. For instance, a hierarchy-matrix could afford hierarchical and relational inferences. We have identified these different types of inferences to specify some of the ways that displays can influence processing; however, we do not mean to imply that such inferences always occur nor that such inferences only occur individually and/or in the presence of particular displays. In fact, building combinations of such inferences may be crucial to supporting comprehension (Graesser et al. 1994; Rapp and Taylor 2004).

As described above, visual displays can afford different types of organizational inferences. What can be done to increase the likelihood that people will draw such inferences? From a design standpoint, extracting and localizing important information can support organizational processes (Jairam and Kiewra 2010; Kauffman and Kiewra 2010; Kiewra et al. 1999; Larkin and Simon 1987; Pastor and Finney 2013; Robinson and Kiewra 1995; Robinson and Schraw 1994). Extraction involves setting apart more important information from less important information, and localization involves placing related information in close proximity. Localizing information that has been extracted facilitates organization because more important information is physically separated from less important information, and more important information is spatially integrated. A study by Kauffman and Kiewra (2010) illustrated the combined influence of extraction and localization on learning. In their first experiment, participants listened to a lecture about wildcats and then were randomly assigned to one of four conditions on the basis of the type of materials they received. The first group received a standard text (1998 words), the second group received an extracted version of the text (351 words) that consisted only of the important words in the same location on the page as in the standard text but with the remaining words omitted (to investigate extraction in isolation), the third group received an outline of the important information (367 words), and the fourth group received a matrix of the important words (to investigate extraction and localization in combination). Participants who studied the matrix demonstrated the best performance on three tests of relations and facts. Further, participants who studied the outline outperformed participants who studied the extracted text on all three tests and outperformed participants who studied the standard text on two of the tests. 
These findings indicate that the localization of extracted information facilitates learning. Thus, once information is extracted, it must be localized in order to support organizational inferences.
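The difference between extraction and localization can be sketched as two separate operations on a text. The example below is a minimal illustration, not the procedure used in the cited studies: the annotated units, the importance flags, and the function names are all assumptions, and the facts are borrowed from the artery and vein example used elsewhere in this article.

```python
# Hypothetical annotated text units: (topic, category, value, is_important).
# The importance flags are assumed for illustration only.
TEXT_UNITS = [
    ("artery", "function", "carries blood away from the heart", True),
    (None, None, "an interesting but unimportant aside", False),
    ("vein", "thickness", "thinner", True),
    ("artery", "elasticity", "more elastic", True),
    (None, None, "another unimportant detail", False),
    ("vein", "function", "carries blood to the heart", True),
    ("artery", "thickness", "thicker", True),
    ("vein", "elasticity", "less elastic", True),
]


def extract(units):
    """Extraction: keep only the important units, in their original order.
    Related facts may still be far apart after this step."""
    return [(t, c, v) for t, c, v, important in units if important]


def localize(facts):
    """Localization: group extracted facts by category so that the values
    to be compared across topics sit next to each other."""
    matrix = {}
    for topic, category, value in facts:
        matrix.setdefault(category, {})[topic] = value
    return matrix


def compare(matrix, category):
    """Read one category across all topics in a single lookup."""
    return matrix[category]


facts = extract(TEXT_UNITS)
matrix = localize(facts)
print(compare(matrix, "function"))
```

After extraction alone, the two facts about function are still separated by other facts and must be found by scanning the list. After localization, they are adjacent, which is the property that the matrix condition adds over the extracted-text condition.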

According to the spatial contiguity principle, learning is facilitated when important pieces of information are presented in close proximity (Ginns 2006; Mayer 2013). Presenting important pieces of information in a spatially integrated format is more effective than when they are spatially separated (split-attention format) as it helps individuals more readily see relations among facts/concepts. For example, Moreno and Mayer (1999) found that when an instructional message included on-screen text and an animated visual display, people learned better when text segments were placed near the action being described than when text segments were placed at the bottom of the screen. Presenting important information in an integrated format means that fewer cognitive resources need to be dedicated to visually search for information (Ayres and Sweller 2005; Robinson 1998; Sweller et al. 1998).

Integration

As a complement to organization, integration involves inferring relations between important information contained in an instructional message and prior knowledge. For example, an individual may infer that arteries, which carry blood away from the heart, are thicker and more elastic than veins because they have to adjust to more dramatic changes in blood pressure. In contrast to organization, which can be driven by features of a display, integration necessitates making connections to what people already know. Integration involves the simultaneous activation of prior knowledge and information from an instructional message such that the two become associated in memory.

Integration can be passive or active. It is passive when retrieval of the information is automatic and is not guided or directed by the learner (O’Brien et al. 2010; O’Brien et al. 1998). A visual display can facilitate passive activation when a display’s spatial layout serves as a cue that affords the retrieval of information from memory. As a result, information from the display and prior knowledge can become associated because they are simultaneously available in working memory.

Integration is active when the learner consciously seeks to identify meaningful relations between activated prior knowledge and information contained in an instructional message (Ainsworth and Loizou 2003; Chi and Wylie 2014; McElhaney et al. 2015; Van Meter et al. 2015; Wylie and Chi 2014). One way to measure integration is through the use of a think-aloud methodology during reading to assess deliberate attempts to use information from a display to modify prior knowledge (Ericsson and Simon 1993; McCrudden et al. 2011). For instance, in Ainsworth and Loizou (2003), participants either read a text or viewed a diagram about the human circulatory system and were prompted to self-explain as they thought aloud. The results showed that participants who received the diagrams generated more self-explanations and had higher scores on post-tests than did participants who received the text. Thus, individuals may use prior knowledge to interpret or elaborate incoming information, or incoming information may lead learners to modify or in some way change their existing knowledge structures.

Integrating information is key because establishing connections between to-be-learned information and prior knowledge facilitates memory for and use of that information (Bransford et al. 1999). From a design standpoint, separating important information from less important information (extraction), placing related information in close proximity (localization), and engaging in constructivist activities with the material (i.e., activities designed to foster understandings through problem solving) can be used to facilitate the integration of important information. With respect to constructivist activities, visual displays can be used to encourage learners to reflect upon and test their ideas. For example, a visual display designed to explain how thunderstorms form might allow a user to select particular variables, adding and removing elements that are crucial in the process. By doing this, learners are able to actively and explicitly test hypotheses, as informed by their existing knowledge and as reiterated or supplemented by the use of the visual display. Rather than merely displaying information for learners to encode, these constructivist activities challenge students to build predictions, run simulations, and manipulate factors that are crucial to the learning process. This supports the construction of deeper understandings as students engage with and test different ideas.

It is unclear whether semantic and pictorial visual displays can encourage different types of integrative inferences solely on the basis of the spatial arrangement of their contents. Visual displays, however, can afford integration when learners use specific strategies to construct meaning as they interact with the display contents (e.g., Ainsworth and Loizou 2003; Van Meter et al. 2015; Wylie and Chi 2014). In Sauter et al. (2013), students worked directly with a visual display that allowed them to test hypotheses about the decay of radioactive materials. These materials decayed at predictable rates, so understanding the relations among variables, the results, and even the need for multiple investigations to collect valid data can prove quite challenging to novice students. The use of simulations embedded in a display that was designed to depict the decay processes supported students’ understanding of the core principles, particularly in comparison to non-interactive simulations. This example shows how constructivist activities can support integration as learners engage in knowledge generation that relates to and goes beyond the information provided in a display. Constructivist thinking occurs when individuals actively build understandings based on their knowledge. The specific tasks that individuals undertake as they interact with visual displays can facilitate this kind of learning.

Processing Efficiency

Learners have limited processing resources. Of particular relevance to visual displays are the resources associated with attention and working memory. Attention involves focusing on some feature of a message, the environment, or even an internal thought. This focus can be driven by external factors that draw attention automatically (e.g., color, a blinking light) or by the individual who makes deliberate decisions about where to direct attentional resources (e.g., expectations that the animation in a visual display will be useful for understanding a STEM topic). Attention directed toward a visual display, whether driven in a top-down (by the learner) or bottom-up (by features of the display) manner, is necessary to initiate and maintain focus on stimuli (Hegarty et al. 2010). This focus can be a challenge though as attentional resources are limited and can only be directed toward a narrow range of information contained within an instructional message.

Information that is given sufficient attention and recognition resides in working memory (WM), a term which some researchers use to refer to the mental “desktop” where thought occurs (e.g., Baddeley 2007; Mayer 2009). WM is a fixed resource that can be used to process an instructional message. For example, interpreting a display’s contents and determining the usefulness of those contents given a person’s goals (e.g., testing hypotheses; preparing a summary) involves active processing in WM (e.g., encoding information into memory, retrieving information from memory, etc.). Thus, actively maintaining and using information in WM consumes precious attentional resources, making it crucial that this limited pool be leveraged in a way that contributes to comprehension and understanding of important information in an instructional message. The hope is that some kinds of visual displays can support processing by minimizing the resources necessary to engage with information.

When designed effectively, a visual display can improve processing efficiency, such that it helps a learner select important information more quickly with the display than without it (Larkin and Simon 1987). For instance, when an instructional message is presented in text, it often appears with a variety of different kinds of information, some relevant, some complementary but not necessary, and some extraneous. This can increase the amount of time needed to identify the important information (Robinson 1998). A carefully designed display can present the important information, which facilitates selection. Similarly, a display improves processing efficiency when it helps a learner organize important information more quickly with the display than in its absence or if the display is not designed well (e.g., related ideas are not near one another). For instance, spatially integrating important information can make it easier to see relations among important content than when that information is spatially separated (Sweller et al. 1998). The spatial design of a display can thus potentially facilitate or impede organizational inferences of presented content. An important goal then is to ensure that a visual display reduces or eliminates processing that can interfere with the selection and/or organization of important information. For instance, a visual display can minimize the need to hold facts in WM during a search for related information as would occur when searching a text for disparate pieces of information that need to be related to one another.

Recommendations

The above sections described the kinds of processes that individuals use when they attempt to comprehend visual displays. The literatures to which we referred, and the studies that have informed these discussions, also provide a basis for recommendations for designing visual displays. Below, we make seven explicit recommendations for the design of visual displays based on the aforementioned ideas (see Table 2).

Table 2 Guidelines for the design of visual displays

First, displays should be designed to support the selection of important information. To do this, designers can use signaling or can highlight more-important information to increase its relative salience as compared to less-important information (Mayer 2009). Another way to support selection is by extracting and localizing important information. For instance, a well-designed matrix only includes important information, and that information is laid out in such a way that enables a viewer to readily select and compare related ideas (Kiewra 2012).

Second, displays should be designed to decrease the likelihood that viewers select information that may detract from their understanding of important information. Viewers may be drawn to interesting yet unimportant information or design features that interfere with their comprehension of important information (e.g., Park et al. 2015; Sanchez and Wiley 2006). To avoid the selection of information or features that interfere with comprehension, designers can eliminate extraneous information from a visual display, which in turn can increase the coherence of the instructional message (Mayer 2009).

Third, displays should be designed to support the organization of important information. To do this, designers should localize or place related ideas or images near one another, which reduces the need to locate spatially separated ideas and facilitates organizational inferences. Presenting important pieces of information in a spatially integrated format can support organizational inferences by increasing processing efficiency (Larkin and Simon 1987).

Fourth, designers should align the type of organizational inference that a visual display affords with the type of inference a learner is expected to make. This can facilitate comprehension and learning and increase the likelihood that the learner will make the expected inference. With respect to semantic displays, a sequence affords temporal inferences, a matrix affords relational inferences, and a hierarchy affords super- and subordinate inferences. Static and dynamic pictorial displays afford temporal and relational inferences, for instance. Further, combinations of semantic and pictorial displays can be complementary and support numerous types of inferences. During development, it is important to specify the type of inference that learners are expected to make so that the display can be designed accordingly.

Fifth, when designers expect learners to make integrative inferences, visual displays should be accompanied by constructivist activities that can support the integration of important information with what individuals already know. Displays that promote organizational inferences may not necessarily promote integrative inferences, particularly in the absence of constructivist activities (Mayer and Johnson 2008; McCrudden et al. 2014).

Sixth, designers should consider learner characteristics when developing their visual displays. Individual differences affect processing of visual displays (Ainsworth 2006). For example, people might differ in spatial ability (Höffler 2010), which influences the ease with which they process a visual display. Individuals might also differ in how they utilize their cognitive resources (Just and Carpenter 1992); that is, people might exhibit different strategies and tendencies when they process displays (Ponce and Mayer 2014). Further, individuals can differ in both the quantity and quality of prior knowledge they possess, which could influence the ways in which integration operates during comprehension (e.g., Hegarty et al. 2010; Mason et al. 2013).

Besides these characteristics, other kinds of individual differences matter. Consider that the kinds of expectations or goals that individuals have when they approach a visual display could guide particular kinds of interactions with the material. For instance, learners who seek to understand content might work harder to organize and integrate what they are seeing, in contrast to learners who seek to peruse a display for fun in a more cursory way. Different learners might have different motivations to engage in the processing of a display, which can influence the extent to which they attempt to make connections or derive understandings from what they are viewing. Other individual differences that are only recently beginning to receive empirical attention could also play important roles. These include cultural considerations (e.g., Gutiérrez and Rogoff 2003), learner preferences (e.g., Kozhevnikov et al. 2014), and the need or desire for competency (e.g., Stroet et al. 2015). Across all of these characteristics, the ways in which an individual engages with, processes, and derives an understanding of the display could be related to features of the learner. Thus, it is important to identify learner characteristics and to carefully consider how they might interact with display experiences.

Lastly, as a caveat, readily comprehending a display does not ensure that the information will be encoded and/or retrieved from memory because conditions that enhance performance during instruction are not necessarily the same conditions that enhance long-term learning (Bjork 1994). Therefore, if the display contents are meant to be comprehended and used later, designers should offer accompanying activities (built in and/or complementary) that support encoding and retrieval. Viewers should be encouraged to interact with the contents of a display, and this should be done in a way that not only promotes comprehension but also facilitates long-term learning. This might mean increasing the cognitive demands placed on learners through various constructivist-type activities such as hypothesis testing.

Directions for Future Research

Visual displays are ubiquitous, whether examined as part of our research activities, perused as we browse websites, or evaluated as we make decisions with respect to purchases, health, and entertainment. The current review has attempted to identify some of the assumptions underlying their use, the processes associated with understanding them, and ways in which designs might be leveraged to support those processes. But given the ubiquity of visual displays, whether intended for instruction or entertainment, research needs to focus on the everyday design, application, and understanding of a variety of kinds of displays. While the categories we have identified in this review can be used to fit different display tokens, one area for future work involves understanding the ways in which people actually utilize displays in and out of formal educational settings (Cromley et al. 2010). This is particularly important as visual displays are used to communicate information to broader audiences on critical issues and topics such as the use of vaccines, climate change, and scientific thinking. Along these lines, a second area for future work involves identifying specific ways in which visual displays can be used to support integrative inferences. This likely entails combining visual displays with constructivist learning activities.

As a third related area for future work, it is worth considering whether contemporary educational materials are effectively designed given what we know about the design and application of visual displays (Hinze et al. 2013). Consider any current science, technology, engineering, and mathematics (STEM) textbook. Complex concepts and theories are often illustrated using displays. But do those displays adhere to principles that support, for example, organization and integration? A rigorous survey of textbook materials would prove incredibly useful not just to the community of learners relying on them, but also to the academic fields seeking to determine the utility of their accounts and to the industries responsible for producing effectively designed educational materials.

Finally, the current article focused on two-dimensional visual displays. However, there are a variety of other displays that are worth considering, particularly given the speed with which technological innovations are currently being developed and implemented. For example, three-dimensional visual displays (e.g., manipulatives) have shown promise in promoting learning (Carbonneau et al. 2013; Fyfe et al. 2014; Marley and Carbonneau 2014). Research could also focus on the use of haptic or tactile displays that rely on touch for use by different populations (e.g., individuals with visual impairments) and whether they enhance comprehension.

Conclusion

In conclusion, visual displays are not designed in the abstract; rather, they are often developed with a particular goal in mind on the part of the designer and relied upon by users who also have particular goals (which, of course, might not align with the designer’s goals). Different kinds of displays are more or less effective at serving particular kinds of goals. A full consideration of effective instructional pedagogy requires focus on the cognitive processes that underlie attempts at comprehending visual displays, the inclusion of engaging and supportive activities that can enhance their use, and the understanding of learner characteristics and the broader contexts in which the displays are used. One way to support comprehension and learning is to design visual displays that help learners select, organize, and integrate important information.