Educational MUVEs

Modern 3-D multi-user virtual environments (MUVEs) offer immersive, rich visual and auditory experiences to their users. In them, users guide computer-based avatars through vast virtual worlds filled with realistic buildings, objects, characters, and the avatars of other players (Fig. 1). In the commercial realm, entertainment games such as World of Warcraft (http://www.worldofwarcraft.com) and online virtual communities like Second Life (http://www.secondlife.com) attract millions of devoted fans who spend large amounts of time (and money) in their worlds. Today’s MUVEs evolved from earlier, text-based worlds called MUDs (multi-user domains/dungeons/dimensions) and MOOs (multi-user domains, object-oriented). Initially designed as fantasy-based adventure games, MUDs and MOOs were soon being explored for their potential as learning environments (e.g., Brdicka 1999; Fanderclai 1995; Bowers 1987; Falsetti 1995). In recent years, interest in 3-D educational MUVEs has grown sharply as researchers have focused on their potential as learning environments that incorporate situated learning concepts: collaborative knowledge building among communities of learners in contexts that closely mimic real-world processes and practices (e.g., Barab et al. 2005; Nelson et al. 2005).

Fig. 1 River City

The research to date on educational MUVEs has centered on theoretical, curricular, and learning questions. In this paper, we focus instead on issues of interface design for educational MUVEs by exploring the viability of applying multimedia design principles, derived from cognitive processing theories, to the creation of the rich sensory environments that 3-D educational MUVEs represent. Our discussion centers on the River City MUVE, a 3-D environment designed to mimic real-world science inquiry processes by immersing middle school science students in a complex, situated virtual setting. We examine the interface and virtual world design of River City using a set of cognitive overload scenarios proposed by Mayer and Moreno (2003). These scenarios describe conditions in multimedia environments that can overload learners’ cognitive processing, together with design approaches that help learners manage cognitive load, based on multimedia learning principles that include coherence, signaling, redundancy, and spatial/temporal contiguity of information.

We analyze the River City environment through the lens of these scenarios, offering examples of how the current design of graphical and textual information in the 3-D virtual space and the surrounding 2-D screen areas may contribute to high levels of learner cognitive load. Next, we describe multimedia design approaches that we are applying to the River City MUVE in an effort to reduce cognitive load. We discuss areas in which design principles drawn from studies of “traditional” 2-D multimedia software may not transfer easily to a 3-D MUVE’s inherently complex multimodal experience, and in which they may clash with the theoretical premise of the situated learning pedagogy often associated with educational MUVEs. Finally, we describe the next steps we are taking toward a systematic program of research on these design solutions in an educational MUVE.

The River City MUVE

River City (http://www.muve.gse.harvard.edu/rivercityproject/) is an educational MUVE designed to teach authentic inquiry skills to middle school students as part of a formal in-school science curriculum. Nearly 10,000 students in the United States and worldwide have completed the computer lab-based River City curriculum as part of their middle school science classes (Nelson 2007). Designed to mimic a realistic American town in the late 1800s, River City includes shops, a library, a school, a hospital, and a university—as well as geographically distinct upper-, middle-, and lower-class residential districts (Fig. 1).

Upon entering the city, visitors to River City can interact with town residents (computer-based agents), digital objects (such as pictures, books, and charts), and the avatars of other students. In exploring, students also encounter visual stimuli such as muddy dirt streets, and auditory stimuli such as the sounds of coughing town residents.

Students work in small teams to develop and test hypotheses about why residents of River City are falling ill. Three different illnesses (water-borne, air-borne, and insect-borne) are integrated into the environment, allowing students to develop and practice the inquiry skills involved in disentangling multi-causal problems embedded within a complex environment (Clarke et al. 2006; Nelson et al. 2005). Over the course of a 17–20-hour curriculum, students experience a year of virtual time in River City across five visits to the town, with several months of virtual time passing between visits. A final sharing day at the end of the project allows students to compare their research with that of other teams in their class and to piece together some of the many potential hypotheses and causal relationships embedded in the virtual environment.

A series of studies has been conducted investigating the viability of the River City MUVE and its curriculum to improve science learning, to motivate students to learn science, and to create learning situations that appeal to both girls and boys. Results from implementations of an early version of the River City MUVE in three classrooms in Massachusetts indicated that the environment is motivating for students, particularly students with lower academic backgrounds (Dede et al. 2003). These findings were replicated in a larger-scale implementation of the MUVE in 2004 (Clarke et al. 2006; Nelson et al. 2005). In that implementation, more than 1,000 students took part in the inquiry-based curriculum. As in the pilot study, qualitative student data from this larger study showed that students (and teachers) were highly motivated by the curriculum, and actively engaged in what they described as realistic inquiry (Clarke and Dede 2005).

In a recent study, an individualized, reflective guidance system was embedded in the River City MUVE to determine whether use of the guidance led to more effective learning (Nelson 2007). In an exploratory study with 272 middle school students, increased viewing of guidance messages was associated with significantly greater (p < .05) gains from pre- to post-test in scientific inquiry skills and disease transmission knowledge. Among students who used the guidance system, the embedded guidance messages were equally beneficial for boys and girls. Interestingly, though, girls were more likely than boys to choose to view guidance messages initially, and tended to view more guidance messages overall (Nelson 2007).

One area that the River City research team has not examined systematically is the design and presentation of visual, textual, and auditory information in the 2-D and 3-D elements of the MUVE interface. Anecdotal evidence from classroom observations, student interviews, and surveys indicates that many learners may experience high levels of cognitive load when interacting with the River City environment. Students report not knowing where to focus their attention in the environment and having difficulty keeping track of the many sources of information they encounter while exploring the virtual worlds (B. Nelson 2005, Unpublished Dissertation). For example, the guidance system study found that as many as 25% of the students with access to the guidance messages did not view them. While engagement in the main inquiry activities was high, use of the guidance component was uneven. When a random subset of students was interviewed about their use (or lack thereof) of the guidance system, their responses fell into two categories: (a) they did not notice the guidance tool because their attention was centered on the 3-D environment, or (b) there were so many sources of information to keep track of simultaneously on-screen that they gave up trying to attend to all of them (Nelson 2007).

The River City curriculum, along with the modifications made to it over time, has been based on principles drawn from a situated learning approach. Situated learning proponents define learning as embedded within, and inseparable from, participation in a system of activity that is deeply shaped by a particular physical and cultural setting (Brown et al. 1989; Lave and Wenger 1991). In essence, situated learning requires authentic contexts, activities, and assessments, coupled with guidance based on expert modeling or situated mentoring.

While a situated learning-based framework has seemed appropriate as a basis for curricular revisions of MUVEs like River City, anecdotal evidence such as that from the embedded guidance study indicates that it may be less useful for improving the design and delivery of multimedia information in the 3-D environment. Instead, we propose that a design approach based on multimedia principles drawn from theories of cognitive processing may offer a promising alternative for designing information presentation in 3-D educational MUVEs.

From a cognitive processing viewpoint, the difficulties faced by students exploring River City are not surprising. The rich experience enabled by MUVEs leads directly to what Mayer and Clark (2007) label the “rich media paradox.” The simultaneous presentation of multiple information sources supported by MUVEs raises a learner’s cognitive load (the mental effort needed to process all the visual, textual, and audio elements) until (s)he experiences cognitive overload, with the incoming stimuli outstripping the capacity of the learner’s memory systems to process the information. Research has shown that multimedia messages designed in light of how the human mind works are more likely to lead to meaningful learning than those that are not (e.g., Mayer 2005b; Mayer and Moreno 1999; Mayer 1997; Mayer and Sims 1994; Mayer and Anderson 1991, 1992). Similarly, approaching the design of the visual, verbal, and aural information presented in educational MUVEs from a cognitive processing framework may lead to more meaningful learning.

Multimedia learning principles

Mayer and Moreno (2003) describe multimedia learning as learning from words and pictures, and multimedia instruction as the presentation of words and pictures in a way that fosters learning by helping people form mental representations of incoming information. Cognitive theorists suggest that, to process information as they develop mental models, humans rely on dual cognitive channels: separate information processing channels for words and pictures (Mayer 2005a; Paivio 1986, 1991; Mayer and Anderson 1991). In addition, they posit the “limited capacity” assumption: each of the two processing channels has a limited processing capacity at any given time (Sweller 1999, 2005). Finally, they assume that learning requires active processing of information in the visual and verbal channels (Mayer 2005a).

Based on these assumptions, cognitive scientists propose a number of design principles for the creation of multimedia learning materials. The most basic of these is the Multimedia Principle. This principle simply states that people learn better from words and pictures together than from words alone (Mayer 1997; Fletcher and Tobias 2005). Most other multimedia principles entail finding the best way to organize and present visual and verbal information to learners. The multimedia design principles most central in our current discussion are the coherence, signaling, redundancy, modality, contiguity (split-attention), segmenting, and pre-training principles. These principles are summarized in Table 1.

Table 1 Multimedia learning principles

In formulating these principles, Mayer and others have focused primarily on what might be called “traditional” computer-based instruction environments: 2-D presentational learning environments in which learners move from screen to screen of content in a linear or branching sequence. The multimedia principles are concerned with how multimodal elements (text, pictures, sounds, animations, etc.) can be arranged visually or temporally to help learners manage cognitive load, which in turn allows them to devote the maximum amount of available short-term memory to new incoming information. To date, little research has examined the usefulness of multimedia design principles in the creation of complex 3-D situated learning environments such as MUVEs.

Here we present a reflection on design practice, based on an analysis of the ways in which multimedia design principles may be applied in the development of educational MUVEs, using the River City MUVE as an example. To provide a framework for our discussion, we make use of the cognitive overload scenarios described by Mayer and Moreno (2003). Cognitive overload occurs when one or both of the processing channels are overloaded by information that is either essential or extraneous to the learning process. Mayer and Moreno’s analytical framework lists five cognitive overload scenarios (Table 2) that cover the most common types of overload, along with the principles that can be employed to prevent overload by managing cognitive load in each case. In what follows, we explore how the design of the River City MUVE interface may be contributing to high cognitive load, increasing the likelihood that the overload scenarios defined in Mayer and Moreno’s framework will occur. We also discuss areas in which multimedia principles may (or may not) be successfully applied to 3-D educational MUVEs.

Table 2 Cognitive overload scenarios

River City interface

Before beginning our analysis, it is helpful to introduce the River City MUVE interface, which is representative of other educational MUVEs and massively multi-user environments currently being examined as platforms for education, including Quest Atlantis (Barab et al. 2005) and Second Life (Herman et al. 2006).

The River City interface (Fig. 2) consists of four primary functional areas: a team chat window, the 3-D environment window, a tools area, and a content window. The main functional area is the 3-D environment window, through which students interact with the virtual town of River City. Students can walk, run, swim, or fly through the environment. They can click on 3-D objects, including pictures and signs, and ask questions of town residents (computer-controlled agents) via pop-up dialog menus.

Fig. 2 Current River City interface

The 2-D content window consists of four sub-sections: a “hints machine,” a web content pane, a toolbar area, and an environmental health meter. The web content pane is meant to be the primary focus of the 2-D content interface. When a student clicks on any interactive object in the 3-D environment, the web content pane displays corresponding information about that object. For example, in Fig. 2, the student has clicked on a photograph hanging on the wall of the River City train station. The web content pane provides information about the photograph, as well as a link to a large-scale version of the picture. In other areas of River City, students can click on signs that link to interactive inquiry tools, including microscopes and bug catchers (Fig. 3). These tools appear in the web content pane.

Fig. 3 Microscope

The hints machine section of the content window houses an individualized guidance system (Fig. 2). This guidance system offers meta-cognitive prompts designed to assist students in making sense of data they collect in River City and aid their quest to discover the sources of illness in the town. The toolbar provides links to world-specific homepages, the interactive map, and a tutorial, as well as back and forward buttons for navigating through pages of web content. The environmental health meter offers an on-going visual indication of the level of “healthiness” in the student’s immediate surroundings. One hundred percent (100%) indicates high health in the area while lower readings indicate the presence of some unhealthy elements in the vicinity.

The tools area consists of three sub-components: the menu bar, the tool bar, and the avatar action bar. The menu bar contains various functional components for modifying the visuals, changing the appearance of a student’s avatar, teleporting to new locations, and receiving help. The tool bar contains icons that allow students to view an online notepad, return to a “Timeport” World, look up or down, take a snapshot of their current location in the 3-D environment, and switch between first- and third-person views of the 3-D environment.

The team chat window includes a text input area for typing chat messages to other students and a larger chat viewing window for reading on-going chat discussions. The chat viewing area also displays residents’ replies to questions that students ask through a pop-up dialog window, which appears when students right-click on a resident in the 3-D environment window.

River City design analysis

Analytical framework

In the following discussion, we analyze the River City 2-D and 3-D interface elements utilizing Mayer and Moreno’s cognitive overload framework (Table 2) to look for potential overload scenarios currently inherent within the River City environment and describe approaches for managing cognitive load in River City through application of multimedia design principles. We also explore the tension that we believe exists in attempting to apply multimedia principles developed in a “2-D paradigm” to the design of 3-D educational MUVEs.

What we present here is a reflection on the design of the current River City interface and a discussion of changes that we are introducing as part of a systematic investigation of the viability of multimedia design principles in educational MUVE design. As such, our discussion can be seen as a principled design blueprint that we are following to conduct empirical studies. Additionally, we hope this will provide a useful framework for other researchers in this expanding area of inquiry.

Essential overload scenario: Type 1

A Type 1 essential overload scenario in the Mayer and Moreno framework occurs when a learner’s visual processing channel is overwhelmed by attempting to simultaneously process multiple stimuli that are relevant or “essential” to the learning process. Mayer (2005a) lists a number of tasks that take place when students process relevant information in a multimedia setting:

  1. selecting words
  2. selecting images
  3. organizing words
  4. organizing images
  5. integrating words and images

When a student is faced with a large number of elements to process at the same time, she may experience cognitive overload and fail to make effective use of all the essential elements needed to form mental models.

One factor that can lead to this type of overload is split attention (Sweller 1999, 2005). With split attention, multiple stimuli are presented simultaneously in two or more different locations. Students then attempt to juggle information from multiple locations, leading to overload.

The nature of educational MUVEs makes them quite prone to promoting split attention. Educational MUVEs are extremely rich in visual and auditory information. In fact, one of the touted benefits for learning in such environments is the richness and complexity of information they can display (Dede 2003). However, this richness of multimedia elements can easily lead to cognitive overload. For example, in the River City MUVE, essential visual information is presented to students in multiple locations simultaneously. While exploring River City, students are presented with visual information related to the learning task in up to five distinct locations on screen: (1) the chat window (team chat and bot replies), (2) the web pane (sign, picture, map, and book info), (3) the hints machine pane (real-time hints), (4) the main window (world graphics), and (5) the environmental health meter.

In most MUVEs, learners can click on pictures and 3-D objects located in the environment and view related information via pop-up or peripheral windows. In River City, when students click on photographs hanging on the walls of buildings in town, they are shown information related to the photos and to events taking place in town. Both the photos themselves and the accompanying text contain important information. In addition, there are tacit clues to be gained from the location of the photos within the 3-D world. While examining photos and other objects in the virtual town, students also need to communicate with other members of their team via a text-based chat window beneath the 3-D world space. It is likely a difficult cognitive task for students to view a given picture, read the accompanying text, take note of the visual and auditory surroundings present in the 3-D space, and communicate with other students through text-based chat, all at the same time.

Similarly, students in River City can interact with computer-based town residents via a right-click question menu. A student right-clicks on a given resident in the 3-D environment and selects a question (from a pop-up menu) to ask that resident. The selected resident replies via a text message outside the 3-D world in a separate chat window. Here too, students need to process information about the resident, the questions associated with the resident and the response from the resident almost simultaneously.

Design solutions

One approach to reducing split attention, and thereby lessening the potential for Type 1 essential overload, is to spatially reorganize relevant visual material on the screen. For example, when a student clicks on an in-world object or image, instead of related information appearing in a visually separate web content window, the student’s view could “zoom in” to a close-up of the object directly in the 3-D world. Once the view had zoomed to the close-up, a recorded narration about the object could begin. In addition to helping manage cognitive load by moving text into the verbal channel and reducing the split-attention effect, such a redesign would adhere more faithfully to the notion of situating learning inside a context designed to mimic real-world situations.

A second approach to avoiding scenarios leading to Type 1 essential overload is to offload some of the content from one channel (visual) to the other (verbal) (Mayer and Moreno 2003). Doing so both lowers the amount of processing in a given channel and allows for dual-coding, enabling simultaneous processing of essential information in both visual and verbal channels. In both the 3-D visual space and the surrounding 2-D interface elements of the River City MUVE, this type of offloading could be a way of reducing instances of simultaneous image and text presentation. For example, when students click on in-world photographs in the 3-D space, they could see a close-up of the image within the 3-D environment while hearing a recorded narration about the image. Similarly, students could click on a River City resident (computer-controlled agent) to ask a question and then hear a narrated response seemingly emanating from that character instead of a text-based response that appears in a separate chat window outside the 3-D world. Finally, text-based chat among team members, currently taking place in a separate window, could be replaced by a voice-based system (such as VoIP) that allows team members to communicate aurally while continuing to focus their visual attention in the 3-D space. Although no research has been done (yet) measuring the relative reduction in cognitive load that use of VoIP can bring, use of this kind of system is commonly seen in commercial massively multi-player games such as World of Warcraft, often through the use of third-party VoIP chat applications such as Teamspeak or Ventrilo.

In our current research blueprint, we are investigating several design approaches to manage cognitive load in an effort to avoid Type 1 essential overload (Table 3):

  • Keep the visual channel for visual objects: avoid simultaneous presentation of visual objects and text in the world, unless their co-representation mimics a real-world counterpart (for example, a cereal box with writing on the back).

  • Offload printed text to the verbal channel when possible: Interactions with in-world characters should be via narration. Users should be able to pause/replay/store narration, and a text record can be stored for later retrieval.

  • Offload learner-to-learner chat to VoIP system.

  • Keep in-world interactions in the 3-D world: reduce split attention by enabling learners to examine objects in the world itself rather than in a visually separate location on the screen. Avoid separate chat windows for user-user interactions and dialogues with in-world agents.
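To make the narration-related approaches above more concrete, the following is a minimal Python sketch, under stated assumptions, of how a MUVE client might route a resident’s reply to in-world narration with learner pause/replay controls while keeping a text record for later retrieval, rather than printing the reply in a separate chat window. All names (`NarrationPlayer`, `Utterance`, `ask_resident`) are hypothetical and are not part of the River City code base; actual audio playback is stubbed out and only the control state is modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Utterance:
    speaker: str  # in-world character the narration should seem to come from
    text: str     # narration script, also kept as the stored text record


@dataclass
class NarrationPlayer:
    """Hypothetical controller that presents agent replies as in-world
    narration (aural channel) instead of text in a separate chat window."""
    transcript: List[Utterance] = field(default_factory=list)
    current: Optional[Utterance] = None
    paused: bool = False

    def play(self, utterance: Utterance) -> None:
        # Stub: a real client would start positional audio at the speaker's
        # location in the 3-D world; here we only track state.
        self.current = utterance
        self.paused = False
        self.transcript.append(utterance)  # text record for later retrieval

    def pause(self) -> None:
        if self.current is not None:
            self.paused = True

    def resume(self) -> None:
        self.paused = False

    def replay(self) -> Optional[Utterance]:
        # Restart the most recent narration without logging it a second time.
        if self.current is not None:
            self.paused = False
        return self.current

    def history(self, speaker: Optional[str] = None) -> List[str]:
        # Retrieve stored text records, optionally for one speaker only.
        return [u.text for u in self.transcript
                if speaker is None or u.speaker == speaker]


def ask_resident(player: NarrationPlayer, resident: str,
                 answers: Dict[str, str], question: str) -> None:
    """Route a resident's scripted answer to narration rather than chat."""
    player.play(Utterance(resident, answers[question]))
```

In this sketch the learner never has to look away from the 3-D view to read a reply: the transcript plays the role of the stored text record, and pause/replay give the learner control over the pace of the narration.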

Table 3 MUVE design issues and associated redesign approaches

Essential overload scenario: Type 2

Just as overloading the visual channel with too much processing of relevant text and images is a potential problem in MUVEs, offloading large amounts of visual material into the verbal channel could merely shift the load elsewhere rather than reduce it. Mayer and Moreno’s Type 2 essential overload scenarios occur when both the visual and verbal channels are overloaded by attempting to process too much relevant information simultaneously. Mayer and Moreno (2003) describe an example scenario in which a learner views a narrated animation describing the process of lightning formation. Here the positive effect of modality is present, with both visual and verbal channels in use. However, if the content is difficult and/or the material is presented at a rapid pace, students are unlikely to be able to process all the information at a level sufficient for deep processing and mental model formation. Sweller (2005) describes this as a problem of high intrinsic load: the material itself is complex enough to lead to cognitive overload even when presented in both visual and verbal channels.

As we have described, educational MUVEs are often specifically designed to contain rich visual and auditory information in an effort to mimic real-world contexts, a consequence of the situated learning foundation of their curricula. This situated learning-based approach also leads to curricula that are complex (Dede et al. 2003; Nelson et al. 2005), may have ill-defined learning outcomes and processes (Barab et al. 2005; Jonassen 1999), and may require a complex set of interactions (Bruckman 2000). The current design of the River City MUVE encodes most essential information (i.e., information relevant to completing the overall learning objectives) as visual content in the form of pictures, 3-D objects, or written text. Because there is little simultaneous presentation of visual and verbal/aural material, Type 2 essential overload scenarios are unlikely to occur with the current design. However, offloading written content into the verbal channel to enable dual-coding may increase the likelihood of Type 2 essential overload.

Design solutions

To prevent dual-channel essential cognitive overload scenarios, Mayer and Moreno (2003) suggest two solutions: segmenting and pre-training. In segmenting, the presented information is broken into smaller segments, time is introduced between the presentation of successive segments, and learners are given control over their progress through the segments. In pre-training, students are given instruction on the components that make up the larger system or topic to be learned. Doing so reduces the amount of new information that must be processed in real time as students learn about the more complex system.

Use of these solutions in the design of educational MUVEs presents an interesting challenge. As we have described, these environments are often designed as situated learning environments. The technical affordances offered by highly immersive 3-D environments support the development of curricula centered on ill-defined processes and complex outcomes. In an environment where there is no overt linearity to student actions, and in which students have great freedom to go where they wish and do what they want, how could segmenting or pre-training be used?

Let us first examine the potential use of segmenting in MUVEs to reduce the possibility of Type 2 essential overload. In River City, students have an overall goal of discovering why residents of the town are getting sick. To work toward this goal, they explore the city in small groups, entering it multiple times over a period of several real-world weeks. In addition to the overall goal of uncovering the causes of illness in River City, students are presented with a set of smaller tasks as they explore the city. For example, they are expected to gather data by clicking on photographs, asking town residents “What’s new?”, taking water samples, and collecting and counting mosquitoes. Each of these tasks provides data that students can collect and analyze to form a hypothesis about the illnesses in town.

Given these tasks, it may be possible to use segmenting on specific procedural tasks to reduce cognitive overload. For example, students can view an admissions chart in the River City hospital. The chart includes a long list of patients, their symptoms, and their addresses in town. Segmenting could be used to break this information up into smaller pieces, with basic controls available to allow students to flip through electronic pages of patient records. For information presented as text or narration along with the photographs students see in the world, simple segmentation could be integrated, allowing students to click through sequential components of information. In any community of learners or practitioners working together toward a large goal, there are great numbers of smaller sub-tasks, often involving procedural or hierarchical steps. Implementing segmentation into such sub-tasks in a MUVE could help reduce Type 2 essential overload.
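As an illustration of segmenting with learner control, the sketch below splits a long admissions-style chart into small fixed-size pages that the learner flips through at their own pace. It is a minimal sketch under stated assumptions: the record fields, the default page size of five, and the names `PatientRecord` and `SegmentedChart` are hypothetical and do not describe the actual River City hospital chart.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PatientRecord:
    # Hypothetical fields mirroring the chart described in the text.
    name: str
    symptoms: str
    address: str


class SegmentedChart:
    """Presents a long chart one small segment at a time; the learner, not a
    timer, decides when to move on (segmenting with learner control)."""

    def __init__(self, records: List[PatientRecord], page_size: int = 5):
        if page_size < 1:
            raise ValueError("page_size must be at least 1")
        self._records = list(records)
        self._page_size = page_size
        self._page = 0  # zero-based index of the segment on display

    @property
    def page_count(self) -> int:
        # Ceiling division; an empty chart still has one (empty) page.
        return max(1, -(-len(self._records) // self._page_size))

    @property
    def page_number(self) -> int:
        return self._page + 1  # one-based, for display such as "page 2 of 3"

    def current(self) -> List[PatientRecord]:
        start = self._page * self._page_size
        return self._records[start:start + self._page_size]

    def next_page(self) -> bool:
        """Advance one segment; returns False if already on the last page."""
        if self._page + 1 < self.page_count:
            self._page += 1
            return True
        return False

    def previous_page(self) -> bool:
        """Go back one segment; returns False if already on the first page."""
        if self._page > 0:
            self._page -= 1
            return True
        return False
```

Keeping each segment small and letting the learner trigger every step reflects the two ingredients of segmenting described above: smaller units of information and learner control over their pace.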

In a similar vein, it would be useful to pre-train learners on the steps involved in the sub-tasks they may perform as part of a larger, ill-defined learning experience within a MUVE. In the River City project, we have found it beneficial to pre-train students on the operation and manipulation of the environment and interface. On the first day that students enter River City, the primary goal is to pre-train them on how to move through the world, how to chat with team members, and how to interact with objects and characters in the world. By allowing students time to explore the environment, we hope to reduce cognitive load in later visits, when they need to focus more attention on gathering data and collaborating with teammates. Later in the curriculum, we also offer students additional opportunities for training on components of their larger learning task via materials in the River City library. Here students can click on library books to get basic training on steps of the scientific method, curriculum-related vocabulary, and experimental design.

In our current research blueprint, we are investigating several design approaches to manage cognitive load in an effort to avoid Type 2 essential overload (Table 3):

  • Identify and deconstruct procedural and/or hierarchical sub-tasks that take place within the context of the overall MUVE curriculum.

  • Design for segmentation and learner control over components within identified sub-tasks.

  • Design pre-training opportunities for the MUVE interface and processes.

  • Design pre-training materials for identified sub-tasks. Enable completion of pre-training within the MUVE itself (embed the training as a natural component situated within the larger learning tasks).

Extraneous overload scenario: Type 1

Although the name sounds redundant, an “extraneous overload scenario” refers to a situation in which the combined processing of essential (relevant) and extraneous (irrelevant) information exceeds a learner’s cognitive capacity. A Type 1 scenario occurs when one or both channels are overloaded by essential and extraneous information processing, a situation generally attributable to extraneous material (Mayer and Moreno 2003). In this instance, extraneous material can be defined as any presented information that does not directly influence achievement of an instructional objective. An example of extraneous material in a typical multimedia learning environment would be a lesson about the physics of space travel that contains several text passages about life as a farmer and several images of farm animals.

In River City, the technical affordances of the 3-D environment have allowed us to situate learning in an authentic context that mimics the complexity of the real world. Consequently, there is an enormous amount of information, primarily visual, embedded in the 3-D world that may not be directly related to the learning goals of the embedded curriculum. For example, in River City, students can discover information relevant to the learning goals of the curriculum by reading signs and charts in the city hospital, city hall, local library, and numerous homes. However, there are a large number of other buildings in the 3-D world, most of which students cannot enter. These buildings serve no direct learning purpose other than to increase the realism of the environment (i.e., to make it look like a real town). Presumably, students still work to derive meaning from these buildings, taking up “cognitive space” in working memory that could otherwise be used to process stimuli containing information more central to the learning task.

In addition to the plethora of relevant and extraneous visual content in River City, the environment contains large amounts of text-based material that appears in multiple locations at once: in the 3-D environment as objects modeled on real-world counterparts (signs, books, charts, etc.) and in the 2-D areas of the screen (web content window, chat windows, guidance window, online notepad, etc.). When large amounts of text appear on screen simultaneously with no indication of which text is most relevant to the current learning task, learners are more likely to experience a Type 1 extraneous overload scenario.

Design solutions

There are three main approaches to managing cognitive load in a way that helps avoid Type 1 extraneous overload, each embodied in one of three principles of multimedia learning: signaling, redundancy, and coherence. The first approach for managing an abundance of essential and extraneous information in multimedia applications is derived from the signaling principle. According to the signaling principle, designers can help learners manage cognitive load by creating an interface that provides clues for how to process presented information and helps learners focus attention on relevant elements. Designers add signals that show learners what information to attend to and how to organize that information into manageable, meaningful chunks, guiding the learner in the selection, organization, and integration of explanatory information (Mautone and Mayer 2001).

In designing educational MUVEs, two methods of signaling could help reduce cognitive overload due to an overabundance of extraneous stimuli: information signaling and interaction signaling. Information signaling is integrated directly within the content, for example titles and headings, function and relevance indicators (e.g., “in conclusion”), and typographical signals (boldface, italics, color). These signaling methods guide the search for specific information and simplify the reader’s decisions about relevance. In one study, Loman and Mayer (1983) found that signaling increased recall of relevant (i.e., signaled) idea units and decreased recall of irrelevant (i.e., non-signaled) idea units. Additionally, information signaling cues the reader to the local and global organizational structure of material and makes explicit relations that might otherwise have to be inferred during reading (Lorch 1989).

While headings may prompt readers to pause and process topic shifts within material (Lorch and Lorch 1996), pointer phrases and logical connectors (e.g., “because of that” or “as a result”) operate at a more local level, generally pointing out cause-effect relationships. This reduces ambiguity about the relationship between two or more pieces of information (the reader has to make fewer inferences), which is especially important when the reader has low prior knowledge of the subject matter (Loman and Mayer 1983).

Interaction signaling mediates between the user and the information with which he or she will interact. Icons and animated interaction indicators are examples of such signaling. In River City, the sign objects within the 3-D world that represent bug catchers and water sampling stations are interaction signals. All of these objects share the same green background and white text, indicating to the user that they can be clicked and that clicking them will result in the same type of task (in this case, counting bugs or sampling water).

In redesigning the River City MUVE based on cognitive processing theory, we can apply signaling principles to both the 2-D and 3-D areas of the interface. In the 2-D space, the two primary targets for signaling, which also offer useful examples for general MUVE design, are our content window and our team chat window (Fig. 2). Both areas can benefit from a systematic redesign that incorporates visual signaling of essential information. In the content window, we can format messages systematically so that relevant text is signaled consistently (via headers, font style and weight, etc.). In this manner, we can draw a learner’s attention to the specific elements of text containing the most relevant information.

Similarly, we are designing a system of visual signaling for text-based messages appearing in the team chat window. For example, historical text within the team chat window (i.e., text from previous encounters with River City residents or team members) will be clearly signaled to delineate it from ongoing conversations. In addition, formatting can be used to signal messages that originate from different groups of participants (team members, residents, teachers, etc.). Icons will also be used in the chat window to identify each member of the current conversation (both human and non-human participants). These icons will serve as miniature chat avatars for each user and will precede each line of text that user submits to the current conversation. A default presentation style format will be created to delineate sent vs. received messages. This style format will dictate default color, weight, and style for each message type, ideally allowing faster cognitive processing for a user trying to keep track of multiple participants in the same conversation.
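The default presentation style format for the team chat window can be thought of as a lookup table from message type to formatting. The Python sketch below is one hypothetical way to express it: `SenderGroup`, `GROUP_STYLES`, the specific colors, and the choice of a background tint to mark sent messages are all illustrative assumptions, and the sender icon is reduced to a text placeholder where a real client would show a small avatar image.

```python
import html
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class SenderGroup(Enum):
    TEAM_MEMBER = "team"
    RESIDENT = "resident"  # computer-controlled River City residents
    TEACHER = "teacher"


@dataclass(frozen=True)
class Style:
    color: str
    weight: str
    font_style: str
    background: str = "transparent"


# Hypothetical default style format: one entry per participant group.
GROUP_STYLES: Dict[SenderGroup, Style] = {
    SenderGroup.TEAM_MEMBER: Style("#1f4e79", "normal", "normal"),
    SenderGroup.RESIDENT: Style("#5a3e1b", "normal", "italic"),
    SenderGroup.TEACHER: Style("#7b1f1f", "bold", "normal"),
}
# Earlier conversations are greyed out so they read as history, not live chat.
HISTORY_STYLE = Style("#888888", "normal", "italic")
SENT_BACKGROUND = "#eef5ff"  # tint that marks the learner's own (sent) messages


@dataclass
class ChatMessage:
    sender: str
    group: SenderGroup
    text: str
    sent_by_me: bool = False
    historical: bool = False


def style_for(msg: ChatMessage) -> Style:
    if msg.historical:
        return HISTORY_STYLE
    base = GROUP_STYLES[msg.group]
    if msg.sent_by_me:
        # Same group colors, but a tinted background separates sent from received.
        return Style(base.color, base.weight, base.font_style, SENT_BACKGROUND)
    return base


def render(msg: ChatMessage, icons: Dict[str, str]) -> str:
    """Return one chat line as HTML, prefixed by the sender's icon placeholder."""
    s = style_for(msg)
    icon = icons.get(msg.sender, "[?]")
    css = (f"color:{s.color};font-weight:{s.weight};"
           f"font-style:{s.font_style};background:{s.background}")
    return (f'<div style="{css}">{html.escape(icon)} '
            f'{html.escape(msg.sender)}: {html.escape(msg.text)}</div>')
```

Keeping every style in one table means a single change updates all messages of that type, which is what makes the signaling consistent across a conversation.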

We are implementing similar signaling strategies in redesigns of our hints machine, environmental health meter, and tool bar. The cumulative effect of these redesign efforts is to better enable learners to focus their attention on information most valuable to them at a given moment, reducing extraneous cognitive load while maintaining the background immersion that is the hallmark of MUVEs.

A second approach for managing the kind of cognitive load that can lead to Type 1 extraneous overload is derived from the redundancy principle of multimedia learning (Mayer et al. 2001). According to this principle, the likelihood of cognitive overload can be reduced by not presenting the same words in both printed and spoken form concurrently with corresponding visual information. For example, in many multi-player online games, users are introduced to quests (small adventures undertaken in a world) via a combination of text and narration while viewing a 3-D scene. As we have described, designers can avoid this type of redundancy by presenting all information passages that accompany graphical information as narration rather than on-screen text. This delivery method should reduce redundancy and engage the learner’s verbal and visual channels more fully, limiting split attention and promoting dual coding of information.

For designers interested in creating educational MUVEs, there are two potential problems associated with the proposed solutions for reducing redundancy. The first is feasibility. Replacing all on-screen text in a MUVE with narration can create many technical burdens: streaming audio comes with high bandwidth requirements, large server-side storage requirements, and increased minimum technical specifications for an end-user’s computer. Additionally, many school labs do not have headphones available for all students. Second, from an accessibility standpoint, eliminating simultaneous text and narration presupposes a particular learning style on the part of users. Narrowing the options for information “intake” may offset the cognitive benefits that most students would gain from the elimination of redundancy.

The third, and perhaps most powerful, approach for managing high cognitive load attributable to simultaneous essential and extraneous information processing is to apply the coherence design principle (Mayer et al. 2001). In essence, applying the coherence principle means weeding out extraneous material to present a more coherent, concise message. In multimedia learning environments, this can be accomplished by excluding potentially interesting yet irrelevant visual and verbal information. Thus, one possible approach for implementing the coherence principle in an educational MUVE would be to determine the relevance of each visual and/or verbal object within the environment according to the underlying curriculum and learning goals. Once relevance has been determined, designers can include only those elements that most directly support the learning goals.
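The relevance-then-filter procedure just described can be sketched in a few lines of Python. This is a hypothetical illustration, not a method used in River City: it assumes each object has already been tagged with the learning goals it supports, that goals carry numeric weights, and that a single threshold separates kept objects from removed ones. The optional `keep_contextual` flag is our own addition, showing one way a designer might retain objects judged important for immersion.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class WorldObject:
    name: str
    kind: str                                        # "visual" or "verbal"
    supports: Set[str] = field(default_factory=set)  # goals it is judged to serve (assumed input)
    contextual: bool = False                         # designer flag: important for immersion


def relevance(obj: WorldObject, goal_weights: Dict[str, float]) -> float:
    """Score an object by summing the weights of the learning goals it supports."""
    return sum(goal_weights.get(goal, 0.0) for goal in obj.supports)


def coherent_subset(objects: Iterable[WorldObject], goal_weights: Dict[str, float],
                    threshold: float, keep_contextual: bool = False) -> List[WorldObject]:
    """Keep objects whose relevance meets the threshold.

    keep_contextual=True also retains designer-flagged immersion objects,
    a hypothetical compromise between coherence and situated complexity.
    """
    return [o for o in objects
            if relevance(o, goal_weights) >= threshold
            or (keep_contextual and o.contextual)]
```

The sketch takes the goal weights and each object's `supports` tags as given; in a non-linear MUVE curriculum, deciding those inputs is the difficult part.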

When dealing with 3-D MUVEs, this task is easier said than done. In a non-linear, ill-structured curriculum embedded within a virtual environment, how should the diligent designer go about determining what is or isn’t relevant to the learning goals? And if a set of objects could be identified that would efficiently direct students toward specific solutions while respecting the limits of short-term memory, would creating a 3-D MUVE that includes only those objects be a worthwhile endeavor? In other words, would a MUVE designed to the level of simplicity called for by guidelines shown to improve learning in traditional multimedia applications still be a MUVE? These questions have not yet been explored in the field, but we hope the blueprint we describe here may help address them.

In our research blueprint, we are exploring the viability of several design approaches to manage cognitive load and avoid Type 1 extraneous overload in the design of educational MUVEs (Table 3):

  • Reduce presentation of seductive but extraneous details in the 3-D environment and associated content areas, while working to maintain contextual immersion.

  • Reduce the simultaneous presentation of redundant elements in the 3-D environment and associated content areas, while working to maintain contextual immersion.

  • Use visual and auditory signaling techniques to cue learners to meaningful elements/events/processes within the 3-D world of the MUVE and in the world-external elements of the interface.

  • Keep signaling simple and transparent. If a signaling legend is used, keep it minimal and straightforward.

Extraneous overload scenarios: Type 2 and Type 3

Type 2 and Type 3 extraneous overload scenarios are closely related and occur in multimedia learning when the combination of essential processing and extraneous processing is greater than the learner’s cognitive capacity. Both scenarios are attributable to confusing presentation of graphics and text or narration (Mayer 2005b; Mayer and Moreno 2003). In Type 2 extraneous overload scenarios, confusing layout is embodied by the visual distance in 2-D space between a graphic and its associated descriptive text. This distance causes enough split attention to overload the cognitive capacity of the learner. The typical example of this kind of split attention can be found in books where a graphic appears on one page and its descriptive text on a preceding or following page, forcing the reader to flip back and forth between the two. A less severe form of confusing layout can occur when the graphics and text are on the same page. If they are far enough apart to require an unacceptable amount of visual scanning by the learner, the layout should be considered confusing and needs to be remedied.

Type 3 extraneous overload occurs in multimedia learning when the combination of essential processing and representational holding (caused by confusing layout) is greater than the learner’s cognitive capacity. More specifically, one or both channels are overloaded by essential processing and representational holding, which is generally attributable to confusing layout of animation and its associated narration (Mayer 2005b; Mayer and Moreno 2003). In this instance, confusing layout is embodied by an asynchronous presentation of any animation and its associated descriptive narration, which overloads the cognitive capacity of the learner.

Design solutions

The method of avoiding Type 2 extraneous overload is based on managing cognitive load through use of the spatial contiguity principle (Mayer 1997). Printed words should be placed near corresponding parts of graphics to reduce the need for visual scanning. This suggestion applies to all types of graphical information: put printed words near rather than far from the corresponding parts of an illustration or animation. Here too, a principle developed and evaluated in a primarily 2-D instructional design framework faces unique implementation challenges in the 3-D paradigm of educational MUVEs, where confusing layout within the interface is not necessarily based on the traditional idea of proximity of graphics and text. To frame the problem within the traditional perspective, though, we can use a metaphorical association. In the case of the River City MUVE, consider the 3-D environment to be the “graphic,” and consider the windows arranged around it (the content window, environmental health meter, and hints machine) to be the “text.” From this perspective, the visual separation of text and graphics in the River City MUVE can lead to Type 2 extraneous overload.

To reduce this type of overload, the “graphic” and “text” elements of the MUVE should be visually integrated as completely as possible. However, due to technical limitations in the software engine underlying River City, Quest Atlantis, and many other educational MUVEs, complete integration of all “text” elements within the “graphic” window is not easily achieved. In these environments, a separate web content window containing the “text” elements is the only option for displaying information concurrently with the 3-D environment. Consequently, we believe the best way to integrate text and images more closely in the River City environment (or any MUVE) is to create an accordion-style student scientist toolkit. In an educational MUVE, such a toolkit would provide an integrated visual shell containing the hints machine, content window, toolbar, and environmental health meter.

Each of these parts of the interface will be contained in an individual, collapsible segment of the toolkit (Fig. 4). Initially, the segments will default to an “open” state, but learners will be able to customize the status of each segment, toggling between the open (expanded) and closed (collapsed) states. Additionally, if a segment happens to be in the closed state when important information or a status change appears within the collapsed segment, the segment will immediately indicate to the user in some fashion (such as a blinking bulb) that it should be expanded to view the pertinent information.
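The open/closed behavior of the proposed toolkit amounts to a small state machine per segment. The Python sketch below assumes a hypothetical `AccordionToolkit` API: segments start open, learners toggle them, content posted to a closed segment sets a `needs_attention` flag (which a real interface might render as a blinking bulb), and opening the segment clears that flag. The segment names come from the text; everything else is illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Segment:
    name: str
    is_open: bool = True           # segments default to the open (expanded) state
    needs_attention: bool = False  # e.g., rendered as a blinking bulb while collapsed
    content: List[str] = field(default_factory=list)


class AccordionToolkit:
    """Hypothetical model of a collapsible, accordion-style student toolkit."""

    def __init__(self, segment_names: Iterable[str]):
        self.segments: Dict[str, Segment] = {n: Segment(n) for n in segment_names}

    def toggle(self, name: str) -> None:
        seg = self.segments[name]
        seg.is_open = not seg.is_open
        if seg.is_open:
            seg.needs_attention = False  # the learner can now see the update

    def post(self, name: str, message: str) -> None:
        """Deliver new information to a segment; flag it if it is collapsed."""
        seg = self.segments[name]
        seg.content.append(message)
        if not seg.is_open:
            seg.needs_attention = True

    def flagged(self) -> List[str]:
        """Names of collapsed segments that should prompt the learner to expand them."""
        return [s.name for s in self.segments.values() if s.needs_attention]
```

Clearing the flag only when the learner expands the segment, rather than after a timeout, is a design assumption: it keeps the signal active until the pertinent information has actually been brought into view.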

Fig. 4 Interface redesign mockup

One approach to combat Type 3 extraneous overload is through use of the temporal contiguity design principle (Mayer and Sims 1994). This principle states that learners will have a more difficult time processing information that is presented in a temporally dislocated fashion (i.e., asynchronous presentation of a graphic and its related narration). Based on the temporal contiguity principle, we suggest that designers can reduce the likelihood of Type 3 extraneous overload occurring in an educational MUVE by presenting corresponding narration and animation synchronously. This should minimize the need for any learner to hold representations in memory (i.e., remembering a narration while viewing the ensuing associated animation). In other words, one should present corresponding narration and animation pairs simultaneously rather than successively.
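To make the synchronous-versus-successive contrast concrete, the following Python sketch computes start times for a sequence of narration/animation pairs under both policies. The `Pair` type, the durations in seconds, and the `max_offset` measure are hypothetical; the gap between an animation's onset and its narration's onset serves here only as a rough proxy for how long a learner must hold one representation in memory while waiting for the other.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Pair:
    """One animation and its associated narration (hypothetical durations in seconds)."""
    label: str
    animation_s: float
    narration_s: float


# Each schedule entry: (label, animation_start, narration_start)
Entry = Tuple[str, float, float]


def synchronous_schedule(pairs: List[Pair]) -> List[Entry]:
    """Temporal contiguity: start each narration at the same moment as its animation."""
    t = 0.0
    schedule = []
    for p in pairs:
        schedule.append((p.label, t, t))
        t += max(p.animation_s, p.narration_s)  # next pair waits for the longer stream
    return schedule


def successive_schedule(pairs: List[Pair]) -> List[Entry]:
    """Contrast case: each narration plays only after its animation has finished."""
    t = 0.0
    schedule = []
    for p in pairs:
        schedule.append((p.label, t, t + p.animation_s))
        t += p.animation_s + p.narration_s
    return schedule


def max_offset(schedule: List[Entry]) -> float:
    """Largest gap between an animation's onset and its narration's onset."""
    return max((abs(n - a) for _, a, n in schedule), default=0.0)
```

For a pair with a 4-second animation and 3-second narration followed by a pair with a 2-second animation and 5-second narration, the synchronous schedule has a maximum offset of 0 seconds, while the successive schedule has a maximum offset of 4 seconds.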

In our current research blueprint, we are exploring the following design approaches to avoid Types 2 and 3 extraneous overload in the design of educational MUVEs (Table 3):

  • Closely integrate presentation of visual and textual/verbal elements of information in the 3-D worlds.

  • Eliminate or greatly reduce the presentation of visual information and related textual/verbal information outside the 3-D environment.

  • Where tight spatial integration of visual and textual information is not possible, use signaling and hiding/showing techniques to draw attention to related conceptual elements.

  • Present corresponding narration and animation pairs simultaneously rather than sequentially.

Discussion and next steps

A major goal of applying multimedia design principles to the creation of technology-based learning environments is to simplify and organize the delivery of information to allow efficient encoding in visual and verbal channels and to improve opportunities for mental model building. In our analysis of the River City MUVE design, we have identified a number of areas in which multimedia principles may hold promise for helping learners achieve these aims. However, our analysis has also indicated that the cognitive load management goals behind multimedia design principles may not always be achievable or desirable in immersive, visually and aurally complex 3-D environments such as educational MUVEs.

As an example of this complexity, there is the challenge of applying the coherence principle of multimedia learning to a MUVE-based curriculum. As mentioned in Table 1, the coherence principle states that people learn more deeply when extraneous material is excluded. The pragmatics of applying this principle in the design of MUVEs would involve using fewer extraneous words, pictures, and 3-D objects. This is a relatively easy process when considering constructed learning materials that are embedded in a MUVE—such as a self-paced tutorial that would be delivered to a user in the content panel upon clicking a virtual book in the virtual library. However, the same rubric of extraneousness is more difficult to apply when dealing with the embedded words and pictures (and all other sensory material, for that matter) that make up the fabric of a user’s immersive experience in the MUVE. Should visual objects such as streetlights, signs, sidewalks, and storefronts be managed, reduced, or eliminated altogether due to potential for cognitive overload? Or, would doing so eliminate the “essential complexity” of an educational MUVE?

In addition, an overall aim of reducing complexity may conflict at some level with the situated learning approach underlying the curricular design of educational MUVEs like River City. Dede (2003) describes the River City MUVE as an environment that supports a level of complexity occupying a middle ground between canned classroom labs and the real world. In other words, situating learning inside a MUVE is presumed to be beneficial because it can replicate a relatively high level of real-world sensory complexity. If this assumption is accurate, designers need to balance incorporating multimedia design principles aimed at reducing complexity to support intrinsic cognitive processing against preserving the complexity that allows MUVEs to more closely resemble the real-world contexts they are designed to emulate.

Following a cognitive processing-based design framework, one would want to remove as much extraneous graphical and textual information as possible from the learning environment. However, it is not straightforward to determine which material in a MUVE is extraneous to the learning and which material contributes to situating the learning in a context of real-world “messiness.” Following a situated learning pedagogy, it is important not to trim away too much of the contextual and immersive “fat” of a given MUVE. What, exactly, is the best way to decide the relevance of information within a given curriculum? How does one create a design based on multimedia principles proven to strengthen mental model building while maintaining the elements that, from a situated learning perspective, are important to learning?

In creating educational MUVEs, we suggest that designers occupy a middle ground of sensory complexity. On one hand, virtual worlds can be created that contain a rich set of visual and textual elements necessary to recreate a real-world context for exploring problems and developing hypotheses. At the same time, however, designers can work to avoid inclusion of clearly extraneous information that cognitive processing principles have shown to be detrimental to learning. By designing the presentation of visual and verbal information in educational MUVEs with an eye toward reducing extraneous cognitive processing and promoting dual-channel encoding of essential information, it may be possible to help learners manage cognitive load in these inherently complex environments while maintaining the immersive quality that is their hallmark.

Using the research blueprint presented in this paper, we are now conducting a systematic investigation into the strengths and challenges of applying the most promising multimedia design principles to the creation of 3-D educational MUVEs, incorporating principles of cognition in visual, textual, and auditory design without stripping away the situated complexity that is the hallmark of these environments.

Our first study, taking place in fall 2007, investigates the use of the modality principle as a design approach for reducing cognitive load in our new educational MUVE, “SimLandia.” In a traditional multimedia learning design framework, the modality principle states that people learn more deeply from a multimedia message when the words are spoken rather than printed. Moving beyond the traditional framework, we have implemented a MUVE-based science inquiry curriculum adhering to the modality principle by offloading text-based “team chat” through the use of a voice-based chat system for teams conducting inquiry in the MUVE. We are investigating the extent to which cognitive load can be reduced through the use of this voice-based chat treatment in which team members collaborate on inquiry activities through speech as opposed to text. Similarly, our second study, also taking place in fall 2007, investigates the use of the spatial contiguity principle as a design approach that may reduce cognitive load in educational MUVEs. This principle simply states that people learn better when related visual and textual content are presented near rather than far from each other on the screen. Presenting related multimedia elements such as pictures and words closer together helps reduce a deleterious “split attention” effect in which learners try to switch focus repeatedly between different related sources of information (Sweller 1999). 
In the study, we are implementing a MUVE-based science inquiry curriculum in which the split attention effect is reduced by (a) displaying all text-based descriptions related to in-world visual objects such as pictures in the 3-D world itself (rather than in the 2-D web content window outside the 3-D environment) and (b) introducing a “resident chat system” in which students ask questions of computer-controlled MUVE residents through an interface that displays resident answers in text-based captions above their heads in the 3-D world (rather than in a separate text window outside the 3-D environment).

We hope findings from our analysis here, and from our related studies, will contribute to the burgeoning field of collaborative, game-based learning by producing a set of empirically based design guidelines that educators and researchers can use to implement technology-supported science inquiry curricula that contain “essential complexity”: rich immersion coupled with manageable cognitive load. Because this area of research is in its infancy, we hope that other researchers will find value in the blueprint we offer here as a tool for framing their own empirical studies.