
Learning with Multimedia Information Environments

The use of explanatory texts (spoken or written) combined with pictorial representations (e.g., diagrams, animations, or videos) as a format of instruction has become omnipresent in educational information environments. Such text-picture combinations are referred to as multimedia (Mayer, 2009). There is a large body of research showing that learning with multimedia is under most circumstances more effective than learning just from text alone, a finding commonly referred to as the “multimedia effect” (Anglin, Vaez, & Cunningham, 2004; Butcher, 2014). Different explanations for this multimedia effect have been proposed. According to the two most prominent theories in the field, the Cognitive Theory of Multimedia Learning (CTML; Mayer, 2009) and the Integrated Model of Text and Picture Comprehension (ITPC; Schnotz, 2005), learning with text and pictures can result in richer and more elaborate mental representations of the to-be-learnt content than when studying only text. For this to happen, learners have to select relevant information from each representational format (i.e., the text or the picture) and mentally organize the information into meaningful mental structures. Most importantly, they need to establish coherence between information from the text and picture by mapping elements from the text onto those of the picture and vice versa (Seufert, 2003). This process has been referred to as global coherence formation (as opposed to the organization of information within a single representation; Seufert, 2003) or text-picture integration (Mayer, 2009; Schnotz, 2005), respectively. Only if learners mentally integrate information from texts and pictures into coherent mental models, deeper learning as a prerequisite for being able to apply the learnt contents to novel situations (i.e., transfer) is assumed to occur. A number of studies have deployed process measures such as recordings of students’ eye gazes as indicators for integration. 
These studies showed that more switches between text and picture processing as well as more frequent processing of pictorial elements that corresponded to text are related to better comprehension of multimedia materials (e.g., Mason, Tornatora, & Pluchino, 2013; Scheiter & Eitel, 2015).

However, the use of eye tracking as a research methodology in particular has also revealed that many learners fail to adequately carry out the cognitive processes necessary to benefit from multimedia (Renkl & Scheiter, 2015). Many learners do not pay sufficient attention to the pictures but rather focus on the text (e.g., Hannus & Hyönä, 1999; Schmidt-Weigand, Kohnert, & Glowalla, 2010a, 2010b); moreover, they fail to integrate information from texts and pictures if no proper instructional guidance is provided (Cromley et al., 2013; Mason et al., 2013; Scheiter & Eitel, 2015; Schwonke, Berthold, & Renkl, 2009). In early research on multimedia learning, this problem of inadequate processing of multimedia instruction was addressed by developing various instructional design measures that are tailored towards supporting learners in integrating texts and pictures (Mayer & Moreno, 2002). For instance, spatial contiguity between texts and pictures, which is established by presenting texts close to their corresponding picture elements, was shown to be effective in fostering the integration process (Johnson & Mayer, 2012) and comprehension (for reviews and meta-analyses see Ayres & Sweller, 2014; Ginns, 2006). Similarly, cueing or signaling correspondences between texts and pictures, for instance by showing corresponding text and picture elements in the same color, was also shown to aid integration and comprehension (e.g., Ozcelik, Karakus, Kursun, & Cagiltay, 2009; Scheiter & Eitel, 2015; for a meta-analysis see Richter, Scheiter, & Eitel, in press).

Even though these design measures improve learning outcomes, relying on them as the only approach to support learners may be problematic for both practical and theoretical reasons. First, although a large body of research exists on them, knowledge of these design measures has not yet made it into broader educational practice. That is, by and large, existing educational resources such as printed and digital textbooks as well as online learning environments that have been produced for commercial reasons often violate the design measures that have been identified in educational research. Thus, it is rather exceptional that learners face well-designed multimedia material based on proven design measures. Second, partly as a consequence of the prior point, learners should not become overly reliant on the design of learning material; rather, they should be able to learn even with less optimally designed instructional material and make the most out of it. Third, there is evidence that applying design measures is helpful only for those with little prior knowledge of the considered domain, whereas those with more advanced knowledge tend not to benefit or even suffer when faced with allegedly “optimized” instruction. This finding has become known as the expertise-reversal effect in instructional design research (Kalyuga, 2007). For instance, with regard to signaling it has recently been shown that low prior knowledge learners benefitted from highlighting text-picture correspondences in the material, whereas learners with either medium or high prior knowledge showed even worse performance in the signaling condition than in a control condition (Richter, Scheiter, & Eitel, in press). Different explanations have been provided in the literature for why harmful instructional design effects for learners with advanced prior knowledge occur. 
First, the design of the material may nudge these learners into processing information that is redundant to what they already know (Kalyuga, Ayres, Chandler, & Sweller, 2003). Second, some designs may suppress meaningful learning activities that would have helped advanced learners to achieve deeper comprehension. For instance, it has been shown that highly coherent texts hamper understanding of these texts for advanced learners, because there is no longer any need for knowledge-based inferences to overcome incoherence in the materials (McNamara, Kintsch, Songer, & Kintsch, 1996). As a consequence, learners with more prior domain knowledge will learn better from incoherent compared with coherent texts, whereas the reverse is true for learners with less prior knowledge. To conclude, the expertise-reversal effect suggests that a one-size-fits-all approach to instructional design will not guarantee effective learning for all learners.

More recently, researchers have reframed the problem of inadequate cognitive processing of multimedia material into a challenge regarding learners’ lack of or erroneous self-regulation of their learning processes (e.g., Kombartzky, Ploetzner, Schlag, & Metz, 2010; Stalbovs, Scheiter, & Gerjets, 2015). This reinterpretation, which will be explained in more detail in the next section, implies that rather than relying on optimally designed instructional materials, learners should be guided towards selecting and applying appropriate cognitive processing strategies. That is, learners should be supported in becoming “good information processors” (Pressley, Borkowski, & Schneider, 1989).

A Self-Regulated Learning Perspective on Multimedia Learning

Self-regulated learning can be characterized as including metacognitive, motivational, and behavioral processes that result in the active engagement of individuals in their own learning (Azevedo, 2005; Boekaerts, 1999; Winne & Hadwin, 1998; Zimmerman & Schunk, 2001). According to Boekaerts (1999), the heterogeneity of theories of self-regulated learning can be captured by analyzing these processes as a function of three different layers: The outer layer comprises the motivational and volitional regulation of the self (i.e., choice of goals and resources). The middle layer addresses the regulation of the learning process (i.e., use of metacognitive skills to direct one’s learning), and the inner layer refers to the regulation of the information processes (i.e., choice of cognitive strategies). Here we focus on the two inner layers, since we are considering instructional settings, in which resources (i.e., the information environment) are given and instructional goals are predefined.

According to models of self-regulated learning, an inadequate cognitive processing of multimedia materials can be seen as a failure to regulate one’s learning at the meta-cognitive and information-processing levels. Such a failure can occur for different reasons. First, learners may be unable to judge what they know and what they do not know, thereby failing to monitor their understanding of a domain (Bjork, Dunlosky, & Kornell, 2013). Inaccurate comprehension monitoring can result in either over- or underconfidence regarding one’s level of understanding, with both biases resulting in different problems for a student’s regulation of learning behavior. Overconfidence in one’s knowledge may cause learners to terminate studying prematurely, whereas underconfidence will result in learners investing time in studying materials already well understood. Hence, both biases result in an inadequate allocation of study time (Son & Metcalfe, 2000). Recent research has shown that the use of multimedia materials (compared with text-only instruction) increases the likelihood of learners becoming overconfident in their knowledge (multimedia heuristic, Eitel, 2016; Serra & Dunlosky, 2010). Failure to regulate one’s learning may also be caused by a student’s lack of knowledge regarding the question of how to respond to, for instance, gaps in their understanding. That is, even when correctly detecting gaps, students may not know how to overcome these gaps. Veenman, Van Hout-Wolters, and Afflerbach (2006) have conceptualized this problem that occurs when students lack strategy knowledge. They suggest that ideally learners should know what to do (declarative strategy knowledge), when and why to do it (conditional strategy knowledge), and how to do it (procedural strategy knowledge).

Applying a self-regulated learning perspective to multimedia learning suggests that learners need to be supported in assessing what they (do not) know (monitoring support) and in regulating their learning behavior in a way that matches their current understanding. So far, support measures have focused on verbal or visual instructions that convey strategy knowledge and make its use in a given learning situation more likely (for an overview see Renkl & Scheiter, 2015). For instance, a number of studies have used prompts or prompt-like instructions that tell students to apply certain cognitive processes such as information integration (e.g., Bartholomé & Bromme, 2009; Kombartzky et al., 2010; Schlag & Ploetzner, 2010; Stalbovs et al., 2015). Moreover, visual instructions have been used in which learners were shown eye movements that illustrated helpful visual behavior prior to learning from multimedia materials (e.g., Mason, Pluchino, & Tornatora, 2015; Skuballa, Fortunski, & Renkl, 2015). These support measures have in common that they can be deployed irrespective of whether well-designed instructional materials are available; moreover, they can help students to become independent learners who are able to control their learning without having to rely on high-quality instructions. However, their effectiveness depends on a number of possible boundary conditions that are not yet fully known (Renkl & Scheiter, 2015). Moreover, they are deployed in a one-size-fits-all fashion to all learners regardless of whether they are already able to self-regulate their learning or not. Furthermore, expertise-reversal effects (Kalyuga, 2007) have been revealed not only for instructional design measures but also for self-regulation interventions. 
For example, Nückles, Hübner, Dümer, and Renkl (2010) found that psychology students who self-regulated their learning during journal writing initially benefited from being supported by prompts that activated elaborative, organizational, and metacognitive learning strategies. When such prompting was continued over the course of a semester, however, it had detrimental effects on the motivational as well as on the cognitive level in the second half of the semester. Adaptive forms of support, which will be introduced next, offer a potential solution to this problem.

Adaptive (Multimedia) Learning Environments

Adaptivity is present when the instruction automatically changes in response to the learners’ states and learning behaviors (Akbulut & Cardak, 2012; Park & Lee, 2004). Adaptive learning environments or response-sensitive systems (Park & Lee, 2004) require that relevant learner states and learning behaviors are assessed and evaluated online. Thus, the system takes over some of the monitoring that is required from learners during self-regulated study. The system then reacts to the results of this diagnosis, thereby supporting (or even replacing) learners’ regulation of the learning process. Adaptive learning environments respond to a learner’s state and behavior on a moment-to-moment basis rather than to the results of a one-time assessment prior to learning. Their responses can take many forms: For instance, the system could reduce or increase the difficulty of the learning task, offer prompts that tell students how to proceed, or choose a different format of instruction (e.g., more or less elaborate explanations, another multimedia design variant). From a self-regulated learning perspective it is important to distinguish two different forms of adaptivity. Assistive adaptivity occurs when the system suggests to a learner how to proceed, while the learner maintains control over whether and how to follow the suggestion. For instance, a learner could decide to (not) follow the system’s advice of restudying a previously encountered unit of instruction. Directive adaptivity, on the other hand, constrains learner control to a much stronger extent by offering no further choice to the learner. For instance, automatically displaying the previously encountered unit of instruction leaves no option to the learner other than to restudy it. Thus, whereas assistive adaptivity only scaffolds the process of self-regulated learning, directive adaptivity imposes external control of the learning process. 
In both cases, the adaptivity mechanism presupposes that there is an unambiguous mapping between a diagnosis and a system response (e.g., “if test item X is not answered correctly, then display unit X again for restudy”).
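
The rule quoted above, together with the distinction between assistive and directive adaptivity, can be written down compactly. The following is only a minimal sketch: the names `Mode`, `Response`, and `respond` are hypothetical and are not taken from any system described in this chapter.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    ASSISTIVE = "assistive"  # the system suggests, the learner decides
    DIRECTIVE = "directive"  # the system enforces its response


@dataclass(frozen=True)
class Response:
    action: str                # here always "restudy" (illustrative)
    unit: str                  # unit of instruction the action refers to
    learner_may_decline: bool  # True only under assistive adaptivity


def respond(unit: str, answered_correctly: bool, mode: Mode) -> Optional[Response]:
    """Map one diagnosis (result of test item X) to exactly one system response."""
    if answered_correctly:
        return None  # no gap diagnosed, so no intervention
    return Response(
        action="restudy",
        unit=unit,
        learner_may_decline=(mode is Mode.ASSISTIVE),
    )
```

In this sketch the diagnosis and the selected response are identical in both modes; the modes differ only in whether the learner keeps the choice to decline.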

Adaptive learning environments differ in whether they select responses relative to single learner states or behaviors or relative to a learner model. Such a learner model captures all relevant states and behaviors and is continuously updated while the learner proceeds with the learning task. For adaptive systems relying on learner modeling, Shute and Zapata-Rivera (2008) have proposed an adaptive cycle that includes four components, namely capturing, analyzing, selecting, and presenting. First, the system captures data about a learner who is interacting with the system. These data are the foundation for the learner model, which is generated later in the course. Data collection is ongoing during the whole interaction process and aims at updating the learner model. Next, the captured data are analyzed in order to directly create a model of the learner according to content-specific information presented in the system’s learning environment. Based on the learner model, the system can determine whether an intervention is necessary and, furthermore, identify the kind of intervention that should be presented to the learner. Thus, the third process refers to the selection of a system response such as a hint, a prompt, or an explanation. The main purpose of this component is to determine what kind of system response or information is appropriate. Although decision rules can be predefined, they continually need to be updated during the interaction between the learner and the system. Finally, the last component corresponds to the presentation of the selected adaptive intervention. Although such an adaptive cycle is linear at the beginning, recurrences and returns between the components become inevitable. The first cycles generate a rather coarse learner model, which becomes refined over time. Thus, the learner model is not static because new learning traces can be used to update, revise, or verify the learner model. 
Creating adaptive systems based on a comprehensive learner modeling can be very challenging, because it requires extensive a priori knowledge regarding an effective mapping between the various combinations of learner states/behaviors and the adequate responses of the system.
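
To make the four components of the adaptive cycle concrete, the following sketch runs one pass of capturing, analyzing, selecting, and presenting. It is an illustration under strong simplifying assumptions, not the model of Shute and Zapata-Rivera (2008): the learner model is reduced to a proportion of correct answers per unit, and all names (`LearnerModel`, `run_cycle`, the 0.5 threshold) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LearnerModel:
    """Hypothetical learner model: proportion of correct answers per unit."""
    correct: dict = field(default_factory=dict)   # unit -> number of correct answers
    attempts: dict = field(default_factory=dict)  # unit -> number of answers given

    def mastery(self, unit: str) -> float:
        n = self.attempts.get(unit, 0)
        return self.correct.get(unit, 0) / n if n else 0.0


def capture(event: dict) -> dict:
    """Component 1: record a raw interaction event (passed through unchanged here)."""
    return event


def analyze(model: LearnerModel, event: dict) -> LearnerModel:
    """Component 2: update the learner model with the captured event."""
    unit = event["unit"]
    model.attempts[unit] = model.attempts.get(unit, 0) + 1
    model.correct[unit] = model.correct.get(unit, 0) + int(event["correct"])
    return model


def select(model: LearnerModel, unit: str, threshold: float = 0.5) -> Optional[str]:
    """Component 3: choose a response, or None if no intervention is needed."""
    return "hint" if model.mastery(unit) < threshold else None


def present(response: Optional[str], unit: str) -> str:
    """Component 4: render the selected response for the learner."""
    return f"[{unit}] {response}" if response else f"[{unit}] continue"


def run_cycle(model: LearnerModel, event: dict) -> str:
    """One pass through the cycle; repeated calls refine the same model."""
    model = analyze(model, capture(event))
    return present(select(model, event["unit"]), event["unit"])
```

Calling `run_cycle` repeatedly with the same `LearnerModel` instance mirrors the idea that early cycles yield a coarse model that is refined as new learning traces arrive.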

Moreover, adaptive systems differ in whether they rely on a diagnosis of a learner’s processing behavior or on his/her current state of knowledge, skill, or motivation (e.g., disengagement).

Finally, adaptive systems can be distinguished according to the way that they gather information required for choosing an appropriate response from the system’s repertoire. Relevant information can be gained explicitly by asking the learner to answer a questionnaire or test or implicitly by drawing inferences from the way the learner interacts with the system. Explicit assessment methods provide, if well designed, a valid judgment of the current state of affairs. However, the use of long assessments in particular may disrupt the learning process; moreover, working on such assessments repeatedly can be demotivating for learners. To counteract these problems, rapid assessment tasks (RATs) have been suggested (Kalyuga, 2008) as an alternative to more comprehensive tests. RATs are short assignments that are interspersed into the learning material and that aim at assessing the current cognitive state of the learner. Verification tasks are a special version of RATs. They ask the learner to judge whether statements referring to the previously learnt content are correct or incorrect. This technique has been shown to be efficient and non-reactive in past research (Renkl, Skuballa, Schwonke, Harr, & Leber, 2015). Based on the results of such verification tasks, the further path of the learning experience can be adjusted to the learners’ needs: learners who provide correct answers may continue with their learning paths, while learners with insufficient knowledge receive specific instructions and interventions.

In contrast to explicit assessment methods, implicit assessment methods are characterized by their unobtrusiveness because data collection may take place even without the learner realizing it (Barab, Bowdish, Young, & Owen, 1996). However, it may be rather difficult to infer and interpret a learner’s knowledge and learning merely from his/her interactions with the learning environment. For instance, adaptive hypermedia systems (Brusilovsky, 2001), in which students’ navigation behavior was used to inform the system’s response, were not very successful in providing information concerning the deployment of information utilization strategies. This is because navigational behavior (e.g., selecting a link, browsing through a section) operates at a rather coarse level where it often remains unclear which navigation behavior corresponds to a certain cognitive or affective process. Moreover, rather than looking at a single click on a link, it is often necessary to analyze longer navigational sequences to derive meaningful behavioral patterns, making unambiguous data interpretation rather difficult.

More recently there have been attempts to use more fine-grained assessment methods that are assumed to be more closely linked to the actual learning behavior and learner states (cf. Azevedo et al., 2017; Spüler et al., 2017; Winne et al., 2017). In the present chapter we focus on the use of eye tracking as a way of assessing learning behavior as well as inferring learners’ cognitive (and motivational) states from it (cf. Conati & Merten, 2007; D’Mello, Olney, Williams, & Hays, 2012; Roda & Thomas, 2006; Toet, 2006). Eye tracking provides information on where a person is looking, for how long, and in which order. According to Just and Carpenter (1980), this information can be taken as an indication of a person’s processing of information at the cognitive level (eye-mind hypothesis). In particular, it is assumed that elements that are fixated will be processed in the mind without any considerable delay (immediacy assumption). The duration of a fixation (i.e., the time when the eye is positioned on one spot and when information intake occurs) thus can be interpreted as the intensity with which some information element is processed. Longer fixation times can thus indicate more interest from a learner or more relevance of that information for the learning task. Saccades (i.e., rapid eye movements in between fixations during which information intake is suppressed) are indicative of the order of information processing. In multimedia research, saccades between text and pictures are seen as indicators for the process of text-picture integration (Johnson & Mayer, 2012; Scheiter & Eitel, 2016).

Conati and Merten (2007) investigated the usefulness of gaze data and compared three different probabilistic models describing a learner’s behavior. They demonstrated that the inclusion of on-line eye movement data improved sensitivity and specificity in predicting when learners were implicitly self-explaining learning content. Gaze data hence helped to model the mental state of a learner more accurately. In addition, gaze data can also provide information about the motivational states of learners such as boredom and disengagement (D’Mello et al., 2012). Eye tracking has also yielded important insights into how students learn from multimedia materials (cf. Scheiter & Eitel, 2016; Scheiter & van Gog, 2009; van Gog & Scheiter, 2010). Thus, this method seems highly suitable as a diagnostic instrument that could also be used for the design of adaptive learning environments.

Until recently, the use of eye tracking was both cumbersome and expensive; however, this has changed to some extent with the development of customized systems. Importantly, at the moment these systems provide relatively easy-to-use methods for analyzing eye-tracking data offline after the learning took place. However, online analysis methods have not yet been implemented in the software packages. Therefore, one important aspect of the development of the multimedia learning system described in the next section was to allow for an online analysis of the learners’ eye movements as a prerequisite for incorporating adaptivity.

The Adaptable and Adaptive Multimedia System (AAMMS)

The Adaptable and Adaptive Multimedia System (AAMMS) is a multimedia learning environment that was developed in an interdisciplinary project with researchers from computer science, psychology, and educational technology. It is based on the ILIAS 4 Open Source Framework (www.ilias.de) and uses the infrastructure of an established ILIAS installation to present multimedia content as well as instructional interventions such as prompts. The AAMMS allows for different modes of regulation: On the one hand, learners can adapt the instruction to their own information needs by choosing which content should be displayed in which format and which types of support they would like to receive (adaptable mode). On the other hand, the AAMMS offers an Adaptive Learning Module (ALM) that allows for implementing adaptive instruction based on rapid assessments and eye tracking (adaptive mode). Both modes can also be switched off, in which case the multimedia content is displayed as pre-determined by an instructor or researcher (fixed mode).

The User Interface

Figure 9.1 shows the user interface with its five main areas. The navigation tree (1) provides access to the learning units by means of a hierarchical menu. The navigation bar (2) offers page-by-page browsing. Each learning unit is made up of several representations such as written and spoken texts, schematic and realistic images, videos and animations. In adaptable learning scenarios, where learners can decide upon the learning content themselves, different representation formats can be accessed via the media shelf (3). The content area (4) displays the representations that were selected by a learner (adaptable mode), selected by the system (adaptive mode), or pre-determined by an instructor or researcher (fixed mode). The support area (5) can be used to offer prompts or other interventions to foster specific cognitive learning activities.

Fig. 9.1 The user interface of the Adaptable and Adaptive Multimedia System (AAMMS)

When learning in the adaptable mode of the AAMMS, users can either follow a linear pathway through the learning units or they can use the navigation tree to navigate freely in accordance with their individual learning goals. For each learning unit, the system suggests an initial combination of representations and displays them in the content area. The learners may customize this combination by dragging their preferred representations from the media shelf into the content area. A preview of each representation is displayed in the media shelf to assist in the selection process. The learners’ combinations of representations are recorded by the system. If the learners re-visit a learning unit, the stored combinations are displayed again.

In each learning unit prompts can be presented to the learners that ask them to engage more deeply with the instructional materials (cf. Ruf & Ploetzner, 2014). They are stated as questions, for example: (a) Which information is essential? (b) Which relations can be identified? (c) How is the information related to the overall topic? Learners who need additional information can obtain more specific questions by clicking on the questions. Each question has a textbox below it to take notes. The notes are automatically stored so that learners can review or change them when they re-visit the learning unit.

Moreover, it is possible to employ different types of assessment during learning. Before the learners start a new learning unit, they may be asked questions about the unit just completed. These questions can be used either by the learners to self-assess and monitor their current level of understanding or by the system to trigger adaptations of the learning material. Moreover, in conjunction with an eye tracker, the ALM included in the AAMMS allows analyzing students’ eye movements online and adapting the instruction based on this analysis.

System Architecture of the Adaptive Learning Module (ALM)

The ALM offers predefined adaptivity functions that authors can implement in their learning environments. These adaptivity functions allow adaptive behavior of the learning platform in response to a user’s behavioral data (Schmidt, Wassermann, & Zimmermann, 2014). The adaptivity functions are mainly based on learner data from two sources: a learner’s eye movements and answers to rapid assessment verification tasks.

The eye-tracking application of the ALM connects the eye-tracking hardware and the web-based learning environment (cf. Wassermann, Hardt, & Zimmermann, 2012). In particular, the ALM allows receiving, analyzing, and responding to eye-tracking data via a web-socket interface in real time. The application’s adaptivity is based on the capture of gaze fixations on pre-defined areas of interest (AOIs). When a user fixates an AOI for a predefined time, a fixation is registered and recorded. The system allows counting the number of fixations, their overall duration (dwell time), and the frequency of transitions between AOIs, that is, when learners move their visual attention from one AOI (e.g., a text) to another (e.g., a picture) and vice versa. The data are recorded for each learning unit separately and are analyzed once a learner indicates that he or she wants to proceed to the next learning unit. For each learning unit, threshold values can be defined that need to be reached for the learning behavior to be considered adequate. If the learning behavior remains below the threshold values, a system response is generated and the learner is prevented from proceeding to the next learning unit. For instance, when the ALM registers a reading time that is too short for a text in a given learning unit, it can prompt the learner to re-read the text, or the text can be highlighted to nudge the learner into rereading it (for details on the adaptivity of the ALM and its technical implementation, see Schmidt et al., 2014).
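
The three measures named above (fixation count, dwell time, and transitions between AOIs) and the threshold check at the end of a learning unit can be illustrated as follows. This is a sketch under assumptions, not the ALM implementation: the `Fixation` record, the function names, the prompt wording, and the threshold parameters are all hypothetical, and the fixations are assumed to be already assigned to AOIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    aoi: str          # name of the fixated area of interest
    duration_ms: int  # fixation duration in milliseconds


def aoi_metrics(fixations):
    """Return per-AOI fixation counts, per-AOI dwell times, and the number of AOI transitions."""
    counts, dwell = {}, {}
    transitions = 0
    previous = None
    for fix in fixations:
        counts[fix.aoi] = counts.get(fix.aoi, 0) + 1
        dwell[fix.aoi] = dwell.get(fix.aoi, 0) + fix.duration_ms
        if previous is not None and previous != fix.aoi:
            transitions += 1  # gaze moved from one AOI to a different one
        previous = fix.aoi
    return counts, dwell, transitions


def check_unit(fixations, min_dwell_ms, min_transitions):
    """Return one prompt per unmet threshold; an empty list means the learner may proceed."""
    _, dwell, transitions = aoi_metrics(fixations)
    prompts = [
        f"Please re-read: {aoi}"
        for aoi, minimum in min_dwell_ms.items()
        if dwell.get(aoi, 0) < minimum
    ]
    if transitions < min_transitions:
        prompts.append("Please relate the text to the picture")
    return prompts
```

For example, a learner who fixates a text three times but a picture only briefly would receive a prompt for the picture if its dwell time falls below the (assumed) minimum for that AOI.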

Adaptivity functions can also be based on a learner’s performance in interspersed rapid assessment tasks (e.g., Kalyuga, 2008; Renkl et al., 2015). The ALM supports overlay prompts on the screen for asking, for example, multiple-choice questions. This ALM function can be used to easily integrate rapid assessment tasks into the learning environment. When learners answer a rapid assessment task, the system compares the response with the predefined answers and reacts accordingly. For example, the system can limit the learner’s ability to navigate through the learning content by temporarily disabling the “continue” button and asking the learner to re-read highlighted areas of the learning material. The system can also present other learning aids such as prompts asking the learners to rethink particular aspects of the learning content or to write down a self-explanation.
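
One way to picture the navigation restriction described above is as a small piece of state per learning unit. The sketch below is purely illustrative and makes no claim about how the ALM stores this state; `UnitState` and `handle_answer` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class UnitState:
    continue_enabled: bool = True
    highlighted: list = field(default_factory=list)  # passages the learner should re-read


def handle_answer(state, given, expected, passage):
    """Compare a learner's answer with the predefined one and update the navigation state."""
    if given == expected:
        # Correct answer: release navigation and remove all highlighting.
        state.continue_enabled = True
        state.highlighted.clear()
    else:
        # Incorrect answer: block the "continue" button and highlight the relevant passage.
        state.continue_enabled = False
        if passage not in state.highlighted:
            state.highlighted.append(passage)
    return state
```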

Empirical Evidence

Two sets of studies have been carried out evaluating the effectiveness of the ALM for supporting learning from multimedia. In the first set of studies, we tested ways to best close knowledge gaps that remain during learning with multimedia learning materials. In the second set of studies, the ALM was used to detect potential learning problems in real-time using eye tracking and to change the design of the instruction adaptively. Both sets of studies will be sketched next.

Closing Knowledge Gaps During Learning with Multimedia

This section presents two example studies on how best to close learners’ knowledge gaps by an adaptive procedure based on rapid assessment tasks. An initial study addressed the question of whether eye-tracking indicators can be used to reduce the number of presented rapid assessment tasks and, thereby, learning time without the drawback of overlooking many of the learners’ knowledge gaps. The second study evaluated variants of restudy prompts that can be presented when a learner fails to correctly answer a rapid assessment task. Hence, this second study implemented assistive adaptivity.

In the first study (Skuballa, Leber, Schmidt, Zimmermann, & Renkl, 2016; N = 60 university students), we tested whether eye-tracking data can be used to select and, thereby, reduce the number of rapid assessment tasks without the disadvantage of missing potential knowledge gaps. More specifically, we compared two conditions: (a) In the full-presentation condition, we provided all available rapid assessment tasks; and (b) in the adaptive-presentation condition, we provided a rapid assessment task whenever the eye-tracking data hinted towards a potential knowledge gap (e.g., very short dwell time at a certain picture). We found that the selection of rapid assessment tasks increased their hit rates (i.e., enhanced diagnostic sensitivity) in that these tasks were answered more often incorrectly as compared to tasks in the full-presentation condition (43% vs. 32%). The adaptive presentation also reduced the learning time by about 17% without compromising learning outcomes; the latter were comparable in both conditions. Overall, our findings suggest presenting rapid assessment tasks based on eye-tracking indicators hinting towards potential knowledge gaps. The main advantage is that the learners need less study time.
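
The logic of the adaptive-presentation condition, and the hit rate used to compare the two conditions, can be sketched as follows. The code assumes that each rapid assessment task is linked to one AOI and that a single dwell-time threshold is used; both are simplifications made for illustration and are not details reported for the study. The function names are hypothetical.

```python
def select_tasks(tasks, dwell_ms, min_dwell_ms):
    """Keep only the tasks whose linked AOI was looked at for less than min_dwell_ms.

    tasks: list of (task_id, aoi) pairs; dwell_ms: mapping from AOI to dwell time in ms.
    An AOI that was never fixated counts as a dwell time of 0.
    """
    return [task for task, aoi in tasks if dwell_ms.get(aoi, 0) < min_dwell_ms]


def hit_rate(answers_correct):
    """Share of presented tasks answered incorrectly, that is, tasks that revealed a gap."""
    if not answers_correct:
        return 0.0
    return sum(not ok for ok in answers_correct) / len(answers_correct)
```

In the full-presentation condition every task would be shown regardless of gaze behavior, which corresponds to skipping `select_tasks` altogether.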

Beyond the question about which rapid assessment tasks should be presented, it is an open question which type of restudy prompt is best provided in the case of a knowledge gap. A prompt may encourage a learner to close the very specific knowledge gap that has been detected by a rapid assessment task (e.g., the fact that it is the nucleus where DNA doubles during mitosis). Such a specific prompt might be a parsimonious intervention but it may fail to address a potentially “bigger problem”: The learner may miss not merely a specific piece of knowledge, but his or her knowledge representation may be incomplete with respect to a broader sub-area of the learning contents (e.g., what happens in general in the nucleus during mitosis). In the latter case, a prompt that encourages not just looking up the specific missing knowledge piece, but considering the “field” of related knowledge pieces as well might have broader effects on learning outcomes. Such broader and more unspecific prompts, however, have the potential disadvantage of being less efficient than specific prompts when just such a specific piece of knowledge is missing. Furthermore, unspecific prompts can induce redundant (i.e., unnecessary) processing of already understood materials (cf. the redundancy effect; Sweller, Ayres, & Kalyuga, 2011).

Renkl, Skuballa, Schwonke, Harr, and Leber (2015; Exp. 2) compared the effects of specific and unspecific restudy prompts (i.e., focusing on a very specific piece of knowledge or the “field” of related knowledge pieces; N = 41 university students). In the specific prompts condition, the relevant text passages were highlighted by darkening the less relevant information on the page. The prompt requested learners to restudy the relevant passage in order to solve the task correctly, and the task was repeated (Fig. 9.2). In the unspecific prompts condition, the learners were asked to restudy the material in order both to figure out the direct answer to the question and to explore the broader context (Fig. 9.3).

Fig. 9.2 Screenshot of a specific prompt (taken from Renkl et al., 2015)

Fig. 9.3 Screenshot of an unspecific prompt (taken from Renkl et al., 2015)

We assumed that specific prompts would be superior in repairing the specific knowledge gaps identified by the rapid assessment tasks and in acquiring knowledge about the central issues of the mitotic process (as these issues were covered by the rapid assessment tasks). We expected that unspecific prompts would be more effective in fostering knowledge about more general issues related to mitosis. The results showed that both types of prompts repaired the specific knowledge gaps in most cases (in over 80% of the cases). Moreover, we found a general superiority of unspecific prompts, thereby suggesting that knowledge gaps should be closed by unspecific restudy prompts.

Overall, adaptive systems based on a rapid-assessment procedure should provide rapid assessment tasks only if eye-tracking indicators hint towards a potential knowledge gap. If the learners cannot correctly answer a rapid assessment task, they should receive prompts that ask them to restudy the corresponding field of related knowledge pieces.

Adapting the Multimedia Design in Response to Learners’ Eye Movements

Previous research has shown that some learners have difficulties in adequately using effective cognitive processes like selection, organization, and integration while processing multimedia materials (e.g., Mason et al., 2013). To support learners by providing them with personalized, just-in-time instructional support, the ALM was used to monitor, analyze, and modify the learners’ individual processing behavior online based on the learners’ eye movements in two studies described below. In contrast to the aforementioned set of studies, the adaptivity mechanism relied on directive adaptivity in that the system’s response led to a change in the instructional format of the material, which learners were forced to make use of.

In order to tailor the ALM to the population under study, Schubert et al. (n.d.) first determined threshold values in a pre-study. In this study (N = 32 students), patterns of eye movements were identified that were suited to distinguish successful versus less successful learners in a non-adaptive multimedia learning session on mitosis. Results showed that successful learners had longer fixation times and higher fixation counts on text and pictures as well as more transitions between text and pictures than less successful learners. These findings were then used to implement the gaze-based adaptive system that analyzed learners’ eye movements during learning and altered the presentation of the materials according to learners’ viewing behavior. Whenever learners showed a viewing behavior similar to that of the unsuccessful learner group in the pre-study (i.e., too short fixation times on either text or pictures or too few text-picture transitions), the system presented the same content in an instructional design that should prompt adequate processing. In particular, whenever either the text or the picture on a given page was processed for too short a time (i.e., below the threshold values derived from the pre-study), the text or the picture was enlarged, thereby covering most of the screen (Fig. 9.4, left panel). In case of too few text-picture transitions, the page design was altered in that corresponding elements from the text and the picture were then highlighted using the same colors (Fig. 9.4, right panel), thereby signaling the conceptual relations between the verbal and pictorial information (cf. Richter et al., 2016, for the effectiveness of multimedia integration signals). The system enforced processing of the redesigned multimedia materials in that these were presented for a fixed amount of time before the students were allowed to proceed to the next page of the learning materials. This next page was again presented in the standard layout without enlargements or color coding.
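
The decision rules described above can be summarized in a short sketch. It is a simplified illustration under assumptions rather than the system's actual code: the names `PageGaze`, `Thresholds`, and `choose_adaptations`, the action labels, and all numeric values are hypothetical, and the text does not specify how the system handled several rules firing on the same page (here, all triggered adaptations are simply returned).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageGaze:
    text_fixation_ms: float     # total fixation time on the text
    picture_fixation_ms: float  # total fixation time on the picture
    transitions: int            # number of text-picture transitions


@dataclass
class Thresholds:
    text_fixation_ms: float
    picture_fixation_ms: float
    transitions: int


def choose_adaptations(gaze: PageGaze, limits: Thresholds) -> List[str]:
    """Map one page's gaze data onto the redesigns described in the text."""
    actions = []
    if gaze.text_fixation_ms < limits.text_fixation_ms:
        actions.append("enlarge_text")      # text processed too briefly
    if gaze.picture_fixation_ms < limits.picture_fixation_ms:
        actions.append("enlarge_picture")   # picture processed too briefly
    if gaze.transitions < limits.transitions:
        actions.append("color_code")        # too few text-picture transitions
    return actions


# Made-up threshold values and gaze data for one page.
limits = Thresholds(text_fixation_ms=20000.0,
                    picture_fixation_ms=8000.0,
                    transitions=6)
gaze = PageGaze(text_fixation_ms=25000.0,
                picture_fixation_ms=3000.0,
                transitions=2)
actions = choose_adaptations(gaze, limits)
print(actions)
```

An empty list corresponds to the standard layout, that is, the learner proceeds without any redesigned page being shown.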

Fig. 9.4 Exemplary pages of the adaption: Zooming-out of the picture (left panel) and presentation of the color-coded version (right panel)

The first study with the adaptive system (N = 79 students) investigated whether the adaptive multimedia learning system would have any beneficial effects on learning compared to a non-adaptive, fixed presentation of the same materials. Students learned with either the adaptive or the non-adaptive multimedia instruction about mitosis while their eye movements were recorded. After learning, their recall and comprehension of the materials were assessed. As thresholds, we used the mean fixation times on either text or picture and the mean number of transitions of the group of non-successful learners identified in the pre-study and added one standard deviation to each of them. Results showed that irrespective of learners’ prior knowledge, the gaze-based adaptive system had no effect on the effectiveness of multimedia learning. A possible reason for these findings was that the thresholds might have been set too high so that successful learners were also falsely identified as poor learners.

The aim of the second study (N = 58 students) conducted with the gaze-based adaptive system was hence to improve the adaptivity mechanism of the system. To this end, we adjusted the threshold values by choosing the mean values of the group of unsuccessful learners from the pre-study (rather than adding one standard deviation to them). This way we expected only learners with inadequate processing behavior to receive personalized instructional support. Again, students either learned with the adaptive or the non-adaptive multimedia instruction about mitosis. Results showed no effects for recall performance. For comprehension, there was a significant interaction between experimental condition and students’ prior knowledge: stronger students scored marginally higher with than without adaptive instructional support, whereas weaker students scored significantly worse with adaptive instructional support. These results can be interpreted in at least two ways: First, it may have been the case that the adjusted thresholds were now too restrictive so that learners with inadequate processing behavior were falsely identified as successful learners. Second, there might have been too little adaptive instructional support especially for learners with lower prior knowledge. Further studies are required that look more closely at both the assessment and the support component of the adaptive multimedia learning system, before conclusions regarding the effectiveness of adjusting instruction to a learner’s gaze behavior can be drawn.
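
The difference between the two threshold settings can be made concrete with a small sketch. The function `threshold`, the data, and the use of the sample standard deviation are assumptions made for illustration; the chapter does not report the actual values or the exact computation.

```python
from statistics import mean, stdev


def threshold(unsuccessful_values, sd_multiplier):
    """Threshold = group mean plus a multiple of the group's standard deviation.

    sd_multiplier=1 mirrors the first study (mean + 1 SD);
    sd_multiplier=0 mirrors the second study (mean only).
    The sample standard deviation is an assumption of this sketch.
    """
    return mean(unsuccessful_values) + sd_multiplier * stdev(unsuccessful_values)


# Made-up fixation times (in seconds) of unsuccessful pre-study learners.
unsuccessful = [10, 20, 30]

study1_limit = threshold(unsuccessful, sd_multiplier=1)
study2_limit = threshold(unsuccessful, sd_multiplier=0)

# A learner whose fixation time falls between the two thresholds receives
# the adaptation under the first setting but not under the second.
learner_fixation = 25
flagged_study1 = learner_fixation < study1_limit
flagged_study2 = learner_fixation < study2_limit
print(study1_limit, study2_limit, flagged_study1, flagged_study2)
```

The example shows the trade-off discussed above: the higher threshold flags more learners, including possibly successful ones, whereas the lower threshold may miss learners with inadequate processing behavior.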

Conclusions

Adaptive learning environments are supposed to enhance learning by providing personalized support for every individual student. The multimedia information environment described in this chapter shows how challenging the design of such a system can be. From a learning sciences perspective, the challenges pertain to at least three aspects that match well onto the description of adaptive systems provided by Shute and Zapata-Rivera (2008).

The first challenge is to decide what constitutes a learner state that requires adaptive instruction and how this state should be diagnosed, thereby pertaining to the assessment component of adaptive systems. In the sample studies sketched in this chapter, two diagnostic approaches have been implemented: rapid verification tasks and eye movements. Rapid verification tasks address students’ current knowledge in an explicit fashion; hence, their interpretation is rather straightforward. Eye movements, on the other hand, are more ambiguous in this respect. For instance, the relation between a given eye movement behavior and learning outcomes appears to vary across different materials and learners (cf. Scheiter & Eitel, 2016; Schwonke et al., 2009), requiring extensive pretesting to calibrate these relations before an adaptive system can be deployed. Moreover, eye movements are not just indications of students’ cognitive processes while learning, but they may also point towards students’ motivational states such as boredom and disengagement (D’Mello et al., 2012), thereby further complicating the state of affairs. Against this background, Skuballa et al. (2015) used an adaptation procedure in which eye-tracking data and rapid assessment tasks were combined. First evidence suggested that such a combination is promising.

Despite these difficulties, there is a current trend of considering more rather than fewer physiological indicators as possible candidates for assessing learner states as a prerequisite for adapting instruction—including emotional expressions, skin responses, and brain activity parameters (see also Azevedo et al., 2017; Spüler et al., 2017). The promise of this multivariate approach is that triangulation of these different data sources will allow better disambiguation of a student’s current state of learning. Moreover, these advanced learning technologies often rely on more complex methods of data analyses such as machine learning algorithms to determine patterns of parameters related to successful learning (see also Azevedo et al., 2017; Spüler et al., 2017; Winne et al., 2017). Using these methods thus allows describing learner behavior and states in a multidimensional space that comprises emotional, cognitive, and metacognitive components. Moreover, there is increasing interest in learning analytics based on large data sets to account for the complexity of (self-regulated) learning that results from interindividual as well as intraindividual differences in learning processes (Winne et al., 2017).

The second challenge pertains to the question of which instructional support measures promote learning best, thereby addressing the response component of adaptive systems. While instructional design research has made enormous progress in determining how instruction should be delivered in order to be effective, there still remains quite a bit of ambiguity in this regard. Instructional design variants do not prove effective in all situations; rather, their effectiveness seems to be tied to certain boundary conditions that are not yet fully understood (cf. Renkl & Scheiter, 2015). For instance, the effectiveness of prompts appears to depend on their focus (specific vs. general), whether they require externalizing of knowledge, or reproduction of knowledge versus generating new information via inferencing, to name just a few dimensions.

The third challenge results from the first two and refers to the question of matching a learner state with the most adequate system response. Thus, it addresses the rules that should be used to link assessment and response to each other. It is yet unclear whether we will ever be able to come up even with heuristics telling instructional designers which variant of an instructional material will be most suited for which learner. What has become very evident in the past years of educational research is that a student’s level of prior knowledge plays an important role in this regard. That is, for many multimedia designs it has been shown that design variants that clearly improve learning for students with less prior knowledge have no or even detrimental effects for students with high prior knowledge (cf. expertise reversal effect, Kalyuga et al., 2003; Kalyuga & Renkl, 2010). Hence, this research suggests that adaptivity mechanisms need to consider a person’s prior knowledge. However, it is still an open question whether there are other variables that have similar effects on the effectiveness of different instructional designs.

In the present chapter, we described different versions of a fully adaptive system that diagnoses a learner’s behavior and responds accordingly. Based on the results of the two sets of studies reported, it looks as if assistive adaptivity is more promising than directive adaptivity. However, even though we used similar learning materials and assessments, there are still a number of differences between the sets of studies beyond the adaptivity mechanism. Most importantly, the studies also differ in the assessments (eye tracking plus rapid assessment tasks to diagnose a learner’s knowledge state vs. eye tracking to diagnose learning behavior) based on which a response was given. Thus, further studies are needed that systematically address the question of how much learner control should/can be offered during regulation.

Irrespective of the adaptivity mechanism that had been implemented, we faced all three of the aforementioned challenges in the design of our adaptive system. An alternative to this approach is offered by systems that provide an assessment of a learner’s current state and/or his/her learning processes, which is then communicated to the learner (e.g., via a visualization of his/her state that is described based on multichannel data, see Azevedo et al., 2017; nStudy, Winne et al., 2017). Importantly, these systems then leave regulation to the learner, thereby implementing the extreme of assistive adaptivity. They thus rely on the implicit assumption that while learners may experience difficulties in accurately monitoring their current state, they are very capable of regulating their learning behavior relative to their states. These systems thereby circumvent the second and third challenges mentioned above. That is, they do not require any insight into possible system responses and how these match with learner states.