
1 Dynamic Visualisations and Learning

1.1 General Properties of Dynamic Visualisations

The words picture, visualisation, image, and depiction are used interchangeably in education and psychology research to describe representations that can convey stationary or moving elements. In the latter case, the representations are generally called dynamic pictures, dynamic visualisations, or, in the shorter and preferred form, animations. So, what do education and psychology researchers mean by this word? According to Roncarrelli [84], animation can be defined as:

Producing the illusion of movement in a film/video by photographing, or otherwise recording, a series of single frames, each showing incremental changes..., which when shown in sequence, at high speed, give the illusion of movement. The individual frames can be produced by a variety of techniques from computer generated images, to hand-drawn cells. (p. 8)

One important aspect to highlight in this definition is that animations are composed of a series of single frames (i.e. static pictures). The quantity of these still-frames changing per unit of time can be defined as temporal granularity [88]. An animation with many frames per unit of time would have higher temporal granularity than one with a lower number of frames in the same amount of time. Figure 1 shows two animations with different temporal granularities, 12 frames per second (fps) versus 24 fps.

Fig. 1 Comparison between two animations with different temporal granularities

Following mainstream conventions, the animated cartoon standard of 12 fps represents a lower temporal granularity than the film industry norm (24 fps) or the US analog television and video convention (NTSC, 30 fps).

The standard of 12 fps has been used extensively in animated cartoons [cf. 109] because it is sufficient to achieve the illusion of movement, due to persistence of vision, that Roncarrelli mentions. As can be inferred, when the pace of the dynamic visualisation is progressively reduced, the illusion of animation gradually disappears, until the frames are perceived as static pictures. Conversely, under typical viewing conditions (≥12 fps), the individual static frames are hardly noticed and, more importantly, each frame is visible for only an instant (≤1/12 s). This means that one distinguishing property of animations is their transiency.
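To make these transiency figures concrete, the time each individual frame remains visible is simply the reciprocal of the frame rate. The following short Python sketch (ours, for illustration only; the function name `frame_duration` is hypothetical, not drawn from the cited literature) computes it for the three conventions mentioned above:

```python
# Per-frame visibility as a function of temporal granularity.
# At typical animation frame rates, each static frame is displayed only
# briefly, which illustrates the transient nature of dynamic visualisations.

def frame_duration(fps: float) -> float:
    """Return how long each individual frame is visible, in seconds."""
    return 1.0 / fps

for fps in (12, 24, 30):  # cartoon, film, and NTSC conventions
    print(f"{fps} fps -> each frame visible for {frame_duration(fps):.4f} s")
```

At 12 fps each frame is visible for roughly 0.083 s, and higher granularities shorten this window further, which is why individual frames go unnoticed during normal viewing.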

A second important aspect to highlight from the definition above is that the approach employed to produce the animations, whether vector or raster graphics, is irrelevant. On the one hand, vector graphics, the branch of design that illustrates line drawings and shapes, is associated with the traditional concept of animation and cartoons; on the other hand, raster or bitmap graphics, the branch of production that records audiovisual frames, is associated with the traditional concept of film and video [84]. As an example, consider a chemistry multimedia presentation that includes three different dynamic visualisations: (1) a video (bitmap) of the procedures for mixing substances in the laboratory; (2) a symbolic representation (2D vector) of the equations describing reactants and products; and (3) a molecular depiction (3D vector) of the changes occurring in chemical bonds [cf. 54]. Although these three screens show very different visualisations, produced by different methods, according to the definition used here we can consider all of them animations.

Then, why not distinguish dynamic depictions according to these subcategories? Although we acknowledge that vector and bitmap graphics have obvious visual differences [cf. 43], we want to focus on the similarity among animated cartoons, 3D depictions, films, videos, television, etc., which is their transiency. Thus, any transient visualisation will henceforth be referred to as an animation or dynamic visualisation. As can be inferred, this transient property of animations affects learning, as we describe next.

1.2 Relevance of Dynamic Visualisations to Learning

Animations are popular and their influence transcends educational boundaries. In fact, dynamic visualisations are so ubiquitous that viewers may confound their commercial-recreational and informational-educational roles [cf. 102]. For example, Wineburg (2000) noted that both students and their parents made references to fictional movies such as Forrest Gump and Schindler’s List to support claims in their arguments about historical events [as cited in 108]. Also, Lewis et al. (1997) showed that students employed detective series, science fiction movies, and television programmes as their main sources of science and genetics knowledge [as cited in 52]. Similarly, a study with 218 subjects (12–13 years old) who learned from material consisting of 16 short animations and 16 corresponding short text pieces showed that the students spent more time learning from the animated information. This preference for dynamic visualisation over text was unexpectedly observed even in the condition where subjects were notified that they would be assessed later, although it had been predicted that they would choose text, as this representation is conventional for school evaluations [13]. These three examples show that dynamic visualisations are so appealing that they are preferred for informational-educational purposes over other instructional materials that might be more suitable for that role.

Interestingly, this preference can be exploited by using animations as motivational enhancers of other educational media or materials. This is not the same as stating that the only educational purpose of dynamic visualisations is a boost in motivation. Above all, researchers agree that the design of instructional animations must be aimed beyond motivation per se. In this sense, dynamic visualisations can give students opportunities beyond the spatial or temporal limitations of learning from static materials, as the following examples show:

  • To shorten, via fast motion, the duration of phenomena too slow to be observed in real time [102]. Fast motion can also be used to analyse dynamic recordings that extend over lengthy periods [29].

  • On the contrary, to study the details of movement via slow motion, irrespective of the original speed of the phenomenon [102]. For example, this technique is very important for studying motor tasks, facial expressions of emotion, and nonverbal communication [29].

  • To zoom in on microscopic details or on faraway phenomena in outer space [102].

  • To study events that occur in unapproachable places, such as inside the human body, underwater, in extreme temperature conditions [102], or in situations that a live observer would rather avoid [29].

  • To watch otherwise invisible events, by using X-rays, infrared, gamma, ultraviolet, holographic techniques, and many others [102]. For example, X-ray imaging can assist medical diagnosis [29]. Likewise, dynamic visualisations can help explain phenomena that are not inherently visual, such as forces, energy or electrical current [12].

  • To analyse the same phenomenon multiple times [102]. Also, the analysis can be performed by many reviewers, which can benefit interjudge reliability [29].

  • To compare simultaneously two or more events by splitting the screen accordingly. This can be a valuable tool in sports training, where a close-up view can be juxtaposed to a wider setting [31].

According to these capabilities, and inspired by research into explanations and maps, Tversky et al. [104] gave the following summary on how animations might promote learning:

Explaining a process can be supplemented with analogies to other processes, with enlargements of subprocesses.... Just like good explanations and good maps, animations can include other views, other scales, other examples, other processes; they can use language and extra-pictorial devices to connect them. Animations can exaggerate and minimize and distort and highlight information. (p. 281–282).

Because of this potential, an enormous variety of animations, videos and films have been designed to assist learning, including school-age instruction, university settings, advanced and continuing education, job training, health education, safety education, medical treatment preparation, and special population instruction [e.g. 30]. However, educational animations driven by aims other than student learning, chiefly technology showcasing, may prove ineffective. Such a focus exhibits what Mayer [65] criticised as a technology-centred approach, which forces learners to adapt to technologies, instead of shifting in the desirable opposite direction towards a learner-centred approach. Hence, for animations to be effective learning tools they must be designed according to the cognitive architecture of the learner.

In conclusion, animations may have educational relevance and potential, provided that limits of the human cognitive architecture are considered when designing these dynamic visualisations. Acknowledging these limitations is also fundamental to designing other materials for learning, for example, the instructional static visualisations that we incorporate in the next discussion.

1.3 Learning Outcomes with Dynamic Versus Static Visualisations

When learning a dynamic system from static visualisations, the student must deduce the movement of the components involved, through a process described as mental animation [39]. In contrast, when learning the same dynamic system from an animation, the movement of the components is shown directly [40, 68]. If we consider that it may be more difficult to deduce the dynamics of a moving system than to observe them directly, there appears to be a cognitive advantage of a dynamic visualisation as compared to its static equivalents [40, 68]. One example of better learning with animation versus static pictures is an experiment with 119 fourth and fifth graders from an urban elementary school, who were given the topic of Newton’s laws of motion. In this study, children in the animated conditions scored significantly higher on the posttest than those in the static pictures conditions [82]. Another experiment, with 160 first-year university students who learned from two multimedia instructions on astronomy and geology, revealed that the dynamic condition (12 animated steps of each phenomenon) outperformed the static version (12 key frames of each) in tests of retention. Similarly, the researchers observed that, when the learning setting was in dyads of students, the animation groups presented higher transfer scores than the static conditions [81]. These findings tend to support the view that better learning is achieved after watching an animation of a dynamic system rather than after mentally inferring the dynamics from static visualisations.

However, the fact that a dynamic depiction shows directly the motions of the components of the system, can also be a cognitive disadvantage, since it may promote “mental laziness” in the learners. According to this view, if learners are able to invest cognitive effort to perform a mental animation when given static pictures, giving them the already animated visualisation may impede this form of active generative processing that should enhance learning [40, 89].

Whether an animation fosters either a learning advantage or a disadvantage over static visualisations has not been resolved by research [e.g. 78]. For example, after reviewing studies of chemistry multimedia, Kozma and Russell [54] concluded: “We are not able to say... for which topics or students it is best to use animations versus still pictures or models. Nor can we say how these various media can be used together...” (p. 424). Another example that fails to show a significant advantage of animation or static pictures is an experiment with 82 students learning how surfactants dissolve dirt from a surface, in which the authors concluded that “neither animations nor static pictures are generally superior” [44]. To add to this inconclusiveness, there are some voices, for example Schnotz and Lowe [88], who consider it irrelevant to compare statics versus dynamics:

In our everyday lives, we do not continually compartmentalize our environment into static and dynamic parts... Even when we move through a static environment, our visual field is continuously changing... On this basis, it seems difficult to justify a sharp distinction between learning from animated and learning from static pictures, and there appears to be little reason to assume that animations are necessarily easier or more demanding than static pictures. (p. 305).

In addition, to complicate the dynamic versus static comparisons further, there is research that goes beyond these two extremes to include middle-ground conditions, which either: (1) lower the temporal granularity of the animation, or (2) show only some important frames of the animation. In one example of the first situation, where the learning content was the kangaroo hopping cycle, participants watching the 1 fps condition outperformed another group that observed the same depiction at the higher granularity of 8 fps [59]. In one example of the second scenario, better learning outcomes resulted when the participants watched either an animation of a flushing cistern or three key frames, as compared to a single static image [41].

It is important to mention that the lack of definitive conclusions supporting dynamic or static visualisations may stem from the fact that some studies fail to control all the experimental variables that distinguish the two representations. In line with this, a review on animation found that the seeming victories of dynamic over still pictures occurred mainly because the former presented more or better information, or included facilitators of learning such as interactivity [103]. Take the case of a study with 415 university students who were given the topic of the electrochemical features of a flashlight in a lecture supplemented with either dynamic or static visualisations [112]. As the researchers discussed, the better learning of the class exposed to animations could also be explained by alternative factors: (1) multiple exposures (in the animation class the depictions were played several times), or (2) motivational effects on attention (the animations were coloured, but the static pictures were black and white). Nonetheless, this does not mean that there is no evidence for better outcomes of dynamic versus static pictures in studies that did control these extra variables. For example, a study with 112 university participants found a significant medium-to-large effect favourable to animations in the retention test of concepts about the rock cycle, even when the interactivity in both conditions was equivalent [55].

In favour of static visualisations, there is accumulating evidence showing better learning from static pictures than from animations. For example, in a study that compared illustrations plus text versus narrated animations, Mayer et al. [68] found that the groups with static pictures outperformed the groups with dynamic visualisations. Although it can be argued that this investigation did not control for the different media employed (the comparisons were between paper illustrations and computer animations), the contents depicted were varied enough (lightning, ocean waves, toilet tanks and car brakes) to at least show an important tendency favouring static pictures. More evidence comes from a review of animation in e-learning, where Clark [21] concluded: “these studies provide preliminary evidence that at least for relatively simple procedures and processes, nonanimated treatments that communicate motion such as line drawings with motion indicators can result in learning equivalent to that resulting from animations”. Similarly, Koroghlanian and Klein [53], who found in an experiment with biology students that animation required more instructional time than static pictures without a corresponding improvement, concluded that “[w]hether to include animation or not in multimedia... is still a matter of instinct, not research, and the final decision may be dictated by pragmatic concerns such as budget or time” (p. 40).

So what explains these results in favour of static visualisations? One advantage of processing static as opposed to dynamic pictures is that, generally, static depictions focus on the fundamental steps of the processes, thus allowing learners to study exclusively the most important pieces of information [68]. Another cognitive advantage of statics over dynamics is that learners can control the speed of presentation of the static visualisations in order to meet their cognitive capacities [68]. When studying static pictures, as learners can observe permanently available depictions, they can accumulate information without speed limits via consecutive eye fixations; on the contrary, when studying dynamic pictures, information that was available at one instant disappears a moment later, obstructing the accumulation of information to be processed [40, 88]. In cognitive load theory, this phenomenon has been termed the transitory effect of animations [4], which we describe in a following section.

On the whole, it seems that there is no unique answer to whether animation or still pictures make the better learning tool. In line with this, a number of moderating variables might be considered, such as learner characteristics. For example, in an experiment with participants whose cognitive style was either visualiser or verbaliser, it was shown that visualisers performed better with animation and that verbalisers showed a (non-significant) trend to perform better with static pictures [44]. A more important learner characteristic is most likely spatial ability. A meta-analysis of 27 different experiments involving learning from visualisations revealed that the advantage of high-spatial-ability learners over low-spatial-ability learners was significantly larger with static pictures than with dynamic pictures. In other words, although high-spatial learners still learned better with animations than low-spatial learners, the difference was reduced as compared to learning with statics. The researchers remarked that, because many studies of dynamic versus static visualisations had not controlled for the spatial ability of the subjects, this variable could explain the heterogeneous findings regarding comparisons between both instructional depictions [42].

Another moderating variable besides learner characteristics is depiction characteristics. One of these is the topic to be learned. In fact, some contents seem better portrayed as dynamic rather than static visualisations. A first example where animation shows an advantage is in topics that require students to visualise motion and trajectory attributes [82]. A second example is in recognising faces or facial expressions [1, 11]. However, the strongest empirical evidence for favourable learning with animations can be found in manipulative-procedural tasks. Correspondingly, a review by Park and Hopkins [78], aimed at finding instructional applications for animations, concluded that dynamic visualisations were most effectively applied for “demonstrating sequential actions in a procedural task” (p. 443). In a similar way, a more recent meta-analysis by Höffler and Leutner [43] showed that the largest mean effect size in favour of dynamic as compared to static visualisations was found when procedural-motor knowledge was depicted.

Why is learning from either dynamic or static visualisations influenced by the learning topic or task? To understand this, we need to consider the evolution of our cognitive architecture, which has been highly influential in recent developments of cognitive load theory.

2 Cognitive Load Theory and Dynamic Visualisations

2.1 General Aspects of Cognitive Load Theory

Key components of humans’ cognitive architecture are two memory subsystems called long-term memory, and working memory. Long-term memory can sustain large amounts of elements over long periods of time. Unlike long-term memory, working memory (WM) has processing limitations in both the quantity of elements to manage [22, 71], and the retention of these elements [79]. As a result, WM is a limited processor in managing cognitive load when learning.

The total cognitive load imposed on WM when learning can be divided into two different categories [97]: (1) intrinsic cognitive load, and (2) extraneous cognitive load. As its name implies, intrinsic cognitive load cannot be modified substantially by instructional interventions because it is intrinsic to the material being learnt [98]. A learning material with a high intrinsic cognitive load comprises many elements related to one another, so they must be processed simultaneously in WM; by contrast, there is low intrinsic load when the learning material consists of elements that can be processed independently, serially rather than simultaneously, in WM [19, 98]. As its name also implies, and as opposed to intrinsic load, extraneous cognitive load is not related to the complexity of the content to be learnt but to the way that this content has been instructionally designed. In other words, extraneous load is a typical problem of poorly designed educational materials. An obvious conclusion is that extraneous load must be minimised when designing instruction, as it diverts cognitive processes from the goal of learning [48].

That conclusion can be regarded as the leitmotif of cognitive load theory [for a recent review, see 99]. In brief, cognitive load theory (CLT) is a learning and instruction theory, which considers the human cognitive architecture in order to prescribe ways to manage cognitive load in learning events [47]. As reviewed by Paas et al. [77], the potential of CLT has been empirically tested in fields such as mathematics, statistics, physics, computer programming, electrical engineering, paper folding, and with different types of participants (children, young adults and older adults). In all these experiments, compared to conventional designs that did not consider the limitations of WM, the designs based on CLT required less training time and less mental effort to reach the same or significantly higher learning and transfer performances.

To summarise, instructional methods that consider the limitations of WM will be more effective than approaches that do not. Why, then, do apparently difficult tasks, such as speaking or gesturing, seem not to be constrained by these limits? This dilemma can be better understood through the lens of David Geary’s evolutionary educational psychology. Next, we explain this relatively new addition to CLT.

2.2 Cognitive Load Theory and Evolutionary Educational Psychology

Most humans effortlessly learn tasks such as speaking, recognising faces, and gesturing. As Sweller [95] argues, because these tasks are easily learnt before going to school, it is tempting to suggest that going to school is the problem. In other words, the pedagogy of schools, namely guided methods or explicit instruction, should be blamed. Additionally, it is also tempting to claim that the learning methods of pre-school children, such as pure discovery approaches, should be adopted in schools, because they come easily, even to young children. However, most of the evidence based on randomised and controlled experiments points in the opposite direction and supports the need for explicit instruction in schools [for reviews, see 50, 61].

This apparent paradox may be solved by focusing not on the methodology used to teach the task but on the task itself. For example, learning to gesture is easy not because it is taught without educational guidance, but because it is an easy task to learn. In other words, although gesturing may involve the coordination of many processes of the motor repertoire that could overload WM, it is nonetheless an easy skill for humans [95]. As evolutionary educational psychology portrays it, tasks such as understanding gesturing and nonverbal behaviour, using physical materials as tools, or learning to speak are all effortless for humans because the species has evolved those skills over many generations [34].

The Homo sapiens species had many years, and thus many opportunities, to evolve and refine, for example, the skill of gesturing. To use a term from the theory of evolution, this refinement was driven by the struggle for existence. As Darwin (1859) explained, struggling for existence means that different species compete for limited resources, with the final goal of perpetuating their own kind [as cited in 80]. Therefore, gesturing evolved as a skill because it helped the human species establish beneficial relationships for accessing essential resources, which supported the species in its survival or struggle for existence [34]. In the same vein, every other skill that has benefited humans in their survival should have evolved, and thus be part of the species’ motivational and cognitive disposition [34]. As such, these evolved skills are learned easily, without conscious effort and without explicit education [96].

Geary has named these evolved, effortless skills biologically primary abilities, which are grouped under the term folk knowledge. This primary knowledge can be subdivided into three main categories of human survival: (1) folk psychology covers social abilities for survival, such as competing for mates, recognising facial expressions, or understanding gesturing and nonverbal behaviour; (2) folk biology relates to the knowledge of local flora and fauna that assists survival, such as discriminating food from poison; and (3) folk physics includes abilities such as manipulating physical materials as tools [34].

In contrast to the primary abilities, we can mention biologically secondary abilities or secondary knowledge, which is not evolved and thus effortful. As its name implies, this secondary knowledge emerges from the primary knowledge, as the secondary scientific and academic domains converge around the primary areas of folk psychology, biology, and physics [34].

Stated another way, since species’ evolutionary adaptations proceed at a fundamentally slower pace than intellectual and academic shifts, evolution has only been capable of equipping humans to manage effortlessly biologically primary, but not secondary, knowledge [95]. This has two implications: (1) we can use the relatively easy acquisition of primary tasks to assist students in learning the more difficult secondary knowledge and skills [34], as we shall see in the last section; and (2) students need guided methods and explicit instruction that consider their limited WM to learn secondary tasks [95].

Restating the last implication, CLT and the resultant strategies to manage cognitive load apply mainly to biologically secondary knowledge [95]. For that reason, the instructional methods for dynamic visualisations that we present in the next section are generally aimed at learning tasks or topics that deal with secondary, as opposed to primary, knowledge.

2.3 Instructional Strategies to Manage Cognitive Load in Dynamic Visualisations

In order to manage cognitive load when designing educational materials for biologically secondary knowledge, CLT has fostered diverse strategies. They are generally referred to as cognitive load effects [e.g. 97], but some are also called multimedia learning principles, especially in the context of Mayer’s cognitive theory of multimedia learning [for a review, see 65].

Building on those strategies, CLT has opened research directions aimed at methods for designing better dynamic visualisations. In line with this, one appealing new field of study is the transitory effect of animations [4]. This effect deals with the fact that some educational animations can be highly transitory, meaning that as the animation progresses, the elements depicted continually disappear. As a result, this transitoriness produces a high extraneous cognitive load in learners, because they are forced to perform three cognitive tasks simultaneously in WM: (1) process the current visible information [4], (2) remember the previous elements that are no longer visible [4], and (3) integrate these two streams of information in order to comprehend the material [106]. To help students cope with these demanding mental processes, CLT has provided various instructional strategies.

Next, we will describe the following methods to manage the transitory problem of dynamic visualisations: (1) pace-control, (2) modality, and (3) attention cueing.

2.3.1 Pace-Control

In a pace-controlled (also named stepwise, self-paced, or learner-controlled) animation, the learner can control the incremental progress of the visualisation, as opposed to a continuous or system-controlled animation, in which the system determines how the display runs [89]. Hence, the pace-control effect occurs when learners have control over the speed of the animation and can thus manage its transitoriness in accordance with their own capacities [66]. As a result, a CLT prediction is that animations may be more effective learning materials if the students have control over the pace of the presentation [3].

Supporting evidence for this effect in animations is an experiment about historical inquiry that was presented in an 11-min multimedia tutorial. From this study, it was concluded that providing a pace-control feature could facilitate basic (retention) and deep (transfer) knowledge acquisition [60]. Another experiment supporting the pace-control effect was conducted with 82 university students, where the instructional content was the chemical process of dirt removal from a surface [44]. In this study, it was shown that the self-pacing groups outperformed the system-pacing conditions, and that the former also reported a lower subjective cognitive load. Extending the latter results, we could assume that, in both examples, the better performance of the self-pacing conditions might have been caused by a reduction in the cognitive load of the instructional material.

Related to the pace-control effect is the segmenting effect or principle, which advocates segmenting whole animations into shorter sections that do not overload the learners’ WM capacity [72]. An obvious CLT prediction of the segmenting or segmentation effect is that animations may be more effective learning tools if they are divided into smaller segments [3]. Figure 2 illustrates the application of either pace-control or segmenting strategies to a whole-continuous animation.

Fig. 2 Pace-control or segmenting strategies to a whole-continuous animation

We have included segmenting techniques in this pace-control discussion because the two strategies: (1) share the basic action of controlling the transitoriness of a lengthy animation, and (2) have been used interchangeably or in combination in the literature [e.g. 60, 64, 67]. Detailing the second point, we can mention a study consisting of two experiments in which university students received narrated animations in formats that combined partial student-paced (P) with whole system-paced (W) visualisations. Experiment 1 compared PW (partial student-paced followed by whole system-paced) with WP conditions, whereas Experiment 2 contrasted PP against WW. Taking the two experiments together, it was consistently shown that groups that first received the P format outperformed, in transfer tests, those students who first watched the W design. This result has a CLT explanation: starting by studying a short segment of an animation places less unnecessary load on WM than attempting to study the whole, longer animation first [67]. Although the results support the CLT explanation, this study did not isolate segmenting from pace-control effects. In other words, the experimental conditions differed on two variables instead of one: (1) animation extension (whole versus short segments), and (2) controlling agent (system versus learner paced). However, a later study that addressed this confounding-variable problem was conducted with 72 male primary school students (ages 9–11) who watched a multimedia presentation explaining the causes of day and night. In this experiment, the participants assigned to the conditions that followed either the pace-control or the segmentation method outscored the system-controlled-pace conditions on the more complex questions [38].
In addition, since the pace-control and the segmentation conditions did not differ between them, these results support use of either or both of these two strategies in designing better instructional animations.

In conclusion, the rationale that both pace-control and segmentation strategies share is to reduce the transitoriness of a whole animation by allowing pauses in the visualisation, which are needed to process both the current and the previously shown information. Consistent with this reasoning is what has been termed the piecemeal hypothesis of mental animation, whereby mentally animating a dynamic system is supposed to consist of animating the individual components one by one, in a causal chain of events, rather than mentally animating the whole system in a single process, which should be more cognitively demanding [39]. Although the piecemeal hypothesis concerns divisions of whole systems, whereas the pace-control and segmentation effects concern divisions of whole animations, both approaches imply a reduction of the cognitive load generated by the transitoriness of the visualisations.

Despite the similarity between pace-control and segmentation techniques, there is a critical difference between them, which depends on the agent who segments the animation. The depiction might be segmented by either: (1) the learner, which constitutes the pace-control strategy, or (2) the instructional designer, which constitutes the segmenting method. As a result, each of these agents influences two factors differently: (1) interactivity, and (2) segment boundaries.

Firstly, if learners segment the visualisation via pace-control, they engage in an interaction with the material, chiefly through delivery learner control [47], which is generally absent in the segmentation strategy. Although delivery control does not exploit the full potential of interactivity, it is enough to introduce learning variables that fall beyond the scope of this chapter, such as motivation and adaptation [e.g. 87]. Since these "interactivity variables" are moderators of the pace-control effect, they should be considered in the studies that employ this strategy [for overviews of interaction in animations, see 74, 111].

Secondly, if designers divide the visualisation via segmentation, they can present the information in meaningful narrative pieces, which may help learners better understand the depiction [94]. Put differently, rather than relying, via pace-control, on students' capacity to determine where to split the animation's sequence into meaningful substeps, segmentation makes this decision of finding narrative boundaries for the students. Therefore, this boundary factor should also be considered a moderator, especially of the segmentation effect [94].
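To illustrate the shared rationale of the two strategies, the segmenting approach can be sketched as a minimal playback loop that divides an animation's frames at designer-defined narrative boundaries and pauses at each boundary until the learner continues. This is a hypothetical sketch for illustration only, not code from any of the studies reviewed; the function and parameter names are our own.

```python
def play_segmented(frames, boundaries, wait_for_learner):
    """Yield frames in order, pausing after each designer-defined segment.

    frames: ordered sequence of frame identifiers.
    boundaries: sorted indices at which one narrative segment ends.
    wait_for_learner: callback that blocks until the learner resumes,
        giving working memory time to process the previous segment.
    """
    start = 0
    for end in list(boundaries) + [len(frames)]:
        for frame in frames[start:end]:
            yield frame          # render one transient frame
        if end < len(frames):
            wait_for_learner()   # pause at the segment boundary
        start = end

# Example: a 6-frame animation split into three 2-frame segments.
pauses = []
shown = list(play_segmented(list(range(6)),
                            boundaries=[2, 4],
                            wait_for_learner=lambda: pauses.append("pause")))
# shown contains all 6 frames; two pauses occurred at the boundaries.
```

Replacing `wait_for_learner` with a system-timed delay would turn the same loop into a system-paced segmentation design, while exposing a "continue" control to the learner yields a pace-control design, mirroring the distinction drawn above.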

2.3.2 Modality

It is commonly accepted that WM consists of two relatively independent channels or modalities: visual and auditory [6, 7, 20]. As a consequence, it may be more efficient to use both channels simultaneously than to use either independently, especially when the learning situation involves trying to deal with the visual load of an animation that contains written descriptions. In this scenario, learners may manage the transiency of the visualisation better if they are given the descriptions in narrated format, where they can employ the underused auditory channel, while the visual channel processes the dynamic frames. This use of both channels is known as the modality effect, which postulates that it is better to learn from animation and concurrent narration rather than from animation and on-screen written text [58, 65, 66]. Figure 3 shows an example of the modality strategy.

Fig. 3
figure 00223

Modality strategy to an animation with on-screen text

The above explanation of the modality effect can be referred to as the visuo-spatial load hypothesis [86] or the hypothesis of expansion of effective working memory capacity [98]. In support of this effect we mention two experiments with a dual-task methodology, where the primary or learning tasks were computer-based training animations about the cardiovascular system or the city of Florence [17]. In each experiment, the participants showed better secondary task performance when the primary task was designed with a lower visual load (narration) rather than a higher visual load (on-screen text). Since the secondary or monitoring task was a visual-load task (noticing when a letter on the screen changed), these results tend to confirm that fewer visual resources are available for study when watching an animation with on-screen text than in a narrated condition.

Additional evidence in support of the dual-mode instructional strategy was found in an experiment that employed a computer-controlled animation on theoretical aspects of heat soldering. In this study, participants in the narration group showed lower subjective ratings of cognitive load and higher test scores than the on-screen text trainees [48]. In another example, researchers compared retention and transfer scores for students who watched either a narrated multimedia presentation or the same depictions with on-screen text. In these studies about lightning storms and car braking systems, the average modality gain was 33% on retention and 80% on transfer scores [65]. Similarly, a related study with animated pedagogical agents showed that the learners achieved better transfer performance with spoken narrative than with written text [75].

In addition to the argument that effective working memory capacity is expanded, we will describe two other hypotheses that may explain the modality effect: (a) the social-cue hypothesis, and (b) the simplicity of orality. It should be noted that these hypotheses are not necessarily mutually exclusive.

Firstly, the social-cue hypothesis posits that the modality effect results from the additional interest that a voice can bring to a visualisation, compared to the presence of on-screen text [73, 75]. This hypothesis can be included in the broader social agency theory, which states that the social responses that personality cues prompt in students motivate them to engage in greater learning [63]. In line with this theory, it has been shown that students can achieve higher transfer scores when narrations are in a more dialogical rather than formal style (the personalisation principle), or when the words are not spoken in a foreign-accented or machine-generated voice (the voice principle) [63]. In addition, as the speaker/gender effect shows, female speakers are not only perceived as more appealing than male speakers, but narration by the preferred female voice also yields higher problem-solving performance. These results cannot be explained from a purely cognitive view, because processing either a male or a female voice should require the same amount of mental resources [56]. In this way, the social-cue hypothesis, or other similar social-motivational perspectives, can be considered alternative or supplementary explanations of the modality effect.

Secondly, according to Liberman (1995), the modality effect can also be ascribed to the simplicity of orality, as shown in evidence such as: (a) speech appears before written words both in the history of each individual and in the records of humankind, (b) literacy is not as universal as orality, (c) speech is learned more easily than reading and writing, and (d) whereas text can be overlooked simply by not attending to the written words, the pervasive nature of sound makes speech difficult to miss [as cited in 73]. Along the same lines, we can add the previously described distinction between primary and secondary knowledge, as humans may have evolved the primary skill of listening, but did not evolve the secondary skill of reading [76].

2.3.3 Attention Cueing

The attention cueing effect—also referred to as the signalling [62] or the attention-guiding principle [12]—states that students learn more deeply when cues or signals are added to a dynamic visualisation in order to highlight the important information and, therefore, to direct the learner’s attention towards it [62]. As a result, a CLT prediction is that animations may be more effective learning materials if the key information is cued or signalled [3].

Cueing techniques can be divided into two broad groups: (1) with added elements, and (2) without added elements. Examples of the former are signalling with arrows or other pointing elements, lines, thick frames, symbols, texts, etc. Examples of the latter are colours or patterns, dynamic contrasts, transparency shifts, flashing, zooming, panning, exaggeration and simplification [88]. It seems that the current trend is to prefer the latter method, without added visual elements, possibly to avoid adding extraneous cognitive load by crowding the visualisation [cf. 25]. Figure 4 illustrates the application of either of these cueing techniques to an animation.

Fig. 4
figure 00224

Two strategies for cueing an animation

Empirical support for the strategy of attention cueing with added elements was found in an experiment with 129 French primary school participants studying animations of different gear systems. The study showed that students with low expertise levels benefited from attention cueing in the form of arrows, dots and words [15]. Another example is an experiment that compared cueing (on-screen technical text and colour coding) versus no cueing in a sample of 83 biology college students. In this study, by adding these cues to a system-paced animation of the structure and function of an enzyme, Huk et al. [45] found a medium effect size for retention scores, and also that students reported fewer comprehension problems. This cueing effect was replicated—although with a smaller effect size—in a more ecologically valid classroom scenario where the animation was learner-paced and part of a multimedia learning environment [45]. Interestingly, the cueing effects were observed in both 2-D and 3-D representation formats of the ATP-synthase enzyme animations. More support in favour of this strategy was found in a study that compared learning about an earth science topic in conditions that included red-arrow cueing versus conditions without these added elements. Students in the signalling conditions had significantly higher learning efficiency gains. However, cueing failed to show an effect in the retention tests, both for concept and for process questions [55]. The lack of a more robust effect of cues may be due to the signalling technique used, whose extra red-arrow elements could overcrowd the visualisation.

In line with this, although the outcomes of attention cueing with added elements are encouraging, a more efficient signalling strategy might be one that does not add extra elements to the display but, for example, increases the salience or visual contrast between the key elements and the secondary components [25]. One example of these salience techniques is a study with 73 psychology undergraduates who were required to learn about the cardiovascular system from either cued or uncued animations. In this example of spotlight-cueing, all components in the animation had their luminance decreased except for the cued key elements. It was found that the cued version resulted in better performance on retention, inference and transfer questions than the animation without this type of signalling [26]. Another example of attention cueing without added elements is an experiment with 102 undergraduate students who learned from different cueing conditions of an animation about the cerebral basis of language production. In this study, two methods of increasing visual contrast or salience were employed, namely, change of colour and sudden appearance. It was found that salient colouring was significantly better than no salient colouring for the diagram-completion, process-retention and function-retention tasks, as well as on the perceived ease-of-use scale. Additionally, it was found that the students who learned with the sudden-appearance cue performed significantly better than those without this signalling in the diagram-completion and function-retention tasks [46]. More direct evidence favouring attention cueing strategies without extra elements was found in a study that dynamically coloured important parts of a piano animation. Results indicated better comprehension of the piano mechanism with the colouring method than with external pointing in the form of arrows [16].
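The spotlight-cueing technique of lowering the luminance of everything except the cued elements can be sketched computationally. The following is a hypothetical illustration only; the function name and the 0.3 dimming factor are our own assumptions, not parameters reported in the cited studies.

```python
def spotlight_cue(frame, cued_region, dim_factor=0.3):
    """Return a copy of a greyscale frame with non-cued pixels dimmed.

    frame: 2-D list of greyscale values in the range 0..255.
    cued_region: set of (row, col) positions kept at full luminance.
    dim_factor: multiplier applied to the luminance of uncued pixels
        (an assumed value; the studies do not report an exact factor).
    """
    return [
        [
            value if (r, c) in cued_region else int(value * dim_factor)
            for c, value in enumerate(row)
        ]
        for r, row in enumerate(frame)
    ]

# A uniform 2x2 frame in which only the top-left pixel is cued:
cued = spotlight_cue([[200, 200], [200, 200]], {(0, 0)})
# cued == [[200, 60], [60, 60]]
```

Applied to every frame of an animation, this kind of transformation signals the key elements without adding arrows or other extra components, consistent with the salience approach described above.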

To conclude, pace-control, modality and attention cueing are three examples of strategies that can be considered when designing educational animations of biologically secondary topics. Dynamic visualisations of primary knowledge, such as motor skills, are addressed in the following section.

3 Learning Motor Skills Through Dynamic Visualisations

3.1 Video Modelling of Motor Skills

Modelling—also referred to as observational learning—is the learning process by which an individual (the model) demonstrates actions that can be imitated by another individual (the observer) [33]. Models may be live enactments, recorded in diverse ways, described in many different forms or even imagined [33]. Moreover, although we will focus on observation in this chapter, it should be noted that listening and other forms of perception have been employed in modelling methods [e.g. 28].

From this broad area, we will describe modelling via observation of dynamic visualisations, more specifically, video modelling. Two powerful advantages of video recordings over live enactments are that: (1) video can incorporate many different situations and types of models, and (2) video can focus attention on certain aspects of the depiction through camera work or editing tools [33]. However, as in all learning environments, quality products must be pursued. For example, producing videos for the modelling of professional skills may involve all of the following actions: (1) structuring the content, in preproduction; (2) monitoring the recording conditions, in production; (3) finding the most instructionally efficient depictions, in postproduction; and (4) considering alternatives in all these steps depending on the intended learning outcomes of the trainees [modified from 33].

We will next centre the discussion on video modelling about tasks that involve human motor skills, for example, manipulative-procedural tasks. To illustrate a few examples, we present the following list that shows some video modelled motor skills and their respective references:

  • Whole-Body Skills

    • Executing foul basketball shots with perfect form [36].

    • Following effective behaviours of lecturing in a lesson [92].

    • Performing ballet routines [35].

    • Practising complex movements of a modern dance sequence [23].

  • Manipulative-Procedural Skills

    • Tying nautical knots [91].

    • Tying a series of 3 Scoubidou knots [5].

    • Learning paper-folding tasks [110].

    • Performing individual assembly tasks, like folding cardboard boxes [32].

    • Disassembling a machine gun [93].

    • Assembling an abstract and novel machine device [107].

    • Building a model helicopter with 54 pieces of an assembly kit [8].

  • Other Hands or Arms Skills

    • Displacing wooden barriers following a pattern [28].

    • Following a percussionist’s hitting pattern through wooden barriers [14].

    • Following an action pattern of moving a lightweight paddle [18].

    • Practicing different first-aid procedures [2].

    • Performing dart throwing subskills [51].

So, what does video modelling of motor skills entail? Using the first levels of the social cognitive model of sequential skill acquisition [51, 90], we divide the learning process of video modelling of motor skills into two sequential steps: (1) observation, where modelling experiences give the observer a representation of an accurate way to perform the motor skills, and (2) emulation, where modelling experiences continue to be important but the corporal experiences (i.e. practice) of the observer now become relevant [51].

As stated, video modelling of motor skills involves both observation and practice. Approaches that consider only one step, for example emulation alone, may be less effective. For instance, an experiment on five dart-throwing skills, in which 60 high school girls were allocated to either an observation-plus-practice condition or a practice-alone condition, found higher levels of dart skill, intrinsic interest and self-efficacy perceptions in the group exposed to the combined observation and practice condition [51]. A similar result was found in an experiment with undergraduate students following a sequence to displace seven vertical wooden barriers in a fixed time. In this study, a practice-only group forgot significantly more of the task after a period of 48 h than a combined observation-plus-practice group [85]. Other studies have compared observation plus practice versus observation alone, showing that the combination is more effective. For example, in an experiment where the task was to replicate a video-recorded percussionist's hitting pattern, the authors concluded that observation alone may produce approximate reproductions of the motor task, but measurable improvement was only achieved through physical practice [14].

A distinction to make in the first step of video modelling of motor skills concerns the type of model that the student observes, which can be: (1) a coping model, who starts performing with errors but gradually corrects them; or (2) a mastery model, who always performs flawlessly. Evidence supporting the coping model comes from the previously mentioned study about dart skills, where the participants who watched the coping model displayed higher performance than the group observing the mastery model [51]. In a related vein, a study that contrasted these types of model in 36 snake-anxious female undergraduates showed that participants exposed to films with coping models—who initially showed fear, but later mastered it—displayed significantly more snake-approach behaviour than subjects watching mastery models—who demonstrated complete fearlessness [70]. Although this experiment involves behavioural change rather than motor learning, it adds empirical evidence in favour of coping modelling. An explanation for the better learning generally reported for coping modelling compared with mastery modelling might be that the former tends to demonstrate the desired skill in attainable steps rather than as an unrealistic target performance [33]. This explanation can be framed in a CLT interpretation: since the coping model shows achievable steps, each of them contains few key elements; thus, every step in the coping model contains a number of new elements that does not exceed the processing capacity of the learner. On the contrary, a mastery model may exhibit in a single step a greater number of new elements than the observer can handle.

To conclude, according to the social cognitive model of sequential skill acquisition, modelling experiences may involve not only observation of a model's performance, but also listening to the performance or watching its outcomes [51]. As all three modelling experiences—observing actions, listening to actions and observing results—assist learning, an important question is how this actually happens. The answer may be at least partially explained by the brain's mirror neuron system, which we describe next.

3.2 The Mirror Neuron System

Initially, mirror neurons were thought to be a class of visuomotor nervous cells that activate when individuals complete a particular motor action and also when they observe other individuals performing a similar action [for a review, see 83]. More recent data have expanded that view, indicating that mirror neurons can also fire when an individual either: (1) listens to actions related to motor skills [e.g. 101], (2) imagines performing his or her own actions [e.g. 23], or even (3) observes the result of an action that can be linked to a motor skill [e.g. 57].

Mirror neurons were first identified in area F5 of the premotor cortex of macaque monkeys (Macaca nemestrina). The original findings described that in a macaque's inferior area 6, termed sector F5, there were neurons that fired during particular goal-directed hand movements, such as holding, tearing or grasping [27]. Furthermore, for actions like grasping with the mouth or the hands, and rotating and manipulating objects, similar F5 neuronal discharges were recorded during direct execution of the actions by the monkey and during observation of these actions performed by the researchers. It was also found that these mirror neurons participated in action prediction. Notably, some neurons were activated both when the macaque performed an action (for example, grasping an object) and when it observed the experimenters performing a related but preparatory action (placing the object near the animal). Thus, mirror neurons were found to play an important role in the rich social interactions (understanding direct actions and intentions) of macaques [27].

In addition to these results in monkeys, a similar mirroring phenomenon has been reported in humans. It has been found that the precentral motor cortex was not only activated when humans manipulated a small object but also when they observed another individual executing the same action. Although the activation observed in the motor cortex was weaker during observation as compared to direct execution, these findings revealed the presence of mirror neurons in humans [37]. Furthermore, these neurons constitute a network called the mirror neuron system (MNS), which has an extensive brain representation, including the parietal, premotor, and subcortical areas.

As well as being triggered when doing or observing an action, the MNS is also triggered when individuals simulate or imagine performing the actual action. For example, in an experiment in which dancers observed and imagined another dancer's body movements, the activated brain regions have been related to the MNS. Interestingly, this study also showed that the greatest activation of one of these regions (the inferior parietal lobule) was observed when participants imagined dance steps they had practised for some time, which also means that the MNS is sensitive to previous corporal experience [23].

Similarly, besides firing when doing, observing or imagining an action, the MNS may also be triggered when a subject listens to actions that can be related to motor skills. For example, one study showed that, when participants heard sentences describing mouth, hand or leg actions, the same regions that are triggered by observation and execution of those actions were activated [101].

Furthermore, the MNS may be activated by more "indirect stimuli", such as observing the outcomes of a manipulative task rather than the direct ongoing action. For example, a study by Longcamp et al. [57] found that a brain region connected to the MNS was activated more strongly during observation of handwritten letters than of printed letters. It appears that this brain area can react to handwritten text because it is the outcome of a hand action; in other words, this activation did not require observing or executing the writing action itself [57]. Another indirect stimulus that may trigger the MNS is the observation of instruments that afford a manipulative task. In fact, parts of the brain's motor system have been shown to be triggered by the mere act of observing manipulable objects [69]. To summarise, the MNS can be activated by executing, observing, imagining or hearing hand tasks, and also by merely perceiving the end results or the instruments of hand actions.

Finally, the MNS is more strongly triggered when the actions are perceived and undertaken by individuals belonging to the same species. In other words, the MNS is biologically tuned to be activated preferentially in contexts of social human-human interaction. For example, a neuroimaging study showed that a brain region related to the MNS was activated only when observing manual grasping actions performed by a human hand, not by a robotic hand [100]. More evidence was found when participants had to make either horizontal or vertical sinusoidal arm movements while observing either congruent or incongruent arm movements made by either a human or a robot. No interference with the participants' execution of arm movements occurred when they observed movements that were: (1) congruent and made by a robot, (2) congruent and made by a human, or (3) incongruent and made by a robot. The only significant interference with arm movement was recorded when the participants observed incongruent arm movements made by another human, suggesting that the MNS is biologically tuned [49].

3.3 Modelling of Motor Skills and Dynamic Versus Static Visualisations

As anticipated in the section on evolutionary educational psychology, instructional design should profit from the high efficiency of biologically primary tasks to facilitate the acquisition of secondary knowledge [34, 76]. In the field of dynamic visualisations, when the biologically primary task involves manipulation or gestures, this approach has been termed the human movement effect. This effect proposes that, since learning human movements is a primary skill, humans have evolved to learn it without investing as many WM resources as learning other mechanical (non-human movement) skills. As a result, when watching a dynamic visualisation of human movements, the learner has more WM resources available that can be allocated to deal with the additional cognitive load caused by transient information [76]. It is proposed that the underlying physiological reason why this is possible is that humans have the MNS, which enables the observation and emulation of human movement skills [106].

Evidence that learning of human movement can overcome the transitory effect has been found in studies that have compared instructional dynamic visualisations with equivalent static instructions. A meta-analysis of 26 studies, by Höffler and Leutner [43], found that dynamic visualisations were better for learning than the static equivalents. The advantage of animations was more evident in procedural-motor skill learning than in other types of training.

This advantage has been reported in studies featuring whole-body movements, for example, in learning to imitate ballet steps [35]. In another study, subjects had to predict the weights of boxes carried by an actor. The dynamic visualisations of these events resulted in better predictions than the static images, even though the static images were available for observation for longer [105].

Other studies supporting this advantage have used manipulative-procedural tasks requiring only the use of hands. Dynamic visualisations have been shown to be superior to static conditions in assembling a firearm by watching a television video [93], and in hand-puzzle rings and knot tying tasks [5]. Figure 5 shows a static picture from this study about knot tying skills.

Fig. 5
figure 00225

Frame from a knot tying task (Reprinted from [5] with permission from Elsevier)

The advantage of dynamic visualisations has also been found with paper-folding tasks. Of particular interest in this study was the finding that representing the hands in the depiction was not necessary for the effect to occur. As long as learners could relate the task to a manual skill, in this case, origami, the hands did not need to be observed [110]. Figure 6 shows a simplified static picture from this study.

Fig. 6
figure 00226

Simplified frame from an origami task (Reprinted from [110] with permission from Elsevier)

Summarising the findings of research comparing animations with statics, it can be concluded that the depiction of human movement in dynamic visualisations generally aids understanding of whole-body or manipulative tasks. CLT researchers have predicted that, since the transient information of animations is less likely to hinder learning when it includes forms of human movement, such dynamic visualisations should be equally or more effective than their static counterparts [76, 106].

What is more, human movement depictions may help learning beyond whole-body or manipulative tasks, as has been reviewed recently for animations that incorporate one or more of the following strategies [24]:

  • Gestures. Learning from dynamic visualisations can be boosted by either: (1) observing gestures, or (2) making gestures. In the first case, the gestures can be added to the visualisations, for example, by employing an animated character who gesticulates. In the second case, students can follow the motion in the visualisation, for example with their index finger—provided that they do not block the view with their hands [24].

  • Manipulation. Similarly, learning from animations can be enhanced by either: (1) observing manipulations, or (2) making manipulations to the system depicted. For the first point, as with gesturing, another person or an animated character can perform the manipulations [24]. For example, in a study showing manipulations to solve ring puzzles, the animation condition learned a related cognitive task better than the static-pictures group [5]. For the second point, actually performing the manipulation can be regarded as a more active strategy, where learners interact with the animation, for example by manipulating a virtual replica of the system to be learnt [24]. However, as noted in the pace-control section, interactivity may act as a moderator of any manipulation effect.

  • Body Metaphors. Learning from animations can be boosted by adding human characteristics to the moving elements. For example, an animation depicting a loader crane could superimpose a picture of a human arm on the crane, and the hook could be replaced by a finger [24].

To reiterate, the depiction of human movement helps learning from dynamic visualisations. As previously argued, this human movement effect may be underpinned by the simpleness of biologically primary knowledge. But what is it that makes depictions of human movement easy? In other words, how did the human brain evolve to manage the transitory information of human movement as a biologically primary task? We have argued that it is the human MNS. There are two other research areas that support this argument, which we briefly describe next:

  • Embodied Cognition. Humans possess an embodied or grounded cognition, not an amodal cognition that can operate without reference to one's own body or environment [9, 10]. This implies that when learners watch a dynamic visualisation, they ultimately link all the contents—even highly abstract elements and movements—to their prior bodily or environmental experiences. In line with this, the nearer the animation's components and motions are to bodily experiences, the greater the human movement effect. In order to embody their cognitive experiences, humans have evolved the MNS, a system that links motor perception with motor emulation or practice.

  • Social Belonging. Humans possess a sense of social partnership. In other words, when subjects watch a dynamic visualisation of hands executing an action, sympathy for the owner of those hands may be triggered, which might motivate greater learning [63, 66]. In line with this hypothesis is social agency theory, which includes empirically supported principles that foster social belonging, such as the personalisation and voice principles described previously [66]. In order to experience social belonging, humans have evolved the MNS, a system that links individual embodied experiences with those of other humans.

We end this chapter by concluding that dynamic visualisations can be very effective media for learning human motor skills. This advantage likely exists because of the MNS, which has evolved to help humans learn procedural-motor tasks quite effortlessly. This conclusion was made possible by considering the limitations and evolution of human cognitive architecture. Through such analysis it will be possible to design more effective dynamic visualisations for instruction.