Objective and rationale

Instructional video is a type of multimedia instruction in which graphics are in the form of motion pictures recorded by a camera and words are in the form of speech and background sounds recorded by a microphone (Mayer 2009). As shown in Table 1, examples of instructional video include video lectures such as those used in online courses or archived as resources in learning management systems (LMSs), video demonstrations of how to carry out a task such as those found on YouTube, and educational TV shows or documentaries. For example, imagine you are watching a recording of a video lecture from a college chemistry class, or a demonstration of how to build a complex electrical circuit, or an episode from a TV documentary on wildlife in Antarctica. How should these instructional videos be designed to foster learning? In light of the increasing popularity of instructional video in both formal and informal learning (Fiorella and Mayer 2018), the goal of this review is to examine five ways to increase the effectiveness of instructional video (as well as one way not to use video). Our intention is to provide evidence-based examples from our own research rather than to offer a comprehensive review of the broader literature (Derry et al. 2014).

Table 1 Types of instructional video

Historical overview

The instructional use of motion pictures (and video) has a long history dating back more than 100 years (Cuban 1986; Orgeron et al. 2012; Saettler 2004). Table 2 summarizes four phases in this history. The first phase, starting in the early 1900s, involves the rise of short motion pictures for the general public on topics such as the lifecycle of bees or personal hygiene. People would come to local venues to experience the novelty of seeing a short movie and discussing it. However, after this brief initial enthusiasm for movies with instructional themes, the fledgling motion picture industry took a hard turn towards movies for entertainment as the primary focus.

Table 2 Four phases in the history of instructional video and movies

In the second phase, reaching its height in the mid-1900s, educational movies moved to the classroom, with an explosion of films in academic subject areas intended for school use (Orgeron et al. 2012; Saettler 2004). This phase and the previous one were instigated by Thomas Edison’s invention of motion picture technology and his advocacy for educational applications, as encapsulated in his 1922 pronouncement: “I believe that the motion picture is destined to revolutionize our educational system and that in a few years it will supplant largely, if not entirely, the use of textbooks” (Cuban 1986, p. 9). During the mid-1900s, Audio-Visual Departments opened in many school districts, and teachers had access to a wide range of educational movies (Cuban 1986; Saettler 2004). However, after a few decades of access to educational movies, the evidence shows that “most teachers used films infrequently in classrooms” and “film may have entered the teacher’s repertoire, but, for any number of reasons, teachers used it hardly at all” (Cuban 1986, p. 17). Although educational television, which began in the 1950s, was intended to provide another medium for educational shows and documentaries, it too came to be rarely used by classroom teachers after an initial period of enthusiasm (Cuban 1986).

In the third phase, during the last third of the 1900s, the emerging technology of video recording had its major impact on the general public in the form of video for personal use rather than for education. People could create their own videos of family events and store them on cassettes. This era represents the opening of video production to the general public, but it did not greatly impact education.

As can be seen, throughout the twentieth century, there were several cycles of enthusiasm for the educational potential of visual technologies followed by a disappointing lack of implementation in education. Finally, in the fourth and current phase, which can be called the Internet video age, we have access to a great variety of instructional video in informal learning venues, including How-To videos such as those on YouTube, videos meant to inform and inspire such as TED Talks, subscription services for on-the-job training such as Lynda.com, and academic assistance sites such as the Khan Academy. Similarly, in formal learning venues, we have access to online courses such as Massive Open Online Courses (MOOCs) and to recorded video lectures for student review in college courses. The current phase contains elements of each of the previous phases, as well as the same potential danger of becoming yet another cycle of initial enthusiasm followed by lack of educational impact. This review is intended to show how scientific research can play a role in improving the design of instructional video in all its various forms and thereby increase its viability for education and training.

The following sections review five evidence-based principles for how to design instructional video and one for how not to use it, based on examples from our own research. Our rationale is that instructional videos that are based on evidence-based principles are more likely to be effective. As shown in Table 3, we summarize how our findings suggest the following principles: dynamic drawing, gaze guidance, generative activity, perspective, subtitle, and seductive details.

Table 3 Design principles for instructional video

Dynamic drawing principle

Description

The dynamic drawing principle is that people learn better from a video lecture that shows the instructor drawing graphics as she lectures rather than referring to already drawn graphics. Video lectures found in online courses (such as MOOCs) or as course resources in Learning Management Systems (LMSs) often involve an instructor standing next to projected slides while lecturing or an instructor standing next to a board and writing on the board while lecturing.

Example

Figure 1 shows screenshots from a video lecture on the Doppler Effect in which the instructor draws on a whiteboard as she lectures (left panel) or points to already drawn illustrations on the whiteboard (right panel). The two versions involve exactly the same graphics and exactly the same script (Fiorella and Mayer 2016), but simply differ in whether or not the video shows the instructor drawing as she lectures.

Fig. 1 Screenshots from a video lecture on the Doppler Effect with the instructor drawing as she lectures (left panel) or pointing to already drawn illustrations (right panel)

Evidence

Fiorella and Mayer (2016) asked college students to view a short video lecture on the Doppler Effect and then take a transfer test. In one experiment, students with lower prior knowledge performed significantly better on the transfer test when they received a video lecture with the instructor drawing graphics while lecturing rather than pointing at already drawn graphics. In a companion experiment, in which only the instructor’s hand was shown on the screen, viewing the instructor draw graphics was significantly more effective than the control condition regardless of learners’ prior knowledge. In contrast, the benefit of instructor-drawn illustrations was eliminated when the video did not show the instructor’s hand doing the drawing (Fiorella and Mayer 2016; Fiorella et al. in press). Thus, an important potential boundary condition is that the instructor’s body, particularly the instructor’s hand during drawing, may need to be visible in video lectures that involve instructor-drawn graphics.

Theory

According to social agency theory (Mayer 2014), seeing the instructor draw while lecturing is a social cue that can foster a sense of social partnership, leading to deeper learning. Similarly, according to embodiment theory (Robbins and Aydede 2009), seeing the instructor’s hand in action can prime a sense of self-reference, in which learners feel as if their own hand is doing the drawing, which leads to a more salient learning experience. This approach is consistent with the embodiment principle, in which the instructor’s body movements can guide the learner’s cognitive processing (Mayer 2009, 2014). Finally, video lectures with instructor-generated drawings are consistent with basic principles of multimedia instructional design (Mayer 2009) including the signaling principle (i.e., showing where to look in the graphic), the segmenting principle (i.e., breaking the graphic into smaller segments), and the temporal contiguity principle (i.e., coordinating visual and verbal aspects of the lesson).

Implications

A common video lecture format is for the instructor to talk about slides that are presented as complete graphics. However, this practice conflicts with the dynamic drawing principle, which suggests benefits of using elements of the classic chalk-and-talk approach in which an instructor writes on a board as he or she lectures. Another common video lecture format is to have the instructor’s narration synchronized with the graphics being drawn (but without any writing instrument or hand being shown), such as in Khan Academy lectures. However, this practice also is not entirely consistent with the dynamic drawing principle, because students do not see a human hand doing the drawing. The research summarized in this section suggests that video lectures should contain at least some instances showing the instructor writing or drawing on a board or screen while lecturing. It may be particularly important for learners to see the hand that is doing the writing or drawing.

Gaze guidance principle

Description

People learn better from a video lecture when the onscreen instructor shifts gaze between the audience and the board while lecturing rather than looking only at the board or only at the audience. The act of looking from the audience to the board can be called gaze guidance, because it is intended to suggest that the learner should look at the relevant portion of the board (van Wermeskerken and van Gog 2017).

Example

Consider viewing a video lecture on how human kidneys work in which the instructor writes on a conventional whiteboard (as exemplified in the left panel of Fig. 2) versus on a transparent whiteboard (as exemplified in the right panel of Fig. 2). A conventional whiteboard is commonly used in classrooms, and in extreme situations the instructor only looks at the board while lecturing. A transparent whiteboard involves a glass surface that the instructor stands behind and writes or draws on while facing the camera and lecturing. A computer algorithm then transposes the writing or drawing as a mirror image so it appears normal to the learner who views the video lecture. In this case, the instructor looks at the audience, shifts gaze to the board when she writes or draws, and shifts gaze back to the audience and so on. In both cases the spoken lecture and the writing and drawing on the board are identical, but access to the instructor’s eye gaze differs (Fiorella et al. 2019a, b).

Fig. 2 Screenshots from a video lecture using a conventional whiteboard (left panel) or a transparent whiteboard (right panel)

Evidence

In one set of studies, Fiorella et al. (2019a, b) reported that college students who learned about human kidneys from a video lecture with the transparent whiteboard (and had gaze guidance from the instructor) performed better on a transfer test than students who viewed the video lecture with a conventional whiteboard (and had no access to the instructor’s eye gaze). In another set of studies, Stull et al. (2018a, b) also found that students who learned about chemistry from a video lecture with a transparent whiteboard significantly outperformed students who learned from a video lecture with a conventional whiteboard on an immediate posttest, but only at a nonsignificant level on a delayed posttest. In an eye-tracking study involving a video lecture in chemistry (Stull et al. 2018a, b), college students in the transparent whiteboard group tended to look more at the instructor’s face and less at the material on the board than students in the conventional whiteboard group, although the transparent group performed only slightly better than the conventional group on a delayed posttest. Overall, there is some evidence that students learn better from lecture videos when gaze guidance cues are visible, such as with a transparent whiteboard lecture, but more research is needed to help explain the boundary conditions for when this effect is and is not found.

Theory

Transparent whiteboards allow the learner to have eye contact with the instructor, which is a social cue intended to build social partnership between the instructor and the learner. According to social agency theory (Mayer 2014), when students feel the instructor is working with them in partnership, they try harder to learn the material. This may be part of the explanation for why students learn better with transparent whiteboards. An additional explanation is that videos using transparent whiteboards are more consistent with multimedia design principles, such as the signaling principle, because the instructor’s gaze shifts can guide where the student looks.

Implications

Overall, there is emerging evidence that learners are sensitive to the instructor’s eye gaze in instructional video. Based on these findings, it may be useful for video lectures that include an onscreen instructor to make sure that the instructor looks at the audience while talking and sometimes shifts gaze to the board to signal where to look. Instructional videos where the instructor looks directly at the audience throughout a lecture may be less effective than those in which the instructor occasionally looks over at the material on the board that he or she is talking about.

Generative activity principle

Description

People learn better from a video lecture or demonstration when they are asked to engage in generative learning activities during learning. Generative learning activities are behaviors that the learner performs during a lesson with the intention of improving learning (Fiorella and Mayer 2015). Examples include taking summary notes (i.e., learning by summarizing) or writing an explanation (i.e., learning by self-explaining) or physically imitating the instructor’s demonstration (i.e., learning by enacting).

Example

Suppose that you are asked to view a 16-min video lecture on computer programming or a 22-min lecture on a statistical procedure. You could simply watch the video or you could write down summary notes as you watch it. This is the comparison examined by Peper and Mayer (1978).

Evidence

Across two experiments, college students who were asked to write down summary notes as they viewed a video lecture on computer programming or statistics performed better on a subsequent transfer test (without having access to the notes) than students who simply viewed the lesson without taking notes (Peper and Mayer 1978). The effects were particularly strong for low-knowledge learners. Similar results were obtained across two experiments involving a 23-min video lecture on how car engines work (Peper and Mayer 1986) and one experiment involving an 11-min video lecture on taking photos with a 35 mm camera (Shrager and Mayer 1989). This work shows the benefits of prompting low-knowledge students to engage in the generative activity of summary note-taking during a video lecture.

Concerning another generative activity, college students viewed a video demonstration of how to construct a complex circuit board and then took a posttest on what they had learned (Fiorella et al. 2017). Some students were given the same board and elements, and asked to imitate the instructor’s actions while watching the video, whereas others simply watched the video without imitating. The group that engaged in the generative activity of imitation (or what can be called learning by enacting) performed better on the posttest, suggesting the benefits of prompts to engage in generative activities during viewing a video demonstration.

In another study on the generative activity of self-explaining (Fiorella et al. in press), college students viewed a 12-min narrated video lecture on how the human kidneys work that was broken into five segments, and after each segment they either rewatched the video (control group) or wrote an explanation in a booklet (self-explanation group). On a transfer posttest, students in the self-explanation group performed better than students in the control group, thereby showing the benefits of what can be called learning by explaining in the context of a video lesson.

Theory

According to generative learning theory, the act of taking summary notes or physically copying the actions of the instructor in a video demonstration or writing explanations after each section of a video lecture can prime three cognitive processes during learning: selecting, which involves focusing on the important information; organizing, which involves mentally building a coherent structure; and integrating, which involves using relevant prior knowledge (Fiorella and Mayer 2015). Engaging in appropriate cognitive processing during learning results in deeper learning outcomes that are better able to support subsequent performance on transfer tests. This approach complements other multimedia design principles aimed at fostering generative processing (Mayer 2009, 2014).

Implications

The educational impact of video lectures and demonstrations can be enhanced by prompts that encourage students to engage in generative learning activities while viewing the video. For video lectures, prompting students to take summary notes during the lecture (or to write explanations during breaks throughout the lecture) can be an effective practice. For video demonstrations, there is preliminary evidence to support the practice of asking students to imitate the actions of the instructor during the demonstration. These activities may be particularly effective for low-knowledge learners. In short, this work suggests that instructional videos should be supplemented with prompts to engage in an appropriate generative activity.

Perspective principle

Description

The perspective principle is that people learn better from a narrated video of a manual demonstration when it is filmed from a first-person perspective rather than a third-person perspective. A third-person perspective involves placing the camera across from the instructor as she or he demonstrates a sequence of actions (as is common in YouTube videos), whereas a first-person perspective involves placing the camera on or above the instructor’s shoulder or forehead (as in GoPro videos).

Example

Figure 3 shows screenshots from a video demonstration of how to construct an electrical circuit that was filmed from a first-person perspective (left panel) or a third-person perspective (right panel). The two versions of the lesson show exactly the same actions and have exactly the same narration, but simply are filmed from opposite perspectives (Fiorella et al. 2017).

Fig. 3 Screenshots from a video demonstration on circuit building recorded from a first-person perspective (left panel) or a third-person perspective (right panel)

Evidence

Across two experiments, conducted in the United States and in the Netherlands, students who viewed the first-person video performed significantly better on posttests than students who viewed the third-person video (Fiorella et al. 2017). This pattern was obtained when students were asked or not asked to imitate the instructor during learning, and when students were asked or not asked to give explanations during the posttest.

Theory

First-person perspective is a social cue that is intended to make learners more involved in the actions shown in the video. It is intended to prime a sense of self-reference in which learners are more likely to feel as if their own hands are building the circuits, thereby creating a stronger memory for the actions in the video. This interpretation is inspired by embodiment theory, which holds that people think and learn with their body as well as their mind (Robbins and Aydede 2009; Wilson 2002). This approach complements other multimedia design principles aimed at fostering generative processing such as the personalization principle, which involves using conversational language, and the embodiment principle, which involves using appropriate gesturing (Mayer 2009, 2014).

Implications

The most straightforward implication of the perspective principle is that first-person perspective should be used for How-To videos ranging from construction and repair tasks, to cooking, to medical procedures. Research is needed to determine the conditions under which the perspective principle applies.

Subtitle principle

Description

People learn better from a video documentary in their second language when the words are printed (or printed and spoken) rather than spoken. Based on research with native speakers, the modality principle is that people learn better from graphics with spoken words than graphics with printed words (Mayer 2009; Mayer and Pilegard 2014), and the redundancy principle is that people learn better from graphics and spoken text than from graphics, spoken text, and printed text (Mayer 2009; Mayer and Fiorella 2014). However, when learning from video lessons in a second language, these two principles are reversed.

Example

Suppose that Korean college students view a slow-moving 16-min narrated video about wildlife in Antarctica, with the words spoken in English. In an attempt to aid comprehension, we could add subtitles at the bottom of the screen that duplicate what the narrator is saying, or we could simply replace the narration with the subtitles.

Evidence

Lee and Mayer (2018) asked Korean college students to view a 16-min video on wildlife in Antarctica taken from a TV documentary with the script in English. Students performed better on a comprehension posttest if they viewed a video with printed words rather than a video with spoken words (i.e., reverse modality effect) or a video with printed and spoken text rather than a video with spoken text alone (i.e., reverse redundancy effect). However, adding subtitles to a fast-paced 9-min episode of a science TV show containing dialogue in English did not help non-native English speakers perform better on a subsequent comprehension test, presumably because they lacked the cognitive capacity to process the fast-moving material in the subtitles (Mayer et al. 2014). Thus, a potential boundary condition for adding subtitles to support learning from narrated videos in a second language is that the pace of the lesson should be slow enough to allow students to process the subtitles without overloading their working memory.

In another set of three studies, Korean college and high-school students performed better on a comprehension test if they received a narrated video on wildlife in Antarctica (in English) than if they received solely the audio without the video (Lee and Mayer 2015; Mayer et al. 2014). Apparently, seeing the video helped students identify some of the names of the creatures described in the audio.

Theory

Spoken words are transient whereas printed words can be revisited. When words are presented in a second language, they may be hard to perceive or identify, so learners may need to revisit them. In this case, printed words are more helpful because they are available for a longer duration. In contrast, printed words may be less helpful for native speakers because words on the screen compete for processing capacity in the visual channel and may cause the learner to miss some of the visual information in the video, particularly when it is fast paced. This work helps to identify a boundary condition for the redundancy principle, which holds that printed words should not be added to narrated graphics (Mayer 2009; Mayer and Fiorella 2014).

Implications

This work suggests that when students are viewing an instructional video in their second language, it would be useful to add subtitles and make sure the pace is slow enough not to overload working memory.

Seductive details principle

Description

People do not necessarily learn better when interesting but extraneous video is added to a multimedia lesson. Although it might be tempting to insert exciting video clips or a window with a talking head, these features can turn out to be seductive details. Seductive details are interesting but irrelevant words or graphics that are added to a lesson, and have been shown to be distracting (Mayer 2009; Mayer and Fiorella 2014).

Example

Consider a multimedia lesson on how lightning storms develop. The lesson is scientifically accurate but may be a bit dry. To spice it up, we can insert short video clips showing spectacular lightning storms (Mayer et al. 2001).

Evidence

Mayer et al. (2001) reported that college students who viewed a multimedia lesson on how lightning storms develop performed significantly better on a transfer test if short video clips involving lightning storms were not interspersed in the lesson. In a computer-based game on how plants grow, college students did not perform better on a transfer test when narrated animations included a window showing a talking head giving the explanation (Moreno et al. 2001). In both studies, inserting interesting but irrelevant video into a multimedia lesson did not help student learning.

Theory

Seductive details, such as attention-grabbing but irrelevant video clips, serve to distract the learner. As a result, the learner engages in extraneous cognitive processing (i.e., cognitive processing that does not support the instructional objective) and thereby has less cognitive capacity available to engage in deeper cognitive processing during learning. Thus, the learner is less able to build a meaningful learning outcome capable of supporting transfer test performance.

Implications

Consistent with other research on the distracting effects of seductive details (Mayer 2009; Mayer and Fiorella 2014), this work suggests caution in using video as an entertaining decoration within a lesson. Consistent with other research on the image principle (Mayer 2009, 2014), adding a talking head to the screen does not add instructional value. Instructional video should be used to help learners build knowledge rather than mainly to promote excitement or arousal.

Conclusion

This brief review of our research on learning with instructional video shows that progress is being made in developing evidence-based principles for how to design effective instructional video. This review was limited to research conducted in our labs, as our intention was to provide examples rather than a comprehensive review or an exhaustive list of principles. The principles suggested in this review should be seen as tentative guidelines that are subject to additional research.

Our ongoing research seeks to understand whether these results can be replicated, the conditions under which they apply, and the learning mechanisms by which they operate. For example, although our focus was on young adults with low prior knowledge, it would be useful to examine how these principles apply to other age groups and types of learners. Also, although our focus was on short lab-based lessons in STEM domains, it would be useful to determine how the principles apply in classrooms, with longer lessons, and in other domains. Research is needed to continue to expand the list of evidence-based principles and to examine related media such as virtual reality and augmented reality. We will consider this review to be successful to the extent that it stimulates further research that better pinpoints evidence-based principles for the design of instructional video.