Introduction 

Improvement of education in the twenty-first century is strongly tied to the development and application of novel educational and pedagogical technologies. With new educational technologies constantly emerging, it is crucial to evaluate the degree to which their use leads to more effective learning and teaching (Iriti et al., 2016).

In this regard, Computer Assisted Language Learning (CALL), as a techno-centric practice (Stockwell, 2007), has contributed greatly to the development of teaching approaches in language classes. Enormous progress in software and hardware technology in this area has enabled sophisticated user interfaces and multimedia presentation formats (Alzahrani & Roberts, 2021), which have considerably promoted the learning and teaching of languages in recent decades.

The use of computer-based learning environments makes it possible to combine different media (text, audio, graphics, motion, etc.) in teaching, which enriches the input by making it multimodal and more comprehensible. The benefits of this type of input for language learning are considerable: multimedia has been found both to affect the development of almost all language skills (Liu et al., 2018; Racicot, 2016; Türk & Erçetin, 2014) and to have a far-reaching impact on learners’ affective characteristics such as motivation, anxiety, and attitudes (Huang & Mayer, 2016; Leutner, 2014; McDonald, 2004). However, an underlying question in multimedia instruction is how to present verbal and visual information efficiently so as to encourage learning in multimedia environments. One answer comes from the modality principle of the cognitive theory of multimedia learning (CTML) (Mayer, 2003, 2005, 2009; Mayer & Moreno, 2003) and the modality effect of cognitive load theory (CLT) (Sweller et al., 2011).

Based on CLT, working memory (WM) has a limited capacity, which is associated with the mental effort invested in a learning task. This capacity limit should be taken into account in designing well-organized instructional materials (Sweller et al., 1998); otherwise, cognitive load (CL) may increase. The CL imposed on a student during learning stems from a combination of the complexity of the information to be learned (i.e., intrinsic load) and the way the instruction is designed (i.e., extraneous load) (Sweller et al., 2011). Within this framework, multimedia learning principles have been established to regulate the CL of multimedia tasks. These principles fall into three categories: reducing extraneous processing, managing essential processing, and fostering generative processing. Principles that target how instructional materials should be designed to minimize extraneous cognitive load are grouped under the reducing extraneous processing principles of CTML. The main goal of applying these principles in multimedia design is to prevent poor instructional design from draining limited cognitive processing capacity without contributing to learning (Mayer, 2014).

Considering the architecture of the human brain and the hypothesis that biologically secondary knowledge (oracy and literacy development in L2) requires careful instructional design, the principles of reducing extraneous processing are particularly relevant to foreign language learning. Moreover, sound pedagogical approaches to improving English as a foreign language (EFL) comprehension (via both reading and listening) assume that learners should be exposed to engaging, appropriate, and comprehensible language input (Krashen, 1985). Multimodal learning can create such a learning condition, as “oral communication is multimodal, that is, speech is just one component part of the great amount of oral and visual information that is conveyed and perceived when we construct meaning” (Jewitt et al., 2013 as cited in Campoy-Cubillo & Querol-Julian, 2015, p. 195).

Thus, it can be hypothesized that implementing the principles of multimedia task design would decrease CL and, as a result, make the aural input easier to process. In this way, language learners would face fewer problems during language instruction, could optimize the deployment of cognitive processing strategies, and would comprehend the message more easily. This assumption is partially supported by empirical studies that call for incorporating certain pedagogical and technical principles in designing effective instructional multimedia (Issa et al., 2011; Leutner, 2014; Wang & Li, 2019). Although previous studies have focused on the role of incorporating selected multimedia design principles in language learning (Dawson et al., 2021; Hung, 2011; Tsai, 2010), there is still a gap in our understanding of how multimedia affects the development of comprehension, especially when CTML design principles are applied in listening instruction in an EFL context. To fill this gap, the current study applied multimedia learning principles in designing listening instruction tasks and examined the development of listening and reading comprehension in this multimodal learning environment.

Review of Literature

Listening Comprehension and Multimodal Input

Listening is a key component of communication (Nunan, 2002) and “the most widely used language skill in normal daily life” (Martínez-Flor & Usó-Juan, 2006, p. 29). Listening comprehension, as a complex and demanding cognitive process, involves four overlapping types of processing: neurological, linguistic, semantic, and pragmatic (Rost, 2011). It requires learners to differentiate between sounds, comprehend grammatical structure and vocabulary, become familiar with intonation and stress, and contextualize speech in terms of sociocultural expressions (Vandergrift, 1999).

While the difficulty of listening comprehension has been attributed to factors such as task difficulty or the listener’s lack of linguistic knowledge, recent years have seen a surge of interest in how various semiotic resources (verbal, visual, aural, and spatial) contribute to the way a message is understood. According to the multimodality approach, analysis of verbal language alone is insufficient for understanding communication and how messages get across (Cui, 2019; Ho & Tai, 2020; Valentini et al., 2018). In this framework, the meaning of input in listening comprehension is broadened, as “genuine listening input takes a broader perspective to embrace not only oral features but also visual ones” (Campoy-Cubillo & Querol-Julian, 2015, p. 195). From a purely cognitive perspective, this postulation is supported by Paivio’s dual coding theory (Paivio, 1986, 2007), which holds that cognition involves two separate but related codes: a nonverbal code for mental imagery and a verbal code for language (Sadoski, 2005). Activating both channels fosters learning, even though each channel has a limited capacity for processing information. One central goal of any listening instruction should therefore be to provide language learners with authentic materials depicting real-life events, in which comprehension develops through processing multimodal input. As Rost argues, “using multimedia involving visuals and audio, and with multiple modes of presentation (e.g., video with subtitles), will increase context, reduce cognitive load, and improve comprehension” (2011, p. 152).

Implementing multimodal learning in listening has a relatively long history, tied to advances in technological devices and to the development of theories of brain and memory mechanisms. Pioneering studies provided some guidelines for using multimedia in listening comprehension, and as theory and practice developed, empirical studies on the use of multimodal learning in listening proliferated. Brett (1995), for instance, designed a multimedia application for developing listening skills in Business English and introduced its basic technical requirements and an overview of its essential features: video, tasks, subtitles, and provision of learner choice. In another study, Meskill (1996) demonstrated, with illustrative scenarios, how multimedia technology could support a pool of 33 listening micro-skills that Richards (1985) had claimed effective listeners employ when trying to understand aural input. Neither of these studies conducted an experimental investigation to verify the effectiveness of the instructional content designed.

With the development of cognitive psychology and the emergence of revolutionary views of the brain’s architecture, the scope of listening research has broadened over the last two decades. CTML (Mayer, 2003), the model of thinking process (Moreno & Mayer, 2007), the multicomponent model of working memory (Baddeley, 2012), and CLT (Sweller et al., 2011) have contributed substantially to our understanding of how the brain processes information and carries out cognitive operations, the importance of input modality for working memory, how working and long-term memory should be fed, and the role of instructional content in helping listeners have a better listening experience.

Most studies in this area have focused on how multimodal versus single-mode input may contribute to listening comprehension. İnceçay and Koçoğlu (2017) investigated the effect of one single-mode (audio-only) and three dual-mode input delivery conditions (audio–video, audio–video with target language subtitles, and audio with PowerPoint presentation) on listening comprehension. The results demonstrated that the audio with PowerPoint presentation group outperformed the other groups in listening comprehension. Zarei and Oruji (2019) examined the effect of three types of glossing (textual, pictorial, and textual-pictorial) on listening comprehension and found that textual-pictorial glosses improved listening comprehension significantly more than the other two conditions. The results supported the superiority of integrating multimedia into cognitive instruction over a metacognitive cycle of teaching. In a case study, Sanguino (2020) explored the impact of audio and video materials presented during pre-, while-, and post-listening activities on facilitating L2 listening comprehension in a group of EFL learners. The findings showed that video materials assisted listening comprehension and were beneficial for other aspects of language learning, such as motivation and cultural awareness. In a recent study, Lee, Liu, and Tseng (2021) examined the effects of four caption modes (control, real-time, full, and partial) on listening comprehension and found no significant difference in learners’ listening comprehension when their reliance on captions was not taken into account.

The modality-unspecific view, according to which “reading and listening comprehension are two versions of the same comprehension skill” (Wolf et al., 2019, p. 1748), has inspired studies on the bimodality of input that focus on the relationship between reading and listening and on how proper instruction in one mode may promote ability in the other. It has been reported that listening comprehension can predict 40% of reading comprehension while reading can predict 34% of listening (Wolf et al., 2019), that reading-while-listening makes listening tasks easier and more interesting (Chang, 2009), that listening comprehension training has a significant impact on strategic listening and reading (Aarnoutse et al., 1998), and that combining both modalities is more beneficial for vocabulary acquisition than involving only one mode (Shamir et al., 2012). The effect of multimodal input on reading comprehension has also been examined in a few studies (Naderi Anari et al., 2019; Pellicer Sánchez et al., 2020), but the role of multimedia-based listening instruction in the development of both listening and reading comprehension needs further clarification.

Multimedia Design Principles

With regard to effective knowledge acquisition and improved comprehension, multimedia has been argued to have the potential to considerably enhance instructional effectiveness (Miller et al., 2011); however, concerns remain about the degree to which its design and application have realized or optimized this potential (Massa & Mayer, 2006). To address these concerns, Mayer (2005, 2009, 2014) proposes twelve practice-driven multimedia principles, grounded in cognitive theories of learning and instruction, for designing effective instructional videos. The principles are clustered into three categories: reducing extraneous processing (coherence, signaling, redundancy, spatial contiguity, and temporal contiguity principles), managing essential processing (segmenting, pre-training, and modality principles), and fostering generative processing (personalization, voice, embodiment, and image principles). Several studies have examined the validity of these principles in designing instructional materials and improving the learning outcomes of multimedia instruction.

Issa et al. (2011) investigated the effectiveness of slides prepared according to multimedia principles in a medical college course. The results showed statistically significant improvements in retention and total scores for students instructed with slides following multimedia design principles compared with those taught with the traditional design. Similarly, Pate and Posey (2016) examined the effect of multimedia design principles on test item performance, student satisfaction, student confidence in potential exam performance, and classroom dynamics in a medical course. The results showed that students retained information better when it was presented in a format adhering to multimedia design principles and preferred this format to traditional multimedia.

Schwan, Dutz, and Dreger (2018) investigated the effect of various combinations of text and static pictures based on multimedia principles on visitors’ behavior, knowledge acquisition, and the average time spent with the artworks in an art exhibition setting. The results supported the validity of the principles of multimedia learning in informal learning settings and elaborated the assumptions of CTML as a theory that specifies the interplay of multimedia learning material, cognition, and motivation. Nagmoti (2017) examined the effect of multimedia principles on students’ learning and their feedback on the quality and content of lectures in a medical course. Significant differences were found between the post-test scores of students who received traditional slides and those who received slides modified according to multimedia principles, indicating improved short-term memory, long-term memory, and comprehension. Many students appreciated learning through the multimedia slides and suggested their continued use.

Kuba, Rahimi, Smith, Shute, and Dai (2022) designed videos based on multimedia principles for a physics educational game to help learners engage in cognitive processing. The results showed that the designed videos significantly predicted post-test scores and the number of game levels completed. Pantazes (2021) explored the extent to which higher education instructors who created digital instructional videos for online learning had applied multimedia design principles. The results showed that the instructors often implemented the design principles but applied certain principles, such as redundancy, less frequently. The instructors’ personal experiences and preferences played a greater role in applying the principles than their knowledge of the design principles.

A few studies have examined the validity of multimedia principles in language courses. Ayub, Talib, and Siew (2018) explored users’ perceptions of seven multimedia principles (generative learning, spatial contiguity, temporal contiguity, coherence, modality, redundancy, and personalization) in mobile-based Japanese language learning, using a mixed-methods approach. Most respondents agreed that the multimedia principles were appropriate for the application design, except for the personalization and redundancy principles. Beukes (2019) used four multimedia principles, namely the redundancy, spatial contiguity, coherence, and personalization principles, to design a computer program for teaching vocabulary in a foreign language class. Except for the redundancy principle, applying the multimedia principles in designing the game made no significant difference to vocabulary retention. Schrader, Reichelt, and Zander (2018) investigated the effect of the personalization principle by preparing a multimedia presentation in two different language formats and measuring students’ learning outcomes and interest in the learning material. The results showed a positive effect of personalization on both learning and interest. Liu (2019) examined the applicability of the modality and redundancy principles to the learning of English as a second language (ESL) students. Both knowledge retention and vocabulary test results indicated that input modes did not affect ESL students’ learning; consequently, the modality and redundancy principles played an insignificant role in instruction.

As this brief review reveals, applying multimedia principles in designing language tasks yields mixed findings. Therefore, the examination of incorporating reducing extraneous processing principles in task design and their possible effects on EFL learners’ comprehension development via listening and reading is open to further research. Focusing on this issue, the current study seeks answers to the following research questions:

  1. Do educational multimedia presentations designed based on reducing extraneous processing principles of CTML have any significant impact on EFL learners’ development of listening comprehension?

  2. Do educational multimedia presentations designed based on reducing extraneous processing principles of CTML have any significant impact on EFL learners’ development of reading comprehension?

Methods

Design Phase

A group of researchers consisting of one TEFL faculty member, one TEFL research assistant, and two computer science faculty members designed 12 sets of multimedia tasks under two conditions, one applying and one violating the five principles of reducing extraneous processing of CTML (Table 1).

Table 1 Five principles of reducing extraneous processing of CTML (Mayer, 2014, p. 4) 

The design phase lasted around four months. First, the goals and topics of the videos were set, the scripts and storyboards were created, and the materials for making the multimedia (texts, images, sounds) were prepared. Then, Corel Video Studio X10 was used to produce the multimedia videos. Below, each principle is presented (Clark & Mayer, 2016), together with a brief account of how it was applied or violated in making the multimedia videos.

Coherence Principle

This principle indicates that adding extra material to multimedia can hurt learning, and thus extraneous materials should be excluded from multimedia presentations. To achieve this goal, words, graphics, or sounds that are not directly related to the instructional goal of the multimedia should be removed. Examples of how the coherence principle and its three sub-principles were applied and violated in making the multimedia videos for this study are depicted in Fig. 1.

Fig. 1

Examples of two conditions of applying and violating the coherence principle in making the multimedia videos for this study. (a1) Avoiding irrelevant or extra words. (a2) Including irrelevant or extra words. (b1) Avoiding irrelevant/decorative graphics. (b2) Including irrelevant/decorative graphics. (c1) Avoiding background music. (c2) Including background music

Signaling Principle

The main goal of applying the signaling principle is to add visual or verbal cues to the multimedia to highlight the organization of the essential materials and direct the learners’ attention to them. To add verbal cues, the designers can use outlines, headings, vocal emphasis, or pointer words (Clark & Mayer, 2016). To add visual cues, it is recommended to use arrows, distinctive colors, flashing, pointing gestures, and graying out techniques (Clark & Mayer, 2016). Examples of how verbal and visual signaling principles were applied and violated in making the multimedia videos for this study are depicted in Fig. 2.

Fig. 2

Examples of two conditions of applying and violating the signaling principle in making the multimedia videos for this study. (a1) Inserting an outline at the beginning. (a2) No outline is inserted at the beginning of the multimedia. (b1) Using graying out to highlight the cues. (b2) No highlighting is applied

Redundancy Principle

Based on this principle, people learn better from concurrent graphics and audio than from concurrent graphics, audio, and on-screen text when the on-screen text is the same as the narration. Examples of how the redundancy principle was applied and violated in making the multimedia videos for this study are depicted in Fig. 3.

Fig. 3

Examples of two conditions of applying and violating the redundancy principle in making the multimedia videos for this study. (b1) Using audio + graphics. (b2) Using audio + graphics + text

Contiguity Principles

According to this principle, people learn better when corresponding words and pictures are presented near rather than far from each other, both spatially (on the page or screen) and temporally. According to the spatial contiguity principle, printed words should be placed as close as possible to the part of the graphic they describe. According to the temporal contiguity principle, spoken words should be synchronized with the corresponding graphics. Examples of how the spatial and temporal contiguity principles were applied and violated in making the multimedia videos for this study are depicted in Fig. 4.

Fig. 4

Examples of two conditions of applying and violating the spatial and temporal contiguity principles in making the multimedia videos for this study. (b1) Contiguity principle was applied. (b2) Contiguity principle was violated

Two English language teaching (ELT) experts reviewed all multimedia videos by completing the ELT multimedia courseware evaluation questionnaire (Jiang et al., 2017), which assesses the appropriateness of integrating the five principles of reducing extraneous processing in designing courseware and multimedia. Both evaluators were experienced language teachers and members of the materials development department of their district education office. Based on their comments and suggestions, the multimedia presentations were revised and finalized for instruction. Each instructional multimedia presentation lasted about 5–7 min.

Experimentation Phase

Participants

Thirty EFL learners participated in this study. They were enrolled in two advanced English conversation courses, with 15 students in each class. The sample included both male (n = 17) and female (n = 13) students. Female students made up almost half of the control group (n = 6) and the experimental group (n = 7). Given the small sample size, gender was not treated as an intervening variable in the design of the study.

The homogeneity of the two groups in terms of English proficiency was assessed with the International English Language Testing System (IELTS) test before the study. An independent samples t-test indicated no significant difference between the groups [t(28) = −0.715, p = 0.481 > 0.05], confirming that their English proficiency was comparable before the experiment. Further, normality tests showed no violation of the normal distribution, either for the sample as a whole or for each group.

The participants ranged in age from 18 to 22 (mean = 19.2) in the control group and from 18 to 23 (mean = 18.8) in the experimental group. An independent samples t-test showed no significant difference between the groups in age [t(28) = −0.983, p = 0.334 > 0.05].
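The two t-test results above can be checked directly from the reported t values and degrees of freedom. The sketch below uses only the Python standard library: it integrates the Student t density with Simpson's rule to obtain a two-sided p-value and compares it with α = 0.05. The helper names (t_pdf, two_sided_p) are illustrative and not taken from the study or from any library.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value P(|T| >= |t|), via Simpson's rule on [0, |t|]."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * area   # the t density is symmetric around zero

ALPHA = 0.05
# t values and df (N = 30, two groups of 15 -> df = 28) as reported in the text.
for label, t in [("proficiency", -0.715), ("age", -0.983)]:
    p = two_sided_p(t, df=28)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{label}: t(28) = {t}, p = {p:.3f} -> {verdict} at alpha = {ALPHA}")
```

Both p-values come out above 0.05, which is why the group differences in proficiency and age count as nonsignificant. In practice a library routine such as SciPy's `scipy.stats.t.sf` would replace the hand-written integration; it is spelled out here only to keep the example dependency-free.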

The Instrumentation

International English Language Testing System Test

The International English Language Testing System (IELTS) test assesses the English language proficiency of people who want to study or work in English-speaking environments. It is designed to provide a fair, accurate, and relevant assessment of language skills based on well-established standards and covers the full range of proficiency levels, from non-user to expert user. The IELTS test has four sections, assessing the four language skills, i.e., listening, reading, writing, and speaking. Candidates receive individual scores for each section.

For this study, listening and reading papers of the IELTS were given to both groups prior to and after the study to determine their listening and reading comprehension levels before and after the experiment. The details of the reading and listening tests are summarized in Table 2.

Table 2 Details of IELTS receptive skill test components (IELTS, Cambridgeenglish.org)

The reliability coefficients of the listening section of IELTS for this study for the pre-test and post-test were estimated to be 0.71 and 0.82, respectively. The reliability coefficients of the reading section of IELTS for this study for the pre-test and post-test were estimated to be 0.78 and 0.79, respectively.

The Textbook

The main objective of the course was to improve the oracy skills of the participants in an advanced conversation course. The main textbook of the course was Open Forum 3 (Parker & Duncan, 2008) whose focus is on academic listening and speaking. The themes feature academic content areas such as ecology, business, and astronomy.

Open Forum 3 includes authentic listening materials and a wide variety of texts, including lectures, radio interviews, news reports, and informal conversations. Students’ awareness of features of spoken English is raised through various types of exercises with different speakers of English (Parker & Duncan, 2008) and through listening to different English accents that may be encountered in lectures, discussions, or on the radio (Zou, 2007).

All 12 units of the book were worked on throughout one semester that lasted for four months. The materials for both experimental and control groups were the same. Each unit consisted of eight sections with a variety of activities. A summary of the sections, their goals, and activities is depicted in Table 3.

Table 3 Unit format of Open Forum 3 (Parker & Duncan, 2008, pp. vi-vii) 

The Procedure

Both classes took the IELTS listening and reading papers before the study. The listening section of each unit was taught using a comprehension approach with a three-phase cycle of pre-listening, while-listening, and post-listening.

In the pre-listening phase, the students were familiarized with the theme of the listening tasks and possibly some language forms (grammatical points, new words, etc.). This part of the instruction was the same for both groups. In the while-listening phase, both groups watched the multimedia presentations: the experimental group watched multimedia prepared in accordance with the five principles, and the control group watched traditionally designed videos in which these principles were violated. In the post-listening phase, both groups’ comprehension was assessed through various activities, including questions and answers, summary writing, and fill-in-the-blanks.

At the end of the 16-week experiment, both groups took the IELTS listening and reading post-tests to examine the development of their listening and reading comprehension.

Results

The Development of Listening Comprehension

To examine the effect of the intervention on participants’ development of listening comprehension, a multivariate analysis of variance (MANOVA) was used. In this analysis, the IELTS listening post-test scores (comprehension of dialogues and of monologues) served as the dependent variables, and the type of instruction (multimedia designed with vs. without CTML principles) was the independent variable.

The multivariate tests indicated a statistically significant difference between the two groups’ post-test scores on the combined dependent variables (Wilks’ Lambda = 0.531, F = 11.927, p = 0.001 < 0.05, ηp² = 0.469). Because neither Box’s Test of Equality of Covariance Matrices nor Levene’s Test of Equality of Error Variances was significant at the 0.001 level, the Tests of Between-Subjects Effects were then examined.
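The multivariate statistics reported above are internally consistent, and the relationship can be sketched as follows. Assuming a one-way MANOVA with two groups (N = 30) and, as inferred from the Bonferroni divisor of 2, two listening dependent variables, Wilks’ Lambda converts exactly to F with df1 = p and df2 = N − p − 1. A common definition of multivariate partial eta squared is 1 − Λ^(1/s) with s = min(p, k − 1); with k = 2 groups, s = 1, so ηp² = 1 − Λ. The function name wilks_two_group is illustrative, not part of any statistics package.

```python
def wilks_two_group(wilks_lambda: float, n_total: int, n_dvs: int):
    """Exact F and partial eta squared for Wilks' lambda with two groups.

    With k = 2 groups, s = min(p, k - 1) = 1, so the F transformation is exact:
    F(p, N - p - 1) = ((1 - L) / L) * (N - p - 1) / p, and partial eta^2 = 1 - L.
    """
    df1 = n_dvs
    df2 = n_total - n_dvs - 1
    f_stat = (1 - wilks_lambda) / wilks_lambda * df2 / df1
    partial_eta_sq = 1 - wilks_lambda  # 1 - L**(1/s) with s = 1
    return f_stat, df1, df2, partial_eta_sq

# Listening: N = 30 learners; p = 2 subscores (dialogues, monologues) is an inference.
f, df1, df2, eta = wilks_two_group(0.531, n_total=30, n_dvs=2)
print(f"F({df1}, {df2}) = {f:.2f}, partial eta^2 = {eta:.3f}")
```

This gives F(2, 27) ≈ 11.92 and ηp² = 0.469, matching the reported values up to the rounding of Λ. The same function applies to the reading analysis (Λ = 0.583 with three dependent variables), where it gives F(3, 26) ≈ 6.20 and ηp² = 0.417.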

When the dependent variables were considered separately (Table 4), using a Bonferroni-adjusted alpha level of 0.025 (0.05/2) to control for Type I error, the difference between the groups’ post-test scores reached statistical significance only for comprehension of the monologues.
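The Bonferroni adjustment divides the familywise alpha by the number of separate tests, so that testing several dependent variables does not inflate the overall chance of a Type I error. A minimal sketch with illustrative function names; the p-value in the last call is hypothetical, not a value from Table 4.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-comparison alpha that keeps the familywise Type I error at or below alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

def passes_bonferroni(p_value: float, n_tests: int, alpha: float = 0.05) -> bool:
    """True if a single p-value is significant after Bonferroni adjustment."""
    return p_value < bonferroni_alpha(alpha, n_tests)

# Listening: two dependent variables -> 0.05 / 2 = 0.025.
print(bonferroni_alpha(0.05, 2))
# Reading: three dependent variables -> 0.05 / 3, reported as 0.017 after rounding.
print(round(bonferroni_alpha(0.05, 3), 3))
# A hypothetical p = 0.03 would pass an unadjusted 0.05 test
# but fails the adjusted 0.025 threshold.
print(passes_bonferroni(0.03, n_tests=2))
```

The correction is deliberately conservative: with more dependent variables, each individual test must clear a stricter threshold.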

Table 4 Tests of between-subjects effects

Based on Cohen’s (1988) guidelines, the effect size of the intervention (ηp² = 0.469 > 0.14) was large. The descriptive statistics showed that the experimental group outperformed the control group on the IELTS listening post-test (Table 5).

Table 5 Descriptive statistics for IELTS listening pre- and post-test scores across groups

The Development of Reading Comprehension

To examine the effect of the intervention on participants’ development of reading comprehension, MANOVA was used. In this analysis, the IELTS reading post-test subscores served as the dependent variables, and the type of instruction (multimedia designed with vs. without CTML principles) was the independent variable.

The multivariate tests indicated a statistically significant difference between the two groups’ post-test scores on the combined dependent variables (Wilks’ Lambda = 0.583, F = 6.207, p = 0.003 < 0.05, ηp² = 0.417). Because neither Box’s Test of Equality of Covariance Matrices nor Levene’s Test of Equality of Error Variances was significant at the 0.001 level, the Tests of Between-Subjects Effects were then examined.

When the dependent variables were considered separately (Table 6), using a Bonferroni-adjusted alpha level of 0.017 (0.05/3) to control for Type I error, the difference between the groups’ post-test scores reached statistical significance both for understanding the gist and for understanding specific information.

Table 6 Tests of between-subjects effects

The effect size of the intervention (ηp² = 0.417 > 0.14) was large. The effect size for general comprehension (ηp² = 0.396 > 0.14) was larger than that for understanding detailed information (ηp² = 0.194 > 0.14). The descriptive statistics showed that the experimental group outperformed the control group on the IELTS reading post-test (Table 7).
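The effect-size judgments in the Results rely on the benchmarks commonly attributed to Cohen (1988) for eta squared: 0.01 small, 0.06 medium, and 0.14 large. Applying these benchmarks to partial eta squared is a widespread convention rather than a strict rule. A minimal sketch of that classification, applied to the partial eta squared values reported in this section; the names BENCHMARKS and label_effect are illustrative.

```python
# Conventional benchmarks for eta squared, commonly attributed to Cohen (1988).
BENCHMARKS = [(0.14, "large"), (0.06, "medium"), (0.01, "small")]

def label_effect(eta_sq: float) -> str:
    """Return the largest benchmark label that eta_sq reaches."""
    for cutoff, label in BENCHMARKS:
        if eta_sq >= cutoff:
            return label
    return "negligible"

reported = {
    "listening (multivariate)": 0.469,
    "reading (multivariate)": 0.417,
    "reading, general comprehension": 0.396,
    "reading, detailed information": 0.194,
}
for name, value in reported.items():
    print(f"{name}: partial eta^2 = {value:.3f} -> {label_effect(value)}")
```

All four values clear the 0.14 cutoff, so each is classified as large. Thresholds like these are coarse, so the relative ordering (0.396 vs. 0.194) carries more information than the shared label.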

Table 7 Descriptive statistics for IELTS reading pre-and post-test scores across groups

Discussion

The potential of multimodal input to make better use of limited working memory capacity and to assist comprehension has prompted research on effective ways of presenting verbal and visual materials simultaneously. Such practice-driven evidence has important implications for instructional designers producing and delivering educational multimedia. Taking this into account, the present study aimed to evaluate the effect of incorporating reducing extraneous processing principles in multimedia task design on comprehension development in listening instruction among EFL learners.

The results revealed, first and foremost, that incorporating the principles for reducing extraneous processing into task design contributed to the development of both listening and reading comprehension. This finding lends support to CTML (Mayer, 2014), according to which optimal learning occurs when the auditory and visual channels of WM are used to a comparable extent. Previous research has documented the role of multimedia in listening instruction: it encourages greater engagement in the task, particularly among low achievers (Lee & Mayer, 2015), increases comprehension of aural input (Yang, 2014), and lowers the cognitive load of listening tasks (Rahimi & Sayyadi, 2019). In line with this literature, the present study shows that multimodal input promotes listening comprehension; what it adds is that multimedia can be even more beneficial in listening courses when it is designed according to practice-driven design principles that take human cognitive architecture into account. Both groups’ listening comprehension improved following multimedia instruction, but the gain was greater among learners who worked on tasks designed according to the five principles for reducing extraneous processing.

A more detailed analysis showed that, at the end of the experiment, the experimental group understood monologues better than dialogues. In other words, multimedia presentations prepared according to the design principles supported comprehension more when a single speaker was narrating than when two or more speakers were conversing. One possible reason is that most types of multimedia, such as videos, digital stories, and animated explainer videos, have a single narrator and rarely involve people conversing, a design choice intended to help viewers concentrate on one speaker. Consequently, improving learners’ comprehension of dialogues through digital technology may require more careful design procedures and closer attention to conversation. Online sessions that combine multimedia with opportunities for interaction may be more appropriate multimodal input for listening instruction than offline multimedia, which offers no chance of interaction and cooperation outside the classroom (Mercer et al., 2019; Park & Kim, 2011). Further studies of Mayer’s multimedia principles in second language learning are also needed: the personalization principle, which concerns formal versus informal narration style, has been investigated in multimedia research (Bol et al., 2015; Schrader et al., 2018), but the impact of dialogue versus monologue voice-overs on the learning gains and cognitive load of multimedia tasks remains open to examination.

Regarding the second goal of the study, listening multimedia tasks prepared according to multimedia principles also improved learners’ reading comprehension. It has been suggested that comprehension is a domain-general competence that is independent of input modality (Wolf et al., 2019); on this view, listening and reading are two forms of the same comprehension competence. The modality of the input does not affect the construction of the situation model; rather, comprehension reflects “a general comprehension skill that transcends modality” (Gernsbacher et al., 1990, p. 430). This finding is also consistent with the CLT proposal that adding a second mode of input to reading materials, such as combining text with pictures, reading while listening, or reading captions and/or subtitles in multimedia, can help manage the cognitive load of reading comprehension (Hannon, 2014; Schaffner & Schiefele, 2013; Schaars et al., 2019). Illustrated texts have long been thought to assist comprehension and to encourage young L1 readers by making reading more enjoyable. Review studies support this view, reporting positive effects of combining illustrations with written text on comprehension and memory compared with text-only input (Carney & Levin, 2002; Choi, 2011). Gains in vocabulary learning (Chang, 2009), reading speed (Chang & Millett, 2015), and learner satisfaction (Brown et al., 2008) are further benefits of bimodal reading input.

The effect size of the intervention was also larger for understanding the gist than for comprehending specific information. Research on multimedia that combines text, audio, and video indicates that multimedia facilitates reading comprehension because students can form a mental image from oral or written language, and their sensory system rapidly integrates fragments into a whole through that image (Wang & Li, 2019). One explanation for the larger effect on gist may be the application of the signaling principle, which draws learners’ attention to “the important material in the lesson and how it is organized” (Mayer, 2014, p. 5); in this study, general ideas were presented as printed words in outlines and headings, and these texts were highlighted. Applying this principle likely eased comprehension of the texts, which, compared with elementary reading, demand higher levels of cognitive and linguistic expertise because readers must comprehend both the literal and the inferential meanings of the content (Sun et al., 2013).

Conclusions

Instructional multimedia design and its incorporation into the teaching of different subjects have been practiced and researched over the last two decades. Although the considerable potential of multimedia for learning is generally recognized, findings on applying design principles to create effective multimedia presentations for second and foreign language classes are mixed. To address this issue, the current study examined the effect of multimedia instruction designed according to CTML principles on EFL learners’ listening and reading comprehension.

The findings of the study, consistent with the results of a few previous studies, underscore the key role of applying practice-driven design principles when creating instructional multimedia so that language learners can benefit fully from the instruction. They draw the attention of language materials developers to the role of multimedia in promoting comprehension and to the need to produce such content carefully and meticulously. They also show how essential it is to train teachers in materials development and evaluation and to make them aware of the importance of CTML principles in designing and using multimedia for instruction. This highlights the role of teacher trainers in familiarizing teachers with technological advances and their implications for theories and approaches of language teaching. The study offers valuable insights into how instructional multimedia can affect the understanding of both oral and written input in listening instruction, how applying certain principles leads to optimal cognitive processing in both listening and reading, and how these processes interact.

The findings of the present study should be interpreted in light of its limitations. Because of practical constraints and the limited number of seats in the language lab where the classes were held, the study was conducted with a small sample. Further, because of time and budget limitations, only the first five of the 12 multimedia principles were considered in designing the multimedia tasks. Follow-up studies could incorporate language proficiency and gender as intervening variables. The effects of other types of multimedia designed according to multimedia principles, such as digital storytelling or animated explainer videos, could also be examined. Finally, given the scarcity of research, investigating the impact of multimedia on the development of productive language skills (writing and speaking) is recommended.