
Listening, as an important communication skill, has been a primary focus of many pedagogical theories and practice-driven instructional approaches in recent decades. Despite the remarkable similarities between first and second language comprehension (Goh, 2000), the two are not comparable in complexity: the lack of communication opportunities, particularly in the English as a foreign language (EFL) context, has turned listening into the monster of language skills. The second language (L2) listening literature has thus grown in pursuit of ways to tackle EFL learners’ problems with listening comprehension and to help them adopt strategies that make oral input processing easier and more efficient.

Listening plays a fundamental role in developing language learners’ communicative competence and is the most frequently used language skill in L2 classes (Hamouda, 2013). Listening is a channel for comprehensible input and a necessary skill for language acquisition (Vandergrift & Goh, 2009) through which “learners can build an awareness of the interworkings of language systems at various levels” (Peterson, 2001, p. 87). The significant role of listening in the language curriculum highlights teachers’ responsibility to understand the mechanisms of listening and to implement listening tasks and activities that assist comprehension (Goh, 2014). As a result, input and its role in the speed and depth of comprehension is one of the most debated issues in L2 listening research. Different aspects of input such as originality (authentic vs. modified), communication type (dialogic vs. monologic), and discourse style (planned vs. unplanned) have been surveyed for their possible effect on listening comprehension (e.g., Monteiro & Kim, 2020; Papageorgiou et al., 2012). The coincidence of the evolution of cognitive theories of multimodal learning (Mayer, 2009) with the ubiquitous presence of digital devices and social media in people’s everyday communication has attracted considerable attention to one particular feature of input, namely modality (audio vs. audiovisual), and “how visual information enhances linguistic input, or distorts it, or replaces it, and sometimes even contradicts it” (Rost, 2011, p. 50).

Drawing on cognitive psychology and a multicomponent model of working memory (WM) (Baddeley, 2010), listening is redefined as a mental activity that demands the processing of information across various modalities including “acoustic and visual signals” (Imhof, 2010, p. 98). Recent models of listening comprehension hold that multimodal input is more comprehensible (Rost, 2011) as it enhances contextual clues (Danan, 2019), reduces the cognitive load (CL), and improves comprehension (Chang et al., 2011). The integration of different types of multimedia into listening instruction has been examined, and their influence on the development of listening comprehension has been reported. Most of these studies have surveyed the role of multimedia as the comprehensible input (e.g., Sayyadi et al., 2022) or as the stimulus of the listening phase in the pre-during-post-listening cycle (e.g., Soleimani & Mirsayafi, 2018) in the classroom.

Although listening needs sustained effort and extensive practice outside the class, there is a dearth of research on the effect of homework in general (e.g., Afsharrad & Nafchi, 2015; Sunyoung, 2017) and of multimodal listening homework (MLH) in particular on listening comprehension and its CL. This shortage can be attributed to the lack of a systematic classification of L2 listening homework, as most types of listening practice that instructional designers recommend are suited to in-class time (e.g., Brown & Lee, 2015; Rost, 2011). The role of practice in developing L2 listening and minimizing its difficulty is supported by both cognitive and second language acquisition (SLA) frameworks (Krashen, 2003). From the cognitive load theory (CLT) perspective, L2 listening is biologically secondary knowledge that needs careful instructional design and practice, as it is not acquired naturally and effortlessly (Sweller, 2017). Based on CLT, teaching practices and instructional interventions that let learners utilize the capacity of their WM play a major role in reducing the mental effort of a learning task (Sweller, 2017). This issue is closely related to the way the input is presented (i.e., instructional design) and to the type (i.e., multimodality) and the amount (i.e., redundancy) of information that is processed (Sweller et al., 2011). From the SLA point of view, more exposure to oral input and extensive practice play a key role in successful listening.

Drawing on the principles of the cognitive theory of multimedia learning (CTML) (Mayer, 2014) and the cognitive model of listening (Imhof, 2010), the incorporation of MLH into L2 listening instruction is reasonable yet open to debate. Based on CTML, “people learn more deeply from words and graphics than from words alone” (Mayer, 2014, p. 1), as multimodal input increases the number of resources the WM can allocate to a task (Sweller, 2016) by organizing the input into visual/pictorial and auditory/verbal channels. Being compatible with this architecture of the human mind, MLH is expected to impose less pressure and CL on students’ cognition than other types of exercises and thus to reduce the CL of listening tasks while boosting comprehension.

Further, MLH includes the source and context of listening and thus creates a meaningful environment and more comprehensible input for practicing L2 listening, which is generally lacking in many workbooks of EFL textbooks (Amiryousefi, 2016). To test these hypotheses, the current study adopts a mixed-methods approach and first gathers quantitative data from a one-semester intervention and then triangulates them with the qualitative data of the participants’ perceptions of the experience. The research questions of the study are as follows:

    Does multimodal listening homework have any impact on L2 listening comprehension?

    Does multimodal listening homework have any impact on L2 listening cognitive load?

    What are language learners’ perceptions of multimodal listening homework?

The answers to these questions illuminate the benefits of homework for developing listening comprehension as claimed by extensive listening models (e.g., Brown & Lee, 2015; Rost, 2011) and whether the modality of homework would enhance this effect as underpinned by CTML (Mayer, 2014). The findings also contribute to filling the lacuna in both theory and practice of L2 listening instruction by verifying the educational value of multimedia in teaching and practicing listening comprehension (Sayyadi et al., 2022).

Review of Related Literature

Listening Comprehension, Cognition, and Multimodality

Listening comprehension is axiomatically defined as “the act of understanding what another person is saying” (Sohler, 2020, p. 1) and encompasses three essential elements of perceiving the aural input, building meaning, and linking what was heard to the prior knowledge (Nadig, 2013). These components are seemingly the criteria for expanding the definition of listening by psychological and pedagogical frameworks across the past few decades.

The historical evolution of teaching listening shows how the type of input, schemes of information processing, the mechanism of knowledge construction and retrieval in the memory, and consequently principles of instructional design characterize listening approaches from text-based to comprehension-based and metacognitive models (Goh, 2008). This trend has been influenced by the ubiquitous presence of technology in everybody’s life and how creating and sharing information by merging voice, picture, text, video, and graphics changed the nature of oral communication. Backed with the integration of cognitive theories (Baddeley, 2010; Mayer, 2014; Sweller et al., 2011) into listening comprehension models (e.g., Imhof, 2010), the mechanism and function of WM during listening comprehension and how the input features and mode can alter the difficulty of listening tasks were underscored.

In this scheme, the listener selects the verbal/visual information and then organizes it into two separate but interrelated verbal and pictorial channels, to be integrated into the long-term memory (LTM) (Paivio, 2007). The input thus is processed both by the listener’s ears and eyes, and its difficulty is distributed in these two channels. As this type of input increases the context of listening, it has the potential to reduce its CL and thus increase comprehension (Rost, 2011). This viewpoint is based on three basic assumptions of CTML: (a) dual channel assumption, i.e., there are two separate channels to process visual and verbal materials; (b) limited capacity assumption, i.e., only a limited amount of information can be processed in each channel at a given time; and (c) meaningful learning assumption, i.e., while processing multimodal input, people select the relevant type of information (pictures or words), organize them into either channel of the WM, and then integrate them with relevant knowledge in their LTM (Mayer, 2014).

The practicality of these postulations for listening instruction led to the inception and development of one line of research that focused on the educational value of multimodal input for the enhancement of listening comprehension. The very first studies of this type took the initiative to devise methods and strategies for developing multimedia software for teaching listening (Brett, 1995) or suggesting guidelines on how to integrate multimedia into listening instruction (Meskill, 1996).

Concurrent with the development of psychological views and advancements in technology, other researchers embarked on more applied studies to compare the effects of multimodal and single-mode input on listening comprehension. The literature on integrating multimedia input into second and foreign language learning shows that the studies on multimodal listening ranked second after multimodal vocabulary acquisition (Zhang & Zou, 2022). The effect of multimodal input generated by videos (e.g., Lee & Mayer, 2015), animated videos (e.g., Nhung, 2020), digital stories/books (Rahimi & Yadollahi, 2017), and virtual reality (e.g., Tai & Chen, 2021) on listening comprehension has been surveyed. The general findings of these studies indicate that multimodal input is often a better source of authentic input and provides students with contextual clues that promote interest in doing listening activities (Shaojie et al., 2022).

However, the findings about the role of multimedia in listening comprehension are mixed as other factors such as participants’ listening proficiency and thinking skills as well as multimedia design principles impact the outcomes of the studies. Lee and Mayer (2015), for instance, examined the effect of single mode (text only) and bimodal input (text and audio) in doing listening tasks by EFL learners. Their results showed that the single mode input was more beneficial for EFL learners than the dual mode input while performing the tasks, which was interpretable by learners’ language proficiency and prior experience of listening to English. Ahmadpour Kasgari et al. (2020) examined the effects of audio-only and audio-video materials on EFL students’ listening comprehension by considering their critical thinking skills. Their results showed that students’ level of comprehension was significantly higher in the audio-video listening condition in comparison to the audio-only condition. Yet, a notable relationship was observed between students’ listening comprehension and their level of critical thinking, meaning that multimodal oral input is more beneficial for those who are more critical thinkers. In another study, Sayyadi et al. (2022) investigated the impact of instructional multimedia tasks designed in two conditions of applying and violating five principles of reducing extraneous processing on language learners’ listening and reading comprehension development. The results revealed that condition 1 was significantly effective in the development of both listening and reading comprehension.

As shown in the reviewed studies, while our understanding of the cognitive processes of listening comprehension has thoroughly changed as a result of CLT, there is still a gap in the literature to identify the benefits of incorporating multimedia into L2 listening instruction, particularly when extensive listening is concerned. Given the association between listening practice, multimodal oral input, and comprehension, it is thus hypothesized that doing multimodal listening homework would lead to better listening comprehension. Therefore, the first hypothesis of this study is as follows:

H1: Multimodal listening homework impacts L2 listening comprehension.

Listening Comprehension, CL, and Multimodality

Based on CLT, L2 listening as biologically secondary knowledge requires careful instructional design to manage its CL or “the load imposed on the working memory during performance of a cognitive task” (Chen et al., 2016, p. 4). Three types of CL may be generated in the process of doing a mental task (Sweller et al., 2011). Intrinsic load (IL) is created when the task is inherently difficult and the WM is expected to hold many new elements in its storage for processing. This type of CL is generally high in L2 listening because listeners are expected to process many elements simultaneously, from linguistic features and the activation of prior knowledge to the deployment of listening skills and strategies. Extraneous load (EL) is caused by instructional design and arises when the mental task encompasses redundant information that occupies the capacity of the WM. It is suggested that multimodal input can spread the load across the two channels of the WM and thus decrease the EL of listening comprehension (Lee & Mayer, 2015). The germane load (GL) is the effort required to process information into the LTM by making or modifying schemata. Pre-training on the topic of the listening task and activating enough prior knowledge for doing a mental activity would foster the GL of listening comprehension (Mayer, 2014).

Drawing on CLT, a few studies have surveyed the impact of multimodality on listening comprehension and its associated CL in the same project. Chang et al. (2011), for instance, surveyed the impact of multimodal input on English listening comprehension, the CL, and learning attitude in a ubiquitous learning condition among participants with different levels of language proficiency. The findings showed that both proficiency groups had higher listening comprehension in the multimodal listening conditions, while low-proficiency learners in the multimodal condition experienced less EL in comparison to learners in the single-mode condition. Similarly, Inceçay (2012) investigated the effects of delivery mode (audio only, audio-video, audio-video plus subtitles, audio plus PowerPoint presentation) on listening comprehension and CL by a mixed-methods approach. The result revealed that the audio-video plus subtitles condition led to lower listening scores and higher confusion and anxiety.

Yang (2014) investigated the effect of display modes, English proficiency, and cognitive preference on EFL listening comprehension and its CL. The result showed that dual coding improved listening comprehension and reduced the CL regardless of the participants’ cognitive styles and level of English. In the same vein, Chang et al. (2014) investigated the impact of media delivery mode (sound + text versus sound) on English listening comprehension and CL. Their results showed that text significantly enhanced English listening comprehension and lowered CL, and an inverse correlation between English listening comprehension and CL was observed. In a recent study, Karabıyık et al. (2022) compared EFL listening comprehension and CL in four listening conditions: audio, audio with video, captions with video, and audio with video and captions. The results indicated that the audio-with-video condition helped students to have higher comprehension and less difficulty and CL in comparison to the audio-only condition.

In sum, these few studies signal the possible effect of multimodal input on reducing the difficulty of L2 listening by managing its CL through the lens of CLT and CTML. Capitalizing on theoretical underpinnings and to fill the gap in the existing research, it can be assumed that practicing L2 listening with multimedia would assist learners in managing the CL of the listening task. Therefore, the second hypothesis of this study is as follows:

H2: Multimodal listening homework impacts the cognitive load of L2 listening comprehension.

Listening Comprehension and Practice

Listening comprehension requires listeners to take an active role in acquiring and using a range of listening skills so that they can operate successfully in different communication situations. EFL learners acquire and improve listening skills through frequent practice and prolonged experience until they reach a satisfactory degree of automaticity in using them (Goh, 2014). In essence, all approaches to teaching listening stress the role of practice in listening instruction; however, their definitions of practice differ fundamentally. While practice was characterized as drill and practice in the 1950s, it was typified by meaningful and communicative activities in the 1980s and by strategy awareness activities in the 1990s (Goh, 2008).

Because of the educational value of listening activities, language pedagogues have proposed some classifications of listening practices to be used by teachers during instruction. Wilson (2008) offers nine post-listening techniques for practicing and reflecting on what the students have processed in the while-listening phase. Rost (2011) outlines six types of listening practices to prepare the ground for understanding and gleaning more from the input. He, however, gives priority to those types of practices that offer class engagement. Goh (2014) lists eight one-way and four two-way listening tasks that can be done individually, in pairs, or in groups that are often used for in-class practice. Brown and Lee (2015) enumerate a list of 41 activities organized into bottom-up, top-down, and interactive exercises targeting three groups of language learners, advanced, intermediate, and basic. They also specify six performance techniques to be used in class for teaching listening.

The effectiveness of some of these techniques framed in listening assignments on the development of listening comprehension has been put to the test by empirical studies. Fahim and Heidari (2005) reported a non-significant but positive effect of using the concept-map technique as a post-listening practice on EFL learners’ development of listening comprehension. Onoda (2012) reported that using QuickListens (to listen to or watch self-selected materials for 30 min daily) as extensive listening assignments had a significant impact on the development of listening skills and self-confidence among EFL learners. Afsharrad and Nafchi (2015) used transcribing exercises as an input enhancement technique and reported a positive effect on basic learners’ EFL listening ability. Sunyoung (2017) documented a significant effect of using reading aloud of the listening texts as weekly assignments on language learners’ listening comprehension. Nhat (2021) showed that weekly listening assignments focusing on bottom-up activities had significant effects on the development of EFL listening comprehension and autonomy. In a recent study, Aljuhani (2022) reported a significant effect of note-taking on EFL learners’ listening comprehension and their positive perceptions of the usefulness of the technique in helping them to improve their listening.

The synopsis of the findings of the past literature supports the value of practice in developing listening skills, the role of multimodal input in teaching and learning listening skills, and the benefit of multimodality for comprehension by exploiting the capacity of WM and thus managing task CL. It is yet unclear if the combination of these conditions may lead to L2 learners’ optimum performance in listening tasks. Further, concurrent analysis of comprehension and mental effort required to process oral input while doing listening assignments needs further investigation. The goal of this study thus is threefold:

    Comparing the effects of two homework conditions (multimodal vs. single mode) on the development of L2 listening comprehension,

    Comparing the effects of two homework conditions (multimodal vs. single mode) on L2 listening CL, and

    Probing into the perceptions of the participants of the multimodal homework condition regarding their experience with and the educational value of multimodal listening homework.



Fifty-eight EFL learners participated in this study and were organized into the experimental and control groups. The sample included female students who ranged in age between 14 and 15. The participants were comparable regarding their years of English learning, age, and mother tongue. They were studying at grade 9 in a public junior high school in a suburban area of the capital Tehran in the academic year 2022–2023. The sample was randomly assigned into an experimental (n = 29) and a control group (n = 29).

Based on the EFL curriculum of Iran, junior high school students should have foundation English proficiency (A1–A2) on the Common European Framework of Reference (CEFR) (Sabeghi & Rahimi, 2024). All participants took the A2 Key test before and after the experiment, and their entry and exit English proficiency was measured. The participants did not have any experience in working with multimedia content in their English classes, because the EFL coursebooks of the junior secondary program (K7–9), known as the Prospect Series, do not have any software program or video clips as supplementary materials. The Prospect Series includes the student book, the workbook, and the audio files of the content.


A2 Key Test

A2 Key is the first-level test of the Cambridge English Qualifications, followed by B1 Preliminary, and assesses whether a candidate has achieved a good foundation in learning English. For this study, the listening paper of A2 Key was used before and after the study to detect any change in the participants’ listening comprehension. The listening paper has 5 parts and 25 questions of different types such as multiple choice, gap fill, and matching. Reliability coefficients of 0.79 and 0.84 were calculated for the A2 Key listening paper pretest and posttest, respectively.

Cognitive Load Scale (CLS)

To assess students’ CL, the cognitive load scale (CLS) (Klepsch et al., 2017) was used. CLS is a practical and reliable domain-unspecific measure of CL that can be used in various learning situations. It is preferable to other scales of CL such as NASA TLX because it differentiates three aspects of the CL and is designed for naïve raters who do not have explicit knowledge of the concepts of CLT (Klepsch et al., 2017). The scale has eight items rated on a 7-point Likert scale from 1 = very low to 7 = very high that assess three types of CL:

  • IL (two items), e.g., for this task, many things needed to be kept in mind simultaneously.

  • EL (three items), e.g., the design of this task was very inconvenient for learning.

  • GL (three items), e.g., I made an effort, not only to understand several details but to understand the overall context.

The scale has been reported to have satisfactory reliability indices by the developers. The Persian version of this scale has also been reported to have good reliability and validity (Zahed et al., 2012). The reliability indices of the three components of the scale in this study were estimated to be 0.82, 0.80, and 0.81, respectively.
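As an illustration of how the eight CLS ratings can be turned into the three load scores, the following minimal Python sketch averages the items of each subscale. The item order (items 1 and 2 for IL, 3 to 5 for EL, 6 to 8 for GL), the use of subscale means rather than sums, and the example ratings are assumptions made for illustration; they are not taken from the scale documentation or from the data of this study.

```python
from statistics import mean

# Assumed item order (hypothetical): items 1-2 = IL, 3-5 = EL, 6-8 = GL.
# Indices are zero-based positions in the list of ratings.
SUBSCALES = {"IL": (0, 1), "EL": (2, 3, 4), "GL": (5, 6, 7)}


def score_cls(responses):
    """Return the mean rating per load type for one completed 8-item scale."""
    if len(responses) != 8:
        raise ValueError("expected 8 item ratings")
    if any(not 1 <= r <= 7 for r in responses):
        raise ValueError("ratings must lie between 1 and 7")
    return {
        name: mean(responses[i] for i in items)
        for name, items in SUBSCALES.items()
    }


# Hypothetical ratings from one learner, not data from the study.
example = [5, 6, 2, 3, 1, 6, 7, 5]
scores = score_cls(example)
print(scores)
```

Keeping the three scores separate, rather than collapsing them into one total, preserves the distinction between IL, EL, and GL that motivated the choice of this scale.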

Listening Assignments

Two sets of homework assignments were prepared for the study: MLH and single mode.

  • Multimodal listening homework: Multimodal homework consisted of an animated video listening task and a three-part worksheet. The students were to listen, watch, and possibly read some texts in the videos and then do the exercises. Six animated videos with a mean length of about 2 minutes on the theme of the conversations of the textbook were selected from YouTube.

Bloom’s taxonomy was applied to design the worksheets for assessing students’ cognition in six layers of remembering, understanding, applying, analyzing, evaluating, and creating. The activities were organized into six parts: three parts under lower-order thinking skills (remembering, understanding, and applying) and three parts under higher-order thinking skills (analyzing, evaluating, and creating). The CLS was placed at the end of each worksheet so that the students could rate the load of the assignment they had just completed.

  • Single-mode homework: Single-mode homework consisted of the audio-only listening assignment. The students were to listen to the audio and complete the worksheet. The content of the audio tracks and the worksheets was the same as that of the MLH.

Open-Ended Questionnaire

The multimodal homework group was asked to complete an open-ended questionnaire and express their perceptions of the experience at the end of the study. There were five questions regarding the development of listening comprehension, learning the content of the book, the value of multimodal input, interest in language learning, and the experienced challenges of MLH assignments.


Multiple steps were taken to perform this study. They are explained below in detail.

    Pretest: Both the experimental and control groups took part in the A2 Key test and CLS before the study.

    Homework preparation: To choose suitable content for listening homework assignments, first, a pool of animated videos for each topic was made (18 videos). Two English teachers were asked to evaluate the videos of each theme separately, considering their general and pedagogical features, using the Educational Multimedia Evaluation Checklist (EMEC). Six videos with the highest scores on the evaluation checklist were selected, and their worksheets were prepared. The videos were used for the experimental group without any further adaptation or modification. The audios of the clips were also saved in MP3 format, and they were used for the control group’s homework.

    Instruction: The main listening section of each lesson, that is, the conversation, was taught based on a three-phase cycle of pre-during-post listening. One homework assignment was given to students as a part of the post-listening activity to be done at home. The second homework assignment was used as a complementary activity in the fluency section. The students had listening homework assignments every other week. The homework conditions for the two groups of participants were different, as Group 1 did multimodal homework and Group 2 did audio-only homework. The experiment lasted for one semester (four and a half months) because the academic year in Iran consists of two separate semesters, and students’ achievements are assessed independently in the final exams of each semester. The length of the instruction, the textbook, the workbook, the homework sheets, and the teacher of both groups were the same.

    Posttest: At the end of the experiment, both groups took part in the A2 Key test and CLS again.

    Open-ended questionnaire: The MLH group was asked to complete an open-ended questionnaire and express their perceptions of the experience.


The current research utilized an explanatory sequential mixed-methods design that begins with quantitative data collection and analysis and is then followed up by qualitative data gathering and analysis. The quantitative phase used an experimental design to carry out the study “in an objective and controlled fashion so that precision is maximized and specific conclusions can be drawn regarding a hypothesis statement” (Bell, 2009, p. 672). As the study aimed to establish the effect of the independent variable (multimodal vs. single-mode listening homework) on the dependent variables (listening comprehension and its CL), the pretest-posttest control group approach was used. The participants were divided into two groups randomly, and their listening comprehension and its associated CL were examined before and after the instruction.

The quantitative study was followed by a qualitative content analysis to be able to systematically analyze the data gained from the open-ended questionnaire to gain “insights into inner processes of learning and development” (Mayring, 2023, p. 322). At the end of the study, interpretations of the gathered data were discussed, and conclusions were reached. The research procedure is depicted in Fig. 1.

Fig. 1 Research procedure



To ascertain the homogeneity of the participants concerning their listening proficiency and its associated CL, the one-way multivariate analysis of variance (MANOVA) was used. As Table 1 illustrates, no significant differences were detected in the groups’ listening comprehension and its CL before the experiment.

Table 1 The results of multivariate tests on A2 Key and CLS pre-test scores across groups

Effect of MLH on Listening Comprehension

To examine the modality effect of listening homework on the development of listening comprehension, MANOVA was utilized. Initial data analysis showed a general difference between the two groups’ listening comprehension at the end of the intervention [Wilks’ Lambda = 0.775; F (5, 52) = 3.015; p = .018 < .05; ηp² = 0.225]. The large effect size (ηp² = 0.225 > 0.16), based on Cohen’s guideline (1988), indicates that the intervention had a considerable effect on the participants’ listening comprehension. Further, the results of tests of between-subjects effects show that the MLH group did better in Parts 3 and 4 of the A2 Key posttest (Table 2).

Table 2 Tests of between-subjects effects

Effect of MLH on Listening CL

To examine the impact of MLH on listening CL, another MANOVA was used. A general difference between the two groups’ CL of listening was detected at the end of the experiment [Wilks’ Lambda = 0.819; F (3, 54) = 3.976; p = .012 < .05; ηp² = 0.181]. The large effect size (ηp² = 0.181 > 0.16), based on Cohen’s guideline (1988), shows that the intervention had a considerable effect on how the participants managed the CL of listening comprehension. The results of tests of between-subjects effects (Table 3) showed that the difference between the CL perceived by the two groups can be attributed to the rise of GL in the multimodal homework condition.
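The link between the reported Wilks’ Lambda values, the partial eta squared values, and the F statistics can be checked with a short calculation. The Python sketch below assumes a one-way MANOVA with exactly two groups, in which case partial eta squared equals 1 − Λ and the F statistic is exact, with df1 equal to the number of dependent variables and df2 = N − df1 − 1. The function name and the reading of the dependent variables as the five parts of the A2 Key listening paper and the three load types are assumptions inferred from the reported degrees of freedom.

```python
def wilks_two_group(wilks_lambda, n_total, n_dvs):
    """Partial eta squared and exact F for a two-group one-way MANOVA.

    With two groups the hypothesis has one degree of freedom, so
    partial eta squared reduces to 1 - Lambda and the F conversion
    from Wilks' Lambda is exact rather than approximate.
    """
    df1 = n_dvs
    df2 = n_total - n_dvs - 1
    eta_p2 = 1.0 - wilks_lambda
    f_value = (eta_p2 / wilks_lambda) * (df2 / df1)
    return eta_p2, f_value, df1, df2


# Values reported in the text: N = 58, Lambda = 0.775 (five test parts)
# and Lambda = 0.819 (three load types).
comprehension = wilks_two_group(0.775, n_total=58, n_dvs=5)
load = wilks_two_group(0.819, n_total=58, n_dvs=3)

for label, (eta_p2, f_value, df1, df2) in (
    ("comprehension", comprehension),
    ("cognitive load", load),
):
    print(f"{label}: eta_p2 = {eta_p2:.3f}, F({df1}, {df2}) = {f_value:.3f}")
```

Under these assumptions the calculation returns the reported effect sizes and degrees of freedom, and F values of about 3.02 and 3.98; the small gaps from the reported 3.015 and 3.976 are consistent with Λ having been rounded to three decimals.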

Table 3 Tests of between-subjects effects

Further, the comparison between the GLs of different parts of the A2 Key test across groups shows that the GL reported by the MLH group is higher in all five parts (Fig. 2).

Fig. 2 GLs of five parts of the A2 Key test across groups

Perceptions of MLH

The data from the open-ended questionnaire were analyzed both manually and with the qualitative data analysis software NVivo 12 Plus. The coding was done in three steps based on Saldaña’s guidelines (2016): pre-coding, first-cycle coding, and second-cycle coding.

In the pre-coding phase, both researchers read and reread the texts carefully several times, focusing on the major and minor points the respondents mentioned. In the first-cycle coding, the two researchers went through the texts and coded the data individually. The researchers’ codes were then compared, and disagreements or differences were discussed. The intercoder Cohen’s kappa was calculated to ascertain the coding reliability (κ = 0.91). In the second-cycle coding, the data were entered into NVivo 12 Plus to organize the classification of themes and subthemes based on the first-cycle coding (Table 4). The computer-assisted and manual codes were compared again, and conclusions were reached.
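For readers unfamiliar with the intercoder statistic, the following minimal Python sketch shows how Cohen’s kappa corrects the observed agreement between two coders for the agreement expected by chance. The function name, the theme labels, and the ten coded segments are hypothetical and serve only to illustrate the calculation; they are not the codes or data of this study.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same segments."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set")
    n = len(coder_a)
    # Proportion of segments on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for ten response segments (illustration only).
coder_a = ["cognition", "cognition", "motivation", "motivation", "cognition",
           "motivation", "cognition", "motivation", "cognition", "motivation"]
coder_b = ["cognition", "cognition", "motivation", "motivation", "cognition",
           "motivation", "cognition", "motivation", "cognition", "cognition"]

kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 2))
```

In this hypothetical example the coders agree on nine of ten segments, but because half of that agreement would be expected by chance, kappa is lower than the raw 90% agreement.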

Table 4 Main themes and subthemes and their meanings

The Effect of MLH on Listening Comprehension

The first question of the open-ended questionnaire was as follows: “What was the effect of watching animated videos while doing the listening homework on improving your English listening comprehension?” Most students’ responses to this question show that they felt their listening comprehension improved as a result of multimodal homework. This improvement is evident in two aspects: cognition and motivation (Table 5).

Table 5 The effect of MLH on listening comprehension

Considering the cognitive domain, the students were aware of the benefit of MLH for understanding the linguistic features (vocabulary, grammar, pronunciation) of the listening input as well as its overall message and gist. In other words, they believed that the MLH assignments assisted them in both bottom-up and top-down information processing. The most frequently mentioned aspect of bottom-up processing was “words,” meaning that animated videos improved comprehension because they helped students detect or guess unknown words and remember known ones. The students also believed that the tasks helped them better understand the content and topics of the videos and the lessons of their books by familiarizing them with the general theme of the listening task.

Regarding the second aspect, motivation, most students reported that these tasks improved their listening because they became more interested in listening or because their negative attitudes towards English listening declined.

The Effect of MLH on Learning the Content of the Textbook

The second question of the open-ended questionnaire was: “To what extent does watching animated videos while doing the listening homework help you remember the content of your textbook?” Almost all students believed that multimodal homework impacted the learning of the content of their textbook by helping them review and practice the taught words and grammatical structures.

They thought that this type of homework not only helped them review the taught materials but also helped those who had been absent from class. They attributed this benefit of multimodal homework to its audiovisual nature: the multimodal input helped them remember the taught words and structures because they watched the images and illustrations and listened to the narration at the same time (Table 6).

Table 6 The effect of MLH on learning the content of the textbook

In their opinion, MLH was a more advantageous form of listening practice than listening to audio alone because it extends class time and gives students more opportunities to work on their listening skills outside the classroom. The reasons they mentioned for their sustained effort in practicing listening include the intriguing nature of the clips, their suitable difficulty level, and the way the MLH supports the content of their textbook.

Homework Type Preference

The third question of the open-ended questionnaire was as follows: “If you compare this experience with your last year’s listening homework that was done just by audio files, which homework type do you prefer, audio-only or multimedia? Why?” The majority of the respondents asserted that they prefer MLH to audio-only tasks. Their main reasons were the role that the audiovisual features of this type of homework played in their learning and in engaging them in the listening tasks. They also believed that MLH extends their memory, assists them in doing the listening tasks, and helps them learn English better. Only one student preferred audio-only homework, and two students believed that both types of tasks are useful and beneficial (Table 7).

Table 7 Homework modality preference

Challenges of MLH

The fourth question of the open-ended questionnaire was as follows: “Did you have any problems while doing multimedia homework?” Most students said that they did not have any problem with MLH. A few said that they had some problems with understanding the pronunciation of some words or the speakers’ accents. Yet, they deployed listening strategies such as inferencing to guess the words they did not know or understand by using the contextual clues provided by the pictures. They also had difficulty with the rate of speech, but they managed to tackle this problem by watching the clips several times. Despite these challenges, most students still preferred this type of homework to traditional listening homework assignments (Table 8).

Table 8 Challenges of MLH

Interest in Learning English

The fifth question of the open-ended questionnaire was as follows: “After doing this type of homework, do you feel you like learning English more than before?” Almost all students said that their interest in English had increased. In their opinion, this type of homework affected both their learning ability and their interest.

As for learning, they think retention of the materials, communication skills, curiosity, and linking the new content to their background knowledge have improved through this type of homework. They also believed that MLH has helped them develop other language skills, particularly speaking.

Most students reported being more interested in English, as their learning motivation had increased and their negative attitudes towards English had declined. They asserted that they like English more now, are interested in spending more time learning English, and are more willing to do English tasks and activities (Table 9).

Table 9 MLH group’s interest in learning English


The current study probed into the effect of multimodal listening homework on EFL learners’ listening comprehension and CL by gathering and analyzing both quantitative and qualitative data. The results of the quantitative data analysis primarily revealed that multimodal homework assignments improved listening comprehension by the end of a one-semester intervention. This finding needs to be examined and interpreted from different aspects. First and foremost, the outcome of the study corroborates the cognitive model of listening in the sense that selecting, organizing, and integrating multimodal input (Imhof, 2010) when WM capacity is optimized leads to better understanding, as WM resources are freed and allocated to the listening task. As the participants’ perceptions of the experience show, the combination of visuals with texts and narrations helped the materials stick in their minds longer (Tables 6 and 7). In support of the implementation of CTML principles in L2 listening instruction, these findings are in alignment with the few empirical studies that reported a positive effect of multimedia content in boosting comprehension of the oral input (e.g., Karabıyık et al., 2022; Sayyadi et al., 2022; Yang, 2014).

Second, listeners can process the oral input more efficiently when they have access to visual aids during listening (Becker & Sturm, 2017) because they have more contextual clues (Shaojie et al., 2022) and thus build referential connections between verbal and nonverbal information (Paivio, 2007). Multimodal input provided the listeners with complementary information about the situation and aided them in identifying the speakers, communication goals, and setting (Rost, 2011). This is backed by further statistical analyses in this study, as the multimedia condition supported the development of listening comprehension particularly for the contextualized dialogues and monologues in Parts 3 and 4 of the listening test. In other words, as supported by other studies (Shirazi & Rahimi, 2023), practicing listening with multimedia can improve students’ imagery repertoire and help them develop a mental image of the topics to understand the gist of meaning more efficiently. This is also reflected in the participants’ responses to the open-ended questionnaire, as they believed that the content was easily understood through watching the images and pictures while synchronizing them with what was being said (Table 5). Similar to a limited number of studies, this outcome casts doubt on the appropriacy of the redundancy principle of multimedia in L2 learning, as integrating redundant text and visuals into the listening tasks did not hinder comprehension and instead helped L2 learners better understand the meaning of the words and the content of the oral materials (e.g., Lee & Mayer, 2014).

More importantly, the outcome of the study uncovers the potential of multimedia input for listening practice and shows that multimedia is advantageous both for listening instruction in the classroom and for extensive listening outside the class to reinforce listening skills and strategies. There are plenty of listening skills and strategies that language learners are required to master to become successful in L2 communication. The development of listening skills needs extensive instruction and practice, and the limited time of language classes often forces teachers to spend the time on skills other than listening. As shown in this study, MLH functioned as a valuable source of comprehensible input and assisted students in developing their listening in line with the objectives of the instruction. Out-of-class listening tasks make students more self-regulated and let them plan their listening, monitor their comprehension, and assess strategy deployment independent of the class setting (Zeng & Goh, 2018). Despite language teachers’ and learners’ negative attitudes towards the educational benefits of listening homework (Amiryousefi, 2016; Wallinger, 1997), as shown here, well-designed assignments can meet the needs of the students and provide an opportunity for students’ academic success (Costa et al., 2016). This is also documented in the MLH group’s perceptions of multimodal homework: the students stated that doing MLH was more interesting and enjoyable than the conventional audio-only homework they had experienced in their English classes before. They also liked the video clips and found them useful for reviewing and practicing their lessons and developing other language skills. Therefore, they were motivated to do their listening assignments even if the assignments were difficult or they had missed the class session of that lesson (Tables 6, 7, 8, and 9).

Further, the findings of the study showed that multimodal homework led to an increase in the GL of listening comprehension, which supports better comprehension (Kolfschoten et al., 2010). One possible reason for this finding is that GL directed more cognitive resources to the mental task of listening in the multimodal homework condition by activating prior knowledge and filling the gap of topic unfamiliarity. When GL increases as a result of expertise in a topic/subject, the resources from extraneous processes are redistributed to deal with IL (Endres et al., 2023), and thus, the tension of the mental task is relieved. The lower pressure on the listeners in processing the oral information leads to more engagement in the task, less anxiety and fear, and thus higher motivation to do the task (Debue & van de Leemput, 2014). The finding also aligns with the fact that enhancing GL needs practice that reinforces self-explanation, as higher GL is reported by learners who are highly engaged in the learning activities, can make a link between what is known and the new information, and do the activities that boost memory (Klepsch et al., 2017). In this study, as reflected in the qualitative data, multimodal homework assignments efficiently fulfilled these conditions because they focused on both lower and higher cognitive skills (bottom-up and top-down) and required students to watch the animated videos carefully and answer the questions (Table 5).

The students’ responses to the open-ended questionnaire indicated that multimodal input has the potential to support the success of L2 listeners in all three components of listening: linguistic, cognitive, and affective (Karalık & Merç, 2019). In their view, the multimodal condition optimized their cognitive processes, and both bottom-up and top-down information processing played a large part in the development of their comprehension as a result of this practice condition. As for bottom-up processing, linguistic features, and words in particular, were reported to play a pivotal role in increasing comprehension of the oral texts. This outcome, in line with previous research, underscores the role of bottom-up processing in the development of listening skills (Brown & Lee, 2015) and particularly the role of vocabulary knowledge as a correlate of L2 listening comprehension (Karalık & Merç, 2019). It also validates the positive effect of multimedia and animated videos on increasing incidental vocabulary learning (Arifani, 2022). As for top-down processing, the multimodal homework helped the participants expand their knowledge of the topic and understand the content better. Top-down processing is associated with the learners’ ability to use their previous experiences to understand the meaning of the overall text (Utomo & Sulistyowati, 2022), and multimedia suitably provides such a condition. This corroborates what the quantitative data showed regarding the higher GL perceived by MLH participants at the end of the experiment (Table 3), the higher GL of different parts of the A2 Key (Fig. 2), and their better performance in the listening posttest (Table 2).

It was also found that multimodal homework had a considerable impact on students’ learning of the content of their textbook by providing them with review and practice opportunities. Simultaneous watching of the video and listening to the audio supported word retention and enhanced guessing of the meaning from the context. This outcome first and foremost supports the value of homework in reinforcing in-class learning and inspiring students to learn (Blazer, 2009) as well as aiding them in the development of study skills and habits (Bempechat, 2004). Second, the frequent use of the word ‘review’ in the responses accentuates the need of EFL learners, particularly at the beginning levels, for extensive out-of-class listening activities that let them set their own pace of learning. There are abundant studies on the benefits of instructional videos for personalized learning in the form of recorded class lectures (Zarrinfard et al., 2021), and as the findings of this study show, multimedia affords that same potential when it functions as an assignment. This outcome is supported by the participants’ higher GL at the end of the experiment (Table 3), which is indicative of the students’ engagement in active learning through appropriate integration of the selected audiovisual data into their LTM.

The students’ affective side was also influenced by the multimodal condition: during the experiment they were more motivated to do the assignments, felt less anxious while processing the input, and were more engaged in the listening activities. One possible reason for these positive emotions is the tremendous potential of technology-enhanced learning environments to raise learning motivation (Wei, 2022), positive attitudes to learning (Wesely & Plummer, 2021), self-efficacy (Zhang, 2022), and active learning and engagement (Schindler et al., 2017). Another reason can be related to the type of multimedia that was used in preparing the homework, i.e., animated videos. Animated videos are often used for young language learners or children, and they are considered to be very interesting, engaging, and motivating multimedia content (Laksmi et al., 2021).

The only problems that the participants had with the multimedia homework were understanding some new words and the pronunciation of the narrators. Both problems are expected to arise when students face authentic content. Fortunately, the students themselves found ways to resolve their issues with unknown words, as they used the inferencing strategy to guess the meaning from the context. As for pronunciation, more instruction and practice are required to enhance students’ pronunciation, as both teacher support (Cengiz, 2023) and practice (Celce-Murcia et al., 2010) contribute significantly to the development of L2 pronunciation.


The gap in the research on the integration of multimodal input into the practice phase of listening instruction inspired the researchers of the current study to examine the impact of multimodal listening homework on EFL learners’ listening comprehension and its associated CL. The study used a mixed-methods approach, collecting listening test scores, self-reported assessments of CL, and perceptions of the experience in response to an open-ended questionnaire. The outcome displayed significant improvement in the participants’ listening comprehension and a rise in the listening GL after a one-semester intervention. Both IL and EL declined from the pretest to the posttest; however, the differences were not statistically significant. The students’ perceptions of the experience reveal the potential of multimodal listening practice to impact all three components of listening, i.e., linguistic, cognitive, and affective.

Undoubtedly, the limitations the researchers encountered in the course of the study influenced the interpretation of the results. Due to time and financial resource limitations, the animated videos were not designed and developed by the researchers, although the researchers made every effort to select the most appropriate ready-made videos. This may have affected the participants’ full understanding of the content of the videos. Further, the participants of the study were limited to girls studying in public schools at the basic level of language proficiency, and because of bureaucratic procedures, experimentation in other schools was not possible. This limits the generalizability of the findings to the wider student population, as boys did not participate in the study and gender was not a mediating variable in this study.

It is suggested that future research compare the impact of teacher-made and ready-made multimedia in designing multimodal listening assignments. Also, cross-comparison of the performance of participants with different proficiency levels and genders in multimedia homework conditions is recommended. Follow-up research may utilize neuroscientific devices such as eye trackers and EEG to shed more light on psycholinguistic aspects of practicing L2 listening in multimodal conditions.

The study makes several contributions to the literature on multimedia learning. First, as the findings showed, the use of multimedia for practice and exercises is as valuable as its use in in-class instruction. Second, multimodal input does not necessarily impact all types of CL substantially, and the issue of CL should be reconsidered in different contexts of teaching and learning. Last, multimodal input not only has great potential for enhancing learning outcomes but also can affect language learning attitudes and motivation significantly.

The study has practical implications for EFL teachers, the Ministry of Education (MOE), and EFL material developers. EFL teachers are advised to incorporate multimedia input into the teaching of language skills, particularly oral skills that are challenging for students, and to assist learners through scaffolding and constructive feedback. The MOE needs to redefine the IT policies and practices of mainstream education and pave the way for designing and using technologies in instruction. This demands focused attention to empowering in-service teachers through workshops as well as integrating TPACK courses into the curricula of teacher education universities. EFL material developers are encouraged to integrate technology into the practice phase of instruction to enhance language learners’ self-regulatory skills and self-directedness.