1 Introduction

Educational animation, a pedagogical tool amalgamating audio-visual elements, exhibits significant appeal and instructional potential in contemporary education. It is extensively employed across diverse learning environments to foster active student engagement and facilitate profound comprehension (Krieglstein et al., 2023), particularly emerging as a pivotal instrument in science education and related domains (Barak et al., 2011; Castro-Alonso et al., 2019). Concurrently, to enhance students' learning outcomes, textual cues are commonly integrated into educational animations to direct attention (Arslan-Ari et al., 2020; Lowe & Schnotz, 2014). However, despite the evident benefits of textual cues for learning experiences and comprehension, some studies have found that inadequately designed textual cues may diminish instructional effectiveness by increasing cognitive load (Alpizar et al., 2020; Castro-Alonso et al., 2018; De Koning & Jarodzka, 2017).

To optimize future design and teaching practices in educational animation, thereby enhancing student learning outcomes and experiences, this study applied Kintsch's (1998) construction-integration model theory to develop three types of textual cues: subtitle textual cue (STC), keyword textual cue (KTC), and structured textual cue (CTC). The study specifically focuses on digital natives born after 2010, whose interaction and cognition with multimedia resources may differ significantly from previous generations (Tarchi et al., 2021). Employing a quasi-experimental approach, the study investigated how these different textual cues impact students' learning outcomes, including achievement and knowledge retention. Additionally, the study examined the contributions of educational animations with these textual cues on students' cognitive load and self-efficacy, while exploring the relationships among achievement, knowledge retention, self-efficacy, and cognitive load.

2 Literature review

2.1 Educational animation

Educational animation is a form of animated media designed specifically to teach or convey educational content. It utilizes visual storytelling, motion graphics, and often sound to simplify and illustrate complex concepts, making them easier to understand and more engaging for learners (Krieglstein et al., 2023). Educational animations have five key characteristics, as summarized by Ploetzner and Lowe (2012) through a systematic literature review: (1) The technological platform for presenting animation is an electronic device; (2) Educational animation aims to provide explicit explanations of the entities, structures, and processes involved in the subject to be learned; (3) Educational animation consists of entities created manually through drawing or other modeling techniques; (4) Educational animation mimics the process of a series of different images to be learned. With advancements in technology, the use of educational animation in technology-enhanced learning environments has become more prevalent and easier to implement (Castro-Alonso et al., 2019; Yilmaz, 2023). It helps students understand abstract concepts through dynamic visual presentations (Barak et al., 2011). Numerous studies highlight the significant potential of educational animation compared to other instructional media. For example, Ploetzner et al. (2021) found that educational animation is more beneficial than static visualization for supporting dynamic perceptual learning and the acquisition of kinematic models. In the field of science education, comprehensive educational practices have been developed by many scholars (Arslan-Ari et al., 2020). Türkay (2016) demonstrated that educational animations in physics classrooms positively affect students' knowledge retention and classroom engagement. Karlsson (2010) found that educational animations can engage students in collaborative problem-solving, leading to meaningful construction of scientific knowledge. Consequently, the use of educational animation in science learning is increasingly recommended by researchers (Tosun, 2022).

2.2 Cognitive load in educational animation

While the use of educational animation in pedagogy has gained widespread acceptance, some studies express concerns regarding its potential to impose higher cognitive loads (Castro-Alonso et al., 2018; Krieglstein et al., 2023). Cognitive load refers to the mental workload imposed on an individual's cognitive system during task processing or information acquisition. According to Paas and Van Merriënboer (1994), cognitive load can be divided into two components: mental load and mental effort. Mental load pertains to the demands placed on the cognitive system by the task itself. It is influenced by the complexity of the content and the manner in which information is presented. Mental effort refers to the amount of cognitive capacity that an individual allocates to meet the demands of the task. According to the Cognitive Theory of Multimedia Learning (Mayer, 2024), students watching educational animations must rapidly process information from both visual and auditory channels within a short period. They also need to integrate new information with prior knowledge and keep relevant information active in working memory (Mayer & Pilegard, 2014), all of which contribute to increased cognitive load. Research on cognitive load in educational animations highlights the need for well-designed animations. For example, poorly designed educational animations can overwhelm students' limited working memory capacity, leading to cognitive overload and hindering learning outcomes (De Koning & Jarodzka, 2017). In response, instructional designers must consider the constraints of cognition to optimize the use of educational animations. Consequently, there is a growing call in the scholarly community to enhance technological elements in educational animations and combine novel theoretical frameworks to reduce cognitive load and amplify the educational benefits (Arslan-Ari et al., 2020; Lowe & Schnotz, 2014).

2.3 Using textual cues in educational animations

The VARK theory highlights the diversity in learning styles, emphasizing the importance of offering multiple avenues for information delivery to meet the needs of all learners (Nguyen et al., 2024). This theory supports the shift from relying solely on visual and auditory stimuli in educational animations to integrating textual cues, representing a significant advancement in instructional design. By presenting textual information alongside animation, educators can create a multimodal learning environment that caters to diverse learning preferences and optimizes cognitive processing (Yilmaz, 2023). Textual cues act as guiding markers within the animation, directing students' attention to key instructional points and facilitating learning achievement (Clark & Mayer, 2016; Mayer & Pilegard, 2014; Wang et al., 2020). Textual cues help students integrate corresponding information from the different external representations into a coherent mental representation, thereby reducing their cognitive load (Richter et al., 2018). Additionally, textual cues can improve the accuracy of knowledge retention (Rop et al., 2018). However, not all textual cues lead to better learning outcomes. Some researchers have demonstrated that inappropriate textual cues can result in a heavy cognitive load for students, negatively affecting their achievement and knowledge retention (Alpizar et al., 2020; Arslan-Ari et al., 2020). Berney and Betrancourt (2016) concluded through a meta-analysis that the pedagogical value of educational animations is affected by the manner in which accompanying textual information is presented. This suggests that textual cues in educational animations need to be carefully designed to be impactful and engaging, ensuring they enhance comprehension and retention.

2.4 Construction-integration model

In the realm of text information type and design, the Construction-Integration Model (CIM) posited by Kintsch (1998) serves as an important theoretical framework within cognitive science, elucidating the mechanisms underlying information processing during textual comprehension. The model emphasizes that readers' understanding and retention of text involve hierarchical representation and processing stages to construct coherent meaning, highlighting the dynamic and interactive nature of this construction process. CIM categorizes text understanding aimed at cognitive construction into three levels of representation: surface level, textbase level, and situation model level.

The surface level involves the decoding of textual information in the flow of discourse, akin to the supportive role played by subtitles in educational animations (Tarchi et al., 2021). Subtitled text information in educational animations refers to written language that appears synchronously with the animated content, providing viewers with concurrent written language cues while engaging with dynamic visual imagery (e.g., Clark & Mayer, 2016; Matthew, 2020). The textbase level corresponds to the conceptual and propositional content of the text's meaning, equivalent to critical information presented alongside the narrative in educational animations. Textual cues can be subtle and flexible, utilizing color in written texts or varying intonation in spoken texts to draw attention to key terms (e.g., Rop et al., 2018; Wang et al., 2020). The situation model level is constructed through various forms of reasoning, including knowledge-based inference. During the integration process, students integrate relevant background knowledge activated by textual cues to generate a coherent representation (Kintsch, 2019). Moreover, it has been suggested that dynamic models can be constructed at finer-grained levels, illustrating how the knowledge structures depicted in animations scaffold students' partial cognitive structures (De Koning et al., 2009). However, despite these considerations, the application of situation model level informed textual cues within the domain of educational animations remains underexplored, prompting calls for empirical research from scholars such as Tarchi et al. (2021). Thus, this study adopts the principles of the CIM to develop educational animations featuring three distinct text types and explores the practical instructional effects of educational animations produced using different types of textual cues.

2.5 Self-efficacy and learning outcomes

Self-efficacy refers to an individual's belief in their ability to perform a specific task proficiently and confidently (Pintrich et al., 1991), which is crucial for motivation and perseverance in educational settings. In the context of multimedia instruction, particularly in learning environments enhanced by educational animations, self-efficacy is considered a crucial predictor of learning outcomes (Mayer & Pilegard, 2014; Semilarski et al., 2022). This assertion finds robust corroboration in a vast corpus of research, which underscores the advantageous influence of self-efficacy on academic accomplishments, particularly within the realm of science education (Lin, 2021; Tosun, 2022). The notion of self-efficacy as a favorable contributor to learning is further fortified by a recent systematic review of literature on technology-assisted education (Granic, 2022). Nonetheless, amidst these affirmations, social psychological research raises concerns. Researchers have drawn attention to potentially adverse consequences resulting from exaggerated self-efficacy, epitomized by the Dunning-Kruger effect, where individuals might overestimate their competence to meet set performance targets (Dunning, 2011). Given the pivotal role of self-efficacy in shaping scientific learning outcomes and its multifaceted impact on instructional quality, there is a need to explore the association between students' self-efficacy and achievement when exposed to diverse types of textual cues embedded in educational animations. This study endeavors to shed light on how varying textual cues might influence the self-efficacy-learning outcome relationship among the learners, thus providing insights into optimizing the integration of educational animations in science education to improve their learning benefits.

2.6 Present study

Educational animations are prevalent in teaching due to their vividness, intuitiveness, and simplicity, demonstrating great potential in improving learning outcomes (Krieglstein et al., 2023). However, the efficacy of educational animations depends not only on their visual and auditory appeal but also on their impact on learners' cognitive processing and cognitive load (Castro-Alonso et al., 2018; Paas & Van Merriënboer, 1994). Specifically, the strategic use of textual cues within animations can significantly aid learners in comprehending and retaining key concepts while reducing unnecessary cognitive load (Alpizar et al., 2020; Arslan-Ari et al., 2020; Richter et al., 2018). The question of which types of textual cues can best achieve these goals still requires further exploration. Moreover, self-efficacy has been identified as a crucial factor influencing learning effectiveness (Pintrich et al., 1991; Mayer & Pilegard, 2014; Semilarski et al., 2022). Interestingly, high self-efficacy in multimedia environments has been found to potentially harm learning outcomes (Dunning, 2011). Whether self-efficacy in educational animations with different types of textual cues also exhibits this double-edged effect is worth investigation.

Despite the importance of these factors, research on the effects of different types of textual cues in educational animations on learning effectiveness, cognitive load, and self-efficacy remains limited. Therefore, this study, grounded in the Construction-Integration model (Kintsch, 1998), developed three types of textual cues for educational animations: subtitled textual cue (STC), keyword textual cue (KTC), and structured textual cue (CTC). A quasi-experimental design was employed to compare the teaching effects of these three types of textual cues on learning performance and psychological experience. This study aims to fill gaps in the existing literature, provide a scientific basis for the design of educational animations, and offer practical guidance to educators on balancing self-efficacy and actual learning needs in educational design.

Specific research questions include:

  • RQ1: What differences exist in the achievement of students when exposed to educational animations with different types of textual cues?

  • RQ2: What differences exist in the knowledge retention of students when exposed to educational animations with different types of textual cues?

  • RQ3: What differences exist in the cognitive load experienced by students when exposed to educational animations with different types of textual cues?

  • RQ4: What differences exist in the self-efficacy of students when exposed to educational animations with different types of textual cues?

  • RQ5: What correlations exist between achievement, knowledge retention, cognitive load, and self-efficacy of students when exposed to educational animations?

3 Method

3.1 Participants

The study recruited a sample consisting of fifth-grade pupils from a Chinese public primary school, with ages ranging from 10 to 11 years old. A total of 261 students volunteered to take part in the experiment, with four unable to complete the entire study because of sickness or scheduling issues, resulting in 257 valid subjects overall. All participating students and their parents were informed that the data collected would be anonymized and solely used for research purposes. All participants in the study shared the same science teacher, who provided full cooperation throughout the experimental process, thereby guaranteeing a standardized teaching tempo and consistent content delivery across the board.

3.2 Research procedure

The research procedure is shown in Fig. 1. During the preparation phase, we developed the three types of animated materials required for the study and recruited participants. All participants were informed about the experiment's basic information and procedure without compromising the research's integrity. Subsequently, the researchers randomly assigned all participants to three groups. The STC group consisted of 87 participants, with 52% male and 48% female. The KTC group had 84 participants, with 49% male and 51% female. The CTC group comprised 86 participants, with 51% male and 49% female.
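The random assignment described above can be illustrated with a short Python sketch. Only the group sizes (87, 84, 86) come from the text; the participant IDs, the seed, and the function name `assign_groups` are hypothetical, and the study does not report how the randomization was actually carried out.

```python
import random

def assign_groups(participant_ids, group_sizes, seed=0):
    """Shuffle participants and cut the shuffled list into groups of the given sizes."""
    shuffled = list(participant_ids)
    if sum(group_sizes.values()) != len(shuffled):
        raise ValueError("group sizes must add up to the number of participants")
    random.Random(seed).shuffle(shuffled)
    groups, start = {}, 0
    for name, size in group_sizes.items():
        groups[name] = shuffled[start:start + size]
        start += size
    return groups

# Group sizes are those reported in the text; IDs 1..257 and the seed are hypothetical.
groups = assign_groups(range(1, 258), {"STC": 87, "KTC": 84, "CTC": 86})
print({name: len(members) for name, members in groups.items()})
```

Fixing the seed makes the allocation reproducible, which is useful when the assignment has to be audited later.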

Fig. 1 Research procedure

Based on the study by Arslan-Ari et al. (2020) on the effects of prior knowledge on learning outcomes from educational animations, all participants completed a prior knowledge test. A one-way ANOVA indicated similar levels of prior knowledge among the three groups (F = 1.021, p > 0.05). Additionally, the subject of this study was "Earth's rotation", and none of the participants had previously studied the relevant course. Therefore, considering these two points, it can be concluded that prior knowledge is unlikely to interfere with the results of this experiment.
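As a rough illustration of the equivalence check above, the following Python sketch computes the F statistic of a one-way ANOVA from raw scores. The function name and the example scores are hypothetical; the study's raw data and software are not given in the text, and the p-value (which needs the F distribution) is deliberately left out.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [s for g in groups for s in g]
    grand_mean = mean(all_scores)
    # Between-group variability: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: scores around their own group mean.
    ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical prior-knowledge scores for three small groups.
f_value, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_value, df1, df2)
```

A small F relative to its critical value, as reported in the text, is what supports treating the groups as comparable at baseline.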

Over three consecutive days, participants in each group watched the corresponding educational animations in batches, with each type of animation lasting 367 s. To minimize potential distractions and other sources of interference, participants viewed the educational animations in a familiar computer classroom environment. In this learning environment, the computer systems operated under the centralized supervision of a teacher. The teacher initiated and concluded the animations uniformly, standardizing the collective viewing process. Throughout the animation, the playback speed and video progress were predefined and locked, preventing individual learners from adjusting the speed or progress in real-time. Each student was provided with headphones to watch the videos independently. During the viewing, students remained in a quiet environment, which helped them focus and engage with the presented material.

After watching the educational animations, students had a five-minute break, followed by a 30-min knowledge post-test and a combined 10-min cognitive load and self-efficacy questionnaire to assess the achievement and learning experiences of the three groups.

One month later, all participants from the three groups took a 30-min delayed post-test at a unified time to evaluate their knowledge retention. It is important to note that during this month, all participants continued their regular science classes under the same science teacher, which did not involve the use of the three types of educational animations or content related to Earth's rotation.

3.3 Educational animation materials

The educational animation utilized in this study focused on the topic of "Earth's rotation". According to the national science curriculum standards of China, this topic is included in the curriculum for students in grades 5 or 6 (MOE, 2022). To ensure the validity of the content, the animation was adapted from current Chinese textbooks and expanded to meet the needs of the students. The development of educational animation materials adhered to established principles: (1) Utilize animation technology to simulate the Earth's rotation and the resulting shift between day and night, effectively presenting dynamic information that is not easily captured in static material; (2) Employ a visual design that aligns with the educational objective to engage students' curiosity, including vibrant hues, appealing interactive characters, and lucid scenes; (3) Divide intricate concepts into manageable, sequential steps and vividly transmit knowledge; (4) The teaching objectives and learning orientations are clearly defined, and the content is concise and presented in easily understandable language; (5) Interactive elements such as questioning are incorporated to increase student participation and interest in learning, as well as to promote metacognition and knowledge construction; (6) Both the content and presentation are appropriate for learners’ cognitive levels; (7) The animated dialogue is accurately dubbed at a moderate speech rate to support full student comprehension.

Three versions of textual cues were developed to correspond to the three design sets in this study, as presented in Fig. 2. The first type is the subtitle textual cue (STC), as illustrated in (a) (d) (g). The STC adds subtitles at the bottom of the educational animation; these comprise all the dialogue text and are dynamically updated. To ensure visibility for students, the subtitle color contrasts with the background (for example, white subtitles are used on a dark blue background). The second type is the keyword textual cue (KTC), as depicted in (b) (e) (h). The KTC displays crucial keywords at relevant positions during the course content, and each keyword remains visible for 2–4 s before disappearing. To ensure keyword clarity, a large and prominent font is used. The third type is the structured textual cue (CTC), as depicted in (c) (f) (i). Structured textual cues are dynamically generated as the animated content develops and are always displayed on the left of the screen. Newly appearing textual cues are in red and slightly larger than the rest of the text. It is noteworthy that the educational animations viewed by the three groups were identical in content, with the only variation being the type of textual cues used. Furthermore, the duration of all educational animations was standardized to 367 s. The specific definitions, characteristics and uses of the textual cues are shown in Table 1.

Fig. 2 Example screenshots of each textual cue

Table 1 The specific definitions, characteristics and uses of subtitle textual cue (STC), keyword textual cue (KTC), structured textual cue (CTC)

3.4 Data collection tools

3.4.1 Learning outcomes

Learning outcomes were measured with three academic-level tests. First, all questions related to "Earth's rotation" were screened from the educational exams organized by the local government educational authorities from 2015 to 2022, which ensured the trustworthiness of the question quality. The educational authorities classify each question into three grades: easy, medium, and difficult. The questions were grouped according to these published difficulty levels, yielding three sets of questions with the same difficulty level. Each set contained 4 multiple-choice questions, 2 fill-in-the-blank questions, and 1 short answer question, totaling 10 points. Three experienced elementary school science teachers were invited to re-evaluate the three sets of questions, which were found to be of an appropriate level of difficulty. The three sets were used in the pre-test, the immediate post-test, and the delayed post-test to determine the students' basic level, achievement, and knowledge retention, respectively. Two researchers specializing in science education independently assessed learning outcomes, unaware of students' identities or group affiliations. Inter-rater reliability exceeded 0.95, indicating high consistency in scoring. Rasch analysis was employed to measure the validity of the test items, obtaining infit MNSQ and outfit MNSQ statistics. The results indicated that all item values fell within the range of [0.7, 1.15], suggesting that the item difficulty levels were well-aligned with the participants' ability characteristics.
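To make the infit and outfit mean-square statistics concrete, here is a minimal Python sketch for a single item under the dichotomous Rasch model. It assumes that person abilities and the item difficulty (in logits) have already been estimated, and that items are scored 0/1; the text does not state which Rasch variant or software was used, so the function names and inputs are illustrative only.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def item_fit(responses, abilities, difficulty):
    """Return (infit_mnsq, outfit_mnsq) for one item.

    responses: 0/1 scores, one per person; abilities: person estimates in logits.
    """
    squared_residuals, variances = [], []
    for x, theta in zip(responses, abilities):
        p = rasch_probability(theta, difficulty)
        squared_residuals.append((x - p) ** 2)
        variances.append(p * (1 - p))
    # Infit: information-weighted mean square (sum of residuals over sum of variances).
    infit = sum(squared_residuals) / sum(variances)
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(responses)
    return infit, outfit

# Hypothetical example: four persons whose ability equals the item difficulty.
infit, outfit = item_fit([1, 0, 1, 0], [0.0, 0.0, 0.0, 0.0], 0.0)
print(infit, outfit)
```

Values close to 1 indicate that responses vary about as much as the model expects, which is the sense in which the reported range of [0.7, 1.15] is read as acceptable fit.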

3.4.2 Cognitive load and self-efficacy

The employed scales consist of two sections: cognitive load (CL) and self-efficacy (SE). Responses to all scale items were rated using a 5-point Likert scale. The cognitive load (CL) measure is derived from the scale developed by Paas and Van Merriënboer (1994), featuring two dimensions: mental load (ML) and mental effort (ME). Specifically, mental load encompasses 5 items, and mental effort includes 3 items. This scale, following adaptation by Hwang et al. (2013), has been effectively implemented in the context of elementary natural science curriculum evaluations, demonstrating positive outcomes in its application. The SE scale was adapted from an instrument originally created by Pintrich et al. (1991), consisting of 8 items, and stands as a widely utilized psychological instrument aimed at evaluating students' self-efficacy beliefs within learning environments. The Cronbach's α coefficients for the CL, ML, ME, and SE scales were 0.841, 0.835, 0.858, and 0.892, respectively. To ensure the validity of the instruments, we conducted both content validity and construct validity checks. Content validity was ensured through expert review, where three experienced educators evaluated the relevance and clarity of each item in the context of the study objectives. All questionnaire items were translated into Chinese by expert translators and then reviewed by science education researchers. A pilot test was conducted with 10 fifth-grade students from the same school who did not participate in the main experiment. Based on their feedback, adjustments were made to the guiding language and certain phrasings to ensure each question could be accurately comprehended by the students. Construct validity was examined using factor analysis. The Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy and Bartlett's test of sphericity were employed to confirm the suitability of the data for factor analysis. The KMO values of the CL, ML, ME, and SE scales were 0.703, 0.717, 0.833, and 0.762, respectively. Bartlett's test of sphericity was significant (p < 0.001), indicating that the factor analysis was appropriate. The factor loadings for all items were above 0.5, confirming that the scales effectively measured their respective constructs.
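For readers unfamiliar with the reliability coefficient reported above, the following Python sketch shows how Cronbach's α is computed from item-level responses. The response matrix is hypothetical; the study's item-level data are not available in the text.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical 5-point Likert responses: 3 respondents, 3 perfectly consistent items.
alpha = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
print(alpha)
```

When items move together perfectly, as in this toy matrix, α reaches its maximum of 1; real scales such as those in this study fall below that.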

3.5 Data analysis

The study examined the influence of three types of textual cues (STC, KTC, and CTC) on students' learning outcomes following the viewing of educational animations, employing various statistical analytical methods. All data were normally distributed and satisfied the assumptions of the analyses of variance described below, as confirmed by assumption tests. Initially, repeated measures analysis of variance was used to analyze pretest, post-test, and delayed post-test data across the three groups to assess both immediate and delayed effects comprehensively. To address potential confounding effects of students' prior knowledge levels, ANCOVA was employed, treating pretest scores as a covariate, to compare differences in immediate learning effectiveness between groups. To investigate knowledge retention and the "faded effect" on knowledge retention, one-way ANOVA was utilized to determine which types of textual cues were more beneficial for maintaining knowledge over the long term and to elucidate distinctions in their contributions. To examine cognitive load, MANCOVA was employed to explore how the textual cues affected its two key dimensions, mental load and mental effort. Bonferroni post hoc tests were then used to examine differences among groups. Additionally, ANOVA was used to compare students' self-efficacy under the different textual cues. Finally, correlation analyses were conducted by calculating correlation coefficients between post-test, delayed post-test, "faded effect" scores, cognitive load, and self-efficacy to elucidate the relationships among these variables.
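The Bonferroni adjustment used for the pairwise comparisons can be sketched in a few lines of Python. The raw p-values below are hypothetical and serve only to show the mechanics; the study's own pairwise results are reported in its tables.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}

# Hypothetical raw p-values for the three pairwise group comparisons.
raw = {("STC", "KTC"): 0.004, ("STC", "CTC"): 0.0001, ("KTC", "CTC"): 0.03}
adjusted = bonferroni_adjust(raw)
significant = {pair: p < 0.05 for pair, p in adjusted.items()}
print(adjusted)
print(significant)
```

With three groups there are three pairwise comparisons, so a raw p-value must be below roughly 0.0167 to stay significant at the 0.05 level after adjustment.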

4 Results

4.1 Learning outcomes

To examine the immediate and delayed effects of the three types of textual cues on students' learning outcomes after viewing educational animations, we administered pre-tests, post-tests, and delayed post-tests to the three groups. According to the descriptive statistics, the pre-test scores of STC, KTC, and CTC were 2.20, 1.96, and 2.05, respectively, indicating that the students' basic knowledge was weak. After watching the educational animation, the post-test scores of the participants in the three groups increased markedly, to 7.03, 8.13, and 8.73, respectively. After 1 month, the delayed post-test scores of the three groups had decreased relative to the post-tests, to 4.93, 6.29, and 7.36, respectively. The data for all groups passed Mauchly's test of sphericity (W = 0.995, χ2 = 1.367, p > 0.05). The results showed that the overall difference between the means of the three groups at different time points was significant (F = 47.420, p < 0.001, η2p = 0.291), as shown in Fig. 3. This implies that there is a "faded effect" in the knowledge retention of the three groups. Therefore, the post-test scores, delayed post-test scores, and the "faded effect" of knowledge retention were analyzed separately for the three groups.

Fig. 3 Descriptive statistics and repeated measures analysis of variance for the pre-, post-, and delayed post-test of 3 groups

4.1.1 Analysis of the variability for achievement of three groups

To compare the achievement of the three groups while taking into account the effect of prior knowledge, a one-way analysis of covariance was performed after its assumptions had been tested. The pre-test was used as the covariate, the group as the independent variable, and the post-test as the dependent variable (F = 31.584, p < 0.001, η2p = 0.201). Thus, after controlling for pre-test level, there was a significant difference in achievement among the three groups, with a large effect size.
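The logic of a one-way ANCOVA with a single covariate can be sketched in plain Python as a comparison of two regression models: one with the covariate only, and one with the covariate plus group membership under a common slope. This is a textbook formulation, not the authors' actual procedure or software, and the example scores are hypothetical.

```python
from statistics import mean

def _sums(xs, ys):
    """Return (SS_x, SP_xy, SS_y) computed about the means of xs and ys."""
    mx, my = mean(xs), mean(ys)
    ss_x = sum((x - mx) ** 2 for x in xs)
    sp_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_y = sum((y - my) ** 2 for y in ys)
    return ss_x, sp_xy, ss_y

def ancova_f(groups):
    """groups: list of (covariate_scores, outcome_scores), one pair per group.

    Returns (F, df_effect, df_error) for the group effect, assuming a common slope.
    """
    # Pooled within-group sums give the residual of the full model (group + covariate).
    within = [_sums(xs, ys) for xs, ys in groups]
    ss_x_w = sum(w[0] for w in within)
    sp_w = sum(w[1] for w in within)
    ss_y_w = sum(w[2] for w in within)
    ss_res_full = ss_y_w - sp_w ** 2 / ss_x_w
    # Ignoring group gives the residual of the reduced model (covariate only).
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    ss_x_t, sp_t, ss_y_t = _sums(all_x, all_y)
    ss_res_reduced = ss_y_t - sp_t ** 2 / ss_x_t
    df_effect = len(groups) - 1
    df_error = len(all_y) - len(groups) - 1
    f_value = ((ss_res_reduced - ss_res_full) / df_effect) / (ss_res_full / df_error)
    return f_value, df_effect, df_error

# Hypothetical (pre-test, post-test) scores for two small groups.
f_value, df1, df2 = ancova_f([([1, 2, 3], [2, 3, 5]), ([1, 2, 3], [4, 6, 6])])
print(f_value, df1, df2)
```

The F statistic measures how much residual variance is removed by adding group membership once the pre-test has already been accounted for.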

To further investigate and statistically compare the post-test score differences among the groups, a covariance analysis was conducted in accordance with its methodological requisites (see Table 2). After controlling for the effects of the pre-tests, the achievement of the students in the CTC group was significantly higher than that of the KTC group (p < 0.001), and the achievement of the KTC group was significantly higher than that of the STC group (p < 0.001). These data indicate that the three groups differed significantly on the immediate test of learning outcomes, in the order CTC > KTC > STC.

Table 2 Planned comparison results for the post-tests of the 3 groups

4.1.2 Analysis of variance in knowledge retention of three groups

A one-way ANOVA was used to analyze the delayed post-tests of the three groups, which showed a significant difference between the three groups with a large effect size (F = 46.459, p < 0.001, η2p = 0.268). To clarify the differences between groups, post hoc multiple comparisons were performed using the Bonferroni method. The results showed that students in the CTC group had significantly higher knowledge retention than those in the KTC group (p < 0.001), and the KTC group had significantly higher knowledge retention than the STC group (p < 0.001). The above data indicate that the three groups have significant differences in delayed post-tests and show the results of CTC > KTC > STC. The specific data are presented in Table 3.
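The effect size reported above can be cross-checked from the F statistic alone. The sketch below uses the standard identity for partial eta squared; the degrees of freedom (2, 254) are an assumption inferred from three groups and 257 participants in a one-way design, since the text does not state them.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F is the value reported in the text; df = (2, 254) is an assumption
# derived from 3 groups and 257 participants.
eta = partial_eta_squared(46.459, 2, 254)
print(round(eta, 3))  # 0.268
```

Under that assumption, the recomputed value matches the reported η2p = 0.268.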

Table 3 Difference analysis results of knowledge retention of 3 groups

4.1.3 Comparison of the "faded effect" of knowledge retention of three groups

To examine differences in the "faded effect" of knowledge (KF) among the three groups, the current study computed each student's KF score (delayed post-test minus post-test) and compared the scores across groups. Table 4 presents the descriptive statistics and the analysis of variance of KF among the three student groups. There was a significant difference in KF among the three groups with a medium effect size (F = 7.931, p < 0.05, η2p = 0.061). Bonferroni post hoc comparisons revealed a statistically significant difference only between STC and CTC (p < 0.05). There was no significant difference between KTC and either STC or CTC. However, the ∣KF∣ of the 3 groups followed the order STC (M∣KF∣ = 2.11) > KTC (M∣KF∣ = 1.85) > CTC (M∣KF∣ = 1.37).
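The KF score defined above is a simple per-student difference, as the following Python sketch shows. The scores are hypothetical (on the 0 to 10 scale of the tests); only the definition, delayed post-test minus post-test, comes from the text.

```python
from statistics import mean

def faded_effect_scores(post_scores, delayed_scores):
    """KF per student: delayed post-test minus post-test (negative means forgetting)."""
    return [delayed - post for post, delayed in zip(post_scores, delayed_scores)]

# Hypothetical scores for four students in one group.
post = [8, 9, 7, 10]
delayed = [6, 8, 6, 8]
kf = faded_effect_scores(post, delayed)
mean_abs_kf = mean([abs(score) for score in kf])
print(kf, mean_abs_kf)
```

Because KF is negative when knowledge fades, the group comparison in the text reports the mean absolute value, where a smaller number means better retention.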

Table 4 Comparison of "faded effect" of knowledge retention in 3 groups
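
The KF score described above is a simple per-student difference. The sketch below illustrates the calculation with hypothetical paired scores for one group. It assumes that M∣KF∣ denotes the mean of the absolute KF values; the function name `kf_scores` is illustrative only.

```python
from statistics import mean


def kf_scores(post, delayed):
    """Per-student faded effect: delayed post-test minus immediate post-test."""
    return [d - p for p, d in zip(post, delayed)]


# Hypothetical paired scores for one group (not the study's data).
post_test = [18, 16, 17, 15]
delayed_post_test = [16, 15, 15, 14]

kf = kf_scores(post_test, delayed_post_test)
# Negative KF values mean the delayed score is lower than the immediate score.
mean_abs_kf = mean(abs(v) for v in kf)
print(kf, mean_abs_kf)
```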

4.2 Cognitive load

MANCOVA was used to analyze the effects of applying STC, KTC and CTC in educational animation on students' cognitive load. Students' cognitive load was mainly reflected through mental load and mental effort. The descriptive statistics of the three groups of subjects are shown in Table 5.

Table 5 Descriptive statistics of the cognitive load among the three groups

Before analysis, the assumptions of MANCOVA were tested. Box plots did not reveal any univariate outliers, and scatter plots did not reveal any linear relationship between the dependent variables. The maximum Mahalanobis distance was 13.519 (< 16.270, α = 0.001), indicating the absence of multivariate outliers. Box's M test showed F = 1.621 (p > 0.001), indicating homogeneity of the variance-covariance matrices. As shown in the table, there was a mild correlation between the two dependent variables, mental load and mental effort (r = 0.16), and no multicollinearity (all |r| < 0.9). These results indicate that all assumptions were met.
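
The multivariate outlier check can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the ratings are hypothetical, only the two dependent variables are included, and the cutoff of 16.270 is taken as given from the text. The quantity computed is the squared Mahalanobis distance, which is the value usually compared against a chi-square critical value.

```python
from statistics import mean


def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each 2-D point from the sample centroid."""
    n = len(points)
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    # Entries of the sample covariance matrix (denominator n - 1).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^-1 [dx dy]^T, with the 2x2 inverse written out.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


CRITICAL_VALUE = 16.270  # cutoff quoted in the text (alpha = 0.001)

# Hypothetical (mental load, mental effort) ratings, not the study's data.
ratings = [(3, 4), (4, 3), (5, 5), (2, 4), (6, 4)]
d2 = mahalanobis_sq(ratings)
outliers = [pt for pt, d in zip(ratings, d2) if d > CRITICAL_VALUE]
print(max(d2), outliers)
```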

The results of the MANCOVA are shown in Table 6. Overall, the combined effect of textual cue type on students' cognitive load, consisting of mental load and mental effort, was significant (F = 21.357, p < 0.001, Wilks' λ = 0.652; η2p = 0.233). More specifically, the univariate tests showed significant differences among the three groups in both mental load and mental effort, with medium effect sizes.

Table 6 Results of the MANCOVA for cognitive load: multivariate test and univariate tests

Pairwise comparisons were made using Bonferroni post hoc tests, and the results are shown in Table 7. For both mental load and mental effort, the order was STC > KTC > CTC, and the differences were significant. Cognitive load reflects the total amount of mental resources an individual needs to allocate and organize, representing the demands on the brain during the processing and retention of new knowledge (Paas & Merriënboer, 1994). Mental effort constitutes the portion of cognitive resources actually expended within the broader construct of cognitive load, denoting the intensity of cognitive processing an individual invests during task execution. As task design becomes more rational, the mental effort required from individuals diminishes correspondingly (Hwang et al., 2013). Consequently, compared with the other two textual cues, the presentation form of CTC appears more rational and more conducive to learners' comprehension of knowledge.

Table 7 Results of pairwise comparisons for cognitive load of 3 groups
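
For readers unfamiliar with the Bonferroni method, the adjustment multiplies each raw p value by the number of comparisons and caps the result at 1. The sketch below uses hypothetical raw p values for the three pairwise comparisons; the function name `bonferroni` and the values are illustrative and are not taken from Table 7.

```python
def bonferroni(p_values):
    """Bonferroni-adjust a dict of raw p values: p * m, capped at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


# Hypothetical raw p values for the three pairwise comparisons.
raw = {"STC vs KTC": 0.004, "KTC vs CTC": 0.02, "STC vs CTC": 0.0001}
adjusted = bonferroni(raw)
significant = [pair for pair, p in adjusted.items() if p < 0.05]
print(adjusted, significant)
```

With three comparisons, a raw p value must fall below 0.05 / 3 to remain significant after adjustment, which is why the second hypothetical comparison is no longer significant.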

4.3 Self-efficacy

To investigate whether the different types of educational animations differed significantly in their effects on elementary school students' self-efficacy (SE), an ANOVA was employed (see Table 8). The results revealed statistically significant differences in the influence of the different types of educational animations on students' self-efficacy (F = 23.423, p < 0.001, η2p = 0.141).

Table 8 Results of ANOVA and Bonferroni post-hoc multiple comparisons for self-efficacy of 3 groups

Given this significant outcome, Bonferroni post hoc multiple comparisons were conducted to identify specific differences among the groups. Students exhibited the highest levels of self-efficacy when exposed to the educational animation with CTC, followed by the educational animation with KTC, while the educational animation with STC elicited the lowest self-efficacy scores. All differences between the group means were statistically significant.

4.4 Relationships among learning outcomes, cognitive load and self-efficacy

To investigate the relationships among learning outcomes, cognitive load (CL), and self-efficacy (SE), we analyzed the correlations among post-test scores (PT), delayed post-test scores (DPT), KF, CL, and SE across the three groups. First, rPT&CL = -0.195 (p < 0.001) and rDPT&CL = -0.189 (p < 0.001), implying that both the immediate and the delayed post-test scores were negatively correlated with cognitive load, which is consistent with a negative effect of cognitive load on learning outcomes. In addition, rDPT&SE = -0.152 (p < 0.05) indicates that the results of the delayed post-test were negatively correlated with self-efficacy, and rKF&SE = -0.338 (p < 0.001) indicates that the "faded effect" related to knowledge retention had a significant negative correlation with students' self-efficacy. Since the mean self-efficacy of the three groups in this experiment was high after viewing the educational animation (mean = 4.18), excessive self-efficacy on the part of the students may hinder knowledge retention. Detailed data are presented in Fig. 4.

Fig. 4 Correlation analysis of learning outcomes, cognitive load and self-efficacy
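
The correlation coefficients reported above are Pearson product-moment correlations between pairs of measures. A minimal sketch of the calculation is given below, assuming hypothetical cognitive load ratings and post-test scores; the function name `pearson_r` and the data are illustrative only.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data (not the study's): higher load paired with lower scores.
cognitive_load = [2, 3, 4, 5, 6]
post_test = [18, 17, 15, 14, 12]

r_pt_cl = pearson_r(cognitive_load, post_test)
print(f"r = {r_pt_cl:.3f}")
```

A negative coefficient, as in this illustration, means that higher values on one measure tend to accompany lower values on the other; it does not by itself establish the direction of any effect.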

5 Discussion

Educational animation engages students and promotes understanding of complex concepts by providing visual and auditory stimulation. However, students' learning outcomes and experience using these educational animations are affected by the type of textual cues that accompany them. This study investigated the impact of different textual cues—subtitle textual cue (STC), keyword textual cue (KTC), and structured textual cue (CTC)—on student achievement, knowledge retention, cognitive load, and self-efficacy.

In terms of achievement, students using educational animations with CTC achieved the highest scores, followed by those using KTC, with those using STC performing the worst. The poor academic achievement associated with STC could be attributed to visual distraction caused by dynamic subtitles, which obscured key information (Arslan-Ari et al., 2020; De Koning & Jarodzka, 2017). Additionally, based on the transient information effect (Castro-Alonso et al., 2018), the fleeting nature of subtitles can lead to cognitive overload, impeding students' ability to process and retain information effectively. Students using CTC in educational animations also outperformed those using KTC. This supports previous research showing that constructing knowledge from animations is an iterative cognitive process (Kriz & Hegarty, 2007), and well-structured textual cues can guide this process more clearly. CTC, with its hierarchical structure, facilitates the knowledge development process, leading to meaningful learning (Semilarski et al., 2022).

Regarding knowledge retention, all groups showed lower delayed post-test scores than immediate post-test scores, indicating the "faded effect". While the CTC group scored slightly higher than the KTC group, the difference was not significant, and the STC group lagged significantly behind the other two. This finding suggests that the positive impact of educational animations on students may diminish over time, regardless of the type of textual cue used, which aligns with Ebbinghaus's forgetting curve (Ebbinghaus, 2013). The significantly lower performance of the STC group may be due to the transient information effect (Castro-Alonso et al., 2018), where the fleeting nature of subtitles leads to cognitive overload, hindering students' ability to process and retain information effectively. The dynamic visual nature of STC introduces additional cognitive burdens, making comprehension more difficult over time and thus resulting in lower retention rates (Paas et al., 2007).

Regarding self-efficacy, students using CTC reported the highest self-efficacy scores, followed by those using KTC, with STC yielding the lowest self-efficacy scores. For students, constructing knowledge from animations is essentially an iterative cognitive process (Kriz & Hegarty, 2007). CTC could more clearly use textual cues to organize the semantic content and structure of the material, guiding students' cognitive processes and enhancing their confidence in their learning abilities (Sweller, 2010). While KTC had a lower degree of structuring, its clear keywords still enabled students to immediately notice the key points of the course, thereby enhancing self-efficacy (Semilarski et al., 2022). Students using STC reported the lowest self-efficacy, possibly due to cognitive overload caused by subtitles, which hindered their engagement and confidence in the learning tasks (Paas et al., 2007).

In terms of the correlations between learning outcomes, cognitive load, and self-efficacy, the study found a significant negative correlation between cognitive load and both academic achievement and knowledge retention, consistent with previous research (Arslan-Ari et al., 2020; De Koning & Jarodzka, 2017; Semilarski et al., 2022). However, we also found an interesting correlation regarding self-efficacy. Students' self-efficacy was significantly positively correlated with their learning outcomes, but it also showed a significant negative correlation with the faded effect. Through discussions with students and their science teachers, it became evident that many students in the CTC group believed they had thoroughly grasped the concept of Earth's rotation after watching the educational animation. Consequently, they did not feel the need for further review. When students perceive themselves as having rich learning experiences in a specific medium, feel highly confident about their learning situation, and optimistically believe they can acquire extensive knowledge in some way, they may misjudge the difficulty of the task (Toni et al., 2023) or erroneously believe they do not need further guidance or review (Acuna et al., 2011). We observed that students in all groups reported high self-efficacy in their learning experiences. In traditional science classrooms, students' self-efficacy is considered a crucial indicator of teaching effectiveness (Lin, 2021; Tarchi et al., 2021). When students have high self-efficacy, they are generally expected to exert more effort in science learning and have higher cognitive pursuits (Semilarski et al., 2022). However, this study's findings extend previous research on science learning, suggesting that, for digital native students, self-efficacy may be a cautionary indicator in measuring multimedia-supported science education. Teachers should provide more metacognitive support and materials for review and reflection to address this potential challenge.

6 Implications

The findings underscore the critical role of textual cue design in educational animations. Effective cue design can enhance learning outcomes, improve knowledge retention, manage cognitive load, and boost self-efficacy, providing valuable insights for educators and instructional designers in creating optimized multimedia learning environments. Meanwhile, the study highlights the need for a nuanced approach to self-efficacy in multimedia-assisted science education: instructional materials should build on students' confidence while addressing the potential pitfalls of overinflated self-efficacy for sustaining long-term learning outcomes. To this end, the present study proposes the following recommendations for the design of educational animation:

  • Educational animation should employ textual cues that possess distinct semantics and a well-structured format, such as CTC, to facilitate students' cognitive development and the establishment of cognitive schema.

  • The educational effectiveness of transient information, such as dynamic subtitles (STC), in educational animation is less than satisfactory. Therefore, it is advised to minimize its usage. However, if its utilization is deemed necessary, it is vital to adopt more nuanced instructional designs that enable students to concentrate on pertinent information.

  • When managing cognitive load, educators can consider highlighting key information by marking keywords (such as KTC) and placing textual cues in fixed positions that align with reading habits (such as CTC). These methods help reduce cognitive load and improve learning effectiveness.

  • When digital native students engage in multimedia with a lower cognitive load, such as CTC educational animation, they are often inclined to exhibit a heightened sense of self-efficacy, which may result in overconfidence in the learning outcomes and a disregard for subsequent review and consolidation. Thus, it is crucial for teachers to remind and guide students toward conducting effective reviews and knowledge retention. Alternatively, teachers can enhance students' metacognition by incorporating additional cues in instructional design or educational animation.

7 Conclusion

This study highlights the significant impact of textual cues on students' learning outcomes, cognitive load, and self-efficacy in educational animations. The findings suggest that structured textual cues (CTC) are most effective in enhancing student achievement and self-efficacy while minimizing cognitive load. Keyword textual cues (KTC) also provide benefits, though to a lesser extent, while subtitle textual cues (STC) may introduce additional cognitive challenges that hinder learning. Additionally, the study underscores the complex role of self-efficacy in learning, indicating that high self-efficacy may sometimes lead to overconfidence and decreased review efforts, potentially impacting knowledge retention. These insights provide valuable guidance for educators and instructional designers in creating effective educational animations that support meaningful learning and sustained student engagement.

8 Limitations and future research

Although significant effects were achieved, this study still has some limitations. Firstly, the educational animations used in this research were presented to students in a systematically controlled manner, depriving them of control over the pace of the animations. Previous studies (Tabbers & De Koeijer, 2010) have suggested that allowing students to control their learning pace can lead to better learning performance. Future investigations could explore the impact of students autonomously controlling educational animations. Secondly, this study investigated how three different types of textual cues impact various learning outcomes concerning self-efficacy and cognitive load from the perspective of the learner's learning experience. Acknowledging the critical role that text quantity plays in achieving efficacious educational outcomes through animated instruction (Berney & Betrancourt, 2016), we advocate for further research into designing an appropriate amount of text for each cue type, aiming to ensure that essential information is conveyed effectively, thereby optimizing learners' usage experience and learning effectiveness. Thirdly, we suggest that future researchers extend the duration of the experimental period and develop a series of educational animations with the three types of textual cues to observe the long-term effects on students' learning outcomes and learning experiences.