1 Introduction

Emotions are essential to learning as they influence various cognitive processes, such as perception, cognition, problem-solving, and creativity (Li et al., 2020). Empirical findings on emotions in learning have revealed that emotional experiences are also fundamental in technology-based learning (Loderer et al., 2020), and there is a need to investigate their implications for cognitive and affective learning outcomes (Mayer, 2020). Numerous measures have been applied to investigate these outcomes, drawing on different theories, models, designs, tools, and measurements (Hillaire, 2021), and the key approach is commonly associated with studies focusing on either affective computing or emotional design. In multimedia-based learning, two main concepts were unified, namely the Emotional Design Theory (Norman, 2004) and the Cognitive and Affective Theory of Multimedia Learning (CATLM) (Moreno & Mayer, 2007), to evaluate how emotions could be induced through the manipulation of multimedia elements. The concept combined ideas from user experience and multimedia learning theories in the hope of gaining a broader perspective on how 'visually designed learning emotions' could influence cognitive and affective learning outcomes. Hence, one strategy is to redesign essential instructional elements to create appeal without adding extraneous elements (Mayer & Estrella, 2014; Um et al., 2012), on the hypothesis that this may induce positive emotions and improve learning outcomes (Kumar et al., 2019; Shangguan et al., 2020a).

On the other hand, applying multimedia unlocks various design possibilities, as essential instructional elements expressed through colours, images, sounds, or a combination of these could have different learning implications. Hence, design considerations such as colour combinations (Kumar et al., 2019; Mayer & Estrella, 2014; Park et al., 2015; Plass & Kaplan, 2015), shapes, anthropomorphism (Wang et al., 2022), baby-face bias (Um et al., 2012; Moremoholo & de Lange, 2018; Plass et al., 2020) and fonts (Kumar et al., 2018) have been considered, yet Uzun and Yıldırım (2018) claimed that the outcomes only sporadically showed an improvement in learning. Shangguan et al. (2020b) explained that these empirical findings provided only scarce evidence of the potential benefits of visually attractive elements, and researchers such as Brom et al. (2018), Endres et al. (2020), Münchow and Bannert (2019) and Shangguan et al. (2020b) questioned whether positive emotional designs could effectively influence positive learning outcomes. Moreover, emotions can facilitate or impede learning, and the same could be said of emotionally designed multimedia elements (Wong & Adesope, 2021). However, Chung and Cheon (2020) argued that such a misconception could be attributed to the limited number of studies on the manipulation of emotional design elements and on research strategies in multimedia learning, while Endres et al. (2020) suggested exploring untested moderators that could influence these learning outcomes. Additionally, studies tend to rely primarily on self-perception methods, using questionnaires to measure emotional change and learning behaviour, and recent studies question the validity of such measurement while suggesting a shift towards non-invasive methods (Mayer, 2020).

2 Anthropomorphism in multimedia learning

Building on previous findings, we focus on the application of anthropomorphised multimedia elements as a means of executing emotional design principles. Anthropomorphising happens when inanimate objects are given human features to represent emotions and motivation (Cao et al., 2022; Epley et al., 2007). Airenti (2018) describes it as a basic human tendency developed during childhood to comprehend situations, which persists throughout adulthood as a coping and learning mechanism. In digital learning environments, empirical findings indicate that the degree of anthropomorphising could influence learning adaptation, perception, and outcomes, as anthropomorphised elements could be perceived as a social interaction partner (Schneider et al., 2022). This degree could vary from the simple addition of eyes and a mouth, as in a smiley, to the addition of complex human qualities, such as gestures, attributes, and emotions. Likewise, with humanlike appearances, anthropomorphised objects tend to reflect novelty in interaction and can visually attract users towards engagement and interaction (Chowdhury et al., 2018) while predominantly inducing positive emotions (Cao et al., 2021).

In the context of multimedia learning, the emotional significance of anthropomorphism was initially investigated by Um (2008), who manipulated emotions through positive emotional design in a seven-minute presentation on "How Immunization Works". While the findings did not reflect a direct relationship between anthropomorphism and learning outcomes, Um reported that positive designs induced positive emotions, which indirectly improved cognitive and affective learning outcomes. Subsequently, Mayer and Estrella (2014), using an eight-slide multimedia presentation titled "How a Cold Virus Attacks the Body", investigated the effects of colour combinations with anthropomorphised elements and reported that anthropomorphised lessons yielded better learning achievement and mental effort. Studies on anthropomorphism as a strategy for emotional design in multimedia learning tend to build mainly on the seminal work by Um (2008) and Mayer and Estrella (2014). Nevertheless, a growing number of studies address other areas: Münchow and Bannert (2019) explored the effects of anthropomorphised organelles of eukaryotic cells, Hight et al. (2021) chemistry concepts, Brown et al. (2022) pharmaceutical concepts, Liew et al. (2022) computer viruses, Shangguan et al. (2020a) the formation of lightning, Slabbert et al. (2022) solar systems, and Haaranen et al. (2015) programming concepts. These studies indicate a need to consider reproducibility in other subject domains, as suggested by Liew et al. (2022) and Pang et al. (2021), while also deliberating on the importance of culture and context (Kumar et al., 2019; Schneider et al., 2022). Studies in this context outside Western countries have also been scarce (Stárková et al., 2019), with a handful of studies from China (Li et al., 2020; Shangguan et al., 2020a, b), Malaysia (Liew et al., 2022), South Africa (Slabbert et al., 2022) and Türkiye (Türkoguz & Ercan, 2022; Uzun & Yıldırım, 2018), which warrants further exploration.

Despite decades of study on the effect of anthropomorphism, it remains unclear why and how it captures people's attention (Cao et al., 2021). Anthropomorphism has been regarded as an attention-holding variable (Dehn & van Mulken, 2000), and in the recently introduced Cognitive-Affective-Social Theory of Learning in digital Environments (CASTLE), Schneider et al. (2022) regard anthropomorphism as a social process that supports the transfer of information into working memory through its association with attention. In emotional design studies, Park et al. (2015) first examined attention by applying eye-tracking methodology to Um's presentation and found that expressive anthropomorphisms could improve learning attention. Following this, Brom et al. (2018) highlighted the need for non-invasive measurements such as eye-tracking to further understand anthropomorphism's role in attention-capturing effects. Subsequently, Stárková et al. (2019) extended the seminal work on anthropomorphism by Mayer and Estrella (2014) and revealed that anthropomorphism only affected attention through initial fixation. This was supported by Wang et al. (2022), who added that it may also affect the length of fixation and warranted further investigation.

2.1 Gaze behavior

Non-invasive methods, namely eye-tracking technology, have emerged as a vital tool for understanding the cognitive processes involved in multimedia-based learning (Arslanyilmaz & Sullins, 2021; Coskun & Cagiltay, 2022; Tzafilkou et al., 2021). Furthermore, empirical evidence in e-learning and human-computer interaction indicates that eye-gaze movement could be associated with cognitive processes such as attention (Ramachandra & Joseph, 2021). Anthropomorphism usually represents a form of visual cue (Frischen et al., 2007; Stárková et al., 2019; van Wermeskerken & van Gog, 2017) that provides guidance by deliberately directing users' eye gaze toward the stimuli (Moon & Ryu, 2021). Hence, gaze data could be vital in determining the effect of multimedia elements in directing learners' attention (Coskun & Cagiltay, 2022) by analysing fixation time (Arslanyilmaz & Sullins, 2021), the number of views, the initial view, and dwell time with reference to the areas of interest (AOIs) (Moon & Ryu, 2021).

3 Research aim

In this study, we developed new learning material on solar systems to identify whether the results obtained from empirical findings on anthropomorphism could be replicated for different topics and contexts. Hence, an anthropomorphised multimedia learning material was compared with one void of anthropomorphic design. We also considered the role of epistemic emotions, as recommended by Loderer et al. (2020), as there is a dearth of research on their effect on, and correlation with, multimedia learning outcomes. Epistemic emotions such as surprise, curiosity, enjoyment, confusion, anxiety, frustration, and boredom (Pekrun et al., 2017) are defined as emotions that develop due to cognitive inaptness arising from unanticipated or differing information (Millis et al., 2011). We regarded epistemic emotions as fundamental in this study because the learning material was designed around a general topic that was not aligned with the respondents' course. Moreover, epistemic emotions address emotions primarily developed due to cognitive tasks and activities (Pekrun et al., 2017), and we identified this as an essential consideration, especially as Mayer (2020) highlighted the need to identify a causal link between affective and cognitive processing. Furthermore, Schneider et al. (2022) emphasise the need to explore emotions when learning with media, especially when social cues such as anthropomorphism are concerned.

Likewise, emotions are a cognitive process (Triberti et al., 2017) that defines the inclination and need to engage (Park et al., 2008). For that reason, we theorised that the need for cognition (NC), which Cacioppo et al. (1996) describe as related to learning performance, might have a moderating role in learning outcomes. As Cacioppo et al. (1984) define NC as a predisposition to engage in and enjoy cognitive activity, we postulate that it may provide further insights into affective-cognitive outcomes. Epley et al. (2007) describe individuals with high NC as usually reflecting negatively on anthropomorphism, yet to our knowledge, this has not been investigated within the scope of emotional design in multimedia learning. As a result, we aimed to investigate the role of NC in moderating learning outcomes, namely achievement, perceived satisfaction, and effort. Lastly, we explored gaze behaviour by measuring each AOI's initial view, number of views, and dwell time, focusing specifically on the anthropomorphised essential elements. While some studies have used eye-tracking, such as Stark et al. (2018) and Stárková et al. (2019), according to Othman et al. (2020), such applications are often costly. Therefore, as an alternative approach, we used a webcam-based tracker called GazeRecorder to analyse users' gazes while they viewed the learning material. GazeRecorder uses a mobile interface, and according to Tzafilkou et al. (2021), mobile sensing has a high potential for identifying learning behaviours in a natural environment. Therefore, while the self-perception method was employed to investigate learning outcomes and moderators, we utilised gaze tracking to understand gaze behaviours and answer the following questions:

  • RQ1. What are the differences in learning outcomes when anthropomorphised elements are used, and does NC moderate these outcomes?

  • RQ2. What are the differences in correlation between epistemic emotions and learning outcomes when anthropomorphised elements are used?

  • RQ3. How does fixation differ when anthropomorphised elements are used?

4 Methodology

4.1 Participants

A total of 68 undergraduates from a social science course at a university in Malaysia participated in this study. The sampling was based on purposive sampling, referring to an intact classroom grouped into two groups. Of the 68 respondents, the intervention group had N = 33 participants, whereas the control group had N = 35 respondents. Six respondents in each group were randomly selected and given access to a separate link to the videos to facilitate gaze-tracking data collection.

4.2 Intervention

Two designs of multimedia-based video were used to control the effect of treatments, namely Non-Anthropomorphised (NAT) (Fig. 1) and Anthropomorphised (AT) (Fig. 2). Both videos had the same content and duration and differed only in their anthropomorphic attributes. Anthropomorphic characters were not designed according to emotional valence but were randomly selected and associated with the content in line with the description by Airenti (2018). For example, Venus was depicted as a female character, Neptune, which was the furthest from the Sun, was depicted as 'crying', and Pluto was depicted with uncertainty due to being downgraded as a planet (Table 1).

Fig. 1 Screenshot from NAT video

Fig. 2 Screenshot from AT video

Table 1 Anthropomorphic characters used in the video

4.3 Variables

Ten multiple-choice questions measured students' pre- and post-achievement. NC was measured prior to the intervention using the 18-item questionnaire adapted from the Need for Cognition Scale (Cacioppo et al., 1984). Next, we measured perceived satisfaction, which Topala and Tomozii (2014) defined as a multifaceted measurement of learning attitude, based on an adapted version of the System Usability Scale (SUS). Perceived effort, defined as students' perception of the work they put in while learning, was adapted from Mayer and Estrella (2014). According to Needham (1978), effort relates to satisfaction, and the outcome of effort can be represented as knowledge gained that may yield satisfaction. Nevertheless, as students were unaware of their scores, satisfaction due to achievement was disregarded. Subsequently, emotions were measured using the Epistemically Related Emotion Scales by Pekrun et al. (2017), covering Surprise, Curiosity, Enjoyment, Confusion, Anxiety, Frustration, and Boredom. All items except the pre- and post-test were measured using a five-point Likert scale ranging from strongly agree to strongly disagree, apart from the Epistemically Related Emotion Scales, for which a four-point scale was used: Not at all, Very little, Moderate Strong, and Very Strong.

4.4 Eye gaze tracking

A mobile eye-tracking tool from https://gazerecorder.com was used, in which eye position was captured via the mobile device's integrated camera, reflecting a more natural setting. The tool captures dwell time, first view, and the number of views in the selected AOI, as shown in Fig. 3. Dwell time represents the length of time viewers focus on an area (Bruno et al., 2021) based on the AOI (Moon & Ryu, 2021).

Fig. 3 Dwell time analytics based on AOI
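As an illustration of how the three AOI measures named above (first view, number of views, and dwell time) relate to raw gaze data, the following is a minimal sketch. It assumes gaze samples arrive as time-ordered `(timestamp, x, y)` tuples and that each AOI is a rectangle in screen pixels; the `AOI` class, the `aoi_metrics` function, and the sample layout are hypothetical and do not reflect GazeRecorder's own export format or API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed sample layout: (timestamp in seconds, x in pixels, y in pixels).
GazeSample = Tuple[float, float, float]


@dataclass
class AOI:
    """Rectangular area of interest in screen pixels (hypothetical layout)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def aoi_metrics(samples: List[GazeSample], aoi: AOI) -> Tuple[Optional[float], int, float]:
    """Return (first_view, number_of_views, dwell_time) for one AOI.

    first_view      -- timestamp of the first sample inside the AOI, or None
    number_of_views -- how many separate times the gaze entered the AOI
    dwell_time      -- total time inside the AOI, summed over the intervals
                       that start with a sample inside the AOI
    """
    first_view: Optional[float] = None
    views = 0
    dwell = 0.0
    was_inside = False
    for i, (t, x, y) in enumerate(samples):
        inside = aoi.contains(x, y)
        if inside:
            if first_view is None:
                first_view = t
            if not was_inside:
                views += 1  # gaze has just entered the AOI
            if i + 1 < len(samples):
                dwell += samples[i + 1][0] - t
        was_inside = inside
    return first_view, views, dwell


if __name__ == "__main__":
    # Illustrative samples only, not data from the study.
    demo = [(0.0, 10, 10), (0.5, 110, 110), (1.0, 120, 120),
            (1.5, 10, 10), (2.0, 130, 130), (2.5, 10, 10)]
    print(aoi_metrics(demo, AOI(100, 100, 200, 200)))
```

Counting a new view only when the gaze enters the AOI after being outside keeps the number of views distinct from the number of samples, which is why the two measures can diverge from dwell time.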

4.5 Procedure

Both groups answered the pre-test followed by the NC questionnaire before the intervention. Next, respondents were emailed the link to their assigned video. Similarly, the six respondents randomly selected from each group were provided with the link to the GazeRecorder system, and these twelve students were briefed on the required calibration procedures one day before the intervention. When a student clicked the link, the calibration process started, during which the student had to look at several targets at fixed positions on the screen to map the gaze coordinates. Next, students were provided with links to respond to the satisfaction, effort, and emotion instruments, followed by the post-test, using Google Forms. Lastly, students were thanked and debriefed. The analysis was done using SPSS v27, after the data from the forms were first transferred into Microsoft Excel to match respondents' pre- and post-test results. Five respondents were eliminated due to missing pre-test results. Based on the data obtained, the reliability for learning effort (α = .632), satisfaction (α = .909), and NC (α = .661) indicated that the Cronbach's α values could be accepted as satisfactory (0.58–0.97) (Taber, 2018).
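For readers who wish to reproduce a reliability check of this kind outside SPSS, the sketch below computes Cronbach's α with the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the response matrix are hypothetical; the numbers are illustrative and are not the study's data.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(responses: List[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix.

    responses -- one row per respondent, one column per questionnaire item.
    Uses sample variances (n - 1 denominator) throughout.
    """
    if len(responses) < 2:
        raise ValueError("at least two respondents are required")
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    items = list(zip(*responses))                      # columns = items
    sum_item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum_item_var / total_var)


if __name__ == "__main__":
    # Illustrative five-point Likert responses (4 respondents, 3 items).
    demo = [[4, 5, 4], [3, 3, 2], [5, 4, 5], [2, 2, 3]]
    print(round(cronbach_alpha(demo), 3))
```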

5 Findings

The findings of this study are reported based on effects on learning outcomes (RQ1), epistemic emotions (RQ2), and gaze behaviour (RQ3).

5.1 Learning outcomes

NC levels were categorized as High or Low based on the mean (M = 3.132, s.d. = .417); 18 High and 15 Low respondents were observed for AT, whereas NAT had 16 High and 19 Low respondents (Table 2). A two-way ANCOVA indicated no significant difference in achievement between AT (M = 6.76, s.d. = 1.324) and NAT (M = 7.03, s.d. = 1.248), F(1,63) = .259, p = .612, ηp² = .004, or between levels of NC, F(1,63) = .381, p = .539, ηp² = .006. Similarly, there was no significant interaction effect, F(1,63) = 1.493, p = .226, ηp² = .023. As for satisfaction, a two-way ANOVA indicated no significant difference between AT (M = 3.596, s.d. = .740) and NAT (M = 3.448, s.d. = .672), F(1,64) = .633, p = .429, ηp² = .010, or between levels of NC, F(1,64) = .456, p = .502, ηp² = .007. There was also no significant interaction effect for satisfaction, F(1,64) = .867, p = .355, ηp² = .013, or for effort, F(1,64) = .198, p = .658, ηp² = .003. The AT group (M = 3.855, s.d. = .690) reflected more mental effort than the NAT group (M = 3.737, s.d. = .559); however, there was no significant difference between groups, F(1,64) = .686, p = .411, ηp² = .011, or between levels of NC, F(1,64) = .560, p = .457, ηp² = .009.

Table 2 Descriptive statistics learning achievement, effort and satisfaction
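The partial eta squared values reported in this section can be cross-checked from the F statistics and their degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below is only a consistency check, not the SPSS procedure used in the study; the function name `partial_eta_squared` and the labels are hypothetical, while the F values and degrees of freedom are those reported in the text. For the three effects listed, the identity gives values that round to the reported .004, .023, and .010.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported in the text; labels are ours.
reported = [
    ("achievement, group effect", 0.259, 1, 63),
    ("achievement, interaction", 1.493, 1, 63),
    ("satisfaction, group effect", 0.633, 1, 64),
]

if __name__ == "__main__":
    for label, f_value, df1, df2 in reported:
        print(f"{label}: {partial_eta_squared(f_value, df1, df2):.3f}")
```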

5.2 Epistemic emotions

We analyzed the difference between the groups using a series of t-tests, which indicated no significant difference for any of the positive emotions (Surprised, Curious, and Excited) or for the cumulative positive experience between AT (M = 3.647, s.d. = .601) and NAT (M = 3.724, s.d. = .730) (Table 3). However, there was a significant difference in negative emotions, t(66) = 2.307, p = .024, d = .755, between AT (M = 1.849, s.d. = .731) and NAT (M = 2.271, s.d. = .777), specifically for Anxious, t(66) = 2.624, p = .011, d = .896, and Bored, t(66) = 2.184, p = .033, d = 1.028.

Table 3 Mean, standard deviation for epistemic emotions

Next, we investigated how these emotions correlated with the learning outcomes by computing Pearson correlations for each group separately. For the AT group (Table 4), learning gain had a significant negative relationship with Confused (r = −.506, p < .01) and with overall negative emotions (r = −.368, p < .05). While no significant relationship was observed for learning effort, there was a significant negative relationship between learning satisfaction and Frustrated (r = −.344, p < .05). As for the NAT group (Table 5), while there were no significant relationships between learning gain and any of the emotions, learning effort had a significant negative relationship with Confused (r = −.404, p < .05) and Frustrated (r = −.396, p < .05), whereas learning satisfaction had a significant negative relationship with Bored (r = −.490, p < .01) and, interestingly, with learning effort (r = −.356, p < .001).

Table 4 Correlation between emotions and learning outcomes for AT
Table 5 Correlation between emotions and learning outcomes for NAT

The Fisher z-transformation method, as suggested by Eid et al. (2011), was then applied to compare the correlations between learning outcomes and emotions for AT and NAT. The comparison was done using the online calculator available at https://www.psychometrica.de/correlation.html#independent (Lenhard & Lenhard, 2014) and is reported in Table 6. The findings reflected no significant difference between the groups for perceived satisfaction; however, there were significant differences between AT and NAT for learning achievement in relation to Confused (z = -1.758, p = 0.039) and Frustrated (z = -1.675, p = 0.047), and for perceived effort in relation to Frustrated (z = 2.160, p = 0.015) and overall negative emotions (z = 1.742, p = 0.041).

Table 6 Comparison of correlations between AT and NAT (Z-Value)
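The comparison of two independent correlations can be sketched as follows. This is a minimal illustration of the textbook Fisher z procedure, not a reproduction of the online calculator's implementation: each correlation is transformed with z = atanh(r), and the difference is divided by the standard error sqrt(1/(n1 − 3) + 1/(n2 − 3)). The function names and the example inputs are hypothetical. The reported z and p pairs above are consistent with one-tailed p values, so a one-tailed p is assumed here.

```python
import math
from typing import Tuple


def one_tailed_p(z: float) -> float:
    """One-tailed p value for a standard normal test statistic."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))


def compare_independent_correlations(r1: float, n1: int,
                                     r2: float, n2: int) -> Tuple[float, float]:
    """Fisher z test for the difference between two independent correlations.

    Returns (z, one-tailed p). Requires |r| < 1 and n > 3 in both groups.
    """
    if not (-1 < r1 < 1 and -1 < r2 < 1):
        raise ValueError("correlations must lie strictly between -1 and 1")
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each group needs more than three observations")
    z1 = math.atanh(r1)                                # Fisher transformation
    z2 = math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    return z, one_tailed_p(z)


if __name__ == "__main__":
    # Hypothetical correlations and group sizes, for illustration only.
    z, p = compare_independent_correlations(-0.5, 33, -0.1, 35)
    print(f"z = {z:.3f}, one-tailed p = {p:.3f}")
```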

5.3 Gaze tracking behaviour

The AT group indicated a faster initial response (tavg = 8.79, s.d. = 4.953) (Fig. 4) than the NAT group (tavg = 10.408, s.d. = 5.258), except for Saturn, Uranus and Neptune. However, the frequency of views for each planet was higher in the NAT group (favg = 4.273, s.d. = .647) than in the AT group (favg = 3.00, s.d. = .447) (Fig. 5), except for Venus and Jupiter, where both groups had the same number of views. In terms of dwell time (Fig. 6), while AT initially had a higher dwell time (mean t = 1.19 s) than NAT (t = 0.47 s), its dwell time was lower than that of NAT for every other planet until Pluto. Interestingly, throughout the middle of the video (Venus, Earth, Mars, Jupiter, Saturn, and Uranus), the dwell time for AT was substantially lower than for the NAT group. Among all the planets, the highest dwell time was for Uranus (t = 5.49 s), which was observed in the NAT group. The average dwell time was tavg = 1.041 (s.d. = .923) for the AT group and tavg = 2.468 (s.d. = 1.487) for the NAT group.

Fig. 4 Comparison between groups of initial view of objects

Fig. 5 Comparison between groups for frequency of view of objects

Fig. 6 Comparison between groups for dwell time

6 Discussion

The findings indicated no significant difference between AT and NAT on learning outcomes, namely learning achievement, satisfaction, and effort, thus supporting the findings of Endres et al. (2020) and Stárková et al. (2019). Nevertheless, while achievement was higher for NAT, AT indicated higher satisfaction and perceived effort. Similar findings were reported by Um et al. (2012) for satisfaction and by Mayer and Estrella (2014) and Shangguan et al. (2020a) for effort. According to Shangguan et al. (2020b), emotionally designed elements may stimulate learning effort as additional cognitive resources are required to process these elements. Additionally, attentional focus, which we theorized was affected by anthropomorphised elements, could be associated with effort, as described by Kanfer and Ackerman (1989), and with satisfaction (Needham, 1978). Likewise, while NC motivates additional attention and evidence indicates a low preference for anthropomorphism among individuals with high NC (Epley et al., 2007), NC did not moderate any learning outcomes. Nevertheless, high NC respondents in the AT group indicated higher achievement, higher perceived satisfaction, and less effort when exposed to the intervention. We postulate that this group may view anthropomorphised elements solely as decoration and ignore their attributes, as also explained by Slabbert et al. (2022).

In view of epistemic emotions, we observed a tendency towards reducing negative emotions. The findings revealed no significant difference between the groups for any epistemic emotion except Anxious and Bored, which translated into an effect on overall negative emotions, and the effect sizes were deemed large (d = .896 for Anxious and d = 1.028 for Bored). According to Wong and Adesope (2021), positive and negative emotions may lead to different learning behaviours, attitudes, and outcomes, and Shangguan et al. (2020b) claimed that positive emotions relate to students' learning control, which was not assessed in this study. While empirical findings have indicated that the aesthetic appeal of anthropomorphism may foster positive emotions and learning outcomes (Plass & Kaplan, 2015), Loderer et al. (2020) explained that both positive and negative epistemic emotions have a similar impact. Hence, as anxiety and boredom were reduced, the experience can be regarded as positive, as anthropomorphised elements successfully improved intent and interest (the opposite of boredom). Next, we observed favourable outcomes for the AT group, as a higher mean value was observed for the overall gain in positive emotion, as negative emotions were also significantly reduced. Additionally, by observing how each emotion was associated with learning outcomes, the AT group only reflected significant negative correlations between learning achievement and Confused and between learning satisfaction and Frustrated. However, negative associations were observed in the NAT group between learning effort and both Confused and Frustrated and, interestingly, between learning satisfaction and effort. The relationship between epistemic emotions and learning outcomes further differed between the groups for learning achievement with Confused and Frustrated and for perceived effort with Frustrated. Nevertheless, Bored and Anxious were not associated with any learning outcomes; we theorized that their influence might be directed towards other outcomes not tested in this study. These epistemic emotions could also be associated with emotions experienced when exposed to new learning material, namely Confusion, Frustration, and Boredom (D’Mello, 2013), while anthropomorphism could contribute to reducing anxiousness and supporting adaptability.

Next, as emotional design is a tool used to improve attention (Chung & Cheon, 2020; Endres et al., 2020), anthropomorphism was measured based on gaze behaviour to determine attention through the initial view, the number of views, and dwell time. According to Lodge and Harrison (2019), the meaning of the term attention in digital learning is still widely debated and differs from the cognitive neuroscience perspective. Endres et al. (2020), for example, explained that attention to new learning materials usually occurs in the first minute, but manipulation is still needed to maintain that attention, and the instruction should create situational interest. In this study, we observed that the AT group viewed the intended objects in the AOI earlier than the NAT group. Nevertheless, the average number of views for each AOI was higher for the NAT group. Hence, while this study's findings supported Stárková et al. (2019), indicating that anthropomorphism affects initial fixation, we are unable to speculate on the reason behind these findings, as anthropomorphism should sensibly create such situational attention.

However, Lodge and Harrison (2019) explained that attention to learning could also be considered in terms of voluntary and involuntary attention. In this study, anthropomorphism was regarded as a stimulus for saliency, and the design was intended to create involuntary attention. For that reason, to provide a more in-depth understanding, the dwell time, which is the duration viewers spend focusing on a section (Bruno et al., 2021), was imperative. It was observed that the AT group showed a higher dwell time for anthropomorphised elements at the beginning of the video, but the dwell time for the same AOI for the NAT group was surprisingly higher for other elements. Additionally, we observed a U-shaped trend reflecting a prolonged stagnation in dwell time before it increased again when there was situational interest, which we hypothesized may be due to the anthropomorphised characters portraying negative emotions (Neptune and Pluto) or the anticipated end of the video. We also speculated that the addition of anthropomorphism might at times mask the original attributes of the essential multimedia element, which implies a need to consider design implications such as proportional sizes and colour variations.

7 Conclusion

The findings indicate that anthropomorphised elements did not influence achievement, perceived satisfaction, or effort, even when considering NC. Nevertheless, the addition of anthropomorphised elements reduced negative epistemic emotions, specifically Boredom and Anxiety. Furthermore, the difference in the relationship between epistemic emotions and learning outcomes indicated that adding anthropomorphised elements influenced learning achievement by minimizing Confused and Frustrated emotions and influenced the perception of mental effort through Frustrated. Gaze behaviour indicated that while anthropomorphism influenced the initial view, it did not improve the number of views or dwell time in the AOI. Interestingly, we observed that anthropomorphised elements depicting negative emotions may improve dwell time.

7.1 Limitations and recommendations for future studies

This study is limited to undergraduate students in Malaysia, and further exploration is required to investigate whether the findings can be applied to other age groups, populations, larger sample sizes, subject matter areas, and types of learning material. Next, additional multimodal considerations such as sound effects and animation should be considered, as using idle anthropomorphised elements may not be sufficient to understand how anthropomorphism influences learning. Furthermore, Wong and Adesope (2021) claim that the nature of presentation (animated or static) in emotional design is still understudied, with empirical findings indicating an influence on mental effort and learning outcomes. Similarly, as Krämer et al. (2013) described, emotions are influenced by the unconscious mimicry of characters' expressive cues; we postulate that animating these cues may have a different impact on emotions and learning. Another consideration is exploring anthropomorphised elements in educational chatbots, as mobile users today widely use 'stickers' and 'emoticons.' This would provide insights into the complex dynamics of mobile interaction in educational environments (Kumar, 2021), as chatbots can create social cues replicating humanlike interaction (Kumar & Silva, 2020). An interesting perspective is also to consider special education and senior citizens and how anthropomorphised elements could aid learning by reducing anxiousness. Moreover, we hypothesise three design considerations for anthropomorphism manipulation that should be considered in future studies: aesthetics, emotional expression, and size proportion. Future studies could also consider new content, context, culture, and testing methods beyond the typical self-evaluation method. Hence, we encourage using emotion recognition systems (Triberti et al., 2017), as such measurement could provide an in-depth understanding of emotions, such as the effect of boredom, engagement (Barron-Estrada et al., 2018), and emotional valence in real time.