1 Introduction

In this digital age, new technologies have transformed how teaching and learning are delivered and are poised to reshape the future of education. Language education, in particular, has entered an era focused on experience and characterized by third-generation digital literacies, where multisensory, communicative, collaborative, and active learning is stressed (Pegrum et al., 2022). Nevertheless, as Lee (2022) suggests, the transmissionist methods of instruction that teachers are used to may not engage learners. One main obstacle in the conventional classroom is the decontextualization of language learning (Chun, 2019). Despite teachers’ serious attempts to provide learners with authentic learning environments, English as a foreign language (EFL) learners tend to find it hard to relate textbook knowledge to their personal lives, especially when they have had no lived experience in English-speaking environments, which often results in disengagement from language learning (Tseng & Yeh, 2019).

Against this backdrop, immersive virtual reality (iVR) has emerged as a powerful tool to support the development of learner engagement in the field of technology-enhanced language learning (TELL) (Lan, 2020). iVR builds on the premise of creating a 3D, simulated digital environment that provides an immersive experience that feels authentic (Parmaxi, 2023). As discussed above, language learning without context is abstract, and it is an even more daunting task for young EFL learners because their behavioral, affective, and cognitive skills are still immature (Hu-Au & Lee, 2017). For EFL learners, iVR has the potential to create an infinite set of possibilities for them to experience. For example, it can place language learners in a setting that immerses them in situations similar to real life and allows them to have conversations with virtual characters or other users (Taguchi, 2021). According to Lan (2021) and Wu et al. (2021), learning with iVR can increase learners’ motivation and self-efficacy and improve their learning behavior, further highlighting the unique affordances of increased levels of immersion and engagement in learning.

Some recent studies, although not on language learning, have provided preliminary evidence on the use of iVR to improve learner engagement. For example, in a pottery-making lesson, Guan et al. (2023) adopted a quasi-experimental design to contrast three learning groups with different learning conditions. The findings suggested that the iVR group outperformed the paper-and-pencil and the clay-based groups in terms of cognitive aspects. Moreover, iVR learning also showed improvements in affective and behavioral engagement. Similarly, Bodzin et al. (2021) emphasized the role of iVR in fostering learner engagement through the implementation of Unity. Their research involving high school students in the United States revealed that the sense of presence, facilitated by context-aware technology, significantly contributed to learner engagement, allowing students to concentrate and pay attention during the learning process.

While iVR technology is said to be a promising addition to traditional education, little research has explored elementary students’ iVR experiences in language teaching and learning. To contribute to a deeper understanding of language education with iVR, this paper explores how young language learners perceive iVR and how iVR helps to engage them behaviorally (live), affectively (play), and cognitively (learn) in learning.

2 Literature review

2.1 Three dimensions in learner engagement

No matter how technology transforms learning, learner engagement is a key construct in understanding and augmenting students’ learning performance and achievement. In this paper, learner engagement broadly refers to the effort made by a learner in iVR learning. More specifically, based on Fredricks et al. (2004), learner engagement can be understood from at least three dimensions: behavioral, affective, and cognitive engagement.

Behavioral engagement is closely associated with the actions, which are usually observable, that learners undertake in their learning. This dimension concerns students’ physical participation in learning, such as the amount of time spent on assignments, interaction with others, and learning behavior in class (Salas-Pilco et al., 2022). According to Li and Lerner (2013), behavioral engagement ranges from shallow engagement (e.g., attendance) to deeper engagement (e.g., effort). Unsurprisingly, educators such as Philp and Duchesne (2016) have found that the better on-task behavior learners demonstrate, the better they are engaged in learning. In contrast, learners tend to be disengaged when negative, off-task learning behavior is present. However, the differentiation between positive and negative behavior is by no means clear-cut or binary (Henrie et al., 2015). Therefore, to fully understand students’ behavioral engagement, it is also vital to examine their affective and cognitive engagement.

Affective engagement, marked by positive emotions (e.g., excitement, enjoyment, motivation), is believed to enhance a learner’s cognitive and behavioral engagement, while affective disengagement (e.g., boredom) can be detrimental to the amount of effort that a learner invests (Balwant, 2018). Affective engagement strengthens learner agency, which regulates cognitive and metacognitive processes vital to successful learning, and thus increases learnability (Taub et al., 2020; Wang et al., 2019). Because emotions play an integral part in learning processes and drive attention and motivation, affective engagement impacts long-term memory and deep learning and, as a result, helps achieve intended learning outcomes (Dickey, 2020).

Cognitive engagement refers to the mental investment made to understand and internalize knowledge and skills. This psychological involvement is manifested in a variety of aspects such as learners’ self-efficacy, self-regulation, and meta-cognitive strategies (Nguyen et al., 2018). Compared to behavioral engagement, which centers on tangible effort, cognitive engagement focuses more on intangible effort in learning. Because it cannot be observed directly, cognitive engagement is inferred from students’ overt behavioral or verbal manifestations while they learn, and students’ action is a strong indicator of cognitive engagement (Chi & Wylie, 2014). In a similar vein, retrospective data (e.g., interviews, reflective journals) can provide useful information regarding students’ inner cognitive engagement (Philp & Duchesne, 2016).

Because engagement is dynamic, the three dimensions of engagement are intertwined and jointly indicate a student’s learning performance. For instance, Luan et al. (2023) argued that cognitive and affective engagement are strongly associated with behavioral engagement, and Li and Lerner (2013) found that behavioral engagement (e.g., active, interactive behaviors) could predict cognitive and affective engagement in learning. Furthermore, the three dimensions mutually reinforce each other. For example, Dubovi and Tabak (2021) affirmed that positive emotions, namely, affective engagement, lead to cognitive engagement, which in turn results in more on-task behavior. In another study that involved 114 EFL learners, researchers found positive and significant correlations between the three dimensions; when learners were behaviorally engaged, their emotional and cognitive engagement was enhanced (Al-Obaydi et al., 2023).

2.2 Language learning and iVR

The literature has documented various beneficial properties of iVR in promoting L2 learning. Most importantly, iVR can realize contextualized learning and facilitate active learning. First, it provides a fully immersed, context-rich environment where learners can be engaged in a realistic situation (Taguchi, 2021). In this visually simulated environment, rather than pretending to be in an imaginary situation, learners have immersive, multisensory learning experiences with whole-body involvement in which they can interact with virtual objects, other learners and the environment (Mystakidis, 2022). As experience-oriented interactions among learners are indispensable to language learning, embodied learning through iVR can play a critical role in L2 learning (Lan, 2021); it increases learners’ behavioral and cognitive engagement and ignites their imagination, especially due to spatial knowledge representation (Dalgarno & Lee, 2010).

Second, iVR enables experiential and active learning by engaging more senses during learning activities. iVR provides learners with fully interactive, immersive, and engaging experiences that would otherwise be less accessible. In this environment, learners are no longer passive recipients of knowledge or information, as in a conventional classroom, but active learners who can learn by doing, living, and playing (Hu-Au & Lee, 2017). Embodiment via avatars in iVR allows learners to actively navigate and manipulate objects from the first-person view, creating a strong sense of immersion (Sadler & Thrasher, 2023). iVR also effectively boosts students’ motivation in language learning and lowers their language anxiety (Chien et al., 2020; Wu et al., 2021). In particular, iVR creates a conducive learning environment and meaningful social interactions, which play a major role in augmenting learners’ learning motivation and constructing their understanding of knowledge (Dalgarno, 2001). Overall, iVR exerts positive psychological impacts on learners, as experiences in iVR deepen learners’ emotional engagement and elicit a high level of motivation and enjoyment in learning (Di Natale et al., 2020; Lan, 2021; Taguchi, 2021). Consequently, prior studies have reported pedagogical benefits of using iVR in diverse areas of L2 learning, including vocabulary, cultural learning, and pragmatic competence (Chun et al., 2022; Lan, 2020; Shen & Xu, 2015; Taguchi, 2021).

Moreover, iVR is a particular domain in which learner engagement plays a significant role. Liu et al. (2020) showed that the experimental group with iVR demonstrated higher levels of behavioral, affective, and cognitive engagement than the control group without iVR in the science class. Studies also found that iVR increased learners’ affective engagement, particularly enjoyment, and cognitive engagement in language learning (Lan, 2021; Wu et al., 2021). Yet, to the best of our knowledge, learner engagement remains under-researched in iVR language learning.

Despite the potential benefits of iVR and the increasing affordability of the devices, the body of research on iVR remains limited (Taguchi, 2021); moreover, most studies have focused on college students (Di Natale et al., 2020). In contrast to adults, elementary school learners tend to be less mature physically, affectively, and cognitively. Moreover, decontextualized language learning, such as imagination-based role play, can be very difficult for children with low English proficiency and limited cognitive development (Blyth, 2018; Frank et al., 2021). Because the use of iVR in the classroom is still at a rudimentary stage, it is imperative to examine its pedagogical implications. Thus, this study examines the effectiveness of iVR for low-proficiency English learners in Korea. Considering the distinctive immersive nature of iVR, the present paper explores its educational benefits and drawbacks, particularly from the engagement perspective, and addresses the following questions:

  1. How did the elementary school students perceive learning English in the iVR environment?

  2. How did the iVR environment help to engage elementary school students in language learning behaviorally, affectively, and cognitively?

3 Method

3.1 The context and participants

Twenty-five 4th graders attending a private elementary school in Korea participated in the research. Their English proficiency was at a beginner level (around A1 in CEFR), and they had been learning English for one to five years. None of the students had used a head-mounted device (HMD) or iVR technology for educational purposes. Ethical clearance for data collection was obtained before the study. Two Korean English teachers planned the sessions with the researchers and took turns teaching them.

Immerse, an HMD-based commercial iVR platform developed particularly for learning English, was used for the current research. It offers various real-life places, such as an airport, zoo, and shopping center, wherein students can explore, act, interact, and communicate to learn the language (Fig. 1). The iVR environment is rendered from the user’s own perspective; thus, to view the other side, the user must turn his or her head as in real life. The user’s avatar moves in iVR as the user moves (e.g., walks, climbs, or sits). As Immerse is a high-immersion VR platform, it allows users to interact with other users through multimodal communication channels and to manipulate objects (Sadler & Thrasher, 2023). The platform also provides the instructor with diverse instructional functions, such as teaming, rally, prompting, and focus. The instructor can add objects (e.g., timer, scoreboard, and TV monitor) and embed teaching materials (e.g., slides, PDFs, images, and videos), as shown in Fig. 2. However, only a limited number of users, eight students with a teacher, can access each class on Immerse simultaneously.

Fig. 1 Place selections

Fig. 2 Instructional functions of Immerse

3.2 Procedures

The students participated in the project for four weeks, which included six iVR sessions (45 min each, twice a week), in-class worksheet activities, pre- and post-tests, and the survey. Because of the limited class size on the platform, the students were divided into three groups for the sessions. The students used the HMD, with hand controllers, to teleport to another location or to move and interact with objects (Fig. 3). The teachers selected three places for the sessions that they expected elementary school students to like: the gym, the zoo, and the fast food restaurant. Considering that the students enjoyed physical activities, the gym was selected as the first place to familiarize the students with this novel way of learning and to motivate them. After completing each scene, the students completed a paper worksheet.

Fig. 3 Student movement during iVR learning

Below, we present brief descriptions of the six sessions:

  • Scene 1: Gym

    • Session 1: (Objectives: Introducing the device and the platform)

      The teachers provided an orientation session about using the HMD and the controllers, moving, acting, and interacting with others and interactive objects in Immerse. The students frequently asked the teacher for help and spoke mostly Korean (Fig. 4).

      Fig. 4 Gym

    • Session 2: (Objectives: Getting familiar with the environment and learning the vocabulary of objects in the gym and expressions)

      The students continued to explore the gym and interact with objects. The teachers taught some English words and expressions in relation to the objects and activities in the gym. The students mixed Korean and English.

    • Session 3: (Objectives: Learning vocabulary and expressions and responding to the teacher’s questions)

      The Speaking English Only rule was applied. The teachers explained the vocabulary and gave instructions in English. The students listened to and responded to the instructions, both physically and verbally.

    • Session 4: (Objectives: Describing their activities in the conversations)

      The teachers and the students engaged in various conversations using target expressions. The students explained what they were doing in the gym in English.

  • Scene 2: Zoo

    • Session 5: (Objectives: Learning vocabulary about animals and expressions to describe and compare the animals’ weight, size, and speed)

      The students learned the target expressions for five minutes before the VR session. While exploring the zoo, they learned vocabulary, sentences, and animal content information (Fig. 5). The teacher provided a vocabulary list on the board, hid prompt cards and posted information about the animals (e.g., The penguin is 10 kg. The polar bear is 100 kg.) in front of each animal to teach the target expressions and comparative sentences (e.g., The polar bear is bigger than the penguin). After gathering the information, the students made comparative sentences about the animals using the flashcards on the board.

      Fig. 5 Zoo

  • Scene 3: Fast food restaurant

    • Session 6: (Objectives: Performing different roles in the role play)

      The students were assigned roles before the VR session: cook, customer, server, and cashier. Because the students had learned basic expressions to order food (e.g., “what do you want to have?” and “I’d like to order a hamburger.”) previously, teachers intervened only when the role-playing did not go well. To build a conducive learning environment, teachers also posted the expressions used in the fast food restaurant on the board that the students could refer to during the role play (Fig. 6). The teacher set the timer, and after 5 min, the students switched roles.

      Fig. 6 Fast-food restaurant

3.3 Data collection and analysis

The present study employed a mixed-methods approach, with qualitative data as the primary source and quantitative data as the secondary source. The qualitative data included video recordings of the students’ verbal behaviors and their avatars’ movements and interactions in the three VR scenes (screen recordings), recordings of the students’ physical movements in class (in-class recordings), and post-project interviews. Interviews were conducted with two teachers (70 min) and five randomly selected students (30 min) to develop deeper insight into students’ behaviors and attitudes. The interview questions for the teachers covered their general perceptions of the iVR sessions, comparisons of the iVR sessions with traditional classrooms in terms of the students’ behaviors, interaction, participation in activities, and motivation, and the strengths and drawbacks of using iVR for language learning. The interview questions for the students covered what they did and learned during the sessions, what they liked and did not like about the iVR sessions, their preferences between the iVR sessions and traditional English classrooms, and their reasons for that preference.

The videos were analyzed by three experienced coders to identify emerging themes based on a qualitative analysis coding protocol (Corbin & Strauss, 2008). The coders utilized both inductive and deductive coding schemes. For inductive coding, the coders repeatedly viewed the videos until they identified emerging themes and then conducted axial coding to relate these themes to core themes. Then, deductive coding was performed to organize the themes according to three categories of engagement based on prior studies (Dubovi, 2022; Salas-Pilco et al., 2022), as shown in Table 1. In addition, because cognitive engagement could not be overtly observed, the videos were further analyzed in terms of the students’ passive, active, and interactive modes, which were indicative of the students’ cognitive engagement, based on Chi and Wylie’s (2014) framework. The interview results were triangulated with the results of the recording analysis. Discrepancies between the coders found during qualitative data coding were discussed until consensus was achieved. Last, the researchers discussed and selected the representative language learning episodes (samples) that best described each theme of the research. The episodes were then transcribed and compared for closer analysis of each theme based on the learning affordance framework of the virtual learning environment by Dalgarno and Lee (2010). For anonymity, each student was identified by a code, such as S1 or S2.

Table 1 Coding samples
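
The mode-based video analysis described above amounts to tallying how much of the observed activity time was coded in each mode. The following is a minimal sketch of that arithmetic, not the coders' actual procedure: the segment format, the `mode_shares` helper, and the sample durations are all hypothetical. It also assumes that interactive behavior counts as a subset of active behavior, which the text does not state explicitly.

```python
from collections import defaultdict


def mode_shares(segments):
    """Return the share of total coded time spent in each mode.

    segments: iterable of (mode, seconds) pairs, where mode is
    "passive", "active", or "interactive" (hypothetical coding labels).
    """
    totals = defaultdict(float)
    for mode, seconds in segments:
        totals[mode] += seconds
    total_time = sum(totals.values())
    if total_time == 0:
        raise ValueError("no coded time")
    shares = {mode: secs / total_time for mode, secs in totals.items()}
    # Assumption: interactive time is also active time, so an overall
    # "active" share would include the interactive segments.
    shares["active_incl_interactive"] = (
        shares.get("active", 0.0) + shares.get("interactive", 0.0)
    )
    return shares


# Hypothetical coded segments for one 45-minute session, in seconds
# (illustrative values only, not the study's data).
sample = [("interactive", 1800), ("active", 600), ("passive", 300)]
shares = mode_shares(sample)
```

With the hypothetical segments above, two thirds of the session is interactive and one ninth is passive; real input would come from the coders' time-stamped annotations.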

The quantitative data included the pre- and post-tests and the post-survey. Because the participants were young learners, the tests were kept short and included images to reduce test anxiety and cognitive load (Frank et al., 2021). The tests consisted of 10 multiple-choice questions covering meaning recognition (finding the word or the expression to describe the image), sentence completion (fill-in-the-blank), and conversation completion (finding a correct response to a question). The post-test consisted of the same questions as the pre-test. The post-test was conducted two weeks after the students completed the iVR sessions to measure the long-term effect of learning. A post-survey was administered to better understand the students’ perceptions of and experience with iVR. It included 17 questions on a 5-point Likert scale about the experience of learning in Immerse in terms of behavioral, affective, and cognitive engagement and the usability of the HMD device. The survey questions were developed based on prior studies (Dubovi, 2022; Salas-Pilco et al., 2022) and modified for elementary school students according to their cognitive level. The quantitative data were analyzed for descriptive and inferential statistics (t-test) using SPSS 26.
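
For readers unfamiliar with the paired-samples t-test mentioned above, the sketch below shows how the statistic is formed from each learner's pre- and post-test scores. It is an illustration only: the study used SPSS 26, the `paired_t` helper is a hypothetical name, and the scores are invented. Obtaining a p-value additionally requires the t distribution, which a statistics package (for example, `scipy.stats.ttest_rel`) would supply.

```python
import math
import statistics


def paired_t(pre, post):
    """Return the paired-samples t statistic and its degrees of freedom."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    # The test works on each learner's gain (post minus pre).
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical scores out of 10 for six learners (not the study's data).
pre_scores = [2, 3, 1, 4, 3, 2]
post_scores = [6, 7, 5, 9, 6, 8]
t_stat, df = paired_t(pre_scores, post_scores)
```

Because each learner serves as his or her own control, the test depends only on the distribution of the gains, which is why a small sample can still yield a large t value when nearly every learner improves.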

4 Findings

4.1 Students’ perception of learning English in iVR and learning outcomes

The post-survey results indicated high levels of student engagement in all three dimensions. The means for behavioral, affective, and cognitive engagement were 4.38, 4.36, and 3.97, respectively. Overall, the results showed that the students enjoyed the activity and perceived learning English in the iVR environment as effective and interesting. From the behavioral perspective, the students actively moved, interacted, and participated in the activities in iVR. From the cognitive perspective, they preferred the iVR environment for learning English and perceived role-playing in the iVR environment as easier than in the traditional classroom. The post-survey results are summarized in Table 2.

Table 2 Descriptive statistics of the survey
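
The dimension means reported above are averages of Likert ratings grouped by subscale. A minimal sketch of that computation follows; the item-to-subscale mapping, the `subscale_means` helper, and the two respondents are hypothetical, since the text does not say how the 17 items were assigned.

```python
from statistics import mean

# Hypothetical mapping of survey item numbers to subscales; the actual
# assignment of the 17 items is not given in the text.
SUBSCALES = {
    "behavioral": [1, 2, 3, 4],
    "affective": [5, 6, 7, 8],
    "cognitive": [9, 10, 11, 12, 13],
    "usability": [14, 15, 16, 17],
}


def subscale_means(responses):
    """Average each subscale over all respondents.

    responses: list of dicts mapping item number -> rating (1 to 5).
    """
    result = {}
    for name, items in SUBSCALES.items():
        # First average within each respondent, then across respondents.
        per_student = [mean(r[i] for i in items) for r in responses]
        result[name] = round(mean(per_student), 2)
    return result


# Two hypothetical respondents (not the study's data).
responses = [
    {i: 5 for i in range(1, 18)},
    {i: 4 for i in range(1, 18)},
]
means = subscale_means(responses)
```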

The post-test results showed that the students’ learning outcomes after the activities in iVR increased significantly across all question types, i.e., recall, sentence completion, and conversation completion (M = 6.70, SD = 1.550), compared to the pre-test (M = 2.87, SD = 1.424). The t-test confirmed that the difference between the pre-test and post-test results was statistically significant, and the effect size was very large (Cohen’s d = 2.050) (Table 3).

Table 3 Paired t-test results
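
As a rough check on the magnitude of the reported effect, a standardized mean difference can be recomputed from the summary statistics above. The sketch below rests on assumptions: it uses the average-variance standardizer, which is one common convention, whereas the text does not say which formula produced d = 2.050. For paired designs, the SD of the difference scores is another frequent choice, and that SD is not reported here. Both function names are hypothetical.

```python
import math


def cohens_d_av(mean_pre, sd_pre, mean_post, sd_post):
    """Cohen's d standardized by the root mean square of the two SDs."""
    pooled_sd = math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2)
    return (mean_post - mean_pre) / pooled_sd


def implied_sd_of_differences(mean_pre, mean_post, d_z):
    """SD of difference scores implied if d was mean_diff / sd_diff."""
    return (mean_post - mean_pre) / d_z


# Summary statistics as reported in the text.
d_av = cohens_d_av(2.87, 1.424, 6.70, 1.550)
# Assumption: the reported 2.050 was standardized by the SD of differences.
sd_diff = implied_sd_of_differences(2.87, 6.70, 2.050)
```

Under the average-variance convention the value comes out at about 2.57, and under the difference-score assumption the implied SD of the gains is about 1.87. Either way the effect lies well above the conventional 0.8 benchmark for a large effect.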

4.2 Students’ engagement in the iVR English learning environment

The current study analyzed the qualitative data (screen and in-class recordings, interviews with the teachers and the students) and organized the findings according to the engagement framework (Dubovi, 2022; Salas-Pilco et al., 2022).

4.2.1 Behavioral engagement

The screen recording analyses showed that the students were behaviorally engaged; they actively moved, explored, and participated in the activities in the iVR learning environment. Particularly, in the gym, due to the nature of the situation, they were physically more active, and language acquisition and practice occurred in accordance with their physical involvement, as shown in the following excerpts:

  • Excerpt 1 (Session 3)

    1. T: Okay, where is the treadmill? Let’s go to the treadmill. Come on, everyone, to the treadmill.

    2. S1: Treadmill is here. (Ss are moving to the treadmills.)

    3. T: Yeah, that’s a treadmill. Over there. Let’s go. (Others are following)

    4. T: Let’s run on the treadmill. (Ss are running on the treadmills.)

    5. T: Now, let’s throw the ball into the hole. (They are moving to the ball area) Can you throw the ball? (Ss are throwing the ball into the hole. The balls are going into the hole; Ss are cheering.)

In Excerpt 1, the students moved as requested by the teacher, which would have been inconceivable in the traditional classroom. The students not only physically moved but also verbally responded to the teacher (Line 2), showing their understanding of the instruction in English and their active participation in learning. By seeing, touching, and running in the gym, the students naturally and easily understood and learned new vocabulary. While the students physically responded to the teacher more often in Sessions 2 and 3 (Excerpt 1), they were more verbally engaged in Session 4, as shown in Excerpt 2.

  • Excerpt 2 (Session 4)

    1. T: What are you doing?

    2. S1: I’m climbing.

    3. S2: I’m running.

    4. T: Be careful!

    5. T: What do you want to do?

    6. S1: Boxing.

    7. S5: I want to do boxing.

    8. S6: I want climbing.

    9. T: Is there anybody who wants to do sit ups?

    10. Ss: (Raising their hands, speaking loudly) Me, me. (Ss are moving to the yoga mat area)

    11. T: Everyone, sit up. Who can do sit ups?

    12. Ss: (Ss are doing sit ups) Up, up!! (Cheering for others)

The target expression pairs of Session 4 included (1) “what are you doing?” “I am doing ~,” (2) “what do you want to do?” “I want to do,” and (3) “what’s your favorite activity?” In this session, the students said what they were doing (Lines 2 & 3) and what they wanted to do (Lines 5–8) using the target expressions while they were doing those activities. Unlike in the traditional classroom, where students sat and practiced drills for speaking, in this environment, the students were engaged in the activities verbally and kinetically at the same time; thus, they learned the language through lived experience. The in-class recordings captured the students’ movements during the sessions and showed their active participation and excitement about the new learning environment. In the videos, the students, not only their avatars, were constantly moving (e.g., running on the treadmill, climbing the wall, exploring the zoo) and remained physically active throughout the sessions (e.g., Fig. 7).

Fig. 7 In-class recordings of the students at the gym

The students mentioned during the interview that they enjoyed the iVR sessions because they could physically move in the environment and interact with the objects. S10 commented that touching and moving the objects in iVR was a new and interesting experience. He and other interviewees responded that they liked the gym best because they loved jumping, running, and climbing with friends, and they could participate in a wide variety of activities in the gym. S13 said that she especially liked the zoo because she enjoyed moving around with friends and finding information cards about the animals. In addition, the teachers said that by being physically involved in the activity, the students could learn the language in a more interesting and easier way and remember what they learned longer. Although moving around made the classroom quite messy and noisy, the teachers believed that students could learn best by actually being engaged and interacting in the iVR environment.

4.2.2 Affective engagement

Overall, the video recordings indicated that the students enjoyed learning English in iVR. In particular, the in-class recordings vividly showed how motivated and affectively engaged the students were during the activities. Wearing the device, they were totally immersed in the environment and actively moved. They were very excited and frequently smiled, laughed, and, as shown in Excerpts 1 and 2, cheered. In fact, in the first session, the students were too excited to study and entirely distracted by the interactive objects in the gym. They were constantly playing with the objects and exploring the environment and did not pay attention to the teacher, despite the teacher’s efforts to teach.

In the second session, the students calmed down and began to listen to the teacher. The students became better focused on the teacher and activities in the following sessions in the gym. In other words, they began to study as their excessive excitement subsided. However, the students became excited again when visiting a new place, the fast food restaurant (Session 6). Before logging on, each student was assigned a role in the restaurant; however, as soon as they logged on to the site, everyone began to rush to the kitchen. The interactive objects in the kitchen were so tempting that everyone became a cook regardless of their assigned roles. After 10 min, they began to listen to the teacher again, performed their assigned roles, and completed the role-play mission (Fig. 8).

Fig. 8 Role-play at the fast-food restaurant

Both the students and the teachers said in the interview that the students truly enjoyed learning English in the iVR environment. The students said that they liked being and studying in iVR, that “the VR sessions were too short,” and that they “wanted to study in VR five times a week.” The teachers explained that the students at first perceived the activities as fun games rather than studying. Originally, the classes were designed to have both a control group and an experimental group for the current research; however, the experimental group bragged about their “fun VR games” to others, and the control group wanted fun VR games as well and joined later (thus, no control group was included in the current study). The teachers also said that they regarded the students’ laughing during class as an important indicator of their motivation and enjoyment of learning. They observed that the students laughed constantly in the iVR sessions; therefore, they believed that learning in iVR was successful and satisfactory.

On the other hand, the teachers mentioned difficulties due to the students’ excessive excitement. They described studying in iVR at first as “disciplining the students and telling them to study in the playground where there were so many fun things to do and toys to play with.” However, because the students were highly interested and motivated during the iVR sessions and behaved better in the later sessions, despite the initial excessive excitement, the teachers believed that iVR would be an effective language learning environment in the long term.

4.2.3 Cognitive engagement

As mentioned earlier, cognitive engagement could not be directly observed, but it was observed indirectly through students’ overt behaviors while learning. Hence, the videos were analyzed in terms of the students’ passive, active, and interactive modes (indicative of the students’ cognitive engagement) based on Chi and Wylie’s (2014) framework. The results revealed that the students were constantly active (97.6%) and interactive with objects or others (i.e., teachers, peers) during most of the activity time (78.9%) and rarely remained passive. This result suggested that the students were cognitively engaged while learning in iVR. The following excerpt exemplifies how a student learned vocabulary (dumbbell) and understood the meaning (lift a dumbbell) through interaction with others in iVR.

  • Excerpt 3 (Session 3)

    • T: It’s weight training time. Let’s move. (Ss are moving). Lift dumbbells.

    • S1: What is dumbbell?

    • S2: (Showing the dumbbell) This is a dumbbell.

    • T: Lift it.

    • S1: (Picking up a dumbbell)

    • Other students: (Lifting up dumbbells) Lift up a dumbbell.

    • S1: (Lifting the dumbbell) Lift dumbbell.

The excerpt showed that the students were cognitively engaged in learning and that S1 learned the language easily by watching others and interacting with props in iVR. Importantly, the excerpt showed that the context of the iVR environment played a crucial role in enhancing students’ cognitive engagement in language learning. That is, the simulated environment allowed the students to be cognitively engaged in the activity in a more authentic and natural way. Moreover, it enabled them to be engaged in the role play in a more realistic and immersive way. For instance, in the fast food restaurant, they performed the missions according to their assigned roles, such as ordering and making hamburgers. During the role play, they could use props such as vegetables, an oven, trays, a credit card, and cash; they put patties on the oven to cook (the patties smoked and sometimes burned), assembled hamburgers, and delivered them to the customers, which mimicked the real-life situation. With regard to cognitive and language learning aspects, this simulated environment enabled the students to acquire language more naturally; at the same time, it helped to reduce their cognitive load for language learning through representations that were similar to real life.

The interviews also indicated that the students were cognitively engaged in the activities and learned the language. In the interviews, all five students reported that they could learn English expressions and vocabulary from the activities. S6 mentioned that she could learn English in iVR and remember the expressions. She said that she would not forget what she learned because she learned it by experiencing and doing. S11 said that she learned many interesting facts about the zoo animals and English expressions to describe and compare the animals. She remarked that she “improved her knowledge about English and animals from the activity (e.g., learning from the information cards) in the zoo.” The teachers agreed on this; while the students had similar information-card activities in the traditional classroom, the teachers found that the students were more engaged in reading the cards in the zoo. The teachers also said that “the most significant advantage of VR technology was to provide a real-life, authentic context for language learning, which was seldom possible in the traditional English classroom in Korea.” The teachers mentioned that in this environment, the students could acquire the language naturally and more effortlessly; hence, language learning became easier, and even low-level students could enhance their cognitive engagement and language learning more easily in this environment. Most students offered remarks such as “the iVR environment definitely kept my motivation of learning and helped me stay on-task (S5)”.

5 Discussion

The current study found that using iVR technology positively impacted students’ engagement in language learning. The survey results suggested that the students maintained high levels of engagement in all three dimensions. The current study also showed that the iVR learning environment supported students’ engagement. From the affective engagement perspective, previous studies have argued that iVR positively influences learners’ emotions, engagement, and motivation to learn (Allcoat & Muhlenen, 2018; Di Natale et al., 2020). Similarly, in the present study, iVR brought a powerful new experience to the students, and they were highly motivated, interested, and immersed in iVR. Owing to the washback effect of an exam-oriented culture, as in many countries, language learning in South Korea has long been plagued by students’ negative emotions and feelings, such as fear of making mistakes (Xu & Carless, 2017). By creating a nonthreatening, fun, and interactive learning environment, iVR encouraged learners to explore the virtual world in a less anxious and more affectively engaging manner compared to the traditional teacher-fronted classroom (Lee, 2022). Accordingly, our findings suggested that laughter was a marked feature in the recorded learning episodes, a strong indicator of students’ reduced levels of anxiety in learning. In addition, Dalgarno and Lee (2010) claimed that the high fidelity and interface of the VR environment increase students’ intrinsic motivation; similarly, in this study, the visual representation of the real world, navigating the space with the avatar, manipulating objects, and interacting with peers were sufficient to motivate the students. Their increased motivation and enjoyment seemed, in turn, to lead to behavioral and cognitive engagement.

Prior studies have noted that students often become passive and demotivated in the textbook-based language learning classroom, which cannot lead to students’ engagement or learning outcomes (Blyth, 2018; Lee, 2023). For instance, in a teacher-centered classroom, learners behaviorally complete a language drilling task assigned by the teacher. However, they affectively dislike this way of learning, so they cognitively fail to internalize the knowledge or transfer the skill to other assignments. In contrast, the present study showed how actively the students participated in the activities in iVR. Because the students were engaged in real-life activities in the simulated context, learning became more meaningful and powerful. With the kinetic ability to manipulate objects in iVR, the students in this study became intrinsically active and actively participated in the activities. In particular, Immerse offered multisensory-motor engagement and ego-centric navigation from the first-person view, and these features helped increase the students’ sense of themselves and their peers and enhanced active learning (Di Natale et al., 2020).

Prior literature has suggested that learning by doing is effective for both knowledge attainment and retention (Allcoat & Muhlenen, 2018; Hodges, 2020). iVR furnishes interactive objects corresponding to a specific context, and the ability to physically gesture and manipulate objects in iVR further facilitates students’ active learning and learning by doing (Legault et al., 2019). In the same vein, the students in the current study constantly moved around and interacted with objects, and their bodily movement promoted embodied experiences, which enhanced behavioral engagement and experiential and active learning. For instance, in terms of vocabulary, the students had better chances of incidental vocabulary acquisition and retention by touching and interacting with objects in iVR, compared to the traditional English classroom, where they usually memorized words without any context. As active learning develops an understanding and deep learning of a subject matter (Annansingh, 2019), the students’ learning outcomes improved significantly in the current study.

Therefore, active learning is not limited to behavioral engagement but is closely related to cognitive engagement (Chi & Wylie, 2014; Lai & Chen, 2023). From a cognitive engagement perspective, the iVR learning environment stimulated the students’ active learning, reduced cognitive load, and helped them learn English more naturally. While students learn the language as abstract knowledge in a decontextualized manner in the traditional EFL classroom, according to the cognitive theory of embodied representation, iVR helps students reify abstract knowledge (language) and create mental models (Hu-Au & Lee, 2017). Simply put, we make mental representations of words, objects, actions, and concepts when learning a language. In the traditional classroom, making these mental representations in the new language is difficult. In contrast, iVR provides visual representations of words, objects, and actions; thus, it helps reduce students’ cognitive load and facilitates language learning. The visual representation of the real world in iVR cognitively stimulated the students, and according to Lan (2020), language learning in this environment changes the learner’s cognitive structure. Moreover, while the traditional classroom lacks perceptual features in language learning, language learning in iVR is more perceptual and thus intuitive and natural (Di Natale et al., 2020). For instance, in the present study, the students utilized props to order food (e.g., money, a credit card, and a menu), take an order (e.g., use a cash register), make food (e.g., use an oven and make bread, drinks, and patties), and serve food (e.g., use a tray) in the restaurant. Such activities in an environment similar to real life can bridge the gap between knowledge acquisition and application, making it more likely that students will be able to recall the acquired knowledge and transfer and apply it in the corresponding real-life situation (Dalgarno & Lee, 2010; Di Natale et al., 2020).

In conclusion, the students in this study were highly motivated and enjoyed the activities in iVR, which led to on-task behaviors and, in turn, to enhanced cognitive engagement. The three dimensions of engagement, intertwining and influencing one another, resulted in improved learning outcomes, as indicated by the t test.

Interestingly, the current study yielded a finding on the novelty effect that is inconsistent with previous studies. Previous studies warned that when the novelty effect wears off, learning effects will wane as well (Zhang & Zou, 2022). However, in the present study, the novelty effect of the new technology was initially so large that it interfered with learning; learning occurred only after the novelty effect had waned and the students’ excitement had subsided. As children’s learning is usually marked by a short attention span and easy distraction (Frank et al., 2021), the students in this study were easily distracted when situated in a new, content-heavy environment. The contrasting examples of the fast food restaurant (which had many interactive objects) and the zoo (which had few interactive objects) showed this: when the students first logged on to the restaurant, they could not focus on the teacher’s instruction or perform their assigned roles due to their excessive excitement. In contrast, the students focused on the mission in the zoo from the beginning. Although (or because) they could not touch the animals or make anything, they concentrated on finding the prompt/information cards, obtaining the information from them, and making sentences with flashcards. They successfully completed the missions and enjoyed the activities. As a result, most students performed well on the worksheet after the zoo activity. Accordingly, the teachers regarded the session in the zoo as the most successful.

The results of the current study suggest several pedagogical implications. First, merely using cutting-edge technology does not guarantee successful learning. The present study showed that the students exhibited different behaviors in the zoo than in the restaurant. In the course of learning, even a small detail can influence students and their learning. Therefore, to successfully implement iVR in the language classroom, careful pedagogical decisions are needed, including deciding why and how to use the technology. The teacher must first assess the students’ needs, goals, and levels. In designing tasks, the teacher should consider activities both endogenous and exogenous to iVR and effectively incorporate iVR into traditional classroom activities (Dalgarno & Lee, 2010). While using the technology, the teacher should monitor students’ progress and provide adequate feedback. Importantly, the teacher should monitor for distractions and plan ahead to manage them. As the results of the present study showed, factors in the behavioral, cognitive, and affective dimensions are intertwined with and influenced by one another. Therefore, when designing a lesson with iVR, the teacher must carefully consider all three dimensions and arrange the lesson so that each dimension can facilitate the others.

In addition, although the students in the current study did not report physical discomfort with the technology, other researchers have warned of potential health issues, such as dizziness and fatigue (Han, 2021; Wu et al., 2021). As shown in this study, potentially distracting features of iVR may increase the cognitive load (Chun et al., 2022). Hence, rather than replacing classroom learning entirely, iVR at this stage may best serve as a supplementary learning tool, particularly for young learners (e.g., using iVR for 20 min to enhance student experience after 1 h of classroom teaching). Last, as iVR is still a novel technology for most students, teachers should provide clear instructions on how to use the technology to avoid confusion and frustration. As shown in this study, an orientation session that helps students become familiar with the technology will be helpful.

6 Conclusion

The present study included 25 elementary school learners and contributed to our limited understanding of how low-proficiency learners learn with iVR. The qualitative and quantitative data highlighted and helped explain the positive effects of iVR on language learning. In particular, this study investigated the students’ behavioral, affective, and cognitive engagement during learning in iVR and found a positive impact of using iVR on students’ language learning. Although small in scale, this study is one of the few to explore elementary language learners’ experience and perceptions of iVR. However, due to the small number of participants, it may be difficult to generalize the results of the study. In addition, as the participants were young learners, obtaining their ideas during the interviews was difficult. The draw-a-picture technique may be an effective way to elicit deeper thinking from young students (e.g., see Hwang et al., 2023). Longitudinal studies on the effectiveness of using iVR in L2 learning would also be fruitful. More related studies are needed to deepen our knowledge and guide the development of iVR technology. Future research should cover topics such as age, gender, learning styles, self-regulation strategies, and cultural differences.

Language education has been thought to lag behind the development of technology, and even worse, technology companies often define what language learners and teachers need by offering them limited choices (Mystakidis, 2022). Since iVR is still a novel technology, educators such as Wu et al. (2023) advocate the active involvement of language learners, teachers, and researchers to better mold the technology to serve the needs of language education.