1 Introduction

Artificial intelligence (AI) technology has had a significant impact on educational institutions (Kowitlawakul et al., 2017; Logan et al., 2021). The use of AI in education has opened up new possibilities for enhancing technologically enhanced learning environments (Hwang et al., 2020; Kabudi et al., 2021; Moreno-Guerrero et al., 2020). Systems for learning that are supported by technology have many benefits, such as improved understanding of learning, time flexibility and management of students’ education, as well as faster user development (Chou et al., 2018; Kabudi et al., 2021; Moreno-Guerrero et al., 2020; Pliakos et al., 2019).

Virtual reality (VR), as opposed to conventional multimedia resources, digital learning games, and educational software, was used in new approaches to technology-enhanced learning systems (Reitz et al., 2019). VR, by definition, removes users from their current environment by completely submerging users in three-dimensional (3D) simulated reality via head-mounted displays (HMDs), tracking devices, data gloves, and software systems (Bamodu & Ye, 2013; Lin et al., 2021a). AI and VR are two distinct technologies that can be interconnected in various ways to enhance and optimize user experiences (Bastug et al., 2017). Some key connections between AI and VR could be intelligent avatars, which create intelligent virtual avatars within virtual reality environments, behavior prediction and adaptation which analyze user behavior patterns within virtual reality environments and make predictions about their preferences, reactions, and actions, as well as training and simulation which can be combined to create realistic training and simulation environments (Chaudhary, 2019; Oyelere et al., 2020).

The connection between AI and VR in education can revolutionize the learning process by offering immersive, interactive, and personalized educational experiences. VR enhances students’ comprehension of abstract concepts by presenting them realistically (Oyelere et al., 2020). As a result, it is widely used in engineering and health sciences (Hamilton et al., 2021). The use of VR technology in language learning, however, is novel, and language instructors have embraced VR as a cutting-edge method of learning to enhance users’ language learning experiences (Lin et al., 2021a; Liu, 2008). VR has gained widespread acceptance by combining language acquisition theories like communication, logical input, and output theory (Egbert et al., 2020; Lin et al., 2021a). For instance, Japanese undergraduate English as a foreign language (EFL) students said that VR was more enjoyable and entertaining than voice-video-based oral communication learning (York et al., 2021). In lower elementary grades, the integration of VR technology in English learning can be particularly advantageous. Young learners in these grades can greatly benefit from the immersive and engaging nature of VR, as it offers them a unique and interactive language learning experience (Egbert et al., 2020). VR can effectively advance problem-solving, active learning, creativity, interaction, and collaboration (Kessler, 2018; Lin et al., 2021a). As a result, VR can be used more frequently in language learning situations to take advantage of its educational benefits.

Since English is widely regarded as an international language, it is frequently incorporated into education and other specialized fields, like business and engineering (Logan et al., 2021). One of the main objectives of current educational initiatives is to teach English for specific or professional purposes in order to prepare students to manage persistent globalization for future careers (Latif, 2017; Chen et al., 2021; Fillmore, 2014).

Since the 1970s, emphasis has been placed on authenticity in language learning, especially for those learning EFL because they typically have limited contact with authentic input and opportunities for language use outside of the classroom (Chen et al., 2021; Gardner & Lambert, 1972). The development of technology-enhanced learning has made it possible to contextualize the study of foreign languages in everyday situations. By creating and simulating real-world scenarios using “embodied cognition,” emerging technologies like VR enable students to learn in language-immersive environments using a variety of informational modalities (Chen et al., 2021; Hamilton et al., 2021).

Many studies confirm the value of technology-enhanced learning (Allcoat & von Mühlenen, 2018; Fisher, 2005; Khan et al., 2019; Maheshwari, 2021), but there is little research on how these technologies are used based on technology-related theories (Suh & Prophet, 2018). Additionally, as most education was done from home during the COVID-19 pandemic, which was a significant change in the field of education, online learning tools like Zoom and Google Meet became popular (Valentino et al., 2021). The effectiveness of online versus offline learning has been the subject of numerous studies (Wiyono et al., 2021); however, less research has been done on the effectiveness of VR in language education when compared to conventional methods of instruction (Köse & Güner-Yildiz, 2021). The majority of studies on VR in education are focused on students who have typical development and Science, Technology, Engineering, and Math (STEM) education (Köse & Güner-Yildiz, 2021; McMahon et al., 2016). In STEM education, VR has been used to examine 3D solar system models and introduce human organs and structures (Köse & Güner-Yildiz, 2021; Taryadi & Kurniawan, 2018).

The present study explored the influence of using VR on students’ abilities to learn English compared to voice- and video-based oral communication. In addition, this study investigated what constructs affect students’ continuous use of VR utilizing the Technology Acceptance Model (TAM).

2 Literature review

There are several theoretical stances in the extensive field of study on information systems (IS) applications. The TAM is regarded as the most well-known and frequently applied theory for explaining a person’s acceptance of IS (Adams et al., 1992; Ajzen, 1985; Ajzen & Fishbein, 1980; Chau, 1996; Gefen & Keil, 1998; Lee et al., 2003; Triandis, 1980; Strader & Shaw, 1997; Alfadda et al., 2021; Natasia et al., 2022; Rad et al., 2022). This IS theory explains how users use technology and proposes behavioral intention as a driving force behind user adoption. Additionally, this model is a multifaceted paradigm that illustrates how interactions between cognitive factors in challenging learning situations lead to certain outcomes (Panisoara et al., 2020). Using TAM3, this study carried out an empirical investigation into VR acceptance and assimilation.

2.1 Theoretical framework: Technology acceptance model 3 (TAM3)

According to Davis (1989), who created the TAM, the fundamental characteristics of behavioral intention are perceived usefulness and ease of use of information technology. As opposed to perceived ease of use, which is defined as “the degree to which a person considers that using a specific system would be free of effort,” perceived usefulness is defined as “the degree to which a person considers that using a specific system would enhance his or her job performance” (Davis, 1989, p. 320).

With academics and practitioners examining the impact of users’ perceptions and attitudes toward IS on acceptance and resistance, TAM has constantly improved from its original model (Lucas et al., 1990). The TAM also suggests that perceived usefulness and ease of use serve as a mediator between external factors like design features and behavioral intention. TAM2 reveals the external variables of perceived usefulness and ease of use and provides a tangible mechanism for progressing the multi-level model. Venkatesh and Davis (2000) identified social influence, such as subjective norms, and cognitive instruments, such as job relevance, image, quality, and results demonstrability, as external variables of perceived usefulness. Venkatesh (2000) reported anchors, such as computer self-efficacy, perceptions of external control, computer anxiety, and computer playfulness, and adjustments, such as perceived enjoyment and objective usability, as external variables of perceived ease of use.

TAM2 (Venkatesh & Davis, 2000) and the model of variables of perceived ease of use were combined by Venkatesh and Bala (2008) to create the integrated model of technology acceptance known as TAM3 (Fig. 1) (Venkatesh, 2000). TAM3 offers a comprehensive nomological network of variables that influence how people adopt and use IT. In TAM3, the key constructs are: 1) Perceived usefulness: Users' perception of the extent to which a technology can improve their performance or productivity, 2) Perceived ease of use: Users' perception of the degree of effort required to use a technology, 3) Perceived enjoyment: Users' perception of the degree to which using a technology is fun, enjoyable, or entertaining. These three constructs (perceived usefulness, perceived ease of use, and perceived enjoyment) collectively influence users' attitudes toward technology, which, in turn, impact their behavioral intention to use the technology (Venkatesh & Bala, 2008). Also, TAM3 proposes three relationships between: 1) perceived ease of use and perceived usefulness, 2) computer anxiety and perceived ease of use, and 3) perceived ease of use and behavioral intention (Venkatesh & Bala, 2008).

Fig. 1 Integrated model of technology acceptance, TAM3

It is suggested that TAM3 explains how aspects of technology acceptance affect users’ attitudes toward technology, which are a clear indicator of their behavioral intention to use it for a particular purpose.

While the TAM was initially developed by Davis in 1989, its application has evolved over time, incorporating new technologies such as Web 3.0 and Web 4.0. While there is no specific version of the TAM that explicitly addresses Web 3.0 and Web 4.0, researchers have extended the model to incorporate these advancements (Choudhury & Pattnaik, 2020; Natasia et al., 2022). In the context of Web 3.0 and Web 4.0, researchers have considered additional factors and variables within the TAM framework. For example, perceived usefulness may be expanded to include the enhanced capabilities and intelligent features offered by these advanced web technologies. Perceived ease of use may incorporate aspects related to the intuitive and user-friendly interfaces of Web 3.0 and Web 4.0 applications (Choudhury & Pattnaik, 2020). The external variables that influence perceived usefulness and ease of use in the TAM framework can be adapted to encompass factors specific to Web 3.0 and Web 4.0. These factors may include the semantic web, artificial intelligence, machine learning, natural language processing, data interoperability, and personalized user experiences (Choudhury & Pattnaik, 2020).

Therefore, while the TAM itself has not been directly updated to explicitly include Web 3.0 and Web 4.0, researchers have extended and customized the model to explore the acceptance and adoption of these advanced web technologies, considering their unique features, benefits, and user perceptions (Choudhury & Pattnaik, 2020).

This research emphasizes the impact of VR use on students’ English learning abilities; thus, several VR-related constructs, such as image, result demonstrability, computer anxiety, computer playfulness, and perceived enjoyment, were considered for analysis. Although the TAM has not been specifically revised or updated to incorporate Web 4.0 technologies, TAM3 was used to determine the impact of various VR technology properties on user attitudes.

2.2 Potential of VR in education

Previous studies have outlined some benefits and potential uses for VR in education. First, it is critical to pique students’ interest and motivate them to learn (Amabile, 1990; Lei et al., 2018). VR outperforms traditional education in terms of establishing learning interests and influencing students’ internal motivations, which leads to behavioral changes (Lin et al., 2021b). Furthermore, VR can help students push themselves out of their comfort zones and challenge their boundaries, which is a significant factor in education (Lin et al., 2021b).

Second, VR can create environments that require a lot of focus in an educational setting. These settings allow for the creative and innovative teaching of concepts, as well as the stimulation of students’ imaginations, which is essential for creative work (Hu et al., 2016; Patera et al., 2008). These simulated environments can also focus students’ attention and provide a top-notch educational experience. Students' focus can be increased by the first-person perspective, three-dimensional (3D) panoramic animation, and speaking voice associated with VR settings (Wyk, 2011).

Finally, VR enables experiential learning (Lei et al., 2018). Students learn the knowledge required in a situation and apply what they have learned. VR activities necessitate observation, communication, and self-clarification, all of which can help students improve their comprehension skills (Lin et al., 2017). Furthermore, VR provides a safe environment for students to act vicariously (Lei et al., 2018; Wyk, 2011) as well as a cost-effective approach to optimizing all traditional creativity development techniques (Thornhill-Miller & Dupont, 2016).

2.3 Educational VR applications

VR educational environments have high levels of interactivity and participation, which can help enhance learning motivation and collaborative learning (Lin et al., 2017; Vedadi et al., 2019). Several studies have found VR environments (VREs) or educational VR applications for students. VREs can expose students to abstract concepts, such as the “Round Earth Project,” which teaches students about the Earth as a sphere (Johnson et al., 1999). Some VREs allow students to create new virtual objects at their leisure. NICE, an immersive multiuser learning environment, for example, allows students to create their virtual garden in which they can control the weather and time, allowing them to investigate complex ecological interrelationships (Johnson et al., 1998). VREs can be used to rebuild defunct historical sites (Mosaker, 2001), allowing students to visit and experience historical landmarks that were initially only accessible through photographs or videos (Lei et al., 2018). VR encourages immersion by enabling exploration in 3D space, in contrast to two-dimensional (2D) images or videos where the students only interact as separate observers.

2.4 English learning through VR technology

Students need to learn a foreign language to be competitive, given the rising demand for international communication brought on by globalization (Chen et al., 2021). Learning English has been given priority in many Asian nations, including Japan and the Republic of Korea, in order to participate in international contexts (Chen et al., 2021; Honna, 2016; Tsui, 2020). The acquisition of vocabulary, particularly for specific fields or terminology, is often difficult for EFL learners (Elahe & Alireza, 2018; Patahuddin et al., 2017). This is a significant drawback because a strong correlation exists between adequate vocabulary knowledge and English reading, writing, and listening comprehension (Chen et al., 2021; Johnson et al., 2016).

The limited situations in which English can be used for communication present another issue for EFL students. Genuine input can contribute to the development of favorable learning attitudes, motivation, and results (Hidayati & Diana, 2019; Huda, 2017; Monteiro & Kim, 2020). The effectiveness of educational materials can be increased if they are paired with authentic learning tasks and integrated into specific scenarios and meaningful contexts (Yeh et al., 2020). However, insufficient emphasis has been placed on authenticity in the acquisition of English vocabulary for specific purposes, which could be enhanced by integrating language education resources into realistic scenarios through VR mediation.

Recently, the use of VR for language learning has caught the attention of researchers. For language learning, VR provides immersive environments similar to those found in other fields. Students can use virtual avatars in 3D environments to adopt a first-person perspective (Lan, 2020; Slater, 2017). Additionally, VR enables highly interactive learning environments with visual, aural, and tactile experiences where students can interact in the target language (Chen et al., 2021; Chen, 2016a, 2016b; Yamazaki, 2018; Yeh et al., 2020). Numerous empirical studies showed that VR can support language education in a variety of ways (Chen, 2016a; Hamilton et al., 2021; Parmaxi, 2020). The impact of VR-assisted English education platforms on students’ cognitive and linguistic development has been studied in the classroom, with the results indicating phonological, morphological, grammatical, and syntax knowledge (Chen, 2016a). To investigate the impact of 3D avatars on English listening comprehension, Lan et al. (2018) used on-site and virtual education with two groups of students. They discovered that the virtual education group outperformed the physical education group on the listening comprehension test. Alfadil (2020) studied the effect of a VR game on students’ English vocabulary acquisition and discovered that the VR learning group outperformed the regular classroom learning group. In immersive VR educational settings, Legault et al. (2019) showed that interaction with 3D avatars and objects increased learners’ vocabulary acquisition accuracy and speed. In addition, Chen et al. (2022) examined the overall effects of VR in teaching English as a second language. The findings suggest that VR-based interventions positively impact language learning outcomes, including vocabulary acquisition, listening comprehension, and oral production (Chen et al., 2022).

Although VR technology has an impact on language learning, it has not been thoroughly investigated how it differs from traditional teaching techniques. Additionally, few studies have been conducted to comprehend students’ experiences with VR technology using technology-related theories (Chen et al., 2022; Suh & Prophet, 2018). Therefore, the present study aimed to address these gaps.

3 Materials and methods

This study posed the following research questions (RQ):

  1. What constructs affect students’ continuous use of VR?

     Perceived usefulness and perceived ease of use are two important constructs that can influence students' continuous use of virtual reality (VR).

  2. What are the advantages and potentials of VR for English education?

3.1 Participants

This study enlisted participants in VR-use English classes for Spring English Camp. Study 1 included 120 students and Study 2 included 300 students, who were chosen from a pool of 476 students based on their English ability as determined by a pretest. All participants were second- or third-grade students who attended Korean elementary schools and spoke Korean as their first language. The $70 entrance fee for the English camp was waived for participants to encourage participation. Participants in the second quantitative analysis received a $30 English book. Table 1 displays demographic information for participants in Studies 1 and 2.

Table 1 Demographic information of participants in Studies 1 and 2

3.2 Procedure

All participants provided written informed consent before participation. Following that, a questionnaire was distributed to participants to collect demographic information. Among the 300 students, 120 were assigned at random to Study 1, which concerned students’ continuous use of VR and was examined using TAM. Participants in Study 1 entered the VR room, where the instructor briefed them on the VR tools, such as the head-mounted display (HMD) and other equipment, the experiment time (20 min), and the class contents. After experiencing VR, students were given a questionnaire on VR technology acceptance, which was rated on a five-point Likert scale (1 = strongly disagree; 5 = strongly agree). The instructor answered any questions they had. This procedure was repeated until all participants completed the survey. On average, it took 23 min to complete the questionnaires.

The first step of Study 2 was the same as Study 1; however, Study 2 included two phases: pretest and main study. In the pretest stage, the students were expected to answer basic English questions (see Appendix 3). As Study 2 aimed to understand the advantages and potentials of VR compared to traditional teaching methods, the level of students in each instructional method was similar. When students indicated that they were ready to start, the main study commenced. Participants were randomly assigned to one of two teaching methods. Students learned English regarding police stations and performed problem-solving tasks, which consisted of three multiple-choice questions using action keywords, vocabulary, and expressions for police stations (Appendix 4). The time they took to answer each problem-solving question was recorded. On average, it took 17 min to complete both questionnaires. This experiment was performed between April 18th, 2022 and May 17th, 2022.

3.3 Instrument

Three 5,000-lm projectors, a 360-degree stereoscopic screen, six VIVE Pro controllers, six VIVE MAG P90 Guns, and an Intel Core Processor (CPU) i7 server were used in this study. The VIVE Pro headset, which tracks user motion, was used to play the catch vocab game. The VIVE MAG P90 Gun was used for the catch criminal game. The research instruments are depicted in Fig. 2. A classroom, whiteboard, activity book, and screen for watching videos were used in the traditional teaching method. Figure 3 depicts participants learning English the traditional way.

Fig. 2 Research instruments for technology-enhanced learning

Fig. 3 Traditional teaching method

When designing the English test questionnaire for lower elementary school students, we considered their age, language proficiency, and cognitive abilities. First, we provided clear and explicit instructions for each question, stating what was expected from the students and how they should respond. Second, we used simple and concise language that is easy for young children to understand, avoiding complex sentences and technical terms. Last, we used multiple-choice or matching questions, which are often more suitable for lower elementary students than open-ended questions (see Appendices 3 and 4).

3.4 Design and measure

3.4.1 Research model of study 1

As shown in Fig. 4, the key VR adoption constructs from TAM3 were chosen to develop the research model. Eight constructs related to emerging technology, such as VR, were selected because the research was focused on its use in language education. The definitions and constructs are listed in Table 2.

Fig. 4 Research model. H1. Image has a positive influence on Perceived Usefulness of VR in English Education. H2. Result Demonstrability has a positive influence on Perceived Usefulness of VR in English Education. H3. Computer Anxiety has a negative influence on Perceived Ease of Use of VR in English Education. H4. Computer Playfulness has a positive influence on Perceived Ease of Use of VR in English Education. H5. Perceived Enjoyment has a positive influence on Perceived Ease of Use of VR in English Education. H6. Perceived Ease of Use has a positive influence on Perceived Usefulness of VR in English Education. H7. Perceived Usefulness has a positive influence on Behavioral Intention of VR in English Education. H8. Perceived Ease of Use has a positive influence on Behavioral Intention of VR in English Education.

Table 2 Definitions of constructs

This study looked at how image and result demonstrability affect perceived usefulness; how computer anxiety, computer playfulness, and perceived enjoyment affect perceived ease of use; how perceived ease of use affects perceived usefulness; and how perceived usefulness and perceived ease of use affect behavioral intention. The research model is depicted in Fig. 4.

3.4.2 Research model of study 2

The Study 2 research model considered the benefits and potential of VR as a teaching technique. The experiment used a between-subject design with a manipulated teaching method between groups. By random assignment, each student learned English using either traditional teaching or the VR method, diminishing the learning effect. Table 3 summarizes the entire experiment design.

Table 3 Summary of study 2 experimental design
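
The random assignment in the between-subject design can be sketched as follows. This is only an illustration under stated assumptions, not the procedure the study actually used: the helper name `assign_groups`, the participant IDs, the fixed seed, and the equal split between the two groups are all hypothetical.

```python
import random


def assign_groups(participant_ids, seed=0):
    """Randomly split participants into two equal-sized groups.

    Hypothetical helper: the study does not describe its assignment
    mechanism beyond stating that assignment was random.
    """
    ids = list(participant_ids)
    rng = random.Random(seed)  # fixed seed only so the example is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"VR": ids[:half], "traditional": ids[half:]}


# Ten made-up participant IDs, for illustration only.
groups = assign_groups(range(1, 11))
```

Because each student lands in exactly one group, no student experiences both teaching methods, which is what prevents a learning effect from carrying over between conditions.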

Following prior research, problem-solving performance measures were chosen as dependent variables to assess how effectively each teaching method delivered the police station content in English to students and to eliminate as many confounding factors as possible in evaluating this outcome (Mayer, 1989). These tasks serve as an indicator of how well students learn English using the method (Suh & Park, 2017).

The dependent variable in this study was problem-solving performance because performance is a better predictor of students’ deep understanding of English. Problem-solving accuracy was selected for measuring problem-solving performance.

4 Results

Two studies were examined in various ways. The hypotheses were tested in the first study by performing statistical analysis to determine which construct affected the students’ continuous use of VR. The results of the second study were analyzed in two stages. First, the problem-solving results of the students were computed. Second, the hypotheses were tested using statistical analysis to determine the differences in problem-solving results between technology-enhanced learning and traditional teaching methods.

4.1 Study 1

4.1.1 Assessment of the measurement model: Reliability and validity

The measurement model was developed to investigate the relationship between the constructs and their indicators (image, result demonstrability, computer anxiety, computer playfulness, perceived enjoyment, perceived usefulness, perceived ease of use, and behavioral intention). The research model was evaluated before testing the proposed hypotheses to ensure the reliability of each item, the reliability of the scale, the convergent validity, and the discriminant validity (Bajpai & Bajpai, 2014; Malhotra & Dash, 2013).

The Kaiser-Meyer-Olkin (KMO) measure and Bartlett’s test were used. The KMO measure of sampling adequacy was greater than 0.8. The chi-square values of Bartlett’s test of sphericity were 104.19 (Image), 258.35 (Result Demonstrability), 280.01 (Computer Anxiety), 297.14 (Computer Playfulness), 321.05 (Perceived Enjoyment), 529.88 (Perceived Usefulness), 302.82 (Perceived Ease of Use), and 72.15 (Behavioral Intention). The significance level was set at 0.01 and the significance value was 0.000. Furthermore, all the constructs’ Cronbach’s alpha scores were greater than 0.7, indicating that the constructs were reliable (Cronbach, 1951; Hair et al., 2011) (Table 4).

Table 4 Results of factor analysis and reliability analysis of image (IMG)
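
For readers unfamiliar with the reliability criterion, the sketch below shows how Cronbach's alpha can be computed from item-level Likert responses using the standard formula. The function name and the response matrix are invented for illustration; they are not the study's data or code.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one construct.

    responses: one row per respondent, one column per questionnaire item.
    """
    k = len(responses[0])  # number of items in the construct
    item_vars = [variance(item) for item in zip(*responses)]
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up five-point Likert answers: 5 respondents, 3 items.
sample = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(sample)
is_reliable = alpha > 0.7  # threshold used in the text
```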

This study analyzed the convergent and discriminant validity of each construct (image, result demonstrability, computer anxiety, computer playfulness, perceived enjoyment, perceived usefulness, perceived ease of use, behavioral intention). Convergent validity was established when the average variance extracted (AVE) value was greater than 0.5 and the composite reliability (CR) value was greater than 0.7 (Kline, 2011). The maximum value of the AVE was 0.520, both of which were greater than the maximum value for the squared correlation coefficient; thus, discriminant validity was demonstrated (Table 5).

Table 5 Convergent and discriminant validity of each construct
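
The convergent validity thresholds above (AVE greater than 0.5, CR greater than 0.7) can be illustrated with the commonly used formulas based on standardized factor loadings. The loadings below are hypothetical and the helper names are introduced only for this sketch; the study's own loadings are reported in its tables.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    error_var = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_var)


# Hypothetical standardized loadings for one construct.
loadings = [0.7, 0.8, 0.9]
ave = average_variance_extracted(loadings)
cr = composite_reliability(loadings)
convergent_ok = ave > 0.5 and cr > 0.7
```

Discriminant validity is then typically judged by comparing each construct's AVE with its squared correlations with the other constructs.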

4.1.2 Hypothesis test

All constructs of the VR system had varying degrees of association with the TAM; however, not all of them were statistically significant. Likewise, the constructs had positive associations with the behavioral intention to use VR in English learning; however, one hypothesis, H1, was not statistically significant. Specifically, the effect of Image (IMG) on Perceived Usefulness (PU) (β = -0.042, C.R. = -0.77, p = 0.444) was not statistically significant even at the 0.1 level, so the hypothesis was rejected. Therefore, Image (IMG) had a non-significant association with Perceived Usefulness (PU). Except for H1, all hypotheses (H2-H8) were supported. The effects of Result Demonstrability (RES) on Perceived Usefulness (PU), Computer Anxiety (CANX) on Perceived Ease of Use (PEOU), Computer Playfulness (CPLAY) on Perceived Ease of Use (PEOU), Perceived Enjoyment (ENJ) on Perceived Ease of Use (PEOU), Perceived Ease of Use (PEOU) on Perceived Usefulness (PU), Perceived Usefulness (PU) on Behavioral Intention (BI), and Perceived Ease of Use (PEOU) on Behavioral Intention (BI) were statistically significant at the 0.01 level. The standardized regression weights (β) were 0.191 for H2, -0.230 for H3, 0.310 for H4, 0.223 for H5, 0.540 for H6, 0.257 for H7, and 0.513 for H8. Table 6 lists the inferential statistics of the model, and Fig. 5 shows the final model, with non-significant paths represented by dotted lines.

Table 6 Hypotheses testing results (SEM)

Fig. 5 Final model

Study 1 explored the implementation of VR as a pedagogical tool by measuring students’ acceptance of VR technology.

4.2 Study 2

4.2.1 Data scoring

The scores were awarded as follows. One mark was given if the answer was correct, whereas zero was given if an answer was incorrect or left blank. Students were encouraged not to answer the question by guessing. None of the students’ answer sheets had blank answers.
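
The scoring rule can be written down compactly. The sketch below assumes that answers are stored as option letters and that a blank answer is represented by `None`; both the representation and the function name are assumptions made for illustration.

```python
def score_sheet(answers, answer_key):
    """Award 1 mark per correct answer; incorrect or blank answers earn 0."""
    return sum(
        1
        for given, correct in zip(answers, answer_key)
        if given is not None and given == correct
    )


# One correct answer, one blank, one incorrect (made-up sheet).
marks = score_sheet(["A", None, "C"], ["A", "B", "D"])
```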

4.2.2 Hypothesis test

Study 2 aimed to understand the effect of technology-enabled learning by comparing teaching techniques, VR, and traditional teaching methods based on the problem-solving test score. Therefore, the hypothesis was as follows:

  • H1. The ability of students to learn English differs between VR and the traditional teaching method (voice-video-based oral communication).

This study specifically aimed to assess the students’ English capacity based not only on the total result of the problem-solving questions but also on the question types: action keywords, vocabulary, and police station expressions. The t-test was used to compare VR and traditional teaching methods in the police station domain. As shown in Table 7, for comprehension accuracy, the difference between VR (M = 86.27) and the traditional teaching method (M = 78.80) was statistically significant at the 0.01 level (t = –4.07, p = 0.000***, Mean difference = –7.47). As a result, using VR improved English learning more than voice-video-based oral communication.

Table 7 Comparison of comprehension accuracy of traditional teaching methods and VR

In the case of Action Keywords, the difference between VR (M = 27.47) and the traditional teaching method (M = 24.73) on test score was statistically significant (t = –3.08, p = 0.000***, Mean difference = –2.72) at the 0.01 level. In the case of Vocab, the difference between VR (M = 32.73) and the traditional teaching method (M = 27.47) on test score was statistically significant (t = –3.89, p = 0.000***, Mean difference = –3.80) at the 0.01 level. Finally, for the Expression question type, the difference in quiz score between VR (M = 26.47) and the traditional teaching method (M = 25.53) was not statistically significant (t = –1.03, p = 0.304, Mean difference = –0.93) at the 0.01 level. In conclusion, the VR group scored higher than the traditional teaching group on Action Keywords and Vocab.
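
The group comparisons above rest on a two-sample t statistic. The text does not state which variant was used, so the sketch below assumes an independent-samples Student's t-test with pooled variance, computed as traditional minus VR to match the negative signs reported. The scores are invented for illustration, and obtaining a p-value would additionally require the t distribution, which a statistics package would normally supply.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Made-up scores; a negative t means the first group scored lower on average.
traditional_scores = [1, 2, 3]
vr_scores = [2, 3, 4]
t_stat, dof = pooled_t(traditional_scores, vr_scores)
```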

5 Discussion

5.1 Implications

These results have academic and practical implications. This study offers guidance on the rigor of technology-enhanced learning research. The academic study of educational technology is strengthened by a broad and rigorous engagement with theory; therefore, this study applied the TAM, the most influential and commonly employed theory for describing an individual’s acceptance of technology (Lee et al., 2013; Salloum et al., 2019), to understand why students use VR technology in English learning.

Based on TAM, several interesting observations emerged. First, Image had no effect, whereas Result Demonstrability had a positive effect, indicating that students who used VR in their English learning did not gain a higher profile or prestige. Because students today are familiar with electronic devices, VR technology is not considered unique. Further analysis could explore the reasons behind the lack of effect of "Image" and the positive effect of "Result Demonstrability" on students' acceptance of VR in English learning. Possible directions include investigating the role of familiarity, examining the influence of social factors, and exploring the potential of gamification, given that "Computer Playfulness" and "Perceived Enjoyment" had positive effects on students' acceptance of VR.

Second, Computer Anxiety had a negative effect on Perceived Ease of Use, whereas Computer Playfulness and Perceived Enjoyment had positive effects. In addition, Computer Playfulness had the largest effect on Perceived Ease of Use, followed by Computer Anxiety and Perceived Enjoyment, in that order. This suggested that spontaneity and creativity, rather than simply the pleasure or fun of using VR in English education, made the VR system easier to use in learning. In other words, using the VR system voluntarily and creatively mattered more to students than using it for pleasure or fun.

Third, Perceived Ease of Use influenced Perceived Usefulness positively. The ease of use of the VR system increased students’ effectiveness and productivity. As a result, students considered the VR system very useful for English learning.

Fourth, Perceived Usefulness and Perceived Ease of Use influenced behavioral intention positively. Furthermore, Perceived Ease of Use had a greater effect on Perceived Usefulness, indicating that learning English was important; however, if the VR system was too complex to operate, students in the lower grades of elementary school might refuse to use VR for their English learning.

Another significant implication relates to the research scale. The experiments in previous studies were small in comparison to the current ones. They used fewer than 30 participants and three pieces of experimental equipment, including tracking head-mounted devices (HMDs) and electronic gloves. However, in the current experiment, 300 students participated, and the experimental devices included a 360-degree stereoscopic screen, 3D simulated reality with HMD, and a VR shooter. This addressed the issue of investigation size, which had previously been identified as a limitation of technology-enhanced learning research. Further analysis could focus on the relationships between computer anxiety, computer playfulness, perceived enjoyment, perceived ease of use, perceived usefulness, and behavioral intention in the context of VR-assisted English learning. One potential direction is exploring the underlying factors contributing to computer anxiety, for example by conducting a qualitative study or using additional measures to investigate the specific causes of computer anxiety among students using VR for English learning. Another is assessing the impact of perceived usefulness on learning outcomes, for example by exploring the relationship between students' perception of the VR system's usefulness for English learning and their actual learning outcomes.

Furthermore, this study found strong evidence that VR had a better educational effect than traditional education methods, which can help instructors and academics in the design of technology-enhanced learning materials and activities. This study discovered a significant positive effect of VR-assisted English education on elementary school students by engaging students in using VR to solve English questions. These findings contributed to language education by demonstrating that incorporating VR systems can improve learning motivation and effectiveness. VR provided students with an immersive and practical experience, in which they not only viewed but also experienced the specific situation using the target language, deepening their understanding of English.

5.2 Limitations and future research

Virtual reality (VR) technology, despite its potential benefits in education, has certain limitations in terms of health issues and accessibility, particularly for students from different socio-economic status (SES) backgrounds (Southgate et al., 2019). From a health perspective, prolonged exposure to VR can lead to a range of physical and psychological effects such as eye strain, motion sickness, and disorientation. These issues can be exacerbated if students do not have access to high-quality VR equipment that provides comfortable and immersive experiences. Moreover, the cost of VR devices and related hardware, such as powerful computers or gaming consoles, can create a financial barrier for students from lower SES backgrounds, limiting their ability to access and benefit from VR-based educational resources (Ford et al., 2023; Southgate et al., 2019). Additionally, limited internet bandwidth and infrastructure in some communities can further impede equitable access to VR experiences, widening the accessibility gap for students from disadvantaged backgrounds (Ferri et al., 2020). Overall, while VR has great potential in education, addressing these health and accessibility challenges is crucial to ensure that students from different SES backgrounds can benefit equally from its implementation.

Considering the following limitations, future research could be pursued in four directions. First, while comparing different problem-solving contexts in VR for English learning is valuable, the results may not be easily generalizable to all educational settings. Factors such as cultural differences, educational systems, and individual learner characteristics could influence the effectiveness of VR in different contexts. Future research should consider these factors to ensure the applicability of findings across diverse populations.

Second, this study involved only elementary school students in the second and third grades, which limits the generalizability of the results to higher grade levels. Future experiments should include students from a wider range of grade levels, including middle and high school students, to provide a more comprehensive understanding of the impact of VR on English learning across different educational stages.

Third, while this study focused on English language education, exploring the effectiveness of VR for learning other languages is crucial for a comprehensive understanding of its potential benefits. However, it is important to consider that each language has its own unique characteristics, structures, and learning challenges. Future research should test the effectiveness of VR in teaching various languages, such as Chinese, French, and Korean, to determine if the benefits observed in English learning can be replicated in other language contexts.

Fourth, this research may not have captured the long-term effects of VR-based English learning. Future research should incorporate follow-up assessments to evaluate the durability of the learning outcomes and determine if the benefits of VR persist over time. Understanding the long-term effects is crucial for assessing the sustainability and effectiveness of incorporating VR into language education curricula.