Introduction

In recent years, due to the COVID-19 pandemic, many schools have had to shift from traditional classroom teaching to online learning (Bao, 2020; Koenig et al., 2020; Mukhtar et al., 2020; Su et al., 2021; Zhang et al., 2020). Despite the effectiveness of online learning (Stevens et al., 2021), some students have reported feeling disengaged and distracted while studying remotely (Ray et al., 2022). Moreover, studies have shown that it is more challenging for students to maintain their concentration while learning online (Coman et al., 2020). Therefore, it is important for educators and researchers to design online learning experiences that motivate and engage students.

Virtual Reality (VR) technology has emerged as a promising tool that can be used to enhance online learning (Monahan et al., 2008; Checa et al., 2021; Liu & Shirley, 2021; Li et al., 2022; Rawson et al., 2022; Sai, 2022). VR can create immersive and interactive environments that allow students to better understand complex concepts, while also stimulating their emotions and enhancing motivation (Ślósarz et al., 2022). However, the use of VR can also increase cognitive load (Makransky et al., 2019) and leave learners unsure where to focus, both of which may be detrimental to learning. Designing VR that promotes rather than hinders learning outcomes and learning experiences is therefore an important challenge. To address this challenge, researchers have proposed using cues to reduce the cognitive load associated with VR (Nelson et al., 2014). However, few studies have investigated the use of cueing in VR (Albus et al., 2021; Liu et al., 2022; Nelson et al., 2014, 2016).

While there has been significant research on the effectiveness of online learning in K-12 and higher education settings, little attention has been paid to the experiences of students in vocational colleges. Secondary vocational schools could benefit from using VR for instruction, since it gives students access to virtual simulations of equipment that would otherwise be unavailable to them online. This can help bridge the gap between theoretical knowledge and practical skills. However, studies have shown that students in vocational colleges may struggle to maintain control in the online learning environment (Cigdem & Ozkan, 2022). Furthermore, because of the complexity of the VR environment, some students have stated that it can take a long time to become acquainted with the software when operating VR directly (Liaw, 2019), which may leave insufficient time for study. Vicarious learning may therefore be an alternative: students do not operate VR directly, but instead learn by observing the teacher’s operations and listening to the instruction. Even when VR devices are in short supply, teachers can provide educational VR by playing recorded videos of vicarious VR learning experiences that others have shared. In vicarious VR learning, VR serves as a multimedia resource similar to PowerPoint, and instructors explain and demonstrate while interacting with VR. This enables online learners to experience the benefits of VR-enhanced instruction even when resources are limited.

Research on spatial reasoning indicates that students’ performance is determined not by whether they can interact, but by whether they see the key information of the task (Keehner et al., 2008). However, the evidence on vicarious VR is not yet clear. Some studies show no significant difference between direct and observed VR learning experiences (Luo et al., 2021) or find that vicarious learning is as effective as direct learning (Dubovi, 2022), whereas Jang et al. (2017) concluded that direct operation is more helpful than passive viewing for representing internal anatomical structure. As a result, whether vicarious learning strategies can be used in VR requires further investigation.

The present study aims to examine the effects of VR instructional approaches and textual cues on performance, cognitive load, and learning experience of students, and explores the interaction between VR instructional approaches and textual cues. We conducted an educational VR study using a 2 × 2 + 1 between-subjects design involving 67 secondary vocational students. Participants learned computer assembly online and were exposed to either vicarious experience or direct manipulation instructional approaches, with or without textual cues. The control condition was a standard online conferencing class with slides that did not include VR or cues. The same lecturer used slides to teach the control group the identical computer hardware assembly knowledge as the experimental groups. We collected data on retention, transfer learning outcomes, cognitive load, and learning experience measures including presence, motivation, cognitive benefits, perceived learning effectiveness, and satisfaction. Our study found that students were better able to acquire knowledge immediately when exposed to the vicarious VR instructional approach. However, we found no significant positive effect of vicarious VR on long-term retention, transfer, and learning experience. Textual cues did not affect learning in general. However, for immediate knowledge gain, they did provide a positive boost to learning in VR involving direct manipulation, while they were unnecessary in vicarious VR experiences. This study contributes to our understanding of how the cueing principle can be extended to educational VR contexts and expands our knowledge of vicarious VR learning.

In this paper, the term “vicarious VR experience” refers to the experience in which a teacher directly manipulates objects and completes tasks in VR while explaining and demonstrating them to students. Students do not operate VR directly, but instead learn by observing the teacher’s operations and listening to the instruction. Throughout the paper, we refer to it as the “vicarious experience” for short. The term “direct manipulation” refers to students directly manipulating objects and completing tasks in VR by themselves. The term “VR instructional approaches” refers to the two instructional approaches based on vicarious VR experience and direct manipulation in VR. In the vicarious approach, the online teaching is given by instructors who explain and demonstrate while interacting with VR, and VR serves as a multimedia resource similar to PowerPoint, animations, or figures. In the direct manipulation approach, the instructor gives the students only the most basic guidance.

Relevant literature and theoretical background

Educational VR and desktop VR

VR has been regarded as a highly promising tool for enhancing learning. As such, it is essential to examine the various applications of VR in education and assess the extent to which they have been implemented effectively and have yielded positive results. According to Radianti et al. (2020), VR can be divided into two categories: immersive virtual reality and non-immersive virtual reality. Immersive VR (IVR) is primarily delivered through head-mounted displays (HMDs), which provide a more immersive visual experience (Kamińska et al., 2019). Non-immersive VR, also referred to as desktop VR, does not require specialized equipment and can be accessed on more affordable hardware such as standard computers, with a mouse or keyboard as the primary input method (Chen et al., 2004; Robertson et al., 1997; Zhou et al., 2018). As a result, non-immersive VR is more cost-effective in terms of both use and maintenance, making it more likely to attain widespread usage compared to immersive VR.

The benefits of desktop VR are further demonstrated by its use in education. Makransky et al. (2019) found that although users of an IVR system felt more present, they actually learned less than those who used a low-immersion desktop computer. This demonstrates that learning outcomes may not be positively related to immersion. Srivastava (2019) found that HMD VR was not superior to PC VR for spatial learning and that the immersion provided by VR had no significant effect on learning (Alrehaili & Osman, 2022). Besides, the desktop VR group scored much higher than the IVR group in the retention test (Moreno & Mayer, 2004). Several studies also indicated that when properly designed, non-immersive desktop VR environments can be highly effective in facilitating learning and training (Dubovi et al., 2017; Lee et al., 2010; Lee & Wong, 2014; Ogbuanya & Onele, 2018).

In addition, the use of desktop VR has been found to have a positive effect on students’ self-efficacy, perceived ease of use, and learning behavior (Luo & Du, 2022). Lee and Wong (2014) similarly reported significant differences in academic performance between an experimental group using desktop VR and a control group taught with traditional slides. The majority of desktop VR research is presently done offline in classroom settings. Liu and Shirley (2021) have reported the benefits of integrating virtual reality into online learning activities, highlighting the potential of desktop VR in promoting active student participation. Moreover, Rawson et al. (2022) found that desktop VR not only simulated real-world experiences in virtual spaces but also enabled remote access to such experiences through personal devices. As a result, integrating VR into online learning and investigating VR instructional approaches can aid in resolving issues in online learning.

Cues in VR

Due to the richness of visual effects in VR, students may ignore some information and experience higher cognitive load. Cues or signals are used to direct learners’ attention to essential learning materials by using texts, images, gestures, or other people’s eye movements (Alpizar et al., 2020). According to the cue principle, when cues direct learners’ attention to relevant information or emphasize the organizational structure of key content, a deeper cognitive process occurs (Mayer, 2005). Adding cues into multimedia learning can provide needed attention guidance while reducing unnecessary visual search and cognitive load (Ozcelik et al., 2010).

Cues are classified into two categories, textual cues and visual cues, as proposed by Mayer (2002). Textual cues usually comprise sentences embedded within learning materials, aiming to direct attention towards key concepts. Visual cues, on the other hand, encompass the addition of features such as arrows or color to enhance learning materials. Koning et al. (2009) applied Mayer’s multimedia cognitive learning theory to classify the functions of cues. This classification outlines three functions of cues: selecting, organizing, and integrating information. Jamet et al. (2008) defined the selection function as guiding students to notice the location and information related to learning conceptual knowledge. The organization function refers to establishing coherence between materials and emphasizing their organizational relationships. The integration function assists students in making connections between new and prior knowledge. Recent research by Wang et al. (2020) based on eye-tracking technology revealed that textual cues were more effective in selecting information, which was related to retention tests, while visual cues were more beneficial for organizing and integrating information, which was related to transfer tests. Additionally, previous studies have emphasized text as the primary medium for information acquisition (Folker & Sichelschmidt, 2005).

Current cue research focuses primarily on visual cues, particularly the role of color coding, with little focus on textual signals. The study of cues in VR is still in its early stages. Albus et al. (2021), for example, found that the addition of textual cues in VR environments can improve students’ memory and reduce germane cognitive load. Incorporating textual cues in VR environments may help students pay more attention to their studies (Liu et al., 2022). Nelson’s work (Nelson et al., 2016) demonstrated that using visual signals in virtual environments reduces students’ cognitive load.

The purpose of this research is to incorporate textual cues into the VR learning environment and investigate their role in VR instructional approaches.

Vicarious learning

Vicarious learning, also known as observational learning, involves learning by observing other people’s behaviors and actions (Bandura, 1969). As Bruner (1987) noted, many of our experiences with the world are not direct, suggesting that we can learn through mechanisms other than firsthand experience. Stegmann et al. (2012) found that vicarious learning can be unexpectedly more effective than learning by doing. In addition, vicarious learning can play a role in students’ acquisition of professional skills (Tufford et al., 2021), and has been found to be helpful in online language learning (Pleines, 2020).

Moreover, Dubovi (2022) conducted a study on the effects of direct VR and vicarious VR on students’ learning and found that while direct VR resulted in stronger emotional engagement, knowledge acquisition did not significantly differ between direct VR and vicarious VR. This suggests the potential applicability of vicarious learning in VR. However, despite claims that self-explanation in multimedia design principles can be extended to computer-based vicarious learning environments (Gholson & Craig, 2006), no research has yet examined the relationship between cue principles and vicarious learning.

Despite extensive research on online learning in K-12 and higher education, little attention has been paid to vocational college students. Using VR simulations can benefit these students by providing access to equipment not readily available online, bridging the gap between theory and practice. However, students may struggle with control in online learning environments (Cigdem & Ozkan, 2022) and require additional time to adapt to complex VR software (Liaw, 2019). Vicarious learning, where students observe teachers’ VR operations, can be an alternative to direct VR use. Teachers can provide educational VR through vicarious VR learning experiences, allowing students to experience VR-enhanced teaching in limited situations.

Research questions

This study aims to fill the aforementioned gap by examining the effects of textual cues and VR instructional approaches on secondary vocational students. The study will explore four research questions:

RQ1: Are online VR instructional approaches more effective than a traditional online conferencing class?

RQ2: Can the incorporation of textual cues have an effect on learning within virtual reality contexts?

RQ3: How does learning through vicarious experience compare to direct manipulation in terms of knowledge gain, cognitive load, and learning experience?

RQ4: Do textual cues and VR instructional approaches interact in their impact on learning?

Methods

Participants

The participants in the present study were 67 students (26 boys and 41 girls) from two classes of a public secondary vocational school. They were all first-year students (age M = 15.88, SD = 1.20). The participants were randomly assigned to one of the five conditions, with the constraint that boys and girls were distributed proportionately across the five conditions. A power analysis was conducted using G*Power 3.1 (Faul et al., 2007) based on the effect size (> 0.44) found in Kyaw et al. (2019). With an effect size of f = 0.45, power of 0.80, and alpha level of 0.05, a total of 65 participants were needed for five groups to detect an effect.
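The quantities behind an a priori power analysis of this kind can be written out as follows. This is a sketch that assumes the standard noncentral F formulation for a fixed-effects one-way ANOVA with k = 5 groups; the exact options selected in G*Power are not reported beyond f, power, alpha, and the number of groups.

```latex
\lambda = f^{2} N = 0.45^{2} \times 65 \approx 13.16,
\qquad df_{1} = k - 1 = 4,
\qquad df_{2} = N - k = 60

F_{\mathrm{crit}} = F^{-1}_{1-\alpha}(df_{1}, df_{2}),
\qquad
\text{power} = 1 - F_{\mathrm{nc}}\!\left(F_{\mathrm{crit}};\, df_{1}, df_{2}, \lambda\right)
```

Here \lambda is the noncentrality parameter and F_{\mathrm{nc}} is the cumulative distribution function of the noncentral F distribution. The required N is the smallest total sample size for which the power reaches the target of 0.80.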

The five conditions included four experimental conditions and one control condition: (1) vicarious experience with textual cues, (2) vicarious experience with no cues, (3) direct manipulation with textual cues, (4) direct manipulation with no cues, and (5) traditional online class using slides.
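One way to realize a random assignment that keeps boys and girls proportionately distributed is to shuffle each gender stratum and deal its members round-robin across the conditions. The sketch below is illustrative only: the function name, the participant identifiers, and the seed are assumptions, and it does not reproduce the authors' actual procedure or final group sizes.

```python
import random
from collections import Counter

# Condition labels follow the list above.
CONDITIONS = [
    "vicarious experience with textual cues",
    "vicarious experience with no cues",
    "direct manipulation with textual cues",
    "direct manipulation with no cues",
    "traditional online class using slides",
]


def assign_stratified(participants, conditions, seed=0):
    """Randomly assign (participant_id, gender) pairs to conditions.

    Each gender stratum is shuffled and dealt round-robin, so every
    condition receives a near-equal share of each gender. The running
    offset continues the deal across strata to keep totals balanced.
    """
    rng = random.Random(seed)
    assignment = {}
    offset = 0
    for gender in sorted({g for _, g in participants}):
        stratum = [pid for pid, g in participants if g == gender]
        rng.shuffle(stratum)
        for i, pid in enumerate(stratum):
            assignment[pid] = conditions[(i + offset) % len(conditions)]
        offset += len(stratum)
    return assignment


# Hypothetical roster matching the reported 26 boys and 41 girls.
participants = [(f"B{i:02d}", "boy") for i in range(26)]
participants += [(f"G{i:02d}", "girl") for i in range(41)]

assignment = assign_stratified(participants, CONDITIONS, seed=1)
group_sizes = Counter(assignment.values())
boys_per_group = Counter(assignment[pid] for pid, g in participants if g == "boy")
```

Dealing round-robin within each stratum keeps the per-condition counts of each gender within one of each other, which is one simple reading of "proportionately".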

Instead of pre- and post-tests, we employed a post-test design to prevent learning from being triggered by the pre-test (Marsden & Torgerson, 2012). To verify that the students in each condition had similar baseline knowledge of the learning materials, we evaluated the participants on their task-specific prior knowledge. None of the participants had prior knowledge of the learning materials, and all participants had the same baseline knowledge and skills on computer assembly and VR.

Computer assembly VR application

The learning materials employed in this study cover topics related to computer hardware and assembly, which are integral parts of the vocational education curriculum. VR technology can be used to simulate the process of assembling computer hardware (Checa et al., 2021; Chen et al., 2021; Westerfield et al., 2015; Zhou et al., 2018). The desktop VR application used in this study was developed using Unity. The VR environment was a computer studio mainly equipped with an operating desk and interactive 3D models related to the topic of computer assembly. Most of the 3D models were found on the Unity Asset store. The 3D objects in the application contain all the computer hardware to be learned and assembled, including case, central processing unit (CPU), motherboard, memory, graphics card, computer data storage, power supply and cooling fan.

As shown in Fig. 1, the application offers five interactions for students: instruction, conversation, movement in the scene, the object browsing task, and the assembly task. The explanations are as follows.

  • The instruction is to guide the students in navigating and using the application effectively and efficiently. Students can use ENTER to open and close the pop-up instruction.

  • The conversation is to advance the task and provide information to the students.

  • When students press the W, S, A, D or ↑, ↓, ←, → keys on the keyboard, they can move in four directions in the scene (forward, back, left, and right) from a first-person perspective. They can also rotate the viewing angle by holding the right mouse button.

  • To learn about the computer hardware, students can click the objects and view the pop-up text.

  • Once the student starts dragging a piece of hardware for the assembly task, a yellow contour of the hardware appears in the case. The yellow contour disappears once the hardware has been placed correctly.
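The movement and assembly interactions listed above can be sketched as a small state model. The application itself was built in Unity, so the Python below is only an illustration: the key names, the `move` function, and the `AssemblySlot` class are hypothetical, and movement is simplified to fixed axes rather than being relative to the viewing direction.

```python
# Hypothetical key bindings: letter keys and arrow keys share directions.
MOVE_KEYS = {
    "W": (0, 1), "UP": (0, 1),        # forward
    "S": (0, -1), "DOWN": (0, -1),    # back
    "A": (-1, 0), "LEFT": (-1, 0),    # left
    "D": (1, 0), "RIGHT": (1, 0),     # right
}


def move(position, key, step=1.0):
    """Return the new (x, z) position after one key press."""
    dx, dz = MOVE_KEYS.get(key.upper(), (0, 0))  # unknown keys do nothing
    x, z = position
    return (x + dx * step, z + dz * step)


class AssemblySlot:
    """A place in the case that expects one specific hardware part."""

    def __init__(self, expected_part):
        self.expected_part = expected_part
        self.contour_visible = False
        self.filled = False

    def start_drag(self, part):
        # Show the yellow contour while the matching part is being dragged.
        if part == self.expected_part and not self.filled:
            self.contour_visible = True

    def drop(self, part, on_slot):
        # The contour stays until the part is placed correctly.
        if part == self.expected_part and on_slot:
            self.filled = True
            self.contour_visible = False
        return self.filled
```

For example, `AssemblySlot("CPU")` shows its contour after `start_drag("CPU")` and hides it only after `drop("CPU", on_slot=True)`.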

Fig. 1 The interface of the computer assembly VR application (translated)

Experimental design

This study adopts a 2 × 2 between-subjects design, with an additional control group for a total of five conditions. The independent variables manipulated were the VR instructional approach (vicarious experience vs. direct manipulation) and textual cues (with vs. without). The four experimental conditions were vicarious experience with textual cues (VETC), vicarious experience with no cues (VENC), direct manipulation with textual cues (DMTC), and direct manipulation with no cues (DMNC). An additional control group received a traditional online class using slides. The same instructor taught all five groups.

For vicarious experience, the teacher performed actions, such as manipulating objects and completing tasks, in VR while explaining and showing these actions to the students (see Fig. 2, the second and third columns, top). For direct manipulation, the students themselves performed actions, such as manipulating objects and completing tasks, directly in VR (see Fig. 2, the second and third columns, bottom).

Fig. 2 Experimental design of the study (translated screenshots of the applications and slides)

Textual cues function by manipulating the content of the text (Van Gog, 2014). They can take the form of key words, a brief explanation, or even a representation of the contents (Liu et al., 2022; Moreno & Abercrombie, 2010; Vogt et al., 2021; Wang et al., 2020; Zheng et al., 2023). In this work, the textual cues condition refers to when a red pop-up text appears as the student is browsing hardware (see Fig. 2, the second column), whereas the no cues condition refers to the opposite scenario (see Fig. 2, the third column). The name and a brief description of the hardware were included in the pop-up text.

The control condition was a standard online conferencing class with slides that did not include VR or cues. The lecturer used slides to teach the control group the identical knowledge as the experimental groups (see Fig. 2, the first column).

Measures

The dependent variables studied are divided into objective measurement and subjective measurement. Objective measurement mainly measures students’ knowledge acquisition level, which is measured twice.

Demographic and prior knowledge

We gathered demographic information from the participants, including their gender and age. All participants were first-year students from two classes at a secondary vocational school.

The instructor individually asked each student about their prior knowledge and experience with VR, and received confirmation of their responses. None of the participants had prior knowledge of computer hardware and assembly, and none had experience with VR.

Learning outcomes

The students were assessed with retention and transfer tests. Some of the test questions were created by the in-class teacher, while others were taken from the teacher-specific instructional materials and reviewed by the same teacher. The questions were designed to be consistent with the learning goals set in the teacher-specific instructional materials, and their types and levels of difficulty were in line with these goals.

The retention test assesses how well the students retain, over time, their knowledge of the computer hardware components that were covered explicitly during the vicarious learning or in VR. The retention test consisted of 14 multiple-choice questions, each with one correct answer. For instance, “What is the name of the hardware shown in the following picture?” The five options were: (A) Memory, (B) Cooler, (C) Power supply, (D) Hard disk, and (E) I do not know. The last option, “I do not know”, was included to discourage guessing. A correct answer was awarded 1 point and an incorrect answer received 0 points. Therefore, the maximum score for each test was 14. A low score indicated limited comprehension of the material. The second retention test was administered one month after the first. The two tests were identical in content, but the order of questions and response options was randomized.
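The scoring rule described above (1 point per correct answer, 0 otherwise, with “I do not know” never matching the key) can be sketched as follows. The answer key and responses are invented for illustration and are not the study's actual items.

```python
# Hypothetical answer key for a 14-item multiple-choice test.
ANSWER_KEY = ["A", "C", "B", "D", "A", "B", "C",
              "D", "A", "B", "C", "D", "A", "B"]


def score_test(responses, answer_key):
    """Return the number of responses that match the answer key.

    An "I do not know" option (here "E") earns 0 points simply because
    it never equals a keyed answer.
    """
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer key must have equal length")
    return sum(1 for given, correct in zip(responses, answer_key)
               if given == correct)


# Example: one "I do not know" and one wrong answer out of 14 items.
responses = list(ANSWER_KEY)
responses[3] = "E"   # "I do not know"
responses[7] = "A"   # incorrect choice
example_score = score_test(responses, ANSWER_KEY)
```

The same function applies to the transfer test by supplying a 6-item key.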

The transfer test assesses students’ ability to use their knowledge and skills in new situations. The students were presented with a computer that had been assembled, and were then asked to identify the names of each of its components. They had not previously been presented with a cross-sectional view of the assembled computer. The transfer test consisted of 6 multiple-choice questions each with one correct answer. For instance, “Please choose the name of the hardware that corresponds to the mark labeled “6” from the available options.” The nine options were: (A) Graphics card, (B) Memory sticks, (C) Power supply, (D) CPU, (E) Hard disk, (F) Cooler, (G) Motherboard, (H) Case cover, and (I) I don’t know. The maximum score for each test was 6.

In total, we conducted two tests. The first test measured retention 1, which had a Cronbach’s α of 0.803 (control group: 0.719, VETC: 0.771, VENC: 0.754, DMTC: 0.707, and DMNC: 0.867). The second test measured retention 2 and transfer, which had a Cronbach’s α of 0.812 (control group: 0.843, VETC: 0.646, VENC: 0.669, DMTC: 0.847, and DMNC: 0.872).
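For readers unfamiliar with the reliability coefficient reported here, the following is a minimal sketch of how Cronbach’s α is computed from an item-score matrix (one row per respondent, one column per item). The data are invented for illustration; the study's item-level responses are not available in the text.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (respondents) of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)),
    using sample variances throughout.
    """
    n_items = len(scores[0])
    if n_items < 2 or any(len(row) != n_items for row in scores):
        raise ValueError("need at least two items and equal-length rows")
    item_columns = list(zip(*scores))
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)


# Invented 0/1 item scores for five respondents on four items.
example_scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
example_alpha = cronbach_alpha(example_scores)
```

Values closer to 1 indicate that the items vary together, which is why α is read as a measure of internal consistency.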

Subjective measures

The subjective measures in this study comprised two components: cognitive load and learning experience in desktop virtual reality (VR). The cognitive load scale, which included three questions, was adapted from Paas (1994). The learning experience scale was adapted from Lee et al. (2010) and included five dimensions: presence, motivation, cognitive benefits, perceived learning effectiveness, and satisfaction. In total, the subjective measures contained 19 questions, all of which were assessed on a five-point Likert scale. The measures demonstrated good reliability and validity, with a Cronbach’s alpha coefficient of 0.935.

Procedure

The online teaching was delivered using online conferencing software. Each group spent approximately 30–40 min learning. Students in the direct manipulation groups (DMTC and DMNC) began by downloading the VR application onto their personal computers before the experiment. They received initial guidance from the teacher and a presentation slide outlining the usage of the Computer Assembly VR Application. Afterward, they independently acquired knowledge about computer hardware and proceeded to execute assembly tasks on their personal computers. Within the application, an on-screen character (see Fig. 1) was integrated to deliver conversations and provide essential information to guide students through the different stages of the application. For the vicarious groups (VETC and VENC), the teacher pre-installed the application before the experiment and proceeded to demonstrate the VR content (i.e., computer hardware knowledge) as well as the assembly process, which the students observed and learned from. The control group listened to the teacher’s lecture with a PowerPoint presentation. After the learning session, the students were given retention tests and also the Cognitive Load Scale. About a month later, the students were assessed again on the retention test and took a transfer test. Before the end of the experiment, a survey was conducted to collect their perceptions of the learning experience.

Data analysis

Kruskal-Wallis test of observed values and visual inspections of their histograms, normal Q-Q plots and box plots, showed that the data were normally distributed for all five conditions. ANOVAs were therefore used. The dependent variables included the retention and transfer test scores, the cognitive load score, and the learning experience score.
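As an illustration of the one-way ANOVA used here, the sketch below computes the F statistic, its degrees of freedom, and η² from raw group scores. The scores are made up, and the p-value is omitted because it requires the F distribution, which a statistics package would normally supply.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_value, df_between, df_within, eta_squared


# Made-up scores for three small groups (not the study's data).
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df1, df2, eta_sq = one_way_anova(example_groups)
```

With five groups and 67 participants, the same degrees-of-freedom formulas give df = (4, 62), matching the F(4,62) notation used in the Results.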

Results

Learning performance

Table 1 shows an overview of the descriptive statistics across five groups. Table 2 shows an overview of main effect and interaction of independent variables.

Table 1 Means and standard deviations of dependent variables across the five conditions
Table 2 Main effect and interaction of independent variables

Retention 1

The one-way ANOVA indicated significant differences in the first retention scores of the five groups, measured immediately after the learning session (F(4,62) = 3.556, p = .011, η² = 0.187). Post hoc Tukey’s HSD tests showed that the control group scored significantly lower than the VENC group (see Fig. 3). There were no significant differences between any two of the four experimental groups.
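The reported effect size can be cross-checked from the F statistic and its degrees of freedom, since for a one-way ANOVA η² = SS_between / SS_total = (F · df1) / (F · df1 + df2). The helper name below is an assumption introduced for illustration; only the F, df, and η² values come from the text.

```python
def eta_squared_from_f(f_value, df_effect, df_error):
    """Recover eta squared (partial eta squared in factorial designs)
    from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values reported for the first retention test: F(4,62) = 3.556.
eta_retention_1 = eta_squared_from_f(3.556, 4, 62)
rounded = round(eta_retention_1, 3)
```

Rounded to three decimals, the result agrees with the reported η² of 0.187.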

Fig. 3 Differences in the first retention across five conditions

We conducted two-way ANOVAs for the four experimental groups. The main effects of textual cues (F(1,48) = 0.892, p = .350, η² = 0.018) and VR instructional approaches (F(1,48) = 0.005, p = .942, η² = 0.000) were not significant. The interaction between the two independent variables was of moderate size but did not reach significance (F(1,48) = 3.780, p = .058, η² = 0.073), as shown in Fig. 4.

Fig. 4 Interaction of first retention between cues and VR instructional approaches

Retention 2

The one-way ANOVA indicated no significant differences in the second retention scores across the five groups (F(4,62) = 1.557, p = .197, η² = 0.091), although the control group scored descriptively lower than all four experimental groups (see Fig. 5).

Fig. 5 Differences in the second retention across five conditions

The two-way ANOVA results showed that the main effects of textual cues (F(1,48) = 0.829, p = .367, η² = 0.017) and VR instructional approaches (F(1,48) = 3.131, p = .083, η² = 0.061) in the second retention were not significant. The interaction between cues and VR instructional approaches in the second retention test was also not significant (F(1,48) = 0.609, p = .439, η² = 0.013).

Transfer

The one-way ANOVA indicated significant differences in transfer performance among the five groups (F(4,62) = 7.053, p < .001, η² = 0.313). The post hoc Tukey’s HSD tests indicated significant differences between several group pairs: the DMNC group and the VENC group, the DMNC group and the VETC group, the DMTC group and the VENC group, as well as the DMTC group and the VETC group (see Fig. 6).

Fig. 6 Differences in the transfer across five conditions

The two-way ANOVA showed that the VR instructional approach significantly affected students’ transfer performance (F(1,48) = 34.14, p < .001, η² = 0.416), with the direct manipulation groups (DMTC and DMNC) outperforming the vicarious experience groups (VETC and VENC). The main effect of cues was not significant, and there was no interaction between cues and VR instructional approaches.

Cognitive load

The one-way ANOVA results showed that there were no significant differences in cognitive load among the five groups (F (4,62) = 0.939, p = .447, η²= 0.057), but the cognitive load of the four experimental groups was higher than that of the control group (see Table 1).

Furthermore, two-way ANOVAs were conducted on cognitive load for the four experimental groups. The main effects of the two independent variables were not significant. There was a small interaction effect between them (F (1,48) = 0.640, p = .428, η²= 0.013) although the effect was not significant, as shown in Fig. 7.

Fig. 7 Interaction of cognitive load between cues and VR instructional approaches

Learning experience

We also analyzed students’ learning experience in the four experimental groups. The one-way ANOVA results showed significant differences in overall learning experience among the four groups (F(3,48) = 5.077, p = .004, η² = 0.241). At the level of individual dimensions, the differences were significant for cognitive benefits (F(3,48) = 4.824, p = .005, η² = 0.232), perceived learning effectiveness (F(3,48) = 6.597, p = .001, η² = 0.292), and satisfaction (F(3,48) = 7.615, p < .001, η² = 0.322), but not for presence (F(3,48) = 1.777, p = .164, η² = 0.100) or motivation (F(3,48) = 1.764, p = .167, η² = 0.099). The post hoc Tukey’s HSD tests indicated significant differences in total learning experience between the DMNC group and the VENC group and between the DMTC group and the VENC group. For cognitive benefits, the DMNC group differed significantly from the VENC group. For perceived learning effectiveness, there were significant differences between the DMNC group and the VENC group, between the DMNC group and the VETC group, between the DMTC group and the VENC group, and between the DMTC group and the VETC group. For satisfaction, there were significant differences between the DMNC group and the VENC group, between the DMTC group and the VENC group, and between the DMTC group and the VETC group.

The two-way ANOVA results showed that students reported higher levels of presence, motivation, cognitive benefits, perceived learning effectiveness, and satisfaction in the direct manipulation groups (DMTC and DMNC) compared to the vicarious experience groups (VETC and VENC). This difference was significant, with p-values less than 0.05 for all five dimensions, indicating a clear main effect of the VR instructional approach. However, the main effect of textual cues on learning experience was not significant, and there was no significant interaction between the textual cues and VR instructional approaches.

Discussion

The objective of this study is to investigate the impact of VR instructional approaches (direct manipulation vs. vicarious experience) and textual cues on students’ learning outcomes, cognitive load, and learning experience.

Contributions to theory

RQ1: Are online VR instructional approaches more effective than traditional online conferencing classes?

Overall, our study demonstrates that an interactive, well-designed VR application and vicarious VR experience are both feasible instructional approaches for effective learning. Participants performed well on the retention and transfer tests, and their learning experience was positive, with no evidence of excessive cognitive load under the VR instructional approaches. This is consistent with several other studies showing that desktop VR improves learning outcomes, interest, and engagement more than traditional classes do (Dubovi et al., 2017; Lee & Wong, 2014; Ogbuanya & Onele, 2018), as well as with the results of a meta-analysis (Yu & Xu, 2022). The findings of this study further extend educational desktop VR theories by demonstrating the potential of vicarious VR learning and remote VR learning.

In addition, our results suggest that students who learned through vicarious experience, which involves observing an instructor’s manipulation and explanation, demonstrated significantly higher immediate knowledge acquisition than those who attended traditional online conferencing classes. For longer-term retention and transfer, however, a well-designed self-study program based on direct manipulation in VR can achieve better learning outcomes than traditional online learning. Previous work evaluated only the immediate learning effects (Dubovi, 2022). This study therefore broadens our understanding of how vicarious VR learning affects longer-term learning gains.

We also examined the cognitive load of students across the five groups and found that, while the cognitive load was higher in the four VR conditions than in the traditional online condition, there were no significant differences among them. VR does require participants to use more cognitive resources to process information, but this additional load did not harm the students’ performance and may even have helped to facilitate knowledge acquisition. Our study suggests that, when pedagogical factors are considered, the visual presentation of VR learning material need not overwhelm learners. This also supports the importance of the pedagogical aspect of VR design (Zhou et al., 2018).

RQ2: Can the incorporation of textual cues have an effect on learning within virtual reality contexts?

The overall findings showed that the presence of textual cues neither facilitated nor impeded learning. The retention tests showed no significant difference in learning outcomes between students who had received textual cues and those who had not. Similarly, there was no significant difference between groups with and without textual cues on the transfer test. Research on the impact of cueing on learning from animations has yielded mixed findings. Some research indicates that cueing can improve cognitive processes (Koning et al., 2009), whereas other research does not. For example, Albus et al. (2021) found that annotations in VR can assist learners with recall-level information processing. Wang et al. (2020) found that textual cues primarily guide students’ attention and promote only the cognitive process of selecting information. In contrast, we observed that incorporating textual cues into VR learning had little impact on students’ learning. One possible explanation is that textual cues have different effects on the learning of different types of knowledge. The distinct nature of knowledge types leads to differing impacts of design factors on learning, a phenomenon that has also been observed in empirical studies of other design factors (Skulmowski, 2022). Our VR learning program includes two types of learning. One type involves conceptual learning of 3D objects, such as understanding what a graphics card is and how it looks. This type of learning does not involve associations, memory, or imagination of 3D object positions, and therefore does not deal with spatial knowledge. The other type involves learning the positions of 3D objects, which directly engages spatial knowledge. These two types correspond, respectively, to the computer hardware knowledge learning and the assembly learning in the VR program described in this study. Textual cues were manipulated only in the hardware knowledge learning; the assembly learning contained no such cues in any of the relevant groups. In our review of previous research, we observed that cues tend to yield positive learning effects for spatial knowledge, which concerns the relative positions of objects (Kuipers, 1982; Qiu et al., 2020). For instance, Wang et al.'s (2020) research employed learning materials related to the geographical features of the Jade Dragon Snow Mountain, and the study by Albus et al. (2021) involved learning tasks concerning the procedure of seawater desalination. A comparison of previous research with our own findings suggests that learning of conceptual knowledge that does not involve spatial knowledge does not benefit from textual cues.

Our findings contribute in two significant ways. First, they enrich the application of the cognitive theory of multimedia learning within VR settings. At present, most of the literature focuses on the application of multimedia learning principles in general, while empirical studies of these principles in VR remain limited (Ceken & Taskin, 2022). The cueing principle is a multimedia learning technique that directs students’ attention and decreases extraneous cognitive load (Mayer, 2014). This study applies the cueing principle to VR and investigates the impact of textual cues on students’ retention and transfer performance, cognitive load, and learning experience. Second, we offer new evidence regarding the cueing principle and clarify certain boundary conditions that determine when the principle is effective and when it is not. Based on an examination of several empirical studies and our own, we conclude that textual cues are generally beneficial for acquiring spatial knowledge, but have little effect on the acquisition of conceptual knowledge that does not involve spatial associations.

RQ3: How does learning through vicarious experience compare to direct manipulation in terms of knowledge gain, cognitive load, and learning experience?

Our results showed that, compared with vicarious experience, direct manipulation resulted in significantly better transfer performance and a significantly better learning experience across all five dimensions (presence, motivation, cognitive benefits, perceived learning effectiveness, and satisfaction). However, retention performance and cognitive load did not differ significantly between the two VR instructional approaches. We therefore conclude that effective interactive learning is superior, while vicarious learning can, to a certain degree, yield comparable outcomes.

First, interaction does have an impact on the learning process. Embodied cognition theory (Lakoff, 2012) posits that people construct their comprehension of the world through interactions with their environment, and that these interactions shape their cognitive processes. It emphasizes that human cognition extends beyond the confines of the brain and is influenced by bodily perception, actions, the environment, and cultural context. However, the effectiveness of interaction depends on it being well designed, as opposed to allowing learners to engage in unguided, aimless exploration, which often produces ineffective outcomes. The study by Keehner et al. (2008) demonstrates that effective interaction yields superior learning outcomes compared to ineffective interaction. This explains why some empirical studies have suggested that interaction is not always effective, leading to mixed findings within interaction-related research. In the present work, the VR program was thoughtfully designed. It incorporates structured conversations that regulate the learning process and prompt learners when interactions should occur, rather than allowing them to navigate aimlessly. As a result, we observed that interaction can indeed facilitate learning, particularly with respect to knowledge transfer. This aligns with the findings of Jang et al. (2017), who compared medical students who either directly manipulated a virtual anatomical structure or passively viewed an interaction in a 3D environment. They found that direct manipulation was more effective than passive viewing.

Furthermore, we found that vicarious learning can achieve outcomes comparable to interactive learning in terms of immediate knowledge acquisition. This finding aligns with Dubovi (2022), who found that learning through observation in a VR environment can result in immediate knowledge gain comparable to that of direct VR interaction. Building upon this, our research revealed a further point: after a delay, knowledge acquired through vicarious learning can be retained at the same level as knowledge acquired through effective interactive learning. This sheds light on a previously unexplored aspect, as it was uncertain whether vicarious learning would maintain the same positive effect on knowledge retention over time. Nevertheless, with respect to knowledge transfer, hands-on interaction does yield a more positive outcome. During vicarious learning, students are unable to interact with the material being presented on the computer. As a result, they may lack the opportunity for deep processing and organization of knowledge.

In conclusion, we recommend that, when feasible, the integration of interactive learning for students should be considered, with a strong emphasis on ensuring the effectiveness of the interactive design. However, when conditions are constrained, vicarious learning can still yield outcomes in knowledge acquisition comparable to those of interactive learning.

RQ4: Do textual cues and VR instructional approaches interact in their impact on learning?

Finally, we discuss the interaction between the two independent variables in the study. A moderate interaction in the results of the first retention test indicated that the learning effect with or without textual cues was related to whether students could directly control VR for learning. When students learn vicariously by viewing the instructor’s operation and listening to the explanations in VR, the learning effect without cues is superior to that with textual cues, and the reported cognitive load is lower. We believe this may be a result of the redundancy effect (Mayer, 2014; Plass et al., 2020), which has been observed in many previous studies (Gerjets et al., 2009; Baceviciute et al., 2022). The redundancy principle states that learning is hindered by redundant information, because processing redundant information concurrently produces unnecessary cognitive load. When textual cues are added to vicarious VR, they conflict with the teacher’s spoken explanations, which increases cognitive load. In direct manipulation VR, however, the presence of cues helps and reduces the load. According to Mayer and Anderson (1992), a high degree of contiguity in time and space is the foundation of successful integration, i.e., images must be close to the corresponding text in time and space. When students click on a piece of hardware in VR, a textual cue appears next to it to facilitate information connection (for example, what a graphics card is and how it looks). Thus, we conclude that textual cues can boost direct manipulation VR learning, whereas they are not necessary for vicarious experience when students are processing recall-level information. This finding serves as a reminder that, when designing vicarious VR learning experiences, it is better to avoid introducing redundant textual cues so that learners do not spend effort on unnecessary visual information.

Contributions to practice

First, the present study has shown that incorporating VR into online teaching can enhance teaching by offering a range of VR instructional approaches. This research is particularly useful for educators who are interested in integrating VR into their online teaching practices. Furthermore, vicarious VR learning allows students to experience the VR environment without requiring direct manipulation. This approach can not only maintain the effectiveness of learning and provide a good learning experience, but also integrate the teacher’s instruction with the students’ experiential learning.

The second important contribution to practice relates to the design of cues for VR instructional approaches. Although cues can enhance learning in some VR environments, our research has shown that cues are not always effective. Whether they facilitate or hinder learning depends on the type of instruction. This not only extends theories about cues in instructional VR, but also provides guidelines for the design of textual cues in VR.

Finally, the implementation of a VR instructional environment in remote regions is economically unfeasible, potentially giving rise to a digital divide and educational inequality (Jones et al., 2022). Therefore, research on vicarious VR learning has the potential to mitigate the digital divide and reduce educational inequality. Our study investigated the effectiveness of vicarious VR learning and provides an illustrative example. In rural areas where there may be limited access to VR devices, teachers can provide educational VR through recorded videos of vicarious VR learning experiences shared by others. This allows online learners to experience the benefits of VR-enhanced teaching despite limited access to technology.

Conclusion, limitations and future directions

This work examines the effects of VR instructional approaches (direct manipulation and vicarious experience) and textual cues on online learning, and investigates the interaction between these two independent variables. Our findings revealed that students acquired knowledge immediately more effectively through the vicarious VR experience than through traditional online conferencing classes. However, there was no significant positive effect of vicarious VR on long-term retention, transfer, and overall learning experience. Textual cues did not affect learning in general. For immediate knowledge gain, however, textual cues provided a positive boost to learning in VR experiences that required direct manipulation, whereas they were not necessary in vicarious VR experiences. The findings extend the cueing principle to the context of VR and add new insights to the academic discussion on VR instructional approaches for online learning.

However, the present study has several limitations that should be addressed in future investigations. First, computer assembly involves operational learning, which was not thoroughly evaluated in our experiment’s retention and transfer tests due to the constraints of online learning. Given the significance of operational learning, future research should examine the impact of VR instructional approaches on operational knowledge and skill development. Second, the present study focused on the effects of vicarious learning and cues on knowledge acquisition, whereas reasoning and problem-solving involve higher-level cognitive processes. Future work will therefore examine the impact of vicarious experiences on reasoning and problem-solving skills. Third, the impact of vicarious experience on information processing, specifically the selection, organization, and integration of information, remains unclear. In future research, we intend to gather data via electroencephalography (EEG) and eye-tracking techniques to better understand these internal cognitive processes. Fourth, it is unclear whether other multimedia learning principles facilitate or obstruct vicarious VR learning, because our current research focuses only on the cueing principle. We will investigate the impact of other design principles on vicarious learning in the future. Fifth, the vicarious experience in this study was generated solely from the teacher’s experience. Vicarious experiences can take a variety of forms, such as those stemming from peers or groups, and identifying the optimal type of vicarious VR experience for promoting effective learning is a future direction. Finally, the present study was conducted over a rather short period of time. Short interventions may not adequately capture the full spectrum of effects and variations that can occur over time. This limitation, however, serves as a stepping stone for future research. We will investigate whether the findings observed in this study can be sustained throughout longer periods of learning. Future studies can bridge the gap left by the present study by adopting an extended research period, increasing the credibility and applicability of the findings in various contexts.