1 Introduction

Virtual Reality (VR) and Augmented Reality (AR) technologies are currently receiving a great deal of attention, thanks in large part to the commercial availability of new immersive VR/AR platforms (Microsoft HoloLens, n.d.; Oculus Rift, n.d.) and lower-cost standalone VR/AR platforms such as the Oculus Quest (Oculus Quest 2019). Additionally, frameworks are quickly appearing to make VR/AR development easier for the web (A-Frame, n.d.), through plugins into popular game engines (Valve, n.d.), and with the technology built directly into the operating systems of mobile platforms (Apple, n.d.). While these technologies first appeared in research and development dating back to the middle of the twentieth century (Azuma 1997; Mazuryk and Gervautz 1996), there is tremendous human interest in the concept of simulating reality, which can be seen within fiction as early as the 1930s (Weinbaum 1935), and much earlier within the philosophical realm, when humans started to consider whether our perceived reality is an “absolute” reality, rather than merely “shadows on a cave wall” (Plato and Lee 1974), “a dream” (Descartes and Cress 1993) or a robust “computer simulation” (Bostrom 2003).

Current lower-cost and higher-fidelity VR/AR technological developments such as increased resolution, reduced latency, and higher framerates have raised hopes for more mainstream and diverse applications within the fascinating area of simulating and augmenting/enhancing reality, in which we are no longer bounded by physical spaces and the physics of the known universe. We are seeing an explosion of experimentation and development of novel applications within VR/AR forms such as gaming (Keep Talking and Nobody Explodes, n.d.; Pokémon Go 2016; Star Trek Bridge Crew n.d.), film (Dear Angelica n.d.; Google Spotlight Stories, n.d.), social communities (Mozilla Hubs 2018; Rec Room, n.d.; VRChat, n.d.), and, most interestingly for this survey, educational endeavours (Dede 2009; Dunleavy et al. 2009; Grotzer et al. 2015; Ketelhut et al. 2010; Salzman and Dede 1999; Schrier 2006).

1.1 Problem statement and contribution

Though many educational endeavours use technology to make learning more motivating and effective, and the use of VR/AR in education is not new, there are many facets of VR/AR use that could be improved. Specifically, evidence of effective learning gains using VR or AR technologies is ambiguous (Dalgarno and Lee 2010; Fowler 2015), and generalizing results carries many caveats (Dede and Richards 2017; Merchant et al. 2014). There is also minimal focus on theoretical backgrounds grounded in learning theory (Dalgarno and Lee 2010; Fowler 2015), minimal research into combining VR and AR, and very few explorations into how we can better acknowledge the social properties of social learning spaces where we interact together in co-located areas such as classrooms and museums. In summary, the areas covered by this review are:

  • Multi-user, specifically closely coupled (collaborative), VR/AR theory and interaction.

  • Responsive, multi-platform VR/AR content that adjusts functionality, interaction input, and display output depending on the platform accessing the content. This adaptivity aligns with Universal Design for Learning (Rose et al. 2006) methodologies for increasing the accessibility of learning materials.

  • The use of VR and/or AR in social, educational contexts such as the more formal classroom and informal educational institutions such as museums.

  • Suggested learning theories that may provide for more complete reflections of how learning happens in social learning spaces, with greater consideration of how the embodied, social, and spatial environments affect learning.

2 Methods

This literature review is a broad and qualitative overview of the use of VR and AR within a social education context, serving as an entry point into a discussion and more sophisticated analysis of the present and near-future of VR/AR in learning. To limit the size and scope of a paper on a subject with such an overwhelming amount of literature, we focused on building upon prior surveys (Clarke-Midura et al. 2011; Dalgarno and Lee 2010; Dunleavy and Dede 2014; Freina and Ott 2015; Salzman and Dede 1999; Shin 2017), then expanding upon the gaps found in social interactions within VR/AR, the considerations of VR and AR combined into the same educational framework and platform, and how VR/AR in education could be more accessible through multi-platform implementations.

Additionally, to bring more recent literature into this survey, which was predominantly completed in late 2018, and to address some of the gaps noted since then, we also build upon the use of head-mounted display (HMD) VR in classroom learning spaces (Southgate et al. 2019), as it has been suggested that few classroom studies evaluate HMD VR (Markowitz et al. 2018), and embodied design principles in VR and learning (Johnson-Glenberg 2018). With these key papers, we then applied a snowball method, following papers cited within articles until no new relevant articles were identified. Where possible, we also used more focused keyword searches using the terms “education, learning, multi-user, social, museum, virtual reality, augmented reality, mixed reality” within Google Scholar, IEEE, and ACM databases.

3 What is VR and AR?

While VR and AR share many similar technologies, such as various tracking sensors and displays, they represent two different approaches in blending the physical and virtual world realities. VR and AR are defined as the following:

  • Virtual Reality (VR): “an artificial environment which is experienced through sensory stimuli (such as sights and sounds) provided by a computer and in which one’s actions partially determine what happens in the environment” (Jerald 2015; Merriam-Webster, n.d.).

  • Augmented Reality (AR): “AR allows the user to see the real world, with virtual objects superimposed upon or composited with the real world. Therefore, AR supplements reality, rather than completely replacing it” (Azuma 1997).

AR traditionally overlays digital content onto a live view of the environment, often as a camera view with mobile platforms, or a see-through display, as found in wearable AR platforms such as Microsoft HoloLens (Microsoft HoloLens, n.d.). VR technology, conversely, aims to immerse users within a completely artificial environment with various forms of technology to address one or more senses. HMD VR is often referred to as Immersive Virtual Reality (IVR) because it provides a stereo display, spatial audio, and controllers (or hand-tracking) for interactions and haptic feedback; but there are many other forms of VR with varying degrees of immersion, such as handheld displays and projected walls, in addition to HMDs (Buxton and Fitzmaurice 1998).

Both VR and AR fall subjectively on the Reality-Virtuality (RV) continuum first proposed by researchers Milgram et al. (1995). In this continuum, “reality” lies at one end, and “virtuality” (VR) lies at the other, with Mixed Reality (MR) displays placed between, denoting a category of displays that blend reality and virtuality to varying degrees. There have also been recent attempts to rename the entire RV continuum as XR (Extended Reality) (Extended Reality, n.d.) or “Spatial Computing” (Greenwold 2003) to denote all MR platforms and the edge cases of complete Virtuality (immersive VR) and Reality, but to avoid confusion in this paper, we will adhere to the less ambiguous descriptions of VR and AR.
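As an illustrative sketch only, the continuum can be modelled as a single value between 0 (entirely real) and 1 (entirely virtual), with everything strictly in between treated as Mixed Reality. The numeric scale, the function name `classify_display`, and the `Category` names are assumptions introduced here for illustration; Milgram et al. describe the continuum qualitatively and do not assign numbers to displays.

```python
from enum import Enum


class Category(Enum):
    REALITY = "reality"
    MIXED_REALITY = "mixed reality"
    VIRTUALITY = "virtuality"


def classify_display(virtuality: float) -> Category:
    """Place a display on a hypothetical 0..1 Reality-Virtuality scale.

    0.0 means only the physical world is shown; 1.0 means the view is
    entirely virtual. The scale itself is an assumption of this sketch.
    """
    if not 0.0 <= virtuality <= 1.0:
        raise ValueError("virtuality must lie in [0, 1]")
    if virtuality == 0.0:
        return Category.REALITY
    if virtuality == 1.0:
        return Category.VIRTUALITY
    # Any blend of real and virtual content falls between the two ends.
    return Category.MIXED_REALITY
```

Under this sketch an AR overlay and a mostly virtual scene with some real-world video would both be classified as Mixed Reality, differing only in where they sit between the two end points.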

3.1 Immersion, presence, and embodiment

The degree to which an individual accepts that a virtual world is real is generally referred to as presence and is an important part of bringing an individual into a virtual space. Note that the terms presence and immersion are sometimes used interchangeably, but most now accept the following definitions. Additionally, we define embodiment below, as it and presence are referred to as the “two profound affordances of VR” (Johnson-Glenberg 2018):

  • Immersion: what the technology delivers from an objective point of view. The more technologies that cover various sensory modalities, in relation to equivalent human real-world senses, the more immersive it is (Bowman and McMahan 2007). For example, a 2D display is less immersive than a stereoscopic display that tracks head movement.

  • Presence: the point at which an individual begins to accept an artificial reality as reality. It includes two main illusions to be accepted by an individual—(1) the place illusion (that where they are is actually real) and (2) the plausibility illusion (that what is happening is actually happening) (Slater 2009).

  • Embodiment: describes the mental representations of the body within space—which can be physical and/or virtual. The three main components of embodiment are (1) body ownership (sense that the body inhabited is one’s own), (2) self-location (being in the place where one’s body is located), and (3) agency (that an individual can move and sense their own body) (Borrego et al. 2019). Embodiment is considered an integral part of learning (Johnson-Glenberg 2018).

Zimmons and Panter find that making worlds more photorealistic does not necessarily increase presence (Zimmons and Panter 2003), and Jerald suggests that complete presence is reached by focusing on world stability, depth cues, physical user interactions, cues of one’s own body, and social communication (Jerald 2015). Additionally, there are several trade-offs to consider when developing VR applications, such as how closely the visuals match reality (representation fidelity), how closely the interactions match reality (interaction fidelity), and how closely the perceived experiences match reality (experiential fidelity) (Jerald 2015). For some, more immersive VR HMDs may also induce nausea. This is often called cybersickness (LaViola Jr 2000; McCauley and Sharkey 1992), and is likely due to perceived differences between the spatial orientation of the VR visuals and the spatial orientation of the body’s balancing system, known as visual-vestibular conflict (VVC) (Akiduki et al. 2003). Cybersickness appears to be aggravated by VR experiences that are more action-oriented and in individuals who do not have a predisposition to high adrenaline sports (Guna et al. 2019).

3.2 Interaction methods

Bowman and Hodges define interactions within Virtual Environments (VEs) as concerned with three main task categories: viewpoint motion control (navigation), selection, and manipulation (Bowman and Hodges 1999). Furthermore, these selection and manipulation techniques can be classified into six interaction metaphors. LaViola et al. describe these metaphors as grasping (e.g. using a virtual hand), pointing (e.g. ray-casting), surface (e.g. using a 2D multi-touch surface), indirect (e.g. ray-cast select then perform additional multi-touch gestures to modify without directly selecting the object of interest), bimanual (using two hands to interact), and hybrid (interaction technique changes depending on context of selection) (LaViola Jr. et al. 2017).

Additionally, the consideration of social interactions in VR/AR is important, as learning methodologies such as “Learning Together and Alone” (Johnson and Johnson 2002) and Computer-Supported Collaborative Learning (Stahl and Hakkarainen 2020) suggest that closely coupled collaborative interactions enhance learning. Interestingly, some of these collaborative dynamics can be framed within a social interdependence model (positive = collaborative, negative = competitive) where “the accomplishment of each individual’s goals is affected by the actions of others” (Johnson and Johnson 1989). This type of “closely-coupled collaboration” (García et al. 2008) is evident in works such as Schroeder et al.’s (2001) “Rubik’s Cube puzzles”, the narrative-based constructionist works of the NICE project (Roussos et al. 1997), and numerous studies on how collaborative manipulation can happen within VR/AR environments (Aguerreche et al. 2010; García et al. 2008; Pinho et al. 2002). Additionally, the virtual gazebo building project by Roberts et al. (2003) broke down tasks into sub-tasks that required multiple users to complete concurrently in both “distinct attribute” (e.g. one holds a wooden beam and the other screws a hole) and “same attribute” (e.g. both users need to pick up an object that is too heavy for one) tasks. These categories of distinct and same attributes are further divided into asynchronous (sub-tasks completed sequentially) and synchronous (sub-tasks completed concurrently) tasks by CVE researchers Otto et al. (2006).
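The two-axis classification described above (distinct versus same attribute, synchronous versus asynchronous) can be sketched as a small data structure. This is a minimal illustration, not a structure taken from Roberts et al. or Otto et al.: the class, field, and method names are hypothetical, and the third task in the list is an invented example added to show the asynchronous case.

```python
from dataclasses import dataclass
from enum import Enum


class Attribute(Enum):
    DISTINCT = "distinct attribute"  # e.g. one user holds a beam, another fixes it
    SAME = "same attribute"          # e.g. two users lift one heavy object


class Timing(Enum):
    SYNCHRONOUS = "synchronous"      # sub-tasks completed concurrently
    ASYNCHRONOUS = "asynchronous"    # sub-tasks completed sequentially


@dataclass(frozen=True)
class CollaborativeTask:
    description: str
    attribute: Attribute
    timing: Timing

    def needs_concurrent_users(self) -> bool:
        # Only synchronous tasks require users to act at the same time.
        return self.timing is Timing.SYNCHRONOUS


TASKS = [
    CollaborativeTask("one user holds a beam while another fixes it",
                      Attribute.DISTINCT, Timing.SYNCHRONOUS),
    CollaborativeTask("two users lift an object too heavy for one",
                      Attribute.SAME, Timing.SYNCHRONOUS),
    # Hypothetical example, not from the cited studies.
    CollaborativeTask("one user decorates an object after another places it",
                      Attribute.DISTINCT, Timing.ASYNCHRONOUS),
]

# Descriptions of the tasks that cannot be completed by users taking turns.
concurrent = [t.description for t in TASKS if t.needs_concurrent_users()]
```

Keeping the two axes as separate enumerations, rather than four fixed task types, makes it straightforward to filter tasks by either property when deciding which interactions a multi-user environment must support at the same moment.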

Pinho et al. (2002) also expand upon this CVE framework by including four considerations when developing a virtual environment for collaboration—see Table 1. These considerations echo similar principles we see when defining Reality-Based-Interactions (RBIs), i.e. considerations for naïve physics and body-awareness & skills for RBIs (Jacob et al. 2008) and social immersive media, i.e. considerations for socially scalable and socially familiar interactions (Snibbe and Raffle 2009).

Table 1 Pinho et al.’s considerations for “usable and useful” cooperative manipulation techniques (Pinho et al. 2002)

3.3 Interaction methods between VR and AR

Collaboration is not limited to users of the same medium (e.g. just VR or AR). Though research is limited within an educational context, there is some interesting work on the use of VR and AR collaborative techniques. Grasset et al.’s studies point towards negligible effects on task performance and note interesting possibilities whereby the environment provides physical interaction affordances with the use of AR and VR for collaborative tasks (Grasset et al. 2008). Unfortunately, this study does not seem to sufficiently answer whether some tasks are better suited to VR or AR and what the effects of technology limitations are. There are many instances of digital technology granting each user a unique view as a positive affordance in multi-user interactions (Smith 1996). However, cognitive load can become an issue with varying perspectives, as seen in Yang and Olson’s paper studying the use of egocentric and exocentric views in collaborative tasks: “One lesson learnt is that it is harmful to correlate views across sites in a way that requires real-time effortful mental operation such as mental rotation” (Yang and Olson 2002).

There has also been some exploration into users collaborating at different scales in Multiscale Collaborative Virtual Environments (mCVEs) (Zhang and Furnas 2005), and though the “VARU framework” (Irawati et al. 2008) only uses AR and projection VR, it opens up some interesting discussion on how objects could have different descriptions (or “extensions”) in each VR or AR space. In learning, there are fewer examples, though there is some promising work that explores how a virtual museum could emulate the social experience of visiting a physical museum by allowing learners to interact with virtual artefacts with VR or AR together (Li et al. 2018). Interestingly, Li et al. note that learners were interested in additional interactions that could be completed together beyond seeing each other’s artefacts moving in the VE, and that unintentional collaboration happened when the AR experience could rotate a model that the VR experience could not, so learners had to work together to share information (Li et al. 2018).

4 Educational context for VR/AR

4.1 Overview of related pedagogy and theory

There are several learning theories often used to describe educational technology contexts. Merriam and Bierema categorize current learning theories as behaviourism, humanism, cognitivism, social cognitive theory, and constructivism (Merriam and Bierema 2013), but it is also worth considering learning theories that better acknowledge the interconnected and complex relationships we have with both the physical and digital environments, such as connectivism (Siemens 2005) and paradigms founded on activity theory such as CSCL (Stahl et al. 2006) and Expansive Learning (Engeström 2016). The selected learning theories that represent existing and potential foundations of learning within VR/AR social learning spaces follow.

  • Constructivism: Merriam and Bierema describe constructivism as a collection of perspectives, all of which share the common assumption that learning is how people construct meaning from their experience (Merriam and Bierema 2013). This theory focuses on the importance of learners actively constructing their knowledge via a more experiential model. Dewey referred to this as “genuine education” (Dewey 1938), and Vygotsky highlights that “this process is a social process mediated through a culture’s symbols and language” (Merriam and Bierema 2013; Vygotsky 1978). Additionally, constructivism is generally considered crucial to self-directed learning (Zimmerman 1989) and to Lave and Wenger’s concept of situated learning, which suggests that the environment helps to inform learning in individuals (Lave and Wenger 1991; Merriam and Bierema 2013). One of the better known experiential learning processes is Kolb’s learning cycle, which defines learning in four steps: concrete experience, reflective observation, abstract conceptualization, and active experimentation (Kolb 1984).

  • Social Cognitive Theory: proposed by Bandura to consider both the social and personal effect on individual activity and motivation (Bandura 1989). Schunk defines Social Cognitive Theory as learning that occurs within a social environment, through observation and emulation of others; that is, we learn by observing others and validating our efforts by their reactions (Schunk 1996). It is an essential consideration for any social VR system, as it helps us better understand how the social context can both help and hinder learning within the individual.

  • Connectivism: focuses on the concept that all learning occurs in a network, a connection of entities, within not merely the learner’s mind but also external nodes, such as “non-human appliances”, i.e. smartphones and the web. Siemens defines connectivism as driven by the understanding that decisions are based on rapidly altering foundations, that new information is continually being acquired and processed, and that the ability to draw distinctions between important and unimportant information is vital (Siemens 2005). Though connectivism is not yet accepted as an independent theory, some psychologists regard its concepts as compatible with, and usable in conjunction with, existing learning theories (Bell 2011).

  • Computer-Supported Collaborative Learning (CSCL): CSCL is likely an important part of any discussion of the use of VR in social learning spaces, as it is concerned with how learners collaborate using computers (Stahl et al. 2006; Stahl and Hakkarainen 2020). Though CSCL is not explicitly a learning theory, when considering the effect of the environment on learning it is essential to look towards learning frameworks that additionally examine the socio-cultural or socio-historical contexts of social learning spaces (Stahl and Hakkarainen 2020). For example, is it culturally acceptable or comfortable to use unfamiliar technology in front of others? These types of questions appear significant within a VR/AR context, where virtual environments can act as effectors or replacements for our physical learning environments. Addressing this, some CSCL frameworks build on the foundations of activity theory, a German and Marxist framework for describing human activity through a lens that considers the interconnected individual, objectives, and community (Engeström 1987; Stahl and Hakkarainen 2020), and the cognitive tools or artefacts used to mediate learning such as digital interfaces (Nardi 1996). Engeström suggests that the application of activity theory to learning provides for a more complete process-based learning alternative to Kolb’s experiential learning cycle (Kolb 1984) and Nonaka and Takeuchi’s four modes of knowledge conversion (Nonaka and Takeuchi 1995) by explicitly considering the socio-cultural contexts of social learning spaces and differentiation between instruction and self-guided learning (Engeström 2016).
Activity theory comprises several key elements: (1) the subject/individual participating in the activity, (2) the object, not tangible like a tool, but rather the “object” of direction that motivates activity, (3) the actions as conscious goal-directed processes to reach the object, and (4) operations as internalized sub-conscious processes to reach the object (Leont’ev 1978). Activity theory allows us to better understand interface interaction as a sequence of actions and processes (Cranton 2016; Kuutti and Bannon 1993) within constructivist learning environments (Jonassen and Rohrer-Murphy 1999). Though activity theory is often analysed concerning an individual, albeit with some input from the surrounding culture and community, there are versions that suggest that social interactions are significant within the learning sciences (Engeström et al. 1999). For example, instead of merely considering the individual and object (Leont’ev 1978), Engeström suggests that an activity contains three entities: the individual, the object, and the community within a proposed form of activity learning called expansive learning (Engeström 1987, 2016). Though CSCL is not exclusively concerned with any particular learning theory, those with activity theory foundations, such as expansive learning, remain significant considerations for VR/AR CSCL (Stahl and Hakkarainen 2020).

In the context of constructivism and experiential learning, it is important that learners can enter real-world situations and “authentic” environments that might otherwise be unavailable to them, due to monetary or physical space constraints. This type of learning is generally referred to as “situated learning”, where learning is situational (Stahl et al. 2006), which could also include socio-cultural aspects. Additionally, this learning can also be mediated through the use of tools, i.e. physical books, maps, or VR/AR (Engeström 2016; Merriam and Bierema 2013; Nardi 1996). Contextual learning is what researchers would refer to as near-transfer, “when evaluation is based on the success of learning as a preparation for future learning—researchers measure transfer by focusing on extended performances where students ‘learn how to learn’ in a rich environment and then solve related problems in real-world contexts” (Dunleavy and Dede 2014). We can also note that memory recollection is closely associated with environment (Chun and Jiang 1998, 2003; Smith 1979), and the power to recreate these “spatial contexts” as virtual spaces (or virtual environments) in VR/AR has great potential to help in the form of virtual “memory palaces” (Krokos et al. 2018). This is not unlike the extraordinary tradition of oral histories among some indigenous groups in Canada, which consist of stories passed down the generations, sometimes stories that can only be told “during certain seasons, at a particular time of day, or in specific places” (Hanson, n.d.).

Within this context, it is also worth mentioning modern teaching methodologies related to social cognitive theory such as “Learning Together and Alone” which focuses on increasing collaborative activities and group processing and reflection to enhance academic achievement (Johnson and Johnson 2002), and the importance of designing learning materials as consumable by multiple pathways (i.e. a document also being designed to be easily read by text readers) with a “Universal Design for Learning (UDL)” framework (Rose et al. 2006). Additionally, with an activity theory lens in VR/AR guided by CSCL, enhanced socio-cultural connections through distributed social activities and various virtual environments can be developed where VR/AR interactions are framed as process-based activities with digital tools (both physical and virtual).

4.2 Using a VR/AR platform in learning

Immersive 3D Virtual Learning Environments (VLEs), sometimes referred to as Educational Virtual Environments (EVEs), allow learners to explore environments and situations that would be impossible to visit in the real world (e.g. the abstract, such as non-Euclidean geometry, or the physically impossible, such as the surface of Venus), or even to collaborate at different scales (Irawati et al. 2008) or in different VR/AR spaces (Grasset et al. 2005, 2006). This is where digital tools can be of great use in developing VLEs. VLEs are limited only by the creators’ vision and computer hardware, allowing for significant opportunities for learners to experience situations and environments otherwise inaccessible. Motivation for these digital tools comes from our ability to use embodiment to aid learning via three constructs proposed by researchers: (1) the amount of sensorimotor engagement, (2) how congruent the gestures are to the content to be learned, and (3) the amount of immersion experienced by the learner (Johnson-Glenberg and Megowan-Romanowicz 2017). Epistemic action, described by Kirsh and Maglio as “physical actions that make mental computation easier, faster, or more reliable” (Kirsh and Maglio 1994), also suggests great potential for using digital tools in learning, enhancing arguments for connectivism (Siemens 2005) and the ability for digital tools to expand learning. This concept can be seen as an extension of embodied cognitivism where our bodies, or perceived bodies in the case of the “Proteus Effect” (Yee and Bailenson 2007), can influence our minds. Some work even suggests that this body transfer can be effective with non-human avatars (Stevenson Won et al. 2015). Most interesting are the forms of embodied cognition, categorized by Wilson (2002), that offer symbolic off-loading onto the environment, similar to connectivism, and situated cognition, which deals with spatial cognition within the context of real-world environments (Wilson 2002).
We can quickly see how VR/AR could utilize controls, such as motion controls, and avatar representation within the VE to help learners through the environment, thus triggering cognitive processes that help influence and enhance their learning.

4.3 Prior work into VR/AR learning platforms

Within previous reviews of the literature into 3D VLEs, it is generally considered that most research in this area does not have strong learning theory foundations, where constructivism is most often described (Dalgarno and Lee 2010; Fowler 2015). In this review, we build from the foundations of constructivism to also include social cognitive theory and CSCL as significant additions to the arguments for the use of VR/AR in education. The potential is large for digital technologies, such as VR/AR, to help recreate traditional learning experiences in both self-directed and social settings.

VR/AR can help create more immersive and experiential learning opportunities by encouraging self-learning via tangible and immersive construction tasks, not possible within current Learning Management Systems (LMSs). Additionally, the concept of context within situated learning/situated cognition theory, or more specifically the wide variety of possible settings in VEs, is strengthened when we can share these experiences with students’ peers to further enhance the effect of individuals sharing and learning from each other (Wenger 1998), propagated by theories of social cognitivism. The ability to jump from one environment and context to another, as virtual near-instantaneous field trips, is a powerful motivator for pursuing the use of VR/AR in education, and these “field trips” within VLEs (Bouras and Tsiatsos 2006) need not be based in reality. VR/AR allows for a more immersive exploration of abstract concepts such as electromagnetism, Newtonian dynamics, or molecular attachments (Salzman and Dede 1999). Additionally, “the potential advantage of immersive interfaces for situated learning is that their simulation of real-world problems and contexts means that students must attain only near-transfer to achieve preparation for future learning” (Dede 2009).

VLE researchers Dalgarno and Lee, extending upon the prior research of Wann and Mon-Williams, define the most significant affordances of these environments, from a learning theory perspective, as (Dalgarno and Lee 2010):

  • Enhanced spatial knowledge representation: VLEs can be used to facilitate spatial learning of environments and/or objects.

  • Greater opportunities for experiential learning: VLEs may increase experiential learning opportunities that would not be practical or possible in reality.

  • Increased motivation/engagement: VLEs increase engagement and motivation in learning.

  • Improved contextualization of learning: VLEs create more opportunities for learning within a context that better represents how that learning would be used in reality (e.g. learning how to speak publicly in a virtual auditorium). Steffen et al. (2019) expand on this further by noting that VR/AR affordances may include enhancing positive aspects, reducing negative aspects, and recreating aspects of physical reality.

  • Richer/more effective collaborative learning: VLEs may use the digital medium to offer greater collaboration possibilities (e.g. remote participation and innovative multi-user interactions; see Sect. 3.2).

Also, VLE researchers Salzman and Dede, who often base their VLE research and development around constructivism, suggest the three following affordances of VR technology as most significant (Salzman and Dede 1999):

  • Immersive 3D representations: VLEs allow for more detailed and richer 3D environments that help to create a greater sense of actually being somewhere else (Heeter 1992; Witmer and Singer 1998).

  • Multiple Frames of Reference (FORs): Being able to see environments and objects from various points of view aids learning (Erickson 1993). Additionally, one could also add that being able to see the world from others’ perspectives (Bertrand et al. 2018) can also be valuable for learning critical thinking, such as challenging one’s values and beliefs (Cranton 2016).

  • Multisensory Cues: Using multiple senses (i.e. visual cues, proprioceptive cues, auditory cues, etc.) in learning helps to deepen recall (Nugent 1982; Psotka 1995).

And finally, VLE researcher Shin also notes the following, additional, affordances (Shin 2017):

  • Empathy: Empathy and embodied cognition are two concepts that frequently arise in the discussions of VLE. People can understand and empathize when they comprehend another person’s subjective experience and environment (Bertrand et al. 2018).

  • Embodiment: A virtual body, an analog of the physical body, is used to interact within the virtual environment—an essential part of presence (Biocca 1997; Slater 2009) and learning (Johnson-Glenberg 2018).

Yuen et al.’s potential benefits of Augmented Reality in education echo the principles above (Yuen et al. 2011), but they also emphasize, along with educational AR researchers Dunleavy et al., that AR is a “good medium for immersive collaborative simulation” (Dunleavy et al. 2009), well suited to social, educational settings.

5 Familiar environments using VR/AR in learning

This section details the primary environments in which VR/AR technologies enhance learning effects. It is not an exhaustive list of all VR/AR experiences created for pedagogical purposes, but rather a selection of some interesting examples that aim to show the diversity of approaches within both the research and the commercial worlds within social learning spaces, or where there is exciting potential for further development (e.g. transformative learning within a social context). We will also break down the experiences into “education-type” categories, as some may not be formally acknowledged as educational endeavours.

5.1 Learning platform technology in education

Before discussing VR and AR platforms, we must explore traditional technology platforms in the form of LMSs, currently in widespread use across many post-secondary institutions. They allow students and instructors to communicate with each other via textual techniques such as forums, message boards, and email, and have been essential for running online or hybrid classroom-online courses. They also allow students to communicate with each other through forums and private groups, and this collaboration and communication has been considered invaluable (Preece 2000). This type of technology use within classrooms has also led to new kinds of classroom structures such as online-only classrooms, hybrid classrooms (a mix of online and face-to-face), and face-to-face classrooms that utilize the concept of the “Flipped classroom”, in which “that which is traditionally done in class is now done at home, and that which is traditionally done as homework is now completed in class”, to help personalize learning for each student (Bergmann and Sams 2012). Admittedly though, there is far too little research into flipped classroom effectiveness (Abeysekera and Dawson 2015; Bishop and Verleger 2013). However, the recent demand for learner-driven models within formal education contexts lends itself well to further research into how technology such as VR/AR can help augment and accommodate these current “learner-driven” objectives in formal and informal education institutions.

5.2 The classroom

Within the classrooms of grade school and post-secondary institutions, we are seeing VR/AR technologies being used to help educate students, making classes more engaging. One of Google’s VR ventures, Google Expeditions (Google Expeditions, n.d.), was launched in 2014 to help teachers provide more immersive educational experiences. The instructors pass out smartphones to students for use within Google VR Cardboard headsets, and the students are transported to 360 videos of environments, chosen by the instructor via a tablet. Additionally, InMediaStudio provides a system similar to Google Expeditions for classroom use, but with additional interactive content (Educational Experiences–inMediaStudio, n.d.). There are ongoing explorations into the use of VR and VLEs to create more opportunities for “innovation education” that will promote ideation and innovation skills within the national curriculum in Iceland (Thorsteinsson and Denton 2013), and some early research into supporting multiple non-immersive and immersive VR platforms for content delivery (Scavarelli et al. 2019). Within social learning, researchers are also exploring collaborative content creation and virtual note-taking for better retaining knowledge within co-located contexts (Greenwald et al. 2017).

Thorsteinsson and Denton feel that VLEs are incredibly relevant to education pedagogy, namely “Constructivism, Computer Supportive Collaborative Learning (CSCL) and Computer-Mediated Communication (CMC)” (Thorsteinsson and Denton 2013). The use of computers in a social setting (i.e. the classroom), the construction of tangible pieces, and playing a role (i.e. as the avatar) within VLEs all lend themselves well to VR/AR technology use within the classroom.

Kerawalla et al. (2006) explored AR for teaching science to primary school students with a “virtual mirror” (a screen that displays a live camera feed of the student with 3D content overlaid into the scene) to allow children to manipulate a model of the Earth orbiting the sun. The researchers observe that their findings “support previous work that explored the use of both VR/AR … where the focus has been on designing environments that students can manipulate and explore promoting inquiry-based learning”. Expanding upon this concept is “Save the Wild”, using a similar virtual mirror system to allow primary grade students to track origami creatures the students created within a VE (Bodén et al. 2013). Liarokapis and Anderson also explored the use of AR to help university students better understand engineering concepts, finding that AR is useful when used in parallel with traditional methods (Liarokapis and Anderson 2010), and Du and Arya (2014) propose the use of an HMD AR system as a single screen to replace the personal computer screen and large screen in classrooms. Du and Arya suggest that such an approach can help reduce distraction and investigate various methods of content control, by the teacher, student, or the system.

There are also several universities with medical education facilities exploring how to more effectively teach anatomy to students, as the material can be difficult for students to retain. Some studies have looked at spatial awareness as the key to learners’ ability to better contextualize anatomical features, and have explored using VR/AR for more efficient and convenient anatomical model viewing (Garg et al. 2002; Preece et al. 2013).

5.3 E-learning

E-Learning generally refers to companies and individuals that create and sell products dedicated to teaching others about certain subjects via online technology. Research has been completed on building and testing VR systems for E-learning, such as Monahan et al.’s (2008) “collaborative learning environment with virtual reality (CLEV-R)”, which explored the use of 3D virtual environments and avatars that allowed multiple simultaneous students and teachers to communicate with each other via text, voice, and camera. Interestingly, this research team also introduced a mobile version, mCLEV-R, that allowed access to the same information with significantly limited functionality (Monahan et al. 2007). Avatar representation in these systems is essential as “online learning environments tend to be designed to facilitate disembodied ways of learning and knowing, which is at odds with contemporary epistemological theories that emphasize contextually, embodied knowledge. 3-D VEs have the potential to address this through user representation and embodied action” (Dalgarno and Lee 2010). The researchers’ system allowed for virtual lectures and for users to be able to teleport to various environments, but it was mostly a recreation of more traditional and physical forms of teaching through desktop and mobile systems. Arya et al. (2011) also covered two case studies (ESL and archaeological online courses) involving the use of virtual environments in learning and note several advantages of virtual environments such as those found in VR/AR: they not only allow “people in different locations to interact” but also give users access to facilities not available physically, enable activities that are not possible in physical settings, and offer a variety of observation and measurement tools for performance evaluation and improvement.

There are also a few companies looking at using HMD VR to allow for increased virtual immersion, along with expanded tools for using the technology, to augment the learning experience in unique ways. For example, Labster is creating VR lab training technology for increased immersion and safety (Labster 2016), and is also working with Ontario to help post-secondary institutions around the province set up VR labs (Virtual Reality Labs, n.d.).

5.4 Museums

Museums should also be included within VR/AR learning environments as they explore the use of VR/AR technologies to naturally engage with visitors in public settings, while also fulfilling mandates of imparting knowledge of cultural heritage. Researchers note that “museums now place an emphasis on education that they never did in the past” (Falk and Dierking 2016; Styliani et al. 2009; Sylaiou et al. 2010). Museums are currently dealing with reduced interest and attendance in younger generations, with some advocates suggesting to “make the experience personal and interactive” (Marketing Museums to Millennials 2010). This has led to experiments in using interactive methods such as various forms of VR/AR displays to help draw in and engage younger audiences. Some interesting examples using VR/AR technologies within museum exhibits (Alexander et al. 2013; Dreams of Dali: Virtual Reality Experience–Salvador Dali Museum 2016; Lacoche et al. 2017; Snibbe and Raffle 2009; Sylaiou et al. 2010) often use Reality-Based Interactions (RBI) (Jacob et al. 2008) to create more embodied interactions. There is also research exploring how VR and AR artefact manipulation could help emulate the social experience of visiting the physical museum (Li et al. 2018), and explorations into using narrative across both virtual and physical museum contexts (Hoang and Cox 2018). Research by interactive artist Snibbe highlights that developing “social immersive media” installations within museums “accommodates the public, social, and informal learning that museums champion” (Snibbe and Raffle 2009). This type of media, arguably an AR form, focuses on RBI interactions that scale from one to many participants (social scalability) and may be useful in future research into social classrooms that focus on learning experiences that require many learners simultaneously using VR/AR technology.
The six principles of “social immersive media” listed here (visceral, responsive, continuously variable, socially scalable, socially familiar, and socially balanced; Snibbe and Raffle 2009) also appear quite relevant to socio-educational VR/AR contexts.

5.5 Simulation for training

Within various industries, there are efforts to use both VR and AR to prepare individuals for engagement with more expensive, complicated, or potentially dangerous hardware or processes. For example, the use of VR/AR in simulation could include flight simulation (Pausch et al. 1992), training for complex surgeries (Moglia et al. 2016; Sielhorst et al., n.d.), military training (Kiesberg 2015), or athletic conditioning (Belch et al. 2017). Often these systems place users into VEs that recreate a real experience or involve AR overlays that help guide users through a situation. Though these areas are beyond the scope of this review, as we are focusing less on specific training applications and highly specialized hardware, they are still worth mentioning for a broader view on the use of VR/AR in learning.

5.6 Transformative learning

Learning is more than retaining knowledge or a process, and can also involve critically evaluating held assumptions, beliefs, values, and perspectives—opening learners to mindful change. This is generally referred to as transformative learning (Cranton 2016; Mezirow 2003), and one powerful example of transformative learning is in using VR to better imagine another’s perspective (Bertrand et al. 2018), as a form of creating empathy for other individuals, cultures, or even environmental issues (Markowitz et al. 2018; Shin 2017). These changes are possible due to VR’s immersive affordances of perceptual illusions (Bertrand et al. 2018) such as embodiment (Johnson-Glenberg 2018) and presence (Slater 2009) that help to create a sense of actually being someone else, or within another environment. For example, researchers have found that VR experiences that place you within the virtual situation of homelessness can help create longer-term empathy for the homeless (van Loon et al. 2018). Filmmakers have also explored documentary and 360 film-making to place individuals into unfamiliar situations in an attempt to create the “ultimate empathy machine” (Milk 2015); or to create a more robust connection to news stories, such as those that cover prison interrogation (de la Peña et al. 2010). It is still a developing area, as it has also been noted that embodying others in VR experiences may also enhance stereotypes rather than reduce them (Kilteni et al. 2013; Nakamura 1995), but it presents an opportunity to use VR/AR technology to better connect learners with new and different situations and environments.

5.7 Socio-educational VR/AR platform

Within a discussion of VR/AR examples in education, we can also look towards other VR/AR frameworks that, though they may not be directly related to education, may hold interesting lessons and system structures relevant to our survey (some of these have been mentioned previously). For example, social VR platforms such as VRChat (VRChat, n.d.), AltSpaceVR (AltspaceVR, n.d.), and Mozilla Hubs (Mozilla Hubs 2018) share several characteristics. Shared features include avatar visualization, VEs that can be visited by multiple users, various forms of communication, and support for one or more platforms (see Table 2, which highlights these differences). Across these platforms there is a diverse spectrum of visual quality: applications such as High Fidelity focus on higher-end immersive VR hardware such as the HTC Vive (2016), while AltSpaceVR and Rec Room support several lower-fidelity platforms (Desktop, Mobile, and HMDs). Additionally, these frameworks often support voice communication, gestures via motion controllers, and floating diegetic GUIs for system interactions.

Table 2 A list of VR/AR applications and platforms over the past two decades, showing the functionalities each platform supports (HMD, desktop, and/or mobile, etc.) and its social interaction context

Within Table 2, we have listed a diverse cross-section of the platforms that have supported social VR/AR of some sort over the past two decades (both in research and commercially). We note that within the last few years, with the resurgence in popularity of immersive VR through commercially available HMDs, the included motion/6DOF controllers are now supported in most new VR frameworks. We also note that very few platforms support more than one modality of interaction/display (i.e. most support only AR, only desktop, or only immersive HMD VR). The only real exceptions we observe are AltSpaceVR (AltspaceVR, n.d.), Rec Room, and Mozilla Hubs, which support VR across several platforms (desktop, mobile, and HMD), and Google Expeditions, which has two forms that support either VR or AR. Interestingly, these multi-platform experiences have become more common in recent years, as increasing participation in social VR experiences has proven difficult with HMD VR alone, due to HMDs not being as successful as many VR enthusiasts had hoped thus far (Jenkins 2019). This has led to platforms such as High Fidelity, which aimed support exclusively at higher-end VR HMDs, being abandoned (Baker 2019).

The advantage of greater accessibility, combined with the openness of content created for the web, makes WebXR (WebXR Device API 2019) an attractive foundation for a social VR/AR platform: it is the successor to the non-standard WebVR API (WebVR, n.d.) and supports desktop, mobile, and HMD VR/AR on the web. Additionally, Beck et al. describe an interesting use of a VR single-wall CAVE, which also tracks another group of users from a remote location to provide an example of both remote and co-located “Group-to-Group Telepresence” and multi-user closely coupled interactions. Unfortunately, Beck et al.’s (2013) apparatus does require a highly specialized setup of depth cameras, projected displays, and a “Spheron” navigation/interaction device.

It should also be noted that no social VR platform currently supports co-located experiences that allow learners to move around together within a shared physical space. Allowing learners in VR to use their bodies to move around an area is more immersive, and yet there is still no clear example of how to prevent issues such as collisions in HMD VR, though there has been some general work in exploring potential solutions (Langbehn et al. 2018; Scavarelli and Teather 2017). However, within AR, we do see many examples of co-located social learning experiences (Snibbe and Raffle 2009). This is likely due to AR’s more accessible nature in a multi-user context (i.e. users can see others sharing the space more easily than within VR). Where there is VR co-located learning, as in the case of Google Expeditions (Google Expeditions, n.d.), only seated VR experiences are supported, with minimal multi-user interactions (i.e. students cannot actively move around the class and interact with each other physically); otherwise, a highly complex and low-immersion apparatus (arguably closer to the AR side of Milgram’s reality-virtuality spectrum; Milgram et al. 1995) is required, as in the case of the “Group-to-Group” telepresence with projected screens (Beck et al. 2013).

6 Discussion

In this section, we describe the common themes found within our overview of the literature. We also highlight exciting but under-researched areas of VR/AR technology. There appears to be minimal research on the use of both VR and AR together, or on comparing the effectiveness of the two technologies for educational applications in similar experimental setups. Research on the individual technologies is also incomplete and conflicting, but even so, there is strong motivation to treat VR/AR not as two completely separate technologies but instead as a spectrum from “virtual/physical” to “completely virtual” (Milgram et al. 1995). We also take the affordances discussed in Sect. 4.3 and separate them in Table 3 into VR, AR, or VR/AR shared affordances, with a few additions. Table 4 also highlights some examples of learning with AR and VR as both individual and social learning use-cases. Within the literature, we can point to several principles that will likely be important when considering the state of the art, and the future, of VR/AR in education.

Table 3 Table describing the basic affordances of VR/AR, building from prior research while also highlighting differences between VR and AR
Table 4 Some examples of solo/multi-user experiences within either a VR or AR context

6.1 Consideration of technological limits

Dede notes, in his paper on “Immersive Interfaces for Engagement and Learning”, that “understanding the strengths and limits of these immersive media for education is important, particularly because situated learning seems a promising method for learning sophisticated cognitive skills, such as using inquiry to find and solve problems in complicated situations” (Dede 2009). It is a reminder that technology should be considered part of the design for an educational lesson as opposed to the technology being projected onto an existing traditional lesson. The limits of the interaction and display methods should also be noted so that a lesson can be created that focuses on the strengths of the technologies (e.g. embodiment and presence) and not so much on their weaknesses (e.g. resolution, lack of 6DOF in some mobile VR/AR, and/or complex/unsatisfactory interaction methods). In this regard, it should be noted that some researchers feel that there is still potential in combining both traditional and “new technology” lessons to help bring in “new and old” learners (Ivanova et al. 2014).

6.2 Not a replacement

Liarokapis and Anderson note that “AR technology is a promising and stimulating tool for learning and that it can be effective when used in parallel with traditional methods” (Liarokapis and Anderson 2010). This is an important note as neither AR nor VR can display virtual environments indistinguishable from physical environments at this time. VR/AR technology is merely a tool to help augment and enhance existing educational methods rather than replace them—perhaps by offering multiple unique frames of reference (Salzman and Dede 1999).

6.3 Conflicting and ambiguous results

Due to the lack of standardization and of attempts to reproduce other research results, there is conflict within the literature as to what “the best practices” are for VR/AR in education. Merchant et al. (2014) found that VR games were most effective as learning tools and that, surprisingly, individual play was more effective than collaborative play. However, these results could be countered by other work suggesting individual “play” is also essential in fostering group activities (Sawyer 2017). Still other researchers, such as those within the medical anatomy field, do not find any significant advantage of 3D models over physical models in knowledge retention (Garg et al. 2002; Preece et al. 2013), though virtual models offer significant benefits: reduced storage space, one model serving several students simultaneously, and the possibility of remote interaction. However, these researchers do note that the result could be due to limitations within the studies; perhaps HMD VR or AR with 6DOF (more immersive technologies) would produce a better result. This is also noted by Du and Arya in their research into an optical head-mounted display (OHMD) learning assistant (Du and Arya 2014).

There is a lack of conclusive evidence that 3D VLEs support learning well (Dalgarno and Lee 2010), a point echoed by Fowler, who argues that more concrete guidelines for creating VR learning content would help (Fowler 2015). Additionally, Merchant et al. (2014) conclude that though VR instruction is effective, there are caveats, such as repeated assessments reducing learning gains. There is much work to be done in standardizing the shared terminology surrounding VR/AR, how we measure effectiveness, and what pedagogy designs should be based on. These types of difficulties in validating learning gains with VR/AR learning activities are well summarized by Dede and Richards, who acknowledge that designing, assessing, and creating VR/AR learning content, within various learning contexts and with various learners, is challenging but still a critically important endeavour going forward (Dede and Richards 2017).

6.4 Importance of embodiment

As noted by several researchers, one of the main advantages of VR is the use of embodied interaction, whereby users feel as if they are strongly connected to their avatars within a virtual environment (Ahn and Bailenson 2011; Dede 2009; Johnson-Glenberg 2018; Pan et al. 2006; Wu et al. 2013). Though embodying virtual avatars is something seen in VR rather than AR, AR examples may become more common as we use virtual dressing rooms (Preuss 2019) to change appearance and, potentially, combine VR and AR multi-user experiences in which AR avatars may become necessary for visualization by VR learners. To help users feel more immersed in virtual environments, and as if they are acting within them, we can look to some of the research done on the “Proteus Effect” or body transfer, an element of embodied cognition, which describes how users assume the perceived behavioural characteristics of their virtual avatars (Slater et al. 2010; Yee and Bailenson 2007), or even how learners can assume non-human characteristics (Stevenson Won et al. 2015). This will help us understand how to keep presence high while not necessarily striving for authenticity or hyper-realism (Jerald 2015; Zimmons and Panter 2003): although embodied interactions are important for learning (Johnson-Glenberg 2018), there is some research suggesting that, for increased accessibility, some less-immersive interactions are also meaningful (Rogers et al. 2019).

6.5 Accessibility

As noted previously, there are still many avenues to explore in determining the most effective techniques for utilizing VR/AR technologies in education. This includes students with special needs, as they may not be able to use technologies that require subtle movements with their bodies, such as HMD VR. For example, “AccessibleLocomotionWebXR” was an explorative project, created at the 2019 MIT Media Lab “Reality Virtually” hackathon, that developed an HMD VR component that allowed users to navigate and interact with just a single input (Dubois 2019). Also, within the broader context of Universal Design for Learning (UDL), we must ask how to make sure that learning technologies and materials can adapt to learners with various needs and preferences (Rose et al. 2006). As noted by others, “in the studies reviewed from journals there was no evidence of AR applications in educational settings that address the special needs of students” (Bacca et al. 2014), and social VR can be uncomfortable for women (Outlaw and Duckles 2017), or for anyone using unfamiliar technology, such as interactive screens (Brignull and Rogers 2002) or HMD VR in social environments (Rogers et al. 2019; Southgate et al. 2019). Bodén et al. (2013) have the following suggestions for AR, which could also be applicable to social VR:

  • AR needs to be as time efficient as existing methods of teaching.

  • The exploration performed with AR needs to be guided so as to maximize learning.

  • AR within classroom environments needs to be designed for the institutional context.

Platforms such as WebXR (WebXR Device API 2019), which support desktop, mobile, and HMD VR/AR, will help in this regard as they will force VR/AR application developers to consider multiple, accessible forms of display and input technologies, which can help inform experiential learning methods/strategies in VR/AR.

6.6 Content creation

Often, the process of content creation is a complex task left to knowledgeable developers and designers, as an afterthought, rather than being accessible to users with little technical knowledge, such as instructors and learners. There are some examples of commercial software, such as VRChat (VRChat, n.d.) and Second Life (Second Life, n.d.), allowing the import of previously created 3D avatars and environments, but this often still requires some knowledge of where to find, update, and adapt these models. For a learning platform to be successful, there should be consideration of how to allow users with little technical knowledge to create, or piece together, their own content as environments, interactions, and learning experiences. It could take the form of a marketplace as found in Second Life; perhaps an online repository of virtual experiences such as might be found in endeavours to make all of the web accessible in WebXR (Supermedium, n.d.); or a tool such as Mozilla’s “Spoke” (Mozilla, n.d.) for creating and importing content into Mozilla Hubs (2018). These types of content creation or “content collage” tools for VR/AR content in learning have a precedent in the structure of most LMSs, which allow instructors to bring together modules to create custom online learning environments.

6.7 VR versus AR

There are few examples, both within educational contexts and otherwise, that support both VR and AR. VR and AR share many similarities, and researchers such as Milgram et al. (1995) group them into a spectrum, with Mozilla Mixed Reality Research recently publishing blog posts on designing for both simultaneously in WebXR (Paracuellos 2018). It seems inevitable that VR/AR platforms of the future will incorporate both VR and AR modes. This could merely be the detection of the environment and people around us to prevent collisions (Scavarelli and Teather 2017), or the incorporation of users’ limbs into VR environments via technologies such as the Leap Motion (Leap Motion, n.d.), Kinect (Developing with Kinect, n.d.), and the Logitech “Bridge” VR keyboard (Introducing the Logitech Bridge SDK, n.d.) (arguably an example of AV, Augmented Virtuality; Milgram et al. 1995). Alternatively, it could be something more intrinsically tied to the type of educational experience we are striving for, taking into consideration both individual and social accessibility along with subject matter.

6.8 Challenges of implementing VR/AR in the classroom

There are several challenges to implementing VR/AR into formal educational curriculums in classrooms. Some of these relate to teacher training and student expectations, where systems such as Google Expeditions (Google Expeditions, n.d.) require some, albeit minimal, setup and instruction on how teachers and students can navigate the system. In several studies, technology pitfalls lead to some muddled empirical results, creating “false expectations” (Bodén et al. 2013) of interactions. This concept was noted in varying forms within the literature: interactions were not always clear, and the affordances of digital technology somewhat limit the freedom of movement and interactions within the virtual world. Additionally, the cost of VR/AR equipment may still be an issue, listed as the second primary concern after user experience by Perkins Coie (Augmented and Virtual Reality Survey Report 2018), and so a platform and framework that includes lower-cost entry points, such as allowing personal smartphones to access content (e.g. Google Cardboard or WebXR), will be important. Currently, much of the research cited in this survey focuses on post-secondary education and, in most cases, we can assume most students have access to a smartphone.

How the technology is used is also essential, and researcher Ed Smeets notes “93% of teachers surveyed had implemented some form of technology integration into learning, but rather that the technology is being used for skill-based learning, as opposed to supporting deeper levels of learning” (Smeets 2005). Bodén et al. (2013) suggest “teachers should be educated on methods in which they can adapt existing technologies to support their learning structures purposefully, rather than treating technologies such as computers as isolated activities”. There is more research required to find stronger correlations between the use of VR/AR in learning and more traditional educational media.

Dalgarno and Lee note in their paper on learning affordances in virtual worlds that “currently, design and development efforts in this field are largely hit-and-miss, driven by intuition and ‘common-sense’ extrapolations rather than being solidly underpinned by research-informed models and frameworks” (Dalgarno and Lee 2010). Some results within the literature also appear to conflict: anatomy research suggests that 3D virtual models are not much better than physical models (Garg et al. 2002; Preece et al. 2013), and although many researchers have focused on the social learning aspect of their systems, other researchers find that “game-based learning environments were more effective than virtual worlds or simulations” (Merchant et al. 2014).

The technology also remains a barrier to implementing VR/AR in classrooms easily, as many resources are required to build content (3D modeling, texturing for building VEs, developing systems capable of displaying 3D content, handling many simultaneous connections, etc.). Utilizing more accessible technologies such as WebXR (WebXR Device API 2019) and A-Frame (A-Frame n.d.) could be helpful in this regard, allowing a large community of resources and accessible technologies to be used to create content. Unfortunately, there are few examples of this type of technology for educational content delivery with “low-friction” interactions (Scavarelli et al. 2019). There are a few pre-made systems for use within educational institutions, but thus far there has been no widespread adoption, and no system stands out as minimizing financial risk to institutions. From the user’s perspective, there is also still much work to be done on standardizing interactions and allowing explorations of the virtual worlds without users feeling too constrained by immersive VR systems. For example, current commercial VR input methods lack true haptic (physical) feedback beyond controller vibration and have many buttons and controls more familiar to console gamers than to mobile device users. Additionally, current VR systems do not possess methods for preventing collisions between multiple users sharing the same physical space (Langbehn et al. 2018; Scavarelli and Teather 2017).

Another concern for the widespread adoption of VR/AR as educational tools is their accessibility. Current popular methods of VR involve stereoscopic HMDs that may not work as well for those who have vision problems, and mobility issues could make using AR platforms or VR/AR inputs/controllers difficult. These accessibility concerns also extend to the social embarrassment or social anxiety of using VR/AR around others (Rogers et al. 2019; Southgate et al. 2019), at least until the technology is more widely adopted. Though touched on briefly by some papers cited within this survey, more work needs to be done in allowing modern implementations of VR/AR technology to degrade gracefully into experiences/platforms that can be used by students with a wide range of accessibility needs.

7 Future research directions for VR/AR in education

As noted in the previous sections, there are many exciting facets to consider when looking to create VR/AR applications in social learning spaces. Generally, three primary areas of interest and research direction become apparent: accessibility, the unclear interplay between parallel realities (the virtual and the physical) in learning, and the learning theories and methodologies that can better support VR/AR learning within social learning spaces. Additionally, we must always look to observe and verify, through experimental rigor, how VR/AR can help enhance educational practices, propagating the use of these specific tools within these learning contexts (Dalgarno and Lee 2010; Fowler 2015). Researchers note that there are not enough real-world case studies on the use of VR/AR for learning, particularly HMD VR (Markowitz et al. 2018), and that researchers struggle to find the will to engage with the risk-taking required to try out these technologies within authentic contexts (Dede and Richards 2017).

7.1 Accessibility

As discussed in the previous section, accessibility will always be a significant concern for any learning materials, as learning is not exclusive to one group of people but open to all. When we consider social learning spaces, such as classrooms and museums, we must also consider how to make sure that the technology we use within these spaces enhances learning rather than hindering it. We suggest three specific areas where further exploration may help the use of VR/AR in social learning spaces better follow Universal Design for Learning (UDL) principles in creating technology adaptable to a variety of learners, both as individuals and as groups of individuals learning together.

7.1.1 Platform scalability

Platform Scalability refers to a system capable of adapting to a range of VR/AR capable platforms (desktop, mobile, large screens, etc.). This is comparable to a virtual form of UDL, which describes how to increase the accessibility of learning materials via (1) Multiple Means of Representation, (2) Multiple Means of Expression, and (3) Multiple Means of Engagement (Rose et al. 2006). By supporting multiple platforms, VR/AR content can potentially be more accessible with “multiple means of expression.” WebXR, as a possible solution, supports many of these platforms; but more research into this area would help us understand and design how interactions, navigation, and embodiment in an education context may change as one moves between platforms. This is especially important in social learning spaces, as prior research into public technology use suggests that “social embarrassment” may limit the use of unfamiliar devices (Brignull and Rogers 2002). Related work suggests that “the awkwardness of physically moving in VR with an onlooker” may also be an issue in VR (Rogers et al. 2019), and that female students may be more hesitant to wear HMD VR in social spaces (Outlaw and Duckles 2017; Southgate et al. 2019).

The effect of the social environment when using technology falls well in line with recent work that suggests that social facilitation (simple tasks becoming easier with an onlooker) and social inhibition (complex tasks becoming more difficult with an onlooker) also apply to completely virtual avatars (Miller et al. 2019), and that learning theories such as social cognitive theory and activity theory will be critical in helping to define the social relationships between technology, learners, and their virtual and physical spaces. Additionally, cybersickness in HMD VR is still an active line of research because it remains present in the general population, even with access to contemporary VR systems (Guna et al. 2019; Magaki and Vallance 2019). The ability to choose another platform, such as desktop or mobile, that suffers less from these problems is therefore worthwhile.

  • Does responsive VR/AR design that adapts to the platform accessing the content increase engagement and participation in learning?

  • What are the best practices for adapting interaction types across multiple platforms?

  • Does social embarrassment/social anxiety limit the use of some VR/AR platforms (e.g. HMDs), limiting learning?

7.1.2 Social scalability

Social Scalability is based on Snibbe and Raffle’s definition of social scalability within a museum context whereby “interactions are designed to share with others … interaction, representation, and users’ engagement and satisfaction should become richer as more people interact” (Snibbe and Raffle 2009). This definition could expand to include VR/AR multi-user applications that support variable numbers of remote (to reduce geographical barriers) and co-located (classroom) users working together towards shared goals. This would build on Roberts et al.’s explorations into supporting teamwork via tightly coupled interactions (Otto et al. 2006; Roberts et al. 2003) but could also include discussions on how, or whether, to support multiple co-located learners in HMD VR-based platforms while preventing collisions between learners and objects.

  • How does social scalability affect co-presence and learning outcomes?

  • What do socially scalable interactions look like in VR/AR learning?

  • How do remote and local learners communicate and interact together in virtual spaces?

7.1.3 Reality scalability

Reality Scalability refers to the concept of an application allowing VR and/or AR perspectives. Some studies explore “mixed-space collaboration” (Grasset et al. 2005) and VR and AR collaborative interfaces (Grasset et al. 2006), but there are few examples of explorations of these techniques within education. Reality Scalability may become increasingly important in remote collaboration and co-located collaboration between peers. As noted within the prior section on platform scalability, learners may prefer to use a platform such as AR over VR, as it allows them to remain more aware of the social environment around them.

  • Are there any learning advantages for adopting non-egocentric viewpoints?

  • How do we design a Virtual Learning Environment (VLE) for switching between VR and AR?

  • How do we synchronize users, environments, and real/virtual objects between physical and virtual locations in AR and VR?

7.2 Parallel realities

There is some work looking at how the virtual world can affect our reality: in how the way we identify in virtual worlds can change our behaviour (Yee and Bailenson 2007), in how task performance can be affected by others through social facilitation and social inhibition (Miller et al. 2019), and in how virtual spaces can also change behaviour (MacIntyre et al. 2004; Proulx et al. 2016); but there is still much work to be done on how the physical learning spaces we inhabit may affect our virtual behaviours. We have seen that the very nature of using this technology can inhibit participation and comfort (Brignull and Rogers 2002; Outlaw and Duckles 2017; Rogers et al. 2019); but research remains at a very early stage beyond some studies into how we prevent collisions in shared virtual spaces (Langbehn et al. 2018; Scavarelli and Teather 2017). Just as connectivism and activity theory suggest that our digital tools and the socio-historical culture that surround learners become an intrinsic part of the learning process, we should also consider how these same processes apply to both virtual environments and physical worlds, as it becomes clear that the virtual worlds and physical worlds are not mutually exclusive entities. Rather, they are interwoven into parallel realities that affect each other and every individual within them in strange and exciting ways (Stevenson Won et al., n.d.). This is particularly notable as the lines between VR and AR become increasingly blurred in modern HMDs that support both via hand-tracking, windows into the physical world, and potentially, in the future, virtual spaces that scan and digitally enhance our physical spaces (Sra et al. 2016).

  • How does the interplay between the virtual and physical spaces help or hinder learning?

  • What are the ethics that surround the use of VR/AR that enhances or augments reality with measurable behavioural effects?

  • Does the interplay of physical and virtual realities necessitate the construction of physical learning spaces built with virtual world modelling in mind?

7.3 Learning foundations

Though most VR/AR projects in learning depend on constructivism, experiential learning, and/or social cognitive theory as a foundation for chosen features and properties, there are additional theoretical and methodological foundations within CSCL that may help lend more significant consideration to both the virtual and physical environments within a socio-cultural context. Activity theory, in the form of expansive learning, includes not only digital tools and objects/artefacts as an intrinsic part of the learning process, but also the socio-historical properties of learning spaces (Engeström 2016; Stahl and Hakkarainen 2020). This could include some exciting explorations into the interplay between the social, spatial, and cultural aspects present within both the virtual and physical learning spaces, and how to better create VR/AR content that acknowledges them. Examples include exploring how wearing HMDs in learning spaces is not yet culturally acceptable (Rogers et al. 2019), or how being a woman in social VR spaces may encourage virtual harassment, decreasing participation in activities using these technologies (Outlaw and Duckles 2017). The interconnected processes of learning within individuals and their actions, the social environment, and the spatial environments are complex, and as we add in virtual environments that may change behaviour, we may need to look towards additional learning theories that better encapsulate how this learning happens. In the case of activity theory, there is already precedent for exploring its use in HCI (Kuutti and Bannon 1993), in constructivist learning environments (Jonassen and Rohrer-Murphy 1999), and in learning (Engeström 2016), with some reality-based interaction frameworks echoing similar principles about greater consideration of social skills and environment (Jacob et al. 2008; Snibbe and Raffle 2009). Activity theory thus appears a good candidate for future explorations including VR/AR.

  • What is the effect of the socio-cultural context on VR/AR learning performance?

  • Are learning theories from other fields, such as activity theory, worth exploring for use within VR/AR in social learning spaces?

  • How do existing learning theories apply to parallel realities (e.g. physical and virtual)?

7.4 Summary

The future of VR/AR in education will involve the use of a platform, not unlike the current LMS/CMS platforms used within educational institutions such as schools and museums, built with more significant consideration of accessibility and of the interplay between the virtual and physical, social and individual. Note that a VR/AR platform need not be mutually exclusive from current LMSs and could extend their existing functionality. These new VR/AR frameworks and platforms will allow instructors and directors not only to customize content but also to help direct it live while learners explore it with various VR and/or AR devices. Desktop VR/AR systems will likely cede ground to, or work with, smaller mobile implementations such as the Google Expeditions system (Google Expeditions, n.d.) and standalone platforms such as the Oculus Quest and/or Microsoft HoloLens with high-quality input and output controls that allow for more natural interactions within the world. This is perhaps where existing research into Reality-Based Interactions using full-body and gestural inputs can be useful (Jacob et al. 2008; Snibbe and Raffle 2009), as it allows another perspective into how we can have multiple learners interacting together in a genuinely collaborative manner (Greenwald et al. 2017; Keep Talking and Nobody Explodes, n.d.; Scavarelli and Arya 2015).

8 Conclusion

In this survey, we explore the use of VR and AR for education within social learning spaces, while also highlighting new areas of research and development to explore. We suggest that VR/AR educational platforms should include accessibility as a primary concern across three main areas: Platform Scalability, Social Scalability, and Reality Scalability for better UDL considerations (Rose et al. 2006) and more accessible social engagement between learners sharing the same social learning spaces. We also suggest that greater consideration should be placed on exploring the interplay between virtual and physical realities, and on exploring learning theories that may better guide VR/AR learning within physical/virtual social learning spaces.

Many researchers are optimistic about the use of VR/AR in education. Merchant et al. note that studies into using these technologies for learning are encouraging in that they “provide evidence that virtual reality-based instruction is an effective means of enhancing learning outcomes. Educational institutions planning to invest time and financial resources are likely to see the learning benefits in their students” (Merchant et al. 2014). The greatest challenge will lie in determining how best to utilize this technology to enhance students’ learning in a manner that does not merely recreate, or replace, the physical classroom but also enables activities, and access to facilities, that are not possible in physical settings (Arya et al. 2011).