Introduction

A laboratory course experience is recognized as a key component and degree requirement for most undergraduate science and engineering majors and has generally been considered the backbone of science education (Hofstein and Lunetta 2004; Reid and Shah 2007). Such experiences, which typically occur in specially designated physical spaces known as teaching laboratories, are intended to provide students with the opportunity to put theory into practice in the form of appropriate experiments at a given level of advancement in a discipline or with specific topics within a course or program of study (Ural 2016). The activities that occur in such courses are typically designed as expository experiences that require students both to follow specified procedures, such as performing experiments or making observations, and to demonstrate skills and understanding of associated concepts (Kempton et al. 2017). This implies that current laboratory education lacks activity akin to that of a practicing scientist, which would include “determin[ing] the problems, develop[ing] solutions and alternative solutions for these problems, search[ing] for information, evaluat[ing] the information and communicat[ing] with [peers]” (Ural 2016 p. 217). In essence, a teaching laboratory is more likely to consist of rote procedural tasks that do not support the development of critical thinking skills, those that could be used to translate concepts, theories, or arguments beyond a superficial level of knowing (Basey et al. 2000; Gobaw 2016; Ryker and McConnell 2017; Wan et al. 2020).

Current technology has the capacity to challenge the traditional notion of a teaching laboratory and subsequently our perspective on the required laboratory experience for undergraduate students (Bernhard 2018). Virtual laboratories (V-Labs) are technology-mediated experiences in either two or three dimensions that situate the student in an emulation of the physical laboratory with the capacity to manipulate virtual equipment and materials via the keyboard and/or handheld controllers. Such experiences are typically delivered in two dimensions (2D) with either a desktop or laptop computer, which is considered a low-immersion technology, or in three dimensions (3D) with a head-mounted display, which is classified as a high-immersion technology. The term immersion thus describes these experiences as a measure of the technology used and implies a collection of affordances for the student (Cummings and Bailenson 2016). The actual experience of the student, often captured as the variable presence, or feeling of being in the emulated environment, involves the interplay of their psychology in response to these affordances, which includes their sensory perception, level of control, and ability to modify the environment (Riva et al. 2003).

V-Labs typically entail making observations and completing experiments that may involve the testing of theories and/or hypotheses (de Jong et al. 2014; Potkonjak et al. 2016). As such, this technology has the potential for delivering a first-person experience that very closely approximates not just that of a teaching laboratory (Vrellis et al. 2016) but that of a research laboratory, a situation where the student stands virtually in the shoes of a researcher and performs complex research skills as part of a larger multipart task (Makransky et al. 2016, 2017) (Fig. 1). Examples of such skills would include preparing solutions, pipetting, or creating calibration curves, as well as setting up and operating specialized analytical equipment such as a spectrophotometer or PCR machine. To date, evaluation studies in certain STEM contexts have shown that student achievement or performance for such V-Labs is consistent with that found in traditional face-to-face laboratory experiences (Darrah et al. 2014; Ekmekci and Gulacar 2015; Goudsouzian et al. 2018; Hawkins and Phelps 2013; Koh et al. 2010; Makransky et al. 2016; Ogbuanya and Onele 2018; Olympiou and Zacharia 2012; Vrellis et al. 2016).

Fig. 1

Screenshot image of a VR Lab showing the student view as they work to determine blood type for a sample (image courtesy of Labster ApS)

A V-Lab’s contextually rich environment and high-fidelity approximation of research practice are hypothesized to provide students with transferable knowledge (Baker et al. 2016) in the service of adaptive expertise (Alexander 2003). Transferable knowledge is that which has been acquired in one situation but is available and can be used in the performance of a new or novel task in another situation or context (Kester et al. 2001). If students perceive and experience a V-Lab as if physically being in that laboratory, then their proficiency with any skills they acquire should translate to future work in a physical laboratory (Jensen and Konradsen 2017). Recent research has indicated that the intentional blending of V-Labs with physical experiences has the potential for being especially effective in this regard (Olympiou and Zacharia 2012). In addition, if the assumption of role-taking proves to be accurate and V-Labs serve to make the practice of research more explicit and accessible, then such experiences could serve as a vehicle for supporting student persistence and success by aiding the development of their identity and confidence in relation to the professions of science and engineering (Chemers et al. 2011).

Since all of the necessary laboratory equipment is virtual, V-Labs also hold promise for institutions that are under-resourced for traditional laboratory facilities (Ogbuanya and Onele 2018). V-Labs can be completed in any physical space, such as a traditional classroom, lecture hall, or conference room (de Jong et al. 2013). Using V-Labs to offer laboratory courses in such spaces means that they can be completed by groups of students in arrangements that include the support of an instructor or teaching assistant without the need for costly, specialized, and often difficult-to-schedule facilities (Tatli and Ayas 2013). V-Labs would also have utility in similar ways for online courses as long as students are provided with adequate social experiences and support. Thus, V-Labs can be viewed as representing a contemporary step in improving the accessibility of science and engineering education by minimizing the need for specialized laboratory infrastructure while maintaining the valued essence of the experience.

Over the past decade, immersive technologies have become much more affordable and feasible for educational applications (Martín-Gutiérrez et al. 2017). Though the potential of V-Labs has been extolled and investigated in a number of areas (e.g., Jones 2018), this research has not been synthesized for the specific context of undergraduate science and engineering education. For example, recent reviews of the literature pertaining more generally to virtual technologies have been completed for such topics as: (a) the general application in the context of K-20 education (Mikropoulos and Natsis 2011), (b) trends in available laboratory environments and their capabilities (Potkonjak et al. 2016), (c) training environments in the context of building evacuation (Feng et al. 2018), and (d) learning outcomes of non-traditional laboratories compared to physical laboratories in the full K-20 context (Brinson 2015; Brinson 2017).

Our effort builds upon the work of Brinson (2015, 2017), who used a broad, K-20 perspective on participants and an equally diverse collection of laboratory types in synthesizing studies from 2005 to 2015. While quite helpful as a synthesis of general learning outcomes due to their expansive collection of studies and KIPPAS analytical framework (i.e., Knowledge & Understanding, Inquiry Skills, Practical Skills, Perception, Analytical Skills, Social & Scientific Communication), the studies lack disaggregation by educational level, context, or specific nature of the experience, which are significant limiting factors. For example, V-Labs (as defined here) were classified together with other very different kinds of technology experiences, such as simulations (see the methodology for how and why simulations were distinguished from V-Labs) as well as those labeled as remote laboratories (see Ma and Nickerson 2006 for a discussion of the differences). The context and purpose of an undergraduate college or university course are significantly different from those of a typical middle or high school, and combining these results is problematic. In addition, aside from aggregating findings for general learning outcomes, a synthesis of the perspectives, goals, contextual characteristics, and activities that were used in these studies was not provided. Therefore, a more nuanced approach focused on elements such as these was intended to illuminate the scope and limitations of the how, what, and why of our understanding about using V-Labs in this specific context.

Considering the global effort in higher education to adopt inquiry-based learning practices in laboratory education (Healey and Jenkins 2009; Mavromanolakis et al. 2014), as well as the increasing interest in innovative ways to support and engage students in the domains of science and engineering (see Holden and Lander 2012), a synthesis of V-Lab interventions in this context merits consideration. Thus, the goal of this research was to focus only on the use of V-Labs in the specific context of undergraduate science or engineering education and to do so through systematic review and synthesis of existing peer-reviewed and published empirical studies. Further, in an effort to provide continuity across time and to support the differentiation of results based upon the nature of studies synthesized, we continue the use of Brinson’s (2015) KIPPAS framework as one component of our analysis.

Due to the range of possible applications and perspectives that encompass undergraduate science and engineering education coupled with the rapid changes in technology that influence the form and function of what might have been defined as a V-Lab, we focused our study on the contextual characteristics that have been studied and the goals and perspectives that have influenced and emerged from this genre of empirical research during the time period 2009–2019. Contextual characteristics included the types of activities and outcomes that were investigated while the goals, perspectives, and interpretations were those explicitly indicated by the researchers in their published research papers. Accordingly, the following research questions framed our synthesis:

  1. Which contextual characteristics, types of activities, and outcomes have been explored in studies of V-Labs?

  2. What themes define the goals, perspectives, and interpretations used in this genre of research?

Methodology

This study involved a systematic review and synthesis of peer-reviewed empirical research papers published between 2009 and 2019 that were first identified through a search of seven relevant databases and then selected for inclusion based upon defined criteria (Petticrew and Roberts 2006). Our approach employed elements of the protocol set forth by Khan et al. (2003) and, in accordance with our research questions, consisted of identifying relevant work, assessing the quality of studies, summarizing the evidence, and interpreting the findings. Each of these elements is detailed in the subsequent sections.

Identifying Relevant Work

The studies for review were located through systematic searches of the ProQuest, Educational Journal, Wiley Online, APA PsycNet, Web of Science, JSTOR, and Applied Science and Technology databases, where only peer-reviewed empirical journal articles were deemed eligible. The use of published journal articles as acceptable evidence for review was based upon the assumed level of quality that is provided by the peer-review process. In addition, journal articles tend to provide a more complete description and analysis of data, as opposed to conference proceedings, which may involve only the early stages or preliminary forms of each.

The search process was conducted using the following terms: “virtual reality” and undergraduate and “STEM”; “virtual learning” and undergraduate; “virtual laborator*” and undergraduate; “simulation learning”; “virtual simulation” and undergraduate; “computer simulation” and undergraduate and science; “computer simulation” and undergraduate and engineering. To limit results to the most up-to-date research, the applicable date range was restricted to January 2009 to December 2019. This process resulted in 1653 potential articles, which were then subjected to our additional criteria in two further steps: a review of abstracts followed by a screening of the full articles.
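For readers who wish to reproduce or extend the search, the term combinations and date limit can be expressed programmatically. The following Python sketch is illustrative only; the actual searches were executed through each database's own interface, and the Boolean syntax shown is an assumption that would need to be adapted to each platform's field codes and wildcard conventions.

```python
# Illustrative sketch of the Boolean search strings described above.
# Treat these strings as templates, not verbatim platform-ready queries.

SEARCH_TERMS = [
    '"virtual reality" AND undergraduate AND "STEM"',
    '"virtual learning" AND undergraduate',
    '"virtual laborator*" AND undergraduate',
    '"simulation learning"',
    '"virtual simulation" AND undergraduate',
    '"computer simulation" AND undergraduate AND science',
    '"computer simulation" AND undergraduate AND engineering',
]

DATE_FILTER = "2009-01-01 TO 2019-12-31"  # January 2009 to December 2019

for term in SEARCH_TERMS:
    print(f"{term}  [peer-reviewed journal articles, {DATE_FILTER}]")
```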

Inclusion-Exclusion Criteria

The inclusion and exclusion criteria were constructed based upon the purpose and research questions, which were then modified slightly as the abstracts were read and issues and opportunities were discovered. As part of our exploratory searches to identify relevant studies, we recognized the need for a primary criterion based upon a clear distinction between V-Labs and what are more commonly described as simulations. Since this directly involved our operational definition of V-Labs, we begin here by explaining the nature of this distinction and why studies of simulations were excluded from our corpus of research as the first stage in our process.

The term simulation and some form or iteration of the phrase virtual laboratory are oftentimes used interchangeably and synonymously in reports of research when referencing certain educational technologies. For instance, Ma and Nickerson (2006) defined virtual laboratories as “imitations of real experiments…[where]...the infrastructure required for laboratories is not real but simulated on computers” (p. 6). This definition was then used by Brinson (2015), whose synthesis included studies that used both simulations and virtual laboratories but were couched under the term virtual laboratory because of the fully online nature of the experience. Though the terms share aspects, there is a clear distinction in terms of what they describe. Both are based upon computational models, but simulations typically do not involve solving a scientific problem; rather, they are constrained in such a way as to focus users on manipulating and/or modifying a given set of parameters as a precursor to understanding the effect on other variables in the form of an output. Examples of simulations include the many options available from the PhET project at the University of Colorado (https://phet.colorado.edu/). An exception to this would be simulations in nursing or medical education where the computational model can take the form of a virtual patient or a haptically enabled mannequin, which are lifelike but not situated in an emulation of the real world (Rush et al. 2010).

In medical training programs, the term simulation is used more broadly to define a purposefully constructed situation involving a role-player/actor, virtual patient, or haptically enabled mannequin, such that the term represents a portrayal of a relevant actual event or situation (Rush et al. 2010). An example of this would be Shadow Health (www.shadowhealth.com), which creates virtual patients that are designed to train medical staff on interviewing and examining patients. Such a simulation allows users to engage with the virtual patients in natural language and receive answers based on the questions asked. In domains such as architecture, engineering, and construction, simulation refers to creating prediction models with the intent of determining accurate outcomes that can then be used to inform decision-making (Akhavian and Behzadan 2015).

Studies that involved simulations were not included in our review, and applying this criterion excluded 970 manuscripts. The remaining 88 manuscripts were subjected to the following additional criteria. Studies that reported the results of an empirical study involving undergraduates as participants using a V-Lab in either 2D or 3D format as part of laboratory learning and were published between January 2009 and December 2019 were included, and those that did not meet this criterion were excluded (e.g., Schott and Marshall 2018; Uribe et al. 2016). Manuscripts that did not include the collection and analysis of data on student outcomes (e.g., learning, perception, self-efficacy, motivation) were excluded (e.g., Dalgarno et al. 2009; Fang and Tajvidi 2018; Kang et al. 2018; Williams 2010). Additionally, Achuthan et al. (2017), Reyes-Aviles and Aviles-Cruz (2018), and Tetour et al. (2011) were excluded because these studies focused on remote experiments and/or augmented reality environments. After applying the full complement of inclusion and exclusion criteria, the corpus of data consisted of 25 articles, which were published in 23 different journals that cover the subjects of science and engineering. These articles were then subjected to our content analysis procedure (Fig. 2).

Fig. 2

Inclusion and exclusion criteria and their impact on the number of studies reviewed

Analysis Procedure

The initial phase of content analysis consisted of a coding process for study characteristics based upon our research questions, where individual narrative or representational elements of each manuscript were selected as evidence (Krippendorff 2012). Elements of the manuscripts that addressed our research questions were highlighted and given a name (code) that represented a characteristic. Characteristics and example codes included the following: general domain of science or engineering (e.g., chemistry, biology); nature and form of technology used (e.g., 2D, 3D); specific domain or course context (e.g., microbiology, manufacturing, introductory physics for engineers); study design (e.g., stated purpose or goal, definition of virtual reality, research questions, constructs investigated); theoretical framework and/or major areas of research reviewed for framing the study (e.g., motivation, cognitive theory of multimedia learning); methodology (e.g., qualitative, quantitative, mixed); forms of data (e.g., interviews, surveys); and results (e.g., charts, graphs, knowledge claims). In addition, student outcomes were coded according to the KIPPAS framework (Brinson 2015), which consists of six categories for the types of student outcomes: Knowledge & Understanding (e.g., Quiz/Exam), Inquiry Skills (e.g., testing a hypothesis), Practical Skills (e.g., Lab Practical), Perception (e.g., Survey/Questionnaire), Analytical Skills (e.g., performing an analysis), and Social & Scientific Communication (e.g., Lab Report/Written Assignment). As an analytical tool, the KIPPAS was designed to reflect the goals and laboratory experiences identified by the National Research Council in two separate reports (NRC 2006, 2012). The KIPPAS was also designed to capture the frequency of outcomes assessed; therefore, if one study assessed two outcomes, both were represented. The identified characteristics were thus used as a means for making the epistemic elements of each study explicit and accessible for comparison.
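Because a single study can contribute more than one coded outcome, the KIPPAS tally counts outcomes rather than studies. The following Python sketch illustrates this frequency counting under stated assumptions: the study names and category assignments below are hypothetical and not drawn from the corpus.

```python
from collections import Counter

# Hypothetical coded studies: each study lists every KIPPAS outcome
# category it assessed, so one study can contribute multiple counts.
coded_studies = {
    "Study A": ["Knowledge & Understanding", "Perception"],
    "Study B": ["Practical Skills"],
    "Study C": ["Knowledge & Understanding", "Inquiry Skills"],
}

# Tally outcome frequencies across all studies; totals may exceed
# the number of studies because outcomes, not studies, are counted.
tally = Counter(
    outcome for outcomes in coded_studies.values() for outcome in outcomes
)

for category, frequency in tally.most_common():
    print(f"{category}: {frequency}")
```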

The authors met regularly to code, categorize, and discuss emergent patterns and themes to consensus (Lincoln and Guba 1985), and a synthesis table was constructed to organize the characteristics and improve our interpretation of the commonalities among the studies (Table 1). We began our synthesis by building an understanding of the why and what concerning the use of V-Labs, then proceeded to the how. Exploring the why and what resulted in our theme of Perspectives on Implementing V-Lab Experiences, which captured the expectations researchers used to predict or hypothesize the impact of V-Labs. We inferred that how someone planned on using something was most likely influenced by their perception of its usefulness or their intent for how it should be used. Exploring the how of these studies resulted in our theme of Interpretation of V-Lab Experiences.

Table 1 Overview of papers used for the review

In the following sections, we present our results based upon the research questions. We begin by defining the content domains, contexts, and outcomes, which is followed by a synthesis of the perspectives, goals, and interpretations.

Results

The majority of studies fell within the general domain of science (60%) and to a lesser extent engineering (40%), which is consistent with prior results from Mikropoulos and Natsis (2011) (Table 2). Ninety-two percent of the learning environments described in these studies were of the 2D desktop variety (e.g., Hawkins and Phelps 2013) and 75% were acquired from an outside source/vendor and were not exclusively designed for the course in which they were implemented. The vendor for the V-Labs was also not exclusive, with only seven articles sharing the same vendor (i.e., Labster), but for different V-Lab experiences. While the selection criteria required articles to be situated in undergraduate education, 26% reported involving an introductory course (e.g., Olympiou and Zacharia 2012) and for 80% of the total, this was a single experience that was related to a module or activity within the course.

Table 2 A breakdown of studies by subject within the general domain of science and engineering

As Table 1 suggests, an evaluative focus, rather than a research focus, is dominant. In these cases, the authors identified the V-Lab condition as playing a role in the learning process and subsequently measured the impact on some form of student outcome. The studies typically lacked a theoretical perspective, or in some cases even research questions, that could have been used for interpreting changes in student outcomes. A positive change in content knowledge was the most often targeted student outcome (Fig. 3). For instance, Chini et al. (2012) explored how the use of a V-Lab would help students understand science concepts related to pulleys, while Darrah et al. (2014) evaluated students’ content knowledge after experiencing a V-Lab. Exploring student perceptions of V-Labs was also a common goal for these studies. For example, Dyrberg et al. (2017) studied science students’ attitudes with regard to their use of a V-Lab, while Koh et al. (2010) explored how the use of a V-Lab in an engineering course would influence students’ motivation to learn the course content.

Fig. 3

a The types of student outcomes that were assessed and b the instruments that were used to evaluate learning across all 25 studies. Some studies assessed numerous outcomes with multiple instruments

Although improving practical or inquiry skills is considered an affordance of laboratory education, it was not widely studied (24%) with V-Labs. When it was, examples included the process of analyzing protein expression by muscle cells (Polly et al. 2014) and practicing how to properly generate x-ray images with virtual patients (Gunn et al. 2018). This finding is also consistent with those of Brinson (2015).

Perspectives on Implementing V-Lab Experiences

More than half of the studies (60%) were based upon an expectation that V-Labs would bring about some predetermined student outcome of interest. A number of studies based this expectation on V-Labs functioning as inquiry activities, assuming that students would find this interesting and would take advantage of opportunities for practice with laboratory methods. For instance, Polly et al. (2014) predicted that a V-Lab would improve students’ ability to apply molecular laboratory techniques as well as to repeat and redesign experiments. Cheong and Koh (2018) indicated that a V-Lab could be used by students to solve engineering related math problems and Ogbuanya and Onele (2018) hypothesized that a V-Lab would influence deeper learning through engineering practice.

No consistent pattern was identified for the remaining studies, as they were driven by a variety of perspectives. For instance, Chini et al. (2012) was based upon the expectations of dynamic transfer (the capacity to use new knowledge and problem-solving skills in a unique context), endeavoring to understand how the use of support elements within the V-Lab, such as alternative interpretations and feedback, influenced students’ problem-solving skills in physics. In an effort to provide “alternative interpretations and feedback” (Chini et al. 2012 p. 3), the V-Lab provided immediate feedback to responses concerning abstract concepts. Similarly, Vrellis et al. (2016) inquired how a problem-based activity situated in a V-Lab would impact students’ cognitive and non-cognitive outcomes. Ideally, the process that students experienced solving an authentic content-related problem in the V-Lab would result in improved knowledge construction. Likewise, Secomb et al. (2012) explored the influence of a V-Lab on students’ scientific reasoning, which entailed explaining and analyzing results, effecting change, and constructing scientific arguments. It was hypothesized that the engaging and interactive environment presented in the V-Lab would promote “effective learning” (p. 3476).

Goals for Implementing V-Labs

The role and significance of laboratory learning have evolved over the last century, and the goals for implementing V-Labs have emerged as a product of this process. As our understanding of how students learn science has expanded, so has the emphasis on laboratory education, the “experiences [that] provide opportunities for students to interact directly with the material world (or with data drawn from the material world), using the tools, data collection, techniques, models, and theories of science” (Singer et al. 2005 p. 31). From this corpus of research, our analysis indicates that researchers translate the potential promise of V-Labs into two primary goals for their work. First, V-Labs are implemented with the intent of serving as an approach to teaching, which differs from their use as a resource or element of curriculum. Second, V-Labs are intended as a vehicle or intervention for improving student outcomes.

As V-Labs have become more accessible to the general public, principally due to the reduction in cost for the technology, researchers have increasingly focused on examining their viability in what we interpret as a teaching approach: a means to “pass acquired knowledge, skills, and technology between individuals” (Fogarty et al. 2011). A number of studies proposed that V-Labs were an improvement to how content and/or skills had previously been taught, where the V-Lab was assumed to be more beneficial than traditional instructional methods (Ekmekci and Gulacar 2015). In these instances, the student-focused nature of V-Labs is promoted as a way to improve the learning of the science or engineering content (Goudsouzian et al. 2018; Ekmekci and Gulacar 2015). As a teaching approach, V-Labs are indicated as a means for increasing student satisfaction despite situations of limited physical resources (Cobb et al. 2009) or in instances where the V-Labs expand the locations where laboratory learning can take place (Bortnik et al. 2017).

Ángel (2015) studied the capacity of V-Labs for providing instructors with an opportunity to eliminate language barriers that can occur in laboratory learning for students who are not native English speakers. In an effort to move away from passive teaching methods, those that are based upon merely transferring knowledge from the instructor to the students, V-Labs have been investigated as an active teaching alternative (Michel et al. 2009). For example, Cheong and Koh (2018) sought to understand how a V-Lab could counter “general passivity” (p. 58321) in a classroom by investigating whether the V-Lab’s replicated real-world scenarios would support students in thoughtful discourse and engagement, which would result in a greater capacity to apply their knowledge. Similarly, Goudsouzian et al. (2018) and Toth (2016) acknowledged the need for multiple teaching approaches when disseminating information to students and proposed that the affordances of V-Labs (i.e., animation, videos, and teaching aids) were well suited for improving learning outcomes. “[V-Labs] are software-tools that allow users to design repeated experiments to test the effects of variables…but in a shorter amount of time, with increased safety, and at a reduced cost” (Toth 2016 p. 158).

V-Labs were promoted as being poised to provide different instructional formats, especially in the case of blended learning. In the study by Bortnik et al. (2017), students used a V-Lab with traditional in-class instruction as a supplement prior to completing their hands-on laboratory. Researchers identified perceived affordances of the V-Lab (i.e., student-centered, inquiry-based, affordable, and expanded access) as being beneficial to improving students’ “scientific literacy, research skills, and practices” (Bortnik et al. 2017 p. 3). In a similar fashion, Goudsouzian et al. (2018) sought to explore an expanded teaching approach using a V-Lab as a way to disseminate course content to students in a more engaging manner. Citing a limited availability of laboratory activities for illustrating microbiology topics such as the cell cycle, a V-Lab was determined to be ideal, providing learning opportunities where none previously existed (Goudsouzian et al. 2018). “Multiple pedagogical approaches are useful in helping undergraduate students learn difficult scientific concepts, and incorporating multisensory learning tools aids in the ability of individuals to recall information” (Goudsouzian et al. 2018 p. 361).

A number of studies present V-Labs as being wholly responsible for providing instruction, akin to being the teacher. However, other studies position the V-Lab as only an instructional tool, not supplanting the teacher but something for a teacher to use (Ekmekci and Gulacar 2015; Vrellis et al. 2016). Yet, many of the actions and activities that are typically provided by a teacher or laboratory instructor, such as feedback during work sessions, are being subsumed by features of V-Labs and lauded as an advantage (Cobb et al. 2009; Bortnik et al. 2017). For example, V-Labs are tasked with providing content knowledge (Ogbuanya and Onele 2018; Bortnik et al. 2017) with the intent of assisting students in memorizing information (Goudsouzian et al. 2018) or providing students with knowledge of “how to streak out bacteria on agar plates to isolate single colonies [and] described the principles of using selective and differential culture media” (Makransky et al. 2016 p. 11). To assure the fidelity of a V-Lab for providing instruction similar to a teacher, it is critical to examine the components of how students engage with their instructor(s) during a lesson and to explore to what extent the implementation of the V-Lab supports the identified engagement components.

As an intervention intended to improve student outcomes, studies were categorized into one of two themes based upon the overall intent for using a V-Lab: either the V-Lab was used as a replacement for an existing learning activity in a traditional laboratory or as a supplement to it. The theme of replacement included a few studies that also explored the sequencing of V-Labs with physical laboratories. In sum, each of these goals resulted in mixed outcomes (Table 3).

Table 3 A breakdown of effects on student outcomes by overall goal of the study

The theme of replacement involved the use of V-Labs as an assumed equivalent or greater learning experience and for these studies, the most likely finding was no significant difference from a traditional laboratory. This was the case in the context of a microbiology course with content knowledge and self-efficacy as outcomes (Makransky et al. 2016), in the context of electrochemistry assessing conceptual understanding (Hawkins and Phelps 2013), and with physics (Darrah et al. 2014) and electrical circuits more specifically (Ekmekci and Gulacar 2015). However, two replacement studies involving an electrical circuit laboratory (Ogbuanya and Onele 2018) and offshore practical training (Zhu et al. 2018) did report an improvement in content knowledge as well as increased interest in favor of V-Labs. In a few cases, the notion of replacement was extended and modified into an approach that involved the sequencing of V-Labs and alternating them with traditional forms of laboratory. For example, Olympiou and Zacharia (2012) found that students who experienced a blending of physical laboratories with V-Labs indicated enhanced understanding of the content. Chini et al. (2012) employed a similar approach of concept and implementation sequencing but did not find a positive influence on student outcomes.

The theme of supplement refers to an investigation into whether adding a V-Lab was beneficial as an additional learning experience without taking anything away. When used in this fashion, a positive influence on student outcomes was more likely found than not. For example, Dyrberg et al. (2017) reported an increase in feelings of confidence and comfort when operating the physical laboratory equipment after use of a V-Lab supplement, while Bortnik et al. (2017) documented enhanced research skills and practices. In the case of Nolen and Koretsky (2018), the researchers asked whether the perceived affordances of a V-Lab, such as the ability to provide a “wide range of length scales, time scales, and complexity” (p. 227), would influence student engagement in an undergraduate engineering course. Results indicated an increase in interest and engagement in the course content when using the V-Lab. Similarly, Goudsouzian et al. (2018) found that students who completed either a live or a V-Lab also made gains in laboratory skills and experimental predictions, while Toth (2016) found an increase in students’ knowledge scores after using a V-Lab, but not when asked to apply information.

Interpretations of V-Lab Experiences

Aside from evaluating student outcomes of interest, two larger frameworks were noted for interpreting the influence of V-Labs: motivation (generally, as well as expectancy value, self-determination theory, and control value theory of achievement emotions specifically) and the Cognitive Theory of Multimedia Learning (CTML). The theories used to interpret the impact on student outcomes suggest that researchers inferred the primary impact as either being personally motivating for students or improving their capacity for processing information. A modest number (15%) of the articles in the review applied theories of motivation to interpret undergraduate student experiences with V-Labs, seeking to extend understanding to include non-cognitive factors. For example, Dyrberg et al. (2017) acknowledged that students’ perception of a task influenced their level of engagement and explored whether they found value in the use of a V-Lab as a pre-laboratory exercise. Koh et al. (2010) documented the extent to which a V-Lab influenced participants’ psychological needs and learning outcomes. Nolen and Koretsky (2018) explored the influence a V-Lab would have on student engagement in an engineering course. The V-Lab experience was designed to improve participants’ ability to work in a team and collectively engage in an intricate task, as well as increase their interest in the subject matter. Likewise, Makransky and Lilleholt (2018) investigated how an individual’s perceived level of immersion influenced their “non-cognitive and perceived learning outcomes” (p. 1144).

The utilization of process-related theories occurred in a single study that applied the Cognitive Theory of Multimedia Learning (CTML), which contends that a selective amount of both visual and verbal representation is ideal for a conducive learning experience (Mayer 2014). In Makransky et al. (2017), it was hypothesized that learning outcomes would be a direct reflection of how increased immersion influenced students’ extraneous processing and that V-Labs that include both narration and written text would negatively influence student outcomes. The results supported this prediction, showing higher cognitive load and lower student learning in the V-Lab condition.

Discussion

This review reveals a dearth of varied theoretical and methodological approaches regarding V-Labs. Nearly half of the articles did not explicitly state a theoretical perspective for interpreting student outcomes or offer research questions that guided the study. Lacking a theoretical perspective precludes the critical assessment of how a pedagogical practice such as implementing a V-Lab could influence forms of engagement or knowledge construction, in essence eliminating the capacity to associate any change in performance with a specific process for explaining why the change occurred (Driscoll 2005). Additionally, the reviewed articles lacked the use of a consistent definition for V-Labs, one that (a) might have accounted for the intended experience, (b) included established characteristics and critical features, or (c) was not just a description of the technology used.

The majority of the studies were evaluative, with most seeking to establish the efficacy of V-Labs as an option for meeting the specific needs of a particular context. Pre- and post-tests in the form of a survey, quiz, or exam were used to measure whether the V-Lab, serving as an intervention, increased content knowledge when compared to another group. This form of baseline assessment is customary in technology integration, but it does not provide a rich understanding of the teaching or learning context in ways that can advance our capacity to explain and account for any differences that might be detected. The current corpus is devoid of studies that take more than a minimal cognitive approach to understanding student learning. All of these studies were completed from the perspective that a V-Lab alone would serve as a learning intervention by replacing all human interaction (e.g., peer-to-peer, peer-to-instructor). This suggests a need for future studies that take a multidimensional construct- or person-oriented approach, one that recognizes learning as more than acquired content knowledge and explores the importance of social interactions in relation to V-Labs. For example, teaching assistants are a mainstay of traditional undergraduate teaching laboratories and prior research indicates their importance (Gardner and Jones 2011), yet it seems that most V-Labs are designed and subsequently investigated as if these individuals are not important or simply do not exist. Teamwork and collaboration among students are widely recognized as a key element of laboratory learning (Bauerle et al. 2011; Committee 2018; Hofstein 2004) but are not addressed as part of how V-Labs are designed, applied, or researched. The teaching laboratories that V-Labs are intended to emulate include people who mentor, coach, collaborate, counsel, and instruct. These elements of the broader learning environment need to be better addressed in both the design of V-Labs and research related to their use.

A select few quantitative studies did go beyond basic efficacy and should serve as models for future work. Makransky et al. (2017, 2020) and Makransky and Lilleholt (2018) showed promising results in their exploration of the influence of V-Labs on non-cognitive outcomes such as motivation and cognitive load. Yet, we found no attempts at replicating or building upon these studies. Studies such as these should serve as a starting point and initial model for how quantitative studies are conducted in order to further develop our understanding of the phenomenon. New replication studies would validate the reported findings and confirm the results as broadly applicable. Future research should emphasize building from these methodologies to include multiple iterations, various V-Lab vendors, diverse domains, and varied forms of data. Quantitative research needs to move beyond the paradigm of only evaluating acquisition of content knowledge and skills with methods that compare V-Labs with real-world experiences under assumptions of equivalence.

The first-person perspective of participants, including students, teaching assistants, and instructors, is noticeably missing from the existing research on V-Labs. Student perceptions were most often captured using survey instruments that offered minimal opportunities for free response. This represents a lost opportunity, producing a situation that may result in reinforcing or validating problematic assumptions about learning, including the role of people in the process. Given the predominance of off-the-shelf software, this is particularly pertinent and implies that participant needs and expectations are not being assessed or subsequently addressed. There seems to be a universal assumption that students view V-Labs as interesting and important, expect to be successful in navigating and learning from them, find inherent value in this type of task, and have confidence in what they know and are able to do. This assumption seems to be grounded more in presumptions about participants being motivated by the novelty of the technology than in the learning attributes that would be afforded by the design of any learning environment (Wells et al. 2010). In their description of using a fidelity of implementation process for constructing evidence-based practices, Stains and Vickrey (2017) acknowledge that simply inserting a new approach such as a V-Lab and measuring the student outcomes ignores vital elements of the implementation process that can influence a study’s results.

Studies that detail and describe the variation in student experience with V-Labs are sorely needed, in particular those that explore the influence of individual background or personal characteristics on the V-Lab experience and how design can be used to afford social learning and models of instruction that include peers and instructors. The meaning that students make of their experience with V-Labs is largely undocumented, and the design of the available environments is based upon the assumption that students can discern the critical features when appropriate in order to derive the intended meaning. Existing software designs are heavily influenced by instructional perspectives, the intent to emulate the attributes and affordances of an existing physical space, and the aim of providing a singular participant experience. Studies of this type would focus on using qualitative inquiry to document the variation in student experiences in order to provide key information about the features that students attend to as they work through a V-Lab and then use that experience to make personal meaning (Bussey et al. 2013). Insights derived from these studies would offer the potential for new approaches to design, ones that meld the expectations of students with the goals of instructors and designers. Better accounting for the varied student perspectives in such a manner would result in collectively moving from “does [it] work” to “why, how, and under what conditions could [it be] impactful” (Stains and Vickrey 2017 p. 2).

Limitations

Our search process was limited by our choice of search engines as well as our ability to identify and use search terms that were consistent with how researchers defined their work. Thus, it is possible that our search process did not identify all of the published studies on V-Labs during the time period. Even so, this review can still be viewed as drawing on a likely representative sample of the available literature, one that exposes a need for much more robust and diversified research regarding the use of virtual reality in undergraduate STEM laboratory education.

Conclusion

Use of V-Labs has shown some promise for expanding the capabilities of laboratory education. However, the results of this study reveal a dearth of theoretical and methodological approaches used for exploring this phenomenon. Most of the studies were evaluative, seeking to establish the efficacy of V-Labs for meeting specific needs under the expectation that simply using them would bring about predetermined student outcomes. V-Labs were used primarily as a teaching approach, responsible for providing instruction without any need or designed intent for interaction with a human teacher or peers, even though these elements are consistently shown to be key predictors of student learning. Interpretation of outcomes was most often based upon an assumption that students would find V-Labs personally motivating, an assumption that seems more grounded in a novelty effect than in the design of the environment. New studies are needed that explore how individual background or personal characteristics influence the variation in V-Lab experience and how design can be used to afford social learning within current models of instruction through the inclusion of teachers and peers. Exploring such questions would offer a richer perspective on virtual laboratories in undergraduate education.