Introduction and rationale

Anomalous data are results of real-world investigations that do not concur with prior knowledge, theories, or anticipated outcomes. These data have various forms (numbers, images and graphs), and are also called outliers, aberrations, error, or noise (Chandola et al. 2009; Gong et al. 2012; Han et al. 2006). Scientific, engineering, and medical malpractice frequently originates from the improper evaluation of and response to anomalous data (Allchin 2001; Dekker 2006; Dorner 1996; Spector and Davidsen 2000; US News 2007). Therefore, the responsible and accurate use of anomalous data during scientific and engineering practice is paramount (Drenth 2006; Martinson et al. 2005; Masnick and Klahr 2003; NGSS 2012). However, science and engineering are no longer limited to hands-on investigation since powerful computer tools are available to examine natural phenomena through data generation, evaluation, and reasoning (Windschitl 2000; Spector 2008). Of these tools, ViRtual Laboratories (VRLs) are software-tools that allow users to design repeated experiments to test the effects of variables. This is similar to learning with hands-on laboratories (HOLs) but in a shorter amount of time, with increased safety, and at a reduced cost (Ma and Nickerson 2006; Toth 2009; Zacharia et al. 2008). A variety of VRLs are now available from research organizations (HHMI, n.d.; NASA, n.d.), higher education institutions (GSLC 2013; MyDNA 2003) as well as educational vendors. Yet, the processes of developing software-supported learning environments are complex (Reiser 2004) and there is a continued need to support practitioners (teachers, instructors, professors) in effectively using technology for teaching and learning (Kirschner and Wopereis 2003). This study responds to this need by evaluating the effectiveness of a pedagogical approach that asked students to learn basic concepts with a VRL and use these to analyze anomalous data from a real-world HOL.

Prior research

A bird’s-eye view of the history of the integration of software tools for science learning and teaching indicates progress through microcomputer-based laboratories (Jackson et al. 1996; Nakhleh and Krajcik 1993), microworlds (Rieber 1992), instructional simulations (De Jong and Van Joolingen 1998), and weblabs (Dori and Barak 2003; Spector et al. 2013), as well as VRLs (Balamuralithara and Woods 2009; Chen 2010; de Jong et al. 2013; Toth et al. 2009, 2012a, b; Wolf 2010). Current tools allow users to manipulate sophisticated, remote equipment (Corter et al. 2011; Ma and Nickerson 2006; Nedic et al. 2003), and to interact with “virtual worlds” (Dede et al. 1997, 2004; Cobb et al. 2009; Dickey 2011).

A deeper-level examination of the instructional effectiveness of these tools indicates that software design features can support learning (Ainsworth 2008; Jonassen 2006; Kali 2008; Kozma 2003; Quintana et al. 2004; Underwood et al. 2005). Educationally effective software tools can scaffold the process of inquiry by structuring complex tasks and by automating routine ones (Kali 2008; Quintana et al. 2004). They can also constrain the users’ activity to important tasks, thereby reducing cognitive load (Ainsworth 2008). Furthermore, properly designed software tools can assist users in inspecting the properties of data (Quintana et al. 2004). However, not all educational software tools employ effective design features, and many even pose challenges for users (Reiser 2004). Therefore, practitioners and instructional designers need to make evidence-based instructional design decisions to reduce potential harm caused by ineffective software designs (Kirschner and Wopereis 2003; Toth 2009). The results of this study support practitioners’ instructional design efforts by describing how students learn from repeated experiments with a VRL, and how they use this learning to evaluate and reason about real-world, anomalous data. Two lines of prior research supported the development of this approach: studies that compared students’ learning with VRLs and HOLs and those that examined learning with anomalous data.

Prior research on learning with VRLs and HOLs

Two overarching research perspectives exist on combining or blending VRLs and HOLs. One aims to align students’ activities with the two environments in order to document which is more productive, and the other aims to support learning by utilizing differences in each tool. Comparative studies of effectiveness examined goals, activities and assessments and generally indicated that VRLs are more effective for learning (Apkan and Strayer 2010; Ma and Nickerson 2006; Toth et al. 2012b; Zacharia 2007) or are at least equivalent to learning with HOLs (Triona and Klahr 2003; Zacharia and Constantinou 2008).

Studies on the perceptual differences between HOLs and VRLs (Olympiou and Zacharia 2012; Olympiou et al. 2013; Toth et al. 2009, 2012a) are concerned with the nature of students’ learning interactions with different tools (Jonassen 2006; Ainsworth 2008). For example, Toth et al. (2012a, b) observed that HOLs focus students’ thinking on the manipulation of physical equipment, while VRLs direct users’ attention to variables. Similarly, when evaluating data from HOL-experiments, students focus on the outcomes and not the interaction of variables producing the outcomes (Toth et al. 2012a, b, p. 13). With well-designed VRLs, visual clues may direct attention to the processes by which results come about (Toth et al. 2012a, b, p. 13). Differences in the perceptual features of VRLs and HOLs might explain prior results on students’ difficulties in conceptually connecting their VRL-work to the “real-world” (Nedic et al. 2003), especially when learning with VRLs that present a simplified view of investigation and do not generate anomalous data outcomes. Consequently, there is a continued need to examine the effectiveness of instructional methods that combine or blend VRLs and HOLs.

Prior research on learning from anomalous data

A critical aspect of experimenting with “real-world” HOLs is the prevalence of error resulting in anomalous data (Allchin 2001; Chandola et al. 2009; Dorner 1996; Masnick and Klahr 2003). The pedagogical value of learning from anomalous data is that, by definition, these data are inconsistent with expected results. This inconsistency generates cognitive dissonance between what is believed (expected) and what is experienced (Schnotz and Rasch 2005; Sweller et al. 1998). Unexpected outcomes can lead to new knowledge construction (Driver 1989; Limón 2001; Piaget 1985) if learners modify their existing, often incorrect, assumptions (Dreyfus et al. 1990; Limon and Carretero 1997; Posner et al. 1982). Using these processes, even very young children can have a basic understanding of variance in data outcomes (Masnick and Klahr 2003) and hold sophisticated understandings of scientific investigation (Carey and Smith 1993; Smith et al. 2000; Zimmerman 2007).

Cognitive dissonance between prior knowledge (prior belief) and anomalous data can also have negative consequences and may result in students excluding anomalous data from their reasoning, re-interpreting findings to eliminate anomaly, and even rejecting anomalous data rather than changing prior (incorrect) theories (Chinn and Brewer 1993, 1998; Mason 2001). A possible explanation is that individuals formulate internal representations or models of data that influence their evaluation and reasoning (Chinn and Brewer 2001). Recent research indicates that even college students in science and engineering experience difficulties when evaluating the results of repeated experiments (e.g. Renken and Nunez 2013) and can respond to anomalous data with the same, ineffective strategies documented by Chinn and Brewer (1993).

Students’ difficulties can originate from a variety of problems in transferring learning with a VRL to real-world application. For example, students could experience difficulties in knowledge construction in the first place (Salomon and Perkins 1989). Alternatively, students may fail to recognize the explicit and implicit similarities between the learning environment and the application environment (Nokes-Malach and Mestre 2013; Rebello et al. 2005; Magin and Kanapathipillai 2000) and may fail to connect learning with a VRL and real-world data (Nedic et al. 2003). Therefore, this study examined whether university engineering students (a) learn important concepts with a VRL, (b) make a conceptual connection between their learning with the VRL and the evaluation of data from an HOL, and (c) use their learning with the VRL when they reason about real-world anomalous data. The research questions were as follows.

  1. Does investigation with a VRL assist engineering students in constructing new knowledge?

  2. In what ways do students use their learning with a VRL to evaluate anomalous data from an HOL?

  3. In what ways do students use the variables they examined with the VRL to reason about anomalous data from an HOL?

Methods

The overall approach was constructivist in nature and it centered on the primacy of students’ knowledge development processes (Guba and Lincoln 1994). This approach is also termed “naturalistic inquiry” in the early literature (Lincoln and Guba 1985). From this perspective, knowledge is subject to continuous revision as students engage with the context of learning and form “ever more informed and sophisticated constructions” (Guba and Lincoln 1994, p. 114). Given the focus of this study on using concepts and processes learned in the context of a VRL to be applied for examining anomalous data from the real-world, we provided a learning environment where “relatively different constructions are brought into juxtaposition” (Guba and Lincoln 1994, p. 113).

Instructional context

The setting of instruction was a newly developed course on “Cellular Machinery,” designed and taught by an engineering faculty member at a large research university in the United States. This semester-long, biomedical engineering course incorporated traditional lecture-discussion with problem-solving projects, group activities, and project presentations. One of the projects was used in this study. Students’ learning goal was to develop an in-depth understanding of the concepts and processes of DNA fragment separation by gel-electrophoresis with the MyDNA VRL (2003).

Instructional tools: gel-electrophoresis with the MyDNA virtual laboratory

Molecular biology is an interdisciplinary field of science and gel-electrophoresis is a common laboratory procedure, designed to separate different-sized fragments of DNA—Deoxyribonucleic acid, the genetic materials in living cells. The procedure can be used to compare DNA samples from different individuals for identification, paternity testing, and disease detection. The science of the process rests on the fact that the DNA molecule is negatively charged; thus, in a porous medium, fragments migrate to the positive pole under electric current. The porous medium (often an agarose gel) functions as a sieve and separates fragments by size with larger ones captured close to the loading points (wells on the top of the gel), and smaller ones captured farther down the gel, closer to the positive pole.

Students in biomedical engineering are required to be familiar with the variables, processes, and possible anomalous outcomes of this common laboratory technique. The traditional teaching approach uses a hands-on laboratory that can produce unexpected, anomalous results. A website titled “Hall of Shame” by Rice University (Caprette 1996) hosts a substantial database of anomalous outcome gel images—to the delight and pedagogical benefit of students and faculty. While anomalous outcomes are common with HOLs, determining the sources of anomaly by experimentation (by changing variable settings and producing repeated tests), is not feasible due to cost and lack of time.

With the MyDNA VRL (2003), students could easily design and run repeated experiments and study the effects of different variables (concentration, voltage, time of run) on the distance DNA fragments travel. They are able to stop the process and study the movement of different fragments over time. However, this particular tool does NOT allow users to generate anomalous results and it presents a rather simplified world of experimentation (Chen 2010). To help students relate the idealized world of this VRL to working in the real world, the instruction used existing images of anomalous outcomes from the “Hall of Shame” website instead of performing costly and time-consuming HOLs.

Participants

The participants were graduate and undergraduate students majoring in chemical, mechanical and aerospace or computer engineering and biology. The “Cellular Machinery” course was an elective and students participated based on their interests.

In study one, during the first year, 21 students participated (six females, 15 males; 16 undergraduate students and five graduate students). In study two, during the second year, 17 students participated (eight females, nine males; 13 undergraduate students and four graduate students). All students were required to complete the classroom work, but they could voluntarily withhold their results from inclusion in this study. As a result, 16 students in study one and 15 students in study two participated. The university’s Institutional Review Board approved the study.

Instructional activities

The same professor was the course instructor in year one (study one) and year two (study two). The instructional activities were also the same and took place in five steps (Table 1), lasting for 1 week. First, all students participated in a brief (15-min-long) lecture/discussion about the current uses of DNA in modern biological and computational applications. The instructor did not detail the specific processes of gel-electrophoresis. Next, students individually completed a short (15-min-long) Pre-Test, probing their knowledge of DNA and gel-electrophoresis concepts and processes. Subsequently, students individually worked with the MyDNA VRL for 30 min. They created repeated experimental trials in order to determine how different variables influence the distance DNA fragments travel. Students used worksheets to record the settings of voltage, concentration, and time of run as well as the distance different-sized DNA fragments traveled. They expressed the distance as the ratio of the actual distance traveled from the starting point to the maximum distance possible. After this learning experience, students completed the Post-Test: the same, 15-min-long, instrument used as the Pre-Test. Finally, students completed an Error-Survey that presented images of anomalous gel outcomes (Caprette 1996) and asked them to evaluate and explain the results.
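The worksheet calculation described above can be illustrated with a minimal Python sketch. The record layout, field names, units (volts, percent agarose, minutes, millimeters), and sample values are assumptions made for illustration only; they are not taken from the students’ worksheets or from the MyDNA VRL.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One worksheet row; field names and units are hypothetical."""
    voltage_v: float          # voltage setting
    concentration_pct: float  # gel concentration setting
    run_time_min: float       # time of run
    traveled_mm: float        # distance a fragment traveled from the well
    max_distance_mm: float    # maximum distance possible on the gel


def relative_distance(trial: Trial) -> float:
    """Distance traveled as a fraction of the maximum possible distance."""
    if trial.max_distance_mm <= 0:
        raise ValueError("maximum distance must be positive")
    return trial.traveled_mm / trial.max_distance_mm


# Illustrative values only, not data from the study.
example = Trial(voltage_v=100, concentration_pct=1.0, run_time_min=30,
                traveled_mm=30.0, max_distance_mm=120.0)
print(relative_distance(example))
```

Expressing distance as a ratio makes trials comparable across runs, because each value is bounded between zero and one regardless of the absolute scale of the gel.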

Table 1 The study had five steps and used two instruments for data collection: the same test was used before and after investigation with the VRL and the error survey was used at the end

Design

Because a randomized assignment of students to control and treatment conditions was not possible, a quasi-experimental approach involving pre-post-instruction tests was used that examined learning from a constructivist perspective with a mix of quantitative and qualitative data (Clark Plano and Creswell 2008). The assessment protocol was the same in both studies and it focused on students’ learning with the VRL followed by their application of newly constructed knowledge to evaluate and reason about anomalous results.

Data sources and measures

The study employed two instruments: a test of students’ knowledge (before and after work with the VRL) and an error-survey of students’ analysis of anomalous data (at the end of the work).

The test instrument

The author developed this instrument and used it in prior studies (Toth et al. 2012b). Collaborating scientists and an engineering professor examined this instrument for face validity, and established the content validity. The same test instrument was used before and after instruction (Table 1). Figure 1 illustrates the nature of the test items. The test yielded the “Knowledge Score” with three components: (1) knowledge of DNA characteristics (charge, direction of DNA-move in charged medium, differential move of small, medium and large DNA fragments, etc.), (2) knowledge of agarose gel characteristics (porous nature, function as a sieve, size of pores depending on concentration, etc.), and (3) knowledge of the processes of gel-electrophoresis in response to a variety of errors (such as mixed order of loading samples, failure to stop the process in time, incorrect concentration, instrument malfunction and random environmental factors). The 18-item test instrument employed true/false, multiple choice and fill-in-the-blank questions with correct answers worth one point each. Incorrect answers were scored zero. Cronbach’s alpha indicated that the 18 items of the test instrument were internally consistent (0.83 and 0.75 in the first and second study, respectively).
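For readers unfamiliar with the internal-consistency statistic reported above, the following is a minimal sketch of the standard Cronbach’s alpha formula applied to items scored zero or one. The study does not state how alpha was computed, so the function name and the toy score matrix are hypothetical and serve only to show the calculation.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of rows, one row per student, one 0/1 entry per item.
    Uses sample variances for both the items and the total scores.
    Raises ZeroDivisionError if all students have the same total score.
    """
    k = len(scores[0])  # number of items
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (not study data): four students, three items.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(toy), 2))
```

Alpha rises when students who answer one item correctly also tend to answer the other items correctly, which is why it is read as a measure of internal consistency.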

Fig. 1 Three example test-items from the pre- and post-tests. Questions such as 1a and 1b tested students’ knowledge of DNA fragment gel-electrophoresis concepts and processes. Images such as those in 2c tested students’ ability to use their knowledge to evaluate outcome data

The error survey

The author developed this instrument for the purpose of this research. It was the last step of assessment (Table 1) to probe students’ ability to analyze anomalous data outcomes from HOLs. The content validity of this instrument was established by the collaborating engineering professor and the data coding assistant, who had master’s level training in biology (forensic science). Survey items were edited until full agreement by the content experts was obtained. This instrument used images from the “Hall of Shame” website (Caprette 1996) and included simplified images similar to those the students saw when using the VRL (Fig. 2).

Fig. 2 Three images illustrating gel electrophoresis results, similar to those used on the Error Survey. The first image illustrates what an ideal outcome may look like. The second image is an example of a broken gel that does not support reliable assessment. The third image is difficult to evaluate due to inadequate visibility. The images that were actually used in the Error-Survey were made available for educational purposes by Rice University and David R. Caprette (caprette@rice.edu). Those images are not shown here and are available at http://www.ruf.rice.edu/~bioslabs/studies/sds-page/sdsgoofs.html

All images on the survey indicated anomalous outcomes so the question under each image asked students to (a) detect and describe (evaluate) the nature of anomaly and to (b) reason about how these aberrations may have come about. For each of six images on the survey, students were asked, “What makes the DNA band outcome hard to interpret from this gel?” and “What do you think may have caused this anomalous outcome?” Students responded to the questions with open-ended text that was either typed or hand-written. These responses were coded and quantified to yield the Evaluation Score, and the Reasoning Score.

Data coding and analysis

The first analysis examined quantitative data on students’ learning of gel electrophoresis concepts and processes: the Knowledge Score from the Pre- and Post-Tests. This analysis used a Wilcoxon signed rank test (WSRT), a non-parametric equivalent of the paired t-test, designed for studies with small participant numbers. The analysis established the Wilcoxon Z score separately for study one and study two by using the software SPSS (2012). It calculated the effect size by dividing the Z-score of each WSRT by the square root of the participant number, as suggested by Field (2005). This analysis answered the first research question about participants’ knowledge construction.
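The two calculations named above can be sketched in Python as follows. This is an illustrative approximation, not the SPSS procedure used in the study: the Z score below uses the plain normal approximation without the tie correction that statistical packages typically apply, so its values may differ slightly from software output. The function names and the pre/post scores are hypothetical.

```python
import math


def wilcoxon_z(pre, post):
    """Normal-approximation Z for the Wilcoxon signed rank test.

    Zero differences are dropped; tied absolute differences share the
    average rank. No tie or continuity correction is applied.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mean_w) / sd_w


def effect_size_r(z, n_participants):
    """Effect size r = Z / sqrt(N), with N the number of participants."""
    return z / math.sqrt(n_participants)


# Hypothetical pre/post knowledge scores for five students (not study data).
pre = [6, 8, 7, 9, 5]
post = [10, 12, 9, 15, 8]
z = wilcoxon_z(pre, post)
print(round(z, 2), round(effect_size_r(z, len(pre)), 2))
```

Dividing Z by the square root of the participant number puts the result on a correlation-like scale, which allows the change in study one and study two to be compared despite different group sizes.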

The second analysis coded, quantified, and analyzed the qualitative data from students’ open-ended answers on the six questions of the Error Survey. The coding and analysis centered on the interpretation of the meaning of students’ statements. To establish inter-rater reliability the researchers discussed the coding and made modifications until reaching 100 % agreement. Based on this agreed-upon meaning, the researchers assigned codes to each evaluation and reasoning statement, then quantitatively described the characteristics of coded data. This method for the quantification and analysis of qualitative data follows protocols described by Chi (1997), Miles and Huberman (1994), and Miles et al. (2013).

To arrive at the evaluation score, an evaluation statement that mentioned the focal measure, the distance DNA fragments traveled, at least once, received a score of one. Evaluative statements that did not mention DNA bands yielded a score of zero for each image, thus the maximum score for evaluation on each image was one. Reasoning scores were quantified with a similar method. Statements about anomalous outcomes that included reference to the variables studied with the VRL (concentration, voltage, time of run) received a score of one. No mention of these variables resulted in the score of zero for each image; thus the maximum reasoning score on each image was one. With this scoring method, the maximum evaluation score and the maximum reasoning score were six and the minimum scores were zero. Cronbach’s alpha indicated that the reliability of the evaluation score was 0.60 and 0.64 in study one and study two respectively. The reliability of the reasoning score was 0.45 and 0.64 in study one and study two respectively. Reliability scores of 0.6 and above are in the acceptable range for social science research (Field 2005). However, reliability is associated with the number of items in a survey, thus the moderate to low reliability of the six-item reasoning scores should be considered in this context. The analysis of the evaluation and reasoning scores answered research questions two and three, extending the findings of a prior publication (Toth et al. 2012a) with a deeper insight into how students use their learning with the VRL to analyze anomalous data from an HOL.
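The scoring rule above can be summarized in a short Python sketch. It assumes that the researchers’ agreed-upon codes are already available as one true/false judgment per image and does not attempt to automate the interpretive coding itself. The function name and the example codes are hypothetical.

```python
from statistics import mean, stdev

N_IMAGES = 6  # the Error Survey presented six images


def survey_score(codes):
    """Sum of per-image codes for one student.

    codes: one boolean per image, True if the coders judged that the
    response mentioned the target concept at least once (DNA bands for
    the evaluation score, a VRL variable for the reasoning score).
    """
    if len(codes) != N_IMAGES:
        raise ValueError("expected one code per survey image")
    return sum(1 for c in codes if c)


# Hypothetical coder judgments for three students (not study data).
coded = [
    [True, True, True, True, False, True],
    [True, False, True, True, True, True],
    [False, True, True, False, True, True],
]
scores = [survey_score(row) for row in coded]
print(scores, round(mean(scores), 2), round(stdev(scores), 2))
```

Because each image contributes at most one point, a student’s score ranges from zero to six and can be read as the number of images for which the student applied the target concept.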

Results

Does investigation with a VRL assist engineering students in constructing new knowledge?

The analysis of the students’ Knowledge Score with a WSRT revealed a significant increase in students’ knowledge of gel-electrophoresis concepts and processes after work with the VRL. The effect sizes (r) were 0.85 (in study one) and 0.63 (in study two) as illustrated in Table 2. Leech et al. (2008) suggest that these effect sizes are larger than typical.

Table 2 Students’ mean knowledge scores of gel-electrophoresis concepts and processes before and after work with a virtual laboratory indicate significant increases in both studies

In what ways do students use their learning with a VRL to evaluate anomalous data from an HOL?

The analysis found that the majority of evaluations focused on the same outcome measure as the VRL-work: the movement of DNA bands. The top data row in Table 3 documents that 81 % of all answers in the first study and 87 % in the second study focused on DNA bands. The mean evaluation score was M = 4.75 (SD = 1.34) and M = 4.73 (SD = 1.39) in study one and study two respectively, indicating that students were not always consistent in using DNA-bands in answering each question.

Table 3 The large majority of students’ evaluations of anomalous data focused on DNA-bands, but only half of the students’ reasons focused on the variables of the VRL

The analysis found that students noted two important concepts that connected their HOL-data evaluation to their learning with the VRL. First, they referred to the separation of different DNA bands using statements such as “The bands aren’t well defined, they run together, difficult to identify” and “The bands seem to have shadows, it is hard to tell where the bands should be.” Second, students’ evaluative statements also demonstrated focus on DNA-band pattern across the gel, as seen while working with the VRL. Sample statements showing this focus included “None of the bands line up,” “The bands are not perfectly horizontal,” and “the bands are not in line.” Statements without focus on DNA bands also appeared in evaluations such as “I am not sure what this means” and “It looks like a pirate ship.”

In what ways do students use the variables they examined with a VRL to reason about anomalous data from an HOL?

Table 3 indicates the frequency of those statements that correctly referred to the variables of the VRL: concentration, voltage, and the time of run. Only 60 % of rationales in study one and 53 % of rationales in study two used these variables. The reasoning scores were M = 3.63 (SD = 1.36) and M = 4.73 (SD = 1.44) in study one and study two respectively.

Two themes of variable-based reasoning emerged from students’ written rationales: simple reference to the variables studied with the VRL, and reference to variables with their mode of action, as learned from the VRL. For example, reasoning that simply referenced the variables of the VRL included “The voltage was too high” or “The concentration was too low.” Reasoning statements that provided a mode-of-effect included “The high voltage heated the equipment,” “The high voltage warped the gel,” “Too low concentration and longer fragments did not separate,” and “The gel is not concentrated enough for good separation.” However, rationales also included statements with focus on experimental error such as “the gels were damaged,” “Broken wells/no wells,” “Not proper procedures were followed”, “Somebody messed up,” and “the gel was polluted.”

Discussion of results

Does investigation with a VRL assist engineering students in constructing new knowledge?

An analysis that is powerful for nonparametric data and small sample sizes indicated a significant change in students’ knowledge (Table 2). Students did learn the characteristics of the gel-medium and the DNA molecule as well as the roles of concentration, voltage, and the time of run from their work with a simplified, virtual environment. These results helped eliminate the lack of knowledge as an obstacle to transfer (Salomon and Perkins 1989). The results also corroborated prior evidence on the benefits of learning with VRLs in a variety of contexts (Apkan and Strayer 2010; Triona and Klahr 2003; Zacharia 2007). Furthermore, this finding confirmed the result of a prior study from a university science classroom (Toth et al. 2012b) and suggested that the instructional approach can be replicated in engineering classrooms as well. Having documented students’ basic conceptual understanding, the next focus was on whether students used this knowledge to evaluate and reason about anomalous data.

In what ways do students use their learning with a VRL to evaluate anomalous data from an HOL?

This question examined one critical indicator of making a conceptual connection between the two learning environments, the use of the same outcome measure. The majority of students’ evaluation statements focused on the characteristics of DNA bands, the same outcome measure that was at the center of students’ learning with the VRL (Table 3). This finding suggests that students were successful in making a conceptual “leap” from learning with the VRL to analyzing anomalous data, a skill students did not study with the VRL. Students’ mentions of the separation of different length DNA bands, as well as of the pattern of small, medium and long bands across the gel, support this conclusion. These results suggest that the perceptually apparent surface-features of this particular VRL provided enough support for students to make a connection between experimenting with the VRL and analyzing anomalous data from the real world (Jonassen 2006). Students’ mean evaluation scores as well as their written explanations point to their successes in evaluating anomalous data by applying what they learned with the VRL. This result is in contrast to prior findings that higher education students conceptually separate virtual investigations and real-world results (Nedic et al. 2003). In our case, students were able to make such a connection between the simplified variable manipulations permitted by the VRLs and the evaluation of real-world data, without the design contradictions mentioned in the prior literature (Reiser 2004).

In what ways do students use the variables they examined with the VRL to reason about anomalous data from an HOL?

In contrast to the positive findings on students’ ability to transfer their learning with the VRL to evaluate anomalous data, the analysis of students’ reasoning pointed to the cognitive complexity of the task (Chinn and Brewer 2001). The disconnect between work with the VRL and real-world, HOL-data was apparent in three aspects of students’ reasoning: the low frequency of referring to changes in variable levels, the limited explanation of the mode-of-effect of variables, and students’ reference to general experimental error instead of explaining the role of variables in the unanticipated outcome.

The frequency of referring to changes in variables was 60 and 53 % in the two studies respectively (Table 3). Given that students did have accurate content knowledge and that they were able to refer to the variables studied with the VRL during their data evaluations, this frequency of reasoning with variables is low. Furthermore, students in at least one of the studies were quite inconsistent in their reasoning with the variables. Both the low mean reasoning score and the low reliability of the reasoning score suggest the need to continue examining the difficulties students experience in reasoning about anomalous data. For example, the limited examination of the mode-of-effect of variables suggests at least some level of disconnect between learning about variables with repeated experimental designs in the VRL and applying this knowledge to the analysis of anomalous data from an HOL. Therefore, instructional design additions that draw students’ attention to the implicit and explicit similarities between this particular VRL and the real-world HOL processes could be fruitful in the future. Such explanation prompts were successful in prior studies (Sandoval and Millwood 2005; Sandoval and Reiser 2004) and a recent literature review published in this journal also points to other effective guidance methods, albeit with attention to some of the shortcomings of each method (Zacharia et al. 2015).

Students’ struggles in using the mode-of-effect of variables may have also contributed to their simple reference to human error or instrument malfunction, instead of focusing on the specific variables that contributed to the outcome. Therefore, in order to support students’ investigations in environments that blend VRLs and HOLs, the description and recognition of error types as related to working with anomalous data need further investigation. However, the topic alone requires further study, since the definition of error varies widely by context and discipline (Allchin 2001). In the context of DNA gel electrophoresis, for example, the results have personal consequences for human subjects. Therefore, a keen attention to overcoming experimental errors is not unusual and is even commendable. Nevertheless, the results are aligned with prior data on students’ use of alternative/general causal factors when they cannot offer specific reasons for why the outcome may deviate from expected results (Chinn and Brewer 2001). These results suggest the need for a more detailed examination of combining VRLs and HOLs in different contexts with respect to the sources of anomalous data.

Practical implications

The results provide evidence for practitioners and instructional designers on the benefits of learning with VRLs by testing the effects of different variables with repeated experimental trials. However, there continues to be a need for instructional design decisions that minimize the disadvantage of VRLs that do not yield anomalous data. While working with such VRLs, students may require assistance in making the necessary conceptual leap between the simplified results of a VRL and the analysis of real-world data. For example, future instructional interventions may use explanation prompts, found to be beneficial by prior research (Sandoval and Millwood 2005; Sandoval and Reiser 2004). Alternatively, practitioners can employ discussion and debate to help students internalize these critical processes of scientific and engineering practice (Duschl 2008; Toth et al. 2002) while helping to reduce cognitive complexity (Sweller et al. 1998; Schnotz and Rasch 2005). In this study, there were no starter prompts in order to document the “nascent” thinking processes of these higher education students.

Limitations and further research directions

The limitations of this work are largely artifacts of educational research in classroom settings. Research in formal classrooms is limited in opportunities for the development of randomized trials, the use of control groups, the time available for assessment, and sometimes even in the number of participants available for a study. Randomized trials are especially challenging in university settings, where students often self-select for entry to courses. Furthermore, university instructors are concerned about ethical issues associated with teaching different course sections with different pedagogies. Instructional time is at a premium in these classes and the number of students is usually low. These limitations may have contributed to the limited generalizability of the findings and to the low reliability of the reasoning score in study one. Furthermore, in higher education settings, students may learn concepts by responding to the questions on the pre-test. For these reasons, the results of this and other quasi-experimental studies are best used to (a) complement existing studies and (b) provide suggestions for further, controlled exploration (Shadish et al. 2002).

Despite the developmental nature of the work, the results draw practitioners’ attention to critical components of students’ difficulties in working with anomalous data. Future studies with larger participant numbers may explore the specific learning patterns of students with different prior experiences (e.g., graduate or undergraduate students). Future qualitative studies may employ “think-aloud” approaches and document the details of students’ interpretations of anomalous images. Continued studies on students’ work with anomalous data in complex, socially significant settings are also needed to examine the cognitive, behavioral, and attitudinal factors of preparing students to handle anomalous data. These studies may use anomalous data as a motivator before VRL work. Alternatively, they could focus on how students evaluate their own anomalous results from an HOL and examine whether students learn to recognize sources of error when they apply the procedures they learned with the VRL during hands-on work. Much-needed research should assist practitioners in helping students distinguish between anomalous data that are random aberrations in the results of repeated experiments and anomalous data that are due to experimental error, such as human action in the design, setup, and execution of experiments, instrument malfunction, or random change in the investigation environment. The current literature indicates that the definition of error is context and discipline dependent (Allchin 2001) and several conceptualizations of anomalous data continue to exist today, including outliers, aberrations, error, or noise (Chandola et al. 2009; Gong et al. 2012; Han et al. 2006). With further clarification of key constructs and research evidence on students’ learning, scholars may contribute to the development of evidence-based pedagogical approaches and consequently to reducing the frequency of misconduct and malpractice due to incorrect data handling (Dorner 1996; Mason 2001; Spector and Davidsen 2000).