Introduction

To help learners understand and apply the meaning of water quality concepts, we have been building different curricula and accompanying theory about how to ground or “situate” content (Barab et al. 2007a, b). Drawing on the power of perceptually immersive 3D worlds and game-design methodologies (Salen and Zimmerman 2004), our goal is to situate both the science content and the learner within a rich interactive context in which scientific concepts have value as tools to understand and transform the environment. One way to accomplish this goal is through transformational play, which involves: (a) projection into the role of a character who, (b) engaged in a partly fictional problem context, (c) must apply conceptual understandings to make sense of and, ultimately, transform the context (Barab et al. 2009). Additionally, transformational play (d) should include opportunities to examine one’s participation in terms of the impact it has on the immersive context. Transformational play involves more than seeing a concept or even a context-of-use; it involves being in the context and recognizing the value of concepts as tools in terms of the context in which one is engaged.

As a pedagogical tool, transformational play goes beyond perceptual immersion and does not require physical immersion, but instead is tied to situational or projective immersion or what others have referred to as presence (Dede 2009; Sheridan 1999). It is about being within a situation and, from a learning perspective, it is about learning concepts in relation to contexts-of-use. While such a sense of ‘being there’ can be elicited by a good book, there has been much research over the last 15 years revealing the power of 3D immersive worlds (IWs) for establishing a sense of virtual presence (Lessiter et al. 2001; Lombard and Ditton 1997; Sheridan 1999). According to Sheridan (1999), virtual presence is a theoretical concept intended to describe the phenomenon wherein an individual “feels herself to be present at a location which is synthetic …”; that is, created only by a computer. Dede (2009) offers a slightly more expansive view that does not restrict sense of presence to the synthetic world, and articulates how it can be elicited through the design of immersive learning experiences that draw on sensory, actional, or symbolic factors. Whereas sensory immersion is familiar to most, actional immersion involves providing the participant the opportunity to initiate actions that have game-world consequences, and symbolic immersion involves triggering powerful semantic, psychological, or cultural associations through the contextual frame in which one is functioning.

Consistent with this perspective, more than simply establishing a sense of perceptual presence, we are interested in leveraging game-based methodologies to build opportunities for transformational play in which the learner is a first-person protagonist investigating scientific problems, enlisting conceptual understandings to make sense of various data, and making decisions that impact the game world (e.g., kicking out a virtual logging company results in the game-based park going bankrupt). Beyond problem framing or even a complex word problem in which one simply speculates about someone else’s situation from a distance and is evaluated in terms of the projective consequences of their solution, in transformational play the learner is the protagonist who experientially enters into a world where actions impact the unfolding dynamics of the situation (Barab et al. 2008).

While simulations have proven quite useful in supporting science learning, videogames and their ability to establish consequential roles within narratively rich virtual worlds provide a new medium for supporting meaningful learning and advancing science education (see Barab and Dede 2007). In this study, we are interested in understanding how we can use a game-based virtual world to situate science content. In our case, the learner enlists his or her evolving understanding about chemical indicators of water quality (e.g., turbidity, nitrate levels, amount of dissolved oxygen) and scientific processes (understanding of erosion, eutrophication, algae blooms, etc.) to interrogate a fictional situation and test various solutions (Barab et al. 2007a, b).

We designed and tested three instructional conditions, each designed to teach the same underlying science content but differing in the degree to which the experience drew on game-based methodologies and technologies. Student pairs (dyads) were randomly assigned to each of the three conditions, with a fourth group of single users also being randomly assigned to the IW condition. The specific research questions being investigated here are as follows:

  1. Are there significant differences among undergraduate dyads assigned to the electronic textbook (ET), simplistic framing (SF), or IW conditions, and to an IW single-user (IW-SU) condition?
  2. Does an IW-Dyad condition significantly increase science learning over an IW-SU condition with respect to (proximal and distal) test items and performance-based learning gains?
  3. What are the qualitative differences in how participants in the three conditions engaged the intervention?

Additionally, we were interested in the relations among scores on standardized items and performance on a transfer task. This was guided in part by our interest in understanding whether standardized items necessarily capture the depth of learning that occurred in this curriculum. Below, we first overview the relevant literature that led us to this research question and helped guide the development of the particular conditions being investigated, and then we discuss the pilot study that preceded this work. From there, we then describe the study, closing with a discussion of the results and the implications for further research.

Background

Science Immersion Through Persistent Virtual Worlds

One of the most exciting developments in interactive electronic entertainment has been the popularization of persistent virtual worlds (Castronova 2001; Squire 2006). These are persistent social and material worlds, universes with their own culture and discourses (Squire and Steinkuehler 2004). In these worlds, people engage in rich discursive practices, form meaningful relations, take part in collaborative problem-solving, and craft situated identities (Steinkuehler 2006). Koster (2000) argues that persistent virtual worlds are defined by: (a) a spatial representation of the virtual world; (b) avatar representation within the space; and (c) a “sandbox” in which to play, offering persistence for some amount of the data represented within the virtual world. These persistent virtual worlds provide a meta-context through which participant behaviors are given meaning. Consistent with the above discussion of presence, participation in these virtual spaces involves being in these spaces—perceptually, symbolically, and in terms of the transformational impact of one’s actions. And, as a perceptually present participant in the game world, the learner is investigating problematic storylines, identifying solutions, implementing action plans, and examining the impact of these solutions.

In terms of simulation experiences, the designers can establish a persistent world that immerses the user into a simulated habitat where they, for example, research the quality of a virtual river. While not being the real thing, the virtual world has the advantage of having readily manipulable chemical levels and other complex dynamics such that it has rich learning potential (Clarke and Dede 2009). While in books and movies a sense of presence might be inspired more by narrative than perceptual or interactive cues, in the persistent worlds that we design the learner becomes a protagonist who has agency and consequence with respect to the progression of the storyline. Importantly, the player assumes a role within the fictional context through his or her online persona, personified by an avatar through which the player interacts with objects in the online reality. Gee (2003) has discussed the avatar as one’s projective identity, through which one can develop empathetic embodiment with the complex system. One’s avatar is part virtual character and part real player, what Gee refers to as the “real-virtual” being. The actions of one’s real-virtual being change the virtual world and, thereby, function as a tool that allows the player to develop an empathetic embodiment for the system dynamics that govern participation in the virtual world.

More than a virtual world, we view our designed spaces as game worlds that support transformational play and involve roles, missions, rules, interaction, fantasy play, and trajectories with end states. Squire and Jan (2007) in this first installment of this series highlighted a number of important features of games for science education. First, games allow students to inhabit roles that are a melding of player identity and the game role of that player, allowing students to move beyond their role as students and actually become an environmental scientist. Secondly, similar to school, games provide challenges. However, the challenges available in games are problem-based and contextually meaningful, requiring students to learn content in relation to a player-adopted and narratively rich set of goals. In this way, games provide the learner a sense of intentionality and consequentiality. Thirdly, designed game worlds provide contested spaces in which there is a spatially bound problem that changes over time based on player decisions as they move around the space—both serving as a source of motivation but also as providing an important perceptual grounding for learning. Lastly, games allow for the just-in-time embedding of authentic resources and tools, whose meanings are in relation to an adopted task and not because they are told they are meaningful by a textbook or teacher.

Well-designed game play immerses the player in a rich network of fictional interactions and unfolding storylines where he or she must learn about the underlying game grammar to solve the game-world problems. Importantly, games also allow for play, in which one can take risks, experiment, and engage in actions that one most likely would not have a chance to undertake in the real world. When one combines persistent worlds with videogame methodologies, they have the potential to support what we described above as transformational play. A well-designed IW for learning allows the player to gain an appreciation for how conceptually informed actions change the virtual world in relation to his or her adopted goals and, through this coupling of person, content, and context, has the potential to support grounded understandings of underlying concepts. Science education is a particularly fertile discipline that will benefit from such immersive and contextualized treatment in that many of the phenomena of interest are difficult to engage learners in meaningfully. For example, while one might explain the concept of eutrophication with a diagram, it tends to remain a static object, which is very different from engaging children in the making of such a diagram. However, immersing learners with agency in a real eutrophication context is quite difficult if not impractical. Even if one could somehow embed learners within contexts in which the variables of interest are taking place, they would not necessarily gain rich insights into the relationship of the underlying science to its real-world applications.

Recently, we have seen a number of examples of the power of game-based virtual worlds to support science education. For example, Squire and Jan (2007) and Rosenbaum et al. (2007) explored the power of augmented reality for supporting students learning about water quality and infectious diseases. Both groups found learning gains, and also offered rich examples of where the game supported a sense of player immersion within the narrative. Neulight et al. (2007) studied students’ participation in a virtual epidemic within a multi-user virtual environment. Leveraging a popular virtual environment inhabited by children, these investigators injected a virtual disease that affected student-created avatars. Results from their analyses showed that students perceived game play as similar to a natural infectious disease, and that game play impacted students’ conceptual understanding of the causality of natural infectious diseases. Barab et al. (2007b) showed that 5th graders using a multi-user virtual environment showed statistically significant learning gains on standardized test items, and were able to transfer these understandings to other contexts. Nelson (2007), Ketelhut (2007), and Dede (2009) discuss the power of their game-based curriculum for supporting science learning while simultaneously illuminating the challenges of scaling such contexts. What this previous research has not provided are many experimental studies in which the potential of game-based, multi-user virtual environments for science education is formally tested and compared to other pedagogical approaches.

Pilot Study

At the core of the study discussed here is a game-based, multi-user virtual world referred to as Taiga Virtual Park. Taiga is a game-based IW in which players log in to a three-dimensional (3D) virtual environment to solve a water quality problem (Barab et al. 2007a, b). In previous work, we investigated whether grounding the curriculum using narrative and perceptual scaffolds would significantly impact learning. This was investigated by designing three instructional conditions: each was designed to teach the same underlying science content, yet they differed in the degree to which the content was situated. For the expository text condition (ET), there was very little contextual framing, with the information to be learned being presented as multiple textbook descriptions that only loosely framed the content in terms of broader applications of use. For example, when presenting the concept of erosion, the text that participants read on the computer screen referred to a generic river accompanied by an illustration to contextualize the concept. Simply put, in the expository text condition the information was presented in a manner similar to most school textbooks.

Increasing the amount of content contextualization, the simplistic framing (SF) condition involved one rich storyline in which all the content was situated, what the Cognition and Technology Group at Vanderbilt (1993) referred to as a macrocontext. While learning was narratively connected in this condition in terms of an overarching third-person storyline or problem-based learning context, students read mostly descriptive text about the storyline and their choices had no impact on the unfolding of the story. This condition, while hypertext in design, was similar to a book, and was not intended to establish any sense of virtual presence but simply to position the work in a manner similar to word problems. In the perceptually rich IW condition, we designed a virtual environment where students had to navigate an avatar in a virtual park to interview non-player characters and collect water quality data (see Fig. 1).

Fig. 1. Screenshot from Taiga, including an image taken from the 3D immersive world with a student avatar walking, a capture of the image students find in the 3D space and interrogate, as well as an example dialogue scene that unfolds when clicking on non-player characters

This condition had many game features described by Squire and Jan (2007), including role-playing with players adopting a character whose identity evolves over time, a fictional space with a problem that has competing stakeholders, game-based missions that establish player intentionality for being there, fantastical elements in that players could take actions and experience possibilities not present in the real world, interactive rules that players come to understand through game play, and a win-condition that involved identifying the factors killing the fish and posing a realistic solution. We hypothesized that the immersive condition would establish a sense of presence and support transformational play, with the goal of increasing content learning. To test this hypothesis, undergraduate psychology students were recruited and randomly assigned to one of these three conditions and their performance on standardized test items was assessed.

Students in the IW condition (X = .818, SD = .212) performed significantly better on proximal items than students in the simplistic framing (X = .657, SD = .145) or expository text conditions (X = .667, SD = .236) (F(1, 67) = 4.894, p < .05, η² = .131). There were no significant differences between the simplistic framing and expository text conditions. The fact that students in the IW condition did significantly better was surprising in that the expository text students read content that directly stated the academic content that was on the test items, whereas the individuals in the other conditions had to infer the underlying concepts and practices from their experience. In fact, the IW group had a more contextually ‘noisy’ experience that did not directly explicate the underlying meanings, yet they performed better on the items that were closely related to the content they studied. However, we did not find these same differences for what Hickey and Zuiker (2003) referred to as more distal level items, that is, items testing the underlying concepts but not in relation to a water quality context (F = .685, p = .51, η² = .021, a small effect size). This suggests that the context designed to support a sense of transformational play did impact students’ understanding of items that were close to the content in which they learned, but it did not impact distal understandings on the items used in this study. We considered this to be a shortcoming of our design, and one that we addressed in the study reported in this manuscript.
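For readers less familiar with the effect size reported above, the following is a minimal sketch of how η² relates to the F statistic in a one-way ANOVA: since η² is the between-groups sum of squares divided by the total, it can be recovered as F·df_between / (F·df_between + df_within). The function names are hypothetical, the example values are illustrative rather than taken from this study, and the labels follow Cohen's conventional benchmarks (.01, .06, .14), which are rules of thumb rather than strict cutoffs.

```python
def eta_squared_from_f(f_stat: float, df_between: int, df_within: int) -> float:
    """Recover eta squared for a one-way ANOVA from F and its degrees of freedom.

    Uses eta^2 = SS_between / (SS_between + SS_within), which simplifies to
    F * df_between / (F * df_between + df_within).
    """
    if f_stat < 0 or df_between <= 0 or df_within <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    explained = f_stat * df_between
    return explained / (explained + df_within)


def effect_size_label(eta_sq: float) -> str:
    """Label an eta squared value using Cohen's conventional benchmarks."""
    if eta_sq < 0.01:
        return "negligible"
    if eta_sq < 0.06:
        return "small"
    if eta_sq < 0.14:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Illustrative values only (not the study's data): F(2, 60) = 4.0
    example = eta_squared_from_f(4.0, df_between=2, df_within=60)
    print(round(example, 3), effect_size_label(example))
```

Under these benchmarks, a value such as .021 falls in the small range, which is consistent with the description given for the distal items.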

Design Changes

First, in the study reported in this manuscript, we decided to use dyads instead of single users. According to Schwartz (1995), the collaborative interrogation that happens between the dyads should evoke deeper understanding of the abstract concepts (see also Wiley and Jensen 2006). To capture this potential, we also added a performance-based, open-ended transfer task to assess this deeper level of comprehension that the dyads are hypothesized to promote. In part, we were interested in whether the standardized items were capturing all the variance in learning—especially the depth of understanding that might have occurred for students in the interactive IW condition. In addition to its potential to attune learners to the non-contextually specific (invariant) science concepts, the second use of the dyads was to provide data that could be qualitatively examined through video and audio recordings. It was our expectation that the interaction within the dyad could illuminate possible explanations for why the experientially immersive condition performed better than the other two conditions, and what aspects of the communication add to the deeper level of understanding of the abstract concepts.

Additionally, we modified the game-based IW condition to include elements that we speculated would help students experience transformational play while at the same time developing more generalizable understandings. In particular, we added more interactive rule sets, more embedded pedagogical supports, and more liminal tasks, that is, tasks that required students to interact with the core concepts in relation to other contexts (Tempest and Starkey 2004; Zuiker et al. 2007). In terms of the first two changes, for example, we included a pedagogical agent so that students, instead of simply receiving the data results when they brought their collected water samples to the laboratory, had to work with the lab technician to conduct various analyses with his support. Based on the quality of their interactions, they would receive different forms of acknowledgment and even different information. This was also designed to establish a sense of transformational play in which student actions would be consequential in that they would impact the unfolding of the situation. We also included new roles in which students had to serve as advisors, even making recommendations on activities in which they had to determine best solutions when presented as more abstracted narratives.

These latter episodes were designed to provide a sense of liminality, a space that Garsten (1999) and Turner (1982) position as ‘betwixt and between’ or ‘neither here nor there.’ While Turner’s (1982) initial anthropological work focused on rites of passage and significant transitions, in our case the interest is in helping the player conceptually move between multiple contexts in which a core concept or understanding has relevance (Zuiker et al. 2007). For example, while a particular watershed being investigated might be suffering from erosion, the concept of erosion more generally has relevance to multiple contexts. So, the design challenge for us was to position the learning opportunities such that, in addition to the core context having an erosion problem, there would be opportunities for the player to apply their evolving understanding of erosion to other, relevant contexts such that s/he could come to appreciate its cross-contextual relevance. Our supposition is that, through this process, the learner becomes attuned to both the variant and invariant parts of the learning environment (see Barab et al. 2007b, for a more in-depth discussion of these elements and the research that prompted these changes). The idea was to provide multiple representations of the core concepts such that students would experience them in different contexts with different levels of grounding. Through these latter additions, we hoped that students would be more likely to develop a contextualized understanding of the underlying content and at the same time be able to transfer this understanding when the underlying domain content (e.g., water quality concepts such as dissolved oxygen, eutrophication, and more general themes such as using evidence to support claims) was relevant to other contexts.

Methods

In the follow-up study reported here, we were again interested in the role of situating disciplinary content in a transformational play space, but one that was designed to support a sense of engaged consequentiality, and in which we would have participants work in dyads. Specifically, the research questions being tested were:

  1. Are there significant differences among undergraduate dyads assigned to the electronic textbook (ET), simplistic framing (SF), or IW conditions, and to an IW single-user (IW-SU) condition?
  2. Does an IW-Dyad condition significantly increase science learning over an IW-SU condition with respect to (proximal and distal) test items and performance-based learning gains?
  3. What are the qualitative differences in how participants in the three conditions engaged the intervention?

Additionally, we were interested in the relations among scores on standardized items and performance on the transfer task. This was in part because of our interest in understanding whether standardized items necessarily capture the depth of learning occurring when one used such a contextually rich curriculum.

As in the pilot work, volunteer undergraduate students were randomly assigned to one of the conditions, with the three dyadic conditions being video and audio recorded. In addition to assigning the dyads to three conditions as in the pilot work (expository text, simplistic framing, or IW), we again included the IW single-user condition to determine whether the dyadic condition significantly improved learning gains even in the condition that has already shown the best performance in previous studies. Also, in addition to quantitative scores, we were interested in examining the performance and student debriefings captured from the video and audio recordings.

Participants

Fifty-one undergraduate participants were sampled from a large Midwestern university. Of the total, 20 (39.2%) were male and 31 (60.8%) were female. Each participant was randomly assigned to one of the four experimental conditions. If the participant happened to be placed in a dyad condition, the pairs were randomly selected from the group of attendants. Some participants in the dyad conditions knew each other previously and that information was taken into account when observing their interactions. The participants either received extra credit for a class or a cash payment of $15. Participants in all conditions were given 90 min to complete the experiment.
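The assignment procedure described above can be sketched in code. This is only an illustration under stated assumptions: the text does not specify how group sizes were balanced or how odd-numbered groups were handled, so the round-robin dealing and the rule that moves a leftover participant into the single-user condition are hypothetical choices, as are the function name `assign_conditions` and the condition labels.

```python
import random

# Hypothetical labels for the three dyadic conditions and the single-user condition.
DYAD_CONDITIONS = ["ET", "SF", "IW-Dyad"]
SINGLE_CONDITION = "IW-SU"


def assign_conditions(participant_ids, seed=None):
    """Randomly assign participants to four conditions and pair up the dyads.

    Returns a dict mapping each condition to a list of tuples: 2-tuples for
    the dyadic conditions and 1-tuples for the single-user condition.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)

    # Deal the shuffled participants round-robin across the four conditions.
    conditions = DYAD_CONDITIONS + [SINGLE_CONDITION]
    groups = {name: [] for name in conditions}
    for index, pid in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(pid)

    assignment = {}
    for name in DYAD_CONDITIONS:
        members = groups[name]
        # Assumption: an unpaired participant is moved to the single-user condition.
        if len(members) % 2 == 1:
            groups[SINGLE_CONDITION].append(members.pop())
        # Adjacent participants in the shuffled order form a random dyad.
        assignment[name] = [
            (members[i], members[i + 1]) for i in range(0, len(members), 2)
        ]
    assignment[SINGLE_CONDITION] = [(pid,) for pid in groups[SINGLE_CONDITION]]
    return assignment


if __name__ == "__main__":
    result = assign_conditions(range(1, 52), seed=42)
    for condition, units in result.items():
        print(condition, len(units), "units")
```

The group sizes this sketch produces are a consequence of its own balancing rule and should not be read as the sizes used in the study.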

Design of the Experimental Conditions

All three versions focused on four important science education standards: (a) evaluate the validity of claims based on the amount and quality of evidence cited; (b) explain how the solution to one problem, such as the use of pesticides in agriculture or the use of dumps for waste disposal, may create other problems; (c) demonstrate how geometric figures, number sequences, graphs, diagrams, sketches, number lines, maps, and stories can be used to represent objects, events, and processes in the real world; and (d) recognize and describe at even a simple level how systems contain objects as well as processes that interact with each other. The three conditions were again designed to differ in terms of the extent to which they were likely to foster a sense of transformational play.

In particular, the focus was on: (a) learning concepts including erosion, eutrophication, water quality, and system dynamics; (b) building skills including graph (de)construction, hypothesis generation, water quality analysis, socio-scientific reasoning, and scientific inquiry; and (c) developing a richer commitment to environmental awareness. Central to these understandings is an appreciation for the nature of complex systems and how real-world problems have causes and solutions that involve non-linear dynamics and multi-causal interactions, and whose properties-as-a-whole do not derive from the simple combination of constituent parts.

Expository Text Condition

The expository text condition (ET) involved presenting the information as an electronic textbook. More specifically, it was a website broken down into four separate instructional water quality-based activities that corresponded with the same state standards as the other two experimental conditions. In total, there were 38 pages of text, each followed by a 4-part written assessment and three reflection questions. The pages were navigable by a “previous” or “next” link located at the bottom left and right corners, respectively (see Fig. 2). Some concepts on each page were bolded to illuminate their importance to the overall water quality problem.

Fig. 2. Screen shot of the sequential text condition illustrating the information and image given to the participants, as well as the clickable “previous” and “next” links

After each section, the participants were given the opportunity to review what they had read before answering the test questions, and after each test, they had the chance to “try again” and change their answers. The participants had to look at each page sequentially. For example, they did not have the chance to navigate from page three to page five without passing through page four. There were three final reports that were equivalent in scale and content to those reports submitted for the other two conditions. Further, while presented in a more direct-instruction fashion with little framing in terms of a particular context, the underlying science content was compatible with the other two conditions. A content expert and a science teacher also reviewed the presented information to ensure its direct relevance to the underlying concepts and the test items.

Simplistic Framing Condition

The simplistic framing (SF) condition contained the same content information as the 3D environment, but the information was written as 3rd person, as opposed to 1st person, text. The participants are called upon to determine the cause of declines in fish numbers in a park that houses several different groups. Each group has a unique explanation and interest in the fish decline, and the participant must appreciate the multiple perspectives, use scientific data, and synthesize their results to determine the cause and what could be done to prevent this type of problem in the future. Both groups were provided a map of the Taiga Park in which they could see where the different stakeholders were located and the layout of the watershed. In the SF condition, the content was still situated as part of the park problem, but the groups within the park were labeled as Fishing Company, Logging Company, etc. with no mention of particular characters within each group. Rather than involving first-person interaction, participants simply read about each group.

The website contained a 2D map representation of the park located on the left side of the screen (see Fig. 3). The 2D map was the same as the map used in the IW condition, except the 2D map did not show where particular characters were located. On the right side of the screen, there were links to “Report One,” “Report Two,” “Report Three,” and park information (“Indigenous proposal”, “Journal entries”, and “Pamphlets”). To read about each group, as was part of their overall assignment, they simply had to click on the map where that group was located, and there would be a pop-up page that gave them information. In this way, the information was non-sequential, as the participants could read about any group in any order they chose. Each “Report” required the participants to gather some information and answer particular questions in an essay format. After this, they had to answer the three metacognitive reflection questions.

Fig. 3. Screen shot from non-sequential map condition (2D) depicting the clickable map and the six report and informational links

Immersive World Condition

The Taiga Park is a world within the larger Quest Atlantis context, a multi-user virtual environment aimed toward game play and education. The IW condition was presented as a computer-based, simulated aquatic habitat. The participants explored this environment with the use of an avatar that they controlled using the arrow keys on the keyboard (see Fig. 1). They interacted with several different characters that fall within six different groups (park visitors, logging company, park administration, etc.). For example, the participants talked with Ranger Bartle about the fish decline problem or with Lisa who works for the logging company. The participants visited each character twice during their experience and engaged them in a dialogue, which produced information related to the problem-based scenario and the four state standards. During the first visit, the participants discovered initial opinions of non-player characters; after participants had talked to everyone and collected water samples, the characters volunteered new information that was more scientifically grounded than on the first visit.

The information was presented in a first-person narrative, and the participants typically had three optional responses to the character in order to “personalize” their exchanges. They also were required to bring water samples they collected to a virtual laboratory for analysis. Participants had to take quizzes throughout their experience and complete three “Quests” that are essentially reports on what they observed from their exploration of the environment. After each submission, they had to answer three metacognitive reflection questions (same in every condition), and then they interrogated the implications of their choices on the 3D world.

Dependent Measures

The outcome measure was a post-test consisting of 16 multiple choice questions, 5 short answer questions and a performance-based transfer question. We employed a “multi-level” assessment strategy to gauge the interventions’ effectiveness (cf. Hickey et al. 2006; Ruiz-Primo et al. 2002). All levels were aligned with the science concepts and target standards, and all engaged students in socio-scientific reasoning. The assessment framework involved analyzing performance first in terms of intentionally selected standardized multiple-choice items that leveraged water quality problems similar to those used in the curriculum (proximal level), and second in terms of standardized items that would be considered “far transfer” (distal level). Additionally, to gauge depth of understanding, students responded to an open-ended, performance-based transfer task that was specifically designed for this study.

While the proximal standardized items targeted the conceptual resources aligned to the curriculum and the standards, the distal items targeted items aligned to the standards only, with explicit disregard for the curriculum. For example, an item might address the above four standards in terms of the ozone layer (see “Appendix 1” for more examples). Item pools consisting of 10 or more items were developed for each of the four target standards, and two items were randomly selected for inclusion. As such, the distal measure provided a valid comparison against other curricula and, more fundamentally, a valid proxy for high-stakes tests, serving a broad research goal of evaluating whether curricular enactments support transfer to externally developed, high-stakes achievement tests. The distal measure, by design, assessed some concepts and many facts that the intervention did not target. Further, the distal items that, by chance, assessed targeted content did so across a broader range of difficulty and, therefore, were not necessarily tuned to the kinds or degrees of competence supported by the experimental experience. Thus, the distal-level items comprised a “far-transfer” measure of learning and a challenging one for a specific, short-term intervention to impact.

Cronbach’s Alpha internal consistency estimates on this sample for both the proximal and distal sets of items were .7. The performance-based transfer task introduced a scenario of a river that flows past a farm, small communities, a wildlife preserve, and a city into a bay (see “Appendix 2”). Also, the participants were given information that there have been fewer birds in the wildlife preserve. The participants were asked to determine what was causing the decline in birds and why, with one multiple-choice and two short-answer components. The short-answer responses were evaluated using a scoring rubric assessing the enlistment of relevant water quality concepts. One assessment expert and a water quality teacher examined the task for content validity. Initially two raters evaluated the open-ended responses, but once they had 100% agreement on a number of items, one rater examined the rest.
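For readers who wish to compute a comparable internal consistency estimate on their own item-level data, the following is a minimal sketch of the conventional Cronbach's alpha formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the small score matrix are hypothetical illustrations; they are not the data or the software used in this study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of rows, one row per respondent, one column per item.
    Uses sample variances (n - 1 denominator) throughout.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in scores]) for i in range(n_items)
    ]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four respondents, three items scored 0 (wrong) or 1 (right).
example_scores = [
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
print(round(cronbach_alpha(example_scores), 2))  # prints 0.75
```

With real data, each row would hold one participant's scores on the proximal (or distal) item set, and the two sets would be analyzed separately.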

A possible limitation of the standardized assessments is that the standards and items were typically used for 6th grade curriculum and our sample consisted of undergraduate college students. However, we did not find a ceiling effect on the assessments or even on any particular items on the tests. That is, the undergraduates appeared to perform at approximately the same level as 6th grade students who have taken these assessments during other implementations. This suggests that students do not necessarily gain a deeper understanding of these topics in middle and high school, and thus it appears that the curriculum is still appropriate for this group. This was especially relevant given the ease of using random assignment with this population, a methodological process which involves juggling multiple confounds in public schools. Another limitation in relation to the performance-based transfer task is that the students in the SF and IW conditions had experience with a more in-depth problem task, possibly favoring a style of question that benefitted students in these conditions.

Qualitative Methods

All the sessions and post-session interviews were audio- and videotaped for all three conditions (ET, 2D, and 3D-dyad). First, all the conversation data were transcribed. Then a coding scheme was developed to analyze both the session data, in which participants experienced one of the three online conditions, and the post-session interview data. The coding scheme was designed to measure four dimensions in the conversation data (defined below). After coding and analyzing the entire data set, the authors more deeply examined the session data for visual references (e.g., body language) and the post-session interview data to investigate emergent themes.

Operational Definitions

We analyzed the conversations between participants during the session and the interview data. Our analysis focused on four aspects of their conversations: collaborative sense making, personalization, use of terminology, and instantiation. Collaborative sense making was defined as the conversation between the two participants while working together to problem-solve and to negotiate around shared conceptual understanding. We coded an instance of collaborative sense making when one of the participants started the conversation by posing a question, suggesting a strategy, and/or checking for agreement with their partner on a topic and/or activity, and the exchange continued until the dyad resolved the issue after each person had taken at least one turn in the conversation. Thus, we measured collaborative sense making by the number of instances of sense making around a particular topic related to water quality or the learning environment.

Excerpt 1:

(Dyad 15, IW Session, 43:45)

As part of the game, dyads were given a chart on annual sales of Mulu Village, one of the three contributors to the fish decline problem in Taiga, by a non-player character (NPC):

Student 1: (reading out loud what the NPC says) Which activity seems to create the greatest sales over the last 3 years? (opens the chart) Ok, sales…I would have to say… the art works.

Student 2: We probably need to take into account how much…like they cost….so…here (points to the chart) they’ve been 35, 5 (continues to count)

S1: (looks at the chart) So, it’s decreasing (referring to the sales) (S2 continues to count the numbers) So, that’s [the last number S2 counted] the most so far.

S2: Yeah (continues to count)

S1: (seems confused) But, it’s not sales. Shall we just go to the sales column? So…art works, it is.

S2: I guess so….artworks.

S1: Cool (moves on to the next question)

This coding of collaborative sense making was used for analyzing session data only, since the interviews were conducted individually.

Personalization was defined as the number of times students connected a concept and/or an idea that they were working on in the learning environment to their personal experiences and/or prior knowledge. For example, a student telling her partner what she had previously learned about photosynthesis in her undergraduate biology class would count as an instance of personalization. Instantiation was defined as making connections between concepts and/or ideas related to the content in the learning context that they were working on. Instantiation takes the form of elaborating on new information, explaining concepts, and comparing and contrasting ideas. Instantiation was measured by the number of times a student used a concept in a contextualized manner.

Excerpt 2:

(Dyad 30, SF Session, 31:46)

Students were provided with information on the chemical changes in the water over the years:

S1: So the only thing that really changed is temperature…a lot. Ph kept becoming better… like in the area…it was good, than it was very good. The DO [dissolved oxygen] didn’t change; so only small fish can live. Turbidity is neutral.

We also coded the number of times each student used terminology related to the content, such as turbidity, phosphorus, nitrite, erosion, silt, pH, nutrients, etc. during their conversation. Codings were not mutually exclusive, such that some instances were identified as having aspects of more than one category, for example, both instantiation and terminology. Note that we also intended to code for immersion or presence; however, this became quite difficult in that many of the important indicators of presence were gestures or inferences drawn from statements that were quite vague. For example, participants breathed hard as they ran up the side of an embankment or talked about being “in the water.” Therefore, evidence of presence was better gleaned from the interview data.

Coding

There were two rounds of coding of the data. First, two of the authors coded 20% of the session and interview data for reliability. The reliability was assessed by correlating the codes of the two authors; the Pearson correlation ranged from .87 to .98. Due to this high inter-rater reliability, one author continued to code all remaining data. Second, after coding and analyzing the sessions and interviews, we went back to the data to try to understand the differences through specific examples. One of the authors examined a subset of session videos to look at various items of interest, such as body language, off-topic conversation, and navigational issues, among others. The results from both rounds of session coding are combined for a richer understanding of the experiences of the participants.
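As an illustration of the reliability check described above, the sketch below computes a Pearson correlation between two raters' counts for the same coding category across sessions. The function name `pearson_r` and the counts are hypothetical; they are not the study's codes, and the original analysis software is not specified in the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical counts of one code (e.g., instantiation) in five sessions,
# as tallied independently by two raters.
rater_a = [12, 20, 8, 15, 10]
rater_b = [11, 22, 9, 14, 10]
print(round(pearson_r(rater_a, rater_b), 2))  # prints 0.97
```

A correlation indexes how consistently the two raters rank sessions rather than whether they assign identical counts, so it is usually read alongside a check of the raw disagreements.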

The interviews took place after the sessions. The interviews were semi-structured and typically lasted between 3 and 5 min. Six prompts/questions were presented to the participants in the post-session interviews: (1) describe the experience to me; (2) what did you like about the activity?; (3) what did you not like about the activity?; (4) did you learn anything new?; (5) did you find it engaging?; (6) anything else you would like to tell me? In order to understand the emerging themes in the interview data, four of the authors listened to the audio data together twice, each time followed by a group discussion. During this collaboration, the authors agreed that four core themes emerged from the interviews: (1) students’ concern about the reading load, (2) complexity of the learning environment, (3) level of students’ engagement, and (4) level of authenticity of the activity.

Procedure

The dyadic and single conditions were run in separate rooms that contained a desk, two chairs, a PC computer, an audio recorder, and a video recorder. Participants were given instructions particular to their experimental condition. For the IW condition, participants were instructed how to navigate in the virtual world through the use of the arrow keys on the keyboard and how to “communicate” with the characters within the space by clicking on them with the mouse. The participants were shown where to locate their “quests” and how to submit their responses. The IW participants also received a packet that included: (1) a letter from Ranger Bartle, a fictional character within the virtual world, that gave a background for the fish decline problem in Taiga, and (2) a “field notebook” that identified each character and allowed space for taking notes for each quest.

The SF participants received a similar packet; however, the “letter” was changed to expository text and did not mention Ranger Bartle or any individual characters in the environment. In addition, the research assistant explained how to submit their responses to the three tasks. The Electronic textbook group did not receive a framing letter for their experience but did receive extra paper for taking notes. The research assistant explained how to navigate the website and that there would be short assessments throughout their experience. The participants signed a waiver stating that their experience would be video and audio recorded for later analysis. Participants in all conditions were given 90 min to complete their learning task and 30 min to complete their assessment materials independently. After this time, the research assistant conducted the short, semi-structured debrief and interview. While some students finished earlier, there were no significant differences among conditions in terms of average time spent.

Results

We examined both qualitative and quantitative data, and we will present the results in separate sections. Also, much of the discussion is reserved for the overall Conclusions section. However, in order to clarify the meaning of the results some discussion occurs in this section as well.

Quantitative

In response to the first research question, the IW-dyad (X = 5.22, SD = .97) and the IW-single (X = 5.39, SD = 1.50) conditions performed significantly better on the proximal items than the ET group (X = 3.75, SD = 1.22), F(1, 50) = 3.90, p = .01, η² = .199 (large effect size). There were no significant differences between any of the groups and the SF condition (X = 4.42, SD = 1.67). The study also revealed significant differences for the distal multiple-choice items, with the IW-dyad condition (X = 3.71, SD = .47) outperforming the ET condition (X = 2.83, SD = .72), F(1, 50) = 2.81, p = .05, η² = .152 (large effect size). There were no differences found with either the IW-single condition (X = 3.39, SD = .96) or the SF condition (X = 3.33, SD = .89). We also found significant differences for the open-ended transfer task, F(1, 50) = 4.35, p = .01, η² = .163 (large effect size). Post hoc comparisons revealed that the IW-dyad condition (X = 4.57, SD = 1.28) performed significantly better on the open-ended transfer task than the SF condition (X = 2.75, SD = 2.10) and the ET condition (X = 2.25, SD = 1.71). However, in response to the second research question, this was not true for the IW-single condition (X = 3.23, SD = 2.13), whose scores were not significantly higher than those of any other condition (see Fig. 4 for summary of results).

Fig. 4

Summary of study results. 3DS = 3-D singletons; 3D = 3-D dyads; 2D = 2-D dyads; DI = direct instruction dyads

Given that the spread in scores on the transfer task was so much larger than the spread on the standardized items, we were curious about the relationship between one’s standardized test score on the distal items and one’s score on the open-ended transfer task. This was partly motivated by our concern with the current national emphasis on distal standardized test scores, and our related interest in whether more open-ended tasks reveal understandings unaccounted for in standardized tests. Interestingly, a correlational analysis showed that there was a significant correlation (r = .60, p = .022) between distal standardized test scores and the open-ended transfer task, but only for the IW-dyad condition, with standardized test scores accounting for 36.4% of the variance and representing a large effect size. Scores on the distal multiple-choice items for the other three conditions did not significantly correlate with the open-ended transfer task. Although based on only a small sample, these results suggest that standardized test scores measured one aspect of learning and the open-ended transfer task may have measured a different aspect of their learning. It appears that for students who do not master the initial material, their learning may not correlate well with how they perform on related standardized test items.
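As a quick check on the variance figure above, the proportion of shared variance is the square of the correlation coefficient. Using the rounded value r = .60:

```latex
r^{2} = (0.60)^{2} = 0.36 \approx 36\%
```

This is close to the reported 36.4%; the small gap is presumably due to r having been rounded to two decimals before reporting, although that is an assumption rather than something stated in the text.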

Qualitative Data

Observational Data

Addressing the final research question, we examined each of the four identified categories of interest discussed above: collaborative sense making, personalization, terminology, and instantiation. Fifteen session videos were coded using the scheme described above. Due to problems with technology, some of the data was difficult to code, which resulted in lower sample size for the quantitative assessment of differences among the four conditions. We decided it was more fruitful in the context of this study to use the identified instances to provide a more qualitative assessment of differences and present this data accordingly. Therefore, the focus here is on illuminating qualitative differences that were apparent in the three conditions.

The SF condition appeared to have the most instances of collaborative sense making (M = 20.0, SD = 10.9), followed by the IW dyad condition (M = 13.6, SD = 5.5), and the ET condition (M = 10.8, SD = 6.4). Within the IW group, there was typically a running conversation throughout the entire 90-min session between the two participants. The conversation often revolved around navigational issues and spatial orientation, but there was also some discussion around the conceptual issues. Within the SF group, there was less conversation during the session overall. However, when they typed up their reports at the end of each task, the participants’ engagement with each other increased considerably. And during this period, they were discussing and negotiating around the topics more than the other two groups.

In terms of personalization, the SF condition seemed to personalize the information more than the other two groups (an average of two per dyad whereas we saw almost no instances among the IW condition and an average of 1 for the expository text condition). However, the IW group seemed to be the most immersed in the context of the experience. Within the IW group, the participants mainly associated the information with a particular character or aspect of the virtual environment and referred to them by name or using a personal pronoun to identify them. That is, the dyads did not extract the information from the virtual world; they tended to keep the discussion within the narrative and characterization set up within the learning context.

The conversations among the IW participants suggested that they were immersed in the experience by taking on the role of helping people in Taiga to solve the fish decline problem. This was consistent with the interview data, which we will discuss in the next section, where most of the participants made reference to the authenticity of the experience. However, while they were deeply immersed in the problem, they did not seem to relate to it personally. Perhaps the SF group had a higher rate of personalization (M = 2) because they were able to strike a balance between the story and real life, whereas the 3D group (M < 1) may have been too entrenched in the story to connect concepts to their own lives. The direct instruction group showed essentially no meaningful discussion, but there was some knowledge sharing during the typed assessments. There was mostly silent reading until they came to the written part of the task. This could explain why they had lower discourse indicators in all four areas except terminology.

The number of water quality terms used was highest in the SF group (M = 14.0, SD = 5.8), followed by the ET (M = 11.3, SD = 3.0) and the IW groups (M = 8.8, SD = 2.7). This was the only case in which the ET group outperformed the 3D group. It is likely that this difference is due to the design of the ET curriculum. The text was organized around particular concepts and terms instead of a personalized narrative like the IW condition, so ET participants were likely to simply parrot the text. The SF group also showed the most instances of concept instantiation (M = 22.0, SD = 6.3), exhibiting more instances than the 3D (M = 14.6, SD = 8.2) and the direct instruction groups (M = 12.3, SD = 6.8). Again, while the IW group spent more time discussing the contextual situation (e.g., navigational layout, game character perspectives, and investigative priorities), the SF group spent more time interrogating the meaning of the concepts.

Interview Data

Occasionally, the IW single group expressed that the experience was more about reading text than engaging in the virtual space. However, none of the participants in the IW dyad group expressed a concern over the amount of reading they had to do. The SF and ET groups, on the other hand, continuously mentioned that the amount of reading that was required was indeed overwhelming and quite intense.

Conversely, participants in the IW dyad and single groups raised issues around the navigational disorientation and redundant activities that were required of them in the virtual space. In particular, they expressed frustration when they had to travel and talk to characters many times, even when characters did not have anything of substance to offer. The SF group, on the other hand, found their learning environment easy to use and navigate. The ET group did not mention the ease of use of their experience; presumably this is because the participants were required only to click ‘next’ or ‘previous’ buttons on the computer screen to see adjacent pages for information.

We found more variation in the levels of students’ engagement with the materials according to different groups. While at times the navigation proved frustrating, participants in IW dyad and single group indicated that using the avatar to interact with the game characters in the virtual space led to an engaging experience. Some participants in the IW single condition suggested that creating more interactivity between the user and virtual world would have significantly increased their enjoyment and engagement with the space. The SF condition seemed to get “into” the problem, but their comments suggested that they did not experience as much interactivity with the environment. The ET group did not mention getting “into” the problem or being engaged with anything other than the content.

Lastly, participants in all four groups commented on the similarity between the designed experience and the real world. Participants in both the IW dyad and single groups reported that the experience felt “real” or “authentic” in nature. In fact, some dyads felt that they were actually communicating with the non-player characters within the virtual environment. One student described the NPCs as “telling me what they thought” and found it interesting “getting people’s thoughts and opinions.” Although a few of the participants in the SF group also mentioned the authentic nature of the task, most of the SF participants did not do so. As mentioned earlier, the ET group did not get “into” the problem or feel that it was real as the other groups did. Almost all the participants mentioned that they thought the topic was interesting and useful to learn and could see the relevance to real life.

Discussion

Previous research has suggested that a large piece of the puzzle regarding the successes (Bransford et al. 2002; Greeno 1998; White 1993) and failures (Detterman 1993; Gick and Holyoak 1980; Greeno et al. 1992; Lave and Wenger 1991; Nunes 1999) of transfer has to do with issues of distinguishing context from content. In this study, we explored methods for teaching students about scientific principles in a manner that leads to contextualized yet transportable knowledge. We found that those individuals working in dyads and using the most immersive intervention did significantly better than students given similar, yet more focused information when compared on the standardized test items and the performance-based transfer task. This is somewhat surprising when one adopts a strictly mechanistic perspective of learning in that the expository text condition experienced the content in a manner most closely related to the test items, and the simplistic framing hypertext condition showed the richest incorporation of scientific terminology in their discourse according to the qualitative data. On a related note, the performance-based transfer task illuminated differences between the simplistic framing and IW-dyad conditions that were not revealed by the distal standardized test scores. These results indicate that, for this sample, standardized test scores were less sensitive to individual differences and, as a result, masked some within-group and between-group differences.

In explaining these differences, we have offered a theoretical frame that ostensibly has applicable value for others doing similar research. Specifically, we have described and argued for the pedagogical importance of learning environments facilitating a sense of transformational play and have suggested that this potentially grounds one’s understandings of the underlying science concepts as well as one’s relations to them. Central to our notion of transformational play is that the learner is using the science content to transform a particular context, that is, being positioned as an individual with an authoritative role, having agency in choosing what actions to take, and having consequential actions that affect the unfolding situation. Most importantly, transformational play involves a sense of narrative, perceptual, interactive, and/or social immersion within a situation where the individual has some level of agency in terms of transforming the context and affecting how events unfold. Transformational play requires that the curriculum do more than “contextualize” the content, as if watching a video or reading a rich description (Cognition and Technology Group at Vanderbilt 1993); it must position the learner as an agent in and on the context, with player actions having game-world consequences and game-world consequences changing possible future actions.

It is in this way that a virtual world has the potential to ground participation and the learning of science content in terms of particulars, establishing a sense of sensory, actional, and symbolic presence (Dede 2009). Arguably, a limitation of much of current science education is that learners are too often positioned as passive receivers expected to memorize abstracted disciplinary content (e.g., when students are expected to memorize a list of facts or even concepts defined by a textbook or teacher). This pedagogical move does little to aid the learner in becoming able to use and value the content, or, if they do appreciate its value, to see themselves as someone who does science. Even in those situations where context is enlisted, it frequently involves only simplistic framing. Roth (1996) distinguished between “con-text” and context, the former being those situations in which a context is somewhat artificially paired with (con-) content (text) in order to illustrate the concept. Lave (1997) further argued that the more the teacher, the texts, or the curriculum own the learning, the more difficult it becomes for the students to develop meaningful understandings. In contrast, here we placed the learner at the center of the world dynamics, grounding the learner, content, and context in a tightly coupled loop where content if properly leveraged by the person has the potential to change the context.

In conclusion, our goal here was to investigate the power of virtual worlds and videogame methodologies to develop a rich learning context for supporting learning, and to understand whether working in dyads would facilitate more transferable understandings than when working alone through participation in a conceptually rich environment. Our data showed that students in the immersive condition were able to score more highly on a performance-based transfer task, and that this potential was heightened for those in the dyadic condition. The challenge of building such a curriculum is in selecting an appropriate context and the amount of situational details that need to be introduced (Barab and Roth 2006). If a particular learning environment is too tailored to the disciplinary content, then it starts to feel more like school work, becomes less experiential (Dewey 1938), the potential for immersion becomes unlikely, and the knowledge is more likely to be inert. To the extent that a particular concept is not connected to the core context-of-use, it runs the risk of being an abstracted fact to be memorized, with no contextual anchor from which a student can see its authentic application.

As one adds more situational details, the mystery, reality, and discovery potential increase, but one loses guidance, efficiency, and clarity. The challenge is to balance these details such that the learning occurs in a context that grounds the to-be-learned content and at the same time supports the player in realizing the value of the content independent of the context in which the learning occurs. Here, we used design moves such as liminal episodes, interactive rule sets, and more embedded pedagogical supports to support such learning. It is our hypothesis that collectively these moves simultaneously grounded the content learning and attuned the learner to the fact that there was scientific content being grounded. To be clear, we are not arguing that content is not important or that it should not be directly highlighted as part of the instructional process. Rather, we are suggesting that through grounding these descriptions in situationally rich experiences where one’s actions, especially those that are conceptually informed, have consequence on a context, we are able to support learners in understanding the meaning of the to-be-learned content. At the core of our theory of transformational play is the idea that, in understanding the relations of content to a particular context that the learner has had experience in changing, one is better able to see its meaning elsewhere.

Implications

It has been argued that the lecture format concentrates on memorization of factual information and promotes the development of superficial understandings (Cognition and Technology Group at Vanderbilt 1993; Roth 1996). In spite of this concern, many undergraduate science classes remain dominated by the large lecture format with class sizes over 150 students and the textbook being the primary learning resource. While this study took place within a laboratory context, these findings challenge the value of the textbook as the primary learning tool. Games scholar Gee (2003) likens learning from the textbook as primary resource to reading the game manual without playing the game. If one tries to read the manual for most videogames before playing the game, it is littered with technical jargon, typically unmotivating, and not very illuminating in terms of how to play the game. However, once one has played the game, the manual is quite useful for deepening understanding of implicit rules and principles. This study suggests that undergraduate educators might find videogame technologies and methodologies to be useful pedagogical tools for deepening learning.

We are at an interesting time in which science teachers and schools are under more pressure than ever to prepare students for standardized tests, but they are facing a generation of students who view the school curriculum as having little relevance to their own lives. Presenting abstracted facts (i.e., expository text as is frequently the case in science textbooks), while potentially efficient, is often not the best pedagogical strategy. These findings suggest that there might be more powerful pedagogical tools available to teachers. Drawing on game-design principles and our underlying situated theoretical perspective, we developed a game-based curriculum that in addition to having an engaging narrative, included interactive rule sets, pedagogical agents, extrinsic rewards, and a perceptually rich 3D environment—all designed to establish a sense of conceptual play. This study has important implications for rethinking science classroom learning in that, in addition to the teacher, textbooks remain the primary resources for supporting learning in most formal institutions. Future work will explore the relevance of these findings for different populations, and will examine the role of the teacher in supporting learning when using textbooks, illuminative examples, or game-based virtual worlds.

Consistent with the findings from this study, it is our belief that science teachers should allocate fewer resources for supporting content transmission and, instead, invest in developing contexts that ground the content we wish students to learn. Our interest is in designing curricular contexts in which domain concepts become tools for inquiring into the presented situation and in which the learner’s choices have consequence on the unfolding of the learning/participation environment. It is our belief that videogame contexts, and their potential to foster this sense of engaged consequentiality, can provide a useful means for enhancing science education. We have implemented these transformational play spaces in actual classrooms elsewhere and will continue to do so in future work, helping to build effective models for how they can best be leveraged by teachers and students to support academic learning. Also, we will more closely examine what aspects of the curriculum seemed to facilitate transfer, systematically testing the value of particular aspects to determine their influence on learning. In this study, our goal was simply to demonstrate the potential of transformational play as a pedagogical approach that leverages game-based methodologies and IW technologies to support deep and transferable science learning.