1 Introduction

Virtual environments (VEs) are increasingly used for instruction and assessment in a variety of domains. They have been enthusiastically adopted in academic contexts (Youngblut), and many universities have created educational technology departments to research, develop, and incorporate VEs and other forms of computer-administered instruction and assessment into standard pedagogy. VEs are also used for training and assessment in non-academic settings, such as military and medical training (Lampton and Parsons 2001; Satava and Jones 1997).

Virtual environments are, by definition, “a real-time graphical simulation with which the user interacts...within a spatial frame of reference” (Moshell and Hughes 2002, p. 893), which makes them settings in which spatial ability affects performance. This is not true of many “virtual tours”, which require the individual to click on an object or place in a picture to trigger an explanation or a shift to another picture or location; by this definition, such tours are questionable as VEs. By contrast, VEs that require navigation from one place to another to accomplish a task necessarily rely on spatial navigation skills, and are therefore sensitive to individual differences in the way people navigate within VEs.

Most VEs used in education and training are not primarily intended to assess or train spatial navigation skills. Nevertheless, the environmental features of a VE can influence how well individuals learn or perform within it. In general, if a VE-based task requires an individual to complete a multi-component process, with different components located in different places within the VE, difficulty navigating in the VE (e.g., becoming lost or disoriented) could interfere with task performance. This could reduce the individual’s assessment score, or even lead them to quit. In these situations, poor performance might reflect not an inability to meet the requirements of the task itself, but an inability to navigate fluently in the VE. For example, the emergency training simulator ERT-VR trains people to take on different roles in emergency scenarios (Li et al. 2004). The simulator is designed for both instruction and assessment, and credits are awarded according to how the individual performs in a variety of emergency missions. Although this type of immersive learning is highly beneficial, its navigational demands could cause some individuals to fail despite being excellent candidates for emergency response teams. An individual who got lost in the VE would appear to be showing poor learning or bad judgment, such as taking an injured person the wrong way, taking too long to evacuate a building, or not going to the most injured person first after a triage survey of victims. In reality, all of these failures could simply be the result of navigational errors in the VE.

One reason the navigational demands of a VE are so important is that males and females may use different navigational strategies. Some research has suggested that females tend to navigate by landmarks, whereas males tend to use bearings or vectors, such as the direction in which they are headed (Czerwinski et al. 2002; Sandstrom et al. 1998). Other research has suggested that males tend to navigate using distal directional landmarks (and vectors), whereas females tend to use proximal locational landmarks (Jacobs and Schenk 2003). These gender differences in strategy are seen both in the real world and in VEs (Halpern 2000). If a VE does not provide both types of landmarks, it could bias performance in favor of one gender or the other. In fact, many VEs are designed to be learned using vectors rather than landmarks (Czerwinski et al. 2002; Tan et al. 2003), which could account for a male advantage.

There are several examples of VEs in which gender differences in navigation strategies could affect performance. In two educational environments, ActiveWorlds (Dickey 2005) and Quest Atlantis (Barab et al. 2005), students must progress through a virtual world and find or retrieve items. The items may be objects needed to complete a task or portals that open web pages needed to research information. Students therefore need to learn and remember the locations of these items. If female students have difficulty knowing where they are or remembering where they have been, this would slow their performance and could impair their learning of the information that the VE is intended to teach. Another VE used for training and assessment, the team-combat training environment FITT (Lampton and Parsons 2001), requires teams to advance through a building and enter rooms to check for enemies. Team members are assessed on a series of required actions, such as searching the rooms in the correct sequence (always entering the next room to the right) and correctly reporting the location of enemies when they are encountered (“left”, “right”, “front”, etc.). If female trainees take longer to orient themselves in the VE, they could make errors in searching the rooms or reporting the location of enemies.

In the present study, we examined gender differences in spatial navigation in two different VEs: one that provided only distal directional landmarks, and a second that provided both distal directional and proximal positional landmarks. Both VEs were analogs of animal paradigms commonly used to study spatial navigation. The first VE, the Arena Maze (Skelton et al. 2000), replicates the Morris water maze (Morris 1984), which is solved using a configuration of distal cues (Hodges 1996) to establish directions, headings, and vectors (Jacobs and Schenk 2003). In this task, participants must find a specific location within a large space in a room, using vectors to walls and windows to determine their location and compute a trajectory to the target location. In this task, both male humans and male laboratory animals tend to outperform their female counterparts (Sandstrom et al. 1998). The second VE, the Quad Maze, is a new task that essentially replicates a radial arm maze, which can be solved using either distal or proximal cues, depending on what is available (Hodges 1996). In this task, participants must travel down each arm of the maze only once per trial, and go only to particular locations along the arms. An important feature of this VE is that it simulates a real location on our university’s campus. This allowed us to test learning of the task in both real and virtual space, and to test for transfer between virtual and real environments. Both VEs were “desktop VEs”; that is, they were presented to participants on ordinary desktop monitors.

The objectives of the present study were to: (1) examine whether gender differences in spatial tasks in VEs depend on the structure of the VE; and (2) examine whether spatial information learned in the VE transfers to the real world.

2 Materials and methods

2.1 Participants

Participants were 19 female (mean age = 19.53 years, SD = 2.12) and 16 male (mean age = 19.81 years, SD = 1.83) undergraduates ranging in age from 18 to 26 years. All were right-handed, and all completed the entire study within 120 min. Two additional participants (one male and one female) were excluded because of vestibular problems (dizziness), difficulty understanding the instructions, or difficulty controlling the joystick.

2.2 Apparatus and materials

Three tasks were used to assess gender differences in spatial cognition: (1) a VE-based “Arena Maze” task to assess differences in cognitive mapping; (2) a classic paper-and-pencil Mental Rotation Test (MRT) to assess differences in the ability to imagine and rotate 3D objects; and (3) a new “Quad Maze” task to test object-location learning and transfer of knowledge between virtual and real environments. Each participant completed two Quad Mazes, one in real space (the Real Quad Maze) and one in a VE (the Virtual Quad Maze), with order of presentation counterbalanced across participants. The Arena Maze and Virtual Quad Maze were both built using the Unreal® engine (Epic Games) and run on a Windows-based personal computer with an Intel Pentium III 450 MHz processor, 192 MB RAM, a 3dfx Voodoo3 graphics card with 32 MB VRAM, and a 17 in. monitor set to a screen resolution of 800 × 600.

2.2.1 Arena Maze

The Arena Maze (like the Morris water maze) was designed to provide a configuration of distal, directional cues (Jacobs and Schenk 2003) with no proximal, positional cues to locations within the arena. The VE consists of a large circular arena contained within a very large, square room with windows, a door, and a landscape outside (see Fig. 1). The arena appears to be 40 m in diameter and is bounded by a low (1 m high) wall, which restricts movement but not view. The room appears to be 75 × 75 m in area and 17.5 m high. Two facing walls each have three windows, which provide views of an outside world of green hills sloping to a beach. Of the other two facing walls, one has a large door, while the other has a large window providing a view of a large body of water with a mountainous island. Although all of these features are visible from the participant’s eye-level perspective within the arena, they are only fully seen from near the windows. For reference purposes, each of the four walls is designated by a cardinal direction (North–N, East–E, South–S, and West–W).

Fig. 1

Arena Maze views (from slightly above the participant’s POV): a the Northwest corner; b the opposite (Southeast) corner; c the South wall from inside the arena. The platform is invisible

2.2.2 Mental rotation test (MRT)

The MRT is a standard paper-and-pencil test that is often used to measure gender differences in spatial cognition. It consists of 24 problem sets, each of five line-drawn 3D figures. One figure in each set is the sample; the other four are variations of it, two of which match the sample when rotated, while the other two do not.

2.2.3 Quad Mazes: real and virtual

The “Quad Maze” is a new task designed to test object location memory in real and virtual space. It is similar to a “Radial Arm Maze” task used widely with animals (Hodges 1996) but was adapted in recognition of the ability of humans to follow complex verbal instructions. The real-space version was laid out in a well-known area central to the university campus and the virtual version recreated the pattern and angles of pathways, the surrounding trees and buildings, the local landmarks (such as benches and light poles) and the location markers used in the real version.

The Real Quad Maze was laid out in a central 26 × 29 m quadrangle (or “Quad”) bounded on all sides by 1.5 m wide asphalt paths and bisected along both diagonals. There was a bulge in the area where the diagonal paths met, and a fifth path entered perpendicular to one of the sides (see Fig. 2a). This arrangement formed, in effect, an asymmetrical five-arm radial maze with a non-circular central hub. One diagonal path was designated as the entrance, while the other four radiating paths served as the maze arms (see Fig. 2a). On each of these four arms, three locations at 4 m intervals down the path were marked by traffic cones, each holding a 10 × 10 cm sign on a 1.5 m stick. Each of these 12 signs showed a picture of a different common fruit (e.g., apple, pineapple, cherry) and was covered by an opaque piece of plastic to prevent the picture from being seen from a distance. One stand on each arm was designated the target location, chosen such that no positional characteristic predicted which stand would be the target. That is, the target could be the first, second, or third stand from the hub, on the left or right side of the path, and either alone or one of two stands on the same side of the path (see Fig. 2a). Grassy knolls between the paths (and our instructions) discouraged participants from taking shortcuts between arms (see Fig. 2b).

Fig. 2

a The layout of pathways and the locations of signs on stands (circles) in both the Real Quad (RQ) and Virtual Quad (VQ) Mazes. Filled circles depict the four target locations. The “×” (near A) denotes the start of the maze, where participants received instructions. b Photograph of the real university quadrangle looking towards the hub and then past it, down the entrance arm. c Virtual rendering of the university quadrangle from the same viewpoint. Note the similarity of the buildings, features, and paths to the real Quad, and the stand in the foreground holding a covered picture of a fruit

The Virtual Quad Maze had identical pictures in identical locations. The (somewhat unusual) appearance of the “fruit stands” was chosen so that it could be reproduced in both the real environment and the VE (see Fig. 2c). In the Virtual Quad Maze, the covers over the fruit pictures opened automatically when the stand was approached.

2.3 Design

All participants were tested first on the Arena Maze, then on the MRT, and then on the Real and Virtual Quad Mazes in counterbalanced order. Participants were alternately assigned to one of two conditions: the Real-Virtual condition (10 females and 8 males) or the Virtual-Real condition (9 females and 8 males).

2.4 Procedures

Upon arrival, all participants completed a consent form, answered some demographic questions that included video game and joystick experience, and then were directed to the computer screen for Arena Maze testing.

2.4.1 Arena Maze

Participants were read instructions for the Arena Maze from a script. They were informed that the joystick had been mechanically modified to permit forward movement and left and right turns, but no backing up. They were also informed of the task requirements, but not of the purpose of each of the four types of trials: (1) an exploration trial, in which they should explore the room outside the arena and the views outside the room; (2) visible platform trials, in which they simply needed to go to and stand on a visible platform in the arena; (3) invisible platform trials, in which they needed to search for a platform hidden below the floor; and (4) a probe trial, in which they needed to search for the platform for 50 s. On invisible platform trials, the platform was always in the center of the NE quadrant and, when stepped on, rose slightly to become visible and made a mechanical sound. On the probe trial, the platform was unavailable and could not be found until the trial ended. Participants were warned that one trial would be challenging, and that the platform had not been moved to a different location but had been made “really hard to find”. Participants were encouraged to keep searching for the platform until the end of the trial. The purposes of the four trial types (and of the other tests in the study) are given in Table 1.

Table 1 Tasks and purposes

During the exploration trial, participants were encouraged to move around the room until they felt comfortable with the joystick, and to go to the windows and look out. When they said they were comfortable, they were reminded to go quickly and directly to the platform and were then “teleported” to the start position of the first visible platform trial. On all subsequent trials, the experimenter teleported participants directly from the platform to a new start position for the next trial as soon as they said they were ready. After four visible platform trials, participants were informed of their task on the ten upcoming invisible platform trials. Participants were given up to 3 min to find the platform on each trial; it was located in the center of the NE quadrant on all ten trials. If they had not found the platform within 3 min, participants were guided to it with verbal movement-based instructions. The final (probe) trial was indistinguishable from the others, except that the platform did not rise until the end of the 50 s trial. Start positions on all trials varied randomly among the four cardinal points of the arena (i.e., N, E, S, or W, the positions closest to each of the four walls). The participant’s ability to find the platform was measured by the latency from the start of the trial until they first triggered the platform, and by the distance they traveled from the start to the platform. The accuracy of their search on the probe trial was measured by their dwell time in each of the four quadrants of the arena [see Skelton (2006) for details of the analysis]. After the last trial, participants were asked about dizziness and their sense of presence in the environment.
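The three trajectory measures described above (latency, path distance, and quadrant dwell time) can be illustrated with a short Python sketch. It assumes a hypothetical position log of (time in s, x, y) samples in arena coordinates, with the arena centre at the origin, +y towards the North wall and +x towards the East wall, and with the log ending when the platform is first triggered. The logging format and function names are illustrative assumptions, and the analysis described in Skelton (2006) may differ in detail.

```python
import math
from typing import List, Tuple

# Hypothetical trace format (assumption): (time_s, x_m, y_m), arena centre at (0, 0),
# +y towards the North wall and +x towards the East wall.
Sample = Tuple[float, float, float]


def quadrant(x: float, y: float) -> str:
    """Return the arena quadrant ("NE", "NW", "SE", "SW") containing point (x, y)."""
    ns = "N" if y >= 0 else "S"
    ew = "E" if x >= 0 else "W"
    return ns + ew


def latency_and_distance(trace: List[Sample]) -> Tuple[float, float]:
    """Latency from trial start to the last sample (assumed to be the platform
    trigger) and the summed straight-line path length between samples."""
    latency = trace[-1][0] - trace[0][0]
    distance = sum(
        math.hypot(x2 - x1, y2 - y1)
        for (_, x1, y1), (_, x2, y2) in zip(trace, trace[1:])
    )
    return latency, distance


def percent_dwell(trace: List[Sample], target: str = "NE") -> float:
    """Percentage of trial time spent in the target quadrant.

    Each inter-sample interval is credited to the quadrant of its starting sample.
    """
    total = trace[-1][0] - trace[0][0]
    in_target = sum(
        t2 - t1
        for (t1, x1, y1), (t2, _, _) in zip(trace, trace[1:])
        if quadrant(x1, y1) == target
    )
    return 100.0 * in_target / total
```

For example, the hypothetical trace `[(0.0, 0.0, -18.0), (5.0, 0.0, 0.0), (10.0, 7.0, 7.0)]` has a latency of 10 s and spends half of that time in the NE quadrant. Crediting each interval to the quadrant of its starting sample is a simplification; with a fine sampling rate its effect is negligible.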

2.4.2 Mental rotations test (MRT)

Participants were given instructions and allowed 4 practice trials to match the sample figure to 2 of the 4 variations and were then given 24 problems to solve. The score was simply how many of the 48 matching variations (2 on each of 24 trials) they were able to identify.

2.4.3 Quad Mazes

Participants were informed that their task would be to remember the locations of the fruit pictures in the maze, and were then guided up one diagonal arm of the maze (arm A in Fig. 2a) to the hub. They were then given a tour of the maze, starting from the center of the hub and visiting each arm in clockwise order. They were taken to all stands and shown the picture on each one, and the tour ended back at the hub. Participants were told to always travel on the paths and not to take shortcuts. Participants were then given a booklet containing pictures of four of the fruits (one per page) and told to go to the location of each one, in the order shown in the booklet. Trial time was measured from this point until they found the fourth target location. A target error was recorded if the participant approached and examined a location that was either not a target or not in the correct order. An arm entry error was recorded if the participant traveled more than 1.5 m down an arm not containing the next target. Once the participant had found all four targets, they returned to the central hub to start the next trial.
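The two error definitions above can be made concrete with a minimal Python sketch that scores one trial from a chronological event log. The event format, the numbering of arms (1–4) and stands (1–3, by distance from the hub), and the function name `score_trial` are all hypothetical; in the study, errors were recorded by the experimenter, and this sketch only restates the stated rules.

```python
from typing import List, Tuple, Union

# Hypothetical encoding (illustrative only): arms numbered 1-4, stands numbered
# 1-3 by distance from the hub, so a stand is (arm, position).
Stand = Tuple[int, int]
# An event is ("enter", (arm, max_depth_m)) or ("examine", stand).
Event = Tuple[str, Union[Tuple[int, float], Stand]]


def score_trial(events: List[Event], targets: List[Stand],
                threshold_m: float = 1.5) -> Tuple[int, int]:
    """Return (target_errors, arm_entry_errors) for one Quad Maze trial."""
    target_errors = 0
    arm_errors = 0
    next_idx = 0  # index of the next target the participant should find
    for kind, payload in events:
        next_target = targets[next_idx] if next_idx < len(targets) else None
        if kind == "enter":
            arm, depth_m = payload
            # Travelling more than threshold_m down an arm without the next target.
            if depth_m > threshold_m and (next_target is None or arm != next_target[0]):
                arm_errors += 1
        elif kind == "examine":
            if payload == next_target:
                next_idx += 1  # correct target, in the correct order
            else:
                target_errors += 1  # not a target, or a target out of order
    return target_errors, arm_errors
```

Here an examination of a real target visited out of sequence counts as a target error, matching the "not in the correct order" clause above.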

Participants were given up to five trials to reach criterion performance. They were given up to 7 min to complete each of the first three trials and up to 5 min for the last two. If they could not locate the targets within these times, they were given location-neutral, movement-based directions to each target, in order. Participants reached criterion by finding the four target locations in the correct order with no target errors twice in succession.
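The criterion rule (two successive errorless trials within at most five trials) can be expressed as a small Python function. As a simplifying assumption, a trial counts as errorless only if it has no target errors and was completed within that trial's time limit, so that no directions were needed; the function name and input format are hypothetical.

```python
from typing import List, Optional

MAX_TRIALS = 5
TIME_LIMITS_MIN = [7.0, 7.0, 7.0, 5.0, 5.0]  # limits for trials 1-5, as described above


def trials_to_criterion(target_errors: List[int], minutes: List[float]) -> Optional[int]:
    """Return the trial number on which criterion was reached, or None if never."""
    successes = [
        errors == 0 and time_min <= TIME_LIMITS_MIN[i]
        for i, (errors, time_min) in enumerate(
            zip(target_errors[:MAX_TRIALS], minutes[:MAX_TRIALS]))
    ]
    # Criterion: two errorless trials in succession.
    for i in range(1, len(successes)):
        if successes[i - 1] and successes[i]:
            return i + 1  # trials are numbered from 1
    return None
```

Counting trials to criterion as the trial that completes the second successive errorless trial is itself an assumption about how the measure was tallied.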

After completion of the Arena Maze, MRT, and both Quad mazes, participants were debriefed, thanked and given bonus points for their undergraduate Psychology course.

3 Results

3.1 Data analysis

All data were analysed with SPSS, with alpha set at 0.05. The dependent variables for the Arena Maze were latency and distance on visible and invisible platform trials, and percent time spent in the correct quadrant on the probe trial. An omnibus spatial score was calculated by normalizing the latency, distance, and probe trial data (Skelton et al. 2000). Learning effects were tested using repeated measures MANOVA on invisible platform trials. Gender effects on distance and latency were calculated from the average of trials 2–10; the first trial was excluded because it reflected spatial search ability rather than spatial learning or memory. There were no differences between participants in the two orders of Quad Maze testing, so their Arena Maze data were pooled within genders. The dependent variable for the MRT was the number of correct match-to-samples (max 48), and gender effects were analysed using t-tests. The dependent variables for the two Quad Mazes were total time to complete the task, latencies on the first and second trials, trials to criterion, and overall number of errors (i.e., incorrect targets examined).
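The omnibus spatial score combines measures on different scales. One common way to do this, shown in the Python sketch below, is to convert each measure to z-scores across participants, reverse the sign of latency and distance (where lower values mean better performance), and average the three. This normalization is an assumption for illustration; the exact procedure of Skelton et al. (2000) may differ.

```python
import statistics
from typing import List


def zscores(values: List[float]) -> List[float]:
    """Standardize values using the sample mean and sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def spatial_scores(latency: List[float], distance: List[float],
                   dwell_pct: List[float]) -> List[float]:
    """Composite score per participant; higher means better spatial performance."""
    z_lat = [-z for z in zscores(latency)]    # shorter latency = better
    z_dist = [-z for z in zscores(distance)]  # shorter path = better
    z_dwell = zscores(dwell_pct)              # more time in target quadrant = better
    return [(a + b + c) / 3 for a, b, c in zip(z_lat, z_dist, z_dwell)]
```

Averaging z-scores gives each measure equal weight regardless of its original units (seconds, distance units, or percent).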

3.1.1 Arena Maze

There was a reliable difference between genders on the directional version of the Arena Maze. As expected, there were no gender differences on visible platform trials (see Figs. 3, 4) in either latency or distance (t < 1.4, ns), indicating that males and females understood the task demands and were able to move through virtual space with equal ease. More importantly, males and females differed on invisible platform and probe trials, showing significant gender differences in latency [t (33) = 3.08, P < 0.004], distance [t (33) = 2.81, P < 0.008], dwell time [t (33) = 3.65, P < 0.001], and the spatial score [t (33) = 3.79, P < 0.0006]. These differences had large effect sizes (Cohen’s d): distance (0.92), latency (1.01), dwell time (1.21), and spatial score (1.25), showing that males performed better than females in the Arena Maze. Both males and females showed significant learning, as evidenced by significant linear, quadratic, and cubic trends for latency [F (1,33) > 4.3, P < 0.05] and distance [F (1,33) > 10, P < 0.01]. Even though their asymptotic performance differed, males and females appeared to learn at about the same rate, as there was no significant interaction between trials and gender (Figs. 3, 4).
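For an independent-samples t-test with pooled variance, Cohen's d can be recovered from the t statistic and the group sizes as d = t * sqrt(1/n1 + 1/n2). The Python sketch below applies this standard conversion to the reported t values, taking n = 16 males and n = 19 females from the Participants section. The results come out close to, though slightly above, the reported d values; the reported values may have been computed directly from the raw means and SDs or with a slightly different formula.

```python
import math


def cohens_d_from_t(t: float, n1: int, n2: int) -> float:
    """Cohen's d implied by an independent-samples (pooled-variance) t statistic."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


N_MALE, N_FEMALE = 16, 19  # group sizes from the Participants section

for measure, t in [("latency", 3.08), ("distance", 2.81),
                   ("dwell time", 3.65), ("spatial score", 3.79)]:
    print(f"{measure}: d ~ {cohens_d_from_t(t, N_MALE, N_FEMALE):.2f}")
```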

Fig. 3

Latency of males and females in the Arena Maze, on visible platform trials (V1–V4) and invisible platform trials (I1–I10)

Fig. 4

Distance traveled to reach the platform (in arbitrary units) for males and females in the Arena Maze, on visible platform trials (V1–V4) and invisible platform trials (I1–I10)

3.1.2 MRT

There was a small but significant difference between genders on the MRT. Males scored higher than females (male mean = 23.2 ± 2.1 SEM; female mean = 18.1 ± 1.9 SEM), a significant difference in the expected direction [t (33) = 1.82, P < 0.04, one-tailed test]. This effect was not large, however (d = 0.62).
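As a consistency check, the MRT statistics can be recomputed from the reported means and standard errors. The Python sketch below converts each SEM to an SD (SD = SEM * sqrt(n)), pools the variances, and computes the independent-samples t and Cohen's d, assuming the group sizes from the Participants section (16 males, 19 females). It gives t(33) of about 1.80 and d of about 0.61, in line with the reported values given the rounding of the published means and SEMs.

```python
import math
from typing import Tuple


def pooled_t_and_d(m1: float, sem1: float, n1: int,
                   m2: float, sem2: float, n2: int) -> Tuple[float, float]:
    """Independent-samples t (pooled variance) and Cohen's d from means and SEMs."""
    sd1, sd2 = sem1 * math.sqrt(n1), sem2 * math.sqrt(n2)  # SEM -> SD
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    diff = m1 - m2
    t = diff / (sp * math.sqrt(1.0 / n1 + 1.0 / n2))
    d = diff / sp
    return t, d


t, d = pooled_t_and_d(23.2, 2.1, 16, 18.1, 1.9, 19)  # males vs. females
print(f"t({16 + 19 - 2}) ~ {t:.2f}, d ~ {d:.2f}")
```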

3.1.3 Quad Mazes

There was little difference between the genders on the Quad Mazes. Females took the same number of trials as males to learn the first Quad Maze (see Fig. 5) and though both males and females tended to take more trials in the Virtual Quad Maze than the Real Quad Maze, the difference was not significant (t = 1.71, P = 0.10). There was a tendency for females to make more errors in the Virtual Quad Maze than did females in the Real Quad Maze, but again, the difference was not significant (t = 1.59, P = 0.12). Taking both genders together, there was no significant difference in the number of target errors between the Virtual and Real Quad Mazes (t = 1.94, P = 0.06).

Fig. 5

Trials to criterion (left) and number of errors (right) for Real (R) and Virtual (V) Quad Mazes for males (M) and females (F). Dashed lines indicate those tested in the Virtual Quad Maze first

Knowledge gained in one Quad Maze transferred well to the other, regardless of whether transfer was from real to virtual or from virtual to real (see Fig. 5). There were significant declines in both trials to criterion [F (1,33) = 49.8, P < 0.001] and target errors [F (1,33) = 16.77, P < 0.001] from the first Quad Maze (real or virtual) to the second (virtual or real). The declines in trials to criterion were significant for both real-to-virtual transfer [F (1,16) = 30.3, P < 0.001] and virtual-to-real transfer [F (1,15) = 25, P < 0.001], regardless of gender.

4 Discussion

The current findings reinforce previous research showing gender differences in spatial navigation in VEs, and further show that the design of a VE can influence the magnitude of the difference. They also show that it is not the virtual nature of a task per se that impairs the performance of females. Finally, they show that spatial information learned in a virtual or real environment transfers readily to a real or virtual environment, respectively.

In the MRT, the small but reliable gender difference expected from the literature was obtained (Voyer and Saunders 2004), confirming that in terms of spatial abilities, this was a typical sample of male and female participants.

In the Arena Maze task, no gender differences in latency or distance were found during the visible platform trials. This suggests that males and females understood the task and were comfortable using the joystick to move quickly in the VE. In contrast, there were large and reliable gender differences in both latency and distance on the invisible platform trials and in dwell times on the probe trials. These data indicate that males were significantly better than females at learning the location of the hidden platform, and navigating quickly and directly to it. The distance and probe data, along with the visible platform data, show that males were not better simply because of greater skill at moving in the VE.

On the Quad Mazes, no substantial gender differences were seen. There was no clear gender difference in the number of trials required to learn and remember the locations of all four targets, and there was only a small, non-significant tendency for females to make more errors in the Virtual Quad Maze than either males in the Virtual Quad Maze or females in the Real Quad Maze. A replication with a larger sample might determine whether this difference is reliable. Nevertheless, the current results indicate no difference in the ability of males and females to learn, remember, and discriminate the target locations and to navigate to them quickly.

Transfer of knowledge between the real and virtual versions of the university Quad was almost perfect. Neither males nor females needed extra trials or made target errors in their second maze, regardless of the direction of transfer (real to virtual or virtual to real).

What accounts for the dissimilar gender differences in the two VEs? Such dissimilarities have been reported previously: Coluccia and Louse (2004) surveyed 14 studies examining gender differences in simulated environments and found that males had an advantage in only 8 of the 14 (57%). This is consistent with the idea that gender differences depend on the characteristics of the task, rather than on the task being presented in a VE.

One possibility is that males perform better than females in unfamiliar environments. The Arena Maze was novel and somewhat unrealistic, whereas the Quad Maze was set in an area of campus that was not only real but also, because of its central location, familiar to most students. Although further study would be needed to determine the magnitude of this factor’s impact, it does suggest that VE tasks for training and education should use layouts that are as familiar as possible, and that everyone should be given the opportunity to explore and become familiar with the environment before being tested in it.

A more likely explanation is the difference in navigational demands between the Arena Maze and the Quad Maze. Certainly, the Morris water maze and the radial arm maze (on which the Arena and Quad Mazes were modeled) have well-recognized differences in cognitive demands [see Hodges (1996) for a review]. The Arena Maze has no local reference points by which the target location (the platform) can be found. The circular arena wall is featureless, and the room walls are far enough away to provide only very poor positional cues; essentially, they provide only directional cues. In contrast, the Quad Maze provides an abundance of both types of cues: directional cues from the surrounding trees and buildings, which are asymmetrical in all four directions, and positional cues from the asymmetrical pathways, local features such as benches, and the left/right positions of the target locations. A recent review of spatial navigation studies (Jacobs and Schenk 2003) concluded that males and females prefer to rely on different navigational landmarks, with males preferring directional landmarks (present in the Arena Maze) and females preferring positional landmarks (absent from the Arena Maze).

4.1 Implications for design

As pointed out in the introduction, many features of a VE could bias performance measures against females. Scores based on the time taken to perform a task, or on the number of errors or wrong turns taken in getting through it, may reflect difficulties with navigating the environment rather than difficulty understanding the task demands, the cognitive capability of the individual, or their mastery of the target skills or knowledge. In other words, those designing VEs for education and training need to be mindful of the spatial demands they build into their tasks.

If a VE-based task requires any movement between areas and learning the locations of objects or information, the VE should be designed to support both directional and positional navigation. That is, it should have both local and global features by which individuals can determine their location, orient themselves to the overall structure of the space, navigate from one place to another, and locate a particular position within an array. Cubukcu and Nasar (2005) examined ways of improving navigation in a virtual neighborhood and found that wayfinding was best in VEs that had a simple layout, good physical differentiation between objects in the environment, and good differentiation in both the vertical dimension (landmarks) and the horizontal dimension (roads). This constellation of factors accommodates both gender-typical styles of navigation in VEs.

There was some indication in the current study that VEs may take a little longer to learn than real environments, and that females might need a bit more time. With this possibility in mind, designers of VE-based training and assessment should consider providing extra time and practice for individuals to learn the basic layout of the VE, and should not penalize those who need a little extra time to familiarize themselves with it.

5 Conclusions

Virtual environments offer enormous potential in education; ironically, a VE can place learning and assessment in a “real world” context better than a standard classroom can. For example, driver’s education courses in which students experience, through a VE, the consequences of speeding or of not looking ahead while driving could be more effective in teaching defensive driving. History could be given a new context if students could “see” what explorers saw when they first arrived in the New World. Airlines and the military have long recognized the value of flight simulators, and recent advances in online gaming have enhanced squad-level combat training.

In the coming years, gender differences in VE navigation may disappear as VEs become more prevalent and both males and females become more familiar with their demands. As yet, there has been little research on this question (Terlecki and Newcombe 2005); most research on gender differences in VEs involves a single exposure to the VE, or a short training or familiarization period followed by an evaluation. Gender differences may diminish given adequate experience with the VE before or during learning, or if repeated practice is integrated into instruction prior to assessment.

For now, the main recommendation is that VE programmers and instructors should create VEs that incorporate both positional and directional landmarks, to support both male and female styles of navigation within the VEs in which assessment or instruction is conducted.