1 Introduction

Precise 3D input devices have become mainstream thanks to low-cost game hardware like Microsoft Kinect and PlayStation Move. However, the dream of immersive first-person virtual reality (VR) has not come true in the household, because display technology is lagging behind. Meanwhile, commercial motion games use a single television with the game world typically portrayed in a first-person view (Rise of Nightmares), a third-person over the shoulder view (Kinect Star Wars, Fighters Uncaged, The Fight: Lights Out) or a third-person mirror-like view (Your Shape Fitness Evolved, Dance Central).

Our work is motivated by our experiences from prototyping a third-person Kinect action game where the object was to reach static and moving targets. In early tests using a television as the display, estimating distances and hitting targets in 3D seemed very difficult. It was clear that some form of additional cues or guidance was needed, preferably without the need for a stereo 3D display to reach a wider audience.

This paper presents a study that compares both proven and novel cues for enhancing target reaching and intercepting in a third-person perspective. Reaching and intercepting are tasks integral to most fighting, sports, and action games. Our main contribution is the study of volumetric shadows. We wanted to avoid possibly alienating and “brute force” visual guides such as rendering the predicted path of an object in the game world. We were particularly intrigued by the possibility of designing lighting so that it enhances gameplay and creates positive user experience.

According to our review of the literature, the use of volumetric shadows has not been studied in positioning or interception tasks.

2 Background

The human visual system combines a number of cues with information about the posture of the eyes to compute the distance of objects [13]. Cutting and Vishton [1] list a total of 15 sources of information about three-dimensional layout and distance: accommodation, aerial perspective, binocular disparity, convergence, height in visual field, motion perspective, occlusion, relative size, relative density, linear perspective, light and shading, texture gradients, kinetic depth, kinetic occlusion and disocclusion, and gravity. Increasing the amount of information generally increases the accuracy of depth judgments [1].

The effect of visual cues such as binocular disparity on motor behavior has been researched extensively. The importance of cues varies depending on the activity or sport. For example, Heinen and Vinken [4] report that binocular vision is not necessary for experienced gymnasts. On the other hand, Laby et al. [5] report better stereoacuity in Olympic-level soccer and softball athletes compared to archers, and various ball catching and hitting studies have found binocular vision superior to monocular [6–9]. However, monocular catching and hitting is still possible, and the importance of binocular vision varies as a function of, e.g., target size and speed [6, 8].

In the realm of computer graphics, Wanger et al. [3] found shadows and perspective to help in a positioning task conducted using a single 2D display. Their study required the participants to match the position, rotation and size of an object with a reference object. In the context of this paper, moving the avatar’s hand to touch a target can also be regarded as a position matching task. Wanger’s follow-up study [10] found shadow sharpness and size to have no significant effects on estimating object size and position. Hubona et al. [11] reported that in monocular viewing, shadows enhance positioning accuracy to a level equivalent to stereo 3D (S3D) viewing without shadows. However, they also found that in S3D shadows increased both accuracy and response times. Non-realistic depth cues have been explored by Glueck et al. [12], who introduced virtual position pegs standing on a multiscale reference grid.

There are many studies of using head-tracking-induced motion parallax to aid depth perception: Teather and Stuerzlinger [13], and Boritz and Booth [14] have investigated both S3D and head tracking in positioning tasks, and while their results support the importance of S3D, no significant effects were found for head tracking. However, it has also been shown that head tracking can help more than S3D in other tasks [15].

It should be noted that humans use focal and ambient vision differently for movement control. Focal vision is specialized for object identification and conscious movement control, and ambient vision is specialized for unconscious control based on, e.g., optical flow [16]. Conscious processing of focal vision is slow compared to unconscious movement control based on ambient vision [17]. In commercial motion games and our tests using a single television about 2–3 m away from the user, the display is probably too small for ambient vision to be of much use.

The unnatural third-person perspective also changes the optical flow patterns. Although a first-person view could have fewer problems, being closer to our natural way of seeing the world, we agree with Oshita [18] that a third-person view is optimal for many full body action games, because it allows the players to fully see how their movements are mapped to the avatar’s movements and how the avatar reacts to the environment.

We went through a list of 95 games that required Kinect [19] and viewed their gameplay videos online: 35 of the games relied primarily on a third-person view, 9 relied primarily on a first-person view, 41 offered a mirror-like view (usually the mirror image was very small), and the other 10 games involved primarily 2D pointer interaction or did not fit into our categorization. Existing games often make it easier to hit or intercept objects using graphical guides. Sport Champions: Table Tennis draws the predicted path of the ball as a 3D curve in the game world. The Kinect Sports goalkeeping game transforms an interception task into a reaching task by displaying a static marker that shows where the ball will be a moment later.

At the onset of our study we conducted preliminary tests with a third-person Kinect action game. Using a 2D television as the display, estimating distances and hitting targets in 3D seemed much more difficult than in real life or in an immersive VR system like the CAVE [20]. We found it possible to learn to hit static objects through trial and error, but moving objects were more often missed than hit. There was an apparent need for some form of additional cues or guidance, preferably without resorting to S3D displays so that wider audiences could be reached. This led us to experiment with volumetric shadows as depth cues. Our hypothesis is that volumetric lighting can be used to enhance spatial perception in real-time 3D applications. The purpose of our study was to test that hypothesis and to evaluate how different depth cues affect user experience.

2.1 Volumetric Shadows

In computer graphics, the realism of rendered images can be improved by taking into account the light scattering effects of air and other participating media [21]. This results in images where visible beams of light emanate from the light source. If there are objects blocking the light beams, then no light scattering will occur in the umbral regions behind the objects, and so-called volumetric shadows (Fig. 1) will appear [22].

Fig. 1. (a) Volumetric shadow, (b) surface shadow, (c) volumetric light shaft.

For the sake of clarity, we use the term surface shadow to describe ordinary shadows that occur on illuminated surfaces when direct illumination from a light source is blocked by an object. It is worth noting that a light source can cast both surface shadows and volumetric shadows at the same time, as seen in Fig. 1. To distinguish from standard light sources that emit only direct light, we use the term volumetric light source for light sources that feature volumetric light shafts and shadows.

Surface shadows are prevalent in 3D video games, whereas few games use volumetric shadows. Those games that have volumetric shadows use them as mere eye-candy or for increased graphical immersion (e.g. Uncharted 3, F.E.A.R. 2). Alan Wake is a rare exception, as volumetric light sources play an important role in its gameplay, where it is necessary to illuminate the player character’s adversaries in order to defeat them.

Previous lighting and shadow studies about depth perception have been mostly limited to surface shadows. A surface shadow can resolve the depth/size ambiguity, but it is not ideal if the object of interest is far from the surface, because (1) the shadow cannot be seen accurately while keeping one’s gaze fixed on the target and (2) the shadow may end up occluded or outside the camera view. In contrast, a volumetric shadow traces the linear path between the object and the surface shadow that it casts. By doing so, the volumetric shadow also points in the direction of the light source.
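The geometry described above can be illustrated with a minimal sketch. It assumes an idealized point light above a flat ground plane at height zero and treats the object as a single point; the function name and coordinate convention are hypothetical and are not taken from the game's rendering code. The surface shadow lies where the ray from the light through the object meets the ground, and the volumetric shadow spans the segment between the object and that point.

```python
def surface_shadow_point(light, obj):
    """Return where the ray from `light` through `obj` meets the ground plane y = 0.

    Both arguments are (x, y, z) tuples with y as the height above the ground.
    Returns None if the object is not below the light, because the ray then
    never descends to the ground. Assumes an idealized point light.
    """
    lx, ly, lz = light
    ox, oy, oz = obj
    dy = oy - ly
    if dy >= 0:
        return None
    # Solve ly + t * dy = 0 for the ray parameter t at ground level.
    t = -ly / dy
    return (lx + t * (ox - lx), 0.0, lz + t * (oz - lz))


# A light 10 m up and an object 5 m up, offset 1 m sideways:
# the surface shadow falls 2 m to the side, and the volumetric shadow
# is the segment from the object to that ground point.
shadow = surface_shadow_point((0.0, 10.0, 0.0), (1.0, 5.0, 0.0))
```

Extending the segment from the shadow point back through the object leads to the light, which is why the volumetric shadow also indicates the light's direction.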

As we stated, volumetric lighting and volumetric shadows are a result of light’s interaction with participating media. There exists a multitude of research on participating media rendering techniques [23]. This research focuses almost exclusively on rendering quality and performance, and user studies about the effect of such advanced rendering techniques on depth perception are rare. However, a few such studies can be found in the field of medical imaging, where it can be a matter of life and death to understand 2D renditions of complicated 3D scans. Many rendering techniques for medical scans provide a translucent appearance that conveys depth by simultaneously revealing several tissue layers that otherwise would be occluded [24–26]. A related, albeit non-photorealistic technique for accentuating depth information is the use of volumetric halos [27, 28].

In our literature review of volumetric shadows, we came to the same conclusion as Lindemann and Ropinski, who stated that “with respect to volume rendering, perceptual studies are rather scarce” [29]. The existing few user studies that involve volumetric shadows always employ them as part of a more complex lighting scheme, and even then their purpose is limited to conveying depth information about translucent materials by increasing color dynamics of the rendered images [26, 29–32]. We found no studies where volumetric shadows would have been used for positioning or interception tasks. As far as we can tell, our study is the first one to examine volumetric light shafts as spatial cues for aiming and understanding relative positions between objects.

2.2 User Experience and Gameplay

Earlier video game lighting research has explored how lighting color affects gameplay performance and emotion [33], and how lighting can be automatically adjusted to accommodate dramatic, aesthetic, and communicative goals of a game [34]. Little is known about how different depth cues affect user experience (UX) in games. Here, we use the term UX to refer to a subjective experience that stems from the use of technology, which in our case is gameplay.

The concept of presence, namely the sense of being in a mediated environment, is broadly studied in different media [35]. Both S3D displays [36, 37] and head tracking [38] have been found to enhance presence. Spatial awareness, attention, and realness/naturalness form the perceptual “Big three” sub-components of physical presence [39]. In addition to users’ perceptions, their evaluations of the provided action are needed to study UX [37, 40].

The concept of flow provides a good framework to analyze human activity in many contexts. Csikszentmihalyi [41] defines flow as a positive and enjoyable experience stemming from an interesting activity that is considered worth doing for its own sake. At the core of the theory are the cognitively evaluated challenges provided by the activity and the skills possessed by the respondents. Flow evolves when a person evaluates both the challenges and the skills as being high and in balance. More specifically, the antecedents of flow – i.e. evaluated skills and challenges in the situation – indicate the quality of user experience [42]. For example, they indicate whether participants are experiencing mastery (skills above the challenges) or coping (skills below the challenges). In digital games, the mastery situation is found to have a positive effect on the motivation to continue playing and to play again [43]. Thus, the flow framework integrates emotional and motivational layers on top of the perceptual-cognitive evaluation process.

3 User Study

We organized a user study in which participants played a 3D video game in six different experimental conditions. We advertised the study via mailing lists and social media, and recruited a total of 35 participants from our university: students, researchers, and staff. Each participant received a movie ticket voucher after taking part in the study.

3.1 Game

The game used in the study was a simplified, video game version of wall tennis implemented with the Unity game engine. The player’s in-game avatar held a paddle racket in each hand, standing in front of an archery target with a diameter of 3 m. The rackets were represented and tracked with PlayStation Move controllers. The avatar was controlled via OpenNI library using a Kinect sensor.

The objective of the game was to strike virtual balls and hit as close to the archery target’s center (bull’s-eye) as possible. Hitting directly at the bull’s-eye gave 100 points. The number of awarded points decreased linearly to zero on a disk with a radius of 4 m, so that points were given even when hitting outside the archery target. Hit score was displayed on screen for two seconds after a hit was registered, so that players would strive to get better scores.
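The scoring rule above can be written as a one-line function. This is a minimal sketch based only on the description in the text: 100 points at the bull's-eye, decreasing linearly to zero at a radius of 4 m. The function and parameter names are hypothetical, and whether the game rounded the displayed score is not stated, so no rounding is applied here.

```python
def hit_score(distance_m, max_points=100.0, zero_radius_m=4.0):
    """Score for a hit at `distance_m` metres from the bull's-eye.

    Points fall linearly from `max_points` at the centre to zero at
    `zero_radius_m`, and stay at zero beyond that radius.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return max(0.0, max_points * (1.0 - distance_m / zero_radius_m))


# The archery target has a diameter of 3 m, so its rim is 1.5 m from the centre.
rim_score = hit_score(1.5)
```

Under this rule a hit on the rim of the 3 m target still earns points, which matches the statement that points were given even outside the target.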

Our game had six different graphical modes that acted as test conditions in our study. Each condition had a unique set of depth cues, combining traditional cues and our volumetric shadows (Table 1). Screenshots of different conditions are presented in Fig. 2. Condition 0 acted as a baseline, as it had the least amount of depth cues, providing only perspective cues. Condition 1 represented the “industry standard” of VR graphics with surface shadows and S3D. In Condition 2 both rackets had an omnidirectional volumetric light source with a unique color. In Condition 3 each ball acted as an omnidirectional volumetric light source. Condition 4 presented a volumetric light source placed high above the playing area. Condition 5 had S3D; otherwise it was identical to Condition 3. Surface shadows in conditions 2, 3, and 5 were fainter than in conditions 1 and 4, due to the additional lights.

Table 1. Depth cues among the study conditions.

Fig. 2. Six graphical modes that were the test conditions of our user study.

We made two design decisions for easing the learning curve of aiming and striking: (1) There was no gravity in the physics simulation and striking a ball sent it on a linear path with constant velocity; (2) Balls interacted only with the rackets and the in-game avatar passed through them without effect. Bouncing from ground and walls was left enabled for facilitating a basic sense of physical immersion.

The game presented only one ball at any given moment. After striking a ball and registering a hit or miss, the ball would disappear. Before a new ball would appear, the player had to return to the center of the playing area, marked by a circular object. For our study this meant that the event of striking a ball became a repeatable trial that always had the same starting point, regardless of the participant, previously hit balls, or other conditions. The game had both static balls hovering in the air and balls that appeared from either side of the archery target, moving linearly through the playing area. In the latter case, the balls moved at a constant speed of 2 m/s, which was determined to be challenging enough in our initial testing phase.

Natural Aiming and Depth Cues. As seen from Fig. 2, volumetric lighting adds a sense of depth in conditions 2, 3, 4, and 5 of the game. Volumetric shadows in conditions 2, 3, and 5 form a cone frustum that is aligned along the line between the ball and the racket. In Condition 2, the light source in the racket casts a surface shadow from the ball that can be used as a laser sight to aim the ball. A high-above volumetric light source in Condition 4 creates volumetric shadows that bind objects to their surface shadows on the ground, providing a natural depth cue alternative to the non-realistic position pegs of Glueck et al. [12].

3.2 Equipment

The game ran at 50 frames per second on a computer with Windows 7, Intel Core2 6600, 3 GB of RAM, and Nvidia GeForce 8800GTX. The game was displayed on a 55” Panasonic TX-L55ET5Y television that could output S3D through passive, circularly polarized stereo-glasses. PlayStation Move controllers were connected to our computer via Move.me software running on a PlayStation 3 that was equipped with a PlayStation Eye camera. The coordinate system between Move controllers and Kinect was calibrated with the RUIS library [44].

3.3 Environment

The playing area in front of the TV was marked with a 2 m by 2 m floor mat, from which the game could be played without stepping outside. Illuminance within that area varied between 235 and 320 lux, as measured with a Konica Minolta Chroma Meter CL-200. An illuminance between 95 and 135 lux was measured by the back wall that was seen as background by the Kinect and the PlayStation Eye camera. The playing area center was at a distance of 3.2 m from the TV. From there, the luminance of the screen was measured to be 157 cd/m², by pointing a Konica Minolta Luminance Meter LS-110 at a white TV screen. The luminance through the passive stereo glasses was 65 cd/m².

3.4 Design and Procedure

We used a randomized within-subject design with incomplete counterbalancing, where the participants were exposed to each of the six conditions in a random order. Every other participant was exposed to the conditions in a reverse order compared to the previous participant, to counterbalance learning effects over the conditions [45]. We took special care that the participants would not discover our hypothesis or other study details before or during the study; we did not mention lights, shadows, depth, or aiming cues in our interview questions. Instead we let the participants report their findings in their own words, discussing only topics that they had brought up themselves.
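The ordering scheme can be sketched as follows. This is an illustrative reading of the description above, not the script used in the study: participants at even positions get a fresh random permutation of the six conditions, and each following participant gets that permutation reversed. The function name and the `seed` parameter are hypothetical.

```python
import random


def condition_orders(n_participants, n_conditions=6, seed=None):
    """Return one condition order (a list of condition indices) per participant.

    Participants 0, 2, 4, ... receive a random permutation; participants
    1, 3, 5, ... receive the previous participant's order reversed.
    """
    rng = random.Random(seed)
    orders = []
    for i in range(n_participants):
        if i % 2 == 0:
            order = list(range(n_conditions))
            rng.shuffle(order)
        else:
            order = orders[-1][::-1]
        orders.append(order)
    return orders


orders = condition_orders(4, seed=1)
```

Reversing pairs in this way means that a condition seen early by one participant is seen late by the next, which spreads practice effects across conditions without enumerating all 720 possible orders.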

Metrics. The following information was collected from each of the participants: vision, background, and subjective UX evaluations after each test condition. Because of the large number of test conditions, we attempted to keep the test procedure simple, but to still get as rich UX descriptions from the participants as possible. Thus, we used both qualitative interviews and quantitative scales to assess perceptions and actions related to UX.

After each condition, participants evaluated their “overall feeling” (0–10 Likert), “challenges of the task” (1–7 Likert), and their own levels of skills (1–7 Likert). Our idea with overall feeling was to give the participants a chance to freely rate each condition. It provided a good one-dimensional comparison for the multidimensional interview data. In the past, we have used challenges and skills measures to empirically present Csikszentmihalyi’s [41] four-channel flow model [46]. We divided 2,182 participants based on their challenge and skill/competence evaluations and showed how participants’ cognitive evaluations affected their emotional outcomes in digital games. The flow-space thus formed enables us to evaluate whether participants are experiencing mastery or coping while conducting the experiment.
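As a rough illustration of how a pair of ratings maps onto the flow-space, the sketch below labels a single (challenge, skill) pair using only the definitions given earlier: mastery when skills are above the challenges and coping when they are below. The function name and the "balanced" label for equal ratings are assumptions made for this example; the actual four-channel model in [46] may draw its boundaries differently.

```python
def flow_state(challenge, skill):
    """Label one pair of 1-7 ratings as 'mastery', 'coping', or 'balanced'.

    'mastery' means skill is rated above challenge, 'coping' means below.
    'balanced' is a placeholder label for equal ratings (an assumption).
    """
    for name, value in (("challenge", challenge), ("skill", skill)):
        if not 1 <= value <= 7:
            raise ValueError(f"{name} rating must be between 1 and 7")
    if skill > challenge:
        return "mastery"
    if skill < challenge:
        return "coping"
    return "balanced"


# A participant who rates the task as hard (6) and their own skill as low (3).
state = flow_state(challenge=6, skill=3)
```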

We have tested participants’ challenge and competence evaluations in experiments conducted with a between-subjects design. Our results indicate that if different groups of participants are engaged with the same digital game [47] or a task in a virtual environment (VE) [48], then they tend to evaluate the challenges of the situation similarly. However, participants playing the digital game in the laboratory evaluated their level of competence lower compared to those playing the same game at home. Gamers’ background data revealed that at home the gamers were more experienced with the game. This obvious finding validates the used flow-space measures.

In a VE, participants moving fluently evaluate their level of competence higher compared to those being stationary (that is, moving less fluently). In this case, the background data did not provide any explanation for the finding. It seems that some participants learned to utilize the novel interaction device in the VE faster than the others. Both the subjective competence evaluation and objective movement data revealed the difference among the participants and again validated the flow-space measures.

In addition, flow-space provides valuable information about the experiential process. We have demonstrated how the UX in the same digital game evolves during the first hour of play [49]. Supplementing competence and challenge measures with the gamer background, game performance and subjective interviews provided us with a detailed description of how one gamer was ready to quit while another was just warming up. Although an individual level inspection often turns into a qualitative analysis, this example further substantiates the added value of the flow-space in evaluating human-technology interaction and games.

The participants were interviewed and asked to describe the environment and the task. Different descriptions were compressed into 17 dichotomous variables. Each of the 17 variables had at least ten mentions among the participants. The 17 variables thus formed were analyzed in a correspondence analysis (CA), which is a multivariate descriptive data analytical technique for categorical data. CA shares similarities with principal component analysis, which applies to continuous data. CA is helpful in depicting the relationship between two or more categorical variables in a 2-dimensional chart, summarizing and illustrating similarities and differences between categories and the associations between them [50].
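For readers unfamiliar with CA, the following is a minimal sketch of the standard computation on a contingency table of counts, using a singular value decomposition of the standardized residuals. It is not the SPSS procedure used in the study; the function name is hypothetical, NumPy is assumed to be available, and the small table in the usage line is made-up data for illustration only. The sketch also assumes that no row or column of the table sums to zero and that the table is not perfectly independent.

```python
import numpy as np


def correspondence_analysis(table, n_dims=2):
    """Simple CA of a contingency table via SVD of standardized residuals.

    Returns (row_coords, col_coords, inertia_share), where the coordinates
    are principal coordinates on the first `n_dims` dimensions and
    inertia_share is the proportion of total inertia on each of them.
    """
    counts = np.asarray(table, dtype=float)
    P = counts / counts.sum()          # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    expected = np.outer(r, c)          # independence model
    S = (P - expected) / np.sqrt(expected)
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sing ** 2                # principal inertias
    row_coords = (U * sing) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sing) / np.sqrt(c)[:, None]
    share = inertia / inertia.sum()
    return row_coords[:, :n_dims], col_coords[:, :n_dims], share[:n_dims]


# Made-up 3 x 3 table of mention counts, for illustration only.
rows, cols, share = correspondence_analysis([[20, 5, 5], [5, 20, 5], [5, 5, 20]])
```

Plotting the row and column coordinates on the same two axes gives the kind of 2-dimensional chart described above, and the inertia shares correspond to the proportion of inertia reported per dimension.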

Finally, the participants rated their feelings of Pleasure/Valence, Arousal, and Dominance/Control (PAD) using the Self-Assessment Manikin (SAM) [51]. The PAD profile describes participants’ degree of valence, level of activeness of an emotion, and experienced sense of control. Flow theory [41] acknowledges PAD emotions as important flow factors.

Procedure. First the participants with corrected vision were asked to wear their eyewear if that helped them to see the TV better. Next we administered the standard TNO stereoscopic vision test [52], in order to determine how sensitive the participants’ stereoscopic vision was on a scale of 15–480 arc seconds.

Before starting, participants practiced the game for 5 min in a graphical setting that had only surface shadows and monocular rendering. They were instructed that the goal of the game was to strike the ball so that it would hit as close to the bull’s-eye as possible without bouncing it off from the ground or walls. The participants were also informed about tracking problems that would occur if the PlayStation Move controllers were turned away from PlayStation Eye camera’s view. This was a problem for those participants who were using the controllers like a tennis racket, swinging their hand backwards and swiveling the controller away from the PlayStation Eye camera before striking. The problem was pronounced when striking balls in the farther edges of the playing area.

After practice, the participants were exposed to six different conditions, each consisting of 30 trials of striking the game’s virtual ball. Half of the balls were static and half were moving linearly, and these two sets were constant for each participant and each condition, but the order of balls was randomized. Figure 3 illustrates the position of the static balls and trajectories of the moving balls. The static and moving balls had a varied distance from the ground and the moving balls’ trajectories were parallel to ground plane.

Fig. 3. Static ball positions are displayed with white circles, and moving ball trajectories with blue lines. The red square represents the playing area (Colour figure online).

Striking through 30 balls in each condition took on average 4 min. Participants wore the stereo-glasses even during the conditions that did not have S3D. After each condition, we interviewed the participant about the experience. These interviews lasted on average 8 min each. The mean duration for the whole experiment was 90 min.

4 Results and Analysis

When calculating study results, we included the 30 people with stereoscopic acuity better than or equal to 120 arc seconds from the total of 35 participants. This way the possible differences between S3D and monocular rendering would not be obscured by participants with poor stereoscopic vision.

4.1 Participant Demographics

Most of the 30 participants (90 %) were Finnish-speaking students or research staff of Aalto University. The mean age of the participants was 27.7 years (SD = 4.04 years). Participants were rather familiar with the commercially available 3D user interfaces (3DUI) for digital games: 27 % had used Sony Move, 43 % Microsoft Kinect, and 87 % Nintendo Wii. However, 23 % reported having no experience with 3DUIs, and the rest evaluated their level of experience as either basic (63 %) or intermediate (14 %). No one reported being an expert in 3DUIs.

Participants reported playing digital (i.e., computer or video) games with varying intensity: 10 % played at least once every two days, 27 % less than that but at least once a month, and 47 % infrequently. Only one participant reported playing daily and four reported that they never play digital games.

Regarding actual physical sports similar to our test set-up, the majority of the participants had at least tried badminton (94 %), tennis (83 %), table tennis (91 %), and volleyball (74 %). Approximately 89 % of the participants were right-handed. Although 69 % of participants reported doing some sport often (but on less than 50 % of days), only one of them was a frequent badminton player. All in all, seven participants reported playing badminton at some frequency, which was the most popular sport among the participants with similarities to our game. Thus, our sample represented well the casual users of 3DUIs and did not include an over-representation of either hard-core gamers or hard-core racket/ball game players.

Participants included in the study consisted of 24 males and 6 females, and hence our results are more representative of the male gender. However, there were no significant differences between genders with regard to our UX metrics (Likert ratings and interviews).

4.2 Gameplay Performance Results

We explored statistically significant differences between conditions with the Kruskal-Wallis test, the Friedman test, and a 3-way ANOVA whose factors were condition index, ball index, and subject index. Post hoc tests were applied at a significance level of 0.05 using the Tukey-Kramer correction. Results were obtained with MATLAB’s Statistics Toolbox.

Aiming Accuracy. We measured balls’ hit accuracy with two variables: (1) Distance of the hit location on the archery target plane from the bull’s-eye. If a ball had missed the target plane or hit the ground or side walls, its distance was set as the maximum distance that was recorded. By doing this, even the balls that missed the target plane were taken into account when examining hit accuracy between conditions with a Kruskal-Wallis test. (2) The binary outcome of the hit event, which we used as the dependent variable in a 3-way ANOVA over all the balls. Static and moving balls were examined separately.
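The handling of missed balls in variable (1) can be sketched as below. This is an illustrative reading of the rule, assuming that a miss is stored as `None` and that the maximum is taken over the list passed in; the text does not state whether the maximum was taken per condition or over all recorded data, so that choice is left to the caller. The function name is hypothetical.

```python
def impute_missed_distances(distances):
    """Replace missed trials (None) with the largest recorded hit distance.

    `distances` holds one entry per ball: the distance in metres from the
    bull's-eye, or None if the ball missed the target plane or bounced.
    """
    recorded = [d for d in distances if d is not None]
    if not recorded:
        raise ValueError("no recorded hits to take a maximum from")
    worst = max(recorded)
    return [worst if d is None else d for d in distances]


# Two hits and two misses: both misses are assigned the worst recorded distance.
filled = impute_missed_distances([0.5, None, 2.0, None])
```

Because the Kruskal-Wallis test works on ranks, assigning every miss the same worst value places all misses in a tie at the bottom of the ranking rather than dropping them from the comparison.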

Figure 4a is a boxplot presenting the number of static balls that hit the archery target without bouncing from ground or walls. No significant differences were found with the 3-way ANOVA, Kruskal-Wallis, or Friedman tests.

Fig. 4. Number of (a) static and (b) moving balls that hit the archery target, struck by 30 participants. The conditions are sorted by medians.

The number of moving balls that hit the archery target can be seen in the boxplot chart of Fig. 4b. By comparing Figs. 4a and b it is clear that the participants were able to hit the target much more often when striking static balls. In fact, the participants missed many of the moving balls completely when trying to strike them (failed interception).

A significant main effect of conditions was found with the Kruskal-Wallis test in moving balls’ hit distance from the bull’s-eye (χ²(5) = 17.7, p = 0.003). The post hoc test showed that accuracy was worse in Condition 0 when compared both to conditions 1 and 4. These results were repeated by the 3-way ANOVA followed by a post hoc test.

Interception Rate of Moving Balls. We measured interception rate with the number of moving balls intercepted by each participant under different conditions (Fig. 5). Interception rate was worst in Condition 0 as expected, followed by conditions 3, 5, and 2. Interestingly, interception rate was the highest in Condition 4, despite its monocular rendering. Condition 1 came second with its S3D.

Fig. 5. Number of moving balls intercepted by each of the 30 participants. The conditions are sorted by medians.

A significant main effect of conditions was found with the Friedman test (χ²(5) = 18.9, p = 0.002). The post hoc test revealed that interception rate was higher in Condition 4 when compared both to conditions 0 and 3. No other statistically significant differences were found. These results were repeated when treating the binary outcome of a moving ball’s interception event as a dependent variable and applying the 3-way ANOVA over all the moving balls, followed by a post hoc test.
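To make the structure of the Friedman test concrete, the sketch below computes its test statistic from a participants-by-conditions table of scores. It is a textbook version written for illustration, not the MATLAB routine used in the study: tied scores within a participant receive average ranks, but no tie correction is applied to the statistic, and the p-value lookup is omitted. The function name and the toy data are hypothetical.

```python
def friedman_statistic(data):
    """Return (Q, df) for a list of rows, one row of k scores per participant.

    Each row is ranked separately (average ranks for ties), ranks are summed
    per condition, and Q = 12 / (n k (k + 1)) * sum(R_j^2) - 3 n (k + 1).
    Q is compared against a chi-square distribution with k - 1 degrees of freedom.
    """
    n = len(data)
    k = len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        order = sorted(range(k), key=lambda idx: row[idx])
        start = 0
        while start < k:
            end = start
            # Extend the group while the next value ties with the current one.
            while end + 1 < k and row[order[end + 1]] == row[order[start]]:
                end += 1
            mid_rank = (start + end) / 2.0 + 1.0
            for pos in range(start, end + 1):
                rank_sums[order[pos]] += mid_rank
            start = end + 1
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, k - 1


# Toy data: three participants who all rank three conditions identically.
q, df = friedman_statistic([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
```

With six conditions the same function returns five degrees of freedom, which is consistent with the χ²(5) reported above.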

Acquisition Time for Static Balls. We also examined static balls’ acquisition times, i.e. the elapsed time from each ball’s appearance until it was struck by the participant. We found a significant main effect of conditions (F(5, 2651) = 19.9, p < 0.001) using the 3-way ANOVA. According to a post hoc test, the acquisition time of Condition 2 was the longest (median of 3.1 s) of all the conditions (the other medians were between 2.5 and 2.6 s). No other significant differences were found.

The Kruskal-Wallis test revealed that those who reportedly used lighting cues for aiming (4 participants) had significantly better aiming accuracy (χ²(1) = 4.9, p = 0.027) in Condition 2 than those who did not use lighting cues (26 participants). The mean test order number of Condition 2 was 3.3 for the former group and 4.0 for the latter, so the two groups did not encounter Condition 2 after the same amount of practice with the task. This fact and the small group size mean that we cannot say conclusively if using shadows for aiming improved accuracy in Condition 2.

4.3 UX Results

Based on our interviews, four participants out of 30 discovered and used the racket light sources’ shadows for aiming in Condition 2. Two additional participants made the discovery, but they did not continue to aim with the shadows as they did not find them beneficial. Four other participants mentioned that the rackets’ volumetric light sources improved spatial perception. In Condition 3 the balls acted as volumetric light sources, and 13 participants reported that this enhanced spatial perception. Condition 4’s high-above volumetric light source was mentioned by 12 participants to improve spatial perception.

UX Interviews. We calculated all UX results with SPSS, a statistical analysis software package. The 17 dichotomous variables representing participants’ perceptions about the environment were analyzed in a correspondence analysis (CA). CA provides orthogonal dimensions that are extracted in order to maximize the distance between row and column points [50]. We used participants’ perceptions as column variables, and the six lighting conditions as row variables.

Figure 6 presents the correspondence between the 17 perceptions and the six lighting conditions. We extracted two dimensions, with dimension one accounting for 40 % of the inertia and dimension two for 28 %. A χ² test (χ²(96) = 286.4, p < .001) supported a meaningful relationship between the row and column variables. Dimension one was named Diminished Sense of Depth - Exciting but Unnatural, and dimension two Interactively Fluent - Visually Distracting, according to the corresponding variables. As the descriptions show, participants integrate their perceptions and actions when they are asked to describe their interactive environment.

Fig. 6. Correspondence analysis showing the relationships between the 17 perception variables and the six conditions.

Inspection of Fig. 6 shows that Condition 0 was perceived as spatially poor and difficult to play, mostly because of the lack of shadows. Condition 1 provided a positive 3D environment that was simple and sometimes even dull, in which playing was fluent and easy. Condition 2 was somewhat interesting and exciting, but for most of the participants the rackets’ volumetric light sources and their colors were distracting and unnatural. This and the somewhat confusing shadows in Condition 2 made playing less comfortable and more difficult.

The rest of the conditions were closer to each other experientially. Although some of the participants in Condition 3 perceived the ball’s volumetric light source and its colors positively, in most cases it diminished the spatial perception of the ball. This made hitting more difficult, especially with moving balls. In contrast, the volumetric light source in Condition 4 made hitting somewhat more fluent and easier, though still difficult. Some participants complained that Condition 4 was too simple and even dull. Because of the S3D, high spatial perception was mentioned more often in Condition 5 than in Condition 3. However, the volumetric light in the ball divided participants’ opinions: some of them liked it, but some thought that it distracted them. Problems with racket tracking were reported fairly evenly across all the conditions, although Condition 1 received the largest number of mentions. Tracking problems may have been ignored more in other conditions because of the other, possibly stronger perceptions that caught the participants’ attention.

UX Scales. First, we studied UX in the playing order of the conditions with a General Linear Model repeated-measures ANOVA (PASW Statistics 18; pairwise comparisons between conditions used a significance level of 0.05 with Bonferroni correction). Overall feeling increased significantly as the experiment proceeded (Wilks’ Lambda = .33, F(5,25) = 10.10, p < .001, η² = .67). Similarly, the evaluated skills increased (Wilks’ Lambda = .49, F(5,25) = 5.17, p < .01, η² = .51). Since the participants evaluated challenges the same way throughout the test, they experienced a clear learning curve from coping towards mastery (Wilks’ Lambda = .45, F(5,25) = 6.01, p < .01, η² = .55). The sense of control was the only PAD scale that increased significantly as the experiment proceeded (Wilks’ Lambda = .59, F(5,25) = 3.52, p < .05, η² = .41). Notably, good overall feeling, skills, and sense of control were easier to obtain in some conditions than in others, regardless of the order of the conditions.
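The reported effect sizes can be cross-checked against the Wilks’ Lambda values. The following sketch assumes a single within-subject factor, so that the multivariate test has one hypothesis dimension; in that case partial η² = 1 − Λ and the conversion to F is exact: F = ((1 − Λ)/Λ) · (df_error/df_hypothesis). The function name and the tuple layout are ours for illustration; the numbers are those reported above.

```python
def wilks_to_effect(wilks_lambda, df_hyp=5, df_err=25):
    """Return (partial eta squared, F) implied by a Wilks' Lambda value.

    Assumes one hypothesis dimension (s = 1), where the F conversion is exact.
    """
    eta_sq = 1.0 - wilks_lambda
    f_value = (eta_sq / wilks_lambda) * (df_err / df_hyp)
    return eta_sq, f_value


# (scale, Wilks' Lambda, reported F, reported eta squared) for the order effects
reported = [
    ("overall feeling", 0.33, 10.10, 0.67),
    ("skills", 0.49, 5.17, 0.51),
    ("coping-to-mastery trend", 0.45, 6.01, 0.55),
    ("sense of control", 0.59, 3.52, 0.41),
]

for scale, lam, f_rep, eta_rep in reported:
    eta_sq, f_value = wilks_to_effect(lam)
    print(f"{scale}: eta^2 = {eta_sq:.2f} (reported {eta_rep}), "
          f"F = {f_value:.2f} (reported {f_rep})")
```

Small gaps between the computed and reported F values are expected, because Λ is rounded to two decimals in the text.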

The six conditions differed significantly in overall feeling (Wilks’ Lambda = .31, F(5,25) = 11.10, p < .001, η² = .69). Pairwise comparisons show that overall feeling was significantly lowest in Condition 0. Overall feeling in Condition 1 was higher than in conditions 0, 2, and 4. Conditions 2, 3, 4, and 5 scored equally.
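To make the Bonferroni correction concrete: if all pairs among the six conditions are compared (an assumption of this sketch; the number of comparisons is not stated), there are 15 pairwise tests. The correction then either divides the 0.05 threshold by 15 or, equivalently, multiplies each raw p-value by 15. The helper name below is hypothetical.

```python
from itertools import combinations

conditions = range(6)  # Conditions 0..5
pairs = list(combinations(conditions, 2))

alpha_family = 0.05
alpha_per_test = alpha_family / len(pairs)


def bonferroni_adjust(p_raw: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_raw * n_tests)


print(f"{len(pairs)} pairwise comparisons")
print(f"per-comparison threshold: {alpha_per_test:.4f}")
# A raw p-value of 0.01 is no longer significant after adjustment
print(f"adjusted p for raw 0.01: {bonferroni_adjust(0.01, len(pairs)):.2f}")
```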

The flow-space (Fig. 7) shows that the participants were coping in Condition 0, which was rated as significantly more challenging than all the other conditions except Condition 2. Conditions 1, 2, 3, 4, and 5 did not differ from each other in this regard (Wilks’ Lambda = .54, F(5,25) = 4.30, p < .01, η² = .46). Moreover, in Condition 1 the participants experienced mastery. The evaluated skills in Condition 1 were significantly higher than in conditions 0 or 2 (Wilks’ Lambda = .61, F(5,25) = 3.19, p < .05, η² = .39).

Fig. 7. The means of the skills and challenges of each condition plotted in the flow-space.
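As a reading aid for the flow-space, the sketch below labels a condition by the sign of the difference between its mean skills and mean challenges, treating mastery as skills above challenges and coping as the reverse. The function name, the tolerance parameter, and the example values are hypothetical and are not data from the study.

```python
def flow_region(skills: float, challenges: float, tolerance: float = 0.0) -> str:
    """Label a point in the flow-space by comparing skills with challenges."""
    diff = skills - challenges
    if diff > tolerance:
        return "mastery"   # skills exceed challenges
    if diff < -tolerance:
        return "coping"    # challenges exceed skills
    return "balanced"


# Hypothetical mean ratings (skills, challenges) on an arbitrary scale
examples = {"A": (4.2, 3.1), "B": (2.8, 4.0), "C": (3.5, 3.5)}
for name, (skills, challenges) in examples.items():
    print(name, flow_region(skills, challenges))
```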

PAD profiles show that participants were equally aroused across conditions (Fig. 8). Condition 1 was the highest in valence and conditions 0 and 2 were the lowest (Wilks’ Lambda = .24, F(5,25) = 15.73, p < .001, η² = .76). There was no difference between conditions 3, 4, and 5 in valence. Condition 1 was also significantly higher in the sense of control than Condition 0 (Wilks’ Lambda = .60, F(5,25) = 3.29, p < .05, η² = .40). The three dimensions of the PAD profile reveal how an equal degree of arousal across the conditions affects UX differently depending on whether it is accompanied by low control and valence (Condition 0) or high control and valence (Condition 1).

Fig. 8. The means and 95 % confidence intervals of the PAD profiles in the six conditions.

Interview-Scale Correlations. Finally, we correlated participants’ subjective descriptions and quantitative UX scales across the six experimental conditions (Table 2). The descriptions were in line with the UX scales and deepened the information they provide. For example, we can see that sense of depth, fluency, difficulty, sense of 3D, and lights, colors, and shadows help to constitute the generic “General Feeling” measure that is aligned with Valence. Furthermore, “Ordinary, simple, dull”, “Strange, unnatural”, “Exciting, interesting”, and both distracting lights and shadows provided descriptions that no scale could map. The correlations also show that the Challenge scale measured the difficulty of the task. Although the Arousal scale did not correlate with any of the descriptions, integrating both the flow-space and the PAD profile data shows that low skills combined with high challenges and arousal led to a frustrating and unpleasant experience (Condition 0).

Table 2. Pearson correlations between the subjective descriptions and quantitative scales across the six experimental conditions (N = 180). Significant correlations were found with a 2-tailed test at 0.05 level (*) and 0.01 level (**).
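For orientation, the sketch below shows how a Pearson correlation is computed and roughly how large |r| must be to reach two-tailed significance with N = 180. It uses a normal approximation to the t distribution (reasonable at 178 degrees of freedom, but an assumption of this sketch), and the function names and sample data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def approx_critical_r(n, alpha):
    """Smallest |r| significant at a two-tailed alpha (normal approximation).

    Inverts t = r * sqrt(n - 2) / sqrt(1 - r^2), with z standing in for t.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z / sqrt(z * z + (n - 2))


# Hypothetical example: a dichotomous description (0/1) against a scale rating
mentioned = [0, 0, 1, 1, 0, 1, 1, 0]
rating = [2, 3, 5, 4, 2, 5, 4, 3]
print(f"r = {pearson_r(mentioned, rating):.2f}")
print(f"critical |r| at .05: {approx_critical_r(180, 0.05):.3f}")
print(f"critical |r| at .01: {approx_critical_r(180, 0.01):.3f}")
```

Under this approximation, correlations of roughly 0.15 or larger in absolute value reach the 0.05 level at N = 180.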

Taking all the qualitative and quantitative UX measures into account reveals how each condition was perceived, evaluated, and finally experienced. The UX and its causes are rather univocal in conditions 0, 1, and 2. Condition 4 has some clear characteristics of its own, but conditions 3 and 5 are difficult to distinguish. These subjective findings are in line with our gameplay performance results.

5 Discussion

Compiling the UX and gameplay performance data gives a rich description of the UX and gameplay in our six conditions. Analysis of the interviews revealed both perceptual and action dimensions. Moreover, the analysis revealed the “big three” physical presence dimensions, that is, spatial awareness (diminished sense of depth), attention (visually distracting), and realness/naturalness (exciting but unnatural) [39].

Condition 0 lacked shadows and was the worst condition in terms of UX and gameplay performance results. In Condition 2 some participants used shadows to aim the balls towards the bull’s-eye, which contributed to its static ball acquisition time being significantly the longest. Our results were inconclusive as to whether this aiming improved hit accuracy. Similar to Condition 0, the balance of challenges and skills in Condition 2 leaned towards coping. Although the ratio between arousal and control was better balanced, playing was evaluated as uncomfortable and uneasy.

Participants achieved the best UX in Condition 1 with its typical VR setting of S3D and surface shadows; they experienced mastery (skills above the challenges) in the perceptually positive 3D environment. Condition 4 had volumetric shadows instead of S3D, and it was the closest to Condition 1 in mastery and similar in other aspects: both received mentions about being fluent and easy to play, while being simple or even dull. On the other hand, conditions 1 and 4 had very different PAD profiles: an equally high level of arousal was associated with lower valence and control in Condition 4, whereas in Condition 1 it was associated with higher control and valence. This difference might be related to the lower overall feeling in Condition 4.

Together, conditions 1 and 4 were better than or on par with the other conditions in gameplay performance results. No statistically significant differences were found between the two. Interestingly, Condition 4 had a significantly better interception rate than conditions 0 and 3, whereas the S3D-equipped Condition 1 did not. Since we did not have a test condition with monocular rendering and mere surface shadows from a high-above light source, it is not clear how much Condition 4’s volumetric shadows might have improved the result against mere surface shadows.

S3D was the only setup distinction between conditions 3 and 5. Usually S3D increases the experience of physical presence [36, 37], but we did not find any clear differences in either the UX or gameplay performance results. This implies that the S3D’s positive effect on depth perception in Condition 5 could have been diminished due to its volumetric lighting setup. Although the volumetric light sources used in conditions 3 and 5 were experienced mostly positively, it seems that they decreased the object-background contrast, thus diminishing the amount of depth cues and hindering spatial perception, which is possible according to a study by Schor and Wood [53]. We suspect that this is why the interception rate in Condition 3 was significantly lower than in Condition 4.

Conditions 1 and 5 were the only ones with S3D. There were no significant differences between these two conditions in quantitative UX or gameplay performance results. Mastery was experienced only in Condition 1 which received the most mentions about being fluent and easy. Condition 1 also had a unique PAD profile while Condition 5’s profile resembled that of conditions 2, 3, and 4.

These pairwise comparisons between conditions 1 and 5 and conditions 3 and 4 suggest the following: a volumetric light source inside the target object could (1) negatively affect the UX and (2) possibly impair gameplay performance when compared to a high-above volumetric light source. Our results imply that the high-above volumetric light source is the best choice among the three types of volumetric lighting setups of our game in terms of UX and gameplay performance results.

We found no significant differences between conditions when examining hit accuracy of static balls. It appears that the dominant depth cue with static balls was occlusion; participants often moved their hand in the xy-plane until it was in front of or behind the ball and then adjusted the z-position until the ball was hit.

Our task of striking balls towards a bull’s-eye might not be optimal for eliciting performance differences related to depth perception, although many motion games have a simplified version of this task. Precise aiming in our game required the racket to be swung in 3D so that its collision with the ball would result in a trajectory towards the bull’s-eye. For this the participants had to sense the 3D location of the ball and the 6D pose of the racket simultaneously. We suspect that the positioning and scaling tasks from prior studies [2, 3] could have led to clearer results.

Color of lighting was notably different in conditions 2, 3, and 5. Due to the high number of conditions and already long experiment duration, we decided to focus on depth cues and ignored colors as possibly contributing factors.

5.1 Lighting Guidelines for Improving Spatial Perception

Based on our results and observations during the study, we composed a short list of guidelines to aid lighting design in 3D applications where spatial perception is important:

  1. Objects should be well contrasted against their background. Volumetric lighting and other lighting techniques can reduce this contrast and make it difficult to clearly distinguish the visual border of an object (conditions 3 and 5).

  2. Surface shadows that are meant to improve depth perception should be clearly visible. Additional illumination such as that of volumetric lighting can weaken these shadows (conditions 2, 3, and 5).

  3. Two moving light sources can distract the user and negatively affect the UX (Condition 2). This is in line with a study by Hubona et al. [2], who reported that two light sources can impair task performance.

6 Conclusion

In this paper, we introduced novel lighting cues that can be used to assist reaching, interception, and aiming tasks, as well as enhance spatial perception. The cues are natural and blend into the rendered images because they are based on realistic rendering of volumetric lighting. This offers an alternative to traditional visual guides that are augmented over images and may appear out of place.

We presented a user study with 30 participants where the lighting cues were tested. Our results indicate that volumetric shadows can affect gameplay performance and UX positively or negatively, depending on the lighting setup. Statistically significant differences in our gameplay performance results imply that volumetric shadows can affect depth perception. A high-above volumetric light source with monocular rendering (Condition 4) did not differ from our best S3D setup with mere surface shadows (Condition 1) in terms of gameplay performance. Moreover, Condition 4 had a significantly better interception rate than two other conditions (conditions 0 and 3), whereas Condition 1 did not. Further studies are needed to quantify how much volumetric shadows can increase depth perception in monocular and S3D conditions when compared to surface shadows.

We analyzed UX with CA, Likert scales, and flow-space metrics. Nearly half of the participants reported enhanced spatial perception in conditions with volumetric lighting. However, the use of volumetric light sources in our game divided the study participants’ experiences: some were pleased with exciting and interesting lighting conditions while others were distracted by them. We found indications that a poor choice of volumetric lighting could diminish S3D’s positive effects on UX and depth perception. Overall, the most pleasing game experience was achieved with S3D and surface shadows (Condition 1).

Our study sets a starting point for further research on volumetric shadows as visual guides. Future studies need to confirm our findings for applications with first-person viewpoints. Moreover, future work should explore different aspects of UX. For instance, the concept of self-presence and its three subcomponents, namely proto (body-schema), core (emotion-driven), and extended (identity-relevant) [54], would provide a noteworthy addition to the physical presence and cognitive-emotional flow measures used in this study.