1 Introduction

The art of scaring ourselves in the form of horror fiction has enticed people throughout time and across cultures. Horror is a popular theme in videogames, and one that is in itself enriched by the interactive nature of games. Game designer and writer Richard Rouse argues that it is only logical that videogames and horror are a perfect match since many of the classic horror elements lend themselves well to this interactive medium [47].

Games create an experience for the players, intensifying horror through gameplay [40], a feature that has attracted a lot of research into the emotional effects, and the impact of horror-gaming on videogame audiences.

Taking into consideration the popularity and appeal of horror in videogames, it is no wonder that, with the advent of consumer-ready Virtual Reality (VR) in 2016, a myriad of horror experiences are being developed for this medium [6, 17, 52]. Alongside disempowered protagonists, sinister atmosphere, and claustrophobic environments, the ‘jump-scare’ – a sudden effect meant to startle players – has been a common ingredient in the development of horror games [42]. The excessive use of this technique in many current VR horror titles has, however, sparked concerns among developers and researchers alike [8, 50]. “When the headset is on there is seemingly no escape. Do developers take into account the psychological differences between previous gaming horror experiences and that of VR?” [32].

The general intuition among game developers is that playing in VR is more impactful and effective at eliciting certain emotions than an experience on a conventional screen. Particularly, horror experiences could be too intense when wearing a head mounted display. Taking the headset off to break the spell of the virtual space takes more effort than with traditional gaming interfaces; the illusion of being physically in the virtual space makes any fictional threat feel quite real [16]. VR offers a fascinating format for exploring the horror genre, which is why it is important to understand the medium and its effects.

While the literature suggests that VR has the potential to elicit strong emotions in players, there is a lack of comparative studies offering empirical evidence. Additionally, previous research on the topic of fear and videogames has mostly focused on the emotional effects of a single medium. This paper presents the adapted design and implementation of a horror game as a research tool, and the subsequent study in which it was used to identify and compare players’ experiences. We examined the differences in emotional responses elicited by playing the same horror game in room-scale Virtual Reality (indicated as VRc, or VR condition) and on a conventional flat screen monitor (indicated as SCc, or screen condition). Being aware of the existence and impact of these differences between mediums can provide insight for future empirical studies, and be used by VR horror game developers to make informed decisions for the implementation of their ideas. Furthermore, there is an interest in using psycho-physiological measures in game studies, as well as game development, to gain unbiased data on emotional experiences. Inspired by similar previous studies [3, 29] and related literature [39], this study was performed with the following research question and hypotheses in mind:

What are the self-reported and measurable psycho-physiological differences in fright responses when comparing the same game experience played in two different mediums?

H1: Participants will report experiencing higher levels of fear in VRc than in SCc. Psycho-physiological measures will indicate increased emotional arousal (corresponding with fright responses) in VRc.

H2: Participants will report experiencing less or no fear on re-play, regardless of the medium. Psycho-physiological measures will follow the same pattern, showing lower levels of arousal on re-play, regardless of the medium.

2 Theoretical Foundation

The development of the game used in this study was based on an understanding of the characteristics of horror fiction, and within the medium of videogames in particular. Inspired by Perron’s concept of survival horror as an ‘extended body genre’ [40], we made the connection with Gregersen and Grodal’s theories about the embodied experience of videogames [14]. This link led us to review the literature regarding the processes of embodied emotions, particularly concerning fear and the sensation of spatial presence, which informed the design of the research experiment.

2.1 Horror

Supernatural horror opens up a lot of possibilities when it comes to game design, especially when having to develop a consistent and believable world. Horror’s long roots in folklore give ample opportunity to build on existing antagonists, such as werewolves or similar monsters, that our basic instinct associates with dangerous predators. Its implicit association with darkness, or obstructed scenery brings forth an instinctual fear in most people, inspiring a sense of vulnerability and uncertainty. Moreover, the fact that supernatural elements are common in horror allows for interesting and different game mechanics without them being inconsistent with the game-world [47]. We can say that ‘habituation’ and ‘knowing’ largely decrease the potential for horror to elicit strong emotions of fear, and that the unpredictability of a fright-inducing experience is part of the thrill that makes us seek the experience in the first place [23, 41, 54].

Horror, a genre that is primarily defined by its intention of transferring the physical reactions associated with the emotion of fear, is considered a ‘body genre’ [58]. Perron [40] expanded on this notion with the concept of ‘extended body genre’ when referring to videogames. In the same line, Gregersen and Grodal [14] argue that “interacting with videogames may lead to a sense of extended embodiment, ...where one experiences both agency and ownership of virtual entities” due to an interactive feedback loop where multi-sensory and proprioceptive systems are being activated. Gerrig and Rapp indicate that Coleridge might have been wrong when he coined the popular term “suspension of disbelief”. When taking into consideration the psycho-physiological processes involved in our emotional experiences with media, the case is more akin to a “willing construction of disbelief”. Audiences must engage their conscious cognitive processes in order to reject and contextualize, while their automatic processes are being activated by the stimuli provided by media [13].

2.2 Embodied Emotions

Fear belongs to what cognitive theorists call ‘basic emotions’. Neurocognitive studies, such as those conducted by LeDoux [26], indicate that basic emotions are processed by a fast pathway through the limbic system, while cognitively evaluated secondary emotions (e.g. thrill and excitement) emerge by means of consciousness mechanisms processed by the slow pathway through the frontal lobes. Fear is a multifaceted emotion with associated action tendencies and physiological responses, automatically activated by specific perceptual triggers (real or otherwise). Once the fear module is activated, it requires conscious effort to cognitively evaluate the emotional experience in order to influence and regulate behaviors [37, 38].

Consciousness mechanisms are an important notion to take into consideration when studying the emotional effect of horror-inducing media. Grodal [15] makes the case that the field of cognitive psychology provides a useful vantage point to study and describe videogames. Our brains did not exactly evolve to experience emotionally-charged fictional horror in film or game form, and thus the automatic fast pathway reacts to mediated stimuli very similarly to real stimuli. One of the proposed reasons why our response to mediated horror is less intense than to real-life situations is that we are able to evaluate emotions cognitively and to correctly attribute bodily changes to more or less controllable external sources [37]. This processing allows us to ride a roller-coaster, watch a horror film in the cinema, or play a scary game in VR. The frequency, persistence, and level of emotional charge of these mediated stimuli influence how much of a strain it is for the consciousness mechanism to assess and regulate responses [15].

Previous studies about fear in the context of videogames have combined different methods in order to establish a connection between emotions, physiological data, and player behavior [36, 53]. The physical component, these bodily responses associated with horror-gaming, can be measured using sensors capable of recording fear-related physiological arousal [19, 20]. Although opinions are divided on the use of ‘biometrics’ to measure fear in games, with some researchers suggesting that only jump-scares can be adequately measured, this study is an attempt to capture another kind of fear, namely ongoing suspense.

A review of 134 publications assessing biophysical patterns found that it is possible to differentiate between basic emotions based on autonomic nervous system activity [22]. Increases in EDA and decreases in temperature are some of the most useful indicators of fear-related responses [4, 5, 9, 10, 25, 28]. Despite the plethora of studies, there is no gold standard when it comes to accurately mapping physiological responses to discrete emotions [31]. For this reason, to elucidate the specificity of the emotional valence, it is necessary to take into consideration the context (i.e. a horror-gaming situation in an experimental setting), the self-reported measures, and the qualitative data gathered from interviews. When used in combination with self-reporting qualitative methods, physiological measures can provide a more rigorous portrayal of the player’s emotional experience [30]. This is even more important when we take into consideration that some people might repress their emotional reactions to fright-inducing media in their self-reports [53].

Prinz’s theory on embodied emotions proposes that “emotions are perceptions, and they are used to perceive our relationship to the world” [43]. It also states that emotions are “perception of affordances”, allowing us to perceive what a situation affords regarding behavioral responses. Neuroscientific discoveries supplement this theory with the addition of cognitive appraisals as part of the emotional process, explaining complex emotions [26]. These premises connect with the concepts of spatial presence and immersive technologies, as well as their emotional impact.

2.3 Spatial Presence and Immersion

The terms immersion and spatial presence are frequently brought up as central components of emotional response in videogame players, especially with regard to VR. Lynch and Martins [29] argue that the interactive elements, characteristic of videogames, are the possible cause of an enhanced state of presence and immersion, and that this seems to be a key component when participants reported feeling more frightened. The definitions of the terms are not without contention in the academic world, including some cases where the two terms are used interchangeably. Ermi and Mäyrä [12] argue that immersion is a term describing the experience felt when becoming involved in, and giving attention to, a mediated experience, and its ability to stimulate imagination, challenge us or stimulate our senses.

Willans [57] builds on Prinz’s theory of embodied emotions in his argument for spatial presence as a perceptual emotion. This perception is affected by our interpretation of environmental stimuli such as sight, hearing, touch, proprioception, balance/motion, smell and taste [18]. Spatial presence in VR is evoked when stimuli from the virtual environment overpower the stimuli of the real world environment enough to trigger the emotion of ‘being there’ [57].

Virtual reality is a powerful affective medium that affords a high sense of presence [45]. With the introduction of room-scale VR, which allows natural locomotion in a dedicated room-sized space, the virtual environment feels even closer to reality [59]. This heightened sensory immersion, particularly when experiencing room-scale VR, can amplify the effects of horror elements [32] and enhance the sensation of spatial presence [57].

3 The Game

In order to have a high level of experimental control, we created the research stimulus based on the design of an existing commercial game: P.T., the standalone playable teaser of Silent Hills [21]. The virtual space in P.T. features a series of perceptual triggers that effectively activate the human fear module [38]; moreover, this title has been recognized by game critics as one of the most compelling horror game experiences in recent years [1, 51]. Adapting an existing commercial game made our research tool closer, experience-wise, to real-life gaming situations, contributing to higher levels of ecological validity [34]. Additionally, this approach allowed us to minimize the impact of confounding variables, and to log in-game events.

Fig. 1. Game layout as seen from above (left) and in-game screenshots (right). A non-euclidean solution creates the illusion of the virtual space being bigger on the inside. This allows for areas (e.g. bathroom) to occupy the same space as the other rooms.

Table 1. Overview of the individual game loops. All letter indications are based on Fig. 1. Note that the player transitions into the next loop by walking through the loop door and traversing the corridor in the order A \(\rightarrow \) B \(\rightarrow \) C. Walking in the other direction does not trigger the next loop.

3.1 Design and Implementation

Using the game engine Unity, we developed an adaptation of the game that could be played in room-scale VR, as well as non-VR. Since designing for room-scale VR is a more complex challenge due to its unique set of constraints and considerations (e.g. natural gestures and locomotion, playing within a room with a fixed size, etc.) [49], we designed the VR version first. The other version was developed afterwards by introducing changes in the design that made it possible to play the game with a gamepad controller in front of a PC monitor. We acknowledge that this introduces a certain amount of bias in our study towards VR, since the level layout and the interactions were designed with the constraints of room-scale VR in mind. Nevertheless, we deem the resulting PC version close enough in terms of experience, pacing and mechanics to other similar horror games, including the source material.

To avoid introducing confounding variables beyond the different types of control schemes and output methods for each condition, the two game versions were essentially the same, differing only in terms of control and field of view. The levels of the game world were designed in such a way that the player moves through the physical play area and the game world at a 1:1 ratio, meaning that the size of the playable VR area corresponds exactly to the size of the game world. The simplicity of the level design and architecture in the source material lent itself well to this room-scale VR adaptation, where the game world is confined to the play area space. The level geometry and size of the game world were modified accordingly to fit the virtual space in a 3.15 by 3.3 m VR playable area.

Besides adjusting the level design of P.T., the game used in this study has a modified sequence of events and content. The game levels are referred to as loops (see Table 1 for descriptions). Although a virtual body representation seems to increase the reports of spatial presence when compared to having no actual body representation in the virtual world, it can also be disturbing for players if the movements of the virtual body are not aligned with theirs [48]. Considering that tracking a full body in VR is currently not technologically feasible, we chose not to implement a virtual body representation rather than risk disturbing the players [49].

Due to practical constraints in VR development, as well as the goals for the experimental design, we did not implement content that (1) was intended to startle players with sudden scare effects (jump-scares), (2) had progression puzzles that would cause the duration of the test to be unpredictably long, and (3) would likely induce simulator sickness.

4 Pilot Test

A pilot test was conducted to refine the data gathering tools and the test procedure, and to identify possible issues that could hamper the quality of the research experiment [24, 56]. A prototype of the VR version of the game was used to test the procedure and gameplay. This version of the game only included loops 1, 2, 4, 7 and 9. A total of 26 participants were part of the pilot test, 2 females (8%) and 24 males (92%). One of the objectives of the pilot test was to assess the ecological validity of the stimulus material, since ‘jump-scares’ and gating mechanisms were removed in the process of adaptation. Overall, participants reported being entertained and as scared as in similar experiences. As a result of the pilot-test the function to turn off the flashlight was removed, as it confused players. The behavior of the ghost character in loops 4, 7, and 8, was adjusted, and an audio intro was added to serve as mood-setting. Additionally, bugs were identified and dealt with. We handed participants an early version of the questionnaire and conducted an interview after the play-session. As a result, we shortened both the questionnaire and interview, and modified the phrasing of some items.

5 Experiment Design

The experiment ran for six days with a total of 29 participants. Two versions of the same horror game were used as stimulus material. Participants engaged with the stimulus material twice, once in each medium. The order of the test conditions alternated between participants; half of participants experienced the game first in VRc followed by SCc, and vice versa. The research process was based on a mixed-methods approach, combining game metrics, psycho-physiological measurements, observation, interviews and questionnaires. This approach was chosen to explore various angles and to gain as complete a picture of differences in player experience between both conditions as was possible within the scope of the study.

Fig. 2. Schematic showing the testing environment and setup.

5.1 Equipment

The same laptop, with specifications that are considered sufficient by the developers of the VR equipment, was used for both conditions, as well as ear-enclosing headphones with sonic isolation. An Empatica E4 wristband sensor was used to record psycho-physiological responses. The sensor was chosen as a compromise between level of invasiveness, the need to measure participants in motion (specifically in the VRc), and budgetary constraints. A recent study [44] compared the E4 sensor to a laboratory sensor and found it to be as accurate for the purposes of emotion recognition. Another study [33] compared the sensor to a portable electrocardiogram (ECG) and found that the ECG performed better 5% of the time. While this suggests an improvement over the E4 sensor, we argue that even with a difference in data quality, the Empatica sensor can provide valid results, especially given that fear-related physiological responses are relatively strong compared to other emotional responses.

VR condition (VRc): The HTC Vive comprises a headset, two wireless controllers, and two base stations enabling 360\(^{\circ}\) room-scale tracking. The tracking area in this test was 3.15 by 3.3 m. The Vive includes sensors in the headset and wireless controllers that pick up infrared signals from the base stations to track the position and movement of the player inside the VRc area (see Fig. 2). For the VRc, participants were instructed to mind the cable connecting the display device to the machine running the game; basic instructions regarding wearing the headset and holding the controllers were also provided. Participants used natural locomotion to navigate the virtual space of the game, completing a total of nine clockwise loops around the play area.

SC condition (SCc): The participants were seated at a desk where they played the game using a 17 in. monitor connected to the laptop and an Xbox 360 controller. Instructions were provided regarding the controller and its button functionality, and a printed guide was available for reference.

5.2 Questionnaire and Interview Data

The questionnaire used in the experiment consisted of three parts: pre-experiment, post-session, and post-experiment, each of which was administered at a different time during the experiment. The pre-experiment section was administered as a structured interview, and consisted mainly of items that sought to profile participants’ familiarity with, and disposition toward, horror games. Participants were also asked about their experience with different types of VR headsets with the purpose of addressing the influence of novelty-excitement during the VR version of the experiment. The post-session part consisted of nine closed questions where participants were asked to rate different aspects of their experience on a unipolar 10-point Likert scale. It was administered twice, right after the participant finished each of the test conditions. The post-experiment section followed the second post-session questionnaire, and was mainly aimed at gathering demographic information. The experiment ended with a semi-structured interview, the goal of which was to gain insight into the participant’s subjective game experience, as well as provide context to the participant’s measured responses.

5.3 Game Metrics and Psycho-Physiological Measures

Data recorded from the game included the player’s head position and rotation (with a frequency of 10 Hz), and gameplay events. Position and rotation information was contextualized by recording the room and loop the player was in at a given time. Game events were single occurrences that typically consisted of doors opening and closing, lights turning off and on, and sound effects. Each entry into the log was recorded with a time stamp and logged chronologically.
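As an illustration of the kind of log described above, the following is a minimal sketch in Python. It is not the study's Unity implementation; the class name `LogEntry`, its fields, and the helper `merge_chronologically` are hypothetical names chosen for this example. It only shows how 10 Hz pose samples and single-occurrence game events could share one time-stamped, chronologically ordered log with room and loop context.

```python
from dataclasses import dataclass, field

POSE_RATE_HZ = 10  # head position and rotation are sampled at 10 Hz (from the text)


@dataclass(order=True)
class LogEntry:
    """One time-stamped log line (hypothetical structure)."""
    timestamp: float                      # seconds since session start; only sort key
    kind: str = field(compare=False)      # "pose" or "event"
    loop: int = field(compare=False)      # game loop the player is in
    room: str = field(compare=False)      # room the player is in
    detail: dict = field(compare=False, default_factory=dict)


def merge_chronologically(pose_entries, event_entries):
    """Combine pose samples and game events into one log ordered by time stamp."""
    return sorted(list(pose_entries) + list(event_entries))


# Usage with made-up values: two pose samples and one door event.
poses = [
    LogEntry(i / POSE_RATE_HZ, "pose", 1, "corridor A",
             {"position": (0.0, 1.7, 0.1 * i), "rotation_y": 5.0 * i})
    for i in range(2)
]
events = [LogEntry(0.05, "event", 1, "corridor A", {"name": "door_open"})]
log = merge_chronologically(poses, events)
```

Keeping the loop and room on every entry means that any later slice of sensor data can be matched to the in-game context by time stamp alone.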

For psycho-physiological measures, the E4 sensor was used to record heart-rate (reported by the device based on the blood volume pulse and captured via a PPG sensor), electrodermal activity (EDA), peripheral skin temperature, as well as time stamps for synchronization of data through an event mark button. Heart rate (HR) is reported at 1 Hz, while EDA and skin temperature (Temp.) are reported at 4 Hz. For these measures we established baseline readings for each participant (described further in the next section), which were used to control for idiosyncratic differences in participants, as well as differences in activity between test conditions. The raw sensor values were processed to express the following measures:

  1. Median: The median of the measure, indicating the ‘typical’ value of the measure over a given time frame.

  2. Median Absolute Deviation (MAD): The MAD of the measure, indicating deviations over a given time frame from the data’s median. This measure is meant to capture consistent fluctuations, as MAD should be less influenced by a few temporary outliers.

  3. Slope: The slope of a linear polynomial fitted to the measure over a given time frame, indicating the general trend and steepness of a measure.

  4. Travel: Aggregate of absolute differences between individual measurements over a given time frame. This measure is meant to capture both continuous and temporary fluctuations. The aggregate is divided by the number of measurements to make it comparable.

  5. Onsets (EDA only): Indicates the number of peaks (phasic skin conductance measurements) with a threshold of \(0.01\) \(\upmu \)S (processed through Ledalab using Continuous Decomposition Analysis [2]). Peaks are divided by the number of measurements to make them comparable.
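The first four measures can be sketched in a few lines of Python. This is an illustrative reading of the definitions above, not the analysis code used in the study: the function names are ours, the slope is computed as an ordinary least-squares fit expressed per minute, and dividing ‘travel’ by the number of samples (rather than the number of differences) is an assumption. Onsets are omitted because they depend on Ledalab's decomposition.

```python
from statistics import median


def mad(values):
    """Median absolute deviation of the samples from their median."""
    m = median(values)
    return median(abs(v - m) for v in values)


def slope_per_minute(values, hz):
    """Least-squares slope of a first-degree fit, in measure units per minute."""
    n = len(values)
    times = [i / hz / 60.0 for i in range(n)]  # sample times in minutes
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


def travel(values):
    """Sum of absolute sample-to-sample differences, divided by the sample count."""
    total = sum(abs(b - a) for a, b in zip(values, values[1:]))
    return total / len(values)  # assumption: normalise by number of samples


# Usage with made-up EDA samples at 4 Hz (values are illustrative only).
eda = [0.50, 0.52, 0.51, 0.55, 0.60, 0.58, 0.62, 0.65]
summary = {
    "median": median(eda),
    "mad": mad(eda),
    "slope": slope_per_minute(eda, hz=4),
    "travel": travel(eda),
}
```

Median and MAD are robust to short spikes, while travel deliberately reacts to them, which is why the measures complement each other.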

Data from the game and the sensors was processed before subsequent evaluation by removing outliers (values larger than \(3 \times \mathrm{MAD}\) [27]). Sensor data was further pre-processed with a Gaussian filter (window width = 4 \(\times\) measure frequency), and expressed relative to the participant’s baseline measure. Each of the sensor measures therefore directly expresses an increase or decrease relative to the baseline in percent, with the exception of ‘Slope’, which is reported as change of a measure per minute.
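The pre-processing steps above can be sketched as follows. Several details are assumptions made for this example and are not stated in the text: outliers are taken to be samples whose distance from the median exceeds 3 × MAD, the Gaussian standard deviation is set to one sixth of the window width, and the window is renormalised at the edges of the signal. The function names are ours.

```python
import math
from statistics import median


def remove_outliers(values, k=3.0):
    """Drop samples further than k * MAD from the median (assumed reading of 3 * MAD)."""
    m = median(values)
    spread = median(abs(v - m) for v in values)
    if spread == 0:
        return list(values)  # no spread: nothing can be flagged
    return [v for v in values if abs(v - m) <= k * spread]


def gaussian_smooth(values, hz, sigma=None):
    """Gaussian filter with window width = 4 * measure frequency (in samples)."""
    width = int(round(4 * hz))
    half = width // 2
    if sigma is None:
        sigma = width / 6.0  # assumption: the text does not give sigma
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (o / sigma) ** 2) for o in offsets]
    n = len(values)
    smoothed = []
    for i in range(n):
        acc, weight_sum = 0.0, 0.0
        for o, w in zip(offsets, kernel):
            j = i + o
            if 0 <= j < n:  # renormalise over the samples that exist
                acc += w * values[j]
                weight_sum += w
        smoothed.append(acc / weight_sum)
    return smoothed


def percent_of_baseline(value, baseline):
    """Change relative to baseline in percent (e.g. -5.0 is 5% below baseline)."""
    return (value - baseline) * 100.0 / baseline
```

For example, `percent_of_baseline(95.0, 100.0)` gives `-5.0`, matching the convention used for the sensor measures.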

6 Procedure

Participants were recruited by use of convenience sampling through social media, physical flyers, and word of mouth. People with phobias that could be triggered by the contents of the game, health issues that might get triggered by flashing lights or sudden emotional distress, as well as those lacking basic game-playing experience were excluded. The test area for the experiment was equipped with AC units, keeping the temperature at 21 \(^{\circ}\)C, to minimize the impact on skin temperature measures. All conditions in the room (i.e. light conditions, closed windows, doors, noise levels, etc.) were kept the same throughout the experiment sessions. The experiment procedure was divided into four stages, as shown in Fig. 3: Briefing, first play (VRc or SCc), second play (using the remaining condition), and interview/debriefing. Experiments alternated between VRc and SCc as the starting condition for each participant. The briefing stage started with introductions, followed by participants reading and signing the consent form, and the opportunity to ask questions regarding the form and the test. After this, a pre-experiment questionnaire was administered to establish prior experience with horror games and VR, and the participant was fitted with the E4 sensor.

A baseline recording for the psycho-physiological measures was taken before playing each condition, which functioned as a base value against which the measurements taken during each test were compared. In VRc this meant that participants walked, at a leisurely pace, within the VRc area, counting out loud until 80. For SCc, the procedure for baseline recording was done with the participant counting until 80 while seated. This approach was chosen to establish a baseline under physical conditions similar to those required in the play-session minus the stimulus material. A test conductor then explained the controls of the respective version of the game, and helped the participant with the necessary equipment. Participants were also reminded that the test could be terminated at their will at any time, and informed that the test conductors would not talk to them during the play-session (an exception to this rule was made whenever tracking problems occurred). Once the participant was ready, the test conductor initialized the game. Once each of the play-sessions concluded, the post-session questionnaire was administered.

After the participant had finished both testing conditions and post-session questionnaires, they were asked to fill in the post-experiment section of the questionnaire as well and participate in a semi-structured interview. Both test conductors were part of the debriefing and interview process. Audio recordings of the interviews were made for later analysis. To conclude the experiment, participants were given the option to ask questions, and were handed a formal debriefing information sheet.

Fig. 3. Flowchart illustrating the sequence of the experiment with alternating starting conditions.

7 Data Analysis and Results

The experiment was conducted with \(N=29\) participants, 31% of whom were female. 89.6% of participants were aged 18–34. While all participants played both conditions, 6 participants did not fully complete playing the VRc. All participants completed the SCc and completed the VRc at least up to the 5th loop. The majority of participants had played or had watched others play horror games, and 43% had played (or had watched others play) P.T. Nearly two thirds of participants (61%) had some experience in game development, and 75% had tried some form of VR before.

Fig. 4. Questionnaire results for Q1.

7.1 Questionnaire

When asked whether they expected sudden scare effects (‘jump scares’), 75.9% of participants answered ‘Yes’. When directly comparing both test conditions at the end of the experiment, 79.3% ranked VRc as scarier than SCc (BF\(_{10}\) = 4301.85, against 0.5). A closer analysis of this question reveals that if the VRc was first, participants unanimously chose VRc as scarier than SCc. However, when the SCc was first, only 64.2% chose VRc as scarier, while 14.3% reported both conditions as equally scary, and 21.4% expressed that the SCc was scarier. Additionally, 27 participants (93.1%) indicated VR as their preferred condition. After playing each condition, we asked participants to rate how scary they found the experience (Q1) on a scale from 1 to 10 (see Fig. 4). When directly comparing conditions for scariness ratings, there was no significant difference between conditions as the first experience. Looking at differences between conditions on the second play, however, SCc was rated significantly lower (Mann-Whitney U = 33, N1 = N2 = 14, SCc vs. VRc on second play, p = 0.002753 two-tailed; alpha level 0.05).
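For readers unfamiliar with the Mann-Whitney U statistic reported above, the following sketch computes it directly from its pairwise definition. The function name is ours and the ratings in the usage lines are invented purely to show the mechanics; they are not the study's data. With two groups of 14, U can range from 0 to 14 × 14 = 196, so a reported U of 33 lies well towards one end of that range.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b): for each pair, count 1 if a > b and 0.5 if tied."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Usage with invented 1-10 ratings (NOT the study's data).
screen_second = [2, 3, 3, 4]
vr_second = [5, 6, 4, 7]
u_screen, u_vr = mann_whitney_u(screen_second, vr_second)
u_reported = min(u_screen, u_vr)  # the smaller value is conventionally reported
```

The p-value is then obtained from the distribution of U under the null hypothesis, which a statistics package would normally provide.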

7.2 Interview

Interviews were transcribed and coded through a mixed approach. The coding scheme was produced by deriving codes from the literature and from data-centric themes [7].

Participants’ reports centered around the influence of ‘expectations’ and ‘knowing’, as well as the experience of spatial presence, movement, interactions, and embodiment. Participants generally expected the game to contain (more) ‘jump-scares’, due to the genre. Some also expected more traditional horror-game gameplay and interaction. These expectations were built on participants’ previous experience with the genre and had an influence on their first condition play-session.

Participants reported that ‘knowing’ had a big influence on how scared they felt, with participants that experienced VR as the first condition citing ‘knowing’ as the reason they felt little or no fear during the SCc play-session. This was not the case for participants who experienced SC as first condition. Although ‘knowing’ still was cited as reducing the experience of fear in their VR play-session, most participants still found VR to be the scarier condition regardless. They stated that feeling spatially present in the game influenced these feelings greatly (e.g. “I knew what was going to happen, but some things that had already happened still had a bigger effect on me in the VR than with the monitor. That door effect – Bathroom door loop 5 – was the one that affected me the most.”)

Spatial presence was a big factor when participants reported on why they felt scared in VR. Some expressed that in VR they could not look away from the game, and therefore felt more a part of the game itself rather than an onlooker. The visual isolation in VR added to the feeling of spatial presence as well as to their experience of fear (e.g. “I completely forgot for a few seconds where I was, and what I was doing, and I thought that I was actually in that situation, and obviously that is something that for me will throw me off completely and scare me.”). This greater sense of embodiment within the game was also reflected in the way they reported movement, interactions, and visceral reactions to the game in VR.

Table 2. Results of game metrics for both conditions combined, and for 1st and 2nd play only. M\(_{condition}\) shows the mean of a measure per condition; BF\(_{10}\) is the result of a Bayesian T-Test. Significant results are bold.

7.3 Sensor Measures and Game Metrics

Game metrics (shown in Table 2) and sensor data (shown in Table 3) were analyzed for statistically significant differences between test conditions. Bayesian T-Tests were conducted (Cauchy prior width 0.707 [55]) for all sensor and game metrics, using paired samples for first and second play sessions combined, and independent samples for analyzing differences in the individual play sessions. The Bayesian T-Test returns a Bayes Factor, written \(BF_{10}\), which expresses how strongly the data favor the alternative hypothesis (the conditions differ) over the null-hypothesis (the conditions do not differ). A \(BF_{10}\) of 1 indicates that the data support both hypotheses equally. A value lower than 1 indicates that the data favor the null-hypothesis, meaning that the Bayesian T-Test can be used to support the null-hypothesis rather than only reject it [46]. Note that we consider results with \(BF_{10} > 3\) or \(BF_{10} < 0.333\) significant, that is, instances in which the data are at least 3 times more likely under one hypothesis (difference or similarity) than under the other.
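The decision rule stated above can be written as a small helper. This is only a sketch of the thresholds; the function name and the returned labels are ours, and the lower bound is computed as 1/3 rather than the rounded 0.333 given in the text.

```python
def interpret_bf10(bf10, threshold=3.0):
    """Classify a Bayes factor using the BF10 > 3 / BF10 < 1/3 rule."""
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf10 > threshold:
        return "difference"      # data favor the alternative hypothesis
    if bf10 < 1.0 / threshold:
        return "similarity"      # data favor the null-hypothesis
    return "inconclusive"        # neither hypothesis is sufficiently favored


# Usage: 14.44 is one of the BF10 values reported for EDA onsets;
# the other two inputs are illustrative.
labels = [interpret_bf10(b) for b in (14.44, 0.2, 1.0)]
```

The middle band between 1/3 and 3 is what distinguishes this rule from a simple reject/accept decision: results there are reported as neither a difference nor a similarity.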

The analysis of sensor and game metrics is based on varying sub-samples due to some participants not completing all game loops, as well as due to the removal of outliers for a given measure. It should be noted that outliers were removed both from the raw data of a participant and from the processed measures across participants. In general, the sub-sample size for SCc was \(n\approx 28\), and \(n\approx 27\) for VRc in combined session results (with the exception of ‘play duration’ where VRc was \(n\approx 22\)). For the analysis of first and second play sessions, the sub-sample size was \(n\approx 14\) for SCc and \(n\approx 13\) for VRc (\(\approx 11\) for ‘play duration’).

Apart from the sensor measures shown in Table 3, onsets were evaluated for EDA measures, with the result \(\Delta M_{SCc}=-0.023\), \(\Delta M_{VRc}=0.087\), \(BF_{10}=14.44\) for both play sessions combined. For first play: \(\Delta M_{SCc}=-0.007\), \(\Delta M_{VRc}=0.097\), \(BF_{10}=3.32\). And for second play: \(\Delta M_{SCc}=-0.03\), \(\Delta M_{VRc}=0.053\), \(BF_{10}=7.19\). This means that the conditions differ significantly with regard to the number of measured EDA onsets.

In addition to the analysis of play sessions as a whole, results from the individual loops were explored to discover which game loops had the most impact. In terms of game metrics, loop 8 lasted significantly longer in VRc than in SCc (\(BF_{10}=31.59\)). Differences in camera rotation are driven mostly by loops 0, 2, 5 and 6, in all of which rotation is higher in VRc. For HR, loop 8 shows a significant difference for MAD (\(BF_{10}=118.9\)) and travel (\(BF_{10}=39.04\)), both of which had a lower mean in SCc than in VRc. EDA is generally most influenced by loops 3–8, with the most significant differences in loops 5 and 7. For temperature, loop 5 had the most pronounced impact (slope \(BF_{10}=317.4\), lower in VRc) on the overall measure.

Table 3. Results of the sensor metrics for HR, EDA, and skin temperature. \(\Delta M_{condition}\) shows the mean of a measure per condition in % based on the corresponding baseline measure (e.g. \(-5.0\) is 5% lower than baseline); BF\(_{10}\) is the result of a Bayesian T-Test. ‘Slope’ is shown directly in measure change per minute, and does not use a baseline. Horizontal rows separate measures of both conditions combined, 1st play only, and 2nd play only. Significant results are bold.
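The baseline-relative values described in the caption of Table 3 can be sketched as a simple percentage change. The function name `percent_change_from_baseline` and the example heart-rate values are hypothetical; the sketch assumes a positive baseline and is not the authors' processing code.

```python
def percent_change_from_baseline(measure: float, baseline: float) -> float:
    """Express a measure as a percentage change relative to its baseline.

    A result of -5.0 means the measure is 5% lower than the baseline,
    matching the convention in the Table 3 caption. The name and the
    requirement of a positive baseline are illustrative assumptions.
    """
    if baseline <= 0:
        raise ValueError("Baseline must be positive.")
    return (measure - baseline) / baseline * 100.0


# Hypothetical example: a mean heart rate of 76 bpm during play
# against a baseline of 80 bpm is 5% below baseline.
change = percent_change_from_baseline(76.0, 80.0)
print(f"{change:.1f}%")
```

Expressing each measure relative to a participant's own baseline makes values comparable across participants with different resting levels.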

8 Discussion

The intent of this study was to examine the differences in player experience, especially regarding fright responses, when comparing the same horror game in two different setups. We expected to find participants reporting higher levels of fear in VRc than in SCc, and for these reports to be backed up by psycho-physiological measures (H1). When asked to compare the two experiences directly in the post-experiment questionnaire, participants reported that the game was scarier when playing in VR, regardless of it being the first or second test condition. Previous experience with VR headsets, or lack thereof, had no significant influence on these outcomes. The results of the questionnaire are in line with comments made during the interviews, with most of the participants reporting that they found the experience scarier in VR. Reasons for this related to feeling a strong sense of ‘being there’ in VRc, and that this made the overall VR experience more intense. This notion of spatial presence was also mentioned by those who found SCc to be scarier, who stated that although they did not find the VRc to be scarier, they did ‘feel it more’ physically. However, when participants rated the experience individually after each session, only a marginal difference between first conditions was found.

Regarding psycho-physiological data, significant differences were found in EDA and skin temperature when considering the second session, corroborating the results from the questionnaire. No significant difference was found between conditions when participants played the first condition (experiencing the game as new). During the second play, however, participants showed significant differences between conditions, suggesting VRc continued to cause intense emotional responses despite players being familiar with the game at this point. This result contradicts our second assumption that participants would experience less or no fear responses on re-playing the same game, regardless of medium (H2). Additionally, questionnaire responses showed that the order in which the conditions were played had a significant impact on how scary participants rated the game. No significant differences were found in the ratings for VRc as the first or second condition. In contrast to this, we did see significant differences between SCc as first and second condition. Participants elaborated on this in the interviews, stating the game was significantly less scary when playing SCc after VRc, while VRc was still considered to be scary even after playing SCc first. A common reason cited for this difference was that knowing the sequence of events and being familiar with the game influenced how intensely fear was felt during second play-sessions. Participants playing VRc first reported experiencing less fear in the second condition as they ‘knew what was going to happen’. This knowledge had less impact on participants playing VRc second, with participants reporting that although they knew what was going to happen, the experience in VR was still scary. Spatial presence and the physicality inherent to VR were cited as the main reasons for the VRc being more impactful and intense despite knowing what would happen.

Knowing the full extent of ‘any danger’, consequences and all, will reduce anxiety and fear [35]. We expected that participants would become habituated to the game during their experience with the first condition, and that the intensity of their emotional response would therefore diminish. The results show that habituation had an effect on the intensity of fear measured and reported, but that the effect is noticeable only when switching from VRc to SCc, not vice versa. A possible reason for the decline in fear response from VRc to SCc could be that the game has no randomized events, minimal interaction options, and no death scenario. The knowledge that there are no consequences for the participant’s actions seems to increase the habituation effect when playing the game on SC as second condition. This habituation, however, does not seem to have any significant influence when looking at the responses from the participants who had SC as first condition and VR as second condition.

In addition to analyzing differences over the duration of a play session, we also considered observed differences within particular loops. Differences were observed in EDA values (median, travel, and onsets) for loops 3 through 8, with values being significantly higher in VRc. Abrupt increases in EDA (onsets) are typically associated with short-term events and occur in the presence of distinct environmental stimuli [11]. We therefore consider it likely that these differences were caused by the events in those loops (e.g. the sudden closing of a door, changes in lighting conditions, and audio cues). This suggests that changes in audio and light conditions might have a greater effect in VR. It may also indicate that the sensation of spatial presence afforded by VR enhanced the immediacy of a perceived threat. This corresponds to self-reports of players getting startled by the encounter with the ghost in loop 4, who specifically mentioned that the confrontation had more of a ‘physical’ impact in VR, and by other events, e.g. the bathroom door opening in loop 5.

Game metrics indicated that people looked around more in VR, which is consistent with statements from the interviews. Evidence for this difference stops being significant around loop 6, which suggests that players got used to the repeating environment, or that the novelty of being in VR wears off over time. A larger difference was found when comparing the first session, suggesting that players felt less of an urge to look around when already familiar with the environment. With regard to play duration, no significant differences were found between conditions. Loop 8 is an exception to this, and shows players taking significantly longer to complete the loop in VRc than in SCc. With the lights being off in this loop, this finding suggests that the darkness required more effort to navigate in VR. Other sensor readings (increases in EDA measures and a decreasing slope in temperature) suggest a more intense emotional response consistent with fear. A possible explanation for the findings in this loop could be that the events in previous loops, combined with knowledge of horror fiction tropes, created suspense, a precursor to fear [23].

One notable finding in this study was the absence of consistent significant differences in heart rate values. This challenges our expectations of finding elevated heart rate in relation to emotional arousal. Given the nature of the stimulus material — a suggestive horror game that relies mostly on suspense and atmosphere instead of sudden startle effects — we suggest that the stimulus used might not lend itself well to measurements of heart rate. Another possible explanation is that, since heart rate values were always computed against their respective baseline reading, participants could have had an already elevated heart rate due to stress caused by being part of an experiment or as a result of anticipatory fear, which is a common occurrence before playing a horror game. Future studies could remedy this by establishing a longer baseline procedure, and taking baseline measurements at a separate time from the experiment (e.g. some days apart).

Any study is limited by the technology of the time, especially when investigating the effects of new technologies on users. Participants stated that the VR headset’s cable was a distraction and, for some, a source of anxiety as they were concerned about tripping. Although many current commercial VR sets rely on cables, and our setup is therefore comparable to one that a player might have at home, we cannot exclude the effect this may have had on the gathering of data. For future research we would suggest the use of a wireless VR set when possible to limit this influence. It is also possible that data was influenced by the relative ‘newness’ of VR as a technology, as participants seemed to favor VR during interviews and in questionnaire answers. This influence is likely to decrease as more people become accustomed to VR. While this study provides an important data point regarding the differences in game experience between VR and monitors, future studies will be needed to see whether the findings remain the same as VR becomes more widely used as a medium.

9 Conclusion

With this study we aimed to provide empirical data regarding a general intuition among game developers, namely that horror games in VR provide more intense emotional experiences than when played on a traditional monitor. To this end we developed a horror game and tested it in both conditions, using a mixed-method approach to data gathering to gain a full picture of player experience in both setups.

The data shows that when directly compared after having experienced both conditions, VR is subjectively considered to provide a more intense, frightening experience than playing on a screen. However, most of the data (both in psycho-physiological measures and questionnaire answers) points towards VR being more intense only when playing the game for a second time. This largely contradicts the literature regarding the role of uncertainty in the horror fiction experience, and suggests that VR is less impacted by ‘knowing what is to come’, with players still experiencing notable fear responses. Although our data does not necessarily support the assumption that playing an atmospheric horror game in VR is always more frightening than when played on a screen, it shows that players subjectively consider it to be so. More interestingly, it shows that horror games enjoy a longer-lasting appeal when played in VR than they do on a monitor, suggesting the physical responses and sensation of spatial presence induced by the technology contribute to the continued intensity of the experience. Additionally, this study serves as a first data point for psycho-physiological measures for player experience of horror games in VR, which, we hope, will be a foundation for more empirical research on the topic.