
1 Introduction

Interactive stories are imbued with unique properties that allow them to leverage the emotions of their audience [25]. The emotional impact of interactive stories has even been wielded as a means to change user behavior [1, 3, 7, 15, 26]. Thus, it is useful to capture players’ emotions while they are playing interactive stories to be able to, for example, evaluate the effectiveness of the story for changing user behaviors, drive the path of a nonlinear narrative [16, 21], or maintain a certain emotional experience intended by the author [10]. Typically, emotion capture approaches rely on having players self-report their emotions [29], on physiological methods [8, 16], or on other computational methods (e.g., facial expression recognition). However, physiological sensing and computational approaches can be impractical in casual settings due to the additional equipment or constraints required [29]. In our work, we are interested in self-report approaches to the capture of player emotion.

The concept of diegesis can be understood as the division between the story world and the world we exist in [14, 20]. Prior research has explored the placement of user interface (UI) elements, such as in-game shop interfaces, in interactive media within the diegetic space [11, 17]. Thus, we propose two ways of capturing emotion in an interactive story using the self-report method: a diegetic approach and a non-diegetic approach. Interactive stories are experienced in both the diegetic and the non-diegetic worlds. In the diegetic world, the player assumes their role as the player-character in the world of the story, while in the non-diegetic space, they exist in their own physical world or environment.

This idea is also supported by the definitions of player and character by Carter et al. [6], where the player is explained as a “persistent, socially performed identity”, indicating her existence in the real world; and the character is explained as a “fictional identity within the narrative” or the story world. In other words, we embody the character we control in the world of the story, but still exist in our own identity in the real world outside of the story. The player may feel different emotions depending on whether she adopts the perspective of the embodied character, or her own perspective as an external person responding to the story. For example, a player may feel sad about a specific scenario conveyed by the story (e.g., if the scenario was applied to her own life), but as the character in the story, the player may think that the character should be happy about the scenario given the specific circumstances of the story, the character’s described personality, values, history, etc.

In our research, we are interested in capturing the emotions of players as they exist in the real world in response to playing the interactive story. We conducted a study that investigated how well a diegetic self-report UI (henceforth shortened to DEC method for ‘diegetic emotion capture’) can capture player emotion compared to a non-diegetic UI (henceforth shortened to NDEC method). We also investigated the differences in story experience between the two emotion capture methods proposed. Our results contribute to a new understanding of capturing emotion across the diegetic space in interactive stories, which to the best of the authors’ knowledge has not been explored previously. The specific research questions for our study were:

RQ1: Is there a significant difference between how accurately players self-report their emotions while engaged in an interactive story using a diegetic approach as opposed to a non-diegetic approach?

RQ2: How differently do players experience an interactive story when emotion is captured using a diegetic approach versus a non-diegetic approach?

2 Background and Related Work

2.1 Emotion Capture in Interactive Storytelling

In his review, Zhao [30] delineated three primary ways in which emotion is captured for use in interactive stories, citing the research of Yannakakis and Togelius [29]: subjective (emotion is reported by the player), objective (emotion is captured through physiological means), and game-based (emotion is inferred through interaction). Similar to the subjective approach, we primarily explore the use of a selection made by the players themselves during the story.

In related literature on the capturing of emotion in video games, the focus has primarily been on objective, or physiological, sensing [2, 5, 9], though some have explored the previously mentioned game-based approach [18]. Yannakakis and Togelius also comment on the issues of using physiological sensing, noting the hardware can be intrusive and not always practical for use [29]. A review of 15 articles investigating physiological sensing in video games brings to light certain limitations in the scope of emotion captured by prior research [5]. The kinds of emotions captured by these papers were primarily stress, anxiety, fear, and arousal. This is understandable, given that many of the games used as stimuli in these studies lacked a narrative and were more action-oriented (Tetris, Guitar Hero, racing games), or were games specifically designed to instigate high-arousal emotions (horror games). Though there is plenty of work investigating how the player’s emotion may be used to influence the dynamic adaptation of interactive stories [8, 10, 16, 21], there seems to be scarce literature investigating the capture of emotion for interactive stories (i.e., how players respond to the story). For instance, emotions in an interactive story meant to induce more subjective or complex emotions such as nostalgia, anticipation, or joy may be better captured through a subjective measure in which the player self-reports their emotion than via physiological sensing.

2.2 Diegesis and Diegetic UI

An early definition by Gérard Genette in 1969 described diegesis as the “spatiotemporal universe” of the story [4]. This concept has long been applied to film theory to differentiate between the narrative and elements of the film which are “only available to the audience”, such as the musical score or titles [19]. A simple way to understand diegesis in interactive stories is to picture it as the difference between the world of the story and the reality we exist in [20]. Indeed, diegesis defines the boundary at which the story world ends and the ‘real’ world begins [14]. In the study of interactive media, there has long been discussion of whether elements exist in the story world, or diegetic space; the real world, or non-diegetic space; or perhaps blur the lines between the two [13, 14]. The idea of diegetic user interfaces has been increasingly explored, particularly with regard to video games [11, 17, 27]. A key consideration when it comes to diegetic and non-diegetic elements in this realm is that while diegetic elements serve better to engage the player in the story, non-diegetic elements provide useful, and often necessary, information that aids gameplay [11]. The focus thus far has been on the impact of diegetic UI on immersion, with current research indicating that removing non-diegetic elements increases the player’s experienced immersion [11, 17, 27]. Elements that have been explored include in-game shop interfaces [17] and the heads-up display (or HUD, which can include displays for player life, score, and menu icons) [11, 19]. With regard to the HUD, it has even been proposed that, because of the useful information it provides, the HUD could be implemented through augmented reality rather than removed entirely, to increase immersion for these typically non-diegetic features [27].

In a discussion of emotions with respect to diegesis, different conceptualizations of the player also become relevant. The definitions of player and player-character, such as those presented by Carter et al. [6], seem to indicate a separation of the two across diegetic space. The player is the individual in their own reality experiencing the interactive story or game, whereas the player-character is that same individual embodying a persona within the story world. It is unclear whether the player therefore exists solely in the non-diegetic space and the player-character in the diegetic space, or if the relationship suggests a blurring of the player between the two as the player navigates two realities at once. Character identification could be an influencing factor of this experience, which Van Looy defines as consisting of the degrees to which “the player desires to be more like their avatar”, “the player sees their avatar as similar to themselves”, and “the player feels as if they are the avatar when playing the game” [28]. Schneider et al.’s work suggests that the simple presence of a story in a game increases identification between the player and character [23]. It is important to consider that the player experiences the story both as themselves and as the player-character when designing subjective emotion capture methods, as the emotion reported may be influenced by the emotions of the player-character. The question is whether the distinct embodiments of experiencing emotions can be disentangled.

3 System Description

3.1 Interactive Story Designs

To investigate the effects of a diegetic approach to emotion capture as opposed to a non-diegetic approach and to answer our research questions listed in the Introduction, we designed two interactive stories. Our interactive stories were built through the online survey platform Qualtrics. The stories were both about a stressful semester and final exam of a college student, explored through vignettes drawn from some of the researchers’ own experiences. The theme was chosen specifically for an audience of college students, given that we recruited participants from an undergraduate computer science course for our study. We chose to align the theme to the sample population to ensure that the stories written have a greater chance of resonating emotionally with our participants, and for participants to identify with the characters in these stories. The basic plot points the stories covered were:

  1. It’s the start of the semester and the character is enrolled in several courses.
  2. The character looks up their professor’s rating online and it looks bad.
  3. As the semester goes on, the character is missing classes and assignments, causing worry for their grade.
  4. The character goes to their professor’s office hours.
  5. The character is invited to go for a night out on the town.
  6. A week later, the character ends up sick with the flu.
  7. The character reaches out to classmates and starts a study group.
  8. The final exam is soon, so the character starts pulling all-nighters to prepare.
  9. On the day of the exam, the character’s alarm does not go off.
  10. The character makes it to the exam room and struggles with their laptop not being charged and phone going off while taking the exam.
  11. The character finishes the exam, tired but hopeful for future semesters.

The two stories were made to be comparable in the sense that in either story, at any given scene, the same events and general reactions are covered. In one story, the main character’s name was Taylor, and in the other, the name was Ash.

To serve as an example of how these stories were unique but still comparable for the purposes of the study, one chapter of Ash’s story was: Okay, so the plan is to go to the first class and pass judgement then. And, as it turned out, the professor actually seems pretty friendly and willing to work with students... But partway through the semester, a few weeks later, it dawns on me that I’ve been missing class left and right, not turning in assignments, not keeping up with the coursework. I’m starting to get anxious for my grade.

For Taylor’s story, the corresponding chapter was written as: I’ll pay really close attention to the first class and see what it’s going to be like for myself. The professor doesn’t actually seem that bad... As the semester goes on, I realize I’ve been missing a lot of classes, and forgetting to turn in assignments on time. There’s an anxious pit in my stomach. Things aren’t looking great...

In both stories, we chose a gender-neutral name and did not specify the characters’ gender at any point, to prevent gender bias from affecting character identification. The characters also did not have a visual avatar, but were only portrayed textually through descriptions of their thoughts and experiences. The stories used a text-based choose-your-own-adventure format, where each scene presented consisted of text describing the scene, and then four player options for how to proceed (see Fig. 1 for a mockup of both emotion capture approaches, as well as a screenshot of the interface). It should be noted that while the story took on the appearance of a branching narrative from a player’s perspective, every story choice in a given chapter led to the same following chapter.

Fig. 1. A mockup of our interface, showing a story scene that implements the NDEC method (left), the DEC method (right), and a screenshot of our interface (below).

3.2 Capturing Player Emotions In-Game

Given our focus on a self-report method, the emotion capture method took the form of choices the player would make themselves throughout the interactive story. In both the DEC and NDEC, we made it clear to participants to make their choice according to their own emotions. Each story chapter page emphasized the following note: “Note that by ‘you’, we mean you as in the person you are every day, existing in the world outside of this story”. This note was adapted from the definition of the player found in Carter et al. [6].

We extracted the emotion options for the capture of player emotion while playing an interactive story from the Differential Emotions Scale (DES) [24]. This scale was selected for its variety of discrete emotions that suited our purpose of capturing emotion subjectively. However, we only considered 4 core emotions from the DES (enjoyment, sadness, anger, and surprise), which were presented as forced-choice options to the player. The reason for not providing all 10 emotions measured by the DES is that we administered the scale itself to capture baseline emotion (see Sect. 3.3), and answering the full scale for all 10 emotions during engagement in the story would have become unwieldy. The emotion options question was also accompanied by a Likert-type question asking participants to rate how much the emotion option that they selected matched their emotions regarding the story scene, on a continuous scale of 1 (not at all) to 5 (a lot), input through a slider that allowed for decimal values. We used this value later in our analysis to calculate the ‘strength’ of the player’s indicated emotion. Depending on the emotion capture approach (diegetic or non-diegetic), the 4 focus DES emotions were conveyed as options differently. Visual outlines of how the emotions were presented in each approach can be found in Fig. 1, and are described below.

In the DEC Method, the emotion options are embedded in action options to allow the story to progress. Thus, we placed the emotion capture within the story context itself for our diegetic approach. For example, the ‘Anger’ emotion option embedded with a story option in the third chapter of the stories is: “This is not my fault, the TA’s won’t accept my excuses!” Here, players choose an option that reflects their own emotions as they play through the story while simultaneously choosing how the character responds to the story scene. Embedding emotions in story choices introduces a certain degree of subjectivity in how those emotions are perceived in the prose of the story choice. To validate that our emotion-embedded story choices reflected the correct emotions, we ran a pre-study with 7 participants. Participants were recruited via word-of-mouth, and included 5 females and 2 males, with 3 between the ages of 18–24 years old, 3 between 25–39, and 1 between 40–60 years old. Participants were asked to sort the 4 options for each story scene into categories representing each of the 4 emotion words. This essentially took the form of a place mat with a box for each of ‘Happy’, ‘Sad’, ‘Anger’, and ‘Surprise’, into which cut-outs of the prompts were sorted in sets of 4 per story chapter. An emotion-embedded story prompt was judged as successfully reflecting the desired emotion if at least 70% of participants identified the emotion correctly. Twenty-eight of the 40 emotion prompts were identified accurately by 100% of the participants, 10 by 85.7%, and 2 by 71.4%. We proceeded with this set of emotion-embedded choices for the DEC method.
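The validation criterion above can be illustrated with a minimal sketch. The function names and the example labels are hypothetical and not taken from the study materials; the sketch only assumes that each of the 7 pre-study participants sorted a prompt into one emotion box and that a prompt passes when at least 70% of them chose the intended box.

```python
def agreement_rate(sorted_labels, intended):
    """Fraction of participants who sorted a prompt into the intended emotion box."""
    return sum(label == intended for label in sorted_labels) / len(sorted_labels)


def passes_validation(sorted_labels, intended, threshold=0.70):
    """True if the prompt meets the (assumed inclusive) 70% agreement threshold."""
    return agreement_rate(sorted_labels, intended) >= threshold


# Hypothetical sorting results for one 'Anger' prompt from 7 participants.
labels = ["Anger"] * 6 + ["Sad"]
print(round(agreement_rate(labels, "Anger") * 100, 1))  # 85.7
print(passes_validation(labels, "Anger"))               # True
```

With 7 participants, the reported agreement levels of 100%, 85.7%, and 71.4% correspond to 7, 6, and 5 matching participants respectively, so 5 matches is the smallest count that clears the threshold.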

In the NDEC Method, the emotions are provided as separate choices from story options, thus existing outside of the story for the non-diegetic approach. The story options were written to be outlook-based, based on how the character looked toward their future, and not associated with any particular emotion. An example of one of the non-diegetic story choices is “I think if I spend my energy wisely now and do good work during this all nighter, I’ll be able to improve my grade.” Separately from their story choice, players would choose an emotion option from the emotion keywords of ‘Enjoyment’, ‘Anger’, ‘Sadness’, or ‘Surprise’ to indicate their own emotions.

3.3 Capturing Baseline Player Emotions

In order to measure how accurate either emotion capture method is, an appropriate baseline of the actual emotion experienced by the player is needed. Baseline emotion for the player and player-character was captured using the Differential Emotions Scale (or DES) [24]. The DES consists of 3 emotion keywords each for a total of 10 emotions. Given that our emotion capture methods only captured 4 core emotions (see Sect. 3.2), we administered the DES with only the 12 emotion keywords associated with those 4 emotions: Enjoyment, Anger, Surprise, and Sadness. Like the ‘strength’ value, the DES scale to capture the baseline emotion of the player was also provided on a continuous scale of 1 (not at all) to 5 (a lot), with a slider that allowed for decimal values.

Baseline emotion was captured for both the player’s and the player-character’s emotion. On the questionnaires which administered baseline emotion, notes were used to explain the difference between the player’s and the player-character’s emotion, which were adapted from definitions of player and character found in Carter et al. [6]. For the interactive stories, we did not capture emotion baseline at every story chapter, instead choosing 3 which were evenly spaced to be representative of the story’s beginning, middle, and end (Chaps. 2, 6, and 10). This was to prevent the study protocol from becoming cumbersome for participants.

4 Study Description

4.1 Study Design

The study used a within-subjects design, with ‘emotion capture approach’ as the single independent variable. Emotion capture approach had two levels: diegetic emotion capture (DEC) and non-diegetic emotion capture (NDEC). Thus, participants played through two different but comparable interactive stories, one of which utilized a DEC method, and the other, an NDEC method. The order of the two stories and the order of the two capture methods were both counterbalanced, for a total of 4 possible condition orders spread across the participant sample.
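As an illustration of how the 4 condition orders arise, the following sketch crosses the two story orders with the two method orders. The round-robin assignment in `assign` is an assumption made for illustration only; the actual sorting mechanism used by the study link is not described here.

```python
from itertools import product

STORY_ORDERS = [("Taylor", "Ash"), ("Ash", "Taylor")]
METHOD_ORDERS = [("DEC", "NDEC"), ("NDEC", "DEC")]


def condition_orders():
    """Cross story order with method order: 2 x 2 = 4 condition orders.

    Each order is a pair of (story, method) sessions played in sequence.
    """
    return [tuple(zip(stories, methods))
            for stories, methods in product(STORY_ORDERS, METHOD_ORDERS)]


def assign(participant_index):
    """Hypothetical round-robin assignment of a participant to a condition order."""
    orders = condition_orders()
    return orders[participant_index % len(orders)]


for order in condition_orders():
    print(order)
```

Each participant thus sees both stories and both capture methods exactly once, and across the 4 orders every story is paired with every method in both the first and second position.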

RQ1 Measures. For our first research question, which investigates how well each emotion capture approach is able to capture players’ emotions during the course of an interactive story, we operationalized ‘player emotion’ as either the player’s emotion as an external user (em-player), or the player’s emotion as the story character (em-char). For the former, we measured the extent to which emotions players indicated during the story (em-dur) match the emotions they indicated on the baseline DES scale answering as themselves (em-base-player). For the latter, we measured how much em-dur matches the emotions players indicated on the baseline DES scale answering as the player-character (em-base-char).

RQ2 Measures. For our second research question, which investigates differences in story experience between the two emotion capture approaches, the following measures were taken. Overall experience was assessed by administering the Game Experience Questionnaire (GEQ) [12] at the end of each story, with the items adapted slightly to accommodate an interactive story instead of a video game (which the questionnaire was originally designed to assess). The GEQ contains 7 subconstructs: competence, sensory and imaginative immersion, flow, tension/annoyance, challenge, negative affect, and positive affect. At the end of each story, we also measured character identification, using the scale from Van Looy et al. [28]. Only items for the avatar identification construct were used, as the other constructs (game identification and group identification) did not apply to an interactive story of the type used in the study. For three story chapters in either story, we also measured relatedness to the story event via the following single question: “How much can you relate to this event from your own experiences?” This question was on a scale of 1 (not at all) to 5 (a lot).

We also included comparison questions after both stories had been played through. These questions asked participants to compare the DEC method and the NDEC method on different aspects. For each question, participants had to score each emotion capture approach on a scale of 1 (not at all) to 5 (extremely), followed by answering an open response question to explain their scores. The comparison questions were as follows: How easy did you find... 1) it to make the decision of how Taylor/Ash would respond in either story?; 2) it to choose your emotion in either story?; and 3) each story in terms of your ability to go through the story feeling immersed and fully involved in the story as a whole?

4.2 Participants and Protocol

We recruited our participants from an undergraduate computer science course through a participant pool system. Participants were provided with extra course credit in exchange for their participation in our study. We had a total of 64 participants, 50 male and 14 female. Participant ages ranged from 18 to 24 years (62 participants) and 25–39 years (2 participants). Demographics included 27 participants identifying as White, 15 as Hispanic or Latino, 3 as Black or African American, 17 as Asian or Pacific Islander, and 2 as other.

The study was conducted entirely online. Participants experienced our stories and questionnaires via the Qualtrics survey platform. Upon signing up for the study, they were provided with a link that automatically sorted them into one of our 4 study condition orders. The Qualtrics questionnaire guided the participants through the steps shown in Fig. 2. Our protocol for the individual stories is shown in the expanded portion of Fig. 2.

Fig. 2. A diagram of our study protocol.

5 Data Analysis and Results

5.1 RQ1: Accuracy of Captured Player Emotion

We calculated the accuracy of the captured emotion to participants’ emotions as the player (em-player) and to participants’ emotions as the player-character (em-char) as the difference between the captured emotion’s ‘strength’ and the average score of the keywords for that emotion on the baseline DES that participants filled in at the end of 3 chapters. Accuracy scores for the 3 story chapters were averaged for each emotion capture approach to obtain a single accuracy score for em-player and a single score for em-char, for each study condition. Before averaging, we ensured that there were no significant differences among the 3 plot points for both em-player and em-char. Friedman tests were run on the accuracy scores with plot point as the independent variable for each condition. No significant differences were found, indicating that the plot points were comparable. Thus, the final accuracy score for em-player for each condition was calculated as (chapter2(em-dur − em-base-player) + chapter6(em-dur − em-base-player) + chapter10(em-dur − em-base-player))/3. The final accuracy score for em-char was calculated similarly, with em-base-char instead of em-base-player.
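The accuracy calculation can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function names and the example values are hypothetical, and the sketch takes the signed difference exactly as the formula is written (whether an absolute difference was used is not stated).

```python
from statistics import mean

BASELINE_CHAPTERS = (2, 6, 10)  # chapters where the baseline DES was administered


def chapter_difference(strength, baseline_keyword_scores):
    """em-dur minus em-base for one chapter.

    strength: slider value (1 to 5) given for the emotion chosen during the story.
    baseline_keyword_scores: the 3 DES keyword scores for that same emotion.
    """
    return strength - mean(baseline_keyword_scores)


def accuracy_score(per_chapter):
    """Average the per-chapter differences over chapters 2, 6, and 10.

    per_chapter maps chapter number -> (strength, [three keyword scores]).
    Passing player baselines gives the em-player score; passing
    player-character baselines gives the em-char score.
    """
    return mean(chapter_difference(*per_chapter[ch]) for ch in BASELINE_CHAPTERS)


# Hypothetical data for one participant in one condition.
example = {
    2: (4.0, [3.0, 3.5, 2.5]),
    6: (3.0, [2.0, 2.0, 2.0]),
    10: (5.0, [4.0, 4.5, 3.5]),
}
print(accuracy_score(example))  # 1.0
```

Under this definition a score closer to zero means the in-story report sits closer to the baseline, which matches the reading that smaller scores reflect greater accuracy.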

Much of our data was not normally distributed as assessed via the Shapiro-Wilk test, so non-parametric tests were used to compare the study conditions. A Wilcoxon signed-rank test was run on the final accuracy scores for em-player and em-char with emotion capture method as the independent variable. A statistically significant difference between the 2 emotion capture methods was found for both em-player (Z = −4.220, p < 0.001) and em-char (Z = −4.561, p < 0.001). For em-player, the NDEC method had a lower mean (M = 0.705; SD = 0.985), indicating higher accuracy with respect to the player emotion, than the DEC method (M = 1.442; SD = 0.961). Note that smaller scores reflect greater accuracy. For em-char, the NDEC method showed the lower mean as well (M = 0.318; SD = 1.068), compared to the DEC method (M = 1.238; SD = 0.930).
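For readers unfamiliar with the test, the sketch below shows one common way a Wilcoxon signed-rank Z statistic is computed for paired scores: zero differences are dropped, tied absolute differences receive average ranks, and a normal approximation with tie correction (and no continuity correction) is applied. These choices, the function name, and the example data are assumptions for illustration; in practice a statistics package would be used, and it may apply different conventions.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (w_plus, w_minus, z, p) for paired samples x and y.

    p is the two-sided p-value from the normal approximation.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    expected = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, z, p


# Hypothetical paired accuracy scores (NDEC vs. DEC) for five participants.
ndec = [1, 2, 3, 4, 5]
dec = [2, 4, 6, 8, 10]
w_plus, w_minus, z, p = wilcoxon_signed_rank(ndec, dec)
print(w_plus, w_minus, round(z, 3))  # 0 15.0 -2.023
```

Because the statistic is built from the smaller rank sum, Z is never positive under this convention, which is consistent with the negative Z values reported above.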

5.2 RQ2: Story Experience

Friedman tests were conducted on story relatedness scores across the 3 plot points, separated by condition, to ensure that the plot points were comparable enough to be averaged together. No significant difference was found. We averaged story relatedness scores for the 3 plot points for our final analysis.

Wilcoxon signed-rank tests were conducted to compare the two approaches in terms of all aforementioned story experience measures. Only measures that showed significant differences between the two emotion capture methods are reported below. Significant differences between our two emotion capture methods were found for the ‘negative affect’ subconstruct of the GEQ (Z = −2.180, p = 0.029); ease of choosing action, as measured by our first comparison question (Z = −2.018, p = 0.044); and ease of feeling immersed, as measured by our third comparison question (Z = −2.428, p = 0.015). For negative affect, the NDEC method had a higher mean (M = 2.635; SD = 1.002) than the DEC method (M = 2.437; SD = 1.028). For ease of choosing action, the DEC method had the higher mean (M = 3.453; SD = 1.208), compared to the NDEC method (M = 3.047; SD = 1.174). For ease of feeling immersed, the DEC method had the higher mean (M = 3.063; SD = 1.296), compared to the NDEC method (M = 2.688; SD = 1.194).

5.3 Comparison Questions

To analyze the open-ended responses from the questions asking participants to compare the DEC and the NDEC methods, the qualitative coding process as specified by Saldaña [22] was followed. One coder did two cycles of coding on all the responses, first assigning descriptive codes, and then assigning categorical codes from the descriptive codes. Based on the categorical codes, the coder developed an initial coding scheme. Three other coders used the initial coding scheme to independently do 3 coding passes on the responses. Discussions were held among all 4 coders after each coding pass. Amendments were made to the coding scheme during each discussion, and the coders used the revised scheme each time for their next coding pass. The final intercoder agreements established were 71.7% for comparison question 1, 82.4% for question 2, and 70.0% for question 3. An intercoder agreement level of 70% is typically deemed acceptable. The final coding scheme can be found in Table 1, with the percentage frequency of occurrence of each code. We note that in our coding, one participant response could contain multiple codes. The frequency percentages are organized by whether the response indicated a preference for the NDEC method, the DEC method, or the same for both methods.

Table 1. Our final coding scheme for the comparison question responses, broken down by approach preference. (no. of code occurrences/total no. of code occurrences, in %’s).

6 Discussion

Our results showed that the NDEC method is significantly more accurate at capturing player emotion than the DEC method (RQ1). We attribute this to the fact that while the NDEC method explicitly asks for player emotion via simple emotion words, the DEC method blurs the story world and the real world in terms of player emotions. Our other results, however, indicate that this does not mean the NDEC method is inherently superior to the DEC method for emotion capture. In terms of story experience, we saw that the NDEC condition scored higher in terms of negative affect, and the DEC condition scored higher in terms of ease of choosing action and ease of feeling immersed. Additionally, in our qualitative responses, the category code ‘Degree of immersion’ in Table 1 appeared only in responses where the DEC condition was rated higher (and thus, more preferred) than the NDEC condition. These results are in line with prior research (e.g., [11, 17, 27]) that showed diegetic UI elements result in higher immersion. Our work thus suggests that the same holds for emotion capture in interactive stories.

However, our study results further showed that while playing through an interactive story, irrespective of emotion capture approach, participants more accurately report the emotions of the player-character than their own emotions. In both the DEC and the NDEC conditions, the accuracy scores calculated were smaller for the player-character than for the player. This was unexpected, as the emotion capture methods explicitly requested the player’s emotion, not the player-character’s. This finding is supported by the large frequency of responses addressing the difficulty participants had in decoupling their own emotion from the character’s emotion. One example of such a response is: “It is natural to have an emotion to scenarios you are put into. It is not so easy to separate your own emotions from those of a character and think about how they might/should react”. This finding suggests that capturing emotion via a self-report method within an interactive story can lead to a better capture of player-character emotion than of the emotions of the player.

In summary, the implication of our study results for player emotion capture within the design of interactive stories is that approaches have to be found that retain the capture accuracy of non-diegetic UI without compromising the player’s story experience. For example, one approach may be to make the prompt for the story choice directly address the player by breaking the fourth wall (e.g., ‘What do you want to do?’). This may make the prompt to self-report the player’s own emotion as they exist in their own reality less jarring, and prevent a break in immersion, while maintaining the accuracy of the emotion capture method. Another approach may be to ensure that the player identifies fully with the player-character prior to engagement in the interactive story. If it can be assumed that the player’s emotions will be the same as or very similar to the player-character’s, then using a diegetic UI to capture player emotion should be accurate. Such an approach may also help to resolve the problem that players tend to better convey their perceived player-character emotions (instead of their own emotions), irrespective of the type of UI used.

7 Conclusion

We conducted an investigation of two means of self-reporting emotion in interactive stories: a non-diegetic emotion capture (NDEC) method and a diegetic emotion capture (DEC) method. Our results showed that while the NDEC method resulted in higher accuracy to player emotion, the DEC method provided a better story experience for participants. We also found that with either method, emotion capture using self-report was more accurate to player-character emotion than to player emotion. We contribute an understanding of self-report emotion capture as a diegetic user interface element, backed by empirical evidence, and suggest that future work should seek hybrid methods, blurring the lines of diegesis to mitigate some of the effects observed. Limitations of our work include that our interactive stories were simple in nature and solely text-based, and that our stories were designed specifically for a college student population. Additionally, as this was an online study, we had no way to ensure participants paid full attention while taking part in our study.