1 Introduction

In recent years, serious games have been extensively employed in the domains of training and education, including areas such as safety and security [24], pedagogy for neurodivergent individuals [29], and support for educational processes [8, 14]. Despite their growing popularity, studies have criticized serious games for their inefficiency as educational or training tools, primarily due to the absence of a personalized user experience and poor attention retention during the learning process [9].

Given the above context and limitations, a line of research on sound design has emerged to study user experience and attention in serious games and educational environments [1, 13, 28]. Sound design is one of the most important components of serious games [13]. However, dedicated and in-depth research on the impact of different sound design techniques on user experience and attention is yet to be conducted.

Sound design for games typically addresses four categories: sound effects, soundscapes or ambient sounds, dialogue sounds, and musical events [3, 6]. In this paper, we address soundscapes and sound effects only. Soundscapes refer to the surrounding sonic environment of a particular scene or location [5]. Their design typically adopts a mixture of real-world recordings and digital sounds [21]. For example, in a driving simulation game, the soundscape can reproduce recordings inside a car from a driver’s perspective, to which some synthesized sounds can be added [25]. Sound effects usually refer to the sounds triggered when players interact with game events. Sound effects can thus establish the link between players and game events or objects.

This paper introduces two experiments to study sound design’s impact on user experience and attention based on the serious game Venci’s Adventures. Three different versions of the game were developed: 1) no sound, 2) standard sound design (including soundscapes and sound effects), and 3) standard sound design with auditory notifications or earcons. Perceptual experiments assessed the best sonic parameters for the earcons’ design and the impact on user experience and attention of the three designed versions.

The remainder of this paper is structured as follows. Section 2 provides an overview of previous research on sound design strategies for enhancing user experience and attention within serious games. Section 3 details the sound design techniques applied to the serious game Venci’s Adventures and used in the perceptual experiments. Section 4 presents the evaluation protocols used in the experiments. Section 5 details the experiment results. Finally, Sect. 6 presents the conclusions of our work and directions for future research.

2 Related Work

Previous studies have focused on the impact of sound design on the video gaming user experience – namely in enhancing the player’s sense of immersion – and attention.

Natasa et al. [21] studied the impact and influence of sound design, namely soundscapes, sound effects, and dialogues, on users’ emotional responses and immersion within augmented reality audio games. The authors found that sound design is important in immersing and emotionally engaging the player in the game world. Emmanouel et al. [23] studied the effect of real-time interactive, ambient, and 3D spatial sounds in the game experience. Results show that interactive sound generation mechanics can improve the immersion and challenge of the game experience.

Furthermore, sonic interaction design, namely sound effects, can enhance players’ experiences, such as enjoyment, learning, motivation, immersion, emotional engagement, and attention in virtual worlds [1, 7, 18, 20, 26, 27, 28, 30]. For example, in the game ‘My Sound Space,’ Eriksson et al. [11] have shown that ambient sound can redirect people’s attention to an ongoing cognitive activity. Falkenberg et al. [12] have shown that peripheral auditory notifications could direct people’s attention in a virtual retail environment.

Sound design is yet to be fully studied and explored in educational contexts. The work of Wang and Lieberoth [28] shows that game-based learning incorporating sound design has a demonstrably positive impact on students’ focus, involvement, satisfaction, knowledge acquisition, and drive during classroom activities when contrasted with the absence of sound. Alseid and Rigas [1] found that the teaching outcome of an online course can be improved by using auditory earcons to convey keyword information to students.

3 Venci’s Adventures: A Serious Game on Cybersecurity Awareness

Venci’s Adventures adopts 2D platforming mechanics within Unity and explores a cybersecurity-themed storyline, challenging players to improve their cybersecurity knowledge. The game has four scenarios: beach, forest, cave, and temple. Figure 1 shows the storyline. Cybersecurity problem-solving challenges in each scenario drive knowledge acquisition. The challenges require the player to answer several quizzes to assess their learning outcomes.

Fig. 1. Simplified storyline transitions of the serious game Venci’s Adventures. The educational part, comprising quizzes, is presented subsequent to the game part within the temple, forest, and cave scenes.

3.1 Soundscapes

Following [10, 19], we adopt soundscapes comprised of continuous representative ‘physical-world’ sounds related to each scenario in Venci’s Adventures. These sounds aim at helping players identify the environment and increase the sense of immersion in the virtual world. To design each soundscape scenario, we followed a threefold method. First, we identified and listed relevant sound sources matching the visual scenarios. Second, we retrieved stereo sound samples for each identified source from the online sound libraries Freesound and Soundcloud. Third, we designed the soundscape narratives using the digital audio workstation Reaper. In the following paragraphs, we detail the soundscape sources of each scene, matching the visual scenarios shown in Fig. 2, and their temporal narrative development. A short rendering of each scene soundscape is included in the supplementary materials to this article, available online at: https://figshare.com/articles/media/Game_sound/23153732.

Fig. 2. Four scenes in the serious game Venci’s Adventures: A. Beach scene B. Forest scene C. Cave scene D. Temple scene.

The soundscape of the beach scene includes the following four sound sources: ocean waves, wind, people playing on the beach, and birds chirping at a distant forest. The sonic narrative starts with waves and wind sounds, as the game character (GC) enters the beach scene. As the GC approaches people playing on the beach, the player is immersed in the auditory experience of the waves and the lively atmosphere of people engaging in beach activities. Specifically designed to enhance realism, the sounds of people playing on the beach seamlessly fade in and out as the GC passes by. Finally, as the GC concludes the beach scene, the player’s auditory perception is enriched by the simultaneous presence of the soothing waves and the melodious chirping of birds within the forest. This incorporation of forest bird sounds serves a dual purpose: to facilitate a seamless transition between scenes and to evoke anticipation for the forthcoming forest scene.
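
The fade of the beach-activity layer as the GC passes by can be illustrated with a distance-based gain. The following is a minimal sketch and not the game's implementation: the function name `proximity_gain`, the linear ramp, and the `audible_radius` parameter are assumptions introduced here for illustration only.

```python
def proximity_gain(listener_x: float, source_x: float, audible_radius: float) -> float:
    """Return a gain in [0, 1]: 1 at the source position, 0 at or beyond the radius.

    Hypothetical helper: a linear ramp is assumed; the source does not state
    which fade curve was used in the game.
    """
    if audible_radius <= 0:
        raise ValueError("audible_radius must be positive")
    distance = abs(listener_x - source_x)
    return max(0.0, 1.0 - distance / audible_radius)


# The layer is silent far away, grows as the GC approaches, and fades again afterwards.
gains = [proximity_gain(x, source_x=0.0, audible_radius=10.0) for x in (-20.0, -5.0, 0.0, 5.0, 20.0)]
```

Multiplying the layer's samples by this gain on every update would produce the fade-in and fade-out described above; a smoother curve could be substituted without changing the idea.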

The forest scene’s soundscape comprises the following four sound sources: birdcall and insect sounds, wind through swaying trees, and distant beach waves. The transition from the beach to the forest scene is marked by the soundscape narrative that evolves around the enchanting birds chirping in the forest and the gradually fading waves. As the GC reaches the midpoint of the forest, the player becomes immersed in the experience of the wind gently rustling through the trees, accompanied by the melodious chorus of birds and insects. Towards the end of the forest scene, the player encounters a quiz on cybersecurity, signaled by the distant birds chirping, serving as an auditory cue that signifies the impending end of the scene.

The auditory landscape within the cave scene encompasses two primary sound sources: the electronically synthesized sound of dripping water and the presence of wind within caves, accompanied by its resonating echoes. The soundscape gradually transitions as the narrative shifts from the forest to the cave scene. The fading melodies of birds and the gentle crashing of waves intertwine with the growing prominence of wind and its echoes, effectively guiding the listener through the scene’s sonic metamorphosis. Upon reaching the midpoint of the cave, the composition expands to include the atmospheric symphony of dripping water, wind, and captivating reverberations within the cavernous space. Finally, as the cave scene nears its culmination, a second cybersecurity quiz emerges, accompanied by the gradual attenuation of wind sounds within the cave, subtly signaling the impending end of the scene.

The soundscape within the temple scene is primarily characterized by the immersive presence of wind sounds. Upon the GC’s arrival at the temple scene via the cable car, the player is greeted by the resonant echoes of wind captured from the lofty heights of the surrounding mountains. As the GC ventures into the temple and proceeds toward the educational section of the scene, the auditory experience subtly shifts. In order to authentically recreate the indoor atmosphere, the player is enveloped by the gentle whispers of the wind, faintly perceptible, fostering a heightened sense of being within the temple’s sacred confines.

3.2 Sound Effects

The crafted sound effects aim to enhance the player’s overall user experience, with an emphasis on fostering a sense of immersion. Two primary types of sound effects have been incorporated into the game design. The first type caters to the actions and movements of the player’s GC, encompassing sounds like footsteps, jumping, falling, and the collection and dropping of puzzle pieces. These sound effects dynamically adapt to the visual context of each scene, taking into account various environmental factors such as soil materials and textures (e.g., grass, wood, stone, water, and sand). This careful synchronization between audio and visual elements ensures a cohesive and realistic gameplay experience.

In contrast, the second type of sound effects is tailored to the objects and entities surrounding the GC within the game world. This includes the likes of seagulls, mosquitoes, flies, collapsing stone pillars, moving stone platforms, and cable cars, as well as the audio cues associated with puzzles and other interactive elements. While the first type of sound effects is presented in mono, the second type embraces stereo techniques, leveraging the spatial positioning within the stereo sound field to enhance immersion. For instance, as players navigate the GC through the forest scene, they can audibly perceive the realistic sound of flying insects originating from various directions, amplifying the immersive qualities of the game world.
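
One common way to place a mono source within the stereo field is constant-power panning. The sketch below illustrates that general technique; the source does not state which panning law the game uses, and the function name `constant_power_pan` and the pan range of -1 (left) to +1 (right) are assumptions.

```python
import math
from typing import Tuple


def constant_power_pan(sample: float, pan: float) -> Tuple[float, float]:
    """Split a mono sample into (left, right) using the constant-power pan law.

    pan = -1.0 is hard left, 0.0 is center, +1.0 is hard right (assumed convention).
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must lie in [-1, 1]")
    angle = (pan + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return sample * math.cos(angle), sample * math.sin(angle)


# An insect circling the GC could be approximated by sweeping pan over time.
left, right = constant_power_pan(1.0, 0.0)
```

Because cos² + sin² = 1, the summed power of the two channels stays constant as the source moves, which avoids a loudness dip at the center.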

3.3 Auditory Notifications

Auditory notifications are based on the design of earcons, i.e., short, structured musical messages. Typically, designers adopt earcons to alert the player to events that require their attention [2, 4, 19, 22]. Based on the guidelines for earcon design reported in Blattner, Sumikawa, and Greenberg [2], we created 22 auditory notifications categorized by three attributes: pitch, timbre, and melodic pattern. Pitch is categorized as either high or low; the split point between the two categories is the C2 note, or 65.41 Hz. Timbre is categorized as either a simple or a complex tone. Melodic pattern is categorized as either a unique tone or a broken chord with an ascending, descending, or combined pattern. Eleven notifications signal incorrect answers (the I-part notifications), while the remaining eleven denote correct answers (the C-part notifications). Table 1 summarizes the attributes of all created auditory notifications. No earcon exceeds two seconds in duration, and we classify each as longer than one second (>1 s) or shorter than one second (<1 s). The supplementary materials to this article, available online at https://figshare.com/articles/media/Earcons/23152871, include all designed earcons.
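
The attribute categorization described above can be expressed as a small data structure. This is an illustrative sketch only: the class `Earcon`, its field names, and the example values are hypothetical, and the handling of boundary cases (a pitch of exactly 65.41 Hz, a duration of exactly one second) is an assumption, since the text does not specify it.

```python
from dataclasses import dataclass

C2_HZ = 65.41  # split point between low and high pitch, as stated in the text


@dataclass(frozen=True)
class Earcon:
    name: str              # hypothetical label
    part: str              # "I" (incorrect answer) or "C" (correct answer)
    fundamental_hz: float  # assumed: pitch is judged on the fundamental frequency
    timbre: str            # "simple" or "complex"
    pattern: str           # "unique", "ascending", "descending", or "combined"
    duration_s: float

    def pitch_class(self) -> str:
        # Assumption: a pitch exactly at C2 counts as "high".
        return "high" if self.fundamental_hz >= C2_HZ else "low"

    def duration_class(self) -> str:
        if self.duration_s > 2.0:
            raise ValueError("earcons do not exceed two seconds")
        # Assumption: a duration of exactly one second falls in the "<1 s" class.
        return ">1 s" if self.duration_s > 1.0 else "<1 s"


# Made-up example values, not taken from Table 1.
example = Earcon("example", "I", 55.0, "complex", "descending", 0.8)
```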

Table 1. Sonic attributes of the 22 earcons designed for the Venci’s Adventures game. The I and C earcons represent incorrect and correct answers given by players during the educational part, respectively. Timbre is classified as simple or complex tones, while pitch is divided into high and low frequencies. The melodic pattern includes unique tones or broken chords with ascending, descending, or combined directions.
Fig. 3. The educational part of the temple scene.

An example of the use of auditory notifications in Venci’s Adventures is shown in Fig. 3. In the quiz part of the temple scene, the player needs to control the GC to select the correct spelling of the web domain by stepping on the underground press button. The earcons serve as the sound of the push buttons.

4 Evaluation

We conducted two experiments. The primary objective of the first experiment was to identify the most effective sonic parameterization for auditory notifications or earcons. The second experiment aimed to assess participants’ game experiences within the serious game Venci’s Adventures under two distinct conditions: with and without sound. To achieve this, we first investigated the general game experiences of participants based on two game versions: the no-sound version, lacking auditory elements, and the standard sound design version, incorporating soundscapes and sound effects per the game’s intended experience. Furthermore, we explored the potential of well-designed auditory notifications to enhance participants’ attention in serious games by comparing participants’ engagement in the standard sound version and an additional notification version. The notification version adds auditory notifications to the standard sound version.

A total of 23 individuals, comprising 8 females and 15 males and ranging in age from 18 to 54, volunteered to take part in our experiments. The age groups of the participants were distributed as follows: 69.6% were aged between 21 and 30 years old, 21.7% below 20 years old, and the remainder were aged between 31 and 40 years old. Most participants (65.2%) had previous experience with online educational tools and were familiar with online questionnaires. Due to confidentiality agreements, all participants were employees of the company Emvenci Portugal Limitada, as the Venci’s Adventures game was still under development at the time of the experiments.

4.1 Experiment 1: Earcons Design

Each participant was guided through a structured two-step protocol for the first experiment. First, demographic information for each participant was collected, including age range, gender, and prior experience with online educational testing. The collected data assisted in establishing a comprehensive participant profile for subsequent analysis.

Second, participants were exposed to two sets of earcons specifically designed to signal incorrect and correct answers in the quiz, referred to as the I- and C-part notifications. Within each set, earcons were presented in a uniformly random order to avoid order effects. After each earcon, participants were prompted to assess its efficacy in signaling an incorrect or correct response. Participants used a seven-point Likert scale ranging from one (not at all) to seven (very much) to rate their perceptions.
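
A uniformly random presentation order within each set can be produced with a standard shuffle. The sketch below is illustrative: the function name `presentation_order`, the labels `I1` to `I11` and `C1` to `C11`, and the fixed seed are assumptions and do not describe the tooling actually used in the experiment.

```python
import random
from typing import List, Sequence


def presentation_order(earcon_ids: Sequence[str], rng: random.Random) -> List[str]:
    """Return a uniformly random permutation of the given earcon labels."""
    order = list(earcon_ids)  # copy so the caller's sequence is left untouched
    rng.shuffle(order)        # Fisher-Yates shuffle: every permutation equally likely
    return order


# Assumed labels for the eleven earcons in each set.
i_part = [f"I{k}" for k in range(1, 12)]
c_part = [f"C{k}" for k in range(1, 12)]

rng = random.Random(7)  # a per-participant seed keeps each order reproducible
i_order = presentation_order(i_part, rng)
c_order = presentation_order(c_part, rng)
```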

We computed descriptive statistics, reporting average ratings and their standard deviation, to analyze the perceptual data regarding the efficacy of different earcons within each set. Earcons with the highest ratings were understood as the most effective in signaling incorrect and correct answers. Furthermore, we assessed user preferences in terms of sound parameters, namely pitch and timbre in the I-part earcons, which provide a more balanced number of examples across parameter changes. To this end, we applied a two-tailed paired t-test, assessing the probability of the two sample data occurring by chance. We adopted \(p \le .01\) as a reference for statistical significance.
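
The descriptive analysis and the selection rule described above can be sketched as follows. The ratings are made-up placeholder values on the seven-point scale, and the helper names `summarize` and `best_earcon` are hypothetical; none of this reproduces the study's data.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def summarize(ratings: Dict[str, List[int]]) -> Dict[str, Tuple[float, float]]:
    """Map each earcon label to (average rating, sample standard deviation)."""
    return {name: (mean(values), stdev(values)) for name, values in ratings.items()}


def best_earcon(ratings: Dict[str, List[int]]) -> str:
    """Return the label with the highest average rating."""
    summary = summarize(ratings)
    return max(summary, key=lambda name: summary[name][0])


# Placeholder ratings from four hypothetical participants.
example_ratings = {
    "A": [5, 6, 7, 6],
    "B": [3, 4, 2, 3],
    "C": [4, 5, 5, 6],
}
selected = best_earcon(example_ratings)
```

Here `stdev` computes the sample standard deviation (dividing by n - 1); whether the study used the sample or the population form is not stated.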

4.2 Experiment 2: User Experience and Attention

The second experiment was structured into four distinct phases. Phase one consisted of a pre-experimental questionnaire and group allocation. In detail, participants were first required to complete a pre-experimental questionnaire, which encompassed demographic information such as age range, gender, and the presence of any hearing impairments. As the experiment heavily relied on auditory stimuli, it was crucial to ensure that all participants had normal hearing capabilities. Based on the collected data, we split participants into two groups (named Groups 1 and 2) to prevent any potential biases or repeated-measures effects.

In phase two, participants were introduced to Venci’s Adventures game and provided consent for video recording. All participants received comprehensive instructions regarding the basic operation of the Venci’s Adventures game. Furthermore, participants were explicitly asked for their consent to have their gameplay actions video recorded for subsequent analysis.

Phase three was distinct for each participant group. Participants in Group 1 assessed the impact of the sound design in Venci’s Adventures using an adapted version of the in-game version of the Game Experience Questionnaire (GEQ) [15, 16]. Participants in Group 1 played the no sound and standard sound design versions of the game. To avoid order effects, the order of the versions was assigned by uniform random allocation. After each version, participants were asked to complete the GEQ, as outlined in Table 2. Depending on the version played, participants were asked to respond to GEQ-NoSound or GEQ-Sound for the no sound and sound design versions, respectively. The questionnaire includes 14 questions assessing players’ competence, immersion, flow, tension, challenge, and positive and negative emotions, based on items proposed in the core guidelines suggested by IJsselsteijn et al. [15]. Each dimension comprises two questions, and the average rating of these questions represents the respective dimension’s score.
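
The scoring rule above (each dimension's score is the average of its two questions) can be sketched as follows. The mapping of question numbers to dimensions in `DIMENSION_ITEMS` is a placeholder assumption, since Table 2 defines the actual assignment, and the response values are made up.

```python
from statistics import mean
from typing import Dict

# Hypothetical item-to-dimension mapping; the real one is given by Table 2.
DIMENSION_ITEMS = {
    "competence": (1, 2),
    "immersion": (3, 4),
    "flow": (5, 6),
    "tension": (7, 8),
    "challenge": (9, 10),
    "negative affect": (11, 12),
    "positive affect": (13, 14),
}


def dimension_scores(responses: Dict[int, float]) -> Dict[str, float]:
    """Average the two item ratings belonging to each GEQ dimension."""
    missing = [q for items in DIMENSION_ITEMS.values() for q in items if q not in responses]
    if missing:
        raise ValueError(f"missing responses for questions: {missing}")
    return {dim: mean(responses[q] for q in items) for dim, items in DIMENSION_ITEMS.items()}


# Placeholder responses of one hypothetical participant to the 14 questions.
responses = {q: 3 for q in range(1, 15)}
responses[1], responses[2] = 4, 5
scores = dimension_scores(responses)
```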

Table 2. Questions of the questionnaire used in the main part of the first experiment.

Participants in Group 2 examined the impact of the earcons pre-selected in experiment 1 on attention retention during the educational parts or quizzes. To investigate whether adding the earcons to the standard game sound design version enhances participants’ attention during important educational segments of the serious game, a tailored questionnaire was developed based on the questionnaire design approach known as the Game Mode Specific Questionnaire (GMSQ), proposed in [23].

Participants played the standard sound design version of the game and the sound design version with auditory notifications or earcons. To avoid order effects, the order of the versions was assigned by uniform random allocation. After each game version, participants were asked to complete the Game Mode Specific Questionnaire for the sound design version (GMSQ-Sound) or for the version with the added earcons (GMSQ-SoundNotify), as presented in Table 3. This questionnaire aimed at capturing the participants’ perception of attention, focus, and response feedback during the quiz parts of the game.

Table 3. Questions of the questionnaire used in the main part of the second experiment.

In both questionnaires, all participants were asked to use the Absolute Category Rating (ACR) system, recommended by ITU-T P.910 [17] and shown in Table 4, to rate their experience independently on a category scale. Responses were given using five categories: Not at all, Slightly, Moderately, Fairly, and Extremely.

Table 4. Absolute Category Rating (ACR) System.

To compute the results and infer trends in the collected data, we calculated descriptive statistics (average and standard deviation). Furthermore, we assessed the statistically significant differences between sound design versions (1 vs. 2 and 2 vs. 3) using a two-tailed paired t-test, assessing the probability of the sample data occurring by chance. We adopted \(p \le .01\) as a reference for statistically significant results.
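
A two-tailed paired t-test of the kind described above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the analysis code of the study: the ratings are made-up placeholders, the helper names are hypothetical, and the p-value is obtained by numerically integrating Student's t density with Simpson's rule rather than by calling a statistics package.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # probability of falling within [-|t|, |t|]
    return max(0.0, 1.0 - central_mass)


def paired_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int, float]:
    """Return (t statistic, degrees of freedom, two-tailed p) for paired samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("samples must be paired and contain at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    df = len(diffs) - 1
    return t, df, two_tailed_p(t, df)


# Placeholder ratings of six hypothetical participants for two game versions.
version_a = [4, 5, 4, 5, 3, 4]
version_b = [2, 3, 3, 3, 2, 2]
t_stat, dof, p_value = paired_t_test(version_a, version_b)
significant = p_value <= 0.01  # threshold adopted in the text
```

The test operates on the per-participant differences, which is why both versions must be rated by the same participants in the same order within the lists.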

5 Results and Discussion

5.1 Experiment 1

Figure 4 shows the average and standard deviation ratings for the I-part (a) and C-part (b) earcons under evaluation. The I6 and C6 earcons had the highest average rating in each part. They were adopted in the Venci’s Adventures game, as they were perceptually rated as the most effective sounds for signaling incorrect and correct answers, respectively. Experiment 2 adopts the I6 and C6 earcons in the sound design version of the game with auditory notifications.

Fig. 4. The average and standard deviation perceptual ratings of the 22 earcons in the I-part (a) and C-part (b) on the left and right, respectively. I- and C-parts signal incorrect and correct answers. The sonic attributes of the designed earcons are shown in Table 1.

In addition, we explored parameter preferences in the designed I-part and C-part earcons in terms of timbre (simple vs. complex tone) and pitch (low vs. high), as shown in Table 1. The two-sample unequal variance test results suggest no statistically significant differences for either parameter under study, with \(p \ge .05\). Nevertheless, the findings indicate a potential association of incorrect and correct responses with low (descending) and high (ascending) pitch, respectively, and a prevailing inclination towards complex-timbre earcons. Future research should conduct a more in-depth analysis of these parameters. A uniform distribution per sonic attribute and an analysis of inter-parameter relations may yield more enlightening results.

5.2 Experiment 2

Figure 5 shows the average ratings of Group 1 responses to the GEQ comparing the no sound (GEQ-NoSound) and standard sound design (GEQ-Sound) versions of the game. As shown in Fig. 5, the standard sound design version ranked much higher than the no sound version in almost all user game experience dimensions – competence, immersion, flow, (low) tension, (low) challenge, positive affect, and (low) negative affect.

The enhanced user experience reported by participants in the standard version can be attributed to the heightened audio-visual synchronicity observed between all scenes and movements of the GC and the corresponding real-world experience. For instance, as players engage in the serious game and progress to the beach scene, they naturally anticipate hearing the sound of waves and birds chirping following the visual depiction of the sea and birds’ presence. The absence of such auditory stimulation can lead to incongruences between the players’ expectations and the game’s imagery, ultimately resulting in a subpar user experience.

The paired t-test results, at a 1% significance level, assessing statistical differences between the no sound and standard sound versions per question, are shown in Fig. 5 as an asterisk (*) after the question number on the x-axis whenever statistical significance is found. The significant differences reinforce the importance of the standard sound design in enhancing the user experience regarding competence (achievements), immersion, flow, positive affect, and low negative affect.

Non-significant statistical differences were found in the dimension of competence (skills) and, more markedly, in the dimensions of tension and challenge. The smaller differences between the two versions in terms of tension and challenge suggest that some designed sounds may have irritated participants, leading to potential distractions. To address this concern in future work, we may consider assessing the component sounds individually before the game experience and, eventually, introducing a greater degree of variation or adopting generative strategies to increase variability over time.

A final note concerns the players’ perception of their competence (\(p \ge .05\)) and challenge (\(p \ge .05\)). The findings signify that sound design has the potential to evoke a dual effect on players, wherein they perceive a reduced sense of competence while simultaneously experiencing heightened levels of challenge. There are two plausible explanations for these outcomes. The first is that the existing four scenes within the serious game may be too straightforward for all players, resulting in the perception that the game can be effortlessly managed without requiring significant skill or exertion. Under such circumstances, it becomes difficult for players to distinguish differences in challenge between the two game versions. The second is that game sound predominantly influences individuals’ subjective experiences during gameplay, exerting only a limited impact on the objective aspects of game-related sensations.

Fig. 5. Comparison of the average ratings of the GEQ-NoSound and GEQ-Sound. Question numbers followed by asterisks (*) denote statistically significant results with \(p\le .01\).

Figure 6 shows the average ratings of Group 2 responses to the Game Mode Specific Questionnaire (GMSQ), comparing the standard sound design version (GMSQ-Sound) with the version including additional auditory notifications (GMSQ-SoundNotify). As shown in Fig. 6, the notification version was rated higher in all questionnaire dimensions – measuring attention retention, focus, attention enhancement, and feedback information. This shows that the added auditory notifications improve the players’ attention during the quiz part of the serious game compared to the standard sound design version.

The paired t-test results, at a 1% significance level, assessing statistical differences between the standard sound version and the added notification version per question, are shown in Fig. 6 as an asterisk (*) after the question number on the x-axis whenever statistical significance is found. The results indicate that, compared to the standard sound version, the auditory notifications had highly significant positive effects (with \(p \le .01\)) on feedback regarding incorrect and correct answers, attention retention, and focus.

The auditory notifications representing incorrect and correct answers can induce the player to focus on the quiz part of the game. Compared to the standard sound design version, in the auditory notification version both the sound signaling a correct answer and the sound signaling an incorrect answer have significantly positive effects on capturing attention during the play stages of the educational part.

Fig. 6. Comparison of the mean values of GMSQ-SoundNotify and GMSQ-Sound. Question numbers followed by asterisks (*) denote statistically significant results with \(p\le .01\).

The auditory notifications signaling correct and incorrect answers have the potential to direct the player’s attention towards the quiz components of the game. In comparison to the standard sound design version, the inclusion of auditory notification sounds denoting both correct and incorrect player responses yields noticeably positive effects in terms of garnering heightened attention during the educational stages of gameplay.

6 Conclusions and Future Work

In this study, we empirically examine the influence of sound design on user experience and attention in the serious game Venci’s Adventures. Two experiments were conducted to assess users’ perception in: 1) selecting the most effective earcons for retaining user attention and conveying feedback on quiz responses and 2) assessing the impact on user experience of three serious game versions – no sound, standard sound design including soundscapes and sound effects, and standard sound design with earcons. Three main conclusions were drawn from our results.

First, in the first experiment, the earcon evaluation led to the selection of the two auditory notifications that proved to be the most effective in retaining the user’s attention and providing clear feedback on the correctness of an answer during a quiz. Further analysis of earcon parameters revealed no significant preferences with regard to timbre (simple vs. complex tone) and pitch (low vs. high).

Second, the standard version of the serious game has been perceived as conveying a significantly positive effect on players’ user experiences in terms of immersion, flow, and (positive) affect, when compared to the no-sound version. Additionally, it resulted in fewer negative experiences in terms of tension and negative affect for players. This can be attributed to the proper sound design in the serious game, which meets players’ expectations by incorporating specific sounds based on their life experiences, thus providing a favorable gaming experience.

Furthermore, an intriguing result was observed. Players did not perceive significant disparities in skill and effort between the standard and no-sound versions of the serious game. Two possible explanations were posited: 1) the existing four scenes of the serious game are too easy for players to discern the variation in difficulty caused by different game versions; 2) the perception of difficulty is objective, and sound design primarily impacts players’ subjective experiences and moods during the game, rather than objective perceptions.

Third, the sound design version featuring auditory notifications or earcons, in comparison to the standard sound design version, exhibited significantly positive effects on attention retention, focus, attention enhancement, and feedback during the educational stages of gameplay (i.e., quizzes). The auditory notification sounds denoting correct answers and those denoting incorrect answers both contributed to these effects.

In summary, sound design constitutes a crucial and indispensable component of serious game development, ensuring an enhanced gaming experience for players. Moreover, employing specific sound types, such as earcons created using musical notes and sine tones, as auditory notifications, can effectively improve players’ attention in serious games.

Future research endeavors will involve a systematic analysis of how the sonic attributes present in earcons lead to variations in implication, as well as the impact of different sound designs encompassing ambient sounds and sound effects on players’ attention in serious games. Based on the findings, generalized good practices for serious game sound design will be unveiled to enhance learners’ attention and improve learning outcomes.