
1 Introduction

Living organisms are in constant interaction with their environment. These interactions allow them to perceive the surrounding world through their senses and, eventually, to act upon it. We, as human beings, experience our environment, as well as our own body, only through our sensory organs. As a result, many life situations involve exploiting more than one sensory channel. For example, mobility often relies on both visual and auditory stimuli. More generally, in many situations, auditory feedback can supplement or reinforce the visual channel [7, 16], since the auditory channel helps monitor the surroundings beyond the limited field of view, over a full 360°. This probably explains why, since childhood, we learn to detect, more or less precisely, where a sound is coming from in space. These considerations may also explain the growing popularity, observed in recent years, of 3D sound in virtual environments (VE). With this integration, when interacting with virtual objects, the sound feedback reinforces and complements the visual information [17]. Such a situation is called a multi-sensory interaction, since more than one sensory channel (audition and vision) is combined for the benefit of the interaction. Multi-sensory rendering can greatly contribute to the authenticity and plausibility of the VE, namely its overall coherence. Other studies question whether multimodal interaction makes users more efficient at performing a given task [23]. In our study, we address an issue that comes before such questions. Indeed, in past studies we observed that the ability to properly assess the direction of a sound varies greatly from one person to another [18]. In particular, we found that training plays a significant positive role in this task [17]. Therefore, we want to help people improve their ability to perceive the direction of a 3D sound simulated on a computer.

To this end, we focus on how to help people improve their perception of the direction of a 3D sound simulated on a computer. Our goal is to provide a tool that helps people develop their ability to detect the direction of a 3D virtual sound. Knowing that serious games are an effective tool for training and learning [2, 21], we hypothesize that playing a serious game should lead people to improve their ability to perceive the direction of a 3D sound. This paper describes the design of that serious game, together with an experiment aimed at assessing its value. We observed that all participants of that study improved their scores, at both levels of the game, after practicing the game for a ten-minute period.

The paper is organized as follows: related work is presented in Sect. 2. Section 3 describes the proposed serious game. The experiment is presented in Sect. 4, before the conclusion in Sect. 5.

2 Related Work

Sight and hearing are the most exploited human senses. They are involved in multiple ways in the perception of the environment. In this section, we briefly review the use of 3D sound. We first describe the cues that help identify where a sound is coming from in space. Second, we review the use of 3D sound in VEs. However, we do not cover the techniques that can be used to simulate a 3D sound in a digital world; among the most popular are ambisonics [9], wave field synthesis and binaural rendering [6].

2.1 Perception of 3D Sounds

From a physical point of view, sound is defined as a mechanical vibration of a fluid, which propagates in the form of longitudinal waves due to the elastic deformation of this fluid. The perception of a sound refers to the reception and interpretation of such waves [26]. Six elements are involved in the perception of sound, namely pitch, duration, loudness, timbre, sonic texture and spatial location [5]. Here, we focus on spatial location. To detect the spatial location of a sound source in space, one has to estimate its azimuth, elevation and depth. For this, three cues are mainly used:

  • Two binaural cues:

    • Inter-aural Time Difference (ITD): the difference in arrival time of a sound between the two ears. The ITD informs about the direction (angle) of the sound source relative to the head; a simple geometric approximation of this cue is sketched at the end of this subsection.

    • Inter-aural Level Difference (ILD): the difference in loudness and frequency distribution of the sound received at the two ears.

  • Spectral cues linked to the HRTF (Head-Related Transfer Function):

    • These cues correspond to changes in the intensities of the frequency components of a sound. They are due to several phenomena (diffraction, reflection and attenuation caused by the shoulders, the head and the outer ears) that modify the sound waves received at each ear.

For more information regarding the estimation of the distance of a sound source as well as the influence of the visual channel, one can refer to [4].
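
To make the ITD cue more concrete, the following minimal C++ sketch (not part of the works cited above) computes the interaural time difference predicted by Woodworth's spherical-head approximation for a source at a given azimuth; the head radius and speed of sound used here are typical textbook values, not measurements from this work.

```cpp
#include <cmath>
#include <cstdio>

// Woodworth's spherical-head approximation of the Inter-aural Time Difference.
// azimuthDeg: source azimuth in degrees (0 = straight ahead, 90 = fully to one side).
// Returns the ITD in seconds; it peaks around 0.65 ms at 90 degrees.
double itdWoodworth(double azimuthDeg,
                    double headRadiusM = 0.0875,  // average head radius (m), assumed
                    double speedOfSound = 343.0)  // speed of sound in air (m/s)
{
    const double kPi = 3.14159265358979323846;
    const double theta = azimuthDeg * kPi / 180.0;  // azimuth in radians
    return (headRadiusM / speedOfSound) * (std::sin(theta) + theta);
}

int main()
{
    const double azimuths[] = {0.0, 30.0, 60.0, 90.0};
    for (double az : azimuths)
        std::printf("azimuth %5.1f deg -> ITD %6.1f microseconds\n",
                    az, itdWoodworth(az) * 1e6);
    return 0;
}
```

The larger the ITD reported by this approximation, the farther the source lies from the median plane, which is exactly the cue the auditory system exploits for azimuth estimation.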

2.2 Use of 3D Sound in VE

In a real-world study, Murray et al. noticed that, when deprived of any sound feedback, people feel as if they were observers rather than actors of their own actions [20]. More generally, it appears that adding sound feedback to a VE may make it more immersive. Here, we briefly review the use of sound in VEs.

One of the first uses of 3D audio in VEs is improving the quality of life of visually impaired people. Over the last two decades, several works have been carried out in this direction, including an interactive storytelling experience performed in a 3D acoustic virtual environment for blind children [14]. Recently, Balan et al. investigated whether training based on haptic-auditory feedback can enhance the sound localization performance, front-back discrimination and navigational skills of visually impaired people [1]. They reported that subjects improved their sound localization performance and reduced the incidence of angular-precision and front-back reversal errors. Moreover, participants became able to build an effective spatial representation map of the acoustic environment. Picinalli et al. [22] assessed VR for assisting blind people in learning architectural environments through an acoustic VR platform. They compared two types of learning through purely auditory exploration and observed that navigation in acoustic VR models provided results comparable to real navigation.

The second and main use of 3D sound in VEs is helping people locate virtual objects. This usage represents the most straightforward application of 3D sound in a VE, as it targets the same advantage as in real-life situations. Recently, we used 3D sound to help users locate virtual vehicles in a truck driving simulator [8, 11, 12]. In the same way, Barreau et al. used 3D sound to enhance the immersive capabilities of a reconstructed sugar plantation [3]. In [10], Gunther et al. studied how adding spatialized sound to a VE can help people navigate through it. They observed that adding 3D sound may reduce the time taken to locate objects in a complex environment, but does not increase the acquisition of spatial knowledge. However, the study of Turner et al. [24] suggests that 3D sound could be exploited to extend the apparent depth of a stereoscopic image. Larsson et al. also studied the impact of the reverberation of sound sources moving in space [13] and concluded that reverberation might enhance the sense of presence in the virtual environment.

Another use of 3D sound is replacing visual feedback with sonification. This approach is widespread when analyzing large datasets. Sonification conveys the spatial and temporal information of raw data through meaningful auditory signals [25], as Menelas et al. did with haptics [19]. 3D sound may also serve to complement or reinforce a visual feedback. In [17, 18], we studied how to exploit audio-haptic interactions for the identification, localization and selection of targets in a 3D VE. We noticed that haptics enables rapid and precise selection. On the other hand, the auditory feedback was particularly important in complex situations, since it allows multiple targets to be perceived at the same time. We also noticed that the ability to properly assess the direction of a particular sound varies greatly from one user to another. In particular, the results suggested that training plays an important role in this task. Therefore, we want to help people improve their ability to perceive the direction of a 3D sound simulated on a computer. These results were a driving force for the design of the current research.

3 Design of the Game

Given that the goal of the proposed game is to help people develop their ability to detect the direction of a 3D virtual sound, we rely on a framework centered on learning for its design. The framework that caught our attention is the one proposed in [15], entitled Motivation and Learning through the facets of a serious game. The design of the game is thus centered on six facets; in what follows, we describe how each of them is implemented in a game that, a priori, should ensure the achievement of the targeted educational goal.

  • Pedagogical objective. The educational objectives of the game are the first step of this design approach. In our case, this facet concerns the identification of the knowledge that can be transmitted through the game. Here, the goal is to enable the user to identify the direction of an incoming 3D sound.

  • Domain simulation. This facet materializes the environment of the game. As stated previously, the objective is to locate the source of a 3D sound, a task that can be found in various situations of everyday life. To represent the context of the game, we therefore chose a scene that matches an everyday situation. Knowing that young people are the primary audience of this game, we retained a presumably familiar environment: a virtual bar (Fig. 1).

    Fig. 1. Representation of the visuals of the game.

  • Interaction with the simulation. This facet represents the level of playability. It determines the relationship between the actions of the player and the feedback of the system, which in turn determines the type of the game. Considering our objective, we want the player to experience the game in a first-person view.

  • Problems and progression. This facet concerns the progression in the game. It defines how to gradually increase the difficulty of each level. The goal is to balance the difficulty so that it is neither too low nor too high: too much ease may create a sense of overconfidence that causes a lack of practice, while too much difficulty may discourage the user. For this, we create several levels with increasing difficulty, which lets the player advance while keeping his motivation.

    The current version of our game has two levels. In the first one, glasses are thrown at the player; they can only come from two directions, right or left. In the second level, the player is put in the shoes of a waiter: he is placed in the middle of a bar where he receives orders. Here, the sounds can come from any direction within 360° around the head.

    Two more levels (3 and 4) are currently under development. At level 3, multiple sounds will be rendered at the same time, under the same conditions as level 2. At this step, there will be no restriction on the number of sounds played: this number will increase from n to n + 1 each time the player reaches 80% of correct identifications of the appropriate source among n possible sources (a sketch of this rule is given after this list). At level 4, the complexity will increase a little more, as sources will also be placed at various elevations.

  • Decorum. Decorum balances motivation and learning. It consists of the many elements that bring fun to the game in order to push the learner to achieve the desired objective. To ensure the entertaining side, graphics and animations are exploited. As an example, at the first level, a correct identification of the position of the sound source is rewarded with a glass of beer, while a failure results in the sound of breaking glass.

  • Condition of use. This facet defines how to exploit the serious game while maintaining its educational and recreational qualities. Given that this game is designed for training, we expect users to operate it under various real-life conditions. For that reason, we made a casual game that is very simple and easy to use.
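
To make the progression rule for the upcoming levels explicit, the following is a minimal C++ sketch, not code from the actual game: it records the outcome of each trial and raises the number of simultaneous sources from n to n + 1 once the player reaches 80% of correct identifications; the block of 10 trials evaluated before applying the threshold is an assumption made for illustration.

```cpp
// Hypothetical progression rule for levels 3 and up (not the game's actual code).
struct ProgressionState
{
    int activeSources  = 1;  // n: sources rendered at the same time
    int trialsPlayed   = 0;  // trials attempted with the current n
    int correctAnswers = 0;  // correct identifications with the current n
};

// Records one trial and moves from n to n + 1 sources once the accuracy
// over the current block of trials reaches the 80% threshold.
void registerTrial(ProgressionState& state, bool correct,
                   int minTrialsPerBlock = 10,   // assumed block size
                   double threshold = 0.80)      // 80% correct identifications
{
    ++state.trialsPlayed;
    if (correct)
        ++state.correctAnswers;

    if (state.trialsPlayed < minTrialsPerBlock)
        return;  // not enough evidence yet at the current difficulty

    const double accuracy =
        static_cast<double>(state.correctAnswers) / state.trialsPlayed;
    if (accuracy >= threshold)
    {
        ++state.activeSources;     // n -> n + 1 simultaneous sources
        state.trialsPlayed = 0;    // start a fresh block at the new difficulty
        state.correctAnswers = 0;
    }
}
```

Resetting the counters after each difficulty increase keeps the 80% criterion tied to performance at the new value of n rather than to accumulated history.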

4 Implementation and Evaluation

Based on the facets described in the previous section, we developed the game using Unreal Engine 4. Here we report the experiment that aimed at evaluating the proposed game. A total of 15 people (3 women) took part in our experiment. Participants were mainly undergraduate and graduate students in various fields, including literature, education and computer science.
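
To give an idea of what a single trial involves at the engine level, here is a minimal Unreal Engine 4 C++ sketch, not the authors' actual implementation: it plays a spatialized sound at a random azimuth around the player, as required at the second level. The function name, the sound asset and the source distance are hypothetical, and the sound is assumed to carry 3D attenuation/spatialization (e.g., binaural) settings.

```cpp
// TrialSoundPlayer.cpp -- hypothetical helper, not the authors' implementation.
#include "Kismet/GameplayStatics.h"
#include "Sound/SoundBase.h"
#include "GameFramework/Actor.h"
#include "Math/UnrealMathUtility.h"

// Plays a spatialized sound at a random azimuth (0-360 degrees) around the
// listener, at head height and at a fixed distance. Returns the chosen azimuth
// so the game can later compare it with the player's answer.
static float PlayTrialSound(AActor* Listener, USoundBase* TrialSound, float DistanceCm = 300.f)
{
    if (!Listener || !TrialSound)
    {
        return -1.f;
    }

    // Pick a random direction around the head (level 2: full 360 degrees).
    const float AzimuthDeg = FMath::FRandRange(0.f, 360.f);
    const float AzimuthRad = FMath::DegreesToRadians(AzimuthDeg);

    // Place the source on a horizontal circle centred on the listener.
    const FVector Offset(FMath::Cos(AzimuthRad) * DistanceCm,
                         FMath::Sin(AzimuthRad) * DistanceCm,
                         0.f);
    const FVector SourceLocation = Listener->GetActorLocation() + Offset;

    // The engine applies the cue's 3D attenuation/spatialization settings here.
    UGameplayStatics::PlaySoundAtLocation(Listener, TrialSound, SourceLocation);

    return AzimuthDeg;
}
```

Returning the drawn azimuth makes it straightforward to score each trial by comparing the true direction with the direction indicated by the player.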

4.1 Hardware

The serious game was tested on a 15.6-inch HD LED laptop with a native resolution of 1366 × 768 pixels. This computer is equipped with a 2.5 GHz Intel Core i5-2450M processor, an NVIDIA GeForce GT 525M graphics card and 8 GB of RAM, and runs Windows 8 Professional (x64). A professional headphone is used for the binaural rendering. The experimental protocol is presented next.

4.2 Experimental Protocol

The purpose of our experiment is to evaluate whether playing the developed game affects a person's ability to adequately identify the source of a sound. For this, we record two scores (before and after using the game) for each level of the game. The test aims to verify whether the player's score varies after using the game. It involves four stages supervised by an examiner.

  1. During the first stage, the experimental protocol is explained to the participant, who is asked to read and sign the consent form. A pre-test form is used to collect some information: age, gender, current occupation and level of familiarity with video games. The test then moves to the second stage.

  2. At the second stage, the participant sits in front of the computer and receives information about the controls of the game: at each trial, he has to click to trigger the 3D sound. The participant is then invited to familiarize himself with the game. Thereafter, he is asked to make 10 attempts for each level. These data are stored as the initial results.

  3. At the third stage, the learner is invited to train with the game for a period of ten minutes. After this time, he moves to the fourth stage.

  4. In the final stage of the test, the participant is once again invited to make ten attempts for each level. As at the second stage, the successes are counted; these constitute the final results (a sketch of how these scores are summarized is given after this list).
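
The scores used in the analysis below can be summarized, for each level, as the mean number of successes over the ten pre-training and the ten post-training attempts. The following minimal C++ sketch, hypothetical and mirroring the protocol above rather than the authors' code, computes these means and the relative improvement that is reported as a percentage in the next subsection.

```cpp
#include <vector>

// One participant's results for a given level: successes out of the
// 10 pre-training and the 10 post-training attempts.
struct LevelResult
{
    int preSuccesses  = 0;
    int postSuccesses = 0;
};

// Mean score (successes out of 10) over all participants.
double meanScore(const std::vector<LevelResult>& results, bool post)
{
    if (results.empty())
        return 0.0;
    double sum = 0.0;
    for (const LevelResult& r : results)
        sum += post ? r.postSuccesses : r.preSuccesses;
    return sum / results.size();
}

// Relative improvement, e.g. (8.67 - 6.26) / 6.26, i.e. about 38% at level 1.
double relativeImprovement(const std::vector<LevelResult>& results)
{
    const double pre  = meanScore(results, /*post=*/false);
    const double post = meanScore(results, /*post=*/true);
    return pre > 0.0 ? (post - pre) / pre : 0.0;
}
```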

4.3 Results and Analysis

All the participants enjoyed the test. The observed pre-training and post-training scores are reported in Figs. 2 and 3.

Fig. 2. Scores achieved at the first level.

Fig. 3. Scores achieved at the second level.

For the first level, we find that all players performed at least as well as before using the game. The average score increases by 38% (from 6.26 to 8.67 correct answers out of 10 attempts) after the training session. Although these results show a clear improvement, at this stage we cannot conclude that the game improves the ability to identify the direction of a 3D sound. Indeed, given the high rate of correct identification at this level, this scenario seems fairly easy to complete. This observation is in line with other studies, which have often reported that discriminating between the left and right directions is easy for humans.

At the second level, the average score increases by 42% (from 3.8 to 5.4) after the training session. The improvement is thus more marked than at the first level, even though this level is clearly more difficult to complete. This suggests that the game could be of good help for improving the ability to locate a sound in a complex 3D scene.

More generally, it is important to note that these results only concern short-term learning. It will be important to assess whether people enjoy the game over long periods of play. Likewise, we want to evaluate for how long people retain the skill acquired in the game. All these aspects need to be studied in future work. We also found that the participants' level of familiarity with video games positively influenced their scores, suggesting that participants who often play video games might score better than the others. Nevertheless, we believe that this relationship should be investigated in detail in a dedicated study.

5 Conclusion

In this paper, we described the design of a serious game for helping people develop their ability to detect the direction of a 3D virtual sound. For this, we exploited a framework that supports learning objectives within an enjoyable environment. The proposed game is set in a bar where the user plays the role of a waiter. He has to locate the position the sound is coming from in order not to break the glasses. At the current stage, the game counts two levels. In the first one, the user has to identify whether the sound is coming from his left or his right. In the second level, the user has to identify the direction of a sound that may come from anywhere around his head. The game has been implemented with Unreal Engine 4. In the reported experiment, we observed that all fifteen participants improved their scores, at both levels of the game, after practicing it for a period of 10 min.

We are now developing the next two levels of the game. Moreover, we want to investigate the impact of using this game over a longer period. To this end, we plan to make the game freely available on specialized markets.