1 Introduction

Video games are changing the way people perceive and interact with the world around them. Beyond entertainment, they are increasingly used for education, health, behavior change, rehabilitation, and other serious purposes [20, 31, 45]. Many video games have been identified as contributing to creativity and behavior change, while others, known as exergames, stimulate energy expenditure for exercise and rehabilitation purposes.

Despite the increasing interest in video games and their various social, educational, and health benefits, a significant number of people are deprived of these games because of visual impairments [20]. In 2018, the World Health Organization estimated that 285 million people worldwide were visually impaired, 39 million of whom were blind [41]. Consequently, there is a moral and legal obligation to make video games accessible to visually impaired people. In the United States, educational institutions that rely on federal funding are required to make their information technologies accessible to these people.

Visual impairment can create challenges for children in accessing various opportunities, including video games [21]. These children are reluctant to be involved in complex and social activities [5] and usually withdraw from others [8]. Some online virtual worlds, such as Second Life, offer a meeting place for those feeling socially isolated, but visually impaired individuals are excluded from these communities due to accessibility problems. In addition, access to video games can offer new educational opportunities for individuals with visual impairments. Since people with visual impairments have limited access to physical education, recreation, and athletic programs, they are generally inactive. Exergames have particular benefits for these individuals, as they can be played in a controlled environment with minimal risk of injury. Developing accessible video games can therefore open up several of these opportunities for visually impaired people.

Two important factors that can enhance the quality of life of blind people are orientation and mobility [30]. Research has shown that audio games can improve orientation and mobility in a pleasant and entertaining way [35]. Game-based approaches to training navigation in unfamiliar environments can help players build a reliable spatial cognitive map of their surroundings. The primary objective of this research is to develop a computer game for visually impaired individuals that enhances their orientation and navigation skills. We aim to use audio interfaces to develop audio games that are accessible to visually impaired people.

Various audio cues can be used to help players recognize objects in the gaming environment. For example, verbal notifications can inform players about their location and about which tasks must be accomplished. Non-verbal cues, on the other hand, can convey meta-level knowledge about target objects’ location, direction, and distance, helping players build spatial cognitive maps. To this end, the sonification process in audio games assigns audio cues to game objects, characters, and events. Various auditory dimensions can be used to describe a game space completely; most commonly, variation in the loudness (amplitude) of a sound encodes the distance to a game object. The auditory dimension used in the game space must convey the desired perception for effective spatial navigation. In this paper, we study which sonification and interaction techniques can best accomplish this goal. The objectives of this research are:

  • Designing and developing an audio game for visually impaired people to improve navigation skills.

  • Investigating the capacities of different sonification and interaction techniques in developing computer games for visually impaired individuals.

  • Studying techniques to represent spatial information for visually impaired people.

2 Background and related work

Interaction design is one of the most important issues in the design of a computer game experience [46, 47]. According to the interaction model proposed in [57], interaction in video games has three steps: receiving stimuli, determining a response, and providing input. Games provide visual, auditory, and haptic stimuli that can be primary or secondary. Perceiving primary stimuli is necessary to play the game, while secondary stimuli supplement a primary stimulus. Almost all primary stimuli in video games are visual [20]. For example, in first-person shooter, racing, role-playing, and many other games, a player cannot play without perceiving the visuals; sound (e.g., the sound of approaching enemies) and haptic feedback (e.g., a vibrating controller) provide only supplementary information. In some dance/music games, visual and audio stimuli are both primary. The lack of supplementary stimuli, such as sound or haptic feedback, can nevertheless diminish the gaming experience.

The second step in the interaction model of games is determining the response, in which a player determines the best response to the stimuli in the game. This decision is selected from the set of available actions in the game. In a soccer game, this can be passing the ball to other players or kicking the ball when the player is in a particular position. After making this decision, the third step in the interaction model is providing input, in which the player physically issues the selected action. Generally, this is realized through physical devices such as game controllers, keyboards, or mice.

The main barrier for individuals with visual impairment is in the first step, where they have difficulty perceiving primary stimuli. This affects the second step, making it impossible for players to determine appropriate actions. Although some games provide audio stimuli in addition to visual stimuli, these do not contain enough information to act on: the sound of footsteps, for example, may represent an approaching enemy or a friend. Consequently, when the reception of stimuli is interrupted, the player is unable to provide appropriate input to the game.

Human listening can be divided into three modes: causal listening (listening with an attempt to understand the source of a sound), semantic listening (understanding auditory codes such as Morse code), and reduced listening (attending to the qualities of a sound without regard to its source, e.g., appreciating music with attention to pitches and harmonies) [15]. More than one listening mode can be engaged simultaneously, so computer games can encourage multiple ways of listening. For example, codes in the sounds of allies and enemies can guide players while the sounds of ornamental objects (e.g., walls) simultaneously function as navigational aids.

With the three listening modes (causal, semantic, and reduced) as vertices of a design space, both musical and authentic sounds can be positioned within it. Different layers of information can be embedded in game sound. In the game Tim’s Journey, the functional information in the sound of a machine sits at a different level from the mere presence of that machine: a malfunctioning machine has an additional, different sound reflected at a meta-level. Such layers of information can be conveyed by the sound itself or by rhythmical patterns over time.

2.1 Game design approach

Two approaches can be used to provide games for visually impaired individuals. In the first approach, the interface and design of mainstream games are modified such that they can be played by individuals with visual impairment. In the second approach, specific games are designed for visually impaired individuals.

Almost none of the mainstream virtual-world games are accessible to visually impaired individuals. This includes both game-like virtual worlds (e.g., World of Warcraft) and non-game virtual worlds (e.g., Second Life). Zone BBS is an example of a game platform for blind people that is also enhanced with socialization functionality; it has a community of around 7000 blind players and offers various online multiplayer games. PCS is another platform offering several multiplayer games for visually impaired people. There have been efforts to make mainstream games accessible to blind users. For example, a technique proposed in [20] lets players navigate their avatar using force feedback, with different vibration frequencies identifying different object types in the virtual world. The main problem is that users must memorize the vibration frequencies of the object types, which limits how many types can be distinguished. In Max, the virtual guide dog, chat-like interface feedback is used for this purpose. In Alphaworks, hotkeys are used for navigation and communication [20]; this interface is limited in the number of actions it can support, although the system also makes it possible to tag objects.

The accessibility levels of visually impaired individuals are not the same. Thus, before designing a game for visually impaired people, it is important to classify the degree of impairment and determine which accessibility parameters must be considered in the game design. The World Health Organization (WHO) classifies visual ability into normal vision, moderate visual impairment, severe visual impairment, and blindness; moderate and severe visual impairment are together categorized as low vision. Color blindness is another visual disorder, in which a person is not capable of distinguishing certain colors. To make games accessible for individuals with low vision, techniques such as high-contrast color schemes, scalable fonts, and facilities to change colors or zoom can be used. For individuals with severe visual impairments, however, primary visual cues must be replaced with non-visual stimuli [57]. Generally, the audio channel is the primary replacement used to make games accessible to blind users; such games are called audio games.

In [20], a set of strategies has been proposed for developing accessible games for visually impaired people, divided into strategies for low-vision users and strategies for blind users. For low-vision users, suggested techniques include modifying visual stimuli to enhance visual feedback, using high-contrast objects in the interface, using scalable fonts to increase readability, providing zoom options for particular areas of the screen, and providing customizable color schemes (especially for color blindness). For blind users, suggested methods include using audio stimuli (even though these may interfere with existing sounds in the game), using a screen reader (if text is available) or self-voicing, and using real-world sounds (e.g., wind or footsteps) as hints for players. Employing 3D sound effects can also be effective when headphones are used. Visuals can further be replaced with haptic feedback (even though this requires a dedicated gaming device), including haptification, in which the frequency, intensity, or pattern of a haptic signal (e.g., vibration) indicates the distance to or location of an object.

The two main channels that can substitute for the visual channel are audio and haptics, which has led to the various audio games and haptic games described in the following. A review of papers on games for visually impaired people shows that, although mainstream games are appealing, there remains a great need for computer games designed exclusively for this audience.

2.2 Audio games

In recent years, considerable attention has been paid to the sound content of computer games. As discussed in [22], sound is an expressive narrative medium that can immerse players in a three-dimensional virtual environment. Sounds can also convey very specific information and create subtle moods. Myst III: Exile [53] and Halo [39] are examples of mainstream games in which advanced soundtracks help players follow the narrative development. In music games, called rhythm action games, players have to coordinate their input with the rhythm of the music; PaRappa the Rapper 2 and Dance Dance Revolution (Konami Digital Entertainment) are examples. However, the audio content of mainstream computer games is not as developed as their visual content. In [38], the design, implementation, and evaluation of an accessible version of an educational audio game are described.

In Mach 1 Car Racing, audio stimuli such as the echo of the car engine indicate the direction a player must turn, while the player controls the car using arrow keys. In the first-person shooter games Shades of Doom and AudioQuake [2], audio cues such as footsteps are used for wayfinding. A sound compass is used, in which different tones indicate the direction the player is pointing, and a sound radar identifies the objects around the player. In The Last Crusade [18], self-voicing is used to announce events happening in the game. In AudioBattleShip [42], audio cues help the player locate positions on the board and perceive actions such as a bomb dropping on certain cells of the game grid. Tim’s Journey is an adventure game with an open-ended, non-linear narrative in which the player explores an island to discover their surroundings; augmented audio features cover all aspects of the game.

AuditoryPong [27] is an audio game in which players move the game paddle with a haptic device and receive acoustic feedback. In Sonic Badminton [32], a virtual shuttlecock is used with audio feedback. Audiopolis [43] is a city simulator in which blind children learn how to navigate a city using haptic devices. In [50], a VR simulator is proposed in which the virtual environment is rendered entirely through hearing: a tracking technique determines users’ positions, and players perceive the environment through acoustic information. There have also been games based on the PHANToM haptic device [34], a force-feedback device that provides the sense of touch and force feedback at the fingertip. Some techniques use the smartphone camera as a sensor to describe the world to blind users; in some mobile games, the phone camera captures pathway scenes and transforms them into messages and warnings. In [41], an Arduino-based device is proposed that can capture left-right movements as well as drag and drop, providing a tangible experience for exploration games in which recognizing objects is crucial.

Sound in immersive interfaces and perceptual issues in auditory displays are discussed in [29]. Sonification strategies, in terms of speed and precision for a one-dimensional guidance task, are proposed in [40]. In [48], a sonification model for encoding visual 3D information into sound is proposed, inspired by the properties of the objects a blind person encounters while navigating. In [4], the Pingball game is developed for visually impaired people using sonification techniques such as shifting pitches and varying volumes.

2.2.1 Sound type in audio games

The sound used in audio games must be rich enough to convey all the necessary information of the game [17]; otherwise, it cannot replace the visual channel. Game sounds can be categorized into auditory icons (sonic displays) and earcons. In sonic displays, recognizable everyday sounds (e.g., rings, footsteps) are used, and the communicated information is tied to the actual semantic content of the auditory icon; for example, the sound of an alarm is assigned to a reminder about an event. Earcons, by contrast, are synthesized sounds built from single audio motifs whose amplitude or frequency is modified; they are employed to convey information that occurs simultaneously. Another technique that can be used in audio games is binaural technology, in which spatial positioning is added to the sound and played over headphones: an audio stimulus is virtually placed around the listener’s head, providing a 360-degree space.
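To make the earcon idea concrete, the following minimal sketch (in Python, with entirely hypothetical frequencies and object assignments) synthesizes two earcons from the same three-note motif by modifying only frequency and amplitude:

```python
import numpy as np

SAMPLE_RATE = 44100  # samples per second

def motif(freqs, duration=0.12, amplitude=0.4):
    """Render a sequence of pure-tone notes as one short audio motif."""
    t = np.linspace(0, duration, int(SAMPLE_RATE * duration), endpoint=False)
    notes = []
    for f in freqs:
        # short fade-in/out envelope so the notes do not click
        env = np.minimum(1.0, 10 * np.minimum(t, duration - t) / duration)
        notes.append(amplitude * env * np.sin(2 * np.pi * f * t))
    return np.concatenate(notes)

# Hypothetical assignment: a higher, louder variant of the rising motif
# marks one object type; a lower, quieter variant marks another.
friend_earcon = motif([440, 550, 660], amplitude=0.5)
enemy_earcon = motif([220, 275, 330], amplitude=0.3)
```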

Beacon technologies can also be used for audio games, and location-based games in particular. In [21], audio bracelets that emit sounds from the wrist have been proposed, making it possible for visually impaired children to be involved in social games [11]. This technology enables children to learn about the activities, people, and places in their surroundings. For this purpose, audible beacons are used to estimate proximity; these beacons can be worn (bracelets) or installed in the room (as a sound source). The authors showed that sound from audio bracelets, in combination with sounds in the environment, can provide useful localization cues in the gaming environment. In audio games, sound sources can be smartphones, wearable devices, or the surroundings:

Mobile device sounds

: In this case, users hear the sounds through the loudspeaker or headphones of the mobile device. Talking Points and Audvert are examples of this approach, in which speech tells users about approaching environments and objects [49]. In [1], short snippets of music are employed to convey meaningful audio information about the surroundings; the authors showed that this technique can create a better experience than speech or auditory icons. In [14], it is shown how to structure navigation information for users: keeping audio cues concise and presenting required directions or actions before other information provides a better navigation experience.

Wearable devices

: Some dedicated devices have also been used to deliver non-visual assistive information in audio games. For example, smart glasses can inform visually impaired people about their surroundings, and a sound from a bracelet can encourage users to reach for nearby objects and explore their surroundings.

Surroundings

: In this case, the auditory cues come directly from the surroundings, which is ideal for indoor spaces. Audio-tactile locations are tags based on Bluetooth beacons that allow visually impaired users to find where they are located. In BlueView [13], users are informed about points of interest; in this system, beacons are paired with speakers. In Onyx Beacon, Bluetooth beacons are installed on buses so that visually impaired users can find the right bus at the station: when the right bus arrives, a notification is sent to the user’s smartphone. As discussed in [21], location-based games can exploit this approach to play a sound when the game wants to encourage a child to perform an action or to inform them when a friend has moved to another location.

2.2.2 Navigational audio games

Various audio games have been proposed for blind individuals; an extensive list of several hundred audio games is available at http://www.audiogames.net. AbES is an audio game in which the player navigates inside a virtual building comprising several rooms, doors, and corridors. The player is required to search for hidden jewels and hide them in another place. The game has an audio user interface for identifying position, orientation, and heading, although a graphical interface still represents the layout of the building and the game state. Interaction with the game is through the keyboard. In terms of sonification, 3D naturalistic and specialized sounds are used in addition to auditory icons and earcons, and the game also provides verbal audio commands.

Tim’s Journey is an adventure game that combines complex interactive soundscapes with the narration of the gameplay [23]. The game space is an island with a harbor, a forest, and a mill, around which the player moves freely to discover a hidden mystery. The game provides both audio and graphical interfaces, with user interaction through the keyboard. In terms of sonification, each scene has a musical theme that changes continuously in response to different actions in the environment, and objects have priorities represented by the intensity and repetition of their associated soundscapes.

Audio Doom [35] is an audio game that involves blind children in navigational tasks, with the aim of improving their spatial cognitive abilities. The game environment consists of walls and corridors in which the player tries to avoid monsters and reach the exit in order to proceed to the next level. The game has both graphical and audio interfaces, and players interact through keyboard, mouse, and joystick. In terms of sonification, audio spectral cues are used.

Shades of Doom [56] is a first-person shooter audio game whose environment includes many corridors and rooms; the player has to explore the environment to stop a dangerous experiment. The game has both audio and graphical interfaces, with keyboard interaction. In terms of sonification, it features dynamic, realistic, multi-layered 3D sounds playing simultaneously and uses the Doppler effect for realistic movement sounds.

Pyvox [24] is a toy-based game designed to create an enjoyable experience that becomes playable as soon as the user interacts with it, without any narrative instructions. The player has to find the exit of each floor without hitting the walls, using a mouse for navigation. In the sonification approach of this game, unpleasant sounds are assigned to hitting the walls, and the exit doors are indicated by variations in pitch that depend on the player’s position relative to the exit door.

Blind Side [55] is an audio game that aims to deliver an immersive audio experience representing the emotions of rediscovering the world as a blind person; the game scenario is navigating through the environment to do so. The game has only an audio interface, controlled with the keyboard on personal computers or the gyroscope in the mobile version. The sonification includes 3D audio, several recorded realistic sounds, and low-pass filtering and attenuation for sounds from objects behind the listener. The sound varies in pitch based on the angle and speed of moving subjects.

Terraformers [56] is an audio game designed for both sighted and blind players. The gaming environment is a 3D space on an imaginary planet, where the player has to defeat the rebelling robots that have colonized it. Both visual and audio interfaces are used, and players interact via the keyboard. In terms of sonification, 3D sounds are assigned to all objects and indicate the distance to them, while voice commands indicate the player’s direction.

2.3 Haptic games

Some games, called haptic games, use haptic stimuli instead of visual cues. In Blind Hero [57], players play a piece of music using a controller that provides vibrotactile cues, and a special glove is used for input. Rock Vibe [20] is another haptic game, in which vibration represents drumhead cues and audio cues provide feedback on correct, timely hits as well as erroneous ones.

Some games have used commercial controllers for this purpose. For example, Tennis [36] uses the Wii Remote controller: the vibrating controller notifies the player of the approaching ball, while the system automatically moves the player to the right position to hit it, and the player has a few seconds to perform the right action on the controller. Bowling [36] also uses the Wii Remote, with which users can scan their surroundings. One important direction is developing mobile games for blind people, as mobile platforms have transformed the video game landscape. Papa Sangre is an example of a mobile game, developed on the iOS platform, that has no graphics and employs binaural audio. In this game, a player must collect objects without disturbing a monster, walking toward sounds using the on-screen footstep pads.

TiM is a project providing adaptation tools that enhance video games with tactile, audio, and adjustable screen displays [10]. This European Commission project involves researchers in usability, education, rehabilitation, and psychology, and has developed computer game prototypes for visually impaired children. The SITREC section of the project (Stockholm International Toy Research Center at KTH, Royal Institute of Technology) has developed three audio games: Mudsplat, X-tune, and Tim’s Journey. The experience of developing TiM’s audio games can inform the design of interfaces for ubiquitous technologies.

Using a tangible user interface (TUI) is an alternative way to develop video games for blind people [33]. In a TUI, a physical representation of the digital information is presented to the player, making it possible to literally grasp the data with the hands; digital information is matched with physical representations. While interacting with objects, feedback can be tactile, aural, or visual [33]. This provides an interaction technique in which players interact with something close to reality; in other words, it is an effort to join the physical world with the digital one.

As discussed in [33], TUIs can improve the learning process for several reasons. Since perception and knowledge are linked, grasping objects in the hand makes it possible to assimilate their nature. TUIs can also lead to greater engagement of players, especially children with disabilities. Moreover, they can enhance collaborative learning, as several players can interact with the same objects simultaneously.

Using the TUI technique, a game is proposed in [33] for visually impaired children. This game has no visual interface, and the game mechanics are built on touch, sound, and speech recognition. Physical objects serve as the tangible user interface: players grasp the objects to interact with the system. Instructions are given by recorded audio, while feedback can be haptic or audio.

3 Game design

Before designing the game, we conducted primary research to identify the main needs, requirements, and skills of people with visual impairment [25]. In particular, we studied how blind people interact with existing technologies and how well they accept current interaction techniques. Since the goal of the research was to develop a mobile game, we investigated how these people interact with smartphones, as well as their experience with audio games. For mobile games, we were interested in which type of controls (e.g., audio-based or haptic-based interaction) they prefer while playing, and in what difficulties they had with existing mobile applications and games.

Our findings showed that, to increase the accessibility of these games, they must include step-by-step tutorials within the game itself that ease the introduction to the game. To this end, instead of providing instructions for playing the game, players are invited to play a tutorial level in which they become familiar with the interaction techniques and the sonification method used in the game. This lets players freely explore the game without failure during the learning phase. Some screenshots from the tutorial are shown in Fig. 1.

Fig. 1

A screenshot of the main page of the GrandEscape game

According to flow theory [37], a player’s skill and the difficulty of the activities in the game interact to produce cognitive and emotional states. Consequently, establishing a balance between player skill and game challenge is crucial to the quality of the game experience. Since different players have different skills, we incorporated many difficulty levels in the game, from extremely simple to extremely hard. The number of friends (and enemies) and their movement speed in the game space are the two parameters that determine the difficulty level. Consequently, our game allows players to select the range of challenge, as some players might benefit from slower and easier versions.

In this paper, to study the effect of different sonification and interaction techniques in mobile games designed for visually impaired individuals, we developed a mobile game called GrandEscape. GrandEscape is an entertainment game in which the player must make the right movement decisions in the game space to find the parts of a key required to escape a room.

Game space and characters:

In GrandEscape, since the only communication channel between the player and the game is audio, we decided to keep the game space as simple as possible so that players can navigate it easily. The game space is therefore designed as a square room with a door in one of the walls, and there is no object or obstacle in the room to interrupt the player. The motivation behind this simplicity is to reduce the cognitive load of perceiving the game environment while focusing on a game scenario driven by audio cues; likewise, the simple enemies and friends simplify the gameplay. Although the game displays no graphics to the player, an abstract representation of the escape room, player, enemies, and friends is shown in Fig. 2: the red cube is the player, the green sphere is a friend, and the orange sphere represents an enemy. The game space is a cube with a blue door in one corner of the room. Note that this representation is not shown to the player during play.

Fig. 2

An abstract representation (which is not shown to the player) of the game space and characters in GrandEscape

Game scenario:

This game tells the story of a prisoner trapped in a big dark room with no lights, so the player cannot see anything in the room. The prisoner wants to escape, but to open the door she needs to collect the parts of the key from her friends in the room. The player cannot see her friends, but she can hear and recognize their voices, and she must approach them to receive the parts of the key. She must also avoid contact with the enemies, as they can grab parts of the key; as with friends, the player must identify enemies from the sounds they emit. Because the game space is a dark room, enemies and friends move randomly around the room.

Interaction design:

In designing an appropriate interaction technique for GrandEscape, we were interested in interaction that can be accomplished without visual cues. To this end, we designed and tested GrandEscape with two different interaction techniques: tapping and tilting. In the tilting technique, the player moves the character by tilting the smartphone up, down, left, and right; a combination of gyroscope and accelerometer data indicates the turning direction and movement speed in the game space. In the tapping technique, the player taps on different sides of the screen to move in that direction, with each tap producing a single predefined step. We chose tapping and tilting because they are common and intuitive techniques in mobile games that do not require visual cues on the screen.
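A minimal sketch of the two movement mappings, written in Python with hypothetical step and gain constants (the actual game logic lives in Unity), might look as follows:

```python
from dataclasses import dataclass

@dataclass
class Player:
    x: float = 0.0
    y: float = 0.0

STEP = 0.5       # hypothetical fixed step length for one tap
TILT_GAIN = 2.0  # hypothetical speed per radian of tilt

def move_by_tilt(player, pitch, roll, dt):
    """Tilting: tilt angles set direction and speed (continuous motion)."""
    player.x += TILT_GAIN * roll * dt   # left/right tilt
    player.y += TILT_GAIN * pitch * dt  # forward/back tilt

def move_by_tap(player, side):
    """Tapping: each tap on a screen edge is one fixed-size step."""
    dx, dy = {"left": (-STEP, 0.0), "right": (STEP, 0.0),
              "up": (0.0, STEP), "down": (0.0, -STEP)}[side]
    player.x += dx
    player.y += dy

p = Player()
move_by_tap(p, "right")                          # one discrete step
move_by_tilt(p, pitch=0.1, roll=-0.2, dt=0.016)  # one 60 Hz frame of tilt
```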

3.1 Good gaming experience

The most important consideration in our design was that the main aim of our audio game is to create an enjoyable experience for players. To this end, we tried to fulfill the following specifications. First, we increased the diversity of audio cues to create a high level of attractiveness and motivate players. To keep players encouraged, the game design has no game-over situation that might discourage the user from continuing to play. One important feature is that GrandEscape provides an intuitive learning strategy integrated into the game: an easy-to-understand process that does not rely on text-to-speech instructions. Finally, to increase immersion and the sense of presence, the game is developed from a first-person perspective.

An important goal of any game, including the one proposed in this research, is to induce the flow state, known as the feeling of enjoyment, happiness, and fun while playing [12]. One key issue in establishing flow in a game for blind people is that the audio game must balance players’ skills against the challenges of the game. To this end, our game tries to fulfill the following criteria suggested in [12].

  • An activity that intensively requires the player’s abilities: GrandEscape requires the player’s spatial recognition skills to understand the game space, move through the game environment, interact with objects, and follow the game rules to proceed.

  • Well-defined, comprehensible goals: The goal of GrandEscape is clear and simple; the player has to collect the pieces of the key in order to escape the room.

  • A sensation of losing track of time while being immersed in the game: In our game, the player must listen carefully to the audio cues in order to understand the gaming environment, and must therefore be immersed in the game to proceed.

  • A feeling of being in control of the game’s outcome at every moment: The player must stay alert at all times, as an enemy might approach and grab a part of the key.

  • Immersion in the actions of the game: Players of GrandEscape disconnect from the world by closing their eyes and immerse themselves in the game space in order to play. The player keeps track of all changes in the gaming environment by carefully listening to the sounds emitted by the different objects, characters, and events in the game.

3.2 Audio design

In the design of audio games, it is necessary to distinguish between various types of auditory information, since these games contain a large amount of sound content that builds the playing experience [3]. In [22], auditory information is categorized into avatar sounds (e.g., footsteps or hitting an object), object sounds (representing the presence of an object), character sounds (coming from non-player characters), ornamental sounds (e.g., ambient music), and instruction sounds (the helpers’ advice in the game). In this research, we identified and categorized the game sounds into categories for the walls, enemies, friends, and the exit door.

Two different approaches can be used in the audio design of audio games. In the first, sounds are designed to help players easily distinguish between them. In the second, sounds are deliberately designed to create ambiguity, increasing the challenge of the game; in Tim’s Journey [22], for example, the goal is to create ambiguity between object sounds and ornamental sounds. In our game, however, we argue it is important to indicate to the player whether a heard sound is generated by the player’s own activity or by the environment. To this end, we designed a technique in which sounds generated by the avatar are instantaneously connected to the player’s input (turning the phone to move the avatar). We argue this instant feedback is important for informing the player that an action has completed.

Sound is a meaningful channel that can convey information and emotion as expressively as graphics [3]; in audio games, its purpose is to improve the player’s understanding of the game scenario through audio signals [16]. In this project, we aimed to balance aesthetics and functionality in the sounds, and we needed various sounds to represent the different entities and actions in the game. In the auditory icon approach, the designer selects or composes recognizable sounds, based as much as possible on authentic recordings; in the earcon approach, short musical phrases are associated with particular information in the game [9]. This project intended to create a fun exploration experience while keeping the game clear and playable. Exploring the sound is crucial in our game: the music changes as the avatar moves, approaching different objects triggers new sounds, and picking up objects or losing them by hitting an enemy also affects the game sounds.

Different techniques can be used to signal the presence of an object through auditory information. In the continuous-emission approach, every object emits sound continuously and the player hears all objects simultaneously, which can confuse players; moreover, using a constant signal instead of a looping sound severely reduces realism. Another approach is to associate short object sounds with some form of impact, but this does not give players an overview of the game. In an intermediate approach, used in the Tim’s Journey game, static objects emit brief sounds triggered in rhythmical patterns in recurring loops. In this approach, all sounds are attached to virtual objects in the game that the avatar can approach or pass, and only a few sounds can actually be interacted with, which makes the important sounds stand out. The player then explores the space gradually by listening to several sound loops.
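As an illustration of this intermediate approach, the sketch below (a simplification with hypothetical pattern and timing values) schedules the brief sound of one static object on a recurring rhythmical loop:

```python
import itertools

# Hypothetical rhythm for one static object: its brief sound is
# triggered on beats 1 and 3 of a recurring four-beat loop.
PATTERN = [True, False, True, False]
BEAT_SECONDS = 0.5

def trigger_times(n_loops):
    """Yield the times (in seconds) at which the object's sound plays."""
    beats = range(n_loops * len(PATTERN))
    for i, hit in zip(beats, itertools.cycle(PATTERN)):
        if hit:
            yield i * BEAT_SECONDS

print(list(trigger_times(2)))  # [0.0, 1.0, 2.0, 3.0]
```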

As discussed in [17], an important design consideration for games for blind children is that the game must provide audio feedback for every action and event, and players must be able to distinguish and understand this feedback in order to navigate the game environment. Another important issue is that designers of games for blind people must be aware of differences in spatial awareness among visually impaired individuals. Third, conventional structures such as menus and sub-menus must be reconsidered in audio games, as visually impaired people have no experience navigating such structures. There is also a difference between creating an accessible game and an inclusive one [19]: accessible means the game is possible to play, while inclusive means every aspect of the gaming experience is considered in the design process, regardless of whether the player has access to the graphics.

Two different navigation strategies that can be used in audio games are the allocentric and egocentric frames of reference. In the allocentric frame of reference, the emphasis is on the properties of surrounding objects, independent of the player’s point of view; in the egocentric approach, the player acquires specific knowledge from a first-person perspective. Research has shown that when directional information is presented using audio spectral cues at the meta-level of perception, the memory load required for navigation is reduced. Consequently, a game exploration metaphor that relies on spatial sounds can produce a reliable spatial cognitive map.

3.3 Sonification design

The sonification process is the assignment of audio cues to objects, characters, and events in the game so that players can distinguish different game objects. This process is an important step in the design of any audio game, including GrandEscape. In sonification, different sound characteristics can represent relevant information about the game environment; in particular, audio cues can describe virtual properties such as size, color, distance, elevation, and direction. Examples of sonification techniques used in games are provided in [3] and shown in Fig. 3.

Fig. 3

Sonification techniques used in computer games [3]

Since the game proposed in this research is an audio game, sound design plays an important role in its success [25]. Beyond the design of the sounds themselves, another important challenge is their seamless integration into the game. For rendering music in the game environment, one option would be real-time composition; however, to avoid the real-time computational overhead, we employed precomputed sounds. We therefore composed a number of sound themes for the different objects, characters, and events in the game, with each sound individually designed and composed for its role.

Since the player has to move through the gaming environment to identify the environment, friends, and enemies, we had to design the sounds so that the player can estimate how close they are to an object or character. In this paper, we designed and tested two different sonification techniques for this purpose. The first option uses loudness: as the player approaches an object, the volume of its sound increases. Although loudness is a meaningful distance cue in the real world, in a game played over long periods the frequent rises and falls in volume can be annoying. The second option changes the tempo of the sounds as an indicator of the distance between the player and objects; in musical terminology, tempo is the speed of a given sound, usually measured in beats per minute (bpm). In this option, sound metaphors act as a radar indicating the positions of dynamic and fixed objects: sounds are emitted from the objects’ positions, but their tempos are modified according to the objects’ proximity to the player. Again, to avoid the computational overhead of continuously changing the tempo based on distance, we composed each sound at five tempo levels, listed below (a sketch of the bucketing logic follows the list):

  • very slow (25-45 bpm): representing very far from an object.

  • slow (40-60 bpm): representing far from an object.

  • moderate: representing being within the object’s area.

  • fast: representing being close to the object.

  • very fast: representing being very close to the object.
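Selecting the precomputed tempo variant reduces to a simple threshold lookup. The following Python sketch uses hypothetical distance thresholds in game-space units (the actual thresholds are tuned in the game):

```python
# Hypothetical distance thresholds, in game-space units.
TEMPO_LEVELS = [
    (8.0, "very_slow"),  # very far from the object
    (6.0, "slow"),       # far from the object
    (4.0, "moderate"),   # within the object's area
    (2.0, "fast"),       # close to the object
    (0.0, "very_fast"),  # very close to the object
]

def tempo_level(distance):
    """Pick which of the five precomputed tempo variants to play."""
    for threshold, level in TEMPO_LEVELS:
        if distance >= threshold:
            return level
    return "very_fast"

print(tempo_level(9.0))  # very_slow
print(tempo_level(1.0))  # very_fast
```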

In this design, the sounds are composed to complement each other, and the overall timing is planned to emphasize musicality. We believe the designed music can communicate emotional qualities and set a particular mood for the game or a game element. To promote object and event identification, sounds are designed to describe actions in the game [36, 56]; for example, the game plays a collision sound when the player hits an enemy or a friend. The game’s atmosphere is enhanced by music and a rich variety of sound effects composed specially for this game. We argue the designed music can enhance the feeling of immersion and help create associative cognitive maps of the in-game locations.

Both verbal and non-verbal notifications can be used in audio games. Verbal notifications are better for informing players of their current location, while non-verbal cues can better convey meta-level knowledge about the direction and distance of target objects [3]. Since audio information is perceived continuously in audio games, it helps players build spatial cognitive maps and, consequently, orientation, mobility, and navigation skills.

3.4 Technical details

The proposed prototypes were developed in the Unity game engine. The game renders 3D audio cues in which sound source positions provide directional guidance. Many existing smartphones support binaural audio through stereo headphones; however, the quality of the immersive experience in this 3D audio game depends on the player’s device audio system. To make the game easily accessible to all, we avoided specialized hardware and developed the game for the mobile platform, lowering the barriers visually impaired players face compared with sighted players.
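For intuition about what such directional rendering involves, the sketch below approximates a spatializer with constant-power stereo panning and inverse distance attenuation. This is a hand-rolled simplification for illustration, not the Unity audio API:

```python
import math

def stereo_cue(listener_pos, listener_heading, source_pos):
    """Return (left_gain, right_gain) for a sound source on a 2D plane.

    listener_heading is the facing direction in radians; positions are
    (x, z) tuples in game-space units.
    """
    dx = source_pos[0] - listener_pos[0]
    dz = source_pos[1] - listener_pos[1]
    distance = math.hypot(dx, dz)
    # Angle of the source relative to where the listener is facing.
    angle = math.atan2(dx, dz) - listener_heading
    pan = math.sin(angle)                 # -1 (full left) .. +1 (full right)
    attenuation = 1.0 / (1.0 + distance)  # simple inverse falloff
    left = attenuation * math.sqrt((1.0 - pan) / 2.0)
    right = attenuation * math.sqrt((1.0 + pan) / 2.0)
    return left, right

# A source two units ahead and one to the right of the listener:
print(stereo_cue((0.0, 0.0), 0.0, (1.0, 2.0)))
```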

4 Evaluation

A set of user studies was designed and conducted to evaluate the different sonification and interaction techniques used in GrandEscape. A set of guidelines proposed in [6] can be used to evaluate audio games, covering the accessibility, gameplay, and usability of computer games for visually impaired people. In [16] and [7], heuristics have also been proposed for evaluating audio games during the development process. Although many of these heuristics were applied in the design of our prototypes, we designed a study to empirically collect both quantitative and qualitative data on the proposed interaction and sonification techniques, which makes it possible to make well-supported statements about their value.

4.1 Research methodology

We aimed to measure quantitative and qualitative data related to the use of GrandEscape by sighted as well as visually impaired individuals. The first set of experiments involved 32 unimpaired participants; its objective was to study participant performance (examining players’ behavior in addressing the challenges of the game) and perception (evaluating human perception and cognition in the game context). The second set of experiments involved eight visually impaired participants, with the objective of evaluating the performance and gaming experience of visually impaired individuals.

We believe the proposed game must be evaluated in terms of both performance and perception. Accordingly, we measured the time to complete game levels, the distance traversed, direction changes, and error counts as quantitative measures of player performance, and we measured the game experience and the sense of presence in terms of perception.

The two independent variables in this study are the sonification and interaction techniques. The sonification variable comprises the tempo and loudness conditions: in the tempo condition, the tempos of the sounds change with the distance between the player and objects in the environment, whereas in the loudness condition, the loudness of the sounds emitted by objects and characters changes with that distance. This variable was included to determine whether the type of sonification affects players’ performance and playing experience. The second independent variable is the interaction technique, comprising the tilting and tapping modes: in the tilting mode, the player’s character moves by tilting the mobile device in different directions, while in the tapping mode, the player moves through the game space by tapping on different sides of the screen. This variable makes it possible to study which type of interaction creates a better quality of game experience in this type of game.

In order to address the potential learning effect, we used a between-subjects design, where each participant was assigned to only one of the groups. Consequently, the order in which the players have been exposed to different types of the game was removed as a potential confound. More specifically, according to our grouping policy, four groups (based on the sonification type × the interaction type) were formed with eight participants in each group.

Each session started with an introductory section explaining the experiment’s purpose and conditions, followed by signing the consent form. Participants then answered a pre-study questionnaire about their educational level and prior experience with computer games and audio games. They had an opportunity to become familiar with the game by freely playing the tutorial level, in which they could not fail. They were then asked to play two levels of the game: the first with two enemies and two friends, and the second with four enemies and four friends. After playing, participants filled out two post-test questionnaires on the quality of the game experience and the sense of presence. All participants used the same smartphone in all experiments to eliminate device type as a confound: an Honor 3, with a body size of 5.24 × 2.65 × 0.39 inches, a screen size of 4.7 inches, and a resolution of 720 × 1280 pixels.

The details of the game scenario were explained to participants to make sure they knew how to proceed in the game. Users were asked to sit on a fixed chair and play. They were free to play with closed or open eyes, but most preferred to close their eyes to concentrate better on imagining and understanding the game space.

4.1.1 Participants

In the first set of experiments, 32 individuals of different ages were recruited from the local university. The only precondition was experience using smartphones. Only 27% of participants were familiar with the concept of audio games, and just two had prior experience playing them. Participants were randomly assigned to one of four groups: Tempo-Tilting (two females, six males, 22–25 years, mean age 23.9 years, SD 1.4 years), Tempo-Tapping (four females, four males, 21–24 years, mean age 22.3 years, SD 1.8 years), Loudness-Tilting (three females, five males, 21–23 years, mean age 22.2 years, SD 1.4 years), and Loudness-Tapping (five females, three males, 21–24 years, mean age 22.7 years, SD 1.2 years).

The second set of experiments was conducted with eight visually impaired participants to evaluate the effect of the different interaction and sonification techniques. Specifically, eight fully blind individuals (six males and two females, mean age 34.5 years, SD 3.8 years) participated. In the recruitment process, candidates with emotional or cognitive deficits and those who had difficulty understanding the tasks were excluded.

4.1.2 Measures

We measured the sense of presence, the quality of gaming experience, and players’ performance as three dependent variables to make it possible to compare different sonification and interaction techniques.

Sense of presence

: Providing a sense of presence is an important issue in computer games [51]. In [44], presence is described as “the subjective expression of being in one place or environment,” no matter where the person is physically situated. Presence can be categorized as physical presence (“the sense of being here”) and social presence (“the sense of being together with another”) [51]. We argue presence is an important factor in creating a good gaming experience. In our game, physical presence is important because players must feel they are trapped in an escape room; social presence is not considered because of the solo-play nature of this game.

Although the presence concept is well defined in the literature, measuring presence in audio games is challenging. The level of interactivity and realism are two important issues affecting the sense of presence. In terms of interactivity, the player must be in full control of the game, and any action by the player must produce feedback. In terms of realism, three important factors are using a first-person rather than a third-person view, using real sounds, and the quality of the virtual representation of bodies. In audio games, all efforts to create a sense of presence are centered on the quality of the sounds.

The MEC Spatial Presence Questionnaire (MEC-SPQ) [52] measures the sense of presence in different media, including computer games [44]. It covers several aspects of presence, including attention, involvement, suspension of disbelief, spatial situation model, and spatial presence, and is grounded in spatial presence theory [54]. The questionnaire used in this paper is adapted from the MEC-SPQ and measures presence in terms of involvement, perceived interactivity, and spatial presence. The questions in each of these categories are shown in Fig. 4. Participants rated the questions on a 5-point Likert scale.

Fig. 4

The MEC-SPQ questionnaire [52] used to measure presence in the experiment

Game Experience

: We also measured game experience, i.e., a person’s perceptions and responses that result from playing the game. This subjective measure describes the experience of interacting with the game; its basic parameters are generally identified as utility, joy, appeal, and aesthetics [26]. We used the Game Experience Questionnaire (GEQ) [28], which is widely used to evaluate the quality of the gaming experience; in particular, we used its core module of 33 questions. Participants responded on a seven-point Likert scale anchored at “strongly disagree” and “strongly agree.” A Persian translation of the questions was used in the experiments.

Gaming behaviour

: In terms of participant performance, we measured four dependent variables: the error count (the number of times parts of the key were lost before completing the level), the time to find all key parts and escape the room, the distance traversed before escaping, and the direction-change count. In the 2 × 2 between-subjects design (sonification type × interaction type), each participant experienced only one scenario, allowing us to determine which sonification and interaction techniques lead to better performance in terms of task completion time and error count.

4.2 Results

In this section, the results of experiments to compare different interaction and sonification techniques are elaborated.

4.2.1 Sense of presence

We aimed to investigate which sonification and interaction techniques produce a better sense of presence in mobile games for visually impaired individuals. To this end, we measured the sense of presence in terms of three parameters: involvement, interactivity, and spatial presence. The averages of these parameters for the four groups are shown in Fig. 5. Although the average sense of presence in all conditions was above 2.5, it was highest under tempo sonification with tapping interaction. Details of the presence scores for the individual questions are shown in Fig. 6.

Fig. 5

The presence values for different groups in terms of involvement, interactivity and spatial presence

Fig. 6

Details of the presence scores for the different presence items

We also performed statistical tests to compare the effect of the different sonification and interaction techniques on the sense of presence. The experiments showed a statistically significant difference between groups as determined by one-way ANOVA (F(3,31) = 6.4563, p = 0.00078). A Tukey post hoc test revealed that the sense of presence was significantly higher under the Tempo-Tapping (M = 4.184, SD = 0.41, p = 0.0014) and Tempo-Tilting (M = 3.882, SD = 0.46, p = 0.019) conditions than under the Loudness-Tilting condition (M = 3.254, SD = 0.36). The sense of presence was also significantly higher under the Loudness-Tapping condition (M = 3.566, SD = 0.53, p = 0.029) than under the Loudness-Tilting condition. The differences between the remaining pairs, such as Tempo-Tilting versus Tempo-Tapping, were not statistically significant.
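An analysis of this form can be reproduced with standard statistical tooling. The Python sketch below is illustrative only: it generates synthetic per-participant scores from the reported group means and standard deviations (eight participants per group) rather than using the actual study data:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

np.random.seed(0)  # reproducible synthetic data

# Synthetic presence scores drawn from the reported means and SDs.
groups = {
    "tempo_tapping":    np.random.normal(4.184, 0.41, 8),
    "tempo_tilting":    np.random.normal(3.882, 0.46, 8),
    "loudness_tapping": np.random.normal(3.566, 0.53, 8),
    "loudness_tilting": np.random.normal(3.254, 0.36, 8),
}

# One-way ANOVA across the four conditions.
f_stat, p_value = stats.f_oneway(*groups.values())
print(f"one-way ANOVA: F = {f_stat:.3f}, p = {p_value:.5f}")

# Tukey HSD post hoc test for all pairwise comparisons.
scores = np.concatenate(list(groups.values()))
labels = np.repeat(list(groups.keys()), 8)
print(pairwise_tukeyhsd(scores, labels, alpha=0.05))
```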

One problem with tilting-based interaction is that, because the degree of tilt controls the speed of movement in the game space, a player who cannot see the screen may have no sense of how far the smartphone is tilted. The player's character keeps moving in a direction for as long as the phone is tilted in that direction. In the tapping condition, by contrast, each tap on the screen produces a fixed amount of movement. We argue that the tapping interaction yields a better sense of presence because the tilting mode requires more cognitive effort to interpret the feedback, which may negatively affect the sense of presence in the game. The results also show that the tempo sonification technique helps players recognize the game space without disturbing them through frequent changes in loudness. We conclude that the combination of tempo sonification and tapping interaction creates the strongest sense of presence in our audio game.
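To make this contrast concrete, the two interaction mappings can be sketched as follows; the constants and function signatures are illustrative assumptions, not GrandEscape's actual implementation.

```python
# Sketch of the two interaction mappings over a one-dimensional
# position, for simplicity. STEP and MAX_SPEED are illustrative.

STEP = 0.5        # fixed displacement per tap (game units)
MAX_SPEED = 2.0   # speed at full tilt (game units per second)

def tapping_move(position: float, direction: int) -> float:
    """Tapping: each tap moves the character one fixed, discrete step,
    so the player always knows exactly how far they have moved."""
    return position + direction * STEP

def tilting_move(position: float, tilt: float, dt: float) -> float:
    """Tilting: displacement accumulates for as long as the phone is
    held tilted; tilt in [-1, 1] scales the speed, so the player must
    estimate both the tilt angle and how long it has been held."""
    return position + tilt * MAX_SPEED * dt
```

Under the tapping mapping, the feedback loop is discrete and self-evident, whereas under the tilting mapping the player must mentally integrate speed over time without visual confirmation, consistent with the higher cognitive effort we observed.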

4.2.2 Gaming experience

We performed the same statistical analysis on the gaming experience of the participants in the four experimental conditions. A one-way ANOVA showed a statistically significant difference between the groups in terms of core gaming experience (F(3,31) = 8.546, p = .0013). A Tukey post hoc test revealed that the core gaming experience was statistically significantly higher under the Tempo-Tapping (M = 4.23, SD = 0.33, p = 0.005), Tempo-Tilting (M = 3.97, SD = 0.42, p = 0.004), and Loudness-Tapping (M = 3.53, SD = 0.63, p = 0.029) conditions than under the Loudness-Tilting condition (M = 3.13, SD = 0.52). The differences between the remaining pairs were not statistically significant. We conclude that changing the tempo of sounds as a sonification technique yields a better quality of experience than changing their loudness.
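For reference, the two distance encodings compared throughout this study can be sketched as follows; the parameter ranges are illustrative assumptions, not the values used in GrandEscape.

```python
# Sketch of the two sonification mappings: a target's distance is
# encoded either in the tempo (beat rate) or in the loudness (gain)
# of its sound cue. MAX_DIST and the BPM range are illustrative.

MAX_DIST = 10.0  # distance at which the cue is weakest (game units)

def _closeness(d: float) -> float:
    """Normalized closeness in [0, 1]: 1 when adjacent, 0 when far."""
    return max(0.0, min(1.0, 1.0 - d / MAX_DIST))

def tempo_for_distance(d: float, min_bpm: float = 60.0,
                       max_bpm: float = 240.0) -> float:
    """Tempo sonification: closer targets beep faster."""
    return min_bpm + _closeness(d) * (max_bpm - min_bpm)

def gain_for_distance(d: float) -> float:
    """Loudness sonification: closer targets sound louder."""
    return _closeness(d)
```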

4.2.3 Gaming behaviour

We captured a set of metrics on players' in-game behavior to study the effect of the different interaction and sonification techniques. In particular, we recorded the distance traversed by the player, the error counts (the number of key parts lost before completing the level), the time taken to find all key parts and the way to escape the room, and the number of direction changes (an indicator of cognitive effort). Our hypothesis was that the sonification and interaction techniques affect player behavior, so we expected to observe differences in these metrics across the settings.

As shown in Fig. 7, in terms of error counts and the number of direction changes, players were more active under the Loudness-Tilting condition than under the other conditions. To analyze this statistically, we used a MANOVA to assess multiple response variables simultaneously, with sonification type and interaction type as independent variables and error counts and direction-change counts as dependent variables. The test showed a statistically significant difference in players' behavior between Tempo-Tilting and Tempo-Tapping (F(3,31) = 7.7456, p < .05; Wilks' Λ = 0.619) and between Loudness-Tilting and Loudness-Tapping (F(3,31) = 8.6333, p < .05; Wilks' Λ = 0.544). There was no statistically significant difference between the other pairs in the analysis.
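This analysis can be reproduced along the following lines; the data frame below contains placeholder values, not the behaviour logs collected in our study.

```python
# Sketch of the reported MANOVA: error counts and direction-change
# counts as joint dependent variables, with sonification type and
# interaction type as factors. Placeholder data.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

df = pd.DataFrame({
    "sonification": ["tempo"] * 6 + ["loudness"] * 6,
    "interaction":  ["tap", "tap", "tap", "tilt", "tilt", "tilt"] * 2,
    "errors":            [2, 1, 2, 4, 5, 4, 3, 2, 3, 6, 7, 6],
    "direction_changes": [10, 8, 9, 15, 17, 16, 12, 11, 13, 20, 22, 21],
})

# mv_test() reports Wilks' lambda (cited above) among other
# multivariate statistics for each term of the model.
fit = MANOVA.from_formula(
    "errors + direction_changes ~ sonification * interaction", data=df
)
print(fit.mv_test())
```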

Fig. 7

Error counts and the average number of changes in the directions per minute for different game states

We argue that both the number of errors and the number of direction changes can be considered measures of cognitive load in the game. The results show that players were more active under the tilting interaction. We attribute this to the weaker control over the player's movement in the tilting mode compared to the tapping mode: merely holding the phone in a tilted position, which may feel like a passive state, keeps the player's character moving. We also observed more confusion in the tilting mode, which resulted in more movement through the game space in search of the targets.

We also compared the combinations of sonification and interaction techniques in terms of the total gaming time (across the two levels of the game) as a measure of performance. A one-way ANOVA showed no statistically significant difference between the groups (F(3,31) = 1.8342, p = .0933). Thus, the hypothesis that the sonification and interaction techniques affect a player's time to finish the game was not supported. A second one-way ANOVA likewise showed no statistically significant difference between the groups (F(3,31) = 2.477, p = .0722).

4.2.4 Preliminary results on visually impaired participants

The audio game proposed in this paper has no graphical user interface; as a result, it can be played by both unimpaired and visually impaired individuals. To increase the validity of the results, we also conducted the experiments with a small number of visually impaired participants (eight). Since the goal of this follow-up was to investigate the sonification technique rather than the interaction technique, and since the unimpaired participants had preferred tapping to tilting in the first set of experiments, we used the tapping interaction in all of these experiments. To compensate for the small sample, we used a within-subjects design in which each participant played under both the tempo and loudness conditions. Due to the small number of participants, statistical analysis of these preliminary results was not possible.

The average values of the presence parameters (involvement, interactivity, and spatial presence) for the tempo and loudness conditions are shown in Fig. 8. As expected, and in line with the first set of experiments on unimpaired participants, although the average sense of presence exceeded 3 in both conditions, it was higher under the tempo condition than under the loudness condition.

Fig. 8

Presence scores, measured with the MEC-SPQ [52], for the visually impaired participants under the tempo and loudness conditions

Gaming behavior was again measured in terms of error counts and the number of direction changes. As shown in Fig. 9, players were more active under the loudness condition than under the tempo condition. These results are in line with the findings of the first set of experiments on unimpaired participants. Since these parameters can be considered indicators of cognitive load, the results suggest that players had to work harder to understand the game environment through sound in the loudness condition than in the tempo condition.

Fig. 9

Error counts and the average number of changes in the directions per minute for different game states

The blind participants in this experiment had no previous experience with mobile games. After playing, they described the game as fun and engaging and asked us to implement more levels. Most of them found the audio feedback and sonification in the tempo condition engaging and easy to understand.

5 Conclusion

Computer games can have a significant effect on the development of mental structures in players. However, the video games on the market rely primarily on graphical information, which visually impaired people cannot access. Research has shown that audio-based computer games can be an effective rehabilitative technique for training the navigation skills of visually impaired individuals. By relying on rich sound content, audio games make it possible for these players to experience computer games. One important category is navigational audio-based games, which can enhance players' spatial cognitive skills. Two major design challenges in the development of audio games are the sonification technique (how sounds convey information about the game space) and the interaction technique (how a player interacts with the game).

The primary goal of this paper was to study sonification and interaction issues in the design of computer games for visually impaired individuals. In particular, we investigated which interaction and sonification techniques can increase the sense of presence and the quality of the gaming experience in audio games. For sonification we applied tempo and loudness changes, and for interaction we employed tapping and tilting techniques. Accordingly, we developed working prototypes of our navigational game to study the effect of these parameters, and we designed and conducted a user study to evaluate the prototypes in terms of presence, gaming experience, and player performance. Our findings showed that the combination of tempo sonification and tapping interaction best accomplishes these goals. We showed that these techniques can enhance situational awareness by providing directional cues. This is an important finding that can inform the design of audio games in which audio is the only communication channel.

Comparing GrandEscape with similar audio games helps position it among games designed specifically for visually impaired individuals. Myst III: Exile [53] and Halo [39] are examples of mainstream games in which specific soundtracks guide the player without changing the sound properties, whereas GrandEscape guides players by modifying the tempo and loudness of sounds. In some games, such as Shades of Doom, players rely on implicit audio cues tied to the game scenario, such as footsteps, to find their way. Unlike PaRappa the Rapper 2 and Dance Dance Revolution, in which players must coordinate their input with the rhythm of the music, GrandEscape uses audio only to convey the player's distance from game objects. Games such as AuditoryPong [27] and PHANToM [34] use haptic feedback in addition to acoustic feedback to guide players, while GrandEscape uses audio feedback alone. Finally, unlike GrandEscape, which uses changes in tempo and loudness as sonification techniques, Pingball [4] uses pitch shifting.

Our findings may also transfer to real-world navigation training through virtual, auditory-based simulation. Longer-term studies are required to assess the effect of our navigational audio game on players' gaming experience. In addition, although audio games are primarily developed for visually impaired individuals, these techniques can also be used to develop sound-based computer entertainment for everyone. Developing audio games deserves the attention of scientists across multiple disciplines to improve the quality of life of blind and visually impaired people.