1 Introduction

Today nearly every public environment offers visitors various navigation aids (digital signage, websites or mobile apps), but these aids are generally not accessible to visually impaired users; likewise, the maps embedded in mobile applications, which rely on online providers, typically do not support accessibility. However, accessible navigation apps have proven to be effective assistive solutions for persons with visual impairments, helping them achieve greater social inclusion and autonomy. Kahn et al. [1] summarized the main challenges faced by visually impaired persons during their daily activities and pointed out how ICT, and more specifically assistive technologies, may help them. Pathfinding, location tracking and orientation in outdoor and indoor environments are among the main needs that visually impaired users address via mobile devices [2]. Users can rely on navigation apps for real-time information about their current position, route planning, and accessibility warnings in a physical environment. Applications such as Lazarillo and BlindSquare, explicitly designed for blind and partially sighted users, provide vocal instructions as the user navigates through an outdoor or indoor environment [3, 4]. Besides aiding step-by-step navigation, a map can also provide an overview of an environment, useful when planning a visit.

The terms "orientation" and "mobility" are used in the visual impairment literature to identify, respectively, the concepts of wayfinding and locomotion, which according to Montello [5] are the two components of spatial decision-making. Mobility refers to the activities that a visually impaired subject performs as real-time responses to physical features while traversing an environment (e.g., avoiding an obstacle, ascending a step or crossing a street). On the other hand, orientation is related to the skill of planning a route from a person's current position to a given destination. While mobility depends only on the perception of the immediate surroundings at a certain moment, orientation also requires knowledge of the whole area and its relevant landmarks and hence, involves, an enduring mental representation of the environment. Such a representation is referred to as a "mental" or "cognitive" map [6].

Developing a cognitive map involves tasks such as spatial coding, landmark anchoring and route planning; in fact, the purpose of a cognitive map is to facilitate awareness of one's position in space relative to landmarks, and to support planning a route to reach a given point of interest (POI). Building a cognitive map requires skill in understanding the structure of an environment, describing its organization and understanding its relationships with other physical spaces. These skills are not strictly related to vision and involve different neurological processes depending on whether an individual is visually impaired or not [7,8,9]. Thinus-Blanc et al. [10] provided thorough insights into how blind persons build mental maps. In particular, the authors showed that sensory modalities other than vision are exploited by the visually impaired in order to gain spatial knowledge and consequently develop orientation skills.

Popular mobile apps that facilitate orientation and navigation for the visually impaired are unable to provide a rapid overview of the environment and its POIs; such functionality would be very useful for the preliminary exploration of a space (e.g., when planning a visit), similar to the role hardcopy maps play for sighted users. The goal of this study is to investigate whether vibration patterns and verbal hints can be successfully used to help visually impaired users build a cognitive map; in particular, we will adopt the haptic channel to convey information not only about the positions of the POIs, but also about their function. We will first describe a preliminary study conducted on a simple indoor environment, and then we will show how we applied the preliminary findings to build a new prototype based on georeferenced, downloadable maps. User testing procedures will be described, and test results will be discussed. Finally, we will analyze the open issues and future perspectives of the proposed approach.

2 Related work

The role of non-visual stimuli in the development of spatial knowledge has been a research topic for many years. Ottink et al. [7] provided an overview of neurological studies about the role of auditory and haptic stimuli in the formation of cognitive maps for both sighted and visually impaired persons. They pointed out that although visually impaired people have a slower learning rate, using non-visual hints facilitates the process of learning, and after an adequate training phase, they can build a faithful mental map of a physical space.

An allocentric representation of the space encodes information about the mutual locations of landmarks in an environment; the location of a landmark is thus defined relative to the locations of other landmarks. Conversely, an egocentric representation provides a perception of the space from the explorer's perspective. Tactile maps can offer persons with visual impairments a satisfactory allocentric representation of a layout [11, 12]. Locations and distances, as well as directions, can also be approximated [10,11,12]. Some difficulties occur when rotation comes into play [13]: recognizing a rotated environment requires additional cognitive load that may hinder performance. Individual differences must also be taken into account when assessing the effectiveness of a tactile map for visually impaired persons: in blind subjects, effectiveness depends mainly on the age at onset of blindness, while in severely visually impaired subjects the characteristics of residual sight can make a difference. According to several studies [2, 12, 14,15,16,17], tactile maps are an effective way to improve navigation and wayfinding in a real environment. This is particularly true for small-scale tactile maps, since their exploration allows for a simplified yet effective allocentric overview of the environment, which is much faster than direct experience.

Hardcopy tactile maps, based on raised-line paper or other ad-hoc stiff materials, have become a traditional alternative to visual maps for visually impaired users. Espinosa et al. [2] investigated the effectiveness of a tactile map in familiarizing oneself with an unknown urban area. They compared the learning performances of different groups of blind users performing direct exploration of the environment, some of whom were provided with a tactile map of the environment and others with verbal descriptions. The tactile maps contained a representation of the route and the significant landmarks, while for the other groups details of the environment were provided verbally or as Braille text. Participants provided with tactile maps achieved better results.

Large-scale adoption of tactile maps suffers from a very complex creation and authoring process, which leads to high production costs. Other limitations arise from the fact that these maps necessarily provide static information from both a spatial and a temporal point of view. In fact, a hardcopy map cannot be zoomed in or out, nor can it be modified depending on the user's orientation; moreover, changes in the physical environment cannot be reproduced on the map in real time. Finally, conveying descriptive information via the tactile channel alone may result in an overwhelming amount of stimuli for the users [18]. Thus, much effort has been made over the years to find more effective alternatives based on electronic devices.

Digital counterparts of tactile hardcopy maps, which leverage electronic devices’ haptic and audio capabilities, are the subject of several studies. Lahav et al. [19] developed a digital user interface with haptic feedback to build a virtual representation of a physical environment. The authors carried out test sessions in which visually impaired participants were first immersed in the virtual environment and then directly explored the physical one. Test results showed that the haptic-based virtual environment allowed participants to develop faithful cognitive maps. Similar results were achieved by Poppinga et al. [20], who proposed the use of haptic and speech feedback to improve the usability of a map rendered on a touchscreen; in particular, vibrations issued by the device were used to trace the road network. Papadopoulos et al. [21] compared three modalities of exploration in a simulated urban area, namely verbal description, audio-tactile rendering, and audio description combined with haptic feedback. Their study highlighted that tactile and haptic feedback associated with the audio channel facilitated the process of building a mental representation of the environment. Finally, Palani et al. [11] conducted a series of tests involving both sighted and visually impaired users, using an experimental prototype of a digital map based on the haptic and audio channels. The map was implemented as a multimodal user interface built on top of an Android tablet, in which different vibro-tactile feedback patterns were used to distinguish between lines and polygons, while audio cues and verbal messages were used to signal landmarks. Results showed that for both categories of users, the digital multisensory map yielded performance equivalent to a hardcopy tactile one. Another attempt to overcome the issues of versatility and flexibility posed by hardcopy tactile maps was made by Brayda et al. [22], using a programmable pin array matrix (PAM) connected to a laptop to obtain a small-scale map of a simple indoor environment (i.e., the perimeter of a room and some landmarks). The PAM's configuration was controlled via dedicated software, and the study was conducted in a laboratory. Participants in the study were provided with the PAM tactile map and a summarized audio description of the real environment. Test results showed that the tactile map was an effective aid in orientation tasks.

As smartphones have become more powerful and pervasive, their popularity has grown as a means of increasing the autonomy of visually impaired persons [1]. Smartphones’ vibration motors, text-to-speech (TTS) capabilities and accessibility services can be used as low-cost solutions to convey enriched information through the haptic and audio channels. Besides their affordability, solutions based on mobile devices’ native capabilities can be easily distributed to large groups of users via mobile app stores on the Web.

Vibration patterns have been identified as a valuable aid for both sighted and visually impaired users. They have proven effective in conveying the priority level associated with notifications [23]; moreover, they have been successfully adopted, possibly coupled with audio hints, to identify logical partitions, icons and functions of a smartphone’s graphical user interface [24, 25], thus helping users build an allocentric perception of the interface’s elements.

3 Method

3.1 Preliminary phase

A preliminary study was carried out with a digital map of an indoor environment (namely, a shopping mall) enriched with verbal and haptic hints. The digital map was developed for the Android platform and was obtained from a PNG image of the floor plan of a mall, on which a transparent “semantic” layer was superimposed. The semantic layer made it possible to issue vibro-tactile hints as the user's finger hovered over a functional area. Figures 1, 2 and 3 show how the digital map worked. In Fig. 1, a floor plan is shown as it was rendered on the screen of the mobile device, while Fig. 2 shows the functional areas as they were rendered on the semantic layer. RGB color encoding was used to identify each functional area and provide information about POI categories: each POI category corresponded to a fixed pair of red and green levels, and was associated with a vibration pattern and a text label (e.g., “Restaurant”), while the blue component was used to precisely identify the area by its name (e.g., the name of the restaurant itself). Hence, according to this strategy, each category could account for up to 256 different POIs. A category was announced via a text-to-speech (TTS) vocal message and a vibration pattern issued via the device’s native APIs. The final result is shown in Fig. 3, in which three POIs are highlighted, with the corresponding audio and vibration hints. Vibration patterns were designed to be as distinguishable as possible, while keeping a low level of intrusiveness. Android does not always allow full control over the vibration motor; in fact, APIs to control features such as the waveform or intensity of a vibration are available only on some devices [26]. In order to provide the same experience to every user, we decided not to rely on these features in the design of the vibration patterns.
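As an illustration, the minimal Kotlin sketch below shows how such a semantic layer could be queried on Android. It assumes the layer is loaded as a Bitmap; the category table and the specific color pairs are hypothetical and do not reproduce the prototype's actual values.

```kotlin
import android.graphics.Bitmap
import android.graphics.Color

data class SemanticArea(val category: Int, val poiId: Int)

// Hypothetical category table: a fixed (red, green) pair identifies a POI category.
val categoryByRedGreen: Map<Pair<Int, Int>, Int> = mapOf(
    (255 to 0) to 1,     // e.g., "Restaurant"
    (0 to 255) to 2,     // e.g., "Clothing store"
    (255 to 255) to 3    // e.g., "Restroom"
)

// Returns the semantic area under the finger, or null if the pixel is transparent
// (i.e., the finger is not hovering over any functional area).
fun areaUnderFinger(semanticLayer: Bitmap, x: Int, y: Int): SemanticArea? {
    val pixel = semanticLayer.getPixel(x, y)          // ARGB-packed int
    if (Color.alpha(pixel) == 0) return null          // transparent: no POI here
    val category = categoryByRedGreen[Color.red(pixel) to Color.green(pixel)] ?: return null
    // The blue channel distinguishes up to 256 individual POIs within the category.
    return SemanticArea(category, Color.blue(pixel))
}
```

In the prototype, the category obtained in this way would then trigger the associated vibration pattern and TTS label.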

Fig. 1

The floor plan as it appears on the screen of the mobile device, with its different functional areas

Fig. 2

The layer that is superimposed on the floor plan, in which different colored squares account for different categories of functional areas. Each category is identified by a pair of red and green color levels. This layer is not visualized on the screen

Fig. 3

When the finger swipes over an area defined in the semantic layer, the corresponding vibration pattern is issued via Android’s vibration API and the category is announced by the TTS engine

In order to evaluate the effectiveness of this approach, two blind users were involved in co-design sessions with a prototype application developed specifically for this purpose. The prototype showed a digital map composed of 15 different areas belonging to seven categories. To avoid bias, the areas were placed differently at each session. Control buttons (highlighted in Fig. 4) were added in order to toggle each interaction modality and evaluate the impact of the audio and haptic channels separately.

Sessions were held remotely via Skype calls, due to COVID-19 restrictions. Participants were first sent an e-mail containing a link from which they could install the prototype app, along with detailed instructions about its functioning. Overall, three sessions of about forty minutes each were held, in which users were asked to practice with the different types of feedback and then explore the map for about ten minutes in order to identify as many POI categories as possible. Finally, they were asked to locate four specific POIs. Users were invited to think aloud during the exploration, and at the end of each session, they were asked to share their impressions and criticisms. Their suggestions were collected in order to refine the prototype and identify strengths and weaknesses of our approach. Figure 4 shows a screenshot of the prototype as it was at the end of the three sessions, in which different POI categories are highlighted.

Fig. 4

The prototype used for testing; different categories of POIs are highlighted, as well as buttons to toggle interaction modality

Sessions with users showed that vibro-tactile hints were appreciated and deemed useful for building a mental map of the mall; the different areas in the map were successfully recognized through the associated vibrations, and users were able to find the POIs they were asked to locate. A concern arose that the cognitive load might become too heavy when areas are very small and close together, or for certain categories of users, such as the elderly. It was thus suggested to add a filtering function to the map, allowing users to receive hints only for chosen subsets of categories. For the same reason, a zooming function was suggested as a way to better distinguish between close POIs. These functionalities were taken into consideration for further evolutions of the digital map.

Although during sessions vibrations alone were sufficient for subjects to recognize POI categories, users stated that the best feedback configuration was the one combining the audio and haptic channels together, since even if a person could not feel the vibration, they could understand the spoken description and vice versa. It was stressed that depending on the type of disability, not all visually impaired subjects may be equally responsive when it comes to the sense of touch.

3.2 Enriching a downloadable GPS map

The results of the preliminary study encouraged us to adopt a similar approach with downloadable GPS maps, such as those obtained from Google Maps or OpenStreetMap [27, 28]. The metadata provided by these platforms do not natively support accessibility, so we had to devise a strategy to enrich them, by analogy with the hidden colored layer of our early prototype. Our goal was to enrich a GPS map with information that makes landmarks and POIs easily discoverable by visually impaired users, possibly in a customizable way. We needed a framework for location and orientation with APIs that would allow building a customized interaction layer on top of the map’s default canvas; this led us to the Mapbox framework [29], which offers excellent customization capabilities for defining user interactions and allows multiple map layers to be defined for data presentation. Moreover, Mapbox web services and APIs provide geospatial data in the GeoJSON format [30], an intuitive open standard file format for representing POIs, which can be easily integrated with customized metadata. Figure 5 shows the nested structure of a GeoJSON Feature. The complete metadata file is formed by a sequence of Feature objects, in which each Feature consists of a pair of objects, named Properties and Geometry. We designed the Properties object to contain metadata related to accessibility hints, while Geometry defined the POI on the map in terms of its GPS coordinates. A Properties object contained an objectId, a category and a name: objectId was the unique identifier assigned to a POI, category was the numerical code assigned to the POI type, and name was the textual denomination of the POI. The category field was used programmatically to issue the vibration pattern and TTS announcement related to a specific POI type, while the POI’s name was announced by the TTS engine right after its type.
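For illustration, a Feature following this structure might look like the sketch below; the coordinate values, category code and name are invented for the example and do not come from the actual dataset.

```json
{
  "type": "Feature",
  "properties": {
    "objectId": 17,
    "category": 3,
    "name": "Caffè Roma"
  },
  "geometry": {
    "type": "Point",
    "coordinates": [10.4018, 43.7160]
  }
}
```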

Fig. 5

The nested structure of a Feature GeoJSON object, in which the Properties object contains descriptive information about the POI, and the Geometry object describes the position of the POI on the map

The following convention was used to define vibrations and pauses:

  • SHORT_VIB = 100 ms vibration;

  • LONG_VIB = 200 ms vibration;

  • LONGER_VIB = 300 ms vibration;

  • PAUSE = 100 ms silence.

An additional 400 ms vibration was used to signal that the user's finger had reached the edge of the map, beyond which the focus was lost and the finger had to be repositioned to resume exploration.
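As a sketch, this convention can be mapped onto Android's waveform vibration API roughly as follows. The pattern assigned to each category code is hypothetical (the actual assignments are listed in Table 1), and the code assumes API level 26 or higher.

```kotlin
import android.content.Context
import android.os.VibrationEffect
import android.os.Vibrator

// Durations in milliseconds, following the convention defined above.
const val SHORT_VIB = 100L
const val LONG_VIB = 200L
const val LONGER_VIB = 300L
const val PAUSE = 100L
const val EDGE_VIB = 400L   // signals that the finger reached the edge of the map

// Hypothetical category-to-pattern table (the real assignments are in Table 1).
// In createWaveform() the timings alternate off/on durations starting with "off",
// hence each pattern begins with 0 ms of silence.
val patternByCategory: Map<Int, LongArray> = mapOf(
    1 to longArrayOf(0L, SHORT_VIB),
    2 to longArrayOf(0L, SHORT_VIB, PAUSE, SHORT_VIB),
    3 to longArrayOf(0L, LONG_VIB),
    4 to longArrayOf(0L, LONG_VIB, PAUSE, SHORT_VIB),
    5 to longArrayOf(0L, LONGER_VIB),
    6 to longArrayOf(0L, LONGER_VIB, PAUSE, LONGER_VIB)
)

fun vibrateForCategory(context: Context, category: Int) {
    val vibrator = context.getSystemService(Context.VIBRATOR_SERVICE) as Vibrator
    val timings = patternByCategory[category] ?: return
    vibrator.vibrate(VibrationEffect.createWaveform(timings, -1))  // -1: do not repeat
}
```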

Figure 6 shows how the GeoJSON metadata correspond to two POIs highlighted on a map, while Table 1 lists the vibration pattern and spoken label associated with each POI category.

Table 1 The six POI categories and the related vibration patterns and TTS labels
Fig. 6

Correspondences between two points of interest on a map and the related GeoJSON metadata

In order to evaluate the enriched map, an Android mobile app was developed. Zooming, category filtering and feedback preferences were integrated, in accordance with the findings of our preliminary study. Figures 7 and 8 show screenshots of the app that highlight these functionalities.
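As a small sketch of the category filtering functionality, feedback can simply be restricted to the POIs whose category the user has enabled; the POI record type and names below are illustrative, not the prototype's actual ones.

```kotlin
// Illustrative POI record (a subset of the GeoJSON properties described above).
data class Poi(val objectId: Int, val category: Int, val name: String)

// Keep only the POIs belonging to categories the user has enabled in the filter settings.
fun filterByCategory(pois: List<Poi>, enabledCategories: Set<Int>): List<Poi> =
    pois.filter { it.category in enabledCategories }
```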

Fig. 7

A screenshot of the POI filtering function and the notification settings

Fig. 8

A zooming sequence; red dots are the POIs rendered in accordance with the GeoJSON description

In order to meet the needs of visually impaired users, we modified Mapbox's default interaction modes and controls. We defined an ad-hoc layer containing a 48-by-48 pixel draggable square, which issued haptic or audio feedback whenever a point of the map related to a POI entered its perimeter. A single tap on the screen caused the draggable square to appear right under the position of the finger, and a special sound notified the user that the square was focused and ready to be dragged around; whenever the finger was lifted, the focus was lost. To prevent involuntary panning of the map, which could cause users to lose their spatial references [31], we disabled the default gesture for this function. We developed special "pagination" buttons to show adjacent portions of the map, but for simplicity's sake, we did not include these buttons in the prototype used for testing. Zooming was associated with the default "pinch" gesture. A sound indicated when the map was being zoomed in or out, and a TTS message announced the new zoom level at the end of the procedure. To limit problems of map shifting as a result of rescaling, we rescaled the surroundings and repositioned the map after each zooming action, so that the last focused POIs were always visible; unfortunately, for POIs located near the borders of the screen, problems persisted. Finally, to prevent ambiguities due to rotation, we kept the map orientation fixed to north.
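A minimal sketch of the focus-square hit test is shown below. It assumes the visible POIs have already been projected from GPS coordinates to screen coordinates through the map SDK's projection utilities; all names are illustrative rather than the prototype's actual ones.

```kotlin
// A POI already projected to screen coordinates (e.g., via the map SDK's projection).
data class ScreenPoi(
    val objectId: Int,
    val category: Int,
    val name: String,
    val x: Float,
    val y: Float
)

const val SQUARE_SIZE_PX = 48f   // side of the draggable square, in pixels

// Returns the POIs whose projected position falls inside the square centered on the
// current finger position (fingerX, fingerY). In the prototype, each POI entering
// the square would trigger the category's vibration pattern and a TTS announcement
// of its category and name.
fun poisInsideSquare(pois: List<ScreenPoi>, fingerX: Float, fingerY: Float): List<ScreenPoi> {
    val half = SQUARE_SIZE_PX / 2f
    return pois.filter { poi ->
        poi.x in (fingerX - half)..(fingerX + half) &&
        poi.y in (fingerY - half)..(fingerY + half)
    }
}
```

On a tap the square would be placed under the finger and the focus sound played, while dragging would repeat the hit test above and lifting the finger would drop the focus, as described in the text.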

3.3 Testing procedure

Tests were carried out with seven volunteers, five females and two males, aged 42 to 65 years, recruited in Pisa, Italy, via the Italian Association for the Visually Impaired (UICI) [32]. Four of the subjects were affected by severe binocular visual impairment, while three were affected by binocular blindness. All participants provided written informed consent. Table 2 summarizes the main characteristics for each participant; degrees of visual impairment are defined according to the WHO visual impairment classification [33]. All of the participants had experience using navigation and mobility apps.

Table 2 An overview of the participants recruited for the test sessions

At first, we had considered setting up the tasks by assigning a time limit for their execution. However, during the initial sessions with the participants, we found that they were more comfortable performing the tasks in a less restrictive manner, particularly when they could talk with the researchers to satisfy their curiosity and express their thoughts. We thus decided to leave each participant free to perform the assigned tasks without time limits, while observing whether each participant was able to complete the task smoothly, i.e., by interacting with the map without needing external help, or conversely whether they had noticeable difficulties during the interaction. Non-blind participants were free to interact with the app using their residual sight, but screen magnification was disabled on the devices, since it hindered the correct functioning of the prototype.

After an initial training phase of about 10 minutes, nine tasks were assigned sequentially, focused on assessing the users’ ability to distinguish between different POI categories and locate their relative positions on the map. To collect information about the participants' user experience, they were invited to think aloud during the training phase and while executing the tasks; meanwhile, we took note of their observations, questions and difficulties.

Tasks were carried out using two Huawei MediaPad T5 tablets running Android 8 and an Oppo A74 running Android 12. After all the tasks were performed, the System Usability Scale (SUS) standard questionnaire [34] was administered, and participants were asked which kind of hint they were most comfortable with: audio, vibration, or the combination of both. Finally, we asked for their opinion on regularly using a digital map like the one they had tested, in a typical scenario such as looking for a store in a shopping district (Table 3).
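For reference, SUS scores follow the standard scoring rule (odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is scaled by 2.5 to a 0–100 range); the small sketch below uses a made-up set of responses, not actual study data.

```kotlin
// Computes a SUS score from the ten questionnaire responses (each on a 1-5 scale).
fun susScore(responses: List<Int>): Double {
    require(responses.size == 10) { "SUS has exactly ten items" }
    val sum = responses.mapIndexed { index, r ->
        if (index % 2 == 0) r - 1 else 5 - r   // indices 0, 2, ... are the odd-numbered items
    }.sum()
    return sum * 2.5
}

fun main() {
    // Hypothetical answers, not data from the study.
    println(susScore(listOf(4, 2, 5, 1, 4, 2, 4, 2, 5, 2)))  // prints 82.5
}
```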

Table 3 The tasks proposed during our test sessions

3.4 Test results

Most of the participants were able to complete the required tasks autonomously, although age and familiarity with electronic devices did play a role. In particular, P3, a 65-year-old who had developed severe vision loss during the past 2 years, and who was not particularly accustomed to using assistive technologies, needed additional assistance to interact with the map, as did P4, a 54-year-old, blind since birth, who explicitly declared that he rarely used technology “as a sight substitute,” preferring human interaction (i.e., asking people for help and information). Two of the non-blind participants, P1 and P2, were not used to relying on the sense of touch, as they habitually used the smartphone's screen magnification function to interact with its interfaces.

All the participants agreed that tactile hardcopy maps are outdated nowadays, and they greatly appreciated the ability to interact with a digital map on their mobile devices via the vibro-tactile channel. All of them preferred the haptic channel when coupled with the audio channel, thus confirming what had emerged during the co-design sessions of our preliminary study and the findings in [7]; they considered this composite feedback to be the most reliable. The haptic channel was most appreciated by blind participants, who proved more skilled at perceiving the vibration patterns correctly and discriminating between different patterns. Among the non-blind, P1 and P2 in particular had more difficulty relying on the haptic channel, since magnification had been disabled. They both claimed: "I find it extremely difficult to get information relying on different kinds of vibrations. If I need to check something on the screen, I just use the magnifier and search." Pattern recognition tasks (i.e., tasks 1–6) showed that out of the six patterns, participants were able to remember only three or four. However, all the participants pointed out that with more training they probably could have done better. Pattern recognition was more precise when category filtering was enabled and when the zooming function, which increases the on-screen distance between POIs, was used properly, again confirming what had emerged during our preliminary study. Participants who had experienced particular difficulty perceiving vibrations were asked to repeat tasks 1–6 on the smartphone, holding it in their hands while exploring the screen. This modality of interaction proved much more effective in conveying haptic feedback, since the vibrations were also perceived by the palm of the hand holding the device. Tasks 7 and 8 were successfully accomplished by all the participants except P4, who explicitly stated “I hate to relate with the external world via an electronic device. I’d rather communicate with people and ask for directions and information”; the other participants, albeit with different levels of difficulty, completed the tasks.

Regarding the use of the enriched map as a support for real-world navigation, all of the participants stated that it could be a valuable support before visiting an unfamiliar environment, perhaps integrated into one of the existing apps for orientation and mobility as an “overview” functionality.

SUS scores per user are shown in Fig. 9, in which different colors identify different visual disabilities.

Fig. 9

SUS scores assigned by participants at the end of the test sessions

4 Conclusions and future work

We have described an approach that exploits the haptic and audio channels to make GPS-based mobile maps more accessible to users with visual impairments. Our goal was to enable the target users to derive an allocentric mental representation of a physical space from a downloadable GPS map. For this purpose, we developed an Android prototype for a georeferenced map, described in the GeoJSON data format, upon which we superimposed tactile and audio hints indicating the locations, categories and descriptions of relevant points of interest. The prototype underwent test sessions with volunteers affected by severe visual impairment or total blindness. Test users highlighted the validity of our approach as an evolution of traditional hardcopy tactile maps. In order to derive more meaningful statistics on usage and user satisfaction, we are planning to perform more tests with a wider audience. These tests will also provide better insight into the problems related to different users' needs, based on factors such as age or type of visual impairment. Functionalities such as panning and rotation also need to be integrated into a future release of the prototype and tested. At the time of writing, well-known issues persist between Android's assistive technology (TalkBack), touch interactions and apps that use the vibration service; in the case of our prototype, this produced unpredictable interaction problems when TalkBack was activated. Unless these bugs are fixed in the near future, effective workarounds must be found before developing a new prototype for a wider audience, especially if it is to support unsupervised testing. We are also planning further research on vibro-tactile hint design, to find strategies (e.g., pattern personalization) that ease their memorization. Finally, in order to enhance POI descriptions, interoperability with widely used platforms such as FourSquare [35] may be considered.