1 Introduction to Sensory Substitution

Sensory Substitution (SS) is the process of delivering a signal from the domain of one sensory modality to an alternative sensory modality, for example delivering an auditory signal through the haptic modality. The purpose of SS is often to circumvent an impaired modality via an alternative one so that a person can experience stimuli from the impaired modality. Formally, we will refer to the modality being replaced as the source modality and the modality that the signal is delivered to as the target modality. In other words, Sensory Substitution is a method by which people who are blind can see by hearing, or people who are deaf can hear via touch. This revolutionary idea, that people can learn to experience sensations grounded in one modality via another, was pioneered by the late Dr. Bach-y-Rita in the 1960s. The notion that signals emanating from the receptors of one modality can be interpreted in the brain as stimuli from another domain was novel, and over the last four decades it has spurred the development of methods and systems harnessing this phenomenon to treat disability, enhance education, and enrich people’s lives.

The objective of a Sensory Substitution Device (SSD) is to transform a signal from the source domain into a form that can be perceived by the target modality. In the famous case of Dr. Bach-y-Rita’s TVSS system, the source domain was visual and the target domain haptic [4]. Dr. Bach-y-Rita showed that people who are blind could, with training, learn to interpret visual scenes projected onto their back as patterns of tactile stimulation using the Tactile Vision Sensory Substitution device (TVSS). The device consisted of a dental chair retrofitted with 400 solenoid actuators that would press upon the user’s back when seated. The solenoids were controlled by a camera system that converted images to electrical signals: a bright portion of an image would result in solenoids pressing against the back of the user in the corresponding location. This is illustrated in Fig. 4.1a–c.

Fig. 4.1

(a) Original image of hand (b) Original image converted into electrical activations based on the brightness of that portion of the image (c) activations are converted into solenoid positions to stimulate the skin: solenoids stimulate the skin in proportion to their activation
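
The core transformation sketched in Fig. 4.1 can be summarized as a mapping from local image brightness to actuator drive levels. The following is a minimal illustrative sketch of that idea, not the original TVSS electronics; the 20 × 20 grid (matching the 400-solenoid count) and the block-averaging step are assumptions made for illustration.

import numpy as np

def image_to_tactor_activations(image, grid=(20, 20)):
    """Map grayscale image brightness to normalized tactor drive levels in 0..1.

    Sketch of a TVSS-style transform: downsample the image to the tactor grid
    and use the mean brightness of each cell as that tactor's activation, so
    brighter regions press harder against the skin.
    """
    gh, gw = grid
    h, w = image.shape
    # Crop so the image divides evenly into grid cells, then block-average.
    image = image[: h - h % gh, : w - w % gw].astype(float)
    blocks = image.reshape(gh, image.shape[0] // gh, gw, image.shape[1] // gw)
    activations = blocks.mean(axis=(1, 3))
    return activations / 255.0  # assumes 8-bit input images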

After training, the TVSS allowed users who are blind to recognize household objects without touching them. These results had massive implications for neuroscience and for assistive technology (technology that improves the lives of people with disabilities), demonstrating that sensory impairments can be circumvented through clever uses of technology. Researchers later went on to develop Sensory Substitution Devices (SSDs) to substitute vision with hearing [46], vestibular sensing with touch [65], and hearing with touch [59], with impressive initial results. While fantastic medical advances have produced sensory prostheses such as the Cochlear Implant (CI) [48] and the retinal prosthesis [9], SSDs provide a compelling alternative for circumventing the loss of a sensory modality, as surgical procedures are often prohibitively expensive and always invasive. This chapter will focus on SSDs with a target modality of haptics, or Haptic SSDs.

2 Advantages and Limitations of Haptics for Sensory Substitution

The skin is the largest organ of the body, making touch one of the most versatile modalities to design SSDs for. Designers have a wide range of options with respect to where to place devices: some SSDs have even been designed for the tongue. Because of this plentiful real estate, haptic SSDs can be designed to impart minimal obstruction to other crucial functions of the senses. For example, haptic actuators can be placed at locations such as the back, upper arms, or waistline, which are not often used in day-to-day activity.

Touch also happens to be underutilized as a communication medium for technology. Designing touch-based SSDs therefore has the added benefit of being unlikely to interfere with other communication channels and cause sensory overload. A person who is blind, for example, is unlikely to accept obstructing their hearing with a vision-to-auditory SSD, but is more likely to accept a device whose target modality is not already heavily utilized, such as touch. Using haptics not only avoids interfering with a modality already in use, but may also allow for a higher effective cognitive bandwidth due to the multi-channel nature of adding haptics.

Current models of the human memory and attention system portray the different modalities as semi-independent channels to one’s attention. The Baddeley multi-channel model, for example, allocates different sensory inputs to unique, semi-independent subsystems of working memory, with independent processing systems for each modality [74]. Consequently, sensory signals distributed across different modalities can make more effective use of human cognitive bandwidth than the same information presented to a single modality; this phenomenon is called the “modality effect”. Sensory overload occurs when the attention system is overwhelmed, and because touch is often underutilized in daily tasks, taking advantage of it can augment attentional bandwidth while averting sensory overload. For this reason, the haptic modality has received substantial interest in military (high cognitive load) settings, and haptic-vestibular SSDs have been developed for pilots flying in low-visibility conditions [78].

The sense of touch, though, does exhibit inherent limitations. One such limitation is its limited information capacity. It is estimated that the visual system has a capacity of 4.3 Mbits/s [28] and the auditory system 8 Kbits/s [26]. In comparison, the haptic modality is estimated to have a mere 600–925 bits/s of capacity [58]. This implies an upper bound on the amount of information an SSD can convey through the sense of touch, and consequently an upper bound on the fidelity of information one can access from a higher-bandwidth modality through haptics. The capacity is not so minuscule, though, that users cannot use it to substitute vision and perform basic visual tasks [4].
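
Taking the cited estimates at face value gives a rough sense of the required compression; the arithmetic below uses only the figures quoted above.

\[
\frac{C_{\text{vision}}}{C_{\text{haptic}}} \approx \frac{4.3 \times 10^{6}\ \text{bits/s}}{600\text{--}925\ \text{bits/s}} \approx 4.6 \times 10^{3}\text{--}7.2 \times 10^{3}
\]

In other words, a vision-to-haptic SSD must discard or summarize roughly 99.98% of the visual signal, which is why the choice of what to encode matters as much as how it is delivered.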

While touch allows for a wide variety of placement locations, sophisticated interaction often requires multiple tactile actuators to convey complex information. The skin imposes a minimum spacing requirement between tactors to maintain discernability, and this spacing is a function of the location on the body where the tactors are placed as well as the kind of stimulation (pressure, vibration, temperature, etc.) that will be applied. For example, on the human back the minimum discriminable separation of vibration stimuli is about 11 mm [18]. Consequently, actuators must often be adequately spaced out on the body, taking up more space than a device relying on a more concentrated modality like vision or hearing. The design of the device and its signal processing is crucial for effective use and adoption as an SSD. Haptic SSDs can largely be grouped into three categories, explored in the following sections: general purpose devices (Sect. 3), media readers (Sect. 4), and interactive devices (Sect. 5).

3 General Purpose Sensory Substitution

General purpose Sensory Substitution is intended to circumvent a source modality via the target modality outright, making it a complete substitute for the source modality. This is in contrast to application-specific SS, where a device or technique transforms a signal from the source modality into the domain of the target modality in a way that is tailored to the application. Oftentimes there exists a tradeoff between efficiency and generality: the more general an SS method, the more training is required, while more application-specific methods are often learned more quickly.

The first and likely most famous implementation of general purpose vision sensory substitution occurred in 1969, when Dr. Paul Bach-y-Rita and his team developed the Tactile Vision Sensory Substitution (TVSS) system, demonstrating that with a somewhat long training period (up to 150 h), users of the device could recognize common objects as well as motion, gradients, and shadows at a distance [4, 83]. While impressive, the work had a long way to go towards vision-to-tactile SS that could truly replace vision, let alone be a practical solution for daily activity. The device was incredibly bulky, having been constructed from a dental chair, hundreds of solenoids, camera equipment, and electrical amplifiers. The device’s resolution was also too low to discern fine detail, and long training times were required for proficiency. Furthermore, the system lacked color detection and sported a field of view that was narrow and fixed. All of these problems made it impractical for real-world use such as navigation, reading, etc.

Some of these issues were addressed in later devices. For example, the “Rabbit Display”, developed by the MIT Media Lab, made use of a tactile illusion called “saltation” in order to increase the effective resolution of a low-resolution tactile display [75]. Saltation (also known as the “cutaneous rabbit”) is an illusory sensation of touch felt between the locations where stimuli were actually applied to the skin [23], and can be evoked by timing the stimuli in a specific manner. The authors emphasized that the display would be useful in conveying directional information to users such as pilots (e.g., as a vestibular SSD) or to help people with navigation. Because of the low resolution of the display (3 × 3), it can be inferred that it can be made relatively small and lightweight, making it a viable option for mobile applications and more socially acceptable. Saltation can even evoke sensation away from the body [54], and may be used in the future to “extend” displays off of the body. The low resolution of the display, though, limits the detail that can be conveyed, even if saltation is employed to increase perceived resolution. Generalizing this technique to a larger, finer display is also not trivial, as inducing saltation requires haptic stimuli to be presented to the skin with specific timings and patterns, limiting the representable patterns of the display and thus its informational content.
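
As a rough illustration of how such timing-dependent patterns might be scheduled in software, the sketch below builds a pulse timetable for two tactors intended to evoke intermediate “hops”; the pulse duration and inter-stimulus interval are illustrative assumptions, not parameters reported for the Rabbit Display.

from dataclasses import dataclass

@dataclass
class Pulse:
    tactor: int      # index of the physical actuator to fire
    t_start: float   # seconds from pattern onset
    duration: float  # seconds

def saltation_schedule(tactor_a, tactor_b, n_taps_a=3, n_taps_b=3,
                       pulse_ms=20.0, isi_ms=60.0):
    """Build a tap schedule for a cutaneous-rabbit style pattern.

    Several brief taps at tactor_a followed quickly by taps at tactor_b can
    be perceived as taps "hopping" across the skin between the two sites.
    Timing values here are illustrative; the illusion is known to be
    sensitive to the inter-stimulus interval.
    """
    schedule, t = [], 0.0
    for tactor, n in ((tactor_a, n_taps_a), (tactor_b, n_taps_b)):
        for _ in range(n):
            schedule.append(Pulse(tactor, t, pulse_ms / 1000.0))
            t += (pulse_ms + isi_ms) / 1000.0
    return schedule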

Further improving on acuity and portability, researchers in 2001 developed a Tongue Display Unit (TDU) for vision-to-electrotactile Sensory Substitution applications. The device converts images from a digital camera into electrical signals that are applied to the tongue, in a manner similar to how Bach-y-Rita’s TVSS converted image information into tactile stimulation (illustrated in Fig. 4.1c). While unconventional, the tongue was chosen as the site for the TDU for both its sensitivity to electric current and its density of receptors, making it better suited for discerning fine details than a user’s back. Researchers showed that users of the Tongue Display Unit were able to achieve a visual acuity of 20/860 on a standard “Tumbling E” visual acuity test, improving to 20/430 after 9 h of training, a marked improvement over the original TVSS [57, 68].

The same group went on to use the TDU as a rehabilitation device for people with vestibular conditions affecting their balance, renaming the TDU the BrainPort. Researchers used the BrainPort to convey balance information to people who had lost their sense of balance, substituting it with electrotactile stimulation and saw marked improvements in balance, some users being able to stop using the device entirely while retaining their newfound balance [65]. This group demonstrated that haptic SSDs can not only be used as sensory substitutes but also as rehabilitation devices.

With all of these advances since the original TVSS, some limitations remain unaddressed, such as color distinction and stereo vision. Vision-to-tactile SSDs are also still cumbersome for practical daily use, as the state-of-the-art implementation (the BrainPort) requires the display to be in the mouth, limiting social interactions and possibly exacerbating stigma towards users. There has been more success in general vision substitution with the auditory system as the target modality. Blind users have even been able to navigate with SSDs such as the “vOICe”, which stands for “oh I see” [46, 82], and experience color with EyeMusic [1], a system that abstracts images into tones and sounds of instruments, hence the name. The discrepancy in performance between vision-to-haptic and vision-to-auditory SSDs is likely due to the information capacity gap being smaller between vision and audition than between vision and haptics. Auditory-to-haptic Sensory Substitution, though, enjoys a similar advantage over vision-to-haptic, since the capacity gap between audition and touch is smaller still.

Some of the earliest attempts at general auditory-to-haptic SS were made by the Audiological Engineering Corp in the 1980s. The group designed what are now known as the Tactaid devices. The devices partition audio data into a number of frequency bands that varies with the model of Tactaid device; the Tactaid VII, for example, uses seven bands and conveys activity in each band to the user via seven distinct vibrotactile actuators. Researchers evaluated the devices with users who had hearing impairments and found that users were able to discern syllables and showed “enhanced monosyllabic word recognition”, but did not report significant subjective improvements in speech recognition [30]. A more recent and more successful method for auditory-to-haptic SS was developed in 2014 by researchers at Rice University. Instead of using just seven tactors, the researchers developed a suit called the VEST containing 26 eccentric rotating mass (ERM) motors and designed patterns in which groups of 9 vibrotactile motors in a square array conveyed directional “sweeps”. They found that these spatiotemporal sweep patterns were more distinguishable than purely spatial or static patterns. Combining the VEST with speech processing methods (compressing and converting the speech into haptic patterns), users were able to discern speech much more clearly than before, distinguishing words at much higher accuracies than with the Tactaid devices [16, 59]. General purpose Sensory Substitution devices explore the limits of perception but are rarely widely adopted as assistive technology. Instead, application-specific SSDs tend to have more success as practical aids for daily use.
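
Although the exact filter banks of the Tactaid devices are not reproduced here, the band-to-tactor idea common to the Tactaid and the VEST front end can be sketched as follows; the frame length, FFT-based band split, and normalization are assumptions made for illustration.

import numpy as np

def band_energies_to_tactors(audio, sample_rate, n_bands=7, frame_ms=20.0):
    """Sketch of a Tactaid-style mapping: audio -> per-band vibration levels.

    The signal is split into short frames, a magnitude spectrum is computed
    per frame, the spectrum is grouped into n_bands contiguous bands, and
    each band's energy drives one vibrotactile actuator. Returns an array of
    shape (n_frames, n_bands) with values in 0..1.
    """
    frame_len = int(sample_rate * frame_ms / 1000.0)
    n_frames = len(audio) // frame_len
    levels = np.zeros((n_frames, n_bands))
    for i in range(n_frames):
        frame = audio[i * frame_len:(i + 1) * frame_len]
        spectrum = np.abs(np.fft.rfft(frame))
        bands = np.array_split(spectrum, n_bands)
        levels[i] = [np.sqrt(np.mean(b ** 2)) for b in bands]
    # Normalize so the strongest band over the clip maps to full intensity.
    peak = levels.max()
    return levels / peak if peak > 0 else levels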

4 Media

While the written word enabled mass communication, standard media formats are not accessible to the entirety of society. People with visual impairments often have difficulty accessing communication media because their design relies on vision. Haptic SSDs for reading media are designed to convey the information in visual or text-based media through the sense of touch. Examples include devices for reading text, exploring images, and understanding maps, all of which are important for an individual’s education and independence.

4.1 Language and Communication

The most famous, and arguably most successful, Sensory Substitution technique for reading is Braille: tactile cells of raised dots arranged in two columns and three rows, invented by Louis Braille and published in 1829. Alphanumeric characters are converted into tactile representations, with each letter or number assigned a Braille code occupying one Braille cell. These cells can be read and written and form the standard reading and writing system for individuals who are blind in many countries. It has been shown that after extensive practice Braille users can achieve a reading rate of 90 wpm [76]; visual reading rates are about 200 words per minute for comparison. Written text, though, had to be translated into Braille before it was accessible, and the result was often bulkier than the original material. Almost a century and a half after the invention of Braille, refreshable Braille displays emerged as a solution to the size and heft of translated works. Refreshable Braille displays typically consist of a row of refreshable Braille cells whose dots are controlled by piezoelectric bimorph cantilevers activated by an electric potential [72]. Modern refreshable Braille cells are 2 × 4, in contrast to the original 2 × 3 cells, with the additional 2 dots used for cursor position and other indicators, according to the American Foundation for the Blind (AFB) [73]. Modern refreshable displays sport between 18 and 84 Braille cells and can interface with computers via Bluetooth, while also providing input controls for typing and navigating [8, 21, 69]. In response to the era of touchscreens, methods for typing in Braille have evolved to be compatible with the flat, featureless surfaces of touchscreens [42]. Many touchscreen consumer devices today allow for Braille typing using solely the display, eliminating the requirement for additional hardware.
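
For illustration, the standard mapping from letters to raised dots in a 2 × 3 Braille cell (dots numbered 1–3 down the left column and 4–6 down the right) can be encoded directly; only the letters a–j are included in this sketch for brevity.

# Each letter maps to the set of raised dots in a standard 2x3 Braille cell,
# numbered 1-3 down the left column and 4-6 down the right column.
BRAILLE_DOTS = {
    "a": {1}, "b": {1, 2}, "c": {1, 4}, "d": {1, 4, 5}, "e": {1, 5},
    "f": {1, 2, 4}, "g": {1, 2, 4, 5}, "h": {1, 2, 5}, "i": {2, 4}, "j": {2, 4, 5},
}

def cell_matrix(letter):
    """Return the cell as a 3x2 matrix of 0/1 values (rows top-to-bottom)."""
    dots = BRAILLE_DOTS[letter.lower()]
    # Row r holds dot r+1 (left column) and dot r+4 (right column).
    return [[int(r + 1 in dots), int(r + 4 in dots)] for r in range(3)]

# Example: 'd' -> dots {1, 4, 5} -> [[1, 1], [0, 1], [0, 0]]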

Other, less successful methods for tactile communication were also developed, such as Vibratese, a tactile language based on vibrations on the body that varied in amplitude and duration to communicate alphanumeric symbols. Invented by F. A. Geldard in 1957, Vibratese allowed trained test subjects to achieve reading rates of up to 60 words per minute [61]. The language never saw widespread adoption, likely because Braille was already the standard.

Another alternative to the refreshable Braille cell, called STRESS, emerged in 2005. The device uses vertical stacks of piezoelectric plates that deform with an electric current. The user places their fingers on top of the stack, perpendicular to the individual piezoelectric plates, and the plates bend in response to the applied current to create different sensations at the fingertips. Researchers saw promising preliminary results in creating “virtual Braille” with the STRESS device, a 1-dimensional version of Braille [37]. Researchers then went on to explore more complicated game-based use cases with the technology [81], detailed in Sect. 5.4.

Unfortunately, while technological improvements continue to advance Braille, the de facto tactile communication method, Braille literacy is in decline. The National Federation of the Blind (NFB) stated in a 2009 report that the Braille literacy rate has dwindled to less than 10% of individuals who are blind in the United States [56]. The NFB report states that Braille education is critical to literacy and employment among individuals who are blind, and that while screen readers have facilitated computer access, their existence likely inhibits Braille adoption. The NFB is calling for Braille instruction to be elevated in priority among those who teach individuals who are blind.

One of the issues with Braille, though, is that non-digital text must be transcribed before becoming accessible, and in response several technologies emerged to read written characters beginning in the 1960s. The Optohapt, for example, used photosensitive sensors to detect characters on paper (on a retrofitted typewriter). The characters were passed through the sensor at a rate of 70 characters per minute, creating electrical signals that were sent to vibrating actuators located at 9 spatially dispersed bodily sites [22]. During the same period, a competing device proposed by Linvill and Bliss, the Optacon (OPtical-to-TActile CONversion), was developed. The device consisted of a capture module (a wand-like device) fitted with an 8 × 12 array of photosensitive cells that is placed on the page to be read and moved with the user’s dominant hand. The user then places a finger of their other hand on the actuator, an array of 24 × 6 pins that move up and down under the fingertip in response to signals from the capture module [7]. The authors claimed that a reading rate of 50 wpm could be achieved with 160 h of training.

The researchers behind the TVSS also explored using the device to display letters rather than camera images, comparing static and dynamic haptic patterns for each letter. Letters were converted to tactile stimuli (illustrated in Fig. 4.2a–c). They found that a sliding window approach was most successful for accurate letter discrimination among participants in a user study, achieving 51% correct letter discrimination [40]. This conclusion (that spatiotemporal patterns are more discriminable than static ones) has been supported by later work by the developers of the VEST [59] and the LRHD [19]. The sliding window approach exposed the user to only a portion of the letter at any one moment, but the whole letter would be presented over a duration of 1 s, as illustrated in Fig. 4.3, imposing a maximum reading rate of 60 characters per minute (60 cpm).

Fig. 4.2

(a) Original image representation of letter F (b) Letter converted into electrical activations (c) activations converted into solenoid positions to stimulate the skin

Fig. 4.3

Sliding window presentation of the letter F. A user would be exposed to the sliding window stimuli over about 1 s
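
A minimal sketch of the sliding-window presentation follows: only a vertical band of the letter bitmap is exposed to the tactor array at any instant, and the band sweeps across the letter over the roughly one-second presentation. The bitmap size, window width, and frame count are illustrative assumptions rather than the TVSS parameters.

import numpy as np

def sliding_window_frames(letter_bitmap, window_cols=4, n_frames=10):
    """Sketch of the sliding-window presentation described for the TVSS.

    Only a vertical band of the letter is "visible" to the tactor array at
    any instant; the band sweeps across the letter over the presentation
    period (about 1 s in the original study, i.e., ~60 characters/minute
    at most). Returns one masked frame per time step.
    """
    rows, cols = letter_bitmap.shape
    frames = []
    for i in range(n_frames):
        start = int(round(i * (cols - window_cols) / max(n_frames - 1, 1)))
        mask = np.zeros_like(letter_bitmap)
        mask[:, start:start + window_cols] = 1
        frames.append(letter_bitmap * mask)
    return frames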

The TVSS implementation of a character reader seemed like overkill (400 actuators), and with a limit of 60 cpm it did not show much promise as a media reading device. More recently, lower-resolution displays have been explored for communicating written characters. Researchers developed a low-resolution array of 9 vibration actuators placed 3 × 3 on the backrest of a chair. Representations of letters were “traced” over the tactors as if the letters were being dynamically “drawn” on the user’s back. The patterns varied over space and time, and participants were able to achieve an accuracy of 87% for letter and number recognition [85]. This was vastly higher than the accuracy achieved with the TVSS (51%), with far fewer actuators. This leads us to believe that for abstract information a more “coded” representation may be more useful than attempting to reproduce the characteristics of the visual content faithfully. Although the hardware requirements are vastly reduced and accuracies improved, the dynamic patterns may still be too slow for use in real time, implying that a different coding scheme, more similar to Braille, may be more practically useful.
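
The “tracing” approach, by contrast, can be represented as a time-ordered sequence of tactor indices on the 3 × 3 grid; the stroke paths and dwell time below are illustrative assumptions, not the patterns used in the study.

# Tactor indices on a 3x3 back array, row-major:
#   0 1 2
#   3 4 5
#   6 7 8
# A letter is "drawn" by activating tactors one after another along a stroke
# path, as if tracing the character on the user's back. The paths below are
# illustrative assumptions.
LETTER_PATHS = {
    "L": [0, 3, 6, 7, 8],            # down the left column, then across the bottom
    "C": [2, 1, 0, 3, 6, 7, 8],      # across the top, down the left, across the bottom
}

def trace_schedule(letter, dwell_ms=120.0):
    """Return (tactor_index, onset_seconds) pairs for the traced letter."""
    path = LETTER_PATHS[letter.upper()]
    return [(tactor, i * dwell_ms / 1000.0) for i, tactor in enumerate(path)]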

Braille-like devices, it seems, are still superior when it comes to reading and writing using haptics. While Braille may currently be in decline, emerging technology in the space of refreshable Braille displays has appeared as recently as 2017 in the form of non-mechanical, air-actuated displays, in contrast to piezoelectric designs. This new technology uses fluids to raise bubbles in the display that serve as the dots, and it is being integrated with a traditional touchscreen tablet. It appeared in 2017 in the form of the Blitab (a play on words combining “blind” and “tablet”) and is purported to have 14 rows of 23 six-dot Braille cells [50]. This two-dimensional display paves the way for richer human-computer interaction and possibly a reemergence of Braille literacy.

4.2 Visual Content Readers

Apart from language systems, there has been growing interest in the development of haptic devices for understanding traditionally visual information such as images, graphs, and maps. Students with visual impairments are often at a disadvantage in academic settings because the content is in an inaccessible format. Even when text is transcribed or conveyed via a media reading device, images continue to present a challenge to students and teachers. An intuitive method for representing two-dimensional information using haptics is the “raised paper diagram”. These diagrams are often made from “swell paper”, which expands where it has been printed on when passed through an oven-like device, creating a tactile surface [53]. An example of such a diagram is illustrated in Fig. 4.4a, b. A similar method for creating 2D tactile visualizations that allows an end user to reconfigure a diagram uses moldable wax-based rods called “Wikki Stix”, shown in Fig. 4.5. Users can scan them with their fingers to feel the features of the visualization. While useful, Wikki Stix and raised paper diagrams still require translation from an original image for instructional purposes. It is also often difficult to incorporate sufficient information density due to the physical limitations of the media. Descriptions are often added by a teacher or a caption to aid comprehension of the visualizations, but a more elegant solution has been developed in the form of the Talking Tactile Tablet (T3). The T3 consists of a tactile diagram that can be felt overlaying a touch-sensitive screen. When a user presses the tactile map they are presented with auxiliary audible information to complement it [36]. An even more fleshed-out version, called TacTILE, uses a smartphone and 3D-printed overlays to provide a multimodal experience similar to the T3. The authors of TacTILE developed a complete toolchain for the rapid development of such devices [25].

Fig. 4.4

(a) Raised paper diagram of a man’s head on white paper (b) The same diagram viewed close and at an angle. Note that several different heights and thicknesses are possible on such diagrams

Fig. 4.5

Wikki Stix used for conveying visual-spatial information via haptics. They are flexible and waxy, making them easily configurable and stationary on surfaces

More elaborate attempts to make visual information accessible began appearing in the late 1990s. Japanese researchers Ikei et al. attempted to convey an image’s textures via haptics by constructing a 5 × 10 pin finger display driven by piezoelectric actuators (similar to refreshable Braille displays). The pins, though, were not static like their Braille counterparts, but vibrated at 250 Hz at varying amplitudes to mimic tactile textures. The researchers converted close-up images of textured surfaces such as a bamboo woven basket, a thatch basket, a painted wall, and a rug into haptic textures by converting the images to pin intensities on their finger display. A user study revealed that, using their technique, sighted users were able to correctly identify the image belonging to the texture displayed on the finger pad more than 90% of the time [27]. The high recognition accuracies and straightforward method for converting images to tactile representations were promising, as generalizing to other domains would be relatively simple, although no study was performed with individuals who are blind, who would have had no visual reference for the textures they were experiencing. Ikei’s method worked for arbitrary textures but lacked the sense of “space” required to accurately convey most visualizations.
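
The amplitude-modulation idea in Ikei et al.’s display can be sketched as a fixed 250 Hz carrier whose amplitude is set per pin; the brightness-to-amplitude mapping and the sample rate below are illustrative assumptions rather than details from the original work.

import numpy as np

def pin_drive_waveform(intensity, duration_s=0.1, carrier_hz=250.0, sample_rate=8000):
    """Drive signal for one pin: a fixed 250 Hz carrier, amplitude-scaled.

    Ikei et al. kept the vibration frequency constant and varied amplitude to
    mimic texture; here `intensity` (0..1) is assumed to come from the
    brightness of the corresponding image region.
    """
    t = np.arange(int(duration_s * sample_rate)) / sample_rate
    return intensity * np.sin(2 * np.pi * carrier_hz * t)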

Researchers Wall and Brewster sought to solve this problem in 2006 when they developed a graphical diagram reading system by integrating the VTPlayer mouse with a digital drawing tablet, using the tablet’s stylus to interact with the graph. The VTPlayer mouse is a computer mouse augmented with two 4 × 4-pin Braille-like cells. The user would point on the tablet with the stylus and receive textured information about what they were pointing at through the VTPlayer held in their non-dominant hand. Complementary audio feedback was also available if the user pressed the buttons on the VTPlayer [79]. Earlier, in 2005, Wall and Brewster performed a psychophysical study comparing the VTPlayer mouse, the WingMan Force Feedback mouse, and classic raised paper for use in image understanding. They used a simple line gradient discrimination task: a line was displayed and participants were asked to discriminate its gradient using the three devices. While the force feedback mouse outperformed the VTPlayer, the raised paper was superior to both. Interestingly, the authors surmised that this was likely due to the combination of proprioceptive and tactile cues that neither the VTPlayer nor the WingMan mouse provide at the same time [80], which likely led them to develop the 2006 graph reading system using a stylus together with the tactile feedback from the VTPlayer mouse.

Emerging techniques in Computer Vision have given way to much more comprehensive automatic image understanding systems. Facebook’s automatic image captioning [49] generates captions automatically from images. Google “Lookout” is a mobile assistive technology app that allows users to point their phone camera at objects they would like information about [11]. As of this writing, Google Lookout describes objects in the scene by giving audio descriptions such as “Trash can, 12 o’clock”, but allows the user very little freedom to explore a visual scene interactively. Microsoft’s Seeing AI is slightly more sophisticated, with the ability to read text and documents, recognize people, scenes, and currency, and give illumination descriptions (color, brightness) [52]. All of these one-shot methods, though, do not allow users to interact with images the way sighted individuals do, by exploring the image over time. Combining these very powerful image understanding techniques with a proprioceptive and tactile interface would likely lead to a more effective and meaningful image understanding tool.

5 Interactive Applications

Sensory Substitution devices for interactive applications are designed to function in environments that are themselves interactive. An interactive environment responds or changes with respect to the user’s behavior: a video game is interactive, for example, while a textbook is not. SSDs for interactive applications must therefore contend with the demands of interactive environments, namely latency sensitivity, sensory overload, and diverse, dynamic situations. This section will explore SSDs designed for interactive applications in mobility and travel, interactive instructional systems, social interactions, and virtual interactive environments.

5.1 Instructional Systems

Instructional SSDs are those intended to be used for learning; more specifically, for learning in dynamic environments that react to user input, in contrast to media reading SSDs, which convey information from static sources such as books and illustrations.

5.1.1 Mobility Learning

Sighted people can look up images of a location and quickly acquaint themselves with the flow of the environment. Unfortunately, those for whom images are inaccessible do not have such a luxury and cannot benefit from the vast amounts of visual data available online. Furthermore, familiarity with an environment is often more important for people with visual impairments than for sighted individuals. To address this issue, virtual environments that model locations of interest can allow people with visual impairments to familiarize themselves with a novel location before visiting it in person. These systems are referred to as “Mobility Training” systems.

One such system, developed at the University of Colorado at Colorado Springs, is called MoVE: Mobility Training in Haptic Virtual Environment [70]. Its purpose is to enable people who are blind to explore a model of a new environment haptically. The system is iterative: a user explores the virtual model, then explores the physical location, and repeats this process to fine-tune their understanding of the space, intuitively learning the relationship between the rendered world and the real world. MoVE uses SensAble Inc’s PHANToM force feedback device, allowing users to interact with the virtual environment by poking around with the PHANToM (shown in Fig. 4.6) and receiving force feedback when they contact objects. In a preliminary study, researchers found that users who are blind were quickly able to discriminate simple virtual objects such as spheres versus planes. While this approach is promising, its iterative nature has yet to be tested with individuals with visual impairments.

Fig. 4.6

Sensable Inc’s PHANToM Desktop, a force feedback device for haptic applications

Sharkey et al. devised a more comprehensive approach using a force feedback joystick, audio feedback, and a “guiding computer agent” to create and explore virtual environments before exploring the real counterparts they were modelled after. The force feedback encoded information about texture, objects (via force fields), and structural boundaries, while the audio component added descriptions of the scene as well as of the user’s orientation in space to aid navigation. They found that users were able to learn to navigate the virtual environment accurately and quickly, and when presented with the physical version they quickly generalized what they had learned to the real environment [71]. Later came Omero, which combined haptic and acoustic feedback with user preferences to learn the layout of new locations, similar to the Sharkey system. Researchers tested the system with people with visual impairments and received positive subjective feedback, though those with very low vision were less successful because the system made extended use of visualizations on a monitor [13].

Lahav et al. developed a similar system for cognitive mapping via a multimodal approach. They used a multisensory virtual environment (MVE), providing haptic force feedback and audio feedback about obstacles, that individuals who are blind could explore before exploring a physical environment laid out in the same way. Comparing the real-world navigation performance of users who had access to the technology with that of users who did not, the researchers found that individuals who were allowed to use the MVE developed more complete and accurate cognitive maps of the environment than those who were not given access to it [35].

A more realistic approach was designed by Tzovaras et al. in 2009: a mixed reality system for training and educating people who are blind using a virtual white cane via the CyberGrasp device. Using the virtual white cane, trainees were able to traverse a life-sized virtual replica of an environment, with the experience enhanced by realistic haptic feedback of cane collisions with virtual objects and realistic audio feedback [77]. This method is the most realistic, as users navigate the real environment almost identically to the virtual one, but it may not be the most effective for generating complete cognitive maps of an environment. A direct comparison of this mixed reality, real-scale method and the non-virtual-reality methods above would be a welcome addition to the literature, unveiling the specific advantages and disadvantages of the two approaches. Furthermore, all of these Mobility Training systems require designers to model the environments beforehand, effectively reducing the pool of available environments to a small batch. This could possibly be rectified with crowdsourcing and integration with 2D-to-3D modelling techniques.

5.1.2 Motor Learning

Motor learning is the development of motor skills, and motor learning tools are tools that aid in the development of such skills. In many motor learning settings, demonstrations make up the majority of the instruction. Visual impairments can hinder this kind of instruction, and haptic SSDs provide a valuable avenue for replacing visual instruction. Motor learning systems may also provide feedback on a user’s movement in real time, something an instructor may not be able to give. Furthermore, some users may not be receptive to touch-based feedback from an instructor and may feel more comfortable with a device’s feedback to correct motor movements. In the absence of an in-person instructor, or when an instructor does not have time to devote to a single student, an SSD that conveys motor skill information would also be useful to most users.

In 2002, Yang et al. designed a suit for VR-based motor learning that covers the torso with a vibrotactile display called POS.T. Wear. Employing a technique called “Just Follow Me” (JFM), the researchers used the POS.T. Wear to convey movement information of nearby objects to the wearer. The JFM metaphor consists of a “ghostly master” (illustrated in Fig. 4.7) that is overlaid onto the trainee’s body in the virtual environment. The master then guides the trainee by performing the correct movements for the trainee to learn. Yang et al. used JFM and the POS.T. Wear to study users’ obstacle awareness in virtual worlds and later as a motor learning tool [86].

Fig. 4.7

A visualization of the ghostly master metaphor. A trainee (solid) feels the ghost (transparent) as it moves through the trainee’s body while performing an instructional movement. (Original image from [84])

A more intuitive haptic motor learning approach, called Mapping of Vibrations to Movement (MOVeMENT), was developed by McDaniel et al. Instead of the ghostly master avatar approach of JFM, MOVeMENT seeks to map haptic stimulation to basic movements of the human body in an intuitive fashion. MOVeMENT is novel in that it is not application specific: by targeting basic movements, it can generalize to almost any complex movement and thus almost any motor learning activity. Basic movements were defined by dividing the body via the three planes that span three-dimensional space (the sagittal, frontal, and horizontal planes). The planes ground the fundamental movements: extension or flexion is movement that respectively increases or decreases a joint angle in the sagittal plane; abduction or adduction refers to movement in the frontal plane away from or towards the sagittal plane, respectively; and pronation or supination is rotation of a joint towards or away from the body within the horizontal plane. McDaniel et al. designed haptic patterns to code for these fundamental movements and used them as building blocks to describe more complex movements to a user, using a push-pull metaphor to elicit movement in a certain plane. Participants in a preliminary study found the patterns intuitive and were able to discriminate them with high accuracy [44].
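
One way to picture the push-pull metaphor is as a lookup from movement primitives to tactors placed on opposite sides of a limb segment; the tactor layout and the specific assignments below are hypothetical and are not the patterns designed by McDaniel et al.

# Hypothetical tactor layout around a limb segment: one tactor on each side.
# A "pull" cue (vibration on the side toward the desired movement) asks the
# user to move toward that tactor; a "push" cue would vibrate the opposite side.
TACTORS = {"anterior": 0, "posterior": 1, "medial": 2, "lateral": 3}

# Illustrative mapping of fundamental movements to (tactor, metaphor) cues.
MOVEMENT_CUES = {
    "flexion":   (TACTORS["anterior"],  "pull"),
    "extension": (TACTORS["posterior"], "pull"),
    "adduction": (TACTORS["medial"],    "pull"),
    "abduction": (TACTORS["lateral"],   "pull"),
}

def describe_complex_movement(primitives):
    """Compose a complex instruction as an ordered list of primitive cues."""
    return [MOVEMENT_CUES[p] for p in primitives]

# Example: cue a reach sketched as elbow extension followed by shoulder flexion.
# describe_complex_movement(["extension", "flexion"])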

5.2 Social

Social interaction is crucial to the well-being of individuals and this of course applies to people with disabilities. Unfortunately, many disabilities preclude individuals from equitable inclusion in all aspects of social activity. This can be due to practical issues or even socially constructed expectations of social interaction. Towards enriching the lives of people with disabilities by enabling a more equitable social experience, many researchers have sought to develop systems to rectify some of these inadequacies.

Researchers at Arizona State University, for example, have developed several SSD technologies for use in social situations. The “Haptic Belt” (shown in Fig. 4.8), paired with a face detection system, conveys the direction and distance of other people during a social interaction [43]. Tactile rhythm was also explored as a way to convey interpersonal distances to individuals who are blind [45]. These are coarse details of social interaction that are less accessible to people who are blind, but there are also important fine details that people who are blind miss out on, such as facial expressions. At the same lab, researchers developed the “VibroGlove”, a glove that conveys facial expressions to people who are blind [33]. A chair-based approach was also explored, showing promise for conveying facial expression information via “Facial Action Units”, a system for describing facial expressions by their structural parts [5]. This work culminated in a project called the Social Interaction Assistant (SIA), a person-centered SS system that combines an active-learning computer vision system with haptic tactors that convey information users might otherwise miss [60]. A user would wear a camera similar to the one shown in Fig. 4.9a and receive haptic feedback, derived from the camera’s view, through devices such as the VibroGlove and the Haptic Belt (Fig. 4.8).

Fig. 4.8

Haptic Belt developed at the CUbiC Lab at Arizona State University [67]. The belt was designed to be modular and can be extended to fit more or fewer tactors connected in series. The location of the tactors can also be modified by simply sliding them along the belt

Fig. 4.9

(a) Mannequin wearing sunglasses mounted with a pinhole digital camera (b) close-up of pinhole camera

5.3 Electronic Travel Aids (ETAs)

Mobility is a crucial component of independence, agency, and wellness. Vision disabilities account for a large portion of mobility issues, and this is no surprise because navigation itself is a complicated process requiring visual integration over time and space and a strong dependence on memory. Researchers have determined that efficiently storing and recalling the relationships of landmarks in space is essential to spatial cognition, and thus navigation [55], and because vision provides a method for establishing landmarks in 3D space, it can be inferred that vision is heavily relied upon for navigation [17]. For this reason, a large number of SSDs have been developed to aid those with navigation difficulties. The most popular Sensory Substitution device for mobility is the “white cane”, shown in Fig. 4.10a, b. This device transforms information that would traditionally be acquired via vision into the haptic, proprioceptive, and auditory modalities. With the white cane, users scan the ground in front of them in sweeping motions in order to detect obstacles in their path by colliding with them. Users can often infer not just the existence of an obstacle but also some of its properties via the tactile effects felt on contact as well as the sound emanating from the collision.

Fig. 4.10

(a) PhD student Bryan Duarte navigating with white cane (b) close-up of white cane

There are, though, drawbacks to the traditional white cane, such as the limited range at which users can detect obstacles. White canes typically have a range of 1.5 m in front of the user. A user must also collide with an object in order to detect it, which can be troublesome if the object is a person, a dog, or something fragile. Users can also miss obstacles with the cane due to gaps in their sweeping pattern. Finally, white canes can only detect obstacles at or below waist level, leaving the user vulnerable to obstacles like overhanging tree branches [66]. Researchers have naturally sought to improve upon the white cane to remedy some of these issues.

One of the earliest efforts to augment the white cane, begun in the mid-1940s, culminated in the “Laser Cane”. This device augmented a traditional cane with three gallium arsenide infrared laser rangefinders to detect obstacles and dropoffs at different distances. It was capable of detecting obstacles at several different angles, including an angle pointing upwards from the handle of the cane, so that users could detect obstacles above their waist and avoid tree branches. Haptic and (optionally) audio feedback was delivered to the user based on the level and distance of a detected obstacle. The device was developed with continuous feedback from travelers who are blind and was finished in 1974 [6]. While such a cane was novel, the laser and battery technology of the period restricted usage to a mere three hours per charge. The Laser Cane was one of the first attempts to give users information about obstacles before a collision, but it did so in a very coarse way, giving little information in the way of bearing (angle with respect to the direction of travel).

A method for detecting the bearing of obstacles was developed in 2002 by Dr. Roman Kuc. The device uses two sonar rangefinders whose measurements are combined to infer the bearing of detected obstacles; wrist-worn vibration motors then vibrate according to the obstacle’s bearing, giving the user both distance and direction information [34]. Several other “smart” canes have been developed. Researchers at the Indian Institute of Technology performed a study and found that their ultrasonic “Smart Cane” increased obstacle awareness, decreased collision prevalence, and increased mean detection distance compared to traditional white canes in a navigation task [20]. Similar attempts at building smart canes are prevalent [2, 47] and are commonly variants of each other, but [24] takes the most elaborate approach, whereby the cane is equipped with wheels and “drives” the user around. The device introduces modes such as “goal finding”, in which the device navigates for the user, providing turn-by-turn directions. This device, though, has not been validated by a user study.
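
The geometry behind Kuc’s two-rangefinder approach can be sketched as a generic two-receiver triangulation (not necessarily the algorithm used in the device); the 0.1 m sensor baseline is an assumption.

import math

def object_bearing(r1, r2, baseline=0.1):
    """Estimate bearing (radians, 0 = straight ahead) of an object from two
    range measurements taken by sensors separated by `baseline` meters.

    Sensors are placed at (-baseline/2, 0) and (+baseline/2, 0); the object
    lies at the intersection of the two range circles in front of the user.
    Returns None if the ranges are geometrically inconsistent.
    """
    x = (r1 ** 2 - r2 ** 2) / (2 * baseline)
    y_sq = r1 ** 2 - (x + baseline / 2) ** 2
    if y_sq <= 0:
        return None
    y = math.sqrt(y_sq)
    return math.atan2(x, y)  # positive = to the right of straight ahead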

A more nuanced approach is the caneless ETA, which removes altogether the need for a white cane. One such configuration is the “Haptic Radar”, a self-contained, head-worn headband augmented with sensors that detect obstacles and intuitively convey them to the wearer via haptics. Each sensor in the array conveys obstacle distance information along a path emanating from the sensor (several sensors circle the head) [64]. Researchers found that participants in a navigation task navigated more confidently with the Haptic Radar than without it [10]. Caneless systems may be advantageous in that they may reduce the stigma induced by the iconic white cane. With GPS becoming ubiquitous, turn-by-turn directions have become life-changing for those needing directional and situational assistance. Most turn-by-turn directions are conveyed using the device’s screen and are often accompanied by audio, but haptic solutions may offer a better alternative for conveying this information.

5.4 Virtual

Virtual worlds are a rich part of the modern experience. Whether it be games, simulations, or educational environments, virtual worlds are becoming commonplace with the advent of consumer VR and widespread gaming hardware. One of the issues is that most virtual environments are developed with vision as the primary interaction modality, effectively excluding many individuals from participation. While some non-visual video games exist, they are few and far between and almost always rely solely on audio feedback. A few examples of modern video games accessible without vision are FEER, an “endless runner” game [51, 63]; Timecrest: The Door, a story-based game with multiple endings and dynamic storylines [3, 14]; and A Blind Legend, a first-person fighting game for both PC and Android [15]. While a handful of games can be played with audio only, the majority of video games and virtual environments remain inaccessible to individuals who are blind. Haptic implementations may provide solutions to this problem.

Developers in the Haptics Laboratory at McGill University developed, in 2006, a game of “Memory” using the STRESS 2 tactile display [81], a more ergonomic version of the original STRESS 1D haptic display [37]. Instead of images or text to memorize, the “cards” consisted of unique haptic patterns, making for an interesting spin on the classic game of Memory. Likewise, researchers at Arizona State University designed a 2D spatial game based around the Low Resolution Haptic Display (LRHD), a chair affixed with a 4 × 4 array of vibrotactile motors. The point of the game was to find a goal in a 2D top-down environment. The user’s position and the goal were displayed on the haptic chair using unique vibration patterns, and the user could move in the environment using a computer mouse to find the goal. A study using the game found that users were able to learn how to play quickly and that their performance increased markedly as they played [19]. An image of the Low Resolution Haptic Display is shown in Fig. 4.11. In contrast to audio-only games, these are haptic-only games.

Fig. 4.11

The Low Resolution Haptic Display, a 4 × 4 array of vibration motors mounted vertically on acoustic foam for compliance and damping [19]

Several devices and systems have been developed as SSDs for virtual environments. Some of these SSDs substitute vision with touch, while others substitute virtual touch with physical touch. For example, in 1998 researchers employed a force feedback joystick called the Impulse Engine 3000 as an interface to virtual textures and objects. The researchers demonstrated a statistically significant relationship between the perceived roughness of a virtual texture and that of its physical analogue, and found that participants who were blind were more discriminating than sighted ones when using their system [12]. More complex interactions, such as discriminating the angle and identity of objects, proved more difficult with the system. Researchers found similar results in 1999 using the PHANToM force feedback device (pictured in Fig. 4.6) [29]. Again, simple textures were rendered convincingly, but the technology was not convincing for object recognition. The primary limitation of these implementations is that only a single point of contact with the “virtual world” is possible, making the interactions akin to poking around virtual space with a single fingertip.

In response to these problems, researchers proposed non-realistic haptic rendering (NRHR). They argued that realistic rendering can be too complicated to parse haptically and that non-realistic haptic rendering can make things simpler, giving researchers the chance to eliminate distracting details while emphasizing the important information [31]. To do this, they mapped 3D models onto 2D planes, which they argued were easier to navigate. The researchers also proposed a different method for guided navigation in virtual environments: a haptic guide, in which guiding forces are applied to the user through the PHANToM’s stylus [32]. Similarly, in 2012 researchers using the VTPlayer mouse developed and tested directional cues via its Braille-like cells, and participants found the cues intuitive and easy to learn [62]. This body of research implies that directional guides are useful in navigating virtual environments haptically.
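
A haptic guide of the kind described above can be as simple as a spring-like pull toward the next waypoint rendered on the force feedback device; the sketch below is a generic version of that idea, with illustrative gains, rather than the controller used in [32].

import numpy as np

def guide_force(stylus_pos, target_pos, stiffness=80.0, max_force=3.0):
    """Spring-like guiding force pulling the haptic stylus toward a target.

    force = stiffness * (target - stylus), clamped to max_force newtons so
    the guidance stays gentle. Gains here are illustrative assumptions.
    """
    force = stiffness * (np.asarray(target_pos) - np.asarray(stylus_pos))
    magnitude = np.linalg.norm(force)
    if magnitude > max_force:
        force = force * (max_force / magnitude)
    return force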

Towards navigating virtual environments “naturally”, in 2013 researchers developed the Virtual EyeCane. The virtual cane gives users an auditory signal with respect to the closest object the cane is pointed at in the virtual world [41], making this system a Virtual Electronic Travel Aid (VETA), similar to the first Laser Cane but unhindered by the limitations of rangefinding in the physical world. A more comprehensive approach was taken by Zhao et al. in 2018 with the development of the “Canetroller”, a virtual cane that gives realistic auditory and haptic feedback in the virtual world so that people who are blind can translate their cane skills to VR. The Canetroller realistically simulates cane forces, impact vibrations, and impact sounds [87]. Besides the EyeCane and the Canetroller, there have not been any significant attempts to make virtual worlds accessible to people with visual impairments on an equal footing, in essence to take a visual world and present it using an SSD such that users can interact with it in much the same way as their sighted counterparts. Virtual worlds by their very nature provide mechanisms for making themselves accessible, as object detection and semantic segmentation are far less complicated in those environments.

6 Conclusion: Future Trends and Trajectory

Some standout implementations of Haptic Sensory Substitution are Bach-y-Rita’s TVSS, the BrainPort, and Eagleman and Novich’s VEST. They show the raw representational power of the modality, but they also reveal some limitations. For the TVSS, long training hours, a chair-based design with many actuators, and a lack of fine detail hinder its use in real-world applications. While the BrainPort somewhat addresses the portability and detail issues, it still suffers from the practical concern of requiring the display to be placed on the user’s tongue. For auditory substitution, the VEST is impressive in its ability to convey speech, but other, more subtle aspects of hearing are still missing, such as localization via stereo hearing. Further strides in the realm of Haptic Sensory Substitution are most likely to arise from integrating emerging signal processing tools with clever delivery techniques.

In the realm of vision-to-haptic SS, strides in Computer Vision show promise for enabling more effective Sensory Substitution. For example, object detection has made great advances, as has depth estimation from monocular images. Having access to both depth and object identities from monocular images could drastically improve ETAs by allowing devices that rely on depth information to use only a camera instead of lasers, sonar, infrared, or stereo cameras. Figure 4.12b illustrates the impressive performance of emerging depth estimation models (MegaDepth). The neural-network methods underlying the image understanding applications from Sect. 4.2 also show great promise in augmenting haptic SSD technology. Combining these powerful models with a proprioceptive and tactile interface would likely lead to more effective and meaningful image understanding tools that can be used in the physical world and, even more so, in virtual environments, possibly making visual virtual worlds sufficiently accessible to people with visual impairments. Automated or semi-automated methods of architecture modeling [39] also show promise for alleviating the manual design requirements of mobility training systems. Combined with systems such as the Canetroller and with crowdsourcing, such methods could make familiarization with novel environments from the safety of one’s home practical and accessible to people who are blind, with applications for education in the form of virtual field trips.

Fig. 4.12

(a) Original image of an office (b) depth image from model trained on the MegaDepth depth dataset [38]
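
As a sketch of how monocular depth estimates might feed a haptic ETA of the kind discussed above, the example below splits a depth map into vertical sectors and maps the nearest depth in each sector to the intensity of one tactor on a belt; the sector count and distance limits are illustrative assumptions.

import numpy as np

def depth_to_belt_intensities(depth_m, n_tactors=8, min_m=0.5, max_m=4.0):
    """Map a per-pixel depth image (meters) to intensities for a tactor belt.

    The image is split into n_tactors vertical sectors; the nearest depth in
    each sector sets that tactor's intensity (1.0 at min_m or closer, 0.0 at
    max_m or farther). Distance limits are illustrative assumptions.
    """
    sectors = np.array_split(depth_m, n_tactors, axis=1)
    nearest = np.array([s.min() for s in sectors])
    intensity = (max_m - nearest) / (max_m - min_m)
    return np.clip(intensity, 0.0, 1.0)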