1 Introduction

Worldwide, several hundred million individuals live with visual impairments, as reported by the World Health Organization (WHO Global Data on Blindness and Vision Impairment 2018). Due to partial impairment or complete absence of vision, these persons often face difficulties in their activities of daily living. To alleviate the situation, numerous assistive devices supporting access to visual information have been developed in the past. Several of these solutions are successfully employed in everyday life, such as refreshable Braille displays, screen readers, audiobooks, or smartphones with a VoiceOver/TalkBack function; commercially available examples include the ALVA Braille Controller, the Esys40 Braille display, the VoiceOver screen reader, and Audible audiobooks, to name a few.

The sense of touch and haptically-assisted interfaces play a key role in the current development of novel assistive devices. A key advantage of haptic feedback over audio feedback is that it neither blocks nor distorts the sounds coming from the environment, which represent an essential input for blind users. Numerous research projects have been carried out in recent years to advance the available technology. In this chapter, we focus on two categories of technological solutions for assisting persons with visual impairments. In the first, we address work on electronic travel aids that employ haptic feedback to support safe and independent ambulation and navigation of visually impaired users. In the second, we discuss haptically-assisted solutions that provide access to visual media on mobile devices, also aimed at this user group. In addition, we provide a review of the state of the art, covering both early and recent developments.

2 State of the Art

2.1 Early Developments of Assistive Devices

The first attempts to encode visual data into the haptic modality date back over a century. It has been reported that already in 1881 a system called the Anoculoscope was envisioned [1], which would project an image onto an 8×8 array of selenium cells; however, the system was never realized. In the 1920s, Gault described a device to transform speech into vibrational stimuli applied to the skin [2]. The method enabled subjects to distinguish colloquial sentences and certain vowels. His Teletactor, a multi-vibration unit delivering stimuli to five fingers, enabled subjects with an auditory disability to perceive speech, music, and other sounds. This is a typical example of sensory substitution, where one sensory stimulus is replaced by another – here, sound-to-tactile substitution. In the 1960s, the Optacon (Optical to Tactile Converter) was developed to enable visually impaired users to access tactile representations of black-and-white images captured by the system [3]. The device comprised a photoelectric sensor and a 24×6 array of piezoelectrically driven tactile pins. While the Optacon presented tactile images of small areas, the Tactile Vision Sensory Substitution system developed by Bach-y-Rita et al. provided a wider range of tactile images [4]. For these, solenoids were employed that provided stimuli according to the visual image of a video camera.

Next, we will give an overview of work addressing haptically-assisted solutions, both for supporting the mobility of visually impaired users and for the exploration of visual data on mobile devices.

2.2 Mobility Assistance

2.2.1 White Cane Usage and Electronic Travel Aids

The traditional white cane (also called long cane) is the primary mobility tool supporting persons with visual impairments. It typically consists of three sections: a handle, a cane tip, and a long, hollow (usually white) cylindrical shaft made of fiberglass or aluminum. Haptic properties of obstacles (stiffness, viscosity) and ground surface information (texture) are transferred from the ground via the cane tip and the hollow shaft and sensed by the hand holding the handle. Due to the limited length of the cane, only the swept surroundings within a distance of approximately 1.2 m can be detected with a traditional white cane [5]. In addition to the limited detection range, it is not possible to detect obstacles located at upper body or head level. According to the survey conducted by Manduchi and Kurniawan (2011), more than 90% of persons with visual impairments had experienced a head-level incident [6]; moreover, 23% of these resulted in medical consequences. Extending the physical length and vertical range of the traditional white cane could address this issue, but such a cane would become cumbersome and too heavy to carry as a daily mobility tool. Instead of physically extending the white cane, electronic travel aids (ETAs) have been developed, incorporating various sensing technologies to increase the sensing range. Such devices generally comprise a sensing element for obstacle detection as well as a display element to deliver information through other sensory channels. To substitute vision, auditory or haptic feedback is often adopted in ETAs to communicate sensor readings. A few decades after the white cane was introduced, the first ETA was marketed in the 1960s [7]. Since then, assorted ETAs with different sensing and display components have been developed and commercialized.

2.2.2 Sensing Elements in ETAs

The most frequently employed sensors in ETAs rely on ultrasound signals – a low-cost and lightweight solution providing robust obstacle detection. These sensors emit ultrasonic waves and measure the time until the echo pulse returns to the receiver, yielding the distance to the reflecting surface. The beam angle of the sensor is about 50–60°, and usually the closest object within the workspace is detected [8]. Due to this measurement principle, obstacle detection can fail when the surface of an obstacle is angled with respect to the sensor [9]. In contrast, infrared (IR) sensors have a relatively narrow beam angle. IR sensors measure the distance to an object based on the magnitude of reflection of the emitted IR light [10,11,12]. A key disadvantage of IR sensors is their difficulty in detecting transparent objects such as glass doors. Still, these two approaches are the most widely used sensing technologies for ETAs. Other options to obtain proximity information are video cameras combined with image processing techniques [13], laser range finders [14], depth detection systems such as the Kinect [15], LIDAR sensors [16], and time-of-flight (ToF) cameras [17, 18].
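To illustrate the measurement principle, the following minimal Python sketch converts an echo delay into a distance estimate; the assumed speed of sound (about 343 m/s in air at room temperature) and the numerical example are illustrative and not taken from any of the cited systems:

# Minimal sketch (illustrative, not from the cited systems): converting an
# ultrasonic echo delay into a distance estimate.
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at 20 °C

def echo_delay_to_distance(echo_delay_s: float) -> float:
    # The pulse travels to the obstacle and back, so the one-way
    # distance is half of the round-trip path.
    return SPEED_OF_SOUND_M_S * echo_delay_s / 2.0

# Example: an echo returning after 7 ms corresponds to roughly 1.2 m.
print(f"{echo_delay_to_distance(0.007):.2f} m")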

2.2.3 Display Elements

Among the five main human senses, hearing and touch are typically employed in ETAs as the display media to compensate for sight loss. Auditory feedback ranges from simple ON/OFF binary beeps to speech output announcing obstacles or providing directions for navigation. Varying the frequency and/or amplitude of auditory feedback increases the information transfer (e.g. encoding distances to obstacles) [19, 20]. A larger number of systems have been developed utilizing the sense of touch, comprising various configurations and display methods. Unlike auditory feedback, haptic stimuli (especially tactile cues) can be displayed to diverse locations on a user’s body. Wearable haptic displays in ETAs have been developed for use on the forehead, forearm, foot, and/or torso [21]. In addition, light-weight and small-sized sensors and actuators have also been integrated into handheld ETAs (e.g. the Miniguide system). Alternatively, some ETAs are combined with a traditional white cane, to maintain the basic functionality of the latter while also providing additional information to the user [12, 22]. The majority of ETAs (i.e. commercial devices as well as research prototypes) employ small-sized coin-type or cylinder-type vibration motors. As an alternative, electrotactile displays present electric currents to body parts, such as the forehead, hands, or tongue, via electrodes [21]. Moreover, Gallo et al. [12] developed a mechanical shock generation mechanism, combining a spinning wheel and a solenoid brake, to mimic the haptic sensation of a traditional white cane hitting an obstacle. Recently, Spiers and Dollar [23] introduced a handheld shape display that provides two degrees of freedom of haptic sensation and tested the device with both visually impaired and sighted users [24]. Extension and rotation of a physically-held prop represent navigation commands (i.e. direction and distance to a target). Related to this, in the haptic torch system, the rotation of a bump fixed on a rotational disk indicates the distance to an obstacle [25]. Finally, pin displays driven by piezoelectric ultrasonic actuators or by tendons connected to a small DC motor have been outlined in [26, 27]; however, these applications were not specific to ETAs.

2.2.4 Feedback Coding Methods

ETAs mostly support either general navigation or obstacle avoidance, and only occasionally provide both functionalities. In the case of an electronic navigation (or orientation) aid, directional commands for reaching a destination are presented. In contrast, an electronic mobility device provides feedback about the area in front of a cane user [10, 22]. Details about the distance and height of an obstacle can be delivered via auditory and haptic feedback interfaces. Analogous to a car parking system, the Teletact device generates 28 tones (ranging from 131 to 1974 Hz), each representing a different distance interval (ranging from 12–16 to 2000–3000 cm) [19]. ETAs with a single vibration motor often vary the intensity level according to the distance to an obstacle (e.g. the iGlasses™ Ultrasonic Mobility Aid). Gallo et al. [12] employed the tactile apparent movement phenomenon, changing the velocity of a tactile flow across three vibration motors according to the distance to an obstacle. Multiple vibration motors have also been used to indicate discrete distance levels as well as the vertical location of obstacles (e.g. detecting obstacles at head or knee level) [10]. Finally, recent work investigated optimal vibrotactile coding methods for conveying several sensing distances [28].

2.3 Haptically-Assisted Presentation of Visual Data

2.3.1 Display Using Kinesthetic Feedback

An attempt to present visual data to persons with visual impairments using a peripheral haptic device was described by Moustakas et al. [29]. They generated force fields that permitted users to explore images through a force feedback device. The system could process 3D maps as well as conventional 2D maps. The calculated force fields could be displayed using various off-the-shelf haptic devices, such as the Phantom or the CyberGrasp. The system was not portable, however, since visual and haptic rendering was processed on a desktop computer. Similarly, Sjöström and Jönsson [30] developed a system to render 2D functional graphs employing the Phantom haptic device. The functions were haptically presented as a groove or ridge on a flat surface. Fritz and Barner [31] also described a setup to display different forms of lines and surfaces to blind users through the Phantom device. They employed the notion of virtual fixtures to let a user trace lines in 3D space.

2.3.2 Display Using Tactile or Mixed Feedback

Raza et al. outlined ideas to convey abstract graphical and non-textual content, such as bar and pie charts, via tactile feedback to blind users in an early draft [32]. An electrostatic friction display was employed to present individual colors in an image as perceptually distinct tactile stimuli. Memeo et al. reported on a haptic system that enabled blind users to navigate between two locations in virtual environments [33]. They combined two haptic devices to transmit allocentric and egocentric data independently: the allocentric perspective was displayed to the fingerpad through a 3-DoF tactile mouse [34], while directional cues were rendered via a vibrotactile head-mounted display. In further work, Sinclair et al. introduced a 1-DoF mechanism actuated in the (vertical) Z-direction [35]. It was combined with vibrotactile actuators acting in the (horizontal) XY-plane. The combined stimuli enhanced the haptic feedback; however, no rotational movement was included.

In the developments described below, we made use of the Tactile Pattern Display (TPaD) device, provided in the scope of an open-source project launched at Northwestern University, as described by Winfield et al. [36]. Tactile feedback is delivered to a user through varying friction on a flat, smooth surface, achieved by vibrating the latter. In the initial version, the variable friction is applied to the whole screen by actuating piezo elements attached to a glass plate. Using this interface, various textures can be rendered. A later version has been outlined in [37]. In general, the concept of friction-altering surfaces is based on the work of Salbu and Wiesendanger, who both investigated the so-called air squeeze-film effect [38, 39].

3 Haptic Feedback in Electronic Travel Aids

Among our common daily activities, an important one is ambulation, which can be challenging for visually impaired persons. To date, the traditional white cane is the most widely used mobility tool for this group, while a smaller number relies on the assistance of a guide dog (about 10% vs. 0.5%, respectively) [40]. Although various ETAs have been proposed in the past decades, they have not yet been adopted by the majority of visually impaired users. It can be argued that the drawbacks and limitations of ETAs still outweigh their advantages. To increase acceptance, the following characteristics should be considered when designing an ETA.

  • Maintaining key functionality: Most pedestrians with visual impairments have been trained in orientation and mobility using a conventional white cane. It is thus beneficial to maintain the key functionality of the instrument: detecting obstacles and/or drop-offs ahead, detecting and following tactile ground surface indicators, delivering general texture cues of the ground surface to the handle, as well as indicating the impairment to the surroundings. Since pedestrians with visual impairments are accustomed to walking with a white cane, maintaining these features would ensure the same level of safety while reducing training time with a new interface.

  • Ease of use: The straightforward way in which a conventional white cane supports safe and independent ambulation is likely the most important reason for its success and acceptance by users. ETAs could potentially provide various new functions, such as vicinity information via several sensing and display elements. However, this may render the system too complicated, especially when copious information is delivered continuously, possibly confusing users. Such information overload may decrease the overall efficiency of walking (e.g. slower walking speed, discontinuity in walking, etc.); therefore, mechanisms requiring overly complicated manipulation during walking should be avoided.

  • Efficient enhancement of original functionality: ETAs should complement the standard functionality of a white cane. Any additionally provided information would presumably be displayed via the auditory or haptic sensory channel. Such augmented and enhanced feedback should thus be easily perceivable, and not hinder the basic functionality of the white cane.

With these recommendations in mind, one option to reduce the cognitive workload when providing detailed information about obstacles via an ETA is to switch between two modes – a sweeping mode and a scanning/pointing mode. This has been explored, for instance, for displaying different levels of obstacle information in [10, 12, 18]. This strategy not only permits nearly seamless walking but also delivers detailed information only when the cane user requests it. Below, we will first characterize the use of a white cane in sweeping mode and then outline several studies carried out with visually impaired persons using a newly developed ETA.

3.1 Sweeping Mode Characteristics

The mentioned sweeping mode refers to the standard mobility technique, in which a traditional white cane is swept from side to side. In normal white cane usage, physical contact between the cane and an obstacle is mediated by the cane shaft. In an ETA, by contrast, a cane virtually extended through proximity sensors generates information about possible collisions via auditory or haptic feedback. Following the previous discussion, it is recommended to present only simple feedback during sweeping, e.g. a binary obstacle notification or the distance to the nearest obstacle. When an obstacle is detected during the sweeping mode, a white cane user could switch manually to the scanning mode, e.g. by pressing a button. However, it would also be possible to effect this switch automatically, based on monitoring the angular velocity or sweep angles of the ETA. To this end, we have obtained data on normal usage of a white cane in several studies.

We measured 3-DoF angular velocity and acceleration using an inertial measurement unit (IMU) [41] attached to a white cane shaft. In addition, video data synchronized with the above were obtained via a small (50 × 20 × 29 mm) portable video camera (TrendMicro, BRAUN) fixed to the shaft. These data were acquired while a white cane user was walking with the sensorized cane. The obtained data enable us to estimate the ambulation center line as well as the maxima of the sweeping motion. As depicted in Fig. 2.1, both acceleration and angular velocity vary from the start of the motion and are periodic in all dimensions. Based on these raw data, the angular velocity around the Z-axis (i.e. the yaw rate) could be used to determine the times of switching between the two modes. The angular rate of the yaw angle of the cane is shown separately in Fig. 2.2. Note that the highest and lowest sweeping angles reached correspond to angular rates of about 0°/s, whereas the angular rate maxima and minima are observed at the sweep center line. As mentioned above, cane users could switch to the scanning mode in case they are interested in obtaining further information about an obstacle, e.g. its shape, height, color, etc. This switch could be done manually or activated, for example, by a reduction in the angular rate.
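A minimal sketch of how such an automatic switch could be derived from the yaw rate is given below; the threshold, the dwell window, and the 100 Hz sampling rate are assumptions for illustration, not parameters of the described setup:

# Sketch of an automatic sweeping-to-scanning mode switch based on the yaw
# angular rate of the cane. Threshold and dwell time are illustrative
# assumptions, not values from the described system.
from collections import deque

YAW_RATE_THRESHOLD_DEG_S = 10.0   # assumed "cane nearly at rest" threshold
DWELL_SAMPLES = 50                # assumed 0.5 s dwell at a 100 Hz IMU rate

class ModeSwitcher:
    def __init__(self):
        self.recent = deque(maxlen=DWELL_SAMPLES)
        self.mode = "sweeping"

    def update(self, yaw_rate_deg_s: float) -> str:
        # Feed one IMU sample; switch to scanning mode once the yaw rate
        # has stayed below the threshold for the whole dwell window.
        self.recent.append(abs(yaw_rate_deg_s))
        if (len(self.recent) == self.recent.maxlen
                and max(self.recent) < YAW_RATE_THRESHOLD_DEG_S):
            self.mode = "scanning"
        else:
            self.mode = "sweeping"
        return self.mode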

Fig. 2.1
figure 1

3-axis acceleration and angular velocity while sweeping a white cane

Fig. 2.2
figure 2

Angular velocity of the white cane about the Z-axis (yaw angle) when using the sweeping technique

In the following, three user studies will be outlined, performed together with visually impaired white cane users. The goal of the experiments was to examine the usability and effectiveness of a new electronic travel aid developed in our research work. We will also examine how the systems align with the proposed ETA guidelines.

All experiments were carried out with the Eyecane ETA [18]. It comprises a ToF sensor to detect the distance and further depth properties of obstacles. As a feedback interface, four eccentric rotating mass vibration motors (ERMs) were mounted in the handle, as shown in Fig. 2.3. Each ERM is enclosed by a rubber ring to reduce the transfer of vibrations to the whole handle. This permits presenting localized vibration feedback to individual fingers. The outer portions of the rubber rings are touched by the index, middle, ring, and little finger, respectively. Finally, the shaft of a conventional white cane is attached to the enhanced handle.

Fig. 2.3
figure 3

Eyecane handle (left) and vibrotactile actuators mounted into the cane handle (right). Each ERM is placed in the middle of a rubber ring (top right). A portion of the rings protrudes outward, where the fingers contact it on the surface (bottom right)

3.2 Ground Surface Identification

The conventional white cane plays a crucial role in identifying ground surfaces or structures ahead of a cane user [42,43,44]. Such surfaces or structures could, for instance, be a curb, uneven pavement, a drop-off, tactile surface indicators, slippery floor, or grass. Tactile paving, also called tactile ground surface indicators (TGSI) or tactile walking surface indicators (TWSI), are installed on pedestrian walkways and passages to assist persons with visual impairments in safe ambulation [42]. Although the appearance of TGSI differs between countries, usually two main types exist: directional TGSI assist in wayfinding, by indicating a path or direction, while warning TGSI, also called attention TGSI, inform and warn about hazards ahead. Any additional haptic feedback displayed by an ETA should not interfere with detecting and discriminating such different ground surfaces. Therefore, a user study with the newly developed ETA was conducted.

Seven panels of ground surface samples, 570 × 570 mm in size, were prepared, equipped with caster wheels for easy exchange between trials. As depicted in Fig. 2.4a–g, the selected ground surfaces were: a sheet of polyvinyl chloride (PVC), wood, artificial turf, tiles, cobblestone paving, directional TGSI, and warning TGSI (made of asphalt, embossed with 4 mm tactile paving lines). The experimental task was to identify the sample panels while the Eyecane was displaying a randomly selected vibrotactile signal. For the latter, signals designed to indicate the distance to an obstacle were used, as developed in [28]. Nine legally blind white cane users (seven male, two female) participated in the study. Participants sat comfortably on a chair and wore headphones playing white noise. In addition, two legally blind participants who still had residual vision were asked to wear a blindfold to mask any visual input from the ground surfaces. Each ground surface was presented 20 times, in a pseudo-randomized order. Participants were asked to sweep the ground surface with the Eyecane and to discriminate between the given surfaces, as illustrated in Fig. 2.4h. A confusion matrix compiling the results of the experiment is provided in Table 2.1. As can be seen, the mean correct identification rate was 99.3%. False responses were found mainly in two cases – discriminating between the PVC sheet and the wooden panel, as well as between tiles and cobblestone paving. Nevertheless, incorrect identification rates varied only between 0.1% and 1.6%.

Table 2.1 Confusion matrix of surface sample identification task
Fig. 2.4
figure 4

Seven ground surfaces for the identification test. (a) Polyvinyl chloride sheet, (b) wood, (c) artificial turf, (d) tiles, (e) cobblestone paving, (f) directional TGSI, (g) warning TGSI, (h) experiment setup for the ground surface identification

Following the perceptual study, additional quantitative measurements were carried out to characterize the tactile signals obtained from the surface textures. To this end, a 3-axis accelerometer (ADXL335, Analog Devices) was mounted at the middle of the cane shaft. It was connected to a data acquisition device (NI-USB 6008, National Instruments), and acceleration data were obtained via a LabVIEW interface at 1000 Hz. The same subjects as before participated in the measurements; neither a blindfold nor headphones were used for this study. All subjects were asked to stand in front of a ground surface panel and then walk in place while sweeping the Eyecane across the surface sample, as they would normally employ their white cane (see Fig. 2.5). Measurements for each ground surface lasted about 10 s. Figure 2.6 depicts box plots of peak-to-peak acceleration amplitudes obtained from the nine participants when sweeping over the seven ground surfaces. Smooth ground textures, such as the PVC sheet and the wooden panel, show lower amplitudes. In contrast, both TGSI types yielded higher acceleration amplitudes. Figure 2.7 depicts the normalized acceleration for two example surfaces. As can be seen, the amplitude profiles differ noticeably between a smooth and a rough surface.
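A short Python sketch of the peak-to-peak computation underlying Fig. 2.6 is given below; the array layout and the synthetic example data are assumptions for illustration:

# Sketch of the peak-to-peak amplitude computation used to compare surface
# textures, assuming the 1000 Hz acceleration samples are stored as a
# NumPy array of shape (n_samples, 3).
import numpy as np

def peak_to_peak_per_axis(acc: np.ndarray) -> np.ndarray:
    # Peak-to-peak amplitude of each axis over the whole recording.
    return acc.max(axis=0) - acc.min(axis=0)

# Synthetic stand-in for a 10 s recording (10,000 samples at 1000 Hz).
rng = np.random.default_rng(0)
fake_recording = rng.normal(0.0, 0.2, size=(10_000, 3))
print(peak_to_peak_per_axis(fake_recording))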

Fig. 2.5
figure 5

Surface texture measurement setup

Fig. 2.6
figure 6

Peak-to-peak amplitudes of acceleration for the seven ground surfaces

Fig. 2.7
figure 7

Normalized acceleration when sweeping the cane

Finally, the propagation of vibrations generated by the ERMs to the housing was also studied. For this, a small-sized accelerometer (ADXL325, Analog Devices) was attached to the rubber ring enclosing the vibration motor, at the point where the fingers would be in contact on the handle surface. In addition, another small-sized accelerometer was attached to the surface of the handle housing. This setup permits validating the proposed design, which uses the rubber rings as dampers. Vibrotactile feedback was rendered for four distance levels, as described in [28]. Figure 2.8 visualizes the vibration amplitudes measured with the sensors. Note that Level 1 indicates the smallest distance to an obstacle and Level 4 the largest. As can be seen, the vibration amplitude decreases by about an order of magnitude from the rubber ring to the handle housing.

Fig. 2.8
figure 8

Vibration amplitude measurements on the cane handle and the vibration rubber ring for two different distance levels

3.3 Detection of Tactile Distance Signals During Sweeping

In previous work, we compared three different modes of rendering tactile signals in ETAs, each encoding a distance range to obstacles [28]. In the first rendering method, signals are generated using only temporal variation between individual stimuli. In the second, both temporal and spatial variation are employed. The third method comprises spatial, temporal, and intensity variation. We found that the first rendering technique resulted in lower identification rates compared to the other two. In addition, we also examined grip types. From the data, we could conclude that a four-finger grip configuration provided significantly higher accuracy in detecting distance levels than a single-finger grip (i.e. a single finger spanning all four vibration actuators).

Nevertheless, this prior study was carried out in a static configuration – that is, participants sat on a chair and the cane tip contacted a wooden surface statically, without any motion. We have now extended this experiment by examining the effectiveness of the proposed tactile rendering methods in a more practical setting. Seven legally blind white cane users (three female, four male) took part in this study. As suggested in [28], the four-finger grip configuration was utilized; thus, participants placed four fingers on the four individual vibrating rubber rings, as depicted above in Fig. 2.3. In the employed tactile rendering method, the actuators displayed vibration patterns consecutively. Since the direction of tactile flow had shown no influence on identification accuracy in our prior work, we employed an outward direction of the tactile flow. In contrast to the previously examined static situation, participants now stood up, walked in place, and swept the Eyecane across tactile surface samples. In this study, three different ground surfaces were employed: wooden floor, cobblestone paving, and directional TGSI. The surface samples were selected in a pseudo-random order and presented to each subject, with 240 trials in total. As soon as a subject verbally reported a distance level, as displayed by the tactile pattern, the experimenter recorded the answer and then initiated the next stimulus trial. Subjects were allowed to take breaks in between, as desired.

The mean correct identification rate for all signals on all surface samples was 99.12%. For the wooden floor, the cobblestone paving, and the directional TGSI, it was 99.1%, 99.1%, and 99.17%, respectively. Incorrect responses were given in only 14 out of 1680 trials; these occurred only for distance levels 2 and 3, for which only the middle and the ring finger were stimulated.

3.4 Detection Accuracy of Distances Encoded by Vibration

In a final study, we examined how well depth ranges displayed as vibrations could be identified by users. For this, a ToF sensor with a resolution of 176 × 144 pixels (SR4000, MESA Imaging) was employed. The depth camera was held by the experimenter, and depth images of obstacle scenes were captured in a static situation. The modulation frequency was set to 15 MHz; the maximum distance at which obstacles could be detected was 10 m. Initially, the depth information about the environment is given relative to the location of the camera. Based on the angle of the camera, the horizontal distance of any visible obstacle in the depth image can be calculated. After resampling, a pixelized depth image of the obstacles ahead is obtained, in which correct horizontal and vertical Cartesian distances are stored.

For the display in the following study, the overall covered vertical height in the depth image is subdivided equidistantly into separate levels – using a total of either four or eight levels. In the former case, from top to bottom (according to human body sections), we consider the head level, the body level, the leg level, and a drop-off. In the latter case of eight sections, each of these four levels is subdivided further into two subparts – an upper and a lower section. The general idea of the vertical subdivision of height into levels is illustrated in Fig. 2.9. Note that the bottom drop-off level is subdivided into a shallow and a deep drop-off; the former denoting a vertical distance of less than 20 cm, and the latter a distance equal to or greater than 20 cm. The former could be a curb or a step of a descending stair, while the latter may be a deeper hole in the ground or descending stairs. Regarding the horizontal distance, the depth pixel with the smallest distance within the respective height level interval is employed. Note that in order to reduce processing time, as well as the cognitive workload of a user receiving the corresponding signals, only a narrow, 10-pixel-wide column around the vertical centerline of the depth image is considered. As an example, in Fig. 2.9 the obtained horizontal distances to the steps of a stair are given in meters.

Fig. 2.9
figure 9

Four/eight height levels of obstacle depth information. Per level, the horizontal distance between the closest point on the obstacle and the frontal plane of the user is specified
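The extraction of one horizontal distance per height level can be summarized by the following Python sketch; array shapes, the handling of missing returns, and the exact column placement are illustrative assumptions:

# Sketch of extracting one horizontal distance per height level from a depth
# image: only a narrow column around the vertical centerline is considered,
# and the smallest distance within each level is kept.
import numpy as np

def closest_distance_per_level(depth_m: np.ndarray,
                               n_levels: int = 4,
                               column_width_px: int = 10) -> np.ndarray:
    # depth_m: 2D array of horizontal distances in meters (NaN where no
    # return); result: n_levels values, top (head) to bottom (drop-off).
    h, w = depth_m.shape
    c0 = w // 2 - column_width_px // 2
    column = depth_m[:, c0:c0 + column_width_px]
    bounds = np.linspace(0, h, n_levels + 1, dtype=int)
    levels = []
    for i in range(n_levels):
        band = column[bounds[i]:bounds[i + 1]]
        levels.append(np.nan if np.all(np.isnan(band)) else np.nanmin(band))
    return np.array(levels)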

For our user study, depth images of fifteen different environments with obstacles were obtained. Figure 2.10 depicts three examples of the acquired depth data of such scenes. The depth is color-coded; also note the two vertical lines in the center indicating the source region for the depth values. Normal photographs of the surroundings are also provided. Any vibrotactile distance signals would be based on the visible depth information. Assuming four levels, in the first example (sequential) signals would be generated for the head, body, and leg levels, respectively. Since the distances per level decrease, the signal intensities would increase. There would be no signal for the drop-off level, since no vertical downward step is present. The second and third examples are typical of scenes where obstacles at body or head level may not be detectable by a conventional white cane. Vibratory feedback in the second example would be at the body level only. In the third example, a stronger intensity would be experienced at the head level, and lighter feedback at the leg level.

Fig. 2.10
figure 10

Example pictures of obstacles and depth profiles. Top: depth information from the time-of-flight camera; bottom: color images from a charge-coupled device camera

A study was carried out with the described system, examining how well users could perceive the depth details of obstacles encoded via the vibrotactile display of the Eyecane. Five visually impaired subjects (one female, four male) participated in the study. Depth was covered in a range from 0.8 to 5 m, subdivided equally into 8 depth intervals of 0.525 m each. Eight different vibration intensities were employed to encode these eight depth intervals. The previously described ERMs were activated with pulse-width modulation, using linearly decreasing duty cycles, ranging from 100% for the shortest distance to 20% for the longest distance. Note that distance values larger than 5 m were not displayed.
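A sketch of this distance-to-intensity mapping is given below; the exact duty cycle values of the intermediate intervals are assumed to be evenly spaced, which is our interpretation of the linear decrease:

# Sketch of the described mapping: 8 equal depth intervals between 0.8 m and
# 5 m, with PWM duty cycles decreasing linearly from 100% (closest interval)
# to 20% (farthest); distances beyond 5 m are not displayed.
import numpy as np

D_MIN, D_MAX, N_INTERVALS = 0.8, 5.0, 8
DUTY_CYCLES = np.linspace(100.0, 20.0, N_INTERVALS)  # percent, per interval

def distance_to_duty_cycle(distance_m: float) -> float:
    if distance_m > D_MAX:
        return 0.0                        # out of range: no signal
    width = (D_MAX - D_MIN) / N_INTERVALS
    idx = int((min(max(distance_m, D_MIN), D_MAX) - D_MIN) // width)
    idx = min(idx, N_INTERVALS - 1)       # clamp the distance == 5 m case
    return float(DUTY_CYCLES[idx])

print(distance_to_duty_cycle(1.0), distance_to_duty_cycle(4.9))  # 100.0 20.0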

Participants were asked to hold the handle of the Eyecane with the four-finger grip. In the case of four height levels, each vibration motor maps exactly to one level; that is, the index finger maps to the head level and the little finger to the drop-off. In the case of eight levels, the first four sublevels (head and body) are displayed to the four fingers first, followed by a second set of signals mapping the remaining four sublevels (leg and drop-off) to the fingers. That is, in this case, each finger receives two consecutive vibratory signals.

Each ERM is activated for 250 ms, with an inter-stimulus interval of 150 ms. Thus, the total duration of each tactile depth rendering is 1450 ms for the four-level and 3050 ms for the eight-level case, respectively. Moreover, if no obstacle is detected, no signal is displayed during the respective time interval. In each trial, the set of cues indicating the depth of a pseudo-randomly selected scene (out of fifteen) was presented to participants three times consecutively. An auditory beep marked the start of each trial. Participants were then asked to verbally state the perceived distance ranges – either four or eight, depending on the condition. In total, 30 such trials for the four-level and 15 trials for the eight-level configuration were presented to each subject. The verbal responses were recorded by the experimenter.
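The per-trial timing follows directly from these values, as the short sketch below verifies:

# Per-level timing of the tactile depth rendering: each ERM is driven for
# 250 ms with a 150 ms pause in between.
ON_MS, GAP_MS = 250, 150

def rendering_duration_ms(n_levels: int) -> int:
    return n_levels * ON_MS + (n_levels - 1) * GAP_MS

assert rendering_duration_ms(4) == 1450   # four-level case
assert rendering_duration_ms(8) == 3050   # eight-level case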

The correct identification rate of all distance range renderings, considered separately, was about 78%. However, it has to be noted that 53.75% of all presented cases were no-signal cases, i.e. no vibration was displayed; it can be argued that correct identification in this case is trivial. Alternatively, measuring the correct identification rate per depth image (i.e. all distance ranges over all height levels), the accuracy was found to be lower. As shown in Fig. 2.11 (right), a complete set of distance values was only correctly identified in 40.7% and 33.3% of the cases in the four- and eight-level conditions, respectively. Nevertheless, further analysis of the type of errors (left) indicates that in the majority of cases false responses were off by only one interval step. Larger estimation errors, of two or three intervals, were less common. As expected, the correct identification rate improves when allowing an error margin in the depth interval estimates (see again Fig. 2.11). Moreover, note that the absence of a signal was never misclassified. Finally, due to the lack of an absolute intensity reference, participants had to rely on the initial training phase for calibration. This sometimes led to cases where depth signals that happened to be consecutive were all perceived as shifted (e.g. consecutive depth signals for intervals 6, 7, 8 were reported as 5, 6, 7).

Fig. 2.11
figure 11

Results of depth interval study; subjects’ deviations from correct intensity levels (left); accurate identification rates, for different intensity error margins (right)

4 Haptic Devices for Exploration of Visual Data

In the past decades, touchscreen displays have become an integral and ubiquitous part of everyday life. The scope of applications ranges from general personal use to highly specialized tasks. Although the advances in mobile and tablet technology are vast, the integration of complex haptic feedback, beyond vibratory cues, is generally not considered for these devices, leaving a gap in terms of closed-loop physical interaction. Although an increasing number of visually impaired persons own a smartphone, they mainly rely on auditory feedback when using it. However, auditory feedback is limited regarding the presentation of some visual content, such as images or shapes.

In order to address the mentioned shortcomings, we have developed a set of shape displays combined with surface haptics rendering systems based on mobile touchscreen tablets. In the following sections, we will first overview the devices. Thereafter, we will discuss rendering methods developed for the interfaces, and finally, we will outline corresponding user experiments and initial results. More details on these systems can be found in [45] and [46].

4.1 Hardware Overview

4.1.1 Early Prototype

Our first prototype can be seen as an extension of the TPaD device of the Tablet Project (see Fig. 2.12 (right)). It incorporates two modes of haptic display – texture rendering via piezoelectric surface haptics and shape display via a motion platform. As a tablet, we employ the Asus Nexus 7, which is connected to an IOIO board via Bluetooth and to the TPaD circuit board via a micro-USB cable. The microcontroller and the amplifier are integrated into a single circuit board, which generates the necessary output to control the actuators of the haptic display components. All rendering computations, as well as the peripheral device control, are carried out on the tablet.

Fig. 2.12
figure 12

Mechanical design of initial prototype (left); top-down view of final assembled prototype (right). (Obtained from [45] (courtesy of L. Chhong Shing))

The TPaD surface haptic display comprises a piezoelectric bending element that creates the actuation necessary to achieve a squeeze-film effect. It consists of five 8 mm wide piezoceramic layers, glued onto a passive support layer made of glass. A crucial component is the amplifier that generates the required high-voltage signals of up to 100 Vpp to drive the TPaD. Its operation is frequency dependent; therefore, the amplifier must be tuned to operate at the resonance frequency of the glass. For tactile rendering, the tablet determines the desired amplitude, which is transmitted to the microcontroller. The latter then outputs a pulse-width-modulation (PWM) signal through the amplifier to the TPaD.

The shape display has two degrees of freedom. The general mechanical design is illustrated in Fig. 2.12 (left). It consists of two frames – the outer being rotated by the first motor and the inner by the second. The inner frame is seated inside the outer one; motor activation results in a direct rotation about either the X-axis or the Y-axis, respectively. The employed actuators are standard low-cost servomotors available from Parallax. Finally, the surface haptics tablet is placed on top of the inner frame, which is dimensioned so as to hold it in place. The shape display component is mechanically limited to rotations of approximately ±15° around the axes.

Due to the fixed center of rotation about each axis, the device is best suited to displaying changing contact normals at the center of the tablet. Note that at locations away from the center, artifacts perpendicular to the interaction plane are introduced – a user’s finger is either pushed upwards or loses contact with the tablet when it moves. This can be avoided by adding further degrees of freedom to the motion platform, which led to the designs outlined in the next sections. Nevertheless, for contact locations close to the center of rotation, usable haptic feedback can be generated, as addressed below in a user study.

4.1.2 SurfTics 1.0

In order to overcome the limitations of the initial prototype, a new device – named SurfTics 1.0 – has been developed. It combines the previously mentioned TPaD tactile surface display with a 3DoF Revolute-Revolute-Spherical (RRS) motion platform (similar to [47]), thus supporting rotation as well as translation of the shape display. This setup allows for the rendering of shapes in 3DoF, at non-centered positions on the touchscreen (see Fig. 2.13).

Fig. 2.13
figure 13

Mechanical design of the SurfTics 1.0 device (left); CAD-rendering of device (right). (Ⓒ2017 IEEE. Reprinted, with permission, from [48])

The mechanical setup comprises a base plate, three stepper motors with encoders, three RRS manipulator arms, and a top plate with a NanoSuction pad as a universal tablet mount. The electrical setup consists of an ATmega1280 8-bit microcontroller (AM), three drive units (DU), and a customized board (IOIO). Data transfer is realized via UART connections between the boards. The main processing unit is the AM, which computes the inverse kinematics from the finger position data and coordinates the three drive units. Each drive unit consists of an ATmega328 (AN), a stepper motor driver (Texas Instruments DRV8825), a quadrature encoder (1,000 increments), and a stepper motor (NEMA17, 0.8°/step). The AN runs a position control loop and handles the UART communication. Quadrature encoders are used for precise position feedback, as stepper motors are prone to step loss, especially when running at high load. More details on the device design can be found in [48].

The workspace allows for a 20 mm translation along the Z-axis and ±30° rotation about the X- and the Y-axis, respectively. Based on the stepper motor resolution (400 steps per revolution times 32, achieved by employing the microstepping mode of the driver circuit), the minimum and maximum velocity can be calculated as vmax = 24.4 rpm and vmin = 0.995 rpm, respectively. The 1,000 CPR quadrature encoder delivers a resolution of 0.27° for the rotations.

The improved device design permits more diverse user interaction and haptic feedback. User experiments showed that the new design and the combined feedback benefits identification tasks, as will be overviewed below.

4.1.3 SurfTics++

The previously described SurfTics version already exhibited reasonably improved performance; however, it still had a number of drawbacks. Firstly, vibrations were generated when the shape display was moving. These were undesired and interfered with the vibrations created by the tactile display; they were presumably caused by the employed stepper motors. Secondly, clearly audible acoustic noise originating from the drive system was present, likely stemming from the PWM signals driving the stepper motors, which lie in the audible range. Thirdly, the upper speed limit of 30 rpm of the drive units is a limitation for displaying sharp edges. Therefore, a new version of the SurfTics hardware was developed (dubbed SurfTics++), which on the one hand improves on the maximum bearable user force, dynamic range, and motion smoothness, and on the other hand reduces the audible noise in the system. The amended design is described in detail next (also see Fig. 2.14).

Fig. 2.14
figure 14

CAD-rendering of SurfTics++ device (left); final realization of hardware (right). (Obtained from [46] (courtesy of F. Enneking))

For the further enhanced version, the three actuated revolute joints, with attached lever arms, are distributed equally at 120° intervals, at a radius of 100 mm from the center point of the base plate. The three lower lever arms (from the motors to the revolute joints) measure 60 mm in length; the three upper lever arms (from the revolute joints to the spherical joints of the top plate) are 90 mm in length. The three spherical joints attached to the top plate are likewise distributed equally in 120° steps, also at a radius of 100 mm from the center point of the top plate.

The most crucial part of the setup is the drive system, each unit consisting of a motor, an encoder for tracking the motor’s position, and a controller providing the required high-power control signals. The required maximum motor torque can be calculated from an assumed maximum finger force of 25 N. In addition, the motors also have to bear the weight of the Nexus 7 tablet and the top plate, which together weigh 0.62 kg. The resulting overall maximum force of 31.08 N is distributed over the three motors, depending on the finger position during interaction. It is assumed that, in the worst case, the portion of the overall force borne by a single motor can reach 66% of the total. Hence, considering the 0.060 m lever arms, the maximum torque requirement for the selected motors is 1.24 Nm. In addition to this requirement, the motors should also be backdrivable, with as little resistance as possible.
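For clarity, the stated torque requirement follows from these values (interpreting the 66% worst-case share as two thirds of the total force):

$$\displaystyle \begin{aligned} F_{max} &= 25\,\text{N} + 0.62\,\text{kg} \cdot 9.81\,\text{m/s}^2 \approx 31.08\,\text{N},\\ \tau_{max} &\approx \tfrac{2}{3} \cdot 31.08\,\text{N} \cdot 0.060\,\text{m} \approx 1.24\,\text{Nm}. \end{aligned} $$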

In contrast to the stepper motors used in SurfTics 1.0, brushed DC motors provide a variable torque and are in general backdrivable. This is also true for brushless DC motors, but their control is more difficult and their drivers are significantly more expensive. Thus, we employ brushed DC motors in SurfTics++, specifically the Maxon RE 35 118778. This motor is highly backdrivable, showing negligible resistance or cogging torque. Its lower speed limit is 13 rpm at a 4% PWM duty cycle. In order to shift the speed range to a desired lower limit of 1 rpm, a 13:1 gear reduction via capstan drives is employed. In addition, the motors come with AVAGO HEDS 5540 A11 quadrature encoders mounted on the extended motor shaft at the back, with a resolution of 500 CPR. Accordingly, a total encoder resolution of 26,000 countable steps per output revolution is achieved. As a motor driver, the Pololu 20 A, 5.5–50 V single motor controller was selected. It comprises a current sense pin as well as a drive-coast operation mode. This hardware allows for driving PWM signals at frequencies of up to 40 kHz.
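Both the speed range and the encoder resolution are consistent with the 13:1 reduction and quadrature decoding:

$$\displaystyle \begin{aligned} &13\,\text{rpm} \,/\, 13 = 1\,\text{rpm} \quad \text{(lower speed limit at the output)},\\ &500\,\text{CPR} \times 4\ \text{(quadrature)} \times 13\ \text{(gear ratio)} = 26{,}000\ \text{counts per output revolution}. \end{aligned} $$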

The workspace of the device comprises a 20 mm translation along the Z-axis and ±30° rotation about the X- and Y-axes. The dynamic range of the drives is 1–24 rpm. For force rendering, the usable stiffness range is 1.37–6.00 N/mm. The positioning accuracy of the drive system was found to be 0.48°.

4.2 Rendering

The kinesthetic shape display component of the described SurfTics versions is able to present 2.5D images on the tablet. For this, depth map images can be employed that are similar to a topographic map, with the color/gray level representing the height at each pixel/location. Depending on the bit depth used to encode height, different resolutions of height levels are available. In addition to the height, local orientation can also be displayed, obtained, for instance, as height field gradients. Note that due to gradient discontinuities at edges, directional Gaussian smoothing usually has to be included for this.
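The following Python sketch illustrates how the two kinesthetic channels could be derived from a grayscale depth map; the height scaling and the use of an isotropic (rather than directional) Gaussian filter are simplifying assumptions:

# Sketch: the gray value at the finger position gives the Z-elevation, and
# the smoothed image gradient gives the local surface orientation. The
# scaling constant is an illustrative assumption.
import numpy as np
from scipy.ndimage import gaussian_filter

MAX_ELEVATION_MM = 20.0   # assumed mapping of gray level 255 to 20 mm

def elevation_and_slopes(depth_map_u8: np.ndarray, sigma_px: float = 2.0):
    # Returns the height field in mm and its gradients (dz/dx, dz/dy) after
    # Gaussian smoothing to avoid discontinuities at edges.
    z = depth_map_u8.astype(float) / 255.0 * MAX_ELEVATION_MM
    z_smooth = gaussian_filter(z, sigma=sigma_px)
    dz_dy, dz_dx = np.gradient(z_smooth)   # per-pixel slopes
    return z, dz_dx, dz_dy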

Regarding the desired Z-elevation of the tablet, note that its inclination also has an effect on the perceived height at the contact location of the fingertip. To counteract this effect, one can employ a simple compensation function:

$$\displaystyle \begin{aligned} z_{comp} = x \tan(\phi_Y) + y \tan(\phi_X), \end{aligned} $$
(2.1)

with ϕX and ϕY being the rotations around the tablet axes, and x, y giving the contact position in tablet space. By subtracting this value from the current Z-elevation, the effects of the inclination angle are decoupled from the rendering.
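In code, applying the compensation of Eq. (2.1) amounts to a single correction of the elevation command; the function below is a minimal sketch with assumed units (millimeters for positions, radians for angles):

# Sketch applying Eq. (2.1): the commanded elevation is reduced by the height
# offset introduced by the tablet inclination at the finger position (x, y).
import math

def compensated_elevation(z_desired_mm: float, x_mm: float, y_mm: float,
                          phi_x_rad: float, phi_y_rad: float) -> float:
    z_comp = x_mm * math.tan(phi_y_rad) + y_mm * math.tan(phi_x_rad)
    return z_desired_mm - z_comp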

Similar to the rendering of height, texture information can also be encoded in grayscale images. Pixel intensity can be mapped directly to the amount of friction. Preliminary tests with the combined setup indicated that it is more difficult for a user to discriminate between continuous changes of friction than to perceive a discontinuous transition between levels. Therefore, in the experiments described below, we rendered friction only at selected edges, as hard transitions. These are given by dark and bright regions in the grayscale images, mapped to low and high friction, respectively.
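A minimal sketch of this edge-based friction rendering is given below; the two friction command values and the threshold are illustrative assumptions rather than parameters of the actual implementation:

# Sketch of the described friction rendering: grayscale texture values are
# thresholded into two friction levels, so edges are felt as hard
# transitions rather than gradual changes.
import numpy as np

LOW_FRICTION, HIGH_FRICTION = 0.1, 0.9   # assumed normalized friction commands

def friction_at(texture_u8: np.ndarray, px: int, py: int,
                threshold: int = 128) -> float:
    # Dark regions map to low friction, bright regions to high friction.
    return HIGH_FRICTION if texture_u8[py, px] >= threshold else LOW_FRICTION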

4.3 User Experiments

In the following, we will describe two experiments that were carried out with the described hardware. In the first, we examine how well users can discriminate between concave and convex shapes, using the initial prototype as well as SurfTics 1.0. In the second study, we explore how well subjects can differentiate between various 2.5D geometries, using the SurfTics 1.0.

4.3.1 Curvature Detection Experiment, Comparing Prototype to SurfTics 1.0

The goal of this experiment was to evaluate the basic rendering performance of the two previously described display prototypes – in particular, we compared the initial prototype to the SurfTics 1.0. Note that the former is limited in its display capabilities due to its fixed center of rotation.

Four different shapes, either convex or concave, were presented to study participants; they were defined by quadratic functions, extruded along one axis of the surface. The functions were parameterized to result in a base width of either 3 cm or 4 cm, and a height of 1 cm. The virtual objects were placed at the center of the workspace, with the ridge/trough extending along the Y-axis. For the early prototype, the object was additionally shifted randomly along the X-axis, to allow for variation between trials. The maximum rotation was limited to ±15° for both devices.

The concave and convex shapes were presented in four different conditions. In the first, the early prototype was employed, displaying height and curvature, but with a fixed center of rotation. In the remaining three, the SurfTics 1.0 device was employed; the rendering was either height-only, orientation-only, or a combination of both cues (see also the sketch in Fig. 2.15).

Fig. 2.15
figure 15

Rendering conditions with the SurfTics 1.0, with fingertip moving along convex shape example – height-only (left), orientation-only (middle), combination (right). (Obtained from [45] (courtesy of L. Chhong Shing))

Ten volunteers – two female, eight male, average age 27 – participated in the study. All participants were naive to the design and concept of the devices, sighted, without sensory deficits, and right-handed. Two participants reported being familiar with the employed haptic technology.

Participants were asked to sign a consent form prior to the experiment. Subsequently, they were blindfolded and asked to wear noise-canceling headphones, to mask any confounding cues. Before the actual experiment, participants were allowed to familiarize themselves with the devices as well as the task. After the training phase, one stimulus was randomly selected and displayed, using the device specific to the condition. Participants were permitted to explore the virtual object without any time constraints. After each trial, participants had to indicate which type of curvature (concave/convex) and which size they perceived. The trial time was also recorded.

Firstly, we examine the shape identification results. A Shapiro-Wilk test revealed that the data were not normally distributed per group. Therefore, a Friedman non-parametric test was carried out, yielding a significant effect of the conditions (χ²(3) = 9.766, p = 0.02). A Wilcoxon post-hoc test indicated a statistically significant difference between the early prototype rendering and the combined shape display (W = 26, p = 0.037). Moreover, the task completion time was normally distributed (Shapiro-Wilk). A repeated-measures ANOVA indicated significant differences between the conditions (F(3,36) = 5.574, p = 0.003). A post-hoc Tukey test showed differences between the height-only rendering and the other two conditions employing the SurfTics 1.0 (p < 0.05). No significant difference was found in the size discrimination. The main results are visualized in Fig. 2.16.

Fig. 2.16
figure 16

Box plots summarizing the results of the shape identification task (top) and the task performance time (bottom), for all four conditions (C1–C4). C1 and C4 present curvature and height, employing the early prototype and the SurfTics 1.0, respectively. C2 and C3 provide height-only and curvature-only information through the SurfTics 1.0 device. (Obtained from [45] (courtesy of L. Chhong Shing))

4.3.2 Width Identification Experiment with the SurfTics 1.0

A further study has been carried out to explore how well users could identify the width of 2.5D geometries via the SurfTics 1.0 device. When users moved a finger across the tablet, the height as well as the local normal orientation of the contacted object surface were rendered via the shape display component. In addition, depending on the experimental condition, surface haptic feedback could be included when contacting the surface, as outlined below. An overview of these rendering processes is given in Fig. 2.17.

Fig. 2.17
figure 17

Sketch of shape rendering, for display of a pyramid frustum (left); note numbered regions with varying orientation. Sketch of surface friction rendering of edges/transitions (right). (Ⓒ2017 IEEE. Reprinted, with permission, from [48])

Nine sighted participants – three female, six male, average age 27 – took part in the experiment. The haptic device was employed to render pyramid frustums of equal height in the Z-direction, but with varying base width (see Fig. 2.17). The geometries – denoted S1 to S4 – had a width (i.e. side length) of 40, 50, 60, and 70 mm, resulting in a corresponding decrease in slope on the sides.

Participants initially familiarized themselves with the setup and the task. Thereafter, in a training phase, participants donned a blindfold and headphones playing white noise, blocking out visual and auditory cues. Subjects were then asked to explore the four shapes and were informed verbally by the experimenter about the size. This process lasted until the participants felt comfortable with the setup and confident about correctly carrying out the task.

Subsequently, the actual trials took place. The different samples were presented, each one eight times, in pseudo-random order. The index finger of a blindfolded participant’s dominant hand was guided to the center of the rendered shape by the experimenter. In each trial, a participant was allowed to explore a shape for a maximum of 15 s before being asked to report the perceived width of the shape. The responses were recorded by the experimenter. All trials were carried out in direct succession, without any feedback on correctness. Four participants started the first half of the trials with surface haptics enabled (With condition), while the remaining ones started without this feedback (Without condition); after the first half, the mode was switched between the groups.

Overall, we found an average identification rate of 58% for the condition without surface friction display, and of 71% with friction rendering included. The recorded data passed a normality test, and further analysis via a paired t-test indicated statistical significance (t(8) = 2.879; p < 0.05). This hints at a beneficial effect of including friction rendering in the identification task. The overall data are presented as confusion matrices for both experimental conditions in Table 2.2. It is interesting to note that when no surface haptic feedback was included, incorrect replies were found even for the maximum differences, i.e. S1 was identified as S4, and/or vice versa.

Table 2.2 Confusion matrices (response vs. stimulus) showing results for the condition of shape rendering without surface haptics (left) and with surface haptics (right). The latter exhibits better performance

5 Discussion and Conclusion

In this chapter, we have outlined various aspects related to haptically-assisted interfaces for persons with visual impairments. We focused on two different threads – the use of ETAs as mobility assistance systems, as well as setups for the haptic display of visual information on mobile devices.

In the former domain, haptic feedback supporting both a dedicated sweeping and a dedicated scanning mode in ETAs was investigated, including experiments with actual white cane users. The first two studies illustrated that the tactile rendering methods presented in [28] result in high identification rates for distance levels, not only in the static but also in the dynamic condition. Moreover, the results were confirmed in the presence of different ground surface textures.

Studying distance identification during motor activity is an important extension, since any haptic signals that encode distance information should remain recognizable during dynamic use of the cane (such as sweeping). According to Post et al. [49], vibration perception may degrade during dynamic movements. In this context, the proposed tactile rendering method was still found to be capable of delivering the corresponding distance levels during sweeping. In addition, properly isolating the vibration stimuli per fingertip proved to be a critical factor in the ETA design, to avoid stimulating the whole handle.

In the additional study focusing on the scanning mode, participants remained stationary. This, however, follows our idea of switching between modes: more detailed information about obstacles would be obtained in the scanning mode after a user has detected the presence of an obstacle in the sweeping mode. The user would then stop and point the ETA at the obstacle to obtain additional information. The results of the scanning study revealed that the accuracy of depth perception of obstacles was initially not very high. We conjecture that this is due to the large number of intensity levels provided; eight distance regions were employed, without intensive user training. However, it should be noted that reducing the number of distance levels may not provide sufficient information about obstacles, e.g. it might not be possible to accurately detect the individual steps of a staircase. Conversely, using a larger range of distance levels will require a substantial amount of user training. This will be examined further in future work; one possibility may be to adaptively adjust the number of distance levels. The option to personalize distance detection settings and haptic signals according to white cane user preferences should also be incorporated.

In the latter domain, we introduced three prototypes for displaying visual data on mobile tablets. For haptic rendering, a surface haptic display is combined with a shape display mechanism. Preliminary user studies indicate that such devices could be used for mediating visual information via the haptic sensory channel. However, considerable additional work is required, especially user studies with visually impaired persons. Only simple shapes were employed in the user studies, as in similar related research. A direct extension will be to explore how more complex shapes, as well as mixed contents, could be perceived by users on such augmented mobile tablets.