
6.1 Introduction

The author and his team are trying to construct an intelligent haptic information environment that integrates communication in real space, human interfaces, and media processing [1, 2]. In other words, we seek to establish methods for collecting, understanding, and transmitting haptic information in real space and for displaying it to humans located at remote sites. Furthermore, we want to use an information space that feels like the natural space in which people move and act not only to make remote communication, remote experiences, and pseudo-experiences possible but also to build human-harmonized “haptic media” in which creative activities such as content design and production can take place much as they currently do in the real world.

The information we acquire through real life gives us a holistic experience fully incorporating a variety of sensations and bodily motions—seeing, hearing, speaking, touching, smelling, tasting, moving, etc. However, the sensory modalities that can be transmitted in our information space are usually limited to the visual and auditory. Haptic information is rarely used in the information space of our daily lives, except for warnings or alerts such as cell-phone vibrations.

Generally, human haptic sensations can be categorized into cutaneous sensation (the sensing of pressure/force, vibration, pain, and temperature) and proprioception (kinesthetic sense, i.e., the sensing of position, movement, and weight/force). Haptic media providing both proprioception and cutaneous sensations would let users feel like they were touching distant people and objects and would let them “touch” artificial objects. Transmission of texture, mass, warmth, moisture, and other sensory information would expand the current passive information space, which comprises only images and sounds, to an active and human-harmonized information space where the user could extend his/her hand and feel the presence of an object that isn’t really there.

A number of technologies have been developed to build a haptic information space, but they fail to provide a holistic experience because of two shortcomings:

  1. The technologies can communicate only a select spectrum of haptic sensation because they use ad hoc methods based on an insufficient and still-primitive understanding of haptic sensation.

  2. They offer only a narrow definition of haptic sensation, one that does not sufficiently incorporate visual/auditory sensation and bodily motion.

To establish foundation technologies for the “recording and analysis,” “transmission,” and “playback, synthesis, and display” of haptic information, to fully transmit haptic sensation, and to bring it to a level where it can be treated as an information medium much like visual and auditory sensations, this research will:

  1. expand upon the principle of haptic primary colors previously formulated by the author and his team, further elucidating the haptic sense mechanism in humans, and

  2. establish a design method for haptic information combined with visual sensation and bodily movement.

Figure 6.1 shows the concept of haptic media comparable to visual media, which transmits and/or creates a realistic visible and tangible three-dimensional (3D) environment based on the haptic primary color theory.

Fig. 6.1 Haptic media comparable to visual media [1]

Fig. 6.2 Information transmission and creation of realistic tangible 3D environment [1]

6.2 Haptic Media Project

This research concerns the development of a “haptic information space,” an information system that makes possible the simultaneous delivery of high-resolution haptic, visual, and auditory information. Figure 6.2 shows the outline of the project. The real world is sensed by a haptic scanner, and the scanned information is analyzed and decomposed using the formulation of haptic primary colors before being transmitted to distant places. It is also possible to store the scanned information, which can be retrieved and edited using haptic editors. Transmitted or retrieved information is synthesized using the formulation of haptic primary colors and is displayed to human users through a haptic display.

Some possible applications of these systems would include implementation of the information content of museums and libraries, as well as training in the fields of medicine and space research. For example, visual and haptic data for a precious object in a museum’s collection (that one is normally not allowed to touch) could be archived in a computer, and users could access the object via a studio-type information space that lets them experience touching the object with their own hands (see Fig. 6.3 left). These systems could also be used in daily life. For example, a shop could store visual and haptic information about all its products and produce a tangible catalog of its goods. The customer could customize the product on the spot and try it out prior to deciding on a purchase (Fig. 6.3 middle), or two people at distant locations could cooperate in creative activities (Fig. 6.3 right).

Fig. 6.3 Museum implementation (left), tangible product catalog (middle), and co-creation (right) [1]

6.2.1 Haptic Device Design Based on Haptic Primary Colors

Our understanding of the human perceptual mechanisms for processing visual and auditory information continues to progress through physiological and psychological research, and methods for measuring visual and auditory information and presenting it to humans, based on principles of human sensory perception, are already established. It is for this reason that cameras, televisions, audiovisual displays, and other general-use devices that acquire, transmit, and display visual and auditory information have been designed and are widely used. Similar progress, however, has not been made with regard to haptic information. There are no standard methods or devices for acquiring, transmitting, and displaying haptic information that are comparable to those used with visual and auditory information. The goal of this research is to establish design methods for processing haptic sensory information based on a better understanding of sensory mechanisms.

Broadly speaking, human haptic sensation can be divided into cutaneous sensation (pressure sense, vibration sense, thermal sense, and pain sense) and proprioception (kinesthetic sense, position sense, and movement sense). Cutaneous perception is created through a combination of nerve signals from several types of tactile receptors located below the surface of the skin.

If we consider each type of activated haptic receptor as a sensory base, we should in principle be able to express any given pattern of cutaneous sensation through the synthesis of the signals from these bases. For pressure and vibration there are four kinds of tactile receptors—Meissner’s corpuscles, Merkel cells, Pacinian corpuscles, and Ruffini endings—each activated by a different stimulus or different combination of stimuli.

By analogy with the three primary colors of color theory, we have called these haptic information bases the “haptic primary colors” and have continued to investigate them. Using this concept of haptic primary colors as a foundation, our technical concern is the recreation of cutaneous sensation through signal delivery to each sensory base separately (i.e., by selectively stimulating tactile receptors). As one method for reproducing haptic primary colors, we selectively stimulate the Meissner’s corpuscles and Merkel cells through electrical stimulation. We have developed a cutaneous sense display capable of high spatial and temporal resolution, thereby demonstrating the efficacy of the formulation of haptic primary colors [3].

The development of a force vector distribution sensor called “GelForce” has made possible the collection of real-world temporal and spatial haptic information. A haptic telexistence system has been devised by combining these measurement and presentation technologies. The system allows for long-distance transmission of haptic information through the use of a robotic hand with GelForce sensors embedded in its fingertips and a “master hand” with an electro-tactile display embedded in its fingertips. Haptic information about the objects gripped by the robotic hand is transmitted to the operator, who is therefore able to operate the robotic hand smoothly [4].

However, although it is already possible to recreate simple conditions like contact and pressure, it is not yet possible to create more detailed natural haptic sensations like the feel of metal and the texture of paper. Reproducing natural haptic sensations will require physical information collected from the real world to be “resolved” into haptic sensory bases and will require selective stimulation of each type of tactile receptor through composition of the nerve-firing patterns of human tactile receptors. To date, there have been virtually no examples of a conversion system for this decomposition and composition, and no effective methodologies have been established. This is one reason that previous haptic sensory research has been limited to individual tactile sensations.

In this investigation, we expand upon our principle of haptic primary colors. By adding cold receptors (free nerve endings), warmth receptors (free nerve endings), and pain receptors (free nerve endings) to the original four haptic sensory bases and reconsidering sensory activation as a temporal and spatial composition of seven sensory bases, we aim to attain a better understanding of the haptic primary colors formula for converting haptic information through decomposition and composition. In order to expand the current selective stimulation method from Meissner’s corpuscles and Merkel cells to other sensory bases, we must first deepen our biological understanding of haptic receptors. And to develop a new method for selective stimulation, we need to better understand the nature of temporal and spatial perception of haptic sensation. We will formulate design principles for haptic sensors and tactile displays that better fulfill the formula of haptic primary colors, and we will develop transmission systems that can transmit natural haptic sensations.

6.2.2 Construction Method for Embodied Haptic Contents

The use of image editing software, three-dimensional CAD, computer graphics libraries like DirectX and OpenGL, and other information composing and editing technologies has resulted in an information environment where anyone can freely create visual information contents.

In this investigation we aim to develop fundamental technologies for the creation of haptic information contents and to integrate visual information contents into haptic information, thereby constructing a haptic information space. When one touches an object with his/her hand or fingers, the haptic information is barely sensed at all unless the hands and fingers are moved. The complete haptic information about the object is collected only when one moves his/her hands and fingers. In addition, the hand and finger haptic sensations for the same portion of the same object can differ depending on many factors, including the angle, speed, and pressure of the touch. The haptic sense differs greatly from the visual and auditory senses in that the haptic perception processes are mediated by bodily movement. Haptic sensations are thoroughly embodied perceptions. This creates a necessity, when expressing haptic information contents, to control the reproduced haptic information and respond in real time to a user’s bodily movement throughout the information experience.

Kinesthetic sensation can be quantified and presented in real time by using physical simulation technologies, but it is not yet possible to simulate cutaneous sensation in real time. Using our understanding of human haptic perception, we are working to develop (1) technologies for a haptic scanner that can capture real-world haptic sensations (texture), (2) methods for mapping haptic sensory textures by using 3D computer graphics modeling, and (3) technologies that can be used to compose haptic sense information in response to arbitrary bodily movements based on the collected haptic sensory textures. We are also trying to establish a method for building haptic sensory contents with a sense of embodiment.

In this research we seek to compress motion-based haptic sense information, simplify haptic sense quantifications, and establish technologies for creating embodied haptic information contents (Fig. 6.4).
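One way to picture motion-responsive haptic content is as a lookup of a stored haptic texture at the touched surface location, modulated in real time by the speed and pressure of the user's hand. The Python sketch below is only a conceptual illustration under assumed names and laws: the texture format, the lookup function, and the speed/pressure modulation are placeholders, not the project's actual method.

```python
# Sketch: synthesize a vibrotactile signal from a stored haptic texture,
# modulated by the user's contact speed and pressure. The texture model and
# the modulation law are illustrative assumptions.
import numpy as np

def lookup_texture(u: float, v: float) -> np.ndarray:
    """Stand-in for a haptic texture map addressed by the 3D model's UV coordinates.
    Here: a short band of noise whose roughness varies across the surface."""
    rng = np.random.default_rng(int(1000 * u) * 131 + int(1000 * v))
    roughness = 0.2 + 0.8 * u
    return roughness * rng.standard_normal(64)

def synthesize(u: float, v: float, speed: float, pressure: float) -> np.ndarray:
    """Scale the stored texture by contact speed and pressure (simple assumed law)."""
    base = lookup_texture(u, v)
    return np.clip(pressure * speed * base, -1.0, 1.0)

if __name__ == "__main__":
    frame = synthesize(u=0.4, v=0.7, speed=0.1, pressure=0.5)   # one rendering frame
    print(f"{len(frame)} samples, rms = {np.sqrt(np.mean(frame**2)):.3f}")
```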

Fig. 6.4 Technologies for creation of embodied haptic information content [1]

6.2.3 Tangible Visuo-Haptic 3D Display

Touching an object as it is viewed is an essential element in experiencing the “reality” of the target object. It has been shown that if the user of an advanced stereoscopic display system cannot extend his/her hand and touch a stereoscopic image, he/she loses the sense of the “reality” of the target object and experiences discomfort. With the popularization of stereoscopic images, there is thus a strong awareness that visual and haptic information must be fused. Conventional visual displays do not consider the concept of direct “touch” in their content, so they cannot align the positional relationship between visual information and haptic information. A head-mounted display (HMD) can present stereoscopic images within the user’s grasp; however, because HMDs isolate the user from the surrounding environment, they are ill suited to the “human-harmonized ‘tangible’ information environment” that is the object of this study. Assuming that haptic information is available, two conditions are requisite: 3D haptic information must be provided at the user’s fingertips, and the user must be able to move his/her hands freely, without a device, at the location where the information is provided. In this study, users are able to reach out and touch three-dimensional images and perform operations while perceiving autostereoscopic visual information through binocular parallax and motion parallax. In other words, a 3D visuo-haptic display that can provide “reality” to target objects is being developed (Fig. 6.5).

Fig. 6.5 From a conventional 3D visuo-haptic display to a 3D visuo-haptic display that presents reality [1]

6.2.4 Construction and Verification of Embodied Tangible 3D System

6.2.4.1 Transmission of Realistic Tangible 3D Environment

The interim milestone for the present study was to construct, within a three-year target period, a haptic information transmission system. The system was required to transmit signals from a haptic sensor in real time and to provide haptic sensations, including temperature sensations, on a haptic display in such a way that the haptic sensations are integrated with visual sensations. We have succeeded in transmitting fine haptic sensations, such as material texture and temperature, from an avatar robot’s fingers to a human user’s fingers by using the experimentally constructed TELESAR V system.

6.2.4.2 Creation of Realistic Tangible 3D Environment

The final milestone of this study was to construct a tangible information environment system that presents integrated visual and haptic information. Visual as well as haptic models of real objects should be acquired and added to a database to produce content. RePro3D and HaptoMIRAGE have been invented to demonstrate the final milestone, and they have enabled information content to be “experienced” in situations that unite haptic senses, visual senses, and motion. These demonstrations have revealed that necessary and sufficient haptic information has been acquired, transmitted, and presented.

Future goals will be to verify and assess such complex haptic items as a sense of touching a fish as well as feeling water resistance, moistness, and slipperiness. Figure 6.6 shows such a future demonstration using a haptic aquarium.

Fig. 6.6 Creation of realistic tangible 3D environment

6.3 Haptic Primary Colors

Humans do not perceive the world as it is. Different physical stimuli give rise to the same sensation in humans and are perceived as identical. A typical example of this fact is human color perception. Humans perceive light of different spectra as having the same color if the light has the same amount of red, green, and blue (RGB) spectral components. This is because the human retina typically contains three types of color receptors, called cone cells or cones, each of which responds to a different range of the color spectrum. Human responses to light stimuli are therefore three-dimensional and can generally be modeled as a mixture of red, green, and blue—the three primary colors.

This many-to-one correspondence of elements in the mapping from physical space to psychophysical perceptual space is the key to virtual reality (VR) for humans. VR produces the same effect as a real object for a human subject by presenting virtual entities that exploit this many-to-one correspondence. We have proposed the hypothesis that cutaneous sensation also has the same many-to-one correspondence from physical to psychophysical perceptual space, via physiological space. We call this the “haptic primary colors” [1]. As shown in Fig. 6.7, we define three spaces: physical space, physiological space, and psychophysical (perception) space. Different physical stimuli that map to the same point in perception space give rise to the same sensation and are perceived as identical.

Fig. 6.7 Haptic primary color model [1]

In physical space, human skin physically contacts an object, and the interaction continues over time. Physical objects have several surface physical properties such as surface roughness, surface friction, thermal characteristics, and surface elasticity. We hypothesize that cutaneous phenomena at each contact point of the skin can be resolved into three components—pressure/force p(t), vibration v(t), and temperature e(t)—and that objects with the same p(t), v(t), and e(t) are perceived as the same even if their physical properties are different.

We measure p(t), v(t), and e(t) at each contact point with sensors mounted on an avatar robot’s hand. Then we transmit these pieces of information to the human user who controls the avatar robot as his/her surrogate. We reproduce these pieces of information at the user’s hand via haptic displays of pressure/force, vibration, and temperature so that the human user has the sensation of touching the object as he/she moves his/her hand controlling the avatar robot’s hand. We can also synthesize virtual cutaneous sensation by displaying computer-synthesized p(t), v(t), and e(t) to human users through the haptic display.
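As a concrete illustration, the per-contact-point decomposition can be thought of as a stream of time-stamped samples carrying p(t), v(t), and e(t). The following Python sketch is purely illustrative: the data structure, the address, and the function names are hypothetical, not part of the actual TELESAR V implementation.

```python
# Minimal sketch of a haptic-primary-colors sample stream (hypothetical names).
# Each contact point is reduced to three physical components: pressure/force p(t),
# vibration v(t), and temperature e(t), which are transmitted and reproduced remotely.
from dataclasses import dataclass
import json
import socket
import time

@dataclass
class HapticSample:
    t: float          # timestamp [s]
    finger: str       # which contact point, e.g. "index"
    p: float          # pressure/force [N]
    v: float          # vibration sample (audio-band signal)
    e: float          # surface temperature [deg C]

def send_sample(sock: socket.socket, addr, sample: HapticSample) -> None:
    """Serialize one sample and send it to the operator-side haptic display."""
    sock.sendto(json.dumps(sample.__dict__).encode(), addr)

def reproduce(sample: HapticSample) -> None:
    """Placeholder for the display side: drive the force, vibration, and
    thermal actuators so the user feels the same p, v, e."""
    print(f"{sample.finger}: p={sample.p:.2f} N, v={sample.v:+.3f}, e={sample.e:.1f} C")

if __name__ == "__main__":
    # Loopback demo: pretend the avatar's fingertip sensors produced one sample.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sample = HapticSample(t=time.time(), finger="index", p=0.8, v=0.02, e=24.5)
    send_sample(sock, ("127.0.0.1", 9999), sample)
    reproduce(sample)
```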

This breakdown into pressure/force, vibration, and temperature in physical space is based on the restriction of human sensation in physiological space. Human skin has a limited set of receptor types, as does the human retina. In physiological space, cutaneous perception is created through a combination of nerve signals from several types of tactile receptors located below the surface of the skin. If we consider each activated haptic receptor as a sensory base, we should be able to express any given pattern of cutaneous sensation through synthesis by using these bases.

Recall the four kinds of tactile receptors mentioned in Sect. 6.2.1: Merkel cells, activated by pressure; Ruffini endings, activated by tangential force; Meissner’s corpuscles, activated by low-frequency vibration; and Pacinian corpuscles, activated by high-frequency vibration. Adding cold receptors (free nerve endings), warmth receptors (free nerve endings), and pain receptors (free nerve endings) to these four mechanoreceptive sensory bases, we have seven sensory bases in physiological space. It is also possible to add the cochlea, which hears the sound associated with vibration, as one more basis; this auditory basis can be considered cross-modal.

Since all seven receptor types respond only to the pressure/force, vibration, and temperature applied to the skin surface, these three components in physical space are enough to stimulate each of the seven receptors. This is the reason that in physical space we have three haptic primary colors: pressure/force, vibration, and temperature. Theoretically, by combining these three components we can produce any type of cutaneous sensation without the need for any “real” touching of an object.

6.4 Haptic Information Display

When we use the haptic primary colors as a foundation, a technical concern is the recreation of cutaneous sensation by designing a haptic information display [1]. There are two ways of designing a haptic information display: through the physical layer or through the physiological layer. The latter involves the delivery of the stimulus to each sensory base separately, i.e., the selective stimulation of tactile receptors. We have realized selective stimulation of the Meissner’s corpuscles and Merkel cells through electrical stimulation as one method for the reproduction of haptic primary colors. We have developed a cutaneous sense display that is capable of high spatial and temporal resolution and have thereby demonstrated the efficacy of the haptic primary color theory [5, 6].

A force vector distribution sensor called “GelForce” has been developed for the quantification of pressure sense information [7]. It made possible the collection of real-world temporal and spatial haptic information and has been used, along with presentation technologies, in a haptic telexistence system [4]. The system allows for long-distance transmission of haptic information through the use of a robotic hand with GelForce sensors embedded in its fingertips, and a “master hand,” worn by the human operator, with electro-tactile displays embedded in its fingertips. Haptic information about the objects gripped by the robotic hand is transmitted to the human operator, who is then able to operate the robotic hand smoothly.

However, although it is already possible to recreate simple conditions such as contact and pressure, it has not been possible to create more detailed natural haptic sensations, such as the feel of metal or the texture of paper, by direct stimulation of the physiological layer. To display texture information naturally, we are using physical-layer information such as force/pressure, vibration, and temperature instead of the physiological-layer information. We have made prototype devices that are able to display normal/tangential force, vibration, and temperature. These are described in Sects. 6.4.1, 6.4.2, and 6.4.3, respectively.

6.4.1 Normal/Tangential Force Display: Gravity Grabber

We have developed a wearable haptic display, called Gravity Grabber [8], that presents normal and tangential forces on a fingertip (Fig. 6.8). It has a pair of geared DC motors with rotary encoders (Maxon Motor Corp., RE10, 1.5 W, gear ratio 1:16) that drive a belt. The belt depresses and drags the palm side of a fingertip. As shown in Fig. 6.9, the motors work in a complementary manner to generate forces in two directions.

Fig. 6.8 Gravity Grabber in use

Fig. 6.9 Methods for generating vertical force (normal force) (left) and shearing force (tangential force) (right)

The most fundamental sensation of grasping is the normal force (vertical force). A Gravity Grabber reproduces this type of force sensation by driving the two motors in opposing directions to roll up the belt and generate a normal force (vertical force). The prototype device can generate a force of up to 6.5 N.

When we hold an object such as a bottle, we feel the shearing force between the bottle and our fingers; this feeling is important because it tells us not to drop the bottle. The tangential force needed to reproduce it is generated by driving the two motors in the same direction to roll up one end of the belt and to release the other end. In this way, the tangential force (shearing force) can also be presented by this device. The current prototype has a frequency range of 0–200 Hz. By combining the normal force and the tangential force applied to a fingerpad, we can display force vectors in any direction.
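As an illustration of the complementary drive described above, the sketch below mixes a desired normal force and tangential force into commands for the two belt motors: a common-mode component rolls the belt up to press the fingerpad, and a differential component shifts the belt to produce shear. The linear model, the gains, and the function names are assumptions for illustration, not the actual Gravity Grabber firmware.

```python
# Sketch of two-motor belt mixing for normal and tangential fingertip force.
# Assumed linear model: common-mode command -> normal (pressing) force,
# differential command -> tangential (shearing) force. Gains are placeholders.

MAX_CMD = 1.0          # normalized motor command limit
K_NORMAL = 1.0 / 6.5   # command per newton of normal force (prototype max ~6.5 N)
K_TANGENT = 1.0 / 6.5  # command per newton of tangential force (assumed)

def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))

def belt_motor_commands(f_normal: float, f_tangential: float) -> tuple[float, float]:
    """Return (left_motor, right_motor) commands.

    Both motors winding the belt up presses the pad (common-mode term);
    driving them the same way shifts the belt sideways (differential term).
    """
    common = K_NORMAL * max(f_normal, 0.0)      # the belt cannot pull the skin outward
    diff = K_TANGENT * f_tangential
    left = clamp(common + diff, -MAX_CMD, MAX_CMD)
    right = clamp(common - diff, -MAX_CMD, MAX_CMD)
    return left, right

if __name__ == "__main__":
    print(belt_motor_commands(3.0, 0.0))   # pure pressing sensation
    print(belt_motor_commands(2.0, 1.0))   # pressing plus shear (e.g. holding a bottle)
```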

We have found that fingerpad deformation caused by the weight of an object can generate a reliable weight sensation even when the proprioceptive sensations on the wrist and arm are absent. This indicates that a simple ungrounded display for presenting virtual mass can be realized by reproducing the fingerpad deformation.

6.4.2 Vibration Sensor and Display: TECHTILE Toolkit

TECHTILE Toolkit [9] is an introductory haptic toolkit. Combining “TECHnology” with “tacTILE” perception/expression, it is intended to disseminate haptic technology as a third medium in the fields of art, design, and education. This new medium will extend conventional “multimedia,” which currently comprises only visual and auditory information.

The current version of the toolkit (see Fig. 6.10) is composed of haptic recorders (microphones), haptic reactors (small voice-coil vibrators or Force Reactor vibrators), and a signal amplifier that is optimized to present not only the zone of audibility (20–20,000 Hz) but also low-frequency (1–20 Hz) vibrotactile sensation.

Fig. 6.10 TECHTILE Toolkit

This toolkit is intuitive to use and reduces development costs, yet it can deliver more realistic haptic sensations than many conventional haptic devices. TECHTILE Toolkit borrows the recording, editing, and playback methods of conventional auditory media because the physical sources of auditory sensation and tactile sensation are the same.

The vibration of an object generates a sequence of vibrations of the air that is perceived as sound; conversely, if the object were touched directly, it would be perceived as a tactile sensation. The auditory sensation can be recorded as a sequence of sound waves that are easily editable and that can be shared on the Internet via services such as YouTube and via other content-sharing websites.
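Because the toolkit treats vibrotactile signals exactly like audio signals, a record-and-replay chain can be sketched with ordinary audio I/O. The example below uses the third-party sounddevice package as a stand-in; the actual toolkit uses dedicated contact microphones, an analog amplifier, and voice-coil actuators, and the sampling rate, gain, and device routing here are assumptions.

```python
# Sketch: record a contact vibration as an audio signal and replay it through a
# vibrotactile actuator connected to the sound output (TECHTILE-style workflow).
# Requires the third-party 'sounddevice' package; device routing is an assumption.
import numpy as np
import sounddevice as sd

FS = 44100          # sampling rate [Hz]
DURATION = 2.0      # seconds to record
GAIN = 2.0          # playback gain (would be set on the analog amplifier in practice)

def record_vibration(seconds: float = DURATION) -> np.ndarray:
    """Record the haptic 'sound' picked up by a contact microphone."""
    frames = int(seconds * FS)
    data = sd.rec(frames, samplerate=FS, channels=1, dtype="float32")
    sd.wait()                      # block until recording finishes
    return data[:, 0]

def play_vibration(signal: np.ndarray) -> None:
    """Replay the recorded signal through the actuator (here: the default output)."""
    out = np.clip(GAIN * signal, -1.0, 1.0)
    sd.play(out, FS)
    sd.wait()

if __name__ == "__main__":
    vib = record_vibration()
    play_vibration(vib)
```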

6.4.3 Thermal Sensor and Display

We have proposed a vision-based thermal sensor that uses thermosensitive paint and a camera. Thermosensitive paint changes its color when its temperature changes. We have used the thermosensitive paint to measure the thermal change on the surface of the haptic sensor for telexistence.

Figure 6.11 shows the configuration of the proposed vision-based thermal sensor [10]. This sensor consists of an elastic sheet with thermosensitive paint, a transparent elastic body, a camera, a heat source, and a light source. The thermosensitive paint is printed on the inside of the sensor surface so that its color changes corresponding to the thermal change on the sensor’s surface. The camera detects the color of the thermosensitive paint and converts it to the temperature of the sensor’s surface.

Fig. 6.11 Configuration of the proposed vision-based thermal sensor

We ensure that the compliance of the elastic body is the same as that of a human fingertip, so the elastic body can mimic the deformation of human skin caused by contact. The heat source maintains the temperature of the sensor and that of the human fingertip at the same level. The temperature of the sensor surface is controlled to follow the temperature of the fingertip.

To convert color into temperature, we use the hue of the captured image. Note that the color of a typical thermosensitive paint stops changing once the temperature changes by more than \(5\,^{\circ }{\text{ C }}\) to \(10\,^{\circ }{\text{ C }}\). Therefore, to cover the temperature measurement range of 15–45 \(^{\circ }{\text{ C }}\), we must combine several paints with different active temperature ranges.
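The hue-to-temperature conversion can be pictured as a calibrated lookup per paint, with several paints stitched together to span 15–45 °C. The sketch below is a minimal illustration of that idea; the calibration table and the assumed linear hue-temperature relation are placeholders, not measured values.

```python
# Sketch: convert the hue of thermosensitive paint to temperature.
# Each paint is assumed to change hue roughly linearly over a narrow band;
# several paints are combined to cover 15-45 deg C. Calibration values are
# placeholders, not measurements.
import colorsys

# (active range [deg C], hue at low end, hue at high end) for each paint patch.
PAINT_CALIBRATION = {
    "paint_15_25": ((15.0, 25.0), 0.00, 0.30),
    "paint_25_35": ((25.0, 35.0), 0.05, 0.35),
    "paint_35_45": ((35.0, 45.0), 0.10, 0.40),
}

def hue_of_rgb(r: int, g: int, b: int) -> float:
    """Hue in [0, 1) from an 8-bit RGB pixel of the camera image."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h

def temperature_from_hue(paint: str, hue: float) -> float:
    """Invert the assumed linear hue-temperature relation for one paint patch."""
    (t_lo, t_hi), h_lo, h_hi = PAINT_CALIBRATION[paint]
    alpha = (hue - h_lo) / (h_hi - h_lo)
    alpha = max(0.0, min(1.0, alpha))       # the paint saturates outside its band
    return t_lo + alpha * (t_hi - t_lo)

if __name__ == "__main__":
    hue = hue_of_rgb(180, 90, 60)           # example pixel from the sensor camera
    print(f"hue={hue:.2f} -> {temperature_from_hue('paint_25_35', hue):.1f} deg C")
```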

The measured temperature is reproduced by Peltier actuators placed on the operator’s fingertips.

6.5 Telexistence Avatar Robot System: TELESAR V

Telexistence [2] is a concept named for the general technology that enables a human being to have a real-time sensation of being at a place other than where he or she actually is and to interact with a remote environment that may be real, virtual, or a combination of both. It also refers to an advanced type of teleoperation system that enables an operator at the controls to perform remote tasks dexterously with the feeling of being in a surrogate robot working in a remote environment. Telexistence in the real environment through a virtual environment is also possible [11, 12].

TELESAR V (TELExistence Surrogate Anthropomorphic Robot version V) [1, 13–16] is a telexistence master–slave robot system that was developed to realize the concept of telexistence. It was implemented with a high-speed, robust, full-upper-body, mechanically unconstrained master cockpit and a 53-degree-of-freedom (DOF) anthropomorphic slave robot. The system provides an experience of our extended “body schema,” which allows a human to maintain an up-to-date representation in space of the positions of his/her various body parts. Body schema can be used to understand the posture of the remote body and to perform actions with the belief that the remote body is the user’s own body. With this experience, users can perform tasks dexterously and feel the robot’s body as their own through visual, auditory, and haptic sensations, which provide the simplest and most fundamental experience of feeling that one is someone somewhere. The TELESAR V master–slave system can also transmit fine haptic sensations, such as the texture and temperature of a material, from an avatar robot’s fingers to a human user’s fingers.

As shown in Figs. 6.12 and 6.13, the TELESAR V system consists of a master (local) and a slave (remote). A 53-DOF dexterous robot with a 6-DOF torso, a 3-DOF head, 7-DOF arms, and 15-DOF hands was developed. The robot also has Full HD (\(1920\times 1080\) pixels) cameras for capturing wide-angle stereovision, and stereo microphones are situated on the robot’s ears for capturing audio information from the remote site. The operator’s voice is transferred to the remote site and output through a small speaker installed in the robot’s mouth area.

Fig. 6.12 TELESAR V master (left) and slave robot (right)

Fig. 6.13 TELESAR V system configuration

Fig. 6.14 TELESAR V slave robot

Fig. 6.15 Kinematic configuration of TELESAR V

On the master side, the operator’s movements are captured with a motion-capture system (OptiTrack) and sent to the kinematic generator PC. Finger bending is captured with 14 DOF by a modified “5DT Data Glove 14.”
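The master-to-slave data flow can be pictured as a fixed-rate loop that packs the captured body posture and finger bending into one message per control cycle. The sketch below is purely illustrative: the capture functions are hypothetical stand-ins for the motion-capture and data-glove drivers, and the packet layout, address, and control rate are assumptions.

```python
# Sketch of a master-side control loop: capture operator posture, pack it,
# and send it toward the slave-robot controller once per cycle. All capture
# functions and the packet layout are hypothetical placeholders.
import json
import socket
import time

SLAVE_ADDR = ("192.168.0.10", 50000)   # assumed address of the slave controller
CYCLE_HZ = 100                          # assumed control rate

def capture_body_posture() -> dict:
    """Stand-in for the optical motion-capture driver (head, torso, arms)."""
    return {"head": [0.0, 0.0, 0.0], "torso": [0.0] * 6,
            "arm_l": [0.0] * 7, "arm_r": [0.0] * 7}

def capture_finger_bending() -> dict:
    """Stand-in for the 14-DOF data-glove driver (per hand)."""
    return {"hand_l": [0.0] * 14, "hand_r": [0.0] * 14}

def main() -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    period = 1.0 / CYCLE_HZ
    for _ in range(5):                  # a few cycles for the demo; a real loop runs continuously
        packet = {"t": time.time(),
                  "body": capture_body_posture(),
                  "fingers": capture_finger_bending()}
        sock.sendto(json.dumps(packet).encode(), SLAVE_ADDR)
        time.sleep(period)              # crude pacing; a real loop would compensate for drift

if __name__ == "__main__":
    main()
```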

6.5.1 Development of 53-DOF Human-Size Anthropomorphic Robot

As shown in Figs. 6.14 and 6.15, the TELESAR V slave robot consists of four main systems: a body, a head, arms, and hands. The body is a modified “Mitsubishi PA 10-7C Industrial Robot Manipulator” placed upright. The first six joints of the manipulator arm are used as the torso, and the final joint with separately attached DC motors is used as the 3-DOF (roll, pitch, and yaw) head.

Custom-designed 7-DOF human-size anthropomorphic robotic arms are fixed between body joints 6 and 7, giving the slave the form of a human-sized dexterous robot. To increase the dexterity of the slave robot’s arms, they were designed with joints (with limited angles) similar to those of human arms. We have also included a position-based electrical limit overriding the mechanical limit to provide extra safety in case a joint angle overshoots.

The arm joints are driven by 12 V DC motors, and the first three joints (J1, J2, and J3) use harmonic drive gears to keep backlash and vibration low while providing the necessary torque. The hands are custom-designed human-sized anthropomorphic robotic hands with a number of joints similar to that of real human hands. The robotic fingers of each hand are driven by 15 individual DC motors, and dynamically coupled wires and a pulley-driven mechanism connect the remaining joints that are not directly attached to a motor. All of the DC motors are connected to standard DC motor drivers, and a combination of optical encoder outputs and potentiometer readings is used for position measurement. Furthermore, the voltage and current are monitored at each motor, and the torque at the motor shaft is calculated. Communication between the motor drivers and the PC is carried out through a PCI-Express \(\times 1\) bus.
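Since each joint's current is monitored, the shaft torque can be estimated from a simple geared DC-motor model. The short sketch below illustrates that calculation only; the torque constant, gear ratio, and efficiency are placeholder values, not the TELESAR V motor parameters.

```python
# Sketch: estimate output-shaft torque of a geared DC motor from the measured
# current. Parameter values are placeholders, not TELESAR V specifications.

KT = 0.0109        # motor torque constant [N*m/A] (placeholder)
GEAR_RATIO = 100   # gearbox reduction (placeholder)
EFFICIENCY = 0.7   # gearbox efficiency (placeholder)

def shaft_torque(current_a: float) -> float:
    """Torque at the joint output shaft, assuming torque ~ Kt * I through the gearbox."""
    motor_torque = KT * current_a
    return motor_torque * GEAR_RATIO * EFFICIENCY

if __name__ == "__main__":
    for i_a in (0.1, 0.5, 1.0):
        print(f"I = {i_a:.1f} A -> tau = {shaft_torque(i_a):.3f} N*m")
```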

6.5.2 Development of Wide-Angle HD Stereovision System

To capture Full HD video from the robot, each of the robot’s eyes uses a CMOS camera head (model: TOSHIBA IK-HK1H) with a wide-angle lens (model: FUJINON TF4DA-8). The two cameras were installed 65 mm apart and parallel to each other. To provide an HD wide-angle stereovision sensation to the operator, an HD (\(1280 \times 800\) pixels) wide-angle head-mounted display (HMD) was developed. To provide a wide-angle view while maintaining a small footprint, we used a 5.6-in. LCD display (model: HV056WX1-100) and increased the length of the optical path by using a special lens arrangement. The HMD has two parallel virtual projection planes located 1 m from both eyeballs to present stereoscopic vision independently to each eye, thus enabling the operator to sense distance correctly [11]. A knob at the front of the HMD lets the operator adjust the convergence angle of the left and right eyes for clear stereovision.

With the above specifications, we were able to produce a wide-angle field of view for each eye (\(61^\circ \) horizontally and \(40^\circ \) vertically). In addition, two cameras were installed on the front of the HMD. This is useful when the operator needs to change his/her vision to the video see-through mode (see Figs. 6.16 and 6.17).

Fig. 6.16 Telexistence head-mounted display with see-through video cameras

Fig. 6.17 Telexistence head-mounted display assembly view

6.5.3 Development of Thermal and Haptic Transmission System

As shown in Fig. 6.18, the haptic transmission system consists of three parts: a haptic scanner, a haptic display, and a processing block. When the haptic scanner touches an object, it obtains haptic information such as contact force, vibration, and temperature. The haptic display provides haptic stimuli on the user’s finger to reproduce the haptic information obtained by the haptic scanner. The processing block connects the haptic scanner with the haptic display and converts the obtained physical data into data that include the physiological haptic perception reproduced by the haptic display. The details of the mechanisms for scanning and displaying are described below [17].

Fig. 6.18 Haptic system configuration

First, a force sensor inside the haptic scanner measures the vector force when the haptic scanner touches an object. Then two motor-belt mechanisms in the haptic display reproduce the vector force on the operator’s fingertips. The processing block controls the electrical current of each motor to provide torques based on the measured force. As a result, the mechanism reproduces the force sensation when the haptic scanner touches the object.

A microphone in the haptic scanner records the sound generated on its surface when the haptic scanner is in contact with an object. Then a force reactor in the haptic display plays the transmitted sound as a vibration. This vibration provides a high-frequency haptic sensation, so the information should be transmitted without delay. The processing block therefore transfers the sound signals by using amplifiers and an equalizer.

A thermistor sensor in the haptic scanner measures the surface temperature of the object. The measured temperature is reproduced by Peltier actuators placed on the operator’s fingertips. The processing block generates control signals for the Peltier actuators, and the generation of each signal is based on a PID control loop with feedback from a thermistor located on the Peltier actuator.
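The temperature channel can thus be pictured as a standard PID loop: the target is the temperature measured at the haptic scanner, and the feedback comes from the thermistor mounted on the Peltier actuator. The sketch below is generic PID code with placeholder gains and stand-in I/O functions, not the actual controller of the processing block.

```python
# Sketch: PID control of a Peltier actuator toward the temperature measured at
# the remote haptic scanner. Gains and I/O functions are placeholders.

KP, KI, KD = 2.0, 0.5, 0.05      # placeholder PID gains
DT = 0.01                        # control period [s]

def read_scanner_temperature() -> float:
    """Stand-in for the thermistor on the haptic scanner (remote side)."""
    return 30.0

def read_peltier_temperature() -> float:
    """Stand-in for the thermistor mounted on the Peltier actuator (local side)."""
    return 26.0

def set_peltier_current(current: float) -> None:
    """Stand-in for the Peltier driver; positive heats, negative cools (assumed)."""
    print(f"Peltier command: {current:+.2f}")

def pid_step(target: float, measured: float, state: dict) -> float:
    error = target - measured
    state["i"] += error * DT
    d = (error - state["e_prev"]) / DT
    state["e_prev"] = error
    return KP * error + KI * state["i"] + KD * d

if __name__ == "__main__":
    state = {"i": 0.0, "e_prev": 0.0}
    for _ in range(3):               # a few control cycles of the loop
        u = pid_step(read_scanner_temperature(), read_peltier_temperature(), state)
        set_peltier_current(u)
```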

Figures 6.19 and 6.20 show the structures of the haptic scanner and the haptic display. Figure 6.21 shows the left hand of the TELESAR V robot with the haptic scanners and shows the haptic displays set in the modified 5DT Data Glove 14.

Fig. 6.19 Structure of haptic scanner

Fig. 6.20 Structure of haptic display

Fig. 6.21 Slave hand with haptic scanners (left) and master hand with haptic displays (right)

Figure 6.22 shows TELESAR V performing various tasks: picking up sticks, transferring small balls from one cup to another cup, producing Japanese calligraphy, playing Japanese chess (shogi), and feeling the texture of a cloth.

Fig. 6.22 TELESAR V conducting several tasks while transmitting haptic sensations to the user

6.6 RePro3D: Full-Parallax Autostereoscopic 3D Based on Retroreflective Projection Technology (RPT)

Autostereoscopic 3D creates 3D images based on the human binocular perception of 3D depth without requiring the viewer to use special headgear or glasses. Most common stereoscopic displays are based on the binocular stereo concept, but binocular-stereo-based displays without motion parallax cannot render an accurate image and cannot create images that provide different perspectives of the same object or scene from different points of view. Motion parallax plays an important role in the manner in which humans perceive 3D shapes. Multiview autostereoscopic 3D provides the perception of left–right motion parallax, and full-parallax autostereoscopic 3D provides the perception of both left–right motion parallax and up–down motion parallax. Full-parallax 3D, which provides different perspectives according to the viewing direction, is not only useful for motion parallax for a single viewer but is also necessary for simultaneously displaying stereoscopic images to a large number of people.

Most conventional multiview or full-parallax autostereoscopic 3D systems are based either on the parallax barrier method or on the integral photography (IP) method. In the parallax barrier method, a barrier with a number of slits is placed in front of an image source so that a different pixel is seen from different viewing angles. However, the parallax barrier method is not capable of generating vertical parallax and thus cannot provide full-parallax autostereoscopic 3D. Thus, IP is the most appropriate method of realizing full-parallax autostereoscopic 3D. In the IP method a light field is reproduced by placing an array of microlenses in front of an image source, and the number of viewpoints can easily be increased depending on the resolution of the image source. However, because the resolution of the 3D image from a viewpoint depends on the number of lenses per unit area, the lenses must be sufficiently small to have the necessary resolution. Thus only relatively crude implementations have been produced using today’s technology.

Moreover, for the user to view the displayed object as a real object, full-parallax autostereoscopic 3D alone is not sufficient. The object’s image should not be displayed on a screen but superimposed in real space. That is, a full-parallax autostereoscopic 3D image must be produced as an aerial image (i.e., one floating in the air). However, conventional IP produces 3D images in front of a screen and cannot produce aerial images.

Fig. 6.23 Basic principle of RePro3D, consisting of a projector array, a half-mirror, and a retroreflector

Figure 6.23 shows the basic principle of RePro3D, an RPT-based full-parallax autostereoscopic 3D display method [18, 19]. Images from a projector array are projected onto a retroreflector, which reflects light only in the direction of each projection lens. When users look at the screen through a half-mirror, they can see, without the use of glasses, a 3D image that has motion parallax. RePro3D can generate both vertical and horizontal motion parallax. An identical number of viewpoints are created on either side of the axis of symmetry of the half-mirror. The resolution of the image from each viewpoint depends on the projector resolution, and the number of viewpoints is equal to the number of projectors; it is therefore easy to improve the image resolution. It is also easy to produce a full-parallax autostereoscopic 3D image as an aerial image because RePro3D inherently uses a half-mirror.

When a large number of projectors are arranged in a matrix, a 3D image can be viewed from multiple viewpoints. To realize smooth motion parallax, the density of projectors in the projector array must be sufficiently high. With commercially available projectors, however, it is difficult to make a projector array in which the projectors are located very close to each other. One reason for this is that the distance between adjacent viewpoints is limited by the size of each projector. In addition, the system scale would increase and the large number of video outputs would increase the system’s cost. A virtual high-density projector array has therefore been developed by arranging a single LCD display, a lens array, and a Fresnel concave lens. The distance between viewpoints can be diminished by making a virtual lens array of the real lens array by using the Fresnel concave lens. Because a single LCD display is used for the projector array, a single video output can be used as an image source, the cost of which is lower than that of using many image sources for multiple projectors.

Figure 6.24 shows the principle of such an arrangement. The system consists of a number of lenses, an LCD, a half-mirror, and a retroreflector that serves as the screen. The lenses are located at an appropriate distance from the imaging area of the LCD so that the projected areas of the projection lenses overlap. Shield plates are placed between the lenses to prevent light from other viewpoints from entering a lens. The luminance of the projected image depends on the LCD luminance, viewing angle, and retroreflector performance. The resolution of the image from each viewpoint depends on the LCD resolution. The number of viewpoints is equal to the number of projection lenses.

Fig. 6.24 RePro3D configuration

Figure 6.25 shows the prototype. We used 42 projection lenses, each 25 mm in diameter with a focal length of 25 mm. The Fresnel concave lens used in the prototype was 370 mm in diameter, and its focal length was 231 mm. We used a high-luminance LCD with a resolution of \(1680 \times 1050\) pixels and a luminance of 1000 cd/m\(^2\), together with a Reflite 8301 retroreflector. The projection lenses were arranged in a matrix with 6 rows and 7 columns; therefore, the total number of viewpoints was 42. The resolution of the projected image seen from each viewpoint was \(175 \times 175\) pixels. The distance between viewpoints was 16 mm. The device was able to project images up to 400 mm from the user’s viewpoint, within a space of \(200 \times 200 \times 300\) mm.
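Because the projector array is fed by a single LCD, the image source can be pictured as a grid of per-viewpoint tiles, one tile per projection lens. The sketch below slices one rendered frame into the 6 x 7 grid of 175 x 175-pixel sub-images described above; the exact tile layout and the rendering function are assumptions for illustration, not the prototype's actual software.

```python
# Sketch: split one LCD frame into per-lens sub-images for the RePro3D
# projector array (6 x 7 lenses, 175 x 175 pixels per viewpoint, as in the
# prototype). The rendering function and tile layout are hypothetical.
import numpy as np

ROWS, COLS = 6, 7          # projection-lens grid of the prototype
TILE = 175                 # per-viewpoint resolution [pixels]
VIEW_PITCH_MM = 16.0       # distance between adjacent viewpoints

def render_view(row: int, col: int) -> np.ndarray:
    """Stand-in for rendering the scene from the (row, col) viewpoint, whose
    camera is offset by VIEW_PITCH_MM per step in the lens grid."""
    return np.full((TILE, TILE, 3), 20 * (row + col), dtype=np.uint8)

def compose_lcd_frame() -> np.ndarray:
    """Tile the 42 viewpoint images into a single frame for the LCD."""
    frame = np.zeros((ROWS * TILE, COLS * TILE, 3), dtype=np.uint8)
    for r in range(ROWS):
        for c in range(COLS):
            frame[r * TILE:(r + 1) * TILE, c * TILE:(c + 1) * TILE] = render_view(r, c)
    return frame

if __name__ == "__main__":
    frame = compose_lcd_frame()
    print(frame.shape)      # (1050, 1225, 3): fits within the 1680 x 1050 LCD
```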

Fig. 6.25 Appearance of RePro3D prototype

Fig. 6.26 Motion disparity

Figure 6.26 shows a 3D object that was projected onto the retroreflective screen and could be seen from several viewpoints. The positional relationship of each displayed object changed according to the change of viewpoint, indicating that our proposed method can produce a stereoscopic image superimposed in real space with smooth motion parallax. We placed an infrared (IR) camera (Point Grey Research Firefly MV) with an IR pass filter and IR LEDs above the projected area to capture the user’s hand movements, as shown in Fig. 6.27. We then implemented a user input system that recognizes the degree of contact between the user’s hand and the displayed image. Using this function, we built an application that enables the user to touch a character floating in space. If the user touches the character, the character reacts to the user’s touch, and the user can perceive this reaction through changes in the character’s appearance and through sound cues.

Fig. 6.27 Visuo-haptic system overview

Fig. 6.28 Interaction with a virtual character

Figure 6.28 shows a demonstration installation in which the character, an animated fairy floating in real space, reacts when touched by the user’s finger. In addition, the user wears a haptic device on his/her finger. When the user touches the 3D image, he/she feels a tactile sensation generated by the haptic device. The mechanism that produces the sensation on the user’s finger is based on the Gravity Grabber technology [8]. Gravity Grabber produces fingerpad deformation by using a pair of small motors and a belt as described in Sect. 6.4.1. To create a “pushing” sensation, the dual motors are driven in opposite directions so that they roll up the belt, thus delivering vertical force to the user’s fingerpad (see Fig. 6.29). The belt tension is determined by the degree of contact between the finger and the 3D image.

Fig. 6.29 Interaction with haptic feedback via Gravity Grabber

The results of tests using the RePro3D prototype confirmed that our proposed method produces autostereoscopic images superimposed in real space and does so with smooth motion parallax. In this prototype we also realized a user interface that enables users to interact physically with a virtual character floating in space. We thus demonstrated that RePro3D provides a visual and haptic interface that enables users to see and touch a virtual 3D object as if the object were real.

6.7 HaptoMIRAGE

HaptoMIRAGE is an autostereoscopic display for seamless interaction with real and virtual objects. The system can project a 3D image in mid-air with a \(180^\circ \) wide angle of view, based on our proposed ARIA technology described below, and up to three users can see the same object from different points of view. The 3D image can be superimposed on a real object so that the user can interact naturally with the mixed-reality environment. HaptoMIRAGE not only superimposes 3D images on the real environment but also lets the user draw autostereoscopic 3D line drawings on the real object. It thus enables us to interact with the mixed-reality environment in a natural manner and to easily create and feel the mixed-reality world. Our goal is to implement a mixed-reality platform with natural 3D interaction for creative design, storytelling, entertainment, and remote collaboration by seamlessly mixing the “real” and “virtual” worlds.

6.7.1 Virtual Shutter Glasses Using ARIA (Active-shuttered Real Image Autostereoscopy)

As discussed in Sect. 6.6, integral photography and RPT-based autostereoscopy are two major methods to realize full-parallax autostereoscopy. In addition to the 3D image being displayed in full-parallax autostereoscopic 3D, it must be produced as an image floating in the air. RePro3D, which is an RPT-based full-parallax autostereoscopic 3D display, is capable of not only generating vertical and horizontal motion parallax but also producing a full-parallax autostereoscopic 3D image as an aerial image (i.e., floating in the air). However, RPT-based autostereoscopy has a drawback: its small area of observation. It is difficult to get a large area of observation without constructing a huge array of projectors.

On the other hand, the frame-sequential method using shutter glasses (a time-division 3D method) has an advantage for multi-viewpoint stereoscopy: because it measures the human user’s position, it can present a 3D aerial image that can be observed over a wide area. Its drawbacks are that the user must wear shutter glasses and that it is not autostereoscopic at all.

We have proposed a new method of frame-sequential stereoscopy that does not require the wearing of any special eyewear such as shutter glasses; i.e., an autostereoscopic frame-sequential method. It is based on active-shuttered real-image autostereoscopy (ARIA) [20]. Figure 6.30 shows how ARIA is used to realize virtual shutter glasses.

Fig. 6.30 Realization of virtual shutter glasses using ARIA optics

As is shown in Fig. 6.30, a liquid crystal display (LCD) and an active-shutter are placed at distances of \(\mathrm{S}_1\) and \(\mathrm{S}_3\), respectively, behind a Fresnel lens of focal length f. The real images of the LCD and the shutter are made by the lens at distances of \(\mathrm{S}_2\) and \(\mathrm{S}_4\), respectively.

$$ \frac{1}{S_1}+\frac{1}{S_2}=\frac{1}{f},~~ \frac{1}{S_3}+\frac{1}{S_4}=\frac{1}{f} $$

A human user observes the real image of the LCD through the real image of the shutter. On the LCD, the image for the left eye and the image for the right eye are displayed alternately, frame by frame. The user’s right eye position \((x_{R}, y_{R})\) and left eye position \((x_{L}, y_{L})\) are measured by a motion-sensing device with eye recognition ability, such as Kinect for Windows. When the image for the left eye is presented as the real image, the corresponding part of the active-shutter in front of the left eye (i.e., between \(y_{1L}\) and \(y_{2L}\)) is open. When the image for the right eye is presented, the corresponding part of the active-shutter in front of the right eye (i.e., between \(y_{2R}\) and \(y_{1R}\)) is open. It is also possible to change the design so that the shutter in front of the right eye closes when the image for the left eye is displayed, and vice versa. Thus the active-shutter acts as virtual shutter glasses.

When the effective display size of the LCD is 2K, by putting \(K'=\frac{S_2}{S_1} K\), we get the following equations.

$$\begin{aligned} y_{1R} = \frac{y_{R}(S_{4}-S_{2})-K'(x_{R}-S_{4})}{x_{R}-S_{2}} \nonumber \\ y_{2R} = \frac{y_{R}(S_{4}-S_{2})+K'(x_{R}-S_{4})}{x_{R}-S_{2}} \nonumber \\ y_{1L} = \frac{y_{L}(S_{4}-S_{2})-K'(x_{L}-S_{4})}{x_{L}-S_{2}} \nonumber \\ y_{2L} = \frac{y_{L}(S_{4}-S_{2})+K'(x_{L}-S_{4})}{x_{L}-S_{2}} \nonumber \end{aligned}$$

The corresponding real coordinates on the active-shutter panel (a transparent LCD) can be obtained as follows:

$$ y'_{1R}=-\frac{S_3}{S_4}y_{1R}, ~~y'_{2R}=-\frac{S_3}{S_4}y_{2R}, ~~y'_{1L}=-\frac{S_3}{S_4}y_{1L}, ~~y'_{2L}=-\frac{S_3}{S_4}y_{2L}. $$

For an interpupillary distance d, the 3D observable distance x is in the following range:

$$ {S_4} \le x \le S_{4} + \frac{d}{2K} \cdot \frac{S_{1}}{S_{2}}(S_{4}-S_{2}). $$

6.7.2 Prototype System

Figure 6.31 shows the prototype HaptoMIRAGE \(180^\circ \) 3D display system. In this system, the parameters are set as \(S_{1}=0.4\) m (\(S_{2}=0.4\) m), \(S_{3}=0.23\) m (\(S_{4}=1.53\) m), \(2K=0.2\) m, and \(f=0.2\) m. The system consists of three components, each having a \(60^\circ \) field of autostereoscopic view based on our technology called ARIA. The Fresnel lens makes the real image from the LCD display, the position of the user is measured by a camera-based motion capture system, and the active shutter using a transparent LCD panel provides the virtual shutter glasses. The shutter is switched at 60 Hz, the user can see the real image as a floating 3D image, and up to three users can see the autostereoscopic image from different viewpoints at the same time.
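The formulas in Sect. 6.7.1 can be checked numerically against these prototype parameters. The short sketch below computes \(S_2\) and \(S_4\) from the thin-lens equation, the shutter aperture for one eye, and the 3D observable range; the eye position and the interpupillary distance used in the example are arbitrary assumed values, not measurements.

```python
# Sketch: evaluate the ARIA formulas of Sect. 6.7.1 with the prototype values
# (S1 = 0.4 m, S3 = 0.23 m, f = 0.2 m, 2K = 0.2 m). The eye position and
# interpupillary distance below are assumed example values.

S1, S3, F = 0.40, 0.23, 0.20
K = 0.10                               # half of the effective LCD size (2K = 0.2 m)
D_EYE = 0.065                          # assumed interpupillary distance [m]

def image_distance(s: float, f: float) -> float:
    """Thin-lens equation: 1/s + 1/s' = 1/f."""
    return 1.0 / (1.0 / f - 1.0 / s)

S2 = image_distance(S1, F)             # real image of the LCD     (~0.40 m)
S4 = image_distance(S3, F)             # real image of the shutter (~1.53 m)
K_PRIME = (S2 / S1) * K

def shutter_aperture(x_eye: float, y_eye: float) -> tuple[float, float]:
    """Open interval [y1, y2] on the shutter's real image for one eye,
    mapped back to the physical shutter panel via y' = -(S3 / S4) * y."""
    y1 = (y_eye * (S4 - S2) - K_PRIME * (x_eye - S4)) / (x_eye - S2)
    y2 = (y_eye * (S4 - S2) + K_PRIME * (x_eye - S4)) / (x_eye - S2)
    return (-(S3 / S4) * y1, -(S3 / S4) * y2)

if __name__ == "__main__":
    print(f"S2 = {S2:.3f} m, S4 = {S4:.3f} m")
    print("left-eye aperture on shutter panel:", shutter_aperture(2.0, 0.0))
    x_max = S4 + (D_EYE / (2 * K)) * (S1 / S2) * (S4 - S2)
    print(f"observable range: {S4:.2f} m <= x <= {x_max:.2f} m")
```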

Fig. 6.31 Overall view of HaptoMIRAGE system

Without wearing any special glasses, users can see the 3D image floating in the real environment, as shown in Fig. 6.32. When a user points a finger at a certain area of the 3D image, the other users can easily see where it is. The users can use their fingers to make colorful line drawings on the real object. This is similar to light drawing, but the difference is that the users can see the 3D image of their finger’s trajectory before their eyes and can interact with the drawings.

Fig. 6.32 Direct interaction with floating 3D object (left), drawing 3D object (center), and matching 3D virtual object with real objects (right)

6.8 Summary

In this project, we are trying to construct an intelligent information environment, based on our proposed theory of haptic primary colors, that is both visible and tangible and in which real-space communication, human-machine interfaces, and media processing are integrated. The goal is to create a human-harmonized “tangible information environment” that allows human beings to obtain and understand haptic information about real space, to transmit the haptic space thus obtained, and to actively interact with other people using the transmitted haptic space. The tangible environment would enable telecommunication, tele-experience, and pseudo-experience with the sensation of working as though in a natural environment. It would also enable humans to engage in creative activities such as design and creation as though they were in the real environment.

We have succeeded in transmitting fine haptic sensations, such as material texture and temperature, from an avatar robot’s fingers to a human user’s fingers. The avatar robot is a telexistence anthropomorphic robot dubbed TELESAR V, with a body and limbs having 53 degrees of freedom. This robot can transmit not only visual and auditory sensations of presence to human users but also realistic haptic sensations. Other results of this research project include RePro3D, a full-parallax autostereoscopic 3D (three-dimensional) display with haptic feedback using RPT (retroreflective projection technology); TECHTILE Toolkit, a prototyping tool for the design and improvement of haptic media; and HaptoMIRAGE, a \(180^\circ \)-field-of-view autostereoscopic 3D display using ARIA (active-shuttered real-image autostereoscopy) that up to three users can enjoy simultaneously.