2.1 Introduction

Why are people attracted to humanoid robots and androids? The answer is simple: because human beings are attuned to understanding and interpreting human expressions and behaviors, especially those of the people around them. Infants, who are supposedly born with the ability to discriminate various types of stimuli, gradually adapt and fine-tune their interpretations of detailed social cues from others' voices, languages, facial expressions, and behaviors [2]. Perhaps because of this combination of nature and nurture, people have a strong tendency to anthropomorphize nearly everything they encounter, including computers and robots. In other words, when we see PCs or robots, some automatic process inside us attempts to interpret them as human. The media equation theory [3] was the first to explicitly articulate this tendency. Since then, researchers have been pursuing the key elements that make people feel more comfortable with computers or that create easier, more intuitive interfaces to various information devices. This pursuit has also spread to the field of robotics. Recently, the focus of robotics has shifted from traditional studies on navigation and manipulation to human–robot interaction. A number of studies have investigated how people respond to robot behavior and how robots should behave so that people can easily understand them [4,5,6]. Many insights from developmental and cognitive psychology have been implemented and examined to see how they affect human responses and whether they help robots produce smooth, natural communication with humans.

However, human–robot interaction studies have neglected one issue: the “appearance versus behavior” problem. Empirically, we know that appearance, one of the most significant elements in communication, is a crucial factor in the evaluation of interaction (see Fig. 2.1). The interactive robots developed so far have distinctly mechanical appearances that mark them as “robots.” Researchers have tried to make such interactive robots “humanoid” by equipping them with heads, eyes, or hands so that their appearance more closely resembles that of humans and so that they can make analogous human movements and gestures, such as gazing or pointing. Functionality was considered the primary concern in improving communication with humans. In this manner, many studies have compared robots with different behaviors, but scant attention has been paid to the robots’ appearance. Although there have been many empirical discussions of very simple static robots such as dolls, the design of a robot’s appearance, particularly to increase its human likeness, has always been the role of industrial designers; it has seldom been a subject of study. This is a serious problem for developing and evaluating interactive robots. Recent neuroimaging studies show that certain brain activations do not occur when observed actions are performed by non-human agents [7, 8]. Appearance and behavior are tightly coupled, and there are strong concerns that evaluation results might be affected by appearance.

Fig. 2.1
Three categories of humanlike robot: the humanoid robot Robovie II (left, developed by ATR Intelligent Robotics and Communication Laboratories), the android Repliee Q2 (middle, developed by Osaka University and Kokoro Corporation), and the geminoid HI-1 (right, developed by ATR Intelligent Robotics and Communication Laboratories)

In this chapter, we introduce android science, an interdisciplinary research framework that combines two approaches: one in robotics, for constructing very humanlike robots and androids, and one in cognitive science, which uses androids to explore human nature. Here, androids serve as a platform for directly exchanging insights between the two domains. To pursue this new framework, several androids have been developed; we describe the development of these android systems and several findings obtained with them. In the course of this work, however, we encountered serious issues that sparked the development of a new category of robot called the geminoid. We describe the concept and the development of the first geminoid prototype, and we discuss preliminary findings to date and future directions of study with geminoids.

2.2 Android Science

Current robotics research draws on various findings from cognitive science, especially in the area of human–robot interaction, in an attempt to adopt insights from human–human interaction to build robots that people can communicate with easily. At the same time, cognitive science researchers have begun to utilize robots: as research extends to more complex, higher-level human functions, such as the neural basis of social skills [9], robots are expected to serve as easily controlled devices with communicative ability. However, the contribution from robotics to cognitive science has not been adequate, because the appearance and behavior of current robots cannot be handled separately. Since traditional robots look quite mechanical and very different from human beings, the effect of their appearance may be too strong to ignore. As a result, researchers cannot determine whether a given finding reflects the robot’s appearance, its movement, or a combination of the two.

We expect to solve this problem by using an android whose appearance and behavior closely resemble those of a human. Appearance is likewise an issue within robotics research itself, since it is difficult to establish whether observed effects stem solely from a robot’s behavior. An objective, quantitative means of measuring the effect of appearance is required.

Androids are robots whose behavior and appearance are highly anthropomorphized. Developing androids requires contributions from both robotics and cognitive science. To realize a more humanlike android, knowledge from human sciences is also necessary. At the same time, cognitive science researchers can exploit androids to verify hypotheses regarding human nature. This new, bidirectional, interdisciplinary research framework is called android science [10]. Under this framework, androids enable us to directly share knowledge between the development of androids in engineering and the understanding of humans in cognitive science (Fig. 2.2).

Fig. 2.2
Framework of Android Science

The major robotics issue in constructing androids is the development of humanlike appearance, movements, and perception functions. A further issue in cognitive science is “conscious and unconscious recognition.” The goal of android science is to realize a humanlike robot and to identify the essential factors for representing human likeness. How can we define human likeness? Further, how do we perceive human likeness? It is well known that human recognition has both conscious and unconscious components. When we observe objects, various modules are activated in our brains, each matching the input sensory data against human models and thereby shaping our reactions. A typical example occurs when we consciously recognize a robot as an android yet still react to it as if it were human. This issue is fundamental to both the engineering and the scientific approaches: it provides an evaluation criterion for android development, and it offers cues for understanding the human brain’s mechanisms of recognition.

To date, several androids have been developed. Repliee Q2, the latest android [10], is shown in the middle panel of Fig. 2.1. Forty-two pneumatic actuators are embedded in the android’s upper torso, allowing it to move smoothly and quietly. Tactile sensors, which are also embedded under its skin, are connected to sensors in its environment, such as omnidirectional cameras, microphone arrays, and floor sensors. Using these sensory inputs, the autonomous program installed in the android can make smooth, natural interactions with nearby people.

Even though current androids have enabled us to conduct a variety of cognitive experiments, they are still quite limited. The bottleneck in interaction with humans is the inability to sustain a long-term conversation. People who meet a humanoid robot usually expect humanlike conversation, but current artificial intelligence (AI) technology lags far behind this expectation. Progress in AI takes time; AI capable of humanlike conversation is, in effect, the final goal of robotics. To move toward this goal, we need to use currently available technologies and gain a deeper understanding of what it is to be human. Our solution to this problem is to integrate android and teleoperation technologies.

2.3 Developing Androids

Up to now, several androids have been developed. Figure 2.3 shows Repliee R1, the first android prototype, and Repliee Q2, the latest android [10]. As stated above, engineering issues in creating androids involve the development of humanlike appearance, movements, and perception. Here, we describe our approach to resolving each of these issues.

Fig. 2.3
First android, Repliee R1 (left; developed by Osaka University), and the latest android, Repliee Q2 (right; developed by Osaka University and Kokoro Corporation)

2.3.1 Humanlike Appearance

The main difference between conventional robots and androids is in their appearance. To create a very humanlike robot, we began by copying the surface of the human skin.

First, molds of the body parts were taken from a real human using the shape-memory foam that dentists use for impressions. Plaster models of the body parts were then cast from these molds and connected to form a full-body model. From this plaster model, another mold was made, and a clay model was cast from it. Because human skin is soft, the model loses fine surface detail in the first molding process, so professionals in formative art modified the clay model to restore the skin’s texture. After this modification, a plaster full-body mold was made from the modified clay model, and a silicone full-body model was cast from that plaster mold. This silicone model is maintained as the master model.

Using this master model, silicone skin is made for the entire body. The thickness of the silicone skin in our current version is 5 mm. The mechanical parts, motors, and sensors are covered with polyurethane and then with the silicone skin. As shown in Fig. 2.3, the details are so finely reproduced that they cannot be distinguished from those of a human being in photographs.

Our current technology for replicating the human figure as an android has reached a fine degree of reality. It is, however, still not perfect. One issue is reproducing the wetness of the eyes. The eyes are the body part to which human observers are most sensitive: when confronted with a human face, a person looks first at the eyes. Although the android has eye-related mechanisms, such as blinking and saccade movements, and its eyeballs are near-perfect copies of a human’s, we can still notice differences from real human eyes. Creating a wet eye surface and replicating the outer corners of the eyes in silicone are difficult tasks, so further improvements are needed for this part.

Other issues are the flexibility and robustness of the skin material. The silicone used in the current manufacturing process is sufficient for representing the texture of the skin; however, it loses flexibility after one or two years, and its elasticity is insufficient for adapting to large joint movements.

2.3.2 Humanlike Movements

Very humanlike movement is another important factor in developing androids. Even if androids look indistinguishable from humans as static figures, without appropriate movements, they can be easily identified as artificial.

In pursuing highly humanlike movement, we found that a child-sized android was too small to accommodate the required number of actuators, and this led us to develop an adult android. The right half of Fig. 2.3 shows our latest adult android. This android, named Repliee Q2, contains 42 pneumatic (air) actuators in its upper torso. The positions of the actuators were determined by analyzing real human movements with a precise 3D motion tracker. With these actuators, both unconscious movements (such as the chest movements of breathing) and conscious large movements (such as head or arm movements) can be generated. The android can also generate the facial expressions that are important for interacting with humans; Fig. 2.4 shows some of these expressions. To generate smooth, humanlike expressions, 13 of the 42 actuators are embedded in the head.

Fig. 2.4
Facial expressions generated by android Repliee Q2

We decided to use pneumatic actuators for the androids instead of the DC motors used in most robots. Pneumatic actuators provide several benefits. First, they are very quiet, producing sound much closer to that of a human; DC servomotors require reduction gears, which generate a distinctly robotlike noise. Second, because the air cylinders act as dampers, the android reacts very naturally to external forces; obtaining the same effect with DC servomotors and reduction gears would require sophisticated compliance control. This compliance is also important for ensuring safety in interactions with the android.

The disadvantage of pneumatic actuators is that they require a large, powerful air compressor, which means that the current android cannot walk. For wider applicability, we need to develop new electric actuators with specifications similar to those of the pneumatic ones.

The next issue is how to control the 42 air servo actuators to achieve very humanlike movements. The simplest approach is to directly send angular information to each joint. However, because the android has a relatively large number of actuators, this takes a long time. Another difficulty is that skin movement does not correspond simply to joint movement. For example, the android has more than five actuators around the shoulder for generating humanlike shoulder movements, with the skin moving and stretching according to the actuator motions. We have already developed methods such as using Perlin noise [11] to generate smooth movements and employing a neural network to learn the mapping between the skin surface and actuator movements. Some issues remain, such as the limited speed of android movement due to the damping nature of the pneumatic actuators. To achieve quicker, more humanlike behavior, speed and torque controls must be designed in future studies.
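To illustrate the Perlin-noise approach concretely, the following sketch drives a smooth idle offset for a single actuator using one-dimensional gradient noise. It is a minimal illustration, not the actual controller: the gradient table, the idle_command helper, and the 20 Hz sampling rate are assumptions introduced for the example.

```python
import math
import random

# Hypothetical gradient table; the real controller's parameters are not public.
random.seed(7)
_GRADS = [random.uniform(-1.0, 1.0) for _ in range(256)]

def _fade(t: float) -> float:
    # Perlin's smoothing curve 6t^5 - 15t^4 + 10t^3: flat at t = 0 and t = 1,
    # which removes visible jerks at the lattice points.
    return t * t * t * (t * (t * 6 - 15) + 10)

def perlin1d(x: float) -> float:
    """One-dimensional gradient (Perlin) noise: smooth and repeatable."""
    i0 = math.floor(x)
    t = x - i0
    g0 = _GRADS[i0 % 256]
    g1 = _GRADS[(i0 + 1) % 256]
    # Blend the contributions of the two neighboring gradients.
    f = _fade(t)
    return (1 - f) * g0 * t + f * g1 * (t - 1)

def idle_command(center: float, amplitude: float, t: float,
                 speed: float = 0.3) -> float:
    """Small, smooth offset around a neutral posture for one actuator."""
    return center + amplitude * perlin1d(speed * t)

# Sample a slow idle sway for one shoulder actuator at 20 Hz for 5 seconds.
trajectory = [idle_command(0.5, 0.05, k / 20.0) for k in range(100)]
```

Because the noise is band-limited and continuous, the resulting offsets wander slowly around the neutral posture without the abrupt direction changes that make motion look mechanical.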

After obtaining an efficient method for controlling the android, the next step is the implementation of humanlike motions. A straightforward approach to this challenge is to imitate real human motions in synchronization with a human master. By attaching 3D motion-tracker markers to both the android and the master, the android can automatically follow the human’s motions (Fig. 2.5).

Fig. 2.5
Replicating human motions with the android

This work is still in progress, but interesting issues have arisen with respect to this kind of imitation learning. Imitation by the android means representing complicated human shapes and motions in the parameter space of the actuators. Although the android has a relatively large number of actuators compared to other robots, this is still far fewer than the degrees of freedom of the human body, so the data-size reduction involved is significant. By carefully examining this parameter space and mapping, we may find important properties of human body movements. More concretely, we expect to develop a hierarchical representation of human body movements consisting of two or more layers, such as small unconscious movements and large conscious movements. With such a hierarchical representation, we can expect to achieve more flexible android behavior control.
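As an illustration of the kind of data-size reduction involved, the sketch below applies principal component analysis (PCA), one standard dimensionality-reduction technique, to placeholder motion-capture frames. The data, dimensions, and function names are all assumptions; PCA stands in here only as an example of mapping human motion into a smaller parameter space, not as the method used in our system.

```python
import numpy as np

# Placeholder motion-capture data: 500 frames of 60 marker coordinates.
rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 60))

def pca_compress(X: np.ndarray, n_components: int):
    """Project high-dimensional pose frames onto their top principal axes."""
    mean = X.mean(axis=0)
    # SVD of the centered data yields the principal directions in vt.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:n_components]            # (n_components, 60)
    codes = (X - mean) @ basis.T         # low-dimensional motion codes
    return mean, basis, codes

# Compress the 60-D poses to 8 components: the leading components capture
# large (conscious) movements, while the discarded remainder corresponds
# to small residual variations.
mean, basis, codes = pca_compress(frames, n_components=8)
reconstructed = codes @ basis + mean
print("mean reconstruction error:", np.abs(frames - reconstructed).mean())
```

A layered decomposition of this sort hints at how the hierarchical representation described above might separate large conscious movements from small unconscious ones.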

2.3.3 Humanlike Perception

Androids require humanlike perceptual abilities in addition to humanlike appearance and movements. This problem has been tackled in the fields of computer vision and pattern recognition under rather controlled environments. However, the problem becomes extremely difficult when applied to robots in real-world situations, where vision and audition become unstable and noisy.

Ubiquitous/distributed sensor systems can be used to solve this problem. The idea is to recognize the environment and human activities using many distributed cameras, microphones, infrared motion sensors, floor sensors, and ID tag readers in the environment (Fig. 2.6).

Fig. 2.6
Distributed sensor system

We have developed distributed vision systems [12] and distributed audition systems [13] in our previous work. For the present problem, these developments must be integrated and extended. Omnidirectional cameras observe humans from multiple viewpoints and robustly recognize their behavior [14]. The microphones capture the human voice by forming virtual sound beams. The floor sensors, which cover the entire space, reliably detect human footprints.
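The phrase “virtual sound beams” refers to beamforming with the microphone array. As an illustration, the following sketch implements the classic delay-and-sum beamformer; the array geometry, sampling rate, and function names are assumptions made for the example, not the actual system.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
FS = 16_000              # sampling rate in Hz; an illustrative value

def delay_and_sum(signals: np.ndarray, mic_pos: np.ndarray,
                  focus: np.ndarray) -> np.ndarray:
    """Steer a virtual beam at `focus` by aligning and averaging channels.

    signals: (n_mics, n_samples) synchronized recordings
    mic_pos: (n_mics, 3) microphone coordinates in meters
    focus:   (3,) point in the room to listen to
    """
    dists = np.linalg.norm(mic_pos - focus, axis=1)
    # Advance each channel so sound emitted at `focus` lines up in time.
    lags = np.round((dists - dists.min()) / SPEED_OF_SOUND * FS).astype(int)
    n = signals.shape[1] - lags.max()
    aligned = np.stack([s[l:l + n] for s, l in zip(signals, lags)])
    # Coherent speech adds constructively; diffuse noise averages out.
    return aligned.mean(axis=0)

# Example: two microphones 30 cm apart, beam aimed at a point in the room.
mics = np.array([[0.0, 0.0, 0.0], [0.3, 0.0, 0.0]])
sig = np.random.default_rng(1).normal(size=(2, 1600))
out = delay_and_sum(sig, mics, focus=np.array([1.0, 2.0, 0.0]))
```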

The only sensors that must be mounted on the robot itself are skin sensors. Soft, sensitive skin sensors are particularly important for interactive robots, yet there has been little work in this area in previous robotics research. We are now addressing this issue by developing original sensors made by combining silicone skin with piezoelectric films (Fig. 2.7). The sensor detects pressure through the bending of the piezo films. Furthermore, with increased sensitivity, it can detect the static electricity of a human very nearby; that is, it can perceive that a human being is in the vicinity.

Fig. 2.7
Skin sensor
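In software, the output of such a sensor might be interpreted along the following lines. This is a hedged sketch: the voltage thresholds and names are invented, and a real classifier would operate on filtered, calibrated signals rather than raw samples.

```python
from enum import Enum

class SkinEvent(Enum):
    NONE = 0
    PROXIMITY = 1   # weak charge induced by a human very nearby
    TOUCH = 2       # strong voltage spike from bending of the film

# Invented thresholds in volts; real values depend on the film, the
# amplifier, and the silicone thickness.
PROXIMITY_V = 0.05
TOUCH_V = 0.8

def classify_sample(volts: float) -> SkinEvent:
    """Map one amplified piezo-film reading to a coarse skin event."""
    v = abs(volts)  # the film's output polarity depends on bend direction
    if v >= TOUCH_V:
        return SkinEvent.TOUCH
    if v >= PROXIMITY_V:
        return SkinEvent.PROXIMITY
    return SkinEvent.NONE

print(classify_sample(0.02), classify_sample(0.12), classify_sample(1.3))
```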

These technologies for very humanlike appearance, movement, and perception have enabled us to develop feasible androids and to subject them to various cognitive tests. As noted above, however, the bottleneck remains long-term conversation in interactions with real humans, which current AI technology cannot yet support. Our solution is to integrate android and teleoperation technologies.

2.4 Geminoid

We developed the geminoid, a new category of robot, to overcome this bottleneck. We coined “geminoid” from the Latin “geminus,” meaning “twin” or “double,” with the suffix “-oid,” indicating similarity. As the name suggests, a geminoid is a robot that works as the duplicate of an existing person: it looks and behaves like that person and is connected to that person through a computer network. Geminoids extend the applicable field of android science. Whereas androids are designed for studying human nature in general, geminoids let us study personal aspects such as presence and personality traits, tracing their origins and how they can be implemented in robots. Figure 2.8 shows the robotic part of HI-1, the first geminoid prototype. Geminoids have the following capabilities:

Fig. 2.8
Geminoid HI-1

Appearance and behavior highly similar to an existing person

The appearance of a geminoid is based on an existing person and does not depend on the imagination of designers. Its movements can be created and evaluated simply by referring to the original person, and the existence of a real counterpart enables straightforward comparison studies. Moreover, if a researcher serves as the original, we can expect that individual to offer meaningful insights into the experiments, which is especially valuable in the earliest stage of a new field of study, before established research methodologies are available.

Teleoperation (remote control)

Because geminoids are equipped with teleoperation functionality, they are not driven only by an autonomous program. By introducing manual control, the limitations of current AI technologies can be avoided, enabling long-term, intelligent conversational human–robot interaction experiments. This feature also enables various studies on human characteristics by separating “body” and “mind”: in a geminoid, the operator (mind) can easily be exchanged while the robot (body) remains the same, and the strength of the connection, that is, what kinds of information are transmitted between body and mind, can easily be reconfigured. This is especially important for a top-down approach that adds or deletes elements of a person to discover the “critical” elements that constitute human characteristics. Before geminoids, such studies were impossible. A conceptual sketch of such a reconfigurable connection follows.
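The reconfigurable body–mind connection can be pictured as a set of switchable information channels. The following sketch is purely conceptual, with hypothetical channel names; it merely shows how one might specify which information flows between operator and robot in a given experimental condition.

```python
from dataclasses import dataclass

@dataclass
class ConnectionConfig:
    """Channels flowing between the operator ("mind") and robot ("body").

    All field names are hypothetical; they only illustrate the idea of a
    reconfigurable body-mind connection.
    """
    send_voice: bool = True        # operator speech -> robot speaker
    send_lip_motion: bool = True   # tracked lip corners -> face actuators
    send_gaze: bool = False        # operator gaze -> robot eye direction
    return_video: bool = True      # robot cameras -> operator monitors
    return_audio: bool = True      # robot microphones -> operator headphones
    return_touch: bool = False     # skin sensors -> operator display

# Example: a "voice only" condition that strips away bodily channels,
# the kind of deletion the top-down approach calls for.
voice_only = ConnectionConfig(send_lip_motion=False, send_gaze=False)
```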

2.4.1 System Overview

The current geminoid prototype, HI-1, consists of three main elements: a robot, a central controlling server (geminoid server), and a teleoperation interface (Fig. 2.9).

Fig. 2.9
Overview of geminoid system

A robot that resembles a living person

The robotic element has essentially the same structure as that of previous androids [10]. However, considerable effort was made to build a robot that not only resembles a living person but appears to be a copy of the original person. The silicone skin was molded in a cast taken from the original person; shape adjustments and skin textures were painted manually based on MRI scans and photographs. Fifty pneumatic actuators drive the robot to generate smooth, quiet movements, important attributes when interacting with humans. The allocation of actuators was determined so that the robot can effectively produce the movements necessary for human interaction while also expressing the original person’s personality traits. Of the 50 actuators, 13 are embedded in the face, 15 in the torso, and the remaining 22 move the arms and legs. The softness of the silicone skin and the compliant nature of the pneumatic actuators also provide safety in interactions with humans. Because this prototype was intended for interaction experiments, it lacks the capability to walk; it always remains seated. Figure 2.8 shows the resulting robot (right) alongside the original person, Dr. Ishiguro (the author).

Teleoperation interface

Figure 2.10 shows the teleoperation interface prototype. Two monitors show the controlled robot and its surroundings, and microphones and headphones are used to capture and transmit utterances. The captured sounds are encoded and transmitted between the interface and the robot, in both directions, over IP links via the geminoid server. The positions of the operator’s lip corners are measured by an infrared motion-capture system in real time, converted to motion commands, and sent to the geminoid server over the network. This enables the operator to implicitly generate suitable lip movements on the robot while speaking. However, compared to the large number of human facial muscles involved in speech, the robot’s face has only a limited number of actuators, and their response is much slower than a human’s, partly due to the nature of the pneumatic actuators. Simple transmission and playback of the operator’s lip movements would therefore not yield sufficiently natural robot motion. To overcome this, the measured lip movements are currently transformed into control commands using heuristics obtained by observing the original person’s actual lip movements.

Fig. 2.10
Teleoperation interface
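One plausible form for such a heuristic is to scale, low-pass filter, and rate-limit the tracked lip-corner signal before it reaches the valves. The sketch below illustrates this under invented constants; it is not the actual transformation used in the system.

```python
def lip_command(prev_cmd: float, measured_mm: float, gain: float = 0.04,
                alpha: float = 0.3, max_step: float = 0.05) -> float:
    """Turn a tracked lip-corner offset (mm) into a valve command in [0, 1].

    Raw playback would overdrive the slow pneumatic valves, so the signal
    is scaled, low-pass filtered, and rate-limited. Constants are invented.
    """
    target = min(max(measured_mm * gain, 0.0), 1.0)    # mm -> valve range
    smoothed = prev_cmd + alpha * (target - prev_cmd)  # low-pass filter
    step = max(-max_step, min(max_step, smoothed - prev_cmd))
    return prev_cmd + step                             # rate limit

# Example: follow an 8 mm mouth opening from a resting command of 0.1.
cmd = 0.1
for _ in range(5):
    cmd = lip_command(cmd, 8.0)
    print(round(cmd, 3))
```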

The operator can also explicitly send commands to control the robot’s behavior using a simple graphical user interface. Several selected movements, such as nodding, opposing, or staring in a certain direction, can be specified with a single mouse click. This relatively simple interface was prepared because the robot has 50 degrees of freedom, making it one of the world’s most complex robots and essentially impossible to manipulate manually in real time. A simple, intuitive interface is necessary so that the operator can concentrate on the interaction rather than on manipulating the robot. Despite its simplicity, in cooperation with the geminoid server, this interface enables the operator to generate natural, humanlike motions in the robot.
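Conceptually, each click on such an interface expands into a short, pre-authored sequence of primitive commands. The sketch below illustrates this idea with hypothetical behavior names, actuator names, and a stand-in transport function; it is not the actual interface code.

```python
# Hypothetical one-click behaviors, each a list of primitive keyframes:
# (actuator name, target position in [0, 1], duration in seconds).
BEHAVIORS = {
    "nod":        [("neck_pitch", 0.7, 0.4), ("neck_pitch", 0.5, 0.4)],
    "oppose":     [("neck_yaw", 0.3, 0.3), ("neck_yaw", 0.7, 0.3),
                   ("neck_yaw", 0.5, 0.3)],
    "stare_left": [("neck_yaw", 0.2, 0.6), ("eyes_yaw", 0.2, 0.2)],
}

def send_keyframe(actuator: str, position: float, duration: float) -> None:
    # Stand-in for the network call that forwards a primitive command
    # to the geminoid server.
    print(f"keyframe: {actuator} -> {position} over {duration}s")

def on_click(command: str) -> None:
    """Expand one GUI click into a canned sequence of primitive keyframes."""
    for actuator, position, duration in BEHAVIORS[command]:
        send_keyframe(actuator, position, duration)

on_click("nod")
```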

Geminoid server

The geminoid server receives robot control commands and sound data from the remote teleoperation interface, adjusts and merges these inputs, and exchanges primitive control commands with the robot hardware. Figure 2.11 shows the data flow in the geminoid system. The geminoid server also maintains the state of the human–robot interaction and generates autonomous, unconscious movements for the robot. As described above, as a robot’s features become more humanlike, its behavior must become correspondingly sophisticated to retain a “natural” look [15]. One thing seen in every human being, and lacking in most robots, is the slight body movement produced by the autonomic system, such as breathing and blinking. To increase the robot’s naturalness, the geminoid server emulates the human autonomic system and automatically generates these micro-movements according to the interaction state at each moment: when the robot is “speaking,” it shows different micro-movements than when “listening.” Such automatic motions, generated without the operator’s explicit orders, are merged and adjusted with the conscious operation commands from the teleoperation interface (Fig. 2.11). At the same time, the geminoid server applies a specific delay to the transmitted sounds, taking into account the transmission delay and jitter as well as the start-up delay of the pneumatic actuators. This adjustment synchronizes lip movement with speech, enhancing the naturalness of the geminoid’s movement.

Fig. 2.11
Data flow in the geminoid system
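The server-side merging and synchronization described above can be sketched as follows. The interaction states, the breathing rate, and the delay constants are assumptions chosen for illustration; the real geminoid server is considerably more elaborate.

```python
import math

# Invented timing constants for illustration only.
ACTUATOR_STARTUP_S = 0.15   # assumed start-up lag of the pneumatic valves
NETWORK_JITTER_S = 0.05     # assumed allowance for transmission jitter

def idle_motion(state: str, t: float) -> dict:
    """State-dependent micro-movements: breathing differs while speaking."""
    amplitude = 0.03 if state == "speaking" else 0.015
    return {"chest": 0.5 + amplitude * math.sin(2 * math.pi * 0.25 * t)}

def merge(idle: dict, operator: dict) -> dict:
    """Operator commands override idle motion on the joints they touch."""
    return {**idle, **operator}

def audio_delay() -> float:
    # Delay speech playback so it starts when the lips actually move.
    return ACTUATOR_STARTUP_S + NETWORK_JITTER_S

commands = merge(idle_motion("speaking", t=1.0), {"neck_pitch": 0.6})
print(commands, "audio delayed by", audio_delay(), "s")
```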

2.4.2 Experiences with the Geminoid Prototype

The first geminoid prototype, HI-1, was completed and presented to the press in July 2006. Since then, it has been operated on numerous occasions, including interactions with laboratory members and experimental subjects, and it has been demonstrated to many visitors and reporters. During these sessions, we encountered several interesting phenomena. Here are some observations made by the geminoid operator:

  • When I (Dr. Ishiguro, the source of the geminoid prototype) first saw HI-1 sitting still, it was like looking in a mirror. However, when it began moving, it looked like somebody else, and I could not recognize it as myself. This was strange, since we copied my movements onto HI-1, and others who know me well say that the robot accurately shows my characteristics. This suggests that we do not objectively recognize our own unconscious movements.

  • While operating HI-1 through the teleoperation interface, I find myself unconsciously adapting my movements to the geminoid’s movements. The current geminoid cannot move as freely as I can, and I felt that not only the geminoid but my own body was restricted to the movements that HI-1 can make.

  • Within less than five minutes, both the visitors and I adapt to conversing through the geminoid. While talking with each other, the visitors recognize and accept the geminoid as me.

  • When a visitor pokes HI-1, especially around its face, I get a strong feeling of being poked myself. This is strange, as the system currently provides no tactile feedback; just by watching the monitors and interacting with visitors, I get this feeling.

We also asked the visitors how they felt when interacting through the geminoid. Most said that when they first saw HI-1, they thought somebody (or Dr. Ishiguro, if they knew him) was waiting there. After a closer look, they realized that HI-1 was a robot and began to feel somewhat strange and nervous. However, shortly after beginning a conversation through the geminoid, they found themselves concentrating on the interaction, and the strange feelings soon vanished. Most of the visitors were non-researchers unfamiliar with robots of any kind.

Does this mean that the geminoid has overcome the “uncanny valley” effect? Before talking through the geminoid, the initial response of the visitors seems to resemble the reactions seen with previous androids: Even though they could not immediately recognize the androids as artificial, they were nevertheless nervous about being with the androids. Is intelligence or long-term interaction a crucial factor in overcoming the valley and arriving at an area of natural humanness?

We certainly need objective means of measuring how people feel about geminoids and other types of robots. In a previous android study, Minato et al. found that gaze fixations reveal criteria for the naturalness of robots [15]. Recent studies have also shown that humans respond differently to natural and artificial stimuli of the same kind: Perani et al. showed that different brain regions are activated when watching human versus computer-graphics arm movements [7], and Kilner et al. showed that body-movement entrainment occurs when watching human motions but not robot motions [16]. By examining these findings with geminoids, we may find concrete measures of human likeness and make progress on the “appearance versus behavior” issue.

Perhaps HI-1 was recognized as a sort of communication device, similar to a telephone or a videophone. Recent studies have suggested that the brain processes people appearing on video differently from people appearing live [17]. While attending video conferences or talking on cellular phones, however, we often feel that something present in a face-to-face meeting is missing. What is missing here? Is there an objective means of measuring and capturing this element? Can we ever implement it in robots?

2.5 Summary and Further Issues

In developing the geminoid, our purpose was to study Sonzai-Kan, or human presence, by extending the framework of android science. The scientific aspect must answer questions about how humans recognize human existence/presence. The technological aspect must realize a teleoperated android that works on behalf of the person remotely accessing it. This will be one of the practical networked robots realized by integrating robots with the Internet.

The following summarizes our current challenges:

Teleoperation technologies for complex humanlike robots

Methods must be studied for teleoperating the geminoid so as to convey existence/presence, a task far more complex than the traditional teleoperation of mobile and industrial robots. We are studying methods for controlling the android by transferring the operator’s motions, measured with a motion-capture system, and for autonomously generating eye gaze and humanlike small and large movements.

Synchronization between speech utterances sent by the teleoperation system and body movements

The most important technology for the teleoperation system is the synchronization of speech utterances with lip movements. We are investigating how to produce natural behavior during speech, a problem that extends to other modalities such as head and arm movements. Further, we are studying the effects of nonverbal communication by investigating the synchronization not only of speech and lip movements but also of facial expressions, head movements, and even whole-body movements.
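A simple, commonly used baseline for such synchronization is to drive mouth opening from the short-time energy of the speech signal. The sketch below shows this baseline under illustrative parameters; it is not the algorithm actually used in our system.

```python
import numpy as np

FS = 16_000  # sampling rate in Hz; an illustrative value

def jaw_from_speech(samples: np.ndarray, frame: int = 400) -> np.ndarray:
    """Drive jaw opening from the short-time loudness of an utterance.

    A crude amplitude-to-opening rule, offered only as a baseline.
    """
    n = len(samples) // frame
    rms = np.sqrt((samples[:n * frame].reshape(n, frame) ** 2).mean(axis=1))
    # Normalize to [0, 1] so the loudest frame fully opens the jaw.
    return np.clip(rms / (rms.max() + 1e-9), 0.0, 1.0)

# Example: a 0.5 s synthetic vowel produces a rising-falling jaw curve.
t = np.arange(FS // 2) / FS
speech = np.sin(2 * np.pi * 150 * t) * np.hanning(len(t))
print(jaw_from_speech(speech).round(2))
```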

Psychological test for human existence/presence

We are studying the effect of transmitting Sonzai-Kan from remote places, for example, participating in a meeting that one cannot attend in person. Moreover, we are interested in studying existence/presence through cognitive and psychological experiments; for instance, we are investigating whether the android can convey the authority of the original person by comparing reactions to the person and to the android.

Application

Although the geminoid was developed as a research apparatus, its nature may allow us to extend the use of robots in the real world. The teleoperated, semi-autonomous capability of geminoids allows them to serve, for example, as substitutes for clerks, controlled by human operators only when non-typical responses are required. In most cases an autonomous AI response will suffice, so a few operators could control hundreds of geminoids. Additionally, because their appearance and behavior closely resemble those of humans, geminoids could become the ultimate interface devices of the future.