
1.1 Prologue

The Japan Science and Technology Agency (JST) is one of the core institutions sponsored by the Japanese government, and it is responsible for implementing science and technology policy in Japan.Footnote 1 In addition to other grants, JST provides researchers with three types of competitive funds: those for strategic basic research programs, those for research and development programs focused on technology transfer, and those for global activities. The total amount of competitive funds from JST for FY2014 was slightly more than 700 million euros.Footnote 2 About two thirds of these funds went to the Strategic Basic Research Programs, which comprise our Core Research for Evolutionary Science and Technology (CREST) and ten other programs. CREST is a funding program for network-based (team-based) research that has given rise to outstanding results that are expected to lead to scientific and technological innovation. Thirty-seven research areas, consisting of 431 research teams, were active in FY2014: eight research areas were in green innovation, eleven in life innovation, nine in nanotechnology and materials, and nine in ICT.

Our research area is funded under the title “creation of human-harmonized information technology for a convivial society.”Footnote 3 It was founded by the late Professor Yoh’ichi Tohkura and launched in FY2009 with the strategic objective of creating a basic technology that enables an information environment in harmony with people. We focus on perceptual information processing as a means of harmoniously interfacing humans with the information environment. The three main features of our program are:

  • Recognition and comprehension of human behaviors and real-space contexts by utilizing sensor networks and ubiquitous computing,

  • Technologies for facilitating man-machine communication by utilizing robots and ubiquitous networks, and

  • Content technologies for analyzing, mining, integrating, and structuring multimedia information.

Seventeen research teams were selected between FY2009 and FY2011 (eight in the first year, five in the second, and four in the third) through peer review of the proposals submitted in response to each year's call. Each research team was funded for slightly more than 5 years. This chapter covers the entire scope of our research area and reports the results obtained by the eight research teams launched in the first year, together with one team launched in the second year whose project was upgraded to an Exploratory Research for Advanced Technology (ERATO) project in FY2014.

In the rest of this chapter, I will first discuss the long-term impact that rapid technological progress may eventually have on our daily lives and society, and point out the vulnerabilities of humanity and society that form the background to and motivation for our research area. I will then introduce the concept of human-harmonized information technology as a means of enhancing the human and social potential that is threatened by the surge of technology-accelerated social change. Finally, I will give an overview of the results obtained by the first group of research teams, discuss future perspectives, and conclude the chapter.

1.2 Changing World

The grand idea of human-harmonized information technology originated from Professor Tohkura’s conjecture about a five-stage shift in the roles of information in human society (Fig. 1.1). The first two stages were the hunter-gatherer and agricultural societies, in which information was used for survival. Just like other creatures, humans needed to find cues in the environment to survive, finding food while avoiding potential dangers. Unlike other creatures, humans invented a means of communicating not just with allies but also with descendants across generations by handing down stories in spoken language. Expressing and preserving thoughts in written language is far more stable, and it enabled mankind to evolve through memes, or cultural genes, orders of magnitude faster than through biological genes. This allowed mankind to significantly increase the capability and stability of food production, first through hunting and then through agriculture.

Fig. 1.1 Professor Tohkura’s grand conjecture concerning the five-stage shift in the roles of information in human society

The social structure in these two stages might be characterized as one supported by human intelligence, as schematically outlined in Fig. 1.2, where communication and decision making in corporations were carried out by humans, who also performed the physical tasks. Printing technology contributed greatly to preserving information and communicating it across generations and geographic distances.

Fig. 1.2 The human society was supported by humans. © 2015, Toyoaki Nishida and At, Inc. Reproduced with permission

Mankind’s strong interest in making energy, materials, and machinery more productive at lower cost and with less labor led to the emergence of an industrial society in which energy, materials, and machinery reinforced one another to accelerate productivity growth. The movable-type printing that Gutenberg introduced in the 15th century contributed to the rapid and reliable dissemination of the knowledge required for production. Information was used by people to maintain and improve productivity. People were gradually freed from the heavy physical tasks involved in agriculture and hunting. However, the information tasks imposed on people were not greatly alleviated until the emergence of electrical and electronic engineering, in which signals were converted into electrical representations for storage, transmission, and reproduction. New jobs allowed workers to take up contract-based work that freed them from their previously fixed destinies. Although the notion of automation was introduced, automation logic remained limited as long as it had to be implemented mechanically. The critical role of people shifted to information tasks: controlling artifacts with their capabilities for perception and cognition, and designing novel products and services. This brought about the notion of employment, which in turn added another role to information, i.e., information for living.

Meanwhile, the structure of communication and decision making did not change much from the scheme in Fig. 1.2, even though the advent of electrical and electronic engineering and of modern transportation systems significantly improved the efficiency of communication.

We witnessed the birth of digital computers in the last century. They reflected a trend in mathematics toward formalizing the notion of computation, an idea that was brought to fruition through electronic engineering. Digital computers and networks rapidly penetrated human society and transformed it into an information society. The theoretical foundation of information and communication technology originated from the idea of a universal computing machine, now called the universal Turing machine, which Alan Turing proposed in 1936: for any recursive function f, one can always program the universal Turing machine so that it calculates f. This was a profound innovation, as it implies that if somebody builds a universal Turing machine U by some physical means, such as electronic circuits, U can compute any recursive function “simply” by being given a symbolic description of it, e.g., on punched paper tape, without anyone having to rebuild the whole electronic circuit for each function. In fact, the idea of the universal Turing machine was realized by von Neumann and other innovators around 1945, and since then digital computers have improved exponentially.
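The idea that one fixed machine can compute many different functions, once it is handed a symbolic description of each, can be illustrated with a small Python sketch. This is an analogy rather than Turing’s 1936 construction: the hypothetical `run` interpreter plays the role of U, and each transition table plays the role of the punched tape. The two example machines, the blank symbol `_`, and the convention that the head starts at the leftmost input cell are assumptions made only for illustration.

```python
from typing import Dict, Tuple

BLANK = "_"
# (state, symbol read) -> (next state, symbol to write, head move: -1 left, +1 right, 0 stay)
Table = Dict[Tuple[str, str], Tuple[str, str, int]]


def run(table: Table, tape_input: str, start: str = "q0",
        halt: str = "halt", max_steps: int = 10_000) -> str:
    """Fixed interpreter (illustrative): executes whatever machine `table` describes."""
    tape = dict(enumerate(tape_input))  # sparse tape: cell index -> symbol
    head, state, steps = 0, start, 0
    while state != halt:
        # A step budget is needed because, in general, halting cannot be decided in advance.
        if steps >= max_steps:
            raise RuntimeError("step limit reached; the machine may not halt")
        symbol = tape.get(head, BLANK)
        state, write, move = table[(state, symbol)]
        tape[head] = write
        head += move
        steps += 1
    if not tape:
        return ""
    cells = range(min(tape), max(tape) + 1)
    return "".join(tape.get(i, BLANK) for i in cells).strip(BLANK)


# Hypothetical program 1: add one to a binary number (scan right to the end, then carry leftwards).
INCREMENT: Table = {
    ("q0", "0"): ("q0", "0", +1),
    ("q0", "1"): ("q0", "1", +1),
    ("q0", BLANK): ("carry", BLANK, -1),
    ("carry", "1"): ("carry", "0", -1),
    ("carry", "0"): ("halt", "1", 0),
    ("carry", BLANK): ("halt", "1", 0),
}

# Hypothetical program 2: flip every bit. Same interpreter, different "tape".
INVERT: Table = {
    ("q0", "0"): ("q0", "1", +1),
    ("q0", "1"): ("q0", "0", +1),
    ("q0", BLANK): ("halt", BLANK, 0),
}

print(run(INCREMENT, "1011"))  # 1100
print(run(INCREMENT, "111"))   # 1000
print(run(INVERT, "1011"))     # 0100
```

Adding a third capability only requires writing a new table; `run` itself never changes, which mirrors the property the text attributes to U.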

Another innovation in information and communication technology was the rapid development of digital networks that interconnected computers. After a military experimental stage and a period of academic use, the Internet rapidly spread worldwide following the invention of the HyperText Markup Language (HTML), its interpreters known as browsers, and the introduction of powerful search engines in the 1990s. There were 1.02 billion Internet users around the globe in 2005; this number grew by about 12.4 % per year, reaching 2.92 billion in 2014. According to Cisco, it is expected to reach 5.5 billion by 2025.Footnote 4
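The user figures above can be sanity-checked with compound-growth arithmetic. The short Python sketch below is illustrative only (the function names are ours, not from any source): it projects the 2005 figure forward at 12.4 % per year and, conversely, recovers the implied annual rate from the 2005 and 2014 endpoints.

```python
def project(start: float, rate: float, years: int) -> float:
    """Value after `years` of compound growth at annual `rate` (0.124 means 12.4 %)."""
    return start * (1.0 + rate) ** years


def implied_rate(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1.0 / years) - 1.0


users_2005 = 1.02  # billions of Internet users in 2005
users_2014 = 2.92  # billions of Internet users in 2014
years = 2014 - 2005  # 9 years of growth

print(f"projected 2014 users: {project(users_2005, 0.124, years):.2f} billion")
print(f"implied annual growth: {implied_rate(users_2005, users_2014, years):.1%}")
```

Both directions agree: 1.02 × 1.124^9 ≈ 2.92. Continuing at 12.4 % per year until 2025 would give roughly 10.6 billion users, more than the world’s population, so the 5.5 billion projection implies much slower growth of about 6 % per year.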

By the end of the last century, the exponential growth of computer and communication technology had convinced people that digital computers and networks constituted the most important part of the infrastructure controlling energy and materials, as summarized in Table 1.1.

Table 1.1 The rough history of the development of information and communication technology

Many people came to think of computers as new tools [21]. In fact, huge numbers of computer tools and systems were deployed to support human life. Although the tasks they performed were rather simple and repetitive, they could involve tremendous amounts of computation and complex logic. Whereas digital computers and networks were very slow and programming was expensive and unreliable in the early days, by the beginning of this century both computers and networks had become quite inexpensive relative to their performance. Significant portions of basic software and data became open and free, so many people could participate in creating ever new value on top of the new information and communication infrastructure.

This infrastructure not only provided efficient and reliable information transmission and computing, but also gradually took over information tasks that had conventionally been performed by people in support of other people. The scheme presented in Fig. 1.2 was transformed into the one in Fig. 1.3, where increasingly complex and intelligent tasks were delegated to computers that could serve as apprentices to human experts and not just as tools.

Fig. 1.3 The human society was supported by human-AI hybrid teams. © 2015, Toyoaki Nishida and At, Inc. Reproduced with permission

Significant portions of the information tasks in corporations were taken over by computers that could perform them efficiently, reliably, and incessantly at low cost. As a result, people have increasingly been released from the laborious, repetitive handling of simple information and have moved to higher-level tasks that require more abstract forms of intelligence, such as aesthetic sense, intuition, wisdom, or making high-risk, high-return investments. At the same time, people have come to believe that essential value lies in information, which may serve as a ticket to pleasure.

The exponential growth in computer and communication technology neither stopped nor slowed down in the new century. It gave rise to an information explosion, or the big data phenomenon, and many aspects of human life began to be captured and described as data.

The machine learning and data mining technologies developed in artificial intelligence have been widely used to derive useful information from data and provide better services. The more data are available, the more intelligent machines become, and these machines in turn produce more data for even more intelligent machines. This trend can be seen in the history of ICT and AI technologies, roughly outlined in Table 1.2.

Table 1.2 The rough history of AI technology development

Table 1.2 shows that although AI was initially little more than an academic challenge, owing to limitations in the scale and power of computers and in the available data, it became socially influential around 2010. It is now believed that AI has the potential to extract the value of information and significantly amplify it. The synergy between big data and intelligent machines may eventually bring about an intelligence explosion.

On the dark side, however, at least two problems may arise [20]. The first is technology abuse: new technologies can be applied to illegal or malicious activities. The second is responsibility flaws: the more complex artifacts become, the less likely it is that humans can keep them under control, so that neither the maker nor the owner of a sufficiently complex artifact may be able to take full responsibility for it.

1.3 Toward a Convivial Society

As many authors have argued in discussions of the “second half of the chessboard,” it will not take long for such exponential growth to change the social structure rapidly. AI will be able to outperform people as long as the criteria are clearly defined [3, 5]. Even very high-level information tasks that previously required highly abstract forms of intelligence will be conducted by intelligent autonomous agents, and the ratio of autonomous agents to humans in corporations will grow without bound, resulting in the AI-supported society illustrated in Fig. 1.4.

Fig. 1.4 The human society was supported by artificial intelligence. © 2015, Toyoaki Nishida and At, Inc. Reproduced with permission

1.3.1 Brightness and Darkness in AI-Supported Society

In the AI-supported society, AI is heavily used to mediate between people and services [20]. Social interactions may be organized hierarchically. At low levels, social interactions dynamically allocate computational resources to maximize utility while taking fairness into account under given priorities. Higher-level social interactions may involve more abstract activities, including information sharing, collaboration, negotiation, contract making, coalition, and arbitration. At the sociological level, social interactions may be designed to negotiate conflicting intentions. Philosophers such as Thomas Hobbes discussed the negotiation between individuals and the government as a social-contract problem of arbitrating conflicting interests in a world governed by natural laws [16]. Principles such as “each person is to have an equal right to the most extensive basic liberty compatible with a similar liberty of others” and “social and economic inequalities are to be arranged so that they are both (a) reasonably expected to be to everybody’s advantage, and (b) attached to positions and offices open to all,” proposed by Rawls as part of a theory of justice [22], should clearly be respected. However, these principles should be taken as desiderata to be approximately implemented in artifacts and artificial societies, not as rigorous rules. Such a best-effort attitude is already familiar in the Internet age: providers promise only to make their best efforts to offer good services, and customers have grown used to this. The technology for AI-mediated social interaction might help resolve the problems of technology abuse and responsibility flaws.

In addition, the AI-supported society will eventually release humans even from information tasks. Although people will continue to look after corporate businesses, AI will automatically make the optimal decisions to run corporations that support mankind. AI-run corporations will cover everything from infrastructure to highly intellectual services such as education and caregiving, owing to AI’s reliability, low cost, and well-personalized interfaces. The fewer people required to run corporations, the lower their costs, and the total cost of supporting mankind will eventually be significantly reduced. Although some people will still work even harder than today in the AI-supported society, it will not be to support themselves or their families or for other inescapable reasons, but because they want to achieve goals that they have set for themselves to fulfill their dreams. The elderly and small children will be able to gain autonomy with the full support of intelligent physical assistants. The notion of a professional will survive, not as a paid person with high-level and often licensed skills, but as a highly motivated and skilled person, often with an implicit or explicit mission, such as relieving mankind of pain or entertaining people to make them happy. In other words, professions pursued as an obligation or a means of earning money will decrease, while those driven by self-motivation will increase. The question is whether the AI-supported society will be good to people. Would that be bad?

However, the AI-supported society might bring about new problems. The first is a crisis in morality: as AI assistants handle most social conflicts, people might become ignorant about ethics and humanity. Carr points out that we have started handing off moral decisions to computers [7]. Another problem is overdependence on artifacts [20]: once AI is introduced, society might assume the infallibility of artifacts without justification, and people might use artifacts without a balanced sense of judgment. AI might also weaken individuals and society at the human level: as AI comes to outperform professionals with respected expertise, human society may undergo significant change, and as a result, people might come to prefer AI to other people, as Turkle has pointed out [27]. We need to overcome these problems to reach the convivial society envisioned by Professor Tohkura, in which information is used for harmony and empathy between humans and technology [18] is achieved, so that people and technology can know each other, feel each other, and share emotions and morals.

1.3.2 Transition to a Convivial Society

The transition phase from an information society to a convivial society might be painful. It involves such issues as the end of work [8, 23] and the race against the machine [4], which are essentially caused by conflicts between the new and old regimes. Although change is welcome, human society does not seem prepared for a change as significant as the shift to the AI-supported society. Of these issues, the end of work may be the most serious concern, as work is not just a means of earning money but also fulfills a more essential desire: self-actualization [17]. Even if one finds an AI-supported society desirable, s/he may not be able to adapt her/his economic life to that shift. AI might become fairly creative, in the sense that it could even make important scientific discoveries. As a result, most people might eventually lose not just their jobs but also their opportunities for self-actualization, as long as employers want cheaper and better employees who can accomplish specified jobs. Although some jobs, such as those that require individual responsibility or symbolic ones that only a small number of people can do, e.g., top human players, will remain reserved for humans, they will be extremely competitive, as described in Ando’s enlightening book [2]. Although this may be fair, it will not be easy for people who are unfamiliar with such a lifestyle.

People need to find other avenues for self-actualization. It is extremely difficult to achieve creativity at a level that other people recognize and admire. People might lose the confidence, self-efficacy, and even self-esteem that originate from their skills and expertise in the conventional regime. Such desperation might lead to a loss of identity, and people may give up trying to maintain their identity by finding something that others cannot do. Even human autonomy might be lost: if people want the best choice, they will be better off following decisions made by AI. As Carr pointed out [7], excessive automation might alienate humans from the activities that are the source of humanity. He argues that this is already happening, citing a 2012 article in The Economist that observed that “for most people, the servant (i.e., the smart phone) has become the master (of the owner)”.

First, for the transition to succeed, we need to design and implement a complete social regime, ranging from lifestyles and working styles to the redistribution of wealth. It is often quite difficult to dismantle a familiar social structure, particularly when the outcome is not completely clear and people may disagree on aspects of the transition, even if not on the transition itself. Second, we need to change ourselves to match the new social regime. Even though people understand that the work shift [9] is inevitable, it might not be at all easy for many of them to change their work habits and abandon the expertise they have cultivated over their lifetimes.

We will need to revamp the relationship between humans and technology, not just by inventing human-friendly technology, but also by using technology to explore new forms of perception, activity, and creativity. This suggests that human nature might be significantly influenced by technology, and that some virtues mankind has cultivated throughout history might be lost, or at least threatened, as a result. We need to teach computers about ourselves and our society, because in the transition phase the metaphor of computers as apprentices should give way to the metaphor of computers as partners. This covers numerous things, from common sense to highly professional knowledge and skills. Even though teaching may be fun and meaningful for most people, what awaits them after the transition will be rather difficult. As AI workers take on simple tasks, people will need to be more creative than before or engage in tasks that only humans can do, e.g., evaluating other people or taking a certain amount of risk and responsibility in the face of unpredictable challenges. This is harsh, as the success ratio will be much lower than before. Success might follow a power-law distribution rather than a normal distribution, under which people could be satisfied simply by being significantly above average. People know traditional practices, such as diligence, that are effective for achieving success. However, those virtues may not always be effective for achieving a very high degree of creativity, such as becoming a superstar. Indeed, there seem to be no fixed paths to success in an information society [28].

Using technology to increase empathy in society will be necessary to realize, in a broader context, the convivial society proposed by Professor Tohkura. This is an approach to designing a good relationship between human society and technology that copes with the threats to humanity and human society, both in the transitional stage that we will have to go through and in the asymptotic stage that we will eventually witness, so that humanity and human society can instead be enhanced through adequate means. I believe that the notions of human and social potential lie at the center of this approach.

1.3.3 Human and Social Potential

Issues in the transition to a convivial society are summarized in Fig. 1.5. As argued above, human society will have to resolve numerous difficulties in the transition phase. We will have to make serious efforts to enhance our own wisdom and resilience in order to build a new social regime and reach the convivial society in which humans and technology are in harmony.

Fig. 1.5 Issues in the Transition to the Convivial Society. © 2015, Toyoaki Nishida and At, Inc. Reproduced with permission

We consider human and social potential to be a central issue. Human potential is the power of an individual that enables her/him to actively sustain an endeavor to achieve a goal while maintaining social relationships with other people. It involves vision, activity, sustainability, empathy, ethics, humor, and aesthetic sense. Vision permits one to initiate a long-term, coherent activity. It involves setting a goal that one considers important and meaningful, even if pursuing it is painful and risky. Activity is the determination to turn thoughts into actions in the face of numerous difficulties. Sustainability relies on a strong will to adhere to a plan even when unexpected events and failures occur; great wisdom is also needed to revise the initial plan in time whenever necessary. Empathy is the ability to reflect on the thoughts and emotions of other people and regard them as if they were one’s own. Ethics regulates one’s intentions and activities so that one respects other people and follows ethical principles. As a result, one may sacrifice oneself to help others and refrain from taking advantage of their weaknesses. A sense of humor may be used to entertain oneself or other people by turning otherwise ridiculous or even negative events into a cheerful story. An aesthetic sense concerns the creation and appreciation of beauty, which may make our lives pleasant and lovely. Although human potential is considered innate to individuals, it might be threatened by the rapid, unpredictable, and overwhelming torrent of change in the transition phase. People may forget or even lose the virtues of human potential under such difficulties.

Social potential is the power that a society of people possesses as a whole. It encompasses generosity, supportivity, conviviality, diversity, connectedness, and innovativeness. Generosity minimizes the penalty for failure, encouraging members to take on difficult challenges. Supportivity not only helps members engage in various challenges in practice but also gives them the feeling that their activities are supported by society. Diversity encourages members to be different, increasing the chances of success for both society as a whole and its individual members. Connectedness gives members the feeling of being connected so that they help one another overcome difficulties, and it produces the synergistic effect of shared pleasure. Innovativeness is an attitude, shared by individuals across society, of aiming at innovation.

Human and social potential complement each other to enable creativity. Human potential allows individuals to explore, set, and sustain meaningful goals and the efforts needed to achieve them. Even if the path is filled with the difficulties and pain that result from failure, human potential encourages individuals not to give up. Social potential legitimizes and supports creativity. The more creative a society tries to be, the lower its success ratio becomes. Thus, a creative society tends to reward successful people more, to be generous when failures occur, and to promote collaboration.

Above all, I believe that play and games are central to human and social potential. As suggested by Homo Ludens [10], play, defined as “a voluntary activity or occupation executed within certain fixed limits of time and place, according to rules freely accepted but absolutely binding, having its aim in itself and accompanied by a feeling of tension, joy, and the consciousness that it is ‘different’ from ‘ordinary life’,”Footnote 5 is essential to humanity in the sense that “play is older than culture.” Caillois [6] elaborated on this idea, shifting the focus from play, defined as an activity that is essentially free, separate, uncertain, unproductive, governed by rules, and make-believe, to games, which he classified into four categories: agon (competition), alea (chance), mimicry (simulation), and ilinx (vertigo). We have good reason to believe that both play and games are closely related to the creativity of individuals and the conviviality of society in the AI-supported society.

1.3.4 Roadmap for Enhancing Human and Social Potential

A roadmap for enhancing human and social potential (Fig. 1.6) may consist of multiple levels, ranging from philosophy to technology.

The top level is a manifesto from the viewpoint of ethics and the humanities. It should promote the development and use of technology to enhance human and social potential, where the technology should be designed to encourage active, creative participation in a technology-supported society rather than to render people passive. The convivial society needs to be discussed in depth, and consensus on it should be reached democratically. In particular, the end-of-work issue should be discussed from the viewpoint of ethics and the humanities until global consensus is reached. The potential difficulties and pain of the transition phase also need to be addressed. Although voluntary, painstaking effort toward innovation may be encouraged, people should not be compelled to accept pain.

The strategy level should pave the way to the convivial society so that each sector of society can smoothly depart from the conventional social and economic regime without undue hardship. Coping with the work shift might be among the most critical issues, and an effective strategy for it needs to be formulated and shared. A practical strategy for incrementally taking advantage of advanced technology and embedding it into society needs to be devised in order to build an empathic relationship between technology and mankind. People with great expertise in the conventional paradigm should be respected and protected in the transition phase.

Fig. 1.6 Roadmap to the Convivial Society

At the technology level, new technical challenges need to be identified to enable the transition and achieve the convivial society. Scenarios describing how new technologies contribute to enhancing human and social potential need to be made explicit and shared. Human-harmonized information technology, presented in the rest of this chapter, focuses on enhancing human perception for the convivial society.

1.4 Human-Harmonized Information Technology

The JST-CREST research area on creation of human-harmonized information technology for the convivial society was established in 2009 to address the imbalance between humans and technology created by fast and overwhelming technological development, which may hinder rather than help human creativity. We placed particular emphasis on innovative theory and technology for a perceptual information environment in harmony with people, one that not only adapts to them but also cultivates spontaneous activities, enables creativity, and establishes an empathic relationship between humans and technology. Our program features recognition and comprehension of human behaviors and real-space contexts by utilizing sensor networks and ubiquitous computing, technologies for facilitating man-machine communication by utilizing robots and ubiquitous networks, and content technologies for analyzing, mining, integrating, and structuring multimedia information.

We focus on innovative approaches to human-harmonized sensations, encompassing analysis and modeling, computational realization, and applications. Our scope includes not only basic computational sensations, such as audio-visual information processing, which has a long research history, and tactile and haptic information processing, which is relatively new, but also novel computational sensations, such as a sense of presence and distrust/mistrust, which have rarely been addressed before. Although the open calls for proposals encouraged each team to span basic research through applications, and we in fact placed as much emphasis on technology deployment and social implementation as on basic research and technical development, we gave greater priority to innovative proposals that might lead to deep insights into research topics.

1.4.1 Organization of Research Teams in Research Area

We selected 17 teams for funding out of the 223 proposals we received over the three years from 2009 to 2011, as listed in Table 1.3. After the selection, we promoted collaboration among the teams so that they could not only help but also stimulate one another to achieve more challenging innovations. This volume assembles the first nine teams, whose projects have already concluded; we will report the results from the remaining eight teams in the succeeding volume. This volume carries the subtitle “Vertical Impact” because it presents the first impact, spanning from the foundation not only to application but also to social implementation, whereas the second volume will carry the subtitle “Horizontal Expansion” because it will demonstrate the breadth of the scope.

Table 1.3 Teams of the JST-CREST research area on creation of human-harmonized information technology for Convivial Society

Chapter 2 describes the results obtained from a project entitled “studies on cellphone-type tele-operated androids transmitting human presence” conducted by Ishiguro [11]. Ishiguro’s team addressed a new communication medium for transmitting human presence and developed a series of new tele-operated androids. Telenoid, among others, was designed to allow users to transmit their presence to a distant location so that a conversation partner there could talk to the Telenoid as if it were the user. An experimental evaluation found a significant reduction in cortisol levels in participants who had conversations with huggable devices such as the Telenoid.

Chapter 3 reports the results from a project entitled “modeling and detecting overtrust from behavior signals” led by Takeda [26]. Based on large signal corpora, Takeda’s team investigated the mathematical modeling of human behaviors by mapping the behaviors onto two discrete-continuous hybrid systems, i.e., a cognition-decision process and a decision-action process. This research involved building a behavioral model that could relate the human internal state and observed behavioral signals. The results from the research were applied to detecting over-reliance in an automated driving system.

Chapter 4 explains what was obtained from a project entitled “life log infrastructure for food” led by Aizawa [1]. Although food is one of the most important and regularly consumed factors in our daily lives, it has rarely been viewed as an object of information processing. Aizawa’s team developed an infrastructure for life logs, with an emphasis on food and food-related activities in our daily lives. It allowed them to investigate the capture, analysis, visualization, and interfacing of multimedia logs of food and related experiences. They drew on this data collection to investigate potential community discovery, support for communications, standardization of life log data, and privacy control issues. Healthcare was addressed as an application.

Chapter 5 presents the results obtained from a project entitled “dynamic information space based on high-speed sensor technology” by Ishikawa [12]. In this project, a new information space was constructed that allowed humans to identify phenomena exceeding the limitations of human senses. Crucial to this effort were perfect detection of underlying dynamics and a new model of sensory-motor integration, drawn from work with kilohertz-rate sensor and display technologies. Because the sampling rate within the information space is matched to the dynamics of the physical world, humans are able to deterministically predict the attributes of the rapidly evolving environment around them. This leads to a new type of interaction in which the learning rate and capacity of our recognition system are augmented.

Chapter 6 overviews the outcome of a project entitled “construction and utilization of a human-harmonized ‘tangible’ information environment” led by Tachi [25]. The aim of Tachi’s project was to construct an intelligent information environment that was both visible and tangible, in which real-space communication, a human-machine interface, and media processing were integrated. To this end, Tachi’s team created a human-harmonized “tangible information environment” that allowed human beings to obtain and understand haptic information in real space, to transmit the haptic space thus obtained, and to actively interact with other people using the transmitted haptic space. The tangible environment enables telecommunication, tele-experience, and pseudo-experience with the sensation of working as though one were in a natural environment. It also enables humans to engage in creative activities such as design and creation as though they were in a real environment.

Chapter 7 presents the work by Yasuharu Koike on his project [15] entitled “elucidation of perceptual illusion and development of a sense-centered human interface.” Koike’s team investigated how to replicate physically plausible information that provides a real sensation of presence in tele-existence. The team proposed the concept of a sense-centered human interface as a technique for producing haptic sensations without haptic devices. The concept was realized through pseudo-haptics, a technique for simulating haptic sensations with visual feedback. Its applications include the Touch-Centric interaction embodiment eXploratorium (TCieX), surgery robots, and power-assist robots.

Chapter 8 describes the results from a project entitled “sensing and controlling human gaze in daily living space for human-harmonized information environments” by Sato [24]. The main goal of this project was to develop novel technologies for sensing and controlling human gaze non-invasively in daily living space. Such technologies are key to achieving human-harmonized information environments that can provide various kinds of support more effectively without distracting us from our other activities. To attain this goal, Sato’s team developed gaze estimation techniques that require no or very limited calibration by exploiting various cues, such as the spontaneous attraction of people’s visual attention to visual stimuli. For gaze control, the team took two approaches to shifting gaze to desired locations in a natural, non-disturbing way: subtle modulation of visual stimuli based on visual saliency models, and non-verbal gestures in human-robot interaction.

Chapter 9 gives an overview of a project entitled “Smart Posterboard: multimodal sensing and analysis of poster conversations”, led by Kawahara [14]. In this project, a Smart Posterboard was developed that employed multiple sensing devices to record poster conversations, so that the user could review who came to the poster and what kinds of questions or comments they made. Conversation analysis combines speech and image processing such as head tracking, speech enhancement, and speaker diarization (identifying who spoke when in a multi-party conversation). High-level indexing of the audience’s interest and comprehension levels was accomplished based on their multi-modal behaviors during the conversation.

Chapter 10 discusses the insights obtained from a project entitled “developing a communication environment by decoding and controlling implicit interpersonal information” led by Kashino [13]. The team studied how smooth and effective interpersonal communication depends strongly on implicit, non-symbolic information that emerges from the interaction between partners (implicit interpersonal information: IIPI). Kashino’s team developed new methods to improve the quality of communication by decoding IIPI from brain activity, physiological responses, and body movements, and by controlling IIPI using sensorimotor stimulation and non-invasive brain stimulation.

1.4.2 Structured Overview of First Outcome

Because the results from each team are described in detail in the succeeding chapters, here I combine them into a big picture and discuss how they contributed to achieving human-harmonized information technology. Table 1.4 provides a bird’s-eye view of the entire contribution.

Table 1.4 A Bird’s Eye view of the human-harmonized information technology in this volume

1.4.2.1 Basic Research Level

At the basic research level, we obtained basic conceptualizations and scientific findings on human perception and cognition for human-harmonized information technology.

Ishiguro’s team introduced the term sonzai to represent human presence or existence and sonzaikan to represent a feeling of presence in their challenge to transmit human presence [11]. They argued that whereas sonzai refers to an objective presence, sonzaikan arises only when that presence is recognized by a person. Thus, recognition is crucial for perceiving sonzaikan, and at least two modalities are needed. The idea of inducing sonzaikan with minimal modalities led to the idea of sonzaikan media based on a minimal design that combines auditory and tactile sensations. They proposed examining changes in cortisol levels to measure the effect of interaction with sonzaikan media on the human neuroendocrine system.

Takeda’s team worked on a cognitive model of excessive trust in human cognition and behavior in (semi-)automated system environments, aimed at achieving accompanying intelligence that could assist users in deciding how to behave in complex environments [26]. They used piecewise autoregressive systems with exogenous input (PWARX) models to express decision/action situations, and complemented these models so that they could adapt to new data and be identified consistently. They used their method to build an integrated model of gaze and vehicle operational behavior and demonstrated that it was effective in detecting risky lane changes.
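A PWARX model switches between several linear autoregressive-with-exogenous-input (ARX) submodels depending on which region of the regressor space the current regressor falls in. The minimal sketch below simulates a hypothetical two-mode PWARX system and recovers each submodel by least squares. It assumes the partition (here, the sign of the previous input) and hence the mode labels are known; practical PWARX identification, including whatever procedure Takeda’s team used, must also estimate the partition, e.g., by clustering. All parameter values and function names are illustrative assumptions.

```python
import numpy as np

def regressor(y_prev, u_prev):
    # phi_t = [y_{t-1}, u_{t-1}, 1]: first-order ARX regressor with a bias term
    return np.array([y_prev, u_prev, 1.0])

def active_mode(phi):
    # Hypothetical partition of the regressor space: mode 0 if u_{t-1} < 0, else mode 1
    return 0 if phi[1] < 0 else 1

def simulate(thetas, n=2000, noise=0.01, seed=0):
    """Generate data from y_t = theta_{m(t)} . phi_t + e_t."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-1.0, 1.0, n)
    y = np.zeros(n)
    phis, modes = [], []
    for t in range(1, n):
        phi = regressor(y[t - 1], u[t - 1])
        m = active_mode(phi)
        y[t] = thetas[m] @ phi + noise * rng.standard_normal()
        phis.append(phi)
        modes.append(m)
    return np.array(phis), y[1:], np.array(modes)

def fit_pwarx(phis, targets, modes, n_modes):
    """Least-squares estimate of each ARX submodel, given the mode labels."""
    return [
        np.linalg.lstsq(phis[modes == m], targets[modes == m], rcond=None)[0]
        for m in range(n_modes)
    ]

def predict(thetas, phi):
    # One-step-ahead prediction with the submodel active in phi's region
    return thetas[active_mode(phi)] @ phi

true_thetas = [np.array([0.9, 0.5, 0.0]), np.array([0.5, -0.3, 1.0])]
phis, targets, modes = simulate(true_thetas)
estimated = fit_pwarx(phis, targets, modes, n_modes=2)
print(np.round(estimated, 2))
```

Assigning one linear submodel per region is what makes the model "piecewise"; in a driving context each region would correspond to a different decision/action situation.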

Kashino’s group studied implicit interpersonal information (IIPI), which is considered to enable smooth and effective interpersonal communication [13]. Their findings come from four lines of research: (1) decoding mental states, such as saliency, familiarity, and preference, from microsaccades and changes in pupil diameter; (2) identifying the cause of impaired communication in high-functioning autism spectrum disorder (ASD); (3) developing methods for improving the quality of communication by controlling IIPI and/or the neural processes involved in processing IIPI; and (4) elucidating the neural mechanisms involved in processing IIPI.

Based on studies on how information is mapped between the physical, physiological, and psychological spaces, Tachi’s team has proposed a haptic primary color model that can serve as the foundation for designing a haptic information display to recreate cutaneous sensation [25].

Koike’s team addressed the conceptualization and perceptual foundation of a sense-centered human interface [15]. They investigated the illusion in perceived heaviness induced by a time offset between visual and haptic contact. They found that an object was perceived to be heavier when force was applied earlier than visual contact and lighter when force was applied later. They also found that the illusion became smaller after participants had been conditioned to the time offset. Furthermore, they introduced two indices, i.e., the point of subjective simultaneity (PSS) and the point of subjective equality (PSE), to quantitatively measure the subjective evaluation of timing and weight perception. In addition, they conducted an fMRI experiment to estimate the representation of motion in the brain; the results were consistent with previously reported findings. They are currently applying their insights to the Touch-Centric interaction embodiment eXploratorium (TCieX), to a surgery robot augmented with a sense-centered human interface, and to power-assist robots.
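The chapter does not say how the PSS and PSE were computed. A common approach in psychophysics is to fit a psychometric function to the proportion of one response (e.g., “the comparison felt heavier”) at each stimulus level and take the level at which that proportion reaches 50%. The sketch below uses a logit-linear least-squares fit, a simple approximation to the maximum-likelihood logistic fits often used; the function name, stimulus levels, and response proportions are hypothetical. The same function would give a PSS if the levels were force-minus-visual onset offsets and the response were “force came first”.

```python
import numpy as np

def point_of_subjective_equality(levels, p_response, eps=1e-3):
    """Level at which a fitted logistic psychometric function crosses 50%.

    levels: stimulus values (e.g., comparison weight in grams, or a
            force-minus-visual onset offset in ms when estimating a PSS)
    p_response: proportion of trials with the target response at each level
    """
    # Clip to avoid infinite logits at proportions of exactly 0 or 1
    p = np.clip(np.asarray(p_response, dtype=float), eps, 1 - eps)
    logit = np.log(p / (1 - p))  # linear in the level under a logistic model
    slope, intercept = np.polyfit(np.asarray(levels, dtype=float), logit, 1)
    return -intercept / slope  # logit = 0  <=>  p = 0.5

# Hypothetical weight-comparison data: proportion of "comparison felt heavier"
weights_g = [80, 90, 100, 110, 120]
p_heavier = [0.12, 0.27, 0.50, 0.73, 0.88]
print(round(point_of_subjective_equality(weights_g, p_heavier), 1))  # prints 100.0
```

A shift of the PSE between conditions (e.g., force applied before versus after visual contact) is what would quantify the heaviness illusion described above.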

Kawahara’s team studied a multi-modal corpus obtained with the Smart Posterboard they developed and, by analyzing multi-modal conversations, gained useful insights into the prediction of turn-taking, speaker diarization, hot spot detection, and the prediction of interest and comprehension levels [14]. Eye-gaze information was generally found useful in predicting turn-taking and in improving speaker diarization. About 70% of next speakers in turn-taking events could be predicted by combining eye-gaze objects, joint eye-gaze events, duration, and backchannels. Multi-modal speaker diarization was achieved by integrating eye gaze and acoustic information, which indicated that eye-gaze information is useful for diarization in noisy environments. Hot spots were not necessarily associated with laughter but were consistently and meaningfully associated with reactive tokens and specific prosodic patterns. Interest levels could be predicted from the occurrence of questions and prominent tokens, and comprehension levels could be estimated from question types.
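The chapter does not detail how eye gaze and acoustic information were integrated for diarization. One generic way to combine two evidence sources is weighted log-linear late fusion of per-participant speaker posteriors in each time frame. The minimal sketch below shows how gaze evidence can resolve frames where acoustic evidence is ambiguous, as in noisy conditions. The posteriors (gaze posteriors might be derived, say, from where the other participants are looking), the fusion weight `w_gaze`, and the function names are all hypothetical; this is not necessarily the method used by Kawahara’s team.

```python
import numpy as np

def fuse_speaker_posteriors(acoustic_post, gaze_post, w_gaze=0.3, eps=1e-9):
    """Weighted log-linear fusion of per-participant speaker posteriors for one frame."""
    log_a = np.log(np.asarray(acoustic_post, dtype=float) + eps)
    log_g = np.log(np.asarray(gaze_post, dtype=float) + eps)
    fused = (1.0 - w_gaze) * log_a + w_gaze * log_g
    fused = np.exp(fused - fused.max())  # subtract the max for numerical stability
    return fused / fused.sum()  # renormalize to a probability distribution

def diarize(acoustic_frames, gaze_frames, w_gaze=0.3):
    """Label each frame with the index of the most probable speaker."""
    return [
        int(np.argmax(fuse_speaker_posteriors(a, g, w_gaze)))
        for a, g in zip(acoustic_frames, gaze_frames)
    ]

# Hypothetical posteriors for three participants over two frames
acoustic = [[0.5, 0.5, 0.0],    # noisy frame: acoustics cannot separate 0 and 1
            [0.9, 0.05, 0.05]]  # clean frame: acoustics clearly favor 0
gaze = [[0.1, 0.8, 0.1],
        [0.2, 0.6, 0.2]]
print(diarize(acoustic, gaze))  # prints [1, 0]
```

With `w_gaze=0.0` the fusion reduces to acoustic-only diarization, which makes it easy to measure how much the gaze channel contributes.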

1.4.2.2 Platform Research Level

At the platform research level, we obtained generic computation schemes and systems that could serve as a platform for building human-harmonized information systems.

Ishikawa’s team developed four technologies toward achieving a dynamic information space that could harmonize the human perception, recognition, and motor systems [12]: high-speed 3D vision for sensing dynamics imperceptible to humans, a high-speed resistor-network proximity sensor array for detecting nearby objects, noncontact low-latency haptic feedback, and a high-speed display of visual information. The first technology allowed them to capture depth images of \(512 \times 512\) pixels in real time at 500 fps on a high-frame-rate (HFR) camera-projector system. Time-division-multiplexed 3D structured-light measurements were implemented on this system to acquire complete 3D information with minimal occlusion using multiple camera-projector modules. The second technology is a high-speed proximity sensor array that simultaneously detects azimuth and elevation. The main features of the proposed dome-shaped sensor are rapid responsiveness and simpler wiring, while it maintains a \(360^{\circ }\) sensing range and detects objects from the sides to the top of the sensor. The third technology was based on an airborne ultrasound tactile display (AUTD). A freely extendable phased array system allowed them to construct an array with a \(576\,\mathrm{mm} \times 454.2\,\mathrm{mm}\) aperture. The system was able to produce highly localized vibrotactile sensations on human skin \(600\,\mathrm{mm}\) away from the device with a focal intensity of \(74\,\mathrm{mN}\), as well as programmable vibrotactile sensations of \(2\,\mathrm{kHz}\) with 320-level quantization. The fourth technology consisted of a high-frame-rate LED display, smart LED tiles (SLTs), and aerial imaging by retro-reflection (AIRR). The high-frame-rate LED display was driven by an LED video processor that distributed an input image into image data for the tiled LED units. Spatiotemporal coding was used to transmit HFR images over a conventional digital video interface.
The SLTs integrated a microcontroller, sensors, a wireless module, and a battery within the size of an LED panel. They were used to build a wireless sensor network that shares sensed information even when the tiles are moved. AIRR was achieved by using retro-reflective material to form images from an HFR LED panel (\(960\,\mathrm{fps}\)). Ishikawa’s group demonstrated how a system for dynamic information space could be developed by using these four technologies. Their demonstrations included an AR typing interface for mobile devices and a high-speed gaze controller for high-speed human-computer interaction, as well as more integrated systems such as VibroTracker (a vibrotactile sensor for tracking objects) and AIRR Tablet (a floating display with a high-speed gesture user interface).
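The chapter does not describe how the AUTD forms its focal point, but the phased-array principle it names works as follows: each transducer is driven with a phase offset that cancels its own propagation phase to the focus, so all waves arrive there in phase. The minimal sketch below illustrates only that principle. The 40 kHz carrier, 340 m/s speed of sound, 18 × 14 grid with 10 mm pitch, 200 mm focal distance, and all function names are assumptions for illustration (the array described above is much larger), and transducer directivity is ignored.

```python
import numpy as np

SPEED_OF_SOUND = 340.0   # m/s (assumed, air at room temperature)
CARRIER_HZ = 40e3        # Hz (assumed ultrasonic carrier frequency)
K = 2 * np.pi * CARRIER_HZ / SPEED_OF_SOUND  # wavenumber in rad/m

def array_positions(nx=18, ny=14, pitch=0.01):
    """Centered planar grid of transducer positions (m) in the z = 0 plane."""
    xs = (np.arange(nx) - (nx - 1) / 2) * pitch
    ys = (np.arange(ny) - (ny - 1) / 2) * pitch
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)], axis=1)

def focus_phases(positions, focus):
    """Drive phases that cancel each element's propagation phase to the focus."""
    distances = np.linalg.norm(positions - focus, axis=1)
    return np.mod(-K * distances, 2 * np.pi)

def field_amplitude(positions, phases, point):
    """|Sum of spherical waves| at a point (unit sources, no directivity)."""
    distances = np.linalg.norm(positions - point, axis=1)
    return abs(np.sum(np.exp(1j * (K * distances + phases)) / distances))

positions = array_positions()
focus = np.array([0.0, 0.0, 0.2])  # 200 mm above the array center
phases = focus_phases(positions, focus)
peak = field_amplitude(positions, phases, focus)
side = field_amplitude(positions, phases, np.array([0.05, 0.0, 0.2]))
print(side / peak)  # well below 1: the energy concentrates at the focus
```

Moving the focal point only requires recomputing `focus_phases`, which is why a phased array can steer a localized tactile stimulus without moving parts.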

Tachi’s team developed a fairly comprehensive suite of platforms for a tangible information environment [25], including a number of haptic information displays. Gravity Grabber can present normal and tangential forces on a fingertip. The TECHTILE toolkit is an introductory haptic toolkit for disseminating haptic technologies as the third medium in the fields of art, design, and education. A vision-based thermal sensor uses thermosensitive paint and a camera: because the paint changes color according to temperature, it is used to measure thermal changes on the surface of the haptic sensor for telexistence. The telexistence avatar robot system TELESAR V is a master-slave robot system that can provide the experience of an extended “body schema”, allowing the user to maintain an up-to-date representation in space of the positions of her or his various body parts. Retro-reflective Projection Technology (RPT)-based full-parallax autostereoscopic 3D (RePro3D) can generate vertical and horizontal motion parallax, allowing the user to view a 3D image without special glasses when she or he looks at the screen through a half-mirror. RePro3D may be combined with Gravity Grabber to produce tactile interaction with a video image, e.g., a virtual character. An autostereoscopic display called HaptoMIRAGE can produce a 3D image in mid-air with a wide viewing angle of \(180^{\circ }\), which allows up to three users to observe the same image from different viewpoints.

Sato’s team developed a suite of techniques for sensing and controlling human gaze not only in laboratory environments but also in everyday living spaces [24]. They introduced and implemented three key ideas for gaze sensing: (1) an appearance-based gaze sensing method with adaptive linear regression (ALR) that optimally selects a sparse set of training samples for gaze estimation, (2) a new approach to auto-calibrating gaze sensing from a user’s natural viewing behavior, predicted with a computational model of visual saliency, and (3) user-independent single-shot gaze estimation. They studied two methods of guiding human gaze: (1) subtle modulation of visual stimuli based on visual saliency models (e.g., modulation of intensity or color contrast) and (2) the use of a robot’s nonverbal behaviors in human-robot interaction.

Kawahara’s team developed a Smart Posterboard system that could record the conversations and related behaviors of participants during a poster session. The current version consists of a large liquid-crystal display (LCD) screen that can serve as a digital poster and attached sensors that include a 19-channel microphone array on the top, six cameras, and two Kinect sensors. It allowed them to build a multi-modal corpus for detailed quantitative analysis.

1.4.2.3 Application Level

Application depends on a story that can be shared between society and technology. It is quite challenging to spin a story that is not only technologically novel and feasible but also beneficial from the viewpoints of society and business. In our research area, Ishiguro’s team succeeded in developing a suite of sonzaikan media consisting of Telenoid, Hugvie, and Elfoid. Telenoid was created as a test-bed based on the minimal design of a human. Hugvie was a human-shaped cushion phone. Telenoid’s design emphasized human-likeness in visual and tactile information to facilitate human-robot and mediated inter-human interactions, whereas Hugvie focused on a human voice and a human-like touch. Elfoid was a hand-held version of Telenoid; this cellular phone version could connect to a public cellular phone network and was designed to provoke stronger sonzaikan than normal cellular phones. The underlying technologies included motion generation from speech information, and motion generation and emotional expression from visual stimuli. Aizawa’s team developed FoodLog Web and the FoodLog app. FoodLog Web is a system that not only allows users to create a food log simply by photographing what they have eaten but also applies image processing to the uploaded photograph to generate food balance information for food assessment [1]. The FoodLog app runs on smartphones and lets users use photographs as a means of easily adding textual descriptions. Work by other teams is in progress, and its outcomes will be reported in Volume 2 of this book.

1.4.2.4 Social Implementation and Field Study

Long-term public installation plays a critical role in social implementation. It is far more than an opportunity for many citizens to touch and feel state-of-the-art technology: interaction with a large number of people yields honest and frank criticism of the technology, from which researchers can gain invaluable insights for future research. In our research area, we have encouraged the PIs to host long-term exhibitions at the National Museum of Emerging Science and Innovation (Miraikan),Footnote 6 which attracts about a million visitors per year. So far, four teams from our research area, those led by Naemura, Ishiguro, Tachi, and Yagi, have hosted or are hosting a long-term exhibition directly or indirectly related to their research theme for around half a year or more (Fig. 1.7).

Fig. 1.7

Technology exhibitions at the National Museum of Emerging Science and Innovation (Miraikan). Reproduced with the courtesy and permission of Miraikan. a Laboratory for new media 12th exhibition “The Studio—Extend Your Real World—”. b Robot world “Android: What is Human?” c Laboratory for new media 14th exhibition “Touch the World, Feel the Future”. d Laboratory for new media 15th exhibition “Let’s Walk! The first step for innovation”

The first exhibition, by Naemura’s team, was mounted for 164 days from July 3rd, 2013 to January 13th, 2014. Entitled “the studio—extend your real world—”,Footnote 7 it demonstrated display design in a mixed reality environment (Fig. 1.7a) and attracted about 130,000 visitors. During the exhibition period, numerous open events such as introductory talks, workshops, and laboratory events were organized. The second exhibition, by Ishiguro’s team, has been running for over a year since June 25th, 2014. The exhibitionFootnote 8 keeps asking each visitor the question “what is human?” through interaction with androids and a Telenoid (Fig. 1.7b). The estimated number of visitors is more than 500,000. Over ten thousand visitors have actually experienced communication through Hugvie or the androids, and over 1500 media reports had been published by the end of February 2015. The third exhibition, by Tachi’s team, was open under the title “touch the world, feel the future” from October 22nd, 2014 to June 15th, 2015.Footnote 9 It allowed visitors to feel the world through the sense of touch (Fig. 1.7c). Around 140,000 people visited during the first half of the exhibition period. The fourth exhibition, by Yagi’s team, started on July 15th, 2015.Footnote 10 It is about behavior understanding based on an intention-gait model (Fig. 1.7d) and will be held until April 11th, 2016. A progress report will be given in Volume 2 of this book.

A couple of teams have gone further. Ishiguro’s team conducted numerous field studies on sonzaikan media [11]. The acceptability of Telenoid was estimated first; then elderly care with Telenoid, cultural differences in attitudes toward Telenoid, and educational support with Telenoid were investigated in field studies. At the time of writing, fourteen organizations were using the FoodLog Web API provided by Aizawa’s team [1]. A joint project is also ongoing with a nonprofit organization called Table for Two (TFT), which runs a unique program called “calorie transfer” that supports school lunches for children in five African countries, so that eating healthy meals can also help needy children in underdeveloped countries.

1.5 Conclusion

This chapter overviewed the JST-CREST research area on the creation of human-harmonized information technology for a convivial society. I emphasized the changing world as the background and characterized our research area as a challenge to develop key technology for a convivial society in which humans and technology are in harmony. I referred to the late Professor Tohkura’s grand conjecture concerning a five-stage shift in the role of information in society, and pointed out that we are at the stage of an information society and are moving toward an AI-supported society. I have argued that the key to a successful transition to the convivial society lies in human and social potential. Human potential is the power of an individual that enables her or him to actively sustain an endeavor to achieve a goal while maintaining social relationships with other people. It involves vision, activity, sustainability, empathy, ethics, humor, and aesthetic sense. Social potential is the power that a society of people possesses as a whole. It encompasses generosity, supportivity, conviviality, diversity, connectedness, and innovativeness. We believe that our research area contributes to building technology for enhancing human and social potential. The outcome from the first group encompasses a suite of topics ranging from the foundation to social implementation, covering novel subjects such as implicit interpersonal information, sense-centered human interfaces, excessive trust, and the sense of presence (sonzaikan). The applications include FoodLog and a suite of sonzaikan media that have been socially implemented through field trials.