Introduction

An aging society requires extension of healthy life expectancy and improvements in the quality of life of older people. A certain amount of communication is essential for maintaining quality of life, and robots that encourage smooth communication are therefore being actively developed (Yamamoto et al. 2002; Matsui et al. 2010; Kanoh et al. 2011). Communication robots stimulate human-robot interaction to achieve this purpose. Kanda et al. found that observing dialogue between robots encourages people to communicate with the robots naturally and smoothly (Kanda et al. 2002). In accordance with those results, a pair of manzai robots was subsequently developed as a passive medium (Umetani et al. 2014).

This chapter introduces a manzai robot system: an entertainment robot system used as a passive medium, based on manzai scenarios created automatically from web news articles. Manzai is a traditional Japanese style of standup comedy that is usually performed by two comedians, a stooge and a straight man, and typically comprises a humorous dialogue routine. In Japan, manzai performances are now broadcast on various media outlets such as weekend television; consequently, people have grown very familiar with manzai. The manzai robots are constructed so that audiences can observe entertaining dialogue between the robots as a socially passive medium. The manzai robot system focuses on content generation using an automatic script creation function.

The manzai robots automatically generate their manzai scripts from web news articles, based on keywords suggested or given by the audience and on search results from the World Wide Web (WWW), and then perform the scripts. A manzai scenario consists of three parts: tsukami (the opening greeting), honneta (the main body of the manzai scenario), and ochi (the conclusion of the manzai performance). The style of the manzai scenario is "shabekuri manzai," which means that the performance consists only of talk.

Several studies have been conducted on the performance motion of communication robots, for example, "Robot Manzai" (Hayashi et al. 2008). In those studies, the scripts for performances – the motions of the robots and the timing of the performance – are created by an engineer in advance. In contrast, the proposed manzai robot system focuses on content generation. The manzai scenarios are created automatically using data-mining techniques, and the manzai performance methodology is derived from Internet-based news articles (Mashimo et al. 2014). The manzai scenarios serve as manzai scripts for each robot and are written in Extensible Markup Language (XML). Each robot then performs its corresponding role in the manzai script.

Manzai performed as robot dialogue becomes a medium for information exchange, on the assumption that information conveyed through manzai scenarios is more familiar and accessible. Matsumoto, a professional manzai scenario writer, states that "Manzai scenarios based on news articles are the easiest for people to understand" (Matsumoto 2008). Given that assertion, manzai scenarios are generated from Internet news articles using several intelligence techniques, such as word ontology and search tags obtained from Internet search engines. The flow of the manzai scenario generation system is depicted in Fig. 1. The procedure of the manzai robot system is as follows:

Fig. 1

Overview of automated creation of manzai scenarios (Mashimo et al. 2014) © 2014 Association for Computing Machinery, Inc. Reprinted by permission

(1) A user inputs a keyword to the system.

(2) The system obtains a news article that includes the keyword from the Internet.

(3) The manzai robot system generates, in real time, a manzai scenario consisting of humorous dialogues that use snappy patter and misunderstandings. Subsequently, the system generates the manzai script for the manzai robots.

(4) The manzai robots perform the manzai script in real time.

This chapter introduces a method for automatic creation of manzai scenarios from web news articles and for management of the manzai robot system. Then, a component-based manzai robot system designed to make the system scalable is outlined. Further, the chapter verifies the potential of the manzai robot system by implementing the management and automatic scenario creation systems on real robots.

Related Work

Hayashi et al. proposed a "robot manzai" system for human-robot communication (Hayashi et al. 2005). This system realized manzai using a pair of robots. They conducted an experiment comparing "robot manzai" with a video of manzai performed by humans and demonstrated the usefulness of "robot manzai" as entertainment. Their work specifically examines the robots' actions in manzai, whereas the present work particularly examines the scenario, that is, the manzai content.

Our proposed system also has value in that it facilitates understanding of news content. Park et al. developed "NewsCube," a news browsing service that mitigates media bias (Park et al. 2009). Kitayama and Sumiya proposed a new search method for retrieving comparison contents from news archives (Kitayama and Sumiya 2007). Our proposed system aims to help users understand news content by generating humorous dialogue based on news articles.

Numerous studies have presented and assessed dialogue analysis; Ishizaki and Den provide a good summary of work in this area (Ishizaki and Den 2001). Their approaches include analyses of real-world dialogues and extraction of intentions from dialogue, whereas our efforts include the generation of humorous dialogue from Internet news articles. Numerous studies have also examined conversational agents (Schulman and Bickmore 2009; Bouchet and Sansonnet 2009; Ishii et al. 2013). In almost all of these studies, CG characters communicate with humans through dialogue, with applications in education, entertainment, communication support, and other fields. In our research, by contrast, people are expected to be entertained and healed by watching robot dialogue based on our automatically generated scenarios.

Research in the field of robotics has investigated many entertainment robots (Yamaoka et al. 2006; Shiomi et al. 2008; Khosla and Chu 2013; Rae et al. 2013). Paro is an entertainment robot that heals people with its cute gestures (Shibata 2010). PALRO, ifbot (Kanoh et al. 2005), KIROBO, and others communicate with people using cute dialogue. These robots talk with people, whereas our manzai robots converse with each other; people need only watch their humorous dialogue to be entertained.

Automated Generation of Manzai Scenario from Web News Articles

This section describes a method for automated generation of manzai scenarios from web news articles. First, the section illustrates manzai, a traditional Japanese style of standup comedy usually performed by two comedians (a stooge and a straight man), and the structure of a manzai performance. Then, the method for generating each part of a manzai scenario is described.

What Is Manzai?

This section proposes an automatic manzai scenario generation system that produces humorous dialogue from web news articles. Japan has a traditional form of comedy routine called manzai, which typically consists of two comedians performing humorous dialogues. It is similar to "stand-up comedy" in English-speaking countries or "xiang sheng" in China. Manzai has remained extremely popular in Japan over time: manzai shows are broadcast on TV every weekend, and people continue to have strong familiarity with manzai. The manzai metaphor is therefore useful for creating acceptable humorous robot dialogue.

Manzai usually includes two performers: one is the boke, or stooge; the other is the tsukkomi, or straight man. The boke says things that are stupid, silly, or out of line, sometimes using puns, whereas the tsukkomi delivers quick, witty, and often harsh responses. Regarding our manzai robots, Ai-chan (left side of Fig. 2) is the tsukkomi (straight woman); Gonta (right side of Fig. 2) is the boke (stooge). Furthermore, the manzai scenario has a three-part structure: the Introduction, the Body, and the Conclusion.

Fig. 2

Manzai robots (Left: ii-1, Ai-chan; Right: ii-2, Gonta)

Introduction Part

The Introduction part consists of several short dialogues. In the system, it consists of a greeting and a discussion in which the theme of the original web news is first presented.

Body Part

The Body part is the main part of the manzai scenario and consists of humorous dialogue. It is organized as blocks of humorous dialogue called dialogue components. Each dialogue component is created automatically from a sentence in a web news article; that is, the manzai scenario generation system creates a dialogue component, a set of humorous dialogue, from a sentence in a web news article. The system generates humorous dialogue based on the structure of funny points, using gaps of two types: rival relations and sentiment gaps (see Fig. 3).

Fig. 3

Structure of funny point

Conclusion Part

The Conclusion part includes a farewell and a final laughing point. In our system, the Conclusion part is a riddle using words related to the web news. Figure 4 shows the correspondence between a web news article and an automatically created scenario.

Fig. 4

Image of correspondence of web news and scenario

What Is a Funny Point?

In Japan, the usual points of humor are snappy patter and misunderstandings, and the generated manzai scenarios include dialogue of that type. As noted above, Matsumoto states that "Manzai scenarios based on news articles are the easiest for people to understand" (Matsumoto 2008). Given that assumption, manzai scenarios are generated from web news articles using many kinds of knowledge, such as word ontology, sentiment extraction, and similar words obtained from Internet searches.

Abe has described the structure of funny points as consisting of a gap separating concepts (see Fig. 3) (Abe 2006). He states that the concept gap is an important aspect of humorous points, and that point is specifically exploited in the manzai scenario generation system. The structure of the funny points for dialogue is based on Abe's structure: the original sentence of the web news article is Abe's common condition, a word in the original sentence is concept A, and a word extracted by the system is concept B.

In this study, two types of gap, both important for funny dialogue, are proposed according to Abe's results. One is a rival relation; the other is a sentiment gap. For a rival relation, rival words such as "baseball" and "soccer" are extracted from the web. For a sentiment gap, a sentiment is extracted from the original sentence, and a new sentence having the opposite sentiment is created. This chapter proposes means of extracting rival keywords from the web and of creating sentiment gaps, and the proposed techniques are used to generate manzai scenarios for the robots.

How to Generate Manzai Scenario

The flow of the manzai scenario generation system is the following:

(1) A user inputs a keyword to the system.

(2) The system obtains a news article that includes the keyword from the Internet.

(3) In real time, the system generates a manzai scenario consisting of humorous dialogues based on knowledge from the web.

(4) The robots Ai-chan and Gonta (see Fig. 2) perform the manzai scenario in real time.

When a user inputs keywords for the manzai to be watched, the system automatically generates a web news-based manzai scenario structured like an actual manzai routine, with an Introduction part, Body part, and Conclusion part. For each part, the system generates humorous dialogue from web news articles together with related information obtained from the Internet, transforming declarative sentences of the articles into humorous dialogue based on the two proposed types of gap.

Introduction Part

The Introduction part consists of a first laughing point with a greeting and a connection from the theme of the manzai to the main part. In the system, the first laughing point is a season-related greeting, along with presentation of the theme obtained from the original web content. The system maintains a seasonal dialogue database and chooses dialogue from it. For example, in October, when the web news title is "Bolt Achieves Triple Crown!!", the Introduction part includes an October greeting and proceeds as follows, where A denotes Ai-chan and G denotes Gonta.

  • G: Hello Everyone. It’s October. Halloween season!

  • A: Ah yes, Halloween season is coming. Anyway, have you heard the news?

  • G: Hmm… “Bolt Achieves Triple Crown!!” I did not know that.

  • A: That sounds interesting. Let’s start a manzai routine about the Bolt Achieves Triple Crown!!
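As a sketch of this step, the greeting lookup might look like the following Python fragment, where the month-to-greeting table is a hypothetical stand-in for the system's (Japanese) season dialogue database:

```python
from datetime import date

# Hypothetical stand-in for the system's season dialogue database;
# the real database holds Japanese seasonal greetings.
SEASON_GREETINGS = {
    10: "Hello Everyone. It's October. Halloween season!",
    12: "Hello Everyone. It's December. Almost Christmas!",
}
DEFAULT_GREETING = "Hello Everyone. Nice to see you!"

def introduction_lines(news_title, today=None):
    """Build the Introduction part: seasonal greeting, then the news theme."""
    month = (today or date.today()).month
    greeting = SEASON_GREETINGS.get(month, DEFAULT_GREETING)
    return [
        ("G", greeting),
        ("A", "Anyway, have you heard the news?"),
        ("G", f'Hmm... "{news_title}" I did not know that.'),
        ("A", f"Let's start a manzai routine about the {news_title}"),
    ]
```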

Body Part

The Body part of the automatically generated scenario consists of multiple components, where one component realizes one funny technique. In this study, five types of funny technique are proposed: "exaggeration," "word-mistake," "nori-tsukkomi," "rival-mistake," and "sentiment-mistake." The proposed structures of funny points are gaps within the dialogue. The manzai scenario generation system generates humorous dialogue based on our structure of funny points (see Fig. 3), using gaps of four types: number gap, topic gap, rival relation, and sentiment gap.

(1) Exaggeration component

The first step in generating dialogue using exaggeration is to use impossible (much larger or smaller) numbers; this component is based on the number gap. This humorous technique is commonly used in dialogue routines in Japan, so people are familiar with this type of exaggeration. When a sentence includes numbers, the system increases the numbers by some substantial and unbelievable factor. For example, if a sentence is "Bolt had already won the 100 m and the 200 m," the exaggerated dialogue becomes

  • G: At the championships, Bolt had already won the 100 m and the “200 km.”

  • A: Two hundred kilometers is way too long! Don’t you think that’s odd?

  • G: You are right! It is not 200 km, but 200 m. I misunderstood.
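As a minimal sketch of the number-gap step, the following fragment finds the first number in a sentence and inflates it; the fixed factor of 1000 is an illustrative assumption (the actual system may also play with units, as in the 200 m / 200 km example):

```python
import re

def exaggerate(sentence, factor=1000):
    """Number-gap sketch: replace the first number with an absurdly large one.

    Returns None when the sentence contains no number, i.e., the
    exaggeration component cannot be applied to it.
    """
    match = re.search(r"\d+", sentence)
    if match is None:
        return None
    absurd = int(match.group()) * factor  # factor is an illustrative assumption
    return sentence[:match.start()] + str(absurd) + sentence[match.end():]
```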

(2) Word-mistake component

This component introduces an intentional word mistake into the dialogue; it is based on the topic (word) gap. A typical word mistake changes a single word into another word by changing only one character, such as "fish" and "dish."

  • G: “Belt” pulled away at once while the USA botched passing the baton to the anchor runner.

  • A: How can that be? Belt? It’s not belt, but Bolt!

  • G: Whoops. I made a careless mistake.

The manzai robot has only one dictionary, a Japanese dictionary for children stored on the server (Ai-chan) machine. The dictionary is used to extract mistaken words such as "Belt." Japanese syllables consist of vowels and consonants, so it is a simple matter to change the first consonant to another one. If changing the first consonant yields another word in the dictionary, that word becomes a candidate mistaken word. For example, in Japanese, "touhyou (vote)" can be changed to "kouhyou (favorable comment)."

After a word mistake, Ai-chan, the tsukkomi, emphasizes the mistaken word by explaining it. In the example given above, Ai-chan explains "belt" as "a flat, long, narrow piece of cloth or leather mainly used for fixing an object."
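A rough sketch of the candidate search follows. It swaps the first letter of an English word against a toy word list, standing in for the real system's swap of the first consonant of a Japanese syllable against the children's dictionary:

```python
# Toy word list; an illustrative assumption standing in for the
# Japanese children's dictionary on the Ai-chan server.
TOY_DICTIONARY = {"bolt", "belt", "fish", "dish", "hose", "nose"}

def mistaken_candidates(word, dictionary=TOY_DICTIONARY):
    """Return dictionary words differing from `word` in the first letter only."""
    word = word.lower()
    candidates = []
    for letter in "abcdefghijklmnopqrstuvwxyz":
        candidate = letter + word[1:]
        if candidate != word and candidate in dictionary:
            candidates.append(candidate)
    return candidates
```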

(3) Rival words gap-based dialogue

If the words in a dialogue have a mutual rivalry, the system changes the original word in a sentence to its rival word; the dialogue then becomes a sentence that includes a misunderstanding, which serves as a funny point. Examples of rival word pairs are "Tokyo" and "London," and "baseball" and "soccer"; the two words in each pair are a contrasting pair. The system extracts a word that is a rival to the keyword and replaces the keyword with the rival word. The following is an example of dialogue based on a rival mistake:

  • A: … By the way, do you know what “Usain Bolt” is like?

  • G: He’s, you know, famous as the Olympian of the century, right?

  • A: No! You must be confused with Carl Lewis. Usain Bolt is a Jamaican sprinter. He is the fastest sprinter in human history with the nickname of “Lightning Bolt.”

  • G: Is that so? But they are similar enough, aren’t they?

Definitions of proposed rival words are the following:

(3-1) They have the same upper ontology.

The upper ontology of "Tokyo" and "London" is national capitals; that of "baseball" and "soccer" is ball sports. Each pair shares the same upper ontology. To extract the upper ontology, the system uses the hierarchical structure of Wikipedia. A word usually has not just one but multiple upper ontologies, and all of them are used to extract rival words. The rival words are child words of the upper ontologies, so many words can become candidate rival words. The system ranks the candidates, and the top-ranking word becomes the rival word. An upper ontology that has few child words is considered more important than one that has many child words, because the former is closer to the instance level. The system then scores each candidate rival word using the following expressions:

$$ \mathrm{Sta}(s_i) = 1 - \log\frac{n}{N}, $$
$$ \mathrm{Rel}(e_i) = \sum_{i=0}^{m}\mathrm{Sta}(s_i). $$
(1)

Here, s_i denotes an upper ontology and Sta(s_i) signifies the weight of s_i; n represents the number of words in s_i's lower ontology, and N is the number of words in the corpus. Further, e_i stands for a candidate rival word, Rel(e_i) signifies its ranking weight, and m represents the number of e_i's upper ontologies that are shared with the keyword. In this work, the system uses N = 2,931,465 words.

(3-2) They have a similar recognition degree.

The candidate rival words of "baseball" include "soccer" and "futsal." The degree of recognition of soccer is more similar to that of baseball, so the system regards soccer as a better rival word of baseball than futsal; that is, a good rival pair has similar degrees of recognition. The system measures the degree of recognition as the number of results obtained from a web search. The similarity of the degrees of recognition, Con(key, e_i), between the keyword key and a candidate e_i is calculated as follows:

$$ \mathrm{Con}(\mathrm{key}, e_i) = 1 - \log\frac{\left|\mathrm{Cog}(\mathrm{key}) - \mathrm{Cog}(e_i)\right|}{\max\left\{\mathrm{Cog}(\mathrm{key}),\, \mathrm{Cog}(e_i)\right\}}. $$
(2)

In that equation, Cog(key) is the number of web search results for the keyword key, and Cog(e_i) is the number of web search results for the candidate e_i.

After calculating Rel(e_i) and Con(key, e_i), the system takes their geometric mean as the ranking weight. The word having the highest ranking weight becomes the rival word. Subsequently, the rival-mistake component is created by changing the keyword to the rival word, generating a humorous misunderstanding dialogue.
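Equations (1) and (2) and the final geometric-mean ranking can be sketched as follows; the corpus size is the one stated in the text, while the child counts and web-hit counts passed in are illustrative assumptions standing in for the Wikipedia hierarchy and a search engine:

```python
import math

N_CORPUS = 2_931_465  # number of words in the corpus (from the chapter)

def sta(n_lower):
    """Sta(s_i) = 1 - log(n/N): weight of one upper ontology s_i,
    where n_lower is the number of words in its lower ontology."""
    return 1 - math.log(n_lower / N_CORPUS)

def rel(lower_counts):
    """Rel(e_i): sum of Sta over the upper ontologies shared with the keyword."""
    return sum(sta(n) for n in lower_counts)

def con(hits_key, hits_cand):
    """Con(key, e_i) = 1 - log(|Cog(key) - Cog(e_i)| / max(Cog(key), Cog(e_i)))."""
    gap = abs(hits_key - hits_cand)
    if gap == 0:
        return 1.0  # identical recognition degree; avoid log(0)
    return 1 - math.log(gap / max(hits_key, hits_cand))

def ranking_weight(lower_counts, hits_key, hits_cand):
    """Geometric mean of the two criteria, used to pick the rival word."""
    return math.sqrt(rel(lower_counts) * con(hits_key, hits_cand))
```

An ontology with fewer child words yields a larger Sta value, matching the intuition that it is closer to the instance level.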

Table 1 presents examples of the extracted rival words.

Table 1 Rival words related to keywords

(4) Sentiment gap-based dialogue

Sentiment mistake gaps of two types are proposed in this study: a word sentiment mistake type and a sentence sentiment mistake type.

(4-1) Word sentiment mistake

It is inferred that a mistake in the sentiment of a word is interesting. Here, the traditional Japanese technique called "nori-tsukkomi" is used. In a nori-tsukkomi, the boke (stooge) first states some outrageous sentence and plays the clown. Next, the tsukkomi (straight man) goes along with the boke's stupid sentence. Then, the tsukkomi states the correct sentence and makes a fool of both the boke and himself. Generating such dialogue involves three questions, as described below.

(1) How to create the first outrageous statement?

(2) How does the tsukkomi agree with the boke's line?

(3) How does one create a corrective sentence?

The word-mistake technique is used in (1): a single character is changed to produce a different word, such as "hose" and "nose." In (2), the sentiment of the word mistaken in (1) is used. The system extracts sentiment words that co-occur with the mistaken word from the web and calculates their co-occurrence ratios: it searches web pages using the word whose sentiment is to be extracted as the keyword, extracts sentiment words (adjectives) from the snippets of the results, and calculates the co-occurrence ratio between the word and each sentiment word. The sentiment word with the highest co-occurrence ratio becomes the sentiment of the mistaken word. In (3), the system uses the original sentence of the web news article. An example of nori-tsukkomi follows:

  • G: “Belt” pulled away at once while the USA botched passing the baton to the anchor runner.

  • A: Stop right there. Belt is really long.

  • A: … Wait.

  • A: How can that be? Belt? That’s a flat, long narrow piece of cloth or leather mainly used for fixing an object, isn’t it! It’s not belt, but Bolt!

  • G: Whoops. I made a careless mistake.

In (2) of the example above, the tsukkomi agrees on the basis of the impression word of the mistaken word (for "belt," the impression word is "long"), but after that agreement Ai-chan points out the mistaken word. At this point, the system extracts a sentence describing the mistaken word from Wikipedia. A Wikipedia sentence is usually stiff and formal, which heightens the humorous contrast with the dialogue.
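Step (2), choosing the impression word of the mistaken word, can be sketched as follows; the snippet texts and adjective list are illustrative assumptions standing in for live web-search snippets and a Japanese adjective lexicon:

```python
import re
from collections import Counter

# Illustrative adjective lexicon; the real system extracts Japanese
# adjectives from web-search snippets.
ADJECTIVES = {"long", "narrow", "tight", "fast", "flat"}

def impression_word(snippets, adjectives=ADJECTIVES):
    """Return the adjective that co-occurs most often in the snippets."""
    counts = Counter()
    for snippet in snippets:
        for token in re.findall(r"[a-z]+", snippet.lower()):
            if token in adjectives:
                counts[token] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```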

(4-2) Sentence sentiment mistake

In this component, a sentence in the dialogue is given the opposite sentiment to the original sentence. This is a misunderstanding in the dialogue, and it becomes the gap in the structure of funny points. For example,

  • G: “Bolt Achieves Triple Crown.” It is very sad news.

  • A: What? Why do you think the news is sad?

  • G: Because “Bolt Achieves Triple Crown.” I feel very sad…

  • A: Do you want to get triple crown? Are you so fast?

The system calculates the sentiment of the sentence and changes a word so that the sentence has the opposite sentiment to the original; Ai-chan then corrects Gonta's sentiment. In the above case, Gonta's correct first sentence is "Bolt Achieves Triple Crown," which is good news in the original web article. The system changes "achieves" to "lost," reversing the sentiment of the sentence.

When the system generates dialogue based on the sentiment mistake gap, the multidimensional sentiment model with three bipolar scales proposed by Kumamoto (Kumamoto et al. 2011) is applied. Kumamoto's three bipolar scales are sufficient for calculating web news article sentiments because they were created using news articles. The bipolar scales are "Happy – Sad," "Glad – Angry," and "Peaceful – Strained." A sentiment mistake gap-based dialogue is generated as follows:

(1) The system calculates the sentiment of each sentence in a web news article using Kumamoto's sentiment extraction tool (Kumamoto et al. 2011). The results are sentiment values on each bipolar axis, normalized so that values from 1.0 to 0.0 indicate the left side of the axis and values from 0.0 to −1.0 indicate the right side. For example, when the result is happy – sad at 0.12, glad – angry at 0.26, and peaceful – strained at −0.07, the sentiment values of the sentence become happy at 0.12, glad at 0.26, and strained at 0.07.

(2) The system takes the highest sentiment value of a sentence as the sentiment A_e of the sentence. In the example of (1), glad becomes the sentiment of the sentence.

(3) The system extracts the word W_e that has the highest value for A_e. Then the antonym T_e of W_e is extracted from the antonym corpus.

(4) A new sentence using T_e is created.
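Steps (1)-(4) can be sketched as follows; the per-axis scores would come from Kumamoto's tool, and the antonym table here is a toy stand-in for the antonym corpus:

```python
# Toy antonym table; an illustrative assumption standing in for the
# antonym corpus used by the system.
ANTONYMS = {"achieves": "loses", "wins": "loses", "happy": "sad"}

def dominant_sentiment(scores):
    """Step (2): the axis with the largest absolute value gives A_e.
    Keys are 'left-right' axis names, e.g. 'happy-sad'; positive values
    point to the left pole, negative values to the right pole."""
    axis, value = max(scores.items(), key=lambda kv: abs(kv[1]))
    left, right = axis.split("-")
    return left if value >= 0 else right

def sentiment_mistake_sentence(sentence, scored_words, antonyms=ANTONYMS):
    """Steps (3)-(4): swap the highest-scoring word W_e for its antonym T_e."""
    w_e = max(scored_words, key=scored_words.get)
    t_e = antonyms.get(w_e.lower())
    if t_e is None:
        return None
    return sentence.replace(w_e, t_e)
```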

Conclusion Part

The Conclusion part consists of a farewell and the final laughing point. The system uses an automatically generated riddle in the Conclusion part. Japanese people are familiar with riddles, which have long been used for entertainment. Japanese riddles usually take the form "How are X and Y similar?" The answer is typically some form of homonym or pun: Z and Z′. For example,

  • G: What is the similarity between Bolt and driver?

  • A: I do not know. What?

  • G: Both of them have a track (“truck”)!!

In this example, Bolt is X, driver is Y, track is Z, and truck is Z′.

The system first extracts the word X (Bolt) from the web news article. Next, it extracts words that have a high co-occurrence ratio with X from the Internet as candidates for word Z. The system then extracts homonyms of the candidates for Z from the dictionary inside Ai-chan. If a candidate for Z has a homonym, it becomes Z, and the homonym becomes Z′. Subsequently, the word having the highest co-occurrence ratio with Z′ is extracted from the Internet and becomes Y. Finally, the system generates dialogue based on X, Y, Z, and Z′.
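The X/Y/Z/Z′ procedure can be sketched as follows; the co-occurrence lists and homonym table are illustrative assumptions standing in for the web statistics and Ai-chan's dictionary:

```python
# Illustrative assumptions standing in for web co-occurrence statistics
# and the homonym lookup in Ai-chan's dictionary.
COOCCURRING = {"Bolt": ["track", "sprint", "gold"]}   # candidates for Z
HOMONYMS = {"track": "truck"}                         # Z -> Z'
COOCCURRING_WITH_HOMONYM = {"truck": "driver"}        # picks Y

def make_riddle(x):
    """Build the Conclusion riddle from X via Z, Z', and Y."""
    for z in COOCCURRING.get(x, []):  # highest co-occurrence first
        z_prime = HOMONYMS.get(z)
        if z_prime is None:
            continue
        y = COOCCURRING_WITH_HOMONYM.get(z_prime)
        if y is None:
            continue
        return (
            f"G: What is the similarity between {x} and {y}?\n"
            "A: I do not know. What?\n"
            f'G: Both of them have a {z} ("{z_prime}")!!'
        )
    return None
```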

Figure 5 shows a scenario generated automatically from the web news article in Table 2.

Fig. 5

Example of automatically generated manzai scenario

Table 2 Original web news article

Manzai Robot System

This section describes a manzai robot system that participates in a manzai performance in accordance with the manzai script generated by a manzai scenario generator. First, the configuration of the manzai robots is illustrated. Next, implementation of the manzai robot system is outlined and the manzai scripts are analyzed. Finally, experimental verification of the manzai robot system is discussed.

Manzai Robots

Figure 2 shows the manzai robots developed in this study. The taller robot, ii-1, Ai-chan, which is about 100 cm tall, performs the role of tsukkomi – the straight woman – and the shorter one, ii-2, Gonta, which is about 50 cm tall, performs the role of boke – the stooge. Tsukkomi and boke have fixed roles.

Each robot has a computer on its back, and the two communicate via a wireless LAN. The computer on the tsukkomi robot is a server connected to the Internet; it directly obtains articles and automatically creates manzai scripts.

The two robots each have the following functions:

  • Locomotion and rotation using Pioneer 3-DX (Mobile Robots, Inc.)

  • Creation of facial expressions by switching images on the eye display

  • Speech generation of a synthesized script-based voice

The manzai robot system manages operations using these functions in accordance with the manzai scripts which are the manzai scenarios for the manzai robots.

System Configuration

Figure 6 shows the configurations of the manzai robots. The PC mounted on ii-1 creates manzai scripts, time flows, and manzai performance schedules. The PC mounted on ii-2 communicates with the PC on ii-1, receiving instructions on lines, motions, and expressions to create the data for robot motion and speech.

Fig. 6

System configuration of manzai robot system

After the scripts are created, the two robots perform script-based manzai. The manzai script progress management program running on the PC on ii-1 obtains and processes the information for the next lines to be spoken, facial expressions, and motions based on the progression of the script. When it is the turn of ii-2 to perform, this information is transmitted to the PC on ii-2.

On receiving the information, the operation program running on the PC creates synthesized voices, facial expression changes, and robot motions as required. Facial expressions change automatically, corresponding to the emotions and the eye movements that make up the expressions. The robots operate based on the operation commands they receive.

On completion of the motions that drive the robot mechanism, termination information is sent to the script progress management program of the PC mounted on ii-1. The system then moves to the next session of the manzai script based on this information.

Command of Robot Motion for Manzai Performance

Manzai scripts for the manzai performance are served as Extensible Markup Language (XML) scripts (Nadamoto and Tanaka 2005). XML scripts are generally used for motion media robot systems and networked robots to facilitate scenario-based control (Kitagaki et al. 2002; Tezuka et al. 2006). The manzai robot system automatically generates the manzai scripts from WWW news articles corresponding to keywords given by the audience. The manzai robot system analyzes the manzai script, and each robot then moves and speaks in the skit according to the script.

The sentences in each section of the manzai script are the control commands of the manzai robot system. The control commands are of three types: motion, expression, and speech of dialogue in the scenario. The following are examples of control commands. (Note that "Mary" and "Bob" in the manzai script correspond to ii-1 and ii-2, respectively.)

<look name="Mary" what="audience"/>

This command indicates information about the direction of the robot in manzai performance, that is, “which” robot is directed in “which” direction. In this example, the manzai script intends that “Mary” look directly to “audience” (face the audience).

<PEmo name="Bob">PE11</PEmo>

This command indicates the facial expression of a robot in the manzai performance, that is, "which" robot expresses "which" facial expression. The "PExx" code specifies the intended facial expression, where "xx" is the number of the facial expression pattern; the robot shows the facial expression according to the command.

<cast name="Mary">Speech is here.</cast>

This command indicates speech in the manzai performance, that is, "which" robot speaks and the contents of its speech.

The robot analyzes the commands for the manzai performance, and the robots are then controlled according to the commands, for example, by synthesizing speech, moving, or changing facial expressions.
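As a sketch of this script analysis, the following fragment parses the three command types; the `<scenario>` wrapper element and the well-formed attribute syntax are assumptions made so that the script parses as standard XML:

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed manzai script; the <scenario> wrapper is an
# assumption so the commands parse as one XML document.
SCRIPT = """<scenario>
  <look name="Mary" what="audience"/>
  <PEmo name="Bob">PE11</PEmo>
  <cast name="Mary">Speech is here.</cast>
</scenario>"""

def dispatch(script_xml):
    """Turn a manzai script into a list of (robot, command, payload) steps."""
    steps = []
    for element in ET.fromstring(script_xml):
        robot = element.get("name")
        if element.tag == "look":
            steps.append((robot, "look", element.get("what")))
        elif element.tag == "PEmo":
            steps.append((robot, "expression", element.text.strip()))
        elif element.tag == "cast":
            steps.append((robot, "speak", element.text.strip()))
    return steps
```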

The potential of the manzai robot system was verified by implementing a full manzai robot system including an automatic manzai script creation system and a management system using real robots.

Experiments

We conducted user experiments to assess the benefits of our proposed automatic humorous dialogue generation system. The experiments included 11 participants, all of whom are Japanese and like manzai. We used two manzai scenarios generated automatically by our system, which the manzai robots Ai-chan and Gonta performed. We evaluated two aspects of the system: each part (or component) of the manzai scenario, and the whole manzai scenario, with the goal that the scenarios be both interesting and understandable. After watching the manzai performance, participants completed a questionnaire using a five-point scale (5, highest; 3, middle; 1, lowest).

Interest and Comprehension Ratings of Each Part and Component of the Manzai Scenario

We evaluated the interest and comprehension of each part and component of the Manzai scenario routine. We asked participants to report their interest and comprehension of the three structural components of the Manzai scenario: the Introduction, Body, and Conclusion. We also asked them to report their interest and comprehension of five components: exaggeration, word-mistake, nori-tsukkomi, rival-mistake, and riddle.

Results and Discussion

The results of the experiment are presented in Fig. 7. The scores are high, indicating that our proposed Manzai scenario technique produces interesting and understandable routines. The rival-mistake component elicited the lowest score, suggesting that Japanese people do not usually use rival mistakes in humorous dialogue. Each part, namely the introduction, body, and conclusion, also received a high score. We regard these structures as familiar to Japanese people because real Manzai routines have the same structure.

Fig. 7
figure 7

Results of interest and comprehension for total manzai scenarios (Mashimo et al. 2014) © 2014 Association for Computing Machinery, Inc. Reprinted by permission

Interest and Comprehension of Whole Manzai Scenarios

We evaluated the interest and comprehension of whole Manzai scenarios. We asked participants to complete a questionnaire with the following questions:

  1. How interesting is it to watch Manzai robots?

  2. Is the speaking speed of Manzai scenarios appropriate?

  3. Do you feel that the number of component types in the scenario is sufficient?

  4. Is the Manzai scenario understandable?

  5. Did transforming the news page into a Manzai scenario make it easier for you?

Results and Discussion

The experiment results are presented in Fig. 8. The average score for Q1 was 3.45, and seven participants answered 4; our proposed method therefore generated humorous dialogue. For Q2, four participants answered 2 and said that the speaking speed of the Manzai robots was poor, although two participants felt it was only a little fast; impressions of speaking speed apparently differ among people. The average for Q3 was 3.82, and no participant answered 2, indicating that the components of the humorous dialogue are sufficient. The averages for Q4 and Q5 were both 3.36, which suggests that the generated humorous dialogue does not lose the information of the original news article. Therefore, we can deliver news articles through humorous dialogue.

Fig. 8
figure 8

Results of interest and comprehension for whole manzai scenarios (Mashimo et al. 2014) © 2014 Association for Computing Machinery, Inc. Reprinted by permission

Component-Based Manzai Robot System with Scalability

This section describes a component-based manzai robot system with a small body and the scalability of the software development for the manzai robot system. First, the objectives of the component-based manzai robot system are illustrated. Next, the implementation of the component-based manzai robot system is outlined. Finally, verification of the system is discussed with respect to scalability of the robot system, including the development of manzai robots and the associated interactive information system.

Development of Component-Based Manzai Robot System

The conventional manzai robot system has several problems (Umetani et al. 2014):

  • The manzai robots are quite large; the manzai robot system consists of two generic PCs, mobile robot bases, and the robot bodies. This makes it difficult to conduct manzai performances for experiments.

  • The robot system for manzai performance is complicated. Two robots are needed to perform a manzai skit, and distributed controllers for each robot and a management system for the overall robot system are also required. It is therefore difficult to add new performance functions to the conventional manzai robot system.

  • The software development environments for robots change during software life cycles.

Consequently, a component-based manzai robot system was developed to overcome these problems. The requirements of this manzai robot system are as follows:

  • The robot system should have sufficient portability.

  • The robot system should perform manzai using the manzai script used by the conventional manzai robot system.

  • Distributed middleware should be applied to facilitate manzai performance data communication between robots. The software of the system is built using individual software components; therefore, it is easy to reuse the control software.

  • The developed manzai robot software components are executed on the conventional manzai robot system with minimal changes to the hardware of the system such as the mobile robot base, display, and the user interface parts.

Figure 9 shows the developed component-based manzai robot system (Umetani et al. 2015). Two robots are used to perform the manzai skit; the left one is ii-1s (Ai-chan), and the right one is ii-2s (Gonta). The height of ii-1s is 250 mm and that of ii-2s is 150 mm; the width of both robots is about 150 mm. The RT middleware (Ando et al. 2005) was utilized to develop the distributed software components of the manzai robot system.

Fig. 9
figure 9

Component-based manzai robot system (Left: ii-1s, Right: ii-2s)

Implementation of the Component-Based Manzai Robot System

System Configuration

The RT middleware scheme for the manzai robot system was applied to develop additional functions for the manzai robot system such as a control interface. Figure 10 shows the system architecture of the manzai robot system.

Fig. 10
figure 10

System architecture of the component-based manzai robot system

The robot system runs on two generic PCs with Windows OS, each controlling one robot. The PC for robot ii-1s has the following roles:

  1. Generation of manzai scripts

  2. WWW server for facial expression data (managed by an Apache WWW server)

  3. Speech synthesis of the dialogues for manzai performances

  4. Time management of the manzai script during manzai performances

  5. Control of the mobile robot base of ii-1s, dialogue speech playback, and web page management for facial expressions

The PC for robot ii-2s is responsible for controlling the mobile robot base of ii-2s, playing the dialogue speech for ii-2s, and managing the web pages for facial expressions. The distributed control scheme for each robot makes it easier to change the speaker during manzai performances and to manage hardware resources. On the other hand, synchronization of the manzai scripts during the performance and a control scheme for each networked PC are required. Hence, the RT middleware is applied to control the networked distributed robot system.

The robot system uses a Vstone Beauto Rover RTC BT as the mobile base of each manzai robot. An iPod touch is used as the viewer for the facial expression of each manzai robot. The output of the speech voice of the manzai scripts and control of the mobile bases are executed by the individual generic Windows PCs. The main controller PC is on the wireless network. A WWW server for the facial expressions, a viewer for the balloon information system, and a viewer for the user interface are executed on the controller PC. The facial expression is shown by a web browser running on the iPod touch mounted on each robot and is changed by updating the web page. Figure 11 shows examples of the facial expressions of the component-based manzai robot.
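The page-update mechanism described above, in which a browser on each robot repeatedly reloads a page that the controller rewrites, can be sketched as follows. The HTML template, the image file names such as "PE11.png", and the refresh interval are assumptions, not the actual implementation:

```python
import pathlib
import tempfile

# Hypothetical page template: the browser on the robot-mounted viewer
# reloads the page every second, so rewriting the file changes the face.
PAGE = """<html><head><meta http-equiv="refresh" content="1"></head>
<body><img src="{expression}.png" alt="{expression}"></body></html>"""

def update_face(page_path, expression):
    """Rewrite the served page so the next browser refresh shows `expression`."""
    pathlib.Path(page_path).write_text(PAGE.format(expression=expression))

# Usage: write the page into a temporary web root and switch the expression.
page = pathlib.Path(tempfile.mkdtemp()) / "face.html"
update_face(page, "PE11")
print("PE11.png" in page.read_text())
```

Updating the facial expression then requires no connection to the viewer itself; the robot side only has to keep the browser pointed at the page.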

Fig. 11
figure 11

Examples of the facial expressions of the component-based manzai robot

Implementation of the Component-Based Manzai Robot System

The component-based manzai robot system was implemented as a set of RT components. The RT components for the manzai robot system are as follows:

  1. Control of the manzai robot system

  2. Facial expression during the manzai performances

  3. Control of each mobile robot base

  4. Speech generation during the manzai performances

Components (2), (3), and (4) are constructed for each robot. Each software component controls the various functions of each robot.

Figure 12 shows how the RT components are connected in the manzai robot system. "Mary" and "Bob" signify the components for ii-1s and ii-2s, respectively. The OS on the main controller PC was Microsoft Windows 7. OpenRTM-aist 1.1.0 (Python) was applied for the manzai controller, manzai script generation, speech synthesis, speech generation during the manzai performance, and control of the facial expression of each robot. OpenRTM-aist 1.1.0 (C++) was utilized for the controller of the mobile base. To reduce the computational burden of speech synthesis, all speech voices for the manzai performance were synthesized on the main controller PC in advance. The function of each RT component is explained as follows:

Fig. 12
figure 12

Connection of RT components in the manzai robot system

  • manzai_component: This component manages the manzai robot system. First, it synthesizes the speech voice of the dialogues for the manzai performance and generates the control scripts for each component from the XML files produced by the automated manzai script generation system; it then sends the control scripts to each PC. After the speech synthesis and control script generation, the component controls the dialogue speech of the manzai script, each mobile robot, and the facial expression control components according to the XML file of the manzai script. The component also receives completion notices from the speech generation components when speech playback finishes, and thereby manages the progression of the manzai script during the performance.

  • manzai_soundplay, manzai_soundplay2: These components play the speech data file during the manzai performance. They receive the information for playing dialogue speech from manzai_component. On completion of the speech, they send completion information to manzai_component.

  • RobotRotate_Angle, RobotRotate_Angle2: These components control the motion of each mobile robot. They receive the information for manzai performance from manzai_component.

  • manzai_PEmoMary, manzai_PEmoBob: These components control facial expressions during the manzai performance. They receive information for manzai performance from manzai_component.
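The play/completion handshake between manzai_component and a speech-playback component can be sketched as follows. The queue-based message layout is our own stand-in for the RT middleware data ports, not the actual interface:

```python
import queue

# Minimal sketch (names are ours) of the play/complete handshake:
# the controller advances the script only after receiving a
# completion notice for the previous dialogue line.
to_player = queue.Queue()
to_controller = queue.Queue()

def soundplay_step():
    """Stands in for manzai_soundplay: play one line, then report back."""
    line = to_player.get()             # receive a dialogue line to play
    # ... speech playback would happen here ...
    to_controller.put(("done", line))  # report completion to the controller

def run_script(lines):
    """Stands in for manzai_component's time management of the script."""
    order = []
    for line in lines:
        to_player.put(line)
        soundplay_step()               # in the real system this runs remotely
        msg, finished = to_controller.get()
        assert msg == "done"
        order.append(finished)         # advance only after completion
    return order

print(run_script(["Hello!", "That's wrong!"]))
```

The same pattern generalizes to the motion and facial-expression components, with the controller holding the single authoritative position in the script.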

The manzai controller component generates scripts for controlling each RT component from the original manzai script. These control scripts reduce the amount of communication between RT components, unify the control order for each component, and keep each component in the manzai robot system independent. After synthesizing the speech of the manzai script, the controller component analyzes the manzai script.

The controller outputs a control script for each component of the manzai robot system, covering control of the mobile base, speech output, facial expression, and the balloon information system. It then sends the control script for the manzai performance to each RT component. Each RT component executes its control script according to the current position in the script indicated by the controller RT component. Therefore, restarting and recovery from error conditions are easily accomplished, and robustness against network trouble is improved over the conventional manzai robot system.
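Splitting one manzai script into per-component control scripts while preserving a shared position index, as described above, might look like the following. The command tuples and the routing rule are illustrative assumptions:

```python
# A sketch (data layout assumed) of splitting one manzai script into
# per-component control scripts while keeping each command's global index,
# so every component can follow the same position in the performance.
commands = [
    ("Mary", "speak", "Hello!"),
    ("Bob",  "emotion", "PE11"),
    ("Bob",  "speak", "Hi there."),
    ("Mary", "look", "audience"),
]

def split_by_component(cmds):
    scripts = {}
    for index, (robot, action, detail) in enumerate(cmds):
        # Route speech to the sound component and everything else to the
        # motion/expression component of the same robot (our assumption).
        target = (robot, "sound" if action == "speak" else "motion")
        scripts.setdefault(target, []).append((index, action, detail))
    return scripts

print(split_by_component(commands)[("Bob", "sound")])
```

Because every per-component script carries the global index, a component that restarts can resume from the position last announced by the controller.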

Experiments were conducted to verify the component-based manzai robot system. The controller RT component synthesized the speech of the manzai script, and then each component output the speech, generated motion, and expressed the facial expressions from the original manzai XML scripts written for the conventional manzai robot system. The manzai XML scripts required no changes for performance by the component-based system; therefore, manzai performance by a different manzai robot system was realized.

Balloon Dialogue Presentation System for Component-Based Manzai Robot System

The balloon dialogue presentation system was developed as an aid for situations in which the audience cannot hear the manzai performance clearly. The system presents the speech of the manzai performance in synchrony with the progression of the manzai script. Figure 13 shows a screenshot of the balloon dialogue presentation system. The system outputs the presentation in a web browser running on the PC. It shows an icon representing each robot and encloses the current speech in a balloon attached to the speaker's icon. Using the system, the audience can see "which" robot is currently speaking.

Fig. 13
figure 13

Example of balloon dialogue presentation (dialogue is expressed in Japanese)

The presentation system is called by the controller component of the manzai performance. The system was developed in C#. It updates the web page of the balloon dialogue presentation so that the display follows the current position in the manzai script.
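A minimal sketch of how the balloon markup might be generated for the current speaker is shown below in Python. The HTML structure, class names, and image file names are assumptions, not the actual C# implementation:

```python
# Sketch of the balloon page body (markup and class names are ours):
# the current speech is placed in a balloon attached to the speaker's icon,
# while the other robot's icon is shown without a balloon.
def balloon_page(speaker, speech, robots=("Ai-chan", "Gonta")):
    parts = []
    for robot in robots:
        balloon = f'<div class="balloon">{speech}</div>' if robot == speaker else ""
        parts.append(f'<div class="robot"><img src="{robot}.png">{balloon}</div>')
    return "\n".join(parts)

# Usage: regenerate the page body for each line of the manzai script.
html = balloon_page("Gonta", "That's wrong!")
print("balloon" in html)
```

Regenerating this fragment on every script step, under the controller's position index, keeps the balloons synchronized with the performance.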

The presentation system has to execute in synchrony with the progression of the manzai performance. The conventional manzai robot system is complicated, and adding a presentation system that automatically runs in synchrony with the performance to it is difficult. The successful addition of the presentation system therefore demonstrates the scalability of the component-based manzai robot system with respect to the ease of adding new functions.

Scalability of Component-Based Manzai Robot System

To demonstrate the scalability of the system, the mobile robot base of the component-based manzai robot system was replaced with a different one, a Kobuki made by Yujin Robot Co. The Kobuki_RTC component was used as the robot controller component (KobukiRTC 2013). The inputs of Kobuki_RTC are the angular and translational velocities of the mobile robot, so an RT component that generates these control inputs from the manzai script was constructed. With minimal changes to the software components, the robot system performed the manzai well. This experimental result validates the high scalability of the component-based system with respect to controlling different types of mobile robots using the same manzai script for the manzai performance.
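Adapting a script-level rotation command to a velocity interface like that of Kobuki_RTC can be sketched as follows; the velocity limit and the command signature are assumptions for illustration:

```python
import math

# A sketch (parameter values assumed) of converting a script-level
# rotation command into the velocity inputs expected by a component
# like Kobuki_RTC, which takes translational and angular velocities
# rather than a target angle.
MAX_ANGULAR_VEL = math.radians(90)   # assumed limit: 90 deg/s

def rotate_command(target_angle_deg):
    """Return (v, w, duration) velocities realizing the requested rotation."""
    angle = math.radians(target_angle_deg)
    w = math.copysign(MAX_ANGULAR_VEL, angle)  # turn at the limit speed
    duration = abs(angle) / MAX_ANGULAR_VEL    # time needed for the turn
    return 0.0, w, duration                    # no translation during a turn

v, w, t = rotate_command(-45)
print(v, w, t)
```

Only this adapter layer depends on the base's interface, which is why swapping the mobile base required minimal software changes.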

The development of a component-based large-size manzai robot system with a function for sensing its surroundings, in order to improve the interactive performances, and the evaluation of the robustness of the robot system in long-term experiments will be carried out in future work.

Discussion: Potential of Component-Based Manzai Robot System

From the results of implementation of the component-based manzai robot system, and application to the other type of mobile robot system, the feasibility and scalability of the proposed manzai robot system were demonstrated. The independent robot controllers for each robot using generic Windows PCs enable high portability and flexibility for the manzai robot system. The role of the robots in the manzai performance can easily be changed by changing the connection of the data ports between RT components, because the contents of the RT components for each robot are the same.

In addition, the control script for the manzai robot system makes the manzai robot system robust and scalable. The number of communication packets used for synchronization and management of the manzai scripts during the performance is also reduced. Even if the manzai performance is executed with many wireless LAN clients surrounding the manzai robots, the performance can continue to completion. When many wireless LAN clients, that is, many spectators, surround the manzai robots, the performance of the wireless LAN is degraded. Thus, the reduction of communication packets makes the manzai robot system robust with respect to the communication network during the manzai performance.

In this study, an iPod touch viewer was mounted on each robot and used only for facial expressions during the manzai performance. The user interface of the manzai robot system using the viewer will be addressed. Moreover, the extension of the robot system using smartphone devices mounted on the manzai robots, such as the connection to other information using the device, is planned for future works.

The robot system currently performs the manzai without outer sensing devices such as microphones and vision sensors. There are also parameters of the manzai robot system, such as the tone, volume, and speed of speech, that should be adjusted according to the conditions of the surroundings. The speech synthesis component can change the voices of the manzai scripts by adjusting these parameters. A "feedback" mechanism using the outer sensing devices mounted on each robot and flexible generation of the manzai performance will be addressed in future work.

Conclusion

This chapter introduced a manzai robot system, that is, an entertainment robot used as a passive medium, based on manzai scenarios autocreated from web news articles. Manzai is a traditional Japanese standup comedy act that is usually performed by two comedians: a stooge and a straight man. The manzai robots automatically generate their manzai scenarios from web news articles, based on related keywords given by the audience and on search results from the WWW, and then perform the manzai scripts. Each manzai script comprises three parts: tsukami (the beginning of the manzai greeting), honneta (main body of the manzai script), and ochi (conclusion of the manzai performance). The style of a manzai script is "shabekuri manzai," which means talk constructed only from the manzai scripts.

The proposed manzai robot system is focused on content generation. The manzai scripts are created automatically from WWW news articles using data-mining techniques and the methodology of the manzai performance. Manzai scenarios are generated based on news articles from the Internet, using various intelligent techniques such as word ontologies and similar words obtained from Internet searches. Subsequently, the manzai script for the manzai robots, written in XML, is generated automatically. Then, each robot performs its role in the manzai script.

This chapter also introduced a method for automatic creation of the manzai scripts from web news articles, and management of the manzai robot systems. Then, a component-based manzai robot system that facilitates the creation of a scalable manzai robot system was presented. This chapter also verified the potential of the manzai robot system by implementing an automatic manzai scenario creation system and management systems using real robots.

The robot system performs the manzai without outer sensing devices such as microphones and vision sensors. It is desirable that the parameters of the manzai robot system, such as the tone, volume, and speed of speech, be adjusted according to the conditions of the surroundings of the manzai robots. The speech synthesis component can change the voices of the manzai scripts by adjusting such parameters. A "feedback" mechanism using the outer sensing devices mounted on each robot and flexible generation of the manzai performance will be addressed in future work.

In addition, a robot interface using the robot body, the development of a component-based large-size manzai robot system, a sensing function for the surroundings of the robots to improve interactive performances, and enhancement of the robustness of the robot system under long-term experiments are planned for future work.

Cross-References