1 Introduction

This paper proposes a comic-style chat interface that incorporates the expression techniques used in manga, that is, Japanese comics. Japanese manga visualize and convey a wide range of information to the reader. This information is not limited to character attributes that can be represented in a single drawing, such as gender, age, and appearance; it also includes, for example, the direction of the characters’ gaze, the various symbols arranged within the screen, the brightness of the screen, the way the background is drawn, the way these elements are combined, and the way the frames are laid out on a two-dimensional surface. By incorporating these techniques, this paper aims to create a screen with a simple reading order and visuals that readily convey the emotions selected by users.

In the field of network communication, many graphical dialog interfaces have been proposed that combine drawings with characters represented by 2D/3D models [1,2,3,4]. Useful aspects of interfaces that convey information visually include helping users understand message content, increasing affinity and motivation, and facilitating interaction between users. Drawing has been described as another human language: a communication medium based on vision [5].

The characters used in graphical dialog systems act as agents for users and express their moods and intentions, information that cannot be conveyed in a traditional text chat message. However, because the characters have “faces”, users must control the characters’ facial expressions appropriately to avoid misleading their interlocutors. Manually choosing an emotion for each message while chatting is burdensome [6]. We consider ways to reduce this burden.

In systems that represent interlocutors with avatars, the avatars give them a human-like appearance. However, these systems do not consider the possibility of reading back through the dialog at a later time [7]. Some systems that use avatars to display information do not consider the interaction between the provider and the receiver of information [3]. As dialog interfaces that leave a record of the conversation, these systems have room for improvement.

In addition to avatars, the cartoon format is also used as an interface to visualize information or conversations [8,9,10,11,12]. This makes it easier to grasp the content of messages. However, such systems have problems: the objects that users can manipulate are limited, or the order of messages is difficult to follow. Their visual monotony also makes operation burdensome.

This paper proposes a chat interface that incorporates the techniques of expression used in manga. The system provides a screen on which the order of messages is easy to follow, and it conveys the selected emotions visually. The main contributions of this study are as follows. First, the chat interface enables users to express emotions more easily and richly than existing systems. Second, it makes chat conversations more exciting and enjoyable.

This paper is organized as follows. Section 2 describes the problems of existing tools that output comic-style records and of related chat systems that support conveying nonverbal information. Section 3 explains how our system supports emotional chat. Section 4 presents experimental results on whether users can intuitively add nonverbal information to chat messages. Finally, Sect. 5 presents our conclusions and future work.

2 Related Works on Transmitting Nonverbal Information

2.1 Images in Communication Support Systems

Pictographs and stamps are used as pictures on cellular phones and in instant messengers. There are various types of pictographs, such as those expressing emotional states or depicting animals, plants, and buildings. Images can also complement the meaning of text in communication services such as LINE¹, Twitter², and Skype³, where they are used to emphasize a particular sentence or intention. In these tools, a series of images can be combined into one meaningful work. However, posted images can only be lined up vertically, and they cannot be combined with other graphical representations. Figure 1 shows an example of LINE messages.

Fig. 1. An example of LINE messages.

2.2 Communication Support Systems with Graphical Interface

A number of tools have been developed to support online communication in chat applications, instant messengers, and online games. In these systems, users can employ avatars as their agents. An avatar has a human-like body that can express the user’s emotions as easily as emoticons and pictographs can. In systems that represent emotions with an avatar, users can easily exchange their intentions. However, these systems do not consider the possibility of looking back on the dialogue at a later time. Also, systems that use avatars to display information do not consider the interaction between the information provider and the receiver. As dialogue interfaces that leave a record of the conversation, these systems still have room for improvement.

Besides avatars, the comic format is also used as a visual interface. Comic Chat is a system in which users talk in a comic-like format with a variety of characters [7]. Every time a user sends a message, a square frame is displayed, in sequence from left to right. The characters, speech balloons, and backgrounds selected by the user are drawn in the frames. The user can also use an emotion selection circle at the bottom right of the input screen to specify an emotion and its degree. When a message is sent, the character with the selected emotion and a balloon containing the text are displayed in the square frame along with the background. The character’s position within the frame, the shape of the balloon, and the perspective are determined automatically; the balloons are fixed at the top of the frame and the characters at the bottom. In this system, the composition is limited to a pattern that shows the characters from the side. Because the frames are limited to squares of equal size, the interface makes it difficult to grasp the direction of the dialog, the elapsed time, and the development of the narrative. The circular facial expression selector is also problematic, because it is difficult to set the strength of an expression accurately.

Comic Diary is a system with which users can write a diary in the format of a comic [8]. Users can save and share personal experiences and records in a highly readable format. The system automatically creates a story with an introduction, development, and conclusion based on the user’s behavior history, selecting the most suitable comic parts from a material database. Creating a diary is therefore very easy. On the other hand, users cannot take part in creating the plot or selecting the comic parts, such as the characters’ emotional expressions. As in Comic Chat, the shape and size of the frames are fixed, and the number of frames is limited to at most 12.

Thawonmas et al. proposed a system that summarizes the behavior of 3D online game players in comic format based on the game log [9]. It divides the play content into multiple sections based on actions and events within the game, such as dialogs between players and switches of viewpoint. It assigns the events within each section to frames, determines the shapes and sizes of the frames, and creates a comic from them in various layouts. From the frame composition and layout, the player can grasp the storyline, the elapsed time, and the rhythm. However, the screenshots and balloons, captured at automatically determined times, are simply laid out in the frames, and it is not possible to add other symbols or information to explain a given situation. Consequently, players cannot express customized messages that they might want to convey.

AVACHAT is a comic-style communication interface [11]. Avatar agents communicate in a virtual world, and 3D word balloons are displayed above their heads. When an agent speaks “loudly”, the chat text is placed in a bigger balloon with a bigger, bolder font. Comic Live Chat [12] also displays transmitted text in a balloon below the image of the sender. These systems display each remark as a balloon near the speaker in the chat screen. However, they do not consider the size and placement of frames or background effects.

Some systems have been proposed that automatically transform a video into a comic book. Tobita’s system [13] allows users to control frame size and position through simple manipulations. Chun’s system focuses on the placement of balloons in a frame: it optimizes balloon positions considering not only the reading order but also the relationships between the balloons and their speakers. However, neither system implements a mechanism for expressing emotions.

As mentioned above, systems with interactive interfaces that use drawings require visual improvement. Manga, on which some of these systems are based, uses many elements to visually represent the characters’ facial expressions and the situations they encounter. These include the direction of a character’s gaze, the various symbols and patterns arranged in the background, the shapes of the speech balloons, and the arrangement of frames. This paper aims to create a system that incorporates these techniques to express user emotions in a visually clear format, rather than simply lining up frames, while also reducing the burden of entering emotions.

3 Manga-Style Chat System

3.1 Components of Manga

Our goal is to develop a chat system that incorporates representations from manga, so that the layout of text and drawings conveys information beyond the text content. Manga has many conventions for visually representing and conveying information.

Manga consists of drawings placed in spaces, called frames, that are delimited by thick lines. A variety of information can be conveyed through the layout of the drawings, the text, and the frames. Below, we discuss the main elements and techniques of expression that manga uses for visual representation and communication.

The basic element of manga is the frame. A frame is composed of a drawing of characters as well as the background, speech balloons, patterns, and symbols, drawn in many layers of dots, lines, and text. The thick lines that delimit a frame form a rectangle, and may form other polygons depending on the content to be represented. Frames laid out on a 2D plane form the structure of a manga. Their contents are temporally connected, and they are arranged so as to guide the reader from one scene of the story to the next. The layout and changes in frame size also represent the elapsed time or temporal and spatial breaks; evenly lining up frames of the same size represents a uniform flow of time.

Figure 2 shows a typical frame arrangement in Japanese manga. The reader starts with the rightmost frame in the top row and reads leftward along that row. On reaching the left end, the reader moves to the rightmost frame of the row immediately below and again reads leftward. By adjusting the size and shape of the frames and guiding the reader’s eyes in an irregular way [14], emotions can be expressed more effectively, as can the extent of a space, the atmosphere of a place, time, and tempo. Expressing movement through frame arrangement and composition is a technique unique to manga, as is its drawing method [15].
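The reading order just described can be captured by a simple sort key. The Python sketch below is illustrative only and is not taken from the proposed system: it assumes a hypothetical `Frame` record that stores each frame's row index (0 for the top row) and the x-coordinate of its right edge, and it ignores irregular layouts such as frames spanning several rows.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    """Hypothetical frame record: row index (0 = top row) and the
    x-coordinate of the frame's right edge on the page."""
    label: str
    row: int
    right_x: float


def reading_order(frames: List[Frame]) -> List[Frame]:
    """Top row first; within a row, from the rightmost frame to the leftmost."""
    return sorted(frames, key=lambda f: (f.row, -f.right_x))


page = [
    Frame("A", row=0, right_x=100.0),
    Frame("B", row=0, right_x=50.0),
    Frame("C", row=1, right_x=40.0),
    Frame("D", row=1, right_x=100.0),
]
print([f.label for f in reading_order(page)])  # ['A', 'B', 'D', 'C']
```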

Fig. 2. The typical sequence of frames in Japanese manga.

A speech balloon represents a character’s speech; the characters’ lines (serifu) are written inside balloons. In American comics, most balloon shapes indicate actions, such as shouting and thinking, rather than the characters’ psychological condition. The use of fonts and balloons of varied sizes is a technique that originated in Japan [16].

In Japanese manga, the background is simplified and contains symbols that represent the characters’ psychological states. The reader can infer the situation from the amount of detail in the drawing or from the impression of light and darkness it gives.

3.2 Design Policy

We use a variety of techniques of expression used in manga to improve the user’s experience of the interface.

  • To avoid an impression of monotony, our system employs compositions that show the dialogue between two characters from above or from below. The user can also change the frame size to make the flow of time easier to understand.

  • To reduce the burden of selecting and entering appropriate avatars or emoticons, we limit the types of representations available. Existing systems that allow the selection of emotions offer an interface with several types of emotions [7, 17]. Similarly, we narrow down the selectable emotions to those frequently expressed in conversation.

  • To compensate for the limited number of emotions, the system allows users to combine various manga elements. Balloons drawn with rounded lines give a soft and calm impression, whereas those drawn with sharp lines convey a sense of aggressiveness and urgency. Manga symbols, called “manpu” in Japan, are also used; a particularly famous one is a drop-shaped line. These sweat marks, which look very different from actual sweat, are used to represent an emergency or confusion. In addition to sweat, symbols such as lights, bubbles, fog, blue veins, and bandages adorn the drawings in a frame. We incorporate these drawing techniques into the proposed chat system.

3.3 System Overview

The proposed system is built with PHP running on an Apache web server and JavaScript with jQuery on the client side. JavaScript determines the frame image to be displayed on the clients’ screens based on the input data, which include the text message, the frame size, the balloon shape, and the emotion. The input data are sent to the web server running PHP, which relays them to each client; each client then draws the new frame on its screen using JavaScript.
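The data flow described above can be sketched as follows. This is a minimal Python illustration, not the authors' PHP/JavaScript code: the `FrameMessage` fields mirror the input data listed above, while the class names, field names, and the in-memory `Relay` standing in for the web server (including its message log) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List, Optional


@dataclass
class FrameMessage:
    """Hypothetical payload for one frame; fields mirror the input data above."""
    sender: str
    text: str
    frame_size: Optional[str]  # e.g. "small", "middle", "large"
    balloon: Optional[str]     # balloon shape; None means no balloon
    emotion: Optional[str]     # one of the eight emotions; None means no character


class Relay:
    """Hypothetical in-memory stand-in for the web server: it forwards each
    frame's data to every connected client, which then draws the frame itself."""

    def __init__(self) -> None:
        self.log: List[str] = []  # assumption: keep a record of sent frames
        self.clients: List[Callable[[str], None]] = []

    def subscribe(self, on_frame: Callable[[str], None]) -> None:
        self.clients.append(on_frame)

    def post(self, message: FrameMessage) -> None:
        payload = json.dumps(asdict(message))
        self.log.append(payload)
        for deliver in self.clients:
            deliver(payload)


# Usage: two clients receive the same frame data.
relay = Relay()
inbox_a: List[str] = []
inbox_b: List[str] = []
relay.subscribe(inbox_a.append)
relay.subscribe(inbox_b.append)
relay.post(FrameMessage("user1", "Did you watch it?", "middle", "round", "Joy"))
```

Keeping rendering on the clients, as the text describes, means the server only has to pass the input data along unchanged.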

The proposed system consists of a server and clients connected over a network. When a user starts the system, the window shown in Fig. 3 appears. The main area displays frames with pictures and chat messages. Users enter chat messages and select frame sizes from the tab at the upper right. When they click the illustrations of the relevant emotion and balloon, the system determines the position of the new frame and draws it along with the message, the balloon, and the character.

Users can access the system from their smartphones or their personal computers. Figure 4 shows an example of a screen on a smartphone.

Fig. 3. Overview of the proposed system.

Fig. 4. An example of a screen on a smartphone.

Fig. 5. Characters in the proposed system.

3.4 Emotions of Characters

Preliminary Research. The two manga characters used in the system are shown in Fig. 5. Users can choose a character, but two users cannot use the same character at the same time. The characters can represent eight emotions: “Joy”, “Surprise”, “Sadness”, “Thinking”, “Neutral”, “Doubt”, “Confusion”, and “Laughing”. These eight emotions were adopted based on the results of preliminary research involving college students who used online communication tools every day.

The preliminary research was carried out in two steps. First, we classified the emotions used in daily conversations into 10 categories, with reference to the four types of emotions used in related systems [17], the eight basic emotions of Plutchik’s wheel of emotions [18], and Ekman’s six basic emotions [19]. From messages exchanged with the experimenter through these tools in daily conversations, two conversation excerpts were extracted for each emotion, for a total of 20. Second, we handed sheets with the 20 excerpts printed on them to 16 participants and asked them which emotions they associated with each excerpt. The answer choices were the 10 emotion types above and “Others”, and multiple answers were allowed. When no listed emotion fit, participants could describe their own emotion in a free description field. Based on the results of this survey, we selected the eight emotions used in our system.

Characters. We drew the illustrations for the proposed system relying mainly on the characters’ facial expressions, actions, manpu, backgrounds, and effect lines. To express each emotion, we adopted techniques frequently used in manga drawings, and we adjusted the brightness, balloon shape, and symbols within the frames for each emotion.

In the frame showing “Joy”, a woman raises the corners of her mouth, narrows her eyes, and moves her hands to express her pleasure. Short, thin lines radiate outward from the character in the background, giving an impression of energy.

In the frame showing “Surprise”, a man has his eyes and mouth wide open. The straight lines drawn around the character and his raised shoulders express the movement that tends to occur when one is surprised. A background effect called a lightning flash represents the magnitude of the impact.

In the frame showing “Sadness”, the character appears morose, with wrinkles between the eyebrows, tears, and a dazed look. Dark feelings are expressed by a manpu representing gloomy, heavy emotions and by a background that darkens toward the character.

The frame expressing “Neutral” does not use special signs in the background.

In the frame for “Thinking”, many short lines are arranged around the character, giving readers the impression that the character is absorbed in thought. People have been reported to sometimes avert their gaze from their interlocutor when thinking about what to say [20, 21]; we apply this observation to the thinking character.

It is possible to generate frames expressing various emotions by combining speech balloons with the drawings of characters.

Fig. 6. An example of arranged frames.

3.5 Arrangement of Frames

When the current row is empty, the system places a new frame on its right side. When the row already has one frame on the right, the new frame is placed to its left, and a new row is inserted below. Figure 6 shows an example of multiple inserted frames.
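The row-filling rule above can be sketched in a few lines of Python. This minimal sketch assumes, as a simplification, that every frame has the same size so that a row holds exactly two frames; each row is stored in reading order, with index 0 as the right slot and index 1 as the left slot. The function name is hypothetical.

```python
from typing import List


def place_frame(rows: List[List[str]], frame: str) -> List[List[str]]:
    """Hypothetical sketch of the two-slot rule for equal-size frames.

    Each row lists its frames in reading order: index 0 is the right
    slot and index 1 is the left slot.
    """
    if not rows:
        rows.append([])   # start the first (empty) row
    row = rows[-1]
    row.append(frame)     # empty row -> right slot; one frame -> left slot
    if len(row) == 2:
        rows.append([])   # the row is now full, so insert a new row below
    return rows


rows: List[List[str]] = []
for message in ["m1", "m2", "m3"]:
    place_frame(rows, message)
print(rows)  # [['m1', 'm2'], ['m3']]
```

Storing rows in reading order rather than screen order keeps the sequence of messages explicit; a renderer only needs to reverse each row to draw it left to right.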

When a user sends a message with an emotion, how the message is displayed depends on the size of the empty space in the current row. A large frame sent when the row already contains a frame, or a middle frame sent when only a small space remains, does not fit the empty space. In that case, the system automatically fills the empty space with an empty frame and displays the sent frame in a new row. Figure 7 shows an example of the insertion of an empty frame.
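The empty-frame rule can be sketched by giving each frame size a width. The widths below are assumptions, since the paper names the small, middle, and large sizes but does not state their dimensions: here a row is three units wide, so a large frame fills a whole row and a middle frame leaves room only for a small one. All names in the sketch are hypothetical.

```python
from typing import List, Tuple

# Hypothetical widths in abstract units; the paper does not give actual sizes.
SIZE_UNITS = {"small": 1, "middle": 2, "large": 3}
ROW_WIDTH = 3  # assumption: a large frame fills an entire row

Row = List[Tuple[str, int]]  # (label, width) pairs in reading order


def add_frame(rows: List[Row], label: str, size: str) -> List[Row]:
    """Append a frame, padding the current row with an empty frame when the
    sent frame does not fit the remaining space."""
    width = SIZE_UNITS[size]
    if not rows:
        rows.append([])
    row = rows[-1]
    free = ROW_WIDTH - sum(w for _, w in row)
    if width > free:
        if free > 0:
            row.append(("<empty>", free))  # fill the leftover space
        row = []
        rows.append(row)                   # the sent frame starts a new row
    row.append((label, width))
    return rows


rows: List[Row] = []
add_frame(rows, "a", "middle")
add_frame(rows, "b", "middle")  # only a small space is left, so it is padded
add_frame(rows, "c", "small")
add_frame(rows, "d", "large")
print(rows)
```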

Fig. 7. Insertion of an empty frame.

Fig. 8. Four kinds of combinations of text, balloon, and emotion.

When users enter text without choosing a balloon, no balloon is drawn in the frame; when they do not select an emotion, no character is drawn. Four types of frames can thus be created: text only, text in a balloon, text with a character, and text in a balloon with a character. Figure 8 shows examples of these combinations of text, balloon, and emotion.
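The four frame types follow from two independent choices, balloon and emotion. The Python sketch below returns the layers a client would draw for each combination; the function name, the layer names, the balloon shape name, and the drawing order (character behind balloon, text on top) are hypothetical assumptions, not details from the paper.

```python
from typing import List, Optional


def frame_layers(balloon: Optional[str], emotion: Optional[str]) -> List[str]:
    """Return the layers to draw, back to front (hypothetical order).

    A character is drawn only when an emotion is selected, and a balloon
    only when a balloon shape is selected; the text is always drawn.
    """
    layers: List[str] = []
    if emotion is not None:
        layers.append(f"character:{emotion}")
    if balloon is not None:
        layers.append(f"balloon:{balloon}")
    layers.append("text")
    return layers
```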

The system automatically determines the size of a frame for which no emotion is selected, in order to reduce the burden on users: when they want to keep up the tempo of the conversation, they do not need to select a frame size each time they send a message. In this case, the frame size is adjusted automatically to the empty space in the current row.
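The text does not specify exactly how the automatic size is chosen. One plausible reading is sketched below under stated assumptions: hypothetical widths of small = 1, middle = 2, and large = 3 units in a three-unit row, and a hypothetical cap (`AUTO_MAX`) so that an automatically sized frame takes the largest size that fits the row's remaining space without exceeding "middle". Under these assumptions, an automatically sized frame never forces an empty frame.

```python
from typing import Optional

# Hypothetical widths in abstract units; the paper does not give actual sizes.
SIZE_UNITS = {"small": 1, "middle": 2, "large": 3}
ROW_WIDTH = 3
AUTO_MAX = "middle"  # hypothetical cap for automatically sized frames


def choose_size(requested: Optional[str], emotion: Optional[str], used_width: int) -> str:
    """Keep the user's choice when an emotion is selected; otherwise pick the
    largest size (up to AUTO_MAX) that fits the space left in the row."""
    if emotion is not None and requested is not None:
        return requested
    free = ROW_WIDTH - used_width
    if free <= 0:
        free = ROW_WIDTH  # the row is full, so the frame starts a new row
    fitting = [s for s, w in SIZE_UNITS.items()
               if w <= free and w <= SIZE_UNITS[AUTO_MAX]]
    return max(fitting, key=lambda s: SIZE_UNITS[s])
```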

4 Experimental Results

4.1 Experimental Outline

We conducted a comparative experiment to investigate whether users can intuitively add nonverbal information to chat messages with our system, and whether it reduces the burden on users.

The participants were 16 college students accustomed to keyboard input and online communication, all of whom used emoticons in their daily online communication. They were divided into eight pairs and chatted while sitting in separate rooms to prevent direct conversation. They were told in advance who their conversation partner was, and the partners were familiar with one another.

The participants used two systems: the proposed system and, for comparison, LINE, one of the most popular online communication tools. LINE supports communication through text and multimedia images. A text message is shown in a balloon-shaped text box, along with a timestamp and a mark indicating that the message has been read. New messages are inserted below older ones in the same window. Because LINE does not allow users to send an image and a text message at the same time, text balloons and images are lined up vertically in the chat space.

We instructed each pair of participants to chat for 10 min using each system. To counterbalance order effects, four pairs used the proposed system first and the other four used LINE first. No restrictions were imposed on how the participants used either system. The conversation topics were two themes: movies and dramas, and animations that the participants had recently watched. At the end of the experiment, we asked the participants to answer a questionnaire.

We denote the experiment using the proposed system by \(exp_{J}\) and the one using LINE by \(exp_{L}\). The questionnaire results are shown in Table 1, where each number is the number of participants who selected the given point on a five-point scale.

Table 1. Results of questionnaire survey.

4.2 Expression of Nonverbal Information

As shown in Table 1, participants gave higher scores for item (i) to the proposed system than to LINE, and the difference between \(exp_{J}\) and \(exp_{L}\) was significant in the Mann-Whitney U test (\(p=0.01\)). In the free description field for \(exp_{L}\), one participant commented: “I could supplement my emotions and intentions by adding a matched image to a text message after sending the text message”. Other participants said of the proposed system that its expressions were excellent and that the drawings supported the smooth expression of their emotions and intentions. We conclude that the drawings and images were useful for expressing nonverbal information in both systems. Given the significant difference and the higher scores, the proposed system appears to express user emotions and intentions better than LINE.
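For reference, the U statistic behind the Mann-Whitney U test can be computed as follows: pool both samples, rank the pooled values (tied values share their average rank), take the rank sum \(R_x\) of one sample of size \(n_x\), and set \(U_x = R_x - n_x(n_x+1)/2\), with \(U_y = n_x n_y - U_x\). The Python sketch below uses illustrative helper names and computes only U, not the p-value; a library routine such as SciPy's `scipy.stats.mannwhitneyu` also derives the p-value. The values in the usage line are toy inputs, not the study's data.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of values; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, U_y) for two samples of ordinal scores."""
    ranks = average_ranks(list(x) + list(y))
    r_x = sum(ranks[: len(x)])
    u_x = r_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Toy five-point scores (not the study's data).
print(mann_whitney_u([4, 5, 5], [2, 3, 4]))  # (8.5, 0.5)
```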

With questionnaire items (vii) and (viii), we sought to determine whether changes to frame size and balloon shape helped participants express themselves better in the proposed system. We obtained a median and a mode of 4.0 for both items. In the free description field, the participants wrote the following answers:

  • “The size control function of the frames was useful when I wanted to emphasize my message”.

  • “I could encourage my partner’s next messages by leaving an empty space using small- or middle-sized frames”.

  • “I enjoyed our conversation because I could select balloons according to my messages and thus express my intentions”.

These results also suggest that changes to frame size and balloon shape were useful for expressing nonverbal information.
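The medians and modes reported in this section summarize the per-point counts in Table 1. A minimal Python sketch of that computation follows, with an illustrative function name, assuming the counts for one item are given in order from point 1 to point 5; ties for the mode are resolved toward the lower point (an arbitrary choice), and the counts in the usage line are illustrative, not taken from Table 1. With 16 participants, the median is the mean of the 8th and 9th sorted scores.

```python
from typing import Sequence, Tuple


def median_and_mode(counts: Sequence[int]) -> Tuple[float, int]:
    """counts[k] = number of participants who chose point k + 1."""
    # Expand the histogram into an already sorted list of individual scores.
    scores = [point for point, n in enumerate(counts, start=1) for _ in range(n)]
    mid = len(scores) // 2
    if len(scores) % 2:
        median = float(scores[mid])
    else:
        median = (scores[mid - 1] + scores[mid]) / 2
    # max() returns the first maximum, i.e. the lowest point among ties.
    mode = max(range(1, len(counts) + 1), key=lambda p: counts[p - 1])
    return median, mode


print(median_and_mode([0, 0, 2, 10, 4]))  # (4.0, 4)
```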

Some participants found the distinction between small and medium frames convenient. However, they hesitated to use large frames because these made their expressions appear too strong. Appropriate frame sizes therefore need further consideration.

4.3 Transmission of Nonverbal Information

Questionnaire item (ii) concerns the evaluation of received messages. The scores for item (ii) were higher for the proposed system than for LINE, and the difference between \(exp_{J}\) and \(exp_{L}\) was significant in the Mann-Whitney U test (\(p<0.001\)).

In the free description field, participants stated that the proposed system made it easy to understand the expressed emotions and intentions through the drawings and frame sizes. Another participant stated: “I understood the contents of the messages more easily by adding the balloons and facial expressions”. Changes in frame size and balloon shape thus helped participants interpret nonverbal information, suggesting that they could read situations more intuitively with the proposed system.

These results indicate that, with the proposed system, users found it easier to understand the conveyed information and the flow of the conversation than with LINE.

4.4 Number of Emotions and Selection of Nonverbal Information

Users could choose from 88 drawings in \(exp_{L}\) but only eight in \(exp_{J}\). The scores for item (iii) showed no significant difference between the two systems (Table 1); for the proposed system, the median and mode were 4.0. Although the proposed system offers only eight emotions, it was rated comparably to LINE. We conclude that our system satisfied users as much as LINE in expressing emotions, even though it offered considerably fewer drawings.

As discussed in Sect. 4.2, changes to frame size and balloon shape appeared useful for expressing nonverbal information. Item (ix) also had a median and a mode of 4.0 for the proposed system, suggesting that participants considered the number of available balloon shapes sufficient to express their emotions.

There was little difference between the two systems on item (iv). In LINE, the large number of stamps increased the burden of choosing one that conveyed the appropriate emotion. In the proposed system, some participants reported feeling stressed by having to choose a frame size every time and by having to scroll the screen for large frames. Interface features such as the display and selection procedures need to be improved.

4.5 Influence on Conversation

A clear difference between the two systems was observed for item (v): the proposed system scored higher than LINE, with a median and a mode of 5.0, and the difference between \(exp_{J}\) and \(exp_{L}\) was significant in the Mann-Whitney U test (\(p<0.01\)). This suggests that the drawings in the proposed system can encourage conversation. For item (vi), the scores for both systems were similar, with no significant difference, suggesting that the order of frames and messages, and thus the flow of conversation, were equally easy to follow in both systems.

Table 2. The numbers of sent messages, character drawings, and balloons.

Table 2 shows the numbers of messages, character drawings, and balloons sent by all participants. Averages are rounded to one decimal place. The number of drawings sent in \(exp_{J}\) excludes empty frames.

Drawings and balloons were used more continually in \(exp_{J}\) than in \(exp_{L}\). In the survey, one participant commented: “I used the LINE stamp at key points as a supplement. The drawings in the proposed system are easier to use repeatedly as a means of expressing emotions, as a text and drawings are a set, and it is easy to understand at first sight”. We believe that the high visibility of information was one reason for the high evaluation scores of the proposed system, as mentioned in Sect. 4.2.

Fewer messages were sent with the proposed system than with LINE, because users had to select several items, such as the frame size and the emotion, to send a message. The arrangement of frames influenced the sense of passing time and the flow of conversation. The number of empty frames was very small relative to the number of exchanged messages, and in the free description field, participants reported wanting to fill up the empty space in the chat area. We think that they felt uncomfortable with empty spaces and adjusted the sizes of the frames they sent accordingly. Further consideration of how to use empty spaces is therefore needed.

In this experiment, we did not measure the extent to which the emotion expressed by the sender differed from the emotion interpreted by the receiver. Such a difference can arise because the label of the selected emotion may not always match the emotion actually expressed. For example, if a sender writes an angry text but selects the emotion “Joy”, the receiver may interpret the message as angry despite the label. We read the logs and checked for responses indicating an unnatural interpretation; none were found. Future work should objectively investigate differences in interpretation between sender and receiver, both from the interlocutors’ perspective and from that of a third party.

5 Conclusion

In this paper, we proposed a chat system that incorporates expressions from manga to convey emotional information intuitively and appropriately. The system displays the input text, a drawing of a character expressing the selected emotion against a background, and a balloon chosen by the user. As frames are exchanged, users can follow the flow of the conversation, easily convey their intentions, and understand those of their interlocutors.

We performed experiments to compare the proposed system with the LINE chat system. The questionnaire results and the analysis of log data showed that the proposed system can express and convey user intentions and emotions more easily and comprehensibly than LINE. Although the number of messages exchanged in our experiments was small, the users felt that the conversations were exciting. Although participants were required to select several components to send a message, the questionnaire results suggest that the burden imposed by our system was comparable to that imposed by LINE. In future work, we intend to let users select more conversation components and to consider how empty spaces can be used.

This interface supports the exchange of emotions in everyday conversation. However, users may express different emotions in serious discussions, for which the interface may not be suitable. In this paper, we designed only two characters, and our experiment examined only two-person conversations. Therefore, it is unclear how conversations are affected when the system is applied to conversations among three or more people. Future work should increase the number of characters and verify whether users can still follow the order of the conversation and understand other participants’ emotions from manga-style messages.