1 Introduction

In today’s globalized society, the ability to understand and communicate with people and cultures from different countries is important. Machine translation (MT) technology can serve as a tool for cross-cultural communication, allowing people from different countries to communicate with each other. For example, in a summer school called “KISSY” organized by NPO Pangaea, children from various countries gathered and worked together using a multilingual chat system with embedded machine translation modules. By communicating with people from different countries, cultures, and languages, children can acquire the ability to understand and accept diverse values in a globalized society [9]. Such collaboration is hindered by differences in culture and values, and each country has its own ways of expressing things. To bridge these differences, it is important to strengthen communication effectiveness: participants must both understand the other party’s expressions and thoughts and convey information correctly to the other party.

Even if MT can help overcome the language barrier by providing translation, creating common ground between the parties remains a challenge [12]. In addition, existing MT technology does not provide accurate translations for low-resource languages (LRLs), which have fewer language resources, for example less bilingual data available for building MT services. As a result, LRL speakers, including some participants of the “KISSY” summer school, are unable to participate actively in conversations; LRL speakers produced fewer words than speakers of other languages [9].

This study aims to clarify effective communication strategies for facilitating LRL speakers in multilingual communication environments with different languages and cultural backgrounds.

The contribution of this paper is to define a facilitator agent whose behavior promotes LRL speech, and test its effectiveness through group discussions among people with different mother tongues.

2 Related Work

Researchers have been trying to develop and improve facilitator agents and conversation agents on different platforms [1, 11]. For example, Ito et al. [6] used an automated facilitation agent to support crowd discussion on a discussion forum, while Kim et al. [7] developed a facilitation chatbot to be used in a chat application.

One of the existing support systems is the listening dialogue system [4]. This system offers chat dialogue support, with the goal of satisfying the user’s desire for dialogue and maintaining the cognitive function of elderly people. Other researchers have worked on the selection and generation of lexical responses that return idiomatic expressions in response to user utterances, responses that repeat parts of the utterances, and in-depth questions that inquire about the details of the content of the utterance [2].

Ishida et al. [4] published their work on the generation of self-disclosure responses, in which the system presents its own thoughts and information in response to the content of the user’s utterance, in addition to in-depth questions, repetition responses, lexical responses, and evaluation responses, in order to create more natural and speech-friendly listening dialogues. In addition, they also proposed a method for judging whether each response is appropriate from a listener’s point of view and selecting the appropriate type of response by using the results of speech recognition and focus analysis of the user’s utterance and information such as captured responses as features.

Besides focusing on the listening agent, some researchers have focused on other features of the agent in order to replicate human performance as closely as possible. For example, Kitaoka et al. [8] studied the timing of responses to create a dialogue system that can respond as naturally as humans do. In addition to response timing, replicating facial gestures is also a factor. A group of researchers found that providing the agent with a face can enhance its interaction with humans in a conversation group [10].

The existing studies aimed at promoting conversation in monolingual communication, so the innovation of this research lies in its focus on supporting multilingual communication via MT.

3 Facilitator Utterance Design

3.1 Strategies

Based on the related research detailed in the previous section, we defined strategies that could be effective in supporting communication among LRL speakers.

The first strategy is to use utterances that request a summary of the discussion to facilitate LRL speakers’ understanding of the content of the discussion and the meaning of others’ utterances. The purpose of these utterances is to make it easier for LRL speakers to understand the situation of the discussion by asking high-resource language (HRL) speakers to briefly summarize the content of the discussion at that moment. The intention is to allow LRL speakers to understand the content of the discussion and the opinions of others, and to speak their opinions more easily.

Second, we define utterances that ask HRL speakers to paraphrase their utterance(s) in order to facilitate the LRL speakers’ understanding. Long utterances might not be translated well by MT, so the facilitator agent requests HRL speakers to paraphrase them briefly. This approach is intended to simplify the message and help LRL speakers understand utterances that may otherwise be difficult for them to follow.

Third, the facilitator agent sends utterances that respond to an LRL speaker’s utterances. This strategy aims at making it easier for the LRL speaker to speak. This might help create an atmosphere in which LRL speakers find it easier to participate.

The fourth strategy is to return a positive response when an LRL speaker expresses an opinion. This type of utterance is an affirmative response, such as agreement, to an utterance by an LRL speaker, and its purpose is to encourage LRL speakers to speak more actively.

3.2 Facilitator Agent Behavior

We designed the facilitator agent behavior based on the strategies defined in the previous section.

First, if the LRL speaker does not speak for a certain period of time, the facilitator requests a summary of the discussion. For example, the facilitator can tell everybody to “review the discussion so far” or “summarize the discussion”. In our preliminary experiments, we found that it was effective to execute the command every three minutes, so we triggered it three minutes after the last utterance of the LRL speaker.

Next, the utterance requesting a paraphrase from HRL speakers is executed when an HRL speaker writes more than a certain number of characters in one utterance. Specifically, the facilitator agent can say, “Let’s summarize that in simple words” or “Please rephrase briefly”. In preliminary experiments, we found that a message longer than 90 characters has a high probability of being difficult to understand.

An utterance that responds to an utterance of an LRL speaker is executed whenever the LRL speaker speaks. Because it is a simple response, it is executed regardless of the content of the utterance. For example, the facilitator sends an utterance in the target LRL, such as “I see”, “uh-huh”, or “Is that so?”.

The utterances that return a positive response are executed when the LRL speaker expresses her or his thoughts and ideas; they are not executed in response to greetings, self-introductions, and the like. Examples include “I like it”, “It’s a nice idea”, and “I think it’s very good”. Because this strategy requires understanding and judging the content of LRL speakers’ utterances, we used the Wizard of Oz method for it, in which a human pretends to be the facilitator agent, while the other strategies were executed by a virtual facilitator agent implemented as a chatbot.

A summary of all the strategies, purposes, execution conditions and execution methods is shown in Table 1.

Table 1. Facilitation Strategy

4 Implementation

4.1 Overall System Configuration

LangridChat is a web application built on Django and React. Users can use this application to chat with other users in their preferred language. Currently, English, Japanese, Thai, Vietnamese, Indonesian, Nepali, Korean, Simplified Chinese, and Traditional Chinese are available. The server uses services from the Language Grid [5] to translate the input from the sender’s language to the languages selected by the receivers, which is then sent and displayed in the receivers’ language. The language can be changed by clicking on the current language at the top of the screen. The user can select a new language from the list.
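The per-receiver translation flow described above can be illustrated with a minimal Python sketch. It is not the LangridChat implementation: the function `fan_out`, the `Translator` signature, and the language codes are all hypothetical, and `fake_translate` is only a stand-in for whatever translation service the server calls.

```python
from typing import Callable, Dict

# Hypothetical signature: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def fan_out(text: str, sender_lang: str, receiver_langs: Dict[str, str],
            translate: Translator) -> Dict[str, str]:
    """Return the text each receiver should see, keyed by user name."""
    # Receivers who share the sender's language get the original text.
    cache: Dict[str, str] = {sender_lang: text}
    delivered: Dict[str, str] = {}
    for user, lang in receiver_langs.items():
        if lang not in cache:
            # Translate once per target language, not once per receiver.
            cache[lang] = translate(text, sender_lang, lang)
        delivered[user] = cache[lang]
    return delivered


def fake_translate(text: str, src: str, dst: str) -> str:
    """Stand-in translator used only for this illustration."""
    return f"[{src}->{dst}] {text}"


if __name__ == "__main__":
    receivers = {"aiko": "ja", "budi": "id", "carol": "en"}
    print(fan_out("hello", "en", receivers, fake_translate))
```

Caching by target language is a design choice of the sketch; it keeps the number of translation calls proportional to the number of distinct languages in the room rather than the number of users.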

The user can type a message in the text box at the bottom of the screen and then click on the arrowhead to send the message.

Fig. 1. LangridChat interface

Fig. 2. System architecture

Figure 1 is a screenshot taken from a chat room with two users: a Japanese user and an Indonesian user. The message sent by the Japanese speaker in Japanese was translated into Indonesian and displayed on the screen of the Indonesian speaker.

The system is divided into two parts, one running on the server side and one on the client side, as shown in Fig. 2. The former is based on Django and includes an API for message delivery, a translation component, and a delivery component; it is responsible for creating chat rooms, translating and sending messages, recording chat logs, and retrieving user information. The latter is a React-based UI that records user information and displays sent messages. This front end observes user behavior and responds on the client side.

4.2 Server-Side Implementation

The two types of utterances implemented on the server are paraphrase requests and responses to LRL speaker utterances.

Figure 3 displays the flowchart of sending a paraphrase request on the left and the flowchart of responding to LRL utterances on the right.

Fig. 3. Flowchart of paraphrase-request utterance generation (left) and of response generation for LRL utterances (right).

Since paraphrase requests are executed when a participant other than an LRL speaker sends more than 90 characters, the agent must observe the language used by the sender of the message and count the number of characters in the message. Because this information is handled on the server, we implemented this utterance on the server side. If the language of the sender is not Indonesian or Thai, the number of characters in the message is checked. If it exceeds 90, the server sends one of two messages, chosen at random, under the user name SYSTEM MESSAGE: “Let’s summarize in simple words” or “Please paraphrase briefly”.

An utterance that responds to an utterance of an LRL speaker is conditional on the sender being an LRL speaker. In other words, the agent needs to observe the language used by the sender of the message and, as mentioned earlier, this information is handled on the server, so this utterance is also implemented on the server side. If the language of the sender is Indonesian or Thai, the agent sends the message “seperti itu ya” (Is that so), “oh iya juga” (I see), or “Iya iya” (uh-huh).
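The two server-side rules can be summarized in a short Python sketch. This is an illustration under stated assumptions rather than the authors' code: the function name `facilitator_reply`, the language codes `id` and `th`, and the use of the SYSTEM MESSAGE user name for the simple responses are assumptions. The text lists only Indonesian response phrases, so the sketch reuses them for both LRLs.

```python
import random

LRL_CODES = {"id", "th"}      # assumed codes for Indonesian and Thai
PARAPHRASE_THRESHOLD = 90     # characters, from the preliminary experiments
AGENT_NAME = "SYSTEM MESSAGE"

PARAPHRASE_REQUESTS = [
    "Let's summarize in simple words",
    "Please paraphrase briefly",
]
LRL_RESPONSES = ["seperti itu ya", "oh iya juga", "Iya iya"]


def facilitator_reply(sender_lang, text, rng=random):
    """Return (user_name, message) for the agent, or None if no rule fires."""
    if sender_lang in LRL_CODES:
        # Simple response: sent regardless of the content of the utterance.
        return AGENT_NAME, rng.choice(LRL_RESPONSES)
    if len(text) > PARAPHRASE_THRESHOLD:
        # Long utterance from a non-LRL speaker: ask for a paraphrase.
        return AGENT_NAME, rng.choice(PARAPHRASE_REQUESTS)
    return None


if __name__ == "__main__":
    print(facilitator_reply("ja", "x" * 120))
    print(facilitator_reply("id", "halo"))
    print(facilitator_reply("ja", "short message"))
```

Passing `rng` as a parameter lets a seeded `random.Random` instance be substituted when reproducible message selection is needed, for example in tests.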

4.3 Client-Side Implementation

The utterance implemented on the client is the one requesting a summary of the discussion. Figure 4 shows the flowchart for this request.

This utterance is executed on the condition that the low-resource language speaker does not speak for a certain period of time. The time at which a user sends a message is managed by the client side, so we implemented it on the client side. When a user sends a message, the server checks whether the user is speaking in an LRL. If three minutes have passed without any utterance from a low-resource speaker, the message “Let’s review the discussion so far” or “Let’s summarize the discussion once” will be randomly selected and sent under the user name SYSTEM MESSAGE to the chat room.
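The silence-based trigger can be sketched as a small timer object. The paper implements this logic on the React client; the Python below is only an illustration of the rule. The class name `SilenceWatcher`, its methods, and the choice to restart the three-minute window after each request are assumptions, not details taken from the system.

```python
import random

SILENCE_LIMIT_S = 180  # three minutes, as in the preliminary experiments
SUMMARY_REQUESTS = [
    "Let's review the discussion so far",
    "Let's summarize the discussion once",
]


class SilenceWatcher:
    """Tracks how long it has been since an LRL speaker last spoke."""

    def __init__(self, start_time, limit=SILENCE_LIMIT_S):
        self.limit = limit
        self.last_lrl_time = start_time

    def on_message(self, is_lrl_speaker, now):
        # Only utterances from LRL speakers reset the silence window.
        if is_lrl_speaker:
            self.last_lrl_time = now

    def poll(self, now, rng=random):
        """Return a summary request if the limit has elapsed, else None."""
        if now - self.last_lrl_time >= self.limit:
            # Assumption: restart the window so that a request is not
            # repeated on every subsequent poll.
            self.last_lrl_time = now
            return rng.choice(SUMMARY_REQUESTS)
        return None


if __name__ == "__main__":
    watcher = SilenceWatcher(start_time=0.0)
    watcher.on_message(is_lrl_speaker=True, now=120.0)
    print(watcher.poll(now=200.0))  # None: only 80 s of silence
    print(watcher.poll(now=300.0))  # a summary request: 180 s of silence
```

Timestamps are passed in explicitly instead of being read from a clock, which keeps the rule easy to test and independent of how the client schedules its checks.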

5 Experiment

5.1 Experimental Design

To study the effect of each strategy on the implemented system, we conducted a controlled experiment with a total of 19 subjects: five Indonesian speakers as LRL speakers, five Chinese speakers as HRL speakers, and 14 Japanese as HRL speakers. Each subject was either an undergraduate or graduate student. The subjects were divided into five groups and the effects of the facilitator agent’s utterances were examined through group discussions. Each group consisted of one LRL speaker, one Chinese speaker, and two Japanese speakers. The experiment was conducted over two days.

We prepared the following five experimental tasks (discussion themes) as shown in Table 2, and shuffled them after each discussion to attenuate the effect of the difficulty level of the tasks. The final goal of each task was to choose one answer from the given choices as a team and be able to explain the reasons why the choice was selected.

Fig. 4. Flowchart of the request for a summary of the discussion.

Table 2. Experimental tasks
Table 3. Experimental set-up for each group

After each discussion, the participants were asked to fill out a questionnaire for subjective evaluation. In addition, we also obtained chat log data for objective evaluation (Tables 4 and 5).

6 Experiment Result

6.1 Response to Facilitator Agent Utterance

During the experiment, we evaluated the responses to the facilitator agent for the first two strategies since they are considered requests from the facilitator agent; the other two strategies are not directive.

For utterances requesting a summary of the discussion, the results are shown in Table 4. The effect of the action was lower in Group 1 because the subjects ignored the system message even though the facilitator agent sent it as programmed. In response, on the second day of the experiment, the system messages were changed from “Let’s review the discussion so far” and “Let’s summarize the discussion once” to “[Name], please tell me what you are talking about now”, “[Name], please review the discussion so far”, and “[Name], what are you talking about now?”. Subjects who were addressed by name were more likely to respond to the system messages; indeed, the named subjects in Group 3 responded on the second day of the experiment. Group 2 was considered invalid because a system malfunction prevented a meaningful result from being obtained.

Table 4. Assessment of “Request for a summary of the discussion”.

For the paraphrase-request utterances, as with the summary requests, the facilitator agent worked correctly, but the subjects did not respond to the system message, so the effect of the action could not be discerned. On the second day of the experiment, the system message was changed from “Let’s summarize in simple words” and “Please rephrase briefly” to “Mr. [Name], could you rephrase what you just said in simple words?”, “Mr. [Name], could you please rephrase what you just said in simple words?”, and “Can you please rephrase what you just said in simple words?”. Even so, this strategy had no effect: no one responded and no utterance was rephrased.

6.2 Number of Utterances

Based on the objective evaluation, we analyze how the number of utterances of low-resource language speakers changed with each condition.

Table 5 summarizes the mean values of the results for each condition.

One-way ANOVA was performed on these values. However, no items were found to be significant.

Table 5. Mean quantitative ratings for speakers of low-resource languages
Table 6. Mean subjective ratings of low-resource language speakers

6.3 Subjective Evaluation

Subjective evaluation was done with questionnaires. LRL speakers were asked to rate the following six items shown in Table 6, and speakers of other languages were asked to rate the following five items, as shown in Table 7, on a five-point scale. The numbers indicate the average of the subjects’ answers.

Table 7. Mean subjective ratings of high-resource language speakers

A one-way ANOVA on ranks was conducted on these results. For the LRL speakers, the item “Did communication from the facilitator trigger your speech?” yielded a p-value of 0.0471, which is significant at the 5% level. For the speakers of other languages, no item was significant at that level, although the item “Did you feel that some people seemed to be difficult to talk to?” was significant at the 10% level.
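One-way ANOVA on ranks is commonly known as the Kruskal-Wallis test. The sketch below computes its H statistic, with the standard tie correction, in plain Python. The ratings in the usage example are invented purely for illustration and are not data from this experiment; the function name `kruskal_wallis_h` is likewise hypothetical.

```python
from collections import Counter


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Assign each distinct value the average of the 1-based ranks it spans.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # H = 12 / (N (N + 1)) * sum(R_g^2 / n_g) - 3 (N + 1)
    h = 12.0 / (n * (n + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N) over tied-value counts t.
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


if __name__ == "__main__":
    # Invented five-point ratings for three hypothetical conditions.
    print(kruskal_wallis_h([2, 3, 3, 4], [3, 4, 4, 5], [1, 2, 2, 3]))
```

Under the null hypothesis, H approximately follows a chi-squared distribution with one fewer degree of freedom than the number of groups, which is how a p-value is obtained. In practice, `scipy.stats.kruskal` returns both the statistic and the p-value.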

7 Discussion and Future Direction

On the first day of the experiment, subjects did not respond to the system messages requesting them to summarize the discussion, but when the system messages were changed from a general message to a message that included a specific name, the subjects responded. This is thought to be because it became difficult to ignore the request. Therefore, we think that the facilitator agent should make utterances that give some sense of obligation to the subjects to respond.

The paraphrase-request utterances did not elicit a response. This may be because the experimental task itself did not require long utterances and the difficulty level was not appropriate. Therefore, it is necessary to verify the effectiveness of this condition through group discussions with more difficult content or through experiments outside group discussions. One potential issue with the paraphrase request is that the facilitator did not give clear direction on how to paraphrase: because the sender’s new message may still be complex, the translation quality might remain low.

Even though the questionnaire responses indicated that the LRL speakers thought that the facilitator triggered their utterances for some strategies, the number of utterances from the LRL speakers showed no obvious difference.

Tasks requiring more effort to communicate might be more appropriate in our future experiment.

Since we could not measure the effects of the utterances responding to LRL utterances or of the positive responses to them, our future plans include adjusting the task and the evaluation methods. In addition, more iterations of the experiment should be conducted to confirm the results presented here. One limitation of this study is the design of the experimental task for each group of participants.

In the future, we plan to improve the facilitator agent so that users feel more obligated to respond, drawing on ideas from existing research such as making the conversation as human-like as possible [3], starting with changing the agent’s user name to a human name and redesigning the response utterances. Another idea from previous research [10] is to embody our facilitator agent by using APIs for face and speech modalities.

8 Conclusion

In order to solve the communication problem caused by the limited accuracy of machine translation in multilingual communication, we defined utterances that were considered to be effective in activating communication by low-resource language speakers. We implemented these utterances in LangridChat, a multilingual chat system, and verified and analyzed their effectiveness through experiments with participants in group discussions.

We defined four strategies for the facilitator agent based on existing research. To facilitate the understanding of low-resource language speakers, we defined two types of utterances: one that requests a summary of the discussion and one that asks for a paraphrase. Furthermore, to facilitate low-resource language speakers’ participation, we defined two types of responses: simple responses to their utterances and positive responses to the opinions they express. The experimental results showed significance at the 5% level in the subjective evaluation of “whether the communication from the facilitator triggered utterances”. However, no significant difference was found in the objective evaluation.