1 Introduction

The number of foreign visitors to Japan has been increasing day by day. Particularly across the Asia-Pacific region, they frequently come into contact and collaborate with Japanese people. Communicating with local people requires learning the language and culture, which still poses a high barrier to entry. People from other countries living in Japan need Japanese proficiency and learn Japanese as a second language, because non-native and native Japanese speakers communicate and collaborate in Japanese at workplaces and schools for business, education, and so forth.

Related research involving multiple languages and cultures has also grown in the field of cross-cultural computer-mediated communication. A series of studies has used technology to observe and report its effects and features across languages and cultures, for a variety of conversational themes and modes of communication. Because an international conversation involves people with different languages and cultures, the participants often have difficulty understanding one another. Moreover, those speaking a non-native language struggle to keep up with the conversation in real time, miss opportunities and information, or perform worse on the tasks they are trying to accomplish. This prevents them from communicating effectively and contributing to the conversation. Such a communication deficit calls for a supporting tool that allows content to be shared accurately with people of limited language proficiency. In this paper, dyads consisting of a non-native speaker (NNS) and a native Japanese speaker (NS) use computer-mediated communication to enhance comprehension of the conversation; we observe its effects on the conversation and report the participants’ evaluation of the method through experiments.

2 Related Work

2.1 Cross Cultural Communication

Novinger [1] presents cases of cross-cultural communication in the US, Mexico and Japan from the perspective of international business. Through these cases, the work shows that people of different cultures encounter communication obstacles, which lead to misunderstanding and ineffectual communication in business and social situations. It also points out that language is an important factor in communication and is certainly responsible for many obstacles, and that contact with people from other countries has rapidly increased along with the development of communication technologies. This gives rise to verbal and non-verbal communication gaps and to different perceptions based on different cultural backgrounds. The analysis of these interactions concludes with some prescriptions for intercultural communication, stating the need for adaptability on the grounds that every kind of communication includes minor cultural gaps.

Fujita [2] emphasizes the importance of the travel agency business in Japan for domestic economic development, and argues that its competence in cross-cultural communication has become imperative. Examples show gaps in communication behavior and misunderstandings between foreign tourists and local businesses in Japan, which not infrequently lead to trouble and complaints. The work treats conversation as an interactive activity using a communication model, and explains cultural differences as communication noise in information transmission. It therefore states that cross-cultural communication issues are an urgent task for the tourism industry, and that government officials and colleges of tourism should improve communicative competency and foreign language proficiency for Japan as a tourism host country.

2.2 Automatic Speech Recognition and Machine Translation

Computer-mediated communication researchers have developed a variety of supporting techniques over the decades. Audio conference support systems have addressed collaboration in remote calls and multiparty conferences. Pan et al. [3] added real-time transcripts to TV news programs and audio using an automatic speech recognition (ASR) system, and found that NNS comprehension improved significantly in both the audio and the audio-with-video conditions when real-time transcription was provided. Pan [4] also found that ASR produces imperfect transcripts containing errors, which impair NNS comprehension compared to perfect transcripts.

Gao et al. [5] investigate the effects of publicly shared, automated ASR transcripts between native and non-native English speakers in a storytelling task among triads. They report that publicly shared transcripts enhanced the quality of group communication and clarified NS speech, in contrast to transcripts restricted to the NNS. Gao et al. [6] apply keyword highlighting to machine translation (MT) in multilingual collaboration on brainstorming tasks. Their results indicate that highlighting the essential portions enhances intelligibility, the quality of collaboration, and subjective impressions.

Yamashita [7] compares regular communication with MT-mediated communication, focusing on referring behavior in a multilingual tangram task conversation. This research reveals difficulties in establishing common ground via MT in referential communication, whereas using English as a common language achieves more accurate communication between a follower and guides in tangram matching tasks. Miyabe [8] investigates the cost of back-translation repair in MT to show the extent of MT imprecision. This research reports that more than six rounds of revision could be required across the trials, although the repair cost depends on the accuracy of the original sentences and of the cross-cultural collaboration. In sum, showing a perfect transcript in real time in bidirectional natural language conversation is still a developing technology.

2.3 Audio and Text Communication

As reported in ASR and MT research, providing textual information significantly improves NNS comprehension. Hirai [9] shows a strong correlation between second language learners’ optimal listening rate and reading rate, comparable to the similarity between first-language optimal listening and reading rates among college students. Takagi [10] describes the process of note taking for medical purposes in interviewing and counseling, in which handwritten text on paper is shared between a doctor and a patient. Okamoto [11] develops a system that displays pictures along with culture-specific proper nouns to support dyadic intercultural communication. The system enhances direct face-to-face (FTF) communication and comprehension with verbal and visual cues.

Clark [12] states that producing an utterance has a cost that varies from medium to medium. Speaking or gesturing is the quickest and takes the least effort, typing on a computer keyboard is slower and takes more effort, and writing by hand is the slowest and takes the most effort.

Echenique et al. [13] compare video and audio with transcripts for NNS comprehension in triadic tangram matching conversations in English. An NS typing on a computer keyboard takes the place of ASR as the source of the real-time transcripts. As a result, both NS and NNS establish common ground and improve comprehension in the audio condition with NS-typed transcripts that show the essential portions.

Chapanis [14] compares several modes of communication in problem-solving tasks, such as FTF, televoice and teletype, and the experiments find that communicators using voice with typing are much more likely to share and exchange information than those in FTF or voice-only communication. The work details the interaction of communicators in the modes Communication Rich, Voice, Handwriting, Typewriting by experienced typists, and Typewriting by inexperienced typists, and reports the counterintuitive result that the communicators’ typing skill does not significantly affect the accuracy or the time needed to solve a decision-making problem in conversation. Accurately typed material is not important for interactive communicators in a task-oriented conversation.

2.4 Successful Intercultural Communication with Common Ground

Yamashita [7] discusses effective and accurate intercultural communication with reference to Common Ground Theory. According to Clark [12], common ground is fundamental for two people working together, for coordinating both content and process. Shared information, knowledge, beliefs and assumptions form the content that a speaker and a listener coordinate moment by moment in a collective activity. A contribution to a conversation presents evidence that an utterance has been understood, in the form of initiating an answer and accepting information.

2.5 Purpose and Hypotheses

Previous studies show that cross-cultural communication often causes misunderstanding and ineffectual communication. It is therefore important not only to establish a conversational sequence but also to achieve mutual understanding. Because an intercultural conversation involves people with different languages and cultures, the participants have difficulty understanding one another, and those speaking a non-native language struggle to keep up with the conversation in real time. This ineffectual communication calls for conversational assistance that helps share content accurately with people of limited language proficiency, so that they can achieve higher task performance. In this paper we propose a method that pursues comprehension and contribution in dyadic collaboration and achieves interactive and natural human communication in cross-cultural exchange. The purpose of the study is to investigate the effects of the proposed method, in which the NS types the essential portions of a conversation on a computer keyboard to support NNS comprehension of the content. In an intercultural dyadic teleconference, the essential words and phrases (keywords) are provided as text, so that textual reference and voice work together as an integral function. To examine the effects on a conversation, we state the hypotheses below and test them through quantitative and qualitative data analyses. According to Gao et al. [23], accurate comprehension of the NS’s message facilitates the NNS’s own communication. H1 therefore concerns the case in which participants use the method and thereby increase the modalities of the conversation. H2 is derived from research on enhancing group communication with technology [5, 24]. Previous studies show that collaborative tasks and group performance are influenced by technologies, by information cues from partners, and by impressions of the work context, and that those cues lead to differences in collaborative task performance and in participants’ perceptions. H2 is therefore proposed to investigate the influence of the proposed method on the NS and on the dyadic conversation as a whole.

H1:

The NNS comprehends the conversation and improves his or her own communication when the NS uses the method.

H2:

The method also works for the NS when the conversation becomes easily comprehensible.

3 Method

In the proposed method, the NS types the keywords of a conversation on a computer keyboard (Key-Typing) to support NNS comprehension of the content. A keyword is an essential portion of the speech, or a word or phrase that is expected to be hard for the NNS to understand. The method aims to improve the NNS’s comprehension and communication at the same time. The typed text, together with the NS’s voice, serves as a textual reference that the NNS can consult and as natural human support in a teleconference. The method adds a modality to the conversation; that is, a simple task for the NS becomes useful assistance for the NNS. The rationale and advantages of the method are as follows:

Spontaneity.

The NS key-types spontaneously while talking, on his or her own initiative, so the method assists NNS communication without degrading natural human communication.

Validity.

The NS pays attention and reasonably judges what to type depending on the context. The NNS refers to the key-typed text only when the NS’s remarks are unclear, and watches for anything incomprehensible. The method thus makes effective use of human resources in computer-mediated communication.

Versatility.

The simplicity of the method allows a flexible, adaptable and user-friendly system for everybody. ASR and MT are available only to those who can afford them and also produce a higher word error rate than NS Key-Typing. Such presentation errors compound the difficulty of NNS comprehension and overload the NNS’s cognition, which is already occupied with thinking, correcting, reading, listening and talking during a conversation.

4 Experiment

4.1 Overview

The entire experimental process was carried out on computers, and the conversation experiments were conducted under two conditions. One was the NS Key-Typing condition, in which the NS typed the essential portions of speech on a computer keyboard; the other was the control condition, in which nobody typed on a keyboard and everything else was kept the same. The experimental design was single-factor, two-level, and between subjects. NS and NNS were randomly assigned to form dyads, and 16 pairs participated in both conditions. The conversation tasks and their order were also balanced between subjects, and every pair took part in both conditions within a single day.
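One way to balance conditions and topics across pairs can be sketched as follows. This is only an illustration under assumptions: the text does not say how many pairs received each ordering, so the function name `counterbalance`, the seed, and the equal split of four pairs per schedule are hypothetical.

```python
import random

CONDITIONS = ("Key-Typing", "Control")
TOPICS = ("nuclear power generation", "capital punishment")

def counterbalance(pair_ids, seed=0):
    """Assign each pair an order of conditions and topics for its two rounds.

    Assumption (not stated in the paper): the four possible schedules
    (2 condition orders x 2 topic orders) are used equally often.
    """
    schedules = [
        (cond_order, topic_order)
        for cond_order in (CONDITIONS, CONDITIONS[::-1])
        for topic_order in (TOPICS, TOPICS[::-1])
    ]
    rng = random.Random(seed)      # fixed seed so the plan is reproducible
    ids = list(pair_ids)
    rng.shuffle(ids)               # random assignment of pairs to schedules
    plan = {}
    for i, pair_id in enumerate(ids):
        cond_order, topic_order = schedules[i % len(schedules)]
        # Each round is a (condition, topic) tuple; round 1 comes first.
        plan[pair_id] = list(zip(cond_order, topic_order))
    return plan

plan = counterbalance(range(1, 17))   # 16 pairs, as in the experiment
for pair_id in sorted(plan):
    print(pair_id, plan[pair_id])
```

With 16 pairs and four schedules, each condition and each topic appears first for eight pairs, so order and topic are not confounded with the condition.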

4.2 Participants

The participants were 16 native Japanese speakers and the same number of non-native speakers. The native speakers were all Japanese who were born and grew up in Japan. The non-native speakers were international students from China whose average Japanese Language Proficiency Test (JLPT) N1 score was 118.9 and who had studied Japanese for 4 years on average. JLPT N1 requires the competency to understand Japanese. According to the JLPT Can-Do Self-Evaluation Survey [15], fewer than 50 % of the successful N1 examinees in the bottom third, near the passing line, think they can express their opinions in a discussion. In the demographic survey, the international students rated their own Japanese competency at 3.5 out of 7 on average. Differences in Japanese competency were randomly spread over the conditions, and pairs were organized arbitrarily according to participants’ schedules and availability. There were 17 male and 15 female participants, with an average age of 25.3. The native speakers’ average age was 26.6; 12 were graduate students, 2 were undergraduate students, and 2 were faculty members. The Chinese students’ average age was 24; 6 were graduate students and 10 were undergraduate students. The participants were randomly assigned to pairs across the between-subject conditions.

4.3 Materials

Laboratory Environment.

The pair sat back to back (BTB) at PC tables in the laboratory, a simulated environment that kept the participants within earshot of each other while removing the mechanical noise and distortion of a teleconference. In front of each participant were a 39-inch monitor, a mouse, a keyboard and a microphone connected to a 15-inch laptop (PC), so that the experimenters could operate the laptop while sitting behind the large monitor and the participants could use the external devices. A single video camera captured the entire laboratory space, including the experimenters. Two further cameras tracked each participant’s upper body from the side, capturing their PC usage as well as their conversation. The desktop screen and the conversation audio were captured synchronously by computer software (Fig. 1).

Fig. 1. Laboratory environment.

Software and Equipment.

Each PC’s speaker was turned off, and the microphone was used only for voice recording. Each PC was connected to the campus LAN and synchronized with the other PC via Skype. Skype’s screen-sharing feature was enabled to show only the Key-Typing window for the experiment. The keywords typed by the NS were shared synchronously on the NNS’s PC monitor through the network. The monitor showed two MS Word 2013 windows: the left one was for Key-Typing, and the right one was for task-oriented information, including supplemental reading materials that had been provided and revised beforehand. Placing the two windows side by side on the 39-inch monitor simplified the PC setup and reduced the participants’ physical load. The typed MS Word data were saved after each conversation for analysis (Fig. 2).

Fig. 2. Software and equipment.

Procedure.

The conversational tasks were debates on the pros and cons of nuclear power generation and of capital punishment. Debate is an interactive activity in which the pairs express agreement or disagreement in a logical manner, and these are well-known and commonly used debate topics. Because the topics are contentious and divisive, with both pros and cons, each participant chose his or her own side at the beginning of the experiment. There was no judge to determine the winning side, because the participants had no formal debate training; they therefore conducted the debate as a kind of conversation. Supplemental reading materials were distributed to both the pro and con sides, providing definitions of terminology, major issues and representative opinions as shared background for self-guidance. Participants then edited the reading material on the PC according to their own opinions, and did not see the opponent’s material. Each round lasted 7 min, a duration determined in our preliminary study. The combination of topics and conditions was balanced between subjects.

  1. Preparation. Participants sat back to back (BTB) at the PC tables in the laboratory, and the experimenters sat behind the large monitors to ask participants to use the PC according to the instructions. Participants filled out consent forms and demographic surveys asking age, nationality, gender and so on, and then left their personal belongings, including smartphones, on a table. Written experimental procedures were handed out, and the experimenters explained the operations orally. International students also received instructions in Chinese if they did not understand the instructions in Japanese. All participants confirmed in advance that there were two debate topics under different conditions and that post-experimental surveys would ask what they remembered of the conversation.

  2. Instruction. The experimenter asked participants to put on a microphone and checked that it was positioned upright, and explained that the Key-Typing condition included a short practice period once the left-side window was synchronized with the interlocutor’s PC. The NS was to type keywords while talking; the keywords were not the entire message but the important points.

  3. Tasks and Surveys. Two rounds of 7 min debate were videotaped, screen-captured, and voice-recorded. Three kinds of post-experimental surveys were conducted.

5 Measures

To investigate the effects of NS Key-Typing on a conversation between an NS and an NNS of Japanese, the experiment was videotaped. In the proposed method, the NS types the essential portions of a conversation on a computer keyboard to support NNS comprehension of the content. We tested the hypotheses by analyzing both quantitative and qualitative data.

5.1 Observation

Coding of Evidence in Grounding.

We coded each conversation line by line based on Evidence in Grounding from Clark’s Common Ground Theory [12]. The coding scheme distinguishes three types of conversational sequences during discussion, which show that dyads carry out the grounding process in the form of initiating an answer and accepting information.

Shared-knowledge Retention.

Participants agreed in advance that there would be two debate topics under two different conditions, and that each round would be followed by a post-experimental survey asking what they remembered of the conversation content. This survey explored how well both NS and NNS comprehended and retained the conversation content.

5.2 Survey

Workload.

NASA-TLX [16] is a subjective workload assessment tool consisting of 6 descriptive rating scales. The paper version of the NASA-TLX rating sheet evaluates each factor from 0 (low) to 100 (high). In its simplest usage, only the average of the ratings is calculated [17]. The experimenters then interviewed participants for detailed comments about their ratings.
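The simple averaging of NASA-TLX ratings can be sketched as below. The function name `raw_tlx` and the example ratings are hypothetical and are not data from this study; only the six subscale names and the 0 to 100 range come from the instrument itself.

```python
from statistics import mean

# The six NASA-TLX subscales, each rated from 0 (low) to 100 (high).
TLX_FACTORS = (
    "Mental Demand", "Physical Demand", "Temporal Demand",
    "Performance", "Effort", "Frustration",
)

def raw_tlx(ratings):
    """Return the unweighted mean of the six subscale ratings."""
    for factor in TLX_FACTORS:
        if factor not in ratings:
            raise ValueError(f"missing rating: {factor}")
        if not 0 <= ratings[factor] <= 100:
            raise ValueError(f"rating out of range: {factor}")
    return mean(ratings[factor] for factor in TLX_FACTORS)

# Hypothetical ratings for one participant in one condition.
example = {
    "Mental Demand": 60, "Physical Demand": 30, "Temporal Demand": 50,
    "Performance": 40, "Effort": 70, "Frustration": 20,
}
print(raw_tlx(example))  # 45
```

The weighted variant of NASA-TLX additionally requires pairwise comparisons between factors; the unweighted mean avoids that step.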

Questionnaire Survey.

The questionnaire survey assesses participants’ perceptions via 23 scales covering 5 major attributes: interlocutor’s communication, own communication, collaboration, mood and technology. Every scale was taken from previous work and modified for the current study [18-22, 24].

Interview.

Participants were also asked to describe the experiment in detail, to corroborate the assessments of the NS and NNS. The interview items were prepared beforehand based on our preliminary study. Bilingual students were interviewed in their dominant language, either Japanese or Chinese.

6 Results

6.1 Evidence in Grounding

The conversations were transcribed for coding based on Evidence in Grounding [12]. In accordance with Common Ground Theory, the Evidence in Grounding Coding Manual in Table 2, together with the context of the discussion, served as the standard for identifying conversational sequences that present evidence of understanding an utterance, in the form of initiating an answer and accepting information. Positive evidence most commonly takes three forms: Acknowledgement, Relevant Next Turn, and Continued Attention. Continued Attention through eye gaze is continuous and can hardly be observed in the current computer-mediated setting, so we did not count it (Table 2).

Table 1. NASA Task Load Index.
Table 2. Evidence in Grounding Coding Manual.

Two experimenters coded independently; one was a Japanese speaker and the other a Japanese-Chinese bilingual speaker. Initial inter-coder agreement was good (85 %), and both data sets showed a significant difference between conditions [p = .016, Z = −2.413]. Afterward the coders discussed the results and found that one coder’s scheme differed slightly from the Coding Manual; after this was corrected, inter-coder agreement rose to 90 %. The Key-Typing condition had a significantly larger number of Evidence in Grounding instances [p = .001, SE = 3.09, t(15) = 4.04], with a large effect size (Cohen’s d = 1.00). Figure 3 shows the average number (times/pair) of Evidence in Grounding instances.

Fig. 3. Evidence in Grounding coding result. (N=16; P<0.05:**)
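The reported statistics for Evidence in Grounding can be related to one another with a short sketch. It assumes that t(15) comes from a paired t-test over the 16 pairs and that Cohen’s d is the mean difference divided by the standard deviation of the differences; the text states neither explicitly. The function name `paired_summary` and the small data set are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_summary(cond_a, cond_b):
    """Summarize a paired comparison between two conditions."""
    if len(cond_a) != len(cond_b):
        raise ValueError("both conditions need one value per pair")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)            # sample SD of the differences
    se = sd_diff / math.sqrt(n)       # standard error of the mean difference
    return {
        "n": n,
        "df": n - 1,
        "mean_diff": mean_diff,
        "se": se,
        "t": mean_diff / se,
        "d_z": mean_diff / sd_diff,   # effect size for paired data (assumed)
    }

# Hypothetical counts for four pairs, only to exercise the function.
summary = paired_summary([5, 7, 9, 6], [3, 4, 8, 3])

# Consistency check using only the values reported in the text.
n_pairs = 16
t_reported, se_reported = 4.04, 3.09
implied_mean_diff = t_reported * se_reported   # about 12.5 instances per pair
implied_d = t_reported / math.sqrt(n_pairs)    # about 1.01
print(round(implied_mean_diff, 2), round(implied_d, 2))
```

Under these assumptions, d equals t divided by the square root of the number of pairs, which gives about 1.01 and is close to the reported value of 1.00.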

6.2 Shared-Knowledge Retention

Participants wrote down on the PC, individually and without any references, as many as they could recall of the words, phrases, or sentences that they believed they had shared with the interlocutor during the conversation. We double-checked these writings against the video and counted the items that also appeared in the interlocutor’s writing, in order to compare the conditions. The Key-Typing condition produced a significantly larger number of shared-knowledge retention items on average [p < .001, SE = 1.15, t(15) = 8.89]. The average values are shown in Fig. 4.

Fig. 4. Shared-knowledge retention. (N=16; P<0.05:**)
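The counting of items common to both partners’ recall can be sketched as a set intersection. This is a simplification: in the study the matching was checked by hand against the video, whereas the sketch assumes that matching items are written identically apart from case and spacing. The function names and the example items are hypothetical.

```python
def normalize(item):
    """Lower-case an item and collapse runs of whitespace."""
    return " ".join(item.lower().split())

def shared_items(ns_recall, nns_recall):
    """Return the items that appear in both partners' recall lists."""
    ns = {normalize(item) for item in ns_recall}
    nns = {normalize(item) for item in nns_recall}
    return ns & nns

# Hypothetical recall lists for one pair in one condition.
ns_recall = ["radioactive waste", "Renewable energy", "cost of electricity"]
nns_recall = ["renewable  energy", "radioactive waste", "earthquake risk"]

common = shared_items(ns_recall, nns_recall)
print(len(common))  # 2
```

The per-pair count in each condition would then feed the comparison between conditions.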

6.3 Workload

Figure 5 shows the average scores and the significant differences between conditions for each factor. Scores were not weighted but simply averaged for each factor [17]. For NNS, Frustration decreased significantly in the Key-Typing condition [p = .01, Z = −2.566], so we asked about it in the interview. For NS, on the other hand, Physical Demand [p = .038, Z = −2.079], Effort [p = .022, t(15) = 2.54], and Frustration [p = .049, Z = −1.968] increased significantly.

Fig. 5. Workload. (N=16; P<0.05:**)

6.4 Survey

The questionnaire survey assesses participants’ perceptions via 23 scales covering 5 major attributes: interlocutor’s communication, own communication, conversation, mood and technology. Every scale was taken from previous work and slightly modified for the current study. The scales were presented in random order to cancel out order effects, and participants responded on a scale of 1 (strongly disagree) to 7 (strongly agree). In Fig. 6, scores are averaged for the 5 attributes across the 23 scales, such as “It was easy to understand what an opponent said.” The questions formed reliable scales for Interlocutor’s communication [Cronbach’s α = .75], Own communication [α = .71], Conversation [α = .77], Mood [α = .77] and Technology [α = .78]. NNS gave higher ratings in the Key-Typing condition for Interlocutor’s communication, Own communication, and Conversation. NS gave high ratings to Conversation and Technology in the Key-Typing condition.

Fig. 6. Questionnaire survey result. (N=16; P<0.05:**)
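The reliability coefficient reported for each attribute can be computed as in the sketch below, which uses the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores) for k items. The function name `cronbach_alpha` and the responses are hypothetical and are not data from this study.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Compute Cronbach's alpha.

    responses: one row per respondent, each row holding that respondent's
    scores on the k items of a single attribute.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")
    items = list(zip(*responses))                      # one column per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)

# Hypothetical 7-point ratings: four respondents, three items.
ratings = [
    [5, 6, 5],
    [3, 4, 4],
    [6, 6, 7],
    [2, 3, 2],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

Values around .70 or above, such as those reported here, are conventionally read as acceptable internal consistency.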

6.5 Interview

At the end of the experiment, we interviewed each participant about the Frustration factor of the workload assessment. Most NNS stated that the Key-Typing method eased the anxiety caused by the many technical terms in the debate and promoted comprehension of NS utterances. In contrast, some NS remarked that it was hard to type while talking.

7 Discussion

7.1 Evidence in Grounding

Conversations with the Key-Typing method contain a significantly larger number of Evidence in Grounding instances than those in the control condition, with a large effect size. The method addresses problems of understanding and difficulty in conversation, and helps the interlocutors achieve mutual understanding. The utterances formed sequences presenting evidence of understanding, in the form of initiating an answer and accepting information; the Key-Typing method thus naturally promoted discourse comprehension through considerate human support in textual form. The significance test supported H1: the NNS comprehends the conversation and improves his or her own communication when the NS uses the method. An improvement in NNS utterances themselves was not supported in the current study, although NNS rated their own communication highly in the Key-Typing condition. Ongoing analyses may clarify this with respect to the timing and chronological order of utterances.

7.2 Shared-Knowledge Retention

The Key-Typing condition produced a larger average number of shared-knowledge retention items (Fig. 4). This indicates that Key-Typing conversation supports accurate information exchange and that the information is retained in memory. Conversation is an interactive activity, and it benefits both sides when information is accurately shared and commonly retained between the participants. This supports H2: the method also works for the NS when the conversation becomes easily comprehensible.

8 Limitations and Future Work

Compared with computing research that has developed ASR and MT, the Key-Typing method requires human effort for input. Typing on a keyboard takes some time, which may incur a production cost [12], in that typing is slower than speaking or gesturing, even though a Key-Typing conversation uses the two media simultaneously. Such cost should be minimized once further analyses suggest an effective way of typing with respect to timing and content. The method may also be applicable not only to NS and NNS but to every kind of communication, in every natural language and on every occasion across the world, since Novinger notes that every communication includes minor cultural gaps and that language plays an important role [1]. Because a previous study examined how audio with transcripts influences NNS comprehension in triadic conversation in English and showed that NNS comprehension improved [13], NS typing on a computer keyboard could plausibly be applied to other natural languages. We would like to continue the analysis to broaden the scope of the method, so that it can be used universally in interactive dialogue where expensive technologies are unaffordable.

9 Conclusion

In this paper we proposed a method that pursues comprehension and contribution in dyadic collaboration and achieves interactive and natural human communication in cross-cultural exchange. The purpose of the study was to investigate the effect of the proposed method, in which the NS types the essential portions of a conversation on a computer keyboard to support NNS comprehension of the content. The Key-Typing method enhanced mutual understanding, as shown by the presence of Evidence in Grounding and the improved retention of shared knowledge. Overall, the cost of NS keyword typing yielded the benefit of improved mutual understanding and increased shared knowledge. The questionnaire survey showed higher ratings for participants’ perception of conversation and technology in the experiment, and many interview responses suggested that the Key-Typing method would work effectively in conversation, including everyday occasions of potential cross-cultural communication.