Introduction

Technological advancements have brought the world and its people closer together. Nowadays, most people own computing devices and have access to the Internet, so intercultural interactions and exchanges take place more widely and actively (Çiftçi 2016). Thus, because cultural differences exist in our globalized society, the development of cross-cultural understanding has become an important issue (Aparicio et al. 2016; Bentley et al. 2005). People need to recognize and understand cultural differences in order to avoid problems that may arise during intercultural communication, because we all must co-exist with others peacefully in this interconnected world (Rogers et al. 2007).

Modern information and communication technologies play an important role in aiding cross-cultural learning programs (Aparicio et al. 2016; Ertmer et al. 2011; Rogers et al. 2007). Çiftçi (2016) reported that the technologies most frequently used to aid cross-cultural learning included online discussion boards, text-based chatting, blogs, email exchanges, video recording and conferencing, and podcasting. These technologies support both asynchronous and synchronous communication and are valuable, enjoyable, and suitable for cross-cultural learning because they enrich the nature and function of communication. Nevertheless, several critical issues related to cross-cultural education still exist that hamper intercultural interactions and exchanges among people. According to Osman and Herring (2007), Reynolds (2013, 2015), Shadiev and Huang (2016), and Wu (2014, 2015), the language barrier is the most critical factor in cross-cultural learning because the lack of a common language makes it difficult to communicate. To address this issue, we applied speech-to-text recognition (STR) and computer-aided translation (CAT) systems in the present study. During intercultural communication, the STR system generated texts from participants’ speech in their native languages, and the CAT system then simultaneously translated them into foreign languages understandable to others. In this manner, we assumed that STR and CAT systems would aid the cross-cultural learning of participants from different countries who were speaking different languages. However, Barrachina et al. (2009), Hwang et al. (2012), Kuo et al. (2012), Fountain and Fountain (2009), Mellebeek et al. (2005), and Shadiev et al. (2016) warned that the accuracy rates of STR and CAT technologies should be considered when applying them to teaching and learning because content created by STR and CAT technologies can be useful and meaningful for learning only when STR and CAT texts are reasonably accurate. These scholars suggested that STR and CAT systems are still not able to deliver perfect transcriptions and translations. Since these claims were made several years ago, and technology has advanced rapidly in recent years, this issue may need to be revisited. In addition, applications of STR and CAT systems to support cross-cultural communication and learning have received very little attention from the scientific community in the past. Therefore, this study is an attempt to address these issues.

Related literature and cultural context

Cross-cultural understanding

Cross-cultural understanding is generally defined as the basic ability of individuals to correctly recognize, understand, and interpret cultural differences. According to Moran et al. (2014), diverse cultural backgrounds include differences in languages, appearance, food, traditions, values, and so on. What is allowed in one culture can be prohibited in another. For example, “Americans love beef, yet it is forbidden to Hindus, while the forbidden food in Muslim and Jewish culture is pork, eaten extensively by the Chinese and others” (Moran et al. 2014, p. 12). Thus, understanding the culture of others helps to overcome cultural differences and maintain harmonious relationships. Cross-cultural understanding can be explained by the cultural convergence theory (Gudykunst et al. 1988; Kincaid 1979). According to this theory, cross-cultural understanding takes place through communication and information exchange between two or more persons from different cultures when they reach a mutual understanding of each other’s culture and the world in which they live. That is, experiences of and insights into other cultures that persons communicate and share among themselves enable the expansion of their cultural awareness and behavior (Gudykunst et al. 1988; Kincaid 1979). Jaakkola et al. (2010) and Talalakina (2010) argued that cross-cultural understanding can be measured based on the following four rubrics: (1) cross-cultural knowledge refers to familiarization of an individual with different cultural characteristics, values, beliefs, and behaviors; (2) cross-cultural awareness refers to an individual’s internal understanding and appreciation of a culture; (3) cross-cultural sensitivity refers to an individual’s ability to read into situations, contexts, and behaviors that are culturally rooted and to react to them appropriately; and (4) cross-cultural competence refers to an individual’s ability to work effectively across cultures, i.e., to respect cultural differences, adapt to changing situations, and benefit from them. Cross-cultural learning has been defined as a process involving active experience and participation that enables an individual to acquire knowledge and to absorb new attitudes and values related to different cultures (Yamazaki and Kayes 2004). Talalakina (2010) suggested fostering cross-cultural learning through multiple goals, such as (1) providing information on cross-cultural peculiarities, (2) encouraging cross-cultural communication and open-mindedness among individuals from different cultures, (3) providing tools for cross-cultural communication and facilitating the interpretation and reactions of individuals to cross-cultural contexts, and (4) modeling situations to implement the acquisition of cross-cultural knowledge. Kohlberg (1984) proposed several techniques for implementing cross-cultural learning. One technique involves providing learners with opportunities to experience cultural diversity through readings, simulations, and watching videos (i.e., indirectly), and another involves interacting with people from other cultures (i.e., directly).

Different learning activities intended to facilitate cross-cultural understanding have been discussed in earlier studies. Self-introduction helps students become acquainted with one another and their cultures (Chase et al. 2002; Liu 2007; Tu 2004). In addition, more social interactions can be achieved through self-introduction (Curtis and Lawson 2001). Creating media content and sharing it with others contribute to diversification of cultural expression and the empowerment of cross-cultural perceptions (Jenkins et al. 2009). Discussing enables students to discern and respect multiple perspectives across diverse communities (Jenkins et al. 2009). Experiencing foreign cultures through play, performance, and appropriation helps with adopting alternative identities. In addition, students are able to sample and remix media content meaningfully for the purpose of improvisation and discovery (Jenkins et al. 2009). By reflecting on foreign culture, students are able to share their reflections and experiences with their peers (Tu 2004).

Speech-to-text recognition and computer-aided translation

Speech-to-text recognition (STR) technology synchronously transcribes text streams from an individual’s speech (Ranchal et al. 2013; Shadiev et al. 2017). According to earlier research, STR has great potential to aid the learning of different target groups, i.e., students in a physical classroom, students learning online, non-native speakers, or students with cognitive or physical disabilities (Shadiev et al. 2017). For example, Hwang et al. (2012) and Kuo et al. (2012) adopted STR in online learning environments to support both teaching and learning. When students experienced problems associated with network connections and could not hear the lecturer, they could still follow the lecturer by reading STR-texts. Kheir and Way (2006) applied STR to assist the learning of hearing-impaired students. These students could use STR-texts to comprehend lecture content since they could not hear it. Ranchal et al. (2013) employed STR to generate lecture transcriptions. When STR was available during lectures, the students could focus on the lecture and did not have to take class notes. In addition, transcriptions were useful to them for studying lecture content after class. Ryba et al. (2006) adopted STR to support the learning of non-native speakers of English during lectures in English. The students listened to the lecturer and simultaneously read STR-texts to understand words with which they were unfamiliar and to verify and clarify words they had misheard or did not catch when the lecturer spoke too fast.

Computer-aided translation (CAT) technology translates texts from one language into another (Godwin-Jones 2011). According to earlier studies, CAT technology is a potentially reliable and valuable tool to aid learning, especially in the field of second/foreign language (SL/FL) learning. For example, Hermet and Désilets (2009) applied CAT during SL/FL writing activities to compose essays as well as to correct grammatical and lexical errors in essays. Omar et al. (2012) employed CAT during an online discussion. CAT translated and searched for appropriate words that helped the students express their opinions and ideas during the discussion. In addition, CAT checked grammar and spelling and helped with overcoming problems that occurred when they were constructing sentences. ElShiekh (2012) adopted CAT to research a writing course in order to explore the translation process from English into Arabic and vice versa in the field of advertising as well as Koranic texts and literary works. Shadiev and Huang (2016) introduced STR and CAT to Taiwanese and Uzbek students to facilitate their cross-cultural understanding. Students communicated and exchanged information about their local cuisine and culture. However, students in their study represented two cultures, and their communication was bi-lingual.

Research motivation and cultural context

In this study, we focused on facilitating cross-cultural understanding in participants from different countries who speak different languages. Informed by related theory and earlier studies, we designed a cross-cultural learning activity. The cultural context of this study was specific to each participant and represented his/her culture. That is, we had twenty-one participants from thirteen countries speaking their mother tongue in ten different languages. During the learning activity, participants introduced twenty-one traditions from their cultures. They communicated and exchanged information with one another in their mother tongue. We applied STR and CAT technologies to support multilingual interaction among the participants, help them understand each other, and facilitate their cross-cultural learning. In this study, we aimed to: (a) measure the accuracy rate of current STR and CAT technologies, (b) explore issues associated with STR and CAT processes and how they can be solved, and (c) investigate whether the use of STR and CAT applications is a feasible way to facilitate cross-cultural learning. The significance of this study is that we attempted to address the language barrier issue with STR and CAT technologies during cross-cultural learning. That is, this research sheds new light on the feasibility of applying STR and CAT systems to support cross-cultural multilingual communication among participants. In addition, this study provides new insights into the accuracy rate of STR and CAT systems utilized during a cross-cultural learning activity. Furthermore, this study enhances our understanding of existing issues associated with STR and CAT processes and how they can be solved. It also addresses the following research questions: (1) What is the accuracy rate of current STR and CAT technologies? (2) What are the issues associated with STR and CAT processes, and how do students resolve them? (3) Are STR and CAT applications a feasible method by which to facilitate cross-cultural learning?

Method

Twenty-one university students representing thirteen nationalities participated in this study (Table 1). Twenty of the participants were between 20 and 27 years old, and one was 32 years old, which suggests that there were no generational differences among them. The participants were enrolled in the same course, and they did not know each other prior to this study. Sixteen participants had less than 1 year of experience or no experience with the use of STR technology, and the remaining participants had more experience. Seven participants had used CAT technology for less than 3 years, and fourteen participants had used it for a longer period. Fourteen participants had less than 1 year or no cross-cultural learning experience, and seven participants had more than 1 year of experience.

Table 1 Participants, their traditions, and the foreign traditions they experienced

We aimed to facilitate the cross-cultural understanding of the participants with a learning activity following the general recommendations of earlier studies (Chase et al. 2002; Curtis and Lawson 2001; Jenkins et al. 2009; Liu 2007; Tu 2004). Thus, the study included four one-week steps. In the first step, the participants introduced themselves, their hobbies, and their interests. In the second step, the participants introduced their local traditions and related culture. They were asked to identify one local tradition, describe it in terms of process, and explain its origin and relation to their culture. In the third step, each participant selected one tradition and experienced it; for example, student ID 21 learned about the “red envelope” culture, prepared one red envelope, and presented it to a friend as a wedding gift with an accompanying speech explaining the gift’s meaning. After that, students shared their personal experiences with the selected tradition with the other participants. Before selecting a tradition to experience, each participant confirmed that he/she had no prior knowledge about it. Communication among the participants during the first three steps was carried out in a closed Facebook® group, which took place asynchronously. In the fourth step, all participants met online face to face to communicate synchronously with each other about themselves, their traditions, and their experiences with foreign cultures. The participants communicated via Skype® messenger. Student communication in the asynchronous or synchronous modes was organized based on the nature of each step of the learning activity. For example, asynchronous communication was useful during the self-introduction step because the students were not acquainted with one another, and it helped decrease anxiety and inhibitions (Yang and Chen 2007). Students were motivated to disclose personal information, which could be problematic for some during face-to-face communication (Yang and Chen 2007). Asynchronous communication allowed the students to have enough time to post their information, read the posts of others, and ask or answer questions (Shadiev et al. 2017). In addition, the students were able to communicate at convenient times (Tu 2004). Synchronous communication, on the other hand, was useful for exchanging culturally-related information, reflecting on cultural experiences, and making or receiving comments instantly (Tu 2004). Since the students were already acquainted with one another after the first three steps of the learning activity, real time communication helped increase their interaction and gave them a sense of community (Tu 2004; Shadiev et al. 2017). This is why communication among the students was synchronous in the last step.

The participants used STR technology to generate texts from their voice input in their native languages, and then they used CAT technology to simultaneously translate STR-texts into English. We employed this approach because the participants could not understand the native languages of the other participants, but all of them could understand English. Extracts from the participants’ communication content are provided in the Appendix as an example. Before the learning activity, the researchers trained the participants to use STR and CAT, and the participants had a 1-week period to practice with the technology. The researchers met with every participant after each step to discuss technology-related issues and how these issues were addressed by the participants. If some participants had no solutions for their STR- and CAT-related problems, we offered them solutions proposed by other participants.

We introduced STR and CAT technologies to the participants to aid in their multilingual communication. For this purpose, we used the Google® Translate system (Fig. 1). Google® Translate is an automated machine translator application that offers free online language translation service to users with internet access. It can be used to translate words or phrases from one language into another and supports more than 100 languages. The Google® Translate algorithms are not rule-based like most CAT systems but are rather based on statistical analyses (Tobin 2015). That is, the algorithms rely on a large corpus of professionally translated texts and look for equivalences (Fountain and Fountain 2009). Google® Translate features an STR tool that is activated by clicking on a microphone button. A speaker selects a voice input language, speaks, and STR then generates text from the voice input. After that, the STR-text is simultaneously translated from the speaker’s native language into English by the Google® Translate CAT tool.

Fig. 1 Google® Translate: (1) STR tool, (2) input language, (3) input area, (4) translated text, and (5) improving translation

The analytical data were collected from two main sources: online communication among the participants and one-on-one semi-structured interviews. Table 2 presents the number of words that the participants communicated in different languages during different steps of the learning activity. These numbers are presented as averages because there were several participants who were native speakers of the same language (e.g., two native French speakers and three native Spanish speakers). In addition, we included the total number of words in the table to show how many words the participants communicated in total in particular languages. This information could be useful for understanding how much the STR and CAT tools can help with cross-cultural multilingual communication. We analyzed the content of the online communications to measure the accuracy rates of the STR and CAT systems. The accuracy rates of both the STR and CAT systems were calculated by dividing the number of correct words in the text by the total number of words and multiplying by 100.
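
The accuracy-rate calculation described above can be sketched in a few lines of Python. This is only an illustration of the stated formula (correct words divided by total words, multiplied by 100); the function name and the word counts in the example are hypothetical and are not data from this study.

```python
def accuracy_rate(correct_words: int, total_words: int) -> float:
    """Return the accuracy rate (in percent) of an STR- or CAT-generated text.

    Implements the formula stated in the text:
    (number of correct words / total number of words) * 100.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= correct_words <= total_words:
        raise ValueError("correct_words must be between 0 and total_words")
    # Multiply before dividing to limit floating-point rounding noise.
    return 100 * correct_words / total_words


# Hypothetical counts for illustration only (not taken from Table 2).
rate = accuracy_rate(correct_words=188, total_words=200)
print(f"Accuracy rate: {rate:.2f}%")
```

How a word is judged "correct" (e.g., by a native speaker comparing the generated text with what was said) is decided outside this function; the sketch only covers the arithmetic.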

Table 2 Number of words in posts (average and total) and accuracy rates of STR and CAT (in percentage)

One experienced researcher, a co-author of this paper, carried out in-depth, one-on-one semi-structured interviews with all participants. Interviews were carried out face to face in the researcher’s office and online via Skype at the end of each step of the learning activity. All interviews were conducted in English. In the interviews, the researcher asked the participants about their experiences with the technology during the learning activity. The interview content was then analyzed by the researcher and his two assistants to derive issues related to STR and CAT processes and the workarounds that the participants used to address such issues. Each interview took approximately 30 min. The researcher recorded all interviews using a digital recorder and transcribed the content for analysis. Subsequently, the researcher and two assistants reviewed the interview transcripts and separately coded the text segments that contained information related to the research focus. They aggregated codes with similar meanings and formed categories to produce a framework in order to report the findings. Notable differences in the coding and categorization processes were resolved through rater discussion until a consensus was achieved. The inter-rater reliability of the interview data was evaluated by using Cohen’s kappa, and the result exceeded 0.90, thus demonstrating high inter-rater reliability.

We analyzed the communication content of the participants generated during the learning activity to measure their cross-cultural understanding. Since the participants communicated in ten different languages, we analyzed the content already translated into English by the participants. We adopted a coding unit concept for this analysis. Text segments that represented cross-cultural understanding of foreign traditions were highlighted and coded. Codes with related meanings were then collected and grouped. Established groups of codes produced a framework for reporting the research findings. We evaluated cross-cultural understanding with respect to cross-cultural knowledge (i.e. student familiarization with different cultural characteristics, values, beliefs, and behaviors), awareness (i.e. internal understanding and appreciation of a culture), and sensitivity (i.e. ability to read into situations, contexts, and behaviors that are culturally rooted and to react to them appropriately) (Jaakkola et al. 2010; Talalakina 2010). We did not focus on competence (i.e. ability to work effectively across cultures) because our learning activity took place online, so the participants had no chance to experience foreign culture and traditions in authentic contexts (e.g., to attend the morning pilaf ceremony in Uzbekistan), and thus, could not demonstrate their competence. A score of “1” was given if a participant’s communication content represented cross-cultural knowledge, “2” for awareness, and “3” for sensitivity. If communication content did not represent any of these, a participant got a score of “0.” Three raters were involved in the evaluation process. The inter-rater reliability coefficients among them were calculated using Cohen’s kappa. The mean inter-rater reliability among the three raters exceeded 0.90, which indicates excellent agreement.
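
As an illustration of the reliability check, the sketch below computes Cohen's kappa for each pair of raters and averages the pairwise values. Averaging over pairs is an assumption about how a "mean inter-rater reliability among the three raters" could be obtained; the function names and the example scores (on the 0 to 3 scale described above) are hypothetical and are not data from this study.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who scored the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: based on each rater's marginal score frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical score throughout
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(raters):
    """Average Cohen's kappa over all pairs of raters (assumed procedure)."""
    pairs = list(combinations(raters, 2))
    return sum(cohens_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical scores: 0 = none, 1 = knowledge, 2 = awareness, 3 = sensitivity.
rater_1 = [0, 1, 2, 3, 1, 2, 3, 0]
rater_2 = [0, 1, 2, 3, 1, 2, 3, 0]
rater_3 = [0, 1, 2, 3, 1, 2, 2, 0]
print(f"Mean pairwise kappa: {mean_pairwise_kappa([rater_1, rater_2, rater_3]):.2f}")
```

Because the scores are ordered, a weighted kappa would also be a defensible choice; the unweighted form is used here only because the text does not specify a weighting.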

Results and discussion

Accuracy rate of STR and CAT

Results related to the accuracy of the STR and CAT systems with respect to different languages are presented in Table 2 and Fig. 2 (as a percentage). The accuracy rate of STR for Mongolian is absent because Google® Translate does not feature an STR function for this language. One possible reason is that Mongolian is not as widely spoken as the other languages under consideration here. We therefore asked the Mongolian participant to type content into CAT using a keyboard. In addition, since we asked the participants to translate their content from their native language into English, and the Belizean participant was a native speaker of English, the CAT accuracy rate for Belizean English is absent.

Fig. 2 Accuracy rates of STR (left) and CAT (right)

According to our results, the lowest STR accuracy rate was for English (the average was 93.94%), and the highest STR accuracy rate was for French and Hindi (the average was 98.51%). This result was surprising because participant ID18 is a native speaker of English, and because English is the most commonly used language, the STR tool for English should be quite mature and its accuracy rate should be high. We interviewed participant ID18 to discover the reason behind this low accuracy rate. The participant explained that since he is from Belize, he has a very strong accent; therefore, STR was unable to recognize some spoken words correctly. Holm (1982) suggested that Belizean English differs from Standard English (i.e., British or American English) in terms of both dialect and accent. This difference reflects the varied history of the speech community and the degree of contact that the community had with Spanish and Creole. STR is based on Standard English acoustic and language models. Thus, when participant ID18 pronounced words with an accent, STR was unable to recognize them correctly and generated text with errors. Since French and Hindi are widely spoken languages, the accuracy rate of STR for these languages was high. In addition, the French and Hindi participants mentioned that they had rehearsed using STR beforehand and knew several strategies to apply to the use of this technology. These factors may explain why the highest accuracy rate was for French and Hindi.

According to our results, the lowest CAT accuracy rate was for Mongolian (94.37%) and Filipino (94.60%), and the highest rate was for Spanish (98.15%), Russian (98.02%), and French (97.95%). Perhaps, this is due to the wide use of Spanish, Russian, and French in contrast to Mongolian and Filipino. The CAT database is bigger for Spanish, Russian, and French and smaller for Mongolian and Filipino. When there is a small language database, CAT translates texts with lower accuracy rates as compared to when there is a large database (Aiken and Balan 2011). Another reason for differences in accuracy rates is the similarities/differences between English and these languages, where a larger difference results in lower accuracy (Tobin 2015). Mongolian and Filipino are Asian languages and thus have greater differences with English compared to European languages such as Spanish, Russian, and French.

Scholars have suggested that texts produced by STR and CAT systems may contain mistakes and ambiguities (ElShiekh 2012; Fountain and Fountain 2009; Tobin 2015). Therefore, STR/CAT-texts are acceptable and useful for learning only when they are generated with reasonable accuracy (Barrachina et al. 2009; Mellebeek et al. 2005). Hwang et al. (2012) and Kuo et al. (2012) argued that accuracy rates of texts of more than 85% can be considered to be reasonable and that texts with these accuracy rates will indeed enable teaching and learning. Following this argument and the fact that STR and CAT have advanced greatly after nearly 4 years of development, an accuracy rate of 85% can now be easily achieved. Therefore, the participants in this study agreed that the STR- and CAT-generated texts were acceptable and useful. These findings are in line with those of previous studies (Shadiev and Huang 2016).

Our results show that the STR and CAT accuracy rates for some languages were not consistent and changed throughout the steps of the learning activity. For example, in the case of Hindi, Vietnamese, English, Filipino, and Russian, the STR accuracy rate was higher in the first step as compared to the second one. For Mandarin, French, Vietnamese, Spanish, Mongolian, and Russian, the CAT accuracy rate was higher in the first step as compared to the second one. Theoretically, the accuracy rates of the STR and CAT systems should increase throughout the steps as the participants gain experience with using the systems and subsequently utilize them during communication more efficiently. However, our results contradict this expectation, so we interviewed the participants who speak these languages in order to explore possible explanations for the results. In the interviews, the participants mentioned that they used words that are very common, easy, and unambiguous for the first step (i.e., self-introduction). However, they had to use some specific words or terminology related to local traditions in the second step, so STR and CAT could not always recognize and translate them correctly. This is perhaps because the STR and CAT systems do not have such words in their databases. Labov (2011) suggested that languages change over time because some words are borrowed from other languages or invented due to different cultural environments. For example, “panades” and “garnaches” are words related to popular food items in Belize that were borrowed from Spanish. These words are currently in wide use in Belizean English but do not exist in the CAT database for English. They are likely to be accurately recognized/translated by the STR/CAT for Spanish rather than by the STR/CAT for English.

In terms of perceptions of STR and CAT systems, in the interviews, the participants mentioned that it was easy to use both. In addition, the participants claimed that the accuracy rates of the translated texts were very good and acceptable. They could fully understand the translated content posted by other participants. The participants said that their experience with STR and CAT enabled them to find the strengths and limitations of the technologies, and they consequently could fully utilize these technologies during the learning activity. With each step, they obtained content with fewer errors due to their experience with the technologies except in some cases when they had to use specific words and terminology, e.g., to describe traditions and related culture. However, one important issue should be considered. Some participants said that Internet Explorer or Mozilla Firefox were their favorite browsers, but it turned out that these browsers do not feature the STR function in Google® Translate. Some other browsers (i.e., Opera®) feature the STR tool, but it may stop recording speech input before a speaker stops speaking. Therefore, the participants had to use Google® Chrome for their experience with using Google® Translate to be efficient.

Although earlier studies have suggested that STR (Hwang et al. 2012; Kheir and Way 2006; Kuo et al. 2012; Ranchal et al. 2013; Ryba et al. 2006) and CAT (Hermet and Désilets 2009; Godwin-Jones 2011; Omar et al. 2012) are valuable tools to support learning, applications of these two technologies to cross-cultural learning have received little attention. In this study, we attempted to bridge this gap. Our results demonstrated the accuracy rates for different languages when using STR and CAT systems during multilingual cross-cultural communication.

Improving the accuracy rates of STR and CAT through various workarounds

In this section, we discuss several issues reported by the participants that are associated with STR/CAT-text accuracy rates and several workarounds the participants applied to address them. With regard to the STR process, the participants experienced seven issues and employed ten workarounds to address them (Table 3).

STR issue 1 The STR technology did not add punctuation marks in the STR-texts. As a result, the participants got long strings of words and could not distinguish one sentence from another. The participants then dictated punctuation marks (STR workaround 1) or inserted them manually (STR workaround 2). This approach helped the participants distinguish when one sentence ended and another started, showed how the sentence should be read, and made the meaning clear. However, it should be noted that the STR function of Google® Translate does not add punctuation marks for Mandarin, Hindi, Indonesian, Vietnamese, and Filipino but does for English, Spanish, French, and Russian.

Table 3 Mistakes in texts generated by STR and strategies to address them

STR issue 2 The STR system for Traditional Chinese generated some characters in Simplified Chinese (characters that are currently used in Mainland China) instead of in Traditional Chinese (characters that are currently used in Hong Kong, Macau, and Taiwan). The participants tried to provide the content verbally several times, but the STR system kept generating these characters in Simplified Chinese. Since the native Chinese speaking participants were from Taiwan and were unfamiliar with Simplified Chinese characters, they did not know if the STR-text in Simplified Chinese was correct or not, so they changed the simplified Chinese characters into traditional ones manually (STR workaround 3).

STR issue 3 The STR system misrecognized some specific names and special terminology and rendered them as incorrect words. For example, according to participant ID7 (a native speaker of French), the name of the culture in Burkina Faso is “Dogon,” but the STR system recognized it as “de gang” (i.e., “gang” in English) because the word “Dogon” is a specific name of a culture and does not exist in French whereas “de gang” does. Participant ID7 tried to pronounce this word differently, but the STR system still generated an incorrect word. The participant then changed the misrecognized word into the correct one manually (STR workaround 4).

STR issue 4 The STR system generated errors when the spoken content was multilingual. For example, using Western names is popular in Asia nowadays, and if Asians say “my name is” in their native language and then add a name in English, the first part will be recognized correctly by the STR system, but the second part will not. As shown in Table 3, the STR system rendered the name “Jack” as the Mandarin characters “這個” (i.e., “zhege”), which are pronounced similarly. The participants used two workarounds to deal with this issue. They changed the misrecognized Chinese characters into the English word manually (STR workaround 5), or they spoke the first part of the sentence in Mandarin, switched the STR input language to the other language (i.e., English in this case), and then spoke the second part in that language (STR workaround 6).

STR issue 5 If a speaker paused for a few seconds (e.g., to think about what to say next) and then resumed speaking again, the STR-text for the second part of the speech could override the STR-text for the first part. The participants found this experience unpleasant because it took some time for them to regenerate the overridden STR-texts. To address this issue, the participants spoke without long pauses (STR workaround 7), and they also prepared a script for their speech, so they could speak fluently without any pauses (STR workaround 8).

STR issue 6 The STR system did not recognize some words at all. For example, participant ID8 (a native speaker of Hindi) found that the STR system recognized only the first syllable of the word “नमस्ते!” (i.e., “hello”), and the other part of the word was generated as asterisks, i.e., “न*****”. This word is very common in Hindi, and the participant found it strange that the STR system could not recognize it correctly even after several attempts. Then, the participant changed the word “न*****” to “नमस्ते!” manually (STR workaround 9).

STR issue 7 The STR system was unable to distinguish words with similar pronunciations but different meanings (e.g., it generated “il” instead of “ils” in French or “雲” instead of “勻” in Mandarin). When the STR system generated wrong words instead of the correct ones, the STR-texts did not make any sense. Therefore, the participants had to change “雲” to “勻” or “il” to “ils” manually (STR workaround 10).

With regard to the CAT process, the participants experienced ten issues and employed thirteen workarounds to address them (Table 4). CAT issue 1 Word order in the translated text was not the same as in the original sentence, which made the content of the CAT text incorrect and even incomprehensible in some cases. For example, participant ID21 made a self-introduction in Russian, and one word was relocated from the beginning of the translated text to the end, so the sentence became meaningless (Table 4). On the other hand, the CAT system translated the sentence and kept words in the same order as in a spoken sentence for Mandarin and Vietnamese. Although the words were translated correctly, their order in the translated sentence made it incorrect and meaningless. This is because word order in some languages is not the same as in English. It is suggested that if a sentence is similar in structure to a target language, the CAT system can translate more easily. Sentences in some languages, like Mandarin and Vietnamese, do not have a structure similar to that of English. In these languages, sentences do not have spaces between words, which adds complexity since the CAT system may not even know what constitutes a word. In order to address this issue, most participants tried to make their sentences shorter by breaking one sentence into small parts, and they separated parts using paragraphs (CAT workaround 1). Some students made their sentences shorter and used punctuation marks after each part, i.e., commas, semicolons, hyphens, and full stops (CAT workaround 2). In addition, students came up with the idea of making sentences in their native language in the active form, i.e., subject + verb + the remaining words (CAT workaround 3). They said that this workaround helped them obtain a better accuracy rate.

Table 4 Mistakes in texts translated by CAT and strategies to address them

CAT issue 2 Some words in a sentence were missing after translation (mostly the verb “to be”). To address this issue, the participants added missing words to the translated text manually (CAT workaround 4), or they changed the spoken sentence by using different words to convey the same meaning (CAT workaround 5).

CAT issue 3 CAT added some extra or repeated words in the translated text. For example, when one participant greeted other participants in his/her native language, CAT doubled the word “Hello” in a translated greeting (Table 4). The participants deleted the extra or repeated word manually (CAT workaround 6) to address this issue.

CAT issue 4 The CAT system translated long sentences with more errors. For example, if there were several sentences, and no punctuation marks were inserted, CAT considered them to be one sentence and translated the words using its algorithms. As a result, the outcome was one translated sentence with words from different sentences mixed together and in the wrong order. To address this issue, the participants broke a long sentence into small parts (or shorter sentences) and separated the sentences into paragraphs (CAT workaround 1), or they inserted punctuation marks, i.e., commas, semicolons, hyphens, and full stops, into the sentences (CAT workaround 2).
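CAT workarounds 1 and 2 can be illustrated with a minimal sketch that breaks a punctuated text into short segments and places each segment in its own paragraph before it is passed to a translation system. The sketch is an assumption about how such preprocessing could be automated; in this study the participants performed these steps by hand. The function names and the choice of boundary marks (full stops, semicolons, question and exclamation marks, and their full-width counterparts) are hypothetical.

```python
import re

# A segment is a run of non-boundary characters followed by at most one
# boundary mark. Commas are deliberately not treated as boundaries here.
_SEGMENT = re.compile(r"[^.!?;。！？；]+[.!?;。！？；]?")


def split_into_segments(text: str) -> list[str]:
    """Split punctuated text into short segments, keeping each closing mark."""
    return [seg.strip() for seg in _SEGMENT.findall(text) if seg.strip()]


def as_paragraphs(text: str) -> str:
    """Put every segment in its own paragraph (blank line between segments)."""
    return "\n\n".join(split_into_segments(text))
```

For example, split_into_segments("My name is Anna. I am from Russia; I study here") yields three segments. The sketch only helps when the text already contains punctuation marks; for STR-texts generated without them, the marks still have to be dictated or inserted manually first (STR workarounds 1 and 2).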

CAT issue 5 Some participants do not use the verb “to be” in their native language for daily communication, e.g., when they say who they are. Therefore, when they communicated this type of content, CAT did not include a form of “to be” in the translated text, which made the translation incorrect. To address this issue, the participants had to add a form of “to be” to the spoken sentences in their native language (CAT workaround 7). They said that the translation was correct when they used “to be,” but the spoken sentence became more formal.

CAT issue 6 CAT was not able to translate some specific terms/names correctly. For example, one participant tried to translate the Mongolian word “buuz” into English, and the CAT system translated it as the original word “buuz” instead of as “dumplings.” In such cases, the participants edited the incorrect translation to the correct word manually (CAT workaround 8). Some participants suggested improving translation by adding a “suggesting an edit” function to the system (CAT workaround 9). That is, they suggested that the system could allow clicking on a “Suggest an edit” prompt that would then allow them to type the correct word in the box that opened (Fig. 1).

CAT issue 7 CAT was also unable to translate some words into informal or casual language. For example, it is normal to say, “Good morning!” in Mandarin either as “早” or “早上好.” However, the former is more casual, and it was translated into English by the CAT system as “Early.” The participants improved the translation by suggesting an edit (CAT workaround 9), or they had to use formal language in the spoken sentence in order to get the correct translation, i.e., “Good morning!” (CAT workaround 10).

CAT issue 8 The CAT system translated a sentence with more errors if some complex words were used in a spoken sentence (Table 4). When the participants used simple words in a spoken sentence (CAT workaround 11), the CAT translation accuracy rate was higher compared to when they used more complex words.

CAT issue 9 The CAT system was not able to translate ambiguous words (i.e., words with two or more translations) based on context. For example, the Spanish phrase “Mi muñeca” can be translated either as “My doll” or “My wrist.” CAT translated it as “My doll” whereas the participant needed it to be translated as “My wrist.” The participant tried to translate it several times but got the wrong translation every time and had to revise “My doll” into “My wrist” manually in the translated text (CAT workaround 12).

CAT issue 10 CAT was unable to translate some cultural idioms correctly. For example, a participant tried to use a Russian idiom to talk about local culture, and the CAT system translated it word by word, so the meaning of the idiom was lost (Table 4). The participant had to revise the idiom in the translated text manually (CAT workaround 13).

Some workarounds, e.g., manually typing, changing, or correcting words or characters in the texts produced by the STR or CAT systems, may be far from novel for experienced users. However, for some categories of users, e.g., novice users, these workarounds can be new and useful. Because our participants were from thirteen different countries, not all of them had easy or full access to information technology. In fact, thirty percent of our participants did not have much experience with the use of CAT technology prior to this study; therefore, some of these participants did not think of using these workarounds at all. When we discussed these workarounds with these participants, they found them very useful.

Facilitating cross-cultural understanding

We asked the participants at the beginning of the study to select and experience a culture they were not familiar with. The participants admitted that before our cross-cultural learning activity, they had no prior knowledge regarding the culture and traditions they experienced. After the cross-cultural learning activity, we assessed the cross-cultural knowledge, awareness, and sensitivity of the participants, and the results showed that all participants reached all levels of cross-cultural understanding. That is, the participants learned and understood the foreign traditions they experienced in the learning activity. According to the results, the participants had become familiarized with different cultural characteristics, values, beliefs, and behaviors, and they could recall, interpret, summarize, compare, and explain the traditions they learned about and experienced. Furthermore, the participants were able to appreciate a foreign culture and to reflect on their own experiences of a foreign culture. We provide extracts from the participants’ communication representing key concepts related to foreign traditions (Appendix: Steps 3 and 4). For example, participant ID2 explained the Day of the Dead tradition and pointed out the differences between this tradition and that of Chinese culture; participant ID10 summarized and interpreted the “morning pilaf” tradition; participant ID20 understood what “buuz” was and was able to explain how to prepare it; participant ID21 talked about the red envelope tradition and its origins.

We interviewed the participants to explore their perceptions regarding the cross-cultural learning activity supported by the STR and CAT systems. In the interviews, the participants mentioned that applications of STR and CAT during the learning activity were interesting and useful for their cross-cultural learning. First, the participants were able to introduce their traditions and also reflect on their experiences with foreign traditions in their native language. Second, the participants were able to read what others said in foreign languages without any knowledge of these languages. Therefore, applications of STR and CAT helped the participants communicate with each other in their own native language because the technologies translated the communication content. As a result, the participants understood foreign traditions through the content communicated by other participants in foreign languages during the learning activity. More importantly, they learned traditions from those who represented their own traditions. The participants mentioned that they could communicate with representatives of thirteen nationalities and select any of twenty-one traditions. Information posted by the participants was special and unique. In addition, the participants could ask some specific questions of those who introduced traditions they were interested in and could obtain instant answers. Without STR and CAT, the participants would not have been able to communicate with the other participants due to language barriers. Our findings are in line with the cultural convergence theory (Gudykunst et al. 1988; Kincaid 1979) and the findings of earlier studies. According to this theory, cross-cultural learning takes place if learners from different cultures communicate and exchange learning information with each other. 
We designed a learning activity to implement cross-cultural knowledge, and the participants exchanged information about their culture, traditions, and cross-cultural peculiarities using online tools (Talalakina 2010). The participants discussed their traditions, experienced foreign traditions, and shared their experiences; such communication and experiences thus enabled cross-cultural learning to take place (Kohlberg 1984).

One important point worth mentioning here is that our learning activity design, together with the technological support obtained by combining STR and CAT technologies, facilitated cross-cultural understanding in a way that would have been impossible or ineffective to achieve using other methods of instruction. In contrast to other designs used for cross-cultural learning activities, e.g., reading or watching videos (Kohlberg 1984), our learning activities were focused on communication and exchange of culture-related information among students. Participation in our learning activities led students not only to receive information but also to experience it and reflect on their experiences. Furthermore, students were able to discuss culture-related information with a host from a given culture. Therefore, our learning activities helped promote peer learning practice and increased a sense of authenticity. On the other hand, some earlier studies also focused on interaction among participants to enhance their cross-cultural understanding (Aparicio et al. 2016; Çiftçi 2016; Ertmer et al. 2011; Rogers et al. 2007); however, the language barrier was reported as the critical issue that hindered communication and exchange of culture-related information among students speaking different languages (Osman and Herring 2007; Shadiev and Huang 2016). Employing STR and CAT technologies during our learning activities enabled students from different cultures who spoke different native languages to communicate with each other without language barriers. As a result, their sense of connectivity increased. When we applied STR and CAT technologies together, they complemented each other. STR technology is a valuable tool for education because it synchronously generates text streams from speech input, and students can use these texts to confirm what is being said, to attain a better understanding of learning material, to take notes, and to complete homework (Hwang et al. 2012; Kuo et al. 2012; Kheir and Way 2006; Ranchal et al. 2013; Ryba et al. 2006). CAT extends applications of STR by translating communication content into many different languages, so translated STR-texts can also be helpful to those who cannot speak the language in which communication content is delivered. STR extends applications of CAT by providing a hands-free input method. Shadiev and Huang (2016) argued that STR input is more fun, more convenient, and faster than typing. On mobile devices, which are important everyday tools not just in our lives but also in the educational milieu, STR technology is a particularly helpful way to input cross-cultural communication content because typing on such devices is less convenient.

The second point is how this present study differs from our earlier research, e.g. Shadiev and Huang (2016), Shadiev et al. (2017a), and Shadiev et al. (2017b). Shadiev et al. (2017) explored the effectiveness of STR applications on learning performance, which is a different research direction from that of this study. Shadiev et al. (2016) focused on facilitating cross-cultural understanding with project-based collaborative learning in an online environment but without applications of STR and CAT. Both the study of Shadiev and Huang (2016) and the present study focused on applications of STR and CAT to facilitate cross-cultural understanding. However, the participants in this study communicated in ten different languages, so their communication was multi-lingual, whereas communication among the participants in the previous research was bi-lingual (Shadiev and Huang 2016). In addition, our participants were from thirteen countries and therefore, represented multiple cultures, whereas the participants in Shadiev and Huang (2016) were only from two countries, thus representing only two cultures. Therefore, the current exploration focuses on whether applications of STR and CAT are effective in terms of supporting and facilitating the cross-cultural understanding of participants from several countries interacting about their culture and traditions in multiple languages. In addition, previous research (Shadiev and Huang 2016) explored the accuracy rates of STR and CAT in two languages only, but we focused on ten. Furthermore, we explored issues associated with STR and CAT processes and how they can be solved, which was not the focus in Shadiev and Huang (2016).

Conclusions

Our results suggest that using STR and CAT systems is useful for simple, daily life communication in most languages. However, when considering communication on complex and advanced topics, STR and CAT produced more accurate content only for widely used languages that are similar to English (e.g., Russian, French, and Spanish). At this time, STR and CAT should not be considered a well-rounded professional translation mechanism from voice input since they have limitations that need to be considered. As stated by Scigliano (2010), the translations might still lack correct grammar and punctuation in some cases. Following this notion, Google® mentioned: “Even today’s most sophisticated software, however, does not approach the fluency of a native speaker or possess the skill of a professional translator. Automatic translation is very difficult, as the meaning of words depends on the context in which they are used. While we are working on the problem, it may be some time before anyone can offer human quality translations.” Recent evidence suggests that the accuracy rate of STR and CAT is improving as time goes on (Simonite 2016). In the case of STR, it was about 88% in 2006 (Kheir and Way 2006), about 90% in 2012 (Hwang et al. 2012; Kuo et al. 2012), and more than 90% in 2015 (Shadiev and Huang 2016) and 2016 (i.e., the present study). In the case of CAT, it was about 74–89% in 2015 (Shadiev and Huang 2016) and was more than 90% in 2016 (i.e., the present study). The results of this study showed that translated communication content was accurate enough across the languages that the participants found it comprehensible and useful for communication and for understanding foreign traditions. Nevertheless, there is still room for improvement since the accuracy rate is not yet ideal (Barrachina et al. 2009; Fountain and Fountain 2009; Mellebeek et al. 2005). 
Therefore, our results can be useful for educators and researchers who plan to apply STR and CAT to support multilingual communication during a learning process in the near future. Based on our results, they will be able to make the design of learning activities with STR and CAT support more efficient. Notably, the workarounds reported by our participants can be useful to improve the accuracy rate of STR and CAT. It is also important to note that while the experiences of our participants relate to the usage of STR and CAT systems for ten languages only, many of the issues and possible solutions discussed in this paper are relevant to other languages as well. In addition, we suggest that applications of STR and CAT systems can be useful not only for cross-cultural learning programs but for other programs as well. For example, students may learn such subjects as geography by exchanging related learning information with students in different countries using multilingual communication.

Some limitations of this study must be acknowledged. One limitation relates to the small sample size, the short-term exposure to applications of STR and CAT during the cross-cultural learning activity, and the limited number of languages that participants spoke as their mother tongue. This issue may limit generalization of our results to the wider population. Therefore, researchers need to consider these limitations and address them in future studies. Another limitation relates to potential biases in the validity of the accuracy rate. Participants’ ability to understand English enabled them to compare translations with original texts in their mother tongue. This, in turn, could have affected how they implemented the workarounds discussed earlier for using these technologies more effectively. Continuous usage of the workarounds throughout this study could therefore have led to an increased accuracy rate of texts generated by the technology. In the future, we will focus on scenarios in which the technology generates texts from students’ speech in the native languages of their partners, which the students themselves do not understand. We will investigate whether the accuracy rate will be the same or different. In future studies, researchers may also wish to explore what language characteristics influence the accuracy rates of STR and CAT with regard to recognizing some culture-related words or terms. It is currently not clear whether the accuracy rate of STR and CAT was influenced by the fact that words or terms for some traditions are more easily recognized by STR and CAT than those for others.