1 Introduction

For organizations, diversity provides substantial benefits. Diversity in top management can strongly improve innovativeness and firm performance [15]. Beyond top management, racial diversity in the general workforce is associated with increased sales revenue, more customers, greater market share, and greater relative profits [6]. From these studies, we can infer that cultural difference, that is, diversity in cultural backgrounds, is important and beneficial for organizations and teams. It can lead to a broader range of ideas, some of which may lead to promising innovations.

However, language and cultural differences are significant issues in intercultural collaboration. Even though diversity has many advantages, it also makes communication and collaboration more challenging. Nowadays, support tools and services are available for multilingual communication [10]. Machine translation (MT) helps to offset translation problems when no bilingual person or translator is available, and various MT-embedded chat systems have been developed to support communication across languages. Unfortunately, MT is not perfect and can cause difficulties in communication, including misunderstanding due to mistranslation, conversation breakdown [13], and difficulties in establishing mutual understanding [16]. The absence of common ground can result in misunderstanding, and it might take time for participants to realize the misunderstanding unless they have been deeply engaged in both languages and cultures or know the team members very well. An attractive solution is to extend MT systems so that they can identify likely causes of misunderstanding.

Therefore, the purpose of this paper is to propose a baseline method to automatically detect words that could cause misunderstanding when team members speak different languages. We utilize an image comparison technique to detect these words. Because people with different language backgrounds sometimes see things differently [4], images from existing databases can be linked back to the languages and keywords, which allows us to identify cultural differences. As far as we know, this is the first study to base the automatic detection of cultural differences on image comparison.

The outputs of this method can be used to support intercultural design workshops and MT-mediated communication. Potential cultural differences can then be visualized or the MT system can warn the user that the word used has a high probability of triggering a misunderstanding.

2 Motivation

2.1 Motivating Scenario

Multilingual communication is more difficult than monolingual communication, and differences in cultural backgrounds can yield even more problems if no common ground exists between the parties. Our previous work [14] reported collaboration difficulties among children during a workshop. Briefly, a block of clay was shown to the children and they were asked what it looked like. A Japanese participant said it looked like ‘red bean paste’, but the children from other countries did not understand, since red bean paste in their cultures looks different. The problem was solved by searching for images of Japanese red bean paste in the participant’s language and showing those images to the other children, so that they finally understood each other. In this case, even if the MT output was correct, the problem would still exist because the team members have different backgrounds. Fortunately, the children realized after a moment that they did not fully understand each other, so they could resolve the misunderstanding. Failure to identify a misunderstanding rapidly delays the collaboration significantly and might lead to failure. In many cases, people do not realize that a misunderstanding is occurring during the conversation; they think they understand, but they do not. Our solution is to create a tool that can detect cultural differences and thus possible misunderstandings. Because the prior study found that the children had different mental images due to their diverse cultural backgrounds, we would like to investigate whether the difference or similarity of images can lead us back to cultural differences or give us information about cultural background.

2.2 Cultural Difference and Images

Cho and Ishida [2] referred to the detection of cultural difference as detecting semantic difference, based on the definition of culture by Geertz [5], who defines culture as “a historical transmitted pattern of meanings embodied in symbols”. They said that it can be viewed as ‘a pattern of interpretation’ or ‘a pattern of semantic’. Our method is designed to detect semantic differences between two languages, which can thus be viewed as cultural differences. We expect it to be used as a tool to predict misunderstandings that could happen. The children’s workshop indicated that the world can look different through the language glass. Beyond that event, Deutscher [4] gave many interesting examples of how speakers of different languages see the world differently. One example is that historical evidence from ancient people shows no reference to the color blue, not because they saw fewer colors than we do, but because blue is extremely rare in nature, so there was no need to name it. He also mentioned that some languages have no word for ‘time’, and some languages use the four cardinal directions instead of ‘left’ and ‘right’. It is therefore reasonable to assume that people with different language backgrounds might have different images in mind and different thoughts when presented with the same word or its translation.

Nowadays, there are databases of images with annotations in various languages, and many image search tools are available. If these images and their keywords satisfy users by providing good images for keywords entered in different languages, we could use an image database as a tool to identify cultural differences. For instance, the Japanese word dan-go, which means a Japanese sweet dumpling made of rice flour, can be translated as ‘dumpling’; however, when we look for images of dan-go and ‘dumpling’, the results are totally different. Even though the translation of this word is not wrong, Japanese speakers and English speakers have different mental images when presented with these two words, as shown in Fig. 1. From this example, we can infer that an annotated image database covering different languages can help us identify cultural differences between or among the speakers of those languages.

Fig. 1. Samples from images of dan-go (Japanese sweet; left) and ‘dumpling’ (right)

3 Related Work

3.1 Cultural Difference Detection

Several groups of researchers have studied and identified cultural differences. Most of them collect data using cross-national surveys and then analyze the survey results. For example, Hofstede’s cultural dimensions [7] are a well-known model of culture. Yoshino [17] also conducted a cross-national survey and compared social values, ways of thinking, and other attributes. The results of these studies are interesting, but it seems impossible to apply them to real-time detection.

Cho et al. [3] studied cultural differences in pictogram interpretation and their patterns. They conducted a web survey to understand the differences in pictogram interpretation between the U.S. and Japan. They found that 19 of 120 pictograms were judged to have culturally different interpretations. They also found three patterns of interpretation difference: “two cultures could share the same concept but with different perspectives”, “two cultures partially share the concept”, and “two cultures do not share any concept”.

Yoshino et al. [18] proposed a method for cultural difference detection in Wikipedia. The initial dataset was created by having 18 Japanese students and five Chinese students examine words and phrases with different meanings and usage in Japanese and Chinese. They proposed a flow of judgements based on the initial dataset. They evaluated four judgements as being successful in assessing cultural difference: “The article is not explained from a global viewpoint”, “While the Japanese Wikipedia version of the article mentions Japan, the Chinese version does not.”, “Existence of a defining statement, categorization by country name, and reference to origin or target country.”, and “Neither country name is mentioned in either language version of Wikipedia.”

Existing works on cultural difference identification and detection have been conducted within specific areas of use. Moreover, they inevitably require human judgement. Accordingly, this paper focuses on developing a method that can automatically detect cultural differences across a broad range of domains.

3.2 Support Tools for Heterogeneous and Intercultural Team

In interdisciplinary and cross-cultural collaboration, team members have various backgrounds, and if they are not aware of this, difficulties in collaboration can arise. A group of researchers [11] proposed a support tool to create awareness of bias in design teams. Their goal was to make each member aware of her or his own interpretation of a topic while understanding and respecting the other team members’ viewpoints. Their process includes asking each member to make a bias card by choosing three pictures for a topic, together with text (up to 140 characters), and sharing the cards among team members.

Their proposed tool can help create mutual understanding among the members. However, topics must be selected and the same process must be performed repeatedly. By comparison, our proposal can create mutual understanding without the need for topic assignment and eliminates the need for repetitive checks.

4 Method

To resolve the problem mentioned in Sect. 1, we introduce a method that detects possible misunderstanding in communication across languages by comparing images associated with keywords in different languages. We chose image comparison for several reasons. First, images are closely linked to language and culture, as explained in Sect. 2. Second, as the saying “A picture is worth a thousand words” suggests, images contain information that might not be present in a dictionary. Finally, since image databases and image search engines are available, it is convenient and less time-consuming to use them together with an image comparison technique to automate the detection of possible misunderstanding. Figure 2 shows the overall procedure used to compute the similarity of a word and its translation in another language. If the similarity is low, misunderstanding is more likely when those words are used in cross-cultural and cross-language communication.

Fig. 2. Similarity calculation procedure

First, a word \(W_{L1}\) is selected in language \(L_{1}\). Its translation in language \(L_{2}\) is \(W_{L2}\). We look for a certain number of images for both words in an image database or in the output of an image search engine. The number of images should be sufficiently large, since many images may not represent the keywords well. Ideally, all the images linked to a word should be similar, but in many cases the images are rather diverse. For example, when the keyword is ‘lion’, the image results are mostly pictures of lions, but when the keyword is ‘zoo’, the image results could include pictures of different kinds of animals. If we chose only one image for the word ‘zoo’, we might get an image of a lion, which does not represent ‘zoo’ well and instead has high similarity to the word ‘lion’. To calculate similarity, after randomly selecting images, an image processing technique is used to extract features from all images for word \(W_{L1}\) and word \(W_{L2}\). Feature extraction is a dimensionality reduction process that simplifies the original image data so that it can be processed. The features extracted from the images for the same word are averaged and compared with the averaged features for the other word. It is also possible to compare every pair of extracted features, but this would take too long and consume large amounts of computational resources. Lastly, the similarity between the two averaged features is computed. This similarity usually ranges from 0 to 1. A lower similarity indicates a bigger difference between the images and thus a higher chance of misunderstanding.
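The averaging and comparison steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiment: it presumes that image features have already been extracted as plain lists of numbers, it uses cosine similarity (the measure adopted in Sect. 5), and the function names and toy feature values are invented for the example.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equally sized feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def word_similarity(features_l1, features_l2):
    """Compare a word and its translation via their averaged image features."""
    return cosine_similarity(mean_vector(features_l1), mean_vector(features_l2))


# Toy 3-dimensional "features" (invented values, two images per word).
lion_l1 = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]]
lion_l2 = [[0.85, 0.15, 0.0], [0.9, 0.1, 0.1]]
zoo_l1 = [[0.3, 0.3, 0.4], [0.0, 0.5, 0.5]]

print(word_similarity(lion_l1, lion_l2))  # close to 1: similar image sets
print(word_similarity(lion_l1, zoo_l1))   # clearly lower: dissimilar image sets
```

Averaging first keeps the cost at one comparison per word pair, whereas comparing every pair of images grows with the product of the two image counts.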

5 Experiment

We conducted an experiment to examine the proposed detection method. The data source and tools were selected for simplicity, and they could be adjusted or replaced with other resources or services for better accuracy. The software for this experiment was written in Python.

The system computed the similarity between words in English and Japanese using words from Japanese WordNet [9], a Japanese-English lexical database created from the original English WordNet of Princeton University [12]. In WordNet, lemmas, the dictionary forms of words, are linked to sets of synonyms called synsets. In Japanese WordNet, Japanese lemmas and English lemmas are linked to the same synset. We conducted this experiment on 2,500 randomly selected noun synsets. Around half of the synsets have no Japanese lemma linked to them, so it is impossible to calculate the similarity for those synsets. Following the detection method proposed in Sect. 4, we first randomly selected a synset from the Japanese WordNet database and selected one or two lemmas in each language, depending on availability. If there were more than two lemmas, we ran the same similarity calculation program with ten images for each lemma and chose the two most similar lemmas. Since some synsets have so many lemmas that this calculation becomes infeasible, if there were more than five lemmas for a language in one synset, we randomly selected just five lemmas and calculated the similarity among them to find two lemmas for the next step of the calculation. The reason is that when there are many lemmas, some have a slightly different, broader, or more specific meaning than the others, so we attempt to find the two most similar lemmas, which can strongly represent the synset in each language.
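The lemma selection rule for one language can be sketched as below. This is an illustrative reading of the procedure rather than the original program: it assumes each lemma already has an averaged feature vector (computed from ten images per lemma, as described above), and the function name, the `max_lemmas` parameter, and the toy vectors are hypothetical.

```python
import itertools
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_representative_lemmas(lemma_features, max_lemmas=5, rng=None):
    """Return up to two lemmas of one language to represent a synset.

    lemma_features maps each lemma to its averaged image feature vector.
    With one or two lemmas, all are kept. With more, at most `max_lemmas`
    are sampled at random and the most similar pair among them is returned.
    """
    rng = rng or random.Random()
    lemmas = list(lemma_features)
    if len(lemmas) <= 2:
        return lemmas
    if len(lemmas) > max_lemmas:
        lemmas = rng.sample(lemmas, max_lemmas)
    best_pair = max(
        itertools.combinations(lemmas, 2),
        key=lambda pair: cosine_similarity(
            lemma_features[pair[0]], lemma_features[pair[1]]
        ),
    )
    return list(best_pair)


# Toy averaged features (invented values) for three English lemmas.
features = {
    "oil": [1.0, 0.0, 0.2],
    "vegetable oil": [0.2, 1.0, 0.0],
    "cooking oil": [0.25, 0.95, 0.05],
}
print(pick_representative_lemmas(features))
```

Capping the candidates at five bounds the work at ten pairwise comparisons per language, which is why the random sampling step keeps the calculation feasible for synsets with many lemmas.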

Fig. 3. An example of the feature extraction and image comparison process

For this experiment, we downloaded 30 images for each word if there was one lemma per language and 15 images for each word if there were two lemmas, using the Google Image Download API. The images were resized to 224 \(\times \) 224 pixels for feature extraction. To extract image features, we used one of the most popular tools, VGG16 from Keras. The software transformed the visual information of each image into a vector space. The result of feature extraction was a vector containing 4,096 feature values for each image. An example of feature extraction and comparison is shown in Fig. 3. After that, the averaged features were calculated. For synsets with two lemmas, the averaged features were calculated from both lemmas.

Several similarity measures can be used, but for simplicity, we used cosine similarity, one of the most common methods of comparing vectors. Given that \(\mathbf{A}\) is the feature vector for one word and \(\mathbf{B}\) is the feature vector for its translation, cosine similarity was calculated as follows:

$$\begin{aligned} \text{similarity} = \cos (\theta ) = \frac{\mathbf{A} \cdot \mathbf{B}}{\Vert \mathbf{A}\Vert \Vert \mathbf{B}\Vert } = \frac{\sum _{i=1}^{n} A_i B_i}{\sqrt{\sum _{i=1}^{n} A_i^2} \sqrt{\sum _{i=1}^{n} B_i^2}} \end{aligned} \tag{1}$$
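As a small worked instance of Eq. (1), consider two made-up three-dimensional vectors (chosen for easy arithmetic, not taken from the experiment):

$$\begin{aligned} \mathbf{A} &= (1, 2, 2), \qquad \mathbf{B} = (2, 1, 2) \\ \mathbf{A} \cdot \mathbf{B} &= 1 \cdot 2 + 2 \cdot 1 + 2 \cdot 2 = 8 \\ \Vert \mathbf{A}\Vert &= \sqrt{1 + 4 + 4} = 3, \qquad \Vert \mathbf{B}\Vert = \sqrt{4 + 1 + 4} = 3 \\ \cos (\theta ) &= \frac{8}{3 \cdot 3} \approx 0.889 \end{aligned}$$

When every component of both vectors is non-negative, each product \(A_i B_i\) is non-negative, so the result lies between 0 and 1. This would hold for the image features if they are taken after a rectifying (ReLU) activation, which is an assumption here, since the exact layer used is not stated.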

We iterated the program 2,500 times on Windows servers. The calculation took several days since it involved around 150,000 images and heavy computation loads.
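The figure of 150,000 images is consistent with the download budget described above, assuming that the budget amounts to 30 images per language for each synset and that every download succeeds. With one lemma per language, a synset uses \(2 \times 30\) images; with two lemmas per language, it uses \(2 \times 2 \times 15\) images:

$$\begin{aligned} 2 \times 30 = 2 \times 2 \times 15 = 60 \text{ images per synset}, \qquad 2{,}500 \times 60 = 150{,}000 \text{ images} \end{aligned}$$

This count does not include the ten images per lemma used when selecting representative lemmas.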

6 Result

We calculated the similarity of words in 2,500 synsets. Figure 4 displays the number of synsets grouped by calculated similarity. Almost 60% of the synsets contain lemmas with similarity values between 0.7 and 0.9. We consider synsets with similarity lower than 0.6 to be low-similarity synsets, whose words are possible causes of misunderstanding, misperception, or different interpretation. There are more synsets with high similarity than with low similarity, which is consistent with the expectation that most words do not cause misunderstanding or misperception in practice.
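The grouping and the 0.6 cut-off can be sketched as follows. The bin width of 0.1, the function names, and the similarity values are assumptions made for illustration; the synset identifiers are placeholders rather than real WordNet IDs.

```python
LOW_SIMILARITY_THRESHOLD = 0.6  # cut-off used in this section


def similarity_histogram(similarities, n_bins=10):
    """Count similarity values in equal-width bins over the range [0, 1]."""
    counts = [0] * n_bins
    for value in similarities:
        # Clamp so that 1.0 falls into the last bin and negatives into the first.
        index = max(0, min(int(value * n_bins), n_bins - 1))
        counts[index] += 1
    return counts


def low_similarity_synsets(synset_similarity, threshold=LOW_SIMILARITY_THRESHOLD):
    """Return synset ids below the threshold, least similar first."""
    flagged = [sid for sid, value in synset_similarity.items() if value < threshold]
    return sorted(flagged, key=synset_similarity.get)


# Invented similarity values keyed by placeholder synset ids.
results = {"synset-a": 0.95, "synset-b": 0.82, "synset-c": 0.74,
           "synset-d": 0.55, "synset-e": 0.31}

print(similarity_histogram(results.values()))
print(low_similarity_synsets(results))
```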

Fig. 4. Histogram of word similarity in synsets

Figure 5 displays typical examples of synsets with similarity values lower and higher than 0.6. The similarity values are rounded to four decimal places. Each synset used for the calculation contained one or two lemmas in English and one or two lemmas in Japanese.

Fig. 5. Examples of synsets with high and low similarity from the results

The images under each lemma were chosen to illustrate the meaning of the lemma and to represent the overall image results to the reader. Some images in Fig. 5 were not used in the actual calculation but are similar to those that were. These images are displayed for the reader’s understanding, because most images downloaded in our experiment are not permitted to be publicly reused.

From Fig. 5, it is clear that the lemmas of synsets with high similarity yield more similar images. For synset 04317175, the word ‘stethoscope’ and its Japanese counterpart (stethoscope) refer to the same object and so give the same interpretation to the reader. Synset 13490343 also has high similarity, but slightly lower than that of the ‘stethoscope’ synset. All the words in this synset, namely ‘growth’ and the Japanese lemmas glossed as (growing) and (progress, improvement), yield similar images and have similar meanings. However, since these words are abstract nouns, the images are slightly different. For example, the images for all three words include small trees, in addition to graphs with upward arrows and figures representing evolution and development. Synset 07673397 contains the words ‘oil’, ‘vegetable oil’, and a Japanese lemma (vegetable oil); it has slightly lower similarity than the first two synsets because the word ‘oil’ has a broader meaning than the other two words, so it yielded images of drilling rigs and oil in containers.

To detect cultural differences and misunderstandings due to different languages and backgrounds, we focus on synsets with low similarity. Synset 10913871 has remarkably low similarity. The low similarity between ‘Cowper’, ‘William Cowper’ and the Japanese lemma (Mini Cooper) could stem from the nature of the language. Because Japanese has relatively few phonemes, many words derived from foreign languages can be written in Japanese characters but are pronounced differently from the original words. Both ‘Cowper’ and ‘William Cowper’ are names of a person, but the Japanese lemma is pronounced ‘Kuu-Paa’. It is a homophone of ‘cooper’ in Japanese, which makes most people think of the Mini Cooper, i.e. the car made by the automobile marque Mini. When an English speaker uses the word ‘Cowper’ without any further explanation, a Japanese speaker might mistakenly think that the speaker is talking about cars.

Synset 00269674 involves two words: ‘makeover’ and a Japanese lemma (reform). ‘Makeover’ is defined as “the process of improving the appearance of a person or a place, or of changing the impression that something gives” [8]. However, when it is used in English, people tend to think of a personal makeover, usually involving make-up and a change in appearance. Image results for this word mostly include pictures of women before and after a beauty makeover. In contrast, the Japanese lemma is defined in a Japanese dictionary as: (1) revision, improvement; (2) to remake, to resize or redesign clothes, to renovate building(s). The main image it evokes for a Japanese speaker is usually related to building renovation, and the images of this word are usually images of a renovated room.

Synset 07491476 links ‘amusement’ to two Japanese lemmas, glossed as (recreation, playfulness) and (recreation). The Oxford Advanced Learner’s Dictionary defines ‘amusement’ as “the feeling that you have when something is funny or amusing, or it entertains you” and “a game, an activity, etc. that provides entertainment and pleasure”. The images of this word show mostly pictures of amusement parks and rides. The two Japanese lemmas are rarely used in daily life. They have similar meaning to ‘amusement’ in terms of fun and pleasure, but their use is different and more complex. One of them is not included in standard dictionaries and is used by only some groups of people. However, its adjective form exists in dictionaries, describing the feeling of fun and enjoyment. The kanji, a Chinese character used in the Japanese language, means recreation and pleasure. Some of the Japanese image results are related to ink painting magazines and painting exhibitions, since the word is used in the names of magazines and exhibitions. The image results are very different from each other, since this word is ambiguous and intangible. A few of the Japanese image results are related to handmade objects and craftwork due to the usage of the word. Even though the meaning of this word is mainly about fun and recreation, it can be explained as “To create enjoyment, give people pleasure”. Unlike the other words in this synset, this word involves personal pleasure. It is also often used conditionally, for example “it will be fun, if I succeed”.

6.1 Pattern of Low Similarity Synsets

Our experimental results suggest several causes of low similarity. Synsets with low similarity could lead to misunderstanding and different interpretations between English and Japanese speakers. When we look at the synsets with similarity lower than 0.6, we find two interesting causes related to cultural difference.

First, a word and its translation have the same meaning but represent different images in the languages. Because of the different backgrounds, language and culture, people look at a word and its translation differently.

Second, words in one language can have a more specific or broader meaning than in the other. The cases in this category include:

  1. Broader meaning in one language: when the word in one language has a broader meaning, it might yield a greater variety of images.

  2. Several homonyms in one language: when one word has several meanings, it could confuse people, even native speakers. Some of the meanings might include slang or a negative interpretation compared to its translation.

  3. Specific noun in one language, i.e. the name of a famous person, brand, product, or company: when a word is used as a specific noun, such as a product name, in one area, many people will think of the product instead of the original meaning of that word.

7 Discussion

7.1 Visualization of the Cultural Difference

Existing work [11] raises the possibility of sharing images and short texts made by each team member in design workshops to promote mutual understanding and awareness of one’s own bias. Here, we would like to present an alternative tool for the same situation. The key advantage of our tool is that the information to be shared can be created automatically from the output of the proposed cultural difference detection method. Visualization can help team members realize the difference and understand each other better; Fig. 6 is an example. This graph can be shown to the team members together with the images already downloaded for the calculation. When the word ‘makeover’ is mentioned, we grow the tree graph from the root node for a few levels and investigate the neighboring nodes using data from Japanese WordNet. In this graph, we use 0.6 and 0.75 as thresholds to determine node color. We use green for similarity values of 0.75 and above (high similarity) and yellow for values from 0.6 up to 0.75 (medium similarity). Yellow nodes might contain different interpretations of words or concepts, where the difference is not obvious enough to cause misunderstanding but caution is advised. Red nodes are for words with similarity under 0.6; they have a high possibility of causing misunderstanding. The hierarchy of this tree is based on the relationships in WordNet. An upper node is a word with a broader meaning than the node below it, i.e. a hypernym of the lower node. A lower node is a word with a more specific meaning than the node above it, i.e. a hyponym of the upper node. In a workshop, such as a brainstorming workshop for an advertising idea, when the English speaker mentions ‘makeover’, the Japanese member might look at the translation, think of it as ‘reform’, and assume that the discussion is about ‘reform’, not ‘makeover’. The automated system can warn the participants and display this tree to them. Since this tree seems to fit the images from the Japanese speaker’s side, the Japanese speaker can use it to confirm the intended meaning, and the English speaker can see what the Japanese speaker associates with this word, namely reconstruction, fixing, and improvement, given that the upper nodes have fair or high similarity.

Fig. 6. An example of visualization of cultural difference
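The color rule described in this section can be sketched as a small function applied over a word tree. This is a sketch only, since the visualization tool has not yet been implemented: the nested-dictionary tree layout, the function names, and the similarity values are invented for illustration, and the example hierarchy is not copied from WordNet.

```python
def node_color(similarity, low=0.6, high=0.75):
    """Map a similarity value to a node color using the two thresholds."""
    if similarity >= high:
        return "green"   # high similarity
    if similarity >= low:
        return "yellow"  # medium similarity: caution is advised
    return "red"         # low similarity: misunderstanding is likely


def color_tree(node):
    """Return a copy of a word tree in which each node carries its color.

    A node is a dict with the keys "word", "similarity", and optionally
    "children" (a list of more specific nodes).
    """
    return {
        "word": node["word"],
        "color": node_color(node["similarity"]),
        "children": [color_tree(child) for child in node.get("children", [])],
    }


# Hypothetical fragment of a tree around 'makeover' (invented values).
tree = {
    "word": "improvement", "similarity": 0.81,
    "children": [
        {"word": "reconstruction", "similarity": 0.68,
         "children": [{"word": "makeover", "similarity": 0.45}]},
    ],
}
print(color_tree(tree))
```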

7.2 Application of the Result

The results from our detection method can be used in MT-mediated communication. When a word used in the translation process, e.g. in a chat system, is identified as a low-similarity word in our database, we can warn the team members and offer them the word tree and related images. Even for intercultural workshops or intercultural design workshops that are conducted in one language, it is still useful to warn the team about possible misunderstandings when there is a non-native speaker in the team. We can also apply this warning mechanism in computer-mediated communication between native and non-native speakers. The results can be used as a reference for culture and language studies as well. They can be adapted and used as an additional tool in existing language learning research, for example to help identify false friends, i.e. words in two languages that look or sound similar but differ significantly in meaning [1].
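A warning step of this kind could look like the sketch below. It is not part of the implemented system: the function name, the dictionary standing in for the similarity database, and its values are hypothetical, and the simple regular-expression tokenizer only suits English input (Japanese text would need a word segmenter).

```python
import re


def find_risky_words(message, similarity_by_word, threshold=0.6):
    """Return (word, similarity) pairs for words below the threshold.

    Words are reported once each, in order of first appearance. Words that
    are missing from the similarity database are ignored.
    """
    risky = {}
    for word in re.findall(r"[a-z']+", message.lower()):
        similarity = similarity_by_word.get(word)
        if similarity is not None and similarity < threshold:
            risky.setdefault(word, similarity)
    return list(risky.items())


# Hypothetical database entries (invented similarity values).
database = {"makeover": 0.45, "stethoscope": 0.93}

for word, similarity in find_risky_words(
    "We need a makeover for the stethoscope ad.", database
):
    print(f"Warning: '{word}' may be understood differently "
          f"(similarity {similarity:.2f})")
```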

7.3 Limitation

Besides cultural differences, low similarity is sometimes the consequence of imperfect translation in the language resource. Low similarity can also be caused by technical errors, for instance when only a few images are available for download. We note that our method has difficulty with abstract nouns, since they are not linked to unambiguous images.

7.4 Future Direction

Our current method requires a combination of different tools, so its accuracy depends on the effectiveness of those tools, including the image comparison tool, and on the accuracy of the lexical database or the MT system. Since this paper does not focus on the accuracy of the similarity values, accuracy could be improved by changing the tools and methods used, for example by using different methods to select words in the same synset, different image comparison methods, or different similarity measures. The thresholds for low and high similarity can also be adjusted and studied in the future for more accurate prediction. Future work could combine dictionaries and other language resources for better translation of each lemma, which could also allow us to identify abstract nouns and treat those lemmas differently.

We plan to include a confidence measure of the calculation by looking at the variety of images from one lemma. If one lemma yields wildly different images, the quality of the calculation in the next step might be degraded.
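One possible form of such a confidence measure, offered only as an assumption since the measure has not been defined yet, is the average cosine similarity between each image’s features and the lemma’s averaged features. The function names and toy vectors below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def image_consistency(features):
    """Mean similarity between each image and the lemma's averaged features.

    Values near 1 mean the images of one lemma resemble each other; lower
    values mean the images vary widely, so the average represents them poorly.
    """
    count = len(features)
    centroid = [sum(column) / count for column in zip(*features)]
    return sum(cosine_similarity(f, centroid) for f in features) / count


# Toy features (invented): a tight image set and a widely varying one.
tight = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.85, 0.1, 0.05]]
varied = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]]

print(image_consistency(tight))   # close to 1
print(image_consistency(varied))  # noticeably lower
```

A lemma whose score falls below some chosen cut-off could then be flagged so that its similarity result is treated with caution.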

We also plan to implement and evaluate the visualization tool. Comparing word trees that include similarity data is also of interest: we might be able to see the relationship between low-similarity nodes from the similarity of the roots.

8 Conclusion

The main contribution of this paper is a novel method to automatically detect cultural differences and possible misunderstanding or misinterpretation of words. We aim to use it for MT-mediated communication and intercultural workshops. We believe that this is the first work to apply image comparison to the identification of cultural differences. Since existing works [3, 7, 17, 18] mostly studied and identified cultural differences using surveys, they can only study cultural differences across specific culture pairs. Surveys are also expensive and slow. The proposed method achieves our goal of automatically detecting possible cultural differences, misunderstandings, and misinterpretations in intercultural collaboration. As such, it can be used in broader areas of study and does not need human effort.

We investigated our method by applying it to Japanese WordNet. We looked for dissimilarities between sets of images that represent words in English and Japanese using an existing image database and automated image comparison. We conducted an experiment on 2,500 synsets and presented some of our results in this paper. Low similarity can be due to different images being linked to a word and its translation, resulting from different backgrounds. The second cause is the unequal meaning assigned to words in each language, for example when a word in one language has several meanings or when it is used as a specific noun, such as the name of a commercial brand, company, or person.