
1 Introduction

The identification of deceptive behavior is a task that has gained increasing interest from researchers in computational linguistics. This is mainly motivated by the rapid growth of deception in written sources, and in particular in Web content, including product reviews, online dating profiles, and social network posts [10].

To date, most of the work presented on deception detection has focused on the identification of deceit clues within a specific language, with English being the most commonly studied language. However, a large portion of written communication (e.g., e-mail, chats, forums, blogs, social networks) occurs not only among speakers of English, but also among speakers of other languages and from other cultural backgrounds, which raises important questions regarding the applicability of existing deception tools. Issues such as language, beliefs, and moral values may influence the way people deceive, and may therefore have implications for the construction of tools for deception detection.

In this paper, we explore within- and across-culture deception detection for four different cultures, namely the United States, India, Romania, and Mexico. Through several experiments, we compare the performance of classifiers that are built separately for each culture with that of classifiers that are applied across cultures, using unigrams and word categories that can act as a cross-lingual bridge. Our results show that we can achieve accuracies in the range of 60–70 %, and that we can leverage resources available in one language to build deception tools for another language.

1.1 Related Work

Research to date on automatic deceit detection has explored a wide range of applications such as the identification of spam in e-mail communication, the detection of deceitful opinions in review websites, and the identification of deceptive behavior in computer-mediated communication including chats, blogs, forums and online dating sites [10, 11, 15, 16, 19].

Techniques used for deception detection frequently include word-based stylometric analysis. Linguistic clues such as n-grams, word and sentence counts, word diversity, and self-references are also commonly used to identify deception markers. An important resource that has been used to represent semantic information for the deception task is the Linguistic Inquiry and Word Count (LIWC) dictionary [12]. LIWC provides words grouped into semantic categories relevant to psychological processes, which have been used successfully to perform linguistic profiling of truth-tellers and liars [9, 14, 20]. In addition, features derived from syntactic context-free grammar parse trees and from part-of-speech information have also been found to aid deceit detection [3, 17].

While most of the studies have focused on English, there is a growing interest in studying deception in other languages. For instance, Fornaciari and Poesio [5] identified deception in Italian by analyzing court cases. The authors explored several strategies for identifying deceptive clues, such as utterance length, LIWC features, lemmas, and part-of-speech patterns. Almela et al. [1] studied deception detection in Spanish text by using SVM classifiers and linguistic categories obtained from the Spanish version of the LIWC dictionary. A study on Chinese deception is presented in [18], where the authors built a deceptive dataset using Internet news and performed machine learning experiments using a bag-of-words representation to train a classifier able to discriminate between deceptive and truthful cases.

It is also worth mentioning the work conducted to analyze cross-cultural differences. Lewis and George [6] presented a study of deception in social networking sites and face-to-face communication, in which the authors compared the deceptive behavior of Korean and American participants, with a subsequent study also considering the differences between Spanish and American participants [7].

Unlike our work, both studies analyze cultural differences using a statistical approach, in which data was collected by interviewing participants and principal component analysis was applied to identify cultural aspects related to deception, such as liars' choice of topic and gender differences. In this study, we rely on machine learning techniques to build deception classifiers from written statements provided by truth-tellers and deceivers.

In general, related research findings suggest a strong relation between deception and cultural aspects, which are worth exploring with automatic methods.

2 Datasets

We collect four datasets for four different cultures: United States (English-US), India (English-India), Romania, and Mexico (Spanish-Mexico). Following [8], we collect short deceptive and truthful essays for three topics: opinions on Abortion, opinions on Death Penalty, and feelings about a Best Friend.

To collect both truthful and deceptive statements for the Abortion and Death Penalty topics, we first instructed the participants to imagine they were taking part in a debate and asked them to provide their truthful opinion on the topic. Second, we asked them to imagine a debate where they had to argue for a view opposite to what they truly believed, thus generating false statements about the topic being discussed. In both cases, we asked them to provide plausible details and to be as convincing as possible. For the Best Friend topic, we collected the deceptive and truthful essays by first asking participants to provide a description of their best friend, and then asking them to describe someone they disliked as though he/she were their best friend.

In order to collect the English-US and English-India datasets, we used Amazon Mechanical Turk with a location restriction, so that all the contributors are from the country of interest (US and India). We collected 100 deceptive and 100 truthful statements for each of the three topics. To avoid spam, each contribution was manually verified by one of the authors of this paper.

For Spanish-Mexico, although we initially attempted to collect data using Mechanical Turk as well, we were not able to obtain enough contributions. We therefore created a separate web interface to collect the data, and recruited participants through contacts of the paper’s authors. The overall process was significantly more time consuming than for the other two cultures, and resulted in fewer contributions, as shown in Table 1.

For the Romanian dataset, we also used a separate web interface, and participants were recruited through contacts of one of the paper’s authors. Since participants were allowed to end their participation at any time, the final process resulted in a different number of contributions for each topic, as shown in Table 1.

Table 1 Dataset distributions for four deception datasets

For all four cultures, the participants first provided their truthful responses, followed by the deceptive ones. Also, all contributors addressed the topics in the same order: Abortion, Best Friend, and Death Penalty.

Table 2 Sample statements from four deception datasets
Table 3 Word count distribution between deceptive (D) and truthful (T) statements and average number of words per statement for four deception datasets

Table 2 shows sample statements from each dataset. Also, word count distributions for the four datasets are shown in Table 3. Interestingly, for all four cultures, the average number of words for the deceptive statements is significantly smaller than for the truthful statements, which may be explained by the added difficulty of the deceptive process, and is in line with previous observations about the cues of deception [2].

3 Experiments

Through our experiments, we seek answers to the following questions. First, what is the performance for deception classifiers built for different cultures? Second, can we use information drawn from one culture to build a deception classifier for another culture? Finally, what are the psycholinguistic classes most strongly associated with deception/truth, and are there commonalities or differences among languages?

In all our experiments, we formulate the deception detection task in a machine-learning framework, where we use an SVM classifier to discriminate between deceptive and truthful statements.

Table 4 Within-culture classification, using LIWC word classes and unigrams

3.1 What is the Performance for Deception Classifiers Built for Different Cultures?

We represent the deceptive and truthful statements using two different sets of features. First, we use unigrams obtained from the statements corresponding to each topic and each culture. To select the unigrams, we use a frequency threshold of 10: all unigrams occurring fewer than 10 times are dropped. We choose this threshold as it leads to the best performance in the reported experiments. Also, since previous research suggests that stopwords can contain linguistic clues to deception, no stopword removal is performed.
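To make this setup concrete, the sketch below illustrates the unigram representation followed by a linear SVM. scikit-learn is assumed here purely for illustration, since the paper does not name the toolkit behind its SVM, and the data loader is a hypothetical placeholder.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def build_unigram_classifier(min_freq=10):
    # Unigram bag-of-words followed by a linear SVM. Note that min_df
    # filters by document frequency, used here only as a rough stand-in
    # for the corpus-frequency threshold of 10 described above.
    # Stopwords are deliberately kept (stop_words=None), since they may
    # carry deception clues.
    return make_pipeline(
        CountVectorizer(ngram_range=(1, 1), min_df=min_freq, stop_words=None),
        LinearSVC(),
    )

# Hypothetical usage on one culture/topic slice; load_statements() is not
# part of the paper and must be provided by the reader.
# texts, labels = load_statements(culture="English-US", topic="Abortion")
# clf = build_unigram_classifier().fit(texts, labels)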

Experiments are performed using ten-fold cross-validation on each dataset. Using the same unigram features, we also perform cross-topic classification, so that we can better understand the topic dependence of the task. For this, we train the SVM classifier on data from two topics merged together (e.g., Abortion + Best Friend) and test on the third topic (e.g., Death Penalty). The results for both within- and cross-topic evaluations are shown in the last two columns of Table 4.
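The evaluation protocol can be sketched as follows, reusing build_unigram_classifier() from the sketch above; the load_statements() helper remains a hypothetical placeholder for the datasets described in Sect. 2.

from sklearn.model_selection import cross_val_score

TOPICS = ["Abortion", "Best Friend", "Death Penalty"]

def within_topic_accuracy(load_statements, culture, topic):
    # Ten-fold cross-validation on a single culture/topic dataset.
    texts, labels = load_statements(culture=culture, topic=topic)
    scores = cross_val_score(build_unigram_classifier(), texts, labels, cv=10)
    return scores.mean()

def cross_topic_accuracy(load_statements, culture, held_out_topic):
    # Train on the two remaining topics merged together, test on the
    # held-out topic (e.g., train on Abortion + Best Friend, test on
    # Death Penalty).
    train_texts, train_labels = [], []
    for topic in TOPICS:
        if topic != held_out_topic:
            texts, labels = load_statements(culture=culture, topic=topic)
            train_texts.extend(texts)
            train_labels.extend(labels)
    test_texts, test_labels = load_statements(culture=culture, topic=held_out_topic)
    clf = build_unigram_classifier().fit(train_texts, train_labels)
    return clf.score(test_texts, test_labels)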

Second, we use the LIWC lexicon to extract features corresponding to several word classes. LIWC was developed as a resource for psycholinguistic analysis [12]. The 2001 version of LIWC includes about 2,200 words and word stems grouped into about 70 classes relevant to psychological processes (e.g., emotion, cognition), which in turn are grouped into four broad categories, namely: linguistic processes, psychological processes, relativity, and personal concerns. We also use a Spanish version of the LIWC lexicon [13], as well as a Romanian version [4]. A feature is generated for each of the 70 word classes by counting the total frequency of the words belonging to that class. The resulting features are then grouped into four different sets, each containing the subset of LIWC classes corresponding to one of the four broad categories. We perform separate evaluations using each of the feature sets derived from the broad LIWC categories, as well as using all the categories together. The classification accuracy results obtained with the SVM classifier are shown in Table 4.
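As an illustration of the LIWC-based representation, the following sketch counts, for each word class, the words of a statement that belong to that class. The simple dictionary format (class name mapped to a set of words and prefix stems) is an assumption made for illustration and does not reproduce the actual LIWC file format.

from collections import Counter

def liwc_features(tokens, lexicon):
    # tokens: lowercased words of one statement.
    # lexicon: dict mapping each LIWC class name to a set of words and
    # prefix stems (stems marked with a trailing '*', e.g. "happi*").
    counts = Counter()
    for token in tokens:
        for cls, entries in lexicon.items():
            for entry in entries:
                if entry.endswith("*"):
                    if token.startswith(entry[:-1]):
                        counts[cls] += 1
                        break
                elif token == entry:
                    counts[cls] += 1
                    break
    # Fixed-length feature vector, one dimension per LIWC class.
    return [counts[cls] for cls in sorted(lexicon)]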

Overall, the results show that it is possible to discriminate between deceptive and truthful cases using machine learning classifiers, with performance superior to a random baseline, which is 50 % for all datasets given the even class distribution. Considering the unigram results, among the four cultures the deception discrimination works best for the English-US dataset, and this is also the dataset that benefits most from the larger amount of training data brought by the cross-topic experiments. In general, the cross-topic evaluations suggest that the task is not highly topic dependent, and that using deception data from different topics can lead to results comparable to those obtained with within-topic data. An exception to this trend is the Romanian dataset, where the cross-topic experiments lead to significantly lower results than the within-topic evaluations, which may be partly explained by the high lexicalization of Romanian. Interestingly, among the three topics considered, the Best Friend topic consistently has the highest within-topic performance, which may be explained by the more personal nature of the topic, leading to clues that are useful for the detection of deception (e.g., references to the self or to personal relationships).

Regarding the LIWC classifiers, the results show that the use of the LIWC classes can lead to performance that is generally better than that obtained with the unigram classifiers. The explicit categorization of words into psycholinguistic classes seems to be particularly useful for the languages where the words by themselves did not lead to very good classification accuracies. Among the four broad LIWC categories, the linguistic category appears to lead to the best performance compared to the other categories. It is notable that for Spanish, the linguistic category by itself provides results that are better than when all the LIWC classes are used, which may be due to the fact that Spanish has more explicit lexicalization of clues that may be relevant to deception (e.g., verb tenses, formality).

Table 5 Classification accuracy per class for Linguistic category classifier

Concerning the specific accuracy for the deception class, we analyzed the detailed per-class results obtained by the best classifier from Table 4, i.e., the one built using only the Linguistic category from LIWC. Table 5 shows the precision, recall, and F-measure obtained by this classifier for the deceptive and truthful classes in each culture. From this table we can observe that for Spanish, as well as for both English cultures, the identification of deceptive instances is slightly easier than the identification of truthful statements. For Romanian, in contrast, the truthful instances are more accurately predicted than the deceptive ones. We further analyze differences in word usage between truth-tellers and liars in each culture in Sect. 3.3.

3.2 Can We Use Information Drawn from One Culture to Build a Deception Classifier in Another Culture?

In the next set of experiments, we explore the detection of deception using training data originating from a different culture. As with the within-culture experiments, we use unigrams and LIWC features. For consistency across the experiments, and given that the Spanish and Romanian datasets differ in size from the two English datasets, we always train on the English-US dataset.

To enable the unigram-based experiments, we translate the two English datasets into either Spanish or Romanian using the Bing API for automatic translation. As before, we extract and keep only the unigrams with frequency greater than or equal to 10. The results obtained in these cross-cultural experiments are shown in the last column of Table 6.
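The translation-based bridge can be sketched as follows. The translate() function is only a placeholder, since the actual Bing API calls used in the paper are not reproduced here, and build_unigram_classifier() refers to the sketch in Sect. 3.1.

def translate(text, target_lang):
    # Placeholder for a machine translation call (the paper used the
    # Bing API); any translation service could be substituted here.
    raise NotImplementedError

def cross_lingual_unigram_accuracy(source_texts, source_labels,
                                   target_texts, target_labels, target_lang):
    # Train on machine-translated source-language data, then test on the
    # native target-language data.
    translated = [translate(t, target_lang) for t in source_texts]
    clf = build_unigram_classifier().fit(translated, source_labels)
    return clf.score(target_texts, target_labels)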

Table 6 Cross-cultural experiments using LIWC categories and unigrams

In a second set of experiments, we use the LIWC word classes as a bridge between languages. First, each deceptive or truthful statement is represented using features based on the LIWC word classes grouped into the four broad categories: linguistic processes, psychological processes, relativity, and personal concerns. Next, since the same word classes are used in all three LIWC lexicons, this LIWC-based representation is independent of language, and can therefore be used to perform cross-cultural experiments. Table 6 shows the results obtained with each of the four broad LIWC categories, as well as with all the LIWC word classes.
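Since the three lexicons share the same class inventory, the LIWC feature space is language independent, which is what makes this bridge possible. A minimal sketch, reusing the hypothetical liwc_features() helper from Sect. 3.1, could look as follows.

from sklearn.svm import LinearSVC

def cross_cultural_liwc_accuracy(source_texts, source_labels, source_lexicon,
                                 target_texts, target_labels, target_lexicon):
    # The two lexicons must define the same class names, so that the
    # sorted class order (and hence the feature dimensions) lines up.
    X_train = [liwc_features(t.lower().split(), source_lexicon) for t in source_texts]
    X_test = [liwc_features(t.lower().split(), target_lexicon) for t in target_texts]
    clf = LinearSVC().fit(X_train, source_labels)
    return clf.score(X_test, target_labels)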

Note that we also attempted to combine unigrams and LIWC features. However, in most cases, no improvements were noticed with respect to the use of unigrams or LIWC features alone.

These cross-cultural evaluations lead to several findings. First, we can use data from one culture to build deception classifiers for another culture, with performance figures better than the random baseline, but weaker than the results obtained with within-culture data. An important finding is that LIWC can be effectively used as a bridge for cross-cultural classification, with results comparable to those obtained with unigrams, which suggests that such specialized lexicons can be used for cross-cultural or cross-lingual classification. Moreover, using only the linguistic category from LIWC brings further gains, with absolute improvements of 2–4 % over the use of unigrams. This is an encouraging result, as it implies that a semantic bridge such as LIWC can be effectively used to classify deception data in other languages, instead of the more costly and time-consuming unigram method based on translations.

3.3 What are the Psycholinguistic Classes Most Strongly Associated with Deception/Truth?

The final question we address concerns the LIWC classes that are dominant in deceptive and truthful text for different cultures. We use the method presented in [8], which relies on a metric that measures the saliency of LIWC classes in deceptive versus truthful data. Following their strategy, we first create a corpus of deceptive and truthful text using a mix of all the topics in each culture. We then calculate the dominance of each LIWC class, and rank the classes in decreasing order of their dominance score. Table 7 shows the most salient classes for each culture, along with sample words.
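The exact dominance metric is defined in [8] and is not restated here; as one plausible reading, the sketch below ranks classes by the ratio of their coverage in the deceptive corpus to their coverage in the truthful corpus (ignoring word stems for brevity), purely to make the ranking step concrete.

def class_coverage(tokens, class_words):
    # Fraction of corpus tokens that belong to the given LIWC class.
    hits = sum(1 for t in tokens if t in class_words)
    return hits / max(len(tokens), 1)

def dominance_ranking(deceptive_tokens, truthful_tokens, lexicon, eps=1e-6):
    scores = {}
    for cls, words in lexicon.items():
        cov_d = class_coverage(deceptive_tokens, words)
        cov_t = class_coverage(truthful_tokens, words)
        scores[cls] = (cov_d + eps) / (cov_t + eps)
    # High scores: classes dominant in deceptive text; low scores:
    # classes dominant in truthful text.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)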

Table 7 Top ranked LIWC classes for each culture, along with sample words

This analysis reveals some interesting patterns. Several classes are shared among the cultures: the deceivers in all cultures make use of negation, negative emotions, and references to others, while truth-tellers use more optimism and friendship words, as well as references to themselves. An interesting finding is the use of the Religion and Family classes by Romanian truth-tellers, which seems to be closely tied to cultural background, as religion is an important cultural component. In contrast with the other cultures, Romanian speakers use more positive feeling (Posfeel) and Optimism-related words when expressing deceptive statements.

These results are in line with previous research, which showed that LIWC word classes exhibit similar trends when distinguishing between deceptive and non-deceptive text [9]. Moreover, there are also word classes that appear only in some of the cultures; for example, time classes (Past, Future) appear in English-India and Spanish-Mexico, but not in English-US, which in turn contains other classes such as Insight and Metaph.

4 Deception Detection Using Short Sentences

One limitation of the experiments presented in the previous section is that they all rely on domain-specific datasets, which may bias the deception detection. To address this potential concern, as a final experiment, we explore the detection of deception in a less-constrained environment, where the topic of the deceptive statements is not set a priori.

We collect and experiment with two datasets consisting of short open-domain truths and lies, contributed by speakers of English-US and Romanian.

For English, we set up a Mechanical Turk task where we asked workers to provide seven lies and seven truths, each consisting of one sentence, on topics of their choice. For Romanian, we designed a web interface to collect the data, and recruited participants through contacts of the paper’s authors. Romanian speakers were asked to provide five truths and five lies, again on topics of their choice. In both cases, the participants were asked to provide plausible lies and to avoid statements that defy common sense, such as “A dog can fly.” In addition to the one-sentence truths and lies, we also collected demographic data for the contributors, such as gender, age, and education level. The class distribution for these datasets is shown in Table 8.

Table 8 Class distribution for the Romanian and English-US open-domain deception datasets

Similar to the domain-specific experiments, for these open-domain datasets we run both within- and across-culture experiments. Table 9 shows the results of the deception classification experiments run separately on the English and Romanian datasets, whereas Table 10 shows the results obtained in the cross-cultural experiments.

Table 9 Within-culture classification, using LIWC word classes and unigrams
Table 10 Cross-cultural experiments using LIWC categories and unigrams

Not surprisingly, the accuracy of the deception detection method on the open-domain data is below the accuracy obtained on the domain-specific datasets. In addition to the domain-specific versus open-domain difference, this drop in accuracy can also be attributed to the fact that the open-domain data consists of short sentences rather than full paragraphs, which may further explain why the LIWC-derived features do not lead to noticeable improvements over the use of unigrams.

A similar trend is observed in the cross-culture experiments reported in Table 10, where unigrams outperform the LIWC classes. It is important to note, however, that the use of the linguistic classes may still be preferable, as it avoids the costly and more time-consuming translations at the price of a rather small accuracy drop of only 2.79 % with respect to the unigram results.

To further analyze the nature of the lying process in the open-domain datasets, we obtained the psycholinguistic classes most strongly associated with deceptive and truthful sentences. The results are presented in Table 11. Interestingly, the analysis confirms our findings from the domain-specific experiments, where shared lying patterns among cultures include the use of negation, negative emotions, and references to others. Furthermore, the patterns associated with truth-tellers are also shared among cultures, with the most salient classes being family, positive emotions, and positive feeling.

Table 11 Top ranked LIWC classes for English and Romanian, along with sample words

At the same time, we can observe interesting differences among cultures, for instance the use of words from the We and Achieve classes by Romanian speakers as indicative of truthful responses. Moreover, unlike American deceivers, Romanian deceivers make more frequent use of the Eating, Senses, and Body classes.

5 Conclusions

In this paper, we addressed the task of deception detection within and across cultures. Using four datasets from four different cultures, each covering three different topics, as well as two additional open-topic datasets from two of the cultures, we conducted several experiments to evaluate the accuracy of deception detection when learning from data from the same culture or from a different culture. In our evaluations, we compared the use of unigrams versus the use of psycholinguistic word classes.

The main findings from these experiments are: (1) We can build deception classifiers for different cultures with accuracies in the range of 60–70 %, with better performance obtained when using psycholinguistic word classes as compared to simple unigrams; (2) the deception classifiers are not highly sensitive to topic, with cross-topic classification experiments leading to results comparable to the within-topic experiments; (3) we can use data originating from one culture to train deception detection classifiers for another culture; the use of psycholinguistic classes as a bridge across languages can be as effective as, or even more effective than, the use of translated unigrams, with the added benefit of making the classification process less costly and less time consuming; (4) similar findings, although with somewhat lower classification results, are obtained for open-domain short-sentence texts in both within- and across-culture experiments, which confirms the portability of the classification method presented in this paper.

The datasets introduced in this paper are publicly available from http://lit.eecs.umich.edu.