1 Introduction

Effective methods for tagging potential deception on the basis of verbal or non-verbal cues (by hand or automatically) would have a number of applications in court and other legal settings. The focus of the research presented in this paper is tagging potential deception in court testimonies to support criminal investigations in cases in which external evidence of the truthfulness of these testimonies is not (yet) available, but deception detection methods could also be applied in other legal, policing and security applications, for example to identify fake reviews of books or hotels, and in human resources evaluation. There has therefore been a great deal of research on the topic—see, e.g., De Paulo et al. (2003), Ekman (2001), Fitzpatrick and Bachenko (2009), Hancock et al. (2008), Newman et al. (2003), Strapparava and Mihalcea (2009), Vrij (2008), and many others. Among other results, this line of research showed that, regarding behavioral clues to deception, “there is no clue or clue pattern that is specific to deception, although there are clues specific to emotion and cognition” (Frank et al. 2008). Meta-studies such as De Paulo et al. (2003) and Hauch et al. (2012), on the other hand, identified a number of verbal cues systematically correlated with lying and truth telling: e.g., liars tend to use more negative emotion words, more motion verbs, and more negation words, whereas truth-tellers tend to use more self-references (I, me, mine) and more ‘exclusive’ words (i.e., exception connectives: except, without, etc.). [See also Newman et al. (2003)]. As a result, automatic methods focusing on verbal cues have been developed that are able to detect deception with reasonable accuracy (Newman et al. 2003; Strapparava and Mihalcea 2009).

This field of research suffers, however, from a serious problem: the difficulty of collecting data suitable for studying the problem, or for developing automatic methods to identify deception. It is often difficult or impossible to verify the truthfulness of statements contained in data collected in natural environments (Vrij 2005). As a result, many if not most studies in the area, and in particular the papers just mentioned proposing computational techniques for deception detection, rely on data collected in laboratory conditions (Newman et al. 2003; Strapparava and Mihalcea 2009). But as the authors themselves point out (Newman et al. 2003), lying imposes a cognitive and emotional load on individuals which is not easy to reproduce artificially, and in any case achieving true ‘high-stakes’ deception in the lab would have serious ethical implications (Fitzpatrick and Bachenko 2009). (In the context of police investigations, the awareness of the legal consequences of a testimony and the emotional impact of speaking about criminal events can be very stressful for the subjects who issue statements.) It is therefore by no means obvious that results obtained with data collected in the lab will generalize to real-life scenarios. For example, Undeutsch (1984) claimed that, due to their lack of ecological validity, laboratory studies are not very useful for testing the accuracy of tools for the evaluation of witnesses’ reliability, such as analyses based on Statement Validity Analysis—SVA (Vrij 2005). [Gokhman et al. (2012) provide a useful review of the types of data used in deception detection research.]

As a result, Newman et al. (2003) identify the fact that “… external motivation to lie successfully was practically nonexistent…” among their participants as one of the main limitations of their work, the first and best-known attempt to develop a computational method for deception detection relying entirely on verbal cues. A second limitation they identify is that their model is limited to the English language; given that the rate of self-reference is one of the main cues for identifying truth-tellers, they see Romance languages such as Italian or Spanish (in which subject pronouns are often omitted) as particularly interesting languages for testing the cross-linguistic validity of their claims. In the research discussed in this paper we addressed these two limitations of the earlier study. Specifically, we set ourselves two objectives:

  1.

    to collect a dataset in the context of criminal proceedings that would not suffer from the shortcomings of the datasets employed to develop earlier computational models of deception detection;

  2.

    to compare the results obtained with this dataset with those obtained in earlier studies both from an accuracy point of view and from the point of view of the verbal cues employed.

In order to accomplish the first objective, we created a corpus of hearings in Italian courts for cases of calumny and false testimony, in which the defendant is accused of having issued deceptive statements during a previous hearing. When the defendants are found guilty, the trials end with a judgment which reconstructs the investigated facts and specifies quasi-verbatim the lies told in the courtroom. This information allowed us to annotate the utterances produced by the defendants as true, false or uncertain with great accuracy. The resulting corpus, called DeCour (for DEception in COURt), is the first resource for studying Italian true and false statements in a real-life scenario. [And because the data are in a Romance language, the second limitation pointed out by Newman et al. (2003) can be addressed as well.]

DeCour was used to train text classification models that classify utterances as false or not-false purely on the basis of verbal information. Besides replicating the methods used by Newman et al. (2003), we also applied to the task a number of ideas from the field of Stylometry (see the following section).

The structure of the paper is as follows. Section 2 summarizes the field of deception detection and the application of stylometric techniques in this area. In Sect. 3 our dataset is described in more detail. In Sect. 4 we discuss the machine learning and experimental methods we used to identify deceptive statements in DeCour. Finally, the results are presented in Sect. 5 and discussed in Sect. 6.

2 Background

2.1 Detecting deception

Detecting deception in communication is a challenge for humans. Human performance at recognizing deception was found to be not much better than chance in a number of studies (Bond and De Paulo 2006). Other studies claim that even specific training is not particularly effective at improving human skills (Levine et al. 2005). On the other hand, there are studies suggesting that the ability of humans as lie-detectors is underestimated (Frank and Feeley 2003). In any case, even in papers which report positive effects of training, the difficulty of the task is not in question (Porter et al. 2000).

2.2 Approaches to deception detection

In part no doubt because of the very difficulty of the task, a wide variety of approaches to discovering deceptive statements have been tried. The literature about deceptive communication can be divided into three main branches, depending on the cues investigated:

  • Studies focused on non-verbal behaviour;

  • Studies focused on verbal behaviour;

  • Recent studies based on neuro-physiological, and in particular neuro-imaging, techniques.

All of these approaches are however based on the same theoretical assumption, whether explicitly or implicitly: the idea, historically formalized by Undeutsch as the hypothesis which takes his name (Undeutsch 1967), that the cognitive elaboration of an untruthful narrative differs from that of a truthful one, and that this difference should therefore be traceable in the features of the narrative itself. Undeutsch was interested in verbal behavior, but his theoretical framework is also suitable for studying non-verbal communication, and is consistent with recent findings using neuro-imaging techniques (Davatzikos et al. 2005; Ganis et al. 2003; Merikangas 2008; Simpson 2008).

2.3 Non-verbal approaches

The best known method for detecting deception, the polygraph, relies on non-verbal cues, but the literature contains a great number of papers studying the relation between deception and various aspects of non-verbal behaviour. One of the best known authors in this area is Ekman (2001), who studied facial expressions in particular. Other cues are the time taken to respond (response latency) and pupil dilation (Wang et al. 2010). Many authors use combinations of cues in their attempts to improve accuracy at detecting falsehoods. This is the case of De Paulo et al. (2003), who consider more than 150 cues, verbal and non-verbal, observed in subjects mostly under lab conditions. Jensen et al. (2010) focused on cues coming from audio, video and textual data, with the aim of building a paradigm useful for identifying deceptiveness.

However, consistently with the study of Frank et al. (2008) cited above, a common finding in this research is that it is difficult to identify non-verbal cues specific to deception; De Paulo et al. (2003), too, argue that “behaviors that are indicative of deception can be indicative of other states and processes as well”. In this regard, Walczyk et al. (2003) mention the case of Aldrich Ames, the spy who, from 1985 to 1994, provided the former Soviet Union with classified material he obtained as a high-level CIA agent. During those nine years, he successfully passed two polygraph tests.

2.4 Hermeneutic approaches

Undeutsch developed a framework called Statement Analysis (Undeutsch 1967, 1982, 1984), inspired by the notion of truth in interpretation as expressed in the field of Hermeneutics developed by Heidegger, Gadamer, and others. In such approaches the truth of statements is assessed on the basis of principles called ‘reality criteria’, designed to ensure that the statement is factual. Statement Analysis and its successors such as Statement Validity Analysis (SVA)—in turn divided into three stages: a semistructured interview, the Criteria-Based Content Analysis (CBCA), and an evaluation of the CBCA outcomes—are commonly used in forensic practice and in the literature. However, according to Vrij (2005), “SVA evaluations are not accurate enough to be admitted as expert scientific evidence in criminal courts but might be useful in police investigations”. Thus Adams (1996), among others, asserted the necessity of taking into account the personal style of communication together with the content of the testimonies.

2.5 Stylometry

The approach to the analysis of verbal cues for deception identification that has become more and more dominant in recent years is stylometry. Stylometry studies text on the basis of its stylistic features only. This can be done for a variety of purposes, e.g., to attribute a text to an author (authorship attribution) or to obtain information about the author, e.g., her/his gender or personality (author profiling). Stylometry actually goes back a very long way—the arguments used by Lorenzo Valla in the Fifteenth century to demonstrate the falsehood of the Donation of Constantine are essentially stylistic ones (Pepe 1996)—but it is only in the Nineteenth century that the field took shape, with De Morgan’s introduction of quantitative measures into stylistic studies (Lord 1958). (Quantitative) stylometric methodology was subsequently formalized by Lutoslawski (1898). Modern stylometry, which relies mainly on computational methods for automatically extracting low-level verbal cues from large amounts of text and on machine learning techniques, has proven effective in several tasks, including author profiling (Coulthard 2004; Solan and Tiersma 2004) [for example, deducing the age and sex of authors of written texts (Koppel et al. 2006; Peersman et al. 2011)], authorship attribution (Luyckx and Daelemans 2008; Mosteller and Wallace 1964), emotion detection (Vaassen and Daelemans 2011) and plagiarism analysis (Stein et al. 2007).

2.6 Stylometric methods for deception detection

As Koppel et al. (2006) point out, the features used in stylometric analysis belong to two main families: surface-related and content-related features. The second kind of features can in turn be divided into two categories: features extracted from lexicons, and features coming from the linguistic analysis of the texts themselves.

  • Surface-related features This type of feature includes the frequency and use of function words or of certain n-grams of words or parts-of-speech (POS tags), without taking their meaning into consideration.

  • Content-related features These features attempt to capture the meaning of texts. Such information may come from:

    • Lexicons Lexicons associate each word with categories of different kinds: grammatical, lexical, psychological and so on. This results in a profile of texts with respect to those categories.

    • Linguistic analyses More complex analyses, such as syntactic analysis, extraction of argument structure or coreference resolution, are also possible. Some of these analyses can be carried out automatically, but others, such as those carried out by Bachenko et al. (2008), can only be done by hand.

Newman et al. (2003) was arguably the first study showing that stylometric techniques could be effectively applied to detect deception. In that study, Newman et al. collected in the lab a corpus of sincere and deceptive texts from five different topics and contexts: videotaped, typed and handwritten discussions about attitudes to abortion, feelings about friends, and a mock crime. These data were then analysed using a lexical resource: specifically, the Linguistic Inquiry and Word Count (LIWC), a lexicon created by Pennebaker’s group (Pennebaker et al. 2001) which categorizes words under a number of headings such as emotional content, self-reference, etc. The authors reached an accuracy of about 60 % (with a peak of 67 %) in three of the five studies, against a chance performance of 50 %. In the remaining two studies, the performance was no better than chance.

Strapparava and Mihalcea (2009) used surface features only, obtaining good results at classifying as “sincere” or “deceptive” texts collected through the Amazon Mechanical Turk service.

Finally, an example of a (semi-automatic) approach to deception detection using linguistic analysis is the work presented in Bachenko et al. (2008) and Fitzpatrick and Bachenko (2009). Fitzpatrick and Bachenko are in the process of collecting a high-stakes corpus including criminal statements, police interrogations, and civil testimony (Fitzpatrick and Bachenko 2012). Several linguistic indicators of deception were identified, such as linguistic hedges (e.g., to the best of my knowledge...), overzealous expressions (I swear to God), negative emotions (I was a nervous wreck), and a variety of inconsistencies with respect to verb and noun forms. The texts were then manually annotated with these indicators, and this information was used as features to classify deceptive statements with very high accuracy (close to 75 %).

3 Data set

In this section we briefly discuss how we collected and annotated a dataset containing examples of ‘high stakes’ deceptive language produced by subjects for whom the deception had real-life implications: the DeCour corpus.

3.1 Calumny and false testimony in the Italian Criminal Code

DeCour is a collection of hearings for “calumny” and “false testimony” (articles 368 and 372 of the Italian Criminal Code, respectively). While the concept of “false testimony” is fairly intuitive, Footnote 1 in the Italian Criminal Code “calumny” is a particular kind of false testimony, consisting in the attempt to shift onto someone else the responsibility for a crime that has been committed. Footnote 2 The distinction makes sense because in the Italian legal system nobody can be forced to make statements unfavorable to oneself; thus lying about a crime one has committed is not itself a crime, but it is a crime to try to blame someone else for it. The hearings in DeCour therefore come from two main situations:

  • the defendant in a criminal proceeding tries to calumniate someone;

  • a witness in a criminal proceeding lies for some reason.

In both cases, a new criminal proceeding is initiated, in which the subjects may or may not issue new statements, and in which the transcript of the hearing held in the previous proceeding serves as the body of evidence.

DeCour only contains hearings in which the defendant was in the end found guilty of “calumny” or “false testimony”. Hence the proceeding ends with a judgment of the Court which summarizes the facts, pointing out precisely the lies told by the speaker in order to establish his punishment. Thanks to the transcript of the hearing on the one hand, and to the final judgment of the Court on the other, it is possible to annotate the statements of the speakers on the basis of their truthfulness or untruthfulness.

3.2 Validity of the judgments of truth and falsity

Normally in corpus annotation one is only worried about replicability—i.e., whether different coders will assign the same code to an item. In this type of task, however, we are also concerned with validity: how confident can we be that the statements marked as false are actually false?

Of course, it is possible that Court judgments are wrong: some evidence coming from the inquiry could be in some way mistaken or misinterpreted by the judge. Since the annotation of DeCour relies on the information provided by the judgment, this would bring about an erroneous evaluation of the statements’ truthfulness and would result in some noise in the data. This kind of risk is unavoidable.

Our analysis of the data we collected suggests that any bias in Court is to the advantage of the defendant. In accordance with the principle of in dubio pro reo, Footnote 3 when the least doubt exists about their guilt, defendants are not convicted. While collecting data we ran across several proceedings in which the defendant was probably lying, and the judge most likely thought so as well, but in which the defendant was ultimately acquitted for lack of evidence of deception. These proceedings were not included in DeCour. On the other hand, when the defendant is convicted, his guilt is always well demonstrated.

Therefore, even though it is not possible to estimate the rate of errors in these judgments, we expect it to be fairly low.

3.3 The hearings

Among the various kinds of reports produced in a criminal proceeding, the minutes of the hearings held in Court seemed the most appropriate and useful for our purposes, because they are transcripts which are required to reproduce verbatim what the subject said in the courtroom. Footnote 4 DeCour is composed of the minutes of 35 hearings held in four Italian Courts: Bologna, Bolzano, Prato and Trento. These minutes report verbatim the statements produced by 31 different individuals (four of whom were heard twice).

3.4 Preprocessing

3.4.1 Tokenization

The whole corpus was tokenized. The tokens include the words of the texts as well as punctuation. Punctuation marks are considered in blocks: this means that, for example, a single dot or a single question mark constitutes a token, but an ellipsis, that is three consecutive dots “...”, also constitutes a single token. Our analysis units are the utterances, defined as strings of text delimited by punctuation marks such as periods, question marks and ellipses. Taking punctuation marks in blocks prevents the creation of analysis units made up solely of punctuation marks. By contrast, apostrophes (which in Italian indicate the elision of the final vowel of the previous word) were not treated as separate tokens, but were kept together with the previous word. This helped the performance of the subsequent lemmatization. Acronyms, such as “S.p.A.”, “P.M.” and so on, were also considered single tokens; otherwise, the dots would separate the letters constituting the acronym, with a proliferation of meaningless tokens and utterances. Lastly, times expressed in numbers, such as “9:10”, were also considered single tokens; in this case, the aim was to distinguish plain numbers from expressions of clock time.
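To make these conventions concrete, here is a minimal tokenizer sketch implementing the rules just described; the regular expression, function name and examples are ours, not the preprocessing code actually used for DeCour:

```python
import re

# Illustrative pattern for the rules described above: acronyms and clock
# times as single tokens, apostrophes kept with the preceding word, and
# runs of punctuation (e.g. "...") treated as single blocks.
TOKEN_RE = re.compile(
    r"(?:[A-Za-z]\.){2,}"   # acronyms: "S.p.A.", "P.M."
    r"|\d{1,2}:\d{2}"       # clock times: "9:10"
    r"|\w+['\u2019]?"       # words, keeping a trailing apostrophe ("un'")
    r"|[.\u2026,;:?!]+"     # punctuation blocks: ".", "?", "..."
)

def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text)

print(tokenize("Alle 9:10 lavoravo alla S.p.A., un'ora prima..."))
# ['Alle', '9:10', 'lavoravo', 'alla', 'S.p.A.', ',', "un'", 'ora', 'prima', '...']
```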

3.4.2 Anonymisation

Sensitive data were anonymised, as agreed with the Courts. Proper names of persons and things, such as streets, cars and so on, were replaced with five “x” characters. Each proper name was therefore counted as the same token “xxxxx”, which leaves a specific trace in the token frequency lists wherever a subject utters a proper name.

3.4.3 Lemmatization and POS-tagging

The whole corpus was put in lower-case, and then lemmatized and POS-tagged using a version of TreeTagger Footnote 5 (Schmid 1994) trained for Italian.

3.5 Annotation

The hearings are dialogs in which at least four roles are always present, each with precise duties dictated by the rules of the Criminal Proceeding Code. The judge is an impartial figure who has to judge the facts. The prosecutor brings the accusations, whereas the lawyer is in charge of the defense. All of these individuals put questions to the defendant, who has to answer them. These answers are the object of investigation in this study.

Each answer—i.e., all the text between the end of the previous intervention by another individual and the beginning of the following intervention—is considered a turn. Each turn consists of one or more utterances which, as said above, are delimited by terminal punctuation marks (period, ellipsis, question and exclamation mark). The individual utterance is the analysis unit of the DeCour corpus and has been annotated according to the following annotation scheme:

  • True The utterance is held as true if it is consistent with the reconstruction of the facts reported in the final judgment.

  • False Utterances in contrast with that reconstruction are held as false. The judgment often lists precisely the lies told by the speaker; in such cases the false utterances are easily identifiable.

  • Uncertain Even though the judgments give a complete description of the facts, they cannot account for every statement of the witness/defendant. Utterances whose truthfulness is not clear are classified as “uncertain”. This category also includes utterances lacking propositional value, which from a logical point of view cannot be true or false, such as questions, meta-communicative acts and so on (for example “Could you repeat, please?”, “If you think so...”).

In order to verify agreement in the judgments about the truthfulness or untruthfulness of the utterances, three annotators separately annotated about 600 utterances. Reducing the agreement to a binary task—false utterances on one side and not-false utterances, that is true and uncertain ones, on the other—we obtained a value of κ = .64 (Artstein and Poesio 2008).
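For illustration, pairwise agreement on this binary reduction can be computed along the following lines; the annotations shown are invented, and while the paper’s κ follows Artstein and Poesio (2008), which also covers the multi-coder case, this sketch shows only pairwise Cohen’s κ:

```python
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Hypothetical annotations: one label per utterance per annotator.
annotations = {
    "ann1": ["false", "true", "uncertain", "false", "true"],
    "ann2": ["false", "true", "true", "false", "uncertain"],
    "ann3": ["uncertain", "true", "true", "false", "true"],
}

def binarize(labels):
    # Reduce to the binary task: false vs. not-false (true + uncertain).
    return ["false" if lab == "false" else "not-false" for lab in labels]

for a, b in combinations(annotations, 2):
    k = cohen_kappa_score(binarize(annotations[a]), binarize(annotations[b]))
    print(f"{a} vs {b}: kappa = {k:.2f}")
```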

3.6 Some statistics

DeCour consists of 3,015 utterances, drawn from 2,094 turns. 945 utterances have been annotated as false, 1,202 as true and 868 as uncertain. The size of DeCour is 41,819 tokens, including punctuation blocks, distributed as follows:

Label       Utterances   Tokens
True             1,202   15,456
Uncertain          868   10,439
False              945   15,924
Total            3,015   41,819

4 Methods

In the next section we will present several experiments concerned with the development of computational models for deception detection based on machine learning techniques. In this section we discuss the methods used to train those models.

4.1 Features

In the experiments of Newman et al. (2003), lexical features from the LIWC were used. Much work in stylometry however suggests that comparable and occasionally better performance can be achieved using surface features such as n-grams of words and/or POS tags. We tested both types of features in our experiments.

4.1.1 Utterance length

In our experiments the unit of analysis is the utterance rather than the full document, and therefore (differently from the output of the LIWC) it does not make sense to count the mean number of words per sentence. We do, however, compute two utterance length features: length with and without punctuation. These two features are used in all experimental conditions. Footnote 6

4.1.2 LIWC features

82 of the 85 ‘dimensions’ (lexical categories) of the Italian LIWC dictionary are also included among the features in these experiments. The features “Loro”, “Passivo” and “Formale” Footnote 8 were discarded: “Loro” is used to categorize only one lexical item in the dictionary, whereas “Passivo” and “Formale” are not associated with any term.

4.1.3 Lemma and POS n-grams

What we call surface features here are computed from frequency lists of n-grams of lemmas and parts-of-speech. Lemma and part-of-speech n-grams of up to seven items were considered, from unigrams to heptagrams; long n-grams were included in order to capture conventional expressions. In each experiment, these frequency lists are computed from the subset of DeCour employed as the training set in that experiment. More precisely, they come from the utterances classified as true or false in the training set; utterances classified as uncertain were not considered, in order to avoid picking up non-discriminating features coming from utterances whose truthfulness or untruthfulness is not decidable or not known. Two different feature selection strategies were tested:

  • Best Frequencies Separate n-gram frequency lists were computed for true and for false utterances in the training set, for both lemma and POS n-grams. The most frequent n-grams for each value of n were then chosen from these lists, in decreasing numbers for increasing values of n. This approach was adopted because the higher the n, the lower the absolute frequency of each n-gram. The numbers of most frequent lemma and part-of-speech n-grams collected for each n with this method, which we will henceforth call Best Frequencies, are shown in Table 1. Concretely, as shown in that Table, the 35 most frequent lemma unigrams were collected for true and for false utterances, together with the 14 most frequent POS unigrams, the 30 most frequent lemma bigrams and so on, until a total of 196 features from true utterances and as many from false utterances were obtained. The overall number of surface features and the numbers of features of each type shown in Table 1 were arrived at on the basis of extensive empirical experimentation. The figure of 196 features in Table 1 is the number of features determined separately for false and for true utterances. These separate lists are then merged into a single list, whose size depends on their degree of overlap: if the features chosen for false and true utterances are identical, only 196 features are used in total, whereas if the n-grams for false and true utterances are completely disjoint, 392 (196 + 196) features are used.

  • Information Gain The second strategy for feature selection we employed is based on the popular Information Gain (IG) metric (Forman 2003; Yang and Pedersen 1997). Information Gain “measures the decrease in entropy when the feature is given vs. absent” (Forman 2003) according to the formula:

    $$ IG = e(pos, neg) - [ P_{n-gram} e(tp, fp) + P_{\neg{n-gram}} e(fn, tn)] $$

    in which e is the entropy:

    $$ e(x, y) = -\frac{x}{x+y}\log_{2}\frac{x}{x+y}-\frac{y}{x+y}\log_{2}\frac{y}{x+y} $$

    and \(P_{n-gram}, P_{\neg{n-gram}}\) are defined as follows:

    $$ P_{n-gram} = \frac{tp+fp}{all} $$

    $$ P_{\neg{n-gram}} = 1-P_{n-gram} $$

    where:

    • tp = true positives: because the scientific focus of this work is to verify if it is possible to identify deceptive statements, true positives are the cases where the utterance is false and the feature is present;

    • fp = false positives: the cases where the utterance is true and the feature is present;

    • tn = true negatives: the cases where the utterance is true and the feature is absent;

    • fn = false negatives: the cases where the utterance is false and the feature is absent;

    • pos = positives: number of cases where the utterance is false (and the feature is present or absent: tp + fn);

    • neg = negatives: number of cases where the utterance is true (and the feature is present or absent: fp + tn);

    • all = the total number of utterances in the training set: pos + neg.

    To compute the Information Gain of a feature, we first build the frequency lists for n-grams of lemmas and POS sequences as above, keeping all the n-grams with frequency higher than 5. We then compute the Information Gain of each feature and keep the 250 features with the highest Information Gain.
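A compact sketch of this selection procedure, transcribing the formulas above into code, might look as follows; the function names are ours, and we read “frequency” as the number of utterances containing the n-gram, which is an assumption:

```python
import math
from collections import Counter

def entropy(x: int, y: int) -> float:
    """e(x, y) as defined above; terms with zero counts contribute 0."""
    total = x + y
    e = 0.0
    for v in (x, y):
        if v:
            p = v / total
            e -= p * math.log2(p)
    return e

def information_gain(tp: int, fp: int, fn: int, tn: int) -> float:
    # IG = e(pos, neg) - [P_ngram * e(tp, fp) + (1 - P_ngram) * e(fn, tn)]
    pos, neg, all_ = tp + fn, fp + tn, tp + fp + fn + tn
    p_ngram = (tp + fp) / all_
    return entropy(pos, neg) - (p_ngram * entropy(tp, fp)
                                + (1 - p_ngram) * entropy(fn, tn))

def select_features(false_utts, true_utts, n_features=250, min_freq=5):
    """false_utts / true_utts: utterances, each a list of its n-grams.
    Returns the n_features n-grams with highest Information Gain among
    those occurring in more than min_freq utterances."""
    counts = Counter(g for u in false_utts + true_utts for g in set(u))
    candidates = [g for g, c in counts.items() if c > min_freq]
    scores = {}
    for g in candidates:
        tp = sum(g in u for u in false_utts)  # false utterance, feature present
        fp = sum(g in u for u in true_utts)   # true utterance, feature present
        fn, tn = len(false_utts) - tp, len(true_utts) - fp
        scores[g] = information_gain(tp, fp, fn, tn)
    return sorted(candidates, key=scores.get, reverse=True)[:n_features]
```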

Table 1 The most frequent n-grams collected

4.2 Evaluation

In this section we discuss how the models were evaluated and the significance of the results assessed.

4.2.1 Evaluation metrics

In order to evaluate the performance of the models, four metrics were used:

  • Accuracy The overall accuracy is the proportion of utterances correctly classified, out of all the predictions made.

  • Precision We compute precision with regard to false utterances. This is the rate of correctly classified false utterances, out of all the utterances classified as false:

    $$ p_f = \frac{tp}{tp + fp} $$
  • Recall Recall is the rate of correctly classified false utterances, out of all the false utterances present in the data set:

    $$ r_f = \frac{tp}{tp + fn} $$
  • F-measure F-measure is the harmonic mean of precision and recall (Chinchor 1992; Sasaki 2007):

    $$ f_f = 2 * \frac{p_f*r_f}{p_f+r_f} $$

    In the rest of the paper we will omit the f indices except when required.
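To make the definitions concrete, here is a minimal sketch computing all four metrics over predicted labels; the function and label names are ours:

```python
def false_class_metrics(gold, pred):
    """Accuracy, plus precision/recall/F for the 'false' class as defined
    above; gold and pred are sequences of 'false' / 'not-false' labels."""
    pairs = list(zip(gold, pred))
    tp = sum(g == "false" and p == "false" for g, p in pairs)
    fp = sum(g != "false" and p == "false" for g, p in pairs)
    fn = sum(g == "false" and p != "false" for g, p in pairs)
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return accuracy, precision, recall, f
```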

4.2.2 Random baseline

The performance of the models was compared to a number of baselines. The first of these is an estimate of random performance computed through a Monte Carlo simulation. The basic idea of this kind of simulation is to perform a task many times over random inputs whose distribution reflects that of the real data. The overall random performance is then taken as a reference point against which to evaluate the results of the non-random classifiers.

As said above, DeCour consists of 3,015 utterances, labeled as false, true or uncertain. Because our aim is to verify whether it is possible to identify deceptive statements, and because many classifiers work best on binary problems, we treated the 3,015 utterances of DeCour as belonging to two subsets only, false and not-false utterances, the second class grouping together true and uncertain utterances. 945 utterances are false (31.34 % of the total) and 2,070 not-false.

In each step of the Monte Carlo simulation, utterances are assigned classes in such a way that the rate of elements classified as false is the same as in the gold standard; the percentage of correct answers is then computed. This procedure is repeated 100,000 times. The level of 60.03 % correct predictions was exceeded in less than .01 % of the trials. Precision at identifying false statements exceeded 37.03 % in less than 0.1 % of the simulations, and recall exceeded 35.97 % in less than 0.1 % of the simulations. These levels were therefore taken as chance levels in the data analysis in the following section.

A second Monte Carlo simulation was carried out considering only the utterances annotated as true and false, and discarding those classified as uncertain. 2,147 utterances remained, of which 1,202 true and 945 false, as above. Out of the 100,000 simulations, less than .01 % showed an accuracy higher than 54.54 %, while the thresholds for precision and recall were 49.95 and 48.36 % respectively.
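The following sketch illustrates such a simulation for accuracy; shuffling the gold labels guarantees that the rate of predicted false labels matches the gold standard, while the function name and the exact quantile read-off are our assumptions based on the thresholds quoted above:

```python
import random

def monte_carlo_accuracy_threshold(gold, n_trials=100_000, quantile=0.9999,
                                   seed=0):
    """Repeatedly 'classify' by shuffling the gold labels, so that the
    rate of false labels matches the gold standard, and return the
    accuracy exceeded in only (1 - quantile) of the trials."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_trials):
        pred = list(gold)
        rng.shuffle(pred)
        accuracies.append(sum(g == p for g, p in zip(gold, pred)) / len(gold))
    accuracies.sort()
    return accuracies[int(quantile * n_trials)]
```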

4.2.3 The majority baseline

Another straightforward kind of baseline is the so-called Majority Baseline: assigning to each utterance the label of the majority class. The accuracy of this baseline is equal to the percentage of items belonging to the majority class. In the case of DeCour, the rate of not-false utterances is 68.66 %; if uncertain utterances are not considered, the rate of true utterances is 55.98 %.

The Majority Baseline can be difficult to beat, but it is not always very informative: in our application, for instance, always assigning the label not-false to utterances would give an accuracy of 68.66 %, but a recall on false utterances (i.e., those we are actually interested in) of 0 %.

4.2.4 A simple heuristic algorithm

Finally, a third baseline was considered: a heuristic algorithm motivated by the observation, discussed in previous work (Fornaciari and Poesio 2011), that in the hearings the prosecutor often confronts the defendant with facts that are known thanks to the inquiry, so that a common form of lie is to deny those facts, or to claim not to know or not to remember them. The heuristic algorithm is as follows (a code sketch is given after the list):

  • The utterances beginning with the words Sì (Yes), Lo so (I know) and Mi ricordo (I remember) are classified as true;

  • The utterances beginning with the words No (No), Non lo so (I don’t know) and Non mi ricordo (I don’t remember) are classified as false;

  • All other utterances are randomly classified as true or false, according to the rate of true and false utterances present in DeCour.
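A minimal sketch of this baseline follows; the prefix matching is simplified (no word-boundary handling) and the constant and label names are ours:

```python
import random

rng = random.Random(0)
FALSE_RATE = 945 / 3015  # rate of false utterances in DeCour

def heuristic_label(utterance: str) -> str:
    u = utterance.strip().lower()
    if u.startswith(("sì", "lo so", "mi ricordo")):
        return "true"
    if u.startswith(("no", "non lo so", "non mi ricordo")):
        return "false"
    # All remaining utterances: random guess matching the corpus rate.
    return "false" if rng.random() < FALSE_RATE else "true"
```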

After 100,000 trials, the performance of this algorithm was better than that of the Monte Carlo simulation, both in overall accuracy and in precision and recall. Even so, on the whole of DeCour, less than 0.1 % of the trials reached an accuracy higher than 62.39 %; likewise with p < .001, the precision threshold was 40.06 % and the recall threshold 41.80 %. Considering only true and false utterances, the levels for the algorithmic baseline were 59.57 % for accuracy, 54.38 % for precision and 52.80 % for recall.

4.3 Training the models

In previous work we tested a variety of classification methods, finding that the best performance was in general obtained with Support Vector Machines (SVMs; Cortes and Vapnik 1995), a classification method successfully employed in many applications involving text classification (Yang and Liu 1999). SVMs rely on the identification of optimal hyperplanes in a feature space describing each entity of a data set. To do this on data sets in which the entities are not linearly separable, kernel functions are employed, which re-map the entities into a higher-dimensional space where linear separation is possible (Zhou et al. 2008).

The choice of the most appropriate kernel function is therefore fundamental to obtaining good classification performance. Linear kernel functions are usually considered useful in text categorization, where one often deals with large sparse data vectors, as in the study of Karatzoglou et al. (2006). Nevertheless, radial kernel functions are employed in the following experiments, because on DeCour they gave more uniform results and overall better performance across the various experimental conditions.

Our SVM models were trained and tested via n-fold cross-validation. In all experimental conditions, each hearing of DeCour constitutes a fold, so that the experiments run on the whole corpus were carried out with a 35-fold cross-validation. Other experiments were carried out on subsets of DeCour; in these cases some hearings were discarded, and an n-fold cross-validation with one fold per remaining hearing was carried out.
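In scikit-learn terms, this setup corresponds roughly to the sketch below; the feature matrix X, labels y and per-utterance hearing ids are assumed given, and since in the actual experiments feature selection is redone inside each training fold, a full reproduction would additionally wrap the selection step and the classifier in a Pipeline:

```python
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

def evaluate(X, y, hearing_ids):
    """X: one feature vector per utterance; y: 'false' / 'not-false';
    hearing_ids: one id per utterance, so each hearing is its own fold."""
    clf = SVC(kernel="rbf")   # radial kernel, as in the experiments
    cv = LeaveOneGroupOut()   # 35 hearings -> 35-fold cross-validation
    return cross_val_score(clf, X, y, cv=cv, groups=hearing_ids)
```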

5 Experiments and results

Thirteen experiments were carried out, divided into three groups. The first group of five experiments was concerned with replicating the methodology of Newman et al. (2003) in a high-stakes deception scenario, and with comparing the performance of the lexical features used in that work with that of surface features, which have often been shown to achieve similar or better performance. The goal of the second group of experiments was to compare the performance of the classifier on the entire corpus with its performance on the subset of utterances classified as true or false only—arguably a more realistic application of the methodology, which in practice would only be applied to utterances that investigators or judges consider relevant to classify as true or false. Finally, in the last group of experiments we studied whether better results could be obtained by focusing on more cohesive sets of subjects: only male speakers, only Italian native speakers, and only speakers above 30 years of age.

5.1 Comparing lexical and surface features

5.1.1 Preliminary discussion

The results of these first experiments suggest that the methods employed by Newman et al. do achieve results above chance even with real-life data. These results are lower than the majority baseline, but that baseline would not yield usable predictions, as it never identifies any false utterance. Moreover, results above the majority baseline can be obtained using surface features only.

5.1.2 Using the LIWC

In the first experiment, LIWC was used to classify deceptive texts in a near-replication of Newman et al. (2003). The most significant differences were that our texts were in Italian and therefore the Italian LIWC was used instead of the English LIWC; that utterances were classified instead of whole texts; and that SVMs were used instead of logistic regression. A 35-fold cross validation was carried out over the whole DeCour corpus. 86 features were used to categorize utterances: the 2 utterance length features from Sect. 4.1.1 and the 84 LIWC features from Sect. 4.1.2.

The results of this experiment are summarized in Table 2. Footnote 9 The mean accuracy in detecting false utterances reached in this experiment was 68.28 %, with standard deviation σ = 8.86. This accuracy is almost 6 percentage points higher than that of the heuristic algorithm, but does not exceed the majority baseline, which achieves the highest result.

Table 2 Results with LIWC lexical features on the whole corpus

5.1.3 Surface features

In the second and third experiments, only surface features were used in addition to the utterance length features. As discussed above, two approaches to choosing surface features were tried: simple frequency and Information Gain. As in the first experiment, a 35-fold cross-validation was carried out (notice that, because the surface features are selected from the training set, different features could potentially be chosen in each of the 35 repetitions).

Best frequencies The results obtained with Best Frequencies are summarized in Table 3. The mean accuracy of the models was 68.29 %, with standard deviation σ = 11.13. As in the previous experiment, the performance is higher than that of the heuristic baseline and of random choice, but not than that of the majority baseline. The average number of features employed in each fold of the experiment was 296.54, with standard deviation σ = 2.20; the best surface features are shown in Table 1.

Table 3 Surface features: best frequencies

Information gain In the next experiment, the surface features were selected according to the Information Gain strategy. The results are summarized in Table 4. The mean accuracy for this experiment was 69.89 %, with standard deviation σ = 9.73. This is the best result among the first group of experiments; both the majority and the heuristic baseline are improved upon (by 1 and 7 percentage points, respectively). The feature vectors in this case consisted of 252 features: 250 surface features and the two utterance length features.

Table 4 Choosing surface features using information gain

5.1.4 Combining lexical and surface features

Finally, we tried combining both the lexical features from the LIWC and the surface features chosen either through Best Frequencies or through Information Gain.

LIWC + best frequencies In the first case, the 84 LIWC-related features and the surface features of the second experiment were used, for an average of 380.54 features across the 35 folds, with standard deviation σ = 2.20. In this experiment the mean accuracy was 68.96 %, with standard deviation σ = 9.94: this result is higher than the heuristic baseline (by more than 6 percentage points) and than the majority baseline (although only by a few tenths of a point). The overall performance of the 35-fold cross-validation is presented in Table 5.

Table 5 LIWC + best frequencies features

LIWC + information gain Alternatively, the 84 LIWC features were combined with the surface features collected with Information Gain. In this case, 336 features were used in total. The mean accuracy was 68.59 %, with standard deviation σ = 10.03. This is about 6 percentage points higher than the heuristic baseline, but slightly lower than the majority baseline. Table 6 summarizes the results.

Table 6 LIWC + information gain features

5.2 Discriminating between clearly false and clearly true utterances

5.2.1 Preliminary discussion

The results discussed in this section suggest that when the models are applied to the arguably more realistic data obtained by removing irrelevant utterances, we obtain results well above chance and well above every baseline.

In particular, in this second series of experiments the utterances annotated as ‘uncertain’ were discarded, and only ‘true’ and ‘false’ utterances were considered. Although this selection might at first seem just a way of improving performance, we believe it in fact reflects more accurately how methods such as those discussed in this paper could actually be used to support investigative and court practice. Investigators and judges are unlikely to be interested in testing every single utterance of the accused. When witnesses/defendants issue statements, they often mention facts which are universally known to be true (for example when introducing more relevant topics: “That evening we were at the disco...”), or not particularly relevant for the purposes of the investigation (“I have my lawyer...”). Furthermore, several utterances have a purely meta-communicative value, such as “If you were me...”, “I do not understand”, “Now let me explain,” and so on. Even when these declarations have propositional value, their classification is not useful with respect to the facts that the inquiry has to ascertain. Along with assertions whose truthfulness is unknown, the category of ‘uncertain’ utterances contains precisely this kind of statement, for which the value true/false is either unclear or by definition inappropriate. Removing them from the dataset thus reduces the noise in the data, by excluding utterances which in any case would not need to be classified. Apart from the restriction to a subset of the data, exactly the same methods are used in the experiments of this second group as were used in the first group.

5.2.2 Using the LIWC

Table 7 shows the results obtained by using the LIWC only, as in the first experiment of the first group, but discarding uncertain utterances. The mean accuracy over the 35 folds is 66.48 %, with standard deviation σ = 9.78. This is almost 7 percentage points above the most demanding baseline, which for this set of experiments is the heuristic one (removing the uncertain utterances greatly lowers the majority baseline).

Table 7 Classifying false/true utterances with the LIWC

5.2.3 Surface features

Best frequencies Table 8 shows the results obtained on this task using surface features selected with the Best Frequencies technique. The mean accuracy is 68.62 %, with standard deviation σ = 10.32—that is, 9 percentage points higher than the heuristic baseline.

Table 8 False/true utterances classification with surface features: best frequencies

Information gain This experiment replicates the third experiment of the first group, but without uncertain utterances. In this case, the performance is not the best of this set of experiments: the mean accuracy is 68.25 % (with standard deviation σ = 9.65), almost 9 points above the heuristic baseline. All the results are summarized in Table 9.

Table 9 False/true utterances classification with surface features: information gain

5.2.4 Combining features

LIWC + best frequencies While in the fourth experiment of the first group mixing lexical and surface features (collected with the Best Frequencies method) did not lead to good results, using this combination on false/true utterances only yields the best performance of this second group of experiments. The results are shown in Table 10: the mean accuracy is 69.84 %, with standard deviation σ = 10.29. The gap between this performance and the heuristic baseline is more than 10 percentage points.

Table 10 False/true utterances classification: LIWC + best frequencies

LIWC + information gain The last experiment of this set mirrors the fifth one of the first series: the LIWC features were combined with the surface features collected according to the Information Gain method, and employed in a 35-fold cross-validation in which only true and false utterances were considered. The results are shown in Table 11. The mean accuracy is 68.90 %, with standard deviation σ = 11.18, more than 8 percentage points higher than the heuristic baseline.

Table 11 False/true utterances classification: LIWC + information gain

5.3 Selecting more homogeneous sets of defendants

5.3.1 Preliminary discussion

Finally, in the last series of experiments, we attempted to determine whether better results could be achieved by training and testing on more homogeneous sets of speakers. DeCour gave us the opportunity to try three ways of making the sets more homogeneous: (1) only considering defendants of the same gender (unfortunately we only have enough data to try this on male defendants); (2) only Italian native speakers; and (3) defendants of a similar age. We consider each of these in turn.

5.3.2 Only male speakers

A possibility often mentioned to us was that male and female speakers lie in different ways, so that training and testing on defendants of the same gender could yield better results. Unfortunately DeCour only includes 8 hearings in which the defendant is a woman, which we found is not enough data to build reliable models. We could, however, try this with male defendants. We therefore removed 10 hearings, in which the defendants are either women or transgender. The remaining subset consisted of 2,234 utterances, of which 712 were false (31.87 % of the total). A new Monte Carlo simulation was carried out, obtaining (with p < .001) baselines of 60.11 % for accuracy, 38.48 % for precision and 37.25 % for recall. The heuristic baseline achieved an accuracy of 62.58 %, a precision for false utterances of 41.24 % and a recall of 42.84 %. The majority baseline was 68.13 %.

Since in the previous experiments the highest accuracy was achieved using only surface features selected through Information Gain, this model was used in this and the following experiments.

A 25-fold cross-validation was carried out, obtaining a mean accuracy of 69.51 %, with standard deviation σ = 8.81. The performance thus exceeds both the majority and the heuristic baselines. Table 12 presents the overall results of this experiment.

Table 12 Only male speakers

5.3.3 Only Italian native speakers

A second possibility is that Italian native speakers use different cues than non-native speakers. In this experiment the nine hearings in which the defendant was not born in Italy were discarded. The remaining dataset consisted of 2,177 utterances, of which 625 (28.71 %) were false. The majority baseline was therefore 71.29 %. According to the Monte Carlo simulation, with p < .001 the accuracy baseline was 62.56 %, whereas the baselines for precision and recall were 35.52 and 34.48 % respectively. Accuracy, precision and recall for the heuristic baseline were 64.22, 37.93 and 40.64 % respectively.

The mean accuracy of the models, trained with a 26-fold cross-validation, was 70.12 %, with standard deviation σ = 7.99. This accuracy is not higher than the majority baseline, but exceeds the heuristic one by about 6 percentage points. Table 13 summarizes the results of each fold.

Table 13 Only Italian native speakers

5.3.4 Only speakers over 30 years of age

In the last experiment, only defendants over 30 years old were considered. This age was chosen as a trade-off between the need not to remove too many hearings from DeCour and the need to divide the subjects into meaningful groups. Because the Courts where the data were collected deal with crimes committed by people over 18 years old, focusing on subjects over 30 years of age meant discarding 14 hearings. The remaining dataset consisted of 1,917 utterances, of which 597 (31.14 %) were false. The majority baseline was therefore 68.86 %. According to a Monte Carlo simulation, the accuracy threshold with p < .001 was 60.93 %; the precision baseline was 38.36 % and the recall baseline 36.99 %. For the heuristic baseline, again with p < .001, the accuracy threshold was 63.90 %, precision 41.12 % and recall 44.39 %.

After the 21-fold cross-validation, the mean accuracy in the classification task was 70.28 %, with standard deviation σ = 7.83. Table 14 shows the overall performance of the model, which is better than both the majority and heuristic thresholds.

Table 14 Only speakers over 30 years of age

6 Discussion

6.1 Predicting deception

Our first result is that all the models proposed in Sect. 4 can identify deceptive statements with an accuracy of around 70 %, which is well above chance and much better than the simple heuristic algorithm. This suggests that the type of methods proposed by Pennebaker et al. (2001) and Strapparava and Mihalcea (2009), relying only on automatically extracted features, can be applied with a certain degree of success to identify deception even in real-life data collected in high-stakes situations. Not all models outperformed the majority baseline, but for every type of task at least one of the non-trivial models performed better than that tougher baseline by at least 1 percentage point. In the rest of this subsection we discuss in more detail what makes the task so hard and how performance could be improved.

6.1.1 The effect of size

The simplest way to improve the performance of a model whose learning curve has not yet plateaued is to increase the size of the corpus. Because DeCour is not very large, owing to the time it takes to collect the relevant data, the first analysis to carry out in investigating the possibility of better performance is simply to study the learning curve of our models.

The learning curve we studied is that of the model from our third experiment, in which surface features were selected through Information Gain, since this model achieved the highest mean accuracy among those tested in the first group of experiments employing all the data. The learning curve was computed by carrying out cross-validations using 1 hearing for testing and respectively 1, 5, 10, 15, 20, 25, 30 and 34 hearings for feature selection and training. The last experiment replicates the one taken as the reference point. The results are shown in Fig. 1.

Fig. 1 The learning curve
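The procedure just described can be sketched as follows; train_and_test() is a hypothetical helper standing for the whole feature-selection, training and testing pipeline, and the sampling scheme is our reading of the description above:

```python
import random

def train_and_test(train_hearings, test_hearing):
    """Hypothetical helper: select features on train_hearings, train the
    SVM, and return accuracy on test_hearing (stubbed here)."""
    raise NotImplementedError

def learning_curve(hearings, sizes=(1, 5, 10, 15, 20, 25, 30, 34),
                   n_repeats=35, seed=0):
    rng = random.Random(seed)
    curve = {}
    for size in sizes:
        scores = []
        for _ in range(n_repeats):
            test = rng.choice(hearings)
            pool = [h for h in hearings if h is not test]
            scores.append(train_and_test(rng.sample(pool, size), test))
        curve[size] = sum(scores) / len(scores)  # mean accuracy at this size
    return curve
```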

In previous deception detection experiments (Strapparava and Mihalcea 2009), a plateau was observed: beyond a certain training set size, the models’ performance no longer improves. In our case, however, no plateau is visible; on the contrary, the learning curve grows fairly regularly, suggesting that performance could still be improved by adding more data. The curve also shows that the accuracy of the models is higher than baselines such as the heuristic one even when just one hearing is used for training. The features selected from such single hearings are also very similar to those shown and discussed in the next subsections. This suggests that deceptive language is highly stereotyped, and that relatively few surface features are therefore sufficient to obtain results slightly better than chance.

6.1.2 Deception at the utterance level

This is no mean achievement, considering that the task our models have to perform is much more challenging than that attempted by, e.g., Pennebaker et al. (2001), who only attempted to classify full texts. In DeCour, 496 utterances out of 3,015 (16.45 %) are single-word utterances, and 70.44 % of DeCour consists of utterances no longer than 15 words. Figure 2 shows the distribution of utterance lengths in DeCour. But as discussed, e.g., in Fitzpatrick and Bachenko (2012), working at the level of the entire narrative identifies the liar, not the lie.

Fig. 2 The distribution of the lengths of the utterances in DeCour

The scenario we are working with may invite two types of criticism. On the one hand, the small amount of information present in the utterances may make them indistinguishable from each other. Some critics might therefore argue that the task is simply impossible; the best reply to this is to show that accuracy above chance can in fact be obtained even with relatively simple methods.

On the other hand, this very shortness of the utterances may be evidence that defendants use language in a way that is easily predictable given the ritual of hearings in Court. Because many of the questions addressed to the defendant are accusations, we may expect defendants to be most likely untruthful when denying them, and more likely to be sincere when positively asserting known facts. In other words, other critics might argue that the problem of deception detection in this type of context can be solved with fairly simple techniques. To some extent this is true: the simple algorithm we used as an additional baseline, based on the heuristic that defendants are most likely untruthful when they deny something, is always around 2 percentage points more accurate than chance. However, the fact that this baseline never exceeds an accuracy of 62–63 % suggests that the problem is not so simple.

There also seems to be a correlation between utterance length and classification accuracy, as can be seen from Fig. 3, which charts utterance length against classification accuracy in the experiment using surface features selected through Information Gain (Table 4). Clearly, the longer the utterances, the lower the accuracy.

Fig. 3 The relation between utterance length and classification accuracy

6.1.3 Uncertainty and noise

The models also perform better when applied to cleaner data. In the experiments in which uncertain utterances are excluded, the gap between the mean classification accuracy of our trained models and the heuristic baseline grows from about 6 to about 9 percentage points. As explained above, the class of uncertain utterances consists of (1) utterances which cannot have a truth value (e.g., questions) and (2) utterances whose truthfulness cannot be decided on the basis of the available evidence. This second group may therefore contain both false and true statements, which introduces some noise into the dataset; this in turn clearly affects both the training and the testing of the models (even though the uncertain utterances are not employed to identify the features of the models), making the classification task more difficult. The hypothesis that the class of uncertain utterances consists of a blend of false and true ones is supported by Fig. 4, which shows the distribution of the probabilities assigned by the classifier in the experiment in which we obtained the best results (surface features selected using Information Gain). If the probability that an utterance is false is > .5, the classifier treats it as false; otherwise, as not-false. Most of the utterances annotated as true in the corpus were given a probability of being false of less than .5; in fact, the great majority received a probability of less than .2. For utterances annotated as false, the classifier is less precise, but it does assign a probability of being false > .5 to many more of them. The probability distribution of uncertain utterances lies in the middle between these two cases; in particular, the number of utterances whose probability is .1 < p ≤ .2 is almost exactly halfway between the numbers for true and for false utterances. This suggests that the uncertain class does consist of a blend of true and false utterances, which creates some noise.

Fig. 4 The probabilities with which the utterances are classified as false or not-false, in each class of utterances

As already discussed, attempting to classify all the utterances of a hearing, while useful, does not necessarily reflect how our models would be used in a real-life scenario. In the scenarios we envisage, the models would not be used to classify amounts of data so large that they cannot be analyzed by humans directly. Every testimony in which lies have to be detected would previously have been examined by human analysts to identify utterances which need not be classified. These include statements such as questions, instructions, or greetings, which have no propositional value and therefore cannot be true or false, as well as statements whose truthfulness is already known and which therefore need not be classified either. We can therefore expect that in a practical situation several statements would be discarded, and the dataset would be more similar to the data used in the second set of experiments than to those used in the first.

6.1.4 Using more homogeneous data

The last round of experiments, run on subsets of DeCour, aimed to verify whether more homogeneous data, obtained by grouping defendants according to sex, native language and age, could lead to better classification performance. The results do not show a remarkable improvement in the effectiveness of the models: although accuracy rises slightly, the baselines shift upwards as well. Further analyses should be carried out to gain a better understanding of the relation between deceptive language and variables such as sex, age and native language.

6.1.5 Linguistically more sophisticated models

Other methods to enhance the models' effectiveness are also possible. One would be to use more linguistic information. For example, the texts could be parsed to collect syntactic features: there is some evidence that such features can improve performance in detecting deception (Feng et al. 2012). This syntactic information could be exploited using tree kernels, which have already been applied to forensic tasks with good results (Giannone et al. 2009) but have not yet been employed in deception detection.
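As an illustration of the kind of syntactic features meant here, dependency triples could be extracted as follows. This is only a sketch, assuming spaCy's Italian model it_core_news_sm as a stand-in for a full parsing pipeline; it is not part of the experiments reported above:

    import spacy  # assumes the Italian model it_core_news_sm is installed

    nlp = spacy.load("it_core_news_sm")

    def dependency_features(utterance):
        """Turn each dependency arc into a (head lemma, relation, dependent
        lemma) string: one simple way of exposing syntax to a classifier."""
        doc = nlp(utterance)
        return ["%s_%s_%s" % (t.head.lemma_, t.dep_, t.lemma_)
                for t in doc if t.dep_ != "ROOT"]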

Finally, according to Interpersonal Deception Theory (IDT; Buller and Burgoon 1996), speakers in conversations adapt their communication style to that of the interlocutor. Researchers working in other fields (not deception detection) have evaluated the degree to which people coordinate their speech in dyadic interactions (Ireland et al. 2011; Niederhoffer and Pennebaker 2002): their approach could be applied to feature selection in deception detection as well. (If the extra cognitive load caused by lying results in more stereotyped linguistic production, liars may make more use of the words just heard from the interlocutor, as these are readily available in memory.)
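A crude version of such an alignment feature could simply measure how much of an answer's vocabulary was just used by the interlocutor. This is an illustrative sketch, not a feature used in the experiments reported here:

    def alignment_score(answer_tokens, question_tokens):
        """Fraction of tokens in the answer already used by the interlocutor
        in the immediately preceding question."""
        question = {t.lower() for t in question_tokens}
        answer = [t.lower() for t in answer_tokens]
        return sum(t in question for t in answer) / len(answer) if answer else 0.0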

6.2 The language of deception: the case of Italian

A second fruitful way to analyze our results and compare them with Newman et al. (2003) and other studies such as De Paulo et al. (2003) and Hauch et al. (2012) concerns the findings regarding the language used in lies and how it differs from the language of truthful statements. Newman, Pennebaker and colleagues concluded that (lab-produced) deceptive language is characterized by fewer first-person singular pronouns, fewer third-person pronouns, more negative emotion words, fewer exclusive words, and more motion verbs. These findings were confirmed by most subsequent research on English. Newman, Pennebaker et al. also wondered about the cross-linguistic validity of these claims; in particular, they observed that the claims about first-person singular pronouns ought to be tested in Romance languages, which in many cases do not require a subject pronoun with first-person verbs. The data used in this study allow us, first of all, to revisit these claims in a real, high-stakes setting; and second, to examine the claim about first-person pronouns, as Italian is one of the Romance languages with the property mentioned by Newman et al. (2003).

Most informative n-grams The Information Gain measure of n-grams of lemmas employed in the previously discussed experiments can also be used to gain some insight into the most typical stylistic traits of deceptive statements. As the goal in this case was to capture the profile of deceptive language rather than to train models for the classification task, the whole of DeCour was used to compute Information Gain. Only true and false utterances were considered, discarding the more confusing class of uncertain utterances. Table 15 shows the 50 most informative n-grams in DeCour. One obvious observation is that expressions of negation or assertion, such as "yes" or "not", and statements of remembering or not remembering, of knowing or not knowing, are particularly revealing for deception detection.

Table 15 Information gain of n-grams of lemmas in DeCour
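For a binary (present/absent) n-gram feature f, Information Gain is IG(f) = H(class) - H(class|f). A minimal sketch of the computation (hypothetical helper names; not the code used in the study):

    import math
    from collections import Counter

    def entropy(counts):
        total = sum(counts)
        return -sum(c / total * math.log2(c / total) for c in counts if c)

    def information_gain(utterances):
        """utterances: list of (ngram_set, label) pairs, label in {'true', 'false'}.
        Returns the IG of each n-gram treated as a binary feature."""
        n = len(utterances)
        n_false = sum(label == "false" for _, label in utterances)
        h_class = entropy([n_false, n - n_false])
        seen = Counter()        # utterances containing each n-gram
        seen_false = Counter()  # ... restricted to false utterances
        for grams, label in utterances:
            for g in grams:
                seen[g] += 1
                if label == "false":
                    seen_false[g] += 1
        ig = {}
        for g, present in seen.items():
            absent = n - present
            h = (present / n) * entropy([seen_false[g], present - seen_false[g]])
            h += (absent / n) * entropy([n_false - seen_false[g],
                                         absent - (n_false - seen_false[g])])
            ig[g] = h_class - h
        return ig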

However, Information Gain does not indicate whether a feature is more typical of true or of false utterances. Table 16 contains the lists of the twenty most frequent tokens, bigrams, trigrams and tetragrams of true and false utterances (Footnote 10). The affirmative answer "yes" is highly frequent in true statements, but it does not appear among the 20 most frequent unigrams in deceptive utterances, where it is found only 111 times.

Table 16 N-grams frequency in DeCour

Conversely, in deceptive statements negative adverbs such as "no" and "not" are more frequent than in true ones, despite the fact that DeCour contains only 945 false utterances against 1202 true ones. Phrases expressing not remembering or not knowing are present in both classes of utterances, but their use is definitely more common in the false ones. This difference becomes even clearer once we take into account the fact that many frequent bigrams are actually part of frequent trigrams. For example, of the 69 occurrences of the bigram "mi ricordo"/"I remember" found in the false utterances, 49 were actually produced as part of the trigram "non mi ricordo"/"I do not remember". This means that in DeCour the distribution of "mi ricordo" (not included in longer trigrams) and "non mi ricordo" among true and false utterances is as in the following table:

                        True utterances    False utterances
    mi ricordo                 16                  20
    non mi ricordo             20                  49

The table clearly suggests that these phrases are used differently in true and false utterances, although a χ2 test carried out on this table yields p = .1715, which is not statistically significant (mainly because of the small size of the data). As already discussed in Section 4.2.4, this difference is to be expected in a hearing scenario, where a defendant's lies will most likely take the form of denials of true accusations.
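For reference, the test is easy to re-run; a sketch assuming SciPy, whose chi2_contingency applies Yates' continuity correction to 2×2 tables by default and reproduces the reported value:

    from scipy.stats import chi2_contingency

    # Rows: "mi ricordo", "non mi ricordo"; columns: true, false utterances
    table = [[16, 20],
             [20, 49]]
    chi2, p, dof, expected = chi2_contingency(table)
    print(p)  # approximately 0.17, in line with the reported p = .1715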

Association between lies and LIWC categories Newman et al. (2003) summarize their main findings about deceptive language as follows:

liars tend to tell stories that are less complex, less self-relevant, and more characterized by negativity.

Thanks to the Italian version of LIWC that we used to compute lexical features, we can verify whether these findings by Newman et al. about deceptive language still hold for our data. Tables 17 and 18 show the LIWC dimensions with the greatest differences in value between true and false utterances, ordered by the difference between the mean normalized frequencies of each LIWC dimension in the two classes.

Table 17 LIWC categories most prevalent in true utterances
Table 18 LIWC categories most prevalent in false utterances
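The ranking behind Tables 17 and 18 can be reproduced by comparing class means of length-normalized LIWC category frequencies; a sketch assuming NumPy and one frequency matrix per class:

    import numpy as np

    def liwc_ranking(freq_true, freq_false, categories):
        """freq_true/freq_false: (n_utterances x n_categories) matrices of
        LIWC frequencies normalized by utterance length. Returns categories
        sorted from most true-leaning to most false-leaning."""
        diff = np.asarray(freq_true).mean(axis=0) - np.asarray(freq_false).mean(axis=0)
        order = np.argsort(diff)[::-1]
        return [(categories[i], float(diff[i])) for i in order]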

Our conclusions (see the previous subsection) about the prevalence of positive statements among true utterances and of negative statements among false ones are confirmed by the fact that the greatest differences between false and true utterances lie in the LIWC dimensions Certainty (with a substantially higher value among true utterances) and Negation (vice versa). Confirming the results of Newman et al. (2003), false utterances have higher values for the dimensions Negative Emotions, Exclusive and Discrepancy. They also have higher values for content expressing cognitive/perceptual processes (LIWC dimensions such as Cognitive processes, Perceptual processes, Introspection, Hearing and Seeing). True utterances have higher values for references to time, space and concrete topics (dimensions such as Home, Leisure, Work, School, Friends) and for positive feelings.

A particularly interesting finding is the greater presence among false utterances of personal pronouns in general, and of first-person pronouns in particular, as shown by the greater use of "io"/"I" and "me"/"me". This finding is interesting because it goes against the recurrent finding in the literature that people, when they lie, tend to use other-references rather than self-references (Hancock et al. 2008; Newman et al. 2003).

In Italian, as in other Romance languages, subject pronouns can be omitted. Therefore, if it is a general truth that deceptive language tends to contain fewer self-references than truthful language, one would expect to find an even lower rate of self-references in Italian than in English. The distribution of pronouns in DeCour would therefore seem to be inconsistent with the previous literature.

To investigate this discrepancy in depth, DeCour was parsed using the online Tanl Italian Parser service offered by the University of Pisa (Footnote 11). Minor errors in the output of the parser were then hand-corrected using simple heuristic rules, in particular to fix the problems caused by the ambiguity of "ricordo" (which can be either a noun, "memory", or the first person of the verb "to remember") and of "sono" (which, without a pronoun, can be either the first person singular or the third person plural of the verb "to be"). The statistics about first-person pronouns in false and true utterances obtained in this way, including dropped first-person pronouns, are summarized in Table 19.

Table 19 First person pronouns and verbs in true and false utterances
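A rough way to collect such statistics with off-the-shelf tools is sketched below, assuming spaCy's it_core_news_sm model as a stand-in for the Tanl parser actually used (counts would differ, and the hand-correction step is omitted):

    import spacy

    nlp = spacy.load("it_core_news_sm")

    def first_person_counts(utterance):
        """Count first-person singular verbs in an utterance and how many of
        them carry an explicit subject pronoun ('io')."""
        doc = nlp(utterance)
        verbs = [t for t in doc
                 if t.pos_ in ("VERB", "AUX")
                 and t.morph.get("Person") == ["1"]
                 and t.morph.get("Number") == ["Sing"]]
        with_pronoun = sum(
            any(c.dep_ == "nsubj" and c.lemma_.lower() == "io" for c in v.children)
            for v in verbs)
        return len(verbs), with_pronoun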

As shown by the table, only 37.2 % of first-person verbs in Italian have a subject pronoun. But irrespective of whether we count the percentage of first-person pronouns per utterance or the percentage of first-person verbs, the reduced number of self-references found by Newman et al. (2003) and others in deceptive language is not confirmed by our data.

We found, however, one construction in which the difference between deceptive and truthful language does lie in a greater use of first-person pronouns in true statements. The common statement "I do not remember" can be expressed in Italian either as "[io] non ricordo" or in the so-called 'reflexive form' "[io] non mi ricordo". In general the reflexive form is the more common in Italian, and this preference is maintained in true utterances, where the reflexive form "non mi ricordo" is used more than three times as often as the non-reflexive form "non ricordo", which occurs only 6 times. With false utterances, however, the preference is reversed: "non ricordo" is used 68 times, as opposed to 49 times for "non mi ricordo". The situation is summarized in the following table.

                        True utterances    False utterances
    non mi ricordo             20                  49
    non ricordo                 6                  68

A χ2 test on this contingency table gives p = 0.0025, which is highly significant. In other words, the bigram "non ricordo" is an excellent clue of deception.
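The same check as above can be run on this second table (again a sketch assuming SciPy's Yates-corrected 2×2 test, which matches the reported figure):

    from scipy.stats import chi2_contingency

    # Rows: "non mi ricordo", "non ricordo"; columns: true, false utterances
    table = [[20, 49],
             [6, 68]]
    chi2, p, dof, expected = chi2_contingency(table)
    print(p)  # approximately 0.0025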

7 Conclusions

To our knowledge, this is the first study in Italian to report on the use of deceptive language in a high-stakes setting such as a court, and one of the first studies anywhere. As concerns automatic deception detection, the results of our models suggest that stylometric techniques such as those previously used for lab-produced deceptive language can be effective even when the deceptive communication takes place in natural settings, and even when classifying short texts such as single utterances as opposed to full hearings. Furthermore, we found that comparable results can be obtained using lexical features and surface features, opening the way to the application of such techniques to languages for which LIWC is not available. But whereas our models achieve high precision at identifying false statements, recall needs to be improved; that is, additional markers of deception have to be discovered.

Regarding deceptive language, we could verify many of the findings of previous studies concerning deception markers, which suggests that the cognitive elaboration of deception is basically the same in English and Italian, regardless of the speakers' native language. We could not, however, find support for one of the recurrent findings of the previous literature, namely the reduced use of self-referring expressions in deceptive language; in fact, we found the opposite.