1 Introduction

Identity security has become a highly sensitive issue. In particular, the increase in terrorist attacks over recent decades has made it necessary to recognize people who declare a false identity. Migrants from the Middle East entering Europe or the USA usually do not have any documents, and their personal details are frequently self-declared. A high number of terrorists giving false identities are believed to be hidden among them. Because terrorists move across countries using fake identities, identity detection is now a major target in anti-terrorism [1].

Deception is cognitively more complex than truth telling, and this higher complexity is reflected in longer reaction times (RTs) during a response [2]. In the literature, two RT-based memory detection techniques have been proposed to identify liars: the autobiographical Implicit Association Test (aIAT) [3] and the RT-based Concealed Information Test (RT-CIT) [4]. These techniques may also be used as tools for identity verification [5].

RT-based techniques have a number of advantages over traditional psychophysiological techniques for detecting deception, such as the polygraph [6]. First, RTs are not subject to strong individual and environmental variation, as physiological parameters are. Second, these techniques are inexpensive and suitable for large-scale use. However, they are not without limitations. Even though RTs are implicit measures, during an aIAT or CIT examination the lie detection purpose is explicit (overt detection of deception). Furthermore, RT-based techniques measure only response latency, so a liar has to control only this single parameter to falsify the evidence. Finally, these methods require prior knowledge of the information to be checked as true or false. In fact, both the aIAT and the CIT require that the true identity (or the true memory) be available, whereas in most real applications, such as the case of migrants, the true identity is unknown to the examiner. This limits the practical application of RT-based verification, even though its efficiency has been demonstrated.

Analysing movements during the response has already been shown to offer a series of advantages, since it captures the cognitive complexity of stimulus processing by recording a variety of indicators beyond reaction time. Recently, researchers have shown that kinematic analysis can be used as an implicit measure of the cognitive processes underlying a task [7]. Several authors [8–10] measured hand movements during on-screen choice tasks to understand the dynamics of a wide range of psychological processes. They described how a simple hand motion can reflect, in real time, the progress of the underlying cognitive processing. Hand-motor tracking can therefore provide a good trace of mental processes.

Because cognition is heavily involved in lying [11], it is reasonable to think that the analysis of hand movements can be a good implicit measure for studying the cognitive mechanisms involved. A first, pioneering study of kinematics as a signature of deception was presented in [12]. The authors compared motor trajectories while subjects were engaged in an instructed lie task. A visual cue instructed participants to respond truthfully or deceptively to the presented sentences. The authors used the Nintendo Wii Remote to record subjects’ responses. Deceptive responses could be distinguished from truthful ones on the basis of several parameters, including the motor onset time, the overall time required for responding, the trajectory of the movement, and kinematic parameters such as velocity and acceleration. In Ref. [13], the authors studied mouse movements in an online insurance fraud context. Their results suggest that liars showed an increase in movement distance, a decrease in movement speed, an increase in response time, and a greater number of left clicks. In [14], the authors proposed a pilot study to identify guilty individuals involved in specific insider threat activities. They analysed mouse movements while participants completed an online survey similar to the Concealed Information Test (CIT). Their preliminary observations showed that guilty insiders had a different motion pattern when answering the key item than when answering non-key items, which was indicative of increased cognitive activity while deceiving.

Concerning identity verification, several studies in the literature have also applied mouse movement analysis to biometric user authentication or identification in the field of informatics [15]. However, these methods necessarily require a certain level of knowledge about the alleged user, as well as user-specific training, in order to recognize him/her or the liar.

The goal of this work is to present a new identity check technique based on the recording of mouse movements, which identifies false self-declared identities without knowing anything about the real identity of the declarant. The method is a memory detection technique that investigates whether the memory for the declared personal information is truthful or untruthful, using implicit measures derived from mouse movements. In other words, we employed kinematic analysis of mouse movements to identify implicit signatures of deception.

2 Method

2.1 Participants

Forty participants were recruited at the Department of General Psychology of Padova University. The sample consisted of 17 males and 23 females. Their average age was M = 25 (SD = 4.6), and their average education level was M = 17 (SD = 1.8). Because they use the mouse differently, left-handed subjects were excluded. All subjects gave informed consent before the experiment.

2.2 Experimental Procedure

The experiment was implemented using MouseTracker software [16].

During the experimental procedure, participants were asked to answer 32 yes or no questions about their personal information by clicking with the mouse on the correct response alternative on the computer screen (Fig. 1 shows an example). Twenty participants answered truthfully, while the other twenty were instructed to lie about their identity according to a false autobiographical profile.

Fig. 1 Example of the task presented to the subjects

The 20 liars were instructed to learn a false identity from a standard Italian identity card, to which a photo of the subject was attached and which contained false personal data (an example ID card is reported in the Appendix). After the learning phase, participants recalled the information on the ID card twice. Between the two recalls, they performed a mathematical distractor task. The truth tellers, on the other hand, performed a mathematical task and reviewed their real autobiographical data only once before starting the experiment.

During the experimental task, three different kinds of questions were presented to participants in random order. Expected questions: 6 questions about information explicitly rehearsed by liars during the learning and recall phases (e.g. date of birth). Unexpected questions: 6 questions related to the identity but not explicitly rehearsed before the experiment (e.g. age). Liars can derive this information by reasoning from the learned data; for example, someone who knows they were born in April 1989 can conclude that they are 26 years old. Control questions: 4 questions about personal characteristics that could not be denied, namely evident physical traits that cannot be hidden from the examiner, such as gender.

Each of these 16 questions was presented twice, once requiring a yes response and once requiring a no response, for a total of 32 questions. In this way, truth tellers answered all questions sincerely, whereas liars lied on expected and unexpected questions that required a yes response. Liars’ answers to control questions, and to expected and unexpected questions that required a no response, were truthful. Example questions are reported in the Appendix.
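
The design described above can be enumerated in a few lines of Python. This is only an illustrative sketch: the dictionary keys, group labels and function names are hypothetical and do not come from MouseTracker or from the original experimental materials.

```python
from itertools import product

# Number of questions per category, as stated in the text.
CATEGORIES = {"expected": 6, "unexpected": 6, "control": 4}


def build_trials():
    """Return one dict per trial: every question appears once per required response."""
    trials = []
    for category, count in CATEGORIES.items():
        for index, required in product(range(count), ("yes", "no")):
            trials.append({"category": category, "index": index, "required": required})
    return trials


def is_lie(trial, group):
    """Only liars lie, and only on identity questions that require a 'yes' response."""
    return (
        group == "liar"
        and trial["category"] in ("expected", "unexpected")
        and trial["required"] == "yes"
    )


trials = build_trials()
n_lies_per_liar = sum(is_lie(t, "liar") for t in trials)
print(len(trials), n_lies_per_liar)
```

Under these assumptions each participant sees 32 trials, and a liar gives a deceptive answer on 12 of them.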

To view each question, participants had to click on the Start button in the lower part of the screen. They then chose their answer by clicking on one of the response boxes positioned in the two top corners of the screen.

2.3 Data Analysis

For each answer, the motor response was recorded using MouseTracker software. Because recorded trajectories differ in length, each motor response was time-normalized to permit averaging and comparison across multiple trials. By default, MouseTracker performs time normalization into 101 time steps using linear interpolation. Thus, each trajectory had 101 time steps, and each time step had a corresponding x and y coordinate.
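
As an illustration of this step, the following Python sketch resamples a raw trajectory to 101 equally spaced time points by linear interpolation. It is not MouseTracker's own code: the function name `time_normalize` and the `(t, x, y)` sample format are assumptions made for the example.

```python
def time_normalize(samples, n_steps=101):
    """Resample (t, x, y) samples to n_steps equally spaced time points.

    Assumes samples are sorted by strictly increasing time t (hypothetical format).
    Returns a list of n_steps (x, y) pairs.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t_start, t_end = samples[0][0], samples[-1][0]
    resampled = []
    j = 0  # index of the segment [samples[j], samples[j + 1]] that contains the target time
    for i in range(n_steps):
        target = t_start + (t_end - t_start) * i / (n_steps - 1)
        while j < len(samples) - 2 and samples[j + 1][0] < target:
            j += 1
        (t0, x0, y0), (t1, x1, y1) = samples[j], samples[j + 1]
        # Interpolation weight, clamped to guard against floating-point overshoot.
        w = min(max((target - t0) / (t1 - t0), 0.0), 1.0)
        resampled.append((x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
    return resampled


# Toy trajectory with three raw samples (time in ms, arbitrary coordinates).
raw = [(0, 0.0, 0.0), (50, 0.0, 0.5), (200, 1.0, 1.5)]
normalized = time_normalize(raw)
print(len(normalized))
```

After normalization every trial has the same number of points, so a given time step (for example step 18) refers to the same relative moment of the response in every trajectory.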

We analysed signatures of deception in terms of the shape of each movement trajectory and the location of the trajectory over time. We also quantified trajectory properties in terms of velocity, stability, and direction. In particular, we collected the following features:

  • Number of errors: number of incorrect answers.

  • Initiation time: time between the appearance of the question and the beginning of the mouse movement.

  • Reaction time: time from the appearance of the question to the click on the answer box.

  • Maximum deviation: the largest perpendicular deviation between the actual trajectory and its idealized trajectory.

  • Area under the curve: the geometric area between the actual trajectory and the idealized trajectory.

  • Maximum deviation time: time to reach the point of maximum deviation.

  • x-flip: number of reversals of direction along the x-axis.

  • y-flip: number of reversals of direction along the y-axis.

  • X, Y coordinates over time: position of the mouse along each axis over time. Specifically, we chose to use only Y coordinate data in the analysis, for time steps 18, 29 and 30, because a preliminary visual analysis showed that the two experimental groups clearly differed only in the position of the mouse along the y-axis over time.

  • Acceleration over time: acceleration of the mouse along each axis over time. We calculated acceleration along the y-axis for time intervals 18–29 and 29–30.

These features were used to train different machine learning classifiers on all subject responses.
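
Some of the spatial features listed above can be sketched in Python as follows. This is a hedged illustration, not MouseTracker's implementation: it assumes that the idealized trajectory is the straight line from the first to the last point, reports maximum deviation as an unsigned distance, and computes the area as the net area enclosed by the trajectory and that line (shoelace formula). All function names are hypothetical.

```python
import math


def max_deviation(points):
    """Largest perpendicular distance from the straight start-to-end line (assumption)."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("start and end points coincide")
    return max(abs(dx * (y - y0) - dy * (x - x0)) / length for x, y in points)


def area_between(points):
    """Net area enclosed by the trajectory and the straight line, via the shoelace formula."""
    closed = list(points) + [points[0]]  # closing edge is the idealized straight line
    twice_area = sum(xa * yb - xb * ya for (xa, ya), (xb, yb) in zip(closed, closed[1:]))
    return abs(twice_area) / 2


def count_flips(values):
    """Number of direction reversals along one axis (steps of zero length are ignored)."""
    flips, previous = 0, 0
    for a, b in zip(values, values[1:]):
        step = b - a
        if step == 0:
            continue
        direction = 1 if step > 0 else -1
        if previous and direction != previous:
            flips += 1
        previous = direction
    return flips


# Toy trajectory: straight up, then straight right.
path = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(max_deviation(path), area_between(path), count_flips([x for x, _ in path]))
```

For a trajectory that crosses the idealized line, the shoelace formula lets the areas on the two sides cancel; whether that matches the software's definition should be checked before using this sketch on real data.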

3 Results

A preliminary visual analysis showed a clear difference in kinematic responses between liars and truth tellers. The average maximum deviation (MD) is 0.33 (SD = 0.42) for liars and 0.15 (SD = 0.28) for truth tellers. The area under the curve (AUC) is larger for liars (AUC = 0.6, SD = 1.1) than for truth tellers (AUC = 0.22, SD = 0.5). Figure 2 shows the average trajectory for liars and truth tellers, together with the position of the mouse along the x- and y-axes during the response time for the two groups. In addition, liars made a greater number of errors than truth tellers (error frequency for liars = 84, error frequency for truth tellers = 7).

Fig. 2 Left: average trajectory for liars (in red) and truth tellers (in green). Right: position of the mouse along the x- and y-axes during the response time for liars (in red) and truth tellers (in green)

As a first step, a Random Forest classifier was evaluated with 10-fold cross-validation on the whole dataset (1280 stimuli). We obtained an accuracy of around 90 % in classifying a single answer as truth or lie.

Second, the efficiency of the classification was evaluated on 10 test sets of 10 subjects each. The 10 test sets were extracted from the original dataset of 40 subjects according to the following rules: each test set contained 5 liars and 5 truth tellers, and each subject appeared in the 10 test sets a minimum of 2 and a maximum of 3 times. In this way, each test set included 320 stimuli gathered from 10 subjects. Data from the remaining 30 subjects (960 stimuli) were used to build the model. Results for the classification of each training-test pair are reported in Table 1. Using a Simple Logistic classifier, we obtained an overall accuracy of 73.56 % in classifying a single stimulus as truthful or untruthful. From the classification of single answers, we derived the classification of each subject as liar or truth teller by majority vote. At the level of the single participant, we reached an average of 7.8/10 participants correctly classified as truth tellers or liars, with a minimum of 7/10 and a maximum of 9/10.
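
The step from answer-level predictions to a subject-level decision can be sketched as below. The sketch assumes that each classified answer counts as one vote for the group of the respondent; the source does not say how ties were handled, so ties are simply reported as "tie" here. The labels and function names are hypothetical.

```python
from collections import Counter, defaultdict


def classify_subjects(answer_predictions):
    """Aggregate (subject_id, predicted_group) pairs into one decision per subject."""
    votes = defaultdict(Counter)
    for subject_id, label in answer_predictions:
        votes[subject_id][label] += 1
    decisions = {}
    for subject_id, counts in votes.items():
        if counts["liar"] > counts["truth_teller"]:
            decisions[subject_id] = "liar"
        elif counts["truth_teller"] > counts["liar"]:
            decisions[subject_id] = "truth_teller"
        else:
            decisions[subject_id] = "tie"  # tie rule not given in the source (assumption)
    return decisions


def subject_accuracy(decisions, true_groups):
    """Fraction of subjects whose majority-vote decision matches their true group."""
    correct = sum(decisions[s] == true_groups[s] for s in decisions)
    return correct / len(decisions)


# Toy predictions for three made-up subjects.
predictions = [
    ("s1", "liar"), ("s1", "liar"), ("s1", "truth_teller"),
    ("s2", "truth_teller"), ("s2", "truth_teller"),
    ("s3", "liar"), ("s3", "truth_teller"),
]
decisions = classify_subjects(predictions)
print(decisions)
```

Keeping all answers of a subject on the same side of the training-test split, as in the procedure above, is what makes the subject-level figure meaningful: the model never sees the tested person's own movements during training.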

Table 1 Simple Logistic classifier accuracy for the 10 training sets and 10 test sets including all stimuli

We repeated this procedure, training and testing the classifier only on the answers in which truth tellers responded sincerely and liars responded deceptively (expected and unexpected questions that required a yes response). Ten training sets and 10 test sets were created as above. This time, each test set included 120 stimuli gathered from 10 participants, and each training set included 360 stimuli obtained from 30 participants. Classification results are shown in Table 2. Training a Simple Logistic classifier, we obtained an accuracy of around 78 % in classifying the stimuli as sincere or deceitful, which corresponds to 8.8/10 participants correctly classified as truth tellers or liars, with a range from 8/10 to 10/10.

Table 2 Simple Logistic classifier accuracy for the 10 training sets and 10 test sets including as stimuli only expected and unexpected questions that required a yes response

Finally, we built a model whose training set also included the answers of all 40 participants in which both liars and truth tellers answered truthfully (control questions, and expected and unexpected questions that required a no response). Each test set included the answers of 10 participants to expected and unexpected questions that required a yes response. Therefore, each of the 10 training sets included 1160 stimuli, and each test set included 120 stimuli. Using a Random Forest classifier, we reached an average accuracy of 88.08 % in the classification of single answers as truthful or untruthful, which corresponds overall to 9.7/10 participants correctly classified as truth tellers or liars, with a minimum of 8/10 and a maximum of 10/10. These data are reported in Table 3.
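
The stimulus counts quoted for the three evaluation schemes follow directly from the design (40 subjects, 32 answers each, 12 of which are yes-responses to expected or unexpected questions). The short Python check below reproduces them; the variable names are illustrative only.

```python
N_SUBJECTS = 40
N_TEST_SUBJECTS = 10
N_TRAIN_SUBJECTS = N_SUBJECTS - N_TEST_SUBJECTS

ANSWERS_PER_SUBJECT = 32  # 16 questions, each presented twice
YES_IDENTITY_ANSWERS = 12  # expected + unexpected questions requiring "yes"
ALWAYS_TRUTHFUL_ANSWERS = ANSWERS_PER_SUBJECT - YES_IDENTITY_ANSWERS  # 20

# Whole dataset, used for the 10-fold cross-validation.
total = N_SUBJECTS * ANSWERS_PER_SUBJECT

# Scheme 1: all stimuli, split by subject.
test_all = N_TEST_SUBJECTS * ANSWERS_PER_SUBJECT
train_all = N_TRAIN_SUBJECTS * ANSWERS_PER_SUBJECT

# Scheme 2: only yes-responses to expected and unexpected questions.
test_yes = N_TEST_SUBJECTS * YES_IDENTITY_ANSWERS
train_yes = N_TRAIN_SUBJECTS * YES_IDENTITY_ANSWERS

# Scheme 3: scheme-2 training data plus the always-truthful answers of all 40 subjects.
train_mixed = train_yes + N_SUBJECTS * ALWAYS_TRUTHFUL_ANSWERS

print(total, test_all, train_all, test_yes, train_yes, train_mixed)
```

The last figure can also be read as the whole dataset minus one test set (1280 - 120), which confirms that the test answers themselves are never part of the training data in the third scheme.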

Table 3 Random Forest classifier accuracy for the 10 training sets and 10 test sets, with control questions and expected and unexpected questions of all 40 participants included in the training set

4 Conclusions

This work shows that, using mouse movement analysis, it is possible to reach a high rate of accuracy in detecting the veracity of self-declared identities. Classification accuracy is high not only at the level of the single subject, but also at the level of the single answer.

As already shown in the literature [17], the presence of unexpected questions induces an increase in cognitive load in liars. This increase was reflected in a different pattern of kinematic response, which became distinguishable from the truth teller pattern.

We believe that this approach can have several advantages over the RT-based techniques mentioned above. First, kinematic indices can be recorded covertly while the user interacts with the device, without the user being aware of what is being observed. Second, the detection of these indices is inexpensive and easy to carry out, and does not require any equipment beyond what the subject is already using to interact with the computer. The method is potentially very well suited to detecting deception on the web as well, because it does not require the presence of an examiner and can be run automatically, quickly and anywhere. Furthermore, recording responses through mouse kinematics, rather than through simple RTs from a key press on a keyboard, has a number of advantages. While a button press only allows RT to be recorded, a mouse makes it possible to capture the cognitive processes and their complexity by recording a large set of indicators beyond reaction time. For this reason, the technique is also promising in terms of resistance to countermeasures. The large number of movement characteristics seems, in principle, difficult to control entirely through efficient countermeasures to lie detection.