Deception, whether through omission or direct falsification, is a fundamental part of human social interaction (DePaulo et al. 2003). Deception may range from trivial, so-called “white lies”, to situations in which the consequences of detected deception are grave, especially those involving the law. Although many lies are uncovered through physical evidence or third-party information (Park et al. 2002), such evidence may be insufficient or even non-existent. In these contexts, lie detectors (such as law enforcement agents) may be forced to rely on other cues, such as nonverbal behavior, as indicators of whether a statement is true or false.

Despite beliefs to the contrary, research thus far has indicated that most individuals’ accuracy at lie detection is relatively poor. A recent meta-analysis by Bond and DePaulo (2006) found that participants achieved a significant but modest 54% accuracy when classifying items as truthful or deceptive. This is similar to the accuracy rates found in previous reviews (Vrij 2000; Zuckerman et al. 1981). Bond and DePaulo also replicated prior findings that individuals are better at identifying truthful statements than at identifying lies. However, although average performance in lie detection is unimpressive, research has also identified people who are good at this task. For example, Ekman and O’Sullivan (1991) found that Secret Service agents achieved 64% accuracy; similarly, Ekman et al. (1999) found accuracy rates of 68% to 73% among groups with a special interest in deception detection. Thus, although most people detect lies only moderately better than chance, some individuals do appear to be skilled in this respect. This suggests that behavioral cues to deception do exist, even though most individuals are not particularly good at identifying them.

It should be noted that there is no good evidence for any single, reliable cue to deception; Pinocchio’s nose, said to grow longer with each fib that Pinocchio told, is but a charming fairy tale. However, a number of possible context-dependent indicators have been identified (e.g., DePaulo et al. 2003; Vrij 2000; Zuckerman et al. 1981). In particular, it has been theorized that nonverbal cues to deception arise primarily from cognitive and emotional sources (Ekman 2001). Cognitive cues are attributed to the increased cognitive load experienced during deception (e.g., Vrij et al. 2001); emotional cues are attributed to the nonverbal leakage of lie-related emotions, such as fear or guilt, that are incongruent with the lie the deceiver is trying to present (Ekman 2001; Ekman and Friesen 1969, 1974). Such emotions may arise either from affective experiences associated with the content of the deception (e.g., if the person is lying about highly emotional experiences) or from the act of lying itself (Ekman 2001). Leakage is believed to occur through several nonverbal channels, namely facial expression, body movement and vocal tone (Ekman et al. 1991).

The relative importance of these channels has been the subject of some debate. It has been hypothesized that both the body (Ekman and Friesen 1969) and the voice (Scherer 1986) are as important as facial expression in deception detection. Indeed, it has been claimed that they may be even more important, both because deceivers pay less attention to controlling the body (Ekman and Friesen 1969) and because vocal cues are less controllable (Scherer 1986). However, research evidence does not seem to support these claims: deception has been associated with reduced body movement (Ekman et al. 1988; Vrij 2000) and with minimal affect-related pitch differences (Scherer 1986). There is, however, substantial research evidence that facial expressions are of prime importance in the leakage of suppressed affective reactions. Specifically, felt emotions automatically trigger facial affective displays for six or seven universal emotions (Ekman and Friesen 1975; Ekman et al. 1983). Although repression, masking or inhibition of these emotional displays is learned from a young age in accordance with cultural or individual display rules (Ekman 2001; Ekman and Friesen 1975), these automatic expressions will nonetheless often leak, either through so-called “reliable” facial muscles or through micro-expressions. Thus, despite the considerable control we exert over facial expression, the face may nonetheless be the best source of emotional leakage during deception (Ekman and Friesen 1969).

There are two principal sources of facial leakage of repressed affective expressions: subtle expressions and micro-expressions. Subtle facial expressions are fragments of otherwise suppressed or masked affect displays that use only part of the normally associated musculature. Arguably, these may involve reliable groups of facial muscles, notably around the forehead and brow, which are difficult to control voluntarily. As such, these muscle actions will be absent from fabricated emotional expressions and will not be inhibited by suppression or masking, thus providing valuable leakage cues to the target’s genuine affect (Ekman and Friesen 1975). However, it is important to note that not all expressions involve such reliable muscle groups (Ekman 2001). Furthermore, some people can manipulate these muscles voluntarily; for this subgroup they will not be reliable indicators of deception (Ekman 2001). The second form of leakage of suppressed facial emotion is the micro-expression: a full muscular expression of affect that occurs for only a brief instant, typically around 1/25th of a second (Ekman and Friesen 1969; Frank and Ekman 1997). Such is their brevity that most observers fail to perceive micro-expressions, although identification can be improved with training in facial expression coding (Ekman and Friesen 1974).

Because affective facial expressions are often insufficiently suppressed, both micro and subtle expressions have been theorized to provide valuable cues to deception. Indeed, two studies have shown that, among college students and professional lie catchers, the ability to identify micro-expressions correlates significantly with deception detection accuracy (Ekman and O’Sullivan 1991; Frank and Ekman 1997). There have, however, been no comparable studies of subtle expressions. It is thus of considerable interest to investigate whether skill in perceiving subtle expressions is correlated with deception detection accuracy, and to compare its importance in this respect with skill in perceiving micro-expressions. Whereas identifying micro-expressions requires perceiving a full expression of emotion displayed over a very short period, identifying subtle expressions requires extrapolating an emotional expression from partial cues. Thus, in this study, it was hypothesized that skill in identifying subtle expressions and skill in identifying micro-expressions would each be significantly and independently correlated with lie detection accuracy.

To test this hypothesis, an experiment was conducted based on a procedure developed by Ekman (Ekman and Friesen 1974; Ekman et al. 1988, 1991; O’Sullivan et al. 1988). In Ekman’s experiments, a group of nursing students (encoders) watch two films, one intended to be stressful (an unpleasant surgical operation) and the other neutral (a pleasant landscape scene). They are asked to describe their reactions to the landscapes honestly, but to describe the surgical operation scenes as if they too were pleasant landscape scenes. The lies are made high-stakes by informing participants that the ability to hide negative affective reactions is related to later job performance. Another group of participants (decoders) watch the videos and identify which film the encoder in each scene is watching. Because the experimenter knows this information, each response can be scored objectively as correct or incorrect.

It should be noted that in Ekman’s procedure, participants are asked to deceive only under conditions intended to arouse strong negative emotion (viewing an unpleasant surgical operation). This may make deception easier to detect, because the effects of intent to deceive are potentially confounded with those of emotional arousal. The current experiment addresses this concern by adding two conditions: deception about unemotional stimuli and truthful accounts of emotional stimuli. In effect, the experiment therefore tested two types of deception detection: emotional and unemotional. Because lies about strongly affective content arguably contain more cues to deception, it was hypothesized on the basis of Ekman and Friesen’s (1969, 1974) theory of nonverbal leakage that emotional lie detection would be more accurate than unemotional lie detection.

Baseline behavior was also included in the current study, because previous research has indicated that many deceptive behaviors depend on individual differences and must therefore be interpreted in the context of an individual’s normal behavior (Ekman and Friesen 1974; Vrij et al. 2000). Indeed, prior research has shown significantly increased accuracy when a baseline measure of honest behavior is included (O’Sullivan et al. 1988).

Method

Participants

Twenty participants (12 females and 8 males) were recruited through an advertisement offering them the chance to test their deceptive abilities and win £10 in the process. All were native English speakers and full- or part-time students at the University of York, aged between 18 and 45. These participants were the encoders, whose recorded accounts were used to construct the lie detection test.

A further 30 participants (11 males and 19 females) were recruited from the University of York and were entered into a draw for a £50 prize in return for their participation. All were native English speakers aged between 18 and 45; one was a university administrator and the others were students. These participants were the decoders who took the lie detection test once it was constructed.

Apparatus

Video Clips

Two sets of video clips (three emotional and five unemotional) were used. The unemotional clips were selected from promotional footage of Hawaiian landscapes; these were considered pleasant to watch but unlikely to generate more than a mild positive affective response. The emotional clips showed surgical operations that were expected to generate a strong negative emotional reaction in observers. The eight clips ranged in length from 28 to 31 s. On the basis of a pilot study, this was considered sufficient for generating deceptive communications. Furthermore, because research has indicated that most real-life lies are short (typically 6–61 s; Mann et al. 2002), this duration was deemed ecologically valid. The slight variation in clip length was considered unlikely to affect participant performance significantly. All eight clips were encoded with an MPEG-2 codec at 320 × 240 resolution to ensure compatibility. They were displayed full-screen in RealOne Player on a 17-inch CRT monitor placed approximately 1 m from the participant.

Micro Expression Training Tool (METT)

The METT was developed by Ekman (2002) for training in the recognition of micro-expressions of emotion. It comprises a calibration test, training exercises and a further post-training test. In the current study, only the calibration test was used as it was deemed an appropriate test of micro-expression recognition ability. It comprises 12 Japanese and Caucasian faces each displaying a micro-expression which participants have to identify from seven categories (Happy, Sad, Surprise, Contempt, Disgust, Fear and Anger). No time limit was set on decoders’ decisions; they controlled the speed at which they progressed through the test. Although there have been no validity studies of the METT, it was developed from expressions used in the Brief Affect Recognition Test (BART) which has been shown to have good reliability and validity (Matsumoto et al. 2000). It has also been used in previous deception detection research (Frank and Ekman 1997).

Subtle Expression Training Tool (SETT)

The SETT was published by Ekman (2002) as part of the same training package as the METT and is designed to improve recognition of subtle expressions of emotion. It has no equivalent of the METT’s calibration test; therefore, in this study the “practice” option was used to assess skill in perceiving subtle expressions. This comprises 37 expressions, all displayed by the same young Caucasian woman. As with the METT, decoders had to identify each expression from seven categories (Happy, Sad, Surprise, Contempt, Disgust, Fear and Anger). The expressions can be viewed at three speeds (slower, normal and faster); in this experiment they were displayed at the “normal” setting, although decoders controlled the rate at which they progressed between items. Unlike the METT calibration test, the SETT practice procedure provides immediate feedback on the accuracy of each selection. It also offers a “Try Again” option when participants make an error; in this study, however, decoders were instructed to ignore this option and continue to the next expression.

Procedure

In Stage 1 of the experiment, encoders sat facing the computer screen with the interviewer just off to their left. They were filmed by a wall-mounted camera above and behind the interviewer, so that only the encoder was captured. Each encoder was first instructed to give a brief (30 s) description of their hobbies or what they did in their free time; this was recorded to provide a baseline. The actual length of these descriptions ranged from 17 to 55 s, with a mean of 28.7 s; encoders were not held strictly to 30 s, because doing so was judged likely to produce an artificial account that would not serve as an appropriate baseline. Encoders then watched one clip from either the unemotional or the emotional set, in an order counterbalanced across participants, with instructions to deceive when describing the footage. Thus, if they saw the surgical procedures they were asked to describe them as if watching a Hawaiian beach scene; if they saw the Hawaiian beach scene, they were asked to describe it as if watching a series of surgical procedures. Encoders in both conditions were told that their performance would be judged by other participants in the study and that they would win £10 if their deception was successful. Encoders who had seen the surgical procedures then watched the Hawaiian beach scene, and vice versa; in this phase of the experiment they were asked to describe the clips truthfully. Thus, the deceptive condition always preceded the truthful condition, and the two clips used for each encoder were always taken from opposing sets. This was intended to avoid possible effects of practice or of emotional priming/habituation. Finally, encoders were debriefed as to the purpose of the experiment, and all received £10 for their participation.

The footage was then edited into short clips approximately 1 min in length. Each clip consisted of a baseline followed by either a deceptive or a truthful account, yielding both a “truth” and a “lie” item from each encoder. A pilot study was conducted to generate a difficulty rating for each item, based on the percentage of participants who identified it correctly. For the final test, items were selected to ensure a sufficient spread of difficulty; one item was taken from each encoder, and items were equally divided between truthful/deceptive and emotional/unemotional clips. In total, the deception detection task (DDT) comprised 20 items: 5 unemotional-truth, 5 unemotional-lie, 5 emotional-truth and 5 emotional-lie. The DDT could therefore be analysed in terms of two subscales: emotional lie detection and unemotional lie detection. Because clips were chosen according to item difficulty, the gender distribution of encoders was somewhat uneven. Of the 20 encoders (12 female, 8 male), the emotional-truth and emotional-lie conditions each contained 4 female encoders and 1 male, whereas the unemotional-truth and unemotional-lie conditions each contained 2 female and 3 male encoders. The selected clips were presented in random order, separated by a 10-s interval to allow decoders time to respond. Individual clips within the test ranged from 46 to 85 s, with a mean of 60 s, and the whole DDT lasted 23 min 15 s.
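The scoring implied by this 2 × 2 design can be sketched in a few lines of Python. The `Item` record, the `subscale_accuracy` function and the true/false response format are hypothetical illustrations, not the authors’ materials; the sketch assumes that each subscale is scored as the proportion of its ten items (five truths and five lies) that a decoder judges correctly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One DDT item (hypothetical representation, for illustration only)."""
    emotional: bool   # True = surgical footage, False = landscape footage
    deceptive: bool   # ground truth: True if the encoder was lying


def subscale_accuracy(items: list[Item], said_lie: list[bool]) -> dict[str, float]:
    """Proportion of correct truth/lie judgements on each DDT subscale.

    said_lie[i] is the decoder's judgement for items[i] (True = "lie").
    """
    if len(items) != len(said_lie):
        raise ValueError("need exactly one judgement per item")
    correct = {"emotional": 0, "unemotional": 0}
    total = {"emotional": 0, "unemotional": 0}
    for item, judged_lie in zip(items, said_lie):
        key = "emotional" if item.emotional else "unemotional"
        total[key] += 1
        correct[key] += int(judged_lie == item.deceptive)  # 1 if the judgement matches the truth
    return {key: correct[key] / total[key] for key in total}


# Five items per cell of the emotion x veracity design: 20 items in all.
ddt = [Item(emotional=e, deceptive=d)
       for e in (True, False) for d in (True, False) for _ in range(5)]

# A decoder who answers "truth" to every item (a pure response bias).
print(subscale_accuracy(ddt, [False] * len(ddt)))
```

Because truths and lies are balanced within each subscale, a blanket response bias such as always answering “truth” scores exactly 50% on both subscales. Scores well above or below 50% therefore reflect judgements that track, or systematically invert, the actual truth status of the items.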

In Stage 2 of the experiment, the decoders completed the DDT followed by the METT and the SETT. The tests were administered to participants on a one-to-one basis in a single session. They were informed that approximately the first 30 s of each item would consist of a truthful baseline followed by a roll-over effect and then either a deceptive or truthful description of a clip of video footage. Decoders were informed that after each item they would have 10 s in which to make a decision as to the target’s truthfulness. They were asked to report which of the following 6 cues led them to make their decision: “What they said”, “How they said it”, “Facial Expression”, “Body Language”, “Gut reaction” or “Guess”. They were also asked to indicate on a 7-point scale their degree of confidence in their decision and, as a control measure, their familiarity with the target.

After completing the DDT, decoders were administered the METT and the SETT in a counterbalanced order. This was always done after taking the DDT to avoid the possibility of practice effects or priming towards nonverbal cues. At the end of Stage 2, decoders were debriefed and informed of their scores on the DDT; the METT and SETT automatically give feedback on the screen after taking each test.

Results

Seven decoders indicated that they knew targets in the test at a “casual acquaintance” level or higher; their data were therefore excluded from the analysis, leaving 23 decoders.

Overall mean performance on the DDT was 50%, indicating that decoders performed no better than chance. However, analysis of the emotional and unemotional subscales tells a different story. Mean accuracy for emotional lie detection (64.35%) was significantly above chance, t(22) = 4.67, p < .01, d = .98. In contrast, mean accuracy for unemotional lie detection (36.09%) was significantly below chance, t(22) = −4.06, p < .01, d = .85. To test for effects of the two stimulus dimensions (emotional/unemotional and truth/lie), a two-way repeated-measures ANOVA was conducted (see Table 1). This showed a significant main effect of emotion, with accuracy for emotional items greater than that for unemotional items, F(1, 22) = 30.78, p < .01, ηp² = .58. However, there was no significant main effect of truth/lie, F(1, 22) = .26, p > .05, and no significant interaction between the two dimensions, F(1, 22) = .22, p > .05.

Table 1 Mean accuracy for emotional and unemotional lie detection (n = 23)
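The effect sizes reported above can be related to the test statistics with textbook formulas. For a one-sample t-test against a chance level of 0.5, t = (M − 0.5)/(SD/√n), so Cohen’s d = (M − 0.5)/SD = t/√n; for an effect with df_effect numerator degrees of freedom, partial η² = F·df_effect/(F·df_effect + df_error). The sketch below assumes these standard definitions; the function names are illustrative, and because the paper does not state which formula was used for d, small differences from the reported values are to be expected.

```python
import math
import statistics


def one_sample_t(scores: list[float], chance: float = 0.5) -> float:
    """t statistic for H0: mean accuracy equals the chance level."""
    n = len(scores)
    se = statistics.stdev(scores) / math.sqrt(n)  # sample SD (n - 1 denominator)
    return (statistics.mean(scores) - chance) / se


def d_from_one_sample_t(t: float, n: int) -> float:
    """Cohen's d for a one-sample test: (M - chance) / SD, which equals t / sqrt(n)."""
    return t / math.sqrt(n)


def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


n = 23  # decoders retained for analysis
print(round(d_from_one_sample_t(4.67, n), 2))       # emotional subscale: 0.97
print(round(d_from_one_sample_t(-4.06, n), 2))      # unemotional subscale: -0.85 (below chance)
print(round(partial_eta_squared(30.78, 1, 22), 2))  # main effect of emotion: 0.58
```

The negative sign for the unemotional subscale simply records that accuracy fell below, rather than above, the 50% chance level.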

To test whether sensitivity to micro and subtle expressions is related to lie detection accuracy, METT and SETT scores were correlated with overall DDT performance and with emotional and unemotional lie detection (see Table 2). Neither overall DDT performance nor unemotional lie detection was significantly correlated with the SETT or the METT, although a negative correlation between the SETT and unemotional lie detection just missed significance, r(21) = −.34, p < .10. However, emotional lie detection was significantly positively correlated with the SETT, r(21) = .46, p < .05, but not with the METT, r(21) = .20, p > .05.

Table 2 Correlation coefficients between METT and SETT scores and lie detection accuracy (n = 23)
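Whether a correlation of a given size reaches significance with 23 decoders can be checked by converting r to a t statistic with n − 2 = 21 degrees of freedom, t = r√(n − 2)/√(1 − r²). The sketch below assumes a two-tailed test at α = .05, takes the critical value t(21) ≈ 2.080 from standard t tables, and inverts the formula to find the smallest significant correlation; the helper names are illustrative.

```python
import math

# Two-tailed alpha = .05 critical value of t with 21 df, taken from standard t tables.
T_CRIT_21 = 2.080


def r_to_t(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r * r)


def critical_r(t_crit: float, df: int) -> float:
    """Smallest |r| that reaches significance, found by solving r_to_t for r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


n = 23  # decoders retained for analysis
for measure, r in [("SETT", 0.46), ("METT", 0.20)]:  # correlations with emotional lie detection
    t = r_to_t(r, n)
    print(measure, round(t, 2), "significant" if abs(t) > T_CRIT_21 else "not significant")
print("critical |r| at df = 21:", round(critical_r(T_CRIT_21, n - 2), 3))
```

With 21 degrees of freedom, any |r| above about .41 reaches significance at the two-tailed .05 level, which is consistent with the SETT correlation (.46) being significant and the METT correlation (.20) not.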

A series of correlations was also computed to investigate the relationship between deception detection accuracy and cue utilization (see Table 3). Neither overall DDT performance nor unemotional lie detection was significantly correlated with reported cue utilization. However, emotional lie detection was significantly positively correlated with reported use of facial expressions, r(21) = .52, p < .01.

Table 3 Means (and standard deviations) for cues and their correlations with lie detection accuracy (n = 23)

To investigate further which factors predict emotional lie detection, a backward stepwise multiple regression was performed, with emotional lie detection accuracy as the dependent variable and reported cue usage, SETT scores and METT scores as predictors. The final model retained only two variables, reported use of facial expression and SETT performance, which together accounted for 38% of the variance, F(2, 20) = 6.20, p < .01. However, the results of this regression analysis should be interpreted with caution, given the relatively small number of decoders in this study.
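The proportion of variance reported for this regression can be recovered from its F ratio: for an overall model test, R² = F·k/(F·k + df_residual), here with k = 2 retained predictors and df_residual = n − k − 1 = 20. The sketch also applies the standard adjusted R² formula, which shrinks R² according to the number of predictors relative to the sample size. This is offered only as an illustration of why caution is warranted with 23 decoders; the helper names are illustrative, and the adjustment does not correct for the stepwise selection itself.

```python
def r_squared_from_f(f: float, k: int, df_resid: int) -> float:
    """Model R^2 implied by an overall regression test F(k, df_resid)."""
    return (f * k) / (f * k + df_resid)


def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Shrink R^2 for k predictors and n cases (ignores any stepwise selection)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n, k = 23, 2  # decoders; predictors retained (facial-expression cue use, SETT score)
r2 = r_squared_from_f(6.20, k, n - k - 1)  # df_resid = 23 - 2 - 1 = 20
print(round(r2, 2))                            # 0.38
print(round(adjusted_r_squared(r2, n, k), 2))  # 0.32
```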

Discussion

Whereas many previous studies (Kraut 1980; Vrij 2000) have shown that observers perform no better than chance at lie detection, the decoders in this study achieved an average accuracy of 64%, but only when identifying lies or truths about emotional stimuli. In contrast, their accuracy for unemotional stimuli (36%) was significantly worse than chance. Similarly, whereas SETT scores were significantly positively correlated with emotional lie detection, they were negatively correlated with unemotional lie detection, although this latter correlation did not quite reach statistical significance. Reported cue usage also differed between the two subscales of the DDT: none of the reported cues was significantly related to unemotional lie detection, whereas reported use of facial expressions was significantly positively correlated with emotional lie detection.

These results provide good support for nonverbal leakage theory (Ekman and Friesen 1969, 1974), indicating that leaked emotions incongruent with the intended message can provide useful cues to deception. Although the difference between emotional and unemotional lie detection was predicted on the basis of leakage theory, its magnitude was nonetheless surprising. Notably, it has important implications for traditional deception detection research designs, in which truthful accounts have typically been given in response to “pleasant” stimuli and deceptive accounts in response to “unpleasant” footage (e.g., Ekman and Friesen 1974). Prior research may thus have been confounded by encoders’ differing affective experience in the deceptive and truthful conditions. The results of this study instead suggest that emotion type and/or emotion intensity may moderate lie detection accuracy. Even so, it must be acknowledged that the encoders’ emotional responses to the video clips were not directly assessed in the current study. Hence, further empirical investigation is required to test the hypothesized role of emotion and its differential effects on deception detection.

The significant positive correlations between emotional lie detection and both SETT performance and self-reported use of facial expression cues are also consistent with previous research showing that observers faced with high-stakes lies are more accurate when attending to nonverbal cues (DePaulo et al. 1983, 1988). Thus, the ability to read leaked emotions through subtle expressions appears to be associated with lie detection accuracy. However, this finding needs some qualification, given the negative correlation between SETT performance and unemotional lie detection. Although this correlation was significant only at the .10 level, it suggests that, depending on context, identification of subtle expressions of emotion may not always improve deception detection. This proposal has important practical implications. For example, training police officers to read subtle expressions will not necessarily increase their detection skill; indeed, under certain circumstances, their error rate might actually increase.

It was notable in this study that decoders performed significantly below chance in unemotional lie detection. If poor performance had resulted simply from a lack of observable emotional cues, chance accuracy might have been expected. An alternative explanation is therefore that decoders’ poor performance resulted from intentional misrepresentation on the part of the encoders. Arguably, when the emotional content of the stimuli is low, it is much easier for encoders to control and/or dissimulate their nonverbal behavior, and hence to actively mislead observers about the nature of the observed stimuli. Another possibility is that decoders misinterpreted negative affect arising from the act of lying itself, in this case specifically from the encoders’ detection apprehension and performance anxiety.

Unlike previous research (cf. Ekman and O’Sullivan 1991; Frank and Ekman 1997), the current study failed to find a significant correlation between the ability to identify micro-expressions and lie detection. One possibility is that this reflected a lack of visible micro-expressions within the DDT. To test this possibility, the first and third authors reviewed the DDT and identified micro-expressions in nine of the 20 test items. However, only three of these instances came from the emotional clips, so the non-significant correlation between the METT and emotional lie detection could well reflect a lack of visible micro-expressions. The remaining six micro-expressions occurred in unemotional items. Of these, five were judged to be actively misleading, suggesting deception when the encoders were giving truthful accounts (three instances) or truthfulness when the encoders were lying (the other two instances). Hence, the non-significant correlation between the METT and unemotional lie detection is unsurprising.

The disparity between the findings for the SETT and the METT clearly needs to be addressed in future research. Because encoders are motivated to deceive successfully, one possibility is that their facial expressions are not fully fledged displays of emotion. If so, they may resemble the partial expressions shown in the SETT more than the full (if brief), uninhibited displays of emotion shown in the METT. In this context, a detailed FACS analysis (Ekman et al. 2002) of the relative incidence of micro and subtle expressions would undoubtedly be useful. The exact relationship between the METT and the SETT also needs to be clarified in future research: to what extent do they measure different types of skill, and to what extent the same underlying skill? It would also be useful to investigate whether the findings reported here for the METT and the SETT replicate with a different sample. Nevertheless, the results of this study support the growing body of evidence that people who pay more attention to nonverbal cues, and perceive them more accurately, are also better detectors of deception.

A notable finding of this study was the striking difference between emotional and unemotional lie detection. This result could not have been demonstrated by previous experiments that compared deceptive responses to emotional stimuli only with truthful responses to unemotional stimuli only (e.g., Ekman and Friesen 1974). Hence, future experiments should be designed so as not to confound these two dimensions of deception and emotionality. Although accuracy rates for emotional lie detection in this study are comparable to those found when a baseline of truthful behavior is included (e.g., 60%; O’Sullivan et al. 1988), it should be noted that the baseline used here preceded both the emotional and the unemotional items. Hence, a baseline effect is unlikely to account for the higher accuracy in emotional lie detection. Overall, then, the results of this study demonstrate the importance of taking the type of lie into account when assessing observers’ decoding skills.

Nonetheless, there are a number of methodological concerns regarding the DDT. One is the potentially confounding influence of camera shyness, which may have affected behavior more in the deceptive condition, given that it always preceded the truthful condition. However, because both conditions were always preceded by the baseline, encoders should have been more at ease by the time they reached the deceptive phase of the experiment.

Another concern is participant motivation. Notably, DePaulo et al. (2003) found significantly stronger deception cues in studies using identity-relevant motivation than in those using instrumental motivation. In the experiment reported here, it is open to question whether monetary rewards were sufficient to generate high-stakes lies. However, DePaulo et al. (2003) also found significantly more deception cues in studies with instrumental motivation than in those with no stakes at all. Furthermore, because emotional and unemotional lies in this experiment carried the same monetary reward, motivational differences should not account for the observed difference in accuracy rates.

Finally, there are issues regarding the gender and ethnicity of the test participants. The METT contains a high number of Asian-American encoders, which might have affected participants’ scores on this test. However, this should not have been a problem, given both the multicultural nature of British society and the relatively high proportion of Chinese students at the University of York (UK), where this research was conducted. Of more concern is the somewhat uneven representation of male and female encoders within the DDT, both overall and in their distribution between the subscales. It is possible that at least some of the difference between the emotional and unemotional subscales reflects gender differences in encoder expressiveness. However, this should not have been the case, given that the items were selected specifically to ensure an even spread of difficulty throughout the test.

Lie detection is a complex process, and the results of this study show that some generalizations from previous research do not withstand closer scrutiny. Further investigation is required both of the nonverbal decoding skills that lead to better lie detection and of how these skills are affected by the type of lie under scrutiny. Such advances should not only enhance our theoretical understanding but also improve our practical ability to train lie detectors in this most difficult of skills.