1 Introduction

Lying and verbal deception are often accompanied by involuntary nonlinguistic cues and physiological responses, and most current lie detection approaches rely on the observation of such signals (Ekman and Friesen 1974).

EEG signals have been shown to reflect the state of the brain’s cognitive and attention mechanisms, and attempts at deception are associated with increased event-related potential (ERP) values in the brain (Langleben et al. 2002; Matsuda et al. 2013). Therefore, while traditional lie detection is heavily dependent on the subjective interpretation of the human detector, lie detection based on EEG signals is more deterministic and objective. However, truth-telling/lying is not always a binary state, and some statements may contain elements of both in different proportions. Fuzzy theory can be used to create a continuum from discrete binary data. Fuzzy membership functions are used to quantify fuzzy intervals and to assign a linguistic class and membership degree to each observation, which together form the basis of a lie detection model. Fuzzy reasoning techniques can thus be used to estimate the likelihood of lying from various brain wave characteristics (Pedrycz and Chen 2015a, b).

This study uses the minimize entropy principle approach (MEPA) to develop a lie detection model based on brain wave characteristics. The honesty of a given statement is characterized as truth, suspected lie, or lie based on different brain wave characteristics. The model is integrated in a smartphone-based application. EEG signals are collected through electrodes adhered to the subject’s head and transmitted to the mobile device by Bluetooth. The data are then uploaded to a cloud platform for processing, and the results are sent back to the phone for presentation in a graphical interface, allowing the surveyor to clearly assess the truthfulness of the subject’s statement. Continuous brain wave data collection and feedback serve to improve system accuracy.

2 Related works

This section introduces previous studies related to lie detection and polygraph testing.

2.1 Lying and polygraph test

The polygraph seeks to detect whether the subject is “lying,” defined as deliberately misleading through false statements or omission (Krapohl and Sturm 2002). Ekman et al. (2005) noted that lying is a conscious decision. Polygraph examinations use a variety of detection approaches to determine the validity of subject statements, including heart rate observation, oral and nonoral behavior analysis, and the use of physiological evaluations such as respiration pattern, cardiovascular measures, electroencephalography (EEG), functional magnetic resonance imaging (fMRI), and electrodermal response (EDR) (Wang et al. 2016; Ito et al. 2011).

Studies of lying behavior focus on four major aspects: verbal behavior, nonverbal behavior, paralinguistic behavior and physiological reaction (Inbau et al. 2013). Verbal behavior refers to aspects of how the subject responds to a prompt, including vocabulary selection and response content. Untruthful responses tend to feature different vocabulary from truthful responses, along with generalized and ambiguous terms. Nonverbal behavior refers to physical gestures and movements made while speaking, along with facial expressions. Under normal circumstances, subjects feel increased psychological pressure while lying, which can produce involuntary physical responses in an attempt to release inner anxiety and cognitive conflict.

Paralinguistic behaviors include accent, pitch, volume, speech rate, modulation, and fluency. While lying, subjects tend to raise the pitch of their voice and take increasingly long pauses before responding (Inbau et al. 2013). The observation of these responses is subject to the surveyor’s subjective interpretation, which may result in false-positive or false-negative results.

2.2 EEG and lie detection

Lie detection using oral interrogation, written questionnaires, and scales is open to subjective interpretation and is time-intensive to conduct. Recent years have seen the increasing use of physiological bio-signals for immediate lie detection, including electrocardiogram (ECG), galvanic skin response (GSR), and functional MRI (fMRI) (Ito et al. 2011). Physiological responses are processed using data mining, neural networks, and statistical analysis methods to assess statement veracity.

Deception is associated with strong parietal P3b suppression and significant frontal negativity (N400). A visual misinformation paradigm was combined with event-related potentials (ERPs) to examine the impact of deception and misinformation on ERP components previously associated with deception and memory processes (Meek et al. 2013). Chun et al. (2014) used the guilty knowledge test (GKT) to investigate neural correlates of deception using EEG frequency analysis, and found that truth and lies produce significant variation in three frequency bands: alpha (8.1–13.9 Hz), high beta (20.5–30.0 Hz), and gamma (30.3–40.0 Hz).

Linear discriminant analysis (LDA) has been used as a classification method to identify lying from the P300 frequency and skin pattern (EOG) at three EEG electrode points (Fz, Cz, Pz), achieving an identification accuracy rate of 86% (Abootalebi et al. 2009; Rosenfeld et al. 2012; Winograd and Rosenfeld 2014). Palmatier and Rovner (2015) compared the performance of the comparison question test (CQT) and concealed information test (CIT) and found that the CIT provides higher accuracy.

2.3 Fuzzy reasoning technologies

Fuzzy systems have been widely used in automatic control, pattern recognition, decision analysis and time signal processing (Antonelli et al. 2016; Liu et al. 2016). In terms of practical applications, fuzzy theory is mostly used to develop fuzzy expert systems for use in medical diagnosis, recommendation systems, e-learning, decision support and other related fields (Chen 1994; Chen and Kao 2013). The core of the fuzzy expert system is the inference engine and the fuzzy rule base (Chen 1995).

The key to establishing a fuzzy system is obtaining the relevant fuzzy rules through various training algorithms (Chen 1996). Based on fuzzy rules, the fuzzy inference engine uses different fuzzy reasoning techniques to simulate human consciousness (Chen and Li 2011). Previous studies have found that fuzzy expert systems can represent vague logical implications and relationships. Such expert systems are suitable for use in situations where general theories and methods are lacking, where data are inaccurate or information is incomplete, or where human expertise is lacking for diagnosis, interpretation, monitoring, forecasting, planning, and task design (Chen and Chung 2006; Wang and Chen 2008; Chen et al. 2009a, b). Therefore, we use fuzzy theory to develop a mobile EEG application based on a reasoning system for lie detection.

3 Proposed method

This section introduces the methods used in this study to develop the proposed lie detection model.

3.1 Fast Fourier transform

The fast Fourier transform (FFT) is an efficient algorithm for computing the discrete Fourier transform (DFT). It recursively decomposes a DFT of length N = N_1 N_2 into smaller DFTs of lengths N_1 and N_2, which speeds up the calculation of the Fourier transformation. The DFT is given in Eq. (1), where N is the sequence length, which has to be a power of 2 for the radix-2 FFT; W = e^{-j2π/N} is the twiddle factor, so that the exponential term equals W^{kn}; and x_n is the original signal:

$$X_{k} = \sum\limits_{n = 0}^{N - 1} {x_{n} e^{{ - j\frac{2\pi }{N}kn}} } ,\quad k = 0,1,2,3, \ldots ,N - 1.$$
(1)
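The recursive decomposition can be illustrated with a minimal sketch, assuming the radix-2 case (N_1 = 2, N_2 = N/2). The function names `dft` and `fft` and the 8-sample toy signal are illustrative choices, not part of the system described here. The sketch evaluates Eq. (1) directly and checks that the recursive version gives the same spectrum.

```python
import cmath


def dft(x):
    """Direct evaluation of Eq. (1); needs on the order of N^2 operations."""
    n_total = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_total)
            for n in range(n_total))
        for k in range(n_total)
    ]


def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    n_total = len(x)
    if n_total == 0 or n_total & (n_total - 1):
        raise ValueError("length must be a power of 2")
    if n_total == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # DFT of the even-indexed samples
    odd = fft(x[1::2])    # DFT of the odd-indexed samples
    half = n_total // 2
    out = [0j] * n_total
    for k in range(half):
        # Twiddle factor W^k = exp(-j * 2 * pi * k / N) applied to the odd half
        twiddled = cmath.exp(-2j * cmath.pi * k / n_total) * odd[k]
        out[k] = even[k] + twiddled
        out[k + half] = even[k] - twiddled
    return out


# Toy signal (assumption): two cycles of a sine wave over 8 samples
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
spectrum = fft(signal)
max_error = max(abs(a - b) for a, b in zip(spectrum, dft(signal)))
```

For this toy signal the energy is concentrated in bins k = 2 and k = 6, and `max_error` stays at the level of floating-point rounding.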

3.2 Minimize entropy principle approach

The minimize entropy principle approach (MEPA) uses entropy to establish a fuzzy membership function. Entropy expresses how uniformly energy is distributed in space; by analogy, information entropy expresses how uniformly information is distributed, with greater uniformity corresponding to greater entropy. MEPA minimizes the information entropy (clutter) within each data interval: it determines the threshold line that divides the data with the smallest entropy, and the resulting thresholds are then used to construct the fuzzy membership function (Ross 2009).

Equations (2), (3), and (4) are applied to calculate the entropy value obtained for the threshold line x_i created at each point where data categories differ. The threshold line with the minimum entropy is the partition line that divides the whole data section into two segments, segment f and segment g:

$$E(x) = f(x)E_{f} (x) + g(x)E_{g} (x),$$
(2)
$$E_{f} (x) = - [f_{1} (x)\ln f_{1} (x) + f_{2} (x)\ln f_{2} (x)],$$
(3)
$$E_{g} (x) = - [g_{1} (x)\ln g_{1} (x) + g_{2} (x)\ln g_{2} (x)],$$
(4)
$$f(x) + g(x) = 1.$$
(5)

f_k(x) and g_k(x), respectively, represent the conditional probability of class k samples in segment f and segment g. The value of x that gives the minimum entropy is the optimum threshold value. Estimated f_k(x), g_k(x), f(x), and g(x) values are calculated as follows:

$$f_{k} (x) = \frac{{n_{k} (x) + 1}}{n(x) + 1},$$
(6)
$$g_{k} (x) = \frac{{N_{k} (x) + 1}}{N(x) + 1},$$
(7)
$$f(x) = \frac{n(x)}{n},$$
(8)
$$g(x) = 1 - f(x),$$
(9)

n_k(x) is the number of class k samples located in segment f, n(x) is the total number of samples located in segment f, N_k(x) is the number of class k samples located in segment g, N(x) is the total number of samples located in segment g, and n is the total number of samples.
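The threshold search can be sketched as follows, assuming two classes and candidate thresholds at the midpoints between adjacent sorted samples whose classes differ. The function names and the nine toy samples are hypothetical and are not data from this study. The segment entropy uses the usual p ln p form together with the smoothed probability estimates of Eqs. (6) and (7).

```python
import math


def segment_entropy(class_counts, total):
    """Segment entropy (Eqs. 3/4) from smoothed class probabilities (Eqs. 6/7)."""
    entropy = 0.0
    for count in class_counts:
        p = (count + 1) / (total + 1)
        entropy -= p * math.log(p)
    return entropy


def mepa_threshold(values, labels):
    """Return (threshold, entropy) minimizing E(x) of Eq. (2), or None."""
    samples = sorted(zip(values, labels))
    n = len(samples)
    classes = sorted(set(labels))
    best = None
    for i in range(1, n):
        left_value, left_label = samples[i - 1]
        right_value, right_label = samples[i]
        if left_label == right_label or left_value == right_value:
            continue  # candidate lines only where the class changes
        x = (left_value + right_value) / 2
        f_labels = [label for _, label in samples[:i]]   # segment f
        g_labels = [label for _, label in samples[i:]]   # segment g
        f_weight = len(f_labels) / n                     # Eq. (8)
        g_weight = 1 - f_weight                          # Eq. (9)
        e_f = segment_entropy([f_labels.count(c) for c in classes], len(f_labels))
        e_g = segment_entropy([g_labels.count(c) for c in classes], len(g_labels))
        total_entropy = f_weight * e_f + g_weight * e_g  # Eq. (2)
        if best is None or total_entropy < best[1]:
            best = (x, total_entropy)
    return best


# Hypothetical feature values with one overlapping "lie" sample at 16
values = [8, 10, 12, 14, 18, 16, 24, 26, 28]
labels = ["truth"] * 5 + ["lie"] * 4
threshold, entropy = mepa_threshold(values, labels)
```

On this toy data the candidate lines are 15, 17, and 21, and the line at 15 gives the smallest entropy. Applying the same search again inside each resulting segment would yield further thresholds, which is one way to obtain a three-level partition.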

4 Experiment

This section describes the construction and validation of the proposed lie detection model.

4.1 Participants and materials

Ten graduate students were recruited, including 5 men and 5 women aged between 22 and 24 years old (average age 23.4 years with a standard deviation of 1.2). To minimize experimental error, subjects met the following criteria: (1) willing to participate in research and complete the consent form, (2) able to communicate with the surveyor and answer questions, and (3) no hearing impairment.

This study used the Mindwave Mobile brain–computer interface (BCI) device developed by NeuroSky Inc. to collect EEG signals by means of a single-channel dry electrode sensor attached to the prefrontal region (frontal pole, Fp1). The reference electrode and circuit grounding system were connected to the left earlobe. The device has a frequency range between 0.5 and 50 Hz and a sampling frequency of 512 Hz. It uses the Bluetooth protocol to communicate with mobile devices (Android and iOS), notebooks, and desktops (Windows and Mac). The NeuroSky SDK was used to develop the system.

4.2 Experimental scenario design

An experimental scenario was designed to collect brain wave data in a laboratory setting. The card test polygraph method was used to prompt truthful and deceptive subject responses. We first explained the experimental goals and process to the subjects. The Mindwave Mobile EEG measurement headset was then fitted to the subjects and tested. The card test took about 3 min to complete for each subject. Surveyors prepared nine cards, each with a number, and these were held by the subject such that the surveyor could not see the numbers. The subject then looked at each card and told the surveyor a number, which may or may not have been the actual number printed on the card. During testing, the subject kept track of which cards he/she identified incorrectly (i.e., “lies”). Each subject was given 15 s to consider their response and 5 s to make the response. Brain wave patterns were recorded throughout the test. After the experiment, the subjects told the surveyor which cards were truthfully identified, and which were lies. Finally, the truth/lie state of each card was linked to its corresponding time interval of the collected brain waves. From each subject, we obtained 9 brain wave segments, for a total of 90 segments.

4.3 Lie detection model construction

The lie detection model is established using brainwave characteristic transformation and fuzzy rules.

4.3.1 Data sampling and filtering

Brainwaves are weak physiological signals and are easily disrupted by other signals, making it necessary to use a filter to remove corrupted signals. In this study, brainwave data were passed through a Butterworth filter, and four records were found to exhibit serious interference and were thus excluded to produce a total of 63 honest segments and 23 lie segments.

4.3.2 Feature extraction

Previous studies have shown that different types of brainwaves reflect various physiological states. Decreased alpha waves and increased beta waves are significantly related to workload and attention, respectively. Some studies have also shown that alpha wave amplitude is reduced when the subject is focused on mental arithmetic tasks (Chun et al. 2014). The Cooley–Tukey FFT is used to obtain the various frequency bands from the time-domain brainwave signal.

This study focused on the gamma, theta, alpha, beta, and the total frequency bands. Alpha and beta waves were further subdivided into lower alpha (LAlpha), high alpha (HAlpha), lower beta (LBeta), and high beta (HBeta) for in-depth exploration. The maximum, minimum, average and standard deviation (SD) of power values in each frequency band were calculated using power spectral density (PSD) as the characteristic for classification, producing a total of 36 characteristics, as shown in Table 1.

Table 1 Features description

This study used analysis of variance (ANOVA) to determine the relative importance of the characteristics, identifying five key characteristics for lie detection: Total.mean, HAlpha.sd, Beta.mean, MBeta.max, and Gamma.mean.
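The band statistics described in this subsection can be sketched as follows. Several details are assumptions rather than values reported here: the band edges in `BANDS`, the periodogram as the PSD estimate, the sample (n − 1) standard deviation, and the synthetic 1 s test segment. Only four bands are shown for brevity, and feature names follow the Band.statistic pattern used above.

```python
import cmath
import math
import statistics

SAMPLING_RATE = 512  # Hz, the sampling frequency reported for the headset

# Band edges in Hz are assumptions; the text does not list them.
BANDS = {"Theta": (4, 8), "Alpha": (8, 13), "Beta": (13, 30), "Gamma": (30, 50)}


def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def band_features(segment, fs=SAMPLING_RATE):
    """Max, min, mean, and SD of periodogram power within each band."""
    n = len(segment)
    spectrum = fft(segment)
    # Periodogram estimate of the PSD for the non-negative frequencies
    psd = [abs(spectrum[k]) ** 2 / (fs * n) for k in range(n // 2 + 1)]
    freqs = [k * fs / n for k in range(n // 2 + 1)]
    features = {}
    for name, (low, high) in BANDS.items():
        powers = [p for f, p in zip(freqs, psd) if low <= f < high]
        features[name + ".max"] = max(powers)
        features[name + ".min"] = min(powers)
        features[name + ".mean"] = statistics.mean(powers)
        features[name + ".sd"] = statistics.stdev(powers)
    return features


# Synthetic 1 s segment: strong 20 Hz (beta) plus weak 10 Hz (alpha) component
segment = [
    math.sin(2 * math.pi * 20 * t / SAMPLING_RATE)
    + 0.2 * math.sin(2 * math.pi * 10 * t / SAMPLING_RATE)
    for t in range(SAMPLING_RATE)
]
features = band_features(segment)
```

With a 512-sample segment the frequency resolution is 1 Hz, so both test tones fall exactly on a bin and `features["Beta.max"]` exceeds `features["Alpha.max"]`, as expected for a beta-dominated signal.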

4.3.3 Establishing fuzzy rules

MEPA is used to segment the EEG data according to information entropy. Triangular and trapezoidal fuzzy functions were used to establish the membership function of each EEG characteristic, as shown in Appendix I. Based on the different characteristics and states, the corresponding fuzzy linguistic value and membership degree can be obtained. Each EEG characteristic constructs its respective membership functions and fuzzy semantics to distinguish among three levels: truth, suspected lie, and lie. For example, as shown in Fig. 2, a Beta.mean of 15.201 or less indicates truth, while 15.201–25.807 indicates a suspected lie, and 25.807 or above indicates a lie. The fuzzy rules are shown in Eqs. (1)–(3). The fuzzy rule set is obtained from the fuzzy functions of each EEG characteristic.

Fig. 1 Mindwave Mobile (NeuroSky)

People tend to think before answering questions, resulting in a slight decrease in alpha wave activity and a slight increase in beta wave activity. Excessive beta activity suggests the subject is lying, while excessive relaxation suggests a high degree of deliberation and raises the possibility of lying. Thus, the likely veracity of the subject’s response to questions may be assessed by observing the degree of change in brain waves based on fuzzy membership functions. Moreover, the conditions of the fuzzy rules base can be adjusted according to the severity of the lie.
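As a minimal sketch of how such membership functions turn a feature value into a linguistic class, the example below uses the two Beta.mean thresholds quoted above (15.201 and 25.807). The actual function shapes are those of Appendix I and are not reproduced here, so the trapezoidal shapes, the transition half-width `OVERLAP`, and all function names are assumptions chosen so that neighboring classes cross at the thresholds.

```python
BETA_MEAN_LOW = 15.201    # truth / suspected-lie threshold from the text
BETA_MEAN_HIGH = 25.807   # suspected-lie / lie threshold from the text
OVERLAP = 2.0             # hypothetical half-width of each fuzzy transition


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], 0 outside (a, d), linear between."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def beta_mean_memberships(x):
    """Membership degree of a Beta.mean value in each linguistic class."""
    lo, hi, w = BETA_MEAN_LOW, BETA_MEAN_HIGH, OVERLAP
    inf = float("inf")
    return {
        "truth": trapezoid(x, -inf, -inf, lo - w, lo + w),         # left shoulder
        "suspected lie": trapezoid(x, lo - w, lo + w, hi - w, hi + w),
        "lie": trapezoid(x, hi - w, hi + w, inf, inf),              # right shoulder
    }


def classify(x):
    """Linguistic class with the highest membership degree."""
    memberships = beta_mean_memberships(x)
    return max(memberships, key=memberships.get)


example = beta_mean_memberships(16.0)
```

A value of 16.0 lies just above the lower threshold, so it belongs partly to "truth" and mostly to "suspected lie". This graded overlap is what allows the rule base to report a degree of honesty rather than a hard label.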

4.4 Model evaluation

Independent of the training data, two subjects wearing the EEG sensor were situated in a quiet room and performed the lie test for 13 min. After the end of the experiment, the subjects revealed their “lie” responses, and these were compared against the assessment provided by the system test.

According to Table 2, for subject 1 the proportion of lying in the polygraph test was 21.33%, the proportion of probable lies was 60.33%, and the proportion of honesty was 18.33%. The polygraph result was consistent with the subject’s responses. For subject 2, the proportion of lying in the polygraph test was 32.66%, while probable lies accounted for 12.33% and honesty for 55%. In this case the polygraph result indicated honesty, which was not consistent with the subject’s responses. This discrepancy may be explained by psychological factors, as subject 1 reported feeling guilty while lying, but subject 2 did not.

Table 2 Detection results

Later, 12 other subjects were recruited for card polygraph testing, and 91 new EEG records (66 truth and 25 lie) were collected to evaluate the proposed lie detection model and to compare its performance against other data mining methods, including Decision Tree (C4.5), Multilayer Perceptron (MLP) Neural Network, Naïve Bayes (NB), Bayes Net (BN), and support vector machines (LibSVM), using WEKA v3.7.11. Table 3 shows that our proposed model generally outperforms the other data mining methods in terms of recall, F-measure, and accuracy, thus providing good results for lie detection. Although LibSVM has the highest accuracy, our approach can also analyze the degree of honesty based on changes in brain waves.

Table 3 Lie detection testing results
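For reference, the metrics named above can be computed from a confusion matrix as in the sketch below, with "lie" treated as the positive class. The counts are hypothetical: they are chosen only to add up to 66 truth and 25 lie records, and they are not the results reported in Table 3.

```python
def evaluation_metrics(tp, fn, fp, tn):
    """Recall, precision, F-measure, and accuracy from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts (NOT from Table 3): 25 lie records and 66 truth records
metrics = evaluation_metrics(tp=20, fn=5, fp=6, tn=60)
```

Because the test set is imbalanced (66 truth versus 25 lie), accuracy alone can look high for a classifier that rarely predicts "lie", which is why recall and F-measure are reported alongside it.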

5 Prototype system

Figure 3 illustrates the proposed lie detection system architecture in three parts: parameter capture, polygraph system, and database, each with different functions as detailed below:

Fig. 2 Fuzzy membership function of Beta.mean

  1. Parameter capture: The EEG sensor captures brain wave data, which are transmitted by Bluetooth to the mobile device, which then uploads the data to the server.

  2. Feature transformation module: The brainwave signals are filtered for noise, and values are assigned to the various EEG characteristics for transfer to the detection module.

  3. Detection and alert module: Based on the fuzzy rules, the fuzzy reasoning produces results which are displayed on the mobile device screen for real-time lie detection.

  4. Database: Database records include personal basic information, raw brain wave data, lie detection rules and test results. The test results are also stored in the database for additional management and analysis (Fig. 3).

Fig. 3 Prototype lie detection system architecture

Once the brainwave sensors are in place and the subject has logged in, he/she presses the link button to activate the sensors, allowing the system to receive brain wave signals for display as a line chart. Brain wave data are transmitted to the server for signal processing and are classified using the rule base. The subject then presses the polygraph analysis button to display a pie chart showing the distribution of brain waves in various bands, providing an indication of the likelihood that the subject is lying. In addition, the detailed record function provides a more detailed picture of the brain wave levels and distributions over time.

6 Conclusion

Traditional polygraph techniques are time-consuming, expensive, and inconvenient to use, and provide unreliable results. Many studies have shown that EEG signals can reflect the cognitive and attention mechanisms of the brain and can be used to develop a quick and objective lie detection tool. However, few previous studies have focused on the correlation of EEG bands and lie detection and no unique EEG bands or characteristics have been identified as being specifically correlated to lying, limiting practical applications. This study uses spectral analysis techniques to identify EEG bands and characteristics useful for lie detection and constructs a real-time lie detection model. MEPA is used to construct a fuzzy membership function by which changes to EEG characteristics can be used for lie detection. Moreover, a fuzzy rule set is built to support the development of an expert system.

This study integrates wearable EEG sensors with mobile devices to establish a real-time prototype lie detection system based on fuzzy rules. Using a typical smartphone, the system allows users to detect lies in real time, providing a quick and inexpensive alternative to traditional polygraph techniques. It also provides a useful reference for future research on the accuracy of polygraph tests, the outcomes of which can be affected by the subject’s physical and psychological fitness, along with environmental factors. Future research can take such factors into consideration to better assess variability in detection accuracy.