Abstract
EEG-based classification of human emotional states remains challenging in human-computer interaction (HCI). Because it reflects brain activity directly, electroencephalography (EEG) has significant advantages for emotion classification research. This study recorded EEG signals from 12 participants while they watched four-minute emotional audio-visual movie clips. The six states studied were anger, excitement, fear, happiness, sadness, and a neutral state. We preprocessed the raw data to obtain clean signals and extracted the power spectrum using the Fast Fourier Transform (FFT) to generate feature vectors. We then conducted extensive experiments to validate the classification of human states using subject-independent machine-learning techniques. The LSTM network achieved a classification accuracy of 81.46% for the six emotional states, while the SVM achieved only 68.64%. When varying the number of layers in the deep learning models, the Bi-LSTM network with two layers achieved the highest accuracy of 82.89%. Experimental results on our collected dataset indicate that time-sequence models such as the LSTM outperform the other methods, and that long-duration EEG signals are crucial for detecting emotional state changes across subject types, including individual and cross-subject settings. In future research, we will investigate further evaluation experiments with deep learning models and propose novel EEG-based features to improve emotion classification accuracy.
1 Introduction
With the rapid development of human-computer interaction (HCI) technology, emotion classification is becoming increasingly important and has attracted attention from diverse fields, ranging from psychology to engineering [9]. For this reason, emotion classification and recognition have become a promising topic driving the development of Artificial Intelligence (AI) technologies such as the Human-Machine Interface (HMI), Human-Robot Interaction (HRI), and the Brain-Computer Interface (BCI).
Emotions play a central role in human perception, awareness, and social behavior. During interpersonal communication, we identify emotion using various cues, including facial and vocal expressions. These cues carry emotional characteristics related to a person's feelings, intentions, and thoughts. Nevertheless, recognizing the vast range of expressions and intensities, subjective emotions, and the unique characteristics of each individual can be challenging.
There are two main issues: how to recognize and how to classify emotions effectively. One of these is the classification of emotions. Psychologists have proposed two emotion theories to help with emotion classification: two-dimensional spaces and basic emotion models [10, 21]. Based on human characteristics and behavior, Ekman characterized emotions as primary emotions, including anger, fear, happiness, sadness, disgust, and surprise [10]. Russell introduced the two-dimensional emotional model based on valence and arousal [21]. The two dimensions of valence and arousal can be plotted on a two-dimensional plane, as shown in Fig. 1. Valence ranges from negative (unpleasant) to positive (pleasant) to quantify emotion, while arousal ranges from calm (inactive) to active (excited) to describe the intensity of emotion. These theories have been widely applied to emotion-related research, particularly Affective Computing (AC).
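Russell's two-dimensional model can be sketched as a simple quadrant lookup on the valence-arousal plane. The mapping of example emotions to quadrants in this illustrative helper follows common usage in the literature, not a definition from this paper:

```python
def quadrant(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) pair in [-1, 1] x [-1, 1] to one of the
    four quadrants of Russell's circumplex model."""
    if valence >= 0 and arousal >= 0:
        return "HAHV"  # high arousal, high valence: e.g., excitement
    if valence < 0 and arousal >= 0:
        return "HALV"  # high arousal, low valence: e.g., anger, fear
    if valence < 0:
        return "LALV"  # low arousal, low valence: e.g., sadness
    return "LAHV"      # low arousal, high valence: e.g., calm
```

This quadrant scheme is the same LALV/LAHV/HALV/HAHV labeling used by several of the studies reviewed in Section 2.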
Another issue is how to evaluate emotions objectively. In general, human emotions are assessed using the following methods: nonverbal behavior (facial expression, action recognition), speech behavior (text, conversation analysis), and physiological signals (EEG, ECG). Owing to its strong objectivity, resistance to being faked, and ease of acquisition and operation with a wearable headset, EEG has become increasingly widespread in emotion recognition, as it provides some of the most relevant information for detecting human emotions.
In our work, we concentrate on the electroencephalogram (EEG) as a physiological modality, since EEG signals are direct responses elicited by emotions. We also attempt to improve classification across inter-individual patterns to mitigate subject differences, since building a model that suits every individual is very challenging. The primary objective of this paper is to introduce a novel EEG emotion dataset reflecting a culture-specific language, Korean, and to perform emotion classification on the collected data. In contrast to previous studies, we define the emotional categories by combining Ekman's basic emotions with Russell's dimensional model. Consequently, we conduct extensive experiments on six human emotional states: anger, excitement, fear, happiness, sadness, and neutral. In addition, we examine the inter-subject approach, known as the subject-independent method.
The organization of this paper is as follows: Section 2 reviews previous work on EEG-based emotion classification. Section 3 describes the emotional stimuli and data collection procedure. Section 4 describes the preprocessing of the EEG data and the analysis for feature extraction. Section 5 describes the experimental setup for emotion classification using machine learning. Section 6 presents the results of the extensive experiments. Finally, Section 7 provides our conclusions and future research directions.
2 Related work
2.1 Emotion theory
Recent studies have attempted to analyze EEG data using machine learning and deep learning techniques. In general, EEG-based emotion recognition research is crucial in determining how to select emotion-related features, given that EEG signals contain many features, such as the frequency and time-frequency domains, for emotion classification [23]. Many researchers are attempting to improve classification accuracy by employing various machine learning techniques, in particular deep learning, as shown in Table 1 [4, 14,15,16, 18, 23, 24, 27].
Wang et al. [23] classified emotions using four classes from Russell's emotional model: high/low valence and arousal. For feature extraction, they computed the power spectrum with a short-time Fourier transform (STFT). Using an SVM classifier, they achieved average accuracies of 74.41% for valence and 73.61% for arousal. Xu et al. [24] performed emotion recognition with power spectrum features on four emotion classes: LALV (low arousal, low valence), LAHV (low arousal, high valence), HALV (high arousal, low valence), and HAHV (high arousal, high valence). They utilized the discrete wavelet transform (DWT) and an extreme learning machine (ELM) to achieve 79.37%. Javaid et al. [15] performed classification on four emotions (pleasant, sad, happy, and frustrated), using Higuchi's fractal dimension to extract features and an SVM as the classifier; the best accuracies achieved were 87.62% for arousal and 83.28% for valence. Recently, Chen et al. [4] selected and combined different types of EEG features, such as Lempel-Ziv complexity, wavelet detail coefficients, and average approximation entropy. The four computed features were fed into the LIBSVM classifier, and two-category classification was performed on arousal and valence, yielding average recognition rates of 74.88% for arousal and 82.63% for valence.
Among deep learning techniques, Zheng et al. [27] performed emotion recognition using differential entropy (DE) features with three emotions (positive, neutral, and negative), achieving a classification accuracy of 86.65% with a deep belief network (DBN) classifier. Nath et al. [18] performed emotion classification with two classes each for valence and arousal (high/low). They achieved a best accuracy of 87.5% using an SVM classifier; with their proposed LSTM model, the best classification accuracies for the valence and arousal scales were 94.69% and 93.13%, respectively.
Recently, Islam et al. [14] demonstrated a novel method focusing on low computational complexity in terms of memory requirements and computation time. They proposed a Convolutional Neural Network architecture for an EEG-based emotion recognition model and introduced Pearson's Correlation Coefficients (PCC) of the alpha, beta, and gamma sub-bands as EEG features. As a result, they achieved maximum accuracies of 78.22% for valence and 74.92% for arousal on the DEAP dataset.
Moon et al. [17] conducted the classification using the power spectral density and three emotional categories (Positive, Neutral, and Negative). They focused on adjusting the hyper-parameter corresponding to the number of convolutional layers and attained a maximum accuracy of 86.86% with five layers.
Most previous studies selected only two to four emotion categories, and the categories were often chosen from the Russell model unclearly or arbitrarily. To improve emotion classification accuracy, more detailed emotion categories that are specific and extensive are needed. To address this problem, we appropriately divide and extend the emotional categories.
3 Materials and methods
3.1 Institutional review board (IRB)
The experiment was conducted with approval from the Institutional Review Board (IRB) at Yonsei University (Approval No. 7001988-201807-HR-424-03).
3.2 Subjects
In the study, we recruited 12 right-handed subjects (6 females) aged between 21 and 33 (mean = 25.67, SD = 3.6). All subjects had normal vision, had not taken any medication, and had no history of neurological or psychiatric illness. Each participant was required to abstain from caffeine, tobacco, and alcohol for 24 hours before the experiment. They signed a written informed consent form before the experiment and received KRW 30,000 (about $26).
3.3 Materials
Various stimuli are utilized in emotion-related research, such as movie clips, music, and speech. Among them, movie clips are especially efficient and reliable at inducing emotions because they contain both audio and video. The stimuli were five four-minute movie clips, one per emotional category (anger, excitement, fear, happiness, and sadness), taken from the Korea Film Council (KOFIC). To obtain standardized elicitation stimuli, the clips were selected based on ratings from 160 participants using the self-assessment manikin (SAM) rating scale [3] (see Fig. 2).
In addition, we added a neutral-state clip, consisting of a black background with a cross mark in the center, as the baseline. Each emotional movie clip is described in Table 2 and Fig. 3.
3.4 Experimental procedure
The experiment was performed in a quiet room with controlled sound and light conditions. The participants were seated in comfortable chairs in front of the monitor. After the subjects correctly understood the instructions, they pressed any key on the keyboard to move to the next step. Before participating in the experiment, they were asked to complete the written consent form.
The experimental procedure comprised six trials. Each trial started with a 5-second dark screen with a cross in the center, followed by an emotional movie clip lasting 4 minutes. After each trial, participants completed a questionnaire rating how they felt about the movie clip. The trials were presented in random order; the experimental procedure is shown in Fig. 4.
3.5 EEG acquisition
An EEG captures and records the electrical activity of the brain along the scalp. EEG recordings were continuously collected using the Emotiv EPOC wireless headset (Emotiv Systems, Inc., San Francisco, CA) with 14 electrodes (AF3, AF4, F7, F8, F3, F4, FC5, FC6, T7, T8, P7, P8, O1, and O2) at a sampling rate of 128 Hz. The electrodes were placed according to the international 10-20 system, with common ground references at the left and right mastoids, as shown in Fig. 5. We applied a 60 Hz notch filter to reduce power-supply noise, and the impedance of each electrode was kept below 5 kΩ. In this work, we concentrated on the frontal and temporal lobes of the brain; according to [22, 26], emotional changes mostly affect the EEG signals in these regions.
4 Data analysis
4.1 EEG preprocessing
The EEGLAB toolbox, open-source software based on MATLAB [7], was used to preprocess the EEG signals. Before preprocessing, we excluded the first and last 30 seconds of each recording to retain the portion with the strongest emotional elicitation. The raw data were divided into six trials by emotional category. After segmentation, the data from 60 to 180 seconds were extracted to capture the most evoked emotional responses.
The raw EEG data were filtered from 0.1 to 50 Hz and converted to an average reference. The recorded EEG signals contain many artifacts, such as eye blinks, cardiac and muscular activity, and power-line noise. To remove these artifacts, we used independent component analysis (ICA), a popular technique for multi-sensor signal processing [2]. After artifact removal, we used the preprocessed EEG signals for classification. EEG data preprocessing consists of the following steps:
(1) Filtering: band-pass filter the EEG data between 0.1 and 50 Hz, and apply a notch filter to remove power-line noise

(2) Re-referencing: convert to the average reference

(3) Visual inspection

(4) Independent Component Analysis (ICA)
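Steps (1) and (2) above can be sketched in Python. This is a hedged sketch using SciPy (the authors preprocessed in EEGLAB/MATLAB, not SciPy); visual inspection and ICA-based artifact removal are omitted:

```python
import numpy as np
from scipy.signal import butter, iirnotch, filtfilt, sosfiltfilt

def preprocess(eeg, fs=128.0):
    """Steps (1)-(2): 0.1-50 Hz band-pass, 60 Hz notch, average reference.
    eeg: array of shape (channels, samples)."""
    # (1) Zero-phase band-pass filter, 0.1-50 Hz (second-order sections
    # for numerical stability with the very low 0.1 Hz edge)
    sos = butter(4, [0.1, 50.0], btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, eeg, axis=-1)
    # 60 Hz notch to suppress power-line noise
    bn, an = iirnotch(60.0, Q=30.0, fs=fs)
    x = filtfilt(bn, an, x, axis=-1)
    # (2) Re-reference: subtract the mean across channels at each sample
    return x - x.mean(axis=0, keepdims=True)
```

After re-referencing, the mean across channels is zero at every sample, which is the defining property of the average reference.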
4.2 EEG feature extraction
Classifying EEG signals into emotion categories requires feature extraction from the signal. To extract features, we analyzed the power spectrum of the EEG signals from each channel over the range 1 to 50 Hz. A Fast Fourier Transform (FFT) with a half-overlapped 1-s Hanning window was applied to each of the 14 channels to compute the spectral time series, yielding spectral power estimates (μV²) for each channel site. The process was conducted using Darbeliai, an EEGLAB extension. The absolute power values for each channel were obtained within the frequency band.
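The power-spectrum computation described above can be approximated with Welch's method (averaged FFT periodograms over 1-s, half-overlapped Hann windows). This SciPy sketch is illustrative only; the authors used the Darbeliai EEGLAB extension:

```python
import numpy as np
from scipy.signal import welch

def band_power_features(eeg, fs=128, fmin=1.0, fmax=50.0):
    """Per-channel power spectrum via Welch's method with a 1-s Hann
    window (nperseg = fs samples) and 50% overlap, keeping 1-50 Hz.
    eeg: (channels, samples) -> (kept_freqs, psd of shape (channels, bins))."""
    freqs, psd = welch(eeg, fs=fs, window="hann",
                       nperseg=fs, noverlap=fs // 2, axis=-1)
    keep = (freqs >= fmin) & (freqs <= fmax)
    return freqs[keep], psd[..., keep]
```

With a 1-s window at 128 Hz, the frequency resolution is 1 Hz, so keeping 1-50 Hz yields 50 power values per channel, or a 14 × 50 feature block per window.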
5 Experiments
5.1 Classification models
For EEG-based emotion classification, machine learning techniques such as the Support Vector Machine (SVM) and K-Nearest Neighbors (KNN), and deep learning techniques such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks, have been widely applied [6, 13, 16, 25].
Introduced by Cortes and Vapnik [5], the SVM is one of the most effective techniques for classification problems. It determines a decision boundary, the hyperplane, that separates the data into two categories by maximizing the distance between the hyperplane and the nearest data points of each class. Although EEG has high temporal resolution, the most commonly used emotion classifiers operate only on independent EEG segments. To handle sequential classification tasks, specialized deep learning methods such as Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) networks have been introduced [1, 8, 18].
The RNN recognizes patterns in data sequences but struggles to learn long-term dependencies. To address this issue, many researchers have turned to the LSTM, an extension of the RNN that can learn longer temporal sequences. Emotions change continuously, and this continuity is reflected in the temporal correlations of EEG signals; the LSTM is therefore expected to classify EEG signals better than other deep learning methods, as it is well suited to capturing temporal dynamics.
An LSTM network comprises LSTM cells containing special gates (the input gate, forget gate, and output gate). These gates determine which information is added to or discarded from the cell state. All the cells in the LSTM chain maintain long-term dependencies through this cell state. Thus, the combination of cell and hidden states enables the LSTM network to learn both short-term and long-term dependencies successfully.
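The gate mechanics described above can be written out directly. This NumPy sketch of a single LSTM cell step is for illustration; it is not the PyTorch implementation used in the experiments:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell update. x: (d,) input; h, c: (n,) hidden/cell states.
    W: (4n, d), U: (4n, n), b: (4n,) stack the input (i), forget (f),
    candidate (g), and output (o) blocks, in that order."""
    n = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[:n])           # input gate: how much new info to add
    f = sigmoid(z[n:2 * n])      # forget gate: how much old state to keep
    g = np.tanh(z[2 * n:3 * n])  # candidate cell content
    o = sigmoid(z[3 * n:])       # output gate: how much state to expose
    c_new = f * c + i * g        # cell state: long-term memory
    h_new = o * np.tanh(c_new)   # hidden state: short-term output
    return h_new, c_new
```

Because the cell state `c` is updated additively (gated by `f` and `i`) rather than through repeated matrix multiplication, gradients can flow across many time steps, which is what lets the LSTM keep long-term dependencies.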
In addition, the Bi-directional Long Short-Term Memory (Bi-LSTM) network, an improved version of the LSTM, processes data in two directions with separate hidden layers [11]. It has been widely utilized as an effective deep network for mental state classification and recognition based on physiological signals [12]. Compared with the LSTM, the Bi-LSTM can access long-range data in both input directions and better models time sequences. Therefore, an emotion classification model based on both LSTM and Bi-LSTM networks is adopted in this paper. Finally, we use the trained model to identify the emotional state of the testing samples and calculate the classification accuracy.
5.2 Experiment setup
To perform the classification tasks, feature vectors are generated. For feature extraction, we apply a 2-s window (256 samples) with 50% overlap to the raw EEG signals of the 12 subjects. Each sample is represented as a vector over the 14 channels.
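The 2-s, 50%-overlap windowing can be sketched as follows (an illustrative NumPy helper; the function and variable names are ours, not from the paper):

```python
import numpy as np

def sliding_windows(eeg, win=256, overlap=0.5):
    """Segment a (channels, samples) recording into windows of `win`
    samples (2 s at 128 Hz) with the given fractional overlap.
    Returns an array of shape (n_windows, channels, win)."""
    step = int(win * (1 - overlap))
    n = (eeg.shape[1] - win) // step + 1
    return np.stack([eeg[:, k * step: k * step + win] for k in range(n)])
```

With 50% overlap the hop size is 128 samples (1 s), so a 120-s trial at 128 Hz yields 119 windows per subject and channel.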
We performed the SVM experiments with the RBF kernel using the Scikit-learn library [20]. In addition, we performed classification using LSTM and Bi-LSTM networks. Our LSTM and Bi-LSTM models comprise one or two layers with 256 hidden units, a dropout layer with probability 0.5, and a fully connected (FC) layer serving as the classifier. To prevent overfitting, we place dropout layers before the FC layer and between the LSTM or Bi-LSTM layers when there are two or more layers. The Adam optimizer is used for training with a learning rate of 0.0005 for 100 epochs. Experiments are implemented in Python 3 with the PyTorch library [19].
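A minimal sketch of the SVM baseline described above, using scikit-learn's RBF-kernel `SVC`. The random features and labels below are a purely illustrative stand-in; real feature vectors would come from the FFT step in Section 4.2:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy stand-in: 120 windows x (14 channels x 50 frequency bins).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 14 * 50))
y = rng.integers(0, 6, size=120)  # six emotion labels, random here

# Standardizing power features before an RBF SVM is common practice.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X[:100], y[:100])
pred = clf.predict(X[100:])
```

The standardization step is our addition for numerical conditioning; the paper specifies only the RBF kernel.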
In our experiments, we employed a subject-independent evaluation strategy, assessed with 5-fold cross-validation. The hyper-parameters selected in our experiments are listed in Table 3.
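The 5-fold cross-validation protocol can be sketched as follows. Pooling the windows from all subjects before splitting reflects one reading of the subject-independent setting above (a single model for all subjects); this is a simplified NumPy sketch, not the authors' exact code:

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation
    over windows pooled from all subjects (no per-subject model)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```

Each of the k folds serves as the test set exactly once, and the reported accuracy is the average over the k test folds.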
6 Experimental results
In this section, we discuss the experimental results for six emotions using the SVM, LSTM, and Bi-LSTM classifiers. Experimental results are presented in classification accuracy using a subject-independent approach.
6.1 The performance comparison of different classifiers
Table 4 summarizes the average emotion classification accuracy of the different classifiers: SVM, LSTM, and Bi-LSTM. The LSTM outperforms the other classifiers by a large margin, achieving at least 80.46% accuracy, while the SVM reaches only 68.64%. Compared with the traditional machine learning method (SVM), the accuracy of the LSTM improves by approximately 12.91% across the six emotions. The LSTM achieved a higher recognition rate than the SVM because it captures the temporal dynamics of complex time-series data. These results clearly show the effectiveness of the LSTM and Bi-LSTM in EEG-based emotion recognition compared to the SVM. We therefore conduct further experiments with both the LSTM and Bi-LSTM, methods well suited to capturing temporal dynamics.
6.2 The performance comparison of deep learning with different layers
Table 5 summarizes the results of the deep learning classifiers with different numbers of layers for classifying the emotional movie clips. The Bi-LSTM with two layers achieves the best accuracy of 82.89%, while the two-layer LSTM reaches 82.17%. Both classifiers achieve their highest accuracy with two layers, outperforming configurations with other layer counts. This indicates that an appropriate number of layers is required to achieve better classification accuracy; in other words, the number of layers is an important hyper-parameter, and choosing it well can prevent over-fitting and improve generalization.
Table 6 and Fig. 6 show the per-subject accuracy under the subject-independent approach. The highest average accuracy, 82.89%, was achieved by the two-layer Bi-LSTM, approximately 1 to 3% better than the other classifiers. Among individual subjects, the highest accuracy is 92.38%, also with the two-layer Bi-LSTM. Moreover, across all subjects, accuracy drops with three layers compared with two, again reflecting the role of layer depth as a hyper-parameter affecting accuracy.
7 Conclusions
In this paper, we conducted emotion classification from EEG signals using machine learning techniques. We also presented a novel EEG emotion dataset and demonstrated how to recognize emotions from EEG signals. Although EEG signals vary significantly across individuals, we obtained promising results for classifying six emotions, with the two-layer Bi-LSTM achieving the best accuracy among the classifiers. Both LSTM-based classifiers are well suited to the high temporal resolution of EEG time sequences, suggesting a promising direction for improving EEG-based emotion recognition.
In future work, we will collect more EEG data to expand the dataset and publicly release an EEG-based emotion database. Furthermore, we will propose a novel classification method specialized for physiological signals, particularly EEG, and conduct comparative studies with public EEG datasets to improve classification accuracy considerably. EEG-based emotion classification accuracy remains lower than that of other modalities, such as speech, face, and body, because of inter-individual variability in EEG signals. To reduce this gap, we will evaluate diverse subject strategies, such as subject-dependent, subject-independent, and cross-subject approaches, and investigate EEG-based emotional features for building subject-specific emotion models.
Data availability
The data cannot be made publicly available because they include personal information. Interested parties may request the data from the corresponding author upon reasonable request.
References
Acharya D et al (2020) A long short term memory deep learning network for the classification of negative emotions using EEG signals. In: 2020 international joint conference on neural networks (IJCNN). IEEE
Bartlett M, Sejnowski TJ (1996) Viewpoint invariant face recognition using independent component analysis and attractor networks. Advances in Neural Information Processing Systems 9
Bradley MM, Lang PJ (1994) Measuring emotion: the self-assessment manikin and the semantic differential. J Behav Ther Exp Psychiatry 25(1):49–59
Chen T et al (2020) EEG emotion recognition model based on the LIBSVM classifier. Measurement 164:108047
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
Craik A, He Y, Contreras-Vidal JL (2019) Deep learning for electroencephalogram (EEG) classification tasks: a review. J Neural Eng 16(3):031001
Delorme A, Makeig S (2004) EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods 134(1):9–21
Du X, Ma C, Zhang G, Li J, Lai YK, Zhao G., ..., Wang H (2020) An efficient LSTM network for emotion recognition from multichannel EEG signals. IEEE Transactions on Affective Computing 13(3):1528-1540
Egger M, Ley M, Hanke S (2019) Emotion recognition from physiological signal analysis: A review. Electronic Notes Theor Comput Sci 343:35-55
Ekman P (1992) An argument for basic emotions. Cognit Emot 6(3-4):169–200
Graves A, Schmidhuber J (2005) Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw 18(5-6):602–610
Hefron RG et al (2017) Deep long short-term memory structures model temporal dependencies improving cognitive workload estimation. Pattern Recogn Lett 94:96–104
Hosseini M-P, Hosseini A, Ahi K (2020) A review on machine learning for EEG signal processing in bioengineering. IEEE Rev Biomed Eng 14:204–218
Islam MR et al (2021) EEG channel correlation based model for emotion recognition. Comput Biol Med 136:104757
Javaid MM et al (2015) Real-time EEG-based human emotion recognition. In: International conference on neural information processing. Springer
Mehmood RM, Lee HJ (2015) Emotion classification of EEG brain signal using SVM and KNN. In: 2015 IEEE international conference on multimedia & expo workshops (ICMEW). IEEE
Moon S-E, Jang S, Lee J-S (2018) Convolutional neural network approach for EEG-based emotion recognition using brain connectivity and its spatial information. In: 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE
Nath D et al (2020) An efficient approach to eeg-based emotion recognition using lstm network. In: 2020 16th IEEE international colloquium on signal processing & its applications (CSPA). IEEE
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, ..., Chintala S (2019) Pytorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 32
Pedregosa F et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Russell JA (1980) A circumplex model of affect. J Pers Soc Psychol 39(6):1161
Salzman CD, Fusi S (2010) Emotion, cognition, and mental state representation in amygdala and prefrontal cortex. Annu Rev Neurosci 33:173
Wang Z-M, Hu S-Y, Song H (2019) Channel selection method for EEG emotion recognition using normalized mutual information. IEEE Access 7:143303–143311
Xu H et al (2019) Research on EEG channel selection method for emotion recognition. In: 2019 IEEE international conference on robotics and biomimetics (ROBIO). IEEE
Yang H, Han J, Min K (2019) A multi-column CNN model for emotion recognition from EEG signals. Sensors 19(21):4736
Zhao G, Zhang Y, Ge Y (2018) Frontal EEG asymmetry and middle line power difference in discrete emotions. Front Behav Neurosci 12:225
Zheng W-L, Lu B-L (2015) Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Trans Auton Ment Dev 7(3):162–175
Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (grant number NRF-2022R1I1A1A01053144).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Lee, W., Son, G. Investigation of human state classification via EEG signals elicited by emotional audio-visual stimulation. Multimed Tools Appl 83, 73217–73231 (2024). https://doi.org/10.1007/s11042-023-16294-w