1 Introduction

With the rapid development of human-computer interface (HCI) technology, emotion classification has become increasingly important and has attracted attention from diverse fields, ranging from psychology to engineering [9]. As a result, emotion classification and recognition have become promising topics that drive progress in Artificial Intelligence (AI) systems such as Human-Machine Interfaces (HMI), Human-Robot Interaction (HRI), and Brain-Computer Interfaces (BCI).

Emotions play a central role in human perception, awareness, and social behavior. During interpersonal communication, we identify emotion using various cues, including facial and vocal expressions, which convey emotional characteristics related to a person's feelings, intentions, and thoughts. Nevertheless, recognizing emotions is challenging because of the vast range of expressions and intensities, the subjectivity of emotions, and the unique characteristics of each individual.

There are two main issues: how to classify emotions and how to recognize them effectively. The first is the classification of emotions. Psychologists have proposed two theories to support emotion classification: two-dimensional (2D) emotion spaces and basic emotion models [10, 21]. Based on human characteristics and behavior, Ekman characterized emotions as six primary emotions: anger, fear, happiness, sadness, disgust, and surprise [10]. Russell introduced a two-dimensional emotional model based on valence and arousal [21]. Emotions in this space can be plotted on a two-dimensional plane, as shown in Fig. 1. Valence ranges from negative (unpleasant) to positive (pleasant) and quantifies the quality of an emotion, whereas arousal ranges from calm (inactive) to excited (active) and describes the intensity of an emotion. These theories have been widely applied in emotion-related research, particularly in Affective Computing (AC).

Fig. 1 The emotions on the valence-arousal dimension model by Russell [21]

The second issue is how to evaluate emotions objectively. In general, human emotions are evaluated through the following methods: nonverbal behavior (facial expression, action recognition), speech behavior (text, conversation analysis), and physiological signals (EEG, ECG). Because of its strong objectivity, its resistance to forgery, and the ease of acquisition and operation offered by wearable EEG headsets, the EEG has become increasingly widespread in emotion recognition, as it provides some of the most relevant information for detecting human emotions.

In our work, we concentrate on the physiological modality of the electroencephalogram (EEG), since EEG signals are direct reactions caused by emotions. We also attempt to improve classification across inter-individual patterns to relieve subject differences, because building a model that suits all individuals is a very challenging task. The primary objective of this paper is to introduce a novel EEG emotion dataset reflective of a culture-specific language, Korean, and to perform emotion classification using the collected data. In contrast to previous studies, we define the emotional categories by combining Ekman's basic emotion theory with Russell's dimensional model. Consequently, we conduct an extensive experiment on six emotional states: anger, excitement, fear, happiness, sadness, and neutral. In addition, we examine the inter-subject approach, known as the subject-independent method.

The organization of this paper is as follows: Section 2 describes previous work on EEG-based emotion classification. Section 3 describes the details of the emotional stimuli and the data collection procedure. Section 4 describes the preprocessing method for the EEG data and the analysis for feature extraction. Section 5 describes the experimental setup for emotion classification using machine learning. Section 6 presents the results of the extensive experiment. Finally, Section 7 provides our conclusions and directions for future research.

2 Related work

2.1 EEG-based emotion classification

Recent studies have attempted to analyze EEG data using machine learning and deep learning techniques. In general, selecting emotion-related features is crucial in EEG-based emotion recognition research, given that EEG signals contain many candidate features, such as those in the frequency and time-frequency domains [23]. Many researchers have attempted to improve classification accuracy by employing various machine learning techniques, in particular deep learning, as shown in Table 1 [4, 14–16, 18, 23, 24, 27].

Table 1 Previous research on emotion classification using EEG signals

Wang et al. [23] classified emotions using four classes from Russell's emotional model: high/low valence and high/low arousal. For feature extraction, they computed a power spectrum with a short-time Fourier transform (STFT). Using an SVM classifier, they achieved an average accuracy of 74.41% for valence and 73.61% for arousal. Xu et al. [24] performed emotion recognition with power spectrum features and four emotion classes: LALV (low arousal, low valence), LAHV (low arousal, high valence), HALV (high arousal, low valence), and HAHV (high arousal, high valence). They utilized the discrete wavelet transform (DWT) and an Extreme Learning Machine (ELM) to achieve 79.37%. Additionally, Javaid et al. [15] performed classification on four emotions (pleasant, sad, happy, and frustrated). They utilized Higuchi's fractal dimension to extract features and an SVM as the classifier. The best accuracies achieved were 87.62% for arousal and 83.28% for valence. Recently, Chen et al. [4] selected and combined different types of EEG features, such as Lempel-Ziv complexity, wavelet detail coefficients, and average approximation entropy. The four calculated features were fed into the LIBSVM classifier, and two-category classifications were performed on arousal and valence. The average recognition rates were 74.88% for arousal and 82.63% for valence.

Among deep learning techniques, Zheng et al. [27] performed emotion recognition using differential entropy (DE) features with three emotions (positive, neutral, and negative). They achieved a classification accuracy of 86.65% using a deep belief network (DBN) classifier. Nath et al. [18] performed emotion classification with two classes each for valence and arousal (high/low). With an SVM classifier, they achieved a best accuracy of 87.5%; with their proposed LSTM model, the best classification accuracies for the valence and arousal scales were 94.69% and 93.13%, respectively.

Recently, Islam et al. [14] demonstrated a novel method focusing on lower computational complexity in terms of memory requirements and computational time. They proposed a Convolutional Neural Network architecture for an EEG-based emotion recognition model and introduced Pearson's correlation coefficients (PCC) of the alpha, beta, and gamma sub-bands as EEG features. As a result, they achieved maximum accuracies of 78.22% on valence and 74.92% on arousal on the DEAP dataset.

Moon et al. [17] conducted classification using power spectral density features and three emotional categories (positive, neutral, and negative). They focused on adjusting the hyper-parameter corresponding to the number of convolutional layers and attained a maximum accuracy of 86.86% with five layers.

Most previous studies selected only two to four emotion categories. Moreover, emotion categories are often chosen from the Russell model without a clear rationale. To improve emotion classification accuracy, more detailed emotion categories that are specific and extensive are needed. To address this problem, we appropriately divide and extend the emotional categories.

3 Materials and methods

3.1 Institutional review board (IRB)

The experiment was conducted with approval from the Institutional Review Board (IRB) at Yonsei University (Approval No. 7001988-201807-HR-424-03).

3.2 Subjects

In this study, we recruited 12 subjects (6 female) aged between 21 and 33 (mean = 25.67; SD = 3.6), all right-handed. All subjects had normal vision, had not taken any medication, and had no history of neurological or psychiatric illness. Each participant was required to abstain from caffeine, tobacco, and alcohol for 24 hours before the experiment. They signed a written informed consent form before the experiment and received KRW 30,000 (about $26).

3.3 Materials

Various stimuli are utilized in emotion-related research, such as movie clips, music, and speech. Among them, movie clips induce emotions with greater efficiency and reliability because they contain both audio and video. The stimuli were five four-minute movie clips, one per emotional category (anger, excitement, fear, happiness, and sadness), taken from the Korea Film Council (KOFIC). To obtain standardized elicitation stimuli, the movie clips were selected based on ratings from 160 participants using the self-assessment manikin (SAM) evaluation scale [3] (see Fig. 2).

Fig. 2 The self-assessment manikin (SAM), adapted with permission from Bradley [3]. SAM measures pleasure (top) and arousal (bottom) on a discrete scale (1: very low, 5: neutral, 9: very high)

In addition, we added a neutral condition consisting of a black background with a cross mark in the center as the baseline. Each emotional movie clip is described in Table 2 and Fig. 3.

Table 2 Examples of movie clips of emotions
Fig. 3 The description of emotional movie clips (anger, excitement, fear, happiness, neutral, sadness)

3.4 Experimental procedure

The experiment was performed in a quiet room with controlled sound and light conditions. The participants were seated in comfortable chairs in front of a monitor. Before participating in the experiment, they were asked to complete the written consent form. After the subjects had correctly understood the instructions, they pressed any key on the keyboard to move to the next step.

The experimental procedure comprised six trials. Each trial started with a 5-second dark screen with a cross in the center, followed by an emotional movie clip lasting 4 minutes. After each trial ended, the participants rated how they felt about the movie clip in a questionnaire. The trials were presented in random order; the experimental procedure is illustrated in Fig. 4.

Fig. 4 The experimental procedure

3.5 EEG acquisition

An EEG captures and records the electrical activity of the brain along the scalp. EEG recordings were collected continuously using the Emotiv EPOC wireless headset (Emotiv Systems, Inc., San Francisco, CA) with 14 electrodes (AF3, AF4, F7, F8, F3, F4, FC5, FC6, T7, T8, P7, P8, O1, and O2) at a sampling rate of 128 Hz. The electrodes were placed according to the international 10-20 system, with the common ground references located at the left and right mastoids, as shown in Fig. 5. We applied a 60 Hz notch filter to reduce power supply noise, and the impedance of each electrode was kept below 5 kΩ. In this work, we concentrated on the frontal and temporal lobes of the brain because, according to [22, 26], emotional changes mostly affect the EEG signals in these regions.

Fig. 5 The electrode locations according to the international 10-20 system

4 Data analysis

4.1 EEG preprocessing

The EEGLAB toolbox, open-source software based on MATLAB [7], was used to preprocess the EEG signals. Before preprocessing, we excluded the first and last 30 seconds of each recording so as to retain the segments with the strongest emotion elicitation. The raw data were divided into six trials, one per emotional category. After segmentation, the data from 60 to 180 seconds were extracted to capture the most strongly evoked emotional responses.

The raw EEG data were filtered from 0.1 to 50 Hz and converted to an average reference. The recorded EEG signals contain many artifacts, such as eye blinks, cardiac and muscular activity, and power noise. To remove them, we used independent component analysis (ICA), a popular technique for multi-sensor signal processing [2]. After removing the artifacts, we used the preprocessed EEG signals for classification. EEG data preprocessing consists of the following steps, with an illustrative code sketch after the list:

  1. Filtering: band-pass filter the EEG data between 0.1 and 50 Hz and apply a notch filter to remove power-line noise

  2. Re-referencing: convert to the average reference

  3. Visual inspection

  4. Independent component analysis (ICA)
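
As an illustration only, the following Python sketch reproduces these four steps with the MNE-Python library on a hypothetical recording; our actual pipeline used EEGLAB, so the file name and component indices here are placeholders:

```python
import mne

# Load a continuous recording (file name is a placeholder).
raw = mne.io.read_raw_fif("subject01_raw.fif", preload=True)

# Step 1: 0.1-50 Hz band-pass plus a 60 Hz notch filter for power-line noise.
raw.filter(l_freq=0.1, h_freq=50.0)
raw.notch_filter(freqs=60.0)

# Step 2: re-reference to the average of all electrodes.
raw.set_eeg_reference("average")

# Steps 3-4: fit ICA and drop artifact components chosen by visual inspection.
# (13 components: average referencing reduces the rank of 14 channels by one.)
ica = mne.preprocessing.ICA(n_components=13, random_state=0)
ica.fit(raw)
ica.exclude = [0, 1]            # placeholder indices for eye-blink components
raw_clean = ica.apply(raw.copy())
```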

4.2 EEG feature extraction

Classifying EEG signals into emotion categories requires extracting features from the signal. To extract features, we analyzed the power spectrum of the EEG signals from each channel in the 1-50 Hz range. A fast Fourier transform (FFT) with a half-overlapped 1-s Hanning window was applied to each of the 14 channels to compute the spectral time series, yielding spectral power estimates (µV²) for each channel site. The process was conducted using Darbeliai, an EEGLAB extension. The absolute power values from each channel were obtained within each frequency band.
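
For illustration, a band-power estimate with a half-overlapped 1-s Hanning window can be computed with SciPy's Welch estimator; the band edges and names below are our own illustrative choices, not the exact Darbeliai computation:

```python
import numpy as np
from scipy.signal import welch

FS = 128  # Emotiv EPOC sampling rate (Hz)

def band_power(channel, low, high, fs=FS):
    """Absolute spectral power of one channel within [low, high] Hz."""
    # 1-s Hanning window (128 samples) with 50% overlap, as in the text.
    freqs, psd = welch(channel, fs=fs, window="hann",
                       nperseg=fs, noverlap=fs // 2)
    mask = (freqs >= low) & (freqs <= high)
    return np.trapz(psd[mask], freqs[mask])

# Example: alpha-band (8-13 Hz) power of a synthetic 4-minute signal.
print(band_power(np.random.randn(FS * 240), 8, 13))
```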

5 Experiments

5.1 Classification models

For EEG-based emotion classification, machine learning techniques such as Support Vector Machines (SVM) and K-Nearest Neighbors (KNN), as well as deep learning techniques such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), have been widely employed [6, 13, 16, 25].

Introduced by Cortes and Vapnik [5], the SVM is one of the most effective techniques for solving classification problems. It aims to determine a decision boundary, the hyperplane, that separates the data into two categories; the hyperplane is found by maximizing the distance between it and the nearest data points of each class. Although EEG has high temporal resolution, the classifiers most commonly used for emotion classification operate only on independent EEG segments. To solve sequential classification tasks, specialized deep learning methods such as Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) have been introduced [1, 8, 18].
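
Formally, given training pairs $(\mathbf{x}_i, y_i)$ with labels $y_i \in \{-1, +1\}$, the hard-margin SVM sketched above solves

$$\min_{\mathbf{w},\,b} \;\; \frac{1}{2}\lVert \mathbf{w} \rVert^2 \quad \text{subject to} \quad y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1 \;\; \text{for all } i,$$

which maximizes the margin $2/\lVert \mathbf{w} \rVert$ between the two classes; kernel functions such as the RBF kernel used in Section 5.2 extend this formulation to non-linear boundaries.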

The RNN recognizes patterns in data sequences but has difficulty learning long-term dependencies. To address this issue, many researchers have turned to the LSTM, an extension of the RNN that can learn longer temporal sequences. Emotions change continuously, and this continuity is reflected in the temporal correlations of EEG signals. Because the LSTM is suited to capturing temporal dynamics, it is expected to classify EEG signals better than other deep learning methods that ignore these temporal correlations.

An LSTM network comprises LSTM cells consisting of special gates (an input gate, a forget gate, and an output gate). These gates determine which information is added to or discarded from the cell state, and all the cells in the LSTM chain maintain long-term dependencies through this cell state. Thus, the combination of cell and hidden states enables the LSTM network to learn both short-term and long-term dependencies successfully.
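
For reference, the gates just described follow the standard formulation (notation ours; $\sigma$ denotes the logistic sigmoid and $\odot$ the element-wise product):

$$
\begin{aligned}
\mathbf{i}_t &= \sigma(\mathbf{W}_i \mathbf{x}_t + \mathbf{U}_i \mathbf{h}_{t-1} + \mathbf{b}_i), &
\mathbf{f}_t &= \sigma(\mathbf{W}_f \mathbf{x}_t + \mathbf{U}_f \mathbf{h}_{t-1} + \mathbf{b}_f), \\
\mathbf{o}_t &= \sigma(\mathbf{W}_o \mathbf{x}_t + \mathbf{U}_o \mathbf{h}_{t-1} + \mathbf{b}_o), &
\tilde{\mathbf{c}}_t &= \tanh(\mathbf{W}_c \mathbf{x}_t + \mathbf{U}_c \mathbf{h}_{t-1} + \mathbf{b}_c), \\
\mathbf{c}_t &= \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \tilde{\mathbf{c}}_t, &
\mathbf{h}_t &= \mathbf{o}_t \odot \tanh(\mathbf{c}_t),
\end{aligned}
$$

where the forget gate $\mathbf{f}_t$ discards information from the cell state $\mathbf{c}_t$ and the input gate $\mathbf{i}_t$ adds new information to it.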

In addition, the Bi-directional Long Short-Term Memory (Bi-LSTM) network, an improved version of the LSTM, processes data in two directions with separate hidden layers [11]. It has been widely utilized as an effective deep network for mental state classification and recognition based on physiological signals [12]. Compared with the LSTM, the Bi-LSTM can access long-range context in both input directions and thus better models time sequences. Therefore, emotion classification models based on both LSTM and Bi-LSTM networks are adopted in this paper. Finally, we use the trained models to identify the emotional states of the testing samples and calculate their classification accuracy.

5.2 Experiment setup

To perform the classification tasks, feature vectors are generated. For feature extraction, we apply a 2-s window of 256 samples with 50% overlap to the raw EEG signals of the 12 subjects, so each sample is represented as a vector over the 14 channels.
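
A minimal sketch of this segmentation, assuming the recording is a NumPy array of shape (channels, samples); the function and variable names are illustrative:

```python
import numpy as np

def sliding_windows(eeg, win=256, step=128):
    """Cut a (channels, samples) array into 2-s windows with 50% overlap."""
    starts = range(0, eeg.shape[1] - win + 1, step)
    return np.stack([eeg[:, s:s + win] for s in starts])

# Example: a synthetic 4-minute, 14-channel recording at 128 Hz.
segments = sliding_windows(np.random.randn(14, 128 * 240))
print(segments.shape)  # (n_windows, 14, 256)
```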

We performed the SVM experiment using the Scikit-learn library with the RBF kernel function [20]. In addition, we performed classification using the LSTM and Bi-LSTM. Our LSTM and Bi-LSTM networks comprise one to three layers with 256 hidden units, a dropout layer with a probability of 0.5, and a fully connected (FC) layer that serves as the classifier. To prevent overfitting, we place a dropout layer before the FC layer and between the LSTM or Bi-LSTM layers when the network has more than one layer. The Adam optimizer is used during training with a learning rate of 0.0005 for 100 epochs. The experiments are implemented in Python 3 with the PyTorch library [19].
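
A minimal PyTorch sketch of this architecture; the class and variable names are ours and not taken from the original implementation:

```python
import torch
import torch.nn as nn

class EmotionLSTM(nn.Module):
    """LSTM/Bi-LSTM classifier: stacked recurrent layers, dropout, FC head."""
    def __init__(self, n_channels=14, hidden=256, n_layers=2,
                 n_classes=6, bidirectional=False):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=n_channels, hidden_size=hidden, num_layers=n_layers,
            batch_first=True, bidirectional=bidirectional,
            dropout=0.5 if n_layers > 1 else 0.0,  # dropout between stacked layers
        )
        out_dim = hidden * (2 if bidirectional else 1)
        self.drop = nn.Dropout(0.5)            # dropout before the FC classifier
        self.fc = nn.Linear(out_dim, n_classes)

    def forward(self, x):                      # x: (batch, time, channels)
        out, _ = self.lstm(x)
        return self.fc(self.drop(out[:, -1]))  # last time step -> class logits

model = EmotionLSTM(n_layers=2, bidirectional=True)          # Bi-LSTM variant
optimizer = torch.optim.Adam(model.parameters(), lr=0.0005)  # Adam, lr 0.0005
criterion = nn.CrossEntropyLoss()
```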

In our experiments, we employed a subject-independent evaluation strategy, assessed with 5-fold cross-validation. The details of the hyper-parameters selected in our experiments are shown in Table 3.
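
For instance, the SVM baseline under 5-fold cross-validation can be run along these lines with Scikit-learn, where `X` (feature vectors) and `y` (six emotion labels) are placeholders:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Placeholder data standing in for the extracted feature vectors and labels.
X = np.random.randn(600, 14)
y = np.random.randint(0, 6, size=600)

clf = SVC(kernel="rbf")                    # RBF-kernel SVM, as in the text
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation accuracy
print(scores.mean())
```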

Table 3 Details of the hyper-parameters selected for our experiments

6 Experimental results

In this section, we discuss the experimental results for the six emotions using the SVM, LSTM, and Bi-LSTM classifiers. The results are reported as classification accuracy under the subject-independent approach.

6.1 The performance comparison of different classifiers

Table 4 summarizes the experimental results in terms of average emotion classification accuracy for the different classifiers: SVM, LSTM, and Bi-LSTM. The LSTM outperforms the other classifiers in emotion recognition accuracy by a large margin, achieving at least 80.46%, while the SVM achieves 68.64%. Compared with the traditional machine learning method (SVM), the LSTM improves accuracy by approximately 12.91% on the six emotions. The deep models achieve higher recognition rates than the SVM because they capture the temporal dynamics of complex time-series data. These results clearly show the effectiveness of the LSTM and Bi-LSTM, both well suited to capturing temporal dynamics, in EEG-based emotion recognition compared to the SVM.

Table 4 Classification accuracy (%) of the SVM, LSTM, and Bi-LSTM classifiers

6.2 The performance comparison of deep learning with different layers

Table 5 summarizes the results of the deep learning techniques with different numbers of layers for classifying the emotional movie clips. The best accuracy is achieved by the Bi-LSTM with two layers, at 82.89%, while the two-layer LSTM reaches 82.17%. For both classifiers, the two-layer configuration achieves the highest accuracy and outperforms the other depths. This indicates that an appropriate number of layers is required to achieve better classification accuracy; in other words, the number of layers is a very important hyper-parameter for improving classification accuracy. Choosing a suitable depth can also prevent over-fitting and enhance generalization.

Table 5 Emotion classification accuracy (%) of different classifiers with different numbers of layers

Table 6 and Fig. 6 show the accuracy for each subject under the subject-independent approach. The highest average accuracy, 82.89%, was achieved by the Bi-LSTM with two layers, a performance improvement of approximately 1 to 3% over the other classifiers. Across all subjects, the highest individual accuracy was 92.38%, obtained with the two-layer Bi-LSTM. Moreover, we observe a drop in accuracy with three layers compared to two across all subjects, which again reflects the role of network depth as a hyper-parameter that affects accuracy.

Table 6 Emotion classification accuracy (%) of each subject using the LSTM and Bi-LSTM with different numbers of layers under the subject-independent approach
Fig. 6 The comparison of accuracy between the LSTM and Bi-LSTM with different numbers of layers

7 Conclusions

In this paper, we conducted emotion classification from EEG signals using machine learning techniques. We also presented a novel EEG emotion dataset and demonstrated how to recognize emotions from the EEG signals. Although EEG signals vary significantly across individuals, we obtained promising results for classifying six emotions. The results show that the Bi-LSTM with two layers achieves the best accuracy among the classifiers tested; both the LSTM and Bi-LSTM are well suited to high-temporal-resolution time sequences and offer a way to improve EEG-based emotion recognition.

In future work, we will collect more EEG data to expand the dataset and publicly release an EEG-based emotion database. Furthermore, we will propose a novel classification method specialized for physiological signals, particularly EEG, and conduct a comparative study with public EEG datasets to improve classification accuracy considerably. Moreover, EEG-based emotion classification accuracy remains lower than that of other modalities, such as speech, face, and body, because of the variability of EEG signals among individuals. To reduce the gap among individuals, we will perform emotion classification that evaluates diverse subject-level approaches. Hence, we should investigate the EEG-based features of emotions to develop subject-specific emotion model systems, such as subject-dependent, subject-independent, and cross-subject strategies.