1 Introduction

Emotion recognition is the process of identifying and interpreting the emotional state of another person. It is a complex task that involves multiple cognitive processes, such as facial expression recognition, speech analysis, and body language interpretation.

Emotion recognition has a wide range of potential applications in healthcare, including:

  • Diagnosis and treatment of mental health disorders: Emotion recognition can be used to identify and diagnose mental health disorders, such as depression, anxiety, and schizophrenia. It can also be used to monitor the effectiveness of treatment for these disorders.

  • Pain management: Emotion recognition can be used to identify patients who are in pain. This information can then be used to personalize pain management strategies.

  • Patient-provider communication: Emotion recognition can be used to improve patient-provider communication. For example, it can be used to identify patients who are feeling anxious or stressed, so that providers can adjust their communication style accordingly.

  • Telehealth: Emotion recognition can be used to improve the effectiveness of telehealth services. For example, it can be used to identify patients who are not responding well to treatment, so that providers can intervene early.

Deep learning algorithms have been used increasingly for emotion recognition over the last few years. They have excelled in several areas, including speech recognition, image classification, and natural language processing, frequently producing state-of-the-art results. Importantly, these algorithms have also been shown to be effective at recognizing emotions from EEG brainwave data.

1.1 Motivation

The motivation of our research is to investigate the use of deep learning algorithms for emotion recognition using EEG brainwave data. We are motivated by the potential of this technology to be used in a variety of applications in healthcare, such as diagnosis and treatment of mental health disorders, pain management, and patient-provider communication.

Specifically, we are interested in the following research questions:

  • Can deep learning algorithms be used to accurately classify emotions from EEG data?

  • Which deep learning algorithm is the most effective for emotion recognition from EEG data?

  • What are the limitations of using deep learning algorithms for emotion recognition from EEG data?

1.2 Contribution

Our contribution to this research is to investigate the use of deep learning algorithms for emotion recognition using EEG brainwave data. We have shown that deep learning algorithms can be used to accurately classify emotions from EEG data, with the DNN achieving the highest accuracy of 98.44%. We have also identified some of the limitations of using deep learning algorithms for emotion recognition from EEG data, such as the need for large datasets and the difficulty of interpreting the results of deep learning models.

Our findings suggest that deep learning algorithms have the potential to be used in a variety of applications in healthcare, such as diagnosis and treatment of mental health disorders, pain management, and patient-provider communication. However, more research is needed to improve the accuracy and reliability of these systems and to address the limitations that we have identified.

Specifically, our contribution includes:

  • We conducted a study to investigate the use of deep learning algorithms for emotion recognition using EEG brainwave data.

  • We used a dataset of EEG recordings from two people (one male, one female), each recorded for three minutes per state: positive, neutral, and negative.

  • We trained three deep learning algorithms on the dataset: DNN, LSTM, and GRU.

  • We evaluated the performance of the three algorithms on the test set and found that the DNN achieved the highest accuracy of 98.44%.

  • We also calculated other metrics such as precision, recall, and F1-score. The results showed that all three algorithms were able to achieve high accuracy in classifying emotions from EEG data.

  • We discussed the limitations of using deep learning algorithms for emotion recognition from EEG data and identified some areas for future research.

2 Related Work

Bano et al. [1] proposed a computer interface to analyze the relationship between electroencephalography (EEG) signals and human emotions. Recurrent neural network (RNN) and gated recurrent unit (GRU) models were used to classify EEG signals recorded through the TP9, AF7, AF8, and TP10 electrodes and to predict human emotions. EEG signals from two people were recorded for three minutes per emotional state to determine whether they were in a positive, negative, or neutral state, which the authors describe as beneficial for criminal identification.

In another study, Chakravarthy et al. [2] detected human emotions using a deep learning-based, cluster-based region classifier algorithm that combines the basic ranges of EEG signal handling. Artificial neural network (ANN) and K-nearest neighbor (KNN) classifiers were used to classify the emotional states. The system achieved 94% accuracy, but the model needs further modification to give uniform results.

Using an EEG-based brain-machine interface, Birdy654 et al. conducted research on the categorization of mental states [3]. In this study, the dataset was used to build a deep learning model that could categorize four distinct mental states: happy, sad, angry, and annoyed. On the test set, the model had an accuracy of 88.9%.

In a related study, Birdy654 et al. [4] classified mental emotional sentiments using an EEG-based brain-machine interface. The dataset was used to train a deep learning model that could categorize three emotional states: positive, neutral, and negative. On the test set, the model had an accuracy of 95%.

In a different work, Dutta et al. [6] used 640 datasets to evaluate five combinations of activation functions, two loss functions, and the Adam optimizer in LSTM and MLP-ANN algorithms. Confusion matrices were used to analyze accuracy, alongside execution time and parameter counts. The deep learning models achieved their highest accuracy with the binary cross-entropy loss, and prediction accuracy ranged from 92 to 97%.

Moreover, Kasuga et al. [7] identified electrodes that are useful for EEG-based positive–negative emotion classification using machine learning. The authors collected EEG signals from 30 people aged between 19 and 38 using 14 electrodes. Frequency-domain statistical parameters were extracted and fed to random forests (RF) [8,9,10]. Among the electrodes, P8, P7, and FC6 played a vital role in positive–negative emotion classification. The model achieved 85.4% accuracy, which leaves room for improvement.

Furthermore, Waheed et al. [11] concentrated on processing human emotions using machine learning. An IoT-based brainwave sensor recorded the EEG signal, after which a Naive Bayes classifier was applied to two-class (binary) and multi-class problems. KNN, decision tree, and support vector machine algorithms were also implemented. Based on the results, the emotions were classified as meditation, boredom, joy, and frustration, and the system is cost-effective to use.

Similarly, Doma et al. [12] analyzed EEG and peripheral physiological signals by applying a range of machine learning algorithms to them, including support vector machine (SVM), K-nearest neighbor, linear discriminant analysis, logistic regression, and decision trees. Participants were observed for 40 min while they watched different types of videos and their EEG signals were recorded. SVM combined with principal component analysis (PCA) delivered better accuracy than the other algorithms.

A machine learning system was used to analyze table tennis players’ brainwave patterns in research by Tsai et al. [13]. Based on the players’ EEG report databases, several methods were used to determine the stress levels, including logistic regression, support vector machine, decision tree C4.5, classification and regression tree, random forest, and extreme gradient boosting (XGBoost). The experiment’s findings showed that the XGBoost algorithm was the most effective model for this particular test. The authors also speculate that future studies combining XGBoost with deep learning algorithms may improve stress categorization and result in even greater levels of accuracy.

Chakravarthi et al. [14] described emotion recognition from EEG signals and brain wave patterns using a convolutional neural network (CNN)-LSTM with the ResNet-152 algorithm. The authors present this approach as most helpful for addressing post-traumatic stress disorder (PTSD).

3 Methodology

See Fig. 1.

Fig. 1
A schematic flow diagram. E E G data is followed by processed data, model, and classifier. The model consists of 6 layers. The processed data is classified into 2 classes.

Proposed model in this research

3.1 Dataset Collection

The research used a dataset available on Kaggle titled “EEG Brainwave Dataset: Feeling Emotions.” The dataset contains EEG recordings from two people (one male and one female), each recorded for three minutes in each of the three emotional states: positive, neutral, and negative. A Muse EEG headband was used for the recordings, capturing EEG data from four channels: TP9, AF7, AF8, and TP10. The dataset’s creators preprocessed the data before release, removing artifacts such as eye blinks and muscle movements. The data was then divided into training and test sets, with 80% going to the training set and 20% going to the test set. The training set had 240 samples, whereas the test set had 60 samples.
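The 80/20 division described above can be sketched as follows. This is a minimal illustration, not the authors' code: the text does not say whether the split was stratified by class, so stratification, the `stratified_split` helper, the seed, and the toy label list are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting every class in the same proportion."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (hypothetical): 100 samples per class, 300 in total.
labels = ["negative"] * 100 + ["neutral"] * 100 + ["positive"] * 100
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
print(len(train_idx), len(test_idx))  # 240 60
```

Splitting per class keeps the class proportions identical in both subsets, which matters when the test set is as small as 60 samples.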

Here are some additional details about the dataset:

  • The EEG data was recorded at a sampling rate of 128 Hz.

  • The data was normalized to have a mean of 0 and a standard deviation of 1.

  • The labels for the data are as follows:

    • Positive: The participant was asked to think of something that made them feel happy or excited.

    • Neutral: The participant was asked to think of something that made them feel neither happy nor sad.

    • Negative: The participant was asked to think of something that made them feel sad or angry.

The dataset is available for free to download from Kaggle. It is a valuable resource for researchers who are interested in emotion recognition from EEG data.
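The normalization to zero mean and unit standard deviation mentioned in the details above can be illustrated with a short sketch. It assumes the scaling is applied to one feature column at a time and uses the population standard deviation; the text states neither, and the `zscore` helper and sample values are hypothetical.

```python
import math


def zscore(values):
    """Standardize numbers to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0] * n
    return [(v - mean) / std for v in values]


feature = [4.0, 8.0, 6.0, 2.0]  # hypothetical values of one EEG feature
scaled = zscore(feature)

scaled_mean = sum(scaled) / len(scaled)
scaled_std = math.sqrt(sum((v - scaled_mean) ** 2 for v in scaled) / len(scaled))
```

When a separate test set is used, the mean and standard deviation are normally estimated on the training set only and then reused for the test set, so that no test information leaks into training.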

Figure 2 shows that the classes are almost balanced: the three classes (neutral, negative, and positive) are very close in size. The neutral class has 716 data points, while the negative and positive classes have 708 data points each. The largest difference between class sizes is only 8, which is small, so the data can be used without rebalancing even though the classes are not perfectly balanced.

Fig. 2
A bar graph presents the sample distribution among neutral, positive, and negative labels. The approximated values are almost equal for all at 700.

Number of samples distribution
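The class balance can be checked with a few lines of arithmetic. The sketch below uses the class sizes reported in the text; the variable names are illustrative only.

```python
from collections import Counter

# Class sizes as reported in the text.
counts = Counter({"neutral": 716, "negative": 708, "positive": 708})

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Ratio of the largest to the smallest class; 1.0 would be perfectly balanced.
imbalance_ratio = max(counts.values()) / min(counts.values())

# Accuracy of a classifier that always predicts the largest class.
majority_baseline = max(counts.values()) / total

print(total)                        # 2132
print(round(imbalance_ratio, 3))    # 1.011
print(round(majority_baseline, 3))  # 0.336
```

A ratio this close to 1 means that plain accuracy is a reasonable headline metric, and that a trivial classifier would score only about one third.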

3.2 Data Visualization

Figure 3 was rendered at a resolution of 200 dots per inch (dpi) and a size of 24 by 6 in. It plots the fast Fourier transform (FFT) values for three different samples of EEG data. The first sample is labeled “Sample 0” and is plotted in red with an alpha value of 0.5. The second sample is labeled “Sample 100” and is plotted in blue with an alpha value of 0.8. The third sample is labeled “Sample 200” and is plotted in black with an alpha value of 0.5.

Fig. 3
A graph of amplitude versus frequency. It presents the F F T signals of 3 samples such as sample 0, sample 100, and sample 200. The signals denote heavy fluctuations.

FFT visualization

The x-axis of the plot represents frequency, and the y-axis represents the amplitude of the FFT values. The x-ticks are spaced every 100 points, and their labels are rotated by 45°. The legend is placed automatically in the least obstructive location.

Figure 4 contains three subplots, each showing the frequency spectrum of the EEG data at a different time point: 1, 10, and 30 s.

Fig. 4
Three E E G data plots present heavy fluctuations of the frequency spectrum at 1 second, 10 seconds, and 30 seconds.

Three subplots, each showing the frequency spectrum of the EEG data for a different time point: 1, 10, and 30 s

The first subplot shows the frequency spectrum of the EEG data from the first second, the second subplot shows the spectrum recorded in the 10th second, and the final subplot shows the spectrum captured at the 30th second. The frequency spectrum plot displays the EEG signal’s strength as a function of frequency; power is obtained by squaring the signal’s amplitude. Alpha, beta, and gamma waves are among the kinds of brain waves that can be distinguished by examining the frequency spectrum.
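To make the relation between amplitude, power, and frequency bands concrete, the sketch below computes a power spectrum and sums it per band. It is an illustration under stated assumptions: it uses a naive O(n²) discrete Fourier transform rather than an FFT (the values are the same, only slower), a synthetic 10 Hz sine instead of real EEG, and conventional band boundaries that vary between sources. The helper names are hypothetical.

```python
import cmath
import math


def power_spectrum(signal, sampling_rate):
    """Return (frequencies, power) for the non-negative frequency bins of a real signal."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        freqs.append(k * sampling_rate / n)
        power.append(abs(coeff) ** 2)  # power is the squared amplitude
    return freqs, power


def band_power(freqs, power, low, high):
    """Sum the power of all bins with low <= frequency < high (in Hz)."""
    return sum(p for f, p in zip(freqs, power) if low <= f < high)


# Synthetic one-second signal: a pure 10 Hz sine sampled at 128 Hz.
fs = 128
signal = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
freqs, power = power_spectrum(signal, fs)

# Assumed band boundaries in Hz; definitions differ slightly across the literature.
bands = {"alpha": (8, 13), "beta": (13, 30), "gamma": (30, 64)}
powers = {name: band_power(freqs, power, lo, hi) for name, (lo, hi) in bands.items()}

peak_frequency = freqs[power.index(max(power))]
dominant_band = max(powers, key=powers.get)
print(peak_frequency, dominant_band)  # 10.0 alpha
```

With a one-second window at 128 Hz, the frequency resolution is 1 Hz, so the 10 Hz component falls exactly into one bin inside the alpha band.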

3.3 Model Implementation

LSTM—Long Short-Term Memory (Fig. 5):

Fig. 5
A model diagram includes layers of sigma and tan h. Inputs are C t minus 1, h t minus 1, and x t. Outputs are h t, c t, and h t.

Typical LSTM model [11]

  • This model uses an LSTM layer with 256 hidden units, followed by a dense layer with three output units.

  • The total number of parameters in this model is 2,221,059, of which all are trainable.

  • This is a simple model with relatively few parameters compared with the DNN described below. Such a model is a reasonable choice for a task with a small dataset or with modest accuracy requirements.
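The reported LSTM parameter total can be reproduced with simple arithmetic. The text does not give the input shape, so the sketch below assumes each sample is fed as a sequence of 2548 steps with one value per step, and that the LSTM outputs for all steps are flattened before the dense layer. These assumptions are not confirmed by the text, but they yield exactly the reported 2,221,059 parameters.

```python
def lstm_params(input_dim, units):
    """Trainable parameters of a standard LSTM layer (four gates, one bias vector each)."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units


TIME_STEPS = 2548  # assumed sequence length (not stated in the text)
INPUT_DIM = 1      # assumed number of values per step
UNITS = 256        # hidden units, as stated in the text
CLASSES = 3        # output units, as stated in the text

recurrent = lstm_params(INPUT_DIM, UNITS)
head = dense_params(TIME_STEPS * UNITS, CLASSES)  # dense layer on the flattened per-step outputs
total = recurrent + head
print(recurrent, head, total)  # 264192 1956867 2221059
```

Under these assumptions, most of the parameters sit in the dense output layer rather than in the recurrent layer itself.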

GRU (Fig. 6):

Fig. 6
A model diagram. Input is X t. The diagram consists of fully connected layers with activation functions such as sigma, sigma, and tan h. Input is concatenated and passes through these and connects to element-wise operators, X, X, X, plus. H t minus 1 is converted to H t tilde.

Typical GRU model [12]

  • This model uses a GRU layer with 256 hidden units, followed by a dense layer with three output units.

  • The total number of parameters in this model is 2,155,779, of which all are trainable.

  • This model is similar to the LSTM model, but it uses a GRU layer instead of an LSTM layer. GRU is a simpler type of RNN than LSTM, so this model has a slightly smaller number of parameters.
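The GRU total can be checked the same way. The sketch assumes a 2548-step input with one value per step, flattened per-step outputs, and a GRU variant that keeps separate input and recurrent bias vectors (the default behaviour of the Keras GRU layer in TensorFlow 2). None of this is stated in the text, but together the assumptions reproduce the reported 2,155,779 parameters.

```python
def gru_params(input_dim, units, separate_biases=True):
    """Trainable parameters of a GRU layer (three gates).

    separate_biases=True counts two bias vectors per gate (input and recurrent),
    which is an assumption about the implementation used.
    """
    biases_per_gate = 2 if separate_biases else 1
    return 3 * (units * (input_dim + units) + biases_per_gate * units)


def dense_params(input_dim, units):
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units


TIME_STEPS = 2548  # assumed sequence length (not stated in the text)
INPUT_DIM = 1      # assumed number of values per step
UNITS = 256        # hidden units, as stated in the text
CLASSES = 3        # output units, as stated in the text

recurrent = gru_params(INPUT_DIM, UNITS)
head = dense_params(TIME_STEPS * UNITS, CLASSES)
total = recurrent + head
print(recurrent, head, total)  # 198912 1956867 2155779
```

The GRU layer has three gates instead of the LSTM's four, which accounts for the slightly smaller total.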

DNN—Deep Neural Network (Fig. 7; Table 1):

Fig. 7
A schematic diagram. The input layer has 3 elements. These interact with 4 elements of hidden layer 1, 5 elements of hidden layer 2, and then the process continues until 5 elements of hidden layer N, ultimately generating 1 element of the output layer.

Typical DNN model [13]

Table 1 Models summary

  • This model uses six dense layers: five hidden layers with 2548, 3822, 5096, 3822, and 2548 units, respectively, each followed by a batch normalization layer and a dropout layer, and a final dense output layer with three units.

  • The total number of parameters in this model is 65,019,867, of which 64,984,195 are trainable.

  • This model is the most complex of the three models. It has a large number of parameters and uses a variety of techniques to improve its performance. This model is a good choice for a task with a large dataset or if you need to achieve high accuracy.
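The DNN parameter counts can also be verified by hand. The sketch assumes 2548 input features (not stated in the text) and a standard batch normalization layer with four parameters per unit, of which the scale and shift are trainable and the moving mean and variance are not. Dropout adds no parameters. Under these assumptions the totals match the reported figures.

```python
def dense_params(input_dim, units):
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units


INPUT_DIM = 2548                        # assumed number of input features
HIDDEN = [2548, 3822, 5096, 3822, 2548]  # hidden layer sizes, as stated in the text
CLASSES = 3                             # output units, as stated in the text

trainable = 0
non_trainable = 0
n_in = INPUT_DIM
for units in HIDDEN:
    trainable += dense_params(n_in, units)
    trainable += 2 * units      # batch normalization scale and shift
    non_trainable += 2 * units  # batch normalization moving mean and variance
    n_in = units
trainable += dense_params(n_in, CLASSES)

total = trainable + non_trainable
print(total, trainable, non_trainable)  # 65019867 64984195 35672
```

The 35,672 non-trainable parameters are the running statistics of the five batch normalization layers, which explains the gap between the total and trainable counts.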

4 Results

In this study, we presented a deep neural network (DNN) model designed for EEG-based emotion recognition. To train the model, we used the EEG Brainwave Dataset: Feeling Emotions, a well-known open dataset in the area. A training set was created using 70% of the dataset, while a test set was created using the remaining 30%.

We used classification accuracy (ACC) [19] to assess the model’s performance. ACC, a frequently used metric, is the percentage of correctly classified samples. Our proposed DNN model achieved an accuracy of 98.44%, outperforming the LSTM model (97.18%) and the GRU model (97.51%).

Tables 2, 3, 4 and 5 give the precision, recall, and F1-score of the proposed model for each emotion class.

Table 2 Precision, recall, and F1-score for the negative class
Table 3 Precision, recall, and F1-score for the neutral class
Table 4 Precision, recall, and F1-score for the positive class
Table 5 Negative, neutral, and positive classes

Figure 8 shows the training and validation accuracy and the training and validation loss for the LSTM model. The model is evaluated on a validation set, that is, data it has not seen during training, and the validation accuracy is the percentage of validation samples whose emotion class the model predicts correctly. Training was configured for 50 epochs, where an epoch is one complete pass through the training set. In the first epoch, the model reaches a validation accuracy of 0.8996, meaning that it classifies 89.96% of the validation samples correctly. In the second epoch, the validation accuracy improves to 0.9353. It continues to improve until epoch 6, when it reaches a maximum of 0.97768 (97.768%). After epoch 6, the validation accuracy does not improve any further, so the model has reached a plateau. Training is stopped at epoch 16 because the validation accuracy has not improved for 10 consecutive epochs. Early stopping of this kind is common practice in machine learning, as it helps prevent the model from overfitting the training data. The resulting model predicts the emotion class with 97.768% validation accuracy, a substantial improvement over random guessing, which would yield about 33% accuracy for three nearly balanced classes (Fig. 9).

Fig. 8
Two multiline graphs. A. It presents the variations of accuracy with the increase in epoch for training and validation accuracy curves. Both follow an increasing trend. B. It presents cross entropy versus epoch. It presents 2 descending curves for training and validation loss.

LSTM training and validation accuracy (top), training and validation loss (bottom)

Fig. 9
A normalized confusion matrix of true label versus predicted label. It is a 3 cross 3 matrix presenting the highest negative, neutral, and positive diagonal values of 0.97, 0.98, and 0.98 respectively.

Confusion matrix for LSTM model
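The early-stopping rule described for the LSTM run (stop once the validation accuracy has not improved for 10 consecutive epochs) can be sketched in a few lines. The `early_stopping` helper is hypothetical, and only the accuracies for epochs 1, 2, and 6 come from the text; all other values in the list are made up for illustration.

```python
def early_stopping(val_scores, patience):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once `patience` consecutive epochs have passed
    without beating the best validation score seen so far.
    """
    best_score = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_scores)


# Epochs 1, 2 and 6 use the reported accuracies; the rest are illustrative placeholders.
val_accuracy = [0.8996, 0.9353, 0.95, 0.96, 0.97, 0.97768] + [0.97] * 44

best_epoch, stop_epoch = early_stopping(val_accuracy, patience=10)
print(best_epoch, stop_epoch)  # 6 16
```

With the best score at epoch 6 and a patience of 10, the rule fires at epoch 16, which matches the stopping point reported for the LSTM model.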

Figure 10 shows the training and validation accuracy and the training and validation loss for the GRU model. The model was trained on a dataset of 33 samples, with each sample consisting of a sequence of EEG feature values. Training was configured for 50 epochs, and the loss and accuracy were evaluated on a validation set after each epoch. The model’s loss decreased over the first few epochs, and its accuracy on the validation set increased. After epoch 10, however, the validation accuracy did not improve any further, which suggests that the model was no longer learning from the data. The model’s best epoch was epoch 10, when its accuracy on the validation set was 0.96875, meaning that it correctly classified 96.875% of the validation samples. The model from epoch 10 was saved, and training was stopped early. Early stopping is a technique that helps prevent overfitting, which occurs when a model learns the training data too well and therefore fails to generalize to new data. Overall, the model performs well: it achieved high accuracy on the validation set and did not overfit the training data. However, it is possible that the model could be improved by training it for longer or by using a different set of hyperparameters (Fig. 11).

Fig. 10
Two multiline graphs. A. Accuracy versus epoch graph plots 2 increasing curves for training and validation accuracy. B. Cross entropy versus epoch curve plots 2 decreasing curves for training and validation loss. The curves for validation accuracy and validation loss fluctuate more.

GRU training and validation accuracy (top), training and validation loss (bottom)

Fig. 11
A normalized confusion matrix of true label versus predicted label. It is a 3 cross 3 matrix presenting the highest negative, neutral, and positive diagonal values of 0.98, 0.98, and 0.95 respectively.

Confusion matrix for GRU model

The training and validation accuracy and the training and validation loss for the deep neural network (DNN) model are shown in Fig. 12. A dataset with 33 samples was used to train the DNN model. Training was configured for 50 epochs, each being a full pass over the training set, and a separate validation set was used to evaluate the model’s performance. The model’s loss and accuracy on the training and validation sets were recorded after each epoch. Loss measures how closely the model’s predictions match the true labels, whereas accuracy measures the fraction of labels that the model predicts correctly. The results show that the training loss decreases and the training accuracy increases across the epochs, demonstrating an improving fit to the training data. After epoch 21, however, both the loss and the accuracy on the validation set plateau, indicating that the model has stopped learning new information and may have begun overfitting the training set.

Fig. 12
Two multiline graphs. A. The accuracy versus epoch graph presents 2 increasing curves for training and validation accuracy. B. The cross-entropy versus epoch graph presents an initially declining then constant validation loss curve and an almost constant training loss curve.

DNN training and validation accuracy (top), training and validation loss (bottom)

On the validation set, the model performs best at epoch 29, when it records a loss of 0.0995 and an accuracy of 0.9821, meaning that it correctly predicts 98.21% of the validation labels. Training is terminated at epoch 29 because the validation accuracy has not increased over the previous five epochs. Early stopping is a common technique in machine learning for avoiding overfitting (Fig. 13).

Fig. 13
A normalized confusion matrix of true label versus predicted label. It is a 3 cross 3 matrix presenting the highest negative, neutral, and positive diagonal values of 0.98, 0.99, and 0.98 respectively.

Confusion matrix for DNN model
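The normalized confusion matrices in Figs. 9, 11, and 13 can be understood through a small sketch. It assumes normalization by true label (each row sums to 1), which the figure descriptions suggest but the text does not state; the helper names and the toy predictions are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix with true labels as rows and predicted labels as columns."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its sum so that the diagonal holds the per-class recall."""
    normalized = []
    for row in matrix:
        row_total = sum(row)
        normalized.append([c / row_total if row_total else 0.0 for c in row])
    return normalized


labels = ["negative", "neutral", "positive"]
# Toy data (hypothetical): 4 negative, 4 neutral and 2 positive samples.
y_true = ["negative"] * 4 + ["neutral"] * 4 + ["positive"] * 2
y_pred = ["negative", "negative", "negative", "neutral",
          "neutral", "neutral", "neutral", "neutral",
          "positive", "negative"]

counts = confusion_matrix(y_true, y_pred, labels)
normalized = normalize_rows(counts)
print(counts)      # [[3, 1, 0], [0, 4, 0], [1, 0, 1]]
print(normalized)  # [[0.75, 0.25, 0.0], [0.0, 1.0, 0.0], [0.5, 0.0, 0.5]]
```

With row normalization, a diagonal entry such as 0.98 means that 98% of the samples of that true class were classified correctly.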

The classification task uses three classes: negative, neutral, and positive. The negative class covers recordings in which the participant thought of something that made them feel sad or angry. The neutral class covers recordings in which the participant felt neither happy nor sad. The positive class covers recordings in which the participant thought of something that made them feel happy or excited.

When the performance of the models is compared, the DNN model consistently achieves the highest precision, recall, F1-score, and accuracy for all three classes. The GRU model ranks second on these metrics across the three classes, and the LSTM model has the lowest precision, recall, F1-score, and accuracy for all three classes.

Here are some additional observations:

  • The DNN model has the highest precision for all three classes. This means that, of the instances the DNN model assigns to a given class, the largest share actually belong to that class.

  • The DNN model also has the highest recall for all three classes. This means that, of the instances that actually belong to a given class, the DNN model correctly identifies the largest share.

  • The GRU model has the highest F1-score for class 0. This means that the GRU model has the best balance of precision and recall for class 0.

  • The LSTM model has the lowest accuracy for class 2. This means that the LSTM model is the least accurate at predicting the labels of instances that belong to class 2.
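The metrics discussed in these observations can be computed directly from a confusion matrix. The sketch below is illustrative: the helper names are hypothetical and the matrix is toy data, not the results reported in Tables 2 to 5.

```python
def per_class_metrics(matrix):
    """Precision, recall and F1 per class from a square confusion matrix.

    Rows are true labels and columns are predicted labels.
    """
    n = len(matrix)
    results = []
    for c in range(n):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(n))  # column sum: all predicted as c
        actual = sum(matrix[c])                          # row sum: all truly of class c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


def accuracy(matrix):
    """Share of all samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


# Toy confusion matrix (hypothetical), classes ordered negative, neutral, positive.
toy = [[48, 1, 1],
       [0, 50, 0],
       [2, 0, 48]]

metrics = per_class_metrics(toy)
overall = accuracy(toy)
```

Precision looks down a column (how many predictions of a class were right), recall looks along a row (how many members of a class were found), and F1 is their harmonic mean.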

5 Limitations and Future Scopes

  • The dataset used in this study is relatively small. This could be a limitation, as it could lead to overfitting of the models.

  • The study only used EEG data from two people. This could also be a limitation, as it is not clear how well the models would generalize to data from other people.

  • The study only used three deep learning algorithms. It would be interesting to see how other algorithms, such as convolutional neural networks, perform on this task.

  • The study only classified emotions into three categories: positive, neutral, and negative. It would be interesting to see how the models would perform if they were asked to classify emotions into more categories.

Possible directions for future work on the project include:

  • Collect a larger dataset of EEG data from a more diverse population.

  • Use more deep learning algorithms to classify emotions.

  • Classify emotions into more categories.

  • Investigate the use of EEG data to track changes in emotions over time.

  • Investigate the use of EEG data to predict emotions in real-time.

  • Investigate the use of EEG data to improve human–computer interaction.

  • Investigate the use of EEG data to diagnose and treat mental health conditions.

Overall, the results of this study are promising. The deep learning algorithms achieved high accuracy in classifying emotions from EEG data. However, the study has some limitations, and there are many directions for future work in this area.

6 Conclusion

In this study, we investigated the use of deep learning algorithms in conjunction with EEG brainwave data to identify emotions. The data comprised EEG recordings of two people (one male and one female), each recorded for three minutes in each emotional state: positive, neutral, and negative. We trained three deep learning models and assessed their performance: a deep neural network (DNN), a long short-term memory (LSTM) network, and a gated recurrent unit (GRU) network. On the test set, the DNN model had the best accuracy, at 98.44%. We assessed precision, recall, and F1-score in addition to accuracy, and the results showed that all three algorithms classified emotions from the EEG data with high accuracy.

The findings of this study suggest that deep learning algorithms have potential in a range of healthcare applications, including the diagnosis of mental health disorders, pain management, and improving patient-provider communication. However, more investigation is required to address the identified shortcomings and to improve the accuracy and dependability of these systems.