1 Introduction

Emotions are intricately linked to human personality and significantly impact overall well-being. Emotion recognition is a valuable research area with applications across various fields. Two primary approaches are commonly employed in emotion recognition to enhance human-computer interaction: observable-based methods and physiological information-based methods.

Observable-based methods analyze facial expressions, body gestures, behavior, and voice, enabling emotion recognition through external observation. For example, Halbhuber focused on recognizing emotions based on facial expression features [1]. While this approach is effective at recognizing expressed emotions, it may overlook unexpressed emotions, and the observed information can differ from the genuine emotional state. For instance, individuals might fake a smile while experiencing sadness [2]. Alternatively, involuntary biological signals, such as the electrocardiogram (ECG), galvanic skin response (GSR), heartbeat, electroencephalogram (EEG), and eye movements, provide a more direct view of human emotions. Long-term low-valence emotions such as sadness and anger can significantly impact well-being, which is particularly relevant for younger adults, who may encounter various sources of low-valence emotions in their lives. Factors such as school demands and frustrations can cause low-valence emotions among young adults, so understanding and recognizing these emotions is crucial.

In this context, heart rate variability (HRV) has gained recognition as a noninvasive and objective index of the brain’s capacity to regulate emotional responses. It serves as a marker of individual differences in emotion regulatory capacity [3]. While some researchers have achieved high accuracy in emotion recognition using multimodal physiological data, this often requires multiple sensors, making the process complex and expensive. This study focuses on extracting HRV indexes from ECG data recorded with a single-electrode sensor, exploring whether simple devices can recognize high-valence and low-valence emotions. Using non-deep learning techniques, we analyze the Young Adult’s Affective Data (YAAD) dataset, which contains physiological information about young adults, to recognize and differentiate high-valence and low-valence emotions through HRV analysis (Fig. 1). In this paper, we present our method and the process of extracting features from HRV data. By comparing three non-deep learning models, we aim to demonstrate the feasibility and efficiency of the proposed method. The potential applications of this research include mental health monitoring, affective computing, and enhanced human-computer interaction. Through this work, we contribute to advancing emotion recognition techniques and understanding emotional well-being in young adults.

Fig. 1. Process flow chart

2 Data Description

2.1 YAAD Data Construction

The YAAD dataset contains data from 25 volunteer participants (10 female and 15 male). These participants, whose ages ranged from 8 to 25 years, watched 21 stimulus videos, each lasting 39 s [4]. To gather additional information, participants completed a self-assessment questionnaire. The dataset is organized into two configurations: one with single-modal ECG signals and the other with multi-modal ECG and GSR signals. The multi-modal dataset encompasses seven emotional states, namely happy, sad, anger, fear, disgust, surprise, and neutral. Each ECG recording contains 5000 samples at a sampling frequency of 128 Hz.

2.2 Problems with the YAAD Dataset

We first used HRV indexes to classify the emotions in the YAAD dataset into eight categories and into four categories, but the accuracy was very low. Even after feature extraction, the four-category accuracy was only 58%. Further analysis showed that the four-category split of the YAAD dataset is imbalanced. Because low-valence low-arousal (LVLA) samples are the most numerous, their recognition accuracy is the highest. Conversely, high-valence high-arousal (HVHA) samples are too few, and their recognition rate is very low (Fig. 2).
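The imbalance described above can be checked before training by counting samples per class and deriving inverse-frequency weights. The following is a minimal sketch only: the label list is a hypothetical placeholder (not the actual YAAD counts in Fig. 2), the names LVHA and HVLA are assumed labels for the remaining two quadrants, and the weighting formula n / (k * count) is a common heuristic rather than a step this study reports using.

```python
from collections import Counter

def class_weights(labels):
    """Return inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical quadrant labels (NOT the real YAAD distribution).
labels = ["LVLA"] * 50 + ["LVHA"] * 25 + ["HVLA"] * 15 + ["HVHA"] * 10
weights = class_weights(labels)
for cls, w in sorted(weights.items()):
    print(f"{cls}: weight {w:.2f}")
```

Rare classes receive larger weights, which makes an imbalance like the one in Fig. 2 visible at a glance and could be passed to a classifier that supports per-class weighting.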

Fig. 2. Number of emotions in four categories

2.3 Emotion Estimation Toward Stimulus

In emotion recognition, the Arousal-Valence space model is frequently used (Fig. 3). This model categorizes emotions along two dimensions: arousal, representing the level of calmness or agitation, and valence, indicating whether the emotion is positive or negative. In this study, we divided the YAAD dataset into two groups: high-valence emotions (happy and surprise) and low-valence emotions (sad, anger, fear, and disgust). The high-valence group comprises 138 data samples, while the low-valence group comprises 143 data samples. Each data sample is 39 s long, contains 5000 data points, and is sampled at 128 Hz. To support emotion estimation from physiological indexes, the dataset includes subjective evaluations of arousal and valence for the 21 stimulus videos, collected using the self-assessment manikin (SAM) (Fig. 4). SAM is a widely referenced emotion classification method in psychology that represents basic human emotions on the axes of arousal and valence [5]. The questionnaire assesses three aspects of emotional response that have been extensively studied in emotion research. It comprises single-item scales that gauge the valence/pleasure of the response (from positive to negative), perceived arousal (from high to low), and perceived dominance (from out of control to in control). This established emotion classification model is commonly used in studies that analyze physiological indexes to estimate emotions [6].
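The two-group split described above amounts to a fixed mapping from emotion labels to valence groups. The sketch below illustrates it under stated assumptions: the record layout (a feature list paired with an emotion string) is hypothetical, and dropping labels outside both groups, such as neutral, is an assumption inferred from the group definitions rather than a documented step.

```python
# Grouping taken from the text: happy and surprise are high-valence;
# sad, anger, fear, and disgust are low-valence.
VALENCE_GROUP = {
    "happy": "high", "surprise": "high",
    "sad": "low", "anger": "low", "fear": "low", "disgust": "low",
}

def to_valence_group(emotion):
    """Map an emotion label to 'high' or 'low'; return None for labels in neither group."""
    return VALENCE_GROUP.get(emotion.strip().lower())

def binarize(samples):
    """Keep only (features, emotion) pairs whose emotion belongs to one of the two groups."""
    out = []
    for features, emotion in samples:
        group = to_valence_group(emotion)
        if group is not None:  # assumption: labels such as "neutral" are dropped
            out.append((features, group))
    return out

# Hypothetical records; the feature values are placeholders.
samples = [([0.1], "happy"), ([0.2], "neutral"), ([0.3], "Fear")]
print(binarize(samples))
```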

Fig. 3. The Arousal-Valence space model

Fig. 4. The self-assessment manikin

3 Related Work

Several studies have explored the connections between emotions and the functioning of the autonomic nervous system (ANS). Kreibig conducted a review of many articles that investigated emotional ANS responses in healthy individuals and the selection of physiological measures to assess ANS reactivity [7]. The review concluded that there is significant specificity in ANS responses to emotions, particularly for certain distinct emotional states. Levenson also previously examined the ANS as an indicator for detecting emotional activity [8]. Some scientific research has considered the distinct roles of sympathetic and parasympathetic divisions in ANS function. For instance, studies have shown sympathetic activation and vagal deactivation in cases of anxiety [9]. However, it’s worth noting that there is a considerable variation in autonomic signatures during emotional activity [7].

Emotion recognition has garnered significant attention in recent years, becoming an active area of research across various interdisciplinary fields, including machine learning, signal processing, and social and cognitive psychology [10]. Halbhuber explored emotion recognition based on facial expression and feature extraction [1]. Subsequently, Datcu and Rothkrantz extended this line of work by using both acoustic and visual information for emotion recognition [11].

Muhammad Najam Dar et al. employed deep learning to recognize emotions in the YAAD dataset, classifying them into four and eight categories using raw ECG data. They achieved accuracy rates of 69.66% and 66.64%, respectively [2]. However, their method divided each ECG recording into one-second segments, yielding 13,804 segments from 406 original recordings. This strategy may result in overfitting due to the high similarity among segments from the same recording, potentially affecting the reliability of the reported accuracy.
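The overfitting concern can be made concrete with a small sketch. It splits a recording into one-second segments and then assigns whole recordings, not individual segments, to the training or test set, so that near-identical segments from one recording never appear on both sides. This illustrates the general idea only and does not reproduce the procedure in [2]; the function names, the 20% test fraction, and the placeholder recording are assumptions.

```python
import random

FS = 128             # sampling frequency in Hz (from the dataset description)
SEGMENT_SECONDS = 1  # segment length discussed in the text

def segment(recording, fs=FS, seconds=SEGMENT_SECONDS):
    """Split one recording into non-overlapping fixed-length segments, dropping the remainder."""
    size = fs * seconds
    return [recording[i:i + size] for i in range(0, len(recording) - size + 1, size)]

def grouped_split(recording_ids, test_fraction=0.2, seed=0):
    """Assign whole recordings to train or test so no recording contributes to both."""
    ids = sorted(set(recording_ids))
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return set(ids[n_test:]), set(ids[:n_test])

# A placeholder recording of 5000 samples, the length stated for YAAD.
recording = [0.0] * 5000
segments = segment(recording)
print(len(segments), len(segments[0]))  # 39 segments of 128 samples each

train_ids, test_ids = grouped_split(range(10))
print(sorted(train_ids), sorted(test_ids))
```

Splitting at the segment level instead would let segments of the same recording land in both sets, which is the similarity problem noted above.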

Suzuki, K. et al. proposed an emotion estimation model based on EEG and HRV indexes. Their model development process included feature extraction and selection algorithms to identify patterns in the EEG and HRV data associated with different emotional states [12]. The features they extracted and selected from raw EEG and ECG data significantly enhanced emotion recognition accuracy.

4 Methodology and Experiments

4.1 Feature Extraction

HRV refers to the physiological variation in the time interval between consecutive heartbeats. In this study, we employed two widely used methods for HRV index extraction: time-domain and frequency-domain analyses. HRV indexes are known to be influenced by the activity of the sympathetic and parasympathetic nervous systems. The standard deviation of NN intervals (SDNN) and the root mean square of successive differences (RMSSD) are essential time-domain markers of heart rate variability, providing insights into autonomic nervous system activity, which is modulated by emotions. High-valence emotions are often associated with higher HRV, reflected in elevated SDNN and RMSSD values, while low-valence emotions and stress are linked to lower HRV, leading to reduced SDNN and RMSSD values. Frequency-domain HRV indexes, such as low-frequency (LF) and high-frequency (HF) power, are derived from the pulse wave signal using the fast Fourier transform (FFT). LF is generally considered to reflect both sympathetic and parasympathetic nerve activity, while HF mainly reflects parasympathetic nerve activity [9]. The LF/HF ratio serves as an indicator of the balance between these two branches, with a higher ratio suggesting sympathetic dominance and a lower ratio indicating parasympathetic dominance. Human emotions can be evaluated using the LF/HF ratio [9].
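The indexes named above can be sketched from a series of NN (beat-to-beat) intervals in milliseconds. This is a minimal illustration rather than the study's implementation. Several details are assumptions not given in the text: the band edges (0.04 to 0.15 Hz for LF, 0.15 to 0.40 Hz for HF), the 4 Hz resampling rate, and linear interpolation. A direct discrete Fourier transform stands in for the FFT to keep the sketch dependency-free, and power is reported in arbitrary units.

```python
import cmath
import math
import statistics

def sdnn(rr_ms):
    """Sample standard deviation of the NN intervals, in ms."""
    return statistics.stdev(rr_ms)

def rmssd(rr_ms):
    """Root mean square of successive differences between adjacent NN intervals, in ms."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

def resample_rr(rr_ms, fs=4.0):
    """Linearly interpolate the unevenly spaced interval series onto an even grid of fs Hz."""
    times, t = [], 0.0
    for rr in rr_ms:
        t += rr / 1000.0
        times.append(t)  # beat times in seconds
    n = int((times[-1] - times[0]) * fs) + 1
    out, j = [], 0
    for k in range(n):
        tk = times[0] + k / fs
        while j < len(times) - 2 and times[j + 1] < tk:
            j += 1
        frac = (tk - times[j]) / (times[j + 1] - times[j])
        out.append(rr_ms[j] + frac * (rr_ms[j + 1] - rr_ms[j]))
    return out

def band_power(x, fs, lo, hi):
    """Sum periodogram power (arbitrary units) over DFT bins with lo <= f < hi."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    power = 0.0
    for k in range(1, n // 2 + 1):
        if lo <= k * fs / n < hi:
            s = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(centered))
            power += abs(s) ** 2 / n
    return power

def hrv_features(rr_ms, fs=4.0):
    """Compute SDNN, RMSSD, LF, HF, and LF/HF; the band edges are assumed conventions."""
    x = resample_rr(rr_ms, fs)
    lf = band_power(x, fs, 0.04, 0.15)
    hf = band_power(x, fs, 0.15, 0.40)
    return {"SDNN": sdnn(rr_ms), "RMSSD": rmssd(rr_ms),
            "LF": lf, "HF": hf, "LF/HF": lf / hf if hf > 0 else float("inf")}

# Synthetic NN intervals (ms) with a 0.25 Hz oscillation, i.e. inside the assumed HF band.
rr, t = [], 0.0
for _ in range(120):
    interval = 800.0 + 50.0 * math.sin(2 * math.pi * 0.25 * t)
    rr.append(interval)
    t += interval / 1000.0
features = hrv_features(rr)
print({name: round(value, 2) for name, value in features.items()})
```

Because the synthetic oscillation sits in the HF band, the resulting LF/HF ratio is below 1. For a 39 s recording the frequency resolution is coarse, so LF estimates from such short windows should be read with caution.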

4.2 The Base Classifier

Due to the limited size of the YAAD dataset, this research employed three non-deep learning classification models, namely Support Vector Machine (SVM), Logistic Regression (LR), and K-Nearest Neighbors (KNN), for binary emotion classification. This study focused on feature extraction from HRV data, aiming to obtain indexes that can enhance recognition accuracy. To explore the relationship between young adults’ ECG data and emotions, we solely utilized the YAAD dataset. Previous research has suggested that HRV alone may suffice for emotion estimation, obviating the need for additional physiological information and contributing to the simplification of emotion estimation technology [12].

Support Vector Machine. SVM is a classical algorithm that facilitates the solution of high-dimensional and small-sample size problems. It can also handle the interaction of nonlinear features with high generalization ability.

Logistic Regression. Logistic regression is a widely used supervised learning algorithm for predicting a categorical dependent variable from a given set of independent variables. In this study, the dependent variable is binary: high-valence or low-valence emotion.
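As a rough illustration of how logistic regression maps features to a binary class, the sketch below fits weights by batch gradient descent on the log-loss. It is not the configuration used in this study: the learning rate, epoch count, and the four two-feature training points (imagined as standardized HRV indexes) are all hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 (high-valence) if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical standardized features; 1 = high-valence, 0 = low-valence.
X = [[-1.2, -0.8], [-0.9, -1.1], [1.0, 0.9], [1.1, 1.3]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
print([predict(w, b, x) for x in X])
```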

The K-Nearest Neighbors Algorithm. The k-nearest neighbors algorithm is a non-parametric, supervised learning classifier that uses proximity to classify or predict the grouping of an individual data point. While it can be used for either regression or classification problems, it is typically used as a classification algorithm, working on the assumption that similar points lie near one another (Table 1).
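The proximity idea behind k-nearest neighbors can be sketched in a few lines: find the k training points closest to a query and take a majority vote over their labels. The choice of k = 3, the Euclidean distance, and the toy feature vectors are assumptions for illustration; the text does not state which settings were used.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training points (Euclidean distance)."""
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical standardized feature vectors with their valence labels.
train_X = [[-1.0, -1.0], [-0.8, -1.2], [-1.1, -0.7],
           [1.0, 1.1], [0.9, 0.8], [1.2, 1.0]]
train_y = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(train_X, train_y, [0.7, 0.9]))
print(knn_predict(train_X, train_y, [-0.9, -0.9]))
```

Because the vote depends on raw distances, features on very different scales (for example, SDNN in ms and a unitless LF/HF ratio) would normally be standardized first.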

Table 1. Heart Rate Variability indexes used in this study.

Fig. 5. The SVM confusion matrix before feature extraction

5 Result

To assess the effect of feature extraction on emotion recognition accuracy, we compare two inputs: the raw ECG data from the YAAD dataset (before feature extraction) and the HRV indexes extracted from that raw ECG data (after feature extraction).

5.1 The SVM Confusion Matrix

Before feature extraction, the SVM model achieved an accuracy of 48.28% (Fig. 5). The confusion matrix shows that for high-valence emotions, it correctly classified 6 instances while misclassifying 5 instances. Similarly, for low-valence emotions, it correctly classified 8 instances but misclassified 10 instances. After feature extraction, the SVM model achieved a test accuracy of 83% (Fig. 6). The confusion matrix shows that for high-valence emotions, it correctly classified 14 instances and misclassified 4 instances. For low-valence emotions, it correctly classified 10 instances and misclassified only 1 instance.
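The reported SVM accuracies follow directly from the confusion-matrix counts, as the short calculation below shows. It treats high-valence as the positive class and assumes the "misclassified" counts are grouped by true class; the precision and recall it derives rest on that assumption and may be computed differently in Table 2.

```python
def metrics(tp, fn, tn, fp):
    """Accuracy, precision, and recall for the positive (high-valence) class."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# Counts from the text: tp/fn are high-valence correct/misclassified,
# tn/fp are low-valence correct/misclassified.
before = metrics(tp=6, fn=5, tn=8, fp=10)
after = metrics(tp=14, fn=4, tn=10, fp=1)

print(f"before: accuracy {before['accuracy']:.2%}")  # 48.28%
print(f"after:  accuracy {after['accuracy']:.2%}")   # 82.76%, reported as 83%
print(f"after:  precision {after['precision']:.3f}, recall {after['recall']:.3f}")
```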

Fig. 6. The SVM confusion matrix after feature extraction

5.2 The LR Confusion Matrix

Before feature extraction, the LR model achieved a higher accuracy than SVM, about 68.97% (Fig. 7). The confusion matrix shows that for high-valence emotions, it correctly classified 9 instances and misclassified 2 instances. For low-valence emotions, it correctly classified 11 instances and misclassified 7 instances. After feature extraction, the LR model achieved a test accuracy of 76% (Fig. 8). The confusion matrix shows that for high-valence emotions, it correctly classified 12 instances and misclassified 4 instances. For low-valence emotions, it correctly classified 10 instances and misclassified 3 instances.

Fig. 7. The LR confusion matrix before feature extraction

Fig. 8. The LR confusion matrix after feature extraction

5.3 The KNN Confusion Matrix

Before feature extraction, the KNN model also achieved an accuracy of about 68.97% (Fig. 9). The confusion matrix shows that for high-valence emotions, it correctly classified 13 instances and misclassified 6 instances. For low-valence emotions, it correctly classified 7 instances and misclassified 3 instances. After feature extraction, the KNN model also achieved a test accuracy of 76% (Fig. 10). For high-valence emotions, it correctly classified 10 instances and misclassified 2 instances. For low-valence emotions, it correctly classified 12 instances and misclassified 5 instances.

Fig. 9. The KNN confusion matrix before feature extraction

Fig. 10. The KNN confusion matrix after feature extraction

5.4 The Result After Feature Extraction

Tables 2, 3, and 4 display the outcomes of recognizing high-valence and low-valence emotions using ECG data with SVM, LR, and KNN. The results reveal that SVM achieves approximately 7 percentage points higher accuracy than LR and KNN, which both reach 76%. In summary, the SVM model achieved the highest accuracy among the three non-deep learning models, reaching 83%. Moreover, SVM demonstrated superior precision and recall for both classes compared to the other two models. The KNN model displayed balanced performance in terms of precision and recall for both classes but had a lower accuracy than SVM. The LR model also performed reasonably well, matching KNN in accuracy but remaining below SVM.
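The accuracies compared above can be re-derived from the confusion-matrix counts given in Sections 5.1 to 5.3. The sketch below does only that arithmetic; the tuple layout and variable names are illustrative choices, not part of the original analysis.

```python
# (correct_high, wrong_high, correct_low, wrong_low) after feature extraction,
# copied from the counts reported in Sections 5.1 to 5.3.
after_counts = {
    "SVM": (14, 4, 10, 1),
    "LR": (12, 4, 10, 3),
    "KNN": (10, 2, 12, 5),
}

def accuracy(counts):
    """Fraction of test instances classified correctly."""
    correct_high, _wrong_high, correct_low, _wrong_low = counts
    return (correct_high + correct_low) / sum(counts)

accuracies = {name: accuracy(counts) for name, counts in after_counts.items()}
for name, acc in accuracies.items():
    print(f"{name}: {acc:.2%}")

gap = accuracies["SVM"] - accuracies["LR"]
print(f"SVM minus LR (and KNN): {gap * 100:.1f} percentage points")
```

Each model is evaluated on 29 test instances, so a difference of two correctly classified instances accounts for the whole gap between SVM and the other two models.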

Table 2. The results of classifying high-valence and low-valence emotions with SVM.
Table 3. The results of classifying high-valence and low-valence emotions with LR.
Table 4. The results of classifying high-valence and low-valence emotions with KNN.

6 Conclusion

In this study, we explored emotion recognition in young adults through heart rate variability using three non-deep learning models. Our method focused on recognizing high-valence and low-valence emotions in the Young Adult’s Affective Data (YAAD) dataset and used feature extraction to improve recognition accuracy. Our experimental results showed that SVM achieved the highest accuracy among the three non-deep learning classifiers and demonstrated superior precision and recall for both high-valence and low-valence emotions, making it a suitable choice for this emotion recognition task. Using HRV as a physiological indicator for emotion recognition provides valuable insights into autonomic nervous system activity, which is influenced by emotional states. HRV indexes such as SDNN, RMSSD, LF, HF, and the LF/HF ratio provide an understanding of emotional well-being and its implications for overall health. Our research contributes to the advancement of emotion recognition technology, presenting an efficient and practical approach to recognizing emotions in young adults. The implications of this work extend to various domains, including mental health monitoring, affective computing, and human-computer interaction. In future work, we aim to explore additional features and extend the dataset to include more diverse populations, enabling better generalization and robustness of emotion recognition models. Additionally, combining HRV data with other physiological information could provide deeper insights into the complex interplay of emotions and physiological responses.