1 Introduction

Emotions are an important part of human behaviour. Emotional state alters, and can even reverse, the decisions we make in everyday tasks [1]. In the case of autism, mental disability in children can also be diagnosed through their expression of emotions. Many researchers are working towards computer-based emotion recognition using different modalities [2]. Various modalities have been used in the literature for emotion recognition, such as speech signals, facial expressions [3], body movement and context-aware systems [4]. These modalities are unreliable because a person can control them to hide actual emotions; for example, a person may smile even when sad. To overcome this limitation, many researchers have turned to physiological signals as a modality for recognizing true emotions [5]. Physiological signal recording includes electroencephalography (EEG), electrocardiography (ECG), electrooculography (EOG) and electromyography (EMG), among others. A change in emotional state causes changes in the patterns or structure of these physiological signals, which cannot be controlled by the person and hence cannot be faked [6].

At present, a large number of studies report the use of EEG signals for emotion recognition [5]. In [7], the authors performed dimensionality reduction by selecting optimal EEG features with an evolutionary computational algorithm on EEG data recorded using mobile EEG sensors. A machine learning approach that extracts time–frequency features using the dual-tree complex wavelet packet transform (DT-CWPT) to detect emotions is presented in [8]. Common spatial pattern analysis is also frequently used in EEG signal analysis and emotion classification [9]. Many researchers have started using other physiological signals, such as ECG, electrodermal activity (EDA) and respiration (RSP), for emotion recognition. An automated emotion classification method for children using EDA signals is presented in [10]; the dataset contains EDA signals from 100 children younger than 30 months, and a support vector machine (SVM) classifier is used to classify three emotions: joy, boredom and acceptance. In [11], the authors applied a deep learning technique for emotion classification using the respiration signal. ECG signal-based emotion recognition is presented in [12], where two emotions, joy and sadness, are classified; the continuous wavelet transform is used to extract features from the ECG signal, and hierarchical particle swarm optimization (HPSO) selects the best features, which are then classified using a Fisher classifier.

The effect of a change in emotional state is reflected in physiological signals. The ECG represents the electrical activity of the heart, and emotional changes cause changes in heart rate [13]. EMG signals record the electrical activity of skeletal muscle; they reflect the state of muscle and nerve and also mirror emotional changes [14]. The EEG represents the electrical activity inside the brain; changes in emotional state can be characterized by the activation of particular brain areas and by the functional interconnections between the different parts of the brain involved in generating the emotion [15]. EOG signals record the corneo-retinal potential generated during eye movement and can be used to characterize facial expression. The use of the EOG signal in the field of emotion recognition is not widely reported, although it is used in human–computer interfaces and clinical applications [16]. A limitation of the papers reported above is their modest accuracy, often caused by improper cross-validation schemes; both accuracy and validation scheme are compared at the end of this paper.

In this study, an attempt has been made to classify emotions on the basis of EOG and EMG signals. Horizontal and vertical EOG signals, together with zygomaticus major and trapezius EMG signals, have been used to recognize the emotion. Three types of features are extracted: time domain, frequency domain and entropy-based features. These features are then classified using support vector machine, naive Bayes and artificial neural network classifiers. The novelty and main contribution of this paper is the identification of the best pair of feature type and classifier to achieve maximum classification accuracy. The performance of each classifier on each type of feature is analysed through the following performance parameters: accuracy, average recall and average precision.

The paper is organized into five sections, starting with the current introduction. Section 2 covers the material and methods, including the database description, the flow of the study and the feature extraction techniques applied. Section 3 explains the classification techniques. Sections 4 and 5 present the results and conclusions, respectively.

2 Material and methods

The required EOG and EMG signals are taken from the DEAP database (Database for Emotion Analysis using Physiological signals). This database contains EEG, EOG, EMG and other physiological signals. It consists of recordings from 32 subjects over 40 channels, made during audio-visual stimulation. Each subject rated each audio-visual stimulus on levels of arousal, valence, liking and dominance. Out of the 40 channels, four (channels 33 to 36, as numbered in the DEAP dataset) contain the EOG and EMG signals used in this study. Figures 1, 2, 3 and 4 show the structure of the EOG and EMG signals in four emotional states (happy, relaxing, sad and angry). Table 1 and Fig. 5a show where the electrodes are placed and the required channel calculations [17].
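For readers who wish to reproduce the channel selection, a minimal sketch is given below. It assumes the preprocessed Python version of the DEAP dataset (pickled .dat files, one per subject, each holding a data array of 40 trials × 40 channels × 8064 samples recorded at 128 Hz and a labels array of 40 × 4 self-assessment ratings); the file path is illustrative.

```python
import pickle

# Illustrative path; the preprocessed DEAP files are named s01.dat ... s32.dat.
with open("data_preprocessed_python/s01.dat", "rb") as f:
    subject = pickle.load(f, encoding="latin1")

data = subject["data"]      # (40 trials, 40 channels, 8064 samples) at 128 Hz
labels = subject["labels"]  # (40, 4): valence, arousal, dominance, liking

# Channels 33-36 (1-based) carry hEOG, vEOG, zygomaticus EMG and trapezius EMG.
eog_emg = data[:, 32:36, :]
print(eog_emg.shape)        # (40, 4, 8064)
```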

Fig. 1 Structure of horizontal EOG signal for different emotional conditions: (a) happy, (b) relaxing, (c) sad, (d) angry

Fig. 2 Structure of vertical EOG signal for different emotional conditions: (a) happy, (b) relaxing, (c) sad, (d) angry

Fig. 3 Structure of zygomaticus major EMG signal for different emotional conditions: (a) happy, (b) relaxing, (c) sad, (d) angry

Fig. 4 Structure of trapezius EMG signal for different emotional conditions: (a) happy, (b) relaxing, (c) sad, (d) angry

Table 1 (a) List of electrode names and their locations and (b) final channels with required calculation
Fig. 5 Affect recognition based on physiological changes while watching music videos: (a) placement of EOG (1–4) and EMG (5–8) electrodes [18]; (b) process and flow of the study

After obtaining the required signals, features are extracted. Time domain features include the root mean square value, signal power, first difference, line length and second difference. Frequency domain features include dominant power, dominant frequency and total wavelet energy. Entropy-based features are Shannon entropy, spectral entropy and sample entropy [18, 19]. After extracting the features, the next step is to classify the emotions. In this study, three classifiers are used: SVM, artificial neural network (ANN) and naive Bayes (NB). The entire process and flow of the study are illustrated in Fig. 5b.

2.1 Feature extraction

The feature extraction process extracts characteristic properties of the signal. Time domain features are computed from the signal as a function of time. In the frequency domain, the signal is analysed with respect to frequency rather than time; to extract frequency domain features, the signal first needs to be transformed into the frequency domain. Entropy, in general terms, refers to uncertainty or disorder and represents the complexity of physiological signals [18,19,20]. A detailed description of each feature is given below:

2.2 Time domain features

2.2.1 Root mean square (RMS)

For a signal, the RMS value is calculated by taking the arithmetic mean of the squared signal values and then taking the square root of the result. The square of the RMS value equals the average power of the signal [19]. Equation (1) shows the formula for calculating the RMS,

$$ {\text{RMS}} = \sqrt {\frac{1}{N}\mathop \sum \limits_{n = 1}^{N} x\left( n \right)^{2} } $$
(1)

2.2.2 Signal power (SP)

In general terms, the amount of energy delivered per unit time is called the signal power. It is calculated by taking a set of samples, computing their energy and dividing that energy by the number of samples. The number of samples can be extended to \(\infty\) to calculate the total power [19]. Signal power can be calculated using Eq. (2),

$$ {\text{SP}} = \frac{1}{2N + 1}\mathop \sum \limits_{n = - N}^{N} x\left( n \right)^{2} $$
(2)

2.2.3 Line length (LL)

This measure is sensitive to variations in the amplitude and frequency of the signal and captures dimensionality changes in the waveform. It is calculated as the sum of the absolute differences between consecutive samples of the signal [18], as given in Eq. (3),

$$ {\text{LL}} = \mathop \sum \limits_{n = 2}^{N} \left| {x\left( n \right) - x\left( {n - 1} \right)} \right| $$
(3)

2.2.4 First difference (D1)

This time domain feature is calculated as the mean of the absolute differences between consecutive sample pairs of the signal [19]. The first difference of the signal is calculated using Eq. (4),

$$ D1 = \frac{1}{N - 1}\mathop \sum \limits_{n = 1}^{N - 1} \left| {x\left( {n + 1} \right) - x\left( n \right)} \right| $$
(4)

2.2.5 Second difference (D2)

The second difference feature is calculated in the same way as the first difference, but with a lag of two samples and N − 2 in place of N − 1 [19]. The second difference is calculated using Eq. (5),

$$ D2 = \frac{1}{N - 2}\mathop \sum \limits_{n = 1}^{N - 2} \left| {x\left( {n + 2} \right) - x\left( n \right)} \right| $$
(5)
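As a concrete illustration, a minimal NumPy sketch of the five time domain features of Eqs. (1)–(5) is given below; x is assumed to be a one-dimensional signal array.

```python
import numpy as np

def time_domain_features(x):
    """Compute the time domain features of Eqs. (1)-(5) for a 1-D signal x."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))        # Eq. (1), root mean square
    sp = np.mean(x ** 2)                  # Eq. (2), power over the given samples
    ll = np.sum(np.abs(np.diff(x)))       # Eq. (3), line length
    d1 = np.mean(np.abs(x[1:] - x[:-1]))  # Eq. (4), mean absolute first difference
    d2 = np.mean(np.abs(x[2:] - x[:-2]))  # Eq. (5), mean absolute second difference
    return {"RMS": rms, "SP": sp, "LL": ll, "D1": d1, "D2": d2}
```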

2.3 Frequency domain features

2.3.1 Dominant power (DP)

Of all the peaks present in the spectrum of a signal, the power associated with the dominant (largest) peak is called the dominant power [18]. Equation (6) represents the calculation of the dominant power,

$$ DP = \max \left( {abs\left( {FFT\left( x \right)} \right)^{2} } \right) $$
(6)

2.3.2 Dominant frequency (DF)

The dominant frequency is the frequency associated with the dominant peak, i.e. the frequency at which the dominant power occurs [19]. It is obtained from Eq. (7),

$$ {\text{DF}} = {\text{Frequency}} \left( {{\text{Position}}\left( {{\text{DP}}} \right)} \right) $$
(7)
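A minimal sketch of Eqs. (6) and (7) using NumPy's FFT follows; fs is the sampling frequency (128 Hz for the DEAP signals) and only the non-negative half of the spectrum is searched.

```python
import numpy as np

def dominant_power_and_frequency(x, fs=128.0):
    """Return the dominant power (Eq. 6) and its frequency (Eq. 7)."""
    x = np.asarray(x, dtype=float)
    spectrum = np.abs(np.fft.rfft(x)) ** 2       # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)  # matching frequency axis
    k = np.argmax(spectrum)                      # index of the dominant peak
    return spectrum[k], freqs[k]                 # DP, DF
```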

2.3.3 Total wavelet energy (TWE)

The sum of the energies calculated from the sub-bands of a decomposed signal is called the total wavelet energy: the average energy over the detailed sub-bands (D1–D4) is added to the energy of the approximate sub-band (A4) [18]. Equation (8) gives the formula for the total wavelet energy,

$$ {\text{TWE}} = \left( {\frac{{\mathop \sum \nolimits_{i = 1}^{{N_{d} }} D_{{E_{i} }} }}{{N_{d} }}} \right) + A_{E} , $$
(8)

where \(D_{E_{i}}\) is the energy of the ith detailed sub-band, \(A_{E}\) is the energy of the approximation sub-band and \(N_{d}\) is the number of detailed sub-bands.
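Equation (8) can be sketched with the PyWavelets package as below; the wavelet family ('db4') and the 4-level decomposition are assumptions, since the paper does not state them.

```python
import numpy as np
import pywt

def total_wavelet_energy(x, wavelet="db4", level=4):
    """Total wavelet energy per Eq. (8)."""
    coeffs = pywt.wavedec(np.asarray(x, dtype=float), wavelet, level=level)
    approx, details = coeffs[0], coeffs[1:]         # [A4, D4, D3, D2, D1]
    a_energy = np.sum(approx ** 2)                  # approximation sub-band energy
    d_energies = [np.sum(d ** 2) for d in details]  # energy of each detailed sub-band
    return np.mean(d_energies) + a_energy           # mean detail energy plus A4 energy
```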

2.4 Entropy based features

2.4.1 Spectral entropy

The spectral entropy is a way to observe the irregularity, disorder or complexity of a physiological signal [18]. Equation (9) represents the calculation of spectral entropy,

$$ H_{S} \left( x \right) = - \frac{1}{{\log N_{f} }}\mathop \sum \limits_{f} P_{f} \left( x \right)\log_{e} P_{f} \left( x \right) $$
(9)

Here, \(P_{f} \left( x \right)\) is the power spectral density of signal x normalized to sum to one, i.e. treated as a probability distribution over frequency, and \(N_{f}\) is the number of frequency components in the power spectral density estimate.
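A sketch of Eq. (9) is given below, using SciPy's periodogram as the power spectral density estimate (an assumption; any PSD estimator would serve).

```python
import numpy as np
from scipy.signal import periodogram

def spectral_entropy(x, fs=128.0):
    """Normalized spectral entropy per Eq. (9)."""
    _, psd = periodogram(np.asarray(x, dtype=float), fs=fs)
    psd = psd[psd > 0]       # drop zero bins to keep the log finite
    p = psd / psd.sum()      # normalize the PSD into a probability distribution
    return -np.sum(p * np.log(p)) / np.log(len(p))
```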

2.4.2 Shannon entropy

The Shannon entropy is computed by applying a histogram estimate of the probability density function in Shannon's channel entropy formula [18]. Equation (10) gives the formula for the Shannon entropy,

$$ H_{SH} \left( x \right) = - \mathop \sum \limits_{h} P_{h} \left( x \right)\log_{e} P_{h} \left( x \right). $$
(10)

Here, \(P_{h}\) is the histogram estimate of the probability density function of signal x.
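The histogram-based estimate of Eq. (10) can be sketched as follows; the number of bins is an assumption.

```python
import numpy as np

def shannon_entropy(x, bins=32):
    """Shannon entropy per Eq. (10), with a histogram estimate of the PDF."""
    counts, _ = np.histogram(np.asarray(x, dtype=float), bins=bins)
    p = counts[counts > 0] / counts.sum()   # empirical bin probabilities
    return -np.sum(p * np.log(p))
```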

2.4.3 Sample entropy

Sample entropy is a modified version of the approximate entropy used to quantify signal complexity. It has some advantages over approximate entropy, including straightforward implementation and independence from data length [19]. Sample entropy is calculated using Eq. (11),

$$ S = { } - {\text{log}}\frac{A}{B} $$
(11)

where A is the number of template vector pairs of length m + 1 satisfying \(d\left[ {X_{m + 1} \left( i \right),X_{m + 1} \left( j \right)} \right] < r\), while B is the number of template vector pairs of length m satisfying \(d\left[ {X_{m} \left( i \right),X_{m} \left( j \right)} \right] < r\).

In most cases, the value of r is chosen as 0.2 times the standard deviation of the signal, and the value of m is chosen as 2.
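A direct (unoptimized) sketch of Eq. (11) with these defaults follows; the Chebyshev distance is assumed, as is usual for sample entropy.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample entropy per Eq. (11): -log(A/B) over template matches."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)

    def count_matches(length):
        # All overlapping templates of the given length.
        templates = np.array([x[i:i + length] for i in range(len(x) - length + 1)])
        count = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance from template i to every later template.
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.sum(dist < r)
        return count

    B = count_matches(m)      # pairs matching at length m
    A = count_matches(m + 1)  # pairs still matching at length m + 1
    return -np.log(A / B)
```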

The feature extraction process yields feature vectors that must then be classified. In this study, three classifiers are used to classify the feature vectors: SVM, ANN and NB. The following section gives a detailed description of each classifier.

3 Classification

The aim of classification is to identify the class label of an unseen/new feature vector. In this study, three classifiers, SVM, NB and ANN, are used to recognize the emotion evoked by a video from the features extracted from the physiological signals.

3.1 Support vector machine (SVM)

SVM is one of the most widely used classification algorithms. It works on the concept of constructing an optimal hyperplane that maximizes the separation between two classes. The hyperplane with the largest margin on each side is selected as the optimal hyperplane. SVM was originally developed for binary classification, but it can also work efficiently on multiclass classification problems [21].

Being a supervised classification algorithm, it needs to be trained on training feature vectors Xtr calculated from the signal samples and an associated target vector Ytr. Each Xtr is an n-dimensional feature vector, Xtr = (x1, x2, …, xn), and Ytr ∈ (+1, −1) for a binary classification problem. The hyperplane is given by \({\mathrm{W}}^{\mathrm{T}}\mathrm{x}+\mathrm{b}=0\), where W is the weight vector, x is the input feature vector and b is the bias. The optimal hyperplane is the one which minimizes the error function given in Eq. (12), subject to the constraints given in Eq. (13),

$$ \emptyset \left( {W, \xi } \right) = \frac{1}{2}W^{T} W + C\mathop \sum \limits_{i = 1}^{N} \xi_{i} $$
(12)
$$ y_{i} \left( {W^{T} \emptyset \left( {x_{i} } \right) + b} \right) \ge 1 - \xi_{i} $$
(13)

In the above equations, \({\upxi }_{\mathrm{i}}\ge 0\), i = 1,…, N, C is the cost coefficient, W is the weight vector and b is the bias. \(\upxi_{i}\) is a slack variable: the distance between \(x_{i}\) and the margin plane \({\mathrm{W}}^{\mathrm{T}}\mathrm{x}+\mathrm{b}={y}_{i}\), where \({y}_{i} \in \left(+1, -1\right)\) is the output class label for the input vector \(x_{i}\). The kernel mapping \( \emptyset\) transforms the input data into the feature space. In the class-weighted formulation, the cost coefficient for the positive class is \({\mathrm{C}}^{+}=\mathrm{C}\times {\mathrm{W}}^{+}\) and for the negative class \({\mathrm{C}}^{-}=\mathrm{C}\times {\mathrm{W}}^{-}\). The cost coefficient must be chosen carefully, as larger values penalize errors more heavily. The bias b is initially set to 1 and updated iteratively [22].

While performing multiclass classification with SVM, two approaches can be used: one-vs-all and one-vs-one [23]. Here, both approaches have been used; a sketch of both is given below.
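Both strategies are readily available in scikit-learn (the kernel and C value below are illustrative, not the settings reported in this study).

```python
from sklearn.svm import SVC
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# One-vs-one: one binary SVM per pair of emotion classes.
svm_ovo = OneVsOneClassifier(SVC(kernel="rbf", C=1.0))

# One-vs-all: one binary SVM per emotion class against the rest.
svm_ova = OneVsRestClassifier(SVC(kernel="rbf", C=1.0))

# X_train: (n_samples, n_features) matrix, y_train: emotion labels.
# svm_ovo.fit(X_train, y_train); y_pred = svm_ovo.predict(X_test)
```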

3.2 Artificial neural network (ANN)

The artificial neural network is a supervised classification algorithm that replicates the connectivity structure of biological neurons. An ANN is characterized by four components: the network structure, the input to the network, the weight vector and the activation function. Various types of ANN are available in the literature, depending on the topology and training algorithm used. In this study, a feed-forward neural network with the scaled conjugate gradient back-propagation algorithm is used; the tan-sigmoid activation function is used in the hidden layer, while the pure-linear activation function is used in the output layer. As the ANN is supervised, it needs to be trained before testing. Training is performed by providing the input feature vector (x) at the input layer and the corresponding output label (D) at the output layer [24]. The output is calculated using Eq. (14),

$$ Y = \emptyset \left( {\left( {W^{T} *x} \right) + b} \right) $$
(14)

where \(\emptyset\) is activation function at the output layer, W is a weight vector, x is an input feature vector and b is a bias. After calculating output Y, the error is calculated using Eq. (15),

$$ {\text{Error}} = D - Y, $$
(15)

where D is the desired output and Y is the output calculated by the network. Using the error value, the weights and biases are updated as shown in Eqs. (16) and (17),

$$ W_{{{\text{new}}}} = W_{{{\text{old}}}} + \left( {\alpha *{\text{Error}}*x} \right) $$
(16)
$$ b_{{{\text{new}}}} = b_{{{\text{old}}}} + \left( {\alpha *{\text{Error}}} \right) $$
(17)

This process is repeated until the error becomes sufficiently small [25]. After completion of the training process, testing is performed by providing a test input feature vector whose class label is to be predicted by the ANN classifier.
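A minimal single-layer NumPy sketch follows; it is only meant to make Eqs. (14)–(17) concrete, whereas the study itself uses a feed-forward network with a hidden layer trained by scaled conjugate gradient back-propagation.

```python
import numpy as np

def train_single_unit(X, D, alpha=0.01, epochs=100):
    """Delta-rule training illustrating Eqs. (14)-(17) for one linear output unit."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=X.shape[1])
    b = 1.0                            # bias initialized to 1, per the text
    for _ in range(epochs):
        for x, d in zip(X, D):
            y = np.dot(W, x) + b       # Eq. (14), pure-linear activation
            error = d - y              # Eq. (15), desired minus calculated
            W = W + alpha * error * x  # Eq. (16), weight update
            b = b + alpha * error      # Eq. (17), bias update
    return W, b
```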

3.3 Naive Bayes (NB)

This classifier is based on conditional probability for a given input feature vector X = (x1, x2, …, xn) and output class labels C = (c1, c2, …, cm). The naive Bayes classifier assumes that the presence of a particular feature is independent of all the other feature values. For an input feature vector X with unknown class label, X is assigned to class \({c}_{i}\) if \(P\left({c}_{i}|X\right)>P\left({c}_{j}|X\right)\) for all j ≠ i.

The naive Bayes classifier is based on Bayes' theorem, which states that \(P\left({c}_{i}|X\right)=\frac{P\left(X|{c}_{i}\right)P({c}_{i})}{P(X)}\), where \(P\left({c}_{i}|X\right)\) is the posterior probability of class \({c}_{i}\) given feature vector X, \(P\left(X|{c}_{i}\right)\) is the likelihood, i.e. the probability of observing feature vector X given class \({c}_{i}\), \(P({c}_{i})\) is the prior probability of class \({c}_{i}\), and \(P(X)\) is the probability of occurrence of feature vector X [26]. As the predictors are assumed to be mutually independent given the class, the joint probability model is expressed as Eq. (18),

$$ P\left( {c_{i} {|}X} \right) \propto P\left( {c_{i} } \right)\mathop \prod \limits_{k = 1}^{n} P\left( {x_{k} {|}c_{i} } \right) $$
(18)
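For continuous features such as those extracted here, the class-conditional likelihoods \(P(x_k|c_i)\) are commonly modelled as Gaussians; a brief scikit-learn sketch follows (the paper does not state which likelihood model was used, so the Gaussian choice is an assumption).

```python
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()                         # Gaussian P(x_k | c_i) per feature
# nb.fit(X_train, y_train)                # X_train: features, y_train: labels
# posteriors = nb.predict_proba(X_test)   # P(c_i | X) for each class
# y_pred = nb.predict(X_test)             # argmax over the posterior, Eq. (18)
```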

4 Results and observations

For training and validation of the model, the leave-one-subject-out (LOSO) approach has been used. This approach yields 32-fold cross-validation, as data from 32 subjects are used. After the classification process, the results are calculated and analysed. In this study, the performance of each classifier on each type of feature is reported in terms of accuracy, average precision and average recall. These performance parameters are obtained from the confusion matrix of the classification. The equations for accuracy (ACC), average precision (Avg_PRE) and average recall (Avg_REC) are given as Eqs. (19)–(21), respectively,

$$ {\text{Accuracy}} = \frac{{{\text{Sum }}\;{\text{of}}\;{\text{correctly}}\;{\text{classified}}\;{\text{instances}}}}{{{\text{Total}}\;{\text{number}}\;{\text{of}}\;{\text{instances}}}} $$
(19)
$$ {\text{Average}}\;{\text{Precision }} = \frac{{\mathop \sum \nolimits_{i = 1}^{N} {\text{class}}_{i} \;{\text{Precision}}}}{N} $$
(20)

Here,

\({\text{class}}_{{\text{i}}} \;{\text{Precision}} = \frac{{{\text{number}}\;{\text{of}}\;{\text{correctly}}\;{\text{classified}}\;{\text{class}}_{{\text{i}}}\;{\text{instances}}}}{{{\text{Total}}\;{\text{number}}\;{\text{of}}\;{\text{instances}}\;{\text{classified}}\;{\text{as}}\;{\text{class}}_{{\text{i}}} }}\).

$$ {\text{Average}}\;{\text{Recall}} = \frac{{\mathop \sum \nolimits_{i = 1}^{N} {\text{class}}_{{\text{i}}} \;{\text{Recall}}}}{N} $$
(21)

Here,

\({\text{class}}_{{\text{i}}} \;{\text{Recall}} = \frac{{{\text{number}}\;{\text{of}}\;{\text{correctly}}\;{\text{classified}}\;{\text{class}}_{{\text{i}}}\;{\text{instances}}}}{{{\text{Total}}\;{\text{number}}\;{\text{of}}\;{\text{class}}_{{\text{i}}} \;{\text{instances}}}}\).
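The LOSO protocol and the metrics of Eqs. (19)–(21) can be sketched with scikit-learn as follows; groups holds the subject index of each trial and clf is any of the classifiers above.

```python
from sklearn.base import clone
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.metrics import accuracy_score, precision_score, recall_score

def loso_evaluate(clf, X, y, groups):
    """Leave-one-subject-out evaluation with macro-averaged metrics."""
    y_true, y_pred = [], []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        model = clone(clf).fit(X[train_idx], y[train_idx])
        y_true.extend(y[test_idx])
        y_pred.extend(model.predict(X[test_idx]))
    return (accuracy_score(y_true, y_pred),                    # Eq. (19)
            precision_score(y_true, y_pred, average="macro"),  # Eq. (20)
            recall_score(y_true, y_pred, average="macro"))     # Eq. (21)
```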

Tables 2, 3 and 4 list the performance parameters obtained with the time domain, frequency domain and entropy-based features, respectively, for all classifiers. These results give a comparative analysis of the performance achieved by the SVM, ANN and NB classifiers on each feature type. From Tables 2, 3 and 4, it is observed that the ANN classifier with time domain features outperforms every other combination of classifier and feature type for EOG–EMG signal-based emotion recognition. The overall performance achieved by each classifier and each feature type is presented graphically in Fig. 6. From Fig. 6a, it is observed that the ANN classifier achieved the highest average performance (98%) compared to the SVM and NB classifiers. Similarly, from Fig. 6b, it is observed that the highest overall performance (92.75%) is achieved by the time domain features compared to the entropy and frequency domain features.

Table 2 Time domain features performance
Table 3 Frequency domain features performance
Table 4 Entropy based features performance
Fig. 6
figure 6

Average accuracy plots (a) Classification algorithms. (b) Feature types

Table 5 compares the present study with representative studies in the field of emotion recognition using physiological signals [28,29,30,31,32,33,34,35,36,37,38,39]. Sr. No. 1 to 6 in the table report the performance of emotion recognition based on different physiological signals over various datasets, in terms of classification accuracy. Sr. No. 7 to 14 report the performance of different physiological signal-based emotion recognition methods tested on the DEAP dataset, in terms of average accuracy in classifying the arousal and valence classes. Sr. No. 15 reports the performance of the study presented in this paper; this average accuracy is achieved by the ANN classifier with time domain features classifying the happy, relaxing, sad and angry emotions. From Table 5, it is clearly observed that the present study achieves the maximum average accuracy.

Table 5 Comparison of representative studies in the area of physiological signal-based emotion recognition

5 Conclusions

This study investigates the potential of EOG and EMG signals for emotion recognition. Four types of emotions, namely happy, relaxing, angry and sad, are considered. Time domain, frequency domain and entropy-based features are extracted from these signals, and SVM, ANN and NB classifiers are used to recognize and classify the emotion types. Results show that the maximum classification accuracy (99%) is achieved using the combination of the ANN classifier and time domain features. The ANN turned out to be the best classifier, with an overall accuracy of 98%, and the time domain features achieved an overall accuracy of 92.75%. It is concluded that the combination of EOG and EMG signals has great potential for emotion recognition. In future, this work will be extended into a human decision-making model: because emotion plays an important role in the decision-making process, proper modelling of human decision making must include emotions. The presented work will serve as a first step in the development of such models.