1 Introduction

Human emotion has been studied by psychologists for more than 200 years and serves as a diagnostic tool for a large number of mental illnesses. Traditionally, psychologists judge a patient's emotion state from observations such as facial expressions and voice characteristics. Even though recognising a person's emotion is a routine task for psychologists, there is still a high demand, from both research and practical points of view, for a reliable automated solution to human emotion classification. Especially in the last few decades, the focus of many scientific and engineering research projects has shifted towards the human body. New fields such as the human-machine interface (HMI) and brain-computer interface (BCI) require the computer to recognise emotion states and react to the detected emotions to improve the efficiency of communication between machine and human. With the rapid development of human anatomy science and biomedical engineering techniques, researchers have been seeking an engineering-based approach to emotion classification. Compared with purely psychological methods, these approaches share several advantages: data acquisition measures have a solid scientific grounding that supports the accuracy of the collected data, signal processing tools offer a mathematical foundation for manipulating the raw data, and machine learning algorithms provide efficient training and prediction. However, emotion classification systems still need further improvement, with challenges in four areas: (a) choosing the proper method to collect the signal and identify the distinctions between emotion states; (b) reducing the data dimension to enable fast computation; (c) extracting the correct features to represent the information embedded in the data; (d) designing a learning algorithm that recognises the patterns and accurately classifies the emotion from the training process.

Numerous efforts have been made in the field of emotion classification, from different angles, to overcome these difficulties. First of all, researchers have studied many signal sources for emotion classification. Visible manifestations have been investigated; for example, a vector of 12 speech power coefficients was used to classify six types of basic human emotion [1]. Facial expression has also drawn much interest, for example through the analysis of facial images [2]. Further work has tracked such characteristics to study emotion states [3, 4]. Moreover, combined sources such as facial expression and body gestures have been used to improve accuracy [5, 6]. Nevertheless, a significant drawback of using behavioural modalities for emotion detection is the uncertainty that arises with individuals who either consciously regulate their emotional manifestations or are naturally suppressive [7]. Thus, researchers moved their focus to physiological signals such as electrocardiography (ECG) and electroencephalography (EEG), as these are involuntary reactions of the human body. Anatomically, it has been confirmed that emotional processes can be represented by EEG activities [8]. Here, it is believed that emotions are generated in the limbic system, a set of structures lying deep within the brain. Limbic system activity can be monitored through the EEG signal, since EEG records brain activity as voltage variations measured on the human scalp. Many recent projects [9,10,11,12,13] have used EEG signals with different feature extraction methods and classifier designs.

An EEG recording can be enormous if it is collected over a long duration at a high sampling frequency. For human emotion classification, a relatively long collection time and a high sampling frequency are needed to capture enough examples of the different emotion types and an adequate number of samples to detect emotion variations within small time frames. As a result, the computational time can become prohibitive, limiting the application of the system. Researchers have created several methods to compress the EEG signal, and sparse representation is one of the most popular choices. A sparse kernel was used to transform the EEG signal into a smaller dataset [14]. Further work determined the sparsity of EEG in the Gabor frame [15]. A more recently developed method, compressed sensing (CS), has attracted considerable attention in engineering, offering a framework for the compression of finite-dimensional vectors [16]. Zhang et al. [17] argued that current CS algorithms only work well for sparse signals or signals with sparse representation coefficients; since EEG is neither sparse in the original time domain nor in transformed domains, current CS algorithms cannot achieve good recovery quality. Block Sparse Bayesian Learning (BSBL) was therefore proposed and demonstrated a high recovery rate [17].

Following data compression, numerous feature extraction methods have been applied to EEG signals. Statistical features, including maximum and minimum value, peak-to-peak amplitude, mean and standard deviation, have been explored [18, 19]. In the frequency domain, Fourier Transform (FT) analysis has been investigated as well [20,21,22]. Another method, the Wavelet Transform (WT), can be used with a controllable wavelet size to detect every change in the signal. It can also localise changes in the signal that would be overlooked by the FT method. Candra et al. [23] noted several advantages of the Discrete WT, including multi-scale zooming and multi-rate filtering. Both the relative wavelet energy and relative wavelet entropy used in Candra's study showed consistently good classification performance.

Lastly, a classifier is designed to recognise the feature patterns of human emotion states. More than a dozen machine learning algorithms currently exist, along with a large number of combinations of them, so finding a suitable algorithm for EEG signals is crucial. Classifiers including Linear Discriminant Analysis (LDA) [24], k-Nearest Neighbour (kNN) [19], Adaptive Neural Fuzzy Inference Systems (ANFIS) [22], binary linear Fisher's Discriminant Analysis (FDA) [25], Hidden Markov Models (HMM) [26] and Support Vector Machines (SVM) [19, 23, 27] have been utilised. Furthermore, Neural Networks (NN), including a deep learning network with principal-component-based covariate shift adaptation [28] and a back-propagation neural network on statistical features [29], have been studied in emotion classification projects. Comparisons between these algorithms show that SVM achieves higher accuracy owing to its excellent discrimination performance in binary decision problems [30].

Another property of the EEG signal is its non-linear and non-stationary nature. EEG is usually collected using the 10-20 system, which covers all parts of the brain, and any activity undoubtedly causes responses from more than one brain region. However, there is no clear way to determine which part of the brain is responsible for an individual emotion state, nor to understand how closely the extracted features relate to the result. This vagueness in both the acquired signal and the output categories causes serious trouble for classification. One solution for such a situation is fuzzy logic, which provides a foundation for approximate reasoning with imprecise propositions based on fuzzy set theory [31]. Fuzzy C-Means and Fuzzy k-Means clustering methods have been implemented for classifying emotions [32]. Additionally, fuzzy cognitive maps (FCMs) are fuzzy models that combine aspects of fuzzy logic, neural networks and non-linear dynamical systems [33]. FCM was first studied by Kosko [34] in 1986, and the method has since been used in several engineering disciplines. Based on the analysis by Vliet et al. [35], the motivations for using FCMs are that they are easy to build and parameterise, flexible in representation, easy to use and understand, and able to handle complex and dynamic problems. Papageorgiou and Salmeron [36] state that FCM has been used as a classification tool in medicine, business information and agriculture. Salmeron [37] designed a three-layer FCM-based classifier for forecasting artificial emotions. FCMs are capable of revealing the inter-relationships between their input states. Nevertheless, fuzzy logic is rarely used in emotion classification or in the field of EEG signal processing and analysis. Because an FCM can find the inter-relationships between features, and those relationships help select features according to their significance, combining the two algorithms should improve accuracy: a hybrid classifier that merges the merits of both SVM and FCM would reduce the uncertainty of the EEG signal and produce higher accuracy.

This paper proposes a novel approach to detecting human emotion using a hybrid SVM and FCM classifier, where the EEG signal is compressed by CS and two WT features are calculated. Section 2 describes the material used in the experiment as well as the methodology of each module of the project. The detailed methods for preprocessing, CS, feature extraction and classifier design are given in Sect. 3. Section 4 presents the results and a discussion of the classification system. Section 5 concludes the project and recommends future work.

2 Material and Methodology

In this paper, an online EEG resource, the Database for Emotion Analysis using Physiological signals (DEAP), is selected as the experimental input dataset. Although invasive methods, such as attaching electrode probes directly to the surface of the brain, have been developed to record electrical activity, a non-invasive process is preferred here because the objective is to perform emotion classification with an easy set-up. However, non-invasive methods, which collect the signal from the human scalp, introduce a significant amount of noise, so a high-standard electrode placement system and recording equipment are required. Moreover, stimulating and labelling the correct emotion states is also challenging. The DEAP dataset addresses both of these factors and includes a self-assessment questionnaire to maximise the reliability of the collected EEG signal and the recorded emotion states [38].

Fig. 1 2D emotion planes

The DEAP dataset contains EEG recordings from 32 test subjects, with music video clips used to stimulate various emotion states. The 32 participants were asked to sit still while watching 40 music video clips, each lasting one minute. The EEG signal was recorded using the standard 10-20 system with 32 electrodes. The dataset evaluates emotion states on four scales, the same ones used later in the classification process: valence, arousal, dominance and liking. Each scale ranges from 1 to 9; for instance, a 1 on the valence scale means a very low valence emotion state, and a 9 on the dominance scale indicates a high dominance state. Thus, each music video yields four emotion ratings per participant. The dataset is downloaded from the official website, and the data file is translated into '.mat' format. For each subject, two matrices record the EEG signal and the corresponding emotion states. The first matrix has three dimensions: the 40 videos labelled from 1 to 40, the 32 EEG electrodes in the Geneva order, and the 60-second EEG signal in millivolts. The other matrix holds the emotion states on the four scales of valence, arousal, dominance and liking. All four variables are used, and two 2D planes are generated from them. The emotion states are measured on the emotion planes shown in Fig. 1; the median value, 5, separates the four quadrants along both axes.
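For concreteness, the sketch below shows how such a per-subject file can be loaded and how the median split at 5 yields quadrant labels. It is an illustrative Python stand-in (the project itself uses MATLAB), and the field names 'data' and 'labels' are assumptions based on the DEAP preprocessed release.

```python
# Illustrative sketch only; field names 'data' and 'labels' are assumed
# from the DEAP preprocessed .mat release.
import numpy as np
from scipy.io import loadmat

mat = loadmat('s01.mat')        # one file per test subject (hypothetical name)
eeg = mat['data']               # e.g. (40 videos, 32 channels, 60 s samples)
ratings = mat['labels']         # (40 videos, 4 scales): valence, arousal,
                                # dominance, liking, each rated 1-9

# Median split at 5: each video falls into one quadrant of each 2D plane.
high = ratings > 5                              # boolean array, shape (40, 4)
va_quadrant = 2 * high[:, 0] + high[:, 1]       # valence-arousal quadrant id
dl_quadrant = 2 * high[:, 2] + high[:, 3]       # dominance-liking quadrant id
```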

Fig. 2 A simple FCM

Due to the uncertainty of the EEG signal, it is necessary to implement fuzzy logic to find the inter-relationships. This paper proposes FCM as the classification algorithm. A simple FCM structure is shown in Fig. 2: C1 to C5 represent the nodes, or states, of the system, and the connections between them are encoded in the weight matrix W. At each step, the new value of each node depends on the activation function, which is a sigmoid function in this project. The state vector is calculated as:

$$\begin{aligned} c_{i}^{t+1} = f \left( c_{i}^{t} + \sum _{j=1,\, j \ne i}^{n} W_{ji} \cdot c_{j}^{t} \right) \end{aligned}$$
(1)

After the inference process has run for some iterations, the FCM reaches one of two behaviours: it either settles down to a fixed pattern of node values, the so-called hidden pattern or fixed-point attractor, or keeps cycling among a set of states (a limit cycle). In this project the former, convergent behaviour is used.
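A minimal numerical sketch of this inference loop, assuming a toy weight matrix and the logistic sigmoid named in the text, might look as follows (Python stand-in for the project's MATLAB implementation):

```python
# Minimal FCM inference sketch following Eq. (1); the 2-node weight matrix
# below is a toy example, not the one learned in this project.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fcm_infer(c0, W, max_iter=100, tol=1e-5):
    """Iterate c_i^{t+1} = f(c_i^t + sum_j W_ji * c_j^t) to a fixed point."""
    c = np.asarray(c0, dtype=float)
    for _ in range(max_iter):
        c_next = sigmoid(c + W.T @ c)   # W[j, i] weights the edge j -> i
        if np.max(np.abs(c_next - c)) < tol:
            return c_next               # fixed-point attractor reached
        c = c_next
    return c                            # may be cycling instead of converging

W = np.array([[0.0, 0.4], [-0.3, 0.0]])  # toy 2-node map, zero diagonal
print(fcm_infer([0.5, 0.1], W))
```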

Fig. 3 Functional block diagram

The functional block diagram in Fig. 3 shows the general development procedure. The project is divided into three modules. First, the raw signal collected with the standard 10-20 system is passed to preprocessing, which removes noise and segments the long signal into 6-second EEG epochs. The second module uses CS to compress the preprocessed EEG signal into a smaller dimension; WT analysis then extracts the features from the compressed data, with two parameters calculated from each segment to form the feature vector. The features enter the last module, classification, which contains an SVM classifier and a hybrid SVM and FCM classifier; the output of the SVM classifier also serves as an input to the hybrid one. Finally, the output is the classified emotion on the arousal-valence plane and the dominance-liking plane.

3 Methods

3.1 Preprocessing

During the preprocessing stage, the raw EEG signal is first down-sampled to a 128 Hz sampling rate. Electrooculography (EOG) artefacts are removed, since eye movement is the primary noise source. Other low-frequency noise is also embedded in the signal, so a bandpass filter with a lower cutoff frequency of 4 Hz and an upper cutoff frequency of 45 Hz is designed to remove the various noise sources. The noise-free EEG signal is segmented into 6-second time windows, as this window size returns the highest accuracy [23]. The training and testing sets are chosen based on the emotion scales: eight sets with high emotion scales and eight sets with low emotion scales are selected for training, and the remaining sets are used for testing. In this project, all training and testing are performed separately for each test subject.
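A sketch of this stage under the stated settings (128 Hz sampling, 4-45 Hz band-pass, 6-second windows) is shown below; the Butterworth design and its order are assumptions, as the text does not specify the filter type.

```python
# Preprocessing sketch: band-pass then segment. Filter order 4 is assumed.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 128                                  # Hz after down-sampling

def bandpass(sig, low=4.0, high=45.0, fs=FS, order=4):
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype='band')
    return filtfilt(b, a, sig)            # zero-phase filtering

def segment(sig, win_sec=6, fs=FS):
    n = win_sec * fs                      # 768 samples per epoch
    k = len(sig) // n
    return sig[:k * n].reshape(k, n)      # 60 s signal -> 10 epochs

clean = bandpass(np.random.randn(60 * FS))   # stand-in for one channel
epochs = segment(clean)                      # shape (10, 768)
```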

3.2 Data Compression

CS is defined to compress a large data vector \({{\mathbf {x}}}\) of length \(N\) into a much smaller vector by means of a random matrix, denoted \(\varPhi\), i.e.,

$$\begin{aligned} {{\mathbf {y}}} = \varPhi {{\mathbf {x}}}, \end{aligned}$$
(2)

where \({{\mathbf {y}}}\) is the compressed data and \(\varPhi\) is the sensing matrix of size \(M \times N\). However, the EEG signal cannot use this equation directly because it contains too few zeros: the success of CS depends heavily on the assumption that most entries of the input data are zero. An alternative approach is therefore utilised. A dictionary matrix is calculated first, and the equation becomes:

$$\begin{aligned} {{\mathbf {y}}} = \varPhi {{\mathbf {D}}} {{\mathbf {z}}}, \end{aligned}$$
(3)

where the dictionary \({{\mathbf {D}}}\) has dimension \(N \times N\) and \({{\mathbf {z}}}\) is sparse. With this formulation, the CS algorithm recovers \({{\mathbf {z}}}\) first and then recovers the original signal \({{\mathbf {x}}} = {{\mathbf {D}}} {{\mathbf {z}}}\). In this project, each EEG epoch contains 768 samples, which form the input data for CS; using Eq. (2), the compressed data has only 192 samples per segment.
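The following sketch illustrates the compression step of Eq. (2) at these sizes. The sensing matrix here is a generic sparse binary construction; the exact matrix used in the project is not specified, so its density is an assumption.

```python
# Compression sketch for Eq. (2): 768-sample epoch -> 192 measurements.
# The sparse binary Phi below (two ones per column) is an assumption.
import numpy as np

rng = np.random.default_rng(0)
N, M = 768, 192                       # epoch length, compressed length

Phi = np.zeros((M, N))
for col in range(N):
    rows = rng.choice(M, size=2, replace=False)
    Phi[rows, col] = 1.0              # sparse binary sensing matrix

x = rng.standard_normal(N)            # stand-in for one EEG epoch
y = Phi @ x                           # compressed data, length 192
# Recovery solves y = Phi D z for sparse z (e.g. via BSBL), then x = D z.
```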

3.3 Feature Extraction

The DWT is utilised in this project for its advantages of time-frequency localisation, multi-scale zooming and noise filtering. The alpha, beta and gamma bands are studied. The specific wavelet transform parameters are taken from previous studies [23]: the wavelet is Daubechies 5 (db5) and the decomposition level is 6, because the dominant frequency components of the EEG signal lie between 8 and 64 Hz.

DWT coefficients are calculated using the following equation:

$$\begin{aligned} \langle f,\psi _{j,k} \rangle = C_{j,k} = \int _{-\infty }^{\infty } f(t)\, \frac{1}{\sqrt{2^{j}}}\, \psi \left( \frac{t-2^{j}k}{2^{j}} \right) dt. \end{aligned}$$
(4)

In Eq. (4), \(2^jk\) and \(2^j\) are the time localisation and scale, respectively, while \(\psi (t)\) denotes the mother wavelet function. The DWT coefficients are used in the subsequent feature calculations.

The first feature is relative wavelet energy. It is calculated as follows:

$$\begin{aligned} p_j = \frac{\sum _{k=1}^{N}|C_j(k)|^2}{\sum _{j}\sum _{k}|C_j(k)|^2}, \end{aligned}$$
(5)

where \(C_j(k)\) are the detail coefficients at level \(j\), so the numerator is the detail wavelet energy at that level and the denominator is the total wavelet energy. The resulting probability \(p_j\) reveals the time-scale energy density of the input data.

The second feature extracted is relative wavelet entropy. It is expressed below:

$$\begin{aligned} S_{wt}(p|q) = \sum _j p_j \cdot \ln \left( \frac{p_j}{q_j} \right) , \end{aligned}$$
(6)

where \(q_j\) is a reference distribution against which \(p_j\) is compared. Relative wavelet entropy measures the similarity between the two probability distributions. In this design, the Shannon form of the entropy is utilised.
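Both features can be computed from the DWT coefficients in a few lines; the sketch below uses PyWavelets as a stand-in for the project's MATLAB code, and the uniform reference distribution q is an assumption since the text does not define it.

```python
# Feature sketch for Eqs. (5) and (6); the uniform reference q is assumed.
import numpy as np
import pywt

def wavelet_features(epoch, wavelet='db5', level=6):
    coeffs = pywt.wavedec(epoch, wavelet, level=level)   # [cA6, cD6, ..., cD1]
    energy = np.array([np.sum(c ** 2) for c in coeffs[1:]])  # detail energies
    p = energy / energy.sum()              # relative wavelet energy, Eq. (5)
    q = np.full_like(p, 1.0 / p.size)      # assumed uniform reference
    s = float(np.sum(p * np.log(p / q)))   # relative wavelet entropy, Eq. (6)
    return p, s

p, s = wavelet_features(np.random.randn(768))   # one 6-second epoch
```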

3.4 Hybrid SVM and FCM Classifier

Figure 4 demonstrates the relationship between the SVM and the FCM, which is the core concept behind the design of the hybrid classifier; the project is implemented in MATLAB. The extracted features, relative wavelet energy and relative wavelet entropy, form the first layer of the hybrid classifier and are first fed into the SVM classifier. The output of the SVM classifier, combined with the two features, forms a ten-node hidden layer, with a sigmoid function as the activation function. After the system converges, the output layer shows the stable state of the nodes; this process also determines the weight matrix of the hybrid classifier, which reveals the connections between the hidden layer and the output layer as well as the relationships among the output nodes. A defuzzification step then transforms the node values into scales of human emotion states: since the node states lie between − 1 and + 1, negative values correspond to low-scale emotions, and vice versa.
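The data flow just described can be summarised as below. This is a rough Python/scikit-learn stand-in for the MATLAB implementation: the toy data, the random FCM weights, the placement of the inputs within the ten-node layer, and the tanh squashing (used here so that node states stay within − 1 and + 1 for defuzzification) are all assumptions.

```python
# Rough stand-in for the hybrid classifier (the original is MATLAB).
# FCM weights, node placement and the tanh squashing are assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.standard_normal((16, 2))      # [rel. energy, rel. entropy] per epoch
y = np.array([0] * 8 + [1] * 8)       # eight low- and eight high-scale sets
svm = SVC(kernel='rbf').fit(X, y)     # layer 1: SVM on the wavelet features

W = rng.uniform(-0.5, 0.5, (10, 10))  # ten-node hidden layer (Fig. 4)
np.fill_diagonal(W, 0.0)

def hybrid_predict(x, n_iter=50):
    c = np.zeros(10)
    c[:3] = [svm.decision_function([x])[0], x[0], x[1]]  # SVM out + features
    for _ in range(n_iter):           # FCM inference until (near) convergence
        c = np.tanh(c + W.T @ c)
    return 'high' if c[-1] > 0 else 'low'   # defuzzify on the sign of a node

print(hybrid_predict(X[0]))
```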

Fig. 4 Hybrid SVM & FCM classifier

4 Results and Discussion

Fig. 5 A segment of EEG and its recovered signal from CS

Figure 5 demonstrates a randomly chosen EEG signal segment in its original and recovered forms. The initial 6-second section contains high-frequency noise and 768 samples. The CS method uses Eq. (3), where \({{\mathbf {D}}}\) is the inverse db5 WT matrix and \(\varPhi\) is a sparse binary matrix, to reduce both the noise and the size of the data. It is clear that the EEG peaks and valleys are preserved through the CS process. For example, around sample numbers 100 and 200, highlighted in red and yellow, the peaks are consistent with little distortion. At the same time, the recovered data is smoother, as in the blue highlighted areas from sample 360 to 410 and from sample 580 to 605. Overall, the recovered signal contains the major activities of the original one: the compression process reduces the data to one-quarter of its original size via Eq. (2) without losing critical information.

Fig. 6 Accuracy on valence-arousal plane

Two types of classifiers are tested on the compressed data. A single SVM with a Radial Basis Function kernel is first implemented to examine the 6-second EEG epochs and identify the emotion states on the four planes. Then, a hybrid SVM and FCM classifier is designed to serve the same task. Lastly, the hybrid classifier uses the output for each epoch to recognise the overall emotion state for each music video. This step can be viewed as reversing the segmentation performed in preprocessing: since the continuous EEG signal with the same emotional reaction was separated into 10 sections, the last classifier merges the epoch results for the same video and calculates the overall scale on the emotion planes. Figure 6 shows the accuracy of the three classifiers for the 32 test subjects on the valence-arousal plane. When the classifiers are tested on the EEG epochs, the hybrid SVM and FCM classifier (red line) is always more accurate than the single SVM classifier (green line). This is because the hybrid classifier applies fuzzy logic after the SVM classifier, which maps the connectivities among the emotion states and the features and corrects some of the errors made by the SVM. A similar trend can be seen on the dominance-liking plane, shown in Fig. 7.

Fig. 7 Accuracy on dominance-liking plane

The last classifier is shown in both Figs. 6 and 7 as the blue line. This hybrid classifier (videos) merges the results of ten predictions into one outcome, so its accuracy is expected to be higher than that of the previous classifiers. The results from all four planes justify this assumption; however, on each plane a few test subjects show low accuracy with the last classifier. In Fig. 6, test subjects 4 and 9 show much lower accuracy compared with the epoch methods, and test subject 7 shows the same tendency on the dominance-liking plane. This classifier amplifies the quality of the earlier results: it can raise the accuracy further if the previous classifiers successfully recognise the correct emotion, but if the previous classifiers perform poorly, its accuracy will be even lower, because one music video represents exactly one kind of emotion. In other words, one music video clip corresponds to a fixed set of four emotion scales. Therefore, when the input to the third classifier is the mean value over each music video, the accuracy increases if the majority of the window segments recognise the right emotion. In some cases, such as participant 7, the accuracy drops instead; a possible reason is that the emotion experienced by the participant does not match the standard emotional response.
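The video-level merge itself is simple: average the ten epoch-level outputs for a video and threshold the mean, as sketched below (the zero threshold mirrors the defuzzification rule and is an assumption).

```python
# Video-level decision from ten epoch-level outputs (threshold assumed at 0).
import numpy as np

epoch_scores = np.random.randn(10)    # hybrid-classifier outputs for one video
video_label = 'high' if epoch_scores.mean() > 0 else 'low'
```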

Fig. 8 Average accuracy with standard deviation on four emotion planes. Blue bar represents the classical SVM classifier, red bar the hybrid SVM and FCM classifier using epochs, yellow bar the hybrid SVM and FCM classifier using videos

The average accuracy over all 32 test subjects is calculated and presented in Fig. 8. The hybrid classifier treating each video as a whole has the highest accuracy, at 78.39%. Compared with the single SVM classifier, the hybrid classifier shows a consistent 3.23% increase in accuracy; the dominance plane has the largest improvement, 3.63%, and the liking plane the smallest, 2.77%. The overall accuracy of the hybrid SVM and FCM classifier is 73.32% for epoch testing and 78.03% for video testing. From Table 1, it is clear that although the third classifier has the highest accuracy, it also has the highest standard deviation, whereas the epoch-testing classifier has a much smaller error range than the other two. Therefore, even though the average accuracy of the video-testing classifier is higher, the system is less stable than the epoch-testing hybrid classifier. This finding is consistent with the discussion of Figs. 6 and 7.

Table 1 Confusion matrix for the three designed classifiers

5 Conclusion

This paper proposed a complete human emotion classification system using EEG signals. Due to the large size of the dataset, a series of preprocessing techniques is used, and an advanced CS algorithm is designed to reduce the dimension of the EEG signal. Wavelet analysis is then implemented to extract distinctive features. A hybrid classifier combining SVM and FCM is used to reveal the connections between the states and, ultimately, to classify the emotion states. CS successfully reduced the EEG signal to a quarter of its original size without losing critical information. The hybrid classifier demonstrates a consistent improvement over the single SVM classifier: a 3.34% increase on the valence axis, 3.19% on the arousal axis, 3.63% on the dominance axis and 2.77% on the liking axis, for an overall increase of 3.23%. When each 60-second video is considered as a whole, the accuracy can reach 95.83%, with an average of 78.03%. Future work could investigate the connectivity between the EEG electrodes using similar methods; with a more comprehensive understanding of the connectivity of the human brain, many brain-related research projects could benefit from the results, and possible breakthroughs can be expected.