
1 Introduction

Emotional states highly influence both human interaction and human-computer/machine interaction. In fact, analyzing emotions has attracted enormous interest in the development of systems that interact automatically with the user, e.g., brain-computer interfaces (BCI) [5]. In this regard, emotion representation is divided into two broad categories: discrete and dimensional. The former includes basic emotions such as anger, joy, surprise, disgust, fear, and sadness. The latter comprises the analysis of a few subtle dimensions that define an emotional stimulus from a more physiological point of view [8]. In particular, the dimensional category employs the arousal vs. valence space to characterize the active/passive and positive/negative responses, respectively, to a given emotional stimulus. Thereby, a wider range of emotions can be analyzed and quantified than in the discrete representation [6, 13].

Concerning emotion assessment approaches, initial attempts relied on audiovisual data. This type of data allows the detection of a few basic emotions (discrete representation); however, the analysis of facial expressions and speech proves a challenging task due to the inter-subject variability of discriminant emotion patterns [3]. Namely, visual emotion responses derived from body movements and facial expressions can be regulated by the subject, which is why audiovisual information lacks robustness in this particular task. On the other hand, recent approaches use physiological data to support the assessment [13]. Physiological data allows studying different biological responses of the human body related to the central nervous system, which encode more accurate and detailed emotion patterns than audiovisual data [10]. Although capturing physiological data can involve intrusive sensing, recent efforts have improved the acquisition technology. In particular, the electroencephalogram (EEG) provides a set of time series that allows analyzing the neural activity of different brain regions that can be related to cognitive processes, i.e., emotions [1, 6]. Recent studies demonstrated that EEG data and some cortical and subcortical regions of the brain can be used effectively for the discrimination of emotion responses [11]. Indeed, the EEG is preferred over other brain mapping technologies, such as functional magnetic resonance imaging (fMRI), because of its non-invasive scheme and higher temporal resolution. Nonetheless, issues associated with the use of EEG include its low spatial resolution and the complex spatiotemporal relationships among channels.

Some works have tried to recognize emotions from EEG data by extracting a set of static features under constrained frequency bands, namely the theta, alpha, beta, and gamma rhythms [12]. Besides, more elaborate feature extraction approaches, e.g., the Dual-Tree Complex Wavelet Packet Transform (DT-CWPT), have been introduced to highlight emotion patterns in EEG recordings [3]. However, their results are still far from satisfactory [1, 13]. Recent techniques employ functional connectivity (FC) representations to support emotion assessment through the computation of statistical dependencies among EEG time series [4]. Such dependencies aim to code the relations of neurophysiological events characterized by generalized synchronization (GS), phase synchronization (PS), and information theory (IT) measures [9]. In this sense, the authors in [6] employ a PS measure to detect the reactive band and relevant synchronized regions of the brain related to different emotions. Moreover, the authors in [10] used a mapping technique to group regions of interest from EEG time series, yielding an improved localization of the brain areas related to emotional states. Similarly, the authors in [7] exploited the correlation and coherence measures within a graph theory scheme for emotion assessment. Though algorithms based on FC seem promising, the variability of the inter-channel dependencies and the selection of the FC measure still pose an open issue. Besides, the assessment success highly depends on the subject at hand, which is related to the particular way each person's brain works.

In this work, we introduce an FC variability (FCV) representation strategy to classify emotional states from EEG data. Our proposal codes FC variations from three different measures: correlation, coherence, and mutual information. Moreover, a supervised kernel-based relevance analysis is used to quantify the significance of each FC measure. Thus, the inter-subject dependency of the emotion assessment is addressed as a feature relevance analysis task concerning the employed measure. Our approach is tested using the publicly available Database for Emotion Analysis using Physiological Signals (DEAP). In particular, a bi-class problem is built for both the arousal and valence dimensions. The obtained results show competitive performance in comparison to state-of-the-art methods for subject-dependent emotion recognition. The rest of the paper is organized as follows: In Sect. 2, we present the theoretical background of the FCV representation with relevance analysis. Section 3 describes the experimental set-up for emotion assessment, Sect. 4 discusses the obtained results, and the concluding remarks are outlined in Sect. 5.

2 Materials and Methods

2.1 Functional Connectivity Using a Variability-Based Representation

Let \({\varvec{u}},{\varvec{v}}\in \mathbb {R}^L\) be a pair of EEG records of size L. An FC measure \(\xi :\mathbb {R}^L\times \mathbb {R}^L\rightarrow \mathbb {R}\) between \({\varvec{u}}\) and \({\varvec{v}}\) can be defined in terms of their statistical interdependence. In the following, some well-known FC measures are briefly described.

Correlation-(COR). The linear correlation \(\xi _{COR}\left( {\varvec{u}},{\varvec{v}}\right) \in \left[ -1,1\right] \) between \({\varvec{u}}\) and \({\varvec{v}}\) in the time domain is computed by Pearson's correlation coefficient as:

$$\begin{aligned} \xi _{COR}\left( {\varvec{u}},{\varvec{v}}\right) = \frac{1}{L\,\sigma _u \sigma _v}\sum ^L_{l=1}{\left( u_l-\bar{u}\right) \left( v_l-\bar{v}\right) }, \end{aligned}$$
(1)

where \(\sigma _u,\sigma _v\in \mathbb {R}^+\) and \(\bar{u},\bar{v}\in \mathbb {R}\) are the standard deviations and mean values of \({\varvec{u}}\) and \({\varvec{v}},\) respectively.
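For illustration, a minimal Python sketch of Eq. (1) is given below, assuming two EEG segments of equal length stored as 1-D NumPy arrays; `numpy.corrcoef` yields the same normalized value in \([-1,1]\) and can serve as a cross-check.

```python
import numpy as np

def xi_cor(u: np.ndarray, v: np.ndarray) -> float:
    """Pearson correlation between two EEG segments, Eq. (1)."""
    L = len(u)
    u_c, v_c = u - u.mean(), v - v.mean()  # remove the means
    # population std (ddof=0) matches the 1/(L*sigma_u*sigma_v) scaling
    return float(np.sum(u_c * v_c) / (L * u.std() * v.std()))

# Cross-check: np.corrcoef(u, v)[0, 1] returns the same coefficient.
```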

Coherence-(COH). The linear time-invariant relationship between \({\varvec{u}}\) and \({\varvec{v}}\) within the frequency range \([f_\mathrm{{min}},f_\mathrm{{max}}]\) is calculated through the coherence measure as:

$$\begin{aligned} \xi _{COH}\left( {\varvec{u}},{\varvec{v}}\right) = \frac{1}{f_\mathrm{{max}}-f_\mathrm{{min}}}\sum \limits ^{f_\mathrm{{max}}}_{f=f_\mathrm{{min}}}{ \frac{\left| \zeta _{uv}\left( f\right) \right| ^2}{\zeta _{uu}\left( f\right) \zeta _{vv}\left( f\right) }}, \end{aligned}$$
(2)

where \(\xi _{COH}\left( {\varvec{u}},{\varvec{v}}\right) \in [0,1]\), \(\zeta _{uv}\left( f\right) \in \mathbb {C}\) is the cross-spectrum of \({\varvec{u}}\) and \({\varvec{v}},\) and \(\zeta _{uu}\left( f\right) ,\zeta _{vv}\left( f\right) \in \mathbb {R}^+\) are the power spectra of \({\varvec{u}}\) and \({\varvec{v}},\) respectively.
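A hedged sketch of Eq. (2) follows, using `scipy.signal.coherence`, which returns the magnitude-squared coherence \(|\zeta_{uv}(f)|^2/(\zeta_{uu}(f)\zeta_{vv}(f))\) on a frequency grid; the sampling rate, band limits, and segment length `nperseg` shown here anticipate the set-up of Sect. 3 and are not fixed by this subsection.

```python
import numpy as np
from scipy.signal import coherence

def xi_coh(u, v, fs=128.0, f_min=4.0, f_max=47.0):
    """Band-averaged magnitude-squared coherence, Eq. (2)."""
    f, c_uv = coherence(u, v, fs=fs, nperseg=min(256, len(u)))
    band = (f >= f_min) & (f <= f_max)  # keep the band of interest
    return float(c_uv[band].mean())     # average over the band
```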

Mutual Information-(MI). The MI between \({\varvec{u}}\) and \({\varvec{v}}\) reveals the amount of uncertainty about one time series that is reduced by observing the other. Thus, high-order correlations can be computed utilizing probability density estimators as follows:

$$\begin{aligned} \xi _{MI}\left( {\varvec{u}},{\varvec{v}}\right) = \sum \limits ^L_{l=1}{\hat{p}(u_l,v_l)\log \left( \frac{\hat{p}(u_l,v_l)}{\hat{p}(u_l)\hat{p}(v_l)}\right) }, \end{aligned}$$
(3)

where \(\hat{p}(u_l,v_l)\in [0,1]\) is an estimation of the joint probability density function, and \(\hat{p}(u_l),\hat{p}(v_l)\in [0,1]\) are the marginal density approximations of \(u_l\) and \(v_l\).
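As a minimal sketch of Eq. (3), the densities can be approximated with a two-dimensional histogram; the number of bins is an assumption (the text does not fix the estimator, and the HERMES toolbox of Sect. 3 may use a different one).

```python
import numpy as np

def xi_mi(u, v, bins=16):
    """Histogram-based mutual information between two segments, Eq. (3)."""
    p_uv, _, _ = np.histogram2d(u, v, bins=bins)
    p_uv /= p_uv.sum()                     # joint density estimate
    p_u = p_uv.sum(axis=1, keepdims=True)  # marginal of u
    p_v = p_uv.sum(axis=0, keepdims=True)  # marginal of v
    mask = p_uv > 0                        # avoid log(0)
    return float(np.sum(p_uv[mask] * np.log(p_uv[mask] / (p_u @ p_v)[mask])))
```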

In practice, an emotion assessment framework includes a set of EEG data trials denoted as \(\varPsi =\{ {\varvec{X}}_n \in \mathbb {R}^{C \times T}\,:\,n=1,2,\dots ,N\}\), where \({\varvec{X}}_n\) is the n-th observed trial with C channels and T time instants. Furthermore, let \(\varGamma =\{b_n\}\) be the class label set, termed the emotion dimension class, where \(b_n\in \{-1,+1\}.\) Given the channel \({\varvec{x}}_c\in \mathbb {R}^T\) of an observed EEG trial \({\varvec{X}}\), we initially estimate a set of overlapped segments \(\{{\varvec{z}}_c^j\in \mathbb {R}^{L}\,:\,j=1,2,\dots ,Q\}\) split from \({\varvec{x}}_c,\) where \({\varvec{z}}_c^j\) denotes the c-th channel at the j-th window. To model time-variant dependencies among EEG channels, we compute the above-described FC measures between channel segments, building the set \(\{{\varvec{A}}^j\in \mathbb {R}^{C\times C}\},\) where matrix \({\varvec{A}}^j\) holds elements:

$$\begin{aligned} a^j_{cc'} = \xi _{m}\left( {\varvec{z}}^j_c,{\varvec{z}}^j_{c'}\right) , \end{aligned}$$
(4)

with \(a^j_{cc'}=a^j_{c'c},\) \(m\in \{\mathrm{{COR}},\mathrm{{COH}},\mathrm{{MI}}\},\) and \(c,c'=1,2,\dots ,C\). Afterwards, both the mean and the variance of each provided measure along segments are stored in matrices \({\varvec{\varDelta }}\in \mathbb {R}^{C\times C}\) and \({\varvec{\varOmega }}\in \mathbb {R}^{C\times C},\) holding elements:

$$\begin{aligned} \varDelta _{cc'} =&\frac{1}{Q}\sum \limits ^Q_{j=1}{a^j_{cc'}},\end{aligned}$$
(5)
$$\begin{aligned} \varOmega _{cc'} =&\frac{1}{Q}\sum \limits ^Q_{j=1}{\left( a^j_{cc'}-\varDelta _{cc'}\right) ^2}. \end{aligned}$$
(6)

Finally, the feature vector \({\varvec{y}}\in \mathbb {R}^{C(C-1)}\), coding the FC variability (FCV), is built by concatenating the upper-triangular elements of \({\varvec{\varDelta }}\) and \({\varvec{\varOmega }}\) (both matrices are symmetric: \(\varDelta _{cc'}=\varDelta _{c'c}\) and \(\varOmega _{cc'}=\varOmega _{c'c}\)).
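The whole FCV construction of Eqs. (4)-(6) can be sketched as follows for a single trial; the window length and overlap anticipate the set-up of Sect. 3, and `fc` stands for any of the pairwise measures above (`xi_cor`, `xi_coh`, or `xi_mi`).

```python
import numpy as np

def fcv_features(X, fc, win=9 * 128, overlap=0.25):
    """Mean/variance of windowed FC matrices -> vector of size C(C-1)."""
    C, T = X.shape
    step = int(win * (1.0 - overlap))
    starts = range(0, T - win + 1, step)  # Q overlapped windows
    A = np.stack([np.array([[fc(X[c, s:s + win], X[k, s:s + win])
                             for k in range(C)] for c in range(C)])
                  for s in starts])       # (Q, C, C) stack, Eq. (4)
    delta = A.mean(axis=0)                # Eq. (5)
    omega = A.var(axis=0)                 # Eq. (6), population variance
    iu = np.triu_indices(C, k=1)          # upper triangles only
    return np.concatenate([delta[iu], omega[iu]])  # length C(C-1)
```

For the 32-channel montage used later, this yields \(2\cdot 32\cdot 31/2 = 992\) features per trial, consistent with Sect. 3.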

2.2 Relevance Analysis of Extracted FCV

Given an EEG set, a feature matrix \({\varvec{Y}}_m\in \mathbb {R}^{N\times C(C-1)}\) can be obtained from Eqs. (5) and (6) by extracting the FCV patterns based on the m-th measure, i.e., COR, COH, or MI. To highlight the most relevant connectivity measure for the set (subject) at hand, we employ a supervised kernel-based relevance analysis that takes advantage of the available joint information, associating FCV variations with a given emotion dimension value. Namely, the FCV similarities among EEG trials \({\varvec{y}}_n,{\varvec{y}}_{n'}\in {\varvec{Y}}_m\) are coded by estimating a Gaussian kernel matrix \({\varvec{K}}_m\in \mathbb {R}^{N\times N}\) on \({\varvec{Y}}_m,\) as follows:

$$\begin{aligned} k_{nn'} = \exp \left( -{\Vert {\varvec{y}}_n-{\varvec{y}}_{n'}\Vert ^2}/{2\sigma ^2}\right) , \end{aligned}$$
(7)

where \(n,n'\in \{1,2,\dots ,N\}\) and \(\sigma \in \mathbb {R}^+\) is termed the kernel bandwidth. Further, on the emotion dimension space, we also estimate a kernel matrix \({\varvec{L}}\in \mathbb {R}^{N\times N}\) as follows:

$$\begin{aligned} l_{nn'} = \delta \left( b_n-b_{n'}\right) , \end{aligned}$$
(8)

where \(\delta \left( \cdot \right) \) is the delta function. It is worth noting that each defined kernel reflects a different notion of similarity (FCV vs. labels). Therefore, we must evaluate how well the kernel-based similarity matrix \({\varvec{K}}_m\) matches the target matrix \({\varvec{L}}\). To this end, a Centered Kernel Alignment (CKA) functional is used to appraise such a match as the inner product of both kernels, estimating the dependence \(\mu _m\in [0,1]\) between the jointly sampled data as follows [2]:

$$\begin{aligned} \mu _m= \frac{\langle \bar{{\varvec{K}}}_m,\bar{{\varvec{L}}}\rangle _{\texttt {F}}}{\sqrt{\langle \bar{{\varvec{K}}}_m,\bar{{\varvec{K}}}_m\rangle _{\texttt {F}}\,\langle \bar{{\varvec{L}}},\bar{{\varvec{L}}}\rangle _{\texttt {F}}}}, \end{aligned}$$
(9)

where \(\langle \cdot ,\cdot \rangle _{\texttt {F}}\) is the Frobenius inner product, and \(\bar{{\varvec{K}}}\) stands for the centered kernel matrix \(\bar{{\varvec{K}}}={\varvec{\tilde{I}}}{\varvec{K}}{\varvec{\tilde{I}}}\), with \({\varvec{\tilde{I}}}={\varvec{I}}-{\varvec{1}}{\varvec{1}}^\top /N\), where \({\varvec{I}}\in \mathbb {R}^{N\times N}\) is the identity matrix and \({\varvec{1}}\in \mathbb {R}^N\) is the all-ones vector. In this sense, the \(\mu _m\) weights allow ranking the relevance of each FCV representation, that is, the higher the \(\mu _m\) value, the better the m-th FCV representation matches the emotion labels. So, the highest weight value is employed to select the most relevant FCV (RFCV) for a given EEG set.
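A minimal sketch of Eqs. (7)-(9) is shown below; the bandwidth \(\sigma \) is assumed to be set beforehand (e.g., via the common median-distance heuristic, which the text does not prescribe).

```python
import numpy as np

def cka_relevance(Y, b, sigma):
    """CKA score mu in [0, 1] between FCV features Y (N x P) and labels b."""
    D2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    K = np.exp(-D2 / (2.0 * sigma ** 2))          # Eq. (7)
    L = (b[:, None] == b[None, :]).astype(float)  # Eq. (8), delta kernel
    N = len(b)
    H = np.eye(N) - np.ones((N, N)) / N           # centering matrix I - 11^T/N
    Kc, Lc = H @ K @ H, H @ L @ H
    return float(np.sum(Kc * Lc)
                 / np.sqrt(np.sum(Kc * Kc) * np.sum(Lc * Lc)))  # Eq. (9)

# RFCV selection: keep the measure m maximizing cka_relevance(Y_m, b, sigma).
```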

3 Experimental Set-Up

Testing Dataset and Preprocessing. The well-known Database for Emotion Analysis using Physiological Signals (DEAP) is used to test the introduced FCV approach. DEAP is publicly available and contains physiological recordings from 40 emotion elicitation experiments for each of 32 subjects. Each subject was requested to watch a one-minute excerpt of a video inducing a particular emotion, and then rated the arousal, valence, dominance, and liking level of each video within the range 1 to 9. The collected data includes the following signals: EEG, electrooculogram, galvanic skin response, and temperature, among others. The EEG data were acquired with a 32-channel BioSemi configuration, sampled at 128 Hz, and cleaned by an artifact removal stage [8].

FCV Training. The proposed FCV approach is tested as a feature extraction tool for emotion assessment. Thus, each DEAP subject dataset is configured as a bi-class problem for both the arousal and valence dimensions: the first class corresponds to arousal/valence levels below 5, while the second one holds levels from 5 to 9. Furthermore, a window of 9 s with \(25\%\) overlap is employed to compute the inter-channel dependencies based on FCV. The fixed window size aims to highlight channel dependencies under the theta, alpha, beta, and gamma rhythms along time. Likewise, the frequency band for the coherence measure is related to the aforementioned rhythms \((f_\mathrm{{min}}=4\) Hz and \(f_\mathrm{{max}}=47\) Hz). Here, the FC measures are computed using the HERMES MatLab toolbox [9]. Subsequently, the FCV-COR, FCV-COH, FCV-MI, and RFCV representations are computed as in Sects. 2.1 and 2.2, yielding a feature matrix \({\varvec{Y}}\in \mathbb {R}^{N \times P}\) with \(N=40\) emotion elicitation videos and \(P=992\) features for each considered representation. Finally, the discrimination between emotion classes is carried out by a k-nearest neighbor classifier under a Gaussian similarity criterion. A nested 10-fold cross-validation strategy is used to test the system performance, where the number of nearest neighbors is fixed as the one reaching the best accuracy within the testing range \(\{1, 3, 5, 7, 9, 11\}.\)
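As a hedged sketch of this protocol with scikit-learn, the inner grid search below selects the number of neighbors while the outer 10-fold loop estimates accuracy; the distance weighting stands in for the Gaussian similarity criterion, an assumption since the text does not detail the kNN kernelization.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate_subject(Y, b):
    """Nested 10-fold CV accuracy for one subject (Y: 40 x 992 features)."""
    inner = GridSearchCV(KNeighborsClassifier(weights='distance'),
                         {'n_neighbors': [1, 3, 5, 7, 9, 11]}, cv=10)
    return float(np.mean(cross_val_score(inner, Y, b, cv=10)))
```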

Fig. 1. FC measures for the 32-channel EEG array at different time windows (TW). Top row: COR measure; middle row: COH measure; bottom row: MI measure. Columns 1–3, from left to right, correspond to each measure at different (non-consecutive) TWs. Column 4 shows the average and column 5 the variance over all time windows.

4 Results and Discussion

The FC scheme detailed in Sect. 2.1 allows visualizing the variability of the connectivity patterns between EEG channels. Figure 1 shows an example of some time windows of the three measures for subject 13 in an experiment with arousal and valence ratings of 8.09 and 6.15, respectively. Figure 1 reveals the variations of the channel dependencies of the EEG array over a few time windows. As seen, the relationships among different channels of the EEG array vary in time, and some strong interdependencies can be found according to each FC measure. For this particular subject/experiment, the COR measure exhibits strong interdependencies between the majority of channels with a small degree of variability among all the time windows (Figs. 1(a)–(c)). On the other hand, for the COH (Figs. 1(f)–(h)) and MI measures (Figs. 1(k)–(m)), there is a higher degree of variability among time windows. The discussed variability of each measure is accordingly summarized in the average and variance plots (columns 4–5 of Fig. 1). The average FC allows observing the channels with strong interdependencies as well as the channels with weak interdependencies over the whole experiment. Likewise, the FC variance shows the inter-channel variability across the experiment, with a higher degree of variability for the majority of channels in the COH (Fig. 1(j)) and MI (Fig. 1(o)) measures.

Fig. 2. Gaussian kernel matrices computed for the three FC measures and the target matrix, for subjects 13 and 18.

On the other hand, Fig. 2 allows analyzing the FCV-based representation and emotion label similarities for each considered measure (see Sect. 2.2). In this particular case, the FCV corresponds to subjects 13 and 18 and the set of 40 emotion elicitation experiments. By visual inspection, we can infer a higher similarity between the FCV-MI approach and the target matrix for subject 13 (Figs. 2(a)–(d)), which is also coded by the computation of the weights \(\mu \) in the RFCV representation. In contrast, for subject 18 the FCV-COR relates more closely to the target representation than the FCV-COH and the FCV-MI (Figs. 2(e)–(h)). In both cases the RFCV codes the measure that presents the highest correlation with the targets.

The FCV is used for classification purposes as stated in the experimental set-up. A graphical description of those results can be found in Fig. 3, where the classification accuracy (CA) for each subject and each dimension is presented. Figures 3(a), (b), and (c) show the CA for the 32 subjects in the arousal dimension using the FCV-COR, FCV-COH, and FCV-MI representations, respectively. Likewise, Figs. 3(d), (e), and (f) present the CA for all subjects in the valence dimension. The figures reveal differences in CA among subjects, which evidence the subject-dependency of the FC measures. Also, a summary for each FCV measure is included in Fig. 3(g) for the arousal dimension and Fig. 3(h) for the valence dimension. From those figures, only small differences in CA among the FCV schemes can be noticed, and there is no evidence that one FCV scheme performs consistently better than the others.

Fig. 3. Boxplots of classification accuracy (CA) per subject for each FC measure. Top row: arousal; middle row: valence; bottom row: average CA for both dimensions.

Table 1. Mean emotion classification results [\(\%\)] for all considered DEAP subjects.

Finally, a summary of the CA results for all subjects is presented in Table 1 for the FCV and RFCV schemes. In this table, the results of the proposed methodology are compared against state-of-the-art works developed in a similar framework using EEG data and the same database (DEAP). It can be seen that, for all the works using the DEAP dataset, there is still room for improvement, since the highest results are around \(67.00\%\). Our RFCV approach obtains the highest CA for the valence dimension with \(65.73\%\) and the second highest CA for the arousal dimension with \(66.00\%\).

5 Conclusions

We introduced a novel FC representation approach for feature extraction to enhance automatic emotion assessment from EEG data. To this end, the proposed strategy incorporates three well-known FC measures: correlation, coherence, and mutual information, to code the temporal variability of EEG inter-channel dependencies. Moreover, a supervised kernel-based relevance analysis based on CKA is used to evaluate the significance of each FC variability representation. Our approach learns both important temporal inter-channel variations and relevant FC measures to deal with the inter-subject dependency in emotion classification. Validation of the proposed feature extraction, termed RFCV, is carried out on a public dataset (DEAP). The attained results demonstrate that RFCV is a reliable methodology for emotion assessment in comparison to state-of-the-art works. As future work, we plan to couple RFCV with a state-space strategy to deal appropriately with the intrinsic EEG nonstationarity. Besides, further information theory measures could be employed to reveal connectivity variations among EEG channels.