Introduction

Emotion recognition is a common research topic in multidisciplinary fields ranging from education to medicine. However, many studies analyse emotional EEG data elicited by a large variety of stimuli, in which emotional states are represented in different ways. An emotional state is a complex phenomenon regarded as a collection of biological, social, and cognitive components because it modulates both physiological and behavioural activities. In brain-computer-interface applications and cognitive neuroscience research, two separate models have been considered in defining the emotional state. The circumplex model holds that emotions can be represented by two affective dimensions: valence, ranging from unpleasant to pleasant, and arousal, ranging from calm to excited. Emotional states are then categorized by the magnitude of these two dimensions (high/low arousal and high/low valence). The discrete emotion model holds that emotions comprise a limited number of discrete states, divided into basic and mixed emotions (Barrett 2007). In the discrete emotion model, the basic emotions are anger, fear, sadness, happiness, disgust and surprise (Panksepp 2010; Barrett 2011; Ekman 2011). The two emotion models have also been combined in several studies (Hamann 2012; Hu 2017; Liu 2017).

Musical sounds have frequently been used to investigate the neural correlates of aesthetic emotions, defined as subjectively perceived aesthetic liking or disliking (Koelsch 2006; Koelsch et al. 2008; Koelsch 2018). Although unpleasant and pleasant music stimuli are intended to induce negative and positive emotions, several discrete emotions such as disgust and amusement are difficult to induce with musical sounds (Koelsch 2006). Therefore, four discrete emotions that can be strongly evoked by well-scored music clips have been included in the present study. In particular, it is hard to obtain reliable indicators that differentiate fear from anger because of their similar dimensional scores in emotion models (Fontaine 2007; Zhao 2018). Accordingly, the novelty of the present study is to provide a graph theoretical encoder of music-induced emotions for successful classification of fear and anger.

In recent years, music has become an important strategy for both modulating and regulating emotions (Hereld 2019; Hilsdorf and Bullerjahn 2021; Henry et al. 2021). The present study aims to provide quantitative indicators that bridge theoretical developments in emotion science and music perception. Music perception begins when acoustic information is translated into neural activity in the cochlea and progressively transformed in the auditory brainstem. Depending on musical parameters such as rhythmicity/periodicity, consonance and sound intensity, the inferior colliculi can initiate a fight or flight response owing to the projection of music into the reticular formation by the dorsal cochlear nucleus (Sinex and Guzik 2003). Since the auditory pathway includes both bottom-up and top-down projections, neural impulses are possibly also projected by the thalamus into both the amygdala and the medial orbitofrontal cortex (LeDoux 2000). Therefore, music perception has been correlated with emotional valence, which can be defined as an opposition between negative (sad) and positive (happy) emotions (Juslin and Vastfjall 2008). In affective neuroscience, emotional valence is scored between unpleasant (0) and pleasant (9) states in 2-dimensional emotion models. Several past studies also show that music perception is sensitive to both interaural disparities and the duration of the perceived music (Bigand 2005; Bueno and Ramos 2007; Droit-Volet 2010). In particular, music with high valence scores (extremely pleasant) was found to stimulate the reward circuit of the brain (Blood 1999).

The neural mechanisms of music-induced emotions have mostly been examined with neuroimaging modalities. Specific music rhythms associated with discrete emotions have not yet been investigated through EEG-based brain network connectivity measures. Several studies showed that music can elicit both positive (pleasantness) (Juslin and Vastfjall 2008; Juslin and Sloboda 2010) and negative (unpleasantness) subjective ratings (Koelsch 2006); however, the neural mechanisms of music-induced emotions have not yet been correlated with global brain network measures. Therefore, the motivation of the present study is to investigate the impact of short-duration musical sounds on fast neural interactions across the cortex by using both statistical and spectral connectivity estimators in combination with the functional network assumption of the brain.

In past studies, musical stimuli with higher (positive) valence scores were found to elicit higher activity in left frontal regions, while musical stimuli with lower (negative) valence scores were found to elicit higher activity in right frontal regions (Schmidt and Trainor 2001; Altenmüller and Schürmann 2002; Flores-Gutiérrez and Díaz 2007b). In particular studies, the physiological effect of music has been correlated with hemispheric asymmetry between right (FC4) and left (FC3) fronto-central locations (Jackson 2003; Dennis and Solomon 2010). In a more recent study, frontal asymmetry (FC3/FC4) was found to be correlated with pleasurable music (Arjmand 2017).

The motivation of the study is to show the graph theoretical differences between anger and fear, both of which trigger the “fight or flight” response. The methods and group comparison scheme have been configured to bridge neurobiology, i.e. the origin of EEG series, and emotional states derived from widely projecting neuromodulators such as dopamine, serotonin, and norepinephrine. In comparison to anger, additional hormones and neuromodulators are released in fear. In other words, the superimposed post-synaptic potentials embedded in EEG segments can be either excitatory or inhibitory depending on neurotransmitter release during emotional perception. Accordingly, the novelty of the present study lies in the following findings: (1) the large-scale neuronal communication mechanism is mostly governed by the arousal scores of discrete emotions, as indicated by brain network connectivity measures; (2) fear is characterized by high modularity of the brain; (3) in fear, neural connections become dense within cortices and sparse between these regions, as represented by the discrete emotion model rather than the circumplex model. Both models are shown in Figure 1, where the quantitative scores (arousal and valence) are determined through the Self-Assessment Manikin (0 refers to no activation, i.e. calmness, and an unpleasant state; 9 refers to high activation, i.e. a highly excited and pleasant state). In the literature, fear and anger are mostly treated as identical states placed in the same quadrant, defined by high arousal and low valence (Tao 2020; Cheng 2021), although they are clearly different emotions (Barrett 2012; Lindquist 2012). In the present study, four basic emotions (fear, anger, happiness, sadness) and a baseline (stimulus-free resting state) have been characterized by both functional and effective connectivity measures based on statistical cross-correlations and multivariate causality correlations, respectively.
In detail, Pearson Correlation (PC) and Spearman Correlation (SC) have been examined for estimating functional connectivity levels, while the Directed Transfer Function (DTF) and Partial Directed Coherence (PDC) have been used to compute effective connectivity levels across the cortex in response to affective sounds, i.e. music clips. Connectivity levels have been converted into binary adjacency matrices for the estimation of graph theoretical network measures by using the Brain Connectivity Toolbox, as shown in Figure 2. Then, the contrasting groups (fear vs anger, happiness vs sadness) are compared with each other with respect to both single network indices and combinations of multiple network indices by means of one-way ANOVA tests and Long Short-Term Memory Networks (LSTMNs). The effect of window size on classification performance has also been considered.

Fig. 1 Discrete emotion model (emotions are discrete states) and circumplex model (the states are mapped into quadrants of arousal-valence space)

Both PDC and DTF quantify causal interactions, in the sense of Granger causality, between neuronal populations modelled by a multivariate autoregressive model (Gaxiola 2018). DTF has frequently been used in EEG analysis to investigate the neural mechanisms underlying cognitive skills influenced by emotion regulation strategies (Ferdek 2016; Ligeza 2017). EEG signals can be considered the brain’s immediate responses to affective stimuli in real time (Bekkedal 2011). Depending on stimulus parameters, discrete emotions such as happiness, joy, anger, disgust, fear/anxiety and sadness were reportedly characterized by particular amplitude-frequency characteristics of evoked EEG series (Aftanas 2006). Frontal asymmetry and midline power have also been reported as emotional features for discriminating two emotions located in the same quadrant of valence-arousal space (Liu 2017). Apart from these papers, Graph Theory (GT) based global connectivity approaches assume that the human brain can be modelled as a dynamic complex network comprising billions of interconnected neurons. GT-based EEG analysis has been used to study particular conditions such as schizophrenia (Lynall 2010), stroke recovery (Grefkes 2011), and Alzheimer’s disease (Tijms 2013). In this sense, intrinsic functional connectomes can be computationally estimated from EEG measurements based on GT to understand the neural transmission mechanism in the brain. To maintain the relevance of cognitive theory in behaviour, there must be an optimum balance between integration and segregation of information flow in a well-established complex graph, i.e. the healthy brain (Bullmore 2009). In modelling the brain based on graph theory, EEG recordings or neuroimaging modalities can be analysed to compute either functional or effective connectivity across the cortex.
In advanced EEG analysis, recording electrode placements are considered the nodes, while functional links between the nodes are assigned as the edges of a spatially embedded complex graph. In this sense, functional connectivity captures temporal correlations among the nodes, while effective connectivity captures causal interactions across the graph in terms of quantitative dependencies for each pair of nodes.

In emotion recognition research, both dimensional and discrete theories of emotion have been used, mostly in neuroimaging studies (Phan 2002; Murphy 2003; Vytal 2010). In line with them, the motivation of the present study is to determine the extent to which basic emotions (fear, anger, happiness, sadness) are associated with distinguishable global brain connectivity measures based on GT, derived from surface EEG recordings instead of fMRI or PET slices. In brain imaging studies, basic emotions were found to be characterized by consistent and distinct neural correlates that discriminate one emotional state from another in pairwise contrasts, where the differences are assessed by statistical permutation tests (Murphy 2003; Vytal 2010). Accordingly, in the present study, ANOVA tests have been used to measure the differences between discrete emotions induced by acoustic sounds in terms of functional brain connectivity metrics, assuming that the brain can be modelled as a complex network based on Graph Theory. Besides, network-based deep learning models have also been used to classify emotional states in order to reveal how brain functions can be represented by the discrete emotion model. In particular, our main motivation is to show the clear difference between anger and fear with respect to specified brain connectivity measures and frequency bands of surface EEG recordings. Therefore, four basic emotions and the resting state are described by a large number of quantitative features, namely frequency band specific global network indices (Clustering Coefficient, Local Efficiency, Global Efficiency, Modularity, Transitivity, Assortativity). These indices are estimated with four different methods (PC, SC, PDC and DTF) and two different thresholds (the mean value and 60% of the maximum value of the dependency matrix) for shorter (2 sec) and longer (6 sec) EEG segments.
Pairs of states (one emotional state vs another emotional state, or an emotional state vs baseline, i.e. resting state) are classified with LSTMN in MATLAB R2020b. Brief descriptions of the methods and data are given in the following sections. The results are discussed from both methodological and neuroscientific points of view in the last section.

Method

In order to obtain brain network measures, four different cortical dependency approaches were used. Their performances were compared with each other to observe differences between pairs of states and to investigate the sensitivity of the network measures to music perception depending on stimulus parameters. The algorithmic principle of the study is summarized in Figure 2. As this graphical abstract of the method shows, the elements of the connectivity matrix \(C_{19 \times 19}\) are estimated by using four metrics: Pearson Correlation, Spearman Correlation, PDC and DTF. The common principle in obtaining the elements of the connectivity matrix is to compute cross-correlations and cross-coherence between electrode pairs by using statistical and Granger causality based connectivity estimators, respectively. Since the resulting estimates are independent of the order of the electrodes in a pair, the connectivity matrix is symmetric, with identical elements in the lower and upper triangles. The connectivity matrix is estimated for each short segment over all electrode pairs of interest. A well-defined threshold is then applied to each individual connectivity matrix to transform it into a binary adjacency matrix, from which the global connectivity measures are computed. The identical procedure has been implemented for every short segment. In summary, when the segment length is 2 sec, global connectivity measures have been computed six times per 12 sec trial \((12~=~6 \times 2)\) for each individual.

Fig. 2 Graphical abstract of the study. This procedure has been performed for both longer (6 sec) and shorter (2 sec) EEG segments

The EEG data with accession number ds002721 were downloaded from a free and open platform (openneuro.org). The data acquisition principles, the parameters of the acoustic stimuli and the participants’ demographic information are described in Daly (2014, 2015). The participants are healthy adults, each of whom listened to acoustic sounds selected from the auditory stimulus dataset introduced in Eerola (2010). The number of acoustic stimuli was 40. Each one was presented to the participants for 12 sec. During the presentation of the acoustic stimuli, the participants were instructed to look at the computer screen and listen to the music without body movements. After each presentation, the participants scored the stimulus along 8 axes on a 5-point Likert scale, as reported by the dataset owners (Daly 2014, 2015). However, the affective scores of the music clips were taken as the emotional scores of the stimuli in the present study, since each acoustic stimulus was associated with an emotional state in accordance with the arousal-valence dimensions graded by 116 participants, as clearly presented in Eerola (2010). In particular, both moderate and high scores of the stimuli were taken to indicate the emotion of interest. Then, artifact-free EEG recordings in response to short music clips were categorized into four emotional states: Fear, Anger, Happiness and Sadness. In addition, stimulus-free resting-state EEG recordings were also analyzed, with the number of trials identical to that of the emotional recordings, as listed in Table 6 in the appendix. The participants’ ages ranged from 18 to 63 years (median = 35); the 21st subject in the dataset was not included due to an age above 65.
In deciding to exclude ages of 65 and above, we followed the well-known international principles published by the American Psychological Association (APA) for defining age groups: 15–47 years old (young group), 48–63 years old (middle-age group) and \(\ge\) 64 years old (elderly group) (American Psychological Association 2001).

Emotional stimuli

The EEG data and emotional stimuli were introduced in Daly (2020). Emotional stimuli were chosen from music excerpts and computer-generated music clips. The short music clips of 12 s presented to the listeners during EEG recording were chosen from among 110 films rated by 116 non-musicians, as described in Eerola (2010). The film clips used as emotional stimuli in the present study are available in the OpenNeuro archive at https://doi.org/10.18112/openneuro.ds002721.v1.0.1.

The film clips were labeled with specified emotional states in accordance with the self-ratings of the listeners in Eerola (2010); however, the new participants were also asked to rate their emotions on randomly ordered Likert questions referring to pleasantness, energy, sadness, anger, tenderness, happiness, fear, and tension in Daly (2014, 2015).

EEG data acquisition and preprocessing

A BrainAmp EEG amplifier (Brain Products, Germany) was used to collect surface EEG series with 19 recording electrodes placed on the scalp in accordance with the international 10/20 electrode placement system (Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, O2) (Daly 2014, 2015). The sampling frequency was 1000 Hz. The raw data were divided into trials of 12 sec. Each trial was divided into non-overlapping short segments of 6 sec and of 2 sec, in turn. In the analysis, each segment was first filtered by Finite Impulse Response (FIR) filters in order to extract the specified EEG frequency components as follows: full-band: \(0.5-60.5~Hz\), delta: \(0.5-4~Hz\), theta: \(4.5-8~Hz\), alpha: \(8.5-12.5~Hz\), beta: \(13-31~Hz\), gamma: \(31.5-60.5~Hz\).
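The segmentation and band filtering described above can be sketched as follows. This is a minimal NumPy illustration rather than the authors' MATLAB pipeline: the windowed-sinc design, the Hamming window, the filter length `numtaps=501` and the function names are all assumptions, and the trial is random data standing in for EEG. Only the sampling rate, the segment lengths and the band edges come from the text.

```python
import numpy as np

FS = 1000  # sampling frequency (Hz) stated in the text

# Band edges (Hz) as listed in the text
BANDS = {
    "full": (0.5, 60.5), "delta": (0.5, 4.0), "theta": (4.5, 8.0),
    "alpha": (8.5, 12.5), "beta": (13.0, 31.0), "gamma": (31.5, 60.5),
}

def segment_trial(trial, seg_sec, fs=FS):
    """Split a (channels, samples) trial into non-overlapping segments.

    Returns an array of shape (n_segments, channels, samples_per_segment).
    """
    seg_len = int(seg_sec * fs)
    n_seg = trial.shape[1] // seg_len
    trimmed = trial[:, :n_seg * seg_len]
    return trimmed.reshape(trial.shape[0], n_seg, seg_len).transpose(1, 0, 2)

def fir_bandpass(low_hz, high_hz, fs=FS, numtaps=501):
    """Windowed-sinc band-pass FIR taps (Hamming window).

    numtaps=501 is an illustrative assumption, not a value from the study.
    """
    n = np.arange(numtaps) - (numtaps - 1) / 2
    f1, f2 = low_hz / fs, high_hz / fs  # cut-offs in cycles per sample
    ideal = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return ideal * np.hamming(numtaps)

def apply_fir(segment, taps):
    """Filter every channel; mode='same' keeps the segment length."""
    return np.array([np.convolve(ch, taps, mode="same") for ch in segment])

# Synthetic stand-in for one 12 s, 19-channel trial (random data, not real EEG)
rng = np.random.default_rng(0)
trial = rng.standard_normal((19, 12 * FS))

segments_2s = segment_trial(trial, 2)  # six segments of 2000 samples each
segments_6s = segment_trial(trial, 6)  # two segments of 6000 samples each
alpha_seg = apply_fir(segments_2s[0], fir_bandpass(*BANDS["alpha"]))
```

With a 2 sec segment the narrow bands (delta, theta) would need a longer filter than the segment can comfortably support, so the filter length is a design choice to be revisited rather than a recommendation.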

Regarding our hypothesis, we analysed EEG series elicited by musical sounds. In the literature, music processing is assumed to be lateralized mostly to the right hemisphere owing to the demands of higher cognitive musical functions such as musical attention and the tracking of harmonic structure in time. In particular, it has been demonstrated that listening to pleasurable music causes the release of certain neurotransmitters in the amygdala. Recent studies present results that match past findings in the full-band EEG frequency interval in response to musical sounds (Levitin 2012). Apart from the full-band frequency interval, EEG frequency sub-bands have also been described as EEG rhythms that increase or decrease depending on both psycho-physiological and mental states such as wakefulness, sleep, consciousness, emotional arousal, etc. Although slow rhythms (delta and theta) are mostly observed in deep sleep and cerebral damage, these slow rhythms also align with the regularities of speech. The tracking of naturalistic sounds has been associated with how acoustic sounds are encoded in terms of neuro-electrical dynamics in the brain (Bröhl and Kayser 2021). Besides, the alpha, high beta and gamma sub-bands have been found to be sensitive to emotional brain functions induced by short music clips (Bo 2019). Therefore, GT-based brain network measures have been estimated in five distinct EEG sub-bands in addition to the full-band EEG frequency interval in the present study.

Pearson and Spearman correlations

Pearson Correlation (PC) is a linear method that measures the correlation between two nodes, i.e. two EEG electrodes, in the form

$$\begin{aligned} r_{XY}=\frac{E(XY)-E(X)E(Y)}{\sigma _{X} \sigma _{Y}} \end{aligned}$$
(1)

where E(.) is the expected value operator and \(\sigma _{X}\) and \(\sigma _{Y}\) are the standard deviations (Wang 2018). For series containing N data samples, the coefficient was computed as

$$\begin{aligned} r_{XY}=\frac{1}{N-1}\sum _{i=1}^{N}\left( \frac{x_{i}-{\bar{x}}}{\sigma _{X}}\right) \left( \frac{y_{i}-{\bar{y}}}{\sigma _{Y}}\right) \end{aligned}$$
(2)

Spearman Correlation (SC) is a rank-based method that measures the monotonic relationship between two series in the form

$$\begin{aligned} \rho =\frac{\sum _{i} R_{x,i} R_{y,i}-C}{\sqrt{\left\{ \left[ \sum _{i} (R_{x,i})^{2}-C\right] \left[ \sum _{i} (R_{y,i})^{2}-C\right] \right\} }} \end{aligned}$$
(3)

where \(R_x\) and \(R_y\) refer to the rank variables of x and y, respectively, and \(C=N(N+1)^{2}/4\) is the correction term for N samples (Sprent 1988). Since the present study adopts an undirected, weighted network assumption in the global connectivity analysis, the absolute values of both PC and SC are used in the tests.
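As an illustration of the two functional connectivity estimators, the following NumPy sketch computes absolute Pearson and Spearman matrices for one multichannel segment. The function names are hypothetical, the ranking ignores ties (an assumption that is reasonable for continuous-valued signals), the diagonal is zeroed as a convention of this sketch, and the three-channel test signal is synthetic.

```python
import numpy as np

def pearson_connectivity(segment):
    """Absolute Pearson correlation between all channel pairs.

    segment has shape (channels, samples); the diagonal is set to zero
    so that self-connections are ignored.
    """
    c = np.abs(np.corrcoef(segment))
    np.fill_diagonal(c, 0.0)
    return c

def rank_rows(segment):
    """Ranks 1..N within each channel (ties are not averaged)."""
    order = np.argsort(segment, axis=1)
    ranks = np.empty_like(order)
    rows = np.arange(segment.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, segment.shape[1] + 1)
    return ranks.astype(float)

def spearman_connectivity(segment):
    """Absolute Spearman correlation: Pearson correlation of the ranks."""
    return pearson_connectivity(rank_rows(segment))

# Synthetic example: channel 1 is a monotonic but nonlinear function of channel 0
rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
segment = np.vstack([x, np.exp(x), rng.standard_normal(2000)])

pc = pearson_connectivity(segment)
sc = spearman_connectivity(segment)
```

In this toy case the Spearman value between the first two channels equals one while the Pearson value is lower, which is the practical difference between a rank-based and a linear estimator.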

Directed transfer function and partial directed coherence

In the present study, both the Directed Transfer Function (DTF) and Partial Directed Coherence (PDC) were computed by using the open source Electrophysiological Connectome (e-Connectome-2-full) software toolbox described in He (2011). Time series are represented by the Multi-Variate Adaptive Autoregressive Model (MVAAM) introduced in Schlögl (2002) in the implementation of the e-Connectome toolbox, which is supported by two additional toolboxes: the Regularization Tools introduced in Hansen (2007) and the ARfit software package introduced in Neumaier (2001) and Schneider (2001). In applications, the MVAAM coefficients of the EEG segments (denoted by \({\textbf{s}}\)) were estimated through a stepwise least squares estimation algorithm, where the model order (p) is optimized by using Schwarz’s Bayesian Criterion according to the ARfit software package, with the maximum order set to 6, in the form

$$\begin{aligned} {\textbf{s}}(n)= \sum _{j=1}^{p}{\textbf{A}}(j){\textbf{s}}(n-j)+{\textbf{w}}(n) \end{aligned}$$
(4)

where \({\textbf{w}}\) refers to zero-mean white noise. As shown in Figure 2, the EEG segments of each participant have been modelled by the MVAAM coefficients, symbolized by \({\textbf{A}}\), where the number of samples in each segment is equal to the sampling frequency (Hz) times the segment length (sec). In order to estimate the interactions between EEG recording sites across the cortex, a set of simultaneously measured EEG segments is described in the form

$$\begin{aligned} {\textbf{S}}= \left[ {\textbf{s}}_{1}(n) ~{\textbf{s}}_{2}(n)~\cdots {\textbf{s}}_{N}(n) \right] \end{aligned}$$
(5)

in accordance with N-channel EEG recordings (\(N=19\)) characterized by MVAAM coefficients (Schlögl 2002). Then, we can consider a matrix \({\textbf{A}}\) containing the model coefficients of the simultaneously measured EEG segments as follows,

$$\begin{aligned} {\textbf{A}}_{i,j}(\lambda )= & {} 1-\sum _{r=1}^{p} a_{ij}(r) \exp \left( -j 2\pi \lambda r\right) \quad \text{if}~i=j \end{aligned}$$
(6)
$$\begin{aligned} {\textbf{A}}_{i,j}(\lambda )= & {} -\sum _{r=1}^{p} a_{ij}(r) \exp \left( -j 2\pi \lambda r\right) \quad \text{otherwise} \end{aligned}$$
(7)
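Eqs. (4), (6) and (7) can be illustrated with a short NumPy sketch. It uses an ordinary least squares fit with a fixed model order, which is a simplification: the study relies on the stepwise least squares algorithm and order selection of the ARfit package, neither of which is reproduced here. The function names, the simulated three-channel process and its coupling matrix `true_a` are assumptions for illustration only.

```python
import numpy as np

def fit_mvar(seg, p):
    """Ordinary least squares fit of s(n) = sum_j A(j) s(n-j) + w(n).

    seg has shape (channels, samples). Returns the coefficients with shape
    (p, channels, channels) and the residual covariance matrix.
    """
    n_ch, n_s = seg.shape
    y = seg[:, p:].T                                    # targets s(n)
    x = np.hstack([seg[:, p - j:n_s - j].T for j in range(1, p + 1)])
    b, *_ = np.linalg.lstsq(x, y, rcond=None)           # stacked A(j)^T
    coeffs = b.T.reshape(n_ch, p, n_ch).transpose(1, 0, 2)
    resid = y - x @ b
    return coeffs, np.cov(resid.T)

def a_of_f(coeffs, freq_hz, fs):
    """A(lambda) of Eqs. (6)-(7) with lambda = freq_hz / fs."""
    p, n_ch, _ = coeffs.shape
    lam = freq_hz / fs
    a_f = np.eye(n_ch, dtype=complex)
    for r in range(1, p + 1):
        a_f -= coeffs[r - 1] * np.exp(-2j * np.pi * lam * r)
    return a_f

# Simulated first-order process in which channel 0 drives channel 1
rng = np.random.default_rng(2)
true_a = np.array([[0.5, 0.0, 0.0],
                   [0.4, 0.3, 0.0],
                   [0.0, 0.0, 0.2]])
n = 5000
s = np.zeros((3, n))
for t in range(1, n):
    s[:, t] = true_a @ s[:, t - 1] + rng.standard_normal(3)

coeffs, sigma = fit_mvar(s, p=1)
a10 = a_of_f(coeffs, 10.0, fs=1000)   # A(lambda) evaluated at 10 Hz
```

In practice the order p would be chosen by an information criterion, as the text describes, instead of being fixed in advance.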

In fact, \({\textbf{A}}\) represents the linear relationship between recording channels, i.e. the short time series \({\textbf{s}}_{i}\) and \({\textbf{s}}_{j}\); the j in the exponent of Eqs. (6) and (7) denotes the imaginary unit rather than the channel index. In other words, \({\textbf{A}}\) contains the lagged effect of the jth recording channel on the ith recording channel. The model coefficients can be transformed into the frequency domain to obtain a time-varying transfer matrix denoted by \(\mathbf {\gamma }\) in the form

$$\begin{aligned} \mathbf {\gamma }_{ij}(\lambda )= \frac{{\textbf{H}}_{ij}(\lambda )}{\sqrt{{\textbf{h}}_{i}^{H}(\lambda ){\textbf{h}}_{j}(\lambda )}} \end{aligned}$$
(8)

where \({\textbf{h}}_{j}\) denote the entries of \({\textbf{H}}(\lambda )\) given by,

$$\begin{aligned} {\textbf{H}}(\lambda )={\textbf{A}}^{-1}(\lambda ) \end{aligned}$$
(9)

where \({\textbf{H}}(\lambda )\) denotes the inverse of the frequency domain transformed model coefficients and \(^{H}\) denotes the Hermitian transpose. The cross-spectral density matrix of \({\textbf{S}}\) can be represented by

$$\begin{aligned} {\textbf{S}}(f)={\textbf{H}}(f) \Sigma {\textbf{H}}^{H}(f) \end{aligned}$$
(10)

where \(\Sigma\) is a covariance matrix including entries of \(\sigma _{ij},~i=1,...,19,~j=1,...,19\) where \(\sigma _{ij}=0\) for \(i \ne j\). Then, generalized directed coherence is defined in form,

$$\begin{aligned} \gamma _{ij}(f)= \frac{ \sigma _{jj} {\textbf{H}}_{ij}(f)}{\sqrt{{\textbf{S}}_{ii}(f)}} \end{aligned}$$
(11)

in accordance with fundamental knowledge in literature ?. For a particular frequency f, the squared directed coherence \(| \gamma _{ij}(f) |^{2}\) refers to the fraction of the power contribution between short EEG segments recorded from two locations, i and j (Baccala 2001). Due to the restriction on the entries of the matrix \(\Sigma\), DTF is defined by,

$$\begin{aligned} \textbf{DTF}_{ij}(f)= \frac{{\textbf{H}}_{ij}(f)}{\sqrt{ \sum _{m=1}^{N} | {\textbf{H}}_{im}(f) |^{2}}} \end{aligned}$$
(12)

The resulting effective connectivity represents the causal influence of any EEG recording channel on another in a specified frequency range (Blinowska 2006). PDC has been defined from the Fourier Transform of the MVAAM coefficients (Baccala 2001). Thus, the frequency domain connectivity matrix, describing both the strength and the direction of information flow between simultaneously measured EEG segments, is estimated in the form

$$\begin{aligned} \textbf{PDC}_{ij}(f)=\frac{{\textbf{A}}_{ij}(f)}{\sqrt{{\textbf{a}}_{j}^{*}(f) {\textbf{a}}_{j}(f)}} \end{aligned}$$
(13)

where \(*\) denotes the transpose and complex conjugate operation and \({\textbf{a}}_{j}(f)\) denotes the jth column of \({\textbf{A}}(f)\). PDC takes values between 0 and 1 due to normalization. PDC shows only direct flows between neural populations, since its normalization relates the outflow from one electrode as source (j) to another electrode as sink (i) to all outflows from that source. Thus, unlike DTF, PDC emphasizes the sinks rather than the sources.
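The difference between the two estimators can be made concrete with a small NumPy sketch. It is not the e-Connectome implementation: the function names and the three-channel coefficient matrix are assumptions, and magnitudes are taken so that both measures lie between 0 and 1. DTF is normalized over each row of the transfer matrix and PDC over each column of the coefficient matrix.

```python
import numpy as np

def a_of_lambda(coeffs, lam):
    """A(lambda) = I - sum_r a(r) exp(-j 2 pi lambda r), as in Eqs. (6)-(7)."""
    p, n_ch, _ = coeffs.shape
    a_f = np.eye(n_ch, dtype=complex)
    for r in range(1, p + 1):
        a_f -= coeffs[r - 1] * np.exp(-2j * np.pi * lam * r)
    return a_f

def dtf_pdc(coeffs, lam):
    """Magnitude DTF and PDC matrices at normalized frequency lam."""
    a_f = a_of_lambda(coeffs, lam)
    h_f = np.linalg.inv(a_f)                       # transfer matrix H
    # DTF: each row i is normalized by all inflows to channel i
    dtf = np.abs(h_f) / np.sqrt((np.abs(h_f) ** 2).sum(axis=1, keepdims=True))
    # PDC: each column j is normalized by all outflows from channel j
    pdc = np.abs(a_f) / np.sqrt((np.abs(a_f) ** 2).sum(axis=0, keepdims=True))
    return dtf, pdc

# Hypothetical first-order model with a chain 0 -> 1 -> 2 (no direct 0 -> 2 link)
coeffs = np.array([[[0.5, 0.0, 0.0],
                    [0.4, 0.5, 0.0],
                    [0.0, 0.4, 0.5]]])

dtf, pdc = dtf_pdc(coeffs, lam=10 / 1000)   # 10 Hz at a 1000 Hz sampling rate
```

For this chain, the entry for source 0 and sink 2 is non-zero in the DTF matrix but zero in the PDC matrix, which illustrates the statement that PDC reflects only direct flows.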

Graph theoretical brain network indices

For all emotional states, including the resting states taken from the first trial of each run for each participant, six brain network indices were calculated in total. Later in the study, one-way ANOVA and deep learning analyses were conducted on all of these indices in order to demonstrate which analysis method, frequency band, and brain connectivity index was best at differentiating the emotional states based on the collected EEG data. The averaged clustering coefficient (CC) is a connectivity index associated with the number of edges connected to a specific node, which can in turn indicate the importance of that node (Wang 2018). It is representative of local structure and is regarded as a measure of resilience to random error (Stam 2007). The clustering coefficient \(C_i\) of a vertex i with degree \(k_i\) is usually defined as the ratio of the number of existing edges \(e_i\) between neighbours of i to the maximum possible number of edges between neighbours of i, where a vertex is called a neighbour of i when it is connected to i by an edge. Both CC and local efficiency (LE) are network segregation measures. Global efficiency (GE) measures the global transmission ability of a network (Tian 2019) and is therefore related to the integration of the brain connectivity network. Modularity is a complex network index that reflects how well a network can be partitioned into communities (or clusters) (Supriya 2016). Transitivity (T) is a brain connectivity measure constructed from the number of triangles in the network. It is calculated as the fraction of a node’s neighbours that are also neighbours of each other (Miraglia 2018). Assortativity is the correlation between the degrees of the electrodes at the two opposite ends of a link.
Several binarization methods have been applied to functional connectivity matrices to remove weak or spurious connections in graph theoretical networks. In earlier studies, several thresholding approaches were proposed for comparing groups represented by graphs with an identical number of connections per node (Sporns 2004; Stam 2007). Besides, both pre-defined absolute thresholds and thresholds set to a specific ratio of the maximum value have also been used in applications (Stam 2007; Rubinov 2009). In some later studies, the significant connections are chosen by using binarization approaches such as the Cluster Span Threshold (Smith 2015), the Minimum Spanning Tree (Rai 2015) and the Efficiency Cost Optimization threshold (Fallani 2017). Another binarization method is to build a range of thresholds by criteria assuring a graph with a shorter characteristic path length and a larger average CC compared to a random network with the same degree distribution (Yin 2017). The choice of threshold directly affects the range of the network measures that represent different graphs, where the existing connections are transformed into binary numbers (van Wijk 2010). Unstable and incompatible results, originating from the coupling methods and binarization approaches, have been reported in comparing healthy individuals with patients with major depressive disorder based on graph theoretical EEG analysis (Sun and Li 2019b). For this reason, proportional thresholding has been applied to the connectivity estimates with respect to their mean value and \(60\%\) of their maximum value, as proposed by the contributors of BCT in Stam (2007) and Rubinov (2009). From the BCT introduced by Rubinov and Sporns (2010), the Matlab function ‘threshold_proportional.m’ was used in order to transform the connectivity estimates into binary numbers. Once the binary adjacency matrix is obtained, scalar network measures are computed by using BCT in MATLAB R2021b.
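The two thresholds named in the text (the mean value and 60% of the maximum value of a connectivity matrix) can be sketched in NumPy as follows. This is an illustration of the thresholds as described, not a re-implementation of the BCT function mentioned above. The function name `binarize`, the use of a strict inequality, the restriction to off-diagonal entries and the 4-node matrix are assumptions.

```python
import numpy as np

def binarize(conn, rule):
    """Binary adjacency matrix from a connectivity matrix.

    rule='max60' keeps links above 60% of the largest off-diagonal value;
    rule='mean' keeps links above the mean off-diagonal value.
    Returns the adjacency matrix and the threshold that was applied.
    """
    c = np.asarray(conn, dtype=float)
    off = ~np.eye(c.shape[0], dtype=bool)   # mask that excludes the diagonal
    if rule == "max60":
        thr = 0.6 * c[off].max()
    elif rule == "mean":
        thr = c[off].mean()
    else:
        raise ValueError("rule must be 'max60' or 'mean'")
    adj = (c > thr) & off
    return adj.astype(int), thr

# Hypothetical symmetric connectivity matrix for four channels
conn = np.array([[0.0, 0.9, 0.2, 0.5],
                 [0.9, 0.0, 0.6, 0.1],
                 [0.2, 0.6, 0.0, 0.3],
                 [0.5, 0.1, 0.3, 0.0]])

adj_max60, thr_max60 = binarize(conn, "max60")
adj_mean, thr_mean = binarize(conn, "mean")
```

In this example the 60% rule keeps two links and the mean rule keeps three, which shows how the choice of threshold changes the density of the resulting graph.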
Among these measures, mathematical expressions of LE and CC are given by,

$$\begin{aligned} LE~= & {} ~\frac{1}{N}\sum _{i=1}^{N}GE \left( G_{i}\right) \end{aligned}$$
(14)
$$\begin{aligned} CC~= & {} ~\frac{1}{N}\sum _{i=1}^{N}\frac{2t_{i}}{k_{i}\left( k_{i}-1\right) } \end{aligned}$$
(15)

Here, \(G_{i}\) refers to the subgraph formed by the nodes connected to node i, and \(t_{i}\) refers to the number of triangles around node i. \(k_{i}\) denotes the degree of node i, i.e. the number of links connected to it. When two nodes (i and j) are neighbours, the corresponding connection status is assigned as \(a_{ij}=1\), meaning that a link exists between i and j. Conversely, the lack of a link is represented by \(a_{ij}=0\). Therefore, the number of triangles around a node (i) is defined as follows,

$$\begin{aligned} t_{i}= \frac{1}{2} \sum _{j,h=1}^{N} \left( a_{ij} a_{ih}a_{jh} \right) \end{aligned}$$
(16)
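A direct NumPy transcription of the triangle count in Eq. (16) and of the averaged clustering coefficient is sketched below. The function names and the 4-node graph are hypothetical, and nodes with fewer than two neighbours are given a coefficient of zero, which is a convention assumed here.

```python
import numpy as np

def triangles(adj):
    """t_i = 0.5 * sum over j, h of a_ij * a_ih * a_jh, as in Eq. (16)."""
    a = np.asarray(adj, dtype=float)
    return 0.5 * np.einsum("ij,ih,jh->i", a, a, a)

def mean_clustering(adj):
    """Average over nodes of 2 t_i / (k_i (k_i - 1))."""
    a = np.asarray(adj, dtype=float)
    k = a.sum(axis=1)                 # node degrees
    t = triangles(a)
    denom = k * (k - 1)
    c = np.zeros_like(t)
    valid = denom > 0                 # nodes with at least two neighbours
    c[valid] = 2 * t[valid] / denom[valid]
    return c.mean()

# Hypothetical graph: a triangle (0, 1, 2) with node 3 attached to node 2
adj = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = 1

t = triangles(adj)
cc = mean_clustering(adj)
```

Here nodes 0 and 1 have a coefficient of 1, node 2 has 1/3 and node 3 has 0, so the averaged CC is 7/12.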

Regarding the functional modules of the brain network in terms of cortical regions, the strength of their division is quantified by the modularity index (Q) in the form

$$\begin{aligned} Q=\frac{1}{l} \sum _{i,j=1}^{N}\left( A_{ij}-\frac{k_{i}k_{j}}{l}\right) \delta _{mi,mj} \end{aligned}$$
(17)

Here, \(l=\sum _{i,j=1}^{N}A_{ij}\) is the number of edges and \(mi\) refers to the module containing node i, where \(\delta _{mi,mj}=1\) if \(mi=mj\) and \(\delta _{mi,mj}=0\) otherwise. Then, assortativity (r) is defined by,

$$\begin{aligned} r=\frac{l^{-1} \sum _{i, j} k_{i} k_{j}-\left[ l^{-1} \sum _{i,j} \frac{1}{2}\left( k_{i}+k_{j}\right) \right] ^{2}}{l^{-1} \sum _{i,j} \frac{1}{2}\left( k_{i}^{2}+k_{j}^{2}\right) -\left[ l^{-1} \sum _{i,j} \frac{1}{2}\left( k_{i}+k_{j}\right) \right] ^{2}} \end{aligned}$$
(18)
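Eqs. (17) and (18) can be checked on a small graph with the NumPy sketch below. Two assumptions are made: the module indicator \(\delta\) defined in the text multiplies each term of Q, and the partition into modules is supplied by hand, whereas a toolbox would search for the partition that maximizes Q. The function names and the 6-node graph are hypothetical.

```python
import numpy as np

def modularity(adj, modules):
    """Q for a given partition, with l equal to the sum of all A_ij."""
    a = np.asarray(adj, dtype=float)
    m = np.asarray(modules)
    k = a.sum(axis=1)
    l = a.sum()
    same = m[:, None] == m[None, :]   # delta: 1 when two nodes share a module
    return ((a - np.outer(k, k) / l) * same).sum() / l

def assortativity(adj):
    """Degree correlation across links, following the form of Eq. (18)."""
    a = np.asarray(adj, dtype=float)
    k = a.sum(axis=1)
    i, j = np.nonzero(np.triu(a, 1))  # each undirected link counted once
    ki, kj = k[i], k[j]
    n_links = len(i)
    mean_prod = (ki * kj).sum() / n_links
    mean_sum_sq = ((0.5 * (ki + kj)).sum() / n_links) ** 2
    mean_sq = (0.5 * (ki ** 2 + kj ** 2)).sum() / n_links
    return (mean_prod - mean_sum_sq) / (mean_sq - mean_sum_sq)

# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by the link 2-3
adj = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1

q = modularity(adj, [0, 0, 0, 1, 1, 1])
r = assortativity(adj)
```

For this graph Q is 5/14 when each triangle forms one module, and r is -1/6 because the two highest-degree nodes are mostly linked to lower-degree nodes.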

The weighted path transitivity provides the density of local triangles along the shortest-paths between all pairs of EEG electrodes in form,

$$\begin{aligned} T=\frac{\sum _{i \in N}2 t_{i}^{w}}{\sum _{i \in N} k_{i}\left( k_{i}-1\right) } \end{aligned}$$
(19)

where \(t_{i}^{w}\) denotes the weighted geometric mean of triangles around node i (Rubinov and Sporns 2010).
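For a binary adjacency matrix all weights are 0 or 1, so \(t_{i}^{w}\) reduces to the plain triangle count \(t_{i}\). Under that assumption, Eq. (19) can be sketched as follows; the graph and the function name are hypothetical.

```python
# Hypothetical 4-node undirected binary graph: a triangle (0, 1, 2) plus node 3
# attached to node 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]


def transitivity(a):
    """T = sum_i 2*t_i / sum_i k_i*(k_i - 1) for a binary undirected graph."""
    n = len(a)
    k = [sum(row) for row in a]
    t = [
        sum(a[i][j] * a[i][h] * a[j][h] for j in range(n) for h in range(n)) / 2
        for i in range(n)
    ]
    closed = sum(2 * ti for ti in t)             # closed triplets, summed over nodes
    possible = sum(ki * (ki - 1) for ki in k)    # all connected triplets
    return closed / possible if possible else 0.0


T = transitivity(A)  # 6 / 10 = 0.6
```

Unlike the node-averaged CC, T is a single ratio over the whole network, so low-degree nodes do not weigh on it as heavily.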

Statistical tests and deep learning classification of brain network measures

In applications, frequency band specific graph theoretic indices (CC, GE, LE, Q, r, T) were defined as the emotional features. In estimating an index, shorter (2 sec) and longer (6 sec) EEG segments were analyzed by using functional and effective connectivity approaches. Then, the resulting connectivity matrices were transformed into binary adjacency matrices with respect to the specified thresholds (1st: \(60\%\) of the max value, 2nd: the mean value) that have been used in EEG research. Finally, Brain Connectivity Toolbox (BCT) functions were applied to the binary connectivity estimations, as graphically summarized in Figure 2. Statistical one-way ANOVA tests and Long Short-Term Memory Networks (LSTMNs) were used to compare two states at a time, listed as follows: F-A: fear vs anger, R-F: resting-state vs fear, R-A: resting-state vs anger, H-S: happiness vs sadness, R-H: resting-state vs happiness, R-S: resting-state vs sadness.
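The two binarization rules named above can be illustrated with a minimal sketch. It is a plain restatement of the rules and not a port of any BCT function; ignoring the diagonal, using a greater-or-equal comparison, and the 3-channel connectivity values are all assumptions made for the example.

```python
def binarize(conn, rule):
    """Binarize a connectivity matrix with one of the two thresholds.

    rule = "max60": threshold at 60% of the largest off-diagonal value (1st threshold)
    rule = "mean":  threshold at the mean off-diagonal value (2nd threshold)
    Assumptions: self-connections are ignored and entries equal to the
    threshold are kept.
    """
    n = len(conn)
    off_diag = [conn[i][j] for i in range(n) for j in range(n) if i != j]
    if rule == "max60":
        thr = 0.6 * max(off_diag)
    elif rule == "mean":
        thr = sum(off_diag) / len(off_diag)
    else:
        raise ValueError("rule must be 'max60' or 'mean'")
    return [
        [1 if i != j and conn[i][j] >= thr else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical symmetric connectivity estimates between three channels.
conn = [
    [0.0, 1.0, 0.58],
    [1.0, 0.0, 0.10],
    [0.58, 0.10, 0.0],
]

adj_first = binarize(conn, "max60")   # threshold 0.60: only the 1.0 link survives
adj_second = binarize(conn, "mean")   # threshold 0.56: the 0.58 link is kept as well
```

In this example the 1st threshold is the stricter of the two, but that ordering depends on how the connectivity values are distributed in each matrix.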

The computational and algorithmic steps of the study are as follows: (1) EEG series were divided into longer segments of 6 sec in order to compare the performances of four functional dependency approaches (PC, SC, PDC, DTF) for estimation of graph theoretic brain network indices in full-band intervals with respect to the 1st and 2nd thresholds. Both PC and PDC provided meaningful (\(p<0.05\)) and significant (\(p\ll 0.05\)) statistical differences between groups in accordance with the 1st threshold. The statistical p-values are listed in Table 1. (2) Regarding the longer segments of 6 sec, PC and PDC were applied to five frequency band intervals (delta, theta, alpha, beta, gamma), and their performances in discriminating the groups from each other were compared by means of graph theoretic brain network indices with respect to the 1st threshold. The corresponding p-values are listed in Table 2. (3) EEG series were divided into shorter segments of 2 sec in order to compare the performances of PC and PDC for estimation of graph theoretic brain network indices (CC, GE, LE, Q, r, T) in full-band intervals with respect to the 1st and 2nd thresholds. PDC provided significant (\(p\ll 0.05\)) statistical differences between more emotional groups and with respect to more network indices than the results obtained for the longer segments. In particular, relatively better results were observed when the 1st threshold was considered in the estimations. The corresponding p-values are listed in Table 3. (4) The 3rd step was repeated for the frequency band intervals in accordance with the 1st threshold. The corresponding p-values are listed in Table 4. (5) The groups were classified by using a well-structured deep learning model, LSTMN, with respect to graph feature sets estimated by using PC and PDC on shorter segments of 2 sec in accordance with the 1st threshold. The corresponding classification accuracy results are listed in Table 5. (6) Statistical box-plots of the six network indices, estimated by using PC and PDC with the 1st threshold in five distinct frequency band intervals, are shown in the figures.
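The statistical comparison used in steps (1) to (4) rests on the one-way ANOVA F-statistic, which can be sketched as below. This is a generic textbook computation with hypothetical CC values for two states; it stops at the F-statistic and its degrees of freedom, because converting F to a p-value needs the F distribution, which the standard library does not provide.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the samples around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical CC estimates for two states (illustrative values only).
cc_fear = [0.40, 0.50, 0.60]
cc_anger = [0.50, 0.60, 0.70]

f_stat, df_b, df_w = one_way_anova_f([cc_fear, cc_anger])  # F = 1.5 with (1, 4) df
```

A larger F means that the between-state variation of an index dominates its within-state variation, which is what the small p-values in the tables express.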

In deep learning classifications, frequency band specific emotional network measures were labeled by discrete emotional states regardless of the subjects for subject-independent classification. In other words, the instants rather than the subjects were classified with respect to discrete emotional states. The groups, i.e. discrete emotions, were classified with respect to different feature sets from FS-2 to FS-7, which include the six network indices estimated from the EEG full-band and from the specified band intervals of Delta, Theta, Alpha, Beta and Gamma, respectively, where the 1st threshold was applied to connectivity estimations across shorter EEG segments of 2 sec. These sets were all combined in the largest feature set, FS-1. The dimension of the features was commonly 6 due to the number of measures (CC, GE, LE, Q, r, T) in the feature sets. The number of features varied across emotional states due to the different numbers of artifact-free trials of 12 sec in the experiments, as listed in Table 6. In detail, the total numbers of trials were 50, 34, 25, 55 and 60 in anger, fear, happiness, sadness and resting-states, respectively. So, the number of features was equal to the number of trials in the frequency-band specific sets from FS-2 to FS-7, while their summation is valid in FS-1 in accordance with the longer segments. In the case of shorter segments, the number of features increased to six times these values.
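The six-fold increase stated above is consistent with cutting each 12 sec trial into 2 sec pieces. The short sketch below works through that arithmetic; it assumes non-overlapping segments with none rejected, so the resulting counts are illustrative and Table 6 remains the authoritative source.

```python
TRIAL_SEC = 12          # trial length stated in the text
SHORT_SEGMENT_SEC = 2   # shorter segment length stated in the text

# Artifact-free trial counts per state, as given in the text.
trials = {"anger": 50, "fear": 34, "happiness": 25, "sadness": 55, "resting": 60}

# Assumption: non-overlapping segmentation, every segment retained.
segments_per_trial = TRIAL_SEC // SHORT_SEGMENT_SEC  # 6

short_counts = {state: n * segments_per_trial for state, n in trials.items()}

# Instants available to each two-state comparison under the same assumption.
comparisons = {
    "F-A": ("fear", "anger"),
    "R-F": ("resting", "fear"),
    "R-A": ("resting", "anger"),
    "H-S": ("happiness", "sadness"),
    "R-H": ("resting", "happiness"),
    "R-S": ("resting", "sadness"),
}
pair_counts = {
    label: short_counts[a] + short_counts[b] for label, (a, b) in comparisons.items()
}
```

Under these assumptions the F-A comparison would draw on 504 instants and the H-S comparison on 480, with an unequal number of instants per class in every pair.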

In defining LSTM network architecture, MATLAB functions provided for sequence classification were implemented. The input size of sequences (feature arrays) was assigned as the number of frequency specific (full-band, Delta, Theta, Alpha, Beta and Gamma) network indices. A bidirectional LSTM layer was specified with 30 hidden units. Finally, two classes were specified by a fully connected layer of size 2, followed by a softmax layer and a classification layer as explained in reference Kudo (1999). In MATLAB, the core components of an LSTM network are a sequence input layer and an LSTM layer that can learn long-term dependencies between instants sorted in the feature array.

The groups were classified in a two-class manner by using LSTMNs driven by subject-independent instants. 5-fold cross-validation was used to obtain averaged classification performance results. In detail, the instants, i.e. the features, were divided into 5 equal-sized sub-sets (n/5), and then one sub-set was tested after training on the remaining features (4n/5), where n denotes the total number of features in every classification step. LSTMNs were implemented by using the Deep Learning Toolbox in MATLAB R2019a. The Adaptive Moment Estimation method (Adam) and the categorical cross-entropy loss function were used to optimize the parameters with the following settings: learning rate drop period = 14, gradient threshold = 0.5, hidden layer size = 30, dimension of features = 6, verbose = 1.0, and a ‘piecewise’ learning rate schedule.
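The 5-fold partition described above can be sketched as follows. Only the index bookkeeping is shown; shuffling the instants before splitting, the fixed seed and the function name are assumptions, and the LSTM training itself is left out.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumption: instants are shuffled once before being dealt into folds.
    Fold sizes differ by at most one when n is not divisible by k.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, test


# Example with n = 20 instants: each fold tests 4 and trains on the other 16.
splits = list(k_fold_indices(20, k=5, seed=0))
fold_sizes = [(len(train), len(test)) for train, test in splits]
```

Averaging the test accuracy over the five splits gives one CA value per feature set and comparison.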

Results

The main motivation of the present study is to provide reliable connectivity measures for detecting fear and anger despite their placement in the same arousal-valence quarter of the circumplex emotion model. Thus, statistical differences between emotional states were computed with respect to all considered parameters, namely network index, EEG frequency band interval, hemispheric correlation/dependency method and binarization threshold. The corresponding statistical test results are summarized in Table 1, in accordance with the connectivity approaches, segmentation and two different thresholds considered in binarization. In tables, \(*\) denotes \(p\ll 0.05\).

Both PC and PDC provided more useful statistical findings, indicated by meaningful differences \((p\le 0.05)\) in three comparisons (\(F-A\), \(R-F\), \(R-A\)) in accordance with the 1st threshold, where full-band frequency intervals of longer EEG segments were analysed. In particular, clear functional differences between Fear and Anger were captured by both PC and PDC in terms of CC and LE, while neither SC nor DTF provided clear differences between these two states. Therefore, both PC and PDC have also been applied to the well-known EEG sub-bands as listed in Table 2.

Table 1 One-way ANOVA test results with respect to full-band specific network measures according to methods and thresholds in longer segments (6 sec) (* denotes \(p\ll 0.05\))

In a more detailed analysis, the performances of PC and PDC were compared with respect to emotional states in EEG sub-bands (Table 2). PDC provided clear statistical differences between Fear and Anger in the Alpha and Beta frequency band intervals in accordance with the 1st threshold in longer EEG segments, while no difference was shown between Happiness and Sadness. Except for GE, each network index provided meaningful statistical differences between resting and emotional states in Theta-band specific estimations. The modularity index, Q, provided statistical differences between resting and emotional states in Gamma sub-band specific estimations through PC with the 1st threshold. Meaningful statistical differences were obtained in Delta-band specific estimations through PC with the 1st threshold in all two-state comparisons, such that CC and LE were capable of discriminating emotional states from the resting-state, while Q and T were capable of discriminating two emotional states from each other. In summary, sub-band specific estimations did not provide superior results in comparison to full-band specific estimations on longer EEG segments (listed in Table 1).

Table 2 One-way ANOVA test results with respect to frequency-band specific network measures according to PC and PDC with the 1st threshold in longer segments (6 sec) (* denotes \(p\ll 0.05\))

Regarding Table 3, both PC and PDC provided statistical differences between resting and emotional states with respect to each brain network index estimated from full-band shorter EEG segments in accordance with the 1st threshold. With PDC, statistical differences between two different emotional states were also observed in three indices (LE, Q, T) estimated from full-band shorter EEG segments in accordance with the 1st threshold. However, the methods did not produce a statistical difference between resting state and Fear in full-band shorter EEG segments.

Table 3 One-way ANOVA test results with respect to full-band specific network measures according to thresholds in shorter segments (2 sec) (* denotes \(p\ll 0.05\))

Regarding Table 4, meaningful statistical differences were obtained in Delta, Alpha and Gamma sub-band specific estimations through PDC with the 1st threshold on shorter EEG segments in all two-state comparisons with respect to all brain network indices.

Table 4 One-way ANOVA test results with respect to frequency band specific network measures according to the 1st threshold in shorter segments (2 sec) (* denotes \(p\ll 0.05\))

Since the most useful results were obtained in the analysis of shorter EEG segments by using PC and PDC with the 1st threshold, all two-state comparisons were also performed by using the deep learning model, LSTM, driven by the seven graph theoretic feature sets described above. The corresponding results, listed in Table 5, were compatible with the statistical p-values listed in Table 4. In both training and testing steps, the most successful classification performances were obtained by combining all network indices estimated from both full-band and sub-band frequency intervals of shorter EEG segments through PDC with the 1st threshold, where a CA of about \(80\%\) or higher was considered successful. In particular, FS-1 provided the highest performance for classification of the secondary uncomfortable emotions, Fear and Anger, which are characterized by physiological reactions called the ‘fight or flight response’, while FS-7 provided the highest performance for classification of the opposite basic emotional states, Happiness and Sadness.

Table 5 Classification performances (CA, %) driven by feature sets (FS) according to the 1st threshold in shorter segments (2 sec)

Figure 3 shows that the statistical distributions of Gamma-band specific modularity and transitivity estimations were clearly sensitive to emotional states. In particular, the highest modularity estimations were observed in the resting-state, while the lowest modularity levels were observed in Anger. Fear produced higher modularity in comparison to the other discrete emotions. The lowest levels in CC, LE, assortativity and transitivity estimations were observed in Happiness. There were many outliers in each state in GE estimations. The statistical distribution intervals of the states were clearly distinct in both modularity and transitivity estimations.

Fig. 3 Statistical box-plots of brain network indices estimated from Gamma band intervals of shorter EEG segments (2 sec) by using PDC with the 1st threshold

Discussion

Due to the functional integration of different cortical areas during emotional perception of musical sounds, researchers have studied brain connectivity approaches (Flores-Gutiérrez and Díaz 2007b; Karmonik and Brandt 2013) in order to understand the neural mechanism of musical perception at system level. In addition to determining the most useful domain and method for estimating brain connectivity, another aspect is the sensitivity of the frequency sub-bands superimposed in EEG mediated by musical sounds. In the present study, statistical cross-correlation has been estimated to measure statistical, linear and undirected time-domain correlations between EEG segments in accordance with both PC and SC. In fact, both PC and SC are statistical measures that show the degree to which temporal variations in an EEG segment are related to simultaneous variations in another EEG segment. In other words, statistical correlation coefficients express the level to which short-duration neuro-electrical activities are linearly dependent on each other. In comparison to these statistical similarity measures, we have also examined two multivariate estimators, DTF and PDC, based on the Granger causality principle for estimating effective connectivity, i.e. directed functional connectivity between EEG segments modeled by MVAAM coefficients.

In the present study, secondly, the performances of functional and effective connectivity methods are compared with respect to statistical logistic regression modeling to obtain robust features, i.e. frequency specific graph theoretic brain network measures sensitive to both uncomfortable and basic emotions triggered by short-duration acoustic sounds. In addition to these methods, a deep learning model, LSTMN, is also used to investigate the impact of the frequency interval in estimating network indices according to the threshold design in transforming adjacency matrices for classification of the states (listed in Sect. 2.6).

Among the four connectivity approaches, PDC provided the best results for frequency band specific network measures estimated from short segments of 2 sec with respect to the first threshold (\(60\%\) of the max value in the connectivity matrix) for classification of contrasting emotional states, where Fear and Anger can be classified with useful CA between \(82.25\%\) and \(88.48\%\) in Theta, Alpha and Beta sub-bands. When the features extracted from each EEG sub-band were combined in an extended feature set, the highest CA of \(91.79\%\) was obtained for classification of the two uncomfortable emotions, Fear and Anger. PDC also provides the best results for classification of Happiness and Sadness in the faster EEG rhythms, i.e. the Alpha, Beta and Gamma sub-bands, while better performance results are observed in Theta, Alpha and Beta sub-bands for classification of Anger and Fear. Regarding the emotion model shown in Fig. 1, Happiness and Sadness are placed in contrasting quarters of the arousal/valence dimensions, while Fear and Anger are both placed in the same quarter. Meanwhile, the highest CA of \(96.67\%\) in overall experiments was obtained for classification of the two contrasting emotions, Happiness and Sadness.

Overall results reveal that acoustic sounds can induce emotional states in a very short time interval (2 sec) owing to fast neurotransmitter activities driven by both tempo and rhythmic units in sounds. In the brain, emotional circuits are formed as a tree-like structure including roots and lines in subcortical areas, and interactive branches in cortical regions. Perception of acoustic sounds and cognition of emotional states access the higher reaches of emotional circuits through the auditory cortex into the amygdala, and through frontal and parietal inputs into the limbic system including the basal ganglia, nucleus accumbens and cingulate cortex. Thus, overall results reveal that graph theoretical network measures can quantify the differences in neurotransmitter driven cortical activities between discrete emotions.

Conclusion and future work

From a methodological point of view, the most meaningful results are obtained by using PDC, while PC can also provide useful results in discriminating emotional states induced by fast-paced and emotionally stimulating acoustic sounds. Unlike DTF, PDC is normalized to show the ratio of the outflow from channel j to channel i to all the outflows from the source channel j, so it emphasizes the sinks rather than the sources. Therefore, PDC is found to be superior to DTF in the present study. In detail, PC measures the statistical association between two EEG segments based on covariance, showing the degree to which the amplitude change in one EEG segment is related to the amplitude change in another, while PDC measures the normalized relative coupling strength of frequency domain interactions from the source of an EEG segment to the others across the cortex. Besides, a common technical property of PC and PDC is that they provide normalized results that lie between 0 and 1. In depth, PDC can quantify immediate directional coupling between neural populations, whereas DTF describes the existence of directional signal propagation even if neural information travels through intermediate pathways rather than through an immediate direct causal influence path. From a computational point of view, DTF estimates the causal influence of one EEG recording channel on another at a particular frequency, while PDC provides only the direct flows between them. Regarding DTF, the resulting connectivity levels lie between 0 and 1, giving the ratio of the inflow from one channel to a particular channel to all the inflows to that channel. In contrast, PDC estimations give the ratio of the outflow from a source channel to a particular channel to all the outflows from the source channel. Therefore, PDC provided better results for emotion recognition from EEG recordings based on GT, while DTF was found to be useful for detection of marginal variations such as local seizures (Franaszczuk and Bergey 1994) and sleep stages (Kaminski and Blinowska 1997) in the past.
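The difference between the two normalizations can be made concrete at a single frequency. The sketch below uses the commonly cited definitions, in which PDC normalizes the frequency-domain coefficient matrix by columns (outflows of the source) and DTF normalizes its inverse, the transfer matrix, by rows (inflows to the target). The 3-channel matrix describing a chain 0 → 1 → 2 is hypothetical, all function names are illustrative, and model fitting is not covered.

```python
import math


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [
        [complex(v) for v in row] + [complex(i == j) for j in range(n)]
        for i, row in enumerate(m)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pdc(a_bar):
    """PDC[i][j]: |A_ij| divided by the norm of column j (outflows from source j)."""
    n = len(a_bar)
    col_norm = [math.sqrt(sum(abs(a_bar[k][j]) ** 2 for k in range(n))) for j in range(n)]
    return [[abs(a_bar[i][j]) / col_norm[j] for j in range(n)] for i in range(n)]


def dtf(h):
    """DTF[i][j]: |H_ij| divided by the norm of row i (inflows to target i)."""
    n = len(h)
    row_norm = [math.sqrt(sum(abs(h[i][k]) ** 2 for k in range(n))) for i in range(n)]
    return [[abs(h[i][j]) / row_norm[i] for j in range(n)] for i in range(n)]


# Hypothetical frequency-domain coefficient matrix at one frequency.
# Entry [i][j] couples source j to target i: channel 0 drives 1, and 1 drives 2.
A_BAR = [
    [1.0, 0.0, 0.0],
    [-0.5, 1.0, 0.0],
    [0.0, -0.5, 1.0],
]

P = pdc(A_BAR)
D = dtf(invert(A_BAR))
# P[2][0] is zero (no direct 0 -> 2 link); D[2][0] is positive (indirect path via 1).
```

In this toy chain the indirect influence of channel 0 on channel 2 appears in DTF but not in PDC, which matches the distinction between direct flows and propagated influence drawn in the paragraph above.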

Several neuropsychiatric disorders cause both functional and structural network parameters of segregation and integration to change (Mu 2018). Dysfunctional brain connectivity can be modeled by a loss of the small-world organization of the brain, if the healthy connectome is correlated with balanced neural communication across the cortex. In EEG recordings, graph theoretic functional brain parameters originate from the superposition of excitatory and inhibitory post-synaptic potentials that are caused by neurotransmitter release. The healthy brain integrates a wide range of incoming external stimuli by binding the spontaneous complex stream of information into cortical regions assumed to be subsystems of a complex network (Cohen 2016). So, the dynamic reconfiguration underlying emotional perception relies not only on independent processing of stimuli in particular cortices (segregation) but also on global cooperation between these regions (integration). Several studies show that functional brain capacity can be measured by both segregation and integration measures, such that higher segregation is mostly linked to simple motor tasks, while higher integration is mostly linked to cognitive loads (Fransson 2018; Fong 2019; Finc 2020). However, it remains a great challenge to understand how the brain’s emotional configurations are supported by segregation and integration measures. In experimental cognitive neuroscience, several studies showed that the ratio between excitatory and inhibitory neural activities remains balanced at pyramidal neurons over both time and space (Haider 2006; Okun 2008). A more recent study reveals that emotional brain functions are sustained through functional connectivity supported by a balanced ratio of excitation to inhibition originating from neurotransmitter release in response to audio-visual and affective stimuli (Kılıç 2022). In the current study, we employed six network measures to quantify both segregation and integration in response to music clips over shorter time lengths.

Many neurotransmitters serve neural communication between cortical nerve cells in order to regulate mood and emotional states. Among them, dopamine, serotonin, endorphins, and oxytocin mediate well-being and happiness. Besides, low levels of norepinephrine, serotonin, and dopamine are associated with negative mood and unpleasant states. Apart from the arousal-valence scores of the compared emotional states (Fear vs Anger, Happiness vs Sadness), characteristic variations in post-synaptic potentials triggered by particular neurotransmitters might be more distinct between Happiness and Sadness than between Fear and Anger. Thus, these neurobiological facts might be the main factor leading to the high accuracy of 96% in classifying Happiness and Sadness in the Gamma sub-band, while the unpleasant feelings, Fear and Anger, can be distinguished with lower CA.

According to computational and behavioral neuroscience, affective perception is usually accompanied by changes in high-frequency EEG series, i.e. gamma sub-bands (Boucher 2014; Aydın 2018; Yang 2020). Our results also indicated that gamma-band specific brain network measures were closely related to discrete emotions. Emotion can be considered as a high-level cognitive function that requires the re-configuration of multiple brain regions in response to external stimuli. So, the relationships and information interactions among the cortices have been detected in high frequency components of EEG series mediated by affective music clips in the present study. Further, anger is found to be characterized by high transitivity and low modularity in Gamma-band activities.

Stable neural patterns across individuals have been shown between negative and positive emotions induced by different video clips of 1 min (Zheng and Zhu 2017; Li and Liu 2019). However, Fear and Anger cannot be considered as distinct emotional states under the emotional labeling principle of four quadrants assigned with low arousal/low valence, high arousal/low valence, low arousal/high valence, and high arousal/high valence (Zheng and Zhu 2017; Li and Liu 2019). We have used the discrete emotional model to investigate the neural dynamics underlying both Fear and Anger triggered by musical sounds of 12 sec. Besides, recent studies have also shown the usefulness of Granger causality based connectivity estimations to classify video clips into pleasant, neutral and unpleasant quadrants of the arousal/valence dimensions (Li and Zheng 2018; Chen and Miao 2021). In more detail, EEG based brain connectivity analysis has also been successfully used to quantify the interactions between the limbic system and the motor cortex during emotional expressions induced by video clips (Li and Li 2020). In conclusion, music clips can induce basic and discrete emotional states in a very short time period such as 2 sec, depending on both the rhythmicity and tonality of the excerpt, in comparison to the presentation of longer duration video clips mapped on arousal/valence quadrants. Functional connectivity estimations can provide insight into the hierarchical neurodynamics at both modular and system levels in EEG frequency sub-bands.

Regarding GT based global connectivity estimations, an important parameter is the threshold, which influences the resulting measures. In the analysis of fMRI data, this issue has been found to be the leading factor for investigation of brain networks, where the optimum threshold is determined empirically between 0.2 and 0.3 (Bordier et al. 2017). Several binarization methods have been used in combination with phase domain synchronization approaches for detection of disorders encoded by clinical resting-state EEG recordings (Sun and Li 2019b; Tsai and Wang 2022), while two popular thresholds, the mean and \(60\%\) of the max value of the individual connectivity matrix, have been used for recognition of discrete emotional states (Kılıç 2022) and cognitive emotion regulation strategies (Aydın 2022). The weak, noisy, and insignificant edges/connections across the cortex are eliminated by setting a threshold in obtaining a binary version of the connectivity matrix, while the most important connections remain. Thus, network connectivity measures are estimated from the binary adjacency matrix. Since EEG recordings are nonlinear, random, and probabilistic time series, adaptive thresholds, namely 60% of the max value and the mean value of the individual connectivity matrix computed for each short segment, were used for classification of discrete emotional states induced by musical sounds in the present study.