1 Introduction

Emotional interaction between humans and machines is one of the main challenges in advanced human-machine interaction. A key requisite in this field is the development of reliable emotion recognition systems, which in turn requires the correct identification of emotions.

Some researchers support the notion of biphasic emotion, which states that emotion fundamentally stems from varying activation in centrally organized appetitive and defensive motivational systems that have evolved to mediate the wide range of adaptive behaviors necessary for an organism struggling to survive in the physical world [1, 2]. Within this framework, neuroscientists have made great efforts to determine how the relationship between stimulus input and behavioral output is mediated through specific, highly organized neural circuits [3].

The majority of studies in this area are based on techniques such as Positron Emission Tomography (PET) [4] or functional Magnetic Resonance Imaging (fMRI) [5], which offer excellent spatial resolution but very limited temporal resolution (on the order of seconds). An alternative that offers excellent temporal resolution (in the range of milliseconds) is Electroencephalography (EEG).

In this study we investigated the temporal dynamics of the neural activity associated with emotions (like/dislike) elicited by viewing complex pictures from the International Affective Picture System (IAPS) [6] while the subjects listened to pleasant or unpleasant music. We used EEG to overcome the limitation in temporal resolution. We evaluated the correspondence between the subjective emotional experiences induced by the pictures and the role of the music in the resulting brain activity. We then estimated the neural sources in which the event-related potentials (ERPs) were generated, and the three-dimensional location of these sources was used to assess changes in the activation of cortical networks involved in emotion processing.

Our results offer valuable information for a better understanding of the temporal dynamics of emotions elicited by visual and auditory stimuli and could be useful for the development of effective and reliable neural interfaces.

2 Methods

Participants

Thirteen subjects participated in this study (mean age: 19.8 years; range: 19–38; seven men, six women). All of them were right-handed, with a laterality quotient of at least +0.4 (mean: 0.7; SD: 0.2) on the Edinburgh Inventory [7].

None of the participants had a personal history of psychiatric or neurological disorders, alcohol or drug abuse, or current medication, and all had normal or corrected-to-normal vision and hearing. All were comprehensively informed about the details and purpose of the study and gave their written consent to participate.

Visual and Auditory Stimuli

A set of standardized visual stimuli (80 pictures in total) was selected from the IAPS dataset [6]. These stimuli were validated in a previous study [8].

The images were divided into four groups of 20 images each. Stimuli were presented in color, with equal contrast and luminance.

The pleasant music consisted of two excerpts of joyful instrumental dance tunes (A. Dvořák, Slavonic Dance No. 8 in G minor, Op. 46; J.S. Bach, Réjouissance, BWV 1069) together with other fragments of music used previously in similar studies [9].

The unpleasant music was created by electronically manipulating the pleasant stimuli (processed with Cool Edit Pro software): for each pleasant stimulus, a new sound file was created in which the original (pleasant) excerpt was recorded simultaneously with two pitch-shifted versions of the same excerpt, one a semitone above and one a tritone below the original pitch. Both the pleasant and the unpleasant versions of an excerpt, original and electronically manipulated, had the same dynamic outline, identical rhythmic structure, and identical melodic contour, ruling out the possibility that mere bottom-up processing of these stimulus dimensions already accounts for differences in brain activation when contrasting pleasant and unpleasant stimuli.
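
As an illustration, this manipulation can be sketched in a few lines of Python, assuming the librosa and soundfile libraries (the study itself used Cool Edit Pro; file names and the exact shift amounts below are placeholders):

# Illustrative reimplementation of the unpleasant-stimulus construction:
# the original excerpt is superimposed on two pitch-shifted copies of itself.
# The study used Cool Edit Pro; libraries, file names and shift values are assumptions.
import numpy as np
import librosa
import soundfile as sf

y, sr = librosa.load("pleasant_excerpt.wav", sr=None)      # hypothetical file name

# Two pitch-shifted copies (n_steps in semitones; values are illustrative)
up = librosa.effects.pitch_shift(y, sr=sr, n_steps=1)
down = librosa.effects.pitch_shift(y, sr=sr, n_steps=-6)

# Mix the three versions and normalize to avoid clipping
mix = y + up + down
mix = mix / np.max(np.abs(mix))

sf.write("unpleasant_excerpt.wav", mix, sr)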

Subjects were instructed to give each stimulus a score from 1 to 9, avoiding the neutral value 5, according to their subjective taste (1: dislike; 9: like). Their verbal responses were written down.

Procedure

Figure 1 summarizes the sequence of the experiment. Each image was presented for 500 ms and was followed by a black screen lasting 3500 ms. The music started five seconds before the first image and finished five seconds after the last one. The images appeared in random order and only once. The participants' task was to observe the images and rate the arousal and valence of their emotional experience. Picture scores ranged from 9 (very pleasant) to 1 (very unpleasant).

Fig. 1. Experimental scheme. The sequence of stimuli was presented in a random and continuous mode using Python software.
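
As an illustration of this procedure, a minimal sketch of the presentation loop, assuming the PsychoPy library (the paper only states that Python software was used; file names and the single-block structure are hypothetical simplifications), could be:

# Illustrative sketch of the stimulus sequence in Fig. 1 (assumed PsychoPy code;
# the actual presentation software is not described beyond "Python").
import random
from psychopy import core, sound, visual

win = visual.Window(fullscr=True, color="black")
pictures = ["iaps_001.jpg", "iaps_002.jpg"]     # hypothetical IAPS file names (80 in the study)
music = sound.Sound("excerpt.wav")              # hypothetical music excerpt

random.shuffle(pictures)                        # images appear randomly and only once

music.play()
core.wait(5.0)                                  # music starts 5 s before the first image
for fname in pictures:
    visual.ImageStim(win, image=fname).draw()
    win.flip()
    core.wait(0.5)                              # image shown for 500 ms
    win.flip()                                  # back to black screen
    core.wait(3.5)                              # blank interval of 3500 ms
core.wait(5.0)                                  # music ends 5 s after the last image
music.stop()
win.close()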

Data Acquisition

The participants were seated in a comfortable position and asked to move as little as possible. Following the preparation phase, participants were instructed about the task. The pictures were presented on a 21.5-inch computer screen to the subject in darkness.

Subjects were instructed to avoid blinking during image exposure and to keep their gaze on the center of the monitor. EEG data were continuously recorded by means of cap-mounted Ag-AgCl electrodes and a NeuroScan SynAmps EEG amplifier (Compumedics, Charlotte, NC, USA) from 64 locations according to the international 10/20 system (FP1, FPZ, FP2, AF3, GND, AF4, F7, F5, F3, F1, FZ, F2, F4, F6, F8, FT7, FC5, FC3, FC1, FCZ, FC2, FC4, FC6, FT8, T7, C5, C3, C1, CZ, C2, C4, C6, T8, REF, TP7, CP5, CP3, CP1, CPZ, CP2, CP4, CP6, TP8, P7, P5, P3, P1, PZ, P2, P4, P6, P8, PO7, PO5, PO3, POZ, PO4, PO6, PO8, CB1, O1, OZ, O2, CB2) [10]. The impedance of the recording electrodes was checked for each subject prior to data collection and kept below 25 kΩ, as recommended [11]. All recordings were performed at a sampling rate of 1000 Hz. Data were re-referenced to a Common Average Reference (CAR) and EEG signals were filtered with a 0.5 Hz high-pass and a 45 Hz low-pass filter. Electrical artifacts due to gesticulation and eye blinking were corrected using Principal Component Analysis (PCA) [12]. Artifacts were identified as signal levels above 75 µV in the five frontal electrodes (FP1, FPZ, FP2, AF3 and AF4), which were chosen because they are the most affected by unconscious movements. The time interval for artifact detection was (−200 ms, +500 ms) from stimulus onset.
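
For illustration, the recording and preprocessing chain described above could be summarized as follows with the open-source MNE-Python library (an assumption; the original analysis relied on NeuroScan/Curry tools, and artifacts were corrected with PCA rather than rejected):

# Illustrative preprocessing pipeline (assumed MNE-Python; file name hypothetical).
import mne

raw = mne.io.read_raw_cnt("subject01.cnt", preload=True)   # NeuroScan .cnt recording

raw.filter(l_freq=0.5, h_freq=45.0)          # 0.5 Hz high-pass, 45 Hz low-pass
raw.set_eeg_reference("average")             # Common Average Reference (CAR)

events, event_id = mne.events_from_annotations(raw)
epochs = mne.Epochs(
    raw, events, event_id,
    tmin=-0.2, tmax=0.5,                     # -200 ms to +500 ms around stimulus onset
    baseline=(-0.2, 0.0),
    reject=dict(eeg=75e-6),                  # 75 µV threshold (here: rejection, not PCA correction)
    preload=True,
)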

The images were separated according to their valence (positive or negative) and the accompanying music (positive or negative).

Statistical Analyses

We studied topographic changes in EEG activity [13,14,15,16] with the help of Curry 7 (Compumedics, Charlotte, NC, USA). We considered the whole time course and the full pattern of activation across the scalp by testing the total field power from all electrodes (see [17] for additional details), since this method is able to detect not only differences in amplitude but also differences in the underlying sources of activity.
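
As a worked illustration, the field power at each time point can be computed as the spatial standard deviation of the potentials across all electrodes (global field power); a minimal NumPy sketch with hypothetical data is:

# Global field power (GFP): spatial standard deviation across electrodes per time sample.
import numpy as np

def global_field_power(erp):
    """erp: (n_channels, n_times) array of CAR-referenced potentials."""
    return np.std(erp, axis=0)                # one value per time point

erp = np.random.randn(64, 700)                # placeholder for a 64-channel, 700 ms ERP at 1000 Hz
gfp = global_field_power(erp)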

Topographical differences in EEG activity between different images were tested using a non-parametric randomization test (Topographic ANOVA, TANOVA) with a significance level of 0.01, as described elsewhere [8, 18].
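
A simplified sketch of such a randomization test on the scalp topographies (a TANOVA-like permutation of condition labels at a single latency; array names are hypothetical) could look like this:

# Simplified TANOVA-style randomization test at one time point.
# maps_a, maps_b: (n_trials, n_channels) scalp maps for the two conditions.
import numpy as np

def dissimilarity(map1, map2):
    # Global map dissimilarity: GFP of the difference of the GFP-normalized maps
    return np.std(map1 / np.std(map1) - map2 / np.std(map2))

def tanova_p(maps_a, maps_b, n_perm=5000, seed=0):
    rng = np.random.default_rng(seed)
    observed = dissimilarity(maps_a.mean(axis=0), maps_b.mean(axis=0))
    pooled = np.vstack([maps_a, maps_b])
    n_a = len(maps_a)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        d = dissimilarity(pooled[perm[:n_a]].mean(axis=0),
                          pooled[perm[n_a:]].mean(axis=0))
        count += d >= observed
    return count / n_perm                     # p-value: fraction of permutations >= observed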

For the significant time windows we performed standardized low-resolution brain electromagnetic tomography (sLORETA) calculations [19]. This is a distributed source-imaging technique for brain source localization that provides smooth solutions and better localization of deep sources, with fewer localization errors, albeit with low spatial resolution.
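
For illustration, an sLORETA source estimate can also be obtained with MNE-Python (an assumption; the study used Curry/LORETA software). The fragment below presumes that an averaged ERP (evoked), a forward model (fwd) and a noise covariance matrix (cov) have already been prepared:

# Illustrative sLORETA inverse solution (assumed MNE-Python; evoked, fwd and cov
# are hypothetical objects prepared beforehand, e.g. from the epochs above).
from mne.minimum_norm import make_inverse_operator, apply_inverse

inv = make_inverse_operator(evoked.info, fwd, cov, loose=0.2, depth=0.8)
stc = apply_inverse(evoked, inv, lambda2=1.0 / 9.0, method="sLORETA")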

3 Results

Subjective Scores

The participants correctly identified the pleasant songs heard in each of the blocks; however, they did not give very low scores to the unpleasant music (see Fig. 2). In fact, none was scored below five.

Fig. 2. Average music scores for all volunteers. Music 1 and 2 correspond to the pleasant excerpts, while Music 3 and 4 correspond to the unpleasant ones.

EEG

The upper panel of Fig. 3 shows the main significant differences between positive images (score 9 or 8) presented while the participants listened to pleasant music and images with negative valence (score 1 or 2) presented simultaneously with negative music. We found a large significant time window between 448 and 632 ms (sig < 0.05). When the significance criterion was lowered to 0.01, the time window was reduced to 501–553 ms.

When the subjects were looking at positive images (score 9 or 8) while listening to negative music, or at negative images (score 1 or 2) while listening to positive music, there was also a large significant time window between 553 and 692 ms (sig < 0.05), see Fig. 3. When the significance level was lowered to 0.01, the time window was reduced to 592–618 ms.

Fig. 3. Significant differences in EEG activity for each case. The vertical rectangle marks the interval with significant differences (sig < 0.05).

sLORETA

Figure 4 shows the main results when all possible source locations were considered simultaneously by applying standardized LORETA (sLORETA). We found a left lateralization when both the visual and the auditory stimulus had positive valence, whereas there was a clear right lateralization when both visual and auditory stimuli were negative (Fig. 4). However, when positive images were mixed with negative sounds, or vice versa, there was no clear laterality.

Fig. 4. sLORETA activation maps corresponding to the significant time window (sig < 0.01).

4 Discussion and Conclusion

Our results showed increased activity in the left hemisphere for emotions with positive valence, whereas there was increased activity in the right hemisphere for emotions with negative valence. These results support our previous studies [8] and suggest that the visual emotional valence is reinforced when it coincides with the valence of the music. Furthermore, these results agree with the valence hypothesis, which postulates a preferential engagement of the left hemisphere for positive emotions and of the right hemisphere for negative emotions [20, 21].

In addition, we found a delay of a few milliseconds in whole-brain processing when images and music had different valences. Thus, when the two stimuli are not concordant, emotional processing takes more time.

Although more studies are still needed, our results demonstrate the feasibility and usefulness of presenting visual and auditory information simultaneously to explore the temporal dynamics of human emotions. This approach could help to set the basis for future studies of music perception and emotion, and could be useful for better understanding the role of specific brain regions and their relation to specific emotional or cognitive responses.