Abstract
Emotion-based analysis has attracted considerable interest, particularly in areas such as forensics, medicine, music, psychology, and human-machine interfaces. Following this trend, facial analysis (either automatic or human-based) is the most commonly investigated approach, since this type of data can be collected easily and is well accepted in the literature as a metric for inferring emotional states. Despite this popularity, several constraints found in real-world scenarios (e.g. lighting, complex backgrounds, facial hair, and so on) make accurately obtaining affective information from the face a very challenging task. This work presents a framework for analysing emotional experiences through spontaneous facial expressions. The method consists of a new four-dimensional model, called FAMOS, which describes emotional experiences in terms of appraisal, facial expressions, mood, and subjective experiences, using a semi-automatic facial expression analyser as ground truth for describing the facial actions. In addition, we present an experiment using a new protocol proposed to obtain spontaneous emotional reactions. The results suggest that the initial emotional state described by the participants differed from that described after exposure to the eliciting stimulus, showing that the stimuli used were capable of inducing the expected emotional states in most individuals. Moreover, our results indicate that spontaneous facial reactions to emotions differ considerably from prototypic expressions, especially in terms of expressiveness.
1 Introduction
Emotion plays an important role in daily life due to its effect on people’s behaviour. Several studies have investigated the physiological changes related to emotion in order to find a correspondence between emotional states and biometrics. To do so, automatic systems are designed to extract individual physiological measures (e.g. electrodermal activity, heart rate, the brain’s electrical activity) obtained in response to external stimuli [1, 7, 11, 12, 41, 44, 46, 55, 66]. These studies have shown that the collected measures are similar across people experiencing the same emotion, thereby allowing the inference of emotional states from the interpretation of physiological reactions. For example, the polygraph is widely used nowadays to identify physiological reactions to emotion and, as a result, works as an efficient lie detector, although its use is penalised by its cost [38].
Aiming to find a cheaper method that can be used in natural human interactions, facial-based systems have been developed [8, 9, 18, 32, 39, 42, 57, 60, 68, 72, 74, 81]. A general system consists of face detection, extraction of facial features, expression recognition, and emotion inference [59]. Developing such an automated facial expression analysis system is a complex task because many factors affect the processing of facial information. Face shape, facial hair, eyeglasses, age, gender, occlusion, differences in expressiveness, and degree of facial plasticity are examples of common constraints which must be considered by robust algorithms in order to obtain consistent results. To do so, the automated system requires sufficient information to handle this wide spectrum of ethnic and physiological factors. In addition, the system has to operate in real time, since facial responses might be fast and sudden. There are many methods to perform each step of facial expression analysis, depending on the constraints in face acquisition (e.g. image resolution, illumination, and occlusion) and the desired efficiency in the recognition task (e.g. real-time responses and error rate).
Interest in studying facial expressions in order to recognise emotion is not new, and there is a wide spectrum of applications for this specific aim, such as soft-biometrics prediction, health care, human-machine interaction, diagnosis, lie detection, and so on [15]. Knowledge of an individual’s emotional state provides information which can be extremely valuable in interpreting particular scenarios or evaluating human activities. To obtain such knowledge, it is important to understand why and how we react to emotions through facial expressions.
Darwin was the first to claim that emotions and their expressions are biologically innate and evolutionarily adaptive. Ekman and Friesen conducted a study to investigate the universality of emotional facial expressions, whose findings led to the conclusion that people have the innate ability to perform and interpret six facial expressions (happiness, anger, disgust, fear, surprise, and sadness), which are called universal, although their intensity and initiation depend on cultural factors [23]. A recent study has suggested four basic emotions instead of six [37]. This conclusion was reached by analysing each facial muscle activated in signalling emotions, which clearly showed differences between the facial expressions of happiness and sadness, while similarities were found between other emotions (fear and surprise; anger and disgust). Despite the different perspectives on the universality of some emotional facial expressions, there is a general representation of them, which is called prototypic. Most studies on facial expression analysis take prototypic expressions as the basis for training, introducing a bias that does not reflect facial expression analysis in real-world scenarios. For this reason, contemporary studies have been considering spontaneous facial expressions [2, 10, 15, 31, 47, 73, 82, 83], but there are still unresolved issues in this field.
Although the prototypic expressions of basic emotions are universal [22], which makes them easy to identify, natural expressions (expressions elicited spontaneously in daily life) differ mainly in expressiveness, which is not universal [22], and can therefore lead us to infer incorrect emotional states.
Some studies have shown the impact of natural and prototypical expressions on the accuracy of classifiers [3, 72, 77]. The results showed a higher accuracy rate when using a database with prototypic facial expressions than using spontaneous ones. For example, Valstar and Pantic [72] obtained a recognition rate of 72% using spontaneous facial expressions.
In this paper we investigate the use of spontaneous facial expressions in emotion analysis to explore the possibility of developing a system able to infer emotional states, and to determine the extent to which this type of personal characteristic inference is likely to be possible in practice. We investigated spontaneous facial expressions rather than prototypic ones so that we could design a system close to natural human interaction, which included: sonorous and visual stimuli, facial expression analysis, and emotion inference based on newly collected data.
There are many reasons which encourage the development of an emotion analysis method considering spontaneous facial expressions:
-
Emotion analysis can be applied in different contexts providing essential information which could be helpful for areas such as medicine [13, 16, 58], psychology [29], forensics [20], human-machine interface [80], music [50], graphic animations [67], and so on.
-
As previously mentioned, most studies explore prototypic facial expressions rather than spontaneous ones, and therefore the resulting methods are not suitable for real-world scenarios. Thus, the analysis of spontaneous facial expressions could lead to the development of more robust approaches in facial expression analysis.
-
The use of facial features in emotion analysis is a subject addressed by theorists who discuss the universality of emotions. The existence of similar facial reactions to emotions across different cultures is widely explored in the literature in attempts to prove or disprove this hypothesis, which, if disproved, could lead to different perspectives on emotion inference from the face.
This paper is organised as follows: Sect. 2 describes the emotion models from the literature which were considered during the elaboration of FAMOS; Sect. 3 presents a semi-automatic facial expression analyser employed to identify facial expressions; the stimuli for eliciting spontaneous facial expressions and the database acquisition process are presented in Sect. 4; Sect. 5 discusses the results obtained in each experimental scenario; and finally, Sect. 6 presents our conclusions and perspectives for future work.
2 Emotion model
Emotion models are intended for recognising emotions based on their features. A well-defined emotion model is essential to cover the main parameters related to the affective experience. Among the different emotion models, two-dimensional ones, where emotions are arranged as a valence-arousal vector, are the most common [11, 45, 64, 71]. Despite this, there is increasing interest in exploring new dimensions, since two dimensions are not always enough to describe affective experiences, as they cannot represent the rich semantic space of emotion [26, 70]. Besides, some studies have pointed out that the arousal dimension should not be considered an atomic measure, since it is composed of two sub-dimensions, arousal–calmness and tension–relaxation, related to opposite causes [63]. This perspective has encouraged the use of multidimensional models with more than the two traditional dimensions in recent studies [49, 62, 78].
In support of the multidimensional approach, a study demonstrated that emotion models should consider a set of six emotion components, namely appraisal of events, psychophysiological changes, motor expressions, action tendencies, subjective experiences, and emotion regulation, due to their high correlation with emotion experiences related to 24 prototypical emotion terms [26]. The four dimensions highlighted in this study were those which presented the greatest variance in the analysed sample: evaluation-pleasantness, potency-control, activation-arousal, and unpredictability. The dimensions were shown to be significantly correlated with the six emotion components, which in turn have been shown to describe the emotional experience robustly [25, 61]. These results also imply that simple two-dimensional models, which are common in the literature, miss major sources of variation in the emotion domain.
Taking these results into account, these six emotion components were considered as a reference for the elaboration of a new multidimensional model called FAMOS (Facial expressions, Appraisal, MOod, and Subjective experiences). The emotion words explored in this model were those related to three basic emotions: happiness, sadness and fear. They were selected due to the possibility of provoking such emotions through aesthetic art forms (music and images) strongly enough to produce visible physical changes and without manipulating people [79]. For this reason, anger/disgust were not explored in this study.
Figure 1 illustrates each of these dimensions showing how they were obtained. The input consists of both subjective and physiological factors (facial expressions). Each subjective dimension is described as a numerical attribute scaled from 1 to 5, obtained from questionnaires, while the facial expressions are described by action units (AUs), obtained by a facial expression analyser (FEA). As a result, it is possible to describe the emotion experienced in terms of these dimensions grouping the output data into basic emotions.
The FAMOS consists of the following dimensions:
-
Mood (action tendencies component) It represents the individual’s emotional state before exposure to the eliciting stimulus. Subjects describe their initial emotional state on a 1–5 Likert scale, where 1 corresponds to low valence emotions (sadness), 3 to distress feelings (fear), and 5 to high valence emotions (happiness). It is also possible to choose a middle ground between either sadness and fear or fear and happiness, depending on the subject’s perspective on their mood. The mood can influence the emotion arousal process [40] and was therefore considered in the emotional experience analysis in order to find its correlation with the reported appraisal.
-
Appraisal (appraisal of events component) It represents the appraisal associated with the eliciting stimulus. After being exposed to the stimulus, the individual is asked to answer a second questionnaire, reporting their appraisal on a 1–5 Likert scale, where 1 corresponds to low valence emotions (sadness), 3 to distress feelings (fear), and 5 to high valence emotions (happiness). It is also possible to choose a middle ground between either sadness and fear or fear and happiness, depending on the subject’s perspective on their emotional state. These ratings are used to assess the stimulus efficiency by checking whether the appraisal is consistent with the expected emotion. Furthermore, the correlation between subjective reports and facial responses is evaluated in an attempt to find empirical evidence about the agreement between them.
-
Subjective experiences (subjective experiences component) Episodic memory can produce several different types of emotion in a short period of time [40], which makes knowledge of personal information (e.g. previous contact with the eliciting stimulus) about an individual highly desirable [79]. Concerning the auditory stimuli, individuals who have musical experience, for example, could perceive hidden details that help the cognitive process of understanding musical cues, which is necessary to arouse emotional experiences through music. We also considered people who easily get emotional, since their emotional trigger time is short, and eclectic people, who are more receptive to musical cues. Taking these facts into account, this personal information was collected through questionnaires applied before and after exposure to the eliciting stimuli.
-
Facial expressions (motor component) In response to an emotion-eliciting situation, a person can demonstrate an emotion in many ways, and one of the most significant is through facial expressions. Mehrabian [48] indicated that the verbal stimuli (i.e., spoken words) of a message contribute only \(7\%\) of the effect of the message as a whole, the vocal stimuli (e.g., voice intonation) contribute \(38\%\), while the facial expression of the speaker contributes \(55\%\) of the spoken message effect. Thus, it is possible to infer the emotion experienced by someone by analysing their facial expressions. In this work, the facial expressions were obtained from facial pictures extracted from film of people during their exposure to an eliciting stimulus. A FEA was then employed to extract AUs from each picture, as will be presented in Sect. 3, which subsequently allowed us to check for AU patterns associated with the basic emotions.
We have adopted Likert scales for representing Mood and Appraisal since they are more intuitive and easier for individuals to rate than the usual 2-D valence-arousal vectors. In addition, the arousal component is already tied to the emotion terms (e.g. Happiness: high valence, average arousal [56]).
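As an illustration, a single FAMOS observation could be stored as follows; the field names and the validation logic are ours, since the model defines the four dimensions but not a concrete data layout:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FamosRecord:
    """One emotional-experience record along the FAMOS dimensions.

    Illustrative sketch only: field names and validation are assumptions,
    not part of the model's definition.
    """
    mood: int                 # 1-5 Likert: 1=sadness, 3=fear, 5=happiness
    appraisal: int            # 1-5 Likert, rated after the stimulus
    musical_experience: bool  # subjective-experience questionnaire item
    easily_emotional: bool    # subjective-experience questionnaire item
    action_units: List[int] = field(default_factory=list)  # AUs from the FEA

    def __post_init__(self):
        # Both subjective ratings must stay on the 1-5 Likert scale
        for name, value in (("mood", self.mood), ("appraisal", self.appraisal)):
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be on the 1-5 Likert scale")
```

Such a record gathers the two Likert-scaled dimensions, the subjective-experience questionnaire items, and the AUs produced by the FEA for one participant.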
The subjective dimensions explored in this work were associated with the processing of musical cues in attempting to check links between musical load and emotional experiences. The ratings from individuals were compared to their facial reactions, which were extracted by a FEA. The results obtained by the FEA were compared to human assessment as well, in order to evaluate the accuracy and precision of the method.
The emotion components emotion regulation and psychophysiological changes were not explored in this study, since the emotions were assumed to be spontaneous (without attenuation, amplification, concealment, or substitution), and only motor expressions were considered in order to obtain a low-cost method (physiological data acquisition is quite expensive).
Few studies have considered contextual information to improve the facial expression analysis task [83]; therefore, we have analysed the effect of subjective information on the emotional experience for this particular goal.
3 Facial expression analysis
Facial expressions can be defined as changes in the face provoked by internal emotional states in response to external stimuli or to communicate the individual’s intentions [69]. Processing information from the face has helped humans to survive since the time when there was no language. For instance, perceiving someone’s intentions was fundamental to escape from predators or to recognise potential enemies and reproductive partners [14]. This ability is more prominent in women, who, according to studies in this field [17, 34], are better able to process facial information; these studies suggest that the reason is women’s need to look after infants by decoding and detecting distress on their faces, or to protect them against threatening signals from other individuals [51].
In order to describe facial expressions, Ekman and Friesen [24] developed a facial code called facial action coding system (FACS). The system describes the physical expression of emotions through AUs, which are the smallest visually discernible facial movements related to each facial muscle or group of facial muscles. Table 1 shows the upper and lower face AUs used in our study.
In order to find a unified set of AUs to describe spontaneous emotions, we have employed a method based on a model-driven technique, which depends on prior information about the face (a neutral face) and landmark detection. The method infers the AUs from the differences obtained between neutral and expressive faces. We have chosen a local model-based method since it has the advantage of being suitable for both single images and image sequences, and it is not affected by age wrinkles, since a template image is taken into account during the inference of facial expressions. Also, it does not require the extensive prior knowledge about the object of interest that image-based approaches demand. The proposed FEA is illustrated in Fig. 2.
Each step of the facial feature extraction can be described in detail as follows:
-
1.
Face detection It is responsible for extracting the face from the image. For this purpose, we have used the Viola–Jones method [76], which can be described by three key concepts: integral image generation, selection of Haar-like features by the Adaboost algorithm, and generation of the cascade classifiers. The first step of the implemented algorithm is contrast adjustment using histogram equalisation [30]. Then, the classifier is loaded from a file containing a decision tree trained on several positive and negative images of faces. The OpenCV platform [5] offers these files freely; therefore, it was not necessary to train the classifier. Once the face is detected, the image is cropped to remove unessential information. Figure 3 shows examples of the detection performed by this algorithm. Even in images with poor lighting, the face was detected.
-
2.
FCP marking Facial points are marked manually based on the geometric model proposed by Kobayashi and Hara [42] for extracting facial features. The proposed geometrical model describes the face through 30 points called facial characteristic points (FCP). The points were chosen based on the key locations of each facial deformation which constitutes expressive facial expressions. As a result, it is possible to identify expressive changes in face by analysing the FCP obtained from each facial image. The FCP were manually marked to ensure the required accuracy in identifying subtle facial expressions, since such accuracy could be harmed by using automatic extraction methods. Finally, the facial points are normalised by applying three transformations to the coordinates of the FCP: translation, rotation and scaling.
-
3.
Calculation of feature values Using the normalised FCP (obtained in Step 2), the feature values were calculated by identifying geometric features on the face [39], as shown in Table 2. For example, ieb_height describes the inner eyebrow height, computed from the normalised FCPs 19, 20, 21, and 22. In the original method, only the upper facial points of one side were considered. Moreover, the calculation of geometric features was not effective in specific situations; for example, in the case of changes in the eyebrows related to AUs 1 and 2, the feature value \(eb\_height\) combined both outer and inner brow height, and therefore these AUs were always found together. Another problem was found in the feature m_mos, which was not enough to describe some lower face AUs (e.g. AU15). In order to handle these problems, three new features have been proposed to describe these changes accurately: ieb_height, oeb_height and lc_height. The reformulated model is shown in Table 2.
-
4.
Action units inference After the calculation of feature values from neutral and expressive pictures, the obtained differences were matched to AUs by using a rule-based system. The rules were created by evaluating the changes in mouth, eyes, and eyebrows present in expressive faces. Table 3 shows the changes relative to a neutral face and the corresponding AUs. Thresholds (values expressed in pixels) were obtained by training and used to discard changes in facial expressions which are not significant enough to constitute a new expression. For instance, AU 1 is found if there is an increase in the inner eyebrow height, measured at FCPs 19 and 20, compared to the values obtained from the neutral face.
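Steps 2 and 3 above can be sketched as follows. The eye-corner anchors used for normalisation and the exact expression for ieb_height are illustrative assumptions; the text specifies the three transformations and the FCPs involved (19–22) but not the precise formulas:

```python
import numpy as np

def normalise_fcp(points, left_eye, right_eye, target_dist=100.0):
    """Translate, rotate and scale FCP coordinates so the inter-ocular
    axis is horizontal and has a fixed length. The eye anchors are an
    illustrative choice of reference points."""
    pts = np.asarray(points, dtype=float)
    # 1. translation: mid-point between the eyes moves to the origin
    centre = (np.asarray(left_eye) + np.asarray(right_eye)) / 2.0
    pts = pts - centre
    # 2. rotation: make the eye-to-eye vector horizontal
    dx, dy = np.asarray(right_eye) - np.asarray(left_eye)
    angle = -np.arctan2(dy, dx)
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    pts = pts @ rot.T
    # 3. scaling: fix the inter-ocular distance
    return pts * (target_dist / np.hypot(dx, dy))

def ieb_height(fcp):
    """Inner-eyebrow height from normalised FCPs. Illustrative formula:
    mean vertical gap between inner-brow points (19, 20) and the
    corresponding inner-eye points (21, 22)."""
    y = lambda i: fcp[i][1]
    return ((y(21) - y(19)) + (y(22) - y(20))) / 2.0
```

The normalisation makes feature values from different subjects and camera distances comparable before any thresholding is applied.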
The classifier was trained using 40 positive samples containing spontaneous facial expressions (obtained during an experiment, as will be presented in Sect. 4) and 40 negative samples containing neutral expressions. The changes in feature values found due to FCP normalisation were measured in order to find thresholds for each equation presented in Table 3. Afterwards, new images were submitted to the classifier and the results were compared to human assessments.
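A minimal sketch of the rule-based inference of Step 4, assuming per-feature scalar thresholds; the rules below are a simplified, illustrative subset, not the full rule set of Table 3:

```python
def infer_aus(neutral, expressive, thresholds):
    """Match feature-value differences (expressive minus neutral) to AUs.

    Illustrative subset only: the real rule set and the pixel thresholds
    were derived from the 40 positive / 40 negative training samples.
    """
    diff = {k: expressive[k] - neutral[k] for k in neutral}
    aus = set()
    if diff["ieb_height"] > thresholds["ieb_height"]:
        aus.add(1)       # AU 1: inner brow raiser
    if diff["oeb_height"] > thresholds["oeb_height"]:
        aus.add(2)       # AU 2: outer brow raiser
    if diff["m_mos"] > thresholds["m_mos"]:
        aus.add(12)      # AU 12: lip corner puller (smile)
    if diff["lc_height"] < -thresholds["lc_height"]:
        aus.add(15)      # AU 15: lip corner depressor
    return sorted(aus)
```

Differences below the trained thresholds are ignored, so small landmark jitter between the neutral and expressive frames does not produce spurious AUs.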
4 New emotion database acquisition: technical specifications
4.1 Subjects
Since it is rare to find a freely available database with non-prototypic facial expressions, with certain exceptions [19, 52, 53], as part of our contribution a new database with facial expressions of elicited emotions was built. The participants in the data collection were students from Universidade Federal do Rio Grande do Norte (UFRN) and employees of the SIG Software company; considering only individuals without facial deformities, the sample comprised 52 females and 49 males aged 18–60. The average time spent per person during the experiment was 7 min. Each participant was exposed to only one emotional stimulus, since after the procedure most people became aware of the nature of the research.
4.2 Apparatus and stimulus materials
The images were collected using a webcam (\(1280 \times 720\)). Since we have proposed a frame-based method for extracting facial features, instead of analysing videos we extracted frames from the recordings at 5 frames per second. The seven most expressive frames from each subject were used in the data analysis stage, to avoid attenuation of the results caused by excessive neutral frames.
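The frame selection described above can be sketched as follows; the expressiveness score used to rank frames is an assumption, since the text does not specify how "most expressive" was measured (any per-frame deviation-from-neutral measure would do):

```python
def sample_frame_indices(total_frames, video_fps, target_fps=5):
    """Indices of the frames kept when down-sampling a recording to
    target_fps (the experiment extracted 5 frames per second)."""
    step = max(1, round(video_fps / target_fps))
    return list(range(0, total_frames, step))

def most_expressive(scores, k=7):
    """Keep the k frames with the highest expressiveness score, to avoid
    diluting the analysis with neutral frames. `scores` maps a frame id
    to a hypothetical per-frame expressiveness value."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```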
In music, several studies have explored the effects of musical parameters on the individual’s mood [35, 43, 44, 46, 65]. A survey of Music Information Retrieval (MIR) is presented by Hu (2010) [6], which confirms the effect of music on mood and shows the main musical parameters that correspond with listeners’ judgements, as well as divergences in the subjective factor. Taking these facts into account, we have proposed sonorous stimuli for eliciting emotional states. The musical tracks were selected based on previous experiments from the literature [21, 27, 44] that reported high emotion levels, measured through physiological changes (e.g. heart rate, blood pressure, skin conductance level, finger temperature and respiration measures), in individuals who listened to these tracks while being monitored. The selected tracks are listed in Table 4.
At first, the experiment was conducted using only the auditory stimuli presented in Table 4, but the results showed that these stimuli were not sufficient to provoke visible changes in facial expressions, and the absence of a fixed spot on which to focus the individual’s vision generated distraction which harmed the emotion eliciting process. Taking these preliminary results into account, it was decided to employ visual stimuli as well. Verbal messages were not used, so that the eliciting stimulus could be presented to anyone regardless of linguistic factors.
Facial pictures can elicit emotion based on intrinsic human features: mimicry, empathy and cognitive load [33]. On account of this, we have selected people’s images in particular emotional scenarios, depending on the emotion expected to be provoked. The pictures were chosen considering cultural categories which are related to the induction of particular emotions. For example, pictures of people smiling were chosen for eliciting happiness, pictures of people crying, alone, ill, victims of violence, and living in poverty were chosen for eliciting sadness, and pictures taken from horror movies were chosen for eliciting fear. Figure 4 shows samples of the used images labelled by emotion.
The selected images were submitted to a jury of 30 people, who were asked to report which emotion was transmitted by each image, presented in an online questionnaire with one image per page. Jurors had unlimited time to judge each image before moving to the next. The musical stimuli used in this study were not evaluated, since they had been evaluated in previous studies. The available alternatives for the 41 images (11 for happiness, 14 for sadness and 16 for fear) were happiness, sadness, fear or none. The results were evaluated to assess the consistency of judgements about the emotions provoked by the images, thereby supporting the visual stimuli selected for eliciting emotions. Figure 5 shows the ratings given by the jury.
For the purpose of describing the agreement among the jurors’ reports, Fleiss’ kappa (\(\kappa \)) values were obtained [75]. Happiness obtained \(\kappa _h = 0.68\), which indicates substantial agreement; sadness and fear showed moderate agreement (\(\kappa _s = 0.56\) and \(\kappa _f = 0.52\)); and none showed slight agreement (\(\kappa _n = 0.03\)), according to the interpretation of \(\kappa \) by Viera et al. [75]. The overall \(\kappa \) was 0.45, which is considered moderate and therefore implies moderate agreement about the emotions elicited by the selected visual stimuli.
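Fleiss’ kappa for such a jury can be computed from an items-by-categories matrix of rating counts, as in the following sketch:

```python
import numpy as np

def fleiss_kappa(ratings):
    """Fleiss' kappa for an (items x categories) matrix of rating counts;
    each row sums to the number of raters (30 jurors here)."""
    ratings = np.asarray(ratings, dtype=float)
    n_items = ratings.shape[0]
    n_raters = ratings[0].sum()
    # category proportions over all ratings
    p_j = ratings.sum(axis=0) / (n_items * n_raters)
    # per-item agreement
    p_i = ((ratings ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar, p_e = p_i.mean(), (p_j ** 2).sum()
    return (p_bar - p_e) / (1 - p_e)
```

Each image rated by the 30 jurors contributes one row with four counts (happiness, sadness, fear, none) summing to 30.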
4.3 Procedure
The procedure can be described as follows:
-
1.
The protocol of each step of the experiment is explained to the participant. The nature of the research is not revealed, so that the results are not harmed by unconsciously biased behaviour.
-
2.
The participant is warned about the filming procedure and, in order to enable data acquisition, a consent form is provided to formalise the release of the obtained images for research purposes.
-
3.
The participant is asked to fill in a questionnaire about their musical experience, their current feelings, whether they are musically eclectic, and whether they often get emotional when listening to music.
-
4.
A picture is taken of the individual in order to have a reference neutral facial expression. Participants wearing glasses were asked to remove them during the experiment.
-
5.
The participant listens to a track while simultaneously viewing pictures that are displayed and changed every 5 s, both representing a specific emotion, while their face is being filmed. This process takes about 1 min. At the end, an image is displayed informing the participant of the end of the exposure, and the individual is then notified about a second questionnaire.
-
6.
The participant is asked to fill in the second questionnaire about their new emotional state and any previous contact with the song heard. The latter question was necessary because, in some cases, it is easier to show emotional signals on first contact with a song, since emotions elicited by music and related to violation of expectancy might include anxiety/fear [40].
When facing the eliciting stimulus, participants presented facial expressions, and the most perceptible ones were obtained from the happiness stimulus, as reported in previous studies [4, 36]. Figure 6 shows samples collected during the experiment of still neutral faces—(a), (b) and (c)—and samples from the same users after they were exposed to stimuli of sadness, happiness or fear—(d), (e) and (f)—respectively.
5 Results and discussion
5.1 Face action units recognition
In this study, images were obtained from the 101 subjects who participated in the experiment presented in Sect. 4, considering only individuals without facial deformities, 52 females and 49 males, aged 18–60, approximately 33 per emotion. Approximately 270 frames (\(640 \times 480\) pixels) were obtained from each participant, plus an additional frame representing a neutral expression of each individual. Among the collected frames, the seven most expressive ones were selected for facial expression analysis. Forty images were used during the training step to obtain thresholds for each rule of the system (see Table 3). After the training step, the 61 remaining images were submitted to the analyser. Table 5 contains the results from a set of about 700 images (training images + new images). Neutral expression occurrences were not considered in the analysis, except when they caused false positives or false negatives.
The first column enumerates the AUs explored in this work according to the categorisation proposed by Ekman et al. [24]. The second column shows the occurrences of each AU, where only one occurrence was considered per subject (it varied between 0 and 1) without considering the frequency of AUs presented in frames of the same individual. The third column contains the AUs correctly classified by the analyser, taking as ground truth the human assessment. Columns I, D and S represent insertions, deletions, and substitutions, respectively. Insertions represent false positives which occur when an AU is found, but the ground truth states that the expression was neutral. For instance, AU 4 was found in three expressions considered neutral; therefore, 4 insertions were obtained. On the other hand, deletions represent false negatives which occur when an AU is not found, but according to the ground truth the expression was observed in the image. For instance, an occurrence of AU 7 was not found by the analyser which inferred a neutral expression instead. Substitutions represent the occurrence of an AU X which was misclassified as an AU Y. For instance, AU 12 and AU 15 were misclassified as AU 20; hence, there are 2 substitutions for them and 2 insertions for the latter. Finally, the columns accuracy and precision represent efficiency metrics grouped by AU. Both metrics were obtained by means of the following equations:
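A plausible form of these equations, based on the verbal description that follows, with \(N\) the total number of evaluated instances, \(C\) the correctly classified ones, and \(I\), \(D\), \(S\) the insertions, deletions and substitutions, is:

$$\mathrm{accuracy} = \frac{C}{N}, \qquad \mathrm{precision} = \frac{C - (I + D + S)}{N}$$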
The accuracy describes the overall success rate, which for the system was 96%. The precision, on the other hand, is a reliability metric which discounts correct results obtained by chance. To do so, the misclassified instances (insertions \(+\) deletions \(+\) substitutions) are subtracted from the correctly classified ones. The overall precision of the analyser was 92%.
5.2 Data analysis
We have analysed images of 101 subjects submitted to stimuli intended for inducing emotional states and, consequently, unconscious facial responses, which are the focus of this study. Figure 7 shows some of the subjects who participated in the study.
Assuming the selected stimuli were able to evoke the expected emotions, it was possible to describe each emotion in terms of facial reactions using AUs. Figure 8 shows the histogram of the AUs presented in each emotional scenario. The occurrences of each AU were counted without considering multiple occurrences for the same individual: for example, if an individual presented AU 12 several times, only one occurrence of this AU was considered in the analysis. The extraction of AUs was performed by the developed facial expression analyser jointly with human assessment in order to ensure the accuracy of the results.
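Under the one-occurrence-per-subject rule above, the histogram can be sketched as follows (the subject IDs and per-subject AU lists are hypothetical, for illustration only):

```python
from collections import Counter

# Hypothetical per-subject detections: subject id -> AUs observed across
# all of that subject's frames (repetitions included).
observations = {
    "s01": [12, 12, 6, 25],  # AU 12 appears in several frames
    "s02": [12, 6, 7],
    "s03": [4, 7, 7],
}

histogram = Counter()
for subject, aus in observations.items():
    histogram.update(set(aus))  # each AU counted at most once per subject

# AU 12 is credited twice (subjects s01 and s02), not three times,
# despite appearing in three frames overall.
print(histogram[12])  # 2
```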
In Happiness, it was possible to find a pattern in the facial reactions presented by the individuals. The majority of them presented AUs 6, 7, 12 and, in some situations, AU 25. This pattern is consistent with the prototypic representation of this emotion described in the literature, differing only in the occurrence of AU 7, which is associated with lowering the eyelids during spontaneous smiles but is rarely present in artificial smiles. In Sadness, the predominant AUs were 7, 15 and 4, where AU 15 appeared in an extremely subtle form, in contrast to the prototypic representation of this emotion. AUs 4 and 7 were presented, signalling the discomfort caused by the eliciting stimulus. Apathy was also observed during the exposure to the sadness stimulus. The most significant difference was found in Fear. The predominant AUs were 7, 12, 4, 23 and 25 (either in single or combined displays), where AU 7 was presented by individuals closing their eyes during image exposure, and AU 12 appeared in faint smiles. Surprisingly, the smiles presented during Fear have a scientific explanation. Allan and Barbara Pease [54] described two types of smiles, one of them known as the fear face, performed by primates. This fear smile communicates submission or anxiety when facing a fearful scenario, which is why some people smile when thinking about a frightening situation; it is an involuntary way of protecting themselves by showing submission to the threat. AU 23 is an indicator of anxiety as well, since this action is associated with the dryness of the mouth caused, for example, by situations of anxiety related to fear. AU 25 was observed with some frequency in some individuals, again in a subtle form, different from the prototypic exhibition of the emotion fear. The most expressive displays were found in Happiness; the other scenarios presented low-intensity, or even neutral, expressions.
The data obtained through the questionnaires and the employed emotion stimuli were compared to the appraisal reported by the individuals, aiming to identify whether particular parameters affect emotional experiences.
We counted the occurrences of each answer given in the questionnaires and compared them to the reported appraisal in order to obtain a correlation measure between them. In addition, we compared the appraisal to the emotional stimulus to check its validity. The purpose of this study is not to provide a formula for emotion inference, but rather to highlight, statistically, the impact of the proposed dimensions on the final emotional state, which implies that they should not be neglected in emotion assessment scenarios.
The evaluated parameters were the emotion stimulus, mood, musical experience, subject’s preferences, emotionality, previous contact with the musical stimulus, gender, and age. Figure 9 (parameters at the same scale) and Fig. 10 (parameters at different scales) show the scatter plots containing the data distribution according to its occurrence in the analysed sample.
The points represent combinations of appraisal and the explored measure, shown in grey scale: the darkest points correspond to the highest occurrences in the sample, which happens when several subjects describe the same values for the emotional parameters, and the lighter ones correspond to the lowest occurrences. The red line represents a linear trend line (see Note 1), which shows how the data are expected to be scattered in the sample given its distribution. The emotional parameters were described by numeric values: the stimulus is represented by 1 for Happiness, 2 for Fear and 3 for Sadness (see Fig. 10a); Appraisal and Mood are represented on a 1–5 scale, 5 being the highest value (see Fig. 9a); Known song is represented on a 0–1 scale, 0 being the absence of previous contact with the song played during the experiment and 1 the existence of such contact (see Fig. 10b); Gender is represented by 0 for Male and 1 for Female (see Fig. 10c); and Age is represented by 0 for \(\le \,25\), 1 for 25–48 and 2 for 48–60 (see Fig. 10d).
Pearson’s correlation coefficient (\(\rho \)) [28] was computed to quantify the correlation between the variables presented in Fig. 9, revealing a moderate correlation between the emotion stimulus and the appraisal (\(\rho _{s,a} \approx -\,0.62\)) and weak correlations between the other variables (see Note 2). These results indicate that the lower the stimulus label (in terms of the numeric codes for each emotion, as explained above), the greater the reported appraisal, which suggests that the emotion stimuli were able to elicit emotional experiences since, in most cases, the reported appraisal was consistent with the expected emotion in each emotional scenario.
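A minimal sketch of the coefficient computation, using the numeric encoding above (stimulus 1 = Happiness, 2 = Fear, 3 = Sadness; appraisal on a 1–5 scale). The six data points are invented for illustration and do not come from the study:

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient for two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

stimulus  = [1, 1, 2, 2, 3, 3]   # 1=Happiness, 2=Fear, 3=Sadness
appraisal = [5, 4, 3, 4, 2, 1]   # reported appraisal, 1-5 scale
rho = pearson(stimulus, appraisal)
print(round(rho, 2))  # -0.91
```

A negative \(\rho\), as in the paper's \(\rho_{s,a} \approx -\,0.62\), means lower stimulus labels co-occur with higher reported appraisal; the invented sample above merely exaggerates that pattern.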
The analysis of emotional parameters also considered each emotion individually. The results indicated a moderate correlation between mood and appraisal for Happiness and Fear, with \(\rho \approx 0.58\) for the happiness stimulus (see Fig. 11a) and \(\rho \approx 0.6\) for the fear stimulus (see Fig. 11b), meaning that the higher the mood, the higher the reported appraisal in these emotional scenarios. For the sadness stimulus the correlation was weak (see Fig. 11c), which indicates that for this emotion the mood did not affect the emotional experience (in terms of reported appraisal). Furthermore, a moderate correlation was found between appraisal and musical preferences for Fear (\(\rho \approx 0.42\)), indicating that in this emotional scenario the subjects who reported higher appraisal were the more musically eclectic ones (see Fig. 11d).
Considering a gender distinction in the sample, a moderate correlation (\(\rho \approx 0.54\)) was found between Mood and Appraisal for males (see Fig. 12a), while for females the mood did not affect the emotional experience of most individuals (see Fig. 12b). These results suggest that the higher the mood, the higher the appraisal reported by males, thus supporting the reported superior performance of females over males in emotion analysis, since mood did not affect the females’ performance in this task. Concerning the correlation between Appraisal and Stimulus, both males and females presented moderate values (\(\rho \approx -\,0.62\)), which means that the eliciting stimulus provoked the same emotional states regardless of the subject’s gender (see Fig. 12c, d).
In summary, the overall results suggested that the designed stimuli were able to elicit emotions since more than 65% of the users reported changes in mood (Fig. 9a). In addition, the results showed the effect of mood on emotional experiences of happiness and fear and the influence of musical preferences on the appraisal reported by individuals who had contact with the fear stimulus. The analysis of emotional parameters suggests that the four dimensions were valid to describe the emotional experience, although some subjective factors could have been disregarded (e.g. music experience, emotionality, and known song). Finally, 79% of the users reported an appraisal consistent with the used stimulus (Fig. 10a).
We understand these are initial results; nevertheless, we believe they point to very significant differences concerning emotion analysis from facial expressions, mainly due to the use of spontaneous facial expressions instead of prototypic ones. In the proposed experimental study, we have presented results for only three basic emotions, but we believe that this method can be extended to other emotions.
6 Final considerations
The presented framework brings a new perspective to facial expression analysis, considering spontaneous emotional experiences in order to build systems better suited to human interaction studies. We have demonstrated that there is a link between subjective information and experienced emotions, which encourages the use of such information in emotion analysis studies. We also hope to encourage the use of more realistic data in the area of facial expression analysis by providing a method to elicit spontaneous emotional experiences. For future work, we will cover all four basic emotions by creating a visual-sonorous stimulus for inducing natural facial expressions for each emotion, thereby improving the collected set of images. Additionally, the age range will be expanded to include younger and older individuals, allowing a more robust data analysis across distinct age groups. Finally, the database will be publicly released to support the development and assessment of further facial expression analysers.
Notes
1. Source: https://support.office.com/en-US/Article/Add-a-trend-or-moving-average-line-to-a-chart-3c4323b1-e377-43b9-b54b-fae160d97965. Access date: 18 March 2016.
2. The terms used to describe the strength of the correlation were obtained from Gerstman [28].
References
Adolphs R (2002) Recognizing emotion from facial expressions: psychological and neurological mechanisms. Behav Cogn Neurosci Rev 1(1):21–62. https://doi.org/10.1177/1534582302001001003
Bartlett M, Littlewort G, Frank M, Lainscsek C, Fasel I, Movellan J (2005) Recognizing facial expression: machine learning and application to spontaneous behavior. In: IEEE computer society conference on computer vision and pattern recognition, CVPR 2005, vol 2, pp 568–573. https://doi.org/10.1109/CVPR.2005.297
Bazzo J, Lamar M (2004) Recognizing facial actions using gabor wavelets with neutral face average difference. In: Proceedings of IEEE international conference on automatic face and gesture recognition, pp 505–510. https://doi.org/10.1109/AFGR.2004.1301583
Black MJ, Yacoob Y (1997) Recognizing facial expressions in image sequences using local parameterized models of image motion. Int J Comput Vis 25(1):23–48. https://doi.org/10.1023/A:1007977618277
Bradski G, Kaehler A (2008) Learning OpenCV: computer vision with the OpenCV library, 2nd edn. O’Reilly Media, Inc., Sebastopol
Capurso A, Foundation MR (1952) Music and your emotions: a practical guide to music selections associated with desired emotional responses. Liveright Publishing Corporation, New York
Chanel G, Kronegg J, Grandjean D, Pun T (2006) Emotion assessment: arousal evaluation using EEG’s and peripheral physiological signals. In: Gunsel B, Jain A, Tekalp A, Sankur B (eds) Multimedia content representation, classification and security, vol 4105. Lecture notes in computer science. Springer, Berlin, pp 530–537. https://doi.org/10.1007/11848035_70
Chuang C, Shih F (2006) Recognizing facial action units using independent component analysis and support vector machine. Pattern Recognit 39(9):1795–1798. https://doi.org/10.1016/j.patcog.2006.03.017
Cohen I, Sebe N, Gozman F, Cirelo MC, Huang TS (2003) Learning Bayesian network classifiers for facial expression recognition both labeled and unlabeled data. In: Proceedings of the IEEE international conference on computer vision and pattern recognition, vol 1, pp I–595. https://doi.org/10.1109/CVPR.2003.1211408
Cohn JF, Schmidt KL (2004) The timing of facial motion in posed and spontaneous smiles. J Wavelets Multi-resolution Inf Process 2:1–12
Colibazzi T, Posner J, Wang Z, Gorman D, Gerber A, Yu S, Zhu H, Kangarlu A, Duan Y, Russell J, Peterson B (2010) Neural systems subserving valence and arousal during the experience of induced emotions. Emotion 10(3):377–389. https://doi.org/10.1037/a0018484
Dael N, Mortillaro M, Scherer KR (2012) Emotion expression in body action and posture. Emotion 12(5):1085–1101. https://doi.org/10.1037/a0025737
Daros AR, Zakzanis KK, Ruocco AC (2013) Facial emotion recognition in borderline personality disorder. Psychol Med 43:1953–1963. https://doi.org/10.1017/S0033291712002607
Darwin C (1998) The expression of the emotions in man and animals, 3rd edn. Oxford University Press, Oxford
De la Torre F, Cohn J (2011) Facial expression analysis. In: Moeslund TB, Hilton A, Krüger V, Sigal L (eds) Visual analysis of humans, pp 377–409. https://doi.org/10.1007/978-0-85729-997-0_19
Deruelle C, Rondan C, Gepner B, Tardif C (2004) Spatial frequency and face processing in children with autism and Asperger syndrome. J Autism Dev Disord 34(2):199–210. https://doi.org/10.1023/B:JADD.0000022610.09668.4c
Donges US, Kersting A, Suslow T (2012) Women’s greater ability to perceive happy facial emotion automatically: gender differences in affective priming. PloS One 7(7):e41745. https://doi.org/10.1371/journal.pone.0041745
Dornaika F, Moujahid A, Raducanu B (2013) Facial expression recognition using tracked facial actions: classifier performance analysis. Eng Appl Artif Intell 26(1):467–477. https://doi.org/10.1016/j.engappai.2012.09.002
Douglas-Cowie E, Campbell N, Cowie R, Roach P (2003) Emotional speech: towards a new generation of databases. Speech Commun 40(1):33–60
Duthoit CJ, Sztynda T, Lal SKL, Jap BT, Agbinya JI (2008) Optical flow image analysis of facial expressions of human emotion: Forensic applications. In: Proceedings of the 1st international conference on forensic applications and techniques in telecommunications, information, and multimedia and workshop, ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), e-Forensics ’08, pp 5:1–5:6
Eerola T, Vuoskoski J (2010) A comparison of the discrete and dimensional models of emotion in music. Psychol Music 39(1):18–49. https://doi.org/10.1177/0305735610362821
Ekman P, Friesen W (1976) Measuring facial movement. Environ Psychol Nonverbal Behav 1(1):56–75. https://doi.org/10.1007/BF01115465
Ekman P, Friesen WV, O’Sullivan M, Chan A, Diacoyanni-Tarlatzis I, Heider K, Krause R, LeCompte WA, Pitcairn T, Ricci-Bitti PE et al (1987) Universals and cultural differences in the judgments of facial expressions of emotion. J Personal Soc Psychol 53(4):712–717. https://doi.org/10.1037/0022-3514.53.4.712
Ekman P, Friesen W, Hager J (2002) Facial action coding system (FACS): manual. A Human Face
Ellsworth PC, Scherer KR (2003) Appraisal processes in emotion. Handb Affect Sci 572:572–595
Fontaine J, Scherer K, Roesch E, Ellsworth P (2007) The world of emotions is not two-dimensional. Psychol Sci 18(12):1050–1057. https://doi.org/10.1111/j.1467-9280.2007.02024.x
Gašpar T, Labor M, Jurić I, Dumančić D, Ilakovac V, Heffer M (2011) Comparison of emotion recognition from facial expression and music. Coll Antropol 35(1):163–167
Gerstman BB (2003) Statprimer. http://www.sjsu.edu/faculty/gerstman/StatPrimer/. Accessed 08 Nov 2014
Girard JM, Cohn JF, Mahoor MH, Mavadati S, Rosenwald DP (2013) Social risk and depression: evidence from manual and automatic facial expression analysis. In: Proceedings of IEEE international conference on automatic face and gesture recognition, pp 1–8. https://doi.org/10.1109/FG.2013.6553748
Gonzalez R, Woods R (2008) Digital image processing, 3rd edn. Pearson/Prentice Hall, Upper Saddle River
Gunes H, Schuller B (2013) Categorical and dimensional affect analysis in continuous input: current trends and future directions. Image Vis Comput 31(2):120–136. https://doi.org/10.1016/j.imavis.2012.06.016 (affect Analysis In Continuous Input)
Hamm J, Kohler C, Gur R, Verma R (2011) Automated facial action coding system for dynamic analysis of facial expressions in neuropsychiatric disorders. J Neurosci Methods 200(2):237–256. https://doi.org/10.1016/j.jneumeth.2011.06.023
Hess U, Philippot P, Blairy S (1998) Facial reactions to emotional facial expressions: Affect or cognition? Cogn Emotion 12(4):509–531. https://doi.org/10.1080/026999398379547
Hoffmann H, Kessler H, Eppel T, Rukavina S, Traue HC (2010) Expression intensity, gender and facial emotion recognition: women recognize only subtle facial emotions better than men. Acta Psychol 135(3):278–283. https://doi.org/10.1016/j.actpsy.2010.07.012
Hu X (2010) Music and mood: where theory and reality meet. In: Proceedings of iConference
Huang CLC, Hsiao S, Hwu HG, Howng SL (2012) The Chinese facial emotion recognition database (CFERD): a computer-generated 3-D paradigm to measure the recognition of facial emotional expressions at different intensities. Psychiatry Res 200(2–3):928–932. https://doi.org/10.1016/j.psychres.2012.03.038
Jack RE, Garrod OG, Schyns PG (2014) Dynamic facial expressions of emotion transmit an evolving hierarchy of signals over time. Current Biol 24(2):187–192. https://doi.org/10.1016/j.cub.2013.11.064
Jiang L, Qing Z, Wenyuan W (2000) A novel approach to analyze the result of polygraph. Proc IEEE Int Conf Syst Man Cybern 4:2884–2886. https://doi.org/10.1109/ICSMC.2000.884436
Jongh E (2002) Fed: an online facial expression dictionary as a first step in the creation of a complete nonverbal dictionary. Master’s thesis, Delft University of Technology
Juslin P (2013) From everyday emotions to aesthetic emotions: towards a unified theory of musical emotions. Phys Life Rev 10(3):235–266. https://doi.org/10.1016/j.plrev.2013.05.008
Kim K, Bang S, Kim S (2004) Emotion recognition system using short-term monitoring of physiological signals. Med Biol Eng Comput 42(3):419–427. https://doi.org/10.1007/BF02344719
Kobayashi H, Hara F (1991) The recognition of basic facial expressions by neural network. Proc IEEE Int Jt Conf Neural Netw 1:460–466. https://doi.org/10.1109/IJCNN.1991.170444
Korsakova-Kreyn M, Dowling WJ (2012) Emotion in music: affective responses to motion in tonal space. In: Proceedings of the 12th international conference on music perception and cognition and the 8th triennial conference of the European society for the cognitive sciences of music, pp 23–28
Krumhansl C (1997) An exploratory study of musical emotions and psychophysiology. Can J Exp Psychol/Rev Can Psychol Exp 51(4):336–353. https://doi.org/10.1037/1196-1961.51.4.336
Laurier C, Grivolla J, Herrera P (2008) Multimodal music mood classification using audio and lyrics. In: Proceedings of the international conference on machine learning and applications, San Diego, California, USA, pp 688–693. https://doi.org/10.1109/ICMLA.2008.96
Le Groux S, Valjamae A, Manzolli J, Verschure PF (2008) Implicit physiological interaction for the generation of affective musical sounds. In: Proceedings of the international computer music conference. Pompeu Fabra University, SemanticScholar, Barcelona, Spain
Lucey S, Ashraf AB, Cohn J (2007) Investigating spontaneous facial action recognition through aam representations of the face. In: Kurihara K (ed) Face recognition book, pp 275–286
Mehrabian A (1968) Communication without words, vol 2. Psychological Today, New York
Morris JD, Klahr NJ, Shen F, Villegas J, Wright P, He G, Liu Y (2009) Mapping a multidimensional emotion in response to television commercials. Hum Brain Mapp 30(3):789–796. https://doi.org/10.1002/hbm.20544
Nakanishi T, Kitagawa T (2006) Visualization of music impression in facial expression to represent emotion. Proc Asia Pac Conf Concept Model 53:55–64
Nauert R (2009) Women recognize emotions better. Psych Central http://psychcentral.com/news/2009/10/22/women-recognize-emotions-better/9100.html. Accessed 11 Oct 2014
O’Toole AJ, Harms J, Snow SL, Hurst DR, Pappas MR, Ayyad JH, Abdi H (2005) A video database of moving faces and people. IEEE Trans Pattern Anal Mach Intell 27(5):812–816. https://doi.org/10.1109/TPAMI.2005.90
Pantic M, Valstar M, Rademaker R, Maat L (2005) Web-based database for facial expression analysis. In: IEEE international conference on multimedia and Expo, ICME, pp 317–321. https://doi.org/10.1109/ICME.2005.1521424
Pease A, Pease B (2008) The definitive book of body language. Random House LLC, New York
Picard R, Vyzas E, Healey J (2001) Toward machine emotional intelligence: analysis of affective physiological state. IEEE Trans Pattern Anal Mach Intell 23(10):1175–1191. https://doi.org/10.1109/34.954607
Posner J, Russell J, Peterson B (2005) The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology. Dev Psychopathol 17(3):715–734. https://doi.org/10.1017/S0954579405050340
Quan W, Matuszewski B, Shark L, Frowd C (2011) Methodology and performance analysis of 3-D facial expression recognition using statistical shape representation. Int J Grid Distrib Comput 4(3):79–88
Robin M, Pham-Scottez A, Curt F, Dugre-Le Bigre C, Speranza M, Sapinho D, Corcos M, Berthoz S, Kedia G (2012) Decreased sensitivity to facial emotions in adolescents with borderline personality disorder. Psychiatry Res 200(2):417–421. https://doi.org/10.1016/j.psychres.2012.03.032
Sariyanidi E, Gunes H, Cavallaro A (2014) Automatic analysis of facial affect: a survey of registration, representation and recognition. IEEE Trans Pattern Anal Mach Intell 99(PrePrints):1. https://doi.org/10.1109/TPAMI.2014.2366127
Savran A, Sankur B, Taha Bilge M (2012) Regression-based intensity estimation of facial action units. Image Vis Comput 30(10):774–784. https://doi.org/10.1016/j.imavis.2011.11.008
Scherer KR (2005) What are emotions? And how can they be measured? Soc Sci Inf 44(4):695–729
Scherer KR (2009) The dynamic architecture of emotion: evidence for the component process model. Cogn Emotion 23(7):1309–1316. https://doi.org/10.1080/02699930902928969
Schimmack U, Grob A (2000) Dimensional models of core affect: a quantitative comparison by means of structural equation modeling. Eur J Person 14(4):325–345. https://doi.org/10.1002/1099-0984(200007/08)14:4%3c325::AID-PER380%3e3.0.CO;2-I
Schubert E (1996) Continuous response to music using a two dimensional emotion space. In: Proceedings of the international conference of music perception and cognition, pp 263–268
Sloboda J, Juslin P (2001) Psychological perspectives on music and emotion. In: Music and emotion: theory and research, pp 71–104
Smeaton AF, Rothwell S (2009) Biometric responses to music-rich segments in films: the cdvplex. In: International workshop on content-based multimedia indexing, pp 162–168. https://doi.org/10.1109/CBMI.2009.21
Sun S, Ge C (2014) A new method of 3D facial expression animation. J Appl Math 2014:1–6. https://doi.org/10.1155/2014/706159
Tian Y, Kanade T, Cohn J (2001) Recognizing action units for facial expression analysis. IEEE Trans Pattern Anal Mach Intell 23(2):97–115. https://doi.org/10.1109/34.908962
Tian YL, Kanade T, Cohn JF (2005) Facial expression analysis. In: Handbook of face recognition, chap 11. Springer, pp 247–275
Trkulja M, Janković D (2012) Towards three-dimensional model of affective experience of music. Emotion 17:25–40
Tyler P (1996) Developing a two-dimensional continuous response space for emotions perceived in music. Ph.D. thesis, Florida State University
Valstar M, Pantic M (2012) Fully automatic recognition of the temporal phases of facial actions. IEEE Trans Syst Man Cybern 42(1):28–43. https://doi.org/10.1109/TSMCB.2011.2163710
Valstar MF, Pantic M, Ambadar Z, Cohn JF (2006) Spontaneous vs. posed facial behavior: automatic analysis of brow actions. In: Proceedings of ACM Int’l conference on multimodal interfaces, pp 162–170
Veloso L, Carvalho J, Cavalvanti C, Moura E, Coutinho F, Gomes H (2007) Neural network classification of photogenic facial expressions based on fiducial points and gabor features. In: Mery D, Rueda L (eds) Advances in image and video technology, lecture notes in computer science, vol 4872, pp 166–179. https://doi.org/10.1007/978-3-540-77129-6_18
Viera AJ, Garrett JM et al (2005) Understanding interobserver agreement: the kappa statistic. Fam Med 37(5):360–363
Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 1:511–518. https://doi.org/10.1109/CVPR.2001.990517
Vukadinovic D, Pantic M (2005) Fully automatic facial feature point detection using gabor feature based boosted classifiers. IEEE Trans Syst Man Cybern 2:1692–1698. https://doi.org/10.1109/ICSMC.2005.1571392
Vuoskoski JK, Eerola T (2011) Measuring music-induced emotion a comparison of emotion models, personality biases, and intensity of experiences. Music Sci 15(2):159–173. https://doi.org/10.1177/1029864911403367
Wallbott HG, Scherer KR (1989) Assessing emotion by questionnaire. Emotion Theory Res Exp 4:55–82
Wimmer M, MacDonald B, Jayamuni D, Yadav A (2008) Facial expression recognition for human-robot interaction—a prototype. In: Sommer G, Klette R (eds) Robot vision, Lecture notes in computer science, vol 4931, pp 139–152. https://doi.org/10.1007/978-3-540-78157-8_11
Yang P, Liu Q, Metaxas DN (2007) Boosting coded dynamic features for facial action units and facial expression recognition. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 1–6. https://doi.org/10.1109/CVPR.2007.383059
Zeng Z, Fu Y, Roisman GI, Wen Z, Hu Y, Huang TS (2006) Spontaneous emotional facial expression detection. J Multimed 1(5):1–8
Zeng Z, Pantic M, Roisman G, Huang T (2009) A survey of affect recognition methods: audio, visual, and spontaneous expressions. IEEE Trans Pattern Anal Mach Intell 31(1):39–58. https://doi.org/10.1109/TPAMI.2008.52
Costa-Abreu, M.D., Bezerra, G.S. FAMOS: a framework for investigating the use of face features to identify spontaneous emotions. Pattern Anal Applic 22, 683–701 (2019). https://doi.org/10.1007/s10044-017-0675-y