
1 Introduction

Communication between people is not only rational and logical, but also natural and emotional. In the modern age of technology, computers are expected to recognize and express emotions in Human-Computer Interaction (HCI). As an important branch of HCI, Affective Computing relates not only to the psychology of emotion, but also to computer technology and statistical analysis [1]. Human emotions are conveyed through explicit features that play an important role in HCI, such as verbal behaviors, facial expressions and physical behaviors [2]. With the maturity of computer vision and machine learning, the analysis of individual behaviors and emotions has become increasingly popular.

According to previous research, emotion recognition from behavior has usually been associated with static gestures or a limited, inflexible set of movements [3, 4]. This paper proposes a method to evaluate human emotions while the interaction is in progress, which avoids interfering with the interaction itself. We focus on real-time emotional evaluation through explicit human behavior. To achieve this aim, OpenCV was used to extract silhouette features of human behaviors, and the PAD (Pleasure-Arousal-Dominance) emotional scale was used to evaluate emotions in real time. From these, we built a model relating behaviors to emotions.

2 Related Work

Research on affective computing in HCI covers facial expression, speech emotion, body movement, text emotion and so on [2, 5, 6], and previous researchers have proposed many methods to evaluate human emotions. Ekman and Friesen studied the relationship between emotional strength and posture in 1967 [7]. Camurri et al. described the features of body movements through physical parameters, such as the location and speed of the body [8], while Castellano found that emotion recognition from body movements was more effective than from verbal language or facial expressions [9]. Compared with subjective evaluation, extracting physical behavior is non-intrusive and causes less interference, which makes data collection more convenient and objective [8, 9].

With the development of information technology, researchers have tried to capture human behaviors with wearable sensors and cameras, and to analyze them through computer technology [10]. Capturing behaviors with wearable sensors is more direct and accurate, but also more intrusive. Camera-based methods, by contrast, are non-invasive and convenient, but are constrained by the background and the target's movements [11].

Emotion recognition based on behavior mainly involves establishing an emotion-behavior library by discriminating and analyzing the characteristics of various movements, extracted from the body's movement features under various emotional states [3, 4, 12]. Human movement features, such as duration, frequency and other properties, are the basis of emotion recognition. Laban and Ullman proposed the Body Action Coding System (BACS), which encodes: the part of the body (such as the left hand), direction, speed and shape (such as a hand clenched into a fist), distance (curve/line), strength (weak/strong), time (continuous/fast) and fluency [13]. However, most postures and movements do not have obvious emotional characteristics and cannot, by themselves, fully support emotion recognition.

Under laboratory conditions, emotional induction is an effective way to obtain genuine emotional reactions. In the past few decades, researchers have put forward many induction methods, including recall, imagination, images, film and music [14]. Recall and imagination yield unstable results, while emotions induced by pictures are short-lived. Film involves complex confounding factors, while music induction is immersive, consistent and long-lasting [14, 15]. In previous studies, the music used was mostly Western classical music, which affects Chinese listeners differently because of cultural differences [15]. In this paper, we chose the “Chinese folk music emotional library”, established in our preliminary study, as the emotional induction material. This music library can effectively induce positive, neutral and negative emotions, with consistent effects across subjects.

The PAD scale was used to evaluate emotion both in this paper and in the establishment of the “Chinese folk music emotional library”. The theory of discrete emotion holds that human emotion is discrete and measurable [16]; the six basic emotions proposed by Ekman, namely anger, disgust, fear, happiness, sadness and surprise [17], have been applied widely. The theory of continuous emotion, on the other hand, holds that human emotion is continuous, complex and distributed over a certain range. The PAD model proposed by Mehrabian and Russell consists of three dimensions: pleasure-displeasure, arousal-nonarousal and dominance-submissiveness [18]. A continuous emotion model is convenient for feature modeling and can cover the typical types of discrete emotions [16].

3 Methods

3.1 Feature Extraction

Silhouette features were extracted with OpenCV functions: the silhouette area of the target, Area(Silhouette[t]); the area of the smallest external polygon of the target, Area(MinPolygon[t]); and the silhouette centroid coordinates (x[t] and y[t]). The extracted data was then cleaned by removing non-target data and filling missing frames.
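The paper does not give the exact extraction pipeline, so the following is a minimal sketch in Python, assuming OpenCV 4, a MOG2 background subtractor, largest-contour target selection, and the convex hull as the “smallest external polygon”:

```python
import cv2

def extract_frame_features(frame, subtractor):
    """Return (silhouette_area, min_polygon_area, cx, cy), or None for a missing frame."""
    mask = subtractor.apply(frame)
    # Keep only confident foreground pixels (MOG2 marks shadows with value 127)
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None  # no target: to be filled during data cleaning
    target = max(contours, key=cv2.contourArea)        # assume the largest blob is the subject
    silhouette_area = cv2.contourArea(target)          # Area(Silhouette[t])
    hull = cv2.convexHull(target)
    min_polygon_area = cv2.contourArea(hull)           # Area(MinPolygon[t]), convex hull assumed
    m = cv2.moments(target)
    if m["m00"] == 0:
        return None
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]  # centroid (x[t], y[t])
    return silhouette_area, min_polygon_area, cx, cy

subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
```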

The features extracted in the previous step were computed and sorted into three categories of parameters: two state parameters, the relative state of the silhouette area (CI[t]) and the centroid coordinates of the silhouette (x[t] and y[t]); two change-rate parameters, the change rate of the silhouette area (RoSC[t]) and the change rate of the centroid coordinates (v[t]); and their first derivatives, the change rate of RoSC (A[t]) and the change rate of v (a[t]). All time series (state parameters, change rates and first derivatives) were normalized, and summary parameters were extracted from each series.

$$ \mathrm{RoSC}[t,n] = \frac{\sum_{i=t}^{t+n} Silhouette[i] \, - \, \sum_{i=t-n}^{t} Silhouette[i]}{\sum_{i=t-n}^{t} Silhouette[i]} $$
(1)
$$ CI[t] = \frac{Area(Silhouette[t])}{Area(MinPolygon[t])} $$
(2)
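As a concrete illustration, the sketch below computes Eq. (1) and Eq. (2) from the per-frame features, and approximates the remaining parameters (v[t], A[t], a[t]) with finite differences; the inclusive window convention for RoSC and the frame rate are assumptions, since the paper does not define these formulas explicitly:

```python
import numpy as np

def rosc(area, t, n):
    """Eq. (1): relative change of silhouette area between two n-frame windows."""
    past = np.sum(area[t - n:t + 1])    # frames t-n .. t
    future = np.sum(area[t:t + n + 1])  # frames t .. t+n
    return (future - past) / past

def ci(silhouette_area, min_polygon_area):
    """Eq. (2): compactness of the silhouette within its smallest external polygon."""
    return silhouette_area / min_polygon_area

def change_rates(x, y, rosc_series, fps=25.0):
    """Finite-difference sketches of v[t], A[t] and a[t] (assumed definitions)."""
    dt = 1.0 / fps                                         # assumed frame rate
    v = np.hypot(np.gradient(x, dt), np.gradient(y, dt))   # centroid speed v[t]
    A = np.gradient(rosc_series, dt)                       # first derivative of RoSC, A[t]
    a = np.gradient(v, dt)                                 # centroid acceleration a[t]
    return v, A, a
```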

3.2 PAD Scale

The PAD emotional scale is designed on the basis of the PAD emotional state model, which consists of the dimensions Pleasure, Arousal and Dominance. The scale used in this study was the simplified Chinese version, revised by researchers at the Institute of Psychology, Chinese Academy of Sciences, from the scale proposed by Mehrabian. Each dimension is measured by 4 items, which together evaluate and classify human emotions effectively. In this study, after listening to each piece of music, subjects were asked to fill in the scale according to their actual emotional state: the better an item matched their emotional state, the closer the score was to 4; otherwise it approached −4 (see Table 1).

Table 1. Emotional scale (It was in Chinese in the experiment)

3.3 Materials

Set-Up.

The experiment was conducted in Beijing. The experimental site measured 4 × 4 m (Fig. 1). Two cameras were placed at the front and to the right of the site. Ten subjects participated in the experiment.

Fig. 1. Experimental stage and a subject

Emotional Induction.

In this study, Chinese folk music, which offers good immersion, persistence and validity, was used as the emotional induction material. A total of 21 pieces of music were taken from the “Chinese folk music emotional library”: 5 pieces each of positive, neutral and negative music, with 2 pieces per category presented a second time to assess the test-retest reliability of the data (see Table 2).

Table 2. Numbering of the induction material

3.4 Experimental Procedure

First, positive, neutral and negative emotions were induced by the 21 pieces of Chinese folk music, and the 10 subjects were asked to express their emotions through body movements while listening. Second, the subjects' movements were recorded by the 2 cameras positioned at the front and right of the stage; after each piece of music, subjects spent a few seconds rating their emotions on the PAD scale. The silhouette parameters of the subjects were then extracted from the videos using our method. Finally, training models were established from the parameters and PAD scores and used to evaluate unknown emotions implied in body movements (see Fig. 2).

Fig. 2. Experimental procedure

4 Result Verification

4.1 Data Processing

Emotional Scale.

Reliability. SPSS 22 was used for statistical analysis. The internal consistency coefficient of the PAD emotional scale was α = 0.888. For test-retest reliability, an independent-samples t-test on the repeated induction material yielded p = 0.104: there was no significant difference between the repeated inductions, which indicates that the subjects' perception and evaluation were consistent.
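For illustration, a hedged sketch of these two checks in Python, with Cronbach's alpha for internal consistency and SciPy's independent-samples t-test; the data file names are hypothetical placeholders:

```python
import numpy as np
from scipy.stats import ttest_ind

def cronbach_alpha(items):
    """Internal consistency of a scale; items has shape (n_respondents, n_items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Test-retest: compare PAD ratings of first vs. repeated presentations
ratings_first = np.loadtxt("pad_first.csv", delimiter=",")    # hypothetical data file
ratings_repeat = np.loadtxt("pad_repeat.csv", delimiter=",")  # hypothetical data file
t_stat, p_value = ttest_ind(ratings_first, ratings_repeat)
```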

Classification. The emotions of the experimental music were divided into 2 broad categories or 3 specific categories through cluster analysis of the PAD scores (see Table 3). The clustering result was consistent with the hypothesis: the first 7 pieces of music corresponded to negative emotion, the middle 7 to neutral emotion, and the last 7 to positive emotion.

Table 3. Clustering results of the PAD scale
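The paper does not name the clustering algorithm, so the following is a sketch assuming k-means over each piece's mean PAD ratings (the file name is hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

# pad_scores: shape (21, 3), the mean (P, A, D) rating of each piece across subjects
pad_scores = np.loadtxt("pad_scores.csv", delimiter=",")

labels_2 = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pad_scores)  # broad
labels_3 = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(pad_scores)  # specific
```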

Silhouette Feature Data.

OpenCV was used to extract the features of the 10 subjects in each of the 21 videos, which describe the body characteristics of the targets. The data was then cleaned and missing values were processed. Finally, normalized parameters were extracted from each time series, including the mean, median, variance, maximum and minimum of the sequence.
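A minimal sketch of this summarization step, assuming the summary parameters are min-max normalized across samples (rather than within each series, which would make the extrema degenerate):

```python
import numpy as np

def series_parameters(series):
    """Summary parameters of one time series: mean, median, variance, max, min."""
    x = np.asarray(series, dtype=float)
    return np.array([x.mean(), np.median(x), x.var(), x.max(), x.min()])

def normalize_columns(param_matrix):
    """Min-max normalize each parameter column across all samples (assumed convention)."""
    p = np.asarray(param_matrix, dtype=float)
    return (p - p.min(axis=0)) / (p.max(axis=0) - p.min(axis=0))
```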

4.2 Emotion Recognition

Based on the PAD scale, the induction material was divided into 2 broad categories, described as negative and positive emotions, and into 3 categories, described as positive, neutral and negative emotions. Prediction accuracy was computed as the number of correctly recognized samples divided by the total number of samples, multiplied by 100%.
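The paper does not specify the classifier, so the sketch below assumes an SVM over the normalized silhouette parameters, with hypothetical file names for the feature matrix and labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X = np.loadtxt("silhouette_parameters.csv", delimiter=",")  # one row per (subject, piece)
y = np.loadtxt("emotion_labels.csv", delimiter=",")         # 2- or 3-class labels from PAD

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf").fit(X_train, y_train)

accuracy = (clf.predict(X_test) == y_test).mean() * 100  # correct / total * 100%
print(f"accuracy: {accuracy:.1f}%")
```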

Table 4 lists the classification accuracy for each camera: the front camera achieved high accuracy (for example, 72.5% for the positive/negative classification), whereas the data from the right camera could not be classified effectively.

Table 4. Classification accuracy of each category

5 Conclusion and Discussion

The results verified the hypotheses: (1) behavioral characteristics can effectively represent emotion; (2) the method used to extract the behavioral characteristics in this study is effective.

Both the internal consistency reliability and the test-retest reliability of the PAD scale were acceptable; the PAD scale is therefore a reliable and effective method for evaluating emotions. The 3-category classification of the PAD scores, into positive, neutral and negative emotions, was completely consistent with the classification of the induction materials, which verifies the effectiveness of the induction material.

We found that the expression of negative emotion was close to that of neutral emotion. The classification and recognition results verified the conjecture from the cluster analysis: the difference in expression between negative and neutral emotions was small, while the difference between negative and positive emotions was large.

Table 5 presents the parameters whose correlation with the emotion categories was notable (0.3 < Pearson's r < 0.5). These parameters, such as the variance of A and the variance of RoSC, contributed greatly to the classification results.

Table 5. Correlation between parameters and categories
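A sketch of this correlation screening (the function and variable names are assumptions):

```python
import numpy as np
from scipy.stats import pearsonr

def screen_parameters(X, y, names):
    """Keep parameters whose Pearson r against the category labels is in (0.3, 0.5)."""
    selected = []
    for j, name in enumerate(names):
        r, _ = pearsonr(X[:, j], y)
        if 0.3 < r < 0.5:  # the band reported in Table 5
            selected.append((name, round(r, 3)))
    return selected
```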

By analyzing the characteristics of the different emotions, we found that positive emotion was expressed more powerfully than negative emotion. Speed and strength in BACS correspond to RoSC and A in Fig. 3. The amplitude of the red line is larger than that of the blue line, which means the positive emotion contains more energy.

Fig. 3. RoSC and A of positive and negative emotions (the upper plot shows the rate of the target's movement; the lower plot shows the derivative of that rate. The blue line represents negative emotion, while red indicates positive emotion) (Color figure online)