
1 Introduction

In the past, research on human affect focused on basic emotions, which include happiness, sadness, fear, disgust, anger, and surprise [1]. Past research in this area largely concentrated on posed facial expressions, primarily because the difficulty of obtaining and capturing spontaneous facial expressions led to the unavailability of such databases. However, these discrete, posed expressions fail to describe the wide range of emotions that occur in natural social interactions. Recent studies have shown that spontaneous facial expressions reveal more information about a person's emotions, and that spontaneous expressions differ greatly from their posed counterparts [2]. For example, a spontaneous smile is generally interpreted as showing enjoyment or happiness, but it can also arise out of frustration [3]. As a result, more emphasis has been placed on spontaneous facial expressions in recent years. The smile is one of the key elements in identifying emotions and state of mind from facial expressions, as it is frequently exhibited to convey emotions such as amusement, politeness, and embarrassment [4], and is also used to mask other emotional expressions [5]. Since the smile is the easiest expression to pose [5], it is important for machines (and also humans) to distinguish when it is posed and when it is a genuine smile of enjoyment.

This work contributes to the field of affective computing by addressing the problem of distinguishing posed from spontaneous smiles. Developments in this field can be applied commercially to enhance our daily lives. For example, automatic human affect recognition can be applied to wearable devices such as Google Glass to help children with autism, who have difficulty reading expressions, understand emotions in a social environment [6,7,8,9]. It can also be installed in vehicles to detect the fatigue level of the driver and prevent fatigue-related accidents. Affective computing can also be applied in academic fields such as psychology, psychiatry, behavioral science, and neuroscience to reduce the time-consuming task of labeling human affects, and can thereby improve human lives.

1.1 Related Work

In the early years of research, it was thought that the morphological features of the face were good indicators of a spontaneous smile. The Facial Action Coding System (FACS) [10] defines Action Units (AUs), which are contractions or relaxations of one or more muscles. It is commonly used to code facial expressions and can be used to identify emotions. In FACS, a smile corresponds to AU12, the contraction of the zygomatic major muscle that raises the lip corners. A genuine smile of joy is thought to also include AU6, known as the Duchenne marker, which is the contraction of the orbicularis oculi (pars lateralis) muscle that raises the cheek, narrows the eye aperture, and forms wrinkles at the outer corners of the eyes. However, recent research casts doubt on the reliability of the Duchenne marker in identifying true feelings of enjoyment [11]. Another possible marker of spontaneous smiles is the symmetry of the smile. Initial studies claimed that smile symmetry is a factor in identifying spontaneous smiles, with spontaneous smiles being more symmetrical than posed ones [12]. However, later studies report no significant differences in symmetry [13, 14].

Recently, more attention has been paid to dynamic properties of smiles, such as duration, amplitude, speed, and acceleration, instead of static features like smile symmetry or the AUs. To analyze these properties, the smile is generally broken up into three phases: onset, apex, and offset. Spontaneous smiles tend to have a smaller amplitude, a slower onset [15], and a shorter total duration [16]. The eye region is analyzed as well: the eyebrow raise in posed smiles has a higher maximum speed, larger amplitude, and shorter duration than in spontaneous ones [14]. Most techniques extract dynamic properties of smiles that are known to be important factors in classifying smiles [1, 17]. Apart from these properties, facial dynamics can reveal other information useful for the classification of smiles, such as the subject's age.

Currently, the method with the best performance is the one proposed by Dibeklioglu et al. [13]. In their method, 11 facial feature points are tracked using a Piecewise Bezier Volume Deformation (PBVD) tracker, proposed by Tao and Huang [18]. The duration, amplitude, speed, and acceleration of various features in the eye, cheek, and mouth regions are calculated across the three phases of the smile, and a mid-level fusion is used to concatenate the features. Using these features, the classification accuracy on the UvA-NEMO Smile Database was 87.0%. Optical flow features along with various face-component-based features are proposed in [19], where it is shown that even optical-flow-based features can perform similarly to face-component-based features. However, their tracking is initialized with manually annotated facial landmarks.

This work aims to discover additional features from facial dynamics that can be useful for classification. To do so, the entire face region is processed and used for smile classification instead of extracting only the known smile features described above. A cluster of approaches is developed to extract features from a video and use a classifier to determine whether the smile in the video is posed or spontaneous. The system first pre-processes the video by tracking the face and extracting the face region after normalization. Three different normalization techniques are tested to determine their efficacy. Micro-expressions in the face region are amplified to test their impact on smile classification. Then, several image processing techniques are used to extract features from the face region. The features are post-processed to reduce their dimensionality and to normalize the number of features per video for the classifier. Finally, the processed features are given as input to a support vector machine (SVM) to classify between posed and spontaneous smiles.

In the rest of this paper, Sect. 2 proposes a cluster of methodologies to test the effectiveness of three normalization techniques and of HOG, LPQ, dense optical flow, and pre-trained convolutional neural network features for smile classification. A relatively new technique for amplifying micro-expressions is also tested to analyze the impact of micro-expressions on classifying smiles. Experimental results and analysis are presented in Sect. 3, and conclusions are drawn in Sect. 4.

Fig. 1. Block diagram of the proposed system.

2 Proposed Methodologies

Figure 1 shows the flow of our proposed approaches. The incoming videos/images are pre-processed to extract the face and eye regions from the image frames. Faces are then tracked using these detected regions in order to capture the global movement of the faces. Three separate normalization techniques are used to normalize the face in the extracted image sequence, and each technique is tested for its effectiveness. Eulerian video magnification (EVM) is used in the pre-processing step to amplify micro-expressions in the face region and test their impact on classification. After the face region is extracted, the videos are processed to extract features for classification, using several feature extraction techniques whose effectiveness is compared. The features are then post-processed to reduce their dimensionality and to normalize the number of features per video for the classifier. Finally, the processed features are given as input to the SVM to classify between posed and spontaneous smiles. Each of these blocks is discussed below.

2.1 Pre-processing Techniques

The incoming image frames are pre-processed using three different methodologies which are described below:

Face and Eye Detections: The face and eyes are first detected to be used for tracking in the next step. Initially, a cascade object detector from Matlab’s computer vision toolbox was implemented to detect the face and both eyes in the first frame of the video. However, despite fine-tuning, it was inaccurate, with many false positives. Such errors would propagate throughout the entire system and cause anomalous classification results. This implementation was therefore discarded and replaced by a more accurate method by Mandal et al. [20, 21]. This method uses a fusion of the OpenCV face and eye detectors [22] and the Integration of Sketch and Graph patterns (ISG) eye detector developed for human-robot interaction by Yu et al. [23]. Through the integration of both eye detectors [24], we achieve a high eye-localization success rate of 96.3% on the smiling face images, for both frontal and semi-frontal faces at various scales and with large global motions. The output of this step is three bounding boxes containing the face, the left eye, and the right eye, separately.
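As a rough illustration of this detection step, the snippet below sketches a simple cascade-based face and eye detection using OpenCV's Python bindings, of the kind initially tried; the cascade files and parameters are illustrative choices, and this is not the fusion detector of [20, 21, 23] that is ultimately used.

```python
import cv2

# Standard Haar cascades shipped with OpenCV (illustrative choices).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def detect_face_and_eyes(frame_bgr):
    """Return (face_box, [eye_boxes]) detected in a video frame, or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    roi = gray[y:y + h, x:x + w]
    eyes = eye_cascade.detectMultiScale(roi, scaleFactor=1.1, minNeighbors=5)
    # Eye boxes are relative to the face ROI; shift them to frame coordinates.
    eyes = [(x + ex, y + ey, ew, eh) for (ex, ey, ew, eh) in eyes]
    return (x, y, w, h), eyes
```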

Face and Eye Tracking: A Kanade-Lucas-Tomasi (KLT) tracker [25] was implemented to track the face and eyes in the video. The tracker searches for good feature points to track within the face and eye bounding boxes and tracks these points across the video. These points are marked as crosses (+) in the faces shown in Fig. 2. An affine transformation is estimated based on the movement of the feature points from one frame to the next, and the bounding box is then warped according to the transformation to keep track of the face region. Figure 2 shows the original bounding box in the first frame of the video, and the warped bounding box in a later frame. The output of this step is the coordinates of the bounding box vertices at every frame.
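A minimal sketch of one KLT tracking step is shown below, assuming OpenCV's pyramidal Lucas-Kanade implementation; the affine estimation routine and parameters are illustrative and not necessarily those of the original implementation.

```python
import cv2
import numpy as np

def klt_step(prev_gray, gray, points, box_corners):
    """Track feature points from prev_gray to gray and warp the bounding box.

    points: (N, 1, 2) float32 corners found inside the initial face box, e.g.
            with cv2.goodFeaturesToTrack; box_corners: (4, 2) float32 vertices.
    """
    new_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, points, None)
    ok = status.flatten() == 1
    good_old, good_new = points[ok], new_pts[ok]

    # Affine transform explaining the motion of the successfully tracked points.
    M, _ = cv2.estimateAffine2D(good_old, good_new)

    # Warp the bounding box vertices with the same transform.
    warped = cv2.transform(box_corners.reshape(-1, 1, 2), M).reshape(-1, 2)
    return good_new, warped
```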

Fig. 2. KLT tracker in action.

Fig. 3. Cropped face regions after eye location normalization.

Fig. 4. Cropped face regions after rotation normalization.

Cropping Strategies: The KLT tracker is used to track the face region in the video. Then, three different methods of normalization are tested.

  1. Normalize eye location: This method normalizes the location of the eyes in the video, as described by Mandal et al. in [20, 26]. The eye locations are tracked, and the video is rotated and resized such that both eyes always lie on the same horizontal axis, with a fixed pixel distance between them (234 pixels apart for a face image of \(400\times 500\), similar to [27,28,29]); a code sketch of this alignment follows the list below. Figure 3 shows sample images of three subjects from the UvA-NEMO smile database.

  2. Normalize face orientation: This method normalizes the orientation of the face. The orientation of the face is obtained from the KLT tracking data, and the video is rotated to compensate for the face rotation such that the face appears upright. Figure 4 shows the corresponding samples.

  3. No normalization: This is a control experiment to observe the effects of normalization. Figure 5 shows the corresponding samples.

After normalization from each of these techniques, the face region is cropped out of the video for feature extraction. The first two methods produce a \(400\times 500\) pixel region, while the third method produces a \(720\times 900\) pixel region.
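A minimal sketch of the eye-location normalization is given below; the fixed inter-ocular distance of 234 pixels and the \(400\times 500\) crop size come from the text, while the vertical placement of the eyes inside the crop is an illustrative assumption.

```python
import cv2
import numpy as np

OUT_W, OUT_H = 400, 500   # size of the cropped face region
EYE_DIST = 234            # fixed distance between the eyes in the crop

def normalize_by_eyes(frame, left_eye, right_eye):
    """Rotate and scale `frame` so both eyes are horizontal and EYE_DIST apart,
    then return a 400x500 crop. Eye inputs are (x, y) centres in the frame."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))      # tilt of the eye line
    scale = EYE_DIST / np.hypot(dx, dy)         # rescale eyes to 234 px apart
    mid = ((left_eye[0] + right_eye[0]) / 2.0,
           (left_eye[1] + right_eye[1]) / 2.0)

    # Similarity transform about the eye midpoint, then translate the midpoint
    # to an assumed target position inside the crop (eyes at 35% of the height).
    M = cv2.getRotationMatrix2D(mid, angle, scale)
    M[0, 2] += OUT_W / 2.0 - mid[0]
    M[1, 2] += OUT_H * 0.35 - mid[1]
    return cv2.warpAffine(frame, M, (OUT_W, OUT_H))
```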

Fig. 5. Cropped face regions with no normalization.

2.2 Micro-expression Amplification

Eulerian Video Magnification (EVM) amplifies small motions and color variations in videos. It was first developed by Wu et al. [30], and the motion amplification was later improved with a phase-based pipeline by Wadhwa et al. [31]. Here, it is used to amplify micro-expressions in the extracted face region; this type of motion amplification has shown good results when used in a system to detect spontaneous micro-expressions [32]. The algorithm first decomposes an input video into different spatial frequency bands and orientations using complex steerable pyramids, and the amplitudes of local wavelets are separated from their phases. The phases at each location, orientation, and scale are then independently filtered, and the phase within a specified temporal frequency range is amplified (or attenuated) and de-noised. Finally, the video is reconstructed, now with some motions amplified, depending on the frequencies selected.
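To illustrate the idea, a much-simplified, intensity-based magnification sketch is given below; it is closer to the linear method of [30] than to the phase-based pipeline of [31] used in this work, and the frequency band, amplification factor, and pyramid depth are illustrative.

```python
import numpy as np
import cv2
from scipy.signal import butter, filtfilt

def magnify_motion(frames, fps, low=0.4, high=3.0, alpha=10.0, levels=3):
    """Very simplified intensity-based Eulerian magnification.

    frames: list of grayscale frames (2D float32 arrays in [0, 1]).
    Motion in the temporal band [low, high] Hz is amplified by `alpha`.
    """
    # Spatial lowpass: repeatedly downsample each frame with a Gaussian pyramid.
    small = []
    for f in frames:
        g = f
        for _ in range(levels):
            g = cv2.pyrDown(g)
        small.append(g)
    small = np.stack(small)                                  # (T, h, w)

    # Temporal bandpass along the time axis.
    b, a = butter(1, [low / (fps / 2), high / (fps / 2)], btype='bandpass')
    band = filtfilt(b, a, small, axis=0)

    # Amplify the bandpassed signal, upsample, and add back to the originals.
    out = []
    for f, d in zip(frames, band):
        up = d.astype(np.float32)
        for _ in range(levels):
            up = cv2.pyrUp(up)
        up = cv2.resize(up, (f.shape[1], f.shape[0]))
        out.append(np.clip(f + alpha * up, 0.0, 1.0))
    return out
```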

2.3 Feature Extraction

In this subsection, the feature extraction techniques and the corresponding feature normalization are discussed. Features are extracted from the cropped face region using four different image-processing techniques.

  1. Local Phase Quantization: LPQ was originally used for texture description [33], where it outperformed the Local Binary Pattern operator in texture classification, and it has since been extended to face recognition [34]. LPQ features are insensitive to motion blur, out-of-focus blur, and atmospheric turbulence blur. The algorithm computes a 2D Discrete Fourier Transform (DFT) over a neighborhood of each pixel using 2D convolutions. The Fourier coefficients are then de-correlated using their covariance matrix and quantized to obtain the local phase information. Because LPQ descriptors are less sensitive to the image blur that may arise from the interpolation in the normalization techniques, they can be useful for smile recognition.

  2. Histogram-of-Oriented-Gradients: HOG is typically used for object detection, and has been used for human detection in videos [35]. The idea behind HOG is that an object's shape can be well characterized by the distribution of local intensity gradients (edge directions) without precise knowledge of their positions. The algorithm divides the image into spatial regions called cells, and a histogram of gradient directions is computed over the pixels in each cell. Cells are grouped into larger spatial regions called blocks, and the local histogram 'energy' of a block is used to normalize the contrast of all the cells in that block. These normalized blocks form the HOG descriptor. The 'UoCTTI' variant of HOG [36] is used as it has 31 features per cell compared with the original 36 features per cell, a significant reduction in dimensionality. A \(4\times 4\) pixel cell with 8 orientations is used, as early experiments show that it gives the best results for the given problem. A per-frame extraction sketch is shown after this list.

  3. Dense Optical Flow: Optical flow is the pattern of apparent motion of objects in a visual scene caused by relative movement between an observer and the scene. Optical flow can be computed for a sparse feature set, to track the movement of selected feature points, or for a dense feature set, whereby every pixel in an image sequence is tracked. Here, dense optical flow is used to calculate the movement of each pixel throughout the video, using the differential method proposed by Farneback [37]. Optical flow rests on the assumptions that the pixel intensities of a scene do not change between consecutive frames and that neighboring pixels have similar motion. Farneback's algorithm approximates the image signal by a polynomial expansion and solves for the displacement of that polynomial assuming a translation has occurred. The solution is made more robust by introducing assumptions that constrain the system: chiefly, that the displacement field varies slowly and that it can be parameterized according to a motion model.

  4. Pre-trained Convolutional Neural Network (CNN) Features: Another emerging class of features are those from pre-trained deep CNNs. Studies have shown that a deep CNN model trained for one application can perform well when applied to similar classification problems [38, 39]. We use a deep CNN model pre-trained for face recognition on 2.6 million images of over 2,600 people for face smile classification. Most existing models for face or object recognition are applied to images, not videos. To accommodate video processing, each frame of the video is processed and features are extracted from a deep fully-connected layer. These features are post-processed to combine them and are used for classification with an SVM. This work uses the VGG face recognition model [39], and 4096 L2-normalized features per frame are extracted from the \(35^{th}\) layer of the network.
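As an illustration of the per-frame extraction referenced above, the sketch below computes HOG and Farneback dense-flow features for one cropped face frame; it uses the standard Dalal-Triggs HOG from scikit-image rather than the UoCTTI variant, and the flow parameters are generic defaults rather than tuned values.

```python
import cv2
import numpy as np
from skimage.feature import hog

def frame_features(prev_gray, gray):
    """Per-frame appearance and motion features for one cropped face frame.

    gray, prev_gray: consecutive grayscale face crops (2D uint8 arrays).
    """
    # HOG over the face crop: 4x4-pixel cells, 8 orientation bins.
    hog_feat = hog(gray, orientations=8, pixels_per_cell=(4, 4),
                   cells_per_block=(2, 2), feature_vector=True)

    # Farneback dense optical flow between consecutive frames (2 values/pixel).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
    return hog_feat, flow.reshape(-1)
```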

The techniques mentioned above extract features frame by frame. Since the video samples have a varying number of frames, they also yield a varying number of features per sample. Our learning algorithms require a fixed number of features per sample, so the number of features per sample is normalized. To do so, the features of each video are concatenated across time and transformed to the cosine domain via the Discrete Cosine Transform (DCT), such that the number of cosine coefficients equals the number of frames in the video. The coefficients are then cropped or zero-padded so that every sample video has the same number of cosine coefficients; this is equivalent to normalizing the number of frames per video. The number of frames in the videos ranges between 80 and 500, with an average of 150. A natural choice for the normalized length would therefore be 150; however, computational constraints limit the number of features per sample, so the number of frames per video is normalized to 100 in the experiments.
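A sketch of this temporal normalization is given below, assuming the per-frame features of a video are stacked into a matrix of shape (frames, dimensions).

```python
import numpy as np
from scipy.fft import dct

N_COEFF = 100  # target number of temporal coefficients (≈ frames) per video

def temporal_normalize(per_frame_features):
    """Fix the temporal length of a feature sequence via the DCT.

    per_frame_features: array of shape (T, D), one feature vector per frame.
    Returns a flat vector of length N_COEFF * D.
    """
    x = np.asarray(per_frame_features, dtype=np.float32)
    coeffs = dct(x, type=2, norm='ortho', axis=0)      # DCT along the time axis

    T, D = coeffs.shape
    if T >= N_COEFF:
        coeffs = coeffs[:N_COEFF]                      # crop extra coefficients
    else:
        coeffs = np.vstack([coeffs, np.zeros((N_COEFF - T, D), np.float32)])
    return coeffs.reshape(-1)
```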

2.4 Classification

An SVM is a machine learning algorithm that can be used for classification, regression, and other tasks by constructing a hyperplane or a set of hyperplanes. The modern implementation of SVMs was developed by Cortes and Vapnik [40]. A recent study shows that, when applied to distinguishing posed and spontaneous smiles, the linear SVM classifier outperforms other classifiers such as the polynomial-kernel SVM, RBF-kernel SVM, linear discriminant, logistic regression, k-nearest neighbour, and naïve Bayes [41]. Therefore, a linear SVM is used for classification in this work.

3 Experiments

The UvA-NEMO Smile Database is used to analyze the dynamics of posed and spontaneous smiles [13]. This is the largest database of videos of spontaneous and posed smiles to date [19], with 643 posed smiles and 597 spontaneous smiles from 400 subjects. The videos were recorded with a Panasonic HDC-HS700 3MOS camcorder placed on a monitor approximately 1.5 m away from the recorded subjects. The videos have a resolution of \(1920\times 1080\) pixels and a frame rate of 50 frames per second. For posed smiles, each subject was asked to pose an enjoyment smile after being shown a demonstration video. For spontaneous smiles, each subject was shown a set of short, funny videos for approximately 5 min. Two trained annotators segmented the recordings to obtain the smile videos.

Similar to [13, 19], the accuracy of the system is measured using 10-fold cross-validation, where the data set is divided into the 10 folds specified by the smile database. Nine folds are used for training and the remaining fold for testing. This is repeated 10 times, so that each fold is used for testing once and the entire data set is represented, and the average of the results is used as the measure of accuracy. The classification results are recorded in a confusion matrix; to simplify the presentation, only the true positive and true negative accuracies are reported. The large number of HOG features, combined with the larger face region obtained without normalization, results in a dimensionality too large for the SVM classifier to handle, so the results for that case are omitted.
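A sketch of this evaluation protocol with a linear SVM is shown below; the fold indices are assumed to be provided by the database, and the regularization constant is an illustrative choice.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import confusion_matrix

def cross_validate(X, y, fold_ids):
    """10-fold cross-validation with the folds predefined by the database.

    X: (n_samples, n_features); y: 1 for posed, 0 for spontaneous;
    fold_ids: numpy array with the fold index (0-9) of each sample.
    Returns the mean true positive and true negative rates (%).
    """
    tpr, tnr = [], []
    for k in range(10):
        train, test = fold_ids != k, fold_ids == k
        clf = LinearSVC(C=1.0)                  # C is an illustrative choice
        clf.fit(X[train], y[train])
        cm = confusion_matrix(y[test], clf.predict(X[test]), labels=[0, 1])
        tn, fp, fn, tp = cm.ravel()
        tpr.append(100.0 * tp / (tp + fn))      # posed smiles classified correctly
        tnr.append(100.0 * tn / (tn + fp))      # spontaneous smiles classified correctly
    return np.mean(tpr), np.mean(tnr)
```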

Figure 6a shows the optical flow output, where the hue of a pixel denotes the flow direction and its saturation denotes the flow magnitude. The drawback of dense optical flow is the large number of features it produces. For a face region of \(400\times 500\) pixels, each pixel has two features, for optical flow in the x- and y-directions. This results in 400,000 features per frame and approximately 30 million features per video. The dimensionality is too high for classification, requiring a very long time and a large amount of memory to process the training data, so it must be reduced for this method to be feasible. To do so, the optical flow data from the eye, nose, and mouth regions are extracted and used for classification, while the other regions are discarded since they are unlikely to have a large impact on classification. Figure 6b shows the bounding boxes of the extracted regions.

Fig. 6. Left two: optical flow of a blink. Right: extracted regions.

The eye regions are obtained from the tracking data, and the nose and mouth regions are estimated based on the location of the eyes. The result is a dimensionality of approximately 52,800 features per frame and approximately 8 million features per video. However, the accuracy obtained using 10-fold cross-validation (see Sects. 3 and 4) for dense optical flow is poor, with 59.9% accuracy for classifying posed smiles correctly and 70.3% accuracy for classifying spontaneous smiles. Given the long time it takes to extract dense optical flow features and their large dimensionality, this method was deemed not worth further tests and is omitted from the discussion hereafter.

Table 1 shows the classification accuracy of the system while varying the normalization method, the feature extraction technique, and the use of EVM for micro-expression amplification. For example, the top-left entry of the table shows an 81.7% accuracy for classifying posed smiles and a 74.3% accuracy for classifying spontaneous smiles when the eye location is normalized, EVM is used to amplify micro-expressions, and HOG features are used. The table abbreviates the pre-trained VGG convolutional neural network features as 'VGG'. From Table 1, it can be seen that the classification of posed smiles is always more accurate than that of spontaneous smiles, regardless of the normalization or feature extraction method. This may be because posed smiles have a more prominent visual appearance than spontaneous ones. This could allow the SVM to find better support vectors to define posed smiles, whereas the less pronounced appearance of spontaneous smiles could mean that their support vectors are less well defined and the resulting hyperplane is not a good separator.

Table 1. True positive (posed smiles) and true negative (spontaneous smiles) classification accuracy (%) of our system with varying feature extraction methodologies and normalization.
Table 2. Overall classification accuracy (%) of our system with varying feature extraction methodologies and normalization.

Table 2 shows the overall accuracy of our proposed approaches. Although the VGG face model is trained on millions of face images, features from such a deep CNN may not be well suited to face smile classification. It is evident that HOG features, with normalization using the eye locations and micro-expressions magnified using EVM, perform best in most cases compared with the LPQ and VGG features.

3.1 Comparison with Other Methods

Correct classification rates (%) of various methods on UvA-NEMO are shown in Table 3. Most of the existing methodologies involve semi-automatic processes for facial feature extraction, whereas our method is fully automatic and does not require any manual intervention. It is evident from the table that our proposed approach is quite competitive compared with the other state-of-the-art methodologies.

Table 3. Correct classification rates (%) on UvA-NEMO database.

3.2 Discussions

Table 1 shows that the classification of posed smiles is always more accurate than that of spontaneous smiles, regardless of the normalization or feature extraction method. This is probably because posed smiles have a more prominent visual appearance than spontaneous ones. Figure 7 presents the results of Table 2 as a histogram, grouped by feature extraction method. It can be seen that HOG outperforms the other two feature extraction methods, LPQ and VGG, for most of the normalization techniques, except for one case, where HOG is 0.21% less accurate than the pre-trained CNN features when the face orientation is normalized and EVM is not used. Similarly, the pre-trained VGG CNN model outperforms LPQ as a feature extractor, as the classification accuracy using VGG is higher than that using LPQ in all cases.

Fig. 7. Comparison of accuracy with feature extraction methods.

From the experiments, it can be said that, among the three feature extraction methods, HOG features are the best and LPQ features are the worst. Features from the VGG model, pre-trained on a large number of face images for face recognition, have not been able to generalize well to smile classification. It is interesting to note that HOG is capable of capturing the fine-grained facial features that help distinguish posed from spontaneous smiles.

Fig. 8. Comparison of classification accuracy with normalization methods.

Figure 8 compares the classification accuracy of the normalization methods, grouped according to the other parameters. Generally, having no face normalization results in a lower classification accuracy than either normalization method. The only exception is the case of LPQ features with EVM, where the classification accuracy is 0.3% higher without normalization than with eye location normalization. Normalizing the eye location performs best in 4 out of the 6 cases, so it appears to be the better normalization technique. The most significant difference between normalization techniques is seen with the HOG descriptors, where there is a 1% difference in classification accuracy between normalizing the eye location and normalizing the face orientation.

4 Conclusions

In this work, a cluster of methodologies is proposed to distinguish posed and spontaneous smiles. It involves four feature extraction methods, three normalization methods, and the use of EVM for micro-expression amplification, but it was unable to improve on the state-of-the-art performance; the best classification accuracy obtained was 78.14%. Using EVM to amplify micro-expressions did not have a significant impact on classification accuracy, while normalizing the facial features improved it. The advantage of our proposed approaches over other methods is that they are fully automatic. Ranked from most to least effective for smile classification, the feature extraction methods are: HOG, pre-trained VGG CNN features, LPQ, and dense optical flow. This work succeeded in identifying which techniques are helpful and which are detrimental to smile classification. Experimental results on the large UvA-NEMO smile database are promising compared with other relevant methods.