
1 Introduction

The purpose of automatic facial expression recognition is to enable machines to recognize different human expressions, and it has been widely applied in human-machine interaction. Many algorithms have been proposed for automatic facial expression recognition, in which the most crucial part is feature extraction. Recent successful features in facial expression recognition have been either handcrafted or learned from data. Handcrafted approaches focus on constructing informative features manually. A good low-level feature should be both discriminative with respect to inter-expression differences and invariant to intra-expression variations, such as lighting changes and the same expression performed by different people.

The Local Binary Pattern (LBP) descriptor has been widely applied to both face recognition and facial expression recognition [1]. Existing feature extraction approaches using LBP can generally be grouped into two categories: dividing face images into regular non-overlapping patches [2,3,4] and extracting patches centered at key points [5]. The regular non-overlapping dividing method first divides each face image into N non-overlapping patches of the same size and then applies LBP to each patch. However, this method may split the same piece of information across two patches, which harms classification. The key-point-centered method first uses a face alignment method to obtain landmarks and then extracts features from patches centered at these key points; it relies heavily on accurate facial landmarks. Moreover, most existing methods of this category represent an expression image while ignoring different kinds of information, such as details and the contour, which are crucial for facial expression recognition in practice.

This paper proposes a novel Double δ-LBP (Dδ-LBP) based facial expression recognition approach that builds on the key-point-based method and also takes different kinds of information into consideration. After obtaining dense landmarks with a face alignment method, we extract patches centered at each landmark and then apply two δ-LBP operators to capture the detail and contour information, respectively.

The performance of Dδ-LBP is validated on four databases: the Extended Cohn-Kanade (CK+) database [6, 7], the Japanese Female Facial Expression (JAFFE) database [8, 9], the MMI database [10] and the Real-world Affective Face Database (RAF-DB) [11]. Experimental results illustrate that Dδ-LBP achieves superior performance in comparison with the single δ-LBP method.

The main contributions of this paper are summarized as follows:

(1) We employ two δ-LBP operators with two different parameters to obtain two complementary parts of the facial expression representation, which takes both detail and contour information into consideration and achieves higher accuracies than single δ-LBP methods;

(2) Different feature extraction methods using LBP are compared, which can serve as a reference for further research.

The organization of the rest of this paper is as follows. Section 2 gives a review on LBP feature extraction methods in facial expression recognition and δ-LBP. Section 3 presents the proposed Double δ-LBP approach. Section 4 shows the experimental results. The conclusion is drawn in Sect. 5.

2 Related Work

In this section, we first review existing feature extraction methods using LBP. Then we introduce δ-LBP, an improved form of LBP, and explain how the new feature extraction method Dδ-LBP was derived. The form of LBP and the feature extraction method are two important factors in LBP-based facial expression recognition. Dδ-LBP is proposed by combining the advantages of the improved LBP with a remedy for the weaknesses of previous feature extraction methods.

2.1 Feature Extraction Methods Using LBP

We briefly review feature extraction methods using LBP in facial expression recognition within the two aforementioned categories: the regular non-overlapping dividing method and the key-point-centered method.

Since an LBP histogram computed over the whole face image encodes only the occurrences of micro-patterns without any indication of their locations, the regular non-overlapping dividing method equally divides face images into small regions R0, R1, …, Rm and extracts an LBP histogram from each. Shan et al. [2] divide 110 × 150 pixel face images into 18 × 21 pixel regions; that is, each face image is divided into 42 (6 × 7) regions and represented by a concatenated LBP histogram of length 2478, giving a good trade-off between recognition performance and feature vector length. Ahmed et al. [12] partition each image into a number of regions and generate the proposed CLBP histograms from each of those regions; the histograms of all regions are concatenated to obtain the extended LBP histogram, and the number of regions is also determined experimentally.

The key-point-centered patch method extracts features from patches centered at facial landmarks, which first requires a face alignment method to locate the landmarks. Chen et al. [5] construct the feature by extracting multi-scale patches centered at dense facial landmarks: after applying a recent face alignment method, they extract multi-scale image patches around each landmark, divide each patch into a grid of cells, encode each cell with a certain descriptor, and finally concatenate all histograms to form a high-dimensional feature.

2.2 δ-LBP

The original LBP operator was introduced by Ojala et al. [15, 16] and has proved to be a powerful texture descriptor, as it can detect even a tiny change of grayscale value. The LBP operator is defined as follows:

$$ LBP_{P,R}(x,y) = \sum\limits_{p=0}^{P-1} s\left(g_p - g_c\right)2^{p}, \qquad s(x) = \left\{ \begin{array}{ll} 0, & x < 0 \\ 1, & x \ge 0 \end{array} \right. $$
(1)

where $g_c$ stands for the grayscale value of the center pixel, $g_p$ ($p = 0, 1, \ldots, P-1$) represents a neighbor of the center pixel on a circle of radius $R$, and $P$ denotes the number of neighbors. In short, the LBP value of a pixel is computed by comparing that pixel with its neighbors.
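For concreteness, the following is a minimal sketch of Eq. (1) in Python/NumPy. The integer-offset 8-neighbor layout at radius 1 and the helper name `lbp_code` are our illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def lbp_code(img, x, y):
    """Basic LBP_{8,1} code of the pixel at (y, x), following Eq. (1).

    img is a 2-D grayscale array; the 8 neighbors on the radius-1 circle
    are taken at integer offsets for simplicity (no interpolation).
    """
    g_c = int(img[y, x])
    # Neighbor offsets (dy, dx) enumerated counter-clockwise; the starting
    # point only permutes the bit weights.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for p, (dy, dx) in enumerate(offsets):
        g_p = int(img[y + dy, x + dx])
        s = 1 if (g_p - g_c) >= 0 else 0   # s(x) from Eq. (1)
        code += s * (2 ** p)
    return code
```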

One major weakness of the original LBP operator is its sensitivity to noise, especially in near-uniform facial image regions, since the threshold is set exactly to the value of the central pixel. To address this problem, Lu et al. [13] proposed the δ-LBP operator. δ-LBP, which produces 2-valued codes, can be regarded as a simplified version of LTP, which produces 3-valued codes; by removing one comparison, δ-LBP greatly reduces the computational burden. δ-LBP is defined as follows:

$$ \delta\text{-}LBP_{P,R}(x,y) = \sum\limits_{p=0}^{P-1} s\left(g_p - g_c\right)2^{p}, \qquad s(x) = \left\{ \begin{array}{ll} 0, & x \le \delta_{th} \\ 1, & x > \delta_{th} \end{array} \right., \quad \delta_{th} \ge 0 $$
(2)

Compared with (1), (2) introduces a parameter δth that thresholds the difference between a peripheral pixel value and the central pixel value, and different values of δth can be selected to achieve different effects. Figure 1 shows the encoding process of δ-LBP. Clearly, when δth is set to 0, δ-LBP essentially reduces to the original LBP.

Fig. 1.
figure 1

The encoding process of δ-LBP.
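Building on the sketch above, only the comparison rule of Eq. (2) changes; the helpers `delta_lbp_code` and `delta_lbp_map` (which produces a texture map like those shown later in Fig. 3) are again illustrative, not code from the paper.

```python
import numpy as np

def delta_lbp_code(img, x, y, dth=0):
    """delta-LBP_{8,1} code of the pixel at (y, x), following Eq. (2).

    With dth == 0 this matches the original LBP of Eq. (1),
    except for the tie case g_p == g_c.
    """
    g_c = int(img[y, x])
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for p, (dy, dx) in enumerate(offsets):
        diff = int(img[y + dy, x + dx]) - g_c
        s = 1 if diff > dth else 0   # s(x) from Eq. (2)
        code += s * (2 ** p)
    return code

def delta_lbp_map(img, dth=0):
    """Texture map: delta-LBP code for every interior pixel."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y - 1, x - 1] = delta_lbp_code(img, x, y, dth)
    return out
```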

2.3 Motivation of Proposing Dδ-LBP

As mentioned in Sect. 2.1, most existing feature extraction methods using LBP in facial expression recognition do not take different kinds of information, such as details and the contour, into account, but focus only on how patches are selected on face images. To extract the information of facial expressions more fully, we use two δ-LBP operators instead of a single LBP to represent facial expression features: one extracts detail features and the other extracts contour features.

3 Facial Expression Representation Based on Dδ-LBP

The framework of the proposed approach for representing facial expressions based on Dδ-LBP is illustrated in Fig. 2. In this approach, a facial expression image is modeled as the combination of two histograms obtained by applying two δ-LBP operators with different δth. The approach proceeds as follows: (1) after preprocessing, e.g. face alignment, patches of size 20 × 20 are generated centered at the landmarks of an input image, with the number of patches equal to the number of landmarks; (2) the first histogram is formed by applying δ-LBP with the smaller parameter δth1 to each patch and concatenating the results; this histogram represents the detail information of the facial expression; (3) the second histogram is formed by applying δ-LBP with the larger parameter δth2 to the patches and concatenating the results; it represents the contour information of the facial expression; (4) the first and second histograms are concatenated to form the final histogram, which serves as the representation of the facial expression. The detailed procedure and the selection of parameters are described below.

Fig. 2.
figure 2

Representation for the facial expression based on Dδ-LBP.

3.1 Facial Expression Representation Based on Single δ-LBP

The parameter δth occupies the most significant position in δ-LBP, and its choice affects the representation result. When δth is small, the texture map obtained by applying the δ-LBP operator presents more facial details. When δth is large, the texture map shows more contour information, because δ-LBP with a larger δth emphasizes the contrast between the surrounding pixels and the central pixel: according to (2), only neighbors whose grayscale difference from the central pixel is obvious (e.g. at edge regions such as eyebrows, eyes, and mouth) are coded as 1, while gentler areas are coded as 0. Therefore, the selection of δth depends on the problem to be solved. Figure 3 shows texture maps obtained with δ-LBP for different values of δth.

Fig. 3.
figure 3

Examples of texture map using δ-LBP with different δth. (a) The original face image. (b) The texture map using δ-LBP with δth = 0. (c) The texture map using δ-LBP with δth = 3. (d) The texture map using δ-LBP with δth = 5. (e) The texture map using δ-LBP with δth = 10. (f) The texture map using δ-LBP with δth = 15.

3.2 Facial Expression Representation Using Double δ-LBP

The information of a facial expression is reflected in two aspects: one is the change of facial features such as eyebrows, eyes, and mouth, and the other is the creation of wrinkles. We can therefore use a pair of δ-LBP operators to represent these two types of information. As illustrated in Fig. 3, δ-LBP with a smaller δth reflects more facial details, so it is used to represent wrinkles; δ-LBP with a larger δth reflects facial contours, so it is used to represent the facial features. The detailed procedure is described below.

Firstly, we employ the 3000 fps method [14] to obtain 68 landmarks and align faces. Then, 68 patches of size 20 × 20 are centered on the landmarks obtained in the previous step, and sampling points that do not fall at pixel centers are estimated by bilinear interpolation. The feature vector of a face image is obtained by concatenating two histograms, each of length 68 × 59. The first histogram is obtained with the smaller δth to describe detail information, and the second with the larger δth to describe contour information.
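A rough sketch of this feature construction, reusing `delta_lbp_map` from the Sect. 2.2 sketch, is given below. It assumes 68 integer-rounded (x, y) landmarks are already available; sub-pixel bilinear interpolation, image-boundary handling, and the 3 × 3 cell subdivision mentioned in Sect. 4.2 are omitted so that the dimensions match the 68 × 59 histograms described here, and all helper names are ours.

```python
import numpy as np

def _uniform_bins(P=8):
    """Map each 8-bit LBP code to one of 59 bins (58 uniform + 1 'other')."""
    def transitions(c):
        bits = [(c >> i) & 1 for i in range(P)]
        return sum(bits[i] != bits[(i + 1) % P] for i in range(P))
    lut = np.zeros(2 ** P, dtype=np.int32)
    nxt = 0
    for c in range(2 ** P):
        if transitions(c) <= 2:
            lut[c] = nxt            # 58 uniform codes get bins 0..57
            nxt += 1
        else:
            lut[c] = 58             # all non-uniform codes share the last bin
    return lut

_LUT = _uniform_bins()

def patch_histogram(patch, dth):
    """59-bin uniform delta-LBP histogram of one 20x20 patch."""
    codes = delta_lbp_map(patch, dth)           # from the Sect. 2.2 sketch
    return np.bincount(_LUT[codes.ravel()], minlength=59).astype(np.float32)

def ddelta_lbp_feature(image, landmarks, dth1=2, dth2=11, half=10):
    """Concatenate 68 x 59 histograms for dth1 and then dth2."""
    feats = []
    for dth in (dth1, dth2):
        for (lx, ly) in landmarks:              # 68 (x, y) landmark positions
            lx, ly = int(round(lx)), int(round(ly))
            patch = image[ly - half:ly + half, lx - half:lx + half]
            feats.append(patch_histogram(patch, dth))
    return np.concatenate(feats)
```

For 68 landmarks and 59-bin histograms this yields 2 × 68 × 59 = 8024 dimensions, matching the feature size reported in Sect. 4.4.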

The most crucial part of the above procedure is the selection of δth1 and δth2, which is derived from the results of the single δ-LBP method. The single δ-LBP method is identical to Dδ-LBP except that the feature vector consists of a single histogram obtained with one δth. We increase δth from 0 to 40 in steps of 1, yielding 41 recognition rates for the single δ-LBP method. The results show a consistent pattern: the highest recognition rate is achieved by a smaller δth and the second-highest by a larger δth. This indicates that a smaller δth preserves more information, including details and the contour at the same time, while a larger δth removes most noise and preserves the contour information of the face, which contains the most discriminative cues of facial expressions. Chen et al. [5] found that high dimensionality leads to high performance in face recognition because it carries a large amount of discriminative information about inter-person differences. To construct an informative feature, we therefore form a higher-dimensional feature containing both the detail information obtained with δth1, which achieves the best single δ-LBP performance, and the contour information obtained with δth2, which achieves the second-best single δ-LBP performance.

3.3 Dimension Reduction and Classification

Since we construct a high-dimensional feature, it should be compressed before being input to the classifier. We combine supervised and unsupervised subspace learning, jointly applying Principal Component Analysis (PCA) [17] and Linear Discriminant Analysis (LDA) [18] to compress the high-dimensional feature. After compression, the feature is fed into a support vector machine (SVM) [19] to recognize the expression.
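A minimal sketch of this stage with scikit-learn, assuming the Dδ-LBP features have already been extracted; the linear SVM kernel is our assumption, while the 90% energy threshold for PCA is the setting reported in Sect. 4.2.

```python
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

# PCA keeps 90% of the energy, LDA projects to at most C-1 dimensions
# for C expression classes, and an SVM performs the final classification.
clf = make_pipeline(
    PCA(n_components=0.90),
    LinearDiscriminantAnalysis(),
    SVC(kernel="linear"),
)

# X: (n_samples, 8024) Ddelta-LBP features, y: expression labels
# clf.fit(X_train, y_train); y_pred = clf.predict(X_test)
```

Applying PCA before LDA is the usual way to keep the within-class scatter matrix well conditioned when the raw feature dimension exceeds the number of training samples.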

4 Experiments

4.1 Database

The Extended Cohn-Kanade (CK+) Database.

The database [6, 7] includes 593 sequences from 123 subjects, containing both posed and non-posed expressions from people of different regions and genders. The image sequences vary in duration and cover the onset to peak formation of the facial expressions.

The Japanese Female Facial Expression (JAFFE) Database.

The database [8, 9] contains 213 photos of seven classes of facial expressions (the six basic facial expressions plus the neutral face) posed by ten Japanese females.

The MMI Database.

The MMI database [10] consists of 30 subjects of both sexes (44% female) aged from 19 to 62, with Asian, European, or South American ethnic backgrounds; 213 sequences have been labeled with the six basic expressions.

The Real-World Affective Face Database (RAF-DB).

The RAF-DB [11] is a large and diverse real-world database that contains 29,672 static face images uploaded by Flickr users worldwide and provides multi-tagged emotional annotations.

4.2 Experiments on Single δ-LBP Method

Before conducting the Dδ-LBP experiment, we first investigate the single δ-LBP method to determine the two crucial parameters δth1 and δth2. In this experiment, we extract image patches centered at the 68 landmarks with the patch size fixed to 20 × 20. Each patch is further divided into 3 × 3 cells, and each cell is encoded with δ-LBP. The 68 histograms computed from the patches constitute the feature of one image. The feature dimension is reduced by joint PCA (preserving 90% of the energy) and LDA. We evaluate the single δ-LBP method for 41 values of δth ranging from 0 to 40.
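The sweep could look roughly like the sketch below, where `extract_single_dlbp` is a hypothetical helper returning the 4012-dimensional (68 patches × 59 bins) single δ-LBP feature of one image, and GroupKFold on subject IDs approximates the person-independent folds described in the per-database protocols that follow.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score

def sweep_dth(images, landmarks, labels, subject_ids, clf, folds=10):
    """Average person-independent k-fold accuracy of single delta-LBP for dth = 0..40."""
    cv = GroupKFold(n_splits=folds)
    scores = []
    for dth in range(41):
        # extract_single_dlbp is hypothetical: one 4012-dim feature per image.
        X = np.stack([extract_single_dlbp(img, lm, dth)
                      for img, lm in zip(images, landmarks)])
        acc = cross_val_score(clf, X, labels, groups=subject_ids, cv=cv).mean()
        scores.append(acc)
    order = np.argsort(scores)[::-1]
    dth1, dth2 = int(order[0]), int(order[1])   # best and second-best thresholds
    return scores, dth1, dth2
```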

Experiment on CK+ Database.

Six basic emotions (with all “contempt” sequences removed) and the neutral face are used for comparison with other methods. For each sequence, the first image (neutral face) and the three peak images are used for prototypical facial expression recognition. We construct 10 person-independent subsets based on the subject ID in the dataset, in ascending ID order with a step size of 10, and adopt 10-fold cross-validation. The average recognition rate of the 7 classes with δth varying from 0 to 40 is shown in Fig. 4(a). The best-performing δth is 2 and the second best is 11; one value is small and the other is large.

Fig. 4.
figure 4

(a) The average recognition rate of 7 classes with δth varying from 0 to 40 on CK+. (b) The average recognition rate of 7 classes with δth varying from 0 to 40 on JAFFE.

Experiment on JAFFE Database.

All 213 images are used for 7-class expression recognition. We adopt person-independent facial expression recognition with 10-fold cross-validation: specifically, all images of one person are used as the validation set and the remaining images as the training set, and the experiment is repeated 10 times so that each person is used for testing. The recognition rate with δth varying from 0 to 40 is shown in Fig. 4(b). Again, the best-performing δth is 2 and the second best is 10; one value is small and the other is large.

4.3 Experiment on Different Face Region Selection Methods

To decide which face region selection method to use in Dδ-LBP, we test the single δ-LBP method with two selection strategies: dividing the face into regular non-overlapping patches and extracting patches centered at key points. In the first strategy, we divide the face into 6 × 7 regular patches; in the second, we use 3000 fps to obtain the 68 key points. As shown in Table 1, the key-point-centered method performs better, so we adopt it in Dδ-LBP.

Table 1. The comparison between different face region selection methods.

4.4 Experiment on Double δ-LBP Method

We conduct the Dδ-LBP experiment on both lab-controlled and in-the-wild databases. δth1 is set to 2 and δth2 to 11 on all four databases.

We again extract features centered at each of the 68 landmarks, with each patch fixed to 20 × 20. First, we apply δ-LBP with δth1 to all patches and concatenate them to form a 68 × 59 = 4012-dimensional feature. Second, we apply δ-LBP with δth2 to all patches and concatenate them to form a second 4012-dimensional feature. Finally, we concatenate the two 4012-dimensional features to form the final 8024-dimensional feature. This high-dimensional feature is reduced by joint PCA (preserving 90% of the energy) and LDA before being input to the SVM classifier.

The comparison of recognition rates between single δ-LBP and Dδ-LBP is reported in Table 2. As shown in Table 2, Dδ-LBP improves the recognition rate by 2.27%–7.84% on the four databases, which demonstrates the effectiveness of the Dδ-LBP method.

Table 2. The comparison between single δ-LBP and Dδ-LBP.

5 Conclusions

A Double δ-LBP based facial expression recognition method (Dδ-LBP) is proposed in this paper. Dδ-LBP employs two δ-LBP operators to represent facial expressions, taking different scales of information into consideration. Considering the most important properties of facial expressions, we use two values of δth to represent the detail and contour information separately. Experiments on four databases illustrate the effectiveness of the proposed method: compared with the single δ-LBP method, Dδ-LBP achieves better facial expression recognition accuracy. The key advantage of Dδ-LBP is that it takes both the details and the contour of faces into account, which allows the information of facial expressions to be extracted more fully. The proposed method could also be applied to other fields, such as face recognition and object detection.