
1 Introduction

The purpose of automatic facial expression recognition is to enable machines to recognize different human expressions, and it has been widely applied in human-machine interaction. Many algorithms have been proposed for automatic facial expression recognition, in which the most crucial part is feature extraction. Recent successful features in facial expression recognition have been either handcrafted or learned from data. Handcrafted approaches focus on constructing informative features manually. A good low-level feature should be both discriminative with respect to inter-expression differences and invariant to intra-expression variations, such as lighting changes and the same expression performed by different people.

The Local Binary Pattern (LBP) descriptor has been widely applied to both face recognition and facial expression recognition [1]. Existing feature extraction approaches using LBP can generally be grouped into two categories: dividing face images into regular non-overlapping patches [2,3,4] and extracting patches centered at key points [5]. The regular non-overlapping dividing method first divides each face image into N non-overlapping patches of the same size and then applies LBP to each patch. However, this method may split the same piece of information across two patches, which harms classification. The key-point-centered method first uses a face alignment method to obtain landmarks and then extracts features from patches centered at these key points; it relies heavily on accurate facial landmarks. Moreover, most existing methods of this category represent an expression image while ignoring different kinds of information, such as details and the contour, which are crucial for facial expression recognition in practice.

This paper proposes a novel Double δ-LBP (Dδ-LBP) based facial expression recognition approach that builds on the key-point-based method and also takes different kinds of information into consideration. After obtaining dense landmarks with a face alignment method, we extract patches centered at each landmark and then apply two δ-LBP operators to capture the detail and contour information, respectively.

The performance of Dδ-LBP is validated on four databases: the Extended Cohn-Kanade (CK+) database [6, 7], the Japanese Female Facial Expression (JAFFE) database [8, 9], the MMI database [10] and the Real-world Affective Face Database (RAF-DB) [11]. Experimental results illustrate that Dδ-LBP achieves superior performance in comparison with the single δ-LBP method.

The main contributions of this paper are summarized as follows:

(1) We employ two δ-LBP operators with two different parameters to obtain two complementary parts of the facial expression representation, which takes both detail and contour information into consideration and achieves higher accuracies than single δ-LBP methods;

(2) Different feature extraction methods using LBP are compared, which can serve as a reference for further research.

The organization of the rest of this paper is as follows. Section 2 gives a review on LBP feature extraction methods in facial expression recognition and δ-LBP. Section 3 presents the proposed Double δ-LBP approach. Section 4 shows the experimental results. The conclusion is drawn in Sect. 5.

2 Related Work

In this section, we first review existing feature extraction methods using LBP. Then we introduce δ-LBP, an improved form of LBP, and explain how the new feature extraction method Dδ-LBP was derived. The form of LBP and the feature extraction method are two important factors in LBP-based facial expression recognition. Dδ-LBP is proposed by combining the advantages of the improved LBP with a remedy for the weaknesses of previous feature extraction methods.

2.1 Feature Extraction Methods Using LBP

We briefly review feature extraction methods using LBP in facial expression recognition within the two aforementioned categories: the regular non-overlapping dividing method and the key-point-centered method.

Since an LBP histogram computed over the whole face image encodes only the occurrences of micro-patterns without any indication of their locations, the regular non-overlapping dividing method equally divides face images into small regions R0, R1, …, Rm and extracts an LBP histogram from each. Shan et al. [2] divide 110 × 150 pixel face images into 18 × 21 pixel regions; that is, each face image is divided into 42 (6 × 7) regions and represented by a concatenated LBP histogram of length 2478, giving a good trade-off between recognition performance and feature vector length. Ahmed et al. [12] partition each image into a number of regions and generate the proposed CLBP histograms from each of those regions; the histograms of all regions are concatenated to obtain the extended LBP histogram, and the number of regions is also determined experimentally.

The key-point-centered patch method extracts features from patches centered at facial landmarks, which first requires a face alignment method to locate the landmarks. Chen et al. [5] construct the feature by extracting multi-scale patches centered at dense facial landmarks: after applying a recent face alignment method, they extract multi-scale image patches around each landmark, divide each patch into a grid of cells, encode each cell with a certain descriptor, and finally concatenate all histograms to form a high-dimensional feature.

2.2 δ-LBP

The original LBP operator was introduced by Ojala et al. [15, 16] and has proved to be a powerful texture descriptor, as it can detect even a tiny change of grayscale value. The LBP operator is defined as follows:

$$ LBP_{P,R}(x,y) = \sum\limits_{p=0}^{P-1} s\left(g_p - g_c\right)2^{p}, \qquad s(x) = \left\{ \begin{array}{ll} 0, & x < 0 \\ 1, & x \ge 0 \end{array} \right. $$
(1)

where $g_c$ stands for the grayscale value of the center pixel, $g_p$ ($p = 0, 1, \ldots, P-1$) represents a neighbor of the center pixel on a circle of radius $R$, and $P$ denotes the number of neighbors. In short, the LBP value of a pixel is computed by comparing that pixel with its neighbors.
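For concreteness, the following is a minimal sketch of Eq. (1) in Python/NumPy. The integer-offset 8-neighbor layout at radius 1 and the helper name `lbp_code` are our illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def lbp_code(img, x, y):
    """Basic LBP_{8,1} code of the pixel at (y, x), following Eq. (1).

    img is a 2-D grayscale array; the 8 neighbors on the radius-1 circle
    are taken at integer offsets for simplicity (no interpolation).
    """
    g_c = int(img[y, x])
    # Neighbor offsets (dy, dx) enumerated counter-clockwise; the starting
    # point only permutes the bit weights.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for p, (dy, dx) in enumerate(offsets):
        g_p = int(img[y + dy, x + dx])
        s = 1 if (g_p - g_c) >= 0 else 0   # s(x) from Eq. (1)
        code += s * (2 ** p)
    return code
```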

One major weakness of the original LBP operator is its sensitivity to noise, especially in near-uniform facial image regions, since the threshold is set exactly to the value of the central pixel. To address this problem, Lu et al. [13] proposed the δ-LBP operator. δ-LBP, which produces 2-valued codes, can be regarded as a simplified version of LTP, which produces 3-valued codes; by removing one comparison, δ-LBP greatly reduces the computational burden. δ-LBP is defined as follows:

$$ \delta\text{-}LBP_{P,R}(x,y) = \sum\limits_{p=0}^{P-1} s\left(g_p - g_c\right)2^{p}, \qquad s(x) = \left\{ \begin{array}{ll} 0, & x \le \delta_{th} \\ 1, & x > \delta_{th} \end{array} \right., \quad \delta_{th} \ge 0 $$
(2)

Compared with (1), (2) introduces a parameter δth that thresholds the difference between a peripheral pixel value and the central pixel value, and different values of δth can be selected to achieve different effects. Figure 1 shows the encoding process of δ-LBP. Clearly, when δth is set to 0, δ-LBP essentially reduces to the original LBP.

Fig. 1.
figure 1

The encoding process of δ-LBP.
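Building on the sketch above, only the comparison rule of Eq. (2) changes; the helpers `delta_lbp_code` and `delta_lbp_map` (which produces a texture map like those shown later in Fig. 3) are again illustrative, not code from the paper.

```python
import numpy as np

def delta_lbp_code(img, x, y, dth=0):
    """delta-LBP_{8,1} code of the pixel at (y, x), following Eq. (2).

    With dth == 0 this matches the original LBP of Eq. (1),
    except for the tie case g_p == g_c.
    """
    g_c = int(img[y, x])
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for p, (dy, dx) in enumerate(offsets):
        diff = int(img[y + dy, x + dx]) - g_c
        s = 1 if diff > dth else 0   # s(x) from Eq. (2)
        code += s * (2 ** p)
    return code

def delta_lbp_map(img, dth=0):
    """Texture map: delta-LBP code for every interior pixel."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y - 1, x - 1] = delta_lbp_code(img, x, y, dth)
    return out
```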

2.3 Motivation of Proposing Dδ-LBP

As mentioned in Sect. 2.1, most existing feature extraction methods using LBP in facial expression recognition do not take different kinds of information, such as details and the contour, into account, but focus only on how patches are selected on face images. To extract the information of facial expressions more fully, we use two δ-LBP operators instead of a single LBP to represent facial expression features: one extracts detail features and the other extracts contour features.

3 Facial Expression Representation Based on Dδ-LBP

The framework of the proposed approach for representing facial expressions based on Dδ-LBP is illustrated in Fig. 2. In this approach, a facial expression image is modeled as the combination of two histograms obtained by applying two δ-LBP operators with different δth. The approach proceeds as follows: (1) after preprocessing, e.g. face alignment, patches of size 20 × 20 are generated centered at the landmarks of an input image, with the number of patches equal to the number of landmarks; (2) the first histogram is formed by applying δ-LBP with the smaller parameter δth1 to each patch and concatenating the results; this histogram represents the detail information of the facial expression; (3) the second histogram is formed by applying δ-LBP with the larger parameter δth2 to the patches and concatenating the results; it represents the contour information of the facial expression; (4) the first and second histograms are concatenated to form the final histogram, which serves as the representation of the facial expression. The detailed procedure and the selection of parameters are described below.

Fig. 2.
figure 2

Representation for the facial expression based on Dδ-LBP.

3.1 Facial Expression Representation Based on Single δ-LBP

The parameter δth occupies the most significant position in δ-LBP, and its choice affects the representation result. When δth is small, the texture map obtained by applying the δ-LBP operator presents more facial details. When δth is large, the texture map shows more contour information, because δ-LBP with a larger δth emphasizes the contrast between the surrounding pixels and the central pixel: according to (2), only neighbors whose grayscale difference from the central pixel is obvious (e.g. at edge regions such as eyebrows, eyes, and mouth) are coded as 1, while gentler areas are coded as 0. Therefore, the selection of δth depends on the problem to be solved. Figure 3 shows texture maps obtained with δ-LBP for different values of δth.

Fig. 3.
figure 3

Examples of texture map using δ-LBP with different δth. (a) The original face image. (b) The texture map using δ-LBP with δth = 0. (c) The texture map using δ-LBP with δth = 3. (d) The texture map using δ-LBP with δth = 5. (e) The texture map using δ-LBP with δth = 10. (f) The texture map using δ-LBP with δth = 15.

3.2 Facial Expression Representation Using Double δ-LBP

The information of a facial expression is reflected in two aspects: one is the change of facial features such as eyebrows, eyes, and mouth, and the other is the creation of wrinkles. We can therefore use a pair of δ-LBP operators to represent these two types of information. As illustrated in Fig. 3, δ-LBP with a smaller δth reflects more facial details, so it is used to represent wrinkles; δ-LBP with a larger δth reflects facial contours, so it is used to represent the facial features. The detailed procedure is described below.

Firstly, we employ the 3000 fps method [14] to obtain 68 landmarks and align faces. Then, 68 patches of size 20 × 20 are centered on the landmarks obtained in the previous step, and sampling points that do not fall at pixel centers are estimated by bilinear interpolation. The feature vector of a face image is obtained by concatenating two histograms, each of length 68 × 59. The first histogram is obtained with the smaller δth to describe detail information, and the second with the larger δth to describe contour information.
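A rough sketch of this feature construction, reusing `delta_lbp_map` from the Sect. 2.2 sketch, is given below. It assumes 68 integer-rounded (x, y) landmarks are already available; sub-pixel bilinear interpolation, image-boundary handling, and the 3 × 3 cell subdivision mentioned in Sect. 4.2 are omitted so that the dimensions match the 68 × 59 histograms described here, and all helper names are ours.

```python
import numpy as np

def _uniform_bins(P=8):
    """Map each 8-bit LBP code to one of 59 bins (58 uniform + 1 'other')."""
    def transitions(c):
        bits = [(c >> i) & 1 for i in range(P)]
        return sum(bits[i] != bits[(i + 1) % P] for i in range(P))
    lut = np.zeros(2 ** P, dtype=np.int32)
    nxt = 0
    for c in range(2 ** P):
        if transitions(c) <= 2:
            lut[c] = nxt            # 58 uniform codes get bins 0..57
            nxt += 1
        else:
            lut[c] = 58             # all non-uniform codes share the last bin
    return lut

_LUT = _uniform_bins()

def patch_histogram(patch, dth):
    """59-bin uniform delta-LBP histogram of one 20x20 patch."""
    codes = delta_lbp_map(patch, dth)           # from the Sect. 2.2 sketch
    return np.bincount(_LUT[codes.ravel()], minlength=59).astype(np.float32)

def ddelta_lbp_feature(image, landmarks, dth1=2, dth2=11, half=10):
    """Concatenate 68 x 59 histograms for dth1 and then dth2."""
    feats = []
    for dth in (dth1, dth2):
        for (lx, ly) in landmarks:              # 68 (x, y) landmark positions
            lx, ly = int(round(lx)), int(round(ly))
            patch = image[ly - half:ly + half, lx - half:lx + half]
            feats.append(patch_histogram(patch, dth))
    return np.concatenate(feats)
```

For 68 landmarks and 59-bin histograms this yields 2 × 68 × 59 = 8024 dimensions, matching the feature size reported in Sect. 4.4.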

The most crucial part of the above procedure is the selection of δth1 and δth2, which is derived from the results of the single δ-LBP method. The single δ-LBP method is identical to Dδ-LBP except that the feature vector consists of a single histogram obtained with one δth. We increase δth from 0 to 40 in steps of 1, yielding 41 recognition rates for the single δ-LBP method. The results show a consistent pattern: the highest recognition rate is achieved by a smaller δth and the second-highest by a larger δth. This indicates that a smaller δth preserves more information, including details and the contour at the same time, while a larger δth removes most noise and preserves the contour information of the face, which contains the most discriminative cues of facial expressions. Chen et al. [5] found that high dimensionality leads to high performance in face recognition because it carries a large amount of discriminative information about inter-person differences. To construct an informative feature, we therefore form a higher-dimensional feature containing both the detail information obtained with δth1, which achieves the best single δ-LBP performance, and the contour information obtained with δth2, which achieves the second-best single δ-LBP performance.

3.3 Dimension Reduction and Classification

Since we construct a high-dimensional feature, it should be compressed before being input to the classifier. We combine supervised and unsupervised subspace learning, jointly applying Principal Component Analysis (PCA) [17] and Linear Discriminant Analysis (LDA) [18] to compress the high-dimensional feature. After compression, the feature is fed into a support vector machine (SVM) [19] to recognize the expression.
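A minimal sketch of this stage with scikit-learn, assuming the Dδ-LBP features have already been extracted; the linear SVM kernel is our assumption, while the 90% energy threshold for PCA is the setting reported in Sect. 4.2.

```python
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

# PCA keeps 90% of the energy, LDA projects to at most C-1 dimensions
# for C expression classes, and an SVM performs the final classification.
clf = make_pipeline(
    PCA(n_components=0.90),
    LinearDiscriminantAnalysis(),
    SVC(kernel="linear"),
)

# X: (n_samples, 8024) Ddelta-LBP features, y: expression labels
# clf.fit(X_train, y_train); y_pred = clf.predict(X_test)
```

Applying PCA before LDA is the usual way to keep the within-class scatter matrix well conditioned when the raw feature dimension exceeds the number of training samples.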

4 Experiments

4.1 Database

The Extended Cohn-Kanade (CK+) Database.

The database [6, 7] includes 593 sequences from 123 subjects, containing both posed and non-posed expressions from people of different regions and genders. The image sequences vary in duration and cover the onset to peak formation of the facial expressions.

The Japanese Female Facial Expression (JAFFE) Database.

The database [8, 9] contains 213 photos of seven classes of facial expressions (the six basic facial expressions plus the neutral face) posed by ten Japanese females.

The MMI Database.

The MMI database [10] consists of 30 subjects of both sexes (44% female) aged from 19 to 62, with Asian, European, or South American ethnic backgrounds; 213 sequences have been labeled with the six basic expressions.

The Real-World Affective Face Database (RAF-DB).

The RAF-DB [11] is a large and diverse real-world database that contains 29,672 static face images uploaded by Flickr users worldwide and provides multi-tagged emotional annotations.

4.2 Experiments on Single δ-LBP Method

Before conducting the Dδ-LBP experiment, we first investigate the single δ-LBP method to determine the two crucial parameters δth1 and δth2. In this experiment, we extract image patches centered at the 68 landmarks with the patch size fixed to 20 × 20. Each patch is further divided into 3 × 3 cells, and each cell is encoded with δ-LBP. The 68 histograms computed from the patches constitute the feature of one image. The feature dimension is reduced by joint PCA (preserving 90% of the energy) and LDA. We evaluate the single δ-LBP method for 41 values of δth ranging from 0 to 40.
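The sweep could look roughly like the sketch below, where `extract_single_dlbp` is a hypothetical helper returning the 4012-dimensional (68 patches × 59 bins) single δ-LBP feature of one image, and GroupKFold on subject IDs approximates the person-independent folds described in the per-database protocols that follow.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score

def sweep_dth(images, landmarks, labels, subject_ids, clf, folds=10):
    """Average person-independent k-fold accuracy of single delta-LBP for dth = 0..40."""
    cv = GroupKFold(n_splits=folds)
    scores = []
    for dth in range(41):
        # extract_single_dlbp is hypothetical: one 4012-dim feature per image.
        X = np.stack([extract_single_dlbp(img, lm, dth)
                      for img, lm in zip(images, landmarks)])
        acc = cross_val_score(clf, X, labels, groups=subject_ids, cv=cv).mean()
        scores.append(acc)
    order = np.argsort(scores)[::-1]
    dth1, dth2 = int(order[0]), int(order[1])   # best and second-best thresholds
    return scores, dth1, dth2
```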

Experiment on CK+ Database.

Six basic emotions (with all “contempt” sequences removed) and the neutral face are used for comparison with other methods. For each sequence, the first image (neutral face) and the three peak images are used for prototypical facial expression recognition. We construct 10 person-independent subsets based on the subject ID in the dataset, in ascending ID order with a step size of 10, and adopt 10-fold cross-validation. The average recognition rate of the 7 classes with δth varying from 0 to 40 is shown in Fig. 4(a). The best-performing δth is 2 and the second best is 11; one value is small and the other is large.

Fig. 4.
figure 4

(a) The average recognition rate of 7 classes with δth varying from 0 to 40 on CK+. (b) The average recognition rate of 7 classes with δth varying from 0 to 40 on JAFFE.

Experiment on JAFFE Database.

All 213 images are used for 7-class expression recognition. We adopt person-independent facial expression recognition with 10-fold cross-validation: specifically, all images of one person are used as the validation set and the remaining images as the training set, and the experiment is repeated 10 times so that each person is used for testing. The recognition rate with δth varying from 0 to 40 is shown in Fig. 4(b). Again, the best-performing δth is 2 and the second best is 10; one value is small and the other is large.

4.3 Experiment on Different Face Region Selection Methods

To decide which face region selection method to use in Dδ-LBP, we test the single δ-LBP method with two selection strategies: dividing the face into regular non-overlapping patches and extracting patches centered at key points. In the first strategy, we divide the face into 6 × 7 regular patches; in the second, we use 3000 fps to obtain the 68 key points. As shown in Table 1, the key-point-centered method performs better, so we adopt it in Dδ-LBP.

Table 1. The comparison between different face region selection methods.

4.4 Experiment on Double δ-LBP Method

We conduct the Dδ-LBP experiment on both lab-controlled and in-the-wild databases. δth1 is set to 2 and δth2 to 11 on all four databases.

We again extract features centered at each of the 68 landmarks, with each patch fixed to 20 × 20. First, we apply δ-LBP with δth1 to all patches and concatenate them to form a 68 × 59 = 4012-dimensional feature. Second, we apply δ-LBP with δth2 to all patches and concatenate them to form a second 4012-dimensional feature. Finally, we concatenate the two 4012-dimensional features to form the final 8024-dimensional feature. This high-dimensional feature is reduced by joint PCA (preserving 90% of the energy) and LDA before being input to the SVM classifier.

The comparison of recognition rates between single δ-LBP and Dδ-LBP is reported in Table 2. As shown in Table 2, Dδ-LBP improves the recognition rate by 2.27%–7.84% on the four databases, which demonstrates the effectiveness of the Dδ-LBP method.

Table 2. The comparison between single δ-LBP and Dδ-LBP.

5 Conclusions

A Double δ-LBP based facial expression recognition method (Dδ-LBP) is proposed in this paper. Dδ-LBP employs two δ-LBP operators to represent facial expressions, taking different scales of information into consideration. Considering the most important properties of facial expressions, we use two values of δth to represent the detail and contour information separately. Experiments on four databases illustrate the effectiveness of the proposed method: compared with the single δ-LBP method, Dδ-LBP achieves better facial expression recognition accuracy. The key advantage of Dδ-LBP is that it takes both the details and the contour of faces into account, which allows the information of facial expressions to be extracted more fully. The proposed method could also be applied to other fields, such as face recognition and object detection.