
1 Introduction

The human face communicates useful information about a person's emotional state and expressions. Accurate recognition of facial expressions is difficult due to facial occlusions, pose variations and illumination differences. Existing expression recognition methods perform well on high-resolution images; however, in real-world applications such as visual surveillance and smart meetings, the input face images are often at low resolution. In this paper, a method is presented to recognize the expression of face images by creating a compressed image space. The main contributions of this paper are: pre-processing to overcome illumination differences, transformation to a compressed image by generating block-wise binary patterns, block-based histogram calculation as features and, finally, classification of the expression. The modified LBP based transformation reduces the image size and improves computation speed. We comprehensively study facial expression recognition with different classifiers and compare with previous work based on Local Binary Patterns (LBP) and Gabor wavelets [13]. Our technique provides better performance with respect to computation speed and recognition accuracy. The recognition accuracy for low-resolution images is particularly good and looks promising for real-world applications.

Automatic facial expression recognition involves two important steps: facial feature representation and classifier design. A number of methods have been developed for extracting features from face images, such as the Facial Action Coding System (FACS), Principal Component Analysis (PCA) [4–7], Local Binary Patterns (LBP) [1, 2, 8], Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA) [9], edge detection, the Active Appearance Model (AAM), Gabor wavelets [10, 11] and the Contourlet Transform [12–14].

The rest of the paper is organized as follows. A brief review of related work has been given in this section. Section 2 presents the methodology followed, Sect. 3 presents the analysis of the experimental results, and finally concluding remarks are summarized in Sect. 4.

2 Proposed Method

The block diagram of the proposed method for expression recognition is shown in Fig. 1, consisting of five main modules: face detection, pre-processing, modified LBP, feature extraction and classification.

Fig. 1
figure 1

Flow diagram of expression detection

2.1 Face Detection

The first step of expression recognition is face detection. In this paper face detection has been carried out using the Viola–Jones method [15], and satisfactory results are achieved with a correct detection rate of 99 %, tested on the Caltech image database.

2.2 Preprocessing

After face detection, the feature area is extracted by cropping the face image by 15 % from the right and left and 20 % from the top, removing the ears and hair. A Gaussian smoothing filter is then applied and the image is resized. In the last step of preprocessing, histogram equalization is performed to overcome illumination differences.
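The preprocessing pipeline above can be sketched as follows. The paper does not specify the smoothing kernel or the working resolution, and its implementation is in Matlab; the 3 × 3 Gaussian kernel, nearest-neighbour resize and default 128 × 128 output in this NumPy sketch are therefore illustrative assumptions:

```python
import numpy as np

def preprocess(face, out_size=(128, 128)):
    """Crop, smooth, resize and equalize a detected face (grayscale uint8).

    Crop ratios follow the paper: 15 % off each side, 20 % off the top.
    The smoothing kernel and output size are illustrative choices.
    """
    h, w = face.shape
    face = face[int(0.20 * h):, int(0.15 * w): w - int(0.15 * w)]

    # 3x3 Gaussian smoothing applied by direct convolution
    k = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16.0
    padded = np.pad(face.astype(float), 1, mode="edge")
    sm = sum(k[i, j] * padded[i:i + face.shape[0], j:j + face.shape[1]]
             for i in range(3) for j in range(3))

    # nearest-neighbour resize to the working resolution
    rows = (np.arange(out_size[0]) * sm.shape[0] / out_size[0]).astype(int)
    cols = (np.arange(out_size[1]) * sm.shape[1] / out_size[1]).astype(int)
    resized = sm[rows][:, cols].astype(np.uint8)

    # histogram equalization to reduce illumination differences
    hist = np.bincount(resized.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) * 255 / (cdf.max() - cdf.min())
    return cdf[resized].astype(np.uint8)
```

The equalization step stretches the cumulative intensity distribution to the full 0–255 range, which is what makes the features less sensitive to overall lighting.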

2.3 Modified LBP

The preprocessed image is transformed into an image of reduced dimension by the modified LBP. The preprocessed image is first divided into (3 × 3) blocks, and a threshold value is calculated for each block. The binary pattern for each block is obtained after thresholding and is represented by its equivalent decimal value. The threshold for generating the binary pattern is calculated using a new technique. The algorithm is described below:

A binary pattern is thus obtained for block B, as shown in Fig. 2. In this way a transformed intensity value is obtained for each block by taking the decimal value of its eight-bit pattern. The preprocessed image is thereby transformed into a compressed image, as shown in Fig. 3.
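A sketch of this block-wise transformation is given below. The paper's exact per-block threshold technique is not reproduced in the text, so the block mean used here is a stand-in assumption; each non-overlapping 3 × 3 block collapses to a single 8-bit code, which is how the nine-fold compression reported in Sect. 3.2 arises:

```python
import numpy as np

def modified_lbp_compress(img):
    """Map each non-overlapping 3x3 block of a grayscale image to one
    8-bit code. The paper computes a per-block threshold with its own
    technique; the block mean here is an illustrative stand-in.
    """
    h, w = img.shape
    h, w = h - h % 3, w - w % 3            # keep whole 3x3 blocks only
    out = np.zeros((h // 3, w // 3), dtype=np.uint8)
    # neighbour order around the centre pixel (clockwise from top-left)
    order = [(0, 0), (0, 1), (0, 2), (1, 2),
             (2, 2), (2, 1), (2, 0), (1, 0)]
    for bi in range(h // 3):
        for bj in range(w // 3):
            block = img[3 * bi:3 * bi + 3, 3 * bj:3 * bj + 3].astype(int)
            t = block.mean()               # assumed threshold
            code = 0
            for bit, (r, c) in enumerate(order):
                if block[r, c] >= t:       # neighbour >= threshold -> bit 1
                    code |= 1 << (7 - bit)
            out[bi, bj] = code
    return out
```

Because every 3 × 3 block becomes one pixel, a 128 × 128 face shrinks to roughly a ninth of its original area before any feature is computed, which is where the reported speed-up comes from.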

Fig. 2
figure 2

Modified LBP for block B

Fig. 3
figure 3

Compressed pattern image

2.4 Feature Extraction

To extract features from the compressed modified LBP image, the compressed pattern image is divided into (M × N) blocks and a histogram is calculated for each block. The features are obtained by concatenating the histograms of all (M × N) blocks, as shown in Fig. 4. During the experiments we extracted features for different block sizes, and for each block size we calculated the histogram while increasing the number of bins from 5 to 59.
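This block-histogram concatenation can be sketched as follows; the normalization of each histogram is an assumption not stated in the text:

```python
import numpy as np

def block_histogram_features(compressed, block=(8, 8), bins=5):
    """Divide the compressed pattern image into blocks and concatenate a
    per-block histogram of the 8-bit codes. Block size and bin count are
    the tunable parameters studied in the paper (bins from 5 to 59)."""
    h, w = compressed.shape
    feats = []
    for i in range(0, h - block[0] + 1, block[0]):
        for j in range(0, w - block[1] + 1, block[1]):
            patch = compressed[i:i + block[0], j:j + block[1]]
            hist, _ = np.histogram(patch, bins=bins, range=(0, 256))
            feats.append(hist / patch.size)   # normalized histogram
    return np.concatenate(feats)
```

For example, a 48 × 48 compressed image with 8 × 8 blocks and 5 bins yields 36 regions and a 180-dimensional feature vector, matching the configuration of Sect. 3.2.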

Fig. 4
figure 4

Histogram generation procedure

2.5 Classification

Facial expressions are classified using template matching and a Support Vector Machine. Computation speed, which depends on the feature extraction and classification procedures, is an important issue in real-time system development.

Template Matching. We first adopt a template matching technique for expression classification. Template1 is obtained by averaging the histograms of the transformed images for a particular class. For template2, a weighted average technique is applied. Consider n training images and extract the features \( \{ x_{1} ,x_{2} , \ldots ,x_{n} \} \), where \( x_{i} \) represents the average feature value of the histograms for image i. Generate a non-negative weight set \( \{ w_{1} ,w_{2} , \ldots ,w_{n} \} \) randomly and calculate the average feature for template2 using Eq. (1).

$$ \bar{x} = \frac{\sum_{i=1}^{n} w_{i} x_{i}}{\sum_{i=1}^{n} w_{i}} $$
(1)

The data set (features) is then sorted in ascending order and the weights in descending order before each weight is multiplied with the corresponding feature, which gives more weight to smaller feature values. To create a template, 50 images are used for each class. For any test image, after feature extraction, a nearest-neighbor classifier is adopted to match it with the closest template, using Euclidean distance as the similarity measure.
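A minimal sketch of template2 and the nearest-neighbor matching follows. The paper describes the \( x_{i} \) as per-image average values; applying the same ascending/descending pairing to whole feature vectors, ordered by their mean value, is an interpretive assumption of this sketch:

```python
import numpy as np

def weighted_template(features, rng=None):
    """Build the template2 descriptor for one class, per Eq. (1): random
    non-negative weights sorted in descending order are paired with
    feature vectors ordered by ascending average value, so smaller
    feature values receive larger weights."""
    rng = rng or np.random.default_rng(0)
    X = np.asarray(features, dtype=float)        # (n_images, n_features)
    order = np.argsort(X.mean(axis=1))           # ascending average feature
    w = np.sort(rng.random(len(X)))[::-1]        # descending weights
    return (w[:, None] * X[order]).sum(axis=0) / w.sum()

def classify(test_feat, templates):
    """Nearest-neighbor match against class templates (Euclidean)."""
    dists = {c: np.linalg.norm(test_feat - t) for c, t in templates.items()}
    return min(dists, key=dists.get)
```

With one template per expression class, classification reduces to a single distance computation per class, which keeps the method fast at test time.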

Support Vector Machine (SVM). A support vector machine performs an implicit mapping of the data into a higher dimensional feature space and finds a linear separating hyperplane with maximal margin to separate the data in that space. Given a training set of labeled examples \( {\text{F}}_{\text{train}} = \{ (x_{i } ,y_{i } ),\,i = 1, \ldots ,p\} \), where \( x_{i } \in {\text{R}}^{n} \) and \( y_{i } \in \{ 1, - 1\} \), a new test sample is classified by the function described in Eq. (2).

$$ f(x) = \operatorname{sgn}\left( \sum\limits_{i = 1}^{p} \alpha_{i} y_{i} K\left( x_{i} ,x \right) + b \right) $$
(2)

where \( \alpha_{i} \) are the Lagrange multipliers of the dual optimization problem, \( K\left( {x_{i } ,x} \right) \) is the kernel function and b is the threshold parameter of the hyperplane. Given a non-linear mapping Φ that embeds the input data into a high dimensional space, kernels have the form \( K\left( {x_{i } ,x_{j } } \right) = \langle\Phi \left( {x_{i } } \right) \cdot\Phi (x_{j } ) \rangle \). The most frequently used kernels are polynomial kernels and radial basis functions.
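Equation (2) can be evaluated directly once the dual problem has been solved. The sketch below shows the decision function with an RBF kernel; the multipliers and bias are taken as given inputs (solving the dual itself is not shown), and the kernel width is an arbitrary example value:

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """Radial basis function kernel K(a, b) = exp(-||a-b||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((a - b) ** 2) / (2 * sigma ** 2))

def svm_decision(x, support, labels, alphas, b, kernel=rbf_kernel):
    """Evaluate Eq. (2): the sign of the kernel expansion over the
    support vectors. alphas and b come from the dual optimization."""
    s = sum(a * y * kernel(xi, x)
            for a, y, xi in zip(alphas, labels, support))
    return 1 if s + b >= 0 else -1
```

Only the support vectors (the examples with non-zero \( \alpha_{i} \)) contribute to the sum, so test-time cost grows with the number of support vectors, not the training set size.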

3 Results and Discussions

The proposed algorithm was trained and tested on the Cohn–Kanade facial expression database [16], which consists of 100 university students aged between 18 and 30 years, of whom 65 % are female, 15 % African–American, and 3 % Asian or Latino.

The database contains anger, disgust, happy, neutral, sadness, surprise and some fear face image sequences. For the experiments, we selected 600 images from the database; Fig. 5 shows some sample images from the Cohn–Kanade database.

Fig. 5
figure 5

Sample face expression images from the Cohn Kanade database

3.1 Results of Template Matching

The recognition performance of the template matching techniques is shown in Table 1, considering facial images of size (128 × 128) pixels. For feature extraction, after image compression using the modified LBP, the image is divided into blocks of (8 × 8) pixels per region. The template matching technique achieves a maximum accuracy of 89 % for the weighted average method (template2), and for the simple average (template1) it is 83 %. We tested the template matching techniques on images of different resolutions and observed that images of (128 × 128) resolution give the best result. We compared the results with [1, 2], where a template matching technique has been used to classify expressions; the comparison in Table 2 illustrates that our template matching technique performs better.

Table 1 Recognition performance of template matching techniques
Table 2 Comparison with other template matching for 6-class expression recognition

3.2 Results of SVM

An SVM classifies objects or training samples into two categories, so multi-class classification is performed using the one-against-rest technique, which trains binary classifiers to differentiate one expression from all others. The performance achieved with different kernels is shown in Table 3.
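The one-against-rest combination can be sketched in a few lines: one binary decision function is trained per expression (training not shown here), and the class whose "this expression vs. the rest" margin is largest wins. The dictionary-of-scorers interface below is an illustrative choice, not the paper's implementation:

```python
def one_vs_rest_predict(x, binary_scores):
    """binary_scores maps each expression label to a decision function
    returning a real-valued margin for 'this class vs the rest'; the
    class with the largest margin is predicted."""
    return max(binary_scores, key=lambda c: binary_scores[c](x))
```

For k expression classes this trains and evaluates k binary SVMs, one per expression.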

Table 3 Classification accuracy for SVM classifier

For Table 3, the degree of the polynomial kernel is 1 and the standard deviation of the RBF kernel is 215. Facial images of (128 × 128) pixels are compressed (nine times) by the proposed technique and divided into blocks of (8 × 8) pixels per region. The compressed images are thus divided into 36 regions, and features are extracted using 5-bin histograms, giving a feature length of 180 (36 × 5). From Table 3 we conclude that surprise, sad, happy and angry are recognized with high accuracy (95.67–100 %), while the recognition rate for disgust and neutral is greater than 90 %. For the SVM implementation we used Matlab with a 10-fold cross-validation technique. A comparison of computation time and number of features is shown in Table 4; it is observed that our technique is better than the LBP and Gabor wavelet based feature extraction techniques.

Table 4 Comparison of computation time and no of features using SVMs

To examine the performance of the proposed method on low resolution images, we studied 4 different resolutions of the face images (110 × 150, 55 × 75, 36 × 48, 27 × 37) based on the Cohn–Kanade database. The recognition performance for the different resolutions is shown in Fig. 6.

Fig. 6
figure 6

Recognition performance of different resolution images for 59 bin

Images of resolution 110 × 150 are divided into 18 × 21 pixels per region, while images of 55 × 75, 36 × 48 and 27 × 37 resolution are divided into 10 × 10 pixels per region. For the recognition of the different expressions we used an SVM with a polynomial kernel.

4 Conclusions

This paper presented a new method for facial expression recognition. The classification accuracy shows the effectiveness of the proposed feature extraction method. Compared to Gabor wavelet and LBP features, the proposed technique saves computation time and resources. The feature extraction technique is robust and stable over a useful range of low resolution images. For low resolution images, where geometric features are not available, our technique can be applied for expression recognition.