1 Introduction

Human facial expressions are universal across cultures and provide an immediate means of understanding a person's mood [23]. It has been reported that 55% of communication is conveyed by facial expressions alone [14]. Over the past decade, Facial Expression Recognition (FER) has been a significant research area in computer vision [57]. FER has many applications, including human-computer interaction, driver mood detection, and data-driven animation [43]. In education, the analysis of facial expressions can help gauge student engagement in a virtual classroom. In medicine, FER allows a doctor to detect a patient's pain automatically and remotely [2]. In security, it can help identify suspicious persons by observing their emotions. In animation, FER is used to detect the emotions in a given image [44]. FER is therefore of interest in many practical applications [46].

The basic FER system is shown in Fig. 1. It consists of three fundamental steps: face acquisition, feature extraction, and classification. Face acquisition detects the face region in the given image or video sequence. The detected face area is then cropped to eliminate the background and normalized to a fixed size. Features are then extracted from this region. The performance of any FER system depends on the efficient extraction of significant features from the face [47]. Feature extraction helps in interpreting meaningful information from the facial image [20]. A detailed survey of the fundamentals of image features and descriptors can be found in [4]. Any extracted feature should satisfy three important properties. First, it must vary strongly between classes and vary little within a class. Second, it should be simple to compute and be described in a low-dimensional space. Third, it must be robust against illumination variation and noise.

Fig. 1 Facial expression recognition system

The final step of the FER system is to choose a classifier for the features extracted in the previous step. The features are fed into the selected classifier, which performs classification based on the similarities between the extracted features. The widely accepted basic emotions are neutral, anger, disgust, fear, happy, sad, and surprise. Commonly used classifiers include K-Nearest Neighbours (KNN), Neural Networks (NN), and Support Vector Machines (SVM). For FER, the SVM has been reported to perform better than the other classifiers.

In this paper, a new feature extraction approach is proposed in which the Local Binary Pattern (LBP) is applied separately to the 4-neighbours and the diagonal neighbours of a central pixel, rather than to all eight neighbours as in the traditional LBP. This reduces the feature length from 256 to 32 and thereby reduces the computational complexity. For noise robustness, a neighbourhood defined by averaging along radial directions was introduced for texture classification in [42]. In this work, an adaptive window is chosen around each pixel based on the intensity variations. To achieve robustness in the presence of noise, the pixel intensities are averaged along each radial direction, so that an effective 3 × 3 window is used to calculate the LBP.

The remainder of the paper is organized as follows. Related work on FER is described in Section 2, and a brief review of LBP and its variants is given in Section 3. The proposed method for feature extraction is presented in Section 4. The experimental setup is described in Section 5. Simulation results are discussed in Section 6. Finally, concluding remarks are given in Section 7.

2 Related work

Feature extraction plays an important role in any FER system [5]. The features extracted from the face must discriminate strongly between different classes and vary little within the same class. There are two types of approaches for feature extraction: geometric-based methods and appearance-based methods [6]. In the geometric-based approach, the feature vector is formed from the locations and shapes of the facial components. However, in real time, the extraction of geometric features is difficult because faces vary widely across cultures [10]. The appearance-based approach has therefore gained popularity in feature extraction [48]. In appearance-based methods, spatial filters are applied to either the whole face or a specific region of the face to obtain texture information [13]. Gabor wavelets and LBP are two popular appearance-based feature extraction techniques [8]. Gabor wavelets extract features at multiple scales and orientations, but their computational complexity is high.

The LBP is a powerful texture descriptor for feature extraction in appearance-based methods [38]. It has been used in areas such as content-based image retrieval, texture analysis, face recognition, and bearing fault diagnosis [30]. LBP-based methods are computationally simple and efficient, but they are affected by noise. To improve the robustness of LBP, different variants have been proposed for feature extraction. Tan and Triggs [51] proposed the Local Ternary Pattern (LTP), which partially solved the noise problem, but its threshold is difficult to select. Jabid et al. [25] proposed the Local Directional Pattern (LDP), which forms a binary code from the edge responses in eight different directions. The performance of LDP depends on the number of edge response directions [26]. Tong et al. [53] proposed the Local Gradient Coding using Horizontal and Diagonal principle (LGC-HD) operator to reduce the feature length, but it did not consider the effect of noise. Holder and Tapamo [21] proposed the Improved Gradient Local Ternary Pattern (IGLTP), which uses the Scharr operator to reduce the effect of illumination variation, but its threshold is also difficult to select. Yee et al. [55] proposed the Completed Local Ternary Pattern (CLTP) to overcome the drawbacks of the basic LBP operator. Fan et al. [18] proposed Structure Co-Occurrence Texture (Scoot), which considers both spatial structure and co-occurrence texture statistics. The Histogram of Oriented Gradients (HOG) is another descriptor used for feature extraction in FER systems. Eng et al. [16] applied HOG for feature extraction in an FER system. Nigam et al. [40] combined HOG and the wavelet transform for feature extraction to improve the performance of the FER system. All these methods address the effect of noise on feature extraction only partially.

Recently, Convolutional Neural Networks (CNN) and deep learning have been successfully applied in computer vision. The main advantage of deep learning is that it integrates feature extraction and classification. Cho et al. [11] proposed the Relational Graph Module (RGM) for heterogeneous face recognition, which describes the high-level relational information between facial components. Bi et al. [7] proposed a conditional Generative Adversarial Network (cGAN) for face sketch synthesis, in which the cGAN learns the mapping between the face and the corresponding sketch. Huang et al. [24] proposed an adaptive curriculum learning loss for deep face recognition, which automatically emphasizes easy samples first and hard samples later. Pitaloka et al. [45] proposed an enhanced CNN for automatic emotion recognition and showed that pre-processing methods such as resizing, face detection, and cropping influence the performance of the CNN. Chen et al. [9] proposed a deep CNN based on edge computing, which overcomes the problems of high similarity between facial expressions and imbalanced numbers of samples. Kim et al. [33] proposed a Deep Neural Network (DNN) structure that combines appearance and geometric features. Li et al. [35] presented faster regions with CNN features to overcome the problems of low-level data operation and complexity in feature extraction. Yang et al. [54] proposed the Weighted Mixture Deep Neural Network (WMDNN) for effective feature extraction, using gray scale images and their corresponding LBP images to improve the performance of the FER system. Shan et al. [50] proposed automatic FER using a deep CNN structure. Jung et al. [28] developed an FER system using both CNN and DNN and demonstrated experimentally that the CNN outperforms the DNN. Liliana [36] proposed a deep CNN that detects the occurrence of facial Action Units for efficient FER. Aneja et al. [3] proposed the DeepExpr model for FER on stylized animated characters using CNN and transfer learning. Minaee and Abdolrashidi [39] proposed a deep learning approach based on an attentional convolutional network that is able to focus on the important parts of the face. Jaiswal and Nandi [27] developed a CNN architecture for a real-time emotion detection system.

Very recently, CNNs have been employed for Salient Object Detection (SOD), which determines the most visually distinctive objects in an image. Fan et al. [17] evaluated recent CNN-based SOD models and proposed the high-quality Salient Objects in Clutter (SOC) dataset. Zhang et al. [56] proposed an uncertainty network named UC-Net for RGB-D saliency detection, inspired by human uncertainty in ground truth annotation. Zhao et al. [59] proposed an Edge Guidance Network (EGNet) for SOD, which models two complementary features: salient object information and edge information. Zhang et al. [58] presented a new weakly-supervised SOD method and introduced a new scribble-based saliency dataset, S-DUTS. Deep learning has thus been applied to many computer vision applications. However, it has some limitations: it requires a large number of training samples, it may converge to a local minimum, and it requires far more computation than conventional feature extraction techniques. Hence, a novel approach for FER is proposed that uses LBP with the concept of an adaptive window.

3 LBP and its variants

In this section, the basic theory of the LBP and its variants, namely the LTP, the LGC-HD operator, and the IGLTP, is discussed.

3.1 LBP

The LBP is an efficient texture descriptor that is used in many areas, such as image retrieval, face recognition, and remote sensing [22]. Its main advantages are its low computational complexity and its invariance to illumination [30]. The LBP operator for a window with P neighbouring pixels at radius R is calculated as follows.

$$ LBP_{P}^{R}(x_{c}, y_{c}) = \sum\limits_{p = 0}^{P - 1} s(g_{p} - g_{c})\,2^{p} $$
(1)
$$ s(x) = \left\{ \begin{array}{ll} 1, & \text{if } x \ge 0\\ 0, & \text{if } x < 0 \end{array} \right. $$
(2)

where gc is the intensity value of the centre pixel (xc, yc) and gp denotes the gray values of the P neighbours around the centre pixel at radius R. For instance, the window shown in Fig. 2 has P = 8 and R = 1, and the LBP produces 2^8 = 256 different output values [41]. The LBP operator is applied to each pixel to compute the LBP codes [1]. A 256-bin histogram is formed from the LBP codes and used as the feature vector for FER [49]. The main drawback of LBP is that its code changes if the central pixel value changes due to noise or any other disturbance.
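Equations (1) and (2) can be illustrated with a minimal Python sketch. The function names and the order in which the eight neighbours are visited are assumptions made for illustration, because the source does not fix which neighbour corresponds to p = 0; any fixed order gives a valid code. Border pixels are skipped for simplicity.

```python
def s(x):
    """Threshold function of Eq. (2): 1 if x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


# (row, column) offsets of the eight neighbours g_0 .. g_7 at radius 1.
# Assumed order: clockwise, starting at the top-left neighbour.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """LBP code of Eq. (1) for the pixel at row r, column c (P = 8, R = 1)."""
    gc = image[r][c]
    code = 0
    for p, (dr, dc) in enumerate(NEIGHBOURS):
        code += s(image[r + dr][c + dc] - gc) * (2 ** p)
    return code


def lbp_histogram(image):
    """256-bin histogram of the LBP codes of all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


window = [[6, 5, 2],
          [7, 6, 1],
          [9, 8, 7]]
print(lbp_code(window, 1, 1))  # 241
```

With this neighbour order the example window gives the code 241. Raising the centre value from 6 to 7 changes the code to 240, which illustrates how a small disturbance of the central pixel alters the LBP code.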

Fig. 2 The 3 × 3 window

3.2 LTP

The LTP was proposed by Tan and Triggs [51] to reduce the sensitivity of LBP to noise at the central pixel. The binary code of LBP is extended to a three-valued code that takes the values -1, 0, and 1 instead of 0 and 1. The LTP is calculated by replacing s(x) in (2) by

$$ s(g_{p}, g_{c}, t) = \left\{ \begin{array}{ll} 1, & g_{p} \ge g_{c} + t\\ 0, & \left| g_{p} - g_{c} \right| < t\\ -1, & g_{p} \le g_{c} - t \end{array} \right. $$
(3)

where t is a threshold. Because a three-valued code has a large range, the LTP code is split into two binary codes, one for the positive values and one for the negative values. A histogram is calculated for each code, and the two histograms are combined to obtain the final feature. The LTP partially solves the problem of noise sensitivity in LBP, but the threshold is difficult to select and the feature length is larger than that of the LBP.
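The split of the ternary code of Eq. (3) into two binary codes can be sketched as follows. This is an illustrative sketch and not the implementation of [51]: the function names, the neighbour order, and the names "upper" and "lower" for the positive and negative codes are assumptions.

```python
# Assumed neighbour order: clockwise, starting at the top-left neighbour.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def ltp_ternary(gp, gc, t):
    """Three-valued function of Eq. (3)."""
    if gp >= gc + t:
        return 1
    if gp <= gc - t:
        return -1
    return 0


def ltp_codes(window, t):
    """Return (upper, lower) binary codes for the centre of a 3 x 3 window.

    The upper code sets a bit where the ternary value is +1,
    the lower code sets a bit where the ternary value is -1.
    """
    gc = window[1][1]
    upper = 0
    lower = 0
    for p, (dr, dc) in enumerate(NEIGHBOURS):
        value = ltp_ternary(window[1 + dr][1 + dc], gc, t)
        if value == 1:
            upper += 2 ** p
        elif value == -1:
            lower += 2 ** p
    return upper, lower


window = [[6, 5, 2],
          [7, 6, 1],
          [9, 8, 7]]
print(ltp_codes(window, t=2))  # (96, 12)
```

Each of the two codes is histogrammed separately, so with eight neighbours the combined feature consists of two 256-bin histograms. This is consistent with the remark above that the LTP feature is longer than the LBP feature.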

3.3 LGC-HD operator

The LGC-HD operator was proposed by Tong et al. [53]. It is based on local gradient coding, in which horizontal and diagonal gradients are considered. The operator has low computational complexity, yet it is a powerful discriminative operator for texture changes in the face. For the window shown in Fig. 2, the LGC-HD operator is calculated as follows:

$$ \begin{array}{l} \mathit{LGC\text{-}HD} = s({g_{1}} - {g_{3}}){2^{4}} + s({g_{4}} - {g_{5}}){2^{3}} + s({g_{6}} - {g_{8}}){2^{2}}\\ \quad \quad \quad \quad \quad+ s({g_{1}} - {g_{8}}){2^{1}} + s({g_{3}} - {g_{6}}){2^{0}}{\text{ }} \end{array} $$
(4)

where, s(x) is defined in (2). The LGC-HD operator is applied on each and every pixel of a given image and LGC-HD codes are calculated. The histogram of all LGC-HD codes is used as a feature vector.
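As a worked illustration of Eq. (4), the sketch below evaluates the LGC-HD code for one window. The row-by-row labelling g1 to g8 of the window (with gc at the centre) is an assumption inferred from Eqs. (6) and (7), where g2, g4, g5, and g7 act as the horizontal and vertical neighbours and g1, g3, g6, and g8 as the diagonal ones; the function name is illustrative.

```python
def s(x):
    """Threshold function of Eq. (2)."""
    return 1 if x >= 0 else 0


def lgc_hd(window):
    """LGC-HD code of Eq. (4) for a 3 x 3 window.

    Assumed layout:  g1 g2 g3
                     g4 gc g5
                     g6 g7 g8
    """
    (g1, _g2, g3), (g4, _gc, g5), (g6, _g7, g8) = window
    return (s(g1 - g3) * 2 ** 4      # top row, horizontal gradient
            + s(g4 - g5) * 2 ** 3    # middle row, horizontal gradient
            + s(g6 - g8) * 2 ** 2    # bottom row, horizontal gradient
            + s(g1 - g8) * 2 ** 1    # main diagonal gradient
            + s(g3 - g6) * 2 ** 0)   # anti-diagonal gradient


window = [[6, 5, 2],
          [7, 6, 1],
          [9, 8, 7]]
print(lgc_hd(window))  # 28
```

Since Eq. (4) has five binary terms, the code lies between 0 and 31, so the histogram of LGC-HD codes has 32 bins.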

3.4 IGLTP

The IGLTP was proposed by Holder and Tapamo [21]. This operator is based on the Scharr operator and the LTP. First, the Scharr operator is used to calculate the gradient magnitude of a given image. The Scharr kernels used to compute the gradient for a 3 × 3 window are shown in Fig. 3. They are applied to the image to compute the gradients in the horizontal (Gx) and vertical (Gy) directions. Finally, the gradient magnitude is calculated by using the formula:

$$ {G_{x,y}} = \sqrt {{G_{x}}^{2} + {G_{y}}^{2}} $$
(5)
Fig. 3 Scharr kernels for a 3 × 3 mask: (a) horizontal, (b) vertical

After calculating the gradient magnitude, the LTP is applied on this gradient image to get the feature vector.
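The gradient step of Eq. (5) can be sketched as follows. The kernel coefficients below are the standard Scharr kernels and are assumed to match those of Fig. 3; sign conventions differ between references, but the magnitude is unaffected. The function names are illustrative, and border pixels are left at zero for simplicity.

```python
import math

# Standard 3 x 3 Scharr kernels (assumed to correspond to Fig. 3).
SCHARR_X = [[-3, 0, 3],
            [-10, 0, 10],
            [-3, 0, 3]]
SCHARR_Y = [[-3, -10, -3],
            [0, 0, 0],
            [3, 10, 3]]


def apply_kernel(image, r, c, kernel):
    """Weighted sum of the 3 x 3 neighbourhood centred at (r, c)."""
    return sum(kernel[i][j] * image[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))


def gradient_magnitude(image):
    """Gradient magnitude of Eq. (5) for every interior pixel."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = apply_kernel(image, r, c, SCHARR_X)
            gy = apply_kernel(image, r, c, SCHARR_Y)
            out[r][c] = math.sqrt(gx * gx + gy * gy)
    return out


# A vertical edge: only the horizontal gradient responds.
edge = [[0, 0, 10],
        [0, 0, 10],
        [0, 0, 10]]
print(gradient_magnitude(edge)[1][1])  # 160.0
```

In the IGLTP, the LTP of Section 3.2 would then be applied to the gradient image returned by `gradient_magnitude` rather than to the original intensities.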

4 Proposed method

In this section, a modified LBP for feature extraction is presented that uses an adaptive window approach to improve noise robustness.

4.1 Feature extraction using proposed method

Feature extraction plays a vital role in the performance of an FER system. Two important characteristics should be considered in feature extraction, viz., the size of the feature vector and robustness in the presence of noise [32]. This has motivated a new approach to calculating the LBP that offers both advantages. In this approach, the LBP is calculated for the 4-neighbours and the diagonal neighbours separately. Further, the size of the window is chosen adaptively based on the variance of the pixel intensities within the window. If the window is larger than 3 × 3, the pixel intensities along each radial direction are averaged to reduce the effective window size to 3 × 3. This increases the noise immunity and also provides invariance to illumination variations. Consider the 3 × 3 window shown in Fig. 2. The steps of the proposed method for feature extraction are as follows:

  (i) Apply the LBP to the horizontal and vertical neighbours only, using (6), and calculate the histogram over the entire image. This feature vector is denoted by f1.

    $$ \mathit{LBP\text{-}HV} = s(g_{2} - g_{c})2^{3} + s(g_{5} - g_{c})2^{2} + s(g_{7} - g_{c})2^{1} + s(g_{4} - g_{c})2^{0} $$
    (6)

  (ii) Apply the LBP to the diagonal neighbours only, using (7), and calculate the histogram over the entire image. This feature vector is denoted by f2.

    $$ \mathit{LBP\text{-}D} = s(g_{1} - g_{c})2^{3} + s(g_{3} - g_{c})2^{2} + s(g_{8} - g_{c})2^{1} + s(g_{6} - g_{c})2^{0} $$
    (7)

    where s(x) is defined in (2).

  (iii) Define a new feature vector by concatenating the two feature vectors:

    $$ f = [f_{1}, f_{2}] $$
    (8)

In this approach, each operator considers only four neighbours at a time. Hence, the feature length of each operator is 2^4 = 16, and the concatenated feature has a length of only 32, compared with 256 for the traditional LBP. However, this approach is still sensitive to noise. To improve the noise robustness, the size of the window is changed adaptively, as discussed in the following sections.
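Steps (i) to (iii) can be sketched in Python as follows. The function names are illustrative, and the row-by-row labelling g1 to g8 of the window is an assumption inferred from which neighbours Eqs. (6) and (7) treat as horizontal/vertical and diagonal. Border pixels are skipped for simplicity.

```python
def s(x):
    """Threshold function of Eq. (2)."""
    return 1 if x >= 0 else 0


def lbp_hv(w):
    """Eq. (6): LBP over the horizontal and vertical neighbours."""
    (_g1, g2, _g3), (g4, gc, g5), (_g6, g7, _g8) = w
    return s(g2 - gc) * 8 + s(g5 - gc) * 4 + s(g7 - gc) * 2 + s(g4 - gc)


def lbp_d(w):
    """Eq. (7): LBP over the diagonal neighbours."""
    (g1, _g2, g3), (_g4, gc, _g5), (g6, _g7, g8) = w
    return s(g1 - gc) * 8 + s(g3 - gc) * 4 + s(g8 - gc) * 2 + s(g6 - gc)


def feature_vector(image):
    """Eq. (8): concatenation of the two 16-bin histograms f1 and f2."""
    f1 = [0] * 16
    f2 = [0] * 16
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            w = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            f1[lbp_hv(w)] += 1
            f2[lbp_d(w)] += 1
    return f1 + f2


window = [[6, 5, 2],
          [7, 6, 1],
          [9, 8, 7]]
print(lbp_hv(window), lbp_d(window))  # 3 11
print(len(feature_vector(window)))    # 32
```

Each operator has four binary terms, so each histogram has 16 bins and the concatenated vector has 32 entries, as stated above.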

4.2 Noise robustness

The features extracted by LBP are easily affected if the intensity value of the central pixel changes slightly due to noise. To make the LBP features robust, a new window is defined for a given window size N × N as follows. The original central pixel value of the window is replaced with the mean value of the pixels in the window, because the original central pixel value is approximately equal to this mean. Next, the intensity values of the neighbours of the central pixel are modified by averaging the intensity values along each radial direction [42]. This procedure is illustrated in Fig. 4 for N = 5. However, the size of the window also affects the recognition rate. If the window size is varied according to the variance of the intensity values within the window, the proposed approach improves the performance of the FER system.

Fig. 4 Modified neighborhood for noise robustness
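One possible reading of this radial averaging is sketched below. It is an assumption about the procedure of Fig. 4, which is not reproduced here: for each of the eight directions, the pixels lying on the ray from the centre to the window border are averaged, and the centre is replaced by the mean of the whole N × N window. The function name is illustrative.

```python
def reduce_window(window):
    """Reduce an N x N window (N odd) to an effective 3 x 3 window.

    Centre: mean of all N * N pixels.
    Each of the 8 neighbours: mean of the pixels on the ray that leaves
    the centre in that direction (assumed interpretation of Fig. 4).
    """
    n = len(window)
    half = n // 2
    total = sum(sum(row) for row in window)
    out = [[0.0] * 3 for _ in range(3)]
    out[1][1] = total / (n * n)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            ray = [window[half + k * dr][half + k * dc]
                   for k in range(1, half + 1)]
            out[1 + dr][1 + dc] = sum(ray) / len(ray)
    return out


# 5 x 5 example with pixel values 0 .. 24 in row-major order.
window5 = [[r * 5 + c for c in range(5)] for r in range(5)]
for row in reduce_window(window5):
    print(row)
```

Under this reading, a 5 × 5 window contributes two pixels to each neighbour average, and pixels that do not lie on any of the eight rays influence only the centre value. For a 3 × 3 window the neighbours are unchanged and only the centre is replaced by the window mean.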

4.3 Adaptive window

In this section, the method of selecting the window size is discussed. Modifying the window by averaging along the radial directions provides noise robustness. However, the window size chosen for this averaging influences the feature description, because the characteristics of the feature vector change when the window size is varied. An appropriate window size must therefore be selected before feature extraction. If the variance of the pixel intensities within the window is low, the region is considered smooth; otherwise, it is considered non-smooth. A larger window is used for a smooth region, whereas a smaller window is used for a non-smooth region. This has motivated varying the window size according to the variance of the intensity values within the window. If V1, V2 and V3 are the variances of windows of different sizes, as shown in Fig. 5, the window size is selected by the following steps:

Fig. 5 Adaptive window approach: V1, V2 and V3 are variances

(i) If V1 < V2, the window size is increased because of the smoothness of the region; otherwise, the original window size is retained.

(ii) If V2 < V3, the window size is increased further; otherwise, the same window size is retained.
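The two selection rules can be sketched as follows. The sketch applies the comparisons literally as stated in steps (i) and (ii). It assumes that V1, V2, and V3 are the variances of 3 × 3, 5 × 5, and 7 × 7 windows centred on the same pixel (the sizes evaluated in Section 6), that the population variance is used, and that the pixel lies far enough from the image border; the function names are illustrative.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def window_variance(image, r, c, n):
    """Variance of the n x n window centred at (r, c)."""
    half = n // 2
    pixels = [v
              for row in image[r - half:r + half + 1]
              for v in row[c - half:c + half + 1]]
    return variance(pixels)


def select_window_size(image, r, c, sizes=(3, 5, 7)):
    """Apply steps (i) and (ii) and return the selected window size."""
    v1 = window_variance(image, r, c, sizes[0])
    v2 = window_variance(image, r, c, sizes[1])
    if not v1 < v2:
        return sizes[0]          # step (i): keep the original size
    v3 = window_variance(image, r, c, sizes[2])
    if not v2 < v3:
        return sizes[1]          # step (ii): keep the enlarged size
    return sizes[2]


# 7 x 7 test image whose value depends on the distance from the centre.
rings = [0, 1, 2, 3]
image = [[rings[max(abs(r - 3), abs(c - 3))] for c in range(7)]
         for r in range(7)]
print(select_window_size(image, 3, 3))  # 7
```

The selected window would then be reduced to an effective 3 × 3 window by the radial averaging of Section 4.2 before Eqs. (6) to (8) are applied.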

After the window size is selected by the above steps, the intensity values of the neighbours around the central pixel are modified by the procedure shown in Fig. 4. Once a window of any size N × N has been reduced to a 3 × 3 window, the features are extracted by using (8). A schematic representation of the proposed method for feature extraction is shown in Fig. 6. Features extracted from the whole face image contain only macro-level information [19]. To obtain micro-level information, each face image is divided into sub regions and features are extracted from each sub region [15]. Finally, the histograms of all sub regions are concatenated to form the final feature of the face image [31]. Since the proposed method combines averaging in the radial directions with an adaptive window, its recognition rate is robust on noisy images.

Fig. 6 Schematic representation of the proposed method for feature extraction

The proposed method has three advantages over the traditional LBP. First, the traditional LBP uses eight neighbours, which results in a feature length of 256, whereas the proposed method has a feature length of only 32. For example, consider an image of dimension 128 × 128. If it is divided into 8 × 8 sub regions, the feature extracted by the traditional LBP has a length of 16384 (8 × 8 × 256 = 16384), whereas the proposed approach gives a length of only 2048 (8 × 8 × 32 = 2048). Such a reduction in feature dimension is useful at the classification stage. Second, forming the window for feature extraction by averaging pixels along the radial directions makes the proposed method more robust to noise, whereas the traditional LBP is highly sensitive to noise. Third, the adaptive selection of the window size for averaging in the radial directions further enhances the performance of the FER system.
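The feature lengths quoted above can be checked with a short sketch of the region-wise histogram concatenation. The function name and the way region boundaries are computed are assumptions; the source does not specify how the two 16-bin histograms are ordered within the final vector, so the sketch simply appends one regional feature after the other.

```python
def regional_feature(code_map, grid_rows, grid_cols, bins):
    """Concatenate one histogram per sub region of a 2-D map of codes."""
    rows, cols = len(code_map), len(code_map[0])
    feature = []
    for i in range(grid_rows):
        r0 = i * rows // grid_rows
        r1 = (i + 1) * rows // grid_rows
        for j in range(grid_cols):
            c0 = j * cols // grid_cols
            c1 = (j + 1) * cols // grid_cols
            hist = [0] * bins
            for r in range(r0, r1):
                for c in range(c0, c1):
                    hist[code_map[r][c]] += 1
            feature.extend(hist)
    return feature


# Placeholder code maps for a 128 x 128 image (all codes set to 0).
codes = [[0] * 128 for _ in range(128)]

lbp_len = len(regional_feature(codes, 8, 8, 256))
proposed_len = (len(regional_feature(codes, 8, 8, 16))      # LBP-HV codes
                + len(regional_feature(codes, 8, 8, 16)))   # LBP-D codes
print(lbp_len, proposed_len)  # 16384 2048
```

The ratio between the two lengths is 8, which is the factor by which the input to the classifier shrinks before any further dimensionality reduction.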

5 Experimental setup

In this section, the databases used and the procedure for evaluating the performance of the proposed method are described. The Japanese Female Facial Expression (JAFFE) database [37] and the Cohn-Kanade (CK) database [29] are widely used by researchers in the area of FER. Each database contains seven basic emotions: neutral, anger, disgust, fear, happy, sad, and surprise.

5.1 JAFFE database

This database contains images of 10 Japanese females, each of whom was instructed to display the six basic prototypic expressions starting from the neutral expression. It has a total of 213 images (30 anger, 29 disgust, 32 fear, 31 joy, 31 sadness, 30 surprise, and 30 neutral), each with a dimension of 256 × 256 pixels. Some sample images from the JAFFE database are shown in Fig. 7.

Fig. 7 Sample face expression images from JAFFE database

5.2 CK database

This database contains images of 100 university students aged from 18 to 30 years, of whom 65% were female, 15% were African-American, and 3% were Asian or Latino. The students were instructed to display facial expressions starting from the neutral expression and ending at the target expression, and the images were digitized to 640 × 480 or 640 × 490 pixels. In this work, the first neutral frame and the last three peak frames of each sequence are selected. In total, 1290 images are used (108 anger, 117 disgust, 105 fear, 276 joy, 129 sadness, 225 surprise, and 330 neutral). Some sample images from the CK database are shown in Fig. 8.

Fig. 8 Sample face expression images from CK database

5.3 Testing procedure of the proposed method

The selected images from the databases are cropped to a size of 150 × 110 pixels, as done in [48]. Each image is then divided into 7 × 6 sub regions to capture micro-level information from the face. After feature extraction, Principal Component Analysis (PCA) is applied for dimensionality reduction. The performance of the FER system using the proposed method is evaluated by the recognition rate, which is the number of correct predictions divided by the total number of predictions. A 10-fold cross validation scheme is used to calculate the recognition rate: the dataset is divided into ten groups of roughly equal size, nine groups are used for training, and one group is used for testing. This process is repeated 10 times, so that each group is used once for testing, and the recognition rates are averaged to obtain the final recognition rate. The confusion matrix is also calculated to obtain the recognition rate of each individual facial expression. A confusion matrix describes the performance of a classifier on test data for which the true labels are known. Its rows correspond to the observed classes and its columns to the predicted labels. The diagonal elements count the correctly classified samples, and the off-diagonal elements count the incorrectly classified samples. For classification, the Support Vector Machine (SVM) is used because of its popularity in pattern recognition problems [12]. SVM is a supervised machine learning algorithm that performs binary classification by finding a linear separating hyperplane that maximizes the margin between the classes in a high-dimensional space. Multi-class classification is obtained by using the one-against-rest technique [34]. All experiments are carried out in MATLAB. The Classification Learner app of MATLAB is used for the SVM implementation, and the quadratic kernel is selected.
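The evaluation quantities described above can be sketched in Python as follows. This is an illustration and not the MATLAB workflow used in the experiments: the function names are hypothetical, the random shuffling before splitting is an assumption (the source does not say how the ten groups are formed), and the classifier itself is omitted.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Split sample indices into k groups of roughly equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # assumed: random assignment
    return [indices[i::k] for i in range(k)]


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows: observed classes; columns: predicted labels."""
    position = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[position[t]][position[p]] += 1
    return matrix


def recognition_rate(matrix):
    """Correct predictions divided by all predictions, in percent."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


def per_class_rates(matrix):
    """Recognition rate of each individual class, in percent."""
    return [100.0 * row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


folds = k_fold_indices(213)                # 213 images, as in JAFFE
print([len(f) for f in folds])

truth = ["happy", "happy", "sad", "sad"]
guess = ["happy", "sad", "sad", "sad"]
m = confusion_matrix(truth, guess, ["happy", "sad"])
print(m, recognition_rate(m), per_class_rates(m))
```

In each of the ten rounds, one group in `folds` would serve as the test set and the other nine as the training set, and the ten recognition rates would be averaged.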

6 Results and discussion

In this section, results on the JAFFE and CK databases are discussed for both 6-class and 7-class FER. In the first set of experiments, the proposed method is evaluated for different window sizes N × N and for the adaptive window on the JAFFE and CK databases. The values of N are 3, 5, and 7. Experimental results are shown in Tables 1 and 2 for the JAFFE and CK databases, respectively. As can be seen from Table 1, the adaptive window approach gives the highest recognition rates of 92.9% and 88.3% for 6-class and 7-class recognition on the JAFFE database. Similarly, Table 2 shows that this approach achieves recognition rates of 96% and 93.9% for 6-class and 7-class recognition on the CK database. The recognition rate for the adaptive window is higher because the size of the window is changed according to the variance of the pixel intensities within the window.

Table 1 Recognition rate (%) on JAFFE database
Table 2 Recognition rate (%) on CK database

In the second set of experiments, the confusion matrices for 6-class and 7-class FER on the JAFFE database are calculated to assess the performance of the proposed method on the individual facial expressions. They are given in Tables 3 and 4, respectively. It is observed that happy and surprise are recognised with high recognition rates of 100% and 96.7%, whereas the recognition rate for anger is lower because anger is easily confused with the other facial expressions. When the neutral expression is included in 7-class recognition, the recognition rate is lower than in 6-class recognition because some of the facial expressions are confused with the neutral expression.

Table 3 Confusion matrix of 6-class FER on JAFFE database
Table 4 Confusion matrix of 7-class FER on JAFFE database

The same experimental procedure is performed on the CK database, and the confusion matrices for 6-class and 7-class recognition are given in Tables 5 and 6, respectively. The overall performance on the CK database is higher than on the JAFFE database because the CK database contains a larger number of subjects. Another reason for the lower recognition rate on the JAFFE database is that some facial expressions in this database were labelled incorrectly, and these images affect the recognition rate.

Table 5 Confusion matrix of 6-class FER on CK database
Table 6 Confusion matrix of 7-class FER on CK database

The proposed method is compared with the traditional methods LBP [48], LTP [51], LGC-HD [53], IGLTP [21], HOG [16], and Wavelet + HOG [40], and with a few deep learning methods: Deep CNN [50], CNN [28], DNN [28], Deep CNN [36], DNN [33], and WMDNN [54]. The simulation results on the JAFFE and CK databases are given in Tables 7 and 8, respectively. On the JAFFE database, the proposed method outperforms the existing methods with a recognition rate of 92.9% for 6-class and 88.3% for 7-class recognition. As can be seen from Table 8, the recognition rate of the proposed method on the CK database is 96% for 6-class and 93.9% for 7-class recognition. The deep learning methods DNN [33] and WMDNN [54] achieve slightly higher recognition rates of 96.4% and 97.02% on the CK database, which are comparable with the 96% of the proposed method. On the JAFFE database, however, the proposed method outperforms both DNN [33] and WMDNN [54].

Table 7 Comparison of recognition rate (%) on JAFFE database for various methods
Table 8 Comparison of recognition rate (%) on CK database for various methods

Finally, to verify the robustness of the proposed method against noise, Gaussian noise with zero mean and variances of 0.05, 0.01, 0.015, 0.02, and 0.025 is added to the images of the JAFFE and CK databases. For comparison, the LBP and its variants LTP, LGC-HD, and IGLTP are considered because these methods are highly affected by noise. The simulation results on the JAFFE database are shown in Figs. 9 and 10 for 6-class and 7-class recognition, respectively. IGLTP performs poorly in the presence of noise, although without noise it outperforms the LBP, LTP, and LGC-HD methods. The proposed method gives a higher recognition rate than all these methods. Similarly, the simulation results on the CK database are shown in Figs. 11 and 12 for 6-class and 7-class recognition, respectively. They confirm that the proposed method also gives a higher recognition rate on the CK database than the LBP and its variants. The experimental results validate that calculating the LBP with an adaptive window is effective under both noisy and noise-free conditions.
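The noise model used in this experiment can be sketched as follows. The sketch assumes that the pixel intensities are scaled to the range [0, 1] and that noisy values are clipped to that range, which is a common convention for stating such variances (for example in MATLAB's imnoise function); the source does not state the intensity scale, and the function name is illustrative.

```python
import random


def add_gaussian_noise(image, variance, seed=None):
    """Add zero-mean Gaussian noise of the given variance to an image.

    Assumes intensities in [0, 1]; results are clipped to that range.
    """
    rng = random.Random(seed)
    sigma = variance ** 0.5
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
            for row in image]


# Mid-gray 50 x 50 test image corrupted with variance 0.01.
clean = [[0.5] * 50 for _ in range(50)]
noisy = add_gaussian_noise(clean, 0.01, seed=1)

flat = [v for row in noisy for v in row]
mean = sum(flat) / len(flat)
sample_variance = sum((v - mean) ** 2 for v in flat) / len(flat)
print(round(mean, 3), round(sample_variance, 4))
```

Each feature extraction method would then be run on the noisy images at every noise level, and the recognition rate plotted against the variance as in Figs. 9 to 12.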

Fig. 9 Recognition rate on JAFFE database for 6-class under different levels of Gaussian noise

Fig. 10 Recognition rate on JAFFE database for 7-class under different levels of Gaussian noise

Fig. 11 Recognition rate on CK database for 6-class under different levels of Gaussian noise

Fig. 12 Recognition rate on CK database for 7-class under different levels of Gaussian noise

Further, the proposed method is evaluated on two other publicly available databases: the Facial Expression Research Group (FERG) database [3] and the FEI face database [52]. The FERG database has 55767 annotated face images of six stylized characters: Ray, Malcolm, Jules, Bonnie, Mery, and Aia. Each character shows the seven universal facial expressions (neutral, anger, disgust, fear, happy, sad, and surprise), and the original image size is 768 × 768 pixels. Six sample images from this database are shown in Fig. 13. The FEI face database is a Brazilian face database that contains 14 images for each of 200 persons, with an original size of 640 × 480 pixels. All images were obtained from students and staff between 19 and 40 years old. Each person posed with two basic emotions: neutral and happy. Some sample images from the FEI face database are shown in Fig. 14.

Fig. 13 Sample images from FERG database

Fig. 14 Sample images from FEI face database

In our experiments, 15 images of each character for each of the seven basic emotions are taken from the FERG database. All color images from this database are converted into gray scale images. Similarly, all 200 frontal-pose images from the FEI face database are used for 2-class expression recognition. The experimental procedure is the same as that used for the JAFFE and CK databases. The average recognition rates of the proposed method and some of the previous methods are given in Table 9. The average recognition rates on the FERG database and the FEI face database are 96.7% and 98.9%, respectively, which are higher than those of the existing methods. The best recognition rate is obtained on the FEI face database because it has only two emotions, whereas the other databases have all seven basic emotions. In summary, the proposed method achieves a higher recognition rate on the JAFFE, CK, FERG, and FEI face databases than the existing methods.

Table 9 Performance of proposed method on other databases

7 Conclusion

The performance of any FER system depends strongly on efficient feature extraction. This paper presents a novel approach to feature extraction whose major contribution is an improvement of the LBP. The highlights of the proposed method are that it is a powerful texture descriptor for FER, that it reduces the feature dimension, which in turn benefits classification, and that it is more robust against noise. The features are extracted by applying the LBP separately to the 4-neighbours and the diagonal neighbours, which gives a smaller feature dimension than the traditional LBP. The effect of noise on feature extraction is reduced by using a new neighbourhood obtained by averaging in the radial directions together with the adaptive window concept. The combination of the adaptive window and averaging in the radial directions makes the feature more effective in describing texture. Classification is performed by using a Support Vector Machine. Experimental results on the JAFFE, CK, FERG, and FEI face databases confirm the superiority of the proposed method in recognition rate compared with the existing methods. In future work, deep learning concepts can be explored for classification to improve the recognition rate. Further, the concepts of SOD using deep learning can be applied to FER systems.