Introduction

Expressions in the human face are generated by neurological triggers of the facial muscles, thereby deforming basic facial features and conveying the emotional state of a subject [8]. Facial expression is the most significant form of non-verbal communication, having an inherent natural connection with human emotions. Automatic detection of facial expressions and their association with appropriate emotional levels have posed challenging research problems over the last decade [3, 14, 29].

It has many emerging applications in the field of Human–Computer Interaction (HCI), such as online advertisements, sentiment analysis, driver attentiveness analysis [20], security, automated tutoring systems, etc. [15]. In this context, the pioneering work of Paul Ekman identified six basic expressions (Anger, Disgust, Fear, Happiness, Sadness, and Surprise) as universal [7].

Facial expression recognition systems can be categorized into two groups based on their principal mode of operation: Action Unit (AU) based models and appearance-based models. The Facial Action Coding System (FACS) identifies the movements of facial muscles, termed Action Units (AUs), from the appearance of the face [9]. In contrast to FACS, appearance-based models analyze the geometric and texture features of the whole face. The Active Appearance Model (AAM) is a deformable object matching system that can describe the shape of a face using a set of landmark points [5]. In this work, the AAM is used to generate landmark points on the face, and some salient landmark points are selected to form a triangulation set. The area of the triangle formed by joining the circumcenter, incenter, and centroid of each facial triangle is used as a shape descriptor; these features capture changes in the center of gravity of a triangle. A Multi-Layer Perceptron (MLP) is employed for the classification of expressions.

Our Motivation

Facial expression changes the shape of the face; by identifying and measuring these changes we can determine the expression. We have chosen the incenter, circumcenter, and centroid of a triangle because the area of the triangle connecting them describes the scaleneness and obtuseness of the original triangle well. By taking the ratio of the area of this circumcenter–incenter–centroid (CIC) triangle to the area of the original triangle, we intend to characterize all types of shapes.

Our Contribution

(i) Static image-based facial expression analysis is introduced here, considering face images collected from several benchmark image databases. (ii) A popular feature point tracker, the Active Appearance Model (AAM), is utilized to describe the geometric positions of landmark points associated with important face components. (iii) The Circumcenter–Incenter–Centroid (CIC) triangle area, computed from triangles formed over the landmark points, is used as a unique discriminative descriptor of the shape information available in facial expressions. (iv) Finally, a Multi-Layer Perceptron (MLP) is employed in the system to process those CIC features and distinguish several facial expression levels.

The remaining parts of this article are structured as follows. “Related Works” reviews related works. The methodology is described in “Methodology”, with discussion of the salient landmark selection, triangulation formation, circumcenter–incenter–centroid trio feature extraction, and classification learning processes. Experimental results on several databases are reported in “Experimental Results”. “Discussion” provides a comparative study. The conclusion of our work is drawn in “Conclusions”.

Related Works

The Facial Action Coding System (FACS) underpins the pioneering work of Pantic et al. on expression identification based on muscle movement in an emoted face [9]. FACS is a list of Action Units (AUs) that are uniquely identifiable and may contribute to generating emotion. A FACS-based system independently identifies all AUs, and some AUs are combined to identify a facial expression, but such a system does not describe the appearance of the face. A study by Shan et al. used Local Binary Patterns (LBP) to represent the salient micro-patterns generated in an expressed face, an approach that also works robustly for low-resolution images [28]. This method uses only texture-based features and does not include shape-based parameters. Some shape-based works use the Active Shape Model (ASM) and Active Appearance Model (AAM), deformable object models trained for facial recognition that are used to determine the shape of the face [6, 21]. The pioneering work of Kotsia et al. introduced a model that decomposes the facial image based on Non-negative Matrix Factorization; it also uses geometrical displacement features and Gabor wavelet texture information designed to recognize expressions even in partially occluded face images [13]. A different approach was explored by Barman and Dutta, who used distance and shape signatures generated from the AAM, together with facial triangulation and different statistical measures, and incorporated stability indices to achieve better classification accuracy with an MLP [3]. Ji and Idrissi modeled a spatio-temporal descriptor to describe the transformations of facial expression [11]. Yi et al. used the slopes of AAM-based feature points between low-intensity and high-intensity frames of a sequence, along with texture block movement, for expression classification [31]; this work described video-based facial expression recognition using feature differences between keyframes and a one-dimensional Convolutional Neural Network (CNN) for recognition. Richhariya and Gupta applied the Universum twin support vector machine (UTSVM) to enhance the multi-class classification performance of the Support Vector Machine (SVM) for facial expressions [26]; in such supervised learning, information about the distribution of the data is given by data points not belonging to any of the classes, known as Universum data. In [24] and [2] we used the Incenter–Circumcenter–Centroid (ICC) trio feature and side length features, respectively. Our work in [2] used side length features of facial triangles generated by the AAM; it explores the face both componentwise and holistically, and eleven different side length features are examined, with the best-performing features selected for computation of the final accuracy. In our previous work [24] we used three distance and three slope features among the circumcenter, incenter, and centroid of a facial triangle; in contrast, in this study we use the area of the circumcenter–incenter–centroid triangle, which is found to yield higher accuracy on all four databases.

Methodology

The proposed algorithm has the following steps: I. Image Input, II. AAM Fitting, III. Salient Landmark Selection, IV. Triangulation Formation, V. Circumcenter–Incenter–Centroid Trio Feature Extraction, VI. Classification Learning with MLP, and VII. Prediction. We present the algorithms for our proposed method below. Algorithm 1 takes the expression image set (I), expression labels (E), and a pre-trained AAM (pAAM) as input and produces the facial expression recognition model (CICModel) and model accuracy (\(A_{cc}\)) as output. Algorithm 2 implements the function \(CIC\_Trio\_Feature()\), which is invoked by Algorithm 1.

figure a (Algorithm 1)
figure b (Algorithm 2: the \(CIC\_Trio\_Feature()\) function)

The flow-chart of the computation is shown in Fig. 1.

Fig. 1
figure 1

Flow chart of the proposed method depicting training and testing phase of computation

Figure 1 shows the different stages of computation: image input, AAM fitting, salient landmark selection, triangulation formation, trio feature calculation, classification with MLP, and prediction.

Salient Landmark Selection

The AAM model in [27] generates 68 landmark points, but all landmark points do not convey the same amount of information. We have selected only 23 salient landmark points that are highly sensitive to facial expressions, as shown in Fig. 2. The midpoints and corner points of the lips, eyes, eyebrows, and nose are selected as salient landmark points. Three landmark points are selected on the left eyebrow [1, 2, 3] and three on the right eyebrow [4, 5, 6]. Four points are selected from the left eye [7, 8, 9, 10] and four from the right eye [11, 12, 13, 14]. Three points are selected from the nose [15, 16, 17]. Four points are selected from the upper lip [18, 19, 20, 21] and two from the lower lip [22, 23]. The selection of landmark points is demonstrated in Fig. 2. The landmark points are mathematically represented as a set of Cartesian points L in Eq. 1:

$$\begin{aligned} L = [(x_1,y_1), (x_2,y_2), \ldots , (x_{23}, y_{23})] \end{aligned}$$
(1)
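To make the selection step concrete, the following Python sketch reduces a 68-point landmark array to the 23-point salient set \(L\). The paper does not enumerate which of the 68 AAM/dlib indices correspond to the 23 chosen points, so the index list below is an illustrative assumption matching the description above.

```python
import numpy as np

# Hypothetical mapping from dlib's 68-point layout to the 23 salient
# landmarks described above; the paper does not list the exact indices,
# so this particular selection is an assumption.
SALIENT_IDX = [
    17, 19, 21,       # left eyebrow: corner, midpoint, corner
    22, 24, 26,       # right eyebrow: corner, midpoint, corner
    36, 37, 39, 41,   # left eye: corners plus top/bottom midpoints
    42, 43, 45, 47,   # right eye: corners plus top/bottom midpoints
    31, 33, 35,       # nose: left nostril, base midpoint, right nostril
    48, 50, 52, 54,   # upper lip: corners and two top midpoints
    57, 66,           # lower lip: outer and inner bottom midpoints
]

def select_salient(landmarks68: np.ndarray) -> np.ndarray:
    """Reduce a (68, 2) landmark array to the (23, 2) salient subset L."""
    assert landmarks68.shape == (68, 2)
    return landmarks68[SALIENT_IDX]
```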
Fig. 2
figure 2

Salient landmark points computed by AAM

Triangulation Formation

With these salient landmark points, all possible triangles are formed by taking 3 points at a time. A total of \(\binom{23}{3} = 1771\) triangles can be drawn on the face accordingly. The formation of triangles can be represented as a set of coordinate triplets Tr in a specific sequence, as per Eq. 2:

$$\begin{aligned} Tr= & {} [((x_1,y_1),(x_2,y_2), (x_3,y_3)),...\nonumber \\&,((x_1,y_1),(x_2,y_2), (x_{23},y_{23})),...\nonumber \\&,((x_{21},y_{21}),(x_{22},y_{22}), (x_{23},y_{23}))] \end{aligned}$$
(2)
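A minimal sketch of this enumeration step, generating all \(\binom{23}{3}\) point triplets with itertools.combinations (the helper name is ours):

```python
from itertools import combinations

import numpy as np

def form_triangles(L: np.ndarray) -> np.ndarray:
    """Enumerate all C(23,3) = 1771 coordinate triplets Tr from L (23, 2)."""
    idx = list(combinations(range(len(L)), 3))
    return np.stack([L[list(t)] for t in idx])  # shape (1771, 3, 2)

Tr = form_triangles(np.random.rand(23, 2))  # dummy landmarks for illustration
assert Tr.shape == (1771, 3, 2)
```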

Triangulation on the face is depicted in Figs. 2 and 3.

Fig. 3
figure 3

Triangulation formation on the face

Circumcenter–Incenter–Centroid Trio Feature

The circumcenter is the center of the circumcircle of a triangle, the incenter is the center of its incircle, and the centroid is the point where the three medians of the triangle meet. The mathematical calculations of the circumcenter, incenter, and centroid of a triangle are given below.

If \(P_a = (x_1,y_1)\), \(P_b = (x_2,y_2)\), and \(P_c = (x_3,y_3)\) are the three vertices of a triangle T, then the three side vectors can be calculated as

$$\begin{aligned} {\vec {AB}} = P_b - P_a,\quad \vec {AC} = P_c - P_a,\quad {\text{and}}\quad \vec {BC} = P_c - P_b \end{aligned}$$
(3)

Mathematical Calculation of Incenter

In the case of the incenter, the unit vectors \(\hat{ab}, \hat{ac}\), and \(\hat{bc}\) are calculated as

$$\begin{aligned} \hat{ab} = \frac{\vec {AB}}{\left\Vert \vec{AB}\right\Vert },\quad \hat{ac} = \frac{\vec {AC}}{\left\Vert \vec{AC}\right\Vert },\quad {\text{and}}\quad \hat{bc} = \frac{\vec {BC}}{\left\Vert \vec{BC}\right\Vert } \end{aligned}$$
(4)

The direction vectors of the two angle bisectors are calculated as \(\vec {L_1} = \hat{ab}+\hat{ac}\) (at vertex A) and \(\vec {L_2} = \hat{ba}+\hat{bc}\) (at vertex B), where \(\hat{ba} = -\hat{ab}\). The coefficient matrix is formed as \(A = [\vec {L_1}\ \ {-\vec {L_2}}]\), with right-hand side \(\vec {B} = P_b - P_a\).

By solving the linear system \(Ax = B\) we obtain \(x_1\) and \(x_2\) (Fig. 4). Finally, we get the incenter \(I_c\) from Eq. 5:

$$\begin{aligned} I_c = P_a + x_1 \vec{L_1} \end{aligned}$$
(5)
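The construction above intersects the two interior angle bisectors at vertices A and B. Below is a minimal Python sketch of the same computation (the helper names are ours, not the paper's implementation); the 3-4-5 right triangle serves as a sanity check, since its inradius is 1:

```python
import numpy as np

def intersect(p, d1, q, d2):
    """Intersection of the lines p + x1*d1 and q + x2*d2: solve the 2x2
    system with coefficient matrix [d1 | -d2] and right-hand side q - p."""
    x1, _ = np.linalg.solve(np.column_stack([d1, -d2]), q - p)
    return p + x1 * d1

def incenter(Pa, Pb, Pc):
    """Incenter as the intersection of the angle bisectors at A and B."""
    ab = (Pb - Pa) / np.linalg.norm(Pb - Pa)
    ac = (Pc - Pa) / np.linalg.norm(Pc - Pa)
    bc = (Pc - Pb) / np.linalg.norm(Pc - Pb)
    return intersect(Pa, ab + ac, Pb, -ab + bc)  # ba = -ab

# 3-4-5 right triangle: the inradius is 1, so the incenter is (1, 1).
Pa, Pb, Pc = np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([0.0, 3.0])
print(incenter(Pa, Pb, Pc))  # -> [1. 1.]
```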
Fig. 4
figure 4

Incenter and Incircle of a triangle

Mathematical Calculation of Centroid

For the calculation of the centroid, the vectors \(\vec {L_1} = \frac{P_b+P_c}{2}-P_a\) and \(\vec {L_2} = \frac{P_a+P_c}{2}-P_b\) are the median directions from A and B, and the coefficient matrix is again \(A = [\vec {L_1}\ \ {-\vec {L_2}}]\) with \(\vec {B} = P_b - P_a\).

By solving the linear system \(Ax = B\) we obtain \(x_1\) and \(x_2\) (Fig. 5). Finally, we get the centroid \(C_e\) from Eq. 6:

$$\begin{aligned} C_e = P_a + x_1 \vec{L_1} \end{aligned}$$
(6)
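Continuing the sketch above, the centroid uses the same line-intersection helper with the median directions; the closed form \((P_a + P_b + P_c)/3\) provides a sanity check:

```python
def centroid(Pa, Pb, Pc):
    """Centroid as the intersection of the medians from A and B."""
    L1 = (Pb + Pc) / 2 - Pa  # median direction from A
    L2 = (Pa + Pc) / 2 - Pb  # median direction from B
    return intersect(Pa, L1, Pb, L2)

# Agrees with the closed form for the example triangle above.
assert np.allclose(centroid(Pa, Pb, Pc), (Pa + Pb + Pc) / 3)
```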
Fig. 5
figure 5

Centroid of an example triangle

Mathematical Calculation of Circumcenter

The vectors \(\vec {L_1}\) and \(\vec {L_2}\) for the circumcenter are calculated as \(\vec {L_1} = \hat{ab} \times \vec {N}\) and \(\vec {L_2} = \hat{bc} \times \vec {N}\), where \(\vec {N} = \hat{ac} \times \hat{ab}\); these are the directions of the perpendicular bisectors of sides AB and BC, which pass through the midpoints \(\frac{P_a+P_b}{2}\) and \(\frac{P_b+P_c}{2}\), respectively.

The coefficient matrix is \(A = [\vec {L_1}\ \ {-\vec {L_2}}]\), and \(\vec {B} = \frac{P_c - P_a}{2}\) is the vector between the two side midpoints.

By solving the linear system \(Ax = B\) we obtain \(x_1\) and \(x_2\) (Fig. 6). Finally, we get the circumcenter \(C_c\) from Eq. 7:

$$\begin{aligned} C_c = \frac{P_a + P_b}{2} + x_1 \vec{L_1} \end{aligned}$$
(7)
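The circumcenter follows the same pattern. In 2D, the cross products with \(\vec{N}\) reduce to a 90-degree rotation of the side directions, and the right-hand side of the system is exactly \(\frac{P_c - P_a}{2}\), the vector between the side midpoints, matching Eq. 7's construction. A sketch, reusing the intersection helper above:

```python
def circumcenter(Pa, Pb, Pc):
    """Circumcenter as the intersection of the perpendicular bisectors
    of AB and BC (perpendiculars via a 90-degree rotation in 2D)."""
    perp = lambda v: np.array([-v[1], v[0]])
    m1, m2 = (Pa + Pb) / 2, (Pb + Pc) / 2  # side midpoints
    return intersect(m1, perp(Pb - Pa), m2, perp(Pc - Pb))

# For a right triangle the circumcenter is the hypotenuse midpoint.
print(circumcenter(Pa, Pb, Pc))  # -> [2.  1.5]
```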

These three centers of a triangle are then joined to form a new triangle, as shown in Fig. 7.

Fig. 6
figure 6

Circumcenter of an example triangle

Fig. 7
figure 7

Circumcenter–Incenter–Centroid \(\triangle (C_c,I_c,C_e)\) triangle connecting the landmark points \(\triangle (18,19,14)\)

The normalized area (\(A_1\)) of the triangle \(\triangle (C_c,I_c,C_e)\) connecting the circumcenter \(C_c = (x_c,y_c)\), incenter \(I_c = (x_i,y_i)\), and centroid \(C_e = (x_e,y_e)\) is calculated by Eq. 8, which is Heron's formula divided by the semi-perimeter \(s_1\):

$$\begin{aligned} A_1 = \sqrt{\frac{s_1(s_1-a_1)(s_1-b_1)(s_1-c_1)}{s_1^2}} \end{aligned}$$
(8)

where \(s_1 = \frac{a_1+b_1+c_1}{2}\), \(a_1 = \sqrt{(x_c-x_i)^2+(y_c-y_i)^2}\), \(b_1 = \sqrt{(x_i-x_e)^2+(y_i-y_e)^2}\), and \(c_1 = \sqrt{(x_c-x_e)^2+(y_c-y_e)^2}\).

We also compute the normalized area (\(A_2\)) of the original triangle \(\triangle (P_a,P_b,P_c)\) connecting \(P_a = (x_1,y_1)\), \(P_b = (x_2,y_2)\), and \(P_c = (x_3,y_3)\) using Eq. 9:

$$\begin{aligned} A_2 = \sqrt{\frac{s_2(s_2-a_2)(s_2-b_2)(s_2-c_2)}{s_2^2}} \end{aligned}$$
(9)

where \(s_2 = \frac{a_2+b_2+c_2}{2}\), \(a_2 = \sqrt{(x_1-x_2)^2+(y_1-y_2)^2}\), \(b_2 = \sqrt{(x_2-x_3)^2+(y_2-y_3)^2}\), and \(c_2 = \sqrt{(x_1-x_3)^2+(y_1-y_3)^2}\).
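Putting Eqs. 8 and 9 together, the following sketch computes the per-triangle features, reusing the three center functions above (the guard against a degenerate CIC triangle, where all three centers coincide, is our addition):

```python
def normalized_area(p, q, r):
    """Heron's formula divided by the semi-perimeter s (Eqs. 8 and 9)."""
    a, b, c = (np.linalg.norm(p - q), np.linalg.norm(q - r),
               np.linalg.norm(p - r))
    s = (a + b + c) / 2
    if s == 0.0:  # all three centers coincide (equilateral source triangle)
        return 0.0
    return np.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0)) / s

def cic_trio_feature(Pa, Pb, Pc):
    """(A1, A2, A3) for one facial triangle; assumes Pa, Pb, Pc distinct."""
    A1 = normalized_area(circumcenter(Pa, Pb, Pc),
                         incenter(Pa, Pb, Pc),
                         centroid(Pa, Pb, Pc))
    A2 = normalized_area(Pa, Pb, Pc)
    return A1, A2, A1 / A2
```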

Classification Learning

For learning we use (1) the normalized area of the triangle connecting the circumcenter, incenter, and centroid (\(A_1\)), (2) the normalized area of the originating triangle (\(A_2\)), and (3) the ratio \(A_3 = \frac{A_1}{A_2}\). We train the MLP with these features. The data set is subdivided in a 70%, 15%, and 15% ratio for training, validation, and testing, respectively, in a non-overlapping manner.
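As an illustration of the split, a sketch using scikit-learn (an assumption, since the paper does not name its tooling), under the further assumption that each image contributes the concatenated \(3 \times 1771 = 5313\) per-triangle feature values:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy feature matrix: one row per image, 3 CIC features per triangle.
X = np.random.rand(300, 3 * 1771)
y = np.random.randint(0, 6, size=300)  # six expression classes

X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)             # 70% train
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=0)  # 15%/15%
```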

Experimental Setup

We make use of the AAM by Sagonas et al. [27] to generate the face description by identifying 68 landmark points on the face. The Python-based Dlib implementation of [27], available at “https://github.com/davisking/dlib-models”, is used for the computation of facial landmarks. A Multi-Layer Perceptron (MLP) with 20 hidden neurons, 6 output neurons, and a softmax layer is used for classification learning; among the different configurations tried, the MLP with 20 hidden units achieved the highest accuracy. The 6 output neurons correspond to the six expression classes, and the softmax layer distributes the probability among them. The MLP is trained with the Scaled Conjugate Gradient (SCG) back-propagation algorithm [23].
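Continuing from the split sketch above, a hedged sketch of the classifier configuration using scikit-learn's MLPClassifier as a stand-in: it reproduces the 20 hidden neurons and six-class softmax output described here, but trains with 'adam', since scikit-learn does not provide scaled conjugate gradient.

```python
from sklearn.neural_network import MLPClassifier

# 20 hidden neurons; MLPClassifier applies a softmax over the six output
# classes automatically. 'adam' is a substitute for SCG here.
mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
print("validation accuracy:", mlp.score(X_val, y_val))
print("test accuracy:", mlp.score(X_test, y_test))
```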

Experimental Results

The method discussed in this paper, extracting the Circumcenter–Incenter–Centroid (CIC) trio feature from facial triangulation, is tested on four benchmark databases, viz. the Japanese Female Facial Expression (JAFFE) database [19], the Extended Cohn–Kanade (CK+) database [18], the Multimedia Understanding Group (MUG) database [1], and the MultiMedia Imaging (MMI) database [30]. The experimental results on the CK+, JAFFE, MMI, and MUG databases are discussed in the subsections “Experiments on Extended Cohn–Kanade (CK+) Database” through “Experiments on Multimedia Understanding Group (MUG) Database”.

Experiments on Extended Cohn–Kanade (CK+) Database

The Extended Cohn–Kanade facial expression database from CMU-Pittsburgh contains 327 images with a mix of posed and spontaneous expressions. This database covers the 6 basic expressions plus one extra expression, Contempt (CO). We selected only the 309 peak expression images from the 6 basic expression classes for training. The highest accuracy achieved on this database with the MLP classifier is 99.80%. The confusion matrix of the CK+ data set is shown in Table 1; it demonstrates that the Anger, Disgust, Fear, and Happiness expressions are recognized with \(100\%\) precision in the CK+ database.

Experiments on Japanese Female Facial Expression (JAFFE) Database

This facial expression database contains only posed expressions, with images taken from 10 different Japanese models. It contains a total of 213 images covering the 6 basic expressions and 1 extra neutral expression. After removing the neutral expression images, the remaining 183 images are used for training. On this data set, the MLP reports a highest overall accuracy of 98.62%. The confusion matrix of the JAFFE data set is shown in Table 1; it is evident that Fear and Surprise show \(100\%\) recognition accuracy in the JAFFE database.

Experiments on MMI Database

This database contains 400 images of the 6 basic expressions, taken from 75 subjects of different origins. Not all images in this database are properly annotated, so we manually annotated 222 truly expressed images. Using the MLP for classification, the proposed method achieved 97.95% accuracy. The confusion matrix of the MMI data set is shown in Table 1; it can be observed that the Disgust expression is recognized with 100% accuracy.

Experiments on Multimedia Understanding Group (MUG) Database

This database contains 401 images from 26 models of different origins, having both posed and spontaneous expressions. Our system accomplishes 98.92% overall accuracy on the MUG data set. The confusion matrix of the MUG data set is shown in Table 1; it can be observed that the Happiness expression is recognized with \(100\%\) precision in the MUG database.

Table 1 Confusion Matrix of CK+, JAFFE, MMI and MUG database

Discussion

The confusion matrices corresponding to CK+, JAFFE, MMI, and MUG are presented in Table 1. The MLP classifier performed very well on the CK+ data set with \(99.80\%\) overall accuracy, and the relevant confusion matrix reveals that the Anger, Disgust, Fear, and Happiness expressions are recognized with \(100\%\) precision. On the JAFFE data set the overall accuracy is \(98.62\%\), with Fear and Surprise reporting \(100\%\) recognition accuracy. The proposed method achieved 97.95% overall accuracy on the MMI database, where the Disgust expression is recognized with 100% accuracy. The MUG data set shows \(100\%\) precision for the Happiness expression with an overall \(98.92\%\) accuracy. Good accuracy on all four data sets implies that the system learns facial expressions well without human intervention; the Circumcenter–Incenter–Centroid (CIC) trio feature efficiently captures the expression-related cues, confirming the accurate learning capability of the system.

We have also compared our results with many other existing studies, as reported in Table 2.

To verify the recognition capability of our proposed approach, we compared our experimental results with other existing works [10, 12, 22, 17, 25, 4, 32]. It is seen from Table 2 that the authors in [4] performed the recognition task on the four benchmark databases CK+, JAFFE, MMI, and MUG and obtained overall accuracies of 98.70%, 97.60%, 94.30%, and 98.60%, respectively, while our proposed method offers better results, achieving overall accuracies on CK+ (99.80%), JAFFE (98.62%), MMI (97.95%), and MUG (98.92%). Even though the authors in [10, 17, 32] used different experimental environments for the recognition task on the CK+ and JAFFE databases, our proposed approach achieves higher overall recognition accuracy on CK+ (99.80%) and JAFFE (98.62%) in almost all cases: although the convolutional neural network based work in [17] achieved a slightly higher accuracy of 98.80% on the JAFFE database, our method outperforms it on the CK+ database with 99.80% accuracy. It is also noticed that the authors in [25] achieved the highest accuracy on MMI (95.24%) among the other existing works reported in Table 2, whereas our approach ensures an overall accuracy on MMI of 97.95%, beating the approach reported in [25]. Similarly, our method also outperforms the methods in [12, 22] on both CK+ and MMI.

Table 2 Comparison of the Circumcenter–Incenter–Centroid (CIC) trio feature method with other state of the art works

It can be observed from Table 2 that the proposed method outperforms the existing state-of-the-art works in this domain in almost all cases.

Conclusions

In the present scope, we have introduced a geometric feature based on the circumcenter, incenter, and centroid (CIC) of triangles generated by joining important landmark points on the face image for learning human expressions. This feature has been found to perform impressively for the subsequent classification of the expressions reflected in the concerned face images. Four benchmark face data sets, viz. CK+, JAFFE, MUG, and MMI, are used to adjudicate the applicability of the proposed CIC-induced facial expression recognition technique. Extensive studies followed by adequate analyses validate the effectiveness of the present approach for the said purpose. The technical significance of the proposed technique is further vindicated by its performance superiority over competing state-of-the-art techniques.