1 Introduction

Effective human-robot interaction has recently become a significant concern in the affective computing field. In interaction between people, verbal cues (such as spoken words and voice) and non-verbal cues (such as body gesture and facial expression) convey feeling [22]. Human-robot interaction improves when a robot can accurately interpret the internal meaning of human emotion from facial expression. According to Pantic et al. [25], facial expression contributes more than any other cue to expressing the mental state of humans. Facial expression recognition has several applications in high demand in the human healthcare system. A recognition system can help identify the actual health condition of patients by examining their emotional behavior [30]: the system observes the patient's facial expression and interprets whether it belongs to the positive or the negative class. Positive expressions such as happiness suggest that the patient is in a healthy condition, while negative expressions such as sadness or anger suggest an unhealthy one. The reflection of human emotion in the face is undeniable, even though understanding of the precise association between emotional states of mind and their reflection in the face is still premature; characterizing this association requires cognitive science on one hand and a multimedia framework on the other. Several works on the identification of emotion have been reported in the affective computing field. Among them, the work of Ekman and Friesen [11] established the idea of detecting human emotion from facial muscle movements. They reported a correlation between the deformation of face components and expression, and introduced six universal human emotions, viz. anger, disgust, fear, happiness, sadness, and surprise, by characterizing the similarity of expressions across cultures, ages, and sexes. Owing to the large variability in face appearance, many online applications for face modeling have been developed in recent years. Aging, rejuvenation, and facial expressions are among the face appearances considered in the online system of [8] to understand social aspects of the face, such as biometric information of face appearance. Moreover, the authors in [8] tried to predict special effects of facial appearance under facial expressions, aging, and face rejuvenation with their web-based online system. In our proposed work, we address the issue of understanding the dynamic behavior of human emotion through the proposed recognition system.

Several steps of the traditional emotion recognition pipeline have been utilized by researchers to study the activities associated with emotion. These steps fall into three major processes: facial component detection, feature exploration, and feature identification [34]. In the first step, the entire face region, along with the major face components (eyes, eyebrows, mouth), is detected from the input face images (static images or video clips). In the feature exploration step, appearance-based or geometry-based information is collected from face regions [17]. Appearance-based information includes texture and pixel-intensity variation with respect to facial expression, and these appearance elements are collected either from the whole face or from the areas of the major components [14]. It is described in [7] that facial deformation is affected by different face appearance elements such as changes in skin texture and changes in muscle points in terms of intensity values; the authors in [7] applied a facial rejuvenation process to facial images to describe facial distortion through correction of these automatic changes observed on the face. Geometry-based information, in contrast, captures shape information of the major face regions rather than the entire face [38]. Finally, the explored features are fed to classification modules to identify each emotion.

In this article, we propose fuzzy membership features, namely the fuzzy isosceles triangle signature (FIS), fuzzy right triangle signature (FRS), fuzzy isosceles and right triangle signature (FIRS), fuzzy equilateral triangle signature (FES), and other fuzzy triangles signature (OFS), for a better understanding of facial transitions. Each feature is fed individually into an MLP recognizer to assess how well it differentiates facial transitions. Landmarks on the major face parts prove very useful for capturing the geometric shape divergence on the face plane during an emotional transition. An appearance model identifies landmark points on sequential face images. We utilize only the significant landmarks: eight points on the eyes, six on the eyebrows, three on the nose, and six on the mouth. The fuzzy triangulation technique is applied to these landmarks, constructing a triangle from each combination of three landmarks, and the membership values of five different fuzzy triangles yield the five fuzzy triangle signatures listed above. Feeding these five signatures into the MLP recognizer separately produces five recognition results. We report all results obtained by implementing our system on three benchmark face video datasets, viz. CK+, MMI, and MUG, and compare them with others. Each of the five proposed signatures is observed to outperform competing methods, showing higher recognition results on the three datasets.

Motivation

A face is an amalgamation of muscle articulations: contraction of facial muscles distorts a neutral expression into any of the universal facial expressions. During this contraction, nonrigid movements occur around the eyes, eyebrows, lips, and nose on the face plane [33]. Identifying the emotional class depends on measurements obtained from feature points on those face components. In the literature [4,5,6], authors used static images of peak-intensity facial expression to identify the expressional class. With static images, it is difficult to estimate quantitative information about the motion of the geometric shapes (eyes, nose, lips, and eyebrows) as feature points move from frame to frame during a transition of emotion. To address this problem, we introduce an automatic recognition system that uses a sequence of face frames denoting the transition of emotion from neutral to a universal expression. We utilize the fuzzy triangulation technique to generate geometric triangle shapes from the feature points; these shapes are used to understand the nonrigid motion of different face components during a dynamic expression by producing fuzzy membership signatures.

Contributions of our proposed work include:

  1. Presenting a frame-based facial expression analysis to estimate the flow of behavioral changes during the transition of human emotion available as video clips.

  2. Tracking landmark positions across the face frame sequence with AAM [35] to trace the frame-wise deviation of feature points.

  3. Introducing a fuzzy triangulation technique that builds fuzzy triangle shapes from trio combinations of landmark points to assess the fuzzy relationship among the major portions of the face (eyes, eyebrows, nose, and lips).

  4. Discussing the individual influence of five different fuzzy triangle membership signatures on the recognition of dynamic changes in human emotion: fuzzy isosceles triangle signature (FIS), fuzzy right triangle signature (FRS), fuzzy isosceles and right triangle signature (FIRS), fuzzy equilateral triangle signature (FES), and other fuzzy triangles signature (OFS).

  5. Examining system performance on three benchmark face sequence datasets, CK+ [19], MMI [36], and MUG [2], with a Multilayer Perceptron (MLP) classification task.

The remainder of the paper is structured as follows. Section 2 surveys previous work. Section 3 presents the proposed approach, covering landmark point detection, fuzzy triangle based geometric feature exploration, and feature learning by a multilayer perceptron network. Section 4 discusses the experimental setup and results, with descriptions of the benchmark image sequence datasets. Section 5 compares the outcomes of our proposed approach with other works. Section 6 draws conclusions.

2 Literature survey

Several endeavors in the literature distinguish one facial expression from another. Traditional expression identification systems mainly focus on discovering prominent features. Feature extraction can proceed in two ways: geometry-based and appearance-based. The most challenging task in the geometry-based approach is to locate proper facial landmarks on the face image; more accurate landmark locations ensure better mapping of features to the appropriate facial expression. In [13], landmark identification is done with the combined application of the elastic bunch graph matching (EBGM) algorithm and the Kanade–Lucas–Tomasi (KLT) tracker on face images: the authors first initialized landmarks using EBGM and then tracked the landmark locations. This effort generated distinguishable geometric features from selected points, lines, and triangles, with different recognition results for the different representations. The active shape model (ASM) [10] finds landmark points using a matching algorithm that works with point-distribution information; it constructs a statistical model of deformable shapes to extract landmarks from face regions. However, it is difficult to identify stable landmarks in sequential frames because the positions move. The authors in [1] presented a framework to generate time-varying features from sequential face frames by combining ASM with Lucas–Kanade (LK) optical flow. In [9], the authors observed that AAM is a more useful landmark tracker for obtaining principal landmark locations, which are then used to produce prominent geometric features for recognizing facial expression properly.

In [41], the authors used only two face frames from each sequence to create discriminative features: a neutral face frame and an emotional face frame at maximum expression. They focused on important facial points to generate differential geometric features from the differences between distances of facial points in the neutral and maximum-expression frames. The authors in [27] presented a novel framework for an efficient recognition system distinguishing various facial expressions; they applied geometric features computed with different statistical measurement techniques, capturing information about the deformation of facial features with a hidden Markov model (HMM) recognizer. The authors in [32] compared the recognition capability of combined geometric features against individual features. They tested their system on individual features comprising landmark information and relative distances, as well as on the combination of the two; the combined feature proved more reliable for differentiating facial expressions with an ensemble neural network recognizer. On the other side, statistical measurement of pixel intensities is used to compute appearance information from the face image. The authors in [24] established the local binary pattern (LBP) operator as an effective retriever of texture information from an image. The LBP operator encodes each pixel by examining the 3 × 3 neighborhood of that center pixel: a binary label is assigned to each neighbor after thresholding the 3 × 3 template against the center, and a histogram is generated from the binary-labeled result. Such histograms capture the local distribution of edges, flat areas, spots, etc. The authors in [31] found that the basic LBP operator falls short of describing image features and used an extended version of LBP with a flexible number of neighbors to acquire the foremost features. Apart from LBP, the histogram of oriented gradients [12] and the local Gabor binary pattern [43] are used as appearance features for facial expression detection.
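To make the operator concrete, the sketch below implements the basic 3 × 3 LBP encoding described above; it is an illustrative NumPy implementation, not the code of [24] or [31] (the extended variant used in [31] is available, for instance, as skimage.feature.local_binary_pattern).

```python
# Illustrative basic LBP (3x3 neighborhood): each interior pixel is
# thresholded against its 8 neighbors and encoded as an 8-bit label.
import numpy as np

def basic_lbp(img):
    """img: 2-D grayscale array; returns LBP codes for interior pixels."""
    c = img[1:-1, 1:-1]                               # center pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]      # clockwise neighbors
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy: img.shape[0] - 1 + dy,
                 1 + dx: img.shape[1] - 1 + dx]       # shifted neighbor view
        code |= ((nb >= c).astype(np.uint8) << bit)   # set bit if neighbor >= center
    return code  # a histogram of these codes gives the texture descriptor
```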

In [15], the authors extracted hybrid features from important facial patches that play a major role in changes of facial expression. They used both geometric shape features and appearance features, and reported the novelty of reducing the computational cost of extracting discriminative features. Apart from feature extraction, classification learning is another step mandatory for an emotion recognition system to separate the extracted features into expressional groups. Several classifiers, such as the support vector machine (SVM) [16], hidden Markov models (HMM) [40], K-Nearest Neighbor (KNN) [21], and the Multi-Layer Perceptron Neural Network (MLPNN) [26], are employed in recognition systems to estimate recognition results. The choice of an effective classification module plays a crucial role in the robustness of the system, because different classifiers may yield different recognition accuracies based on their capabilities.

3 Proposed approach

Capturing the shape deformation of the face is very challenging because the borders of the geometric shapes are vague [37]. Our proposed approach deals with this problem by defining triangle shapes together with the fuzzy relationships prevalent among them. The emotional transition recognition process reported in this article is segmented into three important sub-processes: identification of landmark points, computation of the fuzzy membership signatures corresponding to the triangles under consideration, and classification of the nature of the transition. Figure 1 displays the workflow of our system.

Fig. 1 Work flow of our proposed system

3.1 Landmark points detection

Uniform allocation of geometric positions on the face frame plays an important role in the exploration of prominent geometric features. Our proposed system uses a well-known landmark allocation algorithm, the active appearance model (AAM) [35], to track measurable coordinate points on the face plane. The AAM can be stated as a union of two statistical models, combined into a single model by collecting both geometric shape information and color/intensity information from the deformable face object. The model starts from training image samples annotated with initial landmark points. Procrustes analysis is utilized to align the initial landmarks on those training samples; each annotated training sample is denoted by a vector s containing the landmarks as elements, thereby building a shape model. This model is repeatedly trained to better match the initial landmark points with the mean shape \(\bar{s}\); this part of the procedure follows the active shape model (ASM) [10]. On the other hand, eigen-analysis is executed to build a texture model that describes the local pattern of patches within the shape region; after normalization, this texture information is stored in a vector g. Finally, the correlation between the shape model and the texture model is computed by learning the parameters in (1) and (2) to build the appearance model.

$$ s = \bar{s} + Q_{s}C $$
(1)
$$ g = \bar{g} + Q_{g}C $$
(2)

where \(\bar{s}\) and \(\bar{g}\) are the mean shape and mean texture respectively, C is the control parameter, and Qs and Qg describe the shape variation and texture variation respectively. Applying this appearance model to consecutive face frames yields a total of 68 geometric coordinates describing the entire face region in each frame. Among them, only 23 coordinates are found to be very informative: as the expression changes, those coordinates are displaced accordingly [4]. All informative points are taken from the major face regions, viz. eyes (8 points), eyebrows (6 points), nose (3 points), and lips (6 points). Figure 2 illustrates informative landmark detection on a single face frame.

Fig. 2 Informative landmarks detection from a single face frame
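As a concrete illustration of this stage, the sketch below extracts per-frame landmarks with dlib's off-the-shelf 68-point shape predictor standing in for the AAM tracker of [35]. The 23 "informative" indices are hypothetical: the paper does not list the exact indices it retains, only the per-region counts.

```python
# Minimal per-frame landmark extraction sketch (dlib stands in for AAM [35]).
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# External model file; path is an assumption of this sketch.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

# Hypothetical selection: 8 eye, 6 eyebrow, 3 nose, 6 mouth points (23 total).
INFORMATIVE = [36, 37, 39, 41, 42, 43, 45, 47,      # eyes (8)
               17, 19, 21, 22, 24, 26,              # eyebrows (6)
               30, 31, 35,                          # nose (3)
               48, 51, 54, 57, 62, 66]              # mouth (6)

def informative_landmarks(gray_frame):
    """Return a (23, 2) array of (x, y) landmark coordinates for one frame."""
    faces = detector(gray_frame, 1)
    if not faces:
        return None
    shape = predictor(gray_frame, faces[0])
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)])
    return pts[INFORMATIVE]
```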

3.2 Fuzzy triangle based geometric feature exploration

Here, the fuzzy triangulation technique is introduced to capture the changing behavior of human emotion. The technique takes every combination of three landmark points from the set of 23 landmarks, generating C(23, 3) = 1771 triangle shapes for each frame in the sequence. The dependency of displacement among landmarks is very high as a facial expression evolves over time, and the characteristics of this dependency are measured through the components of each triangle. Let \((a_{l},b_{l},c_{l},A_{l},B_{l},C_{l})_{i,j,k}^{m}\) represent the three sides and three angles of the triangle formed by the vertex landmarks i, j, k in the lth frame of the mth sequence. The first three components are calculated with the Euclidean distance metric in (3), (4), and (5), and the last three with (6), (7), and (8).

$$ a_{l} = \sqrt{(y_{k} - y_{j})^{2} + (x_{k} - x_{j})^{2}} $$
(3)
$$ b_{l} = \sqrt{(y_{k} - y_{i})^{2} + (x_{k} - x_{i})^{2}} $$
(4)
$$ c_{l} = \sqrt{(y_{j} - y_{i})^{2} + (x_{j} - x_{i})^{2}} $$
(5)
$$ A_{l} = \cos^{-1}(\frac{{b_{l}^{2}} + {c_{l}^{2}} - {a_{l}^{2}}}{2\times b_{l} \times c_{l}}) $$
(6)
$$ B_{l} = \cos^{-1}(\frac{{a_{l}^{2}} + {c_{l}^{2}} - {b_{l}^{2}}}{2\times a_{l} \times c_{l}}) $$
(7)
$$ C_{l} = \cos^{-1}(\frac{{a_{l}^{2}} + {b_{l}^{2}} - {c_{l}^{2}}}{2\times a_{l} \times b_{l}}) $$
(8)

The formation of a triangle and the computation of its angles are demonstrated in Fig. 3.

Fig. 3 Pictorial representation of the computation of angle components from a single triangle of a particular frame
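The sketch below transcribes (3)–(8) directly into Python: given the three landmark coordinates, it returns the side lengths and the angles (in degrees).

```python
# Side lengths and angles of the triangle formed by landmarks i, j, k,
# following equations (3)-(8).
import numpy as np

def triangle_components(pi, pj, pk):
    """pi, pj, pk: (x, y) landmark coordinates. Returns (a, b, c, A, B, C)
    with angles in degrees, where side a is opposite vertex i, etc."""
    a = np.hypot(pk[0] - pj[0], pk[1] - pj[1])   # eq. (3)
    b = np.hypot(pk[0] - pi[0], pk[1] - pi[1])   # eq. (4)
    c = np.hypot(pj[0] - pi[0], pj[1] - pi[1])   # eq. (5)
    A = np.degrees(np.arccos((b**2 + c**2 - a**2) / (2 * b * c)))  # eq. (6)
    B = np.degrees(np.arccos((a**2 + c**2 - b**2) / (2 * a * c)))  # eq. (7)
    C = np.degrees(np.arccos((a**2 + b**2 - c**2) / (2 * a * b)))  # eq. (8)
    return a, b, c, A, B, C
```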

Now, we consider the universe of discourse U whose members are the three angles Al, Bl, Cl, formulated as per (9).

$$ U = \{(A_{l},B_{l},C_{l}) \mid A_{l} \geq B_{l} \geq C_{l} \geq 0;\ A_{l}+B_{l}+C_{l} = 180^{\circ}\} $$
(9)

Next, we fuzzify these angle components into different fuzzy triangle families, namely the fuzzy isosceles triangle (I), fuzzy right triangle (R), fuzzy isosceles and right triangle (IR), fuzzy equilateral triangle (E), and other fuzzy triangle (T), using the fuzzy membership rules in (10), (11), (12), (13), and (14) respectively.

$$ \{\mu_{I}(A_{l},B_{l},C_{l})\}_{i,j,k}^{m} = 1 - \frac{1}{60}\min\{(A_{l} - B_{l} , B_{l} - C_{l})\}_{i,j,k}^{m} $$
(10)
$$ \{\mu_{R}(A_{l},B_{l},C_{l})\}_{i,j,k}^{m} = 1 - \frac{1}{90}\mid A_{l} - 90\mid_{i,j,k}^{m} $$
(11)
$$ \{\mu_{IR}(A_{l},B_{l},C_{l})\}_{i,j,k}^{m} = \min\{\mu_{I}(A_{l},B_{l},C_{l}), \mu_{R}(A_{l},B_{l},C_{l})\}_{i,j,k}^{m} $$
(12)
$$ \{\mu_{E}(A_{l},B_{l},C_{l})\}_{i,j,k}^{m} = 1 - \frac{1}{180}(A_{l} - C_{l})_{i,j,k}^{m} $$
(13)
$$ \{\mu_{T}(A_{l},B_{l},C_{l})\}_{i,j,k}^{m} = 1- \max\{\mu_{I}(A_{l},B_{l},C_{l}), \mu_{R}(A_{l},B_{l},C_{l}), \mu_{E}(A_{l},B_{l},C_{l})\}_{i,j,k}^{m} $$
(14)
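The membership rules (10)–(14) translate into a few lines of Python; the only added step is sorting the angles so that A ≥ B ≥ C, matching the universe of discourse in (9).

```python
# Direct transcription of membership rules (10)-(14); angles in degrees.
def fuzzy_memberships(A, B, C):
    """Returns (mu_I, mu_R, mu_IR, mu_E, mu_T) for one triangle."""
    A, B, C = sorted((A, B, C), reverse=True)       # enforce A >= B >= C, eq. (9)
    mu_I = 1 - min(A - B, B - C) / 60.0             # eq. (10) isosceles
    mu_R = 1 - abs(A - 90) / 90.0                   # eq. (11) right
    mu_IR = min(mu_I, mu_R)                         # eq. (12) isosceles and right
    mu_E = 1 - (A - C) / 180.0                      # eq. (13) equilateral
    mu_T = 1 - max(mu_I, mu_R, mu_E)                # eq. (14) other triangles
    return mu_I, mu_R, mu_IR, mu_E, mu_T
```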

Finally, five fuzzy triangle membership signatures are formalized to characterize a particular face video sequence in five different ways. They are defined by (15), (16), (17), (18), and (19) respectively.

Face sequence representation by fuzzy isosceles triangle signature

$$ \begin{array}{@{}rcl@{}} &&(FIS)_{i,j,k}^{m} = [\{\mu_{I}(A_{0},B_{0},C_{0})\}_{i,j,k}^{m}, \{\mu_{I}(A_{1},B_{1},C_{1})\}_{i,j,k}^{m},\\ &&............ , \{\mu_{I}(A_{n},B_{n},C_{n})\}_{i,j,k}^{m}] \end{array} $$
(15)

Face sequence representation by fuzzy right triangle signature

$$ \begin{array}{@{}rcl@{}} &&(FRS)_{i,j,k}^{m} = [\{\mu_{R}(A_{0},B_{0},C_{0})\}_{i,j,k}^{m}, \{\mu_{R}(A_{1},B_{1},C_{1})\}_{i,j,k}^{m},\\ &&............ , \{\mu_{R}(A_{n},B_{n},C_{n})\}_{i,j,k}^{m}] \end{array} $$
(16)

Face sequence representation by fuzzy isosceles and right triangle signature

$$ \begin{array}{@{}rcl@{}} &&(FIRS)_{i,j,k}^{m} = [\{\mu_{IR}(A_{0},B_{0},C_{0})\}_{i,j,k}^{m}, \{\mu_{IR}(A_{1},B_{1},C_{1})\}_{i,j,k}^{m},\\ &&............, \{\mu_{IR}(A_{n},B_{n},C_{n})\}_{i,j,k}^{m}] \end{array} $$
(17)

Face sequence representation by fuzzy equilateral triangle signature

$$ \begin{array}{@{}rcl@{}} &&(FES)_{i,j,k}^{m} = [\{\mu_{E}(A_{0},B_{0},C_{0})\}_{i,j,k}^{m}, \{\mu_{E}(A_{1},B_{1},C_{1})\}_{i,j,k}^{m},\\ &&............, \{\mu_{E}(A_{n},B_{n},C_{n})\}_{i,j,k}^{m}] \end{array} $$
(18)

Face sequence representation by Other Fuzzy Triangles Signature

$$ \begin{array}{@{}rcl@{}} &&(OFS)_{i,j,k}^{m} = [\{\mu_{T}(A_{0},B_{0},C_{0})\}_{i,j,k}^{m}, \{\mu_{T}(A_{1},B_{1},C_{1})\}_{i,j,k}^{m},\\ &&............ , \{\mu_{T}(A_{n},B_{n},C_{n})\}_{i,j,k}^{m}] \end{array} $$
(19)

Each sequence has n frames; n = 10 is used in the present context, so the feature vector size is 1771 × 10 = 17710. The computation of the five fuzzy triangle memberships is pictorially explained in Fig. 4.

Fig. 4 Five different fuzzy triangle membership values are computed from a single frame
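Putting the pieces together, the sketch below builds all five signatures for one sequence, reusing the triangle_components and fuzzy_memberships helpers from the earlier sketches; flattening any one signature over 1771 triangles and n = 10 frames yields the 17710-dimensional feature vector.

```python
# Signatures (15)-(19) for one sequence of landmark frames.
from itertools import combinations
import numpy as np

def sequence_signatures(frames):
    """frames: list of (23, 2) landmark arrays, one per frame.
    Returns dict of arrays shaped (1771, n_frames): FIS, FRS, FIRS, FES, OFS."""
    names = ("FIS", "FRS", "FIRS", "FES", "OFS")
    trios = list(combinations(range(23), 3))         # 1771 triangles per frame
    sig = {nm: np.zeros((len(trios), len(frames))) for nm in names}
    for l, pts in enumerate(frames):
        for t, (i, j, k) in enumerate(trios):
            _, _, _, A, B, C = triangle_components(pts[i], pts[j], pts[k])
            for nm, mu in zip(names, fuzzy_memberships(A, B, C)):
                sig[nm][t, l] = mu
    return sig  # e.g. sig["FIS"].ravel() is the 17710-dim FIS feature vector
```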

The combination formula in (20) calculates the number of triangles used in our system to obtain prominent features. By using twenty-three landmark points instead of sixty-eight, we compute feature components only for the triangles found to be prominent, which reduces the amount of computation by discarding all other triangles. For a better understanding of the computational overhead, Case 1 and Case 2 are described in (21) and (22) respectively and compared.

$$ \begin{array}{@{}rcl@{}} T &=& C(l,p) \\ &=& \frac{l!}{(p!(l-p)!)} \end{array} $$
(20)

T is the number of triangles in a single frame, l is the number of landmarks available in a single frame, and p is the number of landmarks used to make a single triangle. Case 1: l = 68, p = 3, then

$$ \begin{array}{@{}rcl@{}} T_{case1} &=& C(68,3) \\ & = & \frac{68!}{(3!(68-3)!)} \\ &=& 50116 \end{array} $$
(21)

Case 2: l = 23, p = 3, then

$$ \begin{array}{@{}rcl@{}} T_{case2} &=& C(23,3) \\ & = & \frac{23!}{(3!(23-3)!)} \\ &=& 1771 \end{array} $$
(22)

From (21) and (22), Tcase2 < Tcase1, so Case 2 incurs far lower computational complexity.
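The two counts can be verified directly with the standard library:

```python
# Quick check of (21) and (22).
import math
print(math.comb(68, 3))   # 50116 triangles per frame with all 68 landmarks
print(math.comb(23, 3))   # 1771 triangles with the 23 informative landmarks
```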

3.3 Feature learning by multilayer perceptron network module

Our extracted geometric features are finally fed to a Multi-Layer Perceptron (MLP) classifier [3] to be recognized into different emotional classes based on their transitional behaviors. The MLP uses a network architecture modeled as a set of nodes in three layers (input, hidden, and output) and the connections among them. Nodes of the input layer take the features as input and process them with connection weights and bias values so as to pass the signal to the next layer, the hidden layer. Similarly, the signal from the hidden layer is transmitted to the output layer to estimate the classification outcome. Each node applies the activation function \(\tanh\) to forward the signals. The important task at this point is to check and control the error between the estimated outcome and the target outcome, so a learning rule is required during network training to adjust for the error. Here, the backpropagation learning rule is applied to modify the weight values of the network connections. The learning rule uses the scaled conjugate gradient algorithm to reduce the error by setting the network parameters and calibrating the learning rate at a suitable level; a lower learning rate tends to yield more accurate, if slower, convergence. The network stops learning when it reaches a minimal error. The error is calculated using (23).

$$ \epsilon = \frac{1}{2}{\sum}_{k}(\tau_{k} - \gamma_{k})^{2} $$
(23)

Here, τk is the target outcome and γk is the estimated outcome at the kth output node. Equation (24) shows how partial differentiation of this error function is used to update the weight values and minimize the error.

$$ \delta\omega_{jk} = -\alpha \frac{\partial \epsilon}{\partial \omega_{jk}} $$
(24)

Here, α is the learning rate and δωjk is the update applied to the connection weight ωjk between nodes j and k. Our proposed network consists of 17710 input nodes (equal to our feature vector size), 10 hidden nodes (the number at which our system yields good results), and 6 output nodes (for the six basic emotions). The feature learning process is described in Algorithm 1.

Algorithm 1 Feature learning by the multilayer perceptron network module
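A minimal stand-in for this network, assuming scikit-learn: since scikit-learn does not ship a scaled conjugate gradient solver, 'adam' is used here instead, so this sketch approximates rather than reproduces the training setup described above.

```python
# 17710 inputs, 10 tanh hidden units, 6 outputs; solver differs from the
# paper's scaled conjugate gradient (not available in scikit-learn).
from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(hidden_layer_sizes=(10,), activation="tanh",
                    solver="adam", learning_rate_init=1e-3, max_iter=500)
# X: (n_sequences, 17710) signature vectors, y: emotion labels 0..5
# clf.fit(X_train, y_train); y_pred = clf.predict(X_test)
```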

4 Experimentation and result discussion

To assess the performance consistency of our proposed recognition system, we run experiments for the five proposed geometric signatures separately, using the well-known facial expression video datasets CK+ [19], MMI [36], and MUG [2]. The performance of each signature is evaluated on the three datasets separately. In these experiments, each dataset is divided into three non-overlapping segments: training data (70%), validation data (15%), and test data (15%). The training data is used to initially learn the MLP network, fitting parameters such as the learning rate and edge weights. The network rarely gives unbiased results from the training set alone, as it may overfit through improper learning of the network parameters; the validation dataset is therefore used to tune the parameters properly, helping to detect the onset and end of overfitting. With such supervised learning, a best-fitted network model is obtained for our experimentation. Finally, the test dataset, reserved from the main dataset and comprising samples unused in training and validation, provides recognition results from the best-fitted model as an unbiased outcome. The results are evaluated by computing confusion matrices on every dataset. For a detailed study of the influence of the different signatures over these datasets, the measurement parameters False Acceptance Rate (FAR), False Rejection Rate (FRR), and Error Rate (ERR) are also incorporated, as defined in (25), (26), and (27).

$$ FAR = \frac{FP}{FP+TN} $$
(25)
$$ FRR = \frac{FN}{FN+TP} $$
(26)
$$ ERR = \frac{FP + FN} {TP + TN +FP + FN} $$
(27)

Here, FP denotes False Positive, FN False Negative, TN True Negative, and TP True Positive.

  • FAR: the proportion of image sequences that the classifier incorrectly accepts into an expressional class to which they do not belong.

  • FRR: the proportion of image sequences that the classifier incorrectly rejects from the expressional class to which they actually belong.

  • ERR: the ratio of all image sequences incorrectly accepted or rejected by the classifier to the total number of image sequences in the dataset.
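Equations (25)–(27) can be computed per expression class in a one-vs-rest fashion directly from a confusion matrix, as in the sketch below.

```python
# Per-class FAR, FRR, and ERR from a confusion matrix, following (25)-(27).
import numpy as np

def far_frr_err(cm):
    """cm: (n_classes, n_classes) confusion matrix, rows = true labels.
    Returns three arrays of per-class rates."""
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # true class missed
    fp = cm.sum(axis=0) - tp          # other classes accepted as this one
    tn = total - tp - fn - fp
    far = fp / (fp + tn)              # eq. (25)
    frr = fn / (fn + tp)              # eq. (26)
    err = (fp + fn) / total          # eq. (27)
    return far, frr, err
```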

Further, the effectiveness of all proposed signatures is corroborated by k-fold cross-validation. Dataset descriptions and performance assessments follow in the subsections below.

4.1 Discussion on extended Cohn-Kanade (CK+) dataset

The dataset stores facial expression profiles recorded from 210 people, most of them female (69%); 81% of participants are Euro-American, 13% Afro-American, and 6% from other groups. Ages vary from 18 to 50 years. A total of 593 face sequences taken from 123 subjects are available in this dataset. Each sequence includes several image frames (from 6 to 60), captured and digitized into pixel arrays of 640 × 490 or 640 × 480. Only 327 sequences are labeled with the 7 expressions anger (AN), contempt (CON), disgust (DI), fear (FE), happiness (HA), sadness (SA), and surprise (SU); these 327 labeled sequences are used in our experiments. As a representative example, the first row of Fig. 5 shows the happiness profile, displaying the transition from a neutral expression to happiness.

Fig. 5 Image sequences from the CK+, MMI and MUG datasets

4.1.1 Result analysis on CK+ Dataset

In this section, the performance of each of the five fuzzy signatures on the CK+ dataset is described individually. Each signature is computed from this dataset to detect the transitional behavior of the basic facial expressions. The numbers of sequences used per expression are 45 (AN), 18 (CON), 59 (DI), 25 (FE), 69 (HA), 28 (SA), and 83 (SU). Table 1 displays the confusion matrices for the five signatures on the CK+ dataset with the MLP classifier; the corresponding analysis graphs are displayed in Fig. 6.

Table 1 Confusion Matrices for different signatures on CK+ Dataset

Fuzzy Isosceles Triangle Signature (FIS)

From Table 1, it is observed that the FIS signature recognizes contempt, fear, happiness, and surprise with 100% accuracy. Of the 45 anger expressions, 43 are classified accurately and the remaining 2 are misclassified as disgust. The signature classifies 55 disgust expressions and 26 sadness expressions correctly; among the disgust expressions, 3 are misclassified as anger and 1 as sadness, while of the 28 sadness expressions, 1 is incorrectly identified as anger and 1 as fear. Overall, 97.55% accuracy is obtained on the CK+ dataset.

Fuzzy Right Triangle Signature (FRS)

The FRS signature recognizes the transitions of anger, contempt, disgust, happiness, and surprise without any error. For sadness, 26 expressions are identified correctly and 2 are wrongly identified as anger. The lowest accuracy occurs for fear: 21 expressions are classified appropriately, while 3 are categorized as anger and 1 as happiness. The overall recognition rate on the CK+ dataset with this signature is 98.16%.

Fuzzy Isosceles and Right Triangle Signature (FIRS)

The FIRS signature identifies contempt, disgust, happiness, and surprise perfectly, with no misclassification. 44 anger transitions are detected properly and 1 is misclassified as sadness. For fear, 22 transitions are classified correctly, with 3 misclassifications distributed equally among contempt (1), sadness (1), and surprise (1). Among the 28 sadness transitions, only 1 is incorrectly identified as surprise and the remaining 27 are classified properly. The signature achieves an overall accuracy of 98.47% on the CK+ dataset.

Fuzzy Equilateral Triangle Signature (FES)

The transitions of anger, happiness, and surprise are recognized by this signature without any misclassification, and 25 sadness, 24 fear, 58 disgust, and 16 contempt transitions are recognized correctly. 1 disgust transition is incorrectly recognized as sadness; 2 sadness transitions and 1 fear transition are incorrectly identified as anger; 1 sadness transition and 1 contempt transition are misclassified as surprise; and 1 contempt transition is confused with happiness. The overall recognition rate of this signature is 97.85%.

Other Fuzzy Triangles Signature (OFS)

With the OFS signature, all fear, happiness, and sadness transitions are recognized in their exact classes. 14 contempt, 43 anger, 57 disgust, and 82 surprise transitions are recognized properly. 4 contempt transitions and 2 disgust transitions are misinterpreted as anger; 2 anger transitions are confused with disgust and happiness; and 1 surprise transition is misclassified as contempt. The signature reports 97.24% overall accuracy on the CK+ dataset.

Fig. 6 Visualization graphs of the CK+ confusion matrices

The other performance evaluation parameters FAR, FRR, and ERR, computed on the CK+ dataset for all proposed signatures, figure in Table 2, and Fig. 7 shows the corresponding graphs. From this discussion, it is observed that happiness transitions are recognized with 100% accuracy by all five signatures, and surprise transitions by four of them, which shows that the signatures capture the common pattern of changing behavior in these emotions. The fear transition, on the other hand, is more difficult to detect perfectly: only two signatures (the fuzzy isosceles triangle signature and the other fuzzy triangles signature) capture enough information about triangle shape deformation in fear to reach 100% accuracy. The fuzzy right triangle signature and the fuzzy isosceles and right triangle signature outperform the other three signatures, each achieving more than 98% overall accuracy on the CK+ dataset.

Table 2 FAR, FRR and ERR on CK+ dataset
Fig. 7 FAR, FRR, ERR on CK+ dataset

4.2 Discussion on M&M initiative (MMI) dataset

MMI has 236 expression profiles captured as video clips from people aged 19 to 62 years. Most participants are female, from various backgrounds (European, Asian, or South American). Each expression profile contains a sequence of both frontal and side-view face images, which makes proper landmark mapping very challenging on this dataset. We collected frontal-face transitions in which the facial expression evolves from neutral to peak expression and back to neutral. Only 202 profiles of the six expressions (anger, disgust, fear, happiness, sadness, and surprise) are labeled correctly. The transition profile of happiness is shown in the second row of Fig. 5.

4.2.1 Result analysis of MMI dataset

We demonstrate the recognition ability of the five signatures on the MMI dataset individually. The dataset provides the following numbers of sequences for the six expressions: 31 (AN), 32 (DI), 28 (FE), 42 (HA), 28 (SA), and 41 (SU). The confusion matrices induced by the different signatures on the MMI dataset are presented in Table 3, and the corresponding analysis graphs are displayed in Fig. 8.

Table 3 Confusion Matrices for different signatures on MMI Dataset

Fuzzy Isosceles Triangle Signature (FIS)

This signature identifies the transitions of surprise without misclassification. 28 anger transitions are identified correctly, while 2 are confused with disgust and 1 with fear. 31 disgust transitions are recognized properly, with only 1 misclassified as anger. For fear, 21 transitions are classified correctly, while 4 are misclassified as surprise and 3 as anger. 27 sadness transitions and 41 happiness transitions are classified correctly, with 1 transition of each misinterpreted as anger. The signature achieves 93.56% overall accuracy on the MMI dataset.

Fig. 8 Visualization graphs of the MMI confusion matrices

Fuzzy Right Triangle Signature (FRS)

All disgust transitions are recognized properly without any error. 30 anger transitions are identified, with 1 confused with sadness. 21 fear transitions are correctly classified, while 5 are misclassified as anger and 2 as surprise. 41 happiness, 27 sadness, and 37 surprise transitions are correctly recognized; 1 happiness transition and 1 sadness transition are confused with fear and anger respectively, while of the surprise transitions, 2 are misclassified as fear, 1 as anger, and 1 as disgust. An overall accuracy of 93.06% is obtained on the MMI dataset with the fuzzy right triangle signature.

Fuzzy Isosceles and Right Triangle Signature (FIRS)

The signature correctly classifies 30 anger, 30 disgust, 24 fear, 41 happiness, 25 sadness, and 38 surprise transitions. 1 fear, 1 happiness, and 2 sadness transitions are confused with anger; 1 anger, 1 disgust, and 3 surprise transitions are misclassified as fear; 1 disgust transition is incorrectly identified as happiness; and 3 fear transitions and 1 sadness transition are misinterpreted as surprise. The signature achieves a 93.06% overall recognition rate on the MMI dataset.

Fuzzy Equilateral Triangle Signature (FES)

Transitions of disgust and happiness are identified by this signature without any error. The signature recognizes the following numbers of transitions correctly: AN (30), DI (32), FE (21), HA (42), SA (25), and SU (39). 4 fear, 2 sadness, and 1 surprise transitions are confused with anger; 1 anger transition is misclassified as disgust; 1 sadness and 1 surprise transition are wrongly identified as fear; and 1 fear transition is misinterpreted as sadness while 2 fear transitions are misinterpreted as surprise. This signature reports a 93.56% overall recognition rate.

Other Fuzzy Triangles Signature (OFS)

The signature recognizes the following numbers of transitions: AN (29), DI (31), FE (26), HA (40), SA (23), and SU (38). 1 disgust, 1 happiness, 1 sadness, and 2 surprise transitions are mismatched with anger; 1 anger and 2 sadness transitions are confused with disgust; 2 sadness transitions are misclassified as fear; 1 fear and 1 surprise transition are misclassified as happiness; 1 anger and 1 happiness transition are incorrectly identified as sadness; and 1 fear transition is wrongly interpreted as surprise. This fuzzy signature reports 92.57% overall recognition accuracy on the MMI dataset.

FAR, FRR, and ERR for all fuzzy signatures are given in Table 4, and Fig. 9 shows the corresponding graphs for the MMI dataset. This dataset proves quite challenging for recognizing all basic emotions. Of the five fuzzy signatures, the fuzzy isosceles triangle signature (FIS) and the fuzzy equilateral triangle signature (FES) perform best on the MMI dataset, each attaining 93.56% overall accuracy.

Table 4 FAR, FRR and ERR on MMI dataset
Fig. 9 FAR, FRR, ERR on MMI dataset

4.3 Discussion on multimedia understanding group (MUG) dataset

The dataset includes expression profiles of the face recorded from 86 people, 51 male and 35 female, all of Caucasian origin and aged 20 to 35 years. Each profile is captured at a frame rate of 19 fps, and each frame is stored in JPG format with a size between 240 KB and 340 KB. A total of 801 expression profiles cover six emotions: anger, disgust, fear, happiness, sadness, and surprise. The third row of Fig. 5 shows the happiness profile from the MUG dataset, in which the expression evolves from a neutral face to the universal expression of happiness.

4.3.1 Result analysis on MUG dataset

We also use the MUG dataset to measure the recognition capability of each of the five proposed fuzzy signatures. Each signature uses the following numbers of MUG sequences: AN (149), DI (117), FE (150), HA (107), SA (133), and SU (145). The confusion matrices computed on this dataset for every signature are displayed together in Table 5, and the corresponding analysis graphs in Fig. 10.

Table 5 Confusion Matrices for different signatures on MUG Dataset

Fuzzy Isosceles Triangle Signature (FIS)

From Table 5, the signature identifies the transitions of disgust and fear with 100% accuracy. 145 anger, 106 happiness, 131 sadness, and 142 surprise transitions are identified correctly. 3 anger, 1 sadness, and 1 surprise transitions are confused with happiness, while 1 sadness, 1 happiness, 1 surprise, and 1 anger transition are misclassified as anger, disgust, disgust, and surprise respectively. The overall recognition rate of this signature on the MUG dataset is 98.87%.

Fuzzy Right Triangle Signature (FRS)

This signature achieves 100% accuracy on the MUG dataset for disgust, fear, sadness, and surprise. 146 anger transitions are identified, with only 3 wrongly identified as happiness; on the other hand, 104 happiness transitions are correctly identified while 3 are wrongly classified as anger. This signature reports a 99.25% overall recognition rate on the MUG dataset.

Fuzzy Isosceles and Right Triangle Signature (FIRS)

Here, anger and surprise attain a 100% recognition rate. Of the 117 disgust transitions, only 1 is confused with surprise while the others are classified perfectly. 1 fear transition is misclassified as anger and 1 as surprise, while the other 148 are identified perfectly. Of the 107 happiness transitions, 3 are mismatched with anger and the rest are classified correctly. 132 sadness transitions are recognized accurately, with only 1 misclassified as happiness. The fuzzy isosceles and right triangle signature obtains a 99.12% overall recognition rate on the MUG dataset.

Fig. 10 Visualization graphs of the MUG confusion matrices

Fuzzy Equilateral Triangle Signature (FES)

This signature reaches 100% accuracy only on fear. Of the 149 anger transitions, 145 are recognized appropriately, 3 are wrongly identified as happiness, and 1 is confused with surprise. 3 disgust transitions are misclassified as surprise while 114 are correctly recognized. Of the 107 happiness transitions, only 1 is incorrectly identified as anger and 106 are classified correctly. Similarly, only 1 sadness transition is misinterpreted as surprise whereas 132 are classified correctly. 143 surprise transitions are recognized perfectly, with only 2 misinterpreted as disgust. The signature reports a 98.62% overall recognition rate on the MUG dataset.

Other Fuzzy Triangles Signature (OFS)

Fear and sadness are classified with a 100% recognition rate by this signature. 1 anger transition is confused with fear, 1 is mismatched with happiness, and 147 are recognized properly. Of the 117 disgust transitions, only 1 is wrongly recognized as surprise while the others are identified properly. 105 happiness and 141 surprise transitions are correctly recognized; 2 happiness transitions are misclassified as anger, while 2 surprise transitions are misinterpreted as sadness, 1 is mismatched with anger, and 1 with disgust. The overall recognition rate on the MUG dataset is 98.87%.

The other performance estimation parameters FAR, FRR, and ERR for all five signatures on the MUG dataset are presented in Table 6, and Fig. 11 shows the corresponding graphs. Implementing our proposed approach on the MUG dataset, we observe that the maximum shape variation information is collected from the transition of fear, on which the proposed signatures perform best. The transition of happiness, by contrast, proves more difficult than the other emotions in the MUG dataset, with none of the five signatures reaching 100% accuracy on it. All of our proposed signatures show strong performance on the MUG dataset, and among them two (the fuzzy right triangle signature and the fuzzy isosceles and right triangle signature) are the most impressive, each confirming overall accuracy above 99%.

Table 6 FAR, FRR and ERR on MUG dataset
Fig. 11 FAR, FRR, ERR on MUG dataset

4.4 Performance comparison among five different signatures

For a better adjudication of the recognition performance of the different fuzzy signatures, we applied 10-fold cross-validation on the three datasets. Each dataset is split into 10 folds; one fold is used for validation while the remaining nine are reserved for training, and the validation accuracies of the 10 folds are averaged. Table 7 displays the 10-fold cross-validation accuracies for all fuzzy signatures on the three datasets, and Fig. 12 compares the five fuzzy geometric signatures across the three benchmark datasets. The overall recognition accuracies of the FIS, FRS, FIRS, FES, and OFS signatures are 97.55%, 98.16%, 98.47%, 97.85%, and 97.24% on CK+; 93.56%, 93.06%, 93.06%, 93.56%, and 92.57% on MMI; and 98.87%, 99.25%, 99.12%, 98.62%, and 98.87% on MUG, respectively. All five signatures perform better on the CK+ and MUG datasets than on MMI; MMI shows lower performance because of the intrinsic complexity of its data, which likely leads to improper mapping of landmark points on MMI face images. Moreover, the fuzzy right triangle signature (FRS) and the fuzzy isosceles and right triangle signature (FIRS) offer the best results on the CK+ and MUG datasets.
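A hedged sketch of this protocol with scikit-learn follows; stratified, shuffled folds are an assumption, since the paper does not state how the folds were drawn.

```python
# 10-fold cross-validation accuracy for one signature's feature matrix.
from sklearn.model_selection import StratifiedKFold, cross_val_score

def cv_accuracy(X, y, clf):
    """X: (n_sequences, 17710) signature vectors; y: emotion labels."""
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    return cross_val_score(clf, X, y, cv=folds).mean()
```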

Table 7 10-fold cross-validation accuracy for the five fuzzy signatures on the CK+, MMI and MUG datasets
Fig. 12 Performance comparison for five fuzzy signatures on CK+, MMI, and MUG datasets

5 Results comparison with existing state-of-the-art works

We validate our system performance against other methods on the three datasets CK+, MMI, and MUG. Details are described in the subsections below.

5.1 Comparison on CK+ dataset

We compare the recognition results obtained from the five fuzzy geometric signatures on the CK+ dataset with the results reported in [13, 18, 23, 29, 39, 44]. Table 8 presents the comparison for the MLP recognizer. In [13], the authors achieved an overall accuracy of 97.5%, the highest among the literature methods described in Table 8. Of our five fuzzy signatures, three (the fuzzy right triangle signature, the fuzzy isosceles and right triangle signature, and the fuzzy equilateral triangle signature) perform better than [13], attaining overall accuracies of 98.16%, 98.47%, and 97.85% respectively. Although the method in [13] reached 96.67% accuracy on sadness, higher than these three signatures, our other fuzzy triangles signature (OFS) recognizes sadness with 100% accuracy. All our proposed signatures obtain both higher overall accuracy and higher individual-emotion accuracy than the approaches reported in [29] (83.01%), [18] (87.43%), and [23] (83.9%). In [39, 44], the authors reported accuracies of 88% and 87% on fear, higher than our fuzzy right triangle signature (84%); however, this signature still outperforms those methods with a higher overall accuracy of 98.16% on the CK+ dataset.

Table 8 Comparison of results on CK+, MMI, MUG with other state-of-the-art works

5.2 Comparison on MMI dataset

We illustrate the comparison on the MMI dataset, summarizing the comparative results of our five fuzzy signatures in Table 8. The results reported in [13, 20, 42] are compared separately with the results of the five signatures on MMI. The method in [20] attained an overall accuracy of 93.53% on MMI, the highest among the existing methods described in Table 8. Even though the authors in [20] report higher accuracy on the recognition of anger (98.3%) and fear (90.2%) transitions than our fuzzy isosceles triangle signature and fuzzy equilateral triangle signature, these two signatures outperform [20] with a higher overall recognition rate of 93.56% each. Our proposed signatures also dominate [13], providing higher accuracy both per emotion and overall. The study in [42] achieved 100% accuracy on sadness and surprise, but its average accuracy on MMI (71.43%) is lower than that of all five of our signatures.

5.3 Comparison on MUG dataset

We also compare the results obtained on the MUG dataset by the five fuzzy geometric signatures with those available in the literature [13, 28]. From Table 8, the authors of [13] report an overall recognition rate of 95.5% on the MUG dataset, with 100% accuracy on anger, disgust, and happiness. Although [13] shows higher accuracy on anger and happiness than all of our proposed signatures, our fuzzy signatures outperform it overall, each ensuring a recognition rate above 95.5%. Table 8 also shows that all of the MUG results reported in [28] fall below the overall performance of our proposed signatures.

6 Conclusion

In this article, we present a comprehensive study of the changing behavior of human emotion. It is very important to understand the unsteady motions of the major face portions (eyes, eyebrows, nose, lips) as an expression evolves from a neutral face to a universal expression. Our triangulation technique therefore forms small triangle shapes on the face plane from triplets of landmark points. A triangle often does not match a basic geometrical shape exactly, so the crisp shape alone is insufficient to capture quantitative information about this deviation. We incorporate fuzzy membership features into the triangle shapes to generate five fuzzy geometric signatures that describe the transition of basic emotion in five different ways. This vagueness in the transitional shape is well identified by feeding our proposed fuzzy signatures into the MLP classifier, and the strong classification results on the CK+, MMI, and MUG datasets support this. The effectiveness of the five fuzzy signatures is reviewed individually through comparison with state-of-the-art works and through 10-fold cross-validation on the benchmark face sequence datasets CK+, MMI, and MUG. Our study shows that the highest emotional transition recognition on CK+, MMI, and MUG is achieved by the fuzzy isosceles and right triangle signature (98.47%), the fuzzy equilateral triangle signature (93.56%), and the fuzzy right triangle signature (99.25%) respectively. A further crucial benefit of our work is that we use only 23 of the 68 identified landmark points, those most prominent in the highly deformable face portions; this restricts the fuzzy triangles to the ones with maximum shape variation across the frames of an emotional transition and reduces the computational overhead of the system.