1 Introduction

For the past three decades, researchers have shown growing interest in human emotion recognition for Human Computer Interaction (HCI), affective computing, and related fields. The work in [1, 2] and [3] established facial emotion analysis criteria using still images and achieved high recognition rates, but did not address video sequences. The work carried out in [4] and [5] established automatic facial emotion recognition systems (FERs) for facial video sequences, which analyse facial emotion through detection and tracking of feature points. The literature reviews in [6] and [7] suggest that facial emotion is typically defined by a large number of facial feature points combined with Action Units (AUs) [8]. However, using the maximum number of feature points leads to complex computation with lower accuracy [6,7,8,9]. To overcome this problem, selecting a minimal set of feature points is essential. Paul Ekman [10] introduced the Facial Action Coding System (FACS), which defines facial expression purely in terms of muscular movements of facial features and describes the six basic emotions on the human face. However, FACS defines emotions through combinations of additional facial features (i.e. the eyebrow, jaw and mouth regions, etc.), so a FACS-based recognition system that uses all 40 Action Units (AUs) suffers from high data complexity and computation time. Facial Animation Parameters (FAPs) [11], in contrast, define facial emotion through feature point movements, organising the action units into 10 groups over the full set of feature points, and can also express facial emotions with minimal facial actions. In line with FAPs, this paper concentrates only on minimal feature vectors of human emotion. For face modelling, the geometric deformable Constrained Local Model (CLM) [12] provides face detection and tracking mechanisms that are considered state of the art compared to other face models.

From the literature survey [1,2,3], [13,14,15,16,17,18,19,20,21] and [22], FER systems built on various supervised learning methods achieve good accuracy. However, those FERs are not well suited to robust, automatic facial emotion recognition. Therefore, semi-supervised learning was chosen for the FER system. From the survey in [23,24,25] and [26], the semi-supervised Twin Support Vector Machine (TWSVM) shows high performance compared to other classifier models, whereas other semi-supervised approaches to facial emotion [27, 28] and [29] show lower accuracy and only moderate performance. In this study, a multi-class TWSVM is used to detect human emotions.

The purpose of this study is to detect human emotions from videos using semi-supervised learning with minimal facial feature vectors. Four essential steps are involved in this emotion recognition system. First, the Constrained Local Model (CLM) is used for face detection, tracking, and feature point extraction. Second, minimal feature vectors are formed based on the FAPs of AUs. Third, the minimal feature vectors are normalized. Finally, the normalized minimal feature vectors are fed as input to the TWSVM for facial emotion classification. Section 2 describes the methods involved in the facial emotion system. Section 3 describes the experimental analysis and validation. Section 4 outlines the experimental results and discussion of the proposed system. Section 5 summarises the conclusion and recommendations for future studies.

2 Facial emotion recognition system

The architecture of the robust facial emotion recognition system is shown in Fig. 1 and contains the following five steps:

Fig. 1 Facial emotion recognition system architecture

2.1 Facial detection and tracking

The non-rigid shape of the face is represented by a Point Distribution Model (PDM). In the PDM, the 2D vertex mesh is expressed as a shape vector. The PDM is built by applying Principal Component Analysis (PCA) to aligned non-rigid face shapes; before PCA, Procrustes analysis is applied to remove the similarity-transform parameters s, R, tx, ty from the mesh shapes. The 2D PDM deforms linearly with the variation of the non-rigid shape and is combined with a similarity transformation that places the shape in the image frame, as shown in (1).

$$ \mathbf{x}_{i}=s\mathbf{R}\tilde{\mathbf{x}}_{i}+\mathbf{t} = T_{s,R,t_{x},t_{y}}(\tilde{\mathbf{x}}_{i}) $$
(1)

where s, R and (tx, ty) denote scale, rotation, and translation, respectively, xi denotes the location of the ith landmark of the 2D PDM, x̃i denotes the ith landmark of the mean shape, and the PDM parameters are p = (s, R, t, q), combining pose with the non-rigid shape parameters q. For CLM fitting, Subspace Constrained Mean Shift (SCMS) [30] is applied to combine a good local (patch) search with optimised 2D PDM landmark fitting. In an exhaustive local search, linear logistic regression [31] gives the response map for the ith landmark position in the image frame, as given in (2).

$$ \textit{p}(l_{i}=\textbf{aligned}|\textit{I},\mathbf{x})={\frac{1}{1+\exp\{\alpha C_{i}(\textit{I},\mathbf{x})+\beta\}}} $$
(2)

where li is a discrete random variable indicating whether the ith PDM landmark is correctly aligned, I denotes the image, x denotes a 2D location in the image, and α and β are the regression coefficients of the logistic function. Ci is the linear classifier (local detector) defined in (3), evaluated over the candidate locations \(\{\mathbf{x}_i\}_{i=1}^{m} \in {\Omega}_{\mathbf{x}_{i}}\) (the image patch), with detector bias bi.

$$ C_{i}(\textit{I}(\mathbf{x}_{i}))=\textit{w}_{i}^{T} [\textit{I}(\mathbf{x}_{1});\ldots;\textit{I}(\mathbf{x}_{m})]+\textit{b}_{i} $$
(3)
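As an illustration of (2) and (3), the following minimal numpy sketch evaluates a patch expert over a set of candidate locations. The array shapes, variable names, and the offline-trained weights w_i, b_i, α, β are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def patch_responses(patches, w_i, b_i, alpha, beta):
    """Response map for landmark i, per Eqs. (2)-(3).

    patches : (K, m) array, one vectorised image patch per candidate
              location in the search window Omega.
    w_i     : (m,) linear patch-expert weights; b_i its bias.
    alpha, beta : logistic regression coefficients.
    """
    c = patches @ w_i + b_i                        # Eq. (3): linear detector score
    return 1.0 / (1.0 + np.exp(alpha * c + beta))  # Eq. (2): alignment probability

# Example: 25 candidate locations, 11x11 patches flattened to length 121.
rng = np.random.default_rng(0)
r = patch_responses(rng.normal(size=(25, 121)), rng.normal(size=121),
                    b_i=0.0, alpha=-1.0, beta=0.0)
```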

The optimized probabilistic function over the local response images for all landmark detections is given in (4). Once the response map of each landmark's local search has been found, this probabilistic function is maximized.

$$ \textit{p}\left( \left\{l_{i}=\textbf{aligned}\right\}_{i=1}^{n} \mid \mathbf{p}\right)= \prod\limits_{i=1}^{n}\textit{p}(l_{i}=\textbf{aligned} \mid \textit{I}, \mathbf{x}_{i}) $$
(4)

which is maximised with respect to the PDM parameters p, since each landmark location xi is parametrized by p. The active shape model objective is defined as the sum of weighted least-square differences between the maximum of each response map and the peak response at the PDM coordinates, as given in (5).

$$ \textit{Q}(\textbf{\textbf{p}})=\sum\limits_{i=1}^{n}\textit{w}_{i}\Vert\textbf{x}_{i}-\mu_{i}\Vert^{2} $$
(5)

A first-order Taylor expansion is applied to the shape model in (5) to minimise the active shape objective, which leads to convergence of the PDM landmarks. This is defined in (6).

$$ \textbf{x}_{i}\approx \textbf{x}_{i}^{\textit{c}}+\textbf{J}_{i}{\Delta}\textit{p} $$
(6)

Solving for the parameter update gives (7).

$$ {\Delta}\textit{p}=\left( \sum\limits_{i=1}^{n}\textit{w}_{i}\textbf{J}_{i}^{T}\textbf{J}_{i}\right)^{-1} \left( \sum\limits_{i=1}^{n}\textit{w}_{i}\textbf{J}_{i}^{T}(\mu_{i}-\textbf{x}_{i}^{c} )\right) $$
(7)

The current parameters are then updated as p ← p + Δp to estimate the pose and shape. Here the Jacobian is J = [J1; ⋯ ; Jn] and the current shape estimate is \(x = [{x_{1}^{c}}; \cdots ;{x_{n}^{c}}]\). For independent maximisation, a non-parametric Kernel Density Estimate (KDE) [32] representation is applied to each PDM landmark location in the Mean-Shift Algorithm (MSA) [33]. This consists of a fixed-point iteration, defined in (8), which is applied iteratively until Δp converges.

$$ \textbf{x}_{i}^{(\tau+1)}\leftarrow\sum\limits_{\mu_{i}\in{\Psi}_{\textbf{x}_{i}^{c}}} \frac{\alpha^{i}_{\mu_{i}}\,N\left( \textbf{x}_{i}^{(\tau)};\mu_{i},\sigma^{2}\textit{I}\right)} {\sum\limits_{\textit{y}\in{\Psi}_{\textbf{x}_{i}^{c}}} \alpha^{i}_{\textit{y}}\,N\left( \textbf{x}_{i}^{(\tau)};\textit{y},\sigma^{2}\textit{I}\right)}\,\mu_{i} $$
(8)
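A minimal sketch of the fixed-point update (8) for a single landmark is given below; the variable names and shapes are illustrative assumptions. Note that the Gaussian normalising constant cancels between numerator and denominator, so unnormalised kernels suffice.

```python
import numpy as np

def mean_shift_update(x_tau, candidates, alphas, sigma2):
    """One fixed-point iteration of Eq. (8) for landmark i.

    x_tau      : (2,) current estimate x_i^(tau)
    candidates : (K, 2) candidate positions mu_i in the window Psi
    alphas     : (K,) response-map weights alpha^i_mu from Eq. (2)
    sigma2     : variance of the isotropic Gaussian kernel
    """
    d2 = np.sum((candidates - x_tau) ** 2, axis=1)
    k = alphas * np.exp(-0.5 * d2 / sigma2)  # alpha * N(x; mu, sigma^2 I); constant cancels
    k /= k.sum()
    return k @ candidates                    # KDE-weighted mean over candidates
```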

To convert this into a shape-constrained optimisation, the MSA uses a two-step strategy: i) compute the mean-shift update for each 2D PDM landmark, ii) constrain the mean-shifted landmarks to remain in the PDM parametrization using a least-square fit. The resulting least-square PDM constraint is defined in (9).

$$ \textit{Q}(\textbf{\textbf{p}})=\sum\limits_{i=1}^{n}\Vert\textbf{x}_{i}-\textbf{x}_{i}^{(\tau+1)}\Vert^{2} $$
(9)

The response maps with high probability obtained from (4) are given as input to the EM algorithm to find the difference between the peak responses. The M-step Q-function, given by (9) together with the linear shape model in (6), is minimised by the Gauss-Newton update in (10).

$$ {\Delta}\textit{p}=\textbf{J}^{\dagger}\left[\textbf{x}_{1}^{(\tau+1)}-\textbf{x}_{1}^{c};\cdots\cdots;\textbf{x}_{n}^{(\tau+1)}-\textbf{x}_{n}^{c}\right] $$
(10)

where J† denotes the pseudo-inverse of J and \(x_{i}^{(\tau +1)}\) denotes the mean-shift update of the ith landmark given in (8). Kernel-width relaxation and local optima are addressed in the Subspace Constrained Mean Shift (SCMS) [30].
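The overall CLM fitting loop alternates the mean-shift step (8) with the projection step (10). The short numpy sketch below shows the projection, with array shapes and names assumed for illustration.

```python
import numpy as np

def constrained_update(J, x_cur, x_ms):
    """Eq. (10): project mean-shifted landmarks back onto the PDM subspace.

    J     : (2n, m) PDM Jacobian at the current parameters p
    x_cur : (2n,) current model landmarks x^c
    x_ms  : (2n,) mean-shift updated landmarks x^(tau+1)
    """
    return np.linalg.pinv(J) @ (x_ms - x_cur)  # delta_p = J^dagger (x^(tau+1) - x^c)

# Fitting loop sketch: p is updated as p <- p + delta_p, and the two steps
# are alternated until ||delta_p|| falls below a tolerance.
```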

2.2 Feature vectors displacement

The geometric deformable model (CLM) performs face detection, tracking and feature point extraction from video. The feature vector displacement (di,j) is defined as the movement of a facial feature point between consecutive frames, i.e. the displacement of the ith grid node coordinates from one frame to the next. The feature vector displacement is given in (11).

$$ \mathbf{d}_{i,j}=\left[\begin{array}{l} {\Delta}\mathbf{x}_{i,j}\\ {\Delta}\mathbf{y}_{i,j} \end{array}\right] =\sum\limits_{i,j=1}^{F,N}\left[\begin{array}{cccc} \mathbf{a}_{11}-\mathbf{a}_{12} & \mathbf{a}_{12}-\mathbf{a}_{13} & \cdots & \mathbf{a}_{1,j+1}-\mathbf{a}_{1,j+2}\\ \mathbf{a}_{21}-\mathbf{a}_{22} & \mathbf{a}_{22}-\mathbf{a}_{23} & \cdots & \mathbf{a}_{2,j+1}-\mathbf{a}_{2,j+2}\\ \vdots & \vdots & \ddots & \vdots\\ \mathbf{a}_{i+1,1}-\mathbf{a}_{i+1,2} & \cdots & \cdots & \mathbf{a}_{m,n}-\mathbf{a}_{m,n+1} \end{array}\right] $$
(11)

for i = 1, … , F and j = 1, … , N, where Δxi,j, Δyi,j are the x- and y-axis coordinates of the displacement of the ith grid node in the jth frame, respectively, F is the number of grid nodes (F = 66 nodes in the CLM) and N is the number of facial frames extracted from the video. The grid deformation feature vector gj collects the displacement of every geometric grid node di,j, as given in (12).

$$ \textbf{\textit{g}}_{j}=\left[\textbf{d}_{1,j},\textbf{d}_{2,j},\cdots,\textbf{d}_{F,j}\right]^{T} $$
(12)
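A compact sketch of (11)-(12) for tracked landmark trajectories is shown below; the (N, F, 2) array layout is an assumption for illustration.

```python
import numpy as np

def grid_displacements(landmarks):
    """Eqs. (11)-(12): per-node displacement between consecutive frames.

    landmarks : (N, F, 2) tracked (x, y) coordinates over N frames
                for F = 66 CLM grid nodes.
    Returns   : (N-1, 2F) array; row j is the stacked vector g_j.
    """
    d = np.diff(landmarks, axis=0)    # d_{i,j}: frame-to-frame node motion
    return d.reshape(len(d), -1)      # stack all nodes into g_j per Eq. (12)
```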

2.3 Minimal feature vectors displacement

The pictorial representations of feature points that are affected by different emotions are shown in Fig. 2. The entire feature vectors and minimal feature vectors of the six basic emotions are shown in Fig. 2a and b, respectively.

Fig. 2 Entire and minimal feature vectors of the six facial emotion movements

The references [11] and [34] describe the groups, the number of FAPs, and the textual description of the movements affected by the six basic emotions. Based on the definition of FAPs, Tables 1 and 2 provide the number of feature points and group numbers affected by the six basic emotions. Table 1 lists the entire set of feature points involved in the six emotions. Any emotion unfolds in three phases: onset, where the emotion starts; apex, where the emotion reaches its peak; and offset, where the face returns to the neutral state. The entire feature vector of an emotion is the displacement between the onset and offset phases. From the literature [6, 9] and [35], the entire displacement vector suffers from high data redundancy, lower accuracy, and high computational cost. To increase accuracy and reduce computation time, the minimal feature vector displacement is chosen instead. The transition of an emotion from the onset phase to the apex phase is defined as the peak response, and the minimal feature vector displacements are chosen as the feature points with the highest variance during this peak response. Table 2 tabulates the calculated high-variance ranges of the minimal features. The major feature movements for the six facial emotions are the eyebrow (ebw), outer lip (olp), eyelids (eld) and corner lip (clp). Table 2 also tabulates, for comparison, the variance ranges of the feature points other than the minimal ones.

Table 1 Entire feature vectors by FAPs
Table 2 Minimal feature vectors

Based on the definition of FAPs and the facts stated above, the surprise emotion involves 45 feature points in total. Of those 45 feature points, 5 have a high variance compared to the remaining 40. Similarly, the minimal feature points are determined for all the emotions and tabulated in Table 2, which details the minimal feature points affected by each of the six emotions; they are drawn from the eyebrow, mouth, and eyelid regions. Selecting minimal feature points reduces data redundancy and computation time, which improves performance and accuracy. From (11) and (12), the minimal feature vector displacements are obtained, as sketched below.
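As a hedged illustration of this variance-based selection (the actual variance ranges are those of Table 2, not reproduced here), one might rank the grid nodes as follows; the names and shapes are assumptions.

```python
import numpy as np

def minimal_feature_indices(displacements, k):
    """Rank grid nodes by displacement variance over the onset-to-apex
    (peak response) segment and keep the k highest-variance nodes.

    displacements : (T, F, 2) per-frame node displacements d_{i,j}
    """
    var = displacements.var(axis=0).sum(axis=1)  # total x+y motion variance per node
    return np.argsort(var)[::-1][:k]             # indices of the k most active nodes

# e.g. for surprise, k = 5 of the 45 involved points are retained (Table 2).
```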

2.4 Normalization of feature vectors displacement

For data scaling, normalization is applied to the minimal feature vector displacements. In this paper, Max-Min and Z-normalization, defined in (13), are applied and compared. The normalization yields the normalized minimal feature vector displacement data.

$$ \begin{array}{@{}rcl@{}} \textbf{Max-Min}_{(-1,1)}\quad \textit{f(x)}&=&2\left( {\frac{\textit{x}-\textit{Min(x)}}{\textit{Max(x)}-\textit{Min(x)}}}\right)-1\\ \textbf{Z-norm}\quad\textit{f(x)}&=&\frac{\textit{x}-\mu}{\sigma} \end{array} $$
(13)
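A direct sketch of (13), with the Max-Min form written so that it maps onto (-1, 1):

```python
import numpy as np

def max_min_norm(x):
    """Max-Min scaling to (-1, 1), Eq. (13)."""
    return 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0

def z_norm(x):
    """Z-normalization, Eq. (13)."""
    return (x - x.mean()) / x.std()
```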

2.5 Facial emotion classifier-bilinear classifier

In this system, the TWSVM classifier is used to classify the six basic facial emotions. The normalized minimal feature vector displacement data are given as input to the two-class TWSVM [23, 25] and [26]. The TWSVM constructs two non-parallel hyperplanes, one for each class, by solving a pair of quadratic programming problems. Let gj = {(xi, yi)}; i = 1, … , k; x ∈ ℜn; yi ∈ {− 1, + 1} be the training dataset of normalized minimal feature vector displacements. The separating hyperplanes for linearly separable data are defined in (14), with the positive and negative classes of gj = RL handled via the Karush-Kuhn-Tucker (KKT) conditions.

$$ \begin{array}{@{}rcl@{}} \textit{f(x)}_{+}&=&{\textit{x}}^{T}\cdot{\textit{w}}_{+}+b_{+} =0\\ \textit{f(x})_{-}&=&{\textit{x}}^{T}\cdot{\textit{w}}_{-}+b_{-} =0 \end{array} $$
(14)

where w± are the weight vectors and b± the biases. The objective function of the linear TWSVM corresponds to one class, with the constraints given by the other class, as defined in (15).

$$ \begin{array}{ll} &\textbf{min}(\textit{w}_{+},b_{+},\xi) \quad\frac{1}{2}\Vert{\textit{X}}\textit{w}_{+}+e_{+}b_{+}\Vert^{2}+c_{1}e^{T}_{-}\xi\\ &\qquad\textbf{s.t} \qquad\quad-({\textit{Y}}\textit{w}_{+}+e_{-}b_{+})+\xi \geq e_{-},\quad\xi\geq0\\ &\textbf{min}(\textit{w}_{-},b_{-},\eta) \quad\frac{1}{2}\Vert{\textit{Y}}\textit{w}_{-}+e_{-}b_{-}\Vert^{2}+c_{2}e^{T}_{+}\eta\\ &\qquad\textbf{s.t} \qquad\quad-({\textit{X}}\textit{w}_{-}+e_{+}b_{-})+\eta \geq e_{+},\quad\eta\geq0 \end{array} $$
(15)

where c1, c2 are penalty parameters, ξ, η are slack variables, and e+, e− are vectors of ones of suitable dimensions. Let \(\textit{H}=[\textit{X} \; \textit{e}_{+}]\) and \(\textit{G}=[\textit{Y} \; \textit{e}_{-}]\). The Lagrangian and Wolfe dual problems obtained from the TWSVM formulation are given in (16).

$$ \begin{array}{ll} &\textbf{max}_{\alpha}\quad e^{T}_{-}\alpha-\frac{1}{2}\alpha^{T}\textit{G}(\textit{H}^{T}\textit{H})^{-1}\textit{G}^{T}\alpha\\ &\quad\textbf{s.t}\quad 0\leq \alpha \leq c_{1}e_{-}\\ &\textbf{max}_{\beta}\quad e^{T}_{+}\beta-\frac{1}{2}\beta^{T}\textit{H}(\textit{G}^{T}\textit{G})^{-1}\textit{H}^{T}\beta\\ &\quad\textbf{s.t}\quad 0\leq \beta \leq c_{2}e_{+} \end{array} $$
(16)

where α and β are the Lagrange multipliers. In (17), a regularisation term δI (δ > 0) is introduced to avoid singularity and ill-conditioning of HTH and GTG, where I denotes the identity matrix. The non-parallel hyperplanes of the TWSVM are then obtained from the α and β values as defined in (17).

$$ \begin{array}{ll} &\textbf{u}_{1} = -(\textit{H}^{T}\textit{H}+\delta\textit{I})^{-1}\textit{G}^{T}\alpha\\ &\textbf{u}_{2} = -(\textit{G}^{T}\textit{G}+\delta\textit{I})^{-1}\textit{H}^{T}\beta \end{array} $$
(17)

where \(\textit {u}_{i=1,2} =[\textit {w}^{T}_{\pm }\textit {b}_{\pm }]^{T}\). From (17), the weight vector and bias of each optimal hyperplane of the linear TWSVM are obtained. For validation, a new data sample is assigned to class ‘i’ by the decision function given in (18): the sample is classified according to which hyperplane it lies closer to.

$$ \textbf{Class (i)} = \arg\min _{i=1,2} \frac{\vert{\textit{x}}^{T}{\textit{w}_{(i)}}+b_{(i)}\vert}{\Vert\textit{w}_{(i)}\Vert} $$
(18)
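The following sketch trains a linear TWSVM by maximising the box-constrained duals (16) with a bound-constrained quasi-Newton solver, recovering the planes via (17) and classifying via (18). It is a minimal illustration under assumed names and default parameters, not the authors' C++/Matlab implementation.

```python
import numpy as np
from scipy.optimize import minimize

def train_linear_twsvm(X, Y, c1=1.0, c2=1.0, delta=1e-4):
    """Linear TWSVM via the Wolfe duals (16) and Eq. (17).

    X : (m1, n) positive-class samples; Y : (m2, n) negative-class samples.
    Returns (w_plus, b_plus), (w_minus, b_minus).
    """
    H = np.hstack([X, np.ones((len(X), 1))])   # H = [X e+]
    G = np.hstack([Y, np.ones((len(Y), 1))])   # G = [Y e-]
    I = np.eye(H.shape[1])

    def solve_dual(A, B, c):
        # maximise e^T a - 0.5 a^T B (A^T A + delta I)^{-1} B^T a, 0 <= a <= c
        M = B @ np.linalg.inv(A.T @ A + delta * I) @ B.T
        fun = lambda a: -(a.sum() - 0.5 * a @ M @ a)
        jac = lambda a: -(np.ones_like(a) - M @ a)
        res = minimize(fun, np.full(len(B), 0.5 * c), jac=jac,
                       method="L-BFGS-B", bounds=[(0.0, c)] * len(B))
        return res.x

    alpha = solve_dual(H, G, c1)
    beta = solve_dual(G, H, c2)
    u1 = -np.linalg.inv(H.T @ H + delta * I) @ G.T @ alpha   # Eq. (17)
    u2 = -np.linalg.inv(G.T @ G + delta * I) @ H.T @ beta
    return (u1[:-1], u1[-1]), (u2[:-1], u2[-1])

def twsvm_predict(x, planes):
    """Eq. (18): assign x to the class whose hyperplane is nearer."""
    d = [abs(x @ w + b) / np.linalg.norm(w) for w, b in planes]
    return int(np.argmin(d))
```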

For non-linear cases of the TWSVM, the training data are examined with kernel functions such as Linear, Polynomial and Radial Basis Function (RBF) [23,24,25] and [26]. The multi-class TWSVM [25, 26] and [36] has two approaches: ‘One vs One’ and ‘One vs All’. The ‘One vs One’ approach is a ‘divide and conquer’ strategy that builds one TWSVM for each pair of subclasses. The ‘One vs All’ approach builds one TWSVM per class, pitting that class against all the other classes, as sketched below. This paper carried out both multi-class TWSVM approaches, and the ‘One vs All’ approach proved more effective.
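Building on the binary trainer sketched above (the hypothetical `train_linear_twsvm`), the ‘One vs All’ scheme can be expressed as follows; all names are illustrative.

```python
import numpy as np

def train_one_vs_all(gs, labels, n_classes, **twsvm_args):
    """'One vs All' multi-class TWSVM: one binary TWSVM per emotion,
    class i versus the pooled remaining classes."""
    models = []
    for i in range(n_classes):
        X = gs[labels == i]   # emotion i as the positive class
        Y = gs[labels != i]   # all other emotions pooled as the negative class
        (w_p, b_p), _ = train_linear_twsvm(X, Y, **twsvm_args)
        models.append((w_p, b_p))
    return models

def predict_one_vs_all(x, models):
    """Assign the emotion whose positive hyperplane is nearest, per Eq. (18)."""
    d = [abs(x @ w + b) / np.linalg.norm(w) for w, b in models]
    return int(np.argmin(d))
```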

3 Experimental analysis and validation

3.1 Experimental settings

The definition of FAPs [11], [34] combines AUs for emotion identification with entire and minimal feature vectors, as shown in Fig. 2. From the FAPs definition [34], the entire feature vector displacement spans the expressive time episode from neutral to peak response and back to neutral (i.e. the onset, apex and offset phases). In this system, the geometric deformable grid (CLM) has L = 66 × 2 = 132 dimensions. The feature vector displacement is employed for the set of six emotions (i.e. Surprise (SUR), Happy (HAP), Disgust (DIS), Fear (FEA), Anger (ANG) and Sad (SAD)) using the TWSVM. In the proposed system, the CLM and TWSVM are developed in C++ with openFrameworks and Matlab 2014b, and run on an Intel i5 processor. In both training and testing phases, standard databases such as the MMI facial expression database [37], Oulu database [38], CK [39], Extended CK+ database [40] and MAHNOB Laughter database [41] are used, together with a few real-time databases [24] recorded at 25 frames/sec. In the training and testing phases, let gj be the feature vector displacements of the face image sequences extracted from the databases, with i = 1, … , N, N = 6 facial emotions, as shown in Fig. 2. The normalized minimal feature vectors are fed as input to the multi-class TWSVM to classify facial emotions.

3.2 Training and testing

The 10-fold and hold-out cross-validation techniques are applied in both training and testing phases. A new database is formed through the fusion of the existing standard databases (MMI, Oulu, CK, CK+, MAHNOB) and a few real-time datasets. The normalized minimal feature vectors of the cross-database are used for hold-out validation, in which 80% of the normalized dataset is used for training and the remaining 20% for testing the facial emotion classifier. In validation, the multi-class TWSVM is evaluated for the three kernel cases (Linear, Polynomial and RBF) with both normalizations. In hold-out validation, fixed penalty and kernel parameters are applied for the three kernel cases, namely c1, c2 and γ = 0.5. In 10-fold cross-validation, a grid search is performed in the training phase to obtain the optimal Cost (c1 and c2) and Gamma (γ) parameters, over the ranges 10^-5, 10^-4.5, … , 10^4.5, 10^5 for Cost and 2^-5, 2^-4.5, … , 2^4.5, 2^5 for Gamma, as sketched below. The optimal Cost and Gamma values from the training phase define the optimal trained model, which is then tested on unknown data of the six emotions. In cross-validation, only the Cost value is tuned in the linear case, whereas both Cost and Gamma are tuned in the non-linear cases (Poly, RBF). The confusion matrices of ‘One vs One’ and ‘One vs All’ are computed in both phases.
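A sketch of this 10-fold grid search is given below. Since scikit-learn offers no TWSVM, an RBF-kernel SVC stands in purely to show the search mechanics over the stated Cost/Gamma ranges; all names are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

# Grid ranges from the paper: Cost in 10^-5..10^5, Gamma in 2^-5..2^5, step 0.5.
costs = 10.0 ** np.arange(-5, 5.5, 0.5)
gammas = 2.0 ** np.arange(-5, 5.5, 0.5)

def grid_search_10fold(X, y):
    best, best_acc = None, -1.0
    for c in costs:
        for g in gammas:
            accs = []
            for tr, te in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
                clf = SVC(C=c, gamma=g, kernel="rbf").fit(X[tr], y[tr])
                accs.append(clf.score(X[te], y[te]))   # fold accuracy
            if np.mean(accs) > best_acc:
                best, best_acc = (c, g), np.mean(accs)
    return best, best_acc   # optimal (Cost, Gamma) and its CV accuracy
```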

From the confusion matrix results, validation parameters such as True Positive (TP), True Negative (TN), False Positive (FP), False Negative (FN), Accuracy (Acc), Precision (Pre), Recall (Rec), F1-score (F1-sco), Error rate (Err.rte), and Computational Time (Comp. Time) are calculated. For each emotion, 18 multi-class models (‘One vs One’ and ‘One vs All’ with 3 kernels) are calculated per normalization method; in total, 36 models are evaluated across Max-Min and Z-normalization with the multi-class TWSVM classifier. The computation times of the training and testing phases are also calculated for all basic emotions.

4 Experimental results and discussion

4.1 Experimental results

4.1.1 Hold out cross validation

Max-Min and Z-normalization are applied to the minimal feature vector displacements to produce globally normalized data, which are fed to the multi-class classifiers and validated with the hold-out method for each kernel, namely Linear, Polynomial (Poly) and RBF. For both the ‘One vs One’ and ‘One vs All’ cases, the validation parameters (accuracy, precision, recall, F1-score, error rate) of the six basic emotions are computed and shown in Tables 3, 4, 5, 6, 7, 8, 9 and 10. From Tables 3-10, it is inferred that the ‘One vs All’ multi-class models outperform the ‘One vs One’ models. Bar charts of the validation parameters of the proposed model (‘One vs All’) are plotted from Tables 3-10 and shown in Figs. 3, 4 and 5, which relate the validation parameters to the multi-class scheme and the facial emotions. From Fig. 3a, it is inferred that high accuracy is achieved for the six emotions with the RBF kernel of ‘One vs All’ using Z-norm: Surprise (95.85%), Happy (95.6%), Disgust (88.26%), Fear (91.69%), Anger (91.89%) and Sad (88.6%). The remaining validation parameters (Precision, Recall, Error rate, F1-score, and Computation Time for the training and testing phases) are likewise reported in Tables 3-10. From Tables 3-10 and Figs. 3a, 4a and 5, it is inferred that the RBF kernel of ‘One vs All’ using Z-norm has higher Precision, Recall and F1-score, and lower Error rate and computational time, than the other kernel/normalization combinations. The overall performance of the proposed model, which is highest for RBF (‘One vs All’) with Z-norm, is calculated and tabulated in Table 19: Accuracy of 92.05 ± 3.79%, Precision of 0.75 ± 0.18, Recall of 0.68 ± 0.25, Error rate of 0.32 ± 0.25, F1-score of 2.74 ± 0.98 and computation time of 2.05 ± 0.43 sec across the six basic emotions.

Fig. 3 Bar chart of the Accuracy of the proposed system (‘One vs All’) under both validation schemes

Fig. 4 Bar chart of the Precision, Recall, Error rate and F1-score of the proposed system (‘One vs All’) under both validation schemes

Fig. 5 Bar chart of the Computation time of the proposed system (‘One vs All’) under both validation schemes

Table 3 The hold-out validation result of Surprise eyebrow in both multi-classifier and normalization
Table 4 The hold-out validation result of Surprise mouth in both multi-classifier and normalization
Table 5 The hold-out validation result of Happy mouth in both multi-classifier and normalization
Table 6 The hold-out validation result of Fear eyebrow in both multi-classifier and normalization
Table 7 The hold-out validation result of Fear mouth in both multi-classifier and normalization
Table 8 The hold-out validation result of Anger eyebrow in both multi-classifier and normalization
Table 9 The hold-out validation result of Disgust eyelids in both multi-classifier and normalization
Table 10 The hold-out validation result of Sad mouth in both multi-classifier and normalization

4.1.2 10-fold cross validation

The 10-fold cross-validation results of the system are shown in Tables 11, 12, 13, 14, 15, 16, 17 and 18. As before, Max-Min and Z-normalization are used with the 3 kernels to attain the 36 optimal trained models of the FER system. Tables 11-18 show the 10-fold cross-validation accuracy, cost, gamma, and validation parameters of the tested data. From Table 11, it is inferred that the RBF kernel of the ‘One vs All’ model achieved the highest performance using 10-fold validation on z-normalized data. The cost and gamma values from the 10-fold RBF kernel result define the optimal trained model for the surprise eyebrow feature; the tested data given to this optimal trained model achieve higher validation parameters than the other multi-class models of surprise eyebrow. Similarly, across Tables 11-18, the RBF kernel of the ‘One vs All’ model achieves the highest performance using 10-fold validation on z-normalized data. Some emotions attain high accuracy under other optimal models, but their performance remains below that of the RBF trained model. From Fig. 3b, it is inferred that high accuracy is achieved for the six emotions with the RBF kernel of ‘One vs All’ using Z-norm: Surprise (96.6%), Happy (96%), Disgust (91.6%), Fear (90.15%), Anger (93.16%) and Sad (90.67%). From Tables 11-18 and Figs. 3b, 4b and 5, it is inferred that the RBF kernel model has the best performance for all six basic emotions compared to the other models.

Table 11 The 10-fold validation result of Surprise eyebrow in both multi-classifier and normalization
Table 12 The 10-fold validation result of Surprise mouth in both multi-classifier and normalization
Table 13 The 10-fold validation result of Happy mouth in both multi-classifier and normalization
Table 14 The 10-fold validation result of Fear eyebrow in both multi-classifier and normalization
Table 15 The 10-fold validation result of Fear mouth in both multi-classifier and normalization
Table 16 The 10-fold validation result of Anger eyebrow in both multi-classifier and normalization
Table 17 The 10-fold validation result of Disgust eyelids in both multi-classifier and normalization
Table 18 The 10-fold validation result of Sad mouth in both multi-classifier and normalization

The overall performance of the proposed model, which is highest for RBF (‘One vs All’) with Z-norm, is calculated and tabulated in Table 19: Accuracy of 93.42 ± 3.25% (10-fold) and 92.56 ± 3.02% (test), Precision of 0.76 ± 0.11, Recall of 0.73 ± 0.22, Error rate of 0.27 ± 0.22, F1-score of 20.76 ± 0.22 and computation time of 15.08 ± 4.08 sec across the six basic emotions. From the experiments, it is found that the minimal feature vectors specified in Table 2 are sufficient to classify the six basic emotions: Surprise (eyebrow + outer lip), Happy (corner lip), Disgust (eyelids), Fear (eyebrow + outer lip), Anger (eyebrow) and Sad (corner lip + outer lip).

Table 19 The overall performance of proposed model (RBF-‘One vs All’-Z-norm)

4.2 Discussion

Table 20 compares various classifier models for facial emotion recognition, listing supervised and semi-supervised models, the databases used, and the accuracy of each model. Uddin [30] used the supervised classifiers LDP+HMM and LDP-PCA+HMM for facial emotion recognition, achieving 82.91% and 87.50% accuracy respectively. Yu [14] used a supervised SVM+PCA classifier and achieved 75.50% accuracy. Saeed [15] applied a supervised SVM classifier and achieved 83% accuracy. Wan [16] applied a supervised SVM classifier and achieved 80% accuracy. Mohammadian [17] used a supervised SVM classifier and achieved 83.90% accuracy. Ren [18] applied Fuzzy+SVM and achieved 81.4% accuracy. Jiang [19] used an SRC classifier and achieved 80% accuracy. Papachristou [20] applied an SSL classifier and achieved 71.14% (CK, CK+) and 71.84% (BU) accuracy. Nikitidis [21] used an MMPP classifier and achieved 80.9% accuracy. Owusu [1] used an SVM-based facial emotion system and attained 97.57% (JAFFE) and 92.33% (Yale) accuracy. Patil [2] developed Patch-LDSMTNN for facial emotion, achieving 98.33% (JAFFE), 99.27% (CMU-AMP), 98.14% (ORL), 98.44% (Yale) and 98.49% (CK) accuracy. Siddiqi [3] developed a robust HMM-based facial emotion system that achieved 95% accuracy (YouTube images). Cohen [27] evaluated three semi-supervised classifiers for facial emotion, achieving 69.10 ± 1.44% (Naive Bayes), 69.30 ± 1.44% (Tree-Augmented Naive Bayes) and 74.80 ± 1.36% (stochastic search) accuracy. Rifai [28] used the semi-supervised CC-NET+CDA+SVM and achieved 85% accuracy. Jihang [29] achieved 82.68% and 87.71% accuracy using Transfer Learning Adaptive Boosting semi-supervised learning. From Table 20, the systems of [1], [2] and [3] report higher accuracy than our proposed model, but they are evaluated on different types of datasets: our proposed FER model uses video datasets of facial emotions, whereas most of the existing models use image datasets (the few video-based models are highlighted in Table 20). Against the state of the art of the FER systems in Table 20, our proposed model attains high accuracy and high performance, achieving good results for facial emotions using semi-supervised learning from video sequences. In particular, among semi-supervised classifiers for facial emotion, the TWSVM achieved higher accuracy and better performance than Cohen [27], Rifai [28] and Jihang [29].

Table 20 Comparison of proposed model and state of the arts of facial emotion

In this work, the overall computation time for training and testing the six basic emotion classifiers is 2.05 ± 0.43 sec (hold-out) and 15.08 ± 4.08 sec (10-fold). In comparison, Dapgony's [42] work shows a computation time of 11.75 ± 7.5 sec and Guo's [43] work a computation time of 3.0 ± 0.25 sec for the training and testing phases of their emotion classifiers, while Patil's [2] system shows a computation time of 4 sec. This shows that the computational time is reduced in our model compared to the other models, as shown in Table 21. From the experimental results, 10-fold cross-validation achieves above 90% accuracy for all six basic facial emotions, whereas under hold-out validation Disgust and Sad fall below 90% accuracy compared to the other emotions (Surprise, Happy, Fear, and Anger). This may be improved by applying locally normalized data with cross-validation to the minimal feature vectors of the TWSVM classifier. An optimal model could be achieved by validating the above experiments with k-fold, leave-one-out and bootstrapping combined with grid search on the minimal feature vectors of the TWSVM classifier.

Table 21 Comparison of the execution time of proposed and existing models

5 Conclusion

This paper proposes the classification of facial emotion from minimal facial features of geometric deformable grid nodes with a semi-supervised classifier. The system demonstrates that minimal feature vectors with a semi-supervised classifier yield high accuracy and low computation time. Hold-out validation and 10-fold cross-validation of the fused, globally normalized minimal feature vectors are applied to the multi-class TWSVM to determine the validation parameters of the proposed system. Comparing the validation parameters, the RBF kernel (‘One vs All’) with Z-normalization achieves the highest accuracy for all facial emotions among the kernel/normalization combinations, and the comparison with existing models shows that the proposed model achieves better performance. This work can be extended to micro and subtle expressions, deepened by applying different cross-validation schemes, locally normalized features, and feature selection optimization, and carried into real-time Human Computer Interaction applications.