
1 Introduction

Facial expression conveys important information about human emotion and interpersonal relations and plays a crucial role in communication. Facial expression detection falls mainly within pattern recognition and computer vision. Humans recognize emotions quite efficiently and accurately, and an automatic emotion recognition system is a key element of human–machine interaction. Facial expression is the most natural and powerful way for humans to communicate their emotions. Automatic recognition is a computer process for verifying or identifying a person from an image or video frame; this process, whether performed by a computer or a human, involves locating faces, extracting facial features, and applying those features.

Facial emotion recognition deals with recognizing expressions from images [3]. It is useful in communication because people express themselves on the basis of mood. Emotions are detected using various approaches, such as body language, gestures, and voice intonation, as well as more complex methods like electroencephalography. Six universal emotions are recognized across cultures: disgust, anger, sadness, surprise, happiness, and fear.

1.1 Facial Expression Recognition System

The facial expression recognition system identifies human emotions such as disgust, fear, surprise, sadness, anger, and happiness from expressions that vary from person to person. Only 7% of message information is conveyed by spoken words and 38% by voice, whereas 55% is conveyed through facial expressions.

A facial expression recognition system consists of the following four stages.

Image Acquisition

Static images or image sequences are used for recognizing expressions. Although color images carry more information about emotion, 2-D gray-scale images are popularly used for recognition because of the low cost of acquisition equipment. Images can be captured with a phone, camera, or other digital device.

Pre-processing

Pre-processing plays a key role. This stage mainly enhances the quality of the input image: it locates the data of interest by removing noisy information and removes redundancy from the image without losing image detail. Pre-processing also includes filtering and normalization, which produce images of uniform size and orientation.

Feature Extraction

Feature extraction identifies the "interesting" parts of an image, capturing information about the shape, motion, color, and texture of the face. It extracts meaningful information from the image and, compared with the actual image, greatly reduces the amount of data, which is an advantage for storage.

Classification

Classification follows the feature extraction stage. It identifies facial images and groups them into certain classes, enabling their efficient recognition. Classification is complex because many factors affect it. This stage, sometimes also called the feature selection stage, deals with the extracted information and groups it according to certain parameters.
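
To make the four stages concrete, the following is a minimal sketch in Python, assuming OpenCV and scikit-learn are available; the file names, image size, and choice of an SVM classifier are illustrative only and not part of any surveyed system.

```python
# Minimal four-stage pipeline sketch (illustrative only).
import cv2
import numpy as np
from sklearn.svm import SVC

def acquire(path):
    """Stage 1: image acquisition as a 2-D gray-scale image."""
    return cv2.imread(path, cv2.IMREAD_GRAYSCALE)

def preprocess(img, size=(64, 64)):
    """Stage 2: remove noise, normalize contrast, and unify size."""
    img = cv2.medianBlur(img, 3)
    img = cv2.equalizeHist(img)
    return cv2.resize(img, size)

def extract_features(img):
    """Stage 3: a trivial feature vector (scaled raw pixels)."""
    return img.flatten().astype(np.float32) / 255.0

# Stage 4: classification on labeled training images.
paths, labels = ["happy1.png", "sad1.png"], ["happy", "sad"]
X = [extract_features(preprocess(acquire(p))) for p in paths]
clf = SVC().fit(X, labels)
print(clf.predict([extract_features(preprocess(acquire("test.png")))]))
```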

2 Background and Literature Survey

2.1 Facial Expression

Facial expression identification is a critical element in understanding a person's behavior. It helps the other individual in face-to-face exchanges with other people, so that one can take appropriate measures while discussing or communicating. However, the natural facial expression of a person cannot be recognized effortlessly; it requires expert knowledge to read a person's face. When we read a face using a camera and analyze it with a face recognition framework, it is difficult for a machine to read the natural expression, since the machine can analyze only the visible part of the face, not the mind of the person.

According to the investigation of Mehrabian [16], in human communication facial expressions convey 55% of the message, compared with 7% conveyed by spoken language and 38% by paralanguage.

2.2 Types of Emotions

Emotion detection through facial expression recognition is becoming quite popular owing to its increasing scope of applications in human–computer interactive systems. Several methods have been developed for emotion recognition, including voice, facial expression, posture, and gesture. Emotion recognition mainly consists of two fundamental steps: feature extraction and classification. Feature extraction derives the set of features or attributes that characterize an emotional expression. During classification, the features are mapped onto one of several emotions, such as anger, sadness, disgust, and happiness.

Seven basic emotion types are often distinguished; Fig. 2 illustrates six of them: fear, anger, sadness, happiness, disgust, and surprise [18]. The overall recognition system is shown in Fig. 1.

Fig. 1 Facial expression recognition system

Fig. 2 Six basic emotions [19]

The set of features chosen for extraction and the classifier used for classification are equally important in determining the performance of an emotion recognition system. With a poorly selected feature set, even a good classification algorithm sometimes cannot give a good result; selecting good features is therefore a prerequisite for high classification accuracy. Many algorithms for classifying emotions have been developed over the last three decades.

3 Various Methods of Emotion Detection

There are various methods for detecting emotions from facial expressions. Here we review some of these methods and algorithms.

3.1 Fuzzy Rule Method for Expression Recognition

Mufti and Khanam [1] proposed a fuzzy rule-based algorithm in which emotions can be detected even from partially matching features.

Step 1:

Input a video containing emotions.

Step 2:

Extract frames from the continuous video signal: the frame extraction module samples individual frames from the sequence at a fixed rate, yielding frames that are essentially 2-D gray-scale images.

Step 3:

Extract the feature points for facial expressions.

Step 4:

The facial animation parameter (FAP) extraction module calculates feature point (FP) movements and decides which particular FAPs have been activated.

Step 5:

Fuzzification converts the crisp inputs into linguistic variables by means of membership functions.

Step 6:

Detect the expression based on fuzzy parameters

This algorithm applies the fuzzy logic principle to recognize emotions in video images and gives high robustness: because fuzzy systems tolerate imprecision, the system is robust to various fluctuations in the image processing results. Reported recognition rates are 60% for joy, 80% for surprise, and 50% for disgust; surprise gives the highest accuracy among the emotions.
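
As a rough illustration of Steps 5 and 6, the sketch below fuzzifies a hypothetical normalized FAP value with triangular membership functions and fires a single invented rule; the breakpoints and the rule are not taken from the paper.

```python
# Illustrative fuzzification of a (hypothetical) normalized FAP value.
def triangular(x, a, b, c):
    """Triangular membership: rises from a, peaks at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzify(value):
    """Translate a crisp input into linguistic variables (Step 5)."""
    return {
        "small":  triangular(value, 0.0, 0.2, 0.4),
        "medium": triangular(value, 0.3, 0.5, 0.7),
        "large":  triangular(value, 0.6, 0.8, 1.0),
    }

# Step 6 (one invented rule): IF mouth opening is large AND eyebrow raise
# is large THEN the expression is surprise, to the degree of the fuzzy AND.
mouth, brows = fuzzify(0.85), fuzzify(0.7)
surprise = min(mouth["large"], brows["large"])
print(f"degree of surprise: {surprise:.2f}")
```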

3.2 3-D Facial Feature Distance Algorithm

To classify emotions on the face, Hamit and Demirel [9] proposed a 3-D facial feature distance algorithm. It recognizes the six basic expressions, using distance values to describe the emotions on the face, that is, anger, happiness, fear, sadness, neutral, and surprise. The data were captured from 60 subjects for each expression.

Step 1:

Feature extraction: the characteristic distance vectors are defined in a table containing the six distances.

Step 2:

The distance vectors are classified by a neural network (NN) trained using the back-propagation algorithm.

Step 3:

The sixth distance is used to normalize the other five distances

The drawback of this method is that the classification rate is low for the anger class because of confusion between the anger and neutral classes.
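
A minimal sketch of the distance-vector classification idea follows, assuming the six 3-D feature distances per face have already been measured; the data below are synthetic and the network size is an arbitrary choice.

```python
# Synthetic stand-in for six measured facial distances per subject.
import numpy as np
from sklearn.neural_network import MLPClassifier  # trained via backpropagation

rng = np.random.default_rng(0)
distances = rng.uniform(1.0, 2.0, (60, 6))   # 60 subjects x 6 distances
X = distances[:, :5] / distances[:, 5:]      # normalize five by the sixth
y = rng.integers(0, 6, size=60)              # six expression classes (fake)

clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
clf.fit(X, y)
print("predicted classes:", clf.predict(X[:3]))
```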

3.3 Algorithm for Feature Fusion to Detect Facial Emotions

In this algorithm, the authors Chen et al. [10] used a feature fusion nonlinear dimension reduction algorithm (HLAC + WPCA) as the feature extraction method for face recognition because of its ability to preserve the original geometry. It is followed by a simple minimum-distance classifier, which gives better results than principal component analysis-based methods.

Step 1:

Expressions are extracted using the integral projection method.

Step 2:

Extract the features using the higher-order autocorrelation (HLAC) method; dimension reduction is done using the weighted PCA method. The weighting values are updated using the Facial Action Coding System as the facial expression measurement system.

Step 3:

A minimum-distance classifier is used to recognize the facial expressions: surprise, anger, disgust, happy, sad, and fear

Database used: the image database consists of 336 images from the CMU-Pittsburgh Action Unit Coded Facial Expression database and includes six persons; 56 images are used per person, covering 7 expressions with 8 images per expression. A total of 168 images are used for training and the rest for testing.

The feature fusion nonlinear dimension reduction (FFNDR) algorithm is used for feature extraction, recognizing expressions while preserving the original local geometry.

Some researchers use powerful and complicated classification methods such as NN, SVM, and KFDA instead of minimum-distance classifiers to improve performance. Integrating local features can further improve the system, satisfying the demands of real-time, automatic applications and overcoming illumination and gesture changes in emotion recognition.
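
The sketch below gives a feel for this pipeline under simplifying assumptions: a tiny subset of higher-order autocorrelation features (the real HLAC mask set is much larger), plain feature whitening standing in for weighted PCA, and a minimum-distance classifier; all data are random stand-ins.

```python
import numpy as np

def hlac_features(img):
    """A tiny subset of higher-order autocorrelation features: mean
    products of the image with shifted copies of itself."""
    feats = [np.mean(img * img)]                 # zero-shift mask
    for dy, dx in ((0, 1), (1, 0), (1, 1)):
        a = img[: img.shape[0] - dy, : img.shape[1] - dx]
        b = img[dy:, dx:]
        feats.append(np.mean(a * b))
    return np.array(feats)

rng = np.random.default_rng(1)
train = {e: [hlac_features(rng.random((32, 32))) for _ in range(8)]
         for e in ("surprise", "anger", "disgust", "happy", "sad", "fear")}
means = {e: np.mean(v, axis=0) for e, v in train.items()}
scale = np.std([f for v in train.values() for f in v], axis=0)  # crude whitening

test = hlac_features(rng.random((32, 32)))
best = min(means, key=lambda e: np.linalg.norm((test - means[e]) / scale))
print("minimum-distance class:", best)
```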

3.4 Automated Emotion Detection Using Facial Expression Recognition

Andrew et al. [7] proposed an algorithm for facial expression recognition. This algorithm is as follows:

Step 1:

Video processing: the input video is sequenced into individual frames at a rate of 25 frames per second. Once the video is sequenced, the frames are moved to a queue.

Step 2:

Appearance and shape modeling: geometric landmarks on the face are used for automatic face tracking and registration.

Step 3:

Classification is carried out on expressions

Step 4:

After the expressions are classified at runtime, AFERS offers operators real-time output of the expression classification; additional functions such as snapshot generation and interrogation reporting are handled by the analytics engine

The AFERS system is used to support the interview process, but it does not perform detection directly; refinement and further research, along with contextual models, are required.

3.5 Facial Expression Classification Based on Singular Value Decomposition and Principal Component Analysis

In this paper, the authors Kaur et al. [8] proposed PCA with singular value decomposition (SVD) for classifying emotions.

Step 1:

Input image is given for pre-processing.

Step 2:

The extracted features are sent to the classifier. Images from the expression database are pre-processed to build a knowledge database, which is used for training and fed to the classifier.

Step 3:

The two images are compared, and the recognized expression gives the emotion to be detected

The database used for the emotion recognition system is JAFFE, which serves to measure the performance of recognition systems. It contains the seven fundamental facial expressions for emotion classification: sad, neutral, disgust, happy, fear, angry, and surprise, with 213 gray-scale images of size 256 × 256 each.

The advantage of this algorithm is that excellent classification results are found for all principal emotions along with the neutral emotion from the training database. The image is enhanced, localized, and its features are extracted by using singular value decomposition technique. The algorithm can effectively detect different emotions.

The main drawback is that errors caused by reflections in the images, that is, from persons wearing glasses, are not eliminated, although the algorithms used are computationally efficient.

The authors reported high accuracy: recognition rates were obtained for angry, disgust, happy, sad, and surprise, along with neutral. Among them, the happy expression gives 95% accuracy, a higher rate than existing algorithms. Finally, the network was tested on a real-time dataset with an excellent recognition rate.
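
As an illustration of the SVD step, the leading singular values of a face image can serve as a compact descriptor that is compared between images; the random matrices and the distance threshold below are placeholders.

```python
import numpy as np

def svd_signature(img, k=10):
    """Top-k singular values of a pre-processed 256 x 256 face image."""
    return np.linalg.svd(img / 255.0, compute_uv=False)[:k]

rng = np.random.default_rng(2)
face_a = rng.integers(0, 256, (256, 256)).astype(float)  # stand-in images
face_b = rng.integers(0, 256, (256, 256)).astype(float)
dist = np.linalg.norm(svd_signature(face_a) - svd_signature(face_b))
print(f"signature distance: {dist:.3f}")  # small distance => similar expression
```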

3.6 Neural Network Algorithm for Expression Recognition

Using this method, the authors Pushpaja et al. [6] presented a basic facial-gesture recognition arrangement and a generic face detection arrangement, described emotion classification, and proposed a system containing the following modules or steps:

Step 1:

The input image is taken from a webcam.

Step 2:

The optical flow method is used for face detection.

Step 3:

Image pre-processing: the image is normalized and noisy or unwanted information is removed. The eigenface library of database images is divided into a testing dataset and a training dataset. Eigenfaces are calculated from the training dataset; once the training phase is done, images are matched with the best eigenface, the one with the highest eigenvalues.

Step 4:

PCA is used for feature extraction.

Step 5:

Classification is done using a feed-forward back-propagation network

The advantage of this algorithm is that it solves the recognition problem in a constrained environment. The drawback is that it poses problems in an unconstrained environment.
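
For Step 2, the following is a rough sketch of dense optical flow between two consecutive webcam frames using OpenCV's Farneback implementation; the camera index and parameter values are illustrative choices.

```python
import cv2

cap = cv2.VideoCapture(0)                 # webcam input, as in Step 1
_, frame1 = cap.read()
_, frame2 = cap.read()
prev = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
curr = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

# Per-pixel motion vectors between the two frames; moving facial regions
# show up as areas of large flow magnitude.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, pyr_scale=0.5,
                                    levels=3, winsize=15, iterations=3,
                                    poly_n=5, poly_sigma=1.2, flags=0)
print("mean motion magnitude:", float(abs(flow).mean()))
cap.release()
```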

3.7 Multimodal Emotion Recognition in Response to Videos

Mohan et al. [2] performed experiments on extracting parameters from electroencephalograms (EEG) for multimodal emotion recognition with independent users, using pre-existing clips from online resources; comparison with the ground truth gives the efficiency of the recognition system.

Step 1:

The signal setup records the environment using an electroencephalogram.

Step 2:

Real-time expressions obtained from the sequence of frames are tracked using a face-tracking algorithm.

Step 3:

The data acquired in the step above are transferred to other computers and processed using the Internet Communications Engine (ICE) middleware.

Step 4:

The Candide-3 model was applied to obtain invariant facial features.

Step 5:

A Bayesian network was constructed for the action units (AUs) obtained from the above step.

Step 6:

Facial expressions were evaluated and the emotion was recognized

The advantage of the proposed system is that robust detection is possible and the classification rate improves. It gives a better result for the neutral emotion than for other emotional states, and it reduces errors in user detection by using less training data at the feature extraction stage.

The limitations of the system are that the parameters are complex in nature and change from person to person and with age. Light conditions, shadows, and noise can affect pupillary responses. Fusion classifiers are hard to design, and different videos and modalities are not directly comparable owing to the lack of proper metrics for measuring emotion using eye gaze.

3.8 Eigenvalue-Based Facial Emotion Classification

This method uses a hardware system known as a field-programmable gate array (FPGA), a digital device that supports the complex computations involved in finding the largest eigenvalue using the power method. The proposed algorithm of Sheily et al. [11] is as follows:

Step 1:

Acquire the image in text form.

Step 2:

Crop the image to a 16 × 16 pixel matrix, with every pixel in the range 0–255.

Step 3:

Apply the power method, implemented in Verilog, to calculate the largest eigenvalue.

Step 4:

A necessary simulation was performed.

Step 5:

Generate the bitstream (.bit) file to obtain the results.

Step 6:

Store the files in the FPGA ROM.

Step 7:

Use the simulated eigenvalue to detect the corresponding emotion

The advantages of the power method implementation on the FPGA kit are that it is mobile and flexible, practical to implement, easily configurable, and easily integrated into other systems. The methodology involves less mathematical complexity and less storage space, thereby promoting faster recognition.

A limitation of the algorithm is that FPGAs are expensive, making the approach cost-ineffective. Additional dimensions also involve tedious calculations.

The power method used to recognize emotions is implemented on a Xilinx Spartan 3E FPGA kit, which offers small size, configurability, flexibility, and mobility. The algorithm obtains the largest eigenvalues and compares them with the eigenvalues of the stored dataset. It produces results in both MATLAB and Verilog, classifying emotions such as angry, sad, and happy; compared with MATLAB, Verilog provides the results faster.
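
The core computation, finding the largest eigenvalue by the power method, looks like this in NumPy (the FPGA performs the same arithmetic in Verilog). The 16 × 16 block is random, and it is symmetrized here so the dominant eigenvalue is guaranteed to be real, a simplification on our part.

```python
import numpy as np

def largest_eigenvalue(A, iters=100):
    """Power method: repeatedly multiply and renormalize a vector."""
    v = np.ones(A.shape[0])
    for _ in range(iters):
        w = A @ v
        v = w / np.linalg.norm(w)
    return v @ A @ v          # Rayleigh quotient of the converged vector

block = np.random.default_rng(3).integers(0, 256, (16, 16)).astype(float)
sym = block @ block.T         # symmetrize so the power method converges
print("largest eigenvalue:", largest_eigenvalue(sym))
```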

3.9 Artificial Neural Network for Emotion Classification

The authors Siraj et al. [12] proposed the following algorithm:

Step 1: Load video or image

The picture of the face is taken as input. Image processing is then performed to convert the image into the desired color and resolution.

Step 2: Frames are detected.

Step 3: Feature extraction: The features are extracted using the Haar method, which detects images using contrast values rather than raw pixel intensities: relatively darker and lighter areas are determined from groups of pixels. Detection is carried out at various image sizes.

Step 4: Train classifier: A radial basis function neural network is used as the training classifier to train the input against the original (target) data. The classifier is trained on different videos and images; after training, it is tested against target data to identify facial expressions.

From this experiment, the authors obtained automated human emotion classification that successfully displays the emotion of an image uploaded by the user by matching it against the trained dataset. When the uploaded image matches the trained dataset, the system shows the output; otherwise it does not. The input to the system is an image of a human emotion or a video from the database; the output is the emotion of the uploaded image or of the frames taken from the video.
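
A brief sketch of Haar-based detection (Step 3) using OpenCV's bundled frontal-face cascade follows; the image path is a placeholder. Each Haar feature compares the summed intensity of adjacent darker and lighter rectangles, matching the contrast-based description above.

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
gray = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)

# Scan the image at multiple scales, since detection varies with size.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    print(f"face at ({x}, {y}), size {w} x {h}")
```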

3.10 Approximated Supervised Learning and Bezier Curve Approach for Facial Recognition

In this model, Dixit Manish and Silakari Sanjay [4] proposed facial recognition of images with human frontal faces, using approximation with Bezier curves to ensure curve fitting and smoothing of the structural features, which are then classified by a learning neural network. The proposed method was:

Step 1:

A frontal image is taken as input and converted to grayscale.

Step 2:

Histogram equalization, binarization, and image thinning are applied to pre-process the input image.

Step 3:

The Harris algorithm was used for corner detection.

Step 4:

Parametric Bezier curves were fitted using the control points.

Step 5:

The curve points were used to train and test a NN.

Step 6:

The back-propagation (BP) algorithm was used to train the network on the facial images.

Step 7:

For the recognition output, the accuracy of the network was tested using the trained images

The advantage is that the recognition rate is comparatively higher for this method, and a high proportion of images are recognized correctly. The image-thinning process reduces redundant pixels and also reduces cost.

The limitation of this method is that images with structural components and angled poses are difficult to detect.

Feature extraction and recognition techniques require longer evaluation and training time if proper feature points and suitable algorithms are not chosen. Here, suitable algorithms for edge and corner detection and feature extraction are used, hybridized in the training process, which reduces the error rate.
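
Step 4's curve construction can be illustrated with de Casteljau evaluation of a Bezier curve over control points; the four points below are invented stand-ins for detected Harris corners.

```python
import numpy as np

def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t by de Casteljau's algorithm."""
    pts = np.array(control_points, dtype=float)
    while len(pts) > 1:
        pts = (1 - t) * pts[:-1] + t * pts[1:]   # repeated linear interpolation
    return pts[0]

corners = [(0, 0), (1, 3), (3, 3), (4, 0)]       # e.g., points along a lip contour
curve = [bezier_point(corners, t) for t in np.linspace(0, 1, 20)]
print(curve[:3])                                 # first few smoothed points
```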

3.11 Contourlet Transformation and PSO

Li et al. [13] used the contourlet transform together with the spatial domain to create the feature vector, unlike existing systems based on local binary patterns or steerable pyramids that create the feature vector from the transform or spatial domain alone. The contourlet transform's directionality property helps extract important features. For the contourlet sub-bands, the authors proposed a new coefficient enhancement algorithm that enhances skin-region features to make the system more robust. They also tested feature-level fusion on multiple databases and showed that the face recognition rate was competitive. Action units are used for recognition and analysis by a classifier operating on video.

The first random forest detects action units, and the detected AUs are classified by a second random forest that detects expressions. On the first frame, face landmarks are generated by an active appearance model; the landmarks are then tracked across the sequence of frames in a video by a Lucas-Kanade optical tracker. A vector is created between the neutral and peak expressions. The first random forest detects action units from DNNP features, and these AUs are sent as input to the second random forest, which maps them to facial expressions. This method achieves an accuracy rate of 89.37% for the two-fold forest classifier.

A micro-genetic algorithm (mGA) embedded in particle swarm optimization generates the features. It avoids the local optimum problem and premature convergence by introducing a non-replaceable memory, a secondary swarm of five particles with one leader and four followers, velocity updating, sub-dimension-based regional facial feature searching, and global searching. For recognition, the features generated by the algorithm are classified with a multiclass SVM and an ensemble classifier for improved accuracy. The paper's LBP-based feature extraction surpassed most recent local binary pattern variants. For expression recognition, 100% accuracy was achieved on CK+ and 94.66% on the MMI database for the mGA-embedded PSO with the diverse classifier; the assessment was done over 30 trials.
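
A bare-bones PSO loop of the kind this mGA-embedded search builds on is sketched below, with a five-particle swarm echoing the leader-plus-four-followers arrangement; the quadratic objective is a toy stand-in for a feature-quality measure.

```python
import numpy as np

rng = np.random.default_rng(4)

def objective(x):                 # stand-in for a feature-quality score
    return np.sum(x ** 2)         # minimized at the origin

n, dim = 5, 2                     # five particles: one leader, four followers
pos = rng.uniform(-5, 5, (n, dim))
vel = np.zeros((n, dim))
pbest, pbest_val = pos.copy(), np.array([objective(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(100):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    # Velocity update: inertia + pull toward personal and global bests.
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos += vel
    vals = np.array([objective(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print("best position found:", gbest)
```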

3.12 Faster R-CNN Algorithm

Jiang Huaizu and Learned-Miller Erik [14] proposed an approach consisting of a region proposal network (RPN) and fast region-based CNN features. The RPN is used instead of a selective search algorithm, which is slow and time-consuming and affects the quality of the network.

Region proposal network: the output contains boxes/proposals obtained using regression and classifiers. The RPN predicts the probability of an anchor being foreground or background and refines the anchor.

ROI pooling: the regions obtained vary in size over the CNN feature maps. To simplify the problem, the pooling method reduces the feature maps to a uniform size: it divides each input region into a fixed number of sub-regions and applies max pooling to each.
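
A simplified NumPy illustration of ROI max pooling follows: a variable-sized region of a feature map is divided into a fixed grid whose cells are max-pooled, so every region yields an output of the same size.

```python
import numpy as np

def roi_max_pool(fmap, roi, out=(2, 2)):
    """fmap: H x W feature map; roi: (y0, x0, y1, x1); out: output grid."""
    y0, x0, y1, x1 = roi
    region = fmap[y0:y1, x0:x1]
    ys = np.linspace(0, region.shape[0], out[0] + 1, dtype=int)
    xs = np.linspace(0, region.shape[1], out[1] + 1, dtype=int)
    return np.array([[region[ys[i]:ys[i+1], xs[j]:xs[j+1]].max()
                      for j in range(out[1])] for i in range(out[0])])

fmap = np.arange(36).reshape(6, 6)
print(roi_max_pool(fmap, (1, 1, 5, 6)))   # a 4 x 5 region pooled to 2 x 2
```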

The dataset was provided by the Chinese Linguistic Data Consortium (CLDC) and consists of multimodal emotional video and audio data. The database has 66,486 images with eight emotions: 12,019 anxious, 9867 happy, 6174 worried, 4977 neutral, 2574 surprise, 10,862 angry, 18,326 sad, and 1687 disgust.

The model for this algorithm used Pascal_voc.model provided by py-faster-rcnn, together with the VGG_CNN_M_1024, ZF, and VGG16 networks of the faster_rcnn_alt_opt pipeline, to fine-tune the ImageNet model.

For the training set, sampling was biased towards positives because, compared with the background, they are extremely uncommon.

The results show that faster R-CNN identifies facial expressions easily. The original picture is used as the input to the whole network, so a separate hand-crafted feature extraction step is completely avoided; the network extracts features from the training dataset itself. The region proposal network generates efficient region proposals, and in each image the system locates the face region and recognizes the expression directly.

3.13 Emotion Detection in Virtual Learning Environments

Yang et al. [5] proposed a method for recognizing emotions from faces in a virtual learning model, consisting of feature extraction, feature subset selection, and an emotion classifier. The Haar method detects the face in the input image, and the six basic emotions are classified using an NN classifier. The JAFFE database gives high classification performance.

Step 1:

Input the frames

Step 2:

Acquire the facial image

Step 3:

Pre-processing: recognition accuracy becomes lower when the eye region is cropped and the characteristic values are grabbed directly, so a transformation is required to improve the accuracy of the system.

Step 4:

The Haar cascade method was used to identify faces in the image frames. If no face exists in a frame, the algorithm is applied repeatedly; if a face image exists, the mouth and eyes are located and cropped.

Step 5:

Filter and edge detection: mean and median filters are applied to smooth the image by removing unwanted noise. Edges are returned where the gradient of the input image is at a maximum

The authors used a database of 213 images of Japanese women's faces. Using the JAFFE database, the actual images and frontal faces are readjusted and trimmed.

This model solves the stated problems and gives high accuracy and efficiency, using emotion classification in virtual learning environments for facial recognition. The Haar method detects the mouth and eye regions, and a neural network identifies the emotions. The method can also be applied to real distance education.
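
A short sketch of Step 5's smoothing and edge detection using OpenCV follows; the file name, kernel sizes, and Canny thresholds are illustrative.

```python
import cv2

gray = cv2.imread("eye_region.png", cv2.IMREAD_GRAYSCALE)  # cropped region
smoothed = cv2.blur(gray, (3, 3))          # mean filter
smoothed = cv2.medianBlur(smoothed, 3)     # median filter removes speckle noise
edges = cv2.Canny(smoothed, 100, 200)      # edges where the gradient is maximal
print("edge pixels:", int((edges > 0).sum()))
```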

3.14 Recognition of Facial Expression Using Eigenvector-Based Distributed Features and Euclidean Distance

Jeemoni and Karen [15] proposed a method to recognize facial expressions using eigenvector-based distributed features and Euclidean distance, analyzing various parts of the face: the left eye, right eye, nose, lips, and nose and lips together. They implemented facial expression recognition in three steps:

Step 1:

Calculate the eigenvectors and eigenvalues of the covariance matrix of the facial images. With the available eigenvectors of the expressions, separate subspaces are created for all six universal expressions (surprise, happiness, fear, anger, sadness, and disgust).

Step 2:

Compute the Euclidean distances between the currently captured image and all six expression subspaces.

Step 3:

Count the votes for the different expressions; by the voting rule, the expression with the most votes is the result

They tested real-time facial expressions captured with a digital camera, analyzed the results, and obtained 95% accuracy. The drawback of this method is the computation time needed to identify a single facial expression; for more cases, it takes correspondingly more time.
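
A compact sketch of the subspace idea under simplifying assumptions: one PCA subspace per expression, with the test face assigned to the subspace giving the smallest reconstruction error (standing in for the paper's per-region distances and voting); all data below are random stand-ins.

```python
import numpy as np

def fit_subspace(faces, k=5):
    """Mean face plus top-k eigenvectors of one expression class."""
    mean = faces.mean(axis=0)
    _, _, vt = np.linalg.svd(faces - mean, full_matrices=False)
    return mean, vt[:k]

def reconstruction_error(x, mean, basis):
    c = basis @ (x - mean)                      # project into the subspace
    return np.linalg.norm((x - mean) - basis.T @ c)

rng = np.random.default_rng(5)
subspaces = {e: fit_subspace(rng.random((20, 64))) for e in
             ("surprise", "happiness", "fear", "anger", "sad", "disgust")}
test = rng.random(64)
print(min(subspaces, key=lambda e: reconstruction_error(test, *subspaces[e])))
```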

3.15 Emotion Detection Algorithm Using Frontal Face Image

Hwan et al. [17] proposed a new emotion detection algorithm using frontal facial images. The algorithm has three stages: image processing, facial feature extraction, and emotion detection. They implemented a fuzzy classifier to obtain the emotions: first, a fuzzy color filter and histogram analysis are used to extract the face region from the facial image, and then the fuzzy classifier classifies the emotion from the extracted features.

They got an accuracy of 82.7% using this new method.

4 Conclusion

In this article, some of the methods and algorithms for recognizing emotion from the face have been reviewed. These methods involve recognition, feature extraction, and face categorization. Many of the approaches give good recognition, and techniques with higher recognition rates can deliver greater performance. These approaches provide a workable solution for expression recognition in well-constrained environments. Emotion detection remains a complex issue owing to the psychological nature of emotions.

We have not considered emotion detection from speech, as it requires an in-depth study of its own and is itself complex. Research in this field will therefore remain under continuous study for years to come: many problems have to be solved to create an ideal user interface, and improved recognition of easily confused emotions is required.