
1 Introduction

Facial expressions that convey emotional intent are significant factors in human communication. Different facial expressions depict different human emotions, each distinct from the others. Various surveys suggest that more than 60% of human communication is non-verbal. Among the several non-verbal components, facial expressions are one of the key data channels in interpersonal communication, since they showcase the meaning of emotions.

Automatic facial emotion recognition (FER) is growing rapidly in artificial intelligence applications such as human–computer interaction (HCI), virtual reality (VR), augmented reality (AR), advanced driver assistance systems (ADASs), and entertainment.

Conventional FER approaches are composed of three major steps: (1) face and facial component detection, (2) feature extraction, and (3) expression classification. First, the face is detected in the input image, and landmarks or facial components are located within the face region. Second, various features are extracted from the facial components. Third, pre-trained expression classifiers produce the recognition results from the extracted features.

2 Literature Survey

Ferreira [1] and Cowie [2] originated the technology of recognizing facial expressions (known as FER), which went on to become one of the predominant topics of research with applications in a vast number of fields. Because present FER datasets are too small, the most difficult challenge is training deep network algorithms on them. Transfer learning can be used as a substitute, but the accuracy of such models still falls short of their actual potential. For that reason, they presented an end-to-end neural network architecture along with a meaningful loss function based on the prior knowledge that facial expressions result from the motion of facial muscles and other parts of the face. The loss function guides the whole learning process so that the network is able to learn the emotion and expression attributes of faces. The model remains accurate in both laboratory-controlled and unconstrained environments, and the presented network is widely used in ongoing technologies because it gives the most effective outputs.

Yang [3], Verma [4], and Hsu [5] proposed approaches for recognizing facial expressions (FER), an important factor in enabling machines to understand changes in human emotions. Accurately extracting results from facial expressions is laborious for images with a low brightness scale, partial faces, and various other shortcomings. Consequently, attributes that can draw out the correct outputs need to be put into action. Method: A weighted mixture deep neural network (WMDNN) was presented to extract attributes from facial images automatically, which makes FER easier to perform. FER is heavily restricted by the many required pre-processing approaches, such as face detection, rotation, and data augmentation.

Appearance-related attributes of face images are extracted by fine-tuning a partial VGG16 network, whose parameters are initialized from a VGG16 model trained on the ImageNet dataset. The results of all channels are fused in a weighted manner, and the final recognition outputs are predicted with SoftMax classification. Outputs: Experimental results show that the presented algorithm can identify four basic facial expressions, namely happiness, sadness, neutral, and surprise, with high efficiency.
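A minimal sketch of this fine-tuning strategy is shown below, assuming a Keras implementation: ImageNet weights are reused, the early convolutional blocks are frozen, and a new four-class softmax head is trained. The input size, number of frozen layers, and dense-layer width are illustrative assumptions, not values from the surveyed papers.

# Minimal sketch: fine-tune a partial VGG16 (ImageNet weights) for
# four-class expression recognition, in the spirit of the WMDNN approach.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(48, 48, 3))
for layer in base.layers[:-4]:          # freeze all but the last few layers
    layer.trainable = False

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(4, activation="softmax"),  # happiness, sadness, neutral, surprise
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])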

3 Materials and Methods

Identification of facial expression is divided into pre-processing, face and facial component detection, feature extraction, and classification, as shown in Fig. 1.

Fig. 1

Identification of facial expression flow diagram

  A. Pre-processing: This step improves the quality of the image. It retains only the essential data while removing noise and redundant details, and it also facilitates normalization and filtering.

  B. Face and Facial Component Detection: The components of the face are recognized along with other key facial features. The recognized features are then mapped using tools such as FE classifiers, SVM, and AdaBoost.

  C. Feature Extraction: This is the most interesting part of the process. Features such as color, motion, and shape are extracted. Compared with the original image, the extracted representation carries less but more precise data and hence needs less storage.

  D. Classification: This stage takes the output of the previous stage as input and groups the recognized features into classes based on certain parameters. It is comparatively more difficult than the other stages since it is influenced by many other factors. A minimal code sketch of these four stages follows this list.
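The sketch below strings the four stages together with OpenCV. It is a minimal illustration under stated assumptions, not the exact implementation: the grayscale-plus-histogram-equalization pre-processing, the (48, 48) crop size, and the `classifier` callable standing in for the trained stage-D model are all assumptions; only the Haar cascade file is named in the text.

# Minimal sketch of the four pipeline stages in Fig. 1.
import cv2

# Stage B: OpenCV's bundled pre-trained Haar cascade for frontal faces.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def recognize(image_bgr, classifier):
    # Stage A: pre-processing -- grayscale conversion drops redundant
    # colour detail; histogram equalization normalizes illumination.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)

    # Stage B: face and facial component detection.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

    results = []
    for (x, y, w, h) in faces:
        # Stage C: feature extraction -- crop the face region and resize it
        # to a fixed (48, 48) patch: less data, but the precise region.
        roi = cv2.resize(gray[y:y + h, x:x + w], (48, 48))

        # Stage D: classification -- `classifier` is a hypothetical trained
        # model (e.g., SVM, AdaBoost, or a CNN) mapping the patch to a label.
        results.append(classifier(roi))
    return results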

4 Classification Utilizing Deep Learning

A convolutional neural network (CNN) is a feed-forward artificial neural network whose design is inspired by the organization of the biological human brain. Like the human neural system, it consists of neurons with weights and biases, and the output of each neuron is fed as input to the neurons in the next layer. The multiple hidden layers between the input and output layers progressively extract the most refined features, which is what yields highly accurate and efficient results, as shown in Fig. 2.

Fig. 2

Convolutional neural network basic layout

  • Input Layer: FER uses the haarcascade_frontalface_default.xml file from OpenCV, which contains a pre-trained face detector that is used to find and crop the face in the input image.

  • Hidden Layer: The NumPy array is passed into the hidden layer, where the number of filters is specified. Each filter has a (3, 3) receptive field. The convolution produces a feature map that characterizes how the pixel values respond to each filter.

  • Output Layer: The output is a probability for each expression class, through which the model demonstrates the feature configuration of the expressions on the face. A minimal model sketch follows this list.
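The Keras sketch below illustrates this layout. The 48 × 48 grayscale input size, the filter counts, and the dense-layer width are illustrative assumptions; only the (3, 3) receptive fields, the softmax-style probability output, and the five expression classes of the results section come from the text.

# Minimal sketch of the CNN layout in Fig. 2.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),               # cropped grayscale face
    layers.Conv2D(32, (3, 3), activation="relu"),  # (3, 3) receptive fields
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(5, activation="softmax"),         # one probability per class
])
# model.predict(batch) returns, for each face, a probability for each of the
# five expression classes (happy, neutral, sad, surprised, angry).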

5 Results and Discussions

A simple facial expression recognition system is proposed. The proposed system was implemented in three steps: face component detection, feature extraction, and classification. The results of this project are compared with the training and testing examples, which are already given as sample inputs with similar expressions. The experimental results, produced with our own students in the department, are shown in Figs. 3, 4, 5, 6 and 7.

Fig. 3

Happy face

Fig. 4

Neutral face

Fig. 5

Sad face

Fig. 6

Surprised face

Fig. 7

Angry face

Since the person in Fig. 3 is smiling, the system recognizes the pixels around his mouth and eyes and returns a happy face. The facial expression in Fig. 4 is detected as neutral since the person is not making any recognizable expression. This state is displayed only when a person does not express any reaction during a particular time interval.

By calculating the pixel values in the given person's expression, the system concludes that the person is in a sad mood, as shown in Fig. 5. The screenshot displays the sad expression inside a square box, which indicates that only the fixed square region is considered for the expression calculation. The person in Fig. 6 is expressing a surprised reaction to a certain incident. This expression is displayed when a person reacts with enlarged eyes and a surprised form of reaction.

In Fig. 7, the person is reacting in an angry state. The changes in the pixels surrounding the eyes are evaluated in the person's expression, leading to the conclusion that the person is in an angry mood.

The final result gives the expression of the person, counts the appearances of each particular expression, and calculates the percentage of the most frequently appearing expression from the total count and the count of each individual expression, as shown in Fig. 8; the final experimental result is shown in Table 1.

Fig. 8

Experimental result

Table 1 Expression calculation in terms of percentage
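A minimal sketch of this tally is shown below, assuming the per-frame predicted labels are collected in a list; the percentage computation mirrors the layout of Table 1, and the sample labels are hypothetical.

# Minimal sketch: count how often each expression appears over a session
# and report each expression's share of the total, as in Table 1.
from collections import Counter

def expression_percentages(predicted_labels):
    counts = Counter(predicted_labels)           # e.g. {"happy": 12, ...}
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}

# Hypothetical per-frame predictions collected during a run:
labels = ["happy", "happy", "neutral", "happy", "sad"]
shares = expression_percentages(labels)
print(max(shares, key=shares.get), shares)       # dominant expression and shares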

6 Conclusion

In this paper, we have addressed the task of facial expression recognition, aiming to classify face images into five discrete emotion categories that represent universal human emotions. We experimented with various facial expression recognition (FER) techniques and achieved our highest accuracy with a CNN trained from scratch with four convolutional layers. The final result calculates the average time spent on the different expressions and returns the highest value.

Future work

In the future, a mobile application will be developed to recognize facial expressions and also to detect gender during facial expression recognition. Finally, all of these records will be shown in the form of a graph for better understanding and higher accuracy.