Abstract
Facial expression recognition (FER) is considered a primary cue for human emotion detection. Conventional and deep learning-based techniques are the two most commonly proposed approaches to FER. Conventional approaches depend less on data and hardware, but they require manual feature extraction, which rules out end-to-end learning. Deep learning-based methods demand high computation power, yet they have shown outstanding accuracy for FER. For deep learning-based models, specifically convolutional neural networks (CNNs), performance depends heavily on the configuration of hyperparameters. Studies have shown that the accuracy of CNNs for image classification increased by 10% with hyperparameter tuning. Hence, FER classifier performance can be improved further by hyperparameter optimization. Owing to their stochastic nature, metaheuristic optimizers have been shown to offer better results than trial-and-error optimizers. The present study uses differential evolution (DE) for hyperparameter tuning. The proposed model is trained and tested on two benchmark datasets. The results show an improvement of about 4% in classification accuracy.
1 Introduction
Human emotion recognition has been an active research area for the past few years, owing to the increasing demand for applications in perceptual and cognitive sciences and affective computing. It has become an essential component of fields such as computer animation, sociable robots, and neuromarketing. Human emotions can be recognized from facial expressions and vocal tones. According to Kaulard et al. [1], nonverbal components convey two-thirds of human communication, while verbal components convey only one-third. Various kinds of data, including physiological signals such as electromyograph (EMG), electrocardiogram (ECG), and electroencephalograph (EEG) recordings, can also be considered as input for the emotion recognition process. Among these, the facial image is a promising input type, as it is noninvasive and provides ample information for expression recognition. Emotions can be categorized into three types: basic emotions (BEs), compound emotions (CEs), and micro-expressions (MEs). Basic emotions cover neutral, anger, disgust, fear, surprise, sadness, and happiness.
Two categories of approaches to facial expression recognition (FER) are in use: conventional approaches and deep learning-based approaches. Compared with deep learning-based techniques, conventional techniques require less computational power, so no additional infrastructure is needed. However, input images with illumination changes, occlusion, or head deflection may degrade face detection and reduce the accuracy of FER, and conventional techniques are not suitable for such noisy input data. Deep learning-based techniques address these issues. Recently, convolutional neural networks (CNNs) have proven effective for face detection [2]. Because CNNs contain deep layers and use elaborate designs, they can handle noisy data automatically [3]. CNNs have been shown to perform better than conventional methods for the FER task [4, 5]. The performance of a CNN depends heavily on the choice of its hyperparameters. It is possible to enhance the CNN’s performance by optimizing hyperparameters such as the number of hidden layers, units in each layer, filters, size of the filter, batch size, and learning rate. The present work considers the optimization of hyperparameters that describe the CNN structure. Grid search and random search techniques are commonly used for this purpose [6]. Each of these techniques has its limitations, and both need more time and domain expertise to identify ideal hyperparameter values. Metaheuristic-based approaches can address these shortcomings as they are stochastic approximation methods. The present work employs the differential evolution (DE) algorithm for tuning the selected hyperparameters.
2 Related Work
Kim et al. [7] proposed training multiple CNNs and showed an improvement in training by varying the network topology and the random weight initialization. An interesting method for selecting the CNN structure was presented by Gao et al. [8], who proposed gradient priority particle swarm optimization (GPSO) with gradient penalties for tuning the CNN architecture. Their experimental results showed that the method achieved competitive prediction performance for the emotion recognition task. Bergstra and Bengio [6] compared grid search and random search for tuning hyperparameters and found random search to be the more efficient of the two. Because the number of hyperparameters is large, such testing is computationally expensive. Snoek et al. [9] addressed the limitations of trial-and-error techniques for hyperparameter optimization by proposing a Bayesian optimization framework. Bochinski et al. [10] showed that evolutionary algorithms can outperform existing hyperparameter optimization methods.
3 Methodology
The benchmark dataset for facial expression recognition is split into a training set (TS) and a testing set (TE). To ensure that samples of all classes are selected, stratified sampling without replacement is used. Samples selected from TS form the tuning set (TUS), which is further divided into TUS1 and TUS2. TUS1 is used for hyperparameter optimization, and TUS2 is used for validating the outcome of the optimization. Differential evolution is run until the termination condition is met. The CNN is then trained on TS using the hyperparameters found by DE. The holdout method is used for assessing the performance of the trained model: after training, the model is evaluated on TE. Table 1 specifies the architecture of the convolutional neural network used in the present work (Fig. 1).
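The splitting scheme described above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in the paper: the helper `stratified_split`, the toy labels, and the 50% and 80% fractions used to form TUS and TUS1 are assumptions made for the example. Only the 70/30 division between TS and TE is taken from the experimental setup.

```python
import random
from collections import defaultdict

def stratified_split(indices, labels, fraction, rng):
    """Split `indices` into (selected, rest), drawing `fraction` of each
    class without replacement so that every class is represented."""
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    selected, rest = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        k = max(1, round(fraction * len(members)))  # at least one per class
        selected.extend(members[:k])
        rest.extend(members[k:])
    return sorted(selected), sorted(rest)

rng = random.Random(0)
labels = [i % 6 for i in range(120)]       # toy data: 6 classes, 20 samples each
all_idx = list(range(len(labels)))

ts, te = stratified_split(all_idx, labels, 0.7, rng)   # training / testing sets
tus, _ = stratified_split(ts, labels, 0.5, rng)        # tuning set (fraction assumed)
tus1, tus2 = stratified_split(tus, labels, 0.8, rng)   # optimize on TUS1, validate on TUS2
```

Because every split is drawn per class, each of the six toy classes appears in TS, TE, TUS1, and TUS2.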
Hyperparameter Tuning—Metaheuristic optimization techniques proved to yield better results when the search space is large and complex [11]. Since the number of hyperparameters is large in CNN, tuning them is computationally expensive. Hence, the proposed model determines the optimal network topology by using the differential evolution (DE) algorithm. A simple, yet powerful, population-based stochastic search technique, differential evolution (DE) [12], has gained much attention and a wide range of successful applications [13, 14], due to its simplicity, ease in the implementation, and quick convergence.
The hyperparameters considered for tuning using DE include number of convolutional layers, filter size, stride, dropout rate, and batch size. A vector comprising the above-mentioned parameters is used as a chromosome for the DE algorithm. Precision and recall values for each of the six basic emotions are calculated by using the confusion matrix. Fitness function is defined as F = AvgPrec + AvgRec, where AvgPrec is the average of precision values computed for each basic emotion. Likewise, AvgRec is the average of recall values. DE aims to improve the existing solution using the techniques of mutation, recombination, and selection. The general paradigm of differential evolution is shown in Fig. 2.
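As a rough illustration of the fitness function F = AvgPrec + AvgRec, the sketch below derives per-class precision and recall from a confusion matrix. It assumes that rows hold the true classes and columns the predicted classes, and that a class with no predictions or no samples contributes zero; neither convention is stated in the text, and the function name `fitness` is hypothetical.

```python
def fitness(confusion):
    """Return AvgPrec + AvgRec for a square confusion matrix where
    confusion[i][j] counts samples of true class i predicted as class j."""
    n = len(confusion)
    precisions, recalls = [], []
    for c in range(n):
        true_pos = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(n))  # column sum
        actual = sum(confusion[c])                          # row sum
        precisions.append(true_pos / predicted if predicted else 0.0)
        recalls.append(true_pos / actual if actual else 0.0)
    avg_prec = sum(precisions) / n
    avg_rec = sum(recalls) / n
    return avg_prec + avg_rec

perfect = fitness([[5, 0], [0, 5]])   # every sample classified correctly
mixed = fitness([[3, 1], [1, 3]])     # one error in each class
```

Under these assumptions the value ranges from 0 to 2, with 2 corresponding to a classifier that makes no errors.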
Initialization—Creation of a population of individuals. The ith individual vector (chromosome) of the population at current generation t with d dimensions is as follows
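One possible way to realize this step is sketched below, treating each individual as a vector Z_i(t) = [Z_i,1(t), Z_i,2(t), ..., Z_i,d(t)] with one gene per hyperparameter. The search ranges in `BOUNDS`, the rounding of integer-valued genes, and the function names are all hypothetical; the text does not state the bounds or the encoding that were actually used.

```python
import random

# Hypothetical search ranges, one per tuned hyperparameter.
BOUNDS = {
    "conv_layers": (1, 4),
    "filter_size": (2, 7),
    "stride": (1, 3),
    "dropout": (0.1, 0.6),
    "batch_size": (16, 128),
}
INTEGER_GENES = {"conv_layers", "filter_size", "stride", "batch_size"}

def init_population(size, rng):
    """Create `size` chromosomes, each gene drawn uniformly within its bounds."""
    return [[rng.uniform(lo, hi) for lo, hi in BOUNDS.values()]
            for _ in range(size)]

def decode(chromosome):
    """Map a real-valued chromosome to usable hyperparameter values."""
    params = {}
    for (name, (lo, hi)), gene in zip(BOUNDS.items(), chromosome):
        gene = min(max(gene, lo), hi)  # clip genes that left the search range
        params[name] = int(round(gene)) if name in INTEGER_GENES else gene
    return params

population = init_population(10, random.Random(42))
example = decode([2.6, 3.2, 1.4, 0.25, 63.7])
```

Keeping the chromosome real-valued and rounding only when decoding lets the arithmetic operators of DE work unchanged on discrete hyperparameters.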
Mutation—A random change of the vector Zi components. For each individual vector Zk(t) that belongs to the current population, a new individual, called the mutant individual, U is derived through the combination of randomly selected and pre-specified individuals.
where the indices m, n, i, and j are uniformly random integers that are mutually different and distinct from the current index k, and F is a real positive parameter called the mutation factor or scaling factor (usually F ∈ [0, 1]); this F is distinct from the fitness function F defined earlier.
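The mutation formula itself does not appear in the text, so the exact variant is unknown. Purely for illustration, the sketch below assumes a form with two scaled difference vectors added to the best individual, U_k = Z_best + F (Z_m - Z_n) + F (Z_i - Z_j), because that form uses four random indices. The function name `mutate` and its arguments are hypothetical, and the population needs at least five members.

```python
import random

def mutate(population, k, best, scale, rng):
    """Build a mutant for individual k from the best individual plus two
    scaled difference vectors (an assumed variant, see the note above)."""
    candidates = [idx for idx in range(len(population)) if idx != k]
    m, n, i, j = rng.sample(candidates, 4)  # mutually different, all != k
    base = population[best]
    return [
        base[d]
        + scale * (population[m][d] - population[n][d])
        + scale * (population[i][d] - population[j][d])
        for d in range(len(base))
    ]

rng = random.Random(1)
identical = [[1.0, 2.0] for _ in range(6)]
mutant = mutate(identical, k=0, best=3, scale=0.5, rng=rng)
```

When all individuals coincide, every difference vector is zero and the mutant equals the base vector, which is why DE stops exploring once the population has converged.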
Recombination (Crossover)—Merging the genetic information of two or more parent individuals to produce one or more descendants. Binomial crossover is used in the present work. The binomial (or uniform) crossover is performed on each component n (n = 1, 2, …, d) of the mutant individual Uk,n(t + 1). For each component, a random number r in the interval [0, 1] is drawn and compared with the crossover rate or recombination factor CR (another DE control parameter), CR ∈ [0, 1]. If r < CR, the nth component of the mutant individual Uk,n(t + 1) is selected; otherwise, the nth component of the target vector Zk,n(t) becomes the nth component of the offspring.
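The component-wise rule can be sketched as follows. The function name `binomial_crossover` is hypothetical, and the code follows the description literally: each component is taken from the mutant when r < CR and from the target otherwise.

```python
import random

def binomial_crossover(target, mutant, cr, rng):
    """Pick each component from the mutant with probability cr,
    otherwise keep the component of the target vector."""
    return [
        mutant[n] if rng.random() < cr else target[n]
        for n in range(len(target))
    ]

rng = random.Random(7)
target = [1.0, 2.0, 3.0, 4.0]
mutant = [9.0, 8.0, 7.0, 6.0]

all_mutant = binomial_crossover(target, mutant, 1.0, rng)  # r < 1 always holds
all_target = binomial_crossover(target, mutant, 0.0, rng)  # r < 0 never holds
mixed = binomial_crossover(target, mutant, 0.5, rng)
```

Many DE implementations additionally force one randomly chosen component to come from the mutant so that the offspring never duplicates its parent; that safeguard is not mentioned in the text and is left out here.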
Selection—Choice of the best individuals for the next cycle. If the new offspring yields a better value of the objective function, it replaces its parent in the next generation; otherwise, the parent is retained in the population, i.e.,
where f is the objective function to be minimized. DE is a powerful population-based heuristic search technique that has empirically proven to be very robust for global optimization over continuous spaces. Because DE has very few control parameters compared to other algorithms, it is effective and efficient and can be treated as a widely applicable approach for solving real-world problems [13, 14].
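The greedy selection rule can be sketched as below. Since the fitness F = AvgPrec + AvgRec grows as the classifier improves while f is described as a function to be minimized, the sketch assumes f = -F as one way to reconcile the two; that choice, the strict comparison, and the names `select`, `sphere`, and `as_minimization` are assumptions made for this example.

```python
def select(parent, offspring, objective):
    """Keep the offspring only if its objective value is strictly lower."""
    return offspring if objective(offspring) < objective(parent) else parent

def sphere(z):
    """Toy objective to minimize: sum of squared components."""
    return sum(x * x for x in z)

def as_minimization(fitness_fn):
    """Turn a fitness to be maximized into an objective to be minimized."""
    return lambda z: -fitness_fn(z)

kept_offspring = select([2.0, 2.0], [1.0, 0.0], sphere)  # offspring is better
kept_parent = select([1.0, 0.0], [2.0, 2.0], sphere)     # parent is retained

toy_fitness = lambda z: z[0]  # stand-in for AvgPrec + AvgRec
best = select([0.2], [0.9], as_minimization(toy_fitness))
```

Because a parent is replaced only by a better offspring, the best objective value in the population can never get worse from one generation to the next.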
4 Experimentation
For experimentation, two benchmark datasets, CK+ and Japanese Female Facial Expressions (JAFFE), are used.
CK+ Dataset: This dataset has 593 image sequences representing seven basic expressions (happiness, sadness, surprise, disgust, fear, anger, and neutral) of 123 models. Since the work focuses on recognizing six basic expressions, neutral expression images were ignored. Of the 593 sequences, 309 have validated emotion labels that belong to one of the six previously mentioned emotions; only these were selected. From each image sequence, the last two frames were selected, yielding a dataset of 618 images.
Japanese Female Facial Expressions (JAFFE): The JAFFE dataset has 213 images of ten female Japanese models. Each image represents one of the seven basic emotions (including neutral emotion). Images pertaining to neutral expression are not used.
The proposed model is implemented using Keras with a TensorFlow back end in Python 3.6. Experiments are conducted on the selected datasets. Seventy percent of the samples are used for training, and the remaining 30% are used for testing. For validating hyperparameter tuning, 20% of the samples from the training dataset are used. The samples are selected by using stratified sampling. Tables 2, 3, 4, and 5 show the confusion matrices of the two datasets with and without hyperparameter tuning. Prediction accuracies for the CK+ and JAFFE datasets are depicted in Fig. 3. For both datasets, optimization of hyperparameters has improved the accuracy of all basic emotions except fear, which benefits least from optimization; for the JAFFE dataset, its accuracy decreases by 1%. The proposed model has improved the overall classification accuracy by 4.32% for the CK+ dataset and 3.78% for the JAFFE dataset.
5 Conclusion
The present study proposes to optimize the convolutional neural network hyperparameters for improving the human emotion recognition rate from facial expressions. Conventional techniques fail to offer good classification accuracy for noisy input data. As CNNs contain deep layers, they can handle noisy data and are proven suitable for facial expression recognition. However, CNNs demand high computation power, which limits their applicability. The performance of a CNN depends heavily on the choice of its hyperparameters. To enhance the CNN performance for facial expression recognition, its hyperparameters are optimized using the DE algorithm. CK+ and JAFFE datasets are used for assessing the tuned model’s performance. The results show that hyperparameter tuning improved the overall accuracy by about 4% (4.32% on CK+ and 3.78% on JAFFE).
References
Kaulard K, Cunningham D, Bülthoff H, Wallraven C (2012) The MPI facial expression database—a validated database of emotional and conversational facial expressions. PLoS ONE 7(3):e32321
Li H, Lin Z, Shen X, Brandt J, Hua G (2015) A convolutional neural network cascade for face detection. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 5325–5334
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444
Cireşan DC, Meier U, Masci J, Gambardella LM, Schmidhuber J (2011) Flexible, high performance convolutional neural networks for image classification. In: Proceedings of the twenty-second international joint conference on artificial intelligence (IJCAI'11), vol 2, AAAI Press, Barcelona, Catalonia, Spain, pp 1237–1242
Ko BC (2018) A brief review of facial emotion recognition based on visual information. Sensors 18(2):401
Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13:281–305
Kim BK, Roh J, Dong S-Y, Lee S-Y (2016) Hierarchical committee of deep convolutional neural networks for robust facial expression recognition. J Multimodal User Interfaces 10(2):173–189
Gao Z, Li Y, Yang Y, Wang X, Dong N, Chiang HD (2020) A GPSO-optimized convolutional neural networks for EEG-based emotion recognition. Neurocomputing 380:225–235
Snoek J, Larochelle H, Adams RP (2012) Practical Bayesian optimization of machine learning algorithms. Adv Neural Inf Process Syst 2951–2959
Bochinski E, Senst T, Sikora T (2017) Hyper-parameter optimization for convolutional neural network committees based on evolutionary algorithms. In: Proceedings of the 2017 IEEE international conference on image processing (ICIP), Beijing, China, pp 3924–3928
Radhika S, Chaparala A (2018) Optimization using evolutionary metaheuristic techniques: a brief review. Brazilian J Oper Prod Manage 15(1):44–53
Price K, Storn R (1995) Differential evolution—a simple and efficient adaptive scheme for global optimization over continuous spaces. International Computer Science Institute, Berkeley. Berkeley, CA
Sajja R, Rao CS (2014) A new multi-objective optimization of master production scheduling problems using differential evolution. Int J Appl Sci Eng 12(1):75–86
Radhika S, Rao CS, Pavan KK (2013) A differential evolution based optimization for Master production scheduling problems. Int J Hybrid Inf Technol 6(5):163–170
Acknowledgements
The author wishes to thank the management of RVR&JC College of Engineering, for funding the present work.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Chaparala, A. (2022). Human Emotion Detection Using Convolutional Neural Networks with Hyperparameter Tuning. In: Raje, R.R., Hussain, F., Kannan, R.J. (eds) Artificial Intelligence and Technologies. Lecture Notes in Electrical Engineering, vol 806. Springer, Singapore. https://doi.org/10.1007/978-981-16-6448-9_42
DOI: https://doi.org/10.1007/978-981-16-6448-9_42
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-6447-2
Online ISBN: 978-981-16-6448-9
eBook Packages: Computer Science, Computer Science (R0)