Abstract
Affective computing has been an active area of research for the past two decades. One of the major components of affective computing is automatic emotion recognition. This chapter gives a detailed overview of different emotion recognition techniques and the predominantly used signal modalities. The discussion starts with the different emotion representations and their limitations. Given that affective computing is a data-driven research area, a thorough comparison of standard emotion-labelled databases is presented. Feature extraction and analysis techniques for emotion recognition are then presented, organised by the source of the data. Further, applications of automatic emotion recognition are discussed, along with current and important issues such as privacy and fairness.
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Sharma, G., Dhall, A. (2021). A Survey on Automatic Multimodal Emotion Recognition in the Wild. In: Phillips-Wren, G., Esposito, A., Jain, L.C. (eds) Advances in Data Science: Methodologies and Applications. Intelligent Systems Reference Library, vol 189. Springer, Cham. https://doi.org/10.1007/978-3-030-51870-7_3
Print ISBN: 978-3-030-51869-1
Online ISBN: 978-3-030-51870-7
eBook Packages: Intelligent Technologies and Robotics