Abstract
Automatic emotion classification has been studied from many different approaches. Previous research shows that performance similar to that of humans can be achieved by an adequate combination of modalities and features. Nevertheless, large amounts of training data seem necessary to reach that level of accuracy. Labelling training, validation and test sets is generally a difficult and time-consuming task that restricts the experiments. Therefore, to reduce annotation costs, this work studies self-training and active-training methods and their performance in emotion classification from speech data. The results are compared, using confusion matrices, with human perception capabilities and with supervised training experiments, yielding similar accuracies.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Esparza, J., Scherer, S., Schwenker, F. (2012). Studying Self- and Active-Training Methods for Multi-feature Set Emotion Recognition. In: Schwenker, F., Trentin, E. (eds) Partially Supervised Learning. PSL 2011. Lecture Notes in Computer Science, vol 7081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28258-4_3
DOI: https://doi.org/10.1007/978-3-642-28258-4_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28257-7
Online ISBN: 978-3-642-28258-4
eBook Packages: Computer Science (R0)