Abstract
For many tasks in computer vision, producing ground-truth data is essential. At present, this is mostly done manually. Manual data labeling is labor-intensive and prone to human error, and the training data it produces often lacks both quantity and quality. Fully automatic data labeling, on the other hand, is neither feasible nor reliable. In this paper, we propose an interactive image labeling technique for efficient and accurate data labeling.
The proposed technique consists of two parts: an automatic labeling part and a human intervention part. Built on a Bayesian network, the automatic image labeler produces an initial labeling of the image. A person then examines the initial labeling and makes minor corrections. The selected human corrections and the image measurements are then integrated by the Bayesian network framework to produce a refined labeling. To minimize human involvement, we develop an active user feedback strategy that determines the optimal user feedback, so that labeling errors in the subsequent re-labeling process are maximally reduced. The proposed framework combines the advantages of human input with those of the machine to achieve reliable, accurate, and efficient data labeling. We demonstrate the validity of the proposed framework for interactive labeling of facial action units. The methodology, however, is not limited to facial action units; it can be readily extended to other areas such as interactive image segmentation.
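The active feedback loop described above can be illustrated with a minimal sketch. The abstract does not state the selection criterion or the network structure, so everything here is an assumption made for illustration: the three-label chain in `toy_joint`, the helper names, and the use of expected remaining marginal entropy as a stand-in for "maximally reduced labeling errors". A user correction is treated as hard evidence, and the next query is the label whose correction is expected to leave the least uncertainty in the other unlabeled labels.

```python
import itertools
import math


def toy_joint():
    """Hypothetical joint over three binary labels forming a chain a0 -> a1 -> a2."""
    joint = {}
    for a0, a1, a2 in itertools.product((0, 1), repeat=3):
        p = 0.5                               # P(a0)
        p *= 0.9 if a1 == a0 else 0.1         # P(a1 | a0)
        p *= 0.8 if a2 == a1 else 0.2         # P(a2 | a1)
        joint[(a0, a1, a2)] = p
    return joint


def condition(joint, evidence):
    """Keep assignments consistent with evidence {index: value}, then renormalize."""
    kept = {a: p for a, p in joint.items()
            if all(a[i] == v for i, v in evidence.items())}
    total = sum(kept.values())
    return {a: p / total for a, p in kept.items()}


def marginal(joint, i):
    """Marginal distribution of binary label i."""
    m = {0: 0.0, 1: 0.0}
    for a, p in joint.items():
        m[a[i]] += p
    return m


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def best_query(joint, evidence):
    """Choose the unlabeled index whose correction minimizes the expected
    sum of marginal entropies of the other unlabeled labels (an assumed criterion)."""
    post = condition(joint, evidence)
    n = len(next(iter(post)))
    free = [i for i in range(n) if i not in evidence]
    best, best_score = None, float("inf")
    for i in free:
        score = 0.0
        for v, pv in marginal(post, i).items():
            if pv == 0:
                continue
            after = condition(post, {i: v})   # posterior if the user answers v
            score += pv * sum(entropy(marginal(after, j)) for j in free if j != i)
        if score < best_score:
            best, best_score = i, score
    return best, best_score
```

In this toy chain, calling `best_query(toy_joint(), {})` selects the middle label, because correcting it informs both neighbors; after the user fixes that label, the next query goes to whichever remaining label is still most uncertain. A real system would replace the enumerated joint table with inference in the learned Bayesian network.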
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Zhang, L., Tong, Y., Ji, Q. (2008). Active Image Labeling and Its Application to Facial Action Labeling. In: Forsyth, D., Torr, P., Zisserman, A. (eds) Computer Vision – ECCV 2008. ECCV 2008. Lecture Notes in Computer Science, vol 5303. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88688-4_52
DOI: https://doi.org/10.1007/978-3-540-88688-4_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88685-3
Online ISBN: 978-3-540-88688-4
eBook Packages: Computer Science (R0)