Abstract
We address the problem of estimating the three head pose angles (yaw, pitch, and roll) in sign language video, using the Pointing04 data set as training data. The proposed model uses facial landmark points and Support Vector Regression, learned from the training set, to estimate the yaw and pitch angles independently. A simple geometric approach is used for the roll angle. As a novel development, we propose using the skin tone areas detected within the face bounding box as additional features for head pose estimation. The accuracy of the resulting estimators compares favorably with published results on the same data, although the smaller number of pose angles in our setup may explain some of the observed advantage.
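As an illustration of what a simple geometric roll estimate can look like, the following minimal sketch measures the tilt of the line joining two eye-centre landmarks. The abstract does not state which landmarks or formula are used, so the choice of eye centres, the function name `roll_angle_degrees`, and the sign convention are assumptions made only for this example.

```python
import math


def roll_angle_degrees(left_eye, right_eye):
    """Return the in-plane tilt of the eye line in degrees.

    Assumption for this sketch: `left_eye` and `right_eye` are (x, y)
    pixel coordinates of the eye centres as they appear in the image,
    with x growing to the right and y growing downward. A level eye
    line gives 0; a positive value means the right-hand eye in the
    image is lower than the left-hand one.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    if dx == 0 and dy == 0:
        raise ValueError("eye landmarks coincide; roll is undefined")
    # atan2 handles all quadrants, including a vertical eye line.
    return math.degrees(math.atan2(dy, dx))


if __name__ == "__main__":
    # Hypothetical landmark positions, not taken from any data set.
    print(roll_angle_degrees((100.0, 120.0), (160.0, 120.0)))
    print(roll_angle_degrees((100.0, 120.0), (160.0, 135.0)))
```

Because roll is a rotation within the image plane, it can be read directly from landmark geometry in this way, whereas yaw and pitch change the apparent shape of the face and are better suited to a learned regressor.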
We also evaluated the pose angle estimators against ground-truth values from a motion capture recording of a sign language video. The correlations for the yaw and roll angles exceeded 0.9, whereas the pitch correlation was slightly lower. Overall, the results are very promising from both the computer vision and linguistic points of view.
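The comparison against motion capture ground truth can be sketched as a correlation between two per-frame angle sequences. The abstract does not name the correlation measure, so the Pearson coefficient is assumed here, and the function name and sample values are hypothetical.

```python
import math


def pearson_correlation(estimated, reference):
    """Pearson correlation between two equally long angle sequences.

    Assumption for this sketch: both sequences hold one angle per
    video frame, already aligned in time.
    """
    n = len(estimated)
    if n != len(reference) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_e = sum(estimated) / n
    mean_r = sum(reference) / n
    cov = sum((e - mean_e) * (r - mean_r)
              for e, r in zip(estimated, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_e == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_e * var_r)


if __name__ == "__main__":
    # Made-up yaw angles in degrees, for illustration only.
    estimated_yaw = [-10.0, -4.0, 1.0, 7.0, 12.0]
    mocap_yaw = [-12.0, -5.0, 0.0, 6.0, 14.0]
    print(pearson_correlation(estimated_yaw, mocap_yaw))
```

A correlation measures how well the estimate tracks the movement over time and is insensitive to a constant offset or scale difference between the two measurement systems, which is one reason it is a natural choice when video and motion capture angles are not expressed in an identical reference frame.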
This work has been funded by the following grants of the Academy of Finland: 140245, Content-based video analysis and annotation of Finnish Sign Language (CoBaSiL); 251170, Finnish Centre of Excellence in Computational Inference Research (COIN); 134433, Signs, Syllables, and Sentences (3BatS). The authors wish to thank MA Birgitta Burger of the Finnish Centre of Excellence in Interdisciplinary Music Research, and MA Danny De Weerdt of the Sign Language Centre, University of Jyväskylä, for the motion capture recording used.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Luzardo, M., Karppa, M., Laaksonen, J., Jantunen, T. (2013). Head Pose Estimation for Sign Language Video. In: Kämäräinen, JK., Koskela, M. (eds) Image Analysis. SCIA 2013. Lecture Notes in Computer Science, vol 7944. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38886-6_34
DOI: https://doi.org/10.1007/978-3-642-38886-6_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38885-9
Online ISBN: 978-3-642-38886-6
eBook Packages: Computer Science (R0)