Head Pose Estimation for Sign Language Video

Luzardo, Marcos; Karppa, Matti; Laaksonen, Jorma; Jantunen, Tommi

doi:10.1007/978-3-642-38886-6_34

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7944)

Included in the following conference series:

Scandinavian Conference on Image Analysis

Abstract

We address the problem of estimating three head pose angles in sign language video using the Pointing04 data set as training data. The proposed model employs facial landmark points and Support Vector Regression learned from the training set to identify yaw and pitch angles independently. A simple geometric approach is used for the roll angle. As a novel development, we propose to use the detected skin tone areas within the face bounding box as additional features for head pose estimation. The accuracy level of the estimators we obtain compares favorably with published results on the same data, but the smaller number of pose angles in our setup may explain some of the observed advantage.
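The abstract does not spell out the geometric approach used for the roll angle. As a minimal sketch of one common possibility, assuming roll is taken as the inclination of the line joining two eye landmarks, the angle could be computed as below. The function name, the choice of eye centres as landmarks, and the sign convention are all illustrative assumptions, not the authors' stated method.

```python
import math


def roll_angle_degrees(left_eye, right_eye):
    """Estimate in-plane head rotation (roll) from two eye centres.

    Hypothetical helper, not taken from the paper. Points are (x, y)
    in image coordinates, where y grows downwards. `left_eye` is the
    eye that appears further left in the image. With this convention
    a positive result means the right-hand eye sits lower in the image.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    # atan2 handles all quadrants and a vertical eye line (dx == 0).
    return math.degrees(math.atan2(dy, dx))


if __name__ == "__main__":
    # Made-up landmark coordinates for illustration only.
    print(roll_angle_degrees((100.0, 120.0), (160.0, 120.0)))  # level eyes
    print(roll_angle_degrees((100.0, 120.0), (160.0, 180.0)))  # tilted head
```

Because the estimate depends only on two landmark positions, it needs no training data, which is consistent with the abstract treating roll separately from the regression-based yaw and pitch estimators.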

We also evaluated the pose angle estimators against ground-truth values from a motion capture recording of a sign language video. The correlations for the yaw and roll angles exceeded 0.9, whereas the pitch correlation was slightly lower. Overall, the results are very promising from both the computer vision and linguistic points of view.
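The abstract reports correlations between estimated angles and motion capture ground truth without naming the coefficient. A minimal sketch of such a comparison, assuming the Pearson correlation over per-frame angle values, might look like the following. The function name and the sample angle series are hypothetical and are not data from the paper.

```python
import math


def pearson_correlation(estimated, reference):
    """Pearson correlation between two equally long angle sequences.

    Hypothetical helper: the abstract does not say which correlation
    coefficient was used, so Pearson is an assumption here.
    """
    if len(estimated) != len(reference) or len(estimated) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_r = sum(reference) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimated, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_e == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_e * var_r)


if __name__ == "__main__":
    # Made-up per-frame yaw angles in degrees, for illustration only.
    estimated_yaw = [-12.0, -5.5, 0.8, 6.1, 14.9, 21.0]
    mocap_yaw = [-10.0, -6.0, 0.0, 7.0, 13.0, 20.0]
    print(pearson_correlation(estimated_yaw, mocap_yaw))
```

A correlation coefficient measures how well the estimate follows the shape of the reference signal and is insensitive to a constant offset or scale difference, so it would normally be read alongside an absolute error measure.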

This work has been funded by the following grants of the Academy of Finland: 140245, Content-based video analysis and annotation of Finnish Sign Language (CoBaSiL); 251170, Finnish Centre of Excellence in Computational Inference Research (COIN); 134433, Signs, Syllables, and Sentences (3BatS). The authors wish to thank MA Birgitta Burger of the Finnish Centre of Excellence in Interdisciplinary Music Research, and MA Danny De Weerdt of the Sign Language Centre, University of Jyväskylä, for the motion capture recording used.

Author information

Authors and Affiliations

Department of Information and Computer Science, Aalto University School of Science, Espoo, Finland
Marcos Luzardo, Matti Karppa & Jorma Laaksonen
Sign Language Centre, Department of Languages, University of Jyväskylä, Finland
Tommi Jantunen

Editor information

Editors and Affiliations

Department of Signal Processing, Tampere University of Technology, P.O. Box 553, Tampere, Finland
Joni-Kristian Kämäräinen
Department of Information and Computer Science, Aalto University, P.O. Box 15400, 00076, Espoo, Finland
Markus Koskela

Cite this paper

Luzardo, M., Karppa, M., Laaksonen, J., Jantunen, T. (2013). Head Pose Estimation for Sign Language Video. In: Kämäräinen, JK., Koskela, M. (eds) Image Analysis. SCIA 2013. Lecture Notes in Computer Science, vol 7944. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38886-6_34

DOI: https://doi.org/10.1007/978-3-642-38886-6_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38885-9
Online ISBN: 978-3-642-38886-6
eBook Packages: Computer Science, Computer Science (R0)
