Abstract
We focus on the task of hand pose estimation from egocentric viewpoints. For this problem setting, we show that depth sensors are particularly informative for extracting near-field interactions of the camera wearer with their environment. Despite recent advances in full-body pose estimation using Kinect-like sensors, reliable monocular hand pose estimation in RGB-D images is still an unsolved problem. The problem is exacerbated by a wearable sensor and a first-person viewpoint: the occlusions inherent to this camera view and its limited field of view make the problem even harder. We propose to use task- and viewpoint-specific synthetic training exemplars in a discriminative detection framework. We also exploit depth features for sparser and faster detection. We evaluate our approach on a real-world annotated dataset and propose a novel annotation technique for accurate 3D hand labelling even in cases of partial occlusion.
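The abstract does not detail the depth features used for detection; as a hedged illustration only, the sketch below shows the kind of depth-invariant, pair-offset depth feature (popularized by Shotton et al., CVPR 2011) that discriminative depth-based detectors commonly build on. All names and constants here are hypothetical, not taken from the paper.

```python
import numpy as np

# Minimal sketch of a depth-invariant, pair-offset depth feature in the
# spirit of Shotton et al. (CVPR 2011). The paper's exact features are
# not given in this abstract; names and constants are illustrative.

FAR_BACKGROUND = 10.0  # metres; value used for off-image/missing probes

def depth_difference_feature(depth, x, u, v):
    """depth: HxW depth map in metres; x: (row, col) pixel;
    u, v: pixel offsets, scaled by 1/depth(x) for depth invariance."""
    d = depth[x]
    if not np.isfinite(d) or d <= 0:
        return 0.0  # no reliable depth at the feature centre

    def probe(offset):
        r = int(x[0] + offset[0] / d)
        c = int(x[1] + offset[1] / d)
        if 0 <= r < depth.shape[0] and 0 <= c < depth.shape[1]:
            val = depth[r, c]
            return val if np.isfinite(val) and val > 0 else FAR_BACKGROUND
        return FAR_BACKGROUND  # off-image probes read as far background

    return probe(u) - probe(v)
```

Because invalid or distant depths short-circuit to a constant, a detector built on such features can also skip candidate windows whose centre depth falls outside the near-field working range of the wearer's hands, which is one way depth can yield the sparser, faster detection mentioned above.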
This research was supported by the European Commission under FP7 Marie Curie IOF grant “Egovision4Health” (PIOF-GA-2012-328288).
Cite this paper
Rogez, G., Khademi, M., Supančič III, J.S., Montiel, J.M.M., Ramanan, D. (2015). 3D Hand Pose Detection in Egocentric RGB-D Images. In: Agapito, L., Bronstein, M., Rother, C. (eds.) Computer Vision - ECCV 2014 Workshops. Lecture Notes in Computer Science, vol. 8925. Springer, Cham. https://doi.org/10.1007/978-3-319-16178-5_25
Print ISBN: 978-3-319-16177-8
Online ISBN: 978-3-319-16178-5
© 2015 Springer International Publishing Switzerland