Abstract
Optical capture of human body motion has many practical applications, ranging from motion analysis in sports and medicine, through ergonomics research, to computer animation in game and movie production. Unfortunately, many existing approaches require expensive multi-camera systems and controlled recording studios, and expect the person to wear a special marker suit. Even marker-less approaches typically demand dense camera arrays and indoor recording. These requirements, together with the high acquisition cost of the equipment, make such systems accessible to only a small number of people. This has changed in recent years: the availability of inexpensive depth sensors, such as time-of-flight cameras or the Microsoft Kinect, has spawned new research on tracking human motion from monocular depth images. These approaches have the potential to make motion capture accessible to much larger user groups. However, despite significant progress in recent years, there are still unsolved challenges that limit the applicability of depth-based monocular full-body motion capture. Algorithms are challenged by very noisy sensor data, (self-)occlusions, and other ambiguities caused by the limited information a depth sensor can extract from the scene. In this article, we give an overview of the state of the art in full-body human motion capture using depth cameras. In particular, we elaborate on the challenges current algorithms face and discuss possible solutions. Furthermore, we investigate how integrating additional sensor modalities may help resolve some of these ambiguities and improve tracking results.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Helten, T., Baak, A., Müller, M., Theobalt, C. (2013). Full-Body Human Motion Capture from Monocular Depth Images. In: Grzegorzek, M., Theobalt, C., Koch, R., Kolb, A. (eds) Time-of-Flight and Depth Imaging. Sensors, Algorithms, and Applications. Lecture Notes in Computer Science, vol 8200. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44964-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-44963-5
Online ISBN: 978-3-642-44964-2