Abstract
Optical capture of human body motion has many practical applications, ranging from motion analysis in sports and medicine, through ergonomics research, to computer animation in game and movie production. Unfortunately, many existing approaches require expensive multi-camera systems and controlled recording studios, and expect the person to wear a special marker suit. Even marker-less approaches typically demand dense camera arrays and indoor recording. These requirements, together with the high acquisition cost of the equipment, make such systems accessible to only a small number of people. This has changed in recent years: the availability of inexpensive depth sensors, such as time-of-flight cameras or the Microsoft Kinect, has spawned new research on tracking human motion from monocular depth images. These approaches have the potential to make motion capture accessible to much larger user groups. However, despite significant progress in recent years, there are still unsolved challenges that limit the applicability of depth-based monocular full-body motion capture. Algorithms are challenged by very noisy sensor data, (self-)occlusions, and other ambiguities caused by the limited information a depth sensor can extract from the scene. In this article, we give an overview of the state of the art in full-body human motion capture using depth cameras. In particular, we elaborate on the challenges current algorithms face and discuss possible solutions. Furthermore, we investigate how integrating additional sensor modalities may help resolve some of these ambiguities and improve tracking results.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Helten, T., Baak, A., Müller, M., Theobalt, C. (2013). Full-Body Human Motion Capture from Monocular Depth Images. In: Grzegorzek, M., Theobalt, C., Koch, R., Kolb, A. (eds) Time-of-Flight and Depth Imaging. Sensors, Algorithms, and Applications. Lecture Notes in Computer Science, vol 8200. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44964-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-44963-5
Online ISBN: 978-3-642-44964-2