Abstract
We introduce a framework for unconstrained 3D human upper body pose estimation from multiple camera views in complex environments. Its main novelty lies in the integration of three components: single-frame pose recovery, temporal integration and model texture adaptation. Single-frame pose recovery consists of a hypothesis generation stage, in which candidate 3D poses are generated by probabilistic hierarchical shape matching in each camera view. In the subsequent hypothesis verification stage, the candidate 3D poses are re-projected into the other camera views and ranked according to a multi-view likelihood measure. Temporal integration consists of computing the K best trajectories, combining a motion model and observations in a Viterbi-style maximum-likelihood approach. Poses that lie on the best trajectories are used to generate and adapt a texture model, which in turn enriches the shape likelihood measure used for pose recovery. The multiple trajectory hypotheses are also used to generate pose predictions, which augment the 3D pose candidates generated at the next time step.
We demonstrate that our approach outperforms the state of the art in experiments with large and challenging real-world data from an outdoor setting.
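The temporal integration step described above can be illustrated with a minimal sketch of a K-best (list) Viterbi search over per-frame pose candidates. It is not the authors' implementation: the function name `k_best_trajectories`, the inputs `obs_loglik` and `trans_loglik`, and the use of plain log-likelihood scores are assumptions made for illustration. Each candidate keeps its K best partial trajectories, which is sufficient to recover the K best complete trajectories under a first-order motion model.

```python
import heapq
from typing import Callable, List, Sequence, Tuple


def k_best_trajectories(
    obs_loglik: Sequence[Sequence[float]],
    trans_loglik: Callable[[int, int, int], float],
    k: int,
) -> List[Tuple[float, List[int]]]:
    """Return up to k highest-scoring candidate sequences as (score, path).

    obs_loglik[t][j]      : hypothetical observation log-likelihood of
                            candidate j at time step t.
    trans_loglik(t, i, j) : hypothetical motion-model log-likelihood of
                            moving from candidate i at t-1 to candidate j at t.
    """
    # beams[j] holds up to k partial trajectories (score, path) ending in j.
    beams = [[(score, [j])] for j, score in enumerate(obs_loglik[0])]

    for t in range(1, len(obs_loglik)):
        new_beams = []
        for j, obs in enumerate(obs_loglik[t]):
            # Extend every stored partial trajectory with candidate j.
            extended = [
                (score + trans_loglik(t, path[-1], j) + obs, path + [j])
                for beam in beams
                for score, path in beam
            ]
            # Keep only the k best trajectories that end in candidate j.
            new_beams.append(heapq.nlargest(k, extended, key=lambda e: e[0]))
        beams = new_beams

    # Merge the per-candidate lists and keep the overall k best.
    finals = [entry for beam in beams for entry in beam]
    return heapq.nlargest(k, finals, key=lambda e: e[0])
```

In this sketch the surviving paths at the last time step play the role of the multiple trajectory hypotheses; a full system would use them to predict pose candidates for the next frame and to select poses for texture model adaptation.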
Additional information
Most research was carried out while the first author was with TNO Defence, Safety & Security, The Hague, The Netherlands.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Hofmann, M., Gavrila, D.M. Multi-view 3D Human Pose Estimation in Complex Environment. Int J Comput Vis 96, 103–124 (2012). https://doi.org/10.1007/s11263-011-0451-1
DOI: https://doi.org/10.1007/s11263-011-0451-1