Abstract
This paper addresses the problem of human motion tracking from multiple image sequences. The human body is described by five articulated mechanical chains, and human body parts are described by volumetric primitives with curved surfaces. If such a surface is observed with a camera, an extremal contour appears in the image whenever the surface turns smoothly away from the viewer. We describe a method that recovers human motion through a kinematic parameterization of these extremal contours. The method exploits the fact that the observed image motion of these contours is a function of both the rigid displacement of the surface and the relative position and orientation between the viewer and the curved surface. First, we describe a parameterization of an extremal-contour point velocity for the case of developable surfaces. Second, we use the zero-reference kinematic representation and derive an explicit formula that links extremal-contour velocities to the angular velocities associated with the kinematic model. Third, we show how the chamfer distance may be used to measure the discrepancy between predicted extremal contours and observed image contours; moreover, we show how the chamfer distance can be used as a differentiable multi-valued function and how the tracker based on this distance can be cast into a continuous non-linear optimization framework. Fourth, we describe implementation issues associated with a practical human-body tracker that may use an arbitrary number of cameras. A major methodological and practical advantage of our method is that it relies on neither model-to-image nor image-to-image point matches. In practice, we model people with 5 kinematic chains, 19 volumetric primitives, and 54 degrees of freedom. We observe silhouettes in images gathered with several synchronized and calibrated cameras. The tracker has been successfully applied to several complex motions gathered at 30 frames/second.
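To make the third step concrete, the following is a minimal sketch of a chamfer-style discrepancy between predicted contour points and observed contour pixels. It is not the authors' implementation: it assumes a binary edge image given as nested lists, uses the classic two-pass 3-4 chamfer distance transform, and the function names `chamfer_distance_transform` and `mean_chamfer_distance` are hypothetical. Note that no point matches are needed: each predicted point simply reads its distance from the precomputed map.

```python
from typing import List, Sequence, Tuple

INF = 10**9  # stands in for "no contour pixel reached yet"


def chamfer_distance_transform(edges: Sequence[Sequence[int]]) -> List[List[int]]:
    """Two-pass 3-4 chamfer distance transform.

    edges[r][c] is truthy at observed contour pixels. The result holds, for
    each pixel, an approximate distance to the nearest contour pixel,
    scaled by 3 (axial step = 3, diagonal step = 4).
    """
    rows, cols = len(edges), len(edges[0])
    dist = [[0 if edges[r][c] else INF for c in range(cols)] for r in range(rows)]

    # Forward pass: propagate distances from the left and upper neighbours.
    for r in range(rows):
        for c in range(cols):
            d = dist[r][c]
            if c > 0:
                d = min(d, dist[r][c - 1] + 3)
            if r > 0:
                d = min(d, dist[r - 1][c] + 3)
                if c > 0:
                    d = min(d, dist[r - 1][c - 1] + 4)
                if c < cols - 1:
                    d = min(d, dist[r - 1][c + 1] + 4)
            dist[r][c] = d

    # Backward pass: propagate distances from the right and lower neighbours.
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            d = dist[r][c]
            if c < cols - 1:
                d = min(d, dist[r][c + 1] + 3)
            if r < rows - 1:
                d = min(d, dist[r + 1][c] + 3)
                if c < cols - 1:
                    d = min(d, dist[r + 1][c + 1] + 4)
                if c > 0:
                    d = min(d, dist[r + 1][c - 1] + 4)
            dist[r][c] = d
    return dist


def mean_chamfer_distance(
    predicted: Sequence[Tuple[int, int]], dist: Sequence[Sequence[int]]
) -> float:
    """Average distance, in pixels, from predicted (row, col) contour points
    to the nearest observed contour pixel, read from the distance map."""
    total = sum(dist[r][c] for r, c in predicted)
    return total / (3.0 * len(predicted))


# Toy example (assumed data): a vertical observed contour at column 2 of a 5x5 image.
observed = [[1 if c == 2 else 0 for c in range(5)] for r in range(5)]
dist_map = chamfer_distance_transform(observed)

on_contour = [(r, 2) for r in range(5)]   # prediction lies on the observed contour
two_px_off = [(r, 4) for r in range(5)]   # prediction shifted two pixels to the right

print(mean_chamfer_distance(on_contour, dist_map))  # 0.0
print(mean_chamfer_distance(two_px_off, dist_map))  # 2.0
```

The distance map is computed once per image, so evaluating a new pose hypothesis costs only one lookup per predicted contour point. A tracker of the kind described in the abstract would minimize such a cost over the joint parameters; the interpolation and differentiation of the map needed for that are left out of this sketch.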
About this article
Cite this article
Knossow, D., Ronfard, R. & Horaud, R. Human Motion Tracking with a Kinematic Parameterization of Extremal Contours. Int J Comput Vis 79, 247–269 (2008). https://doi.org/10.1007/s11263-007-0116-2
DOI: https://doi.org/10.1007/s11263-007-0116-2