Abstract
In this paper, we consider modeling data lying on multiple continuous manifolds. In particular, we model the shape manifold of a person performing a motion observed from different viewpoints along a view circle at a fixed camera height. We introduce a model that ties together the body configuration (kinematics) manifold and the visual (observation) manifold in a way that facilitates tracking the 3D configuration under continuous variation of the relative view. The model exploits the low-dimensional nature of both the body configuration manifold and the view manifold, each of which is represented separately. The resulting representation is used for tracking complex motions within a Bayesian framework, in which the model provides a low-dimensional state representation as well as a constrained dynamic model for both body configuration and view variations. Experimental results on estimating 3D body posture from a single camera are presented for the HumanEva dataset and for other video sequences of complex motions.
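To make the tracking idea more concrete, the following is a minimal sketch of a generic bootstrap particle filter over a two-angle state: a view angle on the view circle and, assuming for illustration a periodic motion such as walking, a single phase angle that parametrizes the body configuration manifold. The joint state then lives on a torus. Everything in the sketch is an assumption rather than the authors' method: the names (`track`, `toy_observation`, `wrap`, `circ_dist`, `circular_mean`) are hypothetical, the constant-rate dynamics stand in for the learned constrained dynamic model, and `toy_observation` is a toy feature map standing in for the learned mapping from the two manifolds to image observations.

```python
import math
import random

TWO_PI = 2.0 * math.pi


def wrap(angle):
    """Map an angle onto [0, 2*pi)."""
    return angle % TWO_PI


def circ_dist(a, b):
    """Geodesic distance between two angles on the unit circle."""
    d = abs(wrap(a) - wrap(b))
    return min(d, TWO_PI - d)


def toy_observation(body, view):
    """HYPOTHETICAL stand-in for the learned image generator: a 4-D feature
    vector that depends jointly on the body-configuration angle and the view angle."""
    return [math.cos(body), math.sin(body), math.cos(view), math.sin(view)]


def circular_mean(angles):
    """Mean direction of a set of angles, returned in [0, 2*pi)."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return wrap(math.atan2(s, c))


def track(observations, body_rate, view_rate, n_particles=500,
          process_noise=0.05, obs_sigma=0.2, seed=0):
    """Bootstrap particle filter over the state (body angle, view angle).

    Each coordinate lives on a circle, so the joint state lives on a torus.
    Returns one (body, view) estimate per observation."""
    rng = random.Random(seed)
    # Uniform prior over the torus.
    particles = [(rng.uniform(0.0, TWO_PI), rng.uniform(0.0, TWO_PI))
                 for _ in range(n_particles)]
    estimates = []
    for obs in observations:
        # Predict: move each particle along both circles (assumed constant
        # rates) plus small Gaussian noise, then wrap back onto the circles.
        particles = [(wrap(b + body_rate + rng.gauss(0.0, process_noise)),
                      wrap(v + view_rate + rng.gauss(0.0, process_noise)))
                     for b, v in particles]
        # Weight: Gaussian likelihood of the observed feature vector given
        # the feature vector each particle predicts.
        weights = []
        for b, v in particles:
            pred = toy_observation(b, v)
            d2 = sum((p - o) ** 2 for p, o in zip(pred, obs))
            weights.append(math.exp(-d2 / (2.0 * obs_sigma ** 2)))
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # degenerate case: fall back to uniform
        # Resample particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        estimates.append((circular_mean([b for b, _ in particles]),
                          circular_mean([v for _, v in particles])))
    return estimates


if __name__ == "__main__":
    # Synthetic ground truth: both angles advance at known constant rates.
    sim = random.Random(1)
    truth = [(wrap(0.2 * t), wrap(1.0 + 0.05 * t)) for t in range(1, 41)]
    obs = [[x + sim.gauss(0.0, 0.05) for x in toy_observation(b, v)]
           for b, v in truth]
    est = track(obs, body_rate=0.2, view_rate=0.05)
    (eb, ev), (tb, tv) = est[-1], truth[-1]
    print(f"body error {circ_dist(eb, tb):.3f} rad, "
          f"view error {circ_dist(ev, tv):.3f} rad")
```

Only the two angles are sampled and propagated. The observation enters solely through the likelihood, which compares the feature vector each particle predicts with the observed one. Wrapping both coordinates keeps every particle on its circle, which is the toy analogue of constraining the dynamics to the manifolds.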
Electronic Supplementary Material
Three supplementary videos accompany the article (WMV format; 802 KB, 3.24 MB, and 2.32 MB).
Cite this article
Lee, CS., Elgammal, A. Coupled Visual and Kinematic Manifold Models for Tracking. Int J Comput Vis 87, 118–139 (2010). https://doi.org/10.1007/s11263-009-0266-5