Abstract
This paper presents and investigates a set of local space-time descriptors for representing and recognizing motion patterns in video. Following the idea of local features in the spatial domain, we use the notion of space-time interest points and represent video data in terms of local space-time events. To describe such events, we define several types of image descriptors over local spatio-temporal neighborhoods and evaluate these descriptors in the context of recognizing human activities. In particular, we compare motion representations in terms of spatio-temporal jets, position-dependent histograms, position-independent histograms, and principal component analysis, each computed for either spatio-temporal gradients or optic flow. An experimental evaluation on a video database with human actions shows that high classification performance can be achieved, and that there is a clear advantage of using local position-dependent histograms, consistent with previously reported findings regarding spatial recognition.
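To make the distinction between position-independent and position-dependent histograms concrete, the following is a minimal sketch in Python with NumPy, not the implementation evaluated in the paper. It assumes a local neighborhood given as a gray-level array of shape (T, H, W) with values in [0, 1], plain finite-difference gradients instead of scale-space derivatives, and marginal histograms of the three gradient components. The function names, the bin count, and the 2 x 2 x 2 grid of cells are illustrative assumptions.

```python
import numpy as np


def spatiotemporal_gradients(patch):
    """Finite-difference gradients (d/dt, d/dy, d/dx) of a (T, H, W) patch."""
    lt, ly, lx = np.gradient(patch.astype(float))
    return np.stack([lt, ly, lx])  # shape (3, T, H, W)


def component_histograms(grads, bins, value_range):
    """One marginal histogram per gradient component, concatenated and normalized."""
    hists = [np.histogram(g, bins=bins, range=value_range)[0] for g in grads]
    h = np.concatenate(hists).astype(float)
    total = h.sum()
    return h / total if total > 0 else h


def position_independent_histogram(patch, bins=8, value_range=(-1.0, 1.0)):
    """Histogram accumulated over the whole neighborhood (no layout information)."""
    grads = spatiotemporal_gradients(patch)
    return component_histograms(grads, bins, value_range)


def position_dependent_histogram(patch, cells=2, bins=8, value_range=(-1.0, 1.0)):
    """Separate histograms for each cell of a cells^3 grid, concatenated in fixed order."""
    grads = spatiotemporal_gradients(patch)
    parts = []
    for gt in np.array_split(grads, cells, axis=1):          # split along time
        for gy in np.array_split(gt, cells, axis=2):         # split along y
            for gx in np.array_split(gy, cells, axis=3):     # split along x
                parts.append(component_histograms(gx, bins, value_range))
    return np.concatenate(parts)


# Hypothetical neighborhood around an interest point: 8 frames of 8 x 8 pixels.
patch = np.random.default_rng(0).random((8, 8, 8))
h_indep = position_independent_histogram(patch)  # length 3 * bins
h_dep = position_dependent_histogram(patch)      # length cells**3 * 3 * bins
```

The position-dependent variant keeps a coarse record of where in the neighborhood each gradient value occurred, at the cost of a descriptor that is longer by a factor of cells^3. The position-independent variant discards that layout entirely.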
The support from the Swedish Research Council and from the Royal Swedish Academy of Sciences as well as the Knut and Alice Wallenberg Foundation is gratefully acknowledged. We also thank Christian Schüldt and Barbara Caputo for their help in obtaining the experimental video data.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Laptev, I., Lindeberg, T. (2006). Local Descriptors for Spatio-temporal Recognition. In: MacLean, W.J. (eds) Spatial Coherence for Visual Motion Analysis. SCVMA 2004. Lecture Notes in Computer Science, vol 3667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11676959_8
DOI: https://doi.org/10.1007/11676959_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32533-8
Online ISBN: 978-3-540-32534-5
eBook Packages: Computer Science, Computer Science (R0)