On Space-Time Interest Points

Laptev, Ivan

doi:10.1007/s11263-005-1838-7

Published: September 2005

Volume 64, pages 107–123 (2005)

International Journal of Computer Vision

Abstract

Local image features or interest points provide compact and abstract representations of patterns in an image. In this paper, we extend the notion of spatial interest points into the spatio-temporal domain and show how the resulting features often reflect interesting events that can be used for a compact representation of video data as well as for interpretation of spatio-temporal events.

To detect spatio-temporal events, we build on the idea of the Harris and Förstner interest point operators and detect local structures in space-time where the image values have significant local variations in both space and time. We estimate the spatio-temporal extents of the detected events by maximizing a normalized spatio-temporal Laplacian operator over spatial and temporal scales. To represent the detected events, we then compute local, spatio-temporal, scale-invariant N-jets and classify each event with respect to its jet descriptor. For the problem of human motion analysis, we illustrate how a video representation in terms of local space-time features allows for detection of walking people in scenes with occlusions and dynamic cluttered backgrounds.
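The detection step described above can be illustrated with a minimal NumPy sketch of a Harris-style corner function extended to space-time. It assumes a grayscale video stored as an array indexed (t, y, x), smooths it at a local scale, averages products of the three first-order derivatives at a larger integration scale to form a 3×3 second-moment matrix, and scores each voxel with det(μ) − k·trace³(μ). The function names, the use of standard deviations for the scale parameters, the factor `s`, the default `k = 0.005`, and the zero-padded borders are assumptions of this sketch, not details taken from the abstract; scale selection with the normalized Laplacian and the jet descriptors are not covered.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 standard deviations."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def smooth_axis(vol, sigma, axis):
    """Convolve vol with a Gaussian along one axis (zero-padded at the borders).

    Assumes the axis is longer than the kernel so the output keeps its shape.
    """
    k = gaussian_kernel(sigma)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), axis, vol)


def smooth(vol, sigma_space, sigma_time):
    """Separable space-time smoothing of a volume indexed as (t, y, x)."""
    out = smooth_axis(vol, sigma_time, 0)
    out = smooth_axis(out, sigma_space, 1)
    return smooth_axis(out, sigma_space, 2)


def spacetime_harris(vol, sigma=1.0, tau=1.0, s=2.0, k=0.005):
    """Corner response det(mu) - k * trace(mu)**3 for every voxel.

    sigma and tau are spatial and temporal standard deviations (assumed
    parameterization); s scales them to obtain the integration window.
    """
    L = smooth(np.asarray(vol, dtype=float), sigma, tau)
    Lt, Ly, Lx = np.gradient(L)  # derivatives along axes 0 (t), 1 (y), 2 (x)

    def window(a):
        return smooth(a, s * sigma, s * tau)

    # Entries of the symmetric 3x3 second-moment matrix mu.
    xx, yy, tt = window(Lx * Lx), window(Ly * Ly), window(Lt * Lt)
    xy, xt, yt = window(Lx * Ly), window(Lx * Lt), window(Ly * Lt)

    det = (xx * (yy * tt - yt * yt)
           - xy * (xy * tt - yt * xt)
           + xt * (xy * yt - yy * xt))
    trace = xx + yy + tt
    return det - k * trace ** 3
```

Under these assumptions, a static spatial edge gives a non-positive response away from the borders, because the image varies along only one direction there and the determinant vanishes. A small block that appears and disappears, varying along x, y and t at once, produces positive values. Candidate interest points would then be taken as positive local maxima of the response.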

References

Almansa, A. and Lindeberg, T. 2000. Fingerprint enhancement by shape adaptation of scale-space operators with automatic scale-selection. IEEE Transactions on Image Processing, 9(12):2027–2042.
Barron, J., Fleet, D., and Beauchemin, S. 1994. Performance of optical flow techniques. International Journal of Computer Vision, 12(1):43–77.
Baumberg, A.M. and Hogg, D. 1996. Generating spatiotemporal models from examples. Image and Vision Computing, 14(8):525–532.
Bigün, J., Granlund, G., and Wiklund, J. 1991. Multidimensional orientation estimation with applications to texture analysis and optical flow. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(8):775–790.
Black, M. and Jepson, A. 1998. Eigentracking: Robust matching and tracking of articulated objects using view-based representation. International Journal of Computer Vision, 26(1):63–84.
Black, M., Yacoob, Y., Jepson, A., and Fleet, D. 1997. Learning parameterized models of image motion. Proc. Computer Vision and Pattern Recognition, pp. 561–567.
Blake, A. and Isard, M. 1998. Condensation—conditional density propagation for visual tracking. International Journal of Computer Vision, 29(1):5–28.
Bregler, C. and Malik, J. 1998. Tracking people with twists and exponential maps. Proc. Computer Vision and Pattern Recognition, Santa Barbara, CA, pp. 8–15.
Bretzner, L. and Lindeberg, T. 1998. Feature tracking with automatic selection of spatial scales. Computer Vision and Image Understanding, 71(3):385–392.
Chomat, O., de Verdiere, V., Hall, D., and Crowley, J. 2000a. Local scale selection for Gaussian based description techniques. In Proc. Sixth European Conference on Computer Vision, Vol. 1842 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Dublin, Ireland, pp. I:117–133.
Chomat, O., Martin, J., and Crowley, J. 2000b. A probabilistic sensor for the perception and recognition of activities. In Proc. Sixth European Conference on Computer Vision, Vol. 1842 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Dublin, Ireland, pp. I:487–503.
Duda, R., Hart, P., and Stork, D. 2001. Pattern Classification, Wiley.
Efros, A., Berg, A., Mori, G., and Malik, J. 2003. Recognizing action at a distance. Proc. Ninth International Conference on Computer Vision, Nice, France, pp. 726–733.
Fergus, R., Perona, P., and Zisserman, A. 2003. Object class recognition by unsupervised scale-invariant learning. In Proc. Computer Vision and Pattern Recognition, Santa Barbara, CA, pp. II:264–271.
Fleet, D., Black, M., and Jepson, A. 1998. Motion feature detection using steerable flow fields. In Proc. Computer Vision and Pattern Recognition, Santa Barbara, CA, pp. 274–281.
Florack, L.M.J. 1997. Image Structure, Kluwer Academic Publishers, Dordrecht, Netherlands.
Förstner, W.A. and Gülch, E. 1987. A fast operator for detection and precise location of distinct points, corners and centers of circular features. In Proc. Intercommission Workshop of the Int. Soc. for Photogrammetry and Remote Sensing, Interlaken, Switzerland.
Gârding, J. and Lindeberg, T. 1996. Direct computation of shape cues using scale-adapted spatial derivative operators. International Journal of Computer Vision, 17(2):163–191.
Hall, D., de Verdiere, V., and Crowley, J. 2000. Object recognition using coloured receptive fields. In Proc. Sixth European Conference on Computer Vision, Vol. 1842 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Dublin, Ireland, pp. I:164–177.
Harris, C. and Stephens, M. 1988. A combined corner and edge detector. Alvey Vision Conference, pp. 147–152.
Hoey, J. and Little, J. 2000. Representation and recognition of complex human motion. In Proc. Computer Vision and Pattern Recognition, Hilton Head, SC, pp. I:752–759.
Koenderink, J. and van Doorn, A. 1987. Representation of local geometry in the visual system. Biological Cybernetics, 55:367–375.
Koenderink, J.J. 1988. Scale-time. Biological Cybernetics, 58:159–162.
Koenderink, J.J. and van Doorn, A.J. 1992. Generic neighborhood operators. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(6):597–605.
Laptev, I. and Lindeberg, T. 2002. Velocity-Adaptation of Spatio-Temporal Receptive Fields for Direct Recognition of Activities: An Experimental Study. In Proc. ECCV′02 Workshop on Statistical Methods in Video Processing (Extended Version to Appear in Image and Vision Computing), D. Suter (Ed.), Copenhagen, Denmark, pp. 61–66.
Laptev, I. and Lindeberg, T. 2003a. Interest Point Detection and Scale Selection in Space-Time. In Scale-Space′03, L. Griffin and M. Lillholm (Eds.), Vol. 2695 of Lecture Notes in Computer Science, Springer Verlag, Berlin, pp. 372–387.
Laptev, I. and Lindeberg, T. 2003b. Interest points in space-time. In Proc. Ninth International Conference on Computer Vision, Nice, France.
Leung, T. and Malik, J. 2001. Representing and recognizing the visual appearance of materials using three-dimensional textons. International Journal of Computer Vision, 43(1):29–44.
Lindeberg, T. 1994. Scale-Space Theory in Computer Vision, Kluwer Academic Publishers, Boston.
Lindeberg, T. 1997. On automatic selection of temporal scales in time-causal scale-space, AFPAC′97: Algebraic Frames for the Perception-Action Cycle, Vol. 1315 of Lecture Notes in Computer Science, Springer Verlag, Berlin, pp. 94–113.
Lindeberg, T. 1998. Feature detection with automatic scale selection. International Journal of Computer Vision, 30(2):77–116.
Lindeberg, T. 2002. Time-recursive velocity-adapted spatio-temporal scale-space filters. In Proc. Seventh European Conference on Computer Vision, Vol. 2350 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Copenhagen, Denmark, pp. I:52–67.
Lindeberg, T. and Bretzner, L. 2003. Real-time scale selection in hybrid multi-scale representations. In Scale-Space′03, L. Griffin and M. Lillholm (Eds.), Vol. 2695 of Lecture Notes in Computer Science, Springer Verlag, Berlin, pp. 148–163.
Lindeberg, T. and Fagerström, D. 1996. Scale-space with causal time direction. In Proc. Fourth European Conference on Computer Vision, Vol. 1064 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Cambridge, UK, pp. I:229–240.
Lowe, D. 1999. Object recognition from local scale-invariant features. In Proc. Seventh International Conference on Computer Vision, Corfu, Greece, pp. 1150–1157.
Malik, J., Belongie, S., Shi, J., and Leung, T. 1999. Textons, contours and regions: Cue integration in image segmentation. In Proc. Seventh International Conference on Computer Vision, Corfu, Greece, pp. 918–925.
Mikolajczyk, K. and Schmid, C. 2001. Indexing based on scale invariant interest points. In Proc. Eighth International Conference on Computer Vision, Vancouver, Canada, pp. I:525–531.
Mikolajczyk, K. and Schmid, C. 2002. An affine invariant interest point detector. In Proc. Seventh European Conference on Computer Vision, Vol. 2350 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Copenhagen, Denmark, pp. I:128–142.
Niyogi, S.A. 1995. Detecting kinetic occlusion. In Proc. Fifth International Conference on Computer Vision, Cambridge, MA, pp. 1044–1049.
Niyogi, S. and Adelson, H. 1994. Analyzing and recognizing walking figures in XYT. CVPR, pp. 469–474.
Schmid, C. and Mohr, R. 1997. Local grayvalue invariants for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5):530–535.
Schmid, C., Mohr, R., and Bauckhage, C. 2000. Evaluation of interest point detectors. International Journal of Computer Vision, 37(2):151–172.
Sidenbladh, H., Black, M., and Fleet, D. 2000. Stochastic tracking of 3D human figures using 2D image motion. In Proc. Sixth European Conference on Computer Vision, Vol. 1843 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Dublin, Ireland, pp. II:702–718.
Smith, S. and Brady, J. 1995. ASSET-2: Real-time motion segmentation and shape tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(8):814–820.
Tell, D. and Carlsson, S. 2002. Combining topology and appearance for wide baseline matching. In Proc. Seventh European Conference on Computer Vision, Vol. 2350 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Copenhagen, Denmark, pp. I:68–83.
Tuytelaars, T. and Van Gool, L. 2000. Wide baseline stereo matching based on local, affinely invariant regions. British Machine Vision Conference, pp. 412–425.
Wallraven, C., Caputo, B., and Graf, A. 2003. Recognition with local features: the kernel recipe. In Proc. Ninth International Conference on Computer Vision, Nice, France.
Weber, M., Welling, M., and Perona, P. 2000. Unsupervised learning of models for visual object class recognition. In Proc. Sixth European Conference on Computer Vision, Vol. 1842 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Dublin, Ireland, pp. I:18–32.
Witkin, A.P. 1983. Scale-space filtering. In Proc. 8th Int. Joint Conf. Art. Intell., Karlsruhe, Germany, pp. 1019–1022.
Zelnik-Manor, L. and Irani, M. 2001. Event-based analysis of video. In Proc. Computer Vision and Pattern Recognition, Kauai Marriott, Hawaii, pp. II:123–130.

Author information

Authors and Affiliations

IRISA/INRIA, Campus Beaulieu, 35042, Rennes Cedex, France
Ivan Laptev

Corresponding author

Correspondence to Ivan Laptev.

Additional information

First online version published in June, 2005

Electronic supplementary material

Supplementary material (4.44 MB)

About this article

Cite this article

Laptev, I. On Space-Time Interest Points. Int J Comput Vision 64, 107–123 (2005). https://doi.org/10.1007/s11263-005-1838-7

Received: 08 October 2003
Accepted: 23 June 2004
Issue Date: September 2005
DOI: https://doi.org/10.1007/s11263-005-1838-7
