Abstract
Local image features or interest points provide compact and abstract representations of patterns in an image. In this paper, we extend the notion of spatial interest points into the spatio-temporal domain and show how the resulting features often reflect interesting events that can be used for a compact representation of video data as well as for interpretation of spatio-temporal events.
To detect spatio-temporal events, we build on the idea of the Harris and Förstner interest point operators and detect local structures in space-time where the image values have significant local variations in both space and time. We estimate the spatio-temporal extents of the detected events by maximizing a normalized spatio-temporal Laplacian operator over spatial and temporal scales. To represent the detected events, we then compute local, spatio-temporal, scale-invariant N-jets and classify each event with respect to its jet descriptor. For the problem of human motion analysis, we illustrate how a video representation in terms of local space-time features allows for detection of walking people in scenes with occlusions and dynamic cluttered backgrounds.
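As a minimal sketch of the detection idea, the code below scores a local space-time neighbourhood with a Harris-style corner function extended to three dimensions. The abstract does not give the formula, so the form `det(mu) - k * trace(mu)**3`, the value `k = 0.005`, and all function names are assumptions made for illustration. The second-moment matrix is also built by uniform averaging of gradient samples rather than by Gaussian weighting at a chosen scale, which is a simplification.

```python
def second_moment(gradients):
    """3x3 second-moment matrix from (Lx, Ly, Lt) gradient samples.

    Simplification: uniform averaging over the neighbourhood instead of
    Gaussian weighting at an integration scale.
    """
    n = len(gradients)
    mu = [[0.0] * 3 for _ in range(3)]
    for g in gradients:
        for r in range(3):
            for c in range(3):
                mu[r][c] += g[r] * g[c] / n
    return mu


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def trace3(m):
    """Trace of a 3x3 matrix given as nested lists."""
    return m[0][0] + m[1][1] + m[2][2]


def spacetime_corner_response(mu, k=0.005):
    """Assumed Harris-style response in space-time: det(mu) - k * trace(mu)^3.

    The response is positive only when all three eigenvalues of mu are
    large, i.e. the image varies in x, in y and in t.
    """
    return det3(mu) - k * trace3(mu) ** 3


# A static spatial corner: gradients vary in x and y but not in t.
static_corner = second_moment([(1, 0, 0), (0, 1, 0), (1, 0, 0), (0, 1, 0)])

# A space-time event: gradients vary in x, y and t.
spacetime_event = second_moment([(1, 0, 0), (0, 1, 0), (0, 0, 1)])

response_static = spacetime_corner_response(static_corner)
response_event = spacetime_corner_response(spacetime_event)
```

Under these assumptions the static corner gets a negative response, because its determinant vanishes when nothing changes over time, while the space-time event gets a positive one. With three equal eigenvalues the response stays positive only for `k < 1/27`, which is why a small `k` is used here.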
References
Almansa, A. and Lindeberg, T. 2000. Fingerprint enhancement by shape adaptation of scale-space operators with automatic scale-selection. IEEE Transactions on Image Processing, 9(12):2027–2042.
Barron, J., Fleet, D., and Beauchemin, S. 1994. Performance of optical flow techniques. International Journal of Computer Vision, 12(1):43–77.
Baumberg, A.M. and Hogg, D. 1996. Generating spatiotemporal models from examples. Image and Vision Computing, 14(8):525–532.
Bigün, J., Granlund, G., and Wiklund, J. 1991. Multidimensional orientation estimation with applications to texture analysis and optical flow. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(8):775–790.
Black, M. and Jepson, A. 1998. Eigentracking: Robust matching and tracking of articulated objects using view-based representation. International Journal of Computer Vision, 26(1):63–84.
Black, M., Yacoob, Y., Jepson, A., and Fleet, D. 1997. Learning parameterized models of image motion. Proc. Computer Vision and Pattern Recognition, pp. 561–567.
Blake, A. and Isard, M. 1998. Condensation—conditional density propagation for visual tracking. International Journal of Computer Vision, 29(1):5–28.
Bregler, C. and Malik, J. 1998. Tracking people with twists and exponential maps. Proc. Computer Vision and Pattern Recognition, Santa Barbara, CA, pp. 8–15.
Bretzner, L. and Lindeberg, T. 1998. Feature tracking with automatic selection of spatial scales. Computer Vision and Image Understanding, 71(3):385–392.
Chomat, O., de Verdiere, V., Hall, D., and Crowley, J. 2000a. Local scale selection for Gaussian based description techniques. In Proc. Sixth European Conference on Computer Vision, Vol. 1842 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Dublin, Ireland, pp. I:117–133.
Chomat, O., Martin, J., and Crowley, J. 2000b. A probabilistic sensor for the perception and recognition of activities. In Proc. Sixth European Conference on Computer Vision, Vol. 1842 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Dublin, Ireland, pp. I:487–503.
Duda, R., Hart, P., and Stork, D. 2001. Pattern Classification, Wiley.
Efros, A., Berg, A., Mori, G., and Malik, J. 2003. Recognizing action at a distance. Proc. Ninth International Conference on Computer Vision, Nice, France, pp. 726–733.
Fergus, R., Perona, P., and Zisserman, A. 2003. Object class recognition by unsupervised scale-invariant learning. In Proc. Computer Vision and Pattern Recognition, Santa Barbara, CA, pp. II:264–271.
Fleet, D., Black, M., and Jepson, A. 1998. Motion feature detection using steerable flow fields. In Proc. Computer Vision and Pattern Recognition, Santa Barbara, CA, pp. 274–281.
Florack, L.M.J. 1997. Image Structure, Kluwer Academic Publishers, Dordrecht, Netherlands.
Förstner, W.A. and Gülch, E. 1987. A fast operator for detection and precise location of distinct points, corners and centers of circular features. In Proc. Intercommission Workshop of the Int. Soc. for Photogrammetry and Remote Sensing, Interlaken, Switzerland.
Gârding, J. and Lindeberg, T. 1996. Direct computation of shape cues using scale-adapted spatial derivative operators. International Journal of Computer Vision, 17(2):163–191.
Hall, D., de Verdiere, V., and Crowley, J. 2000. Object recognition using coloured receptive fields. In Proc. Sixth European Conference on Computer Vision, Vol. 1842 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Dublin, Ireland, pp. I:164–177.
Harris, C. and Stephens, M. 1988. A combined corner and edge detector. Alvey Vision Conference, pp. 147–152.
Hoey, J. and Little, J. 2000. Representation and recognition of complex human motion. In Proc. Computer Vision and Pattern Recognition, Hilton Head, SC, pp. I:752–759.
Koenderink, J. and van Doorn, A. 1987. Representation of local geometry in the visual system. Biological Cybernetics, 55:367–375.
Koenderink, J.J. 1988. Scale-time. Biological Cybernetics, 58:159–162.
Koenderink, J.J. and van Doorn, A.J. 1992. Generic neighborhood operators. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(6):597–605.
Laptev, I. and Lindeberg, T. 2002. Velocity-Adaptation of Spatio-Temporal Receptive Fields for Direct Recognition of Activities: An Experimental Study. In Proc. ECCV′02 Workshop on Statistical Methods in Video Processing (Extended Version to Appear in Image and Vision Computing), D. Suter (Ed.), Copenhagen, Denmark, pp. 61–66.
Laptev, I. and Lindeberg, T. 2003a. Interest Point Detection and Scale Selection in Space-Time. In Scale-Space′03, L. Griffin and M. Lillholm (Eds.), Vol. 2695 of Lecture Notes in Computer Science, Springer Verlag, Berlin, pp. 372–387.
Laptev, I. and Lindeberg, T. 2003b. Interest points in space-time. In Proc. Ninth International Conference on Computer Vision, Nice, France.
Leung, T. and Malik, J. 2001. Representing and recognizing the visual appearance of materials using three-dimensional textons. International Journal of Computer Vision, 43(1):29–44.
Lindeberg, T. 1994. Scale-Space Theory in Computer Vision, Kluwer Academic Publishers, Boston.
Lindeberg, T. 1997. On automatic selection of temporal scales in time-causal scale-space, AFPAC′97: Algebraic Frames for the Perception-Action Cycle, Vol. 1315 of Lecture Notes in Computer Science, Springer Verlag, Berlin, pp. 94–113.
Lindeberg, T. 1998. Feature detection with automatic scale selection. International Journal of Computer Vision, 30(2):77–116.
Lindeberg, T. 2002. Time-recursive velocity-adapted spatio-temporal scale-space filters. In Proc. Seventh European Conference on Computer Vision, Vol. 2350 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Copenhagen, Denmark, pp. I:52–67.
Lindeberg, T. and Bretzner, L. 2003. Real-time scale selection in hybrid multi-scale representations. In Scale-Space′03, L. Griffin and M. Lillholm (Eds.), Vol. 2695 of Lecture Notes in Computer Science, Springer Verlag, Berlin, pp. 148–163.
Lindeberg, T. and Fagerström, D. 1996. Scale-space with causal time direction. In Proc. Fourth European Conference on Computer Vision, Vol. 1064 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Cambridge, UK, pp. I:229–240.
Lowe, D. 1999. Object recognition from local scale-invariant features. In Proc. Seventh International Conference on Computer Vision, Corfu, Greece, pp. 1150–1157.
Malik, J., Belongie, S., Shi, J., and Leung, T. 1999. Textons, contours and regions: Cue integration in image segmentation. In Proc. Seventh International Conference on Computer Vision, Corfu, Greece, pp. 918–925.
Mikolajczyk, K. and Schmid, C. 2001. Indexing based on scale invariant interest points. In Proc. Eighth International Conference on Computer Vision, Vancouver, Canada, pp. I:525–531.
Mikolajczyk, K. and Schmid, C. 2002. An affine invariant interest point detector. In Proc. Seventh European Conference on Computer Vision, Vol. 2350 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Copenhagen, Denmark, pp. I:128–142.
Niyogi, S.A. 1995. Detecting kinetic occlusion. In Proc. Fifth International Conference on Computer Vision, Cambridge, MA, pp. 1044–1049.
Niyogi, S. and Adelson, H. 1994. Analyzing and recognizing walking figures in XYT. CVPR, pp. 469–474.
Schmid, C. and Mohr, R. 1997. Local grayvalue invariants for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5):530–535.
Schmid, C., Mohr, R., and Bauckhage, C. 2000. Evaluation of interest point detectors. International Journal of Computer Vision, 37(2):151–172.
Sidenbladh, H., Black, M., and Fleet, D. 2000. Stochastic tracking of 3D human figures using 2D image motion. In Proc. Sixth European Conference on Computer Vision, Vol. 1843 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Dublin, Ireland, pp. II:702–718.
Smith, S. and Brady, J. 1995. ASSET-2: Real-time motion segmentation and shape tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(8):814–820.
Tell, D. and Carlsson, S. 2002. Combining topology and appearance for wide baseline matching. In Proc. Seventh European Conference on Computer Vision, Vol. 2350 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Copenhagen, Denmark, pp. I:68–83.
Tuytelaars, T. and Van Gool, L. 2000. Wide baseline stereo matching based on local, affinely invariant regions. British Machine Vision Conference, pp. 412–425.
Wallraven, C., Caputo, B., and Graf, A. 2003. Recognition with local features: the kernel recipe. In Proc. Ninth International Conference on Computer Vision, Nice, France.
Weber, M., Welling, M., and Perona, P. 2000. Unsupervised learning of models for visual object class recognition. In Proc. Sixth European Conference on Computer Vision, Vol. 1842 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Dublin, Ireland, pp. I:18–32.
Witkin, A.P. 1983. Scale-space filtering. In Proc. 8th Int. Joint Conf. Art. Intell., Karlsruhe, Germany, pp. 1019–1022.
Zelnik-Manor, L. and Irani, M. 2001. Event-based analysis of video. In Proc. Computer Vision and Pattern Recognition, Kauai Marriott, Hawaii, pp. II:123–130.
Additional information
First online version published in June, 2005
About this article
Cite this article
Laptev, I. On Space-Time Interest Points. Int J Comput Vision 64, 107–123 (2005). https://doi.org/10.1007/s11263-005-1838-7
DOI: https://doi.org/10.1007/s11263-005-1838-7