Spatio-Temporal Object Recognition

De Geest, Roeland; Deboeverie, Francis; Philips, Wilfried; Tuytelaars, Tinne

doi:10.1007/978-3-319-25903-1_59

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9386)

Included in the following conference series:

International Conference on Advanced Concepts for Intelligent Vision Systems

Abstract

Object recognition in video is in most cases solved by extracting keyframes from the video and then applying still image recognition methods on these keyframes only. This procedure largely ignores the temporal dimension. Nevertheless, the way an object moves may hold valuable information on its class. Therefore, in this work, we analyze the effectiveness of different motion descriptors, originally developed for action recognition, in the context of action-invariant object recognition. We conclude that a higher classification accuracy can be obtained when motion descriptors (specifically, HOG and MBH around trajectories) are used in combination with standard static descriptors extracted from keyframes. Since currently no suitable dataset for this problem exists, we introduce two new datasets and make them publicly available.
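To illustrate what combining motion and static descriptors can mean in practice, here is a minimal sketch. It is not the authors' pipeline, which this page does not describe. It assumes each video has already been summarized as two bag-of-words histograms, one from static keyframe descriptors and one from motion descriptors, that the two are fused by simple concatenation, and that a nearest-centroid rule does the classification. The function names and the toy histograms are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def fuse(static_hist, motion_hist):
    """Early fusion (an assumption): normalize each channel separately, then concatenate."""
    return l2_normalize(static_hist) + l2_normalize(motion_hist)

def nearest_centroid(train, query):
    """Return the label whose mean training vector is closest to the query."""
    centroids = {}
    for label, vecs in train.items():
        n = len(vecs)
        centroids[label] = [sum(col) / n for col in zip(*vecs)]
    return min(centroids, key=lambda lab: math.dist(centroids[lab], query))

# Made-up histograms: the static channel looks alike for both classes,
# while the motion channel separates them.
train = {
    "car": [fuse([5, 1, 0], [4, 0]), fuse([4, 2, 0], [5, 1])],
    "dog": [fuse([5, 1, 0], [0, 4]), fuse([4, 1, 1], [1, 5])],
}
query = fuse([5, 1, 0], [0, 5])
predicted = nearest_centroid(train, query)
print(predicted)
```

Normalizing each channel before concatenation keeps one descriptor type from dominating the distance purely because of its scale; other fusion schemes, such as combining per-channel classifier scores, are equally plausible.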

This work was financially supported by the project “Multi-camera human behavior monitoring and unusual event detection” (FWO G.0.398.11.N.10) and the PARIS project (IWT-SBO Nr. 110067).

Author information

Authors and Affiliations

KU Leuven ESAT - PSI, Leuven, Belgium
Roeland De Geest & Tinne Tuytelaars
UGent TELIN - IPI, Ghent, Belgium
Francis Deboeverie & Wilfried Philips
iMinds, Ledeberg, Belgium
Roeland De Geest, Francis Deboeverie, Wilfried Philips & Tinne Tuytelaars

Corresponding author

Correspondence to Roeland De Geest.

Editor information

Editors and Affiliations

Dipartimento di Matematica e Informatica, Università di Catania, Catania, Catania, Italy
Sebastiano Battiato
Arcueil CX, France
Jacques Blanc-Talon
Catania, Italy
Giovanni Gallo
Gent, Belgium
Wilfried Philips
CSIRO, Sydney, New South Wales, Australia
Dan Popescu
Vision Lab., University of Antwerp, Antwerpen, Belgium
Paul Scheunders

About this paper

Cite this paper

De Geest, R., Deboeverie, F., Philips, W., Tuytelaars, T. (2015). Spatio-Temporal Object Recognition. In: Battiato, S., Blanc-Talon, J., Gallo, G., Philips, W., Popescu, D., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2015. Lecture Notes in Computer Science, vol 9386. Springer, Cham. https://doi.org/10.1007/978-3-319-25903-1_59

DOI: https://doi.org/10.1007/978-3-319-25903-1_59
Published: 06 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25902-4
Online ISBN: 978-3-319-25903-1
eBook Packages: Computer Science, Computer Science (R0)
