Object Level Grouping for Video Shots

Sivic, Josef; Schaffalitzky, Frederik; Zisserman, Andrew

doi:10.1007/978-3-540-24671-8_7

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3022)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We describe a method for automatically associating image patches from frames of a movie shot into object-level groups. The method employs both the appearance and motion of the patches.

There are two areas of innovation: first, affine invariant regions are used to repair short gaps in individual tracks and also to join sets of tracks across occlusions (where many tracks are lost simultaneously); second, a robust affine factorization method is developed which is able to cope with motion degeneracy. This factorization is used to associate tracks into object-level groups.

The outcome is that separate parts of an object that are never visible simultaneously in a single frame are associated together: for example, the front and back of a car, or the front and side of a face. In turn, this enables object-level matching and recognition throughout a video.

We illustrate the method for a number of shots from the feature film ‘Groundhog Day’.
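
As a rough illustration of the factorization idea mentioned in the abstract, the sketch below shows plain affine factorization of point tracks with NumPy. It is not the authors' method, which the abstract describes as robust and able to cope with motion degeneracy; this version assumes complete, noise-free tracks of a rigid object under an affine camera. The function names (`affine_factorize`, `track_residuals`, `synthetic_tracks`) and the synthetic data are hypothetical and chosen only for this example.

```python
import numpy as np


def affine_factorize(W, rank=3):
    """Factor a complete 2F x P track matrix as W ~ M @ S + t (rank-3 affine model)."""
    t = W.mean(axis=1, keepdims=True)          # per-row centroid = image translation
    U, s, Vt = np.linalg.svd(W - t, full_matrices=False)
    M = U[:, :rank] * s[:rank]                 # 2F x 3 motion (stacked affine cameras)
    S = Vt[:rank]                              # 3 x P shape (affine 3D points)
    return M, S, t


def track_residuals(W, M, t):
    """RMS error of each track (column of W) against the motion (M, t)."""
    S, *_ = np.linalg.lstsq(M, W - t, rcond=None)   # best 3D point for each track
    err = (W - t) - M @ S
    return np.sqrt((err ** 2).mean(axis=0))


def synthetic_tracks(rng, frames, points):
    """Hypothetical test data: noise-free tracks of one rigid object, random affine cameras."""
    M = rng.normal(size=(2 * frames, 3))
    t = rng.normal(size=(2 * frames, 1))
    X = rng.normal(size=(3, points))
    return M @ X + t


rng = np.random.default_rng(0)
W_a = synthetic_tracks(rng, frames=10, points=20)   # object A
W_b = synthetic_tracks(rng, frames=10, points=5)    # object B, different motion

M, S, t = affine_factorize(W_a)
res_a = track_residuals(W_a, M, t)   # near zero: consistent with A's motion
res_b = track_residuals(W_b, M, t)   # large: inconsistent, so not grouped with A
print(res_a.max(), res_b.min())
```

Tracks whose residual against a fitted motion is small are candidates for the same object-level group. A practical system would also need to handle noise, outliers, and tracks that span only some of the frames, none of which this sketch attempts.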

Author information

Authors and Affiliations

Robotics Research Group, Department of Engineering Science, University of Oxford
Josef Sivic, Frederik Schaffalitzky & Andrew Zisserman

Editor information

Editors and Affiliations

Center for Machine Perception, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University, Prague 6, Czech Republic
Tomás Pajdla
Center for Machine Perception, Dept. of Cybernetics, Faculty of Elec. Eng., Czech Technical University in Prague, Karlovo nám. 13, 121 35, Prague, Czech Rep.
Jiří Matas

About this paper

Cite this paper

Sivic, J., Schaffalitzky, F., Zisserman, A. (2004). Object Level Grouping for Video Shots. In: Pajdla, T., Matas, J. (eds) Computer Vision - ECCV 2004. ECCV 2004. Lecture Notes in Computer Science, vol 3022. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24671-8_7

DOI: https://doi.org/10.1007/978-3-540-24671-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21983-5
Online ISBN: 978-3-540-24671-8
eBook Packages: Springer Book Archive
