Abstract
We describe a method for automatically associating image patches from frames of a movie shot into object-level groups. The method employs both the appearance and motion of the patches.
There are two areas of innovation: first, affine invariant regions are used to repair short gaps in individual tracks and also to join sets of tracks across occlusions (where many tracks are lost simultaneously); second, a robust affine factorization method is developed which is able to cope with motion degeneracy. This factorization is used to associate tracks into object-level groups.
The outcome is that separate parts of an object that are never visible simultaneously in a single frame are associated together, for example the front and back of a car, or the front and side of a face. In turn, this enables object-level matching and recognition throughout a video.
We illustrate the method for a number of shots from the feature film ‘Groundhog Day’.
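As a rough illustration of the grouping idea, the sketch below uses plain rank-3 affine factorization (the classical SVD approach for complete data) and then tests whether other tracks agree with the recovered motion. It is not the robust method of the paper: it assumes no missing data, no outliers among the tracks used for fitting, and non-degenerate motion. The function names `affine_factorization` and `track_errors`, the synthetic data, and the row layout of the measurement matrix are assumptions made for this example.

```python
import numpy as np

def affine_factorization(W):
    """Plain rank-3 affine factorization of a complete 2F x P measurement matrix.

    Assumed layout: rows 2f and 2f+1 hold the x and y image coordinates
    of the P tracked points in frame f.
    """
    t = W.mean(axis=1, keepdims=True)        # per-row centroid acts as the translation
    U, s, Vt = np.linalg.svd(W - t, full_matrices=False)
    M = U[:, :3] * s[:3]                     # 2F x 3 stacked affine cameras
    X = Vt[:3, :]                            # 3 x P affine structure
    return M, X, t

def track_errors(M, t, W):
    """RMS per-frame image error of each track in W under the motion (M, t)."""
    Wc = W - t
    X, *_ = np.linalg.lstsq(M, Wc, rcond=None)   # best 3D point for each track
    resid = Wc - M @ X
    F = W.shape[0] // 2
    return np.sqrt((resid ** 2).reshape(F, 2, -1).sum(axis=1).mean(axis=0))

# Synthetic example: one rigid object seen in 6 frames by random affine cameras.
rng = np.random.default_rng(0)
F, P = 6, 20
M_true = rng.normal(size=(2 * F, 3))
t_true = rng.normal(size=(2 * F, 1))
W = M_true @ rng.normal(size=(3, P)) + t_true

M, X, t = affine_factorization(W)

# A new track on the same object, and a track that moves independently.
w_same = M_true @ rng.normal(size=(3, 1)) + t_true
w_other = rng.normal(size=(2 * F, 1))

err_same = track_errors(M, t, w_same)    # close to zero: consistent with the motion
err_other = track_errors(M, t, w_other)  # clearly larger: belongs to another group
print(err_same, err_other)
```

Thresholding such errors gives a simple membership test for one motion. With real tracks, which contain gaps and mismatches, a plain SVD fit of this kind breaks down, which is the case the robust factorization described above is meant to handle.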
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sivic, J., Schaffalitzky, F., Zisserman, A. (2004). Object Level Grouping for Video Shots. In: Pajdla, T., Matas, J. (eds) Computer Vision - ECCV 2004. ECCV 2004. Lecture Notes in Computer Science, vol 3022. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24671-8_7
DOI: https://doi.org/10.1007/978-3-540-24671-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21983-5
Online ISBN: 978-3-540-24671-8
eBook Packages: Springer Book Archive