Abstract
Efficient access to information contained in video databases requires that a structured representation of the video content be built beforehand. This paper describes an approach in this direction, targeted at video indexing and browsing. Exploiting a 2D motion model estimator, we partition the video into shots, characterize camera motion, and extract and track mobile objects. These steps rely on robust motion estimation, statistical tests and contextual statistical labeling. The content of each shot can then be viewed on a synoptic frame composed of a mosaic image of the background scene, on which the trajectories of mobile objects are superimposed. The proposed method also provides instantaneous and long-term, qualitative and quantitative object motion cues for content-based indexing. The individual steps, and the system they form, are designed to keep computational cost low while coping with general video content. We provide experimental results on real-world sequences. The structured output opens important possible extensions, for instance in the direction of higher-level interpretation.
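To make the phrase "robust motion estimation" concrete, the following is a minimal sketch of one common way to fit a 2D affine motion model while down-weighting independently moving points. It is not the estimator used in the paper: it assumes a sparse set of flow vectors is already available, and it uses iteratively reweighted least squares with Tukey biweight weights and a median-based scale. All names (`solve3`, `weighted_affine_fit`, `robust_affine_fit`), the constants, and the synthetic data are assumptions made for illustration.

```python
import statistics

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= f * M[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))
        x[r] = s / M[r][r]
    return x

def weighted_affine_fit(points, flows, weights):
    """Weighted least squares for u = a0 + a1*x + a2*y, v = a3 + a4*x + a5*y."""
    A = [[0.0] * 3 for _ in range(3)]
    bu, bv = [0.0] * 3, [0.0] * 3
    for (x, y), (u, v), w in zip(points, flows, weights):
        phi = (1.0, x, y)
        for i in range(3):
            bu[i] += w * phi[i] * u
            bv[i] += w * phi[i] * v
            for j in range(3):
                A[i][j] += w * phi[i] * phi[j]
    return solve3(A, bu) + solve3(A, bv)

def residual_norms(points, flows, a):
    """Euclidean distance between each observed flow vector and the model."""
    out = []
    for (x, y), (u, v) in zip(points, flows):
        du = u - (a[0] + a[1] * x + a[2] * y)
        dv = v - (a[3] + a[4] * x + a[5] * y)
        out.append((du * du + dv * dv) ** 0.5)
    return out

def robust_affine_fit(points, flows, iters=10):
    """IRLS with Tukey biweight weights (assumed constants 4.685 and 1.4826)."""
    weights = [1.0] * len(points)
    a = weighted_affine_fit(points, flows, weights)
    for _ in range(iters):
        r = residual_norms(points, flows, a)
        # Median-based scale; the floor avoids a zero cutoff on a perfect fit.
        c = max(4.685 * 1.4826 * statistics.median(r), 1e-6)
        weights = [(1.0 - (ri / c) ** 2) ** 2 if ri < c else 0.0 for ri in r]
        a = weighted_affine_fit(points, flows, weights)
    return a, weights

# Synthetic example: a 10x10 grid follows a hypothetical camera motion,
# except a 3x3 block that moves with an extra (5, 5) displacement.
points = [(float(x), float(y)) for x in range(10) for y in range(10)]

def camera(x, y):
    return (1.0 + 0.02 * x - 0.01 * y, -0.5 + 0.01 * x + 0.02 * y)

def on_object(x, y):
    return 3 <= x <= 5 and 3 <= y <= 5

flows = []
for x, y in points:
    u, v = camera(x, y)
    flows.append((u + 5.0, v + 5.0) if on_object(x, y) else (u, v))

params, weights = robust_affine_fit(points, flows)
rejected = [p for p, w in zip(points, weights) if w == 0.0]
print(params)
print(rejected)
```

In this toy setting the points that receive zero weight play the role of candidate mobile-object points, while the recovered parameters describe the dominant (camera) motion. A real system would estimate the model directly from image intensities rather than from precomputed flow vectors.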
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gelgon, M., Bouthemy, P. (1998). Determining a structured spatio-temporal representation of video content for efficient visualization and indexing. In: Burkhardt, H., Neumann, B. (eds) Computer Vision — ECCV'98. ECCV 1998. Lecture Notes in Computer Science, vol 1406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0055692
DOI: https://doi.org/10.1007/BFb0055692
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64569-6
Online ISBN: 978-3-540-69354-3
eBook Packages: Springer Book Archive