Category-Specific Video Summarization

Potapov, Danila; Douze, Matthijs; Harchaoui, Zaid; Schmid, Cordelia

doi:10.1007/978-3-319-10599-4_35

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8694)

Included in the following conference series:

European Conference on Computer Vision

Abstract

In large video collections with clusters of typical categories, such as “birthday party” or “flash-mob”, category-specific video summarization can produce higher-quality summaries than unsupervised approaches that are blind to the video category.

Given a video from a known category, our approach first efficiently performs a temporal segmentation into semantically consistent segments, delimited not only by shot boundaries but also by general change points. Then, equipped with an SVM classifier, our approach assigns an importance score to each segment. The resulting summary assembles the sequence of segments with the highest scores, and is therefore both short and highly informative. Experimental results on videos from the multimedia event detection (MED) dataset of TRECVID’11 show that our approach produces video summaries with higher relevance than the state of the art.
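As an illustration only, the three steps named in the abstract (segment, score, assemble) can be sketched in Python. This is not the authors' implementation. It assumes a one-dimensional per-frame signal, substitutes a plain least-squares dynamic program with a fixed number of segments for the paper's change-point detection, takes the importance scores as given rather than computing them with an SVM, and introduces a frame budget for the summary length. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def change_points(signal: Sequence[float], num_segments: int) -> List[int]:
    """Split `signal` into `num_segments` contiguous pieces that minimize the
    total within-segment squared error; return the start index of every
    segment except the first. (Assumed stand-in for change-point detection.)"""
    n = len(signal)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and len(signal)")

    # Prefix sums give the cost of any segment in constant time.
    prefix, prefix_sq = [0.0], [0.0]
    for x in signal:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def cost(i: int, j: int) -> float:
        # Sum of squared deviations from the mean over frames i..j-1.
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting the first j frames into k segments.
    best = [[inf] * (n + 1) for _ in range(num_segments + 1)]
    back = [[0] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, num_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j], back[k][j] = c, i

    # Walk the back-pointers from the end to recover the boundaries.
    bounds, j = [], n
    for k in range(num_segments, 1, -1):
        j = back[k][j]
        bounds.append(j)
    return bounds[::-1]


def boundaries_to_segments(bounds: List[int], n: int) -> List[Tuple[int, int]]:
    """Turn boundary indices into half-open (start, end) frame ranges."""
    return list(zip([0] + bounds, bounds + [n]))


def select_summary(segments: Sequence[Tuple[int, int]],
                   scores: Sequence[float],
                   budget: int) -> List[Tuple[int, int]]:
    """Greedily keep the highest-scoring segments that fit in `budget` frames,
    then return them in temporal order."""
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        length = segments[i][1] - segments[i][0]
        if used + length <= budget:
            chosen.append(i)
            used += length
    return [segments[i] for i in sorted(chosen)]


if __name__ == "__main__":
    signal = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 1.0, 1.0]  # toy per-frame feature
    segments = boundaries_to_segments(change_points(signal, 3), len(signal))
    scores = [0.1, 0.9, 0.4]  # hypothetical scores, one per segment
    print(select_summary(segments, scores, budget=5))
```

In a real system the scores would come from a classifier trained for the video's category, and the frame descriptors would be high-dimensional rather than scalar. The greedy selection is one simple way to trade summary length against importance and is likewise an assumption of this sketch.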



Author information

Authors and Affiliations

Inria, France
Danila Potapov, Matthijs Douze, Zaid Harchaoui & Cordelia Schmid

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
ESAT - PSI, iMinds, KU Leuven, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars

About this paper

Cite this paper

Potapov, D., Douze, M., Harchaoui, Z., Schmid, C. (2014). Category-Specific Video Summarization. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8694. Springer, Cham. https://doi.org/10.1007/978-3-319-10599-4_35

DOI: https://doi.org/10.1007/978-3-319-10599-4_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10598-7
Online ISBN: 978-3-319-10599-4
eBook Packages: Computer Science (R0)
