Abstract
In large video collections with clusters of typical categories, such as “birthday party” or “flash-mob”, category-specific video summarization can produce higher quality video summaries than unsupervised approaches that are blind to the video category.
Given a video from a known category, our approach first efficiently segments it in time into semantically consistent segments, delimited not only by shot boundaries but also by general change points. Then, equipped with an SVM classifier, our approach assigns an importance score to each segment. The resulting summary assembles the highest-scoring segments in sequence, and is therefore both short and highly informative. Experimental results on videos from the multimedia event detection (MED) dataset of TRECVID’11 show that our approach produces video summaries with higher relevance than the state of the art.
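The final assembly step can be illustrated with a minimal sketch. It assumes the temporal segments and their importance scores are already available (for example, as SVM decision values), and it assumes a greedy selection under a duration budget; the abstract does not specify the selection rule, so the function name `assemble_summary` and the parameter `max_duration` are hypothetical.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)


def assemble_summary(
    segments: Sequence[Segment],
    scores: Sequence[float],
    max_duration: float,
) -> List[Segment]:
    """Pick high-scoring segments within a duration budget.

    Assumption: a greedy rule that visits segments by decreasing score
    and keeps each one that still fits in the remaining budget.
    """
    if len(segments) != len(scores):
        raise ValueError("segments and scores must have the same length")

    # Indices of segments, highest importance score first.
    by_score = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)

    chosen = []
    used = 0.0
    for i in by_score:
        start, end = segments[i]
        length = end - start
        if used + length <= max_duration:
            chosen.append(i)
            used += length

    # Restore temporal order so the summary plays back chronologically.
    return [segments[i] for i in sorted(chosen)]


# Hypothetical segments (in seconds) and scores, for illustration only.
segments = [(0.0, 4.0), (4.0, 10.0), (10.0, 12.0), (12.0, 20.0)]
scores = [0.1, 0.9, 0.5, 0.7]
summary = assemble_summary(segments, scores, max_duration=10.0)
```

Restoring temporal order after selection keeps the summary a subsequence of the original video, which matches the description of assembling "the sequence of segments" with the highest scores.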
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Potapov, D., Douze, M., Harchaoui, Z., Schmid, C. (2014). Category-Specific Video Summarization. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8694. Springer, Cham. https://doi.org/10.1007/978-3-319-10599-4_35
DOI: https://doi.org/10.1007/978-3-319-10599-4_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10598-7
Online ISBN: 978-3-319-10599-4
eBook Packages: Computer Science (R0)