Abstract
We present an algorithm for automatically clustering tagged videos. Collections of tagged videos are commonplace; however, discovering video clusters within them is not trivial. Direct methods that operate only on visual features ignore the tag information, which is regularly available and valuable. Clustering videos on the tags alone is error-prone because tags are typically noisy. To address these problems, we develop a structured model that considers the interaction between visual features, video tags and video clusters. We model tags from visual features, and correct noisy tags by checking visual appearance consistency. Finally, videos are clustered using the refined tags together with the visual features. We learn the clustering through a max-margin framework, and demonstrate empirically that this algorithm produces more accurate clustering results than baseline methods based on tags, on visual features, or on both. Further, qualitative results show that the clustering can discover sub-categories and more specific instances of a given video category.
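The pipeline outlined in the abstract (refine noisy tags using visual consistency, then cluster on refined tags plus visual features) can be illustrated with a toy sketch. This is not the structured max-margin model of the paper: as a labeled simplification, it swaps in nearest-neighbour tag smoothing for the tag correction step and plain k-means for the clustering step. All function names, the parameters `k` and `alpha`, and the toy data are hypothetical and chosen only for illustration.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def refine_tags(visual, tags, k=2, alpha=0.5):
    """Blend each video's tag vector with the mean tag vector of its k
    visually nearest neighbours (an assumed stand-in for the paper's
    visual appearance consistency check)."""
    refined = []
    for i, v in enumerate(visual):
        others = [j for j in range(len(visual)) if j != i]
        nearest = sorted(others, key=lambda j: dist(v, visual[j]))[:k]
        n_tags = len(tags[i])
        neigh = [sum(tags[j][t] for j in nearest) / len(nearest)
                 for t in range(n_tags)]
        refined.append([alpha * tags[i][t] + (1 - alpha) * neigh[t]
                        for t in range(n_tags)])
    return refined

def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation
    (an assumed replacement for max-margin clustering)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_videos(visual, tags, n_clusters, k=2, alpha=0.5):
    """Cluster on the concatenation of visual features and refined tags."""
    refined = refine_tags(visual, tags, k=k, alpha=alpha)
    joint = [list(v) + r for v, r in zip(visual, refined)]
    return kmeans(joint, n_clusters)

# Toy data: two visual groups; the third and sixth videos carry a tag
# that disagrees with their visual neighbours (simulated tag noise).
visual = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
          [1.0, 0.9], [0.9, 1.0], [0.95, 0.95]]
tags = [[1, 0], [1, 0], [0, 1],
        [0, 1], [0, 1], [1, 0]]
labels = cluster_videos(visual, tags, n_clusters=2)
```

On this toy input, grouping by tags alone would place the third video with the fourth and fifth, whereas the joint representation keeps each video with its visual neighbours. The blend weight `alpha` controls how much the observed tag is trusted relative to the neighbours' tags.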
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Vahdat, A., Zhou, GT., Mori, G. (2014). Discovering Video Clusters from Visual Features and Noisy Tags. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8694. Springer, Cham. https://doi.org/10.1007/978-3-319-10599-4_34
DOI: https://doi.org/10.1007/978-3-319-10599-4_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10598-7
Online ISBN: 978-3-319-10599-4
eBook Packages: Computer Science (R0)