Discovering Video Clusters from Visual Features and Noisy Tags

Vahdat, Arash; Zhou, Guang-Tong; Mori, Greg

doi:10.1007/978-3-319-10599-4_34

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8694)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We present an algorithm for automatically clustering tagged videos. Collections of tagged videos are commonplace; however, it is not trivial to discover video clusters therein. Direct methods that operate on visual features ignore the regularly available, valuable source of tag information. Clustering videos solely on these tags is error-prone, since the tags are typically noisy. To address these problems, we develop a structured model that considers the interaction between visual features, video tags, and video clusters. We model tags from visual features and correct noisy tags by checking visual appearance consistency. Finally, videos are clustered using the refined tags as well as the visual features. We learn the clustering through a max-margin framework and demonstrate empirically that this algorithm can produce more accurate clustering results than baseline methods based on tags, visual features, or both. Further, qualitative results verify that the clustering can discover sub-categories and more specific instances of a given video category.
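
The idea of correcting noisy tags by checking visual consistency can be illustrated with a much simpler stand-in than the structured max-margin model the abstract describes. The sketch below is an assumption made for illustration only: it replaces each video's binary tag vector with a majority vote over the video itself and its k nearest neighbours in visual-feature space. The function name refine_tags, the parameter k, and the toy data are hypothetical and do not come from the paper.

```python
import math


def refine_tags(features, tags, k=2):
    """Majority-vote tag smoothing (illustrative stand-in, not the paper's model).

    features: list of equal-length numeric feature vectors, one per video.
    tags: list of equal-length 0/1 tag vectors, one per video.
    Returns a new list of 0/1 tag vectors.
    """
    n = len(features)
    refined = []
    for i in range(n):
        # Rank the other videos by Euclidean distance in visual-feature space.
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )[:k]
        voters = [i] + neighbours
        # Keep a tag only if a strict majority of the voters carry it.
        refined.append([
            1 if 2 * sum(tags[j][t] for j in voters) > len(voters) else 0
            for t in range(len(tags[i]))
        ])
    return refined


# Toy data (hypothetical): two visually distinct groups of three videos each.
toy_features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
# The third video carries the wrong tag for its visual group.
toy_tags = [[1, 0], [1, 0], [0, 1],
            [0, 1], [0, 1], [0, 1]]
toy_refined = refine_tags(toy_features, toy_tags, k=2)
```

In this toy setting the mislabelled third video is outvoted by its two visual neighbours and takes on their tag. A real system would still need the joint treatment of tags, features, and cluster assignments that the abstract describes; this smoothing step only conveys the intuition.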

Author information

Authors and Affiliations

School of Computing Science, Simon Fraser University, Canada
Arash Vahdat, Guang-Tong Zhou & Greg Mori

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
ESAT - PSI, iMinds, KU Leuven, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars

Cite this paper

Vahdat, A., Zhou, GT., Mori, G. (2014). Discovering Video Clusters from Visual Features and Noisy Tags. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8694. Springer, Cham. https://doi.org/10.1007/978-3-319-10599-4_34

DOI: https://doi.org/10.1007/978-3-319-10599-4_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10598-7
Online ISBN: 978-3-319-10599-4
eBook Packages: Computer Science, Computer Science (R0)
