Weakly Supervised Action Labeling in Videos under Ordering Constraints

Bojanowski, Piotr; Lajugie, Rémi; Bach, Francis; Laptev, Ivan; Ponce, Jean; Schmid, Cordelia; Sivic, Josef

doi:10.1007/978-3-319-10602-1_41

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8693)

Included in the following conference series:

European Conference on Computer Vision

Abstract

We are given a set of video clips, each one annotated with an ordered list of actions, such as “walk” then “sit” then “answer phone” extracted from, for example, the associated text script. We seek to temporally localize the individual actions in each clip as well as to learn a discriminative classifier for each action. We formulate the problem as a weakly supervised temporal assignment with ordering constraints. Each video clip is divided into small time intervals and each time interval of each video clip is assigned one action label, while respecting the order in which the action labels appear in the given annotations. We show that the action label assignment can be determined together with learning a classifier for each action in a discriminative manner. We evaluate the proposed model on a new and challenging dataset of 937 video clips with a total of 787720 frames containing sequences of 16 different actions from 69 Hollywood movies.
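The ordering constraint described in the abstract can be illustrated with a small dynamic program. The sketch below is not the authors' method: it does not learn any classifier, and it assumes that per-interval costs are already given. It only shows how, for one clip, each time interval can be assigned one label from the ordered annotation list so that labels never go backwards. The function name `ordered_assignment`, the cost values, and the requirement that every annotated action covers at least one interval are assumptions made for this illustration.

```python
import math


def ordered_assignment(cost):
    """Assign each time interval to one entry of an ordered label list.

    cost[t][k] is a (hypothetical) cost of giving interval t the k-th label
    of the annotation. The returned indices are non-decreasing in t, start
    at 0, end at K - 1, and use every label at least once (an assumption of
    this sketch).
    """
    T, K = len(cost), len(cost[0])
    if T < K:
        raise ValueError("need at least as many intervals as labels")

    best = [[math.inf] * K for _ in range(T)]   # best[t][k]: min cost up to t ending in label k
    advanced = [[False] * K for _ in range(T)]  # True if label k starts at interval t
    best[0][0] = cost[0][0]

    for t in range(1, T):
        for k in range(min(t + 1, K)):
            stay = best[t - 1][k]
            advance = best[t - 1][k - 1] if k > 0 else math.inf
            if advance < stay:
                best[t][k] = advance + cost[t][k]
                advanced[t][k] = True
            else:
                best[t][k] = stay + cost[t][k]

    # Backtrack from the last interval, which must carry the last label.
    k = K - 1
    path = [k]
    for t in range(T - 1, 0, -1):
        if advanced[t][k]:
            k -= 1
        path.append(k)
    path.reverse()
    return path, best[T - 1][K - 1]


# Toy clip with 5 intervals and the ordered annotation from the abstract.
labels = ["walk", "sit", "answer phone"]
cost = [
    [0.1, 0.9, 0.9],
    [0.2, 0.8, 0.9],
    [0.9, 0.1, 0.8],
    [0.8, 0.3, 0.7],
    [0.9, 0.9, 0.1],
]
path, total = ordered_assignment(cost)
print([labels[k] for k in path], total)
```

In this toy setting the costs stand in for classifier responses. In the paper the assignment and the classifiers are determined together, which this fixed-cost sketch does not attempt.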

Author information

Authors and Affiliations

INRIA, France
Piotr Bojanowski, Rémi Lajugie, Francis Bach, Ivan Laptev, Cordelia Schmid & Josef Sivic
École Normale Supérieure, France
Jean Ponce

Editor information

Editors and Affiliations

Department of Computer Science, University of Toronto, 6 King’s College Road, M5H 3S5, Toronto, ON, Canada
David Fleet
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University in Prague, Technicka 2, 166 27, Prague 6, Czech Republic
Tomas Pajdla
Max-Planck-Institut für Informatik, Campus E1 4, 66123, Saarbrücken, Germany
Bernt Schiele
ESAT - PSI, iMinds, KU Leuven, Kasteelpark Arenberg 10, Bus 2441, 3001, Leuven, Belgium
Tinne Tuytelaars

About this paper

Cite this paper

Bojanowski, P. et al. (2014). Weakly Supervised Action Labeling in Videos under Ordering Constraints. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8693. Springer, Cham. https://doi.org/10.1007/978-3-319-10602-1_41

DOI: https://doi.org/10.1007/978-3-319-10602-1_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10601-4
Online ISBN: 978-3-319-10602-1
eBook Packages: Computer Science, Computer Science (R0)
