Abstract
We are given a set of video clips, each one annotated with an ordered list of actions, such as “walk” then “sit” then “answer phone” extracted from, for example, the associated text script. We seek to temporally localize the individual actions in each clip as well as to learn a discriminative classifier for each action. We formulate the problem as a weakly supervised temporal assignment with ordering constraints. Each video clip is divided into small time intervals and each time interval of each video clip is assigned one action label, while respecting the order in which the action labels appear in the given annotations. We show that the action label assignment can be determined together with learning a classifier for each action in a discriminative manner. We evaluate the proposed model on a new and challenging dataset of 937 video clips with a total of 787720 frames containing sequences of 16 different actions from 69 Hollywood movies.
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Amer, M.R., Todorovic, S., Fern, A., Zhu, S.C.: Monte carlo tree search for scheduling activity recognition. In: ICCV (2013)
Bach, F., Harchaoui, Z.: DIFFRAC: a discriminative and flexible framework for clustering. In: NIPS (2007)
Bertsekas, D.: Nonlinear Programming. Athena Scientific (1999)
Bojanowski, P., Bach, F., Laptev, I., Ponce, J., Schmid, C., Sivic, J.: Finding Actors and Actions in Movies. In: ICCV (2013)
Bojanowski, P., Lajugie, R., Bach, F., Laptev, I., Ponce, J., Schmid, C., Sivic, J.: Weakly Supervised Action Labeling in Videos Under Ordering Constraints. In: arXiv (2014)
Duchenne, O., Laptev, I., Sivic, J., Bach, F., Ponce, J.: Automatic annotation of human actions in video. In: ICCV (2009)
Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Research Logistics Quarterly (1956)
Gold, B., Morgan, N., Ellis, D.: Speech and Audio Signal Processing - Processing and Perception of Speech and Music, Second Edition. Wiley (2011)
Guo, Y., Schuurmans, D.: Convex Relaxations of Latent Variable Training. In: NIPS (2007)
Harchaoui, Z.: Conditional gradient algorithms for machine learning. In: NIPS Workshop (2012)
Hastie, T., Tibshirani, R., Friedman, J.: The elements of statistical learning: data mining, inference and prediction. Springer (2009)
Hongeng, S., Nevatia, R.: Large-scale event detection using semi-hidden markov models. In: ICCV (2003)
Hubert, L., Arabie, P.: Comparing partitions. Journal of classification (1985)
Ivanov, Y.A., Bobick, A.F.: Recognition of visual activities and interactions by stochastic parsing. PAMI (2000)
Jaccard, P.: The distribution of the flora in the alpine zone. New Phytologist (1912)
Jaggi, M.: Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In: ICML (2013)
Joulin, A., Bach, F., Ponce, J.: Discriminative Clustering for Image Co-segmentation. In: CVPR (2010)
Joulin, A., Bach, F., Ponce, J.: Multi-class cosegmentation. In: CVPR (2012)
Khamis, S., Morariu, V.I., Davis, L.S.: Combining per-frame and per-track cues for multi-person action recognition. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part I. LNCS, vol. 7572, pp. 116–129. Springer, Heidelberg (2012)
Kwak, S., Han, B., Han, J.H.: Scenario-based video event recognition by constraint flow. In: CVPR (2011)
Laptev, I., Marszalek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: CVPR (2008)
Laxton, B., Lim, J., Kriegman, D.J.: Leveraging temporal, contextual and ordering constraints for recognizing complex activities in video. In: CVPR (2007)
Liu, J., Kuipers, B., Savarese, S.: Recognizing human actions by attributes. In: CVPR (2011)
Nguyen, M.H., Lan, Z.Z., la Torre, F.D.: Joint segmentation and classification of human actions in video. In: CVPR (2011)
Niebles, J.C., Chen, C.-W., Fei-Fei, L.: Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part II. LNCS, vol. 6312, pp. 392–405. Springer, Heidelberg (2010)
Rabiner, L.R., Juang, B.H.: Fundamentals of speech recognition. Prentice Hall (1993)
Rohrbach, M., Regneri, M., Andriluka, M., Amin, S., Pinkal, M., Schiele, B.: Script Data for Attribute-Based Recognition of Composite Activities. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part I. LNCS, vol. 7572, pp. 144–157. Springer, Heidelberg (2012)
Ryoo, M.S., Aggarwal, J.K.: Recognition of composite human activities through context-free grammar based representation. In: CVPR (2006)
Sadanand, S., Corso, J.J.: Action bank: A high-level representation of activity in video. In: CVPR (2012)
Shi, J., Malik, J.: Normalized Cuts and Image Segmentation. In: CVPR (1997)
Sivic, J., Everingham, M., Zisserman, A.: “Who are you?” - Learning person specific classifiers from video. In: CVPR (2009)
Tang, K., Fei-Fei, L., Koller, D.: Learning latent temporal structure for complex event detection. In: CVPR (2012)
Vu, V.T., Bremond, F., Thonnat, M.: Automatic video interpretation: A novel algorithm for temporal scenario recognition. In: IJCAI (2003)
Wang, H., Kläser, A., Schmid, C., Liu, C.L.: Action recognition by dense trajectories. In: CVPR (2011)
Wang, H., Schmid, C.: Action Recognition with Improved Trajectories. In: ICCV (2013)
Xu, L., Neufeld, J., Larson, B., Schuurmans, D.: Maximum Margin Clustering. In: NIPS (2004)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Bojanowski, P. et al. (2014). Weakly Supervised Action Labeling in Videos under Ordering Constraints. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8693. Springer, Cham. https://doi.org/10.1007/978-3-319-10602-1_41
Download citation
DOI: https://doi.org/10.1007/978-3-319-10602-1_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10601-4
Online ISBN: 978-3-319-10602-1
eBook Packages: Computer ScienceComputer Science (R0)