Abstract
The bag-of-words approach with local spatio-temporal features has become a popular video representation for action recognition. Together, these techniques have demonstrated high recognition results for a number of action classes. Recent approaches have typically focused on capturing global statistics of features. However, existing methods ignore relations between features and thus may not be discriminative enough. We therefore propose a novel feature representation that captures statistics of pairwise co-occurring local spatio-temporal features. Our representation captures not only the global distribution of features but also the geometric and appearance (both visual and motion) relations among them. By calculating a set of bag-of-words representations with different geometrical arrangements among the features, we preserve an important association between appearance and geometric information. Using two benchmark datasets for human action recognition, we demonstrate that our representation enhances the discriminative power of features and improves action recognition performance.
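The idea of keeping one bag-of-words histogram per geometrical arrangement can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how arrangements are defined, so the sketch assumes that features are already quantized into codewords and that an "arrangement" is simply a band of Euclidean distance in (x, y, t). The function name `pairwise_cooccurrence_histograms` and the parameters `positions`, `words`, `vocab_size`, and `radii` are hypothetical.

```python
import numpy as np


def pairwise_cooccurrence_histograms(positions, words, vocab_size, radii):
    """Count co-occurring codeword pairs, one histogram per distance band.

    Assumptions (not taken from the paper): `positions` holds (x, y, t)
    coordinates in comparable units, `words` holds codeword indices in
    [0, vocab_size), and `radii` is an increasing list of band limits.
    Band b collects pairs whose distance d satisfies
    radii[b-1] < d <= radii[b] (band 0 starts at distance 0).
    """
    positions = np.asarray(positions, dtype=float)
    words = np.asarray(words, dtype=int)
    radii = np.asarray(radii, dtype=float)

    hists = np.zeros((len(radii), vocab_size, vocab_size))

    # Enumerate every unordered pair of features (i < j).
    i, j = np.triu_indices(len(words), k=1)
    dist = np.linalg.norm(positions[i] - positions[j], axis=1)

    # Assign each pair to a distance band; pairs beyond the last radius
    # get index len(radii) and are discarded.
    band = np.searchsorted(radii, dist, side="left")
    keep = band < len(radii)

    # Store unordered pairs in the upper triangle (smaller word first).
    lo = np.minimum(words[i], words[j])[keep]
    hi = np.maximum(words[i], words[j])[keep]
    np.add.at(hists, (band[keep], lo, hi), 1.0)

    # L1-normalise each band so videos with different feature counts
    # are comparable; empty bands stay all-zero.
    totals = hists.sum(axis=(1, 2), keepdims=True)
    np.divide(hists, totals, out=hists, where=totals > 0)

    # One flattened histogram per band.
    return hists.reshape(len(radii), -1)


if __name__ == "__main__":
    # Three toy features: two close together, one farther away.
    toy_positions = [[0, 0, 0], [1, 0, 0], [5, 0, 0]]
    toy_words = [0, 1, 1]
    print(pairwise_cooccurrence_histograms(toy_positions, toy_words, 2, [2, 10]))
```

Because each band yields its own histogram over codeword pairs, the appearance information (which codewords co-occur) stays tied to the geometric information (how far apart they are), which is the association the abstract refers to. Concatenating the rows would give a single descriptor that a standard classifier could consume.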
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bilinski, P., Bremond, F. (2012). Statistics of Pairwise Co-occurring Local Spatio-temporal Features for Human Action Recognition. In: Fusiello, A., Murino, V., Cucchiara, R. (eds) Computer Vision – ECCV 2012. Workshops and Demonstrations. ECCV 2012. Lecture Notes in Computer Science, vol 7583. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33863-2_31
DOI: https://doi.org/10.1007/978-3-642-33863-2_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33862-5
Online ISBN: 978-3-642-33863-2
eBook Packages: Computer Science (R0)