Abstract
Human action classification is an important task in computer vision. The Bag-of-Words model is a representation widely used in action classification techniques. In this work we propose an approach based on a mid-level feature representation for human action description. First, an optimal vocabulary is created without fixing the number of visual words in advance, a known limitation of the K-means method. We then introduce a graph-based video representation built from the relationships between interest points, in order to take the spatial and temporal layout into account. Finally, a second visual vocabulary based on n-grams is used for classification. This combines the representational power of graphs with the efficiency of the bag-of-words representation. The representation method was tested on the KTH dataset using STIP and MoSIFT descriptors and a multi-class SVM with a chi-square kernel. The experimental results show that our approach with the STIP descriptor outperforms the best state-of-the-art results, while the results with the MoSIFT descriptor are comparable to them.
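The pipeline described above ends by classifying n-gram histograms with a multi-class SVM using a chi-square kernel. The sketch below illustrates two generic building blocks of such a pipeline: counting n-grams over a sequence of visual-word ids, and computing an exponential chi-square kernel between bag-of-words histograms. The function names and the `gamma` parameter are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def ngram_histogram(words, n=2):
    """Count n-grams over a sequence of visual-word ids.

    Illustrative helper: the paper builds a second vocabulary from
    n-grams of visual words; here we simply count consecutive n-tuples.
    """
    counts = {}
    for i in range(len(words) - n + 1):
        gram = tuple(words[i:i + n])
        counts[gram] = counts.get(gram, 0) + 1
    return counts

def chi_square_kernel(X, Y, gamma=1.0):
    """Exponential chi-square kernel between rows of X and Y.

    K(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i)),
    a common choice for comparing bag-of-words histograms.
    X: (n, d), Y: (m, d), entries non-negative.
    """
    K = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        num = (x - Y) ** 2   # squared bin differences, shape (m, d)
        den = x + Y          # bin sums, shape (m, d)
        # treat bins where both histograms are empty as contributing 0
        terms = np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)
        K[i] = np.exp(-gamma * terms.sum(axis=1))
    return K
```

A precomputed kernel matrix of this form can then be handed to any kernel SVM implementation that accepts precomputed kernels for the final multi-class classification step.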
References
Acosta-Mendoza, N., Gago-Alonso, A., Medina-Pagola, J.E.: Frequent approximate subgraphs as features for graph-based image classification. Knowledge-Based Systems 27, 381–392 (2012)
Chakraborty, B., Holte, M.B., Moeslund, T.B., Gonzàlez, J.: Selective spatio-temporal interest points. Computer Vision and Image Understanding 116(3), 396–410 (2012)
Chen, M.Y., Hauptmann, A.: MoSIFT: Recognizing human actions in surveillance videos. Research Showcase 929, Carnegie Mellon University, School of Computer Science, Computer Science Department (2009)
Cózar, J.R., Hernández, R., Heredia, Y., González-Linares, J.M., Guil, N.: Reducing Vocabulary Size in Human Action Classification. In: Frontiers in Artificial Intelligence and Applications, vol. 243, pp. 1712–1719. IOS Press (2012)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley Interscience (2001)
Fiedler, M., Borgelt, C.: Support computation for mining frequent subgraphs in a single graph. In: Proceedings of MLG-2007: 5th International Workshop on Mining and Learning with Graphs, pp. 1–6 (2007)
Gao, Z., Chen, M.-Y., Hauptmann, A.G., Cai, A.: Comparing Evaluation Protocols on the KTH Dataset. In: Salah, A.A., Gevers, T., Sebe, N., Vinciarelli, A. (eds.) HBU 2010. LNCS, vol. 6219, pp. 88–100. Springer, Heidelberg (2010)
Laptev, I., Lindeberg, T.: Space-time interest points. In: Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV 2003), vol. 1, pp. 432–439 (2003)
Laptev, I., Marszałek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–8 (2008)
Liu, J., Luo, J., Shah, M.: Recognizing realistic actions from videos in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 1–8 (2009)
Morales-González, A., García-Reyes, E.: Assessing the role of spatial relations for the object recognition task. In: Bloch, I., Cesar Jr., R.M. (eds.) CIARP 2010. LNCS, vol. 6419, pp. 549–556. Springer, Heidelberg (2010)
Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. International Journal of Computer Vision 79, 299–318 (2008)
Özdemir, B., Aksoy, S.: Image classification using subgraph histogram representation. In: ICPR 2010, pp. 1112–1115 (August 2010)
Schüldt, C., Laptev, I., Caputo, B.: Recognizing human actions: A local SVM approach. In: Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004, vol. 3, pp. 32–36 (August 2004)
Thi, T.H., Cheng, L., Zhang, J., Wang, L., Satoh, S.: Structured learning of local features for human action classification and localization. Image and Vision Computing 30(1), 1–14 (2012)
Wang, H., Kläser, A., Schmid, C., Liu, C.L.: Dense trajectories and motion boundary descriptors for action recognition. International Journal of Computer Vision 103(1), 60–79 (2013)
Wiliem, A., Madasu, V.K., Boles, W.W., Yarlagadda, P.K.: Detecting uncommon trajectories. In: Digital Image Computing: Techniques and Applications, DICTA (December 2008)
Yan, X., Han, J.: gSpan: Graph-based substructure pattern mining. In: IEEE International Conference on Data Mining, ICDM 2002, pp. 721–724 (2002)
Zhang, S., Tian, Q., Hua, G., Huang, Q., Li, S.: Descriptive visual words and visual phrases for image applications. In: Proceedings of the 17th ACM International Conference on Multimedia, pp. 75–84 (October 2009)
© 2014 Springer International Publishing Switzerland
Cite this paper
Hernández-García, R., García-Reyes, E., Ramos-Cózar, J., Guil, N. (2014). Human Action Classification Using N-Grams Visual Vocabulary. In: Bayro-Corrochano, E., Hancock, E. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2014. Lecture Notes in Computer Science, vol 8827. Springer, Cham. https://doi.org/10.1007/978-3-319-12568-8_39
DOI: https://doi.org/10.1007/978-3-319-12568-8_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12567-1
Online ISBN: 978-3-319-12568-8
eBook Packages: Computer Science (R0)