Abstract
Modern advances in intelligent agents have led to the concept of cognitive robots. A cognitive robot not only perceives complex stimuli from the environment, but also reasons about them and acts coherently. Computer vision-based recognition systems serve this perception task, and they also find challenging applications in other fields such as video surveillance, human-computer interaction, content-based video analysis, and motion capture. In this context, we propose an automatic system for real-time human action recognition. We use the Kinect sensor and the tracking system of [1] to robustly detect and track people in the scene. Next, we estimate the 3D optical flow of the tracked people from point cloud data alone and summarize it by means of a 3D grid-based descriptor. Finally, temporal sequences of descriptors are classified with the Nearest Neighbor technique, and the overall application is tested on a newly created dataset. Experimental results show the effectiveness of the proposed approach.
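The pipeline in the abstract — per-point 3D flow summarized over a 3D grid, then Nearest Neighbor classification of descriptor sequences — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the grid resolution, the mean-flow-per-cell summary, and the Euclidean distance over concatenated sequence descriptors are all assumptions made for the example.

```python
import numpy as np

def grid_flow_descriptor(points, flows, grid=(4, 4, 4)):
    """Summarize per-point 3D flow into a coarse grid-based descriptor.

    points: (N, 3) coordinates of a tracked person's point cloud
    flows:  (N, 3) estimated 3D optical flow at each point
    Returns a flat vector holding the mean flow vector per grid cell.
    """
    lo, hi = points.min(axis=0), points.max(axis=0)
    span = np.maximum(hi - lo, 1e-9)
    # Map each point to a cell index in the person-centered bounding box.
    idx = np.minimum(((points - lo) / span * grid).astype(int),
                     np.array(grid) - 1)
    desc = np.zeros(grid + (3,))
    counts = np.zeros(grid)
    for (i, j, k), f in zip(idx, flows):
        desc[i, j, k] += f
        counts[i, j, k] += 1
    desc /= np.maximum(counts[..., None], 1)  # mean flow per cell
    return desc.ravel()

def nn_classify(seq, train_seqs, train_labels):
    """1-NN over fixed-length descriptor sequences (Euclidean distance)."""
    q = np.concatenate(seq)
    dists = [np.linalg.norm(q - np.concatenate(s)) for s in train_seqs]
    return train_labels[int(np.argmin(dists))]
```

A temporal sequence of such descriptors (one per frame) is then matched against labeled training sequences; the label of the closest one is returned.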
References
Munaro, M., Basso, F., Menegatti, E.: Tracking people within groups with RGB-D data. In: Proc. of the International Conference on Intelligent Robots and Systems (IROS), Vilamoura, Portugal (2012)
Johansson, G.: Visual perception of biological motion and a model for its analysis. Attention, Perception, & Psychophysics 14, 201–211 (1973), 10.3758/BF03212378
Quigley, M., Gerkey, B., Conley, K., Faust, J., Foote, T., Leibs, J., Berger, E., Wheeler, R., Ng, A.: ROS: an open-source Robot Operating System. In: Proceedings of the IEEE International Conference on Robotics and Automation, ICRA (2009)
Basso, F., Munaro, M., Michieletto, S., Pagello, E., Menegatti, E.: Fast and Robust Multi-People Tracking from RGB-D Data for a Mobile Robot. In: Lee, S., Cho, H., Yoon, K.-J., Lee, J. (eds.) Intelligent Autonomous Systems 12. AISC, vol. 193, pp. 269–281. Springer, Heidelberg (2012)
Carlsson, S., Sullivan, J.: Action recognition by shape matching to key frames. In: IEEE Computer Society Workshop on Models versus Exemplars in Computer Vision (2001)
Yilmaz, A., Shah, M.: Actions sketch: a novel action representation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 1, pp. 984–989 (June 2005)
Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. In: Proc. Tenth IEEE Int. Conf. Computer Vision ICCV 2005, vol. 2, pp. 1395–1402 (2005)
Rusu, R.B., Bandouch, J., Meier, F., Essa, I.A., Beetz, M.: Human action recognition using global point feature histograms and action shapes. Advanced Robotics 23(14), 1873–1908 (2009)
Efros, A.A., Berg, A.C., Mori, G., Malik, J.: Recognizing action at a distance. In: Proceedings of the Ninth IEEE International Conference on Computer Vision, vol. 2, pp. 726–733 (October 2003)
Yacoob, Y., Black, M.J.: Parameterized modeling and recognition of activities. In: Sixth International Conference on Computer Vision, pp. 120–127 (January 1998)
Ali, S., Shah, M.: Human action recognition in videos using kinematic features and multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(2), 288–303 (2010)
Ke, Y., Sukthankar, R., Hebert, M.: Efficient visual event detection using volumetric features. In: Proc. Tenth IEEE Int. Conf. Computer Vision ICCV 2005, vol. 1, pp. 166–173 (2005)
Liu, J., Ali, S., Shah, M.: Recognizing human actions using multiple features. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–8 (June 2008)
Scovanner, P., Ali, S., Shah, M.: A 3-dimensional SIFT descriptor and its application to action recognition. In: Proceedings of the 15th International Conference on Multimedia, MULTIMEDIA 2007, pp. 357–360. ACM, New York (2007)
Laptev, I., Lindeberg, T.: Space-time interest points. In: Proc. Ninth IEEE Int. Computer Vision Conf., pp. 432–439 (2003)
Laptev, I., Marszalek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition CVPR 2008, pp. 1–8 (2008)
Dollar, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: Proc. 2nd Joint IEEE Int. Visual Surveillance and Performance Evaluation of Tracking and Surveillance Workshop, pp. 65–72 (2005)
Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: Proc. 17th Int. Conf. Pattern Recognition ICPR 2004, vol. 3, pp. 32–36 (2004)
Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. Int. J. Comput. Vision 79, 299–318 (2008)
Kläser, A., Marszałek, M., Schmid, C.: A spatio-temporal descriptor based on 3d-gradients. In: British Machine Vision Conference, pp. 995–1004 (September 2008)
Weinland, D., Ronfard, R., Boyer, E.: Free viewpoint action recognition using motion history volumes. Comput. Vis. Image Underst. 104(2), 249–257 (2006)
Li, W., Zhang, Z., Liu, Z.: Action recognition based on a bag of 3d points. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 9–14 (June 2010)
Holte, M.B., Moeslund, T.B.: View invariant gesture recognition using 3d motion primitives. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2008, March 31-April 4, pp. 797–800 (2008)
Holte, M.B., Moeslund, T.B., Nikolaidis, N., Pitas, I.: 3d human action recognition for multi-view camera systems. In: 2011 International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), pp. 342–349 (May 2011)
Sung, J., Ponce, C., Selman, B., Saxena, A.: Human activity detection from RGBD images. In: Plan, Activity, and Intent Recognition. AAAI Workshops, vol. WS-11-16. AAAI (2011)
Sung, J., Ponce, C., Selman, B., Saxena, A.: Unstructured human activity detection from RGBD images. In: International Conference on Robotics and Automation, ICRA (2012)
Yang, X., Tian, Y.: Eigenjoints-based action recognition using naive-bayes-nearest-neighbor. In: IEEE Workshop on CVPR for Human Activity Understanding from 3D Data (2012)
Zhang, H., Parker, L.E.: 4-dimensional local spatio-temporal features for human activity recognition. In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2044–2049 (September 2011)
Ni, B., Wang, G., Moulin, P.: RGBD-HuDaAct: A color-depth video database for human daily activity recognition. In: 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 1147–1153 (November 2011)
Popa, M., Koc, A.K., Rothkrantz, L.J.M., Shan, C., Wiggers, P.: Kinect Sensing of Shopping Related Actions. In: Wichert, R., Van Laerhoven, K., Gelissen, J. (eds.) AmI 2011. CCIS, vol. 277, pp. 91–100. Springer, Heidelberg (2012)
Schindler, K., van Gool, L.: Action snippets: How many frames does human action recognition require? In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–8 (June 2008)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ballin, G., Munaro, M., Menegatti, E. (2013). Human Action Recognition from RGB-D Frames Based on Real-Time 3D Optical Flow Estimation. In: Chella, A., Pirrone, R., Sorbello, R., Jóhannsdóttir, K. (eds) Biologically Inspired Cognitive Architectures 2012. Advances in Intelligent Systems and Computing, vol 196. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34274-5_17
DOI: https://doi.org/10.1007/978-3-642-34274-5_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34273-8
Online ISBN: 978-3-642-34274-5
eBook Packages: Engineering