Abstract
Modern advances in intelligent agents have led to the concept of cognitive robots. A cognitive robot not only perceives complex stimuli from the environment, but also reasons about them and acts coherently. Computer vision-based recognition systems serve this perception task, and they also find challenging applications in other fields such as video surveillance, human-computer interaction, content-based video analysis, and motion capture. In this context, we propose an automatic system for real-time human action recognition. We use the Kinect sensor and the tracking system of [1] to robustly detect and track people in the scene. Next, we estimate the 3D optical flow of the tracked people from point cloud data alone and summarize it by means of a 3D grid-based descriptor. Finally, temporal sequences of descriptors are classified with the Nearest Neighbor technique, and the overall application is tested on a newly created dataset. Experimental results show the effectiveness of the proposed approach.
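The pipeline in the abstract — per-point 3D flow summarized over a 3D grid, then Nearest Neighbor classification of descriptor sequences — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the grid resolution, the mean-flow-per-cell summary, and the Euclidean distance over concatenated sequence descriptors are all assumptions made for the example.

```python
import numpy as np

def grid_flow_descriptor(points, flows, grid=(4, 4, 4)):
    """Summarize per-point 3D flow into a coarse grid-based descriptor.

    points: (N, 3) coordinates of a tracked person's point cloud
    flows:  (N, 3) estimated 3D optical flow at each point
    Returns a flat vector holding the mean flow vector per grid cell.
    """
    lo, hi = points.min(axis=0), points.max(axis=0)
    span = np.maximum(hi - lo, 1e-9)
    # Map each point to a cell index in the person-centered bounding box.
    idx = np.minimum(((points - lo) / span * grid).astype(int),
                     np.array(grid) - 1)
    desc = np.zeros(grid + (3,))
    counts = np.zeros(grid)
    for (i, j, k), f in zip(idx, flows):
        desc[i, j, k] += f
        counts[i, j, k] += 1
    desc /= np.maximum(counts[..., None], 1)  # mean flow per cell
    return desc.ravel()

def nn_classify(seq, train_seqs, train_labels):
    """1-NN over fixed-length descriptor sequences (Euclidean distance)."""
    q = np.concatenate(seq)
    dists = [np.linalg.norm(q - np.concatenate(s)) for s in train_seqs]
    return train_labels[int(np.argmin(dists))]
```

A temporal sequence of such descriptors (one per frame) is then matched against labeled training sequences; the label of the closest one is returned.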
References
Munaro, M., Basso, F., Menegatti, E.: Tracking people within groups with RGB-D data. In: Proc. of the International Conference on Intelligent Robots and Systems (IROS), Vilamoura, Portugal (2012)
Johansson, G.: Visual perception of biological motion and a model for its analysis. Attention, Perception, & Psychophysics 14, 201–211 (1973), 10.3758/BF03212378
Quigley, M., Gerkey, B., Conley, K., Faust, J., Foote, T., Leibs, J., Berger, E., Wheeler, R., Ng, A.: ROS: an open-source Robot Operating System. In: Proceedings of the IEEE International Conference on Robotics and Automation, ICRA (2009)
Basso, F., Munaro, M., Michieletto, S., Pagello, E., Menegatti, E.: Fast and Robust Multi-People Tracking from RGB-D Data for a Mobile Robot. In: Lee, S., Cho, H., Yoon, K.-J., Lee, J. (eds.) Intelligent Autonomous Systems 12. AISC, vol. 193, pp. 269–281. Springer, Heidelberg (2012)
Carlsson, S., Sullivan, J.: Action recognition by shape matching to key frames. In: IEEE Computer Society Workshop on Models versus Exemplars in Computer Vision (2001)
Yilmaz, A., Shah, M.: Actions sketch: a novel action representation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 1, pp. 984–989 (June 2005)
Blank, M., Gorelick, L., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. In: Proc. Tenth IEEE Int. Conf. Computer Vision ICCV 2005, vol. 2, pp. 1395–1402 (2005)
Rusu, R.B., Bandouch, J., Meier, F., Essa, I.A., Beetz, M.: Human action recognition using global point feature histograms and action shapes. Advanced Robotics 23(14), 1873–1908 (2009)
Efros, A.A., Berg, A.C., Mori, G., Malik, J.: Recognizing action at a distance. In: Proceedings of the Ninth IEEE International Conference on Computer Vision, vol. 2, pp. 726–733 (October 2003)
Yacoob, Y., Black, M.J.: Parameterized modeling and recognition of activities. In: Sixth International Conference on Computer Vision, pp. 120–127 (January 1998)
Ali, S., Shah, M.: Human action recognition in videos using kinematic features and multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(2), 288–303 (2010)
Ke, Y., Sukthankar, R., Hebert, M.: Efficient visual event detection using volumetric features. In: Proc. Tenth IEEE Int. Conf. Computer Vision ICCV 2005, vol. 1, pp. 166–173 (2005)
Liu, J., Ali, S., Shah, M.: Recognizing human actions using multiple features. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–8 (June 2008)
Scovanner, P., Ali, S., Shah, M.: A 3-dimensional SIFT descriptor and its application to action recognition. In: Proceedings of the 15th International Conference on Multimedia, MULTIMEDIA 2007, pp. 357–360. ACM, New York (2007)
Laptev, I., Lindeberg, T.: Space-time interest points. In: Proc. Ninth IEEE Int. Computer Vision Conf., pp. 432–439 (2003)
Laptev, I., Marszalek, M., Schmid, C., Rozenfeld, B.: Learning realistic human actions from movies. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition CVPR 2008, pp. 1–8 (2008)
Dollar, P., Rabaud, V., Cottrell, G., Belongie, S.: Behavior recognition via sparse spatio-temporal features. In: Proc. 2nd Joint IEEE Int. Visual Surveillance and Performance Evaluation of Tracking and Surveillance Workshop, pp. 65–72 (2005)
Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: Proc. 17th Int. Conf. Pattern Recognition ICPR 2004, vol. 3, pp. 32–36 (2004)
Niebles, J.C., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. Int. J. Comput. Vision 79, 299–318 (2008)
Kläser, A., Marszałek, M., Schmid, C.: A spatio-temporal descriptor based on 3d-gradients. In: British Machine Vision Conference, pp. 995–1004 (September 2008)
Weinland, D., Ronfard, R., Boyer, E.: Free viewpoint action recognition using motion history volumes. Comput. Vis. Image Underst. 104(2), 249–257 (2006)
Li, W., Zhang, Z., Liu, Z.: Action recognition based on a bag of 3d points. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 9–14 (June 2010)
Holte, M.B., Moeslund, T.B.: View invariant gesture recognition using 3d motion primitives. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2008, March 31-April 4, pp. 797–800 (2008)
Holte, M.B., Moeslund, T.B., Nikolaidis, N., Pitas, I.: 3d human action recognition for multi-view camera systems. In: 2011 International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), pp. 342–349 (May 2011)
Sung, J., Ponce, C., Selman, B., Saxena, A.: Human activity detection from RGBD images. In: Plan, Activity, and Intent Recognition. AAAI Workshops, vol. WS-11-16. AAAI (2011)
Sung, J., Ponce, C., Selman, B., Saxena, A.: Unstructured human activity detection from RGBD images. In: International Conference on Robotics and Automation, ICRA (2012)
Yang, X., Tian, Y.: Eigenjoints-based action recognition using naive-bayes-nearest-neighbor. In: IEEE Workshop on CVPR for Human Activity Understanding from 3D Data (2012)
Zhang, H., Parker, L.E.: 4-dimensional local spatio-temporal features for human activity recognition. In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2044–2049 (September 2011)
Ni, B., Wang, G., Moulin, P.: RGBD-HuDaAct: A color-depth video database for human daily activity recognition. In: 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pp. 1147–1153 (November 2011)
Popa, M., Koc, A.K., Rothkrantz, L.J.M., Shan, C., Wiggers, P.: Kinect Sensing of Shopping Related Actions. In: Wichert, R., Van Laerhoven, K., Gelissen, J. (eds.) AmI 2011. CCIS, vol. 277, pp. 91–100. Springer, Heidelberg (2012)
Schindler, K., van Gool, L.: Action snippets: How many frames does human action recognition require? In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008, pp. 1–8 (June 2008)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ballin, G., Munaro, M., Menegatti, E. (2013). Human Action Recognition from RGB-D Frames Based on Real-Time 3D Optical Flow Estimation. In: Chella, A., Pirrone, R., Sorbello, R., Jóhannsdóttir, K. (eds) Biologically Inspired Cognitive Architectures 2012. Advances in Intelligent Systems and Computing, vol 196. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34274-5_17
DOI: https://doi.org/10.1007/978-3-642-34274-5_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34273-8
Online ISBN: 978-3-642-34274-5
eBook Packages: Engineering