1 Introduction

Human Activity Recognition aims to recognize the physical activity of one or more persons from a sequence of observations obtained from sensors or video logs. It is a difficult task for real-time videos because of background clutter, variations in illumination, varying viewpoints, partial occlusions and camera movements. Recognition of human activity is thus an open and challenging research area in the fields of computer vision, machine vision, image processing [1], etc. A robust human activity recognition system can greatly enhance the efficacy of surveillance systems [7], human-computer interaction [21], gesture interpretation [39], life-care systems [14, 23], sports analysis [32], smart homes [15] and other application domains. Owing to this wide applicability in real life, the area has attracted researchers for a long time. The recognition accuracy of such a system largely depends on the quality of the extracted features and the learning capability of the classifier. Different features such as Motion Stable Shape (MSS) [18], Motion Scale-Invariant Feature Transformation (MoSIFT) [6], Speeded Up Robust Features Motion History Image (SURF-MHI) [35], Spatio-Temporal Interest Point (STIP) detectors [8], and skeleton based features have been used for Human Activity Recognition by the research community in the past.

A Human Activity Recognition system is applied to data collected through wearable sensors or recorded video (from CCTV cameras, surveillance cameras, etc.). Wearable sensor based approaches [3, 26, 39] first collect data from sensors attached to different parts of the human body; the physical activity is then recognized by analyzing these data. This type of approach can be used only when a person is wearing the sensors at all times, which is not always feasible in real-life scenarios. To mitigate this limitation, computer vision based human activity recognition is the preferred method. In the vision based approach, features like MoSIFT, SURF-MHI, etc. are extracted from the video data and are then used to train the classifier. These features work well for videos recorded with static cameras, but not for videos recorded with moving cameras. Also, the processing time for extracting these features is high because feature extraction requires processing of all pixels in the image. To minimize these limitations, skeleton based features have been proposed by many researchers in the literature for human activity recognition. Feature selection plays an important role in vision based human activity recognition [9, 40] because multiple features may be required to train the system to improve its accuracy. One can select a large number of features to increase the accuracy, but this increases the time required for feature extraction and for learning and testing the system. Hence, skeleton based features prove to be a viable alternative.

Skeleton features [15, 32, 37] are extracted by analyzing the skeleton pose in a sequence of frames. The skeletal representation of the human body at any instant reflects the original pose of that person. A human pose in a video frame contains a large number of pixels, while its skeletal representation contains comparatively few, which reduces the processing time for feature extraction. Human Activity Recognition using skeletons involves the following steps:

  1. Extraction of skeletons from video frames

  2. Extraction of skeleton features

  3. Construction of the feature vector

  4. Training and testing of the classifier

The skeleton feature usually contains joint information of different body parts, such as the end and mid points of the hands and legs, and the position of the head. Feature vectors are then constructed by fusing the features extracted from the tracked person over a few consecutive frames. Finally, a supervised machine learning classifier is used to identify the physical activity of the tracked person based on the extracted features. Neural trees, SVM, LDA and other classifiers have been used in the existing literature for the activity recognition task. Since human activity recognition is a multi-class classification problem, a tree based classifier is a preferable approach. This observation motivates us to explore the suitability of the Random Forest classifier for human activity recognition. The aim of this work is to propose skeleton features for human activity recognition with the following goals:

  • To improve the recognition accuracy of the system

  • To reduce the dependency on camera positions and angles

  • To obtain similar recognition accuracy for the videos recorded by either static or moving cameras

In this paper, a new feature named “Orientation Invariant Skeleton Feature (OISF)” is introduced for human activity recognition. This newly introduced feature is examined through a number of experiments on two publicly available datasets, KTH and ViHASi, and one in-house dataset which is available at https://github.com/neelamdw/Actiondataset/. The in-house dataset was developed in our lab and contains five different actions (boxing, hand clapping, hand waving, jogging and walking). Three subsets (SA1, SA2 and SA3) are prepared by selecting similar and dissimilar activities from the ViHASi dataset (see Table 1 in Section 4). Experiments #1, #2 and #3 are performed on SA1, SA2 and SA3, respectively. All the activities of the KTH dataset are taken together to perform experiment #4. Similarly, all the activities of the in-house dataset are taken together to perform experiment #5. The major contributions of this paper are as follows:

  • A new skeleton based feature named as “Orientation Invariant Skeleton Feature (OISF)” for human activity recognition

  • Performance evaluation of proposed feature with Random Forest classifier for KTH dataset, in-house dataset and different subsets of ViHASi dataset

Table 1 Three subsets of ViHASi dataset

This paper is organized as follows: Section 2 presents a literature survey for human activity recognition. Section 3 explains the methodology and architecture of the proposed approach. Experimental results and their analysis are presented in Section 4, followed by a comparison with state-of-the-art approaches in Section 5. The conclusion is presented in Section 6.

2 Literature survey for human activity recognition

In this section, various human activity recognition approaches are briefly reviewed. Uddin et al. [33] utilize Independent Component Analysis (ICA) to extract activity shape information from the body joints instead of using the whole body. They then apply a Hidden Markov Model (HMM) to the extracted activity shape information to recognize the human activity. In this method, features are easy to extract from the silhouettes, but the method cannot distinguish between near and distant body parts. To overcome this limitation, Jalal et al. [13] present human activity recognition for smart homes based on the R transformation applied to depth silhouettes, which requires more processing time. To reduce the processing time, Raptis et al. [28] propose a model for action recognition based on the combination of mid-level representations (HoG and BoW poselets) and discriminative key frame selection. These approaches require the complete sequence of frames to recognize the human activity correctly. To achieve early detection of an action, Vats et al. [34] present a hybrid technique that combines the benefits of computer vision and fuzzy set theory; this approach may recognize an action even if only a partial action has occurred. They use the fuzzy BK sub-product because of its flexibility and capability to imitate natural human behavior. All of the above-mentioned approaches require a large amount of pixel processing.

To reduce the pixel processing, skeletons are first extracted from the images. Features are then extracted by processing the skeleton sequences, which reduces the pixel processing time. Anjum et al. [2] present complex human activity recognition by tracking a subset of human skeleton joints instead of tracking the whole skeleton. Skeleton joints are selected either manually or automatically from depth videos recorded by a Kinect camera, and Multiclass Support Vector Machines (MSVMs) are used to classify the human activity. Weng et al. [37] present human activity recognition using length-variable edge trajectories (LV-ET) and a spatio-temporal motion skeleton descriptor. The LV-ET is extracted by tracking edge points across video frames based on optical flow, with the aim of better describing the evolution of different types of actions. A novel encoding method for trajectory clustering is proposed to extract the Spatio-Temporal Motion Skeleton (STMS), also called the motion skeleton. Habli et al. [11] propose skeleton-based human activity recognition for elderly monitoring systems. For this task, they use spatial and temporal coordinates of the 3D skeleton and combine both to represent each frame of a human activity. A randomised tree algorithm is used to train and validate the method on the MSR-Action3D and DailyActivity3D datasets. Manzi et al. [22] introduce an activity recognition system for two persons using skeleton data extracted from a depth camera. They first use an unsupervised clustering approach to represent each activity by a small set of informative postures; models are then created using multiclass support vector machines on the training set. The optimal number of clusters for each sample is found dynamically by the X-means algorithm during the classification phase. Li et al. [19] focus on multi-view skeletal interactions for human activity recognition. For this, a multi Active Joint Interaction Graph (AJIG) model is proposed to encode the spatio-temporal patterns of two-person skeletal interactions, and an AJIG kernel is used to compute the similarity between two AJIGs. A multiple kernel learning approach is then applied to jointly learn the optimal combination of the numerous AJIG kernels. Ofli et al. [25] present the Sequence of the Most Informative Joints (SMIJ) feature to recognize a human skeleton activity. At each time instant, the few skeletal joints that are most related to the current action are selected, based on highly interpretable measures such as the mean or variance of joint angles, the maximum angular velocity of joints, etc. Zhu et al. [40] present a deep LSTM network for skeleton activity recognition. Their model facilitates the learning of skeleton joint features with the help of a co-occurrence exploration mechanism, and uses a dropout mechanism to handle the complex structures among the important joints. They compare their method with other deep networks for skeletal activity recognition [9]. The methodology of our proposed approach is discussed in the next section.

3 Proposed approach for human activity recognition

In this section, a skeleton based feature to identify human activity is introduced. Figure 1 depicts the architectural design of the proposed approach for human activity recognition, which consists of three phases.

Fig. 1 Flow chart of the proposed approach

In the first phase, the video is converted into consecutive frames, followed by skeletonization using Algorithm 1. Figure 2 depicts the skeletonization process for an input video frame. This process involves extraction of the foreground object, conversion of the foreground object into a binary frame, enhancement of the frame, filling of small holes, removal of islands, and a repeated thinning operation. After skeletonization, the Region of Interest (RoI) is obtained and marked by an elliptical boundary in the second phase. To draw the elliptical bounding box around the skeleton, the orientation, centroid, major axis length, minor axis length and eccentricity of the ellipse are obtained by applying the MATLAB function regionprops(). This elliptical bounding box is further divided into eight symmetric regions. All of these tasks are performed by applying Algorithm 2. In the third phase, the feature FV1 and the newly introduced feature OISF are extracted using Algorithm 3 and Algorithm 4, respectively. The time complexity of extracting each of the features FV1 and OISF is Θ(m × n), where m × n is the total number of pixels in a frame. The Random Forest classifier is trained with each extracted feature separately for human activity recognition. The structure of the random forest classifier proposed by Leo Breiman [4] is used in this paper, with each forest consisting of 500 trees. Once the classifier is trained, it is used to recognize the activity from a new sequence of frames.

Fig. 2 Flow chart of skeletonization of a frame

3.1 Phase 1 (pre-processing of video): skeletonization of input video frames

Depending on the frame rate and duration of the video, an input video (V) is converted into N consecutive frames (F1, F2, ..., FN). For frames with complex backgrounds, the foreground object is extracted by applying a background subtraction technique. Each foreground frame is then transformed into a binary frame (image) in two steps:

  • Step 1: For each frame, two intensity thresholds (th1 and th2) are calculated as:

    $$th1 = h_{j} - c1 $$
    $$th2 = h_{j} + c2 $$

Here, hj is the pixel intensity value with the maximum frequency in the grayscale frame. Different values of c1 and c2 have been tried through experimentation; in our experiments, c1 = 35 and c2 = 40 are used. These values (c1, c2) are tuned once and then remain constant for a dataset.

  • Step 2: All the intensity values lying between th1 and th2 in a frame are set to ‘1’; all other intensity values are set to ‘0’. The resulting image is a binary image.

Each binary frame is further enhanced by applying a median filter. Two morphological operations, dilation and erosion, are used to obtain a well-defined shape of the person (silhouette) from the binary frame. To perform these morphological operations, a linear structuring element of length c3 is used. Dilation and erosion are performed k1 and k2 times, respectively, on the input binary frame. Various combinations of dilation and erosion were tried; k1 = 2, k2 = 1 and c3 = 3 have shown promising results in terms of silhouette extraction. The skeleton is obtained by applying a repeated thinning operation on the resulting silhouette. The thinning operation removes pixels in such a way that the object shrinks to a minimally connected stroke. The pseudo code of the skeletonization process is presented in Algorithm 1 (the detailed version is given in the Appendix as Algorithm A.1), and a minimal MATLAB-style sketch of the process is shown below.

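The following sketch illustrates one possible realization of this skeletonization step using standard Image Processing Toolbox functions. It is a minimal approximation of Algorithm 1 under the constants reported above (c1 = 35, c2 = 40, k1 = 2, k2 = 1, c3 = 3); the reduction of background subtraction to histogram-mode thresholding, the structuring-element orientation and the island-size threshold are assumptions, not the authors' original code.

```matlab
function sk = skeletonizeFrame(frame)
% Minimal sketch of the skeletonization step (Algorithm 1), assumed implementation.
% Requires the Image Processing Toolbox.
    if size(frame, 3) == 3
        g = rgb2gray(frame);               % grayscale frame
    else
        g = frame;
    end
    [counts, bins] = imhist(g);
    [~, idx] = max(counts);
    hj  = bins(idx);                       % intensity value with maximum frequency
    th1 = hj - 35;  th2 = hj + 40;         % Step 1: thresholds with c1 = 35, c2 = 40
    bw  = g >= th1 & g <= th2;             % Step 2: binarize around the histogram mode
    bw  = medfilt2(bw);                    % enhance the binary frame
    se  = strel('line', 3, 90);            % linear structuring element, c3 = 3 (angle assumed)
    bw  = imdilate(imdilate(bw, se), se);  % k1 = 2 dilations
    bw  = imerode(bw, se);                 % k2 = 1 erosion -> silhouette
    bw  = imfill(bw, 'holes');             % fill small holes
    bw  = bwareaopen(bw, 50);              % remove small islands (area threshold assumed)
    sk  = bwmorph(bw, 'thin', Inf);        % repeated thinning -> minimally connected skeleton
end
```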

Figure 3a, b and c show the ten original frames, and the corresponding binary and skeleton frames obtained through Algorithm 1. These frames are taken from the KTH dataset for ‘running’ activity. For the sake of visual clarity, all the figures in this paper include both the complements of actual silhouettes and the corresponding skeletons.

Fig. 3 (a) Frames obtained from input video (b) Extracted silhouettes from the frames (c) Skeletons obtained from silhouettes

3.2 Phase 2: Region of Interest (RoI) selection

The skeleton extracted from the silhouette (obtained in Phase 1) is our Region of Interest. An elliptical boundary drawn around the skeleton separates it from the input frame. The centroid (xc,yc), major axis length (2 × a), minor axis length (2 × b), eccentricity (e) and orientation (𝜃) of the ellipse are calculated for further processing. As explained before, the regionprops() function is used to obtain these values. This ellipse is then divided into eight symmetric regions in order to extract hand, leg and head features, as shown in Fig. 4. To divide the ellipse into eight symmetric regions, the following four lines are drawn, each passing through the centroid of the ellipse:

  1. Major axis

  2. Minor axis

  3. Line passing through the centroid and inclined at an angle of 45° clockwise from the minor axis

  4. Line passing through the centroid and inclined at an angle of 45° anti-clockwise from the minor axis

Fig. 4 Ellipse with 8 regions

Let us assume that (xc,yc) are the coordinates of the centroid of the given ellipse. Then (1) is the equation of the ellipse whose orientation is 𝜃:

$$ \frac{(x_{1}\cos\theta +y_{1}\sin\theta)^{2}}{b^{2}} + \frac{(x_{1}\sin\theta - y_{1}\cos\theta)^{2}}{a^{2}} = 1 $$
(1)

where x1 = x − xc and y1 = y − yc.

To draw these lines, coordinates are obtained by applying eight cuts on the ellipse at the angles 0° (cut 1), 45° (cut 2), 90° (cut 3), 135° (cut 4), 180° (cut 5), 225° (cut 6), 270° (cut 7) and 315° (cut 8). Four lines are drawn between cut 1 & cut 5 (minor axis), cut 2 & cut 6 (dotted line), cut 3 & cut 7 (major axis), and cut 4 & cut 8 (dashed line), respectively.
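A possible MATLAB realization of this phase is sketched below: regionprops() supplies the ellipse parameters, and each skeleton pixel is assigned to one of the eight 45° sectors created by the four lines above by binning its angle about the centroid. The sector numbering, the sign convention for the orientation angle, and the use of angular binning instead of explicitly drawn lines are illustrative assumptions; this is not the published Algorithm 2.

```matlab
function [regions, ell] = divideIntoRegions(sk)
% Sketch of RoI selection and 8-region division (Algorithm 2), assumed implementation.
% sk is a logical skeleton image; regions{j} holds the [x y] pixel coordinates of region j.
    stats = regionprops(double(sk), 'Centroid', 'MajorAxisLength', ...
                        'MinorAxisLength', 'Eccentricity', 'Orientation');
    xc = stats(1).Centroid(1);   yc = stats(1).Centroid(2);
    a  = stats(1).MajorAxisLength / 2;     % semi-major axis
    b  = stats(1).MinorAxisLength / 2;     % semi-minor axis
    th = -deg2rad(stats(1).Orientation);   % orientation (sign flipped: image y-axis points down)
    ell = struct('xc', xc, 'yc', yc, 'a', a, 'b', b, ...
                 'e', stats(1).Eccentricity, 'theta', th);

    [ys, xs] = find(sk);                   % coordinates of the white (skeleton) pixels
    % Angle of each pixel about the centroid, measured relative to the ellipse axes;
    % the four lines (major axis, minor axis, +/-45 degrees) split it into eight 45-degree sectors.
    ang    = mod(atan2(ys - yc, xs - xc) - th, 2*pi);
    sector = floor(ang / (pi/4)) + 1;      % region index 1..8 (numbering assumed)
    regions = cell(1, 8);
    for j = 1:8
        regions{j} = [xs(sector == j), ys(sector == j)];
    end
end
```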

The obtained RoI has eight symmetric regions, and each region contains part of the skeleton. Figure 5a, b and c show the skeletal representations of the frames obtained by applying Algorithm 1, the RoI with its elliptical bounding box, and its division into 8 regions using Algorithm 2 (the detailed version is given in the Appendix as Algorithm A.2), respectively. The algorithms for extracting the features are explained in Section 3.3.

Fig. 5 (a) Obtained skeletons from input video frames (b) Region of Interest (RoI) selection (c) Division of RoI into eight symmetric regions


3.3 Features used

Two features, FV1 [17] and the Orientation Invariant Skeleton Feature (OISF), are used in the proposed approach. To extract these features, skeletons are obtained from an input video by applying Algorithm 1. The Region of Interest (RoI) is selected and bounded by an elliptical bounding box that is further divided into eight symmetric regions (Fig. 5) by applying Algorithm 2. Each region of the skeleton provides information such as the number of pixels and the coordinates of the hands, legs and head. Using this information, the feature vectors FV1 and OISF are calculated; a total of eight features per skeleton are obtained for each of FV1 and OISF. Details of these features are discussed in Sections 3.3.1 and 3.3.2, respectively.

3.3.1 FV 1 feature extraction

Figure 6 shows the flow chart for extracting the FV 1 feature value of one frame. Skeletons sk1, sk2, ..., skN are obtained from the input video V by applying Algorithm 1. For each skeleton, an elliptical boundary is drawn and divided into eight symmetric regions by applying Algorithm 2. \(s{k_{i}^{j}}\) contains the total number of white pixels of ith skeleton in the jth (1 ≤ j ≤ 8) region of the ellipse. For each region, one feature value is extracted as follows:

  • Step 1: Compute the sum of pixels of jth region of ith skeleton ski:

    $${P_{i}^{j}} = sum (s{k_{i}^{j}})$$
  • Step 2: Compute the sum of total pixels (pt) for the ith skeleton ski:

    $$p^{t} =\sum\limits_{j=1}^{8} {P_{i}^{j}}$$

Thus, feature corresponding to jth region is given by the following expression:

$$ {f_{i}^{j}} =\frac{{P_{i}^{j}}}{p^{t}} $$
Fig. 6 Flow chart of extraction of FV1 feature in a frame

Here, \(f_{i}^{[1...8]}\) = \([{f_{i}^{1}}, {f_{i}^{2}}, ..., {f_{i}^{8}}]\) is the feature corresponding to the ith skeleton ski. Thus, a feature matrix of size (N × 8) is obtained for the input video V. The final feature vector FV1 is generated by fusing the features of ten consecutive frames, since an activity is characterized by analyzing a sequence of frames. This feature vector FV1 is used to train and test the classifier for human activity recognition. The size of the final feature matrix FV1 for N frames is (\(\frac {N}{10} \times 80\)). The motivation for and the extraction process of the newly introduced OISF feature are discussed in the next section.
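Assuming the per-region pixel coordinates from the previous sketch (regions{j}), the FV1 value of a single skeleton and the ten-frame fusion can be written roughly as follows; function and variable names are illustrative rather than the authors' Algorithm 3.

```matlab
function f = extractFV1(regions)
% Sketch of the FV1 feature for one skeleton: fraction of skeleton pixels per region.
    p  = cellfun(@(r) size(r, 1), regions);   % P_i^j: number of white pixels in region j
    pt = sum(p);                              % p^t: total white pixels of the skeleton
    f  = p / max(pt, 1);                      % f_i^j = P_i^j / p^t (guard for empty skeletons)
end

% Fusion of ten consecutive frames: an N x 8 per-frame matrix becomes the
% (N/10) x 80 feature matrix FV1 used to train and test the classifier.
%   perFrame = ...;                               % N x 8, one row per skeleton
%   M   = 10 * floor(size(perFrame, 1) / 10);     % drop a trailing incomplete group
%   FV1 = reshape(perFrame(1:M, :)', 80, [])';    % (N/10) x 80
```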

3.3.2 OISF feature extraction

In this section, the motivation behind and the algorithm for the new feature, “Orientation Invariant Skeleton Feature (OISF)”, are discussed. It is observed in the literature that existing features for human activity recognition depend on the orientation and positioning of the cameras, which reduces the recognition accuracy for videos recorded by moving cameras. The motivation for introducing the OISF feature is to overcome this limitation. The OISF feature improves the recognition accuracy of a human activity recognition system for videos recorded with moving as well as static cameras, because OISF characterizes human actions by the relative movements of the hands and legs along the x and y axes separately in each frame. The features corresponding to each skeleton ski are obtained as follows:

  1. The x and y coordinates of the first white pixel (having the least value of the x coordinate) in the first, second, seventh and eighth regions are determined.

  2. The x and y coordinates of the last white pixel (having the maximum value of the x coordinate) in the third, fourth, fifth and sixth regions are determined.

  3. The absolute differences of the x and y coordinates of the first & fourth, second & third, fifth & eighth, and sixth & seventh regions are calculated and taken as the eight features of skeleton ski.

Example 1: For any skeleton ski, let (x1,y1), (x2,y2), (x7,y7) and (x8,y8) represent the coordinates of the first white pixel of the first, second, seventh and eighth regions respectively, while (x3,y3), (x4,y4), (x5,y5) and (x6,y6) represent the coordinates of the last white pixel of the third, fourth, fifth and sixth regions, respectively. The first, second, third, fourth, fifth, sixth, seventh and eighth feature values of the skeleton ski are calculated as abs(x1 − x4), abs(x2 − x3), abs(x5 − x8), abs(x6 − x7), abs(y1 − y4), abs(y2 − y3), abs(y5 − y8) and abs(y6 − y7), respectively.
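Using the same assumed region representation, the OISF value of one skeleton could be computed as in the sketch below; it follows the three steps and Example 1 above, but the helper name and the handling of coordinates are illustrative, not the published Algorithm 4. As with FV1, the 8-dimensional per-frame vectors of ten consecutive frames are then concatenated into one 80-dimensional OISF sample.

```matlab
function f = extractOISF(regions)
% Sketch of the OISF feature for one skeleton (empty regions are not handled here).
% regions{j} is an n_j x 2 matrix of [x y] pixel coordinates of region j.
    pt = zeros(8, 2);                       % reference pixel (x, y) of each region
    for j = [1 2 7 8]                       % first white pixel: smallest x coordinate
        [~, k] = min(regions{j}(:, 1));
        pt(j, :) = regions{j}(k, :);
    end
    for j = [3 4 5 6]                       % last white pixel: largest x coordinate
        [~, k] = max(regions{j}(:, 1));
        pt(j, :) = regions{j}(k, :);
    end
    pairs = [1 4; 2 3; 5 8; 6 7];           % region pairs from Example 1
    dx = abs(pt(pairs(:, 1), 1) - pt(pairs(:, 2), 1));   % |x1-x4|, |x2-x3|, |x5-x8|, |x6-x7|
    dy = abs(pt(pairs(:, 1), 2) - pt(pairs(:, 2), 2));   % |y1-y4|, |y2-y3|, |y5-y8|, |y6-y7|
    f  = [dx; dy]';                         % 1 x 8 OISF feature for this frame
end
```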

Normally, activities cannot be distinguished by analyzing a single frame. Therefore, features of ten consecutive frames are combined to generate the OISF feature vector for the classification of human activities. In the next section, experimental set-up and datasets used are discussed along with the analysis of the proposed method using various performance metrics.

4 Experiments and their analysis

To evaluate the performance of the proposed approach, five experiments have been conducted. All the experiments were run in MATLAB R2017a on a Core i7 processor with 4 GB RAM. For experiments #1, #2 and #3, frames are taken from the Virtual Human Action Silhouette (ViHASi) dataset [27]; for experiment #4, from the KTH dataset [31]; and for experiment #5, from the in-house dataset. The ViHASi dataset contains synthetic videos of 20 action classes performed by 9 actors, recorded using up to 40 synchronized perspective camera views. These 40 camera views are divided into two sets of 20 views each. The two sets of cameras are fixed at slant angles of 27° and 45° with the horizontal plane, respectively, and the angular difference between adjacent cameras is 18° in both sets. The KTH dataset contains six actions (boxing, hand clapping, hand waving, jogging, running and walking) performed by 25 persons in two different scenarios, indoor and outdoor, with different scale variations. All of its video sequences are taken over a homogeneous background with a static camera. As explained in Section 1, the videos of the in-house dataset contain five different actions, performed by 2 actors in an indoor scenario. All of these videos are recorded with a static low-resolution camera, a complex background, and variations in illumination.

To create the maximum possible combinations of activities, the twenty action classes of the ViHASi dataset have been divided into three subsets for experiments #1, #2 and #3. These subsets are categorized on the basis of the similarity of actions and named SA1, SA2 and SA3; Table 1 lists the actions of each subset. SA1 contains mostly similar actions (running, walking, etc.), SA2 contains a combination of similar (RunPullObject, RunPushObject, etc.) and dissimilar (Knockout, Punch, etc.) actions, and SA3 contains dissimilar actions (HangOnBar, Granade, etc.). The actions of SA1, SA2 and SA3 are used in experiments #1, #2 and #3, respectively. These three experiments are conducted in two separate scenarios:

  1. Videos recorded by the first set of cameras, fixed at a slant angle of 27°

  2. Videos recorded by the second set of cameras, fixed at a slant angle of 45°

In all the experiments, input videos are pre-processed using Algorithm 1, and the FV1 and OISF features are extracted using Algorithm 3 and Algorithm 4, respectively (the detailed versions are given in the Appendix as Algorithms A.3 and A.4). A separate human activity recognition model is created by training the Random Forest classifier for each of the following cases:

  • Case 1: Training with FV 1 only

  • Case 2: Training with OISF only


In this way, two separate Random Forest models are developed for each of the experiments. The Random Forest classifier is a supervised machine learning classifier that uses an ensemble learning method for classification. The general method of random decision forests was first proposed by Ho in 1995 [12]. The classifier builds on two well-known ensemble methods, boosting [30] and bagging [5]. In boosting, successive trees give extra weight to points incorrectly predicted by earlier predictors, and a weighted vote is taken over all the decision trees to predict the class of a new frame. For example, if a classification problem has n1 classes, then samples of all the classes are selected randomly for training. If each sample contains K variables, then k (k < K) randomly selected variables are considered at each node. Each decision tree is grown to its maximum extent without any pruning, and a new sample is classified by the majority vote over all the decision trees. Here, features are randomly selected to split the nodes. The structure of the random forest classifier proposed by Leo Breiman [4] is used in this paper, and a total of 500 trees is used to construct each forest. Figure 7 summarizes all the experiments. To measure the performance of the proposed approach, the confusion matrix and five performance metrics (Precision, Recall, Specificity, F1 score and Accuracy) are used; they are discussed in the next section.
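As an illustration of this classification stage, the snippet below trains a 500-tree bagged ensemble with MATLAB's TreeBagger (from the Statistics and Machine Learning Toolbox) and classifies new samples by majority vote. The random data merely stand in for the fused 80-dimensional FV1 or OISF vectors and their activity labels; the choice of k = sqrt(K) predictors per split is a common random forest default assumed here, not a value reported in the paper.

```matlab
% Minimal sketch of the classification stage with TreeBagger (placeholder data).
rng(1);
Xtrain = rand(300, 80);                          % 300 fused feature vectors (FV1 or OISF)
Ytrain = randi(10, 300, 1);                      % activity labels, e.g. 10 classes
rf = TreeBagger(500, Xtrain, Ytrain, ...         % 500 trees, as used in this paper
                'Method', 'classification', ...
                'NumPredictorsToSample', floor(sqrt(80)));   % k < K variables per split
Xtest = rand(50, 80);                            % new fused feature vectors
Ypred = str2double(predict(rf, Xtest));          % class predicted by majority vote
```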

Fig. 7 Summary of all the experiments

4.1 Parameters used for performance measurement

Precision, Recall, Specificity, F1 score, Accuracy and the confusion matrix are the parameters used to evaluate the performance of the proposed approach. Assume a classifier classifies the input video frames as shown in Table 2, where \({N_{i}^{j}}\) denotes the number of instances of the ith activity classified as the jth activity by the classifier. The performance metrics for these classification results are calculated as follows:

$$ \begin{array}{@{}rcl@{}} Precision_{i} &=& \frac{{N_{i}^{i}}}{\sum_{j=1}^{4}{N_{j}^{i}}}\\ Recall_{i} / Sensitivity_{i} &=& \frac{{N_{i}^{i}}}{\sum_{j=1}^{4}{N_{i}^{j}}}\\ Specificity_{i} &=& \frac{\sum_{j,k=1;\, j,k \neq i}^{4} {N_{j}^{k}}}{\sum_{j=1,\, j\neq i}^{4}{N_{j}^{i}} + \sum_{j,k=1;\, j,k \neq i}^{4} {N_{j}^{k}}}\\ F_{1}\ score_{i} &=& 2 \times \frac{Precision_{i} \times Recall_{i}}{Precision_{i}\ +\ Recall_{i}}\\ Accuracy_{i} &=& \frac{{N_{i}^{i}} + \sum_{j,k=1;\, j,k \neq i}^{4} {N_{j}^{k}}}{\sum_{j,k=1}^{4}{N_{j}^{k}}} \end{array} $$
Table 2 Classification results of a classifier
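For reference, the per-class metrics defined above can be computed directly from a confusion matrix C, where C(i,j) = N_i^j (rows are true activities, columns are predicted activities). The sketch below is a straightforward transcription of the formulas; the 4-class example matrix is invented for illustration only.

```matlab
function [precision, recall, specificity, f1, accuracy] = perClassMetrics(C)
% Per-class metrics from a confusion matrix C, with C(i,j) = N_i^j.
    tp = diag(C);                      % N_i^i
    fp = sum(C, 1)' - tp;              % other activities classified as activity i
    fn = sum(C, 2)  - tp;              % activity i classified as something else
    tn = sum(C(:))  - tp - fp - fn;    % everything not involving activity i
    precision   = tp ./ (tp + fp);
    recall      = tp ./ (tp + fn);     % also called sensitivity
    specificity = tn ./ (tn + fp);
    f1          = 2 * (precision .* recall) ./ (precision + recall);
    accuracy    = (tp + tn) / sum(C(:));
end

% Example with an invented 4-class confusion matrix:
%   C = [50 2 1 0; 3 45 2 0; 0 1 48 1; 0 0 2 47];
%   [p, r, s, f1, acc] = perClassMetrics(C);
```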

4.2 Experiment #1

To perform this experiment, ten similar activities (C2: JumpGetOnBar, C3: JumpOverObject, C4: JumpFromObject, C5: RunPullObject, C6: RunPushObject, C7: RunTurn90Left, C8: RunTurn90Right, C18: Walk, C19: WalkTurn180, and C20: Run) grouped in SA1 (Table 1) from the jump, walk and run categories are taken from the ViHASi dataset. Figure 8a, b and c show ten sample silhouettes of the RunTurn90Left activity recorded by the second set of cameras, the skeletons obtained using Algorithm 1, and the symmetric 8-region division of the elliptical bounding box obtained using Algorithm 2, respectively. To measure the effectiveness of the proposed approach, the confusion matrix, precision, recall, specificity, F1 score and accuracy are used, as discussed in Section 4.1.

Fig. 8 (a) Silhouettes of RunTurn90Left activity (b) Skeletons obtained from the silhouettes (c) Symmetric 8-region division of RoI

Tables 3 and 4 show the confusion matrices that are obtained in the first experiment for both the cases (Case 1 and Case 2) on two different sets of camera angles. The activities with similar body movements are misclassified in some of the instances for both of the cases. For example, most of the misclassified instances of Run (C20) activity are classified as RunTurn90Left (C8) activity and vice versa. It happens because both the actions have similar body movements.

Table 3 Confusion Matrix of Experiment #1 for the silhouettes captured using cameras with a slant angle of 27°
Table 4 Confusion Matrix of Experiment #1 for the silhouettes captured using cameras with a slant angle of 45°

Tables 5 and 6 list the Precision, Recall, Specificity, F1 score and Accuracy obtained in the first experiment for both cases (Case 1 and Case 2) on the two sets of camera angles (27° and 45°). By analysing the results presented in Tables 5 and 6, it is observed that with the first set of cameras the average Precision is 96.67% & 96.83%, the average Recall is 96.67% & 96.83%, the average Specificity is 99.63% & 99.65%, the average F1 score is 96.65% & 96.83% and the average Accuracy is 99.33% & 99.37% (Case 1 & Case 2, respectively). With the second set of cameras, the average Precision is 92.30% & 95.25%, the average Recall is 92.00% & 95.17%, the average Specificity is 99.11% & 99.46%, the average F1 score is 92.04% & 95.17% and the average Accuracy is 98.40% & 99.03%. The following conclusions can be drawn from this experiment:

  • Average precision, recall, specificity, F1 score and accuracy are higher in Case 2 (when Random Forest classifier is trained with OISF feature) in comparison to Case 1 (when Random Forest classifier is trained with FV 1). It confirms the effectiveness of the proposed OISF feature.

  • When the difference between the two sets of camera angles (slant angles of 27° and 45°) is computed for Case 1 and Case 2, the minimum variation is observed in the accuracy of Case 2. This observation further confirms that OISF is the least dependent on camera positioning.

  • The high F1 score and high accuracy confirm the applicability of the proposed work for human activity recognition.

Table 5 Performance Metrics of Experiment #1 for the silhouettes captured using cameras with a slant angle of 27°
Table 6 Performance Metrics of Experiment #1 for the silhouettes captured using cameras with a slant angle of 45°

4.3 Experiment #2

To perform this experiment, a combination of ten similar and dissimilar activities (C1: HangOnBar, C2: JumpGetOnBar, C3: JumpOverObject, C4: JumpFromObject, C5: RunPullObject, C6: RunPushObject, C11: KnockoutSpin, C12: Knockout, C16: Punch and C17: JumpKick) grouped in SA2 (Table 1) from the jump, run and knockout categories is taken from the ViHASi dataset.

Tables 7 and 8 present the confusion matrices obtained in the second experiment for both cases (Case 1 and Case 2) on the two sets of camera angles. Tables 9 and 10 show the Precision, Recall, Specificity, F1 score and Accuracy obtained in the second experiment for both cases. After evaluating the results of experiment #2, it is observed that the average Precision is 98.22% & 99.02%, the average Recall is 98.17% & 99.00%, the average Specificity is 99.79% & 99.89%, the average F1 score is 98.16% & 98.99%, and the average Accuracy is 99.63% & 99.80% with the first set of cameras, whereas the average Precision is 94.60% & 96.92%, the average Recall is 94.42% & 96.92%, the average Specificity is 99.37% & 99.66%, the average F1 score is 94.41% & 96.90% and the average Accuracy is 98.99% & 99.38% with the second set of cameras. In Case 2, the average precision, recall, specificity, F1 score and accuracy are higher than those of Case 1, which again demonstrates the effectiveness of the OISF feature.

Table 7 Confusion Matrix of Experiment #2 for the silhouettes captured using cameras with a slant angle of 27°
Table 8 Confusion Matrix of Experiment #2 for the silhouettes captured using cameras with a slant angle of 45°
Table 9 Performance Metrics of Experiment #2 for the silhouettes captured using cameras with a slant angle of 27°
Table 10 Performance Metrics of Experiment #2 for the silhouettes captured using cameras with a slant angle of 45°

4.4 Experiment #3

To perform this experiment, ten dissimilar activities of different categories (C1: HangOnBar, C9: HeroSmash, C10: HeroDoorSlam, C12: Knockout, C13: Granade, C14: Collapse, C15: StandLookAround, C16: Punch, C17: JumpKick and C18: Walk) grouped in SA3 (Table 1) are taken from the ViHASi dataset.

Tables 11 and 12 show the confusion matrices obtained in the third experiment for both cases (Case 1 and Case 2) on the two sets of camera angles. By comparing all the confusion matrices obtained in experiments #1, #2 and #3, the following conclusions can be drawn:

  • Probability of misclassification among similar activities is higher than the probability of misclassification among dissimilar activities.

  • The misclassification rate of the activities captured by the second set of cameras is higher than that of the activities captured by the first set of cameras.

  • Average misclassification rate in Case 2 is less than the average misclassification rate in Case 1. It proves the appropriateness of OISF feature for human activity recognition.

Table 11 Confusion Matrix of Experiment #3 for the silhouettes captured using cameras with a slant angle of 27°
Table 12 Confusion Matrix of Experiment #3 for the silhouettes captured using cameras with a slant angle of 45°

Tables 13 and 14 list the values of Precision, Recall, Specificity, F1 score and Accuracy obtained in the third experiment for both cases (Case 1 and Case 2) on the two sets of camera angles. By analysing the results presented in Tables 13 and 14, it is observed that the average Precision is 98.70% & 99.17%, the average Recall is 98.67% & 99.17%, the average Specificity is 99.85% & 99.91%, the average F1 score is 98.66% & 99.16%, and the average Accuracy is 99.73% & 99.83% with the first set of cameras. It is also observed that the average Precision is 94.75% & 98.04%, the average Recall is 94.67% & 98.00%, the average Specificity is 99.40% & 99.78%, the average F1 score is 94.67% & 98.00%, and the average Accuracy is 98.93% & 99.60% with the second set of cameras. From this experiment, it can be concluded that the average accuracy is more than 99% for all the dissimilar activities taken from the ViHASi dataset. This shows that using the OISF feature for human activity recognition gives effective results in terms of activity classification, both for similar and dissimilar activities.

Table 13 Performance Metrics of Experiment #3 for the silhouettes captured using cameras with a slant angle of 27°
Table 14 Performance Metrics of Experiment #3 for the silhouettes captured using cameras with a slant angle of 45°

4.5 Experiment #4

To perform this experiment, all six activities (Boxing, Hand clapping, Hand waving, Jogging, Running and Walking) of the KTH dataset are taken. Figure 9a, b and c show ten sample silhouette frames of the Hand clapping activity, the skeletons obtained using Algorithm 1, and the 8-region division of the elliptical bounding box obtained using Algorithm 2, respectively.

Fig. 9 (a) Extracted silhouettes from the frames of Hand clapping activity (b) Skeletons obtained from silhouettes (c) Symmetric 8-region division of RoI

Table 15 shows the confusion matrix of the fourth experiment. Activities that have similar types of body movement, such as Running, Jogging and Walking, are misclassified in both cases. Table 16 lists the values of Precision, Recall, Specificity, F1 score and Accuracy obtained in the fourth experiment for Case 1 and Case 2. By examining the results in Table 16, it is observed that the average Precision is 88.79% & 90.81%, the average Recall is 88.67% & 90.78%, the average Specificity is 97.73% & 98.15%, the average F1 score is 88.68% & 90.74%, and the average Accuracy is 96.22% & 96.85% for Case 1 and Case 2, respectively. From this experiment, it is concluded that the average precision, recall and accuracy increase in Case 2 with respect to Case 1. It can further be concluded from these results that the Random Forest classifier, when trained with the newly proposed OISF feature, performs well for all types of activities (similar or dissimilar).

Table 15 Confusion Matrix of the Experiment #4
Table 16 Performance Metrics of the Experiment #4

4.6 Experiment #5

To perform this experiment, all five activities (Boxing, Hand clapping, Hand waving, Jogging, and Walking) of the in-house dataset are taken. Figure 10a, b, c and d show ten input sample frames of the Boxing activity, the respective silhouettes, the skeletons obtained using Algorithm 1, and the 8-region division of the elliptical bounding box obtained using Algorithm 2.

Fig. 10 (a) Input frames of boxing activity (b) Extracted silhouettes from the frames (c) Skeletons obtained from silhouettes (d) Symmetric 8-region division of RoI

Table 17 shows the confusion matrix of the fifth experiment. Even with a complex background, our method achieves high accuracy and minimal misclassification among similar types of activities. Table 18 presents the values of Precision, Recall, Specificity, F1 score and Accuracy obtained in the fifth experiment for Case 1 and Case 2. By examining the results in Table 18, it is observed that the average Precision is 94.09% & 95.91%, the average Recall is 94.00% & 95.86%, the average Specificity is 98.50% & 98.97%, the average F1 score is 94.01% & 95.88%, and the average Accuracy is 97.60% & 98.35% for Case 1 and Case 2, respectively. From this experiment, it is concluded that the average precision, recall and accuracy increase in Case 2 with respect to Case 1. It can further be concluded that the Random Forest classifier, when trained with the newly proposed OISF feature, performs well for all types of activities (similar or dissimilar), even with a complex background.

Table 17 Confusion Matrix of the Experiment #5
Table 18 Performance Metrics of the Experiment #5

4.7 Effectiveness analysis of the proposed approach

The proposed approach has been tested on three datasets with different characteristics, such as videos recorded with a low-resolution camera, complex backgrounds, variations in illumination, outdoor (with scale variations) and indoor scenarios in day vision, and different view angles. Figure 11 depicts the average accuracy obtained in experiments #1, #2 and #3 performed on the ViHASi dataset recorded from the cameras with a slant angle of 27°, together with the average accuracy obtained in experiment #4 on the KTH dataset and experiment #5 on the in-house dataset. Figure 12 shows the average accuracy obtained in the first three experiments with a camera slant angle of 45°.

Fig. 11 Recognition accuracy (%) of experiments #1, #2 & #3 of ViHASi dataset at slant angle 27°, experiment #4 of KTH dataset and experiment #5 of in-house dataset

Fig. 12 Recognition accuracy (%) of experiments #1, #2 & #3 of ViHASi dataset at slant angle 45° for both cases

The x-axis and y-axis of these graphs represent the experiment number and the average accuracy, respectively. It can be observed from the graphs in Figs. 11 and 12 that the average accuracy varies considerably when the FV1 feature is used to train the Random Forest classifier, whereas it remains consistent when the OISF feature is used. Figure 13 depicts the absolute difference in accuracy between the two sets of camera angles (27° and 45°).

Fig. 13 Difference in recognition accuracies with slant angles 27° and 45° for the first three experiments of ViHASi dataset

The x-axis and y-axis of the graph in Fig. 13 represent the experiment number and the absolute difference in accuracy, respectively. The minimum variation in accuracy is observed in Case 2 for experiments #1, #2 and #3, which shows that the OISF feature is invariant to the orientation of the camera. Moreover, when this feature is used to train the Random Forest classifier for human activity recognition, an accuracy of ≈ 97% is achieved for videos from both static and moving cameras.

The average accuracy over all the experiments varies from ≈ 97% to ≈ 99%. These results show that the proposed method is capable of dealing with scenarios such as low resolution, complex backgrounds, etc. It is also observed that the variation in recognition accuracy across the experiments is small (≈ 2%), which confirms the robustness of the proposed method.

The FV1 and OISF feature extraction times are shown in Table 19 for all five experiments. From Table 19, it is observed that the average feature extraction rates of the FV1 and OISF features are 38 frames per second (fps) and 34 frames per second (fps), respectively. The training and testing times of the random forest model were also recorded; they are found to be constant and very small compared to the feature extraction time. These experiments show that even when the frame size is of the order of 480 × 640, the approach can be used for real-time activity recognition.

Table 19 Time analysis of all the experiments

5 Comparison of proposed approach with existing state-of-the-art approaches

In this section, the average accuracy of the proposed approach is compared with that of state-of-the-art approaches for human activity recognition on the KTH dataset.

Table 20 lists the average accuracy of different state-of-the-art approaches and of the proposed approach. The average accuracy of the state-of-the-art approaches is about 94%, whereas the accuracy achieved with the FV1 and OISF features is 96.22% and 96.85%, respectively. The accuracy achieved with the OISF feature is thus on average about 2.5% higher than that of the other approaches, which validates our proposed method. It can be concluded from this comparison that, among the compared approaches, the highest accuracy for human activity recognition is achieved with the OISF feature.

Table 20 Accuracy of state-of-the-art approaches and proposed approach

6 Conclusion

In this work, a novel and efficient feature termed OISF has been introduced and tested for Human Activity Recognition. To check the robustness of this feature for moving cameras, silhouettes from the ViHASi dataset, which contains videos recorded by different cameras at different angles, are used. The average recognition accuracy of the proposed approach for experiments #1, #2 and #3 is 99.20%, 99.59% and 99.72%, respectively. The small variations in recognition accuracy confirm the robustness of the newly proposed OISF feature with respect to the nature of the activities (similar activities, combinations of similar and dissimilar activities, or dissimilar activities). The recognition accuracy of the newly proposed OISF feature is superior to that of existing approaches for videos from moving cameras, while its performance is on par with the existing feature for static cameras. Experimentally, the overall recognition accuracy of the proposed approach is found to be ≈ 99.30% on the ViHASi dataset, ≈ 96.85% on the KTH dataset and ≈ 98.34% on the in-house dataset. In the proposed approach, skeletons are used to extract the features, which reduces the processing time of feature extraction; the average feature extraction rates of the FV1 and OISF features are 38 fps and 34 fps, respectively. The high accuracies obtained in both cases show that the proposed approach is applicable to real-life activity recognition tasks such as patient monitoring and fight detection between persons.