1 Introduction

The notable success achieved in monitoring human actions enables a variety of advanced multimedia applications (Singh et al. 2020a). Owing to its great importance, human activity recognition is exploited in applications such as intelligent video surveillance, abnormal behavior recognition, sports, transportation, the web and healthcare. The literature shows that computational methods are well capable of recognizing normal and abnormal activities, such as walking, fighting and robbing, from image and video sequences (Nigam et al. 2019; Sahoo et al. 2020; Rajagopal et al. 2020; Pillai et al. 2021; Yousef et al. 2022; Srivastava et al. 2021). Visual surveillance equipment is used for monitoring human activities and retrieving relevant information, and is further integrated to build advanced systems (Lv et al. 2021a, 2022). Depending on the number of persons involved, human activities can be of four types: (i) actions involving one person, (ii) interactions involving two people, (iii) interactions involving a person and an object, and (iv) group interactions involving several persons.

Even though action recognition is a challenging problem, much work has been done on it over the past decade (Binh et al. 2013; Poppe 2010; Weinland et al. 2011; Aggarwal and Ryoo 2011; Ji and Liu 2009). One of the major issues with actions performed in a real 3D environment is that cameras capture only 2D projections of real-world actions. Therefore, visual analysis of activities in the image plane operates on a mere projection of the actual real-world action. This projection depends on the viewpoint and does not contain complete information about the action. As a solution to this problem, information obtained from multiple cameras mounted at different viewpoints is explored (Ji and Liu 2009), and mechanisms have been developed that provide a view-independent analysis across multiple views (Ji and Liu 2009). Exploring information from multiple scenes enhances the accuracy of activity recognition by combining features from various 2D viewpoints to achieve visual consistency. The main objective of the whole scenario is to develop a reliable human activity recognition system.

We contribute to the aforementioned solution and introduce an approach for multiview human activity recognition from image sequences. The proposed framework consists of three steps:

(i) Finding the human objects by removing the background.
(ii) Extraction of uniform and rotation invariant LBP features.
(iii) Identification of human activities using SVM.

We use a simple frame differencing approach for background removal from the input data. After background removal, we extract uniform rotation invariant LBP features. The rotation invariance of uniform LBP provides a view-independent analysis of human activities. These features are classified using a radial basis function (RBF) kernel-based SVM classifier with a one versus all (OVA) structure. The use of SVM is motivated by the fact that non-sequential strategies, such as SVM, are highly competitive and well balanced on large-scale and continuous data (Nigam et al. 2018). Multiclass classification is achieved using a hierarchical organization of several binary classifiers.

To illustrate the effectiveness of the proposed work, experiments are performed on three benchmark, publicly available multiview human activity video datasets: Weizmann, CASIA and INRIA Xmas Motion Acquisition Sequences (IXMAS). We compare the proposed method against existing and established feature descriptor based methods. The test results on the three datasets show the efficacy of the proposed framework.

Following are the major highlights of the proposed work.

(i) We introduce a rotation invariant human activity recognition framework.
(ii) Multiview human activity recognition is handled with background removal.
(iii) Uniform LBP and an SVM classifier are exploited to implement the proposed framework.

The organization of this paper is as follows. Section 2 briefly discusses the related works. Section 3 elaborates the construction of LBP features and the organization of the SVM classifier for multiple classes. Section 4 provides the implementation details of the proposed method. The evaluation results and discussion on the three public datasets are provided in Sect. 5. Section 6 provides concluding remarks.

2 Related works

Human activity recognition is the process of detecting human body motion patterns. Popular devices for detecting human activity are sensors and cameras. Broadly, two types of activity recognition systems exist: sensor-based and vision-based. Vision-based systems are more popular than sensor-based ones since they provide important cues for activity recognition. Many researchers have contributed reviews on human activity recognition (Saha et al. 2022). Based on these reviews, activity recognition approaches can be divided into model-based and model-free.

Model-based activity recognition uses a pre-defined model to monitor human activities. Such 2D and 3D shape models are used to visualize people's activities. Global and local features have been combined in (Wang and Mori 2010) to implement a framework for human action recognition; this work has demonstrated that combining a part-based model with large-scale motion features improves the results. Instead of constructing a hidden part model, the work in (Wu et al. 2014) has constructed hidden temporal models for each action class, focusing on human action recognition in uncontrolled videos containing complex temporal structures. The work in (Lan et al. 2011) has focused on the recognition of specific actions and group activities and has also defined a new feature called the action context descriptor. This approach has demonstrated good visual results for several complex but computationally costly tasks. Cheng et al. (2014) have developed a layered model to represent group activities at diverse granularities, thereby introducing new informative descriptions of the appearance of group actions. A nearest neighbour classifier and Gaussian mixture model based approach for video action recognition using motion curves has been proposed in (Vrigkas et al. 2014).

However, from the analysis of model based methods, it is observed that there is a trade-off between retrieving detailed knowledge of the human body and the computational cost and robustness. Model based methods exploit pose and velocity vectors, which may increase the computational complexity. Sometimes only the major body parts (for example hands, legs and torso) are modelled to reduce the complexity, yet it is still difficult to construct these models. Moreover, model based approaches are difficult to deploy directly and rarely work in real time.

Model-free approaches overcome the shortcomings of model-based approaches. In model-free methods, low-level visual features are retrieved from the area of interest for action recognition. These methods are based on posture and on global and local motion (Määttä et al. 2010). Feature based multiview approaches operate on image data captured by multiple cameras. A two-camera method has been implemented for multiple humans pointing in a direction (Matikainen et al. 2011), where different views of 2D pointer configurations have been used to obtain 3D pointing vectors. Five calibrated and synchronized cameras have been used in (Souvenir and Babbs 2008), where the R transform and manifold learning of silhouettes provide view invariant activity recognition. The circular shift invariance of the discrete Fourier transform has been exploited in (Iosifidis et al. 2010).

Data fusion has also been exploited for multiview human activity recognition (Weinland et al. 2010), using 3D histogram of oriented gradients (HOG) features with local partitioning and hierarchical classification. A similar method has been implemented using viewpoint aggregation and multiview dynamic image fusion for cross-view 3D action recognition (Wang et al. 2021), using 3D characterization and Fisher vectors to represent 3D actions.

Cross-view activity recognition is an interesting topic for researchers. It is a difficult variant of human activity recognition since the training and testing views differ. Numerous techniques have been proposed for this purpose, including learning from short video clips (Vyas et al. 2020), a bilayer classification model (Li et al. 2019) and unsupervised attention transfer (Ji et al. 2021).

In recent works, deep learning and transfer learning have become useful tools (Lv et al. 2021b; Singh et al. 2020b). A few deep learning-based techniques defined in (Jan and Khan 2021; Verma and Singh 2021; Verma et al. 2020) perform the recognition task very efficiently.

Today, dynamic texture patterns such as LBP have become a natural choice for recognizing the activity of a person treated as a moving texture pattern. A few examples are (Nigam et al. 2021; Kellokumpu et al. 2010, 2011; Vili et al. 2008). However, none of these strategies uses the rotation invariant uniform LBP. Selecting such patterns reduces the length of the LBP histogram and improves the efficacy of a classifier (Pietikäinen et al. 2011). It is widely accepted that uniform LBP is highly effective and has been used repeatedly in several applications beyond texture analysis (Bianconi and Fernández 2011). Although many upgraded versions of the basic LBP have been introduced, many techniques still benefit from the uniform LBP. However, it is not entirely clear how uniform patterns contribute to LBP based discrimination (Lahdenoja et al. 2013). Furthermore, uniform LBP has been used successfully to obtain rotation invariance (Fernández et al. 2011). Rotation invariant uniform binary patterns are an advanced version of their predecessors, as they provide an additional integrated representation (Ojala et al. 2002). Global rotation invariance of LBP has been achieved in (Ahonen et al. 2009) by applying the discrete Fourier transform to the bins of uniform LBP histograms. The rotation invariance of LBP variants has also been discussed in (Zhao et al. 2011).

From the above review of the human activity recognition literature, it can be inferred that uniform LBP is a sound choice for multiview human activity recognition.

3 Principles and basics

This section briefly discusses two major components of the proposed method, which are the uniform rotation invariant LBP and the multiclass SVM.

3.1 Uniform and rotation invariant LBP

  • LBP

The LBP feature is built over a circular neighbourhood of radius R pixels. The intensities of P sample points in the circular neighbourhood are compared with the centre pixel in either clockwise or anticlockwise direction (see Fig. 1).

Fig. 1
figure 1

Circular neighborhoods for different (P, R) (anti-clockwise)

This comparison determines whether a neighbour contributes a zero (0) or a one (1). A value 0 is assigned if the neighbourhood pixel intensity is less than the centre pixel intensity, and a value 1 is assigned if it is greater than or equal to the centre pixel intensity. A popular option is 8 sample points in the neighbourhood and a radius of 1 (i.e., P = 8 and R = 1), although other combinations may also be used. The intensity of a sample point lying between two pixels is determined by bilinear interpolation. The LBP feature of an image is denoted by \(LB{P}_{P,R}\) (Pietikäinen et al. 2011). After the LBP of a pixel has been extracted, the intensity value of the pixel is replaced by this LBP code. This procedure cannot be followed for border pixels because not all of their neighbours exist. Under these considerations, the LBP code of a pixel is given by

$$LB{P}_{P,R}(x,y)={\sum }_{p=0}^{P-1}s({g}_{p}-{g}_{c}){2}^{p}$$
(1)

In Eq. (1), (x, y) is the centre pixel location, gc represents the centre pixel intensity, gp represents the intensity of the p-th neighbourhood pixel and s(w) is defined as

$$s(w)=\left\{\begin{array}{l}1, \, \, w\ge 0\\ 0, \, \, w<0\end{array}\right.$$
(2)

The feature vector of an image is the histogram of the \(LB{P}_{P,R}\) codes of all its pixels. The initial dimension of this LBP histogram is \({2}^{P}\), since each LBP code may fall into a different bin. If an image is divided into M regions, then M such histograms are formed and concatenated, so the image is described by a histogram of size M · \({2}^{P}\).
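As an illustration, a minimal NumPy sketch of Eq. (1) for the common P = 8, R = 1 neighbourhood is given below. It assumes a grayscale image, uses the eight integer-offset neighbours (so no bilinear interpolation is needed) and skips border pixels; the function names are ours and not taken from any particular library.

```python
import numpy as np

def lbp_8_1(image):
    """Basic LBP of Eq. (1) for P = 8, R = 1 on a grayscale image.

    Border pixels are skipped because not all of their neighbours exist.
    Neighbours lie at integer offsets, so no bilinear interpolation is needed.
    """
    img = np.asarray(image, dtype=np.int32)
    h, w = img.shape
    # the 8 neighbour offsets, traversed in a fixed circular order
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:h-1, 1:w-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for p, (dy, dx) in enumerate(offsets):
        neighbour = img[1+dy:h-1+dy, 1+dx:w-1+dx]
        # s(g_p - g_c) * 2^p, accumulated bit by bit
        codes |= ((neighbour >= center).astype(np.uint8) << p)
    return codes

def lbp_histogram(codes, bins=256):
    """2^P-bin LBP histogram of an image region (here P = 8)."""
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist
```

Calling lbp_histogram(lbp_8_1(img)) then yields the \({2}^{8}\) = 256-bin histogram described above for one region.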

  • Rotation invariance

Several upgraded versions of LBP, as discussed in (Pietikäinen et al. 2011), have been developed to achieve invariance against rotation and to reduce the size of the LBP histogram. When the image is rotated, the gray values gp shift along the perimeter of the circular neighbourhood, so a different \(LB{P}_{P,R}\) code is obtained. To reduce the effect of rotation, an upgraded, rotation invariant LBP is defined as

$$LB{P}_{P,R}^{ri}(x,y)=\mathit{min}\left\{ROR(LB{P}_{P,R},i) \, | \, i=0, 1,\dots ,P-1\right\}$$
(3)

In Eq. (3), \(ROR(LB{P}_{P,R},i)\) performs a circular bitwise right shift i times on the P-bit number \(LB{P}_{P,R}\). For P = 8, the \(LB{P}_{P,R}^{ri}\) feature can take 36 different values, so the histogram of an image region has 36 bins.
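The circular bit rotation of Eq. (3) can be sketched directly on the integer codes produced by the earlier snippet; the helper below assumes P-bit codes and simply takes the minimum over all circular right shifts (an illustrative helper, not the authors' implementation).

```python
def rotation_invariant_code(code, P=8):
    """LBP^ri of Eq. (3): minimum over all circular bit rotations of a P-bit code."""
    code = int(code)  # work on a plain Python int to avoid fixed-width overflow
    best = code
    for i in range(1, P):
        # circular right shift by i positions of a P-bit number
        rotated = ((code >> i) | (code << (P - i))) & ((1 << P) - 1)
        best = min(best, rotated)
    return best
```

For example, the patterns 00000110 and 01100000 both map to 00000011; for P = 8 only 36 distinct values remain, matching the histogram size mentioned above.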

  • Uniform patterns

A uniform LBP has at most two circular transitions between the binary values 0 and 1. Let us consider a few examples of uniform and non-uniform patterns. The 0–1 transitions of uniform patterns for P = 8 and R = 1 are shown in Fig. 2. In a circular neighbourhood of P pixels, the number of rotation invariant uniform patterns is P + 1. A brief description of uniform and non-uniform patterns is given in Table 1.

Fig. 2
figure 2

Uniform LBP (P = 8, R = 1). Hollow circle indicates 0 and solid circle indicates 1

Table 1 Uniform and non-uniform patterns

Formal definition of uniform LBP with rotation invariance is given by

$$LB{P}_{P,R}^{riu2}=\left\{\begin{array}{l}{\sum }_{p=0}^{P-1}s({g}_{p}-{g}_{c}), \, \, if \, \, U(LB{P}_{P,R})\le 2\\ P+1, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, otherwise\end{array}\right.$$
(4)

where riu2 denotes rotation invariant uniform patterns, gc represents the centre pixel intensity, gp represents the intensity of the p-th neighbourhood pixel, s(w) is defined in Eq. (2) and

$$U(LB{P}_{P,R})=\left|s({g}_{P-1}-{g}_{c})-s({g}_{0}-{g}_{c})\right|+{\sum }_{p=1}^{P-1}\left|s({g}_{p}-{g}_{c})-s({g}_{p-1}-{g}_{c})\right|$$
(5)
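A hedged sketch of Eqs. (4) and (5) is shown below: it counts the circular 0/1 transitions of a code (Eq. (5)) and returns the number of set bits for uniform patterns or the label P + 1 otherwise (Eq. (4)). The helper operates on the raw P-bit codes produced by the earlier sketch.

```python
def riu2_code(code, P=8):
    """LBP^riu2 of Eqs. (4)-(5): number of 1 bits if the pattern is uniform
    (at most two circular 0/1 transitions), otherwise the label P + 1."""
    bits = [(int(code) >> p) & 1 for p in range(P)]      # s(g_p - g_c), p = 0..P-1
    # U(LBP_{P,R}): number of transitions around the circle, Eq. (5)
    transitions = sum(bits[p] != bits[(p + 1) % P] for p in range(P))
    return sum(bits) if transitions <= 2 else P + 1
```

The feature of a region is then a (P + 2)-bin histogram of these labels, which for P = 8 has only 10 bins instead of 256.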

3.2 SVM multiclass classifier

Initially, the SVM classifier was proposed for binary classification and was subsequently extended successfully to multiclass classification.

  • SVM classifier

Consider a set of \(L\) training points, where each \({x}_{i}\) consists of \(D\) attributes and belongs to class \({y}_{i}=+1\) or \(-1\). The training set is

$$ \left\{ {x_{i} ,y_{i} } \right\},\;\; i = 1,\dots ,L,\;\; y_{i} \in \left\{ {1, - 1} \right\},\;\; x_{i} \in R^{D} $$
(6)

For linearly separable data, a separating hyperplane can be drawn in the \(D\)-dimensional attribute space \({x}_{1},{x}_{2},\dots , {x}_{D}\) with \(D>2\). SVM provides the maximum-margin hyperplane, i.e., the one farthest from the closest points of both classes.

Each support vector \({x}_{s}\) satisfies

$$ y_{s} \left( {\sum _{{m \in S}} \alpha _{m} y_{m} x_{m}\cdot x_{s}+ b} \right) = 1 $$
(7)

where \(\alpha \) denotes the Lagrange multipliers satisfying \({\alpha }_{i}\ge 0 \, {\forall }_{i}\), and \(S\) is the set of indices of the support vectors, i.e., the indices \(i\) for which \({\alpha }_{i}>0\). The bias \(b\) is obtained by averaging over the support vectors in \(S\):

$$b=\frac{1}{{N}_{s}}{\sum }_{s\in S}({y}_{s}-{\sum }_{m\in s}{\alpha }_{m}{y}_{m}{x}_{m}\cdot {x}_{s})$$
(8)
  • One Versus All (OVA) architecture

The OVA architecture was one of the first schemes used to implement multiclass classification with SVM. The idea behind it is that the classes are divided into two categories: one class forms the positive class at a time, and all remaining classes form the negative class. This process is repeated for each class, so the binary classifier is applied as many times as there are classes. The architecture of this classifier is shown in Fig. 3.

Fig. 3
figure 3

SVM multiclass classifier with OVA architecture
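For illustration, the OVA scheme of Fig. 3 with an RBF-kernel SVM can be sketched with scikit-learn. The tooling and the toy data below are assumptions for demonstration only; in practice the rows of X would be the LBP^riu2 feature vectors of the training videos.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Hypothetical toy data standing in for per-video feature vectors and labels.
rng = np.random.default_rng(0)
X = rng.random((60, 10))          # 60 videos, 10-dimensional features
y = rng.integers(0, 5, size=60)   # 5 activity classes

# One binary RBF-kernel SVM per class (that class vs. all others), as in Fig. 3.
clf = OneVsRestClassifier(SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X, y)
predicted_class = clf.predict(X[:1])
```

Note that SVC handles multiclass problems internally as well; the explicit OneVsRestClassifier wrapper is used here only to mirror the OVA structure described above.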

4 The proposed framework

This section presents the proposed framework for the human activity recognition system. Figure 4 depicts the proposed framework, each component of which is discussed separately below.

Fig. 4
figure 4

Block diagram of the proposed framework

4.1 Input video

The input is a sequence of frames used for training. The video can contain frames numbered \(1, 2,\dots , n\) and is represented as

$$V={\sum }_{i=1}^{n}{F}_{i}$$
(9)

where \({F}_{i}\) denotes a particular frame and \(i\) is the frame index, which ranges over \(1, 2,\dots , n\).

4.2 Preprocessing

The preprocessing step is performed in order to reduce color and size variations and to have all video frames in a consistent format. Therefore, the normalized video is

$$\left|V\right|={\sum }_{i=1}^{n}\left|{F}_{i}\right|$$
(10)
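A minimal sketch of this normalization step, assuming OpenCV is available, converts each frame to grayscale and resizes it to a fixed resolution; the target size is an illustrative choice, as the paper does not specify one.

```python
import cv2

def preprocess_frame(frame, size=(180, 144)):
    """Normalize one frame: convert BGR to grayscale and resize to a fixed
    (width, height) so that all frames share a consistent format."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, size)
```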

4.3 Background subtraction

Foreground object detection is an important step in the activity recognition process, and background subtraction is a simple yet efficient technique for it. Therefore, in the proposed framework, foreground object detection is performed using a background subtraction approach so that only the human activities are captured. Frame differencing is a popular method for this purpose; however, threshold selection is an important step in its execution. A few representative results of background subtraction are shown in Fig. 5. A general algorithm for frame differencing based background subtraction is provided below.

figure a
Fig. 5
figure 5

Original and corresponding background subtracted images
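A hedged sketch of the frame differencing idea follows: the absolute difference between each frame and a reference background frame is thresholded to obtain a binary foreground mask. Using the first frame as the background estimate and the particular threshold value are illustrative assumptions, not choices stated in the paper.

```python
import numpy as np

def frame_difference_mask(frame, background, threshold=25):
    """Binary foreground mask: pixels whose absolute difference from the
    background frame exceeds the threshold are marked as foreground."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > threshold).astype(np.uint8)

def background_subtract(frames, threshold=25):
    """Apply frame differencing to a list of grayscale frames, using the
    first frame as a static background estimate."""
    background = frames[0]
    return [frame_difference_mask(f, background, threshold) for f in frames[1:]]
```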

4.4 Feature vector extraction

The next step is to extract features from the background subtracted video \(\left|{V}_{bs}\right|\). This motion feature is computed using the following equations:

$$LB{P}_{P,R}^{riu2}=\left\{\begin{array}{l}{\sum }_{p=0}^{P-1}s({g}_{p}-{g}_{c}), \, \, if \, \, U(LB{P}_{P,R})\le 2\\ P+1, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, \, otherwise\end{array}\right.$$
(11)

where \(s(w)=\left\{\begin{array}{l}1, \, \, w\ge 0\\ 0, \, \, w<0\end{array}\right.\), (12)

$$U(LB{P}_{P,R})=\left|s({g}_{P-1}-{g}_{c})-s({g}_{0}-{g}_{c})\right|+{\sum }_{p=1}^{P-1}\left|s({g}_{p}-{g}_{c})-s({g}_{p-1}-{g}_{c})\right|$$
(13)

where \({g}_{c}\) is the centre pixel intensity and \({g}_{p}\) is a neighbourhood pixel intensity in \(\left|{V}_{bs}\right|\).
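Continuing the earlier sketches, the per-video feature vector might be assembled as follows: each background subtracted frame is encoded with the riu2 labels and the (P + 2)-bin histograms are summed over the frames and normalized. The aggregation strategy is our assumption; the snippet reuses the hypothetical lbp_8_1 and riu2_code helpers from Sect. 3.1.

```python
import numpy as np

def video_feature(masked_frames, P=8):
    """Accumulate LBP^riu2 histograms over all background subtracted frames
    of a video into a single (P + 2)-bin feature vector."""
    hist = np.zeros(P + 2)
    for frame in masked_frames:
        codes = lbp_8_1(frame)                      # per-pixel codes, Eq. (11)
        labels = np.vectorize(riu2_code)(codes)     # riu2 labels, Eqs. (11)-(13)
        h, _ = np.histogram(labels, bins=P + 2, range=(0, P + 2))
        hist += h
    return hist / max(hist.sum(), 1)                # normalized histogram
```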

4.5 Recognition of activities

SVM is an efficient classifier that has been used extensively in the literature for recognition tasks. We use SVM in the proposed framework since it produces high accuracy and makes faster predictions than other existing classifiers. After the feature extraction step, activity recognition is performed by training and testing the multiclass SVM classifier. Among the different kernel functions, the RBF kernel is selected owing to its better accuracy. The input video V belongs to a specific activity according to its assigned class:

$$V\in \left\{\begin{array}{c}activit{y}_{1} \, \, \, iff \, \, \, V\in clas{s}_{1}\\ activit{y}_{2} \, \, \, iff \, \, \, V\in clas{s}_{2}\\ \vdots \\ activit{y}_{n} \, \, \, iff \, \, \, V\in clas{s}_{n}\end{array}\right.$$
(14)
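In code, Eq. (14) amounts to a label lookup after classification; the small, hypothetical helper below (the activity names are illustrative only, and any trained scikit-learn-style classifier such as the OVA SVM of Sect. 3.2 can be passed in) makes this explicit.

```python
# Eq. (14) as a lookup: the class index predicted by the trained SVM selects
# the activity label assigned to the input video's feature vector.
ACTIVITY_NAMES = ["walk", "fight", "rob", "follow", "meet and part"]  # illustrative

def recognize(feature_vector, classifier, names=ACTIVITY_NAMES):
    """Return the activity name of the class predicted for one feature vector."""
    class_index = classifier.predict([feature_vector])[0]
    return names[class_index]
```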

5 Experimentation and results

This section presents experiments on three benchmark datasets: the Weizmann viewpoint dataset (Gorelick et al. 2007), the CASIA dataset (Wang et al. 2007) and the IXMAS dataset (Weinland et al. 2006). First, the input video is supplied to the pipeline; then the preprocessing, background subtraction and feature extraction steps are executed, followed by classification. Both qualitative and objective evaluations of the proposed framework are performed, and it is compared against the moment based method (Binh et al. 2013), the moment invariant based method (Nigam and Khare 2016), the center symmetric LBP based method (Bianconi and Fernández 2011), and the GIST feature based method (Vyas et al. 2020).

5.1 Case study 1

This experiment illustrates the efficacy of the proposed framework for various rotations of an activity. For this purpose, we selected the Weizmann viewpoint dataset (Gorelick et al. 2007). It contains 10 videos showing the "walking" activity captured from cameras placed at different viewpoints. These viewpoints vary between 0° and 81° relative to the image plane in steps of 9°. With the gradual increase in viewing angle, the videos contain significant changes in scale. The videos have a frame size of 180 × 144, the image type is 'TrueColor' and no video compression is used. We selected 40 frames from each video.

We have performed background subtraction followed by feature vector computation and classification. Visual background subtraction results are shown in Fig. 6 and recognition results are shown in Fig. 7.

Fig. 6
figure 6

Background subtraction results

Fig. 7
figure 7

Visual recognition results

The viewing angle changes from 0° to 27°, 54°, 63°, 72°, and 81°. Figure 6 shows the background subtracted images of the "walking" activity performed at different rotation angles. In this case, the foreground object is segmented correctly: little noise is present in the images, and the foreground object is easily recognizable in these videos. Figure 7 shows the visual recognition results for the "walking" activity at different rotation angles. These results show that the activity is recognized correctly regardless of the rotation angle at which it is performed, and the angle of rotation is identified correctly for the 10 different viewing angles. Hence, both the background subtracted images and the visual results are correct.

We now present quantitative results and a comparative evaluation in terms of confusion matrices and recognition accuracy. The compared methods are the moment based method (Binh et al. 2013), the moment invariant based method (Nigam and Khare 2016), the center symmetric LBP based method (Bianconi and Fernández 2011), and the GIST feature based method (Vyas et al. 2020). This section reports quantitative results for the Weizmann viewpoint dataset (Gorelick et al. 2007). Confusion matrices and recognition accuracies are shown in Tables 2, 3, 4, 5 and 6. From these tables, it is observed that our method performs better than the other methods.

Table 2 Confusion matrix for moment based method (Binh et al. 2013)
Table 3 Confusion matrix for inv-mom based method (Nigam and Khare 2016)
Table 4 Confusion matrix for CSLBP based method (Bianconi and Fernández 2011)
Table 5 Confusion matrix for GIST based method (Vyas et al. 2020)
Table 6 Confusion matrix for the proposed method

The recognition accuracy of the different methods on the Weizmann dataset is shown in Fig. 8. From this figure, we can observe that the recognition accuracy of the moment-based method is 10%, the invariant moment-based method 22.7%, the CSLBP method 32.9%, the GIST based method 33.2% and the proposed method 100%. We can therefore conclude that the proposed framework outperforms the other methods.

Fig. 8
figure 8

Recognition accuracy of different methods

5.2 Case study 2

This section demonstrates the performance of the proposed framework on the CASIA action recognition dataset (Wang et al. 2007). The five most common interactions have been selected: fight, overtake, rob, follow, and meet and part. All videos have been captured from three different viewpoints: angular, horizontal and top down. The activities have accordingly been renamed fight: angular view, fight: horizontal view, fight: topdown view, overtake: angular view, overtake: horizontal view, overtake: topdown view, rob: angular view, rob: horizontal view, rob: topdown view, follow: angular view, follow: horizontal view, follow: topdown view, meet and part: angular view, meet and part: horizontal view, and meet and part: topdown view. Videos have been compressed using the HUFFYUV codec in the AVI video format. The duration varies between 5 and 30 s for the different activities. Visual background subtraction and recognition results are shown in Figs. 9 and 10, respectively.

Fig. 9
figure 9

Background subtraction results

Fig. 10
figure 10

Visual recognition results

Figures 9 and 10 show that we obtain correct background subtracted frames and visual results. The activities have been recorded from three different viewing angles. The background removal step of the proposed method provides good results, which lead to correct object identification. As a result, the human activities are better recognized and the proposed method is able to identify activities across multiple views. The proposed method also identifies the suspicious activities, namely fight, overtake, rob, follow, and meet and part.

Now, the CASIA dataset is analyzed quantitatively (Wang et al. 2007). The activities are abbreviated as follows: fight: angular view as A1, fight: horizontal view as A2, fight: topdown view as A3, overtake: angular view as B1, overtake: horizontal view as B2, overtake: topdown view as B3, rob: angular view as C1, rob: horizontal view as C2, rob: topdown view as C3, followalways: angular view as D1, followalways: horizontal view as D2, followalways: topdown view as D3, meetapart: angular view as E1, meetapart: horizontal view as E2 and meetapart: topdown view as E3. The results are shown in Tables 7, 8, 9, 10 and 11. Although the CSLBP based method (Bianconi and Fernández 2011) shows a high recognition rate, the proposed framework performs better than this method as well.

Table 7 Confusion matrix for moment based method (Binh et al. 2013)
Table 8 Confusion matrix for inv-mom based method (Nigam and Khare 2016)
Table 9 Confusion matrix for CSLBP based method (Bianconi and Fernández 2011)
Table 10 Confusion matrix for GIST based method (Vyas et al. 2020)
Table 11 Confusion matrix for the proposed method

The recognition accuracy of the different methods on the CASIA dataset is shown in Fig. 11. From this figure, we can observe that the recognition accuracy of the moment-based method is 6.7%, the invariant moment-based method 10.3%, the CSLBP method 83.7%, the GIST based method 26.7% and the proposed method 90.7%. Although the performance of CSLBP is quite close to that of the proposed method, when we consider the overall accuracy across all three case studies we can conclude that the proposed framework is better than the CSLBP based method.

Fig. 11
figure 11

Recognition accuracy of different methods

5.3 Case study 3

Now, we consider the INRIA IXMAS multiview activity dataset (Weinland et al. 2006). It includes 13 daily-life activities performed by 11 actors, 3 times each. These activities are kick, punch, turn around, check watch, cross arms, scratch head, sit down, get up, walk, wave, point, pick up, throw (overhead), throw (from bottom up) and nothing, captured from five calibrated cameras. Qualitative background subtraction and recognition results are shown in Figs. 12 and 13. Figure 12 shows that the foreground objects have been segmented properly, and Fig. 13 shows the visual results obtained with the proposed method.

Fig. 12
figure 12

Background subtraction results for Kick activity

Fig. 13
figure 13

Visual recognition results

Next, objective evaluation results are reported for the IXMAS dataset (Weinland et al. 2006). They are listed in Tables 12, 13, 14, 15 and 16, which compare the proposed framework with the other methods.

Table 12 Confusion matrix for moment based method (Srivastava et al. 2021)
Table 13 Confusion matrix for inv-mom based method (Jan and Khan 2021)
Table 14 Confusion matrix for CSLBP based method (Bianconi and Fernández 2011)
Table 15 Confusion matrix for GIST based method (Wang et al. 2021)
Table 16 Confusion matrix for the proposed method

The recognition accuracy of the different methods on the IXMAS dataset is shown in Fig. 14. From this figure, we can observe that the recognition accuracy of the moment-based method is 6.7%, the invariant moment-based method 11.2%, the CSLBP method 83.7%, the GIST based method 100% and the proposed method 90.7%. Although the GIST based method (Wang et al. 2021) outperforms the proposed one on this dataset, when we consider the overall accuracy across all three case studies we can conclude that the proposed framework is better than the GIST based method.

Fig. 14
figure 14

Recognition accuracy of different methods

In addition to these promising results, the experiments have some limitations. All experiments have been conducted with a 70% training and 30% testing split, and the number of samples in these sets can affect the overall accuracy of the framework. Moreover, the training and testing sets are drawn from the same dataset, so cross-dataset evaluation may also affect the accuracy of the method.

6 Conclusions

This study proposed an activity recognition system that is suitable for multiple views. The overall framework consists of background subtraction, feature extraction and activity recognition. Background subtraction is used to capture the moving object only. Then, the uniform and rotation invariant LBP descriptor is computed, and a multiclass SVM classifier is applied to recognize the different activities. The framework is tested on three multiview human activity video datasets, the Weizmann viewpoint dataset (Gorelick et al. 2007), the CASIA action recognition dataset (Wang et al. 2007) and the IXMAS dataset (Weinland et al. 2006), and compared with (Binh et al. 2013; Vyas et al. 2020; Nigam and Khare 2016; Bianconi and Fernández 2011). This comparison shows that the proposed method outperforms other feature descriptor based methods. This work can be extended towards context aware activity recognition, which may include multiview and cross-view 2D and 3D human activities. Also, existing feature descriptors can be fused with machine and deep learning for better representation and recognition of multiview human activities.