1 Introduction

Biometrics has emerged as an important research area in recent years owing to its vast applications in fields such as surveillance and authentication. A number of human physiological characteristics, e.g., face, iris, fingerprints and deoxyribonucleic acid (DNA), have been shown to be unique and can therefore be used for individual identification [28, 42]. Lately, the gait biometric, which refers to the way a person walks, has received the attention of the research community for human identification. Gait differs from other biometric modalities that require the person to interact with the system: fingers must be placed on a scanner to capture their prints, the face must be in a specific position in front of a camera, and DNA analysis requires a sample of blood or skin. Gait, on the other hand, requires no interaction with the imaging device. It works unobtrusively, and individuals are generally unaware that they are being captured by a camera installed at a concealed location. Moreover, gait-based human identifiers are less susceptible to noise than other biometrics, which is why they perform well even on low-quality and low-resolution videos. Gait may not be as precise and powerful a biometric as some other modalities, but the aforementioned characteristics make it an ideal choice for visual surveillance applications.

The existing gait recognition algorithms are generally categorized into two types: model-based and model-free approaches. In the former category, the algorithms use human motion, body structure, and joint positions to track and recognize individuals [57]. Structural models are usually generated based on prior knowledge of the shape of the human body [7, 23]. Motion models [3, 10] exploit the motion parameters of human body parts, such as joint angle trajectories and the rotation patterns of the hip and thigh. The algorithm proposed in [3] computes features from joint locations using elliptic Fourier descriptors and uses them to recognize the gait. The authors in [10] modeled a gait feature by exploiting the angular motion of the hip and thigh using the Fourier series. Wang et al. [53] modeled the human body as fourteen rigid parts connected at joint locations and used these locations to obtain joint angle trajectories, which serve as the gait descriptor. Model-based approaches are somewhat robust to occlusion, but they suffer from a number of limitations: their performance depends on accurate torso localization, and they require high-quality gait videos [3, 57]. Moreover, these methods are computationally expensive.

Model-free gait recognition approaches usually extract human silhouettes from the gait video and derive various kinds of information from these silhouette images for identification. Numerous techniques construct a template image from the segmented body silhouettes and use it for person identification, e.g., [1, 8, 17, 24, 30, 37, 41, 51]. The technique proposed in [30], known as the gait energy image (GEI), averages the silhouette images over a gait cycle and uses the result to estimate the person's identity. Several improvements over GEI have also been reported, e.g., [1, 24, 41, 46, 51, 60]. The approaches in [14, 36, 59] extract different gait-related parameters from the silhouette images and use them for person identification. Goffredo et al. [14] computed height and width features from the normalized and scaled silhouette region over time and used them as gait dynamics. In [59], a radial basis function network and deterministic learning are used to classify the height-to-width ratio and contour centroid over time to approximate the individual's gait.

Shape analysis on sequences of silhouettes and projections of silhouettes in different directions have also been explored for gait identification [21, 44, 45, 49, 54]. In the algorithm described in [55], 2D silhouettes are mapped into a 1D normalized distance signal, and the changes in the shape of this signal over time are used to approximate the gait patterns. The methods in [44, 45] project the normalized segmented silhouette region in different directions and take the vector of projection values in each direction as the gait feature. Motion information has also been used extensively for gait recognition, e.g., [6, 9, 20]. Castro et al. [6] computed spatiotemporal cuboids of optical flow from the video sequences and fed them to a convolutional neural network to obtain a high-level gait representation. The authors in [9] exploit spatiotemporal motion characteristics as well as statistical and physical parameters of the silhouettes for recognition.

Fig. 1 Schematic diagram of the proposed gait-based human recognition algorithm

The model-free gait recognition techniques have achieved better results than their model-based counterparts, and they are also computationally efficient. However, the bottleneck in these techniques is precise silhouette segmentation, as poor segmentation can adversely affect the recognition performance [59]. This paper presents a novel spatiotemporal features-based gait representation that uses dense trajectories to model the human gait. We use dense trajectories because they capture the local motion patterns of walking and can be easily computed from the gait video sequences using the optical flow field. The extracted motion information is encoded using Fisher vectors and a Gaussian mixture model-based codebook. The encoded features are classified using a simple linear support vector machine (SVM).

The proposed approach, unlike existing techniques, does not require gait cycle estimation or human body extraction, and therefore it performs better on low-resolution and poor-quality videos. The experimental evaluation on the CASIA-C gait database (Sect. 3.5), which consists of low-resolution video sequences recorded at night with a thermal imaging camera, reveals the efficacy of the proposed method under unfavorable and challenging conditions. In particular, the proposed gait features comprise the person's static appearance and the relative motion information, which together can capture even small variations in the gait. The high recognition accuracy obtained with a simple linear SVM also confirms the discriminative nature of the proposed features for gait recognition. Preliminary results of this research were presented in [19]; in this paper, however, a number of improvements are proposed, which are summarized in the following.

  • In this study, we investigated the influence of dimensionality reduction on feature encoding. High-dimensional features inevitably increase the computational complexity and the storage requirements. It is inferred from extensive experimentation that applying principal component analysis (PCA) after feature encoding not only improves the recognition rate but also reduces the classification time and the storage requirements of the proposed algorithm.

  • It is also concluded that dimensionality reduction prior to feature encoding discards a few cues that are important for gait recognition, which may lower the recognition accuracy; preserving this information until after encoding significantly improves the effectiveness of our method.

  • In this research, an extensive experimental evaluation is performed. The proposed technique is evaluated on five large benchmark gait databases and the recognition results are compared with state-of-the-art techniques.

2 Proposed gait recognition method

The proposed gait recognition technique consists of four steps. First, dense trajectories are extracted for a set of sample points in a video sequence using the optical flow field; for each tracked point, the appearance and motion information of the walker is computed within a spatiotemporal patch and saved in local descriptors. Second, one million randomly selected local descriptors are used to build a codebook, which is constructed using a Gaussian mixture model (GMM). Third, the local descriptors are encoded using this codebook and the Fisher vector encoding scheme. Finally, a linear SVM is used to classify the encoded features. The performance of the proposed algorithm is evaluated on five widely used benchmark gait databases, and the results are compared with those of existing techniques. The recognition results confirm the superior performance of the proposed method. Figure 1 shows the block diagram of the proposed method, and each step is presented in the following sections.

2.1 Motion descriptor computation

Recently, dense trajectories have proven to be effective in action recognition [33, 52]. Since they encode the movement patterns of a person's walk and can be computed directly from the gait video sequences, we use them to compute the gait features for person identification. To compute the trajectories, we select a set of dense points in a frame and track them in the subsequent frames. The tracking is performed using the displacement information in a dense optical flow field. In particular, each selected point \( P_{t} = (x_{t},y_{t}) \) in frame t is tracked into frame \(t+1\) and so forth. The concatenation of these tracked points over the subsequent frames forms a trajectory. Let S represent the sequence of displacement vectors, computed as,

$$\begin{aligned} S = ({\Delta }P_{t},{\Delta }P_{t+1},\ldots ,{\Delta }P_{t+L-1}), \end{aligned}$$
(1)

where L is the trajectory length and \({\Delta }P_{t} = (P_{t+1} - P_{t})\). The resultant vector S is normalized:

$$\begin{aligned} S' = \frac{{(\Delta }P_{t},\ldots ,{\Delta }P_{t+L-1})}{ \sum _{j=t}^{t+L-1} \left\| {\Delta }P_{j}\right\| } \end{aligned}$$
(2)
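The trajectory shape descriptor of Eqs. (1)–(2) can be sketched in a few lines. The helper below is a hypothetical illustration (not the authors' released code): it takes the tracked positions of one point over a trajectory and returns the normalized displacement vector \(S'\).

```python
import numpy as np

def trajectory_shape(points):
    """Normalized trajectory shape S' (Eq. 2).

    points: (L+1, 2) array of tracked positions P_t, ..., P_{t+L}.
    Returns the L displacement vectors, flattened, after dividing by
    the sum of their magnitudes.
    """
    points = np.asarray(points, dtype=float)
    deltas = np.diff(points, axis=0)             # ΔP_t = P_{t+1} - P_t
    total = np.linalg.norm(deltas, axis=1).sum() # Σ ||ΔP_j||
    if total < 1e-8:                             # static point: no motion
        return np.zeros(deltas.size)
    return (deltas / total).ravel()
```

After normalization the displacement magnitudes of every trajectory sum to one, so trajectories of different spatial extent become directly comparable.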

Equation 2 describes the shape of the trajectory. The authors in [52] proposed computing local descriptors around the tracked points to capture the appearance and motion information for action recognition. The local descriptors are computed within a space-time volume, and their information is saved in two histograms, namely the histogram of oriented gradients (HOG) and the histogram of optical flow (HOF). Furthermore, two additional descriptors, known as the motion boundary histogram (MBH), are computed by taking the spatial derivatives of the horizontal (x-axis) and vertical (y-axis) components of the optical flow; their information is saved in two histograms, MBH\(_{x}\) and MBH\(_{y}\), respectively. They describe the relative motion between pixels. We quantize the orientation information of each descriptor into an 8-bin histogram, and the descriptors are normalized with the \(L_{2}\)-norm. To assess the efficacy of these descriptors for gait recognition, we evaluated several of their combinations in [19] and concluded that the combination of MBH and HOG outperforms the other feature combinations. The main reason for the superior performance of this combination is that HOG captures the person's static appearance while MBH encodes the relative motion between pixels; used together, they therefore have a greater impact in recognizing individuals by their appearance and motion information. Moreover, they are capable of capturing small variations in the gait patterns.
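The 8-bin orientation quantization shared by HOG, HOF and MBH can be sketched as follows. This is a minimal illustration of a single histogram cell, assuming magnitude-weighted voting and \(L_2\) normalization as described above; the full descriptors aggregate many such cells over a space-time volume.

```python
import numpy as np

def orientation_histogram(dx, dy, bins=8):
    """Quantize derivative orientations into an L2-normalized,
    magnitude-weighted histogram (one HOG/MBH cell).

    dx, dy: 2D arrays of horizontal/vertical derivatives over a patch
    (image gradients for HOG, flow-component gradients for MBH).
    """
    mag = np.hypot(dx, dy)
    ang = np.arctan2(dy, dx) % (2 * np.pi)            # map to [0, 2π)
    idx = np.minimum((ang / (2 * np.pi) * bins).astype(int), bins - 1)
    hist = np.bincount(idx.ravel(), weights=mag.ravel(), minlength=bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```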

2.2 Codebook generation

Usually, a set of local descriptors is used to construct a fixed-length vector to represent an image or video. We encode the HOG and MBH descriptors using Fisher vector (FV) encoding and a Gaussian mixture model (GMM)-based codebook. The FV is based on the principle of the Fisher kernel [38], which combines the advantages of discriminative and generative approaches by deriving a kernel from a generative model of the data. The GMM defines a mixture of several Gaussian distributions over the feature space [33] and can be expressed as:

$$\begin{aligned} p(X \mid \theta ) = \sum \limits _{i=1}^{K}w_{i}\mathscr {N}( x \mid \mu _{i}, \textstyle \sum _{i}) \end{aligned}$$
(3)

where K represents the number of components of the GMM, \(w_{i}\) is the weight of the ith component with the constraint that \( \sum _{i=1}^{K}w_{i} = 1 \), and \(\mu _{i}\) and \(\textstyle \sum _{i}\) represent the mean vector and covariance matrix of the ith component, respectively. Additionally, \(\theta = \{ w_{i},\mu _{i},\textstyle \sum _{i}, i=1,2,\ldots ,K\}\) is the list of parameters for all K components of the GMM, and \(\mathscr {N}( x \mid \mu _{i}, \textstyle \sum _{i})\) represents the Gaussian distribution with D dimensions:

$$\begin{aligned} \mathscr {N}(x\mid \mu _{i},\textstyle \sum _{i}) = {\dfrac{1}{ \sqrt{(2\pi )^{D}{\mid }\sum _{i}{\mid }}} } e^{-\dfrac{1}{2}(x-\mu _{i})^{\top }\textstyle \sum _{i}^{-1}(x-\mu _{i})} \end{aligned}$$
(4)

For a given feature set \(X = \{x_{1}, \ldots , x_{T}\}\), we use the maximum likelihood estimation algorithm [31] to estimate the optimal parameters of the GMM. It performs a soft assignment of each data point to every component. The assignment score, also known as the posterior probability, indicates the association of a data point with the respective component of the GMM and is defined as,

$$\begin{aligned} q_{t}(i) = \dfrac{w_{i}\mathscr {N}(x_{t}\mid \mu _{i}, \textstyle \sum _{i})}{\sum _{j=1}^{K} w_{j} \mathscr {N}(x_{t}\mid \mu _{j}, \textstyle \sum _{j})} \end{aligned}$$
(5)

We consider that each component of the GMM describes a particular appearance and motion pattern contributed by the local descriptors. Since the GMM is fitted with the iterative expectation maximization (EM) algorithm [11, 25], which performs a soft assignment, each descriptor is assigned to multiple components according to its posterior estimates.
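A minimal sketch of the codebook construction, using scikit-learn's `GaussianMixture` as a stand-in for the EM fitting described above. The descriptor matrix is random stand-in data, and the sample and component counts are reduced from the paper's one million 96-D descriptors and \(K=2^8\) for brevity.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in for a random sample of 96-D HOG/MBH local descriptors.
rng = np.random.default_rng(0)
descriptors = rng.standard_normal((5000, 96))

# Diagonal-covariance GMM codebook fitted by EM (toy K=16; paper uses 2^8).
gmm = GaussianMixture(n_components=16, covariance_type="diag",
                      max_iter=100, random_state=0).fit(descriptors)

# Posterior q_t(i) of Eq. (5): soft assignment of each descriptor
# to every component; rows sum to one.
posteriors = gmm.predict_proba(descriptors)
assert np.allclose(posteriors.sum(axis=1), 1.0)
```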

2.3 Feature encoding

Feature encoding is the process of transforming the local descriptors of an image or video into a fixed-length vector. Lately, FV encoding has become quite popular due to its superior performance in many image processing applications [34, 38], and we chose it to encode our local descriptors. FV encoding starts by learning a GMM (Sect. 2.2). Let \(X = \{x_{t}| t=1, \ldots , T\}\) be a set of local descriptors; it is transformed into a vector by taking the gradient of the log-likelihood of the GMM (3):

$$\begin{aligned} F_{X} = \dfrac{1}{T}\nabla _{\theta }\log p(X|\theta ), \end{aligned}$$
(6)

where \(F_{X}\) represents the resultant Fisher vector and \(\nabla _{\theta }\) is the gradient of the log-likelihood with respect to the model parameters \(\theta \). Let \(x_{t}\) be a D-dimensional local descriptor, and its assignment to the ith Gaussian component (5) is represented as \(q_{t}(i)\). Assuming that the covariance matrices \(\textstyle \sum _{i}\) are diagonal and can be represented as \({\sigma _{i}}^2\), the gradient vector can be represented as in [35]:

$$\begin{aligned} u_{i} = \frac{1}{T\sqrt{w_{i}}} \sum _{t=1}^{T} q_{t}(i) \dfrac{x_{t}-\mu _{i}}{\sigma _{i}} \end{aligned}$$
(7)

where \(\mu _{i}\) represents the mean of the ith component.

$$\begin{aligned} v_{i} = \frac{1}{T\sqrt{2w_{i}}} \sum _{t=1}^{T} q_{t}(i) \left[ \dfrac{(x_{t}-\mu _{i})^{2}}{\sigma _{i}^{2}} - 1\right] , \end{aligned}$$
(8)

where \(u_{i}\) and \(v_{i}\) are D-dimensional gradient vectors with respect to the model parameters mean and variance, respectively. The Fisher encoding for the set of local descriptors X is obtained as a concatenation of u and v.

$$\begin{aligned} f = [u_{1}^{\top },v_{1}^{\top }, u_{2}^{\top },v_{2}^{\top }, \ldots , u_{K}^{\top },v_{K}^{\top }]^{\top } \end{aligned}$$
(9)

The final gradient vector f consists of the \(u_{i}\) and \(v_{i}\) vectors for \(i=1,2,\ldots ,K\), so the size of f is 2KD. The MBH\(_{x}\), MBH\(_{y}\) and HOG descriptors are encoded as described above and fused using representation-level fusion [33]: the feature vector for each set of local descriptors is computed separately, and a global representation is obtained by concatenating them.
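Equations (7)–(9) can be sketched directly. The function below is an illustrative implementation, assuming a diagonal-covariance GMM fitted as in Sect. 2.2 (e.g., scikit-learn's `GaussianMixture`); it returns the 2KD-dimensional Fisher vector f.

```python
import numpy as np

def fisher_vector(X, gmm):
    """Fisher vector of descriptor set X (T x D) under a fitted
    diagonal-covariance GMM, per Eqs. (7)-(9): gradients w.r.t. the
    means (u_i) and variances (v_i), concatenated to length 2*K*D."""
    T, D = X.shape
    q = gmm.predict_proba(X)                  # posteriors q_t(i), (T, K)
    w, mu = gmm.weights_, gmm.means_          # (K,), (K, D)
    sigma = np.sqrt(gmm.covariances_)         # (K, D) std-devs
    parts = []
    for i in range(gmm.n_components):
        diff = (X - mu[i]) / sigma[i]         # (x_t - mu_i) / sigma_i
        u = (q[:, i, None] * diff).sum(0) / (T * np.sqrt(w[i]))       # Eq. (7)
        v = (q[:, i, None] * (diff ** 2 - 1)).sum(0) \
            / (T * np.sqrt(2 * w[i]))                                 # Eq. (8)
        parts.extend([u, v])
    return np.concatenate(parts)              # Eq. (9), length 2*K*D
```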

Table 1 Summary of gait databases used in performance evaluation. Size represents the number of gait video sequences in the database used in experimental evaluation

2.4 Classification

The similarity between two samples X and Y can be measured using the Fisher kernel (FK) [38]. It is defined as a dot product between the feature vectors of X and Y: \(FK(X,Y) = f_{X}' \cdot f_{Y}\), where \(f_{X}\) and \(f_{Y}\) represent the Fisher vectors of samples X and Y, respectively. A nonlinear kernel machine using FK as its kernel is equivalent to a linear kernel machine using \(f_{X}\) as the feature vector. The main advantage of such an explicit vector formulation is that any simple linear classifier, which can be learned very efficiently, can be exploited. We use a simple linear SVM; in the implementation of the proposed algorithm, the LIBLINEAR SVM library [13] is used to classify the encoded gait features. Before the actual model is trained on the full training database, 10-fold cross-validation is performed to validate the model and select the optimal values of its parameters.
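A hedged sketch of this classification step, using scikit-learn's `LinearSVC` (a wrapper over LIBLINEAR) with cross-validated selection of the regularization parameter C. The feature matrix, the C grid and the 3-fold split are toy stand-ins for the paper's Fisher vectors and 10-fold protocol.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# Stand-in encoded gait features: 6 "subjects", 10 sequences each.
rng = np.random.default_rng(0)
features = rng.standard_normal((60, 128))
labels = np.repeat(np.arange(6), 10)

# Select C by cross-validation, then train on the full training set.
best_C = max((1e-2, 1e-1, 1.0, 10.0),
             key=lambda C: cross_val_score(LinearSVC(C=C), features,
                                           labels, cv=3).mean())
clf = LinearSVC(C=best_C).fit(features, labels)
```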

3 Experiments and results

The performance of the proposed algorithm is evaluated on five large gait databases: TUM GAID [17], CMU MoBo [15], NLPR [55], CASIA-B [58] and CASIA-C [46]. Each database contains gait sequences captured in a different environment with several variations in walking style, clothing, walking speed, etc. For example, the TUM GAID database captures three different walking styles: normal walk, walk with coating shoes, and walk with a backpack. Moreover, the database was recorded in two different seasons; thus, the clothing and the environment differ significantly. A summary of the characteristics of the gait databases used in the experimental evaluation is presented in Table 1. For each database, a codebook is computed separately from a training set, a subset of that database, and is used to encode the local descriptors of that particular database. In all experiments, one million randomly selected local descriptors from the training set are used to build a codebook with the GMM. The number of components K in the GMM is empirically determined and set to \(2^{8}\) in all experiments, and the length of each local descriptor is 96. The same division of the databases into training and testing sets is used in evaluating the performance of the proposed and the compared gait recognition methods.

We also assess the influence of dimensionality reduction on the classification accuracy of our computed features. It is observed that dimensionality reduction significantly improves the recognition accuracy of the proposed method and reduces the computational time and the storage requirements. This improvement is due to the impact of PCA on the GMM: PCA decorrelates the different feature dimensions during the projection, which is particularly advantageous since we use a diagonal covariance matrix in the Gaussian distributions [43]. Let d be the dimension of our feature vector and n the total number of instances in the training and testing sets. Since \(d \gg n\), we apply PCA to our feature vectors and reduce their dimension to \(n-1\) to alleviate the curse of dimensionality. It is empirically concluded that the average recognition accuracy of the proposed algorithm on the five gait databases increases significantly (up to 27%) when the dimension of the final feature vector is reduced. Furthermore, with PCA the average memory saving across the five databases is more than 50%, and the average speedup in classification time is more than 96%.
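The post-encoding reduction can be sketched as follows, assuming n labeled instances and d-dimensional Fisher vectors with \(d \gg n\); the sizes are toy values, not the paper's actual dimensions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy sizes: n instances, d-dimensional Fisher vectors (real d is 2*K*D).
n, d = 50, 4096
rng = np.random.default_rng(0)
fisher_vectors = rng.standard_normal((n, d))

# With d >> n the data spans at most n-1 directions, so PCA can retain
# n-1 components without losing variance, shrinking storage and
# classification time.
pca = PCA(n_components=n - 1)
reduced = pca.fit_transform(fisher_vectors)
assert reduced.shape == (n, n - 1)
```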

3.1 Evaluation on TUM GAID database

TUM GAID [17] is one of the biggest gait databases, comprising 3370 gait sequences of 305 subjects. The gait sequences were recorded in two different seasons. Each subject in the database has ten walk sequences: six sequences of normal walk (N), two sequences of walk with coating shoes (S) and two sequences of walk with a backpack (B). A subset of 32 subjects who took part in both recordings have ten more walk sequences (i.e., 20 sequences in total), described as normal walk after time (TN), walk with coating shoes after time (TS) and walk with backpack after time (TB). The time variation, in which the shoes, clothing, illumination and other recording properties change significantly, makes this database extremely challenging. The division of gait sequences into the training (i.e., gallery) and testing (i.e., probe) sets defined in [17] is used. Specifically, the first four N sequences of each person (i.e., \(N_{1} - N_{4}\)) are used as the training set, and \(N_{5}-N_{6}\), \(S_{1}-S_{2}\) and \(B_{1}-B_{2}\) are used as the testing sets in three separate experiments, namely N, S and B. Moreover, three more experiments are conducted, namely TN, TS and TB, where the sequences \(N_{7}-N_{8}\), \(S_{3}-S_{4}\) and \(B_{3}-B_{4}\) are used as the testing sets separately, while the training set is the same as in the previous set of experiments.

Table 2 Performance comparison of proposed algorithm on TUM GAID database
Table 3 Performance comparison of proposed algorithm on CMU MoBo gait database. The composition of training and testing sequences in each experiment is denoted as Training/Testing

Recognition rate is used as the measure to assess the performance of the gait recognition algorithms. The recognition results achieved by our algorithm and their comparison with the state-of-the-art techniques [4,5,6, 17, 56] are outlined in Table 2. The statistics show that in the first set of experiments, N, S and B, the proposed algorithm outperforms all competing methods, achieving recognition accuracies of 99.7%, 99.7% and 100%, respectively. In the next three experiments, TN, TS and TB, our algorithm performs best on the TB experiment, achieving 71.9% recognition accuracy; on the TN and TS experiments, PFM [5] and CNN-SVM [6], respectively, perform better than our algorithm. Overall, our algorithm achieves the best average recognition accuracy of 96.5%.

3.2 Evaluation on CMU MoBo database

The CMU MoBo database [15] contains gait videos of 25 subjects walking on a treadmill. The database comprises four variations of walk: slow walk (S), walk with a ball in hands (B), fast walk (F) and walk at a certain slope (I). The sequences are recorded from six different viewing angles. The gait sequences recorded in the lateral view are used to assess the ability of the proposed algorithm to deal with variations in walking surface (i.e., incline), speed and carrying condition. To increase the number of gait sequences, we split each sequence into three equal parts, similar to [59]. Two types of experiments are conducted: (1) the same walking style is used in the training and testing sets, and (2) different walking styles are used in the training and testing sets. The recognition accuracies obtained by the proposed algorithm and their comparison with competing methods are listed in Table 3. The proposed algorithm achieved excellent results, outperforming all competing methods in sixteen experiments and achieving the highest average recognition accuracy of 98.5%.

Table 4 Performance comparison of proposed algorithm on NLPR gait database

3.3 Evaluation on NLPR gait database

The NLPR gait database [55] comprises the walk sequences of 20 subjects. Each subject has four walk sequences captured from three different viewing angles: \(0{^\circ }\), \(45{^\circ }\) and \(90{^\circ }\). Each subject walks twice, from right to left and from left to right. The performance of the proposed algorithm is evaluated on the gait sequences recorded at the \(0{^\circ }\) viewing angle. The first three recordings of each subject are used as the training set, and the fourth recording is used as the testing set. The recognition result obtained by the proposed algorithm on the NLPR database and its comparison with other competing techniques are reported in Table 4. The proposed algorithm, along with [14, 20, 21], achieves a 100% recognition rate on the NLPR gait database.

Table 5 Performance comparison of proposed algorithm on CASIA-B gait database

3.4 Evaluation on CASIA-B gait database

CASIA-B [58] is another large gait database, comprising the gait sequences of 124 subjects captured from 11 different viewing angles. Each subject has ten gait sequences with three variations: six sequences of normal walk (nm), two of walk wearing a coat (cl) and two of walk with a bag (bg). The sequences recorded in the lateral view are used in the evaluation, and three experiments, namely nm, cl and bg, are conducted. In all experiments, the first four normal walk sequences of all 124 subjects are placed in the training set, and the remaining two nm, cl and bg sequences are used as the testing sets, separately. The recognition accuracies obtained by the proposed algorithm and their comparison with competing methods are presented in Table 5. The results reveal that the proposed algorithm outperforms the existing methods in nm and bg, while SDL [59] performs slightly better than our method in cl. Our method achieves the highest average recognition accuracy of 96.6%.

3.5 Evaluation on CASIA-C database

The CASIA-C database [46] comprises the gait videos of 153 subjects in four walking scenarios: normal walk (fn), fast walk (fq), slow walk (fs) and walk with a backpack (fb). The database contains low-resolution videos recorded at night with a thermal imaging camera. Each subject has four recordings of fn and two recordings each of fq, fs and fb. This database aims at evaluating the performance of gait recognition methods under variations in walking speed, carrying condition and illumination. Four experiments, namely fn, fq, fs and fb, are conducted. In all experiments, the first three fn walk sequences of all 153 subjects are placed in the training set. In the first experiment, the remaining fourth fn sequence is placed in the testing set, while the fq, fs and fb sequences are used as the testing sets, separately, in the next three experiments. The recognition results obtained by the proposed algorithm and the competing methods are reported in Table 6. The results demonstrate that our method outperforms the existing methods in most experiments and obtains the highest average recognition accuracy of 99.8%.

One can conclude from the results presented in the previous sections that the proposed gait recognition algorithm performs consistently better than many state-of-the-art gait recognition methods and that it is robust to changes in walk caused by variations in clothing, shoes, backpacks, bags, jackets, and illumination. The source code of the proposed gait recognition algorithm is publicly released so that the results reported in this paper can be reproduced; the source code, along with a sample database and computed features, is available online.Footnote 1

Table 6 Performance comparison of proposed algorithm on CASIA-C gait database

4 Conclusions

A new gait representation based on the spatiotemporal characteristics of human motion is presented in this paper. Unlike most existing methods, which require the segmented silhouette of the human body region to compute features for gait recognition, the proposed method does not require such information and operates directly on the video sequences. It tracks a set of points across successive frames to compute dense trajectories, which specify spatiotemporal patches in which local descriptors are computed to capture the person's static appearance and motion information. The local descriptors are encoded using Fisher vector encoding, and classification is performed with a linear SVM. A rigorous experimental evaluation on five large, well-known benchmark gait databases showed that the proposed algorithm is highly accurate. In the future, we plan to extend the proposed method to view-invariant gait recognition and to explore probabilistic modeling techniques to track individuals even under occlusion.