
1 Introduction

With the development of information technology, human identification has been widely studied. Recently, human identification based on biological features has become one of the hottest topics in this field, owing to the uniqueness of biological characteristics. Biometric human identification refers to identifying a person using the body's inherent physiological or behavioral characteristics. In past research, the most commonly used physiological features have been fingerprints [1], faces [2], and irises [3]. (1) Fingerprint recognition: a fingerprint consists of the ridges on the frontal skin of the human finger. The starting points, ending points, joining points and bifurcation points of the ridge lines serve as feature points, which differ from person to person. Fingerprint recognition is widely used in many situations, such as unlocking phones and doors. (2) Face recognition: the human face consists of several parts, such as the eyes, nose, mouth, and chin. The geometric description of these parts and the structural relationships between them can be used as important features to identify faces. Face recognition is now used in mobile payment and many other fields. (3) Iris recognition: the human eye consists of the sclera, iris, pupil, lens, retina and other parts. The iris is the annular portion between the black pupil and the white sclera, and it contains many detailed features such as interlaced spots, filaments, coronae, stripes and crypts. All these details are unique to each person, so the iris can be used for identification; iris recognition is often used in situations with high security requirements. In short, all these technologies bring great convenience to our daily life. However, the methods mentioned above rely on the subject's cooperation and are only suitable for short-distance recognition, which makes them vulnerable to masquerading and concealment. To overcome these problems, researchers have turned to human behavioral characteristics, and human gait identification has been proposed.

After several decades of development, human gait identification has produced many mature frameworks, and some common problems in gait recognition have been solved to some extent, such as recognizing people wearing different clothes, carrying different items, or walking under different lighting conditions. However, these problems are usually studied under fixed conditions; one common limitation is that the research is often conducted at a single view angle. In practice, the camera is usually fixed, and it only captures certain gait view angles as a person walks through the monitored area in different directions, which makes it difficult to acquire the overall information of the human gait, especially under parallel walking conditions. To solve this multi-view problem, researchers have proposed several methods. The View Transform Model (VTM), which realizes mutual transformation between different view angles, makes it possible to obtain richer information, and hence the model is widely acknowledged by researchers. Based on the VTM, in this paper we propose an optimized algorithm that converts all other view angles to 90° so as to obtain more gait features.

The rest of the article is organized as follows. In Sect. 2, we summarize the solutions designed for multi-view human gait recognition and analyze their main challenges. In Sect. 3, we introduce the concepts involved in the View Transform Model (VTM) and demonstrate its applicability to video-based human gait recognition. In Sect. 4, after presenting the experimental results, we discuss the prospects of VTM-based multi-view gait recognition, which could be utilized under different scenarios. Finally, we conclude the article and discuss future directions of this research theme in Sect. 5.

2 Technical Background and Related Works

In general, gait recognition has three main steps: (1) gait detection and tracking, (2) gait feature extraction, and (3) gait feature matching; after these three steps, the recognition result can be obtained. As shown in Fig. 1, the captured video sequences containing the probe gait are imported into the gait recognition system, and the human silhouettes are extracted from each frame. After that, we can extract the gait features from the silhouette information. Finally, the extracted features are matched with the gallery gait to recognize the subject. Accordingly, the key topic of gait recognition is how to obtain more distinctive features; for multi-view gait recognition in particular, the key topic becomes how to properly match features extracted from different view angles.

Fig. 1. A typical flow of gait recognition.

After several decades of development, many solutions have been proposed for multi-view gait feature extraction: (1) seeking view-invariant gait characteristics; (2) constructing a 3D gait model with multiple cameras; (3) establishing a view transform model.

For the first method, Liu [4] represented samples from different views as linear combinations of prototypes in the corresponding views, and extracted the coefficients of the feature representation for classification. Kusakunniran [5] extracted a novel view-invariant gait feature based on the Procrustes Mean Shape (PMS) and subsequently measured gait similarity using the Procrustes distance. In general, this kind of method extracts the features common to different view angles, which makes gait recognition possible even when the view-angle span is wide. However, the common features are often insufficient, which leads to relatively poor recognition performance.

For the second method, Kwolek [6] identified a person using motion data obtained by a marker-less 3D motion-tracking algorithm. Wolf [7] presented a deep convolutional neural network using 3D convolutions for multi-view gait recognition, capturing spatio-temporal features. Comparatively, this kind of method can achieve higher recognition accuracy; however, constructing a 3D model requires a heterogeneous layer learning architecture with additional calibration, so the computational complexity of constructing the 3D model is significantly higher than that of other methods.

For the third method, Makihara [8] proposed a method of gait recognition from various view directions using frequency-domain features and a view transformation model. Kusakunniran [9] applied Linear Discriminant Analysis (LDA) to further simplify the computation. Compared with the two methods above, constructing a VTM can achieve state-of-the-art accuracy at a lower cost, and VTM-based approaches have therefore become the mainstream in multi-view gait recognition. Considering these advantages, our proposal in this paper is based on the VTM.

When building a VTM, however, the recognition performance depends largely on the choice of the angle to which the other angles are transformed. Meanwhile, the computational complexity of building the VTM is relatively high. Considering these challenges, we propose an optimized algorithm that selects a better gallery view angle for extracting more gait features and reduces the computational complexity. In this algorithm, we first obtain each person's Gait Energy Image (GEI) from videos containing walking sequences. After that, we transform all other view angles to \( 90^\circ \) to extract more gait features, and then apply Principal Component Analysis (PCA) to reduce the computational complexity. Finally, we measure the Euclidean distance between the gallery GEI and the transformed GEI for recognition.

3 Related Concept and Proposed Algorithm

3.1 Extracting Gait Energy Image (GEI)

The concept of the Gait Energy Image (GEI) was first put forward by Han [10]; it represents gait energy by averaging the normalized gait silhouettes over complete gait periods. As shown in Fig. 2, the higher the brightness of a pixel, the higher the probability of the silhouette appearing at that position.

Fig. 2. A sample of gait energy image (GEI)

To obtain the GEI, we first need to estimate the number of gait periods contained in each gait sequence. As mentioned in [11], the gait period can be determined by calculating the ratio of the height (H) to the width (W) of the person's silhouette over time. Here, we use N to represent the number of periods contained in one gait sequence, and T to represent the number of frames contained in one gait period. We then normalize all silhouettes by rescaling them along both the horizontal and vertical directions to the same width (W) and height (H). The GEI can be obtained by:

$$ GEI(x,y) = \frac{1}{NT}\sum\limits_{n = 1}^{N} \sum\limits_{t = 1}^{T} S_{n,t} (x,y) $$
(1)

where \( S_{n,t} (x,y) \) denotes the value of the pixel at position \( (x,y) \) in the t-th (t = 1, 2, …, T) frame of the n-th (n = 1, 2, …, N) gait period of the sequence.
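As a minimal sketch of formula (1), the GEI computation can be written with NumPy as follows; it assumes the silhouettes have already been segmented, binarized, size-normalized, and grouped into periods, and the function name is illustrative.

```python
import numpy as np

def compute_gei(silhouettes):
    """Compute a Gait Energy Image from normalized binary silhouettes.

    silhouettes: array of shape (N, T, H, W) holding N gait periods of
    T frames each, with pixel values in {0, 1}.
    Returns the (H, W) GEI, i.e. the per-pixel average over all N*T
    frames, as in formula (1).
    """
    s = np.asarray(silhouettes, dtype=np.float64)
    n_periods, n_frames = s.shape[:2]
    return s.sum(axis=(0, 1)) / (n_periods * n_frames)
```

For example, two periods of 30 frames of 128 × 88 silhouettes yield one 128 × 88 GEI, in which bright pixels mark positions the body occupies in most frames.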

3.2 Constructing an Optimized View Transform Model (VTM)

The GEI contains several significant features, including the gait silhouette, gait phase, and gait frequency. However, the richness of these features differs across view angles. As shown in Fig. 3, the GEI at \( 90^\circ \) contains the most gait information. Therefore, we propose to transform all other view angles to \( 90^\circ \) to obtain more distinctive features.

Fig. 3. Comparison between the GEIs of one sample at \( 0^\circ \) and \( 90^\circ \)

Makihara [8] first put forward the concept of the VTM. To construct the VTM, we apply Singular Value Decomposition (SVD), an effective way to extract the principal structure of a matrix. Any matrix A can be factorized in the following form:

$$ A = U\Sigma V^{T} $$
(2)

If the size of A is \( M \times N \), then U is an orthogonal matrix of size \( M \times M \) and V is an orthogonal matrix of size \( N \times N \). \( \Sigma \) is an \( M \times N \) matrix whose off-diagonal elements are all 0; the elements on its diagonal are called the singular values.
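As a quick, self-contained illustration of formula (2), using NumPy's economy-size SVD (so the factor shapes are reduced accordingly):

```python
import numpy as np

A = np.random.rand(6, 4)                       # an arbitrary M x N matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)                             # singular values on the diagonal

# U and V have orthonormal columns, and A is recovered as U * Sigma * V^T.
assert np.allclose(A, U @ Sigma @ Vt)
```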

We create a matrix \( G_{K}^{M} \) with K row blocks and M columns, representing gait data for M individuals at K view angles:

$$ G_{K}^{M} = \begin{bmatrix} g_{1}^{1} & \cdots & g_{1}^{M} \\ \vdots & \ddots & \vdots \\ g_{K}^{1} & \cdots & g_{K}^{M} \end{bmatrix} = USV^{T} = \begin{bmatrix} P_{1} \\ \vdots \\ P_{K} \end{bmatrix} \begin{bmatrix} v^{1} & \cdots & v^{M} \end{bmatrix} $$
(3)

In formula (3), \( g_{k}^{m} \) is a column vector of length \( N_{g} \), representing the GEI feature of the m-th individual at the k-th angle. The size of U is \( KN_{g} \times M \), and the sizes of S and V are both \( M \times M \). \( P = [P_{1} \; \cdots \; P_{K} ]^{T} = US \), where each sub-matrix \( P_{k} \) has size \( N_{g} \times M \). After the singular value decomposition is completed, \( g_{k}^{m} \) can be calculated with the following formulation:

$$ g_{k}^{m} = P_{k} \times v^{m} $$
(4)
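A minimal training sketch of formulas (3)-(4), assuming the per-view GEIs have already been flattened into vectors of length \( N_g \); the function and variable names are illustrative, not from any library.

```python
import numpy as np

def train_vtm(geis):
    """Build a View Transform Model from training GEIs.

    geis: array of shape (K, M, Ng) -- K view angles, M training subjects,
    and flattened GEI vectors of length Ng.
    Returns the per-view blocks P_k (each Ng x M) and Vt (M x M), whose
    m-th column Vt[:, m] is the view-independent vector v^m of subject m,
    so that g_k^m ~ P_blocks[k] @ Vt[:, m], as in formula (4).
    """
    K, M, Ng = geis.shape
    # Stack the per-view GEIs into the (K*Ng) x M matrix G of formula (3):
    # row block k, column m holds g_k^m.
    G = geis.transpose(0, 2, 1).reshape(K * Ng, M)
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    P = U * s                                   # P = US, shape (K*Ng, M)
    P_blocks = [P[k * Ng:(k + 1) * Ng] for k in range(K)]
    return P_blocks, Vt
```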

Then, let \( G_{\theta }^{m} \) denote the probe GEI feature with view \( \theta \) and \( \hat{G}_{(\varphi \leftarrow \theta )}^{m} \) denote the transformed GEI feature with view \( \varphi \). Firstly, the probe GEI feature \( G_{\theta }^{m} \) is used to estimate a point on the joint subspace, \( \hat{G}_{( \leftarrow \theta )}^{m} \), by

$$ \hat{G}_{( \leftarrow \theta )}^{m} = P(\theta )^{ + } G_{\theta }^{m} $$
(5)
where
$$ P(\theta )^{ + } = \left( P(\theta )^{T} P(\theta ) \right)^{ - 1} P(\theta )^{T} $$
(6)

This pseudoinverse yields the least-squares estimate, i.e., the point that minimizes \( \left\| P(\theta )\hat{G}_{( \leftarrow \theta )}^{m} - G_{\theta }^{m} \right\|_{2} \), where \( \left\| \cdot \right\|_{2} \) denotes the L2 norm.

Secondly, the GEI feature \( \hat{G}_{(\varphi \leftarrow \theta )}^{m} \) of the transformed view can be generated by projecting the estimated point back from the joint subspace:

$$ \hat{G}_{(\varphi \leftarrow \theta )}^{m} = P(\varphi )\hat{G}_{( \leftarrow \theta )}^{m} $$
(7)
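The two-step transformation of formulas (5)-(7) then reduces to one least-squares solve followed by a projection. A sketch reusing the P_blocks from the training sketch above (names again illustrative):

```python
import numpy as np

def transform_view(g_probe, P_blocks, theta_idx, phi_idx):
    """Transform a probe GEI observed at view theta to view phi.

    g_probe: flattened probe GEI of length Ng at view theta.
    P_blocks: per-view sub-matrices P_k from the trained VTM.
    """
    # Formulas (5)-(6): applying the pseudoinverse P(theta)^+ is the
    # least-squares estimate of the point on the joint subspace.
    v_hat, *_ = np.linalg.lstsq(P_blocks[theta_idx], g_probe, rcond=None)
    # Formula (7): project that point back into view phi.
    return P_blocks[phi_idx] @ v_hat
```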

3.3 Applying Principal Component Analysis (PCA)

PCA [12] replaces the original n features with a smaller number m of new features, further reducing computational redundancy. Each new feature is a linear combination of the old features. These linear combinations maximize the sample variance and make the m new features mutually uncorrelated. The mapping from the old features to the new ones captures the inherent variability of the data.

The main processes of PCA are:

Suppose there are N gait samples, and the grayscale values of each sample are expressed as a column vector \( x_{i} \) of size \( M \times 1 \); the sample set can then be expressed as \( [x_{1} , x_{2} , \cdots , x_{N} ] \).

The average vector of the sample set is:

$$ \bar{x} = \frac{1}{N}\sum\limits_{i = 1}^{N} {x_{i} } $$
(8)

The covariance matrix of the sample set is:

$$ \Sigma = \frac{1}{N}\sum\limits_{i = 1}^{N} {(x_{i} - \bar{x})} (x_{i} - \bar{x})^{T} $$
(9)

Then, we calculate the eigenvectors and eigenvalues of \( \Sigma \); the eigenvalues can be arranged in descending order \( \lambda_{1} \ge \lambda_{2} \ge \lambda_{3} \ge \cdots \ge \lambda_{N} \). We take the eigenvectors corresponding to the largest eigenvalues to obtain a new, lower-dimensional dataset.
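A minimal PCA sketch following formulas (8)-(9); here the samples are stored as rows rather than columns, and the function name is illustrative. For full-resolution GEI vectors the \( M \times M \) covariance matrix is large; this sketch ignores such efficiency concerns.

```python
import numpy as np

def pca_project(X, m):
    """Project N samples (the rows of X, each of length M) onto the
    m principal components with the largest eigenvalues."""
    X = np.asarray(X, dtype=np.float64)
    x_mean = X.mean(axis=0)                     # formula (8)
    Xc = X - x_mean                             # centered samples
    cov = Xc.T @ Xc / X.shape[0]                # formula (9)
    # eigh returns the eigenvalues of a symmetric matrix in ascending
    # order, so reverse the eigenvectors to get descending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs[:, ::-1][:, :m]                 # top-m eigenvectors
    return Xc @ W, W, x_mean
```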

3.4 Contrasting Gait Similarity

In this paper, the Euclidean distance is adopted to measure the similarity between the gallery GEI (\( G^{i} \)) and the transformed GEI (\( G^{j} \)). The Euclidean distance can be obtained by:

$$ d\left( {G^{i} ,G^{j} } \right) = \sqrt{ \sum\limits_{x} \sum\limits_{y} \left( G^{i} (x,y) - G^{j} (x,y) \right)^{2} } $$
(10)

where \( d\left( G^{i} ,G^{j} \right) \) denotes the Euclidean distance between \( G^{i} \) and \( G^{j} \), and \( (x,y) \) denotes the location of a specific pixel in the GEI. The smaller the Euclidean distance, the more likely the two GEIs belong to the same person.
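Recognition then reduces to a nearest-neighbor search under this distance; a minimal sketch with illustrative names, operating on the (PCA-reduced) GEI vectors:

```python
import numpy as np

def euclidean_distance(g_i, g_j):
    """Formula (10): Euclidean distance between two flattened GEIs."""
    return np.sqrt(np.sum((np.asarray(g_i) - np.asarray(g_j)) ** 2))

def recognize(probe_gei, gallery_geis, gallery_labels):
    """Return the label of the gallery GEI closest to the probe."""
    dists = [euclidean_distance(probe_gei, g) for g in gallery_geis]
    return gallery_labels[int(np.argmin(dists))]
```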

4 Experiment

CASIA-B [13] is an extensively used database collected in 2005. It contains a total of 124 subjects; each subject is recorded from 11 view angles spaced \( 18^\circ \) apart, ranging from \( 0^\circ \) to \( 180^\circ \), with 6 video sequences per subject at each angle. A sketch map of the database is shown in Fig. 4. In this paper, we construct the VTM with 100 subjects from one of the video sequences, and use the other 24 subjects to evaluate the performance of our proposal.

Fig. 4. The sketch map of the database

The overview framework of our proposal is shown in Fig. 5.

Fig. 5. The structure of the algorithm

Figure 6 illustrates, as an example, one subject's GEI transformed from \( 36^\circ \) to \( 90^\circ \), which reflects the performance of the VTM. Table 1 shows the performance of our proposal on the CASIA-B dataset; the first column indicates the probe view angle, and the second column indicates the accuracy of our proposal. Our proposal can significantly increase the gait feature information; in particular, the recognition accuracy is relatively higher when the probe angles are close to \( 90^\circ \).

Fig. 6. Comparison between the estimated GEI obtained by the VTM and the real GEI at \( 90^\circ \)

Table 1. Recognition accuracies with VTM

5 Conclusion and Future Work

Gait identification has gradually become one of the mainstream approaches to biometric identification. Considering its limitations caused by multi-view conditions in practice, we choose \( 90^\circ \) as the gallery gait view to obtain more gait features. To transform other view angles to \( 90^\circ \), we first extract the GEI from the gait silhouettes; then, we construct a view transform model with SVD; and finally, we adopt PCA to further reduce the computational complexity. It can be seen from Table 1 that our method significantly improves multi-view recognition performance.

Multi-view gait recognition algorithms are still being improved continuously. Based on the research we have done, our further work mainly involves the following aspects. Firstly, our proposal is suitable only for the specific angles in the database; we will work on constructing a more general view transformation model that can realize mutual transformation between arbitrary angles. Secondly, real-time recognition is not considered in this proposal, and how to establish a real-time multi-view gait recognition system remains a challenging task.