1 Introduction

In recent years, gait recognition techniques have received wide attention in various fields, such as access control systems, medical diagnostics, security monitoring and criminal investigation [2, 4, 17, 22, 39]. Compared to other biometric recognition methods (such as fingerprints, palm prints, face recognition, etc.), the most important advantage of gait recognition is that it is a contactless, non-invasive and effective human identification method [6, 8, 9, 21]. Moreover, the current high demand for human surveillance and identification systems urges the development of accurate, efficient and robust gait recognition techniques [11, 14].

Conventional gait recognition methods usually consist of four main steps: pre-processing, gait-cycle detection, feature extraction and classification. They can be categorized into two groups: model-free approaches [10, 26, 37] and model-based approaches [30, 31]. Model-based methods model human motion using physiological characteristics: gait dynamics and human kinematics are described with explicit features, such as the stick-figure model [19, 20]. Model-free methods process gait sequences without an explicit body model, usually analyzing the motion characteristics of silhouettes and spatio-temporal shapes. A typical model-free gait recognition method consists of a gait cycle detection algorithm, a set of training data, a feature extractor and a classifier.

In this study, a novel model-free gait recognition method is proposed. Gait features are extracted from GEIs using Gabor wavelets, the (2D)2PCA method is used to reduce the feature matrix dimension, and a multi-class SVM is employed to identify the gait sequences. Experimental results show that the proposed algorithm greatly reduces the feature matrix dimension while improving recognition accuracy. The main contributions of this paper include:

  1) New gait representation. The Gabor-wavelets-based gait feature set is a new time-varying gait representation. Compared to existing methods that use GEIs directly as features, the Gabor-wavelets-based features contain more detailed scale and directional information, which leads to higher classification rates.

  2) Effective dimension reduction algorithm. An effective dimension reduction algorithm is proposed to reduce the gait feature matrix dimension. According to the experimental results, training (2D)2PCA on Dataset B of the CASIA gait database takes about 40 min, whereas training traditional PCA takes 2–3 h.

  3) High gait classification rates. Experimental results show that the proposed gait recognition algorithm is effective and produces much higher gait classification rates than existing methods.

2 Related works

Gait recognition has been an active research topic for the past decade. Huang et al. [12] presented a gait recognition method combining Gabor wavelets and GEI, in which the Gabor wavelets are calculated from the dataset directly; in the present work, by contrast, the Gabor wavelets are extracted from the GEI, and the (2D)2PCA dimension reduction method is employed to reduce the training time. Yoo et al. [36] classified gaits using a back-propagation neural network (BPNN); a sequence of 2D stick-figure models was extracted with topological analyses. Han [10] summarized the GEI as an averaged silhouette image of a gait cycle, and applied it to gait recognition to save storage space and computing time as well as to reduce the noise sensitivity of the silhouette images. Ekinci and Aykut [7] proposed a novel approach for gait recognition using kernel PCA (KPCA). Liu and Tan [16] studied LDA subspaces to extract discriminative information from gait features at various viewing angles. Zeng et al. [38] proposed a deterministic learning (DL) theory with radial basis function (RBF) neural networks for gait recognition. Kejun et al. [15] applied the sparsity preserving projection (SPP) method, originally used in face recognition [23], to gait recognition; they also implemented an extension of SPP named kernel SPP (KSPP). Luo et al. [18] introduced a robust gait recognition system based on canonical correlation analysis (CCA). Tang et al. [24] looked for the common view surface of gait images under significantly different viewing angles. Aggarwal et al. [1] presented a gait recognition framework that utilizes Zernike moment invariants to deal with the clothing and bag-carrying covariates. Chen et al. [3] designed a multimodal fusion framework for face and fingerprint images using a block-based feature-image matrix, which achieves better classification rates with low-dimensional multimodal biometrics.
Islam [13] proposed a wavelet-based feature extraction method for gait recognition, with a template-matching-based approach for classification. Wu et al. [29] studied a human identification system based on deep convolutional neural networks (CNNs); it is recognized as the first gait recognition work based on deep CNNs.

3 Methodology

A typical gait recognition method involves three steps: detection of moving objects, feature extraction and gait recognition. Since moving object detection is a relatively mature technology [32, 33], this study addresses the last two steps. For feature extraction, a novel Gabor-wavelets-based gait feature extraction algorithm is proposed; for gait recognition, a multi-class SVM based algorithm is presented. An overview of the proposed gait recognition framework is shown in Fig. 1.

Fig. 1

System diagram of gait recognition using the proposed method

3.1 Gabor wavelets based gait feature extraction

The gait energy image (GEI) is a cumulative energy map representing a complete gait cycle: the luminance of each pixel reflects the frequency (energy) with which the body occupies that position during the cycle. Given a sequence of gait silhouette images, the GEI is defined as:

$$ G\left( x, y\right)=\frac{1}{N}\sum_{t=1}^N{B}_t\left( x, y\right), $$
(1)

where N is the number of frames in a gait cycle and B_t(x, y) is the silhouette pixel value of frame t at point (x, y). Fig. 2 shows the GEIs of a complete gait cycle; Fig. 3 illustrates three conditions, i.e. normal walking, carrying a bag and wearing a coat, at the 90° viewing angle.
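As a concrete illustration, Eq. (1) amounts to a pixel-wise average of the binary silhouette frames in one cycle. A minimal sketch (NumPy assumed; the function name and the toy 2 × 2 silhouettes are illustrative, not from the paper):

```python
import numpy as np

def gait_energy_image(silhouettes):
    """GEI of Eq. (1): the pixel-wise mean of the N binary
    silhouette frames B_t(x, y) of one gait cycle."""
    frames = np.stack(silhouettes).astype(np.float64)  # shape (N, H, W)
    return frames.mean(axis=0)

# Toy example: a "cycle" of two 2x2 silhouettes
cycle = [np.array([[1, 0], [1, 1]]),
         np.array([[1, 0], [0, 1]])]
gei = gait_energy_image(cycle)  # pixels always on -> 1.0, half on -> 0.5
```

Pixels occupied in every frame get luminance 1, pixels occupied in half the frames get 0.5, which is exactly the frequency interpretation described above.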

Fig. 2

The GEIs of different viewing angles under normal walking

Fig. 3

The GEIs of three conditions under the 90° viewing angle

The Gabor wavelet was introduced by Dennis Gabor in 1946 [5]. The Gabor wavelet effectively extracts the local orientations of images at different scales, which is useful for simulating human perceptive fields and identifying spatial structure in images, and it makes gait recognition less sensitive to external conditions such as lighting, walking direction, speed and posture. The Gabor wavelet is defined as:

$$ {\psi}_{\mu, \nu}(z)=\frac{\left\Vert {k}_{\mu, \nu}\right\Vert }{\sigma^2}\,{e}^{-\frac{{\left\Vert {k}_{\mu, \nu}\right\Vert}^2{\left\Vert z\right\Vert}^2}{2{\sigma}^2}}\left[{e}^{i{k}_{\mu, \nu}\cdot z}-{e}^{-\frac{\sigma^2}{2}}\right], $$
(2)

where μ and ν denote the orientation and scale of the Gabor kernel, and z = (x, y) denotes the image coordinates. The wave vector k_{μ,ν} is defined as follows:

$$ {k}_{\mu, \nu}={k}_{\nu}\,{e}^{i{\varphi}_{\mu}}, $$
(3)

where \( {k}_{\nu}=\frac{k_{\max }}{f^{\nu}} \) and \( {\varphi}_{\mu}=\frac{\pi \mu}{8} \); k_max is the maximal frequency; f is the kernel spacing factor in the frequency domain; and σ is the ratio of the Gaussian window width to the wavelength. Let I(x, y) represent the gray-value distribution of the image; the Gabor convolution is then defined as:

$$ {G}_{\mu, \nu}(z)= I(z)\otimes {\psi}_{\mu, \nu}(z), $$
(4)

where z = (x, y), ‘⊗’ denotes the convolution operator, and G_{μ,ν}(z) is the convolution of the image with the Gabor kernel of orientation μ and scale ν. In this study, Gabor wavelets at five scales and eight orientations are used. The extracted set of Gabor features is defined as follows:

$$ S=\left\{{G}_{\mu, \nu}(z)|\mu \in \left\{0,\dots, 7\right\},\nu \in \left\{0,\dots, 4\right\}\right\}. $$
(5)

According to the convolution theorem, G_{μ,ν}(z) can be computed efficiently with the fast Fourier transform (FFT):

$$ \zeta \left\{{G}_{\mu, \nu}(z)\right\}=\zeta \left\{ I(z)\right\}\zeta \left\{{\psi}_{\mu, \nu}(z)\right\}; $$
(6)
$$ {G}_{\mu, \nu}(z)={\zeta}^{-1}\left\{\zeta \left\{ I(z)\right\}\zeta \left\{{\psi}_{\mu, \nu}(z)\right\}\right\}, $$
(7)

where ζ and ζ−1 denote the Fourier transform and the inverse Fourier transform. Combining Eqs. (4)–(7), G_{μ,ν}(z) can be written as:

$$ {G}_{\mu, \nu}(z)={M}_{\mu, \nu}(z)\cdot \exp \left( i{\theta}_{\mu, \nu}(z)\right), $$
(8)

where M_{μ,ν}(z) and θ_{μ,ν}(z) denote the amplitude and the phase, respectively. The amplitude M_{μ,ν}(z) captures the local energy variations in the image, and the amplitude spectrums of the gait energy images are selected as gait features. The parameters are set to σ = 2π, k_max = π/2 and \( f=\sqrt{2} \). Figure 4 depicts the 40 amplitude spectrums of the Gabor kernel bank (five scales and eight directions).
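The pipeline of Eqs. (2)–(8) can be sketched as follows: build the 8 × 5 kernel bank with the stated parameters (σ = 2π, k_max = π/2, f = √2), convolve in the frequency domain, and keep the magnitudes. The kernel window size and function names are assumptions made for illustration:

```python
import numpy as np

def gabor_kernel(mu, nu, size=33, sigma=2 * np.pi, k_max=np.pi / 2, f=np.sqrt(2)):
    """Gabor kernel of Eq. (2) for orientation mu (0..7) and scale nu (0..4)."""
    k = (k_max / f ** nu) * np.exp(1j * np.pi * mu / 8)   # wave vector, Eq. (3)
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    z = xs + 1j * ys
    k2, z2 = np.abs(k) ** 2, np.abs(z) ** 2
    # Gaussian envelope times the oscillation, minus the DC-compensation term
    return (np.abs(k) / sigma ** 2) * np.exp(-k2 * z2 / (2 * sigma ** 2)) * \
           (np.exp(1j * (np.conj(k) * z).real) - np.exp(-sigma ** 2 / 2))

def gabor_magnitudes(img):
    """Amplitude spectra M_{mu,nu} of Eq. (8); convolution is done in the
    frequency domain as in Eqs. (6)-(7)."""
    H, W = img.shape
    F = np.fft.fft2(img)
    return {(mu, nu): np.abs(np.fft.ifft2(F * np.fft.fft2(gabor_kernel(mu, nu), s=(H, W))))
            for mu in range(8) for nu in range(5)}

img = np.random.default_rng(0).random((64, 64))   # stand-in for a GEI
feats = gabor_magnitudes(img)                     # 40 amplitude maps
```

Applying this to a GEI yields the 40 amplitude maps of Fig. 4, which are then concatenated into the high-dimensional feature matrix passed to (2D)2PCA.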

Fig. 4

Forty amplitude spectrums in the Gabor kernel

3.2 Dimension reduction using (2D)2PCA

The dimension reduction method extracts the most important information from the high-dimensional Gabor wavelet feature space and eliminates correlations between the features. Dimension reduction is an important step in reducing the time complexity of the overall framework. In this study, an extension of conventional principal component analysis is proposed, called (2D)2PCA.

First, a two-dimensional principal component analysis (2DPCA) is introduced and applied to a two-dimensional matrix.

Assume that an image (amplitude spectrum) A has size m × n. An m × d matrix Y is obtained by projecting A onto a matrix Q ∈ R n × d (n ≥ d):

$$ Y= AQ, $$
(9)

where Q is the projection matrix and Y is the feature matrix of image A. In the 2DPCA method, the total scatter J(Q) of the projected samples is used to evaluate the projection matrix Q:

$$ \begin{array}{l} J(Q)= tr\left\{{S}_Q\right\}= tr\left\{ E\left[\left( Y- EY\right){\left( Y- EY\right)}^T\right]\right\}\\ {}= tr\left\{ E\left[\left( AQ- E(AQ)\right){\left( AQ- E(AQ)\right)}^T\right]\right\}\\ {}= tr\left\{{Q}^T E\left[{\left( A- EA\right)}^T\left( A- EA\right)\right] Q\right\}\end{array}. $$
(10)

The image covariance matrix of the training set is defined as:

$$ G= E\left[{\left( A- EA\right)}^T\left( A- EA\right)\right]=\frac{1}{M}\sum_{k=1}^M{\left({A}_k-\overline{A}\right)}^T\left({A}_k-\overline{A}\right), $$
(11)

where A_k is the kth m × n training image and M is the number of training samples. The average image matrix is defined as:

$$ \overline{A}=\frac{1}{M}\sum_{k=1}^M{A}_k. $$
(12)

Using Eq. (11), J(Q) in Eq. (10) can be simplified to:

$$ J(Q)= tr\left({Q}^T GQ\right), $$
(13)

where Q is a column-orthonormal matrix chosen to maximize J(Q). The optimal projection matrix \( \widehat{Q} \) is composed of the d eigenvectors corresponding to the largest nonzero eigenvalues of the covariance matrix G:

$$ \widehat{Q}=\left({q}_1,{q}_2,\dots, {q}_d\right)= \arg \max \left[ J(Q)\right], $$
(14)

where \( {q}_i^T{q}_j=0 \) for i, j ∈ [1, d] and i ≠ j.

The gait features used for the amplitude spectrum classification are extracted by projecting each image onto the columns of \( \widehat{Q} \). For any image A, the projected vectors Y_k are defined as:

$$ {Y}_k= A{q}_k,\kern0.75em k=1,\dots, d. $$
(15)

After projection, the vectors Y_1, Y_2, ..., Y_d are called the principal component vectors. The 2DPCA algorithm stacks the d principal component vectors into an m × d matrix, called the feature matrix.
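The 2DPCA steps above (center the images, form the covariance matrix of Eq. (11), take the top-d eigenvectors) can be sketched as follows; the function name and toy image sizes are illustrative:

```python
import numpy as np

def twod_pca(images, d):
    """Row-direction 2DPCA: returns the n x d projection matrix Q of
    Eq. (14), the eigenvectors of G (Eq. (11)) with largest eigenvalues."""
    A = np.stack([np.asarray(im, dtype=np.float64) for im in images])  # (M, m, n)
    centered = A - A.mean(axis=0)                   # subtract A_bar, Eq. (12)
    # G = (1/M) * sum_k (A_k - A_bar)^T (A_k - A_bar), an n x n matrix
    G = np.einsum('kij,kil->jl', centered, centered) / len(images)
    _, vecs = np.linalg.eigh(G)                     # eigenvalues ascending
    return vecs[:, ::-1][:, :d]                     # columns: top-d eigenvectors

rng = np.random.default_rng(0)
imgs = [rng.random((16, 12)) for _ in range(5)]     # five 16 x 12 "images"
Q = twod_pca(imgs, d=4)                             # 12 x 4 projection matrix
Y = imgs[0] @ Q                                     # feature matrix of Eq. (9)
```

Since G is symmetric, `eigh` returns an orthonormal eigenbasis, so the columns of Q automatically satisfy the orthogonality condition below Eq. (14).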

The (2D)2PCA algorithm applies 2DPCA in two directions (row and column), and the gait features are extracted by combining the two results. In the row direction, A_k and \( \overline{A} \) are written as:

$$ {A}_k={\left[{\left({a}_k^{(1)}\right)}^T,{\left({a}_k^{(2)}\right)}^T,\dots, {\left({a}_k^{(m)}\right)}^T\right]}^T,\kern0.5em \overline{A}={\left[{\left({\overline{a}}^{(1)}\right)}^T,{\left({\overline{a}}^{(2)}\right)}^T,\dots, {\left({\overline{a}}^{\left(\mathrm{m}\right)}\right)}^T\right]}^T, $$
(16)

where a_k^(i) and \( {\overline{a}}^{(i)} \) denote the ith row vectors of A_k and \( \overline{A} \). Eq. (11) can then be rewritten as:

$$ {G}_t=\frac{1}{M}\sum_{k=1}^M\sum_{i=1}^m{\left({a}_k^{(i)}-{\overline{a}}^{(i)}\right)}^T\left({a}_k^{(i)}-{\overline{a}}^{(i)}\right). $$
(17)

The optimal projection matrix Q is obtained from the d eigenvectors of G_t corresponding to its largest eigenvalues. In Eq. (17), G_t is accumulated from outer products of the centered row vectors.

Similarly, in the column direction, A_k and \( \overline{A} \) are written as:

$$ {A}_k=\left[{a}_k^{(1)},{a}_k^{(2)},\dots, {a}_k^{(n)}\right],\kern0.5em \overline{A}=\left[{\overline{a}}^{(1)},{\overline{a}}^{(2)},\dots, {\overline{a}}^{(n)}\right], $$
(18)

where a_k^(j) and \( {\overline{a}}^{(j)} \) denote the jth column vectors of A_k and \( \overline{A} \). The column-direction covariance matrix is:

$$ {G_t}^{\prime}=\frac{1}{M}\sum_{k=1}^M\sum_{j=1}^n\left({a}_k^{(j)}-{\overline{a}}^{(j)}\right){\left({a}_k^{(j)}-{\overline{a}}^{(j)}\right)}^T. $$
(19)

Projecting an m × n image A onto Q yields the m × d matrix Y = AQ. Let Z ∈ R m × q be the projection matrix obtained from the column-direction covariance matrix \( {G_t}^{\prime} \); the q × d feature matrix C of (2D)2PCA is then obtained by:

$$ C={Z}^T AQ. $$
(20)

After projecting each training image A_k (k = 1, 2, ..., M) onto Q and Z, the feature matrices C_k (k = 1, 2, ..., M) of the training images are obtained.
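Putting the two directions together, the full (2D)2PCA reduction of Eq. (20) can be sketched as below; the function name, the choice of d = q = 8 and the random stand-in images are assumptions for illustration:

```python
import numpy as np

def twod2_pca(images, d, q):
    """(2D)^2PCA sketch: row-direction 2DPCA gives Q (n x d, from Eq. (17)),
    column-direction 2DPCA gives Z (m x q, from Eq. (19)); each image is
    reduced to the q x d feature matrix C = Z^T A Q of Eq. (20)."""
    A = np.stack([np.asarray(im, dtype=np.float64) for im in images])
    centered = A - A.mean(axis=0)
    M = len(images)
    G_row = np.einsum('kij,kil->jl', centered, centered) / M  # n x n
    G_col = np.einsum('kij,klj->il', centered, centered) / M  # m x m
    Q = np.linalg.eigh(G_row)[1][:, ::-1][:, :d]              # top-d eigvecs
    Z = np.linalg.eigh(G_col)[1][:, ::-1][:, :q]              # top-q eigvecs
    feats = [Z.T @ a @ Q for a in A]                          # each q x d
    return feats, Q, Z

rng = np.random.default_rng(1)
imgs = [rng.random((64, 64)) for _ in range(6)]               # stand-in spectra
feats, Q, Z = twod2_pca(imgs, d=8, q=8)                       # 8 x 8 per image
```

This is where the efficiency gain comes from: the classifier sees q × d matrices instead of the full m × n feature matrices, consistent with the 64 × 64 vs. 4096 × 4096 comparison reported in Section 4.2.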

To evaluate the efficiency of the proposed (2D)2PCA algorithm, its training time is compared with that of traditional PCA on Dataset B of the CASIA gait database. The hardware includes an Intel Core i5 3.2 GHz CPU and 8 GB of Kingston DDR4 2400 MHz RAM. Python 2.7.3 and OpenCV 2.4.9 are used to implement the PCA and (2D)2PCA algorithms. In Fig. 5, comparing training time against the number of videos, the proposed (2D)2PCA method shows much better efficiency than the traditional PCA method.

Fig. 5

Training time comparison between PCA and (2D)2PCA

3.3 Gait recognition using multi-class SVM

Support vector machine (SVM) is a powerful machine learning technique based on statistical learning theory [25]. In this study, a multi-class SVM based algorithm is implemented for gait recognition.

The traditional SVM is a two-class classifier; and there are two approaches to solve the n-class problem with SVM: the one-against-one approach and the one-against-rest approach [34, 35]. Since the dimension of the selected feature subset is relatively small, the one-against-rest approach is adopted to implement the multi-class SVM. The detailed gait recognition algorithm is described in Algorithm 1.

Algorithm 1 The multi-class SVM based gait recognition algorithm
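A one-against-rest classifier in the spirit of Algorithm 1 can be sketched with scikit-learn; the library choice, the toy data standing in for flattened (2D)2PCA feature matrices, and the γ and C values are all assumptions (the paper tunes γ and C, see Section 3.3):

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Each q x d feature matrix is flattened into a vector before classification;
# gamma and C are placeholder values to be tuned as described below.
clf = OneVsRestClassifier(SVC(kernel='rbf', gamma=0.1, C=10))

# Toy data: three "subjects", 20 well-separated 8-dim feature vectors each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(3 * i, 0.5, size=(20, 8)) for i in range(3)])
y = np.repeat([0, 1, 2], 20)
clf.fit(X, y)

pred = clf.predict(np.full((1, 8), 6.0))   # near the subject-2 cluster
```

One binary RBF-kernel SVM is trained per subject against all others, and a query is assigned to the subject whose SVM gives the largest decision value, matching the one-against-rest scheme chosen above.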

The SVMs established in Step 4 are Gaussian kernel SVM, with Gaussian kernel function:

$$ g\left({x}_1,{x}_2\right)= \exp \left(-\gamma {\left\Vert {x}_1-{x}_2\right\Vert}^2\right). $$
(21)

The objective function is:

$$ \left\{\begin{array}{l}\underset{\alpha}{ \min}\kern1.5em \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n{\alpha}_i{\alpha}_j{y}_i{y}_j g\left({x}_i,{x}_j\right)-\sum_{i=1}^n{\alpha}_i\\ {} s. t.\kern2em \sum_{i=1}^n{\alpha}_i{y}_i=0\\ {}\kern3.5em {\alpha}_i\in \left[0, C\right],\kern1em i=1,2,\dots, n\end{array}\right.. $$
(22)

The parameters γ and C in Eqs. (21) and (22) must be tuned to achieve accurate classification in Algorithm 1. The overall tuning process is shown in Fig. 6: horizontally, C is fixed and γ is varied from 0.1 to 100; vertically, γ is fixed and C is varied from 1 to 10. Note that the optimal combination of the two parameters must be re-tuned for each new training set.
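The grid sweep described above can be sketched as a cross-validated grid search; scikit-learn, the particular grid points within the stated ranges, and the toy data are assumptions:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Grid spanning the ranges stated above: gamma in [0.1, 100], C in [1, 10].
param_grid = {'gamma': [0.1, 1.0, 10.0, 100.0], 'C': [1, 2, 5, 10]}

# Toy training data standing in for the gait feature vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(i, 0.3, size=(15, 4)) for i in range(3)])
y = np.repeat([0, 1, 2], 15)

# Evaluate every (gamma, C) pair by 3-fold cross-validation and keep the best.
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=3)
search.fit(X, y)
best = search.best_params_
```

Because the best pair depends on the training data, this search is repeated for each training set, as the text notes.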

Fig. 6

Parameter tuning process for the multi-class SVM

4 Experiments

The open-access CASIA gait database, provided by the Institute of Automation, Chinese Academy of Sciences, is utilized to verify the effectiveness of the proposed algorithm [40]. The database consists of three sub-datasets, namely Dataset A, Dataset B and Dataset C. Samples of the three sub-datasets are shown in Fig. 7. Dataset A contains 240 image sequences, corresponding to twenty different persons, three walking directions and four different conditions. Dataset B is a large multi-view gait database containing 13,640 gait videos: 124 persons are recorded from 11 viewing angles, from 0° to 180° in steps of 18°. For each viewing angle of each person, 10 gait videos are recorded, including 6 normal videos, 2 with a bag and 2 with a coat. Dataset C is another large-scale gait video collection, captured by infrared cameras at night, with 153 persons recorded in total. Each person walks under four conditions: normal walking, fast walking, slow walking and walking with a bag.

Fig. 7

Examples of CASIA Dataset A, Dataset B and Dataset C

There are four experiments in this section. Experiments 1 and 2 are based on Dataset B; experiment 3 is performed on Dataset A; and experiment 4 is performed on Dataset C. Each experiment is divided into two phases, a training phase and a testing phase. In the training phase, the GEI is calculated for each video sequence, the amplitude spectrums of the Gabor wavelets are calculated as gait features, and (2D)2PCA is applied to reduce the dimension of the feature matrix, yielding the optimal mapping matrix. In the testing phase, each testing gait sequence is converted from the high-dimensional feature matrix to the low-dimensional feature space using the calculated optimal mapping matrix, and the multi-class SVM algorithm is used to classify the gaits.

The correct classification rate (CCR) is employed to measure the classification accuracy:

$$ CCR=\frac{TP+ TN}{TP+ TN+ FP+ FN}\times 100\% $$
(23)

where TP, TN, FP and FN denote the numbers of true positive, true negative, false positive and false negative samples, respectively [27, 28].
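Eq. (23) can be expressed as a one-line helper; the counts below are illustrative numbers, not results from the paper:

```python
def ccr(tp, tn, fp, fn):
    """Correct classification rate of Eq. (23), in percent."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)

# Illustrative counts: 90 TP, 85 TN, 10 FP, 15 FN -> (90 + 85) / 200 = 87.5%
rate = ccr(tp=90, tn=85, fp=10, fn=15)
```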

4.1 Experiment 1

In experiment 1, the first three gait sequences from all normal groups in Dataset B, i.e. no coat and no bag, are selected as the training set, and the remaining three normal gait sequences are used as the testing set. Both the training and testing sample numbers are 372. Seven dimension reduction techniques are compared for the dimension reduction step: conventional PCA, LDA, KPCA [7], KSPP [15], PCA + SPP [23], 2DPCA and (2D)2PCA. The gait recognition steps are the same for all approaches. The CCRs of the seven algorithms in experiment 1 are shown in Table 1; all classification rates are recorded at rank 1. The complete cumulative match characteristic (CMC) curves of the seven algorithms, for ranks 1 to 30, are shown in Fig. 8.

Table 1 The CCRs of seven different algorithms in experiment 1

From Table 1 and Fig. 8, the proposed method achieves the highest classification rates among all approaches. The coefficient matrix is calculated by sparse projection, which preserves the local information of the original data and potentially makes the gait features more distinguishable. Moreover, the CCRs of the traditional methods are lower than those of the hybrid methods; for example, combining PCA and SPP yields higher classification rates than PCA or KPCA alone.

Fig. 8

The CMC curves of seven methods in experiment 1

4.2 Experiment 2

In experiment 2, three normal groups, one group with bags and one group with coats under the 90° viewing angle in Dataset B are selected as the training set, and the other groups in Dataset B are used as the testing set. The CCRs of the same seven algorithms as in experiment 1 (rank = 1) are shown in Table 2.

Table 2 The CCRs of seven different algorithms in experiment 2

Comparing PCA with Gabor + (2D)2PCA, the classification rates of the PCA method at the 18°, 36° and 54° angles are significantly lower than those of the proposed method. Moreover, the training time of (2D)2PCA is about 40 min, compared with 2–3 h for the PCA method: the multi-class SVM processes 64 × 64 feature matrices, whereas the PCA method processes matrices of dimension 4096 × 4096.

Figure 9 shows the CMC curves of the seven algorithms in experiment 2. Since Gabor wavelets effectively extract scaling and directional information, the proposed feature set contains more detailed scale information in all directions than the raw GEI. Therefore, compared to methods using GEIs directly as features, the proposed method produces higher classification rates. The classification rates of PCA + SPP, KSPP and the proposed method are close, but the former two methods use linear programming in the sparse reconstruction process, so their training time is much longer than that of the proposed method. From Table 2 and Fig. 9, the proposed method demonstrates lower complexity, less training time and higher classification rates. The robustness of the proposed algorithm is also demonstrated in two respects:

  1) Stable classification rates under different viewing angles. The classification rates of the proposed algorithm exceed 90% under all 11 viewing angles.

  2) Similar classification rates with and without a carried bag. In gait recognition, the gait sequences with and without a carried bag differ considerably; the proposed method shows high classification rates in both cases.

Fig. 9

The CMC curves of seven different algorithms in experiment 2

4.3 Experiment 3

In experiment 3, two image sequences in Dataset A are selected as the training set, and the remaining sequences are used as the testing set. Figure 10 shows the overall classification rates under different rank numbers for the proposed algorithm, PCA, Tang et al.'s 3-dimensional partial similarity matching (3DPSM) method [24] and Aggarwal et al.'s Zernike moment invariants (ZMI) method [1].

Fig. 10

The CMC curves of four different algorithms in experiment 3

In Fig. 10, for rank numbers below 20, the proposed method achieves a higher classification rate than the other three methods. Figure 10 also illustrates that, compared with the results of experiments 1 and 2, the performance of both the proposed method and the PCA approach is greatly improved. The reason is that Dataset A contains no interference factors such as bags and coats.

4.4 Experiment 4

In experiment 4, half of the image sequences are evenly selected from Dataset C as the training set, and the remaining sequences are used as the testing set. Figure 11 compares the CMC curves of the proposed method, PCA, the 3DPSM method [24] and the ZMI method [1].

Fig. 11

The CMC curves of four different algorithms in experiment 4

From Fig. 11, the classification rate of the proposed method is again higher than those of the other methods for rank numbers below 25. In addition, compared with the results of experiment 3, the overall classification rates decrease because of additional interference factors in Dataset C, such as the poor video quality.

5 Conclusion, discussion and future work

In this study, a novel gait recognition algorithm based on Gabor wavelet features and (2D)2PCA dimension reduction is proposed. Compared to existing gait recognition methods, the proposed algorithm is demonstrated to have lower complexity, less training time, greater robustness and higher classification rates. The GEI preserves important information including walking frequency, contour and phase. The (2D)2PCA algorithm significantly reduces the feature space dimension, saving training time, and a multi-class SVM is implemented to classify the gait sequences. Experimental results show that the proposed method produces higher recognition accuracy than six existing approaches with less computational time. The robustness of the proposed algorithm is demonstrated under various viewing angles and rank numbers.

One limitation of the proposed approach is that the generated GEIs lose some dynamic information, since they are obtained by averaging a series of images. As future work, other feature information, such as geometric and texture information, will be integrated into the GEI to improve its quality.