1 Introduction

Technology has invaded our lives as never before, and the effectiveness of security systems has become increasingly important. The development of automatic personal identification systems has accelerated in recent years, and a worldwide effort has been devoted to broadening and enhancing them. In particular, biometric recognition has become an area of intense interest and is used in numerous applications. Biometric recognition aims to identify individuals using unique, reliable and stable physiological and/or behavioral characteristics such as the fingerprint, palmprint, face and gait. Gait recognition consists in discriminating among people by the manner in which they walk. Gait as a biometric trait can be seen as advantageous over other biometric identification techniques for the following reasons:

  • The gait of a walking person can be extracted and analyzed at a distance, without any contact with the sensor.

  • The images used in gait recognition can easily be provided by low-resolution video-surveillance cameras.

Gait recognition techniques can be classified into two main categories: model-based and model-free approaches. A model-based approach [1, 2] models the person's body structure and uses the estimation over time of static body parameters (e.g., trajectories, limb lengths) for the recognition task. This process is usually computationally intensive, since one needs to model and track the subject's body. A model-free approach, on the other hand, does not recover a structural model of the human motion; instead, it uses features extracted from the motion or shape for recognition. Compared with a model-based approach, the model-free approach is less computationally intensive, and its use of dynamic information yields much better recognition performance than a static counterpart [3]. These reasons have motivated researchers to introduce new feature representations in the model-free context. The major challenges of model-free gait recognition methods stem from various covariates such as shadows, clothing variations and carrying conditions (backpack, briefcase, handbag, etc.). From a technical point of view, the segmentation process and the viewing dependency are further causes of gait recognition errors. This has motivated the work presented in this paper, which aims to mitigate the effect of the covariates and hence improve recognition performance. In the present work, we introduce a wrapper feature selection algorithm combined with a modified phase-only correlation (MPOC) matching method. MPOC is an improved version of the phase-only correlation (POC) matching algorithm that uses a band-pass-type spectral weighting function to achieve superior performance. It is an effective and efficient method for matching images with little texture, and it has been successfully applied to partial shoeprint classification [4] and image registration [5].

The rest of this paper is organized as follows: Sect. 2 summarizes the previous works. Section 3 gives the theoretical description of the proposed method. Section 4 presents the experimental results. Section 5 offers our conclusion.

2 Related works

There exists a considerable amount of work on model-free approaches for gait recognition. BenAbdelkader et al. [6] introduced a self-similarity representation to measure the similarity between pairs of silhouettes. Collins et al. [7] proposed a template-based silhouette matching in some key frames. Recent trends seem to favor the gait energy image (GEI) representation suggested by Han and Bhanu [8]. GEI is a spatio-temporal representation of the gait obtained by averaging the silhouettes over a gait cycle. This representation has already been used in several state-of-the-art works [9–12].

It has been found that differences in clothing and carrying conditions between the gallery and probe sequences degrade recognition performance [8, 13]. Several works have been proposed to overcome the limitations of the GEI representation. Bashir et al. [14] introduced a gait feature selection method referred to as the gait entropy image (GEnI). It consists of computing the Shannon entropy of each pixel over a gait cycle; in other words, it aims to distinguish the static and dynamic pixels of the GEI. In this case, GEnI represents a measure of feature significance (pixels with high entropy, which correspond to dynamic parts, are robust against appearance changes). In the same context, Bashir et al. [15] suggested a gait representation called the flow field, a weighted sum of the optical flow along each coordinate direction of the human motion. This representation showed good performance in the presence of covariates. Dupuis et al. [16] introduced an interesting feature selection method based on a Random Forest feature-ranking algorithm.

3 Methodology

Model-free approaches to gait recognition based on feature representations involve a high-dimensional feature space, thus requiring dimensionality reduction techniques. Han and Bhanu [8] suggested a canonical discriminant analysis, which consists of applying principal component analysis (PCA) followed by multiple discriminant analysis (MDA). This technique has become popular in gait recognition applications and has already been used in several works [14, 15]. Its disadvantage stems from the drawbacks of dimensionality reduction performed with linear techniques such as PCA, which make it unsuitable in some situations: the bases returned by PCA must be orthogonal, and as a consequence PCA has a limited ability to model nonlinear structures [17, 18].

In this paper, among all available feature representations, we have chosen the GEI since it is simple to compute, making it an effective compromise between computational cost and recognition performance. As shown in Fig. 1, our framework is divided into two main modules. The first selects the feature subset using a wrapper feature selection algorithm (see Sect. 3.5), together with the parameter of the spectral weighting function (see Sect. 3.3), on a feature selection set independent of the training and testing sets. The second computes the performance of our method (correct classification rate) using the GEI features selected in the first module and the MPOC matching algorithm, whose spectral weighting function parameter is also estimated in the first module.

Fig. 1

Scheme representing modules of our method

3.1 Gait energy image

Human walking is considered a cyclic activity in which the motion repeats at a constant frequency. The GEI is a representation of the human walk as a single grayscale image obtained by averaging the silhouettes extracted over a complete gait cycle [8]. The GEI is computed using the following equation:

$$\begin{aligned} G(x,y)=\frac{1}{N}\sum _{t=1}^{N}B(x,y,t) \end{aligned}$$
(1)

where \(N\) is the number of frames within a complete gait cycle, \(B\) is a silhouette image, \(x\) and \(y\) are the spatial coordinates of the image, and \(t\) is the frame number in the cycle. Pixels with low intensity correspond to the dynamic parts of the body (the lower part); this part of the GEI is very useful for recognition and is not affected by covariates such as carrying and clothing conditions. Pixels with high intensity correspond to the static parts of the body (the top part); this part contains body shape information, which can be useful for identification but can be affected by covariate conditions [14] (Fig. 2).
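Equation (1) amounts to a simple per-pixel average over the cycle. As an illustration, a minimal NumPy sketch (the function name and the stacked-array input convention are our assumptions, not the original implementation):

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Compute a GEI by averaging binary silhouettes over one gait cycle (Eq. 1).

    silhouettes: array of shape (N, H, W) holding the N aligned binary
    silhouettes B(x, y, t) of a complete cycle, with values in {0, 1}.
    Returns a grayscale image of shape (H, W) with values in [0, 1].
    """
    frames = np.asarray(silhouettes, dtype=np.float64)
    # G(x, y) = (1/N) * sum_t B(x, y, t)
    return frames.mean(axis=0)
```

Pixels averaging near 1 mark the static parts of the body, while intermediate and low averages mark the dynamic parts such as the legs.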

Fig. 2

Gait energy image of an individual under different conditions

3.2 Phase-only correlation

Since, in the Fourier domain, the phase preserves more features of a pattern than the magnitude, phase-based image matching is very attractive [19]. This technique has already been used successfully in biometric applications such as palmprint [20], fingerprint [21] and iris [22] recognition. This section defines the phase-only correlation function and its use on GEIs.

Let us consider two images \(f(n,m)\) and \(g(n,m)\) each having a size of \(N_{1}\times N_{2}\) where \(N_{1}=2N+1\) and \(N_{2}=2M+1\) so that the index range of \(n\) and \(m\) is \(-N \cdots N\) and \(-M \cdots M\), respectively. Let \(F(u,v)\) and \(G(u,v)\) denote 2D DFTs of two images, which can be written as follows:

$$\begin{aligned} F(u,v)= & {} \sum _{n=-N}^{N} \sum _{m=-M}^{M} f(n,m)e^{\frac{-2jun\pi }{N_{1}}} e^{\frac{-2jvm\pi }{N_{2}}}\nonumber \\= & {} A_{F}(u,v)e^{j\theta _{F}(u,v)}\end{aligned}$$
(2)
$$\begin{aligned} G(u,v)= & {} \sum _{n=-N}^{N} \sum _{m=-M}^{M} g(n,m)e^{\frac{-2jun\pi }{N_{1}}} e^{\frac{-2jvm\pi }{N_{2}}}\nonumber \\= & {} A_{G}(u,v)e^{j\theta _{G}(u,v)} \end{aligned}$$
(3)

where \(A_{F}(u,v)\), \(A_{G}(u,v)\) and \(\theta _{F}(u,v)\), \(\theta _{G}(u,v)\) are the amplitude and phase components, respectively. The cross-phase spectrum is given by

$$\begin{aligned} R_{FG}(u,v)=\frac{F(u,v)\overline{G(u,v)}}{\left| F(u,v)\overline{G(u,v)} \right| }=e^{j\theta (u,v)} \end{aligned}$$
(4)

where \(\overline{G(u,v)}\) is the complex conjugate of \(G(u,v)\) and \(\theta (u,v)\) denotes the phase difference \(\theta _{F}(u,v)-\theta _{G}(u,v)\). The POC function \(r_{fg}(n,m)\) is the 2D inverse DFT (2D IDFT) of \(R_{FG}(u,v)\) given by

$$\begin{aligned} r_{fg}(n,m)=\frac{1}{N_{1}N_{2}}\sum _{u=-N}^{N} \sum _{v=-M}^{M} R_{FG}(u,v)e^{\frac{2jun\pi }{N_{1}}}e^{\frac{2jvm\pi }{N_{2}}} \end{aligned}$$
(5)

If the matched images \(f(n,m)\) and \(g(n,m)\) are similar, the POC function gives a distinct sharp peak resembling Kronecker's delta function \(\delta (n,m)\). The height of the peak gives the similarity measure for image matching. If the images are not similar, the peak drops significantly (see Fig. 3).
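For concreteness, Eqs. (2)–(5) can be sketched with NumPy's FFT as follows; the small `eps` guarding against a zero cross-spectrum magnitude is our addition:

```python
import numpy as np

def poc(f, g, eps=1e-12):
    """Phase-only correlation surface between two equally sized images.

    The peak height of the returned surface is the similarity score:
    close to 1 for near-identical images, much lower otherwise.
    """
    F = np.fft.fft2(f)
    G = np.fft.fft2(g)
    cross = F * np.conj(G)
    R = cross / (np.abs(cross) + eps)  # cross-phase spectrum, Eq. (4)
    return np.real(np.fft.ifft2(R))    # POC function, Eq. (5)
```

Matching an image against itself yields a delta-like peak of height close to 1 at the origin, while two unrelated images produce a flat, low surface.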

Fig. 3

Example of Phase-Only Correlation matching. a same class, b different class

3.3 Spectral weighting function

Since the high-frequency components have a low reliability (signal-to-noise ratio \(\frac{S}{N}\)), a spectral weighting function is used to attenuate these frequencies and thereby improve the matching. Gueham et al. [4] proposed a band-pass-type spectral weighting function \(W\) that enhances the recognition rate of shoeprints by suppressing high frequencies without affecting the peak sharpness:

$$\begin{aligned} W(u,v)=\left( \frac{u^{2}+v^{2}}{\alpha }\right) e^{-\frac{u^{2}+v^{2}}{2\beta ^{2}}} \end{aligned}$$
(6)

where \(\beta \) is a parameter controlling the width of the function and \(\alpha =4\pi \beta ^{4}\) normalizes the peak between 0 and 1. The function has the shape of a Laplacian of Gaussian (LoG). The weighting function is applied to the cross-phase spectrum, and the resulting modified cross-phase spectrum is given by

$$\begin{aligned} \widetilde{R}_{FG}(u,v)= & {} \frac{F(u,v)\overline{G(u,v)}}{\left| F(u,v)\overline{G(u,v)} \right| }\times W(u,v) \nonumber \\= & {} e^{j\theta (u,v)}\times W(u,v) \end{aligned}$$
(7)

The modified phase-only correlation (MPOC) is given by

$$\begin{aligned} \widetilde{r}_{fg}(n,m)=\frac{1}{N_{1}N_{2}}\sum _{u,v}^{} \widetilde{R}_{FG}(u,v)e^{\frac{2jun\pi }{N_{1}}}e^{\frac{2jvm\pi }{N_{2}}} \end{aligned}$$
(8)

where \(\sum _{u,v}^{}= \sum _{u=-N}^{N} \sum _{v=-M}^{M}\).
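A sketch of Eqs. (6)–(8) built on the POC above; the frequency grid uses `np.fft.fftfreq` so that \(u\) and \(v\) range over the signed integer frequencies, and the `eps` guard against division by zero is again our addition:

```python
import numpy as np

def log_weight(shape, beta):
    """Band-pass (LoG-shaped) spectral weighting function W(u, v) of Eq. (6)."""
    n1, n2 = shape
    u = np.fft.fftfreq(n1) * n1             # signed integer frequencies u
    v = np.fft.fftfreq(n2) * n2             # signed integer frequencies v
    r2 = u[:, None] ** 2 + v[None, :] ** 2  # u^2 + v^2 on the frequency grid
    alpha = 4.0 * np.pi * beta ** 4         # normalization constant of Eq. (6)
    return (r2 / alpha) * np.exp(-r2 / (2.0 * beta ** 2))

def mpoc(f, g, beta=31.0, eps=1e-12):
    """Modified POC: cross-phase spectrum weighted by W before the IDFT (Eq. 8)."""
    F, G = np.fft.fft2(f), np.fft.fft2(g)
    cross = F * np.conj(G)
    R = cross / (np.abs(cross) + eps)
    return np.real(np.fft.ifft2(R * log_weight(f.shape, beta)))
```

The default `beta=31.0` echoes the width selected in Sect. 4.2. Note that \(W(0,0)=0\), so the DC component is suppressed along with the unreliable high frequencies.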

3.4 Proposed recognition algorithm

Let us consider an unknown GEI sample from the probe set \(\{f_{i}\}_{i=1}^{N}\), where \(N\) is the size of the probe set. The algorithm compares this sample to the entire gallery \(\{g_{j}\}_{j=1}^{M}\), where \(M\) is the size of the gallery, and determines the matching score of each pair \((f_{i},g_{j})\). The matching score is the maximum value of the inverse Fourier transform of the cross-phase spectrum. After matching a probe image against all gallery images, the scores are sorted from highest to lowest, and the probe is assigned the identity of the gallery image with the highest score. The correct classification rate (CCR) is the ratio of the number of correctly classified samples to the total number of samples (see Algorithm 1).
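The matching loop just described can be sketched as follows. Here `match` stands in for any correlation function (such as the MPOC of Sect. 3.3) that returns a surface whose maximum is the matching score; the helper names are ours, not the paper's:

```python
import numpy as np

def classify(probe, gallery, gallery_labels, match):
    """Assign the probe the label of the gallery image with the highest peak score."""
    scores = [match(probe, g).max() for g in gallery]
    return gallery_labels[int(np.argmax(scores))]

def ccr(probes, probe_labels, gallery, gallery_labels, match):
    """Correct classification rate: fraction of probes assigned their true label."""
    correct = sum(
        classify(p, gallery, gallery_labels, match) == label
        for p, label in zip(probes, probe_labels)
    )
    return correct / len(probes)
```

This is a plain nearest-neighbour scheme: no training is needed beyond storing the gallery GEIs, which keeps the classifier itself cheap enough to sit inside the wrapper feature selection loop of Sect. 3.5.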

Algorithm 1


3.5 Supervised feature selection

The feature selection process aims to select a subset of relevant features from the initial set, with the main goal of enhancing classification accuracy. There are two families of supervised feature selection: filters and wrappers. A filter approach is independent of the learning algorithm and precedes the classification process (e.g., entropy-based ranking). In a wrapper approach, on the other hand, the classifier itself is used to measure the importance of the features; this approach achieves better performance since it interacts directly with the specific classification method. Because the number of possible feature subsets, \(2^{s}\) (where \(s=w\times h\), with \(w\) the width and \(h\) the height of the initial GEI), makes exhaustive evaluation computationally intractable, a wrapper approach requires a search strategy to explore the feature subsets efficiently.

To make the search strategy efficient, we have reduced the number of features by considering each row as a feature unit. Moreover, the gait of an individual is characterized much more by horizontal than by vertical motion [23], which makes this feature unit well suited to the problem at hand. We divide the GEI into two equal parts (top and bottom). We first remove rows sequentially from the top of the bottom part (the lower part of the GEI contains dynamic information, which is important for recognition [14]). Once the best feature subset of the bottom part is found, we also investigate the top part of the GEI, which may contain some informative features (head shape, neck), by sequentially adding rows from the top of the top part (see Algorithm 2).
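Under the assumption that a callback `evaluate(mask)` returns the cross-validated CCR obtained when only the GEI rows flagged True in `mask` are kept, the two-stage greedy search can be sketched as follows (all names here are our own, and the stopping criterion is a simple best-score sweep, which may differ in detail from the paper's Algorithm 2):

```python
import numpy as np

def _row_mask(height, start, end):
    """Boolean mask keeping the top rows [0, end) and the bottom rows [start, height)."""
    mask = np.zeros(height, dtype=bool)
    mask[:end] = True
    mask[start:] = True
    return mask

def select_rows(height, evaluate):
    """Greedy row-wise feature search over a GEI of the given height."""
    half = height // 2

    # Stage 1: remove rows sequentially from the top of the bottom half.
    best_start, best_ccr = half, evaluate(_row_mask(height, half, 0))
    for start in range(half + 1, height):
        score = evaluate(_row_mask(height, start, 0))
        if score > best_ccr:
            best_start, best_ccr = start, score

    # Stage 2: add rows sequentially from the top of the image.
    best_end = 0
    for end in range(1, half + 1):
        score = evaluate(_row_mask(height, best_start, end))
        if score > best_ccr:
            best_end, best_ccr = end, score

    return _row_mask(height, best_start, best_end), best_ccr
```

With row units the search costs at most \(h\) evaluations per stage instead of exploring \(2^{s}\) pixel subsets.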

Algorithm 2

4 Experiments and results

4.1 Dataset

We have used the CASIA database (dataset B) [13] to evaluate our method. It is a multiview gait database containing 124 subjects captured from 11 different angles, from \(0^{\circ }\) to \(180^{\circ }\). Each subject has six normal walking sequences, two carrying-bag sequences and two wearing-coat sequences. The selection method should not be specific to a particular training set; hence, we applied our feature selection algorithm to a small feature selection set independent of the gallery and probe sets [16, 24]. To create this feature selection set, we randomly selected 24 subjects without replacement, and for each subject three sequences were randomly chosen, corresponding to the three variants (normal, carrying-bag and wearing-coat); as a consequence, our feature selection set contains 72 GEIs (all sequences selected for the feature selection set were removed from the gallery and probe sets). The data partition of the experiments carried out and the content of the CASIA database for the \(90^{\circ }\) view angle are summarized in Tables 1 and 2, respectively.

Table 1 Data partition of carried out experiments under \(90^{\circ }\) view
Table 2 CASIA database content under \(90^{\circ }\) view

4.2 Feature selection

The feature selection algorithm is applied to the feature selection set. The evaluation is performed with a threefold cross-validation scheme (normal, carrying-bag, wearing-coat): two variants are used for training and the left-out variant for testing, using \(64\times 64\) GEIs. From Fig. 4, it can be seen that rows 44–64 of the bottom part, rows 1–15 of the top part and a spectral weighting function width of 31 give the best CCR. This allows us to conclude that both the bottom and top parts of the GEI contribute to the recognition process. It has already been shown that the dynamic part of the legs is the most informative part [14], which is confirmed by our experiments. Our work also shows that the top part of the GEI is discriminative and contains regions that help to improve the CCR, such as the head shape and neck, as shown in Fig. 5.

Fig. 4

Correct classification rates using various feature subsets and spectral weighting function size a bottom part, b top part, c weighting function

Fig. 5

Feature subset selected in our method a top part, b bottom part

4.3 Impact of covariates

In this section, we focus on the impact of clothing and carrying conditions. The experiments were carried out using the features returned by our selection algorithm from the GEIs (see Fig. 5). Table 3 compares the results obtained by our method against those reported for four other existing methods. We carried out the same experiments as the authors of the cited works (\(90^{\circ }\) view angle; gallery vs normal, gallery vs carrying-bag and gallery vs wearing-coat). Our experiments demonstrate that our feature selection algorithm improves recognition performance. The CCR of our method decreases marginally for the normal and carrying-bag walks and increases considerably for the wearing-coat walk when compared with the other existing methods. This demonstrates that our method eliminates those features of the bottom part of the GEI that are considerably affected by the wearing-coat covariate, while these features would otherwise improve performance for normal and carrying-bag walks. Our method thus strikes a good compromise between recognition performances under the different gait conditions, as reflected by its mean CCR, which outperforms the means of the other existing methods.

Table 3 Comparison of CCRs (In percent) from several different algorithms on CASIA database using \(90^{\circ }\) view

4.4 Impact of viewing angle

In this part, we focus on the impact of the viewing angle on recognition performance. To this end, we used the features selected by our method (see Fig. 5) and calculated the performance for different combinations of viewing angles (gallery vs probe). Figure 6 illustrates the final CCR, computed as the mean of the recognition performance under the different conditions (normal, carrying-bag and wearing-coat) shown in Tables 4, 5 and 6, respectively. The best recognition performance lies on the diagonal, which means that our method performs best when the viewing angles of the probe and gallery are similar. It can also be noticed that gallery vs probe combinations close to the \(90^{\circ }\) viewing angle give better performance than other combinations.

Table 4 Correct classification performance (%) in normal walking conditions (gallery vs probe)
Table 5 Correct classification performance (%) in carrying-bag conditions (gallery vs probe)
Table 6 Correct classification performance (%) in wearing-coat conditions (gallery vs probe)
Fig. 6

Final CCR using different view angles (gallery vs probe)

From the experiments carried out, it can be concluded that the proposed method is robust against the covariates but sensitive to angle variations. An idea that can be exploited in the future to make our method robust against viewing angle variations is to estimate the pose with a classifier trained on data from 11 classes corresponding to the different viewing angles from \(0^{\circ }\) to \(180^{\circ }\). Once the viewing angle of the probe sample to be classified is estimated, one can match it against the gallery samples with the same viewing angle.

5 Conclusion

This paper has presented a supervised feature selection method for improved gait recognition. The key idea was to deploy a wrapper feature selection algorithm combined with a modified phase-only correlation matching method. This was achieved by adding a band-pass spectral weighting function to the well-known phase-only correlation matching technique to handle low-texture images, resulting in improved performance. The proposed method achieved a correct classification rate of 81.40% and demonstrated attractive results, especially in the presence of covariates. The results also show that our method is sensitive to viewing angle variations. Future improvements could make it more robust against viewing variations by introducing a pose estimation technique.