1 Introduction

Gait recognition is the identification of individuals based on their walking motions [6]. Its theoretical foundation is the 1964 work by Murray et al. [13], which reported that every person has a unique gait. In general, a human's gait can be perceived at a distance [12], and gait recognition has the advantage of being a noninvasive, noncontact process. Furthermore, gait recognition can be used even when the human subject occupies too few image pixels for other biometrics to be perceivable. For these reasons, gait recognition is an attractive subject for researchers in computer vision and biometrics. Gait recognition approaches can be roughly classified into two types: model-based and silhouette-based [20].

Model-based methods [24] represent the human body or motion by employing explicit models which describe gait dynamics, such as stride dimensions and the kinematics of joint angles. However, model-based approaches are limited by imperfect vision techniques in the body structure/motion modeling and parameter recovery from a walking image sequence. Furthermore, the precision of the model makes the model-based approaches computationally expensive [11].

In contrast, the silhouette-based methods [1, 8, 9, 16, 19] characterize body movement using statistics of walking patterns, which capture both the static and dynamic properties of body shape [10]. In these approaches, the representation method for human gait obviously plays a critical part. Gait energy image (GEI) [8], head-torso-thigh image (HTI) [16], motion silhouette image (MSI) [9], active energy image (AEI) [19], and gait entropy image (GEnI) [1] are general and popular gait representations for silhouette-based methods. GEI is the most popular representation method and is robust against segmental error [8]. HTI is actually a part of GEI and is more robust against variations in walking speed than GEI [16]. MSI is a gray level image which describes the spatiotemporal gait information with high discriminating power [9]. AEI is constructed by accumulating only the active regions of silhouette images. It has the advantage of high quality human silhouettes and sufficient dynamic characteristics [19]. GEnI selects the relevant gait features from each pair of gallery and probe gait sequences by measuring the entropy to perform good recognition [1].

When silhouettes are defective or corrupted, however, the gait representations are deformed, and the recognition performance is seriously degraded. To address this problem, we employ the probabilistic support vector machine (PSVM) [5]. The PSVM can evaluate the quality of silhouette images because it yields probabilities of quality comparable to other methods while retaining the sparseness of the SVM [14]. The PSVM is trained to distinguish good silhouettes from bad ones and assigns a degree of goodness to each silhouette. Based on the outputs of the PSVM, a robust gait representation image is constructed by emphasizing the noise-free silhouettes while suppressing the corrupted ones.

This paper is organized as follows: In Section 2, we provide some background about various gait representations and PSVM. In Section 3, the PSVM-based image-weighting method is presented. In Section 4, the proposed method is applied to the CASIA and SOTON databases to illustrate its performance. Conclusions are drawn in Section 5.

2 Background

2.1 Gait Energy Image (GEI)

GEI is an effective representation scheme with good discriminating power and robustness against segmental errors [8]. Given the preprocessed binary gait silhouette images \( f_t(x, y) \) at time t in a sequence, GEI is computed by

$$ GEI\left( {x,y} \right) = \frac{1}{N}\sum\limits_{{t = 1}}^N {{{f}_t}\left( {x,y} \right)}, $$
(1)

where N is the number of frames in the complete gait sequence, and x and y are values in the image coordinates. Figure 1 shows some examples of GEI. In comparison with the binary silhouette sequence, GEI saves both storage space and computation time for recognition and is less sensitive to noise in individual silhouette images. When a silhouette is deformed, however, even GEI exhibits a degraded performance.
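As a minimal sketch of Eq. (1), the pixel-wise average can be written in a few lines of plain Python. The function name `gait_energy_image` and the toy 2×3 frames are illustrative assumptions, not part of the original method description; silhouettes are assumed to be nested lists of 0/1 values.

```python
def gait_energy_image(frames):
    """Average binary silhouettes f_t(x, y) over t = 1..N, as in Eq. (1)."""
    n = len(frames)
    if n == 0:
        raise ValueError("need at least one frame")
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(cols)]
            for y in range(rows)]


# Three toy 2x3 silhouettes (1 = body pixel, 0 = background); illustrative only.
frames = [
    [[0, 1, 0], [1, 1, 0]],
    [[0, 1, 0], [0, 1, 1]],
    [[0, 1, 0], [1, 1, 0]],
]
gei = gait_energy_image(frames)
```

Pixels that belong to the body in every frame take the value 1, while pixels that are only occasionally covered take fractional values, which is why a single corrupted frame shifts the GEI only by 1/N per pixel.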

Fig. 1 Examples of GEI

2.2 Head-Torso-Thigh Image (HTI)

HTI [16], proposed by Tan et al., is a gait representation that keeps only the head, torso, and thigh region of the GEI in order to focus on the more stable part of the body, which moves less during walking. Furthermore, people are often identified by the profile of the head and torso. Based on these observations, Tan et al. [16] use only the head-torso-thigh part of the human silhouette to represent gait and refer to the result as HTI.

HTI can be viewed to some extent as a first-order-statistic-based description of human gait from the structural perspective. HTI is defined as

$$ HTI\left( {x,y} \right) = \frac{1}{N}\sum\limits_{{t = 1}}^N {{{H}_t}\left( {x,y} \right)}, $$
(2)

where \( H_t(x, y) \) is obtained by removing part of the crura from \( f_t(x, y) \), and N is the number of frames in one sequence. Figure 2 shows some examples of HTI.

Fig. 2 Examples of HTI

2.3 Motion Silhouette Image (MSI)

MSI [9] is a gray level image which retains the critical spatiotemporal information. The intensity of each pixel in an MSI is a function of its temporal history. MSI has high discriminating power and remains the critical information source for gait recognition. MSI is obtained by the following:

$$ MSI_t(x,y) = \begin{cases} 255 & \text{if } f_t(x,y) = 1 \\ \max\left( 0, MSI_{t-1}(x,y) - 1 \right) & \text{otherwise.} \end{cases} $$
(3)

Figure 3 shows examples of MSI.
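The recurrence in Eq. (3) can be sketched as follows. The function name `motion_silhouette_images` is hypothetical, and the initial condition \( MSI_0(x, y) = 0 \) is an assumption, since the text does not state how the recursion starts.

```python
def motion_silhouette_images(frames):
    """Apply Eq. (3) frame by frame and return the MSI after every frame.

    Assumption: MSI_0 is all zeros (the text does not give an initial value).
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    prev = [[0] * cols for _ in range(rows)]
    history = []
    for f in frames:
        cur = [[255 if f[y][x] == 1 else max(0, prev[y][x] - 1)
                for x in range(cols)]
               for y in range(rows)]
        history.append(cur)
        prev = cur
    return history


# A single body pixel moving left to right across a 1x3 image (toy data).
frames = [[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]]]
msi = motion_silhouette_images(frames)
```

In this toy sequence the most recently occupied pixel is brightest and older positions fade by one gray level per frame, which is how the MSI encodes temporal history in intensity.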

Fig. 3 Examples of MSI

2.4 Active Energy Image (AEI)

Existing gait representation methods usually have defects related to both the low quality of human silhouettes and insufficient dynamic characteristics. AEI [19] is constructed by accumulating only the active regions of silhouette images in order to make up for those defects. Given a binary gait silhouette image \( f_t(x, y) \), we can calculate the difference image between two adjacent silhouettes as follows:

$$ D_t(x,y) = \begin{cases} f_t(x,y) & t = 1 \\ \left\| f_t(x,y) - f_{t-1}(x,y) \right\| & t > 1 \end{cases}, $$
(4)

where ║·║ is the Euclidean norm and \( D_t(x, y) \) is the difference between \( f_t(x, y) \) and \( f_{t-1}(x, y) \), i.e., the active region at time t. The difference image thus extracts the dynamic parts of the moving body. Accumulating the difference images, we obtain the AEI as

$$ AEI\left( {x,y} \right) = \frac{1}{N}\sum\limits_{{t = 1}}^N {{{D}_t}\left( {x,y} \right)} . $$
(5)

Figure 4 shows some examples of AEI.
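Equations (4) and (5) can be illustrated with a short sketch. For scalar binary pixels the Euclidean norm reduces to an absolute difference, which is what the code below uses; the function name `active_energy_image` and the toy frames are assumptions made for illustration.

```python
def active_energy_image(frames):
    """Accumulate difference images D_t (Eq. 4) and average them (Eq. 5)."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    acc = [[0.0] * cols for _ in range(rows)]
    for t, f in enumerate(frames):
        for y in range(rows):
            for x in range(cols):
                if t == 0:
                    d = f[y][x]                               # D_1 = f_1
                else:
                    d = abs(f[y][x] - frames[t - 1][y][x])    # |f_t - f_{t-1}|
                acc[y][x] += d
    return [[v / n for v in row] for row in acc]


# Toy 1x3 sequence: the body shifts right once, then stays still.
frames = [[[1, 1, 0]], [[0, 1, 1]], [[0, 1, 1]]]
aei = active_energy_image(frames)
```

Only the pixels that change between consecutive frames contribute after the first frame, so static regions are suppressed relative to the GEI.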

Fig. 4 Examples of AEI

2.5 Gait Entropy Image (GEnI)

GEnI [1] is proposed to distinguish the dynamic and static areas of a GEI by measuring the Shannon entropy at each pixel location. The gait cycle consists of a sequence of human silhouettes, and the intensity value of the silhouettes at a fixed pixel location can be considered as a discrete random variable. The Shannon entropy measures the uncertainty associated with the random variable over a complete gait cycle. To obtain the information content of the gait sequence proportional to the entropy value, GEnI is defined as

$$ {\text{GEnI}}\left( {x,y} \right) = - \sum\limits_{{k = 0}}^K {{{p}_k}\left( {x,y} \right){{{\log }}_2}\,{{p}_k}\left( {x,y} \right)}, $$
(6)

where \( p_k(x, y) \) is the probability that a pixel (x, y) has the value k. Since \( f_t(x, y) \) is a binary gait silhouette image with values of either 0 or 1, we set K = 1. The probability that the pixel has a value of 1 is \( {{p}_1}\left( {x,y} \right) = \frac{1}{N}\sum\limits_{{t = 1}}^N {{{f}_t}\left( {x,y} \right)} \) (i.e., the GEI), and the probability that the pixel has a value of 0 is \( {{p}_0}\left( {x,y} \right) = 1 - {{p}_1}\left( {x,y} \right) \). Figure 5 shows some examples of GEnI.
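Because \( p_1 \) is the GEI value at each pixel, Eq. (6) with K = 1 reduces to the binary entropy of the GEI. The sketch below assumes the GEI is already available as a nested list of values in [0, 1] and uses the usual convention 0 · log 0 = 0; the function name `gait_entropy_image` is hypothetical.

```python
import math


def gait_entropy_image(gei):
    """Binary Shannon entropy per pixel (Eq. 6 with K = 1), where p_1 = GEI."""
    def entropy(p1):
        h = 0.0
        for p in (p1, 1.0 - p1):
            if p > 0.0:                 # convention: 0 * log2(0) = 0
                h -= p * math.log2(p)
        return h
    return [[entropy(p1) for p1 in row] for row in gei]


# Toy GEI row: always background, half the time body, always body, rarely body.
geni = gait_entropy_image([[0.0, 0.5, 1.0, 0.25]])
```

Static pixels (GEI of 0 or 1) receive zero entropy, while pixels that are covered in about half of the frames receive the maximum value of 1 bit, so the GEnI highlights the dynamic areas.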

Fig. 5 Examples of GEnI

2.6 Probabilistic Support Vector Machine (PSVM)

A posterior probability is very important in practical classification problems. The SVM [17], however, produces only a binary decision value, not a posterior probability. The PSVM [5] maps the output of the SVM to the interval [0, 1] using a sigmoid function

$$ P\left( {\omega = \left\{ { + 1, - 1} \right\}|f} \right) = \frac{1}{{1 + \exp \left( { - d(f)} \right)}} $$
(7)

and determines the probability that a data point belongs to a class, where f is a data point, ω ∈ {+1, −1} denotes its binary class, and d(f) is the unthresholded output of the SVM, defined as

$$ d(f) = {{w}^{\text{T}}} \cdot f + b, $$
(8)

where w and b denote a weight for f and the bias, respectively, in the SVM.
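Equations (7) and (8) can be sketched for a linear SVM whose weight vector and bias are already known. The values of `w`, `b`, and the feature vectors below are made-up numbers for illustration, and the numerically stable form of the sigmoid is an implementation choice rather than something prescribed by the text.

```python
import math


def decision_value(w, f, b):
    """Unthresholded linear SVM output d(f) = w^T f + b, as in Eq. (8)."""
    return sum(wi * fi for wi, fi in zip(w, f)) + b


def posterior(d):
    """Sigmoid mapping of d(f) to [0, 1], as in Eq. (7), in a stable form."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


# Hypothetical SVM parameters and two flattened feature vectors.
w, b = [1.0, -1.0], -1.0
p_boundary = posterior(decision_value(w, [2.0, 1.0], b))   # d = 0
p_positive = posterior(decision_value(w, [5.0, 1.0], b))   # d = 3
```

A point on the decision boundary maps to 0.5, and the probability increases monotonically with the signed distance from the boundary.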

3 Probabilistic image weighting scheme

In this section, we describe a new image-weighting scheme for robust silhouette-based gait recognition. An SVM classifier is trained to distinguish noise-free images from corrupted ones. The logistic sigmoid function is then employed to map the output of the SVM classifier to a probability interval [0, 1], assigning the probability to each test input image as a weight. Finally, a new gait representation is constructed using the previous methods such as GEI, HTI, MSI, etc., with the only exception being that the weights computed by the PSVM are used to represent the “goodness” of the silhouettes. For this reason, our methods can be combined with various silhouette-based gait-recognition systems to suppress the silhouette noise and improve the quality of the gait features. Figure 6 shows an overview of the proposed method.

Fig. 6 Overview of the proposed method

3.1 Construction of the decision boundary

Both noise-free and corrupted images are used to train the SVM. For corrupted images, two kinds of rectangular noise are used: one is a black box noise, as shown in Fig. 7(b), and the other is a complement box noise which inverts the original color, as shown in Fig. 7(c). SVM is trained to separate the noise-free gait silhouettes from the corrupted ones. After training, when a new silhouette f is presented, the probability that the silhouette is noise-free is computed by

$$ P\left( {\omega = \left\{ {{\text{noise - free}}} \right\}|f} \right) = \frac{1}{{1 + \exp \left( { - d(f)} \right)}} = \frac{1}{{1 + \exp \left( { - {{w}^{\text{T}}} \cdot f - b} \right)}}, $$
(9)

where w and b denote the weight for f and the bias in the SVM, respectively.
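The two kinds of rectangular corruption used for training can be sketched as below. The box position and size parameters (`top`, `left`, `height`, `width`) are hypothetical, since the text does not say how the boxes are placed, and the sketch assumes that "black" corresponds to the background value 0.

```python
def add_box_noise(silhouette, top, left, height, width, mode):
    """Return a corrupted copy of a binary silhouette.

    mode == "black":      pixels inside the box are set to 0 (assumed black).
    mode == "complement": pixels inside the box are inverted (0 <-> 1).
    """
    if mode not in ("black", "complement"):
        raise ValueError("mode must be 'black' or 'complement'")
    noisy = [row[:] for row in silhouette]          # do not modify the input
    for y in range(top, min(top + height, len(noisy))):
        for x in range(left, min(left + width, len(noisy[0]))):
            noisy[y][x] = 0 if mode == "black" else 1 - noisy[y][x]
    return noisy


# Toy 3x3 silhouette with a 2x2 box in the upper-left corner.
sil = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
black = add_box_noise(sil, 0, 0, 2, 2, "black")
comp = add_box_noise(sil, 0, 0, 2, 2, "complement")
```

Silhouettes produced this way would form the negative (corrupted) class, with the untouched silhouettes as the positive (noise-free) class for SVM training.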

Fig. 7 Normal and noisy images: (a) normal image, (b) black box noise, (c) complement box noise

3.2 Probabilistic image weighting

SVM is an empirically optimal discriminant model which provides a deterministic binary decision on an input data point. In this paper, a scaled logistic sigmoid is employed to map the distance between the image and the decision boundary to the goodness of the image. Figure 8 shows the decision boundary of the SVM classifier and the normal or noisy images.

Fig. 8 Decision boundary and normal or noisy images

In Fig. 8, d(f) denotes the distance between the image and the decision boundary and is defined in (8). In addition, ξ is a small positive value used to define the range of the uncertain region in the SVM learning.

As shown in Fig. 8, images near the decision boundary are difficult to classify as normal or noisy, while images at a distance greater than ξ from the decision boundary can be classified reliably. We design the weighting scheme with a sigmoid function so that images that are only marginally classified still contribute to the silhouette-based representation in proportion to their reliability. The image weight is given by

$$ w(f) = \frac{1}{{1 + \exp \left( { - \left( {5/\xi } \right) \cdot d(f)} \right)}}. $$
(10)

Figure 9 shows the weight function. When d(f) > ξ, the silhouette is noise-free, and the full image is used in the gait representation method. Similarly, when d(f) < –ξ, the silhouette is completely corrupted and should be excluded. When ξ ≥ d(f) ≥ –ξ, the silhouette is moderately corrupted and should be weighted according to the degree of corruption. The weighting scheme can then be expressed as

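$$ \tilde{f}(x,y) = w(f) \cdot f(x,y), $$
(11)

where f(x, y) is the original silhouette image, w(f) is the output value of the PSVM, and \( \tilde{f}(x, y) \) is the weighted silhouette image. Finally, the gait representation image is computed from the weighted silhouettes \( \tilde{f}(x, y) \) in place of the original ones. Table 1 shows how the new gait representations are computed using the weighted silhouettes.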

Fig. 9 Sigmoid weight function

Table 1 Preparation of a new gait representation image
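A minimal sketch of the weighting step for the GEI case is given below. It evaluates the scaled sigmoid of Eq. (10) and averages the weighted silhouettes with the 1/N normalization that appears in the weighted GEI of Eq. (26). The decision values are supplied directly as made-up numbers rather than computed by a trained SVM, and the names `image_weight` and `weighted_gei` are hypothetical.

```python
import math


def image_weight(d, xi):
    """Eq. (10): w(f) = 1 / (1 + exp(-(5 / xi) * d(f))), in a stable form."""
    if xi <= 0:
        raise ValueError("xi must be positive")
    z = (5.0 / xi) * d
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_gei(frames, decision_values, xi):
    """Average of w_t * f_t(x, y) over the N frames of a sequence."""
    n = len(frames)
    weights = [image_weight(d, xi) for d in decision_values]
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(w * f[y][x] for w, f in zip(weights, frames)) / n
             for x in range(cols)]
            for y in range(rows)]


# Two toy 1x2 frames: the first is judged clean (d > xi), the second corrupted.
frames = [[[1, 1]], [[1, 0]]]
wgei = weighted_gei(frames, decision_values=[1.0, -1.0], xi=0.1)
```

With the factor 5/ξ, a silhouette exactly at distance ξ receives a weight of about 0.993 and one at −ξ about 0.007, so frames outside the uncertain region are effectively kept or discarded, while frames inside it are blended smoothly.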

3.3 Consideration

In this subsection, the performance of the proposed method is analyzed based on a simplified gait silhouette model. The model used herein is similar to the one in [8]. For the sake of simplicity and easy understanding, a single pixel model is used as a gait silhouette. That is,

$$ b = f + \eta, $$
(12)

where f ∈ {0, 1} and η ∈ {−1, 0, 1} denote the noise-free silhouette and the noise, respectively, and b ∈ {0, 1} denotes the noisy silhouette. Although several representations such as GEI, HTI, MSI, AEI, and GEnI were considered, only GEI is used in the mathematical analysis. First, let us assume that the probabilities of the silhouette pixel f being 0 and 1 are q and (1 − q), respectively:

$$ P(f) = \begin{cases} P(f = 0) = q \\ P(f = 1) = 1 - q. \end{cases} $$
(13)

We further assume that the silhouette pixel f is flipped by the noise η with probability p; thus

$$ P(\eta \mid f) = \begin{cases} P(\eta = -1 \mid f = 1) = p \\ P(\eta = 0 \mid f = 1) = 1 - p \\ P(\eta = 1 \mid f = 0) = p \\ P(\eta = 0 \mid f = 0) = 1 - p. \end{cases} $$
(14)

Using our weighting scheme, (12) can be represented as

$$ wb = wf + w\eta $$
(15)

where w = w(f). For simplicity, if the silhouette image is classified as a normal image (positive), we assign a positive number u to w; otherwise, we assign (1 − u):

$$ w = \begin{cases} u & \text{if } d(f) > 0 \\ 1 - u & \text{otherwise} \end{cases} $$
(16)

where d(f) is the unthresholded output of the SVM. In this case, w can be regarded as a random variable determined by the SVM output, and it satisfies the following distribution:

$$ P(w \mid \eta, f) = \begin{cases}
P(w = u \mid \eta = 0, f = 1) = \alpha \\
P(w = 1 - u \mid \eta = 0, f = 1) = 1 - \alpha \\
P(w = u \mid \eta = -1, f = 1) = 1 - \beta \\
P(w = 1 - u \mid \eta = -1, f = 1) = \beta \\
P(w = u \mid \eta = 0, f = 0) = \alpha \\
P(w = 1 - u \mid \eta = 0, f = 0) = 1 - \alpha \\
P(w = u \mid \eta = 1, f = 0) = 1 - \beta \\
P(w = 1 - u \mid \eta = 1, f = 0) = \beta
\end{cases} $$
(17)

where α and β characterize the performance of the SVM. Using the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), we obtain

$$ \alpha = \frac{{TP}}{{TP + FN}}\quad \left( {1 - \alpha = \frac{{FN}}{{TP + FN}}} \right). $$
(18)

and

$$ \beta = \frac{{TN}}{{TN + FP}}\quad \left( {1 - \beta = \frac{{FP}}{{TN + FP}}} \right). $$
(19)

Therefore, we have

$$ P(wf) = \begin{cases}
P(wf = u) = \alpha (1 - p)(1 - q) & \text{when } \eta = 0,\ f = 1 \\
P(wf = 1 - u) = (1 - \alpha)(1 - p)(1 - q) & \text{when } \eta = 0,\ f = 1 \\
P(wf = u) = (1 - \beta) p (1 - q) & \text{when } \eta = -1,\ f = 1 \\
P(wf = 1 - u) = \beta p (1 - q) & \text{when } \eta = -1,\ f = 1 \\
P(wf = 0) = \alpha (1 - p) q & \text{when } \eta = 0,\ f = 0 \\
P(wf = 0) = (1 - \alpha)(1 - p) q & \text{when } \eta = 0,\ f = 0 \\
P(wf = 0) = (1 - \beta) p q & \text{when } \eta = 1,\ f = 0 \\
P(wf = 0) = \beta p q & \text{when } \eta = 1,\ f = 0
\end{cases} $$
(20)

and

$$ P(w\eta) = \begin{cases}
P(w\eta = 0) = \alpha (1 - p)(1 - q) & \text{when } \eta = 0,\ f = 1 \\
P(w\eta = 0) = (1 - \alpha)(1 - p)(1 - q) & \text{when } \eta = 0,\ f = 1 \\
P(w\eta = -u) = (1 - \beta) p (1 - q) & \text{when } \eta = -1,\ f = 1 \\
P(w\eta = -(1 - u)) = \beta p (1 - q) & \text{when } \eta = -1,\ f = 1 \\
P(w\eta = 0) = \alpha (1 - p) q & \text{when } \eta = 0,\ f = 0 \\
P(w\eta = 0) = (1 - \alpha)(1 - p) q & \text{when } \eta = 0,\ f = 0 \\
P(w\eta = u) = (1 - \beta) p q & \text{when } \eta = 1,\ f = 0 \\
P(w\eta = 1 - u) = \beta p q & \text{when } \eta = 1,\ f = 0
\end{cases} $$
(21)

In order to see the effect of the weight w, the noise-to-signal ratios (NSRs) of the conventional and proposed methods are compared. If the NSR of the proposed method is lower than that of the conventional method, we can conclude that the proposed method reduces the relative effect of noise. Given a walking cycle with N frames, the NSR of the GEI is defined as

$$ \frac{{Var\left( {\frac{1}{N}\sum\limits_{{t = 1}}^N {{{\eta }_t}} } \right)}}{{Var\left( {\frac{1}{N}\sum\limits_{{t = 1}}^N {{{f}_t}} } \right)}} = \frac{{Var\left( \eta \right)}}{{Var(f)}}. $$
(22)

From (13) and (14), Var(η) and Var(f) can be expressed in terms of p and q as

$$ Var\left( \eta \right) = p\left( {1 - p} \right) $$
(23)
$$ Var(f) = q\left( {1 - q} \right) $$
(24)

Thus,

$$ \frac{{Var\left( {\frac{1}{N}\sum\limits_{{t = 1}}^N {{{\eta }_t}} } \right)}}{{Var\left( {\frac{1}{N}\sum\limits_{{t = 1}}^N {{{f}_t}} } \right)}} = \frac{{Var\left( \eta \right)}}{{Var(f)}} = \frac{{p\left( {1 - p} \right)}}{{q\left( {1 - q} \right)}}. $$
(25)

Similarly, the NSR of the GEI combined with the proposed method is defined as

$$ \frac{{Var\left( {\frac{1}{N}\sum\limits_{{t = 1}}^N {{{w}_t}{{\eta }_t}} } \right)}}{{Var\left( {\frac{1}{N}\sum\limits_{{t = 1}}^N {{{w}_t}{{f}_t}} } \right)}} = \frac{{Var\left( {w\eta } \right)}}{{Var\left( {wf} \right)}} = \frac{{E\left( {{{w}^2}{{\eta }^2}} \right) - {{{\left( {E\left( {w\eta } \right)} \right)}}^2}}}{{E\left( {{{w}^2}{{f}^2}} \right) - {{{\left( {E\left( {wf} \right)} \right)}}^2}}}. $$
(26)

From (21), \( E(w\eta) \) and \( E(w^2\eta^2) \) are

$$ \begin{aligned}
E(w\eta) &= -u(1 - \beta)p(1 - q) - (1 - u)\beta p(1 - q) + u(1 - \beta)pq + (1 - u)\beta pq \\
&= u(1 - \beta)p(2q - 1) + (1 - u)\beta p(2q - 1) \\
&= p(2q - 1)\left\{ u(1 - \beta) + (1 - u)\beta \right\}
\end{aligned} $$
(27)

and

$$ \begin{aligned}
E(w^2\eta^2) &= u^2(1 - \beta)p(1 - q) + (1 - u)^2\beta p(1 - q) + u^2(1 - \beta)pq + (1 - u)^2\beta pq \\
&= u^2(1 - \beta)p + (1 - u)^2\beta p \\
&= p\left\{ u^2(1 - \beta) + (1 - u)^2\beta \right\}.
\end{aligned} $$
(28)

In the same way, from (20), \( E(wf) \) and \( E(w^2 f^2) \) are

$$ \begin{aligned}
E(wf) &= u\alpha(1 - p)(1 - q) + (1 - u)(1 - \alpha)(1 - p)(1 - q) \\
&\quad + u(1 - \beta)p(1 - q) + (1 - u)\beta p(1 - q) \\
&= \left\{ u\alpha + (1 - u)(1 - \alpha) \right\}(1 - p)(1 - q) + \left\{ u(1 - \beta) + (1 - u)\beta \right\}p(1 - q)
\end{aligned} $$
(29)

and

$$ \begin{aligned}
E(w^2 f^2) &= u^2\alpha(1 - p)(1 - q) + (1 - u)^2(1 - \alpha)(1 - p)(1 - q) \\
&\quad + u^2(1 - \beta)p(1 - q) + (1 - u)^2\beta p(1 - q) \\
&= \left\{ u^2\alpha + (1 - u)^2(1 - \alpha) \right\}(1 - p)(1 - q) + \left\{ u^2(1 - \beta) + (1 - u)^2\beta \right\}p(1 - q).
\end{aligned} $$
(30)
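The closed forms (27) to (30) can be cross-checked by enumerating the eight joint outcomes of (f, η, w) defined by (13), (14), and (17) and summing the moments directly. The sketch below does this for one arbitrary parameter setting; the function names and the chosen values of p, q, u, α, and β are illustrative assumptions.

```python
def moments_by_enumeration(p, q, u, alpha, beta):
    """Sum over all (f, eta, w) outcomes of the single-pixel model."""
    cases = []                                     # (probability, w, eta, f)
    for f, pf in ((0, q), (1, 1.0 - q)):
        flip = 1 if f == 0 else -1                 # noise that flips the pixel
        for eta, peta in ((0, 1.0 - p), (flip, p)):
            p_u = alpha if eta == 0 else 1.0 - beta   # P(w = u | eta, f), Eq. (17)
            for w, pw in ((u, p_u), (1.0 - u, 1.0 - p_u)):
                cases.append((pf * peta * pw, w, eta, f))

    def expect(g):
        return sum(pr * g(w, eta, f) for pr, w, eta, f in cases)

    return {
        "total": sum(pr for pr, _, _, _ in cases),
        "E_w_eta": expect(lambda w, eta, f: w * eta),
        "E_w2_eta2": expect(lambda w, eta, f: (w * eta) ** 2),
        "E_wf": expect(lambda w, eta, f: w * f),
        "E_w2_f2": expect(lambda w, eta, f: (w * f) ** 2),
    }


def moments_closed_form(p, q, u, alpha, beta):
    """Last lines of Eqs. (27), (28), (29), and (30)."""
    a1 = u * alpha + (1 - u) * (1 - alpha)
    a2 = u ** 2 * alpha + (1 - u) ** 2 * (1 - alpha)
    b1 = u * (1 - beta) + (1 - u) * beta
    b2 = u ** 2 * (1 - beta) + (1 - u) ** 2 * beta
    return {
        "E_w_eta": p * (2 * q - 1) * b1,
        "E_w2_eta2": p * b2,
        "E_wf": a1 * (1 - p) * (1 - q) + b1 * p * (1 - q),
        "E_w2_f2": a2 * (1 - p) * (1 - q) + b2 * p * (1 - q),
    }


params = dict(p=0.3, q=0.4, u=0.9, alpha=0.85, beta=0.8)   # arbitrary values
enumerated = moments_by_enumeration(**params)
closed = moments_closed_form(**params)
```

For these parameters both routes give \( E(w\eta) = -0.0156 \), and the remaining three moments agree in the same way, which is a quick check that the simplifications in (27) to (30) are algebraically consistent with the case tables (20) and (21).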

Finally, the NSRs of the conventional and proposed methods are compared. Let us assume that q = 0.5 and u = 0.9.

  1) Let α = β = 0.9 and let us compare the NSRs while varying the probability p that the silhouette image is contaminated. In Fig. 10, two NSRs are compared. It is worth noting that when the probability of noise p is less than around 0.75, the proposed method exhibits a lower NSR than the GEI. This figure explains why the proposed method outperforms the GEI in terms of recognition accuracy.

    Fig. 10 NSRs of the conventional and proposed methods: α = β = 0.9

  2) Let α = 0.9 and p = 0.7 and let us compare the NSRs while varying the true rejection rate (TRR) β of the SVM. Two NSRs are compared in Fig. 11.

    Fig. 11 NSRs of the conventional and proposed methods: α = 0.9 and p = 0.7

As expected, when TRR β is higher than around 0.83, noisy silhouettes are reliably rejected and the proposed method demonstrates better NSR than the plain GEI. Figures 10 and 11 explain why the proposed method outperforms the GEI in terms of recognition accuracy.
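The two comparisons above can be reproduced numerically from the expressions as written in (25) through (30). The sketch below simply evaluates those formulas; it is not the authors' code, and the function names are hypothetical. The parameter values q = 0.5 and u = 0.9 are the ones assumed in the text.

```python
def nsr_conventional(p, q):
    """NSR of the plain GEI as given in Eq. (25)."""
    return p * (1 - p) / (q * (1 - q))


def nsr_proposed(p, q, u, alpha, beta):
    """NSR of the weighted GEI, Eq. (26), using the moments (27) to (30)."""
    a1 = u * alpha + (1 - u) * (1 - alpha)
    a2 = u ** 2 * alpha + (1 - u) ** 2 * (1 - alpha)
    b1 = u * (1 - beta) + (1 - u) * beta
    b2 = u ** 2 * (1 - beta) + (1 - u) ** 2 * beta
    e_w_eta = p * (2 * q - 1) * b1
    e_w2_eta2 = p * b2
    e_wf = a1 * (1 - p) * (1 - q) + b1 * p * (1 - q)
    e_w2_f2 = a2 * (1 - p) * (1 - q) + b2 * p * (1 - q)
    return (e_w2_eta2 - e_w_eta ** 2) / (e_w2_f2 - e_wf ** 2)


q, u = 0.5, 0.9

# Case 1: alpha = beta = 0.9, sweep the noise probability p.
for p in (0.5, 0.75, 0.8):
    print(p, nsr_conventional(p, q), nsr_proposed(p, q, u, 0.9, 0.9))

# Case 2: alpha = 0.9, p = 0.7, sweep the true rejection rate beta.
for beta in (0.7, 0.83, 0.84, 0.9):
    print(beta, nsr_conventional(0.7, q), nsr_proposed(0.7, q, u, 0.9, beta))
```

Evaluated this way, the proposed NSR is below the conventional one at p = 0.75 and above it at p = 0.8 in the first case, and the crossover in the second case falls between β = 0.83 and β = 0.84, in line with the thresholds quoted for Figs. 10 and 11.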

4 Experimental results

4.1 CASIA database

In this section, we apply the proposed method to the CASIA database [18] and show a significant performance improvement in a noisy environment. This database is widely used to benchmark algorithms in gait recognition, and it is also known as the NLPR gait database. The database includes 20 subjects, each of which has four sequences: two sequences for one walking direction, and two for the reverse walking direction.

For the sake of simplicity, the nearest neighbor (NN) classifier [7] is used in our experiment. We use one sequence as a training set for the PSVM, two sequences as a training set for NN, and the remaining sequence as a testing set to evaluate the performance of the proposed method. The proposed weighting scheme is combined with previous silhouette-based approaches, and their correct classification rates (CCRs) are compared with one another and with those of the original methods. The experiment is repeated while the probability of noise in the gait silhouette database is varied from 0 to 0.4. Three-fold validation is used, and the experimental results of each silhouette-based method are shown in Tables 2, 3, 4, 5 and 6. In the tables, IMP denotes the improved rate and is defined as

$$ {\text{IMP}} = {{C}_p} - {{C}_s}, $$
(31)

where \( C_s \) denotes the CCR of the silhouette-based gait recognition method alone in a noisy environment, and \( C_p \) denotes the CCR when the proposed method is applied to the same method in the same environment.
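The evaluation protocol can be sketched as a nearest neighbor classifier followed by the CCR and IMP computations. The Euclidean distance is an assumption, since the text does not name the metric used with NN, and the gallery, probes, and CCR values below are made-up toy numbers rather than results from the experiments.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor(gallery, probe):
    """gallery: list of (label, flattened representation image)."""
    return min(gallery, key=lambda item: euclidean(item[1], probe))[0]


def ccr(predicted, actual):
    """Correct classification rate in percent."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def improvement(c_p, c_s):
    """IMP = C_p - C_s, as in Eq. (31)."""
    return c_p - c_s


# Toy gallery with two subjects and three probes (illustrative values only).
gallery = [("A", [0.0, 1.0]), ("B", [1.0, 0.0])]
probes = [[0.1, 0.9], [0.8, 0.3], [0.6, 0.5]]
truth = ["A", "B", "A"]
predicted = [nearest_neighbor(gallery, x) for x in probes]
rate = ccr(predicted, truth)
```

Running the same procedure once with conventional representation images and once with weighted ones gives \( C_s \) and \( C_p \), and a negative IMP indicates that the weighting lowered the CCR.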

Table 2 Comparison of performance: GEI
Table 3 Comparison of performance: HTI
Table 4 Comparison of performance: MSI
Table 5 Comparison of performance: AEI
Table 6 Comparison of performance: GEnI

Figure 12 shows the improvement rate according to noise rate for each silhouette-based representation image. From Fig. 12, we can observe that the proposed method lowers the CCR of GEI by 1.66 % and those of HTI and GEnI by 1.67 % at a noise rate of 0 %. When the silhouette images are noisy, however, all silhouette-based methods are improved by the proposed method. As mentioned above, the representation image is a critical part of gait recognition. The proposed method can exclude severely corrupted images and decrease the influence of noisy images when constructing the gait representation image, which explains the improved performance. Figure 13 compares several silhouette-based representations with and without the proposed method in the corrupted CASIA database, where the probability of noise is 0.25.

Fig. 12 Improvement rate according to noise rate

Fig. 13 Silhouette-based representation images using the corrupted dataset (noise rate: 25 %): (a) GEI, (b) HTI, (c) MSI, (d) AEI, (e) GEnI (left: conventional method, right: proposed method)

4.2 SOTON database

To make a generalized statement of our proposed method, we employ the larger SOTON database [15] for the second example. The SOTON database was created by Shutler et al. at the University of Southampton. It consists of more than 100 subjects. Similar to the experiment with the CASIA database, we use one sequence as a training set for the image weighting, two sequences as training for NN, and the other sequence as the testing set. Tables 7, 8, 9, 10 and 11 show the experimental results of each silhouette-based representation image.

Table 7 Comparison of performance: GEI
Table 8 Comparison of performance: HTI
Table 9 Comparison of performance: MSI
Table 10 Comparison of performance: AEI
Table 11 Comparison of performance: GEnI

Figure 14 shows the improvement rate according to noise rate. It can be observed from Tables 7, 8, 9, 10 and 11 that the proposed method degrades the performances of GEI and HTI by 0.29 % at a noise rate of 10 %. Except for these two cases, however, the proposed method shows a significant improvement in performance on the corrupted SOTON database over the conventional silhouette-based gait recognition methods.

Fig. 14 Improvement rate according to noise rate: SOTON database

5 Conclusions

Gait representations are a critical part of silhouette-based recognition. However, corrupted and noisy silhouette images deform these representations and degrade the performance of gait recognition systems. In this paper, we employ the PSVM, which outputs a weight for each individual silhouette image according to its quality, to improve the quality of gait representations. The proposed method is tested with the CASIA and SOTON databases and shows a significant performance improvement over conventional silhouette-based gait recognition methods in a noisy environment, thereby increasing the reliability of the gait recognition system.