6.1 Introduction

Gait is one of the most popular human biometrics because it can be captured at a distance from a camera without subject cooperation. In gait recognition, the speed difference between matching pairs plays an important role, and the walking or running mode makes recognition more challenging. Distinctive human traits such as handwriting, voice, and gait have been studied for many years, yielding excellent progress in biometrics [1]. Every individual has a particular way of walking that arises from the coordinated activity of the skeletal muscles and the nervous system. This makes gait a powerful marker for detecting pathological behavior caused by physical injury, aging, or related conditions. These internal and external factors directly influence the movement and activity of the body and result in gait impairment [1]. Biomechanical patterns of human movement are generally speed-dependent; the amplitude of a specific movement usually scales with movement speed (stride velocity is a determining factor in the nature of the gait phases) [1, 2].

In a typical gait examination, patients perform walking trials at their comfortable speed, and their gait patterns are compared against a reference pattern from a normative database. However, the influence of gait speed is generally not accounted for when the gait pattern of pathological individuals is compared with that of healthy ones who do not walk at the same speed [2].

The gait cycle is a repeated pattern consisting of steps and strides. A step begins with the initial contact of one foot and finishes with the initial contact of the other foot. A stride begins with the initial contact of one foot and finishes with the next initial contact of the same foot, and thus comprises two steps. The gait cycle has two main phases: the stance phase, which makes up about 60% of the cycle, and the swing phase, which makes up about 40%. The stance phase is the period during which the foot is in contact with the ground and the weight is borne by that limb. The swing phase is the period during which the reference foot is not in contact with the ground and swings through the air, with the body weight borne by the contralateral limb [1, 3,4,5].

6.2 Related Work

6.2.1 Gait Recognition System

Human authentication based on gait has significant importance and a wide scope of research. The science of recognizing or identifying people by their physical characteristics is known as biometrics, and gait recognition is one form of biometric technology [5]. Gait recognition analyzes the movements of body parts such as the foot, the knee, and the shoulder. Gait can be acquired through various technologies, including cameras, wearable sensors, and non-wearable sensors. Figure 6.1 illustrates the gait recognition system.

Fig. 6.1

Gait recognition system

Any recognition system is built around a training phase and a testing phase, and operates in one of two modes: authentication (verifying a claimed identity) or recognition (identifying a subject). The first gait recognition approach was based on automatic video processing [3, 6, 7]: it generates a mathematical model of motion and is the most common method, requiring the study of video samples of walking subjects, joint trajectories, and joint angles. The second approach uses a radar system to measure the gait period, which is then compared with other samples to perform detection [7].

A potential solution for this issue would be to gather several walking trials at different walking speeds to construct a reference database covering essentially every conceivable gait speed [8,9,10,11]. The time-consuming nature of such data collection, however, would be cost-prohibitive and impractical. To overcome this barrier, researchers have suggested regression techniques as a feasible alternative for predicting gait parameters from experimental data. The prediction data rely purely on the normal, slow, and fast walking speeds of stable subjects, but cover only about a 10% window of the phase period when the complete gait cycle is considered [1, 2, 7, 12, 13].

The gait energy image (GEI) is an enhanced spatiotemporal feature for analyzing a subject's gait and posture characteristics. A normalized binary silhouette sequence is used instead of the raw silhouette sequence to provide robustness. The GEI can be stated as

$$ {\text{GEI}}\left( {u,v} \right) = \frac{1}{{S_{f} }}\mathop \sum \limits_{s = 1}^{{S_{f} }} I_{o} \left( {u,v,s} \right) $$
(6.1)

where Sf denotes the total number of silhouette images in the gait cycle, s is the frame index, and Io(u, v, s) denotes the silhouette frame at instant s. Figure 6.2 shows the GEI extracted from a full gait cycle.
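Equation (6.1) is simply a per-pixel average of the binary silhouette frames over one gait cycle. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Compute the GEI of Eq. (6.1): the per-pixel mean of a sequence
    of binary silhouette frames.

    silhouettes: array of shape (S_f, H, W) with values in {0, 1}.
    Returns an (H, W) float array.
    """
    silhouettes = np.asarray(silhouettes, dtype=np.float64)
    return silhouettes.mean(axis=0)  # (1 / S_f) * sum_s I_o(u, v, s)
```

Bright pixels in the resulting image mark body regions that stay still across the cycle (static shape information), while gray pixels trace the moving limbs (dynamic information).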

Fig. 6.2

Gait energy image extraction

6.3 Proposed Work

6.3.1 Implementation Details of Proposed Model and Model Summary

The CASIA-B gait dataset contains a silhouette sequence for each view angle. Each silhouette sequence is converted into a gait energy image, giving 13,640 images in total: 11 view angles × 10 categories × 124 subjects = 13,640 GEIs. To evaluate the proposed method, all sequences are converted to GEIs, and the resulting set of GEIs is fed to the proposed gait recognition model. Figure 6.3 shows GEIs under the three conditions and 11 view angles.
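The count above can be verified directly; the sequence labels below follow the standard CASIA-B naming convention (nm = normal, bg = bag, cl = coat):

```python
# 11 view angles: 0°, 18°, ..., 180°
view_angles = list(range(0, 181, 18))

# 10 walking categories per subject: 6 normal, 2 with a bag, 2 with a coat
categories = ([f"nm-{i:02d}" for i in range(1, 7)]
              + [f"bg-{i:02d}" for i in range(1, 3)]
              + [f"cl-{i:02d}" for i in range(1, 3)])

subjects = 124
total_geis = subjects * len(categories) * len(view_angles)  # one GEI each
```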

Fig. 6.3

GEI examples from an individual's CASIA B dataset under various conditions from view 0° to 180° with an 18° interval. Top row: regular walking, middle row: coat-wearing, and bottom row: bag-wearing

In the proposed deep learning model for cross-view gait recognition, each stack consists of a 2D convolutional layer followed by batch normalization and max pooling; three such stacks are used in total. Figure 6.4 shows the design of the proposed model, and Table 6.1 summarizes it.
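The intermediate feature-map sizes can be reproduced by propagating the input size through the three stacks. The sketch below assumes "valid" (no-padding) convolutions with stride 1 and non-overlapping 2 × 2 pooling, which matches the 240 → 234 reduction described in Sect. 6.3.1.1:

```python
def out_size(size, kernel, stride=1):
    """Output length of a 'valid' convolution or pooling along one axis."""
    return (size - kernel) // stride + 1

h = w = 240  # GEI input resolution (assumed square)
shapes = []
for _ in range(3):  # three Conv(7x7) -> BN -> MaxPool(2x2) stacks
    h, w = out_size(h, 7), out_size(w, 7)        # convolution
    h, w = out_size(h, 2, 2), out_size(w, 2, 2)  # max pooling
    shapes.append((h, w))
```

Under these assumptions the feature map shrinks per side as 240 → 117 → 55 → 24 across the three stacks; batch normalization leaves the spatial shape unchanged.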

Fig. 6.4

Proposed model

Table 6.1 Summary of proposed model (BN—batch normalization, MP—max pooling, and Conv—convolution layer)

6.3.1.1 Convolutional 2D Layer

A convolutional layer is the part of a deep neural network that takes an input image, assigns importance to the different aspects/objects in the frame through learned weights and biases, and distinguishes one from another. The preprocessing required by a convolutional neural network is much lower than for other classification algorithms. The input image size is 240 × 240 and the convolutional kernel size is 7 × 7. The convolution downsamples the input, so the output of the 2D convolutional layer is 234 × 234 (i.e., 240 − 7 + 1). The output of a convolution layer is the feature map.
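As an illustration, a naive "valid" convolution (no padding, stride 1) shows where the 240 − 7 + 1 = 234 figure comes from; this is a didactic sketch, not the layer implementation used in the model:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D cross-correlation: slide the kernel over every
    position where it fully fits, so an (H, W) input and (K, L) kernel
    yield an (H - K + 1, W - L + 1) feature map."""
    H, W = image.shape
    K, L = kernel.shape
    out = np.empty((H - K + 1, W - L + 1))
    for i in range(H - K + 1):
        for j in range(W - L + 1):
            out[i, j] = np.sum(image[i:i + K, j:j + L] * kernel)
    return out
```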

6.3.1.2 Batch Normalization

Batch normalization helps each network layer learn more independently of the other layers. Higher learning rates can be used because batch normalization ensures that no activation becomes extremely high or extremely low. It also reduces overfitting through a slight regularization effect. To stabilize a neural network, it normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation.
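In training mode, this amounts to the following per-feature computation; a minimal sketch, where gamma and beta stand for the learned scale and shift parameters:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch axis: subtract the batch
    mean, divide by the batch standard deviation, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # ~zero mean, unit variance
    return gamma * x_hat + beta
```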

6.3.1.3 Pooling 2D

An issue with feature maps is that they are sensitive to the position of features in the input. One approach to address this sensitivity is to downsample the feature maps. Pooling layers downsample a feature map by summarizing the features within each kernel window. We use max pooling, which produces pooled feature maps highlighting the most prominent feature in each window. The kernel size is 2 × 2.
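Non-overlapping 2 × 2 max pooling halves each spatial dimension while keeping the strongest response in each window; a compact NumPy sketch (assuming even input dimensions):

```python
import numpy as np

def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling; fmap dimensions must be even.
    Each output pixel is the maximum of one 2x2 block of the input."""
    H, W = fmap.shape
    return fmap.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))
```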

6.3.1.4 Dropout

Training a single model while randomly dropping out nodes simulates an ensemble of many distinct network architectures. This technique, known as dropout, provides a very inexpensive and effective regularization effect that reduces overfitting and improves generalization in deep neural networks.
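A sketch of "inverted" dropout, the variant used by modern frameworks: surviving activations are rescaled by 1/(1 − rate) so that the expected activation is unchanged between training and inference:

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: during training, zero each unit with probability
    `rate` and rescale the survivors; at inference, pass x through."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate  # Boolean survival mask
    return x * keep / (1.0 - rate)
```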

There are two sets of GEIs, known as the gallery set and the probe set. The gallery set is used to train the model, and the probe set is used for evaluation. As there are 11 views, 11 models are trained and tested.
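The cross-view protocol described above can be sketched as a double loop over gallery and probe views; `train_fn` and `eval_fn` are placeholders for the actual training and evaluation routines:

```python
VIEW_ANGLES = range(0, 181, 18)  # 0°, 18°, ..., 180°

def cross_view_protocol(train_fn, eval_fn):
    """Train one model per gallery view, then evaluate each model
    against the probe sets of all 11 views."""
    results = {}
    for gallery_view in VIEW_ANGLES:
        model = train_fn(gallery_view)  # fit on this view's gallery set
        for probe_view in VIEW_ANGLES:
            results[(gallery_view, probe_view)] = eval_fn(model, probe_view)
    return results
```

Each of the 11 trained models thus produces an 11-entry row of accuracies, yielding the 11 × 11 cross-view results reported in Sect. 6.4.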

6.4 Experiments and Result Analysis

6.4.1 CASIA-B Gait Dataset

This is a large multiview gait database; the data were gathered from 11 views and comprise 124 subjects. It covers three variations (normal walking, clothing, and carrying conditions) across a wide range of view angles, and also provides extracted silhouettes.

The data were gathered in an indoor environment from 124 subjects of both genders (93 males and 31 females) at 25 fps with a frame size of 320 × 240, giving 13,640 sequences in total. Figure 6.5 shows how data were collected from different angles [1,2,3,4,5,6, 14]. Data are captured from 11 view angles, from 0° to 180° at 18° intervals. There are 10 categories in total: normal walking (6 subsets), walking while carrying a bag (2 subsets), and walking while wearing a coat (2 subsets).

Fig. 6.5

11 angle view of CASIA-B dataset

6.4.2 Experimental Environment

To perform our experiments, we used Python 3.7, Jupyter Notebook, and the Anaconda environment, with TensorFlow 2.0. The dataset used is the CASIA-B dataset.

6.4.3 Result Analysis on CASIA-B Dataset

The proposed model is trained at each of the 11 view angles, and each trained model is evaluated against probe sets from the different viewing angles. Table 6.2 shows the experimental results on the probe sets.

Table 6.2 Cross-view gait recognition accuracy on the CASIA-B dataset

It should be noted that the proposed method achieves the best recognition accuracies at almost all angles. This results from using the gait energy image rather than the silhouette sequence. For evaluation, we compared our model with other GEI-template methods, since the model's input is GEIs: GaitNet [5], L-CRF [15], LB [16], RLTDA [17], CPM [12], and DV-GEIs [14]. Table 6.3 compares the benchmark approaches and state-of-the-art models with the proposed approach on some of the probe evaluations of the CASIA-B gait dataset.

Table 6.3 Comparison of accuracies under different walking view angles on CASIA-B

As Table 6.2 shows, the average accuracy of the proposed model is around 93.0%, which outperforms GaitNet by 12.2% and DV-GEI by 9.6%. Unlike state-of-the-art methods such as L-CRF, we use GEIs rather than the silhouette sequence as input to the model; this is further evidence that our gait features strongly capture all major gait variations. Table 6.4 shows the comparison of cross-view recognition accuracy on the CASIA-B dataset.

Table 6.4 Comparison of cross-view recognition accuracy on the CASIA-B dataset

6.5 Conclusion

In this paper, a deep learning approach is proposed. The proposed method can efficiently extract spatial and temporal parameters. In this model, we feed gait energy images (GEIs) to the model rather than gait silhouette sequences. This method is more efficient than conventional methods. The gait energy image preserves the dynamic and static information of a gait sequence, although temporal ordering is not explicitly modeled; this is one limitation of the proposed method. The proposed model is robust to variations including occluding objects, clothing, and viewing angles, and gait recognition consequently becomes stronger. Our model was tested on the large CASIA-B dataset. A max pooling layer is used to condense the spatial information, improving mapping efficiency. The recognition results obtained demonstrate our model's advantage over well-known approaches.