
1 Introduction

There is a growing demand for human identification in many applications (e.g., secure access control and criminal identification). Biometrics such as face, fingerprint, iris and palmprint have been used for human identification for years. Gait recognition aims to discriminate a subject by the way she/he walks. Compared with other biometrics, the key advantage of gait recognition is its unobtrusiveness, i.e., the gait video can be captured from a distance without disturbing the subject. Many gait recognition methods have been proposed, and deep learning based methods have achieved great progress in recent years. The general gait recognition framework is shown in Fig. 1. Most existing gait recognition methods utilize RGB videos as input, as reviewed in [23, 27, 37], due to their easy access. Besides, we have also witnessed the emergence of works [5, 22, 29, 44, 47] using other data modalities for gait recognition, such as skeletons, depth images, infrared data, acceleration and gyroscope. Figure 2 demonstrates different modalities for gait recognition. This is mainly due to the development of various affordable sensors as well as the advantages of different modalities depending on the application scenario.

Fig. 1. Framework of gait recognition.

Generally, gait data can be roughly divided into two categories: vision-based gait data and non-vision gait data. The former are visually “intuitive” representations of gait, such as RGB images, silhouettes, 2D/3D skeletons [5] and 3D meshes [44]. Vision-based gait data play an important and effective role in gait recognition. Among them, RGB video is the most commonly used modality and has been widely adopted in surveillance and monitoring systems. Meanwhile, inertial gait data modalities, such as acceleration and gyroscope readings, are non-vision gait data, as they are visually non-intuitive for representing human gait. Non-vision gait data can also be used in scenarios that require protecting the privacy of subjects. Figure 2 shows several gait data modalities, and more details of the different modalities are discussed in Sect. 2.

Fig. 2. Some examples for various gait data modalities.

Compared with other biometrics, gait recognition faces many challenges due to the long capture distance and uncooperative acquisition conditions. Many methods have been proposed to deal with these challenges for robust gait recognition. So far, several gait recognition surveys have been published, such as [23, 25, 27, 37]. However, most recent surveys mainly focus on vision-based deep learning, while non-vision gait recognition has drawn little attention. This paper aims to conduct a comprehensive analysis of the development of gait recognition, including gait datasets, recent gait recognition approaches, as well as their performance and trends. The main contributions of this work can be summarized as follows.

(1) We comprehensively review gait recognition from the perspective of various data modalities, including RGB, depth, skeleton, infrared sequences, acceleration and gyroscope.

(2) We review advanced gait databases with various data modalities, and provide comparisons and analyses of the existing gait datasets, including database scale, viewing angles, walking speeds and other challenging factors.

(3) We provide comparisons of the advanced deep learning based methods and their performance, with brief summaries.

The rest of this paper is organized as follows. Section 2 reviews gait recognition databases with different data modalities. Section 3 introduces related gait recognition methods and their performance. Finally, conclusions are drawn in Sect. 4.

2 Datasets for Gait Recognition

In this section, we first introduce commonly-used gait datasets with different data modalities: vision-based and non-vision based gait data. Then, we analyze the differences among those gait databases, including the number of subjects, viewing angles, sequences, and the challenges.

2.1 Vision-Based Gait Datasets

Vision-based gait data modalities mainly include RGB images/videos, infrared images, depth images and skeleton data. So far, RGB image/video based gait data is the most popular modality for gait recognition. The CMU MoBo [8] gait database is an early gait database that mainly studies the influence of walking speed, carrying objects and viewing angles on gait recognition. The CASIA series of gait databases provides various data types. Specifically, CASIA A [36] was released first, with 20 subjects captured from 3 different viewing angles, while CASIA B [39] contains 124 subjects with different clothes, carrying conditions and 11 viewing angles, making it one of the most commonly used gait databases. The CASIA C [31] database contains infrared image sequences. Besides, the OU-ISIR Speed [34], OU-ISIR Cloth [11] and OU-ISIR MV [21] gait databases provide gait silhouettes with different walking speeds, clothes and viewing angles.

Different from early gait databases, OU-LP [13], OU-LP Age [38], OU-MVLP [30], OU-LP Bag [35] and OU-MVLP Pose [2] focus on large-scale gait samples, with 4007, 63846, 10307, 62528 and 10307 subjects respectively. The scale of gait databases has increased with the development of deep learning technologies. The GREW dataset [45] consists of 26000 identities and 128000 sequences with rich attributes for unconstrained gait recognition. The Gait3D database [44] contains 4000 subjects and over 25000 sequences captured by 39 cameras in an unconstrained indoor scene. It provides 3D Skinned Multi-Person Linear (SMPL) models recovered from video frames, which offer dense 3D information on body shape, viewpoint and dynamics. Figure 3 shows several examples from different gait benchmark databases.

Fig. 3. Examples of gait benchmark databases.

2.2 Non-vision Gait Databases

Non-vision gait recognition can be performed with sensors embedded in the floor, in shoes or worn on the body. Generally, non-vision gait datasets mainly focus on inertial data. Inertial sensors, such as accelerometers and gyroscopes, are used to record the inertial data generated by the movement of a walking subject. Accelerometers and gyroscopes measure inertial dynamic information along three directions, namely the X, Y and Z axes. The accelerations in the three directions reflect the changes in the smartphone’s linear velocity in 3D space and thus the movement of smartphone users. The whuGAIT datasets [47] provide inertial gait data collected in the wild, without restrictions on walking routes or speeds. [1] provides a real-world gait database with 44 subjects recorded over 7–10 days, including normal walking, fast walking, and walking downstairs and upstairs for each subject.
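As a concrete illustration of how such three-axis accelerometer and gyroscope streams are typically prepared for a recognition model, the following sketch segments the synchronized signals into fixed-length windows. The sampling rate, window length and stride are illustrative assumptions on our part, not parameters taken from [47] or [1].

```python
import numpy as np

def segment_inertial_sequence(acc, gyr, window=128, stride=64):
    """Split synchronized 3-axis accelerometer and gyroscope streams
    into fixed-length windows for gait recognition.

    acc, gyr: arrays of shape (T, 3) -- readings along the X, Y, Z axes.
    Returns an array of shape (num_windows, window, 6).
    """
    signal = np.concatenate([acc, gyr], axis=1)          # (T, 6)
    windows = [signal[s:s + window]
               for s in range(0, len(signal) - window + 1, stride)]
    return np.stack(windows) if windows else np.empty((0, window, 6))

# Example with synthetic data: 10 seconds of walking sampled at 50 Hz.
acc = np.random.randn(500, 3)
gyr = np.random.randn(500, 3)
print(segment_inertial_sequence(acc, gyr).shape)          # (6, 128, 6)
```

Each window can then be fed to a classifier or embedding network as one gait sample.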

Table 1 shows some benchmark datasets with various data modalities for gait recognition (R: RGB, S: Skeleton, D: Depth, IR: Infrared, Ac: Acceleration, Gyr: Gyroscope, 3DS: 3D SMPL). It can be observed that, with the development of deep learning technology, the scale and the number of data modalities of gait databases have increased rapidly. Real-world large-scale gait recognition databases have drawn a lot of attention.

Table 1. Comparisons of the gait recognition datasets regarding statistics, data type, capture environment, view variations and challenging factors. #Sub., #View. and #Sample. refer to the numbers of identities, viewing points and sequences. Sil., Ske., Inf., Dep., Aud. and In. denote silhouette, skeleton, infrared, depth, audio and inertial data, respectively. VI, DIS, BA, CA, CL, OCC, ILL, SUR, SP, SH and WD are abbreviations of view, distractor, background, carrying, clothes, occlusion, illumination, surface, speed, shoes and walking directions, respectively.

3 Approaches for Gait Recognition

Many hand-crafted feature based gait recognition approaches, such as GEI based subspace projection method [16] and trajectory-based method [3], were well designed for video-based gait recognition. Recently, with the great progress of deep learning techniques, various deep learning architectures have been proposed for gait recognition with strong representation capability and superior performance.

Vision-based gait recognition approaches can be roughly classified into two categories: model-based approaches and appearance-based approaches. Model-based approaches model the human body structure and extract gait features by fitting the model to the observed body data for recognition. Appearance-based approaches extract effective features directly from dynamic gait images. Most current appearance-based approaches use silhouettes extracted from a video sequence to represent gait. Non-vision based gait recognition approaches mainly use inertia-based gait data as input.

3.1 Appearance-Based Approaches

RGB data contain rich appearance information of the captured scene and can be collected easily. Gait recognition from RGB data has been studied for years but still faces many challenges, owing to variations in background, viewpoint, subject scale and illumination. Besides, RGB videos generally have large data sizes, leading to high computational costs when modeling the spatio-temporal context for gait recognition. Temporal modeling can be divided into single-image, sequence-based and set-based approaches. Early approaches proposed to encode a gait cycle into a single image, i.e., the Gait Energy Image (GEI) [9]. Such representations are easy to compute but lose much temporal information. Sequence-based approaches take each frame of the ordered sequence as input and model the temporal information with 3D CNNs [12, 17] or LSTMs [18]; they capture more spatial and temporal information at a higher computational cost. The set-based approach [4] treats the input frames as an unordered (shuffled) set and requires less computation. GaitPart [6] uses a temporal module that focuses on capturing short-range temporal features. [19] conducts effective gait recognition via global-local feature representation and local temporal aggregation.
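As a worked example of the single-image representation, the sketch below computes a GEI by averaging binary silhouettes over one gait cycle. It assumes the silhouettes have already been segmented into a cycle, centered and size-normalized, which is standard preprocessing rather than anything specific to [9]; the frame count and resolution are arbitrary.

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Encode one gait cycle into a single Gait Energy Image (GEI)
    by averaging aligned binary silhouettes over time.

    silhouettes: array of shape (T, H, W) with values in {0, 1},
                 assumed to be centered and size-normalized beforehand.
    Returns an (H, W) float image with values in [0, 1].
    """
    return silhouettes.astype(np.float32).mean(axis=0)

# Toy example: a "cycle" of 30 random binary silhouettes of size 64x44.
cycle = (np.random.rand(30, 64, 44) > 0.5).astype(np.uint8)
gei = gait_energy_image(cycle)
print(gei.shape, gei.min(), gei.max())
```

Each pixel of the GEI approximates how often that location is covered by the body over the cycle, which is why temporal ordering is lost.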

For similarity comparison, current deep learning based gait recognition methods often rely on effective loss functions such as the contrastive loss, triplet loss [4] and angle center loss [41].
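To make the role of these metric losses concrete, a minimal triplet-loss sketch for gait embeddings is given below; the margin and embedding size are illustrative assumptions, and the formulation is the generic triplet loss rather than the exact training objective of [4] or [41].

```python
import torch
import torch.nn.functional as F

def batch_triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss for gait embeddings: pull embeddings of the same
    subject together and push different subjects apart by at least
    `margin`."""
    d_pos = F.pairwise_distance(anchor, positive)   # same subject
    d_neg = F.pairwise_distance(anchor, negative)   # different subject
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage with random 256-D gait embeddings.
a, p, n = (torch.randn(8, 256) for _ in range(3))
print(batch_triplet_loss(a, p, n).item())
```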

3.2 Model-Based Approaches

In recent years, model-based gait recognition using 2D/3D skeletons has been widely studied through view-invariant modeling and analysis. 2D poses can be obtained by human pose estimation, while 3D dynamic gait features can be extracted from depth data for gait recognition [14]. [18] presents a pose-based approach to gait recognition with handcrafted pose features. [32] utilizes a human pose estimation method to obtain skeleton poses directly from RGB images, and then proposes the GaitGraph network, which combines skeleton poses with a Graph Convolutional Network for gait recognition.
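To illustrate the idea of applying graph convolutions to skeleton poses, the sketch below implements a single, simplified graph-convolution layer over body joints. The joint count, edge list and feature dimensions are placeholders, and the layer does not reproduce the actual GaitGraph [32] architecture.

```python
import torch
import torch.nn as nn

class SimpleSkeletonGCN(nn.Module):
    """One graph-convolution layer over body joints, in the spirit of
    GCN-based pose gait models (a sketch, not GaitGraph itself)."""
    def __init__(self, in_dim=2, out_dim=64, num_joints=17):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        # Adjacency of the skeleton graph, starting with self-connections.
        self.register_buffer("adj", torch.eye(num_joints))

    def set_edges(self, edges):
        # edges: list of (i, j) joint index pairs defining the skeleton.
        for i, j in edges:
            self.adj[i, j] = self.adj[j, i] = 1.0
        deg = self.adj.sum(dim=1, keepdim=True)
        self.adj /= deg                    # simple row normalization

    def forward(self, joints):             # joints: (B, num_joints, in_dim)
        # Mix each joint's features with those of its skeletal neighbors.
        return torch.relu(self.adj @ self.linear(joints))

# Toy usage: a batch of 2D poses with 17 COCO-style joints.
gcn = SimpleSkeletonGCN()
gcn.set_edges([(0, 1), (1, 3), (0, 2), (2, 4)])   # a few example edges
out = gcn(torch.randn(8, 17, 2))
print(out.shape)                                   # torch.Size([8, 17, 64])
```

Stacking such layers, together with temporal modeling across frames, yields the kind of skeleton-based gait representation used by model-based approaches.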

Table 2. Average Rank-1 accuracies (%) on CASIA-B for recent appearance-based and model-based methods.

In Table 2, we only compare several recent appearance-based and model-based methods due to space limitations. We can observe that appearance-based gait recognition achieves better performance than model-based approaches. Nevertheless, model-based approaches such as [32] have made great progress compared with early model-based gait recognition methods.

3.3 Inertia-Based Approaches

In the past decade, many inertia-based gait recognition methods have been developed [7, 15, 28, 33, 42]. Most of them require inertial sensors to be fastened to specific joints of the human body to obtain gait information, which is inconvenient for gait data collection. Recently, advanced inertial sensors, including accelerometers and gyroscopes, have been integrated into smartphones [47] to collect inertial gait data. Compared with traditional methods, which often require a person to walk along a specified route and/or at a normal walking speed, [47] collects inertial gait data under unconstrained conditions. To obtain good performance, deep convolutional neural networks and recurrent neural networks are utilized for robust inertial gait feature representation. Non-vision based recognition is becoming more and more important due to its advantage in protecting people’s privacy.
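The following sketch outlines the kind of CNN + RNN hybrid described above, applied to windowed six-axis inertial data; the layer sizes, window length and number of subjects are placeholders, and the network is not the exact architecture of [47].

```python
import torch
import torch.nn as nn

class InertialGaitNet(nn.Module):
    """A small CNN + RNN hybrid over windowed 6-axis inertial data,
    illustrating inertia-based gait recognition (a sketch only)."""
    def __init__(self, num_subjects=100, channels=6, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(                     # local temporal filters
            nn.Conv1d(channels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.rnn = nn.GRU(64, hidden, batch_first=True)  # longer-range dynamics
        self.classifier = nn.Linear(hidden, num_subjects)

    def forward(self, x):                              # x: (B, window, 6)
        h = self.conv(x.transpose(1, 2))               # (B, 64, window/2)
        _, last = self.rnn(h.transpose(1, 2))          # last hidden: (1, B, hidden)
        return self.classifier(last.squeeze(0))        # subject logits

logits = InertialGaitNet()(torch.randn(4, 128, 6))
print(logits.shape)                                    # torch.Size([4, 100])
```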

3.4 Multi-modalities Gait Recognition

Traditionally, gait data are collected by color sensors (e.g., CCD cameras), depth sensors (e.g., Microsoft Kinect devices) or inertial sensors (e.g., accelerometers). Generally, a single type of gait sensor may only capture part of the dynamic gait features, which makes gait recognition sensitive to complex covariate conditions. To address this problem, approaches based on multiple modalities have been proposed. For example, [46] proposes the EigenGait and TrajGait methods to extract gait features from inertial data and RGBD (color and depth) data for effective gait recognition. [40] proposes sensor fusion of data obtained by ambulatory inertial sensors and plastic optical fiber-based floor sensors for gait recognition. Multi-modality gait recognition approaches can achieve better performance but require more sensors to obtain the different gait modalities.
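A generic feature-level fusion scheme of this kind can be sketched as follows; the branch dimensions are assumptions, and the module is a simple concatenation-and-projection baseline rather than the EigenGait/TrajGait pipeline of [46] or the sensor fusion of [40].

```python
import torch
import torch.nn as nn

class LateFusionGait(nn.Module):
    """Feature-level fusion of two gait modalities, e.g., an appearance
    branch over silhouettes/depth and an inertial branch (a generic sketch)."""
    def __init__(self, vis_dim=256, imu_dim=128, embed_dim=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(vis_dim + imu_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, vis_feat, imu_feat):
        # Concatenate per-modality features, then project to a joint embedding.
        return self.fuse(torch.cat([vis_feat, imu_feat], dim=1))

fused = LateFusionGait()(torch.randn(4, 256), torch.randn(4, 128))
print(fused.shape)                                     # torch.Size([4, 256])
```

The joint embedding can then be trained with the same metric losses discussed in Sect. 3.1.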

3.5 Challenges and Future Work

Although significant progress has been achieved in gait recognition, there are still many challenges to robust gait recognition. Generally, gait recognition can be improved from the aspects of databases, recognition accuracy, computational cost and privacy. Regarding databases, it is important to develop more gait databases captured in real scenarios. More effective gait recognition methods should be developed to deal with challenges such as different viewing angles and object carrying. Besides, computational cost and privacy protection are also important for gait recognition.

4 Conclusions

In this paper, we provide a comprehensive study of gait recognition with various data modalities, including RGB, depth, skeleton, infrared sequences and inertial data. We provide an up-to-date review of existing studies on both vision-based and non-vision based gait recognition approaches. Moreover, we also provide an extensive survey of available multi-modality gait databases by comparing and analyzing them from the aspects of subjects, viewing angles, carrying conditions, walking speeds and other challenges.