1 Introduction

In automated multi-camera video surveillance, person re-identification is the task of determining whether a person seen in one camera’s field of view has already been observed at another place by a different camera. It is used for behaviour recognition, person tracking, image retrieval and safety in public places. Manually monitoring video-surveillance systems to identify a probe accurately and efficiently is a difficult task for human operators. The problem is very challenging because a person’s appearance varies across cameras. As a result, persons observed in multiple camera views exhibit small inter-class variations and large, ambiguous intra-class variations.

A few surveys on person re-identification already exist [1,2,3,4]. In recent years, the availability of large annotated person re-identification datasets and the success of deep learning in computer vision for image classification and object recognition have strongly influenced person re-identification. In this survey, we present deep learning based approaches for person re-identification on both image and video datasets.

Section 2 presents various deep learning approaches for person re-identification on image datasets. Section 3 describes deep learning approaches for person re-identification on video datasets, together with ongoing issues and future work. Section 4 concludes the paper.

Table 1. Statistics of benchmark image datasets for person re-identification

2 Deep Learning Based Person Re-identification Approaches on Image Datasets

In 2012, Krizhevsky et al. [7] presented a convolutional neural network based deep learning model in the ILSVRC’12 competition and won by a large margin in accuracy. Since then, convolutional neural network based deep learning models have become increasingly popular in the computer vision community. Yi et al. [5] proposed a deep metric learning approach for person re-identification using a siamese convolutional neural network with a symmetric structure comprising two sub-networks connected by a cosine layer. The network takes a pair of images as input, extracts features from each image separately and then uses their cosine distance for similarity matching. In [6], the authors proposed a siamese architecture with a patch-matching layer that multiplies the convolutional feature responses of the two inputs over a variety of horizontal stripes and uses the product to compute patch similarity at similar latitudes. Varior et al. [8] inserted a gating function after each convolutional layer of the network to capture effective subtle patterns when testing paired images. In [9], a soft attention based model is integrated with a siamese neural network to adaptively focus on the important local parts of the paired input images. Cheng et al. [10] presented a triplet loss function that takes a triplet of three images as input. Each image is partitioned into four overlapping body parts after the first convolutional layer, and the part features are fused in the fully connected layer. In [12], the authors proposed a pipeline for learning generic feature representations from multiple domains. They combine all the datasets, train a specially designed convolutional neural network from scratch on the combined dataset, and use a softmax loss for classification.
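
The two matching criteria mentioned above, cosine similarity between the embeddings of an image pair and a loss over image triplets, can be illustrated with a minimal sketch. It assumes that embeddings are already available as plain lists of floats. The function names, the squared Euclidean distance and the margin value are illustrative assumptions, and the hinge-style triplet loss shown is the commonly used form, not necessarily the exact formulation of [5] or [10].

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed standard form).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (in squared distance).
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)
```

In a siamese setting, both inputs pass through sub-networks with shared weights before `cosine_similarity` is applied. In the triplet setting, the anchor and the positive show the same person and the negative shows a different one.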
In [13], the authors construct a single Fisher vector [14] for each image by aggregating SIFT and color histogram features. The Fisher vectors are used as input to a fully connected network, and linear discriminant analysis is used as the objective function. In [22], the authors proposed a deep transfer learning approach that uses one-step fine-tuning for large person re-identification datasets (ImageNet \(\rightarrow \) Market-1501) and two-step fine-tuning for small datasets (ImageNet \(\rightarrow \) Market-1501 \(\rightarrow \) VIPeR). We collected the results reported for existing approaches and observe that deep learning [22] has a clear advantage in rank-1 accuracy on the largest datasets so far, CUHK03 and Market-1501 (Tables 1 and 2).
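
Rank-1 accuracy, the measure used to compare the approaches above, is the fraction of query images whose nearest gallery image belongs to the same person. The sketch below assumes single-shot matching with squared Euclidean distance on precomputed feature vectors. The function and argument names are hypothetical, and protocol details of specific benchmarks (for example, excluding gallery images taken by the same camera as the query) are deliberately omitted.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose closest gallery entry carries the same identity."""
    hits = 0
    for feat, person_id in zip(query_feats, query_ids):
        # Index of the gallery feature closest to this query.
        best = min(
            range(len(gallery_feats)),
            key=lambda i: squared_distance(feat, gallery_feats[i]),
        )
        if gallery_ids[best] == person_id:
            hits += 1
    return hits / len(query_feats)
```

For example, with two queries of which only one is matched to the correct identity at the top of the ranking, the function returns 0.5.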

Table 2. Rank-1 accuracy of different deep learning approaches for person re-identification on various image datasets (VIPeR, CUHK-01, CUHK-03, PRID, iLIDS and Market-1501)

3 Deep Learning Based Person Re-identification Approaches on Video Datasets

Deep learning approaches for person re-identification on video datasets include [23, 25, 31], in which appearance features are fed into a recurrent neural network (RNN) to capture temporal information across frames. McLaughlin et al. [31] presented a framework in which a convolutional neural network extracts features from consecutive video frames, and these features are fed through a recurrent final layer. In [23], the authors proposed a recurrent neural network based on gated recurrent units and an identification loss. Yan et al. [25] and Zheng et al. [33] proposed models in which each input video sequence is classified into its respective subject using an identification model. Color and local binary pattern (LBP) features are fed into LSTM cells. Wu et al. [24] proposed a hybrid network that fuses color and LBP features to extract both spatial-temporal and appearance features from a video sequence. In [30], the authors presented a method that extracts a compact and discriminative appearance representation from frames selected according to the flow energy profile, instead of from the whole sequence (Tables 3 and 4).
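
The general idea of passing per-frame appearance features through a recurrent layer can be sketched as follows. This is a minimal illustration under several assumptions: the per-frame features are taken as given (in practice they would come from a convolutional network), the recurrence is a plain tanh RNN rather than the GRU or LSTM cells used in [23, 25], and the hidden states are averaged over time to obtain one sequence-level feature. The weight matrices `w_in` and `w_rec` are hypothetical parameters that would normally be learned.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_sequence_feature(frames, w_in, w_rec):
    """Run a tanh RNN over per-frame features and average the hidden states.

    frames: list of per-frame feature vectors, in temporal order.
    w_in:   hidden_size x feature_size input weights (assumed given).
    w_rec:  hidden_size x hidden_size recurrent weights (assumed given).
    """
    hidden = [0.0] * len(w_rec)
    states = []
    for frame in frames:
        # h_t = tanh(W_in x_t + W_rec h_{t-1})
        pre = [a + b for a, b in zip(matvec(w_in, frame), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
        states.append(hidden)
    # Average over time to get a single vector for the whole sequence.
    return [sum(column) / len(states) for column in zip(*states)]
```

Two sequence-level vectors produced this way can then be compared with a distance or similarity function, as in the image-based setting.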

Table 3. Statistics of benchmark video datasets for person re-identification
Table 4. Rank-1 accuracy of deep learning based approaches for person re-identification on different datasets (iLIDS-VID and PRID-2011)

The computer vision community is always looking for large annotated datasets for supervised learning. Obtaining them is a challenging problem in person re-identification, because assigning an id to a pedestrian is not trivial. Open-world person re-identification can be viewed as a person verification task. Zheng et al. [35] presented a method that aims at a low false target recognition rate and a high true target recognition rate. Liao et al. [36] proposed a two-stage method: the first stage decides whether a query subject is present in the gallery, and the second stage assigns an id to the accepted query subject. Open-world person re-identification remains a challenging task, as evidenced by the low recognition rates at low false accept rates reported in [35, 36]. Therefore, efficient methods are needed to improve both the accuracy and the efficiency of person re-identification systems.
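
The two-stage open-world procedure described above can be sketched as a thresholded nearest-neighbour search. This is an illustration only, not the method of [35] or [36]: it assumes one precomputed feature vector per gallery identity, cosine similarity as the matching score, and a hand-chosen acceptance threshold. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def open_world_identify(query, gallery, threshold):
    """Return the matched gallery id, or None if the query is rejected.

    gallery: dict mapping person id -> feature vector (assumed layout).
    """
    best_id, best_score = None, -1.0
    for person_id, feat in gallery.items():
        score = cosine_similarity(query, feat)
        if score > best_score:
            best_id, best_score = person_id, score
    # Stage 1 (detection): reject queries that match no gallery subject well enough.
    if best_score < threshold:
        return None
    # Stage 2 (identification): assign the id of the best-matching subject.
    return best_id
```

Raising the threshold lowers the false accept rate but also rejects more genuine matches, which is the trade-off behind the low recognition rates at low false accept rates mentioned above.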

4 Conclusion

The increasing demand for safety in public places has raised interest in person re-identification. In this survey, we have presented deep learning approaches on both image and video datasets. Solving the data volume issue, re-identification re-ranking methods, and open-world re-identification systems are some important open issues that may attract further attention from the community.