A Hybrid Deep Model for Person Re-Identification

Wu, Di; Zheng, Si-Jia; Cheng, Fei; Zhao, Yang; Yuan, Chang-An; Qin, Xiao; Jiang, Yong-Li; Huang, De-Shuang

doi:10.1007/978-3-319-95957-3_25

Di Wu¹⁷,
Si-Jia Zheng¹⁷,
Fei Cheng¹⁸,
Yang Zhao¹⁸,
Chang-An Yuan¹⁹,
Xiao Qin¹⁹,
Yong-Li Jiang²⁰ &
…
De-Shuang Huang¹⁷

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10956))

Included in the following conference series:

International Conference on Intelligent Computing

2501 Accesses

Abstract

In this study, we present a hybrid model that combines the advantages of the identification, verification and triplet models for person re-identification. Specifically, the proposed model simultaneously uses Online Instance Matching (OIM), verification and triplet losses to train the carefully designed network. Given a triplet images, the model can output the identities of the three input images and the similarity score as well as make the L_-2 distance between the mismatched pair larger than the one between the matched pair. Experiments on two benchmark datasets (CUHK01 and Market-1501) show that the proposed method can achieve favorable accuracy while compared with other state of the art methods.

Access provided by CONRICYT-eBooks. Download conference paper PDF

Person re-identification by the asymmetric triplet and identification loss function

Article 14 September 2017

Triplet Ratio Loss for Robust Person Re-identification

Deep feature embedding learning for person re-identification based on lifted structured loss

Article 24 July 2018

Keywords

1 Introduction

As a basic task of multi-camera surveillance system, person re-identification aims to re-identify a query pedestrian observed from non-overlapping cameras or across different time with a single camera [1]. Person re-identification is an important part of many computer vision tasks, including behavioral understanding, threat detection [2,3,4,5,6] and video surveillance [7,8,9]. Recently, the task has drawn significant attention in computer vision community. Despite the researchers make great efforts on addressing this issue, it remains a challenging task due to the appearance of the same pedestrian may suffer significant changes under non-overlapping camera views.

Recently, owing to the great success of convolution neural network (CNN) in computer vision community [10,11,12,13,14], many CNN-based methods are introduced to address the person re-identification issue. These CNN-based methods achieve many promising performances. Unlike the hand-craft methods, CNN-based methods can learn deep features automatically by an end-to-end way [28]. These CNN-based methods for person re-identification can be roughly divided into three categories: identification models, verification models and triplet models. The three categories of models differ in input data form and loss function and have their own advantages and limitations. Our motivation is to combine the advantages of three models to learn more discriminative deep pedestrian descriptors.

The identification model treats the person re-identification issue as a task of multi-class classification. The model can make full use of annotation information of datasets [1]. However, it cannot consider the distance metric between image pairs. Verification model regards the person re-identification problem as a binary classification task, it takes paired images as input and outputs whether the paired images belong to the same person. Thus, verification model considers the relationship between the different images, but it does not make full use of the label information of datasets. As regards to the triplet model [15], it takes triplet unit $ (x_{i} ,x_{j} ,x_{k} ) $ as input, where $ x_{i} $ and $ x_{k} $ are the mismatched pair and the $ x_{i} $ and $ x_{j} $ are the matched pair. Given a triplet images, the model tries to make the relative distance between $ x_{i} $ and $ x_{k} $ larger than the one between $ x_{i} $ and $ x_{j} $. Accordingly, triplet-based model can learn a similarity measurement for the pair images, but it also cannot take advantage of the label information of the datasets and uses weak labels only [1].

We design a CNN-based architecture that combines the three types of popular deep models used for person re-identification, i.e. identification, verification and triplet models. The proposed architecture can jointly output the IDs of the input triplet images and the similarity score as well as force the L2 distance between the mismatched pair larger than the one between the matched pair, thus enhancing the discriminative ability of the learned deep features.

2 Proposed Method

As shown in Fig. 1, the proposed method is a triplet-based CNN model that combines identification, verification and triplet losses.

2.1 Loss Function

Identification Loss. In this work, we utilize the Online Instance Matching [16] to instead of the cross-entropy loss for supervising the identification submodule. For one image M, the similarity score of it belong to the ID j is written as:

$$ p_{j} = \frac{{\exp (v_{j}^{T} M/\partial )}}{{\sum\limits_{i = 1}^{L} {\exp (v_{i}^{T} M/\partial ) + \sum\limits_{k = 1}^{Q} {\exp (\mu_{k}^{T} M/\partial )} } }} $$

(1)

where $ v_{j}^{T} $ is the transposition of the j-th column of lookup table, $ \mu_{k}^{T} $ represents the transposition of the k-th ID of circular queue. $ \partial $ is the temperature scalar.

Verification Loss. For the verification subnetwork, we treat person re-identification as a task of binary classification issue. Similar to identification model, we also adopt cross-entropy loss as identification loss function, which is:

$$ \begin{aligned} q = soft\,\,\hbox{max} ({\text{conv}}), \hfill \\ L_{2} = \sum\limits_{i = 1}^{2} { - n_{i} \log } (q_{i} ) \hfill \\ \end{aligned} $$

(2)

in which n_i is the labels of paired image, when the pair is the same pedestrian, n₁= 1 and n₂= 0; otherwise, n₁= 0 and n₂= 1.

Triplet Loss. The triplet subnetwork is adopted to make the Euclidean distance between the positive pairs smaller than that between negative pairs. For one triplet unit $ R_{i} $, the triplet loss function can be written as:

$$ L_{3} = [thre \, + \, d (F_{w} (R_{i}^{o} ) , { }F_{w} (R_{i}^{ + } ) ) { - }d(F_{w} (R_{i}^{o} ),F_{w} (R_{i}^{ - } ))]_{ + } $$

(3)

where $ thre $ is a threshold value and is a positive number, $ \left[ x \right] + \, = { \hbox{max} }(0,{\text{x}}) $, and d () is Euclidean distance.

Hybrid Loss. The deep architecture is jointly supervised by three OIM losses, two cross-entropy losses and one triplet loss. During the training phase, the hybrid loss function can be written as:

$$ L{ = }\alpha_{ 1} L_{ 1} { + }\alpha_{ 2} L_{ 2} { + }\alpha_{ 3} L_{ 3} $$

(4)

in which $ \alpha_{1} $, $ \alpha_{2} $ and $ \alpha_{3} $ are the balance parameters.

2.2 Training Strategies

We utilize the pytorch framework to implement our network. In this work, we use the Adaptive Moment Estimation (Adam) as the optimizer of the deep model. We use two types of training strategies proposed by [17]. Specially, for large-scale dataset like Market-1501, we use the designed model directly to transfer on its training set. As for the small datasets (e.g. CUHK01), we first train the model on the large-scale person re-identification dataset (e.g. Market-1501), then fine-tune the model on the small dataset.

3 Experiments

We first resize the training images into 256*128, then the mean image is subtracted by those resized training images. For our hybrid model, it is crucial for it to organize the mini-batch that can satisfy the training purpose of both identification, verification and triplet subnetworks. In this study, we use the protocol proposed by, we sample Q identities randomly, and then all the images R in the training set are selected. Finally, we use QR images to constitute one mini-batch. Among these QR images, we choose the hardest positive and negative sample for each anchor to form the triplet units. And we randomly selected 100 paired images for verification training. As to the identification subnetwork, we use all QR images in the mini-batch for training.

(1)
Evaluation on CUHK01

For this dataset, 485 pedestrians are randomly selected to form the training set. The remainder 486 identities are selected for test. From Table 1, we can observe that the proposed method beats the most of compared approaches, which demonstrates the effectiveness of the proposed method.
Table 1. Fig. 1. Table 1. Results (Rank1, Rank5 and Rank10 matching accuracy in %) on the CUHK01 dataset. ‘-’ means no reported results is available.
Full size table
(2)
Evaluation on Market-1501

We compare the proposed model with eight state-of-the-art methods on Market-1501 dataset. We report the performances of mean average precision (mAP), Rank-1 and Rank-5. Both the results are based on single-query evaluation. From Table 2, it can be observed that the accuracies of mAP, Rank-1 and Rank-5 of the proposed model achieve 77.43%, 91.60% and 98.33%, respectively, and our method beats all the other competing methods, which further proves the effectiveness of the proposed method.
Table 2. Results (mAP, Rank-1 and Rank-5 matching accuracy in %) on the Market-1501 dataset in the single-shot. ‘-’ means no reported results is available.
Full size table

4 Conclusions

In this paper, we design a hybrid deep model for the person re-identification. The proposed model combines the identification, verification and triplet losses to handle the intra/inter class distances. Through the hybrid strategy, the model can learn a similarity measurement and the discriminative features at the same time. The proposed model outperforms most of the state-of-the-art on the CUHK01 and Market-1501 datasets.

References

Zheng, L., et al.: Person Re-identification: Past, Present and Future. Computer Vision and Pattern Recognition (2016). arXiv: 1610.02984
Google Scholar
Huang, D.S.: A constructive approach for finding arbitrary roots of polynomials by neural networks. IEEE Trans. Neural. Netw. 15(2), 477–491 (2004)
Article Google Scholar
Huang, D.S., et al.: A case study for constrained learning neural root finders. Appl. Math. Comput. 165, 699–718 (2005)
MathSciNet MATH Google Scholar
Huang, D.S., et al.: Zeroing polynomials using modified constrained neural network approach. IEEE Trans. Neural Netw. 16, 721–732 (2005)
Article Google Scholar
Huang, D.S., et al.: A new partitioning neural network model for recursively finding arbitrary roots of higher order arbitrary polynomials. Appl. Math. Comput. 162, 1183–1200 (2005)
MathSciNet MATH Google Scholar
Huang, D.S., et al.: Dilation method for finding close roots of polynomials based on constrained learning neural networks ☆. Phys. Lett. A 309, 443–451 (2003)
Article MathSciNet Google Scholar
Huang, D.S.: The local minima-free condition of feedforward neural networks for outer-supervised learning. IEEE Trans. Syst. Man Cybern. Part B Cybern. 28, 477 (1998). A Publication of the IEEE Systems Man & Cybernetics Society
Article Google Scholar
Huang, D.S.: The united adaptive learning algorithm for the link weights and shape parameter in RBFN for pattern recognition. Int. J. Pattern Recognit. Artif. Intell. 11, 873–888 (1997)
Article Google Scholar
Huang, D.S., Ma, S.D.: Linear and nonlinear feedforward neural network classifiers: a comprehensive understanding: journal of intelligent systems. J. Intell. Syst. 9, 1–38 (1999)
Article Google Scholar
Huang, D.S., Du, J.X.: A constructive hybrid structure optimization methodology for radial basis probabilistic neural networks. IEEE Trans. Neural Netw. 19, 2099 (2008)
Article Google Scholar
Wang, X.F., Huang, D.S.: A novel density-based clustering framework by using level set method. IEEE Trans. Knowl. Data Eng. 21, 1515–1531 (2009)
Article Google Scholar
Shang, L., et al.: Palmprint recognition using FastICA algorithm and radial basis probabilistic neural network. Neurocomputing 69, 1782–1786 (2006)
Article Google Scholar
Zhao, Z.Q., et al.: Human face recognition based on multi-features using neural networks committee. Pattern Recogn. Lett. 25, 1351–1358 (2004)
Article Google Scholar
Huang, D.S., et al.: A neural root finder of polynomials based on root moments. Neural Comput. 16, 1721–1762 (2004)
Article Google Scholar
Ding, S., et al.: Deep feature learning with relative distance comparison for person re-identification. Pattern Recogn. 48, 2993–3003 (2015)
Article Google Scholar
Xiao, T., et al.: Joint detection and identification feature learning for person search. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3376–3385. IEEE (2017)
Google Scholar
Geng, M., et al.: Deep transfer learning for person re-identification (2016)
Google Scholar
Chen, S.Z., et al.: Deep ranking for person re-identification via joint representation learning. IEEE Trans. Image Process. 25, 2353–2367 (2016)
Article MathSciNet Google Scholar
Matsukawa, T., et al.: Hierarchical Gaussian Descriptor for Person Re-identification. In: Proceedings of the Computer Vision and Pattern Recognition, pp. 1363–1372 (2016)
Google Scholar
Xiao, T., et al.: Learning deep feature representations with domain guided dropout for person re-identification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1249–1258 (2016)
Google Scholar
Chen, W., et al.: Beyond triplet loss: a deep quadruplet network for person re-identification. In: Proceedings of the CVPR, pp. 1320–1329 (2017)
Google Scholar
Jin, H., et al.: Deep person re-identification with improved embedding and efficient training. In IEEE International Joint Conference on Biometrics, pp. 261–267 (2017)
Google Scholar
Liu, H., et al.: End-to-End comparative attention networks for person re-identification. IEEE Trans. Image Process. 26(7), 3492–3506 (2017). A Publication of the IEEE Signal Processing Society
Article MathSciNet Google Scholar
Zheng, Z., et al.: A discriminatively learned CNN embedding for person re-identification. ACM Trans. Multimedia Comput. Commun. Appl. (TOMM) 14(1), 13 (2016)
MathSciNet Google Scholar
Zheng, Z., et al.: Unlabeled samples generated by GAN improve the person re-identification baseline in vitro, 3774–3782 (2017). arXiv preprint arXiv:1701.07717
Google Scholar
Zhong, Z., et al.: Re-ranking person re-identification with k-reciprocal encoding. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3652–3661 (2017)
Google Scholar
Hermans, A., et al.: In Defense of the Triplet Loss for Person Re-Identification (2017). arXiv preprint, arXiv:1703.07737
Google Scholar
Huang, D.S.: Systematic Theory of Neural Networks for Pattern Recognition, Publishing House of Electronic Industry of China, May 1996
Google Scholar

Download references

Acknowledgements

This work was supported by the grants of the National Science Foundation of China, Nos. 61472280, 61672203, 61472173, 61572447, 61772357, 31571364, 61520106006, 61772370, 61702371 and 61672382, China Postdoctoral Science Foundation Grant, Nos. 2016M601646 & 2017M611619, and supported by “BAGUI Scholar” Program of Guangxi Zhuang Autonomous Region of China. De-Shuang Huang is the corresponding author of this paper.

Author information

Authors and Affiliations

Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
Di Wu, Si-Jia Zheng & De-Shuang Huang
Beijing E-Hualu Info Technology Co., Ltd., Beijing, China
Fei Cheng & Yang Zhao
Science Computing and Intelligent Information Processing of Guang Xi Higher Education Key Laboratory, Guangxi Teachers Education University, Nanning, 530001, Guangxi, China
Chang-An Yuan & Xiao Qin
Ningbo Haisvision Intelligence System Co., Ltd., Ningbo, 315000, Zhejiang, China
Yong-Li Jiang

Authors

Di Wu
View author publications
You can also search for this author in PubMed Google Scholar
Si-Jia Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Fei Cheng
View author publications
You can also search for this author in PubMed Google Scholar
Yang Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Chang-An Yuan
View author publications
You can also search for this author in PubMed Google Scholar
Xiao Qin
View author publications
You can also search for this author in PubMed Google Scholar
Yong-Li Jiang
View author publications
You can also search for this author in PubMed Google Scholar
De-Shuang Huang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to De-Shuang Huang .

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
Indian Institute of Technology Madras, Chennai, India
M. Michael Gromiha
Inha University, Incheon, Korea (Republic of)
Kyungsook Han
Liverpool John Moores University, Liverpool, United Kingdom
Abir Hussain

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Wu, D. et al. (2018). A Hybrid Deep Model for Person Re-Identification. In: Huang, DS., Gromiha, M., Han, K., Hussain, A. (eds) Intelligent Computing Methodologies. ICIC 2018. Lecture Notes in Computer Science(), vol 10956. Springer, Cham. https://doi.org/10.1007/978-3-319-95957-3_25

Download citation

DOI: https://doi.org/10.1007/978-3-319-95957-3_25
Published: 06 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95956-6
Online ISBN: 978-3-319-95957-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

A Hybrid Deep Model for Person Re-Identification

Abstract

Similar content being viewed by others

Person re-identification by the asymmetric triplet and identification loss function

Triplet Ratio Loss for Robust Person Re-identification

Deep feature embedding learning for person re-identification based on lifted structured loss

Keywords

1 Introduction

2 Proposed Method

2.1 Loss Function

2.2 Training Strategies

3 Experiments

4 Conclusions

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

A Hybrid Deep Model for Person Re-Identification

Abstract

Similar content being viewed by others

Person re-identification by the asymmetric triplet and identification loss function

Triplet Ratio Loss for Robust Person Re-identification

Deep feature embedding learning for person re-identification based on lifted structured loss

Keywords

1 Introduction

2 Proposed Method

2.1 Loss Function

2.2 Training Strategies

3 Experiments

4 Conclusions

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation