Abstract
Person re-identification (person re-ID) is an image retrieval task that identifies the same person across different camera views. Generally, a good person re-ID model requires a large dataset containing over 100,000 images to reduce the risk of over-fitting. Most current handcrafted person re-ID datasets, however, are too small to train a model with high generalization ability. In addition, most existing datasets still lack images with various levels of occlusion. Motivated by these two problems, this paper proposes a new data augmentation method called Random Linear Interpolation, which can enlarge person re-ID datasets and improve the generalization ability of the learned model. The key idea of our approach is to generate fused images by interpolating pairs of original images. In other words, the novelty of the proposed approach lies in performing data augmentation between two random samples rather than on a single sample. Extensive experimental results demonstrate that the proposed method effectively improves baseline models. On the Market1501 and DukeMTMC-reID datasets, our approach achieves 92.71% and 82.19% rank-1 accuracy, respectively.
1 Introduction
With the rapid development of deep learning, more and more convolutional neural network models effectively handle image classification, object detection and other computer vision tasks. Current large convolutional networks require large-scale data to ensure that the models generalize well. Person re-identification (person re-ID), which identifies pedestrians across different camera views, faces the challenge that many person re-ID datasets are small. Although recognition accuracy on different datasets has been increasing, some large networks still do not work well on small datasets. For instance, the same CNN model achieves higher recognition accuracy on Market-1501 [36], with over 30,000 images, than on PRID450s [27], with fewer than 1,000 images.
Data augmentation, viewed as a data preprocessing method, generates new training samples from original datasets. It is widely used to increase dataset size in image classification, object recognition and person re-identification. Moreover, data augmentation has gradually come to play a key role in deep learning because it effectively alleviates over-fitting of large convolutional network models. Two commonly used data augmentation methods, Random Cropping [32] and Random Flipping [29], effectively improve the generalization ability of most existing CNN models. In detail, Random Cropping decreases the influence of the background on the CNN decision and makes the model focus on parts of the object rather than on the whole object. Random Flipping enables CNN models to learn from different orientations of input images. Both methods, however, operate on a single image.
Recently, Random Erasing [41] has been proposed and can be applied to most existing CNN models. For person re-identification, it produces additional occluded images by selecting a random rectangular region in a pedestrian image and overwriting its pixels with random values. Training images with different levels of occlusion are generated to reduce the risk of over-fitting. However, it only covers the simple situation in which some noise is embedded in the image.
In this paper, we propose Random Linear Interpolation (RLI), a new data augmentation method that adds images with more complicated occlusion to a person re-ID dataset. Random Linear Interpolation keeps a part of the images unchanged and generates new images by fusing pairs of images with linear interpolation [3, 4, 23]. Many mixed images, similar but not identical to the original images, are generated during training to strengthen the generalization ability of CNN models. Furthermore, Random Linear Interpolation greatly improves the recognition accuracy in person re-ID and makes CNN models robust to outliers and to images with varying levels of occlusion. Examples of Random Linear Interpolation are shown in Fig. 1.
Fig. 1 shows two randomly generated images with different interpolation strengths on Market1501 [36] and DukeMTMC-reID [39]. Image A is the base image, and image B is fused into it. The newly generated image A* shares features of both image A and image B. μ is a hyper-parameter that controls the strength of interpolation between pedestrian images. The similarity between image A and image A* is clearly determined by μ. For instance, when μ is equal to 0.3004, only a few features of the pedestrian in image A remain in the generated image A* on Market1501. As μ increases, more of the original features remain in the new image. Specifically, when μ is equal to 0.9212, only a few features of the pedestrian change on DukeMTMC-reID.
In summary, this paper makes the following contributions:
We propose a new data augmentation method, Random Linear Interpolation (RLI), which is lightweight and can be applied to most existing convolutional neural network models to improve their generalization ability.
For person re-identification, RLI reduces I/O requirements and adds various levels of complicated occlusion by fusing images within mini-batches during training. RLI softly adjusts the proportion of interpolated samples to control how strongly the model learns from outliers.
The proposed method improves the performance of baseline models including ResNet and DenseNet. Furthermore, we explore a new way of augmenting data that operates between two random samples rather than on a single sample.
2 Related work
2.1 Data augmentation
Data augmentation was first proposed in [31] to deal with missing-value problems, as exemplified by training samples without labels. Recently, it has attracted attention again with the rapid development of convolutional neural networks (CNNs). The amount of data is vital to most existing CNN models because a CNN model trained on more samples has a lower risk of over-fitting. Data augmentation is the technique of enlarging the number of training samples by processing the original samples. In [20], random cropping was first proposed to construct a large dataset with 80 million tiny images for object and scene recognition. Simonyan and Zisserman [29] further adopted random flipping to augment the training dataset for large-scale image recognition.
Later, for image classification, object detection and person re-identification, random erasing [41] was presented, which randomly selects a rectangular region in a training image and erases its pixels with random values. The deep convolutional generative adversarial network (DCGAN) was adopted for person re-ID in [39]. In DCGAN, a generator network is used to generate virtual data and a discriminator network is used to identify whether a sample is virtual or real. Different from these methods, this paper presents a new method named Random Linear Interpolation, which randomly generates mixed images by fusing one image into another and feeds both generated and real images into the CNN model, thereby enhancing the robustness of the model to outliers.
2.2 Person re-identification
Person re-identification is a challenging image retrieval problem [7, 21, 42]. Due to factors such as varying lighting and camera angles, the same person exhibits different appearance features, which poses a great challenge for recognition. To find features of pedestrian images that are invariant across camera views, a large number of distance metric learning methods, including KISSME, KCCA, MLF, EIML, RPLM and APML [13, 14, 22, 25, 28, 33], have been proposed. Some methods based on dictionary learning [6, 17, 18, 19] have also been used to deal with the unsupervised person re-ID problem because they can convert high-dimensional visual features into low-dimensional sparse codes.
In recent years, with the continuous expansion of person re-ID datasets, many convolutional neural network models have been proposed to improve the accuracy of pedestrian recognition. Zheng et al. [37] proposed the IDE method, which handles person re-ID by learning an identity-discriminative embedding. By using the triplet loss to maximize the distance between positive and negative samples while minimizing the distance between positive pairs, TriNet [12] alleviates the problem of small datasets and works effectively with various convolutional neural network models. SVDNet [30] optimizes the deep representation learning process in CNN training for person re-identification. Zheng et al. [39] use a deep convolutional neural network to generate unlabeled samples and propose a label smoothing regularization method (LSRO) for integrating the unlabeled outliers.
3 Our approach
In this section, we describe how Random Linear Interpolation is implemented in a convolutional neural network model for person re-identification. We first describe how RLI randomly selects two images in the training set to interpolate. Then we analyze the theoretical basis of Random Linear Interpolation for person re-ID. Finally, we analyze the differences between Random Linear Interpolation and other data augmentation methods.
3.1 Random linear interpolation
Random Linear Interpolation is used during training to randomly mix two images by linear interpolation before they are fed to the convolutional neural network model. Suppose there are n images in a mini-batch; we keep k samples unchanged and perform linear interpolation on the remaining n − k images. In this data pre-processing step, a growing number of similar virtual samples is generated randomly over the course of training. Each new image is the fusion of two random original images; in other words, it is a training sample that carries two labels.
Two original samples \( I_{a} = (x_{a}, y_{a}) \) and \( I_{t} = (x_{t}, y_{t}) \) are selected and all of their pixels are mixed, where \( I_{a} \) is the sample to be interpolated and \( I_{t} \) is a random image in the same mini-batch. The new virtual sample has a soft label representation, which differs from the hard label representation in which each sample corresponds to exactly one label. A sample in the new dataset is denoted \( I_{a}^{*} = (\overline {x}_{a},\overline {y}_{a}) \), where \( \overline {x}_{a} \) represents the features of the virtual sample generated by Random Linear Interpolation and \( \overline {y}_{a}\) is the mixed label of \( (y_{a}, y_{t}) \). Moreover, \( I_{a} \) is kept unchanged if it is not selected for linear interpolation. The virtual sample newly generated by Random Linear Interpolation is defined as:
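A form consistent with the definitions above is given below. It assumes that μ weights the base sample \( I_{a} \), as the description of Fig. 1 suggests, that the same weight is applied to the labels, and that labels are treated as one-hot vectors:

\[
\begin{aligned}
\overline{x}_{a}(R,G,B) &= \mu \, x_{a}(R,G,B) + (1-\mu)\, x_{t}(R,G,B), \\
\overline{y}_{a} &= \mu \, y_{a} + (1-\mu)\, y_{t},
\end{aligned}
\]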
where R, G, B represent the pixel values of the three channels of the original input image, and μ is a random number drawn from a Beta distribution Beta(α, β). For simplicity, we set β equal to α in this paper. Note that, in training, we use only one data loader to obtain a mini-batch and generate the virtual samples within that same mini-batch. This reduces the I/O requirements and also reduces the confusion of the dataset. The pipeline of Random Linear Interpolation is shown in Algorithm 1.
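The procedure described in this subsection can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: the function name `random_linear_interpolation` is hypothetical, each sample is assumed to be interpolated independently with probability γ, the partner image is assumed to come from a random permutation of the same mini-batch, and μ is assumed to weight the base image.

```python
import numpy as np


def random_linear_interpolation(images, labels, alpha=0.1, gamma=0.4, rng=None):
    """Mix images inside one mini-batch (hypothetical sketch).

    images: float array of shape (n, H, W, 3); labels: int array of shape (n,).
    Returns the mixed images, both label arrays and the per-sample weights mu.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = images.shape[0]

    # Assumption: the partner of each sample is drawn from the same mini-batch
    # via a random permutation (a sample may occasionally be paired with itself).
    partner = rng.permutation(n)

    # Assumption: each sample is interpolated with probability gamma,
    # so on average (1 - gamma) * n samples stay unchanged.
    interpolate = rng.random(n) < gamma

    # Per-sample strength mu ~ Beta(alpha, alpha); mu = 1 keeps a sample as is.
    mu = np.where(interpolate, rng.beta(alpha, alpha, size=n), 1.0)

    # Broadcast mu over height, width and the three colour channels.
    mu_img = mu.reshape(n, 1, 1, 1)
    mixed = mu_img * images + (1.0 - mu_img) * images[partner]

    return mixed, labels, labels[partner], mu
```

Because the partner images are taken from the mini-batch that is already in memory, no second data loader is needed, which matches the I/O argument made above.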
3.2 Random linear interpolation for person re-identification
In the person re-identification problem, we focus on finding a function f ∈ F that matches the same identities across different camera views. Most existing methods based on convolutional neural networks are supervised. Recent convolutional neural networks have a large number of parameters, sometimes even more than the number of samples in the dataset, so they may over-fit the features of the training data and generalize poorly to outliers and occlusion. Therefore, we train on augmented data that approximates the original samples, which fits well with the idea of Vicinal Risk Minimization (VRM) [5].
Assume that the raw dataset \(D=\left \{ (x_{i},y_{i}) \right \}_{i = 1}^{m}\) follows a distribution P, that is, (xi, yi) ∼ P for all i = 1, 2, ... , m. In most practical situations, however, the data distribution P is unknown, and \(\hat {P}\) is defined to approximate the true distribution P. VRM is a principle that minimizes the risk between predictions and targets under such an approximation. The approximated distribution \(\hat {P}\) in VRM is given by (3):
where \(v(\overline {x},\overline {y} | x_{i},y_{i}) \) is the vicinal distribution. However, instead of using a Gaussian kernel as in VRM, we construct a virtual dataset \(D_{v} = \left \{ (\overline {x},\overline {y}) \right \}_{i = 1}^{n}\) in which each new sample is generated by the linear interpolation of two random samples in D. The new vicinal distribution can be written as (4) and (5):
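In its standard VRM form, stated here as an assumption about the expression referred to as (3), the approximation averages a vicinal distribution over the m training samples:

\[
\hat{P}(\overline{x},\overline{y}) = \frac{1}{m}\sum_{i=1}^{m} v(\overline{x},\overline{y} \mid x_{i},y_{i}),
\]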
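Assuming the interpolation rule of Section 3.1 with μ drawn from Beta(α, α), and ignoring the samples that are kept unchanged, one way to write such a vicinal distribution is:

\[
v(\overline{x},\overline{y} \mid x_{i},y_{i}) = \frac{1}{m}\sum_{j=1}^{m} \mathbb{E}_{\mu}\!\left[\delta\!\left(\overline{x} = \mu x_{i} + (1-\mu)x_{j},\; \overline{y} = \mu y_{i} + (1-\mu)y_{j}\right)\right],
\]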
where δ(∙) is the Dirac distribution and (xj, yj) represents a random sample in D. As a hyper-parameter, μ controls the similarity between virtual and real samples.
3.3 Comparison with other data augmentation methods
We compare Random Linear Interpolation (RLI) with three effective data augmentation methods for person re-identification: random cropping, random flipping and random erasing. First of all, random cropping is a classic data augmentation method in which training images are cropped randomly to match the input size of the convolutional neural network. With random cropping, the model pays more attention to parts of the input pedestrian image than to the whole image, and the background information of the image can be ignored. Different from random cropping, Random Linear Interpolation takes each pedestrian image as a whole and linearly fuses another image into it. In other words, although we change all the pixel values in an image, we do not change the internal structure of the image data.
Random flipping is another data augmentation method that is commonly used in person re-identification. It enables a CNN model to learn the same image in different orientations and does not incur information loss in pedestrian images during augmentation. In contrast to random flipping, each time the model is trained with Random Linear Interpolation, the newly generated image has different values for all pixels. Moreover, the fusion of two images can be viewed as adding various levels of noise to one of the images.
Recently, random erasing has been introduced to deal with person re-ID problems. It randomly selects a rectangular region in an image and erases its pixels with random values. To make the CNN model robust to occlusion in person re-identification, training focuses on pedestrian images with various levels of occlusion. In contrast to random erasing, Random Linear Interpolation generates a new image by fusing a pair of images. In addition, by changing the pixels of the whole image through linear interpolation, RLI expands the scope of image occlusion and improves the generalization ability of the model.
In our experiments, all of these data augmentation methods improve the recognition accuracy of person re-ID, and the proposed method achieves the best performance.
4 Experiments
In this section, we describe in detail the experimental evaluation of Random Linear Interpolation (RLI) for person re-identification. We first introduce two standard datasets, Market1501 and DukeMTMC-reID, and then evaluate the performance of the proposed method on several baseline models, namely ResNet18, ResNet34, ResNet50, ResNet101, ResNet152 [11] and DenseNet [15]. Finally, we compare RLI with other strong data augmentation and person re-identification methods on the two benchmark datasets and analyze the results.
4.1 Datasets
Two commonly used datasets, Market1501 and DukeMTMC-reID, are summarized in Table 1. We conduct extensive experiments on these benchmark datasets. Market1501 was collected by six cameras in front of a supermarket at Tsinghua University. Overall, this dataset contains 32,668 bounding boxes of 1,501 pedestrian identities, with about 20 images per identity on average. The bounding boxes of the 1,501 identities are labeled both by hand and by the Deformable Part Model (DPM) [8]. Following [36], we use 12,936 images for training and the other 19,732 images for testing.
DukeMTMC-reID is derived from the DukeMTMC tracking dataset and has a total of 36,411 images of 1,404 identities. We use 702 identities for training and the remaining identities for testing. All images in this dataset are captured by 8 cameras, and the pedestrian bounding boxes are annotated by hand. Following [39], in testing we pick one query image for each identity in each camera and put the other images in the gallery.
4.2 Settings
4.2.1 Baseline CNN model for person Re-ID
ResNet is chosen as the baseline model to evaluate the performance of the proposed method. We conduct experiments on five ResNet models: ResNet18, ResNet34, ResNet50, ResNet101 and ResNet152. Note that we remove the fully connected layer of ResNet and add a linear layer after ResNet, followed by Batch Normalization [16] to normalize each mini-batch of inputs. Moreover, we use Leaky ReLU [34] as the activation function with a negative slope of 0.01, and add a dropout layer with a probability of 0.5. We initialize the network with the weights of a ResNet model pre-trained on ImageNet for fine-tuning and set the biases to 0.
In training, we divide the input images into mini-batches and adopt the stochastic gradient descent algorithm of [10]. Note that we set the training batch size of ResNet101 and ResNet152 to 16 because of the memory limit, and set the batch size to 32 for the other networks. The size of the input images is 256 × 128. The learning rate is set to 0.1 for the fully connected layer and the classification layer, and to 0.01 for the other layers of the ResNet50 model. After 40 training epochs, we decrease the learning rate to 0.001. The momentum is 0.9 and the weight decay is set to 5e-4. We train all networks for 60 epochs.
After an input image in a mini-batch is augmented, it has two target labels. Most existing loss functions cannot be applied directly in this situation, in which the error between the predicted label ypred and two ground-truth labels (ya, yb) must be calculated. Thus, we adopt a novel loss function shown as (6):
For ResNet50, the training loss on Market1501 and DukeMTMC-reID over 10 epochs is shown in Fig. 2. It can be seen that the training loss of the model with Random Linear Interpolation is higher than that of the baseline, but this does not reduce the recognition accuracy.
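One loss that handles two labels per sample is the μ-weighted sum of two cross-entropy terms. It is assumed here for illustration only and is not confirmed to be identical to (6). The names `log_softmax` and `rli_loss` are hypothetical, and `mu` is assumed to be the per-sample interpolation weight of the base image.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def rli_loss(logits, y_a, y_b, mu):
    """Assumed two-label loss: mu * CE(pred, y_a) + (1 - mu) * CE(pred, y_b).

    logits: (n, num_classes) scores; y_a, y_b: (n,) integer labels;
    mu: (n,) weights in [0, 1], with mu = 1 for samples left unchanged.
    """
    logp = log_softmax(np.asarray(logits, dtype=float))
    rows = np.arange(logp.shape[0])
    ce_a = -logp[rows, y_a]  # cross-entropy against the base label
    ce_b = -logp[rows, y_b]  # cross-entropy against the fused-in label
    mu = np.asarray(mu, dtype=float)
    return float(np.mean(mu * ce_a + (1.0 - mu) * ce_b))
```

With this form, an unchanged sample (μ = 1) reduces to the ordinary cross-entropy loss, so interpolated and original images can share one mini-batch.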
4.2.2 Leaky ReLUs and dropout
Like the rectified linear unit (ReLU), Leaky ReLU is a commonly used activation function that alleviates the vanishing gradient problem by acting as the identity on positive values. It was first proposed for acoustic models [26]. In contrast to ReLU, Leaky ReLU passes negative values through with a small non-zero slope. Xu et al. [34] demonstrated that Leaky ReLU performs better than ReLU in image classification tasks. We therefore use Leaky ReLU as the activation function for better performance in person re-identification.
Because of large cross-view misalignment, occlusions and pose variations in person re-ID, some patch information on the same pedestrian is likely to be learned incorrectly in training. We adopt the dropout strategy to reduce the influence of mismatched patches and bad neural units. Each time a training sample is fed to the network, some outputs of convolutional layers are randomly set to zero. We set the dropout probability to 0.5, so that on average half of the neural units in a convolutional layer are dropped at random. This makes the CNN model more stable in person re-ID.
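The two operations described in this subsection can be illustrated with a short NumPy sketch. The function names are hypothetical, and the rescaling of the kept units by 1 / (1 − p) ("inverted dropout", as commonly implemented in deep learning frameworks) is an assumption that the text above does not state.

```python
import numpy as np


def leaky_relu(x, negative_slope=0.01):
    """Identity for non-negative inputs, small linear slope for negative ones."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, x, negative_slope * x)


def dropout(x, p=0.5, training=True, rng=None):
    """Zero each unit with probability p during training.

    Assumption: kept units are rescaled by 1 / (1 - p) so that the expected
    activation is unchanged; at test time the input is returned as is.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    x = np.asarray(x, dtype=float)
    if not training or p == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= p  # True for units that survive
    return x * keep / (1.0 - p)
```

With the settings used in this paper, the calls would be `leaky_relu(x, 0.01)` and `dropout(x, 0.5)`.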
4.3 Parameter analysis
Random Linear Interpolation involves two important parameters: the interpolation strength α and the interpolation probability γ. α is the parameter of the Beta distribution that determines the values of the hyper-parameter μ. For each interpolated image, the value of μ, which controls the strength of interpolation, is drawn randomly from Beta(α, α). With γ set to 0.4, we report the recognition accuracy of the proposed method as α varies from 0.1 to 1 in steps of 0.1. It can be seen in Fig. 3b that Random Linear Interpolation improves over the baseline model, and that rank-1 accuracy varies only slightly on Market1501 when different values of α are used in RLI.
γ is the other parameter; it determines the number of samples that are subjected to Random Linear Interpolation. With α equal to 0.1, the rank-1 accuracy for different values of γ is shown in Fig. 3b. When γ is set to 0.4, ResNet50 achieves the highest recognition accuracy on DukeMTMC-reID. Furthermore, as γ increases, the model over-learns the features of generated samples and pays less attention to the original matched samples.
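To make the role of α concrete, the following sketch estimates how often μ ∼ Beta(α, α) falls close to 0 or 1, that is, how often one of the two images dominates the fused result. The helper name `extreme_fraction`, the margin of 0.1 and the sample size are illustrative assumptions and are not taken from the experiments.

```python
import numpy as np


def extreme_fraction(alpha, n=100_000, margin=0.1, seed=0):
    """Fraction of mu ~ Beta(alpha, alpha) draws within `margin` of 0 or 1."""
    rng = np.random.default_rng(seed)
    mu = rng.beta(alpha, alpha, size=n)
    return float(np.mean((mu < margin) | (mu > 1.0 - margin)))


if __name__ == "__main__":
    for alpha in (0.1, 0.5, 1.0):
        print(f"alpha={alpha}: {extreme_fraction(alpha):.3f}")
```

For α = 1 the distribution is uniform, so about one fifth of the draws land in these two end intervals. For α = 0.1 the density is strongly U-shaped and a large majority of the draws do, so most fused images stay close to one of their two source images.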
4.4 Performance evaluation
4.4.1 Improving different baseline models
We first verify the effectiveness of Random Linear Interpolation. γ and α are set to 0.4 and 0.1, respectively. The same settings, such as learning rate, weight decay and dropout probability, are used in the following experiments.
Three convolutional neural network models, ResNet18, ResNet34 and ResNet50, are used as baseline methods for person re-identification. The rank-1 accuracy and mAP on Market1501 and DukeMTMC-reID are shown in Fig. 4.
As shown in Fig. 4a, with RLI the rank-1 accuracy of the three models increases by between 1.58% and 2.17%, and the mAP increases by between 1.07% and 6.23%, on Market1501. On DukeMTMC-reID, shown in Fig. 4b, applying RLI increases the rank-1 accuracy by 3.06%, 10.09% and 3.33% for ResNet18, ResNet34 and ResNet50, respectively, and increases the mAP by 4.13% on average. These results show that a baseline CNN model benefits considerably from Random Linear Interpolation data augmentation for person re-ID.
4.4.2 Compared to superior data augmentation methods
The comparison among different data augmentation methods, including Random Cropping (RC), Random Erasing (RE) and Random Linear Interpolation (RLI), on Market1501 and DukeMTMC-reID is shown in Table 2. For a fair comparison, ResNet50 with the same structure and initial parameters as in Section 4.2.1 is used as the baseline method. Note that random cropping is not applied when evaluating random erasing and Random Linear Interpolation. Furthermore, we combine the models with re-ranking [1, 40].
It can be observed that all of the data augmentation methods improve the rank-1 accuracy and mAP of the baseline. On Market1501, both random erasing and Random Linear Interpolation perform better than random cropping. Specifically, ResNet50 with RLI gives a rank-1 accuracy of 88.93% and an mAP of 72.09%, which is 1.05% higher in rank-1 accuracy and 1.31% higher in mAP than ResNet50 with RC. In addition, ResNet50 with RLI obtains 70.87% rank-1 accuracy and 63.69% mAP on DukeMTMC-reID with re-ranking. Compared to random erasing, the proposed data augmentation is 1.64% higher in rank-1 accuracy and 1.42% higher in mAP. The experimental results in Table 2 demonstrate that Random Linear Interpolation is an effective data augmentation method and achieves better performance than two commonly used methods, Random Cropping and Random Erasing.
Furthermore, we compare the proposed method with Random Cropping on three state-of-the-art baseline models: ResNet101, ResNet152 and DenseNet121. For ResNet101 and ResNet152, we set the batch size to 16 because of memory limitations. The experimental results are shown in Table 3. The proposed method increases the average rank-1 accuracy of the three models by 2.34% and 6.32% on Market1501 and DukeMTMC-reID, respectively. In addition, our method yields a larger improvement on DukeMTMC-reID, the dataset on which accurate re-identification is more difficult, which suggests that it can be applied to more difficult person re-identification tasks. We achieve 91.38% rank-1 accuracy and 76.89% mAP on Market1501 with DenseNet121, and 74.01% rank-1 accuracy and 53.64% mAP on DukeMTMC-reID with ResNet152.
4.4.3 Comparison with State-of-the-Art methods for person Re-ID
Many state-of-the-art methods handle person re-identification effectively on the two benchmark datasets, Market1501 and DukeMTMC-reID, both of which have been commonly used to evaluate recent methods. In Table 4, we compare our approach with other methods, including [2, 9, 12, 24, 30, 35, 36, 37, 38, 39, 41], on Market1501. It can be seen that our method achieves results competitive with the state of the art. Specifically, our method obtains rank-1 accuracy = 92.71% and mAP = 88.98% on Market1501 with re-ranking. In addition, several strong methods, including DCGAN [39], LOMO + XQDA [24], PAN [38], IDE [37] and BoW+KISSME [36], are compared with our method on DukeMTMC-reID. The detailed rank-1 accuracy and mAP scores are summarized in Table 5. Among the compared methods, Random Linear Interpolation with ResNet152 exceeds PAN [38] by 2.41% in rank-1 accuracy and 2.14% in mAP. Furthermore, the proposed method achieves rank-1 accuracy = 82.19% and mAP = 75.91% on DukeMTMC-reID with re-ranking.
5 Conclusion
In this paper, we proposed a novel data augmentation method named Random Linear Interpolation for person re-identification. Pairs of training images are mixed by interpolating their pixels. New samples with various levels of complicated occlusion are generated during training, which improves the robustness of convolutional neural network models. Experiments conducted on two benchmark datasets, Market1501 and DukeMTMC-reID, demonstrate that Random Linear Interpolation improves the performance of baseline CNN models for person re-identification. However, there are also many standard single-shot person re-identification datasets, such as VIPeR and PRID450S, with only one image per camera for each pedestrian, which poses a serious problem for existing data augmentation methods. Therefore, in future work we will optimize our method and explore a better data augmentation method based on generating mixed pedestrian backgrounds to solve the person re-identification problem on single-shot datasets.
References
Bai S, Bai X (2016) Sparse contextual activation for efficient visual re-ranking. IEEE Trans Image Process 25(3):1056–1069
Barbosa IB, Cristani M, Caputo B, Rognhaugen A, Theoharis T (2017) Looking beyond appearances: synthetic training data for deep cnns in re-identification. Computer Vision & Image Understanding
Boroujeni FR, Wang S, Li Z, West N, Stantic B, Yao L, Long G (2018) Trace ratio optimization with feature correlation mining for multiclass discriminant analysis. In: The AAAI conference on artificial intelligence
Chang X, Nie F, Wang S, Yang Y, Zhou X, Zhang C (2016) Compound rank- k projections for bilinear analysis. IEEE Transactions on Neural Networks & Learning Systems 27(7):1502–1513
Chapelle O, Weston J (2011) Vicinal risk minimization. Advances in Neural Information Processing Systems, pp 416–422
Cheng D, Chang X, Liu L, Hauptmann AG, Gong Y, Zheng N (2017) Discriminative dictionary learning with ranking metric embedded for person re-identification. In: Twenty-sixth international joint conference on artificial intelligence, pp 964–970
Datta R, Joshi D, Li J, Wang JZ (2008) Image retrieval: ideas, influences, and trends of the new age. ACM Comput Surv 40(2):1–60
Felzenszwalb P, Mcallester D, Ramanan D (2008) A discriminatively trained, multiscale, deformable part model. Cvpr 8:1–8
Geng M, Wang Y, Xiang T, Tian Y (2016) Deep transfer learning for person re-identification. Computer Vision & Image Understanding
Goodfellow IJ, Warde-Farley D, Mirza M, Courville A, Bengio Y (2013) Maxout networks. Computer Science pp 1319–1327
He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. pp 770–778
Hermans A, Beyer L, Leibe B (2017) In defense of the triplet loss for person re-identification
Hirzer M, Roth PM, Bischof H (2012) Person re-identification by efficient impostor-based metric learning. In: IEEE Ninth international conference on advanced video and signal-based surveillance, pp 203–208
Hirzer M, Roth PM, Köstinger M, Bischof H (2012) Relaxed pairwise learned metric for person re-identification. In: European conference on computer vision, pp 780–793
Huang G, Liu Z, Maaten LVD, Weinberger KQ (2017) Densely connected convolutional networks. In: IEEE Conference on computer vision and pattern recognition, pp 2261–2269
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. pp 448–456
Kodirov E, Xiang T, Fu Z, Gong S (2007) Person re-identification by unsupervised ℓ1 graph learning. Hydrobiologia 415(11):35–40
Kodirov E, Xiang T, Gong S (2015) Dictionary learning with iterative laplacian regularisation for unsupervised person re-identification. In: British machine vision conference, pp 44.1–44.12
Kreutz-Delgado K, Murray JF, Rao BD, Engan K, Lee TW, Sejnowski TJ (2003) Dictionary learning algorithms for sparse representation. Neural Comput 15(2):349–396
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: International conference on neural information processing systems, pp 1097–1105
Lei Z, Shen J, Liang X, Cheng Z (2016) Unsupervised topic hypergraph hashing for efficient mobile image retrieval. IEEE Transactions on Cybernetics PP (99):1–14
Li J, Wei Y, Liang X, Zhao F, Li J, Xu T, Feng J (2017) Deep attribute-preserving metric learning for natural language object retrieval. In: ACM On multimedia conference, pp 181–189
Li Z, Nie F, Chang X, Yang Y (2017) Beyond trace ratio: weighted harmonic mean of trace ratios for multiclass discriminant analysis. IEEE Transactions on Knowledge & Engineering PP(99):1–1
Liao S, Hu Y, Zhu X, Li SZ (2015) Person re-identification by local maximal occurrence representation and metric learning. In: Computer vision and pattern recognition, pp 2197–2206
Lisanti G, Masi I, Bimbo AD (2014) Matching people across camera views using kernel canonical correlation analysis. pp 1–6
Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic models
Roth PM, Hirzer M, Köstinger M, Beleznai C, Bischof H (2014) Mahalanobis distance learning for person re-identification. Springer, London
Roth PM, Wohlhart P, Hirzer M, Kostinger M, Bischof H (2012) Large scale metric learning from equivalence constraints. In: IEEE Conference on computer vision and pattern recognition, pp 2288–2295
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. Computer Science
Sun Y, Zheng L, Deng W, Wang S (2017) Svdnet for pedestrian retrieval. In: IEEE International conference on computer vision, pp 3820–3828
Tanner MA, Wong WH (1987) The calculation of posterior distributions by data augmentation. Publ Am Stat Assoc 82(398):528–540
Torralba A, Fergus R, Freeman WT (2008) 80 Million Tiny Images: a Large Data Set for Nonparametric Object and Scene Recognition. IEEE Computer Society
Xiong F, Gou M, Camps O, Sznaier M (2014) Person re-identification using kernel-based metric learning methods. In: European conference on computer vision, pp 1–16
Xu B, Wang N, Chen T, Li M (2015) Empirical evaluation of rectified activations in convolutional network. Computer Science
Zhang L, Xiang T, Gong S (2016) Learning a discriminative null space for person re-identification. In: Computer vision and pattern recognition, pp 1239–1248
Zheng L, Shen L, Tian L, Wang S, Wang J, Tian Q (2015) Scalable person re-identification: a benchmark. In: IEEE international conference on computer vision, pp 1116–1124
Zheng L, Yang Y, Hauptmann AG (2016) Person re-identification: past present and future
Zheng Z, Zheng L, Yang Y (2017) Pedestrian alignment network for large-scale person re-identification
Zheng Z, Zheng L, Yang Y (2017) Unlabeled samples generated by gan improve the person re-identification baseline in vitro. pp 3774–3782
Zhong Z, Zheng L, Cao D, Li S (2017) Re-ranking person re-identification with k-reciprocal encoding. pp 3652–3661
Zhong Z, Zheng L, Kang G, Li S, Yang Y (2017) Random erasing data augmentation
Zhu L, Huang Z, Liu X, He X, Sun J, Zhou X (2017) Discrete multimodal hashing with canonical views for robust mobile landmark search. IEEE Trans Multimedia 19(9):2066–2079
Acknowledgements
This work was financially supported by the National Natural Science Foundation of China (Program No. 61702415, No. 61502387), the Natural Science Basic Research Plan in Shaanxi Province of China (Program No. 2017JM6056, 2016JQ6029) and the Talent Support Project of Science Association in Shaanxi Province (Program No. 20180108).
Cite this article
Li, Z., Guo, J., Jiao, W. et al. Random linear interpolation data augmentation for person re-identification. Multimed Tools Appl 79, 4931–4947 (2020). https://doi.org/10.1007/s11042-018-7071-5