Keywords

1 Introduction

With the popularity of video surveillance, more and more video surveillance systems are employed in the public areas, such as shopping malls, airports and hospitals. The control center of this system is usually connected with multiple cameras which are distributed in different areas. The control center operator can find and track a specific pedestrian (such as a criminal suspect) by observing the cameras. However, with the increasing number of cameras in the surveillance system, it becomes increasingly difficult to manually find and track pedestrians. Therefore, monitoring system needs to automatically find and track pedestrians.

The key of this monitoring system is to match the same pedestrian under different cameras, which is known as person re-identification. The problem of person re-identification can be expressed as follows: supposed that the existing n pedestrians pass through camera A and camera B in turn, the pedestrian images captured by camera A and camera B are called the probe images and the gallery images, respectively. Each probe image could determine a sort of the gallery images. The top-ranked gallery images are considered to be more similar to the probe images, in other words, they are more likely to be the same pedestrian.

However, many factors affect the performance of the person re-identification, such as various camera viewpoints, illumination, occlusion and the limitation of metric function. In recent years, there are two key points of person re-identification methods, i.e. extracting robust features and learning discriminative metrics. For feature representation, Liao et al. [10] propose an efficient feature representation called Local Maximal Occurrence (LOMO), using color and Scale Invariant Local Ternary Pattern (SILTP) histograms to represent picture appearance in a high dimension. In [15], a method called Gaussian of Gaussian (GOG) descriptor is proposed based on a hierarchical Gaussian distribution of pixel features. Lisanti et al. [12] propose a kernel descriptor to encode person appearance and project the data into common subspace using Kernel Canonical Correlation Analysis (KCCA). Chen et al. [1] propose an Spatially Constrained Similarity function on Polynomial feature map as SCSP to divide an image into four non-overlapping horizontal stripe regions, and each stripe region can be described by four visual cues which are organized as HSV1/HOG, HSV2/SILPT, LAB1/SILPT and LAB2/HOG. For distance metric, methods are designed to maximize the inter-class similarity and minimize the intra-class similarity. KISS Metric Learning (KISSME) [5], Crossview Quadratic Discriminant Analysis (XQDA) [10], Metric Learning with Accelerated Proximal Gradient (MLAPG) [11], Top-push Distance Learning model (TDL) [21] are representative methods.

While the first image of the list is usually not the true matching image in existing person re-identification methods. In order to solve this problem and improve the accuracy of person re-identification, a large number of researches called re-ranking have emerged for person re-identification [2, 3, 6, 7, 13, 20, 22]. Some of these researches [13] require the supervision of tag information, while this paper prefers unsupervised automatic re-ranking research. The method proposed in [3] learns an unsupervised re-ranking model by jointly considering content and context information in a sorting list for effectively eliminating fuzzy samples and improving the performance of person re-identification. Lend et al. [20] propose a bidirectional ranking method which combines the similarity of the content with context, and uses the new similarity to correct the initial sorting list. Recently, it is increasingly popular to correct the initial sorting by using the k-reciprocal neighbors [23]. However, the method in [23] has no effect if there is only one positive for each identity in the gallery (single-shot). In order to solve this problem, this paper proposes a forward and reverse sorting constraints algorithm under single-shot to significantly improve pedestrian re-identification performance.

2 Proposed Approach

2.1 Problem Definition

Given a probe set with M pedestrian images \(P = \{ {p_i}|i = 1,2,...,M\} \) and a gallery set with N pedestrian images \(G = \{ {g_j}|j = 1,2,...,N\} \), the original distance between the pedestrian images \({p_i}\) and \({g_j}\) is represented as \(d({p_i},{g_j})\). Therefore, the initial ranking list \(L({p_{i,}}G) = \{ g_{k,j}^i|i \in (1,2,...M);j,k \in (1,2,...N)\} \) can be obtained by calculating the distances between the probe image \({p_i}\) and all gallery images G, where \(g_{k,j}^i\) denotes the gallery \({g_j}\) which is ranked k-th in the list of \({p_i}\). The goal of re-ranking is to modify the initial ranking list \(L({p_i},G)\) so that more positive samples are ranked at the top of the list to improve the performance of person re-identification.

2.2 Forward Sorting Constraint

The matching of person re-identification is to calculate the distances between a probe image and all gallery images, then we sort the image of the gallery based on the distances. The first image of the list is considered to be the true match of the probe image. This sorting method is called forward sorting in this paper. However, the true matching results are not ranked first in the sorting list after the forward sorting. Inspired by [16], we propose the algorithm of forward sorting constraint.

Fig. 1.
figure 1

Forward sorting constraint: (a) Initial ranking results. (b) Re-ranking results.

As shown in Fig. 1(a), the left images are the probe images, and the right images are the sorting of gallery images for them. The values in parentheses of the figure indicate the distance values between the corresponding gallery images and the probe images. It can be seen from the figure, the gallery image \({g_1}\) is respectively rank-1 in the sorting list of \({p_1}\), rank-1 in the sorting list of \({p_2}\), and rank-2 in the sorting list of \({p_3}\). Therefore, according to the ranking, the possibilities that the gallery image \({g_1}\) and the probe images \({p_1}\), \({p_2}\) are the same pedestrian are higher than that of \({p_3}\). But it still cannot be judged that whether \({p_1}\) or \({p_2}\) is more likely to be the same pedestrian as \({g_1}\). Therefore, this paper combines the distance information with the constraint of the sorting information to judge. From the Fig. 1(a), because of \(d({p_1},{g_1}) < d({p_2},{g_1})\), the possibility that \({g_1}\) and \({p_1}\) are the same pedestrian is higher than the possibility that \({g_1}\) is the same pedestrian as \({p_2}\).

In summary, the possibility that the gallery image \({g_1}\) and the probe image \({p_1}\) are the same pedestrian is the highest. We assume that \({g_1}\) and \({p_1}\) are the same pedestrian, so \({g_1}\) and \({p_2}\), \({p_3}\) should be different pedestrian. Under the situation, the matching accuracy can be improved by “punishing” the distance between \({g_1}\) and \({p_2}\), \({p_3}\) (increasing the distance between them). The result of re-ranking after “punishment” is shown in Fig. 1(b). The ranking of correct matching is improved in the forward sorting list of \({p_2}\), \({p_3}\).

According to the above analysis, how to “punish” the distances of false matches becomes the primary problem. The algorithm of this paper gives the appropriate “punishment” based on the original distance value of mismatches. In short, the smaller (larger) original distance, the smaller (heavier) “punishment”. In this way, some excessive punishment can be effectively avoided. The detailed algorithm is summarized in Algorithm 1.

figure a

2.3 Reverse Sorting Constraint

The forward sorting uses the distance values between the probe images and the gallery images to sort the gallery images. In turn, the distance values between them can also sort the probe images, and the sorting method is called the reverse sorting in this paper.

In Fig. 2, we show an example of the reverse sorting constraint. The true match (green box) of the probe image \({p_1}\) is incorrectly ranked, shown in Fig. 2(a). Figure 2(b) shows an reverse sorting list of the first two gallery images from the forward sorting list in Fig. 2(a). It is not difficult to find that the probe image \({p_1}\) is located in the third and first positions of reverse sorting list of the gallery images \({g_1}\) and \({g_2}\), respectively. There are some distinctions between Fig. 2(a) and 2(b). Figure 2(a) shows that the position of \({g_2}\) is relatively low-ranking in the forward sorting list of \({p_1}\). In Fig. 2(b), in the reverse sorting list of \({g_2}\), the position of \({p_1}\) is relatively high-ranking. It inspires us to use the reverse sorting information to constrain the initial sorting.

Fig. 2.
figure 2

Reverse sorting constraint: (a) Probe image and its forward sorting list. (b) Gallery images \({g_1}\), \({g_2}\) and their reverse sorting lists. (c) Re-ranking results.

Fig. 3.
figure 3

An example of the results obtained by applying clustering algorithm to the distance computed between a probe and all the gallery images.

According to the above analysis, in order to efficiently use the reserve sorting, this paper does not implement reverse sorting constraint on all gallery images, while applies to several top gallery images in the forward sorting list. The top-ranked gallery images have a higher probability of containing true matches, so they are called content sets. In this paper, the reverse sorting constraint algorithm selects the content sets according to the dynamic method proposed by Jorge García et al. [2]. Figure 3 shows the relationship of the position of the gallery images in the forward sorting list of a probe image and the distance between the gallery image and the probe image: (1) at first ranks, the distance between the gallery image and the probe image increases abruptly, then flattens(first elbow); (2) from the first elbow, distances grow linearly till reaching high ranks, and at last, distances start increasing significantly again (second elbow). According to such trend, the gallery images can be divided into three classes: (1) the similar appearance class \(({C_{sa}})\), which corresponds to the gallery images whose positions are before the first elbow; (2) the difference appearance class \(({C_{da}})\), which corresponds to the gallery images whose positions are between the first elbow and the second elbow; (3) the opposite appearance class \(({C_{oa}})\), which corresponds to the gallery images whose positions are after the second elbow.

In order to find the positions of the three elbows and effectively divide the three types of the gallery set, this paper uses the k-means clustering algorithm to divide the original gallery set. The specific approach is as follows. First, defining the cluster mean value: \({\mu _{sa}} = d({p_i},g_1^i)\), \({\mu _{da}} = d({p_i},g_{(N/2)}^i)\) and \({\mu _{oa}} = d({p_i},g_N^i)\) where \(g_k^i\) denotes a gallery image ranked k-th in the forward sorting list of probe image \(p_i\). Then the cluster mean is substituted into the following cost function:

$$\begin{aligned} \sum \limits _{k \in \{ sa,da,oa\} } {\sum \limits _{g_j^i \in {C_k}} {||d({p_i},g_j^i) - {\mu _k}|{|^2},j = 1,2,...,N} } \end{aligned}$$
(1)

After continuously iterating and optimizing, the above cost function will converge. The similar appearance class will contain the first m gallery images of the forward sorting list, forming the content sets \(B_i^{cn} = \{ g_1^i,g_2^i,...,g_m^i\} \). The reverse sorting constraint algorithm will be described in detail.

Given a probe image \({p_i}\) and its forward sorting list \(L({p_i},G)\). First of all, according to the method described above, we find the content set \(B_i^{cn}\) of the forward sorting list and then calculate the reverse sorting list \(L(g_1^i,P)\) and \(L(g_m^i,P)\) of gallery images \(g_1^i\) and \(g_m^i\) on the content set. Finally, we give certain “rewards” to the initial distance between \({p_i}\) and \(g_1^i\) and the distance between \({p_i}\) and \(g_m^i\) (decreasing the distance between them). The “reward” here is calculated by a reward function that gives more “rewards” to the distance between the top-ranked probe image and the gallery image of the reverse sorting list. According to the above properties, this paper proposes a reward function called Reward, as shown in the Eq. (2). It is worth noting that other reward functions that meet the above properties can also be used here.

$$\begin{aligned} Reward({p_i},{g_j}) = 2 - {e^{0.01*rank({g_j},{p_i})}} \end{aligned}$$
(2)
figure b

The \(rank({g_j},{p_i})\) indicates the position of the probe image \({p_i}\) in the reverse sorting list of the gallery image \({g_j}\). Firstly, the “reward” is given to the distances between \({p_i}\) and \(g_1^i\), \(g_m^i\), then we update the sorting list. Secondly, the “reward” operation is repeated on the distances between \({p_i}\) and new \(g_1^i\), \(g_{m-1}^i\), then we update the sorting list again. Finally, we repeat the operation until “reward” the distances between \({p_i}\) and \(g_1^i\), \(g_{2}^i\). The complete algorithm is summarized in Algorithm 2.

3 Experiments

3.1 Results of Different Datasets

We apply the proposed re-ranking algorithm on four different datasets of one baseline method to prove the universality of the method. The baseline method extracts hierarchical Gaussian descriptor [15] and uses cross-view quadratic discriminant analysis (XQDA) as distance metric. The four standard datasets include VIPeR [4], CUHK01 [8], PRID450S [18] and CUHK03 [9]. The test protocol of each dataset is the same as literature [15].

Table 1. Matching rates (%) of this algorithm on different datasets

The VIPeR dataset is a challenging dataset for person re-identification task, which contains 632 person image pairs, captured by different cameras in an outdoor environment. The CUHK01 dataset contains 971 persons and each person has two images in each camera. The PRID450S dataset is an extension of the PRID2011 dataset. It contains 450 pedestrian image pairs captured by two outdoor cameras. The background, lighting and viewpoints of two cameras are very different. The CUHK03 dataset contains 13164 images of 1360 identities captured by six surveillance cameras. Each pedestrian in the dataset has an average of 4.8 images per view. It provides bounding boxes manually labeled pedestrian and bounding boxes automatically detected by the pedestrian detector.

As shown in the Table 1, experimental results on above four datasets demonstrate that our approach achieves obvious improvements on the baseline method. Among them, the rank-1 matching rates increase by \(5.32\mathrm{{\% }}\) at least and \(8.85\mathrm{{\% }}\) at most. At the same time, there are also improvements on rank-5 and rank-10. It can be concluded that the proposed algorithm can be applied to different datasets and will not be affected by the acquisition environment and magnitude of datasets.

3.2 Results of Different Person Re-Identification Algorithms

In order to verify the scalability of the proposed algorithm, we apply the proposed re-ranking algorithm on four different person re-id methods of the VIPeR dataset.

Table 2. Matching rates (%) of different pedestrian re-identification algorithms

As shown in Table 2, our re-ranking algorithm can significantly improve performances for different person re-identification algorithms. And our method improves the rank-1 matching rates by 4% at least.

3.3 Comparison with Other Re-Ranking Methods

In this section, we compare our method with other re-ranking algorithms, Prototype-Specific Feature Importance (PSFI) [14], Individual-Specific Feature Importance (ISFI) [14], and Probe-Specific re-ranking (PSR) [19]. The results are copied from their papers and recorded in Table 3 for comparison.

It is shown in Table 3 that all the re-ranking methods can achieve higher matching accuracies compared with their baseline algorithms. Compared with our baseline method GOG, our result increases by 5.88% at Rank-1, which is the best improvement at rank1 among the compared methods. It is shown that our method outperforms these existing methods. Besides, our final results are better than those of the compared methods.

3.4 Effect of Major Components

From Sects. 2.2 and 2.3, it can be observed that the proposed re-ranking algorithm contains two parts: forward sorting constraint and reverse sorting constraint. The evaluation of each part of the algorithm is performed with the Cumulative Matching Characteristics (CMC) curve on the VIPeR dataset.

Table 3. Matching rates (%) of different re-ranking methods

As shown in Fig. 4, both parts of the re-ranking algorithm (red and blue curves) have achieved good performances compared with the original results of baseline (black curve), and the combination of both parts will have better performance.

Fig. 4.
figure 4

Performance is obtained by separately considering the forward sorting constraint and reverse sorting constraint. Results have been computed considering the GOG baseline. (Color figure online)

3.5 Effect of Hyper-parameter on Algorithm Performance

Hyper-parameter \(\lambda \) is introduced to implement different levels of “punishment” for error matching in the algorithm 1. Figure 5 shows the influence of different \(\lambda \) values at rank-1 on three datasets. It is shown that the performances of person re-identification are best for VIPeR, CUHK01, and PRID450S, when \(\lambda \) values are 0.6, 0.5, and 0.5, respectively.

Fig. 5.
figure 5

The impact of the parameter \(\lambda \) on person re-identification performance on three datasets. (a) VIPeR dataset. (b) CUHK01 dataset. (c) PRID450S dataset.

4 Conclusions

In this paper, we use the implicit constraint information in the initial sorting list, which is formed by the existing person re-identification algorithm, and we conversely re-rank the initial sorting list to improve the performance of the original person re-identification algorithm. The experimental results show that the application of this algorithm on different datasets can significantly improve the performance of the original person re-identification algorithm, especially when the dataset is relatively large, the effect is even more gratifying. It is worth mentioning that the forward and reverse sorting constraints algorithm proposed in this paper is fully automatic and unsupervised, and it can be easily applied to existing person re-identification algorithms.