1 Introduction

Vehicle detection, the task of locating moving vehicles in video, is an important part of intelligent transportation systems and has long been a research hotspot. It is the foundation of vehicle tracking, vehicle type identification, traffic flow statistics, video-based speed measurement, and other technologies. In practical applications, however, scenes often change dynamically and vehicles occlude one another. For example, if two adjacent vehicles are similar in appearance, the detector is likely to confuse them. The detection result then needs to be further refined by the non-maximum suppression algorithm [1], during which the prediction box on the occluded vehicle may be suppressed by the one on the adjacent vehicle, causing a detection loss. Our work improves detection accuracy under occlusion.

The main contributions of the work are:

  • We propose a bounding box regression algorithm that introduces an adjacent punishment mechanism, so that the bounding box frames its designated target while staying clear of adjacent targets.

  • We propose a proposal weak confidence suppression algorithm that reduces the probability that a proposal is suppressed by adjacent targets.

  • We conduct experiments on the PASCAL VOC, UA-DETRAC, and Highway-Vehicle datasets, which show the effectiveness of the proposed algorithm.

2 Related Work

Most early object detection algorithms are based on hand-crafted features. Malisiewicz [2] trains a separate SVM classifier on HOG features for each sample. Felzenszwalb [3] proposes a deformable part model to detect multi-class objects. In recent years, with the rise of deep learning, convolutional neural networks (CNNs) have brought new ideas to object detection [4,5,6,7,8,9,10]. Networks such as SPP-net, R-FCN, and GoogLeNet [11,12,13] can be used for object detection.

For vehicle detection, many researchers have proposed methods that are efficient and acceptably accurate [14,15,16]. However, these methods do not consider occlusion. When traffic flow on the road increases, vehicles occlude each other, which seriously degrades detection accuracy if the above methods are applied.

This paper improves the bounding box regression algorithm so that the proposal of a vehicle stays separated from adjacent targets, and proposes a proposal weak confidence suppression algorithm so that the proposal is not mistakenly suppressed by adjacent targets. Together, these changes improve detection accuracy under occlusion.

3 Implementation Method

3.1 Bounding Box Regression and Its Improvement

Bounding box regression is used in object detection methods such as R-CNN and Fast R-CNN to refine the position of a proposal so that it moves closer to its designated target. Let \( P \) and \( G \) be the original proposal bounding box and the ground-truth box, each represented by its center coordinates, width, and height. Bounding box regression seeks a map \( f \) from \( P \) to \( \hat{G} \), where \( \hat{G} \) approximates the ground-truth box and is defined as:

$$ f(P) = \hat{G} \quad \text{s.t.} \quad dist(G,\hat{G}) < dist(G,P) $$
(1)

However, this regression only moves the proposal as close as possible to its target, without considering the influence of adjacent objects. Consider the occlusion shown in Fig. 1(a), where the small box is the ground truth of A and the big box is the ground truth of B. When A is partially occluded by B, A’s proposal is likely to be misaligned because A and B look similar, as shown in Fig. 1(b): the dotted box is A’s proposal, and it may drift toward B.

Fig. 1. The proposal bounding boxes of the occluded vehicle

To solve this problem, an adjacent punishment mechanism is added to the bounding box regression algorithm. During detector training, each proposal is not only pulled toward its own ground truth, which defines the positive term \( L_{P} \), but also pushed away from the ground truths of adjacent objects, which defines the negative term \( L_{N} \). By introducing this repulsive effect of adjacent objects on the proposal bounding box, the proposal is kept from drifting to nearby similar objects when the target is occluded, which improves detection accuracy. The regression loss is defined as:

$$ L \, = \, L_{P} \, + \, L_{N} $$
(2)

The positive term \( L_{P} \) and negative term \( L_{N} \) are defined as:

$$ L_{P} (p,g) = \sum\limits_{i \in \{ x,y,w,h\} } smooth_{L_{1}} (p_{i} - g_{i} ) $$
(3)
$$ smooth_{L_{1}} (x) = \begin{cases} 0.5x^{2} & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise,} \end{cases} $$
(4)
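Equations (3) and (4) can be illustrated with a minimal sketch. The function names `smooth_l1` and `positive_term` and the example boxes are hypothetical, and raw (x, y, w, h) values are used purely for illustration; a real detector would typically apply the loss to normalized regression offsets.

```python
def smooth_l1(x: float) -> float:
    """Smooth L1 from Eq. (4): quadratic for |x| < 1, linear otherwise."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def positive_term(p, g) -> float:
    """L_P from Eq. (3): smooth L1 summed over the x, y, w, h differences."""
    return sum(smooth_l1(pi - gi) for pi, gi in zip(p, g))


# Hypothetical proposal and ground-truth box, each as (x, y, w, h).
proposal = (10.0, 10.0, 4.0, 4.0)
ground_truth = (10.5, 10.0, 4.0, 6.0)

# x differs by 0.5 (quadratic branch), h differs by 2.0 (linear branch).
print(positive_term(proposal, ground_truth))  # 1.625
```

The small x error contributes 0.125 and the larger h error contributes 1.5, which shows how the linear branch limits the influence of large deviations.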

Since \( IoU(p,G_{i} ) \in [0,1] \), the \( smooth_{L_{1}} \) function is modified for the negative term as follows:

$$ L_{N} (p,G_{n} ) = \sum\limits_{G_{i} \in G_{n} } smooth_{L_{1}^{1}} (IoU(p,G_{i} )) $$
(5)
$$ smooth_{L_{1}^{1}} (x) = \begin{cases} (0.5 + x)^{2} - 0.25 & x \le 0.5 \\ \frac{|\ln (1 - x)|}{2 - x} - 0.5 & x > 0.5, \end{cases} $$
(6)

\( L_{P} \) narrows the gap between proposals and their ground-truth boxes, while \( L_{N} \) exerts a repulsive effect on the proposal bounding box. \( G_{n} \) denotes the set of ground-truth boxes of all objects except the target, and \( IoU(p,G_{i} ) \) denotes the IoU between the proposal \( p \) and the ground-truth box \( G_{i} \).
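The negative term of Eqs. (5) and (6) can be sketched as follows. The helper names (`iou`, `repulsion_penalty`, `negative_term`) and the example boxes are hypothetical. Clamping the IoU just below 1 is an assumption added here so that the logarithm stays finite; the equations themselves do not specify this case.

```python
import math


def iou(a, b) -> float:
    """IoU of two boxes given as (center_x, center_y, width, height)."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def repulsion_penalty(x: float) -> float:
    """Modified smooth L1 from Eq. (6), implemented literally as printed."""
    x = min(x, 1.0 - 1e-9)  # assumption: keep ln(1 - x) finite at IoU = 1
    if x <= 0.5:
        return (0.5 + x) ** 2 - 0.25
    return abs(math.log(1.0 - x)) / (2.0 - x) - 0.5


def negative_term(p, other_truths) -> float:
    """L_N from Eq. (5): penalty summed over all non-target ground truths."""
    return sum(repulsion_penalty(iou(p, g)) for g in other_truths)


# Hypothetical proposal overlapping one neighbor and missing another.
proposal = (0.0, 0.0, 2.0, 2.0)
neighbors = [(1.0, 0.0, 2.0, 2.0), (10.0, 10.0, 2.0, 2.0)]
print(negative_term(proposal, neighbors))
```

A neighbor with zero overlap contributes nothing, so only adjacent objects repel the proposal. Note that, as printed, the two branches of Eq. (6) do not take the same value at x = 0.5; the sketch follows the equation as given rather than adjusting it.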

3.2 Proposal Weak Confidence Suppression Algorithm

After bounding box regression, a large number of proposals are generated near each target. To eliminate false and redundant proposals, the non-maximum suppression (NMS) algorithm removes proposals according to their overlap (IoU) and retains the proposal with the highest confidence for each target.

The NMS algorithm selects the highest-scoring box \( b_{m} \) from the proposal set \( B \), removes from \( B \) every proposal whose IoU with \( b_{m} \) is greater than the threshold \( N_{t} \), and repeats until \( B \) is empty. However, NMS has drawbacks. Under dense occlusion, the manually chosen \( N_{t} \) strongly affects detection accuracy: if \( N_{t} \) is too large, it causes false detections; if it is too small, it causes detection losses.
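The greedy NMS procedure described above can be sketched as follows. The function names, the corner-format boxes, and the example values are hypothetical and chosen only to show how a heavily overlapped box is discarded.

```python
def iou(a, b) -> float:
    """IoU of two boxes given as (x1, y1, x2, y2) corners."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, nt=0.5):
    """Return indices of kept boxes, highest score first."""
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while remaining:
        m = remaining.pop(0)  # highest-scoring box b_m
        keep.append(m)
        # Drop every proposal whose IoU with b_m exceeds the threshold N_t.
        remaining = [i for i in remaining if iou(boxes[i], boxes[m]) <= nt]
    return keep


# Hypothetical case: boxes 0 and 1 cover two adjacent, overlapping vehicles.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, nt=0.5))  # [0, 2]
```

Box 1 has an IoU of about 0.82 with box 0 and is deleted outright. If it belonged to a second, occluded vehicle, that vehicle would be lost, which is the failure mode discussed above.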

Therefore, we improve NMS and propose a proposal weak confidence suppression algorithm. Instead of deleting proposals from the set \( B \), the algorithm lowers the confidence \( S_{i} \) of a proposal \( b_{i} \) according to its IoU with \( b_{m} \), which avoids false suppression and reduces the impact of \( N_{t} \). Let \( U_{p} \) denote the IoU between \( b_{i} \) and \( b_{m} \). If \( U_{p} \) is greater than the threshold \( N_{t} \), the confidence \( S_{i} \) of \( b_{i} \) is multiplied by the confidence attenuation coefficient \( \alpha \):

$$ \alpha = 1 - U_{p} \ln (1 + U_{p} ) $$
(7)

The coefficient reduces the confidence \( S_{i} \). Conversely, if \( U_{p} \) is less than or equal to \( N_{t} \), the confidence \( S_{i} \) is left unchanged. The process repeats until the set \( B \) is empty. Finally, the prediction box set \( D \) and the confidence set \( S_{d} \) are output.
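A minimal sketch of the weak confidence suppression loop, under the reading given above, is shown here. The function names, the corner-format boxes, and the example values are hypothetical. Any final filtering of the attenuated confidences against a detection threshold is left to the caller.

```python
import math


def iou(a, b) -> float:
    """IoU of two boxes given as (x1, y1, x2, y2) corners."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def attenuation(up: float) -> float:
    """Confidence attenuation coefficient alpha from Eq. (7)."""
    return 1.0 - up * math.log(1.0 + up)


def weak_confidence_suppression(boxes, scores, nt=0.5):
    """Return the prediction boxes D and their confidences S_d."""
    remaining = list(range(len(boxes)))
    conf = list(scores)  # working copy of the confidences S_i
    d, sd = [], []
    while remaining:
        m = max(remaining, key=lambda i: conf[i])  # current best box b_m
        remaining.remove(m)
        d.append(boxes[m])
        sd.append(conf[m])
        for i in remaining:
            up = iou(boxes[i], boxes[m])
            if up > nt:  # attenuate instead of deleting
                conf[i] *= attenuation(up)
    return d, sd


# Hypothetical case: boxes 0 and 1 overlap heavily, box 2 is isolated.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
d, sd = weak_confidence_suppression(boxes, scores, nt=0.5)
print(d)
print(sd)
```

In this example all three boxes survive. The overlapped box keeps a reduced confidence of roughly 0.41 instead of being removed, so the decision about it is deferred to the confidence threshold rather than forced by \( N_{t} \).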

4 Experiments and Results Analysis

We use two open-source image datasets: the PASCAL VOC dataset and the UA-DETRAC dataset [17]. The PASCAL VOC dataset, used in the PASCAL VOC Challenge, contributes a total of 1,659 vehicle images. The UA-DETRAC dataset, captured mainly in Beijing and Tianjin, contributes 6,250 vehicle images. In addition, we select peak-time video from the Shanghai-Hangzhou-Ningbo Expressway, split it into frames to obtain images, and employ 20 students to annotate the images manually. The images and annotations are assembled into the Highway-Vehicle dataset, which contains 12,800 vehicle images.

To verify performance, Faster R-CNN is used as the detector, with its VGG-16 network replaced by the ResNet-101 network, which has stronger feature extraction capability. The parameters are set as follows: the training learning rate is 0.001, the attenuation step size is 30000, the attenuation coefficient is 0.1, the training batch size is 128, and the detection confidence threshold is 0.5.
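The learning rate settings above can be read as a staircase (step) decay schedule. That reading is an assumption, since the form of the schedule is not stated; the function name `learning_rate` and its parameter names are hypothetical.

```python
def learning_rate(step: int,
                  base_lr: float = 0.001,
                  decay_steps: int = 30000,
                  decay_rate: float = 0.1) -> float:
    """Assumed staircase decay: multiply by decay_rate every decay_steps."""
    return base_lr * decay_rate ** (step // decay_steps)


# Learning rate at a few hypothetical training steps.
for step in (0, 29999, 30000, 60000):
    print(step, learning_rate(step))
```

Under this assumption the rate stays at 0.001 for the first 30000 steps and then drops by a factor of ten at each subsequent multiple of 30000.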

For evaluation, a prediction box whose IoU with its ground-truth box is greater than 0.5 is a correct detection; otherwise it is a false detection. An annotated vehicle that is not detected is a detection loss. On the three image datasets, Faster R-CNN and its improved variant are evaluated by five-fold cross-validation under different IoU thresholds. The average precision (AP) of the two methods is shown in Table 1.
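The three outcomes defined above can be counted with a short sketch. The function names and example boxes are hypothetical. Matching each ground-truth box to at most one prediction, in descending order of confidence, is an assumption borrowed from common detection benchmarks and is not stated in the text.

```python
def iou(a, b) -> float:
    """IoU of two boxes given as (x1, y1, x2, y2) corners."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def evaluate(predictions, ground_truths, iou_thr=0.5):
    """Return (correct detections, false detections, detection losses).

    predictions are assumed to be sorted by descending confidence.
    """
    matched = set()
    correct = false = 0
    for p in predictions:
        best_j, best_iou = None, 0.0
        for j, g in enumerate(ground_truths):
            if j in matched:
                continue  # assumption: one prediction per ground truth
            overlap = iou(p, g)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None and best_iou > iou_thr:
            matched.add(best_j)
            correct += 1
        else:
            false += 1
    lost = len(ground_truths) - len(matched)
    return correct, false, lost


# Hypothetical frame: one good prediction, one stray box, one missed vehicle.
predictions = [(0, 0, 10, 10), (50, 50, 60, 60)]
ground_truths = [(1, 0, 11, 10), (20, 20, 30, 30)]
print(evaluate(predictions, ground_truths))  # (1, 1, 1)
```

Counts of this kind, accumulated over a dataset at varying confidence levels, are what precision, recall, and ultimately AP are built from.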

Table 1. Vehicle detection AP of two methods on three datasets

The experimental results show that detection performance on all three image datasets improves when the bounding box regression algorithm with the adjacent punishment mechanism and the proposal weak confidence suppression algorithm are introduced. When the IoU threshold is 0.4, the proposed method yields the best performance. These results indicate that the proposed method effectively improves the accuracy and stability of the detector under occlusion.

5 Conclusion

Vehicles have always been an important target for object detection. The complexity of real road environments, vehicle occlusion in particular, challenges the accuracy and stability of object detection algorithms. Compared with the traditional object detection algorithm, the proposed algorithm keeps each proposal close to its designated target and clear of other nearby targets, and reduces the probability that the proposal is mistakenly suppressed by adjacent targets, thereby improving the detection performance of the detector. The experimental results show that the improved Faster R-CNN is more effective in vehicle detection.