Occlusion Detection in Visual Tracking: A New Framework and A New Benchmark

Niu, Xiaoguang; Gu, Yueyang; Lu, Zhifeng; Hong, Zehua; Tian, Yi; Xu, Kuan; Yang, Jie; Fang, Xingqi; Qiao, Yu

doi:10.1007/978-3-030-04212-7_51

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11304)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Occlusion remains a challenge in visual object tracking. Robustness to occlusion is critical for tracking algorithms, yet it has received little attention. In this paper, we first propose an occlusion detection framework that calculates the proportion of the target that is occluded and uses it to decide whether to update the target model. This framework can be integrated with existing tracking algorithms to increase their robustness to occlusion. We then introduce a new benchmark containing sequences in which occlusion is the main difficulty. The sequences are chosen from public benchmarks and are fully annotated. The proposed framework is combined with several standard trackers and evaluated on the new benchmark. The experimental results show that explicitly incorporating occlusion detection through our framework improves tracking performance.

This research is partly supported by USCAST2015-13, USCAST2016-23, SAST2016008, NSFC (No: 61375048).

1 Introduction

Generic object tracking [1, 3, 5, 6, 7, 12, 13, 14], where the tracker is not specialized to any specific category of objects, has been a popular research field in recent years. Because the tracker is category-agnostic, it is not possible to train a detector offline for a particular type of object, such as pedestrians or hands. Consequently, occlusion is the most challenging factor for generic object trackers [8], since the trackers usually cannot discriminate the occluders from the targets.

Most work on handling occlusion adds a sub-module before the target model updater to monitor tracking reliability. In [20], feedback from the tracking results is used to decide whether to update the target model. However, this strategy still cannot tell what is actually happening, occlusion or target appearance variation, since both decrease the tracking confidence.

COD (Context-based Occlusion Detection for Tracking) [15, 16, 17] is a framework that monitors the background-patches around the target and identifies which of them occlude the target. However, it has several drawbacks. First, the number of background-patches that COD monitors is constant, which limits the adaptability of the framework. Second, determining whether occlusion occurs simply from the number of occluders over-simplifies the problem and is not guaranteed to be reasonable in all cases. To solve these issues, we present Adaptive COD, which adapts to differently sized targets and identifies what proportion of the target is affected by occlusion. The number of background-patches now depends on the perimeter of the target, so more background-patches are allocated to a larger target. After acquiring the positions of the background-patches that occlude the target, we calculate the proportion of the target that is under occlusion. If the proportion is greater than a threshold, the model updater takes no action, which prevents the model from being corrupted. The background-patches that occlude the target continue to be monitored, while the other background-patches are discarded and new ones are generated around the new target position. As a general framework, Adaptive COD can be integrated with any existing tracking algorithm to address the occlusion problem.

To better evaluate the performance of different trackers and promote the development of tracking algorithms, several benchmarks have been built. OTB [21], VOT [10], and ALOV [19] are the most widely used ones. In OTB [21], each sequence is tagged with 9 attributes, including occlusion, illumination variation and so on, which represent the challenging factors in visual tracking. A sequence is tagged with the attribute ‘occlusion’ if occlusion happens in any of its frames. In VOT [10], the attribute annotation is further refined to the per-frame level. Later, in NUS-PRO [11], occlusion is classified into three levels: no occlusion, partial occlusion and full occlusion. Recently, attribute-specific benchmarks have appeared. In [18], a dataset for fast moving objects is collected. A higher frame rate video dataset is proposed in [4]. Although occlusion is one of the attributes in OTB [21] and VOT [10], the frames where occlusion happens take up only a small proportion of the overall sequence. Moreover, before the tracker reaches these frames, its tracking results may already have drifted from the ground truth, which means that different trackers start from different initial conditions when their robustness to occlusion is evaluated. In this paper, we build an attribute-specific benchmark containing sequences in which the target undergoes occlusion. In our proposed dataset, we exclude other attributes and preserve only the frames relevant to occlusion. Each sequence contains three parts: before, during and after occlusion. We evaluate our model updating strategy by integrating it with several standard tracking algorithms, including KCF [7], SAMF [14], DSST [3] and Staple [1]. The experimental results show that Adaptive COD improves the robustness of these tracking algorithms.

In summary, the main contributions of this paper are as follows:

1. We improve the occlusion detection framework in [17]. The number of background-patch trackers is adaptive to the size of the target. A new model updating strategy is proposed.
2. We establish a new dataset where the sequences contain occlusion for evaluating the robustness of tracking algorithms.
3. Extensive experiments demonstrate the effectiveness of our occlusion detection framework and occlusion benchmark.

2 Occlusion Detection Framework

In this section we first briefly review the Context-based Occlusion Detection for Tracking (COD) framework [17]. Then the proposed Adaptive COD is presented.

2.1 COD Review

Based on the assumption that both the target and the background-patches are involved in occlusion, COD [17] pays attention to the background around the target to actively detect occlusion. As shown in Algorithm 1, two kinds of trackers exist in the framework: the target tracker and the background-patch trackers. The target tracker estimates the bounding box of the target in the current frame, while the background-patch trackers provide the position and tracking reliability of every background-patch surrounding the target. Intuitively, if the bounding boxes of a background-patch and the target overlap and the background-patch has high tracking reliability (hence it is not occluded by the target), then the target is occluded by the background-patch. Please refer to [17] for more details.
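The intuition above can be illustrated with a minimal sketch. It is not the implementation from [17]: the `Box` type, the function names, the strict-overlap test and the example reliability values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: float  # upper-left corner, horizontal coordinate
    y: float  # upper-left corner, vertical coordinate
    w: float  # width
    h: float  # height


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share a region of positive area."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def find_occluders(target, patches, reliabilities, r_th):
    """Indices of patches that overlap the target and are tracked reliably.

    A reliably tracked patch is assumed not to be hidden by the target,
    so an overlap is read as the patch occluding the target.
    """
    return [i for i, (p, r) in enumerate(zip(patches, reliabilities))
            if r > r_th and boxes_overlap(target, p)]


target = Box(50, 50, 40, 80)
patches = [Box(80, 60, 20, 20),   # overlaps the target, reliable
           Box(10, 10, 20, 20),   # no overlap
           Box(45, 100, 20, 20)]  # overlaps the target, unreliable
reliabilities = [9.0, 9.0, 2.0]   # hypothetical reliability scores
occluders = find_occluders(target, patches, reliabilities, r_th=5.0)
print(occluders)  # [0]
```

In COD the length of this list plays the role of the occluder count N discussed next.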

However, COD has the following disadvantages. Firstly, the number of background-patches $N_{1}$ is constant for variously sized targets in different sequences. For small targets, $N_{1}$ is relatively large, so many background-patches overlap each other, causing double counting and repeated calculation. For large targets, $N_{1}$ is relatively small, so the background around the target is not fully monitored. Secondly, the decision on whether to update the target model online is made by comparing the number of background-patches that occlude the target, $N$, with a constant threshold $N_{th}$. For targets of different sizes, $N$, being merely a count, cannot properly measure the degree of occlusion.

2.2 Adaptive COD

We propose an Adaptive COD to overcome the limitations of COD mentioned in Sect. 2.1. Adaptive COD inherits the structure from COD but differs in two important aspects: the initialization step and the criterion for identifying occlusion. They are shown in Algorithm 1.

Denote the bounding box of the target in frame $t$ as $(x_t,y_t,w_t,h_t)$ for $t=1,\ldots,T$, where $(x_t,y_t)$ are the coordinates of the upper-left corner and $(w_t,h_t)$ are the width and height. We then set $N_1 = [ \ (w_1+h_1)/2 \ ]$, where $[x]$ rounds $x$ to its nearest integer. In this way, the number of background-patches depends on the size of the target. Unless the scale of the target varies heavily, we keep using $N_1$ in the following frames. The results can be seen in Fig. 1.
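As a small worked example of this initialization rule, the sketch below computes $N_1$ for two target sizes. The function name is hypothetical, and rounding ties upward is an assumption, since the text does not state how ties are broken (Python's built-in `round` rounds ties to the nearest even integer, so it is avoided here).

```python
import math


def num_background_patches(w1: float, h1: float) -> int:
    """N_1 = [(w_1 + h_1) / 2], with ties rounded up (an assumption)."""
    return math.floor((w1 + h1) / 2 + 0.5)


small = num_background_patches(20, 30)    # (20 + 30) / 2 = 25
large = num_background_patches(81, 200)   # (81 + 200) / 2 = 140.5, rounded up
print(small, large)  # 25 141
```

A larger target therefore receives proportionally more background-patches than a small one.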

We propose a new criterion for identifying occlusion. For a target with parameters $(x_t,y_t,w_t,h_t)$, we build a mask $M_t$ as follows:

$$\begin{aligned} M_t(x,y)= \begin{cases} 1, & \text{if } x \in [x_t,x_t+w_t] \text{ and } y \in [y_t,y_t+h_t] \\ 0, & \text{otherwise} \end{cases} \end{aligned}$$

(1)

That is, $M_t$ has the same size as the frame, and the region representing the target is set to 1. The area of the target region is $A_t=\sum {M_t}$. Similarly, for a background-patch with parameters $(bx_t^i,by_t^i,bw_t^i,bh_t^i)$ for $i=1,2,\ldots,N_1$, we build a mask $m_t^i$. Denoting the tracking reliability of background-patch $i$ as $r_t^i$, which is usually calculated as the Peak-to-Sidelobe Ratio [2], we update $M_t$ as

$$\begin{aligned} M_t = \begin{cases} M_t - m_t^i, & \text{if } r_t^i > r_{th} \\ M_t, & \text{otherwise} \end{cases} \end{aligned}$$

(2)

where $r_{th}$ is the reliability threshold. After inspecting every background-patch and updating $M_t$, the area of the target that is not occluded is $S_t=\sum {M_t}$. We use $\gamma _t = S_t\ / \ A_t$ as the measurement of occlusion, as demonstrated in Fig. 1; a smaller $\gamma _t$ means that a larger part of the target is occluded. Compared with using $N$ as the indicator of occlusion in COD, the new area-based adaptive criterion is meaningful for targets of any size.
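The computation of $\gamma_t$ from Eqs. (1) and (2) can be sketched as follows in plain Python. Two interpretations are assumptions of this sketch rather than statements from the paper: boxes cover the half-open pixel range $[x, x+w)$ so that the mask area equals $w \times h$, and the subtraction in Eq. (2) is clipped at zero so that patch pixels outside the target do not produce negative entries. Function names and the example values are hypothetical.

```python
def box_mask(box, width, height):
    """Eq. (1): frame-sized binary mask that is 1 inside box = (x, y, w, h).

    Assumption: the box covers pixels x0 <= x < x0 + w, y0 <= y < y0 + h.
    """
    x0, y0, w, h = box
    return [[1 if x0 <= x < x0 + w and y0 <= y < y0 + h else 0
             for x in range(width)]
            for y in range(height)]


def unoccluded_ratio(target, patches, reliabilities, r_th, width, height):
    """Return gamma_t = S_t / A_t, the fraction of the target not occluded."""
    M = box_mask(target, width, height)
    A = sum(map(sum, M))  # target area A_t
    for patch, r in zip(patches, reliabilities):
        if r > r_th:  # Eq. (2): only reliably tracked patches are subtracted
            m = box_mask(patch, width, height)
            # Assumption: clip at 0 where the patch lies outside the target.
            M = [[max(a - b, 0) for a, b in zip(row_M, row_m)]
                 for row_M, row_m in zip(M, m)]
    S = sum(map(sum, M))  # unoccluded area S_t
    return S / A


target = (20, 20, 40, 40)            # area 1600
patches = [(50, 20, 20, 40),         # covers a 10 x 40 strip of the target
           (10, 10, 20, 20)]         # overlaps the target but is unreliable
reliabilities = [10.0, 1.0]          # hypothetical reliability scores
gamma = unoccluded_ratio(target, patches, reliabilities,
                         r_th=5.0, width=100, height=100)
print(gamma)  # 0.75
```

Because overlapping occluders remove each target pixel at most once, the ratio does not double count regions covered by several patches.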

After identifying occlusion, the algorithm decides whether to update the target tracker. The background-patches identified as occluders continue to be monitored. Meanwhile, the other background-patches, which do not occlude the target, are dropped, and new background-patches around the target in the current frame are added to the monitoring set.
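A minimal sketch of this per-frame bookkeeping is given below. The threshold name `gamma_th`, the choice to update when $\gamma_t$ equals the threshold, and the string patch identifiers are assumptions made for illustration only.

```python
def frame_decision(gamma, gamma_th, patch_ids, occluder_ids, new_patch_ids):
    """Decide on the model update and rebuild the monitoring set.

    gamma is the unoccluded fraction of the target; the target model is
    updated only when enough of the target is visible (assumed: >= gamma_th).
    """
    update_target_model = gamma >= gamma_th
    # Occluders stay in the monitoring set; all other old patches are dropped.
    kept = [p for p in patch_ids if p in occluder_ids]
    # Fresh patches sampled around the current target position are added.
    monitored = kept + [p for p in new_patch_ids if p not in kept]
    return update_target_model, monitored


update, monitored = frame_decision(
    gamma=0.75, gamma_th=0.8,
    patch_ids=["p0", "p1", "p2"],
    occluder_ids={"p0"},
    new_patch_ids=["p3", "p4"],
)
print(update, monitored)  # False ['p0', 'p3', 'p4']
```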

3 Occlusion Benchmark

In this section, we present a new specialized benchmark for evaluating the robustness of tracking algorithms to occlusion. The benchmark is available at https://pan.baidu.com/s/1qZ0KeoW.

Although occlusion is one of the attributes in OTB [21], VOT [10] and NUS-PRO [11], these benchmarks still cannot accurately reflect the robustness of tracking algorithms to occlusion, for the following reasons. Each sequence usually has multiple challenging factors. Suppose a sequence s has frames ($\#1$,...,$\#t_1$,...,$\#t_2$,...,$\#T$), where the occlusion happens in the frames between $\#t_1$ and $\#t_2$. Since all the trackers start tracking in frame $\#1$, they will have different tracking outputs before the occlusion occurs in frame $\#t_1$, which means that the performance on the frames between $\#t_1$ and $\#t_2$ is heavily influenced by the previous frames. As a recent study [9] shows, performance measures computed on a sequence are significantly biased towards the dominant attribute of the sequence. Moreover, besides occlusion, other challenging factors may exist in the frames between $\#t_1$ and $\#t_2$, which makes the evaluation even less reliable.

Table 1. Statistics about our occlusion benchmark.

Based on these observations, we propose an occlusion benchmark that has the following characteristics:

1. Each sequence s with frames ($\#1$,...,$\#t_1$,...,$\#t_2$,...,$\#T$) can be divided into 3 sub-sequences. In the first sub-sequence with frames ($\#1$,...,$\#t_1$), neither occlusion nor any other challenging factor occurs, so the target model can be initialized. In the second sub-sequence with frames ($\#t_1$,...,$\#t_2$), the target is occluded. In the last sub-sequence with frames ($\#t_2$,...,$\#T$), the occlusion disappears, so we can identify whether the tracking succeeds. See Fig. 2 for an explanation.
2. In frames ($\#t_1$,...,$\#t_2$), we exclude other attributes such as deformation, so that the only difficulty for tracking is handling occlusion. However, it is a common scenario that the occluders are of the same category as the targets and have a similar appearance, so we keep these sequences in the benchmark.
3. The sequences are selected from OTB [21], VOT [10] and NUS-PRO [11] with diversity and richness. The statistics are shown in Table 1.
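The three-part structure of each sequence can be sketched as below. The function name is hypothetical, and the boundary frames $\#t_1$ and $\#t_2$ are shared between neighbouring sub-sequences only because the sketch follows the notation above literally; the released annotations may draw the boundaries differently.

```python
def split_sequence(T, t1, t2):
    """Split frames 1..T into before / during / after occlusion (1-indexed).

    Follows the notation (#1..#t1), (#t1..#t2), (#t2..#T) literally, so the
    boundary frames t1 and t2 each appear in two sub-sequences.
    """
    if not 1 <= t1 <= t2 <= T:
        raise ValueError("expected 1 <= t1 <= t2 <= T")
    before = range(1, t1 + 1)   # target model can be initialized here
    during = range(t1, t2 + 1)  # target is occluded
    after = range(t2, T + 1)    # used to check whether tracking recovered
    return before, during, after


before, during, after = split_sequence(T=100, t1=30, t2=60)
print(len(before), len(during), len(after))  # 30 31 41
```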

In our occlusion benchmark, we propose a new metric called Normalized Center Location Error (NCLE) for evaluating performance. For tracking result $(cx_1,cy_1,w_1,h_1)$ and ground-truth (cx, cy, w, h) where $(cx_1,cy_1)$ and (cx, cy) are center locations, the traditional CLE adopted by OTB [21] is defined as

$$\begin{aligned} CLE = \sqrt{ (cx_1-cx)^2 + (cy_1-cy)^2 }. \end{aligned}$$

(3)

A constant threshold of 20 pixels is used for ranking trackers. However, for differently shaped and sized targets, a 20-pixel deviation may have distinct meanings. For example, the width of a pedestrian target is usually smaller than its height, so a deviation is more serious if it is in the horizontal direction. In NCLE, we normalize the CLE by the width and height of the target:

$$\begin{aligned} NCLE = \min \left\{ \max \left\{ \frac{\left| cx_1-cx\right| }{w},\frac{\left| cy_1-cy\right| }{h} \right\} , \ 1 \right\} . \end{aligned}$$

(4)

NCLE = 1 means a tracking failure. We utilize NCLE-based Precision Plot and Success Plot [21] as performance measurements in our occlusion benchmark.
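The difference between Eqs. (3) and (4) can be seen in a short sketch using a pedestrian-like ground-truth box. The function names are hypothetical, and `precision_at` assumes that an NCLE-based precision value is the fraction of frames whose NCLE does not exceed a threshold, by analogy with the precision plot of OTB [21]; the exact protocol of the benchmark is not specified here.

```python
import math


def cle(pred, gt):
    """Eq. (3): Euclidean distance between predicted and true centers."""
    cx1, cy1, _, _ = pred
    cx, cy, _, _ = gt
    return math.hypot(cx1 - cx, cy1 - cy)


def ncle(pred, gt):
    """Eq. (4): center error normalized by ground-truth size, capped at 1."""
    cx1, cy1, _, _ = pred
    cx, cy, w, h = gt
    return min(max(abs(cx1 - cx) / w, abs(cy1 - cy) / h), 1.0)


def precision_at(errors, threshold):
    """Fraction of frames with error <= threshold (assumed definition)."""
    return sum(e <= threshold for e in errors) / len(errors)


gt = (100, 100, 20, 60)             # (cx, cy, w, h): narrow, tall target
pred_horizontal = (112, 100, 20, 60)  # 12 pixels off horizontally
pred_vertical = (100, 112, 20, 60)    # 12 pixels off vertically

print(cle(pred_horizontal, gt), cle(pred_vertical, gt))    # 12.0 12.0
print(ncle(pred_horizontal, gt), ncle(pred_vertical, gt))  # 0.6 0.2
errors = [ncle(pred_horizontal, gt), ncle(pred_vertical, gt)]
print(precision_at(errors, 0.5))  # 0.5
```

Both predictions have the same CLE, but NCLE penalizes the horizontal deviation three times as much because the target is three times narrower than it is tall.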

4 Experiments

In this section, we present the experimental results of several recent tracking algorithms evaluated on our occlusion benchmark, including KCF [7], SAMF [14], DSST [3] and Staple [1]. Meanwhile, we integrate these trackers into our adaptive COD framework to validate its effectiveness. All the code is available at https://github.com/xgniu/Occlusion-Benchmark.

Table 2. Different $\gamma $ for different tracking algorithms. Our framework is not sensitive to the value of $\gamma $.

4.1 Quantitative Evaluation

The quantitative evaluation results are shown in Fig. 3 in the form of a Precision Plot and a Success Plot. All four trackers improve in performance after being integrated into our adaptive occlusion detection framework. Moreover, we find that although different tracking algorithms require different values of $\gamma $ for best performance, a wide range of $\gamma $ provides comparable results (Table 2). The other thresholds are the same as in COD [17].

4.2 Qualitative Evaluation

Figure 4 visualizes several sequences from our occlusion benchmark along with the tracking results of different algorithms. Only the tracking results of SAMF, SAMF_OD, Staple and Staple_OD are shown for clarity, where the suffix ‘_OD’ stands for being integrated into our occlusion detection framework. As the figure shows, when occlusion occurs, SAMF_OD and Staple_OD outperform their baselines.

5 Conclusion

Based on COD [17], we propose an adaptive occlusion detection framework which calculates the proportion of the target that is not occluded. To better evaluate the robustness of tracking algorithms to occlusion, we propose an occlusion benchmark that excludes other challenging factors. In our benchmark, the normalized center location error is adopted as the performance measure. Much work is still needed in the future to solve the occlusion problem for robust visual object tracking.

References

1. Bertinetto, L., Valmadre, J., Golodetz, S., Miksik, O., Torr, P.H.: Staple: complementary learners for real-time tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1401–1409 (2016)
2. Bolme, D.S., Beveridge, J.R., Draper, B.A., Lui, Y.M.: Visual object tracking using adaptive correlation filters. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2544–2550. IEEE (2010)
3. Danelljan, M., Häger, G., Khan, F., Felsberg, M.: Accurate scale estimation for robust visual tracking. In: British Machine Vision Conference (BMVC). BMVA Press, Nottingham, 1–5 September 2014
4. Galoogahi, H.K., Fagg, A., Huang, C., Ramanan, D., Lucey, S.: Need for speed: a benchmark for higher frame rate object tracking. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1134–1143 (2017)
5. Gu, K., Zhou, T., Liu, F., Yang, J., Qiao, Y.: Correlation filter tracking via bootstrap learning. In: IEEE International Conference on Image Processing, pp. 459–463 (2016)
6. Gu, K., Zhou, T., Liu, F., Yang, J., Qiao, Y.: Patch-based object tracking via locality-constrained linear coding. In: Proceedings of the 35th Chinese Control Conference, pp. 7015–7020 (2016)
7. Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 37(3), 583–596 (2015)
8. Kristan, M., Leonardis, A., Matas, J., Felsberg, M.: The visual object tracking VOT2017 challenge results. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, pp. 1949–1972 (2017)
9. Kristan, M., et al.: A novel performance evaluation methodology for single-target trackers. IEEE Trans. Pattern Anal. Mach. Intell. 38(11), 2137–2155 (2016)
10. Kristan, M., Pflugfelder, R., Leonardis, A., Matas, J., Porikli, F., Čehovin, L.: The visual object tracking VOT2013 challenge results. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, pp. 564–586, December 2013
11. Li, A., Lin, M., Wu, Y., Yang, M., Yan, S.: NUS-PRO: a new visual tracking challenge. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 335–349 (2016)
12. Li, Q., Qiao, Y., Yang, J.: Robust visual tracking based on local kernelized representation. In: IEEE International Conference on Robotics and Biomimetics, pp. 2523–2528 (2014)
13. Li, Q., Qiao, Y., Yang, J., Bai, L.: Robust visual tracking based on online learning of joint sparse dictionary. In: International Conference on Machine Vision (2013)
14. Li, Y., Zhu, J.: A scale adaptive kernel correlation filter tracker with feature integration. In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) ECCV 2014. LNCS, vol. 8926, pp. 254–265. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16181-5_18
15. Niu, X., Cui, Z., Geng, S., Yang, J., Qiao, Y.: Robust visual tracking via occlusion detection based on depth-layer information. In: International Conference on Neural Information Processing, pp. 44–53 (2017)
16. Niu, X., Fang, X., Qiao, Y.: Robust visual tracking via occlusion detection based on staple algorithm. In: Asian Control Conference, pp. 1051–1056 (2017)
17. Niu, X., Qiao, Y.: Context-based occlusion detection for robust visual tracking. In: IEEE International Conference on Image Processing, pp. 3655–3659 (2017)
18. Rozumnyi, D., Kotera, J., Sroubek, F., Novotný, L., Matas, J.: The world of fast moving objects. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
19. Smeulders, A.W.M., Chu, D.M., Cucchiara, R., Calderara, S., Dehghan, A., Shah, M.: Visual tracking: an experimental survey. IEEE Trans. Pattern Anal. Mach. Intell. 36(7), 1442–1468 (2014)
20. Wang, M., Liu, Y., Huang, Z.: Large margin object tracking with circulant feature maps. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 21–26 (2017)
21. Wu, Y., Lim, J., Yang, M.H.: Online object tracking: a benchmark. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2411–2418 (2013)

Author information

Authors and Affiliations

Institute of Image Processing and Pattern Recognition, Department of Automation, Shanghai Jiao Tong University, Shanghai, China
Xiaoguang Niu, Yueyang Gu, Kuan Xu, Jie Yang, Xingqi Fang & Yu Qiao
Shanghai Electro-Mechanical Engineering Institute, Shanghai, China
Zhifeng Lu, Zehua Hong & Yi Tian

Corresponding author

Correspondence to Yu Qiao.

Editor information

Editors and Affiliations

The Chinese Academy of Sciences, Beijing, China
Long Cheng
City University of Hong Kong, Kowloon, Hong Kong
Andrew Chi Sing Leung
Kobe University, Kobe, Japan
Seiichi Ozawa

About this paper

Cite this paper

Niu, X. et al. (2018). Occlusion Detection in Visual Tracking: A New Framework and A New Benchmark. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11304. Springer, Cham. https://doi.org/10.1007/978-3-030-04212-7_51

DOI: https://doi.org/10.1007/978-3-030-04212-7_51
Published: 17 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04211-0
Online ISBN: 978-3-030-04212-7
eBook Packages: Computer Science, Computer Science (R0)
