
1 Introduction

A decade ago, the Visual Object Tracking (VOT) initiative was founded in response to the lack of standardised performance evaluation in visual object tracking. To facilitate the development of this highly active computer vision field, the first VOT2013 challenge [13] was organized in conjunction with ICCV2013. Encouraged by the strong interest of the emerging community, eight VOT challenges have been organized since, with the results presented at the accompanying workshops at major computer vision conferences: ECCV2014 (VOT2014 [14]), ICCV2015 (VOT2015 [12]), ECCV2016 (VOT2016 [10]), ICCV2017 (VOT2017 [9]), ECCV2018 (VOT2018 [8]), ICCV2019 (VOT2019 [6]), ECCV2020 (VOT2020 [7]), ICCV2021 (VOT2021 [11]). The VOT challenge is now the main annual tracking performance evaluation event in computer vision.

The primary mission of the VOT initiative has been the promotion of the development of general trackers for single-camera, single-target, model-free, causal tracking. For nearly a decade the VOT has thus been a community-driven forum for gradual development and in-situ testing of performance evaluation protocols, dataset development and exploration of the tracking challenges landscape. The VOT2013 [13] started with a single short-term tracking challenge, VOT-ST. In VOT2014 [14] the VOT-TIR challenge was added to explore tracking in thermal imagery. In VOT2017 [9] the real-time tracking challenge VOT-RT was established to promote tracking speed and computational efficiency in parallel with robustness. The long-term tracking challenge VOT-LT was introduced in VOT2018 [8], and a year later, in VOT2019 [6], the multi-modal (RGB+thermal and RGB+depth) tracking challenges VOT-RGBT and VOT-RGBD were added.

Particular attention has been put on the development of informative performance evaluation measures. Two basic, weakly correlated performance measures were introduced in VOT2013 [13] to evaluate the tracking accuracy and robustness of short-term trackers. A ranking-based methodology to identify the top performers was also proposed but was abandoned in VOT2015 [12] in favor of a more principled and interpretable combination of the primary scores in the form of the expected average overlap (EAO) score. For the first seven VOT challenges, the measures were calculated under a reset-based protocol, in which a tracker is reset upon drifting off the target. This protocol was replaced in VOT2020 [7] by the anchor-based evaluation protocol, which produces more stable performance evaluation results than related protocols, yet inherits the benefits of the reset-based protocol. Similarly, a performance evaluation protocol and measures tailored for long-term tracking have been developed [16] and were first applied in VOT2018 [8]. These measures have consistently shown good evaluation capabilities for long-term trackers.

Several datasets have been developed over the years. A dataset creation and maintenance protocol has been established for the main short-term tracking challenge to produce datasets which are sufficiently small for practical evaluation yet include a variety of challenging tracking situations for in-depth analysis. In VOT2017 [9], a sequestered dataset for identification of the short-term tracking challenge winner was introduced. This dataset has been refreshed along with the public versions over the years. In parallel, datasets specialized for long-term, RGB+thermal and RGB+depth tracking were constructed and gradually updated.

In most of the VOT challenges, the trackers are required to report the target position as an axis-aligned bounding box. While this is a reasonable target state encoding, the VOT short-term tracking challenge gradually explored more detailed pose encodings to raise the bar on tracking accuracy and expand the range of applications. Thus rotated bounding boxes were introduced in VOT2014 [14]. To reduce human annotation bias, VOT2016 [10] introduced fitting rotated bounding boxes to semi-automatically segmented objects in each frame. In VOT2020 [7] bounding boxes were abandoned and short-term trackers were required to provide a full target segmentation (the VOT-ST dataset was accordingly re-annotated to ensure high ground truth accuracy) – with this move, the VOT short-term tracking challenge started narrowing the gap between visual object tracking and the related field of video object segmentation. The remaining challenges (VOT-LT, VOT-RGBD, VOT-RGBT) maintain axis-aligned target annotation.

This paper presents the tenth edition of the VOT challenges – the VOT2022 challenge. After two years of virtual editions due to the global COVID-19 pandemic, the 10th anniversary edition of VOT was organized in a hybrid form with in-person and online attendance, in conjunction with the ECCV2022 Visual Object Tracking VOT2022 Workshop. In the following, we overview the challenge and participation requirements.

1.1 The VOT2022 Challenge

The evaluation toolkit and the datasets are provided by the VOT2022 organizers. The challenges opened in the first week of April and closed on May 3rd. The winners of individual challenges were identified in late June, but not publicly disclosed. The results were presented at the ECCV2022 VOT2022 workshop on 24th October. The VOT2022 challenge contained seven challenges:

  1. VOT-STs2022 challenge addressed short-term tracking by target segmentation in RGB images.

  2. VOT-STb2022 challenge addressed short-term tracking by bounding boxes in RGB images.

  3. VOT-RTs2022 challenge addressed the same class of trackers as VOT-STs2022, except that the trackers had to process the sequences in real-time.

  4. VOT-RTb2022 challenge addressed the same class of trackers as VOT-STb2022, except that the trackers had to process the sequences in real-time.

  5. VOT-LT2022 challenge addressed long-term tracking by bounding boxes in RGB images.

  6. VOT-RGBD2022 challenge addressed short-term tracking by bounding boxes in RGB+depth (RGBD) imagery.

  7. VOT-D2022 challenge addressed short-term tracking by bounding boxes in depth map images.

The authors participating in the challenge were required to integrate their tracker into the VOT2022 evaluation kit, which automatically performed a set of standardized experiments. The results were analyzed according to the VOT2022 evaluation methodology.

Participants were encouraged to submit their own new or previously published trackers as well as modified versions of third-party trackers. In the latter case, modifications had to be significant enough for acceptance. Participants were expected to submit a single set of results per tracker. If a participant coauthored several submissions with a similar design, only the top performer from this cluster was considered to compete in the final top-performer ranking and winner identification.

Each submission was accompanied by a short abstract describing the tracker, which was used for the short tracker descriptions in Appendix [5] – the authors were asked to provide a clear description useful to the readers of the VOT2022 results report. In addition, participants filled out a questionnaire on the VOT submission page to categorize their tracker according to various design properties. Authors were encouraged to submit their tracker integrated into a Singularity container provided by VOT, which allows result reproduction and aids potential further evaluation. The participants with sufficiently well-performing submissions who contributed to the text for this paper and agreed to make their tracker code publicly available from the VOT page (or upon request) were offered co-authorship of this results paper. The committee reserved the right to disqualify any tracker that, by their judgement, attempted to cheat the evaluation protocols or failed in the post-hoc evaluation.

Methods considered for prizes in the VOT2022 challenge were not allowed to be trained on certain datasets (OTB, VOT, ALOV, UAV123, NUSPRO, TempleColor and RGBT234), except for VOT-LT2022, where the VOT-LT2021 dataset was allowed. For GOT10k, a list of 1k prohibited sequences was created in VOT2019, while the remaining 9k+ sequences were allowed for learning. The reason was that part of the GOT10k was used in the VOT-ST2022 dataset.

The use of class labels specific to VOT was not allowed (i.e., identifying a target class in each sequence and applying pre-trained class-specific trackers was not allowed). The organizers of VOT2022 were allowed to participate in the challenge but were not eligible to win. Further details are available from the challenge homepage.

VOT2022 goes beyond previous challenges by updating the datasets in VOT-ST2022 and VOT-RT2022, introducing a training dataset as well as a sequestered dataset in the VOT-RGBD2022 challenge, introducing a depth-only tracking challenge VOT-D2022 and a new challenging VOT-LT2022 tracking dataset. The Python VOT evaluation toolkit was updated as well.

The remainder of this report is structured as follows. Section 2 describes the performance evaluation protocols, Sect. 3 describes the individual challenges, Sect. 4 overviews the results and conclusions are drawn in Sect. 5. Short descriptions of the tested trackers are available in Appendix [5].

2 Performance Evaluation Protocol

Since VOT2018, the VOT challenges adopt the following definitions from [16] to distinguish between short-term and long-term trackers:

  • Short-term tracker (\(\textrm{ST}_0\)). The target position is reported at each frame. The tracker does not implement target re-detection and does not explicitly detect occlusion.

  • Short-term tracker with conservative updating (\(\textrm{ST}_1\)). The target position is reported at each frame. Target re-detection is not implemented, but tracking robustness is increased by selectively updating the visual model depending on a tracking confidence estimation mechanism.

  • Pseudo long-term tracker (\(\textrm{LT}_0\)). The target position is not reported in frames when the target is predicted not visible. The tracker does not implement explicit target re-detection but uses an internal mechanism to identify and report tracking failure.

  • Re-detecting long-term tracker (\(\textrm{LT}_1\)). The target position is not reported in frames when the target is predicted not visible. The tracker detects tracking failure and implements explicit target re-detection.

Since the two classes of trackers make distinct assumptions on target presence, separate performance measures and evaluation protocols were designed in VOT to probe the tracking properties.

2.1 The Short-Term Evaluation Protocols

The short-term performance evaluation protocol entails initializing the tracker at several frames in the sequence, called the anchor points, which are spaced approximately 50 frames apart. The tracker is run from each anchor: forward for anchors in the first half of the sequence, and backward, toward the first frame, for anchors in the second half. Performance is evaluated by two basic measures: accuracy (A) and robustness (R).

Accuracy is the average overlap over the frames before tracking failure, averaged over all sub-sequences. Robustness is the percentage of successfully tracked sub-sequence frames, averaged over all sub-sequences. Tracking failure is defined as the frame at which the overlap between the ground truth and the predicted target position drops below 0.1 and does not increase above this value during the next 10 frames. This definition allows short-term failure recovery in short-term trackers. The primary performance measure is the expected average overlap (EAO), which is a principled combination of tracking accuracy and robustness. Please see [7] for further details on the VOT short-term tracking performance measures.
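To make the failure rule and the two basic measures concrete, the following minimal Python sketch (not the official VOT toolkit code; all function and variable names are illustrative) computes accuracy and robustness from lists of per-frame overlaps, one list per anchor-initialized sub-sequence.

```python
# Illustrative sketch of the short-term measures described above.
# `subsequences` is a list of per-frame overlap (IoU) lists, one per anchor run.

def find_failure_frame(overlaps, threshold=0.1, grace=10):
    """Index of the first failure frame, or None if tracking never fails.

    A failure is a frame where the overlap drops below `threshold` and does
    not rise above it again within the next `grace` frames."""
    for i, overlap in enumerate(overlaps):
        if overlap < threshold:
            lookahead = overlaps[i + 1:i + 1 + grace]
            if all(v < threshold for v in lookahead):
                return i
    return None

def accuracy_robustness(subsequences):
    """Accuracy: mean overlap before failure; robustness: fraction of tracked frames."""
    acc, rob = [], []
    for overlaps in subsequences:
        fail = find_failure_frame(overlaps)
        tracked = len(overlaps) if fail is None else fail
        if tracked > 0:
            acc.append(sum(overlaps[:tracked]) / tracked)
        rob.append(tracked / len(overlaps))
    return (sum(acc) / len(acc) if acc else 0.0), sum(rob) / len(rob)
```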

2.2 The Long-Term Evaluation Protocol

The long-term performance evaluation protocol follows the protocol proposed in [16] and entails initializing the tracker in the first frame of the sequence and running it until the end of the sequence. The tracker is required to report the target position in each frame along with a score that reflects the certainty that the target is present at that position. Performance is measured by two basic measures called the tracking precision (Pr) and the tracking recall (Re), while the overall performance is summarized by the tracking F-measure.

The performance measures depend on the target presence certainty threshold, thus the performance can be visualized by the tracking precision-recall and tracking F-measure plots obtained by computing these scores for all thresholds. The final values of Pr, Re and F-measure are obtained by selecting the certainty threshold that maximizes tracker-specific F-measure. This avoids all manually-set thresholds in the primary performance measures.
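The threshold sweep can be illustrated with the following minimal sketch; it follows the description above rather than the exact toolkit implementation of [16], and the data layout (per-frame tuples of overlap, certainty and target visibility) is an assumption made for the example.

```python
# Illustrative sketch of tracking precision (Pr), recall (Re) and F-measure.
# `frames` is a list of (overlap, certainty, target_visible) tuples.

def pr_re_f(frames, tau):
    """Pr: mean overlap over frames where the tracker reports the target
    (certainty >= tau). Re: mean overlap over frames where the target is
    visible, counting non-reported frames as zero overlap."""
    reported = [o for o, c, _ in frames if c >= tau]
    visible = [o if c >= tau else 0.0 for o, c, v in frames if v]
    pr = sum(reported) / len(reported) if reported else 0.0
    re = sum(visible) / len(visible) if visible else 0.0
    f = 2 * pr * re / (pr + re) if pr + re > 0 else 0.0
    return pr, re, f

def best_f_measure(frames):
    """Sweep all reported certainty values as thresholds and keep the best F."""
    thresholds = sorted({c for _, c, _ in frames})
    candidates = [pr_re_f(frames, tau) for tau in thresholds]
    return max(candidates, key=lambda scores: scores[2])
```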

3 Description of Individual Challenges

3.1 VOT-ST2022 Challenge Outline

This challenge addressed RGB tracking in a short-term tracking setup. The initial VOT challenges required target prediction in the form of bounding boxes, while a transition to a segmentation output requirement was made in VOT2020. Nevertheless, to support the still very active community developing bounding-box prediction trackers, the bounding-box challenge was re-introduced in VOT2022. The VOT-ST2022 thus ran two subchallenges: the main segmentation-based short-term tracking challenge VOT-STs2022, and the legacy bounding-box-based short-term tracking challenge VOT-STb2022.

The Dataset. Results of VOT2021 showed that the dataset was not saturated [11], thus the public dataset was only refreshed by the addition of two sequences which include new challenging scenarios not present in previous VOT datasets: (i) a transparent deforming object and (ii) a flat object with significant out-of-plane rotations (see Fig. 1). The sequestered dataset was updated with two sequences matching the public dataset extension.

Fig. 1. Two sequences with new challenging scenarios were added to the VOT-ST2022 public dataset. In the sequence ‘bubble’ the bubble has to be tracked, while in the sequence ‘tennis’ the racquet is the target object.

The new sequences were frame-by-frame semi-automatically segmented to provide the segmentation ground truth for the main VOT-STs2022 subchallenge. For the legacy VOT-STb2022 subchallenge, the target position was annotated in all sequences by fitting axis-aligned bounding boxes to the target segmentation masks. Per-frame visual attributes were semi-automatically assigned to the new sequences following the VOT attribute annotation protocol. In particular, each frame was annotated by the following visual attributes: (i) occlusion, (ii) illumination change, (iii) motion change, (iv) size change, (v) camera motion.
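As an illustration of how the legacy bounding-box ground truth can be derived from the segmentation masks, the sketch below fits the tightest axis-aligned box to a binary mask; the helper is hypothetical and the actual annotation tooling may differ.

```python
import numpy as np

def mask_to_axis_aligned_bbox(mask):
    """Tightest (x, y, width, height) box around a binary segmentation mask,
    or None if the mask is empty (target not visible in the frame)."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    return float(x0), float(y0), float(x1 - x0 + 1), float(y1 - y0 + 1)
```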

Winner Identification Protocol. The VOT-STs2022 winner was identified as follows. Trackers were ranked according to the EAO measure on the public dataset. The top five ranked trackers were then re-run by the VOT2022 committee on the sequestered dataset. The top-ranked tracker on the sequestered dataset not submitted by the VOT2022 committee members is the winner. The same protocol was used to identify the winner of the legacy short-term challenge VOT-STb2022.

3.2 VOT-RT2022 Challenge Outline

This challenge addressed real-time RGB tracking in a short-term tracking setup. The dataset was the same as in the VOT-ST2022 challenge, but the evaluation protocol was modified to emphasize the real-time component of tracking performance. In particular, the VOT-RT2022 challenge requires predicting bounding boxes at a rate faster than or equal to the video frame rate. The toolkit sends images to the tracker via the Trax protocol [21] at 20fps. If the tracker does not respond in time, the last reported bounding box is assumed as the reported tracker output for the current frame (a zero-order hold dynamic model). The same performance evaluation protocol as in VOT-ST2022 is then applied. As in VOT-ST2022, two real-time subchallenges were considered: the main segmentation-based real-time subchallenge VOT-RTs2022 and the legacy bounding-box-based real-time subchallenge VOT-RTb2022.
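The zero-order hold behaviour can be pictured with the simplified simulation below. It is a sketch under stated assumptions (a `run_tracker` callable returning a box and its processing time; frames that arrive while the tracker is busy are never processed) and glosses over details of the actual Trax-based toolkit implementation.

```python
def zero_order_hold(run_tracker, frames, init_box, frame_period=1.0 / 20):
    """Credit each frame, arriving at 20 fps, with the newest prediction that
    is ready in time; late predictions only become the output once ready."""
    outputs, last_box = [], init_box
    ready_at, pending = 0.0, None
    for k, frame in enumerate(frames):
        now = k * frame_period
        if pending is not None and now >= ready_at:
            last_box, pending = pending, None      # a late result becomes usable now
        if pending is None and now >= ready_at:
            box, latency = run_tracker(frame)      # tracker is idle: process this frame
            ready_at = now + latency
            if latency <= frame_period:
                last_box = box                     # result ready within the frame budget
            else:
                pending = box                      # too slow: hold the previous box
        outputs.append(last_box)
    return outputs
```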

Winner Identification Protocol. All trackers are ranked on the public RGB short-term tracking dataset with respect to the EAO measure. The winner of the main VOT-RTs2022 subchallenge was identified as the top-ranked tracker not submitted by the VOT2022 committee members. The same methodology was applied to identify the winner of the VOT-RTb2022 challenge.

3.3 VOT-LT2022 Challenge Outline

Fig. 2. The new VOT-LT dataset - a frame selected from each sequence. Name and length (top), visual attributes (bottom left): (O) full occlusion, (V) out-of-view, (P) partial occlusion, (C) camera motion, (F) fast motion, (S) scale change, (A) aspect ratio change, (W) viewpoint change, (I) similar objects. The dataset is highly diverse in attributes and target types and contains many target disappearances.

This challenge addressed RGB tracking in a long-term tracking setup and is a continuation of the VOT-LT2021 challenge. We adopt the definitions from [16], which are used to position the trackers on the short-term/long-term spectrum. A long-term performance evaluation protocol and measures from Sect. 2.2 were used to evaluate tracking performance on VOT-LT2022. Compared to VOT-LT2021, a significant change is a new dataset described in the following.

The Dataset. The new VOT-LT dataset contains 50 challenging sequences of diverse objects (persons, cars, motorcycles, bicycles, boats, animals, etc.), carefully selected to obtain long sequences containing many target disappearances, with a total length of 168,282 frames. The LTB50 [16], which was used in VOT-LT2021, serves as the training set this year. The sequence resolution is 1280 \(\times \) 720. Each sequence contains on average 10 long-term target disappearances, each lasting on average 52 frames. An overview of the dataset is shown in Fig. 2.

The targets are annotated by axis-aligned bounding boxes. Sequences are annotated by the following visual attributes: (i) full occlusion, (ii) out-of-view, (iii) partial occlusion, (iv) camera motion, (v) fast motion, (vi) scale change, (vii) aspect ratio change, (viii) viewpoint change, (ix) similar objects. Note that this is a per-sequence, not per-frame annotation, and a sequence can be annotated by several attributes. Compared with LTB50, the new VOT-LT dataset is more challenging in terms of small objects, similar objects, fast motion, and full/partial occlusions.

Winner Identification Protocol. The VOT-LT2022 winner was identified as follows. Trackers were ranked according to the tracking F-score on the new LT dataset (no sequestered dataset available). The top-ranked tracker on the dataset not submitted by the VOT2022 committee members is the winner of the VOT-LT2022 challenge.

3.4 VOT-RGBD2022 Challenge Outline

The first RGBD (RGB and Depth) challenge was introduced in VOT2019, and the first two challenges were based on the same public dataset, CDTB [15], which consists of 80 sequences in which the target momentarily disappears or is fully occluded. In VOT2021, the CDTB dataset was replaced with new sequences captured with an Intel RealSense 415 RGBD camera, which provides spatially aligned RGB and depth frames. The 2021 dataset contained 80 public and 50 sequestered test sequences. The main motivation for the new dataset was to make it more challenging, in the sense that sometimes the depth cue is more informative and sometimes the RGB cue. Moreover, separate training and test sequences were provided to allow method fine-tuning with dataset-specific data. More details about the dataset and its properties can be found in [25]. The two major changes compared to the previous years’ RGBD tracks are that 1) the challenge is now a short-term (ST) tracking challenge and 2) the challenge is divided into RGBD and depth-only (D) tracks in order to better understand how much depth contributes to RGBD tracking, i.e., the complementarity of the two modalities.

The main motivation to switch from long-term to short-term evaluation is that in the long-term setting target disappearance played an important role: many of the proposed RGBD trackers used the depth channel to assist in occlusion detection, but otherwise ignored the cue. The two tracks, RGBD and D, now provide information about the complementary properties of color texture and depth. It is noteworthy that the RGBD and D challenges otherwise use exactly the same data.

The Dataset. Inspired by the recent work on depth-only tracking [26], we converted the long-term sequences from the CDTB dataset, used in the first two VOT-RGBD challenges, and from DepthTrack, used in the latest challenge, into short-term sequences. We converted all 80 sequences from CDTB and the 50 test sequences of DepthTrack. Since the DepthTrack training sequences were not used, they remain available for training learning-based trackers. The short-term sequences were manually checked and sequences with poor depth information or other errors were removed. Finally, 127 sequences were selected and published on the VOT web site. See Fig. 3 for example frames.

Fig. 3. Samples from the RGBD and D challenge sequences. The first two from the left are from the CDTB sequences and the next two from DepthTrack-test sequences.

VOT-D2022. The data for the VOT-D2022 challenge is exactly the same as for VOT-RGBD except that the RGB frames are removed.

Winner Identification Protocol. The VOT-RGBD2022 and VOT-D2022 winners were identified as follows. Trackers were ranked according to the EAO measure on the public dataset and the top-ranked tracker on the public dataset not submitted by the VOT2022 committee members is the winner. The same protocol was used to identify the winners of both the VOT-RGBD and VOT-D challenges.

4 The VOT2022 Challenge Results

This section summarizes the trackers submitted, the results analysis and the winner identification for each of the VOT2022 challenges. Due to the page limit, we provide the appendix with more detailed descriptions of the submitted trackers in the supplementary material [5]. For browsing convenience, we also compiled a version of the paper with the appendix included – please see the VOT2022 results page for this version.

4.1 The VOT-STs2022 Challenge Results

The VOT-STs2022 challenge tested 31 trackers, including the baselines contributed by the VOT committee. Each submission included the binaries or source code that allowed verification of the results if required. In the following, we briefly overview the entries and provide the references to original papers in the Appendix [5] where available.

Of the participating trackers, 13 trackers (42\(\%\)) were categorized as ST\(_0\), 14 trackers (45\(\%\)) as ST\(_1\), and 4 (13\(\%\)) as LT\(_0\). 81\(\%\) applied discriminative and 19\(\%\) applied generative models. Most trackers (81\(\%\)) used a holistic model, while 19\(\%\) of the participating trackers used part-based models. Most trackers (75\(\%\)) applied an equally probable displacement within a region centered at the current position, while the rest (25\(\%\)) used a random walk dynamic model. 42\(\%\) of the trackers localized the target in a single stage, while the rest applied several stages, typically involving approximate target localization followed by position refinement. Most of the trackers (84\(\%\)) use deep features. The majority of the submissions (72\(\%\)) localized the target by segmentation, while the rest reported a bounding box.

The trackers were based on various tracking principles. 11 trackers were based on classical or deep discriminative correlation filters (RTS, ATOM_AR, DiMP_AR, KYS_AR, PrDiMP_AR, CSRDCF, D3Sv2, SuperDiMP_AR, KCF, LWL, LWL-B2S), 2 trackers were based purely on Siamese correlation (SiamFC, SiamUSCMix), 14 trackers were based on transformers (DAMT, DAMTMask, DGformer, Linker, MixFormerM, MS_AOT, OSTrackSTS, SwinT, SRATransTS, TransLL, TransT, transt_ar, TransT_M, and TRASFUSTm), two were deformable-parts trackers (ANT and LGT), one was a mean-shift tracker (ASMS), and one was a video object segmentation method adapted to tracking (STM).

In summary, we observe a significant increase in a new class of trackers identified in VOT2021 – the transformers. In fact, 47% of the trackers now belong to this class, 41% apply discriminative correlation filters, while 6% apply classical Siamese correlation networks.

Results. The results are summarized in the AR-raw plots and EAO plots in Fig. 4 and in Table 9. The top ten trackers according to the primary EAO measure (Fig. 4) are MS_AOT, DAMTMask, MixFormerM, OSTrackSTS, Linker, SRATransTS, TransT_M, DGformer, TransLL and LWL-B2S. Nine of the top trackers apply transformers as the core tracking methodology and one applies deep DCFs. Seven apply two-stage target localization, meaning that they first localize the target by a bounding box and then segment the target within the bounding box with a separate network (two of these apply Alpha-Refine [24] – the winner of the VOT-RT2020 challenge). Three of the top 10 trackers are single-stage, meaning that they directly segment the target. Four of the trackers apply elements (or are extensions) of MixFormer [3], four extend TransT [2] and three apply ViT [4].

The top tracker on the public dataset according to EAO is MS_AOT, which is based on the recent transformer-based video object segmentation method AOT [28]. For normal-sized objects, the tracker acts as a single-stage segmentation method. For tiny objects, the tracker works in a two-stage regime in which the object is first localized by a bounding box using MixFormer [3] and then segmented by AOT.

The second-best tracker is DAMTMask, which is built on top of MixFormer [3] and SuperDiMP [1], and applies a two-stage target localization and segmentation approach. The target location is predicted by RepPoints [27] and a MixFormer-like head is implemented to predict the segmentation mask.

The third-best tracker is MixFormerM, a two-stage tracker which uses a new mixed attention module for simultaneous feature extraction and target information fusion.

The three top performers in EAO are among the top three performers in accuracy (A) and robustness (R) measures as well (Table 9). While these trackers are comparable in target localization accuracy, MS_AOT stands out by its remarkable robustness (Fig. 4).

Table 1. VOT-STs2022 tracking difficulty with respect to the following visual attributes: camera motion (CM), illumination change (IC), motion change (MC), occlusion (OC) and size change (SC).
Fig. 4. The VOT-STs2022 AR-raw plots generated by sequence pooling (left), the EAO curves (center) and the VOT-STs2022 expected average overlap graph with trackers ranked from right to left (right). The right-most tracker is the top performer according to the VOT-STs2022 expected average overlap values. The dashed horizontal line denotes the average performance of three state-of-the-art trackers published in 2021/2022 at major computer vision venues. These trackers are denoted by a gray circle in the bottom part of the graph. See Table 9 for the tracker labels.

Three of the tested trackers have been published in major computer vision journals and conferences in the last two years (2021/2022). These trackers are indicated in Fig. 4, along with their average performance (EAO = 0.504), which constitutes the VOT2022 state-of-the-art bound. Approximately 32% of the submissions exceed this bound.
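For clarity, the bound is simply the mean EAO of the recently published trackers and the reported percentage is the share of all submissions above it; a trivial sketch with placeholder inputs is given below.

```python
def sota_bound(published_eaos, all_eaos):
    """State-of-the-art bound and the fraction of submissions exceeding it."""
    bound = sum(published_eaos) / len(published_eaos)
    exceeding = sum(1 for eao in all_eaos if eao > bound) / len(all_eaos)
    return bound, exceeding
```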

The per-attribute robustness analysis is shown in Fig. 5 for individual trackers. The overall top performers remain at the top of per-attribute ranks as well. MS_AOT achieves top robustness in all attributes. According to the median failure over each attribute (Table 1) the most challenging attribute remains occlusion. The drop on this attribute is consistent for all trackers (Fig. 5).

Fig. 5. Robustness with respect to the visual attributes in the VOT-STs2022 challenge (left) and in the VOT-STb2022 challenge (right). See Table 9 and Table 10 for VOT-STs2022 and VOT-STb2022 tracker labels, respectively.

The VOT-STs2022 Challenge Winner.

The top five trackers from the baseline experiment (Table 9) were re-run on the sequestered dataset. Their scores obtained on the sequestered dataset are shown in Table 2. The top tracker according to the EAO is MS_AOT and is thus the VOT-STs2022 challenge winner.

Table 2. The top five trackers from Table 9 re-ranked on the VOT-STs2022 sequestered dataset.

4.2 The VOT-STb2022 Challenge Results

The VOT-STb2022 challenge tested 41 trackers, including the baselines contributed by the VOT committee. Each submission included the binaries or source code that allowed verification of the results if required. In the following, we briefly overview the entries and provide the references to original papers in the Appendix [5] where available. The trackers were based on various tracking principles. 13 trackers were based on classical or deep discriminative correlation filters (SuperFus, TCLCFcpp, KCF, D3Sv2, DiMP, ATOM, CSRDCF, SuperDiMP, PrDiMP, FSC2F, oceancycle, DeepTCLCF, KYS), 4 trackers were based purely on Siamese correlation (NfS, SiamUSCMix, SiamVGGpp, SiamFC), 19 trackers were based on transformers (TransT_M, TransT, ADOTstb, GOANET, DAMT, tomp, TransLL, APMT_MR, APMT_RT, DGformer, Linker_B, MixFormer, ViTCRT, MixFormerL, OSTrackSTB, SRATransT, vittrack, SwinTrack, SBT), one was ensemble-based (TRASFUST), one was based on meta-learning (ReptileFPN), one was a scale-adaptive mean-shift tracker (ASMS), and two were part-based generative trackers (ANT and LGT).

Results. The results are summarized in the AR-raw plots and EAO plots in Fig. 6, and in Table 10. The top ten trackers according to the primary EAO measure (Fig. 6) are DAMT, MixFormerL, OSTrackSTB, APMT_MR, MixFormer, APMT_RT, ADOTstb, SRATransT, Linker_B, TransT_M. As in the segmentation tracking challenge VOT-STs2022, all of the top ten trackers apply transformers. In fact, seven of the top trackers are modifications of segmentation-based counterparts ranked among the top ten on VOT-STs2022: MixFormerL, DAMT, OSTrackSTB, MixFormer, SRATransT, Linker, TransT.

All three top-ranked trackers on the public dataset according to EAO are counterparts of the top-ranked trackers on the main segmentation challenge VOT-STs2022. The two top performers, with equal EAO, are MixFormerL and DAMT. MixFormerL is a counterpart of the tracker ranked third on VOT-STs2022, while DAMT is a counterpart of the second-ranked tracker on VOT-STs2022. The two trackers excel in different tracking properties: DAMT is more robust than MixFormerL, while MixFormerL delivers more accurate target estimation than DAMT. The third-ranked tracker, OSTrackSTB, is a counterpart of the fourth-ranked tracker on VOT-STs2022.

Fig. 6. The VOT-STb2022 AR-raw plots generated by sequence pooling (left), the EAO curves (center) and the VOT-STb2022 expected average overlap graph with trackers ranked from right to left (right). The right-most tracker is the top performer according to the VOT-STb2022 expected average overlap values. The dashed horizontal line denotes the average performance of ten state-of-the-art trackers published in 2021/2022 at major computer vision venues. These trackers are denoted by a gray circle in the bottom part of the graph. See Table 10 for the tracker labels.

Seven of the tested trackers have been published in major computer vision journals and conferences in the last two years (2021/2022). These trackers are indicated in Fig. 6, along with their average performance (EAO=0.484), which constitutes the VOT2022 state-of-the-art bound. Approximately 43.9% of the submissions exceed this bound.

The per-attribute robustness analysis is shown in Fig. 5 for individual trackers. The overall top performers remain at the top of per-attribute ranks as well, however, none of the trackers consistently outperforms the rest in all attributes. According to the median failure over each attribute (Table 3) the most challenging attribute remains occlusion. The drop on this attribute is consistent for all trackers (Fig. 5).

Table 3. VOT-STb2022 tracking difficulty with respect to the following visual attributes: camera motion (CM), illumination change (IC), motion change (MC), occlusion (OC) and size change (SC).

The VOT-STb2022 Challenge Winner. Top trackers from the baseline experiment (Table 10) were re-run on the sequestered dataset. Since some of the top trackers were variations of the same tracker, the VOT committee selected only the top-performing variant as a representative to be run on the sequestered dataset. Note that there are several ways to specify the ground truth against which the predicted bounding boxes can be evaluated. The most straightforward way is to fit bounding boxes to the ground-truth masks (as done in the public evaluation). However, the most accurate ground-truth target location specification is actually the segmentation mask, with the predicted bounding box regarded as its parametric approximation. We thus inspected the tracker performance for winner identification under both the bounding-box and the segmentation-mask ground truth specifications.
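The two ground-truth specifications can be contrasted as follows: the same predicted box is scored either against the box fitted to the ground-truth mask (e.g., with the hypothetical helper sketched in Sect. 3.1) or against the mask itself, with the predicted box rasterised for the latter. The sketch below is illustrative rather than the toolkit implementation.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two (x, y, w, h) axis-aligned boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def box_mask_iou(box, mask):
    """IoU between a rasterised (x, y, w, h) box and a binary ground-truth mask."""
    x, y, w, h = (int(round(v)) for v in box)
    raster = np.zeros_like(mask, dtype=bool)
    raster[max(y, 0):y + h, max(x, 0):x + w] = True
    inter = np.logical_and(raster, mask > 0).sum()
    union = np.logical_or(raster, mask > 0).sum()
    return float(inter) / union if union > 0 else 0.0
```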

The scores using the bounding box ground truth are shown in Table 4, while the scores using the segmentation mask ground truth are shown in Table 5. We observe that the tracker ranks remain the same across the two ground truth specifications, except for the top two, which switch ranks. For this reason, both top performers are declared winners of the VOT-STb2022 challenge, each in its category. The winner of the VOT-STb2022 challenge in the bounding box ground truth category is OSTrackSTB, while the winner in the segmentation mask ground truth category is APMT_MR.

Table 4. The top five trackers from Table 10 re-ranked on the VOT-STb2022 sequestered dataset using the bounding box ground truth.
Table 5. The top five trackers from Table 10 re-ranked on the VOT-STb2022 sequestered dataset using the segmentation masks as ground truth.

4.3 The VOT-RTs2022 Challenge Results

The trackers that entered the VOT-STs2022 challenge were also run on the VOT-RTs2022 challenge. Thus the statistics of the submitted trackers were the same as in VOT-STs2022. For details please see Sect. 4.1.

Results. The EAO scores and AR-raw plots for the trackers participating in the VOT-RTs2022 challenge are shown in Fig. 7 and Table 9. The top ten segmentation-based real-time trackers are MS_AOT, OSTrackSTS, SRATransTS, TransT_M, DGformer, MixFormerM, TransLL, TransT, Linker and RTS.

Nine of the top ten trackers are based on transformers. Nine trackers are ranked among the top ten on the VOT-STs2022 challenge: MS_AOT, OSTrackSTS, SRATransTS, TransT_M, DGformer, MixFormerM, TransLL, Linker and RTS, while TransT is a variation of TransT_M. The top-ranked tracker on the real-time challenge according to EAO is MS_AOT, which is also the top performer on the VOT-STs2022 public dataset; the second-best is OSTrackSTS, which ranks fourth on VOT-STs2022; and the third is SRATransTS, which ranks seventh on VOT-STs2022. This indicates significant advancement in the field of visual object tracking since the inception of the VOT real-time challenges: the speed limitations that used to hold back robust trackers have been convincingly overcome by transformer-based designs.

Three of the tested trackers have been published in major computer vision journals and conferences in the last two years (2021/2022). These trackers are indicated in Fig. 7, along with their average performance (EAO = 0.422), which constitutes the VOT2022 state-of-the-art bound. Approximately 45.2% of the submissions exceed this bound.

Fig. 7. The VOT-RTs2022 AR plot (left), the EAO curves (center) and the EAO plot (right). The dashed horizontal line denotes the average performance of seven state-of-the-art trackers published in 2021/2022 at major computer vision venues. These trackers are denoted by a gray circle in the bottom part of the graph.

The VOT-RTs2022 Challenge Winner. According to the EAO results in Table 9, the top performer and the winner of the segmentation-based real-time tracking challenge VOT-RTs2022 is MS_AOT.

4.4 The VOT-RTb2022 Challenge Results

The trackers that entered the VOT-STb2022 challenge were also run on the VOT-RTb2022 challenge. Thus the statistics of the submitted trackers were the same as in VOT-STb2022. For details please see Sect. 4.2 and [5].

Results. The EAO scores and AR-raw plots for the trackers participating in the VOT-RTb2022 challenge are shown in Fig. 8 and Table 10. The top ten bounding-box-based real-time trackers are OSTrackSTB, APMT_RT, MixFormer, APMT_MR, SRATransT, DAMT, TransT_M, vittrack, SBT, TransT. All of these are based on transformers. Seven are among the top ten performers on the public dataset in VOT-STb2022: OSTrackSTB, APMT_RT, MixFormer, APMT_MR, SRATransT, DAMT and TransT_M. Thus, similarly to VOT-RTs2022, the results show that the performance of transformer-based trackers is minimally compromised, if at all, by the real-time constraint.

The top-performer according to the EAO on the public dataset is OSTrackSTB, which is based on the recent OSTrack [29] and uses a ViT [4] backbone. This tracker is ranked third on VOT-STb2022. The second and the third-best trackers on VOT-RTb2022 are APMT_RT and MixFormer, which are ranked fourth and fifth on VOT-STb2022.

Note that 7 of the tested trackers have been published in major computer vision journals and conferences in the last two years (2021/2022). These trackers are indicated in Fig. 8, along with their average performance (EAO=0.421), which constitutes the VOT2022 state-of-the-art bound. Approximately 53.7% of the submissions exceed this bound.

Fig. 8. The VOT-RTb2022 AR plot (left), the EAO curves (center) and the EAO plot (right). The dashed horizontal line denotes the average performance of ten state-of-the-art trackers published in 2021/2022 at major computer vision venues. These trackers are denoted by a gray circle in the bottom part of the graph.

The VOT-RTb2022 Challenge Winner.

According to the EAO results in Table 10, the top performer and the winner of the bounding-box-based real-time tracking challenge VOT-RTb2022 is OSTrackSTB.

4.5 The VOT-LT2022 Challenge Results

Trackers Submitted. The VOT-LT2022 challenge received 7 valid entries. The VOT2022 committee contributed additional trackers SuperDiMP and KeepTrack as baselines; thus 9 trackers were considered in the challenge. In the following, we briefly overview the entries and provide the references to original papers in [5] where available.

All participating trackers were categorized as LT\(_1\) according to the ST-LT taxonomy from Sect. 2, in that they implemented explicit target re-detection. All trackers were based on convolutional neural networks. Four trackers applied a Transformer architecture akin to STARK [23] for target localization (CoCoLoT, mixLT, mlpLT, and VITKT_M). In particular, VITKT_M is based purely on a Transformer backbone [20] for feature extraction. Four trackers applied the SuperDiMP structure [1] as their base tracker (ADiMPLT, mixLT, mlpLT, SuperDiMP). Three trackers selected KeepTrack [18] as their auxiliary tracker due to its robustness to distractors (CoCoLoT, VITKT_M, KeepTrack). One tracker builds on MixFormer [3] to design a long-term tracker focused on target recapture (HuntFormer). One tracker extends the D3Sv2 [17] short-term tracker with long-term capabilities (D3SLT). Four trackers combined different tracking methods and switched between them based on their tracking scores (CoCoLoT, D3SLT, mixLT, mlpLT, VITKT_M). Among them, two trackers use an online real-time MDNet-based [19] verifier to determine the tracking score (CoCoLoT, D3SLT).

Table 6. List of trackers that participated in the VOT-LT2022 challenge along with their performance scores (Pr, Re, F-score) and ST/LT categorization.

Results. The overall performance is summarized in Fig. 9 and Table 6. The top three performers are VITKT_M, mixLT and HuntFormer. VITKT_M obtains the highest F-score (0.617) in 2022, while last year's winner (mlpLT) obtains 0.565. Note that the average F-score of these trackers decreased by 11.4% compared to last year, which reflects the increased difficulty of the new VOT-LT dataset. All results are based on the submitted numbers, but these were verified by running the code multiple times. VITKT_M is composed of a Transformer-based tracker VitTrack, an auxiliary tracker KeepTrack and a motion module. Specifically, the master tracker VitTrack is a Transformer-based tracker composed of a backbone network, a corner prediction head and a classification head. In addition, a simple motion module predicts the current target state from the temporal trajectory. When the scores of both VitTrack and KeepTrack fall below a threshold and the target moves abnormally, the motion module is triggered to predict the current state.

The mixLT architecture is a progressive fusion of multiple trackers, mainly STARK [23] and SuperDiMP. Specifically, it first fuses the results of two trackers, STARK-ST50 and STARK-ST101. The states of the two trackers are then corrected based on the fusion results. SuperDiMP, controlled by a meta-updater, is introduced for further fusion between dissimilar trackers, in order to improve the robustness of long-term tracking. The final tracking result is determined according to the confidences of the trackers over several frames, and another tracker correction is performed.

Based on MixFormer, the HuntFormer tracker proposes an effective motion prediction model that provides a reliable search region in which to recapture the target. It also introduces a soft-threshold-based dynamic memory update model, which keeps a set of reliable target templates in memory that can be used to match the target position in the search region. The two modules cooperate with each other, which greatly improves the recapture ability of the tracker.

VITKT_M achieves the overall best F-score and significantly surpasses mixLT (by 1.7%) and MixFormer (by 1.9%). All of these methods are based on Transformers. Two similar trackers, VITKT_M and VITKT, were submitted by one team. The only difference is that VITKT is a more concise version of VITKT_M without the motion module. When the motion module is ablated (VITKT), the F-score decreases by 1.2%. Since VITKT is a minor variant of VITKT_M, we only keep VITKT_M in our ranking.

The VOT-LT2022 Challenge Winner. According to the F-score in Table 6, the top-performing tracker is VITKT_M, closely followed by mixLT and HuntFormer. Thus the winner of the VOT-LT2022 challenge is VITKT_M.

4.6 The VOT-RGBD2022 Challenge Results

Eight trackers were submitted to the 2022 RGBD challenge: DMTracker, keep_track, MixForRGBD, OSTrack, ProMix, SAMF, SBT_RGBD and SPT.

All trackers are based on the popular deep learning-based tracker architectures that have performed well in the previous years' VOT RGB challenges. The new deep architecture this year is MixFormer [3], which appears in multiple submissions (MixForRGBD, ProMix and SAMF). The main difference between the submitted trackers is how they fuse the two modalities, depth and RGB, and in their training procedures. Some teams submitted multiple trackers, but since their architectures are different they were all accepted.

Fig. 9. VOT-LT2022 challenge average tracking precision-recall curves (left) and the corresponding F-score curves (right). Tracker labels are sorted according to maximum of the F-score (see Table 6).

Results. The Expected Average Overlap (EAO), Accuracy (A) and Robustness (R) metrics of the submitted and a number of additional trackers are shown in Table 7. The two best performing trackers, MixForRGBD and SAMF, are distinctly better than the next ones. The six best performing trackers are this year's submissions, while the DepthTrack baseline, DeT_DiMP50_Max, is seventh. The two RGB trackers perform the worst, as expected.

Fig. 10. The VOT-RGBD2022 AR plot (left) and the EAO curves (right).

The VOT-RGBD2022 Challenge Winner. The results in Fig. 10 show that MixForRGBD and SAMF perform very similarly and are clearly better than the rest. Still, MixForRGBD obtains the best EAO score and is thus the winner of the VOT-RGBD2022 challenge.

Table 7. Results for the eight submitted VOT-RGBD2022 trackers. For comparison, the table also includes the results for the three best performing RGBD trackers from VOT2020 (ATCAIS) and VOT2021 (STARK_RGBD and DRefine), two strong baseline RGB trackers from the previous years (DiMP and ATOM) and the baseline RGBD tracker from the DepthTrack dataset (DeT_DiMP50_Max [25]).

4.7 The VOT-D2022 Challenge Results

The VOT-D2022 challenge uses the same 127 short-term tracking sequences as the above RGBD2022 challenge, but in the D (depth-only) challenge the trackers are provided only the depth map frames. This challenge was added to study how much RGB adds to the depth cue and what the complementary power of the two modalities is.

A total of six trackers were submitted to the depth-only challenge: CoDeT, MixFormerD, OSTrack_D, RSDiMP, SBT_Depth and UpDoT.

Not surprisingly, the D-only challenge attracted submissions from the same groups that also participated in the RGBD challenge. For example, CoDeT is a D-only version of DMTracker, MixFormerD of MixForRGBD, OSTrack_D of OSTrack, and SBT_Depth of SBT_RGBD. RSDiMP is from the same group as the SPT RGBD tracker, but the two architectures are different. The authors of CoDeT also submitted UpDoT, which corresponds to a standard DiMP trained with two different versions of depth data.

Table 8. Results for the six submitted VOT-D2022 trackers. For comparison, the table also includes the results for the recent depth-only tracker DOT [26] and an RGB DiMP that was trained with RGB but tested with colormap-converted depth images.
Fig. 11. The VOT-D2022 AR plot (left) and the EAO curves (right).

Table 9. Results for the VOT-STs2022 and VOT-RTs2022 challenges. Expected average overlap (EAO), accuracy and robustness are shown. For reference, a no-reset average overlap AO [22] is shown under Unsupervised.
Table 10. Results for the VOT-STb2022 and VOT-RTb2022 challenges. Expected average overlap (EAO), accuracy and robustness are shown. For reference, a no-reset average overlap AO [22] is shown under Unsupervised.

Results. The computed performance metrics for the D (depth-only) trackers are given in Table 8 and the corresponding graphs in Fig. 11. The results show that the depth-only variants of the best performing RGBD trackers also perform well in the D-only challenge (MixForRGBD \(\rightarrow \) MixFormerD and OSTrack \(\rightarrow \) OSTrack_D). The only dedicated depth-only tracker without an RGBD counterpart, RSDiMP, obtains the second-best EAO score. Overall the three best methods, MixFormerD, RSDiMP and OSTrack_D, perform almost on par and are distinctly better than the rest. Therefore, these three trackers are good starting points for understanding how to effectively use the depth channel in tracking.

Notably, there is a clear difference between the D-only and RGBD results on the same data (Table 7 vs. Table 8). This confirms that both modalities, D and RGB, are beneficial for object tracking. For example, the RGB DiMP in Table 7 is clearly better than the depth-only DiMP in Table 8 (EAO 0.534 vs. 0.336), but inferior to the best D-only tracker (MixFormerD, 0.600).

The VOT-D2022 Challenge Winner. The three best depth-only trackers, MixFormerD, RSDiMP and OSTrack_D, perform on par, but since MixFormerD obtains the best EAO score, it is selected as the winner.

5 Conclusions

Results of the VOT2022 challenge were presented. The challenge is composed of the following challenges focusing on various tracking aspects and domains: (i) the segmentation-based short-term RGB tracking challenge (VOT-STs2022), (ii) the legacy bounding-box-based short-term RGB tracking challenge (VOT-STb2022), (iii) the realtime counterpart of VOT-STs2022 (VOT-RTs2022), (iv) the realtime counterpart of VOT-STb2022 (VOT-RTb2022), (v) the VOT2022 long-term RGB tracking challenge (VOT-LT2022), (vi) the VOT2022 short-term RGB and depth (D) tracking challenge (VOT-RGBD2022) and its variation, (vii) the VOT2022 short-term depth-only tracking challenge (VOT-D2022).

In this VOT edition, new VOT-LT2022, VOT-RGBD2022 and VOT-D2022 datasets were introduced, a legacy bounding-box-based tracking challenge VOT-STb2022 was reintroduced, the VOT-ST2022 public and sequestered datasets were refreshed, and a training dataset has been introduced for VOT-LT2022.

A methodological shift, already indicated in VOT2021 [11], has become even more apparent this year. Nearly half of the trackers participating in the VOT-STs2022 challenge were based on transformers, approximately 40% were using discriminative correlation filters, while only a few were based on Siamese correlation trackers (a methodology highly popular in VOT2021). All of the top 9 trackers were based on transformers. Apart from being robust, these trackers are also very fast: 9 of the top VOT-STs2022 trackers are among the top trackers on the VOT-RTs2022 challenge. Variations of the segmentation trackers were submitted to the legacy bounding-box tracking challenge VOT-STb2022. Seven of the top ten trackers on VOT-STb2022 were modifications of trackers ranked among the top ten on VOT-STs2022. The winner of the VOT-STs2022 challenge is MS_AOT, while the winner of the VOT-STb2022 challenge in the bounding box ground truth category is OSTrackSTB and the winner in the segmentation mask ground truth category is APMT_MR. The winner of the VOT-RTs2022 challenge is MS_AOT and the winner of the VOT-RTb2022 challenge is OSTrackSTB.

The VOT-LT2022 challenge’s top-three performers all apply a Transformer-based tracker structure for short-term localization and long-term re-detection. Among all submitted trackers, the dominant methodologies are SuperDiMP [1], STARK [23], KeepTrack [18], and MixFormer [3]. The top performer and the winner of the VOT-LT2022 challenge is VITKT_M, which ensembles the results of VitTrack and KeepTrack. This tracker obtains a significantly better performance than the second-best tracker.

In the VOT-RGBD2022 and VOT-D2022 challenges, the same tracker architecture obtained the best results in all tracking metrics. There are two interesting points in this submission that possibly explain its success compared to others. First, the tracker is based on the recent Convolutional vision Transformer (CvT) model and, second, both the RGB and depth representations are learned from data. Since there are no depth-only tracking datasets sufficiently large for network training, the existing RGB datasets were converted to pseudo depth-map datasets using a monocular depth estimation method. These design choices turned out to be the winning ones this year, and therefore the same authors won the VOT-RGBD2022 and VOT-D2022 challenges with their two trackers adopting the same architecture, MixForRGBD and MixFormerD.

For the last decade, the primary objective of VOT has been to establish a platform for discussing tracking performance evaluation and to contribute to the tracking community verified annotated datasets, performance measures and evaluation toolkits. The VOT2022 was the tenth effort toward this goal, following the very successful VOT2013, VOT2014, VOT2015, VOT2016, VOT2017, VOT2018, VOT2019, VOT2020 and VOT2021. Since its beginning, the VOT has successfully identified modern milestone tracking methodologies at their inception, spanning discriminative correlation filters, Siamese trackers and, most recently, transformer-based architectures. By pushing the boundaries, presenting ever more challenging sequences and opening new challenges, the VOT has been successfully fulfilling its service to the community. The effort, however, is shared with the tracking community, which continually rises to the challenges and generates the fast pace of tracker architecture development. We thank the community for their collaboration and look forward to future developments in this exciting scientific field.