Abstract
Applications for real-time visual tracking can be found in many areas, including visual odometry and augmented reality. Interest point detection and feature description form the basis of feature-based tracking, and a variety of algorithms for these tasks have been proposed. In this work, we present (1) a carefully designed dataset of video sequences of planar textures with ground truth, which includes various geometric changes, lighting conditions, and levels of motion blur, and which may serve as a testbed for a variety of tracking-related problems, and (2) a comprehensive quantitative evaluation of detector-descriptor-based visual camera tracking based on this testbed. We evaluate the impact of individual algorithm parameters, compare algorithms for both detection and description in isolation, as well as all detector-descriptor combinations as a tracking solution. In contrast to existing evaluations, which aim at different tasks such as object recognition and have limited validity for visual tracking, our evaluation is geared towards this application in all relevant factors (performance measures, testbed, candidate algorithms). To our knowledge, this is the first work that comprehensively compares these algorithms in this context, and in particular, on video streams.
Cite this article
Gauglitz, S., Höllerer, T. & Turk, M. Evaluation of Interest Point Detectors and Feature Descriptors for Visual Tracking. Int J Comput Vis 94, 335–360 (2011). https://doi.org/10.1007/s11263-011-0431-5
DOI: https://doi.org/10.1007/s11263-011-0431-5