1 Introduction

The visual simultaneous localization and mapping (V-SLAM) system adopts cameras (usually with inertial measurement units (IMUs)) to simultaneously infer its own state (e.g. pose) and build a consistent map of its surroundings [1]. V-SLAM systems are widely used by autonomous mobile robots, especially resource-limited agile micro drones [2], augmented/mixed/virtual reality devices [3], as well as spatial AI applications [4, 5]. Recently, driven by the surge of deep learning (DL) techniques [6, 7], some end-to-end DL-based V-SLAM or visual odometry (VO) systems have been proposed [8,9,10,11]. In this paper, we focus on the traditional de-facto framework of V-SLAM systems with standard cameras, as shown in Fig. 1, which contains two modules: a front-end and a back-end. The front-end is responsible for processing visual and IMU data, including feature extraction, short/long-term data association with outlier rejection, as well as initial estimation of the current pose and newly detected landmark positions. The back-end takes the initialization information from the front-end and performs a maximum-a-posteriori (MAP) estimation based on a factor graph to refine both poses and landmark positions [12]. This paper focuses on the new trend in front-end techniques of V-SLAM systems.
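As a toy illustration of the back-end's role (refining variables by local least-squares optimization; here a plain Gauss-Newton loop rather than a full factor graph), the sketch below refines a single 2D landmark position from noisy range measurements taken at known poses. All poses and measurement values are invented for illustration only.

```python
import numpy as np

# Known camera centers (the front-end would supply these as initial poses).
poses = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
landmark_true = np.array([2.0, 2.0])

# Simulated measurements: ranges from each pose to the landmark, with noise.
rng = np.random.default_rng(0)
z = np.linalg.norm(landmark_true - poses, axis=1) + rng.normal(0, 0.01, 3)

x = np.array([1.0, 1.0])  # initial guess (from the front-end, in practice)
for _ in range(10):
    diff = x - poses                      # (3, 2) offsets to each pose
    pred = np.linalg.norm(diff, axis=1)   # predicted ranges
    r = pred - z                          # residuals
    J = diff / pred[:, None]              # Jacobian of range w.r.t. x
    # Gauss-Newton step: solve the normal equations J^T J dx = -J^T r.
    dx = np.linalg.solve(J.T @ J, -J.T @ r)
    x += dx

print(x)  # converges close to landmark_true
```

A real back-end solves the same kind of nonlinear least-squares problem, but jointly over all poses and landmarks connected in the factor graph.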

Fig. 1.

The de-facto framework of a V-SLAM system.

2 Hand-Engineered Features

Traditionally, visual features have mainly been restricted to salient and repeatable points (called keypoints), due to the unreliable extraction of high-level geometric features (e.g. lines or edges) with non-learning methods. Besides keypoints, dense methods using all pixel information (e.g. optical flow [13] or correspondence-free methods [14]) have also been adopted to infer ego-motion under a small-motion assumption. In the early research on stereo VO for Mars rovers [15,16,17,18], keypoints were tracked among successive images from nearby viewpoints. In his landmark paper [19], Nister proposed extracting keypoints independently in all images and then matching them. This method then became the dominant approach because it works reliably even under large motion/viewpoint changes [20].

Keypoint-based feature extraction includes two stages: keypoint detection and keypoint description. The keypoint detectors can be divided into two categories: corner detectors and blob detectors [21]. Traditional keypoint extractors are hand-engineered: corner detectors include Moravec [22], Forstner [23], Harris [24], Shi-Tomasi [25], and FAST/FASTER [26, 27]; blob detectors include SIFT [28], SURF [29], CenSurE [30], RootSIFT [31], and KAZE [32, 33]; keypoint descriptors include SSD/NCC [34], census transform [35], SIFT [28], GLOH [36], SURF [29], DAISY [37], BRIEF [38], ORB [39], BRISK [40], and KAZE [32, 33].
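As a concrete example of a hand-engineered corner detector, the sketch below computes the Harris response [24] with plain NumPy on a synthetic image. The 3x3 smoothing window and k = 0.04 are common textbook choices, not values prescribed by this survey; production systems use optimized implementations of such detectors.

```python
import numpy as np

def harris_response(img, k=0.04):
    # Image gradients via central differences (axis 0 = rows = y).
    Iy, Ix = np.gradient(img.astype(float))

    # Structure tensor entries, smoothed with a 3x3 box filter.
    def box(a):
        p = np.pad(a, 1, mode="edge")
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0

    Sxx, Syy, Sxy = box(Ix * Ix), box(Iy * Iy), box(Ix * Iy)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    # High positive response at corners, negative along edges.
    return det - k * trace ** 2

# Synthetic image: a bright square on a dark background.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
R = harris_response(img)
corner = np.unravel_index(np.argmax(R), R.shape)
print(corner)  # lands near one of the square's four corners
```

The response is positive only where the structure tensor has two large eigenvalues, which is exactly the "cornerness" criterion shared by the corner detectors listed above.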

Data association (also called feature matching) is commonly conducted by comparing similarity measurements between keypoint descriptors along with a mutual consistency check procedure [21]. With prior knowledge of motion constraints (e.g. from IMU sensors or constant velocity assumptions) or stereo epipolar line constraints, the time used for data association can be shortened by restricting the search space [17, 41, 42]. Due to visual aliasing, wrong data associations (called outliers) are unavoidable in both short-term feature matching and long-term loop closure. Therefore, consensus set search (e.g. RANSAC) [43,44,45,46,47,48] and geometric information (i.e. pose and map estimates) from the back-end are usually adopted to remove outliers.
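The descriptor matching with a mutual consistency check described above can be sketched as follows. The random 256-bit binary descriptors stand in for e.g. BRIEF/ORB outputs, and the 5% bit-noise level is an arbitrary illustrative choice.

```python
import numpy as np

def hamming(A, B):
    # Pairwise Hamming distances between two sets of binary descriptors.
    return (A[:, None, :] != B[None, :, :]).sum(axis=2)

rng = np.random.default_rng(1)

# Image 1 descriptors; image 2 reuses them with bit noise and a shuffle,
# simulating the same keypoints seen from a slightly different view.
desc1 = rng.integers(0, 2, size=(20, 256), dtype=np.uint8)
noisy = np.where(rng.random(desc1.shape) < 0.05, 1 - desc1, desc1)
perm = rng.permutation(20)
desc2 = noisy[perm]

D = hamming(desc1, desc2)
fwd = D.argmin(axis=1)  # best match in image 2 for each image-1 descriptor
bwd = D.argmin(axis=0)  # best match in image 1 for each image-2 descriptor

# Mutual consistency check: keep (i, j) only if i and j choose each other.
matches = [(i, j) for i, j in enumerate(fwd) if bwd[j] == i]
print(len(matches))  # all 20 true pairs survive despite the bit noise
```

In a real pipeline the surviving matches would still be filtered geometrically (e.g. by RANSAC) to remove outliers caused by visual aliasing.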

Since the traditional back-end of V-SLAM systems relies on local iterative optimization algorithms (e.g. Gauss-Newton) [12], a fairly good initialization of the variables (e.g. 6-DoF poses and 3D coordinates of keypoints) is required. The 6-DoF poses can be estimated from keypoints via multiview geometry [18, 34, 49]. The 3D coordinates of keypoints are obtained by triangulation with careful keyframe selection [50, 51].
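The triangulation step can be sketched with the standard linear (DLT) method: each view contributes two linear constraints on the homogeneous 3D point, and the stacked system is solved by SVD. The intrinsics, poses, and point below are arbitrary illustrative values.

```python
import numpy as np

K = np.array([[500., 0., 320.],
              [0., 500., 240.],
              [0., 0., 1.]])  # pinhole intrinsics (illustrative)

# Camera 1 at the origin; camera 2 translated 1 m along x (stereo-like).
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.], [0.], [0.]])])

X_true = np.array([0.3, -0.2, 4.0, 1.0])  # homogeneous 3D point

def project(P, X):
    x = P @ X
    return x[:2] / x[2]

u1, u2 = project(P1, X_true), project(P2, X_true)

# DLT: stack two constraints per view (u * P_row3 - P_row1/2) and
# take the null vector of A as the triangulated point.
A = np.vstack([
    u1[0] * P1[2] - P1[0],
    u1[1] * P1[2] - P1[1],
    u2[0] * P2[2] - P2[0],
    u2[1] * P2[2] - P2[1],
])
X = np.linalg.svd(A)[2][-1]
X = X / X[3]
print(X[:3])  # recovers [0.3, -0.2, 4.0] in the noise-free case
```

With noisy observations the DLT solution is only an algebraic estimate; it serves as the initialization that the back-end then refines by MAP optimization.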

3 Deep-Learned Features

Traditional hand-engineered keypoint extractors and matchers are not robust to viewpoint/illumination/seasonal changes, yet such robustness is crucial for the long-term autonomy of mobile robots. Deep learning, especially the convolutional neural network (CNN) [52], excels at feature extraction and processing. Therefore, DL-based keypoint extractors and matchers have been proposed as a new trend to enhance the robustness of V-SLAM systems even under challenging conditions.

The DL-based keypoint extractors can be divided into three categories: detect-then-describe, jointly detect-and-describe, and describe-to-detect. For the detect-then-describe method, keypoint detectors include TILDE [53], covariant feature detector [54], TCDet [55], MagicPoint [56], Quad-Networks [57], texture feature detector [58], and Key.Net [59]; keypoint descriptors include convex optimized descriptor [60], Deepdesc [61], TFeat [62], UCN [63], L2-Net [64], HardNet [65], average precision ranking [66], GeoDesc [67], RDRL [68], LogPolarDesc [69], ContextDesc [70], SOSNet [71], GIFT [72], and CAPS descriptor [73]. Jointly detect-and-describe methods include LIFT [74], DELF [75], LF-Net [76], SuperPoint [77], UnsuperPoint [78], GCNv2 [79], D2-Net [80], RF-Net [81], R2D2 [82], ASLFeat [83], and UR2KiD [84]. Different from the above two categories, Tian et al. proposed a describe-to-detect (D2D) method [85], which selects keypoints based on the dense descriptor information.

After feature extraction, robust keypoint matching methods include context networks (CNe) [86], deep fundamental matrix estimation [87], NG-RANSAC [88], NM-Net [89], Order-Aware Network [90], ACNe [91], and SuperGlue [92].

Besides the above extract-then-match methods, some end-to-end keypoint extract-and-match methods have also been proposed, including NCNet [93], KP2D [94], Sparse-NCNet [95], LoFTR [96], and Patch2Pix [97].

Some datasets and benchmarks have been released to train and evaluate different DL networks, such as Brown and Lowe’s dataset [98], HPatches [99], the matching in the dark (MID) dataset [100], the image matching challenge [101], and the long-term visual localization benchmark [102, 103].

4 Conclusions

In this paper, we have given a very concise survey of the front-end techniques of V-SLAM systems, emphasizing DL-based keypoint extraction and matching techniques. Future research directions include the extraction and association of high-level geometric features (e.g. lines [104, 105], symmetry [106], and holistic 3D structures [107]) as well as semantic object-level features [108,109,110].