Abstract
Depth maps have been used to refine geometric information in fields such as 3D reconstruction and pose estimation for SLAM systems, where ill-posed problems arise. As learning-based approaches have been introduced successfully throughout vision-based fields, several CNN-based depth estimation algorithms have been proposed, but they are trained on spatial information only. Since the image sequences and videos used in SLAM systems also carry temporal information, this paper proposes a recurrent CNN architecture for SLAM systems that estimates depth maps by exploiting temporal as well as spatial information, using a convolutional GRU cell constructed to remember the outputs of past convolutional layers. Furthermore, this paper proposes additional layers that preserve the structure of scenes by utilizing sparse depth cues obtained from the SLAM system. The sparse depth cues are produced by projecting the reconstructed 3D map into each camera frame, and they help the network predict accurate depth maps by avoiding the ambiguity that arises in latent space when generating depth for untrained structures. Although the depth cues from a monocular SLAM system are less accurate than those from a stereo SLAM system, the proposed masking approach, which weights each depth cue by a confidence computed from the relative camera pose between the current and previous frames, retains the system's performance together with the proposed adaptive regularization in the loss function. In the training phase, preprocessing the ground-truth depth maps with an exponential quantization, which eliminates the ill effects of the large captured distances, improves the depth map prediction of the proposed system over the baseline methods while keeping the system real-time.
We expect that the proposed system can be used in SLAM systems to refine geometric information for more accurate 3D reconstruction and pose estimation, which are essential for a robust robot navigation system.
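The convolutional GRU cell mentioned above replaces the dense matrix products of a standard GRU with convolutions, so the recurrent state is itself a spatial feature map. The following is a minimal single-channel NumPy sketch of the standard ConvGRU update equations; the kernel size, channel count, and random initialization are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def conv2d_same(x, k):
    """2-D 'same' convolution (cross-correlation) with zero padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class ConvGRUCell:
    """Single-channel convolutional GRU: gates are computed by 3x3 convolutions."""
    def __init__(self, rng, ksize=3):
        # Six kernels: input/state weights for the update gate z, the reset
        # gate r, and the candidate state (illustrative random initialization).
        (self.Wz, self.Uz, self.Wr, self.Ur,
         self.Wh, self.Uh) = (0.1 * rng.standard_normal((ksize, ksize))
                              for _ in range(6))

    def step(self, x, h):
        z = sigmoid(conv2d_same(x, self.Wz) + conv2d_same(h, self.Uz))  # update gate
        r = sigmoid(conv2d_same(x, self.Wr) + conv2d_same(h, self.Ur))  # reset gate
        h_cand = np.tanh(conv2d_same(x, self.Wh) + conv2d_same(r * h, self.Uh))
        return (1.0 - z) * h + z * h_cand  # blend old state and candidate

rng = np.random.default_rng(0)
cell = ConvGRUCell(rng)
h = np.zeros((8, 8))                # the recurrent state is a feature map
for _ in range(4):                  # a short sequence of frames
    h = cell.step(rng.standard_normal((8, 8)), h)
```

Because the state is convolved rather than flattened, spatial layout is preserved across frames, which is what lets the cell carry scene structure through a video sequence.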
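Obtaining sparse depth cues by projecting the reconstructed 3D map into a camera frame follows the standard pinhole model. A sketch under assumed intrinsics and a world-to-camera pose (the paper's actual SLAM back end supplies these quantities):

```python
import numpy as np

def sparse_depth_cues(points_w, K, R, t, height, width):
    """Project 3-D map points (world frame) into an image, keeping per-pixel depth.

    points_w: (N, 3) world points; K: 3x3 intrinsics; R, t: world-to-camera pose.
    Returns a (height, width) map that is 0 where no cue lands (hence 'sparse').
    """
    pc = points_w @ R.T + t                  # transform into the camera frame
    pc = pc[pc[:, 2] > 0]                    # discard points behind the camera
    uv = pc @ K.T                            # homogeneous pixel coordinates
    u = uv[:, 0] / uv[:, 2]
    v = uv[:, 1] / uv[:, 2]
    z = pc[:, 2]
    cols = np.round(u).astype(int)
    rows = np.round(v).astype(int)
    ok = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)
    depth_map = np.zeros((height, width))
    # Write far points first so the nearest point wins at each pixel:
    for r, c, d in sorted(zip(rows[ok], cols[ok], z[ok]), key=lambda p: -p[2]):
        depth_map[r, c] = d
    return depth_map

# Toy example: identity pose, a point 5 m straight ahead lands at the
# principal point (32, 24) with depth 5; a point behind the camera is dropped.
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0, 5.0], [0.0, 0.0, -1.0]])
cues = sparse_depth_cues(pts, K, np.eye(3), np.zeros(3), 48, 64)
```

The resulting map is mostly zero; the few valid pixels are the structure-preserving cues fed to the additional layers.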
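The exponential quantization of the ground-truth depth maps can be sketched as uniform quantization in log-depth, so that bin width grows with distance and large depths are compressed. The abstract does not give the exact formula, so the depth range and bin count below are assumptions, not the paper's values.

```python
import numpy as np

def quantize_depth(depth, d_min=1.0, d_max=80.0, n_bins=128):
    """Map metric depth to integer bins spaced exponentially (uniform in log depth).

    d_min, d_max, and n_bins are illustrative assumptions, not the paper's values.
    """
    d = np.clip(depth, d_min, d_max)
    # Uniform quantization in log space => exponentially growing bin widths.
    t = np.log(d / d_min) / np.log(d_max / d_min)   # normalized to [0, 1]
    return np.round(t * (n_bins - 1)).astype(np.int64)

def dequantize_depth(bins, d_min=1.0, d_max=80.0, n_bins=128):
    """Inverse map: bin index back to (approximate) metric depth."""
    t = bins / (n_bins - 1)
    return d_min * (d_max / d_min) ** t
```

Under these assumed parameters, a 1 m error at 5 m moves the target by several bins while the same error at 70 m moves it by less than one, which is the intended de-emphasis of large captured distances during training.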
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Recommended by Associate Editor Hyun Myung under the direction of Editor Jessie (Ju H.) Park. This work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education [No. 2016R1D1A3B03934808].
Sang Jun Lee received his B.S. degree in Computer Science and Engineering from Handong Global University, Pohang, Korea, in 2017. He is currently pursuing an M.S. degree in the Dept. of Information Technology at Handong Global University. His research interests include SLAM systems for the localization of self-driving cars, robots, and users of augmented and virtual reality, as well as 3D reconstruction and the optimization of these technologies using machine learning.
Heeyoul Choi is an assistant professor at Handong Global University. He was a visiting researcher at MILA, University of Montreal, from 2015 to 2016. He worked at Samsung Advanced Institute of Technology for five years, and was a post-doctoral researcher in Psychological and Brain Sciences at Indiana University, Indiana, from 2010 to 2011. He received his B.S. and M.S. degrees from Pohang University of Science and Technology, Korea, in 2002 and 2005, respectively, and his Ph.D. degree from Texas A&M University, Texas, in 2010. His research interests cover deep learning and cognitive science.
Sung Soo Hwang received his B.S. degree in Electrical Engineering and Computer Science from Handong Global University, Pohang, Korea, in 2008, and his M.S. and Ph.D. degrees from Korea Advanced Institute of Science and Technology, Daejeon, Korea, in 2010 and 2015, respectively. His research interests include image-based 3D modeling, 3D data compression, augmented reality, and Simultaneous Localization and Mapping systems.
Cite this article
Lee, S.J., Choi, H. & Hwang, S.S. Real-time Depth Estimation Using Recurrent CNN with Sparse Depth Cues for SLAM System. Int. J. Control Autom. Syst. 18, 206–216 (2020). https://doi.org/10.1007/s12555-019-0350-8