Abstract
The Pepper robot provides poor depth estimation. In this paper, we present a method for improving its 3D perception. The method uses the RGB image to predict depth from the monocular frame. As will be shown, combining the monocular depth prediction with the robot's native 3D data yields more accurate depth estimates.
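The fusion idea described above can be sketched in a minimal form: use valid pixels from the sensor's depth map to scale a (relative) monocular depth prediction to metric units, then fill the sensor's invalid pixels with the scaled prediction. This is an illustrative sketch only, not the paper's actual fusion method; the function `fuse_depth` and the least-squares alignment step are assumptions for the example.

```python
import numpy as np

def fuse_depth(sensor_depth, mono_depth):
    """Fill invalid (zero) sensor-depth pixels with a scaled monocular
    prediction.

    Hypothetical sketch: the paper's fusion strategy may differ.
    sensor_depth -- metric depth map, 0 where the sensor has no reading
    mono_depth   -- monocular prediction, possibly up to scale
    """
    valid = sensor_depth > 0
    # Least-squares scale aligning the monocular prediction to the
    # sensor's metric depths over the valid pixels:
    # minimise sum((sensor - s * mono)^2) over s.
    scale = (np.sum(sensor_depth[valid] * mono_depth[valid])
             / np.sum(mono_depth[valid] ** 2))
    fused = sensor_depth.copy()
    fused[~valid] = scale * mono_depth[~valid]
    return fused
```

In practice a per-pixel confidence weighting (e.g. based on sensor noise models) would replace this hard valid/invalid split, but the scale-then-fill structure is the core of any such fusion.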
Acknowledgments
This work has been supported by the Spanish Government grant TIN2016-76515-R, co-funded with FEDER funds. Edmanuel Cruz is funded by the Panamanian grant for PhD studies IFARHU & SENACYT 270-2016-207. This work has also been supported by the Spanish grants for PhD studies ACIF/2017/243 and FPU16/00887. Thanks also to NVIDIA for the generous donation of a Titan Xp and a Quadro P6000.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Bauer, Z., Escalona, F., Cruz, E., Cazorla, M., Gomez-Donoso, F. (2019). Improving the 3D Perception of the Pepper Robot Using Depth Prediction from Monocular Frames. In: Fuentetaja Pizán, R., García Olaya, Á., Sesmero Lorente, M., Iglesias Martínez, J., Ledezma Espino, A. (eds) Advances in Physical Agents. WAF 2018. Advances in Intelligent Systems and Computing, vol 855. Springer, Cham. https://doi.org/10.1007/978-3-319-99885-5_10
DOI: https://doi.org/10.1007/978-3-319-99885-5_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99884-8
Online ISBN: 978-3-319-99885-5
eBook Packages: Intelligent Technologies and Robotics (R0)