Abstract
3D object classification is an important component of semantic scene understanding for mobile robots. However, many current systems do not address practical issues such as how an object appears from the different viewing positions a mobile robot may take. A novel 3D object representation is introduced that combines a cylindrical occupancy grid with a 3D convolutional neural network containing a row-wise max pooling layer. Because this representation is rotationally invariant, robots can classify 3D objects correctly regardless of the position from which object modelling begins. Experimental results on a publicly available benchmark dataset show significantly improved performance compared with conventional algorithms.
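The core idea can be illustrated with a short NumPy sketch: a point cloud is binned into a cylindrical (radius, azimuth, height) occupancy grid, and taking the maximum of the network's feature maps along the azimuth axis removes the dependence on the robot's starting viewing angle. The bin counts, value ranges, and function names below are illustrative assumptions, not the exact parameters or architecture used in the paper.

```python
import numpy as np

def cylindrical_occupancy_grid(points, num_r=16, num_theta=36, num_z=16,
                               r_max=1.0, z_min=-1.0, z_max=1.0):
    """Build a binary occupancy grid over (radius, azimuth, height) bins.

    `points` is an (N, 3) array of x, y, z coordinates centred on the object;
    the bin counts and ranges here are assumptions for illustration only.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2)
    theta = np.arctan2(y, x)  # azimuth in (-pi, pi]

    # Map each coordinate to a bin index, clipping points outside the range.
    r_idx = np.clip((r / r_max * num_r).astype(int), 0, num_r - 1)
    t_idx = np.clip(((theta + np.pi) / (2 * np.pi) * num_theta).astype(int),
                    0, num_theta - 1)
    z_idx = np.clip(((z - z_min) / (z_max - z_min) * num_z).astype(int),
                    0, num_z - 1)

    grid = np.zeros((num_r, num_theta, num_z), dtype=np.float32)
    grid[r_idx, t_idx, z_idx] = 1.0
    return grid

def rowwise_max_pool(feature_maps):
    """Pool feature maps over the azimuth axis.

    A rotation of the robot's starting viewpoint shows up as a cyclic shift
    along the theta dimension, so the maximum over that axis is unchanged by
    the shift, giving a viewpoint-invariant feature.
    """
    # feature_maps: (channels, num_r, num_theta, num_z) -> (channels, num_r, num_z)
    return feature_maps.max(axis=2)
```

In this sketch the pooled output would be fed to fully connected layers for classification; the invariance comes purely from pooling over the azimuth bins rather than from data augmentation.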
Additional information
Recommended by Associate Editor Dong-Joong Kang under the direction of Editor Euntai Kim. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2017R1A2B2002608).
Jiyoun Moon received her Bachelor of Science in Robotics from Kwangwoon University in August 2014. Her major research interests include natural language processing, semantic scene understanding, and mission planning.
Hanjun Kim received his Bachelor of Science in Electrical and Computer Engineering from Seoul National University in February 2015. His major research interests include SLAM, reinforcement learning, and semantic scene understanding.
Beomhee Lee received the B.S. and M.S. degrees in Electronics Engineering from Seoul National University in 1978 and 1980, respectively, and the Ph.D. degree in Computer, Information, and Control Engineering from the University of Michigan, Ann Arbor, MI, USA, in 1985. He was then an Assistant Professor in the School of Electrical Engineering at Purdue University until 1987.
Cite this article
Moon, J., Kim, H. & Lee, B. View-point Invariant 3D Classification for Mobile Robots Using a Convolutional Neural Network. Int. J. Control Autom. Syst. 16, 2888–2895 (2018). https://doi.org/10.1007/s12555-018-0182-y