Abstract
We propose a two-stage framework for 3D object detection in autonomous driving based on voxel neighborhood feature aggregation, named Neighbor Voxels to Point-RCNN (NV2P-RCNN). The point representation of a point cloud can encode fine-grained features, while the voxel representation allows efficient processing, so the framework uses both. In the first stage, we add point density to the voxel feature encoding and extract voxel features with a 3D sparse convolutional network. In the second stage, features of the raw point cloud are extracted and fused with the voxel features. To aggregate voxel-to-point features quickly, we design a neighbor voxel query method, NV-Query, which finds neighbor voxels directly from the voxel spatial coordinates of the points. Results on the KITTI and ONCE datasets show that NV2P-RCNN achieves higher detection precision than other existing methods.
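The idea of finding neighbor voxels directly from a point's voxel coordinates can be illustrated with a minimal NumPy sketch. This is not the authors' NV-Query implementation: the function names, the dictionary-based lookup, the cubic neighborhood of `radius` voxels, and the use of the per-voxel point count as a density value are all assumptions made for illustration.

```python
import numpy as np

def voxelize(points, voxel_size, pc_min):
    """Map each 3D point to integer voxel coordinates (hypothetical helper)."""
    return np.floor((points - pc_min) / voxel_size).astype(np.int64)

def build_voxel_table(voxel_coords):
    """Collect non-empty voxels, their point counts, and a coord -> index lookup."""
    uniq, counts = np.unique(voxel_coords, axis=0, return_counts=True)
    table = {tuple(c): i for i, c in enumerate(uniq.tolist())}
    return uniq, counts, table

def neighbor_voxel_query(point, voxel_size, pc_min, table, radius=1):
    """Return indices of non-empty voxels within `radius` voxels of the point's voxel.

    The neighbors are found by offsetting the integer voxel coordinate,
    so no distance computation against all voxels is needed.
    """
    cx, cy, cz = voxelize(point[None, :], voxel_size, pc_min)[0].tolist()
    hits = []
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            for dz in range(-radius, radius + 1):
                idx = table.get((cx + dx, cy + dy, cz + dz))
                if idx is not None:
                    hits.append(idx)
    return hits

# Toy data (assumed values, not from the paper).
points = np.array([[0.1, 0.1, 0.1],
                   [0.2, 0.1, 0.1],
                   [1.5, 0.1, 0.1],
                   [5.0, 5.0, 5.0]])
voxel_size = np.array([1.0, 1.0, 1.0])
pc_min = np.zeros(3)

uniq, counts, table = build_voxel_table(voxelize(points, voxel_size, pc_min))
neighbors = neighbor_voxel_query(np.array([0.5, 0.5, 0.5]), voxel_size, pc_min, table)
```

In this sketch, `counts` stands in for the point density that the first stage adds to the voxel encoding, and the lookup cost per point depends only on the neighborhood size, not on the total number of voxels.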
Funding
This work is supported by a project of the National Natural Science Foundation of China (62072025).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Huo, W., Jing, T. & Ren, S. NV2P-RCNN: Feature Aggregation Based on Voxel Neighborhood for 3D Object Detection. Neural Process Lett 55, 6925–6945 (2023). https://doi.org/10.1007/s11063-023-11244-x
DOI: https://doi.org/10.1007/s11063-023-11244-x