Abstract
We present a novel attention-based mechanism for learning enhanced point features in point cloud processing tasks, e.g., classification and segmentation. Unlike prior studies, which optimize the weights over a pre-selected set of attention points, our approach learns to locate the best attention points to maximize the performance of a specific task, e.g., point cloud classification. Importantly, we advocate the use of a single attention point to facilitate semantic understanding in point feature learning. Specifically, we formulate a new and simple convolution that combines convolutional features from an input point and its corresponding learned attention point (LAP). Our attention mechanism can be easily incorporated into state-of-the-art point cloud classification and segmentation networks. Extensive experiments on common benchmarks such as ModelNet40, ShapeNetPart, and S3DIS demonstrate that our LAP-enabled networks consistently outperform both the respective original networks and other competitive alternatives that employ multiple attention points, either pre-selected or learned under our LAP framework.
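To make the idea concrete, the following is a minimal numpy sketch of a single-attention-point convolution, not the authors' implementation: the linear offset branch, the nearest-point snapping, and the feature concatenation are all illustrative assumptions standing in for the learned components described in the paper.

```python
import numpy as np

def lap_conv(points, feats, offset_w):
    """Sketch: fuse each point's features with those of one attention point.

    points:   (N, 3) point coordinates
    feats:    (N, C) per-point features
    offset_w: (C, 3) a linear map standing in for the learned offset branch
    returns:  (N, 2C) fused features
    """
    # Predict an attention location for each point as point + learned offset.
    attn_loc = points + feats @ offset_w                                    # (N, 3)
    # Snap each predicted location to the nearest actual input point.
    d = np.linalg.norm(points[None, :, :] - attn_loc[:, None, :], axis=-1)  # (N, N)
    nearest = d.argmin(axis=1)                                              # (N,)
    # Combine the point's own features with its attention point's features.
    return np.concatenate([feats, feats[nearest]], axis=1)                  # (N, 2C)

rng = np.random.default_rng(0)
pts = rng.normal(size=(8, 3))
fts = rng.normal(size=(8, 4))
w = rng.normal(scale=0.1, size=(4, 3))
out = lap_conv(pts, fts, w)
print(out.shape)  # (8, 8)
```

In the paper the attention point is located by a trained network and the combination feeds a task loss; here the hard argmin snapping merely illustrates how one extra point's features can augment each input point's features.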
Acknowledgements
This work was supported in part by National Natural Science Foundation of China (Grant Nos. U21B2023, 62161146005, U2001206, 61902254), Guangdong Talent Program (Grant No. 2019JC05X328), Guangdong Science and Technology Program (Grant No. 2020A0505100064), DEGP Key Project (Grant Nos. 2018KZDXM058, 2020SFKC059), Shenzhen Science and Technology Innovation Program (Grant Nos. JCYJ20210324120213036, RCJC20200714114435012), HKSAR (Grant No. CUHK14206320), NSERC (Grant No. 611370), and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ). We thank all anonymous reviewers for their valuable comments.
Additional information
Supporting information: Appendixes A and B. The supporting information is available online at https://info.scichina.com and https://springerlink.bibliotecabuap.elogim.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.
Cite this article
Lin, L., Huang, P., Fu, CW. et al. On learning the right attention point for feature enhancement. Sci. China Inf. Sci. 66, 112107 (2023). https://doi.org/10.1007/s11432-021-3431-9