Abstract
Visual scene recognition is an indispensable part of automatic localization and navigation. Within the same scene, appearance and viewpoint can change dramatically, which is the greatest challenge for advanced unmanned systems (e.g., robots, vehicles, and UAVs) in recognizing places they have previously visited. Traditional methods have long been bound to hand-crafted feature paradigms that rely mainly on the designer's prior knowledge and are not sufficiently robust to extreme scene changes. In this paper, we address scene recognition by automatically learning feature representations from large collections of image samples. First, we propose a novel approach to scene recognition that trains a slight-weight convolutional neural network (CNN) with a less complex, more efficient architecture that is trainable end-to-end. The proposed approach combines self-selected deep-learning features with a lightweight CNN pipeline to achieve high-level semantic understanding of visual scenes. Second, we employ a salient-region technique that extracts local feature representations of specific scene regions directly from the convolutional layers via a self-selection mechanism, with each layer performing a linear operation in an end-to-end manner. Furthermore, we use probability statistics to compute the total similarity between several regions of one scene and the regions of other scenes, and finally rank the similarity scores to select the correct scene. We conducted extensive experiments comparing the proposed method against three well-known, state-of-the-art methods. Experimental results show that the proposed method is more robust and accurate than the other three in extremely harsh environments (e.g., weak light and strong blur).
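To make the matching pipeline sketched in the abstract concrete, the following is a minimal NumPy illustration of its two matching steps: self-selection of salient regions from a convolutional feature map, and aggregation of region-to-region similarities into a ranked list of candidate scenes. The function names, the activation-energy saliency criterion, and the best-match-plus-mean aggregation are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import numpy as np

def select_salient_regions(feature_map, k=8):
    """Pick the k most activated spatial cells of a conv feature map
    as salient-region descriptors (a stand-in for the paper's
    self-selection mechanism; the real criterion may differ).

    feature_map: (C, H, W) activations from a convolutional layer.
    Returns a (k, C) array of L2-normalized region descriptors.
    """
    c, h, w = feature_map.shape
    columns = feature_map.reshape(c, h * w).T         # (H*W, C) activation columns
    energy = np.linalg.norm(columns, axis=1)          # saliency = activation energy
    top = np.argsort(energy)[::-1][:k]                # indices of the k strongest cells
    descs = columns[top]
    return descs / np.linalg.norm(descs, axis=1, keepdims=True)

def rank_scenes(query_regions, reference_db):
    """Rank reference scenes by aggregated region similarity.

    query_regions: (k, C) descriptors from select_salient_regions.
    reference_db:  list of (k_i, C) descriptor arrays, one per scene.
    Returns scene indices, best match first.
    """
    scores = []
    for ref in reference_db:
        sim = query_regions @ ref.T            # cosine similarities (unit vectors)
        scores.append(sim.max(axis=1).mean())  # best match per region, then average
    return np.argsort(scores)[::-1]
```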
Data Availability
Not applicable.
Acknowledgments
The authors would like to thank the anonymous referees for their helpful comments.
Funding
The research was supported by the National Key Research and Development Program of China (no. 2016YFB0100902).
Author information
Contributions
Conceptualization: Zhenyu Li and Aiguo Zhou; Methodology: Zhenyu Li and Aiguo Zhou; Formal analysis and investigation: Aiguo Zhou; Writing - original draft preparation: Zhenyu Li; Writing - review and editing: Zhenyu Li and Aiguo Zhou; Funding acquisition: Aiguo Zhou.
Ethics declarations
Competing interests
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Li, Z., Zhou, A. Self-Selection Salient Region-Based Scene Recognition Using Slight-Weight Convolutional Neural Network. J Intell Robot Syst 102, 58 (2021). https://doi.org/10.1007/s10846-021-01421-2