Abstract
Visual scene recognition is an indispensable part of automatic localization and navigation. Within the same scene, appearance and viewpoint can change dramatically, which is the greatest challenge for advanced unmanned systems (e.g., robots, vehicles, and UAVs) in recognizing places they have previously visited. Traditional methods have long been bound to hand-crafted feature paradigms that rely mainly on the designer's prior knowledge and are not sufficiently robust to extreme scene changes. In this paper, we address scene recognition by automatically learning feature representations from large collections of image samples. First, we propose a novel approach to scene recognition that trains a slight-weight convolutional neural network (CNN) with a less complex, more efficient architecture that is trainable end-to-end. The proposed approach combines self-selected deep-learning features with a lightweight CNN pipeline to achieve high-level semantic understanding of visual scenes. Second, we employ a salient-region technique that extracts local feature representations of specific scene regions directly from the convolutional layers via a self-selection mechanism, with each layer performing a linear operation in an end-to-end manner. Furthermore, we use probability statistics to compute the total similarity between several regions of one scene and the regions of other scenes, and finally rank the similarity scores to select the correct scene. We conducted extensive experiments comparing the proposed method against three well-known, state-of-the-art methods. Experimental results show that the proposed method is more robust and accurate than the other three in extremely harsh environments (e.g., weak light and strong blur).
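To make the matching pipeline sketched in the abstract concrete, the following is a minimal NumPy illustration of its two matching steps: self-selection of salient regions from a convolutional feature map, and aggregation of region-to-region similarities into a ranked list of candidate scenes. The function names, the activation-energy saliency criterion, and the best-match-plus-mean aggregation are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import numpy as np

def select_salient_regions(feature_map, k=8):
    """Pick the k most activated spatial cells of a conv feature map
    as salient-region descriptors (a stand-in for the paper's
    self-selection mechanism; the real criterion may differ).

    feature_map: (C, H, W) activations from a convolutional layer.
    Returns a (k, C) array of L2-normalized region descriptors.
    """
    c, h, w = feature_map.shape
    columns = feature_map.reshape(c, h * w).T         # (H*W, C) activation columns
    energy = np.linalg.norm(columns, axis=1)          # saliency = activation energy
    top = np.argsort(energy)[::-1][:k]                # indices of the k strongest cells
    descs = columns[top]
    return descs / np.linalg.norm(descs, axis=1, keepdims=True)

def rank_scenes(query_regions, reference_db):
    """Rank reference scenes by aggregated region similarity.

    query_regions: (k, C) descriptors from select_salient_regions.
    reference_db:  list of (k_i, C) descriptor arrays, one per scene.
    Returns scene indices, best match first.
    """
    scores = []
    for ref in reference_db:
        sim = query_regions @ ref.T            # cosine similarities (unit vectors)
        scores.append(sim.max(axis=1).mean())  # best match per region, then average
    return np.argsort(scores)[::-1]
```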
Data Availability
Not applicable.
Acknowledgments
The authors would like to thank the anonymous referees for their helpful comments.
Funding
The research was supported by the National Key Research and Development Program of China (no. 2016YFB0100902).
Author information
Contributions
Conceptualization: Zhenyu Li and Aiguo Zhou; Methodology: Zhenyu Li and Aiguo Zhou; Formal analysis and investigation: Aiguo Zhou; Writing - original draft preparation: Zhenyu Li; Writing - review and editing: Zhenyu Li and Aiguo Zhou; Funding acquisition: Aiguo Zhou.
Ethics declarations
Competing interests
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Li, Z., Zhou, A. Self-Selection Salient Region-Based Scene Recognition Using Slight-Weight Convolutional Neural Network. J Intell Robot Syst 102, 58 (2021). https://doi.org/10.1007/s10846-021-01421-2