A Hierarchical Scene Text Detector Concerning Hard Examples

Yang, Jun; Zhang, Zhaogong; Li, Jianzhong

doi:10.1007/978-3-030-70665-4_13

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies ((LNDECT,volume 88))

Included in the following conference series:

The International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery

Abstract

Scene text detection approaches based on deep learning have achieved promising performance. However, accuracy and recall remain difficult to improve, because existing networks make insufficient use of the information in their intermediate feature layers and struggle to learn from text of hard-to-distinguish scale and from hard examples. We therefore propose a hierarchical scene text detector concerning hard examples, HST-DHE, based on Fully Convolutional Networks (FCNs). Specifically, HST-DHE directly predicts text lines of arbitrary orientation in full images without pre-defined anchor boxes. A feature pyramid is constructed so that predictions are made at different levels, making full use of the feature information in the middle layers. In addition, the loss function is redesigned so that the network focuses on hard examples, further improving the precision of text detection. Experiments on standard datasets, including ICDAR 2013, ICDAR 2015 and MSRA-TD500, demonstrate that the proposed algorithm significantly outperforms state-of-the-art detectors. The results indicate that the method is more robust on multi-scale and multi-oriented natural scene images.
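The abstract does not state the form of the redesigned loss. As a rough illustration of what "focusing on hard examples" can mean, the sketch below uses the focal loss of Lin et al. (ICCV 2017), a well-known loss that down-weights well-classified pixels; it is an assumption for illustration only and not necessarily the loss used in HST-DHE. The function names and the values of `gamma` and `alpha` are likewise illustrative.

```python
import math


def cross_entropy(p, y):
    """Plain cross-entropy for one pixel: p is the predicted text score in (0, 1), y is 0 or 1."""
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one pixel (illustrative stand-in, not the paper's confirmed loss).

    The factor (1 - p_t) ** gamma shrinks the loss of confidently correct
    predictions, so poorly classified (hard) pixels dominate the total.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Two text pixels (y = 1): one easy (score 0.95), one hard (score 0.30).
easy_ce, hard_ce = cross_entropy(0.95, 1), cross_entropy(0.30, 1)
easy_fl, hard_fl = focal_loss(0.95, 1), focal_loss(0.30, 1)

# How strongly the hard pixel outweighs the easy one under each loss.
print("cross-entropy hard/easy ratio:", hard_ce / easy_ce)
print("focal loss    hard/easy ratio:", hard_fl / easy_fl)
```

With `gamma = 0` and `alpha = 0.5` the focal loss reduces to half the plain cross-entropy, so `gamma` can be read as a knob for how aggressively easy pixels are discounted.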

Author information

Authors and Affiliations

School of Computer Science and Technology, Heilongjiang University, Harbin, China
Jun Yang & Zhaogong Zhang
Harbin Institute of Technology, Harbin, China
Jianzhong Li

Editor information

Editors and Affiliations

College of Engineering, Design and Physical Sciences, Brunel University London, Uxbridge, UK
Hongying Meng
School of Electronical Information and Artificial Engineering, Shaanxi University of Science and Technology, Xi’an, China
Tao Lei
College of Engineering, Design and Physical Sciences, Brunel University London, Uxbridge, UK
Maozhen Li
College of Electrical and Information, Hunan University, Changsha, China
Kenli Li
Division of Intelligent Future Technologies, Mälardalen University, Västerås, Västmanlands Län, Sweden
Ning Xiong
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
Lipo Wang

About this paper

Cite this paper

Yang, J., Zhang, Z., Li, J. (2021). A Hierarchical Scene Text Detector Concerning Hard Examples. In: Meng, H., Lei, T., Li, M., Li, K., Xiong, N., Wang, L. (eds) Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. ICNC-FSKD 2020. Lecture Notes on Data Engineering and Communications Technologies, vol 88. Springer, Cham. https://doi.org/10.1007/978-3-030-70665-4_13

DOI: https://doi.org/10.1007/978-3-030-70665-4_13
Published: 27 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70664-7
Online ISBN: 978-3-030-70665-4
eBook Packages: Intelligent Technologies and Robotics (R0)
