A Document Image Quality Assessment Algorithm Based on Information Entropy in Text Region

Zhang, Zongrui; Qiu, Jian; He, Hao

doi:10.1007/978-3-031-20738-9_72

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 153)

Included in the following conference series:

The International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery

Abstract

The quality of the image is critical to Optical Character Recognition (OCR): poor-quality images lead OCR to generate unreliable results. In practical OCR-based application scenarios, a relatively high proportion of images are of low quality, so effectively evaluating image quality and filtering out unqualified images with document image quality assessment (DIQA) algorithms is a major challenge. Current DIQA algorithms mainly focus on features of the overall image rather than the text region, even though the quality of the text region is the dominant factor for OCR. In this paper, we propose a document image quality assessment algorithm based on the information entropy of the text region of an image. Our framework mainly consists of three networks, which detect, extract, and evaluate the text regions in an image, respectively. We build a quality prediction network based on HyperNet and use the information entropy of the text region as the score weight, so that the final score better reflects the quality of the text region. Finally, test results on the benchmark dataset SmartDoc-QA and on our constructed dataset DocImage1k demonstrate that the proposed algorithm achieves excellent performance.
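The abstract does not specify how the entropy weight is computed or combined. As a minimal sketch, assuming each extracted text region is available as a flat list of 8-bit gray levels, that the weight is the Shannon entropy of the region's gray-level histogram, and that the final score is the entropy-weighted mean of per-region scores, the weighting step could look like the following. The names `shannon_entropy` and `entropy_weighted_score` are hypothetical, and the per-region scores are stand-ins for the output of the HyperNet-based prediction network, which is not reproduced here.

```python
import math
from collections import Counter
from typing import Sequence


def shannon_entropy(pixels: Sequence[int]) -> float:
    """Shannon entropy, in bits, of the gray-level histogram of one text region.

    Assumption: a region is given as a flat sequence of 8-bit gray levels.
    """
    if not pixels:
        return 0.0
    total = len(pixels)
    entropy = 0.0
    for count in Counter(pixels).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


def entropy_weighted_score(regions: Sequence[Sequence[int]],
                           scores: Sequence[float]) -> float:
    """Entropy-weighted mean of per-region quality scores (hypothetical helper).

    `scores[i]` stands in for the quality predicted for `regions[i]`;
    the HyperNet-based predictor itself is not reproduced here.
    """
    if len(regions) != len(scores):
        raise ValueError("need exactly one score per text region")
    if not scores:
        raise ValueError("no text regions to score")
    weights = [shannon_entropy(region) for region in regions]
    total_weight = sum(weights)
    if total_weight == 0.0:
        # Every region is uniform (zero entropy): fall back to a plain mean.
        return sum(scores) / len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / total_weight


if __name__ == "__main__":
    flat = [128] * 16          # one gray level         -> 0 bits
    two_level = [0, 255] * 8   # two equally likely levels -> 1 bit
    sixteen = list(range(16))  # sixteen equally likely levels -> 4 bits
    print(shannon_entropy(flat), shannon_entropy(two_level), shannon_entropy(sixteen))
    # 0.0 1.0 4.0
    print(round(entropy_weighted_score([two_level, sixteen], [0.2, 0.8]), 2))
    # 0.68
```

With weights of 1 and 4 bits, the combined score (0.2 × 1 + 0.8 × 4) / 5 = 0.68 is pulled toward the score of the higher-entropy region, which illustrates how an entropy weight lets information-rich text regions dominate the final score. Whether the authors normalise the weights in exactly this way is not stated in the abstract.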

References

Kong, T., Yao, A., Chen, Y., Sun, F.: HyperNet: towards accurate region proposal generation and joint object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 845–853 (2016)
Nayef, N., Luqman, M.M., Prum, S., Eskenazi, S., Chazalon, J., Ogier, J.M.: SmartDoc-QA: a dataset for quality assessment of smartphone captured document images - single and multiple distortions. In: 2015 International Conference on Document Analysis and Recognition (ICDAR), pp. 1231–1235 (2015)
Blando, L.R., Kanai, J., Nartker, T.A.: Prediction of OCR accuracy using simple image features. In: 1995 International Conference on Document Analysis and Recognition (ICDAR), pp. 319–322 (1995)
Souza, A., Cheriet, M., Naoi, S., Suen, C.Y.: Automatic filter selection using image quality assessment. In: 2003 International Conference on Document Analysis and Recognition (ICDAR), pp. 508–512 (2003)
Cannon, M., Hochberg, J., Kelly, P.: Quality assessment and restoration of typewritten document images. Int. J. Doc. Anal. Recogn. 80–89 (1999)
Peng, X., Cao, H., Subramanian, K., Prasad, R., Natarajan, P.: Automated image quality assessment for camera-captured OCR. In: 2011 IEEE International Conference on Image Processing, pp. 2621–2624 (2011)
Kumar, J., Chen, F., Doermann, D.: Sharpness estimation for document and scene images. In: 2012 International Conference on Pattern Recognition (ICPR), pp. 3292–3295 (2012)
Bui, Q.A., Molard, D., Tabbone, S.: Predicting mobile-captured document images sharpness quality. In: 2018 International Workshop on Document Analysis Systems (DAS), pp. 275–280 (2018)
Li, H., Zhu, F., Qiu, J.: Towards document image quality assessment: a text line based framework and a synthetic text line image dataset. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 551–558 (2019)
Peng, X., Wang, C.: Camera captured DIQA with linearity and monotonicity constraints. In: 2020 International Workshop on Document Analysis Systems (DAS), pp. 168–181 (2020)
Lu, T., Dooms, A.: A deep transfer learning approach to document image quality assessment. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1372–1377 (2019)
Rodin, D., Loginov, V., Zagaynov, I., Orlov, N.: Document image quality assessment via explicit blur and text size estimation. In: 2021 International Conference on Document Analysis and Recognition (ICDAR), pp. 281–292 (2021)
Liao, M., Wan, Z., Yao, C., Chen, K., Bai, X.: Real-time scene text detection with differentiable binarization. In: 2020 AAAI Conference on Artificial Intelligence, pp. 11474–11481 (2020)
Kim, J., Zeng, H., Ghadiyaram, D., Lee, S., Zhang, L., Bovik, A.C.: Deep convolutional neural models for picture-quality prediction: challenges and solutions to data-driven image quality assessment. IEEE Signal Process. Mag. 34(6), 130–141 (2017)
Ye, P., Kumar, J., Kang, L., Doermann, D.: Unsupervised feature learning framework for no-reference image quality assessment. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1098–1105 (2012)
Nayef, N., Ogier, J.M.: Metric-based no-reference quality assessment of heterogeneous document images. In: Document Recognition and Retrieval XXII, vol. 9402, p. 94020L (2015)
Li, H., Zhu, F., Qiu, J.: CG-DIQA: no-reference document image quality assessment based on character gradient. In: 2018 International Conference on Pattern Recognition (ICPR), pp. 3622–3626 (2018)

Author information

Authors and Affiliations

State Key Lab of Advanced Optical Communication System and Network, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
Zongrui Zhang, Jian Qiu & Hao He
China Institute for Smart Court, Shanghai Jiao Tong University, Shanghai, China
Zongrui Zhang, Jian Qiu & Hao He

Corresponding authors

Correspondence to Zongrui Zhang or Hao He.

Editor information

Editors and Affiliations

Division of Intelligent Future Technologies, Mälardalen University, Västerås, Västmanlands Län, Sweden
Ning Xiong
Department of Electronic and Computer Engineering, Brunel University London, Uxbridge, Middlesex, UK
Maozhen Li
School of Information Science and Technology, Hunan University, Changsha, Hunan, China
Kenli Li
School of Information Science and Technology, Hunan University, Changsha, Hunan, China
Zheng Xiao
College of Computer and Data Science, Fuzhou University, Fuzhou, Fujian, China
Longlong Liao
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
Lipo Wang

About this paper

Cite this paper

Zhang, Z., Qiu, J., He, H. (2023). A Document Image Quality Assessment Algorithm Based on Information Entropy in Text Region. In: Xiong, N., Li, M., Li, K., Xiao, Z., Liao, L., Wang, L. (eds) Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. ICNC-FSKD 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 153. Springer, Cham. https://doi.org/10.1007/978-3-031-20738-9_72

DOI: https://doi.org/10.1007/978-3-031-20738-9_72
Published: 30 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20737-2
Online ISBN: 978-3-031-20738-9
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
