Optical Character Recognition and Text Line Recognition of Handwritten Documents: A Survey

Dutta, Prarthana; Muppalaneni, Naresh Babu

doi:10.1007/978-981-99-5881-8_41

Part of the book series: Algorithms for Intelligent Systems (AIS)

Included in the following conference series:

International Conference on Worldwide Computing and Its Applications

Abstract

Optical Character Recognition (OCR) is a research area concerned with converting handwritten text or documents into digital form. Such conversion makes it possible to store, access, preserve, and transfer the knowledge in these documents efficiently for future use. Over the last few decades, the research community has shown growing interest in developing new ideas and methodologies for OCR, particularly for text line extraction and recognition. Identifying the individual lines in a handwritten document is one of the most crucial stages in recognizing language, words, and characters, and variation in the nature and style of handwriting makes this task challenging. This work presents a critical analysis of the various text line recognition systems for offline handwritten documents. The overview is intended to help researchers understand OCR and the text line recognition strategies developed over the years.

Author information

Authors and Affiliations

National Institute of Technology Silchar, Assam, India
Prarthana Dutta & Naresh Babu Muppalaneni

Corresponding author

Correspondence to Prarthana Dutta.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Malaviya National Institute of Technology Jaipur, Jaipur, Rajasthan, India
Ashish Kumar Tripathi
Department of Computer Science and Engineering, Sir Padampat Singhania University, Udaipur, India
Darpan Anand
School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK
Atulya K. Nagar

About this paper

Cite this paper

Dutta, P., Muppalaneni, N.B. (2023). Optical Character Recognition and Text Line Recognition of Handwritten Documents: A Survey. In: Tripathi, A.K., Anand, D., Nagar, A.K. (eds) Proceedings of World Conference on Artificial Intelligence: Advances and Applications. WWCA 1997. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-99-5881-8_41

DOI: https://doi.org/10.1007/978-981-99-5881-8_41
Published: 02 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-5880-1
Online ISBN: 978-981-99-5881-8
eBook Packages: Intelligent Technologies and Robotics (R0)
