An Overview of Handwritten Character Recognition Systems for Historical Documents

Rediscovering Heritage Through Technology

Part of the book series: Studies in Computational Intelligence (SCI, volume 859)

Abstract

Analysing handwritten documents is a challenging task. General solutions are not always possible in this area, because most handwritten manuscripts have unique characteristics that reflect how the document was written, including differing handwriting styles. Transcription is made harder when several scribes have contributed to the text and when the script has degraded. This chapter presents an overview of the techniques used in handwritten text recognition systems. The approaches and algorithms can be adopted for different document types irrespective of the state of the scanned documents. Two general approaches to handwritten character recognition are shown. The first follows a fairly standard process that normalises, segments and recognises characters. The second is a segmentation-free approach that uses neural networks for both segmentation and recognition.
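As a rough illustration of the normalisation stage in the first approach, the following Python sketch binarises a greyscale scan with Otsu's threshold [36] and then removes small specks with a morphological opening, that is, erosion followed by dilation [45]. It is not taken from the chapter: the function names, the 3×3 neighbourhood, the clipped handling of image borders and the assumption of dark ink on light paper are all illustrative choices.

```python
from typing import List

Image = List[List[int]]  # grey levels 0..255, one inner list per row


def otsu_threshold(image: Image) -> int:
    """Return the grey level that maximises the between-class variance."""
    hist = [0] * 256
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        weight_dark += hist[t]             # pixels with value <= t
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark  # pixels with value > t
        if weight_light == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarise(image: Image, threshold: int) -> Image:
    """Mark ink as 1 and paper as 0 (assumes dark ink on light paper)."""
    return [[1 if v <= threshold else 0 for v in row] for row in image]


def _morph(mask: Image, require_all: bool) -> Image:
    """Apply a 3x3 erosion (require_all) or dilation; windows are clipped at borders."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = int(all(window) if require_all else any(window))
    return out


def erode(mask: Image) -> Image:
    return _morph(mask, require_all=True)


def dilate(mask: Image) -> Image:
    return _morph(mask, require_all=False)
```

Calling `dilate(erode(binarise(img, otsu_threshold(img))))` on a scan `img` keeps strokes at least three pixels wide and drops isolated single-pixel noise. A degraded historical page would likely need an adaptive threshold instead of a single global one.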

Notes

  1. Binarisation [38].

  2. Erosion and dilation techniques [45].

  3. Machine learning [35].

References

  1. I. Ahmad, G.A. Fink, Training an Arabic handwriting recognizer without a handwritten training data set, in 2015 13th International Conference on Document Analysis and Recognition (ICDAR) (IEEE, 2015), pp. 476–480

  2. M. Basavanna, S. Gornale, Skew detection and skew correction in scanned document image using principal component analysis (2015)

  3. D.C. Blair, M.E. Maron, An evaluation of retrieval effectiveness for a full-text document-retrieval system. Commun. ACM 28(3), 289–299 (1985)

  4. T.C. Bockholt, G.D. Cavalcanti, C.A. Mello, Document image retrieval with morphology-based segmentation and features combination, Document Recognition and Retrieval XVIII, vol. 7874 (International Society for Optics and Photonics, 2011), p. 787415

  5. C.-A. Boiangiu, M.C. Tanase, R. Ioanitescu, Text line segmentation in handwritten documents based on dynamic weights. J. Inf. Syst. Oper. Manag. 1 (2013)

  6. S. Bukhari, F. Shafait, T. Breuel, Segmentation of curled textlines using active contours (2008), pp. 270–277

  7. S.S. Bukhari, F. Shafait, T.M. Breuel, Script-independent handwritten textlines segmentation using active contours, in 2009 10th International Conference on Document Analysis and Recognition (2009), pp. 446–450

  8. K. Chen, F. Yin, C.-L. Liu, Hybrid page segmentation with efficient whitespace rectangles extraction and grouping, in 2013 12th International Conference on Document Analysis and Recognition (ICDAR) (IEEE, 2013), pp. 958–962

  9. D.C. Ciresan, U. Meier, L.M. Gambardella, J. Schmidhuber, Convolutional neural network committees for handwritten character classification, in 2011 International Conference on Document Analysis and Recognition (ICDAR) (IEEE, 2011), pp. 1135–1139

  10. S.K.S. Dalbir et al., Review of online & offline character recognition. Int. J. Eng. Comput. Sci. 4(05) (2015)

  11. M. Diem, F. Kleber, R. Sablatnig, Text classification and document layout analysis of paper fragments, in 2011 International Conference on Document Analysis and Recognition (ICDAR) (IEEE, 2011), pp. 854–858

  12. A. Dingli, M. Bugeja, D. Seychell, S. Mercieca, Recognition of handwritten characters using Google fonts and Freeman chain codes, in International Cross-Domain Conference for Machine Learning and Knowledge Extraction (Springer, Berlin, 2018), pp. 65–78

  13. L. Fortunati, J. O’Sullivan, Situating the social sustainability of print media in a world of digital alternatives. Telematics and Informatics (2018)

  14. A. Gaur, S. Yadav, Handwritten Hindi character recognition using k-means clustering and SVM, in 2015 4th International Symposium on Emerging Trends and Technologies in Libraries and Information Services (ETTLIS) (IEEE, 2015), pp. 65–70

  15. A. Graves, J. Schmidhuber, Offline handwriting recognition with multidimensional recurrent neural networks, in Advances in Neural Information Processing Systems (2009), pp. 545–552

  16. P.V. Hough, Method and means for recognizing complex patterns, Accessed 18 Dec 1962. US Patent 3,069,654

  17. A. Jameel, Experiments with various recurrent neural network architectures for handwritten character recognition, in 1994 Proceedings of Sixth International Conference on Tools with Artificial Intelligence (IEEE, 1994), pp. 548–554

  18. P. Jana, S. Ghosh, S.K. Bera, R. Sarkar, Handwritten document image binarization: an adaptive k-means based approach, in 2017 IEEE Calcutta Conference (CALCON) (IEEE, 2017), pp. 226–230

  19. M.S. Kadhm, A.P.D.A.K. Abdul, Handwriting word recognition based on SVM classifier. Int. J. Adv. Comput. Sci. Appl. 1, 64–68 (2015)

  20. A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems (2012), pp. 1097–1105

  21. B.S. Kumar, Image denoising based on non-local means filter and its method noise thresholding. Signal Image Video Process. 7(6), 1211–1227 (2013)

  22. V. Lavrenko, T.M. Rath, R. Manmatha, Holistic word recognition for handwritten historical documents, in First International Workshop on Document Image Analysis for Libraries, 2004. Proceedings (IEEE, 2004), pp. 278–287

  23. Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, L.D. Jackel, Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)

  24. L. Likforman-Sulem, A. Hanimyan, C. Faure, A Hough based algorithm for extracting text lines in handwritten documents, in Proceedings of 3rd International Conference on Document Analysis and Recognition, vol. 2 (1995), pp. 774–777

  25. L. Likforman-Sulem, A. Zahour, B. Taconet, Text line segmentation of historical documents: a survey. Int. J. Doc. Anal. Recognit. (IJDAR) 9(2–4), 123–138 (2007)

  26. L. Liu, Y. Lu, C.Y. Suen, Near-duplicate document image matching: A graphical perspective. Pattern Recognit. 47(4), 1653–1663 (2014)

  27. G. Louloudis, B. Gatos, C. Halatsis, Text line detection in unconstrained handwritten documents using a block-based Hough transform approach, in Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2 (2007), pp. 599–603

  28. G. Louloudis, B. Gatos, I. Pratikakis, C. Halatsis, Text line and word segmentation of handwritten documents. Pattern Recognit. 42(12), 3169–3183 (2009)

  29. H. Ma, D. Doermann, Word level script identification for scanned document images, Document Recognition and Retrieval XI, vol. 5296 (International Society for Optics and Photonics, 2003), pp. 124–136

  30. R. Manmatha, N. Srimal, Scale space technique for word segmentation in handwritten documents, in International Conference on Scale-Space Theories in Computer Vision (Springer, Berlin, 1999), pp. 22–33

  31. R.J. Mooney, L. Roy, Content-based book recommending using learning for text categorization, in Proceedings of the Fifth ACM Conference on Digital Libraries (ACM, 2000), pp. 195–204

  32. B. Moysset, C. Kermorvant, C. Wolf, J. Louradour, Paragraph text segmentation into lines with recurrent neural networks, in 2015 13th International Conference on Document Analysis and Recognition (ICDAR) (IEEE, 2015)

  33. M. Murdock, S. Reid, B. Hamilton, J. Reese, ICDAR 2015 competition on text line detection in historical documents, in 2015 13th International Conference on Document Analysis and Recognition (ICDAR) (IEEE, 2015), pp. 1171–1175

  34. D. Nasien, H. Haron, S.S. Yuhaniz, Support vector machine (SVM) for English handwritten character recognition, in 2010 Second International Conference on Computer Engineering and Applications (ICCEA), vol. 1 (IEEE, 2010), pp. 249–252

  35. N.M. Nasrabadi, Pattern recognition and machine learning. J. Electron. Imaging 16(4), 049901 (2007)

  36. N. Otsu, A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)

  37. N. Ouwayed, A. Belaïd, A general approach for multi-oriented text line extraction of handwritten documents. Int. J. Doc. Anal. Recognit. (IJDAR) 15(4), 297–314 (2012)

  38. L. O'Gorman, Binarization and multithresholding of document images using connectivity. CVGIP: Graph. Models Image Process. 56(6), 494–506 (1994)

  39. R. Plamondon, S.N. Srihari, Online and off-line handwriting recognition: a comprehensive survey. IEEE Trans. Pattern Anal. Mach. Intell. 22(1), 63–84 (2000)

  40. M.M. Rahman, M. Akhand, S. Islam, P.C. Shill, M.H. Rahman et al., Bangla handwritten character recognition using convolutional neural network. Int. J. Image, Graphics Signal Process. (IJIGSP) 7(8), 42–49 (2015)

  41. T.M. Rath, R. Manmatha, Word spotting for historical documents. Int. J. Doc. Anal. Recognit. (IJDAR) 9(2–4), 139–152 (2007)

  42. Z. Shi, S. Setlur, V. Govindaraju, A steerable directional local profile technique for extraction of handwritten Arabic text lines, in 10th International Conference on Document Analysis and Recognition, 2009. ICDAR’09 (IEEE, 2009), pp. 176–180

  43. B.K. Shukla, G. Kumar, A. Kumar, An approach for skew detection using Hough transform. Int. J. Comput. Appl. 136(9), 20–23 (2016)

  44. P.Y. Simard, D. Steinkraus, J.C. Platt, Best practices for convolutional neural networks applied to visual document analysis, in Seventh International Conference on Document Analysis and Recognition (ICDAR) (IEEE, 2003), p. 958

  45. P. Soille, Erosion and dilation, Morphological Image Analysis (Springer, Berlin, 2004), pp. 63–103

  46. M.H.J. Vala, A. Baxi, A review on Otsu image segmentation algorithm. Int. J. Adv. Res. Comput. Eng. Technol. (IJARCET) 2(2), PP–387 (2013)

  47. A. Al-Khatatneh, S.A. Pitchay, M. Al-Qudah, A review of skew detection techniques for document, in 2015 17th UKSim-AMSS International Conference on Modelling and Simulation (UKSim) (IEEE, 2015), pp. 316–321

  48. C. Xu, J.L. Prince, Snakes, shapes, and gradient vector flow. IEEE Trans. Image Process. 7(3), 359–369 (1998)

  49. F. Yin, C.-L. Liu, Handwritten Chinese text line segmentation by clustering with distance metric learning. Pattern Recognit. 42(12), 3146–3157 (2009). New Frontiers in Handwriting Recognition

  50. T. Zhang, C.Y. Suen, A fast parallel algorithm for thinning digital patterns. Commun. ACM 27(3), 236–239 (1984)

  51. M. Ziabari, V. Mottaghitalab, A. Haghi, Application of direct tracking method for measuring electrospun nanofiber diameter. Braz. J. Chem. Eng. 26(1), 53–62 (2009)

Author information

Corresponding author

Correspondence to Alexiei Dingli.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Bugeja, M., Dingli, A., Seychell, D. (2020). An Overview of Handwritten Character Recognition Systems for Historical Documents. In: Seychell, D., Dingli, A. (eds) Rediscovering Heritage Through Technology. Studies in Computational Intelligence, vol 859. Springer, Cham. https://doi.org/10.1007/978-3-030-36107-5_1
