Abstract
Analysing handwritten documents is a challenging task. General solutions are rarely available in this area, because most handwritten manuscripts have unique characteristics that reflect how the document was written, including differing handwriting styles. Transcription is made harder when several scribes have contributed to the same text and when the script has degraded. This chapter presents an overview of the techniques used in handwritten text recognition systems. The approaches and algorithms can be adopted for different document types, irrespective of the condition of the scanned documents. Two general approaches to handwritten character recognition are also shown. The first follows a fairly standard process to normalise, segment and recognise characters. The second is a segmentation-free approach that uses neural networks for both segmentation and recognition.
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Bugeja, M., Dingli, A., Seychell, D. (2020). An Overview of Handwritten Character Recognition Systems for Historical Documents. In: Seychell, D., Dingli, A. (eds) Rediscovering Heritage Through Technology. Studies in Computational Intelligence, vol 859. Springer, Cham. https://doi.org/10.1007/978-3-030-36107-5_1
DOI: https://doi.org/10.1007/978-3-030-36107-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36106-8
Online ISBN: 978-3-030-36107-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)