Abstract
In this work, we propose a convolutional neural network (CNN) architecture to identify handwritten scripts at the word level across six scripts: Arabic, Latin, Chinese, Bangla, Devanagari and Telugu. A large dataset of 14k word images per script was constructed from several public handwritten datasets. Three architectures are then proposed and compared in terms of standard performance metrics and execution time. Experiments on both the validation and test sets show high performance that exceeds state-of-the-art techniques. The best result was obtained by the CNN model with three convolution-pooling layer pairs, which achieved an average script identification accuracy of 97.67% and ran in 2 ms per frame during the test phase.
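To make the "three convolution-pooling pairs" structure concrete, the following is a minimal sketch that traces feature-map shapes and counts trainable parameters for such a network ending in a six-way classifier. The abstract does not give the input size, kernel size, or filter counts, so the 64x64 grayscale input, 3x3 kernels, 2x2 pooling, and 32/64/128 filters used here are assumptions chosen only for illustration, and the function names are hypothetical.

```python
# Illustrative only: the layer sizes below are assumptions, not the paper's values.
SCRIPTS = ["Arabic", "Latin", "Chinese", "Bangla", "Devanagari", "Telugu"]


def conv_pool_shape(height, width, kernel=3, pool=2):
    """Size after an unpadded stride-1 convolution and non-overlapping pooling."""
    conv_h, conv_w = height - kernel + 1, width - kernel + 1
    return conv_h // pool, conv_w // pool


def describe_network(height=64, width=64, channels=1,
                     filters=(32, 64, 128), kernel=3):
    """Trace shapes through the conv-pool pairs and count trainable parameters."""
    layers = []
    params = 0
    in_ch = channels
    for out_ch in filters:
        # Each filter has kernel*kernel*in_ch weights plus one bias.
        params += (kernel * kernel * in_ch + 1) * out_ch
        height, width = conv_pool_shape(height, width, kernel)
        layers.append((height, width, out_ch))
        in_ch = out_ch
    # Flatten the last feature map and feed one dense layer with six outputs.
    flat = height * width * in_ch
    params += (flat + 1) * len(SCRIPTS)
    return layers, flat, params


layers, flat, params = describe_network()
for index, shape in enumerate(layers, start=1):
    print(f"conv-pool pair {index}: output shape {shape}")
print(f"flattened features: {flat}, trainable parameters: {params}")
```

Under these assumed sizes the network stays small, which is consistent in spirit with the fast per-image inference the abstract reports, although the actual architecture may differ.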
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
El Bahy, S., Aboutabit, N., Ait Mait, H. (2023). A Deep Convolutional Neural Networks Approach for Word-Level Handwritten Script Identification Using a Large Dataset. In: Aboutabit, N., Lazaar, M., Hafidi, I. (eds) Advances in Machine Intelligence and Computer Science Applications. ICMICSA 2022. Lecture Notes in Networks and Systems, vol 656. Springer, Cham. https://doi.org/10.1007/978-3-031-29313-9_15
DOI: https://doi.org/10.1007/978-3-031-29313-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28845-6
Online ISBN: 978-3-031-29313-9
eBook Packages: Intelligent Technologies and Robotics (R0)