Script Identification in Printed Bilingual Documents

Dhanya, D.; Ramakrishnan, A. G.

doi:10.1007/3-540-45869-7_2


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2423)

Included in the following conference series:

International Workshop on Document Analysis Systems


Abstract

Identification of script in multi-lingual documents is essential for many language-dependent applications such as machine translation and optical character recognition. Techniques for script identification generally require large areas of text so that sufficient information is available. Such an assumption does not hold in the Indian context, where words of two different scripts are interspersed in most documents. In this paper, techniques to identify the script of a single word are discussed. Two different approaches have been proposed and tested. The first method structures words into three distinct spatial zones and uses the spatial spread of a word in the upper and lower zones, together with the character density, to identify the script. The second technique analyzes the directional energy distribution of a word using Gabor filters with suitable frequencies and orientations. Words with various font styles and sizes have been used to test the proposed algorithms, and the results obtained are quite encouraging.
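
The second technique can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no filter parameters, so the frequencies, the number of orientations, the kernel size, the Gaussian width and all function names below are assumptions chosen only for illustration. The sketch builds real-valued Gabor kernels, filters a binary word image with each one, and reports the share of response energy captured by every frequency and orientation pair.

```python
import math

def gabor_kernel(freq, theta, sigma=2.0, half=4):
    """Real (even-symmetric) Gabor kernel: Gaussian envelope times a cosine wave.

    freq is in cycles per pixel; theta is the direction along which the
    cosine varies. sigma and half are illustrative assumptions.
    """
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the direction of the cosine wave.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * freq * xr))
        kernel.append(row)
    # Subtract the mean so that uniform regions produce zero response.
    mean = sum(map(sum, kernel)) / (len(kernel) ** 2)
    return [[v - mean for v in row] for row in kernel]

def directional_energy(image, kernel):
    """Sum of squared filter responses over all fully overlapping positions."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    energy = 0.0
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            resp = sum(kernel[i][j] * image[r + i][c + j]
                       for i in range(k) for j in range(k))
            energy += resp * resp
    return energy

def energy_features(image, freqs=(0.125, 0.25), n_orient=4):
    """Normalized energy per (frequency, orientation) pair, frequency-major order."""
    feats = [directional_energy(image, gabor_kernel(f, o * math.pi / n_orient))
             for f in freqs for o in range(n_orient)]
    total = sum(feats) or 1.0  # avoid division by zero on blank images
    return [e / total for e in feats]

if __name__ == "__main__":
    # Toy "word image": vertical strokes two pixels wide, repeating every four pixels.
    strokes = [[1 if c % 4 < 2 else 0 for c in range(16)] for _ in range(12)]
    for value in energy_features(strokes):
        print(round(value, 4))
```

In a word-level setting, a feature vector of this kind would be computed for each segmented word and passed to a classifier trained on labelled words of the two scripts. Normalizing by the total energy is a design choice made here to reduce sensitivity to word length and stroke thickness.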



Author information

Authors and Affiliations

Department of Electrical Engineering, Indian Institute of Science, 560 012, Bangalore, India
D. Dhanya & A. G. Ramakrishnan


Editor information

Editors and Affiliations

Bell Labs, Lucent Technologies, 600 Mountain Avenue, 07974, Murray Hill, NJ, USA
Daniel Lopresti
Avaya Labs Research, 233 Mount Airy Road, 07920, Basking Ridge, NJ, USA
Jianying Hu & Ramanujan Kashi


About this paper

Cite this paper

Dhanya, D., Ramakrishnan, A.G. (2002). Script Identification in Printed Bilingual Documents. In: Lopresti, D., Hu, J., Kashi, R. (eds) Document Analysis Systems V. DAS 2002. Lecture Notes in Computer Science, vol 2423. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45869-7_2


DOI: https://doi.org/10.1007/3-540-45869-7_2
Published: 09 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44068-0
Online ISBN: 978-3-540-45869-2
eBook Packages: Springer Book Archive
