A font and size-independent OCR system for printed Kannada documents using support vector machines

Ashwin, T. V.; Sastry, P. S.

doi:10.1007/BF02703311


Published: February 2002

Volume 27, pages 35–58 (2002)

Sadhana


Abstract

This paper describes an OCR system for printed text documents in Kannada, a South Indian language. The input to the system is the scanned image of a page of text, and the output is a machine-editable file compatible with most typesetting software. The system first extracts words from the document image and then segments each word into sub-character-level pieces. The segmentation algorithm is motivated by the structure of the script. We propose a novel set of features for the recognition problem that are computationally simple to extract. The final recognition is achieved by employing a number of 2-class classifiers based on the Support Vector Machine (SVM) method. Recognition is independent of the font and size of the printed text, and the system is seen to deliver reasonable performance.
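The abstract does not say how the outputs of the 2-class classifiers are combined into a final label. As a minimal sketch, assuming pairwise (one-vs-one) majority voting, which is one common scheme and not necessarily the authors' method, the idea can be illustrated as follows. The class labels `A`, `B`, `C`, the 2-dimensional feature vectors, and the hand-set linear decision functions are all hypothetical stand-ins for trained SVMs and real features.

```python
from collections import Counter


def linear_decision(weights, bias):
    """Build f(x) = w.x + b, the form of a linear SVM decision function."""
    def f(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return f


# Hypothetical stand-ins for trained 2-class SVMs (weights are hand-set,
# not learned). For each pair (pos, neg): f(x) >= 0 votes for pos,
# f(x) < 0 votes for neg.
PAIRWISE = {
    ("A", "B"): linear_decision([-1.0, 0.0], 5.0),
    ("A", "C"): linear_decision([0.0, -1.0], 5.0),
    ("B", "C"): linear_decision([1.0, -1.0], 0.0),
}


def classify(x, pairwise=PAIRWISE):
    """Return the label that wins the most pairwise contests for x."""
    votes = Counter()
    for (pos, neg), f in pairwise.items():
        votes[pos if f(x) >= 0 else neg] += 1
    # Ties, if any, are resolved by the order in which labels first got a vote.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    for point in ([1.0, 1.0], [9.0, 1.0], [1.0, 9.0]):
        print(point, classify(point))
```

With k classes this scheme needs k(k-1)/2 two-class classifiers; a one-vs-rest arrangement would need only k, at the cost of less balanced training sets.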



Author information

T. V. Ashwin
Present address: Research Staff Member, IBM India Research Laboratories, IIT Campus, 110 016, New Delhi, India

Authors and Affiliations

Department of Electrical Engineering, Indian Institute of Science, 560 012, Bangalore, India
T. V. Ashwin & P. S. Sastry


About this article

Cite this article

Ashwin, T.V., Sastry, P.S. A font and size-independent OCR system for printed Kannada documents using support vector machines. Sadhana 27, 35–58 (2002). https://doi.org/10.1007/BF02703311


Issue Date: February 2002
DOI: https://doi.org/10.1007/BF02703311
