Abstract
This paper describes an OCR system for printed text documents in Kannada, a South Indian language. The input to the system is a scanned image of a page of text, and the output is a machine-editable file compatible with most typesetting software. The system first extracts words from the document image and then segments the words into sub-character-level pieces. The segmentation algorithm is motivated by the structure of the script. We propose a novel set of features for the recognition problem that are computationally simple to extract. The final recognition is achieved by employing a number of 2-class classifiers based on the Support Vector Machine (SVM) method. The recognition is independent of the font and size of the printed text, and the system is seen to deliver reasonable performance.
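The abstract says that recognition relies on a number of 2-class classifiers but does not say how their outputs are combined. As a minimal sketch, assuming a one-vs-one majority vote (one common scheme, not necessarily the one used in the paper), the combination step might look like the following. The class labels, the 2-D feature vectors, and the hand-set linear decision functions are all hypothetical stand-ins for trained SVMs.

```python
from itertools import combinations


def linear_decision(weights, bias):
    """Return f(x) = w.x + b, a hypothetical stand-in for a trained 2-class SVM."""
    def decide(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return decide


def one_vs_one_predict(x, classes, pairwise):
    """Pick a class by majority vote over all 2-class classifiers.

    pairwise[(a, b)](x) > 0 is read as a vote for a, otherwise for b.
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        winner = a if pairwise[(a, b)](x) > 0 else b
        votes[winner] += 1
    # Ties go to the class listed first in `classes`.
    return max(classes, key=lambda c: votes[c])


# Illustrative labels and weights only; none of these come from the paper.
CLASSES = ["ka", "ga", "ma"]
PAIRWISE = {
    ("ka", "ga"): linear_decision([1.0, 0.0], -0.5),  # votes ka if x[0] > 0.5
    ("ka", "ma"): linear_decision([0.0, -1.0], 0.5),  # votes ka if x[1] < 0.5
    ("ga", "ma"): linear_decision([0.0, -1.0], 0.5),  # votes ga if x[1] < 0.5
}

if __name__ == "__main__":
    for features in ([0.9, 0.1], [0.1, 0.1], [0.5, 0.9]):
        print(features, one_vs_one_predict(features, CLASSES, PAIRWISE))
```

With N classes this scheme needs N(N-1)/2 binary classifiers, each trained on only two classes; a one-vs-rest arrangement would need N classifiers instead, each trained against all remaining classes.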
Cite this article
Ashwin, T.V., Sastry, P.S. A font and size-independent OCR system for printed Kannada documents using support vector machines. Sadhana 27, 35–58 (2002). https://doi.org/10.1007/BF02703311
DOI: https://doi.org/10.1007/BF02703311