Abstract.
A system for automatically identifying the script used in a handwritten document image is described. The system was developed using a 496-document dataset representing six scripts, eight languages, and 279 writers. Documents were characterized by the mean, standard deviation, and skew of five connected-component features. A linear discriminant analysis was used to classify new documents and was tested using writer-sensitive cross-validation. Classification accuracy averaged 88% across the six scripts. The same method, applied within the Roman subcorpus, discriminated English and German documents with 85% accuracy.
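The two data-handling steps named in the abstract, summarizing per-component measurements into a fixed-length document vector and cross-validating by writer, can be sketched as below. This is an illustration under stated assumptions, not the authors' code. The abstract does not name the five connected-component features or the skew formula, so the sketch takes per-component measurements as given and assumes population moments (skew = m3 / m2^1.5). "Writer-sensitive" is read here as meaning that no writer's documents appear in both the training and test sets of a fold; that reading, the round-robin assignment of writers to folds, and the function names `summarize`, `document_vector`, `writer_folds`, and `cross_validation_splits` are all hypothetical. The discriminant-analysis step itself is not shown; a library class such as scikit-learn's `LinearDiscriminantAnalysis` could be fit on each training split.

```python
import math
from collections import defaultdict


def summarize(values):
    """Return (mean, standard deviation, skew) of one feature measured over a
    document's connected components. Population moments are assumed; skew is
    m3 / m2**1.5 and is set to 0.0 when all values are equal."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    std = math.sqrt(m2)
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return mean, std, skew


def document_vector(components):
    """Map a list of per-component feature tuples (5 values each in the
    abstract's setup) to one document vector holding (mean, std, skew) for
    every feature, i.e. 15 numbers for 5 features."""
    vector = []
    for j in range(len(components[0])):
        vector.extend(summarize([c[j] for c in components]))
    return vector


def writer_folds(writer_ids, k):
    """Split document indices into k folds so that every document by a given
    writer lands in the same fold. Writers are dealt round-robin in sorted
    order, a simple (hypothetical) scheme chosen for illustration."""
    fold_of_writer = {w: i % k for i, w in enumerate(sorted(set(writer_ids)))}
    folds = defaultdict(list)
    for doc_index, writer in enumerate(writer_ids):
        folds[fold_of_writer[writer]].append(doc_index)
    return [folds[i] for i in range(k)]


def cross_validation_splits(writer_ids, k):
    """Yield (train, test) index lists; no writer appears on both sides."""
    for test in writer_folds(writer_ids, k):
        held_out = set(test)
        train = [i for i in range(len(writer_ids)) if i not in held_out]
        yield train, test


# Usage: five documents by three writers, two folds.
writers = ["w1", "w1", "w2", "w3", "w2"]
for train, test in cross_validation_splits(writers, k=2):
    print(train, test)
# prints:
# [2, 4] [0, 1, 3]
# [0, 1, 3] [2, 4]
```

Grouping by writer rather than by document keeps a classifier from being scored on handwriting it has already seen, which would otherwise tend to inflate accuracy.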
Additional information
Received December 1, 1998 / Revised April 5, 1999
About this article
Cite this article
Hochberg, J., Bowers, K., Cannon, M. et al. Script and language identification for handwritten document images. IJDAR 2, 45–52 (1999). https://doi.org/10.1007/s100320050036
DOI: https://doi.org/10.1007/s100320050036