Abstract
Since optical character recognition systems often require very large amounts of training data for optimum performance, it is important to automate the process of finding ground-truth character identities for document images. This is done by finding a transformation that matches a scanned image to the machine-readable document description that was used to print the original. Rather than depending on feature points alone, a more robust procedure follows up the initial match with an optimization algorithm that refines the transformation. The function to optimize can be based on the character bounding boxes; it is not necessary to have access to the actual character shapes used when printing the original.
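The abstract does not specify the transformation model or the optimizer, so what follows is only a minimal sketch of how a refinement step can be driven purely by bounding boxes. It rests on two assumptions. First, it uses a hypothetical per-axis scale-plus-translation model `(sx, sy, tx, ty)`; a real scan may also need rotation or skew. Second, it substitutes an ICP-style alternation (match each box to its nearest image box, then refit by least squares), which is not necessarily the optimizer used in the paper. The names `map_box`, `cost`, `fit_axis`, and `refine` are illustrative. The starting transform stands in for whatever rough estimate the feature-point stage produces.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]        # (x_min, y_min, x_max, y_max)
Transform = Tuple[float, float, float, float]  # hypothetical model: (sx, sy, tx, ty)


def map_box(t: Transform, box: Box) -> Box:
    """Map a ground-truth box into scanned-image coordinates (assumes sx, sy > 0)."""
    sx, sy, tx, ty = t
    x0, y0, x1, y1 = box
    return (sx * x0 + tx, sy * y0 + ty, sx * x1 + tx, sy * y1 + ty)


def box_distance(a: Box, b: Box) -> float:
    """Squared distance between two boxes, treated as 4-vectors of coordinates."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def cost(t: Transform, truth: Sequence[Box], image: Sequence[Box]) -> float:
    """Bounding-box objective: each mapped ground-truth box vs. its nearest image box."""
    return sum(min(box_distance(map_box(t, tb), ib) for ib in image) for tb in truth)


def fit_axis(src: List[float], dst: List[float]) -> Tuple[float, float]:
    """Least-squares fit of dst ~ s * src + t on one axis (src must not be constant)."""
    n = len(src)
    mean_s = sum(src) / n
    mean_d = sum(dst) / n
    var = sum((u - mean_s) ** 2 for u in src)
    cov = sum((u - mean_s) * (v - mean_d) for u, v in zip(src, dst))
    s = cov / var
    return s, mean_d - s * mean_s


def refine(t: Transform, truth: Sequence[Box], image: Sequence[Box],
           iterations: int = 20) -> Transform:
    """ICP-style refinement: pair boxes by nearest neighbour, refit, repeat."""
    for _ in range(iterations):
        xs_src: List[float] = []
        xs_dst: List[float] = []
        ys_src: List[float] = []
        ys_dst: List[float] = []
        for tb in truth:
            mapped = map_box(t, tb)
            ib = min(image, key=lambda b: box_distance(mapped, b))
            xs_src += [tb[0], tb[2]]
            xs_dst += [ib[0], ib[2]]
            ys_src += [tb[1], tb[3]]
            ys_dst += [ib[1], ib[3]]
        sx, tx = fit_axis(xs_src, xs_dst)
        sy, ty = fit_axis(ys_src, ys_dst)
        new_t = (sx, sy, tx, ty)
        if new_t == t:  # correspondences and fit stopped changing
            break
        t = new_t
    return t


# Usage with synthetic data: "scan" the ground-truth boxes with a known transform,
# then recover it starting from the identity as a rough initial estimate.
truth = [(10, 10, 20, 30), (40, 12, 52, 30), (80, 60, 95, 75), (15, 90, 30, 110)]
true_t = (1.02, 0.98, 3.0, -2.0)
image = [map_box(true_t, b) for b in truth]
estimate = refine((1.0, 1.0, 0.0, 0.0), truth, image)
print(estimate)  # approximately (1.02, 0.98, 3.0, -2.0)
```

Each refit can only keep or lower `cost` for the current pairing, and no character shapes are involved, only box corners. Nearest-neighbour pairing assumes the starting estimate is already close enough that most boxes pair correctly, which is why a coarse initial match comes first.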
Received 25 June 1997 / Revised 20 August 1997
Hobby, J. Matching document images with ground truth. IJDAR 1, 52–61 (1998). https://doi.org/10.1007/s100320050006
DOI: https://doi.org/10.1007/s100320050006