Abstract
The processing of scanned documents calls for automatic text recognition by OCR (Optical Character Recognition) programs, followed by human validation and correction. Crowdsourcing these essential manual tasks is a good option, provided that several key challenges are addressed so that the quality level expected by the customer is met. We show how tools for efficient validation and correction are adapted and enhanced to address issues associated with crowdsourcing, such as data privacy, quality control, crowd monitoring, and job quality assurance. We have begun implementing these ideas and technologies in our COoperative eNgine for Correction of ExtRacted Text (CONCERT), which is used in book digitization projects.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Karnin, E.D., Walach, E., Drory, T. (2010). Crowdsourcing in the Document Processing Practice. In: Daniel, F., Facca, F.M. (eds) Current Trends in Web Engineering. ICWE 2010. Lecture Notes in Computer Science, vol 6385. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16985-4_36
DOI: https://doi.org/10.1007/978-3-642-16985-4_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16984-7
Online ISBN: 978-3-642-16985-4
eBook Packages: Computer Science; Computer Science (R0)