Abstract
Ground truthing tools fall mainly into two categories: automatic and semi-automatic. In this paper, we first discuss the pros and cons of the two approaches. We then report our own work on designing and implementing systems that generate a chart image dataset together with multi-level ground truth data. We adopted both the semi-automatic and the automatic approach, resulting in two independent systems. The dataset and the ground truth data are publicly available so that other researchers can use them to evaluate and compare the performance of different systems.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, W., Tan, C.L., Zhao, J. (2008). Generating Ground Truthed Dataset of Chart Images: Automatic or Semi-automatic?. In: Liu, W., Lladós, J., Ogier, JM. (eds) Graphics Recognition. Recent Advances and New Opportunities. GREC 2007. Lecture Notes in Computer Science, vol 5046. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88188-9_25
DOI: https://doi.org/10.1007/978-3-540-88188-9_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88184-1
Online ISBN: 978-3-540-88188-9
eBook Packages: Computer Science (R0)