Abstract
Data charts can compress large amounts of complex information and convey it efficiently and succinctly. A variety of automated software systems now make data charts easier to create. These charts are routinely inserted in text documents and widely disseminated over many different media. This study addresses the problem of assessing the quality of data charts in mixed-mode documents. The quality of the graphics can be used to assist the document development process and to serve as an additional criterion for search engines such as Google and Yahoo. The quality measures are motivated by principles of visual learning, are based on research in educational psychology and cognitive theories, and use attributes of both the graphic and its textual context. We have implemented the approach and evaluated its effectiveness using a set of documents compiled from the Web. Results of a human study show that the proposed quality measures correlate highly with users' quality ratings for each of the five classes of data charts studied in this research.
Cite this article
Shukla, S., Samal, A. Recognition and quality assessment of data charts in mixed-mode documents. IJDAR 11, 111–126 (2008). https://doi.org/10.1007/s10032-008-0065-5
DOI: https://doi.org/10.1007/s10032-008-0065-5