Abstract
Combining data from different sources for further automatic processing is often hindered by differences in the underlying semantics and representation. Therefore, when linking information presented in tabular form in documents with data held in databases, it is important to determine as much information as possible about the table and its content. Important information about the table data is often given in the text surrounding the table in the document. The table’s creators cannot clarify all the semantics in the table itself, so they use the table context, or the text around it, to give further information. These semantics are very useful when integrating and using the data, but they are often difficult to detect automatically. We propose a solution to part of this problem based on a domain ontology. The input to our system is a document that contains tabular data, and the system aims to find semantics in the document that are related to the tabular data. The output of our system is a set of detected semantics linked to the corresponding table. The system uses elements of semantic detection, semantic representation, and data integration. In this paper, we discuss the experiment used to evaluate the prototype system and the different types of test that the experiment performs. After running the system on the test data and gathering the results of these tests, we present the significant results of our experiment.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alrashed, S.A. (2006). Finding Hidden Semantics of Text Tables. In: Bunke, H., Spitz, A.L. (eds) Document Analysis Systems VII. DAS 2006. Lecture Notes in Computer Science, vol 3872. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11669487_40
DOI: https://doi.org/10.1007/11669487_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32140-8
Online ISBN: 978-3-540-32157-6
eBook Packages: Computer Science, Computer Science (R0)