Abstract
We present a general framework for information extraction from web pages based on a special wrapper language called token-templates. By using token-templates in conjunction with logic programs, we are able to reason about web page contents, to search for and collect facts, and to derive new facts from multiple web pages. We give a formal definition of the semantics of logic programs extended by token-templates and define a general answer-complete calculus for these extended programs. These methods and techniques are used to build intelligent mediators and web information systems.
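The sketch below illustrates the general pattern in Python. A template made of HTML tag tokens and variables is matched against a page's token stream. Each match becomes a fact, and a logic-program-style rule joins facts extracted from two different pages. Treat every detail as an illustrative assumption rather than the paper's method. The tokenizer, the `Var` template syntax, the sample pages, the predicate names `works_in`, `dept_phone` and `contact`, and all helper functions are hypothetical. The rule is also evaluated by a naive bottom-up join, not by the answer-complete calculus the paper defines.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterator, List, Set, Tuple, Union

# Hypothetical tokenizer: a token is either a normalized tag ("<td>", "</td>")
# with attributes dropped, or a whitespace-stripped run of text.
TOKEN_RE = re.compile(r"<\s*(/?)\s*([a-zA-Z0-9]+)[^>]*>|([^<]+)")

def tokenize(html: str) -> List[str]:
    tokens: List[str] = []
    for m in TOKEN_RE.finditer(html):
        if m.group(2):
            tokens.append(f"<{m.group(1)}{m.group(2).lower()}>")
        else:
            text = m.group(3).strip()
            if text:
                tokens.append(text)
    return tokens

@dataclass(frozen=True)
class Var:
    """Hypothetical template variable: binds to exactly one text token."""
    name: str

Template = List[Union[str, Var]]

def match_template(template: Template, tokens: List[str]) -> Iterator[Dict[str, str]]:
    """Yield one binding for every contiguous token run the template matches.
    Literals must equal the token; a Var binds to a single non-tag token,
    and a repeated Var must bind to the same text each time."""
    n = len(template)
    for start in range(len(tokens) - n + 1):
        binding: Dict[str, str] = {}
        for item, tok in zip(template, tokens[start:start + n]):
            if isinstance(item, Var):
                if tok.startswith("<") or binding.get(item.name, tok) != tok:
                    break
                binding[item.name] = tok
            elif item != tok:
                break
        else:
            yield binding

def extract(pred: str, template: Template, args: List[str], html: str) -> Set[Tuple[str, ...]]:
    """Turn each template match into a ground fact pred(args...)."""
    return {(pred, *(b[a] for a in args)) for b in match_template(template, tokenize(html))}

def derive_contact(facts: Set[Tuple[str, ...]]) -> Set[Tuple[str, ...]]:
    """Hypothetical rule: contact(P, T) :- works_in(P, D), dept_phone(D, T)."""
    works = [f for f in facts if f[0] == "works_in"]
    phones = [f for f in facts if f[0] == "dept_phone"]
    return {("contact", p, t) for (_, p, d) in works for (_, d2, t) in phones if d == d2}

# Two made-up pages that each hold half of the needed information.
STAFF_PAGE = "<table><tr><td>Alice</td><td>AI</td></tr><tr><td>Bob</td><td>DB</td></tr></table>"
DEPT_PAGE = "<ul><li><b>AI</b> 555-0101</li><li><b>DB</b> 555-0202</li></ul>"
ROW_TEMPLATE: Template = ["<tr>", "<td>", Var("P"), "</td>", "<td>", Var("D"), "</td>", "</tr>"]
DEPT_TEMPLATE: Template = ["<li>", "<b>", Var("D"), "</b>", Var("T")]

if __name__ == "__main__":
    facts = extract("works_in", ROW_TEMPLATE, ["P", "D"], STAFF_PAGE) | extract(
        "dept_phone", DEPT_TEMPLATE, ["D", "T"], DEPT_PAGE
    )
    print(derive_contact(facts))
```

Running the script prints the two derived `contact` facts (set order may vary). Neither page alone contains a person's phone number; the rule derives it by combining facts extracted from both. A goal-directed system could instead invoke the template matcher only when a rule needs a particular fact, rather than extracting everything up front as this sketch does.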
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Thomas, B. (1999). Logic programs for intelligent web search. In: Raś, Z.W., Skowron, A. (eds) Foundations of Intelligent Systems. ISMIS 1999. Lecture Notes in Computer Science, vol 1609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0095104
DOI: https://doi.org/10.1007/BFb0095104
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65965-5
Online ISBN: 978-3-540-48828-6
eBook Packages: Springer Book Archive