Abstract
In this paper, we want to show which difficulties arise when automatically constructing a domain-independent knowledge base from the web. We show possible applications for such a knowledge base to emphasize its importance. Current knowledge bases often use manually-built patterns for extraction and quality assurance which does not scale well. Our contribution to the community will be a technique to automatically assess extracted information to ensure high quality of the information and a method of how the knowledge base can be kept up to date. The research builds upon the existing WebKnox system for Web Knowledge Extraction which is able to extract named entities and facts from the web. This is a position paper.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.G.: DBpedia: A nucleus for a web of open data. In: Aberer, K., Choi, K.-S., Noy, N., Allemang, D., Lee, K.-I., Nixon, L.J.B., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P. (eds.) ASWC 2007 and ISWC 2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007)
Banko, M., Cafarella, M.J., Soderland, S., Broadhead, M., Etzioni, O.: Open Information Extraction from the Web. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 2670–2676 (2007)
Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American 284(5), 28–37 (2001)
Downey, D., Etzioni, O., Soderland, S.: A Probabilistic Model of Redundancy in Information Extraction. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence, pp. 1034–1041. Professional Book Center (2005)
Etzioni, O., Cafarella, M., Downey, D., Popescu, A.-M., Shaked, T., Soderland, S., Weld, D.S., Yates, A.: Unsupervised named-entity extraction from the Web: An experimental study. Artificial Intelligence 165(1), 91–134 (2005)
Kasneci, G., Ramanath, M., Suchanek, F.M., Weikum, G.: The YAGO-NAGA approach to knowledge discovery. SIGMOD Record 37(4), 41–47 (2008)
Urbansky, D., Feldmann, M., Thom, J.A., Schill, A.: Entity Extraction from the Web withWebKnox. In: Proceedings of the Sixth Atlantic Web Intelligence Conference (to appear, 2009)
Urbansky, D., Thom, J.A., Feldmann, M.: WebKnox: Web Knowledge Extraction. In: Proceedings of the Thirteenth Australasian Document Computing Symposium, pp. 27–34 (2008)
Wang, R.C., Cohen, W.W.: Language-Independent Set Expansion of Named Entities Using the Web. In: The 2007 IEEE International Conference on Data Mining, pp. 342–350 (2007)
Wu, M., Marian, A.: Corroborating Answers from Multiple Web Sources. In: Proceedings of the 10th International Workshop on Web and Databases (WebDB 2007) (2007)
Zhao, S., Betz, J.: Corroborate and Learn Facts from the Web. In: KDD 2007: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge discovery and data mining, pp. 995–1003. ACM, New York (2007)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Urbansky, D. (2009). Automatic Construction of a Semantic, Domain-Independent Knowledge Base. In: Meersman, R., Herrero, P., Dillon, T. (eds) On the Move to Meaningful Internet Systems: OTM 2009 Workshops. OTM 2009. Lecture Notes in Computer Science, vol 5872. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05290-3_96
Download citation
DOI: https://doi.org/10.1007/978-3-642-05290-3_96
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05289-7
Online ISBN: 978-3-642-05290-3
eBook Packages: Computer ScienceComputer Science (R0)