Large-Scale Learning of Relation-Extraction Rules with Distant Supervision from the Web

Krause, Sebastian; Li, Hong; Uszkoreit, Hans; Xu, Feiyu

doi:10.1007/978-3-642-35176-1_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7649)

Included in the following conference series:

International Semantic Web Conference

Abstract

We present a large-scale relation extraction (RE) system which learns grammar-based RE rules from the Web by utilizing large numbers of relation instances as seed. Our goal is to obtain rule sets large enough to cover the actual range of linguistic variation, thus tackling the long-tail problem of real-world applications. A variant of distant supervision learns several relations in parallel, enabling a new method of rule filtering. The system detects both binary and n-ary relations. We target 39 relations from Freebase, for which 3M sentences extracted from 20M web pages serve as the basis for learning an average of 40K distinctive rules per relation. Employing an efficient dependency parser, the average run time for each relation is only 19 hours. We compare these rules with ones learned from local corpora of different sizes and demonstrate that the Web is indeed needed for a good coverage of linguistic variation.
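
The idea of seeding rule learning with known relation instances, and of filtering rules by comparing several relations learned in parallel, can be illustrated with a toy sketch. This is not the system described in the abstract: the seed facts, the sentences, the string-based patterns (a stand-in for grammar-based rules over dependency parses), the function names, and the `min_share` threshold are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical seed instances: relation name -> set of argument tuples.
SEEDS = {
    "marriage": {("Barack Obama", "Michelle Obama")},
    "parent_child": {("Barack Obama", "Malia Obama")},
}

# Hypothetical sentences standing in for text retrieved from the Web.
SENTENCES = [
    "Barack Obama married Michelle Obama in 1992.",
    "Barack Obama and Michelle Obama attended the gala.",
    "Barack Obama and Malia Obama attended the gala.",
    "Malia Obama is the daughter of Barack Obama.",
]


def candidate_pattern(sentence, args):
    """Replace each seed argument with a slot marker (toy stand-in for a rule)."""
    pattern = sentence
    for i, arg in enumerate(args):
        pattern = pattern.replace(arg, f"<ARG{i}>")
    return pattern


def learn_patterns(seeds, sentences):
    """Distant supervision: a sentence mentioning all arguments of a seed
    instance is assumed to express that instance's relation."""
    counts = defaultdict(Counter)
    for relation, instances in seeds.items():
        for args in instances:
            for sentence in sentences:
                if all(arg in sentence for arg in args):
                    counts[relation][candidate_pattern(sentence, args)] += 1
    return counts


def filter_patterns(counts, min_share=0.6):
    """Keep a pattern for a relation only if that relation accounts for at
    least `min_share` of the pattern's occurrences across all relations."""
    kept = defaultdict(set)
    for relation, patterns in counts.items():
        for pattern, n in patterns.items():
            total = sum(c[pattern] for c in counts.values())
            if n / total >= min_share:
                kept[relation].add(pattern)
    return kept


counts = learn_patterns(SEEDS, SENTENCES)
rules = filter_patterns(counts)
for relation in sorted(rules):
    print(relation, sorted(rules[relation]))
```

In this toy data the pattern `<ARG0> and <ARG1> attended the gala.` is learned for both relations, so neither relation reaches the assumed 0.6 share and the pattern is dropped, while the relation-specific patterns survive. Learning the relations side by side is what makes this kind of comparison possible.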

Author information

Authors and Affiliations

Language Technology Lab, DFKI, Alt-Moabit 91c, Berlin, Germany
Sebastian Krause, Hong Li, Hans Uszkoreit & Feiyu Xu

Editor information

Editors and Affiliations

University of Fribourg, Switzerland
Philippe Cudré-Mauroux
Lehigh University, 18015, Bethlehem, PA, USA
Jeff Heflin
Clark & Parsia, 20001, Washington, DC, USA
Evren Sirin
Stanford University, CA, USA
Tania Tudorache
INRIA & LIG, Le Cesnay Cedex, France
Jérôme Euzenat
National University of Ireland, DERI, Galway, Ireland
Manfred Hauswirth & Josiane Xavier Parreira
Rensselaer Polytechnic Institute (RPI), Troy, NY, USA
Jim Hendler
VU University Amsterdam, The Netherlands
Guus Schreiber
University of Zurich, Switzerland
Abraham Bernstein
Linköping University, Sweden
Eva Blomqvist

About this paper

Cite this paper

Krause, S., Li, H., Uszkoreit, H., Xu, F. (2012). Large-Scale Learning of Relation-Extraction Rules with Distant Supervision from the Web. In: Cudré-Mauroux, P., et al. The Semantic Web – ISWC 2012. ISWC 2012. Lecture Notes in Computer Science, vol 7649. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35176-1_17

DOI: https://doi.org/10.1007/978-3-642-35176-1_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35175-4
Online ISBN: 978-3-642-35176-1
eBook Packages: Computer Science (R0)
