A Probabilistic Approach for Integrating Heterogeneous Knowledge Sources

Dutta, Arnab; Meilicke, Christian; Ponzetto, Simone Paolo

doi:10.1007/978-3-319-07443-6_20

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8465)

Included in the following conference series: European Semantic Web Conference

Abstract

Open Information Extraction (OIE) systems like Nell and ReVerb have achieved impressive results by harvesting massive amounts of machine-readable knowledge with minimal supervision. However, the knowledge bases they produce still lack a clean, explicit semantic data model. This, on the other hand, could be provided by full-fledged semantic networks like DBpedia or Yago, which, in turn, could benefit from the additional coverage provided by Web-scale IE. In this paper, we bring these two strains of research together, and present a method to align terms from Nell with instances in DBpedia. Our approach is unsupervised in nature and relies on two key components. First, we automatically acquire probabilistic type information for Nell terms given a set of matching hypotheses. Second, we view the mapping task as the statistical inference problem of finding the most likely coherent mapping – i.e., the maximum a posteriori (MAP) mapping – based on the outcome of the first component used as soft constraint. These two steps are highly intertwined: accordingly, we propose an approach that iteratively refines type acquisition based on the output of the mapping generator, and vice versa. Experimental results on gold-standard data indicate that our approach outperforms a strong baseline, and is able to produce ever-improving mappings consistently across iterations.
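
The alternation described in the abstract, between acquiring type information and selecting the most likely mapping, can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the authors' implementation: the candidate scores, the DBpedia types, the smoothing constant, and all function names are hypothetical, and each term is scored independently, so the per-term argmax stands in for a joint MAP inference step.

```python
from collections import defaultdict

# Hypothetical toy data: (NELL term, NELL category) -> candidate DBpedia
# instances with prior matching scores. None of these numbers come from the paper.
CANDIDATES = {
    ("jaguar", "animal"): {"Jaguar": 0.4, "Jaguar_Cars": 0.6},
    ("lion", "animal"): {"Lion": 0.9, "Lion_(band)": 0.1},
    ("tiger", "animal"): {"Tiger": 0.8, "Tiger_(film)": 0.2},
}

# Hypothetical DBpedia type of each candidate instance.
DBPEDIA_TYPE = {
    "Jaguar": "Mammal", "Jaguar_Cars": "Company",
    "Lion": "Mammal", "Lion_(band)": "Band",
    "Tiger": "Mammal", "Tiger_(film)": "Film",
}


def estimate_type_probs(mapping):
    """Step 1: estimate P(DBpedia type | NELL category) from the current mapping."""
    counts = defaultdict(lambda: defaultdict(int))
    for (_term, category), instance in mapping.items():
        counts[category][DBPEDIA_TYPE[instance]] += 1
    return {
        category: {t: n / sum(type_counts.values()) for t, n in type_counts.items()}
        for category, type_counts in counts.items()
    }


def select_mapping(type_probs, smoothing=0.1):
    """Step 2: per term, pick the candidate maximizing prior * (type probability + smoothing).

    The type probability acts as a soft constraint: unlikely types are
    penalized, not forbidden.
    """
    mapping = {}
    for (term, category), candidates in CANDIDATES.items():
        def score(instance):
            p_type = type_probs.get(category, {}).get(DBPEDIA_TYPE[instance], 0.0)
            return candidates[instance] * (p_type + smoothing)
        mapping[(term, category)] = max(candidates, key=score)
    return mapping


def align(max_iterations=5):
    """Alternate both steps until the mapping stops changing."""
    # Initial hypothesis: the candidate with the highest prior score.
    mapping = {key: max(c, key=c.get) for key, c in CANDIDATES.items()}
    for _ in range(max_iterations):
        new_mapping = select_mapping(estimate_type_probs(mapping))
        if new_mapping == mapping:
            break
        mapping = new_mapping
    return mapping


if __name__ == "__main__":
    for (term, category), instance in align().items():
        print(f"{term} ({category}) -> {instance}")
```

In this toy run, "jaguar" initially maps to the company because of its higher prior score; once the other terms in the same category suggest that the category is dominated by mammals, the next iteration switches it to the animal, and the mapping then stays stable.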

Keywords

#eswc2014Dutta

Author information

Authors and Affiliations

Research Data and Web Science, University of Mannheim, Germany
Arnab Dutta, Christian Meilicke & Simone Paolo Ponzetto

Editor information

Editors and Affiliations

Institute of Cognitive Sciences and Technologies, Semantic Technology Laboratory, ISTC-CNR, Via Nomentana 56, 00161, Rome, Italy
Valentina Presutti
Department of Computer Science, University of Bari, Via Orabona, 4, 70125, Bari, Italy
Claudia d’Amato
Wimmics Research Team at Inria, University of Nice - Sophia Antipolis, Route des Lucioles, BP 93, 06902, Sophia Antipolis, France
Fabien Gandon
Knowledge Media Institute, The Open University, MK7 6AA, Milton Keynes, UK
Mathieu d’Aquin
Institute for Web Science and Technologies, University of Koblenz, Universitätsstraße 1, 56016, Koblenz, Germany
Steffen Staab
Elsevier B.V., Radarweg 29, 1043 NX, Amsterdam, The Netherlands
Anna Tordai

About this paper

Cite this paper

Dutta, A., Meilicke, C., Ponzetto, S.P. (2014). A Probabilistic Approach for Integrating Heterogeneous Knowledge Sources. In: Presutti, V., d’Amato, C., Gandon, F., d’Aquin, M., Staab, S., Tordai, A. (eds) The Semantic Web: Trends and Challenges. ESWC 2014. Lecture Notes in Computer Science, vol 8465. Springer, Cham. https://doi.org/10.1007/978-3-319-07443-6_20

DOI: https://doi.org/10.1007/978-3-319-07443-6_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07442-9
Online ISBN: 978-3-319-07443-6
eBook Packages: Computer Science, Computer Science (R0)
