A Systematic Investigation of Explicit and Implicit Schema Information on the Linked Open Data Cloud

Gottron, Thomas; Knauf, Malte; Scheglmann, Stefan; Scherp, Ansgar

doi:10.1007/978-3-642-38288-8_16


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7882)

Included in the following conference series:

Extended Semantic Web Conference


Abstract

Schema information about resources in the Linked Open Data (LOD) cloud can be provided in two ways: explicitly, by attaching RDF types to the resources, or implicitly, via the properties used to describe the resources. In this paper, we present a method and metrics to analyse the information-theoretic properties of, and the correlation between, these two manifestations of schema information. Furthermore, we perform such an analysis on large-scale linked data sets. To this end, we extracted schema information regarding the types and properties defined in the data set segments provided for the Billion Triples Challenge 2012. We conducted an in-depth analysis and computed various entropy measures as well as the mutual information encoded in the two types of schema information. Our analysis provides insights into the information encoded in the different schema characteristics. Two major findings are that implicit schema information is far more discriminative, and that applications relying on schema information based on either types or properties alone will capture only between 63.5% and 88.1% of the schema information contained in the data. Based on these observations, we derive conclusions about the design of future schemas for LOD as well as potential application scenarios.
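The entropy and mutual-information measures mentioned in the abstract can be illustrated with a small sketch. It assumes, as a simplification not stated in the abstract, that each resource is summarised by the set of its RDF types (T) and the set of its properties (P), and that probabilities are estimated as relative frequencies over resources. The measures themselves are the standard information-theoretic definitions: H(X) = −Σ p(x) log₂ p(x), H(T|P) = H(T,P) − H(P), and I(T;P) = H(T) + H(P) − H(T,P). This is not the authors' implementation; the helper names (`entropy`, `schema_information`) and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical representation: one (type set, property set) pair per resource.
# frozensets are hashable, so whole sets can be counted as outcomes.
Resource = tuple[frozenset[str], frozenset[str]]


def entropy(counts: Counter) -> float:
    """Shannon entropy (in bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def schema_information(resources: list[Resource]) -> dict[str, float]:
    """Entropy, conditional entropy and mutual information of type sets vs. property sets."""
    types = Counter(t for t, _ in resources)   # marginal distribution of type sets
    props = Counter(p for _, p in resources)   # marginal distribution of property sets
    joint = Counter(resources)                 # joint distribution of (type set, property set)
    h_t, h_p, h_tp = entropy(types), entropy(props), entropy(joint)
    return {
        "H(T)": h_t,
        "H(P)": h_p,
        "H(T,P)": h_tp,
        "H(T|P)": h_tp - h_p,   # uncertainty about types left once properties are known
        "H(P|T)": h_tp - h_t,   # uncertainty about properties left once types are known
        "I(T;P)": h_t + h_p - h_tp,
    }


# Hypothetical toy data (not taken from the Billion Triples Challenge 2012 data).
PERSON = frozenset({"foaf:Person"})
DOCUMENT = frozenset({"foaf:Document"})
data = [
    (PERSON, frozenset({"foaf:name"})),
    (PERSON, frozenset({"foaf:name", "foaf:mbox"})),
    (DOCUMENT, frozenset({"dc:title"})),
    (DOCUMENT, frozenset({"dc:title"})),
]

if __name__ == "__main__":
    for name, value in schema_information(data).items():
        print(f"{name} = {value:.3f} bits")
```

In this toy data the type set is fully determined by the property set (H(T|P) = 0 bits), while the property set is not fully determined by the type set (H(P|T) = 0.5 bits). The example only illustrates how the measures are computed; it does not reproduce any result from the paper.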

Similar content being viewed by others

Adoption of the Linked Data Best Practices in Different Topical Domains

Quality Metrics for Linked Open Data

Linked Data Finland: A 7-star Model and Platform for Publishing and Re-using Linked Datasets



Author information

Authors and Affiliations

WeST – Institute for Web Science and Technologies, University of Koblenz-Landau, 56070, Koblenz, Germany
Thomas Gottron, Malte Knauf, Stefan Scheglmann & Ansgar Scherp


Editor information

Editors and Affiliations

CITEC, University of Bielefeld, 33615, Bielefeld, Germany
Philipp Cimiano
Universidad Politécnica de Madrid, 28660, Boadilla del Monte, Spain
Oscar Corcho
National Research Council, 00136, Rome, Italy
Valentina Presutti
Vrije Universiteit Amsterdam, 1081 HV, Amsterdam, The Netherlands
Laura Hollink
Technische Universität Dresden, 01069, Dresden, Germany
Sebastian Rudolph


About this paper

Cite this paper

Gottron, T., Knauf, M., Scheglmann, S., Scherp, A. (2013). A Systematic Investigation of Explicit and Implicit Schema Information on the Linked Open Data Cloud. In: Cimiano, P., Corcho, O., Presutti, V., Hollink, L., Rudolph, S. (eds) The Semantic Web: Semantics and Big Data. ESWC 2013. Lecture Notes in Computer Science, vol 7882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38288-8_16


DOI: https://doi.org/10.1007/978-3-642-38288-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38287-1
Online ISBN: 978-3-642-38288-8
eBook Packages: Computer Science, Computer Science (R0)
