Deployment of RDFa, Microdata, and Microformats on the Web – A Quantitative Analysis

Bizer, Christian; Eckert, Kai; Meusel, Robert; Mühleisen, Hannes; Schuhmacher, Michael; Völker, Johanna

doi:10.1007/978-3-642-41338-4_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8219)

Included in the following conference series: International Semantic Web Conference

Abstract

More and more websites embed structured data describing, for instance, products, reviews, blog posts, people, organizations, events, and cooking recipes into their HTML pages using markup standards such as Microformats, Microdata, and RDFa. This development has accelerated in the last two years as major Web companies, such as Google, Facebook, Yahoo!, and Microsoft, have started to use the embedded data within their applications. In this paper, we analyze the adoption of RDFa, Microdata, and Microformats across the Web. Our study is based on a large public Web crawl dating from early 2012 and consisting of 3 billion HTML pages which originate from over 40 million websites. The analysis reveals the deployment of the different markup standards, the main topical areas of the published data, and the different vocabularies that are used within each topical area to represent data. What distinguishes our work from earlier studies, published by the large Web companies, is that the analyzed crawl as well as the extracted data are publicly available. This allows our findings to be verified and to be used as starting points for further domain-specific investigations as well as for focused information extraction endeavors.
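To make the notion of "deployment" concrete, the following is a minimal sketch of how one might count, per markup syntax, the number of distinct websites that use it. It is not the extraction pipeline used in the paper: the attribute sets, the short list of Microformats class names, and the function names `detect_markup` and `count_deployment` are all illustrative assumptions, and a real extractor would parse the markup far more rigorously.

```python
from html.parser import HTMLParser

# Attribute names treated as evidence of each syntax.
# Assumption: an illustrative, incomplete selection, not an official list.
MICRODATA_ATTRS = {"itemscope", "itemtype", "itemprop"}
RDFA_ATTRS = {"typeof", "property", "about", "vocab"}
# Assumption: a small subset of classic Microformats root class names.
MICROFORMAT_CLASSES = {"vcard", "vevent", "hreview", "hrecipe", "hentry"}


class MarkupDetector(HTMLParser):
    """Collects which embedded-markup syntaxes appear in one HTML page."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names; valueless attributes
        # such as `itemscope` arrive with a value of None.
        names = {name for name, _ in attrs}
        if names & MICRODATA_ATTRS:
            self.found.add("microdata")
        if names & RDFA_ATTRS:
            self.found.add("rdfa")
        for name, value in attrs:
            if name == "class" and value:
                if set(value.split()) & MICROFORMAT_CLASSES:
                    self.found.add("microformats")


def detect_markup(html):
    """Return the set of syntaxes detected in a single HTML string."""
    detector = MarkupDetector()
    detector.feed(html)
    detector.close()
    return detector.found


def count_deployment(pages):
    """pages: iterable of (site, html) pairs (hypothetical input shape).

    Returns a dict mapping each syntax to the number of distinct sites
    on which it was detected at least once.
    """
    sites_by_syntax = {}
    for site, html in pages:
        for syntax in detect_markup(html):
            sites_by_syntax.setdefault(syntax, set()).add(site)
    return {syntax: len(sites) for syntax, sites in sites_by_syntax.items()}
```

Counting by site rather than by page keeps a single large site with many annotated pages from dominating the totals; which unit is appropriate depends on the question being asked.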

Author information

Authors and Affiliations

Data and Web Science Group, University of Mannheim, Germany
Christian Bizer, Kai Eckert, Robert Meusel, Michael Schuhmacher & Johanna Völker
Centrum Wiskunde & Informatica, Database Architectures Group, The Netherlands
Hannes Mühleisen

Editor information

Editors and Affiliations

Knowledge Media Institute, The Open University, Milton Keynes, UK
Harith Alani
Massachusetts Institute of Technology, Cambridge, MA, USA
Lalana Kagal
IBM Research, Hawthorne, NY, USA
Achille Fokoue
Free University Amsterdam, The Netherlands
Paul Groth
Technical University Darmstadt, Germany
Chris Biemann
Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland
Josiane Xavier Parreira
VU Amsterdam, The Netherlands
Lora Aroyo
Stanford University, CA, USA
Natasha Noy
IBM Research, Yorktown Heights, NY, USA
Chris Welty
University of California, Santa Barbara, CA, USA
Krzysztof Janowicz

About this paper

Cite this paper

Bizer, C., Eckert, K., Meusel, R., Mühleisen, H., Schuhmacher, M., Völker, J. (2013). Deployment of RDFa, Microdata, and Microformats on the Web – A Quantitative Analysis. In: Alani, H., et al. The Semantic Web – ISWC 2013. ISWC 2013. Lecture Notes in Computer Science, vol 8219. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41338-4_2

DOI: https://doi.org/10.1007/978-3-642-41338-4_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41337-7
Online ISBN: 978-3-642-41338-4
eBook Packages: Computer Science, Computer Science (R0)
