The Wikipedia XML Corpus

Denoyer, Ludovic; Gallinari, Patrick

doi:10.1007/978-3-540-73888-6_2

Ludovic Denoyer¹ &
Patrick Gallinari¹

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4518)

Included in the following conference series:

International Workshop of the Initiative for the Evaluation of XML Retrieval

Abstract

This article presents the general Wikipedia XML Collection, developed for Structured Information Retrieval and Structured Machine Learning. The collection has been built from the Wikipedia encyclopedia. We detail in particular which parts of the collection were used during INEX 2006 for the Ad-hoc track and for the XML Mining track. Note that other INEX tracks, such as the multimedia track, have also been based on this collection.

Author information

Authors and Affiliations

Laboratoire d’Informatique de Paris 6, 8 rue du capitaine Scott, 75015 Paris,
Ludovic Denoyer & Patrick Gallinari

Editor information

Norbert Fuhr, Mounia Lalmas, Andrew Trotman

Cite this paper

Denoyer, L., Gallinari, P. (2007). The Wikipedia XML Corpus. In: Fuhr, N., Lalmas, M., Trotman, A. (eds) Comparative Evaluation of XML Information Retrieval Systems. INEX 2006. Lecture Notes in Computer Science, vol 4518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73888-6_2

DOI: https://doi.org/10.1007/978-3-540-73888-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73887-9
Online ISBN: 978-3-540-73888-6
eBook Packages: Computer Science, Computer Science (R0)
