An Approach for Deriving Semantically Related Category Hierarchies from Wikipedia Category Graphs

Hejazy, Khaled A.; El-Beltagy, Samhaa R.

doi:10.1007/978-3-642-36981-0_8

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 206)

Abstract

Wikipedia is the largest online encyclopedia known to date. Its rich content and semi-structured nature have made it a valuable research tool for classification, information extraction, and semantic annotation, among other tasks. Many applications can benefit from a topic hierarchy in Wikipedia. However, what Wikipedia currently offers is a category graph built from hierarchical category links whose semantics are undefined. Because of this lack of semantics, a sub-category in Wikipedia does not necessarily comply with the concept of a sub-category in a hierarchy; all the link signifies is that some sort of relationship exists between the parent category and its sub-category. As a result, traversing the category links of a given category often produces surprising results. For example, following the sub-category links down from the category “Computing” eventually reaches the totally unrelated category “Theology”. In this paper, we introduce a novel algorithm that, by measuring the semantic relatedness between a given Wikipedia category and the nodes in its sub-graph, extracts a category hierarchy containing only nodes that are relevant to the parent category. The algorithm has been evaluated by comparing its output with a gold standard data set. The experimental setup and results are presented.
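
The general idea in the abstract, pruning a category sub-graph by relatedness to the root category, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the relatedness measure, so the Jaccard overlap of term sets used here is a stand-in assumption, as are the function names, the threshold value, and the toy graph (including the hypothetical intermediate links between “Computing” and “Theology”).

```python
from collections import deque


def jaccard(a, b):
    """Placeholder relatedness measure (an assumption, not the paper's):
    overlap between two sets of descriptive terms, in [0, 1]."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def extract_hierarchy(root, subcats, terms, threshold=0.2, relatedness=jaccard):
    """Breadth-first walk from `root` over sub-category links.

    A sub-category is kept only if its relatedness to the root meets
    `threshold`; rejected nodes are not expanded further.
    Returns a dict mapping each kept category to its kept children.
    """
    hierarchy = {root: []}
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in subcats.get(parent, []):
            if child in hierarchy:
                # Already placed; this also guards against cycles in the graph.
                continue
            score = relatedness(terms.get(root, ()), terms.get(child, ()))
            if score >= threshold:
                hierarchy[child] = []
                hierarchy[parent].append(child)
                queue.append(child)
    return hierarchy


# Toy data, invented for illustration only.
subcats = {
    "Computing": ["Software", "Information"],
    "Software": ["Computing"],          # a cycle, to exercise the guard
    "Information": ["Belief"],
    "Belief": ["Theology"],
}
terms = {
    "Computing": {"computer", "software", "data", "algorithm"},
    "Software": {"software", "program", "computer", "code"},
    "Information": {"data", "information", "computer", "knowledge"},
    "Belief": {"belief", "knowledge", "faith"},
    "Theology": {"religion", "faith", "god"},
}

result = extract_hierarchy("Computing", subcats, terms)
print(result)
```

In this sketch each node is compared with the root rather than with its immediate parent, so relevance cannot drift step by step away from the starting topic. Whether the paper compares against the root, the parent, or both is not stated in the abstract.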

Author information

Authors and Affiliations

Center for Informatics Science, Nile University, Cairo, Egypt
Khaled A. Hejazy & Samhaa R. El-Beltagy

Corresponding author

Correspondence to Khaled A. Hejazy.

Editor information

Editors and Affiliations

Faculty of Science and Technology, University Fernando Pessoa, Praça 9 de Abril 349, Porto, 4249-004, Portugal
Álvaro Rocha
Instituto Superior de Estatística e, Universidade Nova de Lisboa, Campus de Campolide, Lisboa, 1070-312, Portugal
Ana Maria Correia
Broomfield Road 9, Sheffield, S10 2SE, United Kingdom
Tom Wilson
Empirica GmbH, Oxfordstr. 2, Bonn, 53111, Germany
Karl A. Stroetmann

About this paper

Cite this paper

Hejazy, K.A., El-Beltagy, S.R. (2013). An Approach for Deriving Semantically Related Category Hierarchies from Wikipedia Category Graphs. In: Rocha, Á., Correia, A., Wilson, T., Stroetmann, K. (eds) Advances in Information Systems and Technologies. Advances in Intelligent Systems and Computing, vol 206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36981-0_8

DOI: https://doi.org/10.1007/978-3-642-36981-0_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36980-3
Online ISBN: 978-3-642-36981-0
eBook Packages: Engineering, Engineering (R0)
