Main Core Retention on Graph-of-Words for Single-Document Keyword Extraction

Rousseau, François; Vazirgiannis, Michalis

doi:10.1007/978-3-319-16354-3_42

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9022)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

In this paper, we apply the concept of k-core to the graph-of-words representation of text for single-document keyword extraction, retaining only the nodes of the main core as representative terms. By selecting more cohesive subsets of nodes, this approach accounts better for the proximity between keywords and for variability in the number of extracted keywords than existing graph-based approaches based solely on centrality. Experiments on two standard datasets show statistically significant improvements in F1-score and in the area under the precision/recall curve (AUC) over baseline results, in particular when the edges of the graph are weighted by the number of co-occurrences. To the best of our knowledge, this is the first application of graph degeneracy to natural language processing and information retrieval.
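
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the window size, the absence of any preprocessing (stop-word removal, stemming), the undirected edges, and the function names `build_graph_of_words` and `main_core` are all assumptions made for illustration. The sketch builds a co-occurrence graph from a token list, optionally weights edges by the number of co-occurrences, and keeps the nodes of the main core, found by repeatedly peeling the node of minimum degree.

```python
from collections import defaultdict


def build_graph_of_words(tokens, window=4):
    """Undirected co-occurrence graph (hypothetical helper).

    Two distinct terms are linked if they appear within `window`
    consecutive tokens; the edge weight counts how often that happens.
    The window size of 4 is an assumed default, not taken from the text.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for i, term in enumerate(tokens):
        graph[term]  # register the node even if it ends up isolated
        for other in tokens[i + 1:i + window]:
            if other != term:
                graph[term][other] += 1
                graph[other][term] += 1
    return graph


def main_core(graph, weighted=True):
    """Return (k, nodes) for the main core of the graph.

    Nodes are peeled in order of increasing (weighted) degree; the core
    number of a node is the largest minimum degree seen up to its removal.
    The main core is the set of nodes with the highest core number.
    """
    if not graph:
        return 0, set()
    degree = {
        u: sum(nbrs.values()) if weighted else len(nbrs)
        for u, nbrs in graph.items()
    }
    remaining = set(graph)
    core = {}
    k = 0
    while remaining:
        # Ties are broken alphabetically only to keep the run deterministic.
        u = min(remaining, key=lambda n: (degree[n], n))
        k = max(k, degree[u])
        core[u] = k
        remaining.remove(u)
        for v, w in graph[u].items():
            if v in remaining:
                degree[v] -= w if weighted else 1
    k_max = max(core.values())
    return k_max, {u for u, c in core.items() if c == k_max}


if __name__ == "__main__":
    tokens = "x y z x y z w".split()
    g = build_graph_of_words(tokens, window=2)
    k, keywords = main_core(g, weighted=True)
    print(k, sorted(keywords))
```

Because the main core is whatever survives the peeling, the number of retained terms is not fixed in advance and depends on the structure of each document's graph. The quadratic-time peeling loop is chosen here for readability; a bucket-based ordering would be the usual choice for larger graphs.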

Author information

Authors and Affiliations

LIX, École Polytechnique, France
François Rousseau & Michalis Vazirgiannis

Editor information

Editors and Affiliations

Vienna University of Technology, Institute of Software Technology and Interactive Systems, Favoritenstraße 9-11/188, 1040, Vienna, Austria
Allan Hanbury
Lumi, Semion Ltd., 111 Charterhouse Street, EC1M 6AW, London, UK
Gabriella Kazai
Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstraße 9-11/188, 1040, Vienna, Austria
Andreas Rauber
Universität Duisburg-Essen, Lotharstraße 65, 47057, Duisburg, Germany
Norbert Fuhr

Cite this paper

Rousseau, F., Vazirgiannis, M. (2015). Main Core Retention on Graph-of-Words for Single-Document Keyword Extraction. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds) Advances in Information Retrieval. ECIR 2015. Lecture Notes in Computer Science, vol 9022. Springer, Cham. https://doi.org/10.1007/978-3-319-16354-3_42

DOI: https://doi.org/10.1007/978-3-319-16354-3_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16353-6
Online ISBN: 978-3-319-16354-3
eBook Packages: Computer Science, Computer Science (R0)
