Impact of Context on Keyword Identification and Use in Biomedical Literature Mining

Dasigi, Venu G.; Karam, Orlando; Pydimarri, Sailaja

doi:10.1007/978-3-030-02686-8_38

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 880)

Included in the following conference series:

Proceedings of the Future Technologies Conference

Abstract

This chapter reviews the use of two statistical metrics for automatically identifying important keywords associated with a concept, such as a gene, by mining the scientific literature. Starting with a subset of MEDLINE® abstracts that contain the name or a synonym of a gene in their titles, these metrics contrast the prevalence of specific words in those documents against a broader “background set” of abstracts. If a word occurs substantially more often in the gene's document subset than in the background set, which serves as the reference, the word is viewed as capturing some specific attribute of the gene.
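To make the contrast concrete, here is a minimal Python sketch of one way to score the words in a gene's document subset against a background set. The preview does not say which two metrics the chapter uses, so the binomial z-score, the add-one smoothing, the crude regular-expression tokenizer, the `min_count` cut-off, and the function names `term_counts` and `keyword_z_scores` are all illustrative assumptions; the documents are made-up toy text for a hypothetical gene "geneX".

```python
import math
import re
from collections import Counter


def term_counts(docs):
    """Collection frequency of each term: total occurrences across all docs."""
    counts = Counter()
    for doc in docs:
        # Crude tokenizer (an assumption): lowercase runs of letters only.
        counts.update(re.findall(r"[a-z]+", doc.lower()))
    return counts


def keyword_z_scores(gene_docs, background_docs, min_count=2):
    """Rank words in the gene subset by a binomial z-score against the background.

    Assumed metric: if the background assigns a word probability p, its count
    in the n_gene tokens of the gene subset is modelled as Binomial(n_gene, p),
    and the z-score measures how far the observed count exceeds n_gene * p.
    """
    gene = term_counts(gene_docs)
    background = term_counts(background_docs)
    if not background:
        raise ValueError("background set must contain at least one token")
    n_gene = sum(gene.values())
    n_bg = sum(background.values())
    scores = {}
    for word, observed in gene.items():
        if observed < min_count:
            continue  # too rare in the gene subset to judge
        # Add-one smoothing keeps p > 0 for words absent from the background,
        # and p < 1 because the denominator exceeds any possible numerator.
        p = (background[word] + 1) / (n_bg + len(background) + 1)
        expected = n_gene * p
        scores[word] = (observed - expected) / math.sqrt(n_gene * p * (1 - p))
    # Highest z-score first; equal scores keep their first-seen order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy abstracts for a hypothetical gene "geneX" and a small background set.
gene_docs = [
    "geneX encodes a glutamate receptor subunit",
    "the glutamate receptor encoded by geneX is edited",
]
background_docs = [
    "the receptor is expressed in the liver",
    "a receptor and the kinase regulate the cell",
]
ranked = keyword_z_scores(gene_docs, background_docs)
print([word for word, _ in ranked])  # ['genex', 'glutamate', 'receptor']
```

The gene name and "glutamate" rank highest because they never occur in the background, while "receptor" still scores positively but much lower because the background already uses it.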

The keywords identified in this way may be used as gene features in clustering algorithms. Because the background set is the reference against which keyword prevalence is contrasted, the authors hypothesize that different background document sets can lead to somewhat different sets of keywords being identified as specific to a gene. Two background sets are discussed that serve somewhat different purposes: characterizing the function of a gene, and clustering a set of genes based on their shared functional similarities. Experimental results that reveal the significance of the choice of background set are presented.
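The hypothesis that the background set shapes which keywords surface can be illustrated on toy data. The sketch below assumes a simple smoothed relative-frequency ratio as the score (a stand-in, not necessarily either of the chapter's metrics); the function names `word_counts` and `top_keywords` and all of the documents are hypothetical. The same gene documents are scored once against a general background and once against a neuroscience background.

```python
import re
from collections import Counter


def word_counts(docs):
    """Return (term counts, total token count) for a list of documents."""
    counts = Counter(w for doc in docs for w in re.findall(r"[a-z]+", doc.lower()))
    return counts, sum(counts.values())


def top_keywords(gene_docs, background_docs, k=3):
    """Top-k gene-set words by smoothed relative-frequency ratio (assumed score)."""
    gene, n_gene = word_counts(gene_docs)
    bg, n_bg = word_counts(background_docs)
    vocab = len(set(gene) | set(bg))

    def ratio(word):
        # Add-one smoothing on both sides, so words absent from the background still score.
        return ((gene[word] + 1) / (n_gene + vocab)) / ((bg[word] + 1) / (n_bg + vocab))

    # Highest ratio first; ties broken alphabetically so the output is deterministic.
    return sorted(gene, key=lambda w: (-ratio(w), w))[:k]


# Hypothetical toy abstracts: one gene subset, two candidate background sets.
gene_docs = [
    "neuron glutamate receptor in the neuron",
    "glutamate signaling at the neuron synapse",
]
general_background = [
    "the liver cell in the blood",
    "the kidney cell at rest",
]
neuro_background = [
    "the neuron synapse in the brain",
    "a neuron fires at the synapse",
]
print(top_keywords(gene_docs, general_background))  # ['neuron', 'glutamate', 'receptor']
print(top_keywords(gene_docs, neuro_background))    # ['glutamate', 'receptor', 'signaling']
```

Against the general background, the broadly neuronal word "neuron" ranks first; against the neuroscience background it is no longer distinctive, and more specific terms such as "signaling" move up. This only illustrates the mechanism on invented data and is not a result from the chapter.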

Notes

1.
This is also sometimes called the collection frequency of the term; it counts the total number of occurrences of the term across all documents in the collection. It differs from the document frequency, which counts only how many documents contain the term, regardless of how many times the term occurs in each.
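The distinction drawn in this note can be shown in a few lines. The following Python sketch counts both quantities over the same documents; the function name `collection_and_document_frequency`, the regular-expression tokenizer, and the sample sentences are hypothetical.

```python
import re
from collections import Counter


def collection_and_document_frequency(docs):
    """Return (collection frequency, document frequency) Counters for a document set."""
    cf = Counter()
    df = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        cf.update(tokens)       # every occurrence counts
        df.update(set(tokens))  # each document counts at most once per term
    return cf, df


docs = [
    "kinase activity of the kinase domain",
    "the kinase is phosphorylated",
    "receptor binding in the membrane",
]
cf, df = collection_and_document_frequency(docs)
print(cf["kinase"], df["kinase"])  # 3 2
```

"kinase" occurs three times in total but in only two documents, so its collection frequency (3) exceeds its document frequency (2).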

Acknowledgments

The authors acknowledge that the MEDLINE® data used in this research are covered by a license agreement supported by the U.S. National Library of Medicine. Thanks are also due to Professor Rajnish Singh (Kennesaw State University) for her assistance in evaluating the keywords for the various genes, and for her help with other aspects of this work.

Author information

Authors and Affiliations

Bowling Green State University, Bowling Green, OH, USA
Venu G. Dasigi
Kennesaw State University, Marietta, GA, USA
Orlando Karam
Life University, Marietta, GA, USA
Sailaja Pydimarri

Corresponding author

Correspondence to Venu G. Dasigi.

Editor information

Editors and Affiliations

Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Rahul Bhatia
The Science and Information (SAI) Organization, Bradford, UK
Supriya Kapoor

About this paper

Cite this paper

Dasigi, V.G., Karam, O., Pydimarri, S. (2019). Impact of Context on Keyword Identification and Use in Biomedical Literature Mining. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Proceedings of the Future Technologies Conference (FTC) 2018. FTC 2018. Advances in Intelligent Systems and Computing, vol 880. Springer, Cham. https://doi.org/10.1007/978-3-030-02686-8_38

DOI: https://doi.org/10.1007/978-3-030-02686-8_38
Published: 18 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02685-1
Online ISBN: 978-3-030-02686-8
eBook Packages: Intelligent Technologies and Robotics (R0)