Abstract
Biomedical named entity recognition is the task of identifying mentions of entities such as diseases, symptoms, drugs, proteins, and chemicals in biomedical texts. It plays an important role in natural language processing applications such as relation extraction, question answering, keyword extraction, machine translation, and text summarization. Information extraction in the biomedical domain can be used for early diagnosis of diseases, detection of missing relationships between biomedical entities such as diseases and chemicals, and determination of drug interactions and side effects. Because biomedical texts contain domain-specific words, complicated phrases, and abbreviations, named entity recognition in this domain remains a challenging task. In this study, we first investigate methods for named entity recognition in the biomedical domain and classify them into four categories: dictionary-based, rule-based, machine learning, and deep learning methods. Recent advances such as deep learning and transformer-based biomedical language models have helped to achieve successful results in the named entity recognition task. Second, we conduct an experimental study on MedMentions, an annotated dataset that is available to researchers. Finally, we present our experimental results and discuss the challenges and opportunities of the existing methods. The experimental study shows that the most successful method for extracting diseases and symptoms from biomedical texts is BioBERT, with an F1 score of 0.72.
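The abstract reports results as an F1 score. As a minimal sketch of how such a score can be computed for entity recognition, the Python example below scores predicted entities against gold annotations. It assumes exact matching on span boundaries and label, which is a common convention but is not stated in the abstract; the function name `entity_f1`, the offsets, and the labels are all hypothetical.

```python
from typing import Set, Tuple

# An entity is (start_offset, end_offset, label).
# Assumption: a prediction counts as correct only if all three fields match exactly.
Entity = Tuple[int, int, str]


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span-and-label matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for one sentence (offsets and labels are made up).
gold = {(0, 8, "Disease"), (24, 29, "Symptom"), (40, 52, "Disease")}
predicted = {(0, 8, "Disease"), (24, 29, "Disease"), (40, 52, "Disease")}

precision, recall, f1 = entity_f1(gold, predicted)
```

In this toy case the second prediction has the right span but the wrong label, so it counts as both a false positive and a false negative. Relaxed (partial-overlap) matching would score it differently.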
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Çelikten, A., Onan, A., Bulut, H. (2023). Investigation of Biomedical Named Entity Recognition Methods. In: Hemanth, D.J., Yigit, T., Kose, U., Guvenc, U. (eds) 4th International Conference on Artificial Intelligence and Applied Mathematics in Engineering. ICAIAME 2022. Engineering Cyber-Physical Systems and Critical Infrastructures, vol 7. Springer, Cham. https://doi.org/10.1007/978-3-031-31956-3_18
DOI: https://doi.org/10.1007/978-3-031-31956-3_18
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31955-6
Online ISBN: 978-3-031-31956-3
eBook Packages: Intelligent Technologies and Robotics (R0)