
1 Introduction

Relation extraction (RE) is the natural language processing task of automatically extracting relations between named entities in text. One application of relation extraction is automatic database completion and expansion. Constructing databases from textual resources so that humans can easily access important information requires reading a large number of documents comprehensively, which incurs substantial manual cost. Research on relation extraction from text is therefore crucial for achieving advanced human-computer interaction.

Relation extraction from biomedical texts is vital research for supporting biomedical experts. One such task is extracting drug-drug interactions from text. A drug-drug interaction (DDI) is defined as a change in the effects of one drug caused by the presence of another drug [4]. To practice “evidence-based medicine” [16] and prevent drug-related accidents, it is important to extract knowledge about DDIs from pharmaceutical papers comprehensively. Automatic DDI extraction can greatly benefit the pharmaceutical industry by reducing the time healthcare professionals spend reviewing the medical literature.

Classification-based supervised methods [14, 17] have conventionally been adopted for information extraction from biomedical texts; however, with the success of large language models (LLMs), prompt-tuning-based information extraction methods [6] have begun to be studied. In prompt-tuning methods, the input sentence and the prompt, an instruction text for the target downstream task, are fed into the LLM, which then predicts the entities and the relations between them. In recent years, research on prompt-tuning has drawn increasing attention, and various methods such as in-context learning [15] and instruction tuning [9] have been proposed. Because of the extremely large number of parameters in LLMs, it is not practical to update all model parameters by supervised learning. Instead, a few-shot learning approach with only a few supervised examples, or a zero-shot learning approach with no supervised examples, is commonly used to predict answers.

The critical issue is that, despite the success of LLMs in generative tasks such as summarization and question answering, LLMs do not significantly improve performance on information extraction tasks. According to previous surveys [6, 7], the GPT-3.5 model, which has 355B parameters, underperformed traditional classification-based state-of-the-art methods on several biomedical named entity recognition and relation extraction tasks. Furthermore, the GPT-4 model, which has an even larger model size, underperforms a fully supervised PubMedBERT [11], which has only 110M parameters. These results show that existing prompt-based few-shot and zero-shot learning with LLMs is not effective for information extraction in the biomedical domain.

In this study, we propose novel information extraction methods enhanced by LLMs. An overview of our proposed methods is shown in Fig. 1. We investigate three DDI extraction methods that leverage LLMs. In the first method, we examine the ability to extract DDIs in a few-shot learning setting via the extremely large language model Gemini-Pro [20]. In the second method, we enhance seq2seq-based fully fine-tuned DDI extraction with CoT reasoning explanations generated by Gemini-Pro. In the third method, we enhance classification-based fully fine-tuned DDI extraction with drug entity descriptions that are automatically generated by Gemini-Pro. Our contributions are summarized as follows:

  • We propose three DDI extraction methods that leverage the benefits of LLMs.

  • Experimental results on the DDIExtraction-2013 dataset show that entity descriptions generated by LLMs can boost the performance of the classification-based DDI extraction method, achieving a significant F-score improvement.

Fig. 1. An overview of relation extraction methods with LLMs.

2 Related Work

Extracting information from the biomedical literature is an important NLP task that converts unstructured text data, such as academic papers and web articles, into structured data that humans can easily access. One target task is drug-drug interaction (DDI) extraction from the literature. A DDI is broadly defined as a change in the effects of one drug caused by the presence of another drug [4]. DDI detection is an important research area for patient safety, since these interactions can be dangerous and increase healthcare costs. The DDIExtraction-2013 [18] dataset was constructed to promote automatic DDI extraction from the literature via machine learning methods.

On the DDI extraction task, classification-based methods using relatively small encoder-only pre-trained language models (PLMs) have shown high performance. Biomedical-domain PLMs such as BioBERT [13], SciBERT [5], and PubMedBERT [11] have been adopted for the DDI extraction task. Methods combining PLMs with information from external drug databases, e.g., DrugBank [22], have been proposed, and it has been reported that using information from external databases improves extraction performance over considering the context alone [1,2,3].

In general-domain relation extraction, REBEL [12], which adopted seq2seq-based PLMs, showed higher performance than existing pipeline-based methods on the joint extraction of entities and relations. Wadhwa et al. [21] first showed that few-shot learning with GPT-3 yields near state-of-the-art performance on general-domain relation extraction datasets, and then proposed training Flan-T5 with Chain-of-Thought (CoT) style “explanations” (generated automatically by GPT-3) that support relation inferences; this achieved state-of-the-art results on general-domain relation extraction tasks.

On the other hand, Chen et al. [8] reported that LLMs do not significantly improve performance on information extraction tasks in the biomedical domain. The GPT-3.5 model, which has 355B parameters, underperformed traditional classification-based state-of-the-art methods on several biomedical named entity recognition and relation extraction tasks. Furthermore, the GPT-4 model, which has an even larger model size, underperforms a fully supervised PubMedBERT [11], which has only 110M parameters. There has not been sufficient discussion of the effectiveness of LLMs, or of methods for combining LLMs with smaller PLMs, on biomedical information extraction tasks.

3 Method

3.1 Relation Extraction via In-Context Few-Shot Learning with LLMs

We apply instructional in-context few-shot prompting to Gemini-Pro [20]. Figure 2 shows the instructional prompt and examples (“shots”) given as input to the LLM. We examine two approaches: direct prompting, which predicts the relation type directly from the instructional prompt and a few examples, and chain-of-thought prompting, which predicts the relation type after first generating an explanation of the two entities.

Direct Prompting. To construct prompts for relation extraction, we use a prompt that defines the relation types and instructs the LLM to predict the correct relation type from the given text, as shown in Fig. 2 A. Special tokens (<e1>, </e1>, <e2>, </e2>) are used to clarify which drugs in the sentence are targeted. Example sentences are selected from the training set of the relation extraction corpus. Among them, we select the sentences that appear within the annotation guideline for dataset construction, because we consider these examples to be representative of their relation types. A sketch of this prompt construction is shown below.
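
As an illustration, the following Python sketch builds such a few-shot prompt. The instruction wording, label strings, and helper function are our assumptions for illustration; the exact prompt text is the one shown in Fig. 2 A.

```python
LABELS = ["mechanism", "effect", "advise", "int", "negative"]

# Assumed instruction wording; the paper's actual prompt appears in Fig. 2 A.
INSTRUCTION = (
    "Classify the interaction between the two drugs marked with "
    "<e1></e1> and <e2></e2> into one of: " + ", ".join(LABELS) + "."
)

def build_direct_prompt(examples, target_sentence):
    """examples: (tagged_sentence, gold_label) pairs taken from the
    annotation guideline of the dataset."""
    shots = "\n\n".join(
        f"Sentence: {sent}\nRelation: {label}" for sent, label in examples
    )
    return f"{INSTRUCTION}\n\n{shots}\n\nSentence: {target_sentence}\nRelation:"
```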

Chain-of-Thought Prompting. In chain-of-thought (CoT) prompting, the prompt instructs the LLM to first generate an explanation of the entities and then predict the relation type, rather than predicting the relation type directly. Examples for few-shot learning are selected in the same way as in direct prompting, and an explanation is added to each sentence, as shown in Fig. 2 B. As explanations, we adopt the text that describes the relation between entities in the annotation guideline. The sketch below shows how each shot changes under CoT.
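
Continuing the sketch above, each shot now carries an explanation before its label; the added instruction sentence is again an illustrative assumption.

```python
def build_cot_prompt(instruction, examples, target_sentence):
    """examples: (tagged_sentence, explanation, gold_label) triples, with
    explanations taken from the annotation guideline."""
    shots = "\n\n".join(
        f"Sentence: {sent}\nExplanation: {expl}\nRelation: {label}"
        for sent, expl, label in examples
    )
    return (f"{instruction}\nFirst explain the relation between the two "
            f"marked drugs, then output the relation type.\n\n"
            f"{shots}\n\nSentence: {target_sentence}\nExplanation:")
```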

Fig. 2. Model overview of in-context few-shot learning with LLMs.

3.2 Seq2seq-Based Relation Extraction Enhanced by LLMs

We applied the method of Wadhwa et al. [21], which uses LLMs for data augmentation in the full fine-tuning of seq2seq-based PLMs for relation extraction, to the biomedical domain. Figure 3 shows an overview of the method. In this method, relatively small PLMs with fewer than 1B parameters are fine-tuned on the whole training dataset. The relation labels are generated by the seq2seq model, and during fine-tuning on the training dataset we add CoT-style explanations, generated automatically by LLMs, that support the relation inferences. First, we prepare the CoT-style explanations for all examples in the training dataset by feeding the LLM the instructional prompt and examples, as shown in the left part of Fig. 3. Then we fine-tune the seq2seq PLMs on the gold relation labels and the explanations generated by the LLM, as shown in the right part of Fig. 3. A sketch of the target construction is shown below.
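
The following sketch illustrates target construction and the fine-tuning loss; the output formats follow Sect. 4.3, while `gemini_explanations` is a hypothetical mapping from example ids to the explanations prepared in advance by the LLM.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

def make_target(example, use_cot, gemini_explanations=None):
    # Formats follow Sect. 4.3: "Relation: xxx" without CoT,
    # "Relation: xxx Explanation: xxx" with CoT.
    if use_cot:
        expl = gemini_explanations[example["id"]]  # generated in advance by the LLM
        return f"Relation: {example['label']} Explanation: {expl}"
    return f"Relation: {example['label']}"

def training_loss(example, use_cot, gemini_explanations=None):
    inputs = tokenizer(example["sentence"], return_tensors="pt", truncation=True)
    labels = tokenizer(make_target(example, use_cot, gemini_explanations),
                       return_tensors="pt", truncation=True).input_ids
    # Standard seq2seq cross-entropy loss over the target sequence.
    return model(**inputs, labels=labels).loss
```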

Fig. 3. Model overview of seq2seq-based relation extraction enhanced by LLMs.

3.3 Classification-Based Relation Extraction Enhanced by LLMs

We propose a classification-based relation extraction method enhanced by LLMs. In this approach, input sentences are converted into pooled representations by encoder-only PLMs, and the resulting vectors are projected to the dimension of the number of relation labels for multi-class classification. We utilize LLMs to augment the entity information during full fine-tuning of the PLMs. Specifically, for the two entities in the sentence, entity descriptions are generated in advance by the LLM with the prompt “Please provide a short description on <ENTITY> in one sentence.”, as shown in Fig. 4. The input sentence, the first entity description, and the second entity description are fed to the PLMs. The three output vectors are concatenated, and the resulting vector is fed to a linear layer for dimension conversion. We prepare two separate PLMs, one for the input sentences and the other for the entity descriptions. A sketch of this architecture is shown below.
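
A minimal PyTorch sketch of the described architecture follows; the [CLS] pooling and the five-label output (four DDI types plus negative) are our assumptions, and `plm_name` would be a PubMedBERT Large checkpoint.

```python
import torch
from torch import nn
from transformers import AutoModel

class DualEncoderDDIClassifier(nn.Module):
    """One encoder for the input sentence, a second (shared) encoder for the
    two entity descriptions; the three pooled vectors are concatenated and
    projected to the relation labels."""

    def __init__(self, plm_name, num_labels=5):
        super().__init__()
        self.sent_encoder = AutoModel.from_pretrained(plm_name)
        self.desc_encoder = AutoModel.from_pretrained(plm_name)
        hidden = self.sent_encoder.config.hidden_size
        self.classifier = nn.Linear(3 * hidden, num_labels)

    def forward(self, sent_inputs, desc1_inputs, desc2_inputs):
        # [CLS] pooling over each of the three inputs (pooling choice assumed).
        h_sent = self.sent_encoder(**sent_inputs).last_hidden_state[:, 0]
        h_desc1 = self.desc_encoder(**desc1_inputs).last_hidden_state[:, 0]
        h_desc2 = self.desc_encoder(**desc2_inputs).last_hidden_state[:, 0]
        return self.classifier(torch.cat([h_sent, h_desc1, h_desc2], dim=-1))
```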

4 Experimental Settings

4.1 DDI Extraction Task Settings

We followed the DDIExtraction-2013 [18] shared task settings. This dataset is composed of input sentences containing drug mention pairs, and one of the following four DDI types is annotated for each interacting drug pair.

  • Mechanism: This type is assigned when a pharmacokinetic interaction is described in an input sentence.

  • Effect: This type is assigned when a pharmacodynamic interaction is described in an input sentence.

  • Advise: This type is assigned when a recommendation or advice regarding the concomitant use of two drugs is described in an input sentence.

  • Interaction (Int.): This type is assigned when the sentence states that an interaction occurs but does not provide any detailed information about the interaction.

Table 1 shows the statistics of the DDI extraction dataset. The dataset is highly imbalanced: there are roughly six times as many pairs not mentioning a relation (negative pairs) as pairs mentioning a relation (positive pairs). Since the official dataset provides no validation split, we split the training data into a smaller training set and a validation set to perform hyper-parameter tuning. After determining the hyper-parameters, we re-trained the model on the whole training set and evaluated it on the test set.

Fig. 4. Model overview of classification-based relation extraction enhanced by LLMs.

Table 1. The statistics of the DDIExtraction-2013 dataset

4.2 LLMs and Prompts

We adopted Gemini-Pro [20] as the LLM. Gemini-Pro is a model optimized for cost and latency that delivers strong performance across a wide range of tasks. In evaluations on a series of text-based academic benchmarks covering reasoning, reading comprehension, STEM, and coding, Gemini-Pro showed higher performance than GPT-3.5. We obtained the outputs from Gemini-Pro via the Google AI API. If the model generated text that did not match any relation label name, it was assumed to predict the negative relation, as in the sketch below.
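
A minimal sketch of this querying and fallback step, using the google-generativeai Python SDK (API key handling and error handling omitted; the fallback mirrors the rule stated above):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-pro")

VALID_LABELS = {"mechanism", "effect", "advise", "int"}

def predict_relation(prompt: str) -> str:
    text = model.generate_content(prompt).text.strip().lower()
    # Any output that matches no relation label name is treated as negative.
    return text if text in VALID_LABELS else "negative"
```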

To prepare the prompts for few-shot learning, we selected 14 examples from the annotation guideline of the DDIExtraction-2013 dataset. The explanations for CoT reasoning were also extracted from the annotation guideline.

4.3 PLMs for Seq2seq Methods

We adopted the Flan-T5 Large [9] model, which has 783M parameters, as the baseline for the seq2seq-based method. In seq2seq-based DDI extraction, the model generates output in the form Relation: xxx, and the model with CoT generates Relation: xxx Explanation: xxx. The generated explanation part is not used for evaluation; only the generated relation type is used. When the model generates an output that does not match any of the relation types, we assume that the negative label is predicted; this parsing rule is sketched below. The Flan-T5 model parameters are trained on all training samples of the DDIExtraction-2013 dataset. In addition, the model with CoT is trained on the explanations generated in advance by Gemini-Pro. We set the beam size to 5 for generation. We employed the Adafactor optimizer [19] and tuned hyper-parameters on the development dataset.
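
A sketch of the fallback parsing rule, assuming the lowercase label names of the dataset:

```python
import re

VALID_LABELS = {"mechanism", "effect", "advise", "int"}

def parse_prediction(generated: str) -> str:
    """Extract the label from 'Relation: xxx' (with or without a trailing
    'Explanation: ...'); unparseable outputs fall back to the negative label."""
    match = re.search(r"Relation:\s*(\w+)", generated)
    label = match.group(1).lower() if match else ""
    return label if label in VALID_LABELS else "negative"
```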

4.4 PLMs for Classification Methods

We employed PubMedBERT Large [11] as the baseline encoder-only PLM for classification-based relation extraction. We employed the Adafactor optimizer [19] and tuned hyper-parameters on the development dataset. Our significance tests are based on the permutation test [10], with the number of shuffles set to 5,000; a sketch of the procedure is shown below.
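
The following sketch shows one common form of this test (an approximate randomization test over paired predictions); `metric` is assumed to be a function `(golds, preds) -> F-score`, e.g. the micro-averaged F-score over positive labels.

```python
import random

def permutation_test(metric, preds_a, preds_b, golds, n_shuffles=5000, seed=0):
    """Swap the two systems' predictions on a random subset of instances and
    count how often the shuffled F-score gap is at least the observed gap."""
    rng = random.Random(seed)
    observed = abs(metric(golds, preds_a) - metric(golds, preds_b))
    hits = 0
    for _ in range(n_shuffles):
        swapped_a, swapped_b = [], []
        for a, b in zip(preds_a, preds_b):
            if rng.random() < 0.5:  # swap this instance's predictions
                a, b = b, a
            swapped_a.append(a)
            swapped_b.append(b)
        if abs(metric(golds, swapped_a) - metric(golds, swapped_b)) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)  # p-value with add-one smoothing
```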

5 Results and Discussions

5.1 In-Context Few-Shot Learning-Based Relation Extraction by LLMs

Table 2 shows the performance comparison between the traditional classification-based method and the few-shot in-context learning methods via Gemini-Pro with and without CoT. As shown in Table 2, few-shot in-context learning via Gemini-Pro performed far worse than the classification-based method with the smaller PLM (PubMedBERT-Large). The model with CoT showed a higher F-score than the direct prompting model, but its performance is still much lower than that of the fully fine-tuned PubMedBERT. These results are consistent with the report [6] that validated GPT-3.5 on other biomedical relation extraction datasets, indicating that while LLMs have reasonable text generation capacity, it is difficult for them to correctly predict relations between entities from few-shot samples.

We performed further analysis on the relation labels predicted by the LLM. Figure 5 shows the normalized confusion matrices of the gold labels and the predictions from Gemini-Pro with and without CoT. Each row of a matrix shows the distribution of the model's label predictions for one gold label, normalized so that the row sums to one. The diagonal components of the matrix indicate the samples that are correctly predicted, so darker diagonal elements indicate higher model performance. As shown in Fig. 5, the Gemini-Pro model without CoT incorrectly predicts many positive relation instances as negative relations. The Gemini-Pro model with CoT incorrectly predicts positive relations as negative less often; however, it more often incorrectly predicts negative relations as positive. These results show that it is difficult for LLM-based in-context few-shot learning to predict correct relation labels on a highly imbalanced relation extraction dataset. Such matrices can be computed as sketched below.
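
For reference, a row-normalized confusion matrix of this kind can be computed with scikit-learn; the label order and the toy label lists here are illustrative stand-ins for the real evaluation data.

```python
from sklearn.metrics import confusion_matrix

LABEL_ORDER = ["negative", "mechanism", "effect", "advise", "int"]

# Toy per-pair label lists standing in for the real gold and predicted labels.
gold_labels = ["effect", "negative", "int", "mechanism", "advise"]
pred_labels = ["effect", "effect", "negative", "mechanism", "advise"]

# normalize="true" divides each row by its gold-label count, so every row
# shows the prediction distribution for one gold label, as in Fig. 5.
cm = confusion_matrix(gold_labels, pred_labels,
                      labels=LABEL_ORDER, normalize="true")
```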

Table 2. The performance of DDI extraction on in-context few-shot prompt learning methods
Fig. 5. Normalized confusion matrix of the gold labels and predictions from Gemini-Pro with and without CoT.

5.2 Seq2seq-Based Relation Extraction Enhanced by LLMs

Table 3 shows the F-score comparison between the baseline models and the seq2seq-based models. The seq2seq-based DDI extraction model with the Flan-T5 backbone showed an F-score of 82.25%, which is lower than that of the classification-based baseline model. In particular, its precision is much lower than that of the classification-based method. The CoT model with the explanations generated by Gemini-Pro showed a lower F-score than the model without CoT.

Table 3. The performance of DDI extraction on seq2seq-based methods
Table 4. The performance of DDI extraction on classification-based methods. * indicates performance improvement over PubMedBERT (baseline) at a significance level of p < 0.05
Table 5. The comparison of F-scores for individual DDI types on the DDIExtraction-2013 test dataset. Mech. and Int. denote Mechanism and Interaction, respectively.

5.3 Classification-Based Relation Extraction Enhanced by LLMs

Table 4 shows the F-score comparison between the baseline model, the model with entity descriptions generated by Gemini-Pro, and the state-of-the-art method HKG-DDIE [2], which incorporates heterogeneous knowledge graph information into the DDI extraction task. Using the entity descriptions generated by Gemini-Pro improved the F-score by 1.77 pp, a significant improvement under the permutation test. Table 5 shows the performance comparison for the individual DDI types. The model with entity descriptions showed higher performance than the baseline model on the Mechanism, Effect, and Interaction relation labels, while showing lower performance on the Advise relation type. In particular, our proposed model improved the F-score on the Interaction type by 12.41 pp. These results show the effectiveness of leveraging LLMs in classification-based DDI extraction methods.

6 Conclusion

In this paper, we proposed three methods that leverage LLMs for the DDI extraction task. We showed that in-context few-shot learning with LLMs struggles on biomedical relation extraction tasks, consistent with previous reports. We then investigated seq2seq-based relation extraction in the biomedical domain. The seq2seq-based models showed a lower F-score, which stems from their low precision. We added CoT explanations generated by LLMs to the seq2seq-based models, but the CoT explanations did not improve DDI extraction performance. We further showed that entity descriptions generated by LLMs can improve the performance of the classification-based relation extraction method on the DDIExtraction-2013 task.