
1 Introduction

Today, many websites host QA forums where users can post questions and answer other users’ questions. However, users usually have to wait for responses. Moreover, the volume of question answering data has grown enormously, so new questions often duplicate the meaning of questions already in the database. To reduce latency and effort, QA systems based on information retrieval (IR), which retrieve a good answer from an existing answer collection, are essential. QA relies on open-domain datasets, such as texts on the web, or closed-domain datasets, such as collections of medical papers like PubMed [8], to find relevant passages. Moreover, during the COVID-19 pandemic, people have paid more attention to their health, and the number of questions posted on health forums has increased rapidly. Therefore, QA in the medical domain plays an important role. Lexical gaps between queries and relevant documents, which occur when the two use different words to describe similar content, have been a significant issue. Table 1 shows a typical example of this issue in our dataset. Previous studies applied word embeddings to estimate semantic similarity between texts to address this problem [26]. Various studies have applied deep neural networks and BERT to extract semantically meaningful representations of texts [11]. Notably, SBERT has recently achieved state-of-the-art performance on several tasks, including retrieval tasks [7]. This paper focuses on exploring SBERT models fine-tuned with multiple negatives ranking (MNR) loss.

Our contributions are threefold: (1) we introduce ViHealthQA, a dataset containing 10,015 question-answer passage pairs in the medical domain; (2) we propose a two-stage QA system based on SBERT with MNR loss; (3) we perform multiple experiments, including traditional models such as BM25, TF-IDF cosine similarity, and a language model, to compare against our system.

Table 1. A typical example of lexical gaps in the ViHealthQA dataset.

2 Related Work

In early-stage works on QA retrieval, several studies [3] presented sparse vector models. Using unigram word counts, these models map queries and documents to vectors with many zero values and rank similarity values to extract potential documents. In 2008, Manning et al. [14] carried out many experiments to gain a deeper understanding of the role of such vectors, including how to compare queries with documents. Moreover, many researchers [4, 19] have paid attention to BM25 methods for IR tasks.

IR methods with sparse vectors have a significant drawback: the lexical gap challenge. A solution to this problem is to use dense embeddings to represent queries and documents. This idea was proposed early with the LSI approach [2]. However, the most well-known model is BERT, which applies encoders to compute embeddings for queries and documents. Liu et al. [13] added a final mean pooling layer and then calculated similarity values between the outputs, whereas Karpukhin et al. [9] used the initial CLS token. Many studies [10, 12] applied BERT and reached significant results. Notably, SBERT [18] uses Siamese and triplet network structures to produce semantically meaningful sentence embeddings. Several works have applied SBERT to Semantic Textual Similarity (STS) and Natural Language Inference (NLI) benchmarks. In 2021, Ha et al. [5] utilized SBERT to find similar questions in community question answering, running several experiments on SBERT with multiple losses, including MNR loss.

Because our task is in the medical domain, we reviewed some related corpora. For example, CliCR [22] comprises around 100,000 gap-filling queries based on clinical case reports, and MedQA [28] includes answers to real-world multiple-choice questions. In Vietnam, Nguyen et al. [24] published ViNewsQA in 2021, which includes 22,057 human-generated question-answer pairs and supports machine reading comprehension tasks.

3 Task Description

There are n question-answer passage pairs in the database. We have a collection of questions \({Q = \{q_1, q_2, ..., q_n\}}\) and a collection of answer passages \({A = \{a_1, a_2, ..., a_n\}}\). Our task is to build models that, given a question \({q_i \in Q}\), retrieve its precise answer passage \({a_i \in A}\).

4 Dataset

4.1 Dataset Characteristics

We release ViHealthQA, a novel Vietnamese dataset for question answering and information retrieval that includes 10,015 question-answer passage pairs. We collected the data from the Vinmec and VnExpress websites using the BeautifulSoup library. These are forums where users ask health-related questions that are answered by qualified doctors. The dataset consists of four features: index, question, answer passage, and link.

4.2 Overall Statistics

After the data collection phase, we divide our dataset into train, dev, and test sets. In particular, there are 7,009 pairs in Train, 993 pairs in Dev, and 2,013 pairs in Test (Table 2).

According to Table 3, most answer passages are in the range of 101–300 words (34.1%), followed by 301–500 words (31.13%), 501–700 words (15.88%), and 701–1,000 words (9.98%). Longer answer passages (over 1,000 words) account for only a small proportion (7.58%).

Table 2. Statistics of ViHealthQA dataset.
Table 3. Distribution of the answer passage length (%).

4.3 Vocabulary-Based Analysis

To understand the vocabulary of the medical domain, we use the WordClouds tool to visualize the words that appear most frequently in the dataset (Fig. 1). Table 4 shows the 10 most frequent words; these words are related to the medical domain. Besides, users ask many questions about coronavirus (COVID-19), children, inflammatory diseases, and allergies.

Table 4. Top 10 common words in the ViHealthQA dataset.
Fig. 1. Word distribution of ViHealthQA.

5 SPBERTQA: A Two-Stage Question Answering System Based on Sentence Transformers

In this paper, we propose a two-stage question answering system called SPBERTQA (Fig. 2), consisting of a BM25-based sentence retriever and SBERT using PhoBERT fine-tuned with MNR loss. After training, the inputs (the question and the document collection) are fed into BM25-SPhoBERT. Then, we rank the cosine similarity scores between the sentence-embedding outputs to extract the top K candidate documents.

Fig. 2. Overview of our system.

5.1 BM25 Based Sentence Retriever

We aim to train the model by focusing on the meaningful content of our dataset. Thus, we propose a sentence retriever stage that extracts, from every answer passage, the K sentences most relevant to the corresponding question. Moreover, this stage helps work around the maximum sequence length of pre-trained BERT models, which is 512 tokens (\(max\_seq\_length\) of PhoBERT = 256 tokens), while answer passages with over 300 tokens account for more than 65.47% of Train.

We use BM25 for the first stage because BM25 generally brings good results in IR systems [20]. Besides, most answer passages have fewer than four sentences (the average number of sentences per answer passage is 3.95, Table 2), so we choose \(K = 5\). A minimal sketch of this retriever is shown below.
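For illustration, the following Python sketch shows how such a sentence retriever could be implemented with the rank_bm25 package; the naive sentence splitting, whitespace tokenization, and variable names are simplifying assumptions rather than our exact implementation.

```python
# Minimal sketch of the BM25-based sentence retriever (stage 1).
# Assumes word-segmented Vietnamese text; rank_bm25 is one possible BM25 implementation.
from rank_bm25 import BM25Okapi

def retrieve_top_sentences(question, answer_passage, k=5):
    """Return the k sentences of answer_passage most relevant to question."""
    # Naive sentence split; in practice a proper Vietnamese sentence splitter would be used.
    sentences = [s.strip() for s in answer_passage.split(".") if s.strip()]
    if len(sentences) <= k:
        return sentences
    tokenized_sentences = [s.lower().split() for s in sentences]
    bm25 = BM25Okapi(tokenized_sentences)
    scores = bm25.get_scores(question.lower().split())
    top_idx = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Keep the original sentence order to preserve readability of the shortened passage.
    return [sentences[i] for i in sorted(top_idx)]
```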

5.2 SBERT Using PhoBERT and Fine-Tuning with MNR Loss

Multiple Negatives Ranking (MNR) Loss: MNR loss works well for IR and semantic search [7]. The loss function is given by Equation (1).

$$\begin{aligned} L=-\frac{1}{N} \cdot \frac{1}{K} \cdot \sum _{i=1}^{K}\left[ S\left( x_{i}, y_{i}\right) -\log \sum _{j=1}^{K} e^{S\left( x_{i}, y_{j}\right) }\right] \end{aligned}$$
(1)

In every batch, there are K positive pairs (\({x_i, y_i}\): question and positive answer passage), and each positive pair is combined with \(K - 1\) random negative answer passages \({(y_j, j \ne i)}\). The similarity between a question and an answer passage, \(S(x, y)\), is cosine similarity. Moreover, N is the Train size.

In the second stage, we use the pre-trained PhoBERT model. PhoBERT [15] is the first public large-scale monolingual language model for Vietnamese; its pre-training approach is based on RoBERTa, which optimizes the BERT pre-training procedure for more robust performance. We then fine-tune PhoBERT with MNR loss, as sketched below.
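As an illustration, the second stage could be set up with the sentence-transformers library roughly as follows; the data-loading variable train_pairs is a placeholder, mean pooling is assumed, and the hyperparameters follow Section 6.3.

```python
# Sketch of stage 2: SBERT built on PhoBERT and fine-tuned with MNR loss
# using the sentence-transformers library (data loading and names are illustrative).
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, models

word_embedding = models.Transformer("vinai/phobert-base", max_seq_length=256)
pooling = models.Pooling(word_embedding.get_word_embedding_dimension(), pooling_mode="mean")
model = SentenceTransformer(modules=[word_embedding, pooling])

# Each training example is a (question, positive answer passage) pair;
# MNR loss treats the other passages in the same batch as negatives.
train_examples = [InputExample(texts=[q, a]) for q, a in train_pairs]  # train_pairs: assumed list
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=15,
    optimizer_params={"lr": 2e-5},
)
```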

6 Experiments

6.1 Comparative Methods

We compare our system with traditional methods such as BM25, TFIDF-Cos, and LM; pre-trained PhoBERT; and fine-tuned SBERT models such as BM25-SXLMR and BM25-SmBERT.

BM25. BM25 is an optimized version of TF-IDF. Equation (2) gives the BM25 score of a document D for a query q, where \({d_{avg}}\) is the average document length. BM25 adds two parameters: k balances the contributions of term frequency and IDF, and b adjusts the importance of document length normalization. In 2008, Manning et al. [14] suggested \(k \in [1.2, 2.0]\) and \(b = 0.75\) as reasonable values.

$$\begin{aligned} BM25(D, q)=\underbrace{\frac{f(q, D)*(k+1)}{f(q,D)+k *\left( 1-b+b * \frac{|D|}{d_{avg}}\right) }}_{TF} *\underbrace{\log \left( \frac{N-N(q)+0.5}{N(q)+0.5}+1\right) }_{IDF} \end{aligned}$$
(2)
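For clarity, the following Python function is a direct transcription of Equation (2) for a single query term; the parameter names are illustrative.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k=1.2, b=0.75):
    """Score one query term against one document, following Eq. (2).

    tf: frequency of the term in the document, f(q, D)
    doc_len: length of the document |D|
    avg_doc_len: average document length d_avg
    n_docs: number of documents in the collection, N
    doc_freq: number of documents containing the term, N(q)
    """
    tf_part = tf * (k + 1) / (tf + k * (1 - b + b * doc_len / avg_doc_len))
    idf_part = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1)
    return tf_part * idf_part
```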

TF-IDF Cosine Similarity (TFIDF-Cos). Cosine similarity is one of the most popular similarity measures applied in information retrieval and is superior to other measures such as the Jaccard and Euclidean measures [21]. Let a and b be the TF-IDF bag-of-words vectors of the question and the answer passage, respectively. The similarity between a and b is calculated by Equation (3) [16].

$$\begin{aligned} Cos(a, b)=\frac{a \cdot b}{\Vert a\Vert \Vert b\Vert }=\frac{\sum _{1}^{n} a_{i} b_{i}}{\sqrt{\sum _{1}^{n} a_{i}^{2}} \sqrt{\sum _{1}^{n} b_{i}^{2}}} \end{aligned}$$
(3)
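A minimal sketch of this baseline with scikit-learn might look as follows; the default vectorizer settings and variable names are assumptions, not our exact configuration.

```python
# Sketch of the TFIDF-Cos baseline with scikit-learn (parameter choices are illustrative).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_by_tfidf_cos(question, answer_passages, top_k=10):
    """Rank answer_passages by TF-IDF cosine similarity to the question."""
    vectorizer = TfidfVectorizer()
    passage_matrix = vectorizer.fit_transform(answer_passages)   # one row per passage
    question_vec = vectorizer.transform([question])
    scores = cosine_similarity(question_vec, passage_matrix)[0]  # Eq. (3) for each pair
    ranked = sorted(range(len(answer_passages)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]
```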

Language Model (LM). An LM is a probabilistic model of text [23]. Questions and answers are modeled with a probability distribution over sequences of words. The original and most basic method is the unigram query likelihood model with smoothing (Equation (4)).

$$\begin{aligned} P\left( q_{i} \mid D\right) =\left( 1-\alpha _{D}\right) *P_{ml}\left( q_{i} \mid D\right) +\alpha _{D}*P\left( q_{i} \mid C\right) \end{aligned}$$
(4)

\(P_{ml}(q_i \mid D)\) is the maximum likelihood estimate of term \(q_i\) under the language model derived from D, and \(P(q_i \mid C)\) is its unigram probability in a background corpus C, used to avoid zero scores [27]. Besides, various smoothing methods differ in how they set \({\alpha _{D}}\), with \({\alpha _{D} \in [0,1]}\).
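A minimal sketch of this unigram query likelihood model with Jelinek-Mercer-style smoothing is shown below; the fixed alpha value and the token lists are illustrative assumptions.

```python
# Sketch of the unigram query-likelihood model with smoothing (Eq. (4)).
from collections import Counter

def query_likelihood(query_tokens, doc_tokens, corpus_tokens, alpha=0.1):
    """Score a document for a query by multiplying smoothed unigram probabilities."""
    doc_counts, corpus_counts = Counter(doc_tokens), Counter(corpus_tokens)
    doc_len, corpus_len = len(doc_tokens), len(corpus_tokens)
    score = 1.0
    for term in query_tokens:
        p_ml = doc_counts[term] / doc_len if doc_len else 0.0           # P_ml(q_i | D)
        p_bg = corpus_counts[term] / corpus_len if corpus_len else 0.0  # P(q_i | C)
        score *= (1 - alpha) * p_ml + alpha * p_bg
    return score
```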

PhoBERT. We directly use PhoBERT (without fine-tuning) to encode questions and answer passages. Then, we rank the top K answer passages with the highest cosine similarity scores to the corresponding question.
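A rough sketch of this baseline is given below; wrapping PhoBERT with mean pooling via the sentence-transformers library, as well as the placeholder inputs, are assumptions for illustration.

```python
# Sketch of the PhoBERT baseline: encode texts and rank passages by cosine similarity.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("vinai/phobert-base")  # wraps PhoBERT with mean pooling (assumed)
question_emb = encoder.encode(["question text"], convert_to_tensor=True)       # placeholder question
passage_embs = encoder.encode(answer_passages, convert_to_tensor=True)         # answer_passages: assumed list
scores = util.cos_sim(question_emb, passage_embs)[0]
top_k = scores.topk(10).indices.tolist()  # indices of the 10 highest-scoring passages
```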

BM25-SXLMR. This model is similar to ours, but in the second stage we use XLM-RoBERTa instead of PhoBERT. XLM-RoBERTa [1] was pre-trained on 2.5 TB of filtered CommonCrawl data covering 100 languages (including Vietnamese).

BM25-SmBERT. This model is similar to ours, but in the second stage we use multilingual BERT, introduced in [17]. It is a Transformer model pre-trained on a large Wikipedia corpus covering 104 languages (including Vietnamese) with a masked language modeling (MLM) objective.

6.2 Data Preprocessing

We pre-process the data by lowercasing and removing uninterpretable characters (e.g., newlines and extra whitespace). To tokenize the data, we employ the RDRSegmenter of VnCoreNLP [25]. Moreover, stop words can become a source of noise for traditional methods, which work well on pairs with high word matching between query and answer. Therefore, we also remove stop words: first, we use TF-IDF to extract stop words, and then we remove these words from the data.
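The cleaning steps can be sketched as follows; the VnCoreNLP word segmentation and the TF-IDF-based stop-word list are applied separately and are only referenced here as assumptions.

```python
# Minimal sketch of the text cleaning step; Vietnamese word segmentation with
# VnCoreNLP's RDRSegmenter is performed separately and is not reproduced here.
import re

def clean_text(text, stop_words=None):
    """Lowercase, normalize whitespace, and optionally drop stop words."""
    text = text.lower()
    text = re.sub(r"\s+", " ", text).strip()   # collapse newlines and extra whitespace
    if stop_words:                             # stop_words: low-TF-IDF terms (assumed set)
        text = " ".join(tok for tok in text.split() if tok not in stop_words)
    return text
```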

6.3 Experimental Settings

We choose xlm-roberta-base, bert-base-multilingual-cased, and vinai/phobert-base as the pre-trained models. Then, we fine-tune SBERT for 15 epochs with a batch size of 32, a learning rate of \(2e^{-5}\), and a maximum sequence length of 256. Our experiments are performed on a single NVIDIA Tesla P100 GPU on the Google Colaboratory server.

6.4 Evaluation Metric

P@K (Equation (5)) is the percentage of questions for which the exact answer passage appears among the top K retrieved passages [24].

$$\begin{aligned} P@K =\frac{1}{|Q|} \sum _{q \in Q}\left\{ \begin{array}{cl} 1 &{} a_{q} \in A_{K}(q) \\ 0 &{} otherwise \end{array}\right. \end{aligned}$$
(5)

where \({Q = \{q_1, q_2, ..., q_n\}}\) is the collection of questions with \({q \in Q}\), \({A = \{a_1, a_2, ..., a_n\}}\) is the collection of answer passages, \({a_q}\) is the exact answer passage of question q, and \({A_K(q) \subseteq A}\) is the set of K most relevant passages extracted for question q.

Besides, mean average precision (mAP) is used to evaluate the performance of models.
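For reference, the two metrics can be computed roughly as follows; ranked_ids and gold_id are assumed data structures mapping each question to its ranked passage ids and to its exact answer passage id, respectively. With exactly one relevant passage per question, the average precision of a question reduces to the reciprocal rank of its gold passage.

```python
# Sketch of P@K (Eq. (5)) and mAP over a set of questions (data structures are assumed).
def precision_at_k(ranked_ids, gold_id, k):
    """Fraction of questions whose exact answer appears in the top-k retrieved passages."""
    hits = sum(1 for q, ranking in ranked_ids.items() if gold_id[q] in ranking[:k])
    return hits / len(ranked_ids)

def mean_average_precision(ranked_ids, gold_id):
    """mAP; with one relevant passage per question, AP reduces to the reciprocal rank."""
    total = 0.0
    for q, ranking in ranked_ids.items():
        if gold_id[q] in ranking:
            total += 1.0 / (ranking.index(gold_id[q]) + 1)
    return total / len(ranked_ids)
```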

Table 5. Results on Dev and Test with P@K score (%).
Table 6. Results on Dev and Test with mAP score (%).

7 Results and Analysis

7.1 Results and Discussion

As shown in Tables 5 and 6, our system achieves the best performance, with a 62.25% mAP score, a 50.92% P@1 score, and an 83.76% P@10 score on Test. BM25-SXLMR and BM25-SmBERT, which use multilingual BERT models, do not work better than our system, which uses monolingual PhoBERT. Compared to the PhoBERT model without fine-tuning, the models fine-tuned with MNR loss (BM25-SXLMR, BM25-SmBERT, and our system) achieve good results, which shows that using MNR loss to fine-tune models is suitable for this task.

7.2 Analysis

To examine whether our system is more robust than traditional methods and whether traditional methods suffer from the lexical gap issue, we run the models on pairs with a lexical overlap (the number of duplicate words between question and answer passage, denoted X) from 0 to 10. As shown in Fig. 3, with \(X < 4\), the bag-of-words methods cannot extract the precise answer; especially with \(X = 0\), these models mostly do not work. In contrast, with \(X = 0\), the fine-tuned models achieve over a 50% P@1 score, and from \(X = 3\) onward they achieve over an 80% P@1 score. Moreover, we provide typical examples from Dev predicted by BM25, LM, and our system (Table 7). ID 169 has word matching between the question and the answer passage, so BM25, LM, and our system can all retrieve the precise answer. In contrast, in ID 776, no word of the question appears in the answer passage. Hence, a model must understand the semantic background instead of capturing high lexical overlap to retrieve the precise answer. BERT models capture context and meaning better than bag-of-words methods [6]. In particular, SBERT can derive semantically meaningful sentence embeddings [18]. Therefore, our system based on sentence transformers can find the exact answer passage for the question with ID 776.

Fig. 3. Results of lexical overlap experiments with P@1 (%).

Table 7. Examples in Dev predicted by traditional methods and our system.

8 Conclusion and Future Work

In this paper, we first presented ViHealthQA, a dataset comprising 10,015 question-answer passage pairs in the medical domain. Every answer passage is a doctor’s reply to the corresponding user’s question, so ViHealthQA is suitable for real search engines. Secondly, we proposed SPBERTQA, a two-stage question answering system based on sentence transformers, and evaluated it on our dataset. Our proposed system performs best, outperforming bag-of-words models and fine-tuned multilingual pre-trained language models, and it alleviates the lexical gap problem.

In future work, we plan to add a machine reading comprehension (MRC) module, which extracts answer spans from answer passages so that users can grasp the meaning of the answer faster.