Abstract
With the rapid growth of information over the past few years, making full use of it is increasingly essential. A Question Answering System is one of the promising methods for accessing this information. However, a Question Answering System lacks human common sense and reasoning power and cannot identify unanswerable and irrelevant questions; it answers them with unreliable and incorrect guesses. In this paper, we address this limitation by proposing a Question Similarity mechanism. Before a question is posed to a Question Answering System, it is compared with the questions that can be generated from the given paragraph, and a Question Similarity Score is computed. The mechanism incorporates a human way of reasoning to identify unanswerable and irrelevant questions effectively, and it prevents such questions from being posed to the Question Answering System at all. This helps Question Answering Systems focus only on answerable questions and thereby improve their performance. In addition, we introduce an application of the Question Answering System that generates question-answer pairs from a given passage and is useful in several fields.
1 Introduction
The Question Answering System (QAS) plays an important role in accepting questions and automatically answering them using a knowledge information system. This paper blends the essence of Question Generation, Question Comprehension, and Question Answering to overcome the limitations of the Question Answering System.
Question Answering Systems have existed since the 1960s. The first question answering system was BASEBALL [16]. It was built with a sequence of handwritten rules, and all baseball figures accumulated over the year were stored in a database. Later, LUNAR [42] was introduced during the Apollo missions. This system was built to answer questions about the moon’s geological patterns and other information related to the Apollo missions. The customized nature of this system led to highly accurate answers.
As research evolved, Question Answering Systems gained credibility owing to the explosion of available data. Natural Language Processing (NLP) systems were introduced to approach realistic language understanding [18]. Building on NLP, there has been significant research on Question Answering Systems during the past four decades. Early examples of NLP systems are ELIZA [40] and SHRDLU [41], which were developed to study language communication between humans and machines. Although ELIZA came closer to human conversation, it was much less intelligent and knew almost nothing. SHRDLU, on the other hand, was able to reason about the blocks world. Although its conversation was limited to the blocks world, and so was not convincingly human-like, it did know what it was talking about.
Later, in 2011, IBM Watson [15], which uses NLP to analyze human speech for meaning and syntax, gained worldwide attention. At the time, it was commonly referred to as a brain. In recent years, search engines (Google) and chatbots (SIRI, ALEXA, and CORTANA) have become better at returning the exact answer to a question. The architecture of Question Answering Systems has also changed significantly over the years, from basic Recurrent Neural Networks (RNNs) to transformers [8, 12].
Question Answering Systems are classified into open-domain and closed-domain Question Answering Systems [24]. Open-domain question answering systems such as [10, 17] can handle nearly any question based on world knowledge. This type of system has access to more data from which to extract the answer. Closed-domain question answering systems are domain-specific [2, 9, 45]. They answer from either a pre-structured database or a collection of domain-specific natural language documents.
According to the studies [32, 33], human accuracy in answering questions is 89.45%, whereas the accuracy of the state-of-the-art Question Answering System is 93.01%. Although system accuracy exceeds human accuracy, such a system lacks the reasoning power that humans use [30, 34, 44] to identify and understand questions. The SQuAD 2.0 dataset [31] provides unanswerable questions with plausible answers; however, identifying unanswerable questions remains a challenge.
The limitations of the question answering system are:
- Unanswerable Questions: A question that is related to the context but incorrect is posed to the Question Answering System. A system that has outstripped human accuracy should recognize that the question is unanswerable and should not generate an answer. However, models trained on the SQuAD 1.1 dataset answer such questions with unreliable guesses even when the correct answer is not stated in the passage. This indicates that these models lack a rational way of reasoning. Even though the SQuAD 2.0 dataset introduced unanswerable questions, identifying them remains unsolved.
- Irrelevant Questions: When the Question Answering System is posed with irrelevant questions that are out of context, it still generates an understandable but nonsensical answer. Humans, on the other hand, do not provide such nonsensical answers; instead, they recognize that the question is irrelevant and out of context.
The contributions of this paper are as follows:
1. We automatically generate the possible question-answer pairs for a given passage.
2. We introduce a Question Similarity mechanism that identifies unanswerable and irrelevant questions.
3. We combine the Question Generation System with the Question Answering System to create an application called the Automatic Question-Answer Pairs Generation System.
The rest of the paper is organized as follows. Section 2 reviews related work on Question Answering Systems. Section 3 explains the automatic question-answer pairs generation system and the question similarity mechanism. Section 4 provides details about the datasets used and the experiments. The experimental results are presented in Section 5, and Section 6 discusses the results. Finally, Section 7 concludes the paper.
2 Related works
In recent years, several works have been proposed to tackle world knowledge by combining search based on bi-gram hashing and TF-IDF matching [7] with machine reading comprehension [22, 29]. This gave Question Answering Systems a good start. The most recent QAS is based on Bidirectional Encoder Representations from Transformers (BERT) [11], which uses neural models such as transformers that are pre-trained on large corpora of data. This refinement has led to remarkable gains in NLP tasks such as Question Answering, Text Summarization, and many classification problems. Beyond BERT itself, researchers have recently demonstrated, for a broad range of applications, the effectiveness of neural models that use language-model pre-training with BERT as the base model. By combining different neural architectures with the BERT language model and exploiting its embeddings, cutting-edge results in English have been achieved [5]. As research on BERT advanced, systems such as the end-to-end interactive chatbot BERTserini [43], a lighter version of BERT called ALBERT [21], and an all-purpose language model called DistilBERT [36] were introduced.
After pre-training on a large corpus of data, the model is trained on a specific dataset to answer questions in either an open-domain or a closed-domain question answering setting. Several datasets exist for question answering systems, such as the CuratedTREC dataset [1], the WebQuestions dataset [3], whose questions are answered from Freebase [4], and the Stanford Question Answering Dataset (SQuAD) [33], which is based on Wikipedia as its knowledge source.
Among these datasets, SQuAD is one of the most significant general-purpose open-domain question answering datasets currently available. There are two versions of the SQuAD dataset: SQuAD 1.1 [33] and SQuAD 2.0 [32]. SQuAD 2.0 contains unanswerable questions with plausible answers in addition to the contents of SQuAD 1.1.
However, as seen in Table 1, when unanswerable and irrelevant questions are posed to the system, the model makes unreliable and incorrect guesses in answering them.
Alongside the Question Answering System (QAS), the Question Generation System (QGS) plays a vital role in helping the model understand a question and answer it. According to Sun et al. [39], Question Answering and Question Generation are closely related. The question generation task has seen many training objectives. Works such as [13, 25, 37] concentrate on the most recent tokens and, even though they provide good results, do not capture long-term dependencies [19, 22]. The work proposed by Qi et al. [29] uses future n-gram prediction as a training objective and thus provides excellent results in question generation tasks.
When we extensively tested the Question Answering System with attention to how the answer is generated, we found that Question Comprehension plays a significant role in the question answering system [38]. Systems like [46] introduce a pair-to-sequence model that captures the interaction between the question asked and the given paragraph. Systems like ParaQG [20] generate questions from the paragraph. Systems like [35] pick keywords from the question and the paragraph and match them using an RNN. Pota et al. [27] used Convolutional Neural Networks (CNNs) to classify questions; question classification plays a vital role in extracting the correct answer in a Question Answering System. The method proposed by Esposito et al. [14] extracts the most relevant terms from the question, and these terms are then placed in context. The resulting document collection is later used in the QA system. Other work such as [28] performs Part-of-Speech (POS) tagging with a deep neural network. Here, tagging relies on character-level features, which are eventually fed to a Bi-LSTM. This method handles rare and out-of-vocabulary words as well as common and known words.
3 Methodology
This section introduces an automatic question-answer pairs generation system, which combines a Question Answering System and a Question Generation System. To address the limitations of the Question Answering System, we propose a Question Similarity mechanism. The possible questions are generated by the state-of-the-art question generation system ProphetNet [29], and the questions posed are taken from the SQuAD 2.0 dataset. The Question Similarity mechanism calculates the cosine similarity between the questions generated from the given paragraph and the question posed.
3.1 Automatic question-answer pairs generation system
The automatic question-answer pairs generation system uses the pre-trained weights of a state-of-the-art question generation system called ProphetNet [29] to generate the questions, and a BERT [11] model to generate the answers to the generated questions.
As shown in Fig. 1, we first provide the passage as input to both the question generation system and the question answering system. The question generation system generates the possible set of questions based on answer spans, which are identified from the noun and verb phrases in the passage. The generated questions are then given to the question answering system, which generates the answers based on the passage and the set of generated questions. Finally, we obtain the question-answer pairs from this system.
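The data flow described above can be sketched as follows. This is only an illustrative sketch: the function names `generate_qa_pairs`, `toy_generator`, and `toy_answerer` are hypothetical, and the two toy functions merely stand in for the ProphetNet and BERT models, which are not loaded here.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces. In the paper these roles are played by
# ProphetNet (question generation) and BERT (question answering).
QuestionGenerator = Callable[[str], List[str]]
QuestionAnswerer = Callable[[str, str], str]


def generate_qa_pairs(
    passage: str,
    generate_questions: QuestionGenerator,
    answer_question: QuestionAnswerer,
) -> List[Tuple[str, str]]:
    """Return (question, answer) pairs for one passage."""
    # Step 1: the passage goes to the question generation system.
    questions = generate_questions(passage)
    # Step 2: each generated question, together with the passage,
    # goes to the question answering system.
    return [(q, answer_question(passage, q)) for q in questions]


# Toy stand-ins (assumptions) so the sketch runs without any model.
def toy_generator(passage: str) -> List[str]:
    return ["Which system was the first question answering system?"]


def toy_answerer(passage: str, question: str) -> str:
    # Returns the first word of the passage; a real reader would
    # extract an answer span instead.
    return passage.split()[0]


passage = "BASEBALL was the first question answering system."
pairs = generate_qa_pairs(passage, toy_generator, toy_answerer)
print(pairs)
```

Passing the two systems in as callables keeps the sketch independent of any particular model library; real models could be wrapped in functions with the same signatures.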
3.2 Question similarity mechanism
In addition to automatically generating question-answer pairs, the system handles additional questions posed to it: each such question is identified as answerable, unanswerable, or irrelevant before it is passed to the Question Answering System. To identify the questions, we introduce the Question Similarity mechanism, which calculates the cosine similarity between the generated questions and the question posed.
As shown in Fig. 2, the passage is first passed to the Question Generation System, which generates the possible set of questions on the given paragraph based on answer spans derived from the noun and verb phrases.
Let GQ and QP be the set of generated questions and the question posed, with |GQ| = m and |QP| = 1. The sentence embeddings for the generated questions are obtained using the Universal Sentence Encoder [6], which gives better results than pre-trained word embeddings such as those produced by GloVe [26] and word2vec [23], and are given by
$$X^{GQ}_{SE} = \left\{E_{GQ}^{(1)}, E_{GQ}^{(2)}, \ldots, E_{GQ}^{(m)}\right\} \tag{1}$$
where
- \(X^{GQ}_{SE}\) is the set of Sentence Embeddings (SE) for the Generated Questions (GQ), and
- \(E_{GQ}^{(i)}\) is the sentence embedding of the i-th Generated Question (GQ), for i = 1, ..., m.
Similarly, we obtain the sentence embeddings for the question posed as
$$X^{QP}_{SE} = \left\{E_{QP}\right\} \tag{2}$$
where
- \(X_{SE}^{QP}\) is the set of Sentence Embeddings (SE) for the Question Posed (QP), which contains a single embedding since |QP| = 1, and
- \(E_{QP}\) is the sentence embedding of the Question Posed (QP).
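The cosine similarity between each generated question and the question posed is computed as per (3).
$$\operatorname{CosSim}\left(E_{GQ}^{(i)}, X_{SE}^{QP}\right) = \frac{\langle E_{GQ}^{(i)}, X_{SE}^{QP}\rangle}{\Vert E_{GQ}^{(i)}\Vert \, \Vert X_{SE}^{QP}\Vert}, \quad i = 1, \ldots, m \tag{3}$$
where \(\langle E_{GQ}^{(i)},X_{SE}^{QP}\rangle \) denotes the inner product of \(E_{GQ}^{(i)}\) and \(X_{SE}^{QP}\), and \(\Vert \cdot \Vert \) denotes the Euclidean norm.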
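To calculate the Question Similarity Score (QSS), we need to identify the question among the generated questions whose cosine similarity with the posed question is highest. We call it the Highest Similarity Score Question, and its index j is obtained by (4).
$$j = \underset{i \in \{1, \ldots, m\}}{\arg\max} \; \frac{\langle E_{GQ}^{(i)}, X_{SE}^{QP}\rangle}{\Vert E_{GQ}^{(i)}\Vert \, \Vert X_{SE}^{QP}\Vert} \tag{4}$$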
Now, the Question Similarity Score between the generated question identified by (4) and the question posed is given by
$$QSS = \frac{\langle E_{GQ}^{(j)}, X_{SE}^{QP}\rangle}{\Vert E_{GQ}^{(j)}\Vert \, \Vert X_{SE}^{QP}\Vert} \tag{5}$$
where \(E_{GQ}^{(j)}\) and \(X^{QP}_{SE}\) are the sentence embeddings of the j-th generated question (as obtained by (4)) and of the question posed, respectively.
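The computation of the Question Similarity Score can be illustrated with a small sketch. It assumes the sentence embeddings are already available as plain lists of floats; the three-dimensional vectors below are made-up toy values, not Universal Sentence Encoder outputs, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero vector is treated as having no similarity.
        return 0.0
    return dot / (norm_u * norm_v)


def question_similarity_score(
    generated: List[Sequence[float]], posed: Sequence[float]
) -> Tuple[int, float]:
    """Return (index of the most similar generated question, its score)."""
    if not generated:
        raise ValueError("at least one generated question is required")
    scores = [cosine_similarity(e, posed) for e in generated]
    # Index of the highest-similarity generated question.
    j = max(range(len(scores)), key=scores.__getitem__)
    return j, scores[j]


# Toy embeddings (assumed values for illustration only).
generated = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
posed = [0.6, 0.8, 0.0]

best_index, qss = question_similarity_score(generated, posed)
print(best_index, round(qss, 4))
```

Here the second generated question points in the same direction as the posed question, so it is selected and its score is close to 1.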
3.3 Question Posed
The Question Answering System is posed with several types of questions. The questions are classified as unanswerable, irrelevant, or answerable:
- Unanswerable: The context is available in the passage, but the user poses the question in a very complex way that the question answering system cannot answer. Such a question is labeled as unanswerable.
- Irrelevant: The user poses a question that is out of context with respect to the given passage. Such a question is labeled as irrelevant.
- Answerable: The context of the question is available in the given passage, and the question can be answered by the question answering system.
3.4 Question similarity score
The question similarity mechanism is used as a question filter in front of the Question Answering System. It identifies and filters unanswerable, irrelevant, and answerable questions based on threshold values. The QSS threshold ranges and the corresponding labels of the posed question are given in Table 2.
In our experiment, 1000 questions are chosen for the unanswerable, irrelevant, and answerable categories from the SQuAD 2.0 dataset. We found that irrelevant questions have question similarity scores in the range 0.00 to 0.50 and unanswerable questions have scores in the range 0.50 to 0.80. We further checked the question similarity scores of answerable questions and found that they lie in the range 0.85 to 1.00. So, we set the threshold ranges to 0.00 to 0.50 for an Irrelevant question, 0.50 to 0.85 for an Unanswerable question, and 0.85 to 1.00 for an Answerable question. If the score of the question posed exceeds the threshold of 0.85, the question is identified as answerable or relevant and is passed to the question answering system to obtain its answer. If it does not exceed the threshold, then as per Table 2 it is identified as either irrelevant or unanswerable.
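The threshold ranges above can be expressed as a small labelling function. This is a sketch under stated assumptions: the text does not say which label applies exactly at a boundary, so the handling of scores equal to 0.50 or 0.85 (and of negative cosine values) is an assumption, and the names are hypothetical.

```python
# Threshold values taken from the ranges reported in the text.
IRRELEVANT_MAX = 0.50   # scores up to this value: irrelevant
ANSWERABLE_MIN = 0.85   # scores above this value: answerable


def label_question(qss: float) -> str:
    """Map a Question Similarity Score to a question label.

    Assumptions: a score must strictly exceed 0.85 to count as
    answerable, a score of exactly 0.50 counts as irrelevant, and
    negative cosine values are also treated as irrelevant.
    """
    if qss > ANSWERABLE_MIN:
        return "answerable"
    if qss > IRRELEVANT_MAX:
        return "unanswerable"
    return "irrelevant"


def should_pass_to_qas(qss: float) -> bool:
    """Only answerable questions are forwarded to the QA system."""
    return label_question(qss) == "answerable"


for score in (0.10, 0.70, 0.92):
    print(score, label_question(score), should_pass_to_qas(score))
```

Keeping the two thresholds as named constants makes it easy to adjust them if the score ranges observed on another dataset differ.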
4 Data and Experiments
The following data are used for the experiments:
1. The SQuAD 2.0 [32] dataset, which we use for our experiments. It consists of 50,000 questions in addition to those of the SQuAD 1.1 [33] dataset, which has 100,000 answerable questions.
2. The pre-trained weights of the state-of-the-art Question Generation System ProphetNet [29], used to generate the questions for a given paragraph.
3. The pre-trained weights of the BERT [11] Question Answering System, which is fine-tuned on the SQuAD 1.1 dataset [33].
4. The pre-trained Universal Sentence Encoder (USE) [6], used to generate the sentence embeddings for the questions (Tables 3, 4, 5, 6, 7, 8, 9 and 10).
5 Results
5.1 Automatic question-answer pairs generation system
This subsection shows the results produced by the automatic question-answer pairs generation system. We generated question-answer pairs for 100 passages from the SQuAD 2.0 dataset [7]. Tables 3, 6 and 9 show all possible questions generated from the passages by the question generation system. These questions are then given, together with the passage, to the question answering system to generate the answers. Tables 4, 7 and 10 show the question-answer pairs generated by the automatic question-answer pairs generation system. On manual reading, the generated question-answer pairs are found to be of good quality (Table 11).
5.2 Question similarity mechanism
This subsection provides the results of the proposed question similarity mechanism. When a question is posed to the question answering system, the question similarity mechanism identifies whether the question is answerable or unanswerable, and relevant or irrelevant. Both unanswerable and irrelevant questions are taken from the SQuAD 2.0 dataset [7] for the experiments.
We carried out the experiments on 100 random passages from the SQuAD 2.0 dataset [7] with unanswerable and irrelevant questions. As shown in Tables 5, 8 and 12, when the cosine similarity score between the generated question and the posed question does not exceed the threshold of 0.85, the posed question is labeled as either unanswerable or irrelevant and is not passed to the Question Answering System. The proposed question similarity mechanism thus prevents the question answering model from answering unanswerable or irrelevant questions by incorrect guessing. We also present the Question Similarity Scores for answerable questions from the SQuAD 2.0 dataset [7]; these questions obtain scores above 0.90. We can infer that the question similarity mechanism identifies the questions on par with human judgment.
We experimented with 1000 questions for both Unanswerable and Irrelevant questions. In our experiments, we used the BERT model trained on the SQuAD 1.1 dataset. A BERT model trained on SQuAD 2.0 should not predict answers for Unanswerable questions; however, it still answers a few of them. We therefore combined the Question Similarity mechanism with the BERT model trained on SQuAD 1.1; the mechanism helps to identify unanswerable and irrelevant questions. Irrelevant questions are not part of the SQuAD 2.0 dataset. For a particular passage in the SQuAD 2.0 dataset, irrelevant questions are therefore chosen randomly from different passages, so that they are not related to its context. The efficiency of the model is calculated by,
6 Discussion
The automatic question-answer pairs generation system shows how question answering and question generation work as twin tasks to obtain satisfactory results. On manual reading, we can infer that this system generates good question-answer pairs. The question-answer pairs generated to date are confined to ‘wh’ questions and their answers. The majority of question-answer pairs generation systems are rule-based, whereas our proposed application generates all possible question-answer pairs using a machine learning approach.
In the question similarity mechanism, we show the significance of the work by addressing a key challenge of the Question Answering System. Even though works like [1, 3, 32, 33] introduced different techniques to overcome the limitations of the Question Answering System, the identification of unanswerable questions remains an open challenge. The proposed Question Similarity mechanism does not require training. It improves the performance of question answering systems by letting them focus only on answerable or relevant questions. From this, we can infer that the Question Similarity mechanism incorporates a human way of reasoning to identify unanswerable and irrelevant questions and hence addresses the limitation of the QAS.
7 Conclusion
In this paper, we introduce an application, called the automatic question-answer pairs generation system, that combines the Question Generation and Question Answering systems to generate all possible question-answer pairs. It has applications in different fields. We also introduce a Question Similarity mechanism that imitates human reasoning to identify whether the question posed is answerable, unanswerable, or irrelevant. Existing question answering systems cannot make this distinction. If the question posed is unanswerable or irrelevant, it is not passed to the QAS. As no training process is involved in this mechanism, it requires fewer computational resources. The mechanism can be combined with state-of-the-art Question Answering Systems so that the models can concentrate on answerable questions and improve their performance. The automatically generated question-answer pairs can be used as a dataset to train Question Answering models.
Code Availability
The code is available on request.
References
Baudiš P, Šedivý J (2015) Modeling of the Question Answering Task in the yodaQA System. In: Mothe J, Savoy J, Kamps J, Pinel-Sauvagnat K, Jones G, San Juan E, Capellato L, Ferro N (eds) Experimental IR meets multilinguality, multimodality, and interaction. Springer International Publishing, Cham, pp 222–228
Benamara F (2004) Cooperative question answering in restricted domains: the WEBCOOP experiment. In: Proceedings of the Conference on Question Answering in Restricted Domains, pp. 31–38. Association for Computational Linguistics, Barcelona. https://www.aclweb.org/anthology/W04-0506
Berant J, Chou A, Frostig R, Liang P (2013) Semantic Parsing on Freebase from Question-Answer Pairs. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1533–1544. Association for Computational Linguistics, Seattle. https://www.aclweb.org/anthology/D13-1160
Bollacker K, Evans C, Paritosh P, Sturge T, Taylor J (2008) Freebase: A collaboratively created graph database for structuring human knowledge. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD ’08. Association for Computing Machinery, New York, pp 1247–1250. https://doi.org/10.1145/1376616.1376746
Catelli R, Casola V, De Pietro G, Fujita H, Esposito M (2021) Combining contextualized word representation and sub-document level analysis through Bi-LSTM+CRF architecture for clinical de-identification. Knowl-Based Syst 213:106649. https://doi.org/10.1016/j.knosys.2020.106649, https://www.sciencedirect.com/science/article/pii/S0950705120307784
Cer D, Yang Y, Kong SY, Hua N, Limtiaco N, St. John R, Constant N, Guajardo-Cespedes M, Yuan S, Tar C, Strope B, Kurzweil R (2018) Universal Sentence Encoder for English. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 169–174. Association for Computational Linguistics, Brussels. https://doi.org/10.18653/v1/D18-2029, https://www.aclweb.org/anthology/D18-2029
Chen D, Fisch A, Weston J, Bordes A (2017) Reading Wikipedia to answer Open-Domain questions. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Vancouver, pp 1870–1879. https://doi.org/10.18653/v1/P17-1171
Chen Y, Li H (2020) DAM: Transformer-Based relation detection for Question Answering over Knowledge Base. Knowl-Based Syst 201-202:1–8. https://doi.org/10.1016/j.knosys.2020.106077
Cuteri B, Reale K, Ricca F (2019) A Logic-Based question answering system for cultural heritage. In: Calimeri F, Leone N, Manna M (eds) Logics in artificial intelligence. Springer International Publishing, Cham, pp 526–541
Dehghani M, Azarbonyad H, Kamps J, de Rijke M (2019) Learning to transform, combine, and reason in Open-Domain question answering. In: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, WSDM ’19. Association for Computing Machinery, New York, pp 681–689. https://doi.org/10.1145/3289600.3291012
Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training Of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, pp 4171–4186. https://doi.org/10.18653/v1/N19-1423
Di Gennaro G, Buonanno A, Di Girolamo A, Ospedale A, Palmieri FAN (2020) Intent Classification in Question-Answering Using LSTM Architectures. In: Esposito A, Faundez-Zanuy M, Morabito FC (eds) Progresses in Artificial Intelligence and Neural Systems. Springer Singapore, Singapore, pp 115–124. https://doi.org/10.1007/978-981-15-5093-5_11
Dutil F, Gulcehre C, Trischler A, Bengio Y (2017) Plan, Attend, Generate: Planning for Sequence-to-Sequence Models. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17. Curran Associates Inc., Red Hook, pp 5480–5489
Esposito M, Damiano E, Minutolo A, De Pietro G, Fujita H (2020) Hybrid query expansion using lexical resources and word embeddings for sentence retrieval in question answering. Information Sciences 514:88–105. https://doi.org/10.1016/j.ins.2019.12.002
Ferrucci D, Nyberg E, Allan J, Barker K, Brown EW, Chu-Carroll J, Ciccolo AC, Duboué PA, Fan J, Gondek DC, Hovy E, Katz B, Lally A, McCord M, Morarescu P, Murdock B, Porter B, Prager JM, Strzalkowski T, Welty C, Zadrozny W (2009) IBM Research report towards the open advancement of question answering systems. Tech. Rep. RC24789 (w0904-093) IBM
Green BF, Wolf AK, Chomsky C, Laughery K (1961) Baseball: An Automatic Question-Answerer. In: Papers Presented at the May 9-11, 1961, Western Joint IRE-AIEE-ACM Computer Conference, IRE-AIEE-ACM ’61 (Western), pp 219–224. Association for Computing Machinery, New York. https://doi.org/10.1145/1460690.1460714
Hovy E, Gerber L, Hermjakob U, Junk M, Lin CY (2000) Question answering in Webclopedia. In: Proceedings of the TREC-9 conference, NIST, Gaithersburg, pp 1–10
Khurana D, Koli A, Khatter K, Singh S (2017) Natural language processing: State of the art, current trends and challenges. arXiv:1708.05148
Krueger D, Maharaj T, Kramár J, Pezeshki M, Ballas N, Ke NR, Goyal A, Bengio Y, Courville AC, Pal C (2017) Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations. In: Proceedings of the 5th International Conference on Learning Representations, ICLR, Toulon, pp 1–11
Kumar V, Muneeswaran S, Ramakrishnan G, Li YF (2019) ParaQG: A System for Generating Questions and Answers from Paragraphs. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations, pp. 175–180. Association for Computational Linguistics, Hong Kong. https://doi.org/10.18653/v1/D19-3030, https://www.aclweb.org/anthology/D19-3030
Lan Z, Chen M, Goodman S, Gimpel K, Sharma P, Soricut R (2020) ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In: Proceedings of Eighth International Conference on Learning Representation (ICLR), Addis Ababa, pp 1–17. https://iclr.cc/virtual_2020/poster_H1eA7AEtvS.html
Merity S, Keskar NS, Socher R (2018) Regularizing and optimizing LSTM language models. In: Proceedings of the 6th International Conference on Learning Representations, ICLR, Vancouver, pp 1–10
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed Representations of Words and Phrases and their Compositionality. In: Burges C J C, Bottou L, Welling M, Ghahramani Z, Weinberger K Q (eds) Advances in neural information processing systems, vol 26, Curran Associates, Inc, pp 3111–3119
Mishra A, Jain SK (2016) A survey on question answering systems with classification. J King Saud Univ-Comput Inf Sci 28(3):345–361
Pascanu R, Mikolov T, Bengio Y (2013) On the difficulty of training recurrent neural networks. In: Dasgupta S, McAllester D (eds) Proceedings of the 30th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol 28. PMLR, Atlanta, pp 1310–1318. http://proceedings.mlr.press/v28/pascanu13.html
Pennington J, Socher R, Manning C (2014) GloVe: Global Vectors for Word Representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543. Association for Computational Linguistics, Doha. https://doi.org/10.3115/v1/D14-1162. https://www.aclweb.org/anthology/D14-1162
Pota M, Esposito M, De Pietro G, Fujita H (2020) Best Practices of Convolutional Neural Networks for Question Classification. Appl Sci 10(14). https://doi.org/10.3390/app10144710, https://www.mdpi.com/2076-3417/10/14/4710
Pota M, Marulli F, Esposito M, De Pietro G, Fujita H (2019) Multilingual POS tagging by a composite deep architecture based on character-level features and on-the-fly enriched Word Embeddings. Knowl-Based Syst 164:309–323. https://doi.org/10.1016/j.knosys.2018.11.003
Qi W, Yan Y, Gong Y, Liu D, Duan N, Chen J, Zhang R, Zhou M (2020) ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training. In: Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, pp 2401–2410. https://doi.org/10.18653/v1/2020.findings-emnlp.217
Qiao C, Hu X (2020) A neural knowledge graph evaluator: Combining structural and semantic evidence of knowledge graphs for predicting supportive knowledge in scientific QA. Inf Process Manag 57(6):102309. https://doi.org/10.1016/j.ipm.2020.102309
Rajpurkar P (2020) Performance of Unanswerble questions in SQUAD 2.0. https://rajpurkar.github.io/SQuAD-explorer/explore/v2.0/dev/ (2020) [Online; accessed 10
Rajpurkar P, Jia R, Liang P (2018) Know what you don’t know: Unanswerable questions for SQuAD. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Melbourne, pp 784–789. https://doi.org/10.18653/v1/P18-2124, https://www.aclweb.org/anthology/P18-2124
Rajpurkar P, Zhang J, Lopyrev K, Liang P (2016) SQUAD: 100,000+ Questions for Machine Comprehension of Text. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, pp 2383–2392. https://doi.org/10.18653/v1/D16-1264
Ray A, Christie G, Bansal M, Batra D, Parikh D (2016) Question relevance in VQA: identifying Non-Visual and False-Premise questions. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, pp 919–924. https://doi.org/10.18653/v1/D16-1090
Reddy S, Raghu D, Khapra MM, Joshi S (2017) Generating Natural Language Question-Answer Pairs from a Knowledge Graph Using a RNN Based Question Generation Model. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. Association for Computational Linguistics, Valencia, pp 376–385. https://www.aclweb.org/anthology/E17-1036
Sanh V, Debut L, Chaumond J, Wolf T (2019) DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. In: Proceedings of the 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing, pp 1–5. Vancouver. https://www.emc2-ai.org/assets/docs/neurips-19/emc2-neurips19-paper-33.pdf
Serdyuk D, Ke NR, Sordoni A, Trischler A, Pal C, Bengio Y (2018) Twin networks: Matching the future for sequence generation. In: Proceedings of the 6th International Conference on Learning Representations, ICLR, Vancouver, pp 1–12
Song J, Liu F, Ding K, Du K, Zhang X (2020) Semantic comprehension of questions in Q&A system for Chinese language based on semantic element combination. IEEE Access 8:102971–102981. https://doi.org/10.1109/ACCESS.2020.2997958
Sun Y, Tang D, Duan N, Qin T, Liu S, Yan Z, Zhou M, Lv Y, Yin W, Feng X, Qin B, Liu T (2020) Joint learning of question answering and question generation. IEEE Trans Knowl Data Eng 32(5):971–982
Weizenbaum J (1966) ELIZA-A computer program for the study of natural language communication between man and machine. Commun ACM 9(1):36–45
Winograd T (1972) Understanding natural language. Cogn Psychol 3(1):1–191. https://doi.org/10.1016/0010-0285(72)90002-3
Woods WA, Kaplan R (1977) Lunar rocks in natural English: Explorations in natural language question answering. In: Zampolli A (ed) Linguistic Structures Processing, Fundamental Studies in Computer Science. North-Holland Publishing Company, pp 266–290
Yang W, Xie Y, Lin A, Li X, Tan L, Xiong K, Li M, Lin J (2019) End-to-end Open-Domain Question Answering with BERTserini. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). Association for Computational Linguistics, Minneapolis, pp 72–77. https://doi.org/10.18653/v1/N19-4013
Ye Y, Zhang S, Li Y, Qian X, Tang S, Pu S, Xiao J (2020) Video question answering via grounded cross-attention network learning. Inf Process Manag 57(4):102265. https://doi.org/10.1016/j.ipm.2020.102265
Zahedi M, Rahgozar M, Zoroofi R (2020) HCA: Hierarchical Compare Aggregate model for question retrieval in community question answering. Inf Process Manag 57(6):102318. https://doi.org/10.1016/j.ipm.2020.102318
Zhu H, Dong L, Wei F, Wang W, Qin B, Liu T (2019) Learning to Ask Unanswerable Questions for Machine Reading Comprehension. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, pp 4238–4248. https://doi.org/10.18653/v1/P19-1415, https://www.aclweb.org/anthology/P19-1415
Acknowledgements
The authors would like to thank Dr. Harishchandra Hebbar from the School of Information Science (SOIS), Manipal, for providing access to the GPU-based computing facility.
We thank the anonymous reviewers whose insightful comments and suggestions have significantly improved this paper.
Funding
Open access funding provided by Manipal Academy of Higher Education, Manipal. For this study, we have not sought any funding from any agency.
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Shivani G Aithal and Abishek B Rao. Shivani G Aithal and Abishek B Rao wrote the first draft of the manuscript, and all authors commented on previous versions of the manuscript. Sanjay Singh did the supervision, reviewing, and editing. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no conflict of interest.
Additional information
Availability of data and material
For this study, we have used the dataset available in the public domain. The source of the dataset is cited in the paper.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Aithal, S.G., Rao, A.B. & Singh, S. Automatic question-answer pairs generation and question similarity mechanism in question answering system. Appl Intell 51, 8484–8497 (2021). https://doi.org/10.1007/s10489-021-02348-9
DOI: https://doi.org/10.1007/s10489-021-02348-9