
1 Introduction

As technology grows, so does the importance of data. Question answering systems have been developed to cope with this growth, to extract the desired information from data, and to process that information. A question answering (QA) system takes a query from the user and returns the closest answer to that query over the target data.

QA takes various forms, such as search engines and chatbots, depending on the need. Initially, search engines only returned documents related to queries formulated by users in natural language, but over time users have come to expect a direct answer to their question alongside the documents, and these expectations keep growing. Question answering draws on research areas such as Information Retrieval (IR), Answer Extraction (AE), and Natural Language Processing (NLP). Many studies, methods, and datasets have been published in the field of QA, which is why a comprehensive picture of its current state is needed.

In this study, our purpose is to analyze the work conducted between 2000 and 2022 in the field of QA, focusing on the methods used, the most common techniques, and the datasets. The article is organized as follows. In Sect. 2, the research method is described. The criteria defined for the research questions and the corresponding results are given in Sect. 3. The last section summarizes this study.

2 Methodology

2.1 Review Method

A systematic approach was chosen for the literature search on question answering systems. Systematic literature reviews are a well-established review method; a systematic literature review can be defined as examining all the relevant research in a subject area and drawing conclusions from it [1]. This review was prepared according to the criteria suggested by Kitchenham and Charters (2007). Some of the steps and figures in this section have also been adapted from Radjenović, Heričko, Torkar, and Živkovič (2013) [2], Unterkalmsteiner et al. (2012) [3], and Wahono [4].

Fig. 1. Systematic literature review steps

As shown in Fig. 1, SLR work consists of certain stages: planning, execution, and reporting. In the planning stage, the needs are determined.

The realization targets were stated in the introduction. Then, existing SLR studies on question answering were collected and reviewed; the purpose of this review is to reduce researcher bias when conducting the SLR study (Step 2). The research questions, search strategy, inclusion and exclusion criteria, study selection process, and data extraction are described in Sects. 2.2, 2.3, 2.4 and 2.5.

2.2 Research Questions

The research questions studied in this review are indicated in Table 1.

Table 1. Identified research questions

Table 1 lists the research questions RQ1 to RQ7. RQ1 to RQ3 summarize the work done in the field of question answering, while RQ4 to RQ7 analyze the important methods and datasets used.

2.3 Search Strategy

The search process (Step 4) consists of several stages: determining the digital libraries, determining the search keywords, developing the search queries, and extracting from the digital libraries the studies that match the search query. To select the most relevant articles, appropriate databases were determined first. The most popular literature databases were examined and selected in order to keep the field of study wide. The digital databases used are: ACM Digital Library, IEEE eXplore, ScienceDirect, Springer, and Scopus.

The search query was determined according to the following criteria:

1. Search terms were derived from the research questions.

2. The generated query was searched in titles, abstracts, and keywords.

3. Different spellings, synonyms, and antonyms of the query terms were identified.

4. A comprehensive search string was created by combining the specified search terms with the Boolean operators AND and OR.

The generated search string is as follows.

(“question answering” AND “natural language processing”) AND (“information retrieval”) AND (“Document Retrieval” OR “Passage Retrieval” OR “Answer Extraction”)

Digital databases were scanned based on keywords, titles, and abstracts. The search was limited to publications between 2000 and 2022, and only journal articles and conference papers published in English were included.
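As an illustration of how such a Boolean search string can be applied when screening candidate records, the following Python sketch checks a record's title, abstract, and keywords against the search terms. The record structure and field names are hypothetical and are not taken from any particular digital library's export format.

```python
# Illustrative screening of bibliographic records against the Boolean search string.
# The record structure below is hypothetical; real digital libraries expose their own
# export formats (e.g. BibTeX or CSV), which would first be parsed into a similar shape.

def record_text(record: dict) -> str:
    """Concatenate the fields that the search string is applied to."""
    return " ".join([record.get("title", ""),
                     record.get("abstract", ""),
                     " ".join(record.get("keywords", []))]).lower()

def matches_query(record: dict) -> bool:
    """Apply the review's search string:
    ("question answering" AND "natural language processing")
    AND "information retrieval"
    AND ("document retrieval" OR "passage retrieval" OR "answer extraction")."""
    text = record_text(record)
    return ("question answering" in text
            and "natural language processing" in text
            and "information retrieval" in text
            and any(term in text for term in
                    ("document retrieval", "passage retrieval", "answer extraction")))

# Hypothetical candidate record published within the 2000-2022 window.
candidate = {
    "title": "Passage retrieval for open-domain question answering",
    "abstract": "We combine information retrieval and natural language processing ...",
    "keywords": ["question answering", "passage retrieval"],
    "year": 2015,
}
print(matches_query(candidate) and 2000 <= candidate["year"] <= 2022)  # True
```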

2.4 Study Selection

The inclusion and exclusion criteria used to determine the final studies are given in Table 2.

Table 2. Inclusion and exclusion criteria

Figure 2 shows each step of the review process and the number of studies retained. The study selection was carried out in two steps: studies were first filtered by title and abstract and then by full text. Literature surveys and studies without experimental results were also excluded. Of the remaining studies, those were included according to their degree of similarity to question answering.

Fig. 2. Search and selection of final studies

The first stage produced the final list, which includes 91 final studies. These 91 studies were then examined against the inclusion and exclusion criteria, the research questions, and their similarity to one another.

2.5 Data Extraction

At this final stage, our goal is to identify how the studies contribute to the research questions. A data extraction form was created for each of the 91 final studies; the form was designed to collect information on the studies and to answer the research questions. Table 3 shows the five features used to analyze the research questions.

Table 3. Data extraction features matched to research questions

2.6 Threats to Validity of Research

Some conference papers and journal articles may have been missed, since it is difficult to manually review every article title during the literature search.

3 Analysis Results

3.1 Important Journal Publications

This literature study covers 91 final studies in the field of question answering. Based on the final studies, we show how the number of publications in the field has changed over the years, in order to see how interest has evolved. The distribution by year is shown in Fig. 3. Interest in question answering has clearly increased since 2005, which also indicates that most of the selected studies are recent.

Fig. 3. Distribution of selected studies over the years

The most important journals included in this literature study are shown in Fig. 4.

Fig. 4. Journal publications

The Scimago Journal Rank (SJR) values of the most important journals with final studies are given in Table 4.

Table 4. SJR of journals
Fig. 5. Influential researchers and number of studies

3.2 Most Active Researchers

The most active researchers in the field of question answering are shown in Fig. 5, ranked by number of studies. They are Boris Katz, Yuan-ping Nie, Mourad Sarrouti, Said Ouatik El Alaoui, Prodromos Malakasiotis, Ion Androutsopoulos, Paolo Rosso, Stefanie Tellex, Aaron Fernandes, Gregory Marton, Dragomir Radev, Weiguo Fan, Davide Buscaldi, Emilio Sanchis, Dietrich Klakow, and Matthew W. Bilotti.

3.3 Research Topics in the Question Answering Field

To answer this question, we considered Yao's classification paper [5]. When the final studies were examined, it was seen that they fall under four topics:

1. Natural Language Processing based (NLP): machine learning and NLP techniques are used to extract the answers.

2. Information Retrieval based (IR): deals with retrieving or ranking answers, documents, and passages, as in search engines.

3. Knowledge Base based (KB): answers are found over structured data; standard database queries are used instead of word-based searches [6].

4. Hybrid based: a combination of IR, NLP, and KB.

Figure 6 shows the total distribution of research topics on question answering from 2000 to 2022. Of the 91 studies, 6.72% implemented a knowledge base approach, 31.94% an information retrieval approach, 59.24% a natural language processing approach, and 2.1% a hybrid approach. The final studies thus show that most work falls in the field of NLP. One reason researchers focus on this topic is the growing body of work on obtaining information through search engines: many NLP and machine learning techniques have been applied to extract the most correct answer from unstructured data.

Fig. 6. Ratio of subjects

3.4 Datasets Used for Question Answering

A dataset is a collection of data on which machine learning is applied [6]. The training set is the data given to the learning system to train the model. The test set, or evaluation set, is the data used to evaluate the model developed on the training set.
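As a minimal sketch of this split, using scikit-learn's train_test_split over a toy set of questions invented for illustration, the evaluation portion is simply held out from training:

```python
# Minimal train/test split sketch with scikit-learn; the toy questions and labels
# are invented for illustration only.
from sklearn.model_selection import train_test_split

questions = ["Who wrote Hamlet?", "When did World War II end?",
             "Where is the Eiffel Tower?", "Who painted the Mona Lisa?"]
labels = ["person", "date", "location", "person"]

# Hold out 25% of the data as the evaluation (test) set.
X_train, X_test, y_train, y_test = train_test_split(
    questions, labels, test_size=0.25, random_state=42)

print(len(X_train), "training examples,", len(X_test), "test examples")
```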

The distribution of datasets by year is also presented. Private datasets account for 34.95% of the studies; since these datasets are not public, the reported results cannot be compared with those of other proposed models. The distribution of final studies by year is shown in Fig. 7, and it indicates growing awareness of the value of public data.

Fig. 7. Number of datasets

3.5 Methods Used in Question Answering

As shown in Fig. 8, fourteen methods used or recommended in the field of question answering since 2000 were identified.

Fig. 8. Methods used in question answering

3.6 Best Method Used for Question Answering

Many studies have been carried out in the field of question answering. The literature reveals a pipeline consisting of Natural Language Processing (NLP), Information Retrieval (IR), and Answer Extraction (AE). A question posed in natural language first goes through an analysis phase, in which search queries are created to facilitate the next step, document retrieval. Early studies mostly used classical methods such as TF-IDF and BM25 [8,9,10] in the retrieval phase; here, retrieval works by searching for documents whose words are similar to those in the user's query.
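As a minimal sketch of this classical lexical retrieval step, assuming scikit-learn and a toy document collection invented for illustration, documents can be ranked by the cosine similarity of their TF-IDF vectors to the query:

```python
# TF-IDF retrieval sketch: rank documents by cosine similarity to the query.
# The document collection is a toy example; real systems index a full corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the highest mountain above sea level.",
    "The Great Wall of China was built over several centuries.",
]
query = "Where is the Eiffel Tower?"

vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(documents)   # fit on the collection
query_vector = vectorizer.transform([query])        # reuse the same vocabulary

scores = cosine_similarity(query_vector, doc_vectors).ravel()
best = scores.argmax()
print(f"Top document (score {scores[best]:.2f}): {documents[best]}")
```

BM25 replaces the TF-IDF weighting with a saturating term-frequency function and document-length normalization, but the retrieve-and-rank structure is the same.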

Among the other frequently used methods are named entity recognition (NER) and POS tagging. Combined with semantic role labeling, these methods have been observed to increase success in the retrieval phase [11,12,13]. The support vector machine (SVM) is used as another classical classifier: it predicts the category to which the query belongs, and document retrieval is then performed within that category. Semantic capture was improved with SVM [9, 14].
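A minimal sketch of such SVM-based query classification, assuming scikit-learn and toy training questions invented for illustration, is shown below; the predicted category could then be used to restrict which documents are retrieved:

```python
# Question classification sketch: an SVM over TF-IDF features predicts the
# expected answer category, which can then constrain document retrieval.
# The training questions and labels are toy examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_questions = [
    "Who discovered penicillin?", "Who is the president of France?",
    "When was the telephone invented?", "When did the Berlin Wall fall?",
    "Where is the Louvre located?", "Where was the first Olympics held?",
]
train_labels = ["person", "person", "date", "date", "location", "location"]

classifier = make_pipeline(TfidfVectorizer(), LinearSVC())
classifier.fit(train_questions, train_labels)

print(classifier.predict(["Who invented the light bulb?"]))  # expected: ['person']
```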

The disadvantage of classical methods is that they fail when the query is misspelled or when semantically similar words must be matched. The literature shows that deep learning studies have increased in recent years and obtain more successful results than the classical methods (Chen, Y.; Pappas, D.; Zhang, X.; Lin, H.; Nie, P.) [15,16,17,18]. The advantage of deep learning is that it captures the semantics of words and tolerates misspellings. As a result, most recent studies in the field of question answering rely on deep learning.
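As a minimal sketch of this semantic matching, assuming the sentence-transformers library and its publicly available all-MiniLM-L6-v2 encoder (any sentence encoder could be substituted), the query and candidate passages are embedded and compared by cosine similarity:

```python
# Dense-retrieval sketch: embed the query and candidate passages with a neural
# sentence encoder and rank by cosine similarity, so that related wording
# (or a slightly misspelled query) can still match the right passage.
# Assumes the sentence-transformers package; the passages are toy examples.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "The capital city of France is Paris.",
    "Photosynthesis converts sunlight into chemical energy.",
]
query = "Which city is the capitol of France?"  # different wording and a typo

query_emb = model.encode(query, convert_to_tensor=True)
passage_embs = model.encode(passages, convert_to_tensor=True)

scores = util.cos_sim(query_emb, passage_embs)[0]
best = int(scores.argmax())
print(f"Best passage (score {scores[best].item():.2f}): {passages[best]}")
```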

4 Conclusion and Future Works

In this systematic literature study, our goal was to analyze and summarize the trends, datasets, and methods used in question answering studies between 2000 and 2022. According to the inclusion and exclusion criteria, 91 final studies were determined.

The studies in the literature deal with problems such as noisy data, performance, and success rates, and these problems are still open research topics. The analysis of the selected final studies determined that current question answering research focuses on four topics: KB, IR, NLP, and Hybrid based. Of the studies, 6.72% are on KB topics, 31.94% on IR topics, 59.24% on NLP topics, and 2.10% on Hybrid based topics. In addition, 65.05% of the studies used public datasets and 34.95% used private datasets. Fourteen different methods were used for question answering, and the seven most frequently applied ones were identified: relation finding (similarity distance), parsing, NER, tokenization, deep learning, POS tagging, and graph-based methods. Using some of these techniques, researchers have proposed ways to improve accuracy in the QA field.