Abstract
In this paper, we give an overview of the open domain Question Answering (open domain QA) shared task at NLPCC-ICCPOL 2016. We first review the background of QA, and then describe the two open domain Chinese QA tasks in this year's NLPCC-ICCPOL, including the construction of the benchmark datasets and the evaluation metrics. The evaluation results of the submissions from participating teams are presented in the experimental part.
1 Background
Question Answering (QA) is a fundamental task in Artificial Intelligence, whose goal is to build a system that can automatically answer natural language questions. In the last decade, the development of QA techniques has been greatly driven by both academia and industry.
In academia, with the rise of large scale curated knowledge bases such as Yago, Satori, and Freebase, more and more researchers have turned their attention to the knowledge-based QA (KBQA) task, using semantic parsing-based approaches [1–7] and information retrieval-based approaches [8–16]. Besides KBQA, researchers are also interested in document-based QA (DBQA), whose goal is to select sentences from a set of given documents as answers to natural language questions. Information retrieval-based approaches [18–22] are typically used for the DBQA task.
In industry, many influential QA-related products have been built, such as IBM Watson, Apple Siri, Google Now, Facebook Graph Search, and Microsoft Cortana and XiaoIce. Such systems are becoming part of the daily life of every mobile device user.
Against this background, this year's NLPCC-ICCPOL shared task calls for an open domain QA task that covers both the KBQA and DBQA tasks. Our motivations are twofold:
1. We expect this activity to advance the progress of QA research, especially for Chinese;
2. We encourage more QA researchers to share their experiences, techniques, and progress.
The remainder of this paper is organized as follows. Section 2 describes the two open domain Chinese QA tasks and the construction of the benchmark datasets. Section 3 describes the evaluation metrics, and Sect. 4 presents the evaluation results of the different submissions. We conclude the paper in Sect. 5 and point out our plans for future QA evaluation activities.
2 Task Description
The NLPCC-ICCPOL 2016 open domain QA shared task includes two QA tasks for Chinese: the knowledge-based QA (KBQA) task and the document-based QA (DBQA) task.
2.1 KBQA Task
Given a question, a KBQA system built by each participating team should select one or more entities as answers from a given knowledge base (KB). The datasets for this task include:
- A Chinese KB. It includes knowledge triples crawled from the web. Each knowledge triple has the form <Subject, Predicate, Object>, where 'Subject' denotes a subject entity, 'Predicate' denotes a relation, and 'Object' denotes an object entity. A sample of knowledge triples is given in Fig. 1, and the statistics of the Chinese KB are given in Table 1.
- A training set and a testing set. We assigned a set of knowledge triples sampled from the Chinese KB to human annotators. For each knowledge triple, a human annotator writes down a natural language question whose answer is the object entity of that triple. The statistics of the labeled QA pairs and an annotation example are given in Table 2.
In the KBQA task, any data resource can be used to train the necessary models (e.g., entity linking, semantic parsing), but answer entities must come from the provided KB only.
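As a minimal sketch, a KBQA system's KB lookup step can be implemented by indexing the <Subject, Predicate, Object> triples by subject entity. The tab-separated line format and the function names here are illustrative assumptions, not the shared task's actual KB dump format:

```python
from collections import defaultdict

def load_kb(lines):
    """Index <Subject, Predicate, Object> triples by subject entity.

    Assumes one tab-separated triple per line (an illustrative format)."""
    kb = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed lines
        subj, pred, obj = parts
        kb[subj].append((pred, obj))
    return kb

def lookup(kb, subject, predicate):
    """Answer a (subject, predicate) query with all matching object entities."""
    return [obj for pred, obj in kb.get(subject, []) if pred == predicate]
```

In a full system, the subject and predicate would come from entity linking and relation detection over the question; here they are passed in directly.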
2.2 DBQA Task
Given a question and its corresponding document, a DBQA system built by each participating team should select one or more sentences as answers from the document. The datasets for this task include:
- A training set and a testing set. We assigned a set of documents to human annotators. For each document, a human annotator (1) first selects a sentence from the document, and (2) then writes down a natural language question whose answer is the selected sentence. The statistics of the labeled QA pairs and an annotation example are given in Table 3.
As shown in the example in Table 3, each line provides a question (the 1st column), one of the question's corresponding document sentences (the 2nd column), and its answer annotation (the 3rd column). If a document sentence is a correct answer to the question, its annotation is 1; otherwise, its annotation is 0. The three columns are separated by the tab symbol '\t'.
In the DBQA task, any data resource can be used to train the necessary models (e.g., paraphrasing models, sentence matching models), but answer sentences must come from the provided documents only.
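The three-column tab-separated annotation format described above can be read with a short parser; this is a sketch under the stated format, and the function name is illustrative:

```python
def read_dbqa(lines):
    """Parse DBQA annotation lines of the form
    'question \t document sentence \t label' (label: 1 = correct answer, 0 = not)
    into (question, sentence, label) triples."""
    examples = []
    for line in lines:
        question, sentence, label = line.rstrip("\n").split("\t")
        examples.append((question, sentence, int(label)))
    return examples
```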
3 Evaluation Metrics
The quality of a KBQA system is evaluated by Averaged F1, and the quality of a DBQA system is evaluated by MRR, MAP, and ACC@1.
- Averaged F1

\( Averaged\;F1 = \frac{1}{|Q|}\sum\nolimits_{i = 1}^{|Q|} F_{i} \)

\( F_{i} \) denotes the F1 score for question \( Q_{i} \), computed based on the system-generated answer set \( C_{i} \) and the golden answer set \( A_{i} \). \( F_{i} \) is set to 0 if \( C_{i} \) is empty or does not overlap with \( A_{i} \). Otherwise, \( F_{i} \) is computed as follows:

\( F_{i} = \frac{2 \cdot \# (C_{i} ,A_{i} )}{{|C_{i} | + |A_{i} |}} \)

where \( \# (C_{i} ,A_{i} ) \) denotes the number of answers that occur in both \( C_{i} \) and \( A_{i} \), and \( |C_{i} | \) and \( |A_{i} | \) denote the number of answers in \( C_{i} \) and \( A_{i} \), respectively.
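The averaged F1 computation can be sketched as follows (the function name and list-of-sets input convention are assumptions for illustration):

```python
def averaged_f1(system_answers, gold_answers):
    """Averaged F1 over questions: F_i = 2 * #(C_i, A_i) / (|C_i| + |A_i|),
    where #(C_i, A_i) counts answers shared by system set C_i and gold set A_i.
    F_i is 0 when C_i is empty or disjoint from A_i."""
    total = 0.0
    for c, a in zip(system_answers, gold_answers):
        overlap = len(set(c) & set(a))  # #(C_i, A_i)
        if overlap == 0:
            continue  # contributes F_i = 0
        precision = overlap / len(c)
        recall = overlap / len(a)
        total += 2 * precision * recall / (precision + recall)
    return total / len(gold_answers)
```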
- MRR

\( MRR = \frac{1}{|Q|}\sum\nolimits_{i = 1}^{|Q|} \frac{1}{{rank_{i} }} \)

\( |Q| \) denotes the total number of questions in the evaluation set, and \( rank_{i} \) denotes the position of the first correct answer in the generated answer set \( C_{i} \) for the \( i^{th} \) question \( Q_{i} \). If \( C_{i} \) does not overlap with the golden answers \( A_{i} \) for \( Q_{i} \), \( \frac{1}{{rank_{i} }} \) is set to 0.
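MRR can be sketched in a few lines; each system answer list is assumed to be ranked, which is an input convention adopted for illustration:

```python
def mrr(ranked_answers, gold_answers):
    """Mean reciprocal rank: average of 1/rank_i, where rank_i is the position
    of the first correct answer in C_i; the term is 0 if C_i has no correct answer."""
    total = 0.0
    for c, a in zip(ranked_answers, gold_answers):
        gold = set(a)
        for rank, candidate in enumerate(c, start=1):
            if candidate in gold:
                total += 1.0 / rank
                break  # only the first correct answer counts
    return total / len(gold_answers)
```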
- MAP

\( MAP = \frac{1}{|Q|}\sum\nolimits_{i = 1}^{|Q|} AveP\left( {C_{i} , A_{i} } \right) \)

\( AveP\left( {C, A} \right) = \frac{{\mathop \sum \nolimits_{k = 1}^{n} \left( {P\left( k \right) \cdot rel\left( k \right)} \right)}}{min(m,n) } \) denotes the average precision, where \( k \) is the rank in the sequence of retrieved answer sentences, \( m \) is the number of correct answer sentences, and \( n \) is the number of retrieved answer sentences. If \( min(m,n) \) is 0, \( AveP\left( {C, A} \right) \) is set to 0. \( P\left( k \right) \) is the precision at cut-off \( k \) in the list, and \( rel\left( k \right) \) is an indicator function equal to 1 if the item at rank \( k \) is a correct answer sentence, and 0 otherwise.
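A sketch of the MAP computation under the definition above (function names are illustrative):

```python
def average_precision(ranked, gold):
    """AveP(C, A) = sum_k (P(k) * rel(k)) / min(m, n), where rel(k) = 1 iff the
    candidate at rank k is a correct answer sentence and P(k) is precision at k."""
    gold = set(gold)
    m, n = len(gold), len(ranked)
    if min(m, n) == 0:
        return 0.0
    hits, score = 0, 0.0
    for k, candidate in enumerate(ranked, start=1):
        if candidate in gold:       # rel(k) = 1
            hits += 1
            score += hits / k       # P(k) = correct-so-far / k
    return score / min(m, n)

def mean_average_precision(all_ranked, all_gold):
    """MAP: mean of AveP over all questions."""
    return sum(average_precision(c, a)
               for c, a in zip(all_ranked, all_gold)) / len(all_gold)
```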
- ACC@N

\( ACC@N = \frac{1}{|Q|}\sum\nolimits_{i = 1}^{|Q|} \delta (C_{i} , A_{i} ) \)

\( \delta (C_{i} , A_{i} ) \) equals 1 when at least one of the top-\( N \) answers in \( C_{i} \) occurs in \( A_{i} \), and 0 otherwise.
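ACC@N can be sketched directly from the indicator definition (the function name and the ranked-list input convention are assumptions):

```python
def acc_at_n(ranked_answers, gold_answers, n=1):
    """Fraction of questions whose top-n candidates in C_i contain
    at least one golden answer from A_i (delta = 1), averaged over |Q|."""
    correct = sum(
        1 for c, a in zip(ranked_answers, gold_answers)
        if set(c[:n]) & set(a)  # delta(C_i, A_i) = 1
    )
    return correct / len(gold_answers)
```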
5 Conclusion
This paper briefly presents an overview of this year's two open domain Chinese QA shared tasks. Compared to last year (19 teams registered and only 3 teams submitted final submissions), this year 99 teams registered and 39 teams submitted final submissions, which represents great progress for the Chinese QA community. In the future, we plan to provide more QA datasets and call for new QA tasks for Chinese. Besides, we plan to extend the QA tasks from Chinese to English as well.
References
Wang, Y., Berant, J., Liang, P.: Building a semantic parser overnight. In: ACL (2015)
Pasupat, P., Liang, P.: Compositional semantic parsing on semi-structured tables. In: ACL (2015)
Pasupat, P., Liang, P.: Zero-shot entity extraction from web pages. In: ACL (2014)
Bao, J., Duan, N., Zhou, M., Zhao, T.: Knowledge-based question answering as machine translation. In: ACL (2014)
Yang, M.-C., Duan, N., Zhou, M., Rim, H.-C.: Joint relational embeddings for knowledge-based question answering. In: EMNLP (2014)
Berant, J., Chou, A., Frostig, R., Liang, P.: Semantic parsing on freebase from question-answer pairs. In: EMNLP (2013)
Kwiatkowski, T., Choi, E., Artzi, Y., Zettlemoyer, L.: Scaling semantic parsers with on-the-fly ontology matching. In: EMNLP (2013)
Bordes, A., Usunier, N., Chopra, S., Weston, J.: Large-scale simple question answering with memory networks. In: ICLR (2015)
Weston, J., Bordes, A., Chopra, S., Mikolov, T.: Towards AI-complete question answering: a set of prerequisite toy tasks, arXiv (2015)
Dong, L., Wei, F., Zhou, M., Xu, K.: Question answering over freebase with multi-column convolutional neural networks. In: ACL (2015)
Yih, W.-T., Chang, M.-W., He, X., Gao, J.: Semantic parsing via staged query graph generation: question answering with knowledge base. In: ACL (2015)
Yao, X.: Lean question answering over freebase from scratch. In: NAACL (2015)
Berant, J., Liang, P.: Semantic parsing via paraphrasing. In: ACL (2014)
Yao, X., Van Durme, B.: Information extraction over structured data: question answering with freebase. In: ACL (2014)
Bordes, A., Weston, J., Chopra, S.: Question answering with subgraph embeddings. In: EMNLP (2014)
Bordes, A., Weston, J., Usunier, N.: Open question answering with weakly supervised embedding models. In: ECML-PKDD (2014)
Yang, Y., Yih, W.-T., Meek, C.: WikiQA: a challenge dataset for open-domain question answering. In: EMNLP (2015)
Miao, Y., Yu, L., Blunsom, P.: Neural variational inference for text processing, arXiv (2015)
Wang, D., Nyberg, E.: A long short term memory model for answer sentence selection in question answering. In: ACL (2015)
Yin, W., Schütze, H., Xiang, B., Zhou, B.: ABCNN: attention-based convolutional neural network for modeling sentence pairs. In: ACL (2016)
Yu, L., Hermann, K.M., Blunsom, P., Pulman, S.: Deep learning for answer sentence selection. In: NIPS Workshop (2014)
Yan, Z., Duan, N., Bao, J., Chen, P., Zhou, M., Li, Z., Zhou, J.: DocChat: an information retrieval approach for chatbot engines using unstructured documents. In: ACL (2016)
© 2016 Springer International Publishing AG
Duan, N. (2016). Overview of the NLPCC-ICCPOL 2016 Shared Task: Open Domain Chinese Question Answering. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL NLPCC 2016 2016. Lecture Notes in Computer Science(), vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_89
Print ISBN: 978-3-319-50495-7
Online ISBN: 978-3-319-50496-4