
1 Background

Question Answering (QA) is a fundamental task in Artificial Intelligence, whose goal is to build a system that can automatically answer natural language questions. In the last decade, the development of QA techniques has been greatly promoted by both academia and industry.

In academia, with the rise of large-scale curated knowledge bases such as Yago, Satori, and Freebase, more and more researchers have turned their attention to the knowledge-based QA (KBQA) task, studying both semantic parsing-based approaches [1–7] and information retrieval-based approaches [8–16]. Besides KBQA, researchers are also interested in document-based QA (DBQA), whose goal is to select answers from a set of given documents and use them as responses to natural language questions. Information retrieval-based approaches [18–22] are usually used for the DBQA task.

In industry, many influential QA-related products have been built, such as IBM Watson, Apple Siri, Google Now, Facebook Graph Search, Microsoft Cortana, and XiaoIce. These systems are becoming part of the daily life of mobile device users.

Under these circumstances, in this year’s NLPCC-ICCPOL shared task, we organize an open domain QA task that covers both the KBQA and DBQA tasks. Our motivations are two-fold:

  1. We expect this activity to advance QA research, especially for Chinese;

  2. We encourage more QA researchers to share their experiences, techniques, and progress.

The remainder of this paper is organized as follows. Section 2 describes the two open domain Chinese QA tasks and the benchmark datasets constructed for them. Section 3 describes the evaluation metrics, and Sect. 4 presents the evaluation results of the different submissions. We conclude the paper in Sect. 5 and point out our plan for future QA evaluation activities.

2 Task Description

The NLPCC-ICCPOL 2016 open domain QA shared task includes two QA tasks for the Chinese language: a knowledge-based QA (KBQA) task and a document-based QA (DBQA) task.

2.1 KBQA Task

Given a question, a KBQA system built by each participating team should select one or more entities as answers from a given knowledge base (KB). The datasets for this task include:

  • A Chinese KB. It includes knowledge triples crawled from the web. Each knowledge triple has the form <Subject, Predicate, Object>, where ‘Subject’ denotes a subject entity, ‘Predicate’ denotes a relation, and ‘Object’ denotes an object entity. A sample of knowledge triples is given in Fig. 1, and the statistics of the Chinese KB are given in Table 1.

    Fig. 1. An example of the Chinese KB.

    Table 1. Statistics of the Chinese KB.
  • A training set and a testing set. We assign a set of knowledge triples sampled from the Chinese KB to human annotators. For each knowledge triple, a human annotator writes down a natural language question whose answer is the object entity of that triple. The statistics of the labeled QA pairs and an annotation example are given in Table 2:

    Table 2. Statistics of the KBQA datasets.

In the KBQA task, any data resource can be used to train the necessary models (e.g., entity linking and semantic parsing models), but the answer entities must come from the provided KB only.
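As an illustration only, the following sketch shows one way the released triples could be loaded into a simple (subject, predicate) → objects index. The function name and the tab delimiter are our own assumptions (the actual file format is the one shown in Fig. 1), and this is not part of the official tooling.

```python
from collections import defaultdict

def load_kb(path, sep="\t"):
    """Index KB triples as (subject, predicate) -> set of object entities.

    The tab delimiter is an assumption for illustration; in practice, use the
    delimiter of the released KB file (see Fig. 1).
    """
    kb = defaultdict(set)
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(sep)
            if len(parts) != 3:
                continue  # skip lines that are not well-formed triples
            subject, predicate, obj = parts
            kb[(subject, predicate)].add(obj)
    return kb
```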

2.2 DBQA Task

Given a question and its corresponding document, a DBQA system built by each participating team should select one or more sentences as answers from the document. The datasets for this task include:

  • A training set and a testing set. We assign a set of documents to human annotators. For each document, a human annotator (1) first selects a sentence from the document, and (2) then writes down a natural language question whose answer is the selected sentence. The statistics of the labeled QA pairs and an annotation example are given in Table 3:

    Table 3. Statistics of the DBQA datasets.

As shown in the example in Table 3, each line provides a question (the 1st column), one of the question’s corresponding document sentences (the 2nd column), and its answer annotation (the 3rd column). If the document sentence is a correct answer to the question, its annotation is 1; otherwise, its annotation is 0. The three columns are separated by the tab character ‘\t’.
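For illustration, a labeled DBQA file in this format could be loaded as follows; the function name and the in-memory representation are our own choices, not part of the released data or tooling.

```python
def read_dbqa_file(path):
    """Read tab-separated (question, document sentence, 0/1 label) lines."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            question, sentence, label = line.split("\t")
            examples.append((question, sentence, int(label)))
    return examples
```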

In the DBQA task, any data resource can be used to train the necessary models (e.g., paraphrasing and sentence matching models), but the answer sentences must come from the provided documents only.

3 Evaluation Metrics

The quality of a KBQA system is evaluated by Averaged F1, and the quality of a DBQA system is evaluated by MRR, MAP, and ACC@1.

  • Averaged F1

$$ Averaged\ F1 = \frac{1}{|Q|}\sum_{i=1}^{|Q|} F_{i} $$

\( F_{i} \) denotes the F1 score for question \( Q_{i} \), computed based on the system-generated answer set \( C_{i} \) and the golden answer set \( A_{i} \). \( F_{i} \) is set to 0 if \( C_{i} \) is empty or does not overlap with \( A_{i} \). Otherwise, \( F_{i} \) is computed as follows:

$$ F_{i} = \frac{2 \cdot \frac{\#(C_{i}, A_{i})}{|C_{i}|} \cdot \frac{\#(C_{i}, A_{i})}{|A_{i}|}}{\frac{\#(C_{i}, A_{i})}{|C_{i}|} + \frac{\#(C_{i}, A_{i})}{|A_{i}|}} $$

where \( \#(C_{i}, A_{i}) \) denotes the number of answers that occur in both \( C_{i} \) and \( A_{i} \), and \( |C_{i}| \) and \( |A_{i}| \) denote the number of answers in \( C_{i} \) and \( A_{i} \), respectively.
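As a minimal sketch (not the official evaluation script), Averaged F1 could be computed from per-question answer sets as follows; the function and variable names are illustrative.

```python
def averaged_f1(system_answers, gold_answers):
    """Averaged F1: system_answers[i] is C_i, gold_answers[i] is A_i."""
    total = 0.0
    for c, a in zip(system_answers, gold_answers):
        overlap = len(set(c) & set(a))      # #(C_i, A_i)
        if len(c) == 0 or overlap == 0:
            continue                        # F_i = 0 in these cases
        precision = overlap / len(c)        # #(C_i, A_i) / |C_i|
        recall = overlap / len(a)           # #(C_i, A_i) / |A_i|
        total += 2 * precision * recall / (precision + recall)
    return total / len(gold_answers)
```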

  • MRR

$$ MRR = \frac{1}{|Q|}\sum_{i=1}^{|Q|} \frac{1}{rank_{i}} $$

\( |Q| \) denotes the total number of questions in the evaluation set, and \( rank_{i} \) denotes the position of the first correct answer in the generated answer set \( C_{i} \) for the \( i^{th} \) question \( Q_{i} \). If \( C_{i} \) does not overlap with the golden answer set \( A_{i} \) for \( Q_{i} \), \( \frac{1}{rank_{i}} \) is set to 0.
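A corresponding sketch for MRR, under the same illustrative conventions:

```python
def mean_reciprocal_rank(ranked_answers, gold_answers):
    """MRR: ranked_answers[i] is the ranked list C_i, gold_answers[i] is A_i."""
    total = 0.0
    for c, a in zip(ranked_answers, gold_answers):
        gold = set(a)
        for rank, candidate in enumerate(c, start=1):
            if candidate in gold:
                total += 1.0 / rank   # 1 / rank_i of the first correct answer
                break
        # questions whose C_i does not overlap with A_i contribute 0
    return total / len(gold_answers)
```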

  • MAP

$$ MAP = \frac{1}{|Q|}\sum_{i=1}^{|Q|} AveP(C_{i}, A_{i}) $$

\( AveP(C, A) = \frac{\sum_{k=1}^{n} P(k) \cdot rel(k)}{\min(m, n)} \) denotes the average precision, where \( k \) is the rank in the sequence of retrieved answer sentences, \( m \) is the number of correct answer sentences, and \( n \) is the number of retrieved answer sentences. If \( \min(m, n) \) is 0, \( AveP(C, A) \) is set to 0. \( P(k) \) is the precision at cut-off \( k \) in the list, and \( rel(k) \) is an indicator function equaling 1 if the item at rank \( k \) is an answer sentence, and 0 otherwise.
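A sketch of MAP with AveP normalized by min(m, n), as defined above:

```python
def mean_average_precision(ranked_answers, gold_answers):
    """MAP: ranked_answers[i] is the ranked list C_i, gold_answers[i] is A_i."""
    total = 0.0
    for c, a in zip(ranked_answers, gold_answers):
        gold = set(a)
        m, n = len(gold), len(c)
        if min(m, n) == 0:
            continue                  # AveP is defined as 0 in this case
        hits, avep = 0, 0.0
        for k, candidate in enumerate(c, start=1):
            if candidate in gold:     # rel(k) = 1
                hits += 1
                avep += hits / k      # P(k), precision at cut-off k
        total += avep / min(m, n)
    return total / len(gold_answers)
```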

  • ACC@N

$$ Accuracy@N = \frac{1}{|Q|}\sum_{i=1}^{|Q|} \delta(C_{i}, A_{i}) $$

\( \delta(C_{i}, A_{i}) \) equals 1 when at least one answer in \( C_{i} \) occurs in \( A_{i} \), and 0 otherwise.
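And a sketch of ACC@N (ACC@1 is the value reported for the DBQA task):

```python
def accuracy_at_n(ranked_answers, gold_answers, n=1):
    """ACC@N: fraction of questions whose top-n candidates contain a golden answer."""
    hits = 0
    for c, a in zip(ranked_answers, gold_answers):
        if set(c[:n]) & set(a):       # delta(C_i, A_i) = 1
            hits += 1
    return hits / len(gold_answers)
```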

4 Evaluation Results

In total, 99 teams registered for the above two Chinese QA tasks, and 39 teams submitted their results. Tables 4 and 5 list the evaluation results of the KBQA and DBQA tasks, respectively.

Table 4. Evaluation results of the KBQA task.
Table 5. Evaluation results of the DBQA task.

5 Conclusion

This paper gives a brief overview of this year’s two open domain Chinese QA shared tasks. Compared to last year (19 teams registered and only 3 teams submitted final results), this year 99 teams registered and 39 teams submitted final results, which represents great progress for the Chinese QA community. In the future, we plan to provide more QA datasets and call for new QA tasks for Chinese. Besides, we plan to extend the QA tasks from Chinese to English as well.