
1 Introduction

As an increasing number of people share and obtain information from social media, it has become an important real-time information source. The unprecedented volume and variety of user-generated content, together with the user interaction network, constitute new opportunities for understanding social behavior and building socially intelligent systems. It is therefore both important and challenging to teach a machine to automatically understand the content presented in social media.

Although considerable progress has been made in Machine Reading Comprehension (MRC), most previous works focus on other domains, such as news [1, 5], stories [11, 16], and Wikipedia [17, 27]; very few works address social media MRC. Table 1 shows an example of social media comprehension. Unlike other domains, in the social media domain one normally posts a message on the assumption that readers have specific background knowledge. Such messages are generally short and contain limited contextual information, as shown in the table. Thus, it is difficult for a machine to understand them thoroughly based on the text alone. In this example, looking only at the message, a machine reader without background knowledge would be puzzled about its topic and could not answer the question.

Table 1. An example of machine reading comprehension in social media domain.

To obtain background knowledge for a machine reader, one feasible method is to introduce external knowledge from knowledge bases such as ConceptNet [10] and WordNet [4], as previous works do in other domains [9, 22, 24]. Unfortunately, due to the informal and diverse nature of social media messages, some key phrases of these short messages cannot be found in those pre-constructed knowledge bases. For example, in Table 1, the token fantasticfour, which indicates the topic of the message, appears in neither ConceptNet nor WordNet.

By studying social media messages, we find that a notable characteristic of them is clustering. That is, on a social media platform, a group of people tend to express their opinions or report news around one topic. Specifically, such topic-relevant messages are commonly clustered by a hashtag, which is marked with the “#” symbol (e.g., #fantasticfour in Table 1) and is ubiquitous in the social media domain. Thus, given a social media message, we can find a group of relevant messages based on its hashtag. As shown in Table 2, a series of topic-relevant messages is clustered by the hashtag “#fantasticfour”. Through those messages, we learn that the topic is a science fiction film, from Marvel, about some superheroes, and so on. These hashtag-clustered messages tend to share a common topic and can be considered a knowledge source on the topic of the given message. To this end, we propose a novel method that obtains and utilizes topic knowledge from hashtag-clustered messages to address the lack of background knowledge in social media comprehension.

Table 2. An example of hashtag-clustered messages in social media.

Given a message and a question, we extract the hashtag from the message and retrieve relevant messages based on the hashtag. Subsequently, we refine topic knowledge from the retrieved messages. Moreover, we construct a neural network, dubbed the Topic Knowledge Reader (TKR). The refined knowledge is fused into the TKR model and contributes to the process of reading comprehension and question answering. We conduct experiments on the TweetQA dataset [26], and the results show the effectiveness of our method.

To summarize, the major contributions of this paper are as follows:

  • In the task of machine reading comprehension, we investigate the problem of lacking background knowledge in the social media domain. We propose to exploit the clustering nature of social media to obtain knowledge from other relevant messages.

  • We propose a dedicated knowledge acquisition approach, which retrieves and refines topic knowledge from relevant messages clustered by the hashtag, a marker that appears pervasively in social media messages.

  • We build a machine reading comprehension model, TKR, to utilize the refined knowledge in a targeted manner, and we conduct experiments on a public dataset that demonstrate the effectiveness of our method.

2 Related Work

Social Media NLP: Over the past few years, social media has revolutionized the way we communicate. Massive amounts of text are continuously generated by users, which creates enormous challenges for the NLP community in analyzing and understanding that text automatically. In recent years, several NLP techniques and datasets for processing social media text have been proposed. Dos Santos and Gatti [3] use a deep convolutional neural network that exploits information from the character level to the sentence level to perform sentiment analysis of short texts. Vo and Zhang [21] split the context and employ distributed word representations and neural pooling functions to extract features from tweets. Zhou and Chen [29] propose a graphical model, named location-time constrained topic (LTT), to capture the content, time, and location of social messages. Singh et al. [18] develop an event classification and location prediction system that uses a Markov model for location inference. Qian et al. [13] jointly discover subevents from microblogs of multiple media types (user, text, and image) and design a multimedia event summarization process.

Machine Reading Comprehension: Owing to the fast development of deep learning techniques and large-scale datasets, Machine Reading Comprehension (MRC) has gained increasingly wide attention over the past few years. Richardson et al. [15] built the multiple-choice dataset MCTest, which encouraged early research on machine reading comprehension and inspired a strand of MRC models [11, 16]. Hermann et al. [5] propose CNN & Daily Mail, a cloze-test dataset that is large-scale and better suited than MCTest to deep learning methods; based on it, they propose an attention-based LSTM [6] model named Attentive Reader. Moreover, Rajpurkar et al. [14] released the span extraction dataset SQuAD, which has become the most popular MRC dataset in recent years and inspired many classic MRC models, such as BiDAF [17] and R-Net [25]. In addition, the multi-hop MRC dataset HotpotQA [27] has recently gained wide attention; it addresses question answering that requires combining multiple clues.

3 Method

Fig. 1. The framework of our proposed method. The left half is the process of obtaining topic knowledge. The right half is the reading comprehension model, Topic Knowledge Reader (TKR).

Figure 1 shows the framework of our method. Although the figure uses tweets as the example, the method applies equally to messages from other social media platforms. Given a tweet and a question, we obtain the answer by the following steps. First, we extract the hashtag from the tweet; meanwhile, we encode the tweet and the question with the BERT encoder. Second, we retrieve relevant tweets that contain the same hashtag. Third, we refine the topic knowledge from the retrieved tweets. Next, the knowledge is encoded and fused with the BERT representation. Finally, the model predicts the answer based on the knowledge-aware representation of the tweet and question.

3.1 Knowledge Acquisition

We regard the set of tweets clustered by a hashtag as the source of knowledge and obtain topic knowledge from them. We first retrieve relevant tweets and then gather common concepts from those tweets. Meanwhile, we maintain a hashtag pool to score each concept. Finally, we refine the concepts by selecting the top-k scored ones as the topic knowledge.

Retrieving Relevant Tweets. Given a tweet text T, we first extract its hashtag H. Next, we use H as the query and retrieve relevant tweets, obtaining a set S of tweets that contain the same hashtag. The tweets in S tend to share the topic of the given tweet T, and their information helps to understand T comprehensively. We then remove the non-English tweets from S and delete abnormal strings from the remaining text, such as URLs starting with “http” and picture references starting with “pic”.
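
To make this concrete, the following is a minimal sketch of the post-retrieval cleaning, assuming simple regular expressions for URLs and picture references; the is_english filter is a hypothetical stand-in, as the paper does not specify its language detector.

```python
import re

def is_english(text):
    # Crude placeholder heuristic: keep tweets that are mostly ASCII.
    return sum(c.isascii() for c in text) / max(len(text), 1) > 0.9

def clean_tweets(tweets):
    """Drop non-English tweets and strip abnormal strings from the rest."""
    cleaned = []
    for text in tweets:
        # Remove URLs starting with "http" and picture references
        # starting with "pic" (e.g., pic.twitter.com/...).
        text = re.sub(r"http\S+|pic\S+", "", text).strip()
        if text and is_english(text):
            cleaned.append(text)
    return cleaned
```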

As an exception, for tweets that contain no hashtag, we use a hashtag extractor to extract hashtag words from the tweet. The extractor is composed of a BERT encoder and a span pointer. We input the tweet T to the BERT model and obtain the representation \(P = \left\{ p_0, p_1, ..., p_n\right\} \), where \(p_i\) is the representation of the i-th word of the tweet. Then we extract the hashtag from the tweet by a pointer:

$$\begin{aligned} \begin{aligned} Start_i = \frac{exp(w_0^Tp_i)}{\sum _{j}{exp(w_0^Tp_{j})}} \qquad End_i = \frac{exp(w_1^Tp_i)}{\sum _{j}{exp(w_1^Tp_{j})}} \end{aligned} \end{aligned}$$
(1)

where \(w_0\) and \(w_1\) are trainable vectors. The pointer gives the probability of each word being the start and the end of the hashtag, respectively. We score a span by multiplying these two probabilities and take the span with the maximum score as the hashtag. Figure 2 shows an example. We train the extractor on the hashtag extraction dataset proposed by Zhang et al. [28]; evaluated on its test set, our extractor achieves 85.1% accuracy.
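
A minimal sketch of this pointer decoding, given the BERT word representations; the cap on span length (max_len) is an assumption not stated in the paper.

```python
import torch

def best_hashtag_span(p, w0, w1, max_len=5):
    """Eq. (1): softmax start/end distributions over word representations
    p (n x h), then return the span maximizing Start_i * End_j."""
    start = torch.softmax(p @ w0, dim=0)  # Start_i for each word
    end = torch.softmax(p @ w1, dim=0)    # End_i for each word
    best, best_span = -1.0, (0, 0)
    for i in range(p.size(0)):
        for j in range(i, min(i + max_len, p.size(0))):
            score = (start[i] * end[j]).item()
            if score > best:
                best, best_span = score, (i, j)
    return best_span

# Example with random stand-ins for the BERT outputs and pointer vectors:
n, h = 12, 768
print(best_hashtag_span(torch.randn(n, h), torch.randn(h), torch.randn(h)))
```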

Fig. 2. An example of extracting a hashtag for tweets without one.

Gathering Relevant Concepts. Having retrieved the tweets sharing the same topic, we gather fine-grained knowledge, i.e., concepts connected to the topic. We tokenize every tweet text in S to obtain a set of tokens. Due to the informal nature of tweets, some tokens can contain multiple words, like the hashtag “#secretwars”; we therefore segment each token. After that, we obtain a set C of concepts (e.g., “movie”, “marvel”, and “secret wars”).
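
The paper does not name its segmenter; as one illustration, the off-the-shelf wordsegment package recovers multi-word concepts from hashtag-like tokens:

```python
import string
from wordsegment import load, segment  # pip install wordsegment

load()  # load the package's word statistics once

def gather_concepts(tweets):
    """Tokenize each retrieved tweet and segment every token, so that
    '#secretwars' yields 'secret wars' while plain words pass through."""
    concepts = []
    for text in tweets:
        for token in text.split():
            token = token.lower().strip(string.punctuation + "#@")
            if token:
                concepts.append(" ".join(segment(token)))
    return concepts

print(gather_concepts(["Excited for #secretwars from Marvel!"]))
# e.g. ['excited', 'for', 'secret wars', 'from', 'marvel']
```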

Maintaining the Hashtag Pool. To further refine the concepts, we maintain a hashtag pool. First, a large collection of recent tweets is gathered as the original corpus. From this corpus, we collect the hashtags, then find the relevant tweets and obtain the concept set C for each hashtag following the above-mentioned process. These hashtags and all of their relevant concepts are added to the empty hashtag pool as its initialization. When a new tweet T is given at application time, we update the hashtag pool by adding the hashtag of T and its relevant concepts C to the pool.
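
The pool itself can be sketched as a mapping from hashtags to concept sets that exposes the two counts later needed by Eq. (2); this is an illustrative data structure, not the authors' implementation.

```python
class HashtagPool:
    """Maps each hashtag to the set of concepts gathered from its tweets."""

    def __init__(self):
        self.pool = {}  # hashtag -> set of relevant concepts

    def add(self, hashtag, concepts):
        self.pool.setdefault(hashtag, set()).update(concepts)

    def num_hashtags(self):
        return len(self.pool)  # |P| in Eq. (2)

    def doc_freq(self, concept):
        # |p_i|: hashtags whose relevant concepts contain this concept.
        return sum(concept in cs for cs in self.pool.values())

pool = HashtagPool()
pool.add("#fantasticfour", {"movie", "marvel", "secret wars"})
pool.add("#secretwars", {"marvel", "comic"})
print(pool.num_hashtags(), pool.doc_freq("marvel"))  # 2 2
```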

Refining Topic Knowledge. Given the tweet T and the concepts C obtained by the above steps, we apply Term Frequency-Inverse Document Frequency (TF-IDF) to score each concept. The score of \(concept_i\) in C is calculated by:

$$\begin{aligned} score_i = \frac{n_i}{N}log\frac{|P|}{|p_i|+1} \end{aligned}$$
(2)

where \(n_i\) is the frequency of \(concept_i\) in C, N is the total count of concepts in C, and P denotes the hashtag pool. Thus, |P| is the total number of hashtags in the pool, and \(|p_i|\) is the number of hashtags in the pool whose relevant concepts contain \(concept_i\). Finally, the top-k scored concepts are selected as the topic knowledge K of the tweet T.
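
Given the pool, the scoring and selection reduce to a few lines; this sketch assumes the HashtagPool interface above and uses k = 8, the value chosen in Sect. 4.2.

```python
import math
from collections import Counter

def refine_topic_knowledge(concepts, pool, k=8):
    """Score every concept in C with Eq. (2) and keep the top-k."""
    counts = Counter(concepts)    # n_i for each concept
    total = sum(counts.values())  # N: total concept occurrences in C
    scores = {
        c: (n / total) * math.log(pool.num_hashtags() / (pool.doc_freq(c) + 1))
        for c, n in counts.items()
    }
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return [(c, scores[c]) for c in top]  # topic knowledge K with scores S
```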

3.2 Topic Knowledge Reader

Fig. 3. The detailed architecture of Topic Knowledge Reader (TKR).

As shown in Fig. 3, we propose a reading comprehension model, named Topic Knowledge Reader (TKR), to fuse the refined concepts and then answer the question. The inputs of the model are

  • the given tweet \(T = \left\{ t_{0}, t_{1}, ..., t_{n-1}\right\} \in \mathbb {R}^{n}\), where n is the number of words in the tweet, \(t_i\) is the i-th word in T.

  • the question \(Q = \left\{ q_{0}, q_{1}, ..., q_{m-1}\right\} \in \mathbb {R}^{m}\), where m is the number of words in the question, \(q_i\) is the i-th word in Q.

  • the concept knowledge \(K = \left\{ k_{00}, k_{01}, ..., k_{ij},..., k_{(l-1)x}\right\} \in \mathbb {R}^{y}\), where y is the total number of words across all concepts and \(k_{ij}\) refers to the j-th word of the i-th concept.

  • the concept score \(S = \left\{ s_{0}, s_{1}, ..., s_{l-1}\right\} \in \mathbb {R}^{l}\), where l is the number of concepts and \(s_i\) is the TF-IDF score of the i-th concept from Sect. 3.1.

The output of the model is the predicted answer.

Encoding Tweet and Question. We first concatenate the tweet T and the question Q. The combined passage is

$$\begin{aligned} \begin{aligned} D = \left\{ [CLS], t_{0}, t_{1}, ..., t_{n-1}, [SEP], q_{0}, q_{1},... q_{m-1}, [SEP]\right\} \end{aligned} \end{aligned}$$
(3)

where we add the special tokens “[CLS]” and “[SEP]”, following the input processing of Devlin et al. [2]. Then we employ BERT [2] to encode the tweet and the question together, obtaining the question-aware representation of the passage:

$$\begin{aligned} \begin{aligned} P^0 = BERT\left( D\right) \in \mathbb {R}^{(m+n+3) \times h} \end{aligned} \end{aligned}$$
(4)
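
As an illustration, Eqs. (3)-(4) map directly onto the HuggingFace Transformers API (an implementation choice assumed here; the paper only states that bert-base is used). Note that BERT tokenizes into subwords, so in practice the encoded length is counted in subword tokens rather than the m + n + 3 words of Eq. (3).

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

# Illustrative inputs in the spirit of Table 1 (hypothetical text):
tweet = "Can't wait to see it this weekend! #fantasticfour"
question = "Which movie is the user excited about?"

# The tokenizer inserts the [CLS] and two [SEP] tokens of Eq. (3).
inputs = tokenizer(tweet, question, return_tensors="pt")
with torch.no_grad():
    P0 = bert(**inputs).last_hidden_state  # Eq. (4): (1, seq_len, 768)
```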

Encoding Concepts. Before the fusion step, we encode the concepts. To obtain their original representation, we apply the BERT encoder to the concepts as well. Analogously, we add the special token “[CLS]” to the single sequence of knowledge words,

$$\begin{aligned} \begin{aligned} K = \left\{ [CLS], k_{00}, k_{01}, ..., k_{ij},..., k_{(l-1)x}\right\} \end{aligned} \end{aligned}$$
(5)

and then the pre-trained BERT model is applied to encode the concepts:

$$\begin{aligned} C^w = BERT\left( K\right) \in \mathbb {R}^{(y+1) \times h} \end{aligned}$$
(6)

where the rows of \(C^w\) corresponding to concept words are denoted \(c^0_{ij}\), the representation of the j-th word of the i-th concept.

Word Aggregation: As some concepts contain multiple words, we aggregate the words of each concept by mean pooling to obtain a single-vector representation, \(c^0_i\), for each concept:

$$\begin{aligned} c^0_i = \frac{1}{N_i}\sum _{j\in [0, N_i) }{c^0_{ij}} \qquad C^0 = \left\{ c^0_0, c^0_1, ..., c^0_i, ..., c^0_{l-1}\right\} \in \mathbb {R}^{l \times h} \end{aligned}$$
(7)

where \(N_i\) is the number of words in the i-th concept.

Self Attention: Although no sequential relation exists among these concepts, they are still interrelated. Thus, we use the self-attention mechanism to perform non-sequential context encoding on the concepts:

$$\begin{aligned} \begin{aligned} c^1_i = \sum _j{\alpha _{ij}c^0_j} \qquad \alpha _{ij} = \frac{exp(\sigma (W_qc^0_i)\cdot \sigma (W_kc^0_j))}{\sum _{j^\prime } {exp(\sigma (W_qc^0_i)\cdot \sigma (W_kc^0_{j^\prime }))}} \end{aligned} \end{aligned}$$
(8)

where \(\sigma \) is the activation function, and \(W_q\in \mathbb {R}^{h \times h}\) and \(W_k \in \mathbb {R}^{h \times h}\) are trainable matrices. Thus we have the self-aligned concepts \(C^1 = \left\{ c^1_0, c^1_1, ..., c^1_{l-1}\right\} \in \mathbb {R}^{l \times h}\).
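
A PyTorch sketch of the concept encoder covering Eqs. (7)-(8); since the paper only says "activation function", ReLU is assumed for \(\sigma \).

```python
import torch
import torch.nn as nn

class ConceptSelfAttention(nn.Module):
    """Mean-pool each concept's word vectors (Eq. 7), then apply the
    single-head self-attention of Eq. (8)."""

    def __init__(self, h=768):
        super().__init__()
        self.W_q = nn.Linear(h, h, bias=False)
        self.W_k = nn.Linear(h, h, bias=False)
        self.act = nn.ReLU()  # assumed choice for sigma

    def forward(self, word_reps, concept_lens):
        # Eq. (7): aggregate the words of each concept by mean pooling.
        pooled, offset = [], 0
        for n_i in concept_lens:
            pooled.append(word_reps[offset:offset + n_i].mean(dim=0))
            offset += n_i
        C0 = torch.stack(pooled)                # (l, h)
        # Eq. (8): alpha_ij from dot products of projected concepts.
        q, k = self.act(self.W_q(C0)), self.act(self.W_k(C0))
        alpha = torch.softmax(q @ k.T, dim=-1)  # (l, l)
        return alpha @ C0                       # C^1, shape (l, h)

enc = ConceptSelfAttention()
C1 = enc(torch.randn(10, 768), [2, 1, 3, 1, 1, 1, 1])  # 7 concepts, 10 words
```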

Score Scaling: We then scale the concepts by the scores \(S \in \mathbb {R}^{l}\) assigned in the knowledge refining step:

$$\begin{aligned} \begin{aligned} C^2 = S C^1 \in \mathbb {R}^{l \times h} \end{aligned} \end{aligned}$$
(9)

\(C^2\) denotes the final representation of the concepts; each concept vector is scaled by its own score, i.e., \(c^2_i = s_i c^1_i\).

Topic Knowledge Fusion. The concepts are fused into the passage by:

$$\begin{aligned} \begin{aligned} p^1_i = \sum _j{\beta _{ij}c^2_j} \qquad \beta _{ij} = \frac{exp(\sigma (W_pp^0_i)\cdot \sigma (W_cc^2_j))}{\sum _{j^\prime } {exp(\sigma (W_pp^0_i)\cdot \sigma (W_cc^2_{j^\prime }))}} \end{aligned} \end{aligned}$$
(10)

where \(\sigma \) is the activation function, and \(W_p\in \mathbb {R}^{h \times h}\) and \(W_c \in \mathbb {R}^{h \times h}\) are trainable matrices. Thus, we obtain the concepts-aware passage representation \(P^1 = \left\{ p^1_0, p^1_1, ..., p^1_{m+n+2}\right\} \in \mathbb {R}^{(m+n+3) \times h}\). A bidirectional LSTM then performs an additional sequential context encoding and aggregates the original question-aware passage representation \(P^0\) with the concepts-aware representation \(P^1\):

$$\begin{aligned} P^2 = BiLSTM([P^0;P^1])\in \mathbb {R}^{(m+n+3) \times h} \end{aligned}$$
(11)
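
A matching sketch of Eqs. (9)-(11); using h/2 LSTM units per direction (so that \(P^2\) keeps width h) and ReLU for \(\sigma \) are both assumptions.

```python
import torch
import torch.nn as nn

class TopicKnowledgeFusion(nn.Module):
    """Scale concepts by their scores (Eq. 9), attend from each passage
    position to the concepts (Eq. 10), then aggregate with a BiLSTM (Eq. 11)."""

    def __init__(self, h=768):
        super().__init__()
        self.W_p = nn.Linear(h, h, bias=False)
        self.W_c = nn.Linear(h, h, bias=False)
        self.act = nn.ReLU()
        self.bilstm = nn.LSTM(2 * h, h // 2, bidirectional=True,
                              batch_first=True)

    def forward(self, P0, C1, scores):
        C2 = scores.unsqueeze(-1) * C1           # Eq. (9): (l, h)
        q = self.act(self.W_p(P0))               # (m+n+3, h)
        k = self.act(self.W_c(C2))               # (l, h)
        beta = torch.softmax(q @ k.T, dim=-1)    # Eq. (10)
        P1 = beta @ C2                           # concepts-aware passage
        fused = torch.cat([P0, P1], dim=-1)      # [P^0; P^1]
        P2, _ = self.bilstm(fused.unsqueeze(0))  # Eq. (11)
        return P2.squeeze(0)                     # (m+n+3, h)

fusion = TopicKnowledgeFusion()
P2 = fusion(torch.randn(30, 768), torch.randn(8, 768), torch.rand(8))
```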

Prediction. We employ two linear layers to predict the start and end positions of the answer in the passage, respectively, and then normalize the prediction scores:

$$\begin{aligned} \begin{aligned} \tilde{Start_i} = \frac{exp(w_s^Tp^2_i)}{\sum _{j}{exp(w_s^Tp^2_{j})}} \qquad \tilde{End_i} = \frac{exp(w_e^Tp^2_i)}{\sum _{j}{exp(w_e^Tp^2_{j})}} \end{aligned} \end{aligned}$$
(12)

where \(w_s\in \mathbb {R}^{h}\) and \(w_e \in \mathbb {R}^{h}\) are trainable weight vectors. We use Negative Log-Likelihood (NLL) as the training loss. During evaluation, we score each span of the tweet by multiplying its start score and end score and select the text span with the maximum score as the answer.
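
The decoding rule amounts to an argmax over the upper-triangular matrix of start-times-end scores; a minimal sketch:

```python
import torch

def predict_answer_span(P2, w_s, w_e):
    """Eq. (12) plus evaluation-time decoding: normalize start/end scores,
    then pick the span (i, j) with i <= j maximizing their product."""
    start = torch.softmax(P2 @ w_s, dim=0)
    end = torch.softmax(P2 @ w_e, dim=0)
    # Zero out spans with end before start, then take the global argmax.
    span_scores = torch.triu(start.unsqueeze(1) * end.unsqueeze(0))
    flat = span_scores.argmax().item()
    return divmod(flat, span_scores.size(1))  # (start, end) positions

h = 768
print(predict_answer_span(torch.randn(30, h), torch.randn(h), torch.randn(h)))
```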

4 Experiment

4.1 TweetQA Dataset

We conduct experiments on the recently released social media MRC dataset, TweetQA. Each instance of the dataset is a triple consisting of a tweet text, a human-proposed question, and a list of human-annotated answers. The dataset comprises 10,692 training triples, 1,086 development triples, and 1,979 test triples. It is the first large-scale MRC dataset over social media data.

4.2 Implementation Details

Preprocessing: As we employ BERT [2] to encode the text, we tokenize the text with BERT's default tokenizer. Since answer spans are not labeled in the training set, we annotate an approximate answer span in each tweet by selecting the span that achieves the best F1 score against the annotated answer.
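
A sketch of this weak annotation using standard token-level F1; the cap on candidate span length is hypothetical.

```python
from collections import Counter

def f1(span_tokens, answer_tokens):
    """Token-level F1 between a candidate span and a gold answer."""
    overlap = sum((Counter(span_tokens) & Counter(answer_tokens)).values())
    if overlap == 0:
        return 0.0
    p, r = overlap / len(span_tokens), overlap / len(answer_tokens)
    return 2 * p * r / (p + r)

def annotate_span(tweet_tokens, answer_tokens, max_len=20):
    """Return the tweet span with the best F1 against the gold answer,
    along with its F1 (later used for the 0.6 training filter)."""
    best, best_span = 0.0, None
    for i in range(len(tweet_tokens)):
        for j in range(i + 1, min(i + max_len, len(tweet_tokens)) + 1):
            score = f1(tweet_tokens[i:j], answer_tokens)
            if score > best:
                best, best_span = score, (i, j)
    return best_span, best
```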

Knowledge Acquisition: To simulate the real-world scenario in which a social media MRC system operates, we use the training set as the original corpus for initializing the Hashtag Pool. During evaluation, we update the Hashtag Pool with the hashtags and relevant concepts from the development and test sets. Based on our experimental analysis, we select the top-8 scored concepts for each hashtag in the knowledge refining step.

Training: We select the instances containing a span whose F1 score is no less than 0.6 to train the model in a weakly supervised manner; as a result, 8,238 instances are used during training. We employ the Adam optimizer, set the learning rate to \(3\times 10^{-5}\), fine-tune the model for 3 epochs, and set the dropout rate of BERT to 0.1. We use the pre-trained bert-base model [2] (as opposed to bert-large); its hidden size is 768.

Evaluation: As the answer in TweetQA is not always a span of the given tweet, following Xiong et al. [26] we use natural language generation metrics to evaluate the models, namely BLEU-1, Meteor, and Rouge-L. The answers of the test set are not released, so we submit our predictions to the official TweetQA evaluation platform and receive the performance results.

4.3 Baselines

  • Query Matching: a simple IR baseline [7], adapted to the TweetQA task by Xiong et al. [26].

  • BiDAF: a popular neural MRC baseline [17], which extracts answers from the original tweet text.

  • Generative QA: an RNN-based generative model [19] that employs both copy and coverage mechanisms during generation.

  • BERT Extraction: a recently proposed pre-trained model [2]. Following [2], we construct a BERT-based answer extraction model by feeding the passage representation \(P^0\) from Eq. 4 directly into the prediction layer of Eq. 12.

  • BERT Generation: because some TweetQA answers are not spans of the tweet text, we build a BERT-based generative model. We use BERT as the encoder, as in Eq. 4. Following [20], we employ a pointer generator, which selects words from both the tweet and the vocabulary, to decode the answer. The generative model is trained on all instances of the training set.

  • Knowledge Concat: a simpler method of fusing the topic knowledge. The model directly concatenates the topic knowledge (i.e., the selected concepts) with the sequence of tweet and question before BERT encoding and finally, like TKR, conducts span prediction.

  • KAR: Knowledge Aided Reader (KAR) [23] is a recently proposed MRC model that utilizes knowledge from WordNet. The model conducts mutual attention and self-attention based on connections among the words of the question and the passage, where the connections are built from WordNet knowledge. For a fair comparison, we modify the model to use BERT as the basic encoder instead of the original embedding layers composed of GloVe [12], CNN [8], and LSTM.

4.4 Main Results

Table 3. The results on TweetQA dataset. Extract-UB denotes the upper bound of extractive methods.

As shown in Table 3, our model, TKR, surpasses the recently proposed Knowledge Aided Reader (KAR) and achieves competitive performance. In our view, due to the limited coverage of the pre-constructed knowledge base (WordNet), KAR suffers from sparse knowledge extraction for the diverse and informal expressions of the social media domain. Moreover, TKR significantly outperforms all of the other baselines, especially the BERT-based model BERT Extraction. BERT Extraction is exactly the architecture that remains when the topic knowledge is ablated from TKR, so the comparison between the two directly demonstrates the benefit of acquiring and utilizing topic knowledge for social media comprehension.

Moreover, Knowledge Concat performs better than BERT Extraction, which also validates the effectiveness of the knowledge. Comparing Knowledge Concat with TKR, we find that TKR performs better, because TKR integrates the refined knowledge into the MRC model in a more targeted manner.

Fig. 4. The performance of TKR with different numbers (k) of concepts on the development set of TweetQA.

4.5 Different Number of Concepts

To further verify the effectiveness of the topic knowledge, we study the relationship between the number of employed concepts and the performance of TKR. We select the top-k concepts during the refining step, with k ranging from 2 to 18, and then train and evaluate TKR at each setting of k. As shown in Fig. 4, as k increases from 2 to 18, the performance of TKR first rises rapidly until k reaches 8 and then drops slowly. The performance gain from \(k=2\) to \(k=8\) shows that our topic knowledge is effective, while the loss from \(k=8\) to \(k=18\) is caused by the noise introduced by the low-scored concepts.

Table 4. Sampled cases which show the effect of the different numbers of concepts. The concepts in blue are the Top-5 scored ones, and those in black are the 13–18th scored concepts.

To probe the effect of different numbers of concepts more intuitively, we sample and analyze cases from the development set. Table 4 shows one of them, where Question0 and Question1 are two questions proposed for the same tweet. In this example, we find that the top-5 concepts describe the topic comprehensively and build semantic connections between key concepts in the tweet, including panda, Mei Xiang, and National Zoo, which ultimately contributes to answering the questions. On the contrary, the 13th-18th scored concepts tend to deviate from the topic and, as noise-like information, can even harm the reading comprehension model, TKR, when introduced into it.

4.6 Ablation Study

Table 5. Ablation study on the development set. -score scale denotes TKR without the score scaling module; -self attn denotes TKR without self-attention; -word agg denotes TKR without word aggregation; -LSTM denotes TKR using a dense layer instead of the LSTM for knowledge fusion.

To study the effect of the key modules of TKR, we conduct ablation experiments on the development set. As shown in Table 5, all three knowledge encoding modules, namely score scaling, self-attention, and word aggregation, contribute to the overall performance. The results demonstrate that these modules, which are designed for the topic knowledge in a targeted manner, indeed help the model encode and absorb the knowledge. Furthermore, the performance of -LSTM falls slightly behind the original TKR, indicating that the sequential information captured by the additional context encoding benefits comprehension.

4.7 Extractive vs. Generative

Table 6. Sampled cases that show the difference between the generative model and the extractive one.

As shown in Table 3, compared with BERT Generation, the extractive models, including BERT Extraction and TKR, achieve better performance, even though some answers have no exactly matching substring in the tweet. Table 6 shows two sampled cases that illustrate the difference between the extractive and generative models. As shown in the table, the generative model performs better in some cases where the answer must be synthesized from the question and the tweet. Conversely, it lags behind the extractive model when the answer is an uninterrupted snippet of the tweet. However, after studying more cases, we find that even in many cases where the answer needs to be synthesized, the generative model fails to provide a qualified answer. We believe much more data is needed to train a qualified generative MRC model.

4.8 Weakly Supervised Training

To train the extractive model, TKR, we annotate the answer span in each tweet by the F1 score and train the model to locate the annotated span. As the annotated span may not be the true answer, this is a weakly supervised training process. As shown in Table 7, we study the relationship between the span score of the training data and the performance on the development set. As the table shows, by lowering the span score threshold, more training data is involved; meanwhile, the performance first rises until span score = 0.6 and then drops. That is, as different amounts of weakly supervised training data are introduced, span score = 0.6 is the point where the gap between the benefit of additional positive examples and the damage from noise is maximized.

Table 7. The performance on the development set with different scales of training data. \(Span Score \ge i\) denotes that the model is trained on instances containing a span whose F1 score is no less than i; data and proportion refer to the size and proportion of the selected training data.

5 Conclusion

In this paper, we focus on machine reading comprehension in the social media domain. We propose a novel method to address the lack of background knowledge in this task. Exploiting the clustering nature of social media, we retrieve and refine topic knowledge from relevant messages and then integrate the knowledge into an MRC model, TKR. Experimental results show that our proposed method outperforms recently proposed models and BERT-based baselines, which proves the method effective overall. By introducing different amounts of topic knowledge, we demonstrate the effectiveness of our refined knowledge. Moreover, the ablation study further validates the contribution of TKR's key modules for utilizing the knowledge.