
1 Introduction

Controversial topics, especially newly emerging ones, are widely discussed on Twitter, and people frequently use Twitter’s search tool to follow them. However, retrieved tweets that merely reflect tweeters’ opinions, or that simply support or oppose a controversial topic, are not informative enough. Argumentation is regarded as the most convincing structure; it is often used in law, persuasive essays, and debate, and has been researched for decades. Among the diverse definitions of argumentation [3, 4, 10, 16, 19], a widespread one is claim and evidence [12]. Because the texts are short, Christian and Iryna [16] point out that argumentation structure is rare or likely to be incomplete in social media. This means that some tweets may contain only claims, others only evidence, and others both. Specifically, the heart of every argumentation is a single claim, which is the assertion the argumentation aims to prove [5]. Moreover, evidence makes sense only once the claim is identified. To help users swiftly obtain prominent claims about the query topic, there is a pressing need for tools that can automatically retrieve claim-oriented tweets.

Table 1. Examples of tweets relevant to two topics, “abortion” and “animal testing”. “Y” means the tweet contains a claim and “N” means it does not.

Hence, given a topic, our task is to retrieve a list of claim-oriented tweets. We assume a claim-oriented tweet should meet three criteria: (1) the tweet is topic-related; (2) the tweet clearly supports or opposes the topic; (3) the tweet provides an arguable reason for its stance. For example, as shown in Table 1: T1 is a piece of news that contains no stance; T2 is clearly against the topic and contains an explicit disputable reason, “abortion is murder”; T3 is an objective fact that is not in dispute (it reads more like evidence); T4 merely takes an opposing stance without giving a reason; T5 contains an implicit claim, “animal testing used by cosmetics is cruel”. Consequently, T2 and T5 are the claim-oriented tweets that we need to retrieve.

Previous studies on predicting whether a document contains claims use supervised learning approaches [5, 15] and parse tree measures [6], while more recent work concentrates on neural networks [2]. Two major challenges render these approaches unsuitable for our task.

Chaotic Twitter. Tweets are short and often follow platform-specific conventions. For instance, the first sample in Table 1 contains hashtags, URLs, and a re-tweet marker (RT@), while its textual content is very short. Stripping these Twitter-specific conventions with NLP preprocessing leaves the semantics of the tweet incomplete. Therefore, these chaotic elements in Twitter remain an open challenge for standard claim detection approaches.

Vague Claim. In fact, the majority of online users do not really need to present a well-formed argumentation or proposition. As a consequence, claims made by users are often unclear, ambiguous, vague, or simply poorly worded [17]. For example, readers need the background knowledge that “cosmetics often use animal testing” to recognize that T5 in Table 1 contains the implicit claim “animal testing used by cosmetics is cruel”, which makes detection clearly challenging.

In this paper, we explore both Twitter structural information and claim-oriented information to address the above issues. Twitter structural information refers to hashtags, URLs, re-tweet markers (RT@), etc. Claim-oriented information denotes indicative words whose appearance suggests that the tweet is likely to contain a claim. First, we use a learning-to-rank framework to learn a ranking function that uses both Twitter structural information and topic-independent claim-related information, in addition to traditional topic-related information and stance information. We then improve performance by automatically generating topic-dependent claim-oriented lexicons and using them in a lexicon-based approach. Additionally, since the topic-dependent claim-oriented lexicon can be constructed from unlabeled topic-relevant tweets, our model can be easily adapted to new topics, which supports its practicality.

The contributions of this work can be summarized as follows:

  1. We define a novel claim-oriented tweet retrieval task and construct a real-world dataset for it.

  2. Our method integrates both topic-independent and topic-dependent claim-oriented information and achieves portability to all controversial topics.

  3. Experimental results show that the best performance of our ranking model is significantly better than the baselines.

2 Related Work

The task of automatic claim-oriented document detection was first introduced by Levy et al. [5], who used a supervised learning approach to detect context-dependent claims in Wikipedia articles. Lippi and Torroni [6] focused on the rhetorical structure of claims and relied on Partial Tree Kernels to generate the feature set. More recently, Roitman et al. [15] proposed a two-step approach to claim-oriented document retrieval, concentrating on retrieving as many relevant claims as possible from a Wikipedia corpus. Our experimental results show that features designed for claim-oriented document retrieval do not perform well on Twitter.

Our task is related to argument mining in Twitter and online forums [1, 11, 18, 20]. Theodosis et al. [18] did not distinguish between domain entities and claims, since they considered that claims are not expressed literally. However, in our view, tweets contain both explicit and implicit claims, and evidence makes sense only once the claim is identified. Other work often treated arguments as evidence. Addawood and Bashir [1] used a supervised classifier trained with different kinds of features to capture the evidence types in social media. In short, none of the work mentioned above concentrated on claim mining in Twitter.

Since we require a claim-oriented tweet to contain a clear stance, stance detection in Twitter is also important for our task. Saif et al. [9] proposed a state-of-the-art stance detection system using an SVM classifier along with distant supervision techniques. We use their features to measure whether tweets express a stance.

3 Methodology

To generate a good function which ranks the tweets according to our principle for finding claim-oriented tweets, we investigate the features concerning topic relevance, stance existence and arguable reason inclusion of a tweet. In general, we use a learning-to-rank framework to integrate topic-related feature, stance detection features, Twitter structural features and topic-independent claim-related features. To further elevate the retrieval performance, we use a topic-dependent claim-oriented lexicon to score whether each tweet contains arguable reasons.

3.1 Learning to Rank Method

Learning-to-rank is a data-driven approach that effectively incorporates a bag of features into the retrieval process. To obtain a general model for all kinds of controversial topics, we feed topic-independent features into a learning-to-rank framework. The remainder of this section focuses on these topic-independent features.
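As a rough illustration of how per-tweet feature vectors could be handed to a ranking learner, the sketch below serializes them into the SVM-light ranking input format (`<target> qid:<id> <index>:<value> ...`), the tool named in Sect. 4.2. The function name `to_svmlight_rank` and the example feature values are hypothetical and not taken from the paper.

```python
def to_svmlight_rank(rows):
    """Serialize (label, qid, features) triples into SVM-light ranking lines.

    label    : 1 for a claim-oriented tweet, 0 otherwise (assumed encoding)
    qid      : integer id of the query topic
    features : list of floats; indices in the output start at 1
    """
    lines = []
    for label, qid, feats in rows:
        pairs = " ".join(f"{i}:{v:g}" for i, v in enumerate(feats, start=1))
        lines.append(f"{label} qid:{qid} {pairs}")
    return "\n".join(lines)


# Hypothetical example: one tweet for topic 1 with three feature values.
example = to_svmlight_rank([(1, 1, [12.5, 1.0, 0.0])])
```

Grouping rows by `qid` lets the learner compare only tweets retrieved for the same topic.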

Relevance Feature. We use the Okapi BM25 [14] to measure the relevance between topics and tweets.
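For readers unfamiliar with the relevance feature, the following is a minimal sketch of Okapi BM25 over tokenized tweets. The parameter values `k1=1.2` and `b=0.75` and the smoothed IDF variant are assumptions chosen for illustration; the paper does not state which settings were used.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    k1, b and the "+1" smoothed IDF are illustrative assumptions.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency of each query term.
    df = {q: sum(1 for d in docs if q in d) for q in set(query_terms)}
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for q in query_terms:
            if tf[q] == 0:
                continue
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1.0)
            norm = tf[q] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[q] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [["abortion", "is", "murder"], ["cats", "are", "cute"]]
scores = bm25_scores(["abortion"], docs)
```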

Stance Features. Since claim-oriented tweets need to express a clear stance toward the given controversial topic, we use TwitStan, the feature set of the state-of-the-art classifier proposed by Saif et al. [9] for the SemEval-2016 task on stance detection in Twitter. The features used in our method include n-grams, sentiment, target, POS, encodings, and word embeddings trained with GloVe [13] on a large collection of tweets from November 2015.

Twitter Structural Features. Compared with traditional media data, Twitter carries a good deal of platform-specific structural information, such as URLs and hashtags. Some of these elements have been shown to have a significant influence on Twitter retrieval [7, 8]. However, most argument mining work on Twitter removes them and treats tweets as plain text [1], which may lose information. To explore the relationship between Twitter structural information and claim-oriented tweets, we use these elements as binary features.

“RT @” indicates copying and rebroadcasting of the original tweet; we assume that persuasive tweets containing clear propositions are more likely to be rebroadcast. A URL indicates a link to outside content. We observe that advertisements and news, which are unlikely to contain a claim, often contain a URL. Inspired by the assumption that high-quality claims arise in debates or quarrels, we use a “reply” feature that describes whether the tweet is a comment or a reply.
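The binary structural features described above could be extracted along the following lines. This is a sketch under stated assumptions: the regular expressions, the `in_reply_to` argument (standing in for reply metadata from the Twitter API), and the rule that a leading "@" marks a reply are all illustrative choices, not the paper's implementation.

```python
import re


def structural_features(text, in_reply_to=None):
    """Return binary Twitter structural features for one tweet.

    in_reply_to is a hypothetical stand-in for reply metadata; when it is
    missing, a leading "@mention" is treated as a reply (an assumption).
    """
    return {
        "retweet": int(re.match(r"\s*RT\s*@", text) is not None),
        "url": int(re.search(r"https?://\S+", text) is not None),
        "hashtag": int(re.search(r"#\w+", text) is not None),
        "reply": int(in_reply_to is not None or text.lstrip().startswith("@")),
    }


news_like = structural_features("RT @user: abortion is murder http://t.co/x #prolife")
reply_like = structural_features("@mmfa Abortion is not a choice")
```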

Topic-Independent Claim-Related Features. Some claim-oriented tweets express arguable reasons explicitly, often following general patterns, for instance,

(1) @mmfa Abortion is not a choice, abortion is the killing of an innocent life

(2) RT @hailey stiegel: MAKING ABORTION ILLEGAL IS NOT GETTING RID OF ABORTION, IT IS GETTING RID OF SAFE ABORTION

The pattern “A is not B, it is C” appears in these explicit claim-oriented tweets. To capture such claim-oriented patterns, which involve forms of “be” and modal verbs, we use an information-gain-based method to calculate the claim score of each word.

Table 2. Table for information gain. \(C_{1*}=C_{11}+C_{12}\); \(C_{2*}=C_{21}+C_{22}\); \(C_{*1}=C_{11}+C_{21}\); \(C_{*2}=C_{12}+C_{22}\); \(C=C_{11}+C_{12}+C_{21}+C_{22}\).

\(C_{ij}\) in Table 2 indicates the number of tweets in the claim-oriented (\(i=1\)) or non-claim (\(i=2\)) set that contain (\(j=1\)) or do not contain (\(j=2\)) term t. For example, \(C_{11}\) is the number of claim-oriented tweets which contain term t. We then define some concepts: H(X) is the entropy of X. For each topic, the total claim entropy is \(H(C)= -\sum _{i=1}^2 p_{i*}\log _{2} p_{i*}\), where \(p_{i*} =\frac{C_{i*}}{C}\) is the probability of class i. For each term t, we compute the conditional entropy of the claim label given term t, H(C|t), as follows:

$$\begin{aligned} H(C|t)= -p_{t}\sum _{i=1}^2p(C_{i}|t)\log _{2}p(C_{i}|t) - p_{(\lnot t)}\sum _{i=1}^2p(C_{i}|\lnot t)\log _{2}p(C_{i}|\lnot t) \end{aligned}$$
(1)

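Following Table 2, \(p_{t}=C_{*1}/C\) and \(p_{(\lnot t)}=C_{*2}/C\) are the fractions of tweets with and without term t, with \(p(C_{i}|t)=C_{i1}/C_{*1}\) and \(p(C_{i}|\lnot t)=C_{i2}/C_{*2}\). The information gain of term t about the claim label is then \(IG(C,t)= H(C)-H(C|t)\).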
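The information gain can be computed directly from the four cells of Table 2. The sketch below assumes the reading that rows index the claim label and columns index term presence, so that `c11` counts claim-oriented tweets containing t; the function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(c11, c12, c21, c22):
    """IG(C, t) = H(C) - H(C|t) from the Table 2 counts.

    c11: claim tweets with t      c12: claim tweets without t
    c21: non-claim tweets with t  c22: non-claim tweets without t
    """
    total = c11 + c12 + c21 + c22
    h_c = entropy([(c11 + c12) / total, (c21 + c22) / total])
    h_c_given_t = 0.0
    # First pair: tweets containing t; second pair: tweets not containing t.
    for claim, non_claim in ((c11, c21), (c12, c22)):
        column = claim + non_claim
        if column == 0:
            continue
        h_c_given_t += (column / total) * entropy([claim / column, non_claim / column])
    return h_c - h_c_given_t


perfect = information_gain(10, 0, 0, 10)    # t appears exactly in the claim tweets
useless = information_gain(5, 5, 5, 5)      # t is independent of the claim label
```

A term that perfectly separates claim from non-claim tweets recovers the full entropy H(C), while a term independent of the label has zero gain.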

The number of claim-oriented tweets varies across topics. For example, 40 tweets contain claims on the topic “abortion”, but only 2 on the topic “Trump”. Therefore, tweets about “abortion” are more likely to contain claims. If term scores were calculated without considering the topic, uninformative topic words such as “abortion” and “woman” (high-frequency words on the topic “abortion”) would score higher and be mistaken for claim-oriented words. To avoid this, term scores are calculated separately for each topic. For each term t, we use \(H(t|K)= \sum _{i=1}^n p_{k_{i}}H(t|K=k_{i})\) to represent t’s distribution under the topic set K.

If term t is a topic-independent claim indicator, it should be evenly distributed across topics, which increases H(t|K). Therefore, t’s score \(Claim_{TI}(t)\), which indicates claim relatedness, is calculated as follows:

$$\begin{aligned} Claim_{TI}(t)= \sum _{k\in K}\frac{IG_{k}(C,t)\cdot H(t|K)}{TN_{k}} \end{aligned}$$
(2)

where \(TN_{k}\) is the number of tweets about topic k and \(IG_{k}(C,t)\) is the information gain computed on topic k. The highest-scoring terms are selected to form the Topic-Independent Claim-Related Lexicon TICRLex and are used as topic-independent claim-related features.
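Equation (2) can be sketched as follows for a single term. The text does not spell out how \(H(t|K=k)\) is computed, so this example assumes it is the binary entropy of the term's presence rate within topic k, and that \(p_{k}\) is topic k's share of all tweets; both are assumptions, as is the input layout.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a present/absent variable with presence rate p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def claim_ti(per_topic):
    """Claim_TI(t) for one term.

    per_topic maps topic -> (ig, n_with_term, n_tweets), where ig is IG_k(C, t)
    and n_tweets is TN_k. H(t|K) uses the assumed reading described above.
    """
    total = sum(n for _, _, n in per_topic.values())
    h_t_given_k = sum(
        (n / total) * binary_entropy(n_t / n) for _, n_t, n in per_topic.values()
    )
    return sum(ig * h_t_given_k / n for ig, _, n in per_topic.values())


# Hypothetical counts: a term spread evenly over two topics versus one
# that occurs in a single topic only.
even = claim_ti({"abortion": (1.0, 10, 20), "death penalty": (1.0, 10, 20)})
concentrated = claim_ti({"abortion": (1.0, 10, 20), "death penalty": (0.0, 0, 20)})
```

Under these assumptions the evenly spread term scores higher, which matches the intuition stated before Eq. (2).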

3.2 Lexicon Method

Some arguable reasons in claim-oriented tweets are expressed implicitly. For instance, consider two tweets on the topic “death penalty”:

(1) @mmellmmar because death penalty treats you better if you are rich and guilty than if you are poor and innocent..

(2) Death penalty should not exist, esp because it is against those who are poor.#deathpenalty

Both express the claim that “the death penalty for the poor and the rich is different”, which requires background knowledge to identify. We find that such implicit claim-oriented tweets often contain topic-dependent words, like “poor” and “rich” for the topic “death penalty”. To capture these words, we develop an approach that automatically generates topic-dependent claim-oriented lexicons from unlabeled topic-related tweets. Additionally, since it is impossible to train a supervised model for every topic, we use the topic-dependent claim-oriented lexicons in a lexicon-based method. We estimate the claim-oriented score of each tweet by averaging the claim-oriented scores of certain terms.
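The tweet-level scoring step can be illustrated with a small sketch. The text does not specify which terms are averaged, so this example assumes the average is taken over the tweet's tokens that appear in the lexicon; the lexicon scores shown are made-up values.

```python
def tweet_claim_score(tokens, lexicon):
    """Average lexicon score over the tweet's tokens found in the lexicon.

    Averaging only over matched tokens is an assumption; tweets with no
    lexicon term receive a score of 0.0.
    """
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


# Hypothetical topic-dependent lexicon for "death penalty".
lexicon = {"poor": 0.4, "rich": 0.2}
score = tweet_claim_score(["death", "penalty", "poor", "rich"], lexicon)
```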

Topic-Dependent Claim-Oriented Lexicon. We suppose that if term t often co-occurs with topic-independent claim-oriented words, then t is likely to be a claim-oriented word. In the two examples above, suppose that “because” is a topic-independent claim-oriented word. The term “poor” appears with “because” in both tweets. Since we assume that topic-dependent and topic-independent claim-oriented words often appear together, “poor” can be seen as a claim-oriented word for the topic “death penalty”.

First, suppose we have already obtained the topic-independent claim-related lexicon TICRLex. To pick out the claim-oriented terms in this lexicon, we introduce a sign function Sgn(t) for each term t:

$$\begin{aligned} Sgn(t)=\left\{ \begin{array}{rcl} -1 &{} &{} {\frac{C_{11}}{C_{*1}} \le \frac{C_{1*}}{C}}\\ 1 &{} &{} {\frac{C_{11}}{C_{*1}} > \frac{C_{1*}}{C}} \end{array} \right. \end{aligned}$$
(3)

\(Claim_{TI}(t)\) is term t’s claim score in TICRLex. We then compute the signed score \(Claim_{TI}(t)^{+}= Claim_{TI}(t)\cdot Sgn(t)\) for each term t in TICRLex. If \(Claim_{TI}(t)^{+} > 0\), term t is positively related to claims, and we add t to a new lexicon called posLex.
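The sign test of Eq. (3) and the construction of posLex can be sketched as below. The counts follow the Table 2 layout (`c11` is the number of claim-oriented tweets containing t); the function names, the example scores, and the choice to return -1 for a term that never occurs are assumptions for illustration.

```python
def sgn(c11, c12, c21, c22):
    """Return 1 if tweets containing t are claims more often than average, else -1."""
    total = c11 + c12 + c21 + c22
    with_term = c11 + c21
    if with_term == 0:
        return -1  # assumption: an unseen term is not treated as a positive indicator
    return 1 if c11 / with_term > (c11 + c12) / total else -1


def build_pos_lex(claim_ti_scores, tables):
    """Keep the TICRLex terms whose signed score Claim_TI(t)^+ is positive."""
    signed = {t: s * sgn(*tables[t]) for t, s in claim_ti_scores.items()}
    return {t: s for t, s in signed.items() if s > 0}


# Hypothetical scores and contingency counts for two lexicon terms.
scores = {"because": 0.1, "lol": 0.08}
tables = {"because": (8, 2, 2, 8), "lol": (1, 9, 6, 4)}
pos_lex = build_pos_lex(scores, tables)
```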

\(CoT(w_{i},t)\) represents the co-occurrence frequency of term t in topic-related tweet set TS with the term \(w_{i}\) in posLex. \(TN_{t}\) is the number of tweets containing term t. t’s topic-dependent claim-oriented score \(Claim_{TD}(t)\) is then defined as the weighted sum of \(CoT(w_{i},t)\):

$$\begin{aligned} Claim_{TD}(t)=\sum _{w_{i}\in {{\varvec{posLex}}}} \frac{Claim_{TI}(w_{i})^{+}\cdot CoT(w_{i},t)}{TN_{t}} \end{aligned}$$
(4)

The highest-scoring terms are selected to form the Topic-Dependent Claim-Oriented Lexicon TDCOLex.
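Equation (4) can be sketched on the “death penalty” example from this section. The sketch assumes that \(CoT(w_{i},t)\) counts the tweets in which both terms occur and that tweets are already tokenized; the posLex score for “because” is a made-up value.

```python
def claim_td(term, tweets, pos_lex):
    """Claim_TD(t): co-occurrence with posLex words, weighted by their signed scores.

    tweets  : list of token lists (unlabeled, topic-related)
    pos_lex : {word: Claim_TI(word)^+}
    Co-occurrence is counted per tweet, which is an assumed reading of CoT.
    """
    with_term = [set(tw) for tw in tweets if term in tw]
    if not with_term:
        return 0.0
    weighted = 0.0
    for word, score in pos_lex.items():
        co_occurrences = sum(1 for tw in with_term if word in tw)
        weighted += score * co_occurrences
    return weighted / len(with_term)  # divide by TN_t


tweets = [
    ["because", "death", "penalty", "rich", "poor"],
    ["death", "penalty", "because", "poor"],
    ["death", "penalty", "news"],
]
pos_lex = {"because": 0.1}  # hypothetical signed score
poor_score = claim_td("poor", tweets, pos_lex)
news_score = claim_td("news", tweets, pos_lex)
```

Here “poor” always appears together with “because” and so receives a higher score than “news”, which never does.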

4 Experiments

4.1 Datasets

We construct a real-world dataset for our claim-oriented tweet retrieval task. We crawled and indexed about 90 million tweets using the Twitter API in 2016 and retained the English tweets. Using these tweets, we implemented a search engine based on ElasticSearch. We collected 30 debate topics from a debate website as the queries. Given a query, the search engine presents a list of relevant tweets ranked by Okapi BM25 [14] score. A native English speaker and two experienced annotators with an NLP background were hired to judge whether each tweet contains a claim, following the criteria proposed in Sect. 1, by assigning binary labels to every tweet. The inter-annotator agreement was 90.1% for topic relevance, 78.2% for clear stance, and 75.2% for arguable reason. This level of agreement suggests that our claim-oriented criteria are easy to convey to human labelers. We marked an instance as containing a claim only if at least two annotators labeled it so. In total, 2520 tweets were selected for study, and 586 of them were identified as containing claims.

4.2 Experimental Settings

For learning to rank, we use SVMlight, which implements the ranking algorithm. To avoid overfitting, we perform 10-fold cross-validation on our dataset. We use Mean Average Precision (MAP), Precision@5, and Precision@10 as evaluation metrics.
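For reference, the evaluation metrics can be computed from ranked lists of binary relevance labels as sketched below. The sketch assumes that every claim-oriented tweet for a topic appears somewhere in its ranked list, so average precision is normalized by the number of relevant tweets in the list; the function names are hypothetical.

```python
def precision_at_k(labels, k):
    """Fraction of the top-k ranked tweets that are claim-oriented."""
    return sum(labels[:k]) / k


def average_precision(labels):
    """Mean of the precision values at each rank holding a relevant tweet."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(labels, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """MAP over topics; rankings holds one ranked label list per topic."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Two hypothetical topics with ranked binary labels (1 = claim-oriented).
rankings = [[1, 0, 1, 0, 0], [0, 1]]
map_score = mean_average_precision(rankings)
p_at_5 = precision_at_k(rankings[0], 5)
```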

4.3 Baselines

We investigate the features used in previous similar tasks and feed each feature set separately into the learning-to-rank framework to form our baselines.

BM25 Similarity. We use BM25 similarity as a basic measure. The Okapi BM25 scoring shows the relevance between query topic and the tweet.

TwitStan. TwitStan is the feature set used in a state-of-the-art stance classifier for tweets [9]. We add BM25 as the relevance feature.

WikiClaim. WikiClaim is a claim-discovery feature list from Roitman et al. [15]. Since tweets do not have titles or headers, we use only the content features. We add BM25 as the relevance feature.

TwitArgument. Since claims and evidence are both argumentative components, we also use TwitArgument, a feature set used for argument identification in Twitter [18]. We add BM25 as the relevance feature.

Table 3. Results for baselines. A significant improvement over the BM25 with \(^\triangle \) and (for p < 0.05 and p < 0.01).

4.4 Results

Experiment I: Baselines. Table 3 gives the performance of the baselines. Because of the particularity of the corpus, \(LTR_{WikiClaim}\), which is effective on the Wikipedia corpus, does not perform well. The results also show that \(LTR_{TwitArgument}\) is much worse than \(LTR_{TwitStan}\). This is because argument mining in Twitter tends to find different types of evidence, which is usually described objectively, making it difficult to see the tweeter’s stance, whereas a claim requires the tweeter to express a stance clearly. Our following experiments are therefore built on \(LTR_{TwitStan}\).

Experiment II: Topic-Independent Features. The first column of Table 4 presents the effect of using Twitter structural features and topic-independent claim-related features. Each feature is combined with \(LTR_{TwitStan}\) and evaluated separately. Among the Twitter features, re-tweet (“RT @”), reply, and structure (re-tweet+URLs+reply) perform better than the others, which supports the view that some Twitter-specific features do correlate with claims. The improvement from the re-tweet feature is quite possibly due to valuable claims being forwarded frequently. As for reply, it is probably because argumentation often occurs during discussions or quarrels. Besides, combining some features may greatly improve performance. For example, news in Twitter has a specific structure, as it contains both a re-tweet marker and URLs, and it rarely contains a claim. For comparison, we use a controversy lexicon (CL) that has proved useful for claim-oriented document retrieval [15]. However, the 7th case in Table 4 shows that CL is not very effective in Twitter. This may be because the text of tweets differs from that of documents.

Table 4. Experiment results (structure:re-tweet+URLs+reply, TI: structure + TICRLex, TD: TDCOLex). A significant improvement over the \(LTR_{TwitStan}\) with \(^\triangle \) and (for p < 0.05 and p < 0.01).

Experiment III: Topic-Dependent Lexicon. Table 5 gives the claim-related terms in TICRLex and the claim-oriented terms in TDCOLex for the topic “abortion”. The terms in TICRLex are mostly modal verbs, linking verbs, conjunctions, negative words, and punctuation, which often do not carry an exact meaning but are used to form a sentence pattern. In contrast, words in TDCOLex tend to be content words. For example, for abortion, “rights”, “murder”, and “control” are included. Part of the reason may be that abortion supporters often regard abortion as part of women’s rights, while “abortion is murder” and “abortion is not birth control” are claims widely accepted by opponents. The 8th case in Table 4 shows that topic-dependent lexicons provide a further boost to a model built on topic-independent features, which indicates that our lexicon does capture important topic-dependent claim-oriented information.

Finally, all effective topic-independent and topic-dependent elements, including BM25, the TwitStan features, re-tweet, reply, URLs, TICRLex(best), and TDCOLex(best), are combined to build our best model \(LTR_{TI}+[TD]\), which improves MAP by 95.7% compared with BM25 alone and by 17% compared with \(LTR_{TwitStan}\).

Table 5. Comparison of the claim terms in TICRLex and TDCOLex of topic “abortion”.

5 Conclusion and Future Work

We define a novel claim-oriented tweet retrieval task, which we expect to be helpful for public opinion research. We use Twitter structural information to deal with the chaotic Twitter problem and leverage claim-oriented lexicons to address the vague claim problem. The topic-dependent claim-oriented lexicon can be generated from a large number of unlabeled topic-related tweets. Hence, our model can be easily adapted to newly emerging topics in Twitter. We construct a real-world dataset. The best performance of our model improves MAP by 95.7% compared with the BM25 baseline and by 17% compared with the \(LTR_{TwitStan}\) baseline.

The main future work is threefold: first, we plan to use our automatic method to get an extended corpus and leverage deep learning techniques to learn more claim-oriented features. Second, we will diversify the searched claims and detect the relevant evidence of the known claim to generate a complete argumentation structure in Twitter. Third, we will study how to assess the quality of a claim.