1 Introduction

When an author attempts to provide an argument for something, a number of argumentation patterns can be employed. An argument is the key point of any persuasive essay or speech. The goal of this paper is to recognize discourse features of text where an author not only shares her point of view but also provides a reason for it and attempts to prove it. To systematically extract argumentation patterns, we compile the Intense Argumentation Dataset, where authors attempt to back up their complaints with sound argumentation.

Naturally, the text units considered in discourse analysis correspond to argument components, and discourse relations are closely related to argumentative relations. However, the traditional training dataset for rhetoric parsing consists of newspaper articles which do not necessarily involve heavy argumentation, and only relations between adjacent text units are identified. It is still an open question how the proposed discourse relations relate to argumentative relations [2].

To represent the linguistic features of text, we use the following sources: 1) Rhetoric relations between the parts of the sentences, obtained as a discourse tree [21]; 2) Speech acts, communicative actions, obtained as verbs from the VerbNet resource (the verb signatures with instantiated semantic roles). These are attached to rhetoric relations.

The final goal of this ongoing research is to estimate the contribution of each feature type to the problem of argument identification in text fragments.

The main contributions of our work at the current step are the following:

  1. We apply the notion of a Communicative Discourse Tree (CDT) to a specific text classification task.

  2. We develop a part of a text classification framework that includes automatic CDT extraction from text paragraphs, tree kernel learning on CDTs, and kNN learning on CDTs based on computing similarity between CDTs.

  3. We apply our framework to a binary classification task on a dataset of mixed texts with different types of argumentation, and compare the performance of several learning methods in combination with different features.

  4. We build and publish a new language resource, the Intense Argumentation Dataset, containing different patterns of valid and invalid argumentation.

2 Related Work

Most previous work in automated discourse analysis is based on extracting patterns from corpora annotated with discourse relations, most notably the Penn Discourse Treebank (PDTB) [32] and the Rhetorical Structure Theory (RST) Discourse Treebank [3]. An extensive body of studies has been devoted to RST parsers, but research on how to leverage RST parsing results for practical NLP problems is rather limited.

It is well known that the argumentation and discourse structure of a text are strongly related to each other. In [1] the authors claim that performing an RST analysis essentially subsumes the task of determining argumentation structure. As was shown recently [29], RST analysis can in principle support an argumentation analysis. The annotation of discourse structure [41] is also a field closely related to the annotation of argumentation structures [15].

There are different approaches to argument mining. The basic argument model can be represented with a scheme: a set of statements which contains three elements, namely a conclusion, a set of premises, and an inference from the premises to the conclusion [38]. Other models were offered in [37] and [6]. In general, all these models refer to an argument as a conclusion (or a claim) and a set of premises (or reasons). Text fragments can be classified into argumentation schemes, which are templates for typical arguments. Argument mining can thus consist of the following steps: identifying argumentative segments in text [19, 20, 36], clustering and classifying arguments [24], determining argument structure [10, 17], and getting predefined argument schemas [4]. Recent works in argumentation mining study different features related to discourse, considering arguments which support claims [9, 11, 30]. The relationship between argumentation structure and discourse structure (in terms of Rhetorical Structure Theory) is also a focus of contemporary research [31].

Previously, annotation schemes and approaches for identifying arguments in different domains have been developed, including [27] for legal documents, [40] for newspapers and court cases, [5] for policy modelling, and [33] for persuasive essays.

The concept of automatically identifying argumentation schemes was first discussed in [39] and [4]. Most of the approaches focus on the identification and classification of argument components. In [10] the authors investigate the argumentation discourse structure of a specific type of communication, online interaction threads. In [16] three types of argument structure identification are combined: linguistic features, topic changes and machine learning.

3 Communicative Discourse Tree

3.1 Case Study

We consider a controversial article about Theranos (Footnote 1), a company providing healthcare services, published in the Wall Street Journal, and the company's rebuttal.

RST represents texts by labeled hierarchical structures, called Discourse Trees (DTs). The leaves of a DT correspond to contiguous atomic text spans, Elementary Discourse Units (EDUs). EDUs are clause-like units that serve as building blocks. RST relations connect adjacent EDUs to form next-level discourse units represented by internal nodes. These nodes are in turn subject to linking by RST relations. Discourse units linked by a rhetorical relation are further distinguished based on their relative importance in the text: nuclei are the core parts of the relation and satellites are peripheral or supportive ones.

We build an RST representation of the arguments and observe if a DT is capable of indicating whether a paragraph communicates both a claim and an argumentation that backs it up. We will then explore what needs to be added to a DT so that it is possible to judge if it expresses an argumentation pattern or not.

DTs and their images in this case study are obtained by the software of [35]. This is what happened according to Carreyrou (Footnote 2):

"Since October [2015], the Wall Street Journal has published a series of anonymously sourced accusations that inaccurately portray Theranos. Now, in its latest story (“U.S. Probes Theranos Complaints,” Dec. 20), the Journal once again is relying on anonymous sources, this time reporting two undisclosed and unconfirmed complaints that allegedly were filed with the Centers for Medicare and Medicaid Services (CMS) and U.S. Food and Drug Administration (FDA)."

Fig. 1. Discourse tree with multiple rhetoric relations

We as readers understand that Theranos attempts to rebuke the claim of the WSJ. But Fig. 1 demonstrates that from a DT alone, with multiple rhetoric relations of elaboration and a single instance of background, it is unclear whether the author is arguing with his opponents or merely enumerating some observations.

"Theranos remains actively engaged with its regulators, including CMS and the FDA, and no one, including the Wall Street Journal, has provided Theranos a copy of the alleged complaints to those agencies. Because Theranos has not seen these alleged complaints, it has no basis on which to evaluate the purported complaints."

For the following paragraph, Fig. 2 shows the DT with additional communicative action labels, which help to identify the presence of argumentation. When arbitrary communicative actions are attached to a DT as labels of its terminal arcs, it becomes clear that the author is trying to bring her point across and not merely sharing a fact.

"But Theranos has struggled behind the scenes to turn the excitement over its technology into reality. At the end of 2014, the lab instrument developed as the linchpin of its strategy handled just a small fraction of the tests then sold to consumers, according to four former employees."

Fig. 2. Discourse tree with attached communicative actions

3.2 Definition

As can be seen from this example, to show the structure of arguments we need to know the discourse structure of the interactions between agents and what kind of interactions they are. We do not need to know the domain of interaction (here, health), the subjects of these interactions (the company, the journal, the agencies), or what the entities are, but we do need to take into account the mental, domain-independent relations between them.

A communicative discourse tree (CDT) [7] is a DT whose arcs are labeled with VerbNet expressions for verbs that are communicative actions (CAs). The arguments of the verbs are substituted from the text according to VerbNet frames. The first, and possibly the second and third, arguments are instantiated by agents, and the last ones by noun or verb phrases. These phrases are the subjects of the communicative action.

CA can take the form of verb (agent, subject, cause) where verb characterizes, for example, some sort of interaction between a customer and company in a complaint scenario (e.g., explain, confirm, remind, disagree, deny), agent identifies either the customer or the company, subject refers to the information transmitted or object described, and cause refers to the motivation or explanation for the subject. A communicative action associated with some customer claim such as I disagreed with the overdraft fee you charged me because I made a bank deposit well in advance would be represented as: disagree (customer, overdraft fee, I made a bank deposit well in advance). VerbNet frames are used to apply the computational part of Speech Act theory to discourse analysis, formalizing CAs.
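The verb (agent, subject, cause) form can be illustrated with a small data structure. This is a minimal sketch, not the representation used in the authors' system: the class name `CommunicativeAction` and its field names are hypothetical, and the optional `cause` argument is an assumption made for the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommunicativeAction:
    """Hypothetical container for a CA of the form verb(agent, subject, cause)."""
    verb: str                      # e.g. explain, confirm, remind, disagree, deny
    agent: str                     # who performs the action
    subject: str                   # information transmitted or object described
    cause: Optional[str] = None    # motivation or explanation (assumed optional here)

    def __str__(self) -> str:
        args = [self.agent, self.subject]
        if self.cause is not None:
            args.append(self.cause)
        return f"{self.verb}({', '.join(args)})"


# The overdraft-fee complaint from the text, encoded as a CA.
ca = CommunicativeAction(
    verb="disagree",
    agent="customer",
    subject="overdraft fee",
    cause="I made a bank deposit well in advance",
)
print(ca)
```

Making the class immutable (`frozen=True`) lets instances be used as hashable arc labels, which is convenient when labels are later compared or counted.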

For the details of DTs we refer the reader to [12], and for VerbNet frames to [14]. To build CDTs automatically, we combined discourse parsers [13, 35] with our own modules for extracting information from VerbNet [28] into one Java-oriented system. Our project and examples of CDT representation can be found at GitHub.

4 Text Classification Settings

To evaluate the contribution of our sources, we use two types of learning on CDT graph representations of a paragraph. 1) Nearest Neighbour (kNN) learning with explicit engineering of graph descriptions. We measure similarity as an overlap between the graph representation of a given text and that of a given element of a training set. 2) Statistical tree kernel learning of structures with implicit feature engineering.

We consider standalone discourse trees and scenario graphs built on communicative actions extracted from the text as well as full CDT graphs.

Our family of pre-baseline approaches is based on keywords and keyword statistics. Since mostly lexical and length-based features are reliable for finding poorly supported arguments [34], we combined non-name entities as features together with the number of tokens in the phrase which potentially expresses argumentation.

4.1 Nearest Neighbour

To predict the label of a text, once its CDT is built, one computes its similarity to the CDTs of the positive class and to the CDTs of the negative class, and assigns the class whose CDTs are more similar. Similarity between CDTs is defined by means of maximal common sub-CDTs [7]. Formal definitions of labeled graphs and of the domination relation on them, used to construct this operation, can be found, e.g., in [8]. To handle the meaning of words expressing the subjects of edge labels, we also apply word2vec models [22, 23]. Similarity of meaning is calculated on a word-by-word basis: two words are matched only if they are in the same syntactic role. For computing the maximal common sub-CDT we developed our own programming module, which is also integrated into the project mentioned above.
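The nearest-neighbour decision can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' module: a CDT is reduced to a tree of plain string labels, children are aligned left to right, similarity is the node count of the largest common sub-tree, and word2vec matching of subjects is omitted. The helper names (`n`, `common_size`, `similarity`, `classify`) and the example labels are hypothetical.

```python
def n(label, *children):
    """Build a tree node as a (label, children) pair."""
    return (label, children)


def nodes(tree):
    """Yield every node of the tree, root first."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)


def common_size(a, b):
    """Node count of the common sub-tree rooted at a and b.

    Returns 0 if the labels differ. Children are aligned position by
    position, which is a simplification of maximal common sub-graphs.
    """
    if a[0] != b[0]:
        return 0
    return 1 + sum(common_size(x, y) for x, y in zip(a[1], b[1]))


def similarity(a, b):
    """Size of the largest common sub-tree over all pairs of nodes."""
    return max(common_size(x, y) for x in nodes(a) for y in nodes(b))


def classify(query, positives, negatives):
    """Return True if the query is closer to the positive class (1-NN)."""
    best_pos = max(similarity(query, t) for t in positives)
    best_neg = max(similarity(query, t) for t in negatives)
    return best_pos > best_neg


# Toy trees: relation names as labels, with a CA verb attached to one label.
positive = n("contrast",
             n("attribution:deny", n("edu"), n("edu")),
             n("cause", n("edu"), n("edu")))
negative = n("elaboration",
             n("elaboration", n("edu"), n("edu")),
             n("background", n("edu"), n("edu")))
query = n("contrast",
          n("attribution:deny", n("edu"), n("edu")),
          n("elaboration", n("edu"), n("edu")))

print(classify(query, [positive], [negative]))
```

In this toy case the query shares a four-node fragment with the positive tree and a three-node fragment with the negative one, so it is assigned to the positive class.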

4.2 SVM Tree Kernel

In this study we extend the tree kernel definition to the CDT, augmenting the DT kernel with information on communicative actions [7]. A CDT can be represented by a vector of integer counts of each sub-tree type (without taking into account its ancestors). The terms for communicative actions used as labels are converted into trees, which are added to the respective nodes for RST relations. For Elementary Discourse Units (EDUs) as labels of terminal nodes, only the phrase structure is retained: we label the terminal nodes with the sequence of phrase types instead of parse tree fragments. For evaluation we used the Tree Kernel builder tool [25].
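The idea of representing a tree by integer counts of its sub-tree types can be sketched as below. This is an assumption-laden simplification: only complete sub-trees are counted, whereas the tool of [25] may use a richer fragment space, and the kernel is the plain dot product of the two count vectors. The function names and the example labels (`NP_VP`, `SBAR`, `agent`, `subject`) are hypothetical.

```python
from collections import Counter


def n(label, *children):
    """Build a tree node as a (label, children) pair."""
    return (label, children)


def subtree_counts(tree):
    """Count every complete sub-tree, keyed by its bracketed string form."""
    counts = Counter()

    def visit(node):
        label, children = node
        if children:
            inner = " ".join(visit(child) for child in children)
            key = "(" + label + " " + inner + ")"
        else:
            key = label
        counts[key] += 1
        return key

    visit(tree)
    return counts


def tree_kernel(a, b):
    """Dot product of the sub-tree count vectors of two trees."""
    ca, cb = subtree_counts(a), subtree_counts(b)
    return sum(ca[key] * cb[key] for key in ca.keys() & cb.keys())


# A CA term (deny) is attached as a small tree under an RST relation node;
# EDU leaves carry a sequence of phrase types instead of words.
ca_tree = n("deny", n("agent"), n("subject"))
t1 = n("elaboration", n("NP_VP"),
       n("attribution", ca_tree, n("NP_VP"), n("SBAR")))
t2 = n("contrast", n("NP_VP"),
       n("attribution", ca_tree, n("NP_VP"), n("SBAR")))

print(tree_kernel(t1, t2), tree_kernel(t1, t1))
```

Because the kernel is an explicit dot product of count vectors, it is symmetric, and the value of a tree with itself is the squared norm of its count vector.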

5 Datasets

5.1 New Intense Argumentation Dataset

The set of tagged customer complaints about financial services is available at GitHub.

The purpose of this dataset is to collect texts where authors do their best to bring their points across by employing all means to show that they are right and their opponents are wrong. Complainants are emotionally charged writers who describe problems they encountered with a financial service and how they attempted to solve them. Raw complaints about a number of banks, submitted in 2006–2010, are collected from PlanetFeedback.com. Four hundred complaints are manually tagged with respect to perceived complaint validity, proper argumentation and detectable misrepresentation.

Judging by the complaints, most complainants are in genuine distress due to a strong deviation between what they expected from a service, what they received and how it was communicated. Most complaint authors report incompetence, flawed policies, ignorance, indifference to customer needs and misrepresentation on the part of customer service personnel. The authors have frequently exhausted the communicative means available to them; they may be confused and seeking recommendations from other users. The focus of a complaint is a proof that the proponent is right and her opponent is wrong, together with a resolution proposal and a desired outcome.

Multiple argumentation patterns are used in complaints. The most frequent is a deviation of what has happened from what was expected, according to common sense. This pattern covers both valid and invalid argumentation. The second most popular argumentation pattern cites the difference between what has been promised (advertised, communicated) and what has been received or has actually occurred. This pattern also mentions that the opponent does not play by the rules (valid pattern).

A high number of complaints explicitly say that bank representatives are lying. Lying includes inconsistencies between the information provided by different bank agents, factual misrepresentation and careless promises (valid pattern).

Another reason complaints arise is the rudeness of bank agents and customer service personnel. Customers cite rudeness both when the opponent's point is valid and when it is not (and complaint and argumentation validity are tagged accordingly). Even if there is neither financial loss nor inconvenience, the complainants disagree with everything a given bank does if they have been served rudely (invalid pattern).

Complainants cite their needs as reasons why the bank should behave in certain ways. A popular argument is that since the government, via taxpayers, bailed out the banks, they should now favor the customers (invalid pattern).

We refer to this dataset as Intense because of the amount, strength and emotional load of customer complaints. For a given topic such as an insufficient funds fee, this dataset provides many distinct ways of arguing that the fee is unfair. Therefore, the Intense Argumentation Dataset allows one to systematically explore topic-independent clusters of argumentation patterns and to observe a link between argumentation type and overall complaint validity. Other argumentation datasets, including legal arguments, student essays (Footnote 3), the internet argument corpus (Footnote 4), the fact-feeling dataset (Footnote 5) and political debates, have a strong variation of topics, so that it is harder to track a spectrum of possible argumentation patterns per topic. Unlike professional writing in the legal and political domains, the authentic writing of complaining users has a simple motivational structure and a transparent purpose, and occurs in a fixed domain and context. In the Intense Argumentation Dataset, the arguments play a critical role for the well-being of the authors, who are subject to an unfair charge of a large amount of money or eviction from home. Therefore, the authors attempt to provide as strong argumentation as possible to back up their claims and strengthen their case.

Using our Intense Dataset, one can find a correlation between argumentation validity, truthfulness and overall complaint validity. If a complaint is not truthful it is usually invalid: either a customer complains out of a bad mood or she wants to get compensation. However, if the complaint is truthful it can easily be invalid, especially when the arguments are flawed. When an untruthful complaint has valid argumentation patterns, it is hard for an annotator to properly label it as valid or invalid. Three annotators worked with this dataset, and inter-annotator agreement exceeds 80%.

5.2 Additional Evaluation Dataset

For the particular task described in this paper we collected a large dataset, which includes the Intense Argumentation Dataset described in the previous section.

The evaluation dataset is divided into two parts: “positive” and “negative”. Texts from the first part are expected to contain some kind of argumentation. We formed the positive dataset from several sources to make it non-uniform and to bring together different styles, genres and argumentation types. First, we used a portion of data where argumentation is frequent, e.g. opinionated data from newspapers such as The New York Times (1400 articles), The Boston Globe (1150 articles), Los Angeles Times (2140) and others (1200).

As mentioned earlier, we also used our new Intense Dataset. In addition, we use the text style & genre recognition dataset [18], which has a specific dimension associated with argumentation (the section [ted], “Emotional speech on a political topic with an attempt to sound convincing”). Finally, we add some texts from standard argument mining datasets where the presence of arguments is established by annotators: the “Fact and Feeling” dataset [26] (680 articles) and the “Argument annotated essays v.2” dataset [33] (430 articles).

The negative part of the dataset consists of texts written in a neutral manner. We use Wikipedia (3500 articles), factual news sources (a Reuters feed with 3400 articles) and also the dataset of [18], including such sections of the corpus as “Instructions for how to use software” (320 articles); [tele], “Instructions for how to use hardware” (175 articles); [news], “A presentation of a news article in an objective, independent manner” (220 articles); and other mixed datasets without argumentation (735 articles). Both datasets include 8800 texts.

We used Amazon Mechanical Turk to confirm that, according to the employed workers, the positive dataset includes argumentation in a commonsense view. Twelve workers with a previous acceptance score above 85% were assigned the labeling task.

6 Evaluation

For the evaluation we split our dataset into training and test parts in the proportion 4:1 and balanced the split with respect to both the label and the source.
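A 4:1 split balanced by label and source can be sketched as a stratified split over (label, source) groups. This is an illustrative sketch only: the record layout `(text, label, source)`, the function name `stratified_split`, the fixed seed and the toy source names are assumptions, not details taken from the authors' setup.

```python
import random
from collections import defaultdict


def stratified_split(items, test_fraction=0.2, seed=0):
    """Split (text, label, source) records so that every (label, source)
    group contributes the same fraction of its items to the test part."""
    groups = defaultdict(list)
    for item in items:
        _, label, source = item
        groups[(label, source)].append(item)

    rng = random.Random(seed)      # fixed seed for a reproducible split
    train, test = [], []
    for key in sorted(groups):     # deterministic group order
        bucket = groups[key]
        rng.shuffle(bucket)
        n_test = round(len(bucket) * test_fraction)
        test.extend(bucket[:n_test])
        train.extend(bucket[n_test:])
    return train, test


# Toy corpus: four (label, source) groups of ten texts each.
sources = [("positive", "newspapers"), ("positive", "complaints"),
           ("negative", "wikipedia"), ("negative", "reuters")]
items = [(f"text {i}", label, source)
         for label, source in sources for i in range(10)]

train, test = stratified_split(items)
print(len(train), len(test))
```

With `test_fraction=0.2` each group keeps four items for training for every one held out for testing, so both the label balance and the source balance of the full set carry over to each part.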

Table 1. Evaluation results: Nearest Neighbour classification
Table 2. Evaluation results: SVM TK

An extremely naive approach relies only on keywords (bag-of-words) to detect the presence of argumentation. The hypothesis here is that people use different words to describe facts than to back them up and explicitly provide argumentation. Usually, a couple of communicative actions, at least one of which has a negative sentiment polarity (related to an opponent), are sufficient to indicate that argumentation is present. In Table 1, we see that this naive approach is outperformed by the top-performing CDT approach by 22%. A Naive Bayes classifier delivers just a 2% improvement. One can observe that for nearest neighbor learning, DT and scenario graphs based on CA indeed complement each other, with the f-measure of the full CDT 17% above the former and 19% above the latter. CA alone delivered worse results than the standalone DT.

Nearest neighbor learning for full CDT achieves slightly lower performance than SVM TK for full CDT, but the former gives interesting examples of sub-trees which are typical for argumentation, and the ones which are shared among the factual data. The number of the former groups of CDT sub-trees is naturally significantly higher (Table 2).

Table 3. Classification results for individual sources of argumentation (F1 measure)

Table 3 shows the SVM TK argument detection results per source. As a positive set, we now take each individual source on its own. The negative set is formed from the same sources as before but reduced in size to match the smaller positive set. The cross-validation settings are analogous to those of our assessment of the whole positive set. We did not find a correlation between the peculiarities of a particular domain and the contribution of discourse-level information to argument detection accuracy. At the same time, all four domains show monotonic improvement when we proceed from Keywords and Naive Bayes to CDT. Since all four sources demonstrate an improvement of the argument detection rate due to CDT, we conclude that the same is likely for other sources of argumentation-related information.

7 Conclusion

In this study we addressed the problem of argumentation detection in text using its communicative and discourse structure. We described a few representations that capture this kind of structure. We then compared two learning methods working over these representations. The performance of these learning methods showed that the bottleneck of text classification based on textual discourse information is in the representation means, not in the learning method itself. Comparing inductive learning results with kernel-based statistical learning relying on the same information allowed us to perform more concise feature engineering than either approach would allow on its own. We see that text classification based on Nearest Neighbour learning shows better results with Communicative Discourse Tree features than with only Discourse Tree features or only communicative action features. We also built and published a set of tagged customer complaints about financial services, which can be used for future case studies and research in argumentation mining and text classification.