1 Introduction

Increasingly, web- and mobile-based information technologies are used to support public participation in large-scale online open participative (LSOOP) activities. People offer and justify their ideas and opinions and comment and build on others' perspectives. These activities may have various end goals. For example, they might offer ways for people to express and share their perspectives regarding a recent event, such as the news comment area on CNN.com, or they may be money-driven crowdsourcing activities that solicit participants' input by leveraging human power, such as Amazon's Mechanical Turk. They can be designed purposefully, for instance for participants to offer their views and invite others to persuade them to change those views, as in the "Change My View" Reddit forum. Or they may be location-based, offering an opportunity for a community to deliberate and reach a consensus on whether and how to address a community need.

Regardless of the purpose of the LSOOP activity, its success depends on participants being aware of others' ideas and opinions so that they can learn from each other and make good judgements. Yet this is particularly challenging in LSOOP contexts. In these activities, participants can join and leave at any time, and often have never worked with each other before (i.e., they have no prior knowledge of each other's communication styles and/or perspectives). In addition, although the activity's communication history is usually accessible to participants in the environment, it often comprises a large amount of communication data, which makes it time-consuming and cognitively burdensome to process. All of these characteristics make it difficult for participants to identify, interpret, and evaluate others' ideas and opinions.

The work presented here is part of the first author's long-term research program on designing tools that improve participants' awareness and understanding of their counterparts' opinions in LSOOP activities. Her approach in this program leverages machine learning, natural language processing, and awareness design knowledge. Recent research in Argumentation Mining (AM) has shown the potential of using computational techniques to automatically detect participants' opinions and their rationales from the activities' communication data (e.g., Walker et al. 2012; Hasan and Ng 2013; Sobhani et al. 2015; Biran and Rambow 2011; Khazaei and Xiao 2015a, b; Khazaei et al. 2015). She envisions that with these components of the arguments identified from the communication history, and with insights from design-oriented awareness research, design solutions can be explored to present the information to LSOOP participants so as to raise their awareness of the different issues and opinions represented in the activities as well as their justifications. This awareness will then help participants make informed judgements in evaluating the issues and opinions.

Although much research is being conducted to automatically identify and classify opinions, much less work has focused on the automatic detection of rationales. Yet it has been documented that knowing others' rationales helps interlocutors better interpret others' ideas and opinions (Xiao 2014). Therefore, to achieve the long-term design goal of improving participants' awareness and understanding, an important issue is how to automatically detect participants' rationales in LSOOP communication data. For this task, the three authors synthesize their expertise in argument mining and rationale studies (the first author), communication (the second author), and computational linguistics (the third author), and argue that there has not been enough conceptual work to properly illuminate the factors that shape the expression of rationales, thereby making automatic detection challenging. To remedy this conceptual gap, this paper presents a framework comprising three analytical aspects that, in our view, a computational tool should address in order to identify rationales in LSOOP communication data. Future work in this area, we argue, would benefit from a greater appreciation of these different analytical aspects and from efforts to advance automated systems that take advantage of them. Specifically, we believe that the detection of rationales should consider the linguistic, informational, and contextual and communicative aspects of rationales in the context of LSOOP activities.

In the remaining sections, we first discuss the concept of rationales and related concepts to illustrate its diversity and complexity, and to demonstrate the necessity of using different analytical lenses to capture and evaluate rationales in their communication context. We next review the AM studies that are relevant to the task of detecting rationales from online interactions. We then present the conceptual framework and describe the three analytical aspects in detail. We conclude that, with a better conceptual understanding of rationales, automated tools for detecting them can be substantially improved in their accuracy and ultimate utility for scholars and for LSOOP users.

2 Rationale, Argument, and Explanation

We understand an individual participant's rationale in an LSOOP activity as her justification of her opinions. In LSOOP activities, the participant's opinion and rationale may be explicitly provided in the text, but it may also be the case that the rationales are available to readers only by understanding the text at the pragmatic level or by sharing the same world knowledge. Certainly, there are many cases where the participant provides her opinion but not the rationale. To illustrate this, we selected some online comments from the Rutgers' Argument Mining corpora and annotated the opinions and the rationales contained in these comments. Each example below is followed by a note on whether the opinion and the rationale are explicit, implicit, or not provided:

1. I suppose a pervasive back-button on the iPhone wouldn’t make much sense since navigation on the iPhone is always one app deep at a time

opinion and rationale – explicit

2. I don’t exactly consider netbooks and the iPad to be on a collision course. My mum wants an iPad because it has a big screen and has intuitively limited functionality, and I want a netbook because I just need something that I can easily type on and customize the experience to what I want and need. Same position in the market (in between phones and regular computers) but not on the same paths in my opinion.

opinion and rationale – explicit

3. I never really gave much thought to how multitasking was really all that important for an iPhone and it didn’t seem to be at first. After using one for a while though, losing your context all the time as you move from one thing to another gets to be painful.

opinion – implicit, but signals can be found in the text (her opinion is: after all, multitasking is important for an iPhone, and it is not well designed)

rationale – explicit

4. without amrica where whould japan be tho hmm? everyhting i buy says made in japan

opinion – implicit, but signals can be found in the text (her opinion is: Japan cannot survive without America)

rationale – explicit

5. To Everyone that thinks the iPad is so much better than a netbook, and that there are no other multi touch devices. I tell you to look at the Asus T91MT. It has multi touch, it has 32gb SSD, SD slots, USB ports, audio ports, web cam, tablet mode, multi touch track pad, fast, 6 hours of battery life, and oh yeah it’s only $550. How can the iPad be considered better, what because it runs Windows 7? A normal PC OS? That is not a problem, as you can get tablet specific applications if you want, there just is no need for them. Just admit that the iPad is not innovative, and stop drinking the Apple koolaid.

opinion – explicit

rationale – implicit, and the signals are hard to find in the text (the person used two aspects to support her opinion: 1. the iPad is not better; 2. there are other multi-touch devices)

6. One thing is for sure, iPad is nowhere close to an iPhone and nor it is close to consumer laptops in terms of future sales. It will remain as a niche product like MacBook Pros

opinion – explicit

rationale – not provided

In example 1, the participant's opinion and rationale are connected by the discourse connector since. There are, however, many cases in which the opinions and the rationales, although explicitly laid out in the text, are not connected by discourse markers, as in example 2. As can be seen from these six examples, detecting rationales in LSOOP communication data can be quite complex, especially for the implicit ones (note: the typos and grammatical mistakes in example 4 are from the original data). On the other hand, at the explicit level, i.e., when the rationales are identifiable in the communication data, rationales have been reported to contribute to the effectiveness of collaborative activities. Prior studies have shown that when members' rationales are shared, e.g., in online idea generation activities, intellectual contributions improve and peers' engagement and performance can be reinforced (Monk 2003; Xiao 2013, 2014; Xiao and Carroll 2013).
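To make the contrast between explicit and implicit rationales concrete, a minimal first-pass heuristic might simply look for explicit connectors such as since or because and split a comment at the marker. The sketch below is purely illustrative and is not the detection approach advocated in this paper; the marker list is a hypothetical starting point, and such a heuristic would miss the implicit and marker-free cases in examples 2 through 5.

```python
import re

# Illustrative (not exhaustive) explicit rationale markers.
RATIONALE_MARKERS = ["because", "since", "so that", "for the reason that"]

MARKER_RE = re.compile(
    r"\b(" + "|".join(re.escape(m) for m in RATIONALE_MARKERS) + r")\b",
    re.IGNORECASE,
)

def split_on_marker(comment):
    """Return (opinion_part, marker, rationale_part) if an explicit marker
    is found, otherwise None. Misses implicit and marker-free rationales."""
    match = MARKER_RE.search(comment)
    if match is None:
        return None
    return (comment[:match.start()].strip(),
            match.group(1),
            comment[match.end():].strip())

# Example 1 above: the connector "since" links the opinion and the rationale.
print(split_on_marker(
    "I suppose a pervasive back-button on the iPhone wouldn't make much sense "
    "since navigation on the iPhone is always one app deep at a time"))
```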

An individual's rationale should not be confused with her intention. Consider the following statement: On that day of gathering, we should all use public transportation when we travel, because there is too much air pollution from the cars. Although the because clause indicates the speaker's rationale, that is, her justification for the opinion expressed in the first part of the sentence, the speaker's intention in offering this opinion may actually be to keep the roads from becoming too crowded on that day.

The concept of rationale shares properties with other concepts studied in closely related research fields, e.g. reasoning and argumentation, persuasion, and explanation. We discuss these connections here to further illustrate the complexity of identifying the features that characterize rationales in LSOOP activities.

By definition, an argument can be considered as "a reason given in proof or rebuttal; discourse intended to persuade; a coherent series of statements leading from a premise to a conclusion" (Merriam-Webster), "A course of reasoning aimed at demonstrating truth or falsehood; A fact or statement put forth as proof or evidence; a reason; A set of statements in which one follows logically as a conclusion from the others" (American Heritage® Dictionary), and "A reason or set of reasons given in support of an idea, action or theory" (Oxford Dictionary). As can be seen from these definitions, an argument is often associated with the intention to prove with evidence, facts, or reasoning. In Toulmin's (1958) structural model of an argument, an argument is a movement from data, i.e., the evidence, to claim, i.e., the conclusion. The transition between data and claim is made possible by a warrant. In this structural model, the warrant justifies the claim based on the data, and the warrant itself may be justified by its backing. Therefore, a participant's rationale, when it appears in an argument, can be considered as the warrant or the warrant's backing. For our purpose of detecting rationales, we do not differentiate these two movements, that is, the warrant and the warrant's backing, in LSOOP communication data.
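As a concrete illustration of how this reading of Toulmin's model might be encoded in a rationale detection pipeline, the sketch below defines our own illustrative data structure (not a standard library type) that collects warrant and backing under a single rationale property, mirroring the decision not to differentiate the two movements; the warrant text in the usage example is a hypothetical reconstruction.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ToulminArgument:
    """Simplified view of Toulmin's (1958) model for rationale detection."""
    data: str                      # the evidence
    claim: str                     # the conclusion
    warrant: Optional[str] = None  # licenses the move from data to claim
    backing: Optional[str] = None  # support for the warrant

    @property
    def rationale(self) -> List[str]:
        # Warrant and backing are treated uniformly as the rationale.
        return [part for part in (self.warrant, self.backing) if part]

# Illustrative use, loosely based on example 1; the warrant is hypothetical.
arg = ToulminArgument(
    data="Navigation on the iPhone is always one app deep at a time",
    claim="A pervasive back-button on the iPhone wouldn't make much sense",
    warrant="A back-button is only useful when navigation goes more than one level deep",
)
print(arg.rationale)
```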

Argumentation is a complex social process. Van Eemeren et al. (2013) explain that argumentation is a verbal and social activity of reason that uses propositions to justify or refute a controversial standpoint so as to influence its acceptability by the listener or reader. From this perspective, when a rationale is present in an argumentation process, justification can be viewed as the rationale’s function, and its ultimate goal is to persuade the others to accept the participant’s opinion.

The participants' rationales that are of interest to us are sometimes interpreted as explanations. With an explanation, ideas are meant to be easier to understand. When a participant's explanation achieves this goal by providing the justification for her action or belief, it falls into the category of rationale in our sense. Explanations have been studied in knowledge-based systems (KBS), where researchers examine when, why, and how explanations are used (Mao and Benbasat 2000; Gregor and Benbasat 1999; Giboney et al. 2015), and the effects of explanation use (Ye and Johnson 1995; Bunt et al. 2012). Three common types of explanations are studied in these research communities: reasoning-trace explanation, justification explanation, and strategic explanation (Mao and Benbasat 2000). Reasoning-trace explanations reflect a system's problem-solving logic and are thus rule-based. Justification explanations are those provided to rationalize the behavior of the KBS, in other words, to explain why what the system is doing is reasonable and appropriate. Strategic explanations help users be aware of the overall problem-solving strategy in a particular task. The notion of justification explanation is analogous to the concept of rationales in the scope of our interest. Philosophers have distinguished between justificatory reasons and explanatory reasons, and warned that a person may not be aware of the reason for her action, and thus the reason she provides may not reflect the reality of the situation (Lenman 2010). This is also possible in one's rationale for her opinion. Using rationale example 2 above, the participant used the stories of her mom and herself to explain, by way of justification, that the netbook and the iPad focus on different technical aspects and thus attract different types of consumers in the market. According to the participant, "my mum wants an iPad because it has a big screen and has intuitively limited functionality", but it is possible that the mom wants an iPad because she hopes to have the same device as her neighbor friend – a reason that neither the participant nor even the mom herself has recognized. The veracity of rationales is often contested, and indeed is often the source of argumentation. Our interest is in identifying rationales for their own sake, regardless of whether they are true or not. So we do not differentiate here between the provided justificatory explanation and the real reason for the opinion.

In their research on developing computational techniques to identify justifications from online interaction data, Biran and Rambow (2011) described five types of justification utterances as follows:

“1. Recommendation for action, and motivation for proposed action....

2. Statement of like or dislike or of desires and longing, and subjective reason for this like or dislike or desire or longing...

3. Statement of like or dislike or of desires and longing, and claimed objective reason for this like or dislike or desire or longing...

4. Statement of subjectively perceived fact, with a proposed objective explanation...

5. A claimed general objective statement and a more specific objective statement that justifies the more general one....”

(Biran and Rambow 2011, p. 162)

In our research on rationales in LSOOP activities, a rationale can be a justification of types 2 through 5. In summary, the concept of rationale is complex and diverse, which makes it challenging both to elicit features that characterize rationales for detection and to establish criteria for evaluating the detected rationales.

3 Related Work in Argumentation Mining

With the proliferation of online interactions enabled by Web 2.0 technologies, mining these data for embedded arguments has become more and more attractive to researchers. As suggested in the previous section, one can expect to find participants' rationales in argumentation processes. Hence, we review the growing literature on argumentation mining to understand the state of the art with respect to detecting rationales from online interactions. First defined by Mochales-Palau and Moens (2009), argumentation mining (AM) is an emerging research area that intersects with natural language processing, argumentation theory, and information retrieval. The authors maintain that "the aim of argumentation mining is to automatically detect the argumentation of a document and its structure. This implies the detection of all the arguments involved in the argumentation process, their individual or local structure, i.e. rhetorical or argumentative relationships between their propositions, and the interactions between them, i.e. the global argumentation structure" (Mochales-Palau and Moens 2009, p. 98). In this area, researchers create annotated corpora for AM research on scientific articles, newspaper editorials, Wikipedia articles, and social web texts (Reed et al. 2008; Green 2014; Houngbo and Mercer 2014; Kiesel et al. 2015; Aharoni et al. 2014; Goudas et al. 2014). AM researchers also develop computational tools to evaluate whether or not a statement is argumentative (Mochales-Palau and Moens 2009), to identify argumentation components such as stance (Walker et al. 2012; Hasan and Ng 2013; Sobhani et al. 2015) and argumentation styles or argumentative units (Oraby et al. 2015; Lawrence and Reed 2015), and to generate the argumentation structure or summarize the argumentation by identifying prominent arguments (e.g., Reisert et al. 2015; Misra et al. 2015). While argumentation mining started with the mining of legal documents (e.g., Mochales-Palau and Moens 2009), the methodology has since been applied to other diverse text contexts, such as scholarly articles (Green 2014), online discussions (Rosenthal and McKeown 2012), and persuasive essays (Nguyen and Litman 2015). In addition, argumentation mining has been used as a method to identify information from text for the benefit of non-argumentation-related applications, e.g., to offer surveillance on influenza (e.g., Lamb et al. 2013) and to detect influencers in interactions (e.g., Biran and Rambow 2011).

The most relevant AM studies for our purposes are those that focus on identifying supporting information for a stance from online discussions, because rationales are one type of supporting information, offering justification for one's opinions. Interested in identifying different types of supporting information in online interactions, Park and Cardie (2014) find that different types of propositions often come with different types of supporting information. Therefore, by identifying different types of propositions, one can identify the types of corresponding supporting information. With Support Vector Machine (SVM) (Park and Cardie 2014) and Conditional Random Field (CRF) (Park et al. 2015) classifiers trained with various features, the best overall performance is achieved by an SVM classifier with features including n-grams and those capturing verifiability and experientiality.
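As a rough sketch of what such a feature-based classifier could look like, the scikit-learn pipeline below combines n-gram counts with two toy cue lists standing in for verifiability and experientiality signals. The cue lists, labels, and example texts are our own placeholders; Park and Cardie's (2014) actual feature set and training data are considerably richer.

```python
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction import DictVectorizer
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC

# Hypothetical cue words standing in for the verifiability/experientiality
# features described by Park and Cardie (2014); placeholders for illustration.
EXPERIENTIAL_CUES = {"i", "my", "me", "we"}
VERIFIABILITY_CUES = {"study", "report", "data", "according"}

def cue_features(texts):
    feats = []
    for text in texts:
        tokens = text.lower().split()
        feats.append({
            "experiential": sum(t in EXPERIENTIAL_CUES for t in tokens),
            "verifiable": sum(t in VERIFIABILITY_CUES for t in tokens),
        })
    return feats

pipeline = Pipeline([
    ("features", FeatureUnion([
        ("ngrams", CountVectorizer(ngram_range=(1, 2))),
        ("cues", Pipeline([
            ("extract", FunctionTransformer(cue_features, validate=False)),
            ("vectorize", DictVectorizer()),
        ])),
    ])),
    ("svm", LinearSVC()),
])

# Toy labels: 1 = verifiable/experiential support, 0 = unverifiable assertion.
texts = ["According to the report the battery lasts six hours",
         "I want a netbook because I just need something I can type on",
         "The iPad is simply better",
         "Everyone knows netbooks are dead"]
labels = [1, 1, 0, 0]
pipeline.fit(texts, labels)
print(pipeline.predict(["My mum wants an iPad because it has a big screen"]))
```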

In their survey of argument diagramming techniques, Peldszus and Stede (2013) reviewed studies that used the Rhetorical Structure Theory (RST) framework to represent argument structure. RST provides a model of the organization of discourse elements (Mann and Thompson 1988). It views text as a chain of discourse units that have different purposes, and it aims at describing how the different segments of a text are logically connected. There are two types of units in RST: the nucleus (i.e., the main passage) and the satellite (i.e., the supplemental part). RST views discourse relations as paratactic or hypotactic relations that hold between nuclei and satellites and account for the construction of coherence in discourse (Mann and Thompson 1988; Taboada 2006). RST is one of the most influential discourse relation analysis frameworks. Peldszus and Stede (2013) suggested that while there are interesting parallels between RST relations and argumentative moves, RST is limited in representing argumentation structure. For example, an argument may include non-adjacent but interdependent components as parts of the argument; RST cannot clearly depict this dependency when the text units are distant.

On the other hand, when it comes to detecting the rationales in an argument, as opposed to identifying the argumentation structure, researchers have found it useful to identify the common RST relations in order to detect the rationale. Biran and Rambow (2011) attempted to detect justifications in online blog threads through the common discourse relations they contain. The authors selected 12 RST relations from the RST Treebank that they believed to be commonly present in justifications. They then developed a classifier that uses word-pair indicators of the discourse relations to identify whether the sentence following a claim is a justification.
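A minimal sketch of the word-pair idea, assuming naive whitespace tokenization, is shown below; Biran and Rambow (2011) derive their indicator pairs from the RST Treebank rather than from raw cross-products, so this is only an approximation of the feature family.

```python
from itertools import product

def word_pair_features(claim, candidate):
    """Cross-product word pairs between a claim and the sentence that
    follows it; such pairs can serve as indicators of discourse relations."""
    claim_tokens = set(claim.lower().split())
    candidate_tokens = set(candidate.lower().split())
    return {f"{a}|{b}" for a, b in product(claim_tokens, candidate_tokens)}

print(sorted(word_pair_features("the iPad is not innovative",
                                "it has no USB ports"))[:5])
```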

With the original RST scheme from Mann and Thompson (1988), Xiao and Carroll (2015) annotated the RST discourse relations of the rationales from small group idea generation activities, and found that three RST discourse relations were commonly present in their sample: circumstance, elaboration, and evaluation. Based on this finding, Khazaei and Xiao (2015a, b) studied the applicability of using corpus-based lexical cues to detect the three relations. The authors discovered that, of the three relations, only the circumstance relation responds well to this approach. The research group further explored the use of a probability-based model to disambiguate the cues for the circumstance relation in order to improve the performance of this approach (Khazaei et al. 2015). Their approach achieved promising performance in detecting discourse relations compared to the existing techniques reported in Biran and Rambow (2011) and Ji and Eisenstein (2014).
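The sketch below gestures at the cue-based idea with made-up cue probabilities and a single threshold; the probabilities in Khazaei et al. (2015) are estimated from corpora and their disambiguation model is more elaborate than this.

```python
# Hypothetical P(circumstance relation | cue) values; placeholders only.
CIRCUMSTANCE_CUE_PROB = {"when": 0.70, "while": 0.55, "after": 0.65, "before": 0.60}

def is_circumstance(segment, threshold=0.6):
    """Flag a segment as a likely circumstance satellite if any cue it
    contains exceeds the probability threshold."""
    tokens = segment.lower().split()
    probs = [CIRCUMSTANCE_CUE_PROB[t] for t in tokens if t in CIRCUMSTANCE_CUE_PROB]
    return bool(probs) and max(probs) >= threshold

print(is_circumstance("After using one for a while though"))
```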

Our review of the automatic detection of online users' rationales indicates that research effort has mainly focused on leveraging the information revealed by the text's discourse relations, with a few exceptions such as Wyner et al.'s (2012) work. To support an analyst in examining the argumentation of online product reviews, these authors proposed five "tiers" of analysis: a consumer argumentation scheme, a set of discourse indicators, sentiment terminology, a user model, and a domain model. In their work, a participant's justification is expressed in terms of the elements of these tiers of analysis. While this set of analytic aspects is more comprehensive, it is specific to product reviews.

Rationales are expressions reflecting our complex cognitive and metacognitive processes. We argue that a more comprehensive approach is needed to identify them in online communication data. To promote a systematic investigation of a rationale detection tool, we have been developing a conceptual framework that suggests multiple analytic lenses for identifying rationales. While our decision to include an analytic lens in the framework is mainly based on conceptual analysis, we use examples to illustrate our views in the presentation below.

4 Conceptual Framework for Detecting the Participants’ Rationales from LSOOP Activities

Our conceptual framework consists of three analytic aspects for detecting the rationales from online communication texts. The first lens examines the linguistic features of the rationale text, the second its information content, and the third the argumentative and communicative role of the rationale. Figure 1 shows the high-level view of this framework, and we discuss each of the lenses in the following subsections.

Fig. 1 Our conceptual framework for automatically detecting rationales

4.1 Linguistic Features of Texts Conveying Rationales

Rationale is a relational concept, since it is defined as the justification of an idea or opinion, which is conveyed in a different text segment. Prior studies have shown that the relationship between a rationale and the idea or opinion is expressed as a discourse relation (Biran and Rambow 2011). Biran and Rambow (2011) and Xiao and Carroll (2015) have thus suggested that detecting rationales in LSOOP activities by detecting discourse relations as defined by the RST framework (Mann and Thompson 1988) is a promising approach. However, neither study has been able to show empirically which discourse relations commonly indicate rationales. Biran and Rambow (2011) focused on developing a computational technique to detect RST discourse relations; the 12 discourse relations they selected were based on their reasoning rather than on empirical evidence. Although Xiao and Carroll identified three common discourse relations indicating rationales, the rationales in their sample were not representative of those in LSOOP activities. In their study, the rationales were produced in online small group tasks, and in the tasks' virtual group workspace the individuals provided their opinions in one space and their rationales in another space. Xiao (2013) made a legitimate claim about the value of offering a dedicated rationale space in online small group activities, but for detecting rationales in LSOOP communication data, the findings of a discourse relation analysis based on this rationale sample are not sufficient.

We take these studies' perspective but call for more research on the discourse relations in texts conveying rationales in LSOOP activities. Currently, the first author's research group has been analyzing annotated corpora provided by the Rutgers' Argumentation Mining group. The corpora consist of five blog posting datasets from Technorati (technorati.com) collected between 2008 and 2010, as well as the first 100 comments along with the original posting in each blog posting corpus. The Rutgers researchers employed human experts and Amazon Mechanical Turkers to annotate the dataset by identifying two particular text segments: call-outs and targets. According to their annotation guidelines (Wacholder et al. 2014), a call-out includes one or both of the following: a) an explicit stance (indication of attitude or position relative to the target); and b) an explicit rationale (argument/justification/explanation of the stance taken). With these annotated datasets, the first author's research group further identifies the rationales in the call-outs, annotates the call-outs that contain rationales using RST, and examines the common discourse relations in these call-outs and in the rationales. Carrying out RST analysis consists of identifying the discourse segments, specifying their hierarchical structure, and then assigning one relation label from a pre-defined list of discourse relations to the adjacent segments. Figure 2 presents an example from these corpora, that is, a comment that contains a rationale, and shows how it was analyzed with the RST schema. In this example, there are four segments (an example segment is "It's not that there is nothing good about it"), three relations, and two hierarchical levels. The first two segments form the participant's opinion (this comment is in response to a blog post about Twitter), and the other two segments form the rationale to justify this opinion.

Fig. 2 An example of RST analysis on a call-out comment that contains a rationale

From this RST study, the research group has identified the call-outs that contain rationales in the corpora (N = 527 call-outs) and has been conducting RST analysis on these call-outs. The results of this work are expected to shed more light on our understanding of the discourse relations in rationales.
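To make the structure of such an analysis concrete, the sketch below defines a toy RST node type and hand-builds a tree with the shape described for Fig. 2 (four segments, three relations). The segment texts and relation labels are placeholders; they do not reproduce the actual annotation of the comment.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RSTNode:
    """A node in a simplified RST tree: a leaf holds a text segment, an
    internal node holds a relation between a nucleus and a satellite."""
    text: Optional[str] = None
    relation: Optional[str] = None
    nucleus: Optional["RSTNode"] = None
    satellite: Optional["RSTNode"] = None

# Placeholder segments and relation labels standing in for the Fig. 2 analysis:
# two segments form the opinion, two form the rationale that justifies it.
opinion = RSTNode(relation="<relation-1>",
                  nucleus=RSTNode(text="<opinion segment 1>"),
                  satellite=RSTNode(text="<opinion segment 2>"))
rationale = RSTNode(relation="<relation-2>",
                    nucleus=RSTNode(text="<rationale segment 1>"),
                    satellite=RSTNode(text="<rationale segment 2>"))
comment = RSTNode(relation="<relation-3>", nucleus=opinion, satellite=rationale)
print(comment.relation, comment.nucleus.relation, comment.satellite.relation)
```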

Using RST discourse relations as indicators to detect rationales is one among several possible approaches. Recent progress in AM research points to the potential of other linguistic features as well. For example, Rosenthal and McKeown (2012) used a machine learning approach to detect opinionated claims in online discussions. Opinionated claims are those that come with the speaker's intention to convince others to believe or accept them. The authors investigated indicators of opinionated claims in two datasets: LiveJournal blog posts (N = 285 blog posts) and Wikipedia discussion forums (N = 51 discussions). They explored the usefulness of various features in detecting opinionated claims, including sentiment, committed belief, part-of-speech (POS) tags and n-grams, and social media features such as emoticons and slang. Committed belief is the case in which the writer believes a specific proposition to be true. Rosenthal and McKeown's (2012) study showed that of these features, sentiment and the detection of committed belief play an important role in detecting opinionated claims. As opinionated claims often come with rationales (Biran and Rambow 2011), this study indicates the potential of using these indicators to detect where rationales are in the text. The authors also found that the two LSOOP environments affected how people offered their claims. Specifically, in the LiveJournal dataset, sentiment and POS tags were the better indicators, whereas committed belief and n-grams were more important for the Wikipedia discussion forums. We expect that the text environment also affects how people offer their rationales at the linguistic level.
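A toy feature extractor along these lines is sketched below. The lexicons are our own placeholders; the original study relies on trained sentiment and committed-belief taggers rather than word lists.

```python
# Placeholder lexicons; Rosenthal and McKeown (2012) use trained taggers instead.
POSITIVE = {"good", "great", "love", "intuitive", "easy"}
NEGATIVE = {"painful", "bad", "hate", "losing"}
BELIEF_CUES = {"know", "believe", "sure", "definitely", "certainly"}
EMOTICONS = {":)", ":(", ";)", ":D"}

def opinionated_claim_features(sentence):
    """Count simple sentiment, committed-belief, and social-media signals."""
    tokens = sentence.lower().split()
    return {
        "positive_sentiment": sum(t in POSITIVE for t in tokens),
        "negative_sentiment": sum(t in NEGATIVE for t in tokens),
        "committed_belief": sum(t in BELIEF_CUES for t in tokens),
        "emoticon_count": sum(t in EMOTICONS for t in tokens),
        "token_count": len(tokens),
    }

print(opinionated_claim_features("I definitely love the big screen :)"))
```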

Prior explorations of discourse moves in academic writing suggest another direction for examining the linguistic aspect of rationales. Swales (1990) identified various discourse moves through which academic authors provide rationales for their research, such as "Claiming Centrality", "Indicating a gap", and "Highlighting a problem". Various automated methods using text classification (Teufel 1998; Pendar and Cotos 2008) and syntactic and semantic patterns (Sándor 2007) have been proposed to detect such pre-defined discourse moves.
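As an illustration of the pattern-based direction, the sketch below uses a few hand-written regular expressions for two of Swales's moves; real systems such as Sándor (2007) rely on much richer syntactic and semantic patterns, so these expressions are illustrative only.

```python
import re

# Hand-written illustrative patterns for two of Swales's (1990) moves.
MOVE_PATTERNS = {
    "Indicating a gap": re.compile(
        r"\b(little is known about|has received little attention|"
        r"has not been (studied|investigated|explored))\b", re.IGNORECASE),
    "Claiming Centrality": re.compile(
        r"\b(plays a (central|key|crucial) role|is of (great|growing) importance)\b",
        re.IGNORECASE),
}

def detect_moves(sentence):
    """Return the names of all moves whose pattern matches the sentence."""
    return [move for move, pattern in MOVE_PATTERNS.items()
            if pattern.search(sentence)]

print(detect_moves("Rationale detection has received little attention."))
```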

In summary, the linguistic analytical aspect of rationales involves conducting lexical and syntactic analysis of the texts conveying rationales. While the majority of work has focused on detecting discourse relations to identify rationales, other linguistic features are yet to be explored for this purpose.

4.2 Information Content Contained in the Rationales

The second analytical aspect examines, in view of detecting rationales from the communication data, the common types of information that a rationale may contain in an LSOOP activity. To illustrate what we mean by this lens, we present several examples that have used this aspect to detect other types of complex text units. One example is Bender and her colleagues' work on detecting authority claims in online discussions. Bender et al. (2011) annotated 365 discussions from Wikipedia talk pages to capture alignment moves and authority claims. The authors explain that a "social act" is "a communicative move aimed at social positioning of a discussant within a group of participants, which may be specialized dialog acts" (Bender et al. 2011, p. 48). In their work, an authority claim is defined as a type of social act that the discussant uses to bolster her credibility in the discussion. The authors identify different types of authority claims: credential, experiential, institutional, forum, external, and social expectations. They define the forum type of authority claim as statements that invoke Wikipedia's policies, norms, or contextual rules of behavior to bolster one's credibility in the discussion. In other words, the authors used the information contained in a statement to identify whether it makes an authority claim and, if so, of which type. Marin et al. (2011) developed machine learning classifiers to identify these different types of information so as to detect authority claims in the discussions.

As suggested by Bender et al.'s (2011) study, the types of information contained in an authority claim are domain dependent. Their work focuses on the Wikipedia talk page environment, so it is not surprising that forum appears as one type in the classification. A comparative study by Mulkar-Mehta et al. (2011) makes the same argument. In this study, the authors worked on semantic relations in different domains (biomedicine vs. football) and found that surface patterns for causality are highly domain-dependent. This suggests that the way people make statements about causal relationships differs across domains.

The same argument holds for the types of information in a rationale. LSOOP activities with different purposes – e.g., soliciting people's opinions about a health issue, making a decision on a Wikipedia article, discussing whether or not a particular program should be funded through crowdfunding – call for different types of rationales. Because of this tight relationship between rationale types and communicative goals, we believe that the types of information contained in a rationale (at least a meaningful rationale) depend to some extent on the communication context of the LSOOP activity. Hypothetically, in an online health information seeking context a meaningful rationale may include personal stories, symptom descriptions, descriptions of doctors' visits, etc. A concrete example comes from the studies on Design Rationale (DR). DR studies have been evolving over several decades. In the early 90s, the HCI community defined DR as "the reasons and the reasoning processes behind the design and specification of artifacts" (Carroll and Moran 1991, p. 198). Later, researchers specified that DR should explain "why a decision has been made and why one design or solution has been preferred over another" (Schneider 2006, p. 91). Recently, Rogers et al. (2014) explored the use of machine learning and natural language processing tools to extract design rationales from Chrome web browser bug reports. The authors considered that the rationales of a software system include the decisions made throughout the software lifecycle, the alternative solutions considered, and the justifications for choices made (or rejected). They set up eight types of DR in their study: Requirements, Decisions, Alternatives, Arguments, Assumptions, Questions, Answers, and Procedures.
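To illustrate what detection based on information types might involve, the sketch below tags a rationale with hypothetical information types for the health information seeking scenario mentioned above. The type inventory and cue words are invented for illustration; establishing real inventories is exactly the annotation work we call for below.

```python
# Hypothetical information types for a health-related LSOOP activity, with
# invented cue words; a real inventory would come from annotation studies.
INFO_TYPE_CUES = {
    "personal_story": {"i", "my", "me", "we", "years"},
    "symptom_description": {"pain", "fever", "cough", "fatigue", "symptoms"},
    "doctor_visit": {"doctor", "clinic", "appointment", "prescribed", "diagnosed"},
}

def info_types(rationale_text, min_hits=1):
    """Return every information type whose cue words appear in the rationale."""
    tokens = set(rationale_text.lower().split())
    return [info_type for info_type, cues in INFO_TYPE_CUES.items()
            if len(tokens & cues) >= min_hits]

print(info_types("My doctor diagnosed the fatigue after I described my symptoms"))
```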

In summary, our second analytical lens involves detecting the types of information contained in a rationale in order to detect the rationale. This may draw on semantic analysis or topic detection carried out by natural language processing algorithms. To the best of our knowledge, this is an unexplored area, partly due to the lack of annotated corpora. We call for annotation studies of LSOOP communication data to address this gap, and emphasize the role of the communication context in this approach.

4.3 Argumentative and Communicative Aspects of Rationales

The last aspect in our framework emphasizes a pragmatic analysis of rationales. The focus here is on the communicative and argumentative contexts and strategies of an individual’s rationale in the LSOOP activities. Reason-giving does not happen in a vacuum. People offer rationales in interaction with others, whether as part of a larger public discussion when someone writes a letter-to-the-editor or whether as part of a third-turn in a dyadic interaction. Rationales are understood as such not only by their linguistic features or their informational characteristics, but also by the context in which they are offered in interaction. Therefore, it is essential to also take into consideration features related to context that further inform rationales.

We maintain that small group research and group decision scholarship offer a valuable source for taking context into account in the development of a rationale detection tool. This literature has highlighted the importance of considering opinion dynamics in small groups as well as the larger opinion climate. The theory of the spiral of silence (Noelle-Neumann 1992), for example, provides evidence that people have a sense of the climate of public opinion, which reduces the likelihood that they express their true opinions and the rationales behind them if they sense themselves to hold a minority perspective. Small group research suggests that reason-giving changes based on the dynamics of the group (Moscovici 1985). In situations where there is a clear minority perspective, those in the minority engage in distinct rationale moves, such as appeals to an objective truth. Consistent and frequent repetition of the same rationales is likely as well (Ridgeway 1982). Other small group research suggests that groups in a minority position argue differently than those in the majority (Meyers et al. 2000). This research suggests that group dynamics, as well as larger political dynamics around minority and majority opinion, matter for the extent and type of rationale that is likely to be given. Leveraging this body of literature implies that a rationale detection tool should pay attention not only to the linguistic aspects of the participant's message itself but also to the participant's position in the discussion group, that is, whether his/her opinion is a minority or a majority one.
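A minimal sketch of such a contextual feature is shown below; it assumes that each message in a thread already carries a stance label produced by an upstream classifier, which is itself a nontrivial task.

```python
from collections import Counter

def stance_position_features(messages, author):
    """Given (author, stance) pairs for a thread, report whether the given
    author's latest stance is a minority one and what share of the thread
    shares it. Stance labels are assumed to come from an upstream classifier."""
    stance_counts = Counter(stance for _, stance in messages)
    author_stances = [stance for a, stance in messages if a == author]
    if not author_stances or not stance_counts:
        return {"holds_minority_stance": False, "stance_share": 0.0}
    stance = author_stances[-1]                      # author's latest stance
    share = stance_counts[stance] / sum(stance_counts.values())
    majority_stance, _ = stance_counts.most_common(1)[0]
    return {"holds_minority_stance": stance != majority_stance,
            "stance_share": round(share, 2)}

thread = [("ann", "pro"), ("bob", "con"), ("cat", "pro"), ("dan", "pro")]
print(stance_position_features(thread, "bob"))
```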

The literature also highlights the importance of interactivity. Interactivity is defined as occurring when the third and subsequent utterance between two or more people is in response to a prior message (Rafaeli 1988). People in groups build on each other’s rationales as part of the discussion. Their subsequent messages take up, rebut, or agree with prior rationales, often by adding further rationales. This requires computational approaches that take message features of prior turns into consideration when assessing subsequent turns. Thus, tracking who is speaking to whom and categorizing messages based on this is important. Broadwell et al. (2012) have developed an algorithm that looks at messages in interaction by tracking noun phrases over multiple turns and other features to accurately categorize influential discourse. Similar tracking of turns and who is speaking to whom might improve the accuracy of rationale detection.
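The sketch below approximates this kind of turn tracking with simple content-word overlap between a new message and each earlier turn; it is a crude stand-in for the noun-phrase tracking in Broadwell et al. (2012) and assumes that turns arrive in chronological order.

```python
STOPWORDS = {"the", "a", "an", "is", "it", "to", "of", "and", "i", "in", "that"}

def content_words(text):
    """Lowercased tokens with a small stopword list removed."""
    return {t for t in text.lower().split() if t not in STOPWORDS}

def uptake_scores(turns, new_message):
    """Overlap of content words between a new message and each prior turn,
    as a rough signal of which earlier rationale is being taken up."""
    new_words = content_words(new_message)
    return [(i, len(new_words & content_words(turn)))
            for i, turn in enumerate(turns)]

turns = ["The iPad has a big screen and limited functionality",
         "A netbook is easier to type on and customize"]
print(uptake_scores(turns, "But the big screen is exactly why the iPad wins"))
```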

Arguments are interactions, and there is a sequential quality to such interactions. Aiming to automatically classify different argumentative zones in conference papers, Teufel and Moens (2002) considered context through different measures, including the position of a sentence within its section and paragraph, and the most likely category given the preceding sentence. While their work is somewhat different from what we discuss here, their approach to considering context is inspiring for rationale detection.

Prior research has shown that certain types of rationales are likely to be called for at different times and in particular sequences as part of argumentation (Meyers and Brashers 1998). In addition, rationales are often offered in the context of particular dialogue acts, which can only be understood in the context of prior turns. Disagreement, for example, may increase the probability of rationale giving because in the context of disagreement, those who are in opposition are called upon to justify their claims with further information or evidence (see, for example, Stromer-Galley et al. 2015). Thus, identifying particular dialogue acts, such as disagreements, as opposed to information requests, simple assertions, or agreements, might improve the performance of classifiers by taking into account the increased probability of reason-giving in that context.
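The sketch below illustrates how a dialogue-act prior might be folded into a rationale score; both the act labels and the prior values are hypothetical and would need to be estimated from annotated data.

```python
# Hypothetical priors: P(next turn contains a rationale | preceding dialogue act).
RATIONALE_PRIOR_BY_ACT = {
    "disagreement": 0.6,
    "information_request": 0.4,
    "simple_assertion": 0.2,
    "agreement": 0.1,
}

def adjusted_rationale_score(text_score, preceding_act, weight=0.3):
    """Blend a text-based rationale score with a dialogue-act prior."""
    prior = RATIONALE_PRIOR_BY_ACT.get(preceding_act, 0.2)
    return (1 - weight) * text_score + weight * prior

print(adjusted_rationale_score(0.45, "disagreement"))  # nudged up
print(adjusted_rationale_score(0.45, "agreement"))     # nudged down
```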

The communicative and argumentative context of rationales includes not only the dynamics and interactivity of the LSOOP activities, but also the participants' background or domain expertise. For example, Schneider (2014) annotated the participants' rationales in Wikipedia's Articles for Deletion (AfD) deliberations using Walton's argumentation schemes (Walton et al. 2008). She found that experts and novices used statistically significantly different argumentation schemes in constructing their rationales. Specifically, experts were more likely to use Arguments from Precedent, whereas novices were more likely to use schemes that were not perceived well in the community: Argumentation from Values, Argumentation from Cause to Effect, and Argument from Analogy. This implies that a participant's experience with the topic affects how the person reasons and offers rationales in the related LSOOP discussions. There are different sources in LSOOP environments for understanding a participant's experience, such as the participant's profile information and his/her past activities and recognized contributions in the environment (e.g., the number of votes a person has received in the past on Reddit). It is thus expected that data from these sources can help identify the participant's rationales in the discussions. We call for more research on the relationship between a participant's domain expertise or experience and his/her reasoning behavior to contribute to this analytic perspective.
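A sketch of what expertise-related context features could look like, assuming the LSOOP platform exposes basic profile and history data, is given below; the field names and the threshold are illustrative, not drawn from any particular platform's API.

```python
def expertise_features(profile, expert_post_threshold=100):
    """Derive coarse expertise features from hypothetical platform metadata.
    `profile` is assumed to be a dict of past-activity counts."""
    posts = profile.get("posts_on_topic", 0)
    votes = profile.get("votes_received", 0)
    tenure_days = profile.get("account_age_days", 0)
    return {
        "likely_expert": posts >= expert_post_threshold,
        "votes_per_post": votes / posts if posts else 0.0,
        "tenure_years": round(tenure_days / 365, 1),
    }

print(expertise_features({"posts_on_topic": 240,
                          "votes_received": 1800,
                          "account_age_days": 1460}))
```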

In summary, our third analytical lens emphasizes consideration of the local or global context of the argumentation because rationales are influenced by such contextual constraints. Moreover, arguments are interactions, and it is likely that considering prior messages while looking for rationales in messages will help improve the accuracy of rationale identification.

5 Discussion

As mentioned in the introduction, our long-term research goal is to design tools that improve participants' awareness and understanding of their counterparts' opinions in LSOOP activities. Our approach combines computational techniques and awareness design methodology. To improve this awareness, both the opinions and the rationales should be identified and presented. We chose to focus on the rationale detection task because it is much less studied compared to the many studies on opinion identification and classification. Our review of the studies related to this task suggests the lack of a conceptual framework to systematically guide the investigation and development of a rationale detection tool.

To address this gap, we present a framework that considers three analytical aspects of how to detect a rationale in LSOOP communications. We acknowledge two constraints in our approach. First, the framework assumes separate processes for identifying/classifying a participant's opinion and identifying his/her rationale, but one can expect that the two processes can mutually benefit from each other. For example, in a communication process one may provide one's rationales in close proximity to where one offers the opinion. Thus an algorithm that locates the participant's opinions may provide a first screening of the text for locating the participant's rationales. Instead of considering this relative distance as another analytic lens, our framework treats it as part of the third lens, that is, the communicative and argumentative lens. The relative positions of one's opinions and rationales can be more complex, such as in the examples we gave in Sect. 2. Yet the semantic relationship between one's opinions and rationales deserves more attention for the task of automatic rationale detection, and we call for more research work on this topic.

This framework presents a general research road map for answering the question of how to identify rationales in LSOOP communications. The main contribution of this framework to existing methods is that it points not only to the computational linguistics literature, but also to the information science, group research, and sociology literatures. This widened scope also puts a constraint on this systematic investigation: rationale studies in these research areas are needed, but many related topics are still underexplored at the moment, especially in the LSOOP context. For example, what contextual triggers are needed to motivate one to offer his/her rationales in an LSOOP environment? What affordances or application features could support users in providing better or deeper rationales for their opinions? How does a participant's status in the online community affect the way he/she reasons? The answers to these research questions would advance the field, especially if they could be answered using automated approaches.

Automatically detecting rationales in LSOOP communications is only one component of our approach to achieving the long-term goal of this research program. An equally crucial aspect is the design of rationale-based awareness support tools. The focus here is not on argumentation visualization, but rather on how to improve the participants' awareness and understanding of others' opinions in LSOOP activities. This awareness is an important component of activity awareness. Activity awareness refers to a state of knowing what has happened, what is happening, and what may happen in the future, based on various information sources, such as people's perspectives, status, intentions, plans, and changes with respect to the joint endeavor and their skill development, the social networks they participate in, the socio-technical environment of the activity, and the community of practice the activity belongs to (Carroll et al. 2006, 2009).

Various design studies have been conducted to support awareness in small group activities (e.g., Convertino et al. 2009; Biehl et al. 2007; Carroll et al. 2015), and there is some work on supporting awareness through rationales in small group activities. For example, Farooq and Carroll (2011) investigated the consequences of offering status updates that conveyed a design rationale in small, fully distributed design teams. The authors found that such status updates contributed to increased awareness among members of their collaborators' future plans, and suggested that activity awareness can engage users productively at a metacognitive level. Xiao (2013) designed a shared rationale space in a virtual group WYSIWIS ("What You See Is What I See") workspace, and found that knowing the other participants' rationales contributed to different aspects of activity awareness in an online small group activity. Xiao and Mazalov (2012) designed a Rationale Flower tool that aims at increasing the credibility of shared information in a communication task conducted through instant messaging; the flower has the topic in the center and the petals show the important information and its rationales (i.e., why the information is important to consider). Xiao and Khazaei (2014) developed a software prototype called ProjectTales to help a project manager make a change decision by leveraging previous rationales. To help project managers compare the current situation with the situations of previous projects, the authors associate the history of previous projects' changes, the causes of these changes, and their decision rationales through interactive visualization features. Their design also connects the previous projects' change decisions and the project progress to help project managers understand the impact of these decisions on the previous projects.

Researchers have also invested design effort related to offering awareness support in LSOOP activities (e.g., Iandoli et al. 2014; Klein 2012), yet there are only two studies about using rationales to offer awareness support in LSOOP activities. Mao et al. (2014a, b) developed a software prototype that parses and organizes the rationales from the Wikipedia Articles for Deletion (AfD) discussions. The authors offered a simple table view to show the representative rationales for each topic along with the user names and the number of rationales they represented. Based on her analysis of the rationales from the AfD discussions, Schneider (2014) envisioned three kinds of support tools: to help users author more persuasive arguments, to identify weaknesses in others' arguments, and to summarize a debate's overall conclusions. These ideas suggest the potential of rationales to improve the participants' awareness of the quality of the argumentation in the activities. With this discussion section, we have intended to indicate that previous explorations regarding the design of rationale-based awareness support are very limited. We call for more attention to be paid to this topic, which, we believe, has a great impact on users' experiences in LSOOP activities.

6 Conclusion

In large-scale online open participative (LSOOP) activities, the participants can find it challenging to be aware of others' opinions and to interpret their perspectives, because the participation is highly dynamic, and the participants often do not have any prior knowledge of each other. At the same time, the participants' opinions and their rationales are often provided in the communication data in these activities. In a new research area called Argumentation Mining, researchers have been exploring different techniques to automatically identify and classify participants' opinions in LSOOP activities. How to detect the participants' rationales for their opinions, however, has been much less investigated.

Prior studies on the effects of rationales in group work have illustrated that one's awareness of others' rationales benefits the activities, for instance by monitoring and potentially improving the quality of the group work (Xiao 2013, 2014; Xiao and Askin 2014, 2015), improving one's awareness of others' perspectives (Xiao 2013), and offering a useful resource for new participants (Mao et al. 2014a, b, 2015). It is expected that one's awareness of others' rationales will also help one interpret their opinions better.

Automated tools could be used to advance such awareness, but so far we have not seen accurate and useful automated tools for identifying rationales. Thus, in this paper we provided a conceptual framework for studying the multiple complementary indicators that characterize different aspects of rationales. We focused on three important aspects: the linguistic features of texts that convey the rationales, the information content of the rationales, and the contextual and communicative aspects of the rationales. We discussed the importance of each aspect and suggested research directions that seem promising based on related studies. We also discussed the constraints that this framework presents and called for more research to address them.

While the main focus of this paper is to present our conceptual framework, that is, the three analytical lenses for studying and detecting rationales, we recognize the equal importance of designing the rationale-based awareness tools for the LSOOP activities. We briefly reviewed the related design work and call for design attention to this area.