
1 Introduction

Following previous editions, the MC2 Lab 2018 was centered on multilingual culture mining and retrieval over the large corpus of cultural microblogs [7] considered in the two previous editions [6, 8]. Two main tasks were considered: cross-language cultural microblog search and argumentation mining.

The initial challenge for 2018 was, given a short movie review from the French social media site VodKaster, to find related microblogs in the MC2 corpus in four target languages (French, English, Spanish and Portuguese). When browsing the VodKaster website, French readers see short personal comments about movies. Since similar posts can be found on Twitter, we decided to display to readers a concise summary of microblogs related to the comment they are reading, taking into account bilingual and trilingual users who would read microblogs in languages other than French. In this user context, personal and argumentative microblogs are expected to be more relevant than news or official announcements. Microblogs sharing similar arguments can be considered highly relevant even when they are about different movies. From this initial task came the idea of a second one focusing on argument mining in a multilingual collection: finding personal and argumentative microblogs in the corpus. Public posts about cultural events such as festivals are mostly promotional announcements by organizers or artists. Personal argumentative microblogs about specific festivals provide real insight into public reception, but both their variety and their rarity make them difficult to find. Argumentation mining therefore absorbed most of the participants' effort during this lab edition. The cold-start scenario of finding such microblogs without any specific learning resource motivated the use of IR approaches based on language models or on specialized linguistic resources.

The rest of this paper focuses on this specific task. Related work is presented in Sect. 2. Section 3 gives a thorough description of the task and its motivations. The data, including a baseline run, are fully described in Sect. 4. Results and participant approaches are reported in Sect. 5.

2 Related Work

Argumentation (or argument) mining is the automatic extraction of structured arguments from unstructured textual corpora [10]. This task represents a new problem in corpus-based text analysis that addresses the challenge [13] of automatically identifying the justifications provided by opinion holders for their judgments. Early research on argumentation mining targeted legal documents, on-line debates, product reviews, political debates, newspaper articles and court cases, as well as the dialogical domain [3, 12, 13].

With the advent of social media platforms, argumentation mining has also been proposed for social media text and user-generated content [5, 14]. The goal of argumentation mining over short and unstructured data is to improve our ability to process and infer meaning from social media text. Such data are ambiguous by nature, which makes it hard for a user to understand what an opinionated tweet is really about. Yet such tweets are indispensable for forming a view about a new topic or making a decision based on user feedback, and in these cases the expressed argument is exactly what we are looking for.

For short texts, the approaches developed for microblogs differ from techniques dedicated to other, usually longer, genres such as forums, product reviews, blogs and news. High-quality social media datasets annotated with argumentation structure are rare, which limits the use of machine learning techniques. In this context we cite DART [4], a dataset built to support the development of frameworks addressing the argument mining pipeline on Twitter.

This lack of resources, and the difficulty of extracting arguments from social media text, can be explained by the fact that platforms such as comment boards on news portals, product review sites, or microblogs are less controlled communication environments, where the communicative intention is often not to engage in an argumentative discussion but simply to express an opinion on the subject matter [14]. To address this issue, argumentation mining on social media text has to rely on several sets of features capturing the above-mentioned characteristics in order to identify persuasive comments in user-generated data. This was the case in [17], where the authors propose and evaluate features to rank comments by their persuasive scores, including textual information in the comments and features related to social interaction.

3 Task

The proposed task is inspired by the field of focused retrieval. The latter aims to provide users with direct access to relevant information in retrieved documents. For this task, relevant information is expressed in the form of an argument that supports or criticizes an event. We therefore presume that a proposed method must perform:

  1. a search process that focuses on claims about a given topic within a massive collection;

  2. a ranking process that places potentially argumentative microblogs first.

Following these steps, a synthesis of many argument facets about a specific event is automatically constructed. Such an output could then be processed more easily, and with priority, by a festival organizer.
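To make these two steps concrete, here is a minimal sketch in Python, assuming a generic search_index function that returns candidate microblog texts for a festival query and a small hand-picked cue-word list; neither is part of the official task setup.

```python
# Minimal sketch of the two-step process described above: retrieve candidate
# microblogs for a festival query, then rank them so that potentially
# argumentative posts come first.

ARGUMENT_CUES = {"because", "but", "however", "should", "think", "believe",
                 "more", "less", "really", "my", "i"}   # assumed toy cue list

def argumentativeness(text: str) -> float:
    """Crude lexical score: fraction of tokens that are argumentative cues."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(tok in ARGUMENT_CUES for tok in tokens) / len(tokens)

def top_argumentative(festival: str, search_index, k: int = 100):
    """Step 1: topical search; step 2: re-rank by argumentativeness."""
    candidates = search_index(festival)   # assumed: returns a list of microblog texts
    return sorted(candidates, key=argumentativeness, reverse=True)[:k]
```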

Argumentation mining can be considered an extension of opinion mining over social network content. The main objective of this field is to automatically identify reason-conclusion structures, so as to model social web users' positions about a service, product or event expressed through social media platforms. As surveyed in [10], most argumentation mining approaches have tackled the challenging task of extracting arguments with machine learning methods. However, in the case of argumentation mining from social media such as Facebook and Twitter, the lack of corpora labeled with argumentation information and the informal nature of user-generated content make this task more complicated.

Argumentation mining in this task is intended to behave like an Information Retrieval (IR) system in which potentially argumentative microblogs come first. A similar approach was taken in the RepLab priority task [2], where the output is a ranking of microblogs according to their probability of being a potential threat to the reputation of some entity.

Following the task proposition described above, the argumentation mining task of the MC2 lab is defined as argumentation detection combined with priority ranking of argumentative microblogs. The detection of argumentative content relies on a search process that ranks microblogs according to the amount of claims they contain about a given cultural event or festival name.

The evidence related to such claims would be invaluable information for festival organizers, journalists and communication departments. It would be useful even to an ordinary festival spectator, since it would summarize all the argumentation facets one needs in order to obtain a satisfactory overview of a festival.

Participants were welcome to present systems that attempt the whole task objective (argumentation detection + argumentation ranking). These two phases are explicitly considered in the argumentation mining task as follows:

  • Argumentation detection: Given a festival name as query (topic), participants have to identify, from the microblog collection, the set of the most argumentative microblogs about this cultural event.

  • Argumentation ranking: Participants are asked to judge the relevance of each microblog in this set in terms of argumentation.

4 Data

4.1 Corpus

The MC2 corpus is a microblog stream covering 18 months, from May 2015 to November 2016, about festivals in different languages [7]. This corpus was provided to registered participants by the ANR GaFes project. It consists of a pool of more than 50M unique microblogs from different sources, together with their meta-information.

4.2 Topics

Given a cultural query about festivals, in English or French, the task is to search for the 100 most argumentative microblogs.

We chose to gather microblogs based on the most visible festival names on Flickr (the well-known photo-sharing site), in order to avoid retrieving microblogs from official pages of festival organizers and to obtain a maximum of personal microblogs.

Only the subset of festivals with at least 300 photos was considered. The selection was refined through manual exploration of the microblog corpus, to ensure that the queries provided enough argumentative content for our target audience.

4.3 Baseline

The baseline approach consisted in using the Indri language model to search for argumentative microblogs. For each festival, a query including lexical features expressing opinion and argumentation was defined following [1]. In argumentative microblogs, users typically use comparison language to compare and contrast ideas (more, less). Authors also tend to use first-person pronouns (my, mine, myself, I). Verbs such as believe, think and agree, together with adverbs, play an important role in identifying argument components: such verbs indicate the presence of a major claim, and adverbs like also, often or really emphasize the importance of a premise [15]. Modal verbs such as should and could are frequently used in argumentative contexts to express what users were expecting. In addition to this list of argumentative keywords, we used the list of opinion expressions from [9].
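As an illustration of how such a baseline query could be assembled, the sketch below builds an Indri #combine query from the festival terms and the argumentative cues listed above; the exact queries, weights and opinion lexicon used in the actual baseline are not reproduced here.

```python
# Hypothetical reconstruction of the baseline query construction: festival
# terms and argumentative cue words are wrapped in a single Indri #combine
# query. The cue list mirrors the categories mentioned in the text.

ARGUMENT_CUES = [
    "more", "less",                   # comparison language
    "my", "mine", "myself", "i",      # first-person pronouns
    "believe", "think", "agree",      # belief/claim verbs
    "also", "often", "really",        # emphasis adverbs
    "should", "could",                # modal verbs expressing expectations
]

def build_indri_query(festival_terms):
    """Wrap festival terms and argumentative cues in an Indri #combine query."""
    return "#combine( {} {} )".format(" ".join(festival_terms),
                                      " ".join(ARGUMENT_CUES))

print(build_indri_query(["cannes", "festival"]))
```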

5 Results

Argumentation mining received considerable interest, with 31 registered participants, but only 5 teams submitted runs, for a total of 18 runs per language. The organizers' baselines were added to this pool. NDCG was adopted as the main official measure, but precision at 100 could equally have been used since it produced exactly the same rankings.
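For reference, a minimal NDCG computation over a ranked run could look like the sketch below; the binary gains and the cutoff of 100 are illustrative assumptions, not a restatement of the official evaluation settings.

```python
import math

# Sketch of NDCG over a ranked run: gains are 1 if a microblog is judged
# argumentative and 0 otherwise (an assumed binary gain scheme).

def dcg(gains):
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains, k=100):
    gains = gains[:k]
    ideal = sorted(gains, reverse=True)
    return dcg(gains) / dcg(ideal) if any(ideal) else 0.0

# Example run with five judged microblogs.
print(ndcg([1, 0, 1, 1, 0], k=100))
```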

Two reference sets of argumentative structures, represented as regular expressions, were assigned to each query (festival name). One was extracted a priori from the manual interactive run provided as the baseline; the second was extracted from participant runs. To avoid duplicated content, only the microblog textual content was considered: all meta-data such as URLs, #hashtags and @replies were removed. The most argumentative phrases were extracted from this material and modeled as generic regular expressions. These steps were applied to both the English and the French runs.
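The assessment preprocessing can be sketched as follows, assuming two purely illustrative reference patterns; the actual reference regular expressions extracted from the baseline and participant runs are not reproduced here.

```python
import re

# Sketch of the assessment preprocessing: meta-data (URLs, #hashtags,
# @replies) is stripped from each microblog before matching it against the
# reference argumentative patterns.

META = re.compile(r"(https?://\S+|#\w+|@\w+)")

REFERENCE_PATTERNS = [                                      # assumed examples
    re.compile(r"\bi (really )?(think|believe)\b", re.I),
    re.compile(r"\bshould (not )?have\b", re.I),
]

def normalize(text: str) -> str:
    """Keep only the textual content of a microblog."""
    return re.sub(r"\s+", " ", META.sub(" ", text)).strip()

def is_argumentative(text: str) -> bool:
    clean = normalize(text)
    return any(pattern.search(clean) for pattern in REFERENCE_PATTERNS)
```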

Table 1 reports average NDCG results for the English queries. Results on French are similar, but due to the smaller number of queries the differences are not statistically significant. All participant systems relied on an initial preprocessing step to filter the original dataset by language and topic.

The ERTIM team found the highest number of argumentative microblogs using lexical data enrichment [16]. Their resource associates a score to each lemma according to its affective value. Besides these lexicon-based measures, opinion was detected based on the proportion of adjectives among all part-of-speech tags. In addition to this opinion scoring process, ERTIM tackled argumentation detection in the same way, by scoring opinionated tweets based on their number of conjunctions, conjunctions being discourse connectors commonly used to structure a text. This systematic approach was applied to all microblogs in the corpus. Although ERTIM found more argumentative microblogs than the other participants for almost all queries, there was no overlap with the argumentative microblogs found in the baseline runs.
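A simplified sketch of this kind of scoring is given below; the toy affective lexicon, the tagged-token interface and the additive combination are assumptions made for illustration and do not reproduce ERTIM's actual system.

```python
# Illustrative scoring in the spirit of the description above: an affective
# lexicon score summed over lemmas, the proportion of adjectives among POS
# tags for opinion, and a conjunction count for argumentation.

AFFECT_LEXICON = {"great": 0.9, "boring": 0.8, "love": 0.9}   # assumed toy lexicon
CONJUNCTIONS = {"but", "because", "although", "however", "so"}

def score(tagged_tokens):
    """tagged_tokens: list of (lemma, pos) pairs from any POS tagger."""
    lemmas = [lemma.lower() for lemma, _ in tagged_tokens]
    affect = sum(AFFECT_LEXICON.get(lemma, 0.0) for lemma in lemmas)
    adj_ratio = sum(pos == "ADJ" for _, pos in tagged_tokens) / max(len(tagged_tokens), 1)
    conj_count = sum(lemma in CONJUNCTIONS for lemma in lemmas)
    return affect + adj_ratio + conj_count   # assumed additive combination

example = [("this", "DET"), ("film", "NOUN"), ("is", "VERB"),
           ("great", "ADJ"), ("but", "CCONJ"), ("boring", "ADJ")]
print(score(example))
```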

Teams relying on a language model, with queries mixing multiword terms and argumentative connectors, found fewer argumentative microblogs but achieved a larger overlap with the reference extracted from the baseline run.

Table 1. Best average NDCG scores for top participants (English)

6 Conclusion

Previous editions of the MC2 lab focused on contextualization [6] and timeline illustration [8, 11] of cultural events over an 18-month period, based on the ANR GaFes corpus [7]. In 2018 the main challenge was to find authentic personal microblogs in this massive collection, which is required in order to portray festival reputation among participants. Among these microblogs, public argumentative ones are the most important, since they can have a direct impact on reputation. However, promotional microblogs by festival organizers tend to use similar syntax and form. The main finding of this year is that lexical filtering combined with part-of-speech analysis is the most effective way to detect these microblogs and rank them by priority. However, this extraction is not exhaustive: an interactive search using complex queries based on the Indri language model led to the discovery of relevant personal argumentative microblogs that would otherwise have remained undetected.