Keywords

1 Introduction

The problem of emotion detection in text is of current interest, as it can be applied in various domains: moderation of online discussions; analysis of public opinion on companies, goods, and events; and text classification [1, 2]. At the same time, the problem poses many difficulties. Automating emotion detection in text content is complicated by the ambiguity and subjectivity of natural language. The existing methods of identifying emotions are limited in practice and, as a rule, are suitable primarily for detecting explicit emotions [2]. A more difficult task is, for example, identifying implicit aggression and, more generally, correctly processing content that can appear either aggressive or neutral when taken out of context.

Moreover, it is necessary to pay attention to the peculiarities of the environment. In particular, discussions in social media and forums may contain heterogeneous textual and audiovisual content in different languages [21]. Depending on the analyzed media, the common terms, jargon, memes, lexicon, and cultural canons of social groups may differ significantly. The techniques used by intruders to bypass auto-moderation in social media complicate technical text processing. The content is also characterized by messages with spelling errors, typos, punctuation quirks, and emoticons. The poor grammatical correctness and vague syntactic structure of social media posts complicate the use of natural language processing tools [8]. The task turns out to be challenging even for human annotators, although they can refer to the context of each message [9].

Another feature of social media content is the large number of short messages: such messages can be classified well only if they contain explicitly expressed emotions. A further problem is detecting sarcasm and irony in text messages, as there is no agreement on a formal description of these concepts. The results in [17] are satisfactory but have limited practical applicability.

The large volume and heterogeneous structure of the content require preprocessing before the methods described here can be applied. Preprocessing reduces the dimensionality of the text for further consumption by neural networks and other classifiers. The diversity of social network content complicates the research: it should be noted that working on a domain-specific corpus gives better results than working on a domain-independent corpus [5].

2 Related Work

2.1 Methods and Systems

Considering aggression as a kind of sentiment expressed in text, we can use Sentiment Analysis (SA) as a data mining method [13] for its detection. SA identifies the sentiment expressed in a text and then analyzes it. The datasets used in SA are of high importance in this field. Social network and micro-blogging sites are considered a very good data source because people freely share and discuss their opinions on a given topic there [5]. The subfields of SA include emotion detection (ED), which aims to extract and analyze the emotions, both explicit and implicit, present in sentences. It was argued in [15] that there are eight basic and prototypical emotions: joy, sadness, anger, fear, trust, disgust, surprise, and anticipation; other approaches exist as well [27]. The problem is handled either as binary classification, where only positive and negative sentiments are considered, or as multi-class classification, where a fine-grained list of sentiments is used (e.g., anger, disgust, fear, guilt, interest, joy, sadness, shame, surprise) [4].

The difference between SA and ED is as follows: SA is concerned mainly with identifying positive or negative opinions, whereas ED is concerned with detecting various emotions in text. As an SA task, ED can be implemented using a machine learning (ML) approach or a lexicon-based approach, the latter being more common [5].

To implement SA or ED, feature selection (FS) should be carried out first. FS may be performed by lexicon-based methods, which require human annotation, or by statistical methods, which are automatic and more frequently used; statistical methods may either ignore or retain information on word order [5].

The key features most often used for ED are term presence and frequency [16], parts of speech (POS), opinion words and phrases, and negations.

Examples of such features are the activity markers and the psycholinguistic, lexical, and semantic markers described in [14]. Natural language markers make it possible to evaluate potentially aggressive or otherwise harmful aspects of a text (presence of manipulative techniques, negative emotional background) and to reveal “hot” news characteristic of the tabloid press, fake news, etc. Psycholinguistic markers (number of personal pronouns, POS frequency ratios, etc.) and lexical markers (invective lexicon, destructive semantics) can be measured and used for text analysis.
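As a rough illustration of how such markers can be measured, the sketch below computes one psycholinguistic marker (the share of personal pronouns) and one lexical marker (hits against a word list). It is not the procedure from [14]: the pronoun set, the placeholder lexicon, and the function names are assumptions made for the example, and a real system would rely on a POS tagger and a curated lexicon.

```python
import re

# Hypothetical, deliberately small English pronoun list (assumption for this sketch).
PERSONAL_PRONOUNS = {"i", "me", "we", "us", "you", "he", "him",
                     "she", "her", "it", "they", "them"}

# Placeholder lexicon; a real invective lexicon would be curated separately.
EXAMPLE_LEXICON = {"idiot", "stupid"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def pronoun_ratio(text):
    """Share of tokens that are personal pronouns (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(tok in PERSONAL_PRONOUNS for tok in tokens) / len(tokens)


def lexicon_hits(text, lexicon=EXAMPLE_LEXICON):
    """Number of tokens that occur in the given lexicon."""
    return sum(tok in lexicon for tok in tokenize(text))


print(pronoun_ratio("You and I know them"))
print(lexicon_hits("You are a stupid idiot"))
```

Both values can be appended to a feature vector alongside term frequencies.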

Various methods for emotional text classification and, in particular, for aggressive text detection are discussed in the review articles [5, 12] and in the article [10]. Some web services for solving SA tasks are analyzed in [11]. At the same time, many sources deal with binary classification of single messages, without analyzing entire threads; they often employ a very similar text preprocessing pipeline comprising stop-word removal, tokenization, POS tagging, emoticon detection, stemming, etc., and a typical feature extraction step that results in bag-of-words or bag-of-stems representations [4]. Some methods that deal with the problems specified in the previous section are summarized in Table 1.

Table 1. Methods for emotion detection.

Among the considered approaches, neural networks show the most robust and highest performance [9, 10]. Some problems remain, however, when the methods described above are applied. In particular, the overwhelming majority of methods require corpora of labeled texts. Besides the task of constructing such a corpus for the Russian language, there is the problem that the social media lexicon is volatile, so the corpus becomes obsolete.

The language problem is also significant: the majority of methods are optimized for English; other languages under research include Germanic and Latin languages and some languages of South-Eastern Asia and the Near East.

The text analysis services mentioned in [11] are shown in Table 2. It should be noted that some of the services described there are no longer available, although they were reported to offer a wide range of capabilities, including evaluating not only message polarity but also separate emotional constituents such as fear, gratitude, and shame (Lymbix).

Table 2. Services for text analysis.

2.2 Datasets

The data problem is most pronounced when analyzing non-English texts. For example, there is an annotated corpus of more than 200 000 messages in Russian [19], but those messages are classified only as negative or positive, without any detailed description of the emotions expressed. Datasets in English are much more diverse. Some of them are analyzed in [18]. These datasets vary widely in how they handle emotions: classifications by Ekman [20] and Plutchik [15], as well as some other approaches, are present. In this work, the dataset of tweets in Russian [19], “The Emotion in Text, published by CrowdFlower” (39 740 tweets, Ekman) [22], TEC (Twitter Emotion Corpus, Ekman) [23], and Emobank (Valence - Arousal - Dominance) [24] were used, as well as some smaller corpora. They were processed separately to determine which corpora provide the most accurate results.

One option for using English-language datasets to classify Russian-language text is machine translation. Currently, machine translation systems show quite good results when English is the source or target language. Translation causes some loss of accuracy, but it can be assumed that the features discussed in Sect. 2.1 are preserved to a large extent.

3 Processing Scheme

To handle the various datasets in a uniform manner, each was supplied with a JSON metadata file describing the dataset format and structure. Such file pairs were used as the input data. First, a cleanup operation is performed on the datasets, in particular the removal of irrelevant and special characters, hyperlinks, and identifiers. Whitespace characters are then standardized and all characters are converted to a uniform case. In addition, converted versions (translated and normalized) of the datasets are created.
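The cleanup step can be sketched roughly as follows. The regular expressions, the choice of lower case, and the set of punctuation that is kept are assumptions made for illustration; the original text does not specify them.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # hyperlinks
MENTION_RE = re.compile(r"@\w+")                # user identifiers (assumed @name form)
# Everything except word characters, whitespace and basic punctuation.
SPECIAL_RE = re.compile(r"[^\w\s.,!?'-]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_message(text):
    """Strip links, identifiers and special characters, then normalize."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    text = WHITESPACE_RE.sub(" ", text).strip()
    return text.lower()


print(clean_message("@user  Check THIS out: http://t.co/abc   NOW!!"))
print(clean_message("Привет,   МИР"))
```

Because `\w` matches Unicode word characters for Python 3 strings, the same function handles Cyrillic and Latin text.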

Emotion estimates were converted into numerical form. For datasets providing binary classification [19], the estimate was normalized. For the datasets annotated with a variety of emotions, transformed datasets were created with score values in the range [0; 1] for each considered emotion. In the context of identifying aggression, the classes “hate”, “anger”, “aggression”, etc. were assigned the value 1.0; all classes that do not carry any negative constituent (“happiness”, “fun”, “trust”) were assigned the value 0.0; neutral classes were assigned 0.5; and classes with negative properties that do not characterize aggression explicitly (“fear”, “worry”, “boredom”) were assigned values from the range (0.5; 1).
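A minimal sketch of this label-to-score conversion is given below. The values 1.0, 0.5, and 0.0 follow the text; the value 0.75 for “fear”, “worry”, and “boredom” is an assumed point inside the stated range (0.5; 1), and falling back to 0.5 for unknown labels is likewise an assumption.

```python
AGGRESSION_SCORES = {
    # Explicitly aggressive classes.
    "hate": 1.0, "anger": 1.0, "aggression": 1.0,
    # Negative but not explicitly aggressive: assumed value inside (0.5; 1).
    "fear": 0.75, "worry": 0.75, "boredom": 0.75,
    # Neutral classes.
    "neutral": 0.5,
    # Classes without a negative constituent.
    "happiness": 0.0, "fun": 0.0, "trust": 0.0,
}


def aggression_score(label, default=0.5):
    """Map an emotion label to an aggression score in [0; 1].

    Unknown labels fall back to `default` (an assumption of this sketch).
    """
    return AGGRESSION_SCORES.get(label.strip().lower(), default)


labels = ["Anger", "fun", "worry", "surprise"]
print([aggression_score(label) for label in labels])
```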

For the datasets, n-gram dictionaries are built. In this paper, n-grams of characters and words with different values of n were used. The approach with n = 1 for words is identical to the “bag of words” concept. The n-gram occurrence is used to build vectors for neural network training.
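The dictionary construction and vectorization can be sketched as below. This is an illustration under stated assumptions: whitespace tokenization, a dictionary limited to the `max_size` most frequent n-grams, and count-valued (rather than binary) vector components are choices made for the example, and all function names are hypothetical.

```python
from collections import Counter


def word_ngrams(text, n):
    """Word n-grams; n = 1 gives the bag-of-words tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Character n-grams over the raw string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_dictionary(texts, extract, n, max_size):
    """Map the max_size most frequent n-grams to vector positions."""
    counts = Counter()
    for text in texts:
        counts.update(extract(text, n))
    top = counts.most_common(max_size)
    return {gram: idx for idx, (gram, _) in enumerate(top)}


def vectorize(text, dictionary, extract, n):
    """Count occurrences of dictionary n-grams in the text."""
    vector = [0] * len(dictionary)
    for gram in extract(text, n):
        idx = dictionary.get(gram)
        if idx is not None:
            vector[idx] += 1
    return vector


corpus = ["you are bad", "you are good"]
dictionary = build_dictionary(corpus, word_ngrams, 1, max_size=10)
print(vectorize("you are bad bad", dictionary, word_ngrams, 1))
```

The `max_size` argument fixes the vector size, which is the quantity varied in the experiments.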

Summarizing the above, the data preprocessing pipeline is shown in Fig. 1.

Fig. 1. Dataset preprocessing pipeline

To organize the full processing pipeline, the following class model was developed (Fig. 2):

Fig. 2. Classes for data processing

For data processing, as well as for creating and training neural networks, Python 3.6 was used. The training and test sets comprise 67% and 33% of the original datasets, respectively. The NLTK and Keras libraries were used to process text data and to train neural networks, respectively, in order to predict text aggressiveness with a regression model.
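The 67%/33% split can be reproduced with the standard library alone, as in the sketch below. Shuffling before the split and the fixed seed are assumptions; the text does not state how the items were assigned to the two sets.

```python
import random


def train_test_split(items, test_share=0.33, seed=42):
    """Shuffle the items and split them into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(range(100))
print(len(train), len(test))
```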

4 Experiments

The modelling results are shown in Figs. 3 and 4. Experiments show that the highest accuracy is achieved for binary classification using the original Russian corpus. Text normalization does not improve the result, which can be explained by the semantic loss caused by converting word forms. The considered neural network architectures contained 1 or 2 hidden layers and up to k neurons, where k is the vector size. The maximum accuracy of 83% was achieved with a neural network with 2 hidden layers of 50 neurons each. The achieved accuracy is lower than in [26], but that work deals with domain-specific texts (film and customer reviews), which simplifies the classification task.
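To give a sense of the size of such a network, the sketch below counts the trainable parameters of a fully connected network. It assumes dense layers with bias terms and a single output unit for the regression score; the vector size k = 1000 is a hypothetical value chosen only for the example.

```python
def dense_param_count(input_size, hidden_sizes, output_size=1):
    """Weights plus biases of a fully connected feed-forward network."""
    sizes = [input_size, *hidden_sizes, output_size]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


k = 1000  # hypothetical vector size
print(dense_param_count(k, [50, 50]))   # two hidden layers of 50 neurons
print(dense_param_count(k, [50]))       # one hidden layer of 50 neurons
```

With these assumptions the input layer dominates the count (k × 50 weights), so the vector size affects model size far more than the second hidden layer does.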

Fig. 3. Accuracy of aggression detection depending on the input data

Fig. 4. Accuracy of aggression detection depending on the vector size

The use of machine translation makes it possible to distinguish particular emotions, but the accuracy is much lower in this case. The best result for recognizing aggressive messages (65%) was achieved with the TEC dataset. Approximately the same accuracy was obtained in [25], whose authors are also able to distinguish overtly and covertly aggressive messages; however, that work deals with English texts, so the authors could use English corpora directly.

Datasets smaller than 10 000 items did not yield a sufficient level of accuracy. Despite the high error rate, such an approach can be used to estimate the probability of aggressiveness, for example, to rank messages for subsequent manual verification.
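Ranking for manual verification can be as simple as sorting messages by their predicted score, as sketched below. The function name, the optional threshold, and the `top_k` cut-off are assumptions for illustration; the scores stand in for the output of the trained model.

```python
def rank_for_review(messages, scores, top_k=None, threshold=0.0):
    """Return (message, score) pairs sorted by descending score.

    Pairs below `threshold` are dropped; `top_k` limits the queue length.
    """
    ranked = sorted(zip(messages, scores), key=lambda pair: pair[1], reverse=True)
    ranked = [(msg, score) for msg, score in ranked if score >= threshold]
    return ranked if top_k is None else ranked[:top_k]


messages = ["a", "b", "c"]
scores = [0.2, 0.9, 0.6]   # placeholder model outputs
print(rank_for_review(messages, scores, top_k=2))
```

A moderator then checks the queue from the top, so classification errors cost review time rather than causing wrong automatic decisions.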

5 Conclusions

The conducted experiments show that, using neural networks trained on annotated corpora in both Russian and English, it is possible to determine with a certain accuracy whether a text item in Russian contains an aggressive message. Such results can be used to estimate the probability of aggressiveness, for example, to rank social network messages for subsequent manual verification or to adjust chatbot behavior models. These results also enable feasibility studies on detecting particular emotion types, e.g., fear or interest, in a text using corpora in other languages.

Further research directions include comparison of different approaches to building dictionaries and reducing vector dimensions, comparative analysis and feasibility studies of detecting particular types of emotions, and complex analysis of multimodal content on the basis of the technique proposed in [21].