Abstract
Social media is frequently plagued with undesirable phenomena such as cyberbullying and abusive content in the form of hateful and racist posts. It is therefore crucial to study and propose better mechanisms to automatically identify communication that promotes hate speech, hostility, and aggressiveness. Traditional approaches have focused solely on exploiting the content and writing style of social media posts while ignoring information related to their context. Several recent works have reported interesting findings in this direction, although they lacked an exhaustive analysis of contextual information, as well as an evaluation of whether the same premise holds for detecting different types of abusive comments, e.g., offensive, hostile, and hateful. To this end, we extended seven Twitter benchmark datasets related to the detection of offensive, aggressive, hostile, and hateful communication. We evaluate our hypothesis using three different learning models, considering classical (bag of words), advanced (GloVe), and state-of-the-art (BERT) text representations. Experiments show statistically significant differences between the classification scores of all methods that combine text and metadata and those of the classical view that uses only the text content of the messages, thus suggesting the importance of paying attention to context to spot the different kinds of abusive comments on social networks.
1 Introduction
Social networks have had a profound impact on how we humans communicate. They were originally envisioned to reach out and support the spreading of ideas, experiences, and opinions. From this premise, very popular platforms such as Facebook, Twitter, Reddit, and many others emerged. Unfortunately, these same platforms can also be exploited to spread intolerance, hateful comments, aggressiveness, and harassment. Hate speech, for instance, has become a problem affecting the interactions among online groups (Burnap & Williams, 2015), since the intolerance and aggressiveness of certain users have a negative impact on the experience of their peers or even entire online communities.
As the volume of online interactions grows minute by minute,Footnote 1 the need for automated abusive language monitoring mechanisms becomes more evident (Nobata et al., 2016). To support novel research strategies to address this need, recently, challenges and shared tasks have been promoted within the Natural Language Processing (NLP) community (Sanguinetti et al., 2020; Kumar et al., 2018; Basile et al., 2019; Aragón et al., 2020; Fersini et al., 2018), and new resources for different platforms and languages have been created to extend the scope of existing studies. For example, (Jiang et al., 2022) presents a lexicon and dataset in Chinese for sexism detection and provides an exploratory analysis of the characteristics of the latter to validate its quality and to show how sexism is manifested in the Chinese language. In Caselli et al. (2021b), the authors present a Dutch abusive language corpus, a new dataset with tweets manually annotated for abusive language. Similarly, (Plaza del Arco et al., 2021b) presents a new corpus in Spanish for offensive language identification, describing its building process, novelties, and some preliminary experiments. In Pronoza et al. (2021) the authors present a new ethnicity-target hate speech detection task in Russian and show that ethnicity-targeted hate speech is more effectively addressed with their proposed three-class approach. Furthermore, the authors of Amjad et al. (2021) introduced a collection of tweets in Urdu to assess classification methods for threatening language detection, distinguishing between threats aimed towards individuals and groups. Finally, (Vidgen et al., 2021) introduces a contextual abuse dataset in English, which has labels annotated in the context of the conversation thread, contains rationales, and uses an expert-driven group-adjudication process for high-quality annotations. 
Organizers of such events and developers of these resources provide real examples of texts showing reprehensible attitudes on social media platforms, in the way of hostile, hateful, or aggressive expressions.
Traditional ways to process social media posts include extracting patterns from their content and style, that is, paying full attention to the explicit text being shared. This rests on a questionable assumption: that the message alone is all you need to understand its real meaning. It clearly ignores one important aspect that we humans regularly master, the context. Accordingly, the hypothesis in this study is that exploiting the context improves the classification performance of learning models. By context, we particularly focus on capturing the post’s metadata, such as Retweet count, Replies status, and Favorite count, among other variables, but also the author’s metadata, such as Default profile, Friends count, Verified, etc.Footnote 2 In the end, we evaluate how the inclusion of up to 14 context variables enhances classification performance.
To test the proposed hypothesis, we worked on extending seven existing Twitter benchmark datasets that did not originally provide metadata information, thus making this work, to the best of our knowledge, the largest study in terms of the number of metadata variables and the number of benchmark datasets ever evaluated. In an effort to assess whether the findings are due to specific strengths of the learning pipelines (or not), we consider classical (bag of words & SVM classifier), modern (GloVe & GRU), and state-of-the-art (BERT & linear layer) text classification models.
After this analysis, we observe that results are consistent across all seven datasets, suggesting that adding context yields an improvement of up to 6% in classification performance. Beyond this, an interesting finding is the generalization of this pipeline, since it spots hostile, offensive, aggressive, and hateful text.
The contributions of this study can be summarized as:
1. The creation of a new resource for the study of abusive language in social media, made up of seven Twitter datasets that were expanded by retrieving metadata from tweets available online. This new compendium of extended datasets could foster new analyses of the role of context information in the detection of this kind of unwanted behavior.
2. An analysis and experimental evaluation of up to fourteen context variables and three text processing models, which, together with the seven datasets, make this the most exhaustive study on the impact of metadata on the detection of abusive language on Twitter.
The remainder of this paper is organized as follows. In Section 2 we revisit relevant literature to highlight where this study stands regarding the body of knowledge. Section 3 presents the process of construction of the 7 context-enriched datasets, giving proper detail for future works willing to use this corpus. Section 4 presents the experimental setup. In Section 5 we present the results to validate this work’s hypothesis, while in Section 6 we discuss results and present statistical and error analysis. Finally, in Section 7 we conclude this work with some remarks.
2 Related work
Most of the works on detecting abusive language have modeled it as a text categorization problem (Schmidt & Wiegand, 2017; Fortuna & Nunes, 2018), that is, posts, comments, or documents are assigned to one or more predefined categories based solely on their content. We organized this review according to how relevant studies have represented the explicit message being shared; at the end, we also cover some studies that have attempted to exploit the context of the message or of its author.
The detection of abusive language has considered a great variety of features. Initial attempts used hand-crafted features such as bag-of-words representations, as well as syntactically and semantically motivated features. For example, (Burnap & Williams, 2015) experimented with different configurations of n-gram approaches, finding that word unigrams and bigrams could include samples of derogatory terms which can be exploited to detect hate speech. Similarly, (Chen et al., 2012) showed that an approach including criteria such as the writing style of users, relationships between offensive words and user identifiers, and cyberbullying language patterns outperformed traditional learning strategies. Moving a step forward, (Nobata et al., 2016) fused various text features to identify abusive language: linguistic, syntactic, n-grams, and from distributional semantics. In the end, all these features made their proposal robust, with better performance than state-of-the-art approaches at the time. Davidson et al. (2017) computed other features such as sentiment scores and Part-of-Speech tag n-grams to represent information about the syntactic structure of the texts. That work also presents an initial exploration of the role of some social media metadata tokens, such as hashtags, retweets, and URLs, although the authors did not elaborate on the effects of adding these attributes to their feature set. Interestingly, their findings suggest that lexical methods could be an effective way to identify offensive terms but are inaccurate at identifying hate speech.
With the purpose of improving the generalization of classifiers, some recent works have explored the use of deep learning models to learn abusive language patterns without the need for explicit feature engineering. For example, Gambäck and Sikdar (2017) proposed a Convolutional Neural Network (CNN) that exploited word embeddings and one-hot character n-grams. That study outperformed a Logistic Regression model, suggesting some advantage of using deep models. Zhang et al. (2018) added a Gated Recurrent Unit (GRU) layer to a CNN model, benefiting from the feature extraction of the network while capturing order information. This architecture reported new state-of-the-art results in 6 out of 7 tested hate speech collections. In Mozafari et al. (2020), an approach based on deep contextualized word representations for hate speech detection improved the baseline model by adding a CNN as a supervised fine-tuning strategy. Their classifier outperformed the baseline scores reported in Waseem’s and Davidson’s publications (Waseem and Hovy, 2016; Davidson et al., 2017). More recently, transfer learning approaches, considering pre-trained models such as ELMO, GPT-2, and BERT, have also been successfully applied and adapted to the detection of abusive language (Liu et al., 2019; Nikolov & Radivchev, 2019; Caselli et al., 2021a). Furthermore, to address the detection of hate speech in languages other than English, a number of works have presented studies and comparisons of the effectiveness of BERT-based and traditional machine learning classifiers (Plaza del Arco et al., 2021a; Sharma et al., 2022; Pamungkas et al., 2021). 
These pre-trained models have also been applied in other related tasks, for example, (Gomez et al., 2020) presented a multi-modal architecture to provide text messages expressing hate speech with visual context, Nelatoori and Kommanti (2022) incorporated bidirectional embeddings into a multi-task learning approach to distinguish online toxic messages, and Pandey and Singh (2022) described a stacked arrangement of BERT and LSTMs to detect sarcastic statements.
Regarding the use of context information, a survey (Schmidt & Wiegand, 2017) states that meta-information about the background of the user can be especially predictive, and the authors of Schulz et al. (2020) emphasize how users’ public expression is shaped by their audience. This is because a user who is known to write abusive messages may do so again, while a user who is not known to write such messages is unlikely to do so in the future. Nevertheless, this survey also refers to some works where using other kinds of metadata from the post (reply count, geographical origin, etc.) led to contradicting results. Following this idea, (Dadvar et al., 2013) used the number of profane words in the post history of a user as a feature to detect further abusive messages. The authors of the present work presented some preliminary results in Casavantes et al. (2019, 2020), suggesting the plausibility of an approach exploiting metadata to detect aggressiveness in users’ posts. In another study (Chatzakou et al., 2017), the feature set was built by extracting properties of the content of the messages, traits of the users, and their use of the social network. When the authors evaluated the features’ importance through information gain, they found that the user and network-based attributes were the most relevant, contributing to highly accurate discrimination between neutral, aggressive, and cyberbully users. Ribeiro et al. (2018) also performed a characterization of hateful comments, taking into consideration network and activity-based attributes. Their results suggest that using GraphSAGE (Hamilton et al., 2017), a model aimed at learning on graphs, with network and activity-based features along with GloVe embeddings (Pennington et al., 2014) improves the scores of the prediction task while also decreasing the standard deviation of 5 out of 6 quality measures.
As a final comment, we wish to remark that the latest results suggest the advantages of using deep learning strategies to learn optimized representations from posts, and also the incipient efforts to include metadata information. To have a better perspective of where the present study stands with respect to this literature we offer Table 1.
3 Original and extended Twitter collections
We selected 7 recent datasets that are collections of online posts originally gathered from the Twitter platform (Waseem and Hovy, 2016; Davidson et al., 2017; Álvarez-Carmona et al., 2018; Aragón et al., 2020; Basile et al., 2019; Mandl et al., 2019). These datasets were either published as individual studies or presented in international challenges and shared tasks (Vidgen & Derczynski, 2021; Poletto et al., 2021). The availability of data and the diversity of abusive content within the scope of our research were the deciding factors in the selection of these datasets. Furthermore, we had the opportunity to participate in the shared tasks of three of the seven collections, so we benefited from being previously familiar with some of these resources. In Table 2 we provide the URLs to these resources. Next, we continue with a brief description of these collections.
Waseem and Benevolent Sexism datasets
The Waseem dataset consists of 16K tweets annotated for hate speech and collected over the course of 2 months (Waseem & Hovy, 2016). Three labels were considered: sexist, racist, and neither; however, since the dataset is made of tweet IDs and labels, and the availability of the racist class is almost non-existentFootnote 3 (only 17 out of 1972 samples at the time of assessment), we decided to discard the racist subset. Manual annotation by the creators of the corpus was reviewed with the help of an outside annotator working on gender studies, with a reported inter-annotator agreement of κ = 0.84.
HatebaseTwitter dataset
Davidson et al. (2017) conducted a study in which the Twitter API was used to search for tweets containing keywords from a hate speech lexicon (Hatebase, 2021), resulting in a sample of tweets from 33,458 Twitter users. They extracted the timeline for each user and employed crowdsourcing from CrowdFlower to label a sample of over 24k tweets into three categories: those containing hate speech, only offensive language, and those with neither. They reported an intercoder agreement of κ = 0.92.
MEX-A3T aggressive detection track dataset
This dataset contains more than 7K tweets with hashtags related to topics of politics, sexism, homophobia, and discrimination in Mexican Spanish (Álvarez-Carmona et al., 2018). The MEX-A3T team made two different datasets, one used in the first and second editions of the shared task (corresponding to 2018 and 2019) and another used in 2020 for the third edition (Aragón et al., 2020). For both collections, a set of vulgar words was used as seeds for extracting the tweets, and each tweet was labeled as aggressive or non-aggressive. The annotation guidelines provide specific criteria to distinguish aggressive tweets from merely offensive or profane ones, based on the linguistic characteristics and intent of the message. The inter-annotator agreement was not reported.
HatEval Subtask A datasets
SemEval-2019 Task 5 - Subtask A consisted of a Hate Speech Detection task against Immigrants and Women in English and Spanish (Basile et al., 2019). To collect the tweets, the authors monitored potential victims of hate accounts, downloaded the history of identified haters, and filtered Twitter streams with keywords. The annotation task was performed by contributors from the crowd-sourcing platform Figure Eight, and by two expert annotators with previous experience in the subject. The English dataset consists of 10k tweets, whereas the Spanish collection contains 5k tweets. The inter-annotator agreement was reported as 0.83 and 0.89 for the English and Spanish datasets, respectively.
HASOC English Subtask A dataset
HASOC Subtask A required systems to classify tweets into two classes: Hate and Offensive (HOF) and Non-Hate and Offensive. The creators of this collection identified topics for which many hate posts could be expected, then data was sampled from Twitter and partially from Facebook using different hashtags and keywords (Mandl et al., 2019). The annotation process was carried out using an online system to label the tweets, reaching an inter-annotator agreement of κ = 0.89 in the English set. After the subtask results were published, the HASOC team released the complete corpus with class labels.
3.1 Context in the form of metadata
To enhance the selected datasets by incorporating metadata, we followed the protocol below.
1. Each corpus was loaded with the proper encoding to preserve the intended format of the text messages.
2. For every text sample, a query was made using the Twitter API to search for that instance, setting “end_time” as the search parameter for every dataset (e.g., if a dataset was released in September 2018, we could only retrieve tweets issued until that date for that specific collection).
3. We took into account the similarity between the original tweet and the query results, comparing the length of strings and character placement.
4. We retrieved the information of the tweet with the highest similarity score.
3.1.1 Tweets’ and users’ metadata
Each tweet is represented by an object with a list of fundamental properties. Table 3 displays the tweet attributes, types of data, and descriptions from (Twitter, 2021a) that were considered in our experiments. Observe that we employed the “Date of creation” to extract the hour of the day in which a tweet was posted as a new feature (integer ranging from 0 to 23, and, from now on, referred to as “Hour”).
Similar to tweets, the Twitter API Platform associates an object to each user, which indicates several of their properties (Twitter, 2021b). Table 4 displays the user attributes included in our experiments with their respective types and descriptions. In a similar fashion to the “Hour” feature, we used “Created_at” to calculate the age of the account in days.
To wrap up this section, we present Table 5, where we show how each dataset ended up after the inclusion of the metadata information.
4 Experimental settings
4.1 Data preprocessing
We followed standard procedures for the preprocessing of the posts reported for this task, such as the exclusion of non-alphanumeric characters and lowercasing.
Initially, there were three types of metadata: boolean, date, and integer. The boolean features were turned into integers (1/0), and the dates of tweet and account creation were changed to integer values in the form of hours and days, respectively. We transformed all the metadata using QuantileTransformer (Pedregosa et al., 2011), changing each feature individually to map the original values into a uniform distribution.
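As an illustration of this transformation, the sketch below (with toy values; the feature names are only examples) applies scikit-learn's QuantileTransformer to map each metadata column independently onto a uniform distribution:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Toy metadata rows: [followers_count, favorite_count, verified (1/0)]
X = np.array([[10, 2, 0],
              [150, 0, 0],
              [9000, 40, 1],
              [55, 5, 0],
              [320, 12, 1]], dtype=float)

# Each feature is transformed individually so its values are mapped
# onto a uniform [0, 1] distribution, as described above.
qt = QuantileTransformer(n_quantiles=5, output_distribution="uniform")
X_uniform = qt.fit_transform(X)
```

The rank-based mapping makes heavy-tailed counters such as follower counts comparable in scale to the binary features, which is why it is preferable here to plain min-max scaling.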
4.2 Experimental design
Figure 1 depicts the pipeline for the construction of the final vector representation that integrates the information from the texts and the metadata.
In recent years, several statistical models have shown robust performance on NLP tasks. To account for the main branches, we decided to use three different classification models: one based on a Support Vector Machine, one on a deep GRU network, and one on a transformer-based (BERT) approach.
- Classical: Bag of words (BoW) with tf-idf weights. This representation used an SVM as a classifier, one of the most powerful and versatile traditional machine-learning models. In the experiments, we considered word unigrams, as well as a combination of unigrams, bigrams, and trigrams for the representation, with a linear kernel, C = 1, L2 normalization, and class weighting for imbalance.
- Deep RNN: Gated Recurrent Unit (GRU). This approach used GloVe embedding vectors to obtain the representation. For each word, we obtain its vector and feed the recurrent network sequentially. The text is represented by the RNN’s hidden layer, which then passes through an attention layer and a linear layer to perform the classification. GRUs are a simplified variant of LSTM cells. These networks are specialized to work on sequences as inputs, producing an output and then feeding it back to themselves as a form of memory from previous time steps (Géron, 2017). Our network was configured with 100 neurons, the Adam optimizer, and 300-dimensional GloVe embeddings.
- Bidirectional Encoder Representations from Transformers (BERT). For this model, we used BERT to represent the texts and a linear layer for their classification. A BERT representation combines left and right contexts, producing a deep bidirectional Transformer (Devlin et al., 2019). For the experiments, we fine-tuned the BERT model over the training set, and the text is represented using the [CLS] vector with a dense layer for the classification.
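To make the classical configuration concrete, the sketch below wires up the BoW pipeline with scikit-learn on toy data (the example tweets and labels are invented; the real experiments use the seven benchmark collections and the settings listed above):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented toy tweets: 1 = abusive, 0 = not abusive.
texts = ["you are all idiots and i hate you",
         "what a lovely day to learn nlp",
         "shut up nobody wants you here",
         "congrats on the new paper, great work"]
labels = [1, 0, 1, 0]

# tf-idf over word uni-, bi-, and trigrams; linear SVM with C = 1,
# L2 normalization, and class weighting for imbalance.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3), norm="l2"),
    LinearSVC(C=1.0, class_weight="balanced"),
)
clf.fit(texts, labels)
preds = clf.predict(["i hate you idiots"])
```

The GRU and BERT pipelines follow the same fit/predict contract, differing only in how the text representation is produced.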
Refer again to Fig. 1. Note that to evaluate the hypothesis we followed two different classification pipelines for each dataset. The baseline “Text” pipeline uses a feature set built from the text of the tweets as the only input to the classifiers; that is, this is the common approach that any classifier would follow for this task. Our proposal (Fig. 1) follows a slightly different configuration, adding the metadata features, which are concatenated at the end of each tweet’s text vector.
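The concatenation step can be sketched as follows, assuming the metadata columns have already been quantile-scaled; `scipy.sparse.hstack` appends them at the end of each (sparse) text vector:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["example tweet one", "another sample tweet"]
# Toy metadata, already scaled to [0, 1]:
# columns: [retweet_count, favorite_count, verified]
metadata = np.array([[0.2, 0.1, 0.0],
                     [0.9, 0.7, 1.0]])

text_vectors = TfidfVectorizer().fit_transform(texts)  # sparse (2, V)
# Append the metadata features at the end of each text vector,
# producing the combined text+metadata representation.
combined = hstack([text_vectors, csr_matrix(metadata)]).tocsr()
```

Keeping the result sparse matters for the BoW case, where the vocabulary dimension V dwarfs the handful of metadata columns.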
4.3 Evaluation
For the experiments, we used the expanded collections and ran a 10-fold cross-validation, splitting for each fold the data in 80% for training, 10% for validation, and 10% for testing. We collected values for the standard text classification measures: Accuracy (ACC), Precision, Recall and F1-score; excluding the accuracy, we report macro averages for multiclass classification tasks and the values over the abusive class for binary classification tasks. To test for statistical significance, we used the F1-scores on a Bayesian Wilcoxon signed-rank test (Benavoli et al., 2014; Benavoli et al., 2017).
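The 80/10/10 fold construction can be illustrated as follows. This is a hypothetical sketch of the splitting scheme, not the authors' exact code: each outer test fold holds out 10% of the data, and an equally sized validation slice is carved out of the remaining 90%.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(100).reshape(-1, 1)   # toy data, 100 samples
y = np.array([0, 1] * 50)           # balanced toy labels

outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
rng = np.random.default_rng(0)
splits = []
for train_val_idx, test_idx in outer.split(X, y):
    # Carve a validation slice (same size as the test fold, i.e. 10%
    # overall) out of the remaining 90%; the rest (80%) is training.
    shuffled = rng.permutation(train_val_idx)
    val_idx = shuffled[:len(test_idx)]
    train_idx = shuffled[len(test_idx):]
    splits.append((train_idx, val_idx, test_idx))
```

Per-fold F1-scores collected over such splits are the inputs to the Bayesian Wilcoxon signed-rank test used below.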
5 Results
Table 6 presents the complete results of the experimentation using the original pipeline and the proposed pipeline, which includes text+metadata. For clarity, we only include results from the baseline pipeline (text only) and the best result achieved by adding metadata (considering only user metadata, only post metadata, or the combination of both). Results are very consistent: in all pairwise comparisons, using text+metadata improved classification performance across all datasets. Moreover, with BERT, which has become an important player in the NLP arena, we observe a significant increase in performance for all datasets. Interestingly, in some cases precision increases more than recall when considering metadata, suggesting that the tweet’s context directly helps the identification of non-abusive texts. We would like to note that we also computed results using only metadata, but they were generally poor, on average 20 points lower than using the context information together with the text.
A more specific question concerns the influence of adding only the tweets’ metadata (MD), only the users’ MD, or both to the text representation. This evaluation is shown in Fig. 2, where for each ML strategy the three options are evaluated in the Precision vs. Recall space. We observe that adding metadata to the SVM and GRU classifiers increases precision scores while keeping recall scores almost constant with respect to the text-only model. Meanwhile, for BERT, although the improvements are more modest, they occur for both precision and recall. We can appreciate that the models obtained similar proportions of true positives while predicting as abusive a comparable number of tweets that were not (false positives). The BERT model outperformed GRU and SVM in correctly predicting the greatest number of abusive tweets.
6 Analysis of results
6.1 Statistical significance analysis
To evaluate the significance of including metadata in the classification pipeline we applied a Bayesian Wilcoxon signed-rank test. This test is a nonparametric Bayesian version of the Wilcoxon signed-rank test set up on the Dirichlet process and it is recommended to directly compare ML classifiers (Benavoli et al., 2014; Benavoli et al., 2017). Given the observed data, the test computes the posterior probability of the null and alternative hypotheses, providing a straightforward probability of one method being better than the other (when comparing two treatments), thus avoiding the abstract interpretation of frequentist tests.
For this analysis, we define method A as the pipeline that relies only on the tweets’ text, while method B is the proposed strategy that combines text and metadata (MD). We present the results of this statistical analysis over the F1 scores in Table 7, where the symbol “>” represents “better than”. We can observe that for 2 out of 3 treatments, there is a very high probability (> 0.98) that using text+MD offers better results than only using text. In the case of the SVM, the conclusion is that using metadata (text+MD) performs practically the same as not using it. One possible explanation is that the quantity of metadata added is relatively small compared to the size of the original BoW representation, having little impact on the relative position of the support vectors and thus on the definition of the decision hyperplane. As a result, the classification results remain essentially unchanged. Deep learning models, on the other hand, can generalize this type of information better.
For a more compelling and visual interpretation of the results of this analysis we present Fig. 3, where each point represents a statistical comparison between both treatments and each vertex of the triangle is associated with a possible result of the comparison for a) SVM, b) GRU, and c) BERT strategies. When comparing two algorithms A and B over a specific dataset, the Bayesian Wilcoxon signed-rank test gives the likelihood of occurrence of three different scenarios: A outperforms B; A and B perform similarly (referred to as the rope, or region of practical equivalence); and B outperforms A. To help visualize this analysis, in Fig. 3 we map 150,000 Monte Carlo samples in barycentric coordinates as proposed by (Benavoli et al., 2014), where each vertex of the triangle is associated with one Bayesian test scenario. For example, using the data provided in Table 7, the Bayesian test concluded that for 147,741 out of 150,000 samples, combining text+metadata is advantageous over exclusively using text when predictions are made with BERT.
6.2 What is the contribution of each metadata feature to the final result?
To measure the dependency between metadata features and labels, we calculated their Mutual Information (MI) scoresFootnote 4. We did this for each of the features, in each of the seven collections, to then report their average value. The higher this value, the greater the dependency between the given feature and the labelsFootnote 5, and therefore, the greater the relevance of the former for the prediction of the category of the tweets.
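This computation can be sketched with scikit-learn's `mutual_info_classif` on synthetic data (the feature names are invented; one feature is constructed to depend on the label while the other is noise, so the former should receive the higher MI score):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 400
labels = rng.integers(0, 2, size=n)

# "statuses_count" is made to depend on the label; "hour" is noise.
statuses_count = labels * 50 + rng.normal(0, 5, size=n)
hour = rng.integers(0, 24, size=n).astype(float)
X = np.column_stack([statuses_count, hour])

# MI between each metadata column and the class labels; higher means
# a stronger dependency, hence a more relevant feature.
mi = mutual_info_classif(X, labels, random_state=0)
```

Averaging such per-feature scores over the seven collections yields the ranking visualized in the figure discussed next.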
Figure 4 shows the average MI score obtained for each metadata feature, where the sizes of the polar bars correspond to these scores; that is, the bars furthest from the center correspond to the features most associated with the class labels, while those closest to it correspond to features whose values are independent of the labels. Accordingly, we observe that most user-based metadata obtain higher values compared to tweet-based features. This suggests the relevance of profile-based features to the classification task at hand, the most effective being the following users’ metadata: [Statuses count], [Favorites count], and [Listed count].
6.3 Corrections and new errors when considering metadata
To shed some light on how adding metadata corrects some cases, we present Table 8. From these examples, we elaborate on how metadata could be influencing the final decision of the classifier.
- The first tweet includes some trigger words, but their context is not clear enough. However, observing that the user who wrote it still has the default profile, which is usually interpreted as avoiding network engagement, the classifier changed its decision, labeling it as a hateful message.
- Despite the fact that the second tweet contains profanity in reference to a song’s title, the user has a large number of followers and status updates, which are unusual characteristics for haters in the corresponding dataset.
- The user who sent the third tweet, containing a white supremacist message, has a small number of friends and followers. In addition, the account is older than six years yet still uses a default profile. All this extra information influences the classifier to modify its decision, indicating that it is a hate-speech message.
- Despite the fourth tweet containing trigger words such as “gay” and “queer”, the user actually uses them not as an insult but to describe and praise an actor. The positiveness of the message is paired with moderately high statuses and friends counters.
On the other hand, there are cases where adding metadata influences misclassification. Table 9 presents examples that were correctly classified in the absence of metadata and misclassified when the context was included. Some of the things worth considering to get an idea of what might have caused the new errors are:
- The user who posted the first tweet, attacking immigrants, has high followers and listed counters, qualities that, at least for that specific collection, the classifier learned to associate with messages devoid of hate speech.
- The second tweet employs a trigger word in a disagreeable joke; however, the user who wrote it has sizable follower, friend, and favorites counters, does not use a default profile, and has made many status updates, all of which are common characteristics of users who do not usually post offensive messages.
- The third tweet shows an example of informal language labeled as “Neither HS/offensive”. When considering the metadata, in particular that the tweet was not retweeted or marked as favorite, the classifier changed its decision from “non-offensive” to “offensive”.
6.4 Discussion: theoretical and practical implications of our research
Several social networks have currently decided to ban specific forms of speech. The European Commission, for example, has set a number of commitments to counteract the spread of hate speech in collaboration with businesses such as Facebook and Twitter. In this regard, research into the creation of automatic algorithms for detecting abusive language is important, because a manual and complete assessment of content is obviously unfeasible.
We are aware of the tension that exists between free expression and content censorship as a means to reduce hate speech (Apple, 2022; DiLeo, 2017). Our stance is that machine learning technology can help to tag social media content and let users themselves choose whether to view or block that content. In other words, a classification system may warn users but should not restrict them.
Previous works have mainly addressed the identification of abusive communication by exploiting either hand-crafted or learned features from the explicit text in the posts. However, we believe the consideration of metadata presents an opportunity to improve current detection strategies, as we have shown that these could benefit from information about the social interactions that take place alongside text exchanges. Nonetheless, it is important to acknowledge that including authors’ metadata could raise ethical and fairness issues related to racial or gender bias. To avoid this, we need to pay special attention to the type of metadata that is considered. In this sense, an interesting finding of this study is that posts’ metadata is more informative than users’ metadata, which opens the possibility of avoiding some of these risks.
This study builds on the general usefulness of metadata in social media; as Poletto et al. (2021) note, Twitter is currently the most exploited source of textual data for building collections of abusive language. With this work, we hope to draw more attention to the usefulness, and fair treatment, of metadata by making resources available to continue studying the undesirable phenomenon of abusiveness in social networks.
7 Conclusions
As access to the internet becomes easier for all kinds of purposes, it is important to develop effective methods to moderate and monitor abusive content. In this study, we explored whether the inclusion of metadata features extracted from tweets and authors yields better results for spotting abusive content than considering only the explicit text of the tweets. For this, we extended seven Twitter benchmark datasets by including context in the form of metadata features such as retweet count, favorite count, and reply status, among other variables. The results and their statistical analysis strongly suggest that a pipeline considering text and metadata obtains a clear advantage over the traditional approach of only using text. To reduce model bias we considered three text representation schemes: Bag of Words, GloVe embeddings, and BERT contextualized vectors. Analysis of the results also indicates that if only one metadata feature is to be used, it should be one extracted from the user account, while if metadata features are used as a complete set, then tweet-based features should be preferred. As future work, we also want to explore the addition of followers’ information as metadata that could improve the representation of the users.
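The text-plus-metadata pipeline described above can be illustrated with a minimal sketch. This is not the authors’ exact implementation; it assumes scikit-learn (which the paper cites), a toy dataset, and an illustrative choice of a Bag-of-Words representation, three of the metadata features named in the conclusions, and a logistic regression classifier:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy data: tweet text plus a few of the metadata features mentioned above.
data = pd.DataFrame({
    "text": ["you are awful", "great song today", "get out of here", "lovely weather"],
    "retweet_count": [0, 12, 1, 5],
    "favorite_count": [0, 30, 0, 8],
    "is_reply": [1, 0, 1, 0],
})
labels = [1, 0, 1, 0]  # 1 = abusive, 0 = not abusive (illustrative labels)

# Bag of Words for the text column; metadata columns passed through as
# numeric features so both views reach the classifier side by side.
features = ColumnTransformer([
    ("bow", CountVectorizer(), "text"),
    ("meta", "passthrough", ["retweet_count", "favorite_count", "is_reply"]),
])

clf = Pipeline([("features", features), ("model", LogisticRegression())])
clf.fit(data, labels)
print(clf.predict(data))
```

Swapping `CountVectorizer` for GloVe or BERT sentence vectors changes only the text branch of the `ColumnTransformer`; the metadata branch stays the same, which is what makes the text-versus-text-plus-metadata comparison in the paper straightforward to run across representations.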
Data Availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Notes
In just one minute: Facebook users upload 147,000 photos, Twitter registers 319 new users, Instagram adds 350,000 new stories, etc. Source: https://www.socialmediatoday.com/news/what-happens-on-the-internet-every-minute-2020-version-infographic/583340/
It is important to remark that although this data is particular to specific posts, the privacy of its authors is never compromised.
Those tweets were probably easier to spot and deleted by Twitter itself because of the racist keywords used for corpus collection.
A zero value means both variables are independent.
References
Álvarez-Carmona, M., Guzmán-Falcón, E., Montes-y Gómez, M., & et al. (2018). Overview of MEX-A3T at IberEval 2018: Authorship and aggressiveness analysis in Mexican Spanish tweets. CEUR Workshop Proceedings, 2150, 74–96. https://ceur-ws.org/Vol-2150/overview-mex-a3t.pdf.
Amjad, M., Ashraf, N., Zhila, A., & et al. (2021). Threatening language detection and target identification in urdu tweets. IEEE Access, 9, 128302–128313. https://doi.org/10.1109/ACCESS.2021.3112500.
Apple, K. (2022). When the shield becomes the sword: the evolution of section 230 from a free speech shield to a sword of censorship. Working paper. https://ssrn.com/abstract=4045663.
Aragón, M. E., Jarquín-Vásquez, H. J., y Gómez, M. M., & et al. (2020). Overview of mex-a3t at iberlef 2020: Fake news and aggressiveness analysis in mexican spanish. In IberLEF@SEPLN, vol 2664. CEUR Workshop Proceedings (CEUR-WS.org, pp. 222–235). https://ceur-ws.org/Vol-2664/mex-a3t_overview.pdf.
Basile, V., Bosco, C., Fersini, E., & et al. (2019). SemEval-2019 task 5 Multilingual detection of hate speech against immigrants and women in Twitter. In Proceedings of the 13th international workshop on semantic evaluation. https://doi.org/10.18653/v1/S19-2007. https://aclanthology.org/S19-2007 (pp. 54–63). Minnesota: Association for computational linguistics.
Benavoli, A., Mangili, F., Corani, G., & et al. (2014). A bayesian wilcoxon signed-rank test based on the dirichlet process. In Proceedings of the 31st international conference on international conference on machine learning - Volume 32. JMLR.org, ICML’14, p. II–1026–II–1034. http://proceedings.mlr.press/v32/benavoli14.pdf.
Benavoli, A., Corani, G., Demšar, J., & et al. (2017). Time for a change: A tutorial for comparing multiple classifiers through bayesian analysis. The Journal of Machine Learning Research, 18(1), 2653–2688. https://jmlr.org/papers/v18/16-305.html.
Burnap, P., & Williams, M. L. (2015). Cyber hate speech on twitter: An application of machine classification and statistical modeling for policy and decision making. Policy & Internet, 7(2), 223–242. https://doi.org/10.1002/poi3.85. https://onlinelibrary.wiley.com/doi/abs/10.1002/poi3.85.
Casavantes, M., López, R., & González-Gurrola, L. C. (2019). Uach at mex-a3t 2019: Preliminary results on detecting aggressive tweets by adding author information via an unsupervised strategy. In Proceedings of the first workshop on Iberian languages evaluation forum (IberLEF 2019), CEUR WS proceedings. https://ceur-ws.org/Vol-2421/MEX-A3T_paper_8.pdf.
Casavantes, M., González, L., & López, R. (2020). UACh at MEX-A3T 2020: Detecting aggressive tweets by incorporating author and message context. In Proceedings of the 2nd SEPLN workshop on Iberian languages evaluation forum (IberLEF) 2664. https://ceur-ws.org/Vol-2664/mexa3t_paper6.pdf.
Caselli, T., Basile, V., Mitrović, J., & et al. (2021a). HateBERT: Retraining BERT for abusive language detection in English. In Proceedings of the 5th workshop on online abuse and harms (WOAH 2021). Association for computational linguistics, pp. 17–25. https://doi.org/10.18653/v1/2021.woah-1.3. https://aclanthology.org/2021.woah-1.3.
Caselli, T., Schelhaas, A., Weultjes, M., & et al. (2021b). DALC: the Dutch abusive language corpus. In Proceedings of the 5th workshop on online abuse and harms (WOAH 2021). Association for Computational Linguistics, pp. 54–66. https://doi.org/10.18653/v1/2021.woah-1.6. https://aclanthology.org/2021.woah-1.6.
Chatzakou, D., Kourtellis, N., Blackburn, J., & et al. (2017). Mean birds: Detecting aggression and bullying on twitter. In Proceedings of the 2017 ACM on web science conference, WebSci ’17 (pp. 13–22). New York: Association for computing machinery, https://doi.org/10.1145/3091478.3091487.
Chen, Y., Zhou, Y., Zhu, S., & et al. (2012). Detecting offensive language in social media to protect adolescent online safety. In 2012 International conference on privacy, security, risk and trust and 2012 international confernece on social computing, pp. 71–80, https://doi.org/10.1109/SocialCom-PASSAT.2012.55.
Dadvar, M., Trieschnigg, D., Ordelman, R., & et al. (2013). Improving cyberbullying detection with user context. In P. Serdyukov, P. Braslavski, S. O. Kuznetsov, & et al. (Eds.) Advances in information retrieval (pp. 693–696). Berlin: Springer, https://doi.org/10.1007/978-3-642-36973-5_62.
Davidson, T., Warmsley, D., Macy, M. W., & et al. (2017). Automated hate speech detection and the problem of offensive language. In International conference on web and social media, pp. 512–515, https://doi.org/10.1609/icwsm.v11i1.14955.
Devlin, J., Chang, M.-W., Lee, K., & et al. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, vol 1 (Long and Short Papers). https://doi.org/10.18653/v1/N19-1423. https://aclanthology.org/N19-1423 (pp. 4171–4186). Minnesota: Association for computational linguistics.
DiLeo, D. (2017). Social media terms and conditions - the delicate balancing act between online safety and free speech censorship. In Student works 929. https://scholarship.shu.edu/student_scholarship/929.
Fersini, E., Nozza, D., & Rosso, P. (2018). Overview of the evalita 2018 task on automatic misogyny identification (ami). In EVALITA Evaluation of NLP and speech tools for Italian: proceedings of the final workshop 12-13 December 2018, Naples. Torino: Accademia University Press, https://doi.org/10.4000/books.aaccademia.4497.
Fortuna, P., & Nunes, S. (2018). A survey on automatic detection of hate speech in text. ACM Computing Surveys, 51(4), 1–30. https://doi.org/10.1145/3232676.
Gambäck, B., & Sikdar, U. K. (2017). Using convolutional neural networks to classify hate-speech. In Proceedings of the first workshop on abusive language online. https://doi.org/10.18653/v1/W17-3013. https://aclanthology.org/W17-3013 (pp. 85–90). Vancouver: Association for computational linguistics.
Géron, A. (2017). Hands-on machine learning with scikit-learn and tensorflow: concepts, tools, and techniques to build intelligent systems, 1st edn. O’Reilly Media, Inc.
Gomez, R., Gibert, J., Gomez, L., & et al. (2020). Exploring hate speech detection in multimodal publications. In 2020 IEEE winter conference on applications of computer vision (WACV), pp. 1459–1467, https://doi.org/10.1109/WACV45572.2020.9093414.
Hamilton, W. L., Ying, R., & Leskovec, J. (2017). Inductive representation learning on large graphs. In Proceedings of the 31st international conference on neural information processing systems, NIPS’17 (pp. 1025–1035). Red Hook, NY: Curran Associates Inc.
Hatebase, I. (2021). Hatebase. https://hatebase.org/. Accessed 17 Feb 2023.
Jiang, A., Yang, X., Liu, Y., & et al. (2022). Swsr: A chinese dataset and lexicon for online sexism detection. Online Social Networks and Media, 27, 100182. https://doi.org/10.1016/j.osnem.2021.100182. https://www.sciencedirect.com/science/article/pii/S2468696421000604.
Kumar, R., Ojha, A. K., Malmasi, S., & et al. (2018). Benchmarking aggression identification in social media. In Proceedings of the first workshop on trolling, aggression and cyberbullying (TRAC-2018). https://aclanthology.org/W18-4401 (pp. 1–11). New Mexico: Association for computational linguistics.
Liu, P., Li, W., & Zou, L. (2019). NULI at SemEval-2019 task 6 Transfer learning for offensive language detection using bidirectional transformers. In Proceedings of the 13th international workshop on semantic evaluation. https://doi.org/10.18653/v1/S19-2011. https://aclanthology.org/S19-2011 (pp. 87–91). Minneapolis, Minnesota: Association for computational linguistics.
Mandl, T., Modha, S., Majumder, P., & et al. (2019). Overview of the hasoc track at fire 2019: Hate speech and offensive content identification in indo-european languages, FIRE ’19, (pp. 14–17). New York: Association for computing machinery.
Mozafari, M., Farahbakhsh, R., & Crespi, N. (2020). A bert-based transfer learning approach for hate speech detection in online social media. In H. Cherifi, S. Gaito, J. F. Mendes, & et al. (Eds.) Complex networks and their applications VIII (pp. 928–940). Cham: Springer, https://doi.org/10.1007/978-3-030-36687-2_77.
Nelatoori, K., & Kommanti, H. (2022). Multi-task learning for toxic comment classification and rationale extraction. Journal of Intelligent Information Systems.
Nikolov, A., & Radivchev, V. (2019). Nikolov-radivchev at SemEval-2019 task 6: Offensive tweet classification with BERT and ensembles. In Proceedings of the 13th international workshop on semantic evaluation. https://doi.org/10.18653/v1/S19-2123. https://aclanthology.org/S19-2123 (pp. 691–695). Minneapolis, Minnesota: Association for computational linguistics.
Nobata, C., Tetreault, J., Thomas, A., & et al. (2016). Abusive language detection in online user content. In Proceedings of the 25th international conference on world wide web. international world wide web conferences steering committee, Republic and Canton of Geneva, CHE, WWW ’16, pp. 145–153, https://doi.org/10.1145/2872427.2883062.
Pamungkas, E. W., Basile, V., & Patti, V. (2021). A joint learning approach with knowledge injection for zero-shot cross-lingual hate speech detection. Information Processing & Management, 58(4), 102544. https://doi.org/10.1016/j.ipm.2021.102544. https://www.sciencedirect.com/science/article/pii/S0306457321000510.
Pandey, R., & Singh, J. (2022). Bert-lstm model for sarcasm detection in code-mixed social media post. Journal of Intelligent Information Systems, 1–20.
Pedregosa, F., Varoquaux, G., Gramfort, A., & et al. (2011). Scikit-learn: machine learning in python. Journal of Machine Learning Research, 12 (85), 2825–2830. http://jmlr.org/papers/v12/pedregosa11a.html.
Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). https://doi.org/10.3115/v1/D14-1162. https://aclanthology.org/D14-1162 (pp. 1532–1543). Doha: Association for Computational Linguistics.
Plaza del Arco, F. M., Molina-González, M. D., Ureña-López, L. A., & et al. (2021a). Comparing pre-trained language models for spanish hate speech detection. Expert Systems with Applications, 166, 114120. https://doi.org/10.1016/j.eswa.2020.114120. https://www.sciencedirect.com/science/article/pii/S095741742030868X.
Plaza del Arco, F. M., Montejo-Ráez, A., Ureña-López, L. A., & et al. (2021b). OffendES: A new corpus in Spanish for offensive language research. In Proceedings of the international conference on recent advances in natural language processing (RANLP 2021), INCOMA Ltd., Held Online, pp. 1096–1108. https://aclanthology.org/2021.ranlp-1.123.
Poletto, F., Basile, V., Sanguinetti, M., & et al. (2021). Resources and benchmark corpora for hate speech detection: a systematic review. Language Resources and Evaluation, 55, 477–523. https://doi.org/10.1007/s10579-020-09502-8.
Pronoza, E., Panicheva, P., Koltsova, O., & et al. (2021). Detecting ethnicity-targeted hate speech in russian social media texts. Information Processing & Management, 58(6), 102674. https://doi.org/10.1016/j.ipm.2021.102674. https://www.sciencedirect.com/science/article/pii/S0306457321001606.
Ribeiro, M., Calais, P., Santos, Y., & et al. (2018). Characterizing and detecting hateful users on twitter. In Proceedings of the international AAAI conference on web and social media 12(1). https://doi.org/10.1609/icwsm.v12i1.15057. https://ojs.aaai.org/index.php/ICWSM/article/view/1505.
Sanguinetti, M., Comandini, G., di Nuovo, E., & et al. (2020). Haspeede 2 @ evalita2020: Overview of the evalita 2020 hate speech detection task. In V. Basile, D. Croce, M. Di Maro, & et al. (Eds.) Proceedings of the seventh evaluation campaign of natural language processing and speech tools for Italian. Final Workshop (EVALITA 2020), vol 2765. CEUR Workshop Proceedings (CEUR-WS.org).
Schmidt, A., & Wiegand, M. (2017). A survey on hate speech detection using natural language processing. In Proceedings of the fifth international workshop on natural language processing for social media. https://doi.org/10.18653/v1/W17-1101. https://aclanthology.org/W17-1101 (pp. 1–10). Valencia: Association for computational linguistics.
Schulz, W. S., Guess, A. M., Barberá, P., & et al. (2020). (Mis)representing Ideology on Twitter: How social influence shapes online political expression. In Working paper. https://simonmunzert.github.io/meof/material/schulz-et-al-ideology-twitter-apsa.pdf.
Sharma, A., Kabra, A., & Jain, M. (2022). Ceasing hate with moh: Hate speech detection in hindi–english code-switched language. Information Processing & Management, 59(1), 102760. https://doi.org/10.1016/j.ipm.2021.102760. https://www.sciencedirect.com/science/article/pii/S0306457321002417.
Twitter, I. (2021a). Tweet object — twitter developers. Accessed 18 Nov 2021. https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object.
Twitter, I. (2021b). User object — twitter developers. Accessed 18 Nov 2021. https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/user-object.
Vidgen, B., & Derczynski, L. (2021). Directions in abusive language training data, a systematic review: Garbage in, garbage out. PLoS ONE, 15(12), 1–32. https://doi.org/10.1371/journal.pone.0243300.
Vidgen, B., Nguyen, D., Margetts, H., & et al. (2021). Introducing CAD: the contextual abuse dataset. In Proceedings of the 2021 conference of the North American chapter of the association for computational linguistics: human language technologies, Association for computational linguistics, pp. 2289–2303. https://doi.org/10.18653/v1/2021.naacl-main.182. https://aclanthology.org/2021.naacl-main.182.
Waseem, Z., & Hovy, D. (2016). Hateful symbols or hateful people? predictive features for hate speech detection on Twitter. In Proceedings of the NAACL student research workshop. https://doi.org/10.18653/v1/N16-2013. https://aclanthology.org/N16-2013 (pp. 88–93). San Diego: Association for computational linguistics.
Zhang, Z., Robinson, D., & Tepper, J. (2018). Detecting hate speech on twitter using a convolution-gru based deep neural network. In A. Gangemi, R. Navigli, M.-E. Vidal, & et al. (Eds.) The semantic web (pp. 745–760). Cham: Springer, https://doi.org/10.1007/978-3-319-93417-4_48.
Funding
This work was supported by the Mexican National Council for Science and Technology (CONACYT) under grant agreements no. 701616 and no. 654803.
Author information
Contributions
Conceptualization: [Marco Casavantes]; Methodology: [Mario Ezra Aragón]; Formal analysis: [Marco Casavantes]; Investigation: [Marco Casavantes, Mario Ezra Aragón]; Data curation: [Marco Casavantes]; Validation: [Mario Ezra Aragón]; Writing - original draft preparation: [Marco Casavantes, Mario Ezra Aragón]; Writing -review and editing: [Luis C. González, Manuel Montes-y-Gómez]; Supervision: [Luis C. González, Manuel Montes-y-Gómez]; Project administration: [Luis C. González, Manuel Montes-y-Gómez].
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Casavantes, M., Aragón, M.E., González, L.C. et al. Leveraging posts’ and authors’ metadata to spot several forms of abusive comments in Twitter. J Intell Inf Syst 61, 519–539 (2023). https://doi.org/10.1007/s10844-023-00779-z