
1 Introduction

Online social networking sites are becoming more popular each day. According to a report by DreamGrow, Twitter ranks among the 15 most popular social networking sites, with 330 million monthly active users, placing it 4th on that list [2]. Unfortunately, the number of bot accounts is surprisingly large: a recent paper from the University of Southern California and Indiana University suggests that up to 15% of Twitter accounts are in fact bots rather than people [3].

Bots on Twitter are accounts controlled by software that automatically produce content and interact with other users. Some of these bots use Twitter as a tool to announce news headlines, while others use the platform for marketing; such bots are considered useful. However, there is a growing record of misuse of bot accounts. These accounts may be designed to mimic human behavior and then sold to users aiming to boost their popularity with fake followers [4], used to promote terrorist propaganda [5], or used by organizations to influence public opinion [6].

Twitter allows bots on the platform as long as they adhere to its rules: automated likes are not allowed, and automated retweets are only allowed for entertainment, informational, or novelty purposes [twitter-automation]. These rules, among other issues, were addressed after the DARPA challenge, a Twitter bot detection challenge created to study malicious activities carried out by bot accounts [7]. While working on this project, we noticed that published papers focus on bot detection, yet accounts that try to hide their constant violations of Twitter rules are not always fully automated. Bots can be turned on and off as needed: when the program is not running the account, a human posts instead, which makes these accounts harder to detect.

In this paper, we define the criteria that characterize how a bot account acts, which we refer to as bot-like behavior. We used criteria from an article published by Nimmo [1], along with others that we added as the study evolved. Unlike approaches that predict whether an account is a bot based on holdout data [3], we use a statistical approach that aims to provide explanatory insight into why our assignment is made. The rest of this paper is organized as follows. In Sect. 2, related work is discussed. In Sect. 3, we propose the criteria used for bot-like-behavior detection and use them to train a model that detects accounts satisfying any of the criteria, generating a report with the results for each dataset. In Sect. 4, we explain the results generated by the study.

2 Related Work

Twitter has been widely used since 2006, and its open structure led people to question who is tweeting early on. Chu et al. [8] classified Twitter users into human, bot, and cyborg accounts using four components, each of which checks a specific criterion, and then computed a score that enables classification. Davis et al. [9] created a service that evaluates the extent to which a Twitter account exhibits similarity to the known characteristics of social bots. Their platform fetches a given account's recent activity, then computes and returns a bot-likelihood score. The DARPA Twitter bot challenge [7] also addressed four different features, which were assigned to different teams to work on in the bot detection challenge. The detection systems created in the challenge were all semi-supervised, and all teams used human judgment to augment automated bot identification processes.

3 Bot-Like Behavior Detection

3.1 Data Collection

Twitter provides a set of API functions [10] that support user information collection. Our data was collected using the Twitter API, where we crawled the 200 most recent posts by users from a known bot list [11]. We used a dataset consisting of four types of manually verified Twitter bots: fake followers, traditional spam bots, social spam bots, and content polluters. We also pulled a list of verified legitimate users from the same source.
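As a minimal sketch of this collection step (the credentials and the `bot_list.csv` file of screen names are placeholders, not the study's actual resources), the Tweepy library can fetch each account's 200 most recent posts through the standard user timeline endpoint:

```python
import csv
import tweepy

# Placeholder credentials; real keys are issued through the Twitter developer portal.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

def fetch_recent_posts(screen_name, n=200):
    """Return up to the n most recent tweets posted by screen_name."""
    return api.user_timeline(screen_name=screen_name, count=n, tweet_mode="extended")

# bot_list.csv stands in for the manually verified bot list [11]: one screen name per row.
with open("bot_list.csv") as f:
    screen_names = [row[0] for row in csv.reader(f)]

timelines = {name: fetch_recent_posts(name) for name in screen_names}
```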

3.2 Methods

We designed a study to describe a list of user behaviors for each Twitter account. Using the article by Nimmo [1], we created a program that detects the features indicating bot-like behavior. After data collection, we ran the script to evaluate each user against our criteria. Using those results, we applied stepwise logistic regression based on Akaike Information Criterion (AIC) values to determine which of the 19 features were relevant when detecting bot-like behavior. The features are explained in Table 1.
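A simplified forward stepwise sketch of this selection procedure, assuming the per-user results sit in a pandas DataFrame `df` with a binary `is_bot` column (both names are illustrative, not the study's actual data files):

```python
import statsmodels.api as sm

def forward_stepwise_aic(df, target="is_bot"):
    """Greedy forward selection of features for a logistic model, scored by AIC."""
    remaining = [c for c in df.columns if c != target]
    selected, best_aic = [], float("inf")
    improved = True
    while improved and remaining:
        improved = False
        # Fit a candidate model for each remaining feature and record its AIC.
        scores = []
        for feat in remaining:
            X = sm.add_constant(df[selected + [feat]])
            fit = sm.Logit(df[target], X).fit(disp=0)
            scores.append((fit.aic, feat))
        aic, feat = min(scores)
        if aic < best_aic:  # keep the feature only if it lowers the AIC
            best_aic = aic
            selected.append(feat)
            remaining.remove(feat)
            improved = True
    return selected, best_aic
```

The study's actual stepwise direction and software may differ; this sketch only illustrates the AIC-driven selection idea.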

Table 1. Features used for bot-like-behavior detection.

4 Analysis

We tested our script on the data collected for all four bot categories. Examining the results, we used the cut-off value |z| = 2 as a threshold to extract the features most relevant to the model. The |z| = 2 cut-off corresponds to a two-sided hypothesis test with a significance level of approximately α = 0.05. A large z-score magnitude indicates that the corresponding true regression coefficient is unlikely to be 0 and that the variable matters. Based on the features we obtained, we described a series of bot-like behaviors.
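As an illustration of this filtering step (the `result` argument denotes a fitted statsmodels Logit results object, as in the sketch in Sect. 3.2; the helper name is ours), the reported z-statistics can be thresholded at |z| = 2 roughly as follows:

```python
def relevant_features(result, cutoff=2.0):
    """Return coefficients whose |z| statistic meets the cut-off, largest magnitude first."""
    z = result.tvalues  # for Logit fits, statsmodels reports z-statistics here
    strong = z[z.abs() >= cutoff]
    # A positive z suggests the feature raises the odds of bot-like behavior; a negative z lowers them.
    return strong.reindex(strong.abs().sort_values(ascending=False).index)
```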

All four types of bots had two features in common: "most_recent_time" with a negative z-score and "status_num" with a positive z-score, which indicates that a user who has been active more recently is most likely not a bot, and that a user with a high number of tweets is more likely to be a bot.

Based on the reports generated (Table 2), we noticed that each bot type has features that are more significant to that type. Fake follower bots are "simple accounts that inflate the number of followers of another account" [9]. Our results showed that fake follower accounts do not tweet frequently, but they have a significant number of friends, consistent with their purpose: making other accounts popular. Content polluter bots, on the other hand, are designed to generate spam while masquerading as humans [12]. According to our analysis, content polluters have a high average number of tweets per day and a significant number of friends. This corresponds to the idea of spam accounts in general, where accounts try to increase their outreach. In contrast, traditional spam bots are "a group of automated accounts spamming job offers" [9], which are easily identifiable as automated. In our dataset, the average time between two posts by traditional spam bots is short, and they rarely post retweeted content. Such behavior is consistent with accounts designed to post job advertisements. Finally, social spam bots are "spammers of products on sale at Amazon.com" or "spammers of paid apps for mobile devices" [9]. The report shows that these accounts post many hyperlinks and do not engage in conversations (Twitter mentions). The social spam bot behavior we found here is consistent with the content suggested by the source.

Table 2. Features relevant to bot types.

5 Conclusion

Our results demonstrate that bot-like behavior differs significantly with bot design. Specifically, one may be able to infer the functional purpose for which a bot was created by exploring the specific features along which that particular bot type differs from human users. Future work in the area of bot detection could benefit from combining explanatory approaches grounded in traditional statistical analyses with the machine-learning approaches already in widespread use.