
1 Introduction

Online social networking sites are becoming more popular each day. According to a report by DreamGrow, Twitter ranks among the 15 most popular social networking sites, with 330 million monthly active users, placing it 4th on that list [2]. Unfortunately, the number of bot accounts is surprisingly large: a recent paper from the University of Southern California and Indiana University suggests that up to 15% of Twitter accounts are in fact bots rather than people [3].

Bots on Twitter are accounts controlled by software that automatically produce content and interact with other users. Some of these bots use Twitter as a tool to announce news headlines, while others use the platform for marketing; such bots are considered useful. However, there is a growing record of misuse of bot accounts. These accounts may be designed to mimic human behavior and then sold to users aiming to boost their popularity with fake followers [4], used to promote terrorist propaganda [5], or used by organizations to influence public opinion [6].

Twitter allows bots on the platform as long as they adhere to its rules: automated likes are not allowed, and automated retweets are only allowed for entertainment, informational, or novelty purposes [twitter-automation]. These rules, among other issues, were addressed after the DARPA challenge, a Twitter bot detection challenge created to study malicious activities carried out by bot accounts [7]. While working on this project, we noticed that published papers focus on bot detection, yet accounts that try to hide their constant violations of Twitter rules are not always fully automated. Bots can be turned on and off as needed: when the program is not running the account, a human posts instead, which makes these accounts harder to detect.

In this paper, we define the criteria that characterize how a bot account acts, which we refer to as bot-like behavior. We used criteria from an article published by Nimmo [1], along with others that we added as the study evolved. Unlike approaches that predict whether an account is a bot based on holdout data [3], we use a statistical approach that aims to provide explanatory insight into why our assignment is made. The rest of this paper is organized as follows. In Sect. 2, related work is discussed. In Sect. 3, we propose the criteria used for bot-like-behavior detection and use them to train a model that detects accounts satisfying any of the criteria, generating a report with the results for each dataset. In Sect. 4, we explain the results generated by the study.

2 Related Work

Twitter has been widely used since 2006, and its open structure led people to question who is tweeting early on. Chu et al. [8] classified Twitter users into human, bot, and cyborg accounts using four components, each of which checks a specific criterion, and then computed a score that enables classification. Davis et al. [9] created a service that evaluates the extent to which a Twitter account exhibits similarity to the known characteristics of social bots. Their platform fetches a given account's recent activity, then computes and returns a bot-likelihood score. The DARPA Twitter bot challenge [7] also addressed four different features, which were assigned to different teams to work on in the bot detection challenge. The detection systems created in the challenge were all semi-supervised, and all teams used human judgment to augment automated bot identification processes.

3 Bot-Like Behavior Detection

3.1 Data Collection

Twitter provides a set of API functions [10] that support user information collection. Our data was collected using the Twitter API, where we crawled the 200 most recent posts by users from a known bot list [11]. We used a dataset consisting of four types of manually verified Twitter bots: fake followers, traditional spam bots, social spam bots, and content polluters. We also pulled a list of verified legitimate users from the same source.
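As a minimal sketch of this collection step (the credentials and the `bot_list.csv` file of screen names are placeholders, not the study's actual resources), the Tweepy library can fetch each account's 200 most recent posts through the standard user timeline endpoint:

```python
import csv
import tweepy

# Placeholder credentials; real keys are issued through the Twitter developer portal.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

def fetch_recent_posts(screen_name, n=200):
    """Return up to the n most recent tweets posted by screen_name."""
    return api.user_timeline(screen_name=screen_name, count=n, tweet_mode="extended")

# bot_list.csv stands in for the manually verified bot list [11]: one screen name per row.
with open("bot_list.csv") as f:
    screen_names = [row[0] for row in csv.reader(f)]

timelines = {name: fetch_recent_posts(name) for name in screen_names}
```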

3.2 Methods

We designed a study to describe a list of user behaviors for each Twitter account. Using the article by Nimmo [1], we created a program that detects the features indicating bot-like behavior. After data collection, we ran the script to evaluate each user against our criteria. Using those results, we applied stepwise logistic regression based on Akaike Information Criterion (AIC) values to determine which of the 19 features were relevant when detecting bot-like behavior. The features are explained in Table 1.
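A simplified forward stepwise sketch of this selection procedure, assuming the per-user results sit in a pandas DataFrame `df` with a binary `is_bot` column (both names are illustrative, not the study's actual data files):

```python
import statsmodels.api as sm

def forward_stepwise_aic(df, target="is_bot"):
    """Greedy forward selection of features for a logistic model, scored by AIC."""
    remaining = [c for c in df.columns if c != target]
    selected, best_aic = [], float("inf")
    improved = True
    while improved and remaining:
        improved = False
        # Fit a candidate model for each remaining feature and record its AIC.
        scores = []
        for feat in remaining:
            X = sm.add_constant(df[selected + [feat]])
            fit = sm.Logit(df[target], X).fit(disp=0)
            scores.append((fit.aic, feat))
        aic, feat = min(scores)
        if aic < best_aic:  # keep the feature only if it lowers the AIC
            best_aic = aic
            selected.append(feat)
            remaining.remove(feat)
            improved = True
    return selected, best_aic
```

The study's actual stepwise direction and software may differ; this sketch only illustrates the AIC-driven selection idea.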

Table 1. Features used for bot-like-behavior detection.

4 Analysis

We tested our script on the data collected for all four bot categories. Examining the results, we used the cut-off value |z| = 2 as a threshold to extract the features most relevant to the model. The |z| = 2 cut-off corresponds to a two-sided hypothesis test with a significance level of approximately α = 0.05. A large z-score magnitude indicates that the corresponding true regression coefficient is unlikely to be 0 and that the variable matters. Based on the features we obtained, we described a series of bot-like behaviors.
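As an illustration of this filtering step (the `result` argument denotes a fitted statsmodels Logit results object, as in the sketch in Sect. 3.2; the helper name is ours), the reported z-statistics can be thresholded at |z| = 2 roughly as follows:

```python
def relevant_features(result, cutoff=2.0):
    """Return coefficients whose |z| statistic meets the cut-off, largest magnitude first."""
    z = result.tvalues  # for Logit fits, statsmodels reports z-statistics here
    strong = z[z.abs() >= cutoff]
    # A positive z suggests the feature raises the odds of bot-like behavior; a negative z lowers them.
    return strong.reindex(strong.abs().sort_values(ascending=False).index)
```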

All four types of bots had two features in common: "most_recent_time" with a negative z-score and "status_num" with a positive z-score, which indicates that a user who has been active more recently is most likely not a bot, and that a user with a high number of tweets is more likely to be a bot.

Based on the reports generated (Table 2), we noticed that each bot type has features that are more significant to that type. Fake follower bots are "simple accounts that inflate the number of followers of another account" [9]. Our results showed that fake follower accounts do not tweet frequently, but they have a significant number of friends, consistent with their purpose: making other accounts popular. Content polluter bots, on the other hand, are designed to generate spam while masquerading as humans [12]. According to our analysis, content polluters have a high average number of tweets per day and a significant number of friends. This corresponds to the idea of spam accounts in general, where accounts try to increase their outreach. In contrast, traditional spam bots are "a group of automated accounts spamming job offers" [9], which are easily identifiable as automated. In our dataset, the average time between two posts by traditional spam bots is short, and they rarely post retweeted content. Such behavior is consistent with accounts designed to post job advertisements. Finally, social spam bots are "spammers of products on sale at Amazon.com" or "spammers of paid apps for mobile devices" [9]. The report shows that these accounts post many hyperlinks and do not engage in conversations (Twitter mentions). The social spam bot behavior we found here is consistent with the content suggested by the source.

Table 2. Features relevant to bot types.

5 Conclusion

Our results demonstrate that bot-like behavior differs significantly with bot design. Specifically, one may be able to infer the functional purpose for which a bot was created by exploring the specific features along which that particular bot type differs from human users. Future work in the area of bot detection could benefit from combining explanatory approaches grounded in traditional statistical analyses with the machine-learning approaches already in widespread use.