Keywords

1 Introduction

Extremist groups’ powerful use of online social networks (OSNs) to disseminate propaganda and garner support has motivated intervention strategies from industry as well as governments however early efforts to provide effective counter-narratives have not produced the results desired. Mr. Michael Lumpkin, the director of the United States Department of State’s Center for Global Engagement, is charged with leading efforts to “coordinate, integrate, and synchronize government-wide communications activities directed at foreign audiences in order to counter the messaging and diminish the influence of international terrorist organizations” [3]. In a recent interview, Mr. Lumpkin expressed the need for a new approach:

So we need to, candidly, stop tweeting at terrorists. I think we need to focus on exposing the true nature of what Daesh is.Mr. Michael Lumpkin, NPR Interview March 3, 2016

A logical follow-up question to Mr. Lumkin’s statement would be “Expose to whom?” Recent literature suggests that “unaffiliated sympathizers” who simply retweet or repost propaganda represent a paradigmatic shift that partly explains the unprecedented success of ISIS [1, 2] and could be the audience organizations like the Global Engagement Center need to focus on. Gaining understanding of this large population of unaffiliated sympathizers and the narratives most effective in influencing them motivates methods to detect and extract information from large online threat-group-supporting communities (OTGSC). However, detecting, monitoring, and data-mining targeted OTGSs requires novel methods, and development must include both data science and regional expertise. We define data science as a set of fundamental principles that support and guide the principled extraction of information and knowledge from data, and in this paper we present the CASOS Jihadist Twitter Community (CJTC), a online community of over 15,000 Twitter users who support one or more of the radical groups engaged in the ongoing conflicts in Northern Iraq and Syria. We describe how large OTGSCs can offer unique insights into the unaffiliated supporters who appear critical to ISIS’ success. We then provide an example of one method used to excite and grow these OGTSCs in the form of an active social botnet. The botnet attempts to elevate the social influence of users supportive to Jabhat al-Nusra’s agenda, while encouraging following ties amongst botnet followers. Our goal is to present two novel examples of social computing applied to counterterrorism, and motivate the continued interdisciplinary collaboration required to gain understanding of large online communities and effectively counter extremist propaganda.

2 The CASOS Jihadist Twitter Community (CJTC)

On November 13, 2015 much of the world watched as terrorist launched a series of coordinated attacks in Paris killing 130 people. In near real-time social media erupted with support for the victims of these attacks, but some online communities viewed the attacks as cause for celebration. In fact, passive supporting but unaffiliated social media users have become an essential element of groups like ISIS and Jabhat al-Nusra’s recruiting strategy, possibly aid the motivation and resourcing for attacks like those seen in Paris [1]. Large online social networks like Twitter offer a means to generate large online communities, and many of the members appear to be “unaffiliated supporters.” In fact, Twitter has suspended over 125,000 ISIS-supporting accounts from August to December of 2015 [4]. As ISIS recruiters identify community members who show increasing levels of radicalization, small teams of social media cadre have been observed lavishing attention on these recruitment targets and subsequently move the conversation to more secure online platforms [2]. Less secure but large open platforms like Twitter enable extremist groups propaganda to gain broad reach. Denying this key terrain requires novel methods designed specifically to identify and analyze threat-group-supporting communities embedded in OSNs. Information like key users, powerful narratives, and advanced dissemination methods can all be extracted from OTGSCs to inform messaging and intervention strategies. Benigni et al. present Iterative Vertex Clustering and Classification [5], a novel method to detect large, ideologically organized online communities, using both agent level attributes and network structure. We briefly present the methodology, introduce the CJTC, provide illustrative analysis of the network, and share ongoing research goals in this section.

2.1 Background: From Community Detection to Threat Network Detection

The application of network science to counter-terrorism has a long history [6, 7]; however, the rise of social media and online social networks (OSNs) has motivated methods to apply network science theory to networks at much larger scale. Community detection attempts to identify groups of vertices more densely connected to one another than to other vertices in a network, but networks extracted from OSNs present unique challenges due to their size and high clustering coefficients. Furthermore, an individual’s social network also often reflects his or her membership to many different social groups. Thus in many instances algorithms that use only network structure do not provide the precision needed to identify OTGSCs [5]. A sub-class of community detection methods has emerged that attempts to leverage node attributes and network structure called community detection in annotated networks. These methods have been shown to perform well with OSNs because of their ability to account for a great variety of vertex features like user account attributes while still capitalizing on the information provided by the structure of the graph; they also perform well at scale [8, 9]. However, we find that effective OTGSC detection requires information from users’ following, mention, and hashtag behaviors as well. Benigni et al. present IVCC, an community detection method designed to extract OTGSCs by modeling users within a heterogeneous graph structure with annotated nodes [5] (Fig. 1).

Fig. 1.
figure 1

IVCC is an online threat-group-Supporting community (OTGSC) detection methodology conducted in two phases. In Phase I either community optimization or vertex clustering algorithms are used to identify positive and negative case examples to facilitate supervised detection in Phase II.

2.2 Iterative Vertex Clustering and Classification

Iterative Vertex Clustering and Classification (IVCC) is conducted two phases, and often iteratively. In Phase I, unsupervised clustering methods like Newman and Louvain grouping are used to both identify positive cases labels and remove noise. This pre-clustering facilitates supervised classification of OTGSC members in Phase II. At the core of the methodology is the use of both user level features and rich multiplex network structures offered by OSNs. First the authors construct \(U_{u\times a}\) consisting of a numeric user attributes where u is the total number of users or nodes in the network. Examples of such attributes are follower count, number of posts, or creation date. Node attributes could also be developed from other sources of intelligence. Spectral methods are used to dimensionally reduce network data like following, mention, or hashtag behaviors. By constructing symmetric graphs of users’ following F and mention M relationships, and a weighted bipartite graph H of hash tags in a user’s timeline, lead eigenvectors can then be extracted from each graph and concatenated with U to form a feature space for classification. Although IVCC is presented using Twitter data [5], similar methods could be used more generally with large heterogeneous networks.

Benigni et al. collected a two-hop snowball sample of five popular ISIS propagandists presented in [10], resulting in approximately 120,000 Twitter users. With two iterations of IVCC, they removed accounts with high following counts (i.e. politicians, news media members, celebrities, etc.), and extracted a network of nearly 23,000 ISIS supporters. The results of this initial work form the seed accounts for the CJTC.

2.3 Threat Network Analysis: The CJTC

CASOS is currently extending IVCC to dynamically monitor threat-group-supporting online communities. By using historical results and active learning, we update the CJTC based on the recent community activity. Currently the community contains just over 15,000 supporters, where we define a supporter as a Twitter user who positively affirms the leadership, ideology, fighters, or call to Jihad of any of the known Jihadist groups engaged in ongoing operations in Northern Iraq and Syria. The majority of tweeters voice support for ISIS or Jabhat al-Nusra though other groups are present. The size of this community offers insights not easily gleaned from randomly sampled Twitter data or manually developed datasets as will be highlighted in the remainder of this section.

Fig. 2.
figure 2

The left panel depicts the volume of hashtags used within the CJTC from AUG-NOV 2015. The right panel highlights the hashtags most explanatory of the increased activiy on November 14, 2015.

Though many demographical analyses could be useful, for conciseness we will use temporal network activity patterns to illustrate information extraction from OTGSCs. The Twitter REST API limits collection to a tweeter’s last 3,200 posts which forces us to normalize daily volume. Some tweeters have more than 3,200 posts in the past 6 months, and quite a few of our tweeters have not posted in over 90 days. Identification of dormant users could provide insight into the radicalization process, but will not be analyzed or discussed in this work. We estimate the CJTC’s daily volume by normalizing based on the number of tweeters in our dataset who have a collected tweet before and after any given day which often highlights current events that stimulate this community. A simple news search of events on days of increased activity often reveals operational events in Syria, Northern Iraq, or large scale terror attacks. Similar analysis of hash tag trends often provides richer insight. Figure 2 highlights temporal analysis of CJTC hash tag use. The left panel of depicts hash tag frequencies over time, while the right panel depicts trending hashtags on 13–14 November, 2015. Size in the word cloud connotes frequency, and color denotes how anomalous a particular tag’s frequency was when compared to a 6 months average. The community’s reaction to the 13 November, 2015 Paris attacks is illustrated with both increased volume and trending hashtags. Increased hash tag volume depicted in the left panel of Fig 2, coupled with the corresponding hash tag trends in the right panel give startling insight into the unique nature of this online community. Ongoing operations in Syria provide another example. The hash tag , translated Zabadani, increases tenfold in terms of daily frequency on 15 August and 18 September, 2015. Both dates refer the breakdown of ceasefire agreements in the region [11]. With proper subject matter and language expertise, similar analysis can identify changes in popularity of leaders, organizations, or narratives over time.

2.4 Moving Forward

As a supervised learning methodology, IVCC lends itself to leveraging regional expertise by learning patterns based on examples. Active-learning refers to supervised algorithms that iteratively select examples to be labelled by experts, and have been found substantially increase performance with far fewer labelled instances. Such methods could enable regional expertise to be incorporated into the classifier at minimal cost. Furthermore, a user-oriented, server-based interface could enable the regional expert to contribute to the set of annotated instances while conducting his or her own exploratory data analysis. As the set of annotated examples or “training set” grows new, more nuanced classifiers could be trained. Due to the size and diversity of these online communities, exploration and interpretation of results is likely a research area unto itself. One could identify the news sources or propagandists most influential within these communities, and develop more-informed counter-narratives and strategic communications strategies. The challenge in developing tools and methods to facilitate OTGSC analysis lies in the novelty of the analytical task. Regional experts cannot yet articulate exactly what they want methods to provide, and researchers are challenged to understand what information extractions are most useful to senior leader information requirements. Establishing online tools that provide illustrative analyses and capture feedback while end users to explore large communities would likely accelerate research efforts aimed at countering groups like ISIS.

3 The FiribiNome Social Botnet: Sophisticated Promotion of Propaganda to Excite a Community

While analyzing the CJTC, as well as a similar dataset focused on online dialogue focused on the Russian occupation of Crimea, we observe accounts that tweet with high daily volume, but each tweet or retweet simply contains a string of @mentions. In this section we analyze a network of social bots used to promote specific online activists or propagandists.

Social bots, software automated social media accounts, have become increasingly common in OSNs. Though some provide useful services, like news aggregating bots, others can be used to shape online discourse [12]. ISIS’ use of bots has been well documented [13], and their competitors are following suit. Social botnets are teams of software controlled online social network accounts designed to mimic human users and manipulate discussion by increasing the likelihood of a supported account’s content going viral. The use of bots to influence political opinion has been observed in both domestically [14] and abroad [15], the use of social bots has been documented in the MENA region [12], and ISIS use of them motivated a DARPA challenge to develop detection methods [16]. In isolation, these accounts appear to be producing spam and relatively harmless, however they are examples of a sophisticated strategy to promote specific accounts while remaining undetected by Twitter.

Fig. 3.
figure 3

Depicts mention behaviors and their effects within the FiribiNome Social Botnet. The left panel depicts two scaled time series. The red circles and smoothed trend line depict the number of daily mentions by botnet members. The blue circles and corresponding trend line depict botnet followers’ mentions of benefactor accounts. The association between the two series implies the botnet was able to generate discussion about benefactor accounts among its followers. The right panel depicts the mention network of the FiribiNome social botnet. The vertices are user accounts. The plot depicts how botnet members, red vertices, are used to increase the social influence of benefactors, black vertices, by promoting them to botnet followers, blue vertices. Vertices are scaled by follower count (Color figure online).

3.1 CJTC Botnet Analysis

Figure 3 depicts the mention activity associated with the a Jabhat al-Nusra supporting social botnet designed to increase the social influence of a specific set of accounts and encourage following connections between Jabhat al-Nusra supporting tweeters. The botnet consists of two types of accounts. Botnet members, are depicted by red vertices in the right panel of Fig. 3, and consist of 74 accounts exhibiting near identical behavior. Each account follows between 116 and 134 accounts, most of which are botnet members. Their following counts vary from 142 to 322 accounts of which many appear to be real tweeters. They come online for 38–58 days, tweet between 71 to 170 times, then go dormant. This behavior can clearly be seen by the red trend line in Fig. 3. Their tweets consist of original posts or retweets containing strings of @mentions of other botnet members, but occasionally mention or retweet content from what we call benefactor accounts (depicted by black vertices in the right panel of Fig. 3). The botnet account FiribiNome20 illustrates this behavior. In isolation, these accounts appear to be producing spam and relatively harmless, however our analysis indicates the network of botnet members increases the social influence of benefactor accounts. The blue series in the left panel of Fig. 3 and corresponding blue vertices in the right panel depict the mention activity of the 843 active botnet followers as of February 2016. The left panel depicts follower accounts’ mentions of benefactor accounts and the temporal relationship betwen the activity associated with each account type implying the botnet effectively promotes discussion of benefactor accounts. How much discussion is generated remains an open question. Due to the large number of extremist accounts suspended by Twitter, the number of botnet followers active in the summer of 2014 was likely much larger. This mention behavior exhibited by botnet members could also trigger Twitter’s recommendation system to recommend following ties between botnet followers,or encourage botnet followers to follow benefactors.

Examples of benefactor accounts are depicted in Table 1; each representing a slightly different style and type of messaging commonly observed in the CJTC. Dr. Hani al-Sibai is a London-based radical Islamic Scholar cited by Ansar al-Sharia as one of five influential motivators of Tunisian terrorists [17]. @ba8yaa or “Daesh are the Enemy” attempts to discredit ISIS through satire and counter-propaganda and could prove informative in development of counter-narratives. There are also many accounts that present the appearance of reporting near-real-time news like @Ghshmarjhy, while other accounts promote third party applications like @Almokhtsar and @FiribiNome12. We have found some of these applications request permission to tweet or follow users on the tweeter’s behalf. These highly followed and highly mentioned accounts each could offer insight into the sophisticated methods used to leverage social media.

Table 1. Depicts four account promoted by the FiribiNome social botnet. Each account represents a slightly different style and type of messaging.

3.2 Moving Forward

It is possible that botnet structures with similar mention behavior could be developed in a more sophisticated manner. Larger networks with more human-like behavior would be much more challenging to detect. The FiribiNome botnet could simply represent a proof of concept explaining its lack of activity since 2014. Although simple heuristics like average mentions per tweet enabled us to detect the botnet, more advanced detection strategies are needed to determine if more sophisticated botnets are influencing the CJTC. Methods of operationalizing this type of intelligence are worth exploring as well. It is possible that similar mention behaviors could be used to target specific online communities with counter-narratives. Again, the need for an interdisciplinary collaboration between the data scientist, regional expert, and decision maker is needed to identify opportunities for useful intelligence extraction.

4 Conclusion

We have highlighted the potential of extracting intelligence from large online threat-group-supporting communities (OTGSCs) and presented illustrative examples with a goal of motivating continued interdisciplinary collaboration. We have also presented the CJTC dataset as an example of an OTGSC to emphasize how detecting and monitoring OTGSCs can be an important tool in understanding the passive support structure essential to the distribution of extremist propaganda. Furthermore, these methods could facilitate identification of sophisticated dissemination techniques used in these communities and inform our own information operations. Our goal is to refine these methods and grow a consortium of data scientists, regional experts, and strategic decision makers by hosting, curating, and reporting on datasets like the CJTC.