1 Introduction

Online social networking websites are increasingly affected by botnets, which are groups of automated software programs (bots) that control social network user accounts for malicious purposes [15]. A social botnet is a group of bots controlled by a botmaster. The botmaster establishes social relationships between the social bots and normal online social networking participants (users) in order to reduce the probability of identification [35]. Like traditional bots (e.g., in Internet chat), social bots are most common on Twitter [22]. A social bot is created with the support of open APIs (such as the Twitter API) [33]. Social bots have several applications, with either normal or malicious intent. In particular, the Twitter online social network is a suitable platform for creating social bots that mount sybil attacks, distribute spam or post useful information (such as news or blog updates) [43]. Social bots are used to influence other Twitter users by posting tweets that degrade services and manipulate public opinion [14]. Moreover, social bots add normal users as friends and expect a few normal users to follow them back. The tweets posted by the social bots are then displayed on the normal users' home pages, and a few normal users may be attracted by these tweets (which contain malicious text or malicious URLs) [28]. If normal users establish relationships with social bots, the entire social networking community is exposed to untrustworthy information [30]. Most existing techniques identify Twitter bots based on the follower ratio, URLs and incomplete user profiles. For example, a Twitter bot can retweet other users' tweets and can also follow other users; the follower ratio is therefore one of the important attributes for identifying a social bot [17].

Traditional botnet detection approaches mainly focus on peer-to-peer networks and botnet command-and-control protocols [11], and they fail to detect social bots. Social bots are most common on Twitter, where they obtain command-and-control information such as follower ratios, tweets and URLs. Traditional Twitter bots can easily be detected because they view a user's profile page frequently in order to obtain command-and-control information [27]. However, future bots may avoid this weakness by using the private message passing feature provided by online social networks to establish relationships among the social bots (which are controlled by the botmaster) [45]. The major limitation of the traditional approaches is that they detect social bots by considering only user profile-based features [12, 41]; consequently, they fail to distinguish legitimate users from new kinds of social bots. The challenges related to social bot detection in online social networks are as follows: (i) due to the continuous growth in the number of online social networking participants (users) and in the content posted every day, the complexity of analyzing user behavior in online social networks is rapidly increasing; (ii) in an online social network, user behavior changes rapidly over time, so it is important to extract the information needed to evaluate a user by identifying malicious behavior; (iii) a few malicious users may use tools to create fake accounts or manipulate their influence values; and (iv) a user may be influenced by various factors, such as the content posted in a tweet and the behavior of other users. Thus, it is important to distinguish social bots from legitimate users in online social networks.

To address the above challenges, Reinforcement Learning (RL) is adopted for the social botnet detection problem because user behavior in an online social network changes rapidly over time. To detect social bots, the learning agent learns from past experiences to reach a goal state through several episodes. The main objective of RL is to obtain an optimal policy (i.e., a process of selecting a specific learning action) at each state [31]. Two different learning strategies are adopted to determine the optimal policies quickly. In the first strategy, each learning agent (i.e., user) learns individually by considering the past experiences of another learning agent in a random environment. In the second strategy, each learning agent learns by (frequently) interacting with other learning agents (i.e., by commenting on and retweeting other users' tweets). Several reinforcement learning approaches have been proposed [25, 29, 39] to obtain the optimal policy. Q-learning is one such RL technique. In Q-learning, choosing an optimal policy from a large number of training samples is difficult, and Q-learning requires a small number of states in order to converge quickly [42]. To overcome this problem, we consider a deep Q-learning model that approximates the Q-value function (i.e., the state-action value function) with a deep Q-network, an efficient method of learning from high-dimensional data. Hence, compared to Q-learning, a deep Q-network can handle a larger number of states and converges more quickly [18].

In this paper, we consider a set of social attributes, namely tweet-based attributes (i.e., from the content of each user tweet), user profile-based attributes (i.e., from a series of each user's weekly tweets) and social graph-based attributes (i.e., the users' interactions with their friends and followers), to identify the suspicious behavior of social bots. The objective of this paper is to classify Twitter accounts into two categories, namely legitimate users and social bots. We consider each social attribute of a user as a state, and the learning agent's movement from one state to another is considered an action. For the Q-value function, we consider all the state-action pairs in order to construct the state transition probability values between them. In the proposed algorithm, the learning agent chooses a specific learning action with an optimal Q-value in each state for social bot detection. We summarize our contributions as follows:

  • A user behavioral analysis model is proposed by extracting and ranking social attributes (such as tweet-based attributes, user profile-based attributes and social graph-based attributes) in the Twitter network based on tweet propagation and user interactions.

  • We propose a deep Q-network based architecture by integrating a deep Q-learning model with social attributes for social bot detection based on the Q-value function (i.e., state-action value function). Further, an algorithm has been proposed to identify the most influential users (which are influenced by the social bots) based on tweets and the users' interactions.

  • The experiments are conducted on three datasets collected from the Twitter network, namely The Fake Project dataset [13], the Social Honeypot dataset [26] and the User Popularity Band dataset [19].

The rest of the paper is organized as follows: The related work is discussed in Section 2. In Section 3, a Deep Q-Learning algorithm has been proposed for detecting the social bots and identifying the most influential users in an online social network. The experimental results are discussed in Section 4. Finally, the conclusion of this paper is presented in Section 5.

2 Related work

In this section, prior work on the vulnerabilities of several social networks, social bot detection and the identification of top-k influential users in online social networks is discussed.

Social bots in online social networks (like Facebook and Twitter) have gained much attention recently. Chu et al. [12] have categorized Twitter users into three different groups (human, bot and cyborg) based on their tweeting behavior, account-based features and tweet content. Further, the authors have proposed a classification model with three major components: (i) an entropy-based component used to measure the regularity of a user's tweeting behavior, (ii) a spam detection component used to verify whether a tweet contains spam content and (iii) account-based features to classify the users. Yan [45] has discovered three different types of social botnets (appendix botnet, standalone botnet and crossover botnet) hidden in the Twitter network by dividing the graph into small connected components, which helps to effectively monitor social botnet activities. In that work, the Twitter network is analyzed by constructing a social network graph in which a node represents a Twitter user and an edge represents the information flow between two connected users; the sizes of weakly and strongly connected components are then investigated to identify suspicious activities of social bots in the Twitter network. Zhang et al. [47] have analyzed thousands of Twitter accounts and discussed two new types of social botnet attacks, namely manipulation of a user's influence value and spam distribution on Twitter. The authors have identified that the botmaster constructs a retweeting tree, where the root bot is regarded as the spam originator and all other bots only retweet spam content from their parent bot. In botnet-based manipulation of user influence, the authors have found that a few malicious users can manipulate their influence values to attract legitimate users. Further, the authors have presented two countermeasures to protect against social botnet attacks, based on maintaining a spam score for each user and identifying credible users among social bots. In [40], a method has been proposed to detect stegbots in multimedia social networks. The authors have shown that stegbots can affect legitimate users by performing malicious activities such as stealing sensitive information (like credit card details and passwords), phishing and spreading spam content. The authors have also extracted social attributes (such as image-based features, user profile-based features and network-based features) to distinguish between legitimate users and malicious users (stegbots). Kudugunta et al. [24] have proposed a deep neural network model based on the long short-term memory (LSTM) architecture. In this architecture, content-based features (such as retweet count, number of hashtags and number of mentions) and user metadata-based features (such as status count, follower count and default profile) are given as input to the LSTM for social botnet detection. The authors have also observed that considering only tweet-based features may not effectively detect social bots in online social networks.

Freitas et al. [17] have studied social bot infiltration strategies in the Twitter network. The authors have created social bots in the Twitter network that perform malicious activities, such as spam distribution, following other users and retweeting other users' tweets. Their work also shows that only 31% of the social bot accounts had been detected by the Twitter network after one month. Further, the authors have described a strategy for gaining a larger number of followers and an automated (programmatic) tweet-posting strategy to avoid bot detection. In [1], a support vector machine-based optimization learning algorithm has been proposed for detecting spam bots based on content-based features, profile-based features and user behavior-based features. The authors have analyzed the most influential features for detecting spammers in an online social network. Subrahmanian et al. [36] have proposed to separate bots from other Twitter users on a specific topic. The authors have identified additional bots based on the cosine similarity between bots and humans. Further, they have analyzed social bot behavior based on hashtag co-occurrence and a prediction score (a higher value indicates that an account is more likely to be a social bot), and have proposed a program that can generate social bots by varying the number of parameters for social botnet creation. Ashfaq et al. [4] have designed a framework for bot detection using a Bayesian network classifier model. This model quantifies a belief value (which lies in the range 0 to 1) indicating whether a host is acting as a bot. Halfaker et al. [20] have summarized Wikipedia's immune system for distinguishing social bots from cyborgs (which combine human (i.e., manual) and bot (i.e., automated) behavior). The programmable Wikipedia social bots are capable of performing many activities (like spell-checker bots) on the website. In [11], a botnet detection approach has been proposed based on node centrality measures, such as degree centrality, betweenness centrality, eigenvector centrality and pagerank centrality. Further, the authors have adopted a self-organizing feature map to form clusters based on these features, focusing on the abnormal behavior of social bots.

For large online social networks (like Facebook), Baltazar et al. [5] have observed that 20% of normal users accept friend requests from accounts with at least one common friend. Moreover, in some social networks (like Twitter), establishing social interactions with strangers is a built-in feature. Boshmaf et al. [8] have proposed a social bot network model on Facebook to infiltrate Facebook users by running programmable social bots for a two-month period, and have analyzed the behavior of users before and after the infiltration. Their work shows that up to 80% of an online social network can effectively be infiltrated. Ferrara et al. [15] have surveyed botnet detection approaches based on crowdsourcing-based features, graph-based features and user-based features. The authors have identified two limitations of the crowdsourcing-based features: (i) human experts fail to detect fake accounts accurately, and (ii) revealing personal information to human experts leads to privacy issues. Graph-based features are used to detect sybil accounts by analyzing the social network graph. For social botnet detection, the authors have considered user-based features, such as sentiment analysis and content-based features. Several effective approaches have been proposed for detecting social bots in online social networks [11, 12, 24, 45]. The existing approaches consider either tweet-based features or graph-based features for detecting social bots; however, considering either of these feature sets alone cannot achieve accurate social bot detection. In this paper, we consider tweet-based attributes, user profile-based attributes and social graph-based attributes to effectively distinguish social bots from legitimate users. Further, we focus on identifying influential users which are influenced by social bots (termed influence bots).

Zhang et al. [46] have presented True-Top, a sybil-resilient system for measuring user influence values in the Twitter network, and have observed that Twitter users commonly interact with strangers. Ma et al. [32] have identified that detecting influential users plays a vital role in the spread of spam content in online social networks. The authors have observed that centrality measures (such as betweenness, closeness and pagerank) are important for identifying how quickly information can spread across social networks, and have proposed the Adjustable Multi-hop Spreading (AMS) method to measure user influence. Alshahrani et al. [3] have proposed the D-hops model, which incorporates degree centrality with a multi-hop distance measure for identifying the top-k influential users in directed and undirected graphs. In [2], the authors have proposed and validated a user-centric approach based on four different social attributes (namely social-emotional, socio-psychological, behavioral and privacy-related attributes) for detecting cyber attacks in online social networks, and have analyzed how these attributes impact influencing users in social networks. Wu et al. [44] have presented a topic behavior influence tree method based on five features (message content, hashtags, replies, mentions and retweets) for identifying influential users in the Twitter network. The above-mentioned works do not take into consideration the impact of social bots when identifying influential users which are influenced by social bots (termed influence bots) in the Twitter network. In this paper, we propose a deep Q-network based architecture by integrating a deep Q-learning model with social attributes for social botnet detection based on the Q-value function (i.e., state-action value function). Further, an algorithm has been proposed to identify the most influential users based on tweets and the users' interactions.

2.1 Reinforcement learning

In Reinforcement Learning (RL), the learning agent interacts with a random environment in order to improve the performance of the entire system. After the agent executes a finite set of actions, the random environment rewards it (this reward is termed the reinforcement signal) [31]. The learning agent considers both the current reward and the cumulative reward.

Let \(S = \{s_{1}, s_{2}, \ldots, s_{n}\}\) be a set of states and \(LA = \{\alpha_{1}, \alpha_{2}, \ldots, \alpha_{n}\}\) be a finite set of learning actions. The learning agent chooses an action \(\alpha_{j} \in LA\) and obtains a reward based on the present state \(s_{i} \in S\). Further, the next state \(s_{i+1} \in S\) depends on the previous state \(s_{i}\) and the learning action \(\alpha_{j}\). The process of selecting a specific learning action is defined as a policy. Moreover, the learning agent has to determine an optimal policy at each state \(s\) in order to maximize the cumulative reward.

Q-learning is one of the RL techniques [18]. The Q-function is implemented using a function approximator (i.e., a neural network or a deep neural network) to estimate its value. The major limitation of Q-learning is that choosing an optimal policy from a large number of training samples is difficult. To overcome this problem, the Q-learning model is integrated with deep neural networks (which is termed the deep Q-learning model). In this work, a deep Q-learning algorithm (which uses a deep Q-network) has been proposed to extract high-dimensional user profile-based features for detecting social bots in online social networks.

2.2 Deep Q-learning

In Q-learning, the learning agent adopts a neural network (defined as a Q-network) with a Q-function, which is represented as Q(s, α; w). The Q-function is determined by the state s, the learning action α and the current reward r. The learning agent chooses a specific action α and obtains a corresponding reward. The parameter w represents the weights associated with each layer of the Q-network at time t. Further, the next state s′ ∈ S depends on the previous state s and the learning action α [42]. Therefore, the Q-function (i.e., state-action value function) is updated as

$$ Q(s,\alpha) = (1-\epsilon)\,Q(s,\alpha) + \epsilon \left(r + \gamma \max_{\alpha^{\prime} \in \alpha(s^{\prime})} Q(s^{\prime},\alpha^{\prime})\right) $$
(1)

where 𝜖 represents the learning rate (i.e., 0 < 𝜖 < 1) and α(s′) represents the finite set of learning actions in the next state s′.
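For illustration, the following is a minimal tabular sketch of this update in Python; the dictionary-backed Q-table, the function name and the default parameter values are illustrative choices, not part of the proposed model.

```python
# Tabular sketch of Eq. (1); Q maps (state, action) pairs to values.
def q_update(Q, s, a, r, s_next, actions_next, epsilon=0.1, gamma=0.99):
    # Best attainable Q-value from the next state over its admissible actions
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions_next)
    # Blend the old estimate with the bootstrapped target, as in Eq. (1)
    Q[(s, a)] = (1 - epsilon) * Q.get((s, a), 0.0) + epsilon * (r + gamma * best_next)
    return Q[(s, a)]

Q = {}
print(q_update(Q, s=0, a=1, r=1.0, s_next=2, actions_next=[0, 1]))
```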

A deep Q-network adopts a deep neural network rather than a shallow Q-network. Three techniques are required to convert a Q-network into a deep Q-network. Firstly, the neural network is replaced with a deep convolutional network, which helps to extract high-dimensional features from the input dataset. Secondly, experience replay is introduced to allow the learning agent to recall and reuse social interactions from past experiences; the learning agent stores past experiences (as an experience tuple at time ’t’, i.e., <state, action, reward, next_state>) in a replay memory. Lastly, a separate deep neural network predicts the target Q-values, which are used to determine the loss function Loss(w) for each learning action. If only one network is used for estimating both the Q-value and the target Q-value, the result may be a feedback loop [18]. Therefore, in deep Q-learning, the target weights w∗ are periodically updated. The deep Q-learning model is trained to reduce the loss function Loss(w) over samples from the replay memory, which is given by

$$ Loss(w)=E\left[(y-Q(s,\alpha;w))^{2}\right] $$
(2)
$$ y=r+\gamma\, Q\left(s^{*}, \arg\max_{\alpha^{*} \in \alpha(s^{*})}Q(s^{*},\alpha^{*};w);\, w^{*}\right) $$
(3)

where y represents the target value, w represents the weights of the online deep Q-network (updated at each iteration) and w∗ represents the target network weights (updated periodically).
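As a hedged illustration of (2) and (3), the sketch below computes the target value and the loss with an online/target network pair; `q_net` is a linear placeholder for a forward pass returning one Q-value per action, and all names are our own, not the paper's implementation.

```python
import numpy as np

def q_net(s, w):
    # Placeholder forward pass: a linear "network" returning one Q-value per action.
    return s @ w

def td_target(r, s_next, w, w_star, gamma=0.99):
    a_star = int(np.argmax(q_net(s_next, w)))         # action selected by the online net
    return r + gamma * q_net(s_next, w_star)[a_star]  # evaluated by the target net, Eq. (3)

def loss(batch, w, w_star, gamma=0.99):
    # Mean squared TD error over a mini-batch of (s, a, r, s_next) tuples, Eq. (2)
    errors = [(td_target(r, sn, w, w_star, gamma) - q_net(s, w)[a]) ** 2
              for s, a, r, sn in batch]
    return float(np.mean(errors))
```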

3 Deep Q-learning model for detecting social bots and influential users

In this section, we define a set of social attributes for distinguishing social bots from legitimate users. In addition, we design a deep Q-network architecture that incorporates the proposed Deep Q-Learning algorithm and the social attributes from the Twitter network for social bot detection. Further, an algorithm is proposed to identify the most influential users (which are influenced by the social bots) based on tweets and the users' interactions.

3.1 Deep Q-network architecture

Figure 1 shows the proposed deep Q-network architecture for detecting social bots in the Twitter network. Three types of social attributes (tweet-based attributes, user profile-based attributes and social graph-based attributes) are given as input to the deep Q-network. Firstly, tweet-based attributes such as syntax, semantic and temporal behavior attributes are extracted from each user tweet (shown in Fig. 1 as rectangular boxes). Secondly, user profile-based attributes are extracted from a series of each user's tweets (with a weekly sampling period) based on the tweeting behavior and the user interactions. Lastly, social graph-based attributes are extracted from the tweets based on the social relationships among participants (users). A server (i.e., an interface between the online social network data and the deep Q-learning model) is responsible for collecting each participant's (user's) social attributes. For each user, the server (shown in Fig. 1) stores the collected data in the form of a state vector S (discussed in Section 3.3), which contains a set of states (i.e., social attributes), and sends S to the deep Q-network. For every user, the social attribute values are collected in each time slot, where the total time \(t\) is divided into \(l_{s}\) time slots such that \({\sum}_{i=1}^{l_{s}} \tau_{i} = t\). The deep Q-network then determines an optimal action for each state. After executing the action, the server decides whether the corresponding user is acting as a social bot or a legitimate user; if the user is detected as a social bot, the server isolates it from the Twitter network. The system (i.e., the deep Q-learning model) transitions to the next state after a specific action is executed and obtains a reward, which is computed using the reward function (discussed in Section 3.3). Moreover, at each time ’t’, the deep Q-network stores the experience tuple (i.e., <state, action, reward, next_state>) in a replay memory. Therefore, the proposed deep Q-learning model, used in the proposed architecture, helps to distinguish social bots from legitimate users and identifies the most influential users in the Twitter network.

Fig. 1 Deep Q-learning architecture for social botnet detection

3.2 Classification of social attributes

To address the challenging problem of social bot detection, the social attributes are classified into three categories, namely tweet-based attributes [1, 15, 36], user profile-based attributes [12, 40, 41] and social graph-based attributes [11, 40, 45].

3.2.1 Tweet-based attributes

We extract tweet-based attributes from the content of each user's tweets. Tweet-based attributes describe tweet syntax, tweet semantics and temporal behavior features, which are listed in Table 1.

Table 1 Overview of Tweet-based Attributes from [1, 15, 36]

We adopt a lexical normalization technique on Twitter data to obtain individual words (tokens) [23]. Further, we classify the individual words into positive or negative emotions based on the user's tweets. Moreover, we extract punctuation symbols (’#’, ’?’, ’!’, ’....’, ’.’), special characters (’@’, ’$’, ’%’) and emoticons from each tweet. Geo-tagged tweets give information about the location from which they were posted [10]. Latent Dirichlet allocation is used to identify the most frequent words or topics posted by Twitter users [7]. A sentiment analysis framework and an opinion analysis system are adopted to compute a user's sentiment score from the tweet [9, 37].
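To make the extraction concrete, the following illustrative sketch derives a few of the syntactic counts of Table 1 from raw tweet text; the function name and the chosen feature subset are our assumptions, and the lexical normalization, LDA topic modeling and sentiment scoring described above would be layered on top.

```python
import re

def tweet_syntax_features(tweet: str) -> dict:
    # Count tokens, hashtags, mentions, URLs and punctuation in one tweet.
    tokens = tweet.split()
    return {
        "n_tokens":   len(tokens),
        "n_hashtags": sum(t.startswith("#") for t in tokens),
        "n_mentions": sum(t.startswith("@") for t in tokens),
        "n_urls":     len(re.findall(r"https?://\S+", tweet)),
        "n_exclaim":  tweet.count("!"),
        "n_question": tweet.count("?"),
    }

print(tweet_syntax_features("Breaking news! https://t.co/x #update @alice"))
```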

The tweet-based attribute vector \(A^{(T)}_{p_{i}}\) of the ith participant's jth tweet is represented as

$$ A^{(T)}_{p_{i}} =<<Sy_{i}^{j}>, <Sm_{i}^{j}>, <Tm_{i}^{j}>> $$
(4)

where \(Sy_{i}^{j}\), \(Sm_{i}^{j}\) and \(Tm_{i}^{j}\) denote the tweet syntax, tweet semantic and temporal behavior attributes of the ith participant's jth tweet, respectively.

3.2.2 User profile-based attributes for behavioral analysis

User profile-based attributes reflect the behavioral characteristics of the users. Some malicious users send friend requests to unknown user accounts or randomly share tweets with other users. In this paper, we extract user profile-based attributes from a series of each user's tweets sampled on a per-week basis. Note that tweet size is limited to 140 characters [47]. Hence, we define user profile-based attributes along two dimensions, namely tweet behavioral attributes and user interaction attributes, to distinguish social bots from legitimate users. An overview of the user profile-based attributes is given in Table 2.

Table 2 Overview of User profile-based Attributes from [12, 40, 41]

The user profile-based attribute vector \(A^{(U)}_{p_{i}}\) of participant pi is represented as

$$ A^{(U)}_{p_{i}} =<<TB_{i}>, <UI_{i}>> $$
(5)

where TBi and UIi denote the tweet behavioral attributes and the user interaction attributes of participant pi, respectively.
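As an illustration of the weekly sampling described above, the following hypothetical sketch aggregates one user's tweets into per-week behavioral statistics with pandas; the column names and the chosen statistics are assumptions, not the exact attribute set of Table 2.

```python
import pandas as pd

# Toy tweet log for a single user; timestamps and counts are made up.
tweets = pd.DataFrame({
    "time":     pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-09"]),
    "retweets": [3, 0, 7],
    "mentions": [1, 2, 0],
})

# Group the tweets into weekly buckets and derive simple behavioral attributes.
weekly = tweets.groupby(pd.Grouper(key="time", freq="W")).agg(
    n_tweets=("retweets", "size"),      # tweeting frequency per week
    avg_retweets=("retweets", "mean"),  # interaction attribute
    avg_mentions=("mentions", "mean"),  # interaction attribute
)
print(weekly)
```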

3.2.3 Social graph-based attributes for tweet propagation

The social graph-based attributes mainly focus on the social relationships among the users. Hence, we define social graph-based attributes, such as clustering coefficient, closeness centrality, betweenness centrality and pagerank centrality for each user [11, 40, 45].

  i) Clustering coefficient: Consider a social graph G = (P, L), where P represents a set of participants (users) and L represents a set of directed links. A directed link lij represents a social interaction from participant pi to participant pj. If there are n links among the neighbors of pi, then the clustering coefficient is defined as \(CC(p_{i}) = \frac{n}{d_{i}(d_{i}-1)}\), where di represents the number of neighbors of participant pi.

  ii) Closeness centrality: Closeness centrality is the reciprocal of the sum of the distances between a participant and all other participants in a social network. Therefore, the closeness centrality of participant pi (denoted C(pi)) is defined as \(C(p_{i})=\frac {1}{\sum d(p_{i},p_{j}):p_{j}\in P} \).

  iii) Betweenness centrality: Betweenness centrality is a measure for identifying the important participants (users) within a social network. The betweenness centrality of participant pk (denoted Bc(pk)) is defined as \(B_{c}(p_{k})= {\sum }_{p_{i}\neq p_{k} \neq p_{j}} \frac {\sigma _{p_{i}p_{j}}(p_{k})}{\sigma _{p_{i}p_{j}}}\), where \(\sigma _{p_{i}p_{j}}(p_{k})\) represents the number of shortest paths from participant pi to participant pj passing through an intermediate participant pk and \(\sigma _{p_{i}p_{j}}\) represents the total number of shortest paths from pi to pj.

  iv) Pagerank centrality: Pagerank centrality measures the importance of a participant based on the social relationships established through the participants' interactions. The pagerank centrality of participant pi (denoted PR(pi)) is defined as \(PR(p_{i})= 1 - d_{f} + d_{f} {\sum }_{\forall p_{k} \in M(p_{i})} \frac {PR(p_{k})}{deg_{o}(p_{k})}\), where df represents the damping factor (usually set to 0.85 [45]), M(pi) represents the set of participants that have directed links pointing to participant pi, and dego(pk) represents the out-degree of participant (node) pk.

The social graph-based attribute vector \(A^{(G)}_{p_{i}}\) is represented as

$$ A^{(G)}_{p_{i}} =<CC(p_{i}), B_{C}(p_{i}), C(p_{i}), PR(p_{i})> $$
(6)
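For concreteness, a minimal sketch computing these four graph attributes with the networkx library on a toy directed graph follows; the graph itself and the variable names are illustrative.

```python
import networkx as nx

# Toy interaction graph: edge (i, j) denotes an interaction from participant i to j.
G = nx.DiGraph([("p1", "p2"), ("p2", "p3"), ("p3", "p1"), ("p1", "p3")])

attrs = {
    "clustering":  nx.clustering(G),               # CC(p_i)
    "closeness":   nx.closeness_centrality(G),     # C(p_i)
    "betweenness": nx.betweenness_centrality(G),   # B_C(p_i)
    "pagerank":    nx.pagerank(G, alpha=0.85),     # PR(p_i), damping factor 0.85
}
for name, values in attrs.items():
    print(name, values)
```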

For a given (unknown) participant \(p_{i} \in P\), the social attribute vector is represented as

$$ A_{p_{i}} =<<A^{(T)}_{p_{i}}>, <A^{(U)}_{p_{i}}>, <A^{(G)}_{p_{i}}>> $$
(7)

Therefore, these social attributes are used to distinguish social bots from legitimate users.

Algorithm 1

As shown in Algorithm 1, for each participant (user) in an online social network, we extract three different types of social attributes, namely tweet-based attributes (< A(T) >), user profile-based attributes (< A(U) >) and social graph-based attributes (< A(G) >). The tweet-based attributes are extracted from the content of each user's tweets. The user profile-based attributes are derived from a series of each user's tweets (with a weekly sampling period) based on the tweeting behavior and the user interactions. Further, the social graph-based attributes are extracted based on the social relationships among the users (Lines 3-6). The social attributes have to be normalized, since the ranges of values differ across social attributes. In this paper, we adopt the z-score normalization technique, where the value of each social attribute ai is normalized as \(a^{\prime }_{i}=\frac {a_{i}-\bar {a_{i}}}{\sigma _{a_{i}}}\) (\(\sigma _{a_{i}}\) and \(\bar {a_{i}}\) are the standard deviation and mean of social attribute ai) (Line 7). Not all social attributes are equally important for social bot detection. Hence, we adopt the principal component analysis (PCA) method, integrated with a ranking measure, to create a priority vector that ranks the social attributes by their relative importance. Moreover, the extracted social attributes are weighted before determining the Q-value function, since some social attributes have more impact on bot detection than others (Lines 8-13).
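Under our reading of Lines 7-13, the normalization and ranking steps could look like the sketch below; the exact ranking measure combined with PCA is not fully specified in the text, so weighting the loadings by explained variance is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((100, 12))                   # 100 users x 12 social attributes (toy data)

# Line 7: z-score normalization of each social attribute
Xz = (X - X.mean(axis=0)) / X.std(axis=0)

# Lines 8-13: rank attributes by PCA loadings weighted by explained variance
pca = PCA(n_components=3).fit(Xz)
scores = np.abs(pca.components_).T @ pca.explained_variance_ratio_
priority_vector = np.argsort(scores)[::-1]  # most important attribute first
weights = scores / scores.sum()             # attribute weights for the Q-value function
print(priority_vector, weights.round(3))
```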

3.3 Our proposed deep Q-learning model for detecting social bots

We propose a deep Q-learning algorithm by considering the following elements:

  • State: The state vector St (for each user) at time slot t is defined by a set of social attributes. The state vector for a participant (i.e., user) is represented as

    $$ S_{t}=<\{s_{t}^{i1}, s_{t}^{i2}, \ldots, s_{t}^{ij} \}> $$
    (8)

    Here, each participant (user) \(p_{i} \in P\) is associated with a set of social attributes A (refer Section 3.2, (7)). Moreover, each value \(s_{t}^{ij}\) represents the jth social attribute of the ith participant at time slot t. Further, the goal state is reached when the user is determined to be either a social bot or a legitimate user.

  • Action: An action is the selection of a state among ’n’ states based on the current state. Moreover, the learning agent’s movement from one state to another state is defined as an action α. Further, at each state, the server has to decide whether the corresponding user is a social bot or a normal user based on the social attributes (refer Section 3.2 (7)).

  • Reward: The reward value is determined based on the social behavior of each participant (user). Therefore, the reward function rt at time ’t’ is computed as follows:

    $$ r_{t}= \beta x_{t}^{ij}+c $$
    (9)

    where \(x_{t}^{ij}\) represents the jth social attribute value associated with participant pi at time ’t’ and β is a model parameter whose value lies between 0 and 1. If the goal state is reached (i.e., the user is detected as a social bot), then the state is rewarded and c is set to 1; otherwise, c is set to -1.

Algorithm 2

The proposed Deep Q-Learning algorithm (refer Algorithm 2) initializes the replay memory and the deep Q-network (denoted \(Q(s^{ij}_{t},\alpha _{t};w)\)) with associated weights w (Line 1). For each episode (i.e., user), the algorithm initializes the state vector St and begins with a state \(s^{ij}_{t}\) (which represents the jth social attribute of the ith participant) at time slot t. The learning agent chooses an action αt at time t by random selection, executes the learning action αt, and observes the next state \(\overline {s}^{ij}_{t}\) and reward rt at time slot t. For each action, the learning agent (i.e., deep Q-network) stores the past interaction (as an experience tuple \(e_{t}=<s^{ij}_{t},\alpha _{t},r_{t},\overline {s}^{ij}_{t}>\)) in the replay memory. The Deep Q-Learning algorithm then proceeds as follows: (i) update the Q-values (using (1)), (ii) compute the target Q-values (using (3)), (iii) update the deep Q-network by reducing the loss function (using (2)) and (iv) update the deep Q-network parameter w (the weights associated with the deep Q-network) at time slot t. This process is repeated for a finite number of episodes (Lines 2-14).
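A schematic, self-contained rendering of Algorithm 2 as we read it is given below; the linear "network" and the dummy environment (random next states and ±1 rewards as in (9)) are stand-ins for the deep Q-network and the Twitter data pipeline, not the paper's implementation.

```python
import random
from collections import deque

import numpy as np

N_STATES, N_ACTIONS = 12, 2          # social attributes per state vector, actions
GAMMA, LR = 0.99, 0.001              # discount factor and learning rate

def predict(s, w):
    return s @ w                     # linear stand-in for the deep Q-network

w = np.zeros((N_STATES, N_ACTIONS))  # online weights
w_star = w.copy()                    # target weights w*
replay = deque(maxlen=10_000)        # replay memory

for episode in range(50):            # one episode per user
    s = np.random.rand(N_STATES)     # initial social-attribute state vector
    for t in range(20):
        # epsilon-greedy action selection
        a = (random.randrange(N_ACTIONS) if random.random() < 0.1
             else int(np.argmax(predict(s, w))))
        # Dummy environment step: random next state, +1/-1 reward as in Eq. (9)
        s_next = np.random.rand(N_STATES)
        r = 1.0 if random.random() < 0.5 else -1.0
        replay.append((s, a, r, s_next))                # store experience tuple e_t
        # Sample a stored experience and take one gradient step on Loss(w)
        bs, ba, br, bn = random.choice(replay)
        a_star = int(np.argmax(predict(bn, w)))         # argmax under the online net
        y = br + GAMMA * predict(bn, w_star)[a_star]    # target value, Eq. (3)
        w[:, ba] += LR * (y - predict(bs, w)[ba]) * bs  # descend (y - Q)^2
        if t % 10 == 0:
            w_star = w.copy()                           # periodic target-weight update
        s = s_next
```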

3.4 Influence bots in Twitter

To influence users in the Twitter network, social bots may post malicious information in their tweets, and they may influence a few legitimate users by posting attractive and fake information. Even after the social bots are isolated from the Twitter network, some users may remain influenced by the tweets those bots posted. In this paper, we identify the most influential users which are influenced by social bots (termed influence bots) in the Twitter network. The user influence value measures a user's ability to influence many other users by rapidly sharing a tweet and through social interactions (such as retweets, replies, comments and mentions) in the Twitter network. After reading a tweet, a reader may comment on it, retweet it or post a tweet with a similar opinion, which implies that the tweet's author has influenced the reader's opinion. Therefore, the user influence value is determined from the social interaction behavior of other users after reading a tweet: if a tweet receives many comments, likes and retweets, then the user influence value is high. As described in Algorithm 3, the user influence score (UI) is based on two components, namely the influence of the user's tweets and the influence of the user's interactions in the Twitter network, and is defined as

$$ UI_{p_{i}}=\begin{cases} I^{T} (p_{i})+ I^{I} (p_{i}) & \text{if participant (user) } p_{i} \text{ follows participant } p_{j}\\ 0 & \text{otherwise} \end{cases} $$
(10)

where IT(pi) is the influence of the user's tweets and II(pi) is the influence of the user's interactions.

3.4.1 Influence of user’s tweets

The influence of each user's tweet is based on the number of comments, retweets and replies. Commenting on a tweet indicates that a user wants to express his/her views and is willing to share the opinion of the tweet with his/her friends, while retweeting a tweet indicates that a user supports the opinion expressed in the tweet. Moreover, if a tweet is commented on, retweeted, liked and replied to many times, the probability that users have read the tweet is high. The influence of a user's series of tweets, IT, which reflects the probability of a tweet being shared from participant (user) pi to its neighbors, is defined as

$$ I^{T}(p_{i})=Co(tw)+ Li(tw)+ RT(tw)+ RE(tw) $$
(11)

where Li(tw) and Co(tw) represent the number of likes and comments posted for tweet tw, respectively. Further, RT(tw) and RE(tw) represent the number of retweets and replies posted for tweet tw, respectively.

Algorithm 3

3.4.2 Influence of user’s interactions

The influence of a user's interactions is based on the clustering coefficient, betweenness centrality and closeness centrality measures (refer Section 3.2.3). If the user's degree centrality is high, the probability that other users read the tweet is high. If the user's clustering coefficient is high, all of the user's neighbors are strongly connected. If the user's betweenness centrality is high, the user can quickly share a tweet with the entire Twitter community through a few users. If the user's closeness centrality is high, the user has greater ability to control the spread of information. The influence of a user's interactions, II, is defined as

$$ I^{I}(p_{i})=D_{c}(p_{i})+CC(p_{i})+B_{C}(p_{i})+C(p_{i})+PR(p_{i}) $$
(12)

where Dc(pi) and CC(pi) represent the degree centrality and clustering coefficient of participant (user) pi, respectively, BC(pi) and C(pi) represent the betweenness centrality and closeness centrality of pi, respectively, and PR(pi) denotes the pagerank centrality of pi.
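A hedged sketch of how (10)-(12) can be combined, again using networkx, follows; the tweet field names and the use of a directed edge to model the "follows" relation are our assumptions.

```python
import networkx as nx

# Toy interaction graph: edge (i, j) models p_i following/interacting with p_j.
G = nx.DiGraph([("p1", "p2"), ("p2", "p3"), ("p3", "p1"), ("p1", "p3")])

def tweet_influence(tweets):
    # Eq. (11): sum of interaction counts over a user's tweets
    return sum(t["comments"] + t["likes"] + t["retweets"] + t["replies"] for t in tweets)

def interaction_influence(G, p):
    # Eq. (12): sum of the centrality measures of Section 3.2.3
    return (G.degree(p) + nx.clustering(G, p)
            + nx.betweenness_centrality(G)[p]
            + nx.closeness_centrality(G, p)
            + nx.pagerank(G, alpha=0.85)[p])

def user_influence(G, p, p_j, tweets):
    # Eq. (10): non-zero only if p follows participant p_j
    if not G.has_edge(p, p_j):
        return 0.0
    return tweet_influence(tweets) + interaction_influence(G, p)

demo_tweets = [{"comments": 2, "likes": 5, "retweets": 1, "replies": 0}]
print(user_influence(G, "p1", "p2", demo_tweets))
```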

Algorithm 4

3.4.3 Proposed top-k influential users algorithm

The proposed top-k influential users algorithm (refer Algorithm 4) identifies the most influential users (which are influenced by the social bots) based on tweets and the users' interactions in the Twitter network. For each user, we compute the user influence value (refer Algorithm 3) and then rank the users by their influence values. We monitor the ranking of each user between two consecutive iterations. The rank distance dr(k) measures how the rankings of the top-K influential users change between two consecutive (i.e., the ith and (i-1)th) iterations and is defined as

$$ d_{r}(k)=\sum\limits_{k=1}^{K} |R_{i}(p_{k})-R_{i-1}(p_{k})| $$
(13)

where Ri(pk) and Ri−1(pk) represent the ranking of influential user pk at the ith and (i-1)th iterations, respectively, and K represents the total number of influential users (which are influenced by social bots). If the rank distance between two consecutive iterations is less than the threshold Tf, then the algorithm terminates and returns the top-k influential users. Moreover, a smaller Tf forces the ranking to stabilize before termination and thus leads to higher accuracy in identifying the most influential users.
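A minimal sketch of this convergence test follows, with hypothetical rank dictionaries mapping users to their rank positions; the threshold value is illustrative.

```python
def rank_distance(rank_prev, rank_curr, top_k_users):
    # Eq. (13): total change in rank position of the top-K users
    return sum(abs(rank_curr[u] - rank_prev[u]) for u in top_k_users)

# Terminate when consecutive rankings differ by less than the threshold T_f.
rank_prev = {"p1": 1, "p2": 2, "p3": 3}
rank_curr = {"p1": 1, "p2": 3, "p3": 2}
T_f = 5
converged = rank_distance(rank_prev, rank_curr, ["p1", "p2", "p3"]) < T_f
print(converged)
```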

4 Experimental setup and performance evaluation

In this section, experimental results are presented to evaluate the performance of our proposed Deep Q-Learning algorithm on three real-world datasets collected from the Twitter network: The Fake Project dataset [13], the Social Honeypot dataset [26] and the User Popularity Band dataset (which is partitioned into four groups based on the number of followers) [19]. The details of the three Twitter datasets are presented in Table 3. We compare the proposed Deep Q-Learning algorithm (with three hidden layers) with other existing algorithms, namely the feed-forward neural network (FFNN) [38], deterministic Q-learning (QL) [42] and the regularized deep neural network (RDNN) [6]. We have evaluated our proposed DQL algorithm for social bot detection and for identifying the top-k influential users. Further, we compare our proposed top-k influential users algorithm with other existing algorithms, namely the degree centrality based radius-neighborhood (DERND) algorithm [3], the susceptible-infected-recovered (SIR) diffusion model [34] and True-Top [46]. We evaluate the performance of our proposed Deep Q-Learning algorithm in terms of precision, recall (true positive rate), false positive rate and f-measure (refer Section 4.1) [47]. We have conducted the experiments on a GPU-based server with 4 NVIDIA Tesla GPUs. The CPU is a 2.10 GHz Intel Xeon E5-2620 v4 processor with 32 GB memory. The NVIDIA driver version is 410.79, and CUDA 10.0 with cuDNN 7.3.1 is used. The software environment is TensorFlow 1.12.0 with Spark 2.4.0 on Ubuntu 16.04.

Table 3 Summary of datasets collected from Twitter

We consider three different types of social attributes (such as tweet-based attributes, user profile-based attributes and social graph-based attributes) from the three different Twitter datasets. We have summarized the outcomes of our proposed Deep Q-Learning algorithm by defining the following metrics:

  • True Positive (TP): the total number of users detected as social bots, which are actually social bots,

  • True Negative (TN): the total number of users detected as legitimate users, which are actually legitimate users,

  • False Positive (FP): the total number of users detected as social bots, which are actually legitimate users,

  • False Negative (FN): the total number of users detected as legitimate users, which are actually social bots.

Further, we define the following metrics to compare the proposed Deep Q-Learning algorithm with all other existing algorithms, namely:

  • Accuracy: the proportion of correctly classified users (true positives and true negatives), which is defined as \(\frac {TP+TN}{TP+TN+FP+FN}\),

  • True positive rate (or Recall): the proportion of real positives that are correctly predicted as positives, which is defined as \(\frac {TP}{TP+FN}\),

  • False positive rate: the proportion of real negatives that are incorrectly predicted as positives, which is defined as \(\frac {FP}{FP+TN}\),

  • Precision: the proportion of predicted positives that correspond to real positives, which is defined as \(\frac {TP}{TP+FP}\).

  • F-measure: \(\frac{2 \cdot \textit{precision} \cdot \textit{recall}}{\textit{precision} + \textit{recall}}\)
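These metrics follow directly from the confusion-matrix counts; the small helper below computes them (the counts in the usage line are illustrative, not experimental results).

```python
def metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # true positive rate
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": recall,
        "false_positive_rate": fp / (fp + tn),
        "precision": precision,
        "f_measure": 2 * precision * recall / (precision + recall),
    }

print(metrics(tp=90, tn=85, fp=10, fn=15))
```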

4.1 Experimental results

Figure 2a, b and c show the convergence performance of our proposed DQL algorithm for two learning rate values, 𝜖 = 0.001 and 𝜖 = 0.0001. We can observe that the proposed algorithm converges more quickly with a learning rate of 𝜖 = 0.001 than with 𝜖 = 0.0001, although a higher learning rate risks settling in a local optimum. From Fig. 2, it can be observed that for learning rate 𝜖 = 0.001, the precision value is more than 90% (on average) for social bot detection. Further, lowering the learning rate below 0.0001 yields lower precision. The convergence of the target Q-function is also affected by other parameters, such as the discount factor and the mini-batch size. The parameter values used for computing the target Q-values are listed in Table 4. The discount factor γ determines how much weight is given to future rewards (γ usually lies in [0, 1]). A discount factor of γ = 0 implies that the state-action values represent only the current reward, while as γ approaches 1 the state-action values give (constant) high weight to future rewards. Moreover, if γ is 1 (or exceeds 1), the state-action values may diverge [16]. Therefore, we have chosen a discount factor of γ = 0.99. Figure 3a, b and c show the effect of the mini-batch size on the convergence of the proposed deep Q-learning algorithm. The mini-batch size determines the number of experience tuples used in each training step and is usually chosen based on the computational system on which the experimentation is performed [21]. From Fig. 3, we can observe that our proposed algorithm converges more quickly with the smaller mini-batch size of 32 than with the larger mini-batch size of 64. For mini-batch size 32, high precision is achieved (i.e., more than 90% on average) for social bot detection. Further, increasing the mini-batch size beyond 64 yields lower precision.

Fig. 2 Comparison of precision value with different learning rate parameter values

Fig. 3 Comparison of precision value with different mini-batch sizes

Table 4 List of parameters

We consider a set of social attributes, such as tweet-based attributes (i.e., from the content of each user tweet), user profile-based attributes (from a series of each user’s tweets) and social graph-based attributes (i.e., the user establishes the social relationship with their friends and followers), which are denoted as A(T), A(U) and A(G) respectively (discussed in Section 3.2). We compare the social bot detection performance of the proposed Deep Q-Learning (DQL) algorithm with other existing algorithms (such as feed-forward neural network (FFNN), deterministic Q-Learning (QL) and regularized deep neural network (RDNN) [6, 38, 42] in terms of precision, recall and f-measure on three different Twitter datasets (such as The Fake Project dataset, Social Honeypot dataset and User Popularity Band dataset). From Fig. 4, we can observe that all the algorithms can obtain the best social bot detection performance by considering all the three different types of social attributes. When we consider only user profile-based attributes, the social bot detection performance of the proposed DQL algorithm and the Regularized Deep Neural Network (RDNN) [6] has been fallen down from 87% to 84% respectively on precision value. From Fig. 4, we can also observe that by considering only tweet-based attributes, the social bot detection performance of all algorithms is drastically reduced when compared to user profile-based attributes. However, by combining the tweet-based attributes with the user profile-based attributes, the social bot detection performance has been improved approximately 5-9% (i.e., from 5% to 9%) on precision value. Therefore, by combining all the social attributes, the social bot detection performance has been improved approximately 4-10% on precision value. From Fig. 5, we can observe that by combining the tweet-based attributes with the user profile-based attributes, the social bot detection performance has been improved approximately 4-8% on precision value. Therefore, by combining all the social attributes, the social bot detection performance has been improved approximately 3-8% on precision value. From Fig. 6, we can observe that by combining the tweet-based attributes with the user profile-based attributes, the social bot detection performance has been improved approximately 4-9% on precision value. Therefore, by combining all of the social attributes, the social bot detection performance has been improved approximately 5-10% on precision value. Table 5 shows performance of proposed DQL algorithm for 5-fold cross-validation.

Fig. 4 Experimental results by considering all possible combinations of social attributes on The Fake Project dataset

Fig. 5 Experimental results by considering all possible combinations of social attributes on the Social Honeypot dataset

Fig. 6 Experimental results by considering all possible combinations of social attributes on the User Popularity Band dataset

Table 5 Performance of the proposed Deep Q-learning algorithm for 5-fold cross-validation

Table 6 shows the average execution time of the proposed Deep Q-Learning (DQL) algorithm with one, two and three hidden layers, denoted as DQL-1, DQL-2 and DQL-3, respectively. As the number of hidden layers increases, the average execution time also increases. This is due to the fact that, with more hidden layers, DQL consumes more execution time to train the target Q-function under the chosen parameter settings (learning rate, mini-batch size and discount factor).

Table 6 Average execution time for the proposed DQL algorithm with different number of hidden layers

We evaluate the performance of our proposed top-k influential users algorithm by considering the following metrics.

  • Precision: The precision value is defined as \(\frac {|LU_{1}(k) \cap LU_{2}(k)|}{|LU_{1}(k)|}\), where LU1 represents the list of legitimate users ranked by the user influence metric (refer Section 3.4, (10)) and LU2 represents the list of legitimate users ranked based on the user interactions (such as retweets, replies, comments and likes). Further, LU1(k) and LU2(k) represent the top-k influential users in LU1 and LU2, respectively.

  • Recall: The recall value is defined as \(\frac {|LU_{1}(k) \cap LU_{2}(k)|}{|LU_{2}(k)|}\).

Figures 7, 8 and 9 show that the proposed influential users algorithm achieves better recall and precision than the other existing algorithms, namely the degree centrality based radius-neighborhood (DERND) algorithm [3], the susceptible-infected-recovered (SIR) diffusion model [34] and True-Top [46], on all three Twitter datasets. We can observe that as the k-value increases, the recall values of all algorithms increase. The experimental results show that tweet-based attributes and user interactions are two important factors in influencing a user. The precision of our proposed algorithm is approximately 80%, as shown in Figs. 7b, 8b and 9b. This implies that the proposed algorithm can identify 80% of the top-10% influential users which were influenced by the social bots. Moreover, the influential users may attract other legitimate users and come to be seen as trustworthy, which affects the entire Twitter community. Computing an influence score for every user makes the proposed method more time-consuming; however, it identifies the most influential users (which are influenced by social bots) in online social networks more effectively. The proposed method is more efficient than the existing True-Top algorithm because it is based on various centrality measures that determine the spreading probability of information in the Twitter network. From Figs. 7b, 8b and 9b, we can observe that the DERND algorithm [3] cannot effectively identify the influential users because its precision remains the same as the number of influential users increases. The existing SIR diffusion model and the DERND algorithm identify influential users based only on semi-local degree centrality and radius-neighboring degree centrality measures, respectively. Moreover, users with a high degree centrality do not necessarily have a large number of retweets or comments. We observe that the proposed algorithm performs better than the other existing algorithms in terms of tweet propagation under the tweet influence value IT of each user. Further, the proposed method captures a high influence spreading probability through the influence of user interactions. This means that the proposed algorithm selects the users which are influenced by social bots so that these users cannot further influence other users; such users are also flagged as influence bots. Therefore, we present the top-k influential users by ranking the users based on the user influence score, which is measured from user interactions and tweet propagation.

Fig. 7 Top-k influential users on The Fake Project dataset

Fig. 8 Top-k influential users on the Social Honeypot dataset

Fig. 9 Top-k influential users on the User Popularity Band dataset

5 Conclusion

In this paper, we have considered a set of social attributes, namely tweet-based attributes, user profile-based attributes and social graph-based attributes, for detecting social bots. We propose a deep Q-network based architecture by integrating a deep Q-learning model with social attributes for social bot detection based on the Q-value function (i.e., state-action value function). Further, a top-k influential users algorithm has been proposed to identify the most influential users based on tweets and the users' interactions. Three different datasets collected from the Twitter network, namely The Fake Project dataset, the Social Honeypot dataset and the User Popularity Band dataset, are used to evaluate the performance of our proposed Deep Q-Learning algorithm. We have compared the proposed Deep Q-Learning algorithm with other existing algorithms, namely the Feed-Forward Neural Network, Deterministic Q-Learning and the Regularized Deep Neural Network. For social bot detection, the proposed algorithm achieves an average precision of 85% with the tweet-based attributes, 87% with the user profile-based attributes and 88% with the social graph-based attributes; by integrating all the social attributes, it achieves an average precision of 93%. The experimental results show that the Deep Q-Learning algorithm provides a 5-9% improvement in precision over the other existing algorithms. Further, the experimental results show that the proposed algorithm can identify the top-10% influential users with a better precision value (i.e., 80%) compared with the other existing algorithms.