1 Introduction

In recent years, Online Multimedia Social Networking (OMSN) sites have facilitated a high degree of user personalization and user intercommunication [4]. The enormous growth of OMSN has also resulted in an increase in their use for significant criminal activities including identity theft, piracy, illegal trading, cyber stalking and cyber terrorism [34]. Cyber criminals are becoming increasingly sophisticated in using social networking based technology to evade detection and perform criminal acts. These acts take place in a virtual environment that uses the social network as a communication medium, which gives attackers a greater chance of compromising systems [1, 7, 15, 36]. In addition, combating this growing level of crime is challenging due to the ever increasing scale of today’s OMSN.

Due to emerging social network activities, malware trends are now shifting in a new direction. The challenge faced by stealthy malcode is to reach vulnerable hosts and stay on them for a longer period. The longer a threat remains undiscovered in the wild, the more opportunity it has to compromise systems before measures can be taken to protect against it. Furthermore, its ability to steal information increases the longer it remains undetected on a compromised computer [22].

Recently, Nagaraja et al. [20] proposed a new type of stealthy multimedia social network threat called Stegobot. Stegobot was created to show how easy it would be for an intruder to hijack Facebook photos to create a secret communication channel that is very difficult to detect. Stegobot gains control of computers by getting users to open or download malicious images. Instead of contacting the botmaster directly, it takes advantage of social network activity to communicate with the botmaster.

Social network based security has drawn considerable research interest, and various systems have been developed with the support of machine learning techniques. Recently proposed techniques [2, 26, 32] aim to detect spamming in online social networks. Cao et al. [5] have proposed a method to detect fake accounts created in large scale online social networks. Stein et al. [31] have proposed the Facebook Immune System (FIS) to protect users. They have built an adversarial learning system that performs real time detection of malicious activities among the regular activity on Facebook’s database. This system contains historical data related to Facebook users’ malicious activities. However, it is very difficult to identify and block socialbots, because they are designed to appear as normal Facebook users. Viswanath et al. [33] have proposed graph theoretic techniques for defending against sybil attacks, which are an alternative to machine learning based detection systems. But these methods are not efficient enough to detect Stegobots. This reveals a new security challenge in the domain of multimedia social networks.

In our previous work [23], we proposed a multilevel detection mechanism that analyzes user profiles and identifies the malicious ones. The idea behind this method is that the content of each profile, with its huge volume of image data, is analyzed independently. Analyzing such a large amount of image data is extremely difficult, and real time deployment of this technique is also challenging. The major limitation of most existing techniques is that they detect socialbots using single view features. Since Stegobot profiles in OMSN look like genuine profiles, these approaches cannot detect this new kind of bot [20]. Due to these difficulties, it is desirable to develop additional methodologies. In this paper, we present a new technique for analysis and detection of Stegobot network traffic. The multiple aspects of social attributes proposed in this paper help to explore the hidden communication structure of the botnet. Further, we extend our method towards OMSN security so that it withstands Stegobot profiles which mimic genuine users. This work also aims to help network detectives and forensic analysts understand the structure of Stegobot and the key profiles inside the malicious network.

The rest of this paper is organized as follows: Section 2 describes the Stegobot network along with threat model analysis. Section 3 introduces our proposed method for social botnet detection. The experimental study, observations and performance are presented in Section 4, and Section 5 concludes the paper.

2 Analysis of Stegobot network

This section gives a conceptual overview of the Stegobot Network (SbN) and a brief outline of the adversarial objectives behind maintaining such a network. This is followed by a short note on the SbN design goals and its construction details.

2.1 Design goals of Stegobot

A Stegobot Network is a set of bots that are owned and maintained by a human controller called the botherder. An SbN consists of three components: the botmaster, the Stegobots and the command and control channel. Each Stegobot controls a profile in a targeted OMSN, and is capable of executing commands that result in operations related to social interactions. These commands are either sent by the botmaster or predefined locally on each Stegobot. The data stolen by the Stegobots are called botcargo and are sent back to the botmaster through the covert channel.

Botnet command and control channels have traditionally been carried over protocols such as IRC (Internet Relay Chat) or the various P2P (Peer To Peer) networks. The ability to coordinate and upload new commands to bots gives the botmaster considerable power when performing criminal activities such as sending spam, performing Distributed Denial of Service (DDoS) attacks and phishing.

A peculiar property of Stegobot [3] is the design of the communication channel between the bots and the botmaster. The goal of Stegobot is to introduce probabilistically unobservable communication channels connecting the bots and botmaster. If the command and control communication is unobservable, then botnet detection can be more difficult than the detection of normal botnets [8, 17, 25].

Figure 1 shows a conceptual model of Stegobot. Each node in the OMSN represents a profile. The Stegobots are marked in black. Infiltrated profiles are marked in gray. Edges between nodes represent social connections. The dashed arrow represents connection requests. The small arrows represent social interactions [13, 21].

Fig. 1 Graphical representation of Stegobot network in OMSN

Stegobot is designed to infect users connected to each other via social links such as email communication or an online social network that allows friends to share images. The propagation of bot binaries takes place via social malware attacks [22]. These bots are pre-programmed to perform malicious activities such as harvesting email addresses, stealing passwords and credit card numbers, and keylogging. Alternatively, in a more flexible design, the bots execute commands received from the botmaster. In Stegobot, the images shared by social network users are utilized as a medium for building up the command and control channel. Specifically, image steganography is used to set up a communication channel within the social network, which serves as the botnet’s command and control channel. The information exchanged between bots must be transferred only using steganographic channels.

2.2 Behavior analysis

The behavioral characteristics [2, 11] of profiles are analyzed in order to distinguish normal profile activities from malicious activities. A botmaster can obtain malicious profiles in one of two ways: compromising existing genuine profiles or creating fake profiles. In the first case, the botmaster takes control of a legitimate profile following attacks via social links such as an online multimedia social network or e-mail communication [22]. This technique is attractive for an attacker, because a legitimate user already has a significant number of friends. Also, since these profiles are trusted by, or at least known to, their friends, social malware attacks from these accounts are more likely to succeed. Additionally, the profile user may lose control over the account at any time after a successful Stegobot infection. Alternatively, the botmaster may create new fake accounts, in the sense that they do not represent real users in the OMSN. Despite the use of mechanisms like user account registration, authentication and CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart), account creation is still easy to automate, and a botmaster can efficiently create a large number of fake profiles.

It is very important to analyze the covert communication patterns of Stegobot profiles and the different characteristics of compromised profiles. Legitimate profiles and Stegobot profiles have different goals in the OMSN system and also differ in the behavior they use to achieve their purposes. It is important to analyze a large set of attributes that reflect profile behavior and to investigate their relative discriminatory power in distinguishing between malicious and normal profiles. In this paper, we consider three sets of features, namely image features, profile features and social network features, for the detection of Stegobot.

3 Stegobot detection

The goal of the host based detection method is to detect the presence of Stegobot command and control (C&C) traffic on the monitored host. A functional block diagram of the proposed method is shown in Fig. 2. To facilitate Stegobot detection as well as the detection of stego communication between two profiles, we use social network features, graph based features and image content features. Based on the Stegobot policy on OMSN, various graph based features are extracted from the users’ social graph and most recent activities. Traditional classification algorithms are applied to detect suspicious behaviors of Stegobot.

Fig. 2 Functional block diagram of the proposed Stegobot detection

3.1 Preprocessing

To build a Stegobot detection framework, we have to create a standard social profile object model within the social network. The object model of a profile is a schema containing the most common features of a social network user account. A web crawler using different social network APIs is utilized to collect a real data set from publicly available information in OMSN user accounts. These techniques may face challenges due to the large volume of data, although they significantly improve the time taken to classify a profile object as an infected profile or a legitimate profile.

3.2 Feature extraction

The attributes consist of individual characteristics of user profiles and behaviors. Both genuine and bot profiles exhibit certain kinds of patterns. For example, genuine users have many legitimate friends, while bots are fake profiles and never reply to comments. Bot profiles may have many attractive images and celebrity photos, and attract other users to accept them as friends by initiating friend requests or image sharing. The selection of features for detection of Stegobot is based on the following assumptions: any infected profile has equal probability of sending a request to, or infecting, any susceptible profile in a social network community, and a vulnerable profile is infected when it has a friendship with, or follows, a bot infected profile. It is therefore suitable to investigate features based on vulnerable image contents, the social graph and the profile. The extracted features are based on basic social network, steganography and graph theory concepts and their underlying design principles.

3.2.1 Image features

Stegobot transmits malicious images and stego images carrying confidential information to the botmaster using the image sharing behavior of multimedia social networks. The Stegobot communication channels are built by leveraging image steganography and the social image sharing behavior of users. In order to achieve better detection accuracy, the attributes are derived from social network based features and image features in a combined manner.

Image attributes capture specific properties of the images uploaded by the user. Each profile has a set of images in the user account, and each image has features that serve as the main parameters for detection of malicious images. In particular, each image is characterized by its steganographic features along with its size, quality, number of views, number of likes and number of times the image is selected as a favorite.

For each profile in the data set, we select the associated images having maximum size, high quality, and the maximum number of likes and favorites, and extract features from them from a steganalysis point of view. The choice of the image steganography based features for Stegobot identification is based on our previous positive experience in the domain of image steganalysis. This detection can be extended in a straightforward manner to color images by treating a color image as three gray scale images and fusing the results from each channel. All the 24 content features utilized in our previous work [23] are considered as image based features in this work.

Fridrich et al. [10] have investigated the steganalysis problem using discrete cosine transform based features. Natarajan et al. [24] later proposed a different set of features based on the contourlet transform and subband coefficient modeling using the Gaussian distribution. These methods are used for identifying and extracting secret messages from stego images embedded by well known image steganographic algorithms. In this work, we focus on detection of malicious content and bot binaries inside the image. With this objective, each image is characterized using a total of 24 features calculated directly from the carrier object. The detailed process of calibration used to construct the image content based features is reported in [10, 17, 25]. These steganalysis based features are efficiently utilized for detection of Stegobot communication through images. The image based feature set is

$$\begin{array}{@{}rcl@{}} F^{(I)}&=&<lm_{k},hm_{k},SSIM,H,H_{R_{cd}},H_{C_{cd}}, D_{Row_{KL}}, D_{Col_{KL}},I_{row},I_{col},H_{\_Row},\\ &&H_{\_Col}, dc_{l},dc_{h},dc_{m},wc_{l},wc_{h},wc_{m}>\end{array} $$
(1)

where \(lm_{k}\) and \(hm_{k}\) (k=1,2,3,4) are the moments of the low frequency and high frequency contourlet subbands of the image respectively. SSIM is the mean structural similarity of the whole image. H denotes the image entropy, and \(H_{R_{cd}}\), \(H_{C_{cd}}\) are the row wise and column wise conditional image entropy. \(D_{Row_{KL}}\) and \(D_{Col_{KL}}\) are the row wise and column wise K-L divergence respectively. Row wise mutual image entropy and column wise mutual image entropy are denoted by \(I_{row}\) and \(I_{col}\). \(H_{\_Row}\), \(H_{\_Col}\) are the row wise and column wise Rényi entropy. \(dc_{l}\), \(dc_{h}\) and \(dc_{m}\) are the low, high and medium DCT frequency subband features, and \(wc_{l}\), \(wc_{h}\) and \(wc_{m}\) are the low, high and medium wavelet based frequency subband features.
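As a rough illustration of two of the entropy based features in (1), the following sketch computes the Shannon entropy H of a gray-level image and a row wise Rényi entropy. The function names, the list-of-rows image representation, the order α = 2 and the averaging over rows are assumptions made for this example only; the exact feature definitions used in [23] may differ.

```python
import math
from collections import Counter

def histogram_probs(pixels):
    """Empirical probability of each gray level in a flat list of pixels."""
    counts = Counter(pixels)
    total = len(pixels)
    return [c / total for c in counts.values()]

def shannon_entropy(pixels):
    """Shannon entropy H = -sum(p * log2 p) of the gray-level distribution."""
    return -sum(p * math.log2(p) for p in histogram_probs(pixels))

def renyi_entropy(pixels, alpha=2.0):
    """Renyi entropy of order alpha: log2(sum(p^alpha)) / (1 - alpha)."""
    if alpha == 1.0:
        return shannon_entropy(pixels)  # limit case alpha -> 1
    return math.log2(sum(p ** alpha for p in histogram_probs(pixels))) / (1.0 - alpha)

def row_wise(image, entropy_fn):
    """Assumed aggregation: average an entropy function over the image rows."""
    return sum(entropy_fn(row) for row in image) / len(image)

# Hypothetical 4x4 image with gray levels 0..3.
image = [
    [0, 0, 1, 1],
    [0, 1, 2, 3],
    [3, 3, 3, 3],
    [0, 1, 0, 1],
]
flat = [p for row in image for p in row]
H = shannon_entropy(flat)                 # entropy of the whole image
H_row = row_wise(image, renyi_entropy)    # row wise Renyi entropy (order 2)
```

Column wise variants follow by applying the same functions to the transposed image, for example `list(zip(*image))`.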

3.2.2 User profile features

The profile based analysis gives individual characteristics of profile behavior. Malicious profile users spend more time on activities such as sending abnormal requests to random profiles, adding more images as favorites and abnormal sharing of images with others. Liben-Nowell et al. [18] proposed a link prediction method using machine learning techniques to predict links between users in different online social networks. Stringhini et al. [32] proposed a technique for detecting spammer profiles using supervised learning algorithms. Altshuler et al. [27] have investigated different user properties, such as origin and ethnicity, inside the social relationship, which help to predict malicious users in OSN. Recently, Fire et al. [9] used the topological features of online social networks to identify fake users in different online social networks. As part of this research, we propose a method for investigating which of a social network user’s friends might be a malicious profile. Our proposed features are based on the various Stegobot characteristics and the connection properties between OSN users. This problem is to some degree similar to the problem of investigating Stegobot profiles by multilevel analysis, studied by Natarajan [23].

In this study, we also used different types of graph relationships between users, similar to the studies of Buscarino et al. [4] and Mislove et al. [33]. In addition, our study differs from previous studies [37], because we construct our feature set using only variations of the data collected in real time from the point of view of Stegobot attacker activity. Hence the following six features used in our previous work [23] are included based on the profile behavior analysis.

  • Following/Follower Ratio(R).

  • URL ratio(U).

  • Message Similarity(M).

  • Ratio of Trusted Friends(RF).

  • Message/Image Shares(S).

  • Female/Male Friend Number(N).

The social network profile based feature vector is denoted by

$$ F^{(S)}=<R,U,M,RF,S,N> $$
(2)
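The text lists the six profile features without giving formulas, so the following is only a sketch of how three of them (R, U and M) might be computed. The definitions used here, namely a simple count ratio for R, the fraction of messages containing a URL for U, and the mean pairwise Jaccard word overlap for M, are assumptions for illustration and are not taken from [23]; all function names and the sample messages are hypothetical.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")

def following_follower_ratio(following, followers):
    """R (assumed): profiles followed divided by followers."""
    return following / max(followers, 1)  # guard against division by zero

def url_ratio(messages):
    """U (assumed): fraction of messages containing at least one URL."""
    if not messages:
        return 0.0
    return sum(1 for m in messages if URL_PATTERN.search(m)) / len(messages)

def jaccard(a, b):
    """Jaccard similarity between the word sets of two messages."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)

def message_similarity(messages):
    """M (assumed): mean Jaccard similarity over all pairs of messages."""
    pairs = [(i, j) for i in range(len(messages)) for j in range(i + 1, len(messages))]
    if not pairs:
        return 0.0
    return sum(jaccard(messages[i], messages[j]) for i, j in pairs) / len(pairs)

# Hypothetical profile data.
messages = [
    "check this photo http://example.com/a",
    "check this photo http://example.com/b",
    "lunch with friends",
]
R = following_follower_ratio(following=300, followers=20)
U = url_ratio(messages)
M = message_similarity(messages)
```

The remaining features (RF, S and N) are ratios or counts over friend and sharing data and could be computed in the same style once their exact definitions are fixed.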

3.2.3 Social graph based features

We have utilized image features and user profile features alone to detect Stegobot, but these cannot identify Stegobot communications. Hence social graph features focusing on the interaction between profiles are combined with them to form the feature set. The social network attributes capture the social graph based relationships established between profiles via image response interaction, which is one of several possibilities in an online multimedia social network [4, 19]. The idea is that these features might capture specific interaction patterns that could help to differentiate genuine profiles and bot profiles. The following profile features are extracted from the image response user graph, which captures the level of interaction of the corresponding profiles. The features include cluster coefficient, betweenness, modularity, reciprocity, assortativity and page rank.

Cluster coefficient(transitivity)

The clustering coefficient of a social graph measures the degree of interconnection of a network. In other words, it measures the tendency of two nodes that are not adjacent but share an acquaintance to come into contact. A high clustering coefficient means the presence of a high number of triangles in the network. The clustering coefficient c(i) for a node \(v_{i}\) is the number of directed links between the node’s neighbors divided by the number of possible directed links that could exist between them. If the neighbors of a node \(v_{i}\) have n directed links between them, then the clustering coefficient is

$$ c(i)=\frac{n}{d_{i}(d_{i}-1)} $$
(3)

where \(d_{i}\) is the number of links the node \(v_{i}\) has to other nodes.

It is well known in the literature [11] that social networks show high values of the clustering coefficient, since they reflect the underlying social structure of contact among friends. Both the global clustering coefficient of a social network and the local clustering coefficient of any given node can be computed.
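A minimal sketch of the local clustering coefficient in (3) for a directed graph stored as a dictionary of successor sets is given below. Treating every node linked to \(v_{i}\) in either direction as a neighbor, and taking \(d_{i}\) to be the number of such neighbors, are assumptions of this example; the function name and the sample graph are hypothetical.

```python
def local_clustering(graph, node):
    """Local clustering coefficient c(i) = n / (d_i * (d_i - 1)).

    graph maps each node to the set of nodes it links to (directed edges).
    """
    # Assumption: neighbors are all nodes linked to `node` in either direction.
    neighbors = set(graph.get(node, ()))
    neighbors |= {u for u, successors in graph.items() if node in successors}
    neighbors.discard(node)
    d = len(neighbors)
    if d < 2:
        return 0.0  # fewer than two neighbors: no link between neighbors is possible
    # n: directed links whose two endpoints are both neighbors of the node.
    n = sum(1 for u in neighbors for w in graph.get(u, ()) if w in neighbors and w != u)
    return n / (d * (d - 1))

# Hypothetical image response graph: a->b, a->c, b->c, c->b, d->a.
graph = {"a": {"b", "c"}, "b": {"c"}, "c": {"b"}, "d": {"a"}}
c_a = local_clustering(graph, "a")  # neighbors b, c, d; links b->c and c->b
```

Here node a has three neighbors and two directed links among them, so c(a) = 2 / (3 · 2) = 1/3.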

Betweenness

Betweenness is a measure of a node’s centrality in the graph; that is, nodes appearing in a large number of shortest paths between any two nodes have higher betweenness than others. The betweenness centrality of a node v can be defined as

$$ B_{C}(v)=\sum\limits_{s \neq v \neq t} \frac {\sigma_{st}(v)}{\sigma_{st}} $$
(4)

where \(\sigma_{st}\) is the number of shortest paths from s to t and \(\sigma_{st}(v)\) is the number of shortest paths from s to t that pass through the node v.
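The sum in (4) can be evaluated for all nodes at once with Brandes' algorithm. The sketch below is one possible implementation for an unweighted directed graph; it is not claimed to be the procedure used in this work, and the function name and the sample graph are hypothetical.

```python
from collections import deque

def betweenness(graph):
    """Unnormalized betweenness B_C(v) on an unweighted directed graph.

    graph maps each node to the set of nodes it links to.
    """
    nodes = set(graph) | {w for successors in graph.values() for w in successors}
    centrality = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        predecessors = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if dist[w] < 0:              # w reached for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                centrality[w] += delta[w]
    return centrality

# Hypothetical graph: a->b->c->e and a->d->c.
graph = {"a": {"b", "d"}, "b": {"c"}, "d": {"c"}, "c": {"e"}}
B = betweenness(graph)
```

In this example node c lies on every shortest path that ends in e, while b and d each carry half of the shortest paths from a to c and from a to e.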

Modularity

When examining communities in networks, we require an objective metric to evaluate how good a particular division of a network into communities is. Modularity is one measure of the structure of networks or graphs. It is used to measure the strength of the division of a network into modules. It is defined as

$$ Q=(\text{number of edges within communities})-(\text{expected number of such edges}) $$
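For reference, the verbal definition above is commonly formalized for an undirected graph as the Newman-Girvan modularity. The normalization by the number of edges and the symbols below are not taken from the original text and are given only as the usual reference formulation:

```latex
$$ Q=\frac{1}{2m}\sum\limits_{i,j}\left[A_{ij}-\frac{k_{i}k_{j}}{2m}\right]\delta(c_{i},c_{j}) $$
```

Here \(A_{ij}\) is the adjacency matrix, \(k_{i}\) is the degree of node i, m is the total number of edges, \(c_{i}\) is the community of node i, and \(\delta(c_{i},c_{j})\) equals 1 when nodes i and j belong to the same community and 0 otherwise. The first term counts the edges within communities and the second term is their expected number in a random graph with the same degrees.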

Reciprocity

A traditional way to define the reciprocity \(R_{r}\) is as the ratio of the number of links pointing in both directions \(L^{<->}\) to the total number of links L.

$$ R_{r} = \frac {L^{<->}}{L} $$
(5)

With this definition, \(R_{r}=1\) for a purely bidirectional network and \(R_{r}=0\) for a purely unidirectional one. Real networks have an intermediate value between 0 and 1.
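The ratio in (5) is straightforward to compute from an edge list. The sketch below assumes that edges are given as (source, target) pairs and that self loops are ignored; the function name and the sample edges are hypothetical.

```python
def reciprocity(edges):
    """R_r = L<-> / L for a collection of directed edges (u, v)."""
    edge_set = {(u, v) for u, v in edges if u != v}  # drop duplicates and self loops
    if not edge_set:
        return 0.0
    # A link counts as bidirectional when the reverse link is also present.
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)

# Hypothetical image response links: a<->b is mutual, a->c and c->d are one way.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
R_r = reciprocity(edges)
```

Two of the four links in this example have a reverse counterpart, giving an intermediate value of 0.5.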

Assortativity

Assortativity is a measure of the likelihood of nodes connecting to other nodes with similar degrees. The assortativity \(A_{r}\) is defined as the Pearson correlation coefficient between the degrees of all pairs of nodes connected by an edge. Thus, the assortativity coefficient \(A_{r}\) ranges between -1 and 1. A high \(A_{r}\) means that nodes tend to connect to nodes of similar degree, while a negative coefficient means that nodes are likely to connect to nodes with a very different degree from their own [6, 19].
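The Pearson correlation described above can be sketched as follows for an undirected edge list. Counting every edge in both directions, and returning 0 when all degrees are equal (where the correlation is undefined), are choices made for this example; the function name and sample graphs are hypothetical.

```python
import math

def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each undirected edge."""
    if not edges:
        return 0.0
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count each edge in both directions so that the measure is symmetric.
    xs = [degree[u] for u, v in edges] + [degree[v] for u, v in edges]
    ys = [degree[v] for u, v in edges] + [degree[u] for u, v in edges]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # all degrees equal: correlation undefined, 0 by convention here
    return cov / math.sqrt(var_x * var_y)

# A star (one hub, three leaves) is maximally disassortative.
star = [("hub", "x"), ("hub", "y"), ("hub", "z")]
A_star = degree_assortativity(star)

# A path of four nodes mixes degree 1 and degree 2 endpoints.
path = [("a", "b"), ("b", "c"), ("c", "d")]
A_path = degree_assortativity(path)
```

The star yields the extreme value -1 because every edge joins the high degree hub to a degree one leaf.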

Page rank

The page rank (PR) algorithm is commonly used to assess the popularity of a webpage. The computed metric, which we refer to as user rank, indicates the degree of participation of a user in the system through interactions via image responses.
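As an illustration of how such a user rank could be obtained, the sketch below runs the standard PageRank power iteration on a directed interaction graph. The damping factor of 0.85, the fixed iteration count and the uniform treatment of nodes without outgoing links are conventional choices assumed here, not parameters reported in this work; the function name and sample graph are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """PageRank by power iteration; graph maps each node to the set it links to."""
    nodes = sorted(set(graph) | {w for successors in graph.values() for w in successors})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the teleportation share first.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            successors = graph.get(v, ())
            if successors:
                share = damping * rank[v] / len(successors)
                for w in successors:
                    new_rank[w] += share
            else:
                # Node without outgoing links: spread its rank uniformly.
                for w in nodes:
                    new_rank[w] += damping * rank[v] / n
        rank = new_rank
    return rank

# Hypothetical image response graph: a and b respond to c, c responds to a.
graph = {"a": {"c"}, "b": {"c"}, "c": {"a"}}
PR = pagerank(graph)
```

In this example profile c receives responses from two users and obtains the highest user rank, while b, which receives none, keeps only the teleportation share.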

The social graph based feature vector is denoted by

$$ F^{(G)}=<c(i),B_{C} (v),Q,R_{r},A_{r},PR> $$
(6)

3.3 Classification

The Weka Machine Learning Java library is used to build the classifier. During the training phase, all generated instances are labeled as Stegobot or normal, indicating the type of communication channel. Labeling is based on prior knowledge of the Stegobots used to generate the social network traces.

Different classification methods, such as Naive Bayes, Support Vector Machines (SVM), Decision Tree and k-Nearest Neighbors (k-NN), are used to identify the bots. Among these algorithms, the Bayesian classifier has the best performance for several reasons. First, the Bayesian classifier is robust to noise. Second, the class label is predicted based on the user’s specific pattern: a Stegobot detection probability is calculated for each individual user based on its behaviors, instead of applying a general rule. Finally, the Bayesian classifier is a simple and very efficient classification algorithm.
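To make the per-user probability concrete, the following is a small Gaussian Naive Bayes sketch written in plain Python. It is not the Weka implementation used in the experiments; the class name, the variance floor and the toy two-dimensional feature vectors are assumptions for illustration.

```python
import math

class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per feature and class."""

    def fit(self, X, y):
        self.priors, self.stats = {}, {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.priors[label] = len(rows) / len(X)
            columns = list(zip(*rows))
            means = [sum(c) / len(c) for c in columns]
            # Small floor keeps the variance strictly positive.
            variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                         for c, m in zip(columns, means)]
            self.stats[label] = (means, variances)
        return self

    def predict_proba(self, x):
        """Posterior probability of each class for one feature vector."""
        log_post = {}
        for label, (means, variances) in self.stats.items():
            lp = math.log(self.priors[label])
            for v, m, s2 in zip(x, means, variances):
                lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
            log_post[label] = lp
        top = max(log_post.values())  # subtract the maximum for numerical stability
        weights = {k: math.exp(v - top) for k, v in log_post.items()}
        total = sum(weights.values())
        return {k: w / total for k, w in weights.items()}

# Hypothetical two-feature training data.
X = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.7], [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
y = ["Stegobot"] * 3 + ["Normal"] * 3
model = GaussianNaiveBayes().fit(X, y)
probs = model.predict_proba([0.8, 0.8])
```

The returned dictionary holds one probability per class, so each profile receives its own Stegobot probability rather than a single global rule.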

In order to evaluate the accuracy of proposed detection algorithm, 10-fold cross validation technique is used to split the data set into ten random subsets, out of which nine sets are used for training and other one is used for testing. The same procedure is repeated until all ten subsets have been utilized as the testing set exactly once. The results reported are an average of the results of all ten runs.
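The splitting procedure described above can be sketched as follows. The helper names, the fixed random seed and the `evaluate` callback, which stands in for training and testing a classifier on the given index lists, are assumptions of this example.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal random subsets
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_validate(n_samples, evaluate, k=10, seed=0):
    """Average the score returned by evaluate(train, test) over all k runs."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)

# Hypothetical usage: 25 instances; the callback only reports the test fold size.
splits = list(k_fold_indices(25))
mean_test_size = cross_validate(25, lambda train, test: len(test))
```

In practice the callback would fit the classifier on the training indices and return its accuracy on the test indices.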

3.4 Stegobot detection algorithm

In this section, we formally introduce the generalized algorithm for detection of compromised profiles in OMSN. Let G=(U,E) be a time stamped multimedia social network, where U is the set of all profiles of users who shared image responses and posts until a certain instant of time, and E denotes the set of all relations between the profile users. We denote the real time social network dataset as D. Three different views of social profiles, namely image content based features \(F^{(I)}\), social profile activity based features \(F^{(S)}\) and social graph based features \(F^{(G)}\), as detailed in Section 3.2, are considered. Ideally, the different views of social profiles are conditionally independent. A given unknown user profile \(u_{k} \in U\) corresponding to the kth instance \(I_{u_{k}} \in D\) is represented by the feature vector \(\textbf {F}_{u_{k}}=(F_{u_{k}}^{(I)}, F_{u_{k}}^{(S)}, F_{u_{k}}^{(G)})\). The classification problem can be formulated as follows: given a new user profile \(u_{i} \in U\) represented by the calibrated feature vector \(\textbf {F}_{u_{i}}\), the decision maker determines the class C to which the user belongs. Namely,

$$ U=(F_{u_{k}}^{(I)}, F_{u_{k}}^{(S)}, F_{u_{k}}^{(G)}) \rightarrow C=\{\text{Normal, Stegobot}\} $$
(7)

We select any efficient machine learning algorithm, and implement the decision maker based on that. Further, \(p_{u_{i}}^{N}\), \(p_{u_{i}}^{B}\) are the calibrated probabilities for user \(u_{i}\) to be predicted as normal and Stegobot respectively. For completeness, the overall working mechanism of the Stegobot detection algorithm is shown in Algorithm 1.

Algorithm 1 Stegobot detection algorithm

The proposed Stegobot detection algorithm works as follows: given a social network G, first choose the set of profiles U that needs to be tested. The critical part of the algorithm is the construction of the feature vector as discussed in Section 3.2.

For each profile, suitable attributes (features) are selected, on which the classification algorithm is applied. The extracted features are passed to the selected classifier in order to construct the trained model Q. The decision maker calibrates the probabilities \(p_{u_{i}}^{N}\), \(p_{u_{i}}^{B}\) with respect to the prediction of normal and Stegobot using the trained model Q. This process is repeated until the set U becomes empty. Initially, the algorithm analyzes all |U| nodes as individual users. In the detection process, |C| users are detected as compromised. For each user, the feature vector is computed with time complexity d, where d is the dimension of the feature set. Since |C| is at most as large as |U|, the worst case time complexity is \(d \cdot |U|\).
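The loop described above might be sketched as follows. This is an illustrative reading of the textual description and not a transcription of Algorithm 1; the function name, the `extract_features` and `predict_proba` callbacks, the decision rule that compares the two probabilities, and the toy data are all assumptions.

```python
def detect_stegobots(profiles, extract_features, predict_proba):
    """Return the set C of profiles predicted as Stegobot.

    extract_features(profile) -> feature vector (F^(I), F^(S), F^(G)), cost d
    predict_proba(features)   -> {"Normal": p^N, "Stegobot": p^B}
    """
    compromised = set()
    pending = list(profiles)
    while pending:                        # repeat until the set U is empty
        profile = pending.pop()
        features = extract_features(profile)
        probs = predict_proba(features)
        # Assumed decision rule: pick the class with the larger probability.
        if probs["Stegobot"] > probs["Normal"]:
            compromised.add(profile)
    return compromised

# Hypothetical precomputed feature vectors and a stand-in for the trained model.
toy_features = {
    "u1": [0.9, 0.8, 0.7],
    "u2": [0.1, 0.2, 0.1],
    "u3": [0.6, 0.7, 0.9],
}

def toy_model(features):
    p_bot = sum(features) / len(features)
    return {"Normal": 1.0 - p_bot, "Stegobot": p_bot}

C = detect_stegobots(toy_features, toy_features.get, toy_model)
```

Each profile is visited once and its feature vector has dimension d, which matches the \(d \cdot |U|\) bound stated above when the classifier cost per profile is proportional to d.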

4 Experiments and results

In order to evaluate the performance of the proposed technique it is deployed in a real time environment. We conducted different experiments which are detailed below:

4.1 Dataset collection

There are many challenges and practical difficulties in obtaining real world datasets of Stegobot network traces. Some of the publicly available datasets consist of information collected from social spam bots, which may not reflect real Stegobot network behavior. However, in order to evaluate our system, we collected a considerable amount of social bot network traffic traces from different sources. We used the Barracuda Labs [38] dataset (D1) for large scale infiltration networks. Barracuda Labs is currently working on various security and privacy applications in social networks. We also collected the ICWSM-13 [40] dataset (D2), which is released as an openly available community resource for social media researchers. Further, we captured bot and normal traces (D3) in a public network, based on the experimental guidelines [39] and similar works [3, 20].

Our raw image database contains 5150 color images that have never been compressed. Because of the way images are stored in social networks, we converted these raw images into JPEG images and built vulnerable and steganalysis databases for experimental purposes.

In benchmarking image steganographic techniques, one of the important components is the image dataset employed. Our goal was to use a dataset of images which would include a variety of textures, qualities and sizes; at the same time, we wanted a set which would represent the type of images found on public domains. Images obtained from the INRIA Holidays dataset [16, 28] provide diverse subjects such as natural scenes and artificial objects. Although the original images were color images of different sizes, all images have been converted into 512 × 512 gray-level images and saved as JPEG files with a quality factor of 85. The overall qualitative assessment of the images has been carried out using the automated qualitative assessment of multi-modal distortions method [12]. In each experiment, an image is adapted to OSN constraints, and then the infected image data set is created using the NERGAL tool [29], the YASS scheme [30] and F5 [35]. The infected image is uploaded into some of the social network profiles through a user account, and then downloaded from another account. Finally, the downloaded images are used for the experiments. For each method, different image data sets with different payloads and file types are generated. Table 1 provides general information about the datasets used in this paper.

Table 1 Summary of Datasets

4.2 Feature evaluation

Feature analysis is very important for constructing an effective feature vector and designing a detection technique with high efficiency. The main challenge of this work is to identify the most influential features for detecting malicious profiles. The detection of Stegobot on OSNs is a form of adversarial challenge between normal and malicious profiles. Thus, it is important to understand whether different sets of features could lead our approach to accurate detection.

In order to study the importance of the selected attributes, we use a well known feature selection method available in Weka [14]. We assessed the relative power of the 36 selected features [23] in discriminating Stegobot profiles from legitimate profiles by applying the \(\chi^{2}\) test. Table 2 presents the top 10 features selected from the feature vector according to the ranking estimated by the \(\chi^{2}\) test. From Table 2, one can note that the most important attributes are the clustering coefficient and betweenness. The importance of these attributes highlights an important aspect of detecting Stegobot communication: sometimes Stegobots mimic a real OMSN user, and in this scenario they may escape content based analysis. It is also observed from the ranking that URL ratio is one of the significant features, which is plausible since Stegobots are most interested in sharing URLs, similar messages and images. We can also note that contourlet high frequency subbands, entropy and DCT high frequency subbands are efficient in the detection of Stegobot profiles.
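To illustrate the ranking criterion, the sketch below computes the Pearson chi-square statistic between a discretized feature and the class label, and sorts features by it. It is a simplified stand-in and not the Weka routine used in the experiments; the function name, the two-level discretization ("hi"/"lo") and the toy data are assumptions.

```python
def chi_square(feature_values, labels):
    """Pearson chi-square statistic between a discrete feature and the class label."""
    n = len(labels)
    f_levels = sorted(set(feature_values))
    c_levels = sorted(set(labels))
    # Observed counts of every (feature level, class) combination.
    observed = {(f, c): 0 for f in f_levels for c in c_levels}
    for f, c in zip(feature_values, labels):
        observed[(f, c)] += 1
    row = {f: sum(observed[(f, c)] for c in c_levels) for f in f_levels}
    col = {c: sum(observed[(f, c)] for f in f_levels) for c in c_levels}
    statistic = 0.0
    for f in f_levels:
        for c in c_levels:
            expected = row[f] * col[c] / n  # count expected under independence
            statistic += (observed[(f, c)] - expected) ** 2 / expected
    return statistic

# Hypothetical data: 1 = Stegobot, 0 = normal.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "clustering": ["hi", "hi", "hi", "hi", "lo", "lo", "lo", "lo"],  # tracks the class
    "size":       ["hi", "lo", "hi", "lo", "hi", "lo", "hi", "lo"],  # unrelated to it
}
scores = {name: chi_square(values, labels) for name, values in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

A feature that separates the two classes perfectly obtains the largest statistic, while a feature that is independent of the class obtains zero and falls to the bottom of the ranking.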

Table 2 Summary of selected OMSN based feature attributes

We estimate the classification results considering different subsets of 10 features that occupy adjacent positions in the ranking. Table 3 presents the number of features from each set (image, profile, graph) in the top 10, 20, 30 and 40 most discriminative features according to the ranking estimated by the chi-square test.

Table 3 Number of features at top positions in \(\chi^{2}\) ranking

4.3 Performance evaluation

In the evaluation of the Stegobot detection algorithm, we perform cross-validation using the datasets D1, D2 and D3 as described in Section 4.1. Different characteristics of Stegobot are calibrated and translated into a feature vector. The proposed Stegobot detection algorithm is evaluated with many of the classifiers in the Weka tool using default values for all parameters. In this work, the accuracy of four different families of classification methods, Naive Bayes (NB), Decision Trees (DT), k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM), has been compared.

The nearest neighbor classifier can be regarded as a special case of the more general k-nearest neighbors classifier, hereafter referred to as a k-NN classifier. More robust models can be achieved by locating k neighbors, where k>1, and letting the majority vote decide the class label. A higher value of k results in a smoother, less locally sensitive function. The essential assumption of the method is that malicious profiles are surrounded by malicious and infected profiles.

During the practical usage of our proposed system, we build classifiers from different families using the datasets D1 and D2, and test the results using dataset D3. The results in terms of true positive (TP), true negative (TN), false positive (FP), false negative (FN) and accuracy (ACC) are investigated in different scenarios. Accuracy is the most widely used metric in machine learning; it measures the fraction of all instances that are correctly classified into their actual class. The top 4 classifiers and their performance are presented in Table 4. It can be seen that the Naive Bayesian classifier has the best overall performance in comparison with the other techniques.

Table 4 Computed performance metrics for different classification methods with Facebook dataset and Flickr dataset

To perform the cross-validation, we conducted additional experiments with the same datasets. First, we train the same set of classifiers using the public network dataset (D3). Then the trained models are tested against datasets D1 and D2. The results of these experiments are summarized in Table 5. As can be seen, the Naive Bayes classifier again gives the best results, with an average of 96.26 % and 4.25 % false positives.

Table 5 Computed performance metrics for different classification methods with Barracuda labs dataset and ICWSM - 2013 dataset

In addition, we performed a set of experiments to assess the generalization ability of our proposed method with the Naive Bayes classifier. We trained the Naive Bayes classifier multiple times with randomly selected instances from all three datasets. Each time, we left out a specific type of social network traffic from training. For example, we first train the classifier while leaving out all Facebook traffic from the training dataset, and then test the trained classifier on the Facebook traffic. The results of this set of experiments are reported in Table 6. The results show that the proposed method can detect almost all types of social network profiles. The best algorithm, Naive Bayes, reaches almost 96 % true positives and 98 % true negatives even without any tuning. All other classification methods also perform well in terms of true positive and false positive rates. This attests to the effectiveness of our proposed method.

Table 6 Computed performance metrics for Naive Bayes classification method with different dataset
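The leave-one-network-out protocol described above can be sketched as follows. This is a minimal illustration of the data split only; the record layout (a dictionary with `network`, `features` and `label` keys) and the sample records are assumptions made for the example, not the format of the actual datasets.

```python
def leave_one_network_out(records):
    """Yield (held_out_network, train, test) splits, one per social network."""
    networks = sorted({record["network"] for record in records})
    for held_out in networks:
        # Train on every network except the held-out one, then test on it.
        train = [r for r in records if r["network"] != held_out]
        test = [r for r in records if r["network"] == held_out]
        yield held_out, train, test


# Hypothetical records; feature vectors are placeholders.
records = [
    {"network": "Facebook", "features": [0.2, 0.7], "label": "bot"},
    {"network": "Facebook", "features": [0.1, 0.1], "label": "normal"},
    {"network": "Flickr", "features": [0.8, 0.6], "label": "bot"},
    {"network": "Twitter", "features": [0.3, 0.2], "label": "normal"},
]

splits = list(leave_one_network_out(records))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

Each split would then be passed to the classifier of choice: fit on `train`, evaluate on `test`, and report the metrics per held-out network.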

Sometimes, accuracy alone does not reveal which class tends to be misclassified. In Stegobot detection, it is worse if a genuine profile is wrongly classified as a bot than if a bot is misclassified as a legitimate profile. Therefore, we use precision, which is the fraction of actual positives among the instances classified as positive. Precision is high if the number of correctly classified infected profiles is high and the number of false positives is low. Recall measures how many elements of a class are correctly classified; here, we use it to measure how many of all actual bots are detected. In addition, the F-measure, the harmonic mean of precision and recall, is used.
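As a worked illustration of these definitions, the sketch below derives accuracy, precision, recall, F-measure and false positive rate from the four confusion-matrix counts. The counts in the example are invented for the arithmetic and do not correspond to any entry in the result tables.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute standard metrics from confusion-matrix counts (bot = positive class)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "false_positive_rate": false_positive_rate,
    }


# Hypothetical counts: 100 bot profiles and 100 genuine profiles.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For these counts the accuracy is 185/200 = 0.925, while precision (90/95) and recall (90/100) differ, which shows how the two measures separate the cost of flagging genuine profiles from the cost of missing bots.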

To compare the performance of classification with the top 10 combined features against that of the individual feature sets, further experiments were carried out; the results are shown in Table 7. The experimental results show an increase in detection rate when the combined feature set is used. This leads to the conclusion that classification on a multiview feature set can increase the botnet detection rate.

Table 7 Performance comparison of individual feature sets with the combined feature set

Finally, Receiver Operating Characteristic (ROC) analysis is used to measure the goodness of the proposed algorithm. The x-axis represents the cumulative false positive rate, while the y-axis represents the cumulative true positive rate. The false positive rate is also known as the false alarm ratio.
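The construction of an ROC curve can be sketched as follows: instances are sorted by decreasing classifier score, and the cumulative false positive and true positive rates are recorded as the decision threshold is lowered past each instance. The scores and labels below are hypothetical, and the sketch assumes that all scores are distinct (tied scores would need to be grouped into a single step).

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points; label 1 = bot, 0 = normal."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical classifier scores (estimated probability of being a bot).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
auc = area_under_curve(points)
print(points)
print(round(auc, 4))
```

A curve that rises quickly toward the top-left corner, and hence an area close to 1, indicates that bots receive higher scores than genuine profiles in most pairwise comparisons.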

Figure 3 compares the accuracy of the individual feature sets and the combined feature set. From the results, it is clear that the Stegobot detection algorithm based on the combined feature set gives better accuracy than any single feature set.

Fig. 3 Performance variance of Naive Bayes classifier with respect to different feature set

Further, in the multiview feature analysis, we identify the parameters that are responsible for evasion of detection. Changes in the values of these parameters are then identified with the help of the Naive Bayes classifier. The main idea is to exploit the strength of the selected parameters and Naive Bayes to obtain a robust detection mechanism that is resilient to evasion of detection.

The experiments ended with very promising results, showing that Stegobot detection with this model is feasible, although some scalability challenges remain. Further research should concentrate on deployment in a real-time online social network.

5 Conclusion

Stegobot detection differs from the detection of traditional botnets, since Stegobot traffic does not introduce new communication endpoints between bots. In this paper, a new approach based on multiview features is presented to detect Stegobot profiles as well as Stegobot communication in online multimedia social networks. This method uses a composition of multimedia image content based features, profile based features and social graph theoretic features to identify profiles that experience a sudden change in behavior. This detection can be extended in a straightforward manner to color images by considering the color image as three grayscale images. Traditional classification algorithms are applied to detect suspicious behaviors of Stegobot. Data collected from four popular multimedia social networking sites, Facebook, Flickr, Twitter and Google+, are utilized for this study. Our study shows that some of the identified attributes are significant for classification of the data and can be useful for a network forensic analyst in developing better prevention strategies. So far in this work, machine learning algorithms have been used for the detection of Stegobot. The main focus of this work is to detect whether a user profile in a multimedia social network community is a bot profile or not. The social graph based features can identify Stegobot communication. Large-scale infiltration of OMSNs is a major cyber threat, and defending against such threats remains a challenge for future work.