Mining social applications network from business perspective using modularity maximization for community detection

Akbar, Zeeshan; Liu, Jun; Latif, Zahida

doi:10.1007/s13278-021-00798-0


Original Article
Published: 02 November 2021

Volume 11, article number 115, (2021)

Social Network Analysis and Mining

Zeeshan Akbar¹,
Jun Liu¹ &
Zahida Latif²

Abstract

Different social applications serve different purposes. A great deal of information about fields such as politics, sports, business and the movie industry circulates through them, yet people are often poorly informed about the most important events taking place in the world. Usage of social applications varies among people in different parts of the world. An application may be popular within a community for a particular purpose: Twitter, for example, may be the core application for political use in one part of the world, whereas people elsewhere may use Facebook, WeChat or YouTube for entertainment and other purposes and remain unaware of important political changes. Businesses can improve their use of social media by knowing which social applications are used most in different communities, so that targeted content, including information, advertisements, services and recommendations, can be forwarded to them. In this paper, we mine a social applications network by extracting knowledge according to the popularity of social applications. The r-neighborhood technique is used to remove edges from the network. Users are assigned to communities based on modularity scores. Optimal communities are found using a divisive clustering approach that partitions the graph until the maximum modularity score is achieved. Community detection is also performed with the Gephi tool and on a k-nearest neighbors graph. The trends of the social applications are analyzed across communities; the r-neighborhood, k-nearest neighbors and Gephi approaches all identify Twitter, YouTube and Facebook as the most popular applications. Related content can be forwarded to the respective communities, and people in a community defined by the popularity of one application can also be informed about other happenings in the world: for example, Twitter and YouTube communities may be shown product advertisements, whereas Facebook and YouTube communities may be shown political news. The modularity function of the k-nearest neighbors graph has the highest value and gives a better interpretation of communities than the other two techniques.

1 Introduction

The online interaction of users has increased tremendously with the availability of social applications and networking sites such as Facebook, Twitter, WeChat, YouTube, Skype and many others. People who behave similarly when using social applications are linked with each other. The community structure in networks such as the Internet, email, transportation, biochemical, citation and social networks shows sets of nodes with dense connections within a community and sparse links to the rest of the network (Newman and Girvan 2004). Detecting such community structures in network systems is a key issue, known as community detection. The structures revealed by detecting communities are meaningful: examples include online and contact-based groups in social networks, groups of customers with similar purchasing interests in online social networks, and clusters of scientists in interdisciplinary collaboration networks (Fortunato 2010).

Modularity maximization has been one of the most popular methods for detecting communities by partitioning a network (Newman and Girvan 2004; Newman 2006, 2004; Leicht and Newman 2008). Many algorithms for modularity optimization have been proposed. Greedy algorithms include the fast general hierarchical method, a greedy optimization-based agglomeration algorithm, three forms of the CNM algorithm that integrate consolidation ratio metrics, and a heuristic method that optimizes modularity (Newman 2004; Clauset et al. 2004; Wakita and Tsurumi 2007; Blondel et al. 2008). A sampling technique uses an unsupervised method comprising proximity estimation and validation of hierarchical groups of networks (Sales-Pardo et al. 2007). Spectral approaches include the eigenspectrum method, a spectral graph tri-partitioning algorithm, two spectral methods for maximizing the objective function, the heuristic algorithm Qcut, the recursive algorithm HQcut and the Kcut spectral method (Newman 2006; Newman 2006; Richardson et al. 2009; White and Smyth 2005; Ruan and Zhang 2008; Ruan and Zhang 2007; Newman 2013). Other approaches are the extremal optimization algorithm (Duch and Arenas 2005), mathematical programming with linear programming and vector programming algorithms (Agarwal and Kempe 2008), and simulated annealing, such as the cartographic method and Monte Carlo methods (Guimera et al. 2005a, b; Massen and Doye 2005; Medus et al. 2005). Modularity is a quality metric that measures the strength of community structure: it is the difference between the actual number of edges within communities and the expected number in a randomized graph with the same nodes and degrees, where the degree of a node is the number of edges connected to it. This paper focuses on divisive clustering by maximizing graph modularity, which adds up the scores of every pair of nodes placed together in a single community.

Divisive clustering algorithms are ‘top-down’: all nodes start in a single cluster, which is split recursively until each node forms its own cluster. The Girvan–Newman algorithm (Girvan and Newman 2002) is a common divisive method. It uses edge betweenness, which tends to be high on the sparse connections between vertices of different communities, to determine the strength of edges, and it repeatedly deletes the edge with the largest betweenness until no edge remains to delete. Another algorithm, Fast-Newman (Newman 2004), takes modularity as its objective function and gives the optimal outcome when the objective function, denoted Q, has its highest value.

The process of selecting dense clusters that have sparse connections to the rest of the graph is called community identification. In social networks, many of these communities overlap, with each node participating in several communities, which reveals features of the network. Many approaches exist for community detection. The coupled-seed expansion method is reported to be effective compared with many existing algorithms such as Bigclam, OSLOM, SE, Demon, OMSTMO, LC and Ego-Splitting (Asmi et al. 2021). Modularity-based local community detection methods are widely used but are limited by seed node selection and community instability. By considering local modularity density and using the Jaccard coefficient, local communities can be formed in a core area detection stage followed by an extension stage, which also provides efficiency and precision (Guo et al. 2021). A more generalized modularity measure called f-modularity, applied to simulated networks and to real-world market networks, quantifies community structure by estimating the information existing between discrete random samples and a large value space (Guo et al. 2021). A recent algorithm for unsupervised network community detection using modularity optimization, which differs slightly from a graph neural network, has been proposed and is reported to be more efficient than the fast Louvain method (Sobolevsky 2021). Social network analysis also combines many techniques, such as the K-means clustering algorithm, for novel predictions such as drug target interactions using Bayes network, Naïve Bayes and SVM (Aghakhani et al. 2018).

A social applications network comprises different communication applications that serve different purposes, including news sharing, marketing, entertainment, relationships, education and merchandising. Users of social applications often have more than one account and use these accounts for different purposes depending on the situation. In this study, Twitter is used for politics, YouTube for videos, WeChat for transactions, Facebook for profile information of products, WhatsApp for personal communications, Skype for meetings and interviews, and Instagram for pictures.

In this research, we derive insights from a social applications network by creating a cosine similarity weighted graph of users. The cosine similarity is defined as the number of applications that two users have in common, divided by the product of the square roots of the total number of applications used by each user. The r-neighborhood technique is used to prune the edges of the network: edges that satisfy a particular value of r are kept and all other edges are removed. Grouping the web of customers present in the r-neighborhood graph is difficult. To determine whether a customer belongs to a single community or to several, we use graph modularity maximization to decide community assignments. Knowledge is extracted by analyzing the trends of social applications in order to forward advertisements, information, services and recommendations to users. The k-nearest neighbors technique is also implemented to delete edges from the social applications network of users. Communities are detected in the r-neighborhood graph and the k-nearest neighbors graph using modularity maximization with a divisive clustering approach. The Gephi tool is also used to perform modularity maximization. All three techniques indicate that Twitter, YouTube and Facebook are the most popular applications. However, the modularity function of the k-nearest neighbors graph has the highest value, 0.581, compared with the r-neighborhood graph and the Gephi tool, which have values of 0.554 and 0.555, respectively.

2 Related work

2.1 Community detection

Chen et al. (2014) review various community detection metrics and propose an efficient algorithm that maximizes modularity density (Q_ds). In another study, ten algorithms are re-implemented and evaluated on real-world datasets for community detection within a proposed framework (Wang et al. 2015). A paradigm called HICODE has been proposed to detect hidden community structures in many real-world domains; experiments show that hidden communities exist in networks (He et al. 2018). Community detection acts as a tool for analyzing network data. For example, communities in a social network define the nature of social interactions among people.

Many complex systems and social networks have natural divisions: their nodes can be grouped into clusters with strong connections inside each cluster and sparse links between clusters, which is known as community structure. In the context of social applications, the web has evolved into a source of information that supports analysis using different models and has brought intelligence through the automation of web services (Cena et al. 2011). Adomavicius and Tuzhilin (2005) describe the different approaches used for recommendation and suggest possible extensions that address their limitations, which can enhance the performance of recommender systems that forward services and content through the web automatically. Exploring the hidden community structures in a social network is of great significance. To address this problem, graph compression-based community detection algorithms have been proposed (Zhao et al. 2021), in which the number of communities in a compressed social network and their initial community seeds are found simultaneously. A new probabilistic c-means model addresses the heterogeneous properties of a vertex by using attribute and structural similarities; it acts as a fuzzy community detection method that resolves the overlapping community detection problem (Naderipour et al. 2021). For stream graphs, local overlapping communities are detected at the end points of a newly found edge with common communities (Panchal 2021).

Based on a review of empirical studies on the function and structure of a variety of networks, community detection gives insight into the core structure of networks. Earlier developments focused mainly on statistical characteristics of networks such as clustering, path lengths and degree distributions (Newman 2003). Because of the complexity of their internal structure, these networks are called complex networks. The mathematical models used to represent networks are called graphs. In modern graph theory, the problem of partitioning a graph is also known as community detection (Diestel 2012; Bollobás 1998). Typically, there are two types of graph clustering algorithms: the first type finds dense regions of nodes within a graph, and the second type clusters different graphs using their edges and structural characteristics (Aggarwal and Wang 2010). Proposed solutions include an efficient, scalable algorithm based on recursive shingling and clustering steps that identifies huge dense subnetworks, and a label propagation algorithm that assigns a unique label to each community, requires linear time and is therefore less expensive (Gibson et al. 2005; Raghavan et al. 2007).

2.2 Modularity optimization

A method for detecting communities, motivated by the community structure found in many social and biological networks, is based on centrality indices that find the boundaries of communities (Girvan and Newman 2002). Modularity as a quality function has certain drawbacks: it may be unable to resolve modules below a scale that depends on the network size and degree. This drawback has been validated in real and artificial biological, technological and social networks (Fortunato and Barthelemy 2007; Wakita and Tsurumi 2007). Modularity is widely used because it can automatically detect the optimal number of clusters, for example by constructing a k-nearest neighbor graph and applying distance modularity with a modified Louvain algorithm (Ruan 2009; Shakarian et al. 2013). A graph with a high modularity value indicates good-quality partitions and a good community structure. Many modularity maximization methods have been introduced. One hierarchical method that maximizes modularity is the Louvain algorithm (Adomavicius and Tuzhilin 2005). This algorithm runs very fast on large-scale networks, is easy to implement and also avoids the resolution limit of modularity. Fortunato recommended it as the best-performing modularity optimization algorithm for community detection (Fortunato 2010).

2.3 Nearest neighbors

Neighborhood graphs model relationships among data points in various fields of machine learning, including clustering, semi-supervised learning and dimensionality reduction. Two popular constructions are the r-neighborhood graph, in which a point is connected to the other points that satisfy a threshold r, and the k-nearest neighbor (kNN) graph, in which a point is connected to its k nearest neighbors. kNN is a popular classification technique (Samanthula et al. 2014; Xu et al. 2018; Wu et al. 2008; Cover and Hart 1967) that is used in different fields: a Voronoi-based kNN approach in spatial databases outperforms online distance-based methods (Kolahdouzan and Shahabi 2004), gene classification combines a genetic algorithm with the kNN method (GA/kNN) for assessment (Li et al. 2001), and fault detection using the kNN method (FD-kNN) in semiconductors was developed to handle nonlinearity in operation data (He and Wang 2007).

3 Community detection from business perspective in social networks

Social network analysis builds on community detection, with nodes and edges representing the actors and their social connections, respectively, in a social graph that typically consists of densely connected groups that are nevertheless separated from each other. A lot of work has been done in this field, and many methods have been proposed (Chunaev 2020). Businesses around the world are growing because of the social media boom: their target audiences join and use these social networks regularly, and businesses have to take advantage of platforms such as Facebook, Twitter or Instagram to reach their highly targeted potential customers. Social media users and customers log into their accounts regularly, with 70 percent of users logging in at least once per day (Pew Research Center 2021), which makes social media an excellent channel for staying at the top of customers’ minds with an effective digital marketing strategy.

With Facebook having over 2.7 billion active users in around 180 countries and Twitter having 1 billion active users per month worldwide, business owners should understand the relevance of social networks and design their communication strategies accordingly. The rapid growth from personal communities to business communities in online social networks shows that they are a highly cost-effective way of engaging with customers. Targeting the right customers on the right social media platforms should be an integral part of any business plan, with customer behaviors, demographics and trend analysis properly worked into the social media marketing strategy.

3.1 Contributions

The use of social media applications has changed business dynamics tremendously, making it the way forward for the future. This research aims to be part of the new wave of making smarter business decisions by staying as close to customers as possible. Both internal and external communications are crucial for the survival and progress of businesses. The contributions of this research are as follows:

The r-neighborhood and k-nearest neighbors methods are used to remove edges from the network.
Modularity maximization using a divisive clustering approach is used to detect communities.
The Gephi tool is also used to detect communities.
The modularity scores obtained with r-neighborhood, k-nearest neighbors and the Gephi tool are compared to determine which technique detects communities better.
Knowledge is extracted according to the popularity of the social applications used in each community.

The aim is to improve the scope, quality, richness, depth, interactivity and reach of targeted content by using the popularity of social applications in a particular community. Effective decisions can also be taken in different fields, for example improving a business by forwarding product content through the social application that is used most in a community. Community detection is performed by maximizing modularity using r-neighborhood, kNN and Gephi, and the results are compared.

4 Methodology

4.1 Research framework

This research considers social applications with different functionalities, such as transactions, politics, video and profile information, accessed through different mediums, including mobile, tablet, computer and iPad, for particular purposes. A set of users of those social applications is considered. The similarity between users is determined using cosine similarity, and a network of similar users is constructed. The r-neighborhood and k-nearest neighbors graphs are constructed by removing unnecessary edges from the user similarity network. Communities are detected in the r-neighborhood and kNN graphs using modularity maximization with a divisive clustering approach. The Gephi tool is also used to detect communities using modularity maximization. Knowledge is extracted by determining which technique gives a better and clearer interpretation of communities.

The metadata consist of two types of data. The first type describes each social application instance by its functionality, purpose, application number and medium of access. The second type records which user used a particular application, identified by its application number. In this paper, we have 32 instances representing different social applications accessed more than once through different mediums for different purposes, and a list of about 324 usages of these applications by 100 users.

4.2 Cosine similarity weighted graph construction

A user-to-user graph is constructed from a cosine similarity matrix that shows how similar users are to each other in terms of their usage of social applications. The cosine similarity between two users is calculated as (Foreman 2013) the number of applications the two users have in common, divided by the square root of the total number of applications used by the first user multiplied by the square root of the total number of applications used by the second user. Writing A_u for the set of applications used by user u:

$$\text{Simil}(u, v) = \frac{\left| A_u \cap A_v \right|}{\sqrt{\left| A_u \right|}\,\sqrt{\left| A_v \right|}}$$

For example, consider the two usage vectors (1, 1) and (1, 0), where 1 represents usage of an application by a user. The users share one application, the first user uses two applications and the second user uses one, so

$$\cos(45^{\circ}) = \frac{1}{\sqrt{2}\,\sqrt{1}} = 0.707$$

In this cosine similarity weighted graph, each pair of users has a zero or nonzero value giving the strength of the edge between them; together these values form an affinity matrix.
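The affinity matrix described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each user's usage is represented as a set of application names, and the function names `cosine_similarity` and `affinity_matrix` and the three sample users are hypothetical, not taken from the paper's data.

```python
import math

def cosine_similarity(apps_a, apps_b):
    """Cosine similarity of two binary usage vectors, given as sets of apps."""
    if not apps_a or not apps_b:
        return 0.0  # assumption: a user with no recorded usage is similar to no one
    shared = len(apps_a & apps_b)
    return shared / (math.sqrt(len(apps_a)) * math.sqrt(len(apps_b)))

def affinity_matrix(usage):
    """Symmetric user-to-user matrix of cosine similarities with a zero diagonal."""
    n = len(usage)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = cosine_similarity(usage[i], usage[j])
    return sim

# Hypothetical users: the first two reproduce the vectors (1, 1) and (1, 0).
usage = [{"Twitter", "YouTube"}, {"Twitter"}, {"WeChat"}]
sim = affinity_matrix(usage)
print(round(sim[0][1], 3))  # 0.707
```

The diagonal is left at zero because the later graph constructions exclude self-edges (i ≠ j).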

4.3 r-Neighborhood graph construction

An r-neighborhood graph over a vertex set V connects each node v ∈ V to the nodes in V that are similar to it under a given similarity measure, here cosine similarity. For a given set of points x₁, x₂, x₃, …, x_n, the r-neighborhood graph G_{n,r} is described by an adjacency matrix that keeps only edges of a certain strength: for an edge from point x_i to x_j, A_ij is 1 if Simil(x_i, x_j) ≥ r and 0 otherwise, for all 1 ≤ i, j ≤ n, i ≠ j. In this case, the r-neighborhood graph is produced for r = 0.5, so edges between users whose similarity is less than 0.5 are removed.
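A minimal sketch of this thresholding step follows, assuming the affinity matrix is available as a list of lists. The function name `r_neighborhood` and the small sample matrix are illustrative assumptions, not values from the paper.

```python
def r_neighborhood(sim, r=0.5):
    """Adjacency matrix with A[i][j] = 1 when sim[i][j] >= r and i != j, else 0."""
    n = len(sim)
    return [
        [1 if i != j and sim[i][j] >= r else 0 for j in range(n)]
        for i in range(n)
    ]

# Hypothetical 3-user affinity matrix.
sim = [
    [0.0, 0.707, 0.0],
    [0.707, 0.0, 0.408],
    [0.0, 0.408, 0.0],
]
adj = r_neighborhood(sim, r=0.5)
print(adj)  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

Because the affinity matrix is symmetric, the resulting adjacency matrix is symmetric too, so the graph is undirected. A user whose similarities all fall below r, like the third user here, ends up isolated.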

4.4 k-Nearest neighbors graph construction

In a k-nearest neighbors graph, each node is connected to its k nearest neighbors. Given a set of nodes P, the kNN graph is G(P, E), where E = {(u, v, Simil(u, v)) : v ∈ NN(u)_simil} and NN(u)_simil is the set of nearest neighbors of each u ∈ P under the similarity measure. In this case k = 5: we construct a 5NN graph from the affinity matrix in which the five edges with the highest affinities leave each node. The adjacency matrix is generated from the affinity matrix: if l denotes the fifth highest affinity of user u, then A_uv is 1 if Simil(u, v) ≥ l and 0 otherwise, for all 1 ≤ u, v ≤ n, u ≠ v.
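The kNN rule can be sketched as follows. Two details are assumptions rather than statements from the paper: zero affinities never create an edge, and ties at the k-th highest affinity are all kept, so a node can have more than k outgoing edges. The function name `knn_adjacency` and the sample matrix are hypothetical, and k = 2 is used only to keep the example small.

```python
def knn_adjacency(sim, k=5):
    """Directed kNN adjacency: A[u][v] = 1 if sim[u][v] >= k-th highest affinity of u."""
    n = len(sim)
    adj = [[0] * n for _ in range(n)]
    for u in range(n):
        others = sorted((sim[u][v] for v in range(n) if v != u), reverse=True)
        if not others:
            continue  # a single-node graph has no neighbors
        threshold = others[min(k, len(others)) - 1]  # k-th highest affinity of u
        for v in range(n):
            # Assumption: zero affinity never yields an edge, even if it ties the threshold.
            if v != u and sim[u][v] >= threshold and sim[u][v] > 0:
                adj[u][v] = 1
    return adj

# Hypothetical 4-user affinity matrix.
sim = [
    [0.0, 0.9, 0.5, 0.1],
    [0.9, 0.0, 0.4, 0.3],
    [0.5, 0.4, 0.0, 0.8],
    [0.1, 0.3, 0.8, 0.0],
]
adj = knn_adjacency(sim, k=2)
print(adj)  # [[0, 1, 1, 0], [1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0]]
```

Unlike the r-neighborhood matrix, this matrix is generally not symmetric: in the example, user 3 links to user 1, but user 1 does not link back.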

4.5 Modularity maximization using divisive clustering

Modularity maximization using divisive clustering is used for community detection. This method assigns a score to each pair of nodes in the r-neighborhood network. Divisive clustering first splits the graph into two communities, using an optimization algorithm to search over community assignments for the maximum modularity score. The two communities are then divided into four, and so on, until the modularity score stops increasing, which yields the optimal communities. Mathematically,

$$Q = \sum_{c_i \in C} \left[ \frac{\left| E_{c_i}^{\text{in}} \right|}{\left| E \right|} - \left( \frac{2\left| E_{c_i}^{\text{in}} \right| + \left| E_{c_i}^{\text{out}} \right|}{2\left| E \right|} \right)^{2} \right] \tag{1}$$

In the above equation, C represents the set of all communities, c_i refers to a particular community, $\left|E_{c_i}^{\text{in}}\right|$ is the number of edges between nodes inside community c_i, $\left|E_{c_i}^{\text{out}}\right|$ is the number of edges linking c_i to nodes of other communities and $\left|E\right|$ is the total edge count in the network.
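To make Eq. (1) concrete, the sketch below scores a given community assignment on an undirected, unweighted graph. It only evaluates Q for a fixed assignment; it does not perform the divisive optimization itself. The function name `modularity`, the edge-list representation and the toy graph of two triangles joined by one edge are assumptions made for illustration.

```python
def modularity(edges, community_of):
    """Modularity Q of Eq. (1) for undirected edges and a node -> community map."""
    m = len(edges)
    if m == 0:
        return 0.0
    e_in, e_out = {}, {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        if cu == cv:
            e_in[cu] = e_in.get(cu, 0) + 1  # edge inside one community
        else:
            e_out[cu] = e_out.get(cu, 0) + 1  # edge leaving cu
            e_out[cv] = e_out.get(cv, 0) + 1  # same edge leaving cv
    q = 0.0
    for c in set(community_of.values()):
        inside, outside = e_in.get(c, 0), e_out.get(c, 0)
        q += inside / m - ((2 * inside + outside) / (2 * m)) ** 2
    return q

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "all" for node in range(6)}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
print(round(q_split, 3), q_merged)  # 0.357 0.0
```

Splitting the toy graph along the bridge edge gives Q = 5/14, while keeping all nodes in one community gives Q = 0, which is why a divisive procedure would accept the first split.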

4.6 Knowledge extraction

Knowledge is extracted by determining which social applications are used most in a particular community, so that targeted content can be forwarded to that community through those popular applications. Modularity maximization for community detection is also performed using the Gephi tool. The knowledge extracted with Gephi is compared with that from the r-neighborhood and k-nearest neighbors techniques for the same purpose of finding application popularity.
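One way to read this step is as a per-community count of application usages. The sketch below assumes usage records are (user, application) pairs and that a community assignment is already available; the function name `popular_apps` and the sample records are hypothetical and do not come from the paper's dataset.

```python
from collections import Counter, defaultdict

def popular_apps(usages, community_of, top=1):
    """Most-used applications per community, from (user, app) usage records."""
    counts = defaultdict(Counter)
    for user, app in usages:
        counts[community_of[user]][app] += 1
    return {
        community: [app for app, _ in counter.most_common(top)]
        for community, counter in counts.items()
    }

# Hypothetical usage records and community assignment.
usages = [
    ("u1", "Twitter"), ("u2", "Twitter"), ("u1", "YouTube"),
    ("u3", "Facebook"), ("u4", "Facebook"), ("u4", "WeChat"),
]
community_of = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}
result = popular_apps(usages, community_of)
print(result)  # {0: ['Twitter'], 1: ['Facebook']}
```

Raising `top` returns a ranked list per community, which may be more useful when two applications are nearly equally popular.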

5 Results and discussion

This section presents the results and analysis. Table 1 lists 32 instances of social applications, including Twitter, WeChat, YouTube, Facebook, Instagram, WhatsApp and Skype, accessed for different purposes such as news sharing, merchandise, entertainment, marketing, brand information, educational/professional use and greetings/personal use.

Table 1 Social applications
