Abstract
Social networks typically include a community structure, and the connections between nodes within the same community are very close; however, the connections between communities are sparse. In this study, we analyze the main challenges behind the problem and then resolve it using differential privacy. First, we choose the Louvain algorithm as a benchmark community detection algorithm for the algorithmic perturbation scheme. We introduce an exponential mechanism that uses modularity as a score. Secondly, by transforming each community into a hierarchical random graph model, and its edge connection probability is noisy by differential privacy mechanism to ensure the security of relevant information in the protected community.
Access provided by Autonomous University of Puebla. Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
Techniques for identifying the groupings in social networks and then analyzing these groupings for further use have become a key research topic in sociology—this is referred to as “community detection”. Through community detection, we can reduce both the network size and the computational complexity of the algorithm used to process it, thereby improving the accuracy of the analysis. However, most of the methods are performed without privacy protection, and the results of community detection are output in the form of the node-set. In order to protect the privacy of users, it is necessary to protect the privacy of community detection.
Existing social network differential privacy protection schemes focus on a centralized model; they assume that third-party data collectors who possess the information are trustworthy, which is a practical assumption of real-world applications. Therefore, we use a local differential privacy (LDP) model to protect the privacy of social networks, by releasing the sanitized graph at local devices after differential privacy processing. The published data mask the interconnections between nodes and preserve the characteristics of the network structure, enabling researchers to achieve a reasonable balance between the utility of the algorithm and its ability to protect privacy when data are used for feature analysis and data mining.
2 Related Work
2.1 Local Differential Privacy
The differential privacy protection methods applied to social networks can be roughly divided according to two approaches. The first approach focuses on publishing certain types of noisy mining results; these include degree distributions, subgraph counts, frequent graphics patterns, and cut queries [4, 5]. This approach uses the properties of the original graph for general purposes, perturbs the graph, and publishes the aggregated results. It is theoretically proven that noise addition ensures strong privacy preservation. The second approach is to publish the entire social network for general purposes [6, 7]. These methods differ in the intermediate structures used for publishing and the corresponding definitions of differential privacy.
2.2 Community Detection with Differential Privacy
The task of finding node groups using connection relationships in the network is referred to as community detection. The Louvain algorithm [9] is based on multi-level optimization modularity and performs well in terms of efficiency and effectiveness. Moreover, the Louvain algorithm can identify hierarchical community structures; thus, it is considered one of the best community detection algorithms.
Recently, researchers have applied differential privacy for community detection. Nguyen et al. [10] chose the Louvain method as the backend community detection method of the input perturbation scheme and proposed the LouvainDP method. Ye et al. [11] proposed the first LDP-enabled graph metric estimation framework for a variety of graph analysis tasks, which address data correlation among nodes by two efficient perturbation algorithms based on adjacency bit vector and node degree.
3 Preliminaries
3.1 Louvain Algorithm
The Louvain algorithm [9] is based on multi-level optimization modularity, which is efficient to identify hierarchical community structures. According to [9], when assigning node \( i \) to a community, the modularity of the community changes as
3.2 Local Differential Privacy
Definition 1 \( (\upepsilon\textbf{-Differential Privacy}) \):
Given a random algorithm \( {\mathcal{A}} \), let \( {\text{S}} \) represent the set of all output spaces of \( {\mathcal{A}} \) on the two neighbor graphs \( {\text{G}}_{1} \) and \( {\text{G}}_{2} \)(which differ at most one element). The algorithm \( {\mathcal{A}} \) satisfies \( \epsilon \)-differential privacy if:
Theorem 1 (Laplace Mechanism):
For any function \( f:{\text{G}} \to {\mathbb{R}}^{\text{d}} \), the mechanism \( {\mathcal{A}} \)
provides \( \epsilon \)-differential privacy, where \( Lap_{i} \left( {\frac{\Delta f}{\epsilon }} \right) \) represents Laplacian independent and identically distributed variable samples with scale parameter \( \frac{\Delta f}{\epsilon}. \)
Theorem 2 (Exponential Mechanism):
Given a function \( {\text{f}}:\left( {{\text{G}} \times {\text{OS}}} \right) \to {\mathbb{R}} \), where \( {\text{OS}} \) is output space. For a graph \( {\text{G}} \), the mechanism \( {\mathcal{A}} \) that samples an output \( {\text{O}} \) with a probability proportional to \( { \exp }\left( {\frac{{\epsilon \cdot {\text{f}}\left( {{\text{G}},{\text{O}}} \right)}}{{2\Delta {\text{f}}}}} \right) \) satisfies \( \epsilon \)-differential privacy.
Theorem 3 (Sequential Composition):
Let each \( {\mathcal{A}}_{\text{i}} \) provide \( \epsilon_{\text{i}} \)-differential privacy. A sequence \( {\mathcal{A}}_{\text{i}} \left( {\text{G}} \right) \) over the entire graph \( {\text{G}} \) provides \( \sum \epsilon_{\text{i}} \)-differential privacy.
4 Problem Solution
4.1 Differentially Private Louvain Algorithm
We propose a solution to the privacy problems in community detection. Our scheme is divided into two phases. First, the social network is partitioned into multiple independent communities by adopting the community detection algorithm. Then, the privacy of the edges within each independent community is protected.
Algorithm 1 first calculates the sensitivity of the social graph according to its number of nodes (Line 1), and we will explain in detail how to calculate \( \Delta f \) later. Based on the first phase of the Louvain algorithm, each node in the original graph is treated as a separate community (Line 2). Then, according to the node sequence, mining the neighboring nodes/communities of each node (lines 3–4). Then moving the node to different communities and calculating the current modularity gain. Finally, we introduce the exponential mechanism, and the maximum modularity gain is selected, otherwise, it is unchanged (lines 5–9). When the movement of all nodes no longer causes changes of the modularity gain, the first round is completed and the results of the first round of community detection are returned (lines 10–12).
4.2 Edge Probability Perturbation
Community detection makes edge connection within the same community more salient and therefore requires additional protection. We use the same model in [15], which converts each community into an HRG model, then combined the generated HRG model with the edge-connection probability by adding Laplace noise in Algorithm 2.
After we convert the community into an HRG model, we calculate the connection probability of each internal node separately. Further, we introduce the Laplacian mechanism. Subsequently, for any two nodes \( i \) and \( j \) in the community, we find the lowest common ancestor \( r \) and establish a connection between the two nodes \( i \) and \( j \) using the connection probability of the internal node \( r \). Because the inter-community edges are relatively sparse and have low correlations, these direct connections do not provide additional privacy protection.
4.3 Sensitivity Analysis
The sensitivity \( \Delta f \) should be analyzed to complete the selection probability equation, where \( G^{\prime} \) is the neighbor of \( G \). The neighbor of a graph is the graph obtained by changing only one edge. Because the addition or removal operations are similar, only the former is considered in our proof. There are two cases to consider: (1) The connection is an edge inside the community \( P \). (2) The connection is an edge between the community \( P \) and \( S \). Finally, we obtain \( \Delta f \le \frac{3}{m} \).
5 Experiment Evaluation
5.1 Experimental Setup
For comparison purposes, two techniques that are similar to our method were implemented as references. They are the basic differential privacy algorithms for the HRG model, which use the same privacy criteria as [13] and the algorithm perturbation presented in [10]; the results of the previously centralized differential privacy scheme and the algorithm perturbation scheme and our proposed scheme are labeled as “DP”, “MD” and “LLDP” respectively. We performed experiments on two real datasets to evaluate our algorithm. The details of datasets are shown in Table 1.
5.2 Experiment Evaluation of Community Detection
The real social network dataset we chose did not have standard community detection results. Thus, we chose the output of the Louvain algorithm as a standard control because the evaluation of the data had been performed in [15], and the Louvain method had been proven there to provide high-quality results.
The partitioning results of different privacy budgets are shown in Fig. 1. For both datasets, the results given by our algorithm increased with the increase of the privacy budget and gradually stabilized after reaching a certain value. This provides an effective reference for the selection of a privacy budget. In general, the effectiveness of algorithm perturbations is higher than that of input perturbations. The input disturbance used for comparison also followed a similar trend; however, the overall value was low, which is significantly different from its typical value. Newman [8] suggested that the value of \( Q \) in a general network is between 0.3 and 0.7, which can explain a good community structure. Therefore, although the modularity of the results obtained by our algorithm was lower than the real situation, it still retained an effective community structure.
6 Conclusion
We analyzed the privacy problems of community detection can lead to and proposed a differentially private detection procedure based on the Louvain algorithm. Moreover, we proposed to further protect the relational data within the community by converting the individual community into a subgraph of an HRG model and subsequently calculating the edge connection probability by adding Laplacian noise. Experimental results indicated an improved performance on real data.
References
Newman, M.: Social networks. Oxford Scholarship Online (2018)
Kasiviswanathan, S.P., Lee, H.K., Nissim, K., Raskhodnikova, S., Smith, A.: What can we learn privately? SIAM J. Comput. 40(3), 793–826 (2011)
Dwork, C.: Differential privacy: a survey of results. In: Agrawal, M., Du, D., Duan, Z., Li, A. (eds.) Theory and Applications of Models of Computation. Lecture Notes in Computer Science, vol. 4978, pp. 1–19. Springer, Berlin, Heidelberg (2008). https://doi.org/10.1007/978-3-540-79228-4_1
Hardt, M., Roth, A.: Beating randomized response on incoherent matrices. In: Proceedings of the 44th symposium on Theory of Computing - STOC’12 (2012)
Cai, Z., He, Z., Guan, X., Li, Y.: Collective data-sanitization for preventing sensitive information inference attacks in social networks. IEEE Trans. Dependable Secure Comput. 15, 1 (2016)
Proserpio, D., Goldberg, S., McSherry, F.: A workflow for differentially-private graph synthesis. In: Proceedings of the 2012 ACM Workshop on Workshop on Online Social Networks - WOSN’12 (2012)
Wang, Y., Wu, X., Wu, L.: Differential privacy preserving spectral graph analysis. In: Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G. (eds.) Advances in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science, vol. 7819, pp. 329–340. Springer, Berlin, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37456-2_28
Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)
Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech: Theory Exp. 2008(10), P10008 (2008)
Nguyen, H.H., Imine, A., Rusinowitch, M.: Detecting communities under differential privacy. In: Proceedings of the 2016 ACM on Workshop on Privacy in the Electronic Society - WPES’16 (2016)
Ye, Q., Hu, H., Au, M.H., Meng, X., Xiao, X.: Towards locally differentially private generic graph metric estimation. In: 2020 IEEE 36th International Conference on Data Engineering (ICDE), Dallas, TX, USA, pp. 1922–1925 (2020)
McAuley, J., Leskovec, J.: Learning to discover social circles in ego networks. In: NIPS 2012, pp. 539–547
Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W.: Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math. 6(1), 29–123 (2009)
Xiao, Q., Chen, R., Tan, K.-L.: Differentially private network data release via structural inference. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 911–920, ACM (2014)
Prat-Pérez, A., Dominguez-Sal, D., Larriba-Pey, J.-L.: High quality, scalable and parallel community detection for large real graphs. In: Proceedings of the 23rd international conference on World Wide Web - WWW’14 (2014)
Acknowledgments
This paper is supported by Inner Mongolia Natural Science Foundation (Grant No. 2018MS06026) and the Science and Technology Program of Inner Mongolia Autonomous Region (Grant No. 2019GG116).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Ma, X., Yang, J., Guan, S. (2020). Differentially Private Social Graph Publishing for Community Detection. In: Park, N., Sun, K., Foresti, S., Butler, K., Saxena, N. (eds) Security and Privacy in Communication Networks. SecureComm 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 336. Springer, Cham. https://doi.org/10.1007/978-3-030-63095-9_11
Download citation
DOI: https://doi.org/10.1007/978-3-030-63095-9_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63094-2
Online ISBN: 978-3-030-63095-9
eBook Packages: Computer ScienceComputer Science (R0)