Abstract
Graph clustering is a fundamental problem in numerous graph mining applications, especially in spatial-temporal systems. The purpose of graph local clustering is to find a set of nodes (a cluster) that contains a given seed node and has high internal density. A series of works have addressed this problem by carefully designing the measuring metric and improving the efficiency-effectiveness trade-off. However, they are unable to provide a satisfying guarantee on clustering quality. In this paper, we investigate the graph local clustering task and propose an End-to-End framework, LearnedNibble, to address this limitation. In particular, we propose several techniques, including a practical self-supervised training signal based on a differentiable soft-mean-sweep operator, an effective optimization method with the regradient technique, and a scalable inference procedure with the Approximate Graph Propagation (AGP) paradigm and the search-select method. To the best of our knowledge, LearnedNibble is the first attempt to take responsibility for cluster quality while taking both effectiveness and efficiency into consideration in an End-to-End, self-supervised paradigm. Extensive experiments on real-world datasets demonstrate the clustering capacity, generalization ability, and approximation compatibility of our LearnedNibble framework.
1 Introduction
Graphs are a powerful framework for modeling the complex relations and interactions of our world [1,2,3,4], and methods for analyzing graphs have become fundamental and crucial. Graph clustering is an essential technique for a wide range of applications, including community detection [5,6,7], image segmentation [8,9,10,11], protein grouping [12, 13], and especially spatial-temporal systems [14,15,16], and has thus drawn increasing attention in recent years in computer science and in many applied research areas. However, graph clustering faces efficiency and effectiveness challenges from both practical and theoretical aspects.
Locality
One fundamental consensus regarding the efficiency challenge is that graph clustering methods should be local w.r.t. a given seed node [17,18,19]. Locality has two aspects, process and result, which are usually coupled because they are implicitly consistent. Process locality means that the graph clustering algorithm should only access data in the neighborhood of the given seed node. Result locality means that the algorithm should output a node set within a small region around the seed node.
Framework
Spielman and Teng [17, 20] are the first to study the graph local clustering problem. They propose a two-phase framework Nibble to guarantee the locality and performance of the graph clustering algorithm based on the analysis of Lovász and Simonovits [21, 22]. We introduce the important Nibble framework in detail in Section 2.3 and go through it quickly here. In the first phase, a power series \(\left\{\mathbf P^k{\vec1}_s\right\}\), which represents the transition probability from seed node s to other nodes with k steps, is calculated. In the second phase, the standard sweep operation is conducted to output the node set with the first or global minimal conductance as the clustering result. The effectiveness of Nibble is guaranteed by the theoretical bounds on cluster quality [18, 23] and illustrated by empirical evaluations on various real networks [24,25,26,27].
Efficiency
Nibble has been improved in two main aspects, measuring metric design and measurement computation, while the sweep process is left as a standard, static module. Andersen, Chung, and Lang [18] propose the algorithm PRNibble, which assembles the k-hop transition probabilities with weights of the form \(\alpha(1-\alpha)^{k}\) determined by the teleport constant α; the resulting node proximity metric is the widely used Personalized PageRank (PPR) [28]. PRNibble [18] also provides an efficient local operation, PR-Push, to compute it, and achieves a better efficiency and effectiveness guarantee. Chung [24] extends PRNibble with a physics-inspired metric called Heat Kernel PageRank (HKPR), whose weights are of the form \(\frac {e^{-t}t^{k}}{k!}\), where the parameter t is the temperature constant controlling the shape of the distribution. With the improved theoretical bound, Chung and Simpson [29] propose a randomized algorithm, ApproxHK, which samples random walks weighted by the heat kernel coefficients to approximate HKPR, and a sub-linear solution, ClusterHKPR, for the graph local clustering problem. To optimize the HKPR computation, Kloster and Gleich [25] replace the heavy, randomized Monte-Carlo process in ApproxHK by formulating the computation as the solution of a linear system. They propose an efficient deterministic algorithm, HK-Relax, which solves this linear system with the coordinate relaxation technique and yields a faster and better algorithm for graph local clustering. Recently, Yang et al. [30] point out that the absolute error bound in HK-Relax is not the best choice for the graph local clustering task, because the sweep process operates on the degree-normalized measure. Based on this observation, they propose the algorithm TEA, which approximates the HKPR vector of the seed node with a relative error bound and achieves a better efficiency-effectiveness trade-off. Wang et al. [27] optimize the Push operation used in several of the works above by introducing randomization and propose an efficient graph propagation framework, AGP. AGP can simulate any weighted message passing scheme and achieves state-of-the-art performance on the graph local clustering task with the HKPR weights.
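To make the two weighting schemes above concrete, the following minimal sketch evaluates the PPR weights \(\alpha(1-\alpha)^{k}\) and the HKPR weights \(\frac{e^{-t}t^{k}}{k!}\) for hops k = 0, ..., L. The function names, the truncation at L, and the values α = 0.2 and t = 5 are illustrative assumptions, not settings taken from the cited works.

```python
import math

def ppr_weights(alpha, L):
    """PPR weights alpha * (1 - alpha)**k for k = 0..L (hypothetical helper)."""
    return [alpha * (1.0 - alpha) ** k for k in range(L + 1)]

def hkpr_weights(t, L):
    """HKPR weights exp(-t) * t**k / k! for k = 0..L (hypothetical helper)."""
    return [math.exp(-t) * t ** k / math.factorial(k) for k in range(L + 1)]

# Illustrative parameter values only.
ppr = ppr_weights(alpha=0.2, L=50)
hkpr = hkpr_weights(t=5.0, L=50)
```

The PPR weights decay geometrically from hop 0, whereas the HKPR weights follow a Poisson shape that peaks near k ≈ t, so the two metrics emphasize different walk lengths.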
Evaluation criterion
Though the techniques mentioned above achieve better efficiency and a better effectiveness-efficiency trade-off, it is difficult to evaluate their effectiveness for two main reasons: 1) the algorithms are not developed to optimize effectiveness, and 2) the effectiveness criterion and metric are not well defined. We first discuss the latter problem by introducing several metrics of cluster quality. Girvan and Newman [1] bring the concept of the cluster into graph research to describe node sets organized into groups that are densely linked internally but loosely connected externally, also known as communities in network science [3]. To characterize this intuition, several scoring functions defined on graph structure have been proposed [6, 31,32,33,34,35], among which modularity and conductance are two essential metrics. The modularity [34] metric evaluates the difference between the sub-graph induced by the cluster nodes and a random graph with the same statistical properties. The conductance [36,37,38] metric directly describes the initial concept of the cluster in Rayleigh quotient form and is formalized in Definition 3. Yang and Leskovec [39] compare a series of existing metrics on 230 real-world graphs with ground-truth cluster labels by defining sensible criteria for goodness and robustness. They point out that the conductance metric achieves the best performance among structural definitions of graph clusters. Since high-order structures have advantages in revealing real communities [40, 41], motif-conductance has been proposed recently and studied in a series of works [42,43,44,45,46].
Besides the cluster quality metrics based on graph topology, Emmons and Kobourov [47] propose the concept of information recovery metrics based on the Shannon entropy [48] defined on the ground-truth label of the graph cluster, including Adjusted Rand Index (ARI) [49] and Normalized Mutual Information (NMI) [50].
Optimization purpose
In practice, graph local clustering works usually take conductance and F1-Score (or ARI, NMI) as criteria and check whether both improve. However, they generally improve only one of them, usually the information recovery one [25, 26, 51], which makes the results less convincing. Another kind of performance criterion is the trade-off between efficiency and effectiveness, which compares the cost needed to reach the same effectiveness, or the measuring score reached at the same algorithm cost, and is adopted by mainstream research [27, 30]. However, even though effectiveness improves as the algorithm runs, we have no sense of what effectiveness, e.g., conductance or F1-Score, the algorithms ought to achieve, nor of the appropriate time to stop. The fundamental works on graph local clustering [18, 20, 29] suffer from a similar problem. Theoretical bounds of the form \(O(\sqrt {\Phi })\) may be less meaningful for a specific application whose purpose is to find the cluster with the best measuring score rather than one with some bound guarantee.
Effectiveness
To resolve the evaluation dilemma and bring effectiveness into clustering algorithms, a series of works [26, 51,52,53] focus on measuring metric design in the first phase of Nibble and make progress in both theory and application. Kloumann, Ugander, and Kleinberg [52] regard the power series of transition probabilities as node features relevant to the cluster and treat the assembling weights as a linear classifier that uses these features to separate the Generalized PageRank (GPR) space. They point out that PPR with a proper choice of the teleport constant α corresponds to the optimal classifier under the Stochastic Block Model (SBM) [54] with the mean-field assumption [55]. Li, Chien, and Milenkovic [26] generalize the result by relaxing the mean-field assumption and analyzing the convergence of transition probabilities to their mean-field values, and propose a new measure with weights of the form \(\frac {\theta ^{k}}{(\theta ^{k}+\phi )^{2}}\), called Inversed PageRank (IPR), which decays more slowly than PPR. Another inspiring work on effectiveness is the Time-Dependent Personalized PageRank (TDPR) provided by Avron and Horesh [51]. Besides the new GPR structure, they also propose a new quality metric that evaluates the effectiveness of algorithms based on the differences between the results produced by different methods. They show that the proposed TDPR measure performs differently from the popular PPR and HKPR and can complement the existing measures.
1.1 Motivations and challenges
Though a series of measuring metrics have been proposed to achieve better effectiveness, parameter selection under a given metric is still challenging, e.g., the teleport parameter α in PPR, the temperature parameter t in HKPR, and the decaying parameters 𝜃,ϕ in IPR. Though each work could choose the best parameters for its own purpose, there is no principled way to tune the parameters for a better result, so the literature usually reuses the parameters of some original work for fair comparison. Yang et al. [30] share their considerations on the parameter choice for their TEA algorithm and stress the importance of choosing appropriate parameters for a specific graph task. Klicpera, Weißenberger, and Günnemann [56] try to explore the best GPR weighting parameters for the graph local clustering task with the adaptive diffusion paradigm [57], which is designed for link-prediction tasks. However, they find that it performs worse than the specific PPR and HKPR used in most other works. Li et al. [53] set up an End-to-End learning framework, Gumbel-Softmax-based Optimization (GSO), to solve optimization problems on graphs with the help of the Gumbel-Softmax technique, which approximately provides gradients for sampling operations. Though the framework is designed for all graph optimization problems, GSO does not scale to massive graphs, since the supervision signals in graph learning problems are always sparse and GSO has O(n) parameters to train.
Motivations
To this end, we summarize the aforementioned statements and analysis with several questions as our motivations to conduct this research.
- Though the capacity of the GPR has been studied a lot, has it been fully demonstrated with the existing fixed parameters?
- To achieve better effectiveness, are there measures appropriate for different circumstances, and how can we design them?
- To achieve better efficiency, can we avoid conducting grid-search operations in the graph clustering problem?
- To be scalable, can we take advantage of existing techniques, e.g., approximation or randomization?
Challenges
There are several fundamental challenges in formulating and solving the problems described above. We give a brief description here and address them in Section 3.
- How to deal with the discreteness of the sweep phase of the Nibble and make it differentiable to play a role in the End-to-End framework?
- How to make the conductance metric a proper supervising signal to provide the appropriate gradient to the training process?
- How to get the End-to-End model trained as desired?
- How to use the trained model to infer the clustering result?
- How to make the framework compatible with the present scalable graph local clustering algorithms?
Motivated by these questions, we focus on the measuring metric design problem under the Nibble two-phase framework and develop an End-to-End learning framework, LearnedNibble, to efficiently and effectively optimize the graph local clustering target. More specifically, we model the measuring metric design problem as a parameter selection task under the GPR form, whose capacity on the graph local clustering task has been demonstrated both theoretically and practically. We take only the graph topology G = (V,E) and the seed node u as input, since the relation between the semantic context on the graph and the cluster structure is beyond our scope. We evaluate algorithm performance with the conductance metric because conductance is consistent with the initial and natural definition of the cluster and performs well in both experimental and applied settings. By solving these non-trivial challenges in an integral framework, we bring a new perspective to the graph local clustering task.
1.2 Our contributions
We present an in-depth study on Nibble-based graph local clustering task with conductance as the cluster quality metric and make the following contributions.
Supervision manner
We design a differentiable learning-based soft-mean-sweep operator in a self-supervised manner to guide the training process.
Optimization mechanism
We explore the appropriate optimization mechanism for the graph local clustering task and propose the regradient technique to conduct the optimization.
End-to-end framework
We model the effectiveness problem of graph local clustering as a learning task w.r.t. the GPR weighting parameters and propose an End-to-End framework named LearnedNibble, based on the soft-mean-sweep and the regradient technique, which adaptively produces the cluster with the best conductance score on different graphs.
Capacity and compatibility
We illustrate the capacity of the GPR family and our LearnedNibble framework by conducting extensive experiments on the standard benchmarks of graph clustering tasks. We show that LearnedNibble achieves better effectiveness than all existing and commonly used measuring metrics, e.g., PPR, HKPR and IPR, on all datasets. Moreover, the advantage of LearnedNibble is preserved at all levels of approximation, allowing it to be combined with any approximate local clustering framework.
Scalability and practicality
We show that the clustering manner obtained from our LearnedNibble can generalize to the other nodes, whether they are in the same cluster as the seed node or not, with just a slight performance reduction. The generalization ability of LearnedNibble makes it scalable to massive data circumstances and practical in diverse graph-based tasks, including Graph Visualization and Graph Neural Networks.
1.3 Paper organization
The rest of the paper is organized as follows. We introduce basic notations and important techniques in Section 2 and present the LearnedNibble framework in Section 3. We evaluate the clustering capacity, generalization ability and approximation compatibility of our framework in Section 4. Finally, Section 5 discusses several interesting observations and shares some ideas, and Section 6 concludes the paper.
2 Preliminaries
Before deriving the LearnedNibble framework in detail, we first introduce several important notations and techniques, and finally formalize the problem we investigate in this work.
2.1 Basic terminology
Let G = (V,E) be an undirected and unweighted graph, where V = {v1,v2,...,vn} denotes the node set with size n, and E = {e(u,v)∣u,v ∈ V } denotes the edge set with size m. We use d(u) to denote the degree of node u, and use the vector d = {d(u),u ∈ V } to represent the degrees of all nodes. We use A to denote the adjacency matrix of G, where A(i,j) = A(j,i) = 1 if and only if e(vi,vj) ∈ E. Let D be the diagonal degree matrix of G with D(i,i) = d(vi). The transition probability matrix (a.k.a. random walk transition matrix) of G is \(\mathbf{P}=\mathbf{D}^{-1}\mathbf{A}\). Accordingly, \(\mathbf{P}^{k}\) denotes the k-th order transition probability matrix, and \(\mathbf {P}^{k}\vec {1}_{s}\) denotes the transition probabilities of the k-hop random walk started from seed node s. The notations used frequently in this work are listed in Table 1.
2.2 Generalized PageRank
This part introduces the measuring metric used in this work.
Definition 1
(L-hop Transition Probability Sequence) Given a graph G and the seed node s, the transition probability from s to another node u ∈ V in k steps is \(p^{k}(s,u)=\mathbf{P}^{k}(s,u)\). Collecting all nodes gives the k-hop transition probability vector of s, i.e., \(p^k(s)=\mathbf P^k{\vec1}_s=\left\{p^k(s,u)\vert u\in V\right\}.\) The L-hop transition probability sequence is defined as the sequence of k-hop transition probability vectors with the random walk length k ranging from 1 to L:
\[\pi^{L}(s)=\left\{p^{k}(s)\mid k=1,\dots,L\right\}.\]
Definition 2
(Generalized PageRank) Given the L-hop transition probability sequence \(\pi^{L}(s)\) of seed node s on graph G, the Generalized PageRank with the weighting vector \(w=(w_{1},\dots,w_{L})\) is defined as:
\[\mathbf{gpr}_{w}^{L}(s)=w\times\pi^{L}(s)=\sum_{k=1}^{L}w_{k}\,p^{k}(s).\]
We may omit the seed node flag s and the range flag L in the expressions and write π and \(\mathbf{gpr}_{w}\) for brevity.
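Definitions 1 and 2 can be illustrated with a small sketch that runs the k-hop random walk from the seed node and then combines the hop vectors with a weight vector. The adjacency-dictionary representation, the function names, the toy graph and the weights are all assumptions made for illustration; the sketch computes the exact sequence, not the approximate one used later.

```python
def k_hop_sequence(adj, s, L):
    """Return [p^1(s), ..., p^L(s)], each a dict mapping node -> probability."""
    p = {s: 1.0}                          # the walk starts at the seed node
    seq = []
    for _ in range(L):
        nxt = {}
        for u, mass in p.items():
            share = mass / len(adj[u])    # uniform step to each neighbour
            for v in adj[u]:
                nxt[v] = nxt.get(v, 0.0) + share
        p = nxt
        seq.append(p)
    return seq

def generalized_pagerank(seq, w):
    """Weighted sum of the hop vectors; w[k-1] is the weight of p^k."""
    gpr = {}
    for wk, pk in zip(w, seq):
        for u, prob in pk.items():
            gpr[u] = gpr.get(u, 0.0) + wk * prob
    return gpr

# Toy graph (assumed): a triangle {0, 1, 2} attached to a path 2 - 3 - 4.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
seq = k_hop_sequence(adj, s=0, L=3)
gpr = generalized_pagerank(seq, w=[0.5, 0.3, 0.2])
```

Each hop vector sums to one, so a weight vector that sums to one yields a score vector that is again a probability distribution. Only nodes within L hops of the seed are ever touched, which is the locality property discussed in Section 1.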
2.3 Graph local clustering
A cluster in G is a node set C ⊂ V and its quality is measured by a given criterion. We use the commonly-used conductance criterion in this work.
Definition 3
(Conductance) Let G = (V,E) be an undirected, unweighted graph. The volume of a node set C ⊂ V is defined as \(\text {vol}(C)={\sum }_{u\in C}d(u)\). The edge boundary of a node set C is defined as ∂(C) = {e(u,v)|u ∈ C,v ∉ C}. The conductance of a node set C is defined as:
\[\Phi(C)=\frac{|\partial(C)|}{\min\left\{\text{vol}(C),\,\text{vol}(V\setminus C)\right\}}.\]
We introduce the Nibble two-phase framework, which is the fundamental framework of graph clustering tasks, by formally introducing each phase of it.
Definition 4
(Measure) Given the graph G, seed node s and the measuring metric \({\mathscr{M}}\), we use the \({\mathscr{M}}\) to measure the proximity score of all nodes towards s on graph G and output the measuring score vector \(q={{\mathscr{M}}(G,s)}\).
Definition 5
(Sweep) Given the measuring score vector q and the quality scoring function \(\mathcal {S}\), let c = (v1,...,vn) be an ordered sequence of the nodes such that \(\frac {q(v_{i})}{d(v_{i})}\geq \frac {q(v_{i+1})}{d(v_{i+1})}\). We scan the sequence and take the top-j elements as the candidate set Cj when visiting the j-th element. We use \(\mathcal {S}\) to evaluate the quality of each candidate set in turn and output the set C∗ with the best score, i.e., the smallest conductance in this work, \(\mathcal {S}(C_{*})=\mathcal {S}_{*}\), as the result.
The sweep phase is demonstrated by Algorithm 1.
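For readers who prefer code, the conductance criterion and the sweep of Definition 5 can be sketched as follows. This is a simplified illustration rather than Algorithm 1 itself: it assumes the standard conductance with the minimum of the two volumes in the denominator, restricts the scan to nodes with a positive score, and recomputes the conductance of every prefix from scratch instead of updating it incrementally. The graph and the score vector are made up.

```python
def conductance(adj, cluster):
    """Cut size divided by min(vol(C), vol(V minus C)), unweighted graph."""
    cluster = set(cluster)
    vol_c = sum(len(adj[u]) for u in cluster)
    vol_rest = sum(len(nbrs) for nbrs in adj.values()) - vol_c
    cut = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    denom = min(vol_c, vol_rest)
    return cut / denom if denom > 0 else float("inf")

def sweep(adj, q):
    """Scan nodes by q(v)/d(v), descending; return the best prefix and its score."""
    order = sorted((v for v in q if q[v] > 0),
                   key=lambda v: q[v] / len(adj[v]), reverse=True)
    best_set, best_phi = None, float("inf")
    prefix = []
    for v in order:
        prefix.append(v)                  # candidate set C_j = top-j nodes
        phi = conductance(adj, prefix)
        if phi < best_phi:
            best_set, best_phi = set(prefix), phi
    return best_set, best_phi

# Toy graph (assumed): two triangles joined by the edge 2 - 3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
q = {0: 0.4, 1: 0.25, 2: 0.25, 3: 0.1}    # made-up measuring scores
best_set, best_phi = sweep(adj, q)
```

On this toy input the sweep returns the first triangle, whose single boundary edge over a volume of 7 gives conductance 1/7.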
2.4 Approximate graph propagation
The Approximate Graph Propagation (AGP) [27] framework shows great capacity for handling massive data. We make it a basic module in LearnedNibble for the sake of scalability. AGP takes an undirected graph G, a seed node s, a propagation range level L, a weight sequence w and an error guarantee parameter 𝜖 as input, and outputs an estimated propagation vector that achieves both a theoretical approximation guarantee and near-optimal running time complexity. In our settings, we make the weight vector w an all-ones vector to get the estimated L-hop transition probability sequence \(\hat {\pi }^{L}\) from the AGP process as the input of our LearnedNibble.
2.5 Problem formulation
Taking effectiveness, efficiency and scalability into consideration, we formalize the problem investigated in this work as the End-to-End approximate conductance optimization task described as follows.
Definition 6
(End-to-End Approximate Conductance Optimization) Given the graph G, seed node s, propagation range level L, and error guarantee parameter 𝜖, the estimated L-hop transition probability sequence \(\hat {\pi }^{L}\) with absolute error 𝜖 is obtained from AGP. We follow the Nibble two-phase framework, keeping the sweep phase fixed as a standard cluster proposal process based on the measuring score vector q, and focus on finding the appropriate measuring metric \({\mathscr{M}}\) in Generalized PageRank form, i.e., \(w\times \hat {\pi }^{L}\), to optimize the conductance of the proposed cluster in an End-to-End manner.
3 The framework
This section introduces our LearnedNibble framework, which deals with the challenges mentioned in Section 1.1 and solves the problem defined in Definition 6.
3.1 Input data
The input data is not only the material on which our training process is based but also is the query task for which our model should take responsibility. We use the approximated result output by the AGP under some error guarantee parameter 𝜖 as our input data to make our framework compatible with approximation and scalable on massive graphs.
3.2 Trainable parameters
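We model the \({\mathscr{M}}^{L}_{GPR}\) used in the Measure phase of LearnedNibble as an assembling method of the estimated L-hop transition probability sequence \(\hat {\pi }^{L}\) with trainable weighting parameters \(w=(w_{1},\dots,w_{L})\) as:
\[{\mathscr{M}}^{L}_{GPR}(\hat{\pi}^{L})=w\times\hat{\pi}^{L}=\sum_{k=1}^{L}w_{k}\,\hat{p}^{k},\]
where \(\hat{p}^{k}\) denotes the k-th vector of \(\hat{\pi}^{L}\).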
Therefore, the parameter amount of LearnedNibble is L rather than O(n) [53].
3.3 Supervision manner
As mentioned in Section 1.1, the sweep phase described in Definition 5 works in a grid-search manner and is thus inherently discrete and non-differentiable, which makes it challenging to achieve the desired End-to-End framework. After a careful investigation of the sweep phase, we divide it into three operations, namely the loop, selection and evaluation, which are conducted in turn but coupled with each other. We analyze these operations in the following part to better introduce our intuition and solution.
3.3.1 Loop
The loop operation sequentially visits each element of the measuring score vector and conducts the following selection operation to guarantee the best result among all n cluster candidates. It lets the algorithm avoid combinatorial complexity by reducing the number of checks from the Bell number with parameter n to n, and provides the desired locality. Nevertheless, the brute-force mechanism within the loop operation is inherently discrete, which makes it incompatible with the End-to-End manner. Therefore, two questions arise. 1) How to activate the selection operation to get the candidate node sets? 2) How to guarantee the performance? We give our answers to both questions in the following part.
3.3.2 Selection
Towards the questions above, we present two of our several trials here, one of which finally forms the LearnedNibble.
Sharp-drop modeling
We notice that Andersen and Chung [23] make a powerful statement about the sweep phase: whenever there is a sharp drop in the rank defined by a personalized PageRank vector, the location of the drop reveals a cut with small conductance. Inspired by this observation, we try to model the selection operation with a trainable parameter Δ, which we expect to separate the measuring score vector into two parts corresponding to the cluster and the rest. Unfortunately, this proposal suffers from the absence of the loop operation and from the flexibility of the GPR measure \({\mathscr{M}}^{L}_{GPR}\) in Section 3.2. As a consequence of the former, we cannot keep the learned Δ at a reasonable value, which should lie within the range of the measuring scores, despite diverse training techniques and regularizations. As a consequence of the latter, we lose the connection between the GPR measuring score and the parameter Δ even within two consecutive learning epochs, which makes the two-stage training mechanism fail: it makes no sense to expect the best separation for one score vector to suit another, as the score vectors may vary widely.
Self-supervising
To handle this, we first revisit the Nibble framework to identify the most essential information it captures. Though the performance seems to be related to the measure result and to values such as the sharp drop, we point out that under the Nibble framework the clustering capacity is mainly determined and represented by the order of the measuring score sequence. Once the score of each node is fixed, the clustering result is almost determined. Besides, unlike most other methods, we do not aim to output the final result in a single sweep run, but to explore the appropriate measurement method for the specific task during training. Therefore, we do not need the exact evaluation of the measuring score sequence that the standard sweep performs. We only need to provide some information that guides the measure \({\mathscr{M}}^{L}_{GPR}\) towards better cluster discovery capacity by adaptively adjusting its weight parameters. Thus, we propose the mean-sweep technique, which provides a lower bound on the clustering capacity of the measure \({\mathscr{M}}^{L}_{GPR}\) by separating the score sequence into two parts based on its own mean. We formally introduce the mean-sweep operation with the following definition.
Definition 7
(mean-sweep) Given the measuring score vector gpr produced by the measure \({\mathscr{M}}^{L}_{GPR}\), let \(\mathbf {gpr^{d}}=\frac {1}{d}\mathbf {gpr}\) be the degree-normalized version of gpr. We choose the nodes whose normalized measuring score is above the mean value, i.e.,
\[C=\left\{u\in V\mid \mathbf{gpr^{d}}(u)>\text{mean}\left(\mathbf{gpr^{d}}\right)\right\},\]
to be the cluster result.
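A minimal sketch of the mean-sweep selection follows. It assumes that the mean is taken over all nodes of the graph, with nodes missing from the score vector treated as having score zero; the graph and scores are illustrative.

```python
def mean_sweep(adj, gpr):
    """Select the nodes whose degree-normalised score exceeds the mean."""
    # Assumption: nodes absent from gpr have score 0 and still count in the mean.
    norm = {u: gpr.get(u, 0.0) / len(adj[u]) for u in adj}
    mean = sum(norm.values()) / len(norm)
    return {u for u, x in norm.items() if x > mean}

# Toy graph (assumed): two triangles joined by the edge 2 - 3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
gpr = {0: 0.4, 1: 0.25, 2: 0.25, 3: 0.1}  # made-up measuring scores
cluster = mean_sweep(adj, gpr)
```

Unlike the standard sweep, this selection needs a single pass and no loop over candidate prefixes, at the price of proposing only one candidate set.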
Even though the mean-sweep only provides one clustering result among many possible selections, it is sufficient to guide the training process. We illustrate this statement with the experiment results in Section 4.
Although we have already stepped forward by providing a solution to the selection dilemma within the learning mechanism, the discreteness challenge still exists as the result proposed by mean-sweep is also a node set, which is discrete and blocks the gradient propagation. It leads us to the evaluation problem which is rather trivial in the standard sweep operation.
3.3.3 Evaluation
Following the same principle in the mean-sweep technique, we use the Sigmoid operator, which is widely used in the Machine Learning areas, to make the score above mean close to 1 and make the other close to 0. The activating operation here is not for bringing the system non-linearity but is an approximation of the discrete set selection result. It plays a similar role as the Gumbel-Softmax operation in the GSO [53]. With this approximation, we propose the soft-mean-sweep module, the core element of our LearnedNibble framework.
Definition 8
(soft-mean-sweep) Given the measuring score vector gpr produced by the measure \({\mathscr{M}}^{L}_{GPR}\), let \(\mathbf {gpr^{d}}=\frac {1}{d}\mathbf {gpr}\) be the degree-normalized version of gpr. We normalize \(\mathbf {gpr^{d}}\) with its mean and use the Sigmoid operator to activate it, i.e.,
and make it the approximate clustering result.
Loss
We get the final supervision manner for LearnedNibble by putting everything together. We compute the conductance in Rayleigh quotient form on the result output by soft-mean-sweep with matrix operations, view it as an approximate reflection of the clustering capacity provided by \({\mathscr{M}}^{L}_{GPR}\), and set it as the supervising signal (a.k.a. the loss) of the learning framework, i.e.,
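As an illustration of the supervision signal described in this section, the sketch below combines a soft membership vector with a Rayleigh quotient loss. The concrete forms are assumptions rather than the exact expressions of the framework: the soft membership is taken as Sigmoid(scale · (gpr^d / mean − 1)), where `scale` is a hypothetical sharpness parameter, and the loss is taken as \(x^{\top}(\mathbf{D}-\mathbf{A})x / x^{\top}\mathbf{D}x\), which ignores the minimum over the two volumes used in the conductance. The sketch also assumes a non-zero mean score.

```python
import math

def soft_mean_sweep(adj, gpr, scale=1.0):
    """Soft membership in (0, 1); `scale` is a hypothetical sharpness knob."""
    norm = {u: gpr.get(u, 0.0) / len(adj[u]) for u in adj}
    mean = sum(norm.values()) / len(norm)   # assumed to be non-zero
    return {u: 1.0 / (1.0 + math.exp(-scale * (x / mean - 1.0)))
            for u, x in norm.items()}

def rayleigh_loss(adj, x):
    """Assumed loss x^T (D - A) x / x^T D x for a membership vector x."""
    # x^T (D - A) x equals the sum of (x_u - x_v)^2 over undirected edges;
    # every edge appears twice in the adjacency lists, hence the factor 0.5.
    cut = 0.5 * sum((x[u] - x[v]) ** 2 for u in adj for v in adj[u])
    vol = sum(len(adj[u]) * x[u] ** 2 for u in adj)
    return cut / vol

# Toy graph (assumed): two triangles joined by the edge 2 - 3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
gpr = {0: 0.4, 1: 0.25, 2: 0.25, 3: 0.1}    # made-up measuring scores
soft = soft_mean_sweep(adj, gpr)
loss = rayleigh_loss(adj, soft)
```

For a hard 0/1 indicator of a set whose volume is at most half of the graph, this quotient coincides with the conductance of Definition 3, which is why it can serve as a differentiable surrogate. A larger `scale` pushes the memberships closer to 0 and 1. In an actual training loop the same expressions would be written in an automatic differentiation framework so that gradients reach the weights w.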
3.4 Optimization mechanism
With the supervision manner and loss function in hand, the next step is to use the supervising signal to guide the training process. Several optimizers have proven to be appropriate engines for diverse learning tasks, among which Adam [58] is the most widely used. Though successful in many circumstances, Adam does not work as well as expected in our graph local clustering task. It often misses better solutions and sometimes keeps moving in the wrong direction for a long time. We suppose one reasonable explanation of this weird situation is that the graph clustering task naturally has many local optima close to the best one, which leaves Adam trapped and misled.
Regradient
To mitigate the problem, we propose the regradient technique to make the Adam optimizer focus more on the current step and avoid being affected by the former gradients.
Definition 9
(regradient) With a parameter r that controls the restart frequency, we reset the Adam optimizer every r epochs by clearing its accumulated gradients. Without loss of generality, we fix r to be 10 in this work.
We will present the effectiveness of the regradient technique with an ablation experiment in Appendix 1.
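The regradient technique can be sketched with a small hand-written Adam optimizer whose state is cleared every r epochs. Everything below is an illustrative assumption: the `Adam` class is a textbook implementation, not the one used in the experiments; resetting the step counter together with the moment estimates is one possible reading of "clearing its accumulated gradients"; and the quadratic objective merely stands in for the clustering loss.

```python
import math

class Adam:
    """Textbook Adam on a list of floats (illustrative, not a library API)."""
    def __init__(self, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, b1, b2, eps
        self.reset()

    def reset(self):
        """Clear both moment estimates and the step counter."""
        self.m, self.v, self.t = None, None, 0

    def step(self, w, grad):
        if self.m is None:
            self.m = [0.0] * len(w)
            self.v = [0.0] * len(w)
        self.t += 1
        out = []
        for i, (wi, gi) in enumerate(zip(w, grad)):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * gi
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * gi * gi
            m_hat = self.m[i] / (1 - self.b1 ** self.t)   # bias correction
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            out.append(wi - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return out

def train_with_regradient(w, grad_fn, epochs, r=10, lr=0.1):
    opt = Adam(lr=lr)
    for epoch in range(epochs):
        if epoch > 0 and epoch % r == 0:
            opt.reset()               # regradient: forget earlier gradients
        w = opt.step(w, grad_fn(w))
    return w, opt

# Stand-in objective f(w) = sum((w_i - 1)^2); its gradient is 2 * (w_i - 1).
grad = lambda w: [2.0 * (wi - 1.0) for wi in w]
w_final, opt = train_with_regradient([0.0, 3.0], grad, epochs=25, r=10)
```

After each reset the first update has magnitude close to the learning rate regardless of the gradient scale, so the optimizer reacts to the current gradient direction instead of a long moving average.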
3.5 Inference manner
The last but not least thing is obtaining the model from the training process and using it to do the inference. In our LearnedNibble framework, the model is the measuring method \({\mathscr{M}}^{L}_{GPR}\) with learned GPR weight parameters, and the inference result is the clustering node set.
Most machine learning tasks obtain the final trained model through convergence and the early-stop technique. For our graph local clustering task, it is unnecessary and unfair to require the model to converge, for two reasons. 1) As mentioned in the supervision manner in Section 3.3, we use a lower bound of the clustering capacity to guide the training process, so there may be a gap between the performance reported by the loss and the actual ability of the model. 2) Though we aim to avoid searching the massive number of possible cases, we still share the same solution space as the original combinatorial optimization problem. Thus, we propose the search-select manner to obtain the model from LearnedNibble.
Definition 10
(search-select) Given the L-hop transition probability sequence π and training process of the LearnedNibble with T epochs, i.e., \(\mathcal {R}^{T}=\left \{{\mathscr{M}}_{1},...,{\mathscr{M}}_{T}\right \}\), where \({\mathscr{M}}_{i}\) is the measuring method with trained weight vector wi, i.e., \({\mathscr{M}}_{i}(\pi )=w_{i}\times \pi\). We compute the exact clustering capacity Φi of \({\mathscr{M}}_{i}\) by conducting the standard sweep operation on the measure result of each \({\mathscr{M}}_{i}\) as described in Algorithm 1. We select the \({\mathscr{M}}_{*}\) with the best Φ∗ as the final model. We use the \({\mathscr{M}}_{*}\) obtained from the training process \(\mathcal {R}^{T}\) on graph G to answer the query of any seed node s ∈ G.
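The search-select step of Definition 10 can be sketched as follows: every weight vector recorded during training is scored with the exact sweep, and the one with the smallest conductance is kept. The helper names, the two hand-written "checkpoints" and the toy graph are assumptions for illustration; the sweep is the simplified variant that scans only nodes with a positive score.

```python
def conductance(adj, cluster):
    cluster = set(cluster)
    vol_c = sum(len(adj[u]) for u in cluster)
    vol_rest = sum(len(nbrs) for nbrs in adj.values()) - vol_c
    cut = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    denom = min(vol_c, vol_rest)
    return cut / denom if denom > 0 else float("inf")

def sweep_conductance(adj, q):
    """Smallest conductance over the prefixes of the q(v)/d(v) ordering."""
    order = sorted((v for v in q if q[v] > 0),
                   key=lambda v: q[v] / len(adj[v]), reverse=True)
    best, prefix = float("inf"), []
    for v in order:
        prefix.append(v)
        best = min(best, conductance(adj, prefix))
    return best

def gpr(seq, w):
    """Weighted sum of hop vectors; w[k-1] weights the k-th hop."""
    out = {}
    for wk, pk in zip(w, seq):
        for u, prob in pk.items():
            out[u] = out.get(u, 0.0) + wk * prob
    return out

def search_select(adj, seq, checkpoints):
    """Return (index, conductance) of the checkpoint with the best sweep score."""
    scores = [sweep_conductance(adj, gpr(seq, w)) for w in checkpoints]
    best_i = min(range(len(scores)), key=scores.__getitem__)
    return best_i, scores[best_i]

# Toy graph (assumed): two triangles joined by the edge 2 - 3; seed node 0.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
# Exact 1-hop and 2-hop transition probabilities from node 0.
seq = [{1: 0.5, 2: 0.5}, {0: 5 / 12, 1: 1 / 6, 2: 1 / 4, 3: 1 / 6}]
checkpoints = [[1.0, 0.0], [0.0, 1.0]]    # stand-ins for w_1, ..., w_T
best_i, best_phi = search_select(adj, seq, checkpoints)
```

Here the second checkpoint wins because its scores cover the whole first triangle, so the sweep can recover the cut of conductance 1/7, while the first checkpoint never scores the seed node and stops at 3/5.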
3.6 Framework overview
We present our LearnedNibble framework in this part with Algorithm 2 and Algorithm 3. The Initialization module which has not been mentioned is described in Appendix 1.
4 Experiments
This section evaluates the performance of our LearnedNibble in three aspects: 1) the clustering capacity concerning conductance optimization, 2) the generalization ability from training seed nodes to the whole graph, and 3) the compatibility with approximation. We report the key results here and discuss additional results in Appendix 1.
Datasets
We conduct our experiments on commonly used benchmark graphs with ground-truth labels: DBLP, Amazon, PubMed, CiteSeer and Cora. The statistics of the datasets are listed in Table 4 in Appendix 1.
Metric
We use conductance as both the evaluation metric and the optimization target of our framework, since information recovery metrics could conflict with our optimization purpose. We investigate the conductance of the ground-truth clusters in Figure 2 in Appendix 1.
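For reference, the metric can be computed as in the sketch below. It assumes the standard definition of conductance on an undirected, unweighted graph (cut size divided by the smaller of the two volumes); the function name `conductance` and the adjacency-dictionary input are illustrative choices, not the paper's code.

```python
def conductance(adj, cluster):
    """Conductance of `cluster` in an undirected graph.

    adj maps each node to the list of its neighbours.
    """
    cluster = set(cluster)
    vol_s = sum(len(adj[u]) for u in cluster)            # sum of degrees in S
    vol_total = sum(len(nbrs) for nbrs in adj.values())  # sum of all degrees
    cut = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    denom = min(vol_s, vol_total - vol_s)
    # Degenerate sets (empty or the whole graph) get the worst value.
    return cut / denom if denom > 0 else 1.0


# Two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
phi_triangle = conductance(adj, {0, 1, 2})
phi_single = conductance(adj, {0})
print(phi_triangle, phi_single)
```

The triangle {0, 1, 2} has one cut edge and volume 7, so its conductance is 1/7, whereas the single node {0} has conductance 1 because both of its edges leave the set.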
Competitors
Our first group of competitors consists of the existing GPR instances with specific weighting paradigms, namely PPR, HKPR and IPR. Another competitor is the MEAN weighting operation, since the result produced by any method should be better than this trivial one. The last competitor is the recent GSO [53], chosen for its applicability to all graph optimization tasks. The considerations and the comparison methods are presented in Appendix 1.
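To make the fixed weighting paradigms concrete, the sketch below builds hop-weight vectors of length L + 1 for PPR, HKPR and MEAN. It assumes the commonly used closed forms (geometric weights with teleport probability α for PPR, Poisson weights with temperature h for HKPR); the paper's exact parameterization may differ, IPR is left out, and the function names are hypothetical.

```python
import math


def ppr_weights(alpha, hops):
    """Truncated PPR weights: w_k = alpha * (1 - alpha)^k for k = 0..hops."""
    return [alpha * (1 - alpha) ** k for k in range(hops + 1)]


def hkpr_weights(h, hops):
    """Truncated heat-kernel weights: w_k = exp(-h) * h^k / k!."""
    return [math.exp(-h) * h ** k / math.factorial(k) for k in range(hops + 1)]


def mean_weights(hops):
    """Uniform weights over all hops (the trivial MEAN baseline)."""
    return [1.0 / (hops + 1)] * (hops + 1)


L = 50
w_ppr = ppr_weights(0.1, L)
w_hkpr = hkpr_weights(5.0, L)
w_mean = mean_weights(L)
print(sum(w_ppr), sum(w_hkpr), sum(w_mean))
```

Because the series are truncated at L hops, the PPR and HKPR weights sum to slightly less than 1; a learned weight vector replaces these fixed shapes with free parameters.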
4.1 Training settings
Training data
We randomly select 5 seed nodes from each graph, each from a different cluster whose size is larger than 100, to form the training seed node set. We set the propagation range L = 50. We vary the approximation parameter 𝜖 in \(\{0, 10^{-4}, 10^{-5}, 10^{-6}\}\); 𝜖 = 0 means that we use the exact L-hop transition probability sequence rooted at the seed node.
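The exact case 𝜖 = 0 can be illustrated with the following sketch, which assumes that the L-hop transition probability sequence is the sequence of random-walk distributions after 0, 1, ..., L steps from the seed on an undirected, unweighted graph. The function name `transition_sequence` is hypothetical, and the approximate (𝜖 > 0) propagation is not shown.

```python
def transition_sequence(adj, seed, hops):
    """Return [pi_0, ..., pi_hops], where pi_k[v] is the probability that a
    k-step random walk started at `seed` ends at node v."""
    pi = [{seed: 1.0}]
    for _ in range(hops):
        nxt = {}
        for u, p in pi[-1].items():
            share = p / len(adj[u])  # mass is split evenly among neighbours
            for v in adj[u]:
                nxt[v] = nxt.get(v, 0.0) + share
        pi.append(nxt)
    return pi


# Two triangles joined by the edge 2-3, seed node 0.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
pi = transition_sequence(adj, seed=0, hops=2)
print(pi[2])
```

Each distribution sums to 1, and only nodes within k hops of the seed carry mass at step k, which is what keeps the computation local for small L.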
Initialization method
We use the RAW weighting vector, a one-hot vector whose only non-zero entry is at the seed node index, as the initial weight for the training process, since the other initialization methods are our competitors. The analysis of the initialization sensitivity is presented in Appendix 1.
Training method
We set the training budget T = 2,000, the regradient step e = 10 and the learning rate lr = 1.
4.2 Clustering capacity
This section investigates the clustering capacity of LearnedNibble with no approximation. The results are presented in Table 2. Surprisingly, the trivial MEAN beats all GPR instances on all datasets. Moreover, LearnedNibble performs much better than all competitors. GSO seems to learn nothing from the training process, so we omit it from the following comparisons. The specific settings and detailed results are presented in Appendix 1.
4.3 Generalization ability
We present the generalization ability of LearnedNibble with no approximation by reporting basic statistics of the conductance values on the test samples. The results are listed in Table 3. First, the final model obtained from LearnedNibble is useful, as it achieves even better performance when transferred to other nodes, whether within the cluster or across the whole graph (see the last column of Table 3). Second, the mean and std. columns indicate that the model achieves satisfying performance, finding a cluster with relatively small conductance with strong confidence. We discuss some other interesting observations in Section 5.
4.4 Approximation compatibility
This section presents the approximation compatibility of LearnedNibble under four different approximation levels. Figure 1 summarizes the results. For convenience, we report 4 of the 10 results (5 datasets with 2 generalization settings each), where ∗ marks the in-cluster generalization results; the other results are presented in Appendix 1. The x-axis is the approximation level given by the parameter 𝜖, and the y-axis is the conductance value. The thick horizontal line shows the training results, and the boxplot shows the transferring results. Although the clustering capacity and the generalization ability of LearnedNibble weaken as the approximation level increases, the results remain satisfying most of the time, since the conductance values are small and their variance is well bounded.
5 Discussions
Although the experimental results in Section 4.3 and Appendix 1 provide positive evidence for sharing one clustering method across all nodes of the same graph, some odd but interesting phenomena caught our attention. 1) The in-cluster generalization performance is worse than the in-graph one on PubMed and Cora. 2) The performance gap between the in-cluster and in-graph settings is large on DBLP and Amazon. 3) In nearly all situations, we obtain better results for some nodes that were never optimized during training. These observations lead us to consider the basis of the generalization and the information attached to the graph.
Topology consistency
The most critical assumption behind transferring one model to other situations is consistency. For graph clustering tasks that optimize a topology-based metric such as conductance, the topology consistency of different parts of the graph should be evaluated and checked first, which is itself an interesting question.
Data representativeness
As described in Section 4.1, we use only 5 randomly chosen nodes as our training data, which may not be representative of the graph. Selecting suitable nodes as training data may therefore be a fundamental problem.
Information-topology compatibility
Recall that we pointed out the conflict between topology-based metrics and information-based metrics in Section 1 and Appendix 1. The information attached to the graph, as labels or other context on nodes or edges, may be different from and independent of the topology. As a result, generalization within the so-called same cluster could be meaningless and even more challenging. Taking advantage of both the topology and the information of a graph is one of the most crucial but challenging problems in the graph mining area.
6 Conclusions
In this paper, we study the graph local clustering task in depth and propose a novel learning-based framework, LearnedNibble, by solving a series of non-trivial challenges. To the best of our knowledge, LearnedNibble is the first framework to take responsibility for the cluster quality and to take both effectiveness and efficiency into consideration in an End-to-End paradigm with a self-supervised manner. Our experiments demonstrate that the clustering capacity of the L-hop transition probability sequence is underestimated when it is assembled only with fixed weighting structures and parameters, and that LearnedNibble can exploit it better. Besides the improvements in cluster quality, our framework shows strong generalization ability and approximation compatibility, which makes it practical in many situations.
Data availability
The graph datasets that support the findings of this study are available from the SNAP project, https://snap.stanford.edu/data/index.html.
References
Girvan, M., Newman, M.E.: Community structure in social and biological networks. Proc. Nat. Acad. Sci. 99(12), 7821–7826 (2002)
Wasserman, S., Faust, K., et al.: Social network analysis: Methods and applications (1994)
Boccaletti, S., Latora, V., Moreno, Y., Chavez, M., Hwang, D.-U.: Complex networks: Structure and dynamics. Phys. Rep. 424(4–5), 175–308 (2006). https://doi.org/10.1016/j.physrep.2005.10.009
Lu, Z., Wahlström, J., Nehorai, A.: Community detection in complex networks via clique conductance. Sci. Rep. 8(1), 1–16 (2018)
Wang, M., Wang, C., Yu, J.X., Zhang, J.: Community detection in social networks: an in-depth benchmarking study with a procedure-oriented framework. Proc. VLDB Endow. 8(10), 998–1009 (2015)
Fortunato, S.: Community detection in graphs. Phys. Rep. 486(3–5), 75–174 (2010)
Leskovec, J., Lang, K.J., Mahoney, M.: Empirical comparison of algorithms for network community detection. In: Proceedings of the 19th International Conference on World Wide Web, pp. 631–640 (2010)
Yi, F., Moon, I.: Image segmentation: A survey of graph-cut methods. In: 2012 International Conference on Systems and Informatics (ICSAI2012), pp. 1936–1941. IEEE (2012)
Vicente, S., Kolmogorov, V., Rother, C.: Graph cut based image segmentation with connectivity priors. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2008)
Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. Int. J. Comput. Vis. 59(2), 167–181 (2004)
Tolliver, D.A., Miller, G.L.: Graph partitioning by spectral rounding: Applications in image segmentation and clustering. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), vol. 1, pp. 1053–1060. IEEE (2006)
Liao, C. -S., Lu, K., Baym, M., Singh, R., Berger, B.: Isorankn: Spectral methods for global alignment of multiple protein networks. Bioinformatics 25(12), 253–258 (2009)
Voevodski, K., Teng, S. -H., Xia, Y.: Finding local communities in protein networks. BMC Bioinform. 10(1), 1–14 (2009)
Zhou, S., Yang, X., Chang, Q.: Spatial clustering analysis of green economy based on knowledge graph. Journal of Intelligent & Fuzzy Systems (Preprint), 1–10 (2021)
Foysal, K.H., Chang, H.J., Bruess, F., Chong, J.W.: Smartfit: Smartphone application for garment fit detection. Electronics 10(1), 97 (2021)
Zhu, D., Shen, G., Chen, J., Zhou, W., Kong, X.: A higher-order motif-based spatiotemporal graph imputation approach for transportation networks. Wirel. Commun. Mob. Comput., 2022 (2022)
Spielman, D.A., Teng, S. -H.: Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In: Proceedings of the Thirty-sixth Annual ACM Symposium on Theory of Computing, pp. 81–90 (2004)
Andersen, R., Chung, F., Lang, K.: Local graph partitioning using pagerank vectors. In: 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), pp. 475–486. IEEE (2006)
Andersen, R., Peres, Y.: Finding sparse cuts locally using evolving sets. In: Proceedings of the Forty-first Annual ACM Symposium on Theory of Computing, pp. 235–244 (2009)
Spielman, D.A., Teng, S. -H.: A local clustering algorithm for massive graphs and its application to nearly linear time graph partitioning. SIAM J. Comput. 42(1), 1–26 (2013)
Lovász, L., Simonovits, M.: The mixing rate of markov chains, an isoperimetric inequality, and computing the volume. In: Proceedings [1990] 31st Annual Symposium on Foundations of Computer Science, pp. 346–354. IEEE (1990)
Lovász, L., Simonovits, M.: Random walks in a convex body and an improved volume algorithm. Random Struct. Algor. 4(4), 359–412 (1993)
Andersen, R., Chung, F.: Detecting sharp drops in pagerank and a simplified local partitioning algorithm. In: International Conference on Theory and Applications of Models of Computation, pp. 1–12. Springer (2007)
Chung, F.: The heat kernel as the pagerank of a graph. Proc. Natl. Acad. Sci. 104(50), 19735–19740 (2007)
Kloster, K., Gleich, D.F.: Heat kernel based community detection. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1386–1395 (2014)
Li, P., Chien, I., Milenkovic, O.: Optimizing generalized pagerank methods for seed-expansion community detection. Adv. Neural Inf. Process. Syst., 32 (2019)
Wang, H., He, M., Wei, Z., Wang, S., Yuan, Y., Du, X., Wen, J.-R.: Approximate graph propagation. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 1686–1696 (2021)
Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: Bringing order to the Web. Stanford InfoLab, Technical report (1999)
Chung, F., Simpson, O.: Solving linear systems with boundary conditions using heat kernel pagerank. In: International Workshop on Algorithms and Models for the Web-Graph, pp. 203–219. Springer (2013)
Yang, R., Xiao, X., Wei, Z., Bhowmick, S.S., Zhao, J., Li, R. -H.: Efficient estimation of heat kernel pagerank for local clustering. In: Proceedings of the 2019 International Conference on Management of Data, pp. 1339–1356 (2019)
Flake, G.W., Lawrence, S., Giles, C.L.: Efficient identification of web communities. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 150–160 (2000)
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D.: Defining and identifying communities in networks. Proc. Nat. Acad. Sci. 101(9), 2658–2663 (2004)
Newman, M.E.: Modularity and community structure in networks. Proc. Nat. Acad. Sci. 103(23), 8577–8582 (2006)
Kobourov, S.G., Pupyrev, S., Simonetto, P.: Visualizing graphs as maps with contiguous regions. In: EuroVis (Short Papers) (2014)
Cheeger, J.: A lower bound for the smallest eigenvalue of the Laplacian. Probl. Anal. 625(195-199), 110 (1970)
Cox, I.J., Rao, S.B., Zhong, Y.: “ratio regions”: a technique for image segmentation. In: Proceedings of 13th International Conference on Pattern Recognition, vol. 2, pp. 557–564. IEEE (1996)
Sharon, E., Galun, M., Sharon, D., Basri, R., Brandt, A.: Hierarchy and adaptivity in segmenting visual scenes. Nature 442(7104), 810–813 (2006)
Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. Knowl. Inf. Syst. 42(1), 181–213 (2015)
Benson, A.R., Gleich, D.F., Leskovec, J.: Higher-order organization of complex networks. Science 353(6295), 163–166 (2016)
Tsourakakis, C.E., Pachocki, J., Mitzenmacher, M.: Scalable motif-aware graph clustering. In: Proceedings of the 26th International Conference on World Wide Web, pp. 1451–1460 (2017)
Yin, H., Benson, A.R., Leskovec, J., Gleich, D.F.: Local higher-order graph clustering. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 555–564 (2017)
Ma, W., Cai, L., He, T., Chen, L., Cao, Z., Li, R.: Local expansion and optimization for higher-order graph clustering. IEEE Internet Things J. 6(5), 8702–8713 (2019)
Huang, S., Li, Y., Bao, Z., Li, Z.: Towards efficient motif-based graph partitioning: An adaptive sampling approach. In: 2021 IEEE 37th International Conference on Data Engineering (ICDE), pp. 528–539. IEEE (2021)
Zhou, D., Zhang, S., Yildirim, M.Y., Alcorn, S., Tong, H., Davulcu, H., He, J.: High-order structure exploration on massive graphs: A local graph clustering perspective. ACM Trans. Knowl. Discov. Data (TKDD) 15(2), 1–26 (2021)
Chhabra, A., Faraj, M.F., Schulz, C.: Local motif clustering via (hyper) graph partitioning. arXiv:2205.06176 (2022)
Emmons, S., Kobourov, S., Gallant, M., Börner, K.: Analysis of network clustering algorithms and cluster quality metrics at scale. PloS one 11(7), 0159161 (2016)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948)
Meilă, M.: Comparing clusterings—an information based distance. J. Multivar. Anal. 98(5), 873–895 (2007)
Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. J. Mach. Learn. Res. 11, 2837–2854 (2010)
Avron, H., Horesh, L.: Community detection using time-dependent personalized pagerank. In: International Conference on Machine Learning, pp. 1795–1803. PMLR (2015)
Kloumann, I.M., Ugander, J., Kleinberg, J.: Block models and personalized pagerank. Proc. Natl. Acad. Sci. 114(1), 33–38 (2017)
Li, Y., Liu, J., Lin, G., Hou, Y., Mou, M., Zhang, J.: Gumbel-softmax-based optimization: a simple general framework for optimization problems on graphs. Comput. Soc. Netw. 8(1), 1–16 (2021)
Holland, P.W., Laskey, K.B., Leinhardt, S.: Stochastic blockmodels: First steps. Soc. Netw. 5(2), 109–137 (1983)
Weiss, P.: L’hypothèse du champ moléculaire et la propriété ferromagnétique. J. Phys. Theor. Appl. 6(1), 661–690 (1907)
Klicpera, J., Weißenberger, S., Günnemann, S.: Diffusion improves graph learning. Advances in Neural Information Processing Systems, 32 (2019)
Berberidis, D., Nikolakopoulos, A.N., Giannakis, G.B.: Adaptive diffusions for scalable learning over graphs. IEEE Trans. Signal Process. 67(5), 1307–1321 (2018)
Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv:1412.6980 (2014)
Leskovec, J., Sosič, R.: Snap: A general-purpose network analysis and graph-mining library. ACM Trans. Intell. Syst. Technol. (TIST) 8(1), 1–20 (2016)
Getoor, L.: Link-based classification. In: Advanced Methods for Knowledge Discovery from Complex Data, pp. 189–207. Springer (2005)
Namata, G., London, B., Getoor, L., Huang, B.: Query-driven active surveying for collective classification. In: 10th International Workshop on Mining and Learning with Graphs, vol. 8, p. 1 (2012)
Acknowledgements
The author would like to thank Wang Hanzhi and Zhang Ruoqi for their selfless and solid technical support. This work is partially supported by the Fundamental Research Funds for the Central Universities (No.2020JS005).
Funding
This work is partially supported by the Fundamental Research Funds for the Central Universities (No.2020JS005).
Author information
Contributions
Yuan Zhe devised the methods and framework, wrote the whole manuscript text and prepared all materials.
Ethics declarations
Human and Animal Ethics
Not applicable.
Ethics approval and consent to participate
Not applicable.
Consent for Publication
Not applicable.
Competing interests
The author has no relevant financial or non-financial interests to disclose.
Appendix A: additional experiments
Data sources
We obtain DBLP and Amazon from the Stanford Network Analysis Project (SNAP) [59], and the rest from their original works [60, 61]. We present the basic information of the datasets used in our experiments in Table 4, and show the conductance of the ground-truth clusters in Figure 2. The conductance of the labeled clusters is rather large, which is likely to make the information-based metrics conflict with the structure-based metrics, as we note in the following part.
Competitor considerations
Since the effectiveness challenge has not been studied much and little work targets the conductance metric as we do, there may be no specific research result or algorithm that serves as a direct competitor of LearnedNibble. Moreover, the work presented here does not aim to beat any baseline; it aims to reveal the capacity of the GPR measure family and to explore the possibility and the method of realizing it, while being compatible with the mainstream approximate algorithms.
Comparisons
For GPR instances, we evaluate each by grid-searching its parameters with 2,000 trials, which is also the training budget of LearnedNibble, and take the best performance as its clustering capacity. Specifically, we set α ∈ [0, 1, 0.0005] for PPR, h ∈ [1, 20, 0.01] for HKPR, and 𝜃 ∈ [0, 1, 0.005] for IPR, where we vary the power of 𝜃 that determines ϕ in [1, 5, 20, 50, 100]. For MEAN, we directly compute its exact conductance by the standard sweep operation. For GSO, we set a training budget of 200,000, since it has many more parameters to train.
A.1 Training details
We give LearnedNibble full access to the graph adjacency matrix in the training phase but keep the algorithm local in the inference phase, like other computing-based graph local clustering algorithms. The reason we do not make the algorithm thoroughly local is twofold. 1) We should use the whole graph data, since the topology is an integrated whole and should not be sampled like data points in Euclidean space. 2) We expect the framework to generalize well to the whole graph, which is the crucial property we rely on to develop the scalability and practicality of LearnedNibble; making the training phase local as well would seem odd and may conflict with this purpose.
For the trainable weighting parameters, we normalize the weight vector w to unit one-norm, \(\|w\|_{1} = 1\), in the inference phase but keep it free in the training phase for the sake of numerical stability.
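The inference-time normalization amounts to a single rescaling, sketched below. The helper name `l1_normalize` is hypothetical, and the handling of an all-zero vector is an assumption, since the paper does not discuss that case.

```python
def l1_normalize(w):
    """Rescale the weight vector so that the sum of absolute values is 1."""
    norm = sum(abs(x) for x in w)
    if norm == 0:
        # Assumption: an all-zero weight vector is treated as an error.
        raise ValueError("cannot normalize an all-zero weight vector")
    return [x / norm for x in w]


w_trained = [2.0, -1.0, 1.0]      # illustrative unnormalized weights
w_infer = l1_normalize(w_trained)
print(w_infer)
```

Dividing by a positive constant does not change the ranking of node scores, so a sweep over the normalized scores visits the nodes in the same order as before the rescaling.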
A.2 Clustering capacity details
Comparisons
We report the average conductance of the 5 training seed nodes with the final model on each dataset in Table 2. The first 4 columns are the GPR family instances and the trivial MEAN pooling operation. The GSO column represents the GSO [53] framework. The last column, titled GPR, is our LearnedNibble framework.
Results with approximation in detail
We report the results on the different datasets in turn and list them in Table 5.
A.3 Generalization ability details
Comparisons
For a clearer view, we report the generalization ability of LearnedNibble, together with the competitors, in two settings. 1) In-Cluster: we run inference on nodes randomly selected within the same cluster as the training seed nodes; this is represented by the c columns in Table 3. 2) In-Graph: we run inference on nodes randomly selected from the whole graph; this is represented by the g columns in Table 3. We report the average conductance of the 50 testing nodes with the final model on each dataset.
Results with approximation
Figure 3 reports the results on the different datasets in turn, in both the in-cluster and in-graph situations, that are not shown in Section 4.
A.4 Parameter sensitivity
Initialization comparisons
We test the sensitivity to initialization by training LearnedNibble from different starting weights. Specifically, we use the PPR weighting vector with teleport constant α = 0.1, and the IPR weighting vector with 𝜃 = 0.99 and \(\phi = 0.99^{10}\). The comparison results on the different datasets are listed in Table 6. Training with different initialization methods achieves similar but slightly different performance. The trivial MEAN and RAW initializations perform slightly better, and IPR, with its theoretical advantage, also performs well in some cases.
Regradient and locality regularization
We investigate the regradient technique proposed in this work through ablation experiments. At the same time, we test the popular locality regularization term used in Graph Neural Networks (GNN), which keeps the information diffusion local by minimizing the 2-norm of the difference between the graph signal after propagation and the initial signal, which is the one-hot vector of the seed node in our situation, i.e., \(\|gpr - \overrightarrow{1}_{s}\|_{2}\). The results under the exact setting with 𝜖 = 0 are presented in Table 7. The settings with regradient (R = 1) perform better than their counterparts without it (R = 0), and the setting R = 1, L = 0, i.e., with the regradient technique and without the commonly used locality regularization, achieves the best performance in all situations.
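The locality regularization term tested in this ablation can be written as the short sketch below. It assumes the propagated signal is stored as a sparse dictionary of node scores and uses the plain (unsquared) 2-norm; the function name `locality_penalty` is hypothetical, and how the term is weighted inside the training loss is not shown.

```python
import math


def locality_penalty(gpr, seed):
    """2-norm of (gpr - one_hot(seed)) for a sparse score dictionary."""
    nodes = set(gpr) | {seed}
    return math.sqrt(sum(
        (gpr.get(v, 0.0) - (1.0 if v == seed else 0.0)) ** 2 for v in nodes
    ))


spread = {0: 0.5, 1: 0.25, 2: 0.25}   # signal diffused away from seed 0
stay = {0: 1.0}                       # signal that never left the seed
penalty_spread = locality_penalty(spread, seed=0)
penalty_stay = locality_penalty(stay, seed=0)
print(penalty_spread, penalty_stay)
```

The penalty is zero only when all mass stays on the seed, so minimizing it pulls the learned weights toward the early hops, which may explain why it competes with conductance optimization in the ablation.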
Yuan, Z. Self-supervised end-to-end graph local clustering. World Wide Web 26, 1157–1179 (2023). https://doi.org/10.1007/s11280-022-01081-8
DOI: https://doi.org/10.1007/s11280-022-01081-8