1 Introduction

Graphs are a powerful framework for modeling the complex relations and interactions of our world [1,2,3,4], and methods for analyzing them have become fundamental. Graph clustering is an essential technique for a wide range of applications, including community detection [5,6,7], image segmentation [8,9,10,11], protein grouping [12, 13], and especially spatial-temporal systems [14,15,16]. It has therefore drawn increasing attention in recent years, both in computer science and in many applied research areas. However, graph clustering faces efficiency and effectiveness challenges from both practical and theoretical perspectives.

Locality

One fundamental consensus regarding the efficiency challenge is that graph clustering methods should be local w.r.t. a given seed node [17,18,19]. Locality has two aspects, process and result, which are usually coupled because they are implicitly consistent. Locality of the process means that the algorithm should only access data in the neighborhood of the given seed node. Locality of the result means that the algorithm should output a node set lying in a small region around the seed node.

Framework

Spielman and Teng [17, 20] were the first to study the graph local clustering problem. Building on the analysis of Lovász and Simonovits [21, 22], they propose a two-phase framework, Nibble, that guarantees the locality and performance of the clustering algorithm. We introduce the Nibble framework in detail in Section 2.3 and only sketch it here. In the first phase, a power series \(\left\{\mathbf P^k{\vec1}_s\right\}\), which represents the k-step transition probabilities from seed node s to the other nodes, is calculated. In the second phase, the standard sweep operation is conducted to output the node set with the first or the global minimum conductance as the clustering result. The effectiveness of Nibble is guaranteed by theoretical bounds on cluster quality [18, 23] and illustrated by empirical evaluations on various real networks [24,25,26,27].

Efficiency

Nibble has been improved in two main respects, the design of the measuring metric and the computation of the measure, while the sweep process has been left as a standard, static module. Andersen, Chung, and Lang [18] propose the algorithm PRNibble, which assembles the k-hop transition probabilities with weights of the form \(\alpha (1-\alpha )^{k}\) determined by the teleport constant α; the resulting measure is the widely used node proximity metric Personalized PageRank (PPR) [28]. PRNibble [18] also provides an efficient local operation, PR-Push, to compute it, and achieves better efficiency and effectiveness guarantees. Chung [24] extends PRNibble with a physics-inspired metric called Heat Kernel PageRank (HKPR), whose weights have the form \(\frac {e^{-t}t^{k}}{k!}\), where the parameter t is the temperature constant controlling the shape of the distribution. With an improved theoretical bound, Chung and Simpson [29] propose a randomized algorithm, ApproxHK, which samples random walks weighted by the heat kernel coefficients to approximate HKPR, and a sub-linear solution, ClusterHKPR, for the graph local clustering problem. To speed up the HKPR computation, Kloster and Gleich [25] replace the heavy, randomized Monte-Carlo process of ApproxHK by casting the computation as the solution of a linear system. They propose an efficient deterministic algorithm, HK-Relax, which solves this system with the coordinate relaxation technique and yields a faster and better algorithm for graph local clustering. Recently, Yang et al. [30] point out that the absolute error bound used in HK-Relax is not the best choice for graph local clustering, because the sweep process operates on the degree-normalized measure. Based on this observation, they propose the algorithm TEA, which approximates the HKPR vector of the seed node with a relative error bound and achieves a better efficiency-effectiveness trade-off. Wang et al. [27] optimize the Push operation used in several of the works above by introducing randomization, and propose an efficient graph propagation framework, AGP. AGP can simulate any weighted message passing scheme and achieves state-of-the-art performance on graph local clustering with the HKPR weights.

Evaluation criterion

Though the techniques mentioned above achieve better efficiency and a better effectiveness-efficiency trade-off, it remains difficult to evaluate their effectiveness, for two main reasons: 1) the algorithms are not developed to optimize effectiveness, and 2) the effectiveness criterion and metric are not well defined. We first discuss the latter problem by introducing several metrics of cluster quality. Girvan and Newman [1] bring the concept of a cluster into graph research to denote node sets that are organized into groups densely linked internally but loosely connected externally, which are also known as communities in network science [3]. To capture this intuition, several scoring functions defined on the graph structure have been proposed [6, 31,32,33,34,35], among which modularity and conductance are the two essential metrics. The modularity metric [34] evaluates the difference between the sub-graph induced by the cluster nodes and a random graph with the same statistical properties. The conductance metric [36,37,38] directly describes the original concept of a cluster in Rayleigh quotient form, and is formalized in Definition 3. Yang and Leskovec [39] compare a series of existing metrics on 230 real-world graphs with ground-truth cluster labels, using well-motivated criteria of goodness and robustness. They point out that conductance achieves the best performance among the structural definitions of graph clusters. Since high-order structures have advantages in revealing real communities [40, 41], motif conductance has recently been proposed and studied in a series of works [42,43,44,45,46].

Besides the cluster quality metrics based on graph topology, Emmons and Kobourov [47] propose the concept of information recovery metrics based on the Shannon entropy [48] defined on the ground-truth label of the graph cluster, including Adjusted Rand Index (ARI) [49] and Normalized Mutual Information (NMI) [50].

Optimization purpose

In practice, works on graph local clustering typically take conductance and F1-Score (or ARI, NMI) as the criteria and check whether both can be improved. However, they generally improve only one of them, usually the information recovery metric [25, 26, 51], which makes the results less convincing. Another kind of performance criterion, adopted by mainstream research [27, 30], is the trade-off between efficiency and effectiveness, which compares the cost needed to reach the same effectiveness, or the score reached at the same cost. However, even though effectiveness improves as the algorithm runs, we do not know what level of effectiveness, e.g., conductance or F1-Score, the algorithm ought to achieve, nor when to stop it. The fundamental works on graph local clustering [18, 20, 29] suffer from a similar problem. Theoretical bounds of the form \(O(\sqrt {\Phi })\) may be less meaningful for a specific application whose purpose is to find the cluster with the best score rather than one with some bound guarantee.

Effectiveness

To resolve this evaluation dilemma and bring effectiveness into clustering algorithms, a series of works [26, 51,52,53] focus on the design of the measuring metric in the first phase of Nibble and make progress in both theory and applications. Kloumann, Ugander, and Kleinberg [52] regard the power series of transition probabilities as node features relevant to the cluster, and treat the assembling weights as a linear classifier that uses these features to separate the Generalized PageRank (GPR) space. They point out that PPR with a proper choice of the teleport constant α corresponds to the optimal classifier under the Stochastic Block Model (SBM) [54] with the mean-field assumption [55]. Li, Chien, and Milenkovic [26] generalize this result by relaxing the mean-field assumption and analyzing the convergence of the transition probabilities to their mean-field values. They propose a new measure with weights of the form \(\frac {\theta ^{k}}{(\theta ^{k}+\phi )^{2}}\), called Inversed PageRank (IPR), whose weights decay more slowly than those of PPR. Another inspiring work on effectiveness is the Time-Dependent Personalized PageRank (TDPR) of Avron and Horesh [51]. Besides the new GPR structure, they also propose a new quality metric that evaluates the effectiveness of algorithms based on the differences between the results produced by different methods. They show that the proposed TDPR measure behaves differently from the popular PPR and HKPR and can complement the existing measures.

1.1 Motivations and challenges

Though a series of measuring metrics have been proposed to achieve better effectiveness, selecting the parameters of a given metric remains challenging; examples are the teleport parameter α in PPR, the temperature parameter t in HKPR, and the decaying parameters 𝜃, ϕ in IPR. Each work can choose the best parameters for its own purpose, but there is no principled way to tune them for a better result, so the literature usually reuses the parameters of some original work to keep comparisons fair. Yang et al. [30] discuss the parameter choice for their TEA algorithm and stress the importance of choosing appropriate parameters for a specific graph task. Klicpera, Weißenberger, and Günnemann [56] try to learn the best GPR weighting parameters for the graph local clustering task with the adaptive diffusion paradigm [57], which is designed for link-prediction tasks. However, they find that it performs worse than the specific PPR and HKPR settings used in most other works. Li et al. [53] set up an End-to-End learning framework, Gumbel-Softmax-based Optimization (GSO), to solve optimization problems on graphs with the help of the Gumbel-Softmax technique, which provides approximate gradients through sampling operations. Though the framework is designed for all graph optimization problems, GSO does not scale to massive graphs, since the supervision signals in graph learning problems are always sparse and GSO has O(n) parameters to train.

Motivations

To this end, we summarize the aforementioned statements and analysis with several questions as our motivations to conduct this research.

  • Though the capacity of the GPR has been studied a lot, has it been fully demonstrated with the existing fixed parameters?

  • To achieve better effectiveness, are there measures appropriate for different circumstances, and how can we design them?

  • To achieve better efficiency, can we avoid conducting grid search in the graph clustering problem?

  • To be scalable, can we take advantage of existing techniques, e.g., approximation or randomization?

Challenges

There are several fundamental challenges in designing a solution to the problems described above. We describe them briefly here and address them in Section 3.

  • How to deal with the discreteness of the sweep phase of the Nibble and make it differentiable to play a role in the End-to-End framework?

  • How to make the conductance metric a proper supervising signal to provide the appropriate gradient to the training process?

  • How to get the End-to-End model trained as desired?

  • How to use the trained model to infer the clustering result?

  • How to make the framework compatible with the present scalable graph local clustering algorithms?

Motivated by these questions, we focus on the problem of designing the measuring metric under the Nibble two-phase framework and develop an End-to-End learning framework, LearnedNibble, to optimize the graph local clustering target efficiently and effectively. More specifically, we model the design of the measuring metric as a parameter selection task under the GPR form, whose capacity for graph local clustering has been demonstrated both theoretically and practically. We take the graph topology G = (V,E) and the seed node u as input, since the relation between the semantic context on the graph and the cluster structure is beyond our scope. We evaluate performance with the conductance metric, because conductance is consistent with the original and natural definition of a cluster and performs well in both experimental and applied settings. By addressing these non-trivial challenges in an integrated framework, we bring a new perspective to the graph local clustering task.

1.2 Our contributions

We present an in-depth study on Nibble-based graph local clustering task with conductance as the cluster quality metric and make the following contributions.

Supervision manner

We design a differentiable learning-based soft-mean-sweep operator in a self-supervised manner to guide the training process.

Optimization mechanism

We explore the appropriate optimization mechanism for the graph local clustering task and propose the regradient technique to conduct the optimization.

End-to-end framework

We model the effectiveness problem of graph local clustering as a learning task w.r.t. the GPR weighting parameters, and propose an End-to-End framework named LearnedNibble based on the soft-mean-sweep and the regradient technique, which adaptively finds the cluster with the best conductance score on different graphs.

Capacity and compatibility

We illustrate the capacity of the GPR family and of our LearnedNibble framework through extensive experiments on standard benchmarks for graph clustering. We show that LearnedNibble is more effective than all existing and commonly used measuring metrics, e.g., PPR, HKPR and IPR, on all datasets. Moreover, the advantage of LearnedNibble is preserved at all levels of approximation, allowing it to be combined with any approximate local clustering framework.

Scalability and practicality

We show that the clustering manner obtained from our LearnedNibble can generalize to the other nodes, whether they are in the same cluster as the seed node or not, with just a slight performance reduction. The generalization ability of LearnedNibble makes it scalable to massive data circumstances and practical in diverse graph-based tasks, including Graph Visualization and Graph Neural Networks.

1.3 Paper organization

The rest of the paper is organized as follows. We introduce basic notations and important techniques in Section 2. We present the LearnedNibble framework in Section 3. We evaluate the clustering capacity, generalization ability and approximation compatibility of our framework in Section 4. Finally, Section 5 discusses several interesting observations and shares some ideas, and Section 6 concludes the paper.

2 Preliminaries

Before deriving the LearnedNibble framework in detail, we first introduce several important notations and techniques, and finally formalize the problem we investigate in this work.

2.1 Basic terminology

Let G = (V,E) be an undirected and unweighted graph, where \(V=\{v_{1},v_{2},\ldots ,v_{n}\}\) denotes the node set with size n, and \(E=\{e(u,v)\mid u,v\in V\}\) denotes the edge set with size m. We use d(u) to denote the degree of node u, and the vector \(d=\{d(u)\mid u\in V\}\) to collect the degrees of all nodes. We use A to denote the adjacency matrix of G, with A(i,j) = A(j,i) = 1 if and only if \(e(v_{i},v_{j})\in E\). Let D be the degree matrix of G with \(\mathbf {D}(i,i)=d(v_{i})\). The transition probability matrix (a.k.a. random walk transition matrix) of G is \(\mathbf {P}=\mathbf {D}^{-1}\mathbf {A}\). Accordingly, \(\mathbf {P}^{k}\) denotes the k-th order transition probability matrix, and \(\mathbf {P}^{k}\vec {1}_{s}\) denotes the transition probabilities of the k-hop random walk started from seed node s. The notations used frequently in this work are listed in Table 1.

Table 1 Basic notations

2.2 Generalized pagerank

This part introduces the measuring metric used in this work.

Definition 1

(L-hop Transition Probability Sequence) Given a graph G and the seed node s, the transition probability from s to any other node \(u\in V\) in k steps can be computed as \(p^{k}(s,u)=\mathbf P^{k}(s,u)\). Putting the nodes together, we get the k-hop transition probability vector of s, i.e., \(p^k(s)=\mathbf P^k{\vec1}_s=\left\{p^k(s,u)\vert u\in V\right\}.\) The L-hop transition probability sequence is defined as the sequence of k-hop transition probability vectors with the random walk length k ranging from 1 to L:

$$\pi^{L}(s) = \left\{p^{k}(s) \vert k \in [1,L]\right\}.$$
(1)
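As a minimal sketch of Definition 1, the following Python function computes the L-hop transition probability sequence by repeated one-step propagation. It assumes an adjacency-list representation (`adj[v]` lists the neighbors of node `v`), a graph without isolated nodes, and it reads \(p^{k}(s,u)=\mathbf P^{k}(s,u)\) as row s of \(\mathbf P^{k}\). The function name and the toy graph are illustrative and not part of the original work.

```python
def k_hop_sequence(adj, s, L):
    """Return [p^1(s), ..., p^L(s)]; each p^k(s) is a list indexed by node id."""
    n = len(adj)
    p = [0.0] * n
    p[s] = 1.0  # indicator vector of the seed node
    seq = []
    for _ in range(L):
        nxt = [0.0] * n
        for v in range(n):
            if p[v] == 0.0:
                continue  # only nodes already reached carry probability mass
            share = p[v] / len(adj[v])  # one step of P = D^{-1} A
            for u in adj[v]:
                nxt[u] += share
        p = nxt
        seq.append(p)
    return seq


# Toy graph (assumption): triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
pi = k_hop_sequence(adj, s=0, L=3)
for k, vec in enumerate(pi, start=1):
    print(k, [round(x, 4) for x in vec])
```

Because each step only touches the neighbors of nodes that already hold probability mass, the work done for small k stays within the neighborhood of the seed, which matches the locality requirement discussed in Section 1.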

Definition 2

(Generalized PageRank) Given the L-hop transition probability sequence \(\pi ^{L}(s)\) of seed node s on graph G, the Generalized PageRank with the weighting vector w is defined as:

$$\mathbf{gpr}^{L}_{w}(s)=w\times \pi^{L}(s)=\sum\limits_{k=1}^{L} w_{k} \cdot \mathbf P^k{\vec1}_s.$$
(2)

We may omit the seed node flag s and the range flag L in the expressions and write π and \(\mathbf {gpr}_{w}\) for brevity.
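To make Definition 2 concrete, the sketch below assembles a Generalized PageRank vector from the k-hop vectors and shows how the PPR and HKPR weights quoted in Section 1 fit the same form. It is only an illustration under stated assumptions: the graph is given as adjacency lists without isolated nodes, the sum starts at k = 1 as in (2), and all function names are hypothetical.

```python
import math


def transition_sequence(adj, s, L):
    """k-hop transition probability vectors p^1(s), ..., p^L(s)."""
    n = len(adj)
    p = [0.0] * n
    p[s] = 1.0
    seq = []
    for _ in range(L):
        nxt = [0.0] * n
        for v in range(n):
            if p[v] != 0.0:
                share = p[v] / len(adj[v])
                for u in adj[v]:
                    nxt[u] += share
        p = nxt
        seq.append(p)
    return seq


def generalized_pagerank(adj, s, w):
    """gpr_w(s) = sum_{k=1..L} w_k * p^k(s), with L = len(w)."""
    score = [0.0] * len(adj)
    for wk, pk in zip(w, transition_sequence(adj, s, len(w))):
        for u, x in enumerate(pk):
            score[u] += wk * x
    return score


def ppr_weights(alpha, L):
    """Weights alpha * (1 - alpha)^k for k = 1..L."""
    return [alpha * (1.0 - alpha) ** k for k in range(1, L + 1)]


def hkpr_weights(t, L):
    """Weights e^{-t} t^k / k! for k = 1..L."""
    return [math.exp(-t) * t ** k / math.factorial(k) for k in range(1, L + 1)]


# Toy graph (assumption): triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
print(generalized_pagerank(adj, 0, ppr_weights(0.2, 5)))
print(generalized_pagerank(adj, 0, hkpr_weights(3.0, 5)))
```

Since every k-hop vector sums to one, the entries of the resulting score vector sum to the total weight, so different weight families mainly change how mass is distributed over hop distances.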

2.3 Graph local clustering

A cluster in G is a node set \(C\subseteq V\), and its quality is measured by a given criterion. We use the commonly used conductance criterion in this work.

Definition 3

(Conductance) Let G = (V,E) be an undirected, unweighted graph. The volume of a node set \(C\subseteq V\) is defined as \(\text {vol}(C)={\sum }_{u\in C}d(u)\). The edge boundary of a node set C is defined as \(\partial (C)=\{e(u,v)\mid u\in C,v\notin C\}\). The conductance of a node set C is defined as:

$${\Phi}(C)=\frac{\vert \partial(C)\vert}{\min(\text{vol}(C), \text{vol}(G\setminus C))}.$$
(3)
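The definition translates directly into code. The following sketch computes (3) for an adjacency-list graph; returning infinity when the denominator is zero (empty set or full node set) is a convention chosen here for illustration, not something the definition prescribes.

```python
def conductance(adj, cluster):
    """Conductance of `cluster` in the undirected graph given by `adj`."""
    cluster = set(cluster)
    vol_c = sum(len(adj[u]) for u in cluster)           # vol(C)
    vol_total = sum(len(nbrs) for nbrs in adj)          # vol(G) = 2m
    cut = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    denom = min(vol_c, vol_total - vol_c)
    if denom == 0:
        return float("inf")  # assumed convention for degenerate sets
    return cut / denom


# Toy graph (assumption): triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
print(conductance(adj, {0, 1, 2}))  # one boundary edge over volume 7
print(conductance(adj, {0}))
```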

We introduce the Nibble two-phase framework, which is the fundamental framework of graph clustering tasks, by formally introducing each phase of it.

Definition 4

(Measure) Given the graph G, seed node s and the measuring metric \({\mathscr{M}}\), we use the \({\mathscr{M}}\) to measure the proximity score of all nodes towards s on graph G and output the measuring score vector \(q={{\mathscr{M}}(G,s)}\).

Definition 5

(Sweep) Given the measuring score vector q and the quality scoring function \(\mathcal {S}\), let c = (v1,...,vn) be an ordered sequence of the nodes such that \(\frac {q(v_{i})}{d(v_{i})}\geq \frac {q(v_{i+1})}{d(v_{i+1})}\). We scan the sequence and, when visiting the j-th element, take the top-j elements as the candidate set \(C_{j}\). We use \(\mathcal {S}\) to evaluate the quality of the candidate sets sequentially, and output the candidate \(C_{*}\) with the best score \(\mathcal {S}(C_{*})=\mathcal {S}_{*}\), i.e., the smallest conductance in this work, as the result.

The sweep phase is demonstrated by Algorithm 1.

figure a
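For readers without access to the algorithm listing, the sweep of Definition 5 can be sketched as follows. This is an illustrative reconstruction, not the authors' listing: it assumes adjacency lists without isolated nodes or self-loops, scans all n nodes, returns the prefix with the globally smallest conductance, and updates the cut size incrementally instead of recomputing it for every prefix.

```python
def sweep(adj, q):
    """Return (best prefix set, its conductance) for score vector q."""
    n = len(adj)
    degree = [len(nbrs) for nbrs in adj]
    vol_total = sum(degree)
    # Order nodes by degree-normalized score, largest first.
    order = sorted(range(n), key=lambda v: q[v] / degree[v], reverse=True)
    in_set = [False] * n
    vol = 0
    cut = 0
    best_phi, best_j = float("inf"), 0
    for j, v in enumerate(order, start=1):
        in_set[v] = True
        vol += degree[v]
        inside = sum(1 for u in adj[v] if in_set[u])
        # Edges to nodes already inside leave the boundary; the rest join it.
        cut += degree[v] - 2 * inside
        denom = min(vol, vol_total - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_j = cut / denom, j
    return set(order[:best_j]), best_phi


# Toy graph (assumption): triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
q = [0.30, 0.25, 0.30, 0.10, 0.03, 0.02]  # illustrative scores around seed 0
print(sweep(adj, q))
```

The incremental update keeps the scan linear in the number of edges touched after sorting, which is why the sweep is usually treated as a cheap, fixed module.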

2.4 Approximate graph diffusion

The approximate graph diffusion (AGP) [27] framework shows great capacity for handling massive data. We make it a basic module of LearnedNibble for the sake of scalability. AGP takes an undirected graph G, a seed node s, a propagation range level L, a weight sequence w and an error guarantee parameter 𝜖 as input, and outputs an estimated propagation vector with both a theoretical approximation guarantee and near-optimal running time complexity. In our setting, we make the weight vector w an all-ones vector to obtain the estimated L-hop transition probability sequence \(\hat {\pi }^{L}\) from the AGP process as the input of LearnedNibble.

2.5 Problem formulation

Taking effectiveness, efficiency and scalability into consideration, we formalize the problem investigated in this work as the End-to-End approximate conductance optimization task, described as follows.

Definition 6

(End-to-End Approximate Conductance Optimization) Given the graph G, seed node s, propagation range level L, and error guarantee parameter 𝜖, the estimated L-hop transition probability sequence \(\hat {\pi }^{L}\) with absolute error 𝜖 is obtained from AGP. We follow the Nibble two-phase framework, keeping the sweep phase fixed as a standard cluster proposal process based on the measuring score vector q, and focus on finding the appropriate measuring metric \({\mathscr{M}}\) in Generalized PageRank form, i.e., \(w\times \hat {\pi }^{L}\), to optimize the conductance of the proposed cluster in an End-to-End manner.

3 The framework

This section introduces our LearnedNibble framework, which addresses the challenges mentioned in Section 1.1 and solves the problem stated in Definition 6.

3.1 Input data

The input data serves both as the material on which training is based and as the query that the model must answer. We use the approximate result output by AGP under an error guarantee parameter 𝜖 as our input data, which makes our framework compatible with approximation and scalable to massive graphs.

3.2 Trainable parameters

We model the \({\mathscr{M}}^{L}_{GPR}\) used in the Measure phase of LearnedNibble as an assembling method of estimated L-hop transition probability sequence \(\hat {\pi }^{L}\) with trainable weighting parameters as:

$$\mathcal{M}^{L}_{GPR}(\pi)=w\times \hat{\pi}^{L}= \mathbf{gpr}_{w}.$$
(4)

Therefore, LearnedNibble has only L trainable parameters, rather than the O(n) parameters of GSO [53].

3.3 Supervision manner

As mentioned in Section 1.1, the sweep phase described in Definition 5 works in a grid-search manner and is thus inherently discrete and non-differentiable, which makes the desired End-to-End framework challenging to achieve. After a careful investigation of the sweep phase, we divide it into three operations that are conducted in turn but tightly coupled with each other, namely the loop, the selection and the evaluation. We analyze these operations in the following parts to introduce our intuition and solution.

3.3.1 Loop

The loop operation sequentially visits each element of the measuring score vector and conducts the selection operation below, which guarantees the best result among all n cluster candidates. It lets the algorithm avoid combinatorial complexity by reducing the number of checks from the Bell number with parameter n to n, and provides the desirable locality. Nevertheless, the brute-force mechanism of the loop operation binds it to discreteness and makes it incompatible with the End-to-End manner. Two questions therefore arise. 1) How to activate the selection operation to get the candidate node sets? 2) How to guarantee the performance? We answer both questions in the following parts.

3.3.2 Selection

Towards the questions above, we present two of our several trials here, one of which finally forms the LearnedNibble.

Sharp-drop modeling

We notice that Andersen and Chung [23] make a powerful statement about the sweep phase: whenever there is a sharp drop in the rank defined by a personalized PageRank vector, the location of the drop reveals a cut with small conductance. Inspired by this observation, we try to model the selection operation with a trainable parameter Δ, expecting it to separate the measuring score vector into two parts corresponding to the cluster and the rest. Unfortunately, this proposal suffers from the absence of the loop operation and from the flexibility of the GPR measure \({\mathscr{M}}^{L}_{GPR}\) in Section 3.2. As a consequence of the former, we cannot keep the learned Δ at a reasonable value, which should lie within the range of the measuring scores, despite trying diverse training techniques and regularizations. As a consequence of the latter, we lose the connection between the GPR measuring score and the parameter Δ even between two consecutive learning epochs, which makes the two-stage training mechanism fail: it makes no sense to expect the best separation for one score vector to suit another, as the two may vary widely.

Self-supervising

To handle this, we first revisit the Nibble framework to identify the most essential information it captures. Though the performance seems to be related to the measure result and to values like the sharp drop, we point out that, under the Nibble manner, the clustering capacity is mainly determined and represented by the order of the measuring score sequence. Once the score of each node is fixed, the clustering result is almost determined. Besides, unlike most other methods, we do not aim to output the final result in a single sweep run, but to explore the appropriate measure for the specific task during training. Therefore, we do not need the exact evaluation of the measuring score sequence that the standard sweep provides. We only need some information that guides the measure \({\mathscr{M}}^{L}_{GPR}\) towards better cluster discovering capacity by adaptively adjusting its weight parameters. Thus, we propose the mean-sweep technique, which provides a lower bound on the clustering capacity of the measure \({\mathscr{M}}^{L}_{GPR}\) by separating the score sequence into two parts based on its mean. We formally introduce the mean-sweep operation with the following definition.

Definition 7

(mean-sweep) Given the measuring score vector gpr produced by the measure \({\mathscr{M}}^{L}_{GPR}\), let \(\mathbf {gpr^{d}}=\frac {1}{d}\mathbf {gpr}\) be the degree-normalized version of gpr. We choose the nodes whose normalized measuring score is at least the mean value, i.e.,

$$\begin{aligned} C_{*}=\left\{v_{i}\vert \mathbf{gpr^{d}}(v_i) \geq \overline{\mathbf{g}} \right\}, \overline{\mathbf{g}}=\frac{1}{n} \sum\limits_{i=1}^{n} \mathbf{gpr^{d}}(v_{i}), \end{aligned}$$
(5)

to be the cluster result.
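A minimal sketch of (5) is given below. It assumes that the scores and degrees are plain Python lists aligned by node id and that no degree is zero; the function name is illustrative.

```python
def mean_sweep(gpr, degree):
    """Nodes whose degree-normalized score is at least the mean, as in (5)."""
    normalized = [g / d for g, d in zip(gpr, degree)]
    mean = sum(normalized) / len(normalized)
    return {v for v, x in enumerate(normalized) if x >= mean}


# Illustrative scores on a 6-node graph with degrees (2, 2, 3, 3, 2, 2).
gpr = [0.30, 0.25, 0.30, 0.10, 0.03, 0.02]
degree = [2, 2, 3, 3, 2, 2]
print(mean_sweep(gpr, degree))
```

Unlike the standard sweep, this operation inspects a single threshold and needs no sorting, so it yields one candidate cluster per score vector in linear time.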

Even though the mean-sweep only provides one clustering result among many possible selections, it is sufficient to guide the training process. We illustrate this statement with the experiment results in Section 4.

Although we have already stepped forward by providing a solution to the selection dilemma within the learning mechanism, the discreteness challenge still exists as the result proposed by mean-sweep is also a node set, which is discrete and blocks the gradient propagation. It leads us to the evaluation problem which is rather trivial in the standard sweep operation.

3.3.3 Evaluation

Following the same principle as in the mean-sweep technique, we use the Sigmoid operator, widely used in machine learning, to push the scores above the mean towards 1 and the others towards 0. The activation here is not meant to introduce non-linearity into the system, but to approximate the discrete set selection result. It plays a role similar to that of the Gumbel-Softmax operation in GSO [53]. With this approximation, we propose the soft-mean-sweep module, the core element of our LearnedNibble framework.

Definition 8

(soft-mean-sweep) Given the measuring score vector gpr produced by the measure \({\mathscr{M}}^{L}_{GPR}\), let \(\mathbf {gpr^{d}}=\frac {1}{d}\mathbf {gpr}\) be the degree-normalized version of gpr. We center \(\mathbf {gpr^{d}}\) by its mean and apply the Sigmoid operator to it, i.e.,

$$c=\sigma\left( \mathbf{gpr^{d}}-\frac{1}{n}\sum\limits_{i=1}^{n}\mathbf{gpr^{d}}(v_{i})\right)$$
(6)

and make it the approximate clustering result.
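The relaxation in (6) can be sketched in a few lines. The code below applies the formula as written, without any additional scaling of the centered scores; list-based inputs and the function name are assumptions made for illustration.

```python
import math


def soft_mean_sweep(gpr, degree):
    """Soft membership vector c = sigmoid(gpr^d - mean(gpr^d)), as in (6)."""
    normalized = [g / d for g, d in zip(gpr, degree)]
    mean = sum(normalized) / len(normalized)
    return [1.0 / (1.0 + math.exp(-(x - mean))) for x in normalized]


# Illustrative scores on a 6-node graph with degrees (2, 2, 3, 3, 2, 2).
gpr = [0.30, 0.25, 0.30, 0.10, 0.03, 0.02]
degree = [2, 2, 3, 3, 2, 2]
c = soft_mean_sweep(gpr, degree)
print([round(x, 4) for x in c])
print({v for v, x in enumerate(c) if x >= 0.5})  # nodes at or above the mean
```

Thresholding the soft vector at 0.5 recovers exactly the mean-sweep set of Definition 7, since the Sigmoid maps non-negative centered scores to values of at least 0.5. Every entry is a smooth function of the scores, which is what lets gradients flow back to the weights.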

Loss

We obtain the final supervision manner for LearnedNibble by putting everything together. We compute the conductance, in Rayleigh quotient form, of the result output by soft-mean-sweep using matrix operations, view it as an approximate reflection of the clustering capacity provided by \({\mathscr{M}}^{L}_{GPR}\), and set it as the supervising signal (a.k.a. the loss) of the learning framework, i.e.,

$$\begin{aligned} \psi&=\frac{c^{\mathrm{T}}(\mathbf{D}-\mathbf{W})c}{\min\left( c_{A}^{\mathrm{T}}\mathbf{D}c_{A},c_{\overline{A}}^{\mathrm{T}}\mathbf{D}c_{\overline{A}}\right)}, \\ c_{A}&= 0.5\cdot (1+c), c_{\overline{A}}= 1-c_{A}. \end{aligned}$$
(7)
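The loss in (7) can be evaluated without forming any matrix. The sketch below follows the formula literally and relies on two assumptions that the text does not state: the matrix \(\mathbf {W}\) is taken to be the adjacency matrix \(\mathbf {A}\) of Section 2.1, and the graph is stored as adjacency lists. The function name is illustrative.

```python
def soft_conductance_loss(adj, c):
    """Evaluate psi from (7) for a soft membership vector c."""
    n = len(adj)
    degree = [len(nbrs) for nbrs in adj]
    # Numerator c^T (D - W) c, with W assumed to be the adjacency matrix.
    quad = sum(degree[u] * c[u] * c[u] for u in range(n))
    quad -= sum(c[u] * c[v] for u in range(n) for v in adj[u])
    # Denominator terms, with c_A = 0.5 * (1 + c) and its complement.
    c_a = [0.5 * (1.0 + x) for x in c]
    c_b = [1.0 - x for x in c_a]
    vol_a = sum(d * x * x for d, x in zip(degree, c_a))
    vol_b = sum(d * x * x for d, x in zip(degree, c_b))
    # vol_b is positive whenever some entry of c is below 1 (true for Sigmoid outputs).
    return quad / min(vol_a, vol_b)


# Toy graph (assumption): triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
print(soft_conductance_loss(adj, [0.9, 0.9, 0.9, 0.1, 0.1, 0.1]))
```

The numerator equals the sum of \((c_{u}-c_{v})^{2}\) over the edges, so it only penalizes edges whose endpoints receive different soft memberships.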

3.4 Optimization mechanism

With the supervision manner and the loss function in hand, the next step is to use the supervising signal to guide the training process. Several optimizers have proven to be appropriate engines for diverse learning tasks, among which Adam [58] is the most widely used. Though successful in many settings, Adam does not work as well as expected in our graph local clustering task. It often misses better solutions and sometimes keeps moving in the wrong direction for a long time. We suppose that one reasonable explanation of this weird situation is that the graph clustering task naturally has many local optima close to the best one, which trap and mislead Adam.

Regradient

To mitigate the problem, we propose the regradient technique to make the Adam optimizer focus more on the current step and avoid being affected by the former gradients.

Definition 9

(regradient) With a parameter r that controls the restart frequency, we reset the Adam optimizer every r epochs by clearing its accumulated gradients. Without loss of generality, we fix r to be 10 in this work.
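The following sketch illustrates one possible reading of Definition 9. It uses a small hand-written Adam update rather than any library optimizer, and it interprets "clearing the accumulated gradients" as resetting the first and second moment estimates together with the step counter; this interpretation, the class `MiniAdam`, and the toy objective are assumptions for illustration only.

```python
import math


class MiniAdam:
    """Minimal Adam update for a list of scalar parameters (illustrative)."""

    def __init__(self, n_params, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.n = n_params
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.reset()

    def reset(self):
        """Forget all accumulated moment estimates and the step count."""
        self.m = [0.0] * self.n
        self.v = [0.0] * self.n
        self.t = 0

    def step(self, params, grads):
        self.t += 1
        out = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            m_hat = self.m[i] / (1 - self.b1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            out.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return out


def train_with_regradient(params, grad_fn, epochs, r=10, lr=0.1):
    """Run Adam for `epochs` steps, resetting its state every r epochs."""
    opt = MiniAdam(len(params), lr=lr)
    for epoch in range(epochs):
        if epoch > 0 and epoch % r == 0:
            opt.reset()  # the regradient restart
        params = opt.step(params, grad_fn(params))
    return params


# Toy objective (assumption): f(w) = sum_i (w_i - 1)^2 with gradient 2 (w_i - 1).
final = train_with_regradient([0.0, 0.0], lambda w: [2.0 * (x - 1.0) for x in w], epochs=100)
print(final)
```

After each reset the first update has magnitude close to the learning rate regardless of the gradient scale, so the optimizer reacts to the current gradient only, which is the behavior the definition aims for.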

We will present the effectiveness of the regradient technique with an ablation experiment in Appendix 1.

3.5 Inference manner

Last but not least, we need to obtain the model from the training process and use it for inference. In our LearnedNibble framework, the model is the measuring method \({\mathscr{M}}^{L}_{GPR}\) with learned GPR weight parameters, and the inference result is the clustering node set.

Most machine learning tasks obtain the final trained model through convergence and the early-stop technique. For our graph local clustering task, it is unnecessary and unfair to require the model to converge, for two reasons. 1) As mentioned in Section 3.3, we use a lower bound of the clustering capacity to guide the training process, so there may be a gap between the performance reported by the loss and the actual ability of the model. 2) Though we aim to avoid searching the massive number of possible cases, we still share the same solution space as the original combinatorial optimization problem. Thus, we propose the search-select manner to obtain the model from LearnedNibble.

Definition 10

(search-select) Given the L-hop transition probability sequence π and the training process of LearnedNibble with T epochs, i.e., \(\mathcal {R}^{T}=\left \{{\mathscr{M}}_{1},...,{\mathscr{M}}_{T}\right \}\), where \({\mathscr{M}}_{i}\) is the measuring method with trained weight vector \(w_{i}\), i.e., \({\mathscr{M}}_{i}(\pi )=w_{i}\times \pi\), we compute the exact clustering capacity \(\Phi _{i}\) of each \({\mathscr{M}}_{i}\) by conducting the standard sweep operation on its measure result, as described in Algorithm 1. We select the \({\mathscr{M}}_{*}\) with the best Φ as the final model, and use the \({\mathscr{M}}_{*}\) obtained from the training process \(\mathcal {R}^{T}\) on graph G to answer the query of any seed node \(s\in G\).
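The selection step of Definition 10 can be sketched as follows. The code is a simplified illustration under several assumptions: the weight vectors recorded during training are passed in as a list named `checkpoints`, the sweep scans all nodes of an adjacency-list graph without isolated nodes, and ties are resolved in favor of the earliest checkpoint. All names are hypothetical.

```python
def sweep_conductance(adj, q):
    """Smallest conductance over all prefixes of the degree-normalized order."""
    degree = [len(nbrs) for nbrs in adj]
    vol_total = sum(degree)
    order = sorted(range(len(adj)), key=lambda v: q[v] / degree[v], reverse=True)
    in_set = [False] * len(adj)
    vol = 0
    cut = 0
    best = float("inf")
    for v in order:
        in_set[v] = True
        vol += degree[v]
        inside = sum(1 for u in adj[v] if in_set[u])
        cut += degree[v] - 2 * inside  # incremental boundary update
        denom = min(vol, vol_total - vol)
        if denom > 0:
            best = min(best, cut / denom)
    return best


def search_select(adj, pi, checkpoints):
    """Pick the weight vector whose GPR scores give the best exact sweep result."""
    best_w, best_phi = None, float("inf")
    for w in checkpoints:
        # Measure phase: q = w x pi, the GPR score of every node.
        q = [sum(wk * pk[u] for wk, pk in zip(w, pi)) for u in range(len(adj))]
        phi = sweep_conductance(adj, q)
        if phi < best_phi:
            best_w, best_phi = w, phi
    return best_w, best_phi


# Toy graph (assumption): triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
# Exact 1-hop and 2-hop transition probabilities from seed node 0.
pi = [[0.0, 0.5, 0.5, 0.0, 0.0, 0.0],
      [5 / 12, 1 / 6, 1 / 4, 1 / 6, 0.0, 0.0]]
print(search_select(adj, pi, [[-1.0, 0.0], [0.0, 1.0]]))
```

Because the exact sweep is only run once per recorded weight vector, the cost of this step grows linearly with the number of checkpoints kept from training.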

3.6 Framework overview

We present our LearnedNibble framework in this part with Algorithm 2 and Algorithm 3. The Initialization module which has not been mentioned is described in Appendix 1.

figure b
figure c

4 Experiments

This section evaluates the performance of our LearnedNibble in three aspects: 1) the clustering capacity with respect to conductance optimization, 2) the generalization ability from the training seed nodes to the whole graph, and 3) the compatibility with approximation. We report the key results here and discuss additional results in Appendix 1.

Datasets

We conduct our experiments on commonly-used benchmark graphs with ground-truth labels, including DBLP, Amazon, PubMed, CiteSeer and Cora. The statistics of the datasets are listed in Appendix 1 with Table 4.

Metric

We use the conductance as the evaluation metric and the optimization target of our framework since the information recovery metrics could conflict with our optimization purpose. We investigate the conductance of the ground-truth clusters in Figure 2 in Appendix 1.

Competitors

We set the existing GPR instances with different specific weighting paradigms, such as PPR, HKPR and IPR, as part of our competitors. Another competitor is the MEAN weighting operation, since the result produced by any method should be better than this trivial one. The last competitor we set for our LearnedNibble is the most recent GSO [53], chosen for its applicability to all graph optimization tasks. The considerations and the comparison methods are presented in Appendix 1.
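For illustration, the fixed weight vectors behind some of these competitors can be written down directly. The sketch below uses the textbook forms of the PPR and HKPR coefficients and a uniform vector for MEAN; the function names, the parameter values `alpha = 0.15` and `t = 5.0`, and the indexing of hops from 0 to L are assumptions rather than the settings used in the experiments. IPR is omitted because its weights are not stated in this section.

```python
import math
from typing import List


def ppr_weights(L: int, alpha: float = 0.15) -> List[float]:
    """Personalized PageRank coefficients: w_k = alpha * (1 - alpha)^k."""
    return [alpha * (1.0 - alpha) ** k for k in range(L + 1)]


def hkpr_weights(L: int, t: float = 5.0) -> List[float]:
    """Heat-kernel PageRank coefficients: w_k = exp(-t) * t^k / k!."""
    return [math.exp(-t) * t ** k / math.factorial(k) for k in range(L + 1)]


def mean_weights(L: int) -> List[float]:
    """Uniform weights over hops 0..L, i.e. the trivial MEAN baseline."""
    return [1.0 / (L + 1)] * (L + 1)
```

Each function returns one fixed weight vector over the hops, in contrast to the trained weight vectors \(w_{i}\) of Definition 10.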

4.1 Training settings

Training data

From each graph, we randomly select 5 seed nodes from different clusters whose size is larger than 100 to form the training seed node set. We set the propagation range L = 50. We vary the approximation parameter 𝜖 in \(\{0, 10^{-4}, 10^{-5}, 10^{-6}\}\), where 𝜖 = 0 means we use the exact L-hop transition probability sequence rooted at the seed node.
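The role of 𝜖 can be illustrated with a minimal sketch of how an L-hop transition probability sequence might be computed. Two assumptions are made here that this section does not confirm: the walk is the simple random walk (each node splits its mass evenly among its neighbors), and the approximation is a Nibble-style truncation that drops an entry when it falls below 𝜖 times the node degree. The function name `transition_sequence` is hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # undirected adjacency sets (assumed representation)


def transition_sequence(
    graph: Graph, seed: int, L: int, eps: float = 0.0
) -> List[Dict[int, float]]:
    """Return [pi_0, ..., pi_L], the k-hop distributions rooted at `seed`.

    With eps = 0 the sequence is exact; with eps > 0 small entries are
    dropped after every step (assumed truncation rule).
    """
    current: Dict[int, float] = {seed: 1.0}
    sequence = [current]
    for _ in range(L):
        nxt: Dict[int, float] = {}
        for u, mass in current.items():
            if not graph[u]:
                continue  # an isolated node cannot pass its mass on
            share = mass / len(graph[u])
            for v in graph[u]:
                nxt[v] = nxt.get(v, 0.0) + share
        if eps > 0.0:
            # Assumed rule: keep an entry only if it is at least eps * degree.
            nxt = {v: p for v, p in nxt.items() if p >= eps * len(graph[v])}
        sequence.append(nxt)
        current = nxt
    return sequence
```

Because only nodes with non-zero mass are visited, a larger 𝜖 keeps each step confined to fewer nodes around the seed, at the cost of losing some probability mass.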

Initialization method

We use the RAW weighting vector, which is a one-hot vector with the seed node index non-zero, as the initial weight for the training process, since the other initialization methods are our competitors. The analysis of the initialization sensitivity is presented in Appendix 1.

Training method

We set the training budget T = 2,000, the regradient step e = 10 and the learning rate lr = 1.

4.2 Clustering capacity

This section investigates the clustering capacity of LearnedNibble with no approximation. The results are presented in Table 2. Surprisingly, the trivial MEAN beats all GPR instances on all datasets. Moreover, our LearnedNibble performs much better than all competitors. GSO seems to learn nothing from the training process, so we omit it from the following comparisons. The specific settings and detailed results are presented in Appendix 1.

Table 2 Clustering capacity

4.3 Generalization ability

We present the generalization ability of LearnedNibble with no approximation in this part by reporting basic statistics of the conductances on test samples. The results are listed in Table 3. First, the final model obtained from LearnedNibble is useful, as it achieves even better performance when transferred to other nodes, whether within the cluster or across the whole graph (see the last column of Table 3). Second, the mean and std. columns show that the model achieves satisfying performance, with strong confidence of finding a cluster with relatively small conductance. We discuss some other interesting observations in Section 5.

Table 3 Generalization ability

4.4 Approximation compatibility

The approximation compatibility of LearnedNibble is presented in this section with four different approximation levels. Figure 1 combines all the information. For convenience, we report 4 of the 10 results (5 datasets with 2 generalizations each), where ∗ marks the in-cluster generalization results. The other results are presented in Appendix 1. The x-axis is the approximation level with parameter 𝜖, and the y-axis is the conductance value. The thick horizontal line shows the training results, and the boxplot shows the transferring results. Although the clustering capacity and the generalization ability of LearnedNibble weaken as the approximation level increases, the results remain satisfying most of the time, since the conductance values are small and their variance is well bounded.

Figure 1 Capacity and ability with approximation. (a) DBLP* (b) Amazon* (c) CiteSeer (d) Cora

5 Discussions

Although the experimental results in Section 4.3 and Appendix 1 provide positive evidence for sharing one clustering method across all nodes of the same graph, some odd but interesting phenomena caught our attention. 1) The in-cluster generalization performance is worse than the in-graph one on PubMed and Cora. 2) The performance gap between in-cluster and in-graph generalization is large on DBLP and Amazon. 3) In nearly all situations, we obtain better results for some nodes that were never optimized. These observations lead us to consider the basis of the generalization and the information attached to the graph.

Topology consistency

The most critical assumption needed to transfer one model to other situations is consistency. For graph clustering tasks whose optimization metric concerns the topology structure, such as conductance, the topology consistency of different parts of the graph should be evaluated and checked first, which is another interesting question.

Data representativeness

As described in Section 4.1, we use only 5 randomly chosen nodes as our training data, which may not be representative of the graph. Thus, selecting suitable nodes as training data may be a fundamental problem.

Information-topology compatibility

Recall that we pointed out the conflict between topology-based metrics and information-based metrics in Section 1 and Appendix 1. The information attached to the graph, as labels or other context on nodes or edges, may provide somewhat different and independent information compared with the topology. As a result, generalization within the so-called same cluster could be meaningless and even more challenging. Undoubtedly, taking advantage of both the topology and the information of a graph is one of the most crucial but challenging problems in the graph mining area.

6 Conclusions

In this paper, we conduct in-depth research on the graph local clustering task and propose a novel learning-based framework, LearnedNibble, by solving a series of non-trivial challenges. To the best of our knowledge, LearnedNibble is the first framework to take responsibility for the cluster quality and to consider both effectiveness and efficiency in an end-to-end paradigm in a self-supervised manner. Our experiments demonstrate that the clustering capacity of the L-hop transition probability sequence is underestimated when only fixed weighting structures and parameters are used to assemble it, and that our LearnedNibble framework takes better advantage of it. Besides the improvements in cluster quality, our framework shows great generalization ability and approximation compatibility, making it practical in many situations.