
1 Introduction

Node classification involves a network with known graph structure, where each node is associated with a set of known attributes, as well as a label (or ‘class’) which is known only for some of the nodes and unknown for the others. The task is to predict the unknown labels given all the known information about the network. This task finds application in many complex network settings, such as information networks [21], complex systems [30], and protein function identification [5].

Previous Work. Motivated by theories of homophily [15] and social influence [14], a common assumption is that adjacent nodes tend to have similar labels. For instance, in a social network, friends may be likely to vote for the same political party. Methods for node classification that rely on this assumption typically enforce homophily by assigning the same label to proximate (i.e., nearby) nodes. In label propagation, for example, labels diffuse from labeled nodes to their unlabeled neighbors in an iterative manner until convergence [29]. Other methods induce label uniformity within cuts or clusters of the network [1, 2, 9]; or consider node proximity in a latent space that preserves network distances, as in DeepWalk [17] and similar matrix factorization methods [18].

However, the aforementioned methods ignore other node attributes, which can be detrimental. For example, Hamilton et al. [5] show that, for certain predictive tasks on citation and social networks, a linear classifier built only on node attributes outperforms methods such as DeepWalk that are based on node proximity but ignore node attributes. Similar cases are reported elsewhere in the literature [13, 20]. Therefore, it is important to appropriately leverage both node proximity and node attributes for label prediction.

Partially addressing this limitation, AANE [7] and DANE [4] combine low-dimensional encodings of node attributes with network-distance-preserving node embeddings, and use them as input for label prediction. However, they do not account for known labels during training, thus potentially ignoring information that would be useful in predicting the unknown labels. LANE [8] overcomes this limitation by learning joint latent representations of node attributes, proximity, and labels. However, LANE does not directly address the node classification task, i.e., it does not optimize the conditional probability distribution of node labels given the node attributes and network structure, but rather targets the joint distribution of all quantities.

Separately, graph convolutional networks (e.g., GCN [10], GAT [23], GraphSAGE [5]) use network topology for low-pass filtering on node attributes. However, these convolutions are equivalent to repeated smoothing over the node attributes, and performance quickly degrades [12]. Subsequent approaches such as DiffPool [28] have sought to address this limitation, but these, too, aggressively enforce homophily and require that nodes with the same labels have similar representations. The recent AM-GCN [25] attempts to weaken this assumption by analyzing the fusing capabilities of convolutional models: it defines two modules, one each for the topology space and the attribute space, and adaptively combines them using an attention mechanism.

Our Contribution. We develop an approach to node classification that flexibly adapts to a range of settings, from cases where node labels exhibit strong homophily (i.e., a node’s label can be predicted from the labels of proximate nodes) to cases where labels are predicted solely from the attributes of the node, as well as cases that lie between these two extremes. Unlike the aforementioned approaches that rely heavily on label homophily (e.g., label propagation), the proposed approach is based on a generative probabilistic model that jointly captures the role of network structure and node similarity in predicting labels. Our analysis leads to JANE, an algorithm that identifies maximum-likelihood node labels. Moreover, unlike AANE [7] and DANE [4], JANE learns a low-dimensional node representation informed by labels, which is then used for prediction. And unlike LANE [8], it directly optimizes the conditional probability of the unknown labels given the known information about the network. We empirically validate JANE’s performance on real datasets and compare it to standard baselines.

2 Problem Setting

Let us consider a network with undirected and connected graph structure \(\mathcal {G}= \left( \mathcal {V}, \mathcal {E}\right) \) with \(|\mathcal {V}| = n\) nodes. Let \({\mathbf {A}}\) be the adjacency matrix, \({\mathbf {D}} \) the (diagonal) degree matrix, and \({\mathbf {L}} = {\mathbf {D}}- {\mathbf {A}} \) the unnormalized Laplacian. Let \({\mathbf {e}}_{i}\left( {\mathbf {L}} \right) \) be the i-th spectral eigenvector, corresponding to the i-th smallest eigenvalue of \({\mathbf {L}}\).

Each node j is associated with: d observed attributes \({\mathbf {x}}_{j} \in \mathbb {R}^{d}\), k latent attributes \({\mathbf {u}}_{j} \in \mathbb {R}^{k}\), and one possibly unobserved label \({\mathbf {y}}_{j} \in \left\{ 1,2,\ldots ,M\right\} \). To refer to the attributes and labels of all nodes, we use the notation \({\mathbf {X}} = \left\{ {\mathbf {x}}_{j} \right\} _{j = 1\ldots n}\), \({\mathbf {U}} = \left\{ {\mathbf {u}}_{j} \right\} _{j = 1\ldots n}\), and \({\mathbf {Y}} = \left\{ {\mathbf {y}}_{j} \right\} _{j = 1\ldots n}\), respectively.

Node classification is formalized as Problem 1.

Problem 1 (Node-Classification)

Given adjacency matrix \({\mathbf {A}} \), node attributes \({\mathbf {X}} \), and labels \({\mathbf {Y}} _{L}\) for a subset \(L\subseteq \mathcal {V}\) of nodes, predict labels \({\mathbf {Y}} _{\mathcal {V}-L}\) for the remaining nodes \({\mathcal {V}-L}\) in the network.

3 Our Approach

Following a principled probabilistic approach to Problem 1, we use an appropriate generative model (defined in Sect. 3.1) to derive JANE, our node classification method (Sect. 3.2).

3.1 Model

Figure 1 illustrates our generative model. First, to model homophily, we assume that two nodes are more likely to be adjacent in \({\mathbf {A}}\) if they are similar in terms of some attributes. Formally, we assume that the adjacency matrix \({\mathbf {A}}\) is generated from latent attributes \({\mathbf {U}} \). Specifically, the probability that there exists an edge between two nodes i and j decreases with their distance in the latent space, according to the following formula, for some scaling parameter \(s ^2\):

$$\begin{aligned} \text {Pr}\left[ ~ \left( i,j\right) \in \mathcal {E}~ | ~ {\mathbf {u}}_{i}, {\mathbf {u}}_{j} ~\right] = p_{ij} = e^{-\left\Vert {\mathbf {u}}_{i} - {\mathbf {u}}_{j}\right\Vert ^2 / s^2}\end{aligned}$$
(1)

In light of the above, \({\mathbf {U}}\) represents a low-dimensional Euclidean embedding of the nodes that preserves connectivity in the form of Eq. 1.
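To make Eq. 1 concrete, the following NumPy sketch (an illustration under the model's assumptions, not code from our implementation; the function names are ours) computes the pairwise edge probabilities from a latent embedding \({\mathbf {U}}\) and samples a symmetric adjacency matrix with independent Bernoulli edges.

```python
import numpy as np

def edge_probabilities(U, s2=1.0):
    """Pairwise edge probabilities p_ij = exp(-||u_i - u_j||^2 / s^2), as in Eq. 1."""
    sq_dists = np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / s2)

def sample_adjacency(U, s2=1.0, rng=None):
    """Sample an undirected adjacency matrix with independent Bernoulli edges."""
    rng = np.random.default_rng() if rng is None else rng
    P = edge_probabilities(U, s2)
    A = (rng.random(P.shape) < P).astype(int)
    A = np.triu(A, k=1)   # keep the upper triangle: no self-loops
    return A + A.T        # symmetrize for an undirected graph
```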

Fig. 1.

The generative model. Observed node attributes \({\mathbf {X}} \) (circular box) and latent attributes \({\mathbf {U}} \) (square box) jointly generate node labels \({\mathbf {Y}} \). The (observed) adjacency matrix \({\mathbf {A}} \) is generated from \({\mathbf {U}} \) and indirectly correlates with \({\mathbf {Y}} \) via \({\mathbf {U}} \).

Second, to build the desired flexibility into our approach, we assume that labels \({\mathbf {Y}}\) may be determined both from node attributes \({\mathbf {X}}\) (directly) and from node proximity, as expressed directly via \({\mathbf {U}}\) and thus indirectly via adjacency \({\mathbf {A}}\). Formally, we assume that the conditional probability of \({\mathbf {Y}}\) given \({\mathbf {X}}\) and \({\mathbf {U}}\) is given by a simple two-layer neural network,

$$\begin{aligned} \text {Pr}\left[ {\mathbf {Y}} |{\mathbf {X}},{\mathbf {U}},\mathbf {W} \right] = \sigma (\text {ReLU} (\left[ {\mathbf {X}} {\mathbf {U}} \right] {W}^{\left( 0\right) }) {W}^{\left( 1\right) } ) \end{aligned}$$
(2)

where \(\sigma \) denotes the softmax function and weight matrices \(\mathbf {W} = \{{W}^{\left( 0\right) }, {W}^{\left( 1\right) } \}\) are parameters that control the effect of \({\mathbf {X}} \) and \({\mathbf {U}} \) on labels \({\mathbf {Y}}\). We make this choice because we found this model to be sufficiently expressive for our empirical evaluation – however note that other models (e.g., neural networks with more hidden layers) could also be used in Eq. 2.
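As a minimal illustration of Eq. 2, the two-layer network can be written as the following PyTorch module (a sketch; the hidden dimension, the absence of bias terms and dropout, and the use of log-probabilities are implementation choices, not prescriptions).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelModel(nn.Module):
    """Two-layer network for Pr[Y | X, U, W], as in Eq. 2."""
    def __init__(self, d, k, hidden_dim, num_classes):
        super().__init__()
        self.W0 = nn.Linear(d + k, hidden_dim, bias=False)        # W^(0)
        self.W1 = nn.Linear(hidden_dim, num_classes, bias=False)  # W^(1)

    def forward(self, X, U):
        Z = torch.cat([X, U], dim=1)  # [X U]: concatenate observed and latent attributes
        # Return log-probabilities (log-softmax) so a negative log-likelihood
        # loss can be applied directly during training.
        return F.log_softmax(self.W1(F.relu(self.W0(Z))), dim=1)
```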

3.2 Algorithms

Problem 1 asks for predictions for \({\mathbf {Y}} _{\mathcal {V}-L}\) given the data \(\mathcal {D} = ({\mathbf {X}}, {\mathbf {A}}, {\mathbf {Y}} _L)\) that are provided as input. Towards this end, we treat the remaining quantities in the model, i.e., the latent variables \({\mathbf {U}}\) and the weights of the neural network \(\mathbf {W}\), as unobserved parameters \(\theta = ({\mathbf {U}}, \mathbf {W})\), and use their maximum-likelihood values \(\hat{{\mathbf {U}}}\) and \(\hat{\mathbf {W}}\) in making the predictions. In summary, JANE consists of two algorithms: first, a training algorithm from which we learn the maximum-likelihood estimates \(\hat{{\mathbf {U}}}\) and \(\hat{\mathbf {W}}\); second, a prediction algorithm, in which we use the learned parameter values to predict the missing labels. We provide the algorithms below.

Training. The training algorithm is defined in Algorithm 1. In summary, it uses gradient descent on the negative log-likelihood (\(-\log \mathcal {L}\)) to iteratively update \(\hat{{\mathbf {U}}}\) and \(\hat{\mathbf {W}}\) towards their maximum-likelihood estimates. The initial value of \(\hat{{\mathbf {U}}}\) is given by the first k eigenvectors of the Laplacian \({\mathbf {L}}\). Due to space constraints, the analysis that justifies this initialization and the detailed gradient calculations are omitted here but provided in the extended version [?].

Algorithm 1. The JANE training procedure.
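For illustration, a rough PyTorch sketch of this training loop is given below (not the verbatim Algorithm 1). It assumes the log-likelihood decomposes as \(\log \text {Pr}[{\mathbf {A}} |{\mathbf {U}}] + \log \text {Pr}[{\mathbf {Y}} _{L}|{\mathbf {X}},{\mathbf {U}},\mathbf {W}]\), with independent Bernoulli edges as in Eq. 1 and the label model of Eq. 2; the optimizer, learning rate, weight decay, and epoch count mirror the settings of Sect. 4.1, and the helper names are ours.

```python
import torch

def train_jane(A, X, y_L, labeled_idx, U_init, model, s2=1.0,
               epochs=200, lr=0.005, weight_decay=5e-2):
    """Sketch of training: descend -log L jointly over U and the weights W.

    A: dense (n, n) 0/1 float tensor; X: (n, d) float tensor;
    y_L: labels of the labeled nodes; labeled_idx: their indices;
    U_init: (n, k) initial latent attributes (first k Laplacian eigenvectors).
    """
    U = torch.nn.Parameter(U_init.clone())
    opt = torch.optim.Adam([U] + list(model.parameters()),
                           lr=lr, weight_decay=weight_decay)
    for _ in range(epochs):
        opt.zero_grad()
        # -log Pr[A | U]: independent Bernoulli edges with p_ij from Eq. 1.
        sq = ((U.unsqueeze(1) - U.unsqueeze(0)) ** 2).sum(-1)
        p = torch.exp(-sq / s2).clamp(1e-6, 1 - 1e-6)
        nll_graph = -(A * torch.log(p) + (1 - A) * torch.log(1 - p)).triu(1).sum()
        # -log Pr[Y_L | X, U, W]: cross-entropy on the labeled nodes (Eq. 2).
        log_probs = model(X, U)
        nll_labels = torch.nn.functional.nll_loss(log_probs[labeled_idx], y_L)
        (nll_graph + nll_labels).backward()
        opt.step()
    return U.detach(), model
```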

Prediction. Given maximum-likelihood estimates \(\hat{{\mathbf {U}}}\) and \(\hat{\mathbf {W}}\), JANE invokes Eq. 2 with \({\mathbf {U}} = \hat{{\mathbf {U}}}\) and \(\mathbf {W} = \hat{\mathbf {W}}\) for each node with an unknown label, and predicts for it the most likely label under Eq. 2.
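The prediction step then amounts to a forward pass of Eq. 2 with the learned estimates, followed by an argmax per unlabeled node (again a sketch; the helper name is ours).

```python
import torch

def predict_labels(model, X, U_hat, unlabeled_idx):
    """Most likely label per unlabeled node, i.e., Eq. 2 with U = U-hat and W = W-hat."""
    model.eval()
    with torch.no_grad():
        log_probs = model(X, U_hat)
        return log_probs[unlabeled_idx].argmax(dim=1)
```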

4 Experiments

Baselines. We evaluate JANE against a range of baseline methods.

  • JANE and its variants: JANE-NU, wherein we do not update the initial estimate of \(\hat{{\mathbf {U}}} \) during training, and JANE-R, wherein the initial estimate of \({\mathbf {U}}\) is a random matrix.

  • Methods based on network structure: Label Propagation (LP) [29], and DeepWalk (DW) [17] that encodes neighbourhood information via truncated random walks. These do not incorporate node attributes.

  • Deep attributed embeddings: We evaluate LANE [8] which constructs node embeddings that encode network structure and node attribute information, in addition to node labels.

  • Methods based on graph-convolutions: GCN [10] and GraphSAGE (mean aggregator) as representatives of graph convolutional networks. We acknowledge that this is an extremely active area of research today and that several methods demonstrate improved performance along axes such as training efficiency [3], explainability [26], etc. on various real-world datasets [6]. For economy of space, we empirically compare with two representative benchmark methods to demonstrate our central point: these methods sometimes fail because they strictly and inherently enforce homophily.

We were unable to reproduce the node classification results for AANE [7] based on the available implementation. Further, we could not locate the authors’ implementations for DANE [4]. Therefore, we do not report results for them.

Experimental Setup. We implement JANE, JANE-NU, and LP in PyTorch. We use off-the-shelf PyTorch implementations of DeepWalk, GCN, and GraphSAGE, and a MATLAB implementation of LANE [8]. In all of our experiments, as is standard, all methods receive only the adjacency matrix \({\mathbf {A}} \) and the node attributes \({\mathbf {X}} \) as input, along with the same 10% and 20% of the node labels for training and validation, respectively. Wherever available, we use the default hyperparameter configurations suggested in the original papers. Otherwise, we grid search over the hyperparameter space to find the best setting for all of our baselines. We perform all experiments on a Linux machine with 4 cores and 32 GB RAM.

Reproducibility. To aid further research, we make our code publicly available. This also includes an implementation for constructing synthetic datasets as described below. The real-world datasets (cf. Sect. 4.2) are publicly available.

4.1 Node Classification on Synthetic Data

The goal of these experiments is to demonstrate differences between JANE and existing classification methods.

Synthetic Datasets. Figure 2 describes representative synthetic datasets generated according to the framework described previously. We set the number of individual node attributes \(|{\mathbf {X}} | = d = 2\) and the number of latent attributes \(|{\mathbf {U}} | = k = 2\). We generate these (Gaussian-distributed) attributes for \(n=200\) points, each of which belongs to one of \(M=4\) classes, and set the scale \(s ^2 = 1\). An influence parameter, \(\alpha \in \left[ 0, 1\right] \), controls the degree to which node labels derive from \({\mathbf {X}}\) or \({\mathbf {U}}\): \(\alpha = 0.0\) signifies that they derive only from \({\mathbf {U}} \) and are independent of \({\mathbf {X}} \); \(\alpha = 1.0\) that they derive only from \({\mathbf {X}} \) and are independent of \({\mathbf {U}} \); and \(\alpha = 0.5\) that they derive equally from \({\mathbf {X}}\) and \({\mathbf {U}}\) (specifically, without loss of generality, only the first attribute from each of \({\mathbf {X}} \) and \({\mathbf {U}} \) contributes to label assignment). Figure 2a depicts instances of synthetic networks corresponding to \(\alpha = \left\{ 0.0, 0.5, 1.0\right\} \). The colors of points represent classes.

Fig. 2.

Figure 2a shows 3 synthetic networks where class labels \({\mathbf {Y}}\) are determined only by \({\mathbf {U}} \) (\(\alpha = 0.0\)), partly by \({\mathbf {U}} \) and partly by \({\mathbf {X}} \) (\(\alpha = 0.5\)), and only by \({\mathbf {X}} \) (\(\alpha = 1.0\)). Figure 2b compares the node classification accuracy of JANE and JANE-NU with the baselines averaged over 5 random train-test splits.

Implementation Details. We use Scikit-Learn’s [16] make_classification to generate these datasets. Classification algorithms do not have access to \(\alpha \) or \({\mathbf {U}} \). JANE is trained as a two-layer neural network for a maximum of \(T=200\) epochs with dropout of 0.2 for each layer, weight decay of \(5\times 10^{-2}\), and a learning rate of 0.005 using Adam. We set the number of eigenvectors \(k=2\) and choose a scaling factor \(s ^2=0.01\).
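For reference, a simplified sketch of this construction is shown below (our own simplification: it generates four informative features with make_classification, splits them into observed and latent halves, and samples \({\mathbf {A}}\) from the latent half via Eq. 1; the \(\alpha\)-mixing of \({\mathbf {X}}\) and \({\mathbf {U}}\) is handled in full only in the released code).

```python
import numpy as np
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)

# n=200 nodes, M=4 classes; four informative features split into X (d=2) and U (k=2).
features, labels = make_classification(
    n_samples=200, n_features=4, n_informative=4, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, class_sep=2.0, random_state=0)
X, U = features[:, :2], features[:, 2:]

# Sample the adjacency matrix from U via Eq. 1 with s^2 = 1.
sq_dists = ((U[:, None, :] - U[None, :, :]) ** 2).sum(-1)
P = np.exp(-sq_dists)
A = (rng.random(P.shape) < P).astype(int)
A = np.triu(A, k=1)
A = A + A.T
```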

Performance. Figure 2b shows performance as a function of training set size.

  • \(\alpha = 0.0\): LP and DW infer that labels derive from \({\mathbf {A}} \) (indirectly). GCN smooths attribute values of nodes in the same cluster but is not perfectly accurate because \({\mathbf {X}} \) does not correlate with \({\mathbf {Y}} \). LANE forces the proximity representation to be similar to the attribute representation and then smooths it using the labels; it does not perform well since the attributes do not correlate with the labels.

  • \(\alpha = 0.5\): LP and DW correctly classify nodes belonging to 2 of the 4 classes, i.e., precisely those nodes whose labels are influenced by \({\mathbf {U}} \). Conversely, LANE correctly classifies the nodes belonging to the two classes that correlate with \({\mathbf {X}} \). GCN smooths attribute values of adjacent nodes and thus can correctly infer labels correlated with \({\mathbf {X}} \).

  • \(\alpha = 1.0\): LP and DW reduce to random classifiers since adjacent nodes do not have similar labels. GCN reduces to a nearly random classifier because, by forcing adjacent nodes with different attribute values to become similar, it destroys the correlation between \({\mathbf {X}} \) and the labels.

In all cases, JANE-NU and JANE achieve perfect accuracy, as they flexibly learn whether labels are predicted from \({\mathbf {X}} \), \({\mathbf {A}} \) (indirectly), or both. While these datasets are simple, they demonstrate the importance of this flexibility.

4.2 Node Classification on Real-World Data

Datasets. We show our results on four tasks for a total of 9 networks (see Table 1 for basic statistics). If the original network is disconnected, we extract the node attributes and labels belonging to its largest connected component.

  • Citation Networks: We use Cora, Citeseer, Pubmed [21], and UAI2010 [24]. Here, nodes represent academic papers, edges denote a citation between two nodes, node attributes are 0/1-valued sparse bag-of-words vectors and class labels denote the subfield of research that the papers belong to.

  • Social Networks: We focus on BlogCatalog and Flickr where the task is to predict pre-defined categories of blogs and images, respectively. Nodes are users that post content, edges represent follower relationships, and attributes are specified by a list of tags reflecting the interests of the users [11].

  • Air-traffic Networks: Based on flight records from Brazil, Europe, and the USA, each node is an airport and an edge indicates that a commercial airline route exists between them. Labels denote the level of activity in terms of people and flights passing through an airport [19]. Since the nodes have no attributes, we assign each node an all-ones dummy vector as its sole attribute.

  • Biological Networks: We use a processed protein-protein interaction (PPI) dataset [5] where the task is to identify protein roles based on gene ontology sets using positional gene sets, motif gene sets, and immunological signatures as attributes [22].

Table 1. Summary of dataset statistics.

Experimental Setup. For the citation datasets, we use the same train-validation-test splits as in Yang et al. [27], minus the nodes which do not belong to the largest connected component. These comprise 20 samples for each class and represent 5% of the entire dataset. We use 500 additional samples as a validation set for hyperparameter optimization, as per Kipf et al. [10], to enable fair comparison. For all other tasks, we use 10% and 20% of the dataset for training and validation, respectively. We evaluate the performance of all methods on the remaining nodes. The values of the hyperparameters k, the number of eigenvectors of the Laplacian, and the scaling factor \(s \) are determined so as to minimize the negative log-likelihood (details omitted).

Table 2. Classification accuracy (%) on test data, averaged over 10 independent runs of \(T=200\) epochs each. Bold denotes the best average accuracy, or an accuracy range overlapping with that of the best (within \(\pm 0.3\) standard deviations). JANE-NU and JANE perform as well as or better than LANE, GCN, and GraphSAGE on most datasets, and consistently well across datasets.
Fig. 3.

Average training time per epoch (seconds) of baselines on different datasets.

Performance Analysis. Table 2 provides the average accuracy of each method over 10 independent runs of \(T=200\) epochs each for GCN, GraphSAGE, LANE, and JANE. Values in bold denote that the method either performs best or its accuracy range overlaps with that of the best. JANE-NU and JANE consistently outperform LP and DW on all datasets by significant margins. JANE-R, on the other hand, has significantly lower performance, either because it gets stuck in a local optimum or because it requires much longer training time. This shows that the choice of the initial estimate is crucial to performance. The embedding obtained from DW is unsupervised, and thus its predictive power is limited in comparison to our \(\hat{{\mathbf {U}}} \), which is label-informed. Both variants of JANE are competitive with GCN and GraphSAGE on Cora, Citeseer, and UAI2010. For instance, GraphSAGE achieves 79.70% accuracy on Cora, while JANE gets 79.24%, which is within the margin of error. We observe strong performance by JANE-NU and JANE on Pubmed and PPI. This may be explained in part by the observation of Li et al. [12] that Pubmed exhibits strong manifold structure, as well as by JANE’s ability to better utilize the adjacency information. Lastly, we find significant gains on the social and air-traffic datasets. The authors of LANE [8] report strong performance on BlogCatalog and Flickr, but its performance is noticeably poorer on the citation, air-traffic, and biological datasets.

Runtime. Figure 3 plots the average time for a single training epoch for JANE-NU, JANE, GCN, and GraphSAGE on various datasets. JANE is comparatively the slowest model, while JANE-NU is by far the fastest. This is because JANE-NU is a vanilla neural network model that does not update its estimate of \(\hat{{\mathbf {U}}} \). Further, a single Laplacian eigenvector can be approximately computed using the Lanczos algorithm in \(\tilde{\mathcal {O}}\left( |\mathcal {E}|\right) \) (up to log factors). Thus k eigenvectors can be computed in \(\tilde{\mathcal {O}}\left( k \cdot |\mathcal {E}|\right) \), and this is a one-time operation. Note that we do not need to perform a full eigendecomposition.
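As an illustration of this one-time preprocessing step, the first k eigenvectors can be obtained with SciPy's Lanczos-based sparse solver as follows (a sketch; the helper name is ours, and shift-invert variants may be faster in practice).

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def first_k_eigenvectors(A, k):
    """Eigenvectors of L = D - A for the k smallest eigenvalues (no full decomposition)."""
    A = sp.csr_matrix(A, dtype=float)
    degrees = np.asarray(A.sum(axis=1)).ravel()
    L = sp.diags(degrees) - A
    _, vecs = eigsh(L, k=k, which='SM')  # smallest-magnitude eigenvalues via Lanczos
    return vecs
```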

Parameter Sensitivity. A crucial parameter of JANE is the number of latent attributes, i.e., the dimension of \({\mathbf {U}} \), used during training. Figure 4 demonstrates its impact on performance. In each case, JANE is allowed a maximum of 200 training epochs. We find that test accuracy consistently increases as the dimension of \(\hat{{\mathbf {U}}} \) increases, up until a certain threshold. Increasing it beyond this threshold introduces noise and reduces performance; too many latent attributes yield at best marginal, diminishing returns while increasing the runtime.

Fig. 4.

JANE’s classification accuracy (%) on test data, averaged over 10 runs, as a function of the number of eigenvectors.

Limitations. The primary limitation of JANE is the time and memory required to compute the gradient of the likelihood term for \({\mathbf {A}} \) with respect to \(\hat{{\mathbf {U}}} \). These requirements grow linearly in network size. Since this gradient is computed in every training epoch, it may not be viable to fit the computation into GPU memory. Future work can outline procedures for mini-batch computation. Note, however, that JANE-NU is orders of magnitude faster than the other methods while also demonstrating strong performance.

5 Conclusion

We demonstrated an approach to node classification that flexibly adapts to settings where proximity and individual attributes have varying importance in predicting labels. Given its simplicity, interpretability, and performance, JANE can serve as a useful starting point in designing models that holistically account for different sources of node labels. As a future direction, we aim to evaluate the design and performance of advanced graph neural networks that go beyond requiring or enforcing homophily.