1 Introduction

Graphs have found extensive applications across various research fields, including social network analysis [12], bioinformatics [13], and recommendation systems [11]. In social media mining, graphs are crucial for understanding user interactions, sentiment analysis, and community detection. For example, consider a scenario where we aim to classify users’ sentiments towards a particular product or event on a social media platform. The graph can represent users as nodes and their connections as edges, capturing their relationships and interactions. By analyzing the structural properties of the graph, such as user connections, and incorporating node attributes like past sentiments or textual content, node classification algorithms can assign sentiment labels to new, unlabeled users. However, obtaining labeled data for node classification can take considerable time and effort in real-world scenarios. Few-shot learning, a sub-field of machine learning, attempts to address this issue by building a model from just a few examples. Few-shot learning has gained significant interest lately because of its capability to learn swiftly from a restricted amount of labeled data.

In recent years, meta-learning, also known as learning to learn, has emerged as a powerful technique for few-shot learning. Meta-learning involves training a model on a variety of tasks to learn a set of shared parameters that can be quickly adapted to new tasks with limited labeled data. In the context of graph node classification, meta-learning [5] has been used to train models that can quickly adapt to new graphs with a few labeled examples.

While meta-learning has demonstrated promising results in the field of few-shot node classification [14], most of the existing works have focused on the transductive setting, where the graph neural network (GNN) encoder is trained and evaluated on the same graph. The inductive setting, where the model is trained on a set of graphs and tested on a new, unseen graph, has received less attention in the few-shot learning community. Also, due to the message passing mechanism, where nodes exchange information with their neighboring nodes to update their own representations in GNNs, the inductive setting poses additional challenges compared to the transductive setting. Consider the example of sentiment analysis described before. In an inductive setting, we encounter new social media platforms or events where we need to classify user sentiment without access to the entire graph used during training. This reflects the reality of dealing with evolving social media platforms and ever-changing user dynamics.

Inductive few-shot learning allows us to train a model on a diverse set of graphs and test its performance on unseen graphs, mimicking the real-world scenario where we encounter novel contexts. This emphasizes the importance of studying and developing effective few-shot learning approaches in the inductive setting, enabling models to adapt and make accurate predictions in dynamic real-world environments. Therefore, this work aims to bridge this gap by providing a comprehensive study of meta-learning for few-shot node classification in the inductive setting. We empirically show that most current meta-learning frameworks cannot perform well in this setting. We propose to apply a straightforward yet effective baseline approach for inductive few-shot node classification tasks.

2 Related Work

In this section, we present a comprehensive review of the current literature concerning few-shot node classification and meta-learning, with a specific focus on the transductive setting.

2.1 Few-Shot Learning

Few-shot learning (FSL) is a machine learning paradigm that addresses the problem of limited data by capitalizing on knowledge gained from previous training data. Some examples of models that employ FSL are Model-Agnostic Meta-Learning (MAML), Prototypical Networks, and Meta-GNN.

MAML [2] tackles the few-shot learning problem by learning an optimal initialization of model parameters. It enables fast adaptation to new tasks with limited examples through a two-step process: an inner loop for task-specific updates and an outer loop for optimizing adaptation across tasks. By iteratively fine-tuning the parameters, MAML achieves effective generalization and enables efficient few-shot learning across various domains. Prototypical Networks [1] take a metric-based approach: they compute a prototype for each class from the support examples and classify instances by their distance to these prototypes, thereby capturing similarities and dissimilarities among instances. This approach enables accurate classification in few-shot scenarios and is applicable across various domains. Meta-GNN [3], in contrast, primarily addresses few-shot learning on graph-structured data. The model enhances the capability of GNNs to capture expressive node representations and to generalize effectively to new classes or tasks with limited labeled data.
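The metric-based idea behind Prototypical Networks can be illustrated with a minimal sketch. It assumes that embeddings are already available as plain lists of floats and that squared Euclidean distance is used; the function names and the toy values are hypothetical and are not taken from [1].

```python
from collections import defaultdict


def class_prototypes(support):
    """Average the embeddings of the support examples of each class."""
    sums = {}
    counts = defaultdict(int)
    for embedding, label in support:
        if label not in sums:
            sums[label] = [0.0] * len(embedding)
        sums[label] = [s + x for s, x in zip(sums[label], embedding)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query_embedding, prototypes):
    """Assign the label of the nearest class prototype."""
    return min(prototypes,
               key=lambda label: squared_distance(query_embedding, prototypes[label]))


# Toy 2-way 2-shot episode with made-up 2-d embeddings (illustrative values only).
support = [([0.0, 0.0], "a"), ([0.0, 2.0], "a"),
           ([4.0, 4.0], "b"), ([6.0, 4.0], "b")]
prototypes = class_prototypes(support)
prediction = classify([1.0, 1.5], prototypes)
```

In a full model the embeddings would come from a learned encoder; only the prototype and distance steps are shown here.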

2.2 Meta Learning

In the context of few-shot node classification, meta-learning algorithms have been proposed to learn effective representations and update strategies for handling new, unseen classes with only a few labeled examples. Popular meta-learning algorithms for few-shot node classification include GPN and G-Meta.

Graph Prototypical Network (GPN) [5, 18] introduces graph prototypes, learned through iterative aggregation with GNNs, as representative embeddings from the support set. By utilizing these prototypes, GPN achieves accurate few-shot classification by computing similarity scores between query nodes and prototypes. GPN’s incorporation of graph-level information and iterative aggregation enables effective generalization and robust few-shot classification on graph-structured data. G-Meta [4] combines subgraph extraction with GNNs to learn expressive node representations. It employs the MAML strategy to iteratively update and meta-update GNN parameters. This enables efficient adaptation to new tasks and improved classification on query nodes. Other models like AMM-GNN extend MAML with an attribute matching mechanism, and TENT reduces the variance among different meta-tasks for better generalization performance. Existing works primarily focus on transductive few-shot node classification, neglecting the widely studied inductive setting. We empirically evaluate meta-learning frameworks in the inductive setting to gain deeper insights into their performance on graphs.
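For reference, the MAML-style update and meta-update mentioned above can be written in their standard form. The step sizes \(\alpha\) and \(\beta\), the per-task loss \(L_{T_i}\), and the adapted parameters \(\theta_i'\) are notation introduced here for illustration; the exact variant used by each method may differ.

$$\begin{aligned}&\theta_i' = \theta - \alpha \nabla_\theta L_{T_i}(\theta) \quad \text{(inner loop, on the support set of task } T_i\text{)}, \\&\theta \leftarrow \theta - \beta \nabla_\theta \sum_{i} L_{T_i}(\theta_i') \quad \text{(outer loop, on the query sets)}. \end{aligned}$$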

3 Preliminaries

3.1 Problem Statement

The problem of few-shot node classification is concerned with attributed networks represented as \(G = (\mathcal {V}, \mathcal {E},X) = (A, X)\), where \(\mathcal {V}\) is the set of nodes \(v_1, v_2, \ldots , v_n\), \(\mathcal {E}\) is the set of edges \(e_1, e_2, \ldots , e_m\), \(X = [x_1; x_2; \ldots ; x_n] \in \mathbb {R}^{n \times d}\) is the matrix of node features, and \(A \in \{0, 1\}^{n \times n}\) is the adjacency matrix representing the network structure. Each element in A is either 0 or 1, indicating the absence or presence of an edge between nodes. The task involves a series of node classification tasks \(T = \{T_i\}_{i=1}^{I}\), where \(T_i\) is a dataset for a particular task, and I is the number of such tasks. The classes of nodes available during training are referred to as base classes, while the classes during the target test phase are referred to as novel classes, and the intersection of the two sets is empty. Notably, under different settings, labels of nodes for training (i.e., \(C_{base}\)) may or may not be available during training. Conventionally, there are few labeled nodes for novel classes \(C_{novel}\) during the test phase.

Definition 1. Few-shot Node Classification (FSNC): Few-shot node classification refers to a problem in which an attributed graph \(G = (A,X)\) is given, with a label space C divided into two sets, \(C_{base}\) and \(C_{novel}\). The goal is to predict the labels of unlabeled nodes (query set Q) from \(C_{novel}\), given only a few labeled nodes (support set S) for \(C_{novel}\). If each task in the test set has N novel classes and K labeled nodes for each class, then this task is referred to as an N-way K-shot node classification problem.

Fig. 1. Transductive/Inductive Setting

Transductive Setting: In the transductive setting, the input graph is observed in all dataset splits, including the training, validation, and test sets (Fig. 1). The graph remains intact, and only the node labels are split for training and evaluation purposes. During training, embeddings are computed using the entire graph, and the model is trained using the labels of selected nodes (e.g., node 1 and node 2). During validation, embeddings are again computed using the entire graph, and the model’s performance is evaluated on the labels of other nodes (e.g., node 3 and node 4).

Inductive Setting: In the inductive setting, the graph is modified by breaking the edges between the dataset splits, resulting in different neighbor environments for nodes compared to the transductive setting (Fig. 1). For example, node 4 will no longer have an influence on the prediction of node 1. During training, embeddings are computed using the graph specific to the training split, such as the graph over node 1 and node 2. The model is trained using the labels of these selected nodes. During validation, embeddings are computed using the graph specific to the validation split, such as the graph over node 3 and node 4. The model’s performance is then evaluated on the labels of these respective nodes (node 3 and node 4). Breaking these edges changes the message passing, making it harder for GNNs to learn generalizable knowledge [13].
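The edge-breaking step can be sketched as follows. The four-node edge list is a hypothetical stand-in for the graph in Fig. 1 (the actual edges of the figure are not reproduced here), and the helper names are illustrative.

```python
def split_graph(edges, split_of):
    """Keep only edges whose two endpoints belong to the same dataset split."""
    per_split = {split: [] for split in set(split_of.values())}
    for u, v in edges:
        if split_of[u] == split_of[v]:
            per_split[split_of[u]].append((u, v))
    return per_split


def neighbors(node, edges):
    """Sorted list of nodes that share an edge with `node`."""
    return sorted({v if u == node else u for u, v in edges if node in (u, v)})


# Hypothetical toy graph: nodes 1 and 2 are training nodes, 3 and 4 validation nodes.
edges = [(1, 2), (1, 4), (2, 3), (3, 4)]
split_of = {1: "train", 2: "train", 3: "val", 4: "val"}

# Transductive: message passing for node 1 sees every neighbor in the full graph.
transductive_neighbors = neighbors(1, edges)

# Inductive: cross-split edges are removed, so node 4 no longer reaches node 1.
inductive = split_graph(edges, split_of)
inductive_neighbors = neighbors(1, inductive["train"])
```

Under this toy split, node 1 keeps only node 2 as a neighbor, which is the change in neighborhood that the inductive setting introduces.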

3.2 Episodic Meta-Learning for FSNC

Episodic meta-learning has emerged as an effective paradigm for addressing few-shot learning tasks, garnering substantial attention [16, 17]. The underlying concept of episodic meta-learning involves training neural networks to mimic the evaluation conditions, which is believed to improve prediction performance on test tasks [16, 17]. This paradigm has been successfully extended to few-shot node classification in the graph domain, as demonstrated by recent works [5, 14, 18]. In the context of few-shot node classification, the training phase follows a specific procedure. Meta-train tasks or episodes, denoted as \(T_{tr}\), are generated from a base class set \(C_{base}\), to emulate the test tasks. These episodes adhere to N-way K-shot node classification specifications. Each episode, denoted as \(T_t\), comprises a support set \(S_t\), and a query set \(Q_t\), defined as follows:

$$\begin{aligned}&T_{tr} = \{T_t\}_{t=1}^{\mathcal {T}} = \{ T_1, T_2, \ldots , T_{\mathcal {T}}\}, \\&T_t = \{S_t, Q_t\},\\&S_t = \{(v_1, y_1), (v_2, y_2), \ldots , (v_{N \times K}, y_{N \times K})\}, \\&Q_t = \{(v_1, y_1), (v_2, y_2), \ldots , (v_{N \times K}, y_{N \times K})\}. \end{aligned}$$
(1)
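As an illustration of how such episodes might be generated, the sketch below samples one N-way K-shot episode from a node-to-label map. It assumes a fixed number of query nodes per class (`q_query`), which is one common convention and may differ from the sizes written in Eq. (1); the function name and the toy data are hypothetical.

```python
import random


def sample_episode(labels, classes, n_way, k_shot, q_query, rng):
    """Sample one N-way K-shot episode as (support, query) lists of (node, label)."""
    episode_classes = rng.sample(sorted(classes), n_way)
    support, query = [], []
    for c in episode_classes:
        nodes = [v for v, y in labels.items() if y == c]
        # Draw support and query nodes together so the two sets never overlap.
        picked = rng.sample(nodes, k_shot + q_query)
        support += [(v, c) for v in picked[:k_shot]]
        query += [(v, c) for v in picked[k_shot:]]
    return support, query


# Toy label map: 40 nodes spread evenly over 4 hypothetical base classes.
labels = {v: v % 4 for v in range(40)}
support, query = sample_episode(labels, {0, 1, 2, 3}, n_way=2, k_shot=3,
                                q_query=5, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sampled episodes reproducible across repeated experiments.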

In a typical meta-learning method, within each episode, K labeled nodes are randomly sampled from each of N base classes to form the support set. This support set is then used to train a GNN model, simulating the N-way K-shot node classification scenario during the test phase. Subsequently, the GNN predicts labels for a query set, which comprises nodes randomly sampled from the same classes as the support set. The optimization process involves minimizing the Cross-Entropy Loss (\(L_{CE}\)) w.r.t. the GNN encoder \(g_\theta \) and the classifier \(f_\phi \):

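$$\begin{aligned} \theta , \phi = \mathop {\arg \min }\limits _{\theta , \phi } L_{CE}(T_t; \theta , \phi ). \end{aligned}$$
(2)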

Several approaches have been proposed based on this framework, such as Meta-GNN [3], GPN [5], and G-Meta [4]. Nevertheless, these methods have predominantly been evaluated under transductive settings, leaving their performance in inductive settings unexplored.

3.3 Proposed Baseline

Our work is motivated by the Intransigent GNN model (I-GNN) introduced by a previous study [15, 19]. The I-GNN model proposes a straightforward approach for few-shot learning that relies on reusing features instead of using complex meta-learning algorithms to achieve fast adaptation. The authors show that the I-GNN model, despite its simplicity, can achieve competitive performance compared to meta-learning based approaches. In our study, we adapt the I-GNN model to the inductive setting and propose a simple yet effective baseline for inductive few-shot node classification tasks.

The I-GNN model is designed to be inflexible and unadaptable to new tasks. The training process of I-GNN is split into two phases. In the first phase, a GNN encoder (\(g_\theta \)) and a linear classifier (\(f_\phi \)) are pre-trained on all base classes (\(C_{base}\)) using vanilla supervision through the \(L_{CE}\) loss function. A weight-decay regularization term is also applied during this phase. In the second phase, the parameters of the GNN encoder are frozen, and the classifier is discarded. When fine-tuning on a target few-shot node classification task, the pretrained GNN encoder is used to directly produce embeddings of all nodes in the task, and a new linear classifier (\(f_\psi \)) is introduced and tuned with the few labeled nodes from the support set (\(S_i\)) to predict labels of nodes in the query set (\(Q_i\)).

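$$\begin{aligned} \theta , \phi = \mathop {\arg \min }\limits _{\theta , \phi } L_{CE}(C_{base}; \theta , \phi ) + \mathcal {R}(\theta ), \end{aligned}$$
(3)
$$\begin{aligned} \psi = \mathop {\arg \min }\limits _{\psi } L_{CE}(S_i; \theta , \psi ). \end{aligned}$$
(4)

Here \(\mathcal {R}(\theta )\) denotes the weight-decay regularization term, and \(\theta \) is kept frozen in Eq. (4).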
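The second phase amounts to fitting a linear classifier on fixed embeddings. The sketch below shows that step only, in plain Python, under several assumptions: the frozen encoder's outputs are replaced by made-up 2-d vectors, the classifier is a softmax layer trained with per-sample gradient descent, and the learning rate and step count are arbitrary. Pre-training of the encoder (the first phase) is not shown, and all names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_linear_probe(embeddings, labels, n_classes, lr=0.5, steps=200):
    """Train only a linear classifier (W, b) on frozen embeddings."""
    dim = len(embeddings[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(steps):
        for x, y in zip(embeddings, labels):
            probs = softmax([sum(w * xi for w, xi in zip(W[c], x)) + b[c]
                             for c in range(n_classes)])
            for c in range(n_classes):
                # Gradient of the cross-entropy loss w.r.t. the logit of class c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                W[c] = [w - lr * grad * xi for w, xi in zip(W[c], x)]
                b[c] -= lr * grad
    return W, b


def predict(W, b, x):
    """Index of the class with the highest logit."""
    scores = [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(len(W))]
    return max(range(len(W)), key=lambda c: scores[c])


# Stand-in for frozen-encoder outputs on a 2-way 2-shot task (made-up values).
support_emb = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
support_lab = [0, 0, 1, 1]
W, b = fit_linear_probe(support_emb, support_lab, n_classes=2)
query_pred = [predict(W, b, q) for q in [[0.8, 0.2], [0.2, 0.9]]]
```

Because the encoder is never updated in this phase, the embeddings can be computed once per task and reused for every gradient step of the classifier.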

4 Empirical Evaluation

4.1 Experimental Settings

In this research study, various methods for few-shot node classification are evaluated through systematic experiments under the inductive setting. These methods include ProtoNet [1], MAML [2], Meta-GNN [3], G-Meta [4], GPN [5], AMM-GNN [6], and TENT [7]. The performance of these methods is compared on five real-world graph datasets: CoraFull [8], Coauthor-CS [9], Amazon-Computer [9], Cora [10], and CiteSeer [10].

Table 1. Statistics of Benchmark Datasets

CoraFull, Coauthor-CS, Amazon-Computer, Cora, and CiteSeer are five prevalent real-world graph datasets, each consisting of multiple node classes for training and evaluation. These datasets include citation networks, co-authorship graphs, and co-purchase graphs, and the task is to predict the category of a certain publication or paper. The number of node classes used for training, development, and testing varies depending on the dataset. Table 1 describes the statistics of the datasets.

4.2 Evaluation Protocol

This section outlines the evaluation protocol used to compare the meta-learning methods. The node label space C of a graph dataset \(G = (A,X)\) is divided into \(C_{base}\) and \(C_{novel}\) (also denoted \(C_{test}\)). \(C_{base}\) is further split into \(C_{train}\) and \(C_{dev}\) (the division strategy for each dataset is given in Table 1). Evaluation takes as input a GNN encoder g, a classifier f, an epoch interval EI for validation, the number S of sampled meta-tasks for evaluation, the epoch patience E, the maximum epoch number M, the number T of repeated experiments, and the N-way K-shot, Q-query setting specification. Algorithm 1 calculates the final FSNC accuracy \(\mathcal {A}\) and confidence interval \(\mathcal{C}\mathcal{I}\). The default values of the parameters are \(EI = 10; S=100; E=10; M=10000; T=5; N=\{2,5\}; K=\{1,3,5\}; Q=10\).
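Two ingredients of such a protocol, validation with patience-based early stopping and the aggregation of accuracies into a mean and a 95% confidence interval, can be sketched as below. This is an illustration under stated assumptions rather than a transcription of Algorithm 1: patience is counted in validation rounds, the interval uses a normal approximation (1.96 standard errors), and `train_epoch` and `validate` are hypothetical callbacks.

```python
import math
import statistics


def run_with_early_stopping(train_epoch, validate, eval_interval=10,
                            patience=10, max_epochs=10000):
    """Validate every `eval_interval` epochs; stop after `patience`
    consecutive validation rounds without improvement."""
    best, bad_rounds = float("-inf"), 0
    for epoch in range(1, max_epochs + 1):
        train_epoch()
        if epoch % eval_interval:
            continue  # not a validation epoch
        score = validate()
        if score > best:
            best, bad_rounds = score, 0
        else:
            bad_rounds += 1
            if bad_rounds >= patience:
                break
    return best, epoch


def mean_and_ci95(accuracies):
    """Mean accuracy and half-width of a normal-approximation 95% interval."""
    mean = statistics.mean(accuracies)
    half_width = 1.96 * statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    return mean, half_width


# Toy run: validation scores are scripted instead of coming from a real model.
scores = iter([0.5, 0.6, 0.55, 0.58, 0.9])
best, stopped_at = run_with_early_stopping(lambda: None, lambda: next(scores),
                                           eval_interval=1, patience=2, max_epochs=5)
mean_acc, ci = mean_and_ci95([0.6, 0.7, 0.8])
```

In a real run, `validate` would average the accuracy over the S sampled meta-tasks, and `mean_and_ci95` would be applied to the accuracies collected over the T repetitions.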

Algorithm 1.

4.3 Comparison

In Table 2, the performance of different meta-learning methods and the proposed baseline is compared for few-shot node classification tasks. The comparison includes four distinct few-shot settings: 5-way 1-shot, 5-way 5-shot, 2-way 1-shot, and 2-way 5-shot, allowing for a comprehensive analysis. The evaluation metrics used are the average classification accuracy and the 95% confidence interval, which are computed over multiple repetitions (T). Figure 2 presents the performance results on the CiteSeer dataset (similar trends are observed on other datasets) for various N-way K-shot settings. The observations derived from the results are as follows:

Table 2. Few-shot node classification results of meta-learning methods and I-GNN. Accuracy (\(\uparrow \)) and Confidence Interval (\(\downarrow \)) are in %. The best and second best results are bold and underlined, respectively.

  • In the inductive setting, except for MAML and ProtoNet, meta-learning models exhibit a significant performance drop compared to the transductive setting. This decline is attributed to the challenges of generalizing knowledge from limited labeled examples to unseen data. In the transductive setting, models access the entire graph for predictions, while in the inductive setting, they must generalize to new nodes or graphs. Limited labeled data and the need for generalization contribute to lower performance in the inductive setting.

  • I-GNN shows superior performance in the inductive setting compared to the transductive setting for certain datasets like Cora, CiteSeer, and CoraFull. This may be due to its ability to capture more transferable node embeddings in the inductive setting.

  • The scores for both MAML and ProtoNet remain the same across the two settings on all datasets because they do not use a message-passing GNN. Since they do not leverage the graph structure and operate on a per-node basis, the performance drop observed in other meta-learning models under the inductive setting does not affect them in the same way. Therefore, their performance remains consistent between the transductive and inductive settings.

  • The I-GNN model outperforms the meta-learning-based methods under the inductive setting, particularly on datasets like Cora, CiteSeer, and CoraFull, while demonstrating competitive performance on other datasets. This can be attributed to the fact that meta-learning methods typically require a large number of samples to learn effectively.

Fig. 2. Meta-Learning, I-GNN with inductive and transductive (*)

4.4 Further Analysis

To make a direct comparison between the results of meta-learning methods and I-GNN, we present additional findings in Fig. 3 and Fig. 4, which showcase the performance of all methods across different N-way K-shot settings. By analyzing these results, we can draw the following conclusions.

Fig. 3. N-way K-shot results of CoraFull, Meta-Learning and I-GNN.

Fig. 4. N-way K-shot results of Cora and CiteSeer, Meta-Learning and I-GNN.

  • As N increases, the performance of all methods deteriorates due to the greater variety of classes within each meta-task. This increased complexity poses challenges for classification tasks, resulting in lower performance. Figure 3 demonstrates the impact of increasing N on the classification performance using the CoraFull dataset.

  • The performance improvement of I-GNN over meta-learning methods is notable on the Cora dataset, as shown in Fig. 4. Because Cora has a smaller number of classes, I-GNN can leverage structural information for better generalization, whereas the meta-learning methods struggle to effectively utilize the available supervision information during training.

5 Conclusion

In this paper, we investigate the performance of meta-learning methods on inductive few-shot node classification tasks. While existing research has primarily focused on the transductive setting, the inductive setting has received limited attention in the few-shot learning community. To bridge this gap, we conduct a comprehensive study of meta-learning for inductive few-shot node classification. Our empirical analysis reveals that most current meta-learning frameworks struggle in the inductive setting. To address this challenge, we propose applying a competitive baseline model called I-GNN. Experimental evaluations on five real-world datasets showcase the effectiveness of our proposed model. Our findings emphasize the need for further research into the potential of meta-learning in the inductive setting, contributing to a more comprehensive understanding of few-shot node classification.