1 Introduction

Graphs have a great capacity to model relationships among entities and have been successfully applied in many fields, such as social networks [10], financial analysis [15], and biological networks [24]. Motivated by the extraordinary performance of deep learning, many researchers are attempting to extend neural network models to graphs. These models, also known as graph network embedding, have emerged as a prominent approach for graph analysis. The key idea of graph network embedding is to map node representations into a low-dimensional latent space that preserves the similarity of nodes based on their local structure. Graph network embedding algorithms have been used for a variety of applications, including node classification, link prediction, and network visualization [1, 3, 5, 7, 22, 23, 27].

Although existing graph network embedding methods provide excellent performance, they are primarily developed for static graphs whose nodes and edges remain unchanged over time. In the real world, however, most networks behave as dynamic graphs. For example, new communication events such as emails and text messages are streamed on social networks as new friendships form; in e-commerce networks, new goods and ratings arise daily; in financial networks, transactions are streamed continuously and supply chain relationships keep changing. In these dynamic graphs, nodes and edges are constantly evolving. The evolution of a dynamic graph can be recorded as a temporal sequence of graph snapshots. Compared with static graphs, dynamic graphs have an additional dimension (i.e., the time dimension) that introduces temporal dynamics. Consequently, dynamic graph embedding is proposed to address the major challenge of dynamic graphs: adequately capturing these temporal dynamics.

Recently, several efforts, such as DynamicTriad [28], DynGEM [6], and TIMERS [26], use smoothness regularization to capture temporal dynamics. The premise behind these strategies is that dynamic graphs change slowly, so they are unable to handle dynamic graphs with abrupt changes. More recently, with the remarkable success of graph convolutional networks (GCNs), some researchers have focused on extending GNNs to dynamic graphs by combining GCN with RNN components (e.g., LSTM or GRU), such as WD-GCN [14], EvolveGCN [17], and GAEN [20]. However, these GCN methods are designed for simple graphs that only represent pair-wise relationships among nodes. Thus, they can only handle each simple graph snapshot of a dynamic graph independently, forcing them to resort to RNN components for the temporal dimension. Such mixed architectures may break the internal link between topological information and temporal dynamics. Additionally, these methods focus only on capturing information from nodes and ignore edge information, which is also an essential component of graph information.

To tackle the above issues, we propose a novel dynamic graph embedding framework called DynHyper. First, we design a temporal hypergraph to model a dynamic graph, which captures both local structure and temporal dynamics. Compared with simple graphs, hypergraphs can describe multiple relationships among nodes, and we therefore construct temporal hypergraphs to represent node correlations covering both local structure and temporal dynamics. Specifically, the main difference between simple graphs and hypergraphs is that hypergraphs contain hyperedges, which can connect an arbitrary number of nodes. Hence, we employ a hyperedge to connect nodes in the same time step, which encodes the local structure information. Interactions between distinct hyperedges then indicate temporal interactions between nodes, allowing our model to capture temporal dynamics. In addition, since edge information is an indispensable part of a graph, we introduce a hyperedge projection for temporal hypergraphs to capture their edge-level correlations. The hyperedge projection converts hyperedges to nodes, which preserves the edge-level relationships of temporal hypergraphs and integrates naturally with message-passing schemes for nodes. Finally, we propose the temporal edge-aware hypergraph convolution to aggregate and transmit messages on the temporal hypergraph and update the node embeddings. DynHyper's effectiveness is demonstrated by experimental results on seven real-world datasets in link prediction and node classification tasks.

In a nutshell, our key contributions can be summarized as follows:

  • We introduce a temporal hypergraph construction that simultaneously captures local structure information and temporal dynamics for dynamic graphs, and a hyperedge projection that obtains edge-level relationships for temporal hypergraphs.

  • We propose a temporal edge-aware hypergraph convolutional network that can execute message passing in dynamic graphs autonomously and effectively without the need for RNN components.

  • We conduct experiments on seven real-world datasets in link prediction and node classification tasks to evaluate the effectiveness of DynHyper. Our findings show superior predictive performance compared to state-of-the-art dynamic graph embedding methods.

2 Related Work

Dynamic graph embedding plays a crucial role in network analysis and aids the advancement of many real-world applications such as social recommendation and protein structure prediction. Roughly, we classify existing methods into three streams: random walk methods, autoencoder-based methods, and GNN-based methods.

Random walk methods apply random walks to generate node sequences and incrementally update the node embeddings affected by temporal evolution [9, 13, 25]. For instance, dynnode2vec [13] employs evolving random walks that generate walks only for changed nodes and proposes a dynamic skip-gram model in which the previous embedding is used to initialize the weights for the next graph snapshot. Autoencoder-based methods fall into two groups. One group seeks to minimize the reconstruction loss of a given graph snapshot; for example, DynGEM [6] proposes an incremental fully-connected network that shares parameters between two consecutive networks to capture temporal evolution. The other group aims to minimize the reconstruction loss between previous graph snapshots and the future graph snapshot; for instance, dyngraph2vecAE [4] introduces an autoencoder with a reconstruction loss between the adjacency matrix predicted from the previous graph embeddings and the adjacency matrix at the next time step.

Recently, a popular way to cope with dynamic graph embedding is to combine GNN models with temporal components (e.g., LSTM). GCRN-M1 [19] first employs graph convolution to obtain node embeddings and then feeds them into an RNN to capture temporal dynamics. The distinction between WD-GCN [14] and GCRN-M1 [19] is that WD-GCN utilizes a separate LSTM component per node. EvolveGCN [17] uses an RNN to evolve the parameters of the GNN model, significantly reducing the model size (i.e., the number of model parameters). DySAT [18] employs a self-attention mechanism with a GNN model to jointly learn representations along the dimensions of both local structure and temporal dynamics. GAEN [20] utilizes tensor factorization to obtain temporal pattern similarities of nodes and incorporates them into a graph attention network to capture temporal dynamics.

3 Preliminaries

Notations. A dynamic graph network is defined as a series of static graph snapshots collected at each time step t, i.e., \(\mathbb {G} =\{G^{1},G^{2},\ldots ,G^{T} \}\), where T denotes the total number of time steps. Each graph snapshot \(G^t=(V^t,E^t)\) is a weighted undirected graph consisting of a node set \(V^t\), an edge set \(E^t\), and a weighted adjacency matrix \(A^t\) at time step t.

Problem Formulation. In this subsection, we formally define the problem of dynamic graph embedding. Given a dynamic graph \(\mathbb {G}\), dynamic graph embedding aims to learn mappings \(f^{t}: \{G^{1},G^{2},\ldots ,G^{t}\} \longrightarrow {R}^{\left| V^{t}\right| \times d} \) to obtain the latent representation \(Z^{t}=f^{t}(G^{1},G^{2},\ldots ,G^{t})\), where \(Z^{t} \in {R}^{\left| V^{t}\right| \times d}\) and d denotes the embedding dimensionality. Each row vector \(Z_{v}^{t} \in {R}^{d}\) is the low-dimensional embedding of node v, which preserves local topological proximities and temporal evolutionary pattern information up to time step t.

4 Methodology

In this section, we present our proposed framework for dynamic graph embedding, as illustrated in Fig. 1. The framework includes three major parts. First, we introduce the temporal hypergraph to capture both local structure information and temporal dynamics for dynamic graphs. Then, we use a hyperedge projection to obtain edge-level relationships. Finally, we utilize the temporal edge-aware hypergraph convolution to aggregate information and pass it among nodes to update the node embeddings, as detailed in the following sections.

4.1 Temporal Hypergraph Construction

In this subsection, we discuss temporal hypergraph construction. Note that a dynamic graph contains a series of graph snapshots, and the major challenge for dynamic graph embedding is to capture the temporal evolution among these snapshots. Prior works mainly resort to RNNs or Transformers to capture temporal dynamics indirectly, which splits the internal connection between topological information and temporal dynamics. To address this issue, we aim to capture both temporal dynamics and topological information directly through the properties of the hypergraph.

Fig. 1. An overview of our proposed framework. Given snapshot graphs \(\{G^{1},G^{2},G^{3}\}\) as input, we first generate the temporal hypergraph based on time steps. Specifically, the temporal hypergraph contains all original nodes from the input snapshots together with hyperedges, where each hyperedge is composed of the nodes from the same time step, i.e., \(e_{1},e_{2},e_{3}\). The hyperedges of the temporal hypergraph are then transformed by a hyperedge projection. Finally, we apply the temporal edge-aware hypergraph convolution to aggregate information and pass it among nodes to update the node embeddings.

Consider a given dynamic graph \(\mathbb {G}=\{\mathbb {V},\mathbb {E}\}\), where \(\mathbb {V}=\{V^{1},V^{2},\ldots ,V^{t} \}\) denotes a series of node sets and \(\mathbb {E}=\{E^{1},E^{2},\ldots ,E^{t} \}\) denotes a series of edge sets, and assume that historical observations span time steps 1 to \(\tau \). First, since \({V^{1}} \subseteq {V^{2}}\subseteq \ldots \subseteq {V^{\tau }}\), the set \(V^{\tau }\) contains all nodes in graph snapshots up to time step \(\tau \), so we define \(V^{\tau }\) as the hypernode set of the temporal hypergraph. Second, we construct the hyperedges of the temporal hypergraph. More specifically, a hyperedge \(e \in E^{\tau } \) is formed by linking a centroid node and its first-order neighbors at the same time step, where \(E^{\tau }=\{e^{m}_{v_{i}} \mid m \in \{1,\ldots ,\tau \}, v_{i} \in V^{\tau } \}\) is the hyperedge set of the temporal hypergraph and m denotes a time step. For example, a hyperedge connecting \(v_{1}\), \(v_{2}\), and \(v_{3}\) can be denoted as \(e^{m}_{v_{1}}=\{v_1,v_2,v_3 \mid v_{2},v_{3} \in N (v_{1});\, v_{1},v_{2},v_{3} \in {G^m}\}\), where \(v_{1}\) is the centroid hypernode and \(N(v_{1})\) is the set of first-order neighbors of \(v_{1}\). Based on the discussion above, we define the temporal hypergraph as \({H^{\tau }}=({V^{\tau }}, E^{\tau }, W)\), where W denotes the hyperedge weight matrix, \(\left| V^{\tau }\right| \) is the number of hypernodes, and \(\left| E^{\tau }\right| \) is the number of hyperedges. For simplicity, we use |V| and |E| to represent \(\left| V^{\tau }\right| \) and \(\left| E^{\tau }\right| \), respectively. Formally, the temporal hypergraph can be represented by an incidence matrix \(H \in {R}^{\left| V\right| \times \left| E\right| }\):

$$\begin{aligned} H(v_{i}, e^{m}_{v_{j}})=\left\{ \begin{array}{l}1, \text{ if } v_{i} \in e^{m}_{v_{j}} \\ 0, \text{ otherwise } \end{array}\right. \end{aligned}$$
(1)
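For concreteness, the following is a minimal Python sketch of this construction; the snapshot format (per-time-step adjacency dictionaries) and the function name are illustrative assumptions rather than part of our implementation.

```python
import numpy as np

def build_incidence(snapshots, num_nodes):
    """Build the temporal hypergraph incidence matrix H of Eq. (1).

    `snapshots` is a list of adjacency dicts, one per time step,
    mapping each centroid node to its first-order neighbors at that
    step. One hyperedge e^m_{v_i} = {v_i} U N(v_i) is created per
    (time step m, centroid v_i) pair; H[v, e] = 1 iff v belongs to e.
    """
    hyperedges = []
    for adj in snapshots:  # time steps 1..tau
        for centroid, neighbors in adj.items():
            hyperedges.append({centroid, *neighbors})
    H = np.zeros((num_nodes, len(hyperedges)))
    for e_idx, members in enumerate(hyperedges):
        for v in members:
            H[v, e_idx] = 1.0
    return H

# Toy dynamic graph: 4 nodes, 2 snapshots -> 3 hyperedges.
H = build_incidence([{0: [1, 2]}, {0: [1], 2: [3]}], num_nodes=4)
print(H.shape)  # (4, 3)
```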

4.2 Hyperedge Projection

In this subsection, we further explore the edge-level correlations in hypergraphs. The temporal hypergraph is designed to capture the temporal dynamics of dynamic graphs, but it cannot well reflect their edge-level correlations. Thus, we introduce a hyperedge projection to extract edge-level correlations for dynamic graphs. The key idea of the hyperedge projection is to capture the edge-edge correlations of the temporal hypergraph; Fig. 2 shows an example. Specifically, the hypernodes connected by the same hyperedge are uniformly mapped into a single node of the new graph. The hyperedge projection can be formally written as follows:

$$\begin{aligned} P =D_{e}^{-1} H^{T} X \end{aligned}$$
(2)

where \(P \in R^{|E| \times M}\) is the hyperedge projection embedding of the original hypernode representation X, \(H^{T}\) is the transpose of the incidence matrix H, and \(D_e \in R^{|E|\times |E|}\) denotes the hyperedge degree matrix. These new nodes are then connected if they contain the same hypernode. For example, in Fig. 2, \(e_{5}\) connects to \(e_{6}\) by the green line labeled 5, denoting that they contain the same hypernode \(v_{5}\). In other words, two projected nodes are neighbors if they share at least one hypernode. In contrast to hyperedges, the edges of this new graph connect only two nodes. In effect, the hyperedge projection converts the temporal hypergraph into a simple graph that preserves the original edge-level correlations of the temporal hypergraph and can easily integrate message-passing schemes for nodes, as in the sketch below.
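A minimal sketch of the projection follows, under the dimension reading above (\(P \in R^{|E| \times M}\)); the helper names are illustrative assumptions.

```python
import numpy as np

def hyperedge_projection(H, X):
    """Eq. (2): P = D_e^{-1} H^T X, mapping each hyperedge to a node
    of the projected graph. H: (|V|, |E|) incidence matrix;
    X: (|V|, M) hypernode features; returns P of shape (|E|, M)."""
    d_e = H.sum(axis=0)            # hyperedge degrees delta(e)
    return (H.T @ X) / d_e[:, None]

def projected_adjacency(H):
    """Two projected nodes are neighbors iff their hyperedges share
    at least one hypernode (the colored links in Fig. 2)."""
    A = (H.T @ H) > 0              # (|E|, |E|) hypernode overlaps
    np.fill_diagonal(A, False)     # remove self-loops
    return A.astype(float)
```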

Fig. 2. An example of a hyperedge projection.

4.3 Temporal Edge-Aware Hypergraph Convolution

In this section, we introduce the details of the message passing process via temporal edge-aware hypergraph convolution. In our work, if nodes have no initial features, each node v is initialized with a one-hot vector \(x_{v} \in R^{M} \), where M is the number of nodes in \(G^{\tau }\). Then, an update operation for each node v is conducted on the temporal hypergraph, consisting of intra-hyperedge aggregation and inter-hyperedge aggregation. The intra-hyperedge aggregation can be formulated as:

$$\begin{aligned} z_{e}=\sum _{v \in e} \frac{x_{v}}{\delta (e)} \end{aligned}$$
(3)

where \(z_{e}\) is the hyperedge representation produced by intra-hyperedge aggregation, \(x_{v}\) is the initial representation of node v, and \(\delta (e) = |e|\) denotes the degree of the hyperedge. Afterward, the inter-hyperedge aggregation can be expressed as:

$$\begin{aligned} z_{v}=\sum _{e \in S} \frac{z_{e}}{d(v)} \end{aligned}$$
(4)

where \(z_v\) is the embedding of node v produced by inter-hyperedge aggregation, S is the set of hyperedges containing node v, and d(v) denotes the number of hyperedges containing node v. These two steps can be merged and rewritten in matrix form as:

$$\begin{aligned} Z=D_{v}^{-1} H D_{e}^{-1} H^{T} X \end{aligned}$$
(5)

where \(Z \in R^{|V| \times M}\) is the embedding of the hypernodes, \(H \in R^{|V|\times |E|}\) denotes the incidence matrix, \(X \in R^{|V|\times M}\) is the original hypernode representation, and \(D_v \in R^{|V|\times |V|}\) denotes the hypernode degree matrix. We observe that this update operation is equivalent to the simplified hypergraph convolution [2]. We further extend this operation to capture the edge-level correlations in the temporal hypergraph. Following the hyperedge projection introduced in Sect. 4.2, we replace the original hypernode representation X with P in the convolution rule as follows:

$$\begin{aligned} Z_{\text{ edge } } = D_{e}^{-1} H^{T} D_{v}^{-1} H P \end{aligned}$$
(6)

where \(Z_{edge} \in R^{|E|\times M}\) is the edge-level embedding. After capturing the edge-level correlations of the temporal hypergraph, we remap the edge-level embedding into a node-level embedding assigned to each node in the dynamic graph:

$$\begin{aligned} Z_{node}=H Z_{edge} \end{aligned}$$
(7)

where \(Z_{node} \in R^{|V|\times M}\) is the node-level embedding of the hypernodes. Then, we apply the renormalization trick introduced by [11] and employ a learnable weight matrix. The complete temporal hypergraph propagation rule can be written as follows:

$$\begin{aligned} \begin{aligned} \tilde{Z}&=\sigma \left( D_{v}^{-1 / 2} Z_{node} D_{v}^{-1 / 2} \varTheta \right) \\&=\sigma \left( D_{v}^{-1 / 2} H D_{e}^{-1} H^{T} D_{v}^{-1} H P D_{v}^{-1 / 2} \varTheta \right) \end{aligned} \end{aligned}$$
(8)

where \(\tilde{Z} \in R^{|V| \times d}\) is the final embedding of the hypernodes, \({\varTheta } \in R^{M \times d}\) denotes the model parameter matrix (note that \(M=|V|\) for one-hot features), d denotes the embedding dimensionality, and \(\sigma (\cdot )\) denotes the activation function (e.g., ELU).
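Putting Eqs. (2) and (6)-(8) together, one layer of the propagation rule can be sketched in a few lines of numpy. This is a simplified illustration (dense matrices, one-hot features so that M = |V|, and a fixed ELU activation), not our TensorFlow implementation.

```python
import numpy as np

def elu(x):
    return np.where(x > 0, x, np.exp(x) - 1.0)

def temporal_edge_aware_conv(H, X, Theta):
    """One temporal edge-aware hypergraph convolution layer.
    H: (|V|, |E|) incidence matrix; X: (|V|, M) one-hot hypernode
    features (M = |V|); Theta: (M, d) learnable weights."""
    Dv_inv = np.diag(1.0 / H.sum(axis=1))        # hypernode degrees d(v)
    De_inv = np.diag(1.0 / H.sum(axis=0))        # hyperedge degrees delta(e)
    Dv_half = np.diag(1.0 / np.sqrt(H.sum(axis=1)))

    P = De_inv @ H.T @ X                         # hyperedge projection, Eq. (2)
    Z_edge = De_inv @ H.T @ Dv_inv @ H @ P       # edge-level embedding, Eq. (6)
    Z_node = H @ Z_edge                          # back to node level, Eq. (7)
    return elu(Dv_half @ Z_node @ Dv_half @ Theta)  # renormalized update, Eq. (8)
```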

Table 1. The statistics of datasets

4.4 Loss Function

In this subsection, we introduce the objective function that enables node representations to capture dynamic topological evolution during training. Inspired by DySAT [18], our model encourages nodes that co-occur in fixed-length random walks to obtain similar representations. Formally, we use a binary cross-entropy loss to optimize the model parameters as follows:

$$\begin{aligned} \begin{aligned} L = \sum _{v \in V}&\Big ( \sum _{u \in N_{walk}(v)}-\log \left( \sigma \left( <\tilde{z}_{u}, \tilde{z}_{v}>\right) \right) \\&-\beta \cdot \sum _{u^{\prime } \in P_{n}(v)} \log \left( 1-\sigma \left( <\tilde{z}_{u^{\prime }}, \tilde{z}_{v}>\right) \right) \Big ) \end{aligned} \end{aligned}$$
(9)

where \(\tilde{z}_{v}\) is the final embedding of node v, \(\sigma (\cdot )\) denotes the sigmoid function, and \(<\cdot ,\cdot>\) denotes the inner product. \(N_{walk}(v)\) is the set of positive nodes for node v sampled by the random walk, and \(P_n (v)\) is the set of negative nodes for node v drawn from a degree-based negative sampling distribution. \(\beta \) is the negative sampling weight that balances positive and negative samples.
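A minimal numpy sketch of this objective is given below; the pair-list representation of \(N_{walk}(v)\) and \(P_n(v)\) is an illustrative assumption, and in practice the loss is optimized with mini-batch Adam in TensorFlow.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def walk_bce_loss(Z, pos_pairs, neg_pairs, beta=0.01):
    """Eq. (9). Z: (|V|, d) final embeddings; pos_pairs: (v, u) with
    u in N_walk(v); neg_pairs: (v, u') with u' drawn from the
    degree-based negative sampler P_n(v); beta weights negatives."""
    pos = sum(-np.log(sigmoid(Z[v] @ Z[u])) for v, u in pos_pairs)
    neg = sum(-np.log(1.0 - sigmoid(Z[v] @ Z[u])) for v, u in neg_pairs)
    return pos + beta * neg
```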

5 Experiments and Analysis

5.1 Experimental Setup

Datasets. To evaluate the performance of our model, we use seven public real-world datasets, summarized in Table 1. UCI [16] is an online social network whose links denote messages sent between peer users, i.e., nodes. Enron [12] contains a set of email messages concerning the Enron corporation, represented as an email communication network; nodes denote email addresses and links denote interactions between them. Yelp is a rating network of users and businesses, where a link connects a user and a business if the user rated the business. ML-10M [8] consists of users and the tags that users applied to certain movies; links denote interactions between users and movies. Alibaba is an e-commerce network consisting of users and items, where an edge between a user and an item denotes a click interaction. Epinions is a trust network between users, where an edge indicates a trust relation. Primary School [21] is a contact network whose links are constructed from interactions between teachers and students.

Baselines. We compare our proposed model with the following state-of-the-art dynamic graph embedding methods: (1) DynAE [4] utilizes an autoencoder framework based on dense layers; (2) DynAERNN [4] builds on DynAE and uses recurrent neural networks to capture the temporal dynamics of dynamic graphs; (3) DynGEM [6] adopts an incremental autoencoder framework that initializes from the last graph snapshot; (4) DySAT [18] simultaneously captures local structure information and temporal dynamics via a self-attention mechanism; (5) EvolveGCN [17] uses a GCN to learn local structure information for each graph snapshot and employs a GRU or LSTM to update the GCN parameters to capture temporal evolution across snapshots; (6) GAEN [20] incorporates node temporal pattern similarities, obtained via tensor factorization, with neighborhood attention to learn node embeddings for dynamic graphs.

Settings. In our experiments, we evaluate our model on both link prediction and node classification tasks. For link prediction, we train a logistic regression classifier to predict the existence of links at time step \(t+1\) based on the embeddings learned from previous networks up to time step t. We randomly sample 60% of nodes for training, use 20% of nodes to tune the hyperparameters of our model, and hold out the remaining 20% of nodes for testing. We use the mean accuracy (ACC) and the mean area under the ROC curve (AUC) as evaluation metrics for link prediction. For node classification, we randomly sample 20% of nodes as a validation set; we then use 30%, 50%, and 70% of nodes as training sets, with the corresponding remaining nodes used as test sets. We again train a logistic regression classifier to map nodes into different categories based on the embeddings learned from previous networks up to time step t, and employ the mean accuracy (ACC) as the evaluation metric. We train with mini-batch gradient descent using Adam. For hyperparameters, we set the batch size to 512, the embedding dimensionality d to 128, the learning rate to \(10^{-3}\), the weight decay to \(5 \times 10^{-4}\), the maximum number of epochs to 20, and the negative sampling weight \(\beta \) to 0.01. We conduct our experiments on a machine with an Intel Core i9-9960X (3.10 GHz) CPU, 128 GB of RAM, and four NVIDIA 2080Ti GPUs; our model is implemented in Python 3.6 with TensorFlow.
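For reference, a sketch of the link prediction protocol follows. Note that the text above does not specify how a node pair is featurized for the classifier, so the Hadamard product used below is our assumption (a common choice in prior work).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def evaluate_link_prediction(Z, train_pairs, y_train, test_pairs, y_test):
    """Z: (|V|, d) embeddings learned up to time t; pairs are (u, v)
    candidate links at t+1 with binary labels."""
    feats = lambda pairs: np.array([Z[u] * Z[v] for u, v in pairs])
    clf = LogisticRegression(max_iter=1000).fit(feats(train_pairs), y_train)
    scores = clf.predict_proba(feats(test_pairs))[:, 1]
    auc = roc_auc_score(y_test, scores)          # AUC metric
    acc = clf.score(feats(test_pairs), y_test)   # ACC metric
    return auc, acc
```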

Table 2. The predictive performance of link prediction task in terms of AUC and ACC on UCI, Enron, Yelp, ML-10M, Alibaba, and Epinions. The results are the mean and standard deviation of 5 different runs. OOM denotes running out of memory on our machine.
Fig. 3. Link prediction results on six datasets in terms of AUC at various time steps.

Table 3. The predictive performance of the node classification task in terms of ACC on Primary School at different train ratios. The results are the mean and standard deviation of 5 different runs.

5.2 Experimental Results

Link Prediction. In this subsection, we discuss the performance of our model in the link prediction task compared with state-of-the-art methods. Experimental results are presented in Table 2. DynHyper consistently outperforms the baselines on all datasets, except that GAEN outperforms DynHyper on the Enron dataset under ACC. These results indicate the effectiveness of DynHyper in link prediction. For example, compared to the best baseline (i.e., DySAT) on the Yelp dataset, we obtain roughly a 5% improvement in both AUC and ACC. Note that GAEN performs better than DynHyper on the Enron dataset. GAEN obtains node temporal patterns via tensor factorization, which may be more effective on tiny datasets like Enron, which has only 143 nodes. In contrast, DynHyper captures the edge-level correlations of datasets, which may work better on large datasets than on small ones. As the results show, DynHyper obtains about 94% and 99% AUC on the ML-10M and Epinions datasets, respectively, which are much larger than Enron. This may explain why our approach is inferior to GAEN on Enron. Besides, DynGEM employs smoothness regularization to capture temporal dynamics, which cannot handle networks with abrupt changes. Users' communications on UCI typically span long periods, indicating a smooth network, whereas rating behaviors on Yelp tend to be erratic and connected with events such as restaurant openings and discount promotions, indicating a network with abrupt changes. Accordingly, we observe that DynGEM obtains relatively better performance on UCI than on Yelp. The predictive results of DynHyper are consistently superior to DynGEM on all datasets, especially Yelp, demonstrating that DynHyper performs well on both smooth and abruptly changing networks.

Furthermore, we analyze the detailed performance of these methods at each time step; the results are reported in Fig. 3. First, we note that DynHyper is inferior to some baselines at the initial time step on some datasets, such as Enron and Alibaba. The potential reason is that these datasets have not yet formed many edge-level relationships at the initial time step. Indeed, we find that DynHyper's performance improves as the time step increases. Moreover, DynHyper is consistently superior to all baselines at every time step on some datasets, such as Yelp and Epinions, which might be because these datasets contain more edge-level correlations; it is worth noticing that Yelp and Epinions have more links than the other datasets.

Fig. 4. Experimental results for the ablation study.

Node Classification. In this subsection, we compare DynHyper's performance to that of state-of-the-art approaches in the node classification task. Due to the lack of dynamic graph datasets with node labels, we use the Primary School dataset with different train ratios to make full use of this dataset for evaluation. Table 3 shows the results. DynHyper achieves a consistent 2%\(\sim \)3% ACC improvement on Primary School across the different train ratios, demonstrating its effectiveness in node classification. In addition, approaches with RNN components, such as DynAERNN and EvolveGCN, perform poorly in node classification. DynAERNN is even inferior to DynAE, suggesting that combining with an RNN component is ineffective at capturing temporal dynamics in the node classification task.

Ablation Study. In this subsection, we conduct ablation studies to evaluate the contribution of the hyperedge projection (HP) in our model. HP aims to capture the edge-level relationships of datasets to improve performance. To demonstrate this, we compare our model with and without HP at various time steps; the results are shown in Fig. 4. According to Fig. 4, DynHyper with HP outperforms DynHyper without HP on most datasets. As discussed above, Enron is a small dataset with only 143 nodes, while HP is much more effective on large datasets. As a result, we note that DynHyper without HP is superior to DynHyper with HP on the Enron dataset.

6 Conclusion

In this paper, we propose a dynamic graph embedding framework, named DynHyper, to address dynamic graphs. We introduce a temporal hypergraph construction to effectively capture the temporal dynamics of dynamic graphs. Additionally, we propose a hyperedge projection to obtain the edge-level relationships of temporal hypergraphs. Furthermore, we propose a temporal edge-aware hypergraph convolutional network that conducts message passing in dynamic graphs independently and effectively, without any RNN components. Experimental results confirm that DynHyper performs strongly in both link prediction and node classification tasks, especially on the more complex datasets. In future work, we aim to extend our approach to more complex dynamic graphs, such as those with changing node attributes.