
1 Introduction

With the development of graph techniques, networks with link structures have become increasingly worth studying [1]. In a real system, the nodes and links of a network represent the entities and communications of the system, respectively. However, with the rapid development of the Internet and information systems, entities and communications have grown exponentially, making it difficult to compute or represent the structural properties of nodes and links in a network [2]. Network embedding has proved to be an effective method for learning low-dimensional features of nodes in networks. The low-dimensional features not only preserve the structural features of nodes, but also provide dense feature representations [3, 4].

Previous embedding methods mainly work on static networks [2, 5,6,7]. These models are usually designed to preserve local structures, e.g., by GCNs [7,8,9]. The embedding performance of such an encoder-decoder architecture is determined by a loss function that measures the distance between the local structures of the original and decoded networks. Classical network embedding methods include matrix factorization-based approaches (such as DNR [10]) and random walk-based approaches (such as DeepWalk [5] and Node2vec [2]). The GCN architecture continuously merges the information of neighboring nodes by means of message propagation.

The relationships between nodes in social networks and biomolecular networks change over time. Compared with static networks, dynamic networks are more suitable for such systems, in which the addition and deletion of nodes and links represent changes in node relationships [11,12,13]. Moreover, for these networks, network embedding needs to consider both the structural and the temporal features of the networks.

In recent years, some dynamic network embedding methods [14,15,16] have been proposed. For instance, methods such as dynGEM [16] and dynTriad [14] generate dynamic graph embeddings based on the simplifying assumption that graphs change smoothly. [15] improved the classic skip-gram method for dynamic network embedding by updating the embeddings of the nodes whose links change most dramatically. dynRNN [17] and dynAERNN [17] use a recurrent layer to handle inconsistent time spans. However, Recurrent Neural Networks (RNNs) [18] are known to be difficult to train and computationally expensive. Recently, many methods have been proposed for processing temporal data [19,20,21].

In this paper, we propose a novel model (called GTCN) that preserves both the structural and the temporal features of dynamic networks. GTCN uses a GCN to obtain the embedding vector of each node in each snapshot. Then, it exploits a TCN [22] to generate node embeddings that evolve over time. Extensive experiments on six real-world networks demonstrate that GTCN outperforms the state-of-the-art methods in link prediction.

2 Problem Definition

A dynamic network can be represented by \( G = \{ {\mathcal{G}}^{1} ,{\mathcal{G}}^{2} , \ldots ,{\mathcal{G}}^{T} \} \), where \( T \) is the total number of time steps and \( {\mathcal{G}}^{t} \) is the snapshot of the dynamic graph at time stamp \( t \). Each \( {\mathcal{G}}^{t} = \{ {\mathcal{V}}^{t} ,{\mathcal{E}}^{t} \} \) is an undirected graph, where \( {\mathcal{V}}^{t} \) denotes the node set and \( {\mathcal{E}}^{t} \) the link set at time \( t \). Additionally, we represent its links as a symmetric adjacency matrix \( {\text{A}}^{t} = [A_{ij}^{t} ] \in {\mathcal{R}}^{n \times n} \), where \( n \) is the number of nodes, with each entry \( (i,j) \) of the matrix given by

$$ A_{ij}^{t} = \left\{ {\begin{array}{*{20}l} 1 \hfill & {{\text{if }}e_{ij}^{t} \in {\mathcal{E}}^{t} } \hfill \\ 0 \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right.. $$

Dynamic network embedding aims to obtain a latent representation of each node, such that the latent representation \( {\text{emb}}_{i}^{t,2} \in {\mathcal{R}}^{{f_{2} \times 1}} \) (the embedding of node \( i \)) captures both the structural information of the current snapshot and the temporal information of the graph sequence.
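For illustration, the snapshot adjacency matrices can be built from per-snapshot edge lists as in the following minimal sketch (the edge-list input format is an assumption made for this example, not part of the model):

```python
import numpy as np

def snapshot_adjacency(edge_list, n):
    """Build the symmetric adjacency matrix A^t of one undirected snapshot."""
    A = np.zeros((n, n), dtype=np.float32)
    for i, j in edge_list:
        A[i, j] = 1.0
        A[j, i] = 1.0  # undirected graph: A^t is symmetric
    return A

# Hypothetical dynamic network with two snapshots over n = 4 nodes.
snapshots = [[(0, 1), (1, 2)], [(0, 1), (2, 3)]]
adjacency = [snapshot_adjacency(edges, n=4) for edges in snapshots]
```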

3 Our Solution: GTCN

Our model consists of two parts. First, we apply a GCN to capture the graph structure features at the current time. Then, we exploit a TCN to capture the temporal information of past time steps.

3.1 Current Time Graph Structure Feature

This layer consists of two GCN layers [7], used to capture the current topological features of the graph.

For the first GCN layer, the inputs are the adjacency matrix \( {\text{A}}^{t} \in {\mathcal{R}}^{n \times n} \) and the node feature matrix \( {\text{F}}^{t} \in {\mathcal{R}}^{n \times n} \) at time \( t \).

Since the datasets used do not contain node features, the identity matrix is generally used as the node feature matrix. Here, we instead compute the node features as follows.

$$ \begin{array}{*{20}l} {{\hat{\text{F}}}^{t} = {\text{I}}_{N} + {\text{A}}^{t} } \hfill \\ {{\text{F}}^{t} = \left( {1 + t_{d} \cdot {\text{F}}^{t - 1} } \right) \odot {\hat{\text{F}}}^{t} ,} \hfill \\ \end{array} $$
(1)

where \( t \in \{ 2, \ldots ,T\} \) and \( {\text{F}}^{1} = {\text{I}}_{N} + {\text{A}}^{1} \). \( {\text{I}}_{N} \in {\mathcal{R}}^{n \times n} \) is the identity matrix, \( t_{d} \) is the time decay value (a hyperparameter), and \( \odot \) denotes element-wise multiplication. Intuitively, the longer a link persists, the higher the similarity between the two nodes it connects; however, once the link disappears, the similarity of the corresponding two nodes drops to zero and must be re-accumulated.
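A minimal sketch of the feature computation of Eq. (1), assuming dense numpy arrays and a scalar time-decay hyperparameter \( t_{d} \) (the value below is only a placeholder):

```python
import numpy as np

def node_features(adjacency, t_d=0.1):
    """Compute F^t = (1 + t_d * F^{t-1}) ⊙ (I_N + A^t) for all snapshots (Eq. 1)."""
    n = adjacency[0].shape[0]
    I = np.eye(n, dtype=np.float32)
    features, F_prev = [], None
    for A in adjacency:
        F_hat = I + A                          # \hat{F}^t = I_N + A^t
        if F_prev is None:
            F = F_hat                          # F^1 = I_N + A^1
        else:
            F = (1.0 + t_d * F_prev) * F_hat   # element-wise multiplication
        features.append(F)
        F_prev = F
    return features
```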

The output of the first GCN layer \( l_{1} \) of a single snapshot is calculated as follows:

$$ {\text{Z}}_{s}^{{t,l_{1} }} = {\mathbf{relu}}\left( {{\hat{\text{A}}}^{t} \cdot {\text{F}}^{t} \cdot {\text{W}}_{s}^{{t,l_{1} }} } \right), $$
(2)

where \( {\text{Z}}_{s}^{{t,l_{1} }} \in {\mathcal{R}}^{{n \times d_{1} }} \), \( {\text{W}}_{s}^{{t,l_{1} }} \in {\mathcal{R}}^{{n \times d_{1} }} \) are the learnable parameters of the first GCN layer with output dimensionality \( d_{1} \), and \( {\mathbf{relu}} \) is defined as:

$$ {\mathbf{relu}}\left( x \right) = \left\{ {\begin{array}{*{20}l} x \hfill & {{\text{if }}x > 0} \hfill \\ 0 \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right., $$

and \( {\hat{\text{A}}}^{t} \) is defined as:

$$ \begin{array}{*{20}l} {{\tilde{\text{A}}}^{t} = {\text{A}}^{t} + {\text{I}}_{N} } \hfill \\ {{\tilde{\text{D}}}_{ii}^{t} = \sum\limits_{j} {{\tilde{\text{A}}}_{ij}^{t} } } \hfill \\ {{\hat{\text{A}}}^{t} = ({\tilde{\text{D}}}^{t} )^{{ - \frac{1}{2}}} \cdot {\tilde{\text{A}}}^{t} \cdot ({\tilde{\text{D}}}^{t} )^{{ - \frac{1}{2}}} .} \hfill \\ \end{array} $$
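The renormalized adjacency \( {\hat{\text{A}}}^{t} \) can be computed as in the following minimal numpy sketch:

```python
import numpy as np

def normalize_adjacency(A):
    """Compute \hat{A} = D^{-1/2} (A + I) D^{-1/2} for one snapshot."""
    A_tilde = A + np.eye(A.shape[0], dtype=A.dtype)   # add self-loops
    deg = A_tilde.sum(axis=1)                         # \tilde{D}_ii, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
```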

For the second GCN layer \( l_{2} \), the output of the first GCN layer \( {\text{Z}}_{s}^{{t,l_{1} }} \) serves as the node feature matrix, and the output of the second GCN layer is calculated as:

$$ {\text{Z}}_{s}^{{t,l_{2} }} = {\hat{\text{A}}}^{t} \cdot {\text{Z}}_{s}^{{t,l_{1} }} \cdot {\text{W}}_{s}^{{t,l_{2} }} , \, $$
(3)

where \( {\text{Z}}_{s}^{{t,l_{2} }} \in {\mathcal{R}}^{{n \times d_{2} }} \) and \( {\text{W}}_{s}^{{t,l_{2} }} \in {\mathcal{R}}^{{d_{1} \times d_{2} }} \) are the parameters of the second GCN layer with output dimensionality \( d_{2} \).

Finally, we use the structure embedding vectors [23] of the nodes, \( {\text{Z}}_{s}^{{t,l_{2} }} \), to reconstruct the adjacency matrix.

$$ {\hat{\text{A}}}_{s}^{t} = {\mathbf{sigmoid}}\left( {{\text{Z}}_{s}^{{t,l_{2} }} \cdot ({\text{Z}}_{s}^{{t,l_{2} }} )^{T} } \right), $$
(4)

where \( ({\text{Z}}_{s}^{{t,l_{2} }} )^{T} \) denotes the transpose of matrix \( {\text{Z}}_{s}^{{t,l_{2} }} \), and \( {\mathbf{sigmoid}} \) is an activation function defined as \( {\mathbf{sigmoid}}(x) = \frac{1}{{1 + e^{ - x} }} \).
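Putting Eqs. (2)-(4) together, the per-snapshot structural encoder and inner-product decoder could look like the following PyTorch sketch (layer widths and initialization are illustrative assumptions, not the exact settings used in our experiments):

```python
import torch
import torch.nn as nn

class SnapshotGCN(nn.Module):
    """Two GCN layers plus an inner-product decoder for a single snapshot."""

    def __init__(self, n, d1, d2):
        super().__init__()
        self.W1 = nn.Parameter(torch.randn(n, d1) * 0.01)   # W_s^{t,l1}
        self.W2 = nn.Parameter(torch.randn(d1, d2) * 0.01)  # W_s^{t,l2}

    def forward(self, A_hat, F):
        Z1 = torch.relu(A_hat @ F @ self.W1)   # Eq. (2)
        Z2 = A_hat @ Z1 @ self.W2              # Eq. (3)
        A_rec = torch.sigmoid(Z2 @ Z2.t())     # Eq. (4): reconstructed adjacency
        return Z2, A_rec
```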

Then, we compare the reconstructed matrix \( {\hat{\text{A}}}_{s}^{t} \) with the ground-truth adjacency matrix \( {\text{A}}^{t} \) at the current moment \( t \) to obtain the structure loss. Our goal is to minimize the structure loss over all time steps.

$$ {\mathcal{L}}_{{\mathcal{S}}} = \sum\limits_{t = 1}^{T} {norm_{s}^{t} \left\| {\left( {{\hat{\text{A}}}_{s}^{t} - {\text{A}}^{t} } \right) \odot {\text{B}}_{s}^{t} } \right\|_{F}^{2} } , $$
(5)

where \( norm_{s}^{t} \) is the normalization coefficient, defined as:

$$ norm_{s}^{t} = \frac{n \times n}{{2\left( {n \times n - \left\| {{\mathcal{E}}^{t} } \right\|} \right)}}, $$

\( {\text{B}}_{s}^{t} \in {\mathcal{R}}^{n \times n} \) is the penalty matrix

$$ {\text{B}}_{s,ij}^{t} = \left\{ {\begin{array}{*{20}l} {\frac{{n \times n - \left\| {{\mathcal{E}}^{t} } \right\|}}{{\left\| {{\mathcal{E}}^{t} } \right\|}}} \hfill & {{\text{if e}}_{\text{ij}}^{\text{t}} \in {\mathcal{E}}^{\text{t}} } \hfill \\ 0 \hfill & {\text{otherwise}} \hfill \\ \end{array} } \right., $$

where \( {\text{B}}_{s,ij}^{t} \) is the element in the \( i \)-th row and \( j \)-th column of matrix \( {\text{B}}_{s}^{t} \), and \( \odot \) denotes element-wise multiplication.
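A minimal sketch of one snapshot's contribution to the structure loss of Eq. (5), with the penalty matrix and normalization coefficient derived from the ground-truth adjacency (dense PyTorch tensors assumed; here \( \left\| {{\mathcal{E}}^{t} } \right\| \) is taken as the number of nonzero entries of \( {\text{A}}^{t} \)):

```python
import torch

def structure_loss_term(A_rec, A):
    """One summand of Eq. (5): norm_s^t * || (A_rec - A) ⊙ B_s^t ||_F^2."""
    n = A.shape[0]
    n_edges = A.sum()                                    # ||E^t||: nonzero entries of A^t
    norm_s = (n * n) / (2.0 * (n * n - n_edges))         # normalization coefficient
    B = ((n * n - n_edges) / n_edges) * (A > 0).float()  # penalty matrix B_s^t
    return norm_s * torch.sum(((A_rec - A) * B) ** 2)
```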

3.2 Past Time Temporal Feature

In this section, we use a TCN to model the temporal dynamics. In order to predict the adjacency matrix of \( {\mathcal{G}}^{t + 1} \), we need to consider all the time-series information before time \( t + 1 \), i.e., \( \{ {\mathcal{G}}^{1} ,{\mathcal{G}}^{2} , \ldots ,{\mathcal{G}}^{t} \} \).

The input of this layer is the structural feature of each node of the graph at times \( \{ 1,2, \ldots ,t\} \). Here, we use the node representations \( {\text{Z}}_{s}^{{t,l_{2} }} \) obtained from the GCN above as the structural features. We use \( {\text{Z}}_{s,i*}^{{t,l_{2} }} \in {\mathcal{R}}^{{1 \times d_{2} }} \), the \( i \)-th row of the matrix, to represent the structural feature of node \( i \) in \( {\mathcal{G}}^{t} \); the input for predicting each node of \( {\mathcal{G}}^{t + 1} \) is then \( \{ {\text{Z}}_{s,i*}^{{1,l_{2} }} ,{\text{Z}}_{s,i*}^{{2,l_{2} }} , \ldots ,{\text{Z}}_{s,i*}^{{t,l_{2} }} \} \).

Here we use two TCN layers [22] to process the temporal information. Each TCN layer consists of a dilated causal convolution and a skip connection.

For convenience, we set \( {\text{Z}}_{s,i*}^{{t,l_{2} }} \) as:

$$ {\text{H}}_{i}^{t,0} = ({\text{Z}}_{s,i*}^{{t,l_{2} }} )^{T} , $$

where \( {\text{H}}_{i}^{t,0} \in {\mathcal{R}}^{{d_{2} \times 1}} \) denotes the embedding vector of node \( i \) of \( {\mathcal{G}}^{t} \) at layer \( 0 \), i.e., the transpose of the \( i \)-th row of \( {\text{Z}}_{s}^{{t,l_{2} }} \).

For each layer \( l \), the dilated causal convolution is calculated as follows:

$$ {\hat{\text{H}}}_{i}^{t,l} = {\mathbf{relu}}\left( {\sum\limits_{j = 0}^{k - 1} {{\text{W}}_{j}^{l} } \cdot {\text{H}}_{i}^{(t - dj),(l - 1)} } \right), $$
(6)

where \( {\text{W}}_{j}^{l} \in {\mathcal{R}}^{{f_{l} \times f_{l - 1} }} \), \( f_{l} \) is the number of filters (the output dimension) of the \( l \)-th layer, \( d \) is the dilation factor, \( k \) is the filter size, and \( t - dj \) indicates which past step is attended to. Thus, \( d \) can be viewed as the gap between two adjacent taps of the filter. When \( d \) equals 1, the operation reduces to a standard causal convolution. A larger \( d \) enables an exponentially larger receptive field, i.e., more information about the dynamic evolution of the graph.

For each layer, a skip (residual) connection [24] adds the original input to the output:

$$ {\text{H}}_{i}^{t,l} = {\mathbf{relu}}\left( {{\text{W}}^{l} \cdot {\text{H}}_{i}^{t,(l - 1)} + {\hat{\text{H}}}_{i}^{t,l} } \right), $$
(7)

where \( {\text{W}}^{l} \in {\mathcal{R}}^{{f_{l} \times f_{l - 1} }} \) is used to match the dimensions of the original input and the output.
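A minimal PyTorch sketch of one TCN layer as described by Eqs. (6)-(7): a dilated causal convolution over each node's sequence of structural embeddings, followed by a 1×1 residual connection (filter size, dilation and feature widths below are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TCNLayer(nn.Module):
    """One TCN layer: dilated causal convolution (Eq. 6) plus skip connection (Eq. 7)."""

    def __init__(self, f_in, f_out, k=3, d=1):
        super().__init__()
        self.k, self.d = k, d
        self.conv = nn.Conv1d(f_in, f_out, kernel_size=k, dilation=d)  # the W_j^l
        self.res = nn.Conv1d(f_in, f_out, kernel_size=1)               # W^l on the skip path

    def forward(self, H):
        # H: (nodes, f_{l-1}, time) -- each node's embedding sequence over snapshots
        pad = (self.k - 1) * self.d
        H_causal = F.pad(H, (pad, 0))            # left-pad so step t only sees steps <= t
        H_hat = torch.relu(self.conv(H_causal))  # Eq. (6)
        return torch.relu(self.res(H) + H_hat)   # Eq. (7)


# Example: n = 32 nodes, d2 = 16 structural features, t = 10 snapshots (all hypothetical).
H0 = torch.randn(32, 16, 10)
tcn1 = TCNLayer(16, 16, k=3, d=1)
tcn2 = TCNLayer(16, 16, k=3, d=2)          # larger dilation -> larger receptive field
H2 = tcn2(tcn1(H0))                        # H2[:, :, -1] corresponds to H_i^{t,2}
```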

Then, we use a linear layer on the output of the last TCN layer (here we use two TCN layers, so \( {\text{H}}_{i}^{t,2} \) is the output of the last TCN layer) to reconstruct the vector representation of node \( i \) at the next time step \( t + 1 \):

$$ {\hat{\text{V}}}_{i}^{t + 1} = {\mathbf{sigmoid}}\left( {{\text{W}}_{d} \cdot {\text{H}}_{i}^{t,2} + {\text{B}}_{d} } \right), $$
(8)

where \( {\text{W}}_{d} \in {\mathcal{R}}^{{n \times f_{2} }} \) and \( {\text{B}}_{d} \in {\mathcal{R}}^{n \times 1} \) are the parameters of the linear layer.

After obtaining the predicted vector representation of node \( i \) at the next time step \( t + 1 \), we compare the prediction with the ground truth and use the difference as our temporal loss to optimize:

$$ {\mathcal{L}}_{{\mathcal{T}}} = \sum\limits_{t = 1}^{T - 1} {norm_{t}^{t} \sum\limits_{i = 1}^{n} {\left\| {\left( {{\hat{\text{V}}}_{i}^{t + 1} - {\text{V}}_{i}^{t + 1} } \right) \odot {\text{B}}_{t,i*}^{t} } \right\|_{F}^{2} } } , $$
(9)

where \( n \) is the number of nodes and \( {\text{V}}_{i}^{t + 1} \) is the \( i \)-th row of the adjacency matrix \( {\text{A}}^{t + 1} \). \( norm_{t}^{t} \) and \( {\text{B}}_{t}^{t} \) are defined in the same way as \( norm_{s}^{t} \) and \( {\text{B}}_{s}^{t} \), respectively, except that they use the node and edge information at time \( t + 1 \). \( {\text{B}}_{t,i*}^{t} \in {\mathcal{R}}^{n \times 1} \) is the \( i \)-th row of \( {\text{B}}_{t}^{t} \).
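A minimal sketch of the decoder of Eq. (8) and one \( (t,i) \) term of the temporal loss of Eq. (9); the penalty row and normalization coefficient are assumed to be built from \( {\text{A}}^{t + 1} \) in the same way as in Section 3.1:

```python
import torch
import torch.nn as nn

class RowDecoder(nn.Module):
    """Linear decoder of Eq. (8): predict node i's adjacency row at time t + 1."""

    def __init__(self, f2, n):
        super().__init__()
        self.linear = nn.Linear(f2, n)             # W_d and B_d

    def forward(self, H_t2):
        # H_t2: (f2,) embedding of node i from the last TCN layer
        return torch.sigmoid(self.linear(H_t2))    # \hat{V}_i^{t+1}, shape (n,)


def temporal_loss_term(V_pred, V_true, B_row, norm_t):
    """One (t, i) summand of Eq. (9)."""
    return norm_t * torch.sum(((V_pred - V_true) * B_row) ** 2)
```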

3.3 Total Architecture

In this section, we summarize the key components of our overall model.

Current Time Graph Structure Feature Block:

For this module, we aim to capture the topological features and the neighbor characteristics of the nodes in each snapshot \( {\mathcal{G}}^{t} \). The parameters at each time step are not shared, and the output of this module is used as the input to the temporal module.

Past Time Temporal Feature Block:

In this module, we capture the temporal features of the dynamic changes of the graph. We use all the graph structure information before \( t + 1 \) as input, i.e., \( \{ {\mathcal{G}}^{1} ,{\mathcal{G}}^{2} , \ldots ,{\mathcal{G}}^{t} \} \), and use the TCN to predict \( {\mathcal{G}}^{t + 1} \).

Objective:

We minimize the following loss:

$$ {\mathcal{L}} = {\mathcal{L}}_{{\mathcal{S}}} + \alpha {\mathcal{L}}_{{\mathcal{T}}} + \lambda {\mathcal{L}}_{reg} , $$
(10)

where \( {\mathcal{L}}_{{\mathcal{S}}} \) is the graph structure feature loss and \( {\mathcal{L}}_{{\mathcal{T}}} \) is the graph dynamic feature loss. \( \alpha \) is a parameter between \( 0 \) and \( 1 \) that controls the relative contribution of \( {\mathcal{L}}_{{\mathcal{S}}} \) and \( {\mathcal{L}}_{{\mathcal{T}}} \). \( \lambda \) is the parameter that controls the contribution of \( {\mathcal{L}}_{reg} \), an L2-norm regularizer used to prevent overfitting, defined as:

$$ {\mathcal{L}}_{reg} = \frac{1}{2}\left\| W \right\|_{F}^{2} , $$

where \( W \) is all the parameters that need to be learned.
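A minimal sketch of the combined objective of Eq. (10); the values of \( \alpha \) and \( \lambda \) below are placeholders, not the settings used in our experiments:

```python
import torch

def total_loss(loss_s, loss_t, parameters, alpha=0.5, lam=1e-4):
    """L = L_S + alpha * L_T + lambda * L_reg (Eq. 10)."""
    loss_reg = 0.5 * sum(torch.sum(W ** 2) for W in parameters)   # L2 regularizer over all weights
    return loss_s + alpha * loss_t + lam * loss_reg
```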

Optimization:

We use the Adam [25] optimizer.

4 Experimental Results

In this section, we first introduce the six real-world dynamic networks used as our datasets and the five comparison models, including one static graph embedding method and four dynamic embedding approaches. Then, we analyze the experimental results of the models on the different datasets.

4.1 Experimental Settings

Datasets.

We use six real-world networks, covering web forums, medical, school, and workplace contact networks, and citation relationships. The basic properties of the datasets are described in Table 1.

Table 1. Statistics of dataset used in our experiments.

Comparison Models.

We compare the performance of our model with other state-of-the-art models, including one static graph embedding model (SDNE [26]) and four dynamic graph embedding models (dynGEM [16], dynAE [17], dynRNN [17] and dynAERNN [17]).

Task.

In real networks, links between nodes tend to change over time. Therefore, it is important to use past link information to predict changes in node connections at the next moment. Here, we use link prediction as the criterion for evaluating our model. In the experiments, we use the information of the nodes at time steps \( 1,2,3, \ldots ,t \) to predict the node embeddings at the next moment \( t + 1 \), and determine the relationships between nodes in the \( t + 1 \) snapshot with a binary classification model. Finally, we compare the predicted node associations with the ground-truth connections and adopt Mean Average Precision (mAP) as the judging criterion.

We conduct experiments on the task of link prediction in dynamic graphs. We first train the model and acquire the learned parameters on the snapshots \( \{ {\mathcal{G}}^{1} ,{\mathcal{G}}^{2} , \ldots ,{\mathcal{G}}^{t} \} \), and then apply \( {\hat{\text{V}}}^{t + 1} \) to predict the links in \( {\mathcal{G}}^{t + 1} \) during evaluation. At each snapshot of the graph, links may disappear or be added. We compare the performance of the different models on this link prediction task.

Metrics.

To evaluate the link prediction performance of our model, we use Mean Average Precision (mAP) and precision@k as the metrics.
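For reference, the following sketch shows one common way to compute precision@k and mAP from predicted link scores; it is an illustrative formulation, not necessarily the exact evaluation code used in our experiments:

```python
import numpy as np

def precision_at_k(scores, A_true, k):
    """Fraction of the k highest-scoring node pairs that are true links."""
    iu = np.triu_indices(A_true.shape[0], k=1)     # each unordered pair counted once
    order = np.argsort(-scores[iu])[:k]
    return float(A_true[iu][order].mean())

def mean_average_precision(scores, A_true):
    """mAP: average precision of each node's ranked list of candidate neighbors."""
    aps = []
    for i in range(A_true.shape[0]):
        pos = A_true[i] > 0
        if pos.sum() == 0:
            continue                                # skip nodes with no true links
        s = scores[i].copy()
        s[i] = -np.inf                              # exclude the self-pair
        hits = pos[np.argsort(-s)]
        prec = np.cumsum(hits) / (np.arange(len(hits)) + 1)
        aps.append(prec[hits].mean())
    return float(np.mean(aps))
```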

Detailed Settings.

Since our datasets have only a few time steps, when training the model to predict \( {\mathcal{G}}^{t + 1} \) we exploit all the graphs before time \( t + 1 \), i.e., \( \{ {\mathcal{G}}^{1} ,{\mathcal{G}}^{2} , \ldots ,{\mathcal{G}}^{t} \} \). In our experiments, we predict the \( T \), \( T - 1 \), \( T - 2 \) and \( T - 3 \) graphs for each dataset, and train 20 times for each experiment. Finally, the average of the mAPs over the 20 runs and the four time steps is used as the final assessment metric.

Here, we choose 2-layer GCNs to extract the topological features of each snapshot and 3-layer TCNs to obtain the dynamic evolution features.

All the experiments are conducted on Linux platform with 2 GPU cores (Tesla P100) and 32 GB RAM.

4.2 Result and Analysis

In this section we present the performance of different models for link prediction on different datasets.

Table 2 shows the performance (mAP) of each algorithm on each dataset. We take mAP as the criterion for judging model quality: the larger the mAP, the better the model performs.

Table 2. Experiment results on dynamic link prediction (mAP)

From Table 2, we can see that SDNE, dynGEM, dynAE, dynRNN and dynAERNN perform very poorly on the fb-forum dataset (below 0.01, i.e., at least 20 times smaller than the mAP of GTCN). The reason is that the edges in fb-forum change very sharply at each moment, while SDNE and dynGEM consider only the adjacency matrix of the graph at the current moment as the training set for predicting the graph structure of the next time step. These violent changes therefore make it very difficult for SDNE and dynGEM to predict the next snapshot of the dynamic network. The dynAE, dynRNN and dynAERNN methods can exploit long historical information, but their autoencoder structure makes it difficult for them to mine the characteristics of sparse graph structures. GTCN achieves better performance because it uses the TCN to discover historical information and then adopts the GCN, together with the temporal features, to alleviate the impact of the sparse graph structure to some extent. However, the node degrees in fb-forum are very small, which makes the adjacency matrix sparse. The number of edges available for training is thus small, training is insufficient, and the number of nodes is large, which lowers the prediction accuracy. Hence, none of the algorithms performs very well, and even the best performance (mAP) only reaches about 0.02. For the email dataset, we observe the same situation: using only the information from the previous moment is not enough to discover the structural characteristics of the dynamic network.

Compared to the other dynamic graph embedding algorithms, GTCN performs the best on most of the datasets. We attribute this to the GCN and to the preprocessing of the temporal features, which combines the node features of each snapshot with a rough summary of temporal information. dynGEM, as mentioned above, only focuses on the link changes of the graph one time step earlier, which is too short-sighted. Compared with dynAE and dynRNN, the AE and RNN structures are not as good as the GCN at capturing the neighborhood structure of each node in the graph. Furthermore, combining structural and temporal features is also not possible for AE and RNN.

Table 3 and Table 4 show the precision@k values for the various datasets. The larger the precision@k, the better the model performs. As explained above, fb-forum has many nodes, but the links between nodes are sparse and change very sharply, which makes learning difficult. Nevertheless, our model still achieves better performance than the other models. For the other, smaller datasets, our model also outperforms the other models on most datasets in terms of P@100, P@500 and P@1000.

Table 3. Experiment results on dynamic link prediction (Precision@k part-1)
Table 4. Experiment results on dynamic link prediction (Precision@k part-2)

5 Conclusion

In this paper, we have proposed a dynamic network embedding method, GTCN. The model first captures the topological properties of the nodes in the graph with a GCN, and then adopts a TCN to capture the dynamic temporal changes of the nodes. Experimental results have shown that our model not only captures the topological properties of the nodes but also extracts their dynamic temporal changes. They also indicate that our proposed method outperforms the state-of-the-art methods in link prediction.