
1 Introduction

Knowledge graphs (KGs) organize and store real-world facts, enabling multifarious downstream applications, such as knowledge retrieval, question answering, and recommender systems [12]. KGs encode factual knowledge as triples (s, r, o) in directed graphs, where nodes correspond to the subject entity s or object entity o, and edges represent the relation r between them. Owing to the high cost of knowledge fusion and the dynamics of facts, most KGs suffer from incompleteness [31]. Thus, link prediction, which aims to recover the most probable missing facts, becomes a crucial task. Since real-world KGs contain millions of multi-relational facts, traditional symbolic and logic-based approaches cannot scale to large KGs for link prediction.

Recently, KG embedding has emerged as a promising method for link prediction. It learns multi-dimensional vector representations of entities and relations in KGs, and uses a scoring function to evaluate the plausibility of a triple. Translation-based approaches, represented by TransE [1], achieve a good trade-off between model complexity and link prediction performance by modelling relations as translation operations on entity embeddings. However, the vast majority of existing embedding methods perform link prediction on static KGs, under the assumption that the relational facts in KGs are generally correct.

In reality, facts often hold only over a specific period of time [3]. Therefore, researchers construct temporal knowledge graphs (TKGs), such as YAGO [24] and ICEWS [16], to store ever-growing temporal information either explicitly or implicitly. Figure 1 shows an example of a temporal knowledge graph (TKG), where the fact (Donald Trump, president of, USA) was accurate only from 2017 to 2020. However, traditional KG embedding methods cannot handle TKGs, where facts often show temporal dynamics. For example, they often confuse entities such as Trump and Biden when predicting (?, president of, USA, 2021). Additionally, learning TKG embeddings that carry temporal information is challenging due to the sparsity and irregularity of temporal expressions [5].

Fig. 1. Example of temporal knowledge subgraphs.

To address these challenges, Know-Evolve [27] and its extension DyRep [28] predict future events based on the ground truths of preceding events at inference time. As a result, these methods cannot predict missing events at future time-stamps for which no ground truths are available. To capture more information from past facts, Jin et al. proposed a novel autoregressive architecture, RE-NET [14], which models facts as probability distributions over TKGs. However, RE-NET learns representations of entities and relations by exploiting temporal information only implicitly, without distinguishing dynamic dependencies across facts.

In this work, we observe that TKGs are dynamic heterogeneous graphs with multiple relationships, i.e., the local structures of the graph vary across time windows, and the facts evolve across time windows. As an example in Fig. 1, the local structure of the entity America comes from 4 entities and 2 relations at \(t_1\). At \(t_2\), the local structure of the entity America changes significantly: new entities and relations emerge, while some entities and relations present at \(t_1\) disappear. Moreover, the fact (Donald Trump, president of, America) at \(t_1\) evolves into (Joe Biden, president of, America) at \(t_2\).

To this end, we propose SiepNet, a novel graph neural network for temporal link prediction, driven by local Structural Information and Evolutionary Patterns. The main ideas of SiepNet are (1) capturing graph structure dependencies based on a relation-aware GNN architecture, (2) learning long-range and short-range evolutionary patterns of TKGs using an attention-based recurrent network, and (3) integrating local structures and evolutionary patterns to strengthen the representation learning of facts, which improves the performance of temporal link prediction. We summarize our main contributions as follows:

  • We propose a representation learning model SiepNet for temporal link prediction, which simultaneously considers local structures and evolutionary patterns hidden in TKGs.

  • We design an attention-based recurrent network to tackle dynamic dependencies across entities over time, which helps to distinguish the impact of different historical facts on future facts inference.

  • To validate the effectiveness of our model, we conduct extensive experiments on five real-world TKGs containing millions of multi-relational facts with different time intervals, where our model consistently outperforms other baselines in terms of temporal link prediction.

2 Related Work

Towards temporal link prediction, we restrict our focus to recent works on TKG embedding methods, including geometric models and neural network models.

Geometric Models. These models attempt to minimize the distance between two entity vectors translated by geometric transformations of relations. TTransE [17] extends TransE [1] for static KGs to TKGs by adding temporal constraints. TA-TransE [5] embeds temporal information into relation types, which can be used with existing scoring functions for temporal link prediction in TKGs. HyTE [3] directly utilizes time-specific normal vectors to generate representations of entities and relations over different time-stamps. Nevertheless, these geometric models cannot infer future facts from past facts and cannot be extended to the extrapolation setting.

Neural Network Models. These models use deep neural networks to learn underlying features of time-stamps for link prediction. RE-NET [14] combines a recurrent neural network and a neighborhood aggregator to model event sequences. CyGNet [34] predicts future facts by modelling observed facts with a copy-generation network. TITer [25] continuously transfers query nodes to new nodes through relevant temporal facts based on time-aware reinforcement learning strategies, and generates representation vectors of unseen entities using an IM module. CluSTeR [19] performs temporal reasoning on TKGs by combining reinforcement learning and a graph convolution network. RE-GCN [20] learns evolutionary representations of facts at each timestamp by modelling KG sequences recurrently with a recurrent evolutionary network. However, the performance of these neural network models is limited by their reliance on repetitive patterns.

3 Problem Definition

We consider a temporal knowledge graph as a sequence of graph snapshots ordered in ascending order of time-stamps, namely \(G=\left\{ G_1, G_2, \cdots , G_{\tau } \right\} \), where \(G_t=(V_t,E_t)\) represents the snapshot at a particular time slice t \((t\in \{1,\ 2,\cdots ,\tau \})\) with an entity set \(V_t\) and a relation set \(E_t\). \(V_t\) contains the subject entities s and object entities o at time slice t, and \(E_t\) contains the relations r between them. Thus, a fact in \(G_t\) is denoted by a quadruple (s, r, o, t) with a time slice t, in which \(s\in V_t\), \(o\in V_t\) and \(r\in E_t\).

Given the preceding observed facts in G, temporal link prediction aims to predict the missing facts at the current time slice t, i.e., to predict the unseen subject entity s given (?, r, o, t) (the object entity o given (s, r, ?, t), and the relation r given (s, ?, o, t)) at a particular time slice t.
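For concreteness, the following minimal Python sketch shows how quadruples (s, r, o, t) can be grouped into the snapshot sequence assumed above; the entity/relation IDs and the helper name build_snapshots are illustrative and not part of SiepNet itself.

```python
# Hypothetical helper: group (s, r, o, t) facts into snapshots G_1, ..., G_tau
# ordered by time slice; the IDs below are purely illustrative.
from collections import defaultdict

def build_snapshots(quadruples):
    """Return a list of snapshots, each a list of (s, r, o) facts, ordered by t."""
    snapshots = defaultdict(list)
    for s, r, o, t in quadruples:
        snapshots[t].append((s, r, o))
    return [snapshots[t] for t in sorted(snapshots)]

# A query (?, r, o, t) is then answered by ranking all candidate subject
# entities with the model's scoring function at time slice t.
quads = [(0, 1, 2, 2017), (3, 1, 2, 2021)]
print(build_snapshots(quads))  # [[(0, 1, 2)], [(3, 1, 2)]]
```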

4 Methodology

4.1 The Model Architecture

The proposed model SiepNet depicted in Fig. 2 consists of two main components: (1) Local Structural Information Aggregation, and (2) Evolutionary Patterns Aggregation. First of all, we design a relation-aware GNN to capture the local structural information from multi-relational and multi-hop neighbors of each single graph snapshot. Then, we explore long-range and short-range evolutionary patterns of TKGs using an attention-based recurrent network. In addition, we integrate local structures and evolutionary patterns to strengthen the representation learning of facts, which in turn improves the performance of temporal link prediction.

Fig. 2. The architecture of the SiepNet temporal link prediction model.

4.2 Local Structural Information

To aggregate local structural information from multi-relational and multi-hop neighbors in each graph snapshot \(G_t\), SiepNet seeks to make two linked nodes share similar representations. To achieve this, each node representation \(h_o^{(t)}\) in \(G_t\) aggregates messages from its neighbors and its past representation, and its new representation is then calculated. Initially, \(h_o^{(0)}\) is set to a trainable embedding vector for each node. SiepNet calculates the forward-pass update of an entity \(v_o\) in a multi-relational graph based on the following message-passing neural network:

$$\begin{aligned} h_{o}^{(t)}=\sigma ( \sum _{s \in N_{o,r}^{t} } \mathcal {F}_{str} (h_{s}^{(t-1)},r^{(t-1)}) + W_{o}^{(t-1)}h_{o}^{(t-1)}) \end{aligned}$$
(1)

where \(h_{o}^{(t)}\) is the intermediate representation of node \(v_o\) at time slice t, combining local structural messages \(h_{s}^{(t-1)}\) from all neighbors \(N_{o,r}^{t}\) under relation \(r \in E_t\) and its past messages \(h_{o}^{(t-1)}\). \(W_{o}^{(t-1)}\) is a learnable parameter, indicating the past weight. To comprehensively aggregate the local structural messages of node \(v_o\), we implement the message function \(\mathcal {F}_{str}(., .)\) by

$$\begin{aligned} \mathcal {F}_{str}(h_{s}^{(t-1)},r^{(t-1)}) = \frac{1}{c_{o,r}^{t}} W_{r}^{(t-1)}[h_{s}^{(t-1)} \times r^{(t-1)}] + b_{str} \end{aligned}$$
(2)

where \(h_{s}^{(t-1)} \times r^{(t-1)}\) is the local structural message, while \(W_{r}^{(t-1)}\) and \(b_{str}\) are learnable parameters indicating the local weight and bias. \(c_{o,r}^{t}\) is a normalizing factor that can either be learned or chosen in advance (e.g., \(c_{o, r}^{t}=|N_{o,r}^t |\)).

Unlike traditional GCNs, SiepNet accumulates and encodes features of entities from local structural neighborhoods, i.e., \({\frac{1}{c_{o,r}^{t}}W_r^{(t-1)}[h_{s}^{(t-1)} \times r^{(t-1)}]}\). Intuitively, relations with different types and directions induce various local graph structures between entities. Therefore, SiepNet accumulates the overall features of each entity via relation-specific transformations, i.e., \(\sum _{s \in N_{o,r}^{t} } \mathcal {F}_{str}(h_{s}^{(t-1)},r^{(t-1)})\). To carry the past messages of an entity, SiepNet introduces a single self-connection for each node, i.e., \(W_{o}^{(t-1)}h_{o}^{(t-1)}\). Finally, SiepNet combines the overall features with information from past steps, and outputs a sequence of representations denoted as \(\left\{ H^{(1)},\cdots ,H^{(t)} \right\} \), where \(H^{(t)}=\left\{ h_1^{(t)},\cdots ,h_n^{(t)} \right\} \) denotes the representations of entities in the graph snapshot \(G_t\).
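A minimal PyTorch-style sketch of the aggregation in Eqs. (1)-(2) is given below. It assumes an element-wise product for \(h_{s}^{(t-1)} \times r^{(t-1)}\), a sigmoid for \(\sigma\), and a simple in-degree normalization for \(c_{o,r}^{t}\); the layer and variable names are illustrative, not the exact implementation.

```python
# Sketch of the relation-aware aggregation in Eqs. (1)-(2); the dense edge loop
# is for clarity, not efficiency.
import torch
import torch.nn as nn

class RelationAwareLayer(nn.Module):
    def __init__(self, num_rels, dim):
        super().__init__()
        self.w_rel = nn.Parameter(torch.randn(num_rels, dim, dim) * 0.01)  # W_r, one matrix per relation
        self.w_self = nn.Linear(dim, dim, bias=False)                      # W_o, self-connection
        self.b_str = nn.Parameter(torch.zeros(dim))                        # b_str

    def forward(self, h, rel_emb, edges):
        # h: (num_nodes, dim) node states; rel_emb: (num_rels, dim); edges: list of (s, r, o)
        msgs, dst = [], []
        for s, r, o in edges:
            # F_str: combine neighbor state with relation embedding, Eq. (2)
            msgs.append((h[s] * rel_emb[r]) @ self.w_rel[r] + self.b_str)
            dst.append(o)
        dst = torch.tensor(dst)
        agg = torch.zeros_like(h).index_add(0, dst, torch.stack(msgs))
        deg = torch.zeros(h.size(0)).index_add(0, dst, torch.ones(len(edges)))
        agg = agg / deg.clamp(min=1.0).unsqueeze(-1)   # 1 / c_{o,r}^t as a simple mean
        return torch.sigmoid(agg + self.w_self(h))     # Eq. (1)
```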

4.3 Evolutionary Patterns

Besides local structural information, previous facts also influence current representations. Moreover, facts keep evolving over adjacent time windows, further changing the local structural information of the current graph snapshot. Intuitively, we should capture these two evolutionary patterns, i.e., long-range historical dependence and short-range structural dependence. To achieve this, we design an attention-based recurrent block in SiepNet to capture evolutionary patterns in TKGs. Formally, SiepNet combines the local structural representation \(h_o^{(t)}\) and the historical representation \((\textrm{h}_o^{(t-1)}, Z^{(t-1)})\):

$$\begin{aligned} \textrm{h}_o^{(t)} ,Z^{(t)}:=\mathcal {F}_{evo}(h_o^{(t)},\textrm{h}_o^{(t-1)}, Z^{(t-1)}) \end{aligned}$$
(3)

where \(\mathcal {F}_{evo}\) is a recurrent operator, which allows SiepNet to learn long-range dependencies of sequential data and explore the evolving patterns of temporal knowledge graphs to update current representations. When there are few structural dependencies from neighbor nodes (i.e., \(h_o^{(t)}\longrightarrow 0\)), the current representations \((Z^{(t)}, \textrm{h}_o^{(t)})\) are largely determined by the long- and short-range historical dependencies \((Z^{(t-1)}, \textrm{h}_{o}^{(t-1)})\). Otherwise, the local structural dependencies \(h_o^{(t)}\) have a greater impact on the current representations.

Most existing works use simple recurrent neural networks to implement \(\mathcal {F}_{evo}\) in message propagation, e.g., RE-NET [14] uses GRU [2], EvoNet [11] uses LSTM [10], etc. For historical snapshot propagation, these methods only summarize the current representations of nodes, i.e., \(Z^{(t)}=\sum _{o\in V_t} \textrm{h}_o^{(t)}\), ignoring dynamic interactions of nodes across time windows. However, both long-range historical dependence and short-range dynamic dependence carry different temporal information, influencing the evolution of facts. To improve temporal link prediction, \(\mathcal {F}_{evo}\) should consider both long-range and short-range dependencies of the previous facts \(G_{1:t}\) when modelling snapshot propagation, and thus influence current representations through the local dynamic dependence of node interactions. Specifically, \(\mathcal {F}_{evo}\) can be implemented by

$$\begin{aligned} \mathcal {F}_{evo}(h_{o}^{(t)},\textrm{h}_o^{(t-1)}, Z^{(t-1)}) =\left\{ \begin{matrix}Z^{(t)}=\textrm{RNN} \left( Z^{(t-1)},G_{t} \oplus g(\alpha _t \sum _{o\in V_{t}}\textrm{h}_{o}^{(t)})\right) ~~~~~~ \\ \\ \textrm{h}_{o}^{(t)}=\textrm{RNN} \left( (1-\alpha _t) \textrm{h}_{o}^{(t-1)},h_o^{(t)} \oplus g(\alpha _t Z^{(t-1)}) \right) \end{matrix}\right. \end{aligned}$$
(4)

where \(\oplus \) denotes the concatenation operator and \(g(*)\) is an element-wise max-pooling operator. We use a recurrent model RNN to update current representations \( \textrm{h}_{o}^{(t)}\) based on historical representation \((\textrm{h}_o^{(t-1)} ,Z^{(t-1)})\) and current local structural representation \(h_o^{(t)}\), and capture evolutionary patterns \(Z^{(t)}\) based on long-range and short-range dependencies \((Z^{(t-1)}, \textrm{h}_{o}^{(t)})\) as well as current facts \(G_t\).

Typically, the impact of long-range historical dependence and short-range structural dependence on current representations varies over time. Accordingly, we design the following temporal attention mechanism to capture temporal information in node interactions, which in turn helps to model the long-range and short-range evolutionary patterns of facts.

$$\begin{aligned} \alpha _t = \textrm{softmax}(W_{\alpha }(Z^{(t-1)}\oplus \sum _{s \in N_{o,r}^{t}} h_{s}^{(t)} )) \end{aligned}$$
(5)

where \(W_{\alpha }\) is an independent parameter matrix, updated automatically by backpropagation. The attention score \(\alpha _t\), calculated from long-range evolutionary dependencies and short-range structural dependencies, re-weights the two evolutionary patterns.

The recurrent model RNN aims to smooth the two input vectors at each time step and can be implemented with many existing methods. Here, we use GRU to update \(\textrm{h}_{o}^{(t)}\) as an example.

$$\begin{aligned} \begin{aligned} \textrm{h}_{o}^{(t)}: \left\{ \begin{array}{l} a^{(t)} = h_o^{(t)} \oplus g(\alpha _t Z^{(t-1)}) \\ i^{(t)} = \sigma (W_i a^{(t)} + U_i (1-\alpha _t) \textrm{h}_{o}^{(t-1)}) \\ r^{(t)} = \sigma (W_r a^{(t)} + U_r (1-\alpha _t) \textrm{h}_{o}^{(t-1)}) \\ \textrm{h}_{o}^{(t)} = (1-i^{(t)}) \circ (1-\alpha _t) \textrm{h}_{o}^{(t-1)} + i^{(t)} \circ \textrm{tanh}(W_h a^{(t)}+U_h(r^{(t)}\circ \textrm{h}_{o}^{(t-1)})) \end{array}\right. \end{aligned} \end{aligned}$$
(6)

where \( i^{(t)}\) and \( r^{(t)}\) are the update gate and reset gate respectively, and \(\circ \) denotes the Hadamard product. The current node representations are updated by combining their current local structural dependencies with historical evolution dependencies, with the temporal attention score regulating the weight of long-range and short-range dependencies.
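The following sketch combines Eqs. (4)-(6) for a single node. It assumes, for simplicity, that the max-pooling \(g(*)\) reduces to the identity on a single vector and that all inputs share the same dimension; the class and parameter names are illustrative rather than the exact implementation.

```python
# Sketch of the attention-gated GRU update, Eqs. (4)-(6), for one node.
import torch
import torch.nn as nn

class EvoCell(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w_alpha = nn.Linear(2 * dim, dim)                        # W_alpha, Eq. (5)
        self.w_i = nn.Linear(2 * dim, dim); self.u_i = nn.Linear(dim, dim)
        self.w_r = nn.Linear(2 * dim, dim); self.u_r = nn.Linear(dim, dim)
        self.w_h = nn.Linear(2 * dim, dim); self.u_h = nn.Linear(dim, dim)

    def forward(self, h_struct, h_prev, z_prev, neigh_sum):
        # Temporal attention re-weighting long-/short-range dependencies, Eq. (5)
        alpha = torch.softmax(self.w_alpha(torch.cat([z_prev, neigh_sum], -1)), -1)
        a = torch.cat([h_struct, alpha * z_prev], -1)                 # a^(t), with g(*) as identity here
        i = torch.sigmoid(self.w_i(a) + self.u_i((1 - alpha) * h_prev))   # update gate
        r = torch.sigmoid(self.w_r(a) + self.u_r((1 - alpha) * h_prev))   # reset gate
        h_new = (1 - i) * (1 - alpha) * h_prev + i * torch.tanh(
            self.w_h(a) + self.u_h(r * h_prev))                       # Eq. (6)
        return h_new, alpha
```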

Consequently, both the representations \(\textrm{h}_{o}^{(t)}\) and \(Z^{(t)}\) capture the evolutionary patterns and local structural dependencies up to the t-th time step, which in turn can be used to predict the facts \(G_{t+1}\) at the next time step. Then, we encode the current graph snapshot \(G_t\) as representation \(\textbf{H}_G^{(t)}\) with a fully connected layer, which can be formulated as

$$\begin{aligned} \textbf{H}_G^{(t)} = \textrm{FCL}_n(Z^{(t)}\oplus \sum _{o \in V_t} \textrm{h}_o^{(t)}; \theta _n) \end{aligned}$$
(7)

where the input is the concatenation of \(Z^{(t)}\) and the summed node representations \(\sum _{o \in V_t} \textrm{h}_o^{(t)}\), while \(\theta _n\) denotes the parameters of \(\textrm{FCL}_n\). Then we use a classifier to estimate the probability of the next graph snapshot \(\textbf{P}(G_{t+1}\mid \textbf{H}_G^{(t)} ) \).
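A minimal sketch of Eq. (7) and the subsequent classifier might look as follows; the single-layer scoring head and its output dimension over candidate facts are assumptions for illustration only.

```python
# Sketch of Eq. (7): encode the snapshot and score candidate facts for G_{t+1}.
import torch
import torch.nn as nn

class SnapshotEncoder(nn.Module):
    def __init__(self, dim, num_candidate_facts):
        super().__init__()
        self.fcl = nn.Linear(2 * dim, dim)                       # FCL_n with parameters theta_n
        self.classifier = nn.Linear(dim, num_candidate_facts)    # hypothetical scoring head

    def forward(self, z_t, node_states):
        # node_states: (num_nodes, dim) stacked h_o^(t); z_t: (dim,)
        h_g = self.fcl(torch.cat([z_t, node_states.sum(dim=0)], dim=-1))  # H_G^(t), Eq. (7)
        return torch.sigmoid(self.classifier(h_g))               # P(G_{t+1} | H_G^(t))
```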

4.4 Model Optimization

As the topology of TKGs changes over time, the SiepNet model should continuously update its parameters to accommodate the evolutionary patterns of TKGs. Furthermore, note that the snapshots closer to the next time slice \((t+1)\) share more similar characteristics with the ground truth than those farther away. Hence, we take the l graph snapshots closest to the next time slice \((t+1)\), \(G_{t-l+1}^{t+1}=\left\{ G_{t-l+1},\ G_{t-l+2},\cdots ,\ G_{t+1}\right\} \), as the input, and train by minimizing the cross-entropy loss \(\mathcal {L}\):

$$\begin{aligned} \mathcal {L} =-\sum _{\tau = (t-l)} ^{t} \hat{G}_{\tau +1} \textrm{log} \textbf{P}(G_{\tau +1}\mid \textbf{H}_G^{(\tau )} ) + (1- \hat{G}_{\tau +1}) \textrm{log} (1- \textbf{P}(G_{\tau +1}\mid \textbf{H}_G^{(\tau )} )) \end{aligned}$$
(8)

where \( \hat{G}_{\tau +1} \in \mathbb {R}^{\mid G_{\tau +1} \mid } \) is the label vector of ground truths, with elements of 1 if the corresponding fact occurs and 0 otherwise. SiepNet can thus fully aggregate the latest temporal information of the dynamic network from the sequence of previous snapshots \(G_{t-l+1}^{t+1}\), which have the characteristics most similar to the actual snapshot \(G_{t+1}\).
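A hedged sketch of the resulting training step is shown below; encode_snapshot and predict_next are placeholder methods standing in for the components described above, not the exact interface of SiepNet.

```python
# Sketch of the windowed training objective in Eq. (8).
import torch
import torch.nn.functional as F

def window_loss(model, snapshots, labels, t, l):
    """snapshots[tau]: graph at tau; labels[tau+1]: 0/1 vector over candidate facts."""
    loss = 0.0
    for tau in range(t - l, t + 1):
        h_g = model.encode_snapshot(snapshots[tau])                   # H_G^(tau)
        prob = model.predict_next(h_g)                                # P(G_{tau+1} | H_G^(tau))
        loss = loss + F.binary_cross_entropy(prob, labels[tau + 1])   # Eq. (8)
    return loss

# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # as in Sect. 5.1
```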

As in previous work on regularization, we employ dropout [9] to alleviate overfitting while capturing local structural information and evolutionary patterns.

5 Experiments

5.1 Experimental Setup

Datasets. In our experiments, we use five widely used TKG datasets, including three event-based TKGs (i.e., GDELT [18], ICEWS14 [27], and ICEWS18 [29]) and two public TKGs (i.e., WIKI [17] and YAGO [24]).

Evaluation Setting and Metrics. Following the prior work [34], we split each dataset except ICEWS14 into a training set, a validation set, and a test set at a ratio of 80%/10%/10%. For ICEWS14, we directly adopt the split provided in [27]. We report the widely used filtered setting [8, 14, 34] of Mean Reciprocal Rank (MRR) and Hits at K (Hits@K), which are standard evaluation metrics for link prediction.
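As a reference, the following short sketch computes MRR and Hits@K from the 1-based ranks of the ground-truth entities; it assumes the ranks have already been filtered of other valid answers.

```python
# Sketch of the standard link prediction metrics from filtered ranks.
def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """ranks: 1-based filtered ranks of ground-truth entities."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = {k: sum(r <= k for r in ranks) / len(ranks) for k in ks}
    return mrr, hits

print(mrr_and_hits([1, 2, 5]))  # (0.566..., {1: 0.333..., 3: 0.666..., 10: 1.0})
```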

Baselines. We compare our proposed model SiepNet with a variety of static KG models and TKG models. Static KG models include DistMult [32], R-GCN [23], ConvE [4] and RotatE [26]. TKG models include TTransE [13], TA-DistMult [5], TA-TransE [5], HyTE [3], RE-NET [14], TeMP [30], RE-GCN [20], xERTE [6], TANGO-TuckER [7], TANGO-Distmult [7], CyGNet [34], EvoKG [22] and TLogic [21].

Model Configurations. We set the length of the history l to 10, which means that SiepNet keeps the sequence of the 10 previous snapshots. The dropout rate is set to 0.5, and the embedding size is set to 200 to match the baseline settings in [34]. The model parameters are optimized using the Adam optimizer [15] with a learning rate of 0.001. The number of training epochs is set to 20, which is sufficient for convergence in most cases. All experiments are conducted on a GeForce RTX 3080 Ti. The baseline results are adopted from [33].
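For clarity, the reported hyperparameters can be collected as follows; the field names are illustrative.

```python
# Hyperparameters as reported in this subsection.
CONFIG = {
    "history_length": 10,   # number of previous snapshots l
    "dropout": 0.5,
    "embedding_dim": 200,
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "epochs": 20,
}
```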

5.2 Performance Evaluation

Overall Performance. Table 1 and Table 2 show the temporal link prediction performance of SiepNet and the baselines on five real-world TKGs, where the best results are shown in bold. We use “–” for experiments that could not be completed within one day. Remarkably, SiepNet consistently outperforms the baselines in most cases, which convincingly validates its effectiveness.

Table 1. Performance (in percentage) for temporal link prediction on YAGO and WIKI datasets under the filtered settings
Table 2. Performance (in percentage) for temporal link prediction on ICEWS14, ICEWS18 and GDELT datasets under the filtered settings

Specifically, static KG methods usually show promising results, but lag behind the best-performing TKG method SiepNet by a large margin, as they cannot capture sequential patterns across time-stamps. Surprisingly, almost all static KG methods perform better than two TKG methods (i.e., TTransE and HyTE) on the five TKG datasets. This is because TTransE and HyTE learn representations for each snapshot independently instead of capturing long-range historical dependencies. Besides, the comparison between TA-DistMult and DistMult validates the effectiveness of incorporating temporal information for temporal link prediction, as TA-DistMult is a temporal-aware version of the static KG method DistMult.

In addition, SiepNet drastically outperforms the other TKG methods, although they all consider dynamic features of facts. Especially on the YAGO dataset, which contains the most facts, SiepNet achieves improvements of 2.70% in MRR, 6.97% in Hits@1, and 5.10% in Hits@3 over the best baseline. We believe this is because SiepNet considers dynamic long-range and short-range historical dependencies using temporal attention, while the other TKG models ignore these evolutionary patterns. The strong performance of SiepNet and RE-NET validates the importance of long-range dependencies for link prediction. Although our Hits@3 results on the YAGO, WIKI, and GDELT datasets are not the best, the remarkable performance in Hits@1 and MRR shows that SiepNet predicts future facts more accurately. The main reason is that these datasets contain a large number of repetitive facts. Thus, CyGNet and EvoKG perform well on Hits@3, but they cannot predict the more precise facts, resulting in Hits@1 much lower than ours. TeMP is designed for knowledge graph completion (interpolation) rather than predicting future events, so it does not perform as well as the extrapolation models. Although xERTE offers a certain degree of predictive interpretability, it cannot efficiently handle large-scale datasets such as GDELT and WIKI.

Note that static KG models and TKG models perform similarly well on YAGO and WIKI, but poorly on ICEWS14, ICEWS18 and GDELT. As discussed in [22], the time intervals of the YAGO and WIKI datasets are much larger than those of the other datasets. Therefore, each time-stamp in YAGO and WIKI carries more local structural information than in the other three datasets. Besides, ICEWS14 and ICEWS18 are extracted from the Integrated Crisis Early Warning System (ICEWS), which records many recurring political events with time-stamps. Accordingly, modelling only repetitive patterns or 1-hop neighbors loses a significant amount of evolutionary patterns and structural information. The experimental results show that SiepNet better models these datasets, which contain complex dynamic dependencies over concurrent facts.

Performance over Time. To further evaluate the performance of SiepNet over time, we compare the performance (in percentage) at different timestamps, using filtered Hits@3 on YAGO, WIKI, and ICEWS18. As shown in Fig. 3, SiepNet consistently outperforms the baselines at different timestamps. The performance of each method varies with the entities in the test set at each timestamp. In addition, the gap between our TKG model SiepNet and the static KG model ConvE changes slowly over time, as shown in Fig. 3. We believe this is because facts further in the future are even harder to predict.

Fig. 3. Performance over specific timestamps with filtered Hits@3.

Specifically, each method shows a significant performance improvement at a particular future timestamp. We believe this is because facts from the past tend to reappear at future timestamps. As shown in Fig. 3(a), all methods perform poorly in 2016, but in 2017 they surpass their 2013 performance.

5.3 Ablation Study

To isolate the effect of different model components of SiepNet, we create variants of SiepNet by varying the use of its components and report their performance (in percentage) on the YAGO dataset.

Table 3. Ablation study for temporal link prediction

Evolutionary Patterns. To demonstrate how evolutionary patterns affect the final results of SiepNet, we conduct experiments using l random past graph snapshots rather than the l snapshots closest to the current graph snapshot. The results, denoted as SiepNet w. R, are presented in Table 3. Obviously, SiepNet w. R hurts model quality, suggesting that modelling the snapshots closest to the current time slice improves performance.

Fig. 4. Performance over different lengths of time slice with filtered MRR.

As described in Sect. 4.4, graph snapshots of adjacent time slices tend to have more similar characteristics. Thus, the length of the history l affects the performance of our proposed model SiepNet. Figure 4 shows the performance of SiepNet on the YAGO, WIKI and ICEWS18 datasets with different history lengths l for temporal link prediction. As the length of the history increases, SiepNet achieves better MRR. Nevertheless, MRR tends to stabilize once the length exceeds 6, because longer histories introduce more noise and lead to performance fluctuations of SiepNet.

Evolutionary Directions. SiepNet w. B in Table 3 indicates the variant of SiepNet using Bi-GRU instead of GRU to explore the evolving patterns of TKGs. The results of SiepNet w. B and SiepNet are similarly good on YAGO, compared with the other variants of SiepNet. Therefore, combining forward and backward snapshot information has little impact on the performance of SiepNet while adding computational overhead.

Temporal Attention. The results denoted as SiepNet w/o TA in Table 3 show the performance of SiepNet without the temporal attention component. SiepNet w/o TA performs noticeably worse than SiepNet on the YAGO dataset, which justifies the necessity of the temporal attention component for modelling long-range and short-range dependencies.

6 Conclusion

In this paper, we propose a novel temporal link prediction model, SiepNet, which adapts to the evolutionary process of dynamic facts by modelling temporally adjacent facts together with their associated semantic and informational patterns. Specifically, SiepNet explores local structural information based on a relation-aware GNN architecture. In addition, SiepNet incorporates temporal attention to model the long-range and short-range historical dependencies hidden in TKGs. The experimental results against seventeen baselines demonstrate the significant advantages and promising performance of SiepNet in temporal link prediction. In future work, we will explore the persistent modelling of facts, rather than only predicting missing facts at a certain time slice t.