
1 Introduction

Collaborative Filtering (CF) is the most popular recommendation strategy; it exploits users’ historical interactions to infer their preferences. However, CF methods usually suffer from data sparsity and the cold-start problem. Various types of side information have been incorporated to address these issues, such as social networks [7], temporal context [11] and user/item attributes [14]. Knowledge graphs (KGs) have also been widely adopted as a source of auxiliary data to enhance recommendation. A KG connects entities from different topic domains as nodes and their links as edges, which provides additional insight for recommendation.

Several state-of-the-art methods utilize KGs to boost recommendation quality. Meta-path based methods extract paths between two entities to represent different semantic relations, leveraging item-item [19], user-user [1, 10, 21], and user-item [6, 9] relations. They can generate effective recommendations by modeling user preference based on these semantic relations. However, because the extracted meta-paths rely on manually designed features based on domain knowledge, they cannot represent all semantic relations completely. KG embedding based methods [14, 16, 20] automatically learn entity embeddings to capture entity semantics and incorporate them into a recommendation framework. A major limitation of these KG embedding methods is that they are less intuitive and less effective at representing the semantic relations that connect entities. For example, Zhang et al. [20] extracted items’ semantic representations from structural, textual and visual content by capturing entity semantics via TransR, but ignored the high-order semantic relations between paired entities. Other methods therefore seek to capture the semantic relations of both entities and paths without relying on handcrafted features or domain knowledge. Sun et al. [12] employed a recurrent neural network (RNN) to learn semantic representations of both entities and high-order paths to improve recommendation.

Almost all of the above methods rely on a knowledge graph that includes information from different domains. However, information fusion and network alignment across domains are very difficult. To address the difficulty of constructing such a knowledge graph, one solution is to design a lightweight Collaborative Knowledge Graph (CKG) that only uses the facts in one domain as knowledge. A CKG typically includes the interaction behaviors of users on items and side information for items (e.g., item attributes and external knowledge) and users (e.g., age, zip code, and occupation). Wang et al. [17] proposed the Knowledge Graph Attention Network (KGAT), which explicitly models high-order connectivities in the CKG and recursively propagates the embeddings from a node’s neighbors to refine its representation, but it only considered the high-order relations between users and items. Wang Hongwei et al. [15] proposed RippleNet, which extends a user’s potential preference along links in the CKG. These methods model user preference by incorporating high-order semantic representations and relations into the recommender system, but they do not consider temporal influence. Xiao et al. [18] proposed KGTR, which captures the joint effects of interactions by defining three categories of relationships in the CKG and considers the effect of temporal context. It obtains first- and second-order semantic relations via TransE and the embeddings of users’ and items’ attributes, but it cannot learn high-order semantic relations.

Considering the limitations of existing solutions, we believe it is critical to develop a model that can effectively exploit high-order connections in the CKG and take temporal information into account. To this end, we propose a novel High-order semantic Relations-based Temporal Recommendation (HRTR) model, which captures the joint effects of high-order semantic relations in the CKG for recommendation. HRTR first mines the semantic relations of entities from connectivities of different orders. Then, it jointly learns high-quality representations of users, items, and their attributes: TransE [2] captures structural knowledge, and a recurrent neural network encodes semantic paths to explore sequence information. These representations are regarded as the users’/items’ long-term static features. Next, by splitting the users’ interactions with a time window, the users’ short-term dynamic preferences are learned by an LSTM [5]. The set of users who have recently interacted with an item is used to explore the items’ short-term features through an attention mechanism [13]. Finally, the long-term and short-term preferences of users and items are integrated to recommend an item list to a user.

We summarize our main contributions as follows:

  • We propose a joint learning model to capture high-quality representations of entities in a lightweight Collaborative Knowledge Graph, which not only can capture structural information, but also can explore sequence information by automatically encoding extracted semantic paths.

  • We seamlessly fuse high-quality representations of entities and temporal context for recommendation, which effectively captures the users’ and items’ stable long-term and short-term dynamic preferences.

  • We conduct experiments on real-world datasets, and the results show the significant superiority of HRTR over several state-of-the-art baselines.

2 Related Work

In this section, we review existing works on meta path based methods, KG embedding based methods, and semantic relation based methods, which are most related to our work.

2.1 Meta Path Based Methods

Meta path based methods capture the relations between two entities in a KG by defining meta-paths, which are predefined using handcrafted features based on domain knowledge. They generally infer user preference by leveraging the entity similarity of item-item [19], user-item [6, 9], and user-user [1, 10, 21] pairs. HeteRec [19] learned the user preference on an item connected with his/her rated items via different meta paths. SemRec [10] captured semantic similarity among users by introducing weighted meta paths. Wang et al. [1] and Zheng et al. [21] each proposed a matrix factorization model that uses user similarity derived from meta paths as regularization. SimMF [9] extended a matrix factorization based model by adding meta path based user-item similarity. These methods successfully model user preference based on semantic relations, but they rely heavily on manually designed features based on domain knowledge and cannot completely represent all semantic relations between two entities.

2.2 KG Embedding Based Methods

KG embedding based methods first learn entity embeddings by exploiting the structural information of the KG and then incorporate the learned entity embeddings into a recommendation framework. CKE, proposed by Zhang et al. [20], combined CF with item embeddings obtained via TransR [8]. DKN [16] combined entity embeddings with a CNN for news recommendation. SHINE [14] embeds three types of networks with deep autoencoders for celebrity recommendation. However, a major limitation of these KG embedding methods is that they are less intuitive and less effective at representing the semantic relations of entities.

2.3 Semantic Relation Based Methods

Another family of methods improves recommendation performance by mining high-order semantic relations or by integrating other information and strategies to obtain better representations. Sun et al. [12] employed an RNN to model the semantics of paths of different orders to characterize user preferences. Wang et al. [17] proposed the knowledge graph attention network (KGAT), which recursively propagates the embeddings from a node’s neighbors to refine its representation and discriminates the importance of neighbors with an attention mechanism. Wang Hongwei et al. [15] proposed RippleNet, which extends a user’s potential preference along links in the CKG. These methods model users’ preferences by utilizing high-order semantic representations and relations, but they do not consider temporal influence. Xiao et al. [18] proposed KGTR, which captures the joint effects of interactions by defining three categories of relationships and incorporating temporal context.

Different from these works, our proposed method not only effectively exploits the semantics of entities and high-order connectivities, but also takes the long- and short-term preferences of users and items into account.

3 Our Proposed Model

Let \(U=\{u_1,u_2,\cdots \}\) and \(V=\{v_1,v_2,\cdots \}\) denote the sets of users and items, respectively. \(M=\{M_{uv}|u\in U, v \in V\}\) is a sparse user-item interaction matrix that records the interactions between users and items, such as rating, browsing and clicking. Users and items also have various attributes, such as gender, age and occupation, which are significant auxiliary information for recommendation. We aim to build a temporal personalized recommendation model based on the semantic embeddings of users, items and their attributes, and then recommend items to users.

The overview of our proposed HRTR is shown in Fig. 1. It consists of three parts: (1) learning high-quality semantic representations of users, items and their attributes by TransE and RNN; (2) training the long- and short-term preferences of users and items, in which the learned semantic representations are considered as the long-term features, and the short-term features of users and items are captured by an LSTM and an attention mechanism, respectively, based on the learned semantic embeddings and interactions; (3) predicting how likely a user is to interact with an item by integrating the learned long- and short-term preferences into a sigmoid-based prediction model.

Fig. 1. The framework of high order semantic relations temporal recommendation

3.1 Different Order Semantic Relations Mining

Designing Collaborative Knowledge Graph. Given U, V, M and the users’/items’ attributes, we define a user-item interaction graph and a user/item attribute graph, which together form the Collaborative Knowledge Graph (CKG).

As illustrated in Fig. 2, taking movie data as an example, the users and items are treated as entities. When there is an observed interaction between user u and item i (e.g., purchases, clicks, ratings), a link is constructed between them. The user-item interaction graph \(G_1\) is denoted as \(G_1=\{(u,m_{ui},i)|u\in U, i\in V, m_{ui}\in R' \}\), where \(R'\) is the set of interactions. In addition to the interactions, users and items have different types of side information to profile them. The user/item attribute graph \(G_2\) organizes this side information in the form of a directed graph. Formally, it is presented as \(G_2=\{(h',r',t')|h'\in U\cup V ,t'\in \varPsi , r'\in \varOmega \}\), where \(\varPsi \) is the set of attribute values and \(\varOmega \) is the set of attributes, which contains canonical relations and their inverse directions. \((h',r',t')\) describes that there is a semantic relationship \(r'\) from \(h'\) to \(t'\). For example, (Tom, age, 45) states the fact that Tom’s age is 45. The Collaborative Knowledge Graph, which encodes user interactions and the side information of users and items, is then defined as a unified graph \(G=\{(h,r,t), h,t\in \varepsilon , r\in R \}\), where \(\varepsilon = U \cup V \cup \varPsi \) and \(R=R' \cup \varOmega \).
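To make the construction of the unified graph concrete, the following minimal sketch merges interaction triples and attribute triples into one triple set. The function name `build_ckg`, the `"-"` prefix for inverse relations, and the toy data are illustrative assumptions, not part of the original implementation.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def build_ckg(interactions: List[Triple], attributes: List[Triple]) -> Set[Triple]:
    """Merge G1 (interactions) and G2 (attributes) into one set of (h, r, t) triples.

    Assumption: every triple (h, r, t) also gets an inverse triple (t, "-r", h),
    mirroring the canonical-plus-inverse relation convention in the text.
    """
    ckg: Set[Triple] = set()
    for h, r, t in interactions + attributes:
        ckg.add((h, r, t))
        ckg.add((t, "-" + r, h))
    return ckg


# Toy data loosely following the movie example (values are illustrative).
interactions = [("Tom", "rating", "TT"), ("Tom", "rating", "SL"), ("Bob", "rating", "TT")]
attributes = [("Tom", "age", "45"), ("Alice", "age", "45"), ("SL", "directedby", "Steven")]
ckg = build_ckg(interactions, attributes)
```

Storing the graph as a flat set of triples keeps the entity set and the relation set implicit; they can be recovered by collecting the heads/tails and the relation labels.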

Fig. 2. Different order semantic relations mining on CKG

Different Order Semantic Relations Mining. The key to successful recommendation is to fully exploit the high-order relations in the CKG, that is, to learn entity embeddings from first-order, second-order, or even higher-order semantic relations. Formally, we define the L-order relations between nodes as a multi-hop relation path: \(e_{0} \!\xrightarrow {r_1} e_1 \!\xrightarrow {r_2}\cdots \!\xrightarrow {r_L}e_L\), where \(e_l\in \varepsilon \) and \(r_l \in R\), \((e_{l-1}, r_l, e_l)\) is the \(l\)-th triplet, and L is the length of the relation path. We can then denote the semantic relation paths of different lengths l that reach any node from \(e_0\). As shown in Fig. 2, for the entity “Tom”, we can exploit the first-order semantic relations \(Tom\!\xrightarrow {age}45\), \(Tom\!\xrightarrow {occupation}teacher\), \(Tom\!\xrightarrow {rating} The Terminal (TT)\) and \(Tom\!\xrightarrow {rating} Schindler's List (SL)\), which represent the attributes of Tom and his rating activities, respectively. They can be easily extended to second-order semantic relations, which contain richer semantics. For example, \(Tom\!\xrightarrow {age}45\!\xrightarrow {-age}Alice\), \(Tom\!\xrightarrow {occupation}teacher\!\xrightarrow {-occupation}Alice\), \(Tom\!\xrightarrow {rating} TT\!\xrightarrow {-rating}Bob\) and \(Tom\!\xrightarrow {rating} SL\!\xrightarrow {directedby}Steven\) indicate semantic relations between Tom and Alice, Bob and Steven that rely on common attributes, ratings of the same item TT, or a relationship involving SL. However, exploiting such high-order relations raises two challenges: 1) the number of semantic paths increases dramatically with the order, which leads to more computation in training, and 2) relations of different orders are of different importance to recommendation, which requires the model to handle them carefully.

Generally, shorter semantic paths indicate stronger relations, while longer ones may represent more semantic relations. To increase model efficiency, we only consider semantic paths whose length is less than a threshold and that start from a user or item entity.
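The length-bounded path mining described above can be sketched as a depth-first enumeration over the triples. This is only an illustration: the function name `mine_paths`, the inclusive `max_len` bound, and the rule that an entity is not revisited within a path are assumptions made here, and the toy triples are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def mine_paths(triples: List[Triple], start: str, max_len: int) -> List[List[str]]:
    """Enumerate relation paths of length 1..max_len starting from `start`.

    A path is stored as [e0, r1, e1, r2, e2, ...]. Entities are not revisited
    within a path (an assumption, to avoid trivial back-and-forth hops).
    """
    adj: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for h, r, t in triples:
        adj[h].append((r, t))

    paths: List[List[str]] = []

    def extend(path: List[str], visited: Set[str]) -> None:
        if (len(path) - 1) // 2 == max_len:  # number of hops so far
            return
        for r, t in adj[path[-1]]:
            if t in visited:
                continue
            new_path = path + [r, t]
            paths.append(new_path)
            extend(new_path, visited | {t})

    extend([start], {start})
    return paths


triples = [
    ("Tom", "rating", "TT"), ("TT", "-rating", "Tom"),
    ("Bob", "rating", "TT"), ("TT", "-rating", "Bob"),
    ("Tom", "age", "45"), ("45", "-age", "Tom"),
    ("Alice", "age", "45"), ("45", "-age", "Alice"),
]
paths = mine_paths(triples, "Tom", max_len=2)
```

Because the number of paths grows quickly with the bound, a small `max_len` keeps the enumeration tractable, which matches the efficiency argument in the text.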

3.2 Semantic Relation Learning

We aim to parameterize entities and relations as vector representations to improve recommendation, which not only learns the structural information, but also the sequence information of semantic relations. Here we employ TransE  [2], a widely used method, on CKG to capture this structural knowledge. Sequence information of semantic paths is exploited by adopting RNN.

Structural Embedding. To capture the structural information, TransE is used to optimize the probability \(P(h,r,t)\) of the relational triples \((h,r,t)\) that exist in the graph. The probability \(P(h,r,t)\) is formalized as follows:

$$\begin{aligned} L_{SE}=P(h,r,t) =\sum _{(h,r,t^{+})\in CKG} \sum _{(h,r,t^{-})\in CKG^{-}}\sigma (g(h,r,t^{+}) - g(h,r,t^{-})) \end{aligned}$$
(1)

where \(\sigma (x)=1 /(1 + \exp (x))\), i.e., the logistic sigmoid evaluated at \(-x\). \(CKG\) and \(CKG^{-}\) are the sets of positive and negative instances, respectively. \(g(\cdot )\) is the energy function, which represents the correlation from h to t under the relation r; its score is lower if the triplet is more likely to be true. Here, we define \(g(h,r,t)\) as follows:

$$\begin{aligned} g(h,r,t)=||e_h + e_r-e_t||_{L_{1}/L_{2}} + b_{1} \end{aligned}$$
(2)

where \(e_h, e_r, e_t\) are the embeddings of h, r and t, and \(b_1\) is a bias constant. The relations of entities are modeled through the triples, which injects the direct connections into the embeddings and increases the representation ability of the model.
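As a small worked illustration of Eqs. (1) and (2), the sketch below evaluates the energy of a positive and a corrupted triple and the resulting pairwise term. The embeddings are hand-picked 2-dimensional values rather than learned ones, and the helper names `energy` and `pair_term` are hypothetical.

```python
import math
from typing import List


def energy(e_h: List[float], e_r: List[float], e_t: List[float],
           b1: float = 0.0, norm: int = 1) -> float:
    """g(h, r, t) = ||e_h + e_r - e_t|| (L1 or L2 norm) + b1, following Eq. (2)."""
    diff = [h + r - t for h, r, t in zip(e_h, e_r, e_t)]
    if norm == 1:
        dist = sum(abs(d) for d in diff)
    elif norm == 2:
        dist = sum(d * d for d in diff) ** 0.5
    else:
        raise ValueError("norm must be 1 or 2")
    return dist + b1


def pair_term(g_pos: float, g_neg: float) -> float:
    """One summand of Eq. (1), using sigma(x) = 1 / (1 + exp(x)) as defined in the text."""
    return 1.0 / (1.0 + math.exp(g_pos - g_neg))


# Hand-picked embeddings (illustrative, not learned).
tom, age, forty_five, teacher = [0.1, 0.2], [0.4, 0.1], [0.5, 0.3], [0.9, -0.4]
g_pos = energy(tom, age, forty_five)  # tail matches the translation e_h + e_r
g_neg = energy(tom, age, teacher)     # corrupted tail
p = pair_term(g_pos, g_neg)
```

Here the true triple has an energy close to zero while the corrupted one does not, so the pairwise term is above 0.5; the bias \(b_1\) cancels in the difference of the two energies.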

Sequence Embedding. Structural embedding can capture entity semantics and the semantic relations between entities, but it cannot learn the semantic relations of high-order paths. By regarding the entities in a high-order semantic path as a sequence, recurrent neural networks are a natural fit for modeling semantic paths of different orders, mainly because they can model sequences of various lengths. To this end, we adopt an RNN to learn the semantics of entities by encoding semantic paths of different lengths, and then a pooling operation is used to obtain the final semantic representation.

Assume there are n paths of different lengths from a user \(u_i\) to any other entity \(e_j\), i.e., \(p_l=e_0\!\xrightarrow {r_1}e_1\!\xrightarrow {r_2}\cdots \!\xrightarrow {r_T}e_T\) with \(e_0=u_i\). The RNN learns a representation \(h_{lt}\) for each entity \(e_t\) in \(p_l\), which considers both the embeddings of the entities in the path and the order of these entities. It encodes the sequence from the beginning entity of the path \(e_0\) to the subsequent entity \(e_t\). For entity \(e_t\):

$$\begin{aligned} O_{lt}=\delta (W\cdot O_{l(t-1)}+ H\cdot h_{lt}+b_2) \end{aligned}$$
(3)

where W is the linear transformation parameter for the previous step and H is that for the current step; \(b_2\) is the bias term; \(\delta \) is the sigmoid function. \(O_{l(t-1)}\) is the hidden state learned by encoding the subsequence from \(e_0\) to \(e_{t-1}\), and \(O_{lt}\) is the hidden state after learning the embedding \(h_{lt}\) at step t. For n paths from a user entity \(u_i\), their last representations are \(O_{1T_1},O_{2T_2}\cdots O_{nT_n}\), where \(T_n\) is the length of \(p_n\). Based on this, we obtain the entity representation \(O[u_i]\) by applying a max pooling or an average pooling operation over all n paths. Similarly, we can obtain the representation \(O[v_j]\) of item \(v_j\). The objective function can then be defined as:

$$\begin{aligned} L_{SP} =\sum _{(u_i,v_j)\in CKG^{+}}-\ln \delta (\hat{y}(u_i,v_j)-y(u_i,v_j)) \end{aligned}$$
(4)

where the probability \(\hat{y}(u_i,v_j)=\delta (O[u_i]^T O[v_j])\) is predicted by taking the inner product of the user and item representations, \(CKG^{+}\) is the set of positive instances, and \(\delta (\cdot )\) is the sigmoid function.
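The recurrence in Eq. (3) and the pooling over paths can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: the initial hidden state is taken to be zero, the parameters `W`, `H`, `b2` are toy values rather than learned ones, and the helper names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def encode_path(path: List[Vector], W: Matrix, H: Matrix, b2: Vector) -> Vector:
    """Apply Eq. (3) step by step over one path and return the last hidden state."""
    state = [0.0] * len(b2)  # zero initial state (an assumption)
    for h_t in path:
        prev, cur = matvec(W, state), matvec(H, h_t)
        state = [sigmoid(p + c + b) for p, c, b in zip(prev, cur, b2)]
    return state


def pool(states: List[Vector], mode: str = "max") -> Vector:
    """Element-wise max or average pooling over the last states of n paths."""
    if mode == "max":
        return [max(col) for col in zip(*states)]
    return [sum(col) / len(col) for col in zip(*states)]


# Toy 2-d parameters and two paths of different lengths (illustrative values).
W = [[0.5, 0.0], [0.0, 0.5]]
H = [[1.0, 0.0], [0.0, 1.0]]
b2 = [0.0, 0.0]
paths = [
    [[0.1, 0.2], [0.3, -0.1]],
    [[0.1, 0.2], [0.0, 0.4], [-0.2, 0.5]],
]
last_states = [encode_path(p, W, H, b2) for p in paths]
O_u = pool(last_states, "max")
```

Pooling makes the entity representation independent of how many paths were mined for that entity, so users with few and many paths end up with vectors of the same size.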

Finally, we have the objective function to jointly learn Eqs. (1) and (4), as follows:

$$\begin{aligned} L =L_{SE}+L_{SP} \end{aligned}$$
(5)

We optimize \(L_{SE}\) and \(L_{SP}\) alternately. Specifically, the representations of all nodes are first updated by randomly sampling a batch of instances \((h,r,t,t')\); then we randomly sample some users or items, mine the semantic paths starting from them, and update the representations of all nodes. In this way we obtain the embeddings of users, items and their attributes, \(U_L\), \(V_L\), \(U_a\), \(V_a\), which are regarded as the long-term preferences of users and items for temporal recommendation.

3.3 Training Long-Short Term Preference of Users and Items

Users Long-Short Preference. A user’s preference is composed of a long-term and a short-term preference. The long-term preference indicates the user’s stable interest, which is represented by the semantic representations of the user’s interacted items and their attributes. The short-term preference indicates the user’s dynamic interest, which is learned by an LSTM here. The size of the time window t is the key issue when modeling the user’s dynamic preference. A smaller time window captures more fine-grained interest changes, but the training data becomes very sparse and the learning process is difficult. Conversely, a larger time window provides sufficient training data, but the model is less adaptive in capturing dynamic changes of a user’s preference. To this end, we use the latest n items to model the user’s short-term preference, which ensures enough training data. Instead of feeding the user’s interaction history into the LSTM as a raw item sequence, the learned semantic representations of the interacted items and their attributes are used as pre-trained input to the LSTM. This makes training faster and more effective. Finally, the output of the LSTM, \(U_S\), is taken as the user’s short-term preference.

Items Long-Short Preference. Similar to the user preference, the item preference is also made up of two parts. The learned semantic representations of items and their attributes are regarded as their long-term preferences. Their short-term features are determined by how their popularity changes over time. We assume that the currently most fashionable items contribute more to user preference. Here, we adopt an attention mechanism to capture the short-term characteristics of items because of its capability of keeping contextual sequential information and exploiting the relationships between items. The items recently viewed by all users are used as the attention input. Similar to [13], the attention vector for items \((1,2,\cdots I)\) is calculated by using Eq. (6) at each output time t.

$$\begin{aligned} \begin{aligned} V'_{s} =\sum (\delta (z^{T}tanh(W_{c} c_{t}+ W_{y} y_{i}))y_{i}) \end{aligned} \end{aligned}$$
(6)

where z, \(W_{c}, W_{y}\) are learnable parameters, \(c_{t}\) is the training item at time t and \( y_{i}\) is i-th item in input sequence. \(\delta (\cdot )\) is a sigmoid function. Lastly, \(c_{t}\) and \(V'_{s}\) are concatenated as the next input \(c_{t+1}\). The final output \(V_{s}\) can be regarded as items’ dynamic preferences.
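A minimal sketch of the attention step in Eq. (6) follows, assuming that the sum runs over the items \(y_i\) of the input sequence. The function name `attend`, the identity weight matrices and the toy vectors are hypothetical and only serve to show the shape of the computation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def attend(c_t: Vector, ys: List[Vector], z: Vector, W_c: Matrix, W_y: Matrix) -> Vector:
    """V'_s = sum_i delta(z^T tanh(W_c c_t + W_y y_i)) * y_i, following Eq. (6)."""
    wc = matvec(W_c, c_t)
    out = [0.0] * len(ys[0])
    for y in ys:
        hidden = [math.tanh(a + b) for a, b in zip(wc, matvec(W_y, y))]
        logit = sum(a * b for a, b in zip(z, hidden))
        weight = 1.0 / (1.0 + math.exp(-logit))  # sigmoid gate in (0, 1)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Toy inputs (illustrative values, identity weight matrices).
identity = [[1.0, 0.0], [0.0, 1.0]]
c_t = [0.2, 0.1]
ys = [[1.0, 0.0], [0.0, 2.0]]
V_s = attend(c_t, ys, [1.0, -1.0], identity, identity)
```

Note that the weights come from a sigmoid rather than a softmax, so they are not normalized to sum to one; each input item is gated independently.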

3.4 Recommending Items to Users

Our task is to predict the items that a user is likely to prefer, given the long- and short-term preferences of users, items and their attributes. These can be concatenated into a single vector as the input of a standard multi-layer perceptron (MLP), as follows:

$$\begin{aligned} \begin{aligned} U_P&=U_L\Vert U_a\Vert U_S\\ V_P&=V_L\Vert V_a\Vert V_S \end{aligned} \end{aligned}$$
(7)

where \(\Vert \) is the concatenation operation. \(\hat{y}_{uv}\) represents the probability that user u interacts with item v, and is given by Eq. (8):

$$\begin{aligned} \begin{aligned} \hat{y}_{uv}&=\sigma (h^{T}O_L) \end{aligned} \end{aligned}$$
(8)

where \(O_L\) is the output of the MLP. For the \(l\)-th layer, \(O_l\) is defined as Eq. (9):

$$\begin{aligned} O_l=\phi _l(\lambda _l O_{l-1}+\vartheta _l) \end{aligned}$$
(9)

where \(\phi _l\), \(\lambda _l\), \(\vartheta _l\) are the ReLU activation function, weight matrix and bias vector of the l-th layer’s perceptron, respectively. \(O_{l-1}\) is the output of the \((l-1)\)-th layer of the MLP. \(U_P\) and \(V_P\) are the inputs of the input layer.
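The prediction step can be sketched as a plain forward pass: each layer applies a weight matrix, a bias and ReLU to the previous layer's output, and a final sigmoid produces \(\hat{y}_{uv}\). The sketch assumes that \(U_P\) and \(V_P\) are joined into one input vector; the function name `mlp_predict` and all parameter values are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mlp_predict(U_P: Vector, V_P: Vector,
                layers: List[Tuple[Matrix, Vector]], h: Vector) -> float:
    """Forward pass: ReLU layers on the joined input, then sigmoid(h^T O_L)."""
    out = U_P + V_P  # joined user and item vectors as the input layer (assumption)
    for lam, theta in layers:  # weight matrix and bias vector of each layer
        pre = [a + b for a, b in zip(matvec(lam, out), theta)]
        out = [max(0.0, x) for x in pre]  # ReLU
    logit = sum(a * b for a, b in zip(h, out))
    return 1.0 / (1.0 + math.exp(-logit))


# Toy parameters: one hidden layer with two units (illustrative values).
U_P = [1.0, 0.0]
V_P = [0.0, 1.0]
layers = [([[1.0, 0.0, 0.0, 1.0], [-1.0, 0.0, 0.0, -1.0]], [0.0, 0.0])]
h = [0.5, 0.5]
y_hat = mlp_predict(U_P, V_P, layers, h)
```

In this toy setting the first hidden unit fires and the second is clipped to zero by ReLU, so the prediction equals the sigmoid of 1.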

We treat \(y_{uv}\) as label, which represents the actual interaction. 1 means user u has interacted with item v, and 0 otherwise. Therefore, the likelihood function is defined as Eq. (10):

$$\begin{aligned} p(y,y^{-}|\varTheta _{f}) =\prod _{(u,v)\in y }\hat{y}_{uv}\prod _{(u,v)\in y^{-} }(1-\hat{y}_{uv}) \end{aligned}$$
(10)

Taking the negative logarithm of the likelihood, we obtain the objective function as Eq. (11):

$$\begin{aligned} {\begin{matrix} L&{}=-\sum _{(u,v)\in y }\log \hat{y}_{uv}-\sum _{(u,v)\in y^{-} }\log (1-\hat{y}_{uv}) \\ &{}=-\sum _{(u,v)\in y\cup y^{-} } \left[ y_{uv}\log \hat{y}_{uv}+(1-{y}_{uv})\log (1-\hat{y}_{uv})\right] \end{matrix}} \end{aligned}$$
(11)

where \(y^{-}\) is the set of negative instances, which is uniformly sampled from unobserved interactions with a sampling ratio relative to the number of observed interactions. The output of each neuron is constrained to [0,1] by the sigmoid function. Learning stops when the outputs are close to either 0 or 1.
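The two forms of the objective in Eq. (11) can be checked numerically. The sketch below computes the loss once from separate positive and negative sets and once from labeled pairs; the function names and the prediction values are illustrative assumptions, and predictions are taken to lie strictly between 0 and 1 so that the logarithms are defined.

```python
import math
from typing import List, Tuple


def nll(pos: List[float], neg: List[float]) -> float:
    """First line of Eq. (11): separate sums over positive and negative instances."""
    return -sum(math.log(p) for p in pos) - sum(math.log(1.0 - p) for p in neg)


def bce(labeled: List[Tuple[int, float]]) -> float:
    """Second line of Eq. (11): one sum over labeled (y, y_hat) pairs."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1.0 - p) for y, p in labeled)


# Illustrative predictions, all strictly inside (0, 1).
pos_scores = [0.9, 0.7]
neg_scores = [0.2, 0.4]
loss_a = nll(pos_scores, neg_scores)
loss_b = bce([(1, p) for p in pos_scores] + [(0, p) for p in neg_scores])
```

Both computations give the same value, which is the usual binary cross-entropy (log loss) summed over the sampled instances.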

We adopt an adaptive gradient algorithm to optimize our model, which automatically adapts the step size and thus reduces the effort of learning-rate tuning. In the recommendation stage, candidate items are ranked in descending order of the predicted probability, and we recommend the top-ranked items to users.

4 Experiments

In this section, we perform experiments to evaluate HRTR. We first introduce experimental setup, including the datasets, baselines, evaluation metrics and parameter settings, and then present the experiment results against the related baselines.

4.1 Experimental Setup

Dataset Description. To demonstrate the effectiveness of HRTR, we conduct experiments on two public datasets. The first is MovieLens-1M, which consists of 6,040 users, 3,952 items and approximately 1M explicit ratings. Besides the user-item ratings, it also includes auxiliary information about users and items, such as age, occupation, zip code, genre, title, director, etc. Ratings ranging from 1 to 5 are transformed into either 1 or 0, where 1 indicates that a user has rated an item, and 0 otherwise. The second is Yelp, which contains 4,700,000 reviews, 156,000 businesses and 110,000 users. Here we consider businesses, for example movie theaters, as items. We set the interaction threshold to 10, i.e., each retained user has at least 10 interactions.

For each user, his/her interactions are first sorted by interaction time; the latest one is regarded as the test positive instance and the others are used as positive instances for training. Finally, we randomly sample four negative instances per positive one, and randomly sample 99 unrated items as the test negative instances.
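The leave-one-out split and the negative sampling described above can be sketched as follows. The `(user, item, timestamp)` event format, the function names, the fixed random seed and the choice to draw distinct negatives are assumptions made for illustration.

```python
import random
from typing import Dict, List, Set, Tuple

Event = Tuple[str, str, int]  # (user, item, timestamp), an assumed format


def leave_one_out(events: List[Event]) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Sort each user's interactions by time and hold out the latest one for testing."""
    by_user: Dict[str, List[Tuple[int, str]]] = {}
    for user, item, ts in events:
        by_user.setdefault(user, []).append((ts, item))
    train: Dict[str, List[str]] = {}
    test: Dict[str, str] = {}
    for user, seq in by_user.items():
        seq.sort()
        items = [item for _, item in seq]
        train[user], test[user] = items[:-1], items[-1]
    return train, test


def sample_negatives(seen: Set[str], all_items: List[str], k: int,
                     rng: random.Random) -> List[str]:
    """Draw k distinct items the user has not interacted with."""
    candidates = [i for i in all_items if i not in seen]
    return rng.sample(candidates, k)


all_items = [f"i{n}" for n in range(10)]
events = [("u1", "i1", 2), ("u1", "i0", 1), ("u1", "i2", 3)]
train, test = leave_one_out(events)
seen = set(train["u1"]) | {test["u1"]}
negs = sample_negatives(seen, all_items, k=4, rng=random.Random(0))
```

For the test negatives the same helper would be called with `k=99`, provided the item catalogue is large enough.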

Evaluation Metrics. Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) are used to evaluate the performance of a ranked list  [4]. The HR intuitively measures whether the recommendation list includes the test item. The NDCG measures the ranking of the test item in top-K list. We calculate HR and NDCG for each test user and take the average score as the final results.
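For the setting with a single held-out test item per user, HR@K and NDCG@K reduce to simple expressions. The sketch below uses a common formulation for this setting (assumed here): HR@K is 1 if the test item appears in the top K, and NDCG@K is \(1/\log _2(r+2)\) for a test item at 0-based position r. The function name and the scores are illustrative.

```python
import math
from typing import Dict, Tuple


def hr_ndcg_at_k(scores: Dict[str, float], test_item: str, k: int) -> Tuple[float, float]:
    """Return (HR@k, NDCG@k) for one user with a single held-out test item."""
    # Rank candidates by predicted score, highest first, and keep the top k.
    ranked = sorted(scores, key=lambda item: scores[item], reverse=True)[:k]
    if test_item not in ranked:
        return 0.0, 0.0
    rank = ranked.index(test_item)  # 0-based position in the top-k list
    return 1.0, 1.0 / math.log2(rank + 2)


# Illustrative scores for three candidate items.
scores = {"a": 0.9, "b": 0.8, "c": 0.1}
hr2, ndcg2 = hr_ndcg_at_k(scores, "b", k=2)
hr1, ndcg1 = hr_ndcg_at_k(scores, "b", k=1)
```

Averaging these per-user values over all test users gives the reported HR@K and NDCG@K.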

Baselines. To validate the effectiveness of our proposed HRTR, we compare it with the following state-of-the-art baselines:

  • NCF  [3]: It uses a multi-layer perceptron replacing the inner product to learn the user-item interactions.

  • MCR  [6]: It is a meta path based model, which extracts qualified meta paths as similarity between a user and an item.

  • CKE  [20]: It is a collaborative KG embedding based method, which learns item latent representations by combining structural, textual and visual information in a unified framework.

  • KGTR  [18]: It is a semantic relation plus temporal method, which defines three relationships in CKG to express interactions for recommendation.

Parameter Settings. For structural embedding training, the embedding size is fixed to 100, the hyperparameter \(b_1\) is set to 7, and \(L_{1}\) is taken as the distance metric. For sequence embedding training, the threshold of the longest semantic path length is set to 6; a longer path hardly improves performance but brings heavier computational overhead. We implement HRTR in Python based on the Keras framework and employ mini-batch Adam to optimize it. For the MovieLens dataset, 16 items are selected as the input of the LSTM for each user to learn his/her short-term preference. For Yelp, 8 items are selected. We select the items that users interacted with in the latest hour as the input of the attention mechanism to learn the items’ short-term features.

We find the other optimal parameters for HRTR experimentally, using HR and NDCG as metrics. We apply a grid search to find the best hyperparameter values: the dimension of the representation vector d is tuned in {50, 100, 150, 200}, and the batch size s is searched in {128, 256, 512, 1024, 2048}. Due to space limitations and because the trends are the same, only the results on MovieLens are shown in Fig. 3. From Fig. 3 we can see that HR@10 and NDCG@10 first increase and then decrease as d increases; the performance of HRTR is best when \(d=100\). As s increases, performance increases rapidly and then tends to be stable, and the best performance is obtained when the batch size is set to 2048. So we set \(s=2048\) and \(d=100\) as the optimal parameters for MovieLens, and \(s=2048\) and \(d=150\) for Yelp. The parameters of the baselines are set to their recommended values.

Fig. 3. The performance of our HRTR with different dimensions and batch sizes

4.2 Results and Analysis

Results Analysis of HRTR. Here, we report the performance of HRTR for top-k recommendation on MovieLens, where k is tuned in {5, 10, 15, 20, 25, 30}. First, the batch size is set to 2048 and the dimension is tuned in {50, 100, 150, 200}; the results are shown in Fig. 4. Some interesting observations can be made from Fig. 4. As k increases, HR@k and NDCG@k improve rapidly and then tend to be stable. In general, HR@k and NDCG@k are best when \(d=100\), although the difference is very slight. This result is consistent with the analysis in the parameter settings and shows that HRTR is not sensitive to the vector dimension.

As shown in Fig. 5, we also tested top-k item recommendation with the vector dimension fixed to 100 and the batch size searched in {512, 1024, 2048}. We can observe that HR@k and NDCG@k increase as k varies from 5 to 30. Both HR@k and NDCG@k improve as the batch size becomes larger, and the improvement is more obvious for NDCG@k. Because the trends are the same, the results on Yelp are not described in detail.

Fig. 4. HR@K and NDCG@K results with different dimensions

Fig. 5. HR@K and NDCG@K results with different batch size

Comparing HRTR with Baselines. Table 1 summarizes the performance of all methods on the two datasets, with the best performance in boldface. MCR, which does not consider temporal information, and CKE, which does not employ semantic paths, perform poorly compared with the other methods. This confirms that semantic paths and temporal context are useful for providing better recommendations. MCR, however, clearly outperforms the other models when the ranking list is short. This is mainly because MCR exploits the entity relations in the KG by introducing meta paths that rely on domain knowledge, which is an advantage when a shorter recommendation list is required. The performance of KGTR is closest to that of HRTR, because both consider semantic relations together with temporal context. HRTR is still superior to KGTR, mainly because HRTR can capture high-order semantic relations whereas KGTR cannot. On the sparser Yelp dataset, HRTR achieves significantly better performance than the other methods, which shows that HRTR is more suitable for sparse data.

Table 1. Performance of all comparison methods across all the evaluation metrics.

5 Conclusions

In this paper, we proposed a High-order semantic Relations-based Temporal Recommendation model (HRTR) that explores the joint effects of different semantic relations in the CKG and temporal context. HRTR overcomes the limitations of existing KG-aware methods by jointly learning semantic relations of different orders between entities, which not only captures structural information but also explores sequence information in the CKG. HRTR exploits the long- and short-term features of both users and items, which capture their stable and temporally dynamic preferences. Extensive experiments were conducted on real datasets, and the experimental results demonstrated the significant superiority of HRTR over state-of-the-art baselines. In the future, we plan to design an effective unified model that simultaneously explores the structural and sequence information to improve performance.