
1 Introduction

Knowledge graphs (KGs) have proven to be effective in improving recommendation performance [7, 16]. According to [7], KG-based recommendation methods fall into three categories: path-based methods, embedding-based methods, and unified methods. Path-based methods make recommendations by building a KG that contains users, items, and user-item interactions, and then exploiting connectivity patterns between the entities (users or items) in the KG. Traditional meta-path based methods use the semantic similarity of entities along different meta-paths [18] as graph regularization to refine the representations of users and items [7]. However, such methods rely heavily on handcrafted meta-paths, which in turn require domain knowledge [14].

To overcome the limitations of meta-path based methods, deep neural network based methods have recently been devised to automatically mine the connectivity patterns between entities (i.e., path embeddings) in the KG. Path representations are learned by extracting path features from these connectivity patterns to characterize user preferences towards items, and are finally used to generate recommendations.

However, existing deep neural network based methods, such as the Recurrent Knowledge Graph Embedding (RKGE) approach [14], usually use only one type of neural network to encode path embeddings. This cannot fully extract path features, which limits the achievable recommendation performance. Recently proposed deep hybrid models, such as [12], combine several neural building blocks to form a more powerful recommendation model. To the best of our knowledge, however, existing deep hybrid models seldom use KGs for recommendation.

To overcome the weaknesses of existing methods, in this paper we propose a Deep Hybrid Knowledge Graph Embedding (DHKGE) method for top-N recommendation. DHKGE encodes the embeddings of paths between users and items in the recommender system by combining a convolutional neural network (CNN) and a long short-term memory (LSTM) network. It further uses an attention mechanism to aggregate the encoded path representations into a final hidden state vector. This vector is then used to calculate the proximity between the target user and candidate items, and a top-N recommendation list is generated for the user by ranking these proximity scores.

In summary, the main contributions of this paper are as follows:

  • We propose the Deep Hybrid Knowledge Graph Embedding (DHKGE) method for top-N recommendation, which exploits a deep hybrid model to encode the paths between users and items.

  • We propose to use the attention mechanism to distinguish the importance of multiple semantic paths between a user-item pair, so that salient paths play a greater role in modeling user preferences.

  • We evaluated our method on the MovieLens 100K and Yelp datasets. The experimental results show that our method overall outperforms RKGE and several typical recommendation methods in terms of Precision@N, MRR@N, and NDCG@N.

2 Related Work

2.1 Path-Based Recommendation Methods

Path-based methods make recommendations by building user-item graphs and exploiting connectivity patterns between the entities in the graph [7]. Traditional meta-path based methods rely heavily on handcrafted meta-paths. Deep neural network based methods can automatically mine the connectivity patterns between entities in the graph, thereby improving recommendation performance. For example, Hu et al. [9] proposed to leverage meta-path based context for top-N recommendation with a neural co-attention model. Sun et al. [14] proposed the RKGE approach, which employs a recurrent neural network (RNN) to learn high-quality representations of both users and items that are then used to generate better recommendations. Wang et al. [15] proposed the Knowledge-aware Path Recurrent Network (KPRN), which exploits the KG to generate better recommendations; the path embeddings in the KG are encoded with an LSTM.

Existing path-based recommendation methods usually use only one type of neural network to encode path embeddings. In contrast, our DHKGE exploits a deep hybrid model for this purpose, which generates a more comprehensive path representation and thus better recommendations.

2.2 Deep Neural Network-Based Recommendation

Deep neural networks have been widely used in recommender systems. The existing recommendation models can be divided into two categories: recommendation with neural building blocks and recommendation with deep hybrid models [4, 20].

In the first category, recommendation models are further divided into subcategories [20] according to the deep learning building block they exploit: CNN, recurrent neural network (RNN), attentional model (AM), etc. For example, Kim et al. [10] proposed a context-aware recommendation model named convolutional matrix factorization (ConvMF) that integrates a CNN into probabilistic matrix factorization.

Recently, researchers have proposed deep hybrid models, which combine several neural building blocks that complement one another to form a more powerful recommendation model [20]. For instance, Lee et al. [12] proposed a deep learning recommender system that combines RNN and CNN to learn the semantic representation of each utterance and build a sequence model of the dialog thread. To the best of our knowledge, existing deep hybrid models seldom use KGs for recommendation.

3 DHKGE: Deep Hybrid KG Embedding Method

In this section, we describe our DHKGE method in detail. After introducing the necessary concepts and notation, we briefly explain its overall framework, then describe its main components, and finally present model learning and recommendation generation.

Given a user set \( {\mathcal{U}} = \{ u_{1}, u_{2}, \ldots, u_{m} \} \) and an item set \( {\mathcal{V}} = \{ v_{1}, v_{2}, \ldots, v_{n} \} \) of the recommender system, we construct the users’ implicit feedback matrix \( {\mathbf{R}} \in {\mathbb{R}}^{m \times n} \), where each element is defined as follows: \( r_{ij} = 1 \) if user \( u_{i} \) has interacted with item \( v_{j} \), indicating that the user prefers the item, and \( r_{ij} = 0 \) otherwise. Based on the matrix \( {\mathbf{R}} \) and an external knowledge source (e.g., the IMDB dataset) that describes the items, we build a KG for recommendation. In the movie recommendation domain, for example, the KG contains the users, the items, and the users’ preferences for the items, together with the item descriptions extracted from the knowledge source, such as actors, directors, and genres (as entities), as well as rating, categorizing, acting, and directing (as entity relations). We refer to all objects in the KG (e.g., users, items, actors, directors, and genres), as opposed to the relations, as entities. The definition of the KG [14] is given below.
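To make the construction concrete, a minimal sketch of assembling \( {\mathbf{R}} \) and the KG triples is shown below; the interaction log, entity names, and relation names are hypothetical illustrations rather than the actual datasets.

```python
import numpy as np

# Hypothetical interaction log: (user index, item index) pairs.
interactions = [(0, 2), (0, 5), (1, 2), (3, 4)]
m, n = 4, 6  # number of users and items in this toy example

# Implicit feedback matrix R: r_ij = 1 if user i interacted with item j, else 0.
R = np.zeros((m, n), dtype=np.int8)
for i, j in interactions:
    R[i, j] = 1

# KG triples (head entity, relation, tail entity) combining the user feedback
# with item descriptions from an external source such as IMDB.
kg_triples = [
    ("user_0", "rate", "movie_2"),
    ("movie_2", "directed_by", "director_7"),
    ("movie_2", "has_genre", "genre_comedy"),
]
```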

Definition 1 (Knowledge Graph).

KG is defined as a directed graph \( {\mathcal{G}} = ({\mathcal{E}}, {\mathcal{L}}) \), where \( {\mathcal{E}} = \{ e_{1}, e_{2}, \ldots, e_{|{\mathcal{E}}|} \} \) denotes the set of entities and \( {\mathcal{L}} \) the set of links. An entity type mapping function \( \phi :{\mathcal{E}} \to {\mathcal{A}} \) and a link type mapping function \( \varphi :{\mathcal{L}} \to {\mathcal{R}} \) are defined for the graph. Each entity \( e \in {\mathcal{E}} \) belongs to an entity type \( \phi (e) \in {\mathcal{A}} \), and each link \( l \in {\mathcal{L}} \) belongs to a link type (relation) \( \varphi (l) \in {\mathcal{R}} \).

Based on the KG definition, we further define the set of connected semantic paths between an entity pair \( (e_{i}, e_{j}) \) as \( {\mathcal{P}}(e_{i}, e_{j}) = \{ p_{1}, p_{2}, \ldots, p_{s} \} \), with \( s \) being the number of paths. A semantic path of length \( T \) in \( {\mathcal{P}} \) is denoted as \( p = e_{i} \mathop{\longrightarrow}^{r_{1}} e_{1} \mathop{\longrightarrow}^{r_{2}} \cdots \mathop{\longrightarrow}^{r_{T}} e_{j} \).

Following the two semantic path mining strategies proposed in [14], DHKGE only considers user-item paths \( {\mathcal{P}}(u_{i}, v_{j}), u_{i} \in {\mathcal{U}}, v_{j} \in {\mathcal{V}} \) that connect user \( u_{i} \) with the items \( v_{j} \) she has rated, and imposes a length constraint on such paths, i.e., each path has length \( T \).
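As an illustration only, a naive depth-first enumeration of fixed-length paths over KG triples could look like the sketch below; the actual mining strategies of [14] additionally restrict and sample the mined paths.

```python
from collections import defaultdict

def mine_paths(triples, source, target, length):
    """Enumerate all paths with exactly `length` links from `source` to `target`.
    `triples` is a list of (head, relation, tail); links are followed head -> tail."""
    adj = defaultdict(list)
    for h, r, t in triples:
        adj[h].append((r, t))

    paths, stack = [], [source]

    def dfs(node, hops):
        if hops == length:
            if node == target:
                paths.append(list(stack))
            return
        for rel, nxt in adj[node]:
            stack.extend([rel, nxt])
            dfs(nxt, hops + 1)
            del stack[-2:]

    dfs(source, 0)
    return paths

# Hypothetical usage: all length-3 paths between a user and one of her rated items.
# paths = mine_paths(kg_triples, "user_0", "movie_5", 3)
```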

3.1 Overview

Our goal is to fully extract the information in the semantic path to model user preferences, which are then used to generate better recommendations. To achieve this goal, we propose the deep hybrid knowledge graph embedding (DHKGE) method.

The core idea of DHKGE is as follows: Given a user and an item, DHKGE first automatically extracts all semantic paths between the user and the item from the KG according to the semantic path mining strategies. It then uses a deep hybrid model to obtain a final hidden vector for quantifying the relation (proximity) between the user and the item. Finally, it generates a top-N recommendation list for the user by sorting the proximity scores of the candidate items in descending order.

The overall framework of DHKGE is depicted in Fig. 1. As shown in the figure, DHKGE is composed of four key components: the embedding layer, CNN layer, LSTM layer, and attention layer, which are further described as follows:

Fig. 1. The overall framework of DHKGE, which illustrates the case of a user-item pair \( (u_{i}, v_{j}) \).

  • The embedding layer: This layer takes the semantic path of length \( T \) as input, learns \( T + 1 \) low-dimensional embedding vectors for \( T + 1 \) entities on the semantic path, and outputs these vectors as an embedding of the path.

  • The CNN layer: This layer takes the path embedding as input, uses multiple filters to extract the local features of the path to form \( T \) local feature vectors, and outputs these vectors.

  • The LSTM layer: This layer takes the ordered local feature vectors as input, encodes them to get a representation of the path, and outputs the path representation.

  • The attention layer: This layer takes the representations of \( s \) paths as input, uses the attention mechanism to aggregate these path representations by weighting them to obtain a final hidden state vector, and outputs the vector.

3.2 Embedding Layer

Consider a set of \( s \) semantic paths of length \( T \) between user \( u_{i} \) and item \( v_{j} \), \( {\mathcal{P}}(u_{i}, v_{j}) = \{ p_{1}, p_{2}, \ldots, p_{s} \} \), where the start and end entities of each path in \( {\mathcal{P}} \) are \( u_{i} \) and \( v_{j} \), respectively. As shown in Fig. 1, \( e_{0} = u_{i} \) and \( e_{T} = v_{j} \) in path \( p_{1} \). The embedding layer maps each entity \( e_{t} \) in such a path to a \( d \)-dimensional vector \( {\mathbf{e}}_{t} \in {\mathbb{R}}^{d} \), which captures the semantic meaning of the entity. The vectors of all entities in the path constitute an embedding \( {\mathbf{p}}_{1} = \{ {\mathbf{e}}_{0}, {\mathbf{e}}_{1}, \ldots, {\mathbf{e}}_{T} \} \) of the path.
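A minimal PyTorch sketch of the embedding layer is given below, assuming entities are indexed by integer ids; the vocabulary size and the example path are illustrative.

```python
import torch
import torch.nn as nn

num_entities, d, T = 1000, 10, 3          # entity vocabulary size, embedding dim, path length (illustrative)

entity_embedding = nn.Embedding(num_entities, d)

# A path of length T is a sequence of T + 1 entity ids, e.g. [u_i, actor, movie, v_j].
path_entity_ids = torch.tensor([[4, 120, 77, 9]])   # shape: (1, T + 1)
p1 = entity_embedding(path_entity_ids)              # shape: (1, T + 1, d), i.e. {e_0, e_1, ..., e_T}
```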

3.3 CNN Layer

The CNN layer takes the path embedding \( {\mathbf{p}}_{1} \) as input, and slides multiple filters with the same window size over the path embedding to extract local features of the path. Let \( {\mathbf{W}}_{1} \in {\mathbb{R}}^{2 \times d} \) be a filter with a window size of 2. As shown in Fig. 1, \( {\mathbf{W}}_{1} \) is applied to the embeddings \( {\mathbf{e}}_{t} \) and \( {\mathbf{e}}_{t + 1} \) of two adjacent entities to generate a local feature \( x_{1} \), which is defined as Eq. (1) [5, 11].

$$ x_{1} = f({\mathbf{W}}_{1} \circ [{\mathbf{e}}_{t}, {\mathbf{e}}_{t + 1}] + b_{1}) $$
(1)

where \( \circ \) denotes the convolution operation, \( b_{1} \) is the bias, and \( f( \cdot ) \) is the nonlinear activation function ReLU.

In this way, \( k \) filters with the same window size, \( {\mathbf{W}}_{1}, {\mathbf{W}}_{2}, \ldots, {\mathbf{W}}_{k} \), are applied to the two entity embeddings \( {\mathbf{e}}_{t} \) and \( {\mathbf{e}}_{t + 1} \) to obtain a local feature vector \( {\mathbf{x}}_{t} = [x_{1}, x_{2}, \ldots, x_{k}] \), where \( k \) is a hyperparameter. The CNN layer slides the \( k \) filters from entity embedding \( {\mathbf{e}}_{0} \) to entity embedding \( {\mathbf{e}}_{T - 1} \) with stride 1, thus forming a sequence of local feature vectors \( \{ {\mathbf{x}}_{0}, {\mathbf{x}}_{1}, \ldots, {\mathbf{x}}_{T - 1} \} \).
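One way to realize this layer in PyTorch is a one-dimensional convolution whose output channels correspond to the \( k \) filters; a minimal sketch with a dummy path embedding is shown below.

```python
import torch
import torch.nn as nn

T, d, k = 3, 10, 10                         # path length, embedding dim, number of filters (illustrative)
p1 = torch.randn(1, T + 1, d)               # path embedding produced by the embedding layer

# Each filter has window size 2, so sliding over T + 1 entity embeddings yields T positions.
conv = nn.Conv1d(in_channels=d, out_channels=k, kernel_size=2, stride=1)
x = torch.relu(conv(p1.transpose(1, 2)))    # (1, k, T): k filter responses at each of the T positions
x = x.transpose(1, 2)                       # (1, T, k): local feature vectors x_0, ..., x_{T-1}
```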

3.4 LSTM Layer

Taking the \( T \) ordered local feature vectors \( \{ {\mathbf{x}}_{0}, {\mathbf{x}}_{1}, \ldots, {\mathbf{x}}_{T - 1} \} \) as input, the LSTM layer encodes the sequence information in the local feature vectors to generate a path representation. At time step \( t - 1 \), the LSTM outputs a hidden state vector \( {\mathbf{h}}_{t - 1} \in {\mathbb{R}}^{d^{\prime}} \), where the hyperparameter \( d^{\prime} \) is the number of LSTM hidden units. As shown in Fig. 1, the hidden state vector \( {\mathbf{h}}_{t - 1} \) and the local feature vector \( {\mathbf{x}}_{t} \) are used to learn the hidden state vector \( {\mathbf{h}}_{t} \) at time step \( t \), as defined in Eq. (2) [6, 15].

$$ \begin{aligned} {\mathbf{i}}_{t} &= \sigma ({\mathbf{U}}_{i} {\mathbf{x}}_{t} + {\mathbf{W}}_{i} {\mathbf{h}}_{t - 1} + {\mathbf{b}}_{i}) \\ {\mathbf{f}}_{t} &= \sigma ({\mathbf{U}}_{f} {\mathbf{x}}_{t} + {\mathbf{W}}_{f} {\mathbf{h}}_{t - 1} + {\mathbf{b}}_{f}) \\ {\mathbf{o}}_{t} &= \sigma ({\mathbf{U}}_{o} {\mathbf{x}}_{t} + {\mathbf{W}}_{o} {\mathbf{h}}_{t - 1} + {\mathbf{b}}_{o}) \\ \hat{\mathbf{c}}_{t} &= \tanh ({\mathbf{U}}_{c} {\mathbf{x}}_{t} + {\mathbf{W}}_{c} {\mathbf{h}}_{t - 1} + {\mathbf{b}}_{c}) \\ {\mathbf{c}}_{t} &= {\mathbf{i}}_{t} \odot \hat{\mathbf{c}}_{t} + {\mathbf{f}}_{t} \odot {\mathbf{c}}_{t - 1} \\ {\mathbf{h}}_{t} &= {\mathbf{o}}_{t} \odot \tanh ({\mathbf{c}}_{t}) \end{aligned} $$
(2)

where \( {\mathbf{i}}_{t}, {\mathbf{f}}_{t}, {\mathbf{o}}_{t} \in {\mathbb{R}}^{d^{\prime}} \) represent the input, forget, and output gates at time step \( t \), respectively; \( \hat{\mathbf{c}}_{t}, {\mathbf{c}}_{t}, {\mathbf{h}}_{t} \in {\mathbb{R}}^{d^{\prime}} \) denote the candidate cell state (information transform module), cell state vector, and hidden state vector at time step \( t \), respectively; \( {\mathbf{U}}_{i}, {\mathbf{U}}_{f}, {\mathbf{U}}_{o}, {\mathbf{U}}_{c} \in {\mathbb{R}}^{d^{\prime} \times k} \) are input weights, \( {\mathbf{W}}_{i}, {\mathbf{W}}_{f}, {\mathbf{W}}_{o}, {\mathbf{W}}_{c} \in {\mathbb{R}}^{d^{\prime} \times d^{\prime}} \) are recurrent weights, and \( {\mathbf{b}}_{i}, {\mathbf{b}}_{f}, {\mathbf{b}}_{o}, {\mathbf{b}}_{c} \in {\mathbb{R}}^{d^{\prime}} \) are biases. \( \sigma ( \cdot ) \) is the sigmoid activation function and \( \odot \) stands for the element-wise product of two vectors.

As shown in Fig. 1, the learning process continues until the LSTM layer obtains the hidden state vector at the final time step \( T - 1 \). This hidden state vector is therefore output as a path representation, denoted \( {\mathbf{m}}_{1} \in {\mathbb{R}}^{{d^{\prime}}} \).
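In the same spirit, the LSTM layer can be sketched with PyTorch's built-in nn.LSTM; the input below is a dummy tensor standing in for the CNN output.

```python
import torch
import torch.nn as nn

T, k, d_prime = 3, 10, 16                   # path length, local feature dim, number of LSTM hidden units
x = torch.randn(1, T, k)                    # local feature vectors x_0, ..., x_{T-1} from the CNN layer

lstm = nn.LSTM(input_size=k, hidden_size=d_prime, batch_first=True)
outputs, (h_last, _) = lstm(x)              # outputs: (1, T, d'); h_last: (1, 1, d')

m1 = h_last.squeeze(0)                      # hidden state at the final time step, used as the path representation
```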

3.5 Attention Layer

Once the path representations are obtained, the attention layer takes them as input and uses the attention mechanism to generate a final hidden state vector. The process is as follows: first, the layer learns an attention score \( score({\mathbf{m}}_{i}) \) for each path representation \( {\mathbf{m}}_{i} \) in the set \( \{ {\mathbf{m}}_{1}, {\mathbf{m}}_{2}, \ldots, {\mathbf{m}}_{s} \} \); these scores are then normalized; finally, the path representations are aggregated by weighting them to obtain a final hidden state vector \( \hat{\mathbf{h}} \in {\mathbb{R}}^{d^{\prime}} \), which characterizes the user's preferences towards items. The above process is defined as Eq. (3) [17].

$$ \begin{aligned} score({\mathbf{m}}_{i}) &= {\mathbf{w}}_{a}^{\mathrm{T}} \tanh ({\mathbf{W}}_{a} {\mathbf{m}}_{i}) \\ \alpha_{i} &= \frac{\exp (score({\mathbf{m}}_{i}))}{\sum\nolimits_{j = 1}^{s} \exp (score({\mathbf{m}}_{j}))} \\ \hat{\mathbf{h}} &= \sum\nolimits_{i = 1}^{s} \alpha_{i} {\mathbf{m}}_{i} \end{aligned} $$
(3)

where \( {\mathbf{w}}_{a} \in {\mathbb{R}}^{d^{\prime}} \) and \( {\mathbf{W}}_{a} \in {\mathbb{R}}^{d^{\prime} \times d^{\prime}} \) are weights, and \( \alpha_{i} \) is the normalized attention score.
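Equation (3) can be sketched directly in PyTorch as follows; the path representations are dummy tensors, and \( {\mathbf{w}}_{a} \) and \( {\mathbf{W}}_{a} \) are realized as bias-free linear layers.

```python
import torch
import torch.nn as nn

s, d_prime = 5, 16                          # number of paths and hidden size (illustrative)
M = torch.randn(s, d_prime)                 # path representations m_1, ..., m_s

W_a = nn.Linear(d_prime, d_prime, bias=False)
w_a = nn.Linear(d_prime, 1, bias=False)

scores = w_a(torch.tanh(W_a(M)))            # score(m_i) = w_a^T tanh(W_a m_i), shape (s, 1)
alpha = torch.softmax(scores, dim=0)        # normalized attention weights alpha_i
h_hat = (alpha * M).sum(dim=0)              # final hidden state vector, shape (d',)
```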

Finally, DHKGE uses a fully connected layer to quantify the proximity \( \tilde{r}_{ij} \) between user \( u_{i} \) and item \( v_{j} \), which is defined as Eq. (4) [14]:

$$ \tilde{r}_{ij} = \sigma ({\mathbf{W}}_{r} \hat{\mathbf{h}} + b_{r}) $$
(4)

where \( {\mathbf{W}}_{r} \in {\mathbb{R}}^{{1 \times d^{\prime}}} \) and \( b_{r} \) are the weights and bias, respectively.
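The fully connected layer of Eq. (4) then reduces to a single linear unit followed by a sigmoid; a minimal sketch continuing the notation above:

```python
import torch
import torch.nn as nn

d_prime = 16
h_hat = torch.randn(d_prime)                # aggregated hidden state from the attention layer

proximity_layer = nn.Linear(d_prime, 1)     # realizes W_r and b_r
r_tilde = torch.sigmoid(proximity_layer(h_hat))   # estimated proximity of (u_i, v_j) in (0, 1)
```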

3.6 Method Learning and Recommendation Generation

Like RKGE [14], given the training data \( {\mathcal{D}}_{\text{train}} \), which contains instances of the form \( (u_{i}, v_{j}, r_{ij}, {\mathcal{P}}(u_{i}, v_{j})) \), DHKGE uses stochastic gradient descent (SGD) to minimize the loss function defined in Eq. (5) and thereby learn all of its parameters.

$$ {\mathcal{J}} = \frac{1}{|{\mathcal{D}}_{\text{train}}|} \sum\nolimits_{r_{ij} \in {\mathcal{D}}_{\text{train}}} \mathrm{BCELoss}(\tilde{r}_{ij}, r_{ij}) $$
(5)

where \( BCELoss( \cdot ) \) is the binary cross-entropy between the observed ratings and estimated ones.

The recommendation problem can thus be treated as a binary classification problem [8, 14]. When user \( u_{i} \) prefers item \( v_{j} \), namely \( r_{ij} = 1 \), we expect the estimated proximity \( \tilde{r}_{ij} \) to approach 1; otherwise, it should approach 0. Once the learning process is completed, DHKGE has obtained trained embeddings of all users and items.
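A hedged sketch of one SGD training step under this objective is given below; it assumes a `model` object that maps the mined paths of an instance to the estimated proximity \( \tilde{r}_{ij} \), which is a hypothetical interface rather than the exact implementation.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()

def train_step(model, optimizer, paths, r_ij):
    """One SGD step on a single (u_i, v_j, r_ij, P(u_i, v_j)) training instance."""
    optimizer.zero_grad()
    r_tilde = model(paths)                               # estimated proximity, shape (1,), values in (0, 1)
    loss = criterion(r_tilde, torch.tensor([float(r_ij)]))
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage:
# optimizer = torch.optim.SGD(model.parameters(), lr=0.2)   # lr = lambda used on MovieLens 100K
```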

Following [14, 18], during testing DHKGE obtains the proximity scores between the target user and the candidate items by calculating the inner products of the user embedding with the item embeddings. DHKGE finally generates a top-N recommendation list for the user by sorting the proximity scores in descending order.
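The ranking step can be sketched as follows, assuming the trained user and item embeddings are available as NumPy arrays.

```python
import numpy as np

def top_n(user_vec, item_vecs, n=10):
    """Rank candidate items by the inner product of the trained user embedding
    with each candidate item embedding, and return the top-N item indices."""
    scores = item_vecs @ user_vec        # proximity score per candidate item
    return np.argsort(-scores)[:n]       # indices sorted by descending score

# Hypothetical usage with trained embeddings:
# recommended = top_n(user_embeddings[i], item_embeddings, n=10)
```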

4 Experimental Evaluation

4.1 Experimental Setup

Datasets.

Our experiment used two datasets, MovieLens 100K and Yelp, as published on GitHub by [14]. The former is a movie dataset containing user interactions with movies. Sun et al. [14] combined this dataset with the IMDB dataset to add description information about the movies, such as genre, actor, and director. Yelp contains user check-ins to local businesses, user reviews, and local business information, so no external information needs to be added to this dataset. The two datasets were used to build two KGs following Definition 1. The statistics of the two datasets are shown in Table 1.

Table 1. Dataset and knowledge graph statistics

Following [3, 14], we sorted the feedback in each dataset by timestamp, and used the earliest 80% of the feedback as training data and the most recent 20% as test data. For each user-item pair in the training set, we extracted all paths of length 3 and randomly selected five of them to train our model.
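A sketch of this preprocessing, assuming feedback records of the form (user, item, timestamp), is shown below.

```python
import random

def temporal_split(feedback, train_ratio=0.8):
    """Sort (user, item, timestamp) records by time; the earliest 80% becomes
    training data and the most recent 20% becomes test data."""
    ordered = sorted(feedback, key=lambda rec: rec[2])
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]

def sample_paths(all_paths, num=5):
    """Randomly keep `num` of the mined length-3 paths for a user-item pair."""
    return random.sample(all_paths, num) if len(all_paths) > num else all_paths
```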

Evaluation Metrics.

Three popular evaluation metrics [1], Precision at N (Prec@N), Mean Reciprocal Rank at N (MRR@N), and Normalized Discounted Cumulative Gain at N (NDCG@N), are adopted to evaluate the top-N recommendation methods in our experiment. We set N = {1, 5, 10, 20} for Prec@N, and N = {5, 10, 20} for MRR@N and NDCG@N.
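For reference, standard binary-relevance definitions of these metrics for a single user can be sketched as follows; the exact definitions used in our experiments follow [1].

```python
import numpy as np

def prec_at_n(ranked, relevant, n):
    """Fraction of the top-N ranked items that are relevant."""
    return len(set(ranked[:n]) & relevant) / n

def mrr_at_n(ranked, relevant, n):
    """Reciprocal rank of the first relevant item within the top N (0 if none)."""
    for rank, item in enumerate(ranked[:n], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_n(ranked, relevant, n):
    """DCG of the top-N list normalized by the ideal DCG."""
    dcg = sum(1.0 / np.log2(r + 1) for r, item in enumerate(ranked[:n], start=1) if item in relevant)
    idcg = sum(1.0 / np.log2(r + 1) for r in range(1, min(len(relevant), n) + 1))
    return dcg / idcg if idcg > 0 else 0.0
```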

Comparison Methods and Their Implementation.

We compared our DHKGE with the following four recommendation methods:

  • BPRMF [13]: It is a Bayesian personalized ranking method based on matrix factorization. We used the Cornac framework to implement BPRMF.

  • NCF [8]: It is a classic neural network-based recommendation method. It was also implemented using the Cornac framework.

  • CKE [19]: It is a recently proposed state-of-the-art KG embedding based recommendation method. We directly used the Python code provided in [2].

  • RKGE [14]: It is a state-of-the-art recommendation method based on KG paths. We directly used the Python code published on GitHub by the authors.

We implemented DHKGE in PyTorch by modifying the recurrent network module and the performance evaluation module of the RKGE code.

Hyperparameter Settings.

For DHKGE, we used grid search to select the dimension \( d \) of the entity embedding and the number \( k \) of convolution filters from \( \{10, 20, 30, 40, 50, 100\} \), the number \( d^{\prime} \) of LSTM hidden units from \( \{16, 32, 64, 128\} \), and the learning rate \( \lambda \) of SGD from \( \{0.001, 0.01, 0.1, 0.2\} \). The hyperparameters of DHKGE were set to \( d = 10, k = 10, d^{\prime} = 16, \lambda = 0.2 \) on MovieLens 100K and \( d = 20, k = 40, d^{\prime} = 32, \lambda = 0.01 \) on Yelp. For the four comparison methods, the hyperparameters were set as suggested in the original papers.

4.2 Experimental Results

Tables 2 and 3 show the results of top-N recommendation on the two datasets. In the tables, bold numbers indicate the best performance among all methods; underlined numbers indicate the best performance among the four comparison methods; and the numbers in the “Improve” column give the percentage (%) of performance improvement achieved by DHKGE relative to the best-performing comparison method. In the same way as [14], we also created two views for each dataset: “All Users” means that all users are considered in the test data, whereas “Cold Start” means that the test data only includes users with fewer than 5 ratings.

Table 2. Results of top-N recommendation on MovieLens 100K
Table 3. Results of top-N recommendation on Yelp

Observing these results, we can obtain the following findings:

  1. The performance of both DHKGE and RKGE in terms of all metrics except Prec@1 is significantly better than that of the other three methods. This indicates that DHKGE and RKGE can make full use of the path information to model users’ preferences for items, thereby improving recommendation performance.

  2. DHKGE performs better than RKGE on all metrics (the Prec@1 performance on MovieLens 100K is the same). This indicates that a deep hybrid model can encode semantic paths more effectively than a single type of neural network: by extracting the local features of the path and encoding the sequence information in the path, DHKGE generates a more comprehensive path representation for recommendation.

  3. On most metrics, the performance improvement of DHKGE in the “All Users” view is higher than in the “Cold Start” view. This indicates that in the “Cold Start” view, the quality and quantity of the semantic paths extracted from the KG are limited, which affects the recommendation performance of DHKGE, since the method relies on the user’s historical interaction information to make recommendations.

Based on these findings, we can draw the conclusion that DHKGE’s recommendation performance is generally better than RKGE and other comparison methods.

5 Conclusions

To overcome the weaknesses of existing KG path-based recommendation methods, in this paper we propose the DHKGE method for top-N recommendation. DHKGE exploits a deep hybrid model to encode the paths between users and items, and uses an attention mechanism to distinguish the importance of multiple semantic paths between a user-item pair. Experiments on the MovieLens 100K and Yelp datasets show that DHKGE overall outperforms RKGE and several typical recommendation methods in terms of Precision@N, MRR@N, and NDCG@N. In future work, we plan to improve our method by adding entity relations to the path embeddings.