
1 Introduction

Knowledge graphs (KGs) have proven to be effective in improving recommendation performance [7, 16]. According to [7], KG-based recommendation methods fall into three categories: path-based methods, embedding-based methods, and unified methods. Path-based methods make recommendations by building a KG that contains users, items, and user-item interactions, and then exploiting connectivity patterns between the entities (users or items) in the KG. Traditional meta-path based methods use the semantic similarity of entities along different meta-paths [18] as graph regularization to refine the representations of users and items [7]. However, such methods rely heavily on handcrafted meta-paths, which in turn require domain knowledge [14].

To overcome the limitations of meta-path based methods, deep neural network based methods have recently been devised to automatically mine the connectivity patterns between entities (i.e., path embeddings) in the KG. Path representations are learned by extracting path features from these connectivity patterns to characterize user preferences towards items, and are finally used to generate recommendations.

However, existing deep neural network based methods, such as the Recurrent Knowledge Graph Embedding (RKGE) approach [14], usually use only one type of neural network to encode path embeddings. This cannot fully extract path features, which limits the achievable recommendation performance. Recently proposed deep hybrid models, such as [12], combine several neural building blocks to form a more powerful recommendation model. To the best of our knowledge, however, existing deep hybrid models seldom use KGs for recommendation.

To overcome the weaknesses of existing methods, in this paper we propose a Deep Hybrid Knowledge Graph Embedding (DHKGE) method for top-N recommendation. DHKGE encodes the embeddings of paths between users and items in the recommender system by combining a convolutional neural network (CNN) and a long short-term memory (LSTM) network. It further uses an attention mechanism to aggregate the encoded path representations into a final hidden state vector. This vector is then used to calculate the proximity between the target user and candidate items, and a top-N recommendation list is generated for the user by ranking these proximity scores.

In summary, the main contributions of this paper are as follows:

  • We propose the Deep Hybrid Knowledge Graph Embedding (DHKGE) method for top-N recommendation, which exploits a deep hybrid model to encode the paths between users and items.

  • We propose to use the attention mechanism to distinguish the importance of multiple semantic paths between a user-item pair, so that salient paths play a greater role in modeling user preferences.

  • We evaluated our method on the MovieLens 100K and Yelp datasets. The experimental results show that our method overall outperforms RKGE and several typical recommendation methods in terms of Precision@N, MRR@N, and NDCG@N.

2 Related Work

2.1 Path-Based Recommendation Methods

Path-based methods make recommendations by building user-item graphs and exploiting connectivity patterns between the entities in the graph [7]. Traditional meta-path based methods rely heavily on handcrafted meta-paths. Deep neural network based methods can automatically mine the connectivity patterns between entities in the graph, thereby improving recommendation performance. For example, Hu et al. [9] proposed to leverage meta-path based context for top-N recommendation with a neural co-attention model. Sun et al. [14] proposed the RKGE approach, which employs a recurrent neural network (RNN) to learn high-quality representations of both users and items that are then used to generate better recommendations. Wang et al. [15] proposed the Knowledge-aware Path Recurrent Network (KPRN), which exploits the KG to generate better recommendations; the path embeddings in the KG are encoded with an LSTM.

Existing path-based recommendation methods usually use only one type of neural network to encode path embeddings. In contrast, our DHKGE exploits a deep hybrid model for this purpose, which generates a more comprehensive path representation and thus better recommendations.

2.2 Deep Neural Network-Based Recommendation

Deep neural networks have been widely used in recommender systems. The existing recommendation models can be divided into two categories: recommendation with neural building blocks and recommendation with deep hybrid models [4, 20].

In the first category, recommendation models are further divided into subcategories [20] according to the deep learning building block they exploit: CNN, recurrent neural network (RNN), attentional model (AM), etc. For example, Kim et al. [10] proposed a context-aware recommendation model named convolutional matrix factorization (ConvMF) that integrates a CNN into probabilistic matrix factorization.

Recently, researchers have proposed deep hybrid models, which combine several neural building blocks that complement one another to form a more powerful recommendation model [20]. For instance, Lee et al. [12] proposed a deep learning recommender system that combines RNN and CNN to learn the semantic representation of each utterance and build a sequence model of the dialog thread. To the best of our knowledge, existing deep hybrid models seldom use KGs for recommendation.

3 DHKGE: Deep Hybrid KG Embedding Method

In this section, we describe our DHKGE method in detail. After introducing the necessary concepts and notation, we briefly explain its overall framework, then describe its main components, and finally present model learning and recommendation generation.

Given a user set \( {\mathcal{U}} = \{ u_{1}, u_{2}, \ldots, u_{m} \} \) and an item set \( {\mathcal{V}} = \{ v_{1}, v_{2}, \ldots, v_{n} \} \) of the recommender system, we construct the users’ implicit feedback matrix \( {\mathbf{R}} \in {\mathbb{R}}^{m \times n} \), where each element is defined as follows: \( r_{ij} = 1 \) if user \( u_{i} \) has interacted with item \( v_{j} \), indicating that the user prefers the item, and \( r_{ij} = 0 \) otherwise. Based on the matrix \( {\mathbf{R}} \) and an external knowledge source (e.g., the IMDB dataset) that describes the items, we build a KG for recommendation. In the movie recommendation domain, for example, the KG contains the users, the items, and the users’ preferences for the items, together with the item descriptions extracted from the knowledge source, such as actors, directors, and genres (as entities), as well as rating, categorizing, acting, and directing (as entity relations). We refer to all objects in the KG (e.g., users, items, actors, directors, and genres), as opposed to the relations, as entities. The definition of the KG [14] is given below.
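To make the construction concrete, a minimal sketch of assembling \( {\mathbf{R}} \) and the KG triples is shown below; the interaction log, entity names, and relation names are hypothetical illustrations rather than the actual datasets.

```python
import numpy as np

# Hypothetical interaction log: (user index, item index) pairs.
interactions = [(0, 2), (0, 5), (1, 2), (3, 4)]
m, n = 4, 6  # number of users and items in this toy example

# Implicit feedback matrix R: r_ij = 1 if user i interacted with item j, else 0.
R = np.zeros((m, n), dtype=np.int8)
for i, j in interactions:
    R[i, j] = 1

# KG triples (head entity, relation, tail entity) combining the user feedback
# with item descriptions from an external source such as IMDB.
kg_triples = [
    ("user_0", "rate", "movie_2"),
    ("movie_2", "directed_by", "director_7"),
    ("movie_2", "has_genre", "genre_comedy"),
]
```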

Definition 1 (Knowledge Graph).

KG is defined as a directed graph \( {\mathcal{G}} = ({\mathcal{E}}, {\mathcal{L}}) \), where \( {\mathcal{E}} = \{ e_{1}, e_{2}, \ldots, e_{|{\mathcal{E}}|} \} \) denotes the set of entities and \( {\mathcal{L}} \) the set of links. An entity type mapping function \( \phi :{\mathcal{E}} \to {\mathcal{A}} \) and a link type mapping function \( \varphi :{\mathcal{L}} \to {\mathcal{R}} \) are defined for the graph. Each entity \( e \in {\mathcal{E}} \) belongs to an entity type \( \phi (e) \in {\mathcal{A}} \), and each link \( l \in {\mathcal{L}} \) belongs to a link type (relation) \( \varphi (l) \in {\mathcal{R}} \).

Based on the KG definition, we further define the set of connected semantic paths between an entity pair \( (e_{i}, e_{j}) \) as \( {\mathcal{P}}(e_{i}, e_{j}) = \{ p_{1}, p_{2}, \ldots, p_{s} \} \), with \( s \) being the number of paths. A semantic path of length \( T \) in \( {\mathcal{P}} \) is denoted as \( p = e_{i} \mathop{\longrightarrow}^{r_{1}} e_{1} \mathop{\longrightarrow}^{r_{2}} \cdots \mathop{\longrightarrow}^{r_{T}} e_{j} \).

Following the two semantic path mining strategies proposed in [14], DHKGE only considers user-item paths \( {\mathcal{P}}(u_{i}, v_{j}), u_{i} \in {\mathcal{U}}, v_{j} \in {\mathcal{V}} \) that connect user \( u_{i} \) with the items \( v_{j} \) she has rated, and imposes a length constraint on such paths, i.e., each path has length \( T \).
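As an illustration only, a naive depth-first enumeration of fixed-length paths over KG triples could look like the sketch below; the actual mining strategies of [14] additionally restrict and sample the mined paths.

```python
from collections import defaultdict

def mine_paths(triples, source, target, length):
    """Enumerate all paths with exactly `length` links from `source` to `target`.
    `triples` is a list of (head, relation, tail); links are followed head -> tail."""
    adj = defaultdict(list)
    for h, r, t in triples:
        adj[h].append((r, t))

    paths, stack = [], [source]

    def dfs(node, hops):
        if hops == length:
            if node == target:
                paths.append(list(stack))
            return
        for rel, nxt in adj[node]:
            stack.extend([rel, nxt])
            dfs(nxt, hops + 1)
            del stack[-2:]

    dfs(source, 0)
    return paths

# Hypothetical usage: all length-3 paths between a user and one of her rated items.
# paths = mine_paths(kg_triples, "user_0", "movie_5", 3)
```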

3.1 Overview

Our goal is to fully extract the information in the semantic path to model user preferences, which are then used to generate better recommendations. To achieve this goal, we propose the deep hybrid knowledge graph embedding (DHKGE) method.

The core idea of DHKGE is as follows: Given a user and an item, DHKGE first automatically extracts all semantic paths between the user and the item from the KG according to the semantic path mining strategies. It then uses a deep hybrid model to obtain a final hidden vector for quantifying the relation (proximity) between the user and the item. Finally, it generates a top-N recommendation list for the user by sorting the proximity scores of the candidate items in descending order.

The overall framework of DHKGE is depicted in Fig. 1. As shown in the figure, DHKGE is composed of four key components: the embedding layer, CNN layer, LSTM layer, and attention layer, which are further described as follows:

Fig. 1. The overall framework of DHKGE, which illustrates the case of a user-item pair \( (u_{i}, v_{j}) \).

  • The embedding layer: This layer takes the semantic path of length \( T \) as input, learns \( T + 1 \) low-dimensional embedding vectors for \( T + 1 \) entities on the semantic path, and outputs these vectors as an embedding of the path.

  • The CNN layer: This layer takes the path embedding as input, uses multiple filters to extract the local features of the path to form \( T \) local feature vectors, and outputs these vectors.

  • The LSTM layer: This layer takes the ordered local feature vectors as input, encodes them to get a representation of the path, and outputs the path representation.

  • The attention layer: This layer takes the representations of \( s \) paths as input, uses the attention mechanism to aggregate these path representations by weighting them to obtain a final hidden state vector, and outputs the vector.

3.2 Embedding Layer

Consider a set of \( s \) semantic paths of length \( T \) between user \( u_{i} \) and item \( v_{j} \), \( {\mathcal{P}}(u_{i}, v_{j}) = \{ p_{1}, p_{2}, \ldots, p_{s} \} \), where the start and end entities of each path in \( {\mathcal{P}} \) are \( u_{i} \) and \( v_{j} \), respectively. As shown in Fig. 1, \( e_{0} = u_{i} \) and \( e_{T} = v_{j} \) in path \( p_{1} \). The embedding layer maps each entity \( e_{t} \) in such a path to a \( d \)-dimensional vector \( {\mathbf{e}}_{t} \in {\mathbb{R}}^{d} \), which captures the semantic meaning of the entity. The vectors of all entities in the path constitute an embedding \( {\mathbf{p}}_{1} = \{ {\mathbf{e}}_{0}, {\mathbf{e}}_{1}, \ldots, {\mathbf{e}}_{T} \} \) of the path.
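A minimal PyTorch sketch of the embedding layer is given below, assuming entities are indexed by integer ids; the vocabulary size and the example path are illustrative.

```python
import torch
import torch.nn as nn

num_entities, d, T = 1000, 10, 3          # entity vocabulary size, embedding dim, path length (illustrative)

entity_embedding = nn.Embedding(num_entities, d)

# A path of length T is a sequence of T + 1 entity ids, e.g. [u_i, actor, movie, v_j].
path_entity_ids = torch.tensor([[4, 120, 77, 9]])   # shape: (1, T + 1)
p1 = entity_embedding(path_entity_ids)              # shape: (1, T + 1, d), i.e. {e_0, e_1, ..., e_T}
```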

3.3 CNN Layer

The CNN layer takes the path embedding \( {\mathbf{p}}_{1} \) as input, and slides multiple filters with the same window size over the path embedding to extract local features of the path. Let \( {\mathbf{W}}_{1} \in {\mathbb{R}}^{2 \times d} \) be a filter with a window size of 2. As shown in Fig. 1, \( {\mathbf{W}}_{1} \) is applied to the embeddings \( {\mathbf{e}}_{t} \) and \( {\mathbf{e}}_{t + 1} \) of two adjacent entities to generate a local feature \( x_{1} \), which is defined as Eq. (1) [5, 11].

$$ x_{1} = f({\mathbf{W}}_{1} \circ [{\mathbf{e}}_{t}, {\mathbf{e}}_{t + 1}] + b_{1}) $$
(1)

where \( \circ \) denotes the convolution operation, \( b_{1} \) is the bias, and \( f( \cdot ) \) is the nonlinear activation function ReLU.

In this way, \( k \) filters with the same window size, \( {\mathbf{W}}_{1}, {\mathbf{W}}_{2}, \ldots, {\mathbf{W}}_{k} \), are applied to the two entity embeddings \( {\mathbf{e}}_{t} \) and \( {\mathbf{e}}_{t + 1} \) to obtain a local feature vector \( {\mathbf{x}}_{t} = [x_{1}, x_{2}, \ldots, x_{k}] \), where \( k \) is a hyperparameter. The CNN layer slides the \( k \) filters from entity embedding \( {\mathbf{e}}_{0} \) to entity embedding \( {\mathbf{e}}_{T - 1} \) with stride 1, thus forming a sequence of local feature vectors \( \{ {\mathbf{x}}_{0}, {\mathbf{x}}_{1}, \ldots, {\mathbf{x}}_{T - 1} \} \).
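One way to realize this layer in PyTorch is a one-dimensional convolution whose output channels correspond to the \( k \) filters; a minimal sketch with a dummy path embedding is shown below.

```python
import torch
import torch.nn as nn

T, d, k = 3, 10, 10                         # path length, embedding dim, number of filters (illustrative)
p1 = torch.randn(1, T + 1, d)               # path embedding produced by the embedding layer

# Each filter has window size 2, so sliding over T + 1 entity embeddings yields T positions.
conv = nn.Conv1d(in_channels=d, out_channels=k, kernel_size=2, stride=1)
x = torch.relu(conv(p1.transpose(1, 2)))    # (1, k, T): k filter responses at each of the T positions
x = x.transpose(1, 2)                       # (1, T, k): local feature vectors x_0, ..., x_{T-1}
```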

3.4 LSTM Layer

Taking the \( T \) ordered local feature vectors \( \{ {\mathbf{x}}_{0}, {\mathbf{x}}_{1}, \ldots, {\mathbf{x}}_{T - 1} \} \) as input, the LSTM layer encodes the sequence information in the local feature vectors to generate a path representation. At time step \( t - 1 \), the LSTM outputs a hidden state vector \( {\mathbf{h}}_{t - 1} \in {\mathbb{R}}^{d^{\prime}} \), where the hyperparameter \( d^{\prime} \) is the number of LSTM hidden units. As shown in Fig. 1, the hidden state vector \( {\mathbf{h}}_{t - 1} \) and the local feature vector \( {\mathbf{x}}_{t} \) are used to learn the hidden state vector \( {\mathbf{h}}_{t} \) at time step \( t \), as defined in Eq. (2) [6, 15].

$$ \begin{aligned} {\mathbf{i}}_{t} &= \sigma ({\mathbf{U}}_{i} {\mathbf{x}}_{t} + {\mathbf{W}}_{i} {\mathbf{h}}_{t - 1} + {\mathbf{b}}_{i}) \\ {\mathbf{f}}_{t} &= \sigma ({\mathbf{U}}_{f} {\mathbf{x}}_{t} + {\mathbf{W}}_{f} {\mathbf{h}}_{t - 1} + {\mathbf{b}}_{f}) \\ {\mathbf{o}}_{t} &= \sigma ({\mathbf{U}}_{o} {\mathbf{x}}_{t} + {\mathbf{W}}_{o} {\mathbf{h}}_{t - 1} + {\mathbf{b}}_{o}) \\ \hat{\mathbf{c}}_{t} &= \tanh ({\mathbf{U}}_{c} {\mathbf{x}}_{t} + {\mathbf{W}}_{c} {\mathbf{h}}_{t - 1} + {\mathbf{b}}_{c}) \\ {\mathbf{c}}_{t} &= {\mathbf{i}}_{t} \odot \hat{\mathbf{c}}_{t} + {\mathbf{f}}_{t} \odot {\mathbf{c}}_{t - 1} \\ {\mathbf{h}}_{t} &= {\mathbf{o}}_{t} \odot \tanh ({\mathbf{c}}_{t}) \end{aligned} $$
(2)

where \( {\mathbf{i}}_{t}, {\mathbf{f}}_{t}, {\mathbf{o}}_{t} \in {\mathbb{R}}^{d^{\prime}} \) represent the input, forget, and output gates at time step \( t \), respectively; \( \hat{\mathbf{c}}_{t}, {\mathbf{c}}_{t}, {\mathbf{h}}_{t} \in {\mathbb{R}}^{d^{\prime}} \) denote the candidate cell state (information transform module), cell state vector, and hidden state vector at time step \( t \), respectively; \( {\mathbf{U}}_{i}, {\mathbf{U}}_{f}, {\mathbf{U}}_{o}, {\mathbf{U}}_{c} \in {\mathbb{R}}^{d^{\prime} \times k} \) are input weights, \( {\mathbf{W}}_{i}, {\mathbf{W}}_{f}, {\mathbf{W}}_{o}, {\mathbf{W}}_{c} \in {\mathbb{R}}^{d^{\prime} \times d^{\prime}} \) are recurrent weights, and \( {\mathbf{b}}_{i}, {\mathbf{b}}_{f}, {\mathbf{b}}_{o}, {\mathbf{b}}_{c} \in {\mathbb{R}}^{d^{\prime}} \) are biases. \( \sigma ( \cdot ) \) is the sigmoid activation function and \( \odot \) stands for the element-wise product of two vectors.

As shown in Fig. 1, the learning process continues until the LSTM layer obtains the hidden state vector at the final time step \( T - 1 \). This hidden state vector is therefore output as a path representation, denoted \( {\mathbf{m}}_{1} \in {\mathbb{R}}^{{d^{\prime}}} \).
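In the same spirit, the LSTM layer can be sketched with PyTorch's built-in nn.LSTM; the input below is a dummy tensor standing in for the CNN output.

```python
import torch
import torch.nn as nn

T, k, d_prime = 3, 10, 16                   # path length, local feature dim, number of LSTM hidden units
x = torch.randn(1, T, k)                    # local feature vectors x_0, ..., x_{T-1} from the CNN layer

lstm = nn.LSTM(input_size=k, hidden_size=d_prime, batch_first=True)
outputs, (h_last, _) = lstm(x)              # outputs: (1, T, d'); h_last: (1, 1, d')

m1 = h_last.squeeze(0)                      # hidden state at the final time step, used as the path representation
```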

3.5 Attention Layer

Once the path representations are obtained, the attention layer takes them as input and uses the attention mechanism to generate a final hidden state vector. The process is as follows: first, the layer learns an attention score \( score({\mathbf{m}}_{i}) \) for each path representation \( {\mathbf{m}}_{i} \) in the set \( \{ {\mathbf{m}}_{1}, {\mathbf{m}}_{2}, \ldots, {\mathbf{m}}_{s} \} \); these scores are then normalized; finally, the path representations are aggregated by weighting them to obtain a final hidden state vector \( \hat{\mathbf{h}} \in {\mathbb{R}}^{d^{\prime}} \), which characterizes the user's preferences towards items. The above process is defined as Eq. (3) [17].

$$ \begin{aligned} score({\mathbf{m}}_{i}) &= {\mathbf{w}}_{a}^{\mathrm{T}} \tanh ({\mathbf{W}}_{a} {\mathbf{m}}_{i}) \\ \alpha_{i} &= \frac{\exp (score({\mathbf{m}}_{i}))}{\sum\nolimits_{j = 1}^{s} \exp (score({\mathbf{m}}_{j}))} \\ \hat{\mathbf{h}} &= \sum\nolimits_{i = 1}^{s} \alpha_{i} {\mathbf{m}}_{i} \end{aligned} $$
(3)

where \( {\mathbf{w}}_{a} \in {\mathbb{R}}^{d^{\prime}} \) and \( {\mathbf{W}}_{a} \in {\mathbb{R}}^{d^{\prime} \times d^{\prime}} \) are weights, and \( \alpha_{i} \) is the normalized attention score.
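Equation (3) can be sketched directly in PyTorch as follows; the path representations are dummy tensors, and \( {\mathbf{w}}_{a} \) and \( {\mathbf{W}}_{a} \) are realized as bias-free linear layers.

```python
import torch
import torch.nn as nn

s, d_prime = 5, 16                          # number of paths and hidden size (illustrative)
M = torch.randn(s, d_prime)                 # path representations m_1, ..., m_s

W_a = nn.Linear(d_prime, d_prime, bias=False)
w_a = nn.Linear(d_prime, 1, bias=False)

scores = w_a(torch.tanh(W_a(M)))            # score(m_i) = w_a^T tanh(W_a m_i), shape (s, 1)
alpha = torch.softmax(scores, dim=0)        # normalized attention weights alpha_i
h_hat = (alpha * M).sum(dim=0)              # final hidden state vector, shape (d',)
```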

Finally, DHKGE uses a fully connected layer to quantify the proximity \( \tilde{r}_{ij} \) between user \( u_{i} \) and item \( v_{j} \), which is defined as Eq. (4) [14]:

$$ \tilde{r}_{ij} = \sigma ({\mathbf{W}}_{r} \hat{\mathbf{h}} + b_{r}) $$
(4)

where \( {\mathbf{W}}_{r} \in {\mathbb{R}}^{{1 \times d^{\prime}}} \) and \( b_{r} \) are the weights and bias, respectively.
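The fully connected layer of Eq. (4) then reduces to a single linear unit followed by a sigmoid; a minimal sketch continuing the notation above:

```python
import torch
import torch.nn as nn

d_prime = 16
h_hat = torch.randn(d_prime)                # aggregated hidden state from the attention layer

proximity_layer = nn.Linear(d_prime, 1)     # realizes W_r and b_r
r_tilde = torch.sigmoid(proximity_layer(h_hat))   # estimated proximity of (u_i, v_j) in (0, 1)
```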

3.6 Method Learning and Recommendation Generation

Like RKGE [14], given the training data \( {\mathcal{D}}_{\text{train}} \), which contains instances of the form \( (u_{i}, v_{j}, r_{ij}, {\mathcal{P}}(u_{i}, v_{j})) \), DHKGE uses stochastic gradient descent (SGD) to minimize the loss function defined in Eq. (5) and thereby learn all of its parameters.

$$ {\mathcal{J}} = \frac{1}{|{\mathcal{D}}_{\text{train}}|} \sum\nolimits_{r_{ij} \in {\mathcal{D}}_{\text{train}}} \mathrm{BCELoss}(\tilde{r}_{ij}, r_{ij}) $$
(5)

where \( BCELoss( \cdot ) \) is the binary cross-entropy between the observed ratings and estimated ones.

The recommendation problem can thus be treated as a binary classification problem [8, 14]. When user \( u_{i} \) prefers item \( v_{j} \), namely \( r_{ij} = 1 \), we expect the estimated proximity \( \tilde{r}_{ij} \) to approach 1; otherwise, it should approach 0. Once the learning process is completed, DHKGE has obtained trained embeddings of all users and items.
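A hedged sketch of one SGD training step under this objective is given below; it assumes a `model` object that maps the mined paths of an instance to the estimated proximity \( \tilde{r}_{ij} \), which is a hypothetical interface rather than the exact implementation.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()

def train_step(model, optimizer, paths, r_ij):
    """One SGD step on a single (u_i, v_j, r_ij, P(u_i, v_j)) training instance."""
    optimizer.zero_grad()
    r_tilde = model(paths)                               # estimated proximity, shape (1,), values in (0, 1)
    loss = criterion(r_tilde, torch.tensor([float(r_ij)]))
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage:
# optimizer = torch.optim.SGD(model.parameters(), lr=0.2)   # lr = lambda used on MovieLens 100K
```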

Following [14, 18], during testing DHKGE obtains the proximity scores between the target user and the candidate items by calculating the inner products of the user embedding with the item embeddings. DHKGE finally generates a top-N recommendation list for the user by sorting the proximity scores in descending order.
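The ranking step can be sketched as follows, assuming the trained user and item embeddings are available as NumPy arrays.

```python
import numpy as np

def top_n(user_vec, item_vecs, n=10):
    """Rank candidate items by the inner product of the trained user embedding
    with each candidate item embedding, and return the top-N item indices."""
    scores = item_vecs @ user_vec        # proximity score per candidate item
    return np.argsort(-scores)[:n]       # indices sorted by descending score

# Hypothetical usage with trained embeddings:
# recommended = top_n(user_embeddings[i], item_embeddings, n=10)
```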

4 Experimental Evaluation

4.1 Experimental Setup

Datasets.

Our experiment used two datasets, MovieLens 100K and Yelp, as published on GitHub by [14]. The former is a movie dataset containing user interactions with movies. Sun et al. [14] combined this dataset with the IMDB dataset to add description information about the movies, such as genre, actor, and director. Yelp contains user check-ins to local businesses, user reviews, and local business information, so no external information needs to be added to this dataset. The two datasets were used to build two KGs following Definition 1. The statistics of the two datasets are shown in Table 1.

Table 1. Dataset and knowledge graph statistics

Following [3, 14], we sorted the feedback in each dataset by timestamp, and used the earliest 80% of the feedback as training data and the most recent 20% as test data. For each user-item pair in the training set, we extracted all paths of length 3 and randomly selected five of them to train our model.
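A sketch of this preprocessing, assuming feedback records of the form (user, item, timestamp), is shown below.

```python
import random

def temporal_split(feedback, train_ratio=0.8):
    """Sort (user, item, timestamp) records by time; the earliest 80% becomes
    training data and the most recent 20% becomes test data."""
    ordered = sorted(feedback, key=lambda rec: rec[2])
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]

def sample_paths(all_paths, num=5):
    """Randomly keep `num` of the mined length-3 paths for a user-item pair."""
    return random.sample(all_paths, num) if len(all_paths) > num else all_paths
```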

Evaluation Metrics.

Three popular evaluation metrics [1], Precision at N (Prec@N), Mean Reciprocal Rank at N (MRR@N), and Normalized Discounted Cumulative Gain at N (NDCG@N), are adopted to evaluate the top-N recommendation methods in our experiment. We set N = {1, 5, 10, 20} for Prec@N, and N = {5, 10, 20} for MRR@N and NDCG@N.
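For reference, standard binary-relevance definitions of these metrics for a single user can be sketched as follows; the exact definitions used in our experiments follow [1].

```python
import numpy as np

def prec_at_n(ranked, relevant, n):
    """Fraction of the top-N ranked items that are relevant."""
    return len(set(ranked[:n]) & relevant) / n

def mrr_at_n(ranked, relevant, n):
    """Reciprocal rank of the first relevant item within the top N (0 if none)."""
    for rank, item in enumerate(ranked[:n], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_n(ranked, relevant, n):
    """DCG of the top-N list normalized by the ideal DCG."""
    dcg = sum(1.0 / np.log2(r + 1) for r, item in enumerate(ranked[:n], start=1) if item in relevant)
    idcg = sum(1.0 / np.log2(r + 1) for r in range(1, min(len(relevant), n) + 1))
    return dcg / idcg if idcg > 0 else 0.0
```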

Comparison Methods and Their Implementation.

We compared our DHKGE with the following four recommendation methods:

  • BPRMF [13]: It is a Bayesian personalized ranking method based on matrix factorization. We used the Cornac framework to implement BPRMF.

  • NCF [8]: It is a classic neural network-based recommendation method. It was also implemented using the Cornac framework.

  • CKE [19]: It is a recently proposed state-of-the-art KG embedding based recommendation method. We directly used the Python code provided in [2].

  • RKGE [14]: It is a state-of-the-art recommendation method based on KG paths. We directly used the Python code published on GitHub by the authors.

We implemented DHKGE in PyTorch by modifying the recurrent network module and the performance evaluation module of the RKGE code.

Hyperparameter Settings.

For DHKGE, we used grid search to select the dimension \( d \) of the entity embedding and the number \( k \) of convolution filters from \( \{10, 20, 30, 40, 50, 100\} \), the number \( d^{\prime} \) of LSTM hidden units from \( \{16, 32, 64, 128\} \), and the learning rate \( \lambda \) of SGD from \( \{0.001, 0.01, 0.1, 0.2\} \). The hyperparameters of DHKGE were set to \( d = 10, k = 10, d^{\prime} = 16, \lambda = 0.2 \) on MovieLens 100K and \( d = 20, k = 40, d^{\prime} = 32, \lambda = 0.01 \) on Yelp. For the four comparison methods, the hyperparameters were set as suggested in the original papers.

4.2 Experimental Results

Tables 2 and 3 show the results of top-N recommendation on the two datasets. In the tables, bold numbers indicate the best performance among all methods; underlined numbers indicate the best performance among the four comparison methods; and the numbers in the “Improve” column give the percentage (%) of performance improvement achieved by DHKGE relative to the best-performing comparison method. In the same way as [14], we also created two views for each dataset: “All Users” means that all users are considered in the test data, whereas “Cold Start” means that the test data only includes users with fewer than 5 ratings.

Table 2. Results of top-N recommendation on MovieLens 100K
Table 3. Results of top-N recommendation on Yelp

Observing these results, we can obtain the following findings:

  1. The performance of both DHKGE and RKGE in terms of all metrics except Prec@1 is significantly better than that of the other three methods. This indicates that DHKGE and RKGE can make full use of the path information to model users’ preferences for items, thereby improving recommendation performance.

  2. DHKGE performs better than RKGE on all metrics (the Prec@1 performance on MovieLens 100K is the same). This indicates that a deep hybrid model can encode semantic paths more effectively than a single type of neural network: by extracting the local features of the path and encoding the sequence information in the path, DHKGE generates a more comprehensive path representation for recommendation.

  3. On most metrics, the performance improvement of DHKGE in the “All Users” view is higher than in the “Cold Start” view. This indicates that in the “Cold Start” view, the quality and quantity of the semantic paths extracted from the KG are limited, which affects the recommendation performance of DHKGE, since the method relies on the user’s historical interaction information to make recommendations.

Based on these findings, we can draw the conclusion that DHKGE’s recommendation performance is generally better than RKGE and other comparison methods.

5 Conclusions

To overcome the weaknesses of existing KG path-based recommendation methods, in this paper we propose the DHKGE method for top-N recommendation. DHKGE exploits a deep hybrid model to encode the paths between users and items, and uses an attention mechanism to distinguish the importance of multiple semantic paths between a user-item pair. Experiments on the MovieLens 100K and Yelp datasets show that DHKGE overall outperforms RKGE and several typical recommendation methods in terms of Precision@N, MRR@N, and NDCG@N. In future work, we plan to improve our method by adding entity relations to the path embeddings.