
1 Introduction

The volume of data grows rapidly with the development of society, leading to information overload. In response, recommender systems have been proposed for personalized information filtering; their core task is to predict whether a user will interact with a certain item. Collaborative filtering (CF) [4], which uses user-item interaction data to learn user and item embeddings, has been extensively studied in recommender systems [9, 10, 11, 12, 13, 16, 17, 19, 23]. However, the data sparsity inherent to CF hinders the development of recommender systems. With the rapid growth of social platforms, users are increasingly willing to make friends and express their preferences online. Inspired by this, social recommendation was proposed to alleviate the data sparsity issue by exploiting abundant social relationships.

As users play an important role in both the user-user social network and the user-item interaction graph, the key to social recommendation is that the final embedding contains information from both graphs. Traditional social recommendation methods [5, 10, 20, 26] inject the influence of social neighbors into the user embedding via a user-user interaction matrix within matrix factorization; such improvements can be seen as exploiting first-order neighbors of the graph structure. In subsequent research, Wu et al. proposed Diffnet [18] and Diffnet++ [21], which further enhance embedding learning by incorporating high-order neighborhood information. Although these social recommendation models have improved performance, we believe there is still room for improvement. Specifically, these methods do not explicitly encode the relation between a user and a long-distance item. For example, given the association \(\mathcal {U}_{4}\rightarrow \mathcal {U}_{1}\rightarrow \mathcal {I}_{1}\) in Fig. 1, \(\mathcal {U}_{4}\) and \(\mathcal {I}_{1}\) are intrinsically related, yet Diffnet does not explicitly model such latent collaborative signals; we aim to encode these collaborative signals in an explicit way. Moreover, during training of previous models, all interactions between users and users (items) are encoded uniformly, regardless of the reliability of the connections. Some unreliable connections may distract the nodes and diminish the representation capability of the embeddings. Therefore, we want users to focus on the users and items that are most relevant to them during modeling.

In this work, we propose a new recommendation framework named Meta-path Enhanced Lightweight Graph Neural Network (ME-LGNN), which models high-order collaborative signals explicitly. Specifically, we merge the user-user social network and the user-item interaction graph into a unified social recommendation HIN, on which we apply a lightweight graph convolution operation to aggregate neighbors’ information. By stacking multiple layers, users and items can absorb the characteristics of their high-order neighbors. During aggregation, an attention network adaptively weights the embeddings of different neighbors. In addition, to let users capture collaborative signals more efficiently, we devise a series of interpretable meta-paths in the HIN. To make the model focus on reliable connections during training, we reduce noise by constraining the dependency probability of meta-paths, strengthening the embedding representations. The two training processes alternate to improve overall recommendation performance.

Fig. 1. Heterogeneous information network (HIN) constructed from the user-item interaction graph and the social network. Blue nodes represent users, green nodes represent items; orange dashed lines indicate user-item interactions, and purple dashed lines indicate trust relationships between users. (Color figure online)

In summary, the contributions of this work are as follows:

  • We emphasize the significance of explicitly incorporating high-order collaborative signals into embeddings in social recommendation models.

  • We fuse the user-user social network and the user-item interaction graph into a unified heterogeneous network. On the basis of the network, we propose a new social recommendation framework called ME-LGNN, which can model high-order collaborative signals explicitly.

  • To enable users to focus more on reliable connections, we design a series of meaningful meta-paths, which further improve embedding learning by constraining meta-path dependency probabilities.

  • We conduct extensive experiments on three public datasets. Results demonstrate the competitive performance of ME-LGNN and the effectiveness of explicitly modelling the unified HIN.

2 Related Work

In this section, we briefly review the work related to social recommendation.

SoRec [15] is an early social recommendation model based on matrix factorization, which proposed a factor analysis approach using probabilistic matrix factorization. To incorporate the preferences of friends trusted by the user, Ma et al. proposed RSTE [14], a recommendation model that weights social information on top of SoRec. However, this model does not propagate users’ preferences through the network, so SocialMF [10] was proposed, which relies on the trust propagation mechanism of social networks to obtain user embeddings. Observing that users tend to give higher scores to items their friends like, Zhao et al. proposed SBPR [26], which uses social information to select training examples and incorporates social relationships into the model. TBPR [20] built on it by incorporating strong and weak ties in social relationships. These models proposed different ways to address the sparsity issue of collaborative filtering.

Recommender systems based on deep learning aim to capture non-linear features from the interaction graph. NeuMF [8] combines traditional matrix factorization and a Multilayer Perceptron (MLP) to extract both low- and high-dimensional features, with promising recommendation performance. DeepSoR [2] learns embedding representations from social relationships and then integrates the user’s embedding into probabilistic matrix factorization for rating prediction. However, these methods do not encode interaction information in an explicit way.

In recent years, graph neural networks have become well known for their powerful performance on graph data. Recommendation tasks can also be naturally represented as graph structures, so GNNs offer great potential for recommendation. GC-MC [1] constructs embeddings of users and items by passing messages on the user-item interaction graph, but the passing operates over only a single layer of links and does not incorporate high-order interaction information. NGCF [19], proposed by Wang et al., designs a graph convolution operation on the user-item interaction graph to capture collaborative filtering signals in high-order connections. LightGCN [7] makes the model more suitable for collaborative filtering by removing the feature transformations and non-linear activations from NGCF’s GCN. To incorporate social relationships, GraphRec [3] learns adequate embeddings from rich social relationships by fusing first-order interactions on the social network and the user-item interaction graph with neural networks. Diffnet [18] leverages convolutional operations to perform recursive diffusion in the social network and obtain users’ high-order collaborative signals. Unlike these existing models, our work fuses the social network and the user-item interaction graph into a unified heterogeneous information graph and explicitly encodes potential collaborative signals by propagating information over it.

From the perspective of heterogeneous graphs, a number of meta-path-based models have been proposed for the recommendation task. Meta-path-based recommendation models allow interpretable analysis of recommendation results. Yu et al. proposed HeteRec [25], which targets implicit feedback, obtains different user preference matrices from different meta-paths, and then applies matrix factorization to implement the recommendation task. Han et al. proposed NeuACF [6] to extract information of different dimensions through multiple meta-paths, attempting to fuse different aspects of information. IF-BPR [24] also learns user embeddings through meta-paths and then discovers users’ potential friends adaptively. In contrast, our work enhances the representations by calculating meta-path dependency probabilities so that users focus on connections closely related to themselves and ignore dissimilar connections in the heterogeneous graph.

3 Methodology

In general, ME-LGNN is composed of four parts: (1) Embedding layer: provides the initial embeddings of users and items. (2) Aggregation layer: learns embedding representations of users and items through a lightweight GCN with an attention mechanism. (3) Enhancement layer: designs reasonable meta-paths and uses meta-path constraints to further enhance the embedding representations. (4) Predicting layer: produces the final preference scores. We show the overall neural architecture of ME-LGNN in Fig. 2. We first describe the problem formulation, then introduce the four components of our model, and finally discuss the training optimization process.

Fig. 2. Illustration of the ME-LGNN model architecture (arrow lines indicate information flow). In the aggregation layer, user (blue) and item (green) embeddings are aggregated through multiple propagation layers for neighborhood aggregation; the output undergoes a first round of training, is then further refined by the meta-path enhancement layer, and its output feeds the final prediction. (Color figure online)

3.1 Notation and Problem Formulation

We introduce the necessary notation and definitions by describing the graph-based collaborative filtering problem. In graph-based social recommendation, there are two types of entities: the user set \(U\) (\(\mid U \mid = M\)) and the item set \(I\) (\(\mid I \mid = N\)). Two interaction graphs are formed from U and I: (1) User-item interaction graph \(\boldsymbol{G}_{u-i}\): we represent its interactions with the adjacency matrix \(R \in \mathbb {R}^{M \times N}\). If there is an interaction between user u and item i, \(y_{ui}\) is 1; otherwise it is 0. (2) User-user social network \(\boldsymbol{G}_{u-u}\): \(S \in \mathbb {R}^{M \times M}\) is the interaction matrix between users. If users \(u_1\) and \(u_2\) trust each other, \(y_{u_{1}u_2}\) is 1; otherwise it is 0. The relationship type between entities a and b is denoted \(\boldsymbol{r}_{ab}\); relationship types include social relationships between users and user ratings of items. We merge the two networks to obtain a unified heterogeneous information network \(\boldsymbol{G}_H\). For social recommendation, given the rating matrix \(R\) of users and items and the social interaction matrix \(S\), our goal is to predict users’ preferences for unknown items. We define the problem investigated in this work as follows:

\(\boldsymbol{Input}\): user-user social network \(\boldsymbol{G}_{u-u}\), user-item interaction graph \(\boldsymbol{G}_{u-i}\), user set \(U \) and item set \(I \).

\(\boldsymbol{Output}\): a personalized ranking function that maps an item to a real value for each user: \(f_u: I \rightarrow \mathbb {R}\).

In this process, we first merge the two input interaction graphs to obtain a new heterogeneous information network \(\boldsymbol{G}_H\). The rest of the training is carried out on \(\boldsymbol{G}_H\).
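For concreteness, the following minimal NumPy sketch shows one way to build the adjacency structure of \(\boldsymbol{G}_H\) from binary matrices R and S; the function and variable names are our illustration and do not come from the paper’s implementation.

```python
import numpy as np

def build_hin_adjacency(R, S):
    """Merge the user-item matrix R (M x N) and the user-user matrix
    S (M x M) into one symmetric adjacency over M + N nodes.
    Nodes 0..M-1 are users; nodes M..M+N-1 are items."""
    M, N = R.shape
    A = np.zeros((M + N, M + N))
    A[:M, :M] = np.maximum(S, S.T)  # social links, made symmetric
    A[:M, M:] = R                   # user-item interactions
    A[M:, :M] = R.T                 # mirrored: the HIN is undirected
    return A

# Toy example: 3 users, 2 items.
R = np.array([[1, 0], [0, 1], [1, 1]])
S = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
A = build_hin_adjacency(R, S)   # A has shape (5, 5)
```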

3.2 Embedding Layer

We treat social recommendation as a representation learning problem. Following mainstream models [7, 18, 19, 21], we use an embedding vector \(\boldsymbol{e}_{u} \in \mathbb {R}^{d}~(\boldsymbol{e}_{i} \in \mathbb {R}^{d})\) to describe a user (item) and an embedding vector \(\boldsymbol{r} \in \mathbb {R}^{d}\) to describe a relationship type, where d is the embedding dimension. We construct two parameter matrices as embedding lookup tables:

$$\begin{aligned} \boldsymbol{E}=[ \overbrace{\boldsymbol{e}_{u_{1}},\cdots ,\boldsymbol{e}_{u_{M}}}^{\text {user embeddings}}, \overbrace{\boldsymbol{e}_{i_{1}},\cdots ,\boldsymbol{e}_{i_{N}}}^{\text {item embeddings}}], \quad \boldsymbol{R}=[ \overbrace{\boldsymbol{r}_{uu},\boldsymbol{r}_{ui}}^{\text {relationship type embeddings}}]. \end{aligned}$$
(1)
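As a sketch of Eq. (1), the lookup tables can be realized as plain parameter matrices indexed by node and relation-type IDs; the random initialization below is a common choice, not necessarily the one used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, d = 3, 2, 32        # number of users, items, embedding dimension

# One row per user followed by one row per item, as in Eq. (1).
E = rng.normal(scale=0.1, size=(M + N, d))      # trainable node embeddings
rel_types = {"uu": rng.normal(scale=0.1, size=d),   # user-user relation
             "ui": rng.normal(scale=0.1, size=d)}   # user-item relation

e_u1 = E[0]       # embedding of user u_1
e_i1 = E[M + 0]   # embedding of item i_1
```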

3.3 Aggregation Layer

At the aggregation layer, we employ an attention network to adaptively aggregate the embeddings of different neighbors in the heterogeneous graph \(\boldsymbol{G}_H\). Through iterative aggregation, information diffuses and propagates through the network, so high-order collaborative signals can also be captured and modeled. Recent work [7] found that the feature transformations and non-linear activations common in GCNs contribute little to collaborative filtering; instead, they make training harder and reduce recommendation performance. In our work, we therefore remove feature transformation and non-linear activation to reduce model complexity. We aggregate all nodes in the heterogeneous information network; the graph convolution operation in ME-LGNN is defined as:

$$\begin{aligned} \boldsymbol{e}_{a}^{*}=AGG \left( \boldsymbol{e}_{a},\mathcal {N}(a)\right) =\boldsymbol{e}_{a}+\sum _{b \in \mathcal {N}(a)}\tilde{\pi }_{ab}\cdot \boldsymbol{e}_{b}, \end{aligned}$$
(2)

where \(\boldsymbol{e}_{a}^{*}\) represents the embedding representation of node a, \(\mathcal {N}(a)\) represents the set of neighbors of a, and \(\tilde{\pi }_{ab}\) is the attention value, which indicates how strongly node a is influenced by node b. The attention value is defined as follows:

$$\begin{aligned} \tilde{\pi }_{ab}=\frac{\exp (\pi _{ab})}{\sum _{b^{\prime } \in \mathcal {N}(a)}\exp (\pi _{ab^{\prime }})}, \end{aligned}$$
(3)
$$\begin{aligned} \pi _{ab}=(\boldsymbol{e}_{a}\odot \boldsymbol{e}_{b})^{T}\tanh (\boldsymbol{r}_{ab}\odot \boldsymbol{e}_{b}), \end{aligned}$$
(4)

where \(\boldsymbol{r}_{ab}\) is the embedding of the interaction type between node a and node b. High-order connectivity is crucial for encoding collaborative signals and estimating the correlation scores between users and items. We iteratively execute the above aggregation process, encoding high-order collaborative signals to explore high-order connectivity information.
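The sketch below implements one propagation layer of Eqs. (2)-(4) in NumPy over a dense adjacency; it assumes a mapping `rel` from each edge to its relation-type embedding \(\boldsymbol{r}_{ab}\), and it is an illustration rather than the authors’ TensorFlow implementation.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    return np.exp(x) / np.exp(x).sum()

def aggregate_layer(E, A, rel):
    """One ME-LGNN propagation layer.
    E:   (n, d) node embeddings, A: (n, n) adjacency of the HIN,
    rel: dict mapping edge (a, b) -> relation-type embedding r_ab."""
    E_new = E.copy()
    for a in range(E.shape[0]):
        nbrs = np.nonzero(A[a])[0]
        if len(nbrs) == 0:
            continue
        # Eq. (4): pi_ab = (e_a ⊙ e_b)^T tanh(r_ab ⊙ e_b)
        scores = np.array([(E[a] * E[b]) @ np.tanh(rel[(a, b)] * E[b])
                           for b in nbrs])
        attn = softmax(scores)            # Eq. (3): normalize over N(a)
        E_new[a] = E[a] + attn @ E[nbrs]  # Eq. (2): residual aggregation
    return E_new
```

Stacking k such layers lets each node absorb information from its k-hop neighborhood.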

3.4 Enhancement Layer

As Fig. 1 suggests, in the training process of previous models the interaction information between all users and users (items) is encoded uniformly, regardless of the relevance of a user’s high-order information. Some unreliable links may distract nodes. To alleviate these problems, we take the following steps to enhance the representation learning ability.

Designing Meta-paths. Following [24], interactions in this paper are undirected. We design some reasonable meta-paths, shown in Table 1, and generate path instances via random walks on the heterogeneous graph \(\boldsymbol{G}_H\), obtaining a new path set \(\boldsymbol{P}\).

Table 1. Meta-paths designed for social recommendation.

Calculating Dependency Path Probabilities. We reconstruct dependency paths. For the head and tail nodes of a path, if the two nodes interact in the ground truth, the path receives a higher score; otherwise, it receives a lower score. The goal is to let users pay more attention, while learning embeddings, to the nodes that are more likely to interact with them, thereby further enhancing the embedding learning.

Inspired by [22], we calculate the dependency probabilities of paths as follows. Unlike [22], which models reconstructed sequences as sequence generation, the sequential nature of the paths in our study is not obvious. Specifically, since adjacent nodes in a path are related, we use self-attention to calculate the hidden state \(\boldsymbol{p}_{b_{l}}\) of each node on the path \(\phi (n)\), as follows:

$$\begin{aligned} \boldsymbol{p}_{b_{l}}=\text {self-attention}(\boldsymbol{p}_{b_{l-1}},\boldsymbol{q}_{b_{l-1}}), \end{aligned}$$
(5)

where \(\boldsymbol{p}_{b_{l}}\) is the hidden state of the node, \(\boldsymbol{q}_{b_{l}}\) is the node embedding, and \(\boldsymbol{p}_{b_{0}}\) is initialized with the output of the aggregation layer. Then \(\boldsymbol{p}_{b_{l}}\) is fed to a softmax layer to calculate the probability of node \(v_{b_{l}}\) in the path:

$$\begin{aligned} P\left( v_{b_{l}} \mid v_{<b_{l}}\right) =\frac{\exp \left( \boldsymbol{p}_{b_{l}} \boldsymbol{W}_{r}\boldsymbol{q}_{b_{l}}\right) }{\sum _{n}\exp \left( \boldsymbol{p}_{b_{l}} \boldsymbol{W}_{r}\boldsymbol{q}_{b_{n}}\right) }, \end{aligned}$$
(6)

where \(\boldsymbol{W}_{r} \in \mathbb {R}^{d \times d}\) is a parameter matrix. We thus obtain the set of node probabilities for path \(\phi (n)\), \(\{P(v_{b_1} \mid v_{<b_1 }),P(v_{b_2} \mid v_{<b_2 }),\cdots ,P(v_{b_L} \mid v_{<b_L})\}\); finally, the probability of path \(\phi (n)\) is calculated as:

$$\begin{aligned} N(\phi (n))=\prod _{l=1}^{L}P\left( v_{b_{l}} \mid v_{<b_{l}}\right) . \end{aligned}$$
(7)
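The following sketch computes Eqs. (6)-(7) for one path, given the self-attention hidden states. Since the paper does not fully specify the candidate set in the denominator of Eq. (6), the sketch softmaxes over the nodes of the path itself; that choice, like all names here, is our assumption.

```python
import numpy as np

def path_probability(P_hid, Q, W_r):
    """Dependency probability of one path phi(n), Eqs. (6)-(7).
    P_hid: (L, d) hidden states p_{b_l} from the self-attention step,
    Q:     (L, d) embeddings q_{b_l} of the nodes on the path,
    W_r:   (d, d) parameter matrix."""
    L = Q.shape[0]
    prob = 1.0
    for l in range(L):
        # Eq. (6): softmax of p_{b_l} W_r q over candidate nodes.
        logits = np.array([P_hid[l] @ W_r @ Q[n] for n in range(L)])
        logits -= logits.max()              # numerical stability
        p_node = np.exp(logits[l]) / np.exp(logits).sum()
        prob *= p_node                      # Eq. (7): product over steps
    return prob
```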

3.5 Predicting Layer

After propagation through k layers, we obtain the user embeddings \(\{\boldsymbol{e}_u^1,\boldsymbol{e}_u^2,\cdots ,\boldsymbol{e}_u^k\}\) and item embeddings \(\{\boldsymbol{e}_{i}^{1},\boldsymbol{e}_{i}^{2},\cdots ,\boldsymbol{e}_{i}^{k}\}\) of each layer. Since each layer may reflect different dimensions of user preference, we concatenate the representations of all layers (including the initial layer 0) as the final user representation: \(\boldsymbol{e}_{u}^{*} = [\boldsymbol{e}_{u}^{0} \parallel \boldsymbol{e}_{u}^{1} \parallel \cdots \parallel \boldsymbol{e}_{u}^{k}]\). We use the same approach to obtain the final item representation: \(\boldsymbol{e}_{i}^{*} = [\boldsymbol{e}_{i}^{0} \parallel \boldsymbol{e}_{i}^{1} \parallel \cdots \parallel \boldsymbol{e}_{i}^{k} ]\). Finally, we calculate the user’s preference \(\hat{y}_{ui}\) for an item via the inner product:

$$\begin{aligned} \hat{y}_{ui}={\boldsymbol{e}_{u}^{*}}^T \boldsymbol{e}_{i}^{*}. \end{aligned}$$
(8)
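A minimal sketch of this prediction step, under the layer-concatenation scheme described above (names are illustrative):

```python
import numpy as np

def predict(user_layers, item_layers):
    """Eq. (8): concatenate each entity's per-layer embeddings
    (layer 0 through layer k) and score by inner product."""
    e_u = np.concatenate(user_layers)   # [e_u^0 || e_u^1 || ... || e_u^k]
    e_i = np.concatenate(item_layers)
    return e_u @ e_i
```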

3.6 Model Training

To learn the model parameters, for the aggregation layer we adopt a pairwise ranking (BPR [16]) loss as the objective:

$$\begin{aligned} \mathcal {L}_{A}=\sum _{(u,i,j) \in \mathcal {O}} -\ln \sigma (\hat{y}_{ui} - \hat{y}_{uj}), \end{aligned}$$
(9)

where \(\mathcal {O} = \{(u,i,j)\mid (u,i) \in \mathcal {R}^+,(u,j) \in \mathcal {R}^-\}\) denotes the pairwise training data, \(\mathcal {R}^+\) denotes positive samples, \(\mathcal {R}^-\) denotes negative samples.

For the enhancement layer, \(\hat{y}_{ui}\) is calculated using the same method as in Eq. (8). We compute an additional reconstruction loss on the training set of reconstructed paths. Because the random walks along meta-paths yield relatively few positive samples, we give the positive samples a larger weight during training. The difference from Eq. (9) is that training here uses a weighted pairwise loss:

$$\begin{aligned} \mathcal {L}_{E}=\sum _{(u,i,j) \in \mathcal {O}} -\ln \sigma (\mu \hat{y}_{ui} - \hat{y}_{uj}), \end{aligned}$$
(10)
$$\begin{aligned} \mu =\sqrt{\frac{\mid \mathcal {R}^+ \mid +\mid \mathcal {R}^- \mid }{\mid \mathcal {R}^+ \mid }}, \end{aligned}$$
(11)

where \(\mu \) is the weight applied to the positive samples.

The complete loss function of ME-LGNN is as follows:

$$\begin{aligned} \mathcal {L}=\mathcal {L}_{A}+\mathcal {L}_{E} + \lambda {{\Vert \varTheta \Vert }^2}, \end{aligned}$$
(12)

where \(\varTheta =\{\boldsymbol{E}, \boldsymbol{R}, \boldsymbol{W}_{r}\}\) represents all trainable model parameters. Note that the only trainable parameters of the aggregation layer are the \(0^{th}\)-layer embeddings; compared with traditional graph convolution operations, the number of parameters is reduced. The enhancement layer additionally trains the parameter matrix \(\boldsymbol{W}_{r}\) and the transformation matrices in self-attention. \(\lambda \) balances the strength of the \(L_2\) regularization to prevent overfitting. \(\mathcal {L}_{A}\) and \(\mathcal {L}_{E}\) are trained alternately to jointly improve the performance of the model. We adopt the Adam optimizer, which adaptively adjusts the learning rate, to update the parameters of both the aggregation layer and the enhancement layer.
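The sketch below shows the two pairwise objectives of Eqs. (9)-(11) for a batch of (u, i, j) triples; the alternating Adam updates and the \(L_2\) term on \(\varTheta \) are omitted for brevity, and the names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pairwise_loss(y_pos, y_neg, mu=1.0):
    """Eqs. (9)/(10): -ln sigma(mu * y_ui - y_uj), summed over triples.
    mu = 1 gives the aggregation-layer loss L_A; mu from Eq. (11)
    gives the weighted enhancement-layer loss L_E."""
    return -np.log(sigmoid(mu * y_pos - y_neg) + 1e-10).sum()

def positive_weight(n_pos, n_neg):
    """Eq. (11): up-weight the scarcer positive path samples."""
    return np.sqrt((n_pos + n_neg) / n_pos)
```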

Algorithm 1. The overall training process of ME-LGNN.

The specific process of ME-LGNN is presented in Algorithm 1. A training epoch involves two stages: the aggregation layer (lines 3–5) and the enhancement layer (lines 6–8). In each iteration, we execute the aggregation layer and the enhancement layer in turn to enhance embedding learning.

3.7 Complexity Analysis

In this section, we analyse the complexity of our model.

Model Size. We adopt an alternating optimization strategy; the trainable parameters of our model consist of three parts: user and item embeddings, the parameter matrix, and the transformation matrices. For the aggregation layer, we only need to learn the \(0^{th}\)-layer user embeddings \(\boldsymbol{U}^{(0)}\in \mathbb {R}^{M\times d}\) and item embeddings \(\boldsymbol{I}^{(0)}\in \mathbb {R}^{N\times d}\). For the enhancement layer, the parameter matrix \(\boldsymbol{W}_{r}\) and the three transformation matrices in self-attention need to be learned, each with \(d\times d\) parameters. In total, the overall model size is approximately \((M+N+4d)d\), which shows that our model is very lightweight.

Time Complexity. Time consumption comes from five main parts: graph convolution, aggregation attention, self-attention, dependency path probability, and prediction. We first compute the number of interactions \(t = \mid R^+ \mid + \mid S^+ \mid \) in the adjacency matrices, where \(\mid R^+ \mid \) and \(\mid S^+ \mid \) denote the numbers of nonzero elements in R and S. For the graph convolution and attention through k layers, the time consumption of each is O(tdk), where d is the embedding dimension. The self-attention and dependency path probability computations have complexity O(Nd). The prediction layer only performs inner products, taking \(O(NMd^2)\) time. Since our model removes the feature transformations and non-linear transformations from the GCN following [7], it is also more efficient to train than traditional GNN-based social recommendation models.

4 Experiments

In this section, we conduct experiments on three real-world datasets, Yelp, Douban and Lastfm-2k, to answer the following three questions:

\(\boldsymbol{RQ1}\) Can ME-LGNN perform better than other competitive methods?

\(\boldsymbol{RQ2}\) Does our proposed meta-path enhancement module improve recommendation performance?

\(\boldsymbol{RQ3}\) Is ME-LGNN effective in mitigating data sparsity problems?

4.1 Experimental Settings

Datasets. To validate the performance of the model proposed in this paper, we conduct experiments on three public datasets: Yelp, Douban and Lastfm-2k. The three datasets are described in detail as follows.

Table 2. The statistics of the three datasets.
  • Yelp: The dataset comes from an online location-based social network where users rate venues and interact with each other on the site.

  • Douban: The dataset is crawled from Douban’s book service and describes users’ interaction behaviour on Douban.

  • Lastfm-2k: The dataset contains the social network of 2K users from the Last.fm online music system, as well as their listening information.

For each dataset described above, we randomly select 60\(\%\) of the user interactions as the training set, 20\(\%\) as the test set, and the remaining 20\(\%\) as the validation set for tuning hyperparameters. We then apply a negative sampling strategy, sampling items not consumed by a user as negative samples. Statistics of the pre-processed datasets are shown in Table 2.

Evaluation Metrics. To evaluate top-N recommendation, we use two metrics commonly used in recommender systems: Recall and Normalized Discounted Cumulative Gain (NDCG). Recall measures how many of the relevant items are retrieved in the top-N list, while NDCG additionally considers the ranking positions of the relevant items.
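For reference, one common formulation of these two metrics is sketched below; the paper may differ in minor details such as tie handling, and the function names are ours.

```python
import numpy as np

def recall_at_n(ranked_items, relevant, n):
    """Fraction of a user's relevant items that appear in the top-N."""
    return len(set(ranked_items[:n]) & relevant) / len(relevant)

def ndcg_at_n(ranked_items, relevant, n):
    """Binary-relevance DCG normalized by the ideal DCG."""
    dcg = sum(1.0 / np.log2(rank + 2)
              for rank, item in enumerate(ranked_items[:n])
              if item in relevant)
    idcg = sum(1.0 / np.log2(rank + 2)
               for rank in range(min(len(relevant), n)))
    return dcg / idcg
```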

Baselines. To verify the performance of our model, we compare ME-LGNN with the following methods.

  • TBPR [20]: This approach incorporates the important concept of strong and weak ties into social recommendation, combining the BPR [16] model to distinguish strong from weak ties. The authors propose an EM-based algorithm to distinguish strong and weak ties in social networks, learning the latent feature vectors of users and items for optimal recommendation accuracy.

  • SocialMF [10]: The method constrains a user’s preferences to be as similar as possible to the average preferences of the user’s social neighbors, and introduces trust propagation into matrix factorization so that users are represented close to the users they trust.

  • NeuMF [8]: The method is a typical recommendation algorithm based on deep learning. It combines traditional matrix factorization and a multi-layer perceptron to extract both low and high dimensional features with impressive recommendation results.

  • Diffnet [18]: This approach uses a GCN to model the user’s social network to obtain the user’s embeddings, then leverages the SVD++ [11] framework to implement the recommendation task.

  • LightGCN [7]: This model is also a graph-based model, where it explores recommendation tasks on a user-item interaction graph. In this work, the authors’ goal is to simplify the design of the GCN to make it cleaner and more suitable for collaborative filtering tasks.

Parameter Settings. We implement our model in TensorFlow. For all models that rely on gradient descent-based learning, we utilise Adam as the optimization method. For all comparison models, we tune the learning rate in [0.001, 0.005, 0.01, 0.02, 0.05] and the regularization factor in [\(10^{-6}\), \(10^{-5}\), \(\cdots \), \(10^{1}\), \(10^{2}\)] to get the best results. The training batch size is set to 1024. For NeuMF, we set the hidden layers as suggested in [8]. For Diffnet and LightGCN, we tune the number of GCN layers in [1, 2, 3]. For our model, the number of GCN layers is set to 2. Finally, we set the embedding dimension to 32 and 64 respectively to compare recommendation performance.

4.2 Overall Comparison (RQ1)

In Table 3 and Table 4, we show the overall top-10 recommendation performance of all models with different embedding dimensions on the three datasets. It can be observed that almost all models improve as the embedding dimension d increases. Both TBPR [20] and SocialMF [10] leverage users’ social connections to mitigate sparsity: TBPR [20] incorporates tie strength, while SocialMF [10] introduces trust propagation. NeuMF [8] introduces high-dimensional features on top of traditional methods and achieves considerable results. The graph convolution models Diffnet [18] and LightGCN [7] show significant improvements, reflecting the great potential of graph-based models for recommendation tasks. Our ME-LGNN is superior on almost all datasets, which shows the effectiveness of explicitly modeling high-order collaborative signals. We carry out further experiments on this aspect; the results, shown in Table 5 and Table 6, report performance under different top-N values for d = 64 and are consistent with our previous analysis, further verifying the effectiveness of our model. Since performance is best when the dimension is 64, we use d = 64 for the model analysis in the following experiments.

Table 3. Recall@10 comparisons for different dimension size d.
Table 4. NDCG@10 comparisons for different dimension size d.
Table 5. Recall@N comparisons for different top-N values.
Table 6. NDCG@N comparisons for different top-N values.

4.3 Ablation Experiments (RQ2)

In this subsection, we examine whether our proposed meta-path enhancement module is effective through ablation experiments. We remove the meta-path enhancement module and train the resulting model, LGNN. The results are shown in Table 7 and Table 8: adding the meta-path enhancement module to LGNN yields ME-LGNN, which further improves performance and demonstrates the value of emphasizing reliable connections during modeling. Moreover, even after removing the meta-path enhancement module, our model still performs impressively, further demonstrating the effectiveness of explicitly modeling HINs.

Table 7. Recall@N comparisons for different top-N values.
Table 8. NDCG@N comparisons for different top-N values.

4.4 Performance Comparison Under Different Sparsity (RQ3)

To verify whether our model can mitigate the data sparsity problem, we conduct sparsity experiments on the Yelp and Douban datasets. We first group users in the training data by their number of interactions; for example, [16, 32) means a user interacts with items at least 16 and fewer than 32 times in the training set. The experimental results are presented in Fig. 3, where we observe a general trend of improved performance as the number of interactions increases. This is intuitive, since more interactions provide more information to capture. Our model exhibits respectable performance in most cases, with a particularly significant improvement on the Yelp dataset, indicating that it can mitigate the data sparsity problem.

Fig. 3. Performance under different rating sparsity on two datasets.

5 Conclusion

In this work, we propose a new recommendation framework called Meta-path Enhanced Lightweight Graph Neural Network (ME-LGNN) to explicitly model high-order collaborative signals. To make users pay more attention to reliable links, we design a series of meaningful meta-paths, perform random walks based on them, and constrain the links by calculating the meta-paths’ dependency probabilities. Extensive experiments on three public datasets show the effectiveness of our proposed model.

Currently, our approach shows promising performance on simple heterogeneous information networks. In future work, we will consider improving our approach to handle complex network structures with richer attribute information, and we will explore how to make more rational use of meta-paths to improve the interpretability of recommendation results.