
1 Introduction

Recommendation systems play an important role in online businesses because of the economic benefits they bring by suggesting suitable products or services to customers. This motivation has driven research to improve algorithms for powerful recommendation engines, typically collaborative filtering (CF). Concurrent with the rise of deep learning, especially the use of GNNs to learn representations of users and items (also known as embeddings), many recent studies have focused on enriching embeddings by encoding them with collaborative signals, which carry information about user-item interactions [1,2,3,4,5]. These signals are extracted through the message-passing architecture of GNNs. More specifically, consider a user u as a node in the graph whose embedding is \(e_u\); at each propagation step this user node adjusts its embedding by aggregating the embeddings of its neighboring items. During the aggregation process, each embedding \(e_i\) from a neighboring item node i is multiplied by a coefficient \(p_{ui}=1/\sqrt{|\mathcal {N}_u||\mathcal {N}_i|}\), where \(\mathcal {N}_u\) and \(\mathcal {N}_i\) denote the first-hop neighbors of user u and item i. The updated embedding \(e_u\) therefore not only carries information about neighboring items but also reflects the mutual importance between user u and item i through the \(p_{ui}\) coefficient.
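To make the coefficient concrete, the following NumPy sketch performs one such aggregation step for a single user node; the toy interaction matrix and the embedding dimension are illustrative values, not taken from the paper.

```python
# One message-passing step for user u, weighting each neighboring item's
# embedding by p_ui = 1 / sqrt(|N_u| * |N_i|).
import numpy as np

R = np.array([[1, 1, 0],        # user 0 interacted with items 0 and 1
              [0, 1, 1]])       # user 1 interacted with items 1 and 2
E_items = np.random.rand(3, 4)  # 3 item embeddings of dimension d = 4

u = 0
N_u = np.flatnonzero(R[u])           # first-hop item neighbors of user u
e_u_new = np.zeros(4)
for i in N_u:
    N_i = np.flatnonzero(R[:, i])    # first-hop user neighbors of item i
    p_ui = 1.0 / np.sqrt(len(N_u) * len(N_i))
    e_u_new += p_ui * E_items[i]     # weighted aggregation of item embeddings
```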

However, GNN models only make each user or item embedding in the user-item interaction graph similar to those of its neighboring nodes, without regard to the weights of the links between nodes during the propagation process. Some research has added weights to the embedding encoding process, such as [4, 5]. These studies construct a user-user graph in which the weight of each connection between two users is the number of items they share. To address this problem, we normalize the user-user graph based on Jaccard similarity and integrate these weights, which improves the quality of the collaborative signal extracted at each propagation step and produces satisfactory embeddings for CF. Combining embeddings propagated from two graphs has also been explored in many studies [3,4,5]. SocialLGN has shown that its proposed graph fusion operation is a state-of-the-art method for combining embeddings from two graphs [3], with results more accurate than fusion operations based on GCN [6] or GraphSAGE [7].

In this paper, we propose a model named CombiGCN based on Light Graph Convolution (LGC) [2] to propagate user and item embeddings on the user-item interaction graph; at the same time, user embeddings are also propagated on the user-user weighted connection graph. To fuse the two user embeddings obtained after each propagation step into an integrated embedding, instead of using the graph fusion operation of SocialLGN, we simply use the weighted sum of the embeddings. We demonstrate the superior performance of CombiGCN by comparing it with state-of-the-art recommendation models on three real-world datasets that we preprocessed to avoid cold-start and over-fitting.

2 Related Work

2.1 Graph Convolution Networks

Due to their superior ability to model graph-structured data, Graph Neural Networks (GNNs) have become the state-of-the-art technology for recommendation systems. A graph convolution network (GCN) is a special type of GNN that uses convolutional aggregations. Spatial GCNs are based on 1-hop neighbor propagation; each layer requires only neighborhood aggregations, so the computational cost is significantly reduced. In the recommendation context, spatial GCNs include NGCF [1], WiGCN [4], and GCMC [8]. A recent study [2] developed Light Graph Convolution (LGC) based on the GCN architecture but eliminating the trainable matrices and non-linear activation functions of GCN. The experiments reported in [2] also show that LGC outperforms GCN, and most state-of-the-art CF models today [2, 3, 5] use LGC instead of traditional GCN.
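The distinction can be stated in a few lines of code. Below is a minimal NumPy sketch, under illustrative shapes, contrasting one GCN layer (trainable matrix plus non-linear activation) with one LGC layer (normalized neighborhood aggregation only); `A_hat` stands in for a symmetrically normalized adjacency matrix.

```python
# Contrast of one GCN layer with one LGC layer; all values illustrative.
import numpy as np

n, d = 5, 8
A_hat = np.random.rand(n, n)    # stand-in for D^{-1/2} A D^{-1/2}
E = np.random.rand(n, d)        # node embeddings at layer k

# GCN layer: aggregation, trainable transform W_k, non-linear activation.
W_k = np.random.rand(d, d)
E_gcn = np.maximum(A_hat @ E @ W_k, 0.0)   # ReLU

# LGC layer: the same aggregation with W_k and the activation removed.
E_lgc = A_hat @ E
```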

2.2 Multi-layer Perceptron and Backpropagation

In machine learning models involving neural networks such as GNNs, learning the trainable parameters includes forward-propagation and back-propagation. The forward-propagation stage calculates the embedding value of each node in the neural network using the trainable parameters; in matrix form, these parameters include the trainable matrices \(W^k\) and embeddings \(E^k\) in the k-th layer. In addition, non-linear activation functions such as ReLU and tanh are applied to the results. The calculations in forward-propagation produce the prediction result in the last layer. To train the model, a loss function is applied to this result and its gradients are propagated back to adjust the trainable parameters. Forward-propagation and back-propagation therefore depend on each other, and the recommendation problem using GNNs also follows this rule. However, a recent study [2] has demonstrated the redundancy of the trainable matrices \(W^k\) and non-linear activation functions in recommendation models: the embeddings mapped from user and item IDs carry few features, so using too many trainable parameters makes the model heavy and ineffective.

3 Our Proposed Method

In this section, we propose our method, which includes a data pre-processing procedure and the CombiGCN model. CombiGCN takes the user-item interactions and the weighted similarity matrix as input and produces predictions as recommendations at the output. An overview of the CombiGCN model is illustrated in Fig. 1.

3.1 Pre-processing Data

Algorithm 1. Inference for data pre-processing

Algorithm 1 brings two main benefits to the learning process of the recommendation model: 1) it avoids over-fitting, since the dataset obtained from Algorithm 1 retains the most common features of the original dataset and thus contains little information about the uncommon interactions that cause over-fitting; 2) it removes noisy-negative interactions in implicit feedback [15], because in the first step we determine the set of items \(I_{c}\) and, throughout the subsequent steps, when collecting users we only collect interactions with items contained in \(I_{c}\). This limits unpopular interactions, which are likely to be noisy-negative interactions.
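Since the body of Algorithm 1 is not reproduced here, the following Python sketch is only a plausible reading of the filtering it describes: fix a core item set \(I_c\), then keep only interactions with items in \(I_c\). The popularity threshold `min_item_count` is a hypothetical parameter, not taken from the paper.

```python
# Hypothetical sketch of the pre-processing described above: determine a
# core item set I_c, then collect only interactions with items in I_c.
from collections import Counter

def preprocess(interactions, min_item_count=10):  # threshold is assumed
    """interactions: list of (user, item) pairs from implicit feedback."""
    item_counts = Counter(item for _, item in interactions)
    I_c = {item for item, c in item_counts.items() if c >= min_item_count}
    # Keep only interactions whose item belongs to I_c.
    return [(u, i) for (u, i) in interactions if i in I_c]
```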

3.2 Adjacent Graph and Weighted References Matrix

In this article, we use two graphs as data sources: the user-item interaction graph and the user-user weighted connection graph, denoted by \(G_R\) and \(G_W\) respectively. \(U = [u_1, \cdots , u_n] (|U| = n)\) denotes the user nodes shared by \(G_R\) and \(G_W\), and \(I = [i_1, \cdots , i_m] (|I| = m)\) denotes the item nodes in \(G_R\). \(R \in \mathbb {R}^{n \times m}\) is the binary matrix, with entries 0 and 1, that represents the user-item interactions in \(G_R\).

In WiGCN [4], the matrix \(W_u = RR^T \in \mathbb {R}^{n \times n}\) accumulates the paths connecting two users via shared items. However, \(W_u\) only records the size of the intersection between the item sets \(I_i\) of user \(u_i\) and \(I_j\) of user \(u_j\); it does not capture the influence of the user pair (i, j) relative to the overall interaction data. We build the weighted user matrix W to represent the user-user weighted connection graph from the matrix \(W_u\):

$$\begin{aligned} W = W_u \oslash \left( D_{R}J + (D_{R}J)^T - W_u\right) \end{aligned}$$
(1)

where \(D_R \in \mathbb {R}^{n \times n}\) is a diagonal matrix whose entry \(D_{R_{ii}}\) is the number of neighboring items of user i, \(J \in \mathbb {R}^{n \times n}\) is the all-ones matrix, and \(\oslash \) denotes the element-wise division. From a mathematical perspective, each element of \(W_u\) represents the intersection while the corresponding element of \((D_{R}J + (D_{R}J)^T - W_u)\) represents the union of the item lists of users \(u_i\) and \(u_j\); therefore, Eq. 1 computes the similarity between the pair of users (i, j) based on Jaccard similarity. To avoid over-fitting the model when using both matrices W and R during propagation, we map the values of W to a set of discrete values in the interval [0, 1], where 0.0 represents no correlation between two users and 1.0 represents very high correlation.
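As a sanity check, the following NumPy sketch computes W from a toy binary interaction matrix as in Eq. (1); the final mapping of W to discrete values in [0, 1] is omitted.

```python
# Eq. (1) in code: Jaccard weights between users from the binary matrix R.
import numpy as np

R = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)

W_u = R @ R.T                         # |I_i ∩ I_j| for every user pair
deg = R.sum(axis=1, keepdims=True)    # |I_i|, i.e. the rows of D_R J
union = deg + deg.T - W_u             # |I_i ∪ I_j|
W = np.divide(W_u, union, out=np.zeros_like(W_u), where=union > 0)
# Each W[i, j] lies in [0, 1]; diagonal entries are 1 (self-similarity).
```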

3.3 CombiGCN

The general design of the proposed model is shown in Fig. 1. Our model consists of three components: 1) an embedding layer that uses the unique identifiers of users and items to create embeddings; 2) propagation layers, which propagate the representations of users and items with the LGC architecture; and 3) a prediction layer, which predicts the score of a user-item pair based on the final embeddings obtained after L propagation layers.

Fig. 1. The architecture of the CombiGCN model

Embedding Layer. Following well-known mainstream models [1, 2, 4], we initialize user and item embeddings by mapping each unique ID into the latent space, obtaining dense vectors \(e_u^{l=0}\in {\mathbb {R}^d} (e_i^{l=0}\in {\mathbb {R}^d})\), where l denotes the propagation layer and d the embedding dimension. We denote by \(E^l\in {\mathbb {R}^{(n+m) \times d}}\) the set of all embeddings during propagation, i.e., \(E^l\) contains the n user embeddings and the m item embeddings at the l-th layer.

$$\begin{aligned} E^l=E_U^l \parallel E_I^l=[e_{u_1}^l,\cdots ,e_{u_n}^l,e_{i_1}^l,\cdots ,e_{i_m}^l] \end{aligned}$$
(2)

Propagation Layers. To introduce the embedding propagation process clearly, we first show the propagation in the first layer of the LGC architecture and then give the general formula for higher layers. User Embeddings Propagation. The input of the first layer is the embedding \(E_U^0\); we propagate the user embeddings on the two graphs, the user-item interaction graph \(G_R\) and the user-user weighted connection graph \(G_W\), to obtain two user embeddings \(E_{U_{R}}^1\) and \(E_{U_{W}}^1\).

$$\begin{aligned} E_{U_{R}}^1 = \widetilde{R}E_I^0 ; E_{U_{W}}^1 = \widetilde{W}E_U^0 \end{aligned}$$
(3)

We further define \(\widetilde{R} = D_{R}^{-1/2}RD_{R^T}^{-1/2}\), where \(D_R \in \mathbb {R}^{n \times n}\) is a diagonal matrix whose entry \(D_{R_{ii}}\) is the number of neighboring items of user i, and \(D_{R^T} \in \mathbb {R}^{m \times m}\) is a diagonal matrix whose entry \(D_{R^T_{jj}}\) is the number of neighboring users of item j. Similarly, \(\widetilde{W} = D_{W}^{-1/2}WD_{W}^{-1/2}\) is the symmetrically normalized version of W. We then combine the two user embeddings \(E_{U_{R}}^1\) and \(E_{U_{W}}^1\) into \(E_U^1\).

$$\begin{aligned} E_U^1 = E_{U_{R}}^1 + E_{U_{W}}^1 \end{aligned}$$
(4)

Item Embeddings Propagation. For item embeddings, we propagate with the LGC architecture using only the user-item interaction graph. We also define \(\widetilde{R^T} = D_{R^T}^{-1/2}R^TD_{R}^{-1/2}\).

$$\begin{aligned} E_I^1 = \widetilde{R^T}E_U^0 \end{aligned}$$
(5)
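For reference, a minimal NumPy sketch of these symmetric normalizations, assuming dense matrices (a practical implementation would use sparse ones):

```python
# Build R_tilde = D_R^{-1/2} R D_{R^T}^{-1/2} and W_tilde = D_W^{-1/2} W D_W^{-1/2}.
import numpy as np

def inv_sqrt(d):
    """Element-wise d^{-1/2}, with zero-degree entries left at 0."""
    out = np.zeros_like(d, dtype=float)
    nz = d > 0
    out[nz] = d[nz] ** -0.5
    return out

def sym_norm_bipartite(R):
    return inv_sqrt(R.sum(axis=1))[:, None] * R * inv_sqrt(R.sum(axis=0))[None, :]

def sym_norm(W):
    d_inv = inv_sqrt(W.sum(axis=1))
    return d_inv[:, None] * W * d_inv[None, :]
```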

The General Embedding Propagation Equation. We have presented the first propagation step in the LGC architecture; the subsequent steps proceed in the same way, except that the input is the embeddings of the previous layer rather than \(E_U^0\) and \(E_I^0\). Equation (6) represents the embedding propagation at higher layers.

$$\begin{aligned} {\begin{matrix} E^{l} = (E_{U_{R}}^l + E_{U_{W}}^l) \parallel E_I^l = (\widetilde{R}E_I^{l-1} + \widetilde{W}E_U^{l-1}) \parallel \widetilde{R^T}E_U^{l-1} \end{matrix}} \end{aligned}$$
(6)
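Putting Eqs. (3)-(6) together, one propagation pass might look like the sketch below (reusing the normalized matrices from the previous sketch); it keeps all \(L + 1\) layer outputs, which the prediction step needs.

```python
# Sketch of L propagation layers: user embeddings aggregate messages from
# both graphs (Eq. 4); item embeddings use the interaction graph only (Eq. 5).
def propagate(R_tilde, W_tilde, E_U0, E_I0, L=3):
    layers = [(E_U0, E_I0)]
    E_U, E_I = E_U0, E_I0
    for _ in range(L):
        # Both right-hand sides use the previous layer's embeddings.
        E_U, E_I = R_tilde @ E_I + W_tilde @ E_U, R_tilde.T @ E_U
        layers.append((E_U, E_I))
    return layers
```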

Prediction and Optimization. After L embedding propagation layers we obtain \(L + 1\) embeddings; there are \(L + 1\) rather than L because the initial embedding \(E^0\) is included.

$$\begin{aligned} E^*= \alpha _0E^0 + \alpha _1E^1 + \cdots + \alpha _LE^L \end{aligned}$$
(7)

where \(\alpha _l = 1/(L + 1)\), as set in [2], denotes the importance of the l-th layer embedding in constituting the final embedding. For model prediction, we take the inner product to estimate the user's preference for the target item.

$$\begin{aligned} \hat{y}_{ui}= {e_u^*}^Te_i^* \end{aligned}$$
(8)
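Continuing the sketch above, Eqs. (7) and (8) amount to averaging the \(L + 1\) layer outputs and scoring a user-item pair by an inner product:

```python
# Eq. (7): uniform layer combination with alpha_l = 1/(L+1);
# Eq. (8): preference score as an inner product.
def final_embeddings(layers):
    alpha = 1.0 / len(layers)                     # 1 / (L + 1)
    E_U_star = alpha * sum(E_U for E_U, _ in layers)
    E_I_star = alpha * sum(E_I for _, E_I in layers)
    return E_U_star, E_I_star

def score(E_U_star, E_I_star, u, i):
    return float(E_U_star[u] @ E_I_star[i])       # y_hat_ui = e_u*^T e_i*
```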

To learn the parameters \(\varPhi = [E_U^0, E_I^0]\), CombiGCN applies Bayesian Personalized Ranking (BPR) [9], which assumes that observed interactions reflect higher preference than unobserved ones. To optimize the prediction model, we use mini-batch Adam [10] to minimize the BPR loss.

$$\begin{aligned} Loss_{bpr} = \sum _{\varOmega ^+_{ui}}\sum _{\varOmega ^-_{uj}} -\ln \sigma ( \widehat{y}_{ui} - \widehat{y}_{uj} ) + \lambda \parallel \varPhi \parallel ^2_2 \end{aligned}$$
(9)
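Finally, a hedged NumPy sketch of the BPR loss in Eq. (9) over one mini-batch of (u, i, j) triples, where i is an observed item and j a sampled unobserved one; `reg` corresponds to \(\lambda \), and the Adam update itself is omitted.

```python
# Eq. (9): pairwise BPR loss with L2 regularization over the parameters Phi.
import numpy as np

def bpr_loss(E_U_star, E_I_star, batch, params, reg=1e-5):
    loss = 0.0
    for u, i, j in batch:
        x = E_U_star[u] @ (E_I_star[i] - E_I_star[j])  # y_ui - y_uj
        loss -= np.log(1.0 / (1.0 + np.exp(-x)))       # -ln sigmoid(x)
    loss += reg * sum(np.sum(p ** 2) for p in params)  # lambda * ||Phi||_2^2
    return loss
```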

4 Experiments

4.1 Datasets Description

We conduct experiments with our proposed model on three well-known datasets: Ciao, Epinions, and Foursquare. Each dataset is pre-processed and divided into two sets: 80% for training and 20% for testing.

  • Ciao [11, 12]: The Ciao dataset is an online shopping dataset containing ratings given by users on a large number of items.

  • Epinions [11, 12]: Epinions is a popular online consumer review website.

  • Foursquare [13, 14]: The Foursquare dataset records check-in data for different cities around the world.

4.2 Experimental Settings

Setting Parameters. To ensure that the experimental results are fair, we set the same parameters across all models. Specifically, the learning rate is 0.001, the coefficient of L2 normalization is 0.00001, and the number of LGC layers is 3, with an embedding size of 64 in each layer. We also use the same early-stopping strategy as NGCF and LightGCN: if recall@20 on the test set does not increase for 50 consecutive epochs, training is stopped.

Baseline. We use the same datasets and repeat the experiments on all of the following baseline models for comparison:

  • BPR-MF [9] is matrix factorization optimized by the Bayesian personalized ranking (BPR) loss, which exploits the user-item direct interactions only as the target value of interaction function.

  • GCMC [8] adopts a GCN encoder to generate the representations for users and items, where only first-order neighbors are considered. Hence it uses one graph convolution layer, with the hidden dimension set to the embedding size.

  • WiGCN [4] is developed on top of NGCF and adds a connection between each user-user and item-item pair in the interaction graph, weighted by the number of shared items or users.

  • NGCF [1] conducts several iterations of propagation on the embeddings. The stacked output embeddings contain the high-order connectivity of the interaction graph, so the collaborative signal is encoded into the latent vectors, making the model more effective.

  • LightGCN [2] focuses on the neighborhood aggregation component of collaborative filtering. This model uses linear propagation to learn user and item embeddings on the interaction graph.

4.3 Experiment Results

Table 1. Overall Performance Comparisons

The overall performance comparison is shown in Table 1. The results clearly show that our model consistently achieves the best performance on all three metrics and all three datasets. Further, MF performance is much inferior to that of the GNN-based models because it cannot capture collaborative signals. Although GCMC uses GCN, it only captures neighborhood information in the first layer, so it is less effective than NGCF and WiGCN. WiGCN is more accurate than NGCF because it introduces information about the weights of users and items during embedding propagation, which makes it more efficient at capturing collaborative signals. LightGCN is an LGC-based model that removes components shown to negatively affect training, so its results are very good, second only to those of CombiGCN.

5 Conclusion

In this work, we attempted to improve embedding quality by adding connection weights between users based on their interaction history. To do that, we introduced the CombiGCN model, which propagates embeddings on two graphs, a user-item interaction graph and a user-user weighted connection graph, based on the Light Graph Convolution architecture. The key to CombiGCN lies in its ability to combine embeddings across multiple graphs in a simple and effective way. We provide three datasets preprocessed with our algorithm to reduce cold-start, over-fitting, and data-noise problems for the evaluation experiments. The results of the experiments against state-of-the-art models are a valuable demonstration of the success of the weight addition and the multi-graph combination architecture of CombiGCN.