1 Introduction

Recommender systems offer an effective way of delivering personalized information, products, and services to users, and they have proven successful in various domains such as online entertainment and e-commerce [29]. However, users may not trust a recommender system when its recommendations are inaccurate. For example, a user may not trust a stranger’s preferences even when the two have similar rating records. Another example is that the system may recommend an item that malicious users have deliberately rated highly.

One traditional solution to the above issues is leveraging external trust relationships, which is often called trust-aware recommendation [19]. Related research divides into memory-based and model-based methods. The former mainly employ memory-based collaborative filtering: they search the trust network to obtain neighbors and then make recommendations based on those trusted neighbors [17]. For example, Jamali and Ester [12] combine TrustWalker [11] with neighborhood collaborative filtering. They first use random walks to get the user representation from the trust network and then apply a probabilistic strategy to select items for recommendation. Similarly, Zhang et al. [31] retrieve user trust information from user feedback and infer user preferences from the top-k identified friends. Model-based methods mainly apply model-based collaborative filtering methods, such as matrix factorization [7, 28], for recommendation. For example, Zhao et al. [32] incorporate social trust information into a Bayesian Personalized Ranking approach. They assume that user preferences are affected by friends, i.e., a user will also give high ratings to items preferred by their friends. Guo et al. [7] integrate social trust information using an SVD++ [14] based method. Both memory-based and model-based trust-aware recommendation methods improve model performance by leveraging explicit or implicit relationships among users. However, they may fail to consider the reliability of ratings in determining the trustworthiness of recommender systems.

Another direction toward trust-aware recommendation is to design a robust recommender system that resists biased or randomized ratings provided by users in a real-world context. One approach is to insert man-made noise into the input, forcing the system to learn robust parameters and thereby improving the model’s ability to resist noise. One example is the denoising auto-encoder (DAE) [3], which corrupts the inputs with man-made noise. The work in [27] uses a collaborative denoising auto-encoder (CDAE), which shares similar ideas with DAE. The inputs (ratings) are corrupted by Gaussian noise and then fed into the neural network via an encoder to get a dense representation. The decoder maps the dense representation back to the user-item interactions for recommendation. Instead of man-made noise, some work adds adversarial noise to the model. The majority of this type of work focuses on introducing noise into model configurations to improve the robustness of the model parameters. For example, He et al. [9] introduced an additional objective function into the traditional Bayesian Personalized Ranking approach to quantify the loss of a model under perturbations of its parameters. In detail, adversarial noise is added to the model parameters, and the recommender model is updated by considering both the training loss and the adversarial loss, minimizing the training loss while maximizing the adversarial loss. Yuan et al. [30] mixed adversarial noise with model parameters and latent user representations to improve the robustness of the model. Their training strategy includes two learning steps: first, obtain optimal parameters by a training step; and second, minimize the recommendation loss while maximizing the adversarial noise loss. As with trust relationship-aware recommendation approaches, a limitation of the above noise-based methods is that the model cannot defend against real-world noise such as fake ratings.

To the best of our knowledge, few studies have focused on the robustness issue caused by user misbehaviors in rating. In this regard, we embrace the advantages of adversarial training in simulating biased or malicious ratings and propose reinforced trust-aware recommendation to harvest the benefits of both social information and the denoising approach. Our method consists of a predictor that infers the ratings and a discriminator that enforces cohort rating patterns on the predicted ratings. In a nutshell, we make the following contributions:

  • We propose a rating predictor based on an encoder-decoder structure to learn latent information about user rating patterns and user social trust networks. The user social trust embedding, learned by an attentive graph neural network, can balance the contributions of user neighbors. The predictor differs from previous studies in considering not only users’ trust relationships but also rating quality.

  • We introduce a discriminator to learn transferable patterns in rating behaviors while eliminating user-specific bias, thereby enforcing consistent rating patterns among different users to lift the robustness of the model.

  • We have tested the proposed model on three real-world datasets to show its competitive performance against several baselines. We provide detailed parameter studies and model discussions.

We review the related work on social-aware recommendation and the robustness of recommender systems in Sect. 2. We then present our proposed method in Sect. 3 and show the model performance in Sect. 4. The conclusion and future work are discussed in Sect. 5.

2 Related work

Traditional recommendation techniques for dealing with the trust issue include exploiting social relations or adding randomly generated noise to the model configurations to improve the robustness of the recommender system [2].

Social-aware recommendation approaches utilize the user trust network to complement the sparse rating data. This improves the recommendation performance by considering two sources of rating information: the original user preference and the preferences of trusted users. Traditional social-aware recommendation includes memory-based methods and model-based methods. The former mainly propose trust propagation methods that leverage the ratings of friends to deduce the ratings of a target user [26]. The work in [18] is one of the first to leverage social relationships. The idea is to replace the role of collaborative filtering with the trust network. Specifically, the model propagates trust information over the social trust network to estimate the weight of each trust link, which can be used in place of the user similarity weight. Jamali and Ester [12] combine TrustWalker [11] with neighborhood collaborative filtering. They first use random walks to get the user representation from the trust network and then apply a probabilistic strategy to select items for recommendation. Zhang et al. [31] retrieve user trust information from user feedback and infer user preferences from the top-k identified friends. Model-based methods largely depend on matrix factorization. The social relations are generally used to form the user representation. For example, Wen et al. [25] use graph embedding approaches to learn the user social trust representation and then combine the trust representation with user ratings as the input of matrix factorization. Guo et al. [7] integrate social trust information using an SVD++ [14] based method. Ahn et al. [1] have quantified by how much social network information can reduce sample complexity, which provides theoretical support for integrating social trust information. Zhao et al. [32] incorporate social trust information into a Bayesian Personalized Ranking approach. They assume that user preferences are affected by friends, i.e., a user will also give high ratings to items preferred by their friends.

Another direction is designing robust recommender systems. A general way is to introduce noise into the system configurations to improve system performance. By doing so, the model is forced to learn robust parameters, which improves its denoising capability. Traditional methods introduce human-made noise. For example, in the collaborative denoising auto-encoder [27], the input data are corrupted by Gaussian noise before being fed to the neural network. The decoder maps the dense representation back to the user-item interactions for recommendation. Wang et al. [23] integrate both recurrent neural networks (RNNs) [15] and denoising autoencoders for recommendation. The RNNs are used to extract information from the item textual descriptions. The whole model has an autoencoder structure, where the RNNs are used as encoder and decoder layers. The proposed recurrent autoencoder can learn both rating information and sequential information (e.g., textual information) to get the dense representation. Strub et al. [21] corrupt the inputs using stacked denoising autoencoders [22]. They also consider side information, e.g., user profiles and item profiles, to enhance the robustness of the model. In later research, some works leverage adversarial noise to improve the robustness of the model. Wang et al. [24] propose a generative adversarial model [6] that consists of a generator and a discriminator for recommendation. The generator (predictor) acts as an attacker that tries to cheat the discriminator by capturing the rating patterns of users and generating ratings with similar patterns; the discriminator aims to distinguish the generated samples from the real ratings. The two models are updated step by step by competing with each other in a minimax game until the generator (predictor) provides good and stable rating predictions. He et al. [9] introduced an additional objective function into the traditional Bayesian Personalized Ranking approach to quantify the loss of a model under perturbations of its parameters. In detail, adversarial noise is added to the model parameters, and the recommender model is updated by considering both the training loss and the adversarial loss, minimizing the training loss while maximizing the adversarial loss. Yuan et al. [30] mixed adversarial noise with model parameters and latent user representations to improve the robustness of the model. Their training strategy includes two learning steps: first, obtain optimal parameters by a training step; and second, minimize the recommendation loss while maximizing the adversarial noise loss.

However, the above two directions of trust-aware recommender systems do not consider the reliability of the ratings, i.e., the existence of biased, randomized, or malicious ratings provided by users. The social-aware approaches mostly do not consider robustness issues, and the denoising approaches mainly focus on parameter robustness. In this paper, we combine the advantages of social-aware recommender systems and robust recommendation by reinforcing cohort rating patterns.

3 Methodology

3.1 Overview

In this work, we consider the rating prediction problem in recommender systems. Our target is to predict users’ ratings on new items based on the user-item rating interactions and social trust relationships. Let \(R\in \mathbb {R}^{m\times n}\) denote the user-item rating matrix, where each entry \(r_{u,i}\) represents the rating of user u on item i; m and n are the numbers of users and items, respectively. We use \(I_{u}\) to represent the set of items rated by user u and \(r_{u}\) to represent the corresponding ratings. The social network can be represented by a graph \(G=(V, E)\), where V is a set of m nodes (users), and E denotes directed trust relations among users. We use T to describe the weights of E, where \(t_{u,v}\in T\) indicates the trust degree between u and v. The set of users trusted by user u is represented by \(V_{u}\), i.e., \(\{t_{u,v}=1| v \in V_{u}\}\). The ratings from the trusted users are denoted as \(\{r_{v}| v\in V_{u}\}\). Thus the recommender model tries to predict \(r_{u,i}\) for new items by \(r_{u,i} \leftarrow (r_{u}, r_{V_{u}})\). Figure 1 illustrates the structure of the proposed model, in which the recommender model works as the predictor and a discriminator enforces the cohort rating patterns in the predicted ratings. The predictor first learns the latent representations of users’ ratings and trust relations and then combines them into a shared hidden layer that contains users’ latent patterns. It also acts as a generator to simulate rating patterns of real users. The rating pattern embedding is learned by neural networks, i.e., \(H_{r}\leftarrow r_{u}\), while the social trust embedding is learned by attentive graph neural networks [5], i.e., \(H_{t} \leftarrow \{r_{v}|v \in V_{u}\}\). The discriminator determines whether the predicted ratings \(\{\hat{r}_{u,i} | i \in I_{u}\}\) follow the same cohort patterns as the meta-information \(\{r_{u,i}| i \in I_{u}\}\), thereby providing accurate and confident rating prediction. It also detects abnormal rating patterns to improve the robustness of the model. We provide more details about the proposed model in the following subsections.

Fig. 1 Structure of our proposed end-to-end model. The predictor predicts users’ ratings based on their previous ratings and trust relationships. The discriminator enforces consistent predictions regardless of individuals’ behavioral differences in rating

3.2 Rating prediction with correlative trust relationship fusion

An autoencoder is an unsupervised model that reconstructs its inputs in the output layer and has been used in many recommendation tasks [20]. The encoder-decoder structure helps to learn the latent preferences of users from the user-item interactions and to provide predictions based on these latent preferences. In this work, we integrate trust information into the layers to conduct comprehensive recommendations. We first learn a shared latent representation from two types of sparse information, users’ previous ratings and ratings from trusted users (i.e., dual autoencoders), and then predict ratings based on that representation.

3.2.1 Embedding learning

Here we learn two types of sparse information to get the latent representation, i.e., social trust embedding and rating pattern embedding.

The meta representation for the user rating pattern is simply represented by \(r_{u}\). To infer the rating pattern embedding, the encoder layer maps the inputs into a low-dimensional space by neural networks. The simplest case is using fully connected layers:

$$\begin{aligned} H_{r} = \sigma (W_{e}^{\top }r_{u}+b_{e}^{r}) \end{aligned}$$
(1)

where \(W_{e}\) is the weight in encoder layers, \(b^{r}_{e}\) is the bias term, and \(\sigma \) is the activation function. The encoder layer could also be in other forms, such as convolutional neural network [15], according to the learning tasks.

The meta representation of the social trust relationships is learned from the rating patterns of the trusted users. Given a set of trusted users \(V_{u}\), i.e., \(t_{u,v}=1\) for \(v \in V_{u}\), where each user has a rating record \(r_{v}\), we employ an attentive graph neural network [5] for learning the social trust meta representation \(s_{u}\):

$$\begin{aligned} s_{u} = \sigma (W_{s}^{\top } \varSigma _{v\in V_{u}} \alpha _{v}r_{v}+b_{s}) \end{aligned}$$
(2)

where \(\alpha _{v}\) represents the contribution of user v, which can be regarded as an attention value; \(\sigma \) stands for the activation function; \(W_{s}\) and \(b_{s}\) are weights and biases. We hereafter omit the explanation of similar notations for weights and biases for simplicity. Intuitively, the neighbors of user u do not contribute equally to the user’s social trust representation; thus, we utilize the attention mechanism, i.e., the attention values \(\alpha _{v}\) in Eq. 2, to balance the social influences. Assuming that user u has stronger connections with neighbors who have similar tastes, we learn the attention value for each trusted user as follows:

$$\begin{aligned} \beta _{v}&= \sigma (W_{a}^{\top }f(r_{u},r_{v})+b_{a}) \end{aligned}$$
(3)
$$\begin{aligned} \alpha _{v}&= \frac{\exp {\beta _{v}^{\top }w_{v}}}{\varSigma _{v\in V_{u}}\exp {\beta _{v}^{\top }w_{v}}} \end{aligned}$$
(4)

where \(f(r_{u},r_{v})\) is the correlation function representing the correlative rating patterns between user u and trusted user v; \(w_{v}\) is a randomly initialized vector that captures the correlative latent patterns. The correlation function evaluates the rating pattern similarity between the user and the trusted users. It can be in different forms. For example, the correlation function can be the concatenation or the difference of two rating lists. We will discuss the model performance on different correlation functions in the further experiments. Now we have the social trust meta representation \(s_{u}\). Similarly, the encoder layer will map the meta representation \(s_{u}\) into a low-dimensional space by neural networks:

$$\begin{aligned} H_{t} = \sigma (V_{e}^{\top }s_{u}+b_{e}^{s}) \end{aligned}$$
(5)

where \(V_{e}\) stands for the weights.
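As an illustration of Eqs. 2 to 5, the following NumPy sketch computes the attention values and the social trust embedding for one user. It is only a minimal sketch under stated assumptions: the correlation function f is taken to be the difference of the two rating lists (one of the options mentioned above), each encoder is a single fully connected layer with a sigmoid activation, and all names and dimensions (`trust_embedding`, `d_a`, `d_s`, `d_h`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def trust_embedding(r_u, R_v, W_a, b_a, w, W_s, b_s, V_e, b_e):
    """Attention-weighted aggregation of trusted users' ratings (Eqs. 2-5).

    r_u: (n,) ratings of the target user.
    R_v: (k, n) ratings of the k trusted users.
    w:   (k, d_a) one randomly initialized vector w_v per trusted user.
    """
    # Assumed correlation function f(r_u, r_v): difference of rating lists.
    f = r_u[None, :] - R_v                           # (k, n)
    beta = sigmoid(f @ W_a + b_a)                    # Eq. 3, (k, d_a)
    scores = np.sum(beta * w, axis=1)                # beta_v^T w_v, (k,)
    scores = scores - scores.max()                   # numerical stability only
    alpha = np.exp(scores) / np.exp(scores).sum()    # Eq. 4, softmax over V_u
    s_u = sigmoid((alpha @ R_v) @ W_s + b_s)         # Eq. 2, (d_s,)
    H_t = sigmoid(s_u @ V_e + b_e)                   # Eq. 5, (d_h,)
    return alpha, H_t

# Toy example with random parameters (hypothetical sizes).
rng = np.random.default_rng(0)
n, k, d_a, d_s, d_h = 6, 3, 4, 5, 2
r_u = rng.random(n)
R_v = rng.random((k, n))
alpha, H_t = trust_embedding(
    r_u, R_v,
    rng.normal(size=(n, d_a)), np.zeros(d_a), rng.normal(size=(k, d_a)),
    rng.normal(size=(n, d_s)), np.zeros(d_s),
    rng.normal(size=(d_s, d_h)), np.zeros(d_h),
)
```

The attention values are non-negative and sum to one over the trusted users, so \(s_{u}\) is built from a convex combination of the neighbors’ rating vectors.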

The two encoders work simultaneously. Given the user’s rating history \(r_{u}\) and the information of the trusted users, we obtain the social trust embedding \(H_{t}\) and the rating pattern embedding \(H_{r}\) through several encoder layers.

3.2.2 Rating prediction

After several encoder layers, we get more concise representations of the rating records as \(H_{r}\) and of the trust information as \(H_{t}\).

To integrate the two sources of information, we sum up the latent representations \(H_{r}\) and \(H_{t}\) with weights to form a shared latent representation:

$$\begin{aligned} H = \gamma \cdot H_{r} + (1-\gamma ) \cdot H_{t} \end{aligned}$$
(6)

where \(\gamma \in [0,1]\) is the parameter that controls the contribution of the rating information to the shared latent representation relative to the social trust information. Another way to combine the two sources of information is concatenating the latent representations, i.e., \(H = [H_{r}, H_{t}]\). Our experimental results show that summing the two representations performs better than concatenating them. We discuss the performance of both ways later in the ablation studies.
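The two fusion options can be sketched as follows. This is a small illustration rather than the actual implementation; the function name `fuse` and the `mode` argument are hypothetical, and the default \(\gamma = 0.7\) is the default value reported in the experimental settings.

```python
import numpy as np

def fuse(H_r, H_t, gamma=0.7, mode="sum"):
    """Combine rating and trust embeddings into a shared representation."""
    if mode == "sum":
        # Eq. 6: weighted sum, requires H_r and H_t to have the same shape.
        return gamma * H_r + (1.0 - gamma) * H_t
    if mode == "concat":
        # Alternative: H = [H_r, H_t]; the result has twice the dimension.
        return np.concatenate([H_r, H_t], axis=-1)
    raise ValueError(f"unknown mode: {mode}")

H_r = np.array([1.0, 0.0])
H_t = np.array([0.0, 1.0])
H_sum = fuse(H_r, H_t)                  # weighted sum with gamma = 0.7
H_cat = fuse(H_r, H_t, mode="concat")   # concatenation
```

With \(\gamma = 1\) the shared representation reduces to \(H_{r}\), and with \(\gamma = 0\) to \(H_{t}\), which correspond to the two single-source cases.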

Differing from the encoder, the decoder aims to explain or expand the concise latent representation. Given the concentrated information about a user’s preferences embedded in the shared latent representation, we obtain the predicted ratings \(\hat{r}_{u}\) by decoding it into a list of ratings and trust relationships:

$$\begin{aligned} \hat{r}_{u} = \sigma (W_{d}^{\top }H+b_{d}). \end{aligned}$$
(7)

The performance of the recommendation can be evaluated by the loss between the original inputs and the predictions, i.e., \(\ell (r,\hat{r})\) and \(\ell (t,\hat{t})\), where \(\ell \) is the loss function.
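A minimal sketch of the decoding step in Eq. 7 and of the loss \(\ell (r,\hat{r})\) is given below. It assumes a single fully connected decoder layer with a sigmoid activation, ratings rescaled to [0, 1] to match the sigmoid output, and mean squared error as the loss; the function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode(H, W_d, b_d):
    """Eq. 7: map the shared representation H (d,) back to n item ratings."""
    return sigmoid(H @ W_d + b_d)

def reconstruction_loss(r, r_hat):
    """Mean squared error as one possible choice of the loss l(r, r_hat)."""
    return float(np.mean((r - r_hat) ** 2))

# Toy example: with all-zero parameters every prediction is sigmoid(0) = 0.5.
H = np.zeros(3)
W_d = np.zeros((3, 4))
b_d = np.zeros(4)
r = np.array([1.0, 0.5, 0.0, 0.5])   # ratings assumed rescaled to [0, 1]
r_hat = decode(H, W_d, b_d)
loss = reconstruction_loss(r, r_hat)
```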

3.3 Cohort rating patterns enforcement

The predictor works well alone after training but may overlook noise in the input, which is caused by the possibly diverse rating distributions of abnormal users in a real-world context. We design a discriminator to distinguish the generated ratings \(\hat{r}\) from real ratings and train the model until the discriminator cannot classify them accurately [6]. This way, we can enforce cohort rating patterns on the generated ratings to reduce the adverse impact of users’ biases, misbehaviors, and low-quality ratings. We use a multilayer perceptron as the classifier to classify any type of rating input (\(r_{u}\) or \(\hat{r}_{u}\)), say \(r_{*}\):

$$\begin{aligned} \hat{y} = D(r_{*})= \text {softmax}(\sigma (W_{c}^{\top }r_{*}+b_{c})) \end{aligned}$$
(8)

We train the classifier in two steps: discriminating and generating. In the first step, a discriminator aims to output \(y=0\) for any generated rating \(\hat{r}\) and \(y=1\) for real ratings r, by minimizing

$$\begin{aligned} L_{D} = E_{r_{*}\in \{r,\hat{r}\}}[\ell (y,\hat{y})] + \lambda ||\varTheta ||_{1} \end{aligned}$$
(9)

via gradient descent, where \(E_{r_{*}\in \{r,\hat{r}\}}[\ell (y,\hat{y})]\) is the mean prediction loss over the generated ratings and the real ratings, \(\ell \) is the cross-entropy loss function, \(\varTheta \) represents the model parameters, \(\lambda \) is a hyper-parameter, and \(||\varTheta ||_{1}\) is the regularization term to avoid over-fitting, for which we use the absolute-value norm. Since the generated ratings \(\hat{r}\) are not as sparse as the real data (the real data are sparse due to the limited user-item rating records), we multiply them with a mask vector before feeding them into the discriminator, where the i-th element is zero if the user has not provided a rating for item i.

The generating step trains the predictor to cheat the discriminator, i.e., to make the discriminator output \(y=1\) for the generated ratings (\(\hat{r}\)), by minimizing the gap between the predicted labels of the generated ratings \(\hat{r}\) and \(y=1\), in order to learn transferable rating patterns. The loss for the generating step is:

$$\begin{aligned} L_{G} = E_{r_{*}\in \hat{r}}[\ell (1,D(r_{*}))] + \lambda ||\varTheta ||_{1} \end{aligned}$$
(10)

The whole process iterates until the discriminator cannot predict the generated ratings correctly.
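The discriminating and generating losses (Eqs. 9 and 10) can be illustrated with the following NumPy sketch. Several points are assumptions made only for this example: the discriminator is reduced to the single layer of Eq. 8 with two output classes, a zero entry in the real rating vector marks an unrated item when building the mask, \(\varTheta \) is taken to be the discriminator parameters in \(L_{D}\) and the predictor parameters in \(L_{G}\), and all function names are hypothetical.

```python
import numpy as np

def discriminator(r, W_c, b_c):
    """Eq. 8: sigmoid layer followed by a softmax over {generated, real}."""
    h = 1.0 / (1.0 + np.exp(-(r @ W_c + b_c)))          # (batch, 2)
    e = np.exp(h - h.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def l1(params):
    """Absolute-value norm summed over a list of parameter arrays."""
    return float(sum(np.abs(p).sum() for p in params))

def discriminator_loss(r_real, r_gen, mask, W_c, b_c, lam=0.001):
    """Eq. 9: cross entropy with label 1 for real and 0 for generated ratings."""
    p_real = discriminator(r_real, W_c, b_c)
    p_gen = discriminator(r_gen * mask, W_c, b_c)        # mask unrated items
    p_true = np.concatenate([p_real[:, 1], p_gen[:, 0]])
    return float(-np.log(p_true + 1e-12).mean()) + lam * l1([W_c, b_c])

def generator_loss(r_gen, mask, W_c, b_c, predictor_params, lam=0.001):
    """Eq. 10: push the discriminator output toward y = 1 for generated ratings."""
    p_gen = discriminator(r_gen * mask, W_c, b_c)
    return float(-np.log(p_gen[:, 1] + 1e-12).mean()) + lam * l1(predictor_params)

# Toy example: an untrained (all-zero) discriminator outputs 0.5 for both classes.
r_real = np.array([[1.0, 0.0, 0.5]])
mask = (r_real > 0).astype(float)       # assumption: 0 means "not rated"
r_gen = np.array([[0.9, 0.4, 0.6]])
W_c, b_c = np.zeros((3, 2)), np.zeros(2)
L_D = discriminator_loss(r_real, r_gen, mask, W_c, b_c)
L_G = generator_loss(r_gen, mask, W_c, b_c, predictor_params=[np.ones(4)])
```

In this toy setting both cross-entropy terms equal \(\ln 2\), which is the value reached when the discriminator can no longer tell real from generated ratings.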

Before training the discriminator, we train the predictor, i.e., our recommender model by minimizing the rating prediction loss (i.e., mean squared loss \(\ell \)) until convergence to train an accurate and robust model:

$$\begin{aligned} L_{R} = \ell (r, \hat{r}) + \lambda ||\varTheta ||_{1} \end{aligned}$$
(11)

The overall training process of our method is described in Algorithm 1, where the model is updated for \(u\in U^{Train}\). The pseudocode of the testing phase is shown in Algorithm 2. In the actual experimental settings, we update the model with batches of users. Specifically, we update the recommender model and the discriminator asynchronously. The target of our method is to construct an accurate and robust model; the cohort rating pattern enforcement helps the model produce reliable rating predictions. Thus, we update the recommender model with every batch of training users, and we update the discriminator once every few batches of training users.
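The asynchronous update schedule described above can be sketched as a simple loop. This only illustrates the ordering of updates and is not a reproduction of Algorithm 1; the interval `disc_every` is a hypothetical parameter, since the text only states that the discriminator is updated every few batches.

```python
def update_schedule(num_batches, disc_every=5):
    """Return the ordered list of (component, batch index) updates in one epoch."""
    steps = []
    for b in range(1, num_batches + 1):
        # The recommender model (predictor) is updated with every batch.
        steps.append(("predictor", b))
        # The discriminator is updated only once every `disc_every` batches.
        if b % disc_every == 0:
            steps.append(("discriminator", b))
    return steps

schedule = update_schedule(num_batches=4, disc_every=2)
```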

Algorithm 1 Overall training process

Algorithm 2 Testing phase

4 Experiments

4.1 Datasets

We evaluate the proposed model on three real-world datasets: FilmTrust, Epinions, and Ciao. FilmTrust is a small dataset that consists of 35,497 ratings of 2071 items from 1508 users, and 1853 trust links. The latter two datasets contain over 100 thousand items from thousands of users. For Epinions, which is a product review dataset, there are 469,126 ratings from 37,701 users on 19,627 items, and about 487,000 trust relationships among users. Ciao consists of 137,187 ratings from 7237 users for 8819 products, and there are 111,781 trust links.

4.2 Model setups

Data preprocessing. First, we filter out the missing values of the dataset. Second, we preprocess the two larger datasets to make them applicable to our method. Epinions and Ciao are two large datasets that contain over 100 thousand items. Our approach is based on the user-item interactions, i.e., the rating records of each user are represented by \(r_{u}\in \mathbb {R}^{n}\), where n is the number of items. The computation cost of our method is high when n is large; besides, using fully connected layers to encode the inputs aggravates the situation. There are two ways to alleviate the computation cost: decreasing the number of items or using a less complicated model structure (fewer parameters). Thus, we remove the items that have fewer than 10 rating records, and we use convolutional layers to encode the inputs.
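The item filtering step can be sketched as follows, assuming the ratings are stored as (user, item, rating) triples; the function name `filter_rare_items` and the data layout are hypothetical and chosen only for illustration.

```python
from collections import Counter

def filter_rare_items(ratings, min_ratings=10):
    """Keep only ratings of items that have at least `min_ratings` records.

    ratings: list of (user, item, rating) triples.
    """
    counts = Counter(item for _, item, _ in ratings)
    return [t for t in ratings if counts[t[1]] >= min_ratings]

# Toy example with a lower threshold: item "b" has a single rating and is dropped.
toy = [("u1", "a", 4), ("u2", "a", 3), ("u3", "a", 5), ("u1", "b", 2)]
kept = filter_rare_items(toy, min_ratings=2)
```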


Experimental settings. Our code is implemented with TensorFlow in Python 3.7 and runs on a Linux server with an NVIDIA TITAN X. The processed datasets take about 60 MB of disk space. The default activation function is the sigmoid function [8]. We have parameters for the model set-up, the encoder-decoder structure, and the hyperparameters. By default, we use 90% of each dataset for training and the rest for testing; the batch size is about 1/10 of the dataset; we use two encoder layers for encoding the inputs and two decoder layers for decoding the latent representation; the hyperparameter \(\gamma \), which controls the contribution of the rating pattern embedding, is set to 0.7; the hyperparameter \(\lambda \), which is the coefficient of the regularization term, is set to 0.001; the learning rate is 0.001.

Fig. 2 Sensitivity to parameter settings

Fig. 3 Model performance during the training process

4.3 Parameter studies

We have three types of parameters for setting up the model: the data set-up parameters, the encoder-decoder structure settings, and the hyperparameters. In this section, we study the performance of our proposed model under different settings on the FilmTrust dataset. We show the results under different settings and at different learning epochs. The results are reported under two evaluation metrics: Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).

  • MAE: \(\frac{1}{m}\varSigma ^{m}_{u=1} \frac{1}{n}\varSigma ^{n}_{i=1}|r_{u,i}-\hat{r}_{u,i}|\)

  • RMSE: \(\sqrt{\frac{1}{m}\varSigma ^{m}_{u=1} \frac{1}{n}\varSigma ^{n}_{i=1}(r_{u,i}-\hat{r}_{u,i})^{2}}\)

A lower value indicates better model performance.
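The two metrics can be computed directly from the definitions above. The sketch below follows the formulas as written, i.e., it averages over all m × n entries of the rating matrices; restricting the average to observed test ratings would be a variation that is not shown here. The function name `mae_rmse` is hypothetical.

```python
import math

def mae_rmse(R, R_hat):
    """Compute MAE and RMSE over all entries of two m x n rating matrices."""
    m, n = len(R), len(R[0])
    abs_sum = 0.0
    sq_sum = 0.0
    for row, row_hat in zip(R, R_hat):
        for r, r_hat in zip(row, row_hat):
            abs_sum += abs(r - r_hat)
            sq_sum += (r - r_hat) ** 2
    return abs_sum / (m * n), math.sqrt(sq_sum / (m * n))

# Toy example: the absolute errors are 1, 0, 0, and 2.
R = [[4, 2], [3, 5]]
R_hat = [[3, 2], [3, 3]]
mae, rmse = mae_rmse(R, R_hat)   # mae = 0.75, rmse = sqrt(1.25)
```

Because RMSE squares the errors before averaging, it penalizes the single large error in this example more heavily than MAE does.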


Data set-up parameters include the train-test split ratio and the batch size. By default, 0.9 of each dataset is used for training, and the batch size is about 1/10 of the training dataset. Figure 2a–b shows the overall model performance under different settings. The results suggest that a larger training set improves the model performance, and that a moderate batch size, rather than an extreme setting (e.g., 1/100 or 1/5), delivers the best performance. Figure 3a–b shows the model performance during the training process. We can see that a larger training set also improves the stability of the model, and the model performs best when the training ratio is 0.95. According to Fig. 3b, the model performance fluctuates with a small batch size and converges slowly with a large batch size.


Encoder-decoder structure setting. We compare the models under different settings regarding the number of encoder/decoder layers (1, 2, 3) and the number of neural nodes (1/20 to 1/2 of the dimension of inputs) in the hidden layers. We show results of models with one, two, and three layers, and we use ‘+’ and ‘-’ to indicate higher (e.g., 1/10 to 1/20 of the dimension of inputs) or lower dimensions (e.g., 1/2 to 1/10 of the dimension of inputs) of layer nodes. Our experimental results (Fig. 2c) reveal that adding more layers to the encoder or the decoder delivers better performance, due to the sparsity and high dimensionality of the datasets. The two-layer structure delivers results very similar to those of the three-layer structure, though the performance fluctuates slightly for the three-layer structure under high dimensionality. Smaller dimensions of layer nodes generally result in better performance, given the same number of layers (except for one layer). Figure 3c also suggests that it is hard for the model to learn effective patterns with only one neural layer, and that a three-layer encoder with lower dimensionality provides the most stable predictions.


Hyperparameters. Figure 2d–f shows the performance over the hyperparameters (\(\gamma \), \(\lambda \)) and the learning rate. \(\gamma \) controls the weight of user rating patterns relative to the user social trust embedding. Our experiment on \(\gamma \) reveals that bias in user preference may lead to better performance of our model. Besides, using only one source of information (ratings or trust relations) delivers inferior results, indicating that there exist hidden relationships among users, rating behaviors, and trust relationships. According to Fig. 3d, we can also observe that using only one source of information aggravates the over-fitting issue. Thus, a moderate value of \(\gamma \) provides better and more stable performance for the recommendation. \(\lambda \) is the regularization coefficient. According to both Figs. 2e and 3e, a small value of \(\lambda \) (between 0.00001 and 0.0001) provides the best and most reliable results, while a larger value (e.g., over 0.001) or a near-zero value leads to poor model performance. As for the learning rate, it is reasonable to set it to a moderate value, because large values tend to make convergence difficult, while smaller values may slow down the learning. Here, the value 0.001 provides the best performance.

Table 1 Comparison results

4.4 Comparison results

We compare the proposed model with several baseline algorithms, including TrustMF [28], SoReg [16], and SocialMF [13]. These methods are based on matrix factorization and incorporate social information into the user embedding. Besides, considering the popularity of deep learning in recent recommendation research, we also compare against three recent deep learning-based methods: NeuMF [10], DeepSoR [4], and GraphRec [5].

  • TrustMF: constructs a trust network and maps the users into a truster space and a trustee space. Each user has feature vectors in the trust network, and the representation of each user is affected by the trusted users. Collaborative filtering is then used for recommendation.

  • SoReg: employs social trust networks to get regularization terms for controlling the matrix factorization objective function.

  • SocialMF: also builds on matrix factorization and incorporates user social information into the user representation. The prediction is based on the user representation and the item representation.

  • NeuMF: is a matrix factorization model with neural network architecture.

  • DeepSoR: forms the user representation from social networks and uses probabilistic matrix factorization for rating prediction.

  • GraphRec: models two graphs, i.e., the user-user social graph and the user-item graph, with graph neural networks; the rating prediction is based on the concatenation of item representation and user representation.

The above comparison methods all consider social relationships for recommendation, and most of them are based on matrix factorization algorithms. The GraphRec method is similar to our work in that it uses graph neural networks to infer the user trust embedding from the social relationships, but it ignores the reliability of the user ratings. We reuse the default parameters or the reported results from the original papers for comparison. Table 1 shows the experimental results, where the last row gives the performance of our model. Both matrix factorization-based methods and deep learning-based methods use users’ latent preferences for recommendation, albeit in different ways. The two groups of methods achieve similar performance on the small FilmTrust dataset. Deep learning-based methods perform better on the Ciao and Epinions datasets, which are much larger than FilmTrust. This can be attributed to the stronger capability of deep neural networks in capturing complex relationships among inputs. Owing to its superior ability to handle complex graph structures, GraphRec performs better than our model on the Ciao dataset, which has a higher density of social links and user-item ratings. Overall, our model performs consistently well on the three datasets and outperforms a series of comparison methods, which shows the effectiveness of our proposed attentive graphical user trust relationship learning and the adversarial training strategy. We discuss these modules in detail in the following sections.

4.5 Ablation studies

In this section, we carry out a series of ablation studies to show the effectiveness of leveraging both the trust information and the robustness of recommender systems, i.e., our attentive graphical learning for user trust representation and the adversarial training strategy for updating the recommender model. We also discuss the model performance under different settings of the correlation function, which is designed to balance the neighbors’ contributions to the user social trust representation. We perform the studies on the FilmTrust dataset.

4.5.1 Impact of social trust information

We tested two methods to combine the latent representations of the rating and trust information. The first method concatenates representations of rating and trust data for each user; the second sums up the representations with different weight settings (as introduced in our method). According to the results listed in Table 2, the second method exhibited better prediction performance in our experiments.

One possible reason is that our user social trust embedding shares a similar data structure with the user rating pattern embedding, so summation preserves more structural information than simple concatenation. We also tested the model performance with different settings of the parameter \(\gamma \), which controls the weight of the user rating pattern embedding when it is summed with the user social trust embedding. \(\gamma = 0\) and \(\gamma = 1\) indicate that the model is trained without and solely on the user rating history, respectively. We observe that combining the user social trust embedding with the rating pattern embedding performs better than using either embedding alone. Interestingly, the model performs well with only the user social trust embedding, which supports our assumption that users share similar tastes with their neighbors.
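
The two combination strategies can be illustrated with a minimal sketch. It assumes a convex weighted sum, \(\gamma \) times the rating pattern embedding plus \(1-\gamma \) times the social trust embedding, which is consistent with the boundary cases \(\gamma = 0\) and \(\gamma = 1\) described above but is not taken from the method section; the function names and the toy vectors are hypothetical.

```python
import numpy as np

def combine_concat(rating_emb, trust_emb):
    """Concatenate the two user embeddings along the feature axis."""
    return np.concatenate([rating_emb, trust_emb], axis=-1)

def combine_weighted_sum(rating_emb, trust_emb, gamma):
    """Weighted sum of the two embeddings.

    gamma weights the rating pattern embedding; the convex form
    gamma * rating + (1 - gamma) * trust is an assumption of this sketch.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return gamma * rating_emb + (1.0 - gamma) * trust_emb

# Hypothetical embeddings for a single user (same dimensionality).
rating_emb = np.array([0.2, 0.8, 0.5])  # user rating pattern embedding
trust_emb = np.array([0.6, 0.4, 0.1])   # user social trust embedding

concat = combine_concat(rating_emb, trust_emb)            # dimension doubles
mixed = combine_weighted_sum(rating_emb, trust_emb, 0.5)  # dimension preserved
```

Under this assumed form, the weighted sum keeps the embedding dimension unchanged, while concatenation doubles it and leaves any interaction between the two views to the downstream layers.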

Table 2 Impact of social trust information

4.5.2 Impact of adversarial training

To validate the discriminator’s effectiveness in enforcing cohort rating patterns among the real and generated ratings, we compare the ratings generated by our proposed model with those generated solely by the predictor (without adversarial training). Figure 4 shows the model performance and the distribution of the ratings. First, the model with adversarial training consistently outperforms the model with only the predictor. Second, compared with the ratings generated by the predictor alone, the ratings predicted with adversarial training tend to fall into different ranges for different items, following patterns similar to those of the real ratings.

Fig. 4 a The performance of the model with/without adversarial training. b The distribution of ratings. The red area shows the distribution of real ratings; the green area shows the predicted ratings without adversarial training; and the blue area shows the predicted ratings with adversarial training

Fig. 5 a An example of user social trust representation learning with cosine similarity as the correlation function. b The model performance with different settings of the correlation function

4.5.3 Impact of correlation function

The correlation function \(f(r_{u}, r_{v})\) defines the relationship between a user and her neighbors and is used to learn the neighbors’ contributions to the user’s social trust embedding. Intuitively, users whose historical rating records are highly consistent may share similar tastes. We compare the following correlation functions:

  • Cosine similarity: \(f(r_{u}, r_{v}) = \frac{r_{u}\cdot r_{v}}{\|r_{u}\|\,\|r_{v}\|}\)

  • Concatenation: \(f(r_{u}, r_{v})=[r_{u},r_{v}]\)

  • Difference: \(f(r_{u}, r_{v})=r_{u}-r_{v}\)

  • Dot product: \(f(r_{u}, r_{v})=r_{u}\cdot r_{v}\)

  • Equal contribution: \(f(r_{u}, r_{v})=1\)
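
The five correlation functions listed above can be written down directly. The following is a minimal NumPy sketch over hypothetical rating vectors; the function names, the toy data, and the small `eps` guard against zero-norm vectors are assumptions of the sketch. The concatenation and difference variants return vectors, so they would need a further mapping to a scalar weight, which is not modeled here.

```python
import numpy as np

def cosine_similarity(r_u, r_v, eps=1e-12):
    """Cosine similarity; eps avoids division by zero (an assumption)."""
    denom = np.linalg.norm(r_u) * np.linalg.norm(r_v)
    return float(np.dot(r_u, r_v) / max(denom, eps))

def concatenation(r_u, r_v):
    """Concatenation [r_u, r_v]; returns a vector of doubled length."""
    return np.concatenate([r_u, r_v])

def difference(r_u, r_v):
    """Element-wise difference r_u - r_v; returns a vector."""
    return r_u - r_v

def dot_product(r_u, r_v):
    """Unnormalized dot product r_u . r_v."""
    return float(np.dot(r_u, r_v))

def equal_contribution(r_u, r_v):
    """Constant correlation: every neighbor contributes equally."""
    return 1.0

# Hypothetical rating vectors over four items (0 = unrated).
r_u = np.array([5.0, 3.0, 0.0, 4.0])
r_v = np.array([5.0, 3.0, 0.0, 4.0])  # identical ratings to r_u
r_w = np.array([1.0, 0.0, 4.0, 2.0])  # dissimilar ratings

same_diff = difference(r_u, r_v)      # all zeros for identical ratings
sim_same = cosine_similarity(r_u, r_v)
sim_other = cosine_similarity(r_u, r_w)
```

In this toy example the difference measure returns an all-zero vector for two users with identical ratings, whereas cosine similarity assigns that pair the highest score.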

Figure 5a gives an example of user social trust representation learning with cosine similarity. The cosine similarity is calculated from the rating histories of the user and her neighbors. We then use the cosine similarity to learn the contributions of \(r_{\text {Friend 1}}\) and \(r_{\text {Friend 2}}\) to the social trust embedding \(s_{u}\) with the attention mechanism, following Eqs. (2) to (4). Figure 5b shows the results with different settings. The performance of the equal contribution measure indicates that friends do not contribute equally to the user social trust representation. In contrast, the other measures, which consider the distance between the user and her neighbors, perform well. However, the performance of the difference measure is quite unstable; it may discard information when users give the same ratings to the same items.
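
As a rough illustration of how cosine similarity can drive the neighbors’ contributions, the sketch below normalizes the similarity scores with a softmax and uses them to weight the neighbors’ embeddings. Eqs. (2) to (4) are not reproduced in this section, so the softmax normalization, the function names, and the toy vectors are all assumptions and not the exact attention mechanism of the model.

```python
import numpy as np

def cosine_similarity(r_u, r_v, eps=1e-12):
    """Cosine similarity of two rating vectors (eps guard is an assumption)."""
    denom = np.linalg.norm(r_u) * np.linalg.norm(r_v)
    return float(np.dot(r_u, r_v) / max(denom, eps))

def softmax(scores):
    """Numerically stable softmax over a 1-D array of scores."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

def social_trust_embedding(r_u, neighbor_ratings, neighbor_embs):
    """Attention-weighted sum of neighbor embeddings (assumed form)."""
    scores = np.array([cosine_similarity(r_u, r_v) for r_v in neighbor_ratings])
    weights = softmax(scores)            # contributions sum to one
    return weights, weights @ neighbor_embs

# Hypothetical rating histories and neighbor embeddings.
r_u = np.array([5.0, 3.0, 0.0, 4.0])
r_friend_1 = np.array([4.0, 3.0, 0.0, 5.0])  # similar taste to the user
r_friend_2 = np.array([1.0, 0.0, 5.0, 2.0])  # different taste
neighbor_embs = np.array([[1.0, 0.0],
                          [0.0, 1.0]])

weights, s_u = social_trust_embedding(
    r_u, [r_friend_1, r_friend_2], neighbor_embs
)
```

In this sketch the friend with the more consistent rating history receives the larger weight. Replacing the scores with a constant gives uniform weights, which corresponds to the equal contribution setting.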

5 Conclusion

In this work, we propose a unified reinforced trust-aware recommendation model that leverages both users’ trust relationships and rating quality to improve the recommendation performance. The model employs a predictor based on an encoder-decoder structure to learn the shared latent information from sparse user ratings and trust relationships, and a discriminator to enforce cohort rating patterns on the predicted ratings. We compare the proposed method with a series of baselines and state-of-the-art methods, and discuss the model performance under different configurations. The experiments on three datasets show the model’s competitive performance. One limitation of our proposed method is that the computation cost would be high with a larger number of items. We will address this issue in future work.