1 Introduction

Recommender systems [1] have been applied widely in various types of electronic service systems, such as e-commerce [2] and social networks. These systems recommend new items that a user might prefer based on the user's rating history. Two commonly used methods in recommender systems are content-based filtering and collaborative filtering (CF) [3]. The basic idea of content-based filtering is to recommend items similar to those a user liked in the past. By contrast, CF uses the known preferences of a group of users to make recommendations or to predict the unknown preferences of other users. One of the main difficulties that affects CF is the data sparsity problem, i.e., the rating matrices are very sparse. Another difficulty is the cold start problem, where recommendations are required for items that no one has yet rated. A possible way to address these difficulties is to build a hybrid model that fuses CF and content-based filtering, where side information is integrated into the training process. The hybrid model aims to compensate for the missing ratings by using the side information. For example, Adams et al. [4] proposed the addition of related (side) information regarding venues and dates to predict the scores of professional basketball games. Porteous et al. [5] introduced a Bayesian probabilistic matrix factorization framework that performed regression against side information to predict movie ratings.

Due to the rapid development of social networks, recommendation algorithms that incorporate trust information have attracted increasing attention. Scott et al. [6] noted that user preferences are often influenced by those of their friends, and thus several trust-based methods have been proposed. Jamali et al. [7] presented a propagation algorithm to model trust relationships. However, only very sparse explicit trust information is available to these algorithms. Thus, considering implicit feedback is a good way to learn user preferences more exactly [8, 9]. For example, SVD++, which is based on singular value decomposition (SVD), combines implicit feedback with the user ratings [8]. Recently, Guo et al. [9] proposed a method called TrustSVD that combines both explicit and implicit feedback based on trust relationships as well as ratings to make better recommendations. The success of these methods demonstrates that adding explicit and implicit information can effectively improve the performance of a model.

The methods mentioned above provide many ways of integrating trust information into a recommender system, but these trust-aware models are only extensions of traditional methods or models. The problem is that such simple linear transformations cannot capture deep semantic connections. With the development of deep learning, neural networks have proved more effective than traditional algorithms in tasks such as speech recognition, natural language processing, and image processing. For example, Hong et al. [10] proposed a novel approach for recovering three-dimensional human poses from silhouettes, which improved on traditional methods by employing multiview locality-sensitive sparse coding in the retrieval process. They also proposed a pose recovery method that used nonlinear mapping with multi-layered deep neural networks, reducing the recovery error by 20–25% [11]. As a representative deep learning method, the autoencoder has been used in many applications and has attracted increasing attention. Liu et al. [12] proposed a large-margin autoencoder that boosts discriminability by enforcing a large margin between samples of different classes in the hidden feature space, and it achieved remarkable results. They also proposed a Hessian-regularized sparse autoencoder that incorporates both Hessian regularization and sparsity constraints into the autoencoder, which preserves the local geometry of the data points well and extracts the hidden structure in the data via the sparsity constraints [13]. Furthermore, several studies have used neural networks as powerful tools in recommender systems. For example, Wu et al. [14] proposed a model called the collaborative denoising autoencoder, which generalizes several well-known CF models. Pan et al. [15] proposed a correlative denoising autoencoder (DAE) model that learns correlations from ratings and trust information to provide top-N recommendations. Deng et al. [16] developed a novel trust-based approach for recommendation in social networks.

In this paper, we propose two trust-aware neural network models based on a DAE to alleviate the data sparsity problem and the cold start problem in CF. The main contributions of the paper include:

  • First, we add masking noise to the raw input data to greatly enhance the robustness of the model.

  • Second, we combine the rating information and the explicit trust information to depict each user more accurately, where we refer to this model as TDAE.

  • Third, we propose to extract the implicit trust relationships between users based on similarity measures in order to supplement the sparse explicit trust information, where the model that integrates both the explicit and implicit trust relationships is called TDAE++.

  • Finally, because the correlations between the trust information and the rating information are highly nonlinear, we add the trust relationships into both the input layer and the hidden layer of the autoencoder.

The remainder of this paper is organized as follows. In Sect. 2, we review related studies of DAEs, trust relationships in social networks, and CF with neural networks. In Sect. 3, we describe our proposed models in detail. The experimental results and analyses are given in Sect. 4. Finally, we summarize the models and suggest potential ways to improve their performance in Sect. 5.

2 Related Work

2.1 Notations

In this paper, we use the following notations:

  • i and j denote user i and user j, respectively,

  • \(r_{i}\) and \(r_{j}\) are the rating vectors of user i and user j,

  • \(r_{i,k}\) and \(r_{j,k}\) are the ratings of user i and j on item k,

  • \(\bar{r_{i}}\) and \(\bar{r_{j}}\) are the mean values of \(r_{i}\) and \(r_{j}\),

  • N and H are the dimensions of the input layer and the hidden layer of the autoencoder, respectively.

2.2 Denoising Autoencoder

The proposed model is based on a classical autoencoder [17]. The autoencoder is implemented as a neural network with one hidden layer, which takes a vector \(x\in \mathbb {R}^N\) as input and maps it to a hidden representation via an activation function:

$$\begin{aligned} y=\rho \left( W ^T x+ b \right) , \end{aligned}$$
(1)

where W is an \(N \times H\) weight matrix and \(b \in \mathbb {R}^H\) is a bias vector. The resulting latent representation is then mapped back to a reconstruction vector \(\hat{x} \in \mathbb {R}^N \) by:

$$\begin{aligned} \hat{x} =\rho \left( W'y+ b'\right) , \end{aligned}$$
(2)

where \(b' \in \mathbb {R}^N\) is a bias vector and \(W'=W\) are tied weights, which help to avoid over-fitting and enhance the robustness of the model.

The parameters of this model are trained to minimize the average reconstruction error:

$$\begin{aligned} \mathop {{ argmin}}\limits _{W,W',b,b'} \frac{1}{m}\sum ^m_{i=1}l\left( x_i ,\hat{x}_i\right) , \end{aligned}$$
(3)

where m is the total number of users and l is a loss function, such as the squared loss or the cross-entropy loss.

The DAE extends the classical autoencoder by reconstructing the data point x from a corrupted version \(x'\). The goal of the DAE is to force the hidden layer to discover a more robust low-dimensional representation and to prevent it from simply learning the identity function [18]. The corrupted input \(x'\) is typically drawn from a conditional probability distribution \(p(x'|x)\). Another way of corrupting the input vector is to mask a random fraction of its entries by replacing them with zero.
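As a concrete illustration of Eqs. (1)–(3) and of masking noise, the following NumPy sketch runs one denoising reconstruction with tied weights. The dimensions, corruption rate, and tanh activation are illustrative assumptions; this is a minimal sketch rather than the training code used in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N, H = 100, 20  # input and hidden dimensions (illustrative values)
W = rng.uniform(-1 / np.sqrt(N), 1 / np.sqrt(N), size=(N, H))  # N x H weight matrix
b = np.zeros(H)        # hidden-layer bias
b_prime = np.zeros(N)  # output-layer bias

def corrupt(x, mask_prob=0.2):
    """Masking noise: set a random fraction of the entries to zero."""
    return x * (rng.random(x.shape) >= mask_prob)

def reconstruct(x):
    """Tied-weight autoencoder: Eq. (1) followed by Eq. (2) with W' = W."""
    y = np.tanh(W.T @ x + b)         # hidden representation, Eq. (1)
    return np.tanh(W @ y + b_prime)  # reconstruction, Eq. (2)

x = rng.random(N)                        # a toy input vector
x_hat = reconstruct(corrupt(x))          # reconstruct x from its corrupted version
squared_loss = np.mean((x_hat - x) ** 2) # the per-example loss l in Eq. (3)
```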

2.3 Trust Relationships in Social Networks

In a social network, trust information reflects interpersonal relationships in the real world, representing the strength of the relationship and the degree of mutual recognition between users. Thus, trust is a powerful tool for refining recommendation results and plays a very important role in recommender systems. Studies have shown that target users would rather trust the recommendations of their friends than those of other online users who merely share a common hobby [19]. In general, trust relationships can be obtained directly from most social networks based on mutual friendships, and these are known as explicit trust relationships. However, explicit trust relationships are not available on some e-commerce or review websites, so it is often necessary to infer the existence of trust between users from the similarity of their ratings or reviews; this type of trust relationship is implicit.

In general, similarity measures can be used to assess the relationships between users. Many similarity measures are employed in CF, such as the cosine similarity, the adjusted cosine similarity, and Pearson's correlation coefficient. If the ratings of a user are regarded as a vector in a vector space, then user preferences can be represented by rating vectors and the cosine of the angle between two vectors can measure the similarity between users. However, the cosine similarity does not account for differences in users' rating scales, so the adjusted cosine similarity addresses this problem by subtracting the mean value of each user's ratings:

$$\begin{aligned} \cos '(i,j)=\frac{\sum _{k\in L}\left( r_{i,k}-\bar{r_{i}}\right) \cdot \left( r_{j,k}- \bar{r_{j}}\right) }{\sqrt{{\sum _{k\in I_{i}}\left( r_{i,k}-\bar{r_{i}}\right) }^2} \cdot \sqrt{{\sum _{k\in I_{j}}\left( r_{j,k}-\bar{r_{j}}\right) }^2}}, \end{aligned}$$
(4)

where L is the set of items co-rated by user i and user j, and \(I_{i}\) and \(I_{j}\) are the sets of items rated by user i and user j, respectively.

Pearson's correlation coefficient measures the correlation between users or items; its range is \([-\,1,1]\), and a larger value indicates that the users or items are more similar:

$$\begin{aligned} pcc\left( i,j\right) =\frac{\sum _{k\in L}\left( r_{i,k}-\bar{r_{i}}\right) \cdot \left( r_{j,k}- \bar{r_{j}}\right) }{\sqrt{{\sum _{k\in L}\left( r_{i,k}-\bar{r_{i}}\right) }^2} \cdot \sqrt{{\sum _{k\in L}\left( r_{j,k}-\bar{r_{j}}\right) }^2}}, \end{aligned}$$
(5)

However, Pearson's correlation coefficient is less reliable when the number of co-rated items is very small. In addition to the three similarity measures described above, the Jaccard coefficient and the mean squared deviation are two other commonly used methods. However, the Jaccard coefficient only considers the number of items co-rated by the two users without considering the numerical ratings given by the users, whereas the mean squared deviation is the opposite.
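To make Eqs. (4) and (5) concrete, the sketch below computes the adjusted cosine similarity and Pearson's correlation coefficient for two toy rating vectors, using zero to mark a missing rating; the example data and the zero-as-missing convention are assumptions for illustration only.

```python
import numpy as np

def adjusted_cosine(r_i, r_j):
    """Adjusted cosine similarity, Eq. (4); zeros mark missing ratings."""
    rated_i, rated_j = r_i > 0, r_j > 0
    co = rated_i & rated_j                       # L: items co-rated by both users
    if not co.any():
        return 0.0
    mean_i, mean_j = r_i[rated_i].mean(), r_j[rated_j].mean()
    num = np.sum((r_i[co] - mean_i) * (r_j[co] - mean_j))
    den = (np.sqrt(np.sum((r_i[rated_i] - mean_i) ** 2))
           * np.sqrt(np.sum((r_j[rated_j] - mean_j) ** 2)))
    return num / den if den else 0.0

def pearson(r_i, r_j):
    """Pearson's correlation coefficient, Eq. (5), over co-rated items."""
    co = (r_i > 0) & (r_j > 0)
    if not co.any():
        return 0.0
    mean_i, mean_j = r_i[r_i > 0].mean(), r_j[r_j > 0].mean()
    num = np.sum((r_i[co] - mean_i) * (r_j[co] - mean_j))
    den = (np.sqrt(np.sum((r_i[co] - mean_i) ** 2))
           * np.sqrt(np.sum((r_j[co] - mean_j) ** 2)))
    return num / den if den else 0.0

r_i = np.array([4.0, 0.0, 3.0, 5.0, 0.0])  # toy rating vectors, 0 = unrated
r_j = np.array([5.0, 2.0, 0.0, 4.0, 1.0])
print(adjusted_cosine(r_i, r_j), pearson(r_i, r_j))
```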

2.4 Collaborative Filtering with Neural Networks

Applications of neural networks have developed rapidly in computer vision, natural language processing, and other fields, but they have received less attention in CF. In a preliminary study, Salakhutdinov et al. [20] addressed the Netflix challenge using restricted Boltzmann machines. In addition, it has been shown that neural networks can discover nonlinear latent factors in heterogeneous data [21], which makes them promising tools for CF. Moreover, Strub and Mary [22] and Dziugaite et al. [23] trained autoencoders directly to predict missing ratings. In general, these neural network-based methods obtain better results than traditional CF techniques. Thus, we consider using the autoencoder to improve the performance of CF.

3 Proposed Model

The details of our methods are presented in this section. First, we introduce the input sparsification process used to handle missing ratings, where a tuning parameter \(\alpha \) controls the influence of the added noise during training. Second, we present a method for integrating trust information into the DAE. Finally, we choose a suitable similarity measure to extract implicit trust information.

3.1 Input Sparsification

For rating prediction, data sparsity has always been a serious problem that affects the accuracy of the results. Most previous studies dealt with sparse inputs by precomputing estimates of the missing values [24]. By contrast, we sparsify the inputs using the autoencoder itself [25]. First, the weighted edges between the input layer and the hidden layer are limited by randomly inhibiting neurons in the input layer during forward propagation, thereby making the input vectors sparser. In the same way, some neurons in the output layer can be set to zero to limit the weighted edges between the output layer and the hidden layer during backpropagation. Finally, the impact of these sparsification operations is controlled by a tuning parameter \(\alpha \) in the loss function.

Concretely, the weighted edges between the input layer and the hidden layer are limited by setting the missing values to zero. In order to prevent the autoencoder from always returning zero, we use an empirical loss function that disregards the loss on unknown values. That is, the missing values are treated as zeros during forward propagation, and the corresponding errors are ignored during the backpropagation used to train the model on the reconstruction errors. This operation is similar to removing the neurons with missing values in the methods proposed by Salakhutdinov et al. [26] and Sedhain et al. [27].

Fig. 1

The process of sparsifying inputs. The input vector is extracted from the user-item rating matrix. First, the missing values of the vector are set to zero, and then the input is corrupted by masking noise. Before backpropagation, the errors on the missing values are set to zero. Moreover, the denoising errors are reweighted by \(\alpha \) and the reconstruction errors by \((1- \alpha )\)

Finally, we integrate the method described above together with the masking noise into the autoencoder. The input sparsification process is illustrated in Fig. 1. This operation provides the neural network with a supervised training signal: a value in the original data can be overwritten with zero by the masking noise, but the value actually exists and is only assumed to be missing, so the autoencoder is forced to recover it.

In this case, the input vector x can be regarded as consisting of two parts: the elements \(x'_{i}\) obtained by adding noise to \(x_{i}\), and the elements \(x_{i}\) that remain unchanged. The loss function is then modified to emphasize the denoising part of the network. It comprises two parts weighted by a tuning parameter \(\alpha \):

$$\begin{aligned} L_{\alpha }\left( x,x'\right) =\alpha \left( \sum _{x'_{i}\in C\left( x'\right) }{\left[ \hat{x'} _{i} - x'_{i}\right] ^2} \right) +(1-\alpha )\left( \sum _{x_{i}\in N(x)}{\left[ \hat{x}_{i} -x_{i}\right] ^2}\right) , \end{aligned}$$
(6)

where \(\alpha \) is the tuning parameter for the denoising squared error, \((1-\alpha )\) is the tuning parameter for the reconstruction squared error, \(x'\in \mathbb {R}^N\) is the corrupted version of x, \(C(x')\) is the set of corrupted elements of x, N(x) is the set of unchanged elements of x, and \(\hat{x'}_{i}\) and \(\hat{x}_{i}\) are the ith outputs of the network.
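The following is a minimal sketch of Eq. (6), under the assumptions that missing ratings are stored as zeros and that the boolean corruption mask produced by the masking noise is kept: errors on missing entries are discarded, errors on corrupted entries are weighted by \(\alpha \), and errors on the remaining observed entries by \((1-\alpha )\).

```python
import numpy as np

def sparsified_denoising_loss(x, x_hat, observed, corrupted, alpha=0.55):
    """Eq. (6): alpha-weighted squared error restricted to observed ratings.

    x         -- original rating vector (zeros for missing entries)
    x_hat     -- reconstruction produced by the autoencoder
    observed  -- boolean mask of entries that carry an actual rating
    corrupted -- boolean mask of observed entries zeroed by the masking noise
    """
    err = (x_hat - x) ** 2
    err[~observed] = 0.0                                     # ignore missing ratings entirely
    denoising_part = err[observed & corrupted].sum()         # C(x'): corrupted entries
    reconstruction_part = err[observed & ~corrupted].sum()   # N(x): unchanged entries
    return alpha * denoising_part + (1.0 - alpha) * reconstruction_part

# toy usage: the second item is unrated, so its error never contributes
x = np.array([4.0, 0.0, 3.0, 5.0])
observed = x > 0
corrupted = observed & np.array([True, False, False, False])  # first rating was masked
x_hat = np.array([3.5, 1.0, 2.5, 4.8])                        # pretend network output
loss = sparsified_denoising_loss(x, x_hat, observed, corrupted)
```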

3.2 Integrating Trust Information

In general, CF algorithms perform calculations using the user ratings for a range of items. However, due to the sparsity of the ratings, using the rating information alone is highly restrictive. The recommendation performance can therefore be enhanced if more abundant information about the user or the item is available. Thus, we integrate the users' trust information from social networks with their rating information in order to improve the accuracy of our proposed method. We employ the DAE described in Sect. 2.2 as the base model. A straightforward approach is to inject the trust information into the input layer to train the model, and then treat the representations in the output layer as the final predicted ratings:

$$\begin{aligned} \hat{z} =\rho \left( W'\left( \rho \left( W^{T}\left\{ x_{i},t_{i}\right\} +b\right) \right) +b'\right) , \end{aligned}$$
(7)

where \(\hat{z}\) is the final representation of the output layer, \(W^{T}\in \mathbb {R}^{H\times (N+T)}\) and \(W^{'T}\in \mathbb {R}^{H\times N}\) are the weight matrices, \(t_i\in \mathbb {R}^{T}\) is the trust information for \(u_{i}\), \(b\in \mathbb {R}^{H}\) and \(b'\in \mathbb {R}^{N}\) are the bias vectors, and \(\rho \) is the hyperbolic tangent function.

However, Ngiam et al. [28] showed that the correlations between ratings and trust data are highly nonlinear, with different distributions. Thus, in order to address this problem and learn more features from the trust information, we inject the trust relationships into both the input layer and the hidden layer of the autoencoder:

$$\begin{aligned} \hat{z} =\rho \left( V'\left\{ \rho \left( V^{T}\left\{ x_{i},t_{i}\right\} +b \right) ,t_{i}\right\} +b'\right) , \end{aligned}$$
(8)

where \(V\in \mathbb {R}^{(N+T)\times H}\) and \(V'\in \mathbb {R}^{N\times (H+T)}\) are the corresponding weight matrices. However, if the dimension of the trust information is excessively large, the autoencoder may have difficulty utilizing these data effectively. Thus, we enforce the constraint that the dimension of the input layer must be greater than the dimension of the hidden layer, and the latter must be greater than the dimension of the trust information [25], i.e., \(N\gg H\gg T\), where N is the number of rating neurons in the input layer, H is the number of rating neurons in the hidden layer, and T is the number of trust neurons injected into the input layer or the hidden layer. Finally, we obtain an autoencoder that incorporates both ratings and trust information, and we train it by backpropagation.
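The following sketch illustrates the forward pass of Eq. (8): the trust vector \(t_i\) is concatenated with the rating vector at the input layer and concatenated again with the hidden representation before decoding. The dimensions respect \(N\gg H\gg T\), but the specific values, the random initialization, and the tanh activation here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, H, T = 500, 60, 10  # rating, hidden, and trust dimensions (N >> H >> T)

V = rng.uniform(-0.05, 0.05, size=(N + T, H))        # encoder weights for {x_i, t_i}
V_prime = rng.uniform(-0.05, 0.05, size=(N, H + T))  # decoder weights for {y, t_i}
b, b_prime = np.zeros(H), np.zeros(N)

def predict_ratings(x_i, t_i):
    """Eq. (8): inject trust into both the input layer and the hidden layer."""
    y = np.tanh(V.T @ np.concatenate([x_i, t_i]) + b)             # hidden layer
    return np.tanh(V_prime @ np.concatenate([y, t_i]) + b_prime)  # predicted ratings

x_i = rng.random(N)                              # toy (rescaled) rating vector of user i
t_i = rng.integers(0, 2, size=T).astype(float)   # toy binary trust vector of user i
z_hat = predict_ratings(x_i, t_i)
```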

3.3 Extracting Implicit Trust Information

The accuracy of a recommender system can be improved effectively by integrating trust relationships. However, the explicit trust information is generally very sparse, so implicit trust information has attracted increasing attention. Recently, implicit information has been applied in many methods. For example, Zheng et al. [29, 30] proposed a model called the IMPLICIT neural autoregressive distribution estimator for collaborative filtering (IMPLICIT CF-NADE), based on the CF-NADE model, to exploit users' implicit feedback, and it performs better than classical implicit matrix factorization [31]. However, extracting more implicit trust information between users is still a challenge. A common way to address it is to employ similarity measures to assess the relationships between users. Papagelis et al. [32] proposed a method for measuring the degree of trust between two users using Pearson's correlation coefficient, arguing that this similarity can represent the degree of trust between two people to some extent. However, if there is only one co-rated item between two users, then regardless of the rating scores, the resulting similarity is 1, which is not consistent with common sense. Therefore, Wang et al. [33] took this problem into account and standardized the similarity to [0, 1]:

$$\begin{aligned} { Sim}_{i,j}={\left\{ \begin{array}{ll} 1, &{} \quad i=j ,\\ \left( 1- \frac{1}{n}\right) \left( \frac{{ pcc}(i,j)+1}{2}\right) , &{} \quad i\ne j. \end{array}\right. } \end{aligned}$$
(9)

where n is the number of items co-rated by the two users. In addition, Wang et al. proposed using a similarity threshold \(\theta \) to infer the implicit trust relationship; an implicit trust relationship is confirmed only when the similarity is not less than the threshold \(\theta \):

$$\begin{aligned} t_{i,j}={\left\{ \begin{array}{ll} 1, &{}\quad \text {if}\quad { Sim}_{i,j}\ge \theta ,\\ 0, &{}\quad \text {otherwise}. \end{array}\right. } \end{aligned}$$
(10)

where \(t_{i,j}\) is the binary implicit trust relationship between user i and user j. In this paper, we use the method proposed by Wang et al. to extract the implicit trust relationships from the user rating information in order to extend the original trust information and obtain a denser trust matrix. Note that although the implicit trust relationships are extracted from the rating information, the implicit trust information and the ratings are treated as two different types of data. We then integrate the implicit trust relationships into the TDAE model to obtain the TDAE++ model, and the experiments show that TDAE++ produces more accurate results on three data sets. Moreover, to make the structure of the proposed models clearer, we show the overall structure of the models in Fig. 2.

In fact, the most important step in extracting implicit trust information is determining the optimal value of \(\theta \), which in our experiments is closely tied to the TDAE++ model. The process of determining the optimal value of \(\theta \) is as follows. The implicit trust information is extracted from the rating vectors of pairs of users, and the larger the value of \(\theta \), the fewer implicit trust links can be extracted. We therefore search over different values of \(\theta \) based on TDAE and then observe the experimental results of TDAE++ to determine the optimal value of \(\theta \).
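Below is a sketch of Eqs. (9) and (10), assuming a user-item rating matrix with zeros for missing ratings and a `pearson` function such as the one sketched in Sect. 2.3; the threshold is passed in as a parameter because the paper selects \(\theta \) experimentally.

```python
import numpy as np

def implicit_trust_matrix(R, pearson, theta):
    """Binary implicit trust matrix from the rating matrix, Eqs. (9)-(10).

    R       -- m x n user-item rating matrix, 0 marks a missing rating
    pearson -- pearson(r_i, r_j) returns the Pearson correlation of two users
    theta   -- similarity threshold for confirming an implicit trust link
    """
    m = R.shape[0]
    trust = np.zeros((m, m))
    for i in range(m):
        trust[i, i] = 1.0                                # Sim(i, i) = 1 by definition
        for j in range(m):
            if i == j:
                continue
            n = int(((R[i] > 0) & (R[j] > 0)).sum())     # number of co-rated items
            if n == 0:
                continue
            sim = (1 - 1.0 / n) * ((pearson(R[i], R[j]) + 1) / 2)  # Eq. (9)
            if sim >= theta:                                        # Eq. (10)
                trust[i, j] = 1.0
    return trust
```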

Fig. 2

The structures of TDAE and TDAE++. The autoencoder has two kinds of inputs: the rating information and the trust information. The trust information is injected into both the input layer and the hidden layer of the autoencoder. Moreover, the explicit trust is shown on a dark green background and the implicit trust on a light green background. (Color figure online)

3.4 Training Loss

We employ the following loss function to train the proposed models; after adding a regularization term, it can be written as:

$$\begin{aligned} L_{\alpha }\left( x,x'\right)= & {} \alpha \left( \sum _{x'_{i}\in {C\left( x'\right) }}{ \left[ \hat{x'} _{i} - x'_{i}\right] ^2}\right) \nonumber \\&+\,(1-\alpha )\left( \sum _{x_i\in {N(x) }}{\left[ \hat{x}_{i} -x_{i}\right] ^2}\right) +\lambda \left( |W|^2_{{ Fro}}\right) , \end{aligned}$$
(11)

where W is the weight matrix and \(\lambda \) is the regularization hyperparameter. \(\lambda \) controls how strongly the model fits the training samples, which affects the generalizability of the model. When \(\lambda \) is very small, the model fits the training set better, but the risk of overfitting increases. When \(\lambda \) is zero, the formula reduces to one without a regularization term, and the model has no protection against overfitting. When \(\lambda \) is large, the regularization effect increases, but so does the risk of underfitting. Thus, it is necessary to tune \(\lambda \) manually and determine an appropriate value according to the experimental results.

Moreover, during training, the weight matrix W is initialized with the fan-in rule, i.e., \(W_{i, j}\) is drawn uniformly from \(\left[ -\frac{1}{\sqrt{n}} , \frac{1}{\sqrt{n}}\right] \), and W is then optimized by stochastic gradient descent (SGD) with a mini-batch size of 35. SGD is an effective method for training deep networks, and SGD variants can obtain state-of-the-art performance. SGD minimizes the loss by optimizing the network parameters W:

$$\begin{aligned} W = \mathop {{ argmin}}\limits _{W} \frac{1}{N}\sum ^N_{i=1}l(x_i , W), \end{aligned}$$
(12)

where \(x_{1}, x_{2},\ldots ,x_{N}\) are the examples in the training data set. At each SGD step, we consider a mini-batch \(x_{1}, x_{2},\ldots ,x_{M}\) of size M, which is used to approximate the gradient of the loss function with respect to the parameters by computing:

$$\begin{aligned} \frac{1}{M}\sum ^M_{i=1} \frac{\partial l(x_{i}, W)}{\partial W}, \end{aligned}$$
(13)

The update rule for W is then:

$$\begin{aligned} W' = W - \frac{\gamma }{M}\sum ^M_{i=1}\frac{\partial l\left( x_{i}, W\right) }{\partial W}, \end{aligned}$$
(14)

where \(\gamma \) is the learning rate.
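The updates in Eqs. (12)–(14) correspond to the standard mini-batch SGD loop sketched below, with the weight matrix initialized by the fan-in rule; the gradient function, learning rate, and data shapes here are placeholders rather than the paper's actual training code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 500, 60
# fan-in initialization: W_{i,j} ~ Uniform[-1/sqrt(n), 1/sqrt(n)]
W = rng.uniform(-1 / np.sqrt(n_in), 1 / np.sqrt(n_in), size=(n_in, n_hidden))

def sgd_epoch(X, W, grad_fn, gamma=0.01, batch_size=35):
    """One epoch of mini-batch SGD, Eq. (14): W <- W - (gamma / M) * summed gradients.

    X       -- training examples, one row per user
    grad_fn -- grad_fn(x, W) returns the per-example gradient dl(x, W)/dW
    """
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = X[order[start:start + batch_size]]
        grad_sum = sum(grad_fn(x, W) for x in batch)  # summed gradient over the mini-batch
        W = W - (gamma / len(batch)) * grad_sum       # averaged gradient step, Eqs. (13)-(14)
    return W
```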

4 Experiments

4.1 Data Sets

We use three popular data sets in our experiments: Filmtrust, Epinions, and Douban. The Filmtrust data set contains 35,497 ratings for 2071 movies provided by 1508 users, with ratings in the range from 0 to 4. This data set also includes 1853 trust relationships provided by 609 users. The Epinions data set contains 40,289 users and 139,738 movies. The total number of movie ratings is 662,824 and there are 487,183 claimed social relationships. The Douban data set contains 129,490 users and 58,541 movies. The total number of movie ratings is 16,830,839 and there are 1,692,952 claimed social relationships. The statistics for these data sets are shown in Table 1.

Table 1 Data set statistics

4.2 Comparisons

We focus on the CF problem, so it is reasonable to compare our proposed models with other methods for rating prediction tasks, such as TrustSVD and DAE. Thus, we select several state-of-the-art algorithms to compare and evaluate our models. Brief descriptions of the baseline models are given as follows:

  • SVD++ [8]. This algorithm predicts ratings by employing SVD, treating the users' browsing and rating histories as implicit feedback.

  • TrustSVD [9]. This is a simple and widely used algorithm for rating prediction. Explicit trust information is added to SVD++ to obtain more accurate predictions.

  • Probabilistic matrix factorization (PMF) [34]. PMF employs a probabilistic model based on Regularized MF for further optimization. It can predict unknown values of the ratings matrix based on the user and the item feature matrices.

  • TrustMF [35]. The TrustMF algorithm adds trust information based on MF to improve the performance of the model.

  • DAE [18]. The DAE improves the accuracy of rating prediction over the classical autoencoder by adding noise to the input data. Here, DAE refers to the rating-prediction model proposed in 2016 [25].

4.3 Evaluation Metrics

Two evaluation metrics are used to measure the performance of our proposed models: the mean absolute error (MAE) and the root mean squared error (RMSE). Smaller values of these two metrics indicate better performance. For a user i, the real rating on item k is \(r_{i,k}\), the predicted rating is \(\hat{r}_{i,k}\), and T is the number of ratings in the test set.

  • MAE is defined as:

    $$\begin{aligned} { MAE}=\frac{\sum _{i,k}|r_{i,k}-{{\hat{r}}_{i,k}}|}{T}, \end{aligned}$$
    (15)
  • RMSE is defined as:

    $$\begin{aligned} { RMSE}=\sqrt{\frac{\sum _{i,k}{\left( r_{i,k}-{{\hat{r}}_{i,k}}\right) ^2}}{T}}. \end{aligned}$$
    (16)
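For completeness, both metrics can be computed directly from the vectors of true and predicted test ratings, as in the short sketch below.

```python
import numpy as np

def mae(r_true, r_pred):
    """Mean absolute error, Eq. (15)."""
    return np.mean(np.abs(r_true - r_pred))

def rmse(r_true, r_pred):
    """Root mean squared error, Eq. (16)."""
    return np.sqrt(np.mean((r_true - r_pred) ** 2))
```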

4.4 Implementation Details

We consider three data sets to provide the implementation details of our experiments. Two main types of parameters affect the performance of the models: the parameter used to partition the data set and the parameters employed in training the model. Thus, we explain the impact of each parameter in the following paragraphs.

We compare the effects of the proposed models with different training ratios on the Filmtrust data set, and Table 2 shows that the accuracy of the models improves significantly when the training ratio is 90/10%. Thus, we use 90% of the data set as the training set and 10% as the test set for all of the neural network models.

We train a two-layer autoencoder on each data set to assess the influence of the model parameters, where the neural network is optimized by stochastic backpropagation with a mini-batch of size 35, and a weight decay is added for regularization. The activation functions are hyperbolic tangents. We perform a series of comparative experiments on the proposed models to determine the optimal parameters.

In our experiments, we use \(\alpha \) as a tuning parameter to constrain the effect of the added noise on the autoencoder. We set the initial value of \(\alpha \) to 0.05 and increase it incrementally to 0.95 with a step size of 0.05. Figure 3 shows how the RMSE changes gradually as the value of \(\alpha \) increases. The performance of the neural network is best when \(\alpha \) is 0.55 on all three data sets.

Selecting an appropriate number of training epochs can improve the efficiency of the experiments. Our experiments show that the model works best on the Filmtrust data set with 30 epochs. In addition, for the Epinions and Douban data sets, the time cost and the model performance are best balanced with 20 epochs. The process of determining the optimal number of epochs for each data set is shown in Fig. 4.

Table 2 Influence of the training ratio on Filmtrust data set
Fig. 3

Influence of alpha (\(\alpha \)) on RMSE. a Filmtrust. b Epinions. c Douban

Fig. 4

Influence of epochs on RMSE. a Filmtrust. b Epinions. c Douban

Fig. 5

Influence of layersize on RMSE. a Filmtrust. b Epinions. c Douban

The parameter with the greatest impact on the accuracy of the model is the size of the hidden layer, i.e., the number of neurons in the hidden layer, which reflects the ability of the hidden layer to learn semantic representations of users. The user features cannot be extracted if this number is too small, whereas the transformation between the layers approaches the identity transformation if it is too large. In the experiments, we employ a two-layer autoencoder to determine the optimal number of neurons in the hidden layer. As Fig. 5 shows, the RMSE of the model is lowest with 170 hidden neurons on the Filmtrust data set. In addition, the best results are obtained with 1170 hidden neurons on the Epinions data set and 1000 on the Douban data set.

4.5 Results and Analyses

We conduct a series of experiments to evaluate the performance of the proposed models. For each data set, the parameter settings of the DAE-based models are the same. The experiments are of three types: the first compares the proposed models with the baseline models, the second analyses the influence of the explicit trust information, and the last analyses the effect of integrating the implicit trust information. In addition, we report the results obtained with DAE [25] on the Douban data set using the same experimental parameter settings as the DAE-based models.

  • Comparison with baseline models

We compare the performance of the proposed models with the baseline models in Table 3. As Table 3 shows, DAE performs better than the traditional CF methods. There are two reasons for this. First, the traditional methods are based on linear transformations, which cannot learn nonlinear relationships from the data, whereas a neural network model avoids this limitation. Since the autoencoder can discover deeper and richer semantic information through its nonlinear activation function, it extracts information from the data more effectively. Second, the neural network can not only exploit the explicit information but also help to find the implicit trust relationships between users. In addition, the accuracy of TDAE and TDAE++ is better than that of DAE on the Filmtrust and Epinions data sets; that is, the neural network can model users more accurately after integrating the trust information. However, since the trust density of Douban is very low, it is difficult for the neural network to learn more accurate user representations from such a small amount of explicit trust information. Thus the performance of the model is limited, and the improvement of the proposed models on the Douban data set is not obvious.

Table 3 RMSE with a training/testing set of 90/10%
  • Influence of explicit trust information

In Table 4, we compare TDAE and DAE to illustrate the influence of the explicit trust information. All experimental parameters of TDAE and DAE are the same on each data set; the only difference is that TDAE integrates the explicit trust information between users. In fact, modeling users by rating information and by trust information takes different perspectives, so integrating these two types of information yields more accurate user representations. As Table 4 shows, the effect of adding explicit trust information to TDAE is obvious on Filmtrust and Epinions, which indicates that the neural network obtains more accurate user representations by integrating the explicit trust information. However, the trust density is only 0.01% in the Douban data set, so it is difficult for the neural network to utilize such sparse trust information, and the improvement of TDAE on Douban is not significant. In addition, different densities of trust information result in different improvements in performance: the improvement in RMSE is larger when the trust density is higher. For example, the RMSE of TDAE improves by 3.4369 and 1.318% on the Filmtrust and Epinions data sets, respectively, whereas on the Douban data set the improvement is only 0.2075%. As a result, our proposed model TDAE performs better on data sets with denser trust information or richer social information between users.

Table 4 Influence of integrating explicit trust information
  • Influence of implicit trust information

We extract the implicit trust relationships from the rating information based on a similarity measure. As Table 5 shows, TDAE++ performs better than TDAE, which shows that the implicit trust can improve the accuracy of predictions. However, the implicit trust information is only a supplement to the explicit trust information and plays an auxiliary role, so the performance of TDAE++ and TDAE on the three data sets is similar. The improvement of TDAE++ is obvious on the Filmtrust and Epinions data sets, while the improvement on Douban is not significant. A possible reason is that the user ratings are concentrated on a few popular items, so part of the extracted implicit trust information in the Douban data set is redundant with the explicit trust information. Therefore, TDAE++ obtains better results on data sets with a higher trust information density.

In addition, because TDAE++ integrates the trust information on top of the sparse ratings to enrich each user's features, the results also show that TDAE++ can help to alleviate the cold start problem.

Table 5 Influence of integrating implicit trust information

5 Conclusion

In this paper, we propose two trust-aware neural network models for CF. In order to alleviate the data sparsity problem and the cold start problem, the proposed model TDAE integrates the user rating information and the explicit trust information together to improve the accuracy of the recommender systems. In addition, in order to overcome the sparsity of the explicit trust information, we use a similarity measure to extract the implicit trust relationships. The model that integrates both the explicit and implicit trust information is called TDAE++.

In future research, several possible enhancements may improve the accuracy of our proposed models. First, other types of information, such as comment text and descriptive information about items, can be employed as side information. For example, it would be helpful to utilize users' comment text to extract implicit trust information between users in the Douban data set. Second, different types of autoencoders, such as a stacked autoencoder or a sparse autoencoder, could be used in the model. Third, other types of neural networks could be used, such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs).