
1 Introduction

Nowadays, recommender systems play an important role in information filtering. Their basic technique is collaborative filtering (CF), which relies on collective historical ratings to predict the items that the active user will rate positively [10]. However, traditional rating-only recommenders may suffer from data sparsity, since most users rate only a few of the millions of available items. With the prosperity of social platforms, abundant social information can be harnessed to mitigate this problem, because users' preferences can be inferred from those of their friends [18]. Based on this idea, social recommender systems [15] emerged and have attracted increasing attention over the years. Nevertheless, recent reports [1, 2] show that social recommenders fail in practical industrial use. Key findings from negative experiences in applying social recommender systems include: (1) social relations are too noisy and may have a negative impact; (2) social relations have different interpretations in different contexts [20]. Therefore, utilizing explicit social relations without further processing leads to a degradation in recommendation quality.

To diminish this negative influence, ways of revealing the implicit nature of social relations should be explored. Most existing social recommenders are based on matrix factorization [7], a CF model that decomposes the observed user-item rating matrix into user and item latent factors, but almost all of them use explicit social relations directly. Enlightened by the recent success of word embedding models such as word2vec [16], we consider that users in recommender systems can be embedded likewise. Prior work [8] proves that the skip-gram model of word2vec is equivalent to implicitly factorizing a shifted word co-occurrence (PMI) matrix. Building on this result and CF, we propose our model, SocialEM.

In SocialEM, the user-item rating matrix and the user-user co-occurrence matrix are jointly decomposed with shared user latent factors. For each pair of users, the co-occurrence matrix encodes the number of times they are trusted together by other users in the social network. With the help of this user co-occurrence embedding, the recommender can further expand its model capacity and eliminate the side effects of explicit social relations. First, learned with a context, the user embeddings can uncover more interactions among users. Second, by tuning the minimal co-occurrence count, users who receive little attention are regarded as anomalies and filtered out. Experiments conducted on real-world datasets show that SocialEM is superior to state-of-the-art methods in terms of recommendation accuracy. Finally, we provide an exploratory analysis to better understand the effectiveness of SocialEM.

The remainder of this paper is organised as follows. In Sect. 2, the preliminaries are briefly introduced. Section 3 will focus on the proposed method - SocialEM. Section 4 reports the experimental results. In Sect. 5, the related studies are introduced. Finally, in Sect. 6 we conclude this paper and point out some potential future work.

2 Preliminaries

In this section, we will briefly introduce matrix factorization and word embedding, which are the pillars of the SocialEM model.

Matrix Factorization. Most model-based CF methods are based on matrix factorization [7], which maps both users and items into a low-dimensional latent-factor space such that user-item interactions are modeled as inner products in that space. The user and item latent factors are denoted by \(p_{u}\in \mathbb {R}^{k}\) and \(q_{i}\in \mathbb {R}^{k}\), respectively. The dot product \(p_{u}^{T}q_{i}\) captures the interaction between user u and item i and is used to predict missing ratings.

Let \(I_{ui}\) be an indicator of whether a rating has been observed in the user-item rating matrix R,

$$\begin{aligned} I_{ui}=\begin{cases} 1 &{} \text {if } r_{ui} \text { is observed}\\ 0 &{} \text {if } r_{ui} \text { is missing} \end{cases} \end{aligned}$$
(1)

Matrix factorization optimizes the following objective:

$$\begin{aligned} \mathcal {L}=\frac{1}{2}\sum _{u, i}I_{ui}(r_{ui}-p^{T}_{u}q_{i})^{2}+\lambda (\Vert p_{u}\Vert ^{2}+\Vert q_{i}\Vert ^{2}), \end{aligned}$$
(2)

where \(r_{ui}\) denotes the rating expressed by user u on item i and the algorithmic parameter \(\lambda \) controls the magnitudes of the latent factors.
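As a concrete illustration, the objective of Eq. 2 can be evaluated directly on a toy rating matrix. The sketch below uses dense NumPy arrays and a binary indicator matrix; all variable names are ours, not prescribed by the paper:

```python
import numpy as np

def mf_loss(R, I, P, Q, lam):
    """Eq. 2: squared error on observed ratings plus L2 regularization
    of the user and item latent factors (illustrative sketch)."""
    err = I * (R - P @ Q.T)            # residuals, masked by the indicator of Eq. 1
    return 0.5 * np.sum(err ** 2) + lam * (np.sum(P ** 2) + np.sum(Q ** 2))

# Toy example: 3 users, 2 items, rank-2 latent factors.
rng = np.random.default_rng(0)
R = np.array([[5.0, 0.0],
              [3.0, 4.0],
              [0.0, 1.0]])
I = (R > 0).astype(float)              # I_ui = 1 iff r_ui is observed
P = rng.normal(scale=0.1, size=(3, 2))  # user factors p_u
Q = rng.normal(scale=0.1, size=(2, 2))  # item factors q_i
loss = mf_loss(R, I, P, Q, lam=0.01)
```

Minimizing this loss with gradient descent recovers the standard regularized matrix-factorization model.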

Word Embedding. Word embedding models [5, 6, 16] have recently attracted much attention for their strong performance on a variety of NLP tasks. The skip-gram model of word2vec [16], trained with k negative samples, has been proven equivalent to implicitly factorizing the point-wise mutual information (PMI) matrix shifted by \(\log k\) [8]. The PMI between a word w and its context word c can be empirically estimated as:

$$\begin{aligned} PMI(w,c)=\log \frac{\#(w,c)\cdot |D|}{\#(w)\cdot \#(c)}, \end{aligned}$$
(3)

where \(\#(w, c)\) is the number of times word w appears in the context of word c, \(\#(w) = \sum _{c}{\#(w,c)}\), \(\#(c)=\sum _{w}{\#(w, c)}\), and |D| is the total number of the pairs in the corpora.

Once the PMI matrix has been constructed, the distributed representation of word w is obtained by factorizing the shifted positive point-wise mutual information (SPPMI) matrix, defined as:

$$\begin{aligned} SPPMI_{k}(w, c)=\max (PMI(w, c)-\log k,0), \end{aligned}$$
(4)

where k is the number of negative samples, which controls the sparsity of the SPPMI matrix.
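A minimal construction of the SPPMI matrix of Eqs. 3 and 4 from a symmetric co-occurrence count matrix might look as follows (our own sketch; the paper does not prescribe an implementation):

```python
import numpy as np

def sppmi(C, k):
    """Shifted positive PMI (Eq. 4) from co-occurrence counts C (Eq. 3)."""
    total = C.sum()                          # |D|: total number of pairs
    row = C.sum(axis=1, keepdims=True)       # #(w)
    col = C.sum(axis=0, keepdims=True)       # #(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(C * total / (row * col))
    pmi[~np.isfinite(pmi)] = -np.inf         # unseen pairs carry no signal
    return np.maximum(pmi - np.log(k), 0.0)  # shift by log k, clip at zero

# Toy symmetric co-occurrence counts for three words.
C = np.array([[0., 4., 1.],
              [4., 0., 2.],
              [1., 2., 0.]])
M = sppmi(C, k=1)
```

Raising k zeroes out more low-PMI entries, which is exactly the sparsity knob discussed above.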

3 SocialEM: Integrating User Embedding and CF for Social Recommendations

In this section, we propose a Social recommender that combines user Embedding and Matrix factorization, called SocialEM.

User Embedding. In word2vec, given a center word, the surrounding words are defined as its context. Likewise, in social recommender systems we can define the context of a user u as the other users who are trusted together with u by the same user. For example, if both \(u_{1}\) and \(u_{2}\) are trusted by \(u_{3}\), then \(u_{1}\) and \(u_{2}\) are contexts of each other. The user co-occurrence SPPMI matrix \(T\in \mathbb {R}^{m\times m}\) is then constructed by computing \(\#(i, j)\), the number of times user i and user j are trusted together by other users. After that, we obtain user embeddings by factorizing T. In contrast to explicit trust relations, co-occurrences express the degree of trust or affinity at a finer granularity. Factorizing the SPPMI matrix further reveals the implicit interactions among users in the social network. In addition, by tuning the value of k, noise in the social relations can be filtered out while informative connections are preserved.
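The \(\#(i, j)\) counts can be accumulated from trust lists in a single pass. The sketch below assumes the social network is given as a mapping from each truster to the users it trusts; names and data are purely illustrative:

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_counts(trust_lists):
    """For each user pair (i, j), count how many users trust both i and j,
    i.e. the #(i, j) statistic behind the SPPMI matrix T (sketch)."""
    counts = defaultdict(int)
    for trustees in trust_lists.values():
        for i, j in combinations(sorted(set(trustees)), 2):
            counts[(i, j)] += 1
            counts[(j, i)] += 1      # keep the counts symmetric
    return dict(counts)

# u3 trusts u1 and u2; u4 trusts u1, u2 and u5.
trust = {"u3": ["u1", "u2"], "u4": ["u1", "u2", "u5"]}
counts = cooccurrence_counts(trust)
```

Feeding these counts into the SPPMI construction of Eq. 4 yields the matrix T that SocialEM factorizes.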

Training Procedure. SocialEM has two main components: factorization of the user-item rating matrix and factorization of the user-user co-occurrence matrix. To fuse the rating and social information into the latent factors, we jointly decompose the two matrices with a shared user latent space. Factorizing the rating matrix encodes users' preferences and items' characteristics; factorizing the co-occurrence matrix extracts the implicit connection patterns in the user network. Sharing the same user latent space ensures that both are captured by the user latent factors. After training, we obtain the low-dimensional shared user latent factors \(P\in \mathbb {R}^{m\times d}\), item latent factors \(Q\in \mathbb {R}^{n\times d}\), and context embeddings \(G\in \mathbb {R}^{m\times d}\). Missing ratings are then predicted by the dot products of the shared user latent factors and the item latent factors.

Model Formulation. The objective function of SocialEM is stated as:

$$\begin{aligned} \begin{aligned} \mathcal {L}=&\overbrace{\frac{1}{2}\sum _{u, i}I_{ui}(r_{ui}-P^{T}_{u}Q_{i}-b_{u}-b_{i})^{2}}^\mathrm{{Factorization\ for\ the\ rating \ matrix}}+\overbrace{\frac{\alpha }{2}\sum _{t_{uj}\ne 0}(t_{uj}-P^{T}_{u}G_{j}-w_{u}-c_{j})^{2}}^\mathrm{{Social\ user\ embedding}}\\&+ \, \overbrace{\frac{\lambda }{2} (\sum _{u}(P_{u}^{2}+G_{u}^2+b_{u}^{2}+w_{u}^{2}+c_{u}^{2})+\sum _{i}(Q_{i}^{2}+b_{i}^{2}))}^\mathrm{{Regularization}}. \end{aligned} \end{aligned}$$
(5)

As can be seen in Eq. 5, the objective function has three parts. In the rating-matrix factorization part, the errors between observed and predicted ratings are accumulated quadratically; \(b_{u}\) and \(b_{i}\) denote the biases of user u and item i. In the social user embedding part, the user co-occurrence matrix is factorized; \(w_{u}\) and \(c_{j}\) are the biases of the user and the context. The user latent factors connect these two parts and account for both user-item interactions and user-user co-occurrences. In the last part, all model parameters are regularized. Finally, to balance the effect of the rating and social information, we introduce a hyper-parameter \(\alpha \), whose impact is investigated in Sect. 4.
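The three terms of Eq. 5 can be evaluated directly. The sketch below uses dense arrays for readability (variable names are ours; a real implementation would exploit sparsity):

```python
import numpy as np

def socialem_loss(R, I, T, P, Q, G, b_u, b_i, w, c, alpha, lam):
    """Eq. 5: rating factorization + social user embedding + regularization,
    with the user factors P shared between the two factorizations (sketch)."""
    rating_err = I * (R - P @ Q.T - b_u[:, None] - b_i[None, :])
    S = (T != 0).astype(float)           # the social sum runs over t_uj != 0
    social_err = S * (T - P @ G.T - w[:, None] - c[None, :])
    reg = (np.sum(P**2) + np.sum(G**2) + np.sum(b_u**2) + np.sum(w**2)
           + np.sum(c**2) + np.sum(Q**2) + np.sum(b_i**2))
    return (0.5 * np.sum(rating_err**2)
            + 0.5 * alpha * np.sum(social_err**2)
            + 0.5 * lam * reg)

m, n, d = 3, 2, 2
rng = np.random.default_rng(1)
R = np.array([[5.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
I = (R > 0).astype(float)
T = np.array([[0.0, 1.2, 0.0], [1.2, 0.0, 0.6], [0.0, 0.6, 0.0]])  # toy SPPMI
P, Q, G = (rng.normal(scale=0.1, size=s) for s in [(m, d), (n, d), (m, d)])
b_u, b_i, w, c = np.zeros(m), np.zeros(n), np.zeros(m), np.zeros(m)
loss = socialem_loss(R, I, T, P, Q, G, b_u, b_i, w, c, alpha=0.5, lam=0.01)
```

Setting \(\alpha =0\) reduces the objective to plain biased matrix factorization, which is the ablation examined in Sect. 4.2.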

In [8], several optimizers are available to identify good solutions for factorizing the word co-occurrence matrix. In our work, we use stochastic gradient descent because it works very efficiently in the case of redundant data. A local minimum of the objective function given by Eq. 5 can be found by performing gradient descent in all model parameters:

$$\begin{aligned} \begin{aligned} P_{u}&\leftarrow P_{u}+\gamma (e_{ui}Q_{i}+\alpha e_{uj}G_{j}-\lambda P_{u}),\quad Q_{i}\leftarrow Q_{i}+\gamma (e_{ui}P_{u}-\lambda Q_{i}),\\ G_{j}&\leftarrow G_{j}+\gamma (\alpha e_{uj}P_{u}-\lambda G_{j}),\\ b_{u}&\leftarrow b_{u}+\gamma (e_{ui}-\lambda b_{u}),\quad b_{i}\leftarrow b_{i}+\gamma (e_{ui}-\lambda b_{i}),\\ w_{u}&\leftarrow w_{u}+\gamma (\alpha e_{uj}-\lambda w_{u}),\quad c_{j}\leftarrow c_{j}+\gamma (\alpha e_{uj}-\lambda c_{j}), \end{aligned} \end{aligned}$$
(6)

where \(e_{ui}\) denotes the error between the observed and the predicted rating, and \(e_{uj}\) denotes the error between the co-occurrence value and the prediction given by the dot product of the user latent factor and the context embedding together with the biases.
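One possible per-sample realization of these SGD updates (a sketch consistent with Eq. 5, not the authors' released code) is:

```python
import numpy as np

def step_rating(P, Q, b_u, b_i, u, i, r_ui, gamma, lam):
    """One SGD update for an observed rating r_ui (sketch)."""
    e = r_ui - P[u] @ Q[i] - b_u[u] - b_i[i]
    Pu, Qi = P[u].copy(), Q[i].copy()          # use pre-update values
    P[u] += gamma * (e * Qi - lam * Pu)
    Q[i] += gamma * (e * Pu - lam * Qi)
    b_u[u] += gamma * (e - lam * b_u[u])
    b_i[i] += gamma * (e - lam * b_i[i])
    return e

def step_social(P, G, w, c, u, j, t_uj, alpha, gamma, lam):
    """One SGD update for a nonzero co-occurrence entry t_uj (sketch)."""
    e = t_uj - P[u] @ G[j] - w[u] - c[j]
    Pu, Gj = P[u].copy(), G[j].copy()
    P[u] += gamma * (alpha * e * Gj - lam * Pu)
    G[j] += gamma * (alpha * e * Pu - lam * Gj)
    w[u] += gamma * (alpha * e - lam * w[u])
    c[j] += gamma * (alpha * e - lam * c[j])
    return e

# Toy check: repeated steps shrink the error on a single rating.
rng = np.random.default_rng(2)
P = rng.normal(scale=0.1, size=(1, 2))
Q = rng.normal(scale=0.1, size=(1, 2))
b_u, b_i = np.zeros(1), np.zeros(1)
errors = [abs(step_rating(P, Q, b_u, b_i, 0, 0, 4.0, gamma=0.05, lam=0.01))
          for _ in range(200)]
```

In a full training loop, each epoch would interleave `step_rating` over the observed ratings with `step_social` over the nonzero SPPMI entries, so that the shared P receives gradients from both terms.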

Complexity. Compared with plain matrix factorization, SocialEM only needs to learn an additional context embedding matrix with \(m\times d\) elements. Since d is a small number, consistent with the dimensionality of the user and item latent factors, this extra cost is not computationally expensive. However, SocialEM must compute the user-user co-occurrences beforehand with \(O(m^{2})\) complexity, which is time-consuming; to save time, the co-occurrences should be computed off-line. Fortunately, this computation is friendly to new users: when a new user enters, only the few SPPMI entries related to that user need to be updated.

4 Experimental Results

In this section, we present the experimental results. Two types of experiments are conducted: (1) the recommendation quality of the previous models and SocialEM is compared; (2) the impact of the parameters \(\alpha \) and k is investigated.

Two common real-world datasets, Ciao and Epinions, released by [15, 19], are used in our experiments; their statistics are shown in Table 1. The mean absolute error (MAE) and root mean square error (RMSE) are chosen to measure the prediction error for all methods. A lower MAE or RMSE indicates that missing ratings are predicted more precisely.
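For reference, both metrics reduce to a few lines (a straightforward sketch):

```python
import numpy as np

def mae_rmse(y_true, y_pred):
    """Mean absolute error and root mean square error over test ratings."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(err)), np.sqrt(np.mean(err ** 2))

# Three test ratings vs. three predictions.
mae, rmse = mae_rmse([4, 3, 5], [3.5, 3.0, 4.0])
```

Since RMSE squares the residuals, it penalizes large errors more heavily than MAE, which is why the two metrics can rank methods slightly differently.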

Table 1. Dataset statistics

To demonstrate the performance improvement of SocialEM, we compare our method with the following methods: PMF [17], SoRec [13], SocialMF [4], and RSTE [11].

To tune the methods, we use 80% of the data as the training set, from which we randomly select 10% as the validation set. From the remaining 20% of the data, we keep only the ratings whose users and items appear in the training and validation sets to form the test set. We record the best parameters of each method according to its performance. Afterwards, all experiments are performed with 5-fold cross validation (Table 2).

Table 2. The MAE and RMSE for all methods

4.1 Performance for Predicting Missing Ratings

In this section, we compare the performance of SocialEM with that of the baselines. We set the step size \(\gamma \) = 0.05, \(\alpha \) = 0.5 on Ciao and 1 on Epinions, and \(k=5\) for SocialEM. The dimensionality d of the latent factors is set to 20, and the regularization parameter \(\lambda \) is set to 0.01 for all the methods.

As can be seen in Table 2, SocialEM is superior to the corresponding methods that utilize explicit social relations directly, as well as to the baselines. The main findings are summarized as follows: (1) SocialEM beats PMF by a convincing margin, demonstrating the effectiveness of incorporating social relations; (2) SocialEM outperforms the social-related methods by an average improvement of 2.44% in MAE and 3.84% in RMSE, which shows that adopting user embeddings to reveal the implicit interactions among users is promising. In particular, both SocialEM and SoRec jointly decompose the user-item rating matrix and a relation matrix with a shared user latent space; the difference in performance further validates the positive effect of user embeddings.

4.2 Impact of Parameters \(\alpha \) and k

In SocialEM, two hyper-parameters, \(\alpha \) and k, are introduced. \(\alpha \) is the trade-off that balances the effect of ratings and relations; k controls the sparsity of the user co-occurrence matrix. First, we fix \(k=5\) and increase the value of \(\alpha \) from 0 to 10 with different intervals to observe the variation in performance. Then, we fix \(\alpha =0.1\) to observe the influence of changing k. The experiments are conducted on both datasets, and the other settings are the same as those in Sect. 4.1.

Fig. 1. Impact of parameter \(\alpha \)

Figure 1 shows the fluctuation. SocialEM performs poorly when \(\alpha =0\), which illustrates the importance of considering social relations. The error curves then decline until \(\alpha =0.5\) on Ciao and \(\alpha =2\) on Epinions, where the best performance is reached, and climb back to a poor level as \(\alpha \) increases further. These changes again emphasize the effectiveness of social relations and also exhibit a proper range for the trade-off on different datasets.

Fig. 2. Impact of parameter \(k\)

As can be seen in Fig. 2, SocialEM reaches its best performance when \(k=2^{5}\); both larger and smaller values lead to performance degradation. Specifically, a larger k yields a sparser SPPMI matrix, whereas a smaller k yields a denser one. The result demonstrates that user pairs with few co-occurrences can be noise that negatively impacts accuracy: since such co-occurrences are likely coincidental, they cannot uncover the real interactions among users. On the other hand, an overly sparse SPPMI matrix leaves out fine-grained interactions and only considers user pairs with many co-occurrences, so predictions for users with few relations may fail. We thus conclude that finding a proper k helps to relieve the problem of noise in social relations.

5 Related Work

Complementing matrix factorization with further data, e.g., implicit feedback, temporal information or predefined metadata, has widely been accepted to increase algorithmic accuracy. Because of the convenience of matrix factorization for incorporating social information, several social recommenders based on matrix factorization have been proposed recently [3, 4, 11,12,13,14, 21].

In [11], the authors proposed an ensemble method that considers users' preferences to be determined by both their own tastes and their friends' tastes; the missing ratings of a given user are thus predicted as a linear combination of the ratings of the user and his friends. [13] proposed a model that co-factorizes the rating matrix and the relation matrix into three low-rank matrices, connecting social and rating information through a shared user latent feature space. In [21], two trust models were proposed; in contrast to other social recommenders, this work considers that both trustors and trustees can affect users' preferences. The authors of [4] proposed a method named SocialMF. The idea behind SocialMF is that a user's preferences should be similar to those of his friends; thus, for a given user, the method forces the latent factors to be close to those of his friends. In [12], a method was proposed that takes both trust and distrust into consideration: each user's latent factors should be similar to those of his friends and differ widely from those of distrusted users. However, all these methods use social relations directly, which makes them less reliable.

Among existing research, the principles of [9, 22] are similar to ours. In [9], the authors investigated the interactions among items; it is the earliest work combining word embeddings with CF, but it ignores the potential value in social relations. [22] adopts the skip-gram model of word2vec to embed the social network structure, and then forces the user latent factors to approximate the embeddings of the top-k friends. This model has been proven effective; however, constructing the network embeddings is time-consuming.

6 Conclusion

This paper is motivated by the recent success of the word embedding model word2vec, which can be interpreted as an implicit factorization of the word co-occurrence matrix. Based on this insight, we proposed a novel social recommender called SocialEM that combines user embedding and CF. For each pair of users, the user co-occurrence matrix encodes the number of times they are trusted together by other users in the social network. SocialEM jointly decomposes the user-item rating matrix and the user co-occurrence matrix with shared user latent factors, and then uses the dot products of user latent factors and item latent factors to predict missing ratings. Experiments conducted on real-world datasets show that SocialEM outperforms state-of-the-art methods in terms of recommendation accuracy.

In contrast to general CF-based recommenders, models that integrate user embeddings can further reveal implicit interactions among users. Therefore, extensions of our model can be applied to other fields related to recommender systems. For instance, spammers, also known as shilling attackers, can insert malicious ratings to manipulate recommendations, which results in distinctive user profiles. User embeddings can uncover the nuances between normal users and spammers, which will contribute to spammer detection.