1 Introduction

With the exponential growth of network information, improving the utilization efficiency of information and alleviating the problem of information overload become a pressing research issue. Recommender systems provide a promising solution to the above problems [2], and plays an important role in e-commerce, information retrieval, e-tourism, online advertising, mobile applications and so on [3, 19, 24]. For example, according to a survey from VentureBeat, 35% sales of the online retail giant Amazon are materialized by its recommender system, and the researchers of the online DVD rental company Netflix pointed out in their 2011 report that 60% users are able to find their interested movies and videos from the company’s recommender system.

With the development of online social network, people are accustomed to commenting items such as commodities that they have purchased or movies that they have seen before in online social network, and shared their experiences with friends. Through these comments, other people can assess the quality of items and choose suitable or interested items according to their own preference. From the perspective of sociology and psychology, the choice of item selection is affected by many factors. For example, before choosing a movie to watch, one may be affected by a variety of factors such as his own previous experience, movie trailers, his favorite types of movie or friends’ recommendations, and even his own feelings and environment at that time. All these factors exert an influence to different degree on their final choice. In order to improve the accuracy of recommender systems, the recommended model used should take into account as many important factors as possible [8, 12, 25, 35], leading to a model with comprehensive information to improve the accuracy of the model.

Generally speaking, two types of recommender systems have been investigated in literature: the content-based systems and the collaborative filtering-based systems. The content-based recommendations originated from the area of information retrieval [51]. Content-based recommender systems use content information of items to find the match between items and users. In [52], the keywords of purchased books of a user are used to find other books that contain the same or similar keywords for recommendation. Content-based recommendations typically require external information that might not be available or easy to collect, which leads to a major disadvantage of the content-based systems that they may not have a satisfactory accuracy. Yet, the traditional content-based approaches have advantages in some aspects, such as data sparsity and new item problem [3].

Unlike the content-based recommendation methods, the collaborative recommender systems try to predict the utility of items for a particular user based on the items previously rated by other users. When users have supplied sufficient ratings, the collaborative recommender systems are generally more accurate than the content-based techniques, as evidenced by [23, 27, 29, 50, 51]. Several collaborative filtering-based systems have recently been proposed to improve recommendation accuracy by using social trust [12, 16, 17, 30, 48, 51, 52]. Yang et al. proposed to use the concept of inferred trust circle based on social network to recommend user favorite items [48]. In [17], M. Jamali et al. incorporated the mechanism of trust propagation into the recommendation model. In [12], a user model with trust and distrust networks was proposed to identify trustworthy users. Due to the preference similarity, user latent features should be similar to his/her friends based on the probabilistic matrix factorization model [33, 47, 49, 54]. In [47], Qian et al. proposed a personalized recommender model (PRM) combining user interpersonal interest similarity, interpersonal influence and personal interest factor. The main contribution of this paper is the proposal of a new method to make use of the category information of products, and user personal interest. In [49], Hu et al. proposed a factor model treating the data as an indication of positive and negative preference associated with vastly varying confidence levels. Zhao et al. proposed a user-service rating prediction approach by exploring social users rating behaviors [54]. In their work, four factors including user personal interest, interpersonal interest similarity, interpersonal rating behavior similarity and interpersonal rating behavior diffusion were fused into the matrix factorization model. Many recent studies have begun to mine text reviews to improve prediction accuracy for collaborative filtering recommender system. McAuley et al. proposed a HFT model which combines latent rating dimensions with latent review topics [31]. Lei and Qian et al. proposed a sentiment-based rating prediction method (RPS) to improve prediction accuracy in recommender systems which use LDA to mine the topic of reviews [25]. Jiang et al. proposed an author topic model-based collaborative filtering (ATCF) method to facilitate comprehensive points of interest (POIs) recommendations for social users [20]. Additionally, contextual information is exploited to reduce the prediction error in collaborative filtering-based system. In [26], Liu et al. handled contextual information by applying random decision trees to partition the original user-item rating matrix such that the ratings with similar contexts are grouped based on matrix factorization method. A. Akther et al. designed a new architecture for user personalization which combines both social network data and context data [4]. Zhao et al. proposed a LBRP model which make a full use of mobile users location sensitive characteristics to carry out rating prediction [53]. Their work demonstrates that human’s rating behaviors are significantly affected by the contextual information of their geographical locations. A major appeal of the collaborative filtering-based systems is that they can deal with any kinds of content and recommend any items [1, 14, 32, 42], and they can address data aspects that are often elusive and difficult to profile when using the content-based systems. 
However, there are two critical issues in the collaborative recommender systems regarding new users and new items, which are termed the cold start problem. Because newly added users have few or no ratings in online social networks, it’s difficult for recommender systems to quantify users’ preferences.

In order to overcome the shortcomings of the content-based and collaborative filtering-based systems, several recommender systems use hybrid approaches by combining content-based and collaborative filtering-based mechanisms [7, 22, 26]. In [50], K. Yoshii et al. proposed an efficient hybrid music recommender system, and Wei et al. developed a market-based recommender system that allows multiple agents to compete with each other to present their best recommendations to users [35]. In [10], Cao et al. presented another novel idea and proposed a hybrid collaborative filtering algorithm for bidirectional Web service recommendation which confirms that the collaborative recommendation is more efficient once again. In [9], Braunhofer proposed a context-aware recommender system using various hybridization techniques to improve the prediction accuracy. In the area of e-commerce, Chen considered both the overall ratings and feature-level opinion values to identify reviewers preference homogeneity [11]. In [12, 14, 17, 23, 47], the cold start problem is solved to different degree, but is still far from sufficient, because the existing methods only use social relations such as social trust,neighbor relations, etc, to help the recommender system to alleviate the cold start problem, without taking into account the useful content-based information about the features of users and items involved.

In this paper, we incorporate some important content-based characteristics into a collaborative approach based on matrix factorization method. We proposed a novel social network hybrid recommender system based on hypergraph topologic structure. For simplicity of presentation, we call it the HMF model (stands for Hybrid Matrix Factorization model). The technical contributions of this paper are summarized as follows:

  1. 1.

    We propose a novel HMF model which fuses important contextual information, including users ratings, item feature, user feature, into the matrix factorization approach;

  2. 2.

    We build a hypergraph topologic structure to represent the social network and theoretically analyze the basic concepts of the HMF model;

  3. 3.

    We use the k-Modes algorithm to cluster user-item datasets based on the contextual information which can effectively enhance the data correlation of sub-datasets and improve the recommendation accuracy;

  4. 4.

    Extensive experimental evaluation on publicly available datasets demonstrates that our proposed hybrid recommender system outperforms the existing recommender systems in tackling cold start problem and dealing with sparse rating datasets. Our system also enjoys improved recommendation accuracy compared with several major existing recommendation approaches.

The rest of the paper is organized as follows. We describe the related work and several typical recommender models in Section 2. In Section 3, we present a hypergraph structure in social networks and describe the relations through mathematical definitions. On the basis of the hypergraph structure introduced in Section 3, we build a comprehensive recommender model and a corresponding training method in Section 4. We present experimental evaluation results of our recommender system which involves a performance comparison with several major existing recommender models in Section 5. The paper is concluded with a summary and a discussion on future research directions in the final section.

2 Related work and preliminaries

In this section, we first present a survey on the recent related work . Then, several major recommendation approaches in the domain are introduced which are mostly relevant to our work and are all based on the technique of matrix factorization.

2.1 Related work

Recommender systems are generally classified into three categories: content-based systems, collaborative filtering-based systems, and hybrid systems [2]. Content-based recommender systems mainly select highly similar items with respect to user preference [14, 36, 40]. Collaborative filtering-based technology is a more efficient approach for recommender systems [1, 7, 34, 39, 47, 51]. Based on the rating profiles of the target user, collaborative filtering-based recommender systems calculate profile similarity between the target user and the existing users, and recommend the items of the interest of the similar existing users to the target user. The hybrid recommender systems are designed to overcome the limitations of single recommendation technology [5, 9, 50]. Such hybrid systems typically combine a collaborative filtering-based approach with a content-based approach in order to produce more accurate recommendation.

In recent years, matrix factorization method is applied to collaborative filtering recommendation, which effectively improves the accuracy of the recommender systems. In [28], Lu et al. demonstrated that mine structural information of the original data can improve efficiency and lessen the interference of noise to some extent based on matrix factorization. So, matrix factorization techniques is employed by collaborative recommender systems, and these methods have become popular in recent years featuring good scalability [13, 17, 24, 29, 39, 42, 43]. In [33], Mnih et al. first presented a probabilistic matrix factorization model without taking into account any social factors. Subsequently, many researchers incorporate social factors such as social trust, interpersonal influence, geographical location, user sentiment and so on into matrix factorization model [6, 25, 30, 31, 47, 48, 53, 54]. These studies show that social factors can improve the accuracy of recommendations. Besides the aforementioned models, there are a body of variant models based on matrix factorization method. A distributed recommendation algorithm was proposed in [46]. This algorithm enjoys a better performance in terms of time and when dealing with the cold start problem. In addition, Liu et al. presented a recommender system which employed the matrix factorization techniques and aided context-aware information [26]. In [48], Yang et al. proposed circle-based recommendation models which is a novel idea based on matrix factorization method. The accuracy of the models mentioned above is impressive when an abundance of rating information is available. But when the users only have few ratings, which is the case for most real-world sparse datasets [21, 37, 44], the performance of these models becomes unsatisfactory. However, these methods metioned above need enough ratings data or text reviews to get the explicit or implicit social information. So they still trapped in the cold start to some extent.

2.2 Compared algorithms

In this section, we will review several typical recommendation techniques based on matrix factorization method.

2.2.1 BaseMF model

To start with, we briefly review the basic low-rank matrix factorization (BaseMF) approach [33], which does not take any social factors into consideration. This method can efficiently handle large datasets. In comparison with the traditional collaborative filtering methods, BaseMF features a better performance. In details, BaseMF model is trained on the observed rating data by minimizing the following objective function:

$$ {\Psi} \left( {R,P,Q} \right) = \frac{1}{2}{\sum\limits_{({u_{i}},{s_{j}}) \in T} {\left( {{R_{ij}} - {{\hat R}_{ij}}} \right)}^{2}} + \frac{\lambda }{2}\left( {\left\| P \right\|_{F}^{2} + \left\| Q \right\|_{F}^{2}} \right) $$
(1)
$$ \hat R = r + P{Q^{T}} $$
(2)

Where T denotes the training dataset, R i j is the real rating values for item s j from user u i in T, P and Q are the user and item latent feature matrices, λ is the regularization constant. ∥P F and ∥Q F are the Frobenius norm of matrices P and Q, respectively, and r denotes the mean rating value of T. This objective function can be minimized efficiently using gradient descent method. Once the low-rank matrices P and Q have been obtained, rating values can be predicted according to (2) for any user-item pair (u i , s j ).

2.2.2 STE model

The STE Model [30] is a linear combination of BaseMF and a social network based approach which taken the social trust into account. The objective function of STE Model is defined as follows.

$$\begin{array}{@{}rcl@{}} {\Psi} \left( {R,P,Q,S} \right) &=& \frac{1}{2}{\sum\limits_{({u_{i}},{s_{j}}) \in T} {\left( {{R_{ij}} - g(\alpha {P_{i}}{Q_{j}^{T}}{{\hat R}_{ij}} + (1 - \alpha )\sum\limits_{v \in {N_{i}}} {{S_{iv}}{P_{i}}{Q_{j}^{T}})} } \right)}^{2}}\\ &&+ \frac{\lambda }{2}\left( {\left\| P \right\|_{F}^{2} + \left\| Q \right\|_{F}^{2}} \right) \end{array} $$
(3)

Where S i v denotes the trust degree of u i from his friends u v and parameter α controls the effects of neighbors on predicted rating. g(x) is the logistic function g(x) = 1/(1 + exp(−x)), which makes it possible to bound the range of \({P_{i}}{Q_{j}^{T}}\) within the range of [0, 1]. Experiments show that STE model outperforms the BaseMF model in terms of accuracy.

2.2.3 SocialMF model

The SocialMF [17] incorporates the mechanism of trust propagation into the model based on the matrix factorization technique. This model outperforms BaseMF and STE [30] in terms of Root Square Mean Error (RSME) and the efficacy in dealing with the cold start problem. The objective function of SocialMF Model is defined as follows.

$$\begin{array}{@{}rcl@{}} {\Psi} \left( {R,P,Q,T} \right) &=& \frac{1}{2}{\sum\limits_{({u_{i}},{s_{j}}) \in T} {\left( {{R_{ij}} - {{\hat R}_{ij}}} \right)}^{2}} + \frac{\lambda }{2}\left( {\left\| P \right\|_{F}^{2} + \left\| Q \right\|_{F}^{2}} \right)\\ &&+ \frac{\alpha }{2}\sum\limits_{T} {\left( {\left( {{P_{i}} - \sum\limits_{{u_{v}} \in {N_{{u_{i}}}}} {{T_{iv}}{P_{v}}} } \right){{\left( {{P_{i}} - \sum\limits_{{u_{v}} \in N_{_{{u_{i}}}}^{}} {{T_{iv}}{P_{v}}} } \right)}^{T}}} \right)} \end{array} $$
(4)

Where T i v denotes the degree of trust of user u i on user u i , T i v is a positive value T i v ∈ [0, 1], and each row of the trust matrix T is normalized to 1. Compared with (1), the factor of trust propagation is enforced by the last term in (4), which means the user latent feature P i should be similar to his/her friends latent feature P v . The trade-off between the ratings and the trust propagation is determined by a weight α ≥ 0. If α = 0, the influence of trust propagation is ignored and this model will degenerate back to BaseMF model as a result.

Equation (4) can be optimized by gradient descent approach as well. Once the model is trained, the latent feature matrix P is constrained by the last term in (4) and the degree of trust T i v enhances the latent feature similarity between user u i and u v . The rating value for any user concerning any item can be predicted according to (2).

2.2.4 CircleCon model

The CircleCon model [48] uses inferred circle which is a trust sub-network derived from the trust relationship in social network, and the model applies each such circle to a single category of items. Therefore, the objective function of the CircleCon model is similar to that of SocialMF, but differs in that the Circlecon model being trained in each category separately. The objective function of the CircleCon model is defined as follows.

$$\begin{array}{@{}rcl@{}} {{\Psi}^{c}}\left( {{R^{c}},{P^{c}},{Q^{c}},{T^{c*}}} \right)\! &=&\! \frac{1}{2}{\sum\limits_{({u_{i}},{s_{j}}) \in T} {\left( {R_{ij}^{c} - \hat R_{ij}^{c}} \right)}^{2}} + \frac{\lambda }{2}\left( {\left\| {{P^{c}}} \right\|_{F}^{2} + \left\| {{Q^{c}}} \right\|_{F}^{2}} \right)\\ &&+ \frac{\alpha }{2}\sum\limits_{T} {\left( {\left( \!{{P_{i}^{c}} \,-\, \sum\limits_{{u_{v}} \in {N_{{u_{i}}}}} {T_{iv}^{c*}{P_{v}^{c}}} } \right){{\left( \! {{P_{i}^{c}} \,-\, \sum\limits_{{u_{v}} \in {N_{{u_{i}}}}} {T_{iv}^{c*}{P_{v}^{c}}} } \right)}^{T}}} \right)} \end{array} $$
(5)

Where c is the category of items related to a inferred circle and \(T_{iv}^{c*}\) denotes the normalized interpersonal trust of u v to u i in category c. The CircleCon model enjoys a higher prediction accuracy compared with the SocialMF model.

2.2.5 ContextMF model

The ContextMF model is proposed in [18], which fused the social context including individual preference and interpersonal influence into the probabilistic matrix factorization method used. Individual preference represents the quantitative value for how strong a user likes different items, and interpersonal influence denotes degree of relationships between the user and the item senders. In the ContextMF model, the individual preference similarity is represented by matrix W and interpersonal influence is denoted by matrix S. The objective function of this model is defined as follows.

$$\begin{array}{@{}rcl@{}} {\Psi} \left( {R,P,Q,{S^{*}},{W^{*}}} \right) &=& \frac{1}{2}{\sum\limits_{({u_{i}},{s_{j}}) \in T} {\left( {{R_{ij}} - {{\hat R}_{ij}}} \right)}^{2}} + \frac{\lambda }{2}\left( {\left\| P \right\|_{F}^{2} + \left\| Q \right\|_{F}^{2}} \right)\\ &&+ \frac{\alpha }{2}\sum\limits_{T} {\left( {\left( {{P_{i}} - \sum\limits_{{u_{v}} \in {N_{{u_{i}}}}} {S_{iv}^{*}{P_{v}}} } \right){{\left( {{P_{i}} - \sum\limits_{{u_{v}} \in {N_{{u_{i}}}}} {S_{iv}^{*}{P_{v}}} } \right)}^{T}}} \right)} \\ &&+ \frac{\beta }{2}{\sum\limits_{T} {\left( {W_{iv}^{*} - {P_{i}}{P_{v}^{T}}} \right)}^{2}} \end{array} $$
(6)

Where \(S_{iv}^{*}\) denotes the normalized interpersonal influence of u v to u i and \(W_{iv}^{*}\) denotes the normalized individual preference similarity between u i and u v . α and β are the tradeoff parameters which represent the strength of factors of individual preference and interpersonal influence.

2.2.6 PRM model

PRM model [47] takes more comprehensive factors into account. More specifically, it combines the advantages of different models including BaseMF model, CircleCon model [48] and ContextMF model [18]. Additionally, three social factors, i.e., personal interest, interpersonal interest similarity and interpersonal influence, are incorporated into the PRM model which is shown in (7).

$$\begin{array}{@{}rcl@{}} {{\Psi}^{c}}\!\left( \! {{R^{c}}\!,{U^{c}}\!,{P^{c}}\!,{S^{c*}}\!,{W^{c*}}\!,{Q^{c*}}} \right)\! &=& \frac{1}{2}{\sum\limits_{u,i} {\left( {R_{u,i}^{c} - \hat R_{u,i}^{c}} \right)}^{2}}+ \frac{\lambda }{2}\left( {\left\| {{U^{c}}} \right\|_{F}^{2} + \left\| {{P^{c}}} \right\|_{F}^{2}} \right) \\ &&+ \frac{\beta }{2}\sum\limits_{u} {\left( {\left( {{U_{u}^{c}} - \sum\limits_{v} {S_{u,v}^{c*}{U_{v}^{c}}} } \right){{\left( {{U_{u}^{c}} - \sum\limits_{v} {S_{u,v}^{c*}{U_{v}^{c}}} } \right)}^{T}}} \right)}\\ &&+ \frac{\gamma }{2}\sum\limits_{u} {\left( {\left( {{U_{u}^{c}} - \sum\limits_{v} {W_{u,v}^{c*}{U_{v}^{c}}} } \right){{\left( {{U_{u}^{c}} - \sum\limits_{v} {W_{u,v}^{c*}{U_{v}^{c}}} } \right)}^{T}}} \right)} \\ &&+ \frac{\eta }{2}{\sum\limits_{u,i} {\left| {H_{u}^{c*}} \right|(Q_{u,i}^{c*} - {U_{u}^{c}}P_{i}^{cT})}^{2}} \end{array} $$
(7)

In PRM model, items are first classified into several categories according to tree structure of items. Then, it uses the interpersonal trust and similarity of interest circle which both have effect on user latent feature matrix. Finally, it takes into account the influence of user personal interest for item latent feature. By doing above, a more sophisticated recommendation can be produced.

2.2.7 Our Hybrid matrix factorization(HMF) model

Our HMF model absorbs the advantage of hybrid recommendation and takes more social factors into consideration. It builds on the structure of hypergraph and consider not only the relation between user but also the relation between items, users preference for items and the contextual information when users are selecting items.

3 Basic concepts and definitions on hypergraph

The key tasks of recommender systems are to establish binary relationship between users and items, and employ user ratings or similarity of users interest to identify potential interesting items for users so as to achieve personalized recommendation. In this paper, we use hypergraph to represent the binary relation between users and items [15]. In the following, we present some basic definitions based on hypergraph topological structure.

Definition 1

Item Centered Hypergraph. Suppose that a social network has m items and n users, the binary relationship between item and user can be expressed as H = (S,E S ), where S = {s 1, s 2,⋯,s m } denotes the set of items and E S = {e s 1, e s 2, ⋯, e s m } denotes hyperedge set and satisfies the condition \(\bigcup \limits _{i = 1}^{m} {e{s_{i}} = U}\), where U = {u 1, u 2,⋯,u n } denotes the set of users. We call this graph as item centered hypergraph. According to the data in Table 1, we have H = ({s 1, s 2, s 3, s 4}, {{u 1}, {u 1, u 2, u 3, u 4, u 5}, {u 4, u 5, u 6, u 7}, {u 6, u 7, u 8}}). The item centered hypergraph is shown in Figure 1.

Table 1 Ratings for item from users
Figure 1
figure 1

Item Centered Hypergraph

Definition 2

User Centered Hypergraph. Suppose that an item centered hypergraph is H = (S,E S ), and its dual graph denoted by H = (U,E U ), where E U = {e u 1, e u 2,⋯,e u n } denotes hyperedge set and meets the condition \(\bigcup \limits _{i = 1}^{n} {e{u_{i}} = S}\). We call this graph as user centered hypergraph. Again, according to the data in Table 1, we have

$${H^ *}=(\{u_{1}, u_{2}, u_{3}, u_{4}, u_{5}, u_{6}, u_{7}, u_{8}\}, \{\{s_{1}, s_{2}\}, \{s_{2}\}, \{s_{2}\}, \{s_{2}, s_{3}\}, \{s_{2}, s_{3}\}, \{s_{3}, s_{4}\}, \{s_{3}, s_{4}\}, \{s_{4}\}\}) $$

The user centered hypergraph is shown in Figure 2.

Figure 2
figure 2

User Centered Hypergraph

Definition 3

Item Feature Space. It is described by a vector \(\overrightarrow {fs} = (f{s_{1}},f{s_{2}}, {\cdots } ,f{s_{{l_{s}}}})\), where l s denotes the dimensionality of \(\overrightarrow {fs}\). The value of f s i (il s ) uses quantitative value according to [24]. The item feature space of item s j can be expressed as \(\overrightarrow {f{s_{j}}} = (f{s_{j1}},f{s_{j2}}, {\cdots } ,f{s_{j{l_{s}}}})\). For example, the Item Feature Space of the movie Avatar can be something like \(\overrightarrow {fs}(Avatar)=(\) science fiction movie, America, 2009, James Cameron, Twentieth Century Fox Film Corporation, Sam Worthington, Zoe Saldana, Oscar ) which represents the item feature elements of film types, production area, publication year, production company, leading actor, leading actress and the award received, respectively.

Definition 4

User Feature Space. It is described by a vector \(\overrightarrow {fu} = (f{u_{1}},f{u_{2}}, {\cdots } ,f{u_{{l_{u}}}})\), where l u denotes the dimensionality of \(\overrightarrow {fu}\). The value of f u i (il u ) uses quantitative value according to [24]. The user feature space of user u i can be expressed as \(\overrightarrow {f{u_{i}}} = (f{u_{i1}},f{u_{i2}}, {\cdots } ,f{u_{i{l_{u}}}})\). For example, user u1 can expressed as \(\overrightarrow {fu}(u1)=(male, 31, China, Engineer, Married)\) which represents the user feature elements of gender, age, nationality, job and marital status, respectively.

Definition 5

Rating Function. Assume that user u i provides rating on item s j , which is denoted by r i j = R a n k(u i , s j ), where r i j R, and R is the rating value set, denoted by R = {1,2,⋯,N U M}. Without losing generality, we set N U M = 5. Figure 3 shows a rating-weighted hypergraph, both of which are based on the data in Table 1.

Figure 3
figure 3

Rating-weighted hypergraph

Definition 6

User Neighbor. If user u i and user u v have rated some identical items, that is e u i e u v in H = (U,E U ) with e u i E U and e u v E U , then u i and u v is a user neighbor of each other. The neighbor of user u i is denoted by \({N_{{u_{i}}}}\). For example, u 1 is a user neighbor of u 2 in Table 1 and vice versa.

Definition 7

Adjacent Item. Assume that the users who have rated item s j are denoted by \({U_{{s_{j}}}}\). Then all the items which have been rated by the users in \({U_{{s_{j}}}}\) is referred to as adjacent items of item s j , which is denoted by \({M_{{s_{j}}}}\). For Instance, the adjacent items of s 3 in Table 1 is {s 2, s 4}.

Definition 8

Rating Contextual Information. It represents those related aspects of information when user u i chooses item s j , which is denoted by a vector \(\overrightarrow C = ({c_{1}},{c_{2}}, \cdots ,{c_{{l_{c}}}})\) with each vector component c t (t = 1,2,⋯,l c ) representing one aspect of contextual information, such as temperature, time, location and so on.

Definition 9

Rating Contribution. The function of rating contribution for item s j from user u i is defined as \({d_{ij}} = Contribution\left ({{u_{i}},{s_{j}}} \right ) = \frac {{Rank\left ({{u_{i}},{s_{j}}} \right )}}{{\sqrt {\sum \limits _{i = 1}^{n} {{{\left ({Rank\left ({{u_{i}},{s_{j}}} \right )} \right )}^{2}}} } }}\). We can employ rating contribution to build a rating contribution weighted hypergraph that is demonstrated in Figure 4 based on the data in Table 1.

Figure 4
figure 4

Rating Contribution Weighted Hypergraph

Definition 10

User Rating Similarity. Assume that the item set I c = {s 1, s 2,⋯ ,s n } which has both been rated by user u i and user u j . Then the user rating similarity between u i and u j is defined as \(Sim\_C\left ({{u_{i}},{u_{j}}} \right ) = \frac {{\sum \limits _{k = 1}^{n} {({d_{ik}} - \overline {{d_{i}}} )({d_{jk}} - \overline {{d_{j}}} )} }}{{\sqrt {\sum \limits _{k = 1}^{n} {{{\left ({{d_{ik}} - \overline {{d_{i}}} } \right )}^{2}}} \sum \limits _{k = 1}^{n} {{{\left ({{d_{jk}} - \overline {{d_{j}}} } \right )}^{2}}} } }}\), where \(\overline {{d_{i}}} = \left ({1/\left | {{I_{c}}} \right |} \right ){\sum }_{{s_{j}} \in {I_{c}}} {{d_{ij}}}\) and the item set which has been rated by user u i is denoted by I c . If I c = , we will assign the value of \(\overline {{d_{i}}}\) as 0. In this paper, we use S C i j to represent S i m_C(u i , u j ).

Definition 11

Item Feature Similarity. The function of item feature similarity between item s i and item s j defined as the following equation using cosine measure:\(Sim\_S\left ({{s_{i}},{s_{j}}} \right ) = \cos \left ({\overrightarrow {f{s_{i}}} ,\overrightarrow {f{s_{j}}} } \right ) = \frac {{\sum \limits _{k = 1}^{{l_{s}}} {f{s_{ik}}f{s_{jk}}} }}{{\sqrt {\sum \limits _{k = 1}^{{l_{s}}} {fs_{ik}^{2}} } \sqrt {\sum \limits _{k = 1}^{{l_{s}}} {fs_{jk}^{2}} } }}\). In this paper, we use S S i j to represent S i m_S(s i , s j ).

Definition 12

User Feature Similarity. The user feature similarity between user u i and user u j is defined as follows using cosine measure: \(Sim\_U\left ({{u_{i}},{u_{j}}} \right ) = \cos \left ({\overrightarrow {f{u_{i}}} ,\overrightarrow {f{u_{j}}} } \right ) = \frac {{\sum \limits _{k = 1}^{{l_{u}}} {f{u_{ik}}f{u_{jk}}} }}{{\sqrt {\sum \limits _{k = 1}^{{l_{u}}} {fu_{ik}^{2}} } \sqrt {\sum \limits _{k = 1}^{{l_{u}}} {fu_{jk}^{2}} } }}\). In this paper, we use S U i j to represent S i m_U(u i , u j ).

4 Our recommendation model

In our proposed model HMF, we employ matrix factorization method to obtain the optimal latent features for users and items using the stochastic gradient decent approach. The objective function used in BaseMF model, as in (1), only involves the item and user latent features, and it does not take into account other factors which are contextual information, relation between users and items, relation between items and relation between users. In this paper, we will combine various related factors, which are about three object entities and four relations shown in Figure 5, to produce hybrid recommendations based on BaseMF to achieve a better performance.

Figure 5
figure 5

Three entities in recommender system, including contextual information, user feature and item feature. Four types relation between the entities which are user-item relation, item-item relation, user-user relation and contextual information-user relation

4.1 Fusing factors into HMF model

The proposed HMF model fuses four factors: contextual information, user rating similarity, item feature and user feature. Firstly, we introduce the approach by adding contextual information. Then we infer the objective function of the proposed HMF model by adding other three factors. In this way, we will represent our sophisticated model step by step in an incremental manner.

4.1.1 Stage 1: Adding contextual information

Contextual information has been proven to be valuable information for building accurate recommender system [26]. We handle contextual information by applying clustering algorithm to partition the original user item-rating matrix such that the ratings with similar contextual information are grouped. The contextual information is typically associated with the process when users are selecting items. The contextual information space in this paper is represented by vector \(C = ({c_{1}},{c_{2}}, {\cdots } ,{c_{{l_{c}}}})\), where each vector component c t is binary taking the value of 0 or 1. For instance, c 1 may denote Weekend, c 2 may denote Daylight, and c 3 may denote Alone. In this way, the contextual information that Bob went to see the movie Avatar with his friends at 9:00 PM on Saturday can be expressed as (1,0,0). In this way, we can express contextual information in a unified form. We can then cluster the contextual information so that the contextual information which has high correlation can be grouped together in one cluster. In this paper, the k-Modes algorithm which has a good performance for discrete data is applied to cluster contextual information.

Recall that in (1), the optimization of the objective function Ψ is carried out based on the entire training dataset. In order to improve the optimization performance of the object function Ψ, we first group the training dataset into k clusters using the k-Modes Algorithm. Then, the object function Ψ can be modified accordingly as follows:

$$ {\Psi} \left( {{R^{{T_{x}}}},{P^{{T_{x}}}},{Q^{{T_{x}}}}} \right) = \frac{1}{2}{\sum\limits_{({u_{i}},{s_{j}}) \in {T_{x}}} {\left( {R_{_{ij}}^{{T_{x}}} - \hat R_{_{ij}}^{{T_{x}}}} \right)}^{2}} + \frac{\lambda }{2}\left( {\left\| {{P^{{T_{x}}}}} \right\|_{F}^{2} + \left\| {{Q^{{T_{x}}}}} \right\|_{F}^{2}} \right) $$
(8)
$$ \hat R_{_{ij}}^{{T_{x}}} = {r^{{T_{x}}}} + {P^{{T_{x}}}}{\left( {{Q^{{T_{x}}}}} \right)^{T}} $$
(9)

Where T x denotes the x th cluster of training dataset. \(R_{_{ij}}^{{T_{x}}}\) and \(\hat R_{_{ij}}^{{T_{x}}}\) represent the actual and predicted rating for item s j from user u i . \({r^{{T_{x}}}}\) denotes the mean value of user rating in the x th cluster of the training dataset.

After partitioning the training data into k clusters, we then use objective function in (8) to get the optimal latent features for users and items iteratively. There are several salient advantages for this approach. Firstly, the high density rating data is gathered into a sub-matrix, whereby the irrelevant rating data or those with low degree of correlation are removed. This can contribute to improving the accuracy of the recommender system. Secondly, through data clustering, the training data is divided into k clusters which can effectively reduce the complexity of matrix operations.

4.1.2 Stage 2: Adding user rating similarity

In real-life scenarios, many users choose to purchase an item or see a movie through friends’ recommendations. Therefore, using friends circle recommendation [41] can improve the accuracy of recommender systems and resolve the limited content problem in content-based recommendation models. The experiment results in [48] show that the circle-based recommendation model outperforms the BaseMF model in terms of RMSE. The difference of preference and interest between friends will affect the user latent feature. In this paper, we extract the user rating similarity under different contextual scenarios to optimize the mode in (8). Using inferred social circles of those friends exhibiting similar taste, we incorporated the user rating similarity into our model.

According to [17], zero mean Gaussian priors are assumed for user and item latent feature vectors: \(p(P|{\sigma _{P}^{2}}) = \prod \limits _{{u_{i}}} {N({P_{i}}|0,{\sigma _{P}^{2}}I)},p(Q|{\sigma _{Q}^{2}}) = \prod \limits _{{u_{i}}} {N({Q_{j}}|0,{\sigma _{Q}^{2}}I)} \) By adding user rating similarity to the user latent feature vectors, we can have the conditional distributions given the user latent features of his neighbors:

$$ p\left( {{P^{{T_{x}}}}|SC_{}^{{T_{x}}},{{\Omega}^{{T_{x}}}}} \right) = \prod\limits_{{u_{i}}} {N\left( P_{i}^{^{{T_{x}}}}|\sum\limits_{{u_{v}} \in N_{_{{u_{i}}}}^{{T_{x}}}} {SC_{iv}^{{T_{x}} * }P_{v}^{{T_{x}}}} ,{{\Omega}^{{T_{x}}}}\right)} $$
(10)

Where \({{\Omega }^{{T_{x}}}}\) denotes the zero-mean spherical Gaussian priors in x th cluster, \(N_{_{{u_{i}}}}^{{T_{x}}}\) denotes user u i neighbors in the x th cluster and \(SC_{iv}^{{T_{x}} * }\) is the normalization of \(SC_{iv}^{{T_{x}}}\).

According to [17], through a Bayesian inference, the posterior probability of the latent feature vectors can be obtained as follows:

$$\begin{array}{@{}rcl@{}} &&p\left( {{P^{{T_{x}}}},{Q^{{T_{x}}}}|{R^{{T_{x}}}},SC_{}^{{T_{k}} * },{{\Omega}^{{T_{x}}}}} \right)\\ &&\propto p\left( {{R^{{T_{x}}}}|{P^{{T_{x}}}},{Q^{{T_{x}}}},{{\Omega}^{{T_{x}}}}} \right)p\left( {{P^{{T_{x}}}}|SC_{}^{{T_{x}} * },{{\Omega}^{{T_{x}}}}} \right)p\left( {{P^{{T_{x}}}}|{{\Omega}^{{T_{x}}}}} \right)p\left( {{Q^{{T_{x}}}}|{{\Omega}^{{T_{x}}}}} \right)\\ &&= {\prod\limits_{{u_{i}}} {\prod\limits_{{s_{j}}} {\left[ {N\left( {R_{_{ij}}^{{T_{x}}}|P_{i}^{{T_{x}}}{{(Q_{j}^{{T_{x}}})}^{T}},{{\Omega}^{{T_{x}}}}} \right)} \right]} }^{I_{ij}^{{T_{x}}}}} \times \prod\limits_{{u_{i}}} {N\left( {P_{_{i}}^{{T_{x}}}|\sum\limits_{{u_{v}} \in N_{_{{u_{i}}}}^{{T_{x}}}} {SC_{iv}^{{T_{x}} * }P_{v}^{{T_{x}}},{{\Omega}^{{T_{x}}}}} } \right)}\\ &&\quad\times \prod\limits_{{u_{i}}} {N\left( {P_{_{i}}^{{T_{x}}}|0,{{\Omega}^{{T_{x}}}}} \right)} \times \prod\limits_{{s_{j}}} {N\left( {Q_{j}^{{T_{x}}}|0,{{\Omega}^{{T_{x}}}}} \right)} \end{array} $$
(11)

where \(I_{ij}^{{T_{x}}}\) is the indicator function that equals to 1 if user u i has rated item s j and equals to 0 otherwise. The training objective function according to (11) is improved as follows:

$$\begin{array}{@{}rcl@{}} {\Psi} \left( {{R^{{T_{x}}}},{P^{{T_{x}}}},{Q^{{T_{x}}}},SC_{iv}^{{T_{x}} * }} \right) &=& \frac{1}{2}{\sum\limits_{({u_{i}},{s_{j}}) \in {T_{x}}} {\left( {R_{_{ij}}^{{T_{x}}} - \hat R_{_{ij}}^{{T_{x}}}} \right)}^{2}} + \frac{\lambda }{2}\left( {\left\| {{P^{{T_{x}}}}} \right\|_{F}^{2} + \left\| {{Q^{^{{T_{x}}}}}} \right\|_{F}^{2}} \right)\\ &&+ \frac{\alpha }{2}\sum\limits_{{T_{x}}} \left( \left( {P_{i}^{{T_{x}}} - \sum\limits_{{u_{v}} \in N_{_{{u_{i}}}}^{{T_{x}}}} {SC_{iv}^{{T_{x}} * }P_{v}^{{T_{x}}}} } \right)\right.\\ &&\qquad\qquad\left.\times{{\left( {P_{i}^{{T_{x}}} - \sum\limits_{{u_{v}} \in N_{_{{u_{i}}}}^{{T_{x}}}} {SC_{iv}^{{T_{x}} * }P_{v}^{{T_{x}}}} } \right)}^{T}} \right) \end{array} $$
(12)

Where α is tradeoff parameter which controls the factor of user rating similarity by the last term in (12).

4.1.3 Stage 3: adding user and item feature similarity

The survey of sociology shows that certain users have common interests [1]. For example, housewives like watching Korean teleplay. Moreover, some kinds of items become more popular at a certain time of the year. For example, people more likely to buy turkey on Thanksgiving Day, and on June 1 the children’s day, childrens movies are popular. Therefore, in the case of contextual information clustering, we supplement the restricting terms for item and user feature similarity on the basis of (12) which can enhance our models practicability.

Similar to (10), we have the conditional distributions given the user and item latent features. Equations (13) and (14) in a graphical form are shown in Figure 6.

$$ p\left( {{P^{{T_{x}}}}|S{U^{{T_{x}}}},{{\Omega}^{{T_{x}}}}} \right) = \prod\limits_{{u_{i}}} {N\left( P_{i}^{^{{T_{x}}}}|\sum\limits_{{u_{v}} \in N_{_{{u_{i}}}}^{{T_{x}}}} {SU_{iv}^{{T_{x}} * }P_{v}^{{T_{x}}}} ,{{\Omega}^{{T_{x}}}}\right)} $$
(13)
$$ p\left( {{Q^{{T_{x}}}}|S{S^{{T_{x}}}},{{\Omega}^{{T_{x}}}}} \right) = \prod\limits_{{s_{j}}} {N(Q_{j}^{^{{T_{x}}}}|\sum\limits_{{s_{v}} \in M_{_{{s_{j}}}}^{{T_{x}}}} {SS_{jv}^{{T_{x}} * }Q_{v}^{{T_{x}}}} ,{{\Omega}^{{T_{x}}}})} $$
(14)
Figure 6
figure 6

Recommendation Schematic Drawings by adding user and item feature similarity. n denotes the total number of users in x th cluster, m denotes the total number of items in x th cluster, \(|{N_{_{{u_{i}}}}^{{T_{x}}}} |\) denotes the total number of neighbors of user u i in x th cluster. \(| {M_{_{{s_{j}}}}^{{T_{x}}}} |\) denotes the total number of adjacent items of s j

Through a Bayesian inference, the posterior probability of the latent feature vectors \({P^{{T_{x}}}}\) and \({Q^{{T_{x}}}}\) can be obtained as follows based on (11).

$$\begin{array}{@{}rcl@{}} &&\!\!\! p\left( {{\!P^{{T_{x}}}}\!,{\!Q^{{T_{x}}}}|{R^{{T_{x}}}}\!,S{C^{{T_{x}} * }}\!,S{U^{{T_{x}} * }}\!,S{S^{{T_{x}} * }}\!,{{\Omega}^{{T_{x}}}}} \right) \!\propto\! p\!\left( {{\!R^{{T_{x}}}}|{P^{{T_{x}}}}\!,{Q^{{T_{x}}}}\!,{{\Omega}^{{T_{x}}}}} \right)\!p\!\left( {{\!P^{{T_{x}}}}|S{C^{{T_{x}} * }},{{\Omega}^{{T_{x}}}}} \right)\\ &&\!\!\!p\left( {{P^{{T_{x}}}}|S{U^{{T_{x}} * }},{{\Omega}^{{T_{x}}}}} \right)p\left( {{Q^{{T_{x}}}}|S{S^{{T_{x}} * }},{{\Omega}^{{T_{x}}}}} \right) p\left( {{P^{{T_{x}}}}|{{\Omega}^{{T_{x}}}}} \right)p\left( {{Q^{{T_{x}}}}|{{\Omega}^{{T_{x}}}}} \right)\\ &&= {\prod\limits_{{u_{i}}} {\prod\limits_{{s_{j}}} {\left[ {N(R_{_{ij}}^{{T_{x}}}|P_{i}^{{T_{x}}}{{(Q_{j}^{{T_{x}}})}^{T}},{{\Omega}^{{T_{x}}}})} \right]} }^{I_{ij}^{{T_{x}}}}} \times \prod\limits_{{u_{i}}} {N(P_{_{i}}^{{T_{x}}}|\sum\limits_{{u_{v}} \in N_{_{{u_{i}}}}^{{T_{x}}}} {SC_{iv}^{{T_{x}} * }P_{v}^{{T_{x}}},{{\Omega}^{{T_{x}}}}} )}\\ &&\quad\times \prod\limits_{{u_{i}}} {N(P_{_{i}}^{{T_{x}}}|\sum\limits_{{u_{v}} \in N_{_{{u_{i}}}}^{{T_{x}}}} {SU_{iv}^{{T_{x}} * }P_{v}^{{T_{x}}},{{\Omega}^{{T_{x}}}}} )} \times \prod\limits_{{s_{j}}} {N(Q_{j}^{{T_{x}}}|\sum\limits_{{s_{j}} \in M_{_{{s_{j}}}}^{{T_{x}}}} {SS_{jv}^{{T_{x}} * }Q_{v}^{{T_{x}}},{{\Omega}^{{T_{x}}}}} )}\\ &&\quad\times \prod\limits_{{u_{i}}} {N(P_{_{i}}^{{T_{x}}}|0,{{\Omega}^{{T_{x}}}})} \times \prod\limits_{{s_{j}}} {N(Q_{j}^{{T_{x}}}|0,{{\Omega}^{{T_{x}}}})} \end{array} $$
(15)

The objective function which supplements restrict term of user feature similarity and item feature similarity to the (12) is shown as follows:

$$\begin{array}{@{}rcl@{}} &&{\Psi} \left( {{R^{{T_{x}}}},{P^{{T_{x}}}},{Q^{{T_{x}}}},S{C^{{T_{x}} * }},S{U^{{T_{x}} * }},S{S^{{T_{x}} * }}} \right)\\ &=& \frac{1}{2}{\sum\limits_{({u_{i}},{s_{j}}) \in {T_{x}}} {\left( {R_{_{ij}}^{{T_{x}}} - \hat R_{_{ij}}^{{T_{x}}}} \right)}^{2}} + \frac{\lambda }{2}\left( {\left\| {{P^{{T_{x}}}}} \right\|_{F}^{2} + \left\| {{Q^{{T_{x}}}}} \right\|_{F}^{2}} \right)\\ &&+ \frac{\alpha }{2}\sum\limits_{{T_{x}}} {\left( {\left( {P_{i}^{{T_{x}}} - \sum\limits_{{u_{v}} \in N_{_{{u_{i}}}}^{{T_{x}}}} {SC_{iv}^{{T_{x}} * }P_{v}^{{T_{x}}}} } \right){{\left( {P_{i}^{{T_{x}}} - \sum\limits_{{u_{v}} \in N_{_{{u_{i}}}}^{{T_{x}}}} {SC_{iv}^{{T_{x}} * }P_{v}^{{T_{x}}}} } \right)}^{T}}} \right)}\\ &&+ \frac{\beta }{2}\sum\limits_{{T_{x}}} {\left( {\left( {P_{i}^{{T_{x}}} - \sum\limits_{{u_{v}} \in N_{_{{u_{i}}}}^{{T_{x}}}} {SU_{iv}^{{T_{x}} * }P_{v}^{{T_{x}}}} } \right){{\left( {P_{i}^{{T_{x}}} - \sum\limits_{{u_{v}} \in N_{_{{u_{i}}}}^{{T_{x}}}} {SU_{iv}^{{T_{x}} * }P_{v}^{{T_{x}}}} } \right)}^{T}}} \right)}\\ &&+ \frac{\gamma }{2}\sum\limits_{{T_{x}}} {\left( {\left( {Q_{j}^{{T_{x}}} - \sum\limits_{{s_{v}} \in M_{_{{s_{j}}}}^{{T_{x}}}} {SS_{jv}^{{T_{x}} * }Q_{v}^{{T_{x}}}} } \right){{\left( {Q_{j}^{{T_{x}}} - \sum\limits_{{s_{v}} \in M_{_{{s_{j}}}}^{{T_{x}}}} {SS_{jv}^{{T_{x}} * }Q_{v}^{{T_{x}}}} } \right)}^{T}}} \right)} \end{array} $$
(16)

Where β and γ are tradeoff parameters which control the factor of user and item feature similarity respectively.

4.2 Model training

After the rating data is divided into k subsets based on the clustering result of k-Modes algorithm on the contextual information, the model in (16) is used to train \({P^{{T_{x}}}}\) and \({Q^{{T_{x}}}}\) in each cluster separately. \({P^{{T_{x}}}}\) and \({Q^{{T_{x}}}}\) are considered as variables and gradient decent method can be used to obtain the optimal latent feature vectors. An improved variable step gradient descent algorithm is proposed which is more efficient than PRM [47].

$$\begin{array}{@{}rcl@{}} \frac{{\partial \psi }}{{\partial P_{i}^{{T_{x}}}}} &=& \sum\limits_{({u_{i}},{s_{j}}) \in {T_{x}}} {\left( {\hat R_{_{ij}}^{{T_{x}}} - R_{_{ij}}^{{T_{x}}}} \right)} Q_{j}^{{T_{x}}} + \lambda P_{i}^{{T_{x}}}+ \alpha \left( {P_{i}^{{T_{x}}} - \sum\limits_{{u_{v}} \in N_{_{{u_{i}}}}^{{T_{x}}}} {SC_{iv}^{{T_{x}} * }P_{v}^{{T_{x}}}} } \right) \\ &&- \alpha \sum\limits_{{u_{v}}:{u_{i}} \in N_{_{{u_{v}}}}^{{T_{x}}}}^{} {SC_{vi}^{{T_{x}} * }\left( {P_{v}^{{T_{x}}} - \sum\limits_{{u_{w}} \in N_{_{{u_{v}}}}^{{T_{x}}}} {SC_{vw}^{{T_{x}} * }P_{w}^{{T_{x}}}} } \right)}\\ &&+ \beta \left( {P_{i}^{{T_{x}}} - \sum\limits_{{u_{v}} \in N_{_{{u_{i}}}}^{{T_{x}}}} {SU_{iv}^{{T_{x}} * }P_{v}^{{T_{x}}}} } \right) \\ &&- \beta \sum\limits_{{u_{v}}:{u_{i}} \in N_{_{{u_{v}}}}^{{T_{x}}}}^{} {SU_{vi}^{{T_{x}} * }\left( {P_{v}^{{T_{x}}} - \sum\limits_{{u_{w}} \in N_{_{{u_{v}}}}^{{T_{x}}}} {SU_{vw}^{{T_{x}} * }P_{w}^{{T_{x}}}} } \right)} \end{array} $$
(17)
$$\begin{array}{@{}rcl@{}} \frac{{\partial \psi }}{{\partial Q_{j}^{{T_{x}}}}} &=& \sum\limits_{({u_{i}},{s_{j}}) \in {T_{x}}} {\left( {\hat R_{_{ij}}^{{T_{x}}} - R_{_{ij}}^{{T_{x}}}} \right)} p_{i}^{{T_{x}}} + \lambda Q_{j}^{{T_{x}}}+ \gamma \left( {Q_{j}^{{T_{x}}} - \sum\limits_{{s_{v}} \in M_{_{{s_{j}}}}^{{T_{x}}}} {SS_{jv}^{{T_{x}} * }Q_{v}^{{T_{x}}}} } \right) \\ &&- \gamma \sum\limits_{{s_{v}}:{s_{j}} \in M_{_{{s_{v}}}}^{{T_{x}}}}^{} {SS_{vj}^{{T_{x}} * }\left( {Q_{v}^{{T_{x}}} - \sum\limits_{{s_{w}} \in M_{_{{s_{v}}}}^{{T_{x}}}} {SS_{vw}^{{T_{x}} * }Q_{w}^{{T_{x}}}} } \right)} \end{array} $$
(18)
figure a

Before running the optimization algorithm, we first set the values of \({P^{{T_{x}}}}\) and \({Q^{{T_{x}}}}\) as 0. According to (17) and (18), the user and item latent feature vectors \({P^{{T_{x}}}}\) and \({Q^{{T_{x}}}}\) are updated as follows as in (19) and (20) based on previous values to ensure the fastest improvement of the objective function by each iteration:

$$ P_{i}^{{T_{x}}}(t + 1) = P_{i}^{{T_{x}}}(t) - l\frac{{\partial \psi }}{{\partial P_{i}^{{T_{x}}}}}(t) $$
(19)
$$ Q_{j}^{{T_{x}}}(t + 1) = Q_{j}^{{T_{x}}}(t) - l\frac{{\partial \psi }}{{\partial Q_{j}^{{T_{x}}}}}(t) $$
(20)

Where l is the learning rate. Our improved variable step gradient descent algorithm is presented in Algorithm 1. In the algorithm, we set a chaos variable μ which has micro different value in each iteration, making it possible for the objective function to converge steadily.

The main computational overhead of the gradient methods lies in the evaluation of the objective function Ψ and its gradients against variables. Because of the sparsity of matrices \({{\mathrm {R}}^{{T_{x}}}}\), \({\text {SC}^{{T_{x}}}}\), \({\text {SU}^{{T_{x}}}}\) and \({\text {SS}^{{T_{x}}}}\), the computational complexity of evaluating the object function Ψ is \(O({l_{n}}{\omega _{{R^{{T_{x}}}}}} + {l_{n}}{\omega _{S{C^{{T_{x}}}}}} + {l_{n}}{\omega _{S{U^{{T_{x}}}}}} + {l_{n}}{\omega _{S{S^{{T_{x}}}}}})\), where \({\omega _{{R^{{T_{x}}}}}}\), \({\omega _{SC^{{T_{x}}}}}\), \({\omega _{SU^{{T_{x}}}}}\) and \({\omega _{SS^{{T_{x}}}}}\) are the numbers of nonzero entries in matrices \({{\mathrm {R}}^{{T_{x}}}}\), \({\text {SC}^{{T_{x}}}}\), \({\text {SU}^{{T_{x}}}}\) and \({\text {SS}^{{T_{x}}}}\), respectively, and l n is the number of latent factors. The computational complexity for gradients \(\frac {{\partial \psi }}{{\partial P_{i}^{{T_{x}}}}}\), \(\frac {{\partial \psi }}{{\partial Q_{i}^{{T_{x}}}}}\) in (17), (18) are \(O({l_{n}}{\omega _{{R^{{T_{x}}}}}} + {l_{n}}{\omega _{S{C^{{T_{x}}}}}} + {l_{n}}{\omega _{S{U^{{T_{x}}}}}})\), \(O({l_{n}}{\omega _{{R^{{T_{x}}}}}} + {l_{n}}{\omega _{S{S^{{T_{x}}}}}})\), respectively. Thus, for each iteration, the total computational complexity is \(O({l_{n}}{\omega _{{R^{{T_{x}}}}}} + {l_{n}}{\omega _{S{C^{{T_{x}}}}}} + {l_{n}}{\omega _{S{U^{{T_{x}}}}}} + {l_{n}}{\omega _{S{S^{{T_{x}}}}}})\), which indicates that the computational time of our method is linear with respect to the number of observations in the four sparse matrices. In addition, we divide the training data into groups by clustering, which further helps improve the efficiency of our method. This complexity analysis shows that our proposed approach is very efficient and can well scale to very large datasets.

5 Experimental evaluation

In this section, we will report the extensive experiments that we have conducted to evaluate the performance of our proposed recommender system. We will start with a discussion on the experimental setup of our evaluation, which will cover the information about datasets, performance measurements, the recommendation algorithms to be used for comparative study and the parameter setting. The detailed experimental results will be reported in the second half of this section.

5.1 Experimental setup

5.1.1 Datasets

Three commonly used real-life datasets are used to validate our model and perform comparative study with existing models in our work. These datasets are MovieLens,Footnote 1 EpinionsFootnote 2 and DoubanFootnote 3 which are chosen due to the use of item feature, user feature and contextual information in our model. The Moivelens dataset includes 3,900 moives, 6,040 users and 1,000,209 ratings. Epinions is a consumer opinion website where users can review items and also assign them numeric ratings in the range of 1 to 5. We use the version of the Epinions dataset published by the authors of [55]. This dataset is made up of ratings from 75,888 users who rated a total of 149,943 different items. The total number of ratings is 598,329, it includes 25 categories and 240 subcategories. In our work, we selected 5 representative categories in the Epinions dataset and experimented based on them. Douban is one of the most popular social networks in China. It includes several parts: reading, movie, music, city circle and so on. In Douban, users can rate the book which they have read and share the reviews to their friends. We crawled nearly 30,000 users and 400,000 items from July 2012 to January 2014. The details of these three datasets are shown in Tables 23 and 4.

Table 2 MovieLens dataset
Table 3 Epinions dataset
Table 4 Douban dataset

5.1.2 Performance Measures

In our work, we use 70% of the data as the training dataset and the remaining 30% as the test dataset in the above three datasets respectively. The performance measures we use in our experiments are Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) which are the most popular accuracy measures. RMSE and MAE are defined as follows respectively.

$$ RMSE = \sqrt {\frac{{\sum\limits_{\left( {{u_{i}},{s_{j}}} \right) \in {T_{test}}} {{{\left( {{R_{ij}} - {{\hat R}_{ij}}} \right)}^{2}}} }}{{\left| {{T_{test}}} \right|}}} $$
(21)
$$ MAE = \frac{{\sum\limits_{\left( {{u_{i}},{s_{j}}} \right) \in {T_{test}}} {\left| {{R_{ij}} - {{\hat R}_{ij}}} \right|} }}{{\left| {{T_{test}}} \right|}} $$
(22)

Where T t e s t is the training dataset, R i j is the real rating of item from user, and \({\hat R_{ij}}\) is the corresponding predicted rating.

5.1.3 Competitive algorithms for comparison

The following recommendation models are involved in our experimental evaluation for performance comparative study.

  1. (1)

    BaseMF. This model uses basic matrix factorization techniques without considering any social factors.

  2. (2)

    SocialMF. This mode improves the recommenda-tion accuracy compared with the BaseMF as it makes a use of the factor of social trust as the social links in the dataset.

  3. (3)

    PRM. This model integrates the user interest, interpersonal interest similarity and interpersonal influence into the model. According to the properties of items, the dataset was divided into subcategories to improve the recommendation accuracy of the model.

  4. (4)

    HMF. Our model considers four types relation which are user-item relation, item-item relation, user-user relation and contextual information-user relation based on the topological structure of hypergraph. The model classifies the training dataset into several groups based on the contextual information which helps enhance the intrinsic link among features in the latent space, and three important factors (user feature, item feature and users ratings) are incorporated into the model.

5.1.4 Parameter settings

Table 5 presents the details about the parameters used in all methods including their meanings and the default values.

Table 5 Description and default value of parameters

5.2 Experiments results

5.2.1 Recommendation accuracy

In this section, we first evaluate the recommendation accuracy of different models in terms of both RMSE and MAE. Then we compare their classification accuracy based on precision, recall and F-measure. We conduct a series of experiments on MovieLens, Epininons and Douban under the default experimental settings set forth in Table 6. The results of experiments are shown in Tables 67 and 8. The percentage numbers in each cell of the tables are the relative improvements of the HMF model over the other comparative models. We can see from the results that the accuracy of our model is better than PRM and outperforms BaseMF and SocialMF by a larger margin. More specifically, as shown in Table 6, our model increases recommendation accuracy by 17.2%, 6.0% and 3.2% in terms of RMSE, and by 16.8%, 7.1% and 1.3% in terms of MAE over BaseMF, SocialMF and PRM, respectively. Our model achieves an even better accuracy performance when dealing with Epinions and Douban datasets , as shown in Tables 7 and 8. This is because that the contextual information was taken into account in our model and the training dataset was divided into 8 groups using k-Modes clustering algorithm. The MovieLens dataset doesn’t have the contextual information, so the k-Modes clustering algorithm is not applicable on this dataset.

Table 6 Performance comparison on the movielens dataset
Table 7 Performance comparison on the epinions dataset
Table 8 Performance comparison on the douban dataset

In order to further test the overall performance of the HMF model, we utilize three additional measures including precision, recall, and their harmonic mean F1 for experimental evaluation. For the conduct of this experiment, we must transform the item ratings into a binary scale (i.e., relevant and non-relevant) by converting each rating of four or five to relevant and all ratings from one to three to not-relevant. Thus, the precision is the ratio of recommended items (i.e., the items whose predicted ratings are larger than three) that are relevant. The recall is the fraction of relevant items that are recommended. Finally, the F1 score is the harmonic mean of the precision and recall. We use MovieLens dataset and set numbers of recommendation range from 2 to 30. From Figure 7a and b, we can see that our model is superior to other models in terms of precision, recall and F1.

Figure 7
figure 7

Results on dataset (MovieLens) in terms of precision/recall and F1

5.2.2 Performance on sparse datasets

In order to analyze the performance of various models when dealing with sparse datasets, we take out some rating data from the datasets randomly to generate new datasets with desired sparsity for the experiments, and evaluate the recommendation accuracy of each model against RMSE and MAE in different scenarios. To ensure the fairness of the experiment, we only conduct this experiment using MovieLens with default parameter settings for different models. Figure 8a and b present the experiment results of different models in terms of MAE and RMSE under 7 different cases of data sparsity, which are 1E-2, 5E-3, 1E-3, 5E-4, 1E-4, 5E-5, 1E-5. For instance, 1E-2 denotes 1 × 10−2. X coordinates represents sparsity which is defined as: S p a r s i t y = N r /(N u × N s ), where N r denotes the number of ratings, N u and N s denote the number of users and items, respectively. We can see that our model is superior to other models compared from 1E-2 to 1E-5 in terms of both MAE and RMSE. Because HMF model uses both user and item feature whilst the competitive models do not use, it demonstrates the user feature and item feature are effectively integrated into our model.

Figure 8
figure 8

Results on sparse dataset (MovieLens) in terms of MAE and RMSE

5.2.3 Sensitivity study

In this subsection, we conducted experiments to test the sensitivity of our model with regard to various parameters to understand the effect posed by them on the recommendation accuracy of our model.

  • (a) The effect of the number of iterations

    In our recommendation model, we use the improved variable step gradient descent algorithm (IVGDA) to do experiment to get the optimal user and item latent feature vectors. Because PRM use the MovieLens dataset, in order to compare with the other algorithm fairly, in this experiment, the MovieLens is used and we set m = 2, μ = 0.5 and ε = 0.001 in Algorithm 1. We compare the performance of IVGDA with the gradient descend algorithm (GDA) used by PRM Model [47]. From Figure 9a and b, we can observe that both MAE and RMSE decrease steadily when the number of iterations increases for the two algorithms, but IVGDA features a noticeably better convergence than GDA.

  • (b) The effect of cluster number ( k )

    In order to evaluate the effect of number of clusters on recommendation accuracy, we conduct experiments on Epinions and Douban datasets, because these two datasets have the contextual information, but MovieLens does not. We set the value of k from 1 to 32, and set other parameters as their respective default value. As shown in Figure 10a and b for Epinions dataset, we see that subcategories such as software and toys have a lower predicted error when k is set to relatively small values, and the value of MAE and RMSE grow quickly when k is bigger than 8. In Figure 11a and b for Douban dataset, we can see that the predicted error reaches the minimum when the value of k gets closer to 16. In fact, the rating count in Douban is 10 times larger than that in Epinions, so we can conclude that k should be generally set to a larger value when the rating count in the dataset is higher. However, the prediction accuracy will decrease when k is greater than a threshold. Experimentally, when k value is 8 or 16, the HMF has a better recommendation accuracy in our experimental datasets.

  • (c) The effect of the weight coefficients in our model

    In this section, we conduct a series of experiments on the Douban dataset to test the impact of the different weight coefficients used in our model on its prediction error. Specifically, we evaluate the effect of the parameters λ, α, β and γ. The parameters α, β and γ control the influence of the user rating similarity, the user feature similarity and the item feature similarity on the prediction error, respectively, while λ, the regularization constant, adjusts the strengths of the different terms in the objective function (16). Larger values of the weight coefficients in the objective function (16) indicate a stronger impact of the respective factor on the user and item latent features. In order to analyze each parameter individually, we vary only the target parameter each time while keeping the values of the others fixed.
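
Algorithm 1 itself is not reproduced in this section, but the roles of m, μ and ε in item (a) above are consistent with a classic variable-step ("bold driver") scheme: enlarge the step by a factor m after a successful step, shrink it by a factor μ after an overshoot, and stop once the loss improvement falls below ε. The sketch below follows that reading, which is an assumption on our part; all names are illustrative.

```python
import numpy as np

def ivgda(loss, grad, theta0, eta=0.01, m=2.0, mu=0.5, eps=1e-3, max_iter=1000):
    """Schematic variable-step gradient descent (assumed 'bold driver' style).

    loss/grad map a parameter vector to the objective value / its gradient;
    theta0 is the initial latent feature vector, eta the initial step size.
    """
    theta = np.asarray(theta0, dtype=float)
    prev, step = loss(theta), eta
    for _ in range(max_iter):
        cand = theta - step * grad(theta)
        cur = loss(cand)
        if cur < prev:            # successful step: accept it and speed up
            theta = cand
            improved = prev - cur
            prev, step = cur, step * m
            if improved < eps:    # improvement below tolerance: converged
                break
        else:                     # overshoot: reject the step and slow down
            step *= mu
    return theta
```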

Figure 9 The effect of the number of iterations (MovieLens)

Figure 10 Effect of the number of clusters on MAE and RMSE (Epinions)

Figure 11 Effect of the number of clusters on MAE and RMSE (Douban)

In Figure 12, we set λ from 0.01 to 20, and the other parameters to their default values. As shown in Figure 12a and b, our model achieves its best performance on Douban when λ ranges from 0.05 to 1, and the prediction error increases when λ is set to a value that is too small or too large.

Figure 12 Effect of λ on the prediction error (Douban dataset)

As shown in Figure 13a–f, the accuracy of our model reaches its highest level on Douban when α = 5, β ranges from 0.5 to 2 and γ falls in the range of (1, 5), which provides good guidance for setting their values in practice.

Figure 13 Effect of the tradeoff parameters α, β and γ on the prediction error (Douban dataset)

To ensure the fairness of our experiments, we set λ = 0.1, the same configuration as in the competing models, while α, β and γ are set to 1 as empirical values following previous work. In order to obtain a better combination of parameters, we select candidate values for each parameter from its individually optimal interval and combine them. The experimental results of 16 representative combinations are presented in Table 9.
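
A straightforward way to enumerate such representative combinations is a small grid search over each parameter's individually good interval. The sketch below uses two illustrative candidate values per parameter, yielding 2 × 2 × 2 × 2 = 16 combinations; the candidate values and the `train_and_evaluate` routine are hypothetical stand-ins for training HMF and measuring MAE/RMSE on the held-out data.

```python
from itertools import product

# Two illustrative candidates per parameter, drawn from the individually
# good intervals reported above; 16 combinations in total.
lambdas = [0.1, 0.5]
alphas = [1, 5]
betas = [0.5, 2]
gammas = [1, 5]

best = None
for lam, a, b, g in product(lambdas, alphas, betas, gammas):
    mae, rmse = train_and_evaluate(lam, a, b, g)  # hypothetical HMF training run
    if best is None or rmse < best[0]:
        best = (rmse, mae, (lam, a, b, g))
print("best (RMSE, MAE, (lambda, alpha, beta, gamma)):", best)
```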

Table 9 Parameter combination performance on the Douban dataset

Besides evaluating the effect of each weight coefficient individually on our recommendation model, we also explore different combinations of them to see how the different combinations of factors (user rating similarity, user feature similarity and item feature similarity) affect the performance of our model. As shown in Figure 14, NULL denotes the model in which α, β and γ are all zero (i.e., the model reduces to BaseMF). UF denotes the model using only the user feature similarity (α = 0, β = 1, γ = 0), IF denotes the model using only the item feature similarity (α = 0, β = 0, γ = 1), and UI denotes the model using only the user rating similarity (α = 1, β = 0, γ = 0). UF+IF, UF+UI and IF+UI correspond to the cases of using the respective combinations of two factors, and finally ALL is the case where all three factors are considered; an explicit mapping of these settings is sketched below, after Figure 14. Furthermore, two experiments were conducted, one under the normal and one under the cold start circumstance. The results are shown in Figure 14. The settings under the cold start scenario are the same as those in Section 5.2.4, where the number of a target user's ratings is no more than 5. When each factor is used in the model individually, the UI setting achieves the highest prediction accuracy, which suggests that the user rating similarity is the most important factor compared to the others. From these experiments, we also obtained an interesting finding: the gap in prediction accuracy between the various factors becomes larger as the dataset becomes increasingly sparse. Figure 14 clearly demonstrates the benefits of using the three different factors in combination, which apparently helps improve the recommendation accuracy of our model in both scenarios.

Figure 14 Effect of different combinations of the tradeoff parameters on MAE and RMSE (Douban Movies dataset). Panels a and b are obtained under the normal circumstance; panels c and d under the cold start circumstance
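
For clarity, the ablation settings above can be written out explicitly as (α, β, γ) triples. The pairwise and ALL settings assume each active factor keeps its single-factor weight of 1, which matches the single-factor cases given in the text but is otherwise our assumption.

```python
# Assumed (alpha, beta, gamma) settings for each ablation case in Figure 14
ablations = {
    "NULL":  (0, 0, 0),  # all factors off: reduces to BaseMF
    "UF":    (0, 1, 0),  # user feature similarity only
    "IF":    (0, 0, 1),  # item feature similarity only
    "UI":    (1, 0, 0),  # user rating similarity only
    "UF+IF": (0, 1, 1),
    "UF+UI": (1, 1, 0),
    "IF+UI": (1, 0, 1),
    "ALL":   (1, 1, 1),  # all three factors combined
}
```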

5.2.4 Performance on cold start problem

Some users in a social network provide a lot of ratings, but most users provide only a very limited number of ratings [24, 38]. In this paper, we define the users who have provided no more than 5 ratings as cold start users [16]. We conduct an experiment on the cold start users using the Douban dataset, and the comparative results for the different models are shown in Table 10. Our model, by considering both user and item features, is able to decrease the recommendation error by more than 30%, 10% and 6% in RMSE, and by more than 30%, 8% and 2% in MAE, against BaseMF, SocialMF and PRM, respectively.
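
The cold start subset can be extracted with a simple count-based filter; a minimal sketch, assuming the ratings are stored as (user, item, rating) triples.

```python
from collections import Counter

def cold_start_users(ratings, max_ratings=5):
    """Return the users with no more than max_ratings ratings,
    i.e., the cold start users as defined in this paper."""
    counts = Counter(user for user, _item, _rating in ratings)
    return {u for u, c in counts.items() if c <= max_ratings}
```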

Table 10 Cold start performance comparison on Douban dataset

6 Conclusion and future work

With the advent of online social networks, exploiting the information hidden in social networks to predict the behavior of users has become very valuable, and recommender systems have become very useful tools for mining knowledge from social networks to improve recommendation accuracy. In this paper, we presented a hybrid recommender system based on the hypergraph topological structure in social networks, in which four types of relations, namely the user-item relation, the item-item relation, the user-user relation and the contextual information-user relation, are taken into account to improve recommendation accuracy. We also conducted extensive experiments on three large real-life datasets, and the experimental results show that our model achieves higher recommendation accuracy than the existing major recommendation models. Although it has advantages in recommendation effectiveness, our model still has some limitations, and there is room for further improvement in several aspects. Firstly, our HMF model requires access to more information in social networks, such as user features, item features and contextual information. Secondly, our HMF model incurs an increased computational complexity when the user feature similarity and the item feature similarity are calculated.

There are several interesting directions to explore in our future work. We want to extend the model to handle recommendation across multiple social networks, as it is not uncommon for users to register in several different social networks. As presented in [45], shilling attacks are a great threat to recommender systems, so attack detection is another important direction for our next work. Furthermore, since user feature and contextual information will be used and exploited, the issues of privacy preservation and protection in recommendation should be considered and further investigated.