1 Introduction

Newly registered users usually have few or no purchase and comment records; such users are known as cold-start users. Social relationships among users are not static: as time passes, a cold-start user may gain new friends or lose existing ones. Recommendations for cold-start users that are based on outdated social relationships are therefore inaccurate. To give accurate cold-start recommendations in a timely manner, many researchers have studied real-time recommendation. For example, some researchers use a time function to describe the negative correlation between elapsed time and the trend of user interest. However, these efforts consider only the cold-start user's own state and ignore that other users may also change their social relationships and properties. Another issue is that the large scale of social networks can lead to high time complexity when updating the similar users of a cold-start user.

Cold-start recommendation is one of the core technologies for many network applications, and it has been studied extensively. Because cold-start users have little historical information, research on cold-start recommendation mainly focuses on mining and expanding the information of cold-start users and on finding users similar to them [1, 2]. Existing research can be categorized into rule-based recommendation, content-based recommendation, collaborative filtering based recommendation, and trust based recommendation.

  (1) Rule-based recommendation methods extract user characteristics to build rules and give recommendations that follow those rules. Their effectiveness depends on accurate characteristic extraction and appropriate rules. For example, Ren selected suitable candidates and user consumption patterns to build rules for recommendation [3]. Ling extracted characteristics of cold-start users, grouped users by these characteristics, and gave recommendations based on the resulting clusters [4]. Rule-based methods give clear recommendations; however, they require a lot of user information to build appropriate rules, which makes them unsuitable for cold-start recommendation.

  (2) Content-based recommendation methods extract keywords from commodities and from users' historical information, and give recommendations according to the similarity of these keywords [5]. The main steps are: extract keywords, find similar users, and give recommendations according to the similar users' records. Their effectiveness depends on accurate keyword extraction. For example, Cui proposed TCAM, a latent class statistical mixture model that can give recommendations in real time [6]. Content-based methods also need a lot of historical records, and their recommendations often duplicate items already present in users' historical records.

  (3) Collaborative filtering based recommendation methods give recommendations according to the similarity among users or the similarity between users and items [7]. They can be categorized into user-based and item-based collaborative filtering. User-based methods normalize users' ratings of the same items and compare them to measure similarity between users. Item-based methods calculate the similarity of items and predict item ratings from the user-item matrix. The effectiveness of collaborative filtering depends on how sparse the user-item matrix is, and it is difficult to determine similar users for a cold-start user from a sparse user-item matrix. To address this issue, Jamali proposed a method using Non-Negative Matrix Factorization (NMF), which decomposes the original user-item matrix into two low-rank matrices whose product approximates the original matrix [8]. Ma proposed a recommendation method based on probabilistic matrix factorization (PMF), which assumes that the deviation between actual ratings and estimated ratings follows a normal distribution [9]. Wu proposed a recommendation method that incorporates a user preference factor to improve recommendation accuracy [10]. To handle user interest drift, Koren proposed a time-sensitive collaborative filtering method that distinguishes transient effects from long-term patterns of user interest [11]. Gu proposed a collaborative filtering method based on a balanced score prediction mechanism, which can adjust the dynamic weight and the overall weight [12]. Collaborative filtering based methods avoid the adverse effects of incomplete or inaccurate characteristic extraction, but still need plenty of historical user information.

  (4) Trust based recommendation methods assume that trust is a stable social relationship. Researchers have done a lot of work on building trust relationships between users, including trust propagation strategies [13, 14], verifying the small-world feature of trust networks [15], and building trust networks based on the small-world feature and weak relationships among users [16]. Wang proposed a trust based recommendation method that uses a game between relative features and absolute features during trust propagation [17]. Guha proposed a trust based recommendation method that builds trust relationships among users from a few expressed trust or distrust statements [18]. Sun proposed a trust based recommendation method that corrects the deviation of recommendations according to Bayesian decision theory [19]. Kang proposed a trust based recommendation method that analyzes excessive trust propagation with a dynamic social network model [20]. However, existing research ignores that trust is only one kind of relationship in a social network, and trust alone cannot comprehensively describe users' characteristics for recommendation.

With the increasing popularity of the internet and social network services, there are various types of social relationships among users. Each kind of social relationship reflects a characteristic of a user, so users' preferences can be predicted from these relationships. Cold-start users may already have social relationships even though they lack purchase and comment records. It is therefore feasible to find similar users for a cold-start user according to social relationships, and to give recommendations based on the similar users' purchase and comment records. However, the social relationships of the cold-start user and of other users may change as time passes. To give accurate and timely recommendations for cold-start users, we propose an incremental graph pattern matching based dynamic cold-start recommendation method (IGPMDCR), which updates the similar users of a cold-start user based on the topology of the social network and gives recommendations based on the latest similar users' records.

2 Dynamic Cold-Start Recommendation Using Incremental Graph Pattern Matching

To give accurate and timely recommendations for a cold-start user, IGPMDCR continuously updates the similar users of the cold-start user and gives recommendations based on the records of the most recently updated similar users. The main steps of IGPMDCR are: (1) define the topology of the social network, (2) find similar users for the cold-start user based on this topology, (3) update the similar users by incremental graph pattern matching, (4) give recommendations according to the latest similar users' records.

2.1 Define Topology of Social Network

IGPMDCR updates similar users for a cold-start user based on the topology of the social network. This topology is a directed graph G = (V, E), where (1) V is a finite set of nodes, each of which denotes a user; (2) E ⊆ V × V, in which (u, u′) denotes an edge from node u to node u′. Nodes denote users and edges denote relationships among users. Each edge has two attributes: the type of social relationship and the degree of tightness between the two users. f_e(u, u′) denotes the distance between u and u′ in a social relationship; a shorter distance means a tighter relationship and a longer distance means a looser one. f_v(u, u′) denotes the type of social relationship. We regard the topology of the cold-start user's social relationships as a pattern, and use graph pattern matching to find users in the directed graph G who match this pattern. We argue that similar users have a topology of social relationships similar to that of the cold-start user.
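As a concrete illustration of this definition, the following minimal Python sketch stores G = (V, E) as an adjacency map in which every edge carries a relationship type and a distance. The class name `SocialGraph`, the method names, and the sample users and relationship types are assumptions made for illustration only; they do not come from the paper.

```python
from collections import defaultdict


class SocialGraph:
    """Directed graph G = (V, E) whose edges carry a type (f_v) and a distance (f_e)."""

    def __init__(self):
        self.nodes = set()
        # adjacency map: source user -> {target user: (relationship type, distance)}
        self.adj = defaultdict(dict)

    def add_edge(self, u, v, rel_type, distance):
        """Add the directed edge (u, v) with its two attributes."""
        self.nodes.update((u, v))
        self.adj[u][v] = (rel_type, distance)

    def f_v(self, u, v):
        """Type of the social relationship on edge (u, v)."""
        return self.adj[u][v][0]

    def f_e(self, u, v):
        """Distance on edge (u, v); smaller means a tighter relationship."""
        return self.adj[u][v][1]


# Hypothetical sample data.
g = SocialGraph()
g.add_edge("alice", "bob", "trust", 1)
g.add_edge("bob", "carol", "interest_group", 2)
```

An adjacency map keeps neighbor lookups cheap, which matters because the matching steps described later repeatedly explore paths that start at a given user.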

2.2 Find Similar Users Based on Topology of Social Network

Similar users of a cold-start user are found according to the topology of social relationships. We propose a bounded graph simulation match algorithm for this purpose.

In the bounded graph simulation match algorithm, the cold-start user's topology of social relationships is regarded as the pattern, and this pattern is matched against the topology of the social network. Before illustrating the algorithm, we first present the notation it uses. Let u denote the cold-start user and u′ denote a user who has a relationship with u; x, y, z, v, v_1 and v′ denote nodes in G. f_v(u, u′) denotes the type of social relationship between users u and u′. f_e(u, u′) denotes the distance between users u and u′; its value is either an integer k or ∞. V_C denotes the nodes in the pattern, E_C denotes the edges in the pattern, and V denotes all nodes in the social network. The matching pattern is defined as Q = (V_C, E_C, f_v, f_e). Let \( S \subseteq V_{C} \times V \). A node v matches u if: (a) the attribute of v is similar to the attribute of u; (b) for (u, u′), there exists a non-empty path v/…/v′ whose length is no more than f_e(u, u′), and \( (u,v) \in S \); (c) for (u, u′), there exists a path v/…/v′ with node order (v, v_1, …, v_n, v′) such that f_v(v, v_1) … f_v(v_n, v′) satisfy f_v(u, u′). The bounded graph simulation match algorithm is shown as Algorithm 1.
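Condition (b) can be checked with a depth-limited breadth-first search. The sketch below is an illustration, not the paper's implementation: it assumes that path length is counted in edges, that the graph is given as a plain adjacency dictionary, and it uses the hypothetical function name `within_bound`. A bound of ∞ is passed as `math.inf`.

```python
from collections import deque
import math


def within_bound(adj, src, dst, bound):
    """Return True if a non-empty directed path src -> ... -> dst
    with at most `bound` edges exists in the adjacency map `adj`."""
    frontier = deque([(src, 0)])
    seen = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth >= bound:
            continue  # one more edge from here would exceed the bound
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


# Hypothetical three-user chain: a -> b -> c.
adj = {"a": ["b"], "b": ["c"], "c": []}
```

In this example `within_bound(adj, "a", "c", 2)` holds, while a bound of 1 rejects the same pair because the only path uses two edges. The source node is deliberately not marked as visited, so a cycle back to the source still counts as a non-empty path.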

The bounded graph simulation match algorithm is referred to as BGSM. Given a pattern P (the matching pattern defined above) and a graph G, BGSM returns the maximum match S for P, or the empty set if no match exists. Before illustrating the process of BGSM, we present the notation used in the algorithm. (1) The distance matrix Matr maintains the distances between all pairs of nodes in G. (2) For each node u in pattern P, the set mat(u) records the nodes in G that may match u, and the set demat(u) records the nodes that cannot match any parent of u. (3) anc() finds the ancestors of a node by depth-first search, and desc() finds the descendants of a node. (4) For each node x ∈ V and edge \( (u\prime ,u) \in {\text{E}}_{\text{C}} \), anc(f_e(u′, u), f_v(u′), x) records the nodes x′ in G such that (i) the distance from x′ to x is within the bound imposed by f_e(), i.e., len(x′/…/x) ≤ f_e(u′, u), and (ii) f_v(x′) satisfies the predicate f_v(u′) defined on u′; desc(f_e(u, u′), f_v(u′), x) is defined similarly for the descendants of x. BGSM computes the distance matrix Matr for G in line 1. It then computes ancestors using anc() and descendants using desc() by inspecting the predicates and bounds specified in pattern P in lines 2–3. For each pattern node u ∈ V_C, the algorithm initializes mat(u) and demat(u) using P and Matr in lines 4–6. For each parent node u′ of u, the algorithm then refines mat(u′) by removing the nodes in G that cannot match u′, namely the nodes z ∈ demat(u), in lines 8–9. Moreover, it uses z to identify the nodes z′ that cannot match any parent u′′ of u′, and includes z′ in demat(u′) in lines 11–14. More specifically, z′ is not a candidate match of u′′ if z is the only descendant of z′ that is within the bound f_e(u′′, u′), satisfies the predicate f_v(u′), and is in mat(u′). In lines 7–15 the process iterates until no mat() can be reduced, i.e., until demat(u) is empty for every pattern node u. The nodes remaining in mat(u) are those that match u; they are collected in S, which is returned as the match in lines 16–18. If mat(u) becomes empty for any u ∈ V_C during the process, u has no match in G, and the algorithm returns the empty set in line 10. The time complexity of the algorithm depends on the product of the numbers of nodes and edges in pattern P and graph G, i.e., O((|V| + |V_C|)(|E| + |E_C|)). Expanding (|V| + |V_C|)(|E| + |E_C|) gives |V||E| + |V||E_C| + |V_C||E| + |V_C||E_C|. The number of nodes in P is usually much smaller than that in G, and |E| is taken to be approximately |V|^2, so the time complexity can be simplified to O(|V||E| + |E_C||V| + |V_C||V|^2).
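The refinement idea behind BGSM can be sketched as a naive fixpoint loop. This is a simplified illustration under stated assumptions rather than a transcription of Algorithm 1: it replaces the demat() bookkeeping with repeated re-checking, treats the node predicate as plain label equality, counts distance in edges, and ignores the relationship-type condition on paths. The names `hop_distances`, `bounded_simulation`, and the sample labels are hypothetical.

```python
from collections import deque
import math


def hop_distances(adj, src):
    """Shortest non-empty path lengths (in edges) from src to each reachable node."""
    dist = {}
    frontier = deque([(src, 0)])
    while frontier:
        node, d = frontier.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = d + 1
                frontier.append((nxt, d + 1))
    return dist


def bounded_simulation(pattern_nodes, pattern_edges, labels, adj):
    """pattern_nodes: {u: required label}; pattern_edges: {(parent, child): bound}.
    Returns {u: set of matching graph nodes}, or {} if some pattern node has no match."""
    # Distance information for every graph node (plays the role of Matr).
    matr = {x: hop_distances(adj, x) for x in labels}
    # Initial candidates: graph nodes whose label satisfies the pattern node.
    mat = {u: {x for x, lab in labels.items() if lab == want}
           for u, want in pattern_nodes.items()}
    changed = True
    while changed:
        changed = False
        for (parent, child), bound in pattern_edges.items():
            # Keep x only if some candidate of the child lies within the bound.
            keep = {x for x in mat[parent]
                    if any(matr[x].get(y, math.inf) <= bound for y in mat[child])}
            if keep != mat[parent]:
                mat[parent] = keep
                changed = True
    return {} if any(not s for s in mat.values()) else mat


# Hypothetical data: "a" reaches reviewer "b" in 2 hops, "d" reaches "c" in 3 hops.
labels = {"a": "buyer", "d": "buyer", "b": "reviewer", "c": "reviewer",
          "x": "other", "y": "other", "z": "other"}
adj = {"a": ["x"], "x": ["b"], "d": ["y"], "y": ["z"], "z": ["c"]}
result = bounded_simulation({"P": "buyer", "Q": "reviewer"}, {("P", "Q"): 2}, labels, adj)
```

With a bound of 2 on the pattern edge, only "a" remains a match for P, because the reviewer reachable from "d" is three hops away. The loop re-examines every pattern edge on each pass, which is simpler but less efficient than the demat()-driven propagation described above.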

2.3 Update Similar Users for Cold-Start User Based on Incremental Graph Pattern Matching

Incremental graph pattern matching (IGPM) is proposed to dynamically update the similar users of a cold-start user. IGPM updates similar users according to changes in the paths between users in G, and iteratively detects changes in social relationships according to the length of the path between users. First, it determines the affected area AFF_cs caused by a change in social relationships. If the length of a path exceeds a threshold value, we argue that the corresponding social relationship is too weak to be used for finding similar users, and it can be deleted. Deleted nodes may cause the lengths of other paths to exceed the threshold, so we have to find the predecessor nodes of the deleted nodes and determine whether their paths still match those of P and whether there is only one path in the maximum matching set M(P, G) that satisfies P. The incremental graph pattern matching algorithm is shown as Algorithm 2. In Algorithm 2, G_r = (V_r, E_r) denotes the original match set of the cold-start user, and E_cs stores the edges in AFF_cs. If e = (v′, v) is a path of the matching pattern, the algorithm iteratively finds the nodes that matched v′ and no longer match P. If e = (v′, v) is not a path of the matching pattern, G_r is left as it is. The time complexity of Algorithm 2 depends on the product of the numbers of nodes and edges in pattern P and the number of updated edges in E_cs, i.e., O((|E_C| + |V_C|)|E_cs|) = O(|E_C||E_cs| + |E_cs||V_C|), which is much less than the time complexity of recomputing similar users over the entire G.
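The benefit of updating instead of recomputing can be illustrated for edge deletions. When an edge is removed, path lengths can only grow, so the new match is contained in the previous one and refinement can start from the previous match. The sketch below is a simplified, deletion-only illustration and not Algorithm 2: it re-checks every pattern edge rather than restricting itself to the affected area AFF_cs, it counts distance in edges, and the names `reaches_within` and `refine_match` are hypothetical.

```python
from collections import deque


def reaches_within(adj, src, targets, bound):
    """True if some node in `targets` is reachable from src
    by a non-empty path of at most `bound` edges."""
    frontier, seen = deque([(src, 0)]), set()
    while frontier:
        node, d = frontier.popleft()
        if d >= bound:
            continue
        for nxt in adj.get(node, ()):
            if nxt in targets:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return False


def refine_match(old_mat, pattern_edges, adj):
    """Shrink a previous match {pattern node: set of graph nodes} so that it
    is valid for `adj`, the adjacency map after edges have been deleted."""
    mat = {u: set(xs) for u, xs in old_mat.items()}  # do not modify the input
    changed = True
    while changed:
        changed = False
        for (parent, child), bound in pattern_edges.items():
            keep = {x for x in mat[parent]
                    if reaches_within(adj, x, mat[child], bound)}
            if keep != mat[parent]:
                mat[parent] = keep
                changed = True
    return {} if any(not s for s in mat.values()) else mat


# Hypothetical example: the edge y -> c has just been deleted from the graph.
adj_after = {"a": ["x"], "x": ["b"], "d": ["y"], "y": []}
old = {"P": {"a", "d"}, "Q": {"b", "c"}}
new = refine_match(old, {("P", "Q"): 2}, adj_after)
```

Only the candidates already in the previous match are examined, so the work is bounded by the size of the old match rather than by the size of G. Handling edge insertions would need additional logic, since new edges can create matches that are not in the previous result.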

2.4 Give Recommendations According to Similar Users’ Records

We give recommendations for a cold-start user according to the similar users' purchase and comment records. Unknown ratings of items for the cold-start user are predicted from the user-item matrix. Each row of the user-item matrix holds one user's ratings, and each column holds all users' ratings of one item. An item with a higher rating is more likely to be chosen, so we select the items with the highest predicted ratings as recommendations for the cold-start user. The unknown ratings of the cold-start user are given by Eq. (1), and the recommendations are given by Eq. (2). In Eq. (1), \( p_{a,i} \) denotes the predicted rating of user a on item i, \( r_{u,i} \) denotes the rating of user u on item i, and t denotes the number of similar users. \( w_{a,u} \) denotes the similarity between users a and u; it ranges from 0 to 1, where 1 means that users a and u are completely similar and 0 means that they are totally dissimilar. \( \bar{r}_{a} \) and \( \bar{r}_{u} \) denote the average ratings given by users a and u. Equation (2) gives the top N items with the highest predicted ratings as recommendations for the cold-start user, where N denotes the number of recommendations.

$$ p_{a,i} = \bar{r}_{a} + \frac{\sum_{u=1}^{t} w_{a,u} \left( r_{u,i} - \bar{r}_{u} \right)}{\sum_{u=1}^{t} w_{a,u}} $$
(1)
$$ \mathrm{Top}(a,N) := \mathop{\max}\limits^{N} \; p_{a,i} $$
(2)
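Equations (1) and (2) translate directly into a few lines of Python. The sketch below is illustrative and rests on several assumptions not stated in the paper: ratings are stored as nested dictionaries, the sum in Eq. (1) runs only over similar users who actually rated the item, a user without ratings falls back to a `default` mean, and items the target user has already rated are excluded from the top-N list. The function names and the sample data are hypothetical.

```python
def mean_rating(ratings, user, default=0.0):
    """Average rating given by `user`, or `default` if the user has no ratings."""
    vals = list(ratings.get(user, {}).values())
    return sum(vals) / len(vals) if vals else default


def predict(ratings, sim, a, item, default=0.0):
    """Eq. (1): mean of user a plus the similarity-weighted mean deviation."""
    base = mean_rating(ratings, a, default)
    num = den = 0.0
    for u, w in sim.get(a, {}).items():
        if item in ratings.get(u, {}):  # only similar users who rated the item
            num += w * (ratings[u][item] - mean_rating(ratings, u))
            den += w
    return base + num / den if den else base


def top_n(ratings, sim, a, n):
    """Eq. (2): the n unseen items with the highest predicted rating for user a."""
    seen = set(ratings.get(a, {}))
    items = {i for u in sim.get(a, {}) for i in ratings.get(u, {})} - seen
    scored = sorted(((predict(ratings, sim, a, i), i) for i in items), reverse=True)
    return [i for _, i in scored[:n]]


# Hypothetical data: "new" is the cold-start user; sim holds w_{a,u}.
ratings = {
    "new": {"book": 4.0},
    "u1": {"book": 5.0, "lamp": 3.0, "pen": 4.0},
    "u2": {"lamp": 5.0, "pen": 1.0},
}
sim = {"new": {"u1": 1.0, "u2": 0.5}}
best = top_n(ratings, sim, "new", 1)
```

For "lamp" the weighted deviations cancel (1.0 × (3 − 4) + 0.5 × (5 − 3) = 0), so the prediction equals the target user's own mean of 4.0, while "pen" is predicted below that mean. Subtracting each similar user's mean before weighting compensates for users who rate systematically high or low.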

3 Experimental Results and Analysis

To verify that IGPMDCR gives accurate and timely recommendations, we compare it with the following state-of-the-art recommendation methods: the neural learning based recommendation method proposed by Bobadilla [1], referred to as Bobadilla's method; the classification based collaborative filtering recommendation method proposed by Lika [2], referred to as Lika's method; and the vector cosine matrix based top-N recommendation method [4], referred to as Ling's method.

3.1 Experimental Data and Evaluation Method

The experimental datasets are taken from Epinions.com and YouTube. Epinions.com is a consumer review site where visitors can read reviews of a variety of items to help them decide what to purchase. The Epinions.com dataset contains 131,828 nodes and 841,372 edges, which makes it a large-scale social network. We mark users who have fewer than three purchase and comment records as cold-start users. Cold-start users account for 18.37% of this dataset. To describe the characteristics of users, we select multiple social relationships from Epinions.com, including interest groups, comment forwarding relationships, followed blogs, trust relationships, and adoption of evaluations.

YouTube is a video sharing site that supports various types of interactions. We crawled 30,522 user profiles. Based on the crawled information, we construct 5 different relationships among the 30,522 users: the contact network between users, the number of shared friends, the number of shared subscriptions, the number of shared subscribers, and the number of shared favorite videos. We mark users who have fewer than three shared subscriptions and shared favorite videos as cold-start users. Cold-start users account for 27.37% of this dataset.

Each dataset is divided into training and test sets. We choose root mean square error (RMSE) as the evaluation metric. RMSE measures the deviation between predicted ratings and actual ratings [17]. Its definition is shown in Eq. (3), where \( r_{u,i} \) denotes the actual rating of item i given by user u, \( \hat{r}_{u,i} \) denotes the corresponding predicted rating, and N denotes the number of ratings compared. The value of RMSE ranges from 0 to 5. A smaller RMSE means less deviation between predicted and actual ratings.

$$ \text{RMSE} = \sqrt {\frac{{\sum\nolimits_{i = 1}^{N} {(r_{u,i} - \hat{r}_{u,i} )^{2} } }}{N}} $$
(3)
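Eq. (3) can be computed as follows. This is a minimal sketch that assumes the actual and predicted ratings are supplied as two equally long, aligned sequences; the function name `rmse` and the sample values are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between aligned actual and predicted ratings (Eq. 3)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((r - p) ** 2 for r, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical ratings: squared errors are 1 and 4, so RMSE = sqrt(5 / 2).
error = rmse([4.0, 2.0], [3.0, 4.0])
```

Because the errors are squared before averaging, a single large miss raises RMSE more than several small ones, which makes the metric sensitive to badly mispredicted items.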

3.2 Experimental Result

To verify the recommendation effectiveness of IGPMDCR over time, we compare IGPMDCR with Bobadilla's, Lika's, and Ling's methods over a period of 640 days, which we divide into 9 stages. On the dataset obtained from Epinions.com, the training data accounts for 5%, 15%, 25%, and 35% of the dataset respectively, and the corresponding recommendation effectiveness is shown in Fig. 1(a–d).

Fig. 1. Comparison of recommendation effectiveness against RMSE on Epinions.com.

Users' social relationships may change as time passes, so it is reasonable to update them in order to keep recommendations accurate. The gap between the actual social relationships and the social relationships assumed by a method may grow over time, so the RMSE in the early stages is lower than in the later stages. In Fig. 1(a–d), during the first two stages, the RMSE of IGPMDCR is almost the same as that of Bobadilla's, Lika's, and Ling's methods. From the third stage onward, the gap between the RMSE of IGPMDCR and that of the other three methods becomes larger. From the fourth to the seventh stage, this gap increases continuously, and in the last stage it increases sharply, reaching its maximum on the 640th day. Over the entire period, the RMSE of IGPMDCR is lower than that of Bobadilla's, Lika's, and Ling's methods. This indicates that IGPMDCR obtains the latest similar users for a cold-start user and gives more accurate recommendations.

On the dataset obtained from YouTube, the training data also accounts for 5%, 15%, 25%, and 35% of the dataset respectively, and the corresponding recommendation effectiveness is shown in Fig. 2(a–d). In the first two stages, the RMSE of IGPMDCR is almost the same as that of Bobadilla's, Lika's, and Ling's methods. From the third stage onward, the gap between the RMSE of IGPMDCR and that of the other three methods becomes larger. From the fourth to the seventh stage, this gap increases continuously, and in the last stage it increases sharply, reaching its maximum on the 640th day. Over the entire period, the RMSE of IGPMDCR is lower than that of Bobadilla's, Lika's, and Ling's methods. This again indicates that IGPMDCR obtains the latest similar users for a cold-start user and gives more accurate recommendations. Based on the experimental results on the Epinions.com and YouTube datasets, we argue that IGPMDCR updates similar users as time passes and gives accurate recommendations for cold-start users.

Fig. 2. Comparison of recommendation effectiveness against RMSE on YouTube.

4 Conclusion

Accurate recommendations for cold-start users can win users' trust and make e-commerce systems more attractive. Social relationships reflect users' characteristics, and recommendations based on social networks have been proved effective. However, social relationships among users may change as time passes, and recommendations based on outdated social relationships are inaccurate. To give accurate and timely recommendations for cold-start users, we proposed an incremental graph pattern matching based dynamic cold-start recommendation method (IGPMDCR), which updates the similar users of a cold-start user according to the latest social relationships and gives recommendations based on the latest similar users' records. The experimental results on both the Epinions.com and YouTube datasets show that IGPMDCR gives accurate and timely recommendations.