
1 Introduction

In the information era, recommendation systems [14] are designed to recommend relevant items to different users by analyzing the characteristics of users and items. They have been deployed in many successful systems, e.g., the Amazon product recommendation system [9], Netflix [1] and MovieLens [5].

With the development of Web 2.0, users are no longer satisfied with the information retrieved by keyword-based approaches; instead, they expect personalized information services that match their own preferences. Tagging techniques, e.g., Folksonomy systems [11], were therefore designed to let users freely create and use tags to describe resources on the Web and to share these tags with other users. In this way, the correlation between resources and the interaction between users can be effectively enhanced, because tags describe resources and users more flexibly and accurately, and thus better indicate resource characteristics and user interest preferences.

With the development of tagging techniques, personalized information recommendation based on tags, i.e., tag-based recommendation, has been proposed. Mishne first designed a simple automatic tag assignment system [12], which compared and clustered the user’s blog information to generate a list of tags, and then filtered and sorted these tags into a result set recommended to the user. Firan et al. [7] suggested that social tags can not only represent resources on the Web but also indicate user preferences. Since then, more and more researchers have applied tagging techniques to personalized recommendation with effective results. For example, Chen et al. [3] mapped tags to users and contents for filtering, and proposed a hybrid recommendation approach that considers both tag correlation and the social network; Ma et al. [10] proposed a collaborative recommendation system based on social networks and tags, introducing a trust network to improve recommendation confidence. However, tag-based recommendation systems face the problem of data sparsity, which significantly affects the efficiency and effectiveness of recommendation. Motivated by this situation, we propose an innovative tag-based resource recommendation solution in this work.

In tag-based recommendation systems, data sparsity results from the potentially unlimited set of social tags produced by free user tagging. Free tagging also produces noisy tags and leads to tag redundancy and semantic fuzziness, which increase the burden of the recommendation process and reduce its accuracy. Therefore, we approach the problem from the perspective of resource providers in practical applications, and design a recommendation algorithm based on regular tags and user behaviors, so as to effectively reduce tag noise and data sparsity. Regular tags are created and maintained by the official resource providers, and describe the intrinsic attributes of resources more accurately and strictly. User behaviors are the operations users perform when browsing resources; they can be regarded as feedback to resource providers and reflect user preferences.

Most recommendation systems make recommendations with collaborative filtering [9] based on the tagging records of the user group and the item group. However, these systems scale poorly because of the sparse tag matrix, and during recommendation they ignore the personalized characteristics of individual users. In this work, we propose to build an “implicit” user-item scoring model by integrating regular tags and user operations. We combine resource tag characteristics, user operations and a time factor to represent user features, and use the user operations to weight the user-tag matrix so as to embed user preference. Based on the user representation and the regular tags, we analyze user preference on both existing resources and potential new resources. During recommendation, the preference scores of the target user and his similar users on new resource items are calculated to form a ranking list, and collaborative filtering is employed to produce the final recommendation.

We summarize the contributions of this work as follows:

  • We design the recommendation system from the perspective of the resource provider, and propose regular tags to build a standard tag system.

  • We propose a new user feature model that integrates tag characteristics, user operations and a time factor.

  • We evaluate the proposed approach on a practical system with an extensive real-world dataset.

The rest of this paper is organized as follows. Section 2 introduces related work on tag-based recommendation approaches. Section 3 describes our personalized recommendation strategy based on regular tags in detail. The experimental study is presented in Sect. 4, and Sect. 5 concludes the paper.

2 Related Works

Generally, tag-based recommendation systems suggest resources to users by analyzing the tags and rating scores assigned to the resources. Many approaches to improving recommendation systems focus on the problem of data sparsity, which affects both social tag data and potential user rating data on resources. Pan et al. [13] proposed to expand tag neighbors by calculating tag similarity and used spectral clustering to filter out noisy and redundant tags, thereby improving recommendation accuracy. Yuan et al. [15] proposed a collaborative filtering recommendation algorithm based on a temporal interest evolution model and social tag prediction; the optimized tags are used to model the relationship among users, tags and resources, and recommendations are made by community discovery and maximum tag voting. Durao et al. [6] extended the basic similarity calculation with external factors such as tag popularity, tag representativeness and the affinity between user and tag, and evaluated the resulting recommendation system.

In addition, to alleviate the sparsity of user rating data, many algorithms derive user ratings from indirect information instead of explicit rating scores. For example, the number of times a user browses and searches for resources is used as the user rating [8]. Cheng et al. [4] use TF-IDF to calculate the attributes of users and resources respectively, and estimate user preference for resources from the two feature vectors. However, rating scores computed only from browsing records are not comprehensive; more discriminative information should be taken into account.

Compared with existing approaches, our work takes the perspective of the resource provider and builds a standard tag system to avoid tag sparsity. The actual user behaviors on resources are returned as feedback, which is more discriminative and informative in indicating user preference. Our recommendation algorithm is built on these two components and follows the collaborative filtering mechanism.

3 Resource Recommendation Based on Regular Tags

In this section, we describe our proposed recommendation approach based on regular tags and user operations in detail. We first formally introduce some basic concepts, and then explain how to represent users by tag-based information. After establishing the user preference model, we present the recommendation algorithm under the collaborative filtering mechanism.

3.1 Preliminaries

In our work, we take the perspective of the resource provider and design the recommendation model based on regular tags. Regular tags are created by the resource provider and assigned to each resource item to describe its characteristics. For example, a book item may be assigned the regular tags science fiction, Chinese and space travel. We employ regular tags because they are more accurate and strict than social tags, so tag sparsity, redundancy and fuzziness can be effectively reduced in practical applications. We assume that the regular tag set contains l tags, denoted \(T=\{t_1,t_2,\ldots ,t_l\}\). The user set contains m users and the item set contains n items, denoted \(U=\{u_1,u_2,\ldots ,u_m\}\) and \(I=\{i_1,i_2,\ldots ,i_n\}\) respectively.

We also focus on user feedback on resources, which can be returned to the resource provider. Traditional user rating mechanisms cannot fully measure user preference, because rated items usually account for no more than 1% of all items; in large-scale recommendation systems the rating data are therefore extremely sparse, which reduces recommendation quality. Instead, we collect user behaviors after browsing resources as implicit user ratings, e.g., reading, sharing or purchasing a book item after browsing it. Different operations reflect different degrees of preference for the item. For example, if a user shares item A with others after browsing it but does nothing after browsing item B, the user prefers item A to item B. We assume that f kinds of operation are defined in the system, denoted \(O=\{o_1,o_2,\ldots ,o_f\}\). Each operation \(o_i\) is assigned a weight \(w_i\) (\(1\le w_i \le f\)) representing its importance, so that the discrimination between users can be quantified.

Based on the concepts above, we establish the relationships among tags, users, items and operation weights. Naturally, we can collect the regular tags describing each resource item, and the \(n\times l\) item-tag relationship is defined as

$$R = \left( {\begin{array}{*{20}{c}} {{r_{11}}}&{}{{r_{12}}}&{} \cdots &{}{{r_{1l}}}\\ {{r_{21}}}&{}{{r_{22}}}&{} \cdots &{}{{r_{2l}}}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {{r_{n1}}}&{}{{r_{n2}}}&{} \cdots &{}{{r_{nl}}} \end{array}} \right) $$

where \(r_{jk}=1\) means that tag \(t_k\) is used to describe item \(i_j\), and \(r_{jk}=0\) means it is not. For each user, we collect his operations on each resource item and build the \(m\times n\) user-item relationship as

$$S = \left( {\begin{array}{*{20}{c}} {{s_{11}}}&{}{{s_{12}}}&{} \cdots &{}{{s_{1n}}}\\ {{s_{21}}}&{}{{s_{22}}}&{} \cdots &{}{{s_{2n}}}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {{s_{m1}}}&{}{{s_{m2}}}&{} \cdots &{}{{s_{mn}}} \end{array}} \right) $$

where \(s_{jk}\) records the operation weight of user \(u_j\) on item \(i_k\); if \(u_j\) performs no operation on \(i_k\), then \(s_{jk}=0\).
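As a minimal sketch of how R and S might be assembled from raw records, the Python below assumes tagging records given as a mapping from each item to its set of regular tags, and operation logs given as (user, item, operation) triples. The operation names and weights (`browse`, `read`, `share` with weights 1 to 3, so f = 3) are hypothetical placeholders, not the weights of Table 1, and keeping the largest weight when a user performs several operations on the same item is our assumption; the text does not state how repeated operations are aggregated.

```python
from typing import Dict, List, Set, Tuple

def build_item_tag_matrix(items: List[str], tags: List[str],
                          item_tags: Dict[str, Set[str]]) -> List[List[int]]:
    """R (n x l): R[j][k] = 1 if regular tag tags[k] describes items[j], else 0."""
    return [[1 if t in item_tags.get(i, set()) else 0 for t in tags] for i in items]

def build_user_item_matrix(users: List[str], items: List[str],
                           op_log: List[Tuple[str, str, str]],
                           op_weight: Dict[str, int]) -> List[List[int]]:
    """S (m x n): operation weight of users[j] on items[k]; 0 if no operation."""
    u_idx = {u: j for j, u in enumerate(users)}
    i_idx = {i: k for k, i in enumerate(items)}
    S = [[0] * len(items) for _ in users]
    for user, item, op in op_log:
        j, k = u_idx[user], i_idx[item]
        # Assumption: several operations on one item keep the largest weight.
        S[j][k] = max(S[j][k], op_weight[op])
    return S

# Hypothetical example data.
tags = ["science fiction", "Chinese", "space travel"]
items = ["book1", "book2"]
item_tags = {"book1": {"science fiction", "space travel"}, "book2": {"Chinese"}}
op_weight = {"browse": 1, "read": 2, "share": 3}  # hypothetical weights, f = 3
users = ["u1", "u2"]
op_log = [("u1", "book1", "browse"), ("u1", "book1", "share"), ("u2", "book2", "read")]

R = build_item_tag_matrix(items, tags, item_tags)
S = build_user_item_matrix(users, items, op_log, op_weight)
print(R)  # [[1, 0, 1], [0, 1, 0]]
print(S)  # [[3, 0], [0, 2]]
```

Dictionaries from identifiers to row and column indices keep the matrices aligned with the fixed orderings of U, I and T.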

Since a user’s operations on each item express his preference, we combine R and S to establish the relationship between users and tags, so that the preference of a user for different tags can be estimated. Based on the weights assigned to each operation, we define the \(m\times l\) weighted user-tag relationship:

$$G = \left( {\begin{array}{*{20}{c}} {{g_{11}}}&{}{{g_{12}}}&{} \cdots &{}{{g_{1l}}}\\ {{g_{21}}}&{}{{g_{22}}}&{} \cdots &{}{{g_{2l}}}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {{g_{m1}}}&{}{{g_{m2}}}&{} \cdots &{}{{g_{ml}}} \end{array}} \right) $$

Here \(g_{jk}\) denotes the weighted preference of user \(u_j\) for tag \(t_k\), calculated as:

$$\begin{aligned} {g_{jk}} = \sum \limits _{e = 1}^n {{s_{je}}{r_{ek}}}. \end{aligned}$$
(1)

Thus \(g_{jk}\) is the accumulated operation weight of \(u_j\) over the resources tagged with \(t_k\); in matrix form, \(G=SR\). With this weighted user-tag relationship, the user preference for each tag can be estimated more accurately and thus reflects the users’ interests.
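Because Eq. (1) sums over the items e, G is the matrix product of S and R. The short sketch below computes it in plain Python on a hypothetical example with two users, two items and three tags, to make the per-tag accumulation of operation weights concrete.

```python
from typing import List

def weighted_user_tag(S: List[List[float]], R: List[List[int]]) -> List[List[float]]:
    """Eq. (1): g_jk = sum_e s_je * r_ek, i.e. G = S R with shape m x l."""
    n, l = len(R), len(R[0])
    return [[sum(S[j][e] * R[e][k] for e in range(n)) for k in range(l)]
            for j in range(len(S))]

# Hypothetical data: item 1 carries tags t1, t3; item 2 carries t2, t3.
R = [[1, 0, 1],
     [0, 1, 1]]
# User 1 operated on both items (weights 3 and 1); user 2 only on item 2 (weight 2).
S = [[3, 1],
     [0, 2]]
G = weighted_user_tag(S, R)
print(G)  # [[3, 1, 4], [0, 2, 2]]
```

Tag t3 appears on both items user 1 touched, so it accumulates both weights: g_13 = 3 + 1 = 4.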

3.2 User Feature Representation

In order to make personalized recommendations, it is essential to find an effective representation that reflects the discriminative characteristics of different users, which is also important for discovering similar users.

The user features are derived from item properties and from user operations that show different preferences. As in text processing, TF-IDF has been applied to express the tag feature vectors of users [2], but it only captures tag frequency, which is not sufficiently discriminative to describe user characteristics. For example, users A and B both browse book C, but only A purchases it, showing that A favors this book more than B does. Considering this, our user feature representation takes three aspects into account: the tag feature, the operation feature and the time factor.

The Tag Feature. In this work, the tag feature uses the user’s favorite tags to represent his preference. The normalized TF-IDF is employed to calculate the tag feature vector; for user u and tag \(t_k\), we have:

$$\begin{aligned} F_{u{t_k}}^{tag} = \frac{{{v_{u{t_k}}}}}{{\sum \nolimits _j {{v_{u{t_j}}}} }}\log \frac{m}{{{v_{{t_k}u}}}}\qquad (1\le k \le l). \end{aligned}$$
(2)

Here \({{v_{u{t_k}}}}\) is the number of times user u uses tag \(t_k\), and \({{v_{{t_k}u}}}\) is the number of users who use tag \(t_k\), so \(F_{u{t_k}}^{tag}\) reflects the preference of user u for tag \(t_k\).
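A minimal sketch of Eq. (2) follows. It assumes the usage counts \(v_{ut_k}\) are already available as an m × l matrix `V` (for instance, one count per operated item that carries \(t_k\); the text does not pin this down), and it uses the natural logarithm, since the base is not stated. A tag used by every user gets log(m/m) = 0, which is the intended IDF effect.

```python
import math
from typing import List

def tag_feature(V: List[List[int]]) -> List[List[float]]:
    """Eq. (2): normalized TF-IDF of tag t_k for user u.

    V[u][k] is v_{u t_k}, the number of times user u used tag t_k.
    """
    m, l = len(V), len(V[0])
    # v_{t_k u}: number of users who used tag t_k at least once.
    users_per_tag = [sum(1 for u in range(m) if V[u][k] > 0) for k in range(l)]
    F = []
    for u in range(m):
        total = sum(V[u])
        row = []
        for k in range(l):
            if V[u][k] == 0:
                row.append(0.0)  # TF is zero, so the feature is zero
            else:
                row.append(V[u][k] / total * math.log(m / users_per_tag[k]))
        F.append(row)
    return F

V = [[1, 1, 2],   # hypothetical counts for user 1
     [0, 1, 1]]   # hypothetical counts for user 2
F_tag = tag_feature(V)
# F_tag[0][0] == 0.25 * log(2); every other entry is 0.0 because
# t2 and t3 are used by both users, so their IDF is log(1) = 0.
```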

The Operation Feature. The user operations on resource items are important information reflecting the user preference for items. In the preliminaries, we introduced the operation-weighted user-tag relationship, on which the operation feature of users is based.

In our work, the user’s long-term average preference is used to generate the operation feature, so as to improve the discrimination of users towards different tags. For user u and tag \(t_k\), the operation feature is calculated as

$$\begin{aligned} F_{u{t_k}}^{op} = {e^{\frac{{{g_{uk}}}}{{{v_{u{t_k}}}}} - \lambda }}\qquad (1\le k \le l). \end{aligned}$$
(3)

Here \({{g_{uk}}}\) is the weighted user-tag preference calculated by Eq. (1), and dividing by \({{v_{u{t_k}}}}\) normalizes it into an average weight per use of \(t_k\). \(\lambda \) is the minimum operation weight of user u, subtracted to remove the operation bias of different users. Equation (3) thus reflects the user feature for different tags according to the actual operations.
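The sketch below evaluates Eq. (3) on hypothetical G, V and S values. Two points are our assumptions rather than statements of the text: \(\lambda\) for user u is taken as the smallest non-zero weight in that user's row of S, and tags the user never used (\(v_{ut_k}=0\), where Eq. (3) is undefined) are given 0, which leaves Eq. (5) unaffected because the tag feature is already zero there.

```python
import math
from typing import List

def operation_feature(G: List[List[float]], V: List[List[int]],
                      S: List[List[float]]) -> List[List[float]]:
    """Eq. (3): F^op_{u t_k} = exp(g_uk / v_{u t_k} - lambda_u)."""
    F = []
    for u in range(len(G)):
        # Assumption: lambda_u is the smallest non-zero operation weight in row u of S.
        recorded = [w for w in S[u] if w > 0]
        lam = min(recorded) if recorded else 0.0
        row = []
        for k in range(len(G[u])):
            if V[u][k] == 0:
                # Assumption: Eq. (3) is undefined for unused tags; use 0.0,
                # which leaves the product in Eq. (5) unchanged (F^tag is 0 there).
                row.append(0.0)
            else:
                # Average weight per use of t_k, shifted so the user's weakest
                # recorded operation maps to exp(0) = 1.
                row.append(math.exp(G[u][k] / V[u][k] - lam))
        F.append(row)
    return F

S = [[3, 1], [0, 2]]          # hypothetical user-item weights
G = [[3, 1, 4], [0, 2, 2]]    # G = S R for items tagged {t1, t3} and {t2, t3}
V = [[1, 1, 2], [0, 1, 1]]    # uses of each tag (operated items carrying it)
F_op = operation_feature(G, V, S)
# F_op[0] == [e**2, 1.0, e**1]; F_op[1] == [0.0, 1.0, 1.0]
```

Subtracting \(\lambda\) makes user 2, whose only operation has weight 2, look neutral (factor 1) rather than uniformly enthusiastic, which is the bias removal the text describes.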

The Time Factor. In addition, we consider another factor that reflects user preference: time. It is generally believed that the most recently collected resources best reflect the user’s interest, i.e., the most recently used tags best describe the user’s preference. A user-interest model [4] based on a forgetting mechanism has been proposed, which uses an adaptive exponential decay function to handle the time information of tags. In our work, we apply the adaptive time decay function and the idea of the Ebbinghaus forgetting curve to define the time-based user feature. For user u and tag \(t_k\), it is defined as:

$$\begin{aligned} F_{u{t_k}}^{time} = \beta + (1 - \beta ){e^{ - ({d_{now}} - {d_{u{t_k}}})}} \qquad (1\le k \le l). \end{aligned}$$
(4)

Here \(d_{now}\) represents the current time point, and \(d_{u{t_k}}\) is the last time tag \(t_k\) was associated with user u. \(\beta \in [0,1]\) adjusts the influence of the time factor on the user’s interest: the smaller \(\beta \), the greater the influence, and with \(\beta =1\) the time factor equals 1 and has no effect.
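Eq. (4) leaves the time unit implicit. The sketch below assumes \(d\) is measured in whole days, which makes the decay steep (a factor of \(e^{-1}\) per day); a coarser unit such as weeks would slow it down. The usage lines exercise the limiting cases visible in the formula: a tag used today yields 1 for any \(\beta\), and a long-unused tag decays towards \(\beta\).

```python
import math
from datetime import date

def time_feature(last_used: date, now: date, beta: float) -> float:
    """Eq. (4): beta + (1 - beta) * exp(-(d_now - d_{u t_k})).

    Assumption: time differences are measured in days.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    elapsed_days = (now - last_used).days
    return beta + (1.0 - beta) * math.exp(-elapsed_days)

now = date(2016, 3, 31)
print(time_feature(date(2016, 3, 31), now, 0.5))  # 1.0: tag used today
print(time_feature(date(2016, 3, 1), now, 0.2))   # about 0.2 after 30 days of decay
print(time_feature(date(2016, 3, 1), now, 1.0))   # 1.0: beta = 1 disables the time factor
```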

The Comprehensive Representation. According to the analysis above, we can now formally describe our user preference model. The user feature representation describing tag-based preference combines the tag feature, the operation feature and the time-based feature. For user u and tag \(t_k\), we have:

$$\begin{aligned} {F_{u{t_k}}} = F_{u{t_k}}^{tag} \cdot F_{u{t_k}}^{op} \cdot F_{u{t_k}}^{time}\qquad (1\le k \le l). \end{aligned}$$
(5)

The feature vector of user u is therefore \(F_u=(F_{u{t_1}},F_{u{t_2}},\ldots ,F_{u{t_l}})\).
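Eq. (5) is an element-wise product over the l tags, so a zero in any of the three factors (for example, a tag the user has never used) removes that tag from \(F_u\). A minimal sketch with hypothetical factor values:

```python
from typing import List

def user_feature_vector(f_tag: List[float], f_op: List[float],
                        f_time: List[float]) -> List[float]:
    """Eq. (5): F_{u t_k} = F^tag * F^op * F^time for k = 1..l."""
    if not len(f_tag) == len(f_op) == len(f_time):
        raise ValueError("all three feature vectors must have length l")
    return [a * b * c for a, b, c in zip(f_tag, f_op, f_time)]

F_u = user_feature_vector([0.5, 0.0, 0.25], [2.0, 1.0, 4.0], [1.0, 0.5, 0.5])
print(F_u)  # [1.0, 0.0, 0.5]
```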

3.3 User-Item Preference Analysis

The user preference for items is usually analyzed from the historical behavior of the user. Traditional collaborative filtering algorithms rely on user ratings to reflect user interest preferences and to estimate resource similarity; they ignore the characteristics of users and resources, which significantly decreases recommendation quality for new resource items. We propose to analyze user preference from the predicted scores of the user himself and of his similar users, based on the regular-tag characteristics of all items. User-item preference falls into two categories: preference for historical items, i.e., the browsed items, and preference for new items.

Preference for Historical Items. The user preference for historical resources can be estimated from the user-item relationship S. For browsed items, the user operations are used to weight the item browsing records. For user \(u_j\) on item \(i_k\), the user-item preference is estimated as:

$$\begin{aligned} P_{jk}^{hist} = \frac{{{s_{jk}}}}{{\sum \nolimits _i {{s_{ji}}} }}. \end{aligned}$$
(6)

Here the result is normalized to facilitate further analysis, and \(s_{jk}\) is taken from the user-item relationship matrix S.
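For one user, Eq. (6) divides each entry of that user's row of S by the row sum, so the historical preferences form a distribution over the items he operated on. A sketch, where returning zeros for a user with no recorded operations is our assumption (Eq. (6) is undefined in that case):

```python
from typing import List

def historical_preference(s_row: List[float]) -> List[float]:
    """Eq. (6): P^hist_{jk} = s_jk / sum_i s_ji for one row of S."""
    total = sum(s_row)
    if total == 0:
        return [0.0] * len(s_row)  # assumption: no operations, no preference
    return [s / total for s in s_row]

print(historical_preference([3, 0, 1]))  # [0.75, 0.0, 0.25]
```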

Preference for New Items. New items have no existing user operation records. However, the resource characteristics (tags) of each new item are determined when the item is created, so the user preference can be estimated by comparing the user feature with the item-tag record. For user \(u_j\) on new item \(i_k\), we define the normalized user preference as:

$$\begin{aligned} P_{jk}^{new} = \frac{\sum \limits _{i = 1}^{l} r_{ki} \cdot F_{j t_i}}{\sum \limits _{e = 1}^{n} \sum \limits _{i = 1}^{l} r_{ei} \cdot F_{j t_i}}. \end{aligned}$$
(7)

Here \(r_{ki}\) is taken from the item-tag relationship matrix R, and \(F_{j{t_i}}\) is the feature value of user \(u_j\) for tag \(t_i\) from Eq. (5).
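The numerator of Eq. (7) sums the user's feature values over the tags of item \(i_k\), and the denominator is the same quantity summed over all n items. The sketch below computes the normalized scores for every item at once; treating an all-zero denominator as zero preference is an assumption.

```python
from typing import List

def new_item_preference(R: List[List[int]], f_user: List[float]) -> List[float]:
    """Eq. (7): tag-overlap score of every item for one user, normalized over all n items."""
    raw = [sum(r * f for r, f in zip(item_row, f_user)) for item_row in R]
    total = sum(raw)
    if total == 0:
        return [0.0] * len(R)  # assumption: the user's features share no tags with any item
    return [x / total for x in raw]

R = [[1, 0, 1],   # hypothetical item-tag rows
     [0, 1, 0],
     [1, 1, 0]]
print(new_item_preference(R, [1.0, 0.0, 0.5]))  # approximately [0.6, 0.0, 0.4]
```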

3.4 Top-K Recommendation Algorithm

Finally, we introduce the personalized recommendation algorithm, which recommends new resource items to different users according to their personal preferences. Our framework follows the collaborative filtering mechanism: to recommend new items to a user, the algorithm considers the preferences of the target user and his similar users, and ranks the preference scores to provide the items the target user is most likely to favor.

First, we need to find the users similar to the target user. Given the user feature representation of Eq. (5), we use cosine similarity to calculate the similarity between the target user and each other user:

$$\begin{aligned} sim({u_a},{u_b}) = \frac{{{F_{{u_a}}} \cdot {F_{{u_b}}}}}{{\left\| {{F_{{u_a}}}} \right\| \left\| {{F_{{u_b}}}} \right\| }}. \end{aligned}$$
(8)
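Eq. (8) is the standard cosine similarity between two feature vectors from Eq. (5). Since every factor in Eq. (5) is non-negative, the similarity lies in [0, 1]. A plain-Python sketch; returning 0 when either vector is all zeros is our assumption, since Eq. (8) is undefined there.

```python
import math
from typing import List

def cosine_similarity(fa: List[float], fb: List[float]) -> float:
    """Eq. (8): sim(u_a, u_b) = F_a . F_b / (||F_a|| * ||F_b||)."""
    dot = sum(a * b for a, b in zip(fa, fb))
    norm_a = math.sqrt(sum(a * a for a in fa))
    norm_b = math.sqrt(sum(b * b for b in fb))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero feature vector is similar to no one
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0, 1.0], [1.0, 1.0, 0.0]))  # approximately 0.5
```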

Formally, given a target user \(u_j\) and a set of new items for him, the estimated preference of the target user for each new item \(i_k\) consists of two parts:

  1. Target user preference: Since we aim to recommend new items to the target user, his preference for each new item is estimated by Eq. (7).

  2. Similar user preference: We also take other users’ interests into account as a reference. For each referred user, if he has browsed \(i_k\), his preference score is calculated by Eq. (6); otherwise it is calculated by Eq. (7). The similarity between the target user and the referred user is embedded as a weighting factor.

With these two parts, we define the preference score of \(u_j\) on item \(i_k\) as:

$$\begin{aligned} Score({u_j},{i_k}) = \alpha \cdot P_{jk}^{new} + (1 - \alpha )\frac{{\sum \limits _{{u_i} \in U,i \ne j} {sim({u_i},{u_j}) \times {P_{ik}}} }}{{\sum \limits _{{u_i} \in U,i \ne j} {sim({u_i},{u_j})} }}. \end{aligned}$$
(9)

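Here \(\alpha \in [0,1]\) is a constant weighting factor that balances the target user’s own preference against the reference from similar users; the smaller \(\alpha \), the more significant the similar users are in the recommendation result.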

Based on Eq. (9), once the preference scores of target user \(u_j\) on all new items are calculated, the scores are sorted in descending order, and the K items with the highest scores are recommended to the target user.
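The scoring and ranking steps can be sketched as follows. The function assumes the Eq. (6) and Eq. (7) scores have been precomputed for all users as m × n matrices `p_hist` and `p_new`, that `sims[i]` holds \(sim(u_i,u_j)\) for the target user, and that any positive \(s_{ik}\) means "user i has browsed item k". All names are hypothetical, and falling back to 0 for the neighbour term when every similarity is zero is our assumption.

```python
from typing import List, Sequence

def score(j: int, k: int, alpha: float, sims: Sequence[float],
          S: List[List[float]], p_hist: List[List[float]],
          p_new: List[List[float]]) -> float:
    """Eq. (9): alpha * own new-item preference + (1 - alpha) * similarity-weighted neighbour term."""
    num = den = 0.0
    for i in range(len(S)):
        if i == j:
            continue
        # Browsed items use the historical score, Eq. (6); others use Eq. (7).
        p_ik = p_hist[i][k] if S[i][k] > 0 else p_new[i][k]
        num += sims[i] * p_ik
        den += sims[i]
    neighbour = num / den if den > 0 else 0.0
    return alpha * p_new[j][k] + (1.0 - alpha) * neighbour

def top_k(j: int, new_items: Sequence[int], K: int, alpha: float,
          sims: Sequence[float], S, p_hist, p_new) -> List[int]:
    """Rank the target user's new items by Eq. (9) and keep the K best."""
    ranked = sorted(new_items,
                    key=lambda k: score(j, k, alpha, sims, S, p_hist, p_new),
                    reverse=True)
    return ranked[:K]

# Hypothetical data: 3 users, 3 items; user 0 is the target, items 1 and 2 are new to him.
S = [[2, 0, 0], [0, 3, 0], [1, 0, 2]]
p_hist = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1 / 3, 0.0, 2 / 3]]
p_new = [[0.2, 0.5, 0.3], [0.1, 0.3, 0.6], [0.4, 0.4, 0.2]]
sims = [1.0, 0.8, 0.2]  # sims[0] is the target itself and is skipped
print(top_k(0, [1, 2], K=1, alpha=0.5, sims=sims, S=S, p_hist=p_hist, p_new=p_new))  # [1]
```

Item 1 wins because user 1, the closer neighbour, has already browsed it (Eq. (6) gives him 1.0 on it), which lifts its score to 0.5 × 0.5 + 0.5 × 0.88 = 0.69.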

4 Experimental Evaluation

In this section, we present the empirical study. We first introduce the experimental setting, then report the effect of some important parameters in our model, and finally compare our algorithm with some existing approaches.

4.1 Experiment Setting

Dataset. We evaluate the proposed recommendation algorithm on a practical resource publishing system developed by WHUT Digital Communication Engineering Co., Ltd. The dataset contains a collection of 17735 tagged multimedia resource items, including books, articles, videos, etc. As of the end of March 2016, the system maintained 554 WeChat public accounts with a total of 674520 subscribers. The user operations on different resource items are collected through this platform.

Parameter Setting. Based on how resources are published on the WeChat platform, user behaviors on resource items are categorized into 6 operations. Each operation is assigned a different weight, and a higher weight indicates a stronger user preference for the item. The operations and their weights are listed in Table 1.

Table 1. The user operations and the weight assignment

Our user preference model contains two important parameters. \(\beta \) controls the influence of the time factor in the user feature representation; in our experiments it takes the values \(\beta = 0, 0.2, 0.5, 0.8, 1.0\). The other parameter, \(\alpha \), is used to calculate the user preference score during recommendation. To verify the impact of references from similar users on the recommendation results, \(\alpha \) takes the values \(\alpha = 0, 0.2, 0.5, 0.8, 1.0\).

To evaluate our algorithm precisely, we select the 2000 most frequently used tags in the tag system, together with 2000 popular items associated with these tags and 1000 active users. The dataset is divided into training and testing sets at a ratio of 4 : 1.

Evaluation Metrics. We employ the standard precision and recall rates to evaluate the performance of the recommendation algorithm. Let R(u) be the list of items recommended by the algorithm based on the user behaviors in the training set, and T(u) the set of items involved in the user’s actual behaviors in the testing set. Precision is the proportion of recommended items that hit the testing set:

$$\begin{aligned} \text {Precision} = \frac{{\sum \nolimits _{u \in U} {\left| {R(u) \cap T(u)} \right| } }}{{\sum \nolimits _{u \in U} {\left| {R(u)} \right| } }} \end{aligned}$$
(10)

Recall is the proportion of the user’s actual items in the testing set that appear in the recommendation list; it assesses how completely the user’s interests are covered:

$$\begin{aligned} \text {Recall} = \frac{{\sum \nolimits _{u \in U} {\left| {R(u) \cap T(u)} \right| } }}{{\sum \nolimits _{u \in U} {\left| {T(u)} \right| } }} \end{aligned}$$
(11)

In addition, we use the F-measure, the harmonic mean of precision and recall, to evaluate the overall quality of the recommendation model:

$$\begin{aligned} \text {F-measure} = \frac{{2 \times \text {Precision} \times \text {Recall}}}{{\text {Precision} + \text {Recall}}} \end{aligned}$$
(12)
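Eqs. (10) to (12) pool hits over all users before dividing (micro-averaging) rather than averaging per-user ratios. A sketch, assuming R(u) and T(u) are given as Python sets keyed by user; the user and item identifiers are hypothetical.

```python
from typing import Dict, Set, Tuple

def evaluate(recommended: Dict[str, Set[str]],
             actual: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Eqs. (10)-(12): micro-averaged precision, recall and F-measure."""
    users = set(recommended) | set(actual)
    hits = sum(len(recommended.get(u, set()) & actual.get(u, set())) for u in users)
    n_rec = sum(len(r) for r in recommended.values())
    n_act = sum(len(t) for t in actual.values())
    precision = hits / n_rec if n_rec else 0.0
    recall = hits / n_act if n_act else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

R_u = {"u1": {"a", "b"}, "u2": {"c", "d"}}        # recommended lists R(u)
T_u = {"u1": {"a"}, "u2": {"c", "e", "f", "g"}}   # test-set items T(u)
print(evaluate(R_u, T_u))  # precision 0.5, recall 0.4, F-measure about 0.444
```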

4.2 Effect of \(\beta \)

We first test the effect of \(\beta \), which adjusts the influence of the time factor in the user feature representation. In this test, we set \(\alpha =0.5\) and examine the influence of \(\beta \) on precision and recall; the results are shown in Figs. 1 and 2. Precision decreases as more items are recommended, while recall increases significantly. When only a few items are recommended, the user’s short-term interest better describes the user’s needs, whereas when many items are recommended, the long-term interest judges the user’s preference more accurately.

Fig. 1. Effect of parameter \(\beta \) on precision rate.

Fig. 2. Effect of parameter \(\beta \) on recall rate.

4.3 Effect of \(\alpha \)

\(\alpha \) is the parameter that adjusts the importance of similar users in the recommendation process. In this test, we set \(\beta =0.5\) and vary \(\alpha \) to study its effect on the recommendation; the results are shown in Figs. 3 and 4. The curves show that, as the number of recommended items increases, the overall performance follows the same trend as in the \(\beta \) test. In addition, a small \(\alpha \) value yields high recommendation accuracy, and when \(\alpha =1\) the recommendation effect is relatively consistent. This means that, to a certain extent, the recommendation result is more accurate when the algorithm gives a larger weight to similar-user information. The effect on recall behaves in the same way, indicating that references from similar users introduce more resource items that cover the potential items selected by the target user.

Fig. 3. Effect of parameter \(\alpha \) on precision rate.

Fig. 4. Effect of parameter \(\alpha \) on recall rate.

Fig. 5. Comparison of different recommendation algorithms.

4.4 Comparison Study

We compare our algorithm with two conventional recommendation algorithms: user-based collaborative filtering (UCF) and tag-based collaborative filtering (TCF). Our regular-tag-based algorithm is denoted RTCF. The comparison results for precision, recall and F-measure are shown in Fig. 5(a)–(c), respectively. The figures show that the proposed RTCF algorithm outperforms UCF and TCF on all metrics. As the number of recommended items increases, precision naturally decreases while recall improves significantly. Since recall is more important in recommendation systems, we conclude that our proposed algorithm achieves more satisfactory performance.

5 Conclusions

Conventional recommendation systems usually suffer from several problems, e.g., item tag data and user rating data are extremely sparse, and user feedback is not sufficiently utilized. This work takes the perspective of the resource provider and proposes a recommendation algorithm based on regular tags and user operation feedback. To describe each user’s personal characteristics precisely, a new user feature representation is proposed that integrates regular tags, user operations and the influence of time. The final recommendation algorithm is designed on the collaborative filtering mechanism, using a user preference model for both historical and new resource items. The experiments are conducted on a real recommendation system with extensive users and resources. The influence of the time factor and the reference to similar users are studied through recommendation accuracy tests, showing that the most recent resources and similar users better describe the user’s preference and improve recommendation performance. Comparison with conventional approaches shows that our algorithm outperforms them.