Tag recommendation method in folksonomy based on user tagging status

Yu, Hong; Zhou, Bing; Deng, Mingyao; Hu, Feng

doi:10.1007/s10844-017-0468-1


Published: 06 June 2017

Volume 50, pages 479–500, (2018)

Journal of Intelligent Information Systems


Abstract

A folksonomy consists of three basic entities, namely users, tags and resources. This kind of social tagging system is a good way to index information, facilitate searches and navigate resources. The main objective of this paper is to present a novel method to improve the quality of tag recommendation. Through statistical analysis, we find that the total number of tags used by a user changes over time in a social tagging system. Thus, this paper introduces the concept of user tagging status, which takes one of three values: the growing status, the mature status and the dormant status. Then, an algorithm for determining user tagging status is presented, which assigns a user exactly one of the three statuses at any point in time. Finally, three corresponding strategies are developed to compute the tag probability distribution based on the statistical language model, in order to recommend the tags most likely to be used by users. Experimental results show that the proposed method achieves higher tag recommendation accuracy than the compared methods.


1 Introduction

A folksonomy is a system of classification derived from the practice and method of collaboratively creating and managing tags to annotate and categorize content; this practice is also known as social tagging, social classification, social indexing and collaborative tagging (Trant 2009). Social tagging is widely used on web sites to collect, retrieve and share information. For example, CiteULike (http://www.citeulike.org/) uses tags for sharing bibliographic references, Delicious (https://delicious.com/) uses tags for social bookmarking, Last.fm (http://www.last.fm/) uses tags for sharing music listening habits, and MovieLens (http://movielens.org/) uses tags to help users find the right movies.

These folksonomies allow users to annotate resources with their own tags, and tagging allows users to classify and find information collectively. In particular, for multimedia resources such as music, photos or videos, tagging is the only feasible way to organize the data and make it searchable. These tags can be freely chosen by a user and are not restricted to any taxonomy (Krestel and Fankhauser 2012). Many existing studies have investigated a variety of co-occurrence patterns between the entities of a folksonomy system. Unsupervised tagging brings benefits such as flexibility, quick adaptation and ease of use, but it also presents challenges; for example, the wide variety of tags assigned by users can be redundant, ambiguous or entirely idiosyncratic (Hu et al. 2012).

Tag recommendation can deal with these challenges by suggesting tags that users are most likely to use for a resource. Recommending tags can serve various purposes, such as increasing the chances of getting a resource annotated, reminding a user what a resource is about, and consolidating the vocabulary across users (Marinho et al. 2011). Furthermore, as Sood et al. (2007) pointed out, tag recommendations “fundamentally change the tagging process from generation to recognition”, which requires less cognitive effort and time. As a result, more and more researchers and Internet enterprises pay close attention to tag recommendation. Recently, scholars have put forward various tag recommendation approaches, which mainly include collaborative filtering approaches, graph-based approaches, content-based approaches and hybrid approaches. The related work is introduced in the next section.

Existing work seldom considers the fact that users’ tagging behavior changes with time. However, our statistical analysis shows that the total number of tags used by a user changes over time in a social tagging system. In this paper, we study tag recommendation by taking this phenomenon into account. We first propose three types of user tagging status, namely the growing status, the mature status and the dormant status, and devise an algorithm for determining the user tagging status. After analysing the characteristics of each user tagging status, we present three corresponding tag recommendation strategies, which compute tag probability distributions in the users’ and resources’ tag spaces based on the statistical language model. Finally, comparison experiments on the CiteULike dataset and the Last.fm dataset show that the proposed tag recommendation method is more accurate than the compared approaches, namely FolkRank (Kim and El Saddik 2011), LocalRank (Kubatz et al. 2011) and the most popular tags ρ-mix (Jäschke et al. 2008).

The remainder of the paper is structured as follows. Section 2 briefly reviews the related work. Section 3 introduces some basic concepts. In Section 4, we formalize the concept of user tagging status and present the determining user tagging status algorithm. Section 5 presents the tag recommendation method based on the different user tagging statuses. The comparative experiments on two social tagging systems are described in Section 6. Finally, conclusions and discussions are given in Section 7.

2 Related work

In recent years, scholars have put forward various tag recommendation approaches. Generally speaking, these approaches could be divided into four categories, namely the collaborative filtering approaches, the graph-based approaches, the content-based approaches and the hybrid approaches.

Collaborative filtering is a common technique used by recommender systems. Because a social tagging system involves a ternary relationship among users, resources and tags, traditional collaborative filtering methods cannot be applied directly unless the ternary relation is reduced to a lower-dimensional space. Lu et al. (2011) developed a post-based collaborative filtering framework to recommend tags based on the query user’s tagging history and the tags that have been associated with the query document, leveraging the ternary relationships. Liu et al. (2011) injected the social relations between users and the content similarities between resources into a graph representation of folksonomies, exploited random-walk computation of similarities, and combined both the collaborative information and the tag preferences to recommend tags. Wang et al. (2013) put forward a novel hierarchical Bayesian model, which extends collaborative filtering approaches to seamlessly integrate the item-tag matrix, item content information and social networks between items into the same principled model. Ma et al. (2015) proposed a recommendation approach that fuses user-generated tags and social relations in a novel way, in order to solve the data sparsity problem and improve the recommendation accuracy.

The basic idea of graph-based approaches is to construct a graph with users, resources and tags as vertices and to build edges according to users’ tagging behavior (Liu et al. 2010). This kind of method does not need to consider the content of resources or the semantic information of tags. Kim and El Saddik (2011) introduced a new way to compute the probabilistic interpretation in FolkRank by representing it as a linear combination of personalized PageRank vectors. However, one of the major disadvantages of FolkRank is its steep computational cost. In contrast to the previous graph-based algorithms, Kubatz et al. (2011) computed the rank weights of tags based only on the tag space of a given user and resource. Ramezani (2011) suggested improving the existing graph-based tag recommendation techniques by introducing a new model of the folksonomy as a directed graph. Rawashdeh et al. (2013) proposed adapting the Katz measure to social tagging systems from a graph-based perspective. Cai et al. (2016) proposed GRETA, a novel graph-based approach that assigns tags to repositories on GitHub by constructing an Entity-Tag Graph (ETG) for GitHub using domain knowledge from StackOverflow and then running a random walk algorithm. Hmimida and Kanawati (2016) proposed a graph-coarsening approach in which a community detection algorithm is applied in the diversiform networks to speed up the execution of graph-based tag recommenders in large-scale folksonomies.

The content-based approaches usually employ the content of resources and adopt machine learning techniques to recommend tags. Krestel and Fankhauser (2012) thoroughly investigated the use of language models for tag recommendation, showing that simple language models built from users and resources yield competitive performance while consuming only a fraction of the computational cost of more sophisticated algorithms. By modeling the generating process of social tagging systems with a Latent Dirichlet Allocation approach, Zhang et al. (2012) built a fully generative model for social tagging and leveraged it to estimate the relations among users, tags and resources in order to recommend tags. To learn the weights of different types of nodes and edges represented by features, Feng and Wang (2012) proposed an optimization framework that learns the best feature weights by maximizing the average area under the curve (AUC) of the tag recommender. Wu et al. (2016) proposed a generative model in which words are generated from the tag-word distribution as well as from the tag itself. Xie et al. (2016) proposed a novel generic model, SenticRank, which incorporates various kinds of sentiment information for personalized recommendation based on user profiles and other information.

Generally speaking, the hybrid approaches combine two or more kinds of tag recommendation algorithms. Gemmell et al. (2010) proposed a weighted linear hybrid incorporating simple popularity and collaborative filtering components; the success of the hybrid over the lower-dimensional components clearly demonstrates the importance of an integrative approach that exploits multiple dimensions of the data. Belém et al. (2014) proposed personalized and object-centered tag recommendation methods for Web 2.0 applications. Kim and Kim (2014) investigated association rules, bigrams, tag expansion, and implicit trust relationships for providing tag and item recommendations in a social tagging recommendation system. Wei et al. (2016) proposed a hybrid movie recommendation approach based on users’ annotation information to improve the ability of fusion and provide personalized recommendation services.

Furthermore, in the past few years, we have witnessed great advances in many perception tasks by using deep learning models. Wang and Yeung (2016) proposed a general framework for Bayesian deep learning and discussed the applications of deep learning on recommender systems, topic models and control. Wang et al. (2015) proposed a hierarchical Bayesian model called collaborative deep learning, which jointly performs deep representation learning for the content information and collaborative filtering for the ratings matrix.

However, existing work seldom considers that users’ tagging behavior changes with time. In fact, tagging behavior varies from one period to another. For example, a user might tag resources frequently during one period, occasionally during another, and seldom during a third. Thus, this paper studies tag recommendation by considering the fact that users’ tagging behavior changes with time.

3 Basic concepts

3.1 Social tagging system

Folksonomy (Vander Wal 2007), a term coined by Thomas Vander Wal in 2004, is the basic data structure of the social tagging system.

Formally, a folksonomy is a quadruple $\mathbb{F}=(U,R,Tag,Y)$, where $U=\{u_{1},\cdots,u_{k},\cdots,u_{K}\}$, $R=\{r_{1},\cdots,r_{l},\cdots,r_{L}\}$ and $Tag=\{tag_{1},\cdots,tag_{m},\cdots,tag_{M}\}$ are finite sets, whose elements are called users, resources and tags, respectively; K, L, and M are the numbers of users, resources, and tags, respectively. Y is a ternary relation among them, that is, $Y \subseteq U \times R \times Tag$.

The ternary relation Y can be transformed into three binary relations, and each binary relation can be described by a matrix. That is, the matrices $UTag_{K\times M}$, $RTag_{L\times M}$, and $UR_{K\times L}$ represent the user-tag, the resource-tag and the user-resource relations, respectively. Let the element of $UTag_{K\times M}$ be $w_{u_{k}tag_{m}}$, where $w_{u_{k}tag_{m}}$ represents the number of resources which are labeled with the tag $tag_{m}$ by the user $u_{k}$. Let the element of $RTag_{L\times M}$ be $w_{r_{l}tag_{m}}$, where $w_{r_{l}tag_{m}}$ represents the number of users who use the tag $tag_{m}$ to label the resource $r_{l}$. Let the element of $UR_{K\times L}$ be $w_{u_{k}r_{l}}$, where $w_{u_{k}r_{l}}$ represents the number of tags which the user $u_{k}$ assigns to the resource $r_{l}$.

Let $Tag_{u_{k}}$ be the set of tags used by the user $u_{k}$, and $Tag_{r_{l}}$ be the set of tags assigned to the resource $r_{l}$. Each post a of the folksonomy consists of three parts: a user $u_{k}$, a resource $r_{l}$ and all tags in $Tag(u_{k},r_{l})$. That is, $a=(u_{k},r_{l},Tag(u_{k},r_{l}))$, where $Tag(u_{k},r_{l})$ is the set of tags that the user $u_{k}$ has assigned to the resource $r_{l}$. All posts of the social tagging system constitute the post set A.
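The three weight matrices can be derived directly from the post set A. The following Python sketch illustrates this with sparse dictionaries instead of dense matrices; the toy posts, the function name `build_relations` and the assumption that each (user, resource) pair occurs in at most one post are ours, not part of the original method description.

```python
from collections import defaultdict

# Hypothetical toy post set A: one (user, resource, tags) triple per post.
posts = [
    ("u1", "r1", {"python", "ml"}),
    ("u1", "r2", {"python"}),
    ("u2", "r1", {"ml"}),
]

def build_relations(posts):
    """Return sparse versions of UTag, RTag and UR.

    Assumes each (user, resource) pair appears in at most one post.
    """
    utag = defaultdict(int)  # (user, tag) -> number of resources the user labeled with the tag
    rtag = defaultdict(int)  # (resource, tag) -> number of users who labeled the resource with the tag
    ur = {}                  # (user, resource) -> number of tags the user gave the resource
    for user, resource, tags in posts:
        for tag in tags:
            utag[(user, tag)] += 1
            rtag[(resource, tag)] += 1
        ur[(user, resource)] = len(tags)
    return dict(utag), dict(rtag), ur

utag, rtag, ur = build_relations(posts)
```

Dictionaries keyed by pairs keep the sketch short; for the dataset sizes discussed later, a sparse matrix library would be the more practical choice.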

For a given user $u_{q} \in U$ and a given resource $r_{q} \in R$ with $Tag(u_{q},r_{q})\neq \varnothing $, the task of tag recommendation is to recommend a set of tags, $\widehat {Tag}(u_{q},r_{q})$, with a tag recommendation algorithm, where $\widehat {Tag}(u_{q},r_{q}) \subseteq Tag$. In many cases, $\widehat {Tag}(u_{q},r_{q})$ is computed by generating a ranking on the set of tags according to some quality or relevance criterion, from which the top n elements are then selected and recorded in $\widehat {Tag}^{n}(u_{q},r_{q})$.

3.2 Statistical language model

The statistical language model (Ponte and Croft 1998), abbreviated as SLM, is widely used in natural language processing fields such as speech recognition, information retrieval and machine translation. Essentially, it is a probability distribution model that describes the inherent statistical and structural regularities of natural language. If the set of all strings is regarded as a language, a language model is a probability distribution over the strings of that language.

In the field of information retrieval, the basic idea of the statistical language model is to measure the relevance between a query q and a document d by the probability of generating the query from the document, i.e. $p_{LM}(q|d)={\prod }_{w\in q}p(w|d)$, where w is a word of the query, and p(w|d) is the probability of generating the word w from the document d, which is calculated as follows:

$$\begin{array}{@{}rcl@{}} p(w|d)&=&\frac {N_{d}}{N_{d} + \lambda} \times \frac{tf(w,d)} {N_{d}} \\ &&+ \left( 1 - \frac {N_{d}}{N_{d}+\lambda} \right) \times \frac{tf(w,D)}{N_{D}} , \end{array} $$

(1)

where $N_{d}$ is the length of the document d in words, $tf(w,d)$ is the frequency of the word w in the document d, $N_{D}$ is the total number of words in all the documents, $tf(w,D)$ is the frequency of the word w in all the documents, and λ is a Dirichlet smoothing factor whose value is set to be the average document length in the document set, i.e. $\lambda = N_{d}/N_{D}$.
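To make Eq. (1) concrete, the following Python sketch evaluates the smoothed probability for a tiny collection. The function name `smoothed_prob`, the toy documents and the requirement that the document is non-empty are illustrative assumptions; λ is passed in as a parameter, and the example sets it to $N_{d}/N_{D}$ as written above.

```python
from collections import Counter

def smoothed_prob(word, doc, collection, lam):
    """Eq. (1): interpolate the document estimate with the collection estimate.

    doc is a non-empty list of words; collection is a list of such documents.
    """
    n_d = len(doc)                                     # N_d
    n_D = sum(len(d) for d in collection)              # N_D
    tf_d = Counter(doc)[word]                          # tf(w, d)
    tf_D = sum(Counter(d)[word] for d in collection)   # tf(w, D)
    weight = n_d / (n_d + lam)
    return weight * tf_d / n_d + (1 - weight) * tf_D / n_D

# Hypothetical two-document collection.
docs = [["tag", "folksonomy", "tag"], ["folksonomy", "user"]]
doc = docs[0]
lam = len(doc) / sum(len(d) for d in docs)  # lambda = N_d / N_D, as in the text
probs = {w: smoothed_prob(w, doc, docs, lam) for w in ["tag", "folksonomy", "user"]}
```

Because both components are normalized, the probabilities over the vocabulary sum to one, and a word absent from the document (here "user") still receives a small non-zero probability from the collection term.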

4 User tagging status

4.1 Related definitions

We observe how the total number of tags owned by a user changes during a period of time T. Let the start moment be $T_{0}$, and take observation points at equal intervals (in the following experiments, one month is chosen as the unit of time), so that the next moment is $T_{1}$. Suppose the current moment is $T_{t}$.

The set $Tag^{u_{k}T_{t}}$ consists of the distinct tags used by the user $u_{k}$ in the unit time interval $[T_{t-1},T_{t})$. The function $f_{u_{k}}(T_{t})$ gives the number of tags used by the user $u_{k}$ in $[T_{t-1},T_{t})$, that is:

$$ f_{u_{k}}(T_{t})=|Tag^{u_{k}T_{t}}|. $$

(2)

The function $g_{u_{k}}(T_{t})$ gives the number of tags used by the user $u_{k}$ in the time interval $[T_{0},T_{t})$, that is:

$$ g_{u_{k}}(T_{t})=|\bigcup \limits_{\tau =0}^{t}Tag^{u_{k}T_{\tau}}|. $$

(3)

For the user $u_{k}$, the tags used before the moment $T_{t-1}$ are called the historical tags of the user at the moment $T_{t}$. Obviously, the number of historical tags is $g_{u_{k}}(T_{t-1})=|\bigcup \limits _{\tau =0}^{t-1}Tag^{u_{k}T_{\tau }}|$. The tags which were not used before the moment $T_{t-1}$ but are used in $[T_{t-1},T_{t})$ are called the new tags of the user. The number of new tags is $g_{u_{k}}(T_{t})-g_{u_{k}}(T_{t-1})=|\bigcup \limits _{\tau =0}^{t}Tag^{u_{k}T_{\tau }} \backslash \bigcup \limits _{\tau =0}^{t-1}Tag^{u_{k}T_{\tau }}|$.

During the time period $[T_{t-1},T_{t})$, the user $u_{k}$ may or may not tag resources. Thus, we distinguish two cases:

Case 1: the user $u_{k}$ has no tagging behavior, i.e. $f_{u_{k}}(T_{t})=0$.

Case 2: the user $u_{k}$ has tagging behavior, i.e. $f_{u_{k}}(T_{t})\neq 0$.

In Case 2, there are three possibilities:

The user $u_{k}$ uses both new tags and historical tags, i.e. $0< \frac {g_{u_{k}}(T_{t})-g_{u_{k}}(T_{t-1})}{f_{u_{k}}(T_{t})}<1$.

The user $u_{k}$ uses only new tags, i.e. $\frac {g_{u_{k}}(T_{t})-g_{u_{k}}(T_{t-1})}{f_{u_{k}}(T_{t})}=1$.

The user $u_{k}$ uses only historical tags, i.e. $g_{u_{k}}(T_{t})-g_{u_{k}}(T_{t-1}) =0$.
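The quantities f and g and the case analysis above can be illustrated with a short Python sketch. The list-based representation (one set of distinct tags per unit interval), the function names and the toy history are assumptions made for illustration only.

```python
def interval_counts(tags_per_interval):
    """Compute f and g from a list of tag sets, one set per unit time interval.

    f[t] is the number of distinct tags used in interval t, and g[t] is the
    number of distinct tags used in all intervals up to and including t.
    """
    f, g, seen = [], [], set()
    for tags in tags_per_interval:
        seen |= tags
        f.append(len(tags))
        g.append(len(seen))
    return f, g

def new_tag_ratio(f, g, t):
    """Share of new tags among the tags used in interval t.

    Returns None in Case 1 (no tagging behavior); otherwise a value in
    [0, 1]: 0 means only historical tags, 1 means only new tags.
    """
    if f[t] == 0:
        return None
    previous = g[t - 1] if t > 0 else 0
    return (g[t] - previous) / f[t]

# Hypothetical tagging history of one user over five unit intervals.
history = [{"a", "b"}, set(), {"a", "c"}, {"a"}, {"d"}]
f, g = interval_counts(history)
```

In this toy history the third interval mixes a historical tag with a new one, the fourth reuses only a historical tag, and the fifth introduces only a new tag, which covers the three possibilities of Case 2.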

During a period of time, when the total number of tags which the user owns (i.e. the number of different tags used by the user to tag resources) grows, whether slowly or rapidly, the user certainly uses new tags and possibly also uses historical tags. When the total number of tags which the user owns remains unchanged, the user either uses only historical tags or has no tagging behavior. In other words, a user’s tagging status during a period of time falls into one of three cases: in the first case, the user’s total number of tags increases rapidly; in the second case, the user’s total number of tags increases slowly; in the third case, the user has no tagging behavior. Therefore, we define these three cases as the growing status, the mature status and the dormant status, respectively.

In the period of time $T=[T_{t-\Delta t},T_{t})$, if the total number of tags which a user owns increases and the average growth rate of the total number of tags is no less than the threshold α, we say that the user is in the growing status. That is, during this period the user is quite active and adds many new tags to the social tagging system.

Definition 1

(Growing Status) Considering the period of time $[T_{t-\Delta t},T_{t})$, for a user $u_{k}$, if $\frac {g_{u_{k}}(T_{t-1})-g_{u_{k}}(T_{t-{\Delta } t})}{\Delta t}\geq \alpha $, then the user tagging status of the user $u_{k}$ at the time $T_{t}$ is the growing status.

In the period of time $T=[T_{t-\Delta t},T_{t})$, if the total number of tags which a user owns increases and the average growth rate of the total number of tags is less than the threshold α, we say that the user is in the mature status. That is, during this period the user adds a few new tags to the social tagging system and also uses many historical tags.

Definition 2

(Mature Status) Considering the period of time $[T_{t-\Delta t},T_{t})$, for a user $u_{k}$, if $\exists T_{t^{\prime }} \in [T_{t-{\Delta } t},T_{t})$ such that $f_{u_{k}}(T_{t^{\prime }})\neq 0$, and $0\leq \frac {g_{u_{k}}(T_{t-1})-g_{u_{k}}(T_{t-{\Delta } t})}{\Delta t}< \alpha $, then the user tagging status of the user $u_{k}$ at the time $T_{t}$ is the mature status.

In the period of time $T=[T_{t-\Delta t},T_{t})$, if a user has no tagging behavior and the total number of tags which the user owns remains constant, we say that the user is in the dormant status.

Definition 3

(Dormant Status) Considering the period of time $[T_{t-\Delta t},T_{t})$, for a user $u_{k}$, if $f_{u_{k}}(T_{t^{\prime }})=0$ for $\forall T_{t^{\prime }} \in [T_{t-{\Delta } t},T_{t})$, then the user tagging status of the user $u_{k}$ at the time $T_{t}$ is the dormant status.

4.2 Determining user tagging status algorithm

Suppose the current time is $T_{t}$. We can determine the user tagging status at the moment $T_{t}$ according to Definitions 1, 2 and 3, by analysing the tagging history of the user $u_{k}$ during the preceding period of length Δt.

The determining user tagging status algorithm, abbreviated as DUTS, is described in Algorithm 1. Here, $T_{0}$ is the moment when the user begins to use the social tagging system. If the user $u_{k}$ has used the social tagging system for less than Δt and has tagged recently, we consider the user to be in the growing status. Because users differ from one another, the time a user spends in each tagging status also differs. To simplify the calculation, the algorithm only looks back over a fixed period of the user’s tagging history.
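Since Algorithm 1 itself is not reproduced in the text, the following Python function is only a sketch of how Definitions 1 to 3 and the short-history rule could be combined. The function name `tagging_status`, the list indexing (`f[i]` and `g[i]` standing for $f(T_{i})$ and $g(T_{i})$), the order of the checks and the assumption α > 0 are ours.

```python
def tagging_status(f, g, t, delta_t, alpha):
    """Classify a user at time index t as 'growing', 'mature' or 'dormant'.

    f[i] and g[i] play the roles of f(T_i) and g(T_i); alpha > 0 is assumed.
    """
    start = max(t - delta_t, 0)
    window = f[start:t]            # observation points T_{t - delta_t} .. T_{t-1}
    if not any(window):
        return "dormant"           # Definition 3: no tagging in the window
    if t < delta_t:
        return "growing"           # history shorter than delta_t, with recent tagging
    rate = (g[t - 1] - g[t - delta_t]) / delta_t
    return "growing" if rate >= alpha else "mature"   # Definitions 1 and 2

# Hypothetical per-interval counts for one user over eight unit intervals.
f = [2, 0, 2, 1, 1, 0, 0, 0]
g = [2, 2, 3, 3, 4, 4, 4, 4]
```

With these toy values, `tagging_status(f, g, 5, 3, 0.5)` looks at three intervals with tagging activity but a growth rate of 1/3, so the user is classified as mature; lowering the threshold to 0.3 turns the same window into a growing status.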

5 Tag recommendation method based on user tagging status

Fig. 1 shows the framework of the tag recommendation model proposed in this paper. Once the user’s tagging status is determined, we can employ a different strategy to recommend tags for the user. Algorithm 2 describes the tag recommendation algorithm based on user tagging status, abbreviated as TR-UTS. First, the algorithm computes the tagging status of the user at the moment $T_{t}$ by using Algorithm 1. Then, the algorithm selects the tag recommendation strategy that corresponds to the user’s tagging status and calculates the tag probability distribution accordingly. Finally, the top n tags, which are the most likely to be used by the user, are recommended.

A few additional explanations of the proposed method are in order. Question 1: how to obtain the group members of the user $u_{q}$. Utilizing group information has been shown to be very helpful for improving recommendation accuracy. Since this is not the focus of this paper, we simply treat the friendship relations existing in the folksonomy as the user’s group information. For example, each user has an average of 13.443 friends in Last.fm; for CiteULike, “group id | username” information is also available. Proposing an appropriate clustering method on users, in order to further enhance the flexibility of the proposed method, is left as future work. Question 2: how to obtain the resources similar to the target resource $r_{q}$. This question is addressed in detail in Section 5.1.

5.1 The strategy for user tagging status in growing status

Consider a given user $u_{q}$ who tags a given resource $r_{q}$ at the current time while being in the growing status. This means that, during a period of time before the current time, both the number of resources tagged by the user and the total number of tags used by the user have been increasing continually. Therefore, recommendation performance can be enhanced by considering the following two kinds of tags: (1) the tags used by the target user and his/her group members; and (2) the tags used to label the target resource and its similar resources. We then compute the probability distribution of those tags with SLM in order to recommend tags. This approach not only ensures that personalized tags are recommended, but also increases the diversity of the recommended tags.

The strategy for the user tagging status in growing status, abbreviated as TR-GS, is described as follows.

Step1: according to the resource-tag matrix $RTag_{L\times M}$, calculate the cosine similarity between the resource $r_{q}$ and the other resources, and select the top S resources with the highest similarity to $r_{q}$ as the neighbor set of the resource $r_{q}$.

Let a row of the resource-tag matrix $RTag_{L\times M}$ be the vector $\textbf{r}$. Then, the similarity $sim(\textbf{r}_{l},\textbf{r}_{q})$ between $r_{l}$ and $r_{q}$ is computed as follows:

$$ sim(\textbf{r}_{l},\textbf{r}_{q})= \frac{\textbf{r}_{l} \cdot \textbf{r}_{q}}{\parallel \textbf{r}_{l} \parallel \parallel \textbf{r}_{q} \parallel}. $$

(4)
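Step1 can be sketched in Python as follows, with each row of the resource-tag matrix stored as a sparse dictionary. The function names `cosine` and `top_neighbors` and the toy rows are illustrative assumptions.

```python
import math

def cosine(a, b):
    """Eq. (4): cosine similarity of two sparse tag-weight vectors (dict tag -> weight)."""
    dot = sum(w * b.get(tag, 0) for tag, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_neighbors(rtag_rows, target, s):
    """Return the s resources most similar to `target` as (resource, similarity) pairs."""
    scores = [(r, cosine(row, rtag_rows[target]))
              for r, row in rtag_rows.items() if r != target]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:s]

# Hypothetical rows of the resource-tag matrix.
rows = {
    "r1": {"ml": 2, "python": 1},
    "r2": {"ml": 1, "python": 1},
    "r3": {"music": 3},
}
neighbors = top_neighbors(rows, "r1", 1)
```

Resources that share no tag with the target, such as "r3" here, get similarity 0 and are only selected if S exceeds the number of overlapping resources.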

Step2: considering all tags assigned to the resource $r_{q}$ and its neighbors, compute the tag probability distribution $p(tag_{m}\mid r_{q})$ according to the following equation:

$$\begin{array}{@{}rcl@{}} p(tag_{m}\mid r_{q})&=&\frac{N_{Tag_{r_{q}}}}{N_{Tag_{r_{q}}} + \lambda_{r_{q}}} \times \frac {TF(tag_{m},Tag_{r_{q}})}{N_{Tag_{r_{q}}}} \\ &&+ \left( 1- \frac{N_{Tag_{r_{q}}}}{N_{Tag_{r_{q}}} + \lambda_{r_{q}}}\right) \times \frac {TF(tag_{m},TagS)}{N_{TagS}}, \end{array} $$

(5)

where $TF(tag_{m},Tag_{r_{q}})$ is the number of users who use $tag_{m}$ to label $r_{q}$, namely, $TF(tag_{m},Tag_{r_{q}})=w_{r_{q}tag_{m}}$. $N_{Tag_{r_{q}}}$ is the sum of the weights of the tags of resource $r_{q}$. $TagS$ is the set of tags assigned to the resource $r_{q}$ and its neighbors, and for every $tag_{m} \in TagS$, its tag weight is $w^{\prime }_{r_{k}tag_{m}}=w_{r_{k}tag_{m}} \times sim(\textbf {r}_{k},\textbf {r}_{q})$. $N_{TagS}$ is the sum of the weights of the tags in the set $TagS$. $TF(tag_{m},TagS)$ is the sum of the weights of the tag $tag_{m}$ over the resource $r_{q}$ and its neighbors. $\lambda _{r_{q}}$ is interpreted as a Dirichlet smoothing factor, i.e. $\lambda _{r_{q}}=N_{Tag_{r_{q}}} / N_{TagS}$.
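A minimal Python sketch of Eq. (5) is given below. It assumes that the target resource has at least one tag, that the target itself enters the pooled set TagS with similarity 1, and that the neighbors and their similarities have already been selected; the function name `resource_tag_probs` and the toy values are hypothetical.

```python
def resource_tag_probs(target_row, neighbor_rows):
    """Eq. (5): smoothed tag distribution of a resource.

    target_row:    dict tag -> w_{r_q tag}; assumed to be non-empty
    neighbor_rows: list of (row, similarity) pairs for the neighbor resources
    """
    pooled = {tag: float(w) for tag, w in target_row.items()}  # r_q with similarity 1
    for row, sim in neighbor_rows:
        for tag, w in row.items():
            pooled[tag] = pooled.get(tag, 0.0) + w * sim       # w' = w * sim(r_k, r_q)
    n_rq = sum(target_row.values())       # N_{Tag_{r_q}}
    n_pool = sum(pooled.values())         # N_{TagS}
    lam = n_rq / n_pool                   # smoothing factor as given in the text
    weight = n_rq / (n_rq + lam)
    return {tag: weight * target_row.get(tag, 0) / n_rq
                 + (1 - weight) * pooled[tag] / n_pool
            for tag in pooled}

# Hypothetical target resource and one neighbor with similarity 0.5.
target = {"ml": 2, "python": 1}
probs = resource_tag_probs(target, [({"ml": 1, "ai": 1}, 0.5)])
```

The tag "ai" never occurs on the target resource but still receives a small probability through the neighbor, which is how the smoothing term introduces diversity into the candidate tags.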

Step3: considering all the tags used by the user $u_{q}$ and his/her group members, based on the user-tag matrix $UTag_{K\times M}$, compute the tag probability distribution $p(tag_{m}\mid u_{q})$ according to the following equation:

$$\begin{array}{@{}rcl@{}} p(tag_{m}\mid u_{q})&=&\frac{N_{Tag_{u_{q}}}}{N_{Tag_{u_{q}}} + \lambda_{u_{q}}} \times \frac {TF(tag_{m},Tag_{u_{q}})}{N_{Tag_{u_{q}}}} \\ &&+ \left( 1- \frac{N_{Tag_{u_{q}}}}{N_{Tag_{u_{q}}} + \lambda_{u_{q}}}\right) \times \frac {TF(tag_{m},Tag_{U_{q}})}{N_{Tag_{U_{q}}}}, \end{array} $$

(6)

where $N_{Tag_{u_{q}}}$ is the sum of the tag weights of the tags used by user $u_{q}$. $TF(tag_{m},Tag_{u_{q}})$ is the tag weight of $tag_{m}$ for the user, namely, $TF(tag_{m},Tag_{u_{q}})=w_{u_{q}tag_{m}}$. The set $U_{q}$ consists of user $u_{q}$ and the users in the same groups as $u_{q}$. $Tag_{U_{q}}$ is the set of tags used by the users in $U_{q}$. $N_{Tag_{U_{q}}}$ is the sum of the tag weights of the tags in the set $Tag_{U_{q}}$. $TF(tag_{m},Tag_{U_{q}})$ is the sum of the tag weights of the tag $tag_{m}$ used by the users in the set $U_{q}$. $\lambda _{u_{q}}$ is a Dirichlet smoothing factor, i.e. $\lambda _{u_{q}}=N_{Tag_{u_{q}}} / N_{Tag_{U_{q}}}$.

Step4: compute the probability $p(tag_{m}\mid u_{q},r_{q})$ that the user $u_{q}$ uses the tag $tag_{m}$ to label the resource $r_{q}$ from $p(tag_{m}\mid r_{q})$ and $p(tag_{m}\mid u_{q})$, according to the following equation:

$$\begin{array}{@{}rcl@{}} p(tag_{m}\mid u_{q},r_{q})& =&(1-\beta) \times p(tag_{m}\mid u_{q}) \\ &&+\beta \times p(tag_{m}\mid r_{q}). \end{array} $$

(7)

where β ∈ [0,1].

Step5: sort the tags according to the probability $p(tag_{m}\mid u_{q},r_{q})$, and then select the top n elements with the highest rank values to recommend to the user $u_{q}$, that is,

$$\begin{array}{@{}rcl@{}} \widehat{Tag}^{n}(u_{q},r_{q})=\max \limits_{tag_{m} \in Tag}^{n} (p(tag_{m}\mid u_{q},r_{q})). \end{array} $$
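Steps 4 and 5 amount to a weighted linear combination followed by a top-n selection. A short Python sketch is shown below, where the two input distributions are assumed to have been computed already; the function names and the toy probabilities are hypothetical.

```python
import heapq

def combine(p_user, p_resource, beta):
    """Eq. (7): (1 - beta) * p(tag | u_q) + beta * p(tag | r_q) over the union of tags."""
    tags = set(p_user) | set(p_resource)
    return {t: (1 - beta) * p_user.get(t, 0.0) + beta * p_resource.get(t, 0.0)
            for t in tags}

def top_n(scores, n):
    """Return the n tags with the highest combined probability, best first."""
    return heapq.nlargest(n, scores, key=scores.get)

# Hypothetical distributions from the user side and the resource side.
p_user = {"python": 0.6, "ml": 0.4}
p_resource = {"ml": 0.7, "ai": 0.3}
scores = combine(p_user, p_resource, beta=0.5)
recommended = top_n(scores, 2)
```

A tag missing from one of the two distributions is treated as having probability 0 on that side, so β = 0 reduces the ranking to the user side and β = 1 to the resource side.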

5.2 User tagging status in mature status

When the given user $u_{q}$ tags the given resource $r_{q}$ and the user’s tagging status at that moment is the mature status, the user’s tagging behavior has tended to be stable during the preceding period and the number of tagged resources has reached a certain level; thus, the total number of the user’s tags increases slowly. We therefore compute the tag probability distribution with SLM based on the user’s own tags and the resource’s own tags. This approach not only ensures the accuracy of tag recommendation, but also reduces the computational complexity.

The strategy for the user tagging status in mature status, abbreviated as TR-MS, is described as follows.

Step1: for $\forall tag_{m} \in Tag_{u_{q}}$, the probability $p_{u}(tag_{m}\mid u_{q})$ that the user $u_{q}$ will use $tag_{m}$ is calculated as follows:

$$ p_{u}(tag_{m}\mid u_{q})=\frac{w_{u_{q}tag_{m}}}{N_{Tag_{u_{q}}}}, $$

(8)

where, $N_{Tag_{u_{q}}}$ is the sum of tag weights of tags used by u _q, namely, $N_{Tag_{u_{q}}}=\sum \limits _{tag \in Tag_{u_{q}}} w_{u_{q}tag}$.

Step2: for $\forall tag_{m} \in Tag_{r_{q}}$, the probability $p_{r}(tag_{m}\mid r_{q})$ that the resource $r_{q}$ will be labeled with $tag_{m}$ is calculated as follows:

$$ p_{r}(tag_{m}\mid r_{q})=\frac{w_{r_{q}tag_{m}}}{N_{Tag_{r_{q}}}}, $$

(9)

where, $N_{Tag_{r_{q}}}$ is the sum of tag weights of tags labeled to r _q, namely, $N_{Tag_{r_{q}}}=\sum \limits _{tag \in Tag_{r_{q}}} w_{r_{q}tag}$.

Step3: calculate $p(tag_{m}\mid u_{q},r_{q})$, the probability that a given tag $tag_{m}$ will be used by the given user $u_{q}$ to label the given resource $r_{q}$, using a weighted linear combination of $p_{u}(tag_{m}\mid u_{q})$ and $p_{r}(tag_{m}\mid r_{q})$, as follows:

$$ \begin{array}{ll} p(tag_{m}\mid u_{q},r_{q}) =&(1-\gamma) \times p_{u}(tag_{m}\mid u_{q}) \\ &+\gamma \times p_{r}(tag_{m}\mid r_{q}), \end{array} $$

(10)

where $tag_{m} \in (Tag_{u_{q}} \cup Tag_{r_{q}})$ and $\gamma \in [0,1]$.

Step4: sort the tags according to the probability $p(tag_{m}\mid u_{q},r_{q})$, and select the top n elements to recommend to the user $u_{q}$, that is:

$$\begin{array}{@{}rcl@{}} \widehat{Tag}^{n}(u_{q},r_{q})=\max \limits_{tag_{m} \in Tag}^{n} (p(tag_{m}\mid u_{q},r_{q})). \end{array} $$
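The whole TR-MS strategy fits in a few lines of Python. The sketch below follows Eqs. (8) to (10) using the user’s and the resource’s own tag weights; the function names, the alphabetical tie-breaking and the toy weights are assumptions made for illustration.

```python
def normalize(weights):
    """Eqs. (8) and (9): divide each tag weight by the sum of all tag weights."""
    total = sum(weights.values())
    return {tag: w / total for tag, w in weights.items()} if total else {}

def recommend_mature(user_row, resource_row, gamma, n):
    """Eq. (10) followed by top-n selection.

    user_row:     dict tag -> w_{u_q tag}
    resource_row: dict tag -> w_{r_q tag}
    """
    p_u = normalize(user_row)
    p_r = normalize(resource_row)
    scores = {tag: (1 - gamma) * p_u.get(tag, 0.0) + gamma * p_r.get(tag, 0.0)
              for tag in set(p_u) | set(p_r)}
    # Sort by descending score; ties are broken alphabetically (our choice).
    return sorted(scores, key=lambda tag: (-scores[tag], tag))[:n]

# Hypothetical tag weights of a mature user and of the target resource.
user_row = {"python": 3, "ml": 1}
resource_row = {"ml": 2, "ai": 2}
recommended = recommend_mature(user_row, resource_row, gamma=0.4, n=2)
```

No neighbor resources or group members are consulted here, which reflects the lower computational cost that the text attributes to this strategy.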

5.3 User tagging status in dormant status

When the given user $u_{q}$ tags the given resource $r_{q}$ and the user’s tagging status at that moment is the dormant status, the user did not tag during the preceding period. Thus, we compute the tag probability distribution with SLM using only the tags assigned to the resource $r_{q}$ and its similar resources.

The strategy for the user tagging status in dormant status, abbreviated as TR-DS, is described as follows.

Step1: estimate the probability $p(tag_{m}\mid r_{q})$ that the tag $tag_{m}$ will be assigned to the resource $r_{q}$, using the same method as in Section 5.1.

Step2: sort the tags according to the probability $p(tag_{m}\mid r_{q})$, and then select the top n elements with the highest rank values to recommend to the user $u_{q}$, that is:

$$\begin{array}{@{}rcl@{}} \widehat{Tag}^{n}(u_{q},r_{q})=\max \limits_{tag_{m} \in Tag}^{n} (p(tag_{m}\mid r_{q})). \end{array} $$

6 Experiments

In this section, we conduct several experiments to evaluate and analyze the effectiveness and efficiency of the proposed method on two datasets. In the first set of runs, we give examples to assess the effectiveness of the DUTS algorithm. In the second set of runs, we experimentally determine the threshold values used in the TR-UTS algorithm and the most popular tags algorithm (Jäschke et al. 2008). In the third set of runs, we report results of TR-UTS, TR-GS, TR-MS and TR-DS. In the fourth set of runs, we compare the proposed TR-UTS algorithm with other algorithms, namely FolkRank (Kim and El Saddik 2011), LocalRank (Kubatz et al. 2011) and the most popular tags ρ-mix (Jäschke et al. 2008). Before reporting these experimental results, we introduce the dataset preprocessing and the evaluation criteria that we adopt.

6.1 Dataset preprocessing

Two datasets from social tagging systems are used in the experiments, namely CiteULike and Last.fm. CiteULike is a web service which allows users to save and share citations to academic papers. Users can organize their libraries with freely chosen tags, and this produces a folksonomy of academic interests. Last.fm is a music website that offers numerous social networking features and can recommend and play artists similar to the user’s favourites. Although there is no explicit information about which group a user belongs to, friendships between users exist in both folksonomies. Thus, we can find a user’s friends in the original data and put the user and his/her friends into one group.

The original datasets are too sparse to be used for experiments. Therefore, the p-core of level k algorithm (Batagelj and Zaveršnik 2011) is applied to the datasets so that every user, every resource and every tag appears at least k times in the processed datasets. The statistics after preprocessing are shown in Table 1. The first column lists the statistics reported for each dataset, the second column gives their values on CiteULike when k=30, and the third column gives their values on Last.fm when k=10.
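The pruning idea behind this preprocessing step can be sketched in Python as follows. This is a simple iterative illustration, not the cited algorithm; the function name `p_core`, the post representation and the toy data are assumptions.

```python
from collections import Counter

def p_core(posts, k):
    """Prune posts until every user, resource and tag occurs in at least k posts.

    posts is a list of (user, resource, tags) triples. Rare tags are removed
    from posts, and posts of rare users or resources (or posts left without
    tags) are dropped; the process repeats until nothing changes.
    """
    posts = [(u, r, frozenset(tags)) for u, r, tags in posts]
    while True:
        users = Counter(u for u, _, _ in posts)
        resources = Counter(r for _, r, _ in posts)
        tags = Counter(t for _, _, ts in posts for t in ts)
        kept = []
        for u, r, ts in posts:
            ts = frozenset(t for t in ts if tags[t] >= k)
            if users[u] >= k and resources[r] >= k and ts:
                kept.append((u, r, ts))
        if kept == posts:
            return kept
        posts = kept

# Hypothetical toy posts; with k = 2 only the tag "a" and users u1, u2 survive.
toy = [
    ("u1", "r1", {"a", "b"}), ("u1", "r2", {"a"}),
    ("u2", "r1", {"a"}), ("u2", "r2", {"a", "c"}),
    ("u3", "r3", {"a"}),
]
core = p_core(toy, 2)
```

The loop is needed because removing one rare entity can push another entity below the threshold k, so a single filtering pass is generally not enough.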

Table 1 Some information of datasets after preprocessing
