
1 Introduction

Social network sites (SNSs) such as Facebook, Twitter, and Sina Weibo have attracted millions of users since their introduction. SNSs play a central role in "Web 2.0", generating enormous amounts of user-generated content (UGC). However, the lack of centralized organization and structure makes it very difficult to mine the latent value in this content and to analyze the social networks built on it [1].

Tagging is an effective way to address this problem by encoding the interests and characteristics of contents and users. Tags play an important role in tag-based social interest discovery [2], making UGC accessible to search engines, recommender systems, and advertising systems. Social tagging [3, 4] has emerged as an effective way to alleviate some of these challenges, and many studies have focused on tag generation and recommendation. There are two general approaches to generating tags for items:

  1. Social Tagging Systems allow users to create and manage tags of social items. The idea is simple: users can select any meaningful tags or keywords to encode the characteristic attributes of items.

  2. Automatic Tag Recommendation has been proposed in multiple studies [5–8]. Given an item, the task of automated tag recommendation is to suggest the most relevant tags for its contents using machine learning algorithms.

However, each of these approaches has its own drawbacks. Manually assigned tags are not restricted to a fixed vocabulary, so users can pick any tags they prefer to describe a resource. As a result, these tags can be inconsistent, trivial, or false, due to users' personal terminology, and they depend heavily on the users' choices [10]. Automatically generated tags, on the other hand, mostly assume pre-existing tags or text descriptions of the items. We therefore cannot expect these approaches to perform well when applied to social sites or systems that are not built around tags and text descriptions.

This paper aims to improve the quality of tag recommendation by overcoming the main weakness of automatic approaches: their heavy dependence on the presence of text and pre-existing tags. To achieve this, we combine LDA with the social graph to infer tags over large groups of users. The social graph implies that people are connected by common interests: users with a common interest group together automatically, forming clusters, as illustrated in Fig. 43.1. Our approach tries to learn users' common interests instead of analyzing the basic text descriptions of the users, which leads to more accurate and precise tags.

Fig. 43.1
Users cluster around common interests on social sites

The rest of this paper is organized as follows. In Sect. 43.2 we introduce the LDA algorithm and then present our approach. In Sect. 43.3 we report experiments on real data from Sina Weibo and compare our approach with the baseline approaches. Section 43.4 concludes the paper.

2 Tag Recommendation

In this section, we first introduce LDA, and then explain how our approach extracts tags by combining LDA with the follower graph.

2.1 Latent Dirichlet Allocation

Latent Dirichlet Allocation (LDA) is a probabilistic latent topic model introduced by Blei et al. in 2003 [9]. The general idea of LDA is based on the assumption that a person writes a document with certain topics in mind: first some topics are picked for the document, and then words are picked from the vocabulary with a certain probability under each topic. By repeating this step, a document is finally generated as a mixture of topics, each represented by a collection of words. LDA does not consider the order of the words, so different orderings of the same words are assigned the same probability; this is known as the "bag-of-words" assumption [11].

In this model, the topic is the added latent variable. Through topics, LDA helps explain the similarity of words by grouping them into different topics. The modeling process of LDA can be described as finding a mixture of topics for each resource. If T topics are specified, the probability of the ith word w_i can be formalized as follows:

$$ p(w_{i} \mid d) = \sum\limits_{j = 1}^{T} p(w_{i} \mid z_{j})\,p(z_{j} \mid d) $$
(43.1)

where w_i represents the ith word in the vocabulary, d represents the document, and z_j represents the jth latent topic. P(w_i | d) is the probability of the ith word in the given document d. P(w_i | z_j) is the probability of the ith word under topic z_j, and P(z_j | d) is the probability of picking topic z_j in document d. LDA uses the latent topic variables to link words and documents. Figure 43.2 shows the plate graph representation of LDA.

Fig. 43.2
Plate graph of latent Dirichlet allocation
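
To make the mixture in Eq. (43.1) concrete, the following minimal Python sketch computes p(w_i | d) as a weighted sum over topics; the probability values are made up purely for illustration.

import numpy as np

# Toy illustration of Eq. (43.1): p(w_i | d) = sum_j p(w_i | z_j) p(z_j | d).
# The probabilities below are made-up values for illustration only.
p_w_given_z = np.array([0.10, 0.02, 0.30])  # p(w_i | z_j) for T = 3 topics
p_z_given_d = np.array([0.50, 0.30, 0.20])  # p(z_j | d) for one document d

p_w_given_d = float(np.sum(p_w_given_z * p_z_given_d))
print(p_w_given_d)  # 0.116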

The probability of the latent topics p(z | d) follows a multinomial distribution with parameter θ, which has a Dirichlet distribution with parameter α as its prior. The probability of a word in a topic p(w | z) follows a multinomial distribution with parameter φ, which has a Dirichlet distribution with parameter β as its prior. With this notation, the text generation process can be described as follows (a minimal code sketch is given after the list):

  1. For all aspects k, sample φ_k ~ Dir(β)

  2. For all entities e, sample θ_e ~ Dir(α)

  3. For each term-slot in d_e

    a. Sample an aspect z_j ~ Mult(θ_e)

    b. Sample a term w_i = w ~ Mult(φ_{z_j})
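
The following minimal Python sketch mirrors this generative process; the vocabulary size, number of aspects, document lengths, and hyperparameter values are toy assumptions, not the settings used in the paper.

import numpy as np

# Minimal sketch of the LDA generative process described above (toy sizes).
rng = np.random.default_rng(0)

K, V = 4, 50            # number of aspects (topics) and vocabulary size
alpha, beta = 0.1, 0.01
doc_lengths = [20, 35, 15]

phi = rng.dirichlet([beta] * V, size=K)        # step 1: phi_k ~ Dir(beta) for each aspect k
docs = []
for n_terms in doc_lengths:
    theta = rng.dirichlet([alpha] * K)         # step 2: theta_e ~ Dir(alpha) for each entity e
    words = []
    for _ in range(n_terms):                   # step 3: for each term-slot in d_e
        z = rng.choice(K, p=theta)             #   (a) sample an aspect z_j ~ Mult(theta_e)
        w = rng.choice(V, p=phi[z])            #   (b) sample a term w ~ Mult(phi_{z_j})
        words.append(w)
    docs.append(words)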

The parameters Θ and Φ can then be learned by various inference methods, such as the EM algorithm and Gibbs sampling. Compared to EM, Gibbs sampling guarantees better convergence, so we choose Gibbs sampling as the training method for LDA. It iterates multiple times over each word w_i in each document and samples a new topic j for the word according to the conditional probability P(z_j | w_i, w_{−i}, z_{−j}), until the LDA model parameters converge.

$$ P(z_{j} \mid w_{i}, w_{-i}, z_{-j}) \propto \frac{n_{j|e,-i} + \alpha}{\left| d_{e,-i} \right| + \alpha |K|} \cdot \frac{n_{w|j,-i}}{n_{\cdot|j,-i}} $$
(43.2)

where n_{j|e,−i} is the number of times aspect j is observed for entity e, n_{w|j,−i} is the number of times word w is sampled from aspect j, |d_{e,−i}| is the number of word occurrences associated with e, and n_{·|j,−i} is the total number of words generated from aspect j, all excluding the current assignment. After learning the Dirichlet distributions Θ and Φ, the posterior probabilities p(w | z) and p(z | d) can be computed.
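
A minimal sketch of a single collapsed Gibbs update following Eq. (43.2) is shown below; the count arrays, their shapes, and the guard against an all-zero distribution are assumptions made for illustration, and smoothing of the word-aspect factor is omitted to mirror the formula as written.

import numpy as np

# Sketch of one collapsed Gibbs sampling update for Eq. (43.2).
# n_je[e, j]: times aspect j is observed for entity e
# n_wj[w, j]: times word w is sampled from aspect j
# n_j[j]:     total words generated from aspect j
# d_len[e]:   number of word occurrences associated with entity e
def resample_topic(w, e, old_z, n_je, n_wj, n_j, d_len, alpha, K):
    # remove the current assignment (the "-i" counts)
    n_je[e, old_z] -= 1
    n_wj[w, old_z] -= 1
    n_j[old_z] -= 1

    left = (n_je[e] + alpha) / (d_len[e] - 1 + alpha * K)   # (n_{j|e,-i} + alpha) / (|d_{e,-i}| + alpha|K|)
    right = n_wj[w] / np.maximum(n_j, 1)                    # n_{w|j,-i} / n_{.|j,-i}
    p = left * right + 1e-12                                # tiny constant avoids an all-zero distribution
    p /= p.sum()

    new_z = np.random.choice(K, p=p)                        # draw the new aspect assignment
    n_je[e, new_z] += 1                                     # add the new assignment back
    n_wj[w, new_z] += 1
    n_j[new_z] += 1
    return new_z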

2.2 Tag Recommendation Based on the Social Graph

In this paper, we focus on a microblogging service in which links are one-directional, so the social graph is the follower graph. As Fig. 43.1 shows, people on social sites cluster around common interests [13], and other studies have also taken advantage of social network structure [1, 12]. Our approach does not recommend tags from the text descriptions of users; instead, it exploits the latent information in the social graph. A user generally follows somebody because they share common interests. Our approach therefore adds "common interest" as the latent variable in the social graph, and involves two steps to generate tags for a user:

  1. Learn users' interest probability distributions.

  2. Recommend tags from the interest probability distributions.

2.2.1 Using LDA to Learn Users' Interests

As introduced in Sect. 43.2.1, LDA is a topic model mostly used for text categorization or tag extraction on documents. With the same idea, we apply LDA to the social graph to infer users' interest distributions. Whereas LDA adopts the "word → topic → document" pattern to generate documents, in the social graph we can treat each follower as a "word", each common interest as a "topic", and the followed user as the "document". The follower graph can then be generated with the pattern "follower → common interest → followed user". If M common interests are given, the probability that the ith user u_i follows the target user u is:

$$ P(u_{i} \mid u) = \sum\limits_{j = 1}^{M} P(u_{i} \mid I_{j})\,P(I_{j} \mid u) $$
(43.3)

Similar to the original LDA notation, u_i represents the ith user, and P(u_i | u) indicates the probability that u_i follows u. I_j is the latent variable representing the jth interest. P(u_i | I_j) represents the probability of follower u_i under interest I_j, while P(I_j | u) represents the probability that I_j is the followed user's interest. We can then learn the user's interest distribution with the same approach introduced in Sect. 43.2.1.
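
As a concrete illustration, the sketch below builds a followed-user-by-follower count matrix from a toy edge list and fits an off-the-shelf LDA implementation to it. The edge list, variable names, and the use of scikit-learn's LatentDirichletAllocation are assumptions rather than the paper's exact implementation, and the toy uses a tiny number of interests, whereas Sect. 43.3 uses 100.

from scipy.sparse import csr_matrix
from sklearn.decomposition import LatentDirichletAllocation

# Sketch of Sect. 43.2.2.1: each followed user is a "document" whose "words" are follower ids.
edges = [("f1", "a"), ("f2", "a"), ("f1", "b"), ("f3", "b"), ("f2", "b")]  # (follower, followed) toy data

followed = sorted({v for _, v in edges})
followers = sorted({u for u, _ in edges})
row = {u: i for i, u in enumerate(followed)}
col = {u: i for i, u in enumerate(followers)}

rows = [row[v] for _, v in edges]
cols = [col[u] for u, _ in edges]
X = csr_matrix(([1] * len(edges), (rows, cols)), shape=(len(followed), len(followers)))

M = 2  # number of latent common interests (toy value; the experiments use 100)
lda = LatentDirichletAllocation(n_components=M, random_state=0)
theta = lda.fit_transform(X)                                          # rows approximate P(I_j | u)
phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)    # rows approximate P(u_i | I_j)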

2.2.2 From Interest to Tags

After learning the users' interest distributions, we can use them to extract the actual tags of users, represented in the form of words. The probability that word t is a tag of user u can be formalized as follows:

$$ P(t \mid u) = \sum\limits_{j = 1}^{M} P(t \mid I_{j})\,P(I_{j} \mid u) $$
(43.4)
$$ P(t \mid I_{j}) = \sum\limits_{i = 1}^{N} P(t \mid u_{i})\,P(u_{i} \mid I_{j}) $$
(43.5)

where P(t | u) is the probability that word t is a tag of user u, and P(t | I_j) is the probability of word t under interest I_j. We can break P(t | I_j) into P(t | u_i) and P(u_i | I_j). Because P(I_j | u) and P(u_i | I_j) have already been learnt during the process in Sect. 43.2.2.1, they are known quantities. What remains is to learn P(t | u_i), the probability that a word is a tag of follower u_i.

This looks like a recursive routine in which we have to figure out p(t | u_i) first in order to get p(t | u). However, we can define p(t | u_i) as the "basic tag" probability distribution of the followers, which does not involve the followed user's distribution p(t | u). Since p(t | u_i) is entirely separate from p(t | u), we can use other approaches to generate the "basic tags" of the followers and then use the formulas above to obtain the followed user's tags. In this paper, we adopt the method introduced in [8] to generate the basic tags of followers.
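
The sketch below works through Eqs. (43.4) and (43.5) as two matrix products over toy distributions; the array names and shapes are assumptions, with basic_tags standing in for the followers' "basic tag" distributions produced by the method of [8].

import numpy as np

# Toy illustration of Eqs. (43.4) and (43.5) as matrix products.
rng = np.random.default_rng(0)
N, M, V = 5, 3, 10  # followers, interests, tag vocabulary (toy sizes)

basic_tags = rng.dirichlet(np.ones(V), size=N)                   # rows ~ P(t | u_i), the followers' basic tags
p_follower_given_interest = rng.dirichlet(np.ones(N), size=M)    # rows ~ P(u_i | I_j)
p_interest_given_user = rng.dirichlet(np.ones(M))                # ~ P(I_j | u) for the followed user u

p_tag_given_interest = p_follower_given_interest @ basic_tags    # Eq. (43.5): P(t | I_j)
p_tag_given_user = p_interest_given_user @ p_tag_given_interest  # Eq. (43.4): P(t | u)

top_tags = np.argsort(p_tag_given_user)[::-1][:5]                # indices of the highest-probability tag words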

3 Experiments

In this section, we illustrate the efficacy of our approach with experiments on real data crawled from Sina Weibo. The results show that our approach outperforms all the baseline methods. We discuss the experiments in detail below.

3.1 Datasets

We chose data from Sina Weibo, the most popular microblogging service in China. A user can follow anyone they like, such as their favorite movie stars and singers, and can in turn be followed by anyone fond of them. In our experiments, we built a crawler that traverses the network through the open platform of Sina Weibo. In order to get the text associated with each user, we collected each user's tweets from April to October 2012.

3.2 Baselines

We chose two classic tag recommendation methods: TFIDF and LDA-Tag Recommendation.

  1. TFIDF is a simple method for obtaining tags from the text associated with entities. It treats such text as documents, uses the TFIDF method to score all the words, and then chooses the highest-scoring words as the tags for each entity (a sketch is given after this list).

  2. LDA-Tag Recommendation treats each tweet of a user as a separate document [8]. LDA is then used to infer user-topic and topic-word probabilities, and tags are likewise generated from the highest-scoring words.
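
A minimal sketch of the TFIDF baseline is shown below, assuming each user's tweets are concatenated into a single document; the example texts and the use of scikit-learn's TfidfVectorizer are assumptions for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer

# Sketch of the TFIDF baseline: score words per user and keep the top-scoring ones as tags.
user_docs = {
    "user_a": "topic model lda tag recommendation weibo follower graph",
    "user_b": "movie star film premiere box office weibo",
}

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(user_docs.values())
vocab = vectorizer.get_feature_names_out()

top_k = 3
for user, row in zip(user_docs, tfidf):
    scores = row.toarray().ravel()
    tags = [vocab[i] for i in scores.argsort()[::-1][:top_k]]
    print(user, tags)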

3.3 Evaluation

In this experiment, we set the number of user interests to 100. The performance of the different algorithms is evaluated by the average fraction of wins. We present the results from two views: followers count and tweets count. In the followers-count view, we separated users into 8 groups by followers count ([0–100], [100–1000], etc.) to test the performance of each algorithm on users of different popularity. In the tweets-count view, we separated users into 7 groups by tweets count ([0–100], [100–200], etc.) to test the performance for users with different activity levels on social sites. We then randomly selected 50 users from each group as samples; a sketch of this grouping is given below.
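
The grouping and sampling step can be sketched as follows. Only the first boundaries of the followers-count view are given in the text, so the remaining bucket edges in this example are hypothetical placeholders, and the `users` list is a toy stand-in for the crawled data.

import random
from collections import defaultdict

# Hypothetical follower-count boundaries: only [0-100] and [100-1000] come from the text;
# the remaining edges are placeholders to form 8 groups.
boundaries = [0, 100, 1_000, 5_000, 10_000, 50_000, 100_000, 500_000, float("inf")]

def bucket(count):
    for i in range(len(boundaries) - 1):
        if boundaries[i] <= count < boundaries[i + 1]:
            return i
    return len(boundaries) - 2

random.seed(0)
users = [("u%d" % i, random.randint(0, 1_000_000)) for i in range(500)]  # toy stand-in for the crawled users

groups = defaultdict(list)
for user_id, followers_count in users:
    groups[bucket(followers_count)].append(user_id)

samples = {g: random.sample(ids, min(50, len(ids))) for g, ids in groups.items()}  # 50 users per group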

We then showed the tag sets produced by each approach to two annotators, who were asked to pick the approach with the best tag set. For each group, we report the average fraction of wins. Figure 43.3 presents the precision and recall grouped by followers count for each approach, and Fig. 43.4 presents the precision and recall grouped by tweets count.

Fig. 43.3
Precision and recall grouped by number of followers

Fig. 43.4
Precision and recall grouped by number of tweets

It is clear that our approach outperforms all the baseline approaches, as expected. However, there is a drop in precision when the followers count exceeds 500 K in Fig. 43.3. For users with very many followers, it is hard to determine the actual interest that attracts followers: some users follow simply because their friends do, or because the followed user is so famous that people follow without any obvious common interest. Figure 43.4 shows that our approach varies little across different tweet counts, while the baseline approaches become more precise as the tweet count increases. This is because our approach is based on the social graph rather than the users' text, so it is much more stable when the quality of the text itself is unpredictable.

4 Conclusion

In this paper, we investigated the use of Latent Dirichlet Allocation on the social graph to recommend tags. Our experiments show that this approach outperforms the TFIDF and LDA-Tag Recommendation approaches. It not only recommends more precise and appropriate tags for users, but also works more stably across a range of text qualities. The main contribution of this paper is combining the latent value of the social graph with LDA to achieve better tag recommendation.