1 Introduction

On Twitter, automatically suggesting the right hashtag enables users to quickly join a discussion and read tweets written by other users. Currently, Twitter recommends only trends, the most popular contemporary hashtags among all users. Hashtag recommendation has become an active area of research. Most hashtag recommendation systems suggest the top-k hashtags most relevant to the user's query [2, 4, 7, 8, 11, 12, 19]. Hashtag recommendation algorithms can be broadly classified into two categories: personalized [7, 8] and non-personalized [2, 4, 11, 12, 19] systems. Personalized recommendation systems [14] take into account the user's preferences, activities and constraints, whereas non-personalized systems use data from all users. Hashtag recommendation benefits two parties: the user and Twitter. Users save time and effort when personalized hashtags are recommended automatically, and the quality of Twitter discussions improves when the hashtags used are more accurate. Accurate recommendations also help Twitter eliminate insignificant and noisy hashtags.

Twitter has three main components: users, hashtags and tweet content. Connections between users reflect their relationships (e.g., family members) or their similarity in profession or interest, and communities can be detected by analysing these connections as a network. Individual users have preferred hashtags when they tweet; a user's hashtag preference is the set of all hashtags he/she has previously used [8]. Users also have topic preferences [20]. Textual features are collections of words extracted from tweets and are therefore related to tweet content. Hashtag-related features include popularity, relevance, recency and the number of authors who have adopted a given hashtag, and some of these features are used as ranking methods in hashtag recommendation systems. Previous research in hashtag recommendation combined these components or their related features in different ways to design their models. Some studies used clusters of similar users [20], the set of mentioned users [10] or the set of followees [7] to find candidate hashtags. However, none of these studies examined the effect of these factors on communities detected from real-world networks in the context of hashtag recommendation. Since community detection algorithms identify densely interconnected groups of users, it is worth investigating how network communities affect the performance of hashtag recommendation.

The focus of this paper is to investigate the impact of social factors, incorporated with tweet texts, on hashtag recommendation within detected communities. We also investigate the effect of user profiling based on hashtag preference. Our research questions are: Which factor most influences hashtag recommendation on detected communities? Do the community detection algorithm and the size of the community affect hashtag recommendation performance? To the best of our knowledge, this is the first study of hashtag recommendation on detected communities. We adopt the Breadth-First Search (BFS) algorithm and the Clique Percolation Method (CPM) to detect communities, and we use hit rate as the evaluation measure to compare the performance of these factors. In addition, we compare the performance of several popularity- and relevance-based ranking methods used in hashtag recommendation.

Structure of the Paper: This paper is organized as follows: Sect. 2 discusses previous work directly relevant to our research. Section 3 describes the dataset and methodology used in this research. Section 4 explains the experiments conducted. Section 5 reports and discusses the experimental results. Section 6 concludes the paper and outlines our future work.

2 Related Work

Our analysis is built on two lines of research: Community detection from the Twitter real-world network and Twitter hashtag recommendation.

Community Detection from the Twitter Real-World Network. Algorithms for detecting communities in real-world social networks focus mainly on the connections between users and the strength of these connections [1, 13, 16]. These algorithms group users of the network into communities; two such approaches are Breadth-First Search (BFS) and the Clique Percolation Method (CPM). BFS traverses the graph of users: it starts from a root user, then visits that user's followees, then the next level of followees, and so on. CPM finds overlapping communities of highly connected users [1]. It enumerates all k-cliques, i.e., sets of k nodes that are all connected to each other. Two k-cliques are considered adjacent when they share \(k-1\) nodes, and a community is the union of all k-cliques that can be reached from one another through a chain of adjacent k-cliques. Wagenseller et al. [6] used community size, coverage, modularity, participation ratio and user interests to compare different community detection algorithms. They also assessed the quality of the detected communities using a similarity score between users' interests, where a user's interest was represented by his/her top-10 most frequent hashtags. They reported that the relationships between users were poor when this method was used.
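To make the clique-percolation step concrete, the following is a minimal, brute-force sketch of CPM in pure Python. It assumes that follow relationships are treated as undirected edges and it enumerates k-cliques exhaustively, so it is only suitable for toy graphs; the function names are hypothetical. For real networks, NetworkX provides `k_clique_communities` in `networkx.algorithms.community`.

```python
from itertools import combinations


def to_undirected_adj(edges):
    """Build an undirected adjacency map; follow direction is ignored (assumption)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_cliques(adj, k):
    """Enumerate every k-clique by brute force (exponential; toy graphs only)."""
    nodes = sorted(adj)
    return [
        frozenset(group)
        for group in combinations(nodes, k)
        if all(b in adj[a] for a, b in combinations(group, 2))
    ]


def cpm_communities(adj, k):
    """Clique Percolation Method: merge k-cliques linked by (k-1)-node overlaps."""
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))  # union-find over clique indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Two distinct k-cliques are adjacent when they share k-1 nodes.
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # Each chain of adjacent cliques forms one community (the union of its nodes).
    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return [frozenset(c) for c in communities.values()]


edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
communities = cpm_communities(to_undirected_adj(edges), k=3)
print(sorted(sorted(c) for c in communities))  # [[1, 2, 3, 4], [4, 5, 6]]
```

Node 4 ends up in both communities because the triangles {2, 3, 4} and {4, 5, 6} share only one node, which illustrates why CPM communities can overlap.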

Twitter Hashtag Recommendation. The textual factor has been studied in the literature and shown to be significant in hashtag recommendation. Mazzia et al. [11] used the Naive Bayes algorithm to recommend hashtags. Dovgopol et al. [2] built a hybrid hashtag recommendation model combining K-Nearest Neighbour and Naive Bayes. Zangerle et al. [19] built their hashtag recommendation model on the textual similarity between tweets. They weighted the words in tweets using TF-IDF and compared four similarity measures: cosine similarity, the Jaccard coefficient, the Dice coefficient and the Levenshtein distance. They found that cosine similarity performed best.
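The four measures compared by Zangerle et al. [19] can be illustrated on two tokenized tweets. This sketch reflects our assumptions about how they might be applied: Jaccard and Dice on word sets, cosine on raw term-count vectors (TF-IDF weighting omitted for brevity), and Levenshtein as a character-level edit distance. Function names and example tweets are hypothetical.

```python
import math
from collections import Counter


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| over word sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def dice(a, b):
    """2|A ∩ B| / (|A| + |B|) over word sets."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0


def cosine(a, b):
    """Cosine of the angle between raw term-count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cs != ct)))   # substitution
        prev = curr
    return prev[-1]


q = "new phone launch today".split()
d = "phone launch event today today".split()
print(round(jaccard(q, d), 3), round(dice(q, d), 3), round(cosine(q, d), 3))  # 0.6 0.75 0.756
print(levenshtein("kitten", "sitting"))  # 3
```

Unlike the set-based Jaccard and Dice coefficients, cosine similarity uses term counts, so the repeated word "today" raises the score.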

User profiling [14] infers a user's interests, activities, preferences and behaviours; user profiling (or user-based recommendation) is used to find similar users. In an early study, user biographies were analysed for user classification [15]. However, it is difficult to rely entirely on this information, as not all users provide an accurate biography. Some research has incorporated user profiling into hashtag recommendation: Zhao et al. [20] used the user's topic preference and Kywe et al. [8] used the user's hashtag preference to find similar users. Candidate hashtags are then extracted from the set of similar users, ranked and recommended.

In hashtag recommendation systems, candidate hashtags can be ranked by their popularity, relevance or recency. These ranking methods are defined as follows:

Tweet Hashtag Popularity:

Yang et al. [18] defined popularity as the number of times a hashtag has been adopted in previous tweets.

User Hashtag Popularity:

The popularity of a hashtag is measured by the number of authors (users) who have adopted the hashtag at least once [8, 18].

Global Hashtag Popularity:

The hashtag frequency is calculated over the whole dataset [19].

Tweet Hashtag Relevance:

The closeness of a hashtag to the user or to the tweet content [19]. Hashtags that appear in the tweet with the highest similarity score to the user's query tweet are considered the most relevant hashtags [19].

Recency of the Hashtag:

The age (in days) of a hashtag since it was most recently used by the user [5].

From the above definitions, we can see that some of them are general while the others are personalized. Ranking based on hashtag popularity is sometimes called ranking by frequency [8, 19].
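The following is a minimal sketch of how the four ranking methods could score the same pool of candidate hashtags. It rests on our own interpretation, not on [18, 19]: THP counts the retrieved tweets that use a hashtag, UHP counts the distinct authors among those tweets, GHP looks up dataset-wide counts, and THR takes the highest similarity of any retrieved tweet carrying the hashtag. The data layout, toy numbers and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Each retrieved tweet: (author_id, similarity_to_query, hashtags). Hypothetical layout.
retrieved = [
    ("alice", 0.82, ["ff", "news"]),
    ("bob", 0.64, ["news"]),
    ("bob", 0.40, ["news", "sxsw"]),
    ("carol", 0.35, ["sxsw"]),
]
# Toy hashtag counts over the whole training set (used by GHP).
global_counts = Counter({"news": 40, "ff": 25, "sxsw": 10})


def thp(tweets):
    """Tweet Hashtag Popularity: number of retrieved tweets using each hashtag."""
    return Counter(h for _, _, tags in tweets for h in set(tags))


def uhp(tweets):
    """User Hashtag Popularity: number of distinct authors using each hashtag."""
    authors = defaultdict(set)
    for author, _, tags in tweets:
        for h in tags:
            authors[h].add(author)
    return Counter({h: len(a) for h, a in authors.items()})


def ghp(tweets, counts):
    """Global Hashtag Popularity: dataset-wide frequency of each candidate."""
    return Counter({h: counts[h] for _, _, tags in tweets for h in tags})


def thr(tweets):
    """Tweet Hashtag Relevance: highest similarity of any tweet carrying the hashtag."""
    best = {}
    for _, sim, tags in tweets:
        for h in tags:
            best[h] = max(best.get(h, 0.0), sim)
    return best


def top_k(scores, k):
    """Rank candidates by descending score; ties broken alphabetically."""
    return [h for h, _ in sorted(scores.items(), key=lambda x: (-x[1], x[0]))[:k]]


for name, scores in [("THP", thp(retrieved)), ("UHP", uhp(retrieved)),
                     ("GHP", ghp(retrieved, global_counts)), ("THR", thr(retrieved))]:
    print(name, top_k(scores, 2))
```

The four methods disagree even on this tiny pool. For example, THR ranks "ff" first because it appears in the most similar tweet, whereas THP prefers "news" because it appears in more retrieved tweets.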

3 Methods

This section describes the baseline methods, dataset, experimental settings and evaluation metrics used to study the effect of the textual, social and user profiling factors on hashtag recommendation in detected communities. Of the ranking methods introduced in the previous section, we focus on analysing the performance of Tweet Hashtag Relevance (THR), Tweet Hashtag Popularity (THP), User Hashtag Popularity (UHP) and Global Hashtag Popularity (GHP).

3.1 Baseline Approaches

We use two baseline methods in our experiments: hashtag recommendation based on the textual factor and hashtag recommendation based on user profiling.

Hashtag Recommendation Based on Textual Factor. In Zangerle et al.'s [19] model, tweet feature vectors are created using TF-IDF, and cosine similarity is used to retrieve the top-500 tweets most similar to the query tweet. Candidate hashtags are extracted from this set of similar tweets and ranked, and the top-5 and top-10 hashtags are recommended. Table 1 reviews Zangerle et al.'s results, which show the contribution of the textual factor to hashtag recommendation.

Table 1. Previous research results from [19]
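The textual baseline can be sketched in pure Python as TF-IDF weighting followed by cosine-similarity retrieval and candidate extraction. This assumes a plain tf × log(N/df) weighting; the exact TF-IDF variant used in [19] is not given here, and scikit-learn's `TfidfVectorizer` with `sklearn.metrics.pairwise.cosine_similarity` would be the usual production choice. Function names and the toy corpus are hypothetical, and top-n is 2 instead of 500 to fit the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Weight each token by tf * log(N / df); return sparse vectors and the idf table."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / c) for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine_sparse(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similar_tweets(query, train_tokens, top_n):
    """Return the top_n (similarity, index) pairs, most similar first."""
    vectors, idf = tfidf_vectors(train_tokens)
    # Query terms never seen in training get zero weight.
    qvec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scored = sorted(((cosine_sparse(qvec, v), i) for i, v in enumerate(vectors)),
                    reverse=True)
    return scored[:top_n]


# Toy training set of (text, hashtags) pairs.
train = [
    ("new phone launch today", ["tech"]),
    ("phone battery review", ["tech", "review"]),
    ("election debate tonight", ["politics"]),
]
tokens = [text.split() for text, _ in train]
top = similar_tweets("phone launch event".split(), tokens, top_n=2)
candidates = [h for _, i in top for h in train[i][1]]
print([i for _, i in top], candidates)  # [0, 1] ['tech', 'tech', 'review']
```

The candidate list keeps duplicates on purpose so that a popularity-based ranking can count them afterwards.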

Hashtag Recommendation Based on User Profiling. Kywe et al.'s [8] model is our second baseline method. A user's feature vector consists of his/her historical hashtags, with repeated hashtags counted. All extracted hashtags are weighted using TF-IDF, and cosine similarity is then used to measure the similarity between users. All hashtags are then extracted from the tweets of the similar users. In Kywe et al.'s model, when the hashtags extracted from similar users are combined with those extracted from similar tweets, the hit rate is 31.56% when the top-5 hashtags are recommended and 37.19% when the top-10 hashtags are recommended.
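The following is a minimal sketch of this user-profiling step, with each user's hashtag history treated as a document. The assumptions are ours: the same tf × log(N/df) weighting, cosine similarity between profiles, the query user excluded from their own neighbours, and the similar users' hashtag histories standing in for hashtags extracted from their tweets. Names and toy data are hypothetical.

```python
import math
from collections import Counter


def profile_vectors(histories):
    """TF-IDF-weighted hashtag profiles; repeated hashtags raise the term frequency."""
    n = len(histories)
    df = Counter(h for tags in histories.values() for h in set(tags))
    return {
        user: {h: tf * math.log(n / df[h]) for h, tf in Counter(tags).items()}
        for user, tags in histories.items()
    }


def cosine_sparse(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similar_users(target, histories, top_m):
    """Rank all other users by profile similarity to target and keep the top_m."""
    vectors = profile_vectors(histories)
    scored = sorted(
        ((cosine_sparse(vectors[target], vec), user)
         for user, vec in vectors.items() if user != target),
        reverse=True,
    )
    return [user for _, user in scored[:top_m]]


# Toy hashtag histories (duplicates kept on purpose).
histories = {
    "alice": ["ff", "news", "news", "tcot"],
    "bob": ["news", "tcot", "tcot"],
    "carol": ["sxsw", "fb"],
    "dave": ["ff", "fb"],
}
neighbours = similar_users("alice", histories, top_m=2)
candidates = Counter(h for user in neighbours for h in histories[user])
print(neighbours)  # ['bob', 'dave']
print(candidates)
```

Because duplicates are kept, "tcot" receives a higher term frequency in bob's profile and a count of 2 in the candidate pool.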

3.2 Datasets and Pre-processing

The dataset we use is Dataset-UDI-TwitterCrawl-Aug2012 [9], collected by Li et al. between 2011 and 2012, although users' personal timelines include tweets issued from 2008 to 2012. The dataset contains 200 million user following relationships, 3 million user profiles and 50 million tweets from 140,000 users. Each tweet is annotated with its author name, its issue date and other data, and each user's personal timeline contains at most 500 tweets. Due to hardware constraints, our sub-network consists of 745,262 users and 2 million user relationships, which is what our machine with 32 GB of RAM could just handle when the number of adjacent nodes k is set to 2. To study the impact of the community detection algorithm on hashtag recommendation, we adopted the Breadth-First Search (BFS) algorithm and the Clique Percolation Method (CPM) to detect communities. In BFS, the first user was chosen randomly as the root node, followed by its immediate followees and then by the next level of followees.
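The BFS expansion described above can be sketched as a level-by-level traversal over followee links. The `max_depth` and `max_users` limits are hypothetical stopping parameters, since the exact stopping rule is not stated here. The random root is drawn from a seeded generator for reproducibility.

```python
import random
from collections import deque


def bfs_community(followees, root, max_depth, max_users=None):
    """Collect users reachable from root via followee links, level by level."""
    depth = {root: 0}
    queue = deque([root])
    order = [root]
    while queue:
        user = queue.popleft()
        if depth[user] == max_depth:
            continue  # do not expand beyond the requested level
        for followee in followees.get(user, ()):
            if followee not in depth:
                if max_users is not None and len(order) >= max_users:
                    return order  # community size cap reached
                depth[followee] = depth[user] + 1
                order.append(followee)
                queue.append(followee)
    return order


followees = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": ["f"]}
print(bfs_community(followees, "a", max_depth=1))  # ['a', 'b', 'c']
print(bfs_community(followees, "a", max_depth=2))  # ['a', 'b', 'c', 'd', 'e']

# Random root, as in the text; the chosen user depends on the seed.
community = bfs_community(followees, random.Random(0).choice(sorted(followees)), max_depth=2)
```

Increasing `max_depth` corresponds to growing a BFS-generated community to the second or third level of followees.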

As a proof of concept, we conducted a straightforward exploratory analysis of the network we are using. Using CPM, the maximal number of k-cliques in our sub-network is 1,881,550. The number of communities and the size of the largest community are shown in Table 2. To validate our results, we tested various numbers of detected communities. Table 3 gives an overview of two random communities detected using BFS and CPM.

Table 2. Number of communities and largest community size
Table 3. Overview of the two random communities detected using BFS and CPM

Some of our observations on these communities are consistent with earlier research involving millions of users: a few hashtags have very high tweet hashtag popularity, while the majority have very low tweet hashtag popularity, mostly equal to 1, so the data follow a long-tail distribution. The top-5 most popular hashtags in the BFS-generated community are ‘fb’: 419, ‘news’: 384, ‘ff’: 195, ‘pr20chat’: 122 and ‘sxsw’: 114. The top-5 most popular hashtags in the CPM-generated community are ‘ff’: 324, ‘alliegentry’: 255, ‘tcot’: 251, ‘cdnpoli’: 249 and ‘teaparty’: 197.

We randomly chose 10 communities: 5 generated using BFS and 5 generated using CPM (with k = 3). Each community is treated as a separate dataset. For evaluation, we shuffled each dataset and split it into a training set (80%) and a testing set (20%). The hashtags in the test tweets were removed from the tweets and used as the set of ground-truth hashtags.
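The following sketch illustrates the 80/20 split and the construction of the ground truth. It assumes that hashtags are marked with "#" in the raw text and that a simple regular expression is enough to extract and strip them. The seed, regex and function name are illustrative.

```python
import random
import re

HASHTAG = re.compile(r"#(\w+)")


def split_with_ground_truth(tweets, test_ratio=0.2, seed=42):
    """Shuffle, hold out test_ratio of tweets, and strip hashtags from test tweets."""
    shuffled = tweets[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    test, train = shuffled[:n_test], shuffled[n_test:]
    queries = []
    for text in test:
        truth = {h.lower() for h in HASHTAG.findall(text)}  # ground-truth hashtags
        stripped = " ".join(HASHTAG.sub("", text).split())  # query text without hashtags
        queries.append((stripped, truth))
    return train, queries


tweets = [f"tweet number {i} #tag{i % 3}" for i in range(10)]
train, queries = split_with_ground_truth(tweets)
print(len(train), len(queries))  # 8 2
```

Each query pairs the hashtag-free text with the set of hashtags the recommender is expected to recover.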

We adopted several pre-processing strategies to reduce noise in tweet content. Because the performance of ranked search results depends heavily on corpus pre-processing [2], we removed duplicate tweets, punctuation, stop words and links, and converted all text to lower case. We used the contraction map built by Sarkar [17], which converts 122 shortened forms into full English words, e.g., ‘won’t’ to ‘will not’. We also used the WordNetLemmatizer [3] to group different forms of a word into one, e.g., ‘drive’, ‘drove’, ‘driven’ and ‘driving’ into ‘drive’. Open-source Python libraries were used.
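The cleaning steps listed above can be sketched with the standard library alone. The contraction map and stop-word list below are tiny hypothetical subsets, not Sarkar's 122-entry map or a full stop-word list. Lemmatization is left out to keep the block self-contained; with NLTK it would be a call such as `WordNetLemmatizer().lemmatize(word, pos="v")` on each token. The order of the steps is our assumption.

```python
import re
import string

# Tiny illustrative subsets (hypothetical), not the resources used in the paper.
CONTRACTIONS = {"won't": "will not", "can't": "cannot", "i'm": "i am", "it's": "it is"}
STOP_WORDS = {"a", "an", "the", "is", "am", "are", "to", "of", "and", "in", "on", "i", "it"}

URL = re.compile(r"https?://\S+|www\.\S+")
CONTRACTION_RE = re.compile(r"\b(" + "|".join(map(re.escape, CONTRACTIONS)) + r")\b")
PUNCT = str.maketrans("", "", string.punctuation)


def preprocess(tweet):
    """Lower-case, drop links, expand contractions, strip punctuation and stop words."""
    text = tweet.lower()
    text = URL.sub(" ", text)
    # Expand contractions before removing punctuation, which would delete apostrophes.
    text = CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0)], text)
    text = text.translate(PUNCT)
    return [w for w in text.split() if w not in STOP_WORDS]


def preprocess_corpus(tweets):
    """Drop exact duplicate tweets, then preprocess the rest in order."""
    seen, cleaned = set(), []
    for tweet in tweets:
        if tweet not in seen:
            seen.add(tweet)
            cleaned.append(preprocess(tweet))
    return cleaned


print(preprocess("I'm sure it WON'T rain: https://t.co/xyz the forecast!"))
# ['sure', 'will', 'not', 'rain', 'forecast']
```

Expanding contractions before removing punctuation matters: once the apostrophe is gone, "won't" would become "wont" and no longer match the map.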

3.3 Experimental Setting

The experiments involve general parameters and personalized parameters. For the general parameters, let \(D=\{d_{1},d_{2},\ldots ,d_{n}\}\) be the set of tweets in the training dataset and \(Q=\{q_{1},q_{2},\ldots ,q_{l}\}\) the set of tweets in the testing dataset. Let \(U=\{u_{1},u_{2},\ldots ,u_{m}\}\) be the set of users, and let \(H=\{h_{1},h_{2},\ldots ,h_{p}\}\) be the global hashtag space derived from D. Personalized parameters differ from user to user: \(D_{u_{i}}\) is the set of tweets issued by user \(u_{i}\) and \(H_{u_{i}}\) is \(u_{i}\)'s hashtag preference. In both the training and testing datasets, every tweet is annotated with its author ID. Top-n denotes the set of tweets most similar to the query and top-m the set of most similar users. Top-k is the set of highest-ranked hashtags recommended to the user, where k is set to 5 and 10.

3.4 Evaluation Metrics

Measuring the quality of the automatically recommended hashtags is essential for comparing the results. To evaluate the methods in this research, we adopt hit rate [8], the ratio of the number of hits to the number of attempts. A test tweet counts as a hit when at least one of its ground-truth hashtags appears among the top-k recommended hashtags, i.e., \(\text{hit rate} = \frac{\text{number of hits}}{|Q|}\).
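Hit rate is simple to compute once each test tweet has a ranked recommendation list and a set of ground-truth hashtags. This sketch assumes that the number of attempts equals the number of test tweets |Q|. The function name and toy data are hypothetical; multiply by 100 to report percentages.

```python
def hit_rate(recommendations, ground_truth, k):
    """Fraction of test tweets whose top-k list contains a ground-truth hashtag."""
    if len(recommendations) != len(ground_truth):
        raise ValueError("one recommendation list is needed per test tweet")
    if not ground_truth:
        return 0.0
    hits = sum(1 for recs, truth in zip(recommendations, ground_truth)
               if set(recs[:k]) & set(truth))
    return hits / len(ground_truth)


recs = [["news", "ff", "fb"], ["sxsw", "tcot"], ["fb"]]
truth = [{"ff"}, {"teaparty"}, {"fb", "news"}]
print(round(100 * hit_rate(recs, truth, k=1), 2))  # 33.33
print(round(100 * hit_rate(recs, truth, k=2), 2))  # 66.67
```

A larger k can only keep or raise the hit rate, which is consistent with the top-10 results in this paper being higher than the corresponding top-5 results.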

4 Experiments

In this section, two experiments are performed.

Experiment 1: Hashtag Recommendation Based on Social and Textual Factors. This experiment assesses the contribution of the social factor when it is incorporated with tweet content in hashtag recommendation. It is performed on the ten randomly selected communities (5 generated using BFS and 5 generated using CPM), and the reported results are averaged over these communities. In this experiment, we noticed that the similarity scores of some retrieved tweets are very low or equal to zero. This motivated us to set a threshold \(\tau \) as a dividing line between highly similar and less similar tweets: to improve recommendation quality, we disregard tweets that are only marginally similar to the user's query q. The experiment has two parts. The first part records the hit rate for various numbers n of tweets retrieved as similar to the user's query q, with top-n = 10, 50, 100, 150 and 200. The second part investigates the impact of community size on hashtag recommendation by increasing the number of users m to 100, 200, 300 and 400, with top-n fixed at 50. In both parts, we compare the performance of the four ranking methods: UHP, THP, THR and GHP.
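The τ filter can be sketched as follows. We assume that the retriever returns (similarity, tweet_id) pairs and that a tweet is kept only when its similarity is strictly greater than τ, since whether the comparison is strict is not stated here. Names and scores are hypothetical.

```python
def filter_by_threshold(scored, tau, top_n):
    """Keep at most top_n (similarity, tweet_id) pairs whose similarity exceeds tau."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Because the list is sorted, the pairs above tau form a prefix of ranked.
    return [pair for pair in ranked[:top_n] if pair[0] > tau]


scored = [(0.0, "t1"), (0.45, "t2"), (0.08, "t3"), (0.91, "t4"), (0.12, "t5")]
print([t for _, t in filter_by_threshold(scored, tau=0.1, top_n=3)])  # ['t4', 't2', 't5']
print([t for _, t in filter_by_threshold(scored, tau=0.5, top_n=3)])  # ['t4']
print(filter_by_threshold(scored, tau=0.95, top_n=3))  # []
```

The last call shows the trade-off reported in Sect. 5: a high τ keeps only very similar tweets, but some queries then retrieve nothing and cannot produce a hit.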

Experiment 2: Hashtag Recommendation Based on Social and User Profiling Factors. This experiment assesses the contribution of the social factor when it is incorporated with the user profiling factor based on the user's hashtag preference. It is performed on the BFS- and CPM-generated communities, and the top-5 and top-10 hashtags are recommended for each of the ten communities. The experiment has two parts. The first part records the hit rate when only the hashtags extracted from the top-m similar users are considered in the recommendation. The second part records the hit rate when the hashtags extracted from the top-m similar users are combined with those extracted from the top-n similar tweets. In both parts, top-m is set to 1, 3, 5 and 10 similar users and top-n is fixed at 50.
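One simple way to combine the two candidate sources is to pool the hashtag occurrences from the top-m similar users and the top-n similar tweets and then rank the pooled counts. This pooling rule is our assumption for illustration and is not necessarily the combination used by Kywe et al. [8]. The names and toy lists are hypothetical.

```python
from collections import Counter


def combine_candidates(user_hashtags, tweet_hashtags, k):
    """Pool hashtag occurrences from both sources and return the k most frequent."""
    pooled = Counter(user_hashtags) + Counter(tweet_hashtags)
    # Sort by descending count, then alphabetically so ties are deterministic.
    return [h for h, _ in sorted(pooled.items(), key=lambda x: (-x[1], x[0]))[:k]]


from_users = ["tcot", "tcot", "news", "ff"]   # hashtags of the top-m similar users
from_tweets = ["news", "news", "sxsw", "ff"]  # hashtags of the top-n similar tweets
print(combine_candidates(from_users, from_tweets, k=2))  # ['news', 'ff']
```

In this sketch, a hashtag supported by both sources ("news", "ff") can outrank one that is frequent in only one source ("tcot").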

Fig. 1. Average hit rates of the top-5 recommended hashtags on the BFS-generated communities

Fig. 2. Average hit rates of the top-10 recommended hashtags on the BFS-generated communities

Fig. 3. Average hit rates of the top-5 recommended hashtags on the CPM-generated communities

Fig. 4. Average hit rates of the top-10 recommended hashtags on the CPM-generated communities

Fig. 5. Top-5 recommended hashtags on BFS-generated communities of different sizes

Fig. 6. Top-10 recommended hashtags on BFS-generated communities of different sizes

Fig. 7. Top-5 recommended hashtags on CPM-generated communities of different sizes

Fig. 8. Top-10 recommended hashtags on CPM-generated communities of different sizes

5 Results and Discussions

Results of Experiment 1. Figures 1 and 2 show the average hit rate (in percent) on the BFS-generated communities when the top-5 and top-10 hashtags are recommended, respectively. Figures 3 and 4 show the corresponding average hit rates on the CPM-generated communities. In general, using \(\tau \) yields a slight improvement for all ranking methods, most clearly for UHP. When \(\tau > 0.1\), the results are more accurate, but many queries retrieve no similar tweets, which reduces the overall performance.

Overall, the Tweet Hashtag Popularity ranking method (THP) performs better than all the other ranking methods. In the BFS-generated communities, the highest average hit rate is 41.34% when the top-5 hashtags are recommended and 45.78% when the top-10 hashtags are recommended. In the CPM-generated communities, the highest average hit rate is 47.58% when the top-5 hashtags are recommended and 52.67% when the top-10 hashtags are recommended. There is no significant improvement in performance when \(n > 50\), since \(\tau \) acts as a filter. In general, performance on the CPM-generated communities is higher than on the BFS-generated communities. The hit rates on the BFS- and CPM-generated communities exceed those reported by Zangerle et al. [19] by more than approximately 20% and 30%, respectively.

For the second part of the experiment, Figs. 5 and 6 show the average hit rates of the top-5 and top-10 recommended hashtags on the BFS-generated communities as the community size (number of users) increases. The highest average hit rate is 41.34% for top-5 and 47.46% for top-10 hashtag recommendation. There is no significant improvement in performance when the BFS-generated communities are expanded to the second or third level. Figures 7 and 8 show the average hit rates of the top-5 and top-10 recommended hashtags on the CPM-generated communities as the community size (number of users) increases. The highest average hit rate is 47.58% when the top-5 hashtags are recommended and 52.20% when the top-10 hashtags are recommended. In the CPM-generated communities, performance decreases as community size increases. Overall, performance on the CPM-generated communities is higher than on the BFS-generated communities.

Fig. 9. Average hit rates of the top-5 recommended hashtags on the BFS-generated communities when the top-m users are considered

Fig. 10. Average hit rates of the top-10 recommended hashtags on the BFS-generated communities when the top-m users are considered

Fig. 11. Average hit rates of the top-5 recommended hashtags on the CPM-generated communities when the top-m users are considered

Fig. 12. Average hit rates of the top-10 recommended hashtags on the CPM-generated communities when the top-m users are considered

Fig. 13. Average hit rates of the top-5 recommended hashtags on the BFS-generated communities when hashtags from the top-m users and the top-50 tweets are considered

Fig. 14. Average hit rates of the top-10 recommended hashtags on the BFS-generated communities when hashtags from the top-m users and the top-50 tweets are considered

Fig. 15. Average hit rates of the top-5 recommended hashtags on the CPM-generated communities when hashtags from the top-m users and the top-50 tweets are considered

Fig. 16. Average hit rates of the top-10 recommended hashtags on the CPM-generated communities when hashtags from the top-m users and the top-50 tweets are considered

Results of Experiment 2. Figures 9, 10, 11 and 12 show the results of the first part of the experiment, which measures the average hit rates when the top-m users are considered. THR is not used in this part because tweet content is not incorporated. GHP performs best, ahead of THP and UHP. In the BFS-generated communities, the best results are obtained when only the top-1 similar user is taken into account, and the average hit rates of THP and UHP decrease as m increases. In other words, considering the hashtags of more than one user degrades hashtag recommendation performance, which is consistent with the findings of Kywe et al. [8]. For THP, the best average hit rate is 14.2% when the top-5 hashtags are recommended and 17.6% when the top-10 hashtags are recommended. In the CPM-generated communities, the average hit rates of THP for the top-5 and top-10 recommended hashtags are 14.2% and 17.21%, respectively. UHP performs best when the top-1 user is considered, with average hit rates of 11.2% and 17% for the top-5 and top-10 recommended hashtags. Therefore, the contribution of user profiling based on the user's hashtag preference is smaller than that of the textual factor when each is considered separately.

Figures 13, 14, 15 and 16 show the results of the second part of the experiment. In general, hashtag recommendation performance on both types of communities is higher than in the first part of Experiment 1, which does not incorporate user profiling. This indicates that the user profiling factor significantly enhances performance. These results are also higher than those reported by Kywe et al. [8], which indicates that the social factor has a large impact on hashtag recommendation. Table 4 compares our results with those reported for Kywe et al.'s model. In the BFS-generated communities, the best average hit rate is 45.48% for top-5 and 50.53% for top-10 hashtag recommendation. In the CPM-generated communities, the highest average hit rate is 50.89% when the top-5 hashtags are recommended and 56.61% when the top-10 hashtags are recommended.

Table 4. Hit rates comparison

6 Conclusion and Future Work

In this paper, we presented an empirical analysis of hashtag recommendation performance on communities detected using BFS and CPM when the social, textual and user profiling factors are incorporated. The results show that the social factor is the most significant factor. The community detection algorithm and the size of the community also play important roles in hashtag recommendation performance.

Our future work is to design a personalized hashtag recommendation model based on the results of this investigation. We also plan to impose additional restrictions on which neighbours are added to a community; for example, nodes that are not influential would not be included.