Abstract
Massive amounts of data are available on social websites, so finding a suitable item is challenging. According to recent social statistics, more than 930 million people use WhatsApp, with more than 340 million daily active users, and 955 million people access Facebook daily, with average daily photo uploads of up to 325 million. The approach presented in this paper employs the collaborative tags accumulated by a huge number of users to improve social media recommendation. Our approach has two phases: in the first phase we compute the tag-item weight model, and in the second phase we compute the user-tag preference model. We then employ the two models to find suitable items tailored to the user's preferences and recommend the items with the highest scores. Our model can also compute tag scores and suggest the tags with the highest weights to users according to their preferences. Experiments performed on Flickr and MovieLens show that our approach is capable of improving social media recommendation.
1 Introduction
A massive amount of data is now available on social websites, and finding the appropriate resource based on the user's preferences is receiving increasing attention, Milicevic et al. [20] and Fang et al. [9]. According to recent statistics, Twitter has more than 329 million users, and 930 million people use WhatsApp, with more than 340 million daily active users. Researchers have started to employ social tagging, known as Folksonomy, to find the suitable resource for each user query according to the user's taste and preferences, Alhamid et al. [3], Deshpande and Karypis [6], and Doerfel et al. [8].
Users of Folksonomy are not tied to a specific hierarchy or structure; they are permitted to use any word or combination of words to describe and annotate resources, Huang et al. [14], Pirolli and Kairam [22], and Hossain et al. [13]. Because of this flexibility, Folksonomy can be used as a good tool to organize and share resources on the web (see Fig. 1).
Some studies, such as Yang et al. [27], analyzed social media, mainly the tags contributed by a number of users, and found that a collection of media tags carries enough information to identify and label the key ideas of the media content. Studying collaborative tagging (Folksonomy) as a tool to improve personalization in social media recommendation is therefore crucial. Keyword query is the dominant search pattern on most websites because it is easy and efficient. However, users from different domains may have a different view and understanding of the same item or tag, Yang et al. [27]. This creates the need for a tag-based personalized search that answers the search query according to each user's taste. Google, Bing, and other search engines have therefore shifted towards personalized search in their algorithms.
The work in this paper explores social collaborative tagging as a means to expand and enhance social media recommendation. In this model we address the following questions:
- Question A. How to propose users with proper tags during annotation/recommendation process?
- Question C. How to achieve different weight score for each searched item?
Most researchers have tried to answer one of these questions in their proposed models, but not both at the same time. They mainly compute the tag-tag similarity, the item-item similarity, and the user-user similarity. The missing factor is to combine all these similarities and to consider the situation where there are not enough tags, or where new users have just joined the social media site or community. Even though Folksonomy is a good tool for personalized search, it has some restrictions and limitations. A problem arises when items are annotated with only a few tags or when new users have never tagged any item; this is known as the cold start problem for users/items. To deal with this limitation we need to aggregate all users and items whenever we compute the item or the tag weight. Another limitation occurs when some users label an item with a precise tag while other users use a generic tag. Some users also use vague or unclear tags, which results in synonymous tags.
Enriching media recommendation through social collaborative tagging is the goal of this work. We first compute the user-tag weight model with respect to similar tags and the tag-item weight model with respect to similar items. We then combine both weights to answer the user query according to the user's preferences, which gives a personalized query result.
The rest of this paper is organized as follows. Section 2 introduces work related to social tagging. Section 3 covers folksonomy types and representation. Section 4 presents the Pearson's chi-squared similarity model. The main ranking model is introduced in Section 5. The experimental results are presented in Section 6, and the conclusion in Section 7.
2 Related literature
We will discuss a number of research papers related to our problem.
2.1 Annotation
Stone et al. [24] proposed ranking images from collaborative tagging systems based on the SVD model. A similar model was introduced by Krestel et al. [18], who applied Latent Dirichlet Allocation (LDA) to study latent topics and extract hidden items from a folksonomy. Another approach was proposed by Diaz-Aviles et al. [7] to rank items by extracting the latent topics using the LDA technique. Wattenberg et al. [25] determine the annotation item using a data visualization model, in which the user can annotate the item based on the previous history.
2.2 Search
Xie et al. [26] presented a framework that integrates different weighted scores for the searched item based on the user's need. Ifada and Nayak [16] interpret observed and non-observed tags and rank them according to a graded-relevance interpretation scheme. Mao et al. [19] tackle tag-based recommendation by studying users' tag co-occurrence, which provides the links the PageRank algorithm uses to transform item scores into recommendations. Balakrishnan et al. [4] study face models to tag photos and then search the tagged photos; the authors proposed the AutoTag application as a proof of concept. Agharwal et al. [1] address the semantic query gap caused by wide class variance and an inadequate vocabulary. Ha et al. [10] propose a new scheme to enhance the accuracy of tag-based image retrieval, in which images are grouped by semantic similarity and by information produced by the folksonomy.
2.3 Recommendation
Zhao et al. [29] modeled the tagged items and the user's profile as a graph-based ranking of multi-type interconnected items. The tags are sorted according to the GRoMO score and recommended to the user accordingly. Zhao et al. [30] introduced a fast recommendation algorithm that simultaneously uncovers non-overlapping user clusters and the corresponding overlapping item clusters. Min et al. [21] similarly use singular value decomposition to enhance tag recommendation; the tag weight is computed according to the number of users who used the tag and according to the tagged locations. Chen et al. [5] mine the semantic information of tags for each item and user and then employ it in matrix factorization to uncover the semantic relation between users and items. Zhenzhen et al. [31] introduce a multi-model item recommendation based on user similarity, which computes the users' trust relation by adapting the transfer matrix into a random walk.
3 Study of folksonomy
3.1 Social websites
We are becoming addicted to social websites, and users of all ages use them on a daily basis because of the benefits these websites offer. Social websites allow users to collaborate with other people, build social relations, share interests, and advertise their businesses. Figure 2 shows some examples of social websites.
3.2 Broad and narrow folksonomy
Based on who has the right to tag or annotate an item, Folksonomy can be classified as broad or narrow, Hassan-Montero and Herrero-Solana [12]. The left part of Fig. 3 shows a broad Folksonomy. Even though the creator did not use Tag 3, he can still retrieve the item using Tag 3. Delicious is an example of a broad Folksonomy and Flickr is an example of a narrow Folksonomy.
4 Similarity model
Before we introduce our model, Table 1 will list all the terms and their associated meanings to be used in this paper.
Most studies, such as Qian et al. [23], treat Folksonomy as a 3-dimensional space that spans users, tags, and items. The 3-dimensional Folksonomy can be reshaped into three 2-dimensional spaces, which display the following relations:
- User-tag
- User-item
- Tag-item
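The reshaping into three 2-dimensional relations can be sketched in a few lines of Python. This is only an illustration: the function name `project_folksonomy`, the toy assignments, and the choice to count every (user, tag, item) assignment once as the frequency weight are assumptions, not details taken from the paper.

```python
from collections import Counter


def project_folksonomy(assignments):
    """Project (user, tag, item) triples into three frequency-weighted relations.

    Assumption: the weight of a pair is the number of tag assignments
    in which the pair occurs together.
    """
    user_tag = Counter()
    user_item = Counter()
    tag_item = Counter()
    for user, tag, item in assignments:
        user_tag[(user, tag)] += 1
        user_item[(user, item)] += 1
        tag_item[(tag, item)] += 1
    return user_tag, user_item, tag_item


# Hypothetical toy data, for illustration only.
assignments = [
    ("u1", "sunset", "photo1"),
    ("u1", "beach", "photo1"),
    ("u2", "sunset", "photo1"),
    ("u2", "sunset", "photo2"),
]
user_tag, user_item, tag_item = project_folksonomy(assignments)
print(dict(tag_item))
```

Storing the relations as sparse counters keeps the sketch small; a dense matrix form would need an index for users, tags, and items.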
From the above relations, different similarities can be computed among tags, items, or users. Depending on the relation used to calculate the tag-tag, item-item, or user-user similarity, we obtain different similarity scores. In previous work we employed the cosine score to measure the similarity among tags and items. In this paper the similarities are characterized using the homogeneity score, which is equal to the Pearson's chi-squared goodness-of-fit test for independence, Agresti and Kateri [2].
The intuition behind the homogeneity score can be explained by applying Pearson's chi-squared test to the tag-item matrix. The test value calculated below is used to determine whether there is a significant association between tags and items. The higher the score, the more evidence we have of a strong association between tags and items, which means that tag frequencies can be used to recommend items. Low scores, on the other hand, imply that items and tags are independent, so knowing the tag frequencies will not yield accurate item recommendations.
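As a rough illustration of this intuition, the sketch below computes the textbook Pearson chi-squared statistic for a small contingency table, using expected counts \( R_iC_j/N \) with N the grand total. This is the standard form of the test and is not necessarily identical to the homogeneity score defined in this paper; the function name and the two toy tables are assumptions.

```python
def chi_squared_statistic(table):
    """Textbook Pearson chi-squared statistic for a 2-D contingency table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical tables: rows are items, columns are tags.
associated = [[10, 0], [0, 10]]   # each tag is used on exactly one item
independent = [[5, 5], [5, 5]]    # tags are spread evenly over the items

print(chi_squared_statistic(associated))   # large value: strong association
print(chi_squared_statistic(independent))  # zero: no association
```

The first table yields a large statistic because knowing the tag determines the item, while the second yields zero because tag frequencies carry no information about items.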
4.1 Tag-tag
Given the tag-item relation \( D=\left[{d}_{ij}\right] \) with dimension |I| items (rows) and |T| tags (columns), we calculate the tag connections, to investigate items tagged with related tags:
1. Let \( {R}_i={\sum}_{j=1}^{\left|T\right|}{d}_{ij} \) be the ith row sum, where i = 1, …, |I|, and let \( {C}_j={\sum}_{i=1}^{\left|I\right|}{d}_{ij} \) be the jth column sum, where j = 1, …, |T|.
2. Construct the relative difference matrix \( \widehat{D}=\left[{\widehat{d}}_{ij}\right] \), where \( {\widehat{d}}_{ij}=\frac{{\left({R}_i{C}_j-{d}_{ij}\right)}^2}{R_i{C}_j} \).
3. Calculate the homogeneity score, \( {H}_{x,y} \), between tag x and tag y as follows
\( {H}_{x,y} \) represents the homogeneity similarity score between tags x and y. We denote the tag-tag similarity matrix as \( {T}_tM \).
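Steps 1 and 2 can be sketched as follows, using the relative difference exactly as it is written in step 2. Step 3 is not reproduced, because the formula for the homogeneity score does not appear in this text. The function name and the toy counts are illustrative assumptions.

```python
def relative_difference_matrix(d):
    """Row sums R_i, column sums C_j, and the relative difference matrix of step 2."""
    row_sums = [sum(row) for row in d]          # R_i, one per item (row)
    col_sums = [sum(col) for col in zip(*d)]    # C_j, one per tag (column)
    d_hat = []
    for i, row in enumerate(d):
        new_row = []
        for j, value in enumerate(row):
            product = row_sums[i] * col_sums[j]
            # (R_i * C_j - d_ij)^2 / (R_i * C_j); 0.0 when the product is zero
            new_row.append((product - value) ** 2 / product if product else 0.0)
        d_hat.append(new_row)
    return row_sums, col_sums, d_hat


# Hypothetical tag-item counts: 2 items (rows) by 2 tags (columns).
tag_item_counts = [[2, 1], [0, 3]]
row_sums, col_sums, d_hat = relative_difference_matrix(tag_item_counts)
print(row_sums, col_sums)
```

The same function applies unchanged to the user-item relation of Section 4.2, with users as rows and items as columns.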
4.2 Item-item
To form the relation between related items, we employ the user-item relation \( W=\left[{w}_{ij}\right] \) with dimension |U| users (rows) and |I| items (columns), and calculate the following:
1. Let \( {R}_i={\sum}_{j=1}^{\left|I\right|}{w}_{ij} \) be the ith row sum, where i = 1, …, |U|, and let \( {C}_j={\sum}_{i=1}^{\left|U\right|}{w}_{ij} \) be the jth column sum, where j = 1, …, |I|.
2. Construct the relative difference matrix \( \hat{W}=\left[{\hat{w}}_{ij}\right] \), where \( {\hat{w}}_{ij}=\frac{{\left({R}_i{C}_j-{w}_{ij}\right)}^2}{R_i{C}_j} \).
3. Calculate the homogeneity score, \( {H}_{a,b} \), between item a and item b as follows
\( {H}_{a,b} \) represents the homogeneity item-item similarity score between two items a and b. We denote the item-item similarity matrix as \( {I}_iM \).
5 Recommendation model
First we need to compute two models: one that captures the tags similar to a given tag, and one that captures the items similar to a given item. To construct the first model, the user-tag preference model, we use the product of the following relations:
\( {L}_{UT}={U}_tM\times {T}_tM \)
where \( {U}_tM \) is the normalized user-tag relation and \( {T}_tM \) is the tag-tag similarity relation. The idea behind this model is to uncover tags similar to a given tag assigned by a specific user to an item. We normalize the user-tag matrix to reduce the effect of the most popular tags labeled by frequent users. The new user-tag weight among all users and tags is displayed in Fig. 4.
To construct the second model, the tag-item weight model, we use the product of the following relations:
\( {A}_{TI}={T}_iM\times {I}_iM \)
where \( {T}_iM \) is the normalized tag-item matrix and \( {I}_iM \) is the item-item similarity matrix. This latent tag model uncovers items similar to a given item labeled by a certain tag. We normalize the tag-item matrix to increase the effect of rarely tagged items. Fig. 5 presents the tag-item weight model \( {A}_{TI} \).
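The two products described above can be sketched with plain Python lists. The normalization scheme is not specified in this section, so dividing each row by its sum is an assumption, as are the helper names and all numeric values below.

```python
def normalize_rows(matrix):
    """Divide every row by its sum (assumed normalization; all-zero rows stay zero)."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([value / total if total else 0.0 for value in row])
    return result


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical inputs: 2 users, 2 tags, 3 items.
user_tag_counts = [[2, 0], [1, 1]]            # users x tags
tag_tag_sim = [[1.0, 0.5], [0.5, 1.0]]        # tags x tags
tag_item_counts = [[3, 1, 0], [0, 2, 2]]      # tags x items
item_item_sim = [[1.0, 0.2, 0.0],
                 [0.2, 1.0, 0.4],
                 [0.0, 0.4, 1.0]]             # items x items

l_ut = matmul(normalize_rows(user_tag_counts), tag_tag_sim)    # user-tag weights
a_ti = matmul(normalize_rows(tag_item_counts), item_item_sim)  # tag-item weights
print(l_ut)
print(a_ti)
```

In this toy example the first user never used the second tag, yet receives a nonzero weight for it through the tag-tag similarity, which is the effect the user-tag model is meant to produce.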
Due to the flexibility of folksonomy, each person can annotate a resource or an item with any arbitrary word according to personal taste. Retrieving the appropriate item according to the user's need is therefore an important part of our model. To build the tag-based search model we employ the user-tag weight model \( {L}_{UT} \) and the tag-item weight model \( {A}_{TI} \). Given a specific user who submits a query J consisting of one or more tags, the ranking score of an item is computed as:
where \( {S}_u\left(i,J\right) \) represents the personalized ranking score with respect to user u for an item i and a set of query tags J. The model uncovers items the user is likely to want even if they are not labeled with the submitted query tags. New users who have never tagged any item, and items that have never been labeled (the cold start problem), can also be processed by the proposed model, since it can uncover similar items and tags even when they are new.
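The exact scoring formula is not available in this text, so the following sketch assumes one plausible form: the score of item i for user u is the sum, over the query tags t in J, of the user-tag weight of (u, t) multiplied by the tag-item weight of (t, i). The function name `rank_items`, this summation, and the numbers are all assumptions for illustration.

```python
def rank_items(user, query_tags, l_ut, a_ti, top_n=3):
    """Rank items for one user and a set of query tags.

    Assumed score: sum over query tags t of l_ut[user][t] * a_ti[t][item].
    """
    n_items = len(a_ti[0])
    scores = []
    for item in range(n_items):
        score = sum(l_ut[user][t] * a_ti[t][item] for t in query_tags)
        scores.append((item, score))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical weight models: 2 users x 2 tags, and 2 tags x 3 items.
l_ut = [[1.0, 0.5], [0.75, 0.75]]
a_ti = [[0.8, 0.4, 0.1], [0.1, 0.7, 0.7]]

ranking = rank_items(user=0, query_tags=[0], l_ut=l_ut, a_ti=a_ti)
print(ranking)
```

Because both weight models are dense after the similarity products, every item receives a score for every query, including items that were never labeled with the query tags.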
6 Experimental results
6.1 Datasets
We downloaded two real datasets to validate our ranking model. The first dataset comes from Flickr, a photo management and sharing application that allows users to upload, tag, and share their photos online. The dataset contains 206,564 photos from 58,199 different photographers, and 11,386 tags, Huiskes et al. [15]. The second dataset comes from MovieLens, an online movie sharing site that allows users to rate and tag movies and generates personalized movie predictions. The MovieLens dataset contains 5580 tags applied to 10,681 movies by 71,567 users, Harper et al. [11]. We pruned both datasets and projected them into three 2-dimensional matrices. We applied the frequency weight in all matrices.
To run the experiments, we divided both datasets into a validation set and a training set. The validation set contains one tag assignment for each user, and the rest is in the training set. To obtain results with a 95 % confidence interval, we repeated each experiment 5 times.
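A minimal sketch of this split, assuming the held-out assignment is chosen uniformly at random for each user (the selection rule is not stated in the text). The function name, the seed parameter, and the toy data are hypothetical.

```python
import random


def split_hold_one_out(assignments, seed=0):
    """Hold out one (user, tag, item) assignment per user for validation."""
    rng = random.Random(seed)
    by_user = {}
    for triple in assignments:
        by_user.setdefault(triple[0], []).append(triple)
    training, validation = [], []
    for triples in by_user.values():
        held_out = rng.choice(triples)   # assumed: uniform random choice
        validation.append(held_out)
        remaining = list(triples)
        remaining.remove(held_out)
        training.extend(remaining)
    return training, validation


# Hypothetical toy data, for illustration only.
assignments = [
    ("u1", "sunset", "photo1"),
    ("u1", "beach", "photo1"),
    ("u1", "sea", "photo3"),
    ("u2", "sunset", "photo1"),
    ("u2", "sunset", "photo2"),
]
training, validation = split_hold_one_out(assignments, seed=42)
print(len(training), len(validation))
```

Repeating the split with different seeds gives the repeated runs; note that a user with a single assignment would be left with no training data under this scheme.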
6.2 Evaluation metrics
We employed different metrics to measure the accuracy and coverage of our proposed model.
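The first metric is the F-measure:
\( F=\frac{2PR}{P+R} \)
where P is the precision and R the recall. We also tested two weighted versions of the F-measure, \( {F}_2 \) and \( {F}_{0.5} \), which are instances of
\( {F}_{\beta }=\frac{\left(1+{\beta}^2\right)PR}{\beta^2P+R} \)
\( {F}_2 \) places more importance on recall than on precision, while \( {F}_{0.5} \) puts more weight on precision. The second group of metrics is the positive predictive value (PPV) and the accuracy (ACC):
\( PPV=\frac{TP}{TP+FP},\kern1em ACC=\frac{TP+TN}{TP+TN+FP+FN} \)
where TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives.
To test the coverage of our model for a given search query, we ran a test to see whether the model can compute a ranking score for all items.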
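The F-measure variants, PPV, and accuracy named in this section follow their standard definitions and can be computed as in the sketch below. The function names and the example precision and recall values are illustrative only and are not results from the paper.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score; beta > 1 favors recall, beta < 1 favors precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def ppv(tp, fp):
    """Positive predictive value: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def accuracy(tp, tn, fp, fn):
    """Accuracy: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Illustrative values only: low precision, perfect recall.
p, r = 0.5, 1.0
print(f_beta(p, r), f_beta(p, r, beta=2), f_beta(p, r, beta=0.5))
```

With recall higher than precision, \( F_2 \) comes out above the balanced F-measure and \( F_{0.5} \) below it, which matches the weighting described above.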
6.3 Effect of similar tags and items
Similar tags and items participate in computing the item ranking score. Hence we measured \( {F}_2 \) and \( {F}_{0.5} \) for different numbers of similar tags and items. We started with 10 similar tags and items, then 20, 30, and 40, and finally tested all tags and items. As shown in Table 2, the best value was achieved at 20 similar tags/items for both \( {F}_2 \) and \( {F}_{0.5} \). Larger numbers of similar tags/items carry less weight in our models because of the noise associated with the additional tags/items.
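How the number of similar tags/items is restricted is not described here. One simple possibility, shown below as an assumption, is to keep only the k largest entries in each row of a similarity matrix and set the others to zero. The function name and the example row are hypothetical.

```python
def keep_top_k(similarity_row, k):
    """Keep the k largest similarities in a row and zero out the rest (assumed scheme)."""
    if k >= len(similarity_row):
        return list(similarity_row)
    ranked = sorted(range(len(similarity_row)),
                    key=lambda j: similarity_row[j],
                    reverse=True)
    keep = set(ranked[:k])
    return [value if j in keep else 0.0 for j, value in enumerate(similarity_row)]


# Hypothetical similarity scores of one tag to four other tags.
row = [0.9, 0.1, 0.5, 0.3]
print(keep_top_k(row, 2))
```

Applying this to every row of the tag-tag and item-item similarity matrices before the products of Section 5 would limit each model to the k nearest neighbors.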
6.4 Effect of normalization
We investigated the effect of matrix normalization. As mentioned earlier, the idea behind normalization is to reduce the influence of popular tags and common items. We checked the accuracy for 10 similar tags and items and then increased the number until all tags and items had been tested. We compared the accuracy with and without normalized matrices. To show statistical significance, we also performed two-tailed paired t-tests. Table 3 reports that the best accuracy was achieved with the normalized matrices at 20 similar tags/items.
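For reference, the statistic of a paired t-test over repeated runs can be sketched as below. The accuracy values are made-up placeholders, not results from the paper, and the function name is hypothetical; only the formula (mean of the paired differences divided by its standard error) is standard.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return the paired t statistic and its degrees of freedom (n - 1)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = statistics.stdev(diffs)          # sample standard deviation
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return statistics.mean(diffs) / (sd / math.sqrt(n)), n - 1


# Made-up accuracies from 5 hypothetical runs, for illustration only.
with_normalization = [0.62, 0.65, 0.61, 0.66, 0.63]
without_normalization = [0.58, 0.60, 0.59, 0.61, 0.60]

t_value, dof = paired_t_statistic(with_normalization, without_normalization)
print(round(t_value, 3), dof)
```

For a two-tailed test at the 5 % level with 4 degrees of freedom, the difference is significant when the absolute value of the statistic exceeds the critical value 2.776.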
6.5 Effect of top returned item
To compare with baseline methods, we measured the positive predictive value (PPV). We compared our model with the Social Ranking algorithm, Zanardi and Capra [28], and the CUM algorithm, Kim et al. [17]. We examined how each model behaves for a given user query by varying the number of returned items from 1 to 10 and computing the PPV at N = 1, N = 2, and so on. We recorded which algorithm positions the returned item at a higher rank. Figure 6 shows the comparison results; our tag-based models outperformed the other algorithms.
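The PPV at N used in this comparison can be sketched as the fraction of the top N returned items that are relevant. The function name and the toy ranking are assumptions; in the setup of Section 6.1 the relevant set for a user would be the item of the held-out tag assignment.

```python
def ppv_at_n(ranked_items, relevant_items, n):
    """Fraction of the top-n ranked items that are relevant."""
    top = ranked_items[:n]
    if not top:
        return 0.0
    hits = sum(1 for item in top if item in relevant_items)
    return hits / len(top)


# Hypothetical ranking with a single relevant (held-out) item.
ranked = ["a", "b", "c", "d"]
relevant = {"b"}
for n in range(1, 5):
    print(n, ppv_at_n(ranked, relevant, n))
```

An algorithm that places the held-out item nearer the top reaches its peak PPV at a smaller N, which is what the comparison measures.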
7 Conclusion
This paper tackled the issues of tag-based search and tag recommendation. We mined related tags and items while building our ranking model. The experimental results show that our approach ranks items and tags in a higher position according to the user's preferences.
As future work, we can extend our approach to different contexts such as movie recommendation, smart home, and ambient intelligence environments, because the model is simple and does not require any data other than the tagging information. We also plan to test other measures, such as the Pearson correlation, when calculating the similarity matrices. As mentioned earlier, synonymous tags are sometimes a problem; we therefore plan to combine semantic web techniques with our model and test the resulting improvement.
References
Agharwal A, Kovvuri R, Nevatia R, Snoek CG (2016) Tag-based video retrieval by embedding semantic content in a continuous word space. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp 1–8). IEEE
Agresti A, Kateri M (2011) Categorical data analysis (pp 206–208). Springer, Berlin Heidelberg
Alhamid MF, Rawashdeh M, Al Osman H, Hossain MS, El Saddik A (2015) Towards context-sensitive collaborative media recommender system. Springer Multimed Tools Appl 74(24):11399–11428
Balakrishnan S, Chaudhuri S, Narasayya V (2015) AutoTag'n search my photos: leveraging the social graph for photo tagging. In: Proceedings of the 24th International Conference on World Wide Web Companion, pp 163–166
Chen C, Zheng X, Wang Y, Hong F, Chen D (2016) Capturing semantic correlation for item recommendation in tagging systems. In: Thirtieth AAAI Conference on Artificial Intelligence
Deshpande M, Karypis G (2004) Item-based top-n recommendation algorithms. ACM Trans Inf Syst 22(1):143–177
Diaz-Aviles E, Georgescu M, Stewart A, Nejdl W (2010) Lda for on-the-fly auto tagging. In Proceedings of the fourth ACM conference on Recommender systems (pp. 309–312)
Doerfel S, Zoller D, Singer P, Niebler T, Hotho A, Strohmaier M (2016) What users actually do in a social tagging system: a study of user behavior in BibSonomy. ACM Transactions on the Web (TWEB) 10(2):14
Fang Q, Sang J, Xu C, Hossain MS (2015) Relational user attribute inference in social media. IEEE Trans Multimed 17(7):1031–1044
Ha E, Kim Y, Hwang E (2016) A categorization scheme of tag-based folksonomy images for efficient image retrieval. KIISE Transactions on Computing Practices 22(6):290–295
Harper FM, Konstan JA (2016) The MovieLens datasets: history and context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5(4):19
Hassan-Montero Y, Herrero-Solana V (2006) Improving tag-clouds as visual information retrieval interfaces. In: Proceedings of the International Conference on Multidisciplinary Information Sciences and Technologies
Hossain MS, Alamri A, El Saddik A (2009) A biologically inspired framework for multimedia service management in a ubiquitous environment. Concurrency Computat: Pract Exper 21(11):1450–1466
Huang CL, Yeh PH, Lin CW, Wu DC (2014) Utilizing user tag-based interests in recommender systems for social resource sharing websites. Knowl-Based Syst 56:86–96
Huiskes MJ, Thomee B, Lew MS (2010) New Trends and Ideas in Visual Concept Detection. ACM International Conference on Multimedia Information Retrieval (MIR’10)
Ifada N, Nayak R (2016) How relevant is the irrelevant data: leveraging the tagging data for a learning-to-rank model. In: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pp 23–32
Kim HN, Alkhaldi A, El Saddik A, Jo GS (2011) Collaborative user modeling with user-generated tags for social recommender systems. Expert Systems with Applications 38(7):8488–8496
Krestel R, Fankhauser P, Nejdl W (2009) Latent dirichlet allocation for tag recommendation. In Proceedings of the third ACM conference on Recommender systems (pp. 61–68)
Mao J, Lu K, Li G, Yi M (2015) Profiling users with tag networks in diffusion-based personalized recommendation. Journal of Information Science, doi: 10.1177/0165551515603321
Milicevic AK, Nanopoulos A, Ivanovic M (2010) Social tagging in recommender systems: a survey of the state-of-the-art and possible extensions. Artificial Intelligence Review 33(3):187–209
Min W, Bao B-K, Xu C, Hossain MS (2015) Cross-platform multi-modal topic modeling for personalized inter-platform recommendation. IEEE Trans Multimed 17(10):1787–1801
Pirolli P, Kairam S (2013) A knowledge-tracing model of learning from a social tagging system. User Model User-Adap Inter 23(2–3):139–168
Qian S, Zhang T, Xu C, Hossain MS (2015) Social event classification via boosted multimodal supervised latent dirichlet allocation. ACM Trans Multimed Comput Commun Appl (ACM TOMM) 11(2): Article 27, 27:1–27:22
Stone Z, Zickler T, Darrell T (2010) Toward large-scale face recognition using social network context. Proc IEEE 98(8):1408–1415
Wattenberg MM, Viégas FB, Kriss JH, McKeon MM, Heer J (2015) International Business Machines Corporation. System and method for annotation of data visualizations. U.S. Patent 9,058,316
Xie H, Li X, Wang T, Lau RY, Wong TL, Chen L, Wang FL, Li Q (2016) Incorporating sentiment into tag-based user profiles and resource profiles for personalized search in folksonomy. Inf Process Manag 52(1):61–72
Yang X, Zhang T, Xu C, Hossain MS (2015) Automatic visual concept learning for social event understanding. IEEE Trans Multimed 17(3):4658
Zanardi V, Capra L (2008) Social Ranking: Uncovering Relevant Content Using Tag-based Recommender Systems. In: proceedings of ACM Conference on Recommender Systems, 51–58
Zhao W, Guan Z, Liu Z (2015a) Ranking on heterogeneous manifolds for tag recommendation in social tagging services. Neurocomputing 148:521–534
Zhao Y D, Cai S M, Tang M, Shang M S (2015b) A Fast Recommendation Algorithm for Social Tagging Systems: A Delicious Case. arXiv preprint arXiv:1512.08325
Zhenzhen X, Jiang H, Kong X, Kang J, Wang W, Xia F (2016) Cross-domain item recommendation based on user similarity. Comput Sci Inf Syst 13(2):359–373
Acknowledgments
The authors extend their appreciation to the Deanship of Scientific Research at King Saud University, Riyadh, Saudi Arabia for funding this work through the research group project no. RGP-229.
Cite this article
Rawashdeh, M., Shorfuzzaman, M., Artoli, A.M. et al. Mining tag-clouds to improve social media recommendation. Multimed Tools Appl 76, 21157–21170 (2017). https://doi.org/10.1007/s11042-016-4039-1
DOI: https://doi.org/10.1007/s11042-016-4039-1