Leveraging social information for personalized search

Dridi, Amna; Slimani, Yahya

doi:10.1007/s13278-017-0435-4

Review Article
Published: 26 April 2017

Volume 7, article number 16, (2017)
Social Network Analysis and Mining

Abstract

Social information retrieval has become a very challenging task with the increasing use of social networks and the amount of social information they continuously provide in different fields. In this paper, we explore different kinds of social information, namely descriptions (tags) and reactions (clicks), to build user and document profiles for personalization. The goal is threefold: (1) propose a social user profile based on community detection considering descriptions, (2) introduce a new notion of social document profile using reactions and (3) propose a personalized ranking model based on social relevance that is computed considering the social document and user profiles. We evaluate our approach on a Last.fm dataset using exact matching and approximate matching algorithms. Results show that our approach significantly outperforms the baseline in terms of effectiveness by more than 26% in NDCG@5 for approximate matching and 15% for exact matching. The improvement reaches 43% when only the user profile is considered for computing relevance.

1 Introduction

Search contextualization has become a very challenging task, especially given the multiple interpretations of the key notion of context. Context may refer to the characteristics and preferences of a specific user (in which case contextualization is referred to as personalization), to the user’s geographic location (for example, when using a search engine on a smartphone), or to the user’s social sphere (searching for a person, group or page on social networks). Depending on the interpretation of context considered, specific IR branches have emerged, such as personalized IR, mobile IR and social IR. Although the specific techniques of these branches vary, the common goal of context-based IR is to improve the quality of search by providing the user with relevant results (Pasi 2010; Mahyuddin and Shahrul 2012).

In this research work, we investigate both personalized IR and social IR. Personalizing search has become more challenging with the increasing use of social networks. The volume of information available in social contexts (social networks, blogs, forums, etc.) is growing continuously, which makes its exploration a challenging task. The extension of conventional information retrieval (IR) to incorporate the social context is becoming the bridge between the fields of information retrieval and social network analysis, giving rise to the field of social information retrieval (SIR). A main aspect of SIR models is to understand the user’s interests and preferences (user profile), as expressed by user-generated content in the social context through descriptive actions (mainly tagging) or reactive actions (clicks, feedback, etc.), and then to use this user-generated content to enhance IR systems by providing the user with relevant results that match his interests. This rich repository of users’ actions has triggered many research works that exploit social information for search personalization (Bouadjenek et al. 2011, 2013a; Cai and Li 2010; Vallet et al. 2010; Schenkel et al. 2008a, b; Tang et al. 2011; Wang and Jin 2010). Most existing techniques consider descriptive actions (tagging) as the main indicator of users’ interests and thus use them for building user and document profiles. However, relying only on tagging actions is not sufficient to provide search results relevant to users’ needs. For example, a video tagged with {Volkswagen, car, advert} would be returned as a relevant result for the query “car advert” issued by a user interested in “Volkswagen”. Given that the video features people speaking in fake Jamaican accents, some users would find it funny while others would find it offensive. In this case, the video should be considered relevant only if it is liked by users whose profiles are similar to that of the query initiator. Consequently, the pool of users’ reactions should be exploited to refine the search space and give a new definition of social document relevance. The contrast between descriptive actions, which are directly related to the content of documents, and reactive actions, which show users’ personal preferences, makes the exploitation of social information a challenging task.

Our goal in this work is to understand the role of different types of user-generated content in a search scenario and to develop a comprehensive SIR framework, consisting of a hybrid modeling of conventional content (text) and user-generated content (reactive and descriptive actions), where relevance is measured in terms of these two kinds of content.

The remainder of the paper is structured as follows. First, we discuss related work on personalized search using social information. Second, we present the proposed social information retrieval framework. Third, we describe user and document profiling. Then, we detail the scoring model of our personalized approach and present the evaluation framework. After that, experimental results are presented and discussed. Finally, we conclude and outline future work.

2 Related work

Social information retrieval is an emerging area for the design and implementation of a new generation of information retrieval systems (Goh and Foo 2008). The intuition behind social information retrieval is assisting users in meeting their information needs by harnessing the wisdom of crowds. This section discusses the different categories of social information used in the literature for improving search, namely user-generated content and social relationships.

2.1 User-generated content

Annotations or tags, as the main form of user-generated content, have been used extensively in social information retrieval systems to enhance search effectiveness, mainly for building user and document profiles. For instance, Bouadjenek et al. (2011) use tags to build user profiles and then use those profiles for query expansion. The idea is to compute the social proximity between each query and the profile of its initiator. Vallet et al. (2010) present two techniques that build user and document profiles. The first technique uses a vector space model incorporating the concepts of tag inverse document frequency and tag inverse user frequency in folksonomy systems. By contrast, the second technique adapts the BM25 probabilistic model to user and document vectors. Similarly, Bouadjenek et al. (2013a) propose a framework for social Web search, called LAICOS, that constructs document profiles based on their content and associated tags. Later and in the same context, Bouadjenek et al. (2013b) propose a new ranking function called SoPRA that considers the social dimension of the Web. They define a matching score between the document and the query based on a textual matching score and a social matching score. According to them, the social matching score expresses how similar the social representation of the document is to the query, where the social representation is based on the annotations associated with the document. Cai and Li (2010) examine the limitations of TF-IDF-based models, showing that using absolute term frequency favors active users over non-active users. Moreover, inverse document frequency is not necessarily useful in indicating users’ preferences for tags or how relevant a document is to tags. Thus, the authors use a normalized term frequency (NTF) to indicate the degree of preference of a user for a tag and thereby construct the user profile. Then, they perform search by matching the user profile and the document profile.

2.2 Social relationships

In addition to tags, social relationships have been used to improve search. They have been exploited as a form of collective intelligence of other users, whom the user trusts. For instance, Amer Yahia et al. (2008) are among the first to investigate social relationships in Web search, under the concept of network-aware search. They propose efficient top-k processing when the score of an answer is computed as its popularity among members of a seeker’s network. According to them, the relevance of an item is a function of the number of taggers within the seeker’s network who tagged the item with a tag in the query. In the same context, Carmel et al. (2009) rerank search results based on friendship relationships among users. Schenkel et al. (2008a, b) propose a top-k algorithm for social search and ranking with two dimensions of expansion: semantic expansion, which considers the relatedness of different tags, and social expansion, which considers the strength of relations among users. Additionally, Gou et al. (2010) propose a framework called SNDocRank that considers document content and the relationship between information seekers and document owners by combining TF-IDF and the multi-level actor similarity (MAS) algorithm. Tang et al. (2011) select the subtopics closest to the query and then look for the most influential users. They have developed an influence maximization algorithm to find the subnetwork that closely connects influential users. Similarly, Ben Jabeur et al. (2010) define social scores based on users’ relationships, which depend on users’ positions in the social network and their mutual collaborations. Finally, Vosecky et al. (2014) propose a collaborative personalized Twitter search framework based on a collaborative user model, which exploits the user’s social connections in order to obtain a comprehensive account of her preferences. The model was built with detailed parameterization of the influence of each friend and each topic.

We note that all the research works described above focus on how to generate a user profile using social information, but none of them takes a social document profile into account. In our proposal, we exploit the user profile not at query time but to detect the interest communities the user trusts. Moreover, we build a social document profile based on reactive actions, which was not considered in related work. A work that went beyond using only tags and user relationships is that of Wang et al. (2010), who define users’ interests based on users’ activities. However, the authors consider activities that are not related to documents but to social relationships, such as subscription to groups. In our research, we use reactive actions, which are the main indicators of documents’ social relevance, also referred to as social document popularity. The notion of popularity was investigated by He et al. (2014), who propose a regularization-based algorithm, bipartite user-item ranking (BUIR), to rank items by capturing three hypotheses about temporal, social and current popularity factors. However, their approach focuses on predicting the popularity of Web 2.0 items based on user comments rather than on integrating this popularity into the search personalization process. Similarly, Tatar et al. (2011) propose a model for predicting the popularity of online articles based on user comments. Going beyond these works, we compute item popularity for the purpose of search personalization. Moreover, the item popularity that we propose is not limited to user comments: it is calculated from the reactive actions related to items and is defined as the item/document social relevance used for personalized ranking. Finally, item popularity is not restricted to being global; we also make it more fine-grained by computing the popularity of each item at the community level.
A research work similar to our proposed approach, and among the first to investigate the social dimension in Web search, was developed by Amer Yahia et al. (2008) under the concept of network-aware search. They investigated efficient top-k processing when the score of an answer is computed as its popularity among members of a seeker’s network. According to them, the relevance of an item is a function of the number of taggers within the seeker’s network who tagged the item with a tag in the query. In our work, we also perform network-aware search, but we extend the tag strategy by exploiting the different social actions a user performs in a social network to predict item popularity. In addition, relevance in our approach does not ignore the content dimension: it is a combination of the social and content dimensions. Another research work similar to our approach was developed by Schenkel et al. (2008a, b) and Crecelius et al. (2008). They proposed a framework for exploiting social wisdom for search result ranking and recommendation that considers both social relations and semantic/statistical relations among items and tags. The main difference between our approach and theirs is that they define a document by a set of tags only, ignoring first its basic textual content and second the social reactive actions related to it, which could be valuable for estimating its relevance.

3 Social information retrieval

3.1 Framework

In this section, we provide formalization of our social information retrieval framework (Dridi 2014). We present the different entities used to formalize our framework. Then, we describe our personalized search strategy.

3.1.1 Preliminaries

Our SIR framework consists in combining both user profile and document profile in the search process. To extract these profiles, we exploit social networks as a prominent and rich source for information about both document properties and user activities. The first step toward this goal is to understand (1) which kind of information can we find in such networks and (2) how can we use it to extract document and user profiles. To this end, we distinguish the following entities as main components of the information provided by social networks:

1. Users: represent the participants in a social network. They are defined by the mutual relationships they have on the social network and the reactions they provide to documents.
2. Documents: represent the content shared by users in a social network. A document can be any searchable entity, i.e., text, image, video, song, etc.
3. Descriptions: represent tags or annotations provided by users to describe documents. Tags can also be exploited to indicate user preferences.
4. Reactions: represent user feedback reflected by different actions (comment, like, dislike, favorite, etc.). Reactions capture user interests and the popularity of documents. In some cases, they also categorize interests as negative or positive (like, dislike, etc.).
5. Communities: represent sets of users who are interconnected. Users can be linked based on different criteria such as friendship, location, behavior, or simply belonging to the same social network.

Using the entities defined above, we define the social information retrieval graph SG as a tuple SG = {U, D, T, C, A₁, A₂}, where U = {u₁, …, u_k}, D = {d₁, …, d_l}, T = {t₁, …, t_m} and C = {c₁, …, c_e} are, respectively, the sets of users, documents, descriptions (tags) and reactions (clicks). A₁ ⊆ U × D × T is the set of descriptions, where each triple (u_i, d_j, t_f) reflects user u_i tagging document d_j with tag t_f, and A₂ ⊆ U × D × C is the set of clicks, where each triple (u_i, d_j, c_r) reflects user u_i reacting to document d_j using click c_r (see Fig. 1).
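
As a minimal sketch of how the graph SG could be held in memory, the two action sets A₁ and A₂ can be stored as sets of triples, from which the tag set of a user and the set of users reacting to a document are derived. The class name `SocialGraph`, its method names and the toy data are illustrative assumptions, not part of the original framework.

```python
from dataclasses import dataclass, field


@dataclass
class SocialGraph:
    """Toy container for the action sets of SG (all names are illustrative)."""
    tag_actions: set = field(default_factory=set)    # A1: (user, document, tag)
    click_actions: set = field(default_factory=set)  # A2: (user, document, click)

    def add_tag(self, user, doc, tag):
        self.tag_actions.add((user, doc, tag))

    def add_click(self, user, doc, click):
        self.click_actions.add((user, doc, click))

    def user_tags(self, user):
        """Tags the given user has applied to any document."""
        return {t for (u, _, t) in self.tag_actions if u == user}

    def doc_reactors(self, doc):
        """Users who reacted (clicked) on the given document."""
        return {u for (u, d, _) in self.click_actions if d == doc}


sg = SocialGraph()
sg.add_tag("u1", "d1", "rock")
sg.add_tag("u1", "d2", "pop")
sg.add_click("u1", "d1", "play")
sg.add_click("u2", "d1", "play")

print(sg.user_tags("u1"))     # tags used by u1
print(sg.doc_reactors("d1"))  # users who clicked on d1
```

Storing the actions as triples keeps the structure close to the definition of A₁ and A₂; an index per user or per document would be a natural optimization for larger data.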

3.1.2 Overview

We propose to provide tailored answers to users’ needs by exploiting social information in two different stages. First, we use descriptions (tags) to create, for each user, the community of interest he trusts to judge the relevance of documents. Second, we use both descriptions and reactions to define a social profile for each document. With respect to the proposed user’s profile and document profile, we propose a new ranking that returns personalized results. The general architecture of our model is shown in Fig. 2.

Our personalized search strategy consists of the following steps. First, we extract users’ communities from social networks based on users’ profiles. The profile of a user is defined by the set of tags he used to annotate documents. Thus, the community detection problem is reduced to computing tag similarity by using the subgraph G = (U, T) of the social graph SG. Second, upon receiving a search query Q = {q₁, …, q_n} from a user u, we proceed as follows:

(a) We retrieve the top-k relevant results to the query. Each result is associated with a semantic relevance score, i.e., content relevance score; the more relevant and important a result is, with respect to the query, the higher its relevance score is.
(b) For each of the top-k results, we compute its social score based on how popular it is in u’s community. This popularity is defined by related clicks (share, favorite, comment, etc.) and denoted social relevance score.
(c) The results are then reranked based on the combination of the semantic relevance score and the social relevance score.
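
The three steps above can be sketched as a small reranking routine. The function name `rerank`, the toy scores and the use of a weighted sum with weight `lam` for the combination step are illustrative choices here; the scoring model section gives the combination the paper actually uses, and both score dictionaries are assumed to be precomputed.

```python
def rerank(semantic_scores, social_scores, k=3, lam=0.5):
    """Take the top-k results by semantic score, then reorder them by a combined score."""
    # (a) top-k results by semantic (content) relevance
    top_k = sorted(semantic_scores, key=semantic_scores.get, reverse=True)[:k]
    # (b) look up each result's social score; (c) combine it with the semantic score
    combined = {
        d: lam * semantic_scores[d] + (1 - lam) * social_scores.get(d, 0.0)
        for d in top_k
    }
    return sorted(top_k, key=combined.get, reverse=True)


# Toy scores (hypothetical): d3 is less similar to the query but popular in the community.
semantic = {"d1": 0.9, "d2": 0.8, "d3": 0.7, "d4": 0.1}
social = {"d1": 0.0, "d2": 0.3, "d3": 0.6, "d4": 0.9}

print(rerank(semantic, social))  # ['d3', 'd2', 'd1']
```

Note that d4 never appears in the output despite its high social score, because the social score is only applied to results that already made the semantic top-k.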

4 User and document profiling

In the following section, we present the method used to create user and document profiles. For each profile, we distinguish a semantic profile and a social profile.

4.1 Definitions

The different entities of social networks provide two types of information defined as follows:

Semantic information: Semantic information reflects meanings, roles and properties. This type of information is provided by descriptions, where users tag documents with semantic labels. Descriptions of texts can be keywords, author, etc.; descriptions of songs include genre, singer, title, etc.; descriptions of videos include title, category, etc. Descriptions give labels to user interests, such as the keywords used to tag texts, music genres, video categories, etc.

Social information Social information reflects activities in social networks. These activities are represented by the links between entities in the social graph shown in Fig. 1. They include descriptive (tags) and reactive (clicks) actions of users toward documents.

4.2 User profile

Based on semantic and social information described above, we define two types of user profiles: User semantic profile and user social profile defined as follows:

4.2.1 User semantic profile

The user semantic profile consists of the set of descriptions (tags) used by a user u to annotate documents of his choice. We denote this set by T_u = {t₁, t₂, …, t_k}. In social networks, tags are a strong indicator of users’ interests.

4.2.2 User social profile

The user social profile consists of the set of users that have a relationship with user u, namely the community of u. We denote this set by C_u = {u₁, u₂, …, u_{n₁}}, where u_i is a user linked to user u.

4.3 Document profile

Similarly to user profile, we define two types of profiles for documents: Document semantic profile and document social profile as follows:

4.3.1 Document semantic profile

The document semantic profile consists of a set of features that represent a document d. The set of features is predefined and has a fixed length k. Document features include a set of keywords representing its content. We represent the semantic profile of document d as a vector of k features F_d = {f₁, f₂, …, f_k}. The values of these features are mainly provided by textual content and tags.

4.3.2 Document social profile

The document social profile consists of the set of users who reacted to the document d. We denote this set by U_d = {u₁, u₂, …, u_{n₂}}, where u_i is a user who reacted to document d. The number of users who reacted to document d indicates its popularity in the social network.

5 Scoring model

In the following, we present our proposed scoring model for personalized search that considers two scores:

(1) A non-personalized score expressed by document relevance score that depends on the semantic relevance of the document where only the conventional content is considered and (2) a personalized score given by the user relevance score that computes the social relevance.

5.1 Scoring function

Given a user u and a query Q, we search for documents that are relevant to the query Q and match the interests of user u. A set of document results {d₁, …, d_{n₃}} is returned to the user, where the score of each result d_i is given as follows:

$$S\left( {Q; \, u; \, d_{\text{i}} } \right) \, = \lambda S_{\text{document}} \left( {d_{\text{i}} ; \, Q} \right) \, + \, (1 - \lambda ) \, S_{\text{user}} \left( {d_{\text{i}} ; \, u} \right)$$

(1)

where S_document(d_i; Q) denotes the document relevance score of d_i to Q and S_user(d_i; u) denotes the user relevance score of d_i to u. The parameter λ controls the amount of personalization (0 ≤ λ ≤ 1). Setting λ = 1 means that we aim at finding what matches the query and setting λ = 0 means that we aim at finding what matches user interests. Values in between combine the two components with different degrees.
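
As a worked illustration of Eq. (1), take the assumed toy values S_document(d; Q) = 0.8 and S_user(d; u) = 0.2 (these are not taken from the experiments). The three settings of λ discussed above then give:

$$\begin{aligned} \lambda = 1: &\quad S = 1 \cdot 0.8 + 0 \cdot 0.2 = 0.8 \\ \lambda = 0.5: &\quad S = 0.5 \cdot 0.8 + 0.5 \cdot 0.2 = 0.5 \\ \lambda = 0: &\quad S = 0 \cdot 0.8 + 1 \cdot 0.2 = 0.2 \end{aligned}$$

So a document that matches the query well but is unpopular in the user’s community sees its score drop from 0.8 to 0.2 as λ moves from 1 to 0.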

5.2 Document relevance score

The document relevance score, of a document d given a query Q, indicates to which degree d matches Q. We compute the document relevance score using a similarity measure between d and Q:

$$S_{\text{document}} \left( {d; \, Q} \right) \, = {\text{ Similarity }}\left( {d; \, Q} \right)$$

(2)

The similarity measure can take different forms such as cosine similarity, Jaccard similarity or Euclidean distance in the case of query-by-example paradigm, or TF-IDF or BM25 in the case of keywords-query. In the next section, we will detail the similarity measure used because it is highly related to the experimental data.

5.2.1 User relevance score

The community of a user has a big influence on documents he goes through. This means that for a document to be liked by the user, it needs to be popular in his community. Thus, we exploit the social profile of the user to compute the user relevance score of document d in the community of u as follows:

$$S_{\text{user}} \left( {d; \, u} \right) = \frac{{|U_{d} \cap C_{u} |}}{{\left| {\text{Users}} \right|}}$$

(3)

where U_d is the set of users who reacted to document d, C_u is the set of users in the community of user u, and |Users| is the total number of users in the network. It is important to mention that we focus, in this paper, only on positive reactions. So, the more reactions there are, the more popular the document is.
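
A minimal sketch of Eq. (3) follows, assuming the two profiles are available as Python sets of user identifiers. The function name `user_relevance` and the toy identifiers are hypothetical.

```python
def user_relevance(reactors, community, total_users):
    """Eq. (3): |U_d ∩ C_u| / |Users|."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    return len(set(reactors) & set(community)) / total_users


U_d = {"u1", "u2", "u5"}        # users who reacted to document d
C_u = {"u2", "u3", "u5", "u7"}  # community of the query initiator u

# u2 and u5 are in both sets, and the network is assumed to hold 10 users.
print(user_relevance(U_d, C_u, total_users=10))  # 0.2
```

Because the denominator is the size of the whole network rather than of the community, scores for small communities stay small; only the relative order of documents matters when the score is used for ranking a single user’s results.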

In addition to the social profile of the user, we can enhance the user relevance score by taking into account his semantic profile. This means that among the users in the community of user u, we target only those who have interests similar to those of u. Recall that user interests are reflected by tags. In this case, we limit the community of user u to users whose tagging actions are similar to those of user u. We say that two users u_i and u_j have similar tagging behavior if the size of the intersection of their tag sets T_{u_i} and T_{u_j} exceeds a certain threshold. The threshold setting depends on the social network and the amount of data it contains.
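
The restriction described above can be sketched as a filter over the community. The function names, the toy tag sets and the threshold value of 1 are assumptions made for illustration; the paper leaves the threshold dependent on the network and its data.

```python
def similar_tagging(tags_a, tags_b, threshold):
    """True if the number of shared tags exceeds the threshold."""
    return len(tags_a & tags_b) > threshold


def restrict_community(user, community, user_tags, threshold):
    """Keep only community members whose tag sets overlap enough with the user's."""
    own = user_tags[user]
    return {
        v for v in community
        if v != user and similar_tagging(own, user_tags.get(v, set()), threshold)
    }


# Hypothetical tag profiles
user_tags = {
    "u": {"rock", "indie", "live"},
    "v1": {"rock", "indie", "pop"},   # shares 2 tags with u
    "v2": {"jazz"},                   # shares 0 tags with u
    "v3": {"rock", "indie", "live"},  # shares 3 tags with u
}

print(restrict_community("u", {"v1", "v2", "v3"}, user_tags, threshold=1))
```

With a threshold of 1, v1 and v3 are kept and v2 is dropped. The restricted set can then replace C_u in the user relevance score.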

6 Evaluation framework

To evaluate our model, we encountered some issues related to the lack of a suitable dataset. To overcome this problem, we used a dataset from Last.fm^{Footnote 1},^{Footnote 2} and enriched it with information from Wikipedia (Dridi and Kacimi 2015).

In the following section, we describe the data collection of Last.fm, the methodology employed for identifying users’ communities and the evaluation methodology and metrics.

6.1 Experimental data

We consider a music track as a document. The dataset contains music tracks, users, and their activities. As there was not enough semantic information about music tracks, we exploited Wikipedia^{Footnote 3} to enrich the dataset. For each music track, we used the title to access its Wikipedia page. From the infobox of the Wikipedia page, we obtained information about the music track, including singer, producer, writer, year, label and other features. Further, we removed all tracks that do not have Wikipedia pages. Regarding user activities, the dataset contains descriptions corresponding to tags, and reactions corresponding to clicks, which are the only reactions available in the dataset. Table 1 gives statistics about the resulting dataset.

Table 1 Statistical characteristics of the Last.fm dataset

6.1.1 Community identification

As described in previous sections, the user social profile is defined by the user’s community, and the user relevance score depends on the user’s community. Recall that user interests are reflected by tags. So, we define the community of user u as the users whose tagging actions are similar to those of user u. We say that two users u_i and u_j have similar tagging behavior if the size of the intersection of their tag sets T_{u_i} and T_{u_j} exceeds a certain threshold. The threshold setting depends on the social network and the amount of data it contains.

In our case, we proceed as follows: (1) we define 15 communities that correspond to the top 15 track genres (e.g., pop, rock, rap, soul), (2) users belong to the same community if they clicked on tracks of the same genre and (3) a user belongs to a community if the percentage of tracks of that genre among the content he clicked on in the network is more than 30%. A user u can belong to more than one community. In this case, the social score is normalized based on the number of communities user u belongs to.
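
The genre-based assignment can be sketched as follows, under one explicit assumption about how the 30% rule is read: a user joins a genre community when tracks of that genre make up more than 30% of the tracks he clicked on. The function name `assign_communities` and the toy click list are hypothetical.

```python
from collections import Counter


def assign_communities(clicked_genres, min_share=0.30):
    """Return the genres whose share of the user's clicked tracks exceeds min_share.

    `clicked_genres` holds one genre label per clicked track (assumed reading
    of the 30% rule; the paper does not give the exact computation).
    """
    if not clicked_genres:
        return set()
    counts = Counter(clicked_genres)
    total = len(clicked_genres)
    return {genre for genre, n in counts.items() if n / total > min_share}


# Hypothetical user: 5 rock clicks, 4 pop clicks, 1 soul click
clicks = ["rock"] * 5 + ["pop"] * 4 + ["soul"] * 1
print(assign_communities(clicks))  # rock (50%) and pop (40%) pass; soul (10%) does not
```

A user can end up in more than one community, as in this example, which is the case where the social score is normalized by the number of communities.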

6.1.2 Evaluation methodology

Considering the experimental music data, we propose to use a query-by-example paradigm, i.e., a query Q corresponds to a music track. To compute the document relevance score using a similarity measure between a document d and a query Q, we use in this paper the Jaccard similarity as follows:

$${\text{Similarity}}\left( {d; \, Q} \right) \, = \frac{{|F_{d} \cap F_{Q} |}}{{|F_{d} \cup F_{Q} |}}$$

(4)

where F_d and F_Q represent the sets of features of document d (music track) and Q, respectively. These features are extracted from the semantic document profiles of d and Q.
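
Eq. (4) can be sketched directly on Python sets. The feature names (singer, writer, producer, year) follow the infobox fields mentioned for the dataset; the feature values and the function name `jaccard_similarity` are made up for illustration.

```python
def jaccard_similarity(features_d, features_q):
    """Eq. (4): |F_d ∩ F_Q| / |F_d ∪ F_Q| over two feature sets."""
    union = features_d | features_q
    if not union:
        return 0.0  # convention chosen here for two empty profiles
    return len(features_d & features_q) / len(union)


# Each feature is a (name, value) pair so that equal values of different fields do not collide.
F_d = {("singer", "A"), ("writer", "B"), ("producer", "C"), ("year", 1999)}
F_Q = {("singer", "A"), ("writer", "B"), ("producer", "D"), ("year", 2001)}

# 2 shared features out of 6 distinct ones
print(jaccard_similarity(F_d, F_Q))
```

The value lies in [0, 1], with 1 for identical profiles, which makes it directly usable as S_document in Eq. (1).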

We ran our experiments using 100 queries, randomly selected from the top 250 most clicked music tracks. This choice is driven by the requirements of the automatic assessment of the results described below. As a further step, we proceeded with the selection of the query initiators. For each query, we selected the users who clicked on the query music track and ranked them. The rank of each user was computed based on the number of clicks he has on that track and the number of clicks he has globally. The query initiator was then selected randomly among the top 20 users.

After selecting all queries and their query initiators, we have used our model to rank the results of each query. We have followed two IR scenarios for music search. The first one is exact matching that matches tracks according to selected features (singer, writer, producer) and the second one is approximate matching that matches music tracks based on an approximate interval of time using the feature year.

Considering the exact matching algorithm and the approximate matching algorithm, we have tested different strategies of our model:

(a) Semantic document profile (baseline): consists in returning results that match only the semantic profile of the query without considering the query initiator profile. This is achieved by setting the parameter λ = 1. This is the baseline of IR that returns results based on the content relevance (semantic relevance).
(b) Document profile (semantic + social): consists in returning results that match the semantic and the social profile of the query without considering the query initiator profile. This is achieved by setting the community of the query initiator to all users in the network. Thus, the S_user score is independent of the query initiator, reflecting only the popularity of the music track in the whole network.
(c) User profile (social): consists in returning results that match the social profile of the query initiator by setting λ = 0. In these experiments, the community of any user is the set of all users in the network, and thus, the score is solely based on the popularity of the music track in the network. This setting is equivalent to document profile (social), where the result matches only the social profile of the query.
(d) User profile (semantic + social): consists in taking into account both the social and the semantic profiles of the query initiator. In this setting, results should match the interests of the community and the query initiator. This is obtained by setting λ = 0 and restricting the community of the query initiator only to users with similar tagging behavior.
(e) Document profile + user profile (semantic + social): consists in using all the elements of our approach. We set λ = 0.5 and we use both semantic and social profiles for music tracks and users to rank results.

6.1.3 Assessment and evaluation metrics

To avoid any subjectivity in the assessment of the results, we have exploited click information to indicate whether a user likes a document (music track) or not. So, for each returned result we check if the user has clicked on it. If he has clicked, then we set the result as relevant and give a value of 1; otherwise, it is irrelevant and has a value of 0. To measure the effectiveness of our approach, we have used:

(a) The precision P@k, which represents the fraction of retrieved documents that are relevant to the query considering only the top-k results. It is given by:

$$P@k = \frac{|\text{RelevantDoc} \cap \text{TopkDocResults}|}{k}$$

(5)

(b) The mean average precision (MAP), which is a widely adopted standard measure in IR, given by:

$$\text{MAP}@n = \frac{\sum\nolimits_{i = 1}^{N} \text{AverageP}@n_{i}}{N}$$

(6)

where N is the total number of queries, n is a given position and AverageP@n_i is the average precision of the i-th query at position n.

(c) The normalized discounted cumulative gain (NDCG), which is a measure of ranking quality that uses a graded relevance scale of documents in an IR result set, given by:

$$\text{NDCG}_{k} = \frac{\text{DCG}_{k}}{\text{IDCG}_{k}}$$

(7)

where DCG_k is defined as follows:

$$\text{DCG}_{k} = \sum\limits_{i = 1}^{k} \frac{2^{\text{rel}_{i}} - 1}{\log_{2}(i + 1)}$$

(8)

where k is a particular rank position, rel_i is the graded relevance of the result at position i, and IDCG_k is the ideal DCG, obtained by sorting the documents of a result list by relevance, which produces the maximum possible DCG up to position k.
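
The three measures can be sketched for the binary click-based judgments used here (1 if the user clicked on the result, 0 otherwise). The function names and the toy judgment list are illustrative. One assumption to flag: average precision is computed over the relevant results found in the top n, which is a common convention; the exact normalization used in the experiments is not stated.

```python
import math


def precision_at_k(rels, k):
    """Eq. (5): fraction of the top-k results that are relevant (rels holds 0/1 values)."""
    return sum(rels[:k]) / k


def average_precision_at_n(rels, n):
    """Mean of P@i over the relevant positions i <= n (assumed convention; 0.0 if none)."""
    hits, total = 0, 0.0
    for i, rel in enumerate(rels[:n], start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def mean_average_precision(rels_per_query, n):
    """Eq. (6): average of the per-query average precision values."""
    return sum(average_precision_at_n(r, n) for r in rels_per_query) / len(rels_per_query)


def dcg_at_k(rels, k):
    """Eq. (8): sum of (2^rel_i - 1) / log2(i + 1) over the top-k positions."""
    return sum((2 ** rel - 1) / math.log2(i + 1) for i, rel in enumerate(rels[:k], start=1))


def ndcg_at_k(rels, k):
    """Eq. (7): DCG divided by the DCG of the ideally sorted list."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal else 0.0


rels = [1, 0, 1, 0, 0]  # hypothetical judgments for one ranked list
print(precision_at_k(rels, 5))  # 0.4
print(dcg_at_k(rels, 5))        # 1.5
print(ndcg_at_k(rels, 5))
```

With binary judgments, the gain 2^rel − 1 reduces to rel itself, so NDCG here only rewards placing clicked tracks earlier in the list.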

7 Experimental results and discussion

The main idea of these experiments is based on the following assumption:

For a query Q issued by user u, relevant documents (music tracks) are those having similar features to Q and highly popular on u’s community.

The objective of the experiments is to demonstrate that our proposed approach for SIR, in which the user profile and the document profile are used with significant consideration of the social context, improves the effectiveness of the IR system. The results obtained with both algorithms, exact matching and approximate matching, show the competitiveness of our social-based personalization approach.

We carried out the experiments following the exact matching and approximate matching algorithms. For exact matching, we study the effectiveness @5 and @10 only, because the similarity measure is strict, which gives a short result list. For approximate matching, we study the effectiveness @5, @10, @20 and @100, because there is an interval for the similarity measure, which gives a long result list. In our case, the result list of approximate matching exceeds 1000 results for most queries.

Tables 2, 3, 4 and 5 show the precision and MAP values for the different strategies of our model, for the exact matching and approximate matching IR scenarios, respectively. It is clear from the results that the user profile approaches perform best in terms of precision and MAP, more precisely when the social and the semantic user profiles are both taken into account to find relevant documents. Compared to the baseline, where only the semantic document profile is used, precision@5 increases from 0.285 to 0.478 in exact matching and from 0.0 to 0.223 in approximate matching, which is a substantial improvement. Similarly, MAP@5 improves from 0.242 to 0.423 in exact matching and from 0.0 to 0.09 in approximate matching. We note a decrease in precision and MAP for the top 10 results of exact matching and for all approximate matching results, which is due to the decrease in the popularity of music tracks at lower ranks.

Table 2 Mean precision values of exact matching for all queries

Table 3 MAP values of exact matching for all queries

Table 4 Mean precision values of approximate matching for all queries

Table 5 MAP values of approximate matching for all queries

After the user profile approaches come the strategies that combine the document profile with the user profile, which also show high precision and MAP values in both the exact matching and approximate matching algorithms. Thus, whenever user profile approaches are adopted, i.e., whenever the search is personalized, we can increase the satisfaction of the user. Considering only the document profile approaches, we see a notable difference between using the semantic profile of the document (music track) alone and enhancing it with its social profile. Matching music tracks based on their social profile increases precision@5 from 0.285 to 0.433 in exact matching and from 0.0 to 0.122 in approximate matching, which is a substantial improvement. Similarly, MAP@5 improves from 0.242 to 0.405 in exact matching and from 0.0 to 0.074 in approximate matching. The reason is that, even though the social profile of the document does not depend on the query initiator, it depends on other users in the network. This means that being in the same network is enough to influence the taste of any participant. This strategy is therefore indirectly a user profile approach, which explains the large improvement in the results.

To analyze the impact of the parameter λ on the performance of the model, we use different values ranging from 0 to 1 for the strategy that combines all types of profiles. The results for the exact matching algorithm are given in Tables 6 and 7. In line with the previous results, the best results are achieved when λ = 0, which corresponds to the user profile strategy. To summarize, the overall results of these experiments demonstrate that the user can be involved in the retrieval process in different ways, and that all of them considerably improve the satisfaction of the user compared to document profile approaches.
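The role of λ can be illustrated with a minimal sketch. It assumes that the combined strategy is a linear interpolation between a document-profile score and a user-profile score, so that λ = 0 reduces to the user profile strategy as stated above. The function names, the interpolation form and the sample scores are illustrative assumptions and not the exact scoring function of the model.

```python
from typing import List, Tuple

# (track identifier, document-profile score, user-profile score); hypothetical.
Candidate = Tuple[str, float, float]


def combined_score(document_score: float, user_score: float, lam: float) -> float:
    """Assumed linear interpolation: lam weights the document-profile score."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * document_score + (1.0 - lam) * user_score


def rerank(candidates: List[Candidate], lam: float) -> List[str]:
    """Return track identifiers ordered by decreasing combined score."""
    ordered = sorted(
        candidates,
        key=lambda c: combined_score(c[1], c[2], lam),
        reverse=True,
    )
    return [track_id for track_id, _, _ in ordered]


sample = [("track_a", 0.9, 0.1), ("track_b", 0.2, 0.8)]
user_only_ranking = rerank(sample, lam=0.0)      # user profile only
document_only_ranking = rerank(sample, lam=1.0)  # document profile only
```

Under this assumption, sweeping λ from 0 to 1 moves the ranking from one driven purely by the user profile to one driven purely by the document profile.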

Table 6 Impact of λ on precision

Table 7 Impact of λ on MAP

In these experiments, we have used clicks as relevance judgments, by analogy with the IR evaluation paradigm. In the case of music, however, a single click does not necessarily reveal relevance, since the user can stop the track after a few seconds. To overcome this problem, we test our approach with a graded relevance scale based on the number of clicks, on the assumption that having clicked a track several times is a much stronger indication of its relevance. Tables 8 and 9 show NDCG values @5 and @10 for the exact matching algorithm, and @5, @10 and @20 for approximate matching. The NDCG values confirm the results given by precision and MAP. For instance, in exact matching, NDCG@5 goes from 0.633 for the baseline to 0.730 when the social component is introduced, an improvement of 15%, and it reaches 0.799 when only the user profile approach is considered, an improvement exceeding 26%. Similarly, for approximate matching results, NDCG@5 improves from 0.466 to 0.633 for the user profile approach, reaching a significant improvement of 43%. Considering our model (document profile + user profile), NDCG@5 shows a substantial improvement of 15% for exact matching and a significant improvement of 21% for approximate matching. It is clear again that when the user profile is introduced, i.e., when the process is personalized, IR results improve.
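The percentages quoted for exact matching are relative improvements over the baseline NDCG@5. A minimal sketch of this arithmetic is given below; the helper name is hypothetical, and only the exact matching values quoted in the text are used.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative improvement over the baseline, in percent."""
    if baseline == 0:
        # A zero baseline has no defined relative improvement.
        raise ValueError("relative improvement is undefined for a zero baseline")
    return (improved - baseline) / baseline * 100.0


# NDCG@5 for exact matching, as quoted in the text.
baseline_ndcg = 0.633
social_component = relative_improvement(baseline_ndcg, 0.730)   # about 15.3
user_profile_only = relative_improvement(baseline_ndcg, 0.799)  # about 26.2
```

The same helper cannot be applied where the baseline is 0.0, as for precision@5 and MAP@5 in approximate matching, which is why those gains are reported as absolute values only.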

Table 8 NDCG values of exact matching for all queries

Table 9 NDCG values of approximate matching for all queries

For all results given by precision, MAP and NDCG, for both the exact matching and approximate matching algorithms, we note a significant improvement once the user profile approaches are introduced. Thus, it is clear that our personalization approach, which uses social networks to define both document and user profiles, improves the effectiveness of IR results compared with the baseline that considers only the semantic document profile.

All results given by the different strategies of our model outperform the baseline. This is perhaps not surprising, since our evaluation dataset is collected from last.fm, which is a collaborative recommender system. Thus, although the potential of our personalization approach, which uses social information to build both user and document profiles, is evident in the context of music IR, the effectiveness of the model clearly needs to be tested on different datasets to assess whether its performance carries over to other IR contexts. Furthermore, more research is needed to determine which kinds of user reactions make our model perform better or worse.

8 Conclusions and future work

In this paper, we have investigated a personalized social search model based on information produced on social networks, namely descriptions (tags) and reactions (clicks). These two kinds of social information were used for user and document profiling. Our approach goes beyond existing approaches to modeling the document profile in a social context by taking into account, in addition to descriptive actions, reactive actions that reflect document popularity. A new reranking function was adopted. It combines a non-personalized score given by semantic relevance and a personalized score given by social relevance. Our proposed approach outperformed the non-personalized baseline, in which only the semantic relevance was considered. The user profile approach gave better overall results than the other strategies with both algorithms, exact matching and approximate matching.

As future work, we aim to test our approach on real-world datasets other than the music one. In addition, we plan to investigate types of reactions other than clicks that could be significant in other fields. For example, in Twitter, reactions such as retweets, favorites and hashtags could be used as citations for scholarly communication (Dridi 2015). Moreover, comments, as descriptive actions, are a valuable source of information about user interests and can reveal much more about what users are looking for and how their taste changes over time.

References

Amer Yahia S, Benedikt M, Lakshmanan VSL (2008) Efficient network aware search in collaborative tagging sites. In: Proceedings of PVLDB 2008, Auckland, 24–30 August 2008, pp. 710–721
Ben Jabeur L, Tamine L, Boughanem M (2010) A social model for literature access: towards a weighted social network of authors. In: Proceedings of RIAO 2010, Paris, 28–30 April 2010, pp. 32–39
Bouadjenek MR, Hacid H, Bouzeghoub M, Daigremont J (2011) Personalized social query expansion using social bookmarking systems. In: Proceedings of SIGIR 2011, Beijing, 24–28 July 2011, pp. 1113–1114
Bouadjenek MR, Hacid H, Bouzeghoub M (2013) LAICOS: an open source platform for personalised social web search. In: Proceedings of KDD 2013, Chicago, 11–14 August 2013, pp. 1446–1449
Bouadjenek MR, Hacid H, Bouzeghoub M (2013) SoPRA: a new social personalized ranking function for improving web search. In: Proceedings of SIGIR 2013, Dublin, 28 July–01 August 2013, pp. 861–864
Cai Y, Li Q (2010) Personalized search by tag-based user profile and resource profile in collaborative tagging systems. In: Proceedings of CIKM 2010, Toronto, 26–30 October 2010, pp. 969–978
Carmel D, Zwerdling N, Guy I, Ofek-Koifman S, Har’el N, Ronen I, Uziel E, Yogev S, Chernov S (2009) Personalized social search based on the user’s social network. In: Proceedings of CIKM 2009, Hong Kong, 02–06 November 2009, pp. 1227–1236
Crecelius T, Kacimi M, Michel S, Neumann T, Parreira JX, Schenkel R, Weikum G (2008) Making SENSE: socially enhanced search and exploration. In: Proceedings of PVLDB 2008, Singapore, 20–24 July 2008, pp. 1480–1483
Dridi A (2014) Information retrieval framework based on social document profile. In: Joint Proceedings of the CAiSE 2014 forum and CAiSE 2014 doctoral consortium co-located with the 26th international conference on advanced information systems engineering (CAiSE 2014), Thessaloniki, 18–20 June 2014
Dridi A (2015) Retrieving research trends in Twitter. In: Proceedings of the 14th Dutch-Belgian information retrieval workshop (DIR 2014), Amsterdam, 27 November 2015, pp. 39
Dridi A, Kacimi M (2015) KISS MIR: keep it semantic and social music information retrieval. In: Proceedings of the 7th international joint conference on knowledge discovery and information retrieval (KDIR’15), Lisbon, 12–14 November 2015, pp. 433–439
Goh D, Foo S (2008) Social information retrieval systems: emerging technologies and applications for searching the web effectively. ISBN13: 9781599045436, DOI: 10.4018/978-1-59904-543-6
Gou L, Zhang XL, Chen HH, Kim JH, Giles CL (2010) Social network document ranking. In: Proceedings of JCDL 2010, Gold Coast, 21–25 June 2010, pp. 313–322
He X, Gao M, Kan MY, Liu Y, Sugiyama K (2014) Predicting the popularity of web 2.0 items based on user comments. In: Proceedings of SIGIR 2014, Gold Coast, 06–11 July 2014, pp. 233–242
Mahyuddin KM, Shahrul AN (2012) Information retrieval model: a social network extraction perspective. CAMP 2012, Kuala Lumpur, 13–15 March 2012, pp. 322–326
Pasi G (2010) Issues in personalizing information retrieval. IEEE intelligent bulletin 11(1):3–7
Schenkel R, Crecelius T, Kacimi M, Michel S, Neumann T, Parreira JX, Weikum G (2008) Efficient top-k querying over social-tagging networks. In: Proceedings of SIGIR 2008, Singapore, 20–24 July 2008, pp. 523–530
Schenkel R, Crecelius T, Kacimi M, Neumann T, Parreira JX, Spaniol M, Weikum G (2008b) Social wisdom for search and recommendation. IEEE Data Eng Bull 2008:40–49
Tang J, Wu S, Gao B, Wan Y (2011) Topic-level social network search. In: Proceedings of KDD 2011, San Diego, 21–24 August 2011, pp. 769–772
Tatar A, Leguay J, Antoniadis P, Limbourg A, De Amorim MD, Fdida S (2011) Predicting the popularity of online articles based on user comments. In: Proceedings of WIMS 2011, Sogndal, 25–27 May 2011, pp. 67:1–67:8
Vallet D, Cantador I, Joemon MJ (2010) Personalizing web search with folksonomy-based user and document profiles. In: Proceedings of ECIR 2010, Milton Keynes, 28–31 March 2010, pp. 420–431
Vosecky J, Leung KWT, Ng W (2014) Collaborative personalized twitter search with topic-language models. In: Proceedings of SIGIR 2014, Gold Coast, 06–11 July 2014, pp. 53–62
Wang Q, Jin H (2010) Exploring online social activities for adaptive search personalization. In: Proceedings of CIKM 2010, Toronto, 26–30 October 2010, pp. 999–1008

Author information

Authors and Affiliations

Higher Institute of Management of Tunis, Bouchoucha City, Le Bardo, 2000, Tunis, Tunisia
Amna Dridi
High Institute of Multimedia Arts of Manouba, University Campus, 2010, La Manouba, Tunisia
Yahya Slimani

Corresponding author

Correspondence to Amna Dridi.

About this article

Cite this article

Dridi, A., Slimani, Y. Leveraging social information for personalized search. Soc. Netw. Anal. Min. 7, 16 (2017). https://doi.org/10.1007/s13278-017-0435-4

Received: 21 July 2016
Revised: 13 March 2017
Accepted: 15 April 2017
Published: 26 April 2017
DOI: https://doi.org/10.1007/s13278-017-0435-4
