1 Introduction

Search contextualization is a very challenging task, especially given the multiple interpretations of the key notion of context. Context may refer to the characteristics and preferences of a specific user (in which case contextualization can be referred to as personalization), to the user’s geographic location (for example, when using a search engine on a smartphone), or to the user’s social environment (searching for a person, group or page on social networks). Depending on the interpretation of context considered, specific IR branches have emerged, such as personalized IR, mobile IR and social IR. Although the specific techniques of these branches vary, the common goal of context-based IR is to improve the quality of search by providing the user with relevant results (Pasi 2010; Mahyuddin and Shahrul 2012).

In this research work, we investigate both personalized IR and social IR. Personalizing search has become more challenging with the increased use of social networks. The volume of information available in the social context (social networks, blogs, forums, etc.) is growing continuously, which makes its exploration a challenging task. The extension of conventional information retrieval (IR) to incorporate the social context is becoming the bridge between the fields of information retrieval and social network analysis, which has given birth to the field of social information retrieval (SIR). A main aspect of SIR models is to understand the user’s interests and preferences (user profile) as expressed by user-generated content in the social context, by means of descriptive actions (mainly tagging) or reactive actions (clicks, feedback, etc.), and then to use this user-generated content to enhance IR systems by providing the user with relevant results matching his interests. This rich repository of users’ actions has triggered many research works that exploit social information for search personalization (Bouadjenek et al. 2011, 2013a; Cai and Li 2010; Vallet et al. 2010; Schenkel et al. 2008a, b; Tang et al. 2011; Wang and Jin 2010). Most existing techniques consider descriptive actions (tagging) as the main indicator of users’ interests and thus use them to build user and document profiles. However, relying only on tagging actions is not sufficient to provide search results that are relevant to users’ needs. For example, a video tagged with {Volkswagen, car, advert} would be returned as a relevant result for the query “car advert” issued by a user interested in “Volkswagen”. Given that the video features people speaking in fake Jamaican accents, some users would find it funny while others would find it offensive. In this case, the video should be considered relevant only if it is liked by users whose profiles are similar to that of the query initiator.

Consequently, the pool of users’ reactions should be exploited to refine the search space and give a new definition of social document relevance. The contrast between descriptive actions, which are directly related to the content of documents, and reactive actions, which show users’ personal preferences, makes the exploitation of social information a challenging task.

Our goal in this work is to understand the role of the different types of user-generated content in a search scenario and to develop a comprehensive SIR framework, consisting of a hybrid modeling of conventional content (text) and user-generated content (reactive and descriptive actions), where relevance is measured in terms of these two kinds of content.

The remainder of the paper is structured as follows. First, we discuss related work on personalized search using social information. Second, we present the proposed social information retrieval framework. Third, we describe user and document profiling. Then, we detail the scoring model of our personalized approach and present the evaluation framework. After that, experimental results are presented and discussed. Finally, we conclude and outline directions for future work.

2 Related work

Social information retrieval is an emerging area for the design and implementation of a new generation of information retrieval systems (Goh and Foo 2008). The intuition behind social information retrieval is assisting users in meeting their information needs by harnessing the wisdom of crowds. This section discusses the different categories of social information used in the literature for improving search, namely user-generated content and social relationships.

2.1 User-generated content

Annotations or tags, as the main form of user-generated content, have been used extensively in social information retrieval systems to enhance search effectiveness, mainly for building user and document profiles. For instance, Bouadjenek et al. (2011) use tags to build user profiles and then use those profiles for query expansion. The idea is to compute the social proximity between each query and the profile of its initiator. Vallet et al. (2010) present two techniques that build user and document profiles. The first technique uses a vector space model incorporating the concepts of tag inverse document frequency and tag inverse user frequency in folksonomy systems. By contrast, the second technique adapts the BM25 probabilistic model to user and document vectors. Similarly, Bouadjenek et al. (2013a) propose a framework for social Web search, called LAICOS, that constructs document profiles based on their content and associated tags. Later and in the same context, Bouadjenek et al. (2013b) propose a new ranking function called SoPRA that considers the social dimension of the Web. They define a matching score between the document and the query based on a textual matching score and a social matching score. According to them, the social matching score expresses how similar the social representation of the document is to the query, where the social representation is based on the annotations associated with the document. Cai and Li (2010) examine the limitations of TF-IDF-based models, showing that using absolute term frequency favors active users over non-active users. Moreover, inverse document frequency is not necessarily useful for indicating users’ preferences on tags or how relevant a document is to tags. Thus, the authors use a normalized term frequency (NTF) to indicate the degree of preference of a user for a tag and thereby construct the user profile. Then, they perform search by matching the user profile and the document profile.

2.2 Social relationships

In addition to tags, social relationships have been used to improve search. They have been exploited as a form of collective intelligence of other users, in whom the user places his trust. For instance, Amer Yahia et al. (2008) are among the first to investigate social relationships in Web search, under the concept of network-aware search. They propose an efficient top-k processing where the score of an answer is computed as its popularity among the members of a seeker’s network. According to them, the relevance of an item is a function of the number of taggers within the seeker’s network who tagged the item with a tag in the query. In the same context, Carmel et al. (2009) rerank search results based on friendship relationships among users. Schenkel et al. (2008a, b) propose a top-k algorithm for social search and ranking with two-dimensional expansion: semantic expansion, which considers the relatedness of different tags, and social expansion, which considers the strength of relations among users. Additionally, Gou et al. (2010) propose a framework called SNDocRank that considers document content and the relationship between information seekers and document owners by combining TF-IDF and a multi-level actor similarity (MAS) algorithm. Tang et al. (2011) select the subtopics closest to the query and then look for the most influential users. They have developed an influence maximization algorithm to find the subnetwork that closely connects influential users. Similarly, Ben Jabeur et al. (2010) define social scores based on users’ relationships, which depend on users’ positions in the social network and their mutual collaborations. Finally, Vosecky et al. (2014) propose a collaborative personalized Twitter search framework based on a collaborative user model, which exploits the user’s social connections in order to obtain a comprehensive account of her preferences. The model was built with a detailed parameterization of the influence of each friend and each topic.

We note that all the research works described above focus on how to generate a user profile using social information, but none of them takes into account a social document profile. In our proposal, we exploit the user profile not at query time but to detect the interest communities the user trusts. Moreover, we build a social document profile based on reactive actions, which was not considered in related work. A work that went beyond using only tags and user relationships is that of Wang et al. (2010), who define users’ interests based on users’ activities. However, the authors consider activities that are not related to documents but to social relationships, such as subscription to groups. In our research, we use reactive actions, which are the main indicators of the social relevance of documents, also referred to as social document popularity. The notion of popularity was investigated by He et al. (2014), who propose a regularization-based algorithm, bipartite user-item ranking (BUIR), to rank items by capturing three hypotheses about temporal, social and current popularity factors. However, their approach focuses on predicting the popularity of Web 2.0 items based on user comments rather than on integrating this popularity into the search personalization process. Similarly, Tatar et al. (2011) propose a model for predicting the popularity of online articles based on user comments. Going beyond these works, we compute item popularity for the purpose of search personalization. Moreover, the item popularity that we propose is not limited to user comments. It is calculated based on the reactive actions related to items, and it is defined as the item/document social relevance that serves to personalize ranking. In addition, item popularity is not restricted to being global: we also make it more fine-grained by computing the popularity of each item at the community level.

A research work that is similar to our proposed approach, and that was among the first to investigate the social dimension in Web search, was developed by Amer Yahia et al. (2008) under the concept of network-aware search. As noted above, they compute the score of an answer as its popularity among the members of a seeker’s network, so that the relevance of an item is a function of the number of taggers within the seeker’s network who tagged the item with a tag in the query. In our work, we also perform a network-aware search, but we extend the tag strategy by exploiting the different social actions a user performs in a social network to predict item popularity. In addition, relevance in our approach does not ignore the content dimension: it is a combination of the social and content dimensions. Another research work similar to our approach was developed by Schenkel et al. (2008a, b) and Crecelius et al. (2008). They proposed a framework for exploiting social wisdom for search result ranking and recommendation, taking into account both social relations and semantic/statistical relations among items and tags. The main difference between our approach and theirs is that they define a document by a set of tags only, ignoring first its basic textual content and second the reactive social actions related to it, which could be valuable for estimating its relevance.

3 Social information retrieval

3.1 Framework

In this section, we provide a formalization of our social information retrieval framework (Dridi 2014). We present the different entities used to formalize our framework. Then, we describe our personalized search strategy.

3.1.1 Preliminaries

Our SIR framework consists in combining both the user profile and the document profile in the search process. To extract these profiles, we exploit social networks as a prominent and rich source of information about both document properties and user activities. The first step toward this goal is to understand (1) which kind of information we can find in such networks and (2) how we can use it to extract document and user profiles. To this end, we distinguish the following entities as the main components of the information provided by social networks:

  1. Users represent the participants in a social network. They are defined by the mutual relationships they have on the social network and the reactions they provide to documents.

  2. Documents represent the content shared by users in a social network. A document could be any searchable entity, i.e., text, image, video, song, etc.

  3. Descriptions represent tags or annotations provided by users to describe documents. Tags can also be exploited to indicate user preferences.

  4. Reactions represent user feedback reflected by different actions (comment, like, dislike, favorite, etc.). Reactions capture user interests and the popularity of documents. In some cases, they also categorize interests as negative or positive (like, dislike, etc.).

  5. Communities represent sets of users who are interconnected. Users can be linked based on different criteria such as friendship, location, behavior, or simply belonging to the same social network.

Using the entities defined above, we define the social information retrieval graph SG as a tuple SG = {U, D, T, C, A 1, A 2}, where U = {u 1,…, u k}, D = {d 1,…, d l}, T = {t 1,…, t m} and C = {c 1,…, c e} are, respectively, the sets of users, documents, descriptions (tags) and reactions (clicks). A 1 ⊆ U × D × T is the set of descriptions, where each triple (u i, d j, t f) reflects user u i tagging document d j with tag t f, and A 2 ⊆ U × D × C is the set of clicks, where each triple (u i, d j, c r) reflects user u i reacting to document d j using click c r (see Fig. 1).

Fig. 1 Social information retrieval graph
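
The graph SG can be held in a small in-memory structure. The following is only a minimal sketch: the class name `SocialGraph`, its field names and its helper methods are illustrative assumptions and are not part of the framework description. The two helpers return the tags used by a given user and the users who reacted to a given document.

```python
from dataclasses import dataclass, field


@dataclass
class SocialGraph:
    """Illustrative container for SG = {U, D, T, C, A1, A2}."""
    users: set = field(default_factory=set)          # U
    documents: set = field(default_factory=set)      # D
    tags: set = field(default_factory=set)           # T (descriptions)
    clicks: set = field(default_factory=set)         # C (reaction types)
    tag_actions: set = field(default_factory=set)    # A1, triples (user, doc, tag)
    click_actions: set = field(default_factory=set)  # A2, triples (user, doc, click)

    def add_tag_action(self, user, doc, tag):
        """Record that `user` tagged `doc` with `tag`."""
        self.users.add(user)
        self.documents.add(doc)
        self.tags.add(tag)
        self.tag_actions.add((user, doc, tag))

    def add_click_action(self, user, doc, click):
        """Record that `user` reacted to `doc` with reaction `click`."""
        self.users.add(user)
        self.documents.add(doc)
        self.clicks.add(click)
        self.click_actions.add((user, doc, click))

    def user_tags(self, user):
        """Set of tags that `user` has used on any document."""
        return {t for (u, _, t) in self.tag_actions if u == user}

    def doc_reactors(self, doc):
        """Set of users who reacted to `doc`."""
        return {u for (u, d, _) in self.click_actions if d == doc}


sg = SocialGraph()
sg.add_tag_action("alice", "track1", "rock")
sg.add_click_action("bob", "track1", "play")
print(sg.user_tags("alice"), sg.doc_reactors("track1"))
```

Storing A 1 and A 2 as sets of triples mirrors the definition directly; an indexed representation would be preferable for a large network.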

3.1.2 Overview

We propose to provide tailored answers to users’ needs by exploiting social information in two different stages. First, we use descriptions (tags) to create, for each user, the community of interest he trusts to judge the relevance of documents. Second, we use both descriptions and reactions to define a social profile for each document. With respect to the proposed user’s profile and document profile, we propose a new ranking that returns personalized results. The general architecture of our model is shown in Fig. 2.

Fig. 2 Social information retrieval system architecture

Our personalized search strategy consists of the following steps. First, we extract users’ communities from social networks based on users’ profiles. The profile of a user is defined by the set of tags he used to annotate documents. Thus, the community detection problem is reduced to computing tag similarity by using the subgraph G = (U, T) of the social graph SG. Second, upon receiving a search query Q = {q 1,…, q n} from a user u, we proceed as follows:

  (a) We retrieve the top-k relevant results to the query. Each result is associated with a semantic relevance score, i.e., content relevance score; the more relevant and important a result is, with respect to the query, the higher its relevance score is.

  (b) For each of the top-k results, we compute its social score based on how popular it is in u’s community. This popularity is defined by related clicks (share, favorite, comment, etc.) and denoted social relevance score.

  (c) The results are then reranked based on the combination of the semantic relevance score and the social relevance score.
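
Steps (a) to (c) can be sketched as a simple reranking routine. This is a minimal illustration under stated assumptions: the function name `rerank`, the dictionary of precomputed content scores and the `social_score` callable are hypothetical, and the weighted sum with parameter `lam` anticipates the linear combination defined in Sect. 5.1.

```python
from typing import Callable, Dict, List, Tuple


def rerank(content_scores: Dict[str, float],
           social_score: Callable[[str], float],
           k: int,
           lam: float = 0.5) -> List[Tuple[str, float]]:
    """Rerank the top-k content results using a social score.

    content_scores: semantic relevance score of each candidate document.
    social_score:   returns the popularity of a document in the community
                    of the query initiator (assumed to be given).
    lam:            weight of the content score, between 0 and 1.
    """
    # (a) keep the k documents with the highest content relevance
    top_k = sorted(content_scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # (b) + (c) compute the social score and combine it with the content score
    combined = [(doc, lam * score + (1 - lam) * social_score(doc))
                for doc, score in top_k]
    # final ranking by combined score, highest first
    return sorted(combined, key=lambda kv: kv[1], reverse=True)


content = {"a": 0.9, "b": 0.8, "c": 0.1}
popularity = {"a": 0.0, "b": 1.0, "c": 1.0}
print(rerank(content, lambda d: popularity[d], k=2))
```

In this toy run, document "c" never enters the ranking because it is outside the top-k content results, while "b" overtakes "a" thanks to its popularity.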

4 User and document profiling

In the following section, we present the method used to create user and document profiles. For each profile, we distinguish a semantic profile and a social profile.

4.1 Definitions

The different entities of social networks provide two types of information defined as follows:

Semantic information. Semantic information reflects meanings, roles and properties. This type of information is provided by descriptions, where users tag documents with semantic labels. Descriptions of texts could be keywords, author, etc.; descriptions of songs include genre, singer, title, etc.; descriptions of videos include title, category, etc. Descriptions give labels to user interests, such as the keywords used to tag texts, music genres, video categories, etc.

Social information. Social information reflects activities in social networks. These activities are represented by the links between entities in the social graph shown in Fig. 1. They include the descriptive (tags) and reactive (clicks) actions of users toward documents.

4.2 User profile

Based on semantic and social information described above, we define two types of user profiles: User semantic profile and user social profile defined as follows:

4.2.1 User semantic profile

User semantic profile consists in the set of descriptions (tags) used by a user u to annotate documents of his choice. We denote this set by T u  = {t 1; t 2;…; t k }. In social networks, tags represent a strong indicator of users’ interests.

4.2.2 User social profile

User social profile consists in a set of users that have a relationship with user u, namely the community of u. We denote this set by C u  = {u 1; u 2;…; u n1} where u i is a user i linked to user u.

4.3 Document profile

Similarly to user profile, we define two types of profiles for documents: Document semantic profile and document social profile as follows:

4.3.1 Document semantic profile

Document semantic profile consists in a set of features that represent a document d. The set of features is predefined and has a fixed length k. Document features include a set of keywords representing its content. We represent the semantic profile of document d as vector of k features F d = {f 1; f 2;…; f k }. The values of these features are mainly provided by textual content and tags.

4.3.2 Document social profile

Document social profile consists in the set of users who reacted to the document d. We denote this set by U d  = {u 1; u 2;…; u n2} where the u i is a user who reacted to document d. The number of users who reacted to document d indicates its popularity in the social network.

5 Scoring model

In the following, we present our proposed scoring model for personalized search that considers two scores:

(1) A non-personalized score expressed by document relevance score that depends on the semantic relevance of the document where only the conventional content is considered and (2) a personalized score given by the user relevance score that computes the social relevance.

5.1 Scoring function

Given a user u, and a query Q, we search documents that are relevant to the query Q and match the interests of user u. A set of document results {d 1,…, d n3} is returned to the user where the score of each result d i is given as follows:

$$S\left( {Q; \, u; \, d_{\text{i}} } \right) \, = \lambda S_{\text{document}} \left( {d_{\text{i}} ; \, Q} \right) \, + \, (1 - \lambda ) \, S_{\text{user}} \left( {d_{\text{i}} ; \, u} \right)$$
(1)

where S document (d i; Q) denotes the document relevance score of d i to Q and S user (d i; u) denotes the user relevance score of d i to u. The parameter λ controls the amount of personalization (0 ≤ λ ≤ 1). Setting λ = 1 means that we aim at finding what matches the query and setting λ = 0 means that we aim at finding what matches user interests. Values in between combine the two components with different degrees.
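
As a worked illustration of Eq. (1), assume the hypothetical values S document (d i; Q) = 0.6 and S user (d i; u) = 0.2 (these numbers are chosen for illustration only and are not taken from the experiments). With λ = 0.5, the combined score is:

$$S\left( {Q; \, u; \, d_{\text{i}} } \right) = 0.5 \times 0.6 + \left( {1 - 0.5} \right) \times 0.2 = 0.4$$

With λ = 1 the score reduces to the document relevance score (0.6), and with λ = 0 it reduces to the user relevance score (0.2).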

5.2 Document relevance score

The document relevance score, of a document d given a query Q, indicates to which degree d matches Q. We compute the document relevance score using a similarity measure between d and Q:

$$S_{\text{document}} \left( {d; \, Q} \right) \, = {\text{ Similarity }}\left( {d; \, Q} \right)$$
(2)

The similarity measure can take different forms such as cosine similarity, Jaccard similarity or Euclidean distance in the case of query-by-example paradigm, or TF-IDF or BM25 in the case of keywords-query. In the next section, we will detail the similarity measure used because it is highly related to the experimental data.

5.2.1 User relevance score

The community of a user has a big influence on documents he goes through. This means that for a document to be liked by the user, it needs to be popular in his community. Thus, we exploit the social profile of the user to compute the user relevance score of document d in the community of u as follows:

$$S_{\text{user}} \left( {d; \, u} \right) = \frac{{|U_{d} \cap C_{u} |}}{{\left| {\text{Users}} \right|}}$$
(3)

where U d is the set of users who reacted to document d, C u is the set of users in the community of user u, and |Users| is the total number of users in the network. It is important to mention that, in this paper, we focus only on positive reactions. So, the more reactions there are, the more popular the document is.
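
Eq. (3) translates directly into a set intersection. The sketch below is only an illustration; the function name `user_relevance` and the use of Python sets for U d and C u are assumptions.

```python
def user_relevance(reactors: set, community: set, total_users: int) -> float:
    """User relevance score of Eq. (3): |U_d ∩ C_u| / |Users|.

    reactors:    users who reacted to the document (U_d)
    community:   users in the community of the query initiator (C_u)
    total_users: total number of users in the network (|Users|)
    """
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    return len(reactors & community) / total_users


# Two of the three reactors belong to the community; the network has 10 users.
print(user_relevance({"a", "b", "c"}, {"b", "c", "d"}, total_users=10))
```

Because the denominator is the size of the whole network rather than of the community, the score of a document stays small when the community is small, even if every member reacted to it.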

In addition to the social profile of the user, we can enhance the user relevance score by taking into account his semantic profile. This means that among the users of the community of user u, we target only those who have interests similar to those of u. Recall that user interests are reflected by tags. In this case, we would limit the community of user u to the users whose tagging actions are similar to those of user u. We say that two users u i and u j have similar tagging behavior if the size of the intersection of their tag sets T ui and T uj exceeds a certain threshold. The threshold setting depends on the social network and the amount of data it contains.
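
The restriction of a community to users with similar tagging behavior can be sketched as follows. The function name `restrict_community` and the dictionary `user_tags` (mapping each user to his tag set) are illustrative assumptions; "exceeds" is read here as a strict inequality.

```python
from typing import Dict, Hashable, Set


def restrict_community(user: Hashable,
                       community: Set[Hashable],
                       user_tags: Dict[Hashable, Set[str]],
                       threshold: int) -> Set[Hashable]:
    """Keep only community members whose tag set shares more than
    `threshold` tags with the tag set of `user`."""
    own_tags = user_tags.get(user, set())
    return {
        other for other in community
        if other != user
        and len(own_tags & user_tags.get(other, set())) > threshold
    }


tags = {
    "u": {"rock", "pop", "indie"},
    "v": {"rock", "pop"},
    "w": {"jazz"},
}
print(restrict_community("u", {"v", "w"}, tags, threshold=1))
```

With a threshold of 1, only "v" (two shared tags) remains in the community of "u".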

6 Evaluation framework

To evaluate our model, we encountered some issues related to the lack of a suitable dataset. To overcome this problem, we used a dataset from Last.fm and enriched it with information from Wikipedia (Dridi and Kacimi 2015).

In the following section, we describe the data collection of Last.fm, the methodology employed for identifying users’ communities and the evaluation methodology and metrics.

6.1 Experimental data

We have considered a music track as a document. The dataset contains music tracks, users, and their activities. As there was not enough semantic information about music tracks, we have exploited Wikipedia to enrich the dataset. For each music track, we have used the title to access its Wikipedia page. From the infobox of the Wikipedia page, we have extracted information about the music track, including singer, producer, writer, year, label and other features. Further, we have removed all tracks that do not have Wikipedia pages. Regarding user activities, the dataset contains descriptions corresponding to tags, and reactions corresponding to clicks, which are the only reactions available in the dataset. Table 1 gives statistics about the resulting dataset.

Table 1 Statistical characteristics of the Last.fm dataset

6.1.1 Community identification

As described in the previous sections, the user social profile is defined by the user’s community, and the user relevance score depends on this community. Recall that user interests are reflected by tags. So, we define the community of user u as the users whose tagging actions are similar to those of user u. We say that two users u i and u j have similar tagging behavior if the size of the intersection of their tag sets T ui and T uj exceeds a certain threshold. The threshold setting depends on the social network and the amount of data it contains.

In our case, we proceed as follows: (1) we define 15 communities that correspond to the top 15 track genres (e.g., pop, rock, rap, soul), (2) users belong to the same community if they clicked on tracks of the same genre, and (3) a user belongs to a community if the percentage of tracks he clicked on, among his published content in the network, is more than 30%. A user u can belong to more than one community. In this case, the social score is normalized based on the number of communities user u belongs to.
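
The genre-based assignment can be sketched as below. This is one possible reading, stated as an assumption: a user joins the community of a genre when more than 30% of the tracks he clicked on belong to that genre. The function name `assign_communities` and its inputs are hypothetical, and the normalization of the social score over several communities is not shown.

```python
from collections import Counter
from typing import Iterable, Set


def assign_communities(clicked_genres: Iterable[str],
                       genres: Set[str],
                       min_share: float = 0.30) -> Set[str]:
    """Return the genre communities a user belongs to.

    clicked_genres: genre of each track the user clicked on (one entry per track)
    genres:         the genres that define communities (e.g. the top 15 genres)
    min_share:      share of the user's clicked tracks that must be exceeded
    """
    clicked = list(clicked_genres)
    if not clicked:
        return set()
    counts = Counter(g for g in clicked if g in genres)
    return {g for g, c in counts.items() if c / len(clicked) > min_share}


clicks = ["rock"] * 5 + ["pop"] * 4 + ["soul"] * 1
print(assign_communities(clicks, {"rock", "pop", "soul"}))
```

In this example the user belongs to the rock and pop communities (50% and 40% of his clicked tracks) but not to the soul community (10%).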

6.1.2 Evaluation methodology

Considering the experimental music data, we propose to use a query-by-example paradigm, i.e., a query Q corresponds to a music track. To compute the document relevance score using a similarity measure between a document d and a query Q, we use in this paper the Jaccard similarity, defined as follows:

$${\text{Similarity}}\left( {d; \, Q} \right) \, = \frac{{|F_{d} \cap F_{Q} |}}{{|F_{d} \cup F_{Q} |}}$$
(4)

where F d and F Q represent the set of features of document d (music track) and Q, respectively. These features are extracted from the semantic document profiles of d and Q.
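
Eq. (4) can be computed directly on feature sets. In the minimal sketch below, the function name `jaccard_similarity` and the encoding of each feature as a (name, value) pair are assumptions made for illustration, and the feature values are placeholders.

```python
def jaccard_similarity(features_d: set, features_q: set) -> float:
    """Jaccard similarity of Eq. (4): |F_d ∩ F_Q| / |F_d ∪ F_Q|."""
    union = features_d | features_q
    if not union:
        # Convention chosen here: two empty profiles have similarity 0.
        return 0.0
    return len(features_d & features_q) / len(union)


track = {("singer", "X"), ("writer", "Y"), ("year", "1999")}
query = {("singer", "X"), ("writer", "Z"), ("year", "1999")}
print(jaccard_similarity(track, query))
```

The two tracks share two features out of four distinct ones, so the similarity is 0.5.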

We have run our experiments using 100 queries. The 100 queries were randomly selected from the top 250 most clicked music tracks. This choice is driven by the requirements of the automatic assessment of the results described below. As a further step, we proceeded with the selection of the query initiators. For each query, we have selected the users who clicked on the query music track and ranked them. The rank of each user was computed based on the number of clicks he has on that track and the number of clicks he has globally. The query initiator was then selected randomly among the top 20 users.

After selecting all queries and their query initiators, we have used our model to rank the results of each query. We have followed two IR scenarios for music search. The first one is exact matching that matches tracks according to selected features (singer, writer, producer) and the second one is approximate matching that matches music tracks based on an approximate interval of time using the feature year.

Considering the exact matching algorithm and the approximate matching algorithm, we have tested different strategies of our model:

  (a) Semantic document profile (baseline): consists in returning results that match only the semantic profile of the query without considering the query initiator profile. This is achieved by setting the parameter λ = 1. This is the baseline of IR that returns results based on the content relevance (semantic relevance).

  (b) Document profile (semantic + social): consists in returning results that match the semantic and the social profile of the query without considering the query initiator profile. This is achieved by setting the community of the query initiator to all users in the network. Thus, the S user score would be independent of the query initiator, reflecting only the popularity of the music track in the whole network.

  (c) User profile (social): consists in returning results that match the social profile of the query initiator by setting λ = 0. In these experiments, the community of any user is the set of all users in the network, and thus, the score is solely based on the popularity of the music track in the network. This setting is equivalent to document profile (social), where the result matches only the social profile of the query.

  (d) User profile (semantic + social): consists in taking into account both the social and the semantic profiles of the query initiator. In this setting, results should match the interests of the community and the query initiator. This is obtained by setting λ = 0 and restricting the community of the query initiator only to users with similar tagging behavior.

  (e) Document profile + user profile (semantic + social): consists in using all the elements of our approach. We set λ = 0.5 and we use both semantic and social profiles for music tracks and users to rank results.
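
The five strategies differ only in the value of λ and in the community used to compute S user, which can be summarized in a small configuration table. This is a hedged sketch: the names `Strategy` and `STRATEGIES` are hypothetical, the value of λ for strategy (b) is not stated in the text and is therefore left as `None`, and the use of the restricted community for strategy (e) is our reading of "both semantic and social profiles".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    lam: Optional[float]  # weight of the content score; None if not stated
    community: str        # "none", "all_users" or "similar_taggers"


STRATEGIES = {
    "a": Strategy("Semantic document profile (baseline)", 1.0, "none"),
    # lambda for (b) is not given in the text, so it is left unset here
    "b": Strategy("Document profile (semantic + social)", None, "all_users"),
    "c": Strategy("User profile (social)", 0.0, "all_users"),
    "d": Strategy("User profile (semantic + social)", 0.0, "similar_taggers"),
    # community choice for (e) is an assumption, see the lead-in
    "e": Strategy("Document profile + user profile (semantic + social)",
                  0.5, "similar_taggers"),
}

for key, strategy in STRATEGIES.items():
    print(key, strategy.lam, strategy.community)
```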

6.1.3 Assessment and evaluation metrics

To avoid any subjectivity in the assessment of the results, we have exploited click information to indicate whether a user likes a document (music track) or not. So, for each returned result we check if the user has clicked on it. If he has clicked, then we set the result as relevant and give a value of 1; otherwise, it is irrelevant and has a value of 0. To measure the effectiveness of our approach, we have used:

  (a) The precision P@k, which represents the fraction of retrieved documents that are relevant to the query, considering only the top-k results. It is given by:

    $$P@k = \frac{{|{\text{RelevantDoc}} \cap {\text{topkDocResults}}|}}{k}$$
    (5)

  (b) The mean average precision (MAP), which is a widely adopted standard measure in IR, given by:

    $${\text{MAP}}@n = \frac{{\sum\nolimits_{i = 1}^{N} {{\text{Average}}P@n_{i} } }}{N}$$
    (6)

    where N is the total number of queries, n is a given position and AverageP@n i is the average precision of the i-th query at position n.

  (c) The normalized discounted cumulative gain (NDCG), which is a measure of ranking quality that uses a graded relevance scale of documents in an IR result set, given by:

    $${\text{NDCG}}_{k} = \frac{{{\text{DCG}}_{k} }}{{{\text{IDCG}}_{k} }}$$
    (7)

    where DCG k is defined as follows:

    $${\text{DCG}}_{k} = \sum\limits_{i = 1}^{k} {\frac{{2^{{{\text{rel}}_{i} }} - 1}}{{\log_{2} (i + 1)}}}$$
    (8)

    where k is a particular rank position, rel i is the graded relevance of the result at position i, and IDCG k is the ideal DCG, obtained by sorting the documents of a result list by relevance, which produces the maximum possible DCG up to position k.
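
The three measures can be sketched in a few lines. This is an illustrative implementation under stated assumptions: each result list is given as a list of relevance values in rank order (0/1 for P@k and MAP, graded values for NDCG), average precision is normalized by the number of relevant results found in the top n (other normalizations exist), and the ideal ranking for IDCG is obtained by sorting the same result list. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def precision_at_k(relevance: Sequence[float], k: int) -> float:
    """Eq. (5): fraction of the top-k results that are relevant."""
    return sum(1 for r in relevance[:k] if r > 0) / k


def average_precision_at_n(relevance: Sequence[float], n: int) -> float:
    """Average of the precision values at each relevant rank within the top n."""
    hits, total = 0, 0.0
    for i, r in enumerate(relevance[:n], start=1):
        if r > 0:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def map_at_n(runs: List[Sequence[float]], n: int) -> float:
    """Eq. (6): mean of the average precision over all queries."""
    return sum(average_precision_at_n(r, n) for r in runs) / len(runs)


def dcg_at_k(relevance: Sequence[float], k: int) -> float:
    """Eq. (8): discounted cumulative gain with gain 2^rel - 1."""
    return sum((2 ** r - 1) / math.log2(i + 1)
               for i, r in enumerate(relevance[:k], start=1))


def ndcg_at_k(relevance: Sequence[float], k: int) -> float:
    """Eq. (7): DCG divided by the DCG of the ideally sorted result list."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


binary = [1, 0, 1, 0, 0]   # clicked / not clicked
graded = [1, 3, 0, 2]      # e.g. number of clicks as graded relevance
print(precision_at_k(binary, 5), average_precision_at_n(binary, 5))
print(ndcg_at_k(graded, 4))
```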

7 Experimental results and discussion

The main idea of these experiments is based on the following assumption:

For a query Q issued by user u, relevant documents (music tracks) are those having similar features to Q and highly popular on u’s community.

The objective of the experiments is to demonstrate that our proposed approach for SIR, where the user profile and the document profile are used with particular consideration of the social context, improves the effectiveness of the IR system. The results obtained with both algorithms, exact matching and approximate matching, show the competitiveness of our social-based personalization approach.

We carried out the experiments following the exact matching and approximate matching algorithms. Recall that for exact matching, we study the effectiveness @5 and @10 only, while for approximate matching we study the effectiveness @5, @10, @20 and @100 because for exact matching the similarity measure is strict which gives a short result list. However, for approximate matching, there is an interval for the similarity measure which gives a long result list. In our case, the result list of approximate matching exceeds 1000 in most query cases.

Tables 2, 3, 4 and 5 show the precision and MAP values for the different strategies of our model, respectively, related to exact matching and approximate matching IR scenarios. It is clear from the results that user profile approaches (social) perform the best in terms of precision and MAP values, more precisely, when the social and the semantic user profiles are both taken into account to find relevant documents. Compared to the baseline where only the semantic document profile is used, the precision@5 increases from 0.285 to 0.478 in exact matching results, and from 0.0 to 0.223 in approximate matching results, which is a substantial improvement. Similarly, the MAP@5 highly improves, respectively, from 0.242 to 0.423 in exact matching and from 0.0 to 0.09 in approximate matching. We note a decrease in precision and MAP for the top 10 results of exact matching and for all approximate matching results which is due to the decrease in the popularity of music tracks at lower ranks.

Table 2 Mean precision values of exact matching for all queries
Table 3 MAP values of exact matching for all queries
Table 4 Mean precision values of approximate matching for all queries
Table 5 MAP values of approximate matching for all queries

The user profile approaches are followed by the strategies that combine the document profile with the user profile, which also show high precision and MAP values in both the exact matching and approximate matching algorithms. Thus, whenever user profile approaches are adopted, i.e., whenever the search is personalized, we can increase the satisfaction of the user. Considering now only the document profile approaches, we see a notable difference between using the semantic profile of the document (music track) alone and enhancing it with its social profile. Matching music tracks based on their social profile increases precision@5 from 0.285 to 0.433 in exact matching and from 0.0 to 0.122 in approximate matching, which is a substantial improvement. Similarly, MAP@5 improves from 0.242 to 0.405 in exact matching and from 0.0 to 0.074 in approximate matching. The reason is that, even though the social profile of the document does not depend on the query initiator, it depends on other users in the network. This means that it is enough to be in the same network to influence the taste of any participant. This strategy is therefore indirectly a user profile strategy, which explains the large improvement in the results.

To analyze the impact of the parameter λ on the performance of the model, we vary its value from 0 to 1 for the strategy that combines all types of profiles. The results for the exact matching algorithm are given in Tables 6 and 7. In line with the previous results, the best results are achieved when λ = 0, which corresponds to the user profile strategy. To summarize, these experiments demonstrate that the user can be involved in the retrieval process in different ways, and that all of them considerably improve the satisfaction of the user compared to the document profile approaches.

Table 6 Impact of λ on precision
Table 7 Impact of λ on MAP
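The role of λ can be illustrated with a minimal sketch. It assumes a linear interpolation in which λ weights the document profile score and 1 − λ weights the user profile score; this form is an assumption chosen to be consistent with λ = 0 corresponding to the user profile strategy, and it may differ from the exact reranking function of the model. The function names and the toy scores are hypothetical.

```python
from typing import Dict, List, Tuple


def combined_score(doc_score: float, user_score: float, lam: float) -> float:
    """Assumed linear interpolation: lam * document score + (1 - lam) * user score."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * doc_score + (1.0 - lam) * user_score


def rerank(candidates: Dict[str, Tuple[float, float]], lam: float) -> List[str]:
    """Order tracks by decreasing combined score.

    `candidates` maps a track to a (document profile score, user profile score) pair.
    """
    return sorted(
        candidates,
        key=lambda track: combined_score(candidates[track][0], candidates[track][1], lam),
        reverse=True,
    )


# Toy scores (illustrative values only).
toy_candidates = {"t1": (0.9, 0.2), "t2": (0.4, 0.8), "t3": (0.5, 0.5)}

# Sweep lam from 0 (user profile only) to 1 (document profile only).
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(lam, rerank(toy_candidates, lam))
```

In an evaluation such as the one reported in Tables 6 and 7, each value of λ would produce one ranking per query, which is then scored with precision and MAP.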

In these experiments, we have used clicks as relevance judgments, by analogy with the IR evaluation paradigm. However, in the case of music, a single click does not reliably indicate relevance, since the user can stop the music after a few seconds. To overcome this problem, we test our approach with a graded relevance scale based on the number of clicks, considering that having clicked a track several times is a much stronger indication of its relevance. Tables 8 and 9 show NDCG values @5 and @10 for the exact matching algorithm, and @5, @10 and @20 for approximate matching. The NDCG values confirm the results given by precision and MAP. For instance, in exact matching, NDCG@5 rises from 0.633 for the baseline to 0.730 when the social component is introduced, an improvement of 15%, and it reaches 0.799 when only the user profile approach is considered, an improvement exceeding 26%. Similarly, for approximate matching, NDCG@5 improves from 0.466 to 0.633 for the user profile approach, a significant improvement of 43%. For our model (document profile + user profile), NDCG@5 shows a substantial improvement of 15% for exact matching and a significant improvement of 21% for approximate matching. It is clear again that when the user profile is introduced, i.e., when the process is personalized, IR results are improved.

Table 8 NDCG values of exact matching for all queries
Table 9 NDCG values of approximate matching for all queries
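The graded evaluation can be sketched as follows. The sketch assumes that the click count of a track is used directly as its relevance grade and that the standard logarithmic discount is applied; the exact gain function used for Tables 8 and 9 may differ. The function names and the toy click counts are hypothetical.

```python
import math
from typing import Dict, Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: gain at rank r is divided by log2(r + 1)."""
    return sum(gain / math.log2(rank + 1) for rank, gain in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked: Sequence[str], click_counts: Dict[str, int], k: int) -> float:
    """NDCG@k with the number of clicks on a track used as its relevance grade.

    Tracks absent from `click_counts` are treated as non-relevant (grade 0).
    The ideal ranking is built from all clicked tracks, sorted by decreasing clicks.
    """
    gains = [click_counts.get(track, 0) for track in ranked]
    ideal_gains = sorted(click_counts.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(gains, k) / ideal_dcg


# Toy example: number of times the user clicked each track (illustrative values only).
toy_clicks = {"a": 3, "b": 1, "c": 2}

print(ndcg_at_k(["a", "c", "b"], toy_clicks, 5))  # tracks in ideal order
print(ndcg_at_k(["a", "b", "c"], toy_clicks, 5))  # two tracks swapped
```

Unlike binary precision, this measure rewards rankings that place repeatedly clicked tracks above tracks clicked only once.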

Across precision, MAP and NDCG, and for both the exact matching and approximate matching algorithms, we note a significant improvement once the user profile approaches are introduced. Thus, our personalization approach, which uses social networks to define both document and user profiles, clearly improves the effectiveness of IR results compared with the baseline that considers only the semantic document profile.

All results given by the different strategies of our model outperform the baseline. This is perhaps not surprising, since our evaluation dataset is collected from last.fm, which is a collaborative recommender system. Thus, although the potential of our personalization approach using social information for building both user and document profiles is evident in the context of music IR, the model still needs to be tested on different datasets in order to assess its performance in other IR contexts. Furthermore, more research is needed to determine the types of user reactions under which our model performs better or worse.

8 Conclusions and future work

In this paper, we have investigated a personalized social search model based on information produced on social networks, namely descriptions (tags) and reactions (clicks). These two kinds of social information were used for user and document profiling. Our approach goes beyond existing approaches to modeling the document profile in a social context by taking into account, in addition to descriptive actions, reactive actions that reflect document popularity. A new reranking function was adopted, which combines a non-personalized score given by semantic relevance with a personalized score given by social relevance. Our proposed approach outperformed the non-personalized baseline, in which only the semantic relevance was considered. The user profile approach gave better overall results than the other strategies with both algorithms, exact matching and approximate matching.

As future work, we aim to test our approach on real-world datasets other than the music one. In addition, we plan to investigate types of reactions other than clicks that could be significant in other fields. For example, on Twitter, actions such as retweets, favorites and hashtags could be used as citations for scholarly communication (Dridi 2015). Moreover, comments, as descriptive actions, are a valuable source of information about user interests and can reveal much more about what the user is looking for and how the user's taste changes over time.