Enhancing Content Based Filtering Using Web of Data

Zitouni, Hanane; Meshoul, Souham; Taouche, Kamel

doi:10.1007/978-3-319-89743-1_52


Part of the book series: IFIP Advances in Information and Communication Technology (IFIPAICT, volume 522)

Included in the following conference series:

IFIP International Conference on Computational Intelligence and Its Applications


Abstract

Recommender systems help users access relevant information on the web and personalize search. Content-based filtering (CBF) is one of the approaches used to design recommender systems; it exploits the content of items. Basically, CBF systems recommend items based on a comparison between the content of items and a user profile. Usually, the content of an item is represented as a set of descriptors or terms, typically the words that occur in text documents. The user profile is represented by the same terms and is built up by analysing the content of the items the user consumed before. However, current CBF recommender systems are mostly devoted to textual resources and cannot be used in their current form to handle the variety of data published on the web, especially unstructured data. Another challenge for existing CBF methods is the new user problem: the system cannot draw any inference about a new user because it lacks information about him. This paper describes an approach to CBF that aims to deal with these problems, on which CBF systems perform poorly. The basic feature of the proposed approach is to incorporate the linked data cloud into the information filtering process using a semantic vector space model, together with the FOAF vocabulary, which is used to define a new distance measure between users based on their FOAF profiles. We report on experiments that yield promising results for the proposed approach.


Similar content being viewed by others

Semantics-Aware Content-Based Recommender Systems

An Ontology Based Recommender System to Mitigate the Cold Start Problem in Personalized Web Search

Improved content recommendation algorithm integrating semantic information

Article Open access 28 May 2023


1 Introduction

Nowadays, the volume of data published on the web in fields such as e-learning, social networks and e-commerce, among many others, is growing explosively, and recent studies indicate that this growth will not slow down soon [1, 2]. This expanding data universe makes it difficult for users to benefit from web content. Furthermore, predicting how users will respond to candidate options, for recommendation purposes, has become an enormous challenge for a wide class of web applications. Recommending a resource is usually achieved through information filtering. There are two major approaches to information filtering [1]: collaborative filtering and content-based filtering. A Collaborative Filtering (CF) system chooses items based on the correlation between people with similar preferences, while a Content-Based Filtering (CBF) system selects items based on the correlation between the content of the items and the user's preferences.

Despite the demonstrated effectiveness of CBF in many cases, some drawbacks make it unsuitable, in its current form, for others. Indeed, CBF requires analyzing the content of a document, which is computationally expensive and even impossible for multimedia items that contain no descriptive text [3]. Furthermore, CBF has difficulty handling the new user problem, where no preferences are available: a new user starts without any preference values, so it is very hard to issue any recommendation to him.

In this paper, we propose to solve these issues by enhancing CBF systems with semantics derived from the Web of Data. The Web of Data views the World Wide Web as a global database by creating links between data, known as Linked Data. Linked data that describe people are commonly expressed with the FOAF (Friend of a Friend) vocabulary. The proposed approach is based on the Vector Space Model of CBF [4], enhanced by a semantic level extracted from the Web of Data, leading to a new model that we refer to as the Semantic Vector Space Model (SVSM).

The rest of the paper is organized as follows. CBF based on the Vector Space Model is described in Sect. 2. Section 3 presents key features of the Web of Data. Section 4 reviews related works that propose recommender systems in the Web of Data context. Section 5 describes the proposed approach, SCBF, and Sect. 6 reports on the experimental study and the obtained results. Finally, Sect. 7 gives the conclusion and future work.

2 Content Based Filtering (CBF)

Information filtering deals with delivering information that is interesting and useful to a user, given his profile and preferences. An information filtering system assists users by filtering the data source and delivering relevant information to them. When the delivered information takes the form of suggestions, the information filtering system is called a recommender system. A CBF technique, also referred to as cognitive filtering [1], recommends items based on the correlation between the content of the items and a user profile. The content of each item is represented as a set of descriptors or terms, classically the words that occur in a document. The user profile is represented by a set of terms built up by analyzing the content of the items seen by the user.
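The correlation between item content and a user profile is often computed as a cosine similarity between term vectors. The minimal Python sketch below assumes that each item is just a list of terms, builds the profile by summing the term counts of the items a user has already seen, and ranks unseen candidates; the helper names (`build_profile`, `cosine`, `recommend`) and the use of raw counts with cosine similarity are illustrative assumptions rather than the paper's specification.

```python
import math
from collections import Counter


def build_profile(seen_items: list[list[str]]) -> Counter:
    """Sum the term counts of the items the user has already seen into one profile vector."""
    profile = Counter()
    for terms in seen_items:
        profile.update(terms)
    return profile


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(profile: Counter, candidates: dict[str, list[str]], k: int = 2) -> list[str]:
    """Return the ids of the k candidate items most similar to the profile."""
    scored = [(cosine(profile, Counter(terms)), item_id) for item_id, terms in candidates.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first, ties broken by id
    return [item_id for _, item_id in scored[:k]]


# Hypothetical user history and candidate items.
profile = build_profile([["action", "science", "fiction"], ["action", "thriller"]])
candidates = {
    "m1": ["action", "science", "fiction"],
    "m2": ["romance", "comedy"],
    "m3": ["thriller", "drama"],
}
top = recommend(profile, candidates)
```

Here the profile counts "action" twice, so m1 (cosine ≈ 0.87) ranks ahead of m3 (≈ 0.27), while m2 shares no term with the profile and scores 0.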

One of the most widely used representations is the Vector Space Model (VSM), or term vector model [5]. In the VSM, a document D (item) is represented as an m-dimensional vector, where each dimension corresponds to a distinct term [6]. The term frequency (tf) is a numerical statistic that measures the importance of a term with regard to a document in a collection or corpus:

$$ {\text{tf}}_{\text{vi}} = \frac{{{\text{n}}_{\text{vi}} }}{\text{N}} $$

(1)

where $ n_{vi} $ is the number of times term $ t_{i} $ appears in vector v (which models the user's taste), and N is the total number of terms in v.

To measure how specific a term $ t_{i} $ is within the collection, that is, how few documents contain it, we compute the inverse document frequency (idf):

$$ {\text{idf}}_{\text{i}} = { \log }\left( {\frac{\text{D}}{{{\text{n}}_{\text{j}} }}} \right) $$

(2)

where $ D $ is the total number of documents and $ n_{j} $ is the number of documents $ d_{j} $ containing term $ t_{i} $.

Combining tf and idf gives the weight W, known as tf-idf, which can be used to build the profile of an item such as a document or any other object:

$$ {\text{W}}_{\text{i}} = {\text{tf}}_{\text{vi}} \, *\,{\text{idf}}_{\text{i}} $$

(3)
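Equations (1)-(3) can be sketched in Python as follows. This is a minimal illustration assuming documents are lists of terms and using the natural logarithm, since the base of the log in Eq. (2) is not specified; returning 0 for a term that occurs in no document is our own choice to avoid a division by zero.

```python
import math
from collections import Counter


def tf(term: str, vector: list[str]) -> float:
    """Eq. (1): occurrences of the term in the vector divided by its total number of terms N."""
    return Counter(vector)[term] / len(vector)  # assumes a non-empty vector


def idf(term: str, documents: list[list[str]]) -> float:
    """Eq. (2): log(D / n_j), where n_j counts the documents containing the term."""
    n_j = sum(1 for doc in documents if term in doc)
    if n_j == 0:
        return 0.0  # term absent from the corpus: no weight (our choice, avoids dividing by zero)
    return math.log(len(documents) / n_j)


def tfidf(term: str, vector: list[str], documents: list[list[str]]) -> float:
    """Eq. (3): W_i = tf * idf."""
    return tf(term, vector) * idf(term, documents)


# Hypothetical corpus of four documents (genre lists).
docs = [
    ["action", "science", "fiction"],
    ["action", "drama"],
    ["comedy", "drama"],
    ["action", "thriller"],
]
w = tfidf("science", docs[0], docs)
```

Only one of the four documents contains "science", so its idf is log 4 and its tf-idf in the first, three-term document is log(4)/3; "action" appears in three documents and gets the much smaller idf log(4/3).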

A content-based filtering system selects relevant items based on the correlation between the content of the items and the user's preferences [3]. However, this technique also suffers from some disadvantages: it requires analyzing the content of documents, which is expensive and even impossible for multimedia items [7], and it cannot handle the new user (no preference) problem. A new user starts without any preference values, which makes it impossible to give him any recommendation. To address these problems, we propose to enhance CBF using the Web of Data.

3 Web of Data

Typically, a dataset published on the web contains knowledge about a particular domain, such as books, music, encyclopedic data or companies, to name just a few. Interconnecting these datasets, i.e. linking them to each other, turns the World Wide Web into a global database, which Tim Berners-Lee termed the Web of Data.

The most important concepts related to the Web of Data are Linked Open Data (LOD), the Friend of a Friend (FOAF) vocabulary, and the Resource Description Framework (RDF).

3.1 Linked Open Data

The term Linked Open Data refers to a set of best practices for publishing and connecting structured data on the web using international standards of the World Wide Web Consortium.^{Footnote 1} The LOD cloud can be viewed as a network that interlinks otherwise isolated data silos.

The LOD cloud diagram of Fig. 1 is maintained by Richard Cyganiak and Anja Jentzsch (http://lod-cloud.net/).

At the core of this diagram is DBpedia,^{Footnote 2} a community effort to extract structured information from Wikipedia and make it available on the Web.

3.2 FOAF Vocabulary

The FOAF project began as an “experimental linked information project.” Dan Brickley and Libby Miller are responsible for its inception, and Edd Dumbill and Leigh Dodds (http://www.foaf-project.org) notably contributed to its success. FOAF makes it possible to describe people, their interests, their achievements, their activities, and their relationships with other people [8]. Table 1 presents all FOAF classes and properties.

Table 1. Classes and properties of FOAF vocabulary.


3.3 Resource Description Framework (RDF)

The Resource Description Framework (RDF) provides a common data model for Linked Data [8] and is particularly suited to representing data on the Web. Linked Data uses RDF as its data model and serializes it in one of several syntaxes; SPARQL is the standard language for querying it. A single RDF statement, or triple, describes two things (a subject and an object) and the relationship (a predicate) between them. Technically, this corresponds to an Entity-Attribute-Value (EAV) data model.
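To make the triple structure concrete, the sketch below stores a few RDF-like statements as plain Python (subject, predicate, object) tuples and answers a single triple pattern, with `None` standing in for a SPARQL variable. It is a toy in-memory illustration, not an RDF library; the `ex:` identifiers are hypothetical placeholders, while `foaf:knows`, `foaf:based_near` and `foaf:topic_interest` are FOAF property names.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object), i.e. (entity, attribute, value)

# Hypothetical statements: "ex:" names are placeholders, "foaf:" names are FOAF properties.
GRAPH: set[Triple] = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:based_near", "ex:Constantine"),
    ("ex:bob", "foaf:topic_interest", "ex:ScienceFiction"),
}


def match(graph: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching one pattern, sorted; None acts like a SPARQL variable."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


alice_facts = match(GRAPH, s="ex:alice")
```

The call `match(GRAPH, s="ex:alice")` plays the role of the SPARQL pattern `ex:alice ?p ?o` and returns the two statements about Alice.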

4 Related Work

Few recommender systems based on the Web of Data have been developed to date. Table 2 reviews some recent approaches, giving a short description of each method and indicating the Web of Data concepts it uses.

Table 2. Recent recommender systems enhanced by web of data.


From Table 2, we observe that most of the proposed approaches are dedicated to a specific domain, for example movies or music, and use either the FOAF vocabulary or the linked data cloud. Almost half of these methods are based on collaborative filtering.

Our work is motivated by the observation that combining the FOAF vocabulary with the linked data cloud has the potential to further improve the ability of CBF to produce suitable recommendations. The FOAF vocabulary helps solve the new user problem, and the data extracted from the cloud provide a semantic description of unstructured items.

5 Proposed Semantic Content Based Filtering (SCBF)

CBF selects items based on the correlation between the content of the items and the user's preferences. As mentioned above, the problem with CBF is that it requires analyzing the content of the items, which is expensive or impossible for multimedia items. To solve this issue, along with the new user problem, we describe in this section how Web of Data technologies can be used to enhance CBF systems. We refer to the proposed Web of Data based variant of CBF as Semantic Content Based Filtering (SCBF). SCBF integrates the following technologies:

FOAF Vocabulary: when a new user connects, his FOAF description is compared with the FOAF descriptions of the other users using the following proposed formula:
$$ {\text{D}}_{\text{FOAF}} \left( {{\text{u}},{\text{v}}} \right) = 1 + { \log }\left( {\frac{{1 + {\text{K}}}}{\text{P}}} \right) $$
(4)

where $ D_{FOAF} \left( {u,v} \right) $ is the FOAF distance between users u and v; $ K = L + S $, where S is the number of similar FOAF properties between u and v and L is the number of links between them; and P is the total number of FOAF properties describing the target user u.

Some important properties of the class Person are the following [8]:
- Based near - A location that something is based near, for some broadly human notion of near (The based near relationship relates two “spatial things”).
- Age - The age in years of some person.
- Gender - The gender of this person (typically but not necessarily ‘male’ or ‘female’).
- Title - Title (Mr, Mrs, Ms, Dr. etc.).
- Knows - A person known by this person (indicating some level of reciprocated interaction between the parties).
- Maker - An agent that made this thing.
- Member - Indicates a member of a Group.
- Interest - A page about a topic of interest to this person.
- Topic_interest - A thing of interest to this person.
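Equation (4) can be sketched as below, under assumptions the text leaves open: a FOAF profile is modelled as a dictionary mapping property names to sets of values, a property counts toward S when u and v share at least one value for it, the number of links L (for instance `foaf:knows` statements between u and v) is supplied by the caller, and the logarithm is natural. The function name `d_foaf` and the `Profile` type are hypothetical. As defined, a larger value indicates more similar users.

```python
import math

Profile = dict[str, set[str]]  # FOAF property -> set of values (hypothetical representation)


def d_foaf(u: Profile, v: Profile, links: int) -> float:
    """Eq. (4): D_FOAF(u, v) = 1 + log((1 + K) / P), with K = L + S.

    links is L, the number of links between u and v (e.g. foaf:knows statements).
    """
    p = len(u)  # P: number of FOAF properties describing the target user u
    if p == 0:
        raise ValueError("the target user has no FOAF properties")
    # S: properties of u for which v has at least one identical value (our reading of "similar")
    s = sum(1 for prop, values in u.items() if values & v.get(prop, set()))
    k = links + s
    return 1 + math.log((1 + k) / p)


# Hypothetical profiles: u has 4 properties and shares based_near and topic_interest with v.
u = {"based_near": {"Constantine"}, "gender": {"female"},
     "topic_interest": {"ScienceFiction", "Jazz"}, "age": {"30"}}
v = {"based_near": {"Constantine"}, "topic_interest": {"Jazz"}, "age": {"45"}}
```

With one link between u and v, K = 1 + 2 = 3 and D_FOAF = 1 + log(4/4) = 1; without the link the value drops to 1 + log(3/4).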
Linked Data Cloud:

The vector space model is a representation often used for text items. In this model, an item i is represented as an m-dimensional vector, where each dimension corresponds to a distinct term. However, this representation is too limited for unstructured and even semi-structured items.

To fix this problem, SCBF enhances the m-dimensional vector with n additional textual or semantic attributes extracted from the linked data cloud. The representation of an item therefore includes m + n attributes and is expressed as an (m + n)-dimensional vector, a model we refer to as the Semantic Vector Space Model (SVSM). The example below illustrates the proposed SVSM.

In the MovieLens dataset^{Footnote 3}, the movie “No escape” is represented by the following textual attributes:

Id	Title	Release date	Genre
1416	No escape	1994-01-01	Action, science fiction

For the same movie, DBpedia provides further information, such as the attributes given in Table 3.

Table 3. Textual and semantic attributes describing the movie “no escape”.

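The (m + n)-dimensional SVSM representation can be sketched as a concatenation of two count segments: m dimensions for the textual terms and n dimensions for the semantic attributes. In the minimal Python sketch below, semantic attributes are encoded as hypothetical `predicate=object` strings; the director and country values are placeholders, not the actual DBpedia data of Table 3, and the names `svsm_vectors` and `Item` are ours.

```python
from collections import Counter

Item = tuple[list[str], list[str]]  # (textual terms, semantic attributes)


def svsm_vectors(items: dict[str, Item]) -> tuple[list[tuple[str, str]], dict[str, list[int]]]:
    """Build (m + n)-dimensional count vectors: m textual dimensions, then n semantic ones."""
    text_vocab = sorted({t for terms, _ in items.values() for t in terms})  # m distinct terms
    sem_vocab = sorted({a for _, attrs in items.values() for a in attrs})   # n semantic attributes
    dims = [("text", t) for t in text_vocab] + [("sem", a) for a in sem_vocab]
    vectors = {}
    for item_id, (terms, attrs) in items.items():
        term_counts, attr_counts = Counter(terms), Counter(attrs)
        vectors[item_id] = [term_counts[t] for t in text_vocab] + [attr_counts[a] for a in sem_vocab]
    return dims, vectors


# Textual terms follow the MovieLens row above; semantic values and item "m2" are hypothetical.
items = {
    "1416": (["no", "escape", "action", "science_fiction"],
             ["director=ex:DirectorA", "country=ex:CountryB"]),
    "m2": (["action"], ["director=ex:DirectorA"]),
}
dims, vectors = svsm_vectors(items)
```

Here m = 4 and n = 2, so each movie becomes a 6-dimensional vector; the two items overlap on one textual dimension ("action") and one semantic dimension (the placeholder director), which plain VSM over titles and genres alone would partly miss.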

To weight these semantic attributes, we propose a new version of tf, denoted by $ \widetilde{tf} $ and defined as follows:

$$ \widetilde{\text{tf}} \left( {{\text{v}},{\text{i}}} \right) = \frac{{{\text{NS}}_{\text{vi}} }}{\text{T}} $$

(5)

where $ {\text{NS}}_{\text{vi}} $ is the number of times triple $ t_{i} $ appears in the semantic segment of vector v, and T is the total number of triples in the semantic segment of v.

$$ \widetilde{\text{idf}}_{\text{i}} = \log \left( {\frac{\text{Tt}}{{\text{n}}_{\text{j}}}} \right) $$

(6)

where $ Tt $ is the total number of triples and $ n_{j} $ is the number of documents $ d_{j} $ such that $ t_{i} \in d_{j} $.

Therefore, the semantic weight is given by:

$$ \widetilde{W}_{i} = \widetilde{\text{tf}} \left( {{\text{v}},{\text{i}}} \right) * \widetilde{\text{idf}}_{\text{i}} $$

(7)

The global weight $ Wg_{i} $ is then defined as follows:

$$ {\text{Wg}}_{\text{i}} = {\text{W}}_{\text{i}} * \widetilde{W}_{i} $$

(8)
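Equations (5)-(8) can be sketched as follows, assuming the semantic segment of an item is a list of (predicate, object) pairs with the item itself as the implicit subject, and that Tt is the total number of triples over all items' semantic segments. The natural logarithm and the zero idf for a triple found in no segment are our own choices; `w_text` stands for the textual tf-idf weight $ W_{i} $ of Eq. (3), and all function names are hypothetical.

```python
import math

Pair = tuple[str, str]  # (predicate, object); the item is the implicit subject


def semantic_tf(triple: Pair, segment: list[Pair]) -> float:
    """Eq. (5): occurrences of the triple in the item's semantic segment divided by T, its size."""
    return segment.count(triple) / len(segment)


def semantic_idf(triple: Pair, segments: list[list[Pair]], total_triples: int) -> float:
    """Eq. (6): log(Tt / n_j), n_j being the number of items whose segment contains the triple."""
    n_j = sum(1 for seg in segments if triple in seg)
    return math.log(total_triples / n_j) if n_j else 0.0


def global_weight(w_text: float, triple: Pair, segment: list[Pair],
                  segments: list[list[Pair]], total_triples: int) -> float:
    """Eq. (7) gives the semantic weight; Eq. (8) multiplies it by the textual weight W_i."""
    w_sem = semantic_tf(triple, segment) * semantic_idf(triple, segments, total_triples)
    return w_text * w_sem


# Hypothetical semantic segments for three movies.
seg_a = [("dbo:director", "ex:D1"), ("dbo:country", "ex:C1")]
seg_b = [("dbo:director", "ex:D1")]
seg_c = [("dbo:country", "ex:C2")]
segments = [seg_a, seg_b, seg_c]
tt = sum(len(seg) for seg in segments)  # Tt = 4
wg = global_weight(0.2, ("dbo:director", "ex:D1"), seg_a, segments, tt)
```

For the director pair in `seg_a`, tf = 1/2 and idf = log(4/2), so with a textual weight of 0.2 the global weight is 0.1 × log 2 ≈ 0.069.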

Based on the above description, the proposed SCBF approach leads to the recommender system architecture shown in Fig. 2.

For a new user (empty feedback), $ D_{FOAF} $ is computed between this user and every other user; the system then recommends the set of items liked by the user with the maximum $ D_{FOAF} $.
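The new user step can be sketched as below, assuming the $ D_{FOAF} $ scores between the new user and every existing user have already been computed with Eq. (4) and that each existing user's liked items are available as a set. The function name and data structures are hypothetical; ties are resolved in favour of the first user encountered, which the text does not specify.

```python
def recommend_for_new_user(scores: dict[str, float], liked: dict[str, set[str]]) -> set[str]:
    """Return the items liked by the existing user with the highest D_FOAF score.

    scores: D_FOAF(new_user, v) for each existing user v (computed beforehand with Eq. (4))
    liked:  items each existing user has liked
    """
    if not scores:
        return set()  # no other users to compare with
    best_user = max(scores, key=scores.get)  # first user wins on ties
    return set(liked.get(best_user, set()))


# Hypothetical scores and feedback.
scores = {"bob": 0.71, "carol": 1.0}
liked = {"bob": {"m2"}, "carol": {"m1", "m3"}}
recommendations = recommend_for_new_user(scores, liked)
```

Returning a copy of the liked set keeps later filtering of the recommendations from modifying the stored feedback of the most similar user.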

The attribute space describing the items is enriched with semantic and textual attributes extracted from the linked data cloud, which give further descriptions of the items.

The proposed SCBF engine can be outlined by the following algorithm.

6 Experiments

In the MovieLens dataset^{Footnote 4}, all movies are characterized by the following attributes: Id, Title, Release date and Genre. Using a SPARQL query federated with FedX [17] over DBpedia and the Linked Movie Database (LinkedMDB^{Footnote 5}), we can extract further information about these movies, such as Director, Country, Actor and Abstract. The attribute shared by MovieLens and the federated query results is the movie title. The following SPARQL query extracts this additional information for MovieLens movies:
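The join on movie titles described above can be sketched in Python once the federated query results are available locally. The sketch assumes MovieLens-style titles that end with the release year in parentheses, e.g. "No Escape (1994)", and matches titles after lower-casing and stripping punctuation; the helper names and the normalization rule are our assumptions.

```python
import re


def normalize_title(title: str) -> str:
    """Lower-case a title, drop a trailing "(YYYY)" year and keep only alphanumeric words."""
    title = re.sub(r"\s*\(\d{4}\)\s*$", "", title)
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def join_on_title(movielens: dict[int, str],
                  extracted: dict[str, dict[str, str]]) -> dict[int, dict[str, str]]:
    """Attach attributes extracted from linked data (keyed by title) to MovieLens movie ids."""
    by_title = {normalize_title(t): attrs for t, attrs in extracted.items()}
    joined = {}
    for movie_id, title in movielens.items():
        key = normalize_title(title)
        if key in by_title:  # unmatched movies keep only their MovieLens attributes
            joined[movie_id] = by_title[key]
    return joined
```

Exact matching on normalized titles is the simplest choice; titles that differ in articles or alternate spellings would need a more tolerant matcher.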

To measure the effectiveness of our approach, we computed the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE) using the following formulas:

$$ {\text{MAE = }}\frac{{\mathop \sum \nolimits_{\text{u,i}} \left| {{\text{p}}_{\text{u,i}} - {\text{n}}_{\text{u,i}} } \right|}}{\text{n}} $$

(9)

$$ {\text{RMSE = }}\sqrt {\frac{1}{\text{n}}\mathop \sum \limits_{\text{u,i}} \left( {{\text{p}}_{\text{u,i}} - {\text{n}}_{\text{u,i}} } \right)^{2} } $$

(10)

where $ n_{u,i} $ is the rating given by user u to item i, $ p_{u,i} $ is the predicted rating, and n is the total number of predicted ratings.
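Equations (9) and (10) translate directly into code. In the sketch below, each evaluation case is a (predicted, actual) rating pair, i.e. $ (p_{u,i}, n_{u,i}) $; the function names and the sample ratings are hypothetical.

```python
import math


def mae(pairs: list[tuple[float, float]]) -> float:
    """Eq. (9): mean absolute difference between predicted and actual ratings."""
    return sum(abs(p - r) for p, r in pairs) / len(pairs)


def rmse(pairs: list[tuple[float, float]]) -> float:
    """Eq. (10): square root of the mean squared difference between predicted and actual ratings."""
    return math.sqrt(sum((p - r) ** 2 for p, r in pairs) / len(pairs))


pairs = [(4.0, 5.0), (3.0, 3.0), (2.0, 4.0)]  # hypothetical (predicted, actual) ratings
```

For these three pairs the absolute errors are 1, 0 and 2, so MAE = 1.0 and RMSE = √(5/3) ≈ 1.29; RMSE exceeds MAE because squaring penalizes the larger error more heavily.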

The MAE and RMSE values of SCBF are compared with those of the state-of-the-art techniques described in [18]. The results are shown in Table 4 and Fig. 3, where we observe that the proposed approach achieves the lowest error values.

Table 4. Comparative results.


7 Conclusion

In this work, we described a new approach to content-based recommendation using the Web of Data, mainly supported by two technologies: the FOAF vocabulary and the linked data cloud. Our challenge was to use CBF while reducing the impact of the new user problem and the difficulty of analyzing unstructured items. Promising preliminary results have been obtained. As future work, we plan to evaluate the proposed approach with other metrics such as precision and recall, and to apply our solution to the new user problem to Collaborative Filtering (CF) algorithms in order to reduce the impact of the cold start issue.

Notes

References

1. Burke, R.: Hybrid recommender systems: survey and experiments. User Model. User-Adap. Inter. 12, 331–370 (2002)
2. Zitouni, H., Nouali, O., Meshoul, S.: Toward a new recommender system based on multi-criteria hybrid information filtering. In: Amine, A., Bellatreche, L., Elberrichi, Z., Neuhold, E.J., Wrembel, R. (eds.) CIIA 2015. IAICT, vol. 456, pp. 328–339. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19578-0_27
3. Shoval, P., Maidel, V., Shapira, B.: An ontology-content-based filtering method. Int. J. Inf. Theor. Appl. 15, 303–314 (2008)
4. Raghavan, V.V., Wong, S.K.M.: A critical analysis of vector space model for information retrieval. J. Am. Soc. Inf. Sci. 37(5), 279–287 (1986)
5. Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975)
6. Spärck Jones, K.: A statistical interpretation of term specificity and its application in retrieval. J. Doc. 28(1), 11–21 (1972)
7. Dai, H., Mobasher, B.: Using ontologies to discover domain-level web usage profiles. In: Proceedings of the Second Semantic Web Mining Workshop at PKDD 2001, Helsinki, Finland (2001)
8. Wood, D., Zaidman, M., Ruth, L., Hausenblas, M.: Linked Data. Manning Publications, New York (2014)
9. Celma, O., Serra, X.: FOAFing the music: bridging the semantic gap in music recommendation. Web Semant.: Sci. Serv. Agents World Wide Web 6, 250–256 (2008)
10. Shani, G., Chickering, M., Meek, C.: Mining recommendations from the web. In: ACM Conference on Recommender Systems, pp. 35–42. ACM, New York (2008)
11. Passant, A., Raimond, Y.: Combining social music and semantic web for music-related recommender systems. In: Social Data on the Web Workshop (2008)
12. Passant, A., Heitmann, B., Hayes, C.: Using linked data to build recommender systems. In: Proceedings of RecSys, New York, USA (2009)
13. Ostuni, V.C., Di Noia, T., Di Sciascio, E., Mirizzi, R.: Top-n recommendations from implicit feedback leveraging linked open data. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 85–92. ACM (2013)
14. Mirizzi, R., Di Noia, T., Ostuni, V.C., Ragone, A.: Linked Open Data for Content-Based Recommender Systems (2012)
15. Szomszor, M., Cattuto, C., Alani, H., O'Hara, K., Baldassarri, A., Loreto, V., Servedio, V.D.: Folksonomies, the Semantic Web, and Movie Recommendation (2007)
16. Liu, H., Wang, T., Tang, J., Ning, H., Wei, D., Xie, S., Liu, P.: Identifying linked data datasets for sameAs interlinking using recommendation techniques. In: Cui, B., Zhang, N., Xu, J., Lian, X., Liu, D. (eds.) WAIM 2016. LNCS, vol. 9658, pp. 298–309. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-39937-9_23
17. Schwarte, A., Haase, P., Hose, K., Schenkel, R., Schmidt, M.: FedX: optimization techniques for federated query processing on linked data. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011. LNCS, vol. 7031, pp. 601–616. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-25073-6_38
18. Haddad, M.R., Baazaoui, H., Ziou, D., Ghézala, H.B.: Un modèle de recommandation contextuel pour la prédiction des intérêts des consommateurs sur le Web [A contextual recommendation model for predicting consumers' interests on the Web]. In: IC2015 (2015)


Author information

Authors and Affiliations

Department of Computer Science, Abdelhamid Mehri University, Constantine, Algeria
Hanane Zitouni & Souham Meshoul
Claude Bernard Lyon 1 University, Lyon, France
Kamel Taouche


Corresponding author

Correspondence to Hanane Zitouni.

Editor information

Editors and Affiliations

University of Saida, Saida, Algeria
Abdelmalek Amine
University of Regina, Regina, Saskatchewan, Canada
Malek Mouhoub
Concordia University, Montreal, Québec, Canada
Otmane Ait Mohamed
University of Oran, Oran, Algeria
Bachir Djebbar


About this paper

Cite this paper

Zitouni, H., Meshoul, S., Taouche, K. (2018). Enhancing Content Based Filtering Using Web of Data. In: Amine, A., Mouhoub, M., Ait Mohamed, O., Djebbar, B. (eds) Computational Intelligence and Its Applications. CIIA 2018. IFIP Advances in Information and Communication Technology, vol 522. Springer, Cham. https://doi.org/10.1007/978-3-319-89743-1_52


DOI: https://doi.org/10.1007/978-3-319-89743-1_52
Published: 12 April 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-89742-4
Online ISBN: 978-3-319-89743-1
eBook Packages: Computer Science, Computer Science (R0)


Societies and partnerships: The International Federation for Information Processing
