1 Introduction

The introduction of the World Wide Web at the end of the 20th century has resulted in a widely accessible knowledge source with enormous growth potential. According to estimates, the digital world almost doubles in size every two years, reaching the prodigious scale of 44 trillion gigabytes in 2020 [20]. Hence, the challenge today is not to increase the amount of data, but to retrieve patterns of interest from it.

Recommender systems (RS) are applications that address this problem by helping to structure the overload of data [17]. RS are employed in various fields (e.g., news items, movies, books) to filter data based on users’ preferences, which are captured in so-called user profiles, e.g., by using domain models [18]. The focus of this research is on the recommendation of news items. The large amount of data available on every single news event makes it hard for users to find the items that interest them. News Web sites often categorize news items; however, these are not ordered according to the individual needs of a user. Therefore, users can strongly benefit from RS.

There are three main types of RS that can be used for this purpose: collaborative RS, which provide recommendations based on similarities between the preferences of one user and those of others; content-based RS, which recommend items according to their content; and hybrid RS, which combine the former two approaches [3]. Here, we focus on content-based RS for news recommendation, as these enable a better understanding of the news item content and are able to deal with the cold-start problem for new items. We do not consider hybrid RS, as we assume that little information is available on users and their preferences.

The difficulty that arises in content-based recommendation is that machines are not able to understand the meaning of a text. Such understanding is, however, necessary to provide recommendations suited to the interests of a particular user. Therefore, the words in the text need to be analyzed semantically, and the correct sense of each word must be determined through word sense disambiguation, using a semantic lexicon such as WordNet [8]. Existing approaches such as (Bing-)CF-IDF+ [2, 7] and (Bing-)SF-IDF+ [6, 15] each use only a subset of the available features: concepts and their relations (CF-IDF+), synsets and their relations (SF-IDF+), and, in the Bing variants, named entities.

In this paper, we aim to use a larger set of features by combining those of the aforementioned RS, and propose the Bing-CSF-IDF+ recommender. This content-based approach models a user’s preferences using both the concepts found in a news item together with their domain-specific related concepts, and the identified synsets together with their related synsets obtained through WordNet’s semantic relations. Moreover, named entities that are present in neither WordNet nor the domain ontology are considered. To find (related) concepts, a domain-specific ontology is used as the knowledge base. We hypothesize that the proposed method, which combines features of state-of-the-art recommendation methods, improves news recommendation compared to existing RS. We measure the performance of the Bing-CSF-IDF+ recommender using the \(F_1\)-measure and Cohen’s Kappa statistic.

The remainder of this paper is organized as follows. Section 2 discusses related work on content-based RS. Sections 3 and 4 describe the proposed recommender and evaluate it against other recommenders as benchmarks, respectively. Section 5 draws conclusions from the conducted research and provides some directions for future work.

2 Related Work

Let us start with an overview of existing content-based RS, beginning with the traditional Vector Space Models (VSM): TF-IDF, SF-IDF, and CF-IDF. TF-IDF is the oldest of these approaches and is of interest because SF-IDF and CF-IDF build on its mathematical foundation.

The Term Frequency - Inverse Document Frequency (TF-IDF) [19] recommender consists of two parts. The term frequency indicates how often a term occurs in a given news item; higher frequencies are associated with higher relevance. The inverse document frequency captures the importance of a term across a set of news items; terms that occur in many items are considered common and therefore less important. The TF-IDF score is thus large for terms that occur frequently in a particular news item but rarely in the other news items. TF-IDF represents news items as vectors of term scores, which can be compared to a user vector (the aggregation of the vectors of the items the user previously consumed) using a similarity function such as cosine similarity. A news item is considered similar to the user’s interests if this similarity exceeds a specified threshold.

The Synset Frequency - Inverse Document Frequency (SF-IDF) [5] VSM is a variation of TF-IDF that operates on synsets from a semantic lexicon (WordNet) rather than on raw terms, thereby accounting for synonyms and ambiguous terms. Terms with the same meaning are grouped into a single synset, which requires word sense disambiguation; conversely, the different senses of an ambiguous term are counted separately. The Concept Frequency - Inverse Document Frequency (CF-IDF) [10] recommendation approach is another variant of TF-IDF, which differs from SF-IDF by using key ontological concepts instead of all synsets in a news item. CF-IDF represents news items as weighted vectors of concepts, using a domain ontology, linked to WordNet, that captures the most salient concepts of a domain.
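The TF-IDF pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the Hermes implementation: term frequency is normalized by document length, IDF is \(\log(N/\mathrm{df})\), the user vector is the sum of the vectors of read items, and all function names and example data are hypothetical. If the tokens are synset or concept identifiers instead of words, the same code applies to SF-IDF and CF-IDF.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def idf_table(documents: List[List[str]]) -> Dict[str, float]:
    """Inverse document frequency log(N / df) for every term in the corpus."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf_vector(doc: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector: (count / document length) * IDF per term."""
    counts = Counter(doc)
    return {term: (c / len(doc)) * idf.get(term, 0.0) for term, c in counts.items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def user_profile(read_vectors: List[Dict[str, float]]) -> Dict[str, float]:
    """User vector: element-wise sum of the vectors of previously read items."""
    profile: Dict[str, float] = defaultdict(float)
    for vec in read_vectors:
        for term, w in vec.items():
            profile[term] += w
    return dict(profile)


def recommend(profile: Dict[str, float], candidates: List[Dict[str, float]],
              cutoff: float) -> List[int]:
    """Indices of candidate items whose similarity to the profile exceeds the cut-off."""
    return [i for i, vec in enumerate(candidates) if cosine(profile, vec) > cutoff]
```

For example, with a profile built from a single Apple sales story, another Apple sales story obtains a positive cosine score, while a story sharing no terms with the profile scores 0.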

The Semantic Relationship Vector Space Models extend the traditional VSM by taking semantic relationships into account. SF-IDF+ [15] extends the SF-IDF [5] method by adding to the vector representation of a news item the synsets that are related to its synsets through semantic relationships, such as WordNet hypernymy, which yields a richer representation of news items. Bing-SF-IDF+ [6] is an extension of SF-IDF+ that, in addition to the words in the semantic lexicon, also considers named entities, measuring their similarity by how frequently they occur, individually and together, on the Web. This Bing similarity is based on page counts obtained from the Bing search engine, using the Point-Wise Mutual Information (PMI) [1] measure. Each news item thus has an SF-IDF+ similarity value and a Bing similarity value with a user profile, and the Bing-SF-IDF+ similarity value is a weighted average of the two. The CF-IDF+ [7] recommender is an extension of CF-IDF [10] that also represents news items as concept vectors, but additionally considers related ontology concepts (direct super- and subclasses, and domain-specific related concepts) and their relationships. A related concept is only added if it is not yet in the vector representation, or if its CF-IDF+ value is higher than the value already present.
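The "+" expansion step can be illustrated as follows. This sketch assumes, for illustration only, that a related feature receives the score of its source feature multiplied by a per-relation weight (the kind of weights that Sect. 4.2 tunes with a genetic algorithm), and it applies the rule that a related feature is only written if it is absent or would receive a higher value. The relation names, weights, and identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical per-relation weights; SF-IDF+ and CF-IDF+ learn such weights.
RELATION_WEIGHTS: Dict[str, float] = {"hypernym": 0.5, "hyponym": 0.3}


def expand(vector: Dict[str, float],
           related: Dict[str, List[Tuple[str, str]]],
           weights: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of `vector` extended with directly related features.

    `related` maps a feature to (relation, target) pairs. A target is scored
    as weight(relation) * score(source) and is written only if it is absent
    from the vector or if the new score is higher than the current one.
    """
    expanded = dict(vector)
    for feature, score in vector.items():  # iterate over the original features only
        for relation, target in related.get(feature, []):
            candidate = weights.get(relation, 0.0) * score
            if candidate > expanded.get(target, 0.0):
                expanded[target] = candidate
    return expanded
```

Iterating over the original vector, rather than the growing copy, keeps the expansion to direct relations only.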

Lastly, for historical reasons, we discuss two semantic similarity RS, SS and Bing-SS, both of which have been outperformed by Bing-SF-IDF+ [6] and CF-IDF+ [7]. The Semantic Similarity (SS) recommender [5] considers all pairs of synsets from the user profile and the news item, and discards the pairs whose words do not have the same part of speech. Similarity scores are computed for the remaining pairs using various measures, e.g., Jiang and Conrath [12], Leacock and Chodorow [13], Lin [14], Resnik [16], and Wu and Palmer [21]. These measures capture the distance between two synsets in a semantic graph (e.g., WordNet). The score of an unread news item is then the average of the similarity scores over these synset pairs. Bing-SS [4] is an extension of the SS recommender. Similar to Bing-SF-IDF+, it additionally takes named entities into account, considering only the pairs with the highest similarity scores.
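To make the SS scoring concrete, the sketch below implements the Wu and Palmer measure, \(2\,\mathrm{depth}(\mathrm{LCS}) / (\mathrm{depth}(a) + \mathrm{depth}(b))\), on a tiny hand-made hypernym tree, keeps only same part-of-speech pairs, and averages the resulting scores. The taxonomy, synset names, and functions are hypothetical stand-ins for WordNet; a real system would query WordNet itself.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical toy hypernym tree (child -> parent); WordNet would supply this.
HYPERNYM: Dict[str, str] = {
    "company": "organization",
    "organization": "entity",
    "bank": "company",
    "startup": "company",
    "market": "entity",
}


def path_to_root(synset: str) -> List[str]:
    """The synset followed by its chain of hypernyms up to the root."""
    path = [synset]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def depth(synset: str) -> int:
    return len(path_to_root(synset))  # the root has depth 1


def wu_palmer(a: str, b: str) -> float:
    ancestors_b = set(path_to_root(b))
    # The first ancestor of `a` that is also an ancestor of `b` is the lowest common subsumer.
    lcs: Optional[str] = next((s for s in path_to_root(a) if s in ancestors_b), None)
    if lcs is None:
        return 0.0
    return 2 * depth(lcs) / (depth(a) + depth(b))


def ss_score(profile: List[Tuple[str, str]], item: List[Tuple[str, str]]) -> float:
    """Average Wu-Palmer similarity over (synset, POS) pairs that share a POS."""
    scores = [wu_palmer(s1, s2)
              for (s1, p1), (s2, p2) in product(profile, item) if p1 == p2]
    return sum(scores) / len(scores) if scores else 0.0
```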

3 Bing-CSF-IDF+

The Bing-CSF-IDF+ recommender combines information from named entities, from concepts and their relationships, and from synsets and their relationships by using the Bing, CF-IDF+, and SF-IDF+ similarity values of news items. Like the previous recommenders, it relies on the Hermes framework.

Hermes is a framework for indexing, querying, and recommending news items using a knowledge base [9]. It allows the knowledge base to be constructed from RSS (Really Simple Syndication) feeds, taking advantage of the metadata available in these feeds (e.g., the title, category, and publication date of the news items), and enables collecting news items from multiple news sources. Hermes also stores the user profile (created by collecting the concepts of interest from previously read news), which contains information about the news items and subjects a user finds interesting. Last, Hermes uses a domain ontology, created by domain experts, that defines relationships between concepts pertinent to a certain domain and enables semantics-based news indexing and querying. Using the content and metadata of the news items in the knowledge base (the instance) and the domain ontology (the schema), the news items are pre-processed into vector space models before being run through the recommenders. The Hermes implementation uses a Natural Language Processing (NLP) engine to pre-process the news articles, employing linguistic techniques such as tokenization, sentence splitting, lemmatization, word sense disambiguation, and concept detection to find which concepts are described in the text. The latest description of the Hermes framework and recommender implementations can be found in [2].

The Bing-CSF-IDF+ recommender processes a news item in a fixed order of steps. First, the news item is analyzed for words that trigger concepts from the ontology. These words are excluded from the next step, which looks for named entities by means of the Bing method. The named entities found are in turn excluded from the last step, which analyzes the remainder of the news item by means of the SF-IDF+ recommender. We chose this order because we assume that ontology concepts provide the most specific, and thus the most useful, information when analyzing news, followed by named entities and then synsets.
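The ordering of the three steps can be illustrated with simple lookup tables. This is a sketch under simplifying assumptions: single-token matching, hypothetical dictionaries standing in for ontology-based concept detection, named entity recognition, and word sense disambiguation, and no actual Bing queries.

```python
from typing import Dict, List, Set, Tuple


def split_features(tokens: List[str],
                   concept_lexicon: Dict[str, str],
                   entity_set: Set[str],
                   synset_lexicon: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Assign each token to at most one feature type, in priority order."""
    # Step 1: tokens that trigger an ontology concept (CF-IDF+ features).
    concepts = [concept_lexicon[t] for t in tokens if t in concept_lexicon]
    rest = [t for t in tokens if t not in concept_lexicon]
    # Step 2: named entities among the tokens left over (Bing features).
    entities = [t for t in rest if t in entity_set]
    rest = [t for t in rest if t not in entity_set]
    # Step 3: the remaining tokens are mapped to synsets (SF-IDF+ features);
    # a dictionary lookup stands in for word sense disambiguation here.
    synsets = [synset_lexicon[t] for t in rest if t in synset_lexicon]
    return concepts, entities, synsets
```

Because "Apple" in the usage below is both a concept and a named entity, it is only counted as a concept, which is exactly the effect of the fixed ordering.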

The Bing-CSF-IDF+ similarity measure is a weighted linear combination of the similarity values between a user profile and an unread news item obtained with the CF-IDF+ recommender, the Bing method, and the SF-IDF+ recommender:

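\[ \mathrm{sim}_{\text{Bing-CSF-IDF+}}(d_{u}, d_{r}) = \alpha \cdot \mathrm{sim}_{\text{Bing}}(d_{u}, d_{r}) + \beta \cdot \mathrm{sim}_{\text{CF-IDF+}}(d_{u}, d_{r}) + (1 - \alpha - \beta) \cdot \mathrm{sim}_{\text{SF-IDF+}}(d_{u}, d_{r}) \tag{1} \]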

where \(d_{r}\) is the vector representation of the user’s interest, \(d_{u}\) the vector representation of an unread news item, and \(\alpha\) and \(\beta\) are weights (as in Bing-SF-IDF+ [6]) that can be optimized by means of a genetic algorithm to obtain the best performance of the Bing-CSF-IDF+ recommender on a validation data set. As with existing RS, unread news items for which the normalized similarity measure exceeds a predefined cut-off value are recommended.
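Assuming, consistent with the weight interpretation given in Sect. 4.3, that \(\alpha\) weights the Bing similarity, \(\beta\) the CF-IDF+ similarity, and \(1-\alpha-\beta\) the SF-IDF+ similarity, and that all three component similarities are already normalized, the scoring and cut-off step might look as follows. The function names and component similarity values are hypothetical; the weights and cut-off in the usage are those reported in Sect. 4.3.

```python
from typing import Dict, List


def bing_csf_idf_plus(sim_bing: float, sim_cf: float, sim_sf: float,
                      alpha: float, beta: float) -> float:
    """Weighted combination of normalized component similarities.

    Assumed weighting: alpha for Bing, beta for CF-IDF+, and the remainder
    (1 - alpha - beta) for SF-IDF+.
    """
    if alpha < 0 or beta < 0 or alpha + beta > 1:
        raise ValueError("expected alpha, beta >= 0 and alpha + beta <= 1")
    return alpha * sim_bing + beta * sim_cf + (1 - alpha - beta) * sim_sf


def recommend(similarities: Dict[str, float], cutoff: float) -> List[str]:
    """Items whose combined similarity exceeds the cut-off value."""
    return [item for item, sim in similarities.items() if sim > cutoff]


if __name__ == "__main__":
    a, b = 0.20268, 0.50435
    sims = {
        "n1": bing_csf_idf_plus(0.1, 0.4, 0.3, a, b),  # about 0.310
        "n2": bing_csf_idf_plus(0.0, 0.1, 0.2, a, b),  # about 0.109
    }
    print(recommend(sims, cutoff=0.28))
```

Setting \(\beta = 1\) recovers a pure CF-IDF+ score, and \(\beta = 0\) a Bing-SF-IDF+ score, which is why the combined recommender can in principle match either.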

4 Evaluation

We discuss the evaluation of the proposed recommender in terms of the experimental set-up, the optimization of the weights used in the recommender, and the results obtained for Bing-CSF-IDF+ and the existing RS.

4.1 Setup

The evaluation setup is similar to those in the existing literature on news recommendation [6, 7]. The data set consists of 100 news articles from a Reuters RSS feed, all concerned with financial news on technology companies. The data set also contains 8 user profiles, each linked to a specific topic: “Asia”, “Financial markets”, “Google and its competitors”, “Internet of Web services”, “Microsoft and its competitors”, “National economies”, “Technology”, and “United States”. The user profiles were created by 3 researchers (experts in news analytics) from the Erasmus University Rotterdam, who rated each news article as either interesting or not for each profile. Whether an article is interesting for a profile is decided by majority voting among the experts. Table 1 reports the user profile topics with their inter-annotator agreements (IAA), and the number of interesting (I+) and non-interesting (I−) news items as given by the experts.

Table 1. Number of interesting (I+) and non-interesting (I−) news items, and the inter-annotator agreement (IAA)

To test the performance of the recommenders, the complete data set of 100 news articles is split into a training set (30%), a validation set (30%), and a test set (40%), preserving the proportions of interesting and non-interesting news items in each set (there are 8 such divisions, one for each user profile). The training set is used for learning the user profile, and the validation set for finding the optimal weights of the recommender.
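The labeling and splitting procedure could be reproduced along these lines. This sketch assumes boolean votes per annotator, a strict majority rule, and rounding of the per-class set sizes, with the test set taking whatever remains; the function names and the random seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def majority_label(votes: Sequence[bool]) -> bool:
    """An article is interesting if a strict majority of annotators say so."""
    return sum(votes) > len(votes) / 2


def stratified_split(labels: Sequence[bool], train_frac: float = 0.3,
                     val_frac: float = 0.3,
                     seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Split item indices into training/validation/test sets, keeping the
    ratio of interesting to non-interesting items (approximately) equal."""
    rng = random.Random(seed)
    train: List[int] = []
    val: List[int] = []
    test: List[int] = []
    for cls in (True, False):  # split each class separately
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_train = round(train_frac * len(idx))
        n_val = round(val_frac * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # the test set takes the remainder
    return sorted(train), sorted(val), sorted(test)
```

With 20 interesting and 80 non-interesting items, this yields sets of 30, 30, and 40 items, each containing 20% interesting items.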

The recommender uses the knowledge about the user learned from the training and validation sets to predict whether an article from the test set is interesting or not. An unread article is marked as interesting, and hence recommended, if the similarity value between the user profile and the article is higher than a predefined cut-off value.
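Given a cut-off, these predictions yield a confusion matrix from which precision, recall, and the \(F_1\)-measure follow. The sketch below computes \(F_1\) for a sweep of cut-off values; the function names and data are illustrative assumptions, not the evaluation code used in this work.

```python
from typing import Dict, Sequence, Tuple


def precision_recall_f1(predicted: Sequence[bool],
                        actual: Sequence[bool]) -> Tuple[float, float, float]:
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def f1_by_cutoff(similarities: Sequence[float], actual: Sequence[bool],
                 cutoffs: Sequence[float]) -> Dict[float, float]:
    """F1 of the rule 'interesting iff similarity > cut-off', per cut-off value."""
    return {c: precision_recall_f1([s > c for s in similarities], actual)[2]
            for c in cutoffs}
```

Low cut-offs favor recall and high cut-offs favor precision, which is the trade-off discussed in Sect. 4.3.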

4.2 Optimizing Weights

The weights of the recommender are optimized using a genetic algorithm, which works with sets of candidate solutions called populations. In each iteration, the previous population is adapted so that the search moves through the parameter space towards a (local) optimum. For the Bing-CSF-IDF+ recommender, 32 weights need to be optimized: weights for 27 SF-IDF+ relations, 3 CF-IDF+ relations, and \(\alpha\) and \(\beta\). The optimization is done on the Lisa system from SURFsara\(^{1}\), a computer cluster of several hundred multi-core nodes intended for researchers who need large computing capacities. Since each node contains multiple cores, it can run multiple jobs in parallel. We want to find the optimal weights for several cut-off values; as the optimizations for different cut-off values are independent of each other, these jobs can be run in parallel.
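A generic real-valued genetic algorithm of the kind described above is sketched below, with tournament selection, uniform crossover, Gaussian mutation clipped to [0, 1], and elitism. These operators and all parameter values are our assumptions; the text does not specify them. In the actual setting, the fitness function would be the validation \(F_1\)-measure for a given cut-off, and the vector would hold the 32 weights.

```python
import random
from typing import Callable, List


def genetic_optimize(fitness: Callable[[List[float]], float], n_weights: int,
                     pop_size: int = 30, generations: int = 100,
                     mutation_sd: float = 0.1, elite: int = 2,
                     tournament: int = 3, seed: int = 0) -> List[float]:
    """Maximize `fitness` over weight vectors in [0, 1]^n_weights."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        # Elitism: the best individuals survive unchanged.
        next_gen = [list(ind) for ind in ranked[:elite]]
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            parent_a = max(rng.sample(ranked, tournament), key=fitness)
            parent_b = max(rng.sample(ranked, tournament), key=fitness)
            # Uniform crossover, then Gaussian mutation clipped to [0, 1].
            child = [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]
            child = [min(1.0, max(0.0, w + rng.gauss(0.0, mutation_sd))) for w in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)
```

Because the runs for different cut-off values are independent, each call could be submitted as a separate job (or, on a single machine, mapped over the cut-offs with `concurrent.futures.ProcessPoolExecutor`). Enforcing \(\alpha + \beta \le 1\) would require an additional repair step, which is not shown.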

Note that, since the genetic algorithm is a heuristic, it is not guaranteed to find the optimal weights. However, the computing power of the Lisa system allowed us to search large parameter spaces, which makes it likely that the optimized weights are close to optimal.

4.3 Results

The following results for the considered recommenders were obtained on exactly the same splits of the data set. Note that these splits are different from the ones considered in the previous works  [6, 7].

First, the results for the \(F_1\)-measure are presented for the Bing-CSF-IDF+, Bing-SF-IDF+, and CF-IDF+ recommenders. These recommenders are the most relevant for this research, as previous research showed that the Bing-SF-IDF+ and CF-IDF+ recommenders give the best results. Figure 1 shows that the CF-IDF+ recommender seems to perform well for low cut-off values (i.e., in situations where low precision is tolerated in favor of high recall). From a cut-off value of about 0.3 onwards, both the Bing-CSF-IDF+ and Bing-SF-IDF+ recommenders perform notably better than the CF-IDF+ recommender. This indicates that Bing-CSF-IDF+ and Bing-SF-IDF+ achieve high precision while retaining higher recall in stricter recommendation contexts. The most important result that can be deduced from Fig. 1 is that the Bing-CSF-IDF+ recommender seems to perform at least as well as the other two recommenders for almost all cut-off values. Especially for cut-off values between 0.05 and 0.4, combining the CF-IDF+ and Bing-SF-IDF+ recommenders into Bing-CSF-IDF+ appears to be beneficial.

Fig. 1. \(F_1\)-measures for several recommenders

The observations made from Fig. 1 are confirmed by a one-tailed two-sample Student-t test, which we used to determine whether one recommender has a significantly higher average \(F_1\)-measure than another. The p-values of this test are reported in Table 2. Ordered from worst to best, the recommenders are CF-IDF+, Bing-SF-IDF+, and Bing-CSF-IDF+. All differences were found to be significant at the 5% significance level.

Table 2. One-tailed two-sample Student-t test p-values for the \(F_1\)-measure (\(H_0\): \(\mu _{\mathrm {column}} = \mu _{\mathrm {row}}, \; H_1\): \(\mu _{\mathrm {column}} > \mu _{\mathrm {row}}, \; \alpha = 0.05\))
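The test statistic behind Tables 2 and 3 can be computed as below. This is a sketch of the pooled-variance (equal-variance) two-sample Student t statistic; which observations form each sample (for instance, one value per cut-off) is an assumption on our part. The one-tailed p-value for \(H_1: \mu_{\mathrm{column}} > \mu_{\mathrm{row}}\) is the upper tail of the t distribution with the returned degrees of freedom, which SciPy provides as `scipy.stats.t.sf(t, df)`.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_statistic(column: Sequence[float],
                       row: Sequence[float]) -> Tuple[float, int]:
    """Equal-variance two-sample t statistic for H1: mean(column) > mean(row).

    Returns (t, degrees of freedom). Each sample needs at least two values.
    """
    n_col, n_row = len(column), len(row)
    df = n_col + n_row - 2
    # Pooled sample variance of the two groups.
    pooled_var = ((n_col - 1) * variance(column) + (n_row - 1) * variance(row)) / df
    std_err = math.sqrt(pooled_var * (1 / n_col + 1 / n_row))
    return (mean(column) - mean(row)) / std_err, df
```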

We also computed Cohen’s Kappa statistic for each cut-off value and each recommender. Cohen’s Kappa measures the agreement between the recommender’s classifications and the actual interestingness of the news articles, correcting for the agreement expected by chance. Figure 2 shows the results. The Bing-SF-IDF+ recommender seems to perform notably worse than the other two recommenders for low cut-off values, but improves in relative performance for larger cut-off values. Again, the Bing-CSF-IDF+ recommender seems to perform at least as well as the other recommenders for almost all cut-off values.
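Cohen’s Kappa for a binary recommender can be computed from the observed agreement \(p_o\) and the chance agreement \(p_e\) implied by the marginal rates, as \(\kappa = (p_o - p_e)/(1 - p_e)\). The sketch below is a minimal illustration with hypothetical names; returning 0.0 when \(p_e = 1\) (where Kappa is undefined) is our own convention.

```python
from typing import Sequence


def cohens_kappa(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    n = len(actual)
    observed = sum(p == a for p, a in zip(predicted, actual)) / n
    # Chance agreement from the marginal rates of "interesting" labels.
    p_pred = sum(predicted) / n
    p_act = sum(actual) / n
    expected = p_pred * p_act + (1 - p_pred) * (1 - p_act)
    # Kappa is undefined when expected == 1; 0.0 is returned by convention here.
    return (observed - expected) / (1 - expected) if expected < 1 else 0.0
```

For instance, a recommender that agrees with the experts on 3 of 4 articles, where half of its predictions and a quarter of the true labels are positive, has \(p_o = 0.75\), \(p_e = 0.5\), and hence \(\kappa = 0.5\).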

Once again, the observations made from Fig. 2 are confirmed by statistical tests. A Student-t test was performed on the average Kappa statistic for each of the recommenders (Table 3). This time it was found that the recommenders could be ordered from worst to best in the order of Bing-SF-IDF+, CF-IDF+, and Bing-CSF-IDF+. Again, all results are significant on a 5% significance level. So from both the \(F_1\)-measure and the Kappa statistic, we can conclude that the Bing-CSF-IDF+ recommender performs the best.

Fig. 2. Kappa statistics for several recommenders

Table 3. One-tailed two-sample Student-t test p-values for the Kappa statistic (\(H_0\): \(\mu _{\mathrm {column}} = \mu _{\mathrm {row}}, \; H_1\): \(\mu _{\mathrm {column}} > \mu _{\mathrm {row}}, \; \alpha = 0.05\))

The optimized weights, obtained at each recommender’s optimal cut-off value (with respect to \(F_1\)), differ notably between recommenders. For CF-IDF+, at the cut-off value of 0.06, all relationships are almost equally informative. For Bing-SF-IDF+, at the cut-off value of 0.22, the \(\alpha\) parameter has an optimized weight of 0.14220, indicating that the SF-IDF+ component contributes the most information. Last, for Bing-CSF-IDF+, \(\alpha\) and \(\beta\) are optimized to 0.20268 and 0.50435, respectively, at a cut-off value of 0.28. Thus, CF-IDF+ contributes about half of the information, Bing about 20%, and SF-IDF+ the remaining 30%.
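The weight shares quoted above follow from the optimized values by simple arithmetic, assuming (as the interpretation above implies) that \(\alpha\) weights the Bing similarity, \(\beta\) the CF-IDF+ similarity, and the remaining weight goes to the SF-IDF+ similarity:

\[ 1 - \alpha - \beta = 1 - 0.20268 - 0.50435 = 0.29297 \approx 30\%. \]

Likewise, for Bing-SF-IDF+ the SF-IDF+ component receives \(1 - 0.14220 = 0.85780\) of the weight, which is why it contributes the most information there.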

Given Fig. 1, the resulting weights for the Bing-CSF-IDF+ recommender make sense: at the cut-off value of 0.28, the CF-IDF+ recommender clearly performs better than Bing-SF-IDF+, which explains why it is assigned the larger weight.

5 Conclusion

We have proposed a new semantics-driven recommender, Bing-CSF-IDF+, which combines the best features of the existing CF-IDF+ and Bing-SF-IDF+ recommenders. We have shown that Bing-CSF-IDF+ outperforms these existing recommenders: for almost every cut-off value, both its \(F_1\)-measure and its Kappa statistic are at least as high as those of the other recommenders. This meets our expectations, as Bing-CSF-IDF+ reduces to a pure CF-IDF+ or a Bing-SF-IDF+ recommender for appropriate values of the weights \(\alpha\) and \(\beta\). The occasional cut-off values at which an existing recommender scores higher can be explained by the fact that the genetic algorithm optimizes all weights simultaneously; being only a heuristic, it is not guaranteed to find the optimal weights.

We envision various opportunities and directions for future work. It would be interesting to compare the proposed method to graph-embedding-based recommendation [11]. Also, one could look for better heuristic algorithms to optimize the weights of the Bing-CSF-IDF+ recommender, for example, Ant Colony Optimization, which might find better weights and thereby improve the performance of the recommender.

Another improvement might be made by only taking the most similar named entities into account (as performed in the Bing-SS recommender), or by using machine learning algorithms (e.g., SVMs) to learn the recommender model from all the available features. Moreover, also using WordNet relations for concepts (if a concept has an associated synset) could lead to better performance of the recommender. Finally, the results could be improved by using a more extensive domain ontology for the CF-IDF+ values, to discover additional important concepts in the news articles that might influence the classification of an article as interesting or not.