1 Introduction

Recommender systems (RSs) are considered a powerful tool to guide users in their decision making process [8]. That is why much research has recently been devoted to the development of RSs aiming to enhance the accuracy and the performance of the recommendations. One of the most promising recommendation approaches is collaborative filtering, considered a leading approach in the RSs field due to its straightforwardness and its high accuracy [2]. It tends to predict users' preferences for items not yet rated. For example, when a user is browsing a movie website to get an idea about the new releases, the system recommends movies that he has not watched yet by predicting a rating for each movie. The question that arises here is: how far can the system assume that the computed outputs are certain? In one way or another, the provided predictions are not perfect; they involve uncertainty which should not be ignored. It is thus obvious that this approach exhibits a weakness related to its inability to deal with the uncertainty involved in the predictions. That is why we are oriented towards improving and extending the traditional version of collaborative filtering in order to deal with this kind of problem. In fact, little attention has been paid to the problem of managing uncertainty in the field of RSs, whether by means of fuzzy set theory [3, 4], probability theory [5] or possibility theory [6]. In particular, the uncertainty that generally pervades the prediction results has not been considered. Ignoring this important point can decrease the user's confidence in the system, which can deeply affect the quality of the recommendations as well as their reliability. This fact prompted us to resort to a powerful theory which offers a flexible tool to deal with imperfect information, namely the belief function theory [7]. This theory is appropriate for coping with uncertainty in classification problems within several machine learning techniques, among them the K-Nearest Neighbors (KNN). Indeed, an extension of the classical KNN based on the belief function framework has been proposed in [18]. Such a classifier allows an object to belong to more than one specific class.

Thus, the idea behind our new recommendation approach is to take advantage of the belief function tools as well as the Evidential K-Nearest Neighbors (EKNN) in order to deal with the prediction uncertainty. The proposed method provides an evidential representation of the ratings given by similar items as well as an aggregation of their contributions. In this paper, the EKNN formalism is used to represent the interactions between similar items and the processes leading to a richer information content of the final recommendation. Besides, we show how incorporating uncertainty in the prediction process leads to more significant and accurate results.

The remainder of this paper is organized as follows: Sect. 2 provides a review of recommender systems. Section 3 recalls the Evidential K-Nearest Neighbors. Our proposed recommendation approach is presented in Sect. 4. Section 5 presents the experimental results conducted on a real-world data set. Finally, the contribution is summarized and the paper is concluded in Sect. 6.

2 Recommender Systems

With the continual growth of the available information, RSs have sprung up as a suitable solution to provide users with personalized recommendations [1]. Such systems start by collecting users' preferences and try to predict their future evaluations of unrated items. Generally, RSs fall into three categories [10], namely content-based recommenders [9], collaborative filtering recommenders [11] and hybrid approaches [12]. The first type consists of a matching process between the contents of the unrated items and those of the items in which the user has previously expressed an interest. In this category, prior access to the items' features is required, which is not always possible. Indeed, when items have a limited number of available features, no content-based recommender can provide suitable suggestions. Moreover, recommending items sharing the same features may lead to an overspecialization problem. That is to say, since the content-based approach relies solely on items' descriptions, it cannot find different items that may please the user. Such a phenomenon refers to serendipity, which cannot be obtained by this approach, unlike collaborative filtering (CF). This second type relies on a matrix of user-item ratings rather than items' features to predict preferences. By making use of other users' ratings, this approach is able to generate recommendations even when items' descriptions are not available or hard to extract. Besides, it can deal with any kind of content and helps users find interesting items that they might not have discovered otherwise. Surprising the users and recommending different items, even ones that are dissimilar to those rated in the past, is the key advantage of this approach. CF is often characterized as either model-based or memory-based. The model-based techniques learn a model to predict future preferences based on the entire collection of users' ratings. The second category, referred to as neighborhood-based, computes the similarity between users (user-based [14]) or items (item-based [15]) and selects the most similar ones for recommendation. In some applications, hybrid approaches have also emerged, combining two or more recommendation techniques to increase their performance while leveling out the weaknesses of each one. However, CF, in particular memory-based CF, has remained the most popular and commonly implemented approach in the RSs field due to its simplicity, robustness and success in real-world applications [13]. That is why our new method is centered around the memory-based CF approach.

3 Evidential K-Nearest Neighbors

In this section, we recall the basic concepts of the belief function theory as well as the Evidential K-Nearest Neighbors classifier.

3.1 Belief Function Theory

The belief function theory [16, 17], also referred to as Dempster-Shafer Theory (DST), is among the most used theories for representing and reasoning with uncertainty. In this theory, a problem domain is represented by a finite set of elementary events called the frame of discernment and denoted by \( \varTheta \). The belief committed to the elements of the frame of discernment \( \varTheta \) is expressed by a basic belief assignment (bba), which is a mapping function \(m: 2^{\varTheta } \rightarrow [0, 1] \) such that:

$$\begin{aligned} \sum \limits _{A \subseteq \varTheta } m(A) = 1 \end{aligned}$$
(1)

Each mass m(A), called a basic belief mass (bbm), quantifies the degree of belief exactly assigned to the subset A of \( \varTheta \). A subset A is called a focal element if \(m(A)> 0\). A bba having at most one focal element aside from the frame of discernment \(\varTheta \) is called a simple support function. It is defined as follows:

$$\begin{aligned} m(X)= {\left\{ \begin{array}{ll} w &{} \quad \text {if } X=A\\ 1-w &{} \quad \text {if } X=\varTheta \\ 0 &{} \quad \text {otherwise} \end{array}\right. } \end{aligned}$$
(2)

where A is the focal element and w \(\in \) [0, 1].

Given two bba's \(m_1\) and \(m_2\) induced from two reliable and independent information sources, the evidence can be combined using Dempster's rule of combination, defined as:

$$\begin{aligned} (m_1 \oplus m_2)(A) = \frac{1}{1-k} \sum \limits _{B \cap C = A} m_1(B) \cdot m_2(C) \quad \forall A \subseteq \varTheta , A \ne \varnothing \end{aligned}$$
(3)
$$\begin{aligned} (m_1 \oplus m_2)(\varnothing ) = 0 \end{aligned}$$
(4)
$$\begin{aligned} \text {where} \quad k = \sum \limits _{B \cap C = \varnothing } m_1(B) \cdot m_2(C) \end{aligned}$$
(5)

To make decisions, beliefs can be transformed into pignistic probabilities defined as:

$$\begin{aligned} BetP (A) = \sum _{B \subseteq \varTheta }\frac{|A \cap B|}{|B|}\frac{m(B)}{(1-m(\varnothing ))}~for~all~A \in \varTheta \end{aligned}$$
(6)
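To make these tools concrete, the following minimal sketch (ours, not the authors' code; all names are illustrative) implements Dempster's rule (Eqs. 3-5) and the pignistic transform (Eq. 6) over a small frame of discernment, with subsets encoded as frozensets and a bba as a dict of masses:

```python
# Minimal sketch of the belief function tools recalled above (illustrative
# code). A bba maps frozensets of elements of Theta to masses summing to 1.

def dempster_combine(m1, m2):
    """Dempster's rule of combination (Eqs. 3-5)."""
    combined, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc        # mass k sent to the empty set (Eq. 5)
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

def pignistic(m, theta):
    """Pignistic probability BetP over the singletons of Theta (Eq. 6)."""
    return {w: sum(mb / len(b) for b, mb in m.items() if w in b) for w in theta}

theta = frozenset({1, 2, 3, 4, 5})             # e.g. a 5-star rating scale
m1 = {frozenset({4}): 0.6, theta: 0.4}         # simple support function (Eq. 2)
m2 = {frozenset({4, 5}): 0.5, theta: 0.5}
print(pignistic(dempster_combine(m1, m2), theta))
```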

3.2 Evidential K-Nearest Neighbors

The Evidential K-Nearest Neighbors (EKNN) [18] is a pattern classification method based on the belief function theory. It improves the classification performance over the crisp KNN approach by allowing a credal classification of the objects, which leads to a richer information content of the classifier's output.

Notations

  • \( \varTheta =\lbrace C_{1}, C_{2},..., C_{M} \rbrace \): The frame of discernment containing the M possible classes in the system.

  • \(X_{i}\): An object belonging to the set \(\lbrace X_{1}, X_{2}, \ldots , X_{n} \rbrace \) of n distinct objects in the system.

  • X: A new object to be classified.

  • \( N_{K}(X) \): The set of the K-Nearest Neighbors of X.

EKNN Procedure

The EKNN aims to classify a new instance X based on the information provided by the training set. The new pattern X is assigned to a class on the basis of its selected neighbors \( N_{K}(X)\). The knowledge that a neighbor \(X_i\) belongs to class \(C_{q}\) can be considered as a piece of evidence that increases the belief that the pattern X to be classified also belongs to the class \(C_{q}\). That is why the EKNN technique treats each neighbor as a piece of evidence supporting a number of hypotheses regarding the class of the object X. Indeed, the smaller the distance between X and \(X_i\), the stronger the evidence. This evidence can be represented by a simple support function with a bba verifying:

$$\begin{aligned} m_{X,X_i}(\lbrace C_q\rbrace ) = \alpha _0 \exp ({-{\gamma _{q}^2}\times d(X,X_{i})^2}) \end{aligned}$$
(7)
$$\begin{aligned} m_{X,X_i}(\varTheta ) = 1-\alpha _0 \exp ({-{\gamma _{q}^2}\times d(X,X_{i})^2}) \end{aligned}$$

where \(\alpha _0\) is a constant that has been heuristically fixed to 0.95 and \({d(X, X_{i})}\) is the Euclidean distance between the object to be classified and the objects of the training set. \({\gamma _{q}}\) is a positive parameter assigned to each class \(C_q\), defined as the inverse of the mean distance between all the training patterns belonging to the class \(C_q\).

Once the different bba’s provided by the K-Nearest Neighbors are generated, they can be combined using Dempster’s rule of combination.

$$\begin{aligned} m_X=m _{X,X_1}\oplus ...\oplus m _{X,X_K} \end{aligned}$$
(8)

where \( \lbrace 1,...,K\rbrace \) is the set containing the indexes of the K-Nearest Neighbors.
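As an illustration, a compact sketch of the EKNN procedure is given below (our own code under stated assumptions: NumPy arrays, Euclidean distance, a dict gammas of per-class parameters, and the dempster_combine function from the Sect. 3.1 sketch; all identifiers are hypothetical):

```python
import numpy as np

def eknn_classify(x, X_train, y_train, gammas, theta, K=5, alpha0=0.95):
    """EKNN sketch: one simple support bba per neighbor (Eq. 7),
    fused with Dempster's rule (Eq. 8)."""
    d = np.linalg.norm(X_train - x, axis=1)      # Euclidean distances to x
    m = {frozenset(theta): 1.0}                  # start from the vacuous bba
    for i in np.argsort(d)[:K]:                  # the K nearest neighbors
        q = y_train[i]
        alpha = alpha0 * np.exp(-(gammas[q] ** 2) * d[i] ** 2)
        m_i = {frozenset({q}): alpha, frozenset(theta): 1.0 - alpha}
        m = dempster_combine(m, m_i)             # Eq. 8
    return m                                     # credal output: a bba over theta
```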

4 Evidential Item-Based Collaborative Filtering

The idea behind our contribution is to take into account the uncertain aspect of the predictions made by RSs. To ensure this task, we rely on the EKNN, a machine learning algorithm working under the belief function framework. The whole process of our proposed evidential collaborative filtering approach is performed in three phases, namely the initialization phase, the learning phase and the prediction phase.

Step 1: Initialization Phase

The first step consists of assigning values to the two parameters \(\alpha _0\) and \(\gamma _{r_{i}}\) to be used in the next phase. This procedure starts by initializing the parameter \(\alpha _0\) and then exploits the user-item matrix in order to compute the second parameter \(\gamma _{r_{i}}\). The parameter \(\alpha _0\) is initialized to the value 0.95, as in the EKNN formalism [18]. Note that the initialization of \(\alpha _0\) is executed only once, while the \(\gamma _{r_{i}}\) computation is performed each time according to the current items' ratings. In order to compute \(\gamma _{r_{i}}\), we first group the items by rating value; in other words, we extract, for each rating value, the items to which this rating has been assigned. Based on the selected items, we assign to each rating \(r_{i}\) a parameter \(\gamma _{r_{i}}\) computed as the inverse of the average distance between each pair of items i and j sharing that rating. This computation is performed using the normalized Euclidean distance denoted by d(i,j) and defined as follows:

$$\begin{aligned} d(i, j)=\frac{\sum _{u \in u_{i}\cap u_{j}} \quad (r_{u,i}-r_{u,j})^{2}}{\vert u_{i}\cap u_{j} \vert } \end{aligned}$$
(9)

where \(r_{u,i}\) and \(r_{u,j}\) correspond to the ratings of the user u for the items i and j, and \(u_{i} \) and \(u_{j}\) are the sets of users who have rated the items i and j, respectively.
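A sketch of this initialization phase is given below (our reading of the procedure, assuming a users × items NumPy matrix R with np.nan for missing ratings; all names are hypothetical):

```python
import numpy as np

def item_distance(R, i, j):
    """Normalized Euclidean distance d(i, j) of Eq. 9 over co-rating users."""
    mask = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
    if not mask.any():
        return np.inf                    # no common users: items incomparable
    diff = R[mask, i] - R[mask, j]
    return np.sum(diff ** 2) / mask.sum()

def gammas_per_rating(R, ratings):
    """gamma_{r_i}: inverse of the mean distance between pairs of items
    sharing the rating value r_i (one plausible reading of the text)."""
    gammas = {}
    for r in ratings:
        items = [i for i in range(R.shape[1]) if np.any(R[:, i] == r)]
        dists = [item_distance(R, i, j)
                 for a, i in enumerate(items) for j in items[a + 1:]]
        dists = [d for d in dists if np.isfinite(d)]
        gammas[r] = 1.0 / np.mean(dists) if dists and np.mean(dists) > 0 else 1.0
    return gammas
```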

Step 2: Learning Phase

Once the two parameters \(\alpha _0\) and \(\gamma _{r_{i}}\) have been assigned, the second phase consists in the items' selection. This selection is performed through a similarity computation, as proposed in [15], by isolating the co-rated items. In our method, we first consider users who have rated common items. Then, we compute for each item j in the database its distance to the target item i. Given a target item, we spot its K most similar items, also referred to as the neighbors, by picking out only the K items having the lowest distances, denoted by dist(i,j). A sketch of this selection is given below.
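This minimal sketch reuses item_distance from the initialization-phase code above (again illustrative, not the authors' implementation):

```python
def nearest_items(R, target, K):
    """Learning-phase sketch: the K items with the lowest dist(i, j)."""
    dists = [(j, item_distance(R, target, j))
             for j in range(R.shape[1]) if j != target]
    dists.sort(key=lambda pair: pair[1])
    return dists[:K]                     # list of (neighbor index, distance)
```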

Step 3: Prediction Phase

The prediction phase is the most important one in RSs since it provides users with their future evaluations regarding the unrated items. In this section, we shed light on the prediction process of our contribution. The two key procedures of this phase are the bba's generation and the bba's combination.

  1. The bba's Generation

    Traditional methods, in this step, provide the user with a predicted rating that indicates a score of his future degree of satisfaction with a given item. As we argued in the introduction, the RS cannot draw any certain inference about the future rating. Even if the ratings provided by the most similar items can increase our belief about the most probable one, we cannot claim that such knowledge is certain. In view of this assumption, we emphasize the uncertainty present in the predictions through the belief function theory. In such a case, we maintain that different pieces of evidence, each involving a particular hypothesis about the predicted rating, can contribute to the final prediction. Hence, the evidence is carried by the ratings provided by the K most similar items. The main advantage of this representation is that the final prediction is a basic belief assignment, which reflects more credible results. We start by observing the ratings provided by the different pieces of evidence (i.e. the K similar items). Accordingly, we generate the corresponding bba's.

    Using the terminology of the belief function theory, we can define the frame of discernment corresponding to this situation as:

    $$\begin{aligned} \varTheta =\{r_{1},r_{2},\ldots ,r_{n}\} \end{aligned}$$
    (10)

    where n denotes the number of the possible ratings r that can be provided in the system.

    Since each item involves a particular hypothesis about the predicted rating, we generate a bba over each rating provided by the similar items as well as over the whole frame of discernment \(\varTheta \). Based on the similarities computed in the learning phase as well as the two parameters \(\alpha _0\) and \(\gamma _{r_{i}}\) assigned initially, we can represent this bba as a simple support function defined as follows:

    $$\begin{aligned} m_{i,j}(\lbrace r_{i}\rbrace ) = \alpha _{r_{i}}\nonumber \\ m_{i,j}(\varTheta )= 1-\alpha _{r_{i}} \end{aligned}$$
    (11)

    where i is the target item and j is one of its similar items, with \(j \in \lbrace 1, \ldots , K \rbrace \), \(\alpha _{r_{i}}=\alpha _0 \exp ({-{\gamma _{r_{i}}^2}\times dist(i,j)^2}) \), \(\alpha _0\) and \(\gamma _{r_{i}}\) are the two parameters assigned in the initialization phase, and dist(i,j) is the distance between the items i and j computed in the learning phase.

    In this situation, each neighbor of the target item supports two possible hypotheses. The first one corresponds to the value of its provided rating, while the rest of the committed belief is allocated to the frame of discernment \(\varTheta \). Thereupon, the focal elements of the generated bba's are the ratings provided by the K similar items and \(\varTheta \). By treating the K most similar items as independent sources of evidence, each one is represented by a basic belief assignment. Hence, K different bba's are generated for each target item.

  2. The bba's Combination

    In the previous step, we showed how to generate bba’s for each similar item. Now, we describe how to aggregate these bba’s in order to synthesize the final belief about the rating of the target item. Using the belief function theory, such bba’s can be combined using Dempster’s rule of combination. Therefore, the resulting bba encodes the evidence of the K-Nearest Neighbors regarding the rating that should be provided to the target item.

    $$\begin{aligned} m_{Target~item}=m _{Target~item, Item~1}\oplus ...\oplus m _{Target~item, Item~K} \end{aligned}$$
    (12)

    This final bba is obtained as follows:

    $$\begin{aligned} m(\lbrace {r_{i}}\rbrace )=\frac{1}{R}\Big (1-\prod _{j\in i_K}(1-\alpha _{r_{i}})\Big )\cdot \prod _{r_{j} \ne r_{i}}\prod _{j\in i_K}(1-\alpha _{r_{j}} ) \quad \forall r_{i}\in \lbrace r_{1},\ldots ,r_{n} \rbrace \end{aligned}$$
    (13)
    $$\begin{aligned} m(\varTheta )=\frac{1}{R}\prod _{i=1}^n \prod _{j\in i_K}(1-\alpha _{r_{i}}) \end{aligned}$$

    where \( i_K= \lbrace i_1, i_2,\ldots , i_K\rbrace \) is the set containing the indexes of the K nearest neighbors of the target item over the user-item matrix, n is the number of the ratings provided by the similar items, \(\alpha _{r_{i }}\) is the belief committed to the rating \(r_{i }\), \(\alpha _{r_{j }}\) is the belief committed to a rating \( r_{j }\) \(\ne r_{i}\), and R is a normalization factor defined by:

    $$\begin{aligned} R=\sum _{i=1}^n \Big (1-\prod _{j\in i_K}(1-\alpha _{{r_{i }}} )\Big ) \prod _{r_{j }\ne r_{i }}\prod _{j\in i_K}(1-\alpha _{{r_{j }}})+\prod _{{i=1}}^n \prod _{j\in i_K}(1-\alpha _{{r_{i}}} ) \end{aligned}$$
    (14)

    Figure 1 illustrates the prediction phase; a code sketch of the whole phase is given after the figure.

Fig. 1. The prediction phase
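The following sketch puts the two procedures together (our illustrative code, reusing gammas_per_rating and nearest_items from the previous phases; it implements the closed form of Eqs. 13-14 rather than iterating Dempster's rule, and assumes the rating values in ratings match those stored in R):

```python
import numpy as np

def predict_bba(R, user, target, gammas, ratings, K=5, alpha0=0.95):
    """Prediction-phase sketch: one bba per similar item (Eq. 11), combined
    with Dempster's rule in closed form (Eqs. 12-14)."""
    alphas = {r: [] for r in ratings}        # weights grouped by rating value
    for j, d in nearest_items(R, target, K):
        r = R[user, j]                       # rating of the user for neighbor j
        if np.isnan(r):
            continue                         # neighbor not rated by this user
        alphas[r].append(alpha0 * np.exp(-(gammas[r] ** 2) * d ** 2))
    # For each rating r: product over its supporters of (1 - alpha)
    prod = {r: np.prod([1.0 - a for a in alphas[r]]) for r in ratings}
    m = {r: (1.0 - prod[r]) * np.prod([prod[s] for s in ratings if s != r])
         for r in ratings}                   # unnormalized m({r}), Eq. 13
    m_theta = np.prod(list(prod.values()))   # unnormalized m(Theta)
    total = sum(m.values()) + m_theta        # normalization factor R (Eq. 14)
    return {r: v / total for r, v in m.items()}, m_theta / total
```

If a single predicted rating is required, the pignistic transform of Eq. 6 can then be applied to the returned bba.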

5 Experimental Study

We conduct experiments on the MovieLens data set, one of the most widely used real-world data sets in the field of CF. It contains in total 100,000 ratings collected from 943 users on 1682 movies. A study of the choice of the appropriate metric to be used in CF has been performed in [20] using the MovieLens data set. According to this study, the Euclidean distance achieves the best results in terms of accuracy. However, another experimental study [21] has shown that Pearson and Cosine are the most appropriate similarity measures. Hence, the choice of the most suitable metric is still arguable in the CF framework. That is why we perform a comparative evaluation of our proposed method as well as the traditional one using these three similarity measures commonly used in the memory-based CF category.

5.1 Evaluation Metrics

To evaluate our approach, we carry out experiments using three evaluation metrics, namely the Mean Absolute Error (MAE) [22], the Root Mean Squared Error (RMSE) [23] and the distance criterion (dist_crit) [24], defined by:

$$\begin{aligned} MAE= \frac{\sum _{u,i}\vert {p}_{u,i} - {r}_{u,i}\vert }{N}, \end{aligned}$$
(15)
$$\begin{aligned} RMSE=\sqrt{ \frac{\sum _{u,i}( {p}_{u,i} - {r}_{u,i})^2}{N}} \end{aligned}$$
(16)
$$\begin{aligned} dist\_crit = \frac{\sum _{u,i} dist\_crit(i)}{\text {N}} \end{aligned}$$
(17)

where

$$\begin{aligned} dist\_crit(i) = \sum \limits _{j=1}^{n} \big ( BetP(\lbrace r_{j}\rbrace ) - \delta _{j} \big )^{2} \end{aligned}$$
(18)

  • \({r}_{u,i}\) is the real rating of the user u for the item i and \( {p}_{u,i}\) is the predicted value of the rating. N is the total number of the predicted ratings over all the users and n is the number of the possible ratings that can be provided in the system. \(\delta _{j}\) is equal to 1 if the possible rating \(r_{j}\) corresponds to the real rating \( {r}_{u,i} \) and 0 otherwise.

All these measures should remain as low as possible: a smaller value of MAE, RMSE or dist_crit means a better prediction accuracy.
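These metrics can be computed directly from the predictions; a short sketch follows (assuming, for dist_crit, that each evidential prediction is summarized by its pignistic distribution over the possible ratings, as a dict betp; all names are illustrative):

```python
import numpy as np

def mae(predicted, real):
    return np.mean(np.abs(np.asarray(predicted) - np.asarray(real)))          # Eq. 15

def rmse(predicted, real):
    return np.sqrt(np.mean((np.asarray(predicted) - np.asarray(real)) ** 2))  # Eq. 16

def dist_crit(betps, real, ratings):
    """Average distance criterion (Eqs. 17-18): squared distance between each
    pignistic distribution and the indicator vector of the real rating."""
    total = 0.0
    for betp, r_true in zip(betps, real):
        total += sum((betp.get(r, 0.0) - (1.0 if r == r_true else 0.0)) ** 2
                     for r in ratings)
    return total / len(real)
```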

5.2 Experimental Protocol

Working on the MovieLens data set, we follow the experimental protocol introduced in [19]. In the first step, we rank the movies available in the data set according to the number of provided ratings. Hence, we get:

$$ Nb_{user}(m_1) \ge Nb_{user}(m_2) \ge \ldots \ge Nb_{user}(m_{1682}) $$

where \(Nb_{user}\)(\(m_{i}\)) is the number of users who rated the movie \(m_i\). From the original MovieLens data set, we extract 10 subsets, each of which contains the ratings provided by the users for 20 movies. The subsets are selected by progressively increasing the number of missing ratings. In other words, since only a fraction of the possible ratings is available, each subset contains a specific number of ratings, leading to different degrees of sparsity. For each subset, we randomly extract 20 % of the available ratings as a test set and use the remaining 80 % as a training set. We compute the MAE, the RMSE and the dist_crit for each subset, varying each time the value of the neighborhood size K.
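A sketch of this protocol follows (our reading; R is the full users × items rating matrix with np.nan for missing entries, and all names are illustrative):

```python
import numpy as np

def movielens_subsets(R, n_subsets=10, n_movies=20, test_frac=0.2, seed=0):
    """Rank movies by number of ratings, slice into subsets of 20 movies of
    increasing sparsity, and split each subset's ratings 80 % / 20 %."""
    rng = np.random.default_rng(seed)
    order = np.argsort(-(~np.isnan(R)).sum(axis=0))   # most-rated movies first
    for s in range(n_subsets):
        cols = order[s * n_movies:(s + 1) * n_movies]
        users, items = np.where(~np.isnan(R[:, cols]))
        perm = rng.permutation(len(users))
        n_test = int(test_frac * len(users))
        test, train = perm[:n_test], perm[n_test:]
        yield cols, (users[train], cols[items[train]]), \
                    (users[test], cols[items[test]])
```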

5.3 Experimental Results

Our proposed approach is characterized by its ability to deal with the uncertainty pervading the prediction task. Hence, we are concerned with showing how this contribution improves the accuracy of the predictions. To this end, we perform several experiments over the 10 subsets, varying K from 1 to 10.

Performance for Different Sparsity Degrees

Table 1 summarizes the results for different sparsity degrees and recommendation approaches in the two cases, namely the certain case and the uncertain case. Note that the results reported for each subset correspond to the average over the 10 neighborhood sizes in order to ensure a fair comparison of the four approaches. As can be seen, the evidential item-based CF achieves better MAE, RMSE and dist_crit values than the three other approaches working under a certain framework in almost all cases. For example, at a sparsity level of 75 %, the MAE of our proposed method (equal to 0.744) is lower than the MAE of the Pearson CF (equal to 0.943), a reduction of almost 20 % in the error rate. Similarly, it outperforms both the Cosine CF (equal to 0.877) and the Euclidean CF (equal to 0.851). Besides, the dist_crit corresponding to our approach remains the lowest over the different sparsity degrees (e.g. 0.786 compared to 1.283, 1.353 and 1.444 at a sparsity of 95.9 %).

Table 1. Comparison results in terms of MAE, RMSE and dist_crit

Performance for Different Neighborhood Sizes

The variation of the MAE, the RMSE and the dist_crit with the neighborhood size for the four approaches is illustrated in Figs. 2, 3 and 4. As can be seen, the four approaches behave similarly over the different neighborhood sizes. However, the curve of the evidential item-based CF always remains below those of the three traditional methods. According to these results, the evidential approach outperforms all the traditional item-based CF methods. Thereupon, we can conclude that our new approach significantly improves the prediction accuracy.

Fig. 2. The MAE results

Fig. 3. The RMSE results

Fig. 4. The dist_crit results

6 Conclusion

In this paper, we have proposed a new recommendation approach based on the Evidential K-Nearest Neighbors. Our approach relies on the generation and the combination of the bba's corresponding to each similar item, which allows an improvement over the crisp results of the traditional methods. This solution may increase the user's confidence in the system as well as the accuracy of the provided predictions, which would be fully beneficial to the RSs field, especially nowadays where reliability has become a crucial parameter for attaining user satisfaction. Moreover, the proposed evidential item-based collaborative filtering approach offers the users the possibility of having a global overview of their future interests, which leads to better decision making. As future work, we suggest introducing uncertainty into the users' preferences, which are the inputs of our approach.