
1 Introduction

Collaborative Filtering (CF) is one of the most well-known approaches to have achieved widespread success in Recommender Systems (RSs) [1]. It consists in finding the most similar users (user-based) or the most similar items (item-based) in order to provide recommendations. Predictions of the users' preferences are thus performed based on the users or the items holding similar ratings. Despite their simplicity and intuitiveness, these two CF methods suffer from some limitations, such as the scalability problem [2], since they require heavy computations to provide predictions. That is to say, in order to compute the user-user and/or the item-item similarities, the entire ratings matrix needs to be searched, which leads to poor scalability. In this context, clustering techniques can be involved to first group users or items according to their available past ratings, so that predictions can be made independently within each cluster. Hence, clustering is an appealing way to increase the scalability of CF systems while maintaining a good prediction performance.

When clustering users or items, they are likely to belong to more than one cluster, which is known as soft clustering. Thus, the uncertainty spreading over the cluster assignments should be considered. In this paper, we embrace the belief function theory (BFT) [3,4,5], one of the most widely used theories for representing and reasoning under uncertainty. We opt for the Evidential c-Means (ECM) [6], an efficient soft clustering method that handles uncertainty at the cluster assignment level. When the belief function theory is used to model the uncertainty of the cluster assignments, the output is referred to as a credal partition. The relevance of reasoning under uncertainty in CF is also considered at the final prediction level. Indeed, we quantify and represent the uncertainty arising behind the provided predictions based on the Evidential k-Nearest Neighbors formalism [7]. This evidential classifier improves the classification performance by allowing a credal classification of the objects. The proposed approach therefore provides more reliable and intelligible predictions through an evidential representation of the final results. Such a representation gives the users an overall picture of their future preferences, which may increase their confidence in, and satisfaction with, the RS. Our aim is then not only to unify user-based and item-based CF, but also to represent and encapsulate the uncertainty appearing in both the cluster assignments and the prediction process under an evidential framework.

This paper is organized as follows: Sect. 2 recalls the concepts of the belief function theory. Section 3 introduces the related works on CF. Then, our proposed recommendation approach is emphasized in Sect. 4. Section 5 reports the experimental results. Finally, the paper is concluded in Sect. 6.

2 Belief Function Framework

The belief function theory [3,4,5], also referred to as evidence theory, offers a flexible framework for modeling and quantifying imperfect knowledge. Let \(\varOmega \) be the frame of discernment, a finite set of n elementary events: \( \varOmega = \lbrace w_1, w_2, \cdots , w_n \rbrace \). A basic belief assignment (bba) represents the belief committed to the elements of \(2^{\varOmega }\); it is a function \(m: 2^{\varOmega } \rightarrow [0,1] \) such that \(\sum \limits _{E \subseteq \varOmega } m(E) = 1\). The mass m(E) quantifies the degree of belief exactly assigned to the subset E of \( \varOmega \). Since evidence may not be equally trustworthy, a discounting operation can be used to obtain the discounted bba \(m^\delta \) such that: \( m^{\delta }(E) = (1-\delta )\cdot m(E), \forall E \subset \varOmega \quad \text {and} \quad m^{\delta }(\varOmega )= \delta + (1-\delta )\cdot m(\varOmega ), \) where \(\delta \in [0,1]\) is the discounting factor.

To make decisions, the pignistic probability, denoted BetP, can be used. It is defined as: \( BetP (E) = \sum _{F \subseteq \varOmega }\frac{|E \cap F|}{|F|}\frac{m(F)}{1-m(\varnothing )} \quad \text {for all } E \subseteq \varOmega .\)
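To make these two operations concrete, here is a minimal Python sketch (ours, not part of the original formalism) that discounts a bba and computes its pignistic transform, with focal elements encoded as frozensets over \(\varOmega \):

```python
def discount(m, omega, delta):
    """Discount a bba m (dict: frozenset -> mass) by a factor delta in [0, 1]."""
    md = {E: (1 - delta) * v for E, v in m.items() if E != omega}
    md[omega] = delta + (1 - delta) * m.get(omega, 0.0)
    return md

def betp(m, omega):
    """Pignistic transform: probability distribution over the singletons."""
    k = 1.0 - m.get(frozenset(), 0.0)      # normalize by 1 - m(emptyset)
    bet = {w: 0.0 for w in omega}
    for F, v in m.items():
        for w in F:                        # the empty set contributes nothing
            bet[w] += v / (len(F) * k)
    return bet

omega = frozenset({"w1", "w2", "w3"})
m = {frozenset({"w1"}): 0.6, frozenset({"w1", "w2"}): 0.3, omega: 0.1}
print(betp(discount(m, omega, delta=0.2), omega))
```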

3 Related Works on Collaborative Filtering

Clustering-based CF approaches have been suggested in RSs to learn a cluster model and to provide predictions accordingly. For instance, the CF approach introduced in [2] partitioned the users of a CF system through clustering and used the obtained partitions as neighborhoods. In the same way, the authors in [8] proposed a CF approach based on clustering the users' preferences. In [9], a CF method was presented where the training set is used to generate user clusters, from which predictions are then generated. A different clustering-based CF was proposed in [10], where, instead of users, the authors grouped items into several clusters based on the k-METIS algorithm. The works mentioned above rely either on user clustering or on item clustering to predict the users' preferences. In this work, we are rather interested in joining both user clustering and item clustering under the belief function theory [3,4,5]. In fact, a hybrid CF approach combining the user-based and item-based strategies under the belief function framework was proposed in [11]. This evidential approach showed that the fusion framework is effective in improving the prediction accuracy of the single approaches (user-based and item-based) under both certain and uncertain frameworks. However, it is not able to scale up to real data sets with large collections of items and users, since heavy computations are required. This is the scalability problem, which we tackle in our proposed approach.

4 E-HCBCF: Evidential Hybrid Clustering-Based CF

In this approach, we cope with the scalability problem by performing a clustering process over both users and items, while uncertainty is also handled.

4.1 Evidential Users Clustering

In this step, an evidential clustering process is performed over the users in the system using the Evidential c-Means (ECM) [6]. We define \( \varOmega _{clus}=\lbrace w_{1}, w_{2},\dots , w_{c} \rbrace \), where c is the number of clusters. The users clustering process yields a credal partition that allows a given user to be associated with multiple clusters, or rather with multiple subsets of clusters. We start by considering the ratings matrix to randomly initialize the cluster prototypes. Then, we compute the Euclidean distance between the users and the non-empty subsets of \(\varOmega _{clus}\). Finally, the convergence and the minimization of a given objective function [6] lead to the generation of the final credal partition. In order to make a final decision about the cluster of the current user, we transform the bba of each user in the system into a pignistic probability \(BetP({w_k}) \) based on the obtained credal partition. These values reflect the degrees of membership of the user u to the cluster \(w_k\). A hard partition can then be easily derived by assigning each user to the cluster holding the highest BetP value.
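As an illustration of the distance computation mentioned above, the following simplified sketch (our own naming; the full ECM algorithm with its objective function is given in [6]) compares a user's rating vector to the barycenter prototype of every non-empty subset of \(\varOmega _{clus}\):

```python
from itertools import chain, combinations
import numpy as np

def nonempty_subsets(c):
    """All non-empty subsets of {0, ..., c-1}, i.e. of Omega_clus."""
    return [frozenset(s) for s in chain.from_iterable(
        combinations(range(c), r) for r in range(1, c + 1))]

def subset_distances(x, prototypes):
    """x: a user's rating vector; prototypes: array of shape (c, n_items)."""
    dists = {}
    for A in nonempty_subsets(len(prototypes)):
        bary = prototypes[list(A)].mean(axis=0)   # barycenter of the subset
        dists[A] = float(np.linalg.norm(x - bary))
    return dists

prototypes = np.array([[4.0, 2.0, 5.0],           # randomly initialized in ECM
                       [1.0, 3.0, 2.0]])
user = np.array([3.5, 2.5, 4.0])
print(subset_distances(user, prototypes))
# ECM then derives the credal partition from these distances and iterates
# until the objective function of [6] converges; BetP yields the hard partition.
```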

4.2 Modeling Users’ Neighborhood Ratings

According to the obtained user clusters, the set of the k-nearest neighbors of the active user \(U_a\) is extracted. We compute the distances between \(U_a\) and the other users in the same cluster as follows [7]: \( dist(U_a, U_i)=\frac{1}{|I(a, i)|}\sqrt{\sum _{I_j \in I(a, i)} (r_{a,j}-r_{i,j})^{2}}\), where |I(a, i)| is the number of items rated by both the user \(U_a\) and the user \(U_i\), and \(r_{a,j}\) and \(r_{i,j}\) are respectively the ratings of the users \(U_a\) and \(U_i\) for the item \(I_j\). The set of the k-most similar users, denoted by \(\varGamma _k\), is selected based on the computed distances. We define the frame of discernment \(\varOmega _{pref}=\lbrace r_1, r_2,\cdots , r_L\rbrace \) as a rank-ordered set of L preference labels (i.e., ratings) where \(r_p < r_l\) whenever \(p < l\). Since we use the belief function theory to conveniently model uncertainty, we transform the rating of each user \(U_i\) belonging to \(\varGamma _k\) into a basic belief assignment \(m_{U_a,U_i}\) over the frame of discernment, based on the formalism in [7]. This bba reflects the evidence provided by \(U_i\) about the rating of \(U_a\):

$$\begin{aligned} m_{U_a,U_i}(\lbrace r_p\rbrace )=\alpha _0 \, e^{-\gamma _{r_p}^2 \times dist(U_a,U_i)^2} \end{aligned}$$
(1)
$$\begin{aligned} m_{U_a,U_i}(\varOmega _{pref})=1-\alpha _0 \, e^{-\gamma _{r_p}^2 \times dist(U_a,U_i)^2} \end{aligned}$$

where \(\alpha _0\) is fixed to the value 0.95 and \(\gamma _{r_{p}}\) is the inverse of the average distance between each pair of users having the same rating \(r_{p}\). In order to account for the reliability of each neighbor, these bba's are then discounted such that:

$$\begin{aligned} m^{\delta _u}_{U_a,U_i}(\lbrace r_p\rbrace )= (1-\delta _u)\cdot m_{U_a,U_i}(\lbrace r_p\rbrace ) \end{aligned}$$
(2)
$$\begin{aligned} m^{\delta _u}_{U_a,U_i}(\varOmega _{pref})= \delta _u+(1-\delta _u)\cdot m_{U_a,U_i}(\varOmega _{pref}) \end{aligned}$$

The discounting factor is \(\delta _u=\frac{dist(U_a,U_i)}{max(dist)}\), where max(dist) is the maximum value of the computed distances between the users.
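Putting Eqs. (1) and (2) together, a hedged Python sketch of this per-neighbor pipeline might look as follows; the value of gamma_rp, the example ratings and the max(dist) used for \(\delta _u\) are illustrative placeholders, not values from the paper:

```python
import math

def user_distance(ra, ri):
    """dist(U_a, U_i): ra, ri map item ids to ratings; averaged over co-rated items."""
    common = set(ra) & set(ri)                 # items rated by both users
    return math.sqrt(sum((ra[j] - ri[j]) ** 2 for j in common)) / len(common)

def neighbor_bba(rp, dist, gamma_rp, alpha0=0.95):
    """Eq. (1): mass on the singleton {rp}, remainder on Omega_pref."""
    s = alpha0 * math.exp(-(gamma_rp ** 2) * dist ** 2)
    return {rp: s, "Omega_pref": 1.0 - s}

def discount_bba(m, delta):
    """Eq. (2): reliability discounting with delta = dist / max(dist)."""
    md = {r: (1 - delta) * v for r, v in m.items() if r != "Omega_pref"}
    md["Omega_pref"] = delta + (1 - delta) * m["Omega_pref"]
    return md

ra = {"i1": 4, "i2": 3, "i3": 5}               # ratings of the active user
ri = {"i1": 5, "i2": 3, "i4": 2}               # ratings of one neighbor
d = user_distance(ra, ri)                      # over co-rated items i1, i2
m = discount_bba(neighbor_bba(4, d, gamma_rp=1.0), delta=d / 2.0)
print(m)   # mass on rating 4 plus residual ignorance on Omega_pref
```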

4.3 Generating Users’ Neighborhood Predictions

Once the bba's of the similar users are generated, they are fused as follows [7]:

$$\begin{aligned} m^{\delta _u}(\lbrace r_p\rbrace )=\frac{1}{N}\Big (1-\prod _{U\in \varGamma _k}(1-\alpha _{r_{p}})\Big )\cdot \prod _{r_{q} \ne r_{p}}\prod _{U\in \varGamma _k}(1-\alpha _{r_{q}}) \quad \forall r_{p}\in \lbrace r_{1}, \cdots , r_{Nb} \rbrace \end{aligned}$$
(3)
$$\begin{aligned} m^{\delta _u}(\varOmega _{pref})=\frac{1}{N}\prod _{p=1}^{Nb} \prod _{U\in \varGamma _k}(1-\alpha _{r_{p}}) \end{aligned}$$

where Nb is the number of ratings provided by the similar users, \(\alpha _{r_{p}}\) is the belief committed to the rating \(r_{p}\), \(\alpha _{r_{q}}\) is the belief committed to the rating \(r_{q} \ne r_{p}\), and N is a normalization factor defined by [7]:

\( N=\sum _{p=1}^{Nb} \Big (1-\prod _{U\in \varGamma _k}(1-\alpha _{r_{p}})\Big ) \prod _{r_{q}\ne r_{p}}\prod _{U\in \varGamma _k}(1-\alpha _{r_{q}})+\prod _{p=1}^{Nb} \prod _{U\in \varGamma _k}(1-\alpha _{r_{p}}) \)
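The pooling of Eq. (3) can be sketched as follows, where alphas[p] collects the (discounted) masses \(\alpha _{r_p}\) that the neighbors in \(\varGamma _k\) commit to the label \(r_p\); the variable names are ours:

```python
from functools import reduce
from operator import mul

def fuse_neighbors(alphas):
    """alphas: dict {rating label: [alpha of each supporting neighbor]}."""
    prod = lambda xs: reduce(mul, xs, 1.0)
    # Residual doubt each label keeps after seeing all its supporters:
    keep = {p: prod(1.0 - a for a in al) for p, al in alphas.items()}
    # Unnormalized singleton masses of Eq. (3):
    num = {p: (1.0 - keep[p]) * prod(keep[q] for q in alphas if q != p)
           for p in alphas}
    num_omega = prod(keep.values())            # all neighbors withhold belief
    n = sum(num.values()) + num_omega          # normalizing factor N
    m = {p: v / n for p, v in num.items()}
    m["Omega_pref"] = num_omega / n
    return m

alphas = {3: [0.2, 0.1], 4: [0.6, 0.5, 0.3]}   # two labels, five neighbor masses
print(fuse_neighbors(alphas))                  # masses on {3}, {4} and Omega_pref
```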

4.4 Evidential Items Clustering

So far, only the user-based assumption has been adopted. In what follows, the item side is also considered. In this phase, we carry out an evidential clustering process over the corresponding items in the system. The uncertainty about the assignment of items to clusters is taken into account, since each item in the user-item matrix can belong to every cluster with some degree of belief. The ECM [6] is applied once again and the ratings matrix is explored in order to generate the final credal partition of the items. Once the credal partition corresponding to each item in the system has been produced, a pignistic probability \(BetP({w_k})\) is generated from the obtained bba's. Each item is then assigned to the cluster holding the highest BetP value.

4.5 Modeling Items’ Neighborhood Ratings

Given a target item \(I_t\), we now consider the items having the same cluster membership as \(I_t\). To this end, the distances between \(I_t\) and these items are computed. Formally, the distance between the target item \(I_t\) and each item \(I_j\) is computed as follows [7]: \( dist(I_t,I_j)=\frac{1}{|U(t, j)|}\sqrt{\sum _{U_i \in U(t, j)} (r_{i,t}-r_{i,j})^{2}}\), where |U(t, j)| denotes the number of users that rated both the target item \(I_t\) and the item \(I_j\), and \(r_{i,t}\) and \(r_{i,j}\) correspond to the ratings of the user \(U_i\) for the target item \(I_t\) and for the item \(I_j\). Finally, only the k items having the lowest distances are selected, and the rating of each similar item is transformed into a bba defined as:

$$\begin{aligned} m_{I_t,I_j}(\lbrace r_p\rbrace )=\alpha _0 \, e^{-\gamma _{r_p}^2 \times dist(I_t,I_j)^2} \end{aligned}$$
(4)
$$\begin{aligned} m_{I_t,I_j}(\varTheta _{pref})=1-\alpha _0 \, e^{-\gamma _{r_p}^2 \times dist(I_t,I_j)^2} \end{aligned}$$

Following [7], \(\alpha _0\) is set to 0.95 and \(\gamma _{r_p}\) is defined as the inverse of the mean distance between each pair of items having the same ratings. These bba's are then discounted as follows:

$$\begin{aligned} m^{\delta _i}_{I_t,I_j}(\lbrace r_p\rbrace )= (1-\delta _i)\cdot m_{I_t,I_j}(\lbrace r_p\rbrace ) \end{aligned}$$
(5)
$$\begin{aligned} m^{\delta _i}_{I_t,I_j}(\varTheta _{pref})= \delta _i+(1-\delta _i)\cdot m_{I_t,I_j}(\varTheta _{pref}) \end{aligned}$$

where \(\delta _i\) is a discounting factor depending on the items' distances, such that \(\delta _i=\frac{dist(I_t,I_j)}{max(dist)}\), where max(dist) is the maximum value of the computed distances.

4.6 Generating Items’ Neighborhood Predictions

The aggregation of the \(bba's\) for each similar item is performed as follows:

$$\begin{aligned} m^{\delta _i}(\lbrace r_p\rbrace )=\frac{1}{Z}\Big (1-\prod _{I\in \varGamma _k}(1-\alpha _{r_{p}})\Big )\cdot \prod _{r_{q} \ne r_{p}}\prod _{I\in \varGamma _k}(1-\alpha _{r_{q}}) \quad \forall r_{p}\in \lbrace r_{1}, \cdots , r_{Nb} \rbrace \end{aligned}$$
(6)
$$\begin{aligned} m^{\delta _i}(\varTheta _{pref})=\frac{1}{Z}\prod _{p=1}^{Nb} \prod _{I\in \varGamma _k}(1-\alpha _{r_{p}}) \end{aligned}$$

Note that Nb is the number of ratings given by the similar items, \(\alpha _{r_{p}}\) is the belief committed to the rating \(r_{p}\), \(\alpha _{r_{q}}\) is the belief committed to the rating \(r_{q} \ne r_{p}\), and Z is a normalization factor defined by [7]:

\( Z=\sum _{p=1}^{Nb} \Big (1-\prod _{I\in \varGamma _k}(1-\alpha _{r_{p}})\Big ) \prod _{r_{q}\ne r_{p}}\prod _{I\in \varGamma _k}(1-\alpha _{r_{q}})+\prod _{p=1}^{Nb} \prod _{I\in \varGamma _k}(1-\alpha _{r_{p}}) \)

4.7 Final Predictions and Recommendations

After performing the evidential co-clustering process, predictions incorporating both the items' and the users' aspects have been generated under the belief function theory. Thus, we obtain, on the one hand, bba's from the k-similar items (\(m^{\delta _i}\)) and, on the other hand, bba's derived from the k-similar users (\(m^{\delta _u}\)). The fusion of the bba's stemming from these two kinds of information sources is ensured through an aggregation process using Dempster's rule of combination [3], such that \( m_{Final}= m^{\delta _i} \oplus m^{\delta _u} \). The obtained predictions indicate the belief induced from the two pieces of evidence, namely the similar users and the similar items.
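A minimal sketch of this final fusion step, assuming bba's whose focal elements are frozensets of rating labels, is given below; Dempster's rule here is the standard normalized conjunctive combination [3]:

```python
def dempster(m1, m2):
    """Normalized conjunctive combination of two bba's (Dempster's rule)."""
    out, conflict = {}, 0.0
    for a, va in m1.items():
        for b, vb in m2.items():
            inter = a & b
            if inter:
                out[inter] = out.get(inter, 0.0) + va * vb
            else:
                conflict += va * vb            # mass sent to the empty set
    return {e: v / (1.0 - conflict) for e, v in out.items()}

omega = frozenset({1, 2, 3, 4, 5})
m_items = {frozenset({4}): 0.6, omega: 0.4}                       # m^{delta_i}
m_users = {frozenset({4}): 0.5, frozenset({3}): 0.2, omega: 0.3}  # m^{delta_u}
print(dempster(m_items, m_users))   # pooled belief over the predicted rating
```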

Table 1. Comparison results in terms of Precision and Dist_crit

5 Experiments and Discussions

MovieLens, one of the most commonly used real-world data sets in CF, has been adopted in our experiments. We perform a comparative evaluation of our proposed method (E-HCBCF) against the evidential hybrid neighborhood-based CF proposed in [11], denoted by E-HNBCF. This hybrid approach achieved better results than state-of-the-art CF approaches, both item-based and user-based ones. Compared to E-HNBCF, our new approach incorporates an evidential co-clustering process of the users as well as the items in the system and provides predictions accordingly. We followed the strategy conducted in [12], where movies are first ranked based on the number of their corresponding ratings. Different subsets are then extracted by progressively increasing the number of missing rates; hence, each subset contains a specific number of ratings, leading to different degrees of sparsity. For each extracted subset, we randomly select 10% of the users as testing data, the remaining ones being considered as training data.

We opt for three evaluation metrics: the Precision, the Distance criterion (Dist_crit) [13] and the elapsed time, such that: \( Precision= \frac{IR}{IR+UR} \quad \text {and} \quad Dist\_crit = \frac{\sum _{u,i} \sum _{i=1}^{n}(BetP(\{{p}_{u,i}\}) - \theta _{i})^2}{ \Vert \widehat{ p_{u,i}} \Vert }. \) IR indicates that an interesting item has been correctly recommended, while UR indicates that an uninteresting item has been incorrectly recommended. \( \Vert \widehat{ p_{u,i}} \Vert \) is the total number of predicted ratings and n is the number of possible ratings that can be provided in the system. \(\theta _{i}\) is equal to 1 if \( {p}_{u,i} \) is equal to \( {\widehat{ p_{u,i}}} \) and 0 otherwise, where \({p}_{u,i}\) is the real rating of the user u for the item i and \({\widehat{ p_{u,i}}} \) is the predicted value. Note that the lower the Dist_crit values, the more accurate the predictions, while higher values of the Precision indicate a better recommendation quality.

Experiments are conducted over the selected subsets by varying the number of clusters c, with c = 2, 3, 4 and 5. For each number of clusters, different neighborhood sizes were tested and the average results were computed. Finally, the results obtained for the different numbers of clusters are also averaged: we compute the Precision and Dist_crit measures for each value of c and report the overall results. For the evidential co-clustering process, the required parameters were set as follows: \(\alpha \) = 2, \(\beta \) = 2 and \(\delta ^2\) = 10, as recommended in [6]. For the bba's generation, recall that \(\alpha _0\) is fixed to 0.95 and \(\gamma _{r_{p}}\) is computed as the inverse of the mean distance between each pair of items, respectively users, sharing the same rating values.

Considering different sparsity degrees, the results are displayed in Table 1. Compared to the evidential hybrid neighborhood-based CF, incorporating an evidential co-clustering process that joins both item clustering and user clustering provides a slightly better performance. Indeed, the average Precision of E-HCBCF is somewhat greater than the value obtained by the standard E-HNBCF (0.747 compared to 0.737). Besides, it achieves the lowest average value in terms of Dist_crit (0.909 compared to 0.918). These results show that the evidential co-clustering process is effective in maintaining a good prediction quality while also improving the scalability performance, as depicted in Fig. 1. According to Fig. 1, the elapsed time of E-HCBCF is substantially lower than that of the basic E-HNBCF. This outcome is explained by the fact that the standard hybrid evidential CF method needs to search for the closest neighbors of the active user as well as the most similar neighbors of the target item by browsing the whole user-item ratings space, which results in a huge amount of computation.
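For illustration, the following sketch computes the two measures on toy data; the relevance threshold separating interesting from uninteresting items, and our per-label reading of \(\theta _i\), are assumptions rather than details given in the paper:

```python
def precision(recommended, real, threshold=4):
    """Precision = IR / (IR + UR); items rated >= threshold count as
    interesting (the threshold itself is our assumption)."""
    ir = sum(1 for i in recommended if real.get(i, 0) >= threshold)
    return ir / len(recommended)

def dist_crit(betps, truths, labels=(1, 2, 3, 4, 5)):
    """Mean squared distance between BetP and the 0/1 indicator theta,
    summed over the possible rating labels of each prediction."""
    total = sum(
        sum((bp.get(r, 0.0) - (1.0 if r == t else 0.0)) ** 2 for r in labels)
        for bp, t in zip(betps, truths))
    return total / len(betps)

recommended = ["i1", "i2", "i3"]
real = {"i1": 5, "i2": 2, "i3": 4}
print(precision(recommended, real))            # -> 2/3
betps = [{4: 0.7, 5: 0.2, 3: 0.1}, {2: 0.5, 3: 0.5}]
print(dist_crit(betps, truths=[4, 3]))         # lower is better
```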

Fig. 1. Elapsed Time of E-HCBCF vs. E-HNBCF

6 Conclusion

In this paper, we have proposed a new evidential co-clustering CF approach in which both the items' and the users' aspects come into play. A clustering model joining both user clustering and item clustering has been built under the belief function theory, based on the available past ratings. According to the obtained clusters, the k-nearest users and the k-nearest items are selected and predictions are then performed. The global evidence of the neighbors is finally aggregated to obtain an overall information about the provided predictions. Experiments on a real-world CF data set demonstrated the efficiency of the proposed approach, where the elapsed time has been significantly improved while maintaining a good recommendation performance.