1 Introduction

We present a new technique for making predictions in recommender systems based on collaborative filtering. The underlying idea is to select a different number of neighbors for each user, rather than always selecting a constant number k of neighbors, as is usually done. In this way, we significantly improve the accuracy of the recommender system.

Recommender Systems are programs able to make recommendations to users about a set of articles or services they might be interested in. Such programs have become very popular due to the rapid growth of Web 2.0 [11, 12, 15] and the explosion of information available on the Internet. Although Recommender Systems cover a wide variety of possible applications [3, 4, 16, 19, 21], movie recommendation websites are probably the best-known example for common users, and they have therefore been the subject of significant research [2, 14].

Recommender Systems rely on filtering techniques to reduce the amount of information presented to the user. So far, collaborative filtering is the most commonly used and studied technology [1, 5, 9], and thus the quality of a recommender system depends significantly on its Collaborative Filtering procedures [9]. The methods on which Collaborative Filtering is based are typically classified as follows:

  • Memory-based methods [13, 18, 20] use similarity metrics and act directly on the matrix that contains the ratings of all users who have expressed their preferences on the collaborative service; these metrics mathematically express a distance between two users based on their respective ratings.

  • Model-based methods [1] use the matrix with the users’ ratings to create a model on which the sets of similar users will be established. Among the most widely used models of this kind are Bayesian classifiers [6], neural networks [10] and fuzzy systems [22].

Generally speaking, commercial Recommender Systems use memory-based methods [8], whilst model-based methods are usually associated with research Recommender Systems. Regardless of the approach used in the Collaborative Filtering stage, the technical goal is generally to minimize prediction errors, making the accuracy [7, 8, 17] of the Recommender System as high as possible. This accuracy is usually measured by the mean absolute error (MAE) [1, 9].

In this paper, we focus, among the Memory-based methods, on those which rely on the user-based nearest neighborhood algorithm [1, 5]: the K users most similar to a given (active) user are selected according to the degree of coincidence between their votes as registered in the database. We present a variant of this algorithm, based on choosing, for each user, not a constant but a variable number of neighbors. As we will see, our algorithm significantly improves the accuracy compared to the typical user-based nearest neighborhood algorithm with a constant number of neighbors.

In Sects. 15.2 and 15.3, we formalize some concepts on recommender systems and the memory-based methods of collaborative filtering. In Sects. 15.4 and 15.5, we present new techniques based on the idea of choosing a variable number of neighbors for each user. In Sect. 15.6, we show how our algorithm significantly improves on the K-nearest neighborhood algorithm. Finally, in Sect. 15.7, we present our conclusions.

2 Recommender Systems

We will consider a recommender system based on a database consisting of a set of m users, U = {1, …, m}, and a set of n items, I = {1, …, n} (in the case of a movie recommender system, U would stand for the users registered in the system and I would refer to the different movies in the database).

Users rate the items they know on a discrete scale of possible values {min, …, max}, associating higher values with their favorite items. Typically, this range of values is {1, …, 5} or {1, …, 10}.

Given a user x ∈ U and an item i ∈ I, the expression v(x, i) will represent the value with which the user x has rated the item i. Obviously, users may not have rated every item in I. We will use the symbol • to represent that a user has not rated an item i. In this way, the set of possible values of the expression v(x, i) is V = {min, …, max} ∪ {•}.

In order to offer reliable suggestions, recommender systems try to make accurate predictions about how a user x would rate an item i which has not yet been rated by the user x (that is to say, v(x, i) = •). Given a user x ∈ U and an item i ∈ I, we will use the expression v*(x, i) to denote the system’s estimation of the value with which the user x is expected to rate the item i.

Different methods have been used so far to achieve good estimations of the users’ preferences. The quality of these techniques is typically checked empirically, by measuring two features of the recommender system:

  • The error made in the predictions

  • The number of predictions that the system can make.

Regarding the error made in the estimations, different measures have been proposed. The most widely used is probably the MAE [9] (see Definition 15.1), which conveys the mean of the absolute difference between the real values rated by the users, v(x, i), and the estimated values, v*(x, i). As may be seen in Definition 15.1, in the hypothetical case that the recommender system cannot provide any estimation, we consider that MAE = 0.

Definition 15.1 (MAE).

Let J = {(x, i) | x ∈ U, i ∈ I, v(x, i) ≠ •, v*(x, i) ≠ •}

$$\mathit{MAE} = \left \{\begin{array}{c l} \frac{1} {\vert J\vert }{\sum \nolimits }_{(x,i)\in J}\vert v(x,i) - {v}^{*}(x,i)\vert & \text{ if }J\neq \varnothing \\ 0 & \text{ otherwise}\\ \end{array} \right.$$

In order to quantify the number of predictions that the recommender system can make, the coverage of the recommender system is often used (see Definition 15.2). It is defined as the percentage of predictions actually made by the system over the total number of possible predictions within the system. In the hypothetical case that all users had rated every item in the database, we consider that the coverage is 1.

Definition 15.2 (Coverage).

Let A = {(x, i) | x ∈ U, i ∈ I, v(x, i) = •}

Let B = {(x, i) | x ∈ U, i ∈ I, v(x, i) = •, v*(x, i) ≠ •}

$$\mathit{coverage} = \left \{\begin{array}{c l} \frac{\vert B\vert } {\vert A\vert } & \text{ if }A\neq \varnothing \\ 1 & \text{ otherwise}\\ \end{array} \right.$$
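To make these two measures concrete, here is a minimal sketch in Python (ours, not the authors’ code) of MAE and coverage; representing ratings as a dictionary keyed by (user, item), with a missing key playing the role of the symbol •, is an assumption of the sketch.

```python
# A minimal sketch of Definitions 15.1 and 15.2 (not the authors' code).
# Ratings and predictions are dicts keyed by (user, item); a missing key
# plays the role of the symbol "•".

def mae(ratings, predictions):
    """Mean absolute error over the pairs that are both rated and predicted."""
    J = [p for p in ratings if p in predictions]
    if not J:
        return 0.0
    return sum(abs(ratings[p] - predictions[p]) for p in J) / len(J)

def coverage(users, items, ratings, predictions):
    """Fraction of the unrated (user, item) pairs the system can predict."""
    A = [(u, i) for u in users for i in items if (u, i) not in ratings]
    if not A:
        return 1.0
    B = [p for p in A if p in predictions]
    return len(B) / len(A)
```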

3 Memory-Based Methods of Collaborative Filtering

In this section, we will focus on how recommender systems predict the value v*(x, i) with which the user x ∈ U would rate the item i ∈ I. These methods are based on the following idea: if we find a user y ∈ U who has rated very similarly to x ∈ U, then we can conclude that the user x’s tastes are akin to those of the user y. Consequently, given an item i ∈ I which the user x has not rated yet while the user y already has, we can infer that the user x would probably rate the item i with a value similar to the one given by the user y.

Thus, methods of this kind search, for each user x ∈ U, for a subset of k users, y1, …, yk ∈ U (called ‘neighbors’), who have rated very similarly to the user x. In order to predict the value with which the user x would rate an item i ∈ I, the recommender system first examines the values with which the neighbors y1, …, yk have rated the item i, and then uses these values to make the prediction v*(x, i). Consequently, two main issues must be considered in order to make predictions:

  • Evaluating how similar two users are, in order to select, for each user x, a set of users y1, …, yk (called ‘neighbors’) with similar tastes to the user x.

  • Given a user x and an item i, estimating the value v*(x, i) with which the user x would rate the item i, by considering the values with which the neighbors of x have rated this item i: v(y1, i), …, v(yk, i).

As far as the first issue is concerned, there are several possible measures for quantifying how similar the ratings of two different users are [18]. The similarity measure between two users x, y ∈ U is defined on those items which have been rated by both x and y. That is to say, we define the set C(x, y) of the common items between x, y ∈ U as follows:

Definition 15.3 (Common Items).

Given x, y ∈ U, we define C(x, y) as the following subset of I:

$$C(x,y) =\{ i \in I\vert v(x,i)\neq \bullet,\ v(y,i)\neq \bullet \}$$

The Mean Square Difference, MSD, may be regarded as the simplest similarity measure:

$$MSD(x,y) = \frac{1} {\vert C(x,y)\vert }{\sum \nolimits }_{i\in C(x,y)}{(v(x,i) - v(y,i))}^{2}$$

As may be seen, MSD is based on a well-known metric distance and MSD(x, y) ≥ 0. When MSD(x, y) = 0, the users x and y have assigned exactly the same values to the items which both have rated. Moreover, the lower MSD(x, y) is, the more similar the users x and y are.

The cosine between two vectors, cos, and the correlation coefficient, ρ(x, y), are the most widely used similarity measures:

  • The cosine similarity:

    $$\cos (x,y) = \frac{{\sum \nolimits }_{i\in C(x,y)}v(x,i)v(y,i)} {\sqrt{{\sum \nolimits }_{i\in C(x,y)}v{(x,i)}^{2}} \cdot \sqrt{{\sum \nolimits }_{i\in C(x,y)}v{(y,i)}^{2}}}$$
  • The correlation coefficient or Pearson similarity:

    $$\rho (x,y) = \frac{{\sum \nolimits }_{i\in C(x,y)}(v(x,i) -\bar{ x})(v(y,i) -\bar{ y})} {\sqrt{{\sum \nolimits }_{i\in C(x,y)}{(v(x,i) -\bar{ x})}^{2}} \cdot \sqrt{{\sum \nolimits }_{i\in C(x,y)}{(v(y,i) -\bar{ y})}^{2}}}$$

    where \(\bar{x} = \frac{1} {\vert C(x,y)\vert }{\sum \nolimits }_{i\in C(x,y)}v(x,i)\) and \(\bar{y} = \frac{1} {\vert C(x,y)\vert }{\sum \nolimits }_{i\in C(x,y)}v(y,i)\)

Unlike MSD(x, y), the measures cos(x, y) and ρ(x, y) do not fulfill the conditions of a distance in metric spaces. Indeed, both measures lie within the range [ − 1, 1] and, the higher cos(x, y) or ρ(x, y) is, the more similar the users x and y are.
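The three similarity measures above can be sketched as follows; the profile representation (a dictionary mapping each item to its rating) is our own convention, and the sketch assumes C(x, y) is non-empty and, for ρ, that neither user rates all common items identically (otherwise a denominator would vanish).

```python
from math import sqrt

def common_items(vx, vy):
    """C(x, y): the items rated by both users."""
    return [i for i in vx if i in vy]

def msd(vx, vy):
    C = common_items(vx, vy)
    return sum((vx[i] - vy[i]) ** 2 for i in C) / len(C)

def cosine(vx, vy):
    C = common_items(vx, vy)
    num = sum(vx[i] * vy[i] for i in C)
    den = sqrt(sum(vx[i] ** 2 for i in C)) * sqrt(sum(vy[i] ** 2 for i in C))
    return num / den

def pearson(vx, vy):
    C = common_items(vx, vy)
    mx = sum(vx[i] for i in C) / len(C)   # mean rating of x on C(x, y)
    my = sum(vy[i] for i in C) / len(C)   # mean rating of y on C(x, y)
    num = sum((vx[i] - mx) * (vy[i] - my) for i in C)
    den = sqrt(sum((vx[i] - mx) ** 2 for i in C)) * \
          sqrt(sum((vy[i] - my) ** 2 for i in C))
    return num / den
```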

Once a similarity measure has been chosen, the recommender system selects, for each user x, a subset of the k users most similar to it, N(x) = {y1, …, yk}, and then these are used to predict the value v*(x, i) with which the user x will rate an item i.

As for this latter issue, the simplest way to determine v*(x, i) consists of calculating the mean of the values rated by the k users N(x) = {y1, …, yk}. That is to say:

$${v}^{{_\ast}}(x,i) = \frac{1} {\vert B(x,i)\vert }{\sum \nolimits }_{y\in B(x,i)}v(y,i)$$
(15.1)

where B(x, i) = {y ∈ N(x) | v(y, i) ≠ •}

The estimation v*(x, i) can be improved significantly by giving more weight to the values from the users who are more similar to x than to those given by the users who are less similar. If we consider a measure, sim, like ρ or cos, we can do this easily in the following way:

$${v}^{*}(x,i) = \frac{{\sum \nolimits }_{y\in N(x),\ v(y,i)\neq \bullet }\mathit{sim}(x,y) \cdot v(y,i)} {{\sum \nolimits }_{y\in N(x),\ v(y,i)\neq \bullet }\mathit{sim}(x,y)}$$
(15.2)

As may be seen, Expression 15.2 works well when using similarity measures like ρ or cos, since they give higher values to those users who are more similar to a given one. However, Expression 15.2 does not work when dealing with a function like MSD, since it gives lower values to those users who are more similar (indeed, when two users x, y have rated exactly with the same values, MSD(x, y) = 0). Consequently, when using MSD, the estimation is calculated using Expression 15.1.
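Both estimation rules can be sketched as follows; the name `neighbors`, a mapping from each neighbor y of x to sim(x, y), is hypothetical, and `None` plays the role of the symbol •.

```python
# A sketch of the two estimation rules; `ratings` maps (user, item) to a
# value, with missing keys standing for "•".

def predict_mean(neighbors, ratings, i):
    """Expression 15.1: plain mean over the neighbors who rated item i."""
    B = [y for y in neighbors if (y, i) in ratings]
    if not B:
        return None  # the system cannot predict v*(x, i)
    return sum(ratings[(y, i)] for y in B) / len(B)

def predict_weighted(neighbors, ratings, i):
    """Expression 15.2: similarity-weighted mean (for sim = cos or rho)."""
    B = [y for y in neighbors if (y, i) in ratings]
    if not B:
        return None
    num = sum(neighbors[y] * ratings[(y, i)] for y in B)
    den = sum(neighbors[y] for y in B)
    return num / den
```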

Once the similarity measure and the estimation expression have been selected, the recommender system based on collaborative filtering is fully designed. Both the evaluation of the recommender system in terms of MAE (see Definition 15.1) and the coverage (see Definition 15.2) depend strongly on the constant k, the number of neighbors for each user. Indeed, the optimal value of this constant depends on the recommender system and is often hard to find.

4 Choosing Variable Number of Neighbors for Each User

As we have described in the previous section, typical recommender systems based on collaborative filtering select, for each user x, the set of the k most similar users (neighbors), and then use these neighbors to make predictions about how the user x will rate the different items.

In this section, we discuss a new technique to select the neighbors and calculate the estimations, based on choosing a variable number of neighbors for each user (instead of always choosing k neighbors). This idea is inspired by the fact that, as often happens, a certain user x may have many more than k highly similar neighbors, while another one, y, may have far fewer than k. When this happens, in order to make predictions for x, we are in fact discarding a certain number of users who are highly similar to x; in the same way, when making predictions for y, we would be including some users who are not similar enough to y, but are merely needed to complete the fixed number of k neighbors.

In order to avoid this drawback, our technique associates a variable number of neighbors to each user in the following way. First we need to define a function, d : U × U → ℝ⁺, where d(x, y) measures the inadequacy of user y’s ratings for predicting the ratings of x. A user y will be considered a neighbor of x when d(x, y) lies under a constant value α (see Definition 15.6). As we will see, this function d is based on linear regression.

Next, we must obtain this function d. We will consider that a user y ∈ U is completely suitable to predict the ratings of user x ∈ U if there is a value b(x, y) such that, for every item common to both users x and y (that is to say, ∀i ∈ C(x, y)), the following holds:

$$v(x,i) = b(x,y) \cdot v(y,i)$$
(15.3)

In case the user y ∈ U is not completely suitable to predict the user x ∈ U, there will be an error in the previous expression, which then becomes:

$$v(x,i) = b(x,y) \cdot v(y,i) + e(x,y,i)$$
(15.4)

Given b(x, y), we evaluate the overall error of this approximation as follows:

$$\frac{1} {\vert C(x,y)\vert }{\sum \nolimits }_{i\in C(x,y)}e{(x,y,i)}^{2}$$
(15.5)

In order to define the unsuitability degree, d, of a user y for predicting the user x, we will consider the value b(x, y) ∈ ℝ such that Expression 15.5 is minimized. As is standard in linear regression, this happens when b(x, y) takes the value described in Definition 15.4.

Definition 15.4.

We define b : U × U → ℝ ∪ {∞} as follows:

$$b(x,y) = \left \{\begin{array}{c l} \dfrac{{\sum \nolimits }_{i\in C(x,y)}v(x,i) \cdot v(y,i)} {{\sum \nolimits }_{i\in C(x,y)}v{(y,i)}^{2}} & \text{ if }C(x,y)\neq \varnothing \\ \infty & \text{ otherwise}\\ \end{array} \right.$$
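For completeness, this value is obtained by setting the derivative of Expression 15.5 with respect to b to zero (the second derivative is positive, so the stationary point is indeed the minimum):

$$\frac{\partial } {\partial b}\ \frac{1} {\vert C(x,y)\vert }{\sum \nolimits }_{i\in C(x,y)}{\left (v(x,i) - b \cdot v(y,i)\right )}^{2} = 0\quad \Rightarrow \quad b(x,y) = \frac{{\sum \nolimits }_{i\in C(x,y)}v(x,i) \cdot v(y,i)} {{\sum \nolimits }_{i\in C(x,y)}v{(y,i)}^{2}}$$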

The factor b(x, y) is used in order to handle the case in which two users employ different scales when rating items, even though they have similar tastes. That is to say, we will consider that users x and y have similar tastes when, ∀i ∈ C(x, y), v(x, i) tends to be very close to b(x, y) · v(y, i) (where b(x, y) is a constant associated with the pair of users x and y).

The function d(x, y) is defined as the minimum possible value of Expression 15.5 above.

$$d(x,y) =\min \frac{1} {\vert C(x,y)\vert }{\sum \nolimits }_{i\in C(x,y)}e{(x,y,i)}^{2}$$
(15.6)

It is not hard to prove that this expression is completely equivalent to the one given in Definition 15.5. In this expression we have considered that d is infinite when there are no items common to both users x and y (that is to say, C(x, y) = ∅).

Definition 15.5.

We define d : U × U → ℝ⁺ ∪ {∞} as follows:

$$d(x,y) = \left \{\begin{array}{c l} \frac{1} {\vert C(x,y)\vert } \cdot \left ({\sum \nolimits }_{i\in C(x,y)}v{(x,i)}^{2}-\frac{{({\sum \nolimits }_{i\in C(x,y)}v(x,i)\cdot v(y,i))}^{2}} {{\sum \nolimits }_{i\in C(x,y)}v{(y,i)}^{2}} \right ) & \text{ if }C(x,y)\neq \varnothing \\ \infty & \text{ otherwise}\\ \end{array} \right.$$
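In code, b(x, y) and the closed form of d(x, y) might be sketched as follows (the helper names `b_coef` and `d_unsuitability` are ours, and profiles are again item → rating dictionaries):

```python
# A sketch of b(x, y) (Definition 15.4) and the closed form of d(x, y)
# (Definition 15.5); `vx` and `vy` map item -> rating.

def b_coef(vx, vy):
    C = [i for i in vx if i in vy]
    if not C:
        return float("inf")
    return sum(vx[i] * vy[i] for i in C) / sum(vy[i] ** 2 for i in C)

def d_unsuitability(vx, vy):
    C = [i for i in vx if i in vy]
    if not C:
        return float("inf")  # no common items: d(x, y) is infinite
    sxy = sum(vx[i] * vy[i] for i in C)
    syy = sum(vy[i] ** 2 for i in C)
    sxx = sum(vx[i] ** 2 for i in C)
    return (sxx - sxy ** 2 / syy) / len(C)
```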

Besides, by Proposition 15.1, if we know the value d(x, y), we can bound the mean absolute error between the ratings of user x and the predictions derived from user y, that is to say:

$$\frac{1} {\vert C(x,y)\vert }{\sum \nolimits }_{i\in C(x,y)}\vert e(x,y,i)\vert = \frac{1} {\vert C(x,y)\vert }{\sum \nolimits }_{i\in C(x,y)}\vert v(x,i)-b(x,y)\cdot v(y,i)\vert $$

Proposition 15.1.

Let γ > 0 be a positive real number. We have that if d(x, y) ≤ γ², then:

$$\frac{1} {\vert C(x,y)\vert }{\sum \nolimits }_{i\in C(x,y)}\vert e(x,y,i)\vert \leq \gamma $$

Proof.

According to Expression 15.6, we have that:

$$d(x,y) = \frac{1} {\vert C(x,y)\vert }{\sum \nolimits }_{i\in C(x,y)}e{(x,y,i)}^{2}$$

The following expression

$$\frac{1} {\vert C(x,y)\vert }{\sum \nolimits }_{i\in C(x,y)}\vert e(x,y,i)\vert $$

reaches its maximum value, over the users x, y ∈ U fulfilling that:

$$d(x,y) = \frac{1} {\vert C(x,y)\vert }{\sum \nolimits }_{i\in C(x,y)}e{(x,y,i)}^{2} \leq {\gamma }^{2}$$

when

$$\forall i \in C(x,y)\ \vert e(x,y,i)\vert = \gamma $$

Consequently, we have that:

$$\frac{1} {\vert C(x,y)\vert }{\sum \nolimits }_{i\in C(x,y)}\vert e(x,y,i)\vert \leq \frac{1} {\vert C(x,y)\vert }{\sum \nolimits }_{i\in C(x,y)}\gamma = \gamma $$

Note that the bound also follows directly from the Cauchy–Schwarz inequality, since the mean of the absolute errors is at most the square root of the mean of the squared errors. The following proposition is proven immediately by taking the above one into account.

Proposition 15.2.

Let γ be such that 0 ≤ γ ≤ 1.

Let y ∈ N(x) be such that d(x, y) ≤ γ² ⋅ (max − min)².

We have that:

$$\frac{1} {\vert C(x,y)\vert }{\sum \nolimits }_{i\in C(x,y)}\vert e(x,y,i)\vert \leq \gamma \cdot (\mathit{max} -\mathit{min})$$

Once we have defined the function d, we can state that a user y is a neighbor of x if the value d(x, y) stays below a constant α.

Definition 15.6 (Neighborhood).

Given x ∈ U, we define N(x) as the following set of users:

$$N(x) =\{ y \in U\vert d(x,y) \leq \alpha \}$$

Although the parameter α may be defined arbitrarily and may depend on the specific recommender system in use, we suggest employing the following value (note that the possible values with which a user can rate an item are {min, …, max}):

$$\alpha = \frac{{(\mathit{max} -\mathit{min})}^{2}} {100}$$

When α takes this value, we can be sure, by Proposition 15.2, that for every neighbor y ∈ N(x), the mean absolute error between the users x and y is below 10% of the difference max − min, that is to say:

$$\frac{1} {\vert C(x,y)\vert }{\sum \nolimits }_{i\in C(x,y)}\vert e(x,y,i)\vert \leq 0.1 \cdot (\mathit{max} -\mathit{min})$$

Once we have selected the neighbors of a user x ∈ U, we can estimate the value with which the user x would rate an item i by taking the neighbors of x into consideration.

Given an item i ∈ I and a neighbor y ∈ N(x) who has rated the item i, we can estimate the value with which the user x would rate the item i as b(x, y) · v(y, i). In the same way as in Expression 15.1, we take into account all the neighbors who have rated the item i to make the estimation v*(x, i) (see Definition 15.7).

In this way, we average all the estimations arising from the neighbors of x who have rated the item i. In case no neighbor has rated the item i, we say that v*(x, i) = • (that is to say, we cannot estimate the value with which the user x would rate the item i).

Definition 15.7 (Estimation).

We define the function v* : U × I → ℝ ∪ {•} such that ∀x ∈ U and ∀i ∈ I:

$${ v}^{{_\ast}}(x,i) = \left \{\begin{array}{c l} \dfrac{1} {\vert B(x,i)\vert }{\sum \nolimits }_{y\in B(x,i)}b(x,y) \cdot v(y,i) & \text{ if }B(x,i)\neq \varnothing \\ \bullet & \text{ otherwise}\\ \end{array} \right.$$

where B(x, i) = {y ∈ N(x) | v(y, i) ≠ •}
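Putting Definitions 15.6 and 15.7 together, a sketch of the neighbor selection and of the estimation, reusing `b_coef` and `d_unsuitability` from the sketch above, might read:

```python
def neighborhood(x, users, profiles, alpha):
    """N(x) (Definition 15.6): every user whose unsuitability is at most alpha."""
    return [y for y in users
            if y != x and d_unsuitability(profiles[x], profiles[y]) <= alpha]

def estimate(x, i, profiles, N):
    """v*(x, i) (Definition 15.7): mean of b(x, y) * v(y, i) over B(x, i)."""
    B = [y for y in N if i in profiles[y]]
    if not B:
        return None  # corresponds to v*(x, i) = "•"
    return sum(b_coef(profiles[x], profiles[y]) * profiles[y][i]
               for y in B) / len(B)
```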

4.1 Example

Next, we will consider an example in order to illustrate what we have stated above.

Let us consider four users x, y, z, t ∈ U and the items i1, i2, i3, i4, i5 ∈ I. Let us consider that V = {1, 2, 3, 4, 5}. Consider the following ratings made by the users:

$$\begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} v(x,{i}_{1}) = 1\quad &v(x,{i}_{2}) = \bullet \quad &v(x,{i}_{3}) = 2\quad &v(x,{i}_{4}) = 3\quad &v(x,{i}_{5}) = 5 \\ v(y,{i}_{1}) = \bullet \quad &v(y,{i}_{2}) = 2\quad &v(y,{i}_{3}) = 3\quad &v(y,{i}_{4}) = 3\quad &v(y,{i}_{5}) = 5 \\ v(z,{i}_{1}) = 5\quad &v(z,{i}_{2}) = \bullet \quad &v(z,{i}_{3}) = \bullet \quad &v(z,{i}_{4}) = 3\quad &v(z,{i}_{5}) = 1 \\ v(t,{i}_{1}) = 2\quad &v(t,{i}_{2}) = 4\quad &v(t,{i}_{3}) = 4\quad &v(t,{i}_{4}) = 5\quad &v(t,{i}_{5}) = \bullet \\ \quad \end{array}$$

We calculate the value α:

$$\alpha = \frac{{(5 - 1)}^{2}} {100} = 0.16$$

In order to make a recommendation to user x, we need to calculate the neighbors of this user. To this end, we first calculate the following:

$$\begin{array}{rcl} C(x,y)& =& \left \{{i}_{3},{i}_{4},{i}_{5}\right \},\quad \vert C(x,y)\vert = 3 \\ b(x,y)& =& \frac{2\cdot 3 + 3\cdot 3 + 5\cdot 5} {{3}^{2} + {3}^{2} + {5}^{2}} = 0.93 \\ d(x,y)& =& \frac{1} {3}\left ({2}^{2} + {3}^{2} + {5}^{2} -\frac{{(2\cdot 3 + 3\cdot 3 + 5\cdot 5)}^{2}} {{3}^{2} + {3}^{2} + {5}^{2}} \right ) = 0.26 > \alpha \\ C(x,z)& =& \left \{{i}_{1},{i}_{4},{i}_{5}\right \},\quad \vert C(x,z)\vert = 3 \\ b(x,z)& =& \frac{1\cdot 5 + 3\cdot 3 + 5\cdot 1} {{5}^{2} + {3}^{2} + {1}^{2}} = 0.54 \\ d(x,z)& =& \frac{1} {3}\left ({1}^{2} + {3}^{2} + {5}^{2} -\frac{{(1\cdot 5 + 3\cdot 3 + 5\cdot 1)}^{2}} {{5}^{2} + {3}^{2} + {1}^{2}} \right ) = 8.23 > \alpha \\ C(x,t)& =& \left \{{i}_{1},{i}_{3},{i}_{4}\right \},\quad \vert C(x,t)\vert = 3 \\ b(x,t)& =& \frac{1\cdot 2 + 2\cdot 4 + 3\cdot 5} {{2}^{2} + {4}^{2} + {5}^{2}} = 0.55 \\ d(x,t)& =& \frac{1} {3}\left ({1}^{2} + {2}^{2} + {3}^{2} -\frac{{(1\cdot 2 + 2\cdot 4 + 3\cdot 5)}^{2}} {{2}^{2} + {4}^{2} + {5}^{2}} \right ) = 0.04 \leq \alpha \\ \end{array}$$

As a result, there is only one neighbor of x, namely, t. That is to say,

$$N(x) =\{ t\}$$

Now, we can estimate how user x would rate the item i2 in the following way:

$${v}^{{_\ast}}(x,{i}_{ 2}) = \frac{1} {1}(b(x,t) \cdot v(t,{i}_{2})) = 0.55 \cdot 4 = 2.2$$
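The example can be checked numerically with the sketches above; the profile encoding and the rounding in the comments are ours.

```python
# Rating profiles of the example; absent keys stand for "•".
profiles = {
    "x": {1: 1, 3: 2, 4: 3, 5: 5},
    "y": {2: 2, 3: 3, 4: 3, 5: 5},
    "z": {1: 5, 4: 3, 5: 1},
    "t": {1: 2, 2: 4, 3: 4, 4: 5},
}
alpha = (5 - 1) ** 2 / 100                  # 0.16

N = neighborhood("x", list(profiles), profiles, alpha)
print(N)                                    # ['t']
print(estimate("x", 2, profiles, N))        # ~2.22 (the text rounds b(x,t) to 0.55, giving 2.2)
```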

5 The Coverage Improvement

As will be seen in Sect. 15.6, when evaluating the technique described in the previous section, we find that (when α is low) the MAE level is extraordinarily good, but the coverage is very low. That is to say, the recommender system makes very good, but very few, predictions.

In this section, we deal with a way to obtain better coverage while preserving, to the greatest possible extent, the quality of the system’s predictions.

First of all, we will study why the recommender system makes so few predictions. The main reason lies in the fact that, since we only select as neighbors those users who are very suitable for predicting the ratings of a user x, the resulting set of neighbors is usually very small, and consequently it often happens that B(x, i) = ∅ for many x ∈ U and i ∈ I.

In order to correct this, we propose enlarging the set of neighbors N(x) of a user x by also taking into account some users y who, although they might not be considered very suitable for predicting the ratings of x, are indeed useful for making predictions on some items which cannot be predicted by the set of neighbors N(x) alone. This new, enlarged set of neighbors of a user x will be called N*(x).

For each user x ∈ U and each item i ∈ I, we will consider the neighbor w(x, i): the user most suitable for predicting the ratings of x among the users who have rated the item i (see Definition 15.8).

Definition 15.8.

We define a function w : U × I → U fulfilling that

$$w(x,i) = y \Leftrightarrow d(x,y) {=\min }_{\begin{array}{c}z\in U \\ v(z,i)\neq \bullet \end{array}}d(x,z)$$

For each user x ∈ U, we include as neighbors every user w(x, i), i ∈ I (see Definition 15.9). Consequently, this new set of neighbors includes not only the users most suitable for predicting the ratings of x, but also those which let us make predictions for items that the set N(x) alone cannot reach.

Definition 15.9 (New Neighborhood).

Given x ∈ U, we define N*(x) as the following set of users:

$${N}^{{_\ast}}(x) = N(x) \cup \{ w(x,i)\vert i \in I\}$$

where N(x) = {y ∈ U | d(x, y) ≤ α}

According to this definition, the coverage of the recommender system will be the highest possible, since we are considering, for each user x ∈ U and each item i ∈ I, the best neighbor of x who has rated the item i. Indeed, an item i ∈ I cannot be predicted for the user x if and only if this item has not been rated by any user.
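A sketch of w(x, i) and of the enlarged neighborhood N*(x), again reusing the helpers above, might look like this:

```python
def best_rater(x, i, users, profiles):
    """w(x, i) (Definition 15.8): the least unsuitable user who rated item i."""
    raters = [y for y in users if y != x and i in profiles[y]]
    if not raters:
        return None  # nobody rated i, so i cannot be predicted at all
    return min(raters, key=lambda y: d_unsuitability(profiles[x], profiles[y]))

def enlarged_neighborhood(x, users, items, profiles, alpha):
    """N*(x) (Definition 15.9): N(x) plus the best rater of every item."""
    Nstar = set(neighborhood(x, users, profiles, alpha))
    for i in items:
        w = best_rater(x, i, users, profiles)
        if w is not None:
            Nstar.add(w)
    return Nstar
```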

Once we have defined the neighbors of each user, we will focus on how to make estimations. Unlike Definition 15.6, the new definition of the neighbors of x ∈ U in Definition 15.9 involves the possibility of including users who are not very suitable for predicting the ratings of x. In order to maintain good levels of MAE, we need to weight (unlike the estimation proposed in Definition 15.7) the inadequacy levels of each neighbor of x, in such a way that those users more suitable for predicting the ratings of x have more importance. Although this is the underlying idea already implied in Expression 15.2 in Sect. 15.3, we cannot use that expression since, unlike similarity measures like ρ or cos, the smaller d(x, y) is, the more suitable user y is for predicting the ratings of x.

In order to weight the unsuitability degrees of each neighbor, we design a function fα (see Definition 15.10):

Definition 15.10.

Let α > 0

We define the function fα : ℝ⁺ → [0, 1] as follows:

$${f}_{\alpha }(x) = \frac{{\alpha }^{4}} {{x}^{4} + {\alpha }^{4}}$$

This function fulfills the following properties:

Proposition 15.3.

Let α > 0. The function fα has the following properties:

  (i) fα(0) = 1 and fα(α) = 1∕2

  (ii) If 0 ≤ x1 < x2, then 0 < fα(x2) < fα(x1) ≤ 1

  (iii) If x ≤ 0.5 ⋅ α, then fα(x) > 0.9

  (iv) If x ≥ 2 ⋅ α, then fα(x) < 0.1

The function fα is suitable for weighting the neighbors of x when making estimations. As may be seen in Definition 15.11, we weight, by means of fα, the inadequacy degree of the neighbors of x.

Definition 15.11 (New Estimation).

Let x ∈ U.

We define the function v* : U × I → ℝ ∪ {•} such that ∀x ∈ U and ∀i ∈ I, the following holds:

$${ v}^{*}(x,i) = \left \{\begin{array}{c l} \dfrac{{\sum \nolimits }_{y\in {N}^{*}(x),\ v(y,i)\neq \bullet }{f}_{\alpha }(d(x,y)) \cdot b(x,y) \cdot v(y,i)} {{\sum \nolimits }_{y\in {N}^{*}(x),\ v(y,i)\neq \bullet }{f}_{\alpha }(d(x,y))} & \text{ if }\exists y \in {N}^{*}(x),\ v(y,i)\neq \bullet \\ \bullet & \text{ otherwise}\\ \end{array} \right.$$

According to the properties of fα, we give much more importance to those neighbors y ∈ N*(x) with d(x, y) ≤ α than to those fulfilling d(x, y) ≥ α. Consequently, when there are neighbors y ∈ U who have rated an item i ∈ I (that is to say, v(y, i) ≠ •) and whose unsuitability degree is lower than α (that is to say, d(x, y) ≤ α), the estimation calculated in Definition 15.11 is very close to the one that ensues from Definition 15.7 in the previous section. Besides, as we have said above, we obtain the highest possible coverage.
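Finally, fα and the weighted estimation of Definition 15.11 might be sketched as follows (assuming that at least one user in N*(x) shares rated items with x, so that not all weights vanish):

```python
def f_alpha(d, alpha):
    """Definition 15.10: turns an unsuitability degree into a weight."""
    return alpha ** 4 / (d ** 4 + alpha ** 4)

def estimate_weighted(x, i, profiles, Nstar, alpha):
    """v*(x, i) (Definition 15.11): f_alpha-weighted mean over N*(x)."""
    B = [y for y in Nstar if i in profiles[y]]
    if not B:
        return None  # v*(x, i) = "•"
    num = den = 0.0
    for y in B:
        w = f_alpha(d_unsuitability(profiles[x], profiles[y]), alpha)
        num += w * b_coef(profiles[x], profiles[y]) * profiles[y][i]
        den += w
    return num / den
```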

6 Evaluation of Our Techniques

In this section, we analyze the coverage and MAE values of our techniques in comparison with the k-neighborhood algorithm based on the metrics MSD, ρ (‘correlation’) and cos. We have used the “MovieLens” database [23], which has been a reference for many years in research carried out in the area of Collaborative Filtering. The database contains 943 users, 1,682 items and 100,000 ratings, with a minimum of 20 items rated per user. The items represent motion pictures and the rating values range from 1 to 5.

For the techniques presented here, we calculate the constant α as suggested in Sect. 15.4:

$$\alpha = \frac{{(5 - 1)}^{2}} {100} = 0.16$$

For the techniques presented in Sects. 15.4 and 15.5, we have obtained the values of MAE and coverage shown in Table 15.1. As may be seen, although the MAE of the second technique is not as good as that of the first, the coverage level is increased significantly.

Table 15.1 Evaluation of the techniques presented in the paper

Figure 15.1 illustrates the MAE and coverage values of the k-nearest-neighbors algorithm for different values of k (15, 30, 60, 90, 120, 150, 180, 210 and 240), ranging from 1.6% to 25% of the total number of users.

Fig. 15.1 MAE and coverage for the k-nearest-neighbors algorithm based on the metrics MSD, ρ and cos

Although MSD provides the best results in terms of MAE, its coverage level remains very low (until the value of k is high). By contrast, the similarity measure ρ (“correlation”) maintains good levels of both coverage and MAE (this is an example which helps explain why correlation is often used instead of MSD).

As may be seen, the values that the k-nearest-neighbors algorithm provides, for any of the three similarity measures and for any value of the constant k, are consistently worse than those obtained by the technique presented in Sect. 15.5. Besides, our technique has the advantage that it does not need to tune any parameter, unlike the classic algorithm, whose results depend significantly on the parameter k (so it is always necessary to find the optimal value of this parameter beforehand).

7 Conclusions

In this paper, we have presented a new technique for making predictions in recommender systems based on collaborative filtering. The underlying idea is to select a variable number of neighbors for each user, based on the number of highly similar users (neighbors) in the database (instead of always selecting a constant number k of neighbors, as is usually done). Thanks to this new technique, we obtain a significant improvement in both the MAE and the coverage.