Definition

The goal of a recommender system is to generate meaningful recommendations to a collection of users for items or products that might interest them. Suggestions for books on Amazon, or movies on Netflix, are real-world examples of the operation of industry-strength recommender systems. The design of such recommendation engines depends on the domain and the particular characteristics of the data available. For example, movie watchers on Netflix frequently provide ratings on a scale of 1 (disliked) to 5 (liked). Such a data source records the quality of interactions between users and items. Additionally, the system may have access to user-specific and item-specific profile attributes such as demographics and product descriptions, respectively. Recommender systems differ in the way they analyze these data sources to develop notions of affinity between users and items, which can be used to identify well-matched pairs. Collaborative Filtering systems analyze historical interactions alone, while Content-based Filtering systems are based on profile attributes; and hybrid techniques attempt to combine both of these designs. The architecture of recommender systems and their evaluation on real-world problems is an active area of research.

Motivation and Background

Obtaining recommendations from trusted sources is a critical component of the natural process of human decision making. With burgeoning consumerism buoyed by the emergence of the web, buyers are being presented with an increasing range of choices while sellers are being faced with the challenge of personalizing their advertising efforts. In parallel, it has become common for enterprises to collect large volumes of transactional data that allow for deeper analysis of how a customer base interacts with the space of product offerings. Recommender systems have evolved to fulfill the natural dual need of buyers and sellers by automating the generation of recommendations based on data analysis.

The term “collaborative filtering” was introduced in the context of the first commercial recommender system, called Tapestry (Goldberg, Nichols, Oki, & Terry, 1992), which was designed to recommend documents drawn from newsgroups to a collection of users. The motivation was to leverage social collaboration in order to prevent users from getting inundated by a large volume of streaming documents. Collaborative filtering, which analyzes usage data across users to find well-matched user-item pairs, has since been juxtaposed against the older methodology of content filtering, which had its original roots in information retrieval. In content filtering, recommendations are not “collaborative” in the sense that suggestions made to a user do not explicitly utilize information across the entire user-base. Some early successes of collaborative filtering on related domains included the GroupLens system (Resnick, Neophytos, Bergstrom, Mitesh, & Riedl, 1994b).

As noted in Billsus and Pazzani (1998), initial formulations for recommender systems were based on straightforward correlation statistics and predictive modeling, not engaging the wider range of practices in statistics and machine learning literature. The collaborative filtering problem was mapped to classification, which allowed dimensionality reduction techniques to be brought into play to improve the quality of the solutions. Concurrently, several efforts attempted to combine content-based methods with collaborative filtering, and to incorporate additional domain knowledge in the architecture of recommender systems.

Further research was spurred by the public availability of datasets on the web, and the interest generated due to direct relevance to e-commerce. Netflix, an online streaming video and DVD rental service, released a large-scale dataset containing 100 million ratings given by about half-a-million users to thousands of movie titles, and announced an open competition for the best collaborative filtering algorithm in this domain. Matrix Factorization (Bell, Koren, & Volinsky, 2009) techniques rooted in numerical linear algebra and statistical matrix analysis emerged as a state-of-the-art technique.

Currently, recommender systems remain an active area of research, with a dedicated ACM conference, intersecting several subdisciplines of statistics, machine learning, data mining, and information retrieval. Applications have been pursued in diverse domains ranging from recommending webpages to music, books, movies, and other consumer products.

Structure of Learning System

The most general setting in which recommender systems are studied is presented in Fig. 1. Known user preferences are represented as a matrix of n users and m items, where each cell \({r}_{u,i}\) corresponds to the rating given to item i by the user u. This user ratings matrix is typically sparse, as most users do not rate most items. The recommendation task is to predict what rating a user would give to a previously unrated item. Typically, ratings are predicted for all items that have not been observed by a user, and the highest rated items are presented as recommendations. The user under current consideration for recommendations is referred to as the active user.

Recommender Systems. Figure 1. User ratings matrix, where each cell \({r}_{u,i}\) corresponds to the rating of user u for item i. The task is to predict the missing rating \({r}_{a,i}\) for the active user a.

The myriad approaches to recommender systems can be broadly categorized as:

  • Collaborative Filtering (CF): In CF systems, a user is recommended items based on the past ratings of all users collectively.

  • Content-based recommending: These approaches recommend items that are similar in content to items the user has liked in the past, or matched to pre-defined attributes of the user.

  • Hybrid approaches: These methods combine both collaborative and content-based approaches.

Collaborative Filtering

Collaborative filtering (CF) systems work by collecting user feedback in the form of ratings for items in a given domain and exploiting similarities in rating behavior amongst several users in determining how to recommend an item. CF methods can be further subdivided into neighborhood-based and model-based approaches. Neighborhood-based methods are also commonly referred to as memory-based approaches (Breese, Heckerman, & Kadie, 1998).

Neighborhood-based Collaborative Filtering 

In neighborhood-based techniques, a subset of users are chosen based on their similarity to the active user, and a weighted combination of their ratings is used to produce predictions for this user. Most of these approaches can be generalized by the algorithm summarized in the following steps:

  1. Assign a weight to all users with respect to similarity with the active user.

  2. Select the k users that have the highest similarity with the active user – commonly called the neighborhood.

  3. Compute a prediction from a weighted combination of the selected neighbors’ ratings.

In step 1, the weight \({w}_{a,u}\) is a measure of similarity between the user u and the active user a. The most commonly used measure of similarity is the Pearson correlation coefficient between the ratings of the two users (Resnick, Iacovou, Suchak, Bergstrom, & Riedl, 1994a), defined below:

$${w}_{a,u} = \frac{\sum_{i\in I}\left({r}_{a,i} -{\overline{r}}_{a}\right)\left({r}_{u,i} -{\overline{r}}_{u}\right)}{\sqrt{\sum_{i\in I}{\left({r}_{a,i} -{\overline{r}}_{a}\right)}^{2}\sum_{i\in I}{\left({r}_{u,i} -{\overline{r}}_{u}\right)}^{2}}}$$
(1)

where I is the set of items rated by both users, \({r}_{u,i}\) is the rating given to item i by user u, and \({\overline{r}}_{u}\) is the mean rating given by user u.

In step 3, predictions are generally computed as the weighted average of deviations from each neighbor’s mean, as in:

$${p}_{a,i} = {\overline{r}}_{a} + \frac{\sum_{u\in K}\left({r}_{u,i} -{\overline{r}}_{u}\right)\times {w}_{a,u}}{\sum_{u\in K}{w}_{a,u}}$$
(2)

where \({p}_{a,i}\) is the prediction for the active user a for item i, \({w}_{a,u}\) is the similarity between users a and u, and K is the neighborhood or set of most similar users.
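To make the two formulas concrete, the following Python sketch (a minimal illustration using NumPy; the toy matrix, the choice of k, and the use of NaN for unrated cells are assumptions, and user means are taken over whatever ratings are available) computes Pearson weights as in Eq. (1) and combines neighbor deviations as in Eq. (2):

```python
import numpy as np

def pearson_weight(R, a, u):
    """Pearson correlation between users a and u over their co-rated items (Eq. 1)."""
    both = ~np.isnan(R[a]) & ~np.isnan(R[u])
    if both.sum() < 2:
        return 0.0
    da = R[a, both] - R[a, both].mean()
    du = R[u, both] - R[u, both].mean()
    denom = np.sqrt((da ** 2).sum() * (du ** 2).sum())
    return float((da * du).sum() / denom) if denom > 0 else 0.0

def predict(R, a, i, k=20):
    """Predict r_{a,i} from the k most similar users who have rated item i (Eq. 2)."""
    candidates = [u for u in range(R.shape[0]) if u != a and not np.isnan(R[u, i])]
    w = {u: pearson_weight(R, a, u) for u in candidates}
    neigh = sorted(w, key=w.get, reverse=True)[:k]
    num = sum(w[u] * (R[u, i] - np.nanmean(R[u])) for u in neigh)
    den = sum(w[u] for u in neigh)          # Eq. (2) as written; |w| is also common
    return np.nanmean(R[a]) if den == 0 else np.nanmean(R[a]) + num / den

# Toy ratings matrix: rows are users, columns are items, NaN marks unrated cells.
R = np.array([[5, 3, np.nan, 1],
              [4, np.nan, 4, 1],
              [1, 1, np.nan, 5],
              [np.nan, 1, 5, 4]], dtype=float)
print(predict(R, a=0, i=2, k=1))            # single nearest neighbor for this tiny example
```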

Similarity based on Pearson correlation measures the extent to which there is a linear dependence between two variables. Alternatively, one can treat the ratings of two users as vectors in an m-dimensional space, and compute similarity based on the cosine of the angle between them, given by:

$${w}_{a,u} = \cos ({\mathbf{r}}_{a},{\mathbf{r}}_{u}) = \frac{{\mathbf{r}}_{a}\cdot {\mathbf{r}}_{u}}{\Vert {\mathbf{r}}_{a}\Vert_{2}\times \Vert {\mathbf{r}}_{u}\Vert_{2}} = \frac{\sum_{i=1}^{m}{r}_{a,i}{r}_{u,i}}{\sqrt{\sum_{i=1}^{m}{r}_{a,i}^{2}}\sqrt{\sum_{i=1}^{m}{r}_{u,i}^{2}}}$$
(3)

When computing cosine similarity, one cannot have negative ratings, and unrated items are treated as having a rating of zero. Empirical studies (Breese et al., 1998) have found that Pearson correlation generally performs better. There have been several other similarity measures used in the literature, including Spearman rank correlation, Kendall’s τ correlation, mean squared differences, entropy, and adjusted cosine similarity (Herlocker, Konstan, Borchers, & Riedl, 1999; Su & Khoshgoftaar, 2009).

Several extensions to neighborhood-based CF, which have led to improved performance are discussed below.

Item-based Collaborative Filtering: When applied to millions of users and items, conventional neighborhood-based CF algorithms do not scale well, because of the computational complexity of the search for similar users. As an alternative, Linden, Smith, and York (2003) proposed item-to-item collaborative filtering, which matches a user’s rated items to similar items rather than matching similar users. In practice, this approach leads to faster online systems, and often results in improved recommendations (Linden et al., 2003; Sarwar, Karypis, Konstan, & Riedl, 2001).

In this approach, similarities between pairs of items i and j are computed off-line using Pearson correlation, given by:

$${w}_{i,j} = \frac{\sum_{u\in U}\left({r}_{u,i} -{\overline{r}}_{i}\right)\left({r}_{u,j} -{\overline{r}}_{j}\right)}{\sqrt{\sum_{u\in U}{\left({r}_{u,i} -{\overline{r}}_{i}\right)}^{2}}\sqrt{\sum_{u\in U}{\left({r}_{u,j} -{\overline{r}}_{j}\right)}^{2}}}$$
(4)

where U is the set of all users who have rated both items i and j, \({r}_{u,i}\) is the rating of user u on item i, and \({\overline{r}}_{i}\) is the average rating of the ith item across users.

Now, the rating for item i for user a can be predicted using a simple weighted average, as in:

$${p}_{a,i} = \frac{\sum_{j\in K}{r}_{a,j}\,{w}_{i,j}}{\sum_{j\in K}\vert {w}_{i,j}\vert}$$
(5)

where K is the neighborhood set of the k items rated by a that are most similar to i.

For item-based collaborative filtering too, one may use alternative similarity metrics such as adjusted cosine similarity. A good empirical comparison of variations of item-based methods can be found in Sarwar et al. (2001).
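A compact sketch of Eqs. (4) and (5) in the same NumPy setting as before (NaN for unrated cells; taking each item’s mean over all users who rated it is one of several conventions):

```python
import numpy as np

def item_similarities(R):
    """Pearson correlation between item columns over users who rated both (Eq. 4)."""
    m = R.shape[1]
    means = np.array([np.nanmean(R[:, j]) for j in range(m)])
    W = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
            if both.sum() < 2:
                continue
            di = R[both, i] - means[i]
            dj = R[both, j] - means[j]
            denom = np.sqrt((di ** 2).sum()) * np.sqrt((dj ** 2).sum())
            if denom > 0:
                W[i, j] = W[j, i] = (di * dj).sum() / denom
    return W

def predict_item_based(R, W, a, i, k=10):
    """Weighted average of a's ratings on the k items most similar to i (Eq. 5)."""
    rated = [j for j in np.where(~np.isnan(R[a]))[0] if j != i]
    neigh = sorted(rated, key=lambda j: W[i, j], reverse=True)[:k]
    den = sum(abs(W[i, j]) for j in neigh)
    return np.nanmean(R[a]) if den == 0 else sum(R[a, j] * W[i, j] for j in neigh) / den
```

Precomputing the item-item matrix W off-line is what makes the online prediction step fast.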

Significance Weighting: It is common for the active user to have highly correlated neighbors that are based on very few co-rated (overlapping) items. These neighbors based on a small number of overlapping items tend to be bad predictors. One approach to tackle this problem is to multiply the similarity weight by a significance weighting factor, which devalues the correlations based on few co-rated items (Herlocker et al., 1999).

Default Voting: An alternative approach to dealing with correlations based on very few co-rated items is to assume a default value for the rating for items that have not been explicitly rated. In this way one can now compute correlation (Eq. 1) using the union of items rated by users being matched as opposed to the intersection. Such a default voting strategy has been shown to improve collaborative filtering by Breese et al. (1998).

Inverse User Frequency: When measuring the similarity between users, items that have been rated by all (and universally liked or disliked) are not as useful as less common items. To account for this, Breese et al. (1998) introduced the notion of inverse user frequency, computed as \({f}_{i} =\log \frac{n}{{n}_{i}}\), where \({n}_{i}\) is the number of users who have rated item i out of the total number of n users. To apply inverse user frequency while using similarity-based CF, the original rating for item i is transformed by multiplying it by the factor \({f}_{i}\). The underlying assumption of this approach is that items that are universally loved or hated are rated more frequently than others.

Case Amplification: In order to favor users with high similarity to the active user, Breese et al. (1998) introduced case amplification which transforms the original weights in Eq. (2) to

$${w'}_{a,u} = {w}_{a,u} \cdot \vert {w}_{a,u}{\vert }^{\rho -1}$$

where ρ is the amplification factor, and ρ ≥ 1.
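The three heuristics above amount to simple transformations of ratings and weights. A sketch follows; the cutoff of 50 co-rated items and ρ = 2.5 are illustrative defaults in the spirit of the cited papers, not prescribed values:

```python
import numpy as np

def significance_weight(w, n_overlap, cutoff=50):
    """Devalue correlations that are based on few co-rated items."""
    return w * min(n_overlap, cutoff) / cutoff

def inverse_user_frequency(R):
    """f_i = log(n / n_i): items rated by nearly everyone carry less information."""
    n = R.shape[0]
    n_i = np.maximum((~np.isnan(R)).sum(axis=0), 1)
    return np.log(n / n_i)

def case_amplify(w, rho=2.5):
    """w' = w * |w|^(rho - 1): pushes moderate correlations toward zero."""
    return w * (abs(w) ** (rho - 1))
```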

Other notable extensions to similarity-based collaborative filtering include weighted majority prediction (Nakamura & Abe, 1998) and imputation-boosted CF (Su, Khoshgoftaar, Zhu, & Greiner, 2008).

Model-based Collaborative Filtering 

Model-based techniques provide recommendations by estimating parameters of statistical models for user ratings. For example, Billsus and Pazzani (1998) describe an early approach to map CF to a classification problem, and build a classifier for each active user representing items as features over users and available ratings as labels, possibly in conjunction with dimensionality reduction techniques to overcome data sparsity issues. Other predictive modeling techniques have also been applied in closely related ways.

More recently, latent factor and matrix factorization models have emerged as a state-of-the-art methodology in this class of techniques (Bell et al., 2009). Unlike neighborhood-based methods that generate recommendations based on statistical notions of similarity between users, or between items, latent factor models assume that the similarity between users and items is simultaneously induced by some hidden lower-dimensional structure in the data. For example, the rating that a user gives to a movie might be assumed to depend on a few implicit factors such as the user’s taste across various movie genres. Matrix factorization techniques are a class of widely successful latent factor models where users and items are simultaneously represented as unknown feature vectors (column vectors) \({w}_{u},{h}_{i} \in {\mathbb{R}}^{k}\) along k latent dimensions. These feature vectors are learnt so that the inner products \({w}_{u}^{T}{h}_{i}\) approximate the known preference ratings \({r}_{u,i}\) with respect to some loss measure. The squared loss is a standard choice for the loss function, in which case the following objective function is minimized,

$$J(W,H) = \sum\limits_{(u,i)\in L}{\left({r}_{u,i} - {w}_{u}^{T}{h}_{i}\right)}^{2}$$
(6)

where \(W = {[{w}_{1}\ldots {w}_{n}]}^{T}\) is an n × k matrix, \(H = [{h}_{1}\ldots {h}_{m}]\) is a k × m matrix, and L is the set of user-item pairs for which the ratings are known. In the impractical limit where all user-item ratings are known, the above objective function becomes \(J(W,H) =\| R - WH\|_{fro}^{2}\), where R denotes the n × m fully known user-item matrix. The solution to this problem is given by taking the truncated SVD of R, \(R = UD{V}^{T}\), and setting \(W = {U}_{k}{D}_{k}^{\frac{1}{2}}\), \(H = {D}_{k}^{\frac{1}{2}}{V}_{k}^{T}\), where \({U}_{k},{D}_{k},{V}_{k}\) contain the k largest singular triplets of R. However, in the realistic setting where the majority of user-item ratings are unknown and an insufficient number of matrix entries are observed, such a globally optimal solution cannot in general be directly obtained, and one has to explicitly optimize the non-convex objective function J(W, H). Note that in this case the objective function is a particular form of weighted loss, that is, \(J(W,H) =\| S \odot (R - WH)\|_{fro}^{2}\), where ⊙ denotes the elementwise product and S is a binary matrix that equals one over the known user-item pairs L and zero otherwise. Therefore, weighted low-rank approximations are pertinent to this discussion (Srebro & Jaakkola, 2003). Standard optimization procedures include gradient-based techniques, or procedures like alternating least squares where H is solved for keeping W fixed and vice versa until a convergence criterion is satisfied. Note that fixing either W or H turns the problem of estimating the other into a weighted linear regression task. In order to avoid learning a model that overfits, it is common to minimize the objective function in the presence of regularization terms, \(J(W,H) + \gamma \|W{\|}^{2} + \lambda \|H{\|}^{2}\), where γ, λ are regularization parameters that can be determined by cross-validation. Once W, H are learnt, the product WH provides an approximate reconstruction of the rating matrix, from which recommendations can be directly read off.
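A minimal stochastic-gradient sketch of this regularized objective (the rank, learning rate, regularization strength, and epoch count are arbitrary assumptions; alternating least squares would be an equally valid solver):

```python
import numpy as np

def factorize(R, k=10, lr=0.01, reg=0.1, epochs=100, seed=0):
    """Minimize sum_{(u,i) in L} (r_ui - w_u^T h_i)^2 + reg(||W||^2 + ||H||^2) by SGD."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    W = 0.1 * rng.standard_normal((n, k))
    H = 0.1 * rng.standard_normal((k, m))
    users, items = np.where(~np.isnan(R))        # the observed set L
    for _ in range(epochs):
        for u, i in zip(users, items):
            err = R[u, i] - W[u] @ H[:, i]       # residual on one observed rating
            wu = W[u].copy()
            W[u] += lr * (err * H[:, i] - reg * W[u])
            H[:, i] += lr * (err * wu - reg * H[:, i])
    return W, H                                  # W @ H fills in the unknown cells
```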

Different choices of loss functions, regularizers, and additional model constraints have generated a large body of literature on matrix factorization techniques. Arguably, for discrete ratings, the squared loss is not the most natural loss function. The maximum margin matrix factorization (Rennie & Srebro, 2005) approach uses margin-based loss functions such as the hinge loss used in SVM classification, and its ordinal extensions for handling multiple ordered rating categories. For ratings that span over K values, this reduces to finding K − 1 thresholds that divide the real line into consecutive intervals specifying rating bins to which the output is mapped, with a penalty for insufficient margin of separation. Rennie and Srebro (2005) suggest a nonlinear conjugate gradient algorithm to minimize a smoothed version of this objective function.

Another class of techniques is the nonnegative matrix factorization popularized by the work of Lee and Seung (1999) where nonnegativity constraints are imposed on W, H. There are weighted extensions of NMF that can be applied to recommendation problems. The rating behavior of each user may be viewed as being a manifestation of different roles, for example, a composition of prototypical behavior in clusters of users bound by interests or community. Thus, the ratings of each user are an additive sum of basis vectors of ratings in the item space. By disallowing subtractive basis, nonnegativity constraints lend a part-based interpretation to the model. NMF can be solved with a variety of loss functions, but with the generalized KL-divergence loss defined as follows,

$$J(W,H) = \sum\limits_{(u,i)\in L}\left({r}_{u,i}\log \frac{{r}_{u,i}}{{w}_{u}^{T}{h}_{i}} - {r}_{u,i} + {w}_{u}^{T}{h}_{i}\right)$$

NMF is in fact essentially equivalent to probabilistic latent semantic analysis (pLSA) which has also previously been used for collaborative filtering tasks (Hofmann, 2004).
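For reference, the classical Lee and Seung multiplicative updates minimize this generalized KL divergence when the matrix is fully observed; restricting the sum to the known pairs L requires weighted extensions, which this sketch omits (the rank and iteration count are assumptions):

```python
import numpy as np

def nmf_kl(V, k=10, iters=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min_{W,H >= 0} D_KL(V || WH)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)   # update H with W fixed
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)   # update W with H fixed
    return W, H
```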

The recently concluded million-dollar Netflix competition has catapulted matrix factorization techniques to the forefront of recommender technologies in collaborative filtering settings (Bell et al., 2009). While the final winning solution was a complex ensemble of different models, several enhancements to basic matrix factorization models were found to lead to improvements. These included:

  1. The use of additional user-specific and item-specific parameters to account for systematic biases in the ratings, such as popular movies receiving higher ratings on average.

  2. Incorporating temporal dynamics of rating behavior by introducing time-dependent variables.

In many settings, only implicit preferences are available, as opposed to explicit like–dislike ratings. For example, large business organizations typically record transactional details of products purchased by their clients meticulously. This is a one-class setting, since explicit negative examples (business knowledge that a client has no interest in ever buying a product) are typically not available in corporate databases. Moreover, such knowledge is difficult to gather and maintain in the first place, given the rapidly changing business environment. Another example is recommending TV shows based on the watching habits of users, where preferences are implicit in what the users chose to see, without any source of explicit ratings. Recently, matrix factorization techniques have been advanced to handle such problems (Pan & Scholz, 2009) by formulating a confidence-weighted objective function, \(J(W,H) = \sum_{(u,i)}{c}_{u,i}{\left({r}_{u,i} - {w}_{u}^{T}{h}_{i}\right)}^{2}\), under the assumption that unobserved user-item pairs may be taken as negative examples with a certain degree of confidence specified via \({c}_{u,i}\).
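A sketch of such a confidence-weighted factorization using alternating least squares, where P holds 1 for observed user-item pairs and 0 elsewhere, and the confidence matrix C is assumed to be supplied by the modeler (for example, larger values on observed pairs):

```python
import numpy as np

def one_class_als(P, C, k=10, reg=0.1, iters=15, seed=0):
    """Alternating least squares for J = sum_{u,i} c_ui (p_ui - w_u^T h_i)^2 + reg(...)."""
    rng = np.random.default_rng(seed)
    n, m = P.shape
    W = 0.1 * rng.standard_normal((n, k))
    H = 0.1 * rng.standard_normal((k, m))
    I = reg * np.eye(k)
    for _ in range(iters):
        for u in range(n):                        # solve each w_u with H fixed
            Cu = np.diag(C[u])
            W[u] = np.linalg.solve(H @ Cu @ H.T + I, H @ Cu @ P[u])
        for i in range(m):                        # solve each h_i with W fixed
            Ci = np.diag(C[:, i])
            H[:, i] = np.linalg.solve(W.T @ Ci @ W + I, W.T @ Ci @ P[:, i])
    return W, H
```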

The problem of recovering missing values in a matrix from a small fraction of observed entries is also known as the Matrix Completion problem. Recent work by Candès and Tao (2009) and Recht (2009) has shown that, under certain assumptions on the singular vectors of the matrix, the matrix completion problem can be solved exactly by a convex optimization problem, provided a sufficient number of entries are observed. This problem involves finding, among all matrices consistent with the observed entries, the one with the minimum nuclear norm (the sum of its singular values).
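A rough illustration of the nuclear-norm idea is iterative soft-thresholding of singular values, in the spirit of singular value thresholding or SoftImpute rather than the exact convex programs analyzed in the work cited above; the threshold value is an arbitrary assumption:

```python
import numpy as np

def soft_impute(R, tau=5.0, iters=100):
    """Fill missing entries (NaN) by repeatedly shrinking singular values."""
    mask = ~np.isnan(R)
    Z = np.zeros_like(R)
    for _ in range(iters):
        X = np.where(mask, R, Z)                  # observed entries stay fixed
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        Z = (U * np.maximum(s - tau, 0.0)) @ Vt   # low nuclear-norm reconstruction
    return Z
```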

Content-based Recommending

Pure collaborative filtering recommenders only utilize the user ratings matrix, either directly, or to induce a collaborative model. These approaches treat all users and items as atomic units, where predictions are made without regard to the specifics of individual users or items. However, one can make a better personalized recommendation by knowing more about a user, such as demographic information (Pazzani, 1999), or about an item, such as the director and genre of a movie (Melville, Mooney, & Nagarajan, 2002). For instance, given movie genre information, and knowing that a user liked “Star Wars” and “Blade Runner,” one may infer a predilection for science fiction and could hence recommend “Twelve Monkeys.” Content-based recommenders refer to such approaches that provide recommendations by comparing representations of content describing an item to representations of content that interests the user. These approaches are sometimes also referred to as content-based filtering.

Much research in this area has focused on recommending items with associated textual content, such as web pages, books, and movies, where the web pages themselves or associated content like descriptions and user reviews are available. As such, several approaches have treated this problem as an information retrieval (IR) task, where the content associated with the user’s preferences is treated as a query, and the unrated documents are scored with relevance/similarity to this query (Balabanovic & Shoham, 1997). In NewsWeeder (Lang, 1995), documents in each rating category are converted into tf-idf word vectors, and then averaged to get a prototype vector of each category for a user. To classify a new document, it is compared with each prototype vector and given a predicted rating based on the cosine similarity to each category.
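A toy version of this prototype-vector idea, using scikit-learn’s TfidfVectorizer; the item descriptions and the set of liked items are made up for illustration, and the single averaged profile is a simplification of NewsWeeder’s per-category prototypes:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["space opera with rebels and an empire",          # hypothetical item texts
        "dystopian android noir in a rainy city",
        "romantic comedy set in a small bakery",
        "time-travel thriller about a viral outbreak"]
liked = [0, 1]                                             # items the user rated highly

tfidf = TfidfVectorizer().fit_transform(docs)              # items as tf-idf word vectors
profile = np.asarray(tfidf[liked].mean(axis=0))            # prototype of the user's taste
scores = cosine_similarity(profile, tfidf).ravel()

unseen = [i for i in range(len(docs)) if i not in liked]
print(sorted(unseen, key=lambda i: scores[i], reverse=True))   # ranked recommendations
```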

An alternative to IR approaches is to treat recommending as a classification task, where each example represents the content of an item, and a user’s past ratings are used as labels for these examples. In the domain of book recommending, Mooney and Roy (2000) use text from fields such as the title, author, synopses, reviews, and subject terms to train a multinomial naive Bayes classifier. Ratings on a scale of 1 to k can be directly mapped to k classes (Melville et al., 2002), or alternatively, the numeric rating can be used to weight the training example in a probabilistic binary classification setting (Mooney & Roy, 2000). Other classification algorithms have also been used for purely content-based recommending, including k-nearest neighbor, decision trees, and neural networks (Pazzani & Billsus, 1997).

Hybrid Approaches

In order to leverage the strengths of content-based and collaborative recommenders, there have been several hybrid approaches proposed that combine the two. One simple approach is to allow both content-based and collaborative filtering methods to produce separate ranked lists of recommendations, and then merge their results to produce a final list (Cotter & Smyth, 2000). Claypool, Gokhale, and Miranda (1999) combine the two predictions using an adaptive weighted average, where the weight of the collaborative component increases as the number of users accessing an item increases.

Melville et al. (2002) proposed a general framework for content-boosted collaborative filtering, where content-based predictions are applied to convert a sparse user ratings matrix into a full ratings matrix, and then a CF method is used to provide recommendations. In particular, they use a Naïve Bayes  classifier trained on documents describing the rated items of each user, and replace the unrated items by predictions from this classifier. They use the resulting pseudo ratings matrix to find neighbors similar to the active user, and produce predictions using Pearson correlation, appropriately weighted to account for the overlap of actually rated items, and for the active user’s content predictions. This approach has been shown to perform better than pure collaborative filtering, pure content-based systems, and a linear combination of the two. Within this content-boosted CF framework, Su, Greiner, Khoshgoftaar, and Zhu (2007) demonstrated improved results using a stronger content-predictor, TAN-ELR, and unweighted Pearson collaborative filtering.
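A stripped-down sketch of the content-boosted idea, omitting the original framework’s self-weighting and overlap corrections: missing ratings are first filled with content-based predictions (here an arbitrary user-by-item array of classifier outputs, whose source is assumed), and plain Pearson CF is then run on the dense pseudo ratings matrix:

```python
import numpy as np

def content_boosted_predict(R, content_pred, a, i):
    """Pearson CF on a pseudo ratings matrix whose gaps are filled by content predictions."""
    V = np.where(np.isnan(R), content_pred, R)    # dense pseudo user-ratings matrix
    means = V.mean(axis=1)
    D = V - means[:, None]
    norms = np.linalg.norm(D, axis=1)
    w = (D @ D[a]) / (norms * norms[a] + 1e-9)    # correlation of every user with a
    w[a] = 0.0                                    # exclude the active user
    return means[a] + (w @ D[:, i]) / (np.abs(w).sum() + 1e-9)
```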

Several other hybrid approaches are based on traditional collaborative filtering, but also maintain a content-based profile for each user. These content-based profiles, rather than co-rated items, are used to find similar users. In Pazzani’s approach (Pazzani, 1999), each user-profile is represented by a vector of weighted words derived from positive training examples using the Winnow algorithm. Predictions are made by applying CF directly to the matrix of user-profiles (as opposed to the user-ratings matrix). An alternative approach, Fab (Balabanovic & Shoham, 1997), uses relevance feedback to simultaneously mold a personal filter along with a communal “topic” filter. Documents are initially ranked by the topic filter and then sent to a user’s personal filter. The user’s relevance feedback is used to modify both the personal filter and the originating topic filter. Good et al. (1999) use collaborative filtering along with a number of personalized information filtering agents. Predictions for a user are made by applying CF on the set of other users and the active user’s personalized agents.

Several hybrid approaches treat recommending as a classification task, and incorporate collaborative elements in this task. Basu, Hirsh, and Cohen (1998) use Ripper, a rule induction system, to learn a function that takes a user and movie and predicts whether the movie will be liked or disliked. They combine collaborative and content information by creating features such as comedies liked by user and users who liked movies of genre X. In other work, Soboroff (1999) multiplies a term-document matrix representing all item content with the user-ratings matrix to produce a content-profile matrix. Using latent semantic indexing, a rank-k approximation of the content-profile matrix is computed. Term vectors of the user’s relevant documents are averaged to produce a user’s profile. Then, new documents are ranked against each user’s profile in the LSI space.

Some hybrid approaches attempt to directly combine content and collaborative data under a single probabilistic framework. Popescul, Ungar, Pennock, and Lawrence (2001) extended Hofmann’s aspect model (Hofmann, 1999) to incorporate three-way co-occurrence data among users, items, and item content. Their generative model assumes that users select latent topics, and that documents and their content words are generated from these topics. Schein, Popescul, Ungar, and Pennock (2002) extend this approach, and focus on making recommendations for items that have not been rated by any user.

Evaluation Metrics

The quality of a recommender system can be evaluated by comparing recommendations to a test set of known user ratings. These systems are typically measured using predictive accuracy metrics (Herlocker, Konstan, Terveen, & Riedl, 2004), where the predicted ratings are directly compared to actual user ratings. The most commonly used metric in the literature is Mean Absolute Error (MAE) – defined as the average absolute difference between predicted ratings and actual ratings, given by:

$$MAE = \frac{\sum_{\{u,i\}}\left\vert {p}_{u,i} - {r}_{u,i}\right\vert}{N}$$
(7)

where \({p}_{u,i}\) is the predicted rating for user u on item i, \({r}_{u,i}\) is the actual rating, and N is the total number of ratings in the test set.

A related commonly used metric, Root Mean Squared Error (RMSE), puts more emphasis on larger absolute errors, and is given by:

$$RMSE = \sqrt{\frac{{\sum \nolimits }_{\{u,i\}}{({p}_{u,i} - {r}_{u,i})}^{2}} {N}}$$
(8)
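Both metrics are straightforward to compute over a held-out test set; a small sketch with made-up predictions and ratings:

```python
import numpy as np

def mae(pred, actual):
    """Mean absolute error over the test ratings (Eq. 7)."""
    pred, actual = np.asarray(pred, float), np.asarray(actual, float)
    return float(np.mean(np.abs(pred - actual)))

def rmse(pred, actual):
    """Root mean squared error, which penalizes large errors more heavily (Eq. 8)."""
    pred, actual = np.asarray(pred, float), np.asarray(actual, float)
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

print(mae([3.5, 4.0, 2.0], [4, 4, 1]), rmse([3.5, 4.0, 2.0], [4, 4, 1]))
```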

Predictive accuracy metrics treat all items equally. However, for most recommender systems the primary concern is accurately predicting the items a user will like. As such, researchers often view recommending as predicting good items, that is, items with high ratings, versus bad or poorly rated items. In the context of information retrieval (IR), identifying the good items from the background of bad items can be viewed as discriminating between “relevant” and “irrelevant” items; as such, standard IR measures, like Precision, Recall, and Area Under the ROC Curve (AUC), can be utilized. These, and several other measures, such as the F1-measure, Pearson’s product-moment correlation, Kendall’s τ, mean average precision, half-life utility, and normalized distance-based performance measure, are discussed in more detail by Herlocker et al. (2004).

Challenges and Limitations

This section presents some of the common hurdles in deploying recommender systems, as well as some research directions that address them.

Sparsity: Stated simply, most users do not rate most items and, hence, the user ratings matrix is typically very sparse. This is a problem for collaborative filtering systems, since it decreases the probability of finding a set of users with similar ratings. This problem often occurs when a system has a very high item-to-user ratio, or the system is in the initial stages of use. This issue can be mitigated by using additional domain information (Melville et al., 2002; Su et al., 2007) or making assumptions about the data generation process that allows for high-quality imputation (Su et al., 2008).

The Cold-Start Problem: New items and new users pose a significant challenge to recommender systems. Collectively these problems are referred to as the cold-start problem (Schein et al., 2002). The first of these problems arises in collaborative filtering systems, where an item cannot be recommended unless some user has rated it before. This issue applies not only to new items, but also to obscure items, which is particularly detrimental to users with eclectic tastes. As such the new-item problem is also often referred to as the first-rater problem. Since content-based approaches (Mooney & Roy, 2000; Pazzani & Billsus, 1997) do not rely on ratings from other users, they can be used to produce recommendations for all items, provided attributes of the items are available. In fact, the content-based predictions of similar users can also be used to further improve predictions for the active user (Melville et al., 2002).

The new-user problem is difficult to tackle, since without previous preferences of a user it is not possible to find similar users or to build a content-based profile. As such, research in this area has primarily focused on effectively selecting items to be rated by a user so as to rapidly improve recommendation performance with the least user feedback. In this setting, classical techniques from active learning can be leveraged to address the task of item selection (Harpale & Yang, 2008; Jin & Si, 2004).

Fraud: As recommender systems are being increasingly adopted by commercial websites, they have started to play a significant role in affecting the profitability of sellers. This has led to many unscrupulous vendors engaging in different forms of fraud to game recommender systems for their benefit. Typically, they attempt to inflate the perceived desirability of their own products (push attacks) or lower the ratings of their competitors (nuke attacks). These types of attacks have been broadly studied as shilling attacks (Lam & Riedl, 2004) or profile injection attacks (Burke, Mobasher, Bhaumik, & Williams, 2005). Such attacks usually involve setting up dummy profiles, and assume different amounts of knowledge about the system. For instance, the average attack (Lam & Riedl, 2004) assumes knowledge of the average rating for each item; the attacker assigns values randomly distributed around this average, along with a high rating for the item being pushed. Studies have shown that such attacks can be quite detrimental to predicted ratings, though item-based collaborative filtering tends to be more robust to them (Lam & Riedl, 2004). Obviously, content-based methods, which rely only on a user’s own past ratings, are unaffected by profile injection attacks.

While pure content-based methods avoid some of the pitfalls discussed above, collaborative filtering still has some key advantages over them. Firstly, CF can perform well in domains where there is not much content associated with items, or where the content is difficult for a computer to analyze, such as ideas, opinions, etc. Secondly, a CF system has the ability to provide serendipitous recommendations, that is, it can recommend items that are relevant to the user but do not contain content from the user’s profile.