
1 Introduction

Recommendation systems are information filtering systems that seek to predict the preference a user might have for one item over others. They are widely used in applications such as movies, books, research articles, search queries, social tags, products, financial services, restaurants, Twitter pages, jobs, universities and friends. The primary goal of a recommendation system is to increase product sales, and thus overall profit, by bringing relevant items to the user. This is supported by functional goals such as relevance, serendipity and diversity [1]. Popular recommender systems today include the GroupLens recommender system, the Amazon.com recommender system, the Netflix movie recommender system, the Google News personalisation system, Facebook friend recommendations and link prediction recommender systems [1].

The first recommendation system, Tapestry, was developed in 1992 by Goldberg, Nichols, Oki and Terry. It allowed users to rate an item as good or bad and used keyword filtering for recommendation [2,3,4]. A recommendation system thus works on available information in any form and applies filtering techniques to find the most appropriate choice (such as a movie, show, web page, scientific article or news item that a user might be interested in). It makes use of data mining techniques [4, 5] and prediction algorithms to estimate a user’s interest in information and items. Since then, several recommendation systems have been developed that use different filtering techniques to attract customers and make them feel better attended to (Fig. 1).

Fig. 1 Recommendations and recommender system

Many companies care about recommendation systems because they deliver actual value to customers. Recommender systems provide a scalable way of personalising content for users in scenarios with many items. The problem engages many scientists because it is a major data science problem at the intersection of software engineering, machine learning and statistics. Recommender systems are an effective tool for personalisation: because they are based on actual user behaviour, users can make decisions directly from the results. These systems work on unstructured and dynamically changing data, which makes their predictions more specific and up to date.

Although recommender systems are application-specific and require a specific filtering process, a few properties must be addressed by all of them [6], such as user preference, prediction accuracy, confidence score and the user’s trust in the recommendation system.

The rest of this paper is organised as follows. Section 1 introduces recommender systems and their applicability and importance in the present era. Section 2 presents the goals and critical challenges of recommendation systems. Section 3 presents the classification of recommendation systems based on the approach used to build the recommendation engine. It briefly introduces content-based recommender systems, discusses collaborative techniques in detail and presents the two collaborative approaches, memory-based and model-based. Section 4 presents the experimental set-up and the implementation of the methods, Section 5 reports the results, and Section 6 gives the conclusion and possible future scope.

2 Goals and Critical Challenges

2.1 Goals

Recommender systems are used in different fields, from e-commerce to government applications. The most widely used application comes from e-commerce, where companies compete to increase their sales and improve user experience. By recommending items that users are interested in and prefer, a recommender system helps merchants to increase their profit. Apart from this, the general operational and technical goals of recommendation systems are as follows:

Relevance: The most common operational goal of a recommender system is to recommend relevant items to users. Users are more likely to purchase or opt for items that match their preferences.

Novelty: Recommendation systems are expected to provide novel or new items each time. The system should not repeatedly show popular items, as this may lead to a reduction in user interest [7].

Serendipity: Serendipity denotes a somewhat unexpected recommendation. It differs from novelty in that the recommendation is truly surprising to the user, rather than simply something they did not know about before [8].

Diversity: A recommendation system generally recommends a list of similar items, which increases the chance that the user might not like any of them. Diversity is therefore one of the important goals of a recommender system: it ensures that a range of items is recommended.

2.2 Challenges

Following are the critical challenges of recommendation systems:

Scalability: Most collaborative filtering techniques perform poorly as the user and item base grows.

Grey Sheep: Grey sheep are users whose opinions do not match those of any group of people. These users hinder the smooth functioning of a recommendation system [9].

Synonymy: Most recommender systems have difficulty making accurate predictions for items that have the same features but different names [10, 11].

Cold Start: Accurate predictions are hard to make for new users and items, because not much information is available to start with [12].

Privacy Breach: Privacy has always been the biggest challenge of a recommender system. To provide accurate predictions, the system demands personalised information about the user.

Shilling Attack: Recommendation is a public activity, so people may bias their feedback, giving large numbers of positive reviews to their own products or items and sometimes negative reviews to those of their competitors [13].

3 Classification

Recommender systems are classified into several categories depending on the type of input used to make recommendations. The most commonly used techniques are content-based filtering and collaborative filtering. This paper reviews the recommendation systems in use today along with their pitfalls, and presents a comparative analysis of two major recommendation models, the nearest-neighbour (NN) model and the latent factor model (Fig. 2).

Fig. 2 Classification of recommendation system

3.1 Content-Based Filtering System

Content-based filtering is the most common type of filtering system. These systems rely on the ratings that a user gives while creating a profile, which provide initial information about the user and avoid starting with no knowledge of a new user [14]. Two types of information are mainly used to create the user profile: the user’s preferences and the user’s interactions with the recommendation system. The system recommends items by comparing the content of each item with the user’s profile. The engine compares items that the user rated positively with items he/she has not yet rated, and the items with the greatest similarity are then recommended. Different distance measures are used to compute the similarity between the user’s choices and the items in the database (Fig. 3).

Fig. 3 Content-based filtering workflow
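
The comparison between a user profile and item content can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this paper. It assumes that each item is described by a binary genre vector, that the user profile is the mean feature vector of the positively rated items, and that cosine similarity is the distance measure. The feature matrix and the function names are hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity of two 1-D vectors; 0.0 if either is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def recommend_content_based(item_features, liked, top_n=2):
    """Rank unseen items by similarity to the mean feature vector of liked items."""
    profile = item_features[liked].mean(axis=0)  # assumed form of the user profile
    scores = {i: cosine(profile, f)
              for i, f in enumerate(item_features) if i not in liked}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical item-feature matrix: rows are items, columns are genres
# (action, comedy, romance). Values are illustrative only.
ITEM_FEATURES = np.array([
    [1, 0, 0],   # item 0: action
    [1, 1, 0],   # item 1: action comedy
    [0, 0, 1],   # item 2: romance
    [1, 0, 1],   # item 3: action romance
    [0, 1, 0],   # item 4: comedy
], dtype=float)

# The user rated items 0 and 1 positively.
recommended = recommend_content_based(ITEM_FEATURES, liked=[0, 1])
print(recommended)
```

With this toy data the action romance item ranks first and the pure romance item last, since the profile built from the two liked items has no romance component.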

3.2 Collaborative Filtering

A collaborative filtering algorithm works by collecting and analysing a large amount of information on users’ behaviour, preferences and activities. Collaborative filtering is capable of recommending complex items more accurately because it does not rely on machine-analysed content. Such recommendation systems work on the assumption that users who agreed in the past will agree in the future and are likely to prefer similar kinds of items.

Collaborative filtering techniques use distance measures, such as cosine distance, Pearson distance and Euclidean distance, to calculate the similarity between users’ choices and among items in the database. We have implemented the cosine and Pearson similarities. Cosine similarity uses a coordinate space in which items are represented as vectors. It measures the angle between two vectors and returns its cosine [5]. Pearson similarity is a measure of the linear correlation between two variables. In the equations below, r_{u,i} denotes the rating given by user u to item i, and \hat{r}_{u} denotes the mean rating of user u.

$$ S_{\text{pearson}} = \frac{\sum_{i = 1}^{n} \left( r_{u,i} - \hat{r}_{u} \right) \left( r_{v,i} - \hat{r}_{v} \right)}{\sqrt{\sum_{i = 1}^{n} \left( r_{u,i} - \hat{r}_{u} \right)^{2}} \, \sqrt{\sum_{i = 1}^{n} \left( r_{v,i} - \hat{r}_{v} \right)^{2}}} $$
(1)
$$ S_{\text{cosine}} = \frac{\sum_{i = 1}^{n} r_{u,i} \, r_{v,i}}{\sqrt{\sum_{i = 1}^{n} \left( r_{u,i} \right)^{2}} \, \sqrt{\sum_{i = 1}^{n} \left( r_{v,i} \right)^{2}}} $$
(2)
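
The two similarity measures can be sketched in Python as follows. This is a minimal illustration under the assumption that both rating vectors cover the same n co-rated items. The function names and the example ratings are hypothetical.

```python
import numpy as np

def cosine_similarity(r_u, r_v):
    """Eq. (2): dot product divided by the product of the vector norms."""
    denom = np.sqrt(np.sum(r_u ** 2)) * np.sqrt(np.sum(r_v ** 2))
    return float(np.sum(r_u * r_v) / denom) if denom else 0.0

def pearson_similarity(r_u, r_v):
    """Eq. (1): cosine similarity of the mean-centred rating vectors."""
    return cosine_similarity(r_u - r_u.mean(), r_v - r_v.mean())

# Hypothetical ratings by two users on the same four items.
u = np.array([5.0, 3.0, 4.0, 4.0])
v = np.array([3.0, 1.0, 2.0, 2.0])   # same pattern as u, shifted down by 2

print(cosine_similarity(u, v))
print(pearson_similarity(u, v))
```

The example shows why the two measures can differ: user v rates every item two points lower than user u, so the mean-centred vectors coincide and the Pearson similarity is 1, while the cosine similarity of the raw ratings is slightly below 1.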

The idea is to create a community of users who share a common interest [15]. Such users form a neighbourhood, and a user receives recommendations for items that he/she has not rated but that users in his/her neighbourhood have rated positively. Collaborative filtering is of the following types.

Memory-based approach: Also called neighbourhood-based collaborative filtering, in which the ratings of user–item combinations are predicted on the basis of their neighbourhoods. It includes user-based and item-based collaborative filtering [16,17,18].

User-based Collaborative Filtering: Rating predictions are calculated from the ratings of users who are like-minded with the target user. To predict a rating for user A, the idea is to find the top k users most similar to A and compute a weighted average of the ratings of this peer group.

Item-based Collaborative Filtering: Item similarity is used to determine the rating prediction for the target user. The idea is to find a set of items similar to the item for which a prediction is sought and then use the target user’s ratings of these items to compute the final prediction.

Model-based approach: Machine learning and data mining methods are used to build predictive models, for example decision trees, rule-based models, Bayesian models and latent factor models. In this paper, we have implemented a latent factor model using singular value decomposition (SVD) (Fig. 4).

Fig. 4 Framework of collaborative filtering

4 Experimental Set-up and Results

This paper implements two basic kinds of recommender system (memory-based and model-based) and compares their performance on several evaluation parameters.

4.1 Data set

The following data sets were used to implement the recommendation algorithms.

MovieLens: This data set describes 5-star rating and free-text tagging activity from MovieLens, a movie recommendation service. It contains 100,004 ratings and 1296 tag applications across 9125 movies. The data were created by 671 users between 09 January 1995 and 16 October 2016, and the data set was generated on 17 October 2016 [19].

Jester: Over 4.1 million continuous ratings (−10.00 to +10.00) of 100 jokes from 73,421 users were collected between April 1999 and May 2003 [20].

4.2 Working Process

4.2.1 Memory-Based Collaborative Filtering: User-Based Collaborative Filtering

User-based collaborative filtering is based on the assumption that users with similar preferences will rate items similarly. The task is to find that similarity and use it to predict the missing ratings of the target user. Once the missing ratings are known, items can be recommended to the user according to his/her taste [21].

Item-based collaborative filtering: This approach looks at the set of items that the target user has rated, computes how similar they are to the target item i and selects the k most similar items together with their corresponding similarities. The prediction is then computed as a weighted average of the target user’s ratings on these similar items [22].

In the user-based case, the rating of user u for item j is predicted from the ratings of the n neighbouring users i as follows, where S(u, i) is the similarity between users u and i and K is a normalising factor:

$$ P_{u,j} = \hat{r}_{u} + K\mathop \sum \limits_{i = 1}^{n} S\left( {u, i} \right) \times \left( {r_{i, j} - \hat{r}_{i} } \right) $$
(3)
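
Eq. (3) can be sketched in Python as follows. This is a minimal illustration, not the implementation evaluated in this paper. It assumes that unrated entries are stored as NaN, that Pearson correlation over co-rated items is the similarity S(u, i), and that the normaliser K is one over the sum of the absolute similarities, which is a common choice that the text above does not specify. The rating matrix and the function names are hypothetical.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation over the items both users rated (NaN = unrated)."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if both.sum() < 2:
        return 0.0
    x, y = a[both] - a[both].mean(), b[both] - b[both].mean()
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    return float((x * y).sum() / denom) if denom else 0.0

def predict_user_based(R, u, j, k=2):
    """Eq. (3): mean rating of u plus the normalised, similarity-weighted
    deviations of the k most similar users who rated item j."""
    mean_u = np.nanmean(R[u])
    # Candidate neighbours: every other user who has rated item j.
    sims = [(pearson(R[u], R[i]), i) for i in range(len(R))
            if i != u and not np.isnan(R[i, j])]
    top = sorted(sims, reverse=True)[:k]
    norm = sum(abs(s) for s, _ in top)
    if norm == 0:
        return float(mean_u)  # no usable neighbours: fall back to the user mean
    # Assumption: K = 1 / sum(|S(u, i)|) over the selected neighbours.
    dev = sum(s * (R[i, j] - np.nanmean(R[i])) for s, i in top)
    return float(mean_u + dev / norm)

# Hypothetical 3-user x 4-item rating matrix.
R = np.array([
    [5.0, 3.0, 4.0, np.nan],   # user 0: rating of item 3 is unknown
    [4.0, 2.0, 3.0, 4.0],      # user 1: same taste as user 0, rates lower
    [1.0, 5.0, 2.0, 1.0],      # user 2: opposite taste
])

prediction = predict_user_based(R, u=0, j=3, k=1)
print(prediction)
```

With k = 1 the only neighbour is user 1, who rated item 3 at 0.75 above his/her own mean of 3.25, so the prediction for user 0 is the mean of 4 plus 0.75.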

Model-based filtering (SVD): Singular value decomposition is a well-established technique for identifying latent semantic factors in information retrieval. Collaborative filtering uses SVD to factorise the user–item rating matrix.

Let the user–item rating matrix be R of size n × m, where n users rate m items and R_{ij} denotes the rating of item j given by user i. The SVD of R is its factorisation into three matrices such that:

$$ R = P\varSigma Q^{\text{T}} $$
(4)

where Σ is the diagonal matrix whose entries σ_i are the singular values of the decomposition, and P and Q have orthonormal columns, which means P^T P = I_k and Q^T Q = I_k, with I_k the k × k identity matrix. Matrix P is n × k, Σ is k × k and Q is m × k, where R is n × m and has rank k.

The SVD represents an expansion of the original rating matrix in a coordinate system where the covariance matrix is diagonal. Matrix P holds the latent factors of the users, and matrix Q holds the latent features of the items, for the given rating matrix [23].
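
The factorisation can be sketched with NumPy as follows. This is a minimal illustration, assuming a small, fully observed rating matrix. Real rating matrices are sparse, and how missing entries are handled (for example by imputation) is not specified here, so that step is left out. The matrix values are hypothetical.

```python
import numpy as np

# Hypothetical, fully observed 4-user x 3-item rating matrix.
R = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
    [2.0, 1.0, 4.0],
])

# Compact SVD: P has orthonormal columns, sigma holds the singular values
# in descending order, and Qt is the transpose of Q.
P, sigma, Qt = np.linalg.svd(R, full_matrices=False)

# Multiplying the three factors back together recovers R.
full_reconstruction = P @ np.diag(sigma) @ Qt

# Keeping only the k largest singular values gives a rank-k approximation,
# i.e. a model with k latent factors per user and per item.
k = 2
R_k = P[:, :k] @ np.diag(sigma[:k]) @ Qt[:k, :]

print(np.round(sigma, 3))
print(np.round(R_k, 2))
```

Truncating to k factors is what makes the decomposition usable as a latent factor model: the entries of R_k serve as predicted ratings built from k-dimensional user and item representations.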

5 Results

The following parameters were used to evaluate and compare the algorithms on each data set. The results are shown in Table 1 (Fig. 5).

Table 1 Results
Fig. 5 Comparison of RMSE value of UBCF, IBCF and SVD on MovieLens data set and Jester data set

5.1 RMSE

The RMSE (root-mean-square error) is computed by averaging the squared differences between the predicted ratings and the utility matrix, over those elements where the utility matrix is nonblank. The square root of this average is the RMSE [24]. Here P_i is the predicted rating, R_i is the actual rating and N is the number of rated elements.

$$ {\text{RMSE}} = \sqrt{\frac{1}{N} \sum\limits_{i = 1}^{N} \left( P_{i} - R_{i} \right)^{2}} $$
(5)
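
A minimal Python sketch of Eq. (5), assuming that blank entries of the utility matrix are stored as NaN and are excluded from the average. The function name and the example values are hypothetical.

```python
import numpy as np

def rmse(predicted, actual):
    """Root-mean-square error over entries where `actual` is not NaN."""
    known = ~np.isnan(actual)
    return float(np.sqrt(np.mean((predicted[known] - actual[known]) ** 2)))

actual = np.array([[4.0, np.nan], [2.0, 5.0]])   # NaN marks a blank entry
predicted = np.array([[3.0, 1.0], [2.0, 3.0]])

error = rmse(predicted, actual)
print(error)
```

In this example only three entries are rated, with errors of 1, 0 and 2, so the RMSE is the square root of 5/3. The prediction made for the blank entry does not affect the result.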

5.2 MAE

The mean absolute error (MAE) is the average of the absolute differences between the predicted and the actual ratings (Fig. 6).

Fig. 6 Comparison of MAE value of UBCF, IBCF and SVD on MovieLens data set and Jester data set

$$ {\text{MAE}} = \frac{1}{N} \sum\limits_{i = 1}^{N} \left| P_{i} - R_{i} \right| $$
(6)
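
A minimal Python sketch of Eq. (6), under the same assumption that unrated entries are stored as NaN and skipped. The function name and the example values are hypothetical.

```python
import numpy as np

def mae(predicted, actual):
    """Mean absolute error over entries where `actual` is not NaN."""
    known = ~np.isnan(actual)
    return float(np.mean(np.abs(predicted[known] - actual[known])))

actual = np.array([[4.0, np.nan], [2.0, 5.0]])   # NaN marks a blank entry
predicted = np.array([[3.0, 1.0], [2.0, 3.0]])

mae_value = mae(predicted, actual)
print(mae_value)
```

MAE weights all errors linearly, whereas RMSE squares them first and so penalises large errors more heavily. For any set of predictions the MAE is therefore never larger than the RMSE.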

5.3 F-Measure

The F-measure combines precision and recall into a single value for comparison purposes.

$$ F{\text{-Measure}} = \frac{{2*{\text{Precision}}*{\text{Recall}}}}{{\left( {{\text{Precision}} + {\text{Recall}}} \right)}} $$
(7)

Precision is the measure of exactness. It determines the fraction of retrieved items that are relevant.

$$ {\text{Precision}} = \frac{\text{True Positive}}{{\left( {{\text{True Positive}} + {\text{False Positive}}} \right)}} $$
(8)

Recall is the measure of completeness. It determines the fraction of all relevant items that are retrieved (Fig. 7).

Fig. 7 Comparison of F-Measure value of UBCF, IBCF and SVD on MovieLens data set and Jester data set

$$ {\text{Recall}} = \frac{\text{True Positive}}{{\left( {{\text{True Positive}} + {\text{False Negative}}} \right)}} $$
(9)
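
Eqs. (7) to (9) can be sketched in Python as follows. This is a minimal illustration, assuming that recommendations and relevant items are given as sets of item identifiers. How a rating is judged relevant (for example by a rating threshold) is not specified here and is left to the caller. The function name and the item identifiers are hypothetical.

```python
def precision_recall_f(recommended, relevant):
    """Return (precision, recall, F-measure) for a set of recommended items."""
    recommended, relevant = set(recommended), set(relevant)
    true_positive = len(recommended & relevant)
    precision = true_positive / len(recommended) if recommended else 0.0
    recall = true_positive / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Four items were recommended; three items are actually relevant.
precision, recall, f_measure = precision_recall_f(
    recommended=[1, 2, 3, 4], relevant=[2, 4, 5])
print(precision, recall, f_measure)
```

Here two of the four recommended items are relevant (precision 1/2) and two of the three relevant items were recommended (recall 2/3), which gives an F-measure of 4/7.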

6 Conclusion and Future Scope

Recommendation systems serve as a useful tool for users in expanding their interests and their experience over the Internet. They increase profits for developers and business owners by helping them know their customers well and serve them better. Beyond mobiles and computers, they open new doors for the automobile industry and for devices used on a daily basis. Despite the many solutions and facilities available, some issues with existing recommendation systems still need to be addressed to get the most out of them [25].

Recommendations can be made more complete and accurate by using the latest data mining techniques and machine learning approaches. Incorporating artificial intelligence into the underlying algorithm strengthens the system considerably, as it helps in understanding the audience well. Further improvement is required so that a recommendation system can do its intended job without compromising privacy or leaking information, as discussed above. All these factors imply that there is still a pressing need for more promising systems and a long way to go in their development [26].