1 Introduction

With the rapid growth of information on the Internet, finding relevant information has become a pervasive problem. E-commerce has grown rapidly and now offers millions of products for sale, so e-commerce users struggle to select a product from among them. A recommendation system helps e-commerce users select items from millions of items [1]. A recommender system (RS) collects information from a customer about the items he or she is interested in and recommends those items or products [2]. Nowadays, RSs are used on almost every e-commerce website, assisting millions of users. Sites such as Netflix and MovieLens [3] for movies, Amazon [4] for books, CDs and many other products, Entree for restaurants and Jester [5] for jokes use recommender systems to assist their consumers. The results of a recommender system benefit both the e-commerce organization and its users [6], i.e., an RS not only assists the customer in getting the preferred item but also increases the revenue of the organization by selling more products.

Recommendation systems can be classified into three categories based on how the recommendation is performed: content-based recommendation, collaborative filtering (CF), and hybrid approaches. Collaborative filtering is one of the most widely used and successful techniques for recommending items. It recommends an item to a particular user based on the ratings given by other users in the system. Content-based approaches make predictions based on the characteristics of the items in the user's past history, for example recommending a movie that has been categorized as “Action Movies” to a user who likes action movies. A hybrid approach combines content-based and collaborative filtering techniques in different ways. Figure 1 shows the framework of a memory-based collaborative filtering recommendation system.
The framework shows that millions of individual users use e-commerce sites and that their reviews of products are stored on the corresponding server. The e-commerce organization collects the data from the server and preprocesses it into the required format (a user rating matrix). The rating matrix is used to compute the similarity between users with a similarity measure such as the Pearson correlation coefficient, cosine similarity, adjusted cosine similarity or Jaccard similarity. The rating of each item that the active user has not yet rated is then predicted, and the top N predicted items are recommended to the active user. The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 discusses recommendation system techniques. Section 4 discusses issues related to recommendation systems. Section 5 discusses evaluation metrics for recommendation systems. Finally, Sect. 6 concludes the paper.

Fig. 1 Framework of memory-based collaborative filtering recommendation system

2 Related work

Rodrigues et al. [6] proposed a framework that combines item-based collaborative filtering (CF) with user demographic information in a cluster-weighted mechanism to address the cold-start and data sparsity issues. The system provides good recommendations to new users, which improves the user experience and also increases the organization's revenue. Better recommendations can be provided by forming clusters from cross-domain data. For example, if a user likes romantic songs, the system can recommend love-story movies to that user.

Ji et al. [7] introduced a scalable CF algorithm based on matrix factorization that performs prediction using two decision matrices, user-category and user-keyword, instead of a single user-item rating matrix. The proposed algorithm was implemented on a real data set, and the results show that the model scales well to new items.

Gu et al. [8] observed that simple collaborative filtering suffers from the data sparsity problem because of the explosive growth of users and items in e-commerce. They introduced a dynamic-weighted CF technique (DWCF) to resolve the data sparsity and adaptivity issues. In this approach, user similarity and item similarity are computed, and a weight-controlling method then determines the impact of each. The method is reported to perform well under various degrees of data sparsity.

Koohi et al. [9] noted that collaborative filtering suffers from data sparsity and high dimensionality. The authors address these issues by finding neighbor users with a subspace clustering approach. They construct different subsets of the rating matrix, labeled Interested (I), Neither Interested Nor Uninterested (NIU) and Uninterested (U). Based on these subsets, a three-level tree is created for the neighbors of an active user. This method is efficient in dealing with sparse data.

Verma et al. [10] proposed a recommendation system using the collaborative filtering (CF) technique and the fuzzy c-means (FCM) clustering algorithm. FCM clustering is used for item clustering and CF is used for rating prediction. FCM performs better than K-means clustering because K-means restricts each item to a single cluster, whereas an item may be similar to more than one group of items.

Kumar et al. [11] proposed a hybrid collaborative filtering method that addresses the sparsity and scalability issues and provides more personalized recommendations. The proposed method works in two phases. The first phase resolves sparsity using case-based reasoning (CBR) followed by average filling. The second phase resolves scalability by clustering users into groups with a self-organizing map optimized by a genetic algorithm.

Koohi et al. [12] proposed a collaborative filtering recommendation system using the fuzzy c-means clustering algorithm and evaluated its performance against the K-means and SOM clustering approaches. The experimental results show that fuzzy c-means clustering outperforms the other clustering approaches in terms of accuracy, precision and recall.

Lee et al. [13] introduced Predictive Clustering-based Collaborative Filtering (PCCF), which combines a Markov model and fuzzy clustering with clustering-based CF (CBCF). This method addresses the issues of reduced coverage and unstable performance by tracking changes in user preferences and bridging the gap between the static model and the dynamic model.

Kim et al. [14] proposed a recommender system for the online shopping market using GA K-means clustering. In this system, the authors segment online shoppers according to their buying behavior. A genetic algorithm (GA) is used to overcome the local optima problem of K-means clustering and to find the relevant groups more efficiently.

Ar et al. [15] proposed an approach that reduces the prediction error of collaborative filtering RSs. The conventional CF method uses similarity values directly for the rating prediction of an item, whereas the proposed approach applies a genetic algorithm before the prediction step to obtain a better result. Statistical analysis performed with various similarity measures, namely vector cosine similarity, Pearson’s correlation and the extended Jaccard coefficient, shows that the evolutionary approach reduces the prediction error. Table 1 summarizes the work done by different authors in the field of recommendation systems.

Table 1 Summarized information of literature review

3 Recommendation system techniques

3.1 Collaborative filtering

The collaborative filtering (CF) technique recommends an item to a particular user based on the ratings or opinions of other users [16, 18]. A CF system performs recommendation by building a database of users' preferences for items. The system then finds users with similar interests and preferences by calculating similarities between the user profiles [17] and builds a group of similar users called a neighborhood. A user receives recommendations for products that he or she has not rated or purchased but that his or her neighbors have rated. Collaborative filtering produces predictions or recommendations: a prediction is a numerical value, and a recommendation is a list of the top N items that the user will like the most [17], as shown in Fig. 2. Collaborative filtering techniques can be classified into two broad categories: (a) memory-based techniques and (b) model-based techniques [16,17,18]. A memory-based technique computes the similarity between the active user and all other users using a similarity measure such as Pearson correlation, cosine similarity or the Jaccard coefficient. The missing ratings of the active user are then predicted, and the top N items by predicted rating are recommended to the active user. In a model-based technique, previous ratings are used to build a model with a machine learning technique. Once the model is built, predictions can be made for an individual user.
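The memory-based procedure described above can be illustrated with a minimal sketch. It assumes a small dense rating matrix in which 0 marks an unrated item, Pearson correlation over co-rated items as the similarity measure, and a mean-centered weighted average over the k most similar neighbors as the prediction rule. The function names, the toy matrix and the choice of k are illustrative assumptions, not part of any surveyed system.

```python
import numpy as np

def pearson_similarity(a, b):
    """Pearson correlation over the items both users rated (0 = unrated)."""
    both = (a > 0) & (b > 0)
    if both.sum() < 2:
        return 0.0
    da = a[both] - a[both].mean()
    db = b[both] - b[both].mean()
    denom = np.sqrt((da ** 2).sum() * (db ** 2).sum())
    return float((da * db).sum() / denom) if denom > 0 else 0.0

def predict_rating(ratings, user, item, k=2):
    """User mean plus the similarity-weighted deviation of the k most
    similar neighbours (positive similarity only) who rated the item."""
    user_mean = ratings[user][ratings[user] > 0].mean()
    candidates = []
    for other in range(ratings.shape[0]):
        if other == user or ratings[other, item] == 0:
            continue
        sim = pearson_similarity(ratings[user], ratings[other])
        if sim > 0:
            other_mean = ratings[other][ratings[other] > 0].mean()
            candidates.append((sim, float(ratings[other, item] - other_mean)))
    top = sorted(candidates, reverse=True)[:k]
    if not top:
        return float(user_mean)  # no usable neighbour: fall back to the mean
    weighted = sum(sim * dev for sim, dev in top)
    return float(user_mean + weighted / sum(sim for sim, _ in top))

def recommend_top_n(ratings, user, n=2):
    """Rank the user's unrated items by predicted rating, highest first."""
    unrated = np.flatnonzero(ratings[user] == 0)
    scored = [(predict_rating(ratings, user, int(i)), int(i)) for i in unrated]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [item for _, item in scored[:n]]

# Hypothetical 4 users x 5 items matrix on a 1-5 scale; 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 4, 1, 2],
    [1, 2, 1, 5, 5],
    [5, 5, 5, 2, 1],
], dtype=float)

print(recommend_top_n(ratings, user=0, n=2))
```

In this toy data the third user has tastes opposite to the active user, so the positive-similarity filter excludes that user from the neighborhood. A model-based technique would replace `predict_rating` with a model fitted to the ratings in advance.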

Fig. 2 Collaborative filtering process

3.2 Content-based filtering

The content-based (CB) approach recommends items that are similar in their characteristics to the items the user has already used in the past. The CB approach relies more heavily on analysis of item attributes in order to produce recommendations. The CB filtering (CBF) technique has been most successful in recommending web pages, publications and news.

A CBF system automatically creates a personalized profile of the user based on his or her feedback and the types of items he or she likes. In order to generate meaningful recommendations, the collected user information is compared against the characteristics of the items being examined [19], as shown in Fig. 3.
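The comparison of a user profile against item characteristics can be sketched as follows. The sketch assumes that each item is described by a binary genre vector, that the profile is the average of the feature vectors of the items the user liked, and that cosine similarity is used for matching. The genre columns, items and function names are hypothetical.

```python
import numpy as np

def build_profile(item_features, liked_items):
    """User profile: average feature vector of the items the user liked."""
    return item_features[liked_items].mean(axis=0)

def cosine(u, v):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def rank_items(item_features, liked_items, n=2):
    """Rank items the user has not seen by similarity to the profile."""
    profile = build_profile(item_features, liked_items)
    seen = set(liked_items)
    scored = [(cosine(profile, item_features[i]), i)
              for i in range(len(item_features)) if i not in seen]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [item for _, item in scored[:n]]

# Hypothetical genre flags; columns are action, comedy, romance.
item_features = np.array([
    [1, 0, 0],  # item 0: action
    [1, 1, 0],  # item 1: action comedy
    [0, 0, 1],  # item 2: romance
    [1, 0, 0],  # item 3: action
    [0, 1, 1],  # item 4: romantic comedy
], dtype=float)

print(rank_items(item_features, liked_items=[0, 1], n=2))
```

Items 0 and 3 have identical feature vectors in this example, so a content-based system of this kind cannot tell them apart.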

Fig. 3 Content-based filtering process

3.3 Hybrid filtering

A hybrid filtering system combines two or more recommendation techniques in order to achieve better performance than collaborative filtering or content-based filtering alone. CF and CBF techniques can be combined in different ways to obtain a hybrid filtering system, and different combinations may produce different outputs. Hybridization methods are categorized into seven types [17]: (1) weighted, (2) switching, (3) mixed, (4) feature combination, (5) feature augmentation, (6) cascade and (7) meta-level.
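Two of the seven hybridization types, weighted and switching, are simple enough to sketch in a few lines. The sketch assumes that the CF and CBF components each return a dictionary mapping items to scores on a common [0, 1] scale. The blending weight `alpha` and the threshold `min_ratings` are illustrative parameters, not values taken from the literature.

```python
def weighted_hybrid(cf_scores, cb_scores, alpha=0.7):
    """Weighted hybrid: linear blend of CF and CBF scores per item.
    Items missing from one component contribute a score of 0 from it."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    items = set(cf_scores) | set(cb_scores)
    return {item: alpha * cf_scores.get(item, 0.0)
                  + (1.0 - alpha) * cb_scores.get(item, 0.0)
            for item in items}

def switching_hybrid(cf_scores, cb_scores, n_user_ratings, min_ratings=5):
    """Switching hybrid: use CF only when the user has rated enough items,
    otherwise fall back to the content-based scores."""
    return cf_scores if n_user_ratings >= min_ratings else cb_scores

cf = {"a": 0.9, "b": 0.2}
cb = {"a": 0.1, "b": 0.8, "c": 0.5}
print(weighted_hybrid(cf, cb, alpha=0.5))
print(switching_hybrid(cf, cb, n_user_ratings=2))
```

The switching variant illustrates why hybrids are often used for new users: with few ratings available, the content-based component can still produce scores.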

4 Issues related to recommendation systems

4.1 Limited content analysis

Content-based filtering (CBF) techniques are limited by the characteristics that are explicitly associated with the items being recommended. To obtain a sufficient number of characteristics, the content must either be in a form that can be parsed automatically or the characteristics must be assigned manually [24]. CBF faces another problem as well: two different items that have the same characteristics are indistinguishable to the system.

4.2 Cold-start problem

The cold-start problem refers to the situation in which it is difficult to make recommendations for new users or new items. Because sufficient rating information is lacking, it is difficult to find similarities between users or between items. As a result, the taste of new users cannot be predicted and new items are unlikely to be rated or purchased, which leads to less accurate recommendations. The cold-start problem can be addressed in several ways: (a) asking new users to rate some items at the beginning, (b) asking new users to state their tastes explicitly, and (c) recommending items to new users based on collected demographic information.

4.3 Data sparsity problem

This problem occurs when the majority of users do not rate most of the items, so that the user-item matrix becomes very sparse. Consequently, the chance of finding a set of users with similar ratings decreases. Collaborative filtering uses the nearest-neighbor approach to recommend items, and having few ratings makes it difficult to predict item ratings accurately.
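To make the effect concrete, the following sketch measures the sparsity of a rating matrix and counts, for each pair of users, how many items both have rated. It assumes that 0 marks a missing rating. The function names and the toy matrix are illustrative.

```python
import numpy as np

def sparsity(ratings):
    """Fraction of user-item cells that carry no rating (0 = unrated)."""
    return 1.0 - np.count_nonzero(ratings) / ratings.size

def co_rated_counts(ratings):
    """Entry (u, v) is the number of items rated by both user u and user v."""
    rated = (ratings > 0).astype(int)
    return rated @ rated.T

# Hypothetical 3 users x 4 items matrix with only 5 of 12 cells filled.
ratings = np.array([
    [5, 0, 0, 1],
    [0, 3, 0, 0],
    [4, 0, 0, 2],
], dtype=float)

print(round(sparsity(ratings), 3))
print(co_rated_counts(ratings))
```

Users 0 and 1 share no rated item in this example, so no similarity can be computed between them, which is exactly the difficulty that sparsity creates for neighborhood formation.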

4.4 Scalability problem

Scalability is one of the foremost issues that recommender systems face with large real-world datasets. As the number of users and items grows, the required computation grows at least linearly, i.e., an algorithm that works well on a small dataset may be unable to produce satisfactory results on a large one. It is therefore very difficult to apply recommendation techniques to the huge and dynamic data sets produced by user-item interactions. The scalability problem can be addressed using dimensionality reduction, Bayesian networks, clustering and similar techniques.
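As one illustration of dimensionality reduction, the sketch below compresses a rating matrix with a truncated singular value decomposition using NumPy's `numpy.linalg.svd`. Treating missing ratings as zeros and the choice of the rank `k` are simplifying assumptions made for the example, and the function name is hypothetical.

```python
import numpy as np

def truncated_svd(ratings, k):
    """Keep the k largest singular values of the rating matrix.
    Returns user factors (n_users x k) and item factors (k x n_items)."""
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    user_factors = U[:, :k] * s[:k]   # scale each kept column by its singular value
    item_factors = Vt[:k, :]
    return user_factors, item_factors

# Hypothetical 4 users x 5 items matrix; 0 is treated as a rating of zero here.
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 4, 1, 2],
    [1, 2, 1, 5, 5],
    [5, 5, 5, 2, 1],
], dtype=float)

user_factors, item_factors = truncated_svd(ratings, k=2)
approx = user_factors @ item_factors  # rank-2 approximation of the matrix
print(user_factors.shape, item_factors.shape)
print(np.round(approx, 2))
```

Similarities can then be computed between the k-dimensional rows of `user_factors` instead of the full rating vectors, which reduces the cost of each comparison.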

4.5 Privacy issue

Recommendation algorithms require input from the user population to produce high-quality personalized recommendations, which may raise data privacy and security issues. Techniques therefore need to be designed that use user data reasonably and carefully and ensure that information about user-item ratings is not freely available to malicious users.

4.6 Synonymy

Synonymy refers to the situation in which similar items have different names or entries. Most RS algorithms treat such entries as different items and cannot recognize that closely related terms such as “comedy movie” and “comedy film” refer to the same thing. Heavy use of synonyms decreases the performance of collaborative filtering recommendation. The synonymy problem can be addressed by (a) construction of a thesaurus, (b) Singular Value Decomposition (SVD) and (c) Latent Semantic Indexing.

5 Evaluation metrics of RSs

The quality of a recommendation algorithm can be assessed using different methods. The type of metric used depends on the type of filtering technique. Assessing predictions and recommendations is considered essential so that users have the best possible experience with RSs. Evaluation metrics can be classified as follows:

5.1 Statistical accuracy metric

Statistical accuracy metrics evaluate accuracy by comparing the predicted ratings with the actual ratings. The commonly used metrics are Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and correlation.

5.1.1 Mean absolute error

MAE is the average of the absolute deviations between the predicted ratings and the actual ratings. A lower MAE value indicates better prediction [20]. Let r1, r2, r3, …, rn be the actual ratings and p1, p2, p3, …, pn the corresponding predicted ratings.

MAE is defined as follows:

$$MAE = \frac{\sum_{i=1}^{n} \left| r_{i} - p_{i} \right|}{n}.$$
(1)

5.1.2 Root mean square error

RMSE is also used to measure model performance. It is obtained by squaring the differences between the predicted ratings and the actual ratings, adding them together, dividing the sum by the number of test points and then taking the square root of the result [21]:

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} \left( r_{i} - p_{i} \right)^{2}}{n}}.$$
(2)
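Equations (1) and (2) translate directly into code. The sketch below uses plain Python lists. The function names and the four rating pairs are illustrative.

```python
import math

def mae(actual, predicted):
    """Mean absolute error, Eq. (1)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - p) for r, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, Eq. (2)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((r - p) ** 2 for r, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

actual = [4, 3, 5, 2]
predicted = [3.5, 3, 4, 3]

# Absolute errors are 0.5, 0, 1, 1, so MAE = 2.5 / 4 = 0.625.
print(mae(actual, predicted))
# Squared errors are 0.25, 0, 1, 1, so RMSE = sqrt(2.25 / 4) = 0.75.
print(rmse(actual, predicted))
```

Because the errors are squared before averaging, RMSE penalizes large individual errors more heavily than MAE, which is why it is the larger of the two values here.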

5.1.3 Correlation

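Correlation measures the strength of the linear relationship between two variables, here the predicted ratings and the actual ratings. A higher correlation value indicates more accurate rating predictions or recommendations. In the following definition, the overlined symbols denote the mean predicted rating and the mean actual rating: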

$$corr(p,r) = \frac{\sum_{i=1}^{n} \left( p_{i} - \overline{p} \right)\left( r_{i} - \overline{r} \right)}{\left[ \sum_{i=1}^{n} \left( p_{i} - \overline{p} \right)^{2} \sum_{i=1}^{n} \left( r_{i} - \overline{r} \right)^{2} \right]^{1/2}}.$$
(3)
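A minimal sketch of Eq. (3) follows. The function name and the sample ratings are illustrative, and the choice to raise an error when either input is constant is an assumption made here, since the correlation is undefined in that case.

```python
import math

def correlation(predicted, actual):
    """Pearson correlation between predicted and actual ratings, Eq. (3)."""
    n = len(predicted)
    if n != len(actual) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    p_mean = sum(predicted) / n
    r_mean = sum(actual) / n
    numerator = sum((p - p_mean) * (r - r_mean)
                    for p, r in zip(predicted, actual))
    p_var = sum((p - p_mean) ** 2 for p in predicted)
    r_var = sum((r - r_mean) ** 2 for r in actual)
    denominator = math.sqrt(p_var * r_var)
    if denominator == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return numerator / denominator

print(correlation([3.5, 3, 4, 3], [4, 3, 5, 2]))
```

Note that correlation is insensitive to a constant offset or scale: predictions that are consistently one star too high still correlate perfectly with the actual ratings, so correlation is usually reported alongside MAE or RMSE rather than in place of them.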

5.2 Classification accuracy metrics

Classification accuracy metrics measure how frequently a recommendation system makes correct or incorrect decisions about whether an item is good [22]. Binary ratings, which state only whether an item is relevant to the user's interest or not, are generally used in recommendation systems because a rating dataset is extremely sparse compared to a binary selection dataset. For these metrics, a rating dataset is transformed into a binary dataset. Three classification accuracy metrics, Precision, Recall and F1 score, are often used to assess the relevance between recommendations and user interest [23]:

$$\text{Precision} = \frac{\text{Relevant Items Recommended}}{\text{Total Items Recommended}},$$
(4)
$$\text{Recall} = \frac{\text{Relevant Items Recommended}}{\text{Total Relevant Items}},$$
(5)
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$
(6)
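Equations (4) to (6) can be computed from two item sets, as in the sketch below. The sketch assumes that ratings have already been binarized into a set of relevant items, and that a metric is reported as 0 when its denominator is empty. The function name and item identifiers are illustrative.

```python
def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 score for one user, Eqs. (4)-(6)."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)  # relevant items that were recommended
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1

# Four items recommended, three items relevant, two in common:
# precision = 2/4, recall = 2/3, F1 = 4/7.
print(precision_recall_f1(recommended=[1, 2, 3, 4], relevant=[2, 4, 6]))
```

Precision and recall pull in opposite directions as the recommendation list grows, which is why the F1 score, their harmonic mean, is commonly reported with them.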

6 Conclusion

Recommendation systems are able to provide personalized information on the Internet. Many RSs have been developed based on content-based filtering, collaborative filtering and hybrid approaches, and they help to reduce the problem of information overload. This study found that CF recommendation systems provide better recommendations but still face the problems of scalability and sparsity. There is therefore scope to improve the quality and performance of collaborative filtering based recommendation systems by using fuzzy clustering and optimization techniques.