1 Introduction

The World Wide Web users, unsurprisingly experience information overload. Recommender system (RS) is a tool to filter data according to interest of the active user. Till date, Collaborative filtering (CF) is the most successful approach employed in a recommendation system. In CF, opinions are gathered from likeminded users (neighbor users) of active user and used in the decision making process. The Memory based and Model based algorithms are the two categories of CF algorithms. Model-based algorithms are based on predictive model. The general idea is to extract some data from the user-item ratings database and use that as a “model” to make recommendations without using entire database each time. The user-item ratings are modeled with a small set of latent factors that represents characteristics of the items and users (i.e. category of items and preference of users etc.) in the system. This model is then trained and later used to make recommendations. The Memory based approach works on the phenomenon that user who has similar preferences in the past will share similar preferences in the future. Hence, the memory based approach utilizes the entire rating matrix to find the most similar user among the set of users for active user. This set of most similar user is called neighborhood and therefore it is named as neighborhood-based method (Töscher et al. 2008). Memory based algorithms can be classified into two broad categories: User-based CF and Item-based CF algorithms.

Although, CF is a very popular and successful approach, sparsity is a major problem that limits its usefulness. In a RS, the quantity of users and items are unsurprisingly ever increasing. However, due to this many users that are very active may have rated or purchased very limited items from the total items available. Even very popular items might have been purchased or rated by a few users. Consequently there exists high sparsity in user-item ratings matrix. According to Sarwar et al. (2001) the density of matrix is lower than 1 %. The core of CF algorithms is to find similar users or items, but due to extremely sparse user ratings matrix available it is hard to find similar users or items rendering poor performance of CF. In case the system manages to evaluate similarity, it may be possible that this similarity may be not trustworthy. The reason being the amount of information processed was insufficient.

This paper proposes a new CF approach to address the issue of sparsity. Firstly, the biclustering is adapted for dimension reduction, as well as to simultaneously group (clusters) the users and items of the rating data matrix. The similarity between users or items is computed on the basis of the group they belong to. The prediction for an unknown target rating is made by adapting user-based and item-based approaches. The user-based and the item-based approaches utilize the computed users/items similarity respectively. Finally the approach fuses these resultant predictions to estimate the final rating.

2 Background work

In last few years, many researchers have integrated Clustering (Kant and Ansari 2015) with several CF based RS in order to address the sparsity problem. The pioneer work in this field (Sarwar et al. 2002) focuses on partitioning the ratings database into smaller sets by creating clusters of users who have similar preferences. O’Connor and Herlocker (2001) experimented by applying various clustering algorithms to partition items. Xue et al. (2005) used clustering in order to smooth data and for neighborhood selection. Jiang et al. (2006) proposed to combine the iterative clustering algorithm with k-Nearest Neighbor. Here the iterative clustering approach was used to exploit the voting information and also used as a smoothing method to the k-Nearest Neighbor approach.

Huang and Yin (2010) showed that Clustering CF provide more accurate predictions if sparsity of data is too high. Birtolo et al. (2011) confirmed it with their experiments that takes advantage of clustering-based CF. Birtolo and Ronca (2013) proposed two clustering-based algorithms for CF. The first algorithm was based on Fuzzy C-mean clustering and the other one combined trust and similarity. The proposed framework improves quality and coverage of recommendation. Zhang et al. (2014a) adapted user clustering regularization term in their model to optimize standard matrix factorization. This approach integrated user information along with the user-item ratings information to enhance the performance of the recommendation system.

One-dimension clustering is a very popular approach among researchers to cluster users/items. However, it might be possible to miss some useful information by ignoring the opposite dimension. On the other hand, Biclustering technique clusters both dimension i.e. user dimension and item dimension simultaneously. To deal with sparsity problem of RS, the Biclustering technique can be a better approach than one-way cluster technique as demonstrated in literature. George and Merugu (2005) employed a weighted co-clustering algorithm that simultaneously obtains item and user neighborhoods.

Zhang et al. (2014b) proposed cloud-based CF approach using biclustering and Fusion (BiFu). Alqadah et al. (2014) proposed a collaborative filtering method using biclustering neighborhood approach. The system used the local similarity of biclusters and combined it with the global similarity. Vizine et al. (2015) approach combined CF recommendations with demographic information and adapted SCOAL (Simultaneous Co-Clustering and Learning) algorithm. Symeonidis et al. (2008) have used biclustering to reveal the duality between users and items. To match users’ preferences a new similarity measure was proposed. Liang and Leng (2014) proposed a CF approach based on information-theoretic co-clustering.

3 Preliminaries: simultaneous clustering

Simultaneous clustering also known as biclustering or co-clustering is an important technique to perform simultaneous clustering (grouping) of a matrix by using both rows and columns. The aim of biclustering is to find sub-matrices (subgroups of rows/columns) with highest correlation. Literature has shown rich evidence of successful use of biclustering algorithms in diverse application areas including bioinformatics, social network analysis, web mining etc.

In this paper, we have used xMotif biclustering algorithm (Murali and Kasif 2003).

The Algorithm 1 presents the xMotif algorithm, which is a key component of the proposed approach. Let

$$U = \left\{ {u_{1} ,u_{2} \ldots \ldots u_{M} } \right\},\quad {\text{Set}}\;{\text{of}}\;{\text{Users}}$$

\(S = \left\{ {s_{1} ,s_{2} \ldots \ldots s_{N} } \right\}\), Set of Items. Each user \(u_{i} ,i = 1,2 \ldots M\) has rated a subset of items \(s_{j} ,j = 1, 2 \ldots N\). We also have predefined specification 0 < α, β < 1. α is defined as a fraction of total users (U). Then a bicluster b (\(U_{b} ,\) \(S_{b}\)) consists of a subset of user \(U_{b} \subseteq U\) that together has coherent preference for a subset of items \(S_{b} \subseteq S\). The bicluster b (\(U_{b} ,\) \(S_{b}\)) have the following conditions:

  • Size the cardinality of users in \(U_{b}\) is at least an α -fraction of \(U\) (total users),

  • Conservation \(s_{b} \in S_{b}\) have the same state for \(\forall u_{b} \in U_{b}\) and

  • Maximality for each item \(s_{b}^{\prime} \notin S_{b}\), the item is conserved in at most \(\left| {\frac{\beta }{{U_{b} }}} \right|\) users.

The algorithm initially takes \(u_{s}\) users (randomly selected) as input which acts as seeds for the biclusters. Every seed consists of a set of users \(u_{d}\) with \(i_{d}\) number of items. Algorithm 1 summarizes the main steps of xMotif algorithm.

figure a

3.1 Item based collaborative filtering

The item based CF is based on the phenomena that the similar items to the item the active user have already preferred in the past may be preferred by the same user. Firstly it finds out neighborhood (similar items) for each item based on item similarity comparison and then prediction are made for active user on that item, based on rating history of active user on the neighborhood of the target item. Equation 1 illustrates how the active user u rates the similar items, \(i\).

$$Pred_{u,i}^{I} = \begin{array}{*{20}c} {\overline{r} + } & {\frac{{\mathop \sum \nolimits_{{i^{\prime} \in N\left( i \right)}} sim\left( {i,i^{\prime } } \right) \times \left( {r_{{u,i^{\prime}}} - \bar{r}_{{i^{\prime}}} } \right)}}{{\mathop \sum \nolimits_{{i^{\prime} \in N\left( i \right)}} sim\left( {i,i^{\prime } } \right)}}} \\ \end{array}$$
(1)

where \(sim\left( {i,i^{\prime } } \right)\) indicates the similarity between two items, \({\bar{\it r}}_{i}\) is the average rating of item \(i\), \(r_{u,i}\) is the rating of given by user \(u\) to item \(i\) and \(N\left( i \right)\) is the neighbors of item \(i\).

Conventionally, literature suggest various measures to compute the similarity between items, including Pearson Correlation coefficients (PCC), Constrained Pearson Correlation coefficients (CPCC), Cosine(COS) similarity measure, Spearman’s Rank Correlation similarity(SRCC), Jaccard (Koutrika et al. 2009) and mean squared difference (MSD) (Cacheda et al. 2011). One of the widely used similarity measure among these is cosine similarity measure. It is obtained by computing the cosine of the angle between two items.

The COS formulas are defined as follows (Eq. 2):

$$sim\left( {i,i^{\prime } } \right)^{{\varvec{COS}}} = \frac{{\vec{\varvec{r}}_{i} { \cdot }\vec{\varvec{r}}_{{i^{\prime}}} }}{{\vec{\varvec{r}}_{i} \cdot \vec{\varvec{r}}_{{i^{\prime}}} }}$$
(2)

where \({\vec{\text{r}}}_{\text{i}}\) and  \({\vec{\text{r}}}_{{{{i^{\prime}}}}}\) is the average rating of item  \(i\)  and  \(i^{\prime }\) respectively.

3.2 User based collaborative filtering

The User-based CF predicts the active user’s interest for an unrated item based on rating information from ‘neighbors’ (similar users). Hence, it is necessary to find out the nearest ‘neighbors’ for each user. The ‘neighbors’ of an active user are those users who shares similar interests. Equation 3 shows how prediction is achieved by user-based collaborative filtering, where \(sim\left( {u,u^{\prime}} \right)\) is the similarity between a pair of users i.e. users \(u\) and \(u^{\prime}\).

$$Pred_{u,i}^{U} = \bar{r}_{u} \begin{array}{*{20}c} + & {\frac{{\mathop \sum \nolimits_{{u^{\prime} \in N\left( u \right)}} sim\left( {u,u^{\prime } } \right) \times \left( {r_{{u^{\prime},i}} - \bar{r}_{{u^{\prime}}} } \right)}}{{\mathop \sum \nolimits_{{u^{\prime} \in N\left( u \right)}} sim\left( {u,u^{\prime } } \right)}}} \\ \end{array}$$
(3)

The PCC and COS are the most popular used similarity measures. The COS formulas are defined as follows (Eq. 4):

$$sim\left( {u,u^{\prime } } \right)^{COS} = \frac{{\vec{r}_{U} \cdot \vec{r}_{U\prime } }}{{\vec{r}_{U} \cdot \vec{r}_{U\prime } }}$$
(4)

where \({\vec{\text{r}}}_{\text{U}}\) and  \({\vec{\text{r}}}_{\text{U'}}\) is the mean rating value of user u and  \({\text{u}}^{\prime }\) respectively.

4 Proposed system: BiUCF

In practice, there exists high sparsity in user-item ratings matrix. In some cases, the density of matrix is lower than 1 % (Sarwar et al. 2001). This is due to the fact that users’ rate only some of the items that they have viewed from a large pool of items in user-item ratings matrix. Therefore, the user–based or item-based model needs to predict the unknown ratings from very few available ratings. The intuition behind the weighted sum approach is that both models can complement each other and possibly can produce more accurate predictions as compared to individual model.

For example, there may be an instance when the target item has been brought and rated only by few users. This results in item based CF algorithm making prediction by using only few ratings. Simultaneously, if we have enough information about target user like his general rating habits or behavior, it can be used instead of using only few ratings. Similarly if the target user has not rated many items but we have enough information about target item based on the item-based regression model then this information can be used instead of using few items data. Therefore based on the key idea of (Jannach et al. 2012), we propose to combine user- based and item-based CF algorithm in a weighted sum manner. The weighted sum of user-based and item-based models can be computed as (Eq. 5):

$$Pred_{u,i} = {w_{u} }\times {Pred_{u,i}^{U} + w_{i} }\times {Pred_{u,i}^{I} }$$
(5)

where \({\text{Pred}}_{{{\text{u}},{\text{i}}}}^{\text{U}}\) and \({\text{Pred}}_{{{\text{u}},{\text{i}}}}^{\text{I}}\) are the prediction results of user-based CF and item-based CF algorithms respectively, and \({\text{w}}_{\text{u}}\) and \({\text{w}}_{\text{i}}\) are the weights which control the influence of the corresponding models. To minimizes the prediction error (difference between the predicted and the actual rating) individual weights for each single user and item need to be optimized. Hence, a fast, heuristic gradient descent approach is adopted based on an existing work by Koren (2010) and Gedikli et al. (2011). It efficiently estimates weight for every user and item.

figure b

The proposed approach called as BiUCF depicted in Algorithm 2 uses a biclustering algorithm and fuses the User and Item based CF. It computes \(w_{u}\) and \(w_{i}\) for all users and items which are used to infer a user preference to an item.

5 Experimental setup

5.1 Data sets

A large number of benchmark datasets are publically available to evaluate the RS approach. The proposed model has been evaluated with three well known real-life datasets (1) MovieLens 100K (ML-100K), consists of 100,000 ratings on 1682 items (movies) made by 943 users. This dataset is 93.69 % sparse, which means that 4.25 % of the movie has been rated by users. (2) MovieLens 1 M (ML-1 M), consists of approximately one million ratings from 6040 users who reviewed 3952 movies. The sparsity rating of ML-1 M is about 95.75 %. In both the MovieLens dataset, each movie is rated on a scale from 1 to 5 and (3) the EachMovie dataset where we have extracted a subset of 1004 users who reviewed 1091 movies with rating scale from 1 to 6.

5.2 Evaluation measures

After developing a new algorithm for RS, the next step comprises of evaluating the algorithm. The evaluation checks the performance of algorithm under given circumstances. To evaluate the accuracy of prediction, the most widely used metrics are: Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) (Bobadilla et al. 2013).

The MAE is defined as in Eq. 6:

$$MAE = \frac{1}{\left| S \right|}\mathop \sum \limits_{i = 1}^{\left| S \right|} \left| {Pred_{i} - r_{i} } \right|$$
(6)

where |S| is the cardinality of the test ratings, \(Pred_{i}\) the vote predicted for a movie, and \(r_{i}\) the true rating.

The RMSE is presented in Eq. 7:

$${\text{RMSE }} = \sqrt {\frac{1}{\left| S \right|}\mathop \sum \limits_{i = 1}^{\left| S \right|} \left( {Pred_{i} - r_{i} } \right)^{2} } .$$
(7)

5.3 Experimental result and analysis

For the purpose of evaluating and validating the effectiveness of the proposed framework, it’s performance is compared with some existing techniques at different sparsity levels. The dataset is partitioned into two disjoint subsets i.e. training set and the test set. The training set is treated as known information and used to train the algorithm and the test set is used to measure the MAE and RMSE of predictions. To test the sensitivity of proposed work on different scales of the training set and the test set on predicted results, we construct nine pair’s of training sets and the test set. A split 10–90 means that 10 % of ratings are selected as the training set and the other 90 % ratings for testing. A fivefold cross validation has been conducted and the average MAE and RMSE was taken. In order to show the efficiency of BiUCF, we compare it with the RSVD (Funk 2006), ItemRank (Gori and Pucci 2002), iExpand (Liu et al. 2012), LDA(Liu et al. 2012), artificial immune systems based collaborative filtering (AIS + CF) (Chen et al. 2014) and user-based collaborative filtering (UCF) (Resnick et al. 1994) for the rating prediction accuracy.

Table 1 and 2 shows the MAEs and RMSEs of state-of-the-art methods on MovieLens 100 k dataset.

Table 1 Performance comparison with (Chen et al. 2014) based on MAE results on the ML100K dataset
Table 2 performance comparison with (Liu et al. 2012) based on RMSE results on the ML100K dataset

The results show that BiUCF outperforms all the other methods in terms of the obtained MAE/RMSE. A smaller value of MAE/RMSE (represented in bold) means a better performance. The MAE of AIS + CF are more stable than BiUCF, when the sparsity level of data is average (splits between 40–60 and 80–20). However, when the sparsity is very low in between 90–10 or very high 10–90, 20–80 splits and 30–70, BiUCF outperforms the AIS + CF.

The RSVD, LDA and iExpand algorithms based on the dimension reduction have not performed well. However, BiUCF uses biclustering for dimension reduction and is able to discover the better indirect correlations in data, hence it performs better. Another interesting observation can be drawn from the results that if the training set is smaller and sparser, then more significant improvement has been made by BiUCF when compared with all the other algorithms.

Moreover, the performance of our proposed framework has been further demonstrated and evaluated with EachMovie and ML-1M dataset. We adopted two traditional collaborative methods as baselines for this comparison. The baselines include user-based CF with Cosine Correlation Coefficient (UBCF), and an item-based CF with Cosine Correlation Coefficient (IBCF).

The size of the original EachMovie dataset is uncontrollable large which may take high computational time; hence we randomly selected 1004 users who reviewed the 1091 movies from the database. We have partitioned this subset into 5 folds, and then mean MAE and RMSE have been taken as the final output which is listed in the Table 3.

Table 3 MAE/RMSE comparison of Bi-CF algorithms on each movie dataset

The ML-1M dataset has also been partitioned into fivefolds for evaluating the performance of the purposes method. Table 4 illustrates the mean MAE and RMSE. The proposed method provides better MAE/RMSE for this dataset to the one obtained for the previous dataset. This is because the range of voting possibility of MovieLens dataset is lower than the EachMovie dataset.

Table 4 MAE/RMSE comparison of Bi–CF algorithms on ML1 M dataset

6 Conclusion

From past few decades, CF has proved itself as a powerful tool to handle the problem of information overload. Personalized recommendations are made to users on the basis of their preferences/taste and those of other similar users.

However, despite their success and popularity, the problem of data sparsity remains an obstacle. In this paper BiUCF algorithm that takes Biclustering into consideration for clustering user/item simultaneously and then the preference of user for an unrated item has been evaluated using weighted sum of user/item-based models. The detailed numerical analysis on three benchmark datasets indicates that the results of proposed approach are comparable with the traditional approach as well as some state art techniques. The proposed technique provides better accuracy of prediction along with quality of recommendation. However, the proposed BiUCF algorithm can be further improved. We plan to extend the BiUCF algorithm in a real-world recommender system.