Abstract
The performance of Recommender Systems (RSs) based on Collaborative Filtering (CF) depends on the similarities among users or items obtained from a user-item rating matrix. Conventional measures such as the Pearson correlation coefficient (PCC), cosine (COS), and Jaccard (JACC) provide varied and dissimilar values when the ratings of two users lie on the positive and negative sides of the rating scale. These measures are also not very effective when the user-item rating matrix is sparse. These problems are addressed by the Proximity-Impact-Popularity (PIP) similarity measure. Although the PIP measure improves on the conventional measures, the range of values for each of its components is very wide. To address this issue and improve the performance of a CF-based RS, a modified proximity-impact-popularity (MPIP) similarity measure is introduced, designed so that each component lies within the range of 0 to 1. A modified prediction expression is proposed to predict available and unavailable ratings by combining user- and item-related components. The proposed method is tested on various benchmark datasets, with the size of the sparse user-item matrix varied to compare the methods in terms of mean absolute error (MAE), root mean squared error (RMSE), precision, recall, and F1-measure. The performance of the proposed method is statistically tested through the Friedman and McNemar tests. The results obtained by using these evaluation criteria indicate that the proposed method provides a better solution than the conventional methods: it yields the minimum MAE and RMSE values and the maximum F1-measure for all the sub-problems.
1 Introduction
A majority of people spend an increasing amount of time on the Internet because of the excessive quantity of information it provides on various fields [1,2,3,4,5]. A Recommender System (RS) collects information on websites related to user preferences for a set of items and employs different online information sources to predict the users’ preferences for these items [6, 7]; therefore, it plays a vital role in the sale of products or services [8, 9]. Many users prefer to buy products based on the recommendations of other users; thus, the preferences of users for different products should be analyzed. From the perspective of a company, this helps to maximize profits and promote its products or services.
RS methods based on Collaborative Filtering (CF) are extremely popular among researchers and practitioners, as evidenced by the vast number of journal articles and real-life implementation cases [10,11,12,13]. CF recommends products or services based on the similarities in the preferences of a group of customers or online users, known as neighbors [14,15,16]. The advantage of a CF-based RS is that it is domain-independent and comparatively more accurate than content-based filtering (CBF) [17]. With the increase in online purchases and the development of electronic commerce, automated product recommendation has become an essential tool for enhancing the sales of products and services through Internet-based stores [18]. CF recommends products to a customer based on the similarities between users or products, which are derived from the customers’ past preferences [16, 19,20,21]. Therefore, the critical task in CF is to effectively identify the similarities between users or items [18].
The Pearson Correlation Coefficient (PCC), Cosine (COS), Jaccard (JACC), and Jaccard Mean Squared Difference (JMSD) are the conventional methods used to compute the similarities between users or items. The PCC, COS, and mean squared difference are statistical metrics adopted in CF-based RSs; their main advantage is that they are easy to implement and their similarity values are easy to interpret [22]. Similarly, JACC is the ratio of intersection to union, which measures the similarity between two users based on the number of items they have co-rated. These similarity measures provide high accuracy for CF-based RSs. However, they suffer when only a few items or users are rated, which leads to an extremely sparse user-item rating matrix [7, 16, 23,24,25]. A similarity matrix computed from such a sparse input matrix misleads an RS [18, 26, 27]. The cold-start problem is an example of a sparsity problem that occurs when a new user or item is introduced: it becomes difficult to compute the similarities among users or items because of insufficient rating information [25, 28, 29]. A sparse input matrix is a significant issue that decreases the performance of a CF-based RS [30]. Another drawback of the conventional similarity measures is that they provide different similarity values when the ratings lie on the positive and negative sides of the rating scale. These misleading similarity values eventually lower the accuracy of the CF-based RS. Therefore, a more effective similarity measure is required to improve the performance of a CF-based RS. To address these issues, Ahn [18] proposed the Proximity-Impact-Popularity (PIP) measure, predominantly for use in CF, to provide a better solution to the sparsity problem.
In this method, two agreement conditions are included, and similarity is computed by considering both positive and negative ratings. However, the range of values for the PIP is so wide that the three components (proximity, impact, popularity) are not treated equally; in different scenarios each component carries a different weight, and the component values are not normalized [26]. If users provide extremely positive ratings for the co-rated items, then proximity has a greater weight than impact and popularity. Similarly, if users provide opposite ratings for the co-rated items, then proximity and popularity are treated in the same manner but the impact value is very small. Each component contributes important information to the PIP calculation; however, because the components are treated in unequal proportions, prediction accuracy in a CF-based RS suffers. This is one of the limitations of the PIP measure. To overcome it, a detailed analysis of the PIP measure has been performed and the shortcomings of the existing similarity measure have been identified. Based on this analysis, a modified PIP (MPIP) similarity measure has been developed to overcome the limitations of the PIP measure.
Generally, a similarity-based prediction expression predicts a rating from either the user-related average and its similarity-weighted average deviation or the item-related average and its similarity-weighted average deviation; only one of these is used in the prediction process. Both the user- and item-related information carry important signals for an accurate prediction, yet the existing prediction expressions use only one of them. This is one of the shortcomings of the existing prediction expressions. Therefore, to improve prediction accuracy, a modified prediction expression is devised by adding the user-related deviation in the user-based prediction and the item-related deviation in the item-based prediction; the final predicted rating is the average of the user- and item-based predictions.
In this study, we have modified the PIP similarity measure by converting the range of each component into 0 to 1. The PIP value is the product of the proximity, impact, and popularity values, and these three components are weighed in different proportions in different scenarios; the deviation between the minimum and maximum values of each component is very high. If two users provide extreme positive ratings, the resultant PIP values are on the order of 10³. If the users provide different rating values, i.e., one user provides a positive rating and the other a negative rating, then the PIP values are on the order of 10⁻¹. The variation between the two conditions is very high. Direct normalization procedures such as z-score and min-max normalization may be adopted to obtain normalized PIP values; however, different normalization procedures provide different ranges of values, and for different similarity matrices the same technique yields different values. To overcome this issue, the expression itself is changed in the modified PIP measure to compute improved similarity values between users or items. Existing similarity measures such as PCC, COS, JACC, and JMSD do not adopt the agreement conditions and suffer from the flat-value and single-value problems, for which the PIP measure provides a better solution. The MPIP expression uses the same agreement conditions as PIP to differentiate between similar and dissimilar pairs of ratings, and it likewise resolves the flat-value and single-value problems. MPIP values are designed to be of higher magnitude for agreement conditions and very small for disagreement conditions: if two users disagree in their ratings, MPIP assigns a very low similarity value for this condition. In the existing PIP similarity measure, a constant penalty value is multiplied with the proximity component.
Instead of a constant value, a variable penalty is used in the proposed proximity expression. This helps to compute an improved similarity matrix for a CF-based RS. A modified prediction expression has also been introduced, which combines user- and item-related information to enhance the effectiveness of prediction with a sparse user-item rating matrix. The modified prediction expression is further derived for predicting unavailable ratings, enabling an accurate rating prediction for the users. The modified PIP similarity measure and the prediction expression are combined as the proposed framework for CF-based RS. This improved CF-based RS recommends more relevant products or services to the customer, which in turn enhances customer satisfaction in e-commerce services.
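The idea of averaging a user-side and an item-side prediction can be illustrated with a short sketch. This is not the paper's exact CUIP expression (which is derived later); the function names, the tuple-based neighbor representation, and the plain averaging are our own simplification of the classical weighted-deviation prediction:

```python
def predict_weighted_deviation(target_mean, neighbors):
    """Classical CF prediction: the target's mean rating plus the
    similarity-weighted deviation of the neighbors' ratings.

    neighbors: list of (similarity, neighbor_rating, neighbor_mean) tuples.
    """
    num = sum(s * (r - m) for s, r, m in neighbors)
    den = sum(abs(s) for s, _, _ in neighbors)
    return target_mean + (num / den if den else 0.0)

def predict_combined(user_mean, user_neighbors, item_mean, item_neighbors):
    """Hypothetical CUIP-style combination: the average of the user-based
    and item-based predictions (a sketch of the idea, not the paper's
    exact expression)."""
    p_user = predict_weighted_deviation(user_mean, user_neighbors)
    p_item = predict_weighted_deviation(item_mean, item_neighbors)
    return (p_user + p_item) / 2.0
```

For example, a single fully similar user-neighbor rating one point above its own mean raises the user-side prediction by one point; the item side is computed in the same form and the two are averaged.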
Experiments are conducted by using the MovieLens100KB (ML100KB) and Netflix datasets, which were also used by Ahn [18]. Benchmark datasets such as Epinions, CiaoDVD, MovieTweet, FilmTrust, and MovieLens1MB (ML1MB) are used to validate the proposed framework. Each dataset is divided into different sub-problems to test the proposed framework under sparse conditions. Performance criteria such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), precision, recall, and F1-measure are used to measure the effectiveness of the CF-based RS. The McNemar test is conducted to statistically compare the performance of the different methods, and the Friedman rank test is performed to compare the methods across all the sub-problems. The results obtained from the proposed framework are compared with eleven existing similarity measures. The proposed method provides the minimum MAE and RMSE values for all the datasets and higher accuracy than the conventional methods. The statistical tests are conducted on the MAE, RMSE, and confusion matrix, and the results show that the proposed framework can improve prediction performance and quality with a sparse input matrix. The symbols used throughout the paper are listed in Table 1.
The main contributions of this study are as follows: The Modified PIP (MPIP) similarity measure is introduced to provide an improved similarity between users or items. In the existing PIP measure, the minimum and maximum values of each component vary over different ranges, so the resultant PIP value depends heavily on one of the component values. The proposed MPIP measure addresses this issue by converting each component value into the range of 0 to 1, and it introduces a variable penalty value in the proposed proximity expression to differentiate the values in various scenarios. Further, a combined user- and item-based prediction (CUIP) expression is proposed to obtain a better predicted rating; the CUIP is also tuned to forecast unavailable ratings. MPIP and CUIP are combined into a new framework to overcome the sparsity problem in CF-based RS. A schema is developed to generate different levels of sparse input matrices by giving equal importance to all the elements of the input user-item rating matrix; under this schema, the numbers of users, items, and ratings vary in different proportions. Finally, the McNemar test is explained with a graphical representation for better understanding of each component used in the McNemar table.
The remainder of this paper is organized as follows: Section 2 discusses the literature related to CF-based RS and the similarity measures adopted for CF-based RS. Section 3 presents a detailed analysis of the PIP and the issues identified in the PIP expression. Section 4 describes the proposed method, which is a combination of the modified PIP similarity measure and the modified prediction expression. Section 5 presents the experimental results, and Section 6 discusses our conclusions.
2 Related literature
RSs are broadly classified into three categories: content-based filtering, which predominantly utilizes text-mining concepts; collaborative filtering, which is further subdivided into model- and memory-based filtering; and hybrid methods, which combine the textual and rating preferences provided by the user [31].
CF is a type of personalized recommendation technique that is widely used in many domains [13, 30, 32]. However, CF also suffers from several issues, for example, the cold-start problem, data sparsity [6, 33, 34], and scalability; these problems considerably reduce the user experience. Memory-based filtering is further divided into user- and item-based methods. In this study, the literature related to user-based similarity methods is collected and listed below.
Many studies have been conducted to improve prediction accuracy, resulting in the development of new similarity measures. PCC is often used to compute a linear relationship (i.e., correlation) between a pair of objects; it ranges from −1 to 1, where −1 indicates a negative relationship between the users, the mid-value 0 indicates no relationship, and +1 indicates a strong positive relationship.
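For reference, the PCC restricted to the items co-rated by two users can be sketched as follows (the dictionary-based rating representation and function name are our own):

```python
import math

def pearson(ratings_u, ratings_v):
    """Pearson correlation over the items co-rated by two users.

    ratings_u, ratings_v: dicts mapping item id -> rating.
    Returns a value in [-1, 1], or 0.0 when undefined (fewer than
    two co-rated items, or zero variance on either side).
    """
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den = math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common)) * \
          math.sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    return num / den if den else 0.0
```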
The COS similarity is a vector-space model that is widely used in information retrieval, where the cosine of the angle between two rating vectors is used as the similarity between the users. This is calculated by using the following equation:
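As a stand-in for the equation, the standard cosine similarity restricted to co-rated items can be sketched as (representation and function name are ours):

```python
import math

def cosine(ratings_u, ratings_v):
    """Cosine similarity over co-rated items (dicts: item id -> rating)."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Note that, unlike PCC, cosine uses the raw ratings without mean-centering, which is one reason the two measures disagree when ratings sit on opposite sides of the scale.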
Bobadilla et al. [6] have proposed a new similarity measure, the Jaccard Mean Squared Difference (JMSD), which combines the Jaccard similarity with the mean squared difference (MSD) to solve the cold-user problem.
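JMSD is commonly formulated as the Jaccard index multiplied by one minus the normalized mean squared difference; the following is a sketch under that assumption (the default arguments assume a 1-5 rating scale):

```python
def jmsd(ratings_u, ratings_v, r_min=1, r_max=5):
    """JMSD sketch: Jaccard index times (1 - mean squared difference).

    Ratings (dicts: item id -> rating) are normalized to [0, 1] before
    computing the MSD term, so the result also lies in [0, 1].
    """
    union = set(ratings_u) | set(ratings_v)
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    jaccard = len(common) / len(union)
    scale = r_max - r_min
    msd = sum(((ratings_u[i] - ratings_v[i]) / scale) ** 2
              for i in common) / len(common)
    return jaccard * (1.0 - msd)
```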
The Spearman Rank Correlation (SRC) is another similarity measure that relies on the rank of the items instead of the rating provided by the users as in the PCC. The rankings are based on the higher to lower-rated items [35]. Ahn [18] has proposed the PIP similarity measure for CF for both agreement and disagreement situations.
Further, this similarity measure is modified into a non-linear model by Liu et al. [26], which is termed as a new heuristic similarity measure (NHSM). It comprises the proximity-significance-singularity (PSS) combined with the modified Jaccard (JACC′) function and the user rating preference (URP).
A Bhattacharyya coefficient-based CF (BCF) is proposed by Patra et al. [17], combining global and local similarity measures; the Bhattacharyya coefficient is treated as the global measure, and the local measure is calculated from the correlation or cosine similarity.
where ‘BC’ is the Bhattacharyya Coefficient and ‘loc’ is the local similarity measure. Eqs. 16 and 17 are used for calculating the local similarity: Eq. 16 uses the mean as a reference, while Eq. 17 uses the median.
Generally, the Jaccard similarity measure only counts the frequency of co-rated ratings between the users; this is one of its shortcomings. To overcome this issue, Bag et al. [31] have proposed the Relevant Jaccard similarity measure (RJACC), which includes the frequency of un-co-rated items. Besides, a new similarity measure for CF-based RS is developed by combining RJACC and MSD, termed the Relevant Jaccard Mean Squared Deviation (RJMSD) [31].
where ‘\( \overline{I_{u_j}} \)’ and ‘\( \overline{I_{u_h}} \)’ are the numbers of un-co-rated items of users ‘j’ and ‘h’. A sub-one quasi-norm-based similarity measure (SQON) for CF-based RS is introduced to overcome the issues of similarity measures based on the Euclidean distance [7].
where ‘Rran’ is the range value, i.e., the deviation between Rmax and Rmin; ‘Rmax’ is the maximum value on the rating scale; ‘Rmin’ is the minimum value on the rating scale; and ‘g’ is a parameter that varies from 0 to 1.
2.1 Drawbacks of the existing similarity measures
PCC and COS are widely used methods in CF-based RS. Their limitations are as follows: the flat-value problem [18, 36], the single-value problem [36], the equal-ratio problem [36], and the opposite-value problem [36]. Cosine similarity, in particular, is one of the most popular similarity measures in research applications. It is widely used in text clustering to find the similarity between two documents [37], and in the k-means clustering algorithm to compute the similarity between data objects and the corresponding centroids; based on these similarity values, the data objects are grouped into different clusters. It is also used in a multi-objective function for the Krill Herd algorithm, where it provides higher accuracy than conventional measures [38]. The Jaccard similarity measure is well known as a binary similarity measure, mainly adopted to find the similarity between binary variables; it provides a good solution in evolutionary algorithms, for example as a fitness function for the genetic algorithm [39]. The research mentioned above clearly shows that cosine- and Jaccard-based similarity measures are widely used in real-life applications to solve various problems. In CF-based RS, however, rating values on an ordinal scale are used. The PCC and COS give equal weight to both the positive and negative sides of a rating scale, which leads to misleading similarity values; this is one of the shortcomings of the PCC- and cosine-based similarity measures. The Jaccard similarity measure considers only the number of co-rated ratings, not their intensity. To overcome these shortcomings, Ahn [18] proposed the PIP similarity measure. However, the magnitude of the PIP values is high, and each component of the PIP measure is treated in different proportions in different scenarios.
PSS is an extension of the PIP expression with a non-linear assumption, but the agreement conditions used in the PIP measure are not included in this expression; thus, the correlation between PIP and PSS is very low. NHSM is a combination of the PSS, modified Jaccard (JACC′), and User Rating Preference (URP) measures. In URP, the deviation of the mean is multiplied with the standard deviation; if both users have the same mean but different standard deviations, the standard deviation becomes negligible. The BCF similarity measure combines global and local similarity measures, with the Bhattacharyya coefficient treated as the global measure; in this expression, only the number of ratings is used in the calculation, and co-rated items are not considered. Relevant Jaccard (RJACC) is an extension of the JACC-based similarity measure in which only the frequencies of the co-rated and un-co-rated items are considered; the intensity (i.e., magnitude) of the ratings is not used in the similarity computation. SQON is an improved version of the Euclidean distance: as the number of co-rated items increases, the magnitude of its similarity values also increases, so two users with a higher number of co-rated items receive higher similarity values and vice versa. These are the shortcomings of the existing similarity measures used in CF-based RS. To overcome them, a modified PIP similarity measure is proposed to obtain an improved similarity matrix for CF-based RS.
3 Detailed analysis of PIP
The PIP similarity measure [18] comprises two agreement conditions, which in turn include three components: proximity, impact, and popularity. The similarity between the two users ratings, ‘\( {r}_{u_j,{I}_i} \)’ and ‘\( {r}_{u_h,{I}_i} \)’ is calculated as follows:
The agreement conditions play a vital role in the PIP measure. Compared to other similarity measures, the PIP is the only one that differentiates positive and negative ratings by using agreement conditions. These agreement conditions discriminate the ratings based on the pattern of ratings provided by two users. Let us consider a set of users U = {u1, u2, …, un} and a set of items I = {I1, I2, …, Im}, where ‘n’ is the total number of users and ‘m’ denotes the total number of items. They are associated with a rating matrix called the ⟨user × item⟩ rating matrix.
To understand this better, let ‘r1’ and ‘r2’ be two ratings, where r1 is the rating provided by user ‘j’ for item ‘i’ and r2 is the rating provided by user ‘h’ for the same item. Let ‘Rmax’ be the maximum rating on the rating scale, ‘Rmin’ the minimum rating on the rating scale, and \( {R}_{med}=\frac{R_{max}+{R}_{min}}{2} \). If both ratings ‘r1’ and ‘r2’ are less than, or both greater than, ‘Rmed’, then Agreement(r1, r2) belongs to the TRUE situation. Conversely, if the ratings lie in opposite directions, i.e., one rating is greater than ‘Rmed’ and the other is less than ‘Rmed’, the situation is FALSE. The main use of these agreement conditions is to discriminate between the ratings. A Boolean function for the agreement condition of ratings ‘r1’ and ‘r2’ is defined as follows:
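The Boolean agreement condition described above can be written directly (default arguments assume a 1-5 scale; a rating exactly equal to Rmed falls outside the FALSE clause and therefore counts as agreement):

```python
def agreement(r1, r2, r_min=1, r_max=5):
    """Agreement condition from the PIP measure: FALSE only when the
    two ratings lie on strictly opposite sides of the scale midpoint."""
    r_med = (r_max + r_min) / 2.0
    return not ((r1 > r_med and r2 < r_med) or (r1 < r_med and r2 > r_med))
```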
The pictorial representation of the agreement conditions, with r1 taken as the reference, is shown in Fig. 1 (a rating scale of 1 to 5 is considered for plotting this figure). The vertical lines denote the ratings inside the plot: a vertical black line indicates that Agreement(r1, r2) is TRUE, and a vertical red line indicates that Agreement(r1, r2) is FALSE.
If both users provide similar kinds of ratings, then Agreement(r1, r2) = TRUE; the minimum and maximum deviations of the ratings in this situation are 0 and 2, which shows that both users agree on the rating values. If the users provide different kinds of ratings for the same item, then Agreement(r1, r2) = FALSE; the minimum deviation in this situation is 2 and the maximum is 4. This clearly shows that the deviation between two ratings provides useful information for calculating the similarity between users. The agreement condition helps to differentiate the users based on the variation of the ratings provided for the co-rated item.
3.1 Proximity
Generally, proximity determines the closeness of two objects. Here, proximity is computed from the absolute difference between the two ratings r1 and r2.
Figure 2(a) is the graphical representation of the proximity values plotted for a rating scale of 1 to 5. The minimum proximity value for the Agreement(r1, r2) = TRUE condition is 49, which occurs when the deviation between the two ratings is largest (r1 ≠ r2). If both ratings are the same (i.e., r1 = r2), the proximity value is 81. Similarly, for the Agreement(r1, r2) = FALSE condition, the minimum value is 1 and the maximum value is 25.
3.2 Impact
The impact is the second critical component computed in the PIP formula; it indicates how strongly the users preferred or disliked the particular item.
Impact lies in the range 0.11 to 9. If Agreement(r1, r2) = TRUE, the minimum impact value is 1 and the maximum is 9. For Agreement(r1, r2) = FALSE, the minimum is 0.11 and the maximum is 0.25. The impact values for different combinations are shown in Fig. 2(b).
3.3 Popularity
Popularity is calculated based on the deviation of the ratings from the average rating of item ‘i’. Let ‘\( {\overline{r}}_{I_i} \)’ be the average rating of item ‘i’ over all users.
For the popularity factor, if Agreement(r1, r2) = TRUE, the minimum value is 1 and the maximum value is 5 (for computation purposes, \( {\overline{r}}_{I_i} \) is taken as 3). For Agreement(r1, r2) = FALSE, both the minimum and maximum values are 1. The popularity values for different combinations are represented in Fig. 2(c).
The graphical comparison indicates that proximity has higher-magnitude values than impact and popularity in the Agreement(r1, r2) = TRUE condition. In the Agreement(r1, r2) = FALSE condition, the impact value is very small compared to the other two components, so the similarity value depends more on the proximity and popularity values than on the impact. A detailed explanation is given in the following subsection.
3.4 Issues identified in PIP
In PIP, each component value lies in a wide range, and proximity, impact, and popularity are treated in different proportions. The calculation below is listed for a rating scale of 1 to 5. The maximum value is calculated by taking r1 = 5 and r2 = 5 for the Agreement(r1, r2) = TRUE situation. The minimum value is computed by taking r1 = 1 and r2 = 5 for the Agreement(r1, r2) = FALSE condition, which corresponds to the extreme ratings provided by the two users (uj, uh) for item ‘Ii’.
The minimum and maximum proximity values for Agreement(r1, r2) = TRUE are 49 and 81, respectively; for Agreement(r1, r2) = FALSE, the minimum and maximum values are 1 and 25 (cf. Fig. 2(a)), so the range of proximity is 80. If Agreement(r1, r2) = TRUE, the minimum and maximum impact values are 1 and 9; if Agreement(r1, r2) = FALSE, they are 0.11 and 0.25, so the impact range is 8.89. If Agreement(r1, r2) = TRUE, the minimum and maximum popularity values are 1 and 5; if Agreement(r1, r2) = FALSE, both are 1, so the range of popularity is 4.
If the ‘uj’ rating is 5 and the ‘uh’ rating is 5, all three components take their highest values: 81 for proximity, 9 for impact, and 5 for popularity. The PIP is the product of proximity, impact, and popularity, so the resultant value is 3645. In this calculation, proximity has a far greater weight than impact and popularity; each component contributes in a different proportion. In the worst scenario, i.e., if the ‘uj’ rating is 5 and the ‘uh’ rating is 1, then proximity is 1, impact is 0.11, and popularity is 1, so the final ‘PIP(r1, r2)’ value is 0.11, which is very small. Here, proximity and popularity have equal weight, but impact has a much smaller value. The maximum PIP is 3645 and the minimum PIP is 0.11, a very large difference. Each factor provides important information for computing the similarity between users; yet their different ranges of values mean that a single component can dominate in different scenarios. This unequal scaling provides different ranges of values for different scenarios, and the component values are non-normalized. If a direct normalization procedure is adopted to convert the values into a particular range, the values change for different scenarios. This leads to lower prediction accuracy [40], which constitutes a major drawback of the existing PIP similarity measure.
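The component ranges reported in this section are consistent with Ahn's original formulas; the following sketch reconstructs them from those reported values and reproduces the 3645 and 0.11 extremes on the 1-5 scale (the item mean is fixed at 3, as in the text):

```python
def pip(r1, r2, r_min=1, r_max=5, item_mean=3.0):
    """PIP similarity for one pair of ratings, reconstructed from the
    component ranges reported in the text (Ahn's formulation)."""
    r_med = (r_max + r_min) / 2.0
    agree = not ((r1 > r_med and r2 < r_med) or (r1 < r_med and r2 > r_med))
    d = abs(r1 - r2)
    width = 2 * (r_max - r_min) + 1          # 9 on a 1-5 scale
    # Proximity: disagreement doubles the distance penalty.
    proximity = (width - d) ** 2 if agree else (width - 2 * d) ** 2
    # Impact: how far both ratings sit from the scale midpoint.
    strength = (abs(r1 - r_med) + 1) * (abs(r2 - r_med) + 1)
    impact = strength if agree else 1.0 / strength
    # Popularity: rewards agreement far away from the item's mean rating.
    popularity = 1 + ((r1 + r2) / 2.0 - item_mean) ** 2 if agree else 1.0
    return proximity * impact * popularity

print(pip(5, 5))            # 3645.0  (extreme agreement)
print(round(pip(5, 1), 2))  # 0.11    (extreme disagreement)
```

Multiplying out the extremes shows the scaling problem the text describes: 81 × 9 × 5 = 3645 versus 1 × 1/9 × 1 ≈ 0.11, a span of more than four orders of magnitude.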
4 Proposed method
Our proposed framework aims to provide an improved solution for the CF-based RS with a sparse data matrix. It consists of a modified PIP (MPIP) measure and a combined user and item-based prediction (CUIP) expression.
4.1 Modified PIP (MPIP) similarity measure
In the existing PIP expression, each component takes different values in different scenarios, and any one of the components can receive greater priority in a given situation. To avoid this, the component ranges are converted into the range of 0 to 1 in our modified similarity measure by changing the expressions.
4.1.1 Proposed proximity
The proposed proximity is a normalized value that ranges from 0 to 1, calculated using the absolute deviation between the two ratings r1 and r2.
where med+ is the median value of the positive ratings (i.e., the ratings at or above the median value of the rating scale) and med− is the median value of the negative ratings (i.e., the ratings below the median value of the rating scale). In the Agreement(r1, r2) = TRUE condition, the absolute difference between the two ratings is subtracted from the average of the positive and negative median values. To obtain a normalized value within the range of 0 to 1, the deviation between Rmax and Rmin is used. The positive and negative median values are included in the expression to capture the closeness of the ratings.
In the Agreement(r1, r2) = FALSE condition, the inverse of the deviation term is used to calculate the proximity value. Our reframed expression gives higher weightage to the Agreement(r1, r2) = TRUE condition and less weightage to the Agreement(r1, r2) = FALSE condition. In the FALSE condition, the positive and negative median values are not included because the two users lie on different sides of the rating scale. In both situations, the values remain within the range of 0 to 1.
In MPIP, a variable penalty (δ) is multiplied with the proximity value: a higher penalty is applied for higher deviation values, and a lesser penalty for lower deviation values. This conversion reduces the magnitude of the value.
4.1.2 Proposed impact
An exponential-based expression is used to compute the impact value for the Agreement(r1, r2) = TRUE case. This yields a normalized impact value for the TRUE situation.
If r1 and r2 are far from the median, the impact value is high. When both ratings are near the median, the impact value is low; this shows that both users agree with the median values, so the impact of the ratings is very small.
In the existing PIP measure, the impact values for the Agreement(r1, r2) = FALSE condition already lie within the range of 0 to 1. Therefore, the same expression is used for the proposed impact in the FALSE condition.
4.1.3 Proposed popularity
Popularity is the third component in PIP similarity measure, which includes both positive and negative popularity in this expression.
If two ratings are on the same side of the rating scale and the average of the two ratings is far from the item mean, then the popularity is very high: for popular or unpopular items, two users who provide similar kinds of ratings have a high similarity in their rating behavior. In the existing method, the minimum value for the Agreement(r1, r2) = TRUE situation is 1, and on that basis the popularity value is set to 1 for the Agreement(r1, r2) = FALSE condition. In the proposed method, the popularity values for Agreement(r1, r2) = TRUE range from 0.3010 to 0.778 on the 1-5 rating scale; the minimum of this range, 0.3010, is therefore chosen for all Agreement(r1, r2) = FALSE situations.
According to Fig. 3, the proposed proximity, impact, and popularity values all lie within the range of zero to one, in contrast to the existing method, which assigns a different weight to each component.
4.1.4 Similarity measure computation
The MPIP expression is the product of the proposed proximity, proposed impact, and proposed popularity, as shown in Eq. (41).
The similarity between users is computed as follows:
Similarity between ‘uj’ and ‘uh’ (sim(uj, uh)) is the item-wise summation of MPIP values.
The same procedure is used for computing the similarity between items, which helps to find similar items. A maximum similarity value shows that most users provide a similar pattern of ratings for the two items; likewise, a minimum similarity value shows that most users give a highly positive rating to one item and a highly negative rating to the other.
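As a concrete sketch, the item-wise summation can be written as below. The per-pair MPIP value of Eq. (41) is passed in as a callable because the component formulas are not reproduced in this excerpt; the `toy` stand-in is purely illustrative and is not the paper's expression.

```python
def mpip_similarity(ratings_j, ratings_h, mpip_value):
    """sim(uj, uh): item-wise summation of MPIP values over co-rated items.

    ratings_j, ratings_h : dicts mapping item id -> rating
    mpip_value           : callable (r1, r2, item) -> value in [0, 1],
                           standing in for the product of proposed
                           proximity, impact, and popularity (Eq. 41)
    """
    co_rated = set(ratings_j) & set(ratings_h)
    return sum(mpip_value(ratings_j[i], ratings_h[i], i) for i in co_rated)

# Illustrative stand-in only: rating closeness on a 1-5 scale times
# fixed dummy factors -- NOT the paper's component formulas.
toy = lambda r1, r2, item: (1 - abs(r1 - r2) / 4) * 0.5 * 0.5

uj = {"i1": 5, "i2": 3}
uh = {"i1": 4, "i2": 3, "i3": 1}
print(mpip_similarity(uj, uh, toy))   # 0.4375
```

The same function applies to item-item similarity by swapping the roles of users and items.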
A comparison of the minimum and maximum values is shown in Table 2.
Case 1: Positive or Negative rating.
Both uj and uh give an extreme positive or negative rating; i.e., in the extreme positive case both users give a rating of 5 for item ‘i’, and in the extreme negative case both give a rating of 1. The existing PIP method yields a proximity of 81, an impact of 9, a popularity of 5, and a PIP value of 3645; approximately 85% of the weight comes from the proximity component. With MPIP, the proposed proximity is 0.56, the proposed impact is 0.89, the proposed popularity is 0.778, and the MPIP value is 0.389.
Case 2: Median rating.
If uj and uh have a median rating, i.e., both users give a rating of 3 for item ‘i’, then the proximity is 81, the impact is 1, the popularity is 1, and the PIP value is 81. For MPIP, the proposed proximity is 0.56, the proposed impact is 0.50, the proposed popularity is 0.301, and the MPIP value is 0.084.
Case 3: Difference of opinion.
If uj and uh provide opposite extreme ratings for item ‘i’, i.e., uj gives a rating of 1 and uh gives a rating of 5 or vice versa, then the proximity is 1, the impact is 0.11, the popularity is 1, and the PIP value is 0.11. For MPIP, the proposed proximity is 0.002, the proposed impact is 0.11, the proposed popularity is 0.301, and the MPIP value is 0.0001. The MPIP value is near zero in this difference-of-opinion situation; i.e., minimal weight is given to the ratings because they lie at opposite extremes of the rating scale, and no relationship exists between the two users. This comparison confirms that MPIP values lie within the range of 0 to 1, whereas the existing method spans a vast range.
4.2 Validation for MPIP similarity measure
In MPIP, the maximum and minimum values range from 0 to 1. To validate the proposed similarity measure (i.e., the modified PIP), a rank correlation test is conducted. A set of rating pairs is generated for the 1-to-5 rating scale, and the PIP, PSS, and MPIP values are computed for each pair, together with the sets of proposed proximity, impact, and popularity values. The values generated for each pair of ratings are listed in Table 3.
The magnitude of the values changes in the case of MPIP. In the PSS method, each component also ranges from zero to one, but the ordering of the values changes because PSS gives equal weight to both the agreement and disagreement conditions. The rank correlation test is conducted on this set of values; the results are shown in Table 4.
The correlation between the PIP and MPIP components is one, showing a strong positive relationship between them. Owing to the violation of the agreement conditions, the PSS measure has much lower correlation values: the correlation between proximity and PSS proximity is high, but for impact and popularity the values are minimal. Moreover, a negative relationship exists between impact and the PSS significance term, and similarly between popularity and singularity. These results clearly show that violating the agreement condition provides misleading information about the similarity between two ratings; for this reason, the same agreement conditions are used in our proposed method. The high correlation between the PIP and MPIP components reveals that although the magnitudes change in our proposed method, the relative proportions are retained.
4.3 Existing prediction expression
The main objective of a CF-based RS is to predict user ratings based on the similarity measure. Many expressions are used in the prediction process. The user-mean-based prediction expression, a widely used method that provides a good solution for CF-based RS [5, 18, 36, 41,42,43], is shown as follows:
In this expression, ‘\( {P}_{u_j,{I}_i} \)’ is computed from the mean of user ‘j’ plus the weighted average deviation of the neighboring users ‘h’. The weighted average deviation of user ‘j’ itself is not considered in the prediction process. Moreover, only user-related information is used in the weighted deviation; item-related information is not included in the prediction expression.
As Fig. 4 shows, the user-related mean is used in user-based prediction, and the item-related mean in item-based prediction. In a sparse user-item rating matrix, both the mean and the deviation provide important information for rating prediction; relying on only one of them is a shortcoming of the conventional user-related or item-related prediction expressions.
4.3.1 Combined user and item-based prediction expression (CUIP)
A modified prediction expression is derived by incorporating the user-‘j’-related components in the user-based prediction; similarly, the item-‘i’-related components are included in the item-based prediction.
Ma and Hu [44] used a hybrid prediction expression in which an additional weight parameter ‘λ’ multiplies the user-based prediction and ‘1 − λ’ multiplies the item-based prediction, with λ varying between 0 and 1. If λ equals one, the prediction depends purely on the user-based component; if λ equals zero, it depends purely on the item-based component. Thus, λ is an additional parameter that must be optimized for each problem.
In the modified prediction expression, the average of the user-based and item-based predictions is taken, giving equal weight to both expressions. The modified prediction expression is shown below.
Where ‘UP’ is the user-based prediction, j ≠ h, j = {1, 2, …, n}, h = {1, 2, …, n′}, and n′ is the number of items co-rated by both users ‘j’ and ‘h’.
Where ‘IP’ is the item-based prediction, i ≠ q, i = {1, 2, …, m}, q = {1, 2, …, m′}, and m′ is the number of users who rated both items ‘i’ and ‘q’.
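As a structural sketch of CUIP, the following assumes textbook mean-plus-weighted-deviation forms for ‘UP’ and ‘IP’ and averages them with equal weight. The paper's Eqs. (45)–(46) additionally include the target user's and target item's own weighted deviation terms, which are not reproduced in this excerpt, so this illustrates the structure rather than the exact expressions.

```python
import numpy as np

def up(R, S_u, j, i):
    """User-based prediction: user j's mean plus the similarity-weighted
    average deviation of the neighbours h who rated item i (a textbook
    form; the paper's Eq. (45) also folds in user-j-related deviation
    terms whose exact expression is not reproduced here)."""
    mu = np.nanmean(R, axis=1)                 # per-user mean ratings
    h = np.flatnonzero(~np.isnan(R[:, i]))     # users who rated item i
    h = h[h != j]
    if h.size == 0:
        return mu[j]
    w = S_u[j, h]                              # similarities sim(uj, uh)
    d = np.abs(w).sum()
    return mu[j] + (w @ (R[h, i] - mu[h])) / d if d else mu[j]

def ip(R, S_i, j, i):
    """Item-based prediction: the same computation on the transpose."""
    return up(R.T, S_i, i, j)

def cuip(R, S_u, S_i, j, i):
    """CUIP (Eq. (47)): equal-weight average of the user- and item-based
    predictions, i.e. the fixed-lambda = 0.5 case of the hybrid of [44]."""
    return 0.5 * (up(R, S_u, j, i) + ip(R, S_i, j, i))

# Toy example: 3 users x 3 items, np.nan marks an unavailable rating.
R = np.array([[5., 3., np.nan],
              [4., 3., 1.],
              [2., 5., 4.]])
S_u = np.array([[1., .8, .1], [.8, 1., .2], [.1, .2, 1.]])  # user sims
S_i = np.array([[1., .5, .3], [.5, 1., .4], [.3, .4, 1.]])  # item sims
print(round(cuip(R, S_u, S_i, 0, 2), 3))
```

Because the weight is fixed at 0.5, no extra parameter λ needs to be tuned per dataset, which is the stated advantage over the hybrid expression of Ma and Hu [44].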
The similarity values and deviations play a vital role in the prediction expression. Existing prediction expressions use either the user-related mean and its weighted average deviation or the item-related mean and its weighted average deviation. The user-related mean \( {\overline{r}}_{u_j} \) and item-related mean \( {\overline{r}}_{I_i} \) are calculated for user ‘j’ and item ‘i’, but the corresponding deviations are not included; this is a shortcoming of the existing prediction expressions. In the modified prediction expression (CUIP), this shortcoming is removed by including the user- and item-related deviation terms.
Forecasting is a vital process in CF-based RS; the forecast values help a company to identify potential customers and promote sales. In CUIP, if the user has rated the particular item, i.e., \( {r}_{u_j,{I}_i}\ne \varnothing \), then the deviations carry values. If the rating is unavailable, i.e., \( {r}_{u_j,{I}_i}=\varnothing \), computing the user-related and item-related deviations becomes complex. To avoid this situation, the unavailable rating is replaced by the average of the user (uj) mean and the item (Ii) mean. For an unavailable rating, the user-based prediction is calculated using Eq. (48).
Similarly, the item-based prediction is calculated using Eq. (49).
Eq. (47) is used for computing the final prediction value of user ‘j’ for item ‘i’. The combination of the modified PIP (MPIP) similarity measure and the modified prediction expression (CUIP) is treated as the proposed method. The algorithm for the proposed method is shown below:
The block diagram for our proposed framework is shown in Fig. 5.
The proposed framework consists of two phases: the first is the computation of the similarity matrices (user similarity and item similarity), and the second is the prediction process. Initially, an input user-item rating matrix is required for computing similarity values. The similarity between users is computed using the MPIP expression (Eq. (42)); similarly, item similarity is computed using Eq. (43). Both user- and item-related similarity values provide valuable information for rating prediction. For available ratings, Eqs. (45), (46), and (47) are used to compute the prediction values; for unavailable ratings, Eqs. (48), (49), and (47) are adopted. Based on the predicted values, the items are sorted in descending order, and the top ‘k’ items are recommended to the users.
5 Experiments
Datasets such as MovieLens1MB, Netflix, Epinions, CiaoDVD, MovieTweet, FilmTrust, and MovieLens100KB are used to compare the conventional and proposed methods. We used MovieLens100KB and Netflix, as in [18], along with other benchmark datasets often used by researchers for CF-based RS. To validate our proposed method, eleven state-of-the-art methods (PCC, COS, JACC, JMSD, PIP, NHSM, BCFcorr, BCFMed, RJACC, RJMSD, and SOQN) are used for comparison.
5.1 Characteristics of sub-problems generated from the dataset
The dataset descriptions, number of users, number of items, number of ratings, and sparsity are listed in Table 5. The datasets are arranged by sparsity rank (from higher to lower sparsity).
[a] http://grouplens.org/datasets/movielens/1m
[b] http://grouplens.org/datasets/movielens/100k
[c] http://www.netflixprize.com
[d] http://www.trustlet.org/downloaded epinions.html.
[e] http://www.librec.net/datasets/CiaoDVD.zip
[f] https://github.com/sidooms/MovieTweetings
[g] http://www.librec.net/datasets/filmtrust.zip
The above-mentioned datasets are large; computing the similarity matrix and predicting ratings for data of this size entails high computational complexity. Therefore, subsets with different levels of sparsity \( \left(\left(1-\left(\frac{\#R}{n\ast m}\right)\right)\ast 100\right) \) are generated from the user-item rating matrix to validate the proposed method. The schema used for creating the different sub-problems is shown in Fig. 6.
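The sparsity expression above translates directly into code; the MovieLens100KB dimensions (943 users, 1,682 items, 100,000 ratings) are used as a familiar illustration.

```python
def sparsity_percent(num_ratings, num_users, num_items):
    """Sparsity level of a user-item rating matrix: (1 - #R/(n*m)) * 100."""
    return (1 - num_ratings / (num_users * num_items)) * 100

# MovieLens100KB-like dimensions: the matrix is roughly 93.7% empty.
print(round(sparsity_percent(100_000, 943, 1_682), 2))   # 93.7
```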
The schema comprises four levels: level 1 relates to the dataset, level 2 to users, level 3 to items, and level 4 to ratings. The user level consists of two sub-levels: in one, the number of users varies without any constraint; in the other, it is restricted to 25%, 50%, and 75% of the users in the dataset. The item level likewise consists of two sub-levels: one varies without constraint, and the other is restricted to 25%, 50%, and 75% of the items in the dataset. The final level concerns the ratings and varies the number of ratings from 1% to 50% of the dataset in increments of 1%.
For each dataset, 800 different sub-problems can be created using the above schema. The number of sub-problems created by each path is shown in Table 6.
5.2 Limitation of the above schema
In this schema, the maximum percentage of users and items is 75%. For this specific combination, it is difficult to create sub-problems for all rating levels because the required percentage of ratings (1% to 50%) cannot be extracted from the higher-order matrix. For all sub-problems, it is ensured that each user and item has a minimum of two co-rated values. In each sub-problem, it is challenging to generate exactly 1% to 50% of the ratings; to overcome this, an approximate percentage of rating values is extracted. For each dataset, this results in fewer than 800 sub-problems.
Table 7 provides the number of feasible sub-problems created for each dataset using the schema.
5.3 Performance criteria
The Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are the most commonly used evaluation criteria for validating CF-based RS performance [45, 46].
MAE is the mean absolute deviation between the actual and predicted ratings.
Where ‘n’ is the total number of users and ‘m’ is the total number of items; ‘\( {P}_{u_j,{I}_i} \)’ is the predicted rating of the jth user for the ith item, \( {r}_{u_j,{I}_i} \) is the corresponding actual rating, j = {1, 2, …, n}, and i = {1, 2, …, m}.
RMSE is the square root of the mean squared deviation between the actual and predicted ratings. RMSE is also known as the standard deviation of the residuals, or forecasting error. The formula is given below:
Smaller MAE and RMSE values indicate the better method. The predicted ratings (\( {P}_{u_j,{I}_i} \)) for the existing similarity measures, namely PCC, COS, JACC, JMSD, PIP, NHSM, BCFcorr, BCFMed, RJACC, RJMSD, and SOQN, are calculated using Eq. (44). The predicted ratings for the proposed method are calculated using Eqs. (45)–(47). Finally, the predicted ratings are rounded off to the nearest integer for both the existing and proposed methods.
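The two error measures can be sketched directly from their definitions; each list below holds hypothetical actual and (integer-rounded) predicted test ratings.

```python
import math

def mae(actual, predicted):
    """Mean absolute deviation between actual and predicted ratings (Eq. 50)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Square root of the mean squared deviation (Eq. 51)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual    = [4, 3, 5, 2]
predicted = [4, 2, 4, 4]              # already rounded to the nearest integer
print(mae(actual, predicted))         # 1.0
print(round(rmse(actual, predicted), 4))   # 1.2247
```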
5.4 Friedman rank test
The Friedman rank test is a non-parametric test for finding differences between methods across multiple replications [47, 48]. This test does not assume that the data come from a particular distribution (e.g., a normal distribution). The Friedman values are calculated using the following equation:
Where Fr is the calculated Friedman value, \( {R}_y^2 \) is the squared average rank of method ‘y’ (1 ≤ y ≤ l), ‘l’ is the total number of methods, ‘sp’ indexes the sub-problems, sp = {1, 2, …, TS}, and ‘TS’ is the total number of sub-problems generated. The hypotheses framed for this test are: the null hypothesis (H0) states that no significant difference exists between the performances of the different methods, whereas the alternative hypothesis (H1) states that a significant difference exists [49].
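Under the description above, the statistic can be computed from the per-sub-problem ranks with the standard Friedman formula (which is equivalent to the squared-average-rank form; since the paper's equation is not reproduced in this excerpt, this is our assumption about its exact shape):

```python
def friedman_stat(ranks_per_subproblem):
    """Friedman chi-square statistic from a table of ranks.

    ranks_per_subproblem : list of lists; row sp holds the ranks
    (1 = best ... l = worst) of the l methods on sub-problem sp.
    """
    ts = len(ranks_per_subproblem)        # number of sub-problems (TS)
    l = len(ranks_per_subproblem[0])      # number of methods
    # average rank of each method over all sub-problems
    avg = [sum(row[y] for row in ranks_per_subproblem) / ts for y in range(l)]
    return 12 * ts / (l * (l + 1)) * sum(r * r for r in avg) - 3 * ts * (l + 1)

# Toy data: three methods ranked over four sub-problems. A method that
# always ranks first drives Fr up, pushing towards rejection of H0.
ranks = [[1, 2, 3], [1, 3, 2], [1, 2, 3], [1, 3, 2]]
print(friedman_stat(ranks))   # 6.0
```

With l = 3 methods, the χ² critical value at 2 degrees of freedom and α = 0.05 is 5.99, so Fr = 6.0 just rejects the null hypothesis of equal performance.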
5.5 McNemar’s test
McNemar's test is a non-parametric test used to analyze whether the performances of two methods have a statistically significant difference [50, 51]. The contingency table used to compare the two methods is given in Table 8.
Where ‘A’ is the number of ratings correctly predicted by both Methods 1 and 2, ‘B’ is the number of ratings correctly predicted by Method 1 but incorrectly predicted by Method 2, ‘C’ is the number of ratings correctly predicted by Method 2 but incorrectly predicted by Method 1, and ‘D’ is the number of ratings incorrectly predicted by both methods.
The null hypothesis (H0) for this χ2 test is that the probabilities Pr(B) and Pr(C) are equal; the alternative hypothesis (H1) is that the performances of the two methods are not equal.
For clarity, a graphical representation of the McNemar contingency table values is presented in Fig. 7. The A value gives the number of ratings correctly predicted by both method 1 and method 2, and the D value gives the number of ratings incorrectly predicted by both. The B and C values are used to find the better method: if B is greater than C, method 1 performs better than method 2; similarly, if C is greater than B, method 2 performs better than method 1.
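A minimal sketch of the test statistic from the discordant cells B and C follows; whether the paper applies a continuity correction is not stated, so it is exposed here as an option (our assumption).

```python
def mcnemar_chi2(b, c, continuity=True):
    """McNemar chi-square from the discordant cells of Table 8.

    b : ratings correct under Method 1 only; c : correct under Method 2
    only. The Edwards continuity correction is optional because the
    paper does not state whether it is applied (our assumption).
    """
    if b + c == 0:
        return 0.0
    num = (abs(b - c) - 1) ** 2 if continuity else (b - c) ** 2
    return num / (b + c)

# chi2 table value for 1 degree of freedom at alpha = 0.05 is 3.841;
# a large B/C imbalance exceeds it, rejecting equal performance.
print(round(mcnemar_chi2(15, 40), 3))   # 10.473
```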
In addition to the McNemar test, the precision, recall, and F1-measure are calculated. These measures are computed from the confusion matrix by converting each rating into a good or bad recommendation. The format of the confusion matrix used for computing precision, recall, and F1-measure is given in Table 9.
Where True Positive (TP) is the number of actual good ratings provided by the user and correctly predicted by the CF-based RS, False Negative (FN) is the number of actual good ratings predicted as bad ratings, False Positive (FP) is the number of actual bad ratings predicted as good ratings, True Negative (TN) is the number of actual bad ratings correctly predicted.
The rating values are classified into two groups, i.e., good and bad ratings. Ratings that lie between Rmed and Rmax are treated as good ratings; similarly, values less than Rmed are treated as bad ratings. For example, on a 1-to-5 rating scale, the good rating values are 3, 4, and 5, and the bad ratings are 1 and 2. A similar process is carried out for other rating scales.
Precision is the ratio of the number of good-rating (i.e., relevant) recommendations to the total number of recommendations made.
Recall is the ratio of the number of good ratings recommended to the number of actual good ratings.
The F1-measure is the combination of precision and recall.
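From the confusion-matrix counts, the three measures can be sketched as follows; ratings at or above Rmed are binarized as good (so 3, 4, and 5 on a 1–5 scale), and F1 is taken in its usual harmonic-mean form (the text states only that it combines precision and recall):

```python
def prf1(actual, predicted, r_med=3):
    """Precision, recall, and F1 after binarizing ratings: values >= r_med
    are 'good', values below are 'bad' (Table 9 counts)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a >= r_med and p >= r_med for a, p in pairs)   # good predicted good
    fp = sum(a < r_med and p >= r_med for a, p in pairs)    # bad predicted good
    fn = sum(a >= r_med and p < r_med for a, p in pairs)    # good predicted bad
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy actual vs predicted ratings on a 1-5 scale.
print(prf1([5, 4, 2, 1, 3], [4, 2, 3, 1, 5]))
```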
Higher precision, recall, and F1-measure values indicate better performance. The overall comparison process is shown in Fig. 8.
5.6 Results and discussion
The results obtained for the proposed framework and the other methods using the performance criteria are compared and discussed below.
The number of feasible sub-problems generated using the schema varies for each dataset. For each sub-problem, the conventional methods and the proposed framework are applied to predict the rating values. The MAE and RMSE values are calculated for all sub-problems using Eqs. (50) and (51), respectively, generating a set of MAE and RMSE values for each dataset. From these sets, the minimum MAE and RMSE are chosen and listed in Tables 10 and 11.
Our proposed method provides the lowest MAE for all the datasets. The second-best results are obtained by SOQN for Epinions, FilmTrust, Netflix, and ML1MB; RJMSD attains the next-best solution for the CiaoDVD and MovieTweet datasets; and NHSM provides the next-best result for ML100KB. The percentage improvement of our proposed method over the second-best solution is 35.56% for Epinions, 45.21% for CiaoDVD, 33.8% for MovieTweet, 51.33% for FilmTrust, 46.03% for Netflix, 47.78% for ML1MB, and 47.47% for ML100KB. With the existing similarity measures and user-based prediction expression, high variations exist between the actual and predicted ratings; these variations lead to larger MAE values, which decrease the prediction quality. In our proposed approach, the variation is minimal and the predicted ratings coincide with the actual ratings in most cases, improving the effectiveness of the CF-based RS.
The table values indicate that the standard deviation of the prediction error is lowest for the proposed method. The percentage improvement of our proposed method over the next-lowest RMSE values is 42.16%, 41.62%, 42.50%, 61.54%, 48.59%, 44.49%, and 45.35% for Epinions, CiaoDVD, MovieTweet, FilmTrust, Netflix, ML1MB, and ML100KB, respectively. The comparisons of maximum MAE and RMSE values for all datasets are arranged in Tables 12 and 13.
Tables 12 and 13 show the maximum MAE and RMSE chosen for each method. The proposed method again achieves better solutions than the conventional methods, with lower MAE and RMSE values: on average, MAE improves by 46.71% and RMSE by 45.18% over the next-best results. The comparisons of average MAE and RMSE values are listed in Tables 14 and 15.
From Tables 14 and 15, it is noticed that, among all the methods listed, the proposed method attains the smallest MAE and RMSE values for all datasets. The proposed framework combines the MPIP similarity measure and the CUIP prediction expression. In MPIP, all three components are converted to the range of 0 to 1, which helps to find better similarity values between users or items; these values are the input to the prediction expression. In the proposed method, the item-related components and the deviation of user ‘j’ are included, leading to more accurate rating prediction. For calculating MAE and RMSE, the actual rating-scale ranges are used. Compared to the existing methods, the predicted ratings of our proposed method are closer to the actual ratings, which reduces the deviation between actual and predicted ratings.
The average MAE and RMSE values are computed over all the sub-problems. The percentage improvement of the proposed method over the other methods is calculated and listed in Table 16.
The results listed in Table 16 show that the SOQN method holds second place, obtaining the lowest average MAE and RMSE among the existing methods for the Epinions, CiaoDVD, and ML1MB datasets. Similarly, RJMSD provides the second-best solution for the FilmTrust and Netflix datasets, PCC for the MovieTweet dataset, and BCFcorr for the ML100KB dataset. Our proposed framework yields average improvements of 53.81% in MAE and 51.60% in RMSE over the conventional similarity measures PCC, COS, JACC, and JMSD. Similarly, the average improvements in MAE and RMSE are 49.44% and 49.24% over the similarity measures designed specifically for CF-based RS: PIP, NHSM, BCFcorr, BCFMed, RJACC, RJMSD, and SOQN. This improvement enhances the accuracy of the CF-based RS, and the above-mentioned tables indicate that the proposed method is superior to the other methods.
The Friedman rank test is conducted to test whether the performance of all methods is the same. The MAE values are computed for all sub-problems, and for each sub-problem (sp) the MAE values are ranked from 1 (best) to ‘l’ (worst). The Fr values are calculated from the squared average rank of each method; the Fr for each dataset is calculated and listed in Table 17.
The calculated Fr values are higher than the χ2 table value of 19.67 for 11 degrees of freedom at a significance level of 0.05. The p-values obtained from the Friedman test are near zero, which strongly supports the alternative hypothesis for all datasets: the performance of all the methods is not equal. The comparison of average rank values for the conventional methods and the proposed method across all datasets is shown in Table 18.
The average rank of our proposed method is 1 for all datasets, which indicates that the proposed framework provides the minimum MAE for all sub-problems generated using the schema. Similarly, a comparison of Fr values for RMSE is listed in Table 19.
The p-values of the Friedman rank test on RMSE in Table 19 result in acceptance of the alternative hypothesis for all datasets, revealing that the performance of all methods is not the same. The comparison of average rank values obtained using RMSE as the criterion for all datasets is listed in Table 20.
The rank values show that our proposed method outperforms the existing methods: compared to the other methods, the proposed framework attains the top rank for all sub-problems and yields a better solution for sparsity problems than the existing similarity measures with the user-based prediction expression.
To further validate the proposed method, a McNemar test is conducted on the confusion matrix. The maximum and minimum MAE values are chosen from the conventional methods for each dataset, and the corresponding confusion matrices are tested with the McNemar test, with the proposed framework treated as the reference. The calculated χ2 values for all datasets are listed in Table 21.
The calculated χ2 values are higher than the χ2 table value, resulting in acceptance of the alternative hypothesis for the McNemar test; the performances of the existing and proposed methods are not equal. The B and C values are the primary components required for conducting the McNemar test, which is performed with our proposed framework as the reference. The B values give the number of ratings incorrectly predicted by our proposed framework but correctly predicted by the existing method; similarly, C gives the number of ratings correctly predicted by our proposed framework but incorrectly predicted by the existing method. The comparison of B and C values is listed in Table 22.
The results in Table 22 clearly show that the C values are higher than the B values; i.e., the number of ratings correctly predicted by our proposed framework is higher than for the other existing methods.
The maximum MAE values are chosen from the feasible sub-problems generated for each dataset, and the corresponding predicted values are used to conduct the McNemar test on the confusion matrix. The calculated χ2 and p-values are listed in Table 23.
The calculated χ2 values are greater than the table value, supporting the alternative hypothesis; i.e., a significant difference exists between the conventional and proposed methods, and the proposed method is better than the existing methods for all datasets. Table 24 compares the B and C values for the maximum MAE.
The comparison of B and C values for the maximum MAE reveals that C is greater than B for all datasets: compared to the existing methods, our proposed method correctly predicts more good and bad ratings.
The precision, recall, and F1-measure are calculated for each sub-problem. The summary of average precision, recall, and F1-measure for the conventional and proposed methods are reported in Tables 25, 26, 27.
The comparison in Table 25 indicates that the average precision is higher for our proposed framework; of the total predicted ratings, most of the actual good-rating items are correctly predicted by our proposed method.
The results in Table 26 show that our proposed method attains higher recall than the other methods, indicating that a greater number of actual good-rating items are correctly predicted. Table 27 compares the average F1-measure values for the conventional and proposed methods.
The F1-measure incorporates both precision and recall, and the proposed method attains the best results for all datasets. To validate the effectiveness of the proposed method, the Friedman rank test is conducted on the F1-measure values; the Fr values are listed in Table 28.
The calculated Fr values confirm that the performance of all the methods is not equal. The average ranks used for the Fr calculation are shown in Table 29.
Table 29 indicates that the proposed method provides a higher F1-measure for all the feasible sub-problems; furthermore, most of the actual good-rating items are correctly recommended by the proposed method.
The main issue in CF-based RS is the sparsity problem. A new framework is introduced to solve the sparsity problem and to enhance the prediction performance. The experiments are conducted on various levels of sparse data. The MAE and RMSE results of our proposed approach are minimal compared to conventional similarity measures such as PCC, COS, JACC, and JMSD with the user-based prediction expression. The similarity measures designed specifically for CF-based RS (PIP, NHSM, BCFcorr, BCFMed, RJACC, RJMSD, and SOQN) with the user-based prediction expression also attain higher MAE and RMSE values than the proposed method for all the datasets. This clearly shows that the ratings predicted by the proposed framework are closer to the actual ratings, which minimizes the error values. The average rank of our proposed framework for MAE and RMSE is 1, which denotes that for the various sub-problems our proposed method provides better rating predictions. The McNemar test results in acceptance of the alternative hypothesis, which shows that the performances of the existing and proposed methods are not the same. Each conventional method is tested against our proposed method, and in every comparison our approach offers an improved solution in terms of prediction accuracy. Finally, the precision, recall, and F1-measure values of our proposed approach are above 0.8, higher than those of the other methods. The results obtained from the analysis indicate that our proposed framework improves CF-based RS performance by correctly predicting more good-rating items and reducing the misclassification error. This improved prediction helps a company to identify and recommend products that are more relevant to online users.
6 Conclusion
CF-based RS depends on similarity measures, of which PIP is one of the most popular techniques for calculating the similarity between users. However, the ranges of the proximity, impact, and popularity values are wide, and each component carries a different weight in different scenarios; i.e., the magnitude of each component differs. This is a serious limitation of the existing PIP measure; therefore, we have developed a modified PIP similarity measure that provides a common value range between 0 and 1 for all three components, giving them equal priority. We have also developed a modified prediction expression that includes the item-based average and weighted average deviation alongside the user-based average and weighted average deviation to improve accuracy. Finally, a procedure for forecasting unrated items has been introduced to improve the recommendation performance. The proposed framework was tested on various benchmark datasets, namely ML1MB, Netflix, ML100KB, CiaoDVD, Epinions, MovieTweet, and FilmTrust. The entire analysis was conducted on sub-problems with different levels of sparsity generated from the user-item rating matrix; the various levels of sparse data were created by varying the number of users, items, and ratings for every dataset. The results obtained by the proposed method were compared with those of the conventional methods: the proposed framework provides better results for all datasets, yielding lower MAE and RMSE than the existing methods. The McNemar test was conducted to validate the proposed method in terms of good and bad rating recommendations; this analysis results in acceptance of the alternative hypothesis (i.e., the performances of the conventional and proposed methods are not equal). Precision, recall, and F1-measure were also calculated to identify the best method based on the confusion matrix, and the proposed method attains the highest precision, recall, and F1-measure.
Finally, a Friedman rank test was conducted on the MAE, RMSE, and F1 values. The statistical results show that our proposed framework outperformed the other methods.
References
Baker T, Mackay M, Randles M, Taleb-Bendiab A (2013) Intention-oriented programming support for runtime adaptive autonomic cloud-based applications. Comput Electr Eng 39:2400–2412. https://doi.org/10.1016/j.compeleceng.2013.04.019
Karam Y, Baker T, Taleb-Bendiab A (2012) Intention-oriented modelling support for socio-technical driven elastic cloud applications. In: 2012 international conference on innovations in information technology, IIT 2012
Baker T, Taleb-Bendiab A, Randles M (2009) Auditable intention-oriented web applications using PAA auditing/accounting paradigm. Front Artif Intell Appl. https://doi.org/10.3233/978-1-60750-052-0-61
Ozsoy MG, Polat F, Alhajj R (2016) Making recommendations by integrating information from multiple social networks. Appl Intell 45:1047–1065. https://doi.org/10.1007/s10489-016-0803-1
Zhang J, Lin Y, Lin M, Liu J (2016) An effective collaborative filtering algorithm based on user preference clustering. Appl Intell 45:230–240. https://doi.org/10.1007/s10489-015-0756-9
Bobadilla J, Ortega F, Hernando A, Gutiérrez A (2013) Recommender systems survey. Knowledge-Based Syst 46:109–132. https://doi.org/10.1016/j.knosys.2013.03.012
Jiang S, Fang SC, An Q, Lavery JE (2019) A sub-one quasi-norm-based similarity measure for collaborative filtering in recommender systems. Inf Sci (Ny) 487:142–155. https://doi.org/10.1016/j.ins.2019.03.011
Hwangbo H, Kim YS, Cha KJ (2018) Recommendation system development for fashion retail e-commerce. Electron Commer Res Appl 28:94–101. https://doi.org/10.1016/j.elerap.2018.01.012
Shi X, Luo X, Shang M, Gu L (2017) Long-term performance of collaborative filtering based recommenders in temporally evolving systems. Neurocomputing 267:635–643. https://doi.org/10.1016/j.neucom.2017.06.026
Li Y, Lu L, Xuefeng L (2004) A hybrid collaborative filtering method for multiple-interests and multiple-content recommendation in E-commerce. Expert Syst Appl 28:67–77. https://doi.org/10.1016/j.eswa.2004.08.013
Cohen WW, Fan W (2000) Web-collaborative filtering: recommending music by crawling the web. Comput Netw 33:685–698. https://doi.org/10.1016/S1389-1286(00)00057-8
Wang Y, Deng J, Gao J, Zhang P (2017) A hybrid user similarity model for collaborative filtering. Inf Sci (Ny) 418–419:102–118. https://doi.org/10.1016/j.ins.2017.08.008
Bellogín A, Sánchez P (2017) Collaborative filtering based on subsequence matching: a new approach. Inf Sci (Ny) 418–419:432–446. https://doi.org/10.1016/j.ins.2017.08.016
Resnick P, Iacovou N, Suchak M, et al (1994) GroupLens: An open architecture for collaborative filtering of Netnews. In: CSCW
Shardanand U, Maes P (1995) Social information filtering: algorithms for automating "word of mouth". In: CHI '95: proceedings of the SIGCHI conference on human factors in computing systems
Bobadilla J, Hernando A, Ortega F, Gutiérrez A (2012) Collaborative filtering based on significances. Inf Sci (Ny) 185:1–17. https://doi.org/10.1016/j.ins.2011.09.014
Patra BK, Launonen R, Ollikainen V, Nandi S (2015) A new similarity measure using Bhattacharyya coefficient for collaborative filtering in sparse data. Knowledge-Based Syst 82:163–177. https://doi.org/10.1016/j.knosys.2015.03.001
Ahn HJ (2008) A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem. Inf Sci (Ny) 178:37–51. https://doi.org/10.1016/j.ins.2007.07.024
Basu C, Hirsh H, Cohen W (1998) Recommendation as classification: using social and content-based information in recommendation. In Recommender Systems. Papers from 1998 Workshop. Technical Report WS-98-08. AAAI Press
Krulwich B, Burkey C (1996) Learning user information interests through the extraction of semantically significant phrases. In: Proceedings of the AAAI spring symposium on machine learning in information access
Lang K (1995) NewsWeeder: learning to filter netnews. In: Proceedings of the 12th international conference on machine learning
Sheugh L, Alizadeh SH (2015) A note on Pearson correlation coefficient as a metric of similarity in recommender system. In: 2015 AI and robotics, IRANOPEN 2015 - 5th conference on artificial intelligence and robotics
Sarwar B, Karypis G, Konstan J, Riedl J (2001) Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th international conference on World Wide Web. https://doi.org/10.1145/371920.372071
Huang Z, Chen H, Zeng D (2004) Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering. ACM Trans Inf Syst 22:116–142. https://doi.org/10.1145/963770.963775
Papagelis M, Plexousakis D, Kutsuras T (2005) Alleviating the sparsity problem of collaborative filtering using trust inferences. In: Trust management (iTrust 2005). Springer
Liu H, Hu Z, Mian A, Tian H, Zhu X (2014) A new user similarity model to improve the accuracy of collaborative filtering. Knowledge-Based Syst 56:156–166. https://doi.org/10.1016/j.knosys.2013.11.006
Kurdija AS, Silic M, Vladimir K, Delac G (2018) Efficient global correlation measures for a collaborative filtering dataset. Knowledge-Based Syst 147:36–42. https://doi.org/10.1016/j.knosys.2018.02.013
Yu K, Schwaighofer A, Tresp V et al (2004) Probabilistic memory-based collaborative filtering. IEEE Trans Knowl Data Eng 16:56–69. https://doi.org/10.1109/TKDE.2004.1264822
Adomavicius G, Tuzhilin A (2005) Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans Knowl Data Eng 17:734–749
Su X, Khoshgoftaar TM (2009) A survey of collaborative filtering techniques. Adv Artif Intell 2009:1–19. https://doi.org/10.1155/2009/421425
Bag S, Kumar SK, Tiwari MK (2019) An efficient recommendation generation using relevant Jaccard similarity. Inf Sci (Ny) 483:53–64. https://doi.org/10.1016/j.ins.2019.01.023
Guo G, Zhang J, Yorke-Smith N (2013) A novel bayesian similarity measure for recommender systems. In: IJCAI International Joint Conference on Artificial Intelligence
Luo X, Xia Y, Zhu Q (2012) Incremental collaborative filtering recommender based on regularized matrix factorization. Knowledge-Based Syst 27:271–280. https://doi.org/10.1016/j.knosys.2011.09.006
Bobadilla J, Serradilla F (2009) The effect of sparsity on collaborative filtering metrics. In: Conferences in research and practice in information technology series
Herlocker JL, Konstan JA, Terveen LG, Riedl JT (2004) Evaluating collaborative filtering recommender systems. ACM Trans Inf Syst 22:5–53. https://doi.org/10.1145/963770.963772
Tan Z, He L (2017) An efficient similarity measure for user-based collaborative filtering recommender systems inspired by the physical resonance principle. IEEE Access 5:27211–27228. https://doi.org/10.1109/ACCESS.2017.2778424
Abualigah LMQ, Khader AT (2017) Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering. J Supercomput 73:4773–4795. https://doi.org/10.1007/s11227-017-2046-2
Abualigah LMQ (2019) Feature selection and enhanced krill herd algorithm for text document clustering. Studies in computational intelligence. Springer International Publishing, Cham
Abualigah LMQ, Hanandeh SE (2015) Applying genetic algorithms to information retrieval using vector space model. Int J Comput Sci Eng Appl 5:19–28. https://doi.org/10.5121/ijcsea.2015.5102
Saranya KG, Sudha Sadasivam G (2017) Modified heuristic similarity measure for personalization using collaborative filtering technique. Appl Math Inf Sci 11:307–315. https://doi.org/10.18576/amis/110137
Konstan JA, Miller BN, Maltz D, Herlocker JL, Gordon LR, Riedl J (1997) GroupLens: applying collaborative filtering to Usenet news. Commun ACM 40:77–87. https://doi.org/10.1145/245108.245126
Sarwar BM, Karypis G, Konstan J, Riedl J (2002) Recommender systems for large-scale e-commerce: Scalable neighborhood formation using clustering. In Proceedings of the fifth international conference on computer and information technology (Vol. 1, pp. 291–324)
Jamalzehi S, Menhaj MB (2016) A new similarity measure based on item proximity and closeness for collaborative filtering recommendation. In: 2016 4th international conference on control, instrumentation, and automation, ICCIA 2016
Ma H, King I, Lyu MR (2008) Effective missing data prediction for collaborative filtering
Chen Y, Wu C, Xie M, Guo X (2011) Solving the sparsity problem in recommender systems using association retrieval. J Comput 6:1896–1902. https://doi.org/10.4304/jcp.6.9.1896-1902
Singh S, Bag S, Jenamani M (2016) Relative similarity based approach for improving aggregate recommendation diversity. In: 12th IEEE international conference electronics, energy, environment, communication, computer, control: (E3-C3), INDICON 2015
Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32:675–701. https://doi.org/10.1080/01621459.1937.10503522
Derrac J, García S, Molina D, Herrera F (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol Comput 1:3–18. https://doi.org/10.1016/j.swevo.2011.02.002
Luo J, Liu Z (2020) Novel grey wolf optimization based on modified differential evolution for numerical function optimization. Appl Intell 50:468–486. https://doi.org/10.1007/s10489-019-01521-5
Roggo Y, Duponchel L, Huvenne JP (2003) Comparison of supervised pattern recognition methods with McNemar’s statistical test: application to qualitative analysis of sugar beet by near-infrared spectroscopy. Anal Chim Acta 477:187–200. https://doi.org/10.1016/S0003-2670(02)01422-8
Roggo Y, Duponchel L, Ruckebusch C, Huvenne JP (2003) Statistical tests for comparison of quantitative and qualitative models developed with near infrared spectral data. J Mol Struct 654:253–262. https://doi.org/10.1016/S0022-2860(03)00248-5
Manochandar, S., Punniyamoorthy, M. A new user similarity measure in a new prediction model for collaborative filtering. Appl Intell 51, 586–615 (2021). https://doi.org/10.1007/s10489-020-01811-3