A Linear Regression Approach to Multi-criteria Recommender System

Jhalani, Tanisha; Kant, Vibhor; Dwivedi, Pragya

doi:10.1007/978-3-319-40973-3_23


Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9714))

Included in the following conference series:

International Conference on Data Mining and Big Data


Abstract

A recommender system (RS) is a web personalization tool that recommends appropriate items to users, based on their preferences, from a large set of available items. Collaborative filtering (CF) is the most popular technique for recommending items based on the preferences of similar users. Most CF-based RSs work only on the overall rating of an item; however, the overall rating is not a good representative of a user's preferences for that item. This paper attempts to incorporate ratings on various criteria into CF, i.e., multi-criteria CF, in order to enhance its accuracy through multi-linear regression. We suggest the use of multi-linear regression for determining the weight of each individual criterion and for computing the overall rating of each item. Experimental results reveal that the proposed approach outperforms the classical approaches.


1 Introduction

Over the last decade, the amount of available information has expanded tremendously. Instead of helping users, this volume of information has caused the problem of information overload. To handle this explosive growth, a personalization tool is needed that can help a user obtain valid and appropriate information. The recommender system (RS) is one of the most successful personalization tools; it guides a user in selecting an appropriate item from a large set of alternatives [1, 2].

Generally, a recommender system employs one of three major filtering techniques, namely collaborative filtering (CF), content-based filtering (CBF) and hybrid filtering (HF). Among these, collaborative filtering is the most widely used. Most existing RSs are based on single-criterion collaborative filtering [3, 4]. In single-criterion CF, only the overall rating of an item is considered, although the overall rating depends on several different criteria. Instead of considering only a single criterion, multi-criteria CF (MCCF) uses multiple criteria [3, 5]. In heuristic approaches to MCCF, all criteria have the same priority. This is not optimal, because different users assign different priorities to the various criteria. It was therefore suggested in [3, 9] that the weights of these criteria can be computed using either machine learning techniques or appropriate statistical techniques. Based on the above discussion, the contributions of our paper can be summarized as follows:

First, we propose the use of a multi-linear regression approach for deriving the individual weight of each criterion.
Second, we aggregate similarities and ratings for the different criteria using these weights.
Third, we perform experiments on the popular and large Yahoo movie dataset, varying the number of users, and compare our approach with several benchmark algorithms for single-criterion and multi-criteria CF.

The rest of this paper is organized as follows: Sect. 2 describes the background related to MCCF and multi-linear regression. Section 3 discusses the proposed approach. Section 4 presents the experimental evaluation of the proposed approach. Finally, Sect. 5 provides some concluding remarks.

2 Background and Related Work

This section briefly describes multi-criteria collaborative filtering and multi-linear regression.

2.1 Multi-criteria Recommender System

In a multi-criteria RS, a user rates various criteria of an item. The complete process of multi-criteria CF can be summarized in the following three phases:

Phase 1 (Similarity computation): In this phase, the multi-criteria dataset is first divided into k single-criterion datasets (where k is the number of criteria), and similarities are then computed for each criterion separately using a similarity measure such as Pearson correlation or cosine similarity [3]. The overall similarity is then computed using an aggregation function [10, 11], which is expressed as follows:
$$\begin{aligned} Sim_{aggregate}(u,u')=\sum _{c=0}^{k}w_csim_c(u,u') \end{aligned}$$
(1)
where $w_c$ is the weight of criterion c. In the above equation, if the weights are the same for all criteria, i.e., $w_1=w_2=w_3=\ldots=w_k$, then the aggregation function reduces to the average of the similarities [3]. This technique is not appropriate for aggregation, because the weights may differ from criterion to criterion, and finding these weights is a challenge. Therefore, we use multi-linear regression to compute the weights for the various criteria.
Phase 2 (Neighborhood generation): After computing the similarities between the active user and the remaining users, a neighborhood set is formed as a collection of similar users, using either the nearest-neighbor approach (top N users) or a threshold-based approach.
Phase 3 (Prediction of unknown rating): In this phase, the unknown rating is predicted for each criterion separately using the following prediction function [7, 12], where $U_t$ denotes the neighborhood set formed in Phase 2:
$$\begin{aligned} r^p_{u,i}=\frac{1}{\sum _{u' \in U_{t}}|Sim(u,u')|}\sum _{u' \in U_t}Sim(u,u')\times r_{u',i} \end{aligned}$$
(2)
These per-criterion ratings are then aggregated, and the overall rating is predicted for the user [12, 13].
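
As a rough illustration of Phases 1 and 3, the sketch below aggregates per-criterion similarities as in Eq. (1) and predicts a rating as in Eq. (2). The function names, the toy numbers and the use of NumPy are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def aggregate_similarity(sims, weights):
    """Eq. (1): weighted sum of per-criterion similarities for one user pair."""
    sims = np.asarray(sims, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.dot(weights, sims))

def predict_rating(neighbor_sims, neighbor_ratings):
    """Eq. (2): similarity-weighted average of the neighbors' ratings."""
    s = np.asarray(neighbor_sims, dtype=float)
    r = np.asarray(neighbor_ratings, dtype=float)
    denom = np.abs(s).sum()
    if denom == 0.0:
        return None  # no usable neighbor, so no prediction is made
    return float(np.dot(s, r) / denom)

# Hypothetical similarities on the overall rating and four criteria.
sims = [0.8, 0.6, 0.9, 0.4, 0.7]
# Equal weights that sum to one turn Eq. (1) into a plain average.
equal_w = np.full(len(sims), 1.0 / len(sims))
avg_sim = aggregate_similarity(sims, equal_w)

# Hypothetical neighborhood of three users with their ratings for one item.
pred = predict_rating([0.9, 0.5, 0.1], [4.0, 3.0, 5.0])
```

With equal weights the aggregate is simply the mean of the per-criterion similarities, which is the heuristic baseline that the regression-based weights are meant to replace.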

2.2 Multi Linear Regression

Linear regression is a statistical technique for finding the relationship between a dependent variable Y and one or more independent variables [14, 15]. With a single independent variable it is called simple linear regression; with more than one independent variable it is known as multi-linear regression. Multi-linear regression can be represented as follows:

$$\begin{aligned} Y=w_0+w_1x_1+w_2x_2+...+w_kx_k \end{aligned}$$

(3)

where Y is the dependent variable and $x_1,x_2,...,x_k$ are the independent variables. $w_1,w_2,...,w_k$ are the weight parameters corresponding to the independent variables and $w_0$ is the constant term; all of them are estimated from a set of observations. In the proposed approach, multi-linear regression is used to find the weights for the different criteria.
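
To make Eq. (3) concrete, the following minimal sketch estimates the weights by ordinary least squares with NumPy. The observations and the "true" weights are hypothetical values chosen so that the fit can be checked; they are not data from the paper.

```python
import numpy as np

# Hypothetical observations: each row holds k = 2 independent variables.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])

# Dependent variable generated from known weights (w0, w1, w2).
true_w = np.array([0.5, 1.5, -1.0])
y = true_w[0] + X @ true_w[1:]

# Prepend a column of ones so that w0 is estimated along with w1..wk.
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of A w = y.
w, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Because the toy data are noise-free, the recovered `w` matches `true_w` up to floating-point error; with real ratings the fit would only be approximate.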

3 Proposed Recommendation Approach

This section describes the proposed multi-criteria recommender system, which is based on multi-linear regression. Multi-linear regression is used to aggregate the similarities and to find the overall ratings by means of a weight for each criterion. Before presenting the approach, we describe the inputs required by the system. For a multi-criteria RS, let $ U=\{u_1,u_2,u_3,...,u_n\}$ be the set of n users, $ I=\{i_1,i_2,i_3,...,i_m\}$ the set of m items, and $ C=\{c_1,c_2,c_3,...,c_k\}$ the set of k criteria. The rating vector of user u for item i is represented as $R(u,i)=(r_{u,i}^0,r_{u,i}^1,r_{u,i}^2,...,r_{u,i}^k)$, which consists of an overall rating $r_{u,i}^0$ and k criteria ratings $r_{u,i}^1,r_{u,i}^2,...,r_{u,i}^k$. The proposed system has the following three phases:

Phase 1: Multi-linear regression based similarity computation

Phase 2: Neighborhood generation

Phase 3: Multi-linear regression approach to prediction

Phase 1. Multi-linear regression based similarity computation:

In the proposed multi-criteria RS, the following two steps are required for similarity computation.
- Step 1 (Similarity computation for each criterion): In this step, the multi-criteria ratings are divided into k sets of single-criterion ratings, and the similarity between users u and u' is then computed for each criterion as follows:
  $$\begin{aligned} sim^c(u,u')=\frac{\sum _{i \in I}(r_{u,i}^c - \bar{r}_u^c) (r_{u',i}^c - \bar{r}_{u'}^c)}{\sqrt{\sum _{i \in I}{(r_{u,i}^c - \bar{r}_u^c)}^2}\sqrt{\sum _{i \in I}{(r_{u',i}^c - \bar{r}_{u'}^c)}^2}} \end{aligned}$$
  (4)
  where c represents the different criteria, i.e., $c=\{1,2,3,...,k\}$.
- Step 2 (Aggregation of similarities): In this step, the overall similarity is computed using the following equation:
  $$\begin{aligned} sim(u,u')=w_0+\sum _{c \in \{1,...,k\}}w_csim^c(u,u') \end{aligned}$$
  (5)
  where $sim^c(u,u')$ is the similarity between user u and $u' \in U$ for criterion $c \in \{1,...,k\}$, $w_c$ is the weight parameter for criterion c, and $w_0$ is the constant term of the regression (referred to here as the error term).
  
  Using multi-linear regression, the weight parameters are estimated from the items previously rated by users, which form the training data. Table 1 represents the training data.
  Table 1. Presentation of training data
  where $C_1,C_2,...,C_k$ are the single-criterion ratings and $C_0$ is the overall rating; $r_{k,i}$ is the rating for criterion k in the $i^{th}$ training sample, $ i\in \{1,2,...,n\}$. Based on the training data, the weight values are derived using the following equation in matrix form [5]:
  $$\begin{aligned} \begin{bmatrix} w_1\\ \vdots \\ w_k \end{bmatrix} = \begin{bmatrix} \sum \nolimits _{i}u_{1,i}^2&\dots&\sum \nolimits _{i}u_{1,i}u_{k,i} \\ \vdots&\ddots&\vdots \\ \sum \nolimits _{i}u_{1,i}u_{k,i}&\dots&\sum \nolimits _{i}u_{k,i}^2\\ \end{bmatrix}^{-1} \begin{bmatrix} \sum \nolimits _{i}u_{1,i}v_{i}\\ \vdots \\ \sum \nolimits _{i}u_{k,i}v_{i} \end{bmatrix} \end{aligned}$$
  (6)
  where,
  $$\begin{aligned} \sum _{i \in \{1,...,n\}}u_{j,i}u_{k,i}=\sum _{i \in \{1,...,n\}}r_{j,i}r_{k,i} -\frac{\sum _{i \in \{1,...,n\}}r_{j,i}\sum _{i \in \{1,...,n\}}r_{k,i}}{n} \end{aligned}$$
  (7)
  
  $$\begin{aligned} \sum _{i \in \{1,...,n\}}u_{j,i}v_{i}=\sum _{i \in \{1,...,n\}}r_{j,i}r_{0,i} -\frac{\sum _{i \in \{1,...,n\}}r_{j,i}\sum _{i \in \{1,...,n\}}r_{0,i}}{n} \end{aligned}$$
  (8)
  Here, n is the total number of samples in the training data and $j \in \{1,2,...,k\}$. The term $w_0$ is computed from the mean ratings $\bar{r_0},\bar{r_1},...,\bar{r_k}$ of the training data as follows:
  $$\begin{aligned} w_0=\bar{r_0}-w_1\bar{r_1}-w_2\bar{r_2}-...-w_k\bar{r_k} \end{aligned}$$
  (9)
  By applying these weight values in Eq. (5) overall similarity is calculated.
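
The two steps of Phase 1 can be sketched as follows, under stated assumptions: `pearson` implements Eq. (4) for one criterion over the ratings passed to it (assumed here to be the items rated by both users), `regression_weights` follows Eqs. (6) to (9), and `overall_similarity` applies Eq. (5). All function names and the toy training data are hypothetical, and the linear system is solved directly instead of forming the explicit inverse shown in Eq. (6).

```python
import numpy as np

def pearson(a, b):
    """Eq. (4) for one criterion; a and b are the two users' ratings."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da, db = a - a.mean(), b - b.mean()
    denom = np.sqrt((da ** 2).sum()) * np.sqrt((db ** 2).sum())
    return float((da * db).sum() / denom) if denom > 0 else 0.0

def regression_weights(criteria, overall):
    """Eqs. (6)-(9): returns (w0, w) for an (n, k) matrix of criteria ratings
    and a length-n vector of overall ratings."""
    R = np.asarray(criteria, dtype=float)
    r0 = np.asarray(overall, dtype=float)
    n = R.shape[0]
    col_sums = R.sum(axis=0)
    # Eq. (7): centered sums of products between pairs of criteria.
    S_uu = R.T @ R - np.outer(col_sums, col_sums) / n
    # Eq. (8): centered sums of products between criteria and overall rating.
    S_uv = R.T @ r0 - col_sums * r0.sum() / n
    # Eq. (6): solve S_uu w = S_uv (equivalent to multiplying by the inverse).
    w = np.linalg.solve(S_uu, S_uv)
    # Eq. (9): constant term from the mean ratings.
    w0 = r0.mean() - w @ R.mean(axis=0)
    return float(w0), w

def overall_similarity(w0, w, sims):
    """Eq. (5): regression-weighted aggregate of per-criterion similarities."""
    return float(w0 + np.dot(w, sims))

# Hypothetical training data: n = 6 rated items, k = 2 criteria.
criteria = np.array([[5, 3], [4, 4], [2, 1], [3, 5], [1, 2], [4, 2]], dtype=float)
overall = 0.2 + 0.6 * criteria[:, 0] + 0.3 * criteria[:, 1]
w0, w = regression_weights(criteria, overall)
```

In this noise-free toy example the procedure recovers the constant term 0.2 and the weights 0.6 and 0.3 that were used to generate the overall ratings. Solving the system with `np.linalg.solve` is a numerical convenience and gives the same weights as Eq. (6) whenever the matrix is invertible.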
Phase 2. Neighborhood generation:

This phase is the same as Phase 2 of MCCF described in Sect. 2.1.
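
The two neighborhood strategies mentioned for MCCF, top N users and a similarity threshold, can be sketched as below. The helper names, user identifiers and similarity values are hypothetical.

```python
def top_n_neighbors(similarities, n):
    """Keep the n users with the highest similarity to the active user."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return [user for user, _ in ranked[:n]]

def threshold_neighbors(similarities, threshold):
    """Keep every user whose similarity reaches the threshold."""
    return [user for user, sim in similarities.items() if sim >= threshold]

# Hypothetical overall similarities between the active user and four others.
sims = {"u2": 0.91, "u3": 0.15, "u4": 0.64, "u5": -0.30}

nearest = top_n_neighbors(sims, 2)
above = threshold_neighbors(sims, 0.5)
```

The top N variant always returns a fixed-size neighborhood, whereas the threshold variant may return few or no neighbors for users with unusual tastes.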
Phase 3. Multi linear regression approach to prediction:

In this phase, we predict the overall rating using Eq. (2), which relies on the overall ratings given to an item by the nearest neighbors. The important task in this phase is therefore to compute the overall rating of an item from its criteria ratings. We again employ a linear regression approach to aggregate the criteria ratings. The aggregation function for this task is expressed as follows:
$$\begin{aligned} r(u,u')=w_0+\sum _{c \in \{1,...,k\}}w_cr_c \end{aligned}$$
(10)
where $r_c$ represents the rating for criterion $ c \in \{1,...,k\}$, $w_c$ is the weight parameter for criterion c, and $w_0$ is the constant term (error term). These weights are calculated using Eq. (6), and the overall rating is then computed. After finding the overall rating, we use Eq. (2) to predict the unknown rating for the active user. Finally, the items with the highest predicted ratings are recommended to the user.
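
One possible reading of Phase 3 is sketched below: Eq. (10) is applied to each neighbor's criteria ratings to obtain an overall rating, and Eq. (2) then combines these across the neighborhood. The function names, the weights and the neighbor data are hypothetical, and the paper does not spell out this exact order of operations.

```python
import numpy as np

def overall_from_criteria(w0, w, criteria_ratings):
    """Eq. (10): overall rating estimated from the criteria ratings."""
    return float(w0 + np.dot(w, criteria_ratings))

def predict_for_active_user(neighbor_sims, neighbor_criteria, w0, w):
    """Apply Eq. (10) to each neighbor, then Eq. (2) across the neighborhood."""
    overall = np.array(
        [overall_from_criteria(w0, w, c) for c in neighbor_criteria]
    )
    s = np.asarray(neighbor_sims, dtype=float)
    denom = np.abs(s).sum()
    return float(np.dot(s, overall) / denom) if denom > 0 else None

# Hypothetical regression weights for k = 2 criteria.
w0, w = 0.2, np.array([0.6, 0.3])
# Hypothetical criteria ratings of two neighbors for the target item.
neighbor_criteria = [[5.0, 4.0], [3.0, 2.0]]
pred = predict_for_active_user([0.8, 0.2], neighbor_criteria, w0, w)
```

Here the two neighbors' estimated overall ratings are 4.4 and 2.6, and the similarity-weighted prediction for the active user is 4.04.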

4 Experiments and Results

We performed various experiments to analyze the effectiveness of the proposed multi-criteria recommender system using the Yahoo movie dataset, which consists of 6078 users and 976 items. Each item is rated on five criteria, of which four are individual features and the fifth is the overall rating. For the experiments, a 10-fold cross-validation mechanism is used. In each fold, 60 % of each user's data is used as training data and 40 % as test data. The training data is used to train the system and the test data is used to analyze its performance. To evaluate the performance of the proposed system, we use mean absolute error (MAE), coverage, recall and f-measure as evaluation metrics.
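
The paper does not reproduce the metric formulas here, so the sketch below uses common textbook definitions as an assumption: MAE over the test ratings that received a prediction, coverage as the fraction of test ratings that received a prediction, and recall and f-measure computed by treating ratings at or above a relevance threshold as relevant. The threshold value of 4.0 and all names and numbers are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference over the ratings that were predicted."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if p is not None]
    if not pairs:
        return None
    return sum(abs(a - p) for a, p in pairs) / len(pairs)

def coverage(predicted):
    """Fraction of test ratings for which a prediction could be made."""
    return sum(p is not None for p in predicted) / len(predicted)

def recall_f_measure(actual, predicted, threshold=4.0):
    """Recall and f-measure with 'relevant' meaning rating >= threshold."""
    relevant = {i for i, a in enumerate(actual) if a >= threshold}
    recommended = {
        i for i, p in enumerate(predicted) if p is not None and p >= threshold
    }
    hits = len(relevant & recommended)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(recommended) if recommended else 0.0
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return recall, f

# Hypothetical test ratings; None marks an item that could not be predicted.
actual = [5.0, 3.0, 4.0, 2.0, 5.0]
predicted = [4.5, 3.5, None, 4.0, 3.0]

mae = mean_absolute_error(actual, predicted)
cov = coverage(predicted)
rec, f1 = recall_f_measure(actual, predicted)
```

Lower MAE indicates better predictive accuracy, while higher coverage, recall and f-measure indicate that more items can be predicted and that relevant items are recommended more reliably.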

To demonstrate the feasibility and effectiveness of the proposed system, we compared our results with the following approaches:

Single criterion CF (SCCF)
Multi-criteria collaborative filtering using average similarity and average rating (MCCF-A)
Multi-criteria collaborative filtering using average similarity and overall rating (MCCF-AO)
Multi-criteria collaborative filtering using minimum similarity and average rating (MCCF-MA)
Multi-criteria collaborative filtering using minimum similarity and overall rating (MCCF-MO)

Table 2. Performance comparison via MAE, coverage, recall, f-measure


Table 3. Performance comparison of the proposed approach with other approaches for different numbers of users


4.1 Experiment 1

In this experiment, we calculate the predictive and classification accuracy of the proposed approach via MAE, coverage, recall and f-measure. Table 2 presents the results for these measures when the 30 % most similar users are taken as neighbors, and shows that the proposed approach outperforms the other approaches on these measures. Figs. 1 and 2 show the results on MAE and f-measure for different percentages of users (10 %, 20 %, 30 % and 40 %). They reveal that the proposed approach has the minimum MAE and the maximum f-measure.

4.2 Experiment 2

This experiment examines the scalability of the proposed approach. For this experiment, we chose six different subsets of the Yahoo movie dataset, called Y_1000, Y_2000, Y_3000, Y_4000, Y_5000 and Y_6078. Table 3 shows the effectiveness of the proposed approach under a varying number of participating users. Fig. 3 shows the f-measure results for the different schemes on the different subsets of the dataset.

5 Conclusion

In this work, we have presented a linear regression based multi-criteria recommender system (MCRS), in which linear regression is used to aggregate the similarity components on the various criteria and to compute the overall rating. Different users generally have different priorities for the various criteria, which they evaluate according to their own perceptions. Aggregating the per-criterion similarities is a challenging task in MCRS, because the weights used in heuristic aggregation are not optimal. We have used a linear regression approach to estimate these weights from the training data. Experimental results on the popular Yahoo movie dataset show that adopting linear regression in MCRS produces quality recommendations and that the proposed approach outperforms the other heuristic approaches.

In future work, we plan to handle the uncertainty associated with user preferences using fuzzy sets [16], and we will explore new methods for dealing with the problems of correlation-based similarity.

References

1. Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 17(6), 734–749 (2005)
2. Bobadilla, J., Ortega, F., Hernando, A., Gutiérrez, A.: Recommender systems survey. Knowl. Based Syst. 46, 109–132 (2013)
3. Adomavicius, G., Kwon, Y.: New recommendation techniques for multicriteria rating systems. IEEE Intell. Syst. 22(3), 48–55 (2007)
4. Soboroff, I., Nicholas, C.: Combining content and collaboration in text filtering. In: International Joint Conference on Artificial Intelligence, pp. 86–92 (1999)
5. Balabanović, M., Shoham, Y.: Fab: content-based collaborative recommendation. Commun. ACM 40(3), 66–72 (1997)
6. Kant, V.: A user-oriented content based recommender system based on reclusive methods and interactive genetic algorithm. In: Bansal, J.C., Singh, P.K., Deep, K., Pant, M., Nagar, A.K. (eds.) Proceedings of Seventh International Conference on Bio-Inspired Computing: Theories and Applications (BIC-TA 2012). Advances in Intelligent Systems and Computing, vol. 201, pp. 543–554. Springer, India (2013)
7. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J.: GroupLens: an open architecture for collaborative filtering of netnews. In: ACM Conference on Computer Supported Cooperative Work, pp. 175–186. ACM (1994)
8. Breese, J.S., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for collaborative filtering. In: 14th Conference on Uncertainty in Artificial Intelligence, pp. 43–52. Morgan Kaufmann Publishers Inc., San Francisco (1998)
9. Al-Shamri, M.Y.H., Bharadwaj, K.K.: Fuzzy-genetic approach to recommender systems based on a novel hybrid user model. Expert Syst. Appl. 35(3), 1386–1399 (2008)
10. Delgado, J., Ishii, N.: Memory-based weighted majority prediction. In: SIGIR Workshop on Recommender System. Citeseer (1999)
11. Jannach, D., Karakaya, Z., Gedikli, F.: Accuracy improvements for multi-criteria recommender systems. In: 13th ACM Conference on Electronic Commerce, pp. 674–689. ACM (2012)
12. Winarko, E., Hartati, S., Wardoyo, R.: Improving the prediction accuracy of multi-criteria collaborative filtering by combination algorithms. Int. J. Adv. Comput. Sci. App. 52(4), 52–58 (2014)
13. Bilge, A., Kaleli, C.: A multi-criteria item-based collaborative filtering framework. In: 11th International Joint Conference on Computer Science and Software Engineering, pp. 18–22. IEEE (2014)
14. Agarwal, B.L.: Basic Statistics. New Age International (2006)
15. Kutner, M.H.: Applied Linear Statistical Models, vol. 4. Irwin, Chicago (1996)
16. Kant, V., Bharadwaj, K.: Integrating collaborative and reclusive methods for effective recommendations: a fuzzy Bayesian approach. Int. J. Intell. Syst. 28(11), 1099–1123 (2013)


Author information

Authors and Affiliations

The LNMIIT, Jaipur, 302031, India
Tanisha Jhalani & Vibhor Kant
MNNIT Allahabad, Allahabad, 211004, India
Pragya Dwivedi


Corresponding author

Correspondence to Vibhor Kant.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Ying Tan
Xi'an Jiaotong-Liverpool University, Suzhou, China
Yuhui Shi


Jhalani, T., Kant, V., Dwivedi, P. (2016). A Linear Regression Approach to Multi-criteria Recommender System. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2016. Lecture Notes in Computer Science(), vol 9714. Springer, Cham. https://doi.org/10.1007/978-3-319-40973-3_23


DOI: https://doi.org/10.1007/978-3-319-40973-3_23
Published: 14 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40972-6
Online ISBN: 978-3-319-40973-3
eBook Packages: Computer Science, Computer Science (R0)
