
1 Introduction

Over the last decade, the amount of available information has grown tremendously. Instead of helping users, this volume of information has caused the problem of information overload. To handle this explosive growth, a personalization tool is needed that can help a user find valid and appropriate information. The recommender system (RS) is one of the most successful personalization tools; it guides a user in selecting an appropriate item from a large set of alternatives [1, 2].

Generally, recommender systems employ three major filtering techniques, namely collaborative filtering (CF), content-based filtering (CBF) and hybrid filtering (HF). Among these, collaborative filtering is the most widely used. Most existing RSs are based on single criterion collaborative filtering [3, 4]. In single criterion CF, only the overall rating of an item is considered, but the overall rating of an item depends on several criteria. So, instead of considering only a single criterion, multiple criteria should be used, as in multi-criteria CF (MCCF) [3, 5]. In heuristic approaches to MCCF, all criteria have the same priority, but this is not optimal because different users place different priorities on the various criteria. It was therefore suggested in [3, 9] that weights for these criteria can be computed using either machine learning techniques or appropriate statistical techniques. Based on the above discussion, the contributions of our paper can be summarized as follows:

  • First, we propose the use of a multi linear regression approach for deriving an individual weight for each criterion.

  • Second, we aggregate similarities and ratings for different criteria using these weights.

  • Third, we perform rigorous experiments on the popular and large Yahoo movie dataset by varying the number of users, and compare our approach with various benchmark algorithms for single criterion and multi-criteria CF.

The rest of this paper is organized as follows: Sect. 2 describes the background related to MCCF and multi linear regression. Sect. 3 discusses the proposed approach. Sect. 4 presents the experimental evaluation of the proposed approach. Finally, Sect. 5 provides some concluding remarks.

2 Background and Related Work

This section briefly describes multi-criteria collaborative filtering and multi linear regression.

2.1 Multi-criteria Recommender System

In a multi-criteria RS, a user rates various criteria of an item. The complete process of multi-criteria CF can be summarized in the following three phases:

  • Phase 1 (Similarity computation): In this phase, the multi-criteria dataset is first divided into k single criterion datasets (where k is the number of criteria), and similarities are then computed for each criterion separately using a similarity measure such as Pearson correlation or cosine similarity [3]. The overall similarity is then computed using an aggregation function [10, 11], which is expressed as follows:

    $$\begin{aligned} Sim_{aggregate}(u,u')=\sum _{c=0}^{k}w_csim_c(u,u') \end{aligned}$$
    (1)

    where \(w_c\) is the weight of each criterion. In the above equation, if the weights are the same for all criteria, i.e., \(w_1=w_2=w_3=\dots=w_k\), then the aggregation function is similar to the average of the similarities [3]. This is not appropriate for aggregation, because the weights may differ for each criterion, and finding these weights is a challenge. Therefore, we use multi linear regression to compute the weights for the various criteria.

  • Phase 2 (Neighborhood generation): After computing the similarities between the active user and the remaining users, the neighborhood set is formed as a collection of similar users, using either the nearest neighbor approach (Top N users) or a threshold based approach.

  • Phase 3 (Prediction of unknown rating): In this phase, the unknown rating is predicted for each criterion separately using the following prediction function [7, 12]:

    $$\begin{aligned} r^p_{u,i}=\frac{1}{\sum _{u' \in U_{t}}|Sim(u,u')|}\sum _{u' \in U_t}Sim(u,u')\times r_{u',i} \end{aligned}$$
    (2)

    where \(U_t\) is the neighborhood set of user u. These ratings are then aggregated and the overall rating is predicted for the user [12, 13].
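The three phases above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`aggregate_similarity`, `top_n_neighbors`, `predict_rating`) and the numbers in the usage note are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def aggregate_similarity(sims, weights):
    """Eq. (1): weighted sum of the per-criterion similarities."""
    return float(np.dot(weights, sims))

def top_n_neighbors(sim_by_user, n):
    """Phase 2 (Top N variant): the n users most similar to the active user."""
    return sorted(sim_by_user, key=sim_by_user.get, reverse=True)[:n]

def predict_rating(sims, ratings):
    """Eq. (2): similarity-weighted average of the neighbors' ratings.

    Returns None when all similarities are zero, since Eq. (2) is then undefined.
    """
    sims = np.asarray(sims, dtype=float)
    ratings = np.asarray(ratings, dtype=float)
    denom = np.abs(sims).sum()
    if denom == 0:
        return None
    return float(np.dot(sims, ratings) / denom)
```

For example, with equal weights of 1/3 on three criteria, `aggregate_similarity([0.2, 0.4, 0.6], [1/3, 1/3, 1/3])` is the plain average of the three similarities, which is the heuristic case discussed in Phase 1.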

2.2 Multi Linear Regression

Linear regression is a statistical technique for finding the relationship between a dependent variable Y and an independent variable X [14, 15]. With a single independent variable it is called simple linear regression; with more than one independent variable it is known as multi linear regression. Multi linear regression can be represented as follows:

$$\begin{aligned} Y=w_0+w_1x_1+w_2x_2+...+w_kx_k \end{aligned}$$
(3)

where Y is the dependent variable and \(x_1,x_2,...,x_k\) are the independent variables. \(w_0\) is the intercept and \(w_1,w_2,...,w_k\) are the weight parameters corresponding to the independent variables; all of them are computed on the basis of some observations. In the proposed approach, multi linear regression is used to find the weights for the different criteria.
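As an illustration of Eq. (3), the weights can be estimated from observations by ordinary least squares. The sketch below assumes NumPy and uses `numpy.linalg.lstsq`; the helper name `fit_weights` and the toy data are made up for this example and are not taken from the paper.

```python
import numpy as np

def fit_weights(X, y):
    """Least-squares fit of y ~ w0 + w1*x1 + ... + wk*xk.

    X has one row per observation and one column per independent variable.
    Returns (w0, w), where w is the array (w1, ..., wk).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient plays the role of w0.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(coef[0]), coef[1:]

# Toy observations generated exactly from y = 1 + 2*x1 + 0.5*x2 (made-up numbers).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * a + 0.5 * b for a, b in X]
w0, w = fit_weights(X, y)
```

Because the toy data contain no noise, the fit recovers the generating coefficients up to floating-point error; with real ratings the fit would only approximate the observations.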

3 Proposed Recommendation Approach

This section describes the proposed multi-criteria recommender system, which utilizes multi linear regression. Multi linear regression is used to aggregate the similarities and to find the overall ratings by using a weight for each criterion. Before presenting the proposed approach, we discuss the inputs required by our system. For a multi-criteria RS, let \( U=\{u_1,u_2,u_3,...,u_n\}\) be the set of n users, \( I=\{i_1,i_2,i_3,...,i_m\}\) the set of m items, and \( C=\{c_1,c_2,c_3,...,c_k\}\) the set of k criteria. The rating vector of user u for item i is represented as \(R(u,i)=(r_{u,i}^0,r_{u,i}^1,r_{u,i}^2,...,r_{u,i}^k)\), which consists of an overall rating \(r_{u,i}^0\) and k criteria ratings \(r_{u,i}^1,r_{u,i}^2,...,r_{u,i}^k\). Our proposed system has the following three phases:

Phase 1: Multi-linear regression based similarity computation

Phase 2: Neighborhood generation

Phase 3: Multi-linear regression approach to prediction

  • Phase 1. Multi-linear regression based similarity computation:

    In the proposed multi-criteria RS, the following two steps are required for similarity computation.

    • Step 1 (Similarity computation for each criterion): In this step, the multi-criteria ratings are divided into k single criterion ratings, and the similarity between users u and \(u'\) is computed for each criterion as follows:

      $$\begin{aligned} sim^c(u,u')=\frac{\sum _{i \in I}(r_{u,i}^c - \bar{r}_u^c) (r_{u',i}^c - \bar{r}_{u'}^c)}{\sqrt{\sum _{i \in I}{(r_{u,i}^c - \bar{r}_u^c)}^2}\sqrt{\sum _{i \in I}{(r_{u',i}^c - \bar{r}_{u'}^c)}^2}} \end{aligned}$$
      (4)

      where c indexes the different criteria, i.e., \(c \in \{1,2,3,...,k\}\), and \(\bar{r}_u^c\) is the mean rating of user u on criterion c.

    • Step 2 (Aggregation of similarities): In this step, the overall similarity is computed using the following equation:

      $$\begin{aligned} sim(u,u')=w_0+\sum _{c \in \{1,...,k\}}w_csim^c(u,u') \end{aligned}$$
      (5)

      where \(sim^c(u,u')\) is the similarity between user u and \(u' \in U\) for criterion \(c \in \{1,...,k\}\), \(w_c\) is the weight parameter for criterion c, and \(w_0\) is the intercept (referred to here as the error term).

      Using multi-linear regression, the weight parameters are estimated from the items previously rated by users, which form the training data. Table 1 presents the training data.

      Table 1. Presentation of training data

      where \(C_1,C_2,...,C_k\) are the single criterion ratings and \(C_0\) is the overall rating. \(r_{k,i}\) is the rating for criterion k in the \(i^{th}\) training sample, \( i\in \{1,2,...,n\}\). Based on the training data, the weight values are derived using the following equation in matrix form [5]:

      $$\begin{aligned} \begin{bmatrix} w_1\\ \vdots \\ w_k \end{bmatrix} = \begin{bmatrix} \sum \nolimits _{i}u_{1,i}^2&\dots&\sum \nolimits _{i}u_{1,i}u_{k,i} \\ \vdots&\ddots&\vdots \\ \sum \nolimits _{i}u_{1,i}u_{k,i}&\dots&\sum \nolimits _{i}u_{k,i}^2\\ \end{bmatrix}^{-1} \begin{bmatrix} \sum \nolimits _{i}u_{1,i}v_{i}\\ \vdots \\ \sum \nolimits _{i}u_{k,i}v_{i} \end{bmatrix} \end{aligned}$$
      (6)

      where,

      $$\begin{aligned} \sum _{i \in \{1,...,n\}}u_{j,i}u_{k,i}=\sum _{i \in \{1,...,n\}}r_{j,i}r_{k,i} -\frac{\sum _{i \in \{1,...,n\}}r_{j,i}\sum _{i \in \{1,...,n\}}r_{k,i}}{n} \end{aligned}$$
      (7)
      $$\begin{aligned} \sum _{i \in \{1,...,n\}}u_{j,i}v_{i}=\sum _{i \in \{1,...,n\}}r_{j,i}r_{0,i} -\frac{\sum _{i \in \{1,...,n\}}r_{j,i}\sum _{i \in \{1,...,n\}}r_{0,i}}{n} \end{aligned}$$
      (8)

      Here, n is the total number of samples in the training data and \(j \in \{1,2,...,k\}\). \(w_0\), the intercept (referred to here as the error term), is computed as follows:

      $$\begin{aligned} w_0=\bar{r_0}-w_1\bar{r_1}-w_2\bar{r_2}-...-w_k\bar{r_k} \end{aligned}$$
      (9)

      By substituting these weight values into Eq. (5), the overall similarity is calculated.

  • Phase 2. Neighborhood generation:

    This phase is similar to Phase 2 of MCCF described in Sect. 2.1.

  • Phase 3. Multi linear regression approach to prediction:

    In this phase, we predict the overall rating using Eq. (2), which uses the overall ratings given to an item by the nearest neighbors. The important task in this phase is to compute the overall rating of an item from its criteria ratings. We again employ a linear regression approach to aggregate the criteria ratings. The aggregation function for this task is expressed as follows:

    $$\begin{aligned} r(u,i)=w_0+\sum _{c \in \{1,...,k\}}w_cr_c \end{aligned}$$
    (10)

    where \(r_c\) represents the rating for criterion \( c \in \{1,...,k\}\), \(w_c\) is the weight parameter for criterion c, and \(w_0\) is the intercept (error term). These weights are calculated using Eq. (6), and the overall rating is then computed. After finding the overall rating, we use Eq. (2) to predict the unknown rating for an active user. Finally, the items with the highest predicted ratings are recommended to the user.
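The computations in Phases 1 and 3 can be sketched as follows. This is an illustrative reading of Eqs. (4) to (10) under stated assumptions, not the authors' code: the function names and the toy ratings are hypothetical, NumPy is assumed, the Pearson correlation is taken over two already aligned rating arrays, and the linear system of Eq. (6) is solved directly instead of forming the matrix inverse.

```python
import numpy as np

def pearson(a, b):
    """Eq. (4): Pearson correlation of two users' ratings on one criterion.

    a and b are aligned by item; returns 0.0 when either user has no variance.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da, db = a - a.mean(), b - b.mean()
    denom = np.sqrt((da ** 2).sum()) * np.sqrt((db ** 2).sum())
    return 0.0 if denom == 0 else float((da * db).sum() / denom)

def regression_weights(criteria, overall):
    """Eqs. (6)-(9): criteria is an (n, k) array, overall has length n."""
    R = np.asarray(criteria, dtype=float)
    r0 = np.asarray(overall, dtype=float)
    n = len(r0)
    col_sums = R.sum(axis=0)
    # Eq. (7): centred cross-products between every pair of criteria.
    S = R.T @ R - np.outer(col_sums, col_sums) / n
    # Eq. (8): centred cross-products between each criterion and the overall rating.
    s = R.T @ r0 - col_sums * r0.sum() / n
    w = np.linalg.solve(S, s)                    # Eq. (6)
    w0 = float(r0.mean() - w @ R.mean(axis=0))   # Eq. (9)
    return w0, w

def aggregate(w0, w, values):
    """Eqs. (5) and (10): w0 plus the weighted sum of per-criterion values."""
    return float(w0 + np.dot(w, values))

# Toy training data (made up): overall = 0.5 + 0.6*c1 + 0.3*c2 exactly.
train_criteria = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
train_overall = [0.5 + 0.6 * a + 0.3 * b for a, b in train_criteria]
w0, w = regression_weights(train_criteria, train_overall)
```

The same `aggregate` helper serves both Eq. (5), when `values` holds the per-criterion similarities between two users, and Eq. (10), when `values` holds the criteria ratings of an item.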

4 Experiments and Results

We performed various experiments to analyze the effectiveness of the proposed multi-criteria recommender system using the Yahoo movie dataset, which consists of 6078 users and 976 items. Each item is rated on five criteria, of which four are individual features and the fifth is the overall rating. For the experiments, a 10-fold cross-validation mechanism is used. In each fold, 60 % of each user's data is used as training data and 40 % as test data. The training data is used to train the system and the test data is used to analyze its performance. To evaluate the performance of the proposed system, we used mean absolute error (MAE), coverage, recall and f-measure as evaluation metrics.
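A minimal sketch of these metrics is given below, assuming their common definitions: MAE is the mean absolute difference between actual and predicted ratings, coverage is the fraction of test ratings for which a prediction could be made, and recall and f-measure treat an item as relevant when its rating reaches a threshold. The function names, the use of `None` for a missing prediction, and the threshold value in the usage note are assumptions made for illustration.

```python
def mae(actual, predicted):
    """Mean absolute error over the test ratings that received a prediction."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if p is not None]
    if not pairs:
        return None
    return sum(abs(a - p) for a, p in pairs) / len(pairs)

def coverage(predicted):
    """Fraction of test ratings for which a prediction could be made."""
    return sum(p is not None for p in predicted) / len(predicted)

def precision_recall_f(actual, predicted, threshold):
    """Precision, recall and f-measure, with ratings >= threshold counted as relevant."""
    relevant = [a >= threshold for a in actual]
    recommended = [p is not None and p >= threshold for p in predicted]
    hits = sum(r and q for r, q in zip(relevant, recommended))
    precision = hits / sum(recommended) if any(recommended) else 0.0
    recall = hits / sum(relevant) if any(relevant) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

For instance, with hypothetical test ratings `[5, 3, 4, 2]`, predictions `[4.5, 3.5, None, 4.0]` and a threshold of 4, one of the two relevant items is recommended and one of the two recommended items is relevant.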

Fig. 1. MAE comparison for different numbers of neighbors

To demonstrate the feasibility and effectiveness of the proposed system, we compared our results with the following approaches:

  • Single criterion CF (SCCF)

  • Multi-criteria collaborative filtering using average similarity and ratings (MCCF-A)

  • Multi-criteria collaborative filtering using average similarity and overall rating (MCCF-AO)

  • Multi-criteria collaborative filtering using minimum similarity and average rating (MCCF-MA)

  • Multi-criteria collaborative filtering using minimum similarity and overall rating (MCCF-MO).

Table 2. Performance comparison via MAE, coverage, recall, f-measure
Table 3. Performance comparison of the proposed approach with other approaches for different numbers of users
Fig. 2. F-measure comparison for different numbers of neighbors

4.1 Experiment 1

In this experiment, we measure the predictive and classification accuracy of the proposed approach via MAE, coverage, recall and f-measure. Table 2 presents the results for these measures when the 30 % most similar users are taken as neighbors, and shows that the proposed approach outperforms the other approaches on these measures. Figs. 1 and 2 show the results on MAE and f-measure for different percentages of users (10 %, 20 %, 30 % and 40 %). They reveal that the proposed approach has the minimum MAE and the maximum f-measure.

4.2 Experiment 2

This experiment examines the scalability of the proposed approach. For this experiment, we chose six different subsets of the Yahoo movie dataset, called Y_1000, Y_2000, Y_3000, Y_4000, Y_5000 and Y_6078. Table 3 shows the effectiveness of the proposed approach under a varying number of participating users. Fig. 3 shows the F-measure of the different schemes on the different subsets of the dataset.

Fig. 3. F-measure comparison for different numbers of users (Color figure online)

5 Conclusion

In this work, we have presented a linear regression based multi-criteria recommender system (MCRS), in which linear regression is used to aggregate the similarity components on the various criteria and to compute the overall rating. Generally, different users place different priorities on the various criteria and evaluate these criteria based on their own perceptions. Aggregating the similarities computed on each criterion is a challenging task in MCRS, because the weights used by heuristic aggregation are not optimal. We have used a linear regression approach to compute these weights. Experimental results on the popular Yahoo movie dataset demonstrated that adopting the linear regression approach in MCRS produced quality recommendations and that the proposed approach outperformed the other heuristic approaches.

In future work, we plan to handle the uncertainty associated with user preferences using fuzzy sets [16] and to explore new methods for dealing with correlation based similarity problems.