1 Introduction

The concept of trust is fundamental to the existence of online social networks, as their success depends on the level of trust members have in each other, in the social network service platform, and in the service provider (Sherchan et al. 2013). Lack of trust in online social networks might prevent users from sharing personal information due to privacy concerns and from expressing their opinions and ideas freely (Dwyer et al. 2007; Young and Quan-Haase 2009). The study of trust in online social networks has led to a new research area, namely trust-aware recommender systems, in which trust information is used to improve the quality of recommendations (Bobadilla et al. 2013).

Over the years, researchers have proposed well-formulated trust metrics to quantify and ascertain trust. Trust metrics can be explicit or implicit. Explicit trust metrics are directly expressed by users. Several research works on trust-driven recommender systems focus on explicit trust, as most social networks allow users to express explicit connections such as friends and followers, who are considered to be trusted users, and datasets with explicit trust are easily available (Zhou et al. 2014). However, explicit trust networks have certain problems. Explicit trust relations are very sparse (Guha et al. 2004; Ugander et al. 2011). Most social networks define only binary trust relations (trust or no trust). They generally allow only single-dimensional trust (not context based). Further, they do not distinguish between high- and low-quality relationships (Huberman et al. 2009).

In contrast to explicit trust, implicit trust metrics are derived from actual user activity and interaction between users on the social network. They are calculated by some third party or computing entity by judging the similarity in user profiles, their online behavioral features, and attitudes. A large majority of similarity-based trust metrics consider likeness in user preferences. They provide rich information about the trust relationships in social networks (Guo et al. 2014) and are used heavily in social recommender systems (O’Donovan and Smyth 2005; Lathia et al. 2008; Papagelis et al. 2005; Hwang and Chen 2007). Essentially, implicit trust is evaluated by measuring the similarity in the product ratings of users.

Nevertheless, the relationship between trust and similarity is not clearly established in the research literature. Ziegler and Golbeck (2007) showed that trusted users are more similar than users who are not trusted. But their report did not establish the two-way correlation between the concepts of trust and similarity, specifically (1) whether higher similarity necessarily leads to higher trust between users and (2) whether trusted users are necessarily bound by the notion of similarity. Therefore, it remains to be seen whether similarity can be used as a measure of trust, and there is a need to study the impact of similarity on trust.

In this paper, we extend the work done by Ziegler and Golbeck (2007) to examine the relationship between trust and similarity by varying different control parameters such as the similarity threshold and the common item count threshold. We develop a mathematical model for predicting the likelihood of trust between users based on their item rating similarity. We conduct an empirical analysis of the large, publicly available Epinion dataset. We evaluate the effectiveness of implicit rating similarity-based trust by validating its prediction results against the explicit trust information embedded in the same dataset.

The key contributions of this paper are as follows:

  1. Development of a mathematical model for predicting the likelihood of trust based on rating similarity.

  2. Demonstration of the relationship between trust and rating similarity with variation in the number of co-rated items.

  3. Establishing the ineffectiveness of the rating-based implicit similarity trust metric in predicting explicit trust using precision, recall, and coverage evaluation metrics.

The rest of this paper is organized as follows. Section 2 provides a review of existing implicit similarity trust metrics and defines our problem statement. Section 3 provides details of the dataset used and the analysis methodology followed in this article. Section 4 presents the results of the analysis and Sect. 5 ends with conclusions and future research ideas.

2 Related work

Multiple definitions of trust exist in computer science literature. Researchers have defined trust broadly from “opinion about perceived quality of certain characteristics” (Massa 2007) to “measure of confidence about expected future behavior” (Singh and Bawa 2007). Some researchers proposed narrower definitions specific to a particular application, such as “ability to provide valuable ratings” (Guo et al. 2014) and “ability to provide reliable recommendations in social network” (Podobnik et al. 2012).

Due to the lack of one universally agreed upon definition of trust, a variety of trust metrics with diverse methods of computation and sourcing trust information have evolved. Implicit similarity trust metrics have been used in trust aware recommender systems to solve the data sparsity and cold start problems in collaborative filtering (O’Donovan and Smyth 2005; Lathia et al. 2008; Papagelis et al. 2005, Hwang and Chen 2007).

In this paper, we focus on rating similarity-based implicit trust and validate it against explicit trust. To facilitate discussion and comparison of different trust metrics, we have used the notation used by Guo et al. (2014) (Table 1).

Table 1 Notation

The mean absolute difference (MAD) similarity between two users u and v is defined by the following equation:

$${\text{sim}}(u,v) = \frac{1}{{\# I_{u,v} }}\mathop \sum \limits_{{i \in I_{u,v} }} \left| {r_{u,i} - r_{v,i} } \right|$$
(1)

The lower the value of \({\text{sim}}(u,v)\), the greater the degree of similarity between the two users. MAD similarity ranges from 0 (totally similar) to \((r_{ \hbox{max} } - 1)\), where \(r_{ \hbox{max} }\) is the maximum rating on the scale.

Lathia et al. (2008) have proposed a trust metric based on the mean of absolute difference (MAD) in the ratings of two users u and v over items rated by both users.

$$t_{u,v} = \frac{1}{{\# I_{u,v} }}\mathop \sum \limits_{{i \in I_{u,v} }} \left( {1 - \frac{{\left| {r_{u,i} - r_{v,i} } \right|}}{{r_{\hbox{max} } }}} \right)$$
(2)

where \(I_{u,v} = I_{u} \cap I_{v}\) is the set of items co-rated by both users u and v. The above equation can be rewritten using MAD similarity sim(u, v) as

$$t_{u,v} = 1 - \frac{1}{{r_{\hbox{max} } }}{\text{sim}}\left( {u,v} \right)$$
(3)

The trust values in this system range from 0 to +1, in contrast to alternative similarity measures that range from −1 to +1; therefore, their system disregards the concept of dissimilarity or distrust. Although the authors claimed that their metric differs from the similarity metrics traditionally used in collaborative filtering (CF) recommender systems, in reality it is also a measure of rating similarity: users with identical item ratings get the highest trust score of unity, while users with very different item ratings get low trust scores.
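
To make the computation concrete, the following Python sketch implements Eqs. (1) and (3) under the assumption that each user's ratings are stored in a dict keyed by item id and that the rating scale runs from 1 to \(r_{\max} = 5\), as in the Epinion dataset; the function and variable names are our own illustration and are not taken from the cited works.

```python
def mad_similarity(ratings_u, ratings_v):
    """MAD similarity of Eq. (1): mean absolute rating difference over co-rated items.

    ratings_u, ratings_v: dicts mapping item id -> rating (illustrative format).
    Returns None when the two users have no co-rated items.
    """
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return None
    return sum(abs(ratings_u[i] - ratings_v[i]) for i in common) / len(common)


def lathia_trust(ratings_u, ratings_v, r_max=5):
    """Trust metric of Lathia et al. (2008), Eq. (3): t = 1 - sim / r_max."""
    sim = mad_similarity(ratings_u, ratings_v)
    return None if sim is None else 1.0 - sim / r_max


# Example: rating differences of 1, 0 and 2 give MAD = 1.0 and trust = 1 - 1/5 = 0.8.
u = {"item1": 5, "item2": 3, "item3": 4}
v = {"item1": 4, "item2": 3, "item3": 2}
print(mad_similarity(u, v), lathia_trust(u, v))
```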

Papagelis et al. (2005) have used the classical Pearson correlation coefficient (PCC) to extract a measure of trust between users. The key idea in their paper was that if we consider rating correlation as measure of trust instead of similarity, then the concept of trust propagation can be used to infer trust between those users who have no co-rated items. This gives an approach for tackling the data sparsity problem in traditional similarity-based CF systems.

$$t_{u,v} = s_{u,v} = \frac{{\mathop \sum \nolimits_{i} \left( {r_{u,i} - \overline{r}_{u} } \right)\left( {r_{v,i} - \overline{r}_{v} } \right)}}{{\sqrt {\mathop \sum \nolimits_{i} \left( {r_{u,i} - \overline{r}_{u} } \right)^{2} } \sqrt {\mathop \sum \nolimits_{i} \left( {r_{v,i} - \overline{r}_{v} } \right)^{2} } }}$$
(4)

where \(s_{u,v}\) is the PCC similarity between users u and v, which also gives their mutual trust \(t_{u,v}\). For calculating indirect trust between users u and w who have not rated any common items, and for whom trust therefore cannot be calculated using Eq. (4), the following trust propagation formula was proposed: if u trusts v and v trusts w, then the trust value between u and w is

$$t_{u,w} = t_{u,v} \oplus t_{v,w} = \frac{{\# I_{u,v} t_{u,v} + \# I_{v,w} t_{v,w} }}{{\# I_{u,v} + \# I_{v,w} }}$$
(5)
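
A minimal sketch of Eqs. (4) and (5), assuming the same dict-of-ratings representation as above; for simplicity, the user means here are taken over the co-rated items only, which is a common approximation of \(\overline{r}_{u}\) and \(\overline{r}_{v}\).

```python
import numpy as np


def pcc_trust(ratings_u, ratings_v):
    """PCC-based trust of Papagelis et al. (2005), Eq. (4), over co-rated items."""
    common = sorted(set(ratings_u) & set(ratings_v))
    if len(common) < 2:
        return None  # PCC is undefined on fewer than two co-rated items
    ru = np.array([ratings_u[i] for i in common], dtype=float)
    rv = np.array([ratings_v[i] for i in common], dtype=float)
    du, dv = ru - ru.mean(), rv - rv.mean()
    denom = np.sqrt((du ** 2).sum() * (dv ** 2).sum())
    if denom == 0:
        return None  # undefined when a user gives the same rating to every common item
    return float((du * dv).sum() / denom)


def propagated_trust(t_uv, n_uv, t_vw, n_vw):
    """Trust propagation of Eq. (5): average of the two direct trust values,
    weighted by the number of co-rated items underlying each of them."""
    return (n_uv * t_uv + n_vw * t_vw) / (n_uv + n_vw)
```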

Hwang and Chen (2007) used past prediction accuracy of a user for measuring his trustworthiness. They used a simple version of Resnick’s prediction formula (Resnick et al. 1994) to compute the predicted rating \(p_{u,i}^{v}\) of item i for user u using rating from another user v who has co-rated item i. The overall trust score of trustee is then derived by averaging the prediction error of all co-rated items between trustee and trustor.

$$p_{u,i}^{v} = \overline{r}_{u} + \left( {r_{v,i} - \overline{r}_{v} } \right)$$
(6)
$$t_{u,v} = \frac{1}{{\# I_{{u,v}} }}\mathop \sum \limits_{{i \in I_{{u,v}} }} \left( {1 - \frac{{\left| {p_{{u,i}}^{v} - r_{{u,i}} } \right|}}{{r_{ \hbox{max} } }}} \right)$$
(7)

Their trust metric is similar to the trust metric proposed by Lathia et al. (2008). It measures the average difference in ratings between two users over items rated by both, but it also takes into account the difference in the users' average ratings, as seen from Eq. (8), obtained by substituting Eq. (6) into Eq. (7).

$$t_{u,v} = \frac{1}{{\# I_{u,v} }}\mathop \sum \limits_{{i \in I_{u,v} }} \left( {1 - \frac{{\left| { - r_{u,i} + r_{v,i} + \overline{r}_{u} - \overline{r}_{v} } \right|}}{{r_{ \hbox{max} } }}} \right)$$
(8)

The main idea of their paper was to use a trust propagation framework to infer trust between users where direct trust is not available due to data sparsity. They also used the trust propagation formula given by Eq. (5).
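
The following sketch expresses Eqs. (6) and (7) in code, assuming the users' overall average ratings are supplied separately; again, the names are illustrative only.

```python
def hwang_chen_trust(ratings_u, ratings_v, mean_u, mean_v, r_max=5):
    """Trust metric of Hwang and Chen (2007), Eqs. (6)-(7).

    mean_u, mean_v: each user's average rating over all items they have rated.
    Returns the average accuracy of Resnick-style predictions that v's ratings
    yield for the items u has also rated, or None if there are no such items.
    """
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return None
    total = 0.0
    for i in common:
        p_ui = mean_u + (ratings_v[i] - mean_v)              # Eq. (6)
        total += 1.0 - abs(p_ui - ratings_u[i]) / r_max      # summand of Eq. (7)
    return total / len(common)
```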

O’Donovan and Smyth (2005) developed computational models of item-level and profile-level trust in recommender systems. Their trust metric measures the global trust, or reputation, of a user: a trustworthy user is one who has a history of making reliable recommendations. The trust in trustee v by all other users is calculated as the proportion of item ratings, across all users, that v correctly predicted using only v's own ratings.

$$t_{v} = \frac{{\# {\text{CorrectSet}}(v)}}{{\# {\text{RecSet}}(v)}}$$
(9)

For every item that user v has rated, a rating prediction can be made for any user u who has also rated that item, using Eq. (6). \({\text{RecSet}}\left( v \right)\) denotes the set of all predictions that user v was involved in, for various users and items, and \({\text{CorrectSet}}(v)\) represents the correct predictions provided by user v:

$${\text{RecSet}}\left( v \right) = \left\{ {\left( {u_{1} , i_{1} } \right), \ldots ,\left( {u_{n} , i_{n} } \right)} \right\}$$
(10)
$${\text{CorrectSet}}\left( v \right) = \left\{ {\left( {u_{k} , i_{k} } \right) \in {\text{RecSet}}(v)| {\text{Correct}}(i_{k} , u_{k} , v)} \right\}$$
(11)
$${\text{Correct}}\left( {i, u,v} \right) \Leftrightarrow \left| {p_{u,i}^{v} - r_{u,i} } \right| \le \varepsilon$$
(12)

The predicted rating of item i for any user \(u\) is considered correct if the absolute difference between the predicted rating \(p_{u,i}^{v}\) and the actual rating is smaller than a threshold ϵ, as shown in Eq. (12). Substituting \(p_{u,i}^{v}\) from Eq. (6) into Eq. (12), we get Eq. (13), which shows that their measure of a correct rating is also evaluating similarity in ratings and is closely related to MAD.

$${\text{Correct}}\left( {i,u, v} \right) \Leftrightarrow \left| {r_{v,i} - r_{u,i} - \left( {\overline{r}_{v} - \overline{r}_{u} } \right)} \right| \le \epsilon$$
(13)
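
A hedged sketch of the profile-level trust of Eqs. (9)–(12), assuming all ratings are held in memory as a dict of dicts and using a hypothetical error threshold eps = 1; the value of ϵ is an assumption here, not a figure taken from the original paper.

```python
def profile_level_trust(v, ratings, user_means, eps=1.0):
    """Profile-level trust of O'Donovan and Smyth (2005), Eqs. (9)-(12).

    ratings: dict user -> {item: rating}; user_means: dict user -> mean rating;
    eps: hypothetical correctness threshold for Eq. (12).
    """
    rec_count, correct_count = 0, 0
    for u, r_u in ratings.items():
        if u == v:
            continue
        for i in set(r_u) & set(ratings[v]):
            p_ui = user_means[u] + (ratings[v][i] - user_means[v])  # Eq. (6)
            rec_count += 1                                          # member of RecSet(v)
            if abs(p_ui - r_u[i]) <= eps:                           # Eq. (12)
                correct_count += 1                                  # member of CorrectSet(v)
    return correct_count / rec_count if rec_count else None        # Eq. (9)
```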

The approach given in Pitsilis and Marshall (2004) is based on subjective logic, which uses a simple, intuitive representation of uncertain probabilities through a three-dimensional metric comprising belief (\(b_{v}\)), disbelief (\(d_{v}\)), and uncertainty (\(u_{v}\)), where \(u_{v} + b_{v} + d_{v} = 1\). The belief \(b_{v}\) is used as the measure of trust that user u has in user v, i.e.,

$$t_{u,v} = b_{v}$$
(14)
$$u_{v} = \frac{1}{{\# I_{u,v} }}\mathop \sum \limits_{{i \in I_{u,v} }} \frac{{\left| {p_{u,i}^{v} - r_{u,i} } \right|}}{{r_{ \hbox{max} } }}$$
(15)
$$b_{v} = \frac{1}{2}\left( {1 - u_{v} } \right)\left( {1 + s_{u,v} } \right)$$
(16)
$$d_{v} = \frac{1}{2}\left( {1 - u_{v} } \right)\left( {1 - s_{u,v} } \right)$$
(17)

where \(p_{u,i}^{v}\) is given by Eq. (6) and \(s_{u,v}\) is PCC similarity given by Eq. (4).
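
A small sketch of Eqs. (14)–(17), assuming the PCC similarity \(s_{u,v}\) has already been computed (e.g., with the pcc_trust function sketched earlier) and that the users' average ratings are available; the belief, disbelief, and uncertainty components are returned as a tuple.

```python
def subjective_logic_trust(ratings_u, ratings_v, mean_u, mean_v, s_uv, r_max=5):
    """Subjective-logic trust of Pitsilis and Marshall (2004), Eqs. (14)-(17).

    s_uv is the PCC similarity of Eq. (4); mean_u and mean_v are average ratings.
    Returns (belief, disbelief, uncertainty); the trust t_{u,v} is the belief.
    """
    common = set(ratings_u) & set(ratings_v)
    if not common or s_uv is None:
        return None
    # Uncertainty: average normalised prediction error over co-rated items, Eq. (15).
    u_v = sum(abs((mean_u + ratings_v[i] - mean_v) - ratings_u[i]) / r_max
              for i in common) / len(common)
    b_v = 0.5 * (1 - u_v) * (1 + s_uv)    # belief, Eq. (16)
    d_v = 0.5 * (1 - u_v) * (1 - s_uv)    # disbelief, Eq. (17)
    return b_v, d_v, u_v
```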

All of the above implicit trust metrics are based on similarity in user preferences, measured by evaluating the similarity in the users' product ratings. However, none of them provides sufficient justification for using rating similarity as a proxy for trust. Although recommender performance has been shown to improve when these trust metrics are used, compared to the classic CF recommender system, it has not been shown how effectively these metrics actually represent trust. The idea that trust is positively and strongly correlated with user preference or rating similarity, and can therefore be used in place of similarity in recommender systems, is accepted on the basis of intuition alone (Guo et al. 2014). Even sociopsychological research does not provide sufficient support for a positive interaction between trust and interest similarity (Ziegler and Golbeck 2007).

Ziegler and Golbeck (2007) were the first to investigate and analyze whether positive correlation actually holds between trust and similarity. They showed that people’s trusted peers are on average considerably more similar to them than non-trusted peers.

If U denotes the set of all community members, \({\text{trusted}}(u)\) the set of all users trusted by u, and \({\text{sim }}:U \times U \to \left[ { - 1, + 1} \right]\) some similarity function (PCC or MAD), then:

$$\mathop \sum \limits_{u \in U} \frac{{\mathop \sum \nolimits_{{v \in {\text{trusted}}(u)}} {\text{sim}}(u,v)}}{{\# {\text{trusted}}(u)}} \gg \mathop \sum \limits_{u \in U} \frac{{\mathop \sum \nolimits_{{v \in U\backslash \{ u\} }} {\text{sim}}(u,v)}}{\# U - 1}$$
(18)
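
The comparison of Eq. (18) can be sketched as follows, assuming a similarity callable and a dict of trusted-peer sets; pairs for which the similarity is undefined are simply skipped in this sketch, which slightly alters the denominators relative to Eq. (18).

```python
def average_similarity_comparison(users, trusted, sim):
    """Left- and right-hand sides of Eq. (18): summed average similarity to
    trusted peers versus summed average similarity to all other members.

    users: iterable of user ids; trusted: dict user -> set of trusted users;
    sim: callable sim(u, v) returning a similarity score or None if undefined.
    """
    def mean_sim(u, others):
        vals = [sim(u, v) for v in others]
        vals = [x for x in vals if x is not None]
        return sum(vals) / len(vals) if vals else 0.0

    users = list(users)
    trusted_side = sum(mean_sim(u, trusted.get(u, ())) for u in users)
    overall_side = sum(mean_sim(u, [v for v in users if v != u]) for u in users)
    return trusted_side, overall_side
```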

They performed experiments with two datasets: the All Consuming book reading community data and the Filmtrust movie recommendation site data. For the All Consuming data, they showed that trusted peers have higher average similarity than non-trusted peers. For the Filmtrust data, they showed that as the average trust rating increases, similarity also increases. However, they neither established the causality between similarity and trust, i.e., that similarity implies trust, nor quantified the impact of similarity on trust.

Golbeck (2009), in a separate paper, conducted a survey-based experiment in which users were asked to give a trust rating to a hypothetical user, and showed that rating details beyond the overall similarity measure also affect the trust between users. Lee and Brusilovsky (2009) built on the work of Ziegler and Golbeck (2007) and showed that users connected by a network of trust exhibit significantly higher similarity on items and meta-data than non-connected users. But they too did not establish the reverse inference that similarity leads to trust.

Guo et al. (2014) were the first to evaluate the above implicit trust metrics by comparing them against explicit trust. They concluded that above implicit similarity trust metrics were not able to predict explicit trust satisfactorily. But they also did not study the relationship between similarity and trust.

There is a need to establish the relationship between similarity and trust which can justify the use of implicit similarity trust metrics. We address this problem by developing a model for likelihood of trust between users based on the rating similarity between them in the following sections. A careful analysis of the various implicit similarity trust metrics discussed in this section reveals that they are all derived from two basic similarity metrics. They are variations of either the PCC given by Eq. (4) or MAD given by Eq. (3). So we have used these two similarity metrics for evaluating the relationship between user similarity and trust.

3 Analysis methodology

3.1 Dataset

We have used the publicly available Epinion dataset for studying the relationship between trust and rating similarity and for evaluating the effectiveness of predicting trust between users from their rating similarity. Table 2 lists the key features of the Epinion dataset. It contains two types of information:

  (i) Binary explicit trust relationships between users, as expressed by the users themselves. For a user pair \(\left( {u,v} \right)\), the trust relationship can occur in two directions: u trusts v, or v trusts u. A 2-way relationship is bidirectional trust and a 1-way relationship is unidirectional trust. The explicit trust information can be used to construct an explicit trust network between users.

  (ii) Item ratings given by the users, which can be used to generate an implicit network between users by connecting them through the items co-rated by both.

Our analysis exploits the availability of explicit trust information in the dataset to study the correlation of trust and similarity and to evaluate the accuracy of the implicit trust prediction derived from the rating similarity.

Table 2 Epinion dataset description
Fig. 1 Distribution of number of trusted users and number of co-rated items on log–log scale axes

Table 3 Distribution of number of trusted users and number of co-rated items

Distribution of number of trusted users per user

Figure 1a and Table 3 show the frequency distribution of the number of trusted users per user in the Epinion dataset. We observe that most users have no trusted users and very few users have a very large number of trusted users. For the 49,288 users in the Epinion dataset, 1,214,628,828 unique user pairs are possible and 2,429,257,656 trust relations are possible if each user pair has a bidirectional trust relationship. However, only 487,183, or 0.02 %, explicit trust relations exist in the dataset. Therefore, the explicit web of trust in the dataset is very sparse.

Distribution of number of co-rated items

Figure 1b and Table 3 show the distribution of the number of items co-rated by a pair of users in the Epinion dataset. Again, a similar pattern can be observed. A very large number of user pairs have rated one common item, but very few user pairs have rated a large number of common items. The connection network based on co-rated items in the dataset is also very sparse: of the 1,214,628,828 possible user pairs among the 49,288 users, only 14,040,056, or 1.156 %, have rated one or more common items.

The straight-line distribution of the number of trusted users and the number of co-rated items on log–log axes indicates a power law degree distribution, typical of most social networks (Faloutsos et al. 1999; Zafarani et al. 2014). This indicates that our dataset is representative of a typical social network.

Selection of user pairs

Computing rating similarity on only one co-rated item is not meaningful, so we have considered user pairs with more than 2 common product ratings in our analysis. Out of the 578,548 user pairs with more than 2 common product ratings, only 22,881 have an explicit trust relationship: 8105 user pairs with bidirectional trust and 14,776 user pairs with unidirectional trust, for a total of 30,986 trust relations. Therefore, the proportion of explicit trust among the user pairs considered in the analysis is 2.68 %, which is significantly denser than the 0.02 % for the entire dataset with 2,429,257,656 possible trust relations.
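
A sketch of how such user pairs can be selected, assuming the ratings are available as a CSV file with hypothetical name and column labels (user, item, rating); enumerating pairs per item is quadratic in the number of raters of an item, so a production implementation would need a more careful strategy for very popular items.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Hypothetical file and column names: one (user, item, rating) triple per row.
ratings = pd.read_csv("epinion_ratings.csv", names=["user", "item", "rating"])

# Count co-rated items per (unordered) user pair by enumerating the raters of each item.
pair_counts = Counter()
for _, raters in ratings.groupby("item")["user"]:
    for u, v in combinations(sorted(raters.unique()), 2):
        pair_counts[(u, v)] += 1

# Keep only user pairs with more than 2 co-rated items, as in our analysis.
selected_pairs = {pair for pair, count in pair_counts.items() if count > 2}
print(len(pair_counts), "pairs with >=1 co-rated item,",
      len(selected_pairs), "pairs with >2 co-rated items")
```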

3.2 A logistic regression model for trust similarity relationship

Our aim is to characterize the relationship between rating similarity and trust among users in a given population. We develop a binary logistic regression model for predicting the likelihood of trust between users given their similarity.

Binary logistic regression is a method for modeling the relationship between a binary categorical dependent variable and one or more continuous independent variables, using a logistic function for estimating probabilities (Seltman 2012). In our case,

  • The dependent variable is the trust y between any two users in a given population. It can assume two states, trust (y = 1) and no trust (y = 0).

  • The continuous valued independent variable x represents the degree of similarity, between two users.

For a set of user pairs \(S = \left\{ {\left( {u_{1} ,v_{1} } \right), \ldots ,\left( {u_{l} ,v_{l} } \right)} \right\}\) with some fixed similarity value x between them, let \(T \subseteq S\) be the subset such that for each user pair \(\left( {u_{i} ,v_{i} } \right) \in T\), \(u_{i}\) trusts \(v_{i}\). Then, the likelihood or expectation of trust, denoted E(trust) or E(y = 1), between any randomly selected pair of users in the population is given by the following equation:

$$E({\text{trust}}) = \frac{\# T}{\# S}$$
(19)

The logistic function-based relationship between E(trust) and similarity x is then given as:

$$E\left( {\text{trust}} \right) = \frac{1}{{1 + e^{{ - \left( {\beta_{o} + \beta_{1} x} \right)}} }}$$
(20)

User pairs with E(trust) > 0.5 can be considered trusted according to this model. The above equation can be rewritten in terms of the log odds of trust, which is a linear function of similarity.

$${\text{log }}\left( {\text{odds of trust}} \right) = \log_{e} \left( {\frac{E(y = 1)}{1 - E(y = 1)}} \right) = \beta_{0} + \beta_{1} x$$
(21)

The parameters \(\beta_{1}\) and \(\beta_{0}\) characterize the relationship between trust and similarity. The regression coefficient \(\beta_{1}\) represents the rate at which the log odds of trust increase with an increase in the degree of similarity between users; the larger the value of \(\beta_{1}\), the stronger the positive correlation. The intercept \(\beta_{0}\) gives the expected log odds of trust when the similarity value is zero. Fitting the logistic regression model involves finding the best estimates of these two parameters.
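
The model can be fitted with any standard statistical package (we used SPSS, as described in Sect. 4.1). The sketch below uses Python's statsmodels on synthetic data standing in for the real (similarity, trust) pairs, purely to illustrate how \(\beta_{0}\), \(\beta_{1}\), their p values, and the odds ratio \(e^{\beta_{1}}\) are obtained.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: one row per user pair, x = similarity,
# y = 1 if the pair has an explicit trust relation (weak positive dependence).
x = rng.uniform(-1, 1, size=5000)
p_true = 1.0 / (1.0 + np.exp(-(-3.0 + 0.3 * x)))
y = (rng.uniform(size=5000) < p_true).astype(int)

# Fit Eq. (21): the log odds of trust as a linear function of similarity.
model = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
beta0, beta1 = model.params
print(model.summary())                 # coefficient estimates, p values, CIs
print("odds ratio e^beta1 =", np.exp(beta1))

# Eq. (20): predicted probability of trust at a chosen similarity level.
prob_at_half = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * 0.5)))
print("E(trust) at similarity 0.5 =", prob_at_half)
```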

3.3 Trust prediction

The Epinion dataset is labeled with only binary-valued explicit trust between members, representing trust and no trust. Therefore, to evaluate the effectiveness of similarity in predicting trust, we need to compute binary-valued implicit similarity trust between users. As seen in Sect. 2, most researchers proposed continuous-valued rating similarity measures and derived trust metrics from them. To convert them to binary values, we use the threshold-based method proposed by Yuan et al. (2010), given by Eq. (22)

$$t_{u,v} = \left\{ {\begin{array}{*{20}l} {1,} & {s_{u,v} > \tau^{\text{sim}} \;{\text{and}}\;\# I_{u,v} > \tau^{\text{Icount}} } \\ {0,} & {\text{otherwise}} \\ \end{array} } \right.$$
(22)

where

  • The parameter \(s_{u,v}\) is rating similarity between a user pair u, v. Since all trust metrics have been derived from either MAD similarity or PCC similarity, \(s_{u,v}\) may represent either one of them.

  • The parameter \(\tau^{\text{sim}}\) is a preset threshold to decide whether the degree of similarity qualifies for establishing positive trust between two users. Depending upon the metric applied, \(\tau^{\text{sim}}\) may be written as \(\tau^{\text{MAD}}\) or \(\tau^{\text{PCC}}\). Thus, user v is declared a trusted user for user u, if the PCC similarity between them is above the preset threshold \(\tau^{\text{PCC}}\) or MAD similarity value between them is below the preset threshold \(\tau^{\text{MAD}}\).

  • The parameter \(\tau^{\text{Icount}}\) is the minimum number of common item(s) that must be co-rated by both users in a pair in order to be considered in trust similarity modeling process.

The effectiveness of rating similarity as a trust predictor is validated by comparing the derived implicit trust against the explicitly expressed trust values in the dataset. Let \(P_{u}\) be the set of all predicted trusted users for user u and let \(T_{u}\) be the set of all explicitly trusted users for user u. Let \(Z = \left\{ {u \in U|P_{u} \ne \emptyset } \right\}\) be the set of all users for whom one or more trusted users have been predicted on the basis of implicit rating similarity-based trust. The following metrics indicate the quality of prediction.

$${\text{coverage}} = \frac{\# Z}{\# U}$$
(23)
$${\text{precision}} = \frac{1}{\# Z}\mathop \sum \limits_{u \in Z} \frac{{\# (P_{u} \cap T_{u} )}}{{\# P_{u} }}$$
(24)
$${\text{recall}} = \frac{1}{\# Z}\mathop \sum \limits_{u \in Z} \frac{{\# (P_{u} \cap T_{u} )}}{{\# T_{u} }}$$
(25)
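
The prediction and evaluation steps of Eqs. (22)–(25) can be sketched as follows, assuming the pairwise similarities and co-rated item counts have been precomputed into dicts keyed by unordered user pairs; users in Z whose explicit trust set is empty are skipped in the recall sum, a detail Eq. (25) leaves implicit.

```python
def predict_trusted(similarity, item_counts, users, tau_sim, tau_icount):
    """Threshold rule of Eq. (22): v is a predicted trusted user of u when their
    similarity exceeds tau_sim and they share more than tau_icount co-rated items.

    similarity, item_counts: dicts keyed by unordered user pairs (u, v).
    """
    predicted = {u: set() for u in users}
    for (u, v), s in similarity.items():
        if s > tau_sim and item_counts.get((u, v), 0) > tau_icount:
            predicted[u].add(v)
            predicted[v].add(u)
    return predicted


def evaluate(predicted, explicit, users):
    """Coverage, precision, and recall of Eqs. (23)-(25).

    explicit: dict user -> set of explicitly trusted users (T_u).
    """
    Z = [u for u in users if predicted.get(u)]
    if not Z:
        return 0.0, 0.0, 0.0
    coverage = len(Z) / len(users)
    precision = sum(len(predicted[u] & explicit.get(u, set())) / len(predicted[u])
                    for u in Z) / len(Z)
    recall = sum(len(predicted[u] & explicit[u]) / len(explicit[u])
                 for u in Z if explicit.get(u)) / len(Z)
    return coverage, precision, recall
```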

4 Experimental results

In this section, we discuss our experimental results and observations.

4.1 Regression analysis and validation

In Sect. 3.2, we developed a logistic regression model to describe the relationship between rating similarity and explicit trust. In our first set of experiments, we apply this model on the Epinion dataset to find the best fitting parameters \(\beta_{0} \;{\text{and}}\;\beta_{1}\) as given by Eq. (20) and validate the model by comparing it against the actual data. We performed multiple experiments, presetting the co-rated item count threshold \(\tau^{\text{Icount}}\) for each experiment.

We used the Statistical Package for the Social Sciences (SPSS) software to perform logistic regression on the Epinion dataset. The SPSS software gives the significance values (or p values) of the estimated coefficients \(\beta_{0}\) and \(\beta_{1}\) and the confidence interval (CI) for \(e^{\beta_{1}}\) at a confidence level (CL) of 95 %. The model is considered statistically significant if the p values of the coefficients are less than a significance level (SL) of 5 %.

The objective of our first set of experiments is to investigate how trust between users varies with varying degrees of similarity between them. We first selected all user pairs whose co-rated item count exceeds the preset threshold. We then partitioned the user pairs into multiple subsets on the basis of their similarity values. For PCC similarity, users were grouped into 21 subsets within the similarity ranges [−1, −0.95), [−0.95, −0.85), …, [0.85, 0.95), [0.95, 1]. For MAD similarity, users were divided into 51 subsets within the similarity ranges [0, 0.05), [0.05, 0.15), …, [3.85, 3.95), [3.95, 4]. For each subset, we plotted the probability of trust as given by Eq. (19) against the midpoint of its similarity range. We now discuss our observations. It may be noted that user pairs for which the similarity value is either not defined or cannot be reliably calculated, because the number of co-rated items is less than 3, have not been considered in this analysis, even though explicit trust might exist between some of them.
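
The binning procedure can be sketched as follows, assuming each user pair has been reduced to a (similarity, explicit-trust) tuple; the observed probability of trust in each bin is the within-bin fraction of trusted pairs, i.e., Eq. (19).

```python
import numpy as np


def empirical_trust_probability(pairs, bin_edges):
    """Group user pairs into similarity bins and compute Eq. (19) for each bin.

    pairs: iterable of (similarity, trusted) tuples, trusted being 0 or 1.
    Returns the bin midpoints and the observed probability of trust per bin.
    """
    sims = np.array([s for s, _ in pairs], dtype=float)
    trust = np.array([t for _, t in pairs], dtype=float)
    mids, probs = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (sims >= lo) & (sims < hi)
        if in_bin.any():
            mids.append((lo + hi) / 2.0)
            probs.append(trust[in_bin].mean())   # #T / #S within the bin
    return np.array(mids), np.array(probs)


# PCC bins as described above; the final edge is nudged so that similarity 1 is included.
pcc_edges = [-1.0, -0.95] + [round(-0.85 + 0.1 * k, 2) for k in range(19)] + [1.000001]
```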

4.1.1 Probability of trust versus PCC Rating similarity

Figure 2 graphically depicts the relationship between probability of trust and PCC rating similarity, as predicted by the fitted model as well as based on the actual data.

Fig. 2 Probability of trust between user pair versus similarity (PCC)

Table 4 records the best fitting values of coefficients \(\beta_{0} \;{\text{and}}\;\beta_{1}\), their p values, and the CI for \(e^{{\beta_{1} }}\) for 95 % CL.

Table 4 Logistic regression model parameters

Table 4 reveals that the estimates of \(\beta_{0}\) and \(\beta_{1}\) are statistically significant, as all p values are below the SL of 0.05. These estimates have low standard errors. Further, the confidence interval of \(e^{{\beta_{1} }}\) is very narrow for a 95 % CL, indicating a good model fit.

From Fig. 2, we observe that the fitted model is almost a straight line; this is because the PCC similarity range of −1 to 1 covers only a small portion of the overall input range of the non-linear logistic function. The following interesting observations also emerge from Fig. 2:

  • Considering all user pairs with >2 co-rated items, a user is only 0.1 % more likely to trust another user with rating similarity of 0.5 as compared to a user with rating similarity of −0.5.

  • Considering all user pairs with >5 co-rated items, a user is 1.5 % more likely to trust another user with rating similarity of 0.5 as compared to a user with rating similarity of −0.5.

  • Next considering all user pairs with >10 co-rated items, a user is 2.4 % more likely to trust another user with rating similarity of 0.5 as compared to a user with rating similarity of −0.5.

  • If we keep the rating similarity constant at 0.5, then a user is 9.6 % more likely to trust a user with >10 common items as compared to a user with >2 common items.

The key conclusion that can be derived from the above analysis is that as rating similarity increases, so does the probability of trust, but very marginally. Further, as the number of co-rated items increases, the probability of trust increases, irrespective of the rating similarity.

4.1.2 Probability of trust versus MAD rating similarity

Figure 3 graphically depicts the relationship between the probability of trust and MAD rating similarity, as predicted by the fitted model as well as on the basis of the actual data. The MAD similarity along the x-axis ranges from 0 to \((r_{ \hbox{max} } - 1)\), where we have used a rating scale of 1–5, and the y-axis shows the probability of trust between all user pairs at a given similarity value. Table 5 records the best fitting values of the coefficients \(\beta_{0}\) and \(\beta_{1}\), their p values, and the CI for \(e^{{\beta_{1} }}\) for a 95 % CL.

Fig. 3 Probability of trust between user pair versus similarity (MAD)

Table 5 Logistic regression model parameters

Table 5 reveals that the estimates of \(\beta_{0}\) and \(\beta_{1}\) are statistically significant: all estimated coefficients have p values below the SL of 0.05. These estimates have low standard errors, ranging from 0.011 to 0.068. Further, the CI of \(e^{{\beta_{1} }}\) for a 95 % CL is within the acceptable range, indicating a good model fit.

From Fig. 3, we observe that:

  • Considering all user pairs with >2 co-rated items, users with MAD = 1 are only 1.3 % more likely to trust each other as compared to users with MAD = 3.

  • Considering all user pairs with >5 co-rated items, users with MAD = 1 are only 5.2 % more likely to trust each other as compared to users with MAD = 3.

  • Considering all user pairs with >10 co-rated items, users with MAD = 1 are only 12 % more likely to trust each other as compared to users with MAD = 3.

  • If we keep the MAD value constant at 1, then a user is 10.7 % more likely to trust a user with >10 co-rated items compared to a user with >2 co-rated items.

The above observations indicate that by using MAD similarity, the correlation between similarity and trust is somewhat higher than that between PCC similarity and trust. However, it still reflects a weak correlation between rating similarity and trust. As rating similarity increases, i.e., the MAD value decreases, the probability of trust increases but marginally, thus re-affirming the same conclusion drawn from our previous experiment using PCC similarity. Further, as the number of co-rated items by a user pair increases, the probability of trust increases irrespective of the rating similarity.

4.2 Trust prediction

In our second set of experiments, we predict the implicitly trusted users on the basis of rating similarity using Eq. (22). We study the impact of rating similarity level on the prediction accuracy by varying the similarity threshold (\(\tau^{\text{PCC}} ,\tau^{\text{MAD}}\)) and co-rated item count threshold (\(\tau^{\text{Icount}}\)). We evaluate the prediction performance by calculating precision, recall, and coverage of predicted implicitly trusted users against explicitly trusted users.

4.2.1 Impact of similarity threshold on trust prediction

Figure 4a–c shows the values of precision, recall, and coverage for varying \(\tau^{\text{PCC}}\) for three different fixed values of \(\tau^{\text{Icount}}\), viz., 2, 5, and 10. Figure 4d–f shows the values of precision, recall, and coverage for varying \(\tau^{\text{MAD}}\) for three different fixed values of \(\tau^{\text{Icount}}\), viz., 2, 5, and 10. In Table 6, we record the values of precision, recall, and coverage for the minimum value of \(\tau^{\text{PCC}}\) (−1) and the maximum value of \(\tau^{\text{PCC}} \left( {0.8} \right)\), and the consequent percentage variation over this range. Also, we record the values of precision, recall, and coverage for the minimum value of \(\tau^{\text{MAD}}\) (4) and the maximum value of \(\tau^{\text{MAD}}\) (0.5).

Fig. 4 Impact of similarity threshold on prediction metrics: a precision, b recall, c coverage, for varying \(\tau^{\text{PCC}}\); d precision, e recall, f coverage, for varying \(\tau^{\text{MAD}}\)

Table 6 Precision, recall, and coverage ranges for experiments for predicting trust using rating similarity

It may be noted that the trends of precision, recall, and coverage remain same irrespective of \(\tau^{\text{Icount}}\). We use the representative case of \(\tau^{\text{Icount}} = 5\) to make the following observations:

  • Overall, the values of precision, recall, and coverage are very low for the entire range of \(\tau^{\text{PCC}}\) and \(\tau^{\text{MAD}}\). For \(\tau^{\text{PCC}}\), the maximum value of precision is 10.6 % and for \(\tau^{\text{MAD}}\) the maximum value of precision is 12.8 %. Maximum values of recall and coverage are 5.2 and 5.8 %, respectively, for both \(\tau^{\text{MAD}}\) and \(\tau^{\text{PCC}}\).

  • It may be noted that the maximum value of recall is low even with the minimum preset similarity threshold value (PCC = −1). This is because, for many user pairs, valid similarity values cannot be calculated as they do not have a sufficient number of co-rated items:

    • The notion of rating similarity between two users is based on the premise that they have rated a common set of items. Both PCC similarity and MAD similarity are based on the ratings provided by users to a common set of items. Thus, similarity-based prediction fails for user pairs who explicitly trust each other but have rated completely different sets of items.

    • PCC similarity (refer Eq. 4) is undefined when a user pair has only one co-rated item or when one of them has provided the same rating to all co-rated items.

    • Similarity calculated on the basis of two co-rated items is not reliable, so we have excluded such user pairs from our similarity calculations.

Owing to the above factors, the trust between several users could not be predicted even when the similarity threshold was preset to its minimum level.

  • The variation in precision with \(\tau^{\text{PCC}}\) is only 0.7 %, indicating that precision remains essentially constant over the entire range. For \(\tau^{\text{MAD}}\), the variation is higher at 24.9 %. However, Fig. 4d clearly shows that precision remains constant over a large portion of the \(\tau^{\text{MAD}}\) range, between 4 and 1.5.

  • The maximum recall of 5.2 % drops sharply as the similarity threshold increases in the case of both \(\tau^{\text{PCC}}\) and \(\tau^{\text{MAD}}\). On examining the Epinion dataset, we found that this is due to very few user pairs who satisfy high similarity threshold values.

  • The maximum coverage of 5.8 % drops gradually as the similarity threshold increases in the case of both \(\tau^{\text{PCC}}\) and \(\tau^{\text{MAD}}\).

From the above observations, we conclude that:

  • In general, the low value of precision indicates that similar users do not necessarily trust each other.

  • In general, the low value of recall indicates that users who are either not similar in terms of co-rated item ratings or are not bound by any kind of similarity relationship may still trust each other.

  • Higher rating similarity therefore leads to only a very marginal increase in precision, with more pronounced changes appearing only toward the very high similarity range. As rating similarity increases, the coverage and recall decrease significantly, as expected, because very few user pairs have high rating similarity.

4.2.2 Impact of number of co-rated item on trust prediction

Figure 5a–c shows the values of precision, recall, and coverage for varying \(\tau^{\text{Icount}}\) for three values of \(\tau^{\text{PCC}}\), viz, −1, 0, and 0.5. Figure 5d–f shows the values of precision, recall, and coverage for varying \(\tau^{\text{Icount}}\) for three values of \(\tau^{\text{MAD}}\), viz., 4, 2, and 1. In Table 7, we record the values of precision, recall, and coverage for the minimum value of \(\tau^{\text{Icount}}\) (2) and the maximum value of \(\tau^{\text{Icount}}\) (180), and the consequent percentage variation.

Fig. 5 Impact of number of co-rated items on prediction metrics: a precision, b recall, c coverage, by varying \(\tau^{\text{Icount}}\) for PCC; d precision, e recall, f coverage, by varying \(\tau^{\text{Icount}}\) for MAD

Table 7 Precision, recall, and coverage ranges for experiments for predicting trust using co-rated items

It may be noted that the trends of precision, recall, and coverage remain same irrespective of similarity threshold. We use the representative case of \(\tau^{\text{PCC}} = 0\) and \(\tau^{\text{MAD}} = 2\) to make the following observations:

  • A maximum precision of 100 % can be achieved, though only for a very large number of co-rated items (140) between users.

  • Overall, the recall and coverage are very low for the entire range of \(\tau^{\text{Icount}}\). For \(\tau^{\text{PCC}} = 0\), the maximum values of recall and coverage are 7.5 and 18.3 %, respectively. For \(\tau^{\text{MAD}} = 2\), the maximum values of recall and coverage are 7.5 and 19.3 %, respectively.

  • The variation in precision for PCC with \(\tau^{\text{Icount}}\) is 2795.4 %, indicating that precision is very strongly correlated with \(\tau^{\text{Icount}}\). For \(\tau^{\text{MAD}}\), the variation is 2896.9 %, again indicating a very strong correlation of precision with \(\tau^{\text{Icount}}\).

  • The recall value increases initially until \(\tau^{\text{Icount}}\) reaches 40 and then starts decreasing, reaching the minimum value of 1.9 % for both \(\tau^{\text{PCC}}\) and \(\tau^{\text{MAD}}\). On examining the Epinion dataset, we found that this is because very few user pairs have a large number of co-rated items: fewer than 32 user pairs have more than 40 co-rated items between them.

  • The maximum coverage of 18.3 and 19.3 % for \(\tau^{\text{PCC}}\) and \(\tau^{\text{MAD}}\), respectively, drops gradually as the number of co-rated item threshold increases.

From the above observations, we conclude that:

  • The more significant factor in improving the precision is the number of co-rated items between users, irrespective of the actual rating values. These results show that precision of trust prediction is strongly dependent on the number of co-rated items between users.

  • Further, we conclude that even though trust can be predicted with very high precision when the number of co-rated items is high, the coverage is very low. The best coverage achieved using implicit similarity trust is significantly lower than the number of users who have at least one explicitly trusted user in the dataset. This low coverage limits any benefit of using rating similarity or co-rated items for predicting trust; in fact, it further aggravates the data sparsity problem if we do not want to sacrifice precision to increase coverage.

5 Conclusion and future directions

We presented a comprehensive review of the myriad rating similarity-based trust metrics discussed in the literature. We developed a logistic regression model to characterize the relationship between similarity and trust. We validated this model for the Epinion dataset using both PCC similarity and MAD-based similarity. An analysis of the fitted model revealed that there is a positive but very weak correlation between rating similarity and trust. Further, we conducted experiments on trust prediction with varying similarity threshold and co-rated item count threshold. In general, the precision, recall, and coverage values achieved were low. The impact of similarity threshold on precision is marginal. Significantly, we observed that with increasing co-rated item count threshold, the precision improves markedly irrespective of the actual ratings.

These observations indicate that rating similarity is not a reliable basis for trust evaluation between users. Better implicit trust metrics need to be developed based on other aspects of user profiles and user interactions. Currently, very little work exists on relational trust metrics that use user interactions for predicting trust, owing to the non-availability of rich datasets with details of behavior and interactions between users. In our future work, we will investigate the potential of relational trust in enhancing the performance of social network recommender systems.