Keywords

1 Introduction

Recommendation systems (RS) provide information or item that is interest of the user by analysing rating pattern and stable information of user. The huge growths of information on the web as well as variety of guests to websites add some key challenges to recommender systems technology; these are producing accurate recommendation and handling several recommendations with efficiency [1]. Therefore, new recommender system technologies are required which will quickly turn out prime quality recommendations even for immense information sets.

Content based and collaborative filtering (CF) based are two approaches for developing recommendation systems. In content based system items are recommended based on users past rating history and content of items. Collaborative filtering recommendation system is based on U-I rating matrix. In a typical Collaborative filtering system, an n × m user-item matrix is created, where n users’ preferences about m products are represented as ratings, either numeric or binary. To obtain a prediction for a target item i or a sorted list of items that might be liked, an active user u sends her known ratings and a query to the system. CF system estimates similarities between u and each user in the database, forms a neighbourhood by selecting the best similar users, and estimate a prediction (p ui ) or a recommendation list (top-N recommendation) using a CF algorithm [2, 3]. Profile injection attacks degrade the quality and accuracy of a CF based recommender system over inflicting frustration for its users and probably resulting in high user defection [4]. CF based recommendation systems are extremely prone for shilling attacks then content based recommendation systems [2]. New technology are needed that cannot biased to the various fake profile, and generate recommendation with high precision.

Overall success of CF based recommendation system is depends on how it handle shilling attack and how effectively detect shilling attacks [5]. In this paper we provide survey of various types of shilling attack in CF based RS. Also classification of shilling attacks, detection attributes detection techniques and some evaluation parameters of recommendation systems.

The paper is designed, as follows: In Sect. 2 we briefly discuss theoretical background after that in Sect. 3, contain related work then in Sect. 4 contain various detection attributes of shilling attack detection after that in Sect. 5 evaluation matrix and parameters and then in Sect. 6 we discuss some future work and open issues. Finally, we conclude our paper in Sect. 7.

2 Theoretical Background

Section 1, provides basic introduction about recommendation system so now, we focus on shilling attack in collaborative filtering based recommendation system.

2.1 Shilling Attacks

Recommendation schemes are successful in e-commerce sites; they are prone to shilling or profile injection attacks. Shilling attack or profile injection attacks is outlined as,

Malicious users and/or competitive vendors may attempt to insert fake profiles into the user-item matrix in such a way so they will have an effect on the predicted ratings on behalf of their benefits [2].

To understand how shilling attack works, consider Table 1. Contain 6 users and 6 items and we want to predict rating on item 6 by user 1 which is our target user.

Table 1 U-I matrix without Shilling attack

Without shilling attack similarity with user 1 is given in Table 1. This similarity is calculated using Pearson correlation coefficient (PCC). If we chose k = 1 then most similar user with user 1 is user 5 and rating by user 1 on item 5 is 2, which is our correct answer.

Now, if attacker enters shilling attack profile which is user 6 then similarity with target user 1 is shown in Table 2. Here with shilling attack profile user 6 is most similar user with target user 1 with similarity 0.98, and rating for item 6 by user 1 is 5 instead of 2 (original rating without shilling attack).

Table 2 U-I matrix with Shilling attack

Hence, shilling attacks is reducing quality of data and hence reduce accuracy of recommendation system.

2.2 Classification of Shilling Attacks

Shilling attacks are classified based on intent and based on amount of knowledge required to build shilling attack profiles.

Based on intent. Based on intent shilling attacks are classified as push and nuke attacks. Push attack try to increase popularity of target item by giving high rating to target item. Nuke attack tries to reduce popularity of target item by giving low rating to target item [6].

Based on knowledge required. Based on amount of knowledge required there are different shilling attack models like average attack, random attack, bandwagon attack, reverse bandwagon, segment attack etc.

Attack profile for shilling attack is shown in Fig. 1.

Fig. 1
figure 1

Shilling attack profile

The attack profile is an m-dimensional vector of ratings as per Fig. 1, were m is the total range of items within the system. The profile is partitioned into four components. The empty partition, Iɸ is those items with no ratings in the profile. The only target item it will be given a rating as determined by the function ϒ, usually this will be either the highest or lowest possible rating, looking on the attack type (push/nuke). As described below, some attacks need distinguishing a group of items for special care during the attack. This special set Is receives ratings as given by the function δ. Finally, there is a group of filler items If whose ratings are given as specified by the function σ. It is the strategy for choosing items in Is and If and the functions ϒ, σ, and δ that outline an attack model and provides it its character [6].

For different attack models different strategies are used for creating attacker profiles and how to provide rating for items to create attacker profiles are shown in Table 3.

Table 3 Attack profile summary [2]

3 Related Work

In this section we have a tendency to represent some related works in field of shilling attacks in recommendation systems. Since shilling profiles look like authentic profiles, it is very tough to spot them. To discover shilling profile various statistical, classification, clustering techniques are used. Bryan et al. [7] Suggest new algorithm known as “Unsupervised Retrieval of Attack Profiles” (UnRAP). They recommend new measure known as Hv-score measure to find shilling profile from genuine profile. They said that Hv-score value of attacker profile is higher than genuine profile. Based on this assumption they identify attacker profile. Lu [8] Extends work of [7] to find group of attacker instead of individual attackers. With help of various detection matrices and analysing raring pattern of attacker [9] Propose unsupervised learning method for detection of fake profile using target item analysis. Algorithm find potential attack profiles using digsim and rdma (Rating Deviation from Mean Agreement) and then refine set of potential profile using target item analysis. Supervised learning is another approach to detect shilling attack in memory based CF. Zhang and Zhou [10] suggest Ensemble learning concept for shilling attack detection using back propagation neural network classifier, finally output is combined using voting strategy. Semi-supervised learning also helpful to detect shilling profiles. Bilge et al. [11] Use bisecting k-means algorithm to generate binary decision tree. Intra cluster correlation (ICC) is used to find correlation within cluster between the profiles. This method assumes that attacker profiles in cluster have high ICC between them. And cluster with high value for ICC is considered as attacker cluster. But Performance of this scheme is slightly worse with increasing filler size in segment attack. Zhang et al. [12] Detect shilling attacks using clustering social trust information between the users. They propose two algorithms, CluTr and WCluTr, to mix clustering with “trust” among users. According to them user with no incoming trust is considered as attacker profiles. Cao et al. [13] Use Semi-supervised learning method semi-SAD. Combination of EM-λ and naïve-bayes is used for detection of shilling attacks.

4 Detection of Shilling Attack

CF based recommendation systems are vulnerable to shilling attacks. We begin this section with a review of some of the statistical measures that have been designed to detect shilling attack in recommendation system. Some of the Standard shilling attack detection metrics are explains below:

Rating Deviation from Mean Agreement (RDMA). This measures a user’s rating disagreement with other genuine users in the system, weighted by the inverse number of item that user rated. It is defined as,

$$RDMA_{u} = \frac{{\mathop \sum \nolimits_{i = 0}^{{N_{u} }} \frac{{\left| {r_{u,i} - Avg_{i} } \right|}}{{NR_{i} }}}}{{N_{u} }}$$
(1)

Weighted Deviation from Mean Agreement (WDMA). This measure is strongly based on RDMA; however it places higher weight on rating deviations for sparse items [6]. It is defined as,

$$WDMA_{u} = \frac{{\mathop \sum \nolimits_{i = 0}^{{N_{u} }} \frac{{\left| {r_{u,i} - Avg_{i} } \right|}}{{NR_{i}^{2} }}}}{{N_{u} }}$$
(2)

Where, Nu is the range of items user u rated, ru,i is the rating given by user u to item i, NRi is the overall range of ratings in the system given to item i. Avgi is average rating of item i.

Degree of similarity (DegSim). Which based on hypothesis that is attacker profiles is highly similar with each other because of theirs characteristics and they are generated with same process [7]. But this profile has low similarity value with genuine profiles. It can be defined as,

$$DigSim_{u} = \frac{{\sum\nolimits_{v\epsilon \,neighbors(u)} {W_{u,v} } }}{k}$$
(3)

Where, Wu,v similarity between u and k-nearest neighbours v. and k is number of nearest neighbours of user u.

Equation for similarity between u and v using Pearson correlation coefficient is given as below [9],

$$W_{u,v} = \frac{{\sum\nolimits_{i\epsilon I} {\left( {r_{u,i} - \overline{{r_{u} }} } \right)\left( {r_{v,i} - \overline{{r_{v} }} } \right)} }}{{\sqrt {\sum\nolimits_{i\epsilon I} {\left( {r_{u,i} - \overline{{r_{u} }} } \right)^{2} \left( {r_{v,i} - \overline{{r_{v} }} } \right)^{2} } } }}$$
(4)

Where, I is the set of items that users u and v both rated, rui is the rating user u gave to item i, and \(\overline{{r_{u} }}\) is the average rating of user u.

Length Variance (LengthVar). This attribute relies on the length of user profile. Most of the attacker enters shilling profile that contains large number of rated items [6]. Thus shilling profile has high value for this attribute. Length Variance (LengthVar) that is a measure of what proportion the length of a given profile varies from the average length within the database. It is defined as,

$$LengthVar_{u} = \frac{{\left| {n_{u} - \overline{{n_{u} }} } \right|}}{{\sum\nolimits_{u\epsilon U} {\left( {n_{u} - \overline{{n_{u} }} } \right)^{2} } }}$$
(5)

Where, nu is the average length of a profile in the system.

5 Evaluation Parameters and Matrices

Various evaluation matrices are used for evaluation of effect of shilling attack in recommendation system and evaluating detection algorithms and measure accuracy of recommendation system. These measures are shown in Table 4.

Table 4 Evaluation parameters [2]

6 Discussion and Open Issues

The internet has been increasing attention now a days. Due to the continuously increasing popularity of internet large amount of data are available. This large data create information overload problem. Recommendation system is system that predict users interest and recommend items to the user. Therefore, the research about recommender systems seems to remain popular. Similarly, shilling attacks against such systems will be in place, as well. Hence, number of researchers are doing research in this area and they are try to find various solutions but there are still missing gaps in this area that is need to be filled.

From all above surveys in Sect. 3 some interesting points are found these are almost all detection methods are unsupervised learning methods. Using supervised learning high accuracy is possible to achieve then unsupervised method. Detection of fake profile in model based recommendation is also one interesting point. Sparse database are vulnerable to shilling attack hence this direction is also need to be investigated. Design a various methods that effectively improves recommendation accuracy in presence of sparsity and shilling attack profile are one of the good idea of research. Using social relationship between users we can also find fake profile that help to improve recommendation accuracy. Using content and social information of user analyze their search history to determine that given user is attacker or not this is also one of the new topic of research.

7 Conclusion and Future Work

In this survey we discuss about effect of shilling attack in recommendation system, their types, detection parameters, evaluation parameters and related works that was done in this In future we are planning to conduct detail survey in field of fake review detection in model based and hybrid recommendation system. We are also planning to propose method for detection of shilling attack using supervised learning.