1 Introduction

With the large amount of data available on the web, it has become difficult for users to process it all and find the relevant information. For instance, to watch a web series on Prime, a user might have to sit through many trailers before reaching a series of interest, a time-consuming process that may end with the user not watching anything at all. To help users find relevant information quickly, researchers developed a tool known as the Recommender System (RS) (Jannach et al. 2010). Collaborative Filtering (CF) is a memory-based RS technique that filters items based on the interests of similar users/items (Bansal and Baliyan 2019a, b; Bedi et al. 2017; Bansal and Baliyan 2020). It is the most successful recommendation technique and is used by giants such as Amazon and Netflix; 60% of videos watched on YouTube and 40% of apps installed from the Play Store are the result of recommendations (see footnote 1).

Although CF is successful on the web, it is vulnerable to profile injection attacks because recommendations rely on user profiles and the system is open by nature (Lam and Riedl 2004). Profile injection attacks are also known as shilling attacks (Burke et al. 2015). When mounting these attacks, attackers exploit the dependency of recommendations on other users' reactions. The attacker creates fake user profiles similar to the target user and inserts them into the dataset so that they appear in the neighborhood and thus bias the recommendation process. Such profiles are created by following different attack models, namely the segment, bandwagon, random, and average attacks. The purpose of such attacks is to promote or demote, for fun and profit, items that might otherwise not appear in the user's list of recommended items.

Several supervised and unsupervised detection techniques have been investigated to filter out user profiles that can bias the recommendation process. However, both families of techniques have drawbacks. Supervised detection techniques require a large amount of labeled data and a balanced number of fake and genuine user profiles to train the classifier (Zhou et al. 2016). Further, the machine learning classifiers are trained on hand-designed features that are difficult to extract (Zhou et al. 2020). Unsupervised techniques require less computational time because they use unlabeled training samples, but they usually need some prior knowledge about shilling profiles, which is difficult to obtain in the real world (Zhou et al. 2016, 2020). To the best of our knowledge, Swarm Intelligence (SI) techniques have not yet been explored for detecting fake profiles mounted in the dataset. Given the ease of use and excellent results of the bio-inspired SI technique Grey Wolf Optimizer (GWO) on problems such as parameter tuning, economic dispatch, classification, clustering, and power engineering (Hassan and Zellagui 2018; Pradhan et al. 2018; Hatta et al. 2019), we explore it for detecting attack profiles mounted in the dataset. Further, the detection of shillers can be viewed as a binary classification problem, on which GWO has shown significant results in the past (Emary et al. 2016).

In this paper, we develop an unsupervised detection technique, ShillDetector, for finding attack profiles mounted in the dataset. It works directly on the rating matrix and requires neither hand-designed features nor prior knowledge of attack profiles. Further, it shows significant detection accuracy when tested on the MovieLens (ML) dataset (see footnote 2). ShillDetector is a GWO-based technique that takes inspiration from the social hierarchy of grey wolves and works along the lines of their hunting behavior, i.e., encircling the prey before attacking it. The ease of implementation, minimal parameters, simplicity of the algorithm, use of few operators, derivation-free nature, and excellent results (Mirjalili et al. 2014) make GWO worth further exploration by researchers. To the best of our knowledge, no meta-heuristic technique has yet been proposed for the detection of attack profiles in recommender systems.

The major contributions of the work are:

  1. A novel GWO-based technique for the detection of shilling attacks (ShillDetector) is proposed.

  2. It works directly on the rating matrix and requires neither hand-designed features nor prior knowledge of attack profiles.

  3. It mimics the hunting behavior of grey wolves to detect fake profiles.

  4. The technique exploits the group behavior of attack profiles.

  5. ShillDetector detects fake profiles with an average precision of 99%.

  6. The simplicity of the algorithm, ease of implementation, derivation-free nature, use of fewer operators than evolutionary algorithms (crossover, mutation), and excellent results make GWO worth further exploration by researchers.

The paper is structured as follows: the literature review is discussed in Sect. 2, the background in Sect. 3, and the proposed work in Sect. 4. Section 5 presents the experiments and results, and Sect. 6 concludes the work.

2 Literature review

Defending and attacking a system is a two-player game with each player’s motive being ‘to win’. The defender’s win is in making the attack expensive, reducing the system’s vulnerability, minimizing the attacker's chance of a return, and generating a robust system. On the other hand, the attacker’s win is in successfully exploiting the vulnerability of the system, inserting shillers, and generating bias in the system’s functionality.

To detect attack profiles mounted by the attacker in the database, various supervised and unsupervised shilling detection techniques have been discussed in the literature. A supervised detection technique using two attributes, namely Weighted Degree Agreement (WDA) and Filler Mean Target Difference (FMTD), has been proposed by Mobasher et al. (2005). Batmaz et al. (2020) proposed a technique that uses six generic and four model-specific attributes and employs kNN and SVM for classification. Cao et al. (2018), on the other hand, proposed an outlier degree detection algorithm based on dynamic feature selection. Zhou et al. (2016) proposed a two-phase SVM-TIA detection method that uses the Borderline-SMOTE method to balance the number of attack profiles in the training set and obtain rough detection results in phase 1; the target items are then analyzed from the attack profiles in phase 2. Supervised detection methods based on deep learning are proposed in Tong et al. (2018) and Zhou et al. (2020), using one and two convolution-pooling layers, respectively. Many unsupervised detection methods are based on clustering, with the aim of detecting a group of attack profiles rather than a single attack profile (Mehta 2007; Mehta et al. 2007; Mehta and Nejdl 2009). Chirita et al. (2005) introduced Rating Deviation from Mean Agreement (RDMA), which considers rating deviations between profiles. Variations of PCA explored by authors include combining PCA with data complexity (Zhang et al. 2018a) and PCA with perturbation (Deng et al. 2016). Liu et al. (2019) proposed another unsupervised method using a time-based Kalman filter, while Zhang et al. (2018a, b) exploited the user's suspicious degree based on past behavior using a hidden Markov model and hierarchical clustering.

GWO is an SI technique with various applications in the literature, including a prediction model combining GWO with fuzzy sets to detect diabetes at an early stage (Manikandan 2019), finding the optimal feature set for the diagnosis of Parkinson's disease (Sharma et al. 2019), optimal feature selection (Emary et al. 2016), training multi-layer perceptrons (Mirjalili 2015), and dimensionality reduction while keeping accuracy high (Elhariri et al. 2016), taking advantage of the multi-objective character of GWO (Emary et al. 2015).

Table 1 provides a summary of various detection techniques and the application of GWO in the literature.

Table 1 Summary of detection techniques and application of GWO

This section discussed and highlighted several limitations of current detection techniques, such as the high cost of obtaining labeled training data and hand-designed features for supervised methods, and the need for prior knowledge of attack profiles in unsupervised methods. Further, GWO can be used in a multi-objective manner, i.e., it reduces dimensionality while maximizing classification accuracy, and has shown remarkable results on binary classification problems (Emary et al. 2015, 2016). Since the detection of shilling profiles in a dataset is itself a binary classification problem (Wang et al. 2015), this motivated us to mathematically model the social behavior of grey wolves to distinguish genuine profiles from fake ones that can manipulate the generated recommendations.

3 Background

3.1 Shilling attacks

The recommendations generated by CF depend on similar users in the neighborhood of the target user. The neighborhood can be manipulated by adding fake user profiles to the database, thereby biasing the recommendations (Gunes et al. 2014). This is termed a shilling attack or profile injection attack and is mounted with the intent of promoting or demoting an item.

From the attacker's perspective, the best attack is one that requires a minimum amount of information about the dataset, demands minimum effort, and maximizes the similarity between the shilling and genuine profiles. Taking these aspects into consideration, we have chosen the average, bandwagon, random, and segment attack models from among the six well-known shilling attack models (Batmaz et al. 2020) for mounting fake profiles in the database. The attack profiles are generated following the attack models described below:

  1. Random attack

    The random attack is a low-knowledge attack. To mount it, only the mean of all ratings in the dataset is required (Mobasher et al. 2005; Bilge et al. 2014).

  2. Average attack

    The average attack is a high-knowledge attack that proves successful even with a small filler size and can be used as a push or nuke attack (Burke et al. 2015). The attacker needs the average rating of each item to mount it.

  3. Bandwagon attack

    This is a low-knowledge attack that is almost as successful as the average attack but does not need the mean rating of each item, which makes it more practical to mount. It is based on highly visible items, i.e., items that a significant number of users have rated. These items are termed selected items (\({\text{I}}_{\text{S}})\) and are assigned the maximum rating along with the target item (Mobasher et al. 2005; Mobasher et al. 2007; Burke et al. 2015).

  4. Segment attack

    This is another low-knowledge attack that mounts attack profiles by targeting a set of users who may be interested in the target item rather than the entire set of users, making it more focused and resource-saving (Mobasher et al. 2005; Burke et al. 2015; Bansal and Baliyan 2019a, b). The attack model resembles that of the bandwagon attack. For experimentation purposes, we consider the horror movie segment: all users who have given a rating of 3 or higher to at least 4 horror movies form the group of segment users.

The analysis in Bilge et al. (2014) shows an increase in prediction shift with increasing filler size for Discrete Wavelet Transform (DWT)-based Privacy Preserving Collaborative Filtering (PPCF) under an average attack, owing to the transformation of successive items together. On the other hand, for k-means clustering-based PPCF, the average attack becomes less successful as the filler size grows (Bilge et al. 2014). In general, users rarely rate items, leaving most items unrated in a genuine user profile and resulting in a highly sparse dataset. A high filler size increases the number of ratings in attack profiles and thus makes them less similar to authentic users (Sundar et al. 2020). Furthermore, for the average attack, the effort required to retrieve the mean rating of each item increases with the filler size (Mobasher et al. 2007), and the attack can be just as successful with a small filler size (Mobasher et al. 2005). Therefore, filler sizes of 1%, 3%, 5%, and 7% are chosen for all four attacks, taking into account the attacker's knowledge effort and the sparsity of the dataset, and keeping most items unrated in the attack profiles, as in genuine profiles.

The attack model varies slightly depending on the type of attack (Gunes et al. 2014), as described in Table 2; a code sketch of profile generation under these models follows the table.

Table 2 Attack models
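As a rough illustration of how profiles can be generated under these models (following the standard definitions in Mobasher et al. 2005, 2007), a minimal Python sketch is given below. The helper name make_attack_profile, the zero-filled rating matrix convention, and the exact rating distributions (normal with standard deviation around 1.1) are our assumptions rather than details taken from the paper.

```python
import numpy as np

def make_attack_profile(R, target_item, model="average", filler_size=0.03,
                        selected_items=None, r_max=5, rng=None):
    """Generate one (push) attack profile; R is a users x items matrix, 0 = unrated."""
    rng = rng or np.random.default_rng()
    n_items = R.shape[1]
    profile = np.zeros(n_items)

    # The target item is pushed with the maximum rating in all four models
    profile[target_item] = r_max

    # Bandwagon/segment attacks also give r_max to selected (popular/segment) items
    if model in ("bandwagon", "segment") and selected_items is not None:
        profile[selected_items] = r_max

    # Pick filler items at random among the items not yet rated in the profile
    candidates = np.flatnonzero(profile == 0)
    fillers = rng.choice(candidates, size=int(filler_size * n_items), replace=False)

    if model == "average":
        # Average attack: each filler rated around that item's own mean rating
        for j in fillers:
            col = R[:, j][R[:, j] > 0]
            mu_j = col.mean() if col.size else 3.0
            profile[j] = np.clip(round(rng.normal(mu_j, 1.1)), 1, r_max)
    elif model == "segment":
        # Segment attack: filler items receive the minimum rating
        profile[fillers] = 1
    else:
        # Random and bandwagon attacks: fillers drawn around the global mean rating
        mu = R[R > 0].mean()
        profile[fillers] = np.clip(np.round(rng.normal(mu, 1.1, fillers.size)), 1, r_max)

    return profile
```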

3.2 Basic grey wolf optimization model

GWO, proposed by Mirjalili et al. (2014), is an SI technique that mimics the social behavior of grey wolves when capturing prey. Grey wolves live in packs of 5–12 and have a strong dominance hierarchy. Some advantages of GWO (Hatta et al. 2019; Emary et al. 2015; Mirjalili et al. 2014; Faris et al. 2018) are its simplicity, ease of operation, few operators (unlike the genetic algorithm, with crossover, mutation, and so on), and high convergence rate. It is also flexible, i.e., it can be applied to various problems such as optimization, power engineering, bioinformatics, and image processing. To leverage these benefits, a huge volume of work has applied GWO to problems in various domains. In the mathematical model of GWO, the fittest solution is α, followed by β, and so on, according to the hierarchy shown in Fig. 1 (Mirjalili et al. 2014).

Fig. 1
figure 1

Social hierarchy of grey wolves

GWO starts by assigning random positions to the grey wolves (search agents). A fitness function computes the fitness value of each search agent based on its current position. In each iteration, α, β, and δ are assigned the best positions (closest to the prey) and the positions of the other search agents are updated accordingly. The components of \(\overrightarrow{\text{a}}\) are decreased linearly from 2 to 0 over the iterations (Mirjalili et al. 2014). This process is repeated until the termination condition is reached. The pseudocode for GWO is given below:

figure a
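Since the pseudocode appears only as a figure, the following minimal Python sketch of the basic (continuous) GWO loop is included for illustration. The function and parameter names (fitness_fn, n_agents, lb, ub) are ours, and the fitness is assumed to be maximized, as in the proposed approach; this is a sketch of the standard algorithm, not the authors' implementation.

```python
import numpy as np

def gwo(fitness_fn, dim, n_agents=8, max_iter=100, lb=-1.0, ub=1.0):
    """Minimal continuous GWO loop (illustrative sketch; maximizes fitness_fn)."""
    # Random initial positions for the pack of search agents (grey wolves)
    positions = np.random.uniform(lb, ub, size=(n_agents, dim))
    alpha_pos = beta_pos = delta_pos = np.zeros(dim)
    alpha_score = beta_score = delta_score = -np.inf

    for l in range(max_iter):
        # Evaluate each wolf and keep the three best as alpha, beta, delta
        for i in range(n_agents):
            fit = fitness_fn(positions[i])
            if fit > alpha_score:
                alpha_score, alpha_pos = fit, positions[i].copy()
            elif fit > beta_score:
                beta_score, beta_pos = fit, positions[i].copy()
            elif fit > delta_score:
                delta_score, delta_pos = fit, positions[i].copy()

        # Component 'a' decreases linearly from 2 to 0 over the iterations
        a = 2 - l * (2 / max_iter)

        # Encircling/hunting: move every wolf towards alpha, beta and delta
        for i in range(n_agents):
            X = np.zeros(dim)
            for leader in (alpha_pos, beta_pos, delta_pos):
                r1, r2 = np.random.rand(dim), np.random.rand(dim)
                A, C = 2 * a * r1 - a, 2 * r2
                D = np.abs(C * leader - positions[i])
                X += leader - A * D
            positions[i] = np.clip(X / 3.0, lb, ub)

    return alpha_pos, alpha_score
```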

4 Proposed work

4.1 Motivation

Users' trust in, and the reliability of, the generated recommendations are essential for the continuity of the system. Malicious users may compromise this trust and reliability by injecting fake user profiles into the database. The purpose, therefore, is to nullify or minimize the effect of fake profiles on the generated recommendations. A high correlation exists among shillers because the same underlying model is used to generate them (Mehta et al. 2007). The detection of shillers can thus be seen as a dimensionality reduction problem that removes the redundancy present in the database. GWO is capable of solving such bi-objective problems, i.e., reducing dimensionality while keeping classification accuracy high, and it has been used for feature selection in various machine learning applications (Al-Tashi et al. 2020). To the best of our knowledge, however, no work has yet used GWO for the detection of shillers in RS. Furthermore, the detection of fake profiles can be cast as a binary classification problem: 1 for a genuine profile and 0 for a fake profile. Considering this, a binary version of GWO is used for the detection of shilling attacks.

4.2 Proposed approach

This subsection details the proposed algorithm (ShillDetector) for the detection of fake profiles mounted in the database following the attack models, namely the average, bandwagon, and segment attacks. ShillDetector leverages GWO's feature-reduction capability (Emary et al. 2015). The algorithm is explained step-wise below:

  1. Pre-processing phase

    The dataset is transformed into a user-item rating matrix \((R)\) consisting of \(M\) items and \(K\) users, as shown in Table 3. Here, \(?\) denotes an item not seen/not rated by the user.

  2. Clustering of users

    In this step, clusters of users are created with \(k\)-means based on the Pearson correlation among users. Next, we find the top-\(\text{N}\) most highly correlated users according to the computed Pearson correlation coefficients. Finally, the number of the cluster containing the largest share of these top-\(\text{N}\) users is returned and used in a later stage of the proposed approach. This step is based on the hypothesis that fake profiles are more highly correlated with one another than genuine profiles are (Mehta and Nejdl 2009). Therefore, the returned cluster will contain most of the shillers, although it may also contain some genuine profiles.

  3. Transpose of matrix

    ShillDetector is a dimensionality reduction technique that considers users as features. Therefore, we transpose the user-item rating matrix to store users as columns instead of rows, i.e., \({R}^{T}\).

  4. Feature importance computation

    We compute the importance of each feature (user) using the \(feature\_importances\_\) attribute of the random forest regressor from \(sklearn.ensemble\). The intuition behind this step is that highly correlated users receive low importance values, as such users contribute little to the functioning of the system and are thus considered redundant. The importance of each feature in the \(feature\_importances\_\) attribute reflects the feature's contribution to determining the splits. The importances of all features sum to 1, with each user's importance lying between 0 and 1.

  5. Mathematical computation on the lines of GWO

    This step follows the original GWO model (Mirjalili et al. 2014).

    a. We first initialize the vectors and variables required in ShillDetector.

      i. \({{\upalpha }}_{\mathrm{pos}}\), \({\upbeta }_{\mathrm{pos}}\), and \({\updelta }_{\mathrm{pos}}\) \(= \overrightarrow{0}\). They are binary vectors of size (\(1\times n\_users\)), where \(n\_users\) is the total number of users in the dataset; a 1 in the binary vector represents a genuine profile and a 0 a fake profile. Among all search agents, the three nearest to the prey are termed α, β, and δ, and their current positions are stored in \({{\upalpha }}_{\mathrm{pos}}\), \({\upbeta }_{\mathrm{pos}}\), and \({\updelta }_{\mathrm{pos}}\), respectively. \({{\upalpha }}_{\mathrm{score}}\), \({\upbeta }_{\mathrm{score}}\), and \({\updelta }_{\mathrm{score}}\) represent the fitness scores of α, β, and δ and are initialized to 0.

      ii. Randomly initialize the positions of all search agents (grey wolves), which live in a pack of 5–12. The position of each search agent is represented by a one-dimensional binary vector of size (\(1\times n\_users\)). Figure 2 shows an instance of randomly initialized positions of search agents, where 1 signifies a genuine profile and 0 a fake profile.

    b. Next, the fitness of each search agent is computed using the objective function described by Eq. (1); a code sketch of this computation follows the list.

      $$\begin{aligned} & \text{Maximize fit}(i)=\left(\alpha \times \mathrm{agg\_imp\_feature}[i]\right)+\left(\beta \times \frac{\mathrm{selected\_features}[i]}{\mathrm{total\_features}}\right)\\ & \text{subject to } \alpha =\beta =0.5 \text{ to balance the two terms,}\end{aligned}$$
      (1)

      where \(i\) denotes the search agent; \(selected\_features[i]\) is the number of 1's in the agent's vector; \(\text{agg}\_\text{imp}\_\text{feature}[i]\) is the aggregated importance of the agent's features, computed in step 4 (Feature importance computation); and \(\text{total}\_\text{features}\) is the total number of features in the dataset.

      In \(\text{agg}\_\text{imp}\_\text{feature}[i]\), we add the importance of all features enabled in the search agent (i.e., set to 1, genuine) that do not belong to the cluster selected in step 2, together with the importance of all features disabled in the search agent (i.e., set to 0, fake) that do belong to the selected cluster. The selected cluster contains highly correlated users, following the hypothesis that fake profiles are more highly correlated with one another than genuine profiles. In short, we aggregate the importance of all features that the search agent identifies correctly.

    c. Find the fittest (best) search agent, i.e., the one with the maximum fitness value, and assign its position vector and score to \({{\alpha }}_{\text{pos}}\) and \({{\alpha }}_{\text{score}}\), respectively. Similarly, find the second and third fittest search agents and assign their values to \({\upbeta }_{\text{pos}}\), \({\upbeta }_{\text{score}}\) and \({\updelta }_{\text{pos}}\), \({\updelta }_{\text{score}}\), respectively.

    d. Update \(\overrightarrow{\text{a}}\), which is used in sub-point 'e':

      $$\overrightarrow{\text{a}}=2-\text{l}\times \frac{2}{\text{max}\_\text{iter}}$$
      (2)

      where \(\overrightarrow{\text{a}}\) is a coefficient vector, \(\text{max}\_\text{iter}\) is the maximum number of iterations, and \(\text{l}\) ranges from 0 to \(\text{max}\_\text{iter}\).

    e. The position of each search agent is updated using Eqs. (3)–(5), taking inspiration from the encircling process of the original GWO.

      $${\text{D}}_{{\alpha }}=\left|\left(\text{C}1\times {{\alpha }}_{\text{pos}}\left[\text{j}\right]\right)-\text{position}\left[\text{i}\right]\left[\text{j}\right]\right|;\quad \text{X}1= {{\alpha }}_{\text{pos}}\left[\text{j}\right]-(\text{A}1\times {\text{D}}_{{\alpha }})$$
      (3)
      $${\text{D}}_{\upbeta }=\left|\left(\text{C}2\times {\upbeta }_{\text{pos}}\left[\text{j}\right]\right)-\text{position}\left[\text{i}\right]\left[\text{j}\right]\right|;\quad \text{X}2= {\upbeta }_{\text{pos}}\left[\text{j}\right]-(\text{A}2\times {\text{D}}_{\upbeta })$$
      (4)
      $${\text{D}}_{\updelta }=\left|\left(\text{C}3\times {\updelta }_{\text{pos}}\left[\text{j}\right]\right)-\text{position}\left[\text{i}\right]\left[\text{j}\right]\right|;\quad \text{X}3= {\updelta }_{\text{pos}}\left[\text{j}\right]-(\text{A}3\times {\text{D}}_{\updelta })$$
      (5)
      $$\text{X}= \frac{\text{X}1+\text{X}2+\text{X}3}{3}$$
      (6)

      where \(\text{A}1,\text{ A}2,\text{ and A}3\) are coefficient vectors computed using Eq. (7); \(\text{C}1,\text{ C}2,\text{ and C}3\) are coefficient vectors computed using Eq. (8); \({\text{D}}_{{\alpha }}\), \({\text{D}}_{\upbeta }\), \({\text{D}}_{\updelta }\), \(\text{X}1\), \(\text{X}2\), and \(\text{X}3\) are vectors; \(\text{position}[\text{i}][\text{j}]\) is the value of search agent \(i\) for feature \(j\); and \({{\alpha }}_{\text{pos}}\left[\text{j}\right]\), \({\upbeta }_{\text{pos}}[\text{j}]\), \({\updelta }_{\text{pos}}[\text{j}]\) are the corresponding positional values for feature \(j\).

      $$\text{A}1,\text{A}2,\text{A}3=2\overrightarrow{\text{a}}\cdot \overrightarrow{{\text{r}}_{1}}- \overrightarrow{\text{a}}$$
      (7)
      $$\text{C}1,\text{ C}2,\text{ C}3=2\cdot {\overrightarrow{\text{r}}}_{2}$$
      (8)

      where \(\overrightarrow{{\text{r}}_{1}}\) and \(\overrightarrow{{\text{r}}_{2}}\) are random vectors in the range [0,1].

      Finally, update the position vector of the search agent by applying a sigmoid to \(\text{X}\), taking inspiration from the hunting step of GWO.

    f. Repeat sub-points 'b' to 'e' of step 5 until \(\text{max}\_\text{iter}\) is reached or the algorithm converges, i.e., there is no improvement over the past two iterations.

  6. Use \({{\alpha }}_{\text{pos}}\) for the detection of fake profiles in the dataset.
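To make the steps above concrete, the sketch below assembles the main pieces in Python: the feature-importance computation of step 4, the fitness of Eq. (1) from step 5(b), the sigmoid-based binary update of step 5(e), and the suspect-cluster selection of step 2. It reflects our reading of the algorithm; in particular, the regression target used for the random forest, the helper names (user_importance, fitness, binary_update, suspect_cluster_mask), and the default values of top_n and the number of clusters are our assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor

def user_importance(R_T, y=None):
    """Importance of each user, with users as the columns of R_T (items x users).

    The paper does not state the regression target; using each item's mean
    rating as a pseudo-target is our assumption."""
    if y is None:
        y = R_T.mean(axis=1)
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    rf.fit(R_T, y)                       # users act as features
    return rf.feature_importances_       # importances sum to 1

def suspect_cluster_mask(R, n_clusters=4, top_n=50):
    """Step 2: boolean mask of users in the cluster holding most of the
    top-N mutually correlated users (cluster count and top-N are assumptions)."""
    corr = np.nan_to_num(np.corrcoef(R))          # user-user Pearson correlation
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(corr)
    top_users = np.argsort(-corr.sum(axis=1))[:top_n]
    suspect = np.bincount(labels[top_users]).argmax()
    return labels == suspect

def fitness(agent, importance, suspect, a_w=0.5, b_w=0.5):
    """Eq. (1): aggregate the importance of users the agent labels correctly
    (1 = genuine outside the suspect cluster, 0 = fake inside it)."""
    correct = ((agent == 1) & ~suspect) | ((agent == 0) & suspect)
    agg_imp = importance[correct].sum()
    return a_w * agg_imp + b_w * agent.sum() / agent.size

def binary_update(X, rng=None):
    """Step 5(e): sigmoid transfer mapping the averaged position X to 0/1."""
    rng = rng or np.random.default_rng()
    prob = 1.0 / (1.0 + np.exp(-X))
    return (rng.random(X.size) < prob).astype(int)
```

These pieces slot into the GWO loop sketched in Sect. 3.2: the averaged position X of Eq. (6) is passed through binary_update, and the resulting 0/1 vector is evaluated with fitness.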

Table 3 User-item rating matrix
Fig. 2
figure 2

Random initialization of 4 search agents (wolves) and 5 users

5 Experiments and results

In this section, the dataset is discussed followed by experiments and results.

5.1 Dataset and experimental methodology

For experimentation purposes, the publicly available ML dataset of size 100 K (see footnote 3) has been used. Users rate movies on a scale of 1–5, with each user rating at least 20 movies. The users in the ML 100 K dataset are considered genuine, while fake profiles (shillers) are added to the dataset using the attack models. Different attack and filler sizes are considered for generating attack profiles, keeping the target item constant for experimental purposes. The proposed approach is an unsupervised technique and therefore does not require any additional training time.
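For reference, a user-item rating matrix in the spirit of Table 3 can be built from the standard ML-100K distribution roughly as follows; the file path and the choice of 0 (rather than ?) for unrated entries are our assumptions.

```python
import pandas as pd

# ML-100K ratings file: user_id, item_id, rating, timestamp (tab-separated)
ratings = pd.read_csv("ml-100k/u.data", sep="\t",
                      names=["user", "item", "rating", "ts"])

# User-item rating matrix R; unrated entries are stored as 0 here
R = ratings.pivot_table(index="user", columns="item", values="rating",
                        fill_value=0).to_numpy()
print(R.shape)   # (943, 1682) for ML-100K: 943 users, 1682 movies
```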

5.2 Parameter setting

Several parameters need to be defined while implementing and analyzing ShillDetector.

5.2.1 Fixed parameter

A few parameters need to be initialized and remain the same in every experiment with ShillDetector, as listed in Table 4. Further, owing to the simplicity of GWO, only two hyperparameters (\(\text{a}\) and \(\text{c}\)) that guide the search process need to be adjusted.

Table 4 Parameters and their value

5.2.2 Varied parameters

The attack size and the filler size are the two parameters varied to investigate the proposed approach.

  a. Attack size

    It is defined as the ratio of fake profiles to the total number of profiles. Attack sizes ranging from 1 to 30% are considered for experimentation, taking into account the information and effort the attacker needs to mount the attack.

  b. Filler size

    It is defined as the ratio of the number of ratings provided in a user profile to the total number of items in the dataset (Zhou et al. 2016). Given the sparsity of the dataset, the filler size is kept small, i.e., 1%, 3%, 5%, and 7%, to make the fake profiles resemble genuine ones.

5.3 Evaluation metrics

To evaluate the performance of the proposed approach, several standard metrics have been used (Sharma et al. 2019; Al-Tashi et al. 2019). Each metric is averaged over \(\text{M}\) runs, where \(\text{M}\) is taken to be 10. A code sketch of these metrics is given after the list below.

  a. Classification accuracy

    It indicates the correctness of ShillDetector in classifying fake and genuine profiles.

    $$\text{Classification accuracy}= \frac{\text{X}}{\text{Total}\_\text{profiles}}\times 100$$
    (9)

    where \(X\) is the number of correctly classified profiles.

  b. Detection rate

    It signifies the percentage of correctly identified fake profiles.

    $$\text{Detection rate}= \frac{\text{Y}}{\text{total}\_\text{fake}\_\text{profiles}}\times 100$$
    (10)

    where \(Y\) is the number of fake profiles correctly identified.

  c. False Alarm Rate (FAR)

    It measures the proportion of genuine profiles classified as fake.

    $$\text{FAR}= \frac{\text{FP}}{\text{genuine}\_\text{profiles}}\times 100$$
    (11)

    where \(FP\) is the number of genuine profiles misclassified as fake.

  d. Precision

    It is defined as the ratio of the number of fake profiles correctly classified to the number of profiles classified as fake.

  e. Recall

    It is the ratio of the number of fake profiles correctly identified to the total number of fake profiles.
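These metrics reduce to simple counts over the predicted labels. The sketch below, with labels 0 = fake and 1 = genuine as in Fig. 2, is a straightforward reading of Eqs. (9)–(11) and the precision/recall definitions; the function name and the label convention are ours.

```python
import numpy as np

def detection_metrics(y_true, y_pred):
    """y_true, y_pred: per-user labels with 0 = fake and 1 = genuine."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 0) & (y_true == 0))   # fake profiles correctly flagged
    fp = np.sum((y_pred == 0) & (y_true == 1))   # genuine profiles flagged as fake

    accuracy = np.mean(y_pred == y_true) * 100                 # Eq. (9)
    detection_rate = tp / max(np.sum(y_true == 0), 1) * 100    # Eq. (10)
    far = fp / max(np.sum(y_true == 1), 1) * 100               # Eq. (11)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(np.sum(y_true == 0), 1)
    return accuracy, detection_rate, far, precision, recall
```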

5.4 Experiments and results

This subsection reports and discusses the results obtained by conducting several experiments from various perspectives.

5.4.1 Binary operator used

Each user in the dataset is classified as fake or genuine, as depicted in Fig. 3. Shilling profile detection can therefore be viewed as a binary classification problem that assigns each user a label, fake or genuine. ShillDetector uses a binary version of GWO to detect fake profiles in the dataset. We use the \(\text{sigmoid}\) transfer function (Al-Tashi et al. 2019) to transform GWO into a binary version suited to the feature selection formulation.

Fig. 3
figure 3

Output of ShillDetector labeling users as fake/genuine

5.4.2 Result analysis

The performance of ShillDetector has been analyzed through various experiments, and the results are tabulated in Table 5. Filler size plays a crucial role in creating attack profiles and thus in biasing recommendations. To fill in the ratings of filler items in attack profiles created with the average attack, the attacker needs a large amount of information about the rating distribution (the mean rating of every item), which is often difficult to obtain and requires considerable effort; the filler size is therefore kept small. Another reason to keep the filler size small is the sparsity of the dataset: since most items remain unrated in a genuine user's profile, a small filler size, i.e., 1%, 3%, 5%, and 7%, keeps most items unrated in the attack profiles too.

Table 5 Investigation results of ShillDetector using different attack models on ML 100 K

Investigations have been carried out by mounting different attacks at different attack sizes. It is worth noting that the classification accuracy of ShillDetector is above 99% in almost all cases, indicating correct classification of fake and genuine profiles. Further, a high detection rate, i.e., above 95%, is observed in all cases except at filler size 3% and attack size 1% for the bandwagon attack; even there, the detection rate of 92.5% corresponds to the misclassification of 1 out of 10 fake profiles. A small FAR, below 0.25 in most cases, has been observed; however, at filler size 3% and attack size 30%, a FAR as high as 0.35 is observed, corresponding to the misclassification of one genuine profile on average. To sum up, at most one user profile (genuine or fake) is misclassified by ShillDetector, demonstrating its excellent performance.

ShillDetector shows excellent detection results for strong attacks such as the average, bandwagon, and segment attacks at filler sizes ranging from 1 to 7%, owing to the high correlation between profiles induced by the underlying attack models.

5.4.3 Comparative analysis

In this subsection, a comparative analysis of ShillDetector with two state-of-the-art approaches, namely SVM-TIA (Zhou et al. 2016) and PCA-VarSelect (Mehta and Nejdl 2009), is drawn on the ML-100 K dataset. SVM-TIA is a variant of SVM, which has been explored primarily for classification, whereas PCA-VarSelect builds on PCA, a dimensionality reduction technique used here to group attack profiles. Both techniques have shown good accuracy in detecting shilling attacks but have certain drawbacks. SVM-TIA requires target item analysis, which is difficult when a large number of items are available, and its performance becomes unstable in such situations; it also lacks effective results in terms of recall under small attack sizes. PCA-VarSelect, on the other hand, requires prior knowledge of the total number of attack profiles, which is infeasible in the real world. The comparison among the approaches is made in terms of precision and recall, considering the bandwagon, average, and random attack models at a filler size of 3%, a setting common to all three approaches.

Figure 4 shows a comparison of the three approaches in terms of precision when attack profiles are inserted using the bandwagon, average, and random attacks. ShillDetector outperformed PCA-VarSelect and SVM-TIA on the average attack. For the bandwagon attack at attack size 1%, the precision of ShillDetector is 0.977, slightly lower than the 0.985 of SVM-TIA; however, ShillDetector's performance is still excellent in this case, as the classification accuracy remains 99.89% (Table 5) and only one fake profile is misclassified. For a weak attack such as the random attack, the fake profiles do not have a high correlation among them, owing to the underlying model, and they tend to be distributed across different clusters, which makes them harder for ShillDetector to detect. However, such attacks have a diminutive impact on the performance of recommendations, as they are weak and seldom place profiles in the neighborhood of the target user (Mobasher et al. 2007).

Fig. 4
figure 4

Comparison of ShillDetector with state-of-the-art approaches in terms of precision. a Comparative analysis of average attack. b Comparative analysis of bandwagon attack. c Comparative analysis of random attack

Next, a comparison between SVM-TIA and ShillDetector is drawn based on recall. Figure 5 shows the results when attack profiles are inserted using the average, random, and bandwagon attack models. ShillDetector outperformed SVM-TIA on both the average and bandwagon attacks at all attack sizes considered, with a highest recall of 1 and a lowest recall of 0.925. However, for the random attack, lower recall values are observed.

Fig. 5
figure 5

Comparison of ShillDetector with SVM-TIA in terms of recall. a Comparative analysis of average attack. b Comparative analysis of bandwagon attack. c Comparative analysis of random attack

To sum up, ShillDetector, which uses a variant of the recently developed SI technique GWO, is an unsupervised technique, i.e., no training process is required, which saves CPU time. Further, unlike supervised techniques, it does not require hand-designed features. It is easy to operate and implement, requires few parameters, and provides excellent classification accuracy.

6 Conclusion

In this paper, we proposed a novel approach, ShillDetector, for the detection of fake profiles that an attacker may insert into the dataset with the motive of biasing the recommendation process. ShillDetector is based on the Grey Wolf Optimization technique, a swarm intelligence method that mimics the social behavior of grey wolves in reaching their prey. The proposed approach exploits the group characteristics of shillers by working directly on the user-item rating matrix. It acts as a feature selection technique that is easy to operate, requires no training time, and has few parameters to adjust. Finally, ShillDetector can be used as a pre-processing phase of any recommendation algorithm and can thus protect the recommendation process from generating biased recommendations.