1 Introduction

E-commerce websites often provide rating systems to evaluate the quality of products or services. Such review resources increasingly influence consumers' purchase decisions. For the sake of financial profit, some fraudulent reviewers fabricate spam reviews to promote their own products or to demote their rivals' products. Such reviewers are called fake reviewers or review spammers, and the products being spammed are called target products [8]. Review spammers often work in groups to fully control the sentiment of the target products, split the total effort, and camouflage themselves (i.e., hide their spamming behaviors by arranging some group members to review irrelevant products or to review normally, so as to mislead spam detection tools) [12, 25]. Such review spammer groups occur more frequently and are even more harmful than individual review spammers.

Since Jindal and Liu first proposed the fake review/reviewer detection problem in 2008 [8], research interest in this field has kept growing. Early work focuses on individual review spammer detection [6, 8, 9, 11, 15]. More recently, there has been a trend toward detecting group spammers [3, 5, 12, 16, 19, 22, 23, 25]. In the early stage, supervised learning-based approaches were adopted, which heavily depend on labeled datasets to train classifiers. However, such methods have been shown to be inaccurate because no ground-truth (labeled) datasets exist for modeling or evaluation. To bypass the labeling obstacle, many unsupervised models have been proposed to detect review spammers, e.g., Markov Random Field (MRF)-based methods [5, 6, 16].

Compared to individual spammer detection, group spamming detection has been less extensively addressed. Much previous work on group spamming detection exploits the frequent itemset mining (FIM) technique to generate candidate spammer groups and then builds models to categorize them into spam/non-spam reviewer groups [3, 12, 22, 23]. As stated in [19], such methods can only find tight spammer groups in which each group member has to review all the target products. However, group spammers often work in a looser manner, i.e., reviewers are not required to review every target product in some group spam campaigns. In our previous work [19], we proposed a graph-based approach to mine loose spammer groups via bipartite graph projection, which can discover high-quality loose spammer groups from the rating data alone, i.e., (userid, prodid, rating, timestamp) tuples; no review text is needed. It is well known that review text analysis is computationally inefficient and often unreliable in review spam detection [14, 15, 23].

Our previously proposed method in [19], however, suffers from several drawbacks. First, the degree of collusion (or similarity) between two reviewers in [19] is simply defined by the number of products co-reviewed by the two reviewers within a given time window. Clearly, this relationship metric ignores the review time interval and the rating score deviation, which are two key factors for describing the collusive behavior between two reviewers. Second, it generates spammer groups by incrementing the weight threshold to weaken the connectivity of the reviewer graph, so that smaller connected components emerge. In such a method, the granularity used to generate connected components is coarse, and the connectivity of a connected component tends to be loose. In this paper, we propose another graph-based approach under a unified top-down computing framework for the review spammer group detection problem. The contributions of this work are threefold:

  • We introduce a top-down computing framework, namely GGSpam (for Graph-based Group SPAM), to detect review spammer groups based merely on the topological structure of the reviewer graph. GGSpam treats the whole review data as a graph structure: reviewers as the nodes, and the collusiveness between reviewers as edges. By adopting a divide-and-conquer strategy, GGSpam recursively breaks the whole graph into small-sized subgraphs. This computing framework has many distinct advantages in comparison with other frameworks in group review spammer detection.

  • We propose a novel instantiation of GGSpam, namely GSBC, by modeling review spam groups as bi-connected graphs. GSBC models the collusiveness between two reviewers by considering both the review time interval and the rating score deviation for each common product reviewed by the two reviewers, which can better reflect the colluding behavior between two reviewers. We also design new group spam indicators to evaluate the spamicity of a group from various perspectives.

  • We conduct extensive experiments on real-world datasets with/without ground-truth to evaluate the performance of our proposed approach. Experimental results indicate that our proposed method can find high-quality review spammer groups and outperforms several state-of-the-art approaches, including both graph-based and non-graph-based ones, by a large margin.

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 introduces the GGSpam framework. Section 4 proposes the bi-connected graph-based instantiation of GGSpam. Section 5 reports the experimental results. We conclude the paper in Sect. 6.

2 Related work

The fake review/reviewer detection problem has gained increasing interest over the past ten years. Existing work can be roughly grouped into three categories: fake review content detection, fake reviewer detection, and fake reviewer group detection. For example, Jindal and Liu [8] employ (near) duplicate reviews as fake reviews to train classifiers; Ott et al. [15] employ Amazon Mechanical Turk (AMT) to crowd-source fake hotel reviews and use linguistic analysis of review text to identify deceptive reviews; Lim et al. [11] use behavioral features in rating patterns to spot suspicious reviewers; Wang et al. [18] first introduced review graphs to compute the trustiness of reviewers; Xie et al. [20] detect reviewers who write singleton reviews by temporal analysis.

Recently, there has been increasing work on detecting group review spamming. Mukherjee et al. [12] first introduce the FIM technique to generate candidate review spammer groups, taking reviewers as items and products as transactions. For example, by setting the minimum support count to 3, they can find candidate spammer groups, each containing at least 2 reviewers who review at least 3 common products. Based on these candidate groups, many computing frameworks have been proposed to evaluate the suspicion of each candidate spammer group or individual spammer. For instance, [12] propose GSRank to rank candidate groups using an iterative computing framework which captures the relationships among candidate groups, target products, and individual reviewers. Xu et al. [23] introduce a KNN-based method and a graph-based classification method to predict the spam/non-spam labels of each reviewer belonging to at least one FIM candidate group. Xu et al. [22] propose a statistical model which exploits the EM algorithm to compute the collusiveness of each group member belonging to at least one FIM candidate group.

There are also many other efforts aiming to detect collusive reviewers without using FIM. Akoglu et al. [1] propose the FraudEagle framework, which employs a Loopy Belief Propagation-based (MRF) inference algorithm relying solely on the network effect (the relational structure) among reviewers and products to rank fake reviewers. Spammer groups can then be obtained by graph clustering on the induced subgraph that contains the top suspicious reviewers and the corresponding products. Rayana and Akoglu [16] propose SpEagle, which extends FraudEagle by introducing an augmented graph (with review nodes) and incorporating meta-information (e.g., review content, timestamps, and star ratings) as priors, which greatly improves the ranking precision. Ye et al. [25] propose a two-step method to discover review spammer groups: it first finds the suspicious products that are spam campaign targets and then clusters spammers on a 2-hop induced subgraph of the top-ranked products. Xu et al. [21] propose the FraudInformer framework to detect colluders via multiple heterogeneous pairwise features extracted from reviewers' rating behaviors and linguistic patterns; a Markov random walk-based iterative computing framework is exploited to rank reviewers according to their spamicity.

Unlike the above-mentioned group spamming detection frameworks, in [19] we propose a divide-and-conquer-based algorithm (GSBP) which is based solely on the topological structure of the reviewer graph, whose edges reveal the behavioral similarity between two reviewers. In this paper, we introduce a more general computing framework inspired by the basic ideas in [19] and give a full instantiation of the framework. As such, our solution is notably different from, while being complementary to, previous methods such as the FIM-based and statistical learning-based approaches.

Review spamming techniques keep evolving and vary across domains. Therefore, there is no one-size-fits-all solution for detecting review spamming activities. Although researchers have proposed a variety of spam detection techniques, no single one discovers all kinds of spamming strategies. We argue that the best way to improve detection recall is to incorporate as many techniques as possible.

3 The GGSpam computing framework

In this section, we give a graph-based unified computing framework, namely GGSpam, to detect review spammer groups by generalizing the ideas in [19].

GGSpam computing framework

1. Model a review spammer group g as a sextet (\(R_g\), \(P_g\), \(V_g\), \(S_g\), \(\mathrm{SS}(g)\), \(\tau \)), where \(R_g\) is the set of review spammers (or members) in group g, \(P_g\) is the set of target products of group g, \(V_g\) is the set of reviews written by reviewers in \(R_g\) toward products in \(P_g\), \(S_g\) is the set of spam indicators, \(\mathrm{SS}(g)\) is a function of \(S_g\) measuring the spamicity of g, and \(\tau \) is the time window used to measure the time closeness of co-reviewing.

2. Construct a global weighted reviewer graph \(G=(\mathcal {R}, E)\), where \(\mathcal {R}\) is the node set representing all reviewers, and E is the edge set whose weights represent the similarity of the collusive behaviors of adjacent reviewers.

3. Design a divide-and-conquer algorithm to find all the suspicious review spammer groups whose \(\mathrm{SS}(g)\) value is above a given spamicity threshold.

4. Rank the detected spammer groups, that is, reorder them so that more suspicious groups are brought to the front and non-spam groups are pushed to the back, improving the precision at the top k positions.

Our previous work in [19] can be viewed as an instantiation of GGSpam, which models a spammer group as a k-connected graph and defines the similarity between reviewers as the number of products co-reviewed by two adjacent reviewers. In this paper, we give another instantiation of GGSpam based on the bi-connected graph structure, which better models review spammer groups and, as a result, significantly improves the quality of the detected spammer groups.

Our graph-based GGSpam framework has several distinct advantages compared to other frameworks. First, GGSpam directly generates holistic spammer groups, comprising the group members, the target products, and the spam reviews issued by the group, which provides rich information and context for end-users to analyze and determine the spamicity degree of the detected groups. In comparison, some researchers deal with the group spamming problem by detecting only individual spammers who exhibit collusive behaviors [1, 23]. For example, [1] exploits an MRF to rank the spamicity of reviewers, and an additional stage is required to find review spammer groups by taking into account the top k% review spammers. Second, some supervised or unsupervised methods require that all instances be independently and identically distributed (i.i.d.), whereas review spammers usually interact closely with each other compared to genuine reviewers; therefore, machine learning-based approaches that aim to find frequent patterns are often not applicable to this problem [2]. Since review fraud detection can be viewed as an anomaly detection problem, our graph-based approach is more suitable for this task because it captures the underlying co-reviewing relationships among reviewers via the products they review. Moreover, as a divide-and-conquer-based algorithm, GGSpam is computationally efficient compared to machine learning-based approaches.

4 Bi-connected spammer group detection

Before we define the data model of the spammer groups, we make the following assumptions about group spamming activities. (1) Reviewers are required to fulfill the spamming campaign in a relatively short time period to maximize the impact on target products; (2) reviewers in the same spamming campaign either give high scores to promote a product or give low scores to demote it; (3) each reviewer in the same campaign does not necessarily review all the target products, that is, reviewers can balance the review task by reviewing a subset of the target products. In this section, under the GGSpam computing framework, we describe the data model of a spammer group, its group spam indicators, the method to construct the reviewer graph, the divide-and-conquer-based spammer group detection algorithm, and the reordering techniques to rank the detected spammer groups. We list the relevant notations and their meanings in Table 1.

4.1 The data model of bi-connected spammer groups

A bi-connected graph is a connected graph that cannot be broken into disconnected pieces by deleting any single vertex (and its incident edges). Equivalently, a bi-connected graph is a connected graph in which, for each pair of nodes i and j, there exist two vertex-disjoint paths between i and j. Bi-connected graphs are widely used in the field of networking because of their redundancy. As an instantiation of GGSpam, we model a spammer group as a set of review spammers who form a bi-connected graph. To begin with, we define the concept of a loose spammer group, from which we derive the definition of the bi-connected spammer group.

Table 1 Notation table

Definition 1

Loose spammer group A loose spammer group (or spammer group for short) g is modeled as a sextet form (\(R_g,P_g,V_g,S_g,SS(g),\tau \)), where \(\tau \) is a user-specified time window during which a co-review action is captured; \(R_g \subseteq \mathcal {R}\) is the set of review spammers (or members) in group g; \(P_g \subseteq \mathcal {P}\) is the set of target products in group g, which refers to the products that are co-reviewed by at least two review spammers in \(R_g\) within time window \(\tau \); \(V_g \subseteq \mathcal {V}\) is the set of reviews written by reviewers in \(R_g\) toward products in \(P_g\) during co-review time window \(\tau \), \(V_g \subseteq R_g \times P_g\); \(S_g\) is a set of spam indicators measuring the spamicity of group g from different dimensions; \(\mathrm{SS}(g)\) is a scoring function of \(S_g\), which measures the spamicity of group g.

Note that \(\tau \) plays an important role in the GGSpam framework, as spam campaigns are highly time-dependent. As we will see in Sect. 5, [1] completely ignores timestamps, resulting in low detection accuracy, while [16] uses timestamps only as metadata (priors), which has little effect in spotting fake review(er)s.

The group members in g inherently form a spammer group graph [19]. The node set of the graph is \(R_g\), and for any two members in the group, their adjacent edge reflects to what extent the two reviewers collude in promoting/demoting the common target products in \(P_g\).

Definition 2

Co-review collusiveness Given reviewers \(i, j \in \mathcal {R}\), if i and j co-review a product \(k \in \mathcal {P}\), we define the collusiveness of i and j on k as:

$$\begin{aligned} \mathrm{Collu}(i,j,k)&= \left\{ \begin{array}{ll} 0, & |t_i^k-t_j^k| > \tau \ \vee \ |\psi _i^k-\psi _j^k| \ge 2\\ \varsigma _k\left[ \alpha \left( 1-\frac{|t_i^k-t_j^k|}{\tau }\right) +(1-\alpha )\left( 1-\frac{|\psi _i^k-\psi _j^k|}{2}\right) \right] , & \mathrm{otherwise} \end{array} \right. \\ \varsigma _k&= \frac{2}{1+\mathrm{e}^{-(\mathrm{MP}-\deg (k))^\theta +2^\theta }}-1 \end{aligned}$$
(1)

where \(\alpha \) is a coefficient to balance the importance of the time difference and the rating score difference, \(\varsigma _k\) is the suspicion degree of product k, \(\deg (k)\) is the number of reviewers who reviewed product k, MP and \(\theta \) are user-specified parameters.

Note that if reviewers i and j co-review product k outside time window \(\tau \), or their rating score deviation is greater than 1, the co-review is not considered, which significantly reduces the number of trivial co-reviews (co-reviews by coincidence). Figure 1 plots \(\varsigma _k\) for \(\theta =0.1, 0.2,\) and 0.4 (\(\mathrm{MP}=1000\)). Intuitively, the larger the number of reviewers of a product, the less likely the product is a spam target.

Definition 3

Bi-connected spammer group graph For a given spammer group g, the bi-connected spammer group graph of g (if it exists) is a bi-connected, weighted graph, denoted by \(G_g=(R_g,E)\), where \(E \subseteq R_g \times R_g\) is the edge set. For reviewers \(i, j\in R_g\), the weight of edge (i, j), denoted by \(\omega (i,j) \in [0,1]\), is defined as:

$$\begin{aligned} \omega (i,j)=\frac{2}{1+\mathrm{e}^{-\sigma (i,j)}}-1 \end{aligned}$$
(2)

where

$$\begin{aligned} \sigma (i,j)=\left[ \sum \limits _{k\in P_i\cap P_j}{\mathrm{Collu}(i,j,k)}\right] \, \frac{|P_i\cap P_j|}{|P_i\cup P_j|} \end{aligned}$$
(3)

where \(P_i\) and \(P_j\) are the sets of products reviewed by i and j, respectively.

Fig. 1 Product suspicion versus the number of reviewers of the product (\(\mathrm{MP}=1000\))

Fig. 2 The Sigmoid function

The Jaccard similarity coefficient in Eq. 3 reflects the fact that collusion is more likely to happen when most of the products reviewed by the two reviewers are in common. \(\omega (i,j)\) can be considered as the normalization of \(\sigma (i,j)\) using the Sigmoid function which is plotted in Fig. 2.
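To make Eqs. 1–3 concrete, the following Python sketch computes the pairwise collusiveness and the resulting edge weight. The data layout (per-reviewer dictionaries mapping products to a (day, rating, degree) triple) and all function names are our own illustration, not the authors' implementation.

```python
import math

def product_suspicion(deg_k, MP=1000, theta=0.2):
    """varsigma_k of Eq. 1: close to 1 for rarely reviewed products,
    decaying toward 0 as deg(k) approaches MP (assumes deg_k <= MP)."""
    return 2.0 / (1.0 + math.exp(-(MP - deg_k) ** theta + 2 ** theta)) - 1.0

def collu(t_i, t_j, psi_i, psi_j, deg_k, tau=10, alpha=0.5, MP=1000, theta=0.2):
    """Collu(i, j, k) of Eq. 1. t_*: review days, psi_*: star ratings."""
    dt, dr = abs(t_i - t_j), abs(psi_i - psi_j)
    if dt > tau or dr >= 2:            # trivial co-review: too far apart in time or rating
        return 0.0
    return product_suspicion(deg_k, MP, theta) * (
        alpha * (1 - dt / tau) + (1 - alpha) * (1 - dr / 2))

def edge_weight(reviews_i, reviews_j, **kw):
    """omega(i, j) of Eqs. 2-3.
    reviews_*: dict product -> (day, rating, deg(product)) for one reviewer."""
    common = set(reviews_i) & set(reviews_j)
    union = set(reviews_i) | set(reviews_j)
    if not common:
        return 0.0
    sigma = sum(collu(reviews_i[k][0], reviews_j[k][0],
                      reviews_i[k][1], reviews_j[k][1],
                      reviews_i[k][2], **kw) for k in common)
    sigma *= len(common) / len(union)            # Jaccard factor of Eq. 3
    return 2.0 / (1.0 + math.exp(-sigma)) - 1.0  # Sigmoid normalization of Eq. 2
```

With \(\tau =10\) days and \(\alpha =0.5\), two reviewers who rate a rarely reviewed product identically on the same day contribute a collusiveness close to the product suspicion \(\varsigma _k\).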

Definition 4

Bi-connected spammer group Given a spammer group g, if the spammer group graph of g is a bi-connected spammer group graph, we call g a bi-connected spammer group.

There are several advantages to modeling spammer groups as bi-connected graphs rather than merely connected graphs as in [19]. (1) A bi-connected component is more tightly coupled than a connected component; therefore, it better reflects the fact that review spammers work collaboratively to fulfill the campaign task; (2) given a connected graph G, we can compute its bi-connected components efficiently in linear time [7]; (3) a group of reviewers forming a connected graph might seem normal, while its bi-connected subgraphs might be highly suspicious spammer groups; (4) although some spammer groups might be very loosely connected, which results in many bi-connected subgroups, these subgroups can easily be recombined based on nearby time windows and/or similar target product sets.

Fig. 3 A 2-connected spammer group and its three bi-connected spammer groups. a A 2-connected component found by GSBP. b The three bi-connected components

To illustrate the rationale of bi-connected spammer groups, we take a closer look at a real-world spammer group from Amazon.com detected by the method in [19], as shown in Fig. 3a. The edge weights in Fig. 3a denote the number of commonly reviewed products of two reviewers as defined in [19], while the edge weights in Fig. 3b denote the similarity values computed by Eq. 2. We can see that this group is very sparse: the reviewers loosely reviewed a set of target products in a very narrow time window. By modeling spammer groups as bi-connected graphs, this group splits into 3 overlapping bi-connected spammer groups, as shown in Fig. 3b, and the spamicity scores [\(\mathrm{SS}(g)\)] of the 3 groups are 0.70, 0.51, and 0.50, respectively. Each bi-connected spammer group is more tightly coupled than the original connected component in Fig. 3a, which suggests a more suspicious spamming group. According to human evaluation, all 7 reviewers were identified as spammers. However, even if the holistic group were not identified as a spammer group, the bi-connected graph modeling would still give us a chance to check for spamicity at a smaller scale, whereas in [19], raising the weight threshold from 2 to 3 makes the third bi-connected group disappear. On the other hand, the 3 bi-connected spammer groups cover all 7 reviewers of the whole spammer group, and it is easy to combine them into the holistic spammer group based on the time window and the common products they review.

4.2 Group spam indicators

It is very important to design effective spam indicators to measure the suspiciousness of group spamming. In [19] we proposed 8 spam indicators for group spamming behaviors. Here, we also give 8 group spam indicators on the basis of the bi-connected graph data model. One advantage of our GGSpam-based approaches is the ability to detect spammer groups with desired spam characteristics by adopting different spam indicators. Weights can also be assigned to the spam indicators based on domain knowledge or empirical analysis. In this study, we do not use linguistic features, which usually incur heavy computational overhead, although such indicators could easily be incorporated into the detection system. Indicators marked * are defined identically to [19]; we still list them here for completeness. Note that we also use the penalty function defined in [19],

$$\begin{aligned} L(g) = \frac{1}{1+\mathrm{e}^{-(|R_g|+|P_g|-3)}} \end{aligned}$$
(4)

to penalize small-sized spammer groups, which are more likely to form by coincidence. As the minimum number of reviewers is 2 and the minimum number of products is 1, \(L(g)\in [0.5,1)\).

  1.

    Review tightness (RT*) For a given spammer group g, the review tightness of g, denoted by RT(g), is defined as the ratio of the number of reviews in g to the cardinality of the Cartesian product of the reviewer set and the product set in g, multiplied by L(g).

    $$\begin{aligned} \mathrm{RT}(g)=\frac{|V_g|}{|R_g||P_g|}L(g) \end{aligned}$$
    (5)
  2.

    Neighbor tightness (NT) In a tightly coupled spammer group, the collusion relationship among reviewers tends to be stronger than that in genuine reviewer groups. Thus we define the neighbor tightness of a group g as:

    $$\begin{aligned} \mathrm{NT}(g) = \frac{\sum \nolimits _{i,j \in R_g}{\omega (i,j)}}{\binom{|R_g|}{2}}L(g) \end{aligned}$$
    (6)

    This indicator is a counterpart of the NT indicator defined in [19] where the collusion behavior is computed only by the average value of the Jaccard similarities of the product sets of each reviewer pair.

  3.

    Product tightness (PT*) For tight spammer groups, group members concentrate on a certain number of products, and if these reviewers do not review any other products, they are most likely to be spammers. Given a spammer group g, the product tightness of g is defined as the ratio of the number of common products reviewed by all the members in g to the number of products reviewed by all the members in g:

    $$\begin{aligned} \mathrm{PT}(g)=\frac{|\cap _{r\in R_g}{{P}_r}|}{|\cup _{r \in R_g}{{P}_r}|} \end{aligned}$$
    (7)
  4.

    Average time window (TW) Compared to randomly formed groups, spammer groups often post fake reviews in a short time period. Given a spammer group g, and a product \(p\in P_g\), we define the time window spamicity of p as:

    $$\begin{aligned} \mathrm{TW}_p(g,p)=\left\{ \begin{array}{ll} 1-\frac{\mathrm{SD}_p}{T}, & \mathrm{SD}_p\le T\\ 0, & \mathrm{SD}_p>T \end{array} \right. \end{aligned}$$

    where \(\mathrm{SD}_p\) is the standard deviation of the review times for product p by reviewers in \(R_g\), and T is a user-specified time threshold, say, 30 days. Unlike [19], which considers the distance between the first and the last review date, we use the standard deviation of review dates to take into account the overall review time distribution. The TW indicator of g is then defined as the average time window spamicity of all products in \(P_g\):

    $$\begin{aligned} \mathrm{TW}(g)={\mathrm{avg}}_{p\in P_g}{\mathrm{TW}_p(g,p)L(g)} \end{aligned}$$
    (8)
  5.

    Rating variance (RV*) Group members tend to rate identical or similar scores. We use the same method to define the RV indicator as in [19]:

    $$\begin{aligned} \mathrm{RV}(g)=2\left( 1-\frac{1}{1+\mathrm{e}^{-{\mathrm{avg}}_{p\in P_g}{S^2(p,g)}}}\right) L(g) \end{aligned}$$
    (9)

    where \(S^2(p,g)\) denotes the variance of the rating scores of product p given by reviewers in g.

  6.

    Reviewer ratio (RR) If the target products in \(P_g\) are mainly reviewed by the reviewers in \(R_g\), while reviewers not in \(R_g\) are rare, then the group is more likely to be a spammer group. RR is defined as the maximum ratio of the number of reviewers in \(R_g\) who review product p to the number of all the reviewers of p, \(p \in P_g\):

    $$\begin{aligned} \mathrm{RR}(g)=\max _{p\in P_g}{\frac{|{R_g}_p|}{|R_p|}} \end{aligned}$$
    (10)
  7.

    Multiple Review (MR) Review spammers often post duplicate or similar reviews many times to attract eyeballs. Given a spammer group g, we define the MR indicator as the ratio of the number of reviews which involve multiple reviewing to the number of total reviews in group g:

    $$\begin{aligned} \mathrm{MR}(g)=\frac{|\{v| v\in V_g, |(R_v,P_v)|\ge 2\}|}{|V_g|} \end{aligned}$$
    (11)
  8.

    Group size (GS*) Large spammer groups are more interesting than small groups because they are more damaging. On the other hand, a large portion of small groups (of size 2 or 3) are formed by accident rather than on purpose. Therefore, we take into account the group size indicator, which favors large spammer groups. The GS indicator is defined as

    $$\begin{aligned} \mathrm{GS}(g)=\frac{1}{1+\mathrm{e}^{-(|R_g|-3)}} \end{aligned}$$
    (12)

We do not use the Early Review (ER) indicator of [19], because we observed that the ER indicator often correlates with other indicators, e.g., PT and RR. Instead, we introduce a new indicator, MR, to capture the fact that spammers often review a product many times to attract eyeballs.
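As a concrete illustration of how a few of these indicators combine with the penalty function L(g), here is a minimal Python sketch. The data layout (a group as a list of (reviewer, product, rating, day) tuples, plus each member's full product set for PT) and all function names are our own assumptions, not the paper's implementation.

```python
import math
import statistics

def penalty(n_reviewers, n_products):
    """Penalty function L(g) of Eq. 4; lies in [0.5, 1) and is small for tiny groups."""
    return 1.0 / (1.0 + math.exp(-(n_reviewers + n_products - 3)))

def review_tightness(group_reviews):
    """RT(g), Eq. 5. group_reviews: list of (reviewer, product, rating, day) tuples in g."""
    reviewers = {r for r, p, s, t in group_reviews}
    products = {p for r, p, s, t in group_reviews}
    L = penalty(len(reviewers), len(products))
    return len(group_reviews) / (len(reviewers) * len(products)) * L

def product_tightness(member_products):
    """PT(g), Eq. 7. member_products: dict reviewer -> set of ALL products that
    reviewer has reviewed in the dataset (not only the group targets)."""
    sets = list(member_products.values())
    return len(set.intersection(*sets)) / len(set.union(*sets))

def time_window(group_reviews, T=30):
    """TW(g), Eq. 8: average per-product time-window spamicity, weighted by L(g)."""
    reviewers = {r for r, p, s, t in group_reviews}
    products = {p for r, p, s, t in group_reviews}
    L = penalty(len(reviewers), len(products))
    scores = []
    for prod in products:
        days = [t for r, p, s, t in group_reviews if p == prod]
        sd = statistics.pstdev(days)        # std. dev. of review days for this product
        scores.append(1.0 - sd / T if sd <= T else 0.0)
    return sum(scores) / len(scores) * L

def group_size(n_reviewers):
    """GS(g), Eq. 12: favors larger groups."""
    return 1.0 / (1.0 + math.exp(-(n_reviewers - 3)))
```

The group spamicity \(\mathrm{SS}(g)\) can then be taken as a (possibly weighted) average of the eight indicator values.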

4.3 The GSBC algorithm

In [19], we gave a divide-and-conquer-based algorithm (GSBP) to mine k-connected spammer groups, where the spammer groups are obtained by incrementing the edge weight threshold k by one, \(k \ge 1\), to weaken the connectivity of the graph so that more tightly connected subgraphs appear. In this study, we introduce a notably different data model for the spammer groups: (1) spammer groups are modeled as bi-connected rather than loosely connected components; (2) the edge weights become real numbers in [0, 1] rather than integers. Therefore, GSBP is no longer applicable to our proposed bi-connected spammer groups. In this section, we give a new approach to mining bi-connected spammer groups under the GGSpam computing framework. Specifically, we first design an efficient algorithm to compute the global reviewer graph, and then we traverse the reviewer graph to find all the bi-connected components in each connected component. Large and/or insufficiently suspicious bi-connected graphs are split into smaller connected graphs by the minimum cut algorithm so that they can be further investigated recursively.

Algorithm 1 GSBC

Algorithm 1 (GSBC) takes as input a bipartite graph \(\mathcal {B}\), which consists of reviewer nodes and product nodes, the time window \(\tau \), and an edge weight threshold \(\delta \). It first invokes Algorithm 2 (ConstructReviewerGraph) to construct a global reviewer graph \(G=(\mathcal {R},E)\), where E is the edge set and every edge \(e\in E\) satisfies \(\omega (e)\ge \delta \). Then, for every connected component g in G, Algorithm 3 (FindGroups) is invoked to mine all the bi-connected spammer groups.
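The pseudocode of Algorithms 1–3 appears as figures in the original paper; as a rough reconstruction of the prose description, a top-level driver might look like the following Python/networkx sketch (construct_reviewer_graph and find_groups are sketched after Algorithms 2 and 3 below; the names and default parameter values are ours):

```python
import networkx as nx

def gsbc(reviews, tau=10, delta=0.4, max_size=50, min_spam=0.58):
    """Top-level GSBC driver (Algorithm 1, our reconstruction).
    reviews: iterable of (reviewer, product, rating, day) tuples."""
    G = construct_reviewer_graph(reviews, tau=tau, delta=delta)   # Algorithm 2
    groups = []
    for comp in nx.connected_components(G):
        groups.extend(find_groups(G.subgraph(comp).copy(),        # Algorithm 3
                                  max_size=max_size, min_spam=min_spam))
    return groups
```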

Algorithm 2 ConstructReviewerGraph

Algorithm 2 (ConstructReviewerGraph) is an efficient algorithm for computing the reviewer graph. It checks the common reviews toward each product k between reviewer i and reviewer j. The edge weight between i and j is computed according to Eqs. 1–3. Edges with weight \({<}\delta \) are removed from the graph to reduce the computational complexity of mining bi-connected spammer groups. The time complexity of Algorithm 2 is \(O(|\mathcal {V}|)\), because each review (an edge in \(\mathcal {B}\)) is visited exactly twice.
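A possible reconstruction of the reviewer graph construction, reusing the edge_weight helper sketched in Sect. 4.1 (again our own illustration); for simplicity it keeps one review per (reviewer, product) pair, so multiple reviewing is not modeled here:

```python
import networkx as nx
from collections import defaultdict

def construct_reviewer_graph(reviews, tau=10, delta=0.4, **kw):
    """Algorithm 2 (ConstructReviewerGraph), our reconstruction.
    Builds the weighted reviewer graph; edges with weight < delta are dropped."""
    by_product = defaultdict(list)      # product -> [(reviewer, day, rating)]
    by_reviewer = defaultdict(dict)     # reviewer -> {product: (day, rating, deg)}
    for r, p, rating, day in reviews:
        by_product[p].append((r, day, rating))
    for p, entries in by_product.items():
        deg = len(entries)
        for r, day, rating in entries:
            by_reviewer[r][p] = (day, rating, deg)

    G = nx.Graph()
    for p, entries in by_product.items():      # candidate edges come from co-reviews
        for a in range(len(entries)):
            for b in range(a + 1, len(entries)):
                i, j = entries[a][0], entries[b][0]
                if i == j or G.has_edge(i, j):
                    continue
                w = edge_weight(by_reviewer[i], by_reviewer[j], tau=tau, **kw)
                if w >= delta:
                    G.add_edge(i, j, weight=w)
    return G
```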

Algorithm 3 FindGroups

Algorithm 3 (FindGroups) is a divide-and-conquer-based algorithm. For each connected component c in the reviewer graph G, it searches for all the bi-connected components \(\{b\}\) in c. If the size (number of reviewers) of a bi-connected component b is below the given threshold MAXSIZE, the group spam indicators are computed, and if \(\mathrm{SS}(b)\) is greater than or equal to the minimum spamicity threshold (MINSPAM), b is output as a bi-connected spammer group. Otherwise, if the size of b is greater than MAXSIZE, or \(\mathrm{SS}(b)\) is below MINSPAM, b is divided into two connected components, \(c_1\) and \(c_2\), by the minimum cut algorithm, and \(c_1\) and \(c_2\) are further processed by Algorithm 3 recursively.

Searching for the bi-connected components of a connected graph takes time linear in the number of nodes of the graph [7], that is, \(O(|\mathcal {R}|)\). Bi-connected groups whose \(\mathrm{SS}(g) <\) MINSPAM, however, have to be split into two connected subgroups to further investigate their spamicity. To safely split a large bi-connected graph into smaller connected components, we employ the minimum cut algorithm, which splits a connected graph by removing a set of edges whose total weight is minimum. The time complexity of the minimum cut algorithm is \(O(|V||E|+|V|^2\log |V|)\) using a priority queue [17], where V and E are the vertex set and the edge set of the graph, respectively. Fortunately, most bi-connected graphs have fewer than MAXSIZE nodes in real-world review datasets. As such, the overall time complexity of Algorithm 3 is near \(O(|\mathcal {R}|\log |\mathcal {R}|)\). The final running time is strongly affected by the number of edges in G; nonetheless, we can always control the number of edges (pruning low-weight, outlier edges) by specifying a sufficiently large predefined threshold \(\delta \).
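A recursive sketch of FindGroups under these assumptions, using networkx's biconnected_components and Stoer–Wagner minimum cut as stand-ins for the routines cited above; group_spamicity is a hypothetical helper that scores a subgraph with the indicators of Sect. 4.2:

```python
import networkx as nx

def find_groups(C, max_size=50, min_spam=0.58):
    """Algorithm 3 (FindGroups), our reconstruction.
    C: one connected component of the reviewer graph (nx.Graph)."""
    groups = []
    for nodes in nx.biconnected_components(C):     # linear-time decomposition [7]
        b = C.subgraph(nodes).copy()
        if b.number_of_nodes() <= max_size:
            score = group_spamicity(b)             # hypothetical SS(b) over Sect. 4.2 indicators
            if score >= min_spam:
                groups.append((b, score))
                continue
        if b.number_of_nodes() <= 2:               # too small to split further
            continue
        # too large or not suspicious enough: split by minimum cut and recurse
        _, (part1, part2) = nx.stoer_wagner(b, weight='weight')
        for part in (part1, part2):
            sub = b.subgraph(part).copy()
            for comp in nx.connected_components(sub):
                groups.extend(find_groups(sub.subgraph(comp).copy(),
                                          max_size=max_size, min_spam=min_spam))
    return groups
```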

4.4 Spammer group ranking

The last step in GGSpam, which can be viewed as a post-processing phase, is to rank the spammer groups detected by a divide-and-conquer-based algorithm, e.g., GSBC. The goal of this step is to improve the precision of the algorithm by analyzing various spam indicators. Besides the 8 group spam indicators, other spam indicators, e.g., the linguistic features of review content proposed in [10], may also be useful for determining the spamicity of the groups.

The simplest yet effective way to rank the detected bi-connected spammer groups is to average the 8 group spam indicator values. However, the 8 group spam indicators have different capabilities to discriminate the spam/non-spam behaviors of a spammer group. Naturally, we can use linear regression to determine the weights of the spam indicators. To train a regression model, we need to know the labels of the detected spammer groups. However, group spam labels are often hard to obtain. In such cases, a pseudo-supervised method can be used, i.e., treating the top N groups as fake review groups and randomly formed groups as genuine review groups.
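As a simple illustration of the pseudo-supervised variant (our own sketch, not the paper's exact procedure), the following treats the top-N detected groups as positives and randomly formed groups as negatives, and fits indicator weights by ordinary least squares:

```python
import numpy as np

def rank_groups(detected, random_formed, top_n=100):
    """Pseudo-supervised re-ranking sketch.
    detected: (n, 8) indicator values of detected groups, pre-sorted by average indicator.
    random_formed: (m, 8) indicator values of randomly formed (assumed genuine) groups."""
    X = np.asarray(detected, dtype=float)
    pos = X[:top_n]                                   # pseudo-fake groups
    neg = np.asarray(random_formed, dtype=float)      # pseudo-genuine groups
    A = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    A1 = np.hstack([A, np.ones((len(A), 1))])         # intercept column
    w, *_ = np.linalg.lstsq(A1, y, rcond=None)        # linear regression via least squares
    scores = X @ w[:-1] + w[-1]
    return np.argsort(-scores)                        # group indices, most suspicious first
```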

5 Experimental study

5.1 The datasets

We use 5 real-world datasets to evaluate our GSBC approach: two are from Amazon.com without ground-truth, and three are from Yelp.com with near-ground-truth. Table 2 shows the summary statistics of the five datasets. AmazonBooks contains book reviews extracted from the Amazon dataset crawled in 2006, comprising reviews from 1996 to 2006, which was also used in [8, 9, 11–13, 19]. Note that this dataset involves multiple reviewing. Since spam mechanisms have undoubtedly advanced a lot since 2006, we also use a more recent Amazon dataset, AmazonCDs, which comprises CDs and Vinyl reviews from 2012 to 2014. This dataset contains rating data only and does not involve multiple reviewing. The YelpChi dataset contains reviews for restaurants and hotels in the Chicago area and was collected by [14]. YelpNYC and YelpZip are two further Yelp datasets collected by [16], which contain hotel/restaurant reviews from NYC and from a number of areas with contiguous zip codes starting from NYC, respectively. The three Yelp datasets contain both recommended and filtered reviews labeled by the Yelp anti-fraud filtering algorithm, and thus they can be treated as near-ground-truth datasets [16]. The Yelp datasets do not involve multiple reviewing.

Table 2 Review dataset statistics

5.2 Compared baselines

Since GSBP is also a GGSpam instantiation, we take GSBP as a baseline. We also introduce SCAN [24], a graph-based clustering algorithm, to perform clustering on our global reviewer graph in order to generate dense subgraphs; these dense clusters can also be viewed as suspicious spammer groups. Although SCAN also exploits our global reviewer graph model, it differs from the GGSpam framework in that it does not consider the spamicity of each cluster during the clustering process. We further compare GSBC with FraudEagle [1] and SpEagle [16], which also use the three Yelp datasets to evaluate their performance. Essentially, FraudEagle and SpEagle do not generate spammer groups; instead, they only rank reviewers or reviews. Note that FraudEagle and SpEagle are probabilistic graphical model-based solutions, which exploit Loopy Belief Propagation (LBP) to infer the spamicity of review(er)s. In [16], a semi-supervised version, namely SpEagle\(^+\), is proposed, which can greatly improve the detection precision. However, our method is completely unsupervised, and labeling fake reviews is often impossible by merely reading the review text, so we do not compare to SpEagle\(^+\) in this study.

5.3 Performance on unlabeled datasets

We evaluate GSBC against GSBP and SCAN on the two unlabeled Amazon datasets. As we optimized the group spam indicators defined in [19], for fairness and consistency, we use the same indicator set in GSBP as in GSBC as far as possible. That is, we do not use the ER (Early Review) indicator; instead, we use the MR (Multiple Reviewing) indicator defined in this paper, and the TW indicator is also computed in the same way as in this paper.

Since AmazonBooks and AmazonCDs are unlabeled datasets, it is difficult to directly evaluate the effectiveness of the various spam detection approaches. Borrowing the idea in [4], which evaluates the spamicity of detected spammer groups via spam indicators, we compare the cumulative distribution function (CDF) curves of the various spam indicators for the spammer groups detected by GSBC, GSBP, and SCAN. The CDF comparison, first proposed in [12], is often used to compare the distributions of a set of spam indicators.
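For reference, the empirical CDF of one indicator over the top-500 groups of a method can be computed and plotted with a few lines of numpy/matplotlib; the variable names in the usage comment below are hypothetical placeholders, not values from the paper.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_cdf(indicator_values, label):
    """Empirical CDF of one group spam indicator over the detected groups."""
    x = np.sort(np.asarray(indicator_values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)     # fraction of groups with value <= x
    plt.step(x, y, where='post', label=label)

# e.g., plot_cdf(rt_values_gsbc, 'GSBC'); plot_cdf(rt_values_gsbp, 'GSBP')
# plt.xlabel('RT value'); plt.ylabel('CDF'); plt.legend(); plt.show()
```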

Table 3 GSBC parameter setting, #groups, and time elapsed (\(\mathrm{MAXSIZE}=50,\, \theta =0.2,\, \alpha =0.5\), for all datasets)
Fig. 4 Top-500-groups CDF comparison of GSBC (red, solid), GSBP (blue, dotted), and SCAN (green, dashed) on AmazonBooks. AVG is the average of all 8 group spam indicators; the same applies below (color figure online)

Fig. 5 Top-500-groups CDF comparison of GSBC (red, solid), GSBP (blue, dotted), and SCAN (green, dashed) on AmazonCDs (color figure online)

For each dataset, we use GSBC to generate 500+ groups. The parameters used in GSBC, as well as the number of groups generated and the time elapsed for each dataset, are shown in Table 3. We adjust the parameters of GSBP and SCAN so that the number of groups they generate is comparable to the number returned by GSBC. For fairness, we fetch the top 500 spammer groups detected by GSBC, GSBP, and SCAN, respectively, and compute their 8 spam indicators as defined in Sect. 4.2. Figures 4 and 5 show the CDF curves of the 8 indicators and their average value (AVG) for the spammer groups detected by GSBC (red, solid lines), GSBP (blue, dotted), and SCAN (green, dashed) on AmazonBooks and AmazonCDs, respectively. The closer a curve is to the vertical axis, the smaller the indicator values. We can see that, for most of the indicators, GSBC generates higher scores than GSBP and SCAN. For the AVG indicator, GSBC always outperforms GSBP and SCAN by a large margin, while the distinction between GSBP and SCAN is not significant. Note that AmazonCDs does not involve multiple reviewing, so its MR values are all zeros. To our surprise, GSBC performs extremely well on the AmazonCDs dataset. This is probably because AmazonCDs contains more up-to-date review data (2012–2014) than AmazonBooks (1996–2006); the number of internet users has grown rapidly in recent years and co-reviewing a product has become prevalent, so more group spamming phenomena emerge. Also, the fact that SCAN achieves performance comparable to GSBP indicates that the way we model the collusiveness between a pair of reviewers through Eq. 2 is effective.

We notice that for both Amazon datasets, the RR values of GSBP are significantly higher than those of GSBC. A deeper analysis of the spammer groups reveals that RR behaves differently from the other indicators mainly because GSBP models the reviewer similarity (collusiveness) as the number of co-reviewed products, while GSBC models it using both the review times and the rating scores of the co-reviewed products; thus, reviewers who co-review a product within time window \(\tau \) but with a relatively large rating score difference are filtered out by GSBC.

Fig. 6 Group size comparison on the Amazon datasets (top 500 groups)

Figure 6 depicts the number of spammer groups versus group size for GSBC, GSBP, and SCAN on AmazonBooks and AmazonCDs, respectively. We can see that, in general, the number of groups decreases as the group size increases. For AmazonBooks, each method returns a large number of groups with only 2 reviewers. However, GSBC tends to generate larger groups than GSBP and SCAN. For AmazonCDs, which involves a larger number of co-reviews, GSBC returns significantly fewer groups with only 2 or 3 reviewers than GSBP and SCAN. As a result, GSBC can spot more individual group spammers than GSBP and SCAN.

Fig. 7 Top-500-groups CDF comparison of GSBC (red, solid), GSBP (blue, dotted), and SCAN (green, dashed) on YelpChi (color figure online)

Fig. 8 Top-500-groups CDF comparison of GSBC (red, solid), GSBP (blue, dotted), and SCAN (green, dashed) on YelpNYC (color figure online)

Fig. 9 Top-500-groups CDF comparison of GSBC (red, solid), GSBP (blue, dotted), and SCAN (green, dashed) on YelpZip (color figure online)

5.4 Performance on labeled datasets

For the labeled Yelp datasets, we give not only a CDF comparison against GSBP and SCAN, but also a precision comparison against GSBP, SCAN, FraudEagle, and SpEagle. Because the hotel/restaurant review datasets involve far fewer products, which leads to a considerably larger number of co-reviews, we set MP to 10,000 and \(\tau \) to 10, as shown in Table 3. Figures 7, 8, and 9 depict the CDF comparison on YelpChi, YelpNYC, and YelpZip, respectively. We can see that GSBC performs even better on the Yelp datasets than on the Amazon datasets: the gap between the curves of GSBC and those of GSBP or SCAN becomes notably wider. This suggests GSBC is particularly suitable for hotel/restaurant reviews, which involve considerably fewer items (products) and more co-reviews.

Fig. 10 Group size comparison on the Yelp datasets (top 500 groups)

Figure 10 depicts the group size distributions on YelpChi, YelpNYC, and YelpZip, respectively. Again, GSBC generates the fewest groups of size 2 or 3; as a result, GSBC can produce more individual group spammers than GSBP and SCAN.

Fig. 11 Precision comparison for top-ranked review(er)s on the Yelp datasets

Fig. 12 Review precision and F1 at top k groups on the Yelp datasets

Since the Yelp datasets contain near-ground-truth, that is, each review is labeled as either fake (filtered) or genuine (recommended), we can evaluate the precision of GSBC by checking each review(er) in the detected spammer groups. We regard a reviewer as fake if and only if she/he has written at least 1 fake review in the group. Therefore, we can check the top-ranked review(er)s according to the detected top 500 groups they belong to. In [16], the top 1000 reviews/reviewers for each dataset ranked by FraudEagle and SpEagle were reported. Figure 11 shows the reviewer precision and review precision at top k review(er)s (\(k\le 1000\)) for GSBC and the four baselines on the Yelp datasets. We can see that the precision decreases as the number of review(er)s grows, except for the reviewer precision of FraudEagle on the YelpNYC dataset, which, oddly, goes up as more reviewers are included. In general, GSBC outperforms all 4 baselines, except for SpEagle on the YelpChi dataset. It is worth noting that GSBC outperforms GSBP in both reviewer precision and review precision on all three Yelp datasets. To our surprise, SCAN works almost as well as GSBC, which further validates the effectiveness of our modeling of reviewer relationships by Eq. 2.

Figure 12 shows the precision and F1 score at top k groups on the Yelp datasets. The precision at top k groups is defined as the ratio of the number of fake reviews in the top k groups to the total number of reviews contained in the top k groups; the F1 score considers both the precision and the recall of reviews. For the precision on YelpChi and YelpNYC, GSBC and SCAN overwhelmingly outperform GSBP. For the precision on YelpZip, GSBC and SCAN only perform better than GSBP for the top 70 groups; after that, GSBP outperforms GSBC and SCAN. For all Yelp datasets, GSBC always has a higher F1 score than GSBP and SCAN, due to the fact that GSBC tends to produce more fake review(er)s than GSBP and SCAN. This suggests that GSBC can generate higher-quality spammer groups than GSBP and SCAN.
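For completeness, a minimal sketch of the group-level precision and F1 computation described above, under our own data layout (each group represented by the set of review ids it contains):

```python
def precision_f1_at_k(ranked_groups, fake_reviews, k):
    """Precision and F1 over reviews for the top-k ranked groups.
    ranked_groups: list of groups, each a set of review ids, most suspicious first;
    fake_reviews: set of review ids labeled fake (filtered) in the dataset."""
    retrieved = set().union(*ranked_groups[:k]) if k else set()
    tp = len(retrieved & fake_reviews)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(fake_reviews) if fake_reviews else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, f1
```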

5.5 Impact of parameters

There are several user-specified parameters that affect the number and the composition of the resulting spammer groups. Here we investigate the impact of these parameters on the YelpNYC dataset.

Parameter \(\delta \), which is used to filter out small collusion values between reviewers, determines the topological structure of the reviewer graph. We set \(\delta \) to 0.36, 0.38, and 0.40, and fix \(\tau =10\), \(\mathrm{MINSPAM}=0.58\), \(\mathrm{MAXSIZE}=50\). For the different \(\delta \) values, Table 4 shows the number of spammer groups, the number of edges (weights), the elapsed running time, and the Jaccard similarity and number of common reviewers for the top 100, 200, and 400 spammer groups. We use \(\delta =0.4\), \(\tau =10\), \(\mathrm{MINSPAM}=0.58\), \(\mathrm{MAXSIZE}=50\) as the baseline, which is marked * in Table 4. For the two sets of top k spammer groups selected from the baseline and from the setting with \(\delta =0.36\) or 0.38, the Jaccard similarity is defined as the ratio of the number of common reviewers to the cardinality of the union of the reviewers in the two sets. From the table we can see that a lower \(\delta \) value generates more edges in the reviewer graph; thus, more spammer groups are found and more running time is required. The Jaccard similarities decrease as \(\delta \) decreases, and as the number of groups considered increases, more common reviewers are found.

Table 4 Spammer group info w.r.t. \(\delta \, (\tau =10,\, \mathrm{MINSPAM}=0.58,\, \mathrm{MAXSIZE}=50)\). Column marked * denotes the setting of the baseline
Table 5 Spammer group info w.r.t. \(\tau \, (\delta =0.4,\, \mathrm{MINSPAM}=0.58,\, \mathrm{MAXSIZE}=50)\). Column marked * denotes the setting of the baseline
Table 6 Spammer group info w.r.t. \(\mathrm{MINSPAM}\, (\delta =0.4,\, \tau =10,\, \mathrm{MAXSIZE}=50)\). Column marked * denotes the setting of the baseline

Similarly, to study the impact of \(\tau \), we fix \(\delta =0.4\), \(\mathrm{MINSPAM}=0.58\), \(\mathrm{MAXSIZE}=50\), and set \(\tau \) to 10, 15, 20 days. From Table 5 we can see that the number of spammer groups increases as \(\tau \) increases. Although the number of edges increases, the time required only increases slightly. The Jaccard similarities decrease as \(\tau \) increases. Tables 4 and 5 reveal that both \(\delta \) and \(\tau \) influence the resulting spammer groups greatly.

From Table 6 we can see that MINSPAM significantly affects the number of groups detected: as expected, a lower MINSPAM results in more groups. The running time is almost the same, and the Jaccard similarities are extremely high for the top 100, 200, and 400 groups. Hence, MINSPAM strongly impacts the number of groups generated, but it does not influence the composition of the top-ranked groups much. Intuitively, a lower MINSPAM will yield higher recall but lower precision.

6 Conclusion

Online review spamming is increasingly becoming a real threat to online rating systems, and detecting group spamming activities is critical to protecting online customers. In this paper, we propose a general computing framework for detecting online review spammer groups, which recursively breaks the whole reviewer graph into small suspicious reviewer groups. Moreover, we propose a novel instantiation of the proposed computing framework, which models review spammer groups as bi-connected graphs and splits large bi-connected graphs by the minimum cut algorithm. A number of group spam indicators are designed to evaluate the spamicity of a spammer group. An experimental study on real-world datasets against several state-of-the-art approaches verifies the effectiveness and efficiency of our approach. Future work includes seeking new, effective group spam features to further improve precision and recall, as well as other possible instantiations of GGSpam to detect various spammer groups in different domains.