Unsupervised Fraud Detection on Sparse Rating Networks

Tang, Shaowen; Wong, Raymond

doi:10.1007/978-981-99-8696-5_2

Shaowen Tang¹⁰ &
Raymond Wong¹⁰

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1943))

Included in the following conference series:

Australasian Conference on Data Science and Machine Learning

398 Accesses

Abstract

Network fraud detection, specifically identifying abnormal users on rating platforms, has attracted considerable interests of researchers due to its wide applicability. However, the performance of existing detection systems suffer from several challenging problems such as class imbalance, lack of annotated data and network sparsity. To address above challenges, in this paper, we propose a novel unsupervised fraud detection algorithm FD-SpaN based on network structure exploration, to effectively rank users based on computed probabilities of being fraudulent and identify abnormal users on sparse networks. Firstly, we model ratings networks as graphs in mathematical manner with introduced metrics. Then, we add variable smoothing terms accordingly when inferring the quality and trustworthiness of each item and rating respectively, to tackle network sparsity on entity level. Meanwhile, for active users, we integrate their rating patterns into our developed formulations as a critical term to avoid overfitting. In addition, our proposed FD-SpaN is scalable to large-scale rating networks in real world due to its linear time complexity with respect to the size of network. Extensive experiments on two real-world datasets show the effectiveness of FD-SpaN under extreme class imbalance and network sparsity, as it outperforms other state-of-the-art baselines in terms of all evaluation metrics.

Access provided by Autonomous University of Puebla. Download conference paper PDF

GCNEXT: graph convolutional network with expanded balance theory for fraudulent user detection

Article Open access 08 October 2020

A graph-powered large-scale fraud detection system

Article 14 February 2023

Rating behavior evaluation and abnormality forensics analysis for injection attack detection

Article 02 December 2021

Keywords

1 Introduction

User-generated feedback on e-commerce platforms like Amazon and Yelp can provide valuable information for potential customers to help them make decisions about purchase and avoid risks. As a result, malicious users will aim at those rating platforms for different purposes e.g., overrating products for promoting sales or underrating products for defaming competitors [11, 15]. Therefore, identifying such abnormal users on rating networks has been a meaningful and popular research topic in recent as the rapid spread of online trading.

Traditional methods [11, 17, 24] try to extract handcraft features such as length of reviews, rating distributions of users etc., and feed them into classic machine learning models to learn suitable weights. However, conventional models are bottlenecked in performance and not scalable to larger size of data in real world. Therefore, [9, 18] start to investigate the rating patterns of users, examining how fraudsters behave differently from normal users within the rating network. More recently, graph-based approaches [1, 3, 14, 22, 29, 31] have emerged in the area of network fraud detection and shown promising performance. Generally, they attempt to regard rating networks as graphs and analyze the interactions between nodes to find out anomalies within a graph.

Although the field of network fraud detection is well-investigated by existing graph-based methods, they are still not able to perform consistently well across real-world platforms for many reasons. Firstly, on large trading websites like Amazon, millions of users could potentially conduct shopping activities and post their opinions or ratings on a daily basis. It is impossible to annotate such a huge amount of data manually due to the massive labour and time cost. However, training process without sufficient high-quality labeled data will limit the performance of fraud detectors [19]. Secondly, it is challenging for graph-based detection models to analyze nodes with only a few of inter-connections with others, while this situation is quite common in reality [26, 32]. For example, a user entity with only one or two ratings could be easily identified as a spammer by many existing models when its ratings deviate from mainstream opinions. Nevertheless, it still could be a benign user just with a different personal preference on specific products. Besides, when a product only receives unreliable ratings, it is unlikely to infer its inherent quality with interpretation. Lastly, many graph-based models decide the trustworthiness of rating depending on local deviation and ignore overall rating patterns of users, which could lead to overfitting [6, 7]. The reason is that both normal and unfair users might occasionally give unusual ratings, which violate their overall rating patterns.

Table 1. Comparison in terms of algorithmic properties with other baselines.

Full size table

In this paper, we aim to address the challenges mentioned above by proposing an Unsupervised Fraud Detection ranking algorithm on Sparse Rating Networks (FD-SpaN) to effectively identify abnormal users. Our FD-SpaN is designed to eliminate the negative influence of network sparsity on entity level by flexibly controlling the weights of global defaults on computation for the quality of items and the trustworthiness of ratings. Moreover, FD-SpaN attempts to further boost performance by including overall rating patterns of active users to avoid overfitting. As shown in Table 1, we compare FD-SpaN with other baselines in terms of four major algorithmic properties and FD-SpaN is the only one to satisfy all of them. The contributions of our work are summarized as follows:

Analysis. We comprehensively analyze the problem of network fraud detection and existing challenges, then model rating networks as graphs to include structural relations.
Algorithm. We propose an efficient graph-based fraud detection algorithm FD-SpaN, in setting of unsupervised learning, to rank and identify fraudulent users on sparse rating networks according to computed fraud scores.
Effectiveness. We conduct extensive experiments on two real-world sparse datasets. Results show the effectiveness of our algorithm FD-SpaN, which outperforms all other baselines in terms of all evaluation metrics.

2 Related Work

In this section, we classify existing works into two categories: general fraud detection and graph-based fraud detection. Comprehensive surveys could be found in [2, 19].

General Fraud Detection. [11] is the first work to investigate fraud detection on e-commerce platforms by extracting handcraft features corresponding to reviews, reviewers and products and learning suitable weights. Inspired by it, Behavior [18] expands feature set to detect target-based spamming, regarding multiple reviews from the same user on a single item or brand within a short time period as potential frauds. Co-training [17] builds up two independent models with different feature sets related to reviews and reviewers respectively, then uses a small group of labeled data to train both classifiers and annotate unseen data mutually to create new training samples for boosting each other iteratively. Deceptive [24] and Singleton [26] focus on textual content analysis and measure semantic anomaly scores of reviews by computational linguistic strategies such as Linguistic Inquiry and Word Count (LIWC) and Latent Dirichlet Allocation (LDA). [32] studies the problem of singleton review spam detection by converting it into a temporal pattern discovery problem. Similarly, Birdnest [9] analyzes the distribution and frequency of ratings given by a user in a Bayesian probabilistic model, which measures fraud score depending on how estimated posterior distribution is different from global expectation. SpEagle [25] incorporates behavioral and textual features into FraudEagle [1] to improve the performance.

Graph-Based Fraud Detection. Since PageRank [4] and HITS [12] were introduced to explore relations among website pages based on citing, many studies have been extensively investigating fraud detection using topological information rather than content comparison. BAP [22] values ratings even from biased users if ratings are opposite to the usual patterns of their reviewers. Troll-Trust [31] ranks nodes within signed networks based on calculated trustworthiness using sigmoid-like functions, which provides clear semantic interpretation but is limited in dealing with discrete values i.e., ratings in most of real-world platforms. Trustiness [28, 29] formally model network fraud detection with three intrinsic metrics to examine information of nodes and edges in mathematical manner within graphs. Rev2 [14] incorporates behavioral features and sentiment analysis as auxiliary information to improve the performance of Trustiness. FraudEagle [1] regards network fraud detection as a node classification problem and uses loopy belief propagation to update the entire graph. More recently, FdGars [30] and CARE [6] learn the vector representations of user nodes by leveraging the power of Graph Convolutional Network (GCN). For tackling the problem of camouflage, Fraudar [10] proposes a novel metric denoting the suspiciousness of users, which would not decrease as any amount of camouflages added.

3 Preliminaries

In this section, we first model general rating networks as bipartite graphs with mathematical symbols. Then, we formally propose the problem that we will focus in this paper. Lastly, we introduce four basic metrics to mathematically examine information of nodes and edges in the graph for addressing the proposed problem.

3.1 Problem Definition

Firstly, we define a rating network as a bipartite graph $G = (U, P, R)$, where U denotes all user nodes; P denotes all product (interchangeable with item) nodes; and R represents all ratings given by user $u \in U$ on product $p \in P$ within the graph G as edges. In addition, let $R_{u, *}$ be all ratings given by user u and $R_{*, p}$ be all ratings received by product p. We denote the value of rating $r_{u, p}$ as $w(r_{u, p})$ and scale it to the range between $-1$ and $+1$ for generalization, i.e., $w(r_{u,p}) \in [-1, 1] ~ \forall r_{u,p} \in R$. Based on above preliminaries, we raise the fraud detection problem as following:

Definition 1 (Problem)

Given a bipartite rating graph $G = (U, P, R)$, what is the fraud score of each user node $u \in U$, also known as the probability of being a fraudster, whose rating behavior is maliciously deviated from normal user nodes?

3.2 Metrics

To solve this specific problem, we introduce four intrinsic metrics corresponding to bipartite rating graphs: 1) quality of item; 2) deviation of rating; 3) trustworthiness of rating and 4) honesty of user. The definitions of four metrics will be explained as follows:

1.
Quality measures the inherent goodness of each item, which is also considered as the deserved rating from fair users. Therefore, we use a real number between $-1$ and $+1$ to indicate the quality of a product p, denoted as $Q(p) \in [-1, 1], ~ \forall p \in P$. Intuitively, high quality Q(p) (close to $+1$) of product p represents high expected rating from normal users, and vice versa.
2.
Deviation measures the difference between the value $w(r_{u,p})$ of a rating $r_{u,p}$ and the quality Q(p) of its targeted item p, denoted as $dev(r_{u,p})$. Negative deviation suggests lower rating received than what the targeted item deserves, while positive deviation represents higher rating than expectation.
3.
Trustworthiness measures how reliable is a rating $r_{u, p}$ from user u on item p, denoted as $T(r_{u,p}) \in [0, 1] ~ \forall r_{u,p} \in R$. The trustworthiness $T(r_{u,p})$ is determined by multiple factors. Intuitively, the trustworthiness scales from 0 to 1, standing for completely untrustworthy and totally reliable, respectively.
4.
Honesty measures the fairness of a user node u in the graph, denoted as $H(u) \in [0, 1], ~ \forall u \in U$, where 1 indicates absolute fair users and 0 suggests definite spammers. In particular, the honesty of a user entity is decided by all its ratings given to items within the graph.

4 Methodology

4.1 Unsupervised Learning

After we formulate the bipartite rating networks comprehensively with four basic metrics, we now can explore relations among metrics in mathematical manner. Due to the lack of annotation, we aim to infer true values of metrics in unsupervised setting. Therefore, we design mathematical formulations to calculate values for all metrics interdependently. Firstly, for each item, we use all ratings received by the item to compute the intrinsic quality. Intuitively, each rating weighs variously based on the trustworthiness, as reliable ratings will reflect the real quality of item, while untrustworthy ratings aim to mislead other users. Hence, we develop the formulation for the quality of item as below:

$$\begin{aligned} Q(p) = \frac{\sum _{r \in R_{*, p}} T(r) \cdot w(r)}{| R_{*, p} |} \end{aligned}$$

(1)

where $|\cdot |$ is the element count operator. Secondly, for inferring the trustworthiness of a rating, the key factor is the rating deviation. As shown in Fig. 1, on both real-world rating networks, deviations of ratings from normal users are quite neutral, while fraudsters are more likely to give ratings on items with lower values than what items deserve, which generally implies that extreme deviations lead to unreliable ratings. Hence, we develop the formulation for the trustworthiness as:

$$\begin{aligned} T(r_{u, p}) = 1 - \frac{\Vert dev(r_{u, p}) \Vert }{2} \end{aligned}$$

(2)

$$\begin{aligned} dev(r_{u, p}) = w(r_{u,p}) - Q(p) \end{aligned}$$

(3)

where $\Vert \cdot \Vert $ is an operator for computing absolute value. Thirdly, the honesty of each user node in the graph is determined by all ratings it gives. In general, benign users tend to give trustworthy ratings to items, while fraudsters rarely rate items fairly. Thus, we infer the honesty of each user as:

$$\begin{aligned} H(u) = \frac{\sum _{r \in R_{u, *}} T(r)}{| R_{u, *} |} \end{aligned}$$

(4)

Lastly, the fraud score of each user entity in the graph is reversely correlated to its honesty score, denoted as F(u) and expressed as:

$$\begin{aligned} F(u) = 1 - H(u) \end{aligned}$$

(5)

4.2 Alleviating Network Sparsity

Normally, introduced formulations will work properly when each node in the graph is densely connected with others. Nevertheless, on real-world rating platforms, the performance of fraud detection systems will suffer from insufficient inter-connections, which is a problem called network sparsity [23]. For example, newly registered users on e-commerce websites will naturally have few ratings on items, which could be easily classified as fraudulent users for first one or two ratings significantly deviating from mainstream [3]. Similarly for item node, it could be hard to infer the real quality of item when most of its received ratings are untrustworthy. Therefore, effective fraud detectors are supposed to address network sparsity, because online platforms will notify suspicious users to verify identification only when feel confident rather than bother normal users frequently in practice, which is harmful to the revenue [16]. Thus, we add a variable Laplace smoothing term in Formula (1) to alleviate network sparsity on entity level as:

$$\begin{aligned} Q(p) = \frac{\sum _{r \in R_{*, p}} T(r) \cdot w(r) + \sum _{r \in R_{*, p}} (1 - T(r)) * q}{| R_{*, p} | + \sum _{r \in R_{*, p}} (1 - T(r))} \end{aligned}$$

(6)

where q refers to the default value of quality in the graph, which could be decided as the median value of all items’ quality scores. And variable term $\sum _{r \in R_{*, p}} (1 - T(r))$ flexibly controls the proportion of default on calculation for the quality. Intuitively, the default would account less for the calculation if the item receives more trustworthy ratings. Likewise, we also add a variable smoothing term in Formula (2) as:

$$\begin{aligned} T(r_{u, p}) = \frac{(1 - \frac{\Vert dev(r_{u, p}) \Vert }{2}) + C_1 * t}{1 + C_1} \end{aligned}$$

(7)

$$\begin{aligned} C_1 = \max (0, ~ 1 - \frac{| R_{u, *} |}{K}) \end{aligned}$$

(8)

where t refers to the default value of trustworthiness in the graph and K is a positive constant number indicating the threshold number of interactions as active users. Intuitively, the calculation will be less dependent on default value as the activity level of user increases. Together, variable smoothing terms in our formulations can automatically adapt to different nodes in the graph, which helps to alleviate network sparsity on entity level.

4.3 Avoiding Overfitting

Even in real-world sparse rating networks, where many user nodes in the graph only give few ratings to item nodes, there still could be some active users who rate items frequently, no matter they are fraudsters or normal users. For an active benign user node, one of its ratings might not be as usual as others due to the specific unsatisfactory shopping experience or unique personal preference. Therefore, it is inaccurate to always decide the trustworthiness of a rating from active users only by local rating deviation [20]. To solve this problem, we introduce a new metric to reflect the rating patterns of users for resisting overfitting when user nodes are active, denoted as $bias(u), ~ u \in U$. Then, we integrate this metric into Formula (7) for computing trustworthiness of rating as:

$$\begin{aligned} T(r_{u, p}) = \frac{(1 - \frac{\Vert dev(r_{u, p}) \Vert }{2}) + C_1 * t + C_2 * (1 - \frac{\Vert bias(u) \Vert }{2})}{1 + C_1 + C_2} \end{aligned}$$

(9)

$$\begin{aligned} bias(u) = \frac{\sum _{r_{u, p} \in R_{u, *}} (w(r_{u, p}) - Q(p)) }{| R_{u, *} |} \end{aligned}$$

(10)

where $C_2 = 1 - C_1$, controlling the importance of term bias on smoothing extreme rating deviation with respect to the activity level of users.

4.4 Proposed Algorithm: FD-SpaN

In this section, we propose algorithm FD-SpaN to effectively rank and identify network frauds as shown in Algorithm 1. Specifically, we iteratively calculate the true values for metrics and adjust default values simultaneously according to the global trend in the graph. In the end, our algorithm will output the likelihood of each user node being abnormal in the graph. In addition, FD-SpaN can also be applied to other fields that information can be organized as bipartite graphs, such as social media [5, 13], finance [27] and recommender system [33].

4.5 Time Complexity Analysis

In each iteration of training, our algorithm FD-SpaN will go through all user nodes once, all item nodes once and all ratings three times to update values of metrics. Hence the time complexity for each iteration is $\mathcal {O}(x + y + 3z)$, where x is the total number user nodes, y is the total number of item nodes and z is the total number of ratings in the graph. Let, n denote the number of training iterations needed for our model FD-SpaN to converge. As a result, the overall time complexity for FD-SpaN would be $\mathcal {O}(n * (x + y + 3z))$, where n is a constant number. Thus, theoretically, our proposed model FD-SpaN has a linear time complexity with respect to the size of graph, which is suitable to be applied in larger scale of data of real world.

5 Experiments

In this section, we conduct extensive experiments on two real-world datasets to evaluate FD-SpaN compared with the state-of-the-art baselines to show the effectiveness of our proposed model. Specifically, We aim to answer following research questions:

RQ1. How does FD-SpaN perform on real-world sparse rating networks for identifying frauds compared with other baselines?
RQ2. How does the performance of FD-SpaN and other baselines change under different training percentages?
RQ3. Is FD-SpaN scalable to real-world platforms with larger size of data?
RQ4. How does each different component of FD-SpaN contribute to the overall performance?

Table 2. Two real-world datasets used for evaluation.

Full size table

5.1 Experiment Setup

Datasets. Following the tradition of previous representative work [6], we also select two real-world datasets for extensive experiments. Table 2 shows the statistics of two real-world datasets, where attribute density refers to the average number of ratings given by each user in the network. Both datasets are relatively sparse and extremely imbalanced to different extents, as we aim to evaluate the performance of FD-SpaN on identifying frauds under actual circumstances.

FineFoods [21] is a bipartite rating network in category of fine foods, with around $65\%$ of user nodes and $41\%$ of item nodes giving and receiving only one rating, respectively. The ground truth is given by the method mentioned in [14] based on provided helpfulness votes.
Instruments [8] refers to the musical instruments trades on Amazon websites. This rating network is even sparser, as roughly $80\%$ of users and $44\%$ of items have only one rating. The ground truth is also given based on the similar method as used in FineFoods dataset.

Baslines. We compare our model with five state-of-the-art baselines on fraud detection. We convert these baselines to calculate the probability of each user being anomaly, as exactly what our model does, for fair comparison.

Birdnest [9] only considers the distribution and timestamps of ratings, and uses Bayesian inference to calculate suspiciousness scores for users.
BAP [22] measures the expected rating for each item and the trend of each user giving high or low ratings. The key of BAP is to value ratings from users even with high bias if ratings are opposite to the usual pattern of reviewers.
Eagle [1] is based on loopy belief propagation and regards the network fraud detection as a node classification problem in graphs.
Behavior [18] designs and extracts multiple behavioral features of users to calculate overall fraud scores for users.
Rev2 [14] is based on HITS algorithm [12] and iteratively computes three intrinsic metrics in the graph: the fairness of users, the reliability of ratings and the goodness of products.

Evaluation Metrics. Our task is to infer the probability of each user node being spammer in the graph. Then, we will rank user entities based on the assigned probabilities. However, real-world rating networks are extremely imbalanced and sparse as shown in Table 2. Additionally, effective fraud detectors are supposed to assign as high anomaly scores as possible to real spammers and as low as possible to normal users. Therefore, following by the previous work [14], we extract the most and least suspicious 100 users from the output of model to form the test set and evaluate models by Average Precision (AP), which measures the capacity of distinguishing fraudsters and normal users. Also, we evaluate models by Area Under ROC Curve (AUC) on the top 200 suspicious user entities, reflecting the ability of each algorithm to identify positive cases on imbalanced dataset.

Implementation Details. For BAP and Rev2, we strictly follow the instructions from their paperwork to implement their algorithms. For Birdnest^{Footnote 1}, We use the original code from authors to run experiments. For Eagle^{Footnote 2} and Behavior^{Footnote 3}, we use available open-source implementations online. As for our proposed model FD-SpaN, we set parameter $K = 4$ according to the result on validation set, number of training iterations $epochs = 20$ and training error threshold $\epsilon = 10^{-4}$ as many other studies did.

Table 3. Performance comparison (%) on two real-world datasets. The best result of each metric is in bold. ‘-’ represents not applicable.

Full size table

5.2 Effectiveness (RQ1)

In this experiment, we compare the results of proposed model FD-SpaN on both real-world datasets with other state-of-the-art baselines as shown in Table 3. Overall, we can observe that our FD-SpaN consistently outperforms all baselines in terms of all evaluation metrics. More specifically, Birdnest performs poorly across both datasets and has remarkable gaps with other models. This is due to Birdnest only considers the temporal pattern and distribution of ratings provided by users to decide their suspiciousness scores, while it suffers from massive inactive users with only few ratings, which are very common in real-world rating networks but cannot be interpreted by Birdnest.

BAP and Eagle are graph-based models, trying to exploit topological information of graphs to identify anomalies, as what Rev2 and FD-SpaN do. However, they cannot achieve consistently competitive performance because they fail to alleviate the impacts of network sparsity. Even though Behavior obtains promising results by including multiple designed features regarding to rating behaviors, our FD-SpaN still outperforms Behavior by $7.48\%$ in AP and $5.20\%$ in AUC on FineFoods, which implies the superiority of graph-based methods.

Moreover, Rev2 is a graph-based model and attempts to address network sparsity on graph level. However, even within the same graph, each node is in a different situation. Therefore, applying the same parameters to different nodes in the graph will cause damage to the overall performance. Meanwhile, by alleviating network sparsity on entity level and resisting overfitting through new designed metric, our proposed FD-SpaN achieves a $5.98\%$ improvement in AP value on FineFoods. Also, 91 out of top 100 suspicious user nodes predicted by our model FD-SpaN on dataset Instruments are real spammers, which is the best among all algorithms. In summary, under realistic circumstance of class imbalance and network sparsity, our proposed model FD-SpaN can effectively identify fraudsters within rating networks.

5.3 Training Percentage (RQ2)

In this experiment, we run our model FD-SpaN with different percentages of training data and record the results in terms of AP and AUC values. As we can see from Fig. 2, on FineFoods, FD-SpaN obtains stable performance in both evaluation metrics under different training percentages. On Instruments, even though FD-SpaN scores relatively low in AP with $10\%$ of training data, it rapidly recovers to normal as the training percentage increases. Together, experiment results show the robustness of our FD-SpaN with respect to training percentages.

Moreover, we record the performance variations of our FD-SpaN and other baselines in value of AP under different training percentages, as illustrated in Fig. 3. On FineFoods, FD-SpaN scores the highest when only $10\%$ of training data provided and maintains one of the tops in most choices of training percentages. On Instruments, we can observe FD-SpaN has continuous growth in AP value and consistently outperforms other baselines as training percentage increases from $10\%$ to $90\%$. Together, experiment results suggest our FD-SpaN can effectively analyze topological structure of graphs to identify anomalies.

5.4 Linear Scalability (RQ3)

Alongside with previous theoretical analysis of linear scalability of our proposed FD-SpaN, we also run our algorithm using different numbers of training samples to verify linear scalability. We scale training samples in different magnitudes of amount from $10^4$ to $10^7$ and record the training time costs for FD-SpaN to converge on the same machine, as illustrated in Fig. 4. Basically, the running time of FD-SpaN is increasing linearly as fed with more training samples. Therefore, experiment results verify our model has linear scalability and is suitable to handle with larger size of data in practice.

Table 4. Performance variations (%) when different component of algorithm FD-SpaN is excluded.

Full size table

5.5 Ablation Study (RQ4)

In this experiment, we conduct an ablation study on both datasets to validate the effectiveness of designed components of FD-SpaN and demonstrate how each different component contributes to the overall performance of FD-SpaN.

FD-SpaN/AO: it excludes the component for avoiding overfitting.
FD-SpaN/AS: it removes the component for alleviating network sparsity.

As shown in Table 4, on FineFoods, by removing the component of either anti-overfitting or anti-sparsity, the performance of FD-SpaN will drop remarkably by $8.21\%$ and $7.59\%$ in AP and AUC values, respectively. However, we observe a marginal decrease in performance on Instruments when the component of avoiding overfitting is excluded. The reason could be that most of users in Instruments are very inactive so that incorporating their overall rating patterns has very limited effects to resist overfitting. FD-SpaN can achieve the best performance on sparse rating networks only when all designed modules are integrated into it.

6 Conclusion

In this paper, we propose an unsupervised graph-based fraud detection ranking algorithm called FD-SpaN, which targets at effectively identifying anomalies on sparse rating networks in reality. We alleviate the negative impacts of network sparsity on entity level, which leads to finer granularity and better performance than existing methods. We also include the rating patterns of users in our FD-SpaN to avoid overfitting. Our algorithm has a linear time complexity with respect to the size of graph, which makes FD-SpaN is suitable for real-world platforms to deal with large-scale data. More importantly, extensive experiments on two real-world datasets have shown the effectiveness and superiority of FD-SpaN, scoring the highest in all metrics against other state-of-the-art baselines. In the future, we plan to investigate fraud detection on dense rating networks.

Notes

References

Akoglu, L., Chandy, R., Faloutsos, C.: Opinion fraud detection in online reviews by network effects. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 7, pp. 2–11 (2013)
Google Scholar
Akoglu, L., Tong, H., Koutra, D.: Graph based anomaly detection and description: a survey. Data Min. Knowl. Disc. 29(3), 626–688 (2015)
Article MathSciNet Google Scholar
Breuer, A., Eilat, R., Weinsberg, U.: Friend or faux: graph-based early detection of fake accounts on social networks. In: Proceedings of The Web Conference 2020, pp. 1287–1297 (2020)
Google Scholar
Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30(1–7), 107–117 (1998)
Article Google Scholar
Cheng, L., Guo, R., Shu, K., Liu, H.: Causal understanding of fake news dissemination on social media. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 148–157 (2021)
Google Scholar
Dou, Y., Liu, Z., Sun, L., Deng, Y., Peng, H., Yu, P.S.: Enhancing graph neural network-based fraud detectors against camouflaged fraudsters. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 315–324 (2020)
Google Scholar
Gao, Y., Wang, X., He, X., Liu, Z., Feng, H., Zhang, Y.: Alleviating structural distribution shift in graph anomaly detection. In: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, pp. 357–365 (2023)
Google Scholar
He, R., McAuley, J.: Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. In: Proceedings of the 25th International Con World Wide Web, pp. 507–517 (2016)
Google Scholar
Hooi, B., et al.: BIRDNEST: Bayesian inference for ratings-fraud detection. In: Proceedings of the 2016 SIAM International Conference on Data Mining, pp. 495–503. SIAM (2016)
Google Scholar
Hooi, B., Song, H.A., Beutel, A., Shah, N., Shin, K., Faloutsos, C.: FRAUDAR: Bounding graph fraud in the face of camouflage. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 895–904 (2016)
Google Scholar
Jindal, N., Liu, B.: Opinion spam and analysis. In: Proceedings of the 2008 International Conference on Web Search and Data Mining, pp. 219–230 (2008)
Google Scholar
Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM (JACM) 46(5), 604–632 (1999)
Article MathSciNet MATH Google Scholar
Koggalahewa, D., Xu, Y., Foo, E.: A drift aware hierarchical test based approach for combating social spammers in online social networks. In: Xu, Y., et al. (eds.) AusDM 2021. CCIS, vol. 1504, pp. 47–61. Springer, Singapore (2021). https://doi.org/10.1007/978-981-16-8531-6_4
Chapter Google Scholar
Kumar, S., Hooi, B., Makhija, D., Kumar, M., Faloutsos, C., Subrahmanian, V.: REV2: Fraudulent user prediction in rating platforms. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 333–341 (2018)
Google Scholar
Kumar, S., Shah, N.: False information on web and social media: a survey. arXiv preprint arXiv:1804.08559 (2018)
Li, A., Qin, Z., Liu, R., Yang, Y., Li, D.: Spam review detection with graph convolutional networks. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 2703–2711 (2019)
Google Scholar
Li, F.H., Huang, M., Yang, Y., Zhu, X.: Learning to identify review spam. In: Twenty-Second International Joint Conference on Artificial Intelligence (2011)
Google Scholar
Lim, E.P., Nguyen, V.A., Jindal, N., Liu, B., Lauw, H.W.: Detecting product review spammers using rating behaviors. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 939–948 (2010)
Google Scholar
Liu, B., Zhang, L.: A survey of opinion mining and sentiment analysis. In: Mining text data, pp. 415–463. Springer, Boston (2012). https://doi.org/10.1007/978-1-4614-3223-4_13
Liu, Z., Dou, Y., Yu, P.S., Deng, Y., Peng, H.: Alleviating the inconsistency problem of applying graph neural network to fraud detection. In: Proceedings of the 43rd international ACM SIGIR conference on research and development in information retrieval, pp. 1569–1572 (2020)
Google Scholar
McAuley, J.J., Leskovec, J.: From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 897–908 (2013)
Google Scholar
Mishra, A., Bhattacharya, A.: Finding the bias and prestige of nodes in networks based on trust scores. In: Proceedings of the 20th International Conference on World Wide Web, pp. 567–576 (2011)
Google Scholar
Mukherjee, A., et al.: Spotting opinion spammers using behavioral footprints. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 632–640 (2013)
Google Scholar
Ott, M., Choi, Y., Cardie, C., Hancock, J.T.: Finding deceptive opinion spam by any stretch of the imagination. arXiv preprint arXiv:1107.4557 (2011)
Rayana, S., Akoglu, L.: Collective opinion spam detection: bridging review networks and metadata. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 985–994 (2015)
Google Scholar
Sandulescu, V., Ester, M.: Detecting singleton review spammers using semantic similarity. In: Proceedings of the 24th International Conference on World Wide Web, pp. 971–976 (2015)
Google Scholar
Wang, D., et al.: A semi-supervised graph attentive network for financial fraud detection. In: 2019 IEEE International Conference on Data Mining (ICDM), pp. 598–607. IEEE (2019)
Google Scholar
Wang, G., Xie, S., Liu, B., Philip, S.Y.: Review graph based online store review spammer detection. In: 2011 IEEE 11th International Conference on Data Mining, pp. 1242–1247. IEEE (2011)
Google Scholar
Wang, G., Xie, S., Liu, B., Yu, P.S.: Identify online store review spammers via social review graph. ACM Trans. Intell. Syst. Technol. (TIST) 3(4), 1–21 (2012)
Google Scholar
Wang, J., Wen, R., Wu, C., Huang, Y., Xiong, J.: FdGars: fraudster detection via graph convolutional networks in online app review system. In: Companion Proceedings of the 2019 World Wide Web Conference, pp. 310–316 (2019)
Google Scholar
Wu, Z., Aggarwal, C.C., Sun, J.: The troll-trust model for ranking in signed networks. In: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pp. 447–456 (2016)
Google Scholar
Xie, S., Wang, G., Lin, S., Yu, P.S.: Review spam detection via temporal pattern discovery. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 823–831 (2012)
Google Scholar
Zhang, S., Yin, H., Chen, T., Hung, Q.V.N., Huang, Z., Cui, L.: GCN-based user representation learning for unifying robust recommendation and fraudster detection. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 689–698 (2020)
Google Scholar

Download references

Author information

Authors and Affiliations

University of New South Wales, Sydney, Australia
Shaowen Tang & Raymond Wong

Authors

Shaowen Tang
View author publications
You can also search for this author in PubMed Google Scholar
Raymond Wong
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Shaowen Tang .

Editor information

Editors and Affiliations

The University of Auckland, Auckland, New Zealand
Diana Benavides-Prado
The University of Melbourne, Carlton, VIC, Australia
Sarah Erfani
Shenzhen University, Shenzhen, China
Philippe Fournier-Viger
RMIT University, Melbourne, VIC, Australia
Yee Ling Boo
The University of Auckland, Auckland, New Zealand
Yun Sing Koh

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Tang, S., Wong, R. (2024). Unsupervised Fraud Detection on Sparse Rating Networks. In: Benavides-Prado, D., Erfani, S., Fournier-Viger, P., Boo, Y.L., Koh, Y.S. (eds) Data Science and Machine Learning. AusDM 2023. Communications in Computer and Information Science, vol 1943. Springer, Singapore. https://doi.org/10.1007/978-981-99-8696-5_2

Download citation

DOI: https://doi.org/10.1007/978-981-99-8696-5_2
Published: 05 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8695-8
Online ISBN: 978-981-99-8696-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Unsupervised Fraud Detection on Sparse Rating Networks

Abstract

Similar content being viewed by others

GCNEXT: graph convolutional network with expanded balance theory for fraudulent user detection

A graph-powered large-scale fraud detection system

Rating behavior evaluation and abnormality forensics analysis for injection attack detection

Keywords

1 Introduction

2 Related Work