1 Introduction

With the development of society and the evolution of technology, economic fraud, which was less common in the past, has gradually risen [1, 2], causing heavy losses for many enterprises and organizations. Consequently, from theoretical research to practical application, fraud identification and monitoring [3, 4] have attracted more attention than before.

1.1 Related Work

The sieve method based on a rule set uses historical data on fraudulent users’ behavioral features to define a series of rules [4–6]. If a user breaks a predefined rule, the system warns administrators by raising an alert. For example, a mobile phone user is presumed to be fraudulent if his monthly cumulative charge exceeds 1,000 USD.

Outlier detection uses an intelligent model to detect unusual samples in the whole population, and the system then submits the outliers to administrators [7]. For example, with the density-based algorithm DBOM [8], the degree of abnormality of each instance in the feature space is measured by its LOF (local outlier factor).

Another solution is category discrimination [9]. It uses classification methods from data mining, such as decision trees [10], support vector machines [11], and neural networks [12–14], to classify and evaluate new samples. According to the resulting IF-THEN rules, a person whose monthly number of outbound calls exceeds 6,000 may, for example, be regarded as a fraudulent user.

However, these methods are not well suited to processing stream data. Some require parameters that are hard to set, and others cannot adapt themselves to changing data. In addition, their high computational complexity limits the capacity of the application system [15].

1.2 Our Contributions

To overcome these limitations, we present a new algorithm, UAF (Usage Amount Forecast). We analyze variables that are independent of the total usage amount to predict whether a user is fraudulent. The experiments show that UAF is superior to existing related methods in terms of runtime, accuracy, and robustness.

Overall, the contributions of our work on real-time fraud detection are as follows:

  1. UAF does not need cumulative variables, which gives it a low computational cost.

  2. UAF only computes variables that are independent of the total amount, so it is able to catch fraud in time.

  3. UAF can be used in real-time scenarios. The scores are updated synchronously while bills are input continuously.

The rest of the paper is organized as follows. In Sect. 2, we present the main idea of UAF and give the notation and definitions. The complete process of UAF with pseudo-code is shown in Sect. 3. Experimental results are presented in Sect. 4, and we conclude our work in Sect. 5.

2 Main Idea and Definitions

2.1 Notation

Assume that dataset D is the feature space to be studied. It contains n instances and m attributes. It is represented as \( {\text{D}} = \left\{ {{\text{x}}_{1} , \ldots ,{\text{x}}_{\text{n}} } \right\} \), and its matrix form is \( {\text{D}} = \left\{ {{\text{x}}_{1}^{\text{T}} , \ldots ,{\text{x}}_{\text{n}}^{\text{T}} } \right\} \in {\text{Z}}^{{{\text{n}} \times {\text{m}}}} \). For any instance \( {\text{x}}_{\text{i}} \) of D, we have \( {\text{x}}_{\text{i}} = \left\{ {{\text{x}}_{{{\text{i}}1}} , \ldots ,{\text{x}}_{\text{im}} } \right\}^{\text{T}} \), where \( {\text{x}}_{\text{ik}} \) is the discretized value of the \( {\text{i}}^{\text{th}} \) instance on the \( {\text{k}}^{\text{th}} \) attribute. Similarly, for each fraudulent sample \( {\text{y}}_{\text{j}} \), \( {\text{j}} = 1, \ldots ,{\text{n}}^{\prime} \), where \( {\text{n}}^{\prime} \) is the number of fraudulent samples, we have \( {\text{y}}_{\text{j}} = \left\{ {{\text{y}}_{{{\text{j}}1}} , \ldots ,{\text{y}}_{\text{jm}} } \right\}^{\text{T}} \), where \( {\text{y}}_{\text{jk}} \) is the discretized value of the \( {\text{j}}^{\text{th}} \) sample on the \( {\text{k}}^{\text{th}} \) attribute.

2.2 Real-Time Fraud Detection

To determine whether a user is fraudulent, we need to divide the target user set accurately into two subsets: fraudulent and normal.

A. Usage Amount Forecast

In the telecom industry, users pay bills periodically. The billing cycle is usually one month, during which users generate call records, surf the internet, and purchase value-added services at random times. Data scientists working for operators collect and analyze these consumption data with big data techniques. The attributes used to describe users can be divided into two types: cumulative attributes and feature attributes.

As shown in Fig. 1, the cumulative attributes increase monotonically as consumption records are generated, whereas the feature attributes remain stable throughout the billing cycle. The feature attributes are independent of the usage amount and almost constant for a single user, which is why they are called feature attributes.

Fig. 1. Cumulative attribute and feature attribute

Through large-sample analysis, we find that the two types are correlated: the cumulative attributes can be predicted from the feature attributes. For fraud detection, the cumulative attributes are of little use because they need a long time to grow large enough to trigger a warning, by which point the warning comes too late. By using feature attributes only, as shown in Fig. 2, we can estimate the potential risk of the total usage and locate fraud in time.

Fig. 2. The time window of usage amount forecast

B. Similarity Evaluation

Although feature attributes are more useful in fraud detection, a scoring mechanism is still needed. Generally speaking, objects that are close to each other have similar patterns, which is the idea behind the K-NN (K-Nearest Neighbors) algorithm [17]. A user who shows features similar to those of the given fraudulent samples has a higher risk of fraud.

Therefore, we give the definition of Similarity Score (SS).

$$ {\text{Definition}}:\forall i = 1, \cdots ,n;j = 1, \cdots ,n^{\prime} ,SS\left( {x_{i} } \right) = \min_{j} \left( {\sum\nolimits_{k = 1}^{m} {\left| {x_{ik} - y_{jk} } \right|} } \right) $$
(1)

\( \sum\nolimits_{{{\text{k}} = 1}}^{\text{m}} {\left| {{\text{x}}_{\text{ik}} - {\text{y}}_{\text{jk}} } \right|} \) is the Manhattan distance between user \( {\text{x}}_{\text{i}} \) and fraudulent user \( {\text{y}}_{\text{j}} \). Compared with the commonly used Euclidean distance, the Manhattan distance not only reduces the impact of correlation between attributes but also greatly reduces computational complexity.
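As an illustration of Eq. (1), the following minimal Python sketch computes the raw similarity score of a user as the smallest Manhattan distance to any sample in a fraud library. The function names, the three-attribute vectors, and the user identifiers are hypothetical and chosen only for the example; they are not taken from the paper's implementation.

```python
from typing import Dict, List, Sequence


def manhattan(x: Sequence[int], y: Sequence[int]) -> int:
    """Manhattan (L1) distance between two discretized attribute vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of attributes")
    return sum(abs(a - b) for a, b in zip(x, y))


def similarity_score(x: Sequence[int], fraud_library: Sequence[Sequence[int]]) -> int:
    """Raw SS of Eq. (1): distance from x to the nearest fraudulent sample."""
    if not fraud_library:
        raise ValueError("fraud library must not be empty")
    return min(manhattan(x, y) for y in fraud_library)


# Hypothetical data: m = 3 discretized attributes (bin indices), two fraudulent samples.
fraud_library: List[List[int]] = [[9, 8, 9], [7, 9, 6]]
users: Dict[str, List[int]] = {"u1": [1, 2, 1], "u2": [8, 8, 9]}

raw_scores = {uid: similarity_score(x, fraud_library) for uid, x in users.items()}
print(raw_scores)
```

A smaller raw score means the user lies closer to a known fraudulent sample; in this toy data, `u2` is far closer to the library than `u1`.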

3 Algorithm Description

The whole process of UAF is shown in Fig. 3. It includes two main phases: data preparation and SS calculation.

Fig. 3. Process of UAF

3.1 Data Preparation

First, we clean the data, e.g., by missing-value interpolation and outlier detection. Then the predefined feature attributes are generated automatically. Some basic attributes, such as call duration and number of calls, are obtained directly from the original datasets. The other feature attributes are obtained by transformation; for example, the average duration of a single call is defined as the cumulative duration divided by the cumulative number of calls.
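The transformation described above can be sketched as follows. This is a minimal illustration with made-up call durations (in seconds), not the paper's data pipeline; it only shows how a derived feature attribute stays nearly constant while the cumulative attributes it is built from keep growing.

```python
from typing import List, Tuple


def average_call_duration(cumulative_duration: float, cumulative_calls: int) -> float:
    """Derived feature attribute: cumulative duration / cumulative number of calls."""
    if cumulative_calls == 0:
        return 0.0  # no calls yet; assumed convention for this sketch
    return cumulative_duration / cumulative_calls


# Hypothetical call durations of one user within a billing cycle.
calls = [60, 62, 58, 61, 59, 60]

snapshots: List[Tuple[int, int, float]] = []
cumulative = 0
for count, duration in enumerate(calls, start=1):
    cumulative += duration  # cumulative attribute: grows with every call
    snapshots.append((count, cumulative, average_call_duration(cumulative, count)))

for count, total, average in snapshots:
    print(f"calls={count} cumulative={total}s average={average:.2f}s")
```

In this toy series the cumulative duration rises from 60 s to 360 s, while the average stays within a narrow band around 60 s after every call.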

The third step is discretization. Because attribute distributions in the telecom industry are frequently skewed rather than normal, the equal-frequency criterion is more suitable than the common equal-width criterion. For example, let \( {\text{L}} \) be the range of an attribute, \( {\text{K}} \) the number of segments, and \( {\text{N}} \) the number of instances. The critical values of the equal-width method are \( \left\{ {0,\frac{\text{L}}{\text{K}},\frac{{2 * {\text{L}}}}{\text{K}}, \ldots ,\frac{\text{K*L}}{\text{K}}} \right\} \), whereas the critical values of the equal-frequency method are \( \left\{ {{\text{x}}_{1} ,{\text{x}}_{{\left[ {\frac{\text{N}}{\text{K}}} \right]}} ,{\text{x}}_{{\left[ {\frac{{2 * {\text{N}}}}{\text{K}}} \right]}} , \ldots ,{\text{x}}_{{\left[ {\frac{\text{K*N}}{\text{K}}} \right]}} } \right\} \), where the instances are taken in ascending order of the attribute value and \( \left[ \cdot \right] \) converts its argument to an integer index.
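To make the difference between the two criteria concrete, here is a small Python sketch under a few assumptions: it returns only the K − 1 interior cut points, it measures the equal-width range from the observed minimum rather than from 0, and it uses integer division for the index \( \left[ {\text{i*N/K}} \right] \). The sample values are invented to mimic a heavily skewed attribute.

```python
from bisect import bisect_left
from collections import Counter
from typing import List, Sequence


def equal_width_cuts(values: Sequence[float], k: int) -> List[float]:
    """Interior cut points that split the observed range into k equal-width bins."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + i * step for i in range(1, k)]


def equal_frequency_cuts(values: Sequence[float], k: int) -> List[float]:
    """Interior cut points taken at the (i*N/K)-th smallest value (1-based index)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k - 1] for i in range(1, k)]


def discretize(value: float, cuts: Sequence[float]) -> int:
    """Bin index of a value; a value equal to a cut point falls in the lower bin."""
    return bisect_left(cuts, value)


# Hypothetical skewed attribute: most users are small, a few are very large.
values = [1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 50, 200]
k = 4

width_bins = Counter(discretize(v, equal_width_cuts(values, k)) for v in values)
freq_bins = Counter(discretize(v, equal_frequency_cuts(values, k)) for v in values)
print("equal-width bin sizes:    ", dict(width_bins))
print("equal-frequency bin sizes:", dict(freq_bins))
```

With these values, equal-width binning places eleven of the twelve instances in the first bin, whereas equal-frequency binning places three instances in each of the four bins, which is why it preserves more resolution on skewed data.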

3.2 SS Calculation

First, SS is calculated by (1). SS is then normalized and reversed for display, so that a higher score indicates a closer match to a fraudulent sample. The scoring range is from 0 to 100, which gives (2).

$$ SS\left( {x_{i} } \right) = 100 - \frac{{100 \times \left( {SS\left( {x_{i} } \right) - SS_{min} } \right)}}{{SS_{max} - SS_{min} }} $$
(2)

The last step is the decision process. When SS is higher than the decision threshold, the user is presumed to be fraudulent and the system raises an alarm to administrators; otherwise, the system simply updates the user’s SS. The decision threshold is an important parameter that can be adjusted and optimized according to actual results.
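The normalization of Eq. (2) and the threshold decision can be sketched as below. The optional `reference` argument is an assumption of this sketch: it lets \( SS_{min} \) and \( SS_{max} \) be taken from a separate population (for instance, normal users only), in which case a user who is closer to the fraud library than every reference user receives a score above 100. All names and numbers are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def normalize_scores(
    raw: Dict[str, float], reference: Optional[Iterable[float]] = None
) -> Dict[str, float]:
    """Apply Eq. (2): reverse and rescale raw SS so that closer users score higher."""
    ref = list(reference) if reference is not None else list(raw.values())
    ss_min, ss_max = min(ref), max(ref)
    if ss_max == ss_min:
        raise ValueError("SS_max and SS_min must differ")
    return {
        uid: 100 - 100 * (score - ss_min) / (ss_max - ss_min)
        for uid, score in raw.items()
    }


def flag_fraud(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return the users whose normalized score exceeds the decision threshold."""
    return [uid for uid, score in scores.items() if score > threshold]


# Hypothetical raw scores (Manhattan distances to the nearest fraudulent sample).
raw = {"u1": 18, "u2": 1, "u3": 10}
scores = normalize_scores(raw)
print(scores)
print(flag_fraud(scores, threshold=90))

# Normalizing against a separate reference population can yield scores above 100.
print(normalize_scores({"suspect": 1}, reference=[5, 25]))
```

In the first call the farthest user maps to 0 and the closest to 100; in the last call the suspect's raw score is below the reference minimum, so the normalized score exceeds 100.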

4 Experiments and Results

4.1 Empirical Evaluation

A. Datasets

In this work, we use nine datasets to evaluate the performance of UAF. The datasets are described in Table 1. For example, dataset A-1 contains the data from city A in the first month, comprising 1,715,459 bills and 177,761 users. Additionally, a library of 6 international roaming fraudulent users is used as the reference.

Table 1. Description of used datasets

All feature attributes are divided into \( {\text{n}} \) boxes by equal-frequency discretization (in this section, \( {\text{n}} \) denotes the number of boxes). An improper \( {\text{n}} \) may result in failure or over-fitting. The results reported below are the best obtained over the different values of \( {\text{n}} \) tried.

B. Attributes

Following the usage amount forecast mechanism, we select attributes that are independent of the total amount, such as the average call duration and the average number of calls per number. The attributes are described in Table 2.

Table 2. Description of used attributes

C. Decision Threshold

After testing and adjustment, the decision threshold can be set to 90 % of the minimum score of all fraudulent users in the previous month. If the system’s false-alarm rate is higher than its miss rate, the decision threshold should be increased; otherwise, it should be reduced.
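The threshold rule above can be written as a short Python sketch. The 90 % factor comes from the text; the adjustment `step` and the example scores and error rates are hypothetical, since the text does not say by how much the threshold should be moved.

```python
from typing import Sequence


def initial_threshold(last_month_fraud_scores: Sequence[float], factor: float = 0.9) -> float:
    """Threshold = factor x minimum score of last month's fraudulent users."""
    if not last_month_fraud_scores:
        raise ValueError("at least one fraudulent score is required")
    return factor * min(last_month_fraud_scores)


def adjust_threshold(
    threshold: float, false_alarm_rate: float, miss_rate: float, step: float = 1.0
) -> float:
    """Raise the threshold when false alarms dominate, lower it otherwise.

    The step size is an assumption of this sketch, not a value from the paper.
    """
    if false_alarm_rate > miss_rate:
        return threshold + step
    return threshold - step


# Hypothetical scores of last month's fraudulent users.
threshold = initial_threshold([110, 131, 126])
print(threshold)

# Hypothetical error rates observed after deployment.
print(adjust_threshold(threshold, false_alarm_rate=0.05, miss_rate=0.01))
print(adjust_threshold(threshold, false_alarm_rate=0.01, miss_rate=0.05))
```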

D. Evaluation Criteria

We designed two tests to evaluate the effectiveness and robustness of UAF: post-testing and pre-testing.

Post-Testing:

Post-testing examines whether the given fraudulent users get higher scores than normal users. We conduct experiments on different cities and different months to check that UAF is applicable to different situations.

Pre-Testing:

With continuous input of bills, users’ real-time scores are updated simultaneously. Pre-testing measures the proportion of bills that have been processed by the time a fraudulent user is caught. The lower this proportion, the more effective UAF is.
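The pre-testing measure can be sketched as a loop over a simulated bill stream. The record layout `(user, value)`, the function names, and in particular the toy scorer (a running mean of bill values) are stand-ins invented for this example; in UAF the score would come from the discretized feature attributes and the similarity score.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple

Bill = Tuple[Hashable, float]  # (user id, bill value): a hypothetical record layout


def usage_rate_at_detection(
    bills: Iterable[Bill],
    total_bills: int,
    fraud_user: Hashable,
    update_score: Callable[[Hashable, float], float],
    threshold: float,
) -> Optional[float]:
    """Fraction of bills consumed when fraud_user first exceeds the threshold."""
    for consumed, (user, value) in enumerate(bills, start=1):
        score = update_score(user, value)  # real-time score after this bill
        if user == fraud_user and score > threshold:
            return consumed / total_bills
    return None  # the user was never flagged


def make_toy_scorer() -> Callable[[Hashable, float], float]:
    """Stand-in scorer: a user's score is the running mean of their bill values."""
    totals: Dict[Hashable, Tuple[float, int]] = {}

    def update(user: Hashable, value: float) -> float:
        running_sum, count = totals.get(user, (0.0, 0))
        totals[user] = (running_sum + value, count + 1)
        return totals[user][0] / totals[user][1]

    return update


# Hypothetical stream in which user "f" is the known fraudulent user.
stream = [("a", 10.0), ("f", 95.0), ("b", 20.0), ("f", 99.0), ("a", 12.0)]
rate = usage_rate_at_detection(stream, len(stream), "f", make_toy_scorer(), threshold=90.0)
print(rate)
```

Here the fraudulent user is flagged on the second of five bills, so the usage rate is 0.4; a lower value means earlier detection.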

4.2 Results and Analysis

A. Post-Testing

To illustrate the performance of UAF, the scores of both normal and fraudulent users on the nine datasets are calculated, as shown in Table 3.

Table 3. Post-testing results
(i) Different Cities in the Same Month

Comparing the results of A-1, B-1, and C-1, all fraudulent users get scores higher than 100; the only score below 100 is that of No. 2 in B-2. Because normalization is based on normal users and the fraudulent users have distinctive features, the fraudulent users get much higher scores.

However, one score in dataset B-2 is lower than 100. After sorting all scores and studying the bills, we are convinced that there really is a fraudulent user in B-2.

(ii) Different Months in the Same City

Across the 7 months of results for city B, the fraudulent users’ scores are always higher than the normal users’ scores, which indicates that UAF works steadily over a long period.

B. Pre-Testing

In this part, the UAF program reads in bills continuously to simulate a data stream. It then calculates the proportion of bills used (the usage rate) by the time fraud is detected, as shown in Table 4.

(i) Different Cities in the Same Month

Table 4. Pre-testing results

In the best case, the fraudulent users are detected when only 0.01 % of the bills have been produced; in the worst case, 63.19 % are needed. The average usage rate in this table is 10.75 %, which suggests that the model is robust and performs steadily across datasets.

(ii) Different Months in the Same City

As Table 4 shows, each fraudulent user in the 7 datasets of city B can be detected in time.

C. Parameter n

Because the datasets differ in size, the parameter n may affect performance markedly. For example, dataset A-1 has 177,761 users, for which n = 10 is not large enough to distinguish the values of each attribute. As shown in Table 5, the No. 3 fraudulent user gets a score of only 98. When n increases to 20, the scores become more reasonable.

Table 5. Contrast experiment on A-1

However, a larger n is not always better. To illustrate this, Table 6 shows the results for three values of n on B-2: 10, 20, and 30. When n is 10, the minimum score is 110. It rises to 131 when n increases to 20 but drops to 126 when n is 30, which is a typical sign of overfitting.

Table 6. Contrast experiment on B-2

5 Conclusion

In this paper, we propose a new algorithm, UAF, to tackle the problem of real-time fraud detection. UAF selects feature attributes that are independent of the total amount and uses the equal-frequency criterion for discretization. Similarity is then calculated by computing and comparing the Manhattan distances between users and known fraudulent samples. The experiments demonstrate that UAF is more precise than state-of-the-art techniques in this domain and is also more effective and scalable. In future studies, we will extend our algorithm to handle more complicated data types.