Abstract
Long gone is the time when people preferred to use only cash. In recent years, cashless transactions have gained considerable popularity, whether through UPI apps or credit and debit cards. This growth has also led to a significant increase in the number of credit card fraud cases. Detecting fraudulent transactions is challenging because fraudsters imitate the ordinary behaviour of clients in order to commit fraud. Automated, intelligent credit card fraud detection can be employed to detect such transactions. In this paper, we propose a credit card fraud detection approach based on ensemble learning, that is, a combination of supervised machine learning algorithms. One of the difficulties encountered when identifying fraudulent transactions in datasets is the imbalanced class distribution. In this work, we employ an ensemble learning model in combination with two data-level techniques for handling the class imbalance problem. The proposed approach is an ensemble of three base classifiers, namely random forest, logistic regression and K-nearest neighbour, along with two data-level algorithms, namely random oversampling and random undersampling. The predictions of the base classifiers are combined using a weighted voting ensemble approach. The proposed approach is evaluated on a highly imbalanced credit card transaction dataset using various sets of weights, in order to identify the best possible outcomes in terms of accuracy and to minimise the misclassification of fraudulent transactions.
1 Introduction
The idea of using a card to pay for goods bought or services used was portrayed in the year 1887 by Edward Bellamy in his utopian book Looking Backward. In the year 1934, American Airlines and the Air Transport Association introduced a new approach to booking tickets, the Air Travel Card, with which travellers could choose to “purchase now and pay later” [16]. Since then, the “purchase now and pay later” technique has gained popularity in various markets where monetary transactions are made using cards known as credit cards. In 2018, at least 72% of adults were considered to have at least one credit card, and well above 1.12 billion credit cards were in use in the US alone [46]. Credit cards have revolutionized how people shop, empowering ordinary people to make a purchase and pay with just a swipe. But that does not mean that they are immune to mishaps or other difficulties, especially fraud [28, 46]. In general, credit card fraud is carried out as follows:
-
Theft or stealing of card: Not every sort of credit card fraud is driven by technological malpractice; someone stealing a card and using it for their own benefit remains one of the primary forms of credit card fraud today.
-
Phishing or skimmed card details: In this category of fraud, fraudsters make scam calls to random people and try to obtain personal bank details in order to transfer funds illegally or, in other words, to steal funds online from bank account holders.
-
Cyber-attacks and data breaches: In this category, credit card or bank information is stolen in order to commit fraud. A few years back, in 2016, massive data breaches exposed the darker side of technology in the world of cybercrime, which enables fraudsters to commit sophisticated crimes [46].
As the volume of e-shopping, web-based banking, and online transactions increases, fraudsters exploit every weak point in a transaction framework to make fraudulent transactions. Credit card fraud costs shoppers and companies a huge amount of money every year. People committing such fraud constantly look for new methods and strategies to carry out unlawful activities. Thus, fraud detection frameworks are now essential for banks and related financial institutions so that they can limit such losses. Fraudulent transactions can be detected either by using a classification approach or by detecting transactions that are outliers with respect to normal transactions [44]. In general, credit card fraud detection relies on the analysis of recorded transactions to determine whether a transaction is fraudulent or legitimate [53]. The major issues that arise in fraud detection are the identification of customers’ transaction patterns and behaviours, as well as the manual and timely investigation of transactions [21], which leads to the need for automated systems. In the past, data mining techniques have been applied extensively in this domain [19, 48, 60]. Apart from data mining, several other techniques have also been used, such as genetic algorithms [13], statistical analysis [14], network-based approaches [12], sequence classification [5, 37], and Hidden Markov Model based approaches [40]. In the last few years, machine learning [47, 62, 63, 68, 77] and deep learning [25, 26, 29, 42] have been applied extensively to credit card fraud detection.
All these fraud detection techniques can be categorized as supervised learning and rely on datasets of past transactions. In credit card fraud detection, most transactions are legitimate and only a few are fraudulent. This leads to an imbalanced distribution of data, known as the class imbalance problem [2]. The imbalance in the dataset may bias a prediction model towards legitimate transactions. In other words, if we build a classification model to predict fraudulent transactions, there is a high chance that fraudulent transactions will be predicted as legitimate because of the massive imbalance between the two classes. Various approaches are available to deal with this issue. The existing approaches can be categorized as data-level, algorithm-level, cost-sensitive, and prediction-level. Data-level approaches mainly comprise oversampling and undersampling, where an imbalanced dataset is balanced by bringing the majority and minority class samples into equal proportion [34, 63]. In the algorithm-level approach, the learning algorithm is modified to give greater importance to the minority class samples [73]. The cost-sensitive approach uses a parameter, known as the cost, to balance the misclassification error for majority and minority class samples [3]. Apart from these approaches, ensemble learning is also applied to handle imbalanced datasets: multiple learning algorithms are employed and their results are aggregated to make the final predictions [42]. This type of approach can be categorized as a prediction-level approach, in which the problem of imbalanced class distribution is handled by assigning different weights to the results of the classification models. In recent years, various state-of-the-art classification models for imbalanced data have been proposed that utilize one or more of the above-mentioned approaches [6].
In this paper, we apply data-level techniques for pre-processing the data and combine multiple machine learning algorithms through ensemble learning.
In general, machine learning algorithms are used for various decision-making, prediction, and problem-solving tasks. Similarly, predicting whether a fraudulent transaction has taken place, or identifying a fraudulent transaction in a dataset that also contains valid transactions, requires machine learning-based classification. Such a detection system helps to separate transactions by type, fraudulent or valid [33, 67]. However, different machine learning algorithms often give very different results on the same dataset, so it is sometimes advisable to use ensemble learning. Ensemble learning means using two or more machine learning algorithms and combining their results with some aggregation approach to obtain better results. One of the prominent ways of using ensemble learning for classification is the voting ensemble. In this approach, the results of multiple models are combined by voting, and the majority is taken as the final predicted value of the combined model [71, 76].
In this paper, we propose an ensemble learning-based approach for credit card fraud detection. The proposed model is composed of multiple machine learning-based classification algorithms, namely Random Forest, Logistic Regression and KNN. A large number of machine learning algorithms are available that could be used in this application. To keep the proposed model simple and efficient, we selected these three algorithms, as they performed well on various parameters. As described earlier, ensemble learning can be used to handle the class imbalance problem by assigning weights to the results of different classifiers. In credit card fraud detection, the main focus is the detection of fraudulent transactions, or positive samples. However, because few samples belong to the positive class, there is a high chance that a positive sample goes undetected. To overcome this problem, a weighted voting approach is used to aggregate the ensemble learning models. The approach assigns different weights during the aggregation of the results in order to reduce the misclassification of positive samples. Here, soft and hard weighted voting schemes are used to make the final predictions. Soft voting combines the probabilities predicted by each model and selects the class with the highest total probability, whereas hard voting selects the class with the highest number of votes. The proposed approach also employs data-level class balancing techniques in the pre-processing stage, namely random oversampling (ROS) and random undersampling (RUS). A block diagram of the proposed credit card fraud detection framework is shown in Fig. 1.
2 Related work
With the advancement of technology and cashless transactions, commercial fraud or deception has also increased, especially in banking systems [14]. Credit card fraud is one of the most challenging types of fraud that banking systems face today. Manual detection of such fraud is tedious, and there is a high chance that fraudulent transactions go unnoticed [63]. To build automated credit card fraud detection systems, researchers have proposed many state-of-the-art approaches, including fraud detection based on statistical learning and data mining [31, 53, 60, 72]. In recent years, machine learning and deep learning-based approaches for fraud detection have been applied extensively [32].
Machine learning and deep learning play important roles in data processing across several fields [9, 27, 35, 36, 55], and both are being applied to credit card fraud detection. Awoyemi et al. [10] employed multiple machine learning models, including Naïve Bayes (NB), K-Nearest Neighbour (KNN) and Logistic Regression (LR), for credit card fraud detection on a highly skewed dataset. Their analysis showed that KNN performed better. In [38], an approach based on transaction behaviours was proposed in which multiple machine learning algorithms were evaluated. However, the results show that only the Random Tree and J48 models yielded satisfactory accuracy. The authors of [24] used various machine learning algorithms along with PCA for feature reduction and SMOTE for balancing the class distribution of the dataset. Several other approaches based on machine learning algorithms have also been proposed for credit card fraud detection [4, 39, 51, 63, 68]. Along with machine learning, deep learning has also been employed by several authors for credit card fraud detection [8, 47, 50]. In [37], LSTM-based transaction sequence classifiers were proposed. Alghofaili et al. [5] also proposed an LSTM-based deep learning model for the detection of financial fraud. The authors of [61] proposed a neural network architecture that is run for a large number of iterations. In [26], a combination of a deep learning-based autoencoder and classifiers was proposed for credit card fraud detection. In [69], the authors discussed machine learning methods and the challenges in the process of credit card fraud detection.
The most difficult issue in this field today is class imbalance, or the skewed distribution of classes in a dataset. The class imbalance problem usually degrades the performance of machine learning or deep learning-based classifiers. In credit card datasets, far fewer positive samples are available than legitimate transactions. This leads to a high rate of misclassification of positive samples. The authors of [32] have suggested various approaches for dealing with class imbalance, such as data-level methods, algorithm-level methods, and ensemble approaches. Data-level techniques are also called external-level strategies, as they manipulate the training data externally [63]. This is done by re-sampling the data to balance the distribution of instances in the majority and minority classes, and it is used to find fraudulent transactions and to improve overall accuracy [64]. Hybrid methods are another type of approach for dealing with the class imbalance problem [7]. There is thus a constant interest in devising new methods that can enhance classification accuracy. One idea is to use a group of classifiers rather than individual ones, in other words, to employ ensemble learning to deal with class imbalance problems.
The most popular strategies for building an ensemble of multiple classifiers, bagging [78] and boosting [49], use several classifiers to produce a single result with improved accuracy. However, the classic form only proposes the majority vote to aggregate the results of the individual classifiers. Ensemble learning is an important way to improve the performance of an ML-based model. The basic idea of an ensemble is to combine various classification models to enhance overall performance. The authors of [7] used an ensemble of Random Forest and Neural Network models. According to the reported observations, Random Forest classifies normal transactions correctly but misclassifies fraudulent transactions, whereas Neural Networks classify fraudulent transactions correctly but misclassify some normal transactions. Hence, an ensemble-based method offers the best possible solution. A significant challenge in applying ML and data mining is how to achieve the desired classification accuracy on data that is highly skewed in nature.
In recent years, researchers have proposed various ensemble learning models for credit card fraud detection. Sohony et al. [64] proposed an ensemble of random forest and neural network, which improved the detection of both fraudulent and normal instances. The authors of [57] employed multiple learning algorithms along with AdaBoost and a majority voting technique. Their approach used a real-world credit card dataset, and the results show that majority voting performed well. In [56], a weighted voting ensemble approach was proposed in which the data is balanced using an undersampling technique, followed by feature selection using random forest. The selected features are then used by an ensemble classifier consisting of multiple machine learning and deep learning models. Xie et al. [74] proposed a heterogeneous ensemble model in which equal focus was given to the classification of both positive and negative classes of credit card transactions. It also employed the KNN and K-Means algorithms for balancing the class distribution, followed by classification using a voting ensemble. In [25], an ensemble approach consisting of LSTM and AdaBoost was proposed in combination with the SMOTE-ENN approach to enhance credit card fraud detection performance. Other authors have also applied different ensemble learning approaches to credit card fraud detection [11, 22, 42, 58]. A summary of various existing techniques for credit card fraud detection is presented in Table 1.
Across this literature, different types of approaches based on data mining, machine learning, and deep learning have been used and have reported satisfactory performance. Much of the existing literature does not focus on reducing the misclassification of fraudulent transactions, or tackles the problem mostly with data-level class balancing algorithms. Some authors have applied different ensemble learning strategies such as boosting, bagging, stacking, and voting. Among voting ensemble techniques, most existing approaches rely on a single type of voting strategy. Although they have reported good performance, there is still considerable scope for improving the robustness of fraud detection. In this paper, we propose an ensemble approach comprising three simple base classifiers. The approach utilizes multiple types of voting ensemble strategies, namely hard and soft voting as well as weighted and unweighted voting, in order to reduce the misclassification of fraudulent credit card transactions.
3 Dataset
Credit card fraud detection using machine learning is a process of data analysis by a data science team and the development of a model that gives the best results in detecting and preventing fraudulent transactions. This is accomplished by bringing together all significant features of card users' transactions. The dataset used in this work is taken from Kaggle (CCDataset 2018) and is publicly available. It includes credit card transactions made in September 2013 by European cardholders. The dataset consists of 31 features, and each transaction is labelled as one of two types: fraud (1) or legitimate (0). The publisher of the dataset applied PCA to the available features, excluding the time and amount attributes, to obtain an equivalent feature set. This transformation of the features avoids revealing confidential information. The dataset contains 492 fraud cases out of 284,807 transactions; the rest are valid.
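To make the degree of imbalance concrete, the following minimal sketch works through the arithmetic using only the two counts reported above. The "always valid" classifier is a hypothetical baseline introduced here for illustration; it is not one of the models evaluated in this work.

```python
# Class balance of the dataset, using the counts reported in the text.
total_transactions = 284_807
fraud_transactions = 492
valid_transactions = total_transactions - fraud_transactions

fraud_share = fraud_transactions / total_transactions

# Accuracy of a hypothetical classifier that labels every transaction as valid.
always_valid_accuracy = valid_transactions / total_transactions

print(f"valid transactions: {valid_transactions}")
print(f"fraud share: {fraud_share:.4%}")
print(f"accuracy of always predicting 'valid': {always_valid_accuracy:.4%}")
```

Fraud accounts for roughly 0.17% of the records, so a model that never flags fraud would still score above 99.8% accuracy while detecting nothing. This is why the misclassification of fraudulent samples has to be examined separately from overall accuracy.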
4 Methodology
In general, when developing a machine learning-based system, it is crucial to pre-process the data, which includes data balancing and feature selection, as this influences the overall performance of an algorithm. This step is also important for an ensemble learning model. In ensemble learning, the main concern is the selection of suitable classifiers. In particular, choosing an appropriate set of classifiers and various subsets of the dataset can further enhance an algorithm’s performance [56]. In this work, since the data is highly imbalanced, we balance the data using random undersampling (RUS) and random oversampling (ROS) techniques. The balanced dataset is then used to evaluate a weighted ensemble machine learning algorithm. The ensemble learning model consists of multiple machine learning models, as described below.
4.1 Machine learning
Machine learning plays a fundamental part in detecting fraud in everyday credit card transactions. To predict fraudulent transactions, most banking systems use different machine learning methodologies. To enhance a detection model, past data is used and new features can also be incorporated [23, 39]. In this paper, we use the following machine learning-based techniques to distinguish fraudulent from legitimate credit card transactions:
-
K-Nearest Neighbours: KNN is an important classifier that belongs to the supervised learning category of machine learning. In this technique, a new data point is classified by the majority vote of its closest neighbours. The distance to the neighbours is computed with a distance function such as the Euclidean, Manhattan, or Minkowski distance. The K value is the number of neighbours considered. If the K value is extremely low, the outcomes are less stable. On the other hand, increasing the K value gives more stable outcomes but may increase the error. In the current work, the K value is therefore chosen by trial and error so that no overfitting occurs [20].
-
Random Forest Classification: Random Forest is a bagging-type ensemble classifier that employs multiple decision trees, each trained on a different subset of the given dataset. It aggregates the outputs of the trees to improve the results. Rather than depending on a single decision tree, the random forest takes the prediction from each of its trees and, based on the majority vote of these predictions, produces the final result.
-
Logistic Regression: It is an important statistical technique for the analysis and classification of binary and proportional response data sets [41]. Logistic regression predicts the outcome of a categorical dependent variable. It is a parametric approach in which a hyperplane is identified by optimizing the coefficients associated with the features. The hyperplane partitions the samples into two classes. The outcome must be a categorical or discrete value, such as Yes or No, 0 or 1, True or False, and so forth.
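The K-nearest-neighbour rule described in the list above can be sketched in a few lines of plain Python. This is only an illustration on made-up two-dimensional points; the function names, the toy data, and the chosen values of K are assumptions of the sketch and are not taken from the experiments in this paper.

```python
from collections import Counter


def minkowski_distance(a, b, p=2):
    """Minkowski distance; p=1 gives Manhattan and p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def knn_predict(train_points, train_labels, query, k=3, p=2):
    """Label `query` by majority vote among its k nearest training points."""
    order = sorted(
        range(len(train_points)),
        key=lambda i: minkowski_distance(train_points[i], query, p),
    )
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy, made-up data: label 1 stands for fraud, label 0 for valid.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
labels = [0, 0, 0, 1, 1]

prediction_k1 = knn_predict(points, labels, query=(0.8, 0.9), k=1)
prediction_k5 = knn_predict(points, labels, query=(0.8, 0.9), k=5)
print(prediction_k1, prediction_k5)
```

In this toy case the query lies next to the two fraud points, so K = 1 labels it as fraud, while K = 5 takes in every training point and the three valid samples outvote the two fraud samples. This illustrates why K has to be tuned with care when the positive class is rare.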
4.2 Ensemble learning
As is well known, the errors and predictions of any machine learning model are adversely affected by bias, variance, and noise. To combat these disadvantages, ensemble models are used. An ensemble model is a process in which different ML models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. Ensemble learning is primarily used to improve classification, prediction, function approximation, etc. [52]. The two most widely used ensemble techniques are bagging and boosting. Another popular way of aggregating the results of multiple learning models is the voting ensemble.
A voting ensemble is an ensemble ML model that combines the predictions from various models. It is a technique that can be used to improve model performance, ideally achieving better performance than any single model in the ensemble. It can be used for classification and regression. In the case of regression, this involves computing the mean of the predictions from the models. In the case of classification, the predictions for each label are summed and the label with the majority vote is predicted [17]. The two types of voting are hard voting and soft voting.
-
Hard Voting: In the hard voting scheme for ensemble learning-based classification, the final predictions of all the classifiers are considered, and the class that receives the most votes from the models is predicted. For example, suppose three base classifiers are employed to build an ensemble model and their results are aggregated using hard voting. If, for a given sample, two base classifiers predict the positive class and one classifier predicts the negative class, the final prediction of the ensemble model for that sample is the positive class.
-
Soft Voting: Soft voting involves finding the mean of the predicted probabilities for each class label and predicting the class label with the highest mean probability. Here, the probability for each class is calculated by averaging the probabilities produced by all the base classifiers of the ensemble model.
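The two voting rules described above do not always agree. The sketch below uses hypothetical probabilities, chosen only for illustration, in which two models lean weakly towards class 0 while the third is strongly confident of class 1. The helper names `hard_vote` and `soft_vote` are assumptions of this sketch.

```python
from collections import Counter


def hard_vote(labels):
    """Return the label predicted by the most models (the mode)."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average per-class probabilities over models; return (class, averages)."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averages = [
        sum(model[c] for model in probabilities) / n_models
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: averages[c]), averages


# Hypothetical outputs of three models as [p(class 0), p(class 1)].
probabilities = [[0.55, 0.45], [0.60, 0.40], [0.05, 0.95]]
# Each model's hard prediction is its own most probable class.
labels = [max(range(2), key=lambda c: p[c]) for p in probabilities]

hard_result = hard_vote(labels)
soft_result, averages = soft_vote(probabilities)
print(labels, hard_result, soft_result)
```

Hard voting returns class 0 because two of the three models vote for it, whereas soft voting returns class 1 because the averaged probability of class 1 is about 0.6. Soft voting therefore lets one confident model override two hesitant ones, which can matter when the positive class is rare.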
4.3 Class imbalance techniques
The dataset plays an important role in the training of a machine learning algorithm. A huge amount of data is produced every minute, and it is a massive task in itself to filter it and form suitable, ready-to-use datasets that are free of discrepancies and well balanced. Often, however, the classes differ significantly in size. Credit card transaction datasets are very likely to be imbalanced. For instance, the dataset used in this work has a total of 284,807 transactions, of which only 492 are fraudulent, which leaves 284,315 valid transactions [18]. Such a large imbalance biases the outcome towards valid transactions and increases the chance of misclassifying the positive class. To avoid this, we applied oversampling and undersampling to balance the two classes of the dataset.
The authors in [45] have summarized the related work in the field of imbalanced class problems. An imbalanced classification problem is a classification problem in which the distribution of examples across the known classes is uneven or skewed. The distribution can vary from a slight bias to a severe imbalance, where there is one example in the minority class for hundreds, thousands, or even millions of examples in the majority class or classes. Imbalanced classification poses a challenge for predictive modelling, as most of the algorithms used for classification were designed around the assumption of an equal number of examples for each class. This results in models with poor predictive performance, specifically for the minority class. This is a problem because the minority class is typically the more important one, so the problem is more sensitive to classification errors for the minority class than for the majority class. The imbalance across the classes may be caused by how the examples were collected or sampled from the problem domain [17].
Class imbalance issues appear in many domains, including fraud detection, spam filtering, and disease screening. Two popular types of techniques used extensively in various fields are oversampling and undersampling. In oversampling, the minority class samples are duplicated until their number is in a balanced ratio to the number of majority class samples. Oversampling can be carried out using ROS, the synthetic minority oversampling technique (SMOTE), adaptive synthetic sampling (ADASYN), etc. In undersampling, the number of majority class samples is reduced until it is in a balanced ratio to the number of minority class samples. This carries the risk of losing some important data in the process of reducing the amount of data. Undersampling can be carried out using RUS, Tomek links, cluster centroids undersampling, etc. In this paper, ROS and RUS are used for balancing the class distribution. The class imbalance issue in the dataset and the strategies to deal with it are shown in Fig. 2.
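As a rough illustration of the two resampling strategies used here, the following sketch implements random oversampling and random undersampling for a binary problem in plain Python. It is a simplified stand-in and not the Imblearn implementation used in the experiments; the function names, the fixed seed, and the toy data are assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(samples, labels, minority_label, seed=0):
    """Duplicate randomly chosen minority samples until classes are equal."""
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    majority_count = sum(1 for y in labels if y != minority_label)
    extra = [rng.choice(minority) for _ in range(majority_count - len(minority))]
    return samples + extra, labels + [minority_label] * len(extra)


def random_undersample(samples, labels, minority_label, seed=0):
    """Keep all minority samples and an equally sized random majority subset."""
    rng = random.Random(seed)
    minority = [(s, y) for s, y in zip(samples, labels) if y == minority_label]
    majority = [(s, y) for s, y in zip(samples, labels) if y != minority_label]
    kept = minority + rng.sample(majority, len(minority))
    return [s for s, _ in kept], [y for _, y in kept]


# Toy data: 2 fraud samples (label 1) and 18 valid samples (label 0).
samples = [[float(i)] for i in range(20)]
labels = [1 if i < 2 else 0 for i in range(20)]

ros_samples, ros_labels = random_oversample(samples, labels, minority_label=1)
rus_samples, rus_labels = random_undersample(samples, labels, minority_label=1)
print(Counter(ros_labels), Counter(rus_labels))
```

Oversampling grows the toy set to 18 samples per class by repeating fraud records, while undersampling shrinks it to 2 samples per class by discarding valid records. The second case shows directly how much majority-class information can be lost.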
4.4 Proposed work
Every year, illegal credit card transactions cause enormous financial losses. Moreover, as technology advances, criminals find new ways to commit fraud. There is therefore a need to tackle fraud detection in credit card transactions. In this paper, we propose a voting ensemble learning approach for credit card fraud detection. The proposed approach is evaluated on a highly imbalanced credit card transactions dataset. The proposed ensemble model is built from Random Forest, Logistic Regression and KNN classifiers, and the results of these models are aggregated using a weighted voting scheme to obtain better results. As described earlier, the dataset available for credit card fraud suffers from a class imbalance problem, which is handled using undersampling and oversampling techniques. The proposed approach works as follows:
-
First, data pre-processing is performed. In this phase, we address the class imbalance in the dataset using the RUS and ROS approaches. The balanced data is then used to train and test the proposed ensemble classification model.
-
Next, we train multiple machine learning models to form the ensemble. The results of these models are aggregated using a voting scheme, in which weights are applied in two ways: hard voting and soft voting.
-
To combat disadvantages such as noise and variance, multiple weighted voting ensembles with different combinations of weights are used. The weighted voting ensemble model enhances the overall performance of credit card fraud detection by combining the classification results of all the classifiers and choosing the class with the highest vote, based on the weights assigned to each classifier.
When different machine learning algorithms are tried as part of a voting ensemble model, one may perform better than the others. To account for this, we can assign different weights to the classifiers. Let there be three base classification models M1, M2, and M3, with assigned weights W1, W2, and W3, respectively. In this paper, we represent the assignment of weights as < W1, W2, W3 >. A weight assignment of < 1, 1, 1 > means that the vote of each model counts equally, whereas combinations such as < 2, 1, 1 >, < 1, 2, 1 > and < 1, 1, 2 > mean that different weights are assigned to the classifiers. The weighted majority of the three classification models is taken as the final predicted value of the ensemble model. A block diagram of the proposed weighted voting ensemble model is shown in Fig. 3.
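One possible way to assemble such a weighted voting ensemble with scikit-learn is sketched below. It is an illustrative outline under several assumptions and not the exact code behind the reported results: the data is a synthetic stand-in produced by `make_classification`, the hyperparameters are arbitrary, the oversampling is done by hand with NumPy, and it is applied to the training split only, which may differ from the evaluation protocol used in this work. The weights are listed in the order random forest, logistic regression, KNN.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the transaction data: about 5% positive samples.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Random oversampling of the minority class, applied to the training split only.
rng = np.random.default_rng(0)
minority_idx = np.flatnonzero(y_train == 1)
majority_idx = np.flatnonzero(y_train == 0)
extra_idx = rng.choice(minority_idx, size=len(majority_idx) - len(minority_idx))
balanced_idx = np.concatenate([majority_idx, minority_idx, extra_idx])
X_bal, y_bal = X_train[balanced_idx], y_train[balanced_idx]

# Weights in the order: random forest, logistic regression, KNN.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="soft",  # use voting="hard" for weighted majority voting
    weights=[2, 1, 1],
)
ensemble.fit(X_bal, y_bal)
predictions = ensemble.predict(X_test)
print("test accuracy:", ensemble.score(X_test, y_test))
```

Changing the `weights` list to `[1, 2, 1]` or `[1, 1, 2]` and switching `voting` between `"soft"` and `"hard"` gives the other configurations discussed in this section.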
Let us assume the valid samples of a credit card transaction dataset are labelled as 0, and fraudulent transactions are labelled as 1. The proposed hard and soft voting schemes can be described as follows:
-
Hard voting: Hard voting is simple majority voting, in which the class predicted by most models is taken as the final classification. The predicted class y is therefore the mode of the predictions of the models m_1, ..., m_j:
$$y=\operatorname{mode}\{m_1(x), m_2(x), m_3(x), \dots , m_j(x)\}$$
Suppose the three classification models give the following predictions for a transaction:
Model 1 -> Class 1, Model 2 -> Class 1, and Model 3 -> Class 0.
Now, if we want to predict class label y, we will take the mode of all predictions.
Therefore, y = mode {1, 1, 0} = 1.
This means that the predicted result for the transaction is fraudulent.
-
Soft voting: On the other hand, in soft voting, the probability of each class is computed from the outputs of all the models, and the final classification is then made.
For example, the probability of occurrence of class 0 and class 1 for a transaction is given as {p(0), p(1)} and the results of three models are given as follows:
-
Model 1 -> {0.1, 0.9}
-
Model 2 -> {0.2, 0.8}
-
Model 3 -> {0.6, 0.4}
Now, overall p(0) = (0.1 + 0.2 + 0.6)/3 = 0.3
Overall p(1) = (0.9 + 0.8 + 0.4)/3 = 0.7
Here the overall probability of class 1 is higher, so the transaction is classified as class 1 i.e. fraudulent.
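The two worked examples above can be reproduced, and extended to unequal weights, with a short sketch. The helper functions below are illustrative assumptions, not the implementation used in the experiments. In particular, the soft vote is computed as a weighted average normalised by the sum of the weights, and a tie in the hard vote would go to the label seen first; both are choices made for this sketch.

```python
def weighted_hard_vote(labels, weights):
    """Sum the weights of the models voting for each class; return the top class.

    Ties go to the label seen first (an arbitrary choice of this sketch).
    """
    totals = {}
    for label, weight in zip(labels, weights):
        totals[label] = totals.get(label, 0) + weight
    return max(totals, key=totals.get)


def weighted_soft_vote(probabilities, weights):
    """Weighted average of per-class probabilities; return (class, averages)."""
    total_weight = sum(weights)
    n_classes = len(probabilities[0])
    averages = [
        sum(w * p[c] for p, w in zip(probabilities, weights)) / total_weight
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: averages[c]), averages


labels = [1, 1, 0]                                      # hard-voting example
probabilities = [[0.1, 0.9], [0.2, 0.8], [0.6, 0.4]]    # soft-voting example

# Equal weights (1, 1, 1) reproduce the worked examples in the text.
hard_equal = weighted_hard_vote(labels, (1, 1, 1))
soft_equal, avg_equal = weighted_soft_vote(probabilities, (1, 1, 1))

# Weights (2, 1, 1) give the first model twice the influence of the others.
hard_211 = weighted_hard_vote(labels, (2, 1, 1))
soft_211, avg_211 = weighted_soft_vote(probabilities, (2, 1, 1))

print(hard_equal, soft_equal, [round(a, 3) for a in avg_equal])
print(hard_211, soft_211, [round(a, 3) for a in avg_211])
```

With equal weights, both schemes classify the transaction as class 1, with averaged probabilities of 0.3 and 0.7 as computed above. With weights of 2, 1 and 1, the decision stays the same, but the averaged probability of class 1 rises to 0.75 because the most confident model now counts twice.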
5 Experiment and result analysis
5.1 Experimental setup
The proposed work was implemented and executed on an Apple MacBook Pro with a 2.3 GHz dual-core Intel Core i5 processor. The Python code was run in Jupyter Notebook from the Anaconda distribution. The versions of Python and its libraries used in this work are Python (3.7.11), NumPy (1.21.2), Pandas (1.3.5), SciPy (1.7.3), Matplotlib (3.5.0), Seaborn (0.11.2), Imblearn (0.0), and Sklearn (0.24.1). The results of the proposed work are recorded using accuracy and confusion matrices. Accuracy records the overall performance of the proposed work, whereas the confusion matrices show the correct and incorrect classifications of positive and negative samples.
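For reference, the two evaluation measures can be computed as in the sketch below. The labels are made up for illustration and are not results from this work; the helper names are assumptions, and the matrix layout (true class in rows, predicted class in columns) follows the common convention.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp) for binary labels, with `positive` as fraud."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels: 8 valid (0) and 2 fraudulent (1) transactions.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(y_true, y_pred))
print("confusion matrix:", [[tn, fp], [fn, tp]])
```

In this toy case the accuracy is 0.8, yet the confusion matrix shows that one of the two fraudulent transactions was missed. This is the kind of error that accuracy alone hides and that the confusion matrices are meant to expose.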
5.2 Result analysis
The dataset used in this work consists of 284,807 transactions, of which 284,315 are valid and only 492 are fraudulent. To deal with this imbalance, we employed the random oversampling (ROS) and random undersampling (RUS) techniques. The machine learning algorithms used in the voting ensemble model are random forest, logistic regression and KNN. The weights for the three algorithms are written as < w1, w2, w3 >, where w1 is the weight assigned to random forest, w2 is the weight assigned to logistic regression and w3 is the weight assigned to KNN. Both soft and hard voting schemes were applied in order to compare their results and to understand how the model behaves under each scheme. The accuracies obtained using the weighted voting ensemble on the balanced dataset are tabulated.
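The idea behind the two data-level techniques can be sketched as follows. This is a standard-library illustration under the assumption that class 1 is the minority class; it is not the Imblearn implementation used in the experiments, and the function names and toy data are hypothetical.

```python
import random


def random_oversample(X, y, minority=1, seed=0):
    """ROS: duplicate randomly chosen minority samples until the classes are equal in size."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    # Draw with replacement to fill the gap between the two class sizes.
    extra = [rng.choice(minority_idx)
             for _ in range(len(majority_idx) - len(minority_idx))]
    idx = majority_idx + minority_idx + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


def random_undersample(X, y, minority=1, seed=0):
    """RUS: keep a random subset of the majority class equal in size to the minority class."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    # Draw without replacement; the remaining majority samples are discarded.
    kept = rng.sample(majority_idx, len(minority_idx))
    idx = kept + minority_idx
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy data: 8 valid (0) and 2 fraudulent (1) samples.
X = list(range(10))
y = [0] * 8 + [1] * 2

X_ros, y_ros = random_oversample(X, y)
X_rus, y_rus = random_undersample(X, y)
print(y_ros.count(0), y_ros.count(1), y_rus.count(0), y_rus.count(1))
```

On the toy data, ROS yields 8 samples per class while RUS yields 2 per class. Applied to the dataset above, RUS would retain only 492 of the 284,315 valid transactions.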
The dataset obtained after oversampling or undersampling is used for training and testing the selected base classifiers. The weighted hard and soft voting ensembles are then applied. The best accuracy is obtained with the assigned weights < 2, 1, 1 > for both soft and hard voting when the class imbalance is handled by ROS; the training and testing accuracies in this case are 100% and 99.99%, respectively. The worst performance, in terms of accuracy, is obtained with the hard voting strategy and assigned weights < 1, 1, 2 > when the class imbalance is handled by RUS; the training and testing accuracies in this case are 87.2% and 79.05%, respectively. A detailed summary of the training and testing accuracies for the different weight settings is presented in Table 2.
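How the weights < w1, w2, w3 > enter the two voting schemes can be sketched as below. The sketch assumes that, in hard voting, each model's vote counts w times, and that, in soft voting, the class probabilities are averaged with the weights. The tie-breaking rule (the smaller class label wins) and the example probabilities are assumptions made for illustration, not values from the paper.

```python
def weighted_hard_vote(labels, weights):
    """Sum the weights of the models voting for each class; the largest total wins."""
    scores = {}
    for label, weight in zip(labels, weights):
        scores[label] = scores.get(label, 0) + weight
    # Assumption: ties are resolved in favour of the smaller class label.
    return min(scores, key=lambda c: (-scores[c], c))


def weighted_soft_vote(probabilities, weights):
    """Weighted average of (p0, p1) pairs; return the likelier class and both averages."""
    total = sum(weights)
    p0 = sum(w * p[0] for p, w in zip(probabilities, weights)) / total
    p1 = sum(w * p[1] for p, w in zip(probabilities, weights)) / total
    return (1 if p1 > p0 else 0), (p0, p1)


# Hypothetical outputs in the order random forest, logistic regression, KNN.
probs = [(0.3, 0.7), (0.6, 0.4), (0.7, 0.3)]

equal_label, _ = weighted_soft_vote(probs, [1, 1, 1])
rf_heavy_label, (p0, p1) = weighted_soft_vote(probs, [2, 1, 1])

majority_label = weighted_hard_vote([1, 1, 0], [2, 1, 1])
tie_label = weighted_hard_vote([1, 0, 0], [2, 1, 1])

print(equal_label, rf_heavy_label, round(p1, 3), majority_label, tie_label)
```

With equal weights the averaged fraud probability is below 0.5 and the transaction is labelled valid; doubling the random forest weight raises it to 0.525 and flips the label to fraudulent. The last call shows that, with three models and weights of 1 and 2, weighted hard voting can end in a tie, so the tie-breaking rule of the implementation matters.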
To measure the misclassification of fraudulent and valid transaction samples, we plotted the confusion matrices for all combinations of weighted hard and soft voting ensemble models considered in this paper. Figures 4, 5, 6, 7, 8, 9, 10 and 11 show the confusion matrices for the training and testing sets when the class imbalance problem is handled using the ROS approach. In some ROS cases, the proposed approach produced zero false predictions. For instance, with assigned weights < 2, 1, 1 > and soft voting, zero false predictions are obtained for the training set, and zero false predictions are obtained for the fraud samples during testing, as shown in Fig. 4. Similarly, as shown in Fig. 5, with the hard voting strategy zero false predictions are obtained for the testing set, and zero false predictions are obtained for the fraud class during training. With assigned weights < 1, 2, 1 > and soft voting, the proposed model also produced zero false predictions for fraudulent samples in both the training and testing phases, as shown in Fig. 6. However, the misclassification of negative samples increased compared with the previously mentioned cases. With assigned weights < 1, 2, 1 > and hard voting, the proposed model did not perform well, as shown in Fig. 7. Apart from this scenario, the other assigned weights, < 1, 1, 2 > and < 1, 1, 1 >, performed well under both soft and hard voting schemes: as shown in Figs. 8, 9, 10 and 11, zero false predictions are recorded for fraudulent transactions.
Figures 12, 13, 14, 15, 16, 17, 18 and 19 present the confusion matrices for the training and testing sets when the class imbalance problem is handled using the RUS approach, followed by multiple combinations of weighted soft and hard voting schemes. The target in all these experiments was again to attain zero false predictions. The results are clearly and consistently better for ROS than for RUS. This is understandable: reducing the majority class to the size of the minority class does not always work well, because there is always a chance of discarding important data. However, for some weight assignments, satisfactory results are achieved. With assigned weights < 2, 1, 1 > and soft voting, very few misclassifications of valid transactions are recorded compared to misclassifications of fraudulent transactions, as shown in Figs. 12 and 13. For other combinations of weights and voting schemes, few misclassifications of negative samples are recorded, but the number of misclassified fraudulent transactions is very high. The worst result is recorded for the weights < 1, 1, 2 > and the hard voting scheme, as shown in Fig. 17.
6 Limitations and future work
The results of the proposed work show that the voting ensemble learning model for credit card fraud detection performs well. For some weight assignments, the models produced zero false classifications for both the training and testing sets. However, the proposed model has some limitations in terms of the dataset, the approach for handling the class imbalance problem, the selection of base classifiers for building the ensemble, and the limited combinations of weight assignments for the voting schemes. In terms of data, the main obstacle to building an efficient credit card fraud detection model is the limited availability of data. Most of the existing datasets available in the public domain have limited numbers of features and samples. In this paper, the proposed model has been evaluated using only one dataset [18], which may result in poor generalization to unseen credit card transactions. For better generalization, the model needs to be evaluated using other existing datasets [66, 70, 72] as well as a real credit card dataset [53].
In terms of handling the class imbalance problem, ROS and RUS have been applied in this work. The performance of the proposed model could potentially be enhanced by combining it with different oversampling methods, undersampling methods, combinations of oversampling and undersampling, and other similar methods [63]. Apart from data-level algorithms, cost-sensitive approaches [3, 65, 75] can also be applied in combination with the proposed ensemble learning.
To develop an ensemble learning model, the selection of base classifier(s) plays an important role. Finding such a combination is a challenging task, as a combination may perform well for a particular problem yet fail to generalize to other problems. In this paper, a limited number of base classifiers are employed, which can be considered an advantage in terms of architectural simplicity and faster training of the model. However, this is also a limitation of the proposed work. A bigger or different combination of base classifiers may result in better performance. In future work, other classifiers such as Support Vector Machine, Decision Tree, Naive Bayes, Neural Networks, LSTM, Generative Adversarial Networks, etc. can be ensembled to enhance the credit card fraud detection task. Ensemble learning enhances classification accuracy at the cost of interpretability [15]. To deal with this issue, Explainable Artificial Intelligence (XAI) based credit card fraud detection models can be developed [43, 54].
In this paper, the soft and hard voting schemes for aggregating the results of the base classifiers have been used. It uses multiple combinations of weights assigned to each base classifier for final prediction. The selection of these weights used in this work is limited to a few integer values i.e. 1 and 2. More combinations of integers, as well as fractional weights, can also be used to enhance the performance of the proposed model. The other aggregation strategies [59] apart from voting may also be employed and evaluated to check the performance of the proposed model.
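One way such a wider search over weights might look is sketched below: candidate weight triples, including fractional ones, are enumerated and scored by soft-voting accuracy on held-out data. Everything here is hypothetical: the candidate values, the helper names, and the validation probabilities are invented for illustration and do not come from the paper's experiments.

```python
from itertools import product


def soft_vote_label(fraud_probs, weights):
    """Weighted average of the models' fraud probabilities, thresholded at 0.5."""
    p1 = sum(w * p for p, w in zip(fraud_probs, weights)) / sum(weights)
    return 1 if p1 > 0.5 else 0


def vote_accuracy(val_probs, val_labels, weights):
    """Accuracy of weighted soft voting on a held-out validation set."""
    preds = [soft_vote_label(p, weights) for p in val_probs]
    return sum(int(a == b) for a, b in zip(preds, val_labels)) / len(val_labels)


def best_weights(val_probs, val_labels, candidates=(0.5, 1, 1.5, 2)):
    """Try every weight triple built from `candidates`; keep the first best one."""
    best = None
    for weights in product(candidates, repeat=3):
        acc = vote_accuracy(val_probs, val_labels, weights)
        if best is None or acc > best[1]:
            best = (weights, acc)
    return best


# Hypothetical P(fraud) from random forest, logistic regression and KNN.
val_probs = [(0.9, 0.2, 0.3), (0.2, 0.6, 0.4), (0.8, 0.3, 0.35), (0.1, 0.2, 0.7)]
val_labels = [1, 0, 1, 0]

equal_acc = vote_accuracy(val_probs, val_labels, (1, 1, 1))
weights, acc = best_weights(val_probs, val_labels)
print(equal_acc, weights, acc)
```

The weights should be selected on data that is separate from the final test set, so that the reported test accuracy is not biased by the search.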
7 Real-world applications
Fraud related to credit cards is increasing with the development of e-commerce, digital payment, and related technology. This fraud costs customers and financial organizations billions of dollars. Manual and traditional fraud detection systems may not be very efficient, as fraudsters are also developing new ways and techniques to commit illegal transactions. To meet this challenge, banking organizations require an advanced automated fraud detection system. As discussed in the limitations section, the proposed model needs improvement in terms of generalization to catch unseen real-world transactions; nevertheless, the model can be helpful in developing automated credit card fraud detection systems. Since it employs machine learning, the model can also improve with experience. Apart from credit card fraud detection, the proposed model can also be used for developing applications to tackle other monetary frauds in banking and financial systems. The proposed model can also be applied in other fields where numerically labelled datasets are available, such as insurance, the stock market, medicine and health care, cybersecurity, etc.
8 Conclusion
This study contributes to the existing literature on credit card fraud detection by proposing a voting ensemble learning approach. The proposed approach is an ensemble of three base classifiers, namely Random Forest, Logistic Regression and KNN. The results of these models are aggregated using weighted voting techniques, following both hard and soft voting. To obtain a final prediction of whether a credit card transaction is fraudulent or legitimate, an aggregated value is calculated using various combinations of weights assigned to the predictions of the three base classifiers. The proposed ensemble model is evaluated using a highly imbalanced dataset. The class imbalance problem has been handled during pre-processing using two approaches, random oversampling and random undersampling. In this study, the main focus is to reduce the false prediction of fraudulent transactions. The proposed model has performed well in terms of accuracy as well as in minimizing false predictions. The highest training and testing accuracies achieved by the proposed model are 100% and 99.99%, respectively, when we applied ROS and assigned the highest weight to the results of Random Forest. Most of the other combinations of assigned weights for both soft and hard voting schemes have also produced satisfactory results, except in the case where the highest weight is assigned to the results of the KNN classifier and the class distribution is balanced using RUS. The proposed ensemble learning can be further enhanced by using other combinations of classification algorithms and different combinations of weights.
Data availability
All data generated or analysed during this study are included in this article.
References
Abd El-Naby A, Hemdan EED, El-Sayed A (2023) An efficient fraud detection framework with credit card imbalanced data in financial services. Multimed Tools Appl 82(3):4139–60
Ahmad H, Kasasbeh B, Aldabaybah B, Rawashdeh E (2023) Class balancing framework for credit card fraud detection based on clustering and similarity-based selection (SBS). Int J Inf Technol. Springer Nature Singapore 15(1):325–33. Available from: https://doi.org/10.1007/s41870-022-00987-w
Alejo R, Valdovinos RM, García V, Pacheco-Sanchez JH (2013) A hybrid method to face class overlap and class imbalance on neural networks and multi-class scenarios. Pattern Recogn Lett 34(4):380–388. https://doi.org/10.1016/j.patrec.2012.09.003
Alfaiz NS, Fati SM (2022) Enhanced credit card fraud detection model using machine learning. Electronics 11(4):662. https://doi.org/10.3390/electronics11040662
Alghofaili Y, Albattah A, Rassam MA (2020) A financial fraud detection model based on LSTM deep learning technique. J Appl Secur Res. Routledge 15(4):498–516. Available from: https://doi.org/10.1080/19361610.2020.1815491
Ali A, Shamsuddin SM, Ralescu AL (2013) Classification with class imbalance problem: a review. Int J Adv Soft Comput 5(3)
Ali H, Najib M, Salleh M, Hussain K, Ahmad A, Ullah A et al (2019) A review on data preprocessing methods for class imbalance problem. Int J Eng Technol 8(3):390–397
Asha R, Suresh K (2021) Credit card fraud detection using artificial neural network. Glob Transitions Proc. Elsevier B.V 2(1):35–41. Available from: https://doi.org/10.1016/j.gltp.2021.01.006
Aswathi KB, Jayadev S, Krishna N, Krishnan R, Sarath G (2021) Botnet detection using machine learning. 2021 12th Int Conf Comput Commun Netw Technol ICCCNT 2021
Awoyemi JO, Adetunmbi AO, Oluwadare SA (2017) Credit card fraud detection using machine learning techniques: a comparative analysis. 2017 International Conference on Computing Networking and Informatics (ICCNI). Lagos, Nigeria, pp 1–9. https://doi.org/10.1109/ICCNI.2017.8123782
Bagga S, Goyal A, Gupta N, Goyal A (2020) Credit card fraud detection using pipeling and ensemble learning. Procedia Comput Sci 173(May):104–112
Van Belle R, Baesens B, De Weerdt J (2023) CATCHM: A novel network-based credit card fraud detection method using node representation learning. Decis Support Syst. Elsevier B.V. 164(August 2022):113866. Available from: https://doi.org/10.1016/j.dss.2022.113866
Benchaji I, Douzi S, El Ouahidi B (2019) Using genetic algorithm to improve classification of imbalanced datasets for credit card fraud detection. In: Khoukhi F, Bahaj M, Ezziyyani M, (eds). Smart Data Comput Intell. Cham: Springer International Publishing. p. 220–9
Bolton RJ, Hand DJ (2002) Statistical fraud detection: a review. Stat Sci 17(3):235–255
Bora A, Sah R, Singh A, Sharma D, Ranjan RK (2022) Interpretation of machine learning models using XAI - A study on health insurance dataset. 2022 10th Int Conf Reliab Infocom Technol Optim. (Trends Futur. Dir. ICRITO 2022. IEEE; (Ml):1–6
Britannica. Credit Card. Encycl. Br. 2016 [cited 2023 Jul 24]. Available from: https://www.britannica.com/money/topic/credit-card
Brownlee J (2020) How to develop voting ensembles with python . Mach. Learn. mastery. [cited 2023 Jul 24]. Available from: https://machinelearningmastery.com/voting-ensembles-with-python/
CCDataset (2018) Credit card fraud detection: anonymized credit card transactions labeled as fraudulent or genuine . Mach Learn Gr - ULB. [cited 2023 Aug 12]. p. 1. Available from: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
Chan PK, Fan W, Prodromidis AL, Stolfo SJ (1999) Distributed data mining in credit card fraud detection. IEEE Intell Syst their Appl 14(6):67–74
Chowdhury S, Schoen MP (2020) Research paper classification using supervised machine learning techniques. 2020 Intermt. Eng Technol Comput IETC 2020. Institute of Electrical and Electronics Engineers Inc
Dal Pozzolo A, Boracchi G, Caelen O, Alippi C, Bontempi G (2018) Credit card fraud detection: a realistic modeling and a novel learning strategy. IEEE Trans Neural Networks Learn Syst IEEE; 29(8):3784–97
Devikar M, Khadke A, Lad A, Sapkal R, Nikalje S (2020) Credit card fraud detection using ensemble learning. Int Res J Eng Technol 7(05). Available from: https://www.irjet.net/archives/V7/i5/IRJET-V7I51392.pdf
Dietterich TG (1997) Machine-learning research. AI Magazine 18(4). https://doi.org/10.1609/aimag.v18i4.1324
Dornadula VN, Geetha S (2019) Credit card fraud detection using machine learning algorithms. Procedia Comput Sci Elsevier B.V. p. 631–41
Esenogho E, Mienye ID, Swart TG, Aruleba K, Obaido G (2022) A Neural network ensemble with feature engineering for improved credit card fraud detection. IEEE Access IEEE 10:16400–16407
Fanai H, Abbasimehr H (2023) A novel combined approach based on deep Autoencoder and deep classifiers for credit card fraud detection. Expert Syst Appl. Elsevier Ltd; 217(January):119562. Available from: https://doi.org/10.1016/j.eswa.2023.119562
Garg A, Mago V (2021) Role of machine learning in medical research: a survey. Comput Sci Rev 40:100370. https://doi.org/10.1016/j.cosrev.2021.100370
Gerson ES (2023) Steps to take if you are the victim of credit card fraud . Experian.com. [cited 2023 Aug 1]. Available from: https://www.experian.com/blogs/ask-experian/credit-education/preventing-fraud/credit-card-fraud-what-to-do-if-you-are-a-victim/
Ghosh S, Reilly DL (1994) Credit card fraud detection with a neural-network. Proc Hawaii Int Conf Syst Sci 3:621–630
Gupta A, Lohani MC, Manchanda M (2021) Financial fraud detection using naive bayes algorithm in highly imbalance data set. J Discret Math Sci Cryptogr 24(5):1559–1572
Hand DJ, Henley WE (1997) Statistical classification methods in consumer credit scoring: a review. J R Stat Soc Ser a (statistics Soc. Wiley Online Library; 160(3):523–41
Hasib KM, Iqbal MS, Shah FM, Mahmud J Al, Popel MH, Showrov MIH, et al (2020) A survey of methods for managing the classification and solution of data imbalance problem. J Comput Sci Science Publications; 16(11):1546–57
Jin W (2020) Research on machine learning and its algorithms and development. J Phys Conf Ser. Institute of Physics Publishing
Johnson JM, Khoshgoftaar TM (2019) Survey on deep learning with class imbalance. J Big Data 6(1). https://doi.org/10.1186/s40537-019-0192-5
Joshi C, Bharti V, Ranjan RK (2021a) Botnet detection using machine learning algorithms. Proc Int Conf Paradig Comput Commun Data Sci. PCCDS 2020. p. 717–27
Joshi C, Ranjan RK, Bharti V (2021b) ANN based Multi-Class classification of P2P Botnet. Int J Comput Digit Syst 11(1):1391–1325. https://doi.org/10.12785/ijcds/1101107
Jurgovsky J, Granitzer M, Ziegler K, Calabretto S, Portier PE, He-Guelton L, et al (2018) Sequence classification for credit-card fraud detection. Expert Syst Appl. Elsevier Ltd 100:234–45. Available from: https://doi.org/10.1016/j.eswa.2018.01.037
Kho JRD, Vea LA (2017) Credit card fraud detection based on transaction behavior. IEEE Reg. 10 Annu Int Conf Proceedings/TENCON. 2017-Decem(2):1880–4
Kumar VKS, Kumar VVG, Shankar VA, Pratibha K (2020) Credit card fraud detection using machine learning algorithms. Int J Eng Res Technol 9(7):1526–1530. https://doi.org/10.17577/IJERTV9IS070649
Kumari N, Kannan S, Muthukumaravel A (2014) Credit card fraud detection using Hidden Markov Model-A survey. Middle - East J Sci Res 20(6):697–699
Maalouf M (2011) Logistic regression in data analysis: an overview. Int J Data Anal Tech Strateg 3(3):281–299. https://doi.org/10.1504/IJDATS.2011.041335
Mienye ID, Sun Y (2023) A deep learning ensemble with data resampling for credit card fraud detection. IEEE Access 11(February):30628–38
Mill E, Garn W, Ryman-tubb N, Turner C (2023) Opportunities in real time fraud detection : an explainable artificial intelligence ( XAI ) Research Agenda. Int J Adv Comput Sci Appl 14(5):1172–1186
Modi K, Dayma R (2018) Review on fraud detection methods in credit card transactions. Proc. 2017 Int. Conf. Intell. Comput. Control. I2C2 2017. Institute of Electrical and Electronics Engineers Inc. p. 1–5
Mohammed R, Rawashdeh J, Abdullah M (2020) Machine learning with oversampling and undersampling techniques: overview study and experimental results. 2020 11th Int. Conf. Inf. Commun. Syst. ICICS 2020. Institute of Electrical and Electronics Engineers Inc p. 243–8.
Mrinali K (2021) Equifax hack: 5 biggest credit card data breaches . Investopedia. [cited 2023 Jul 24]. Available from: https://www.investopedia.com/news/5-biggest-credit-card-data-hacks-history/
Najadat H, Altiti O, Aqouleh AA, Younes M (2020) Credit card fraud detection based on machine and deep learning. 2020 11th Int. Conf. Inf. Commun. Syst. ICICS 2020. Institute of Electrical and Electronics Engineers Inc p. 204–8
Nishi NJ, Akter Sunny F, Bakchy SC (2022) Fraud detection of credit card using data mining techniques. 2022 4th Int Conf Sustain Technol Ind 4.0. p. 1–6
Odegua RO (2019) An empirical study of ensemble techniques (Bagging, Boosting and Stacking). [cited 2023 Aug 1]. Available from: https://www.researchgate.net/publication/338681864
Pillai TR, Hashem IAT, Brohi SN, Kaur S, Marjani M (2018) Credit card fraud detection using deep learning technique. 2018 Fourth International Conference on Advances in Computing. Communication & Automation (ICACCA), Malaysia, pp 1–6. https://doi.org/10.1109/ICACCAF.2018.8776797
Plakandaras V, Gogas P, Papadimitriou T, Tsamardinos I (2022) Credit card fraud detection with automated machine learning systems. Appl Artif Intell. Taylor & Francis 36(1). Available from: https://doi.org/10.1080/08839514.2022.2086354
Polikar R (2009) Ensemble-based system . Scholarpedia. [cited 2023 Jul 24]. Available from: http://www.scholarpedia.org/article/Ensemble_learning
Pozzolo DA, Caelen O, Le Borgne YA, Waterschoot S, Bontempi G (2014) Learned lessons in credit card fraud detection from a practitioner perspective. Expert Syst Appl. Elsevier Ltd;41(10):4915–28. Available from: https://doi.org/10.1016/j.eswa.2014.02.026
Psychoula I, Gutmann A, Mainali P, Lee SH, Dunphy P, Petitcolas F (2021) Explainable machine learning for fraud detection. Computer (Long. Beach. Calif) 54(10):49–59
Rahmani AM, Yousefpoor E, Yousefpoor MS, Mehmood Z, Haider A, Hosseinzadeh M et al (2021) Machine learning (Ml) in medicine: review, applications, and challenges. Mathematics 9(22):1–52
Rakhshaninejad M, Fathian M, Amiri B, Yazdanjue N (2021) An ensemble-based credit card fraud detection algorithm using an efficient voting strategy. The Computer Journal, vol 8. Oxford University Press, pp 1998–2015. https://doi.org/10.1093/comjnl/bxab038
Randhawa K, Loo CK, Seera M, Lim CP, Nandi AK (2018) Credit card fraud detection using AdaBoost and majority voting. IEEE Access IEEE 6:14277–14284
Sahithi GL, Roshmi V, Sameera YV, Pradeepini G (2022) Credit card fraud detection using ensemble methods in machine learning. 2022 6th Int. Conf. Trends Electron. Informatics, ICOEI 2022 - Proc. IEEE;(Icoei):1237–41
Salman R, Alzaatreh A, Sulieman H (2022) The stability of different aggregation techniques in ensemble feature selection. J Big Data. Springer International Publishing 9(1):1–23. Available from: https://doi.org/10.1186/s40537-022-00607-1
Sánchez D, Vila MA, Cerda L, Serrano JM (2009) Association rules applied to credit card fraud detection. Expert Syst Appl 36(2 PART 2):3630–40
Shenvi P, Samant N, Kumar S, Kulkarni V (2019) Credit card fraud detection using deep learning. 2019 IEEE 5th Int Conf Converg Technol:1–5
Shirgave SK, Awati CJ, More R, Patil SS (2019) A review on credit card fraud detection using machine learning. Int J Sci Technol Res 8(10). [cited 2023 Aug 1]. Available from: https://www.ijert.org
Singh A, Ranjan RK, Tiwari A (2022) Credit card fraud detection under extreme imbalanced data: a comparative study of data-level algorithms. J Exp Theor Artif Intell. Taylor and Francis Ltd 34(4):571–98
Sohony I, Pratap R, Nambiar U (2018) Ensemble learning for credit card fraud detection. ACM Int Conf Proceeding Ser. Association for Computing Machinery 289–94
Thach NH, Rojanavasu P, Pinngern O (2008) Cost-xensitive XCS classifier system addressing imbalance problems. 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery. Jinan, China, pp 132–136. https://doi.org/10.1109/FSKD.2008.391
Thennakoon A, Bhagyani C, Premadasa S, Mihiranga S, Kuruwitaarachchi N (2019) Real-time credit card fraud detection using machine learning. 2019 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence). Noida, India, pp 488–493. https://doi.org/10.1109/CONFLUENCE.2019.8776942
Thomas GD (1997) Machine learning research: four current directions. Artif Intell Mag 18(4):97–136
Trivedi NK, Simaiya S, Lilhore UK, Sharma SK (2020a) An efficient credit card fraud detection model based on machine learning methods. Int J Adv Sci Technol 29(5):3414–24. [cited 2023 Aug 1]. Available from: http://sersc.org/journals/index.php/IJAST/article/view/12032
Trivedi NK, Simaiya S, Lilhore UK, Sharma SK (2020) An efficient credit card fraud detection model based on machine learning methods. Int J Adv Sci Technol 29(5):3414–3424
UCSD (2019) UCSD: University of California, San Diego Data Mining Contest 2009. cs.purdue
Valentini G, Masulli F (2002) Ensembles of learning machines. Lect Notes Comput Sci (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics). Springer Verlag. p. 3–20
Whitrow C, Hand DJ, Juszczak P, Weston D, Adams NM (2009) Transaction aggregation as a strategy for credit card fraud detection. Data Min Knowl Discov 18(1):30–55
Wu G, Chang EY (2005) KBA: kernel boundary alignment considering imbalanced data distribution. IEEE Trans Knowl Data Eng 17(6):786–795
Xie Y, Li A, Gao L, Liu Z (2021) A heterogeneous ensemble learning model based on data distribution for credit card fraud detection. Wirel Commun Mob Comput. https://doi.org/10.1155/2021/2531210
Yang C-Y, Yang J-S, Wang J-J (2009) Margin calibration in SVM class-imbalanced learning. Neurocomputing 73(1):397–411. https://doi.org/10.1016/j.neucom.2009.08.006
Ying X (2014) Ensemble learning. [cited 2023 Aug 1]. Available from: https://www.researchgate.net/publication/262369664
Younas MZ (2021) Credit card fraud detection using machine learning algorithms. Universe Int J Interdiscip Res 1(4):274–283
Zareapoor M, Shamsolmoali P (2015) Application of credit card fraud detection: based on bagging ensemble classifier. Procedia Comput. Sci. Elsevier B.V. 679–85
Zhu H, Zhou MC, Liu G, Xie Y, Liu S, Guo C (2023) NUS: Noisy-sample-removed undersampling scheme for imbalanced classification and application to credit card fraud detection. IEEE Trans Comput Soc Syst. IEEE PP:1–12
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interests/competing interests.
About this article
Cite this article
Chhabra, R., Goswami, S. & Ranjan, R.K. A voting ensemble machine learning based credit card fraud detection using highly imbalance data. Multimed Tools Appl 83, 54729–54753 (2024). https://doi.org/10.1007/s11042-023-17766-9
DOI: https://doi.org/10.1007/s11042-023-17766-9