1 Introduction

Major financial institutions make their services available to the general public through online banking, mobile banking, and credit and debit cards. Services such as credit cards, which have proven to be an extremely convenient means of online purchase, make everyday life easier. However, credit card and online banking fraud is a global issue in the banking industry. A credit card, or any other payment card for that matter, stores data in a machine-readable format on a black stripe on its back. This data includes details such as the cardholder’s name, card number, expiration date, CVV code, card type, and other information that might be used to commit credit card fraud. Financial fraud is therefore a major concern in the field of classifier-based fraud detection. The assumption of a balanced class distribution in the dataset [1, 2] is a challenge for virtually all classifier learning methods. Machine learning techniques are used to predict whether bank transactions are fraudulent. This article investigates the effectiveness of four machine learning classifiers: logistic regression (LR), linear discriminant analysis (LDA), naïve Bayes (NB), and decision tree (DT). Each algorithm’s performance is evaluated using accuracy, precision, recall, and F1-score.

1.1 Literature Review

According to Federal Reserve economic statistics, the default rate on credit loans across all commercial banks has been at an all-time high over the last six years, and it is expected to continue to rise into 2021. Duman et al. [3] sought to demonstrate the benefits of using data mining methods such as DT and support vector machines (SVM) to solve the credit card fraud recognition problem and reduce the banks’ risk. Their findings revealed that DT classifiers outperformed SVM approaches in tackling the problem at hand. Wang et al. [4] presented a strategy for detecting credit card fraud that uses a local isolation coefficient to mine distance-based outliers, building on conventional algorithms. Bhattacharya et al. [5] detailed a comparative study of data mining methodologies, limited by the non-availability of exact time stamp data beyond the date of credit card transactions. APATE [6] is a technique for detecting fraudulent credit card transactions in highly nonlinear models. A behavior-based credit card fraud detection model was proposed by Zhang et al. [7]. Chuang et al. [8] created a data mining-based model in which web services were used to communicate fraud patterns between banks in order to detect fraud. To identify credit card thefts, Yu et al. [9] suggested an outlier mining technique. Dembrani et al. [10, 11] proposed a comparative analysis of various adaptive filter structures that can be applied to credit card fraud recognition. A fusion method was presented in [12, 13]; its four components were a rule-based filter, a Dempster-Shafer adder, a transaction history database, and a Bayesian learner. Srivastava et al. [14,15,16] developed a hidden Markov model for detecting credit card fraud. They developed a unique credit card fraud detection system that uses best matching algorithms to detect four distinct patterns of fraud cases and addresses the associated difficulties reported by previous credit card fraud detection studies [17,18,19].

1.2 Organization of the Paper

The machine learning techniques applied to the proposed model are explained in Sect. 2. Section 3 depicts the suggested model’s block diagram, flowchart, and entire implementation. Section 4 illustrates the comparative study with existing machine learning approaches. Section 5 discusses the conclusion and future scope.

2 Machine Learning Algorithms

2.1 Logistic Regression

The logistic regression model computes a weighted sum of the input features plus a bias term. Logistic regression is named for the function used at the core of the method, the logistic function, which maps any real-valued number to a value between 0 and 1. The output being modeled is a binary value (0 or 1) rather than a continuous numeric value, which is a major distinction from linear regression. The logistic regression equation is shown below:

$$y = \frac{{\text{e}}^{b_0 + b_1 x}}{1 + {\text{e}}^{b_0 + b_1 x}}$$
(1)

where y is the predicted output, b0 represents the bias or intercept term, and b1 represents the coefficient for a single input value (x). Each column in the input data has an associated b coefficient (a constant real number) that must be learned from the training data.
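As an illustration of Eq. (1), the following minimal sketch evaluates the logistic function for a single input. The function names, the example coefficients, and the 0.5 decision threshold are assumptions made for illustration; they are not taken from the implementation used in this work.

```python
import math


def logistic(z: float) -> float:
    """Map a real number z to a value between 0 and 1."""
    # Use the algebraically equivalent form that avoids overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(x: float, b0: float, b1: float) -> float:
    """Eq. (1): y = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    return logistic(b0 + b1 * x)


def predict_class(x: float, b0: float, b1: float, threshold: float = 0.5) -> int:
    """Turn the probability into a binary label (threshold is an assumption)."""
    return 1 if predict_proba(x, b0, b1) >= threshold else 0


# Hypothetical coefficients, chosen only to exercise the functions.
example_probability = predict_proba(x=3.0, b0=-1.0, b1=1.0)
example_label = predict_class(x=3.0, b0=-1.0, b1=1.0)
```

In practice the coefficients would be estimated from the training data rather than fixed by hand.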

2.2 Decision Tree

Decision tree (DT) is a non-parametric supervised learning approach used for classification and regression. The objective is to learn basic decision rules from data characteristics to construct a model that predicts the class of a target variable. Instances are classified using decision trees by sorting them along the tree from the root to a leaf node, which yields the classification. Starting at the root node of the tree, an instance is categorized by testing the attribute given by its node, then proceeding along the tree branch according to the attribute’s value. The sub-tree rooted at the new node is then processed in the same way.
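The root-to-leaf sorting described above can be sketched as follows. The tree is written by hand rather than learned, and the feature names (`amount`, `hour`), thresholds, and class labels are hypothetical values chosen only to show the traversal.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    feature: Optional[str] = None   # attribute tested at this node (unused in a leaf)
    threshold: float = 0.0          # go left when value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None     # class label; set only on leaf nodes


def classify(node: Node, instance: Dict[str, float]) -> str:
    """Walk from the root to a leaf, testing one attribute per node."""
    while node.label is None:
        if instance[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


# Hand-written, hypothetical tree: not learned from any dataset.
tree = Node(
    feature="amount", threshold=500.0,
    left=Node(label="legitimate"),
    right=Node(
        feature="hour", threshold=6.0,
        left=Node(label="fraud"),
        right=Node(label="legitimate"),
    ),
)

prediction = classify(tree, {"amount": 900.0, "hour": 3.0})
```

A learning algorithm would choose the tested attribute and threshold at each node from the training data; only the prediction step is shown here.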

2.3 Naïve Bayes

The naïve Bayes algorithm uses Bayes’ theorem to classify the data. It essentially gives the probability of a record belonging to a particular class based on the values of its features. Gaussian NB is a form of naïve Bayes that handles continuous data and assumes that each feature follows a Gaussian (normal) distribution within each class:

$$P\left( {x_i |y} \right) = \frac{1}{{\sqrt {2\pi \sigma_y^2 } }}\exp \left( { - \frac{{\left( {x_i - \mu_y } \right)^2 }}{2\sigma_y^2 }} \right)$$
(2)

The parameters σy and μy are estimated using maximum likelihood.
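A minimal sketch of Eq. (2) for one feature within one class is given below. It assumes the maximum-likelihood variance estimate (division by n rather than n - 1); the function names and the sample values are illustrative only.

```python
import math
from typing import List, Tuple


def fit_gaussian(values: List[float]) -> Tuple[float, float]:
    """Maximum-likelihood mean and variance of one feature within one class."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n  # MLE divides by n
    return mu, var


def gaussian_likelihood(x: float, mu: float, var: float) -> float:
    """Eq. (2): Gaussian density of x given the class mean and variance."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Illustrative feature values observed for a single class.
mu_y, var_y = fit_gaussian([1.0, 2.0, 3.0])
likelihood_at_mean = gaussian_likelihood(2.0, mu_y, var_y)
likelihood_off_mean = gaussian_likelihood(3.0, mu_y, var_y)
```

The density is largest at the class mean and decreases as the feature value moves away from it.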

Bernoulli NB is a form of naïve Bayes for binary-valued features. Its decision rule is based on:

$$P\left( {x_i |y} \right) = P\left( {i|y} \right)x_i + \left( {1 - P\left( {i|y} \right)} \right)\left( {1 - x_i } \right)$$
(3)
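Eq. (3) can be evaluated directly for binary features, as in the sketch below. The per-feature probabilities are made-up values, and the product over features reflects the naïve independence assumption; the function names are assumptions for illustration.

```python
from typing import List


def bernoulli_likelihood(x_i: int, p_i: float) -> float:
    """Eq. (3): P(x_i | y) = P(i|y) * x_i + (1 - P(i|y)) * (1 - x_i)."""
    return p_i * x_i + (1.0 - p_i) * (1 - x_i)


def class_likelihood(x: List[int], p: List[float]) -> float:
    """Product of per-feature likelihoods (naive independence assumption)."""
    result = 1.0
    for x_i, p_i in zip(x, p):
        result *= bernoulli_likelihood(x_i, p_i)
    return result


# Made-up values of P(i|y) for two binary features of one class.
p_given_y = [0.8, 0.25]
likelihood = class_likelihood([1, 0], p_given_y)  # 0.8 * (1 - 0.25)
```

Unlike a rule that only counts features that are present, this form explicitly penalizes the absence of a feature that is common in the class.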

2.4 Linear Discriminant Analysis (LDA)

The LDA model assumes that the data is normally distributed and estimates the mean and variance for each class. It is easiest to consider this in the univariate (single input variable) case with two classes. The mean (μk) of each input (x) for each class (k) is found by dividing the sum of the values in that class by the number of values in that class.

$$\mu_k = \frac{1}{n_k}\,{\text{sum}}\left( x \right)$$
(4)

Here nk is the number of instances of class k, and μk is the mean value of x for class k. The variance (σ2) is calculated across all classes as the average squared difference of each value from its class mean, where n is the total number of instances and K is the number of classes:

$$\sigma^2 = \frac{1}{n - K}\,{\text{sum}}\left( {\left( {x - \mu_k } \right)^2 } \right)$$
(5)
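Eqs. (4) and (5) can be sketched for a single input variable as follows. The sketch assumes that the denominator of Eq. (5) is the number of instances minus the number of classes and that each value is compared with the mean of its own class; the function names and the sample data are illustrative only.

```python
from typing import Dict, Hashable, List


def class_means(xs: List[float], ys: List[Hashable]) -> Dict[Hashable, float]:
    """Eq. (4): mean of x within each class."""
    sums: Dict[Hashable, float] = {}
    counts: Dict[Hashable, int] = {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


def pooled_variance(xs: List[float], ys: List[Hashable],
                    means: Dict[Hashable, float]) -> float:
    """Eq. (5): squared deviations from the class means, divided by n - K."""
    n = len(xs)      # total number of instances
    K = len(means)   # number of classes
    return sum((x - means[y]) ** 2 for x, y in zip(xs, ys)) / (n - K)


# Illustrative data: one input variable, two classes.
xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 1, 1, 1]
means = class_means(xs, ys)
variance = pooled_variance(xs, ys, means)
```

For this data the class means are 2 and 8, the squared deviations sum to 4, and the variance is 4 / (6 - 2) = 1.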

3 Implementation

Figure 1 displays the block diagram of the proposed model. The model operates in the following sequence: data collection, data processing, selection of a model appropriate for the type of data, model training, and testing and evaluation. Data processing is the most crucial stage in improving the accuracy of machine learning models. In supervised learning, the system is provided with labeled data, meaning that each piece of information has been assigned an appropriate label. Some of the most often used classification algorithms are support vector machine, naïve Bayes, logistic regression, decision trees, and KNN.

Fig. 1 Block diagram of the proposed model

The dataset is divided into three subsets: training data, validation data, and test data. The classifier is first trained on the training set, its parameters are then fine-tuned on the validation set, and finally its performance is evaluated on the test set. The classifier has access only to the training and/or validation sets; the test set must not be used during training and is supplied only when the classifier is evaluated. When a categorical feature is ordinal, a categorical data encoding approach is used. Because the dataset is initially imbalanced, a data rebalancing technique has been applied.
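The three-way split and the rebalancing step can be sketched as below. The 60/20/20 proportions, the synthetic rows, and the use of random oversampling of the minority class are all assumptions made for illustration; the text does not specify the split ratios or name the rebalancing technique that was applied.

```python
import random
from typing import Callable, Hashable, List, Tuple

Row = Tuple[int, int]  # (feature value, class label) for this toy example


def split_dataset(rows: List[Row], train_frac: float = 0.6, val_frac: float = 0.2,
                  seed: int = 0) -> Tuple[List[Row], List[Row], List[Row]]:
    """Shuffle once, then cut into training, validation, and test subsets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def oversample_minority(rows: List[Row], label_of: Callable[[Row], Hashable],
                        seed: int = 0) -> List[Row]:
    """Randomly duplicate rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced: List[Row] = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Synthetic imbalanced data: 10 positive rows out of 100.
rows = [(i, 1 if i % 10 == 0 else 0) for i in range(100)]
train, val, test = split_dataset(rows)
# Rebalance the training subset only, so evaluation data stays untouched.
balanced_train = oversample_minority(train, label_of=lambda r: r[1])
```

Rebalancing only the training subset keeps duplicated rows out of the validation and test subsets, which would otherwise inflate the measured performance.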

Figure 2 depicts the flowchart of the stages followed by the model. We have used two datasets: the European and German credit card datasets. The data is preprocessed and separated into two subsets: training and testing. The model is optimized using hyperparameter tuning, and performance metrics such as accuracy, precision, recall, and F1-score are then calculated.

Fig. 2 Flow chart of the proposed model
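The four evaluation metrics can be computed from the confusion-matrix counts as in the following sketch. The function names and the example labels are illustrative; the convention of returning 0 when a denominator is zero is an assumption.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for the chosen positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: List[int], y_pred: List[int],
                           positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1-score; 0.0 when a ratio is undefined."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative labels: TP = 2, FN = 1, FP = 1, TN = 4.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

On imbalanced data, accuracy alone can be misleading, because a classifier that labels every transaction as legitimate still scores highly; precision, recall, and F1-score expose that failure.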

4 Results and Discussions

The European credit card dataset [14], to which PCA has already been applied, contains 28 numerical features. The German credit card dataset [15] contains 21 features, of which 12 are categorical and 7 are numerical. The proposed methodology is implemented in Python and uses machine learning classification methods. Several machine learning models, namely LR, LDA, naïve Bayes, BernoulliNB, and decision tree, are used to analyze the data. Hyperparameter optimization is carried out using GridSearchCV (Tables 3 and 4).
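A minimal sketch of hyperparameter optimization with scikit-learn’s GridSearchCV is shown below. It runs on synthetic data from `make_classification` rather than on the European or German datasets, and the parameter grid, the F1 scoring choice, the five-fold cross-validation, and the split ratio are assumptions; the actual search settings used in this work are not stated.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (not the datasets used in the paper).
X, y = make_classification(n_samples=400, n_features=10, weights=[0.8, 0.2],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Hypothetical search grid for a decision tree classifier.
param_grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5]}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="f1",  # assumption: optimize F1 because the classes are imbalanced
    cv=5,          # assumption: five-fold cross-validation on the training data
)
search.fit(X_train, y_train)

best_params = search.best_params_
# The refitted best estimator is scored on held-out data with the same metric.
test_f1 = search.score(X_test, y_test)
```

The same pattern applies to the other classifiers by swapping the estimator and its parameter grid.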

Table 1 Performance analysis without hyperparameter optimization on German dataset
Table 2 Performance analysis without hyperparameter optimization on European dataset
Table 3 Performance analysis using hyperparameter optimization on German dataset
Table 4 Performance analysis using hyperparameter optimization on European dataset
Table 5 Confusion matrix of decision tree classifier on German dataset
Table 6 Confusion matrix of LDA classifier on European dataset

The results obtained with the decision tree classifier under the proposed methodology are compared with those of Patil et al. [16] on the German dataset, and the comparison indicates the superiority of the proposed method (Table 7).

Table 7 Comparative analysis with the existing results

5 Conclusion

This article assesses the performance of different machine learning classification algorithms on a German credit card dataset to determine whether or not a transaction is fraudulent. The credit card dataset was imported, preprocessed, encoded, and prepared for training the model using the machine learning workflow. The models were evaluated both with and without hyperparameter optimization. Each classification model was then trained, deployed, and assessed using multiple parameters and assumptions. The decision tree classifier outperforms the LR, LDA, and naïve Bayes algorithms. Combining all the models in a voting ensemble or a weighted-average ensemble may help to increase the model’s accuracy.
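The two ensembling strategies mentioned above can be sketched as follows. The per-model predictions, probabilities, weights, and the 0.5 threshold are made-up values for illustration; no ensemble was evaluated in this work.

```python
from collections import Counter
from typing import List


def hard_vote(predictions: List[List[int]]) -> List[int]:
    """Majority vote per sample; predictions holds one label list per model."""
    # With an odd number of models and binary labels, ties cannot occur.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def weighted_average(probabilities: List[List[float]], weights: List[float],
                     threshold: float = 0.5) -> List[int]:
    """Weighted mean of the models' fraud probabilities, then thresholding."""
    total = sum(weights)
    labels = []
    for probs in zip(*probabilities):
        avg = sum(w * p for w, p in zip(weights, probs)) / total
        labels.append(1 if avg >= threshold else 0)
    return labels


# Made-up outputs of three hypothetical models on four transactions.
lr_pred = [1, 0, 0, 1]
dt_pred = [1, 1, 0, 0]
nb_pred = [0, 1, 0, 1]
voted = hard_vote([lr_pred, dt_pred, nb_pred])

# Made-up fraud probabilities from two models; the second is weighted higher.
averaged = weighted_average([[0.9, 0.2, 0.1], [0.4, 0.7, 0.2]], weights=[1.0, 3.0])
```

A weighted average lets a stronger model, such as the decision tree here, contribute more to the final decision than the weaker ones.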