
1 Introduction

To assess credit risk, banks analyze the borrower’s creditworthiness. In banking practice, creditworthiness is interpreted as the willingness combined with the ability to repay a debt on time. According to this definition, the main goal of scoring is not only to determine whether the client is able to repay the loan but also to assess the client’s reliability and commitment.

In the banking system, when a person applies for a loan, the bank may have the following information to analyze:

  • the questionnaire filled out by the borrower

  • information on this borrower from the credit bureau, an organization that stores the credit histories of the entire adult population of the country

  • the borrower’s account history, if he or she is the bank’s client.

Credit analysts use two concepts: the client’s “attribute-characteristics” (in mathematical terms, variables or factors) and the “grade-values” that each variable takes. In the questionnaire the client fills out, the characteristics are the questions (age, marital status, profession), and the grade-values are the answers to these questions.

In its simplest form, a scoring model is a weighted sum of certain characteristics. The result is an integrated parameter (score): the higher the score, the more reliable the client, so the bank can rank its clients by creditworthiness.
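A minimal sketch of the weighted-sum idea, assuming a points-based scorecard: each grade-value of a characteristic carries a number of points, which is equivalent to a weighted sum over 0/1 indicator variables. The characteristics, grade-values, and points in the hypothetical `SCORECARD` are invented for illustration; in a real scorecard they are estimated from historical data.

```python
# Hypothetical scorecard: points for each grade-value of each characteristic.
# The numbers are illustrative only and do not come from any real model.
SCORECARD = {
    "age": {"18-25": 10, "26-40": 25, "41-65": 30},
    "marital_status": {"single": 12, "married": 20},
    "home_ownership": {"rent": 8, "mortgage": 18, "own": 22},
}


def score(client: dict) -> int:
    """Integrated parameter: the sum of points over all characteristics."""
    return sum(SCORECARD[attr][value] for attr, value in client.items())


clients = {
    "A": {"age": "18-25", "marital_status": "single", "home_ownership": "rent"},
    "B": {"age": "41-65", "marital_status": "married", "home_ownership": "own"},
    "C": {"age": "26-40", "marital_status": "married", "home_ownership": "mortgage"},
}

# Rank clients from most to least creditworthy (higher score = more reliable).
ranking = sorted(clients, key=lambda name: score(clients[name]), reverse=True)
print({name: score(c) for name, c in clients.items()})  # {'A': 30, 'B': 72, 'C': 63}
print(ranking)  # ['B', 'C', 'A']
```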

Each client’s integrated parameter is compared with a numerical threshold, or cutoff line. This line is essentially a break-even point: it is derived from the average number of clients who repay on time that is needed to offset the loss from a single defaulting borrower. Clients whose integrated parameter is above the line are granted credit; clients below it are declined.
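The break-even logic can be made concrete with a small calculation. If a client who repays on time brings profit \(p\) and a defaulter costs \(L\), the bank needs \(L/p\) on-time payers per defaulter, so a score band pays for itself when its good-to-bad odds are at least \(L/p\). The sketch below uses invented profit, loss, and band counts (`PROFIT_PER_GOOD`, `LOSS_PER_BAD`, `bands` are all hypothetical) and finds the lowest band from which this holds; it is one simple way to operationalize the rule, not a procedure taken from the text.

```python
# Hypothetical economics of a single loan (illustrative numbers only).
PROFIT_PER_GOOD = 300.0   # income from a client who repays on time
LOSS_PER_BAD = 3000.0     # loss from a single defaulting client

# Number of on-time payers needed to offset one defaulter.
break_even_odds = LOSS_PER_BAD / PROFIT_PER_GOOD

# Hypothetical historical outcomes per score band: (lower bound, goods, bads).
bands = [
    (300, 400, 100),
    (400, 900, 100),
    (500, 1800, 120),
    (600, 2600, 100),
    (700, 3000, 50),
]


def cutoff(bands, odds_needed):
    """Lowest band lower bound from which every band above it pays for itself."""
    threshold = None
    for lower, goods, bads in sorted(bands, reverse=True):  # best bands first
        if goods / bads >= odds_needed:
            threshold = lower
        else:
            break
    return threshold


print(break_even_odds)                   # 10.0
print(cutoff(bands, break_even_odds))    # 500
```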

Currently, it is customary to distinguish four areas of scoring:

  1. Application scoring: models for evaluating the financial status of an entity to decide on the feasibility of a transaction

  2. Behavioural scoring: models for evaluating the financial status of an entity while a transaction is in progress

  3. Collection scoring: models for building relationships with entities in high-risk transactions

  4. Fraud scoring: models for building relationships with entities to minimize non-financial (in particular, legal) transaction risks.

The first credit scoring models were developed by Fair Isaac Corporation more than half a century ago, and the scores they produce are named after the company: FICO. Today the FICO score is widely known and widely used in the United States and Canada for lending decisions. It is calculated from the data of the three largest national credit bureaus: Experian, Equifax, and TransUnion. The score varies slightly depending on which bureau’s data are used for the calculation.

The FICO score ranges from 300 to 850; as in most other models, a higher score corresponds to lower risk. Note that choosing the cutoff below which applications are rejected requires additional effort. There is no strictly defined procedure, and the choice depends on the bank’s strategy: what risks the bank is willing to accept, how much it seeks to expand its loan portfolio, and so on.
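To see how strategy enters the cutoff choice, the sketch below computes, for several candidate cutoffs, the share of applicants approved and the bad rate among those approved. The applicant scores and outcomes are hypothetical. Raising the cutoff lowers the bad rate but also shrinks the approved portfolio, which is the trade-off the bank has to settle.

```python
# Hypothetical applicants: (score on the 300-850 scale, 1 if the loan went bad).
applicants = [
    (580, 1), (610, 1), (640, 0), (650, 1), (690, 0),
    (700, 0), (720, 0), (735, 1), (760, 0), (800, 0),
]


def cutoff_stats(applicants, cutoff):
    """Approval rate and bad rate among approved clients for a given cutoff."""
    approved = [bad for score, bad in applicants if score >= cutoff]
    approval_rate = len(approved) / len(applicants)
    bad_rate = sum(approved) / len(approved) if approved else 0.0
    return approval_rate, bad_rate


for candidate in (600, 650, 700, 750):
    approval, bad = cutoff_stats(applicants, candidate)
    print(f"cutoff {candidate}: approve {approval:.0%}, bad rate {bad:.0%}")
```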

2 Credit Risk Assessment in the Banking Sector

Currently, the main algorithms used in credit scoring models are:

  • logistic regression

  • neural networks

  • decision trees (and their ensembles such as random forests and gradient boosting).

The use of various machine learning methods in credit scoring, such as neural networks, logistic regression, and random forests, has been studied in many papers.

For example, West [1] examined five different neural network architectures for credit scoring and benchmarked them against classical statistical methods, including discriminant analysis, logistic regression, and nonparametric methods. The results showed that artificial neural networks could significantly improve classification quality.

Boguslauskas and Mileris [2] concluded that neural networks and logistic regression were the most effective models for credit scoring, with artificial neural networks outperforming the other methods in prediction accuracy.

Pawel Plawiak, Moloud Abdar, Joanna Plawiak, and Vladimir Makarenkov [3] also applied neural networks to credit scoring. The authors combined a genetic algorithm with a deep learning neural network on a small data set and outperformed traditional scoring algorithms.

The purpose of the present research is to study and summarize theoretical and practical issues of using neural networks in the banking sector for credit risk analysis.

The information base is a table of clients of an American bank covering the period 2011–2015. It contains data on more than 800,000 clients, with 74 characteristics recorded for each client (see Fig. 1).

Fig. 1. Information base.

The objective of the study is to build a neural network on real data that classifies a loan’s credit status as good or bad.

Model requirements. Below we list some of the most important client attribute-characteristics:

  1. ID. Borrower’s identification number

  2. Loan_amnt. Amount of the loan requested by the borrower

  3. Funded_amnt. Amount of the loan issued

  4. Term. Period for which the loan was issued

  5. Int_rate. Interest rate of the loan

  6. Installment. Amount of the regular payment

  7. Grade and Sub_grade. Score of the borrower’s reliability

  8. Emp_length. Borrower’s employment length

  9. Home_ownership. The form of housing tenure of the borrower (own, rent, mortgage)

  10. Annual_inc. Borrower’s annual income

  11. Loan_status. Current status of the loan (current, fully paid, late)

  12. Issue_d. Loan issue date

  13. Purpose. The purpose provided by the borrower for the loan request (car, business, educational).
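Several of these characteristics, such as Home_ownership and Purpose, are categorical, and the study converts them to numbers with one-hot encoding. A minimal stdlib sketch of that conversion follows; the records are invented and the helper `one_hot` is hypothetical. In practice one would more likely use `pandas.get_dummies` or scikit-learn’s `OneHotEncoder`.

```python
def one_hot(records, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            row[f"{column}_{cat}"] = 1 if r[column] == cat else 0
        encoded.append(row)
    return encoded


# Hypothetical records with two categorical characteristics from the list above.
loans = [
    {"loan_amnt": 10000, "home_ownership": "rent", "purpose": "car"},
    {"loan_amnt": 15000, "home_ownership": "own", "purpose": "business"},
    {"loan_amnt": 12000, "home_ownership": "mortgage", "purpose": "car"},
]

encoded = one_hot(one_hot(loans, "home_ownership"), "purpose")
print(encoded[0])
```

Exactly one indicator per original column is set to 1 in each row, so no artificial ordering is imposed on the categories.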

Let us look at the loan amounts requested by clients and the amounts issued, both overall and by year (see Fig. 2).

Fig. 2. Amount of loans issued.

We also consider the average amount of loans issued by year (see Fig. 3):

Fig. 3. Average amount of loans issued by year.

According to the charts, most of the loans issued were in the range of $10,000 to $20,000. We can also note a steady increase in the average amount of the loans issued.
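The per-year averages behind Fig. 3 amount to a group-by over the issue year. A minimal stdlib sketch on a handful of invented (year, amount) pairs; on the real table one would more likely use a pandas `groupby` followed by `mean`. The share of loans in the $10,000 to $20,000 range is computed the same way.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (issue year, funded amount) pairs; the real table is far larger.
loans = [
    (2012, 9000), (2012, 11000), (2013, 12000), (2013, 14000),
    (2014, 14000), (2014, 15000), (2015, 15000), (2015, 17000),
]

amounts_by_year = defaultdict(list)
for year, amount in loans:
    amounts_by_year[year].append(amount)

# Average funded amount per year, in chronological order.
average_by_year = {year: mean(a) for year, a in sorted(amounts_by_year.items())}
print(average_by_year)

# Share of loans in the $10,000-$20,000 range.
in_range = sum(10_000 <= amount <= 20_000 for _, amount in loans) / len(loans)
print(in_range)  # 0.875
```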

Let us consider the number of loans by their status:

  1. Current – 601779

  2. Fully paid – 207723

  3. Charged off – 45248

  4. Late (31–120 days) – 11591

  5. Issued – 8460

  6. In Grace Period – 6253

  7. Late (16–30 days) – 2357

  8. Does not meet the credit policy. Status: Fully Paid – 1988

  9. Default – 1219

  10. Does not meet the credit policy. Status: Charged Off – 761.

We treat Charged Off, Default, and Late (at any stage) loans as bad. Now let us look at the ratio of good to bad loans and at their numbers by year (see Figs. 4 and 5).

Fig. 4. The ratio of good loans to bad loans.

Fig. 5. Percentage of good and bad loans by year.

Two important conclusions can be drawn from these charts. First, bad loans make up only 7.6% of all loans issued. Second, the database contains many current loans, some of which may still become bad; this may somewhat affect the quality of the network.
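The 7.6% figure can be checked against the status counts listed above. The set `BAD_STATUSES` is our reading of "Late (in any stage)": it assumes that In Grace Period and the charged-off loans that do not meet the credit policy also count as bad. Under this assumption the counts reproduce the reported share; leaving out In Grace Period would give about 6.9%.

```python
# Loan counts by status, as listed above.
status_counts = {
    "Current": 601779,
    "Fully paid": 207723,
    "Charged off": 45248,
    "Late (31–120 days)": 11591,
    "Issued": 8460,
    "In Grace Period": 6253,
    "Late (16–30 days)": 2357,
    "Does not meet the credit policy. Status: Fully Paid": 1988,
    "Default": 1219,
    "Does not meet the credit policy. Status: Charged Off": 761,
}

# Assumption: "bad" covers charge-offs, defaults and every delinquency stage,
# including the grace period and charged-off loans outside the credit policy.
BAD_STATUSES = {
    "Charged off",
    "Default",
    "Late (31–120 days)",
    "Late (16–30 days)",
    "In Grace Period",
    "Does not meet the credit policy. Status: Charged Off",
}


def is_bad(status: str) -> bool:
    """Binary target: True for a bad loan, False for a good one."""
    return status in BAD_STATUSES


total = sum(status_counts.values())
bad = sum(n for status, n in status_counts.items() if is_bad(status))
print(total, bad, f"{bad / total:.1%}")  # 887379 67429 7.6%
```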

Next, let us consider the importance of the borrower’s credit score. To understand how the credit grade affects the final risk, we need to compare the number of bad loans across the borrowers’ grades. Let us plot the number of loans issued by the borrower’s credit grade (see Fig. 6).

Fig. 6. Number of loans issued.

Conclusions on the impact of credit score:

  • Borrowers with lower grades were issued more loans than borrowers with higher grades, which raises the level of risk for the bank as a whole.

  • The interest rate increases as the grade deteriorates.

  • Most bad loans were issued to borrowers with a grade of “B”.
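The last conclusion is about counts. Because a grade that receives many loans can also accumulate many bad ones, the per-grade bad rate is a useful complement. The sketch below computes both from hypothetical (grade, is_bad) records; the numbers are invented, chosen so that the grade with the most bad loans differs from the grade with the highest bad rate.

```python
from collections import Counter

# Hypothetical (grade, is_bad) records; grades run from A (best) downwards.
records = (
    [("A", 0)] * 95 + [("A", 1)] * 5
    + [("B", 0)] * 180 + [("B", 1)] * 20
    + [("C", 0)] * 70 + [("C", 1)] * 15
    + [("D", 0)] * 20 + [("D", 1)] * 8
)

loans_per_grade = Counter(grade for grade, _ in records)
bad_per_grade = Counter(grade for grade, bad in records if bad)
bad_rate = {g: bad_per_grade[g] / loans_per_grade[g] for g in sorted(loans_per_grade)}

most_bad_loans = bad_per_grade.most_common(1)[0][0]  # grade with the most bad loans
riskiest = max(bad_rate, key=bad_rate.get)           # grade with the highest bad rate
print(most_bad_loans, riskiest, bad_rate)
```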

Let us explore why a loan becomes bad. It is reasonable to assume that the borrower’s credit score and annual income have the greatest impact on credit risk. We look for factors that increase the risk of default, such as low annual income, a high interest rate, and a low credit grade. To this end, we build a correlation heatmap of the numerical variables (see Fig. 7).

Fig. 7. Correlation heatmap.

Let us plot the amounts of bad loans broken down by condition, i.e. loan status (see Fig. 8).

Fig. 8. Amount of bad loans by condition.

According to this plot, the amount of bad loans tended to decline by 2015, which may partly reflect the large number of recent loans that are still current.

3 Analysis of the Results Obtained

Let us describe the structure of the neural network used for prediction [4,5,6]. It is a feedforward network consisting of an input layer, two hidden layers with 66 neurons each, and an output layer with two neurons. The activation function is ReLU [7, 8].

ReLU is simple to compute and in practice often performs better than alternatives such as the sigmoid. Schematically, the neural network has the following form (see Fig. 9):

Fig. 9. Neural network scheme.

For visual clarity, the scheme shows only 10 neurons in each hidden layer and only two inputs; the actual network has as many inputs as there are attribute-characteristics.
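The architecture can be sketched as a plain forward pass. This stdlib version assumes two output neurons converted into class probabilities with softmax, random He-style initial weights, and a hypothetical input size `N_INPUTS = 8`; none of these details is stated in the text, and training is omitted. In practice such a network would be built with a deep learning framework; the sketch only shows how data flows through the layers.

```python
import math
import random

random.seed(0)

N_INPUTS = 8     # hypothetical; the real network has one input per encoded feature
HIDDEN = 66      # neurons in each of the two hidden layers, as in the text
N_OUTPUTS = 2    # assumed output neurons: probability of "good" and of "bad"


def dense(n_in, n_out):
    """Random weights (He-style scaling) and zero biases for one layer."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[random.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def apply(layer, x):
    """Affine transform W x + b of a fully connected layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, z) for z in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]


layers = [dense(N_INPUTS, HIDDEN), dense(HIDDEN, HIDDEN), dense(HIDDEN, N_OUTPUTS)]


def forward(x):
    h = relu(apply(layers[0], x))
    h = relu(apply(layers[1], h))
    return softmax(apply(layers[2], h))


probs = forward([0.5] * N_INPUTS)
print(probs)  # untrained weights, so these probabilities are arbitrary
```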

Since bad loans make up only about 9% of the source data, the dataset is imbalanced. The Synthetic Minority Over-Sampling Technique (SMOTE) was developed specifically for such cases to improve prediction accuracy. We illustrate the principle with an example. Let the total number of loans be \(D_0\), the number of good loans \(S_0\), and the number of bad loans \(B_0\), so that \(D_0 = S_0 + B_0\). Since the data are highly imbalanced, i.e. \(S_0 \gg B_0\), we increase the share of bad loans, as shown in Table 1.

Table 1. Example of SMOTE operation.

Note that instead of duplicating existing rows, SMOTE generates new synthetic rows: each one is obtained by interpolating between an example of the minority (target) class and one of its nearest neighbours from the same class.
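A minimal stdlib sketch of the SMOTE idea, under stated assumptions. To reach a target bad share \(r\), the bad count must become \(B_1 = r S_0 / (1 - r)\) (from \(B_1 / (S_0 + B_1) = r\)), so \(B_1 - B_0\) synthetic rows are needed; `n_synthetic` computes this. `smote` then interpolates between a random minority row and one of its `k` nearest minority neighbours. Both helpers and the toy data are hypothetical; we do not know which implementation the study used, and in practice one would likely use the `SMOTE` class from imbalanced-learn.

```python
import math
import random


def n_synthetic(good: int, bad: int, target_bad_share: float) -> int:
    """Synthetic bad rows needed so that bad / (good + bad) reaches the target."""
    needed_bad = target_bad_share * good / (1.0 - target_bad_share)
    return max(0, math.ceil(needed_bad) - bad)


def smote(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic rows by interpolating between a minority-class
    row and one of its k nearest minority-class neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment between base and other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


bad_rows = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]  # toy minority class
print(n_synthetic(good=900, bad=100, target_bad_share=0.5))   # 800
new_rows = smote(bad_rows, n_new=3, k=2)
```

Oversampling is usually applied to the training split only, so that synthetic rows do not leak into the data used for evaluation.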

The number of layers, the number of neurons per layer, and the learning rate were selected experimentally [9]. The learning rate is a tuning parameter of the optimization algorithm that determines the step size at each iteration while moving toward a minimum of the loss function [10, 11]. In statistical decision theory, the loss function characterizes the loss associated with making an incorrect decision based on observed data.
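The role of the learning rate as a step size can be illustrated on a deliberately tiny problem: a one-weight logistic model trained by gradient descent on binary cross-entropy, a common loss for two-class problems. The data and learning rates are hypothetical; the text does not report the optimizer, loss, or learning rate actually used.

```python
import math

# Toy one-feature data set (hypothetical): x = scaled feature, y = 1 for a bad loan.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(w):
    """Mean binary cross-entropy of the model p = sigmoid(w * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def gradient(w):
    """d(loss)/dw = mean((p - y) * x) for the logistic model."""
    return sum((sigmoid(w * x) - y) * x for x, y in zip(xs, ys)) / len(xs)


def train(learning_rate, steps=50, w=0.0):
    for _ in range(steps):
        w -= learning_rate * gradient(w)  # step = learning rate * gradient
    return w


for lr in (0.01, 0.1, 1.0):
    w = train(lr)
    print(f"lr={lr}: w={w:.3f}, loss={bce_loss(w):.4f}")
```

With the same number of steps, a larger learning rate moves further toward lower loss on this smooth problem; on harder problems too large a rate can overshoot, which is one reason it is tuned experimentally.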

Using TensorBoard, we visualize the final scheme of the neural network (see Fig. 10).

Fig. 10. Neural network scheme.

Assessment of model prediction accuracy. To assess the accuracy of the model, we compare it with the predictions of a logistic regression trained on the same data. Table 2 compares the performance of the developed model with that of alternative models.

Table 2. Performance comparison of different prediction methods

The table shows that the neural network is more accurate than the other prediction methods.
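Table 2 compares methods by accuracy. The hypothetical helper below computes accuracy together with precision and recall for the bad-loan class from true and predicted labels; the labels are invented. Because bad loans are a small minority here, accuracy can stay high even when many bad loans are missed, so per-class metrics are a common complement when comparing models.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision and recall for the positive (bad-loan) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical labels: 1 = bad loan, 0 = good loan.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))  # (0.8, 0.5, 0.5)
```

For example, predicting every loan as good on these labels still gives 0.8 accuracy but zero recall for bad loans.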

4 Conclusion

The study used a neural network with 2 hidden layers of 66 neurons each. The model was built on 22 categorical and numeric attribute-characteristics, and loans were divided into two classes: good and bad. Since categorical data cannot be fed directly into a neural network, they were converted to numeric form with One-Hot Encoding. The model was trained on 887,379 observations. Because good loans dominate the observations, SMOTE was used to increase the number of bad loans and thus balance the data. After training, the neural network reached a prediction accuracy of 0.92, which exceeds the results of the other prediction methods.