Keywords

1 Introduction

Kidneys are bean-shaped organs located toward the back of the abdomen. Healthy kidneys are about 5 inches long, and a change in kidney size can indicate an unhealthy kidney. The kidneys purify about 200 L of blood per day; their major function is to filter excess water, salts, and waste from the blood. This process must operate properly to keep electrolytes at healthy levels. Figures 1 and 2 outline the proposed kidney disease prediction model and framework.

Fig. 1

Proposed model for kidney disease prediction [29]. The flowchart begins with patient data, followed by preprocessing, feature extraction, and a classification level that branches into normal (stop) and abnormal; abnormal cases are further divided into AKI and CKD.

Fig. 2

Proposed framework for CKD and AKI prediction. The flowchart proceeds from the patient dataset through preprocessing, feature selection, classification algorithms (SVM, KNN, decision tree, and random forest), and performance evaluation to the prediction of CKD and AKI.

Kidney-related ailments are becoming more prevalent. In many people, kidney damage develops slowly over many years, generally as a result of diabetes mellitus or high blood pressure; this condition is termed chronic kidney disease (CKD). Acute kidney injury (AKI), in contrast, occurs when a person’s renal function changes suddenly because of illness, accident, or certain drugs, and it can affect people with otherwise healthy kidneys as well as those with existing kidney problems. CKD is a dangerous condition if it is not identified at an early stage; early detection and effective management can prevent its progression [1,2,3,4,5,6]. Discovering such disorders early is therefore vital to extending a patient’s lifespan.

Kidney disease is a silent but serious disease that affects people all over the world. It is especially harmful because its symptoms often do not appear until kidney function has deteriorated by 85–90%. According to the Global Burden of Disease (GBD) study, over 1.2 million people died from some form of kidney disease. This number has risen by 32% since 2005, meaning that deaths among renal patients increased by 32% over a ten-year period. The same study estimates that around 5–10 million people die each year as a result of kidney failure.

2 Literature Survey

SVM and ANN were used to predict renal disorders, and the accuracy and execution time of the two methods were compared [7, 8]. Feature selection algorithms have been employed to derive a set of features that predicts kidney damage effectively; the reduced feature set lowers cost, improves efficiency, and removes ambiguity [9]. A combination of machine learning algorithms and predictive modeling has been proposed for early-stage prediction [10]. ANN models were assessed for predicting the lifespan of patients, especially those suffering from CKD [11, 12]. The K-means algorithm was used to extract information about how CKD markers interact with patient mortality, and clustering methods were analyzed to predict the lifespan of dialysis patients. Several machine learning algorithms were applied in a Hadoop environment, where KNN and SVM achieved an AUROC of 0.83 [13]. Gradient boosting algorithms and clinical information from electronic health records (EHR) were used to build a one-year CKD prediction model [14] for diabetic patients [15]. A convolutional auto-encoder was used to encode temporal features from EHR data containing sequences of lab test results in order to predict the risk of progressing from the first to the second stage of diabetic nephropathy; it outperformed the baseline models [16]. Another prediction model for kidney disease patients, particularly hypertensive individuals, used both textual and numeric data from EHR: a neural network based on bidirectional long short-term memory (LSTM) networks and auto-encoders encoded the textual and numerical data, and under-sampling was used to balance the classes, yielding an accuracy of 89.7% under tenfold cross-validation [17]. Missing values have also been addressed, since they reduce model accuracy and degrade predictions; the authors of [18] performed a recompilation process on CKD stages with unknown values and recalculated the missing data to fill the gaps.
Several machine learning classification techniques have been used to reduce diagnosis time and increase accuracy. Classifying CKD stages by severity has also been proposed: among the RBF, RF, and BPNN algorithms, RBF performed best, with an accuracy of 85.3% [19, 20].

3 Proposed Work

3.1 Kidney Disease Classification Using Machine Learning Based on Pathological Data

Analyzing medical data is a sensitive matter and must be done correctly for disease prediction, detection, and analysis. This calls for accurate tools and effective machine learning algorithms [21] that can detect or diagnose disease reliably. Appropriate and effective analysis of medical data has ushered in a revolution in the machine learning field, especially with the widespread use of computationally demanding algorithms in recent years. However, a number of clinical issues, namely accuracy, dependability, and rapid decision-making, must be solved in order to guide physicians in diagnosing disease effectively [22, 23]. The performance of classifiers for disease prediction depends on the quality of the medical data obtained and on the classifier models used. It is therefore critical to employ various classifiers to assess sensitive medical data correctly in order to anticipate and detect diseases. In machine learning, classification [19, 20, 24,25,26,27,28] is a crucial challenge for extracting knowledge from various real-world problems. Hence, a well-developed model such as the one shown in Fig. 1 is required to accurately predict the target class from the collected data at multiple classification levels.
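The two classification levels of Fig. 1 (normal versus abnormal, then AKI versus CKD for abnormal cases) can be sketched as a small routing function. This is a minimal illustration under stated assumptions: the two classifiers are passed in as callables, and `toy_is_abnormal`, `toy_is_ckd`, the attribute keys `sc` and `duration_days`, and their thresholds are hypothetical placeholders, not the trained models or clinical criteria used in this work.

```python
from typing import Callable, Mapping

# A patient record is assumed to map attribute names to numeric values.
Record = Mapping[str, float]


def classify_two_level(
    record: Record,
    is_abnormal: Callable[[Record], bool],
    is_ckd: Callable[[Record], bool],
) -> str:
    """Route a record through the two classification levels of Fig. 1.

    Level 1 separates normal from abnormal records; only abnormal records
    reach level 2, which separates CKD from AKI.
    """
    if not is_abnormal(record):
        return "normal"  # the flowchart stops here for normal records
    return "CKD" if is_ckd(record) else "AKI"


# Hypothetical stand-in classifiers: thresholds are illustrative only.
def toy_is_abnormal(r: Record) -> bool:
    return r["sc"] > 1.2


def toy_is_ckd(r: Record) -> bool:
    return r["duration_days"] > 90


print(classify_two_level({"sc": 2.5, "duration_days": 200}, toy_is_abnormal, toy_is_ckd))  # CKD
```

Keeping the level-2 classifier separate means it is trained only on abnormal records, which matches the branching in the flowchart.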

3.2 Early-Stage CKD Prediction Using Fuzzy Systems

The major goal of developing this fuzzy expert system [30] is to assist doctors in detecting CKD in patients. Such a medical expert system can detect the disease and help specialists provide appropriate treatment. A patient’s data is taken as input, and the expected output is the patient’s stage of the disease. The fuzzy system processes the input through fuzzification, inference, and defuzzification to produce the output, as shown in Fig. 2 [7, 31,32,33]. A fuzzy expert system [34] is a conceptual framework used both to diagnose and to manage chronic kidney disease. The rule-based model produces fuzzy (linguistic) output values, and the defuzzification step converts them into crisp values [35, 36].
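The fuzzification, inference, and defuzzification stages can be illustrated with a toy Mamdani system. This is a sketch under assumptions, not the study's rule base: it uses one hypothetical input (an abnormality score in [0, 1]), one hypothetical output (severity on a 0 to 10 scale), triangular membership functions, min/max inference, and centroid defuzzification, all of which are choices made here for illustration.

```python
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Hypothetical rules as (antecedent set, consequent set) pairs.
RULES = [
    ((-0.5, 0.0, 0.5), (-5.0, 0.0, 5.0)),   # IF abnormality is low    THEN severity is mild
    ((0.0, 0.5, 1.0), (0.0, 5.0, 10.0)),    # IF abnormality is medium THEN severity is moderate
    ((0.5, 1.0, 1.5), (5.0, 10.0, 15.0)),   # IF abnormality is high   THEN severity is severe
]


def mamdani_severity(abnormality: float, steps: int = 1001) -> float:
    # 1) Fuzzification: how strongly the input matches each rule's antecedent.
    strengths = [tri(abnormality, *ante) for ante, _ in RULES]
    num = den = 0.0
    for i in range(steps):
        y = 10.0 * i / (steps - 1)  # discretised output universe [0, 10]
        # 2) Inference: clip each consequent at its rule strength (min), combine rules (max).
        mu = max(min(w, tri(y, *cons)) for w, (_, cons) in zip(strengths, RULES))
        num += y * mu
        den += mu
    # 3) Defuzzification: centroid of the aggregated fuzzy output (0.0 if no rule fires).
    return num / den if den > 0 else 0.0


print(round(mamdani_severity(0.5), 3))  # 5.0
```

Centroid defuzzification is only one option; a real system would tune the membership functions and rules with clinical input.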

4 Methodology

4.1 Dataset

We used the publicly available Chronic Kidney Disease dataset from the UCI Machine Learning Repository. It contains 400 instances and 25 attributes.

4.2 ML Classifiers

Algorithm for proposed model

Input. Patient’s dataset

Output. Classification of the patient dataset by each of the classification algorithms.

Step 1. Load the dataset.

Step 2. Preprocess the data.

  • Apply row elimination to handle missing values.

  • Convert the categorical values into numerical values.

Step 3. Construct the classifier models (logistic regression (LR), decision tree (DT), random forest (RF), and support vector machine (SVM)) on the preprocessed dataset.

Step 4. Analyze the performance of the classifier models constructed in Step 3.
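Steps 1 and 2 can be sketched with the standard library alone. The sketch assumes that missing entries are written as `?` (as in the dataset's ARFF file) and uses a few of the dataset's attribute abbreviations (`sc`, `rbc`, `htn`, `class`) on hypothetical sample rows; the `CATEGORY_CODES` mapping is illustrative and would need to cover every nominal attribute. Step 3 would normally use an ML library such as scikit-learn and is omitted to keep the sketch dependency-free.

```python
import csv
import io

MISSING = "?"  # assumption: missing entries are marked "?" as in the UCI ARFF file

# Illustrative encodings for a few categorical attributes (hypothetical subset).
CATEGORY_CODES = {
    "rbc": {"normal": 0, "abnormal": 1},
    "htn": {"no": 0, "yes": 1},
    "class": {"notckd": 0, "ckd": 1},
}


def preprocess(text: str) -> list[dict[str, float]]:
    """Steps 1-2: load rows, drop rows with any missing value, encode categories as numbers."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        values = {key: value.strip() for key, value in row.items()}
        if any(value in ("", MISSING) for value in values.values()):
            continue  # row elimination
        cleaned.append({
            key: float(CATEGORY_CODES[key][value]) if key in CATEGORY_CODES else float(value)
            for key, value in values.items()
        })
    return cleaned


# Hypothetical rows using a subset of the dataset's column names.
SAMPLE = """age,bp,sc,rbc,htn,class
48,80,1.2,normal,yes,ckd
7,50,?,normal,no,ckd
62,80,1.8,abnormal,yes,ckd
40,80,0.6,normal,no,notckd
"""

rows = preprocess(SAMPLE)
print(len(rows), rows[-1]["class"])  # 3 0.0
```

Row elimination is simple but discards every incomplete record, so on a dataset with many missing values it can shrink the training set considerably.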

4.3 Classification Accuracy

Equation (1) is used to calculate the accuracy of the models:

$$Accuracy = \frac{TP + TN}{{TP + TN + FP + FN}},$$
(1)

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively, obtained by comparing the predictions with the observed labels.
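Equation (1) can be computed directly from label vectors. In this sketch the labels are hypothetical, with 1 standing for the positive class (for example, CKD) and 0 for the negative class; `confusion_counts` is a helper introduced here for illustration.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1):
    """Count (TP, TN, FP, FN) by comparing predictions with observed labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Eq. (1): correct predictions divided by all predictions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to score")
    return (tp + tn) / total


counts = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(counts, accuracy(*counts))  # (2, 1, 1, 1) 0.6
```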

4.4 Fuzzification

Medical diagnosis frequently requires a thorough examination of a patient to determine whether the patient is suffering from a suspected condition. For example, the sugar level may be high for one patient, low for another, and absent for others. The features and their strengths are therefore combined to reach an accurate diagnostic conclusion. In the current study, physicians’ experience is used to build a database of fuzzy rules. Based on fuzzy decisions [37], software can be developed to evaluate automatically whether a patient with specific symptoms is suffering from one disease or another.

The profile table can be determined as [r pij, rij, v].

$$\sigma_i = \frac{\sum_{j=1}^{k_i} w_{ij}\,\delta_{ij}}{\sum_{j=1}^{k_i} w_{ij}}.$$
(2)

Equation (2) reaches a diagnosis decision by combining the impacts δij of the ki relevant features, each weighted by a factor wij. In this study, all features have equal weighting factors, so Eq. (2) reduces to Eq. (3).

$$\sigma_i = \frac{1}{k_i}\sum_{j=1}^{k_i} \delta_{ij}.$$
(3)

Equation (3) yields precise crisp values that indicate the probability of each disease in the set S.
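Equations (2) and (3) amount to a weighted and an unweighted mean of the feature impacts. The sketch below assumes the impacts δij are already available as numbers (for example, membership degrees in [0, 1]); the values used are hypothetical.

```python
from typing import Optional, Sequence


def diagnosis_score(deltas: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """sigma_i from Eq. (2); with weights omitted (all equal) it reduces to Eq. (3)."""
    if not deltas:
        raise ValueError("at least one feature impact is required")
    if weights is None:
        weights = [1.0] * len(deltas)  # equal weighting, as in this study
    if len(weights) != len(deltas):
        raise ValueError("one weight per feature is required")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * d for w, d in zip(weights, deltas)) / total_weight


# Hypothetical impacts delta_ij of k_i = 3 features on one disease i.
print(diagnosis_score([0.2, 0.8, 0.5]))  # 0.5
```

Any set of equal weights gives the same result as Eq. (3), since the common factor cancels in the ratio.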

For the given data, the first step is to perform a normality test. The main risk factors for CKD are serum creatinine (SCR), blood sugar, blood pressure, age, glomerular filtration rate (GFR), and smoking. Normality tests are performed for GFR and SCR because these are the main factors for CKD prediction.

Normality. A normality check is very important for pathological or numerical data because such data contain considerable imprecision, and because the confidence intervals of Eq. (4) assume approximately normally distributed data. Handling imprecise data therefore requires a normality check.
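The text does not name the normality test it uses. As a stand-in, the sketch below implements the Jarque-Bera test with the standard library; it compares sample skewness and kurtosis with those of a normal distribution, and under the null hypothesis its statistic follows a chi-square distribution with 2 degrees of freedom, whose survival function is exp(-x/2). The test is asymptotic, so its p-values are only rough for small samples; the sample values are hypothetical.

```python
import math
import statistics


def jarque_bera(sample: list[float]) -> tuple[float, float]:
    """Return (JB statistic, p-value) for the Jarque-Bera normality test."""
    n = len(sample)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = statistics.fmean(sample)
    m2 = sum((x - mean) ** 2 for x in sample) / n
    if m2 == 0:
        raise ValueError("sample is constant")
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Chi-square(2) survival function: P(X > jb) = exp(-jb / 2).
    return jb, math.exp(-jb / 2)


# Hypothetical GFR readings; a p-value above 0.05 gives no evidence against normality.
jb, p = jarque_bera([1, 2, 3, 4, 5])
print(jb, p)
```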

Confidence Indicator. Confidence intervals (CIs) can be used to determine the ranges that serve as the fuzzy [38] sets of the input and output variables of a given model.

After the normality test, a confidence interval is used to measure the uncertainty in the variables. The confidence interval is given by Eq. (4):

$$CI = \overline{X} \pm Z\,\frac{s}{\sqrt{n}}.$$
(4)
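  • CI = confidence interval.

  • X̄ = sample mean.

  • Z = critical value of the standard normal distribution for the chosen confidence level (e.g., 1.96 for 95%).

  • s = sample standard deviation.

  • n = sample size.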
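Equation (4) can be computed with Python's `statistics` module: `NormalDist().inv_cdf` gives the two-sided critical value Z and `stdev` gives the sample standard deviation s (with n − 1 in the denominator). The readings below are hypothetical; in the approach described above, the resulting bounds would delimit the range of a fuzzy set for a variable such as GFR or SCR.

```python
import math
from statistics import NormalDist, fmean, stdev


def confidence_interval(sample: list[float], level: float = 0.95) -> tuple[float, float]:
    """Eq. (4): mean +/- Z * s / sqrt(n), with Z the two-sided standard-normal critical value."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * stdev(sample) / math.sqrt(n)
    mean = fmean(sample)
    return mean - half_width, mean + half_width


# Hypothetical readings.
print(tuple(round(v, 2) for v in confidence_interval([10, 12, 14])))  # (9.74, 14.26)
```

For small samples a Student t critical value is often preferred over Z; the sketch keeps Z to match Eq. (4).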

IF–THEN rules (knowledge base). In this step, the fuzzy output variables used for categorization are linked to a set of rules. A Mamdani fuzzy rule-based model is used to store the fuzzy rules [39]. Different membership functions were selected and analyzed in the MATLAB FIS editor to grade each parameter as normal, moderate, or critical. Finally, the condition of a patient is established from the prepared rule base, taking into account the status of the individual parameters [40]. For example, GFR (in mL/min/1.73 m²) is partitioned into low (0–15), moderate (15–60), and high (60 and above).
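The crisp GFR ranges above can be turned into overlapping fuzzy sets. This sketch assumes trapezoidal membership functions with a hypothetical transition width of 10 mL/min/1.73 m² centred on each boundary, so that a GFR of exactly 15 or 60 belongs equally (0.5) to both neighbouring sets; the study's actual membership functions were designed in the MATLAB FIS editor and may differ.

```python
import math


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Membership rising on [a, b], equal to 1 on [b, c], falling on [c, d]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical GFR fuzzy sets centred on the boundaries 15 and 60.
GFR_SETS = {
    "low": (0.0, 0.0, 10.0, 20.0),
    "moderate": (10.0, 20.0, 55.0, 65.0),
    "high": (55.0, 65.0, math.inf, math.inf),
}


def fuzzify_gfr(gfr: float) -> dict[str, float]:
    """Degree of membership of a GFR value in each linguistic term."""
    return {label: trapezoid(gfr, *abcd) for label, abcd in GFR_SETS.items()}


print(fuzzify_gfr(15))  # {'low': 0.5, 'moderate': 0.5, 'high': 0.0}
```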

Six variables are considered to classify the stages of abnormality in kidney disease, and together they yield 324 rules.
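A complete (grid) rule base contains one rule per combination of linguistic terms, so its size is the product of the number of terms per variable. The text does not list the term counts; assuming a complete rule base in which every variable has at least two terms, 324 = 3⁴ × 2² implies four three-term variables and two two-term variables. The assignment of variables and term names below is hypothetical and only illustrates that count.

```python
import math
from itertools import product

# Hypothetical term sets: four three-term variables and two two-term variables.
TERMS = {
    "GFR": ["low", "moderate", "high"],
    "SCR": ["normal", "moderate", "critical"],
    "blood_pressure": ["normal", "moderate", "critical"],
    "blood_sugar": ["normal", "moderate", "critical"],
    "age": ["young", "old"],
    "smoking": ["no", "yes"],
}


def full_rule_base(term_sets: dict[str, list[str]]) -> list[tuple[str, ...]]:
    """Enumerate every antecedent combination of a complete rule base."""
    return list(product(*term_sets.values()))


rules = full_rule_base(TERMS)
print(len(rules), math.prod(len(t) for t in TERMS.values()))  # 324 324
```

Each tuple is the IF part of one rule; an expert would still have to assign a consequent (a disease stage) to every tuple.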

The confidence indicator (CI) is used to estimate the performance, as given by Eq. (5).

$$CI = \frac{\text{Success number}}{\text{Total tests}} \times 100.$$
(5)
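Equation (5) is a simple percentage of correctly classified test cases. The counts in the usage line (46 successes out of 50 tests) are hypothetical and chosen only to show the arithmetic; multiplying before dividing keeps the result exact for such whole-number inputs.

```python
def success_rate(successes: int, total_tests: int) -> float:
    """Eq. (5): percentage of test cases the fuzzy system classified correctly."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    if not 0 <= successes <= total_tests:
        raise ValueError("successes must lie between 0 and total_tests")
    return successes * 100 / total_tests


print(success_rate(46, 50))  # 92.0
```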

5 Result Analysis

When calculating the CI, we obtained 92% accuracy with the fuzzy expert system, which suggests that a fuzzy system [41] can help physicians diagnose CKD. Various metrics, such as accuracy, precision, specificity, and sensitivity, can be used for performance evaluation. Table 1 shows the performance of the classifiers and Table 2 their confusion matrices; to evaluate these metrics, each confusion matrix must be reduced to a 2 × 2 matrix, as shown in Table 3.
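The text does not state how the confusion matrix is reduced to 2 × 2. A common choice, assumed here, is one-vs-rest: the class of interest is treated as positive and all other classes as negative. The 3-class matrix below (normal, AKI, CKD) is hypothetical and is not the data behind Tables 1 to 3.

```python
from typing import Sequence


def _ratio(num: int, den: int) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def one_vs_rest(matrix: Sequence[Sequence[int]], positive: int) -> tuple[int, int, int, int]:
    """Collapse a square confusion matrix (rows = actual, columns = predicted) to (TP, FN, FP, TN)."""
    n = len(matrix)
    tp = matrix[positive][positive]
    fn = sum(matrix[positive]) - tp                       # positive cases predicted as other classes
    fp = sum(matrix[i][positive] for i in range(n)) - tp  # other cases predicted as positive
    tn = sum(sum(row) for row in matrix) - tp - fn - fp
    return tp, fn, fp, tn


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    return {
        "accuracy": _ratio(tp + tn, tp + fn + fp + tn),
        "sensitivity": _ratio(tp, tp + fn),  # recall / true-positive rate
        "specificity": _ratio(tn, tn + fp),  # true-negative rate
        "precision": _ratio(tp, tp + fp),    # positive predictive value
    }


# Hypothetical matrix; class order: 0 = normal, 1 = AKI, 2 = CKD.
M = [
    [50, 2, 3],
    [4, 30, 6],
    [1, 5, 99],
]
counts = one_vs_rest(M, positive=2)
print(counts)  # (99, 6, 9, 86)
print(binary_metrics(*counts))
```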

Table 1 Performance of various classifiers
Table 2 Confusion matrix for various classifiers
Table 3 2 × 2 matrix

6 Conclusion

To predict kidney disease at an early stage, the input data were first classified into two classes, AKI and CKD, using binary classification. Models were built using LR, SVM, DT, and RF, and random forest performed best among these classifiers, with an accuracy of 98.7%. To identify the various stages of the disease, the attributes are combined to predict the outcome. Using the fuzzy expert system, inference rules were built and a model was trained on them, achieving 96% accuracy when 200 rules were used for training. We therefore conclude that this approach can help physicians predict the severity of the disease at an early stage.