1 Introduction

The liver is one of the largest and most vital internal organs in the human body, playing an integral part in metabolism and various other processes, such as the decomposition of red blood cells. It performs numerous essential activities, including digestion, metabolism, immunity, and nutrient storage. These tasks establish the liver as a vital organ; without it, body tissues would quickly perish from a lack of energy and nutrients. A variety of factors contribute to the development of liver disease. Even when the liver is partially damaged, it continues to function as usual, so doctors can generally detect liver disease only once the liver is severely affected. Detecting liver disease at an early stage is therefore difficult.

Liver disease accounts for approximately two million deaths per year worldwide: one million due to complications of cirrhosis and one million due to viral hepatitis and hepatocellular carcinoma [1]. Cirrhosis and liver cancer account for 3.5% of all deaths worldwide and are currently the 11th and 16th most common causes of mortality, respectively [1]. Hence, liver testing should be readily available, and the cost of testing should be low.

Machine Learning (ML) applications are making a considerable impact on healthcare. ML is a subtype of Artificial Intelligence (AI) technology that aims to improve the speed and accuracy of physicians' work [2]. Many countries are currently dealing with overburdened healthcare systems and a shortage of skilled physicians, where ML offers great promise. It is crucial in medicine because it can identify patterns in large data sets and improve the process of identifying risk factors or disease-related diagnostic indicators. ML techniques can assist clinical management and specialists and achieve excellent performance in many medical applications, such as medical image analysis [2,3,4,5], language processing [6, 7], and tumor or cancer cell detection [8,9,10,11,12]. Classification algorithms from ML have begun to be used in clinical treatment, and knowledge can be extracted using such algorithms. Classification algorithms applied to large datasets can aid clinicians in making better judgments and improving patient outcomes through accurate liver disease prediction. Many ML-based techniques have been employed in the early identification of liver disease.

Onwodi Gregory used two real liver patient datasets to build classification algorithms for predicting liver illness [13]. Eleven different data mining classification techniques were applied to the datasets. According to the experimental results, the FT Tree algorithm's classification accuracy is superior to that of the other algorithms, with 78% accuracy, 77.5% precision, 86.4% sensitivity, and 38.2% specificity. Ebenezer Obaloluwa Olaniyi et al. proposed Backpropagation Neural Networks and Radial Basis Function Neural Networks to diagnose disorders and avoid misdiagnosis of liver condition patients [14]. The algorithms were compared with C4.5, CART, Naive Bayes, and Support Vector Machine (SVM), and the Radial Basis Function Neural Network was found to be the best model, with a recognition rate of 70%, making it more accurate and efficient than the other algorithms. Tapas Ranjan Baitharu et al. concentrated on medical diagnosis through learning from collected liver illness data to construct intelligent medical decision support systems that assist clinicians [15]. Their work compares the effectiveness and correct classification rate of multiple algorithms (J48, SVM, Random Forest) on these disorders, conducts a comparative analysis of classification accuracy on liver disorder data in various situations, and quantitatively compares the prediction abilities of standard classifiers. In their analysis, the Multilayer Perceptron provides the best overall classification result, with an accuracy of 71.59%, compared to the other classifiers.

In another study, Ramana et al. introduced a Modified Rotation Forest model using a Multilayer Perceptron (MLP) and a Random Subset feature selection technique for liver disease classification on the UCI dataset [16]. With MLP and the random subset approach, the accuracy on the UCI liver dataset is 74.78%. On the Indian liver illness dataset, the KStar model reaches 73.07% accuracy with Correlation-based Feature Selection (CFS). Alfisahrin et al. developed the NBTree algorithm by combining Decision Tree and Naive Bayes algorithms [17]. The accuracy of the NBTree algorithm is 67.01%, while Decision Tree and Naive Bayes achieve 66.14% and 56.14%, respectively; the Naive Bayes algorithm, however, has the quickest runtime of all the algorithms. The features of the UCI dataset are determined using a ranking approach. Dhamodharan examined data mining approaches for treating liver illness [18], comparing two models, FT Growth and Naive Bayes, and found that Naive Bayes (75.54%) outperforms FT Growth (72.66%) in terms of accuracy. A total of 29 datasets with 12 different attributes were compared.

Gulia et al. investigated intelligent algorithms for classifying liver patients using UCI datasets [19]. Different algorithms, such as J48, MLP, Random Forest, SVM, and Bayesian Network, were applied in the WEKA tool in that study. After feature selection, J48 scored 70.669%, MLP scored 70.8405%, SVM scored 71.3551%, Random Forest scored 71.8696%, and Bayes Net scored 69.1252% in accuracy. Vijayarani et al. employed Support Vector Machine and Naive Bayes classification methods to predict liver disorders [20], using MATLAB to analyze the data. Naive Bayes achieved 61.28% accuracy in 1670.00 ms, whereas SVM achieved 79.66% accuracy in 3210.00 ms. To predict fatty liver disease, Islam et al. developed four classification models (Random Forest, Support Vector Machine (SVM), Artificial Neural Network (ANN), and Logistic Regression) [21]. The Logistic Regression technique outperforms the other ML algorithms (accuracy 76.30%, sensitivity 74.10%, and specificity 64.90%).

Singh et al. created computer programs based on classification methods (such as Logistic Regression, Random Forest, and Naive Bayes) to estimate the likelihood of developing liver disease from a dataset that included liver function test results [22]. Particle Swarm Optimization (PSO) combined with SVM has identified the most crucial features for liver disease detection with higher accuracy than SVM, Random Forest, a Bayesian network, and an MLP Neural Network. SVM has also outperformed Bayesian and other earlier models in predicting drug-induced hepatotoxicity with fewer molecular descriptors [23]. A Convolutional Neural Network (CNN) model has identified liver cancer in hepatitis-infected individuals with an accuracy of 0.980 [24]. When applied to imaging datasets, Neural Network techniques can aid in differentiating between various forms of liver tumors [25].

Due to the adverse effects of liver disease on society, significant efforts have been undertaken to improve its diagnosis and treatment. It is therefore important to determine the most significant attributes for liver disease prediction. Most prior studies have looked into how reliably classifiers can detect liver disease cases, but only a few have examined all of the patient's conditions and pinpointed the most significant variables required for liver disease prediction. Studies in related fields have shown that selecting the critical features is crucial for healthcare professionals to understand how the risk factors for liver disease interact and how each affects the precision of liver disease prediction. The current work presents a high-performance paradigm for efficiently finding the most significant features of liver disease by using sensitivity analysis. Sensitivity analysis is essential in the medical field for discovering the attributes most responsible for the prevalence of a disease. It is a strategy for modifying model inputs in a controlled manner and evaluating the impact of these changes on the model output, revealing the model's sensitivity to such changes and the impact of particular features on the model's performance. It is crucial for determining how reliable the findings from clinical trials are and plays a vital role in interpreting or proving the integrity of the findings [26]. This study uses a standard deviation-based formula for the sensitivity analysis. Seven attributes (Age, Gender, Total Bilirubin (TB), Direct Bilirubin (DB), Alanine Aminotransferase (sgpt), Aspartate Aminotransferase (sgot), and A/G (Albumin/Globulin ratio)) are used as features to detect liver disease.

Data have been collected from several clinics and hospitals in different districts of Bangladesh. The patients' data are pre-processed and analyzed, arranged by gender (male and female), and the affected rates for men and women are calculated. This research conducts experiments employing ML algorithms for prediction and compares them on the liver disease patient dataset using several assessment criteria. Four different ML algorithms (Bagged Tree, SVM, K-Nearest Neighbor (KNN), and Fine Tree) are utilized in this study. Bagged Tree has been successfully applied in a variety of medical fields, including disease prediction [27,28,29], medical image recognition [30, 31], and gene selection [32,33,34]. SVM has a large number of applications in the medical industry, such as breast cancer [35,36,37] and skin cancer [38, 39] detection, and many other issues relating to disease prognosis; it can also attain greater generalization ability in small-sample classification tasks and is widely utilized in other domains, including handwritten character recognition, text classification, and image classification and recognition [40,41,42,43,44]. The supervised learning algorithm KNN is primarily employed for classification tasks and has been extensively applied to disease prediction [45, 46]; Almustafa et al. used KNN to classify a heart disease dataset [47]. Fine Tree models aid in the early detection of cancer [48, 49], diagnosing cardiac arrhythmias [50, 51], forecasting stroke outcomes [52,53,54], and assisting with chronic disease management [55, 56].
To the best of our knowledge, no prior work performs liver disease prediction on the same raw data using these four algorithms and selects the best one based on accuracy. The effectiveness of each model is assessed using the confusion matrix and all pertinent metrics, such as the ROC curve, True Positives, True Negatives, False Positives, False Negatives, error rate, accuracy, True Positive Rate, and False Positive Rate. This study focuses on ML algorithms that produce improved results and can aid physicians in making correct diagnoses. Our best-performing approach consistently provides an accuracy of 81.3% on the collected data.

The remainder of the paper is laid out as follows. The phases and strategies used in this suggested system are described in Sect. 2. Section 3 offers experimental results for each classifier, displays a comparison chart to investigate which algorithms are more accurate, and conducts sensitivity analysis experiments on the dataset. Finally, Sect. 4 concludes the paper by laying out future guidelines.

2 Methodology

This research aims to develop a model that can predict liver disease automatically, accurately, and at an early stage. The research methodology of this study is divided into the following distinct parts: data collection, pre-processing, ML techniques, training and testing, performance analysis, comparative analysis, and sensitivity analysis. In this section, the different ML classification algorithms and their implementations used for predicting liver disease are discussed in detail, and the whole research process is shown step by step. The analysis of liver disease in this research is summarized in Fig. 1. It starts with collecting patient details. The patient data are then analyzed and normalized, and the normalized data are used for training, testing, and validation. Once the training, testing, and validation output is acceptable after comparing all algorithms, the final model is selected for predicting the disease.

Fig. 1
figure 1

Workflow of the liver disease prediction model

2.1 Data collection and processing

The first stage in developing a model is gathering and analyzing data. The details of data collection and processing are discussed below.

2.1.1 Attributes

For a proper diagnosis, evaluating the main attributes of liver disease is necessary. Attributes reported for liver disease include Gender, Age, Total Bilirubin, Direct Bilirubin, Total Protein, Albumin, Alanine Aminotransferase, Aspartate Aminotransferase, Albumin/Globulin ratio, Sodium, Potassium, White Blood Cell count, Hemoglobin, Body Mass Index, and Red Blood Cell count [13,14,15,16,17,18]. Seven parameters are selected from these attributes in this work, and their details are shown in Table 1.

Table 1 Dataset description

2.1.2 Data collection and normalization

The dataset is collected from Dhaka Medical College Hospital and two other private hospitals in Bangladesh. It contains 203 samples from patients with liver disease and 101 samples from healthy individuals. Because the variables differ considerably in range, it is essential to normalize the data. Normalization is accomplished according to the following equation:

$$X_{\text{normalized}} = \frac{x - \min(x)}{\max(x) - \min(x)}$$
(1)

where Xnormalized is the updated normalized value.

The lowest value of each feature is mapped to '0', the highest value to '1', and all other values are converted to a value between '0' and '1'.
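As a minimal illustration of Eq. (1) in MATLAB (which the prediction snippet in Sect. 3.3 suggests is the working environment), the scaling can be applied column-wise as sketched below; the random matrix X is only a stand-in for the raw attribute values, since the collected dataset is not reproduced here.

% Min-max normalization of each attribute column to [0, 1], following Eq. (1).
% X stands in for the N-by-7 matrix of raw attribute values
% (Age, Gender, TB, DB, sgpt, sgot, A/G); the real data are not public.
rng(0);                                          % reproducible stand-in data
X = rand(304, 7);                                % placeholder raw data: 304 samples, 7 attributes
Xmin = min(X, [], 1);                            % per-column minimum ('0' after scaling)
Xmax = max(X, [], 1);                            % per-column maximum ('1' after scaling)
Xnormalized = (X - Xmin) ./ (Xmax - Xmin);       % every value now lies between 0 and 1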

2.2 Machine learning techniques

The structure of the liver disease prediction model used in this proposed system is given in Fig. 2. It starts with the input parameters of the dataset. After pre-processing, the dataset is fed into the proposed ML models. This study uses the following four ML classification algorithms for comparative analysis:

  a. Bagged Tree

  b. Support Vector Machine (Linear SVM)

  c. K-Nearest Neighbor (K-NN)

  d. Fine Tree

a. Bagged Tree: The term "bagging" is short for "bootstrap aggregating," which uses the original data n times with replacement (a bootstrap, or resampling, strategy) to produce training sets. It is an algorithm for improving the accuracy and stability of ML algorithms used in statistical classification and regression; it also lowers variance and helps prevent overfitting. Assume a training data set S. Bootstrap sampling creates a training sample Si by randomly selecting m examples from S with replacement, so examples may be repeated in Si. After T bootstrap samples are created, bagging trains one classifier on each sample, yielding T classifiers. A new instance is classified by a weighted majority vote of the T learned classifiers, so the result is an ensemble of classifiers.
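A minimal MATLAB sketch of such a bagged-tree ensemble is given below. The placeholder training data, the number of bootstrap samples (30), and the use of fitcensemble are illustrative assumptions, not the exact configuration used in this study.

% Illustrative bagged-tree ensemble (bootstrap aggregating).
% Xtrain/ytrain are placeholders for the normalized 70% training split of Sect. 2.3.
rng(1);
Xtrain = rand(213, 7);                            % placeholder predictors (about 70% of 304 samples)
ytrain = double(rand(213, 1) > 0.33);             % placeholder labels: 0 = healthy, 1 = liver disease
bagModel = fitcensemble(Xtrain, ytrain, ...
    'Method', 'Bag', ...                          % T bootstrap samples drawn with replacement
    'NumLearningCycles', 30, ...                  % T = 30 trees (assumed value)
    'Learners', templateTree());                  % one decision tree trained per bootstrap sample
yhatTrain = predict(bagModel, Xtrain);            % majority vote over the T learned trees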

b. Support Vector Machine: The SVM is a comparatively recent development in supervised ML. Here the SVM is implemented with the kernel Adatron technique, which maps inputs to a high-dimensional feature space and then optimally separates the data into the appropriate classes by isolating the inputs that lie near the data's boundaries. As a result, the kernel Adatron is particularly good at separating data sets with complex boundary relationships. In this form, the SVM is used only for classification rather than function approximation.
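A corresponding MATLAB sketch of a linear SVM is shown below; it is a generic fitcsvm call on the placeholder data from the previous sketch, not the kernel Adatron implementation itself.

% Illustrative linear SVM classifier; Xtrain/ytrain as in the bagged-tree sketch above.
svmModel = fitcsvm(Xtrain, ytrain, ...
    'KernelFunction', 'linear', ...               % linear kernel, matching the "Linear SVM" model
    'Standardize', true);                         % standardize predictors before training
yhatTrain = predict(svmModel, Xtrain);            % class given by the side of the separating hyperplane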

c. K-Nearest Neighbor: KNN is a supervised ML technique that can be used for both classification and regression problems and is straightforward to apply. It is a form of instance-based learning, often known as lazy learning, in which computation is deferred until a query is evaluated and the function is only approximated locally. Because it uses distance for classification, normalizing the training data can significantly increase its accuracy.
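A MATLAB sketch of a KNN classifier follows; the neighbour count and distance metric are assumptions, since they are not reported above.

% Illustrative KNN classifier; Xtrain/ytrain as in the sketches above.
knnModel = fitcknn(Xtrain, ytrain, ...
    'NumNeighbors', 10, ...                       % k (assumed; not stated in the text)
    'Distance', 'euclidean');                     % distance used to find the nearest neighbours
yhatTrain = predict(knnModel, Xtrain);            % label of the majority of the k nearest neighbours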

d. Fine Tree: Decision tree learning is one of the predictive modeling techniques used in statistics, data mining, and ML. To move from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves), it employs a decision tree (here termed a Fine Tree) as a predictive model. Classification trees are tree models in which the target variable takes a discrete set of values: the leaves correspond to class labels, and the branches represent conjunctions of attributes that lead to those class labels. When the target variable can take continuous values (usually real numbers), the trees are called regression trees.
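In MATLAB, a "Fine Tree" is commonly understood as a single decision tree allowed a large number of splits; the sketch below assumes that reading, with a maximum of 100 splits as an illustrative setting.

% Illustrative "Fine Tree": one decision tree with a generous split budget;
% Xtrain/ytrain as in the sketches above.
fineTreeModel = fitctree(Xtrain, ytrain, ...
    'MaxNumSplits', 100);                         % many splits -> fine-grained leaves (assumed setting)
yhatTrain = predict(fineTreeModel, Xtrain);       % class label read off the leaf the sample reaches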

Fig. 2
figure 2

Liver disease prediction model structure

2.3 Training and testing

Different ML techniques are developed and trained, including Bagged Tree, SVM, KNN, and Fine Tree. The complete data stream is divided into three subsets: training, validation, and testing, as illustrated in Table 2. All of the models are trained using 70% of the data. The validation subset is particularly useful for preventing overfitting, so 15% of the data are used for validation. A comparative analysis of the models is conducted to determine which model performs best. Finally, new and unused samples (the remaining 15%) are used to test the models.

Table 2 Proportions of data in each dataset
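One simple way to realize the 70/15/15 proportions of Table 2 is a random permutation of the sample indices, sketched below. It assumes Xnormalized (Sect. 2.1.2) and a label vector y covering all 304 samples; the study's actual partitioning tool is not specified beyond these proportions, and this split replaces the placeholder training data of the earlier sketches.

% Random 70/15/15 split into training, validation, and test sets (Table 2 proportions).
% Xnormalized: 304-by-7 normalized predictors; y: 304-by-1 labels (placeholder here).
rng(2);
y = double(rand(size(Xnormalized, 1), 1) > 0.33); % placeholder labels: 0 = healthy, 1 = liver disease
n = size(Xnormalized, 1);
idx = randperm(n);                                % shuffle the sample indices once
nTrain = round(0.70 * n);                         % 70% for training
nVal   = round(0.15 * n);                         % 15% for validation
trainIdx = idx(1 : nTrain);
valIdx   = idx(nTrain + 1 : nTrain + nVal);
testIdx  = idx(nTrain + nVal + 1 : end);          % remaining ~15% for testing
Xtrain = Xnormalized(trainIdx, :);  ytrain = y(trainIdx);
Xval   = Xnormalized(valIdx,   :);  yval   = y(valIdx);
Xtest  = Xnormalized(testIdx,  :);  ytest  = y(testIdx);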

2.4 Performance measurement

This study has used different evaluation metrics to evaluate the efficacy and usefulness of classification algorithms for liver disease prediction. A confusion matrix and all relevant metrics, including the ROC curve, True Positives, True Negatives, False Positives, False Negatives, error rate, accuracy, True Positive Rate (TPR), and False Positive Rate (FPR), etc., are used to evaluate a model's performance.

Confusion matrix: One of the most straightforward methods for assessing a model's efficacy and accuracy is the confusion matrix. It is used for classification problems, where an outcome can be assigned to one of several classes. The confusion matrix is a table with two dimensions, "Actual class" and "Predicted class": rows correspond to the actual classes and columns to the predicted classes. Two classes, Class 0 and Class 1, are present in the dataset. Table 3 shows the resulting confusion matrix:

Table 3 Confusion matrix

True Positives (TP): True Positives are the cases when the actual class of the data point is True, and the predicted is also True.

True Negatives (TN): True Negatives are the cases when the actual class of the data point is False, and the predicted is also False.

False Positives (FP): False Positives are the cases when the actual class of the data point is False, and the predicted is True.

False Negatives (FN): False Negatives are the cases when the actual class of the data point is True, and the predicted is False.

True Positive Rate (TPR): It is calculated as the number of correct positive predictions of liver disease divided by the total number of positives. It is also called recall (REC) or sensitivity.

$$\mathrm{Sensitivity }=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$$
(2)

True Negative Rate (TNR): It is calculated as the number of correct negative predictions of liver disease divided by the total number of negatives. It is also called specificity.

$$\mathrm{Specificity }= \frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}$$
(3)

Accuracy (ACC): Accuracy is calculated as the number of all correct predictions of liver disease divided by the total number of the dataset. Accuracy comparison is based on the performance among the four classification algorithms.

$$\mathrm{Accuracy }=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{FP}+\mathrm{TN}+\mathrm{FN}}$$
(4)

ROC: ROC stands for Receiver Operating Characteristics, a visual illustration of performance evaluation for classification problems. The ROC graph is constructed with TPR on the Y-axis and FPR on the X-axis.
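These quantities can be read directly off a MATLAB confusion matrix. The sketch below treats Class 1 (disease present) as the positive class and reuses the hold-out split and bagged-tree model from the earlier sketches; the class-column indexing of the score matrix assumes labels 0 and 1.

% Confusion-matrix metrics and ROC for a 0/1 classification, Class 1 as positive.
bagModel = fitcensemble(Xtrain, ytrain, 'Method', 'Bag');  % retrain on the split of Sect. 2.3
[yhat, score] = predict(bagModel, Xtest);            % predicted labels and per-class scores
cm = confusionmat(ytest, yhat);                      % rows = actual class (0 then 1), columns = predicted
TN = cm(1, 1);  FP = cm(1, 2);                       % actual Class 0 row
FN = cm(2, 1);  TP = cm(2, 2);                       % actual Class 1 row
sensitivity = TP / (TP + FN);                        % Eq. (2): True Positive Rate / recall
specificity = TN / (TN + FP);                        % Eq. (3): True Negative Rate
accuracy    = (TP + TN) / (TP + TN + FP + FN);       % Eq. (4)
errorRate   = 1 - accuracy;                          % overall proportion of wrong predictions
[fpr, tpr, ~, auc] = perfcurve(ytest, score(:, 2), 1);  % ROC points and area under the curve
plot(fpr, tpr); xlabel('False Positive Rate'); ylabel('True Positive Rate');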

2.5 Sensitivity analysis

The objective of sensitivity analysis is to figure out how input and target factors interact. In liver disease prediction, seven attributes (Age, Gender, Total Bilirubin (TB), Direct Bilirubin (DB), Alanine Aminotransferase (sgpt), Aspartate Aminotransferase (sgot), and A/G (Albumin/Globulin ratio)) of the dataset have been used as input variables. The output variable is categorized into two classes: absence (num = 0) and presence (num = 1) of the liver disease. The statistical characteristics of the dataset (inputs and output) are presented in Table 4.

Table 4 Statistical characteristics of the inputs and outputs

This study has used a standard deviation-based formula for calculating the sensitivity. In this method, each input and the output are first evaluated at their means. Then each parameter is evaluated at its mean plus or minus some multiple of its standard deviation (Mean ± Std) [57]. In this study, one variable at a time is changed from its mean value to (Mean ± 4 Std) while all other parameters are held constant at the reference condition; the reference condition is taken to be the means of the experimental values [58]. If the input variable's mean and standard deviation are Xmean and Xstd, respectively, the percentage change of the input variable is determined using the following formula:

$$\Delta X\,(\%) = \frac{\left(X_{\text{mean}} \pm 4\,X_{\text{std}}\right) - X_{\text{mean}}}{X_{\text{mean}}} \times 100$$
(5)

Sensitivity is then determined by dividing the percentage change in the input by the percentage change in the output. These steps are repeated for each independent variable to obtain the output's sensitivity to it.
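A rough MATLAB sketch of this standard deviation-based procedure is given below. It assumes a trained model (here the bagged tree from the earlier sketches) whose Class 1 score serves as the model output, and it perturbs one normalized input at a time to Mean + 4·Std while holding the others at their means; the exact response function and sign convention used by the authors are not specified, so this is only one plausible reading of Eq. (5).

% Standard deviation-based sensitivity sketch: perturb one input at a time to
% Mean + 4*Std (Eq. (5)) with the other inputs fixed at their means, and relate
% the percentage change in the input to the percentage change in the model output.
model = bagModel;                                     % assumed trained model (see earlier sketches)
Xmean = mean(Xnormalized, 1);                         % reference condition: all inputs at their means
Xstd  = std(Xnormalized, 0, 1);
[~, refScore] = predict(model, Xmean);                % model output (Class 1 score) at the reference
sens = zeros(1, size(Xnormalized, 2));
for j = 1:size(Xnormalized, 2)
    Xpert = Xmean;
    Xpert(j) = Xmean(j) + 4 * Xstd(j);                % perturb only attribute j
    [~, pertScore] = predict(model, Xpert);
    pctIn  = 100 * (Xpert(j) - Xmean(j)) / Xmean(j);  % Eq. (5): percentage change of the input
    pctOut = 100 * (pertScore(2) - refScore(2)) / refScore(2);  % percentage change of the output
    sens(j) = pctIn / pctOut;                         % sensitivity as defined in the text above
end                                                   % (guard against a zero output change in practice)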

3 Implementation and analysis

In this study, various ML techniques are employed to predict liver disease. Their effectiveness has been assessed, and a comparison between them has been carried out. The following section analyzes the data and presents the findings before moving on to the performance evaluation of the various classification techniques.

3.1 Data analysis

The collected data are analyzed and arranged by gender (male and female), as shown in Fig. 3 and Table 5. Out of 304 samples, 203 people were diagnosed with liver disease. According to the analysis, the affected rates for men and women are 22.17% and 77.83%, respectively; that is, more of the diagnosed patients are female than male.

Fig. 3
figure 3

Relationship of the gender feature to liver disease

Table 5 Analysis of liver disease dataset

3.2 Performance analysis

The confusion matrix and all relevant metrics, including the ROC curve, True Positives, True Negatives, False Positives, False Negatives, error rate, accuracy, TPR, and FPR, are used to evaluate and examine the effectiveness of the algorithms. The following section presents the performance analysis of each algorithm.

3.2.1 Bagged tree

Figure 4 depicts the confusion matrix created from the Bagged Tree's training and testing on the gathered data. The training and testing results are also included in Table 6. In the confusion matrix, the green cells indicate that the output matches the target, while the red cells indicate that it does not.

Fig. 4
figure 4

Confusion matrix of bagged tree

According to Fig. 4 and Table 6, the Bagged Tree correctly predicts 69 samples for the negative class (Class 0) and incorrectly predicts 33 samples, giving a True Positive Rate of 67.6% and a False Positive Rate of 32.4%. For the positive class (Class 1), this classifier correctly identifies 178 instances, with an 88.1% True Negative Rate and an 11.9% False Positive Rate. In total there are 247 correct predictions and 57 incorrect predictions, so the overall percentages of right and wrong predictions are 81.3% and 18.7%, respectively.

Table 6 Accuracy table of Bagged Tree

Figure 5 depicts the ROC curve of the Bagged Tree. The X-axis represents the False Positive Rate, whereas the Y-axis represents the True Positive Rate. The area under the ROC curve is 0.86 for both Class 0 and Class 1.

Fig. 5
figure 5

ROC curve of Bagged Tree

3.2.2 Support vector machine

The confusion matrix produced from the results of training and testing SVM on the collected data is displayed in Fig. 6. Table 7 also displays the training and testing results. The red cells in the confusion matrix show where the output does not match the target, whereas the green cells show where it does.

Fig. 6
figure 6

Confusion matrix of SVM

Table 7 Accuracy table of SVM

According to Fig. 6 and Table 7, SVM correctly predicts 49 samples for the negative class (Class 0) and incorrectly predicts 53 samples; the True Positive Rate is 48.0%, and the False Positive Rate is 52.0%. For the positive class (Class 1), SVM predicts 162 samples accurately and 40 samples inaccurately; the True Negative Rate is 80.2%, and the False Negative Rate is 19.8%. In total there are 211 correct predictions and 93 incorrect predictions, resulting in overall percentages of right and wrong predictions of 69.4% and 30.6%, respectively.

Figure 7 exhibits the ROC curve. In Fig. 7, the Y-axis represents the True Positive Rate, and the X-axis presents the False Positive Rate. The area under ROC is 0.69 for both Class 0 and Class 1.

Fig. 7
figure 7

ROC curve of SVM

3.2.3 K-nearest neighbor

The collected data are trained and tested using KNN. The training and testing results are shown in Table 8, and the resulting confusion matrix of KNN is shown in Fig. 8. Here, the green cells represent output classes that match the target class, and the red cells represent output classes that do not.

Table 8 Accuracy table of K-NN
Fig. 8
figure 8

Confusion matrix of K-NN

For the negative class (Class 0), this classifier predicts 72 samples correctly and 30 samples incorrectly, according to Fig. 8 and Table 8; the True Positive Rate is 70.6%, and the False Positive Rate is 29.4%. For the positive class (Class 1), KNN determines 171 samples correctly, with an 84.7% True Negative Rate, and 31 instances incorrectly, giving a 15.3% False Positive Rate. The total number of correct predictions is 243 and the total number of incorrect predictions is 61, so the overall percentages of right and wrong predictions are 79.9% and 20.1%, respectively.

Figure 9 exhibits the ROC curve. In Fig. 9, the Y-axis represents the True Positive Rate, and the X-axis presents the False Positive Rate. The area under ROC is 0.78 for both Class 0 and Class 1.

Fig. 9
figure 9

ROC curve of K-NN

3.2.4 Fine tree

The confusion matrix produced by Fine Tree's training and testing on the collected data is displayed in Fig. 10. Table 9 also includes the training and test results. In the confusion matrix, the green cell represents a match between the output and the target, whereas the red represents a mismatch.

Fig. 10
figure 10

Confusion matrix of Fine Tree

Table 9 Accuracy table of Fine Tree

The Fine Tree correctly predicts 102 samples for the negative class (Class 0) and wrongly predicts 48 samples, as shown in Fig. 10 and Table 9. The True Positive Rate is 52.9%, and the False Positive Rate is 47.1%. For the positive class (Class 1), this classifier correctly identifies 163 instances, with a True Negative Rate of 80.7% and a False Positive Rate of 19.3%. Overall, there are 211 correct predictions and 93 incorrect predictions, resulting in percentages of correct and incorrect predictions of 69.4% and 30.6%, respectively.

The ROC curve of the Fine Tree is shown in Fig. 11. The Y-axis shows the True Positive Rate, and the X-axis presents the False Positive Rate. Figure 11 shows that the area under the ROC curve for Class 0 and Class 1 is 0.77.

Fig. 11
figure 11

ROC curve of Fine Tree

3.3 Comparative analysis

A comparison of the different classifiers (Bagged Tree, SVM, KNN, and Fine Tree) is carried out. Comparisons of their confusion matrices and ROC curves are shown in Figs. 12 and 13, respectively. Table 10 compares the performance of Bagged Tree, SVM, KNN, and Fine Tree in terms of prediction accuracy, error rate, modeling time, and ROC. Comparisons of accuracy and error among the four classifiers are shown in Figs. 14 and 15. The accuracy rates for Bagged Tree, SVM, KNN, and Fine Tree are 81.3%, 69.4%, 79.9%, and 69.4%, respectively. The Bagged Tree classifier exhibits the highest accuracy (81.3%) of the four techniques, as shown in Table 10.

Fig. 12
figure 12

Confusion matrix of classifiers; a Bagged Tree, b Support Vector Machine, c K-Nearest Neighbor, d Fine Tree

Fig. 13
figure 13

ROC curve of classifiers; a Bagged Tree, b Support Vector Machine, c K-Nearest Neighbor, d Fine Tree

Table 10 Comparison of performance among different algorithms
Fig. 14
figure 14

Comparison chart of accuracy

Fig. 15
figure 15

Comparison chart of error

Figures 14 and 15 show that the Bagged Tree algorithm has the highest accuracy rate (81.3%) and the lowest error rate (18.7%). Figure 16 shows that the areas under the ROC curve for Class 0 and Class 1 are 0.86, 0.69, 0.78, and 0.77 for Bagged Tree, SVM, KNN, and Fine Tree, respectively; the highest ROC value (0.86) is found for the Bagged Tree.

Fig. 16
figure 16

Comparison chart of ROC

The time (in seconds) needed to build each classifier's model is shown in Fig. 17. Building a model using Bagged Tree, SVM, KNN, and Fine Tree takes 50.27, 16.17, 10.94, and 24.26 s, respectively. Figure 17 demonstrates that the Bagged Tree model has the longest build time, 50.27 s.

Fig. 17
figure 17

Comparison chart of the time to build the model

The Bagged Tree, SVM, KNN, and Fine Tree algorithms are then used to test new samples that have never been seen before. The prediction process consists of the following steps:

i. Test the dataset with new instances.

ii. After the training process, export the selected trained model from the app to the workspace for the prediction process.

iii. Then import the new sample dataset, which is also normalized. The attribute fields in this dataset are the same as in the full dataset used for training; only the values of the target class are not included.

iv. In the working window, call the prediction function of each exported trained model: 'yfit = trainedModel.predictFcn(T)', where trainedModel is the name of the exported compact model and T is the name of the test dataset (see the sketch after this list).

v. Run the test dataset. Then apply different classifier algorithms for testing purposes.
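The steps above can be condensed into a few lines of MATLAB; trainedModel is assumed to be the struct exported from the app (its predictFcn field wraps the compact model), and 'newSamples.csv' is a hypothetical file name for the normalized, label-free test table.

% Prediction on new, previously unseen samples using the exported model.
% 'newSamples.csv' is a hypothetical file containing the same seven attribute
% columns used for training (already normalized), without the target class.
T = readtable('newSamples.csv');            % import the new sample dataset
yfit = trainedModel.predictFcn(T);          % trainedModel: struct exported from the app
disp(yfit);                                 % predicted class labels (0 = healthy, 1 = disease)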

The main goal of this work is to find the algorithm that gives the best accuracy for the early prediction of liver disease. To this end, the prediction output is shown in Table 11.

Table 11 Comparisons chart of target and predicted value of new samples

3.4 Comparisons with earlier studies

Some distinctions between this proposed system and past studies are listed below. Most of the studies, like [13, 15, 17], have focused on how classifiers can correctly recognize cases of liver disease. Our suggested approach not only predicts liver disease cases but also considers the impact of each characteristic on the prediction process by employing sensitivity analysis to identify the most critical component that causes liver disease in most cases.

Our suggested approach outperforms various previously published works. Table 12 compares the suggested model with past research and shows that the performance of our model exceeds that of other existing models.

Table 12 Comparative analysis of the proposed system with existing works

3.5 Sensitivity analysis

Sensitivity analysis is used to determine which attributes have the most influence on the diagnosis of liver disease; it estimates each attribute's worth by calculating its sensitivity with respect to the class. Sensitivity analysis has been performed on the seven attributes of the dataset using three different methods to determine the most significant attribute responsible for liver illness. Table 13 presents the result of the sensitivity analysis for the dataset of this liver disease prediction system, and Fig. 18 shows the graphical representation of the sensitivity analysis using the standard deviation-based method.

Table 13 Sensitivity analysis of the proposed system
Fig. 18
figure 18

Graphical representation of sensitivity analysis using the standard deviation-based method

It is critical to understand the relative importance of the various factors contributing to liver disease occurrence in order to choose the best way to reduce the number of positive cases. The sensitivity analysis indices in this study show how important each parameter is to the prevalence of liver disease. Figure 18 demonstrates that Alanine Aminotransferase (sgpt), Aspartate Aminotransferase (sgot), and Age are the most influential parameters in the sensitivity analysis.

  • Alanine Aminotransferase (sgpt) is the most significant parameter in the sensitivity analysis of liver disease. The normal range of sgpt is 7 to 56 units per liter of blood serum. High levels of certain liver enzymes can be a significant sign of disease or injury: liver illnesses including fatty liver or non-alcoholic fatty liver disease (NAFLD), viral hepatitis, autoimmune hepatitis, and liver cancer cause an increase in Alanine Aminotransferase (sgpt) levels [59,60,61]. About 35% of Americans have fatty liver disease, which frequently co-occurs with diabetes and obesity [62].

  • Aspartate Aminotransferase (sgot) is the second most important parameter. A sgot/sgpt ratio higher than 2:1 (where the sgot is more than twice as high as the sgpt) is a sign of alcoholic liver disease [63]. Every year 493,300 people die from alcoholic liver disease, which is 47.9% of all liver cirrhosis deaths [64].

  • Age ranks as the third most crucial parameter. With age, the liver's blood flow and volume gradually decline. Studies employing ultrasound have shown that as people age, their liver capacity reduces by 20–40%. These alterations are caused by a decrease in blood flow to the liver, as evidenced by the fact that those over 65 had a 35% lower blood volume than those under 40. Hepatic sinusoidal endothelial cells and other liver cells, as well as gradually changing hepatic shape and function, are all connected with aging. Additionally, aging might increase the risks for several liver illnesses and act as an adverse prognostic factor, increasing the death rate [65].

  • The development of diseases is greatly influenced by gender. Women are more frequently diagnosed with acute liver failure and toxin-mediated liver diseases, such as alcohol- and drug-induced liver disease. Even though males misuse or depend on alcohol more than women do at a ratio of 2:1 in adults over the age of 26, women are more vulnerable than men to the toxic effects of alcohol on the liver for any given dose of alcohol [66].

  • The serum Albumin/Globulin ratio (A/G) can predict the prognosis of liver illness. A kind of pyogenic infection in the liver called a pyogenic liver abscess (PLA) can be fatal if it is not appropriately treated. Monitoring A/G has significant clinical implications for assessing PLA patients' progress [67].

  • Numerous predictive models have been developed to forecast outcomes and categorize risk in liver cirrhotic patients. Total Bilirubin (TB) is a component of the most widely used predictive models, including the Child–Pugh score and the Model for End-stage Liver Disease (MELD) score. In particular, serum bilirubin level accurately reflects hepatic synthesis and excretory function. Direct Bilirubin (DB) levels rise in liver cirrhosis due to portal flow distortion, intrahepatic cholestasis, and impaired hepatic bilirubin clearance. Meanwhile, splenomegaly and portosystemic shunting cause hemolysis, which raises indirect bilirubin levels. Due to the different pathophysiologies of high DB and indirect bilirubin levels, patients with primarily indirect bilirubin may have different prognoses and predisposing variables than those with Direct Bilirubin. Several studies have found that Direct Bilirubin is more valuable than Total Bilirubin for predicting prognosis in patients with liver cirrhosis [68].

The variation of the target with respect to the input parameters is depicted in Fig. 19. It is apparent from Fig. 19 that as the values of Alanine Aminotransferase (sgpt), Aspartate Aminotransferase (sgot), and Age grow, the risk of liver disease also increases; in most cases, increasing values of these three parameters have a substantial effect on the development of liver disease. These results appear reasonable compared to the findings of other studies conducted in this field [59, 62, 66].

Fig. 19
figure 19

Effect of distinct variables on liver disease a Alanine Aminotransferase (sgpt). b Aspartate Aminotransferase (sgot). c Age. d Gender. e A/G ratio. f Direct Bilirubin. g Total Bilirubin

3.6 Impact of the model on healthcare

The liver has several vital functions that keep the body healthy, including the production of bile, which allows the body to use protein, fat, and carbohydrates; the use and storage of fats, sugar, iron, and vitamins; and the detoxification of drugs, alcohol, and other potentially harmful substances. Cirrhosis occurs when liver tissue is destroyed, reducing blood flow to the liver and preventing the liver from performing processes vital to human health. Acute liver failure (ALF) occurs in about 2000 cases yearly, accounting for 6% of all liver-related deaths and 6% of liver transplants [17]. Although ALF is uncommon, it is linked to a high mortality rate. As a result, to avoid acute problems and limit the likelihood of long-term complications, liver disease necessitates ongoing medical care and self-management education. Our proposed model will improve disease diagnosis and benefit the medical profession; such tools will assist clinicians in accurately determining whether or not a patient has liver disease.

It is vital to understand the relative relevance of the various factors that contribute to the occurrence of liver disease in order to choose the best way to reduce the number of positive cases. This study's sensitivity analysis indices indicate how vital each parameter is to liver disease prevalence. Age, Gender, Total Bilirubin, Direct Bilirubin, Alanine Aminotransferase, Aspartate Aminotransferase, and the Albumin/Globulin ratio are all factors that clinicians consider when making initial liver disease diagnoses. Through sensitivity analysis of the dataset, the suggested system discovered that the Alanine Aminotransferase (sgpt) attribute has a considerable impact on the cause of liver disease. It can be expected that the proposed sensitivity analysis-based approach will aid clinicians in detecting liver disease at an early stage by evaluating Alanine Aminotransferase (sgpt) levels.

4 Conclusion

The number of patients with liver disease is constantly rising, and identifying its symptoms has become challenging. Accurate detection is required to aid the medical professional in prescribing the proper medications and medical care. This study highlights the application of various supervised classification approaches to detect liver disease at an early stage. Data have been collected from several clinics and hospitals in different districts of Bangladesh. Age, gender, TB, DB, Alanine Aminotransferase (sgpt), Aspartate Aminotransferase (sgot), and Albumin/Globulin ratio have been used as attributes to identify liver diseases. The patients’ data are pre-processed and examined. The diagnosing rates for men and women are also calculated. The diagnosis rate for men is 22.17%, while it is 77.83% for women. This research has conducted experiments employing four ML techniques (Bagged Tree, SVM, KNN, and Fine Tree) for prediction and compared them to the data set of liver disease patients using some assessing criteria. The results of these approaches are evaluated with the confusion matrix, TPR, FPR, ROC Curve, and accuracy. Bagged Tree provides 81.3% accuracy, 18.7% error rate, 0.86 ROC, and 50.27 s to create the model. SVM yields 16.17 s to build the model, with an accuracy of 69.4%. KNN achieves the model building time of 10.94 s, an accuracy of 79.9%, an error rate of 20.1%, and a ROC of 0.78. Fine Tree offers 69.4% accuracy, 30.6% error rate, 0.77 ROC, and 24.26 s to build the model. The experimental results conclude that the Bagged Tree classifier can be considered the best algorithm among other algorithms because of its highest classification accuracy of 81.30%. This study also examines the impact of each attribute on the prediction process by conducting sensitivity analysis to find the most significant factors responsible for most cases of liver disease. Age, Gender, TB, DB, Alanine Aminotransferase (sgpt), Aspartate Aminotransferase (sgot), and Albumin/ Globulin ratio attain 218.93%, 124.17%, 38.53%, 39.99%, 451.62%, 295.61%, and 42.37% sensitivity, respectively. It has been found that Alanine Aminotransferase (sgpt) is the most significant parameter in the sensitivity analysis of liver disease. The proposed approach could benefit physicians in making final predictions about liver patients. Physicians can make very accurate decisions if they use such a tool. More data exploration can lead to more exciting outcomes. It will be our main focus in the future.