1 Introduction

The liver, the largest functional organ in the body, performs numerous crucial functions that are necessary for survival, including the metabolism of carbohydrates, production and elimination of bile, and the synthesis of proteins, lipids, and fatty acids. It is the second-most important organ in humans [1]. According to the WHO, chronic diseases cause nearly 35 million deaths worldwide, accounting for roughly 46% of all illnesses and 59% of all fatalities. Liver illness has become more common over time. According to government statistics, liver illnesses are the sixth most common cause of death in the UK. Among all digestive conditions, liver diseases are the second largest cause of death in the US [2]. Over the past few decades, liver cancer has become more common overall, especially in wealthy nations [3]. The survival rate for liver cancer could be increased by early detection, but this is challenging given the absence of recognisable symptoms and the incomplete understanding of the aetiology of oncogenesis [4].

The growth of artificial intelligence technology with advanced algorithms facilitates effective solutions for disease diagnosis and prognosis. The prediction and classification of diseases can be accomplished by machine learning techniques. Numerous approaches such as Random Forest (RF), decision tree, naive Bayes, and logistic regression have been used to support the analysis and prediction of disease [5]. In any kind of prediction model, the selection of features is crucial to obtaining a good outcome. The filter and wrapper approaches are often employed for feature selection. The filter approach, which is independent of any classification algorithm, takes into account how each characteristic is related to the class label [6]. Feature selection enhances machine learning and improves the predictive power of machine learning algorithms in the medical field [7].

Additionally, it improves prediction performance, reduces processing requirements, alleviates the curse of dimensionality, and facilitates data comprehension. The aim of feature selection is to choose a subset of the input variables that characterises the input data accurately while minimising the influence of noise or auxiliary components and still producing accurate prediction results. If a dataset contains many features and multiple classes, irrelevant attributes will influence the design of the classification model, which could reduce classification accuracy. Therefore, feature selection may be a crucial pre-processing method for solving classification problems. Reducing redundant and unused attributes can also improve the quality of classification models [8,9,10,11].

Section 2 elaborates on the current literature on liver disease prediction using machine learning and feature selection techniques employed in various healthcare fields. The proposed RG-SVM algorithm and its schematic view are given in Section 3. The results and discussions are detailed in Section 4. Section 5 concludes the paper with future enhancements.

2 Literature Survey

Hassan et al. [12] suggested ensemble filter techniques that utilised symmetrical uncertainty, gain ratio, Information Gain (IG), and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) to rank all genes and choose the best genes. This approach was assessed on three binary-labelled gene expression datasets from leukaemia, lung cancer, and breast cancer. Decision trees, support vector machines (SVM), Random Forest (RF), K-nearest neighbours (KNN), and naive Bayes (NB) were the five classifiers utilised to evaluate the chosen attributes. They claimed that across all datasets, the novel approach outperformed the earlier methods in terms of classification accuracy.

In the study by Dong et al. [13], RFEs (Recursive Feature Eliminations) and SVMs were suggested. The methodology used methylation chip data from The Cancer Genome Atlas (TCGA) database to examine 377 Hepatocellular carcinoma (HCC) patients and 50 normal samples. A total of 47,099 samples were examined for 134 methylation locations using the SVM-RFE, Cox regression, and Frank-Wolfe (FW)-SVM algorithms. This technique predicted patient survival rates based on the assessment of the model's high, moderate, and low-risk categories.

Azam Orooji et al. [14] studied hepatitis C, a liver-attacking virus with a significant death rate. Their study concentrates on tackling the imbalanced dataset issue. The proposed strategy makes use of random over- and under-sampling techniques. When the oversampling method was combined with the RF method, the accuracy of the results was at its highest. G. Shobana et al. [15] investigated the recursive feature elimination technique for feature reduction to improve prediction accuracy. Simple machine learning models were applied to the dataset, and the findings revealed that multi-layer perceptrons and logistic regression provided higher prediction accuracy with fewer features.

In order to choose the right number of features from SVM-RFE, Xiaohui Lin et al. [16] suggested a method known as SVM-RFE-OA (Overlapping Ratio) that integrates the performance of the classifier and the collection period of the data. A modified SVM-RFE-OA technique is suggested to temporally filter off the information occurring in heavily corresponding pixels in each iteration to estimate the feature weights between iterations more precisely. The liver condition of the patients in the test set was examined by Tsehay Admassu Assegie et al. [17]. The ideal feature set was used to train the SVM during the pre-processing phase, and the RF model was used to reduce repetitive features. The experimental findings demonstrate improved accuracy for the proposed SVM model.

Assegie et al. [18] developed a predictive training model of liver disease using SVM and KNN learning techniques, and the performance of the approach was assessed using an Indian liver disease data source. The outcome analysis reveals improved SVM accuracy. It has been shown that SVM outperforms the KNN algorithm for predicting liver disease based on the accuracy ratings of SVM and KNN on the analysis results. The analysis of patient characteristics and genome expression by S. Sontakke et al. [19] aims to enhance the detection of liver illnesses. The molecular biology approach is influenced by factors such as age, ethnicity, and diet. The chemical method of forecasting is more reliable. Most likely, molecular biology research can save lives while illuminating the mysteries of the human anatomy.

Compared to earlier studies on liver disease, the decision tree-based system used by Moloud Abdar et al. [20] produced accurate predictions while considering more factors. The dataset includes 167 records for healthy livers and 416 records for liver diseases. It is examined by two algorithms, Chi-square automatic interaction detection (CHAID) and Boosted C5.0, which are frequently used to pinpoint risk factors for liver disease. The findings demonstrate that both algorithms significantly affect the prediction of liver illness based on the rules they produce. According to Marwa I. M. et al. [21], liver tumours can be classified as benign or malignant by examining CT liver images with the adaptive neuro-fuzzy inference system (ANFIS) model. The decision-making approach involved four steps. First, the liver is extracted using thresholding, image enhancement to boost image quality, and boundary extraction methods. Then, the interior of the tumour object is segmented using the Fuzzy C-mean (FCM) clustering technique, and Discrete Wavelet Transformation features are extracted. Finally, the ANFIS classifier is trained on these extracted features using the least squares strategy and the backpropagation gradient descent method. A series of patient CT images was used to assess the effectiveness of the suggested technique.

Farokhzad et al. [22] proposed an approach for diagnosing liver illness using fuzzy logic, based on crucial laboratory values. Fuzzy heuristic systems are built using two different variants of membership functions, triangular and Gaussian. By carefully selecting their input parameters, the number of membership functions, and the kind of membership functions, they were able to achieve an accuracy of 83 percent. Marium Mehmood et al. [23] examine feature selection techniques that provide a suitable selection of features, which makes it easier to identify illnesses. These methods demonstrate their value for data mining and machine learning. According to the study, the wrapper method has the highest R2-Score, making it the best at identifying traits that are crucial for disease detection. A higher R2-Score and a lower MSE signify more accurate disease detection.

Sampling-Continuous Re-RX was introduced as a revolutionary technique by the authors Y Hayashi et al. [24] for developing highly precise and understandable rules for the British United Provident Association (BUPA) and Hepatitis datasets. They demonstrated an extracted rule set from the BUPA dataset and offered a healthcare information explanation of the found rules. Since the suggested approach was close to the trade-off curve, it was more precise and easy to understand, making it more suitable for use in medical decision-making. Padmakala et al. [25] suggested using a group SVM-based sample weighted RF with a brand-new improved colliding body optimization (NICBO) method to identify liver illnesses. The patient data are pre-processed using the ELTA technique for collection, packing, modification, and evaluation. It combined the appropriate model and the filter-based procedures, resulting in the relevant feature.

The automated method of Admassu et al. [17] for diagnosing liver disease makes use of SVM and RF detection methods. The proposed hybrid model based on SVM and RF successfully diagnoses liver illness in the test set. The SVM is trained using the ideal feature set during the pre-processing phase, and the RF model is used for recurrent feature reduction. The results of the experiment show that the proposed SVM model has an accuracy rate of 78.3%. Abdalrada et al. [35] dealt with the issue of predicting the progression of liver disease. A predictive model based on logistic regression was utilised to estimate the likelihood of developing liver disease.

The patient's liver condition is examined using machine learning techniques by Tokala et al. [36]. This work used the proportion of people who get the condition as both positive and negative data. Various ML classifiers and confusion matrix were used to process the percentages of liver disease. According to Madhusudan et al. [37], the main driving force behind the effort was to put a machine learning (ML) based real-time framework for classifying liver illnesses onto the cloud in order to lighten the workload of clinicians. Convolutional neural networks (CNN) were used, and their output from the flatten layer was then delivered to classifiers. The performance of the model was assessed using the stratified K-fold approach.

Various artificial intelligence algorithms are studied by Khan et al. [38], in order to identify the presence of liver disease in a patient at an early stage. Sensitivity analysis was conducted on the dataset to look at how each attribute affects how well the model performs. It was shown that the Alanine Aminotransferase characteristic has the greatest influence on the prognosis of liver illness and is employed as a support system for the early detection of liver disease. A deep-learning approach was presented by Sun et al. [39] for the classification of histological images of liver cancer. Patch features are extracted and completely utilised to compensate for the lack of comprehensive cancer region annotations in those images. To obtain the image-level features for classification, transfer learning is paired with multiple-instance learning to provide the patch-level features.

2.1 Feature Selection

Feature selection algorithms can remove redundant and irrelevant attributes from the dataset without affecting accuracy, and can thereby enhance the classification performance of learning models. A feature selection algorithm distinguishes crucial characteristics from less significant ones. Feature selection also decreases the dimensionality of the training and testing data points. Its benefits include reduced overfitting, shorter training times, and higher accuracy. These techniques can aid in identifying key features that can be utilized to classify various liver diseases [55,56,57,58].

Ruhul et al. [40] presented a system that generates a feature space by considering the covariance between observed variables, maximum class separation, and a linear combination of observed variables. Various statistical techniques were also applied to handle missing values, outliers, and data balancing to prevent bias and overfitting. Kumar et al. used the neighbourhood-weighted K-NN (NWKNN), fuzzy neighbourhood-weighted K-NN, and variable neighbourhood-weighted fuzzy K-NN (NWFKNN) classifiers to categorise liver patients [41, 43]. Tomek link and redundancy-based under-sampling (TR-RUS) is employed to address the imbalanced nature of the dataset, and the authors claim an improved accuracy of 87.71% for the NWFKNN classifier. A feature extraction approach has been employed to increase prediction performance by Salau et al. [42]. The authors claim that the novel feature union prediction algorithm outperforms the existing classification algorithms in terms of accuracy and F1 score.

According to Admassu et al., the classification performance of machine learning models is enhanced by feature selection. This work makes use of a multivariate sample similarity metric for feature selection and chooses features that significantly contribute to the model [55]. By applying dimensionality reduction strategies, authors Ruhul Amin et al. investigated enhanced feature extraction systems for liver patient classification using statistical machine learning techniques. The system retrieved an improved feature space that takes into consideration the covariance between the observed variables, the linear combination of observed variables that maximizes class separation, and the maximum variation in the data. To deal with missing values, outliers, and data balance to prevent bias and overfitting, various robust statistical methods were applied [56].

Filter technique, wrapper approach, and embedding method were three different feature selection algorithms that authors Shruthi Jain et al. explored. The filter method is a pre-processing method for obtaining the greatest qualities. Highly ranking qualities are prioritized and used as predictors in this method. Predictors from the Wrapper Method are combined with a search algorithm that selects a subset and assigns the best possible predictor to that subset [57]. Finding the most pertinent and instructive subset of features in a given dataset is the aim of feature selection, according to the paper in [58]. This strategy helps to lessen dimensionality, improve model performance, and decrease the curse of dimensionality. The goal of this study is to create a brand-new framework that is snake-optimized. Five machine learning algorithms are used, along with the snake optimization (SO) method, to choose and categorize the best medical data, resulting in a highly accurate prediction of kidney and liver disease [58]. State of the art on liver disease prediction techniques has been summarized in Table 1.

Table 1 State of the art on liver disease prediction techniques

Numerous researchers have demonstrated their work on predicting and classifying liver diseases using machine learning and deep learning algorithms. A few works concentrate on feature selection and grid search algorithms for the better selection of features from the dataset. However, many research studies have not concentrated on the various combinations of features and their importance, feature ranking, and statistical analysis of the features for the prediction and classification of liver disease. This work analyses the features in terms of different factors, combinations, and priorities, and ranks them with the support of statistical analysis. Thus, this research work intends to predict and classify liver diseases by employing a recursive feature selection algorithm, feature ranking, and a Gaussian kernel-based support vector machine learning algorithm. The main contributions of this work are given as follows.

  • This work proposes the recursive Gaussian support vector machine-based feature selection (RG-SVM) algorithm.

  • The feature importance and feature ranking have been evaluated for the disease classification with the support of the Gaussian kernel-based SVM algorithm.

  • The results of this approach have been compared with the other existing algorithms, and performances and error metrics were evaluated.

  • The results of the proposed algorithm will be useful for physicians to make better decisions for liver disease patients.

Predicting and diagnosing liver disease early and more accurately is the main challenge and the subject of much ongoing research. The main contribution of this paper is a novel recursive feature selection algorithm developed with a Gaussian-based SVM algorithm for liver disease prediction.

3 Proposed Methodology

The dataset for the classification of liver diseases has been taken from the Indian Liver Patient Records collected from the north-east of Andhra Pradesh, India [54]. It contains liver disease details for 583 patients with the features age, gender, total bilirubin, direct bilirubin, alkaline phosphatase, alanine aminotransferase, aspartate aminotransferase, proteins, albumin, albumin and globulin ratio, and class. Table 2 gives sample patient details from the Indian Liver Patient dataset. Figure 1 illustrates the swarm plot of gender with respect to age.

Table 2 Indian Liver Patient dataset
Fig. 1 Gender over Age

Imbalanced classification involves developing a predictive model on a dataset with a severe class imbalance, which in turn leads to poor performance on the minority class. So, in this proposed work, an oversampling approach called the Synthetic Minority Oversampling Technique (SMOTE) has been used to address the issue of imbalanced datasets. The simplest oversampling approach involves duplicating examples in the minority class, although this does not add any new information to the model. Instead, SMOTE synthesizes new examples from the existing examples. Figure 2 depicts the imbalanced dataset before applying the SMOTE algorithm to the dataset. It can be clearly seen that patients without liver disease are the minority class. Figures 3 and 4 show the alkaline phosphatase and alanine aminotransferase distributions over age.
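The interpolation idea behind SMOTE can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the function names `k_nearest` and `smote`, the neighbour count `k`, the seed, and the two-dimensional toy points are all assumptions made for the example.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def k_nearest(target: Point, others: List[Point], k: int) -> List[Point]:
    """Return the k points in `others` closest to `target` (Euclidean distance)."""
    return sorted(others, key=lambda p: math.dist(target, p))[:k]


def smote(minority: List[Point], n_new: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority samples by interpolating between neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = k_nearest(base, [p for p in minority if p is not base], k)
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class points (two features each), for illustration only.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=2)
print(new_points)
```

Because every synthetic point lies on a segment between two existing minority points, the new examples stay inside the region already occupied by the minority class instead of being exact duplicates.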

Fig. 2 Imbalanced Class

Fig. 3 Alkaline Phosphatase over Age

Fig. 4 Alanine Aminotransferase over Age

3.1 Recursive Feature Selection Algorithm and Ranking

The recursive Gaussian SVM-based feature selection algorithm selects the predictors in reverse, by backward elimination. The first step in this technique is to create a model using all of the predictors and determine the relevance of each predictor. The least significant predictor(s) are then discarded, the model is rebuilt, and the significance scores are calculated once more. In practice, the analyst defines the number of predictor subsets to be evaluated and the size of each subset. As a result, the subset size is the tuning parameter of recursive feature elimination. Based on the importance rankings, the predictors are chosen using the subset size that optimises the performance criteria. The final model is then trained using the best subset [26].
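The elimination loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `recursive_elimination`, the toy `score_fn`, and the feature labels are hypothetical, and the toy score stands in for the performance of the Gaussian-kernel SVM, which is not reproduced here. The sketch removes, at each step, the feature whose removal leaves the best score, which is one common way of deciding which predictor is least significant.

```python
from typing import Callable, List, Sequence, Tuple


def recursive_elimination(
    features: Sequence[str],
    score_fn: Callable[[Tuple[str, ...]], float],
    n_keep: int,
) -> Tuple[List[str], List[str]]:
    """Backward elimination. Returns (kept features, eliminated features in order)."""
    remaining = list(features)
    eliminated: List[str] = []
    while len(remaining) > n_keep:
        # Score every candidate subset obtained by dropping exactly one feature.
        candidates = [
            (score_fn(tuple(f for f in remaining if f != drop)), drop)
            for drop in remaining
        ]
        # The least useful feature is the one whose removal leaves the best score.
        _, least_useful = max(candidates, key=lambda c: c[0])
        remaining.remove(least_useful)
        eliminated.append(least_useful)
    return remaining, eliminated


# Toy stand-in for model performance: each feature adds a fixed amount (assumed values).
toy_weights = {"a": 0.5, "b": 0.1, "c": 0.3, "d": 0.05}


def toy_score(subset: Tuple[str, ...]) -> float:
    return sum(toy_weights[f] for f in subset)


kept, order = recursive_elimination(["a", "b", "c", "d"], toy_score, n_keep=2)
print(kept, order)
```

In a real pipeline `score_fn` would retrain the classifier on the given subset and return a validation metric, so the model is rebuilt after every elimination as the text describes.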

Feature ranking involves ordering the features in accordance with the outcome of a scoring function, which typically measures the relevance of the features. A score S(fi) is computed for every feature fi from the training data. Features with a high S(fi) are regarded as beneficial, and the score is evaluated for all ten liver features. The feature selection method based on variable ranking then chooses the k highest-ranked features according to S(fi). It simply requires the calculation and sorting of n scores, which is computationally efficient [27]. Figure 5 illustrates the feature ranking for all ten liver features. Recursive feature selection retains three features, which are depicted in Fig. 6. The optimal features are Alkaline_Phosphotase, Alamine_Aminotransferase, and Aspartate_Aminotransferase. Figure 7 gives the schematic diagram of the proposed RG-SVM methodology. The liver dataset is pre-processed with the support of the SMOTE algorithm, and then a recursive feature selection, extraction, and ranking algorithm is employed. As a result, the data is transformed into a feature vector. The statistical test on the features is performed using the chi-square test, and finally, the optimal features are selected for the liver disease classification.
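As a rough illustration of score-based ranking, the sketch below uses a chi-square statistic computed from a contingency table as the score S(fi) and keeps the k highest-scoring features. How the chi-square test is applied in this work is not specified, so the pairing of the two steps, the function names, the feature labels, and the counts are all assumptions made for the example.

```python
from typing import Dict, List, Sequence


def chi_square_statistic(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the names of the k highest-scoring features."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical 2x2 tables: rows = feature low/high, columns = class 0/1.
tables = {
    "feature_a": [[30, 10], [10, 30]],
    "feature_b": [[20, 20], [20, 20]],
    "feature_c": [[25, 15], [15, 25]],
}
scores = {name: chi_square_statistic(t) for name, t in tables.items()}
selected = top_k(scores, k=2)
print(scores, selected)
```

Only n scores are computed and sorted, which is why variable ranking is cheap compared with methods that retrain a model for every candidate subset.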

Fig. 5 Liver Feature Ranking

Fig. 6 Features selected for liver disease prediction

Fig. 7 Schematic diagram of Proposed RG-SVM methodology

3.2 Gaussian-based Support Vector Machine Algorithm

A promising classification method for identifying a complex condition like liver disease using widespread, straightforward data is support vector machine modelling. In circumstances where sample sizes are limited and there are many variables present, the SVM technique, which is data-driven and model-free, may offer significant discriminative potential for classification. Recent advances in disease detection techniques and automated disease classification have both benefited from the usage of this technology [28,29,30,31]. SVM with Gaussian kernel function has been developed for the nonlinearly separable liver dataset. The Gaussian kernel function enables the separation of nonlinearly separable liver dataset by converting the input vector to a Hilbert space representation. The Gaussian kernel is an exponential function that includes the real constant and norm, as given in Eq. (1).

$${K}_{G}\left(u,v\right)=exp\left(-\frac{{\Vert u-v\Vert }^{2}}{2{\rho }^{2}}\right)$$
(1)

where u and v are input vectors. The numerator of the exponent is the squared Euclidean norm of the difference between the input vectors, and \(\rho\) in the denominator is a freely chosen real constant that sets the width of the kernel. The Gaussian kernel function exhibits hyper-spherical contours because of its exponential decay in the input feature space and uniform distribution around the support vector. Searching for a suitable width experimentally is an iterative, time- and energy-intensive process. Therefore, one of the key solutions to SVM classification issues would be the creation of an effective technique for adjusting to an ideal width for the data.
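Equation (1) translates directly into code. The sketch below is a plain Python rendering of the kernel, with the function name `gaussian_kernel` and the example vectors chosen here purely for illustration.

```python
import math
from typing import Sequence


def gaussian_kernel(u: Sequence[float], v: Sequence[float], rho: float) -> float:
    """K_G(u, v) = exp(-||u - v||^2 / (2 * rho^2)), following Eq. (1)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-squared_distance / (2.0 * rho ** 2))


# Identical vectors give the maximum similarity of 1.
same = gaussian_kernel([1.0, 2.0], [1.0, 2.0], rho=1.0)

# For a fixed pair of vectors, a larger width rho gives a slower decay.
narrow = gaussian_kernel([0.0, 0.0], [1.0, 1.0], rho=1.0)
wide = gaussian_kernel([0.0, 0.0], [1.0, 1.0], rho=3.0)
print(same, narrow, wide)
```

The comparison between `narrow` and `wide` shows why the width matters: a small value makes the kernel respond only to very close points, while a large value makes distant points look similar.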

4 Results and Discussion

The proposed recursive Gaussian SVM-based feature selection algorithm has been developed and evaluated on a system configured with 6 GB RAM, an Intel i3 processor, and Python libraries. The liver dataset was divided into training and testing sets of 70% and 30%, respectively, with 10-fold cross-validation. The results of the proposed work are analysed using accuracy, mean square error (MSE), precision, recall, sensitivity, specificity, confusion matrix, and area under the receiver operating characteristic curve (AUROC).
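One possible reading of this protocol is a 70/30 hold-out split followed by 10-fold cross-validation on the training portion. The sketch below illustrates that reading with index lists only. The helper names `hold_out_split` and `k_fold`, the seed, and the decision to cross-validate on the training part are assumptions, since the text does not say how the two steps were combined.

```python
import random
from typing import List, Tuple


def hold_out_split(n: int, test_fraction: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle the indices 0..n-1 and split them into (train, test)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(indices: List[int], k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, validation) pairs whose validation parts partition `indices`."""
    folds = [indices[i::k] for i in range(k)]
    return [
        ([idx for j, fold in enumerate(folds) if j != i for idx in fold], folds[i])
        for i in range(k)
    ]


# 583 records, as in the Indian Liver Patient dataset described in Section 3.
train_idx, test_idx = hold_out_split(583, 0.30)
splits = k_fold(train_idx, 10)
print(len(train_idx), len(test_idx), [len(val) for _, val in splits])
```

A stratified variant that preserves the class ratio in every fold would usually be preferable for imbalanced data, but it is omitted here to keep the sketch short.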

Precision is the proportion of instances predicted as positive by the proposed model that are actually positive, whereas recall (also known as sensitivity) is the proportion of actual positive instances that are correctly classified. Both measures are therefore based on relevance, and the F-score combines them. Mean squared error is a common evaluation statistic. The mean squared error of a model with respect to the test set is the average of the squared prediction errors over all test set instances. The prediction error is the difference between the actual value and the value predicted by the model, as given in Eq. (2).

$$MSE=\frac{1}{n}\sum_{i=1}^{n}{\left({y}_{i}-\lambda \left({x}_{i}\right)\right)}^{2}$$
(2)

where n is the number of test instances, ‘yi’ is the actual target value for the test instance ‘xi’, and \(\lambda\)(xi) is the target value predicted for the test instance xi [32, 33].
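The definition above, the average of the squared differences between actual and predicted target values, can be written as a short function. The name `mean_squared_error` and the example labels are chosen here for illustration only.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


# Hypothetical labels for four test instances; two predictions are wrong.
example_mse = mean_squared_error([1, 0, 1, 1], [1, 1, 1, 0])
print(example_mse)
```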

A confusion matrix is a predictive analytics tool that tabulates predicted class labels against actual class labels and is used to examine the efficacy of a machine-learning classifier. The confusion matrix is applied when the classifier's result includes two or more categories. Confusion matrices are used to derive the key predictive parameters, including recall, specificity, accuracy, and precision [34]. Figure 8 gives the accuracy of the algorithms, where the logistic regression (LR), decision tree (DT), k-nearest neighbour (KNN), Naïve Bayes (NB), and proposed RG-SVM algorithms are compared. The algorithms LR, DT, KNN, NB, and proposed RG-SVM have accuracy values of 73, 80, 81, 54, and 93%, respectively. These values show that the proposed RG-SVM, with the support of a recursive feature selection algorithm, outperformed the other existing algorithms, with an accuracy that is 12 to 39 percentage points higher.
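The way these parameters follow from the four cells of a binary confusion matrix can be sketched as below. The function names and the toy labels are assumptions made for the example, and the labels are not taken from the liver dataset.

```python
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["tn" if y == 0 else "fn"] += 1
    return counts


def ratio(numerator: float, denominator: float) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Derive the usual metrics from the confusion-matrix counts."""
    precision = ratio(c["tp"], c["tp"] + c["fp"])
    recall = ratio(c["tp"], c["tp"] + c["fn"])  # also called sensitivity
    return {
        "accuracy": ratio(c["tp"] + c["tn"], sum(c.values())),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(c["tn"], c["tn"] + c["fp"]),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical labels for eight test instances.
actual = [1, 1, 1, 1, 0, 0, 0, 1]
predicted = [1, 1, 0, 1, 0, 1, 0, 1]
counts = confusion_counts(actual, predicted)
metrics = classification_metrics(counts)
print(counts, metrics)
```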

Fig. 8 Accuracy of algorithms

The values 0.32, 0.34, 0.3, 0.38, and 0.18 are the mean square errors for the LR, DT, KNN, NB, and RG-SVM algorithms, respectively, shown in Fig. 9. The MSE of the proposed RG-SVM algorithm is 0.12 to 0.20 lower than those of the compared LR, DT, KNN, and NB algorithms. The sensitivity and specificity scores for the LR, DT, KNN, NB, and RG-SVM algorithms are depicted in Figs. 10 and 11, respectively. As shown in Fig. 10, the sensitivity scores are 74, 70, 73, 91, and 96 for the LR, DT, KNN, NB, and RG-SVM algorithms. Similarly, the specificity scores are 64, 26, 45, 43, and 98 for the LR, DT, KNN, NB, and RG-SVM algorithms. Taken together, Figs. 8, 9, 10, and 11 indicate that the proposed RG-SVM algorithm outperforms the compared algorithms on all performance metrics. The main reason for the improvement of the RG-SVM algorithm is that the features are selected recursively for liver disease prediction.

Fig. 9 MSE values of algorithms

Fig. 10 Specificity Score of Algorithms

Fig. 11 Sensitivity Score of Algorithms

Figures 12, 13, 14, and 15 give the confusion matrices for the LR, DT, KNN, and proposed RG-SVM algorithms. The RG-SVM's functions change degree and alter width simultaneously, which improves the classification performance. Moreover, the influence of these functions produces better results in the confusion matrix of the proposed RG-SVM algorithm than in those of the other algorithms. The RG-SVM determines the feature weights while building the learning model. It gradually removes the attributes with the smallest weights. The order in which the features are deleted provides an approximation of the ordering of feature importance. In essence, the recursive SVM ranks the features based on the sequence in which they were eliminated during the iterations. The most important features are those at the top of the list, which were eliminated in the final iterations, as opposed to the least informative features at the bottom of the list, which were eliminated in the first iteration. Figure 16 presents the AUROC graph for the LR, DT, KNN, and proposed RG-SVM algorithms. Table 3 gives the comparative analysis of performance metrics of various existing algorithms with the proposed RG-SVM algorithm. Metrics such as accuracy, precision, recall, F1-score, MSE, sensitivity, specificity, and false positive rate are compared for the LR, DT, KNN, NB, MLP, Ensemble Classifier, Random Forest, and proposed RG-SVM algorithms. In Table 3, boldface indicates the proposed RG-SVM algorithm, which outperforms the other methods. According to our evaluation, the proposed Gaussian kernel-based SVM with a recursive feature selection algorithm (RG-SVM) appears to be more effective than the other models.
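For readers unfamiliar with the AUROC reported in Fig. 16, the quantity can be computed without plotting the curve: it equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below uses this pairwise formulation. The function name `auroc` and the toy scores are assumptions for illustration and are unrelated to the reported results.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for four instances.
example_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_auc)
```

The pairwise loop is quadratic in the number of instances, which is acceptable for a dataset of a few hundred records.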

Fig. 12 Confusion matrix of LR

Fig. 13 Confusion matrix of DT

Fig. 14 Confusion matrix of KNN

Fig. 15 Confusion matrix of RG-SVM

Fig. 16 AUROC Chart

Table 3 Comparative Analysis of Existing Algorithm’s Performance Metrics

4.1 SHAP Result Analysis

The SHAP (SHapley Additive exPlanations) method measures the contribution of each input variable to a model's predictions. The force plot is a way to assess the impact of each feature on liver disease prediction. The model's score for liver disease prediction is shown in boldface (0.65) in the force plot in Fig. 17. Lower scores cause the model to predict 0, whereas higher scores cause it to predict 1. Red represents features (Age, Aspartate_Aminotransferase, Albumin, Alamine_Aminotransferase) that pushed the model score higher, and blue represents features that pushed the score lower. These features were crucial to predicting liver disease. The features closest to the boundary between red and blue had the greatest influence on the prediction, and the size of each bar reflects the magnitude of that feature's impact.
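The additive structure behind a force plot can be illustrated with an exact Shapley computation on a tiny model. This sketch does not use the SHAP library and is not the explainer applied in this work: it enumerates all feature subsets, replaces absent features with a baseline value, and uses a hypothetical three-feature linear model, all of which are assumptions made for the example.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of each feature for one instance x."""
    n = len(x)

    def value(subset: set) -> float:
        # Features in the subset take the instance value, the others the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        contributions.append(total)
    return contributions


def toy_model(features: Sequence[float]) -> float:
    """Hypothetical linear scoring model used only for this illustration."""
    weights = [2.0, -1.0, 0.5]
    return 0.1 + sum(w * f for w, f in zip(weights, features))


instance = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
print(phi, toy_model(baseline) + sum(phi), toy_model(instance))
```

The last line shows the additivity property that a force plot visualises: the baseline score plus the sum of the feature contributions equals the model score for the instance. Positive contributions correspond to the red bars and negative ones to the blue bars.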

Fig. 17 SHAP Force Plot

The summary plot in Fig. 18 combines the importance and the impacts of the features. The link between a feature's value and its influence on the prediction can be seen in the summary plot. However, SHAP dependence plots are needed to see the precise shape of the relationship. Each point on the summary plot is the Shapley value for one feature and one instance. The feature determines the position on the y-axis, while the Shapley value determines the position on the x-axis. The colour denotes the value of the feature, from low to high. As overlapping points are jittered in the direction of the y-axis, we can see how the Shapley values are distributed for each feature. The features are ordered according to their importance. Figure 19 shows the dependence plot for Age and Aspartate_Aminotransferase.
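A common way to obtain such an importance ordering is to rank features by the mean absolute Shapley value across instances. The sketch below assumes that convention, and the Shapley values in it are invented purely to show the calculation. They are not results from this study.

```python
from typing import List, Sequence, Tuple


def rank_by_mean_abs(
    shap_rows: Sequence[Sequence[float]], names: Sequence[str]
) -> List[Tuple[str, float]]:
    """Order features by the mean absolute Shapley value over all instances."""
    n_rows = len(shap_rows)
    mean_abs = {
        name: sum(abs(row[j]) for row in shap_rows) / n_rows
        for j, name in enumerate(names)
    }
    return sorted(mean_abs.items(), key=lambda item: item[1], reverse=True)


# Invented Shapley values: one row per instance, one column per feature.
names = ["Age", "Albumin", "Aspartate_Aminotransferase"]
shap_rows = [
    [0.10, -0.02, 0.30],
    [-0.20, 0.01, 0.25],
    [0.05, -0.03, -0.35],
]
ranking = rank_by_mean_abs(shap_rows, names)
print(ranking)
```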

Fig. 18 SHAP Summary Plot

Fig. 19 SHAP Dependence Plot

5 Conclusion and Recommendation for Future Work

This section elaborates on the conclusion and recommendations for future work.

5.1 Conclusion

In this work, a recursive Gaussian support vector machine-based feature selection (RG-SVM) algorithm has been proposed to predict liver disease. The RG-SVM determines the feature weights while building the learning model. It gradually removes the attributes with the smallest weights. The order in which the features are deleted provides an approximation of the ordering of feature importance. The proposed RG-SVM, with the support of the recursive feature selection algorithm, has outperformed the other existing algorithms. Its accuracy is 12 to 39 percentage points higher, and its MSE is 0.12 to 0.20 lower, than those of the LR, DT, KNN, and NB algorithms. The proposed RG-SVM algorithm produces a sensitivity score of 96%, which is 5 to 26 percentage points higher than the scores of the LR, DT, KNN, and NB algorithms, which are 74, 70, 73, and 91, respectively. Similarly, the specificity scores are 64, 26, 45, 43, and 98 for the LR, DT, KNN, NB, and RG-SVM algorithms. The specificity of the RG-SVM algorithm is therefore 34 to 72 percentage points higher than that of the existing algorithms. Our study showed that a simple model based on RG-SVM could identify liver disease patients with a high degree of accuracy. As a result, physicians can use the features chosen in this work to diagnose the disease phenotype and disease process.

5.2 Recommendation for Future Work

Other factors that affect the liver, such as smoking and alcohol intake, will be considered to obtain better prediction results. The model will be further enhanced to predict liver biopsy or decrease the need for it among liver disease patients.