1 Introduction

The liver is the body’s heaviest internal organ and carries out many essential functions [1]. It supports digestion, blood purification, blood toxicity control, bilirubin clearance, metabolism, and the conversion of harmful ammonia to urea. Various liver diseases are reported worldwide, including non-alcoholic fatty liver disease (NAFLD) [2] and hepatocellular carcinoma (HCC) [3]. Once the liver is diseased, the patient’s health may be seriously affected. Liver disease can be diagnosed from many health indicators and with many types of equipment [4,5,6]. The main causes of these diseases include alcohol, obesity, and diabetes. When people consume alcohol, the liver diverts itself from its other activities and concentrates primarily on breaking down the alcohol [7]. Fat may accumulate around the liver of overweight people, which is a cause of fatty liver disease (FLD). FLD has been described as a reversible disease that is treatable when caught early [1]. Fat build-up in the liver is linked to metabolic disorders such as obesity, high blood pressure, and insulin resistance, and it increases the risk of heart complications and death [8]. Diabetic patients who use insulin have a 50% higher risk of liver disease. Common liver disorders include hepatitis [9], cirrhosis [10, 11], and liver cancer [6]. Long-term cirrhosis and FLD may lead to benign or malignant formations in the liver, and accurate early assessment of these conditions may improve treatment results [1]. A diagnostic test is needed to detect such diseases because they often show no visible symptoms [12]. Various traditional methods are available to detect liver disease.

ML, IoT, and cloud computing have shown positive growth in the healthcare domain, where they assist physicians and reduce the pressure of diagnosis. ML plays a pivotal role in disease identification using related medical care datasets. ML learns from the data provided and estimates results from it [13]. With suitably trained algorithms, ML can be described as a crucial tool in fields such as medicine, data management, and surveillance [14]. It analyses various characteristics and records from the patient’s laboratory tests and, based on an appropriate learning strategy, predicts whether a patient has an illness. Disease severity can also be predicted by analyzing the results. Automatic learning can assist healthcare analysts with precision medicine [15]. Information obtained from completed studies, patient demographics, medical records, and other sources can be used to establish a suitable and efficient learning model. Several factors distinguish traditional techniques from ML techniques for disease prediction. ML offers strong analytical, visualization, and predictive capabilities for various data types. Because of this broad applicability in the healthcare sector, an ML model can be built to analyze, visualize, and forecast many disease types [16]. ML combined with cloud computing is an advancing technological paradigm with a wide variety of economically complex and independent computational frameworks [17]. The cloud helps to maintain up-to-date health records that can be accessed from anywhere. Traditional techniques for treating liver problems have several drawbacks [18]. Here are a few examples:

  • A significant volume of medical data is generated, but there are far fewer competent observers to interpret this data. Furthermore, a physician may or may not be knowledgeable in analyzing various forms of data and images.

  • Hidden patterns and links in vast amounts of medical data are sometimes dismissed as insignificant, and detecting them manually can be slow and unclear.

  • Many crucial features are not considered for proper forecasting, whereas ML techniques examine many features, which provides higher precision [16].

  • Liver biopsy is dangerous and frequently misinterpreted by different observers.

  • Because of organ shortages, liver transplantation in hepatocellular carcinoma (HCC) is rarely performed, so other adequate liver resection (LR) treatments are prioritized. However, this can result in a recurrence of the disease in high-risk people.

  • A single biomarker is insufficient to predict disease. As a result, it is critical to use a combination of biomarkers to improve diagnostic accuracy. Furthermore, biomarkers routinely utilized for disease diagnosis may produce erroneous results.

  • Early disease detection can save the liver from any more damage and protect the patient from serious illnesses. There are several non-invasive detection approaches. However, they frequently lack precision due to faulty blood-marker testing and may include expensive imaging techniques.

  • Surgical mortality is often underestimated, resulting in illness recurrence.

  • Clinical diagnosis by doctors can be less reliable and is time-consuming.

In this work, a cloud-based system is developed to classify liver diseases. The paper is organized into an introduction, related work, materials and methods, results, discussions, and conclusions with future work. The main contributions of this work include:

  • Proposing an efficient CNN framework appropriate for liver disease classification by selecting suitable parameters and hyperparameters.

  • Various optimizers, such as Adam and RMSprop, were tested, and the most effective model was obtained with the Adam optimizer.

  • The improved CNN model was employed to diagnose liver disease with a training accuracy of 70.80% and validation accuracy of 74.58%.

  • Implementing CNN in conjunction with LR, RF, and SVM classifiers.

  • CNN-extracted features are fed into the LR, RF, and SVM classifiers.

  • Two feature selection methods, ETC and MRMR, are used to select the best features.

  • Using a stratified K-fold approach to evaluate the suggested method.

  • CNN-LR, CNN-RF, and CNN-SVM performance comparison.

  • The CNN-RF model performs well, achieving 100% precision for all K values.

  • The effective model has been deployed to the PaaS cloud.

  • The model’s size, which is critical for deploying the model to the embedded platform and cloud, is also reported.

1.1 Related work

Expanding access to hidden features in medical data sets enables ML or Deep Learning (DL) algorithms to detect liver problems. Various data sets, including liver function tests, histologically stained slide images, and molecular markers in blood or tissue, have been used to train ML or DL models that accurately predict liver illness. The accuracy of the ML algorithms reported in prior works was assessed using a combination of the confusion matrix, the area under the receiver operating characteristic curve (ROC-AUC), and K-fold cross-validation. Ramesh et al. [15] proposed ML algorithms such as SVM, KNN, Decision Tree (DT), Naïve Bayes (NB), and Random Forest (RF) to predict liver disease and achieved 88.5%, 85.5%, 88.1%, 59%, and 89%, respectively. Tanwar et al. [18] discussed various research articles with ML frameworks and suggested future work. Jaganathan [19] proposed SVM with feature selection techniques to extract the optimal subset of descriptors as modelling features to improve the prediction performance and achieved an accuracy of 81.1%, a sensitivity of 84%, and a specificity of 78.3%. Thirunavukkarasu et al. [20] proposed SVM, K-Nearest Neighbor (KNN), and LR algorithms to classify liver diseases. Razali et al. [21] proposed Neural Networks and Bayes Point Machines for liver disease prediction and achieved 66.85% and 70.52% accuracy. Ayeldeen et al. [22] offered a DT approach to predict liver fibrosis stages. Furthermore, Belavigi et al. [23] proposed the Stochastic Average Gradient (SAG) model, Resilient Backpropagation Neural Network, and Convolutional Neural Network (CNN) models to predict liver diseases in the early stage and achieved acceptable performance. Kumar et al. [24] proposed K-Means, RF, NB, KNN, and C5.0 to detect liver disorders. Hashem et al. [25] proposed an ML model to predict liver fibrosis in chronic hepatitis-C patients. Vats et al. [26] suggested an ML framework using Density-Based Spatial Clustering of Applications with Noise (DBSCAN), K-Means, and Affinity Propagation to predict liver diseases. Kuppan et al. [27] proposed ML models like NB, Decision Table, and J48 to classify liver disease. Pasha et al. [28] proposed Logit Boost, Bagging, Adaboost, and Grading meta-learning algorithms to classify liver diseases using a dataset from the UCI repository. Baitharu et al. [29] suggested ML methods like NB, IBk, DT, ZeroR, ANN, and VFI to detect liver cancer illness with 71.59% accuracy. Sontakke et al. [30] proposed computational methods to improve the diagnosis of liver diseases.

Singh et al. [31] give a coherent and detailed evaluation of the efforts made in forecasting liver failure, concentrating on several ML approaches established by numerous authors and on the measured results. That review also discussed the datasets utilized by multiple authors to forecast liver illness. Pasha et al. [32] presented an ML-based screening method to identify liver disease utilizing RF, LR, and SVM and compared their outcomes. Poonguzharselvi et al. [33] presented a classification framework using KNN, RF, SVM, Naïve Bayes, LR, and back propagation neural network (BPNN) to diagnose liver illness. RF with a Genetic algorithm reached the best accuracy of 84%. Yajurved et al. [34] utilized RF, LR, and SVM to determine CLD and obtained 76% accuracy and an F1 score of 84.78%. Keerthana et al. [35] suggested the LR technique for predicting liver illness with 76.3% accuracy and 0.678 AUC. Boosted C5.0 and CHAID algorithms offered by Abdar et al. [36] were utilized to detect risk factors for liver problems. According to this study, women are more likely than men to get liver problems. The C5.0 algorithm, using the Boosting technique, has an accuracy of 93.75%, which is higher than that of CHAID (65%).

The literature above shows that research on liver disease detection has taken several forms. Typically, data pre-processing methods are applied first, and liver diseases are then classified using several DL or ML models. There is also a range of studies on the early detection of liver diseases.

1.2 Motivation and purpose of the system

  • CNN, LR, RF, and SVM models were implemented to classify the disease using the ILPD dataset. Disease detection using ML is now a dominant research topic in most conferences and reputable journals. Multiple research groups develop these applications simultaneously and publish their results in a wide range of seminars, workshops, and publications. A software solution enables clinicians to identify liver illness on Android devices or other smartphones by entering the ILPD dataset features. Models such as CNN, LR, RF, and SVM were implemented in Python to predict liver illness with 100% precision. Experiments with feature selection were conducted by applying the ETC and MRMR methods to select the best features. This technique is helpful for liver diagnosis, predicting disease behaviour, evaluating anticipated conditions, and guiding the use of medicines. These studies can be used to learn new relationships and concepts.

2 Materials and methods

This research aims to present the performance analysis of the liver disease classification system. Figure 1 shows the framework of the classification system for liver diseases. The ILPD dataset of liver diseases was selected and used for this work. The dataset attributes were pre-processed before being applied to the models. This step replaces missing values, substitutes numerical values for strings, and converts the numerical class into a categorical one. The CNN model is implemented with two convolutional layers, two batch normalization layers, a flatten layer, one dropout layer, and three dense layers. The flatten layer output is given to the ETC and MRMR feature selection methods to select the best features. These best features were applied to the LR, RF, and SVM models to classify liver disease. The models were evaluated using the stratified K-fold method to compare their performance. Finally, in step 5, the best-performing model with a small size is deployed to the Heroku cloud.

Fig. 1 Framework of the proposed classification system for liver diseases

2.1 Datasets description

This paper used the ILPD dataset from the UCI repository, which has 583 rows (instances) and eleven attributes with a known output class [37]. The number of attributes and samples varies between datasets. The dataset contains numerical data, characters, and null values. A class attribute value of “1” means that a person has a liver illness, and a value of “2” means that a person has no liver illness. Table 1 lists the attributes of the ILPD database and their descriptions. In the dataset, 416 of the 583 instances have liver disease, and the remaining 167 are non-liver-disease instances. Figure 2a shows the total number of male and female patients in the ILPD dataset. Figure 2b shows the age range of the male and female patients; most patients fall approximately in the 30–60 age group.

Table 1 Dataset attributes list and ILPD dataset details

Fig. 2 (a) Total number of male and female patients in the ILPD dataset and (b) Age range of the male and female patients

2.2 Feature selection techniques

This paper compares two widely used attribute selection methods, ETC and MRMR, which are generally used to retrieve the most relevant features from datasets.

2.2.1 Extra Tree Classifier (ETC)

Feature selection is a data preprocessing approach that prepares data for various mining and ML applications, notably for high-dimensional data [38]. Feature selection makes models simpler and more straightforward, boosts ML performance, and yields clean and comprehensible data [38]. It removes extraneous characteristics and produces optimal feature sets for high-precision classification [38, 39]. ETC is a decision-tree-based learning method used for feature selection. ETC is similar to the Random Forest (RF) classifier, but it randomizes some decisions and data subsets to reduce over-learning from the data and over-fitting [38, 40]. As in RF, many trees are created, and the nodes are split using a random subset of features [41]. The best features are chosen based on information gain and entropy [42]. ETC aggregates several de-correlated decision trees in a forest and appears to perform better in the presence of noisy features. The entropy is:

$$Entropy\left(S\right)=\sum_{i=1}^{c}-{p}_{i}\,{\log}_{2}\left({p}_{i}\right)$$
(1)

The Information Gain is

$$Gain\left(S,A\right)=Entropy\left(S\right)-\sum_{v\in Values\left(A\right)}\frac{\left|{S}_{v}\right|}{\left|S\right|}Entropy\left({S}_{v}\right)$$
(2)

Where “c” is the number of unique class labels, “S” is the training set, “p_i” is the proportion of rows with output label “i”, “A” is an attribute, and “S_v” is the subset of “S” for which “A” takes the value “v”.
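As a small worked illustration of Eqs. (1) and (2), the following Python sketch computes the entropy of the ILPD class distribution reported in Sect. 2.1 (416 liver-disease and 167 non-liver-disease instances) and the information gain of a categorical split. The function names `entropy` and `information_gain` are illustrative assumptions and are not taken from the authors' implementation, which would normally rely on a library version of ETC rather than hand-written code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels, following Eq. (1)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a categorical attribute, Eq. (2)."""
    total = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    # Weighted entropy of the subsets S_v, one per attribute value v.
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Class balance reported for ILPD: label 1 = liver disease, label 2 = no liver disease.
ilpd_labels = [1] * 416 + [2] * 167
ilpd_entropy = entropy(ilpd_labels)  # below 1 bit because the classes are imbalanced
```

An attribute that separates the two classes perfectly has an information gain equal to the full entropy of the set, while an attribute that is unrelated to the class has a gain of zero.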

2.2.2 Maximum relevance minimum redundancy (MRMR)

MRMR is a feature selection method that favours features with a strong correlation with the classes but a weak correlation among themselves. For continuous features, the F-statistic is employed to estimate correlation with the classes, while the Pearson correlation coefficient is utilized to estimate correlation among features. Features are then chosen one at a time using a greedy search that maximizes the objective function [43]. MRMR is a valuable attribute selection step in many problem-solving scenarios: it attempts to locate a small group of features that are significant for the predicted value and minimally redundant with one another. This approach is also practical for picking 50 to 100 features from tens of thousands, since a small subset is often adequate for accomplishing the task with maximum precision. However, one weakness of MRMR in its commonly used form is that the standard relevance and redundancy measures are overly sensitive to outlying measurements and can be inefficient.
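The greedy search described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: relevance is the one-way ANOVA F-statistic, redundancy is the mean absolute Pearson correlation with the features already chosen, and the two are combined as a quotient (relevance divided by redundancy), which is one common form of the objective. The feature names `f_a`, `f_b`, `f_c` and the toy data are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length columns."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def f_statistic(x, y):
    """One-way ANOVA F-statistic of feature column x grouped by class label y."""
    groups = {}
    for value, label in zip(x, y):
        groups.setdefault(label, []).append(value)
    grand = mean(x)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (len(groups) - 1)) / (within / (len(x) - len(groups)))


def mrmr(features, y, k):
    """Greedy MRMR over {name: column}; returns the k selected names in pick order."""
    relevance = {name: f_statistic(col, y) for name, col in features.items()}
    selected, remaining = [], list(features)

    def score(name):
        if not selected:  # first pick: relevance only
            return relevance[name]
        redundancy = mean(abs(pearson(features[name], features[s])) for s in selected)
        return relevance[name] / (redundancy + 1e-9)  # quotient form (assumption)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: f_b is a noisy copy of f_a, f_c carries different information.
y = [0, 0, 1, 1, 2, 2]
features = {
    "f_a": [1.0, 1.2, 2.0, 2.2, 3.0, 3.2],
    "f_b": [2.0, 2.6, 4.0, 4.6, 6.0, 6.6],
    "f_c": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0],
}
chosen = mrmr(features, y, 2)
```

In this toy example, ranking by relevance alone would keep `f_a` and its near-duplicate `f_b`, whereas the redundancy penalty makes the greedy search prefer `f_c` as the second feature.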

2.3 Convolutional neural network (CNN)

CNN is a supervised learning neural network that performs classification and prediction using multi-layered designs [44]. The CNN model automatically extracts the features that are crucial for prediction. The CNN used here comprises convolution, batch normalization, flatten, and dense layers. The convolution layer is made up of several filters or kernels. The batch normalization layer carries out normalization and standardization: it applies a transformation that keeps the mean output close to zero and the output standard deviation close to one [45, 46].
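The effect of the batch normalization transform on a single batch of activations can be sketched as follows. This follows the standard textbook formulation with a scale `gamma`, a shift `beta`, and a small stabilizing constant `eps`; these parameter names, their default values, and the input values are assumptions for illustration and are not reported in this work.

```python
from statistics import fmean, pvariance


def batch_normalize(batch, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one batch of activations to roughly zero mean and unit variance."""
    mu = fmean(batch)
    var = pvariance(batch, mu)
    # eps guards against division by zero when the batch has no spread.
    return [gamma * (x - mu) / (var + eps) ** 0.5 + beta for x in batch]


# Hypothetical activations on very different scales.
normalized = batch_normalize([0.7, 1.0, 6.8, 187.0])
```

With the default `gamma` and `beta`, the output has a mean close to zero and a standard deviation close to one, regardless of the scale of the inputs.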

2.4 ML classifiers

Classification is a method for assigning objects to groups or target classes. It has numerous applications, including biomedical response modelling, commercial modelling, consumer segmentation, drug analysis, and credit analysis. Different classification algorithms, such as LR, KNN, SVM, Naïve Bayes, and J48, are available. This study used three classifiers, LR, RF, and SVM, to classify liver disease. A brief description of the three classifiers is as follows:

  • Logistic regression (LR).

LR is a supervised predictive ML algorithm based on the probability principle and used for classification problems [47, 48]. LR resembles linear regression, but it passes the linear output through the ‘sigmoid function’, also called the ‘logistic function’, rather than using it directly [48]. The LR hypothesis restricts the output to the range 0 to 1, whereas linear functions may take values greater than one or less than zero [49]. Choosing the threshold value is important when LR is used as a classification model. In this work, the LR parameters C = 0.1, dual = False, class_weight = None, intercept_scaling = 1, fit_intercept = True, multi_class = ‘ovr’, max_iter = 20, n_jobs = 3, random_state = None, penalty = ‘l2’, warm_start = False, solver = ‘liblinear’, and tol = 0.0001 were used.
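To make the role of the sigmoid and the threshold concrete, a minimal sketch is given below. The weights, bias, and the default threshold of 0.5 are illustrative assumptions, not values fitted in this work.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(features, weights, bias, threshold=0.5):
    """Return (probability, class) for one sample of a fitted LR model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    return probability, int(probability >= threshold)


# Hypothetical weights for a two-feature model.
probability, label = predict([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
```

Lowering the threshold makes the classifier flag more samples as positive (higher recall, lower precision), and raising it has the opposite effect.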

  • Random Forest (RF).

RF is an ensemble classifier that uses randomness to build decision trees from a set of independent and non-identical data [49]. RF is a supervised algorithm in which the “forest” is a collection of decision trees, which are often trained using the “bagging” method. RF applies to both classification and regression problems. As the trees grow, RF adds randomness to the model. When splitting a node, it searches for the best feature within a random subset of features rather than the most significant feature overall. This results in wide diversity among the trees, which generally leads to a better model. As a result, the technique for splitting a node in the random forest examines just a random subset of features [49].

  • Support Vector Machine (SVM).

SVM is a supervised learning approach for classification or regression problems [50]. The SVM algorithm can be applied to binary and multi-class classification problems [51]. In binary classification, SVM classifies the data by finding the best hyperplane separating all data points of one class from those of the other. If the data are not linearly separable, a mathematical (kernel) function is used to map the records into a higher-dimensional space so that they become linearly separable in the new space [52, 53].

2.5 Model deployment process

The ML model is deployed to the cloud through GitHub [46]. The flowchart of the complete liver disease classification system is shown in Fig. 3.

Fig. 3 Flowchart of the model deployment process to the Heroku cloud

3 Results

This section presents the testing of the liver disease classification system with the CNN and three ML classifiers. The ILPD dataset is split into training (468) and validation (117) sets. The ML classifiers are analyzed using this dataset, which is available online from the UCI ML repository. Furthermore, the ETC and MRMR techniques are applied for feature selection. The CNN model was trained with eleven attributes of the ILPD dataset. The ML classifiers LR, RF, and SVM are trained on the output features of the CNN model’s flatten layer. The same models are also trained on the best attributes selected by the ETC and MRMR methods.

3.1 Performance measures

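The confusion matrix can be employed to evaluate the performance of binary and multi-class classification problems [45, 46]. The following metrics, derived from the confusion matrix, are applied to assess the model or system, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives.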

$$\text{Accuracy}_{\text{L}}=\frac{\text{TP}_{\text{L}}+\text{TN}_{\text{L}}}{\text{TP}_{\text{L}}+\text{FP}_{\text{L}}+\text{TN}_{\text{L}}+\text{FN}_{\text{L}}}$$
(4)
$$\text{Precision}_{\text{L}}=\frac{\text{TP}_{\text{L}}}{\text{TP}_{\text{L}}+\text{FP}_{\text{L}}}$$
(5)
$$\text{Recall}_{\text{L}}=\frac{\text{TP}_{\text{L}}}{\text{TP}_{\text{L}}+\text{FN}_{\text{L}}}$$
(6)
$$\text{F1-score}_{\text{L}}=\frac{2}{\frac{1}{\text{Recall}_{\text{L}}}+\frac{1}{\text{Precision}_{\text{L}}}}$$
(7)

True negative rate (TNR):

$$\text{TNR}_{\text{L}}=\frac{\text{TN}_{\text{L}}}{\text{TN}_{\text{L}}+\text{FP}_{\text{L}}}$$
(8)
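Equations (4) to (8) can be evaluated directly from the four confusion-matrix counts, as in the sketch below. The counts in the example are hypothetical: they describe a validation fold of 12 samples and were chosen for illustration only, not taken from the experiments.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, F1 score, and TNR from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. the fold contains positive and
    negative samples and at least one positive prediction was correct.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 / (1 / recall + 1 / precision),  # harmonic mean, Eq. (7)
        "tnr": tn / (tn + fp),
    }


# Hypothetical fold: 9 diseased samples (8 found, 1 missed) and 3 healthy samples.
metrics = confusion_metrics(tp=8, fp=0, tn=3, fn=1)
```

With no false positives, precision and TNR are both 1, while the single missed case lowers recall to 8/9, accuracy to 11/12, and the F1 score to 16/17.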

3.2 Data preprocessing

Data preprocessing is an essential step in developing an ML model: it cleans raw data, which typically contains noise, null values, and formats unsuitable for ML models, and so enhances the performance of the model. In this work, the authors checked the dataset for missing or null values and found that only the Albumin_and_Globulin_Ratio attribute has 4 NULL values, which were filled with the mean value. LabelEncoder is used to encode categorical labels as integers. The dataset is then divided into training and validation sets. The final step of data preprocessing is feature scaling, which brings the dataset’s independent attributes onto a common scale so that no variable dominates the others. Here, the standardization method is used for the dataset.
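The three preprocessing operations described above can be sketched in plain Python as follows. This is a simplified stand-in for the library routines used in practice; the helper names `impute_mean`, `label_encode`, and `standardize` and the sample values are assumptions for illustration.

```python
from statistics import fmean, pstdev


def impute_mean(column):
    """Replace missing entries (None) with the mean of the known values."""
    fill = fmean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def label_encode(column):
    """Map each distinct label to an integer code, assigned in sorted order."""
    mapping = {label: code for code, label in enumerate(sorted(set(column)))}
    return [mapping[v] for v in column], mapping


def standardize(column):
    """Rescale a numeric column to zero mean and unit standard deviation."""
    mu, sigma = fmean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]


# Hypothetical values for three patients.
ratio = impute_mean([0.9, None, 1.1])
gender, gender_codes = label_encode(["Male", "Female", "Male"])
age = standardize([30.0, 45.0, 60.0])
```

In a train/validation setting, the imputation mean and the scaling statistics would usually be computed on the training set only and then reused on the validation set, so that no information leaks from the validation data.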

3.3 Attribute or feature selection

This research aims to transfer the doctor’s decision-making process to a computer or mobile device. Therefore, it is mandatory to use the same information or data that the physician uses. The impact of attribute selection on the outcome is a significant factor in ML methods [51], so the use of feature selection techniques is crucial. This study aims to classify liver diseases accurately by selecting the best features to train the models. The results of the models with and without feature selection are compared to find which procedure provides the best classification results for liver diseases. The dataset was pre-processed into the correct format to train the ML models, and the ETC and MRMR methods were then applied to these attributes. Figure 4 shows the feature selection scores after applying the ETC method. The ETC method selects the features f_9, f_74, f_66, f_55, f_67, f_42, f_52, f_29, f_30, and f_19, whereas the MRMR method selects the features f_39, f_57, f_67, f_19, f_42, f_6, f_9, f_66, f_68, and f_30. The two methods therefore select different features. Table 2 lists the model names used after applying the ETC and MRMR techniques for easy understanding.

Fig. 4 Feature selection score after applying ETC method

Table 2 Model’s name indication with ETC and MRMR techniques

3.4 CNN implementation

In this work, the CNN framework has two convolution layers (Conv1D) with 32 and 16 filters. The ‘ReLU’ activation function is used with ‘same’ padding, together with the Adam optimizer, two batch normalization layers, and a flatten layer. A dropout layer with a rate of 0.5 is used after the flatten layer. Three dense layers are used with 128, 32, and 1 neurons, respectively. The details are tabulated in Table 3. The CNN model achieved 71.37% training accuracy and 71.19% validation accuracy. The flatten layer has 80 features, from f_0 to f_79. These features are used to train the three classifiers.

Table 3 CNN layers details

3.5 CNN with LR, RF, and SVM

The proposed approach combines the CNN model with the LR, RF, and SVM classifiers. This method uses the CNN model as a feature extractor and the LR, RF, and SVM models as classifiers. The feature map is flattened into a vector of 80 features at the flatten layer, and these features are fed to the classifiers for classification. Figure 5a shows the performance of the CNN-LR model, Fig. 5b that of the CNN-RF model, and Fig. 5c that of the CNN-SVM model.

Fig. 5 Performance of the (a) CNN-LR, (b) CNN-RF, and (c) CNN-SVM models with feature selection techniques (CL_Acc indicates CNN-LR accuracy, CR_Acc_ET indicates CNN-RF accuracy with ETC, and CS_Acc_MR indicates CNN-SVM accuracy with MRMR)

The performance of the models was tested with the stratified K-fold method with K equal to 10, 20, 30, 40, and 50. The 80 features from the flatten layer were used to train and validate the models. At K = 50, the highest accuracies of the CNN-LR, CNN-RF, and CNN-SVM models are 91.66%, 100%, and 83.33%, respectively, whereas precision was 100% for all models. The highest recall values of the CNN-LR, CNN-RF, and CNN-SVM models are 88.88%, 100%, and 81.81%, respectively, while the F1 scores were 94.12%, 100%, and 90%, respectively. The CNN-RF model outperforms all other models in terms of accuracy, precision, recall, and F1 score. Ten of the 80 features were selected using the ETC and MRMR methods. From Fig. 5, it can be seen that the performance of the models is essentially the same with and without feature selection, so the models achieved comparable performance with only 10 features. Although the two feature selection methods select different features, the performance remains good. In this work, both the ETC and MRMR methods perform well.
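The idea behind the stratified K-fold method is that every fold keeps roughly the class proportions of the whole dataset. A minimal sketch is shown below; it deals the indices of each class round-robin over the folds without shuffling, which is a simplification of what library implementations do, and the function name `stratified_kfold` is an assumption for illustration.

```python
from collections import defaultdict


def stratified_kfold(labels, k):
    """Yield (train_indices, validation_indices) pairs with per-class balance."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        # Continue dealing where the previous class stopped, so fold sizes stay even.
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)

    for i in range(k):
        validation = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, validation


# ILPD class counts from Sect. 2.1: 416 liver-disease and 167 non-liver-disease rows.
labels = [1] * 416 + [2] * 167
splits = list(stratified_kfold(labels, 50))
```

With K = 50 on the 583 ILPD instances, each validation fold holds only 11 or 12 samples (8 or 9 diseased and 3 or 4 non-diseased), so a single misclassified sample changes the fold accuracy by roughly 8 to 9 percentage points.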

The Receiver Operating Characteristic (ROC) analysis efficiently measures ML and data mining performance [52]. The ROC curves of the CNN-LR, CNN-RF, and CNN-SVM models are shown in Fig. 6a, b, and c, respectively. The ROC curve is a graphical plot that illustrates the diagnostic capability of a binary classifier. The nearer the curve is to the top-left corner, the better the classification; performance is worst when the curve lies close to the diagonal threshold line. The performance of CNN-RF is excellent, with an Area Under Curve (AUC) of 1, while CNN-LR achieved an AUC of 0.83 and CNN-SVM achieved an AUC of 0.67. The authors also applied the ETC and MRMR feature selection methods and obtained the same results as in Fig. 6.
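The AUC has a simple interpretation that can be computed without drawing the curve: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses this pairwise definition; the function name `roc_auc` and the example scores are assumptions for illustration and are not outputs of the models in this work.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for two negative and two positive samples.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 1 means every positive sample is scored above every negative one, while 0.5 corresponds to the diagonal line of a classifier that ranks at random.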

Fig. 6 ROC curve for (a) CNN-LR, (b) CNN-RF, and (c) CNN-SVM models

3.6 Cloud deployment

The empirical results of the proposed models are remarkable. The CNN-RF model performs very well compared with the others. This section analyzes and compares the ML models in terms of memory size, number of attributes, precision, and F1 score. Table 4 tabulates the comparative analysis of the models in terms of depth, model size, number of features, precision, and F1 score.

Table 4 Comparative analysis of models

From Table 4, the precision of all models is 100%, while the F1 score of the CNN-RF-MR model is the highest at 100%. Furthermore, the memory size of CNN-LR-MR is 1.35 KB, that of CNN-RF-MR is 1.2 MB, and that of CNN-SVM-MR is 52.16 KB. The CNN-RF-MR or CNN-RF-ETC model is suitable for the liver disease prediction system because of its high overall performance. In this work, the CNN and CNN-RF-MR models are deployed to the cloud. The Heroku cloud link is generated after the successful deployment of the model. With one click on the Heroku cloud link, which is also available on mobile, the website opens, and the user uploads eight attributes. After the features are uploaded to the cloud, the model accurately classifies a non-liver-disease patient.

4 Discussions

In healthcare, accurately and rapidly predicting any disease is crucial to preventing and controlling the illness. Traditional approaches can help detect diseases, but they are time-consuming. Several computer-based methods have already been proposed in the literature to address the limitations of manual diagnosis. Table 5 tabulates the ML-based liver disease prediction systems available in the literature with their accuracy.

Priya et al. [54] suggested SVM, J48, BN, Multi-layer Perceptron (MLP), and RF ML models for liver cancer disease classification using an ILPD dataset. The Min-Max normalization and Particle Swarm Optimization (PSO) feature selection techniques were applied to improve the model performance. The Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and accuracy metrics were employed to evaluate models. The J48 achieved the highest accuracy (95.04%) compared with MLP (77.54%), SVM (73.44%), RF (80.22%), and Bayes network (90.33%).

Pathan et al. [55] suggested NB, AdaBoost, J48, Bagging, and RF for liver cancer disease detection. Data were prepared using the K-Means technique to select the attributes. The RF model achieved the highest accuracy (100%) compared with NB (55.84%), AdaBoost (71.31%), J48 (87.46%), and Bagging (90.38%). Muthuselvan et al. [56] suggested NB, J48, RT, and K-star methods to detect liver cancer diseases. In another paper, Kaur et al. [57] applied SVM, RF, NB, SMO, and J48 algorithms to detect liver cancer disease.

Abdalrada et al. [58] suggested LR to identify the probability of liver disease using the ILPD dataset and achieved 72.4% accuracy, 78.3% specificity, 90.3% sensitivity, and an ROC value of 0.758. Harshpreet et al. [59] suggested a hybrid classification technique for liver disease prediction using ILPD and achieved an accuracy of 77.58%. Mankame et al. [60] proposed ML approaches to detect liver disease from the ILPD dataset. They dropped the Direct Bilirubin attribute and used only ten of the eleven attributes. Five ML models, LR, KNN, DT, RF, and SVM, were implemented, and the RF model achieved the highest precision of 93% and an ROC-AUC score of 96%.

Table 5 ML-based liver disease prediction system available in the literature with accuracy/precision

Mostafa et al. [61] proposed to diagnose liver illness using ML algorithms such as SVM, RF, and ANN using Multiple imputations by chained equations (MICEs) and principal component analysis (PCA). The synthetic minority oversampling approach oversamples the minority class to control overfitting. The best accuracy attained by the RF was 98.14%. Joloudari et al. [62] suggested ML models such as RF, neural network, MLP, Bayesian networks, SVM, and Particle Swarm Optimization (PSO)-SVM to predict liver disease. These models were compared based on the extraction, loading, transformation, and analysis (ELTA) feature selection approach. PSO-SVM model achieved the highest average accuracy of 95.17%, precision of 96.77%, and F1 score of 95.84% after applying the 10-fold cross-validation method.

The proposed work developed a CNN-based system to classify liver diseases on a computer or smartphone. Two feature selection methods were used to select the best features. The literature review surveyed various methods for identifying and classifying liver diseases. The proposed ML models perform well compared with the literature listed in Table 5. Figure 7 compares the performance of models on the ILPD dataset available in the literature; the authors of those works used the same ILPD dataset to classify liver disease. The performance of the present work is compared with the precision and F1 score of the models available in the literature. Precision or accuracy is generally used to evaluate model performance, but the F1 score is the best metric for testing performance on an unbalanced dataset. From Fig. 7, it can be seen that Pasha et al. [28], Ayeldeen et al. [22], and Singh et al. [57] have not reported the F1 score. Some researchers also did not verify their proposed model using cross-validation. The CNN-RF framework achieved excellent precision of 100% at all K values, followed by Ayeldeen et al. [22] and Mankame et al. [60]. Its F1 score is also the highest, followed by Mankame et al. [60] and Ramesh et al. [15]. Moreover, no authors in the literature have deployed the model on the cloud. The main advantage of deploying the model on the cloud is that doctors can access it on a computer or smartphone for liver disease prediction. The proposed cloud-based classification method correctly classifies liver diseases. The proposed system should help doctors and other users detect early-stage liver disease. The empirical evaluation indicates that the overall performance of the proposed CNN-RF models is remarkable.

Fig. 7 Comparative analysis of proposed work with similar work in the literature

5 Conclusion

Liver disease is diagnosed by doctors who are skilled at spotting noteworthy observations and classifying them as malignant or benign using background information and other supporting details. DL or ML systems can be trained to recognize the probability of liver disease in a similar way. The overall goal of this paper is to assist people in detecting liver illness early and to detail several classification approaches used in ML based on performance measures. This offers a clear view of how the algorithms operate on the liver dataset and how they produce a high-efficiency model for predicting liver illness. Instead of selecting the best features directly from the dataset, the authors tried a new method in which the dataset features are first applied to the CNN model. The advantage of CNN is that it automatically extracts the essential features, which are then passed to ML classifiers such as LR, RF, and SVM. In addition, two feature selection methods, ETC and MRMR, were applied to the features extracted by the CNN model. Ten features were selected out of the 80 extracted by CNN and applied to LR, RF, and SVM, which achieved identical results for both feature selection methods. Using only 10 of the 80 features reduced the model processing time. The robustness of the system was also tested using the stratified K-fold method. The CNN coupled with LR (CNN-LR), RF (CNN-RF), and SVM (CNN-SVM) produced outstanding classification results. Although all three approaches performed well on the testing dataset, CNN-RF outperformed the other two, with the highest accuracy, precision, recall, and F1 score of 100% (K = 50). CNN-RF produced highly effective classification performance with both the original and the best-selected features and is therefore recommended for deployment to the cloud. The CNN-RF approach can save time, cost, and lives by improving disease diagnosis.
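The stratified K-fold idea used for the robustness test can be sketched as follows. This is a minimal standard-library illustration, not the authors' implementation: the function name `stratified_k_fold`, the class counts, the seed, and the choice of k = 5 are all assumptions. The key property is that each fold keeps roughly the same class proportions as the full dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_indices, test_indices) pairs with per-class balance."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle each class separately and deal its indices round-robin,
    # so every fold receives a near-equal share of every class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the test set; the rest form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            j for f_no, fold in enumerate(folds) if f_no != i for j in fold
        )
        yield train_idx, test_idx


# Hypothetical unbalanced labels: 70 samples of class 1, 30 of class 0.
labels = [1] * 70 + [0] * 30
splits = list(stratified_k_fold(labels, k=5))

for fold_no, (train_idx, test_idx) in enumerate(splits):
    positives = sum(labels[j] for j in test_idx)
    print(f"fold {fold_no}: test size={len(test_idx)}, class-1 count={positives}")
```

With these hypothetical counts, every test fold holds 20 samples, 14 from class 1 and 6 from class 0, which matches the 70/30 ratio of the full set. A classifier evaluated this way is not rewarded for a fold that happens to contain few minority-class samples.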

A limitation of the proposed system is that it can only predict whether the patient has liver disease. In practice, a system that could also identify the disease stage and the most affected area of the liver would be more helpful to clinicians. Future work is as follows:

  • Testing more case studies can further improve the system’s performance.

  • CNN can be implemented with a genetic algorithm, and performance can be verified.

  • Other liver disease data can be used with this proposed system.

  • The disease-affected area or part of the liver can be predicted using other datasets.

  • The authors will continue to extend this study to classify more types of liver diseases.