Abstract
Liver disease is a significant global health burden: a few hundred million people suffer from chronic liver disease (CLD), and approximately 2 million deaths occur each year. Liver diseases are difficult to identify and are usually overlooked in the early stages because they show no symptoms. Diagnosing liver disease at an early stage makes it possible to take precautions against future illness. Generally, people with liver illness are identified through liver biopsy and visual assessment of MRI by experienced professionals, which is a laborious and time-consuming practice. As a result, there is a need for an automated detection method that can offer results in minimal time and with greater precision. The primary motivation of this work is to implement a machine learning (ML) based real-time liver disease classification framework on the cloud to reduce clinicians’ burden. The Indian Liver Patient Dataset (ILPD) was used to classify liver diseases. The dataset has eleven attributes or features, which were employed to train the models. A Convolutional Neural Network (CNN) was implemented, and the output of its flatten layer was given to Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM) classifiers, which achieved a precision of 100% for all models. The ExtraTreesClassifier (ETC) and Maximum Relevance Minimum Redundancy (MRMR) techniques were applied to select among the features extracted by the CNN and also achieved 100% precision. The stratified K-fold method was used to evaluate model performance. The comparative results confirm that the CNN-RF outperforms the literature-reported models. After the evaluation, the model was deployed successfully to a Platform-as-a-Service (PaaS) cloud.
1 Introduction
The liver is the body’s heaviest internal organ and carries out many essential functions [1]. It supports digestion, blood purification, blood toxicity control, bilirubin clearance, body metabolism, and the conversion of harmful ammonia to urea. Various liver diseases are reported worldwide, including non-alcoholic fatty liver disease (NAFLD) [2] and hepatocellular carcinoma (HCC) [3]. Once the liver is affected by such diseases, the patient’s health may be seriously compromised. Liver disease can be diagnosed using a range of clinical tests and equipment [4,5,6]. The main causes of these diseases include alcohol, obesity, and diabetes. When people consume alcohol, the liver diverts from its other activities and concentrates primarily on breaking down the alcohol [7]. In overweight people, fat may accumulate around the liver, which is a cause of fatty liver disease (FLD). FLD has been described as a reversible disease that is treatable when caught early [1]. Fat build-up in the liver is associated with metabolic disorders such as obesity, high blood pressure, and insulin resistance, and it increases the risk of heart complications and death [8]. Diabetes patients who use insulin have a 50% higher risk of liver disease. Common liver disorders include hepatitis [9], cirrhosis [10, 11], and liver cancer [6]. Long-term cirrhosis and FLD may cause benign or malignant formations in the liver, and accurate early assessment of these conditions may lead to improved treatment results [1]. A diagnostic test is needed to detect such diseases because they have no visible symptoms [12]. Various traditional methods are available to detect liver disease.
ML, IoT, and cloud computing are increasingly used in the healthcare domain to assist diagnosis, which reduces the pressure on physicians. ML plays a pivotal role in disease identification using relevant medical datasets. ML learns from the data provided and estimates the result [13]. With suitably trained algorithms, ML can serve as a crucial tool in fields such as medicine, data management, and surveillance [14]. It analyses various characteristics and records from the patient’s laboratory tests and, based on an appropriate learning strategy, predicts whether or not a patient has an illness. The disease severity can be predicted by analyzing the results. Automatic learning can assist healthcare analysts with precision medicine [15]. Information obtained from completed studies, patient demographics, medical records, and other sources can be used to establish a suitable and efficient learning model. Several factors distinguish traditional techniques from ML techniques for disease prediction. ML possesses enormous analytical, visual, and predictive capabilities for various data types, and ML models that analyze, visualize, and forecast many disease types are being built because of its broad applicability in the healthcare sector [16]. ML with cloud computing is an advancing technological paradigm with a wide variety of economically complex and independent computational frameworks [17]. The cloud helps to maintain up-to-date health records that can be accessed from anywhere. Conventional techniques for treating liver problems have several drawbacks [18]. Here are a few examples:
-
A significant volume of medical data is generated, but there are far fewer competent observers to interpret this data. Furthermore, a physician may or may not be knowledgeable in analyzing various forms of data and images.
-
Finding hidden patterns and links in vast amounts of medical data is sometimes dismissed as insignificant, and manual detection can be slow and unclear.
-
In traditional techniques, many crucial features are not considered for proper forecasting, whereas ML techniques examine many features, which provides higher precision [16].
-
Liver biopsy is dangerous and frequently misinterpreted by different observers.
-
Because of organ shortages, liver transplantation in hepatocellular carcinoma (HCC) is rarely performed, so other adequate liver resection (LR) treatments are prioritized. However, this can result in a recurrence of the disease in high-risk people.
-
A single biomarker is insufficient to predict disease. As a result, it is critical to combine several biomarkers to improve diagnostic accuracy. Furthermore, biomarkers routinely utilized for disease diagnosis may produce erroneous results.
-
Early disease detection can save the liver from any more damage and protect the patient from serious illnesses. There are several non-invasive detection approaches. However, they frequently lack precision due to faulty blood-marker testing and may include expensive imaging techniques.
-
Prediction of surgical mortality is often undervalued, resulting in illness recurrence.
-
Clinical diagnosis by doctors is less reliable and time-consuming.
A cloud-based liver diseases classification system is developed to classify liver diseases in this work. The proposed work is divided into an introduction, related work, materials and methods, results, discussions, and conclusions with future work. The main highlights/contribution of this work include:
-
Proposing an efficient CNN framework appropriate for liver disease classification by selecting suitable parameters and hyperparameters.
-
Various optimizer combinations, such as Adam, RMSprop, and others, were tested, and the most effective model was provided with the Adam optimizer.
-
The improved CNN model was employed to diagnose liver disease with a training accuracy of 70.80% and validation accuracy of 74.58%.
-
Implementing CNN in conjunction with LR, RF, and SVM classifiers.
-
CNN-extracted features are fed into the LR, RF, and SVM classifiers.
-
Two feature selection methods, ETC and MRMR, are used to select the best feature.
-
Using a stratified K-fold approach to evaluate the suggested method.
-
CNN-LR, CNN-RF, and CNN-SVM performance comparison.
-
The CNN-RF model performs well, achieving 100% precision for all K values.
-
The effective model has been deployed to the PaaS cloud.
-
The model’s size, which is critical for deploying the model to the embedded platform and cloud, is also reported.
1.1 Related work
Expanding access to hidden features in medical data sets enables ML or Deep Learning (DL) algorithms to detect liver problems. Various data sets, including liver function tests, histologically stained slide pictures, and molecular markers in blood or tissue, have been used to train ML or DL models that accurately predict liver illness. The accuracy of the ML algorithms reported in prior works was assessed using a combination of the confusion matrix, the area under the receiver operating characteristic curve (ROC-AUC), and K-fold cross-validation. Ramesh et al. [15] proposed ML algorithms such as SVM, KNN, Decision Tree (DT), Naïve Bayes (NB), and Random Forest (RF) to predict liver disease and achieved 88.5%, 85.5%, 88.1%, 59%, and 89%, respectively. Tanwar et al. [18] discussed various research articles on ML frameworks and suggested future work. Jaganathan [19] proposed SVM with feature selection techniques to extract the optimal subset of descriptors as modelling features to improve prediction performance and achieved an accuracy of 81.1%, a sensitivity of 84%, and a specificity of 78.3%. Thirunavukkarasu et al. [20] proposed SVM, K-Nearest Neighbor (KNN), and LR algorithms to classify liver diseases. Razali et al. [21] proposed Neural Networks and Bayes Point Machines for liver disease prediction and achieved 66.85% and 70.52% accuracy. Ayeldeen et al. [22] offered a DT approach to predict liver fibrosis stages. Furthermore, Belavigi et al. [23] proposed the Stochastic Average Gradient (SAG) model, Resilient Backpropagation Neural Network, and Convolutional Neural Network (CNN) models to predict liver diseases at an early stage and achieved acceptable performance. Kumar et al. [24] proposed K-Means, RF, NB, KNN, and C5.0 to detect liver disorders. Hashem et al. [25] proposed an ML model to predict liver fibrosis in chronic hepatitis-C patients. Vats et al. [26] suggested an ML framework using Density-Based Spatial Clustering of Applications with Noise (DBSCAN), K-Means, and Affinity Propagation to predict liver diseases. Kuppan et al. [27] proposed ML models like NB, Decision Table, and J48 to classify liver disease. Pasha et al. [28] proposed Logit Boost, Bagging, Adaboost, and Grading meta-learning algorithms to classify liver diseases using a dataset from the UCI repository. Baitharu et al. [29] suggested ML methods like NB, 1BK, DT, ZeroR, ANN, and VFI to detect liver cancer illness with 71.59% accuracy. Sontakke et al. [30] proposed computational methods to improve the diagnosis of liver diseases.
Singh et al. [31] give a coherent and detailed evaluation of the efforts made in forecasting liver failure, concentrating on several ML approaches established by numerous authors and on the results they measured. That work also discussed the datasets utilized by multiple authors to forecast liver illness. Pasha et al. [32] presented an ML-based screening method to identify liver disease utilizing RF, LR, and SVM and compared their outcomes. Poonguzharselvi et al. [33] presented a classification framework using KNN, RF, SVM, Naïve Bayes, LR, and a back propagation neural network (BPNN) to diagnose liver illness. RF with a Genetic algorithm reached the best accuracy of 84%. Yajurved et al. [34] utilized RF, LR, and SVM to determine CLD and obtained 76% accuracy and an F1 score of 84.78%. Keerthana et al. [35] suggested the LR technique for predicting liver illness with 76.3% accuracy and 0.678 Acute. Boosted C5.0 and CHAID algorithms offered by Abdar et al. [36] were utilized to detect risk factors for liver problems. According to this study, women are more likely than men to get liver problems. The C5.0 algorithm, using the Boosting technique, has an accuracy of 93.75%, which is higher than that of CHAID (65%).
The literature above shows that research on liver disease detection takes several forms. Typically, data pre-processing methods are applied first, and liver diseases are then classified using several DL or ML models. There is also a range of studies on the early detection of liver diseases.
1.2 Motivation and purpose of the system
-
CNN, LR, RF, and SVM models were implemented to classify the disease using the ILPD dataset. Disease detection using ML is now a dominant research topic in most conferences and reputable journals. Multiple research groups are developing such applications simultaneously and publishing their results in a wide range of seminars, workshops, and publications. The proposed software solution enables clinicians to identify liver illness from an Android device or other smartphone by entering the ILPD dataset features. The CNN, LR, RF, and SVM models were implemented in Python to predict liver illness with 100% precision. Experiments on feature selection were conducted by applying the ETC and MRMR methods to select the best features. This technique is helpful for liver diagnosis, predicting behaviour, evaluating anticipated conditions, and guiding the use of medicines. These studies can be used to learn new relationships and concepts.
2 Materials and methods
This research aims to present the performance analysis of the liver disease classification system. Figure 1 shows the framework of the classification system for liver diseases. The ILPD dataset of liver diseases was selected and used for this work. The dataset attributes were pre-processed before being applied to the models. This step replaces missing values, substitutes numerical values for strings, and converts the numerical class into a categorical one. The CNN model is implemented with two convolutional layers, two batch normalization layers, a flatten layer, one dropout layer, and three dense layers. The flatten layer output is given to the ETC and MRMR feature selection methods to select the best features. These best features were applied to the LR, RF, and SVM models to classify liver disease. The models were evaluated using the stratified K-fold method to compare their performance. In step 5, the best-performing model with a small size is deployed to the Heroku cloud.
2.1 Datasets description
This paper used the ILPD dataset from the UCI repository, which has 583 rows or instances with eleven different features or attributes and a known output class [37]. The number of features and samples varies between datasets. The ILPD contains numerical data, characters, and null values. A class attribute value of “1” means that a person has a liver illness, and a class attribute value of “2” means that a person has no liver illness. Table 1 tabulates the list of features and descriptions of the ILPD database. In the dataset, 416 out of 583 instances have liver disease, and the remaining 167 are non-liver-disease instances. Figure 2a shows the total number of male and female patients in the ILPD dataset. Figure 2b shows the age range of the male and female patients; most of them fall in the approximately 30–60 age group.
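The class counts reported above imply an imbalanced dataset, which can be made concrete with a short calculation. The sketch below uses only the counts stated in the text; the variable names are illustrative and not taken from the authors' code.

```python
# Class label -> number of records, as reported for the ILPD dataset.
counts = {1: 416, 2: 167}  # 1 = liver disease, 2 = no liver disease

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline = max(shares.values())

print(total)                        # 583
print(round(shares[1], 4))          # 0.7136
print(round(majority_baseline, 4))  # 0.7136
```

Because always predicting "liver disease" already yields about 71.4% accuracy, metrics such as precision, recall, and F1 score are more informative than accuracy alone on this dataset.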
2.2 Feature selection techniques
This paper compares the two most common state-of-the-art attribute selection methodologies, ETC and MRMR, generally used to retrieve the most relevant features from datasets.
2.2.1 Extra Tree Classifier (ETC)
Feature selection is a data preprocessing approach for preparing data for various mining and ML applications, notably for high-dimensional data [38]. It makes models simpler and more straightforward, boosts ML performance, and yields clean and comprehensible data [38]. It removes extraneous characteristics and produces optimum feature sets for high-precision classification [38, 39]. ETC is a decision-tree-based learning method used for feature selection. ETC is similar to the Random Forest (RF) classifier in that it randomizes some decisions and data subsets to reduce over-fitting [38, 40]. As in RF, many trees are created and nodes are split using a random subset of characteristics [41]. The best features are chosen based on information gain and entropy [42]. ETC adds several de-correlated decision trees to the forest and appears to perform better in the context of noisy features. The entropy is:

$$\mathrm{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

The information gain is:

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

where $c$ is the number of unique class labels, $S$ is the training set, $p_i$ is the proportion of rows with output label $i$, $A$ is a candidate feature, and $S_v$ is the subset of $S$ in which $A$ takes the value $v$.
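As an illustration of the two quantities that tree-based feature selection relies on, the following minimal sketch computes entropy and information gain for a discrete feature. The toy labels and feature values are hypothetical and are not drawn from the ILPD dataset.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(labels, feature_values):
    """Entropy reduction obtained by splitting `labels` on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Hypothetical toy data: class 1 = liver disease, class 2 = no liver disease.
labels = [1, 1, 1, 2, 2, 2]
perfect_split = ["high", "high", "high", "low", "low", "low"]
weak_split = ["a", "b", "a", "b", "a", "b"]

gain_perfect = information_gain(labels, perfect_split)
gain_weak = information_gain(labels, weak_split)
```

A feature that separates the classes perfectly recovers the full entropy of the label set, while a feature that barely separates them yields a gain close to zero, so ranking features by gain favours the former.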
2.2.2 Maximum relevance minimum redundancy (MRMR)
MRMR is a feature selection method that favours features that correlate strongly with the classes but weakly with one another. For continuous features, the F-statistic is used to estimate correlation with the classes, while the Pearson correlation coefficient is used to estimate correlation among features. Features are then chosen one at a time using a greedy search to maximize the objective function [43]. MRMR is a valuable attribute selection step in many problem-solving scenarios: it attempts to find a small group of features that are significant for the predicted value and minimally redundant with one another. The approach is also practical for picking 50 to 100 features from tens of thousands; choosing just some of them is adequate for accomplishing the task with maximum precision. However, one weakness of MRMR in its commonly used form is that the standard relevance and redundancy measures are overly sensitive to outlying measurements and/or inefficient.
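The greedy search described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: relevance is the one-way ANOVA F-statistic, redundancy is the mean absolute Pearson correlation with the features already selected, and the objective is their quotient, which is one common MRMR variant. The source does not state which objective the authors used, and the feature names and values below are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def f_statistic(values, labels):
    """One-way ANOVA F-statistic of one feature against the class labels."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    grand, n, k = mean(values), len(values), len(groups)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g) / (n - k)
    return between / within if within > 0 else float("inf")

def mrmr(features, labels, k):
    """Greedily pick k feature names maximising relevance / redundancy."""
    relevance = {name: f_statistic(col, labels) for name, col in features.items()}
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:  # first pick: relevance only
                return relevance[name]
            redundancy = mean(abs(pearson(features[name], features[s])) for s in selected)
            return relevance[name] / max(redundancy, 1e-6)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data: f_a_copy nearly duplicates f_a; f_b is less relevant
# on its own but carries information that f_a does not.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f_a": [-1, 1, -1, 1, 1, 3, 1, 3],
    "f_a_copy": [-1, 1, -1, 1, 0.8, 2.8, 0.8, 2.8],
    "f_b": [-1, -1, 1, 1, 0.5, 0.5, 2.5, 2.5],
}
chosen = mrmr(features, labels, k=2)
```

Ranking by relevance alone would keep `f_a` and its near-duplicate, whereas the redundancy penalty makes the greedy search prefer `f_b` as the second feature.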
2.3 Convolutional neural network (CNN)
CNN is a supervised learning neural network that performs classification and prediction using multi-layered designs [44]. The CNN model automatically extracts the features that are crucial for prediction. The CNN comprises convolution, batch normalization, flatten, and dense layers. The convolution layer is made up of several filters or kernels. Normalization and standardization procedures are carried out by the batch normalization layer, which applies a transformation that keeps the mean output close to zero and the output standard deviation close to one [45, 46].
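The batch normalization transformation can be illustrated on a single batch of activations. This is a simplified sketch: `gamma`, `beta`, and `eps` follow the usual formulation of batch normalization and are assumptions here, and a real layer also tracks running statistics for inference, which is omitted.

```python
from statistics import mean, pstdev

def batch_normalize(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Shift and scale a batch so its mean is ~0 and its standard deviation ~1."""
    mu = mean(batch)
    var = pstdev(batch) ** 2  # population variance of the batch
    return [gamma * (x - mu) / (var + eps) ** 0.5 + beta for x in batch]

# Hypothetical activations of one channel over a batch of four samples.
activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_normalize(activations)
```

With the default `gamma` and `beta`, the normalized values have a mean of approximately zero and a standard deviation of approximately one, which is the behaviour described above.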
2.4 ML classifiers
Classification is a method for assigning objects to groups or target classes. It has numerous applications, including biomedical response modelling, commercial modelling, consumer segmentation, drug analysis, and credit analysis. Different classification algorithms, such as LR, KNN, SVM, Naïve Bayes, and J-48, are available. This study used three classifiers, LR, RF, and SVM, to classify liver disease. A brief description of the three classifiers is as follows:
-
Logistic regression (LR).
LR is a supervised predictive ML algorithm based on the principle of probability and used for classification problems [47, 48]. LR resembles linear regression, but instead of a linear function it employs a more complicated function known as the ‘sigmoid function’ or ‘logistic function’ [48]. The LR hypothesis restricts the output to between 0 and 1, whereas linear functions may take values greater than one or less than zero [49]. Determining the threshold value is important when LR is used as a classification model. In this work, the LR parameters C = 0.1, dual = False, class_weight = None, intercept_scaling = 1, fit_intercept = True, multi_class = 'ovr', max_iter = 20, n_jobs = 3, random_state = None, penalty = 'l2', warm_start = False, solver = 'liblinear', and tol = 0.0001 were used.
-
Random Forest (RF).
RF is an ensemble classifier that uses randomness to build decision trees from a set of independent and non-identical data [49]. It is a supervised algorithm in which the “forest” is a collection of decision trees, which are often trained using the “bagging” method. RF applies to both classification and regression problems. As the trees grow, RF adds randomness to the model: when splitting a node, it searches for the best feature within a random subset of features rather than for the most significant feature overall. This results in wide diversity among the trees, which generally leads to a better model [49].
-
Support Vector Machine (SVM).
SVM is a supervised learning approach for classification or regression problems [50]. It is applied to binary and multi-class classification problems [51]. In binary classification, the SVM algorithm finds the best hyperplane separating all data points of one class from those of the other. If the data are not linearly separable, a mathematical (kernel) function is used to transform the records into a higher-dimensional space in which they become linearly separable [52, 53].
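The decision rules of the three classifiers described above can be contrasted in a compact sketch. The weights, thresholds, decision stumps, and feature map below are hypothetical values chosen for illustration; they are not fitted models and not the authors' implementation.

```python
import math
from collections import Counter

def lr_predict(weights, bias, x, threshold=0.5):
    """Logistic regression: sigmoid of a linear score, then a threshold."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability, int(probability >= threshold)

def rf_predict(trees, x):
    """Random forest: majority vote over the trees' individual predictions."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

def svm_predict(weights, bias, x, feature_map=lambda v: v):
    """SVM: side of a hyperplane in the (optionally mapped) feature space."""
    mapped = feature_map(x)
    score = sum(w * v for w, v in zip(weights, mapped)) + bias
    return 1 if score >= 0 else 0

# Logistic regression with one hypothetical feature.
prob_hi, label_hi = lr_predict([2.0], -1.0, [1.0])
prob_lo, label_lo = lr_predict([2.0], -1.0, [0.0])

# Three hypothetical decision stumps standing in for trained trees.
stumps = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.5),
]
rf_label = rf_predict(stumps, [0.9, 0.2])

# 1-D points where class 1 lies at |x| = 2 and class 0 at |x| = 1 cannot be
# split by a single threshold on x, but become separable after mapping
# x -> (x, x^2), which is the idea behind the kernel transformation.
square_map = lambda v: [v[0], v[0] ** 2]
svm_outer = svm_predict([0.0, 1.0], -2.5, [-2.0], square_map)
svm_inner = svm_predict([0.0, 1.0], -2.5, [1.0], square_map)
```

The sketch highlights the structural difference: LR outputs a probability that must be thresholded, RF aggregates many weak votes, and SVM relies on a separating hyperplane, possibly in a transformed space.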
2.5 Model deployment process
The ML model is deployed to the cloud through GitHub [46]. The flowchart of the complete liver disease classification system is shown in Fig. 3.
3 Results
This section includes testing the classification system of liver diseases with CNN and three ML classifiers. The ILPD is split into training (468) and validation (117) datasets. The ML-classifiers are analyzed using the UCI ML repository dataset available online. Furthermore, the ETC and MRMR techniques are applied for feature selection. The CNN model was trained with eleven attributes of the ILPD dataset. The ML classifiers such as LR, RF, and SVM are prepared with the CNN model’s flatten layer output features. The same models are also trained with the best-selected attributes by ETC and MRMR methods.
3.1 Performance measures
The confusion matrix can be employed to evaluate the performance of binary and multi-class classification problems [45, 46]. The following metrics of the confusion matrix are applied to assess the model or system.
True negative rate (TNR):

$$\mathrm{TNR} = \frac{TN}{TN + FP}$$
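The confusion-matrix metrics used in this work (accuracy, precision, recall, F1 score, and TNR) can be computed from the four cell counts as sketched below. The counts in the example are hypothetical: they were chosen to be consistent with the CNN-LR figures reported for K = 50, but the source does not give the actual confusion matrix.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "tnr": tnr,
        "f1": f1,
    }

# Hypothetical fold of 12 samples with a single false negative.
metrics = confusion_metrics(tp=8, fp=0, tn=3, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The example shows how precision can be 100% while recall, accuracy, and F1 score are lower: with no false positives, every predicted positive is correct, yet one diseased case is still missed.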
3.2 Data preprocessing
Data preprocessing is essential in developing an ML model: it transforms raw data into a form that an ML model can handle. A dataset typically contains noise, null values, and formats unsuitable for ML models. Cleaning the data and making it suitable for the model enhances the model’s performance. In this work, the authors checked for missing or null values in the dataset and found that only the Albumin_and_Globulin_Ratio attribute has 4 NULL values, which were filled with the mean value. LabelEncoder is used to encode the categorical variables as digits. The dataset is then divided into a training set and a validation set, which is one of the crucial steps of data preprocessing. The final step is feature scaling, which standardizes the dataset’s independent attributes into a given range. The variables are brought to the same scale so that no variable dominates another. The standardization method is used for this dataset.
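The three preprocessing steps (mean imputation, label encoding, and standardization) can be sketched without any library dependencies as follows. The column values are hypothetical, and the helper functions are illustrative stand-ins for the library routines the authors used, such as LabelEncoder.

```python
from statistics import mean, pstdev

def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def label_encode(column):
    """Map each distinct category to an integer code, in sorted order."""
    mapping = {value: code for code, value in enumerate(sorted(set(column)))}
    return [mapping[v] for v in column], mapping

def standardize(column):
    """Rescale to zero mean and unit standard deviation (column must vary)."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]

# Hypothetical values for three records.
ratio = impute_mean([0.9, None, 1.1, 1.0])
gender_codes, gender_mapping = label_encode(["Male", "Female", "Male"])
age_scaled = standardize([30.0, 45.0, 60.0])
```

In a train/validation workflow, the imputation mean and the scaling statistics would normally be estimated on the training set only and then reused on the validation set.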
3.3 Attribute or feature selection
This research aims to transfer the doctor’s decision-making process to a computer or mobile device. Therefore, it is mandatory to use the same information or data that the physician uses. The impact of attribute selection on the outcome is a significant factor in ML methods [51], so the use of feature selection techniques is crucial. This study aims to classify liver diseases accurately by selecting the best features to train the models. The results of models with and without feature selection are compared to find which procedure provides the best classification results for liver diseases. The dataset was pre-processed into the correct format to train the ML models, and the ETC and MRMR methods were then applied to the features. Figure 4 shows the feature selection scores after applying the ETC method. The ETC method selects the features f_9, f_74, f_66, f_55, f_67, f_42, f_52, f_29, f_30, and f_19, whereas the MRMR method selects the features f_39, f_57, f_67, f_19, f_42, f_6, f_9, f_66, f_68, and f_30. The two methods therefore select different features. Table 2 tabulates the model names after applying the ETC and MRMR techniques for easy understanding.
3.4 CNN implementation
The CNN framework in this work has two convolution layers (Conv1D) with 32 and 16 filters. The Adam optimizer and the ‘ReLU’ activation function are used with ‘same’ padding, together with two batch normalization layers and a flatten layer. A dropout layer with a rate of 0.5 is used after the flatten layer. Three dense layers are used with 128, 32, and 1 neurons. The details are tabulated in Table 3. The CNN model achieved 71.37% training accuracy and 71.19% validation accuracy. The flatten layer has 80 features, from f_0 to f_79. These features are used to train the three classifiers.
3.5 CNN with LR, RF, and SVM
The approach is proposed by combining the CNN model with the LR, RF, and SVM classifiers. This method uses the CNN model as a feature extractor and the LR, RF, and SVM models as classifiers. The classifier is fed the characteristics acquired from the flattened layer. Following feature extraction, 80 features were routed to the classification section. The feature map was flattened to 80 feature vectors and then classified using classifiers. Figure 5a shows the performance of the CNN-LR, 5b shows the CNN-RF, and 5c shows the CNN-SVM models.
The performance of the models was tested with a stratified K-fold method with K values of 10, 20, 30, 40, and 50. The 80 features from the flatten layer were used to train and validate the models. The highest accuracies of the CNN-LR, CNN-RF, and CNN-SVM models are 91.66%, 100%, and 83.33%, respectively, at K = 50, whereas precision was 100% for all models. The highest recall values of the CNN-LR, CNN-RF, and CNN-SVM models are 88.88%, 100%, and 81.81%, respectively, while the F1 scores were 94.12%, 100%, and 90%, respectively. The CNN-RF model outperforms all other models in terms of accuracy, precision, recall, and F1 score. Ten features out of 80 were selected using the ETC and MRMR methods. From Fig. 5, it can be seen that the performance of the models is identical with and without feature selection; the models achieved comparable performance with only 10 features. Although the two feature selection methods select different features, the performance remains good. In this work, both the ETC and MRMR methods perform well.
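The idea behind stratified K-fold evaluation is that every fold preserves the class ratio of the full dataset. The sketch below is a minimal, non-shuffling illustration in plain Python; in practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used, and the source does not show the authors' implementation. The toy labels are hypothetical.

```python
from collections import defaultdict

def stratified_k_fold(labels, k):
    """Yield (train_idx, test_idx) pairs whose class ratios mirror `labels`."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    # Deal the indices of each class round-robin across the k folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test

# Hypothetical labels with a 2:1 class ratio (10 diseased, 5 healthy).
labels = [1] * 10 + [2] * 5
splits = list(stratified_k_fold(labels, k=5))
for train, test in splits:
    print([labels[i] for i in test])
```

Each test fold in this example contains two samples of class 1 and one of class 2, matching the overall 2:1 ratio. Note that with large K and a small dataset, each fold holds only a handful of samples, so per-fold metrics can change sharply with a single misclassification.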
Receiver Operating Characteristic (ROC) analysis is an efficient way to measure ML and data mining performance [52]. The ROC curves of the CNN-LR, CNN-RF, and CNN-SVM models are shown in Fig. 6a, b, and c, respectively. The ROC curve is a plot that illustrates the diagnostic capability of a binary classifier. The nearer the curve is to the top left corner, the better the classification result; a model performs worst when its curve lies close to the threshold (diagonal) line. The performance of CNN-RF is excellent, with an Area Under Curve (AUC) of 1, while CNN-LR achieved an AUC of 0.83 and CNN-SVM achieved an AUC of 0.67. The authors also applied the ETC and MRMR feature selection methods and obtained the same results as in Fig. 6.
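The AUC values quoted above can be interpreted as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following sketch computes AUC directly from that definition; the labels and scores are hypothetical and do not reproduce the models in this work.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier scores for three positive and three negative cases.
labels = [1, 1, 1, 0, 0, 0]
auc_perfect = roc_auc(labels, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])
auc_mixed = roc_auc(labels, [0.9, 0.4, 0.6, 0.5, 0.2, 0.1])
auc_constant = roc_auc(labels, [0.5] * 6)
```

A perfect ranking gives an AUC of 1, a classifier that cannot rank cases at all gives 0.5 (the diagonal line), and intermediate rankings fall in between.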
3.6 Cloud deployment
The empirical analysis of the proposed models is remarkable. The CNN-RF model is performing very well compared to others. This section analyzes and compares the ML model performance in memory size, attributes, precision, and F1 score. Table 4 tabulates the comparative analysis of the models in terms of depth, model size, number of features, precision, and F1 score.
From Table 4, the precision of all models is 100%, while the F1 score of the CNN-RF-MR model is the highest at 100%. Furthermore, the memory size of the CNN-LR-MR model is 1.35 KB, that of CNN-RF-MR is 1.2 MB, and that of CNN-SVM-MR is 52.16 KB. The CNN-RF-MR or CNN-RF-ETC model is suitable for the liver disease prediction system because of its high overall performance. In this work, the CNN and CNN-RF-MR models are deployed to the cloud. A Heroku cloud link is generated after the model is deployed successfully. With one click on this link, which is accessible from a mobile device, a website opens where the eight attributes can be uploaded. After the features are uploaded to the cloud, the model correctly identifies a patient without liver disease.
4 Discussions
In healthcare, accurately and rapidly predicting any disease is crucial to preventing and controlling the illness. Traditional approaches help detect diseases, but they are time-consuming. Several computer-based methods have already been proposed in the literature to overcome the limitations of manual diagnosis. Table 5 tabulates the ML-based liver disease prediction systems available in the literature together with their accuracy.
Priya et al. [54] suggested SVM, J48, BN, Multi-layer Perceptron (MLP), and RF ML models for liver cancer disease classification using an ILPD dataset. The Min-Max normalization and Particle Swarm Optimization (PSO) feature selection techniques were applied to improve the model performance. The Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and accuracy metrics were employed to evaluate models. The J48 achieved the highest accuracy (95.04%) compared with MLP (77.54%), SVM (73.44%), RF (80.22%), and Bayes network (90.33%).
Pathan et al. [55] suggested NB, Ada Boost, J48, Bagging, and RF for liver cancer disease detection. Data were prepared using the K-Means technique to select the attributes. The RF model achieved the highest accuracy (100%) compared with NB (55.84%), Adaboost (71.31%), J48 (87.46%), and Bagging (90.38%). Muthuselvan et al. [56] suggested NB, J48, RT, and K-star methods to detect liver cancer diseases. In another paper, Kaur et al. [57] proposed SVM, RF, NB, SMO, and J48 algorithms to classify liver cancer disease.
Abdalrada et al. [58] suggested LR to identify the probability of liver disease using the ILPD dataset and achieved 72.4% accuracy, 78.3% specificity, 90.3% sensitivity, and 0.758% ROC. Harshpreet et al. [59] suggested a hybrid classification technique for liver disease prediction using ILPD and achieved an accuracy of 77.58%. Mankame et al. [60] proposed ML approaches to detect liver disease from the ILPD dataset. They dropped the Direct Bilirubin attribute and used only ten of the eleven characteristics. Five ML models, LR, KNN, DT, RF, and SVM, were implemented, and the RF model achieved the highest precision of 93% and a ROC-AUC score of 96%.
Mostafa et al. [61] proposed to diagnose liver illness using ML algorithms such as SVM, RF, and ANN using Multiple imputations by chained equations (MICEs) and principal component analysis (PCA). The synthetic minority oversampling approach oversamples the minority class to control overfitting. The best accuracy attained by the RF was 98.14%. Joloudari et al. [62] suggested ML models such as RF, neural network, MLP, Bayesian networks, SVM, and Particle Swarm Optimization (PSO)-SVM to predict liver disease. These models were compared based on the extraction, loading, transformation, and analysis (ELTA) feature selection approach. PSO-SVM model achieved the highest average accuracy of 95.17%, precision of 96.77%, and F1 score of 95.84% after applying the 10-fold cross-validation method.
The proposed work developed a CNN-based system to classify liver diseases on a computer or smartphone. Two feature selection methods were used to select the best features. The literature review clarified various methods for identifying and classifying liver diseases. The proposed ML models perform well compared with the literature listed in Table 5. Figure 7 compares the performance of models available in the literature on the ILPD dataset; the present work uses the same ILPD dataset to classify liver disease. The performance of the present model is compared with the precision and F1 score of the models available in the literature. Precision or accuracy is generally used to evaluate model performance, but the F1 score is a more suitable metric for testing performance on an unbalanced dataset. From Fig. 7, it can be seen that Pasha et al. [28], Ayeldeen et al. [22], and Singh et al. [57] have not reported the F1 score. Some researchers also did not verify their proposed model using cross-validation. The CNN-RF framework achieved excellent precision of 100% at all K values, followed by Ayeldeen et al. [22] and Mankame et al. [54]. Its F1 score is also the highest, followed by Mankame et al. [54] and Ramesh et al. [15]. Moreover, no authors in the literature have deployed their model on the cloud. The main advantage of cloud deployment is that doctors can access the models from a computer or smartphone for liver disease prediction. The proposed cloud-based classification method correctly classifies liver diseases and should help doctors and other users detect early-stage liver disease. The empirical evaluation indicates that the overall performance of the proposed CNN-RF models is remarkable.
5 Conclusion
Liver disease is diagnosed by doctors who are skilled in spotting noteworthy observations and classifying them as malignant or benign using background information and other supporting details. DL or ML systems can be trained to recognize the probability of liver disease in a similar way. The overall goal of this paper is to assist in detecting liver illness early and to detail several classification approaches used in ML based on performance measures. This offers a clear view of how the algorithms operate on the liver dataset and how they produce a high-efficiency model for predicting liver illness. Instead of selecting the best features directly from the dataset, the authors tried a new method. First, the dataset features are applied to the CNN model. The advantage of the CNN is that it automatically extracts the essential features, which are then applied to ML classifiers such as LR, RF, and SVM. In addition, two feature selection methods, ETC and MRMR, are applied to the features extracted by the CNN model. Ten features were selected out of the 80 extracted by the CNN and applied to LR, RF, and SVM, which achieved identical results for both feature selection methods. Using only 10 features out of 80 reduces the model processing time. The robustness of the system was also tested using the stratified K-fold method. The CNN coupled with LR (CNN-LR), RF (CNN-RF), and SVM (CNN-SVM) produced outstanding classification results, and CNN-RF outperformed the other two classification methods. Even though all three approaches performed well on the testing data set, CNN-RF had the highest accuracy, precision, recall, and F1 score of 100% (K = 50). CNN-RF produced highly effective classification performance with both the original and the best-selected features and is therefore recommended for deployment to the cloud. The CNN-RF approach can save time, cost, and lives by improving disease diagnosis.
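The stratified K-fold evaluation mentioned above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `stratified_k_fold`, the seed, and the 70/30 label vector are hypothetical, and libraries such as scikit-learn provide a `StratifiedKFold` class for practical use. The idea is to split each class separately so that every fold keeps roughly the same class ratio as the full dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds preserve class ratios."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle each class and deal its indices round-robin into k folds,
    # so each fold receives a near-equal share of every class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the test set; the rest form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical labels: 70 positive and 30 negative samples.
labels = [1] * 70 + [0] * 30
splits = list(stratified_k_fold(labels, k=5))

for train_idx, test_idx in splits:
    positives = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), positives)
```

Each of the five test folds here holds 14 positive and 6 negative labels, matching the 70/30 ratio of the full hypothetical label vector. A classifier would be trained on `train_idx` and scored on `test_idx` in each round, and the scores averaged.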
A limitation of the proposed system is that it can only predict whether the patient has liver disease; in practice, a system that could also identify the disease stage and the most affected liver area would be more helpful to the clinician. The future work is as follows:
- Testing more case studies can further improve the system’s performance.
- CNN can be implemented with a genetic algorithm, and performance can be verified.
- Other liver disease data can be used with this proposed system.
- Disease-affected area or part of the liver can be predicted using other datasets.
- The authors will continue to extend this study to classify more types of liver diseases.
Data Availability
Not applicable.
References
Acharya, U.R., Koh, J.E.W., Hagiwara, Y., Tan, J.H., Gertych, A., Vijayananthan, A., Yaakup, N.A., Abdullah, B.J.J., Bin Mohd Fabell, M.K., Yeong, C.H.: Automated diagnosis of focal liver lesions using bidirectional empirical mode decomposition features. Computers in Biology and Medicine. 94, 11–18 (2018). https://doi.org/10.1016/j.compbiomed.2017.12.024
Shahabi, M., Hassanpour, H., Mashayekhi, H.: Rule extraction for fatty liver detection using neural networks. Neural Comput. & Applic. 31, 979–989 (2019). https://doi.org/10.1007/s00521-017-3130-5
Ali, L., Wajahat, I., Amiri Golilarz, N., Keshtkar, F., Bukhari, S.A.C.: LDA–GA–SVM: improved hepatocellular carcinoma prediction through dimensionality reduction and genetically optimized support vector machine. Neural Comput. & Applic. 33, 2783–2792 (2021). https://doi.org/10.1007/s00521-020-05157-2
Grissa, D., Nytoft Rasmussen, D., Krag, A., Brunak, S., Juhl Jensen, L.: Alcoholic liver disease: A registry view on comorbidities and disease prediction. PLoS Comput. Biol. 16, e1008244 (2020). https://doi.org/10.1371/journal.pcbi.1008244
Hashem, S., ElHefnawi, M., Habashy, S., El-Adawy, M., Esmat, G., Elakel, W., Abdelazziz, A.O., Nabeel, M.M., Abdelmaksoud, A.H., Elbaz, T.M., Shousha, H.I.: Machine Learning Prediction Models for Diagnosing Hepatocellular Carcinoma with HCV-related Chronic Liver Disease. Comput. Methods Programs Biomed. 196, 105551 (2020). https://doi.org/10.1016/j.cmpb.2020.105551
Losic, B., Craig, A.J., Villacorta-Martin, C., Martins-Filho, S.N., Akers, N., Chen, X., Ahsen, M.E., von Felden, J., Labgaa, I., DʹAvola, D., Allette, K., Lira, S.A., Furtado, G.C., Garcia-Lezana, T., Restrepo, P., Stueck, A., Ward, S.C., Fiel, M.I., Hiotis, S.P., Gunasekaran, G., Sia, D., Schadt, E.E., Sebra, R., Schwartz, M., Llovet, J.M., Thung, S., Stolovitzky, G., Villanueva, A.: Intratumoral heterogeneity and clonal evolution in liver cancer. Nat. Commun. 11, 291 (2020). https://doi.org/10.1038/s41467-019-14050-z
Naseem, R., Khan, B., Shah, M.A., Wakil, K., Khan, A., Alosaimi, W., Uddin, M.I., Alouffi, B.: Performance Assessment of Classification Algorithms on Early Detection of Liver Syndrome. Journal of Healthcare Engineering. 1–13 (2020). https://doi.org/10.1155/2020/6680002
Goceri, E., Shah, Z.K., Layman, R., Jiang, X., Gurcan, M.N.: Quantification of liver fat: A comprehensive review. Computers in Biology and Medicine. 71, 174–189 (2016). https://doi.org/10.1016/j.compbiomed.2016.02.013
Abdar, M., Yen, N.Y., Hung, J.C.-S.: Improving the Diagnosis of Liver Disease Using Multilayer Perceptron Neural Network and Boosted Decision Trees. J. Med. Biol. Eng. 38, 953–965 (2018). https://doi.org/10.1007/s40846-017-0360-z
Perveen, S., Shahbaz, M., Keshavjee, K., Guergachi, A.: A Systematic Machine Learning Based Approach for the Diagnosis of Non-Alcoholic Fatty Liver Disease Risk and Progression. Sci. Rep. 8, 2112 (2018). https://doi.org/10.1038/s41598-018-20166-x
Yang, J.D., Ahmed, F., Mara, K.C., Addissie, B.D., Allen, A.M., Gores, G.J., Roberts, L.R.: Diabetes Is Associated With Increased Risk of Hepatocellular Carcinoma in Patients With Cirrhosis From Nonalcoholic Fatty Liver Disease. Hepatology. 71, 907–916 (2020). https://doi.org/10.1002/hep.30858
Muruganantham, B.: Liver Disease Prediction Using Classification Algorithms. Int. J. Adv. Sci. Technol. 29, 311–319 (2020)
Kececi, A., Yildirak, A., Ozyazici, K., Ayluctarhan, G., Agbulut, O., Zincir, I.: Implementation of machine learning algorithms for gait recognition. Eng. Sci. Technol. Int. J. 23, 931–937 (2020). https://doi.org/10.1016/j.jestch.2020.01.005
Govindarajan, P., Soundarapandian, R.K., Gandomi, A.H., Patan, R., Jayaraman, P., Manikandan, R.: Classification of stroke disease using machine learning algorithms. Neural Comput. & Applic. 32, 817–828 (2020). https://doi.org/10.1007/s00521-019-04041-y
Ramesh, D., Katheria, Y.S.: Ensemble method based predictive model for analyzing disease datasets: a predictive analysis approach. Health Technol. 9, 533–545 (2019). https://doi.org/10.1007/s12553-019-00299-3
Godara, S.: Evaluation of Predictive Machine Learning Techniques as Expert Systems in Medical Diagnosis. IJST. 9, 1–14 (2016). https://doi.org/10.17485/ijst/2016/v9i10/87212
Sanaj, M.S., Joe Prathap, P.M.: Nature inspired chaotic squirrel search algorithm (CSSA) for multi objective task scheduling in an IAAS cloud computing atmosphere. Eng. Sci. Technol. Int. J. 23, 891–902 (2020). https://doi.org/10.1016/j.jestch.2019.11.002
Tanwar, N., Rahman, K.F.: Machine Learning in liver disease diagnosis: Current progress and future opportunities. IOP Conf. Ser.: Mater. Sci. Eng. 1022, 012029 (2021). https://doi.org/10.1088/1757-899X/1022/1/012029
Jaganathan, K., Tayara, H., Chong, K.T.: Prediction of Drug-Induced Liver Toxicity Using SVM and Optimal Descriptor Sets. IJMS. 22, 8073 (2021). https://doi.org/10.3390/ijms22158073
Thirunavukkarasu, Singh, A.S., Irfan, M., Chowdhury, A.: Prediction of Liver Disease using Classification Algorithms. In: 2018 4th International Conference on Computing Communication and Automation (ICCCA). pp. 1–3 (2018)
Razali, N., Mustapha, A., Wahab, M.H.A., Mostafa, S.A., Rostam, S.K.: A Data Mining Approach to Prediction of Liver Diseases. J. Phys.: Conf. Ser. 1529, 032002 (2020). https://doi.org/10.1088/1742-6596/1529/3/032002
Ayeldeen, H., Shaker, O., Ayeldeen, G., Anwar, K.M.: Prediction of liver fibrosis stages by machine learning model: A decision tree approach. In: 2015 Third World Conference on Complex Systems (WCCS). pp. 1–6 (2015)
Belavigi, D.H., Veena, G.S., Harekal, D.: Prediction of Liver Disease using Rprop, SAG and CNN. Int. J. Innovative Technol. Exploring Eng. (IJITEE). 8, 8 (2019)
Kumar, S., Katyal, S.: Effective Analysis and Diagnosis of Liver Disorder by Data Mining. In: 2018 International Conference on Inventive Research in Computing Applications (ICIRCA). pp. 1047–1051 (2018)
Hashem, S., Esmat, G., Elakel, W., Habashy, S., Raouf, S.A., Elhefnawi, M., Eladawy, M.I., ElHefnawi, M.: Comparison of Machine Learning Approaches for Prediction of Advanced Liver Fibrosis in Chronic Hepatitis C Patients. IEEE/ACM Trans. Comput. Biol. and Bioinf. 15, 861–868 (2018). https://doi.org/10.1109/TCBB.2017.2690848
Vats, V., Zhang, L., Chatterjee, S., Ahmed, S., Enziama, E., Tepe, K.: A Comparative Analysis of Unsupervised Machine Techniques for Liver Disease Prediction. In: 2018 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). pp. 486–489 (2018)
Kuppan, P., Manoharan, N.: A Tentative analysis of Liver Disorder using Data mining Algorithms J48, Decision Table and Naive Bayes. IJCOA. 6, 37–40 (2017). https://doi.org/10.20894/IJCOA.101.006.001.009
Pasha, M., Fatima, M.: Comparative Analysis of Meta Learning Algorithms for Liver Disease Detection. JSW. 12, 923–933 (2017). https://doi.org/10.17706/jsw.12.12.923-933
Baitharu, T.R., Pani, S.K.: Analysis of Data Mining Techniques for Healthcare Decision Support System Using Liver Disorder Dataset. Procedia Comput. Sci. 85, 862–870 (2016). https://doi.org/10.1016/j.procs.2016.05.276
Sontakke, S., Lohokare, J., Dani, R.: Diagnosis of liver diseases using machine learning. In: 2017 International Conference on Emerging Trends Innovation in ICT (ICEI). pp. 129–133 (2017)
Singh, G., Agarwal, C., Gupta, S.: Detection of Liver Disease Using Machine Learning Techniques: A Systematic Survey. In: Balas, V.E., Sinha, G.R., Agarwal, B., Sharma, T.K., Dadheech, P., Mahrishi, M. (eds.) Emerging Technologies in Computer Engineering: Cognitive Computing and Intelligent IoT, pp. 39–51. Springer International Publishing, Cham (2022)
Pasha, S.N., Ramesh, D., Mohmmad, S., Kishan, P.N., Sandeep, P.A.: C.H.: Liver disease prediction using ML techniques. Presented at the International Conference on Research in Sciences, Engineering & Technology, Warangal, India (2022)
Poonguzharselvi, B.: M.M.A.A.: Prediction of Liver Disease Using Machine Learning Algorithm and Genetic Algorithm. Annals of the Romanian Society for Cell Biology. 2347–2357 (2021)
Yajurved, J., Prasad, P.S., Km, D.U.: Analysis of Chronic Disease (Liver) Prediction Using Machine Learning. Journal of Positive School Psychology. 5489–5496 (2022)
Keerthana, P.S.M., Phalinkar, N., Mehere, R., Bhanu Prakash Reddy, K., Lal, N.: A Prediction Model of Detecting Liver Diseases in Patients using Logistic Regression of Machine Learning. Social Science Research Network, Rochester, NY (2020)
Abdar, M., Zomorodi-Moghadam, M., Das, R., Ting, I.-H.: Performance analysis of classification algorithms on early detection of liver disease. Expert Syst. Appl. 67, 239–251 (2017). https://doi.org/10.1016/j.eswa.2016.08.065
UCI Machine Learning Repository: ILPD (Indian Liver Patient Dataset) Data Set, https://archive.ics.uci.edu/ml/datasets/ILPD+(Indian+Liver+Patient+Dataset
Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R.P., Tang, J., Liu, H.: Feature Selection: A Data Perspective. ACM Comput. Surv. 50, 1–45 (2018). https://doi.org/10.1145/3136625
Latha, P.H., Mohanasundaram, R.: A New Hybrid Strategy for Malware Detection Classification with Multiple Feature Selection Methods and Ensemble Learning Methods. IJEAT. 9, 4013–4018 (2019). https://doi.org/10.35940/ijeat.B4666.129219
Bhandari, N.: ExtraTreesClassifier (2018). https://medium.com/@namanbhandari/extratreesclassifier-8e7fc0502c7
ML | Extra Tree Classifier for Feature Selection (2019). https://www.geeksforgeeks.org/ml-extra-tree-classifier-for-feature-selection/
9 Feature Transformation & Scaling: Techniques | Boost Model Performance (2020). https://www.analyticsvidhya.com/blog/2020/07/types-of-feature-transformation-and-scaling/
Radovic, M., Ghalwash, M., Filipovic, N., Obradovic, Z.: Minimum redundancy maximum relevance feature selection approach for temporal gene expression data. BMC Bioinf. 18, 9 (2017). https://doi.org/10.1186/s12859-016-1423-9
Aslan, S.N., Özalp, R., Uçar, A., Güzeliş, C.: New CNN and hybrid CNN-LSTM models for learning object manipulation of humanoid robots from demonstration. Cluster Comput. 25, 1575–1590 (2022). https://doi.org/10.1007/s10586-021-03348-7
Lanjewar, M.G., Gurav, O.L.: Convolutional Neural Networks based classifications of soil images. Multimed Tools Appl. 81, 10313–10336 (2022). https://doi.org/10.1007/s11042-022-12200-y
Lanjewar, M.G., Morajkar, P.P., Parab, J.: Detection of tartrazine colored rice flour adulteration in turmeric from multi-spectral images on smartphone using convolutional neural network deployed on PaaS cloud. Multimed Tools Appl. 81, 16537–16562 (2022). https://doi.org/10.1007/s11042-022-12392-3
Brownlee, J.: How to Use StandardScaler and MinMaxScaler Transforms in Python (2020). https://machinelearningmastery.com/standardscaler-and-minmaxscaler-transforms-in-python/
Understanding Logistic Regression (2017). https://www.geeksforgeeks.org/understanding-logistic-regression/
Lanjewar, M.G., Parate, R.K., Parab, J.S.: Machine Learning Approach with Data Normalization Technique for Early Stage Detection of Hypothyroidism. In: Artificial Intelligence Applications for Health Care, pp. 91–108. CRC Press (2022)
Pant, A.: Introduction to Logistic Regression, https://towardsdatascience.com/introduction-to-logistic-regression-66248243c148
Karagül Yıldız, T., Yurtay, N., Öneç, B.: Classifying anemia types using artificial learning methods. Eng. Sci. Technol. Int. J. 24, 50–70 (2021). https://doi.org/10.1016/j.jestch.2020.12.003
Fawcett, T.: An introduction to ROC analysis. Pattern Recognit. Lett. 27, 861–874 (2006). https://doi.org/10.1016/j.patrec.2005.10.010
Chidambaram, S., Srinivasagan, K.G.: Performance evaluation of support vector machine classification approaches in data mining. Cluster Comput. 22, 189–196 (2019). https://doi.org/10.1007/s10586-018-2036-z
Priya, M., Juliet, P., Tamilselvi, P.: Performance Analysis of Liver Disease Prediction Using Machine Learning Algorithms, https://www.semanticscholar.org/paper/Performance-Analysis-of-Liver-Disease-Prediction-Priya-Juliet/d5bd2f34087fd9e4de29eb6cff328f7bc5e63b20
Pathan, A.: Comparative Study of Different Classification Algorithms on ILPD Dataset to Predict Liver Disorder. IJRASET. 6, 388–394 (2018). https://doi.org/10.22214/ijraset.2018.2056
Muthuselvan, S., Rajapraksh, S., Somasundaram, K., Karthik, K.: Classification of Liver Patient Dataset Using Machine Learning Algorithms. IJET. 7, 323 (2018). https://doi.org/10.14419/ijet.v7i3.34.19217
Kaur, H., Bhalla, S., Raghava, G.P.S.: Classification of early and late stage liver hepatocellular carcinoma patients from their genomics and epigenomics profiles. PLoS ONE. 14, e0221476 (2019). https://doi.org/10.1371/journal.pone.0221476
Shaker Abdalrada, A., Hashim Yahya, O., Hadi, M., Alaidi, A., Ali Hussein, N., Alrikabi, T.H., Al-Quraishi, H.: A Predictive model for liver disease progression based on logistic regression algorithm. PEN. 7, 1255 (2019). https://doi.org/10.21533/pen.v7i3.667
Harshpreet Kaur, G.S.: The Diagnosis of Chronic Liver Disease using Machine Learning Techniques. ITII. 9, 554–564 (2021). https://doi.org/10.17762/itii.v9i2.382
Dattatreya, P., Mankame, Harshitha, R., Navya, N.C., Nitin Ravichander: Machine Learning Techniques in Analysis and Prediction of Liver Disease. IJIRT. 8(2) (2022)
Mostafa, F., Hasan, E., Williamson, M., Khan, H.: Statistical Machine Learning Approaches to Liver Disease Prediction. Livers. 1, 294–312 (2021). https://doi.org/10.3390/livers1040023
Joloudari, J.H., Saadatfar, H., Dehzangi, A., Shamshirband, S.: Computer-aided decision-making for predicting liver disease using PSO-based optimized SVM with feature selection. Inf. Med. Unlocked. 17, 100255 (2019). https://doi.org/10.1016/j.imu.2019.100255
Singh, J., Bagga, S., Kaur, R.: Software-based Prediction of Liver Disease with Feature Selection and Classification Techniques. Procedia Comput. Sci. 167, 1970–1980 (2020). https://doi.org/10.1016/j.procs.2020.03.226
Funding
Not applicable.
Contributions
Not applicable.
Ethics declarations
Conflicts of interest
We have no conflicts of interest to disclose.
Research involving human participants and/or animals
Not applicable.
About this article
Cite this article
Lanjewar, M.G., Parab, J.S., Shaikh, A.Y. et al. CNN with machine learning approaches using ExtraTreesClassifier and MRMR feature selection techniques to detect liver diseases on cloud. Cluster Comput 26, 3657–3672 (2023). https://doi.org/10.1007/s10586-022-03752-7
DOI: https://doi.org/10.1007/s10586-022-03752-7