1 Introduction

HCC is the second leading cause of cancer-related death globally and the fifth most common cancer type (Galle et al. 2018; Njei et al. 2015). It is the fifth most common cancer in men and the seventh most common in women (Fitzmaurice et al. 2018). Studies have found that HCC occurs more frequently in males than in females (2.4:1) (Ferlay et al. 2010). HCC is the most typical category of primary liver cancer and accounts for around 75% of all liver cancer cases. Intrahepatic cholangiocarcinoma (ICC) is another form of liver cancer and comprises around 12–15% of all liver cancer cases (Petrick and McGlynn 2019). Liver cancer incidence and mortality vary widely across geographical regions, from Europe to Eastern Asia. Figure 1 shows the global incidence and mortality (both sexes) for liver cancer (Ferlay et al. 2018). In 2018, a total of 841,080 new cases and 781,631 deaths were reported globally. Incidence varies from Asia (609,596 new cases) to Europe (82,466 new cases), and mortality likewise varies from Asia (566,269 deaths) to Europe (77,375 deaths) (Ferlay et al. 2018). HCC primarily appears in individuals with chronic liver diseases such as cirrhosis, which is caused mainly by Hepatitis B and Hepatitis C infections. Chronic infection with Hepatitis B virus and/or Hepatitis C virus is a major risk factor for HCC.

Fig. 1
figure 1

a Liver cancer incidence globally in 2018 and b mortality due to liver cancer in 2018

Approximately 60% of HCC cases are caused by viral hepatitis (de Martel et al. 2015). Alcohol, heavy exposure to aflatoxin and metabolic syndrome are the other important risk factors for HCC. Statistical data show that mortality due to HCC is high and is likely to remain so. Despite improved screening and advances in medicine, HCC shows an escalating mortality rate. The 5-year overall survival of HCC patients is 3–5% across all countries (Dhanasekaran et al. 2012).

With recent developments in data mining, soft computing and machine learning techniques, many researchers have taken a keen interest in medical and clinical analysis of the available data sources. Researchers and medical practitioners have been applying machine learning and statistical methods to develop prediction models for clinical analysis and treatment. Many state-of-the-art studies have been presented for the prediction of HCC. Masaya et al. (2019) proposed a Gradient Boosting (GB) based prediction model. Clinical information was collected from 1582 patients (539 HCC patients; 1043 non-HCC patients) at the University of Tokyo Hospital from January 1997 to May 2016. Among all the classifiers they evaluated, the Gradient Boosting classifier obtained the highest accuracy of 87.34% on the data used in their study. A Decision Tree based HCC prediction model was proposed by Omran et al. (2015). The data were collected from the Endemic Medicine Department, Cairo University Hospital, Egypt. In total, 315 patients with Hepatitis C Virus (HCV) related chronic liver disease were registered for the study: 135 patients with HCC, 116 patients with cirrhosis of the liver without HCC and 64 patients with chronic hepatitis C.

The dataset comprises 29 features that encompass demographic, haematological, biological and viral markers together with additional clinical features. The proposed decision tree algorithm predicted HCC instances with an accuracy of 82.2%, sensitivity (recall) of 83.5% and specificity of 83.3%. The decision tree model identified serum AFP as the foremost feature for HCC prediction. Their study revealed that male patients are 2.9 times more prone to develop HCC than female patients. Liang et al. (2016) proposed biomarkers for early prediction of HCC. They combined metabolic profiling, multivariate data exploration, machine learning, pathway examination and ROC analysis for HCC. The proposed model with the identified biomarkers achieved an accuracy of 83%, sensitivity of 96.50% and specificity of 83%. A comparison between Artificial Neural Network (ANN) and Logistic Regression (LR) based models was presented by Chiu et al. (2013) for prediction of the most significant mortality attributes for HCC; their ANN model outperformed the LR model. Clinical information consisting of 21 features, including demographics and hepatic biochemical parameters, was collected from 434 patients at Kaohsiung Medical University Hospital and Yuan’s Hospital, Taiwan. Comorbidity, liver cirrhosis, α-fetoprotein, platelets, ASA classification and TNM stage were identified as the most significant features by the ANN model, which achieved an accuracy of 85.10%. Another ANN based HCC prediction model was proposed by Liu et al. (2020). Their model comprises 39 input features (10 patient-related features, 3 HBV-related features, 19 laboratory features, 3 tumour-related features and 4 subsets of BCLC staging) and 3 target features, and achieved an AUROC of 87.70%.

Machine learning based HCC survival prediction models have been proposed by several authors. Dong et al. (2019) presented a model for survival of HCC patients based on DNA methylation and machine learning. Cox regression together with a Support Vector Machine (SVM) Recursive Feature Elimination (RFE) algorithm and a forward-SVM algorithm were used to screen differentially methylated sites. The SVM-RFE model obtained a tenfold cross-validation score of 0.50 and the FW-SVM obtained a tenfold cross-validation score of 0.95; the best score was achieved with the 134 best selected features. Shi et al. (2012) introduced a prediction model for mortality after liver cancer surgery. A comparative analysis of Artificial Neural Network (ANN) and Logistic Regression (LR) models showed that the ANN performs better than LR in terms of accuracy, Hosmer–Lemeshow (H–L) statistics and AUROC curves. The ANN model attained an accuracy of 97.28%, H–L statistic of 41.18% and AUROC of 84.67%, compared with the LR model’s accuracy of 88.29%, H–L statistic of 54.53% and AUROC of 76%. An unsupervised cluster-based survival prediction model for HCC patients was used by Santos et al. (2015). Neural Network (NN) and Logistic Regression (LR) classifiers were compared in terms of accuracy, AUC and F-measure; the NN demonstrated better performance, achieving an accuracy of 75.2%, AUC of 70% and F-measure of 0.665. The inclusion of a penalty function in the firefly algorithm for HCC prediction was proposed by Sawhney et al. (2018). The firefly algorithm with penalty function was used to find the most optimal subset of features, and a Random Forest classifier trained on this subset was used for HCC classification. The number of features was reduced to eight and an accuracy of 83% was attained with the RF classifier. Tuncer and Ertam (2019) used Neighbourhood Component Analysis (NCA) and reliefF methods for feature reduction. Twenty-three traditional machine learning classifiers were evaluated for HCC prediction, with results reported in terms of accuracy, precision, recall and F-1 score; accuracies of 92.12% and 83.03% were obtained for the NCA and reliefF based methods respectively.

A gene-based study for discriminating HCC from cirrhotic tissue was proposed by Zhang et al. (2020). A machine learning approach was applied to microarray data containing 1091 HCC samples and 242 non-HCC samples. The within-sample relative expression ordering (REO) technique was used to extract numerical descriptors, and minimum redundancy maximum relevance (mRMR) feature selection was applied to obtain significant gene pairs, which were classified using a Support Vector Machine. Their model identified the eleven most significant gene pairs with excellent classification results; these gene pairs can be regarded as a signature for HCC. The signature gene pairs were: TRMT112-SF3B1; MFSD5-COLEC10; FDXR-APC2; LAMC1-CHST4; UBE4B-HGF; NCAPH2-APC2; HSPH1-MTHFD2; TMEM38B-AGO3; PLGRKT-COLEC10; HNF1A-APC2; ARPC2-SF3B1. A significant advantage of their model is that it can discriminate HCC from non-HCC samples even for minimal biopsy specimens and inaccurately sampled specimens.

In the present work, novel hybrid models using LASSO Regression, Ridge Regression, Genetic Algorithm optimization and Random Forest with three machine learning classifiers are proposed for HCC prediction. The proposed method consists of data pre-processing, feature selection/optimization and classification. The main contributions of this paper are:

  • For performance evaluation, LASSO Regression, Ridge Regression, Genetic Algorithm optimization and Random Forest based feature evaluators with machine learning classifiers are presented.

  • Excellent performance results in terms of accuracy, recall, F-1 score, Jaccard score and AUROC.

  • The proposed methodology is compared with existing methods and shows improved performance over them.

2 Material and proposed methodology

The proposed methodology is shown in Fig. 2. In the first step, the HCC liver cancer survival dataset, available at the UCI data repository (UCI 2020), was selected for analysis. The data were collected at a University Hospital in Portugal (Santos et al. 2015). The dataset contains 49 feature values obtained from 165 patients diagnosed with HCC. Step 2 performs the data pre-processing of the HCC survival dataset. Step 3 performs feature weight assignment for assessing feature importance. Step 4 performs the model implementation with the significant features. Step 5 measures the performance of the proposed model in terms of Accuracy, Recall, F1 Score, Log-Loss Score, Jaccard Score and AUROC.

Fig. 2
figure 2

Proposed methodology

2.1 Dataset

The dataset encompasses demographic, risk factor, laboratory and overall survival features from 165 patients diagnosed with HCC. It covers 49 features for prediction of survival of HCC patients and consists of clinical attributes that are considered notable for the clinical decision process. The clinical attributes considered for analysis are: Gender, Symptoms, Alcohol, HBsAg (Hepatitis B Surface Antigen), HBeAg (Hepatitis B e-Antigen), HBcAb (Hepatitis B Core Antibody), HCVAb (Hepatitis C Virus Antibody), Cirrhosis, Endemic Countries, Smoking, Diabetes, Obesity, Hemochromatosis, Arterial Hypertension, Chronic Renal Insufficiency, Human Immunodeficiency Virus, Non-Alcoholic Steatohepatitis, Esophageal Varices, Splenomegaly, Portal Hypertension, Portal Vein Thrombosis, Liver Metastasis, Radiological Hallmark, Age at diagnosis, Grams of Alcohol per day, Packs of cigarettes per year, Performance Status, Encephalopathy degree, Ascites degree, International Normalised Ratio, Alpha-Fetoprotein (ng/mL), Haemoglobin (g/dL), Mean Corpuscular Volume (fl), Leukocytes (G/L), Platelets (G/L), Albumin (mg/dL), Total Bilirubin (mg/dL), Alanine transaminase (U/L), Aspartate transaminase (U/L), Gamma glutamyl transferase (U/L), Alkaline phosphatase (U/L), Total Proteins (g/dL), Creatinine (mg/dL), Number of Nodules, Major dimension of nodule (cm), Direct Bilirubin (mg/dL), Iron (mcg/dL), Oxygen Saturation, Ferritin (ng/mL). The target variable is encoded as 0 (patient did not survive) or 1 (patient survived). The descriptions of the qualitative and quantitative input variables are presented in Tables 1 and 2 respectively.

Table 1 Description of qualitative (nominal) input attributes
Table 2 Description of quantitative (numeric) input attributes

2.2 Data pre-processing

The dataset contains 49 attributes: 23 attributes with nominal values and 26 attributes with continuous values, as presented in Tables 1 and 2 respectively. Initially, 10.22% of the data is missing across the entire dataset. Santos et al. (2015) observed that missing value imputation can be carried out using KNN with different values of k, and that k = 1 gives the best fit for the missing values in this dataset. Beretta and Santaniello (2016) further showed that with KNN imputation for any value of k > 1 the standard deviations are significantly affected and inflated, and hence KNN with k = 1 outperforms. Therefore, KNN with k = 1 using the Heterogeneous Euclidean-Overlap Metric (HEOM) distance is used for missing value imputation.

HEOM was described by Wilson and Martinez (1997, 2000) as an example of a heterogeneous distance measure. Suppose we wish to find distances between some subset of n objects and that for each object we have measured the values of R predictors. Let J = {1, 2, ..., n} be an index set for the n objects. For each i, j ∈ J, the HEOM defines the distance between the ith object and the jth object as

$$HEOM ({P}_{i},{P}_{j})={\sum }_{r=1}^{R}{d}_{r}({P}_{i,r},{P}_{j,r})$$
(1)

where

$${d}_{r}\left({P}_{i,r},{P}_{j,r}\right)=\begin{cases}\dfrac{\left|{P}_{i,r}-{P}_{j,r}\right|}{{range}_{r}} & \text{if } r \text{ indexes a continuous attribute}\\ {\delta }_{i,j} & \text{if } r \text{ indexes a categorical attribute}\end{cases}$$
(2)

And \({\delta }_{i,j}=1\) if \({P}_{i,r} \ne {P}_{j,r}\) and \({\delta }_{i,j}=0\) if \({P}_{i,r}={P}_{j,r}\). Here, \({d}_{r}({P}_{i,r},{P}_{j,r})\) can be thought of as the contribution of the rth attribute to the overall distance and \({range}_{r}\) = \(\underset{\mathrm{j}\in \mathrm{J}}{\mathrm{max}}\left\{{P}_{j,r}\right\}-\underset{\mathrm{j}\in \mathrm{J}}{\mathrm{min}}\left\{{P}_{j,r}\right\}\). Notice that a continuous attribute’s contribution to the HEOM distance is bounded above by 1.
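To make the imputation step concrete, the following minimal Python sketch implements the HEOM distance of Eqs. (1)–(2) and uses it for k = 1 nearest-neighbour imputation. The function names (heom_distance, impute_knn1) and the handling of missing entries during the distance computation are our own assumptions, not details of the original implementation.

```python
import numpy as np

def heom_distance(p_i, p_j, is_continuous, ranges):
    """HEOM distance of Eqs. (1)-(2): range-normalised absolute difference for
    continuous attributes, 0/1 overlap metric for categorical attributes."""
    d = 0.0
    for r in range(len(p_i)):
        a, b = p_i[r], p_j[r]
        if np.isnan(a) or np.isnan(b):
            d += 1.0                        # assumption: treat missing entries as maximally distant
        elif is_continuous[r]:
            d += abs(a - b) / ranges[r] if ranges[r] > 0 else 0.0
        else:
            d += 0.0 if a == b else 1.0     # delta_{i,j} of Eq. (2)
    return d

def impute_knn1(X, is_continuous):
    """Fill missing values with those of the single nearest neighbour (k = 1)
    under the HEOM distance, as described for the HCC dataset."""
    X = X.astype(float).copy()
    ranges = np.nanmax(X, axis=0) - np.nanmin(X, axis=0)
    for i in range(X.shape[0]):
        missing = np.isnan(X[i])
        if not missing.any():
            continue
        dists = [heom_distance(X[i], X[j], is_continuous, ranges) if j != i else np.inf
                 for j in range(X.shape[0])]
        donor = int(np.argmin(dists))       # nearest patient; may itself lack a value (simplification)
        X[i, missing] = X[donor, missing]
    return X
```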

The HCC dataset contains the clinical attributes of 63 deceased patients with target value encoded as 0 and of 102 surviving patients with target value encoded as 1. The numbers of dead and alive cases therefore show a certain degree of class imbalance. The Synthetic Minority Over-sampling Technique (SMOTE) proposed by Nitesh et al. (2002) is applied to remove this class imbalance. SMOTE is an oversampling approach used to obtain synthetic samples of the minority class and has been applied in a variety of areas (Sharma 2019; Fallahi and Jafari 2011; Liu et al. 2006; MacIsaac et al. 2006). With k = 3 as the nearest-neighbour value, SMOTE generates a balanced set of 204 samples, with 102 instances for each target value, as sketched below.
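A minimal sketch of this balancing step using the SMOTE implementation from the imbalanced-learn package is shown below; X and y are assumed to be the imputed feature matrix and survival labels, and the random_state is an illustrative choice.

```python
from collections import Counter
from imblearn.over_sampling import SMOTE

# X: 165 x 49 imputed feature matrix, y: survival labels (0 = did not survive, 1 = survived)
smote = SMOTE(k_neighbors=3, random_state=42)   # k = 3 nearest neighbours, as stated above
X_res, y_res = smote.fit_resample(X, y)

# Before: {1: 102, 0: 63}; after oversampling both classes hold 102 samples (204 in total)
print(Counter(y), "->", Counter(y_res))
```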

2.3 Feature correlation and feature importance

The HCC dataset used in the proposed machine learning model has 49 attributes and one target class. Certain features may be strongly correlated with other features, so it is worthwhile to eliminate one feature of each highly correlated pair. Figure 3 shows the correlation amongst features and the correlation of the features with the target class. Features shown in dark colours have strong positive correlation, while features in light colours have negative correlation. One feature of a highly correlated pair can be eliminated because both have the same effect on the target class; a sketch of this screening step is shown below.

Fig. 3
figure 3

Heatmap showing the correlation of feature variables and target class
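The correlation screening of Fig. 3 can be reproduced with pandas and seaborn as in the sketch below; df is assumed to be the pre-processed dataframe and the 0.85 cut-off is an illustrative threshold, not a value taken from this study.

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

corr = df.corr()                       # pairwise correlations of features and target class
sns.heatmap(corr, cmap="coolwarm")     # heatmap as in Fig. 3
plt.show()

# Drop one feature from every highly correlated pair (threshold is illustrative)
threshold = 0.85
upper = corr.abs().where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
df_reduced = df.drop(columns=to_drop)
```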

LASSO Regression, Ridge Regression, Genetic Algorithm optimization and Random Forest are used for feature evaluation and feature elimination; a minimal sketch of the two penalized-regression evaluators is given below.
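As an illustration, the sketch below ranks features by their LASSO (L-1) and Ridge (L-2) coefficients using scikit-learn; the regularisation strengths shown here are illustrative only, the values actually used are those listed in Table 3.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(X_res)     # standardise so coefficients are comparable

# L-1 penalty: many coefficients shrink exactly to zero, giving implicit feature elimination
lasso = Lasso(alpha=0.01).fit(X_std, y_res)
l1_selected = np.flatnonzero(lasso.coef_ != 0)

# L-2 penalty: coefficients shrink but remain non-zero, so features are ranked by |weight|
ridge = Ridge(alpha=1.0).fit(X_std, y_res)
l2_ranking = np.argsort(-np.abs(ridge.coef_))

print("LASSO keeps", len(l1_selected), "features; Ridge top feature index:", l2_ranking[0])
```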

2.4 Model implementation using random forest and gradient boosting hybrid approach

In this paper, a Random Forest and Gradient Boosting hybrid approach is proposed for survival prediction of HCC. A weight is assigned to each individual feature using the proposed feature evaluation technique, and significant features with high weight values are selected for prediction. Initially, the dataset has 49 features for survival prediction of HCC.

2.4.1 Random forest approach

Random Forest (RF) constructs multiple trees (Breiman 2001). While constructing a tree, RF explores a random subset of input variables at each node split, and the tree grows fully without pruning. Owing to the random selection of variables at each node, the correlation among the trees in the forest decreases and hence the forest error rate decreases (Hideko and Hiroaki 2012). Tree growth in RF can be described as follows:

  • At node N, randomly sample R variables from the Q independent variables.

  • For every sampled variable (D = 1, 2, 3, ..., R), estimate the best split AD amongst all the possible splits for the Dth variable.

  • Select the optimum split AO among the best splits AD, D = 1, 2, 3, ..., R.

  • The variable J corresponding to AO, at its identified cut point CAO, is used to divide node N.

  • Split the data at this node by sending the observations P = 1, 2, 3, ..., T with YTJ < CAO to the left descendant and all observations with YTJ > CAO to the right descendant.

  • Repeat the steps until the tree is fully grown.

The Gini importance approach is implemented to select the split with the lowest impurity at each node. For each node N in a decision tree, the split is evaluated by the decrease in Gini impurity \(\Delta GI\left(N\right)\), which is given as

$$\Delta GI\left(N\right)={S}_{T}\Delta I\left(N\right)-{S}_{L}\Delta I\left({N}_{L}\right)-{S}_{R}\Delta I\left({N}_{R}\right)$$
(3)

where \(\Delta I\left(N\right)\) is known as Gini Index and can be given as

$$\Delta I\left(N\right)=1-\sum_{k}r\left(k|N\right)^{2}$$
(4)

where, \(r(k|N)\) is the rate at which target class \(k\) is discriminated correctly at node N;

\(\Delta I\left({N}_{L}\right)\) and \(\Delta I\left({N}_{R}\right)\) are the Gini Index on the left side and right side of the node respectively; \({S}_{T}\) is the number of samples before split; \({S}_{L}\) and \({S}_{R}\) are the number of samples on left and right side of node after split.

The Gini importance is obtained by averaging the decrease in Gini impurity over all nodes and trees. Simulation parameters for the RF approach are given in Table 3.

Table 3 Simulation parameters for regression and optimisation models

Figure 4 shows the relative feature importance of all the input attributes. It can be observed that 19 features have a significant impact on result prediction. The significant features are: ‘Age at diagnosis’, ‘Performance Status’, ‘Alpha-Fetoprotein’, ‘Haemoglobin’, ‘Mean Corpuscular Volume’, ‘Leukocytes’, ‘Platelets’, ‘Albumin’, ‘Total Bilirubin’, ‘Aspartate transaminase’, ‘Gamma glutamyl transferase’, ‘Alkaline phosphatase’, ‘Total Proteins’, ‘Creatinine’, ‘Major dimension of nodule’, ‘Direct Bilirubin’, ‘Iron’, ‘Oxygen Saturation’, and ‘Ferritin’.

Fig. 4
figure 4

Relative feature importance score using Random Forest feature selector

The proposed Random Forest approach identifies ‘Alpha-Fetoprotein’, ‘Haemoglobin’, ‘Ferritin’ and ‘Alkaline phosphatase’ as the most significant factors; a sketch of this importance ranking is shown below.
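A sketch of the Gini-importance ranking with scikit-learn is given below; the number of trees and the random_state are illustrative assumptions, feature_names is assumed to hold the 49 column labels, and the 19-feature cut-off follows the text.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=42)   # illustrative parameters
rf.fit(X_res, y_res)

# Mean decrease in Gini impurity (Eqs. 3-4), averaged over all nodes and trees
importances = pd.Series(rf.feature_importances_, index=feature_names).sort_values(ascending=False)
top19 = importances.head(19)          # the 19 significant features reported in Fig. 4
print(top19)
```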

Figure 5 shows the heatmap of the features selected by the proposed Random Forest feature selector. Simulation results show that higher accuracy can be achieved with the selected features.

Fig. 5
figure 5

Heatmap of variables selected by Random Forest feature selector

2.4.2 Gradient boosting approach

Gradient Boosting (GB) is a non-parametric algorithm proposed by Friedman (2001). The objective of the GB algorithm is to sequentially build each decision tree model along the gradient descent direction of a loss function. Each additional base model is intended to correct the errors made by its preceding base models. The loss function reflects the accuracy of the models: the greater the loss, the worse the prediction accuracy of the model. Prediction accuracy increases if the loss function decreases with each addition of a new base model, which is achieved by letting the value of the loss function decrease in the direction of its gradient. The pseudo code for GB is as follows:

figure a

In step i), the base model F0(x) is initialized by GB. In step ii), for k boosting stages it trains k models in a loop. Increasing the number of boosting stages k diminishes the error on the training set, but very high values of k lead to overfitting. Using the loss function, for every trained (imperfect) model k, the value of the negative gradient \({\mathbb{Z}}_{i}\) is calculated from the already trained k-1 models. Step iii) constructs a new prediction model \(h(x;\Phi )\) and obtains its parameters \(\Phi\) by fitting it to \({\mathbb{Z}}_{i}\); the mean-square method is used to achieve the minimum value in the gradient direction. Step iv) estimates the gradient descent step size of the new model using the loss function. Step v) updates the model using the prediction model \(h(x;\Phi )\). The proposed GB model uses deviance as the loss function and 100 boosting stages.
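A minimal scikit-learn sketch of this boosting procedure, using the deviance loss and 100 boosting stages stated above, is given below; the 80/20 split and random_state are illustrative choices (in recent scikit-learn releases the deviance loss is named "log_loss").

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_res, y_res, test_size=0.2, random_state=42)

gb = GradientBoostingClassifier(
    loss="deviance",            # logistic (cross-entropy) loss, as stated in the text
    n_estimators=100,           # k = 100 boosting stages
    learning_rate=0.1,
    criterion="friedman_mse",   # split quality criterion at each stage
    validation_fraction=0.1,
)
gb.fit(X_train, y_train)
print("GBEL test accuracy:", gb.score(X_test, y_test))
```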

3 Performance evaluators

For the proposed model, prediction performance is measured in terms of Accuracy (%), Recall (%), F1 Score, Log-Loss Score and Jaccard Score. The performance evaluators can be defined as:

\(Accuracy= \frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}\); it is the fraction of correctly predicted test events out of the total number of test events.

\(Recall = Sensitivity = True\, Positive\, Rate\, (TPR) = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}\); it signifies the fraction (percentage) of correct positive predictions out of the total number of positives. A value of 1 (100%) indicates the best sensitivity and 0 (0%) the worst.

where True Positive (TP) signifies a correct positive prediction; False Positive (FP) an incorrect positive prediction; False Negative (FN) an incorrect negative prediction; and True Negative (TN) a correct negative prediction.

3.1 F1 Score

The F1 score is the harmonic mean of precision (PPV) and recall (TPR). A model with an F1 score of 1 attains its best value and 0 its worst value. Mathematically, the F1 score is given as:

$$\text{F1 Score} = \frac{2 \times \text{Precision (PPV)} \times \text{Recall (TPR)}}{\text{Precision (PPV)} + \text{Recall (TPR)}}$$

3.2 Log-loss score

This score is defined on probability estimates and is also known as the cross-entropy loss. Instead of evaluating discrete predictions, this score evaluates probability outputs. Mathematically, the log loss for a binary classifier can be defined as:

$$\mathrm{Log\ Loss}=-\sum_{j=1}^{M}\left[{y}_{j}\mathrm{log}\left({p}_{j}\right)+\left(1-{y}_{j}\right)\mathrm{log}\left(1-{p}_{j}\right)\right]$$
(9)

where \({\mathrm{p}}_{\mathrm{j}}\) is the probability that the jth data point belongs to class "1" as forecast by the classifier, and \({\mathrm{y}}_{\mathrm{j}}\) is the actual class, which can be either "0" or "1". The log loss evaluates the uncertainty of the model's predicted probabilities by comparing them with the true labels. The accuracy of the classifier can be maximized by minimizing the log loss score: a good model has a small log loss, and the lower the log loss the better the prediction.

3.3 Jaccard score

Jaccard score calculates the average value of Jaccard Similarity Coefficients (JSC) amongst pairs of label sets.

Mathematically, JSC can be calculated as

$$\mathrm{J}\left({\mathrm{T}}_{\mathrm{i}},{\mathrm{P}}_{\mathrm{i}}\right)=\frac{\left|{\mathrm{T}}_{\mathrm{i}}\cap {\mathrm{P}}_{\mathrm{i}}\right|}{\left|{\mathrm{T}}_{\mathrm{i}}\bigcup {\mathrm{P}}_{\mathrm{i}}\right|}= \frac{\left|{\mathrm{T}}_{\mathrm{i}}\cap {\mathrm{P}}_{\mathrm{i}}\right|}{\left|{\mathrm{T}}_{\mathrm{i}}\right|+\left|{\mathrm{P}}_{\mathrm{i}}\right|-\left|{\mathrm{T}}_{\mathrm{i}}\cap {\mathrm{P}}_{\mathrm{i}}\right|}$$
(10)

where, \({\mathrm{T}}_{\mathrm{i}}\) is the actual truth label set and \({\mathrm{P}}_{\mathrm{i}}\) is the predicted label set. A good model should have a high Jaccard Score.
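The sketch below shows how these evaluators can be computed with scikit-learn for any of the fitted classifiers; y_test, X_test and clf are assumed from the experimental setup described in the next section, and this is an illustration of the metric definitions rather than the original evaluation script.

```python
from sklearn.metrics import (accuracy_score, recall_score, f1_score,
                             log_loss, jaccard_score, roc_auc_score)

y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]   # probability of class "1", used for log loss and AUROC

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
print("Log loss :", log_loss(y_test, y_prob))
print("Jaccard  :", jaccard_score(y_test, y_pred))
print("AUROC    :", roc_auc_score(y_test, y_prob))
```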

4 Performance evaluation of models

Several experiments have been conducted using distinct penalization and optimization techniques with the proposed machine learning algorithms for prediction of HCC survival. To evaluate accuracy and the other prediction parameters, the dataset is divided randomly into a training dataset (80%) used to build the model and a test dataset (20%) used to test it. The L-1 Penalized Nu-SVC Model, L-1 Penalized GBEL Model, L-1 Penalized RidgeCV Model, L-2 Penalized Nu-SVC Model, L-2 Penalized GBEL Model, L-2 Penalized RidgeCV Model, GA Optimized Nu-SVC Model, GA Optimized GBEL Model, GA Optimized RidgeCV Model, RF-Nu-SVC Model, RFGBEL Model and RF-RidgeCV Model are tested. The experiments were simulated using Python 3.8 on an IBM PC with an Intel Core i7-6700 CPU @ 3.40 GHz and 8 GB RAM. Table 3 shows the simulation parameters for the L-1 Penalized, L-2 Penalized, Genetic Algorithm Optimized and RF models; the simulation parameters have their usual meanings. The performance of each classifier is measured in terms of Accuracy (%), Recall (%), Precision (%), F1 Score, Log-Loss Score and Jaccard Score.

4.1 Nu-support vector classification (Nu-SVC) hybrid model

Nu-SVC is analogous to SVC except that in Nu-SVC the number of support vectors can be controlled. The parameter Nu denotes the fraction of samples that act as support vectors but lie on the wrong side of the hyperplane: it is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors. A Nu-SVC model with simulation parameters Nu = 0.5, class weight = 1, degree of the polynomial kernel = 3, kernel = RBF, hard limit on iterations = -1 and tolerance for stopping criteria = 0.001 is used for prediction of HCC; a sketch of this configuration is given below. Table 4 presents the performance analysis of the Nu-SVC classifier with the different penalization and optimization techniques. The L-2 penalized and GA optimized Nu-SVC models attain the lowest Log-Loss score of 15.16 and the highest accuracy of 56.09%. Except for the GA optimized model, all models obtain an F-1 score of 0.63. Nu-SVC, L-1 penalized Nu-SVC and RF-Nu-SVC achieve a 100% recall/sensitivity value, which signifies that these models are more efficient in making correct positive predictions out of the total number of positive cases.
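A sketch of the Nu-SVC configuration quoted above, using scikit-learn's NuSVC, is given below; probability=True is our own addition so that the log-loss score can be computed, and the class-weight setting is left at its default.

```python
from sklearn.svm import NuSVC

nu_svc = NuSVC(
    nu=0.5,            # upper bound on training errors / lower bound on support-vector fraction
    kernel="rbf",
    degree=3,          # degree of the polynomial kernel (unused with RBF, kept for completeness)
    tol=1e-3,          # tolerance for the stopping criterion
    max_iter=-1,       # no hard limit on iterations
    probability=True,  # assumption: enables predict_proba for the log-loss evaluation
)
nu_svc.fit(X_train, y_train)
print("Nu-SVC test accuracy:", nu_svc.score(X_test, y_test))
```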

Table 4 Prediction performance measure of Nu-SVC models

Nu-SVC, the L-1 Penalized Nu-SVC Model and the RF-Nu-SVC Model give the same accuracy, recall, F-1 score, Log-Loss score and Jaccard score, i.e. the same outcome for all performance metrics. The GA optimized Nu-SVC demonstrates better performance than the other Nu-SVC models in terms of accuracy, F-1 score, Log-Loss score and Jaccard score. The high Log-Loss values acquired by the Nu-SVC models indicate their poor performance in terms of accuracy.

4.2 RidgeCV hybrid model

RidgeCV classifiers are Ridge regression classifiers that perform built-in cross-validation; the proposed model uses tenfold cross-validation. A RidgeCV model with simulation parameters alpha array = [0.1, 1, 10], cv = 10, class weight = 1 and maximum number of iterations = 1000 is used for result prediction; a sketch of this configuration is given below. Table 5 shows the predicted results for the hybrid RidgeCV models. The L-1 and L-2 penalized RidgeCV models give the same result in terms of accuracy (63.41%) and log loss (12.63), and the RidgeCV and RF-RidgeCV models give the same accuracy (65.85%) and Log-Loss (11.79). The same F-1 score (0.63) is obtained by the L-1 Penalized RidgeCV Model and the RF-RidgeCV Model. The GA optimized RidgeCV model shows superior performance compared with the other RidgeCV hybrid models in terms of accuracy, recall, F-1 score, Log-Loss score and Jaccard score.
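One reasonable reading of this model is scikit-learn's RidgeClassifierCV, sketched below with the alpha grid and tenfold cross-validation quoted above; the exact estimator and the remaining settings used in the study are an assumption on our part.

```python
from sklearn.linear_model import RidgeClassifierCV

ridge_cv = RidgeClassifierCV(
    alphas=[0.1, 1.0, 10.0],   # alpha array quoted in the text
    cv=10,                     # tenfold cross-validation
)
ridge_cv.fit(X_train, y_train)
print("Selected alpha:", ridge_cv.alpha_)
print("RidgeCV test accuracy:", ridge_cv.score(X_test, y_test))
```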

Table 5 Prediction performance measure of RidgeCV models

It predicts the result with the highest accuracy of 68.29%, recall/sensitivity of 73.68%, Jaccard score of 0.51 and F-1 score of 0.68, with the minimum Log-Loss score of 10.95 amongst all RidgeCV hybrid models. All the proposed RidgeCV models show improved performance and lower Log-Loss scores compared with the Nu-SVC models, indicating their better performance.

4.3 Gradient boosting ensemble learning (GBEL) hybrid model

A Gradient Boosting Ensemble Learning model with simulation parameters Cost-Complexity Pruning = 0, evaluation criterion = Friedman Mean Square Error, learning rate = 0.1, loss function = deviance, validation fraction = 0.1 and boosting stages = 100 is used for HCC survival prediction; a sketch of the hybrid RF-plus-GB (RFGBEL) pipeline is given below.
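Combining the two stages, a minimal sketch of the RF-plus-GB hybrid (RFGBEL) as we read it from the text is shown below: Random Forest importances select the significant features, which are then classified by the Gradient Boosting model. The SelectFromModel wrapper, the 19-feature cap and the random_state are illustrative choices, not details taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline

rfgbel = Pipeline([
    # Stage 1: Random Forest feature selector keeping the highest-importance features
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=42),
                               threshold=-np.inf, max_features=19)),
    # Stage 2: Gradient Boosting classifier with the parameters listed above
    ("gbel", GradientBoostingClassifier(loss="deviance", n_estimators=100, learning_rate=0.1,
                                        criterion="friedman_mse", validation_fraction=0.1)),
])
rfgbel.fit(X_train, y_train)
print("RFGBEL test accuracy:", rfgbel.score(X_test, y_test))
```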

The results presented in Table 6 show that the RFGBEL model gives excellent results for all performance metrics. The RFGBEL model predicts the HCC outcome with an accuracy of 93.92%, a significant improvement of 14.83–23.19% in accuracy over the other Gradient Boosting models. The RFGBEL model also attains the minimum cross-entropy loss score of 5.89, compared with the highest cross-entropy loss score of 10.10 obtained by the GBEL model, an improvement of 2.53–4.21 over the other proposed GBEL models. The RFGBEL model further shows excellent results in terms of recall, F-1 score and Jaccard score: it achieves 94.73% recall, an F-1 score of 0.93 and a Jaccard score of 0.72, improvements of 21.05% in recall, 0.19 in Jaccard score and 0.16 in F-1 score compared with the other GBEL models.

Table 6 Prediction performance measure of GBEL models

5 Discussion

The GBEL models (Table 6), particularly the RFGBEL model, demonstrate excellent performance for HCC prediction compared with the Nu-SVC models (Table 4) and RidgeCV models (Table 5). The Nu-SVC hybrid models show unfavourable performance in terms of accuracy, F-1 score, Log-Loss score and Jaccard score: compared with the RidgeCV and GBEL models they yield average accuracy, F-1 score, Log-Loss score and Jaccard score of 50.24%, 0.63, 17.18 and 0.47 respectively. The RidgeCV hybrid models achieve improved performance, predicting HCC with an average accuracy, F-1 score, Log-Loss score and Jaccard score of 65.36%, 0.64, 11.95 and 0.47 respectively, i.e. an average improvement of 15.12% in accuracy and a decrease of 5.23 in Log-Loss score relative to the Nu-SVC models, with an identical average Jaccard score. The GBEL models accomplish excellent results for HCC prediction in terms of accuracy, recall, F-1 score, Log-Loss score and Jaccard score, and the RFGBEL model outperforms all others on every performance measure. It classifies the survival and non-survival HCC samples with an accuracy of 93.92% and a sensitivity of 94.73%, and achieves an F-1 score of 0.93, a Log-Loss score of 5.89 and a Jaccard score of 0.72. Relative to the average scores of the Nu-SVC and RidgeCV models, the RFGBEL model shows a significant improvement of 28.56–43.63% in accuracy, 0.29–0.30 in F-1 score, a decrease of 6.06–16.25 in Log-Loss/cross-entropy loss and an improvement of 0.25 in Jaccard score.

The prediction performance of the RFGBEL model is also tested in terms of AUROC. Figure 6 shows the AUC for the Gradient Boosting Model, L-1 Penalized GBEL model, L-2 Penalized GBEL model, GA optimized GBEL model and RFGBEL model. The AUROC curve is a plot of TPR against False Positive Rate (FPR), i.e. sensitivity against 1 - specificity. The area under the curve is computed by aggregating the area under the ROC curve; the larger the area, the more accurate the prediction (Bowers and Zhou 2019). The RFGBEL model attains an AUC of 0.93, the highest value among the proposed methods, which validates the accuracy of its predictions. The high sensitivity of the RFGBEL model indicates its ability to correctly predict the positive cases; a sketch of this ROC computation follows Fig. 6.

Fig. 6
figure 6

Receiver operating curves (ROC) for a GBEL model, b L-2 penalized GBEL model, c RFGBEL model, d GA optimized GBEL model, e L-1 penalized GBEL model
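The AUROC computation and ROC plot for the RFGBEL model can be sketched as below, reusing the names from the previous sketches.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

y_prob = rfgbel.predict_proba(X_test)[:, 1]   # predicted probability of survival (class "1")
fpr, tpr, _ = roc_curve(y_test, y_prob)
auc = roc_auc_score(y_test, y_prob)

plt.plot(fpr, tpr, label=f"RFGBEL (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", color="grey")   # chance line
plt.xlabel("False Positive Rate (1 - specificity)")
plt.ylabel("True Positive Rate (sensitivity)")
plt.legend()
plt.show()
```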

The comparison of the proposed method with existing methods is presented in Table 7. Our proposed RFGBEL model predicts the result with an accuracy of 93.92%, an F-1 score of 0.93 and an AUROC of 0.93, and identifies Alpha-Fetoprotein as the most significant factor. It can be clearly seen from Table 7 that the proposed RFGBEL method obtains much better performance than the other methods. Using different machine learning algorithms, Tuncer and Ertam (2019) and Książek et al. (2019) proposed HCC prediction models in 2019: an accuracy of 92.12%, recall of 91.20% and F-1 score of 0.91 was obtained by Tuncer and Ertam (2019), and an accuracy of 88.49% and F-1 score of 0.87 by Książek et al. (2019). Chen et al. (2020) proposed HCC classification using three well-known classifiers: Support Vector Machine (SVM), Logistic Regression (LR) and Decision Tree (DT). Missing values were handled by replacing them with median values. They obtained the best prediction results with the LR model, with an accuracy of 73%, recall of 75% and F-1 score of 0.85.

Table 7 Comparison of proposed method with existing methods

Santos et al. (2015) proposed an ANN with a cluster-based oversampling method for improving HCC survival prediction, achieving an accuracy of 75.20%, F-1 score of 0.66 and AUROC of 0.70. Their methodology was complex and time consuming, as eleven different configurations (5–55 neurons in the hidden layer, in steps of 5) with 30 runs each were needed to obtain the optimal parameters. Elgin Christo et al. (2020) proposed a co-operative coevolution approach for selection of relevant features and attributes, with a Random Forest classifier for HCC prediction; they obtained an accuracy of 72.20%, recall of 70% and F-1 score of 0.73. Rajesh et al. (2020) evaluated five different machine learning algorithms for HCC classification and achieved the best results with the Random Forest classifier, with an accuracy of 80.64%, recall of 85% and F-1 score of 0.82. The HCC dataset used by Tuncer and Ertam (2019), Książek et al. (2019), Chen et al. (2020), Elgin Christo et al. (2020) and Rajesh et al. (2020) contains feature values of 102 alive patients and 63 dead patients; the dataset exhibits some degree of imbalance in the size of the patient profiles, and classification on imbalanced data leads to results biased in favour of the majority class (He and Garcia 2009; Daskalaki et al. 2006; Blagus and Lusa 2010; Hulse et al. 2007). HCC classification using a Deep Neural Network (DNN) was proposed by Kayal et al. (2019): a DNN with four hidden layers of 1024, 512, 256 and 128 neurons respectively obtained an accuracy of 78%, recall of 81.25% and F-1 score of 0.80. However, their model has the disadvantage that the number of neurons in each layer was selected by a trial-and-error approach.

6 Conclusion

Identifying significant risk features/factors for HCC survival is of great importance in clinical medicine for screening and targeting patients. Their correct identification will help in the development of better models that predict the outcome with higher accuracy. Machine learning algorithms are capable of predicting performance parameters from static/dynamic data, and the values of these performance parameters can be maximized using analytic and probabilistic models. In this study, an HCC survival prediction model was developed using geographical factors, risk factors and clinical trial attributes. The suitable selection of input attributes/features plays a significant role in the success of a prediction model: the proposed model identifies 19 significant clinical features out of 49 clinical attributes using the Random Forest approach. The proposed RFGBEL model demonstrates significant improvements in accuracy, recall, F-1 score, Log-Loss/cross-entropy loss and Jaccard score compared with the Nu-SVC and RidgeCV models, and obtains a higher area under the curve (AUROC) than the other existing models. The study was compared with existing methods, and the comparison suggests that the RFGBEL model can be used as a tool for predicting HCC survival from clinical attributes. It is also suggested that data mining techniques can be used as a supplementary tool for prognostic evaluation and clinical decision making.