Introduction

In a patient with a liver tumor, a pretreatment estimation of the remnant liver function is essential for reducing the morbidity and mortality associated with treatment [1]. The asialoglycoprotein receptor (ASGPR) is expressed on the sinusoidal surfaces of hepatocytes and contributes to the clearance of circulating glycoproteins containing terminal galactose residues [2]. Technetium-99m-galactosyl serum albumin (99mTc-GSA), an asialoglycoprotein analog, is taken up exclusively by the liver and binds specifically to ASGPR on hepatocytes [3, 4]. Decreased ASGPR expression has been reported in patients with liver damage [5]. Thus, 99mTc-GSA scintigraphy enables the direct estimation of functioning hepatocytes and is very useful for evaluations of liver function [6].

The receptor index (LHL15) and clearance index (HH15), which are calculated from liver and heart time-activity data, are the most commonly used parameters in liver function assessments [7,8,9]. However, these parameters are obtained on planar images and do not provide sufficient anatomical information for an accurate liver function assessment because they lack three-dimensional (i.e., depth) information.

Single photon emission computed tomography (SPECT)/computed tomography (CT) is a relatively new imaging modality that enables the near-simultaneous co-registration of functional and anatomical information acquired during the same imaging session while the patient remains in the same bed position [10]. The main benefit of SPECT/CT is the improved localization of radioactivity. The standardized uptake value (SUV), a quantitative parameter commonly used in positron emission tomography, has recently been applied to bone SPECT/CT evaluations [11,12,13], and one report has investigated the SUV during 99mTc-GSA SPECT/CT evaluations of liver fibrosis [14]. Although reports have described various quantitative liver function analyses based on 99mTc-GSA scintigraphy, it remains unclear which methods are most useful in this context [7]. Therefore, a new and reliable method for evaluating or predicting liver function is required.

Machine learning is a field of computer science based on pattern recognition and computational learning that can identify patterns and relationships in complex multidimensional databases [15]. It relies on computer algorithms to learn and identify complex interactions among all variables by minimizing the error between predicted and observed outcomes. Compared with conventional statistical methods, which rely on predetermined models, machine learning can detect interactions among variables in greater depth and iteratively learn from data to update algorithms [16].

Recently, some studies have introduced classification methods based on machine learning into nuclear medicine research [17,18,19]. However, to our knowledge, no study has previously investigated the efficacy of a quantitative 99mTc-GSA SPECT/CT analysis using a machine learning approach for assessing liver function. Therefore, this study aimed to assess the utility of a machine learning approach based on 99mTc-GSA SPECT/CT for predicting liver function.

Materials and methods

Patients

Our institutional review board approved this retrospective study and waived the requirement for written informed consent. From October 2016 to December 2019, 166 consecutive patients with liver tumors underwent pretreatment 99mTc-GSA SPECT/CT to evaluate the liver function. The following inclusion criteria were applied: (1) measurement of three blood biochemical indices (albumin, total bilirubin and prothrombin time activity) within 1 week before or after 99mTc-GSA SPECT/CT and (2) performance of an indocyanine green clearance test (ICGR15: ICG retention rate at 15 minutes) within 1 week before or after 99mTc-GSA SPECT/CT.

Imaging protocols

Thoraco-abdominal planar and SPECT/CT images were acquired using a dual-head gamma camera with low-energy, high-resolution collimators and a multidetector (16-row) spiral CT scanner (Siemens Intevo SPECT/CT system; Siemens Medical Solutions USA, Inc., Hoffman Estates, IL, USA). Immediately after the intravenous injection of 185 MBq of 99mTc-GSA (Nihon Medi-Physics Co., Ltd., Tokyo, Japan), an anterior planar image was dynamically acquired in 30-second frames for 30 minutes. Data were digitally collected using a 128 × 128 matrix on a dedicated computer system (E-soft, Siemens Healthcare GmbH, Erlangen, Germany). SPECT (45 steps at 15 s/step, 360°, 128 × 128 matrix) was initiated 30 minutes after 99mTc-GSA injection. Subsequently, CT images were acquired using a tube voltage of 130 kV and a dose-modulation algorithm with a quality reference mAs setting of 15 (CAREDose 4D; Siemens Medical Solutions USA, Inc.). SPECT data were reconstructed using Siemens xSPECT Quant with an Ordered Subset Conjugate-Gradient Minimizer (OSCGM) algorithm and the following settings: one subset, 60 iterations and a 10-mm Gaussian filter.

Image analysis

The planar, SPECT and SPECT/CT images were displayed on a workstation (Syngo.via; Siemens Healthcare GmbH, Erlangen, Germany). The anterior planar images were analyzed quantitatively by an experienced nuclear medicine radiologist who was aware of the study purpose but was blinded to the patients’ clinical information. The liver and heart time-activity curves were generated using regions of interest (ROIs) that covered the whole liver and heart, respectively. The LHL15 and HH15 were determined according to a previously reported method as follows: LHL15 = liver ROI count at 15 min/(liver ROI count + heart ROI count at 15 min); HH15 = heart ROI count at 15 min/heart ROI count at 3 min [8, 20].
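
The two planar indices defined above can be illustrated with a minimal Python sketch. The function names and the ROI counts are hypothetical and chosen only to show the arithmetic; they are not taken from the study data or from the workstation software.

```python
def lhl15(liver_15min: float, heart_15min: float) -> float:
    """Receptor index: liver counts / (liver + heart counts), both at 15 min."""
    total = liver_15min + heart_15min
    if total <= 0:
        raise ValueError("ROI counts must sum to a positive value")
    return liver_15min / total


def hh15(heart_15min: float, heart_3min: float) -> float:
    """Clearance index: heart counts at 15 min / heart counts at 3 min."""
    if heart_3min <= 0:
        raise ValueError("heart ROI count at 3 min must be positive")
    return heart_15min / heart_3min


# Hypothetical ROI counts, for illustration only (not patient data).
liver_15, heart_15, heart_3 = 9000.0, 1000.0, 2000.0
print(f"LHL15 = {lhl15(liver_15, heart_15):.2f}")  # 0.90
print(f"HH15  = {hh15(heart_15, heart_3):.2f}")    # 0.50
```

A higher LHL15 reflects greater hepatic accumulation, whereas a higher HH15 reflects slower clearance of the tracer from the blood pool.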

The following quantitative SPECT image analyses were performed by another nuclear medicine radiologist. First, a volume of interest (VOI) that contained the whole liver was placed manually and used to determine the liver SUVmax. Next, a threshold of 40% SUVmax was set to automatically delineate the VOI that met or exceeded this 40% threshold, and this VOI was used to calculate the liver SUVmean, metabolic liver volume (MLV) and total lesion 99mTc-GSA uptake (TL GSA). The TL GSA was calculated as the liver SUVmean multiplied by the MLV. The heart SUVmax was also recorded, using the CT images as a reference. The software (Syngo.via; Siemens Healthcare GmbH, Erlangen, Germany) automatically calculated the liver SUVmax, liver SUVmean, heart SUVmax, MLV and TL GSA. The SUVmax ratio (liver SUVmax/heart SUVmax) was calculated manually.
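
The threshold-based SUV analysis described above was performed automatically by the vendor software. The following Python sketch only mirrors the stated definitions (40% of SUVmax as the threshold, TL GSA = SUVmean × MLV) on a flat list of voxel values. The function name, the voxel values, the voxel volume and the heart SUVmax are all hypothetical.

```python
def liver_suv_metrics(voxel_suvs, voxel_volume_ml, threshold_fraction=0.40):
    """Compute SUVmax, SUVmean, MLV and TL GSA from the voxels of a liver VOI.

    voxel_suvs: SUV of every voxel inside the manually placed liver VOI.
    voxel_volume_ml: volume of a single voxel in millilitres (assumed uniform).
    """
    if not voxel_suvs:
        raise ValueError("the VOI contains no voxels")
    suv_max = max(voxel_suvs)
    cutoff = threshold_fraction * suv_max
    # Keep only voxels that meet or exceed the 40% SUVmax threshold.
    kept = [v for v in voxel_suvs if v >= cutoff]
    suv_mean = sum(kept) / len(kept)
    mlv = len(kept) * voxel_volume_ml   # metabolic liver volume
    tl_gsa = suv_mean * mlv             # total lesion 99mTc-GSA uptake
    return {"SUVmax": suv_max, "SUVmean": suv_mean, "MLV": mlv, "TL_GSA": tl_gsa}


# Hypothetical voxel values, for illustration only.
metrics = liver_suv_metrics([10.0, 8.0, 6.0, 5.0, 3.0, 1.0], voxel_volume_ml=2.0)
heart_suv_max = 2.5                     # hypothetical heart SUVmax
suv_max_ratio = metrics["SUVmax"] / heart_suv_max
print(metrics, suv_max_ratio)
```

In this toy example the cutoff is 4.0, so four of the six voxels are retained, giving an SUVmean of 7.25, an MLV of 8.0 ml and a TL GSA of 58.0.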

Assessment of liver function

This study used the liver function criteria corresponding to liver damage types A, B and C, as proposed by the Liver Cancer Study Group of Japan (Table 1) [21]. The patients were subsequently classified into low liver damage (grade A) and high liver damage (grade B and C) groups.

Table 1 Liver damage criteria presented by the Liver Cancer Study Group of Japan

Machine-learning approach

Four clinical (age, sex, background liver disease and tumor histological type) and 8 quantitative 99mTc-GSA (LHL15, HH15, liver SUVmax, liver SUVmean, MLV, TL GSA, heart SUVmax and SUVmax ratio) features were used to predict high liver damage employing machine-learning approaches.

The machine learning classifiers in this study included a support vector machine (SVM) and random forest (RF). The SVM used the L2 penalty, while the RF used 100 estimators and the entropy criterion. A receiver operating characteristic (ROC) analysis was performed to compare the predictive performances of the models, and the areas under the ROC curves (AUCs) were calculated.

To address the class imbalance in the data, stratified sampling was used [22, 23]. To minimize the negative influence of overfitting, 5-fold cross-validation and feature selection were performed in this study [24, 25]. The 5-fold cross-validation randomly split the dataset into 5 subsets. In each iteration, four subsets were used as the training group and the remaining subset was used as the testing group. This procedure was repeated 5 times so that each subset was used exactly once as the testing group.
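
The combination of stratified sampling and 5-fold cross-validation can be sketched as follows. The study itself used the Orange software package, so this standard-library Python version is only an illustration of the idea under our own assumptions: the function name, the round-robin assignment and the random seed are hypothetical, while the class sizes (28 high and 100 low liver damage cases) follow the cohort described in this study.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs that preserve class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled cases of each class round-robin into the k folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# 1 = high liver damage, 0 = low liver damage.
labels = [1] * 28 + [0] * 100
for fold_number, (train, test) in enumerate(stratified_k_fold(labels), start=1):
    n_high = sum(labels[i] for i in test)
    print(f"fold {fold_number}: {len(test)} test cases, {n_high} with high liver damage")
```

Each case appears in the testing group exactly once, and every fold contains 5 or 6 high liver damage cases, so no fold is left without the minority class.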

The Gini impurity criterion was applied to determine the relative importance of individual features. The Gini impurity is the probability of an unseen case being incorrectly classified for a given decision or rule. Features with high Gini impurity (or low Gini importance) split the data into impure categories, while features that decrease Gini impurity partition the data into purer classes with more members. Thus, features with large decreases in Gini impurity rank higher in importance for the model. The importance of a feature is defined as the decrease in the Gini impurity [26,27,28].
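
As a minimal illustration of this criterion, the Python sketch below computes the Gini impurity of a set of class labels and the decrease in impurity produced by a single threshold split on one feature. It is a simplified, single-split version: tree-based importance scores typically average such decreases over many splits and trees. The function names and the toy data are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(labels, feature_values, threshold):
    """Decrease in Gini impurity when splitting at feature <= threshold."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


# Toy data: 0 = low liver damage, 1 = high liver damage.
labels = [0, 0, 0, 1, 1, 1]
informative = [1, 2, 3, 7, 8, 9]      # separates the two classes perfectly
uninformative = [1, 7, 2, 8, 3, 9]    # barely separates the classes
print(gini_decrease(labels, informative, threshold=5))    # 0.5
print(gini_decrease(labels, uninformative, threshold=5))  # about 0.056
```

The informative feature removes all impurity (a decrease of 0.5 from a starting impurity of 0.5) and would therefore rank higher than the uninformative one.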

Both feature selection and principal component analysis (PCA) were conducted to compare the classification results with and without feature selection or dimensionality reduction. The following 4 models were therefore obtained for the machine learning approaches: first, neither feature selection nor PCA was performed; second, PCA was performed without feature selection; third, feature selection using the 5 best-ranked features was performed without PCA; and fourth, both feature selection using the 5 best-ranked features and PCA were performed.

These machine learning approaches were performed using Orange version 3.24.1 (Bioinformatics Lab at University of Ljubljana, Slovenia), an open-source software package for data mining and visualization [29].

Statistical analysis

The Mann–Whitney U-test or chi-squared test was used to compare quantitative variables or categorical data between the two groups, as appropriate. A ROC analysis was performed to examine the diagnostic performance of each parameter for the prediction of high liver damage. The DeLong method was used to analyze the statistical significance of differences between AUCs [30].
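
The AUC used in the ROC analysis has a simple rank-based interpretation that links it to the Mann–Whitney U statistic: it is the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The Python sketch below illustrates this relationship only; it is not the MedCalc implementation, and the function name and scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical index values; 1 = high liver damage, 0 = low liver damage.
scores = [0.55, 0.60, 0.70, 0.80, 0.65, 0.85, 0.90]
labels = [0, 0, 0, 0, 1, 1, 1]
print(round(roc_auc(scores, labels), 3))  # 0.833
```

For a parameter in which lower values indicate high liver damage, the scores can be negated before the calculation so that the same function applies.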

Data are presented as medians and interquartile ranges (IQRs). All p values are two-sided, and a p value < 0.05 is considered to indicate a statistically significant difference. The statistical analysis was performed using MedCalc Statistical Software (MedCalc Software, Mariakerke, Belgium).

Results

Patient characteristics

Of the 166 initially identified patients, 38 were excluded from the analysis because they had not undergone an ICG clearance test. Finally, 128 patients (97 males, 31 females; mean [± SD] age, 71 ± 8 years; range, 39–90 years) were eligible for the analyses.

Seventy-six patients underwent liver resection, and the liver tumors received the following pathological diagnoses: hepatocellular carcinoma (HCC), 66 patients; cholangiocellular carcinoma, four patients and others, six patients. The remaining 52 patients underwent transcatheter arterial chemoembolization (TACE) for HCC.

The extent of liver damage in each patient was determined during the pre-treatment period. One hundred patients presented with low liver damage, while 28 presented with high liver damage (grade B, n = 26 and grade C, n = 2). Patients with hepatitis B or hepatitis C were significantly less likely to have high liver damage than those with non-B/non-C disease. No significant differences in age, sex or liver tumor histology were observed between patients with low and high liver damage. The patients’ characteristics are summarized in Table 2.

Table 2 Clinical characteristics of patients in the low and high liver damage groups

Comparison of HH15, LHL15 and SUV-related parameters between low and high liver damage (Table 3)

As shown in Table 3, the patients with high liver damage had a significantly lower LHL15, liver SUVmax, liver SUVmean, TL GSA and SUVmax ratio and a significantly higher HH15 and heart SUVmax than those with low liver damage (p < 0.05 for all). No significant difference in the MLV was observed between the high and low liver damage groups (p = 0.31).

Table 3 Comparison of HH15, LHL15 and SUV-related parameters between patients with low and high liver damage

Diagnostic performances of each index for predicting high liver damage (Table 4)

As shown in Table 4, each index yielded the following AUC for the ability to predict high liver damage: HH15, 0.930 (p < 0.001); LHL15, 0.909 (p < 0.001); liver SUVmax, 0.650 (p = 0.012); liver SUVmean, 0.673 (p = 0.004); TL GSA, 0.713 (p = 0.001); heart SUVmax, 0.894 (p < 0.001) and SUVmax ratio, 0.917 (p < 0.001). The parameters yielded sensitivity rates ranging from 50.0% (liver SUVmean) to 100% (HH15), specificity rates ranging from 42.0% (liver SUVmax) to 89.0% (heart SUVmax) and accuracy rates ranging from 51.6% (liver SUVmax) to 87.5% (heart SUVmax) for predicting high liver damage. Representative 99mTc-GSA SPECT/CT images of low and high liver damage are shown in Figs. 1 and 2, respectively.

Table 4 The abilities of HH15, LHL15 and SUV-related parameters to predict high liver damage
Fig. 1

A 74-year-old male patient with hepatocellular carcinoma (HCC) and low liver damage. An early-phase enhanced computed tomography (CT) image (a) shows enhancement of the HCC. A 99mTc-GSA planar image (b) obtained 15 minutes after the injection of 99mTc-GSA shows hepatic uptake but no retention of 99mTc-GSA in the cardiac blood pool. A 99mTc-GSA SPECT/CT image (c) reveals homogeneous 99mTc-GSA uptake in the liver. The receptor index (LHL15), clearance index (HH15), liver maximum standardized uptake value (SUVmax), liver SUVmean, metabolic liver volume (MLV), total lesion 99mTc-GSA uptake (TL GSA), heart SUVmax and SUVmax ratio were 0.54, 0.95, 77.1, 46.76, 799.1 cm3, 37364.5, 4.3 and 17.76, respectively. All parameters failed to meet the threshold for high liver damage, and the patient underwent liver resection to treat HCC

Fig. 2

A 58-year-old female patient with hepatocellular carcinoma (HCC) and high liver damage (Damage type C). A 99mTc-GSA planar image (a) obtained 15 minutes after the injection of 99mTc-GSA depicts the retention of 99mTc-GSA in the cardiac blood pool with hepatic and splenic uptake. The 99mTc-GSA single photon emission computed tomography (SPECT)/computed tomography (CT) image (b) shows heterogeneous 99mTc-GSA uptake in the liver. The receptor index (LHL15), clearance index (HH15), liver maximum standardized uptake value (SUVmax), liver SUVmean, metabolic liver volume (MLV), total lesion 99mTc-GSA uptake (TL GSA), heart SUVmax and SUVmax ratio were 0.72, 0.85, 55.2, 35.0, 567.9 cm3, 19888.7, 14.05 and 3.93, respectively. All these parameters except for liver SUVmean met the respective thresholds for high liver damage. CT hepatic arteriography (c) shows the enhanced tumors, and the patient underwent transcatheter arterial chemoembolization (TACE) to treat HCC

The AUCs of the HH15, LHL15, heart SUVmax and SUVmax ratio were significantly higher than those of the liver SUVmax, liver SUVmean and TL GSA (p < 0.05 for all). No significant differences in the AUCs were observed between the other pairs among the 7 parameters (p ≥ 0.05 each).

Prediction of high liver damage using machine learning (Table 5)

In the model without feature selection, 12 features (4 clinical and 8 quantitative 99mTc-GSA features) were used as the input data. Without PCA, the AUC for predicting high liver damage was 0.904 for SVM (p < 0.001) and 0.939 for RF (p < 0.001), with no significant difference between the two classifiers (p ≥ 0.05). The sensitivity, specificity and accuracy were 92.9%, 73.0% and 77.3% for SVM, and 92.9%, 85.0% and 86.7% for RF, respectively. With PCA, the AUC for predicting high liver damage was 0.839 for SVM (p < 0.001) and 0.896 for RF (p < 0.001), again with no significant difference between the two classifiers (p ≥ 0.05). The sensitivity, specificity and accuracy were 78.6%, 84.0% and 82.8% for SVM, and 89.3%, 78.0% and 80.4% for RF, respectively.
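
The relationship between sensitivity, specificity and accuracy can be made explicit with a short Python sketch. The confusion-matrix counts below are not reported in this study; they are back-calculated assumptions that are merely consistent with the reported RF percentages without PCA, given the 28 high and 100 low liver damage cases. The function name is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)               # true positives / all positives
    specificity = tn / (tn + fp)               # true negatives / all negatives
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts (not reported in the source): 26 of 28 high liver damage
# cases and 85 of 100 low liver damage cases classified correctly.
sens, spec, acc = classification_metrics(tp=26, fn=2, tn=85, fp=15)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}, accuracy {acc:.1%}")
# sensitivity 92.9%, specificity 85.0%, accuracy 86.7%
```

Because low liver damage cases outnumber high liver damage cases by roughly 3.6 to 1, accuracy is weighted more heavily toward specificity than toward sensitivity.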

Table 5 Classification performance for predicting the high liver damage

In the model with feature selection, the 5 best features ranked by the decrease in Gini impurity (Supplemental Table 1) were used as the input data: heart SUVmax, SUVmax ratio, LHL15, HH15 and background liver disease. Without PCA, the AUC for predicting high liver damage was 0.942 for SVM (p < 0.001) and 0.933 for RF (p < 0.001), with no significant difference between the two classifiers (p ≥ 0.05). The sensitivity, specificity and accuracy were 89.3%, 89.0% and 89.1% for SVM, and 92.9%, 85.0% and 86.7% for RF, respectively. With PCA, the AUC for predicting high liver damage was 0.945 for SVM (p < 0.001) and 0.956 for RF (p < 0.001), again with no significant difference between the two classifiers (p ≥ 0.05). The sensitivity, specificity and accuracy were 85.7%, 91.0% and 89.8% for SVM, and 96.3%, 90.0% and 91.4% for RF, respectively.

In the models both with and without feature selection, we did not observe significant differences in the AUCs between each classifier and four of the quantitative 99mTc-GSA parameters (HH15, LHL15, heart SUVmax and SUVmax ratio; p ≥ 0.05), regardless of whether PCA was applied. However, the AUC of each classifier was significantly higher than those of the remaining three parameters (liver SUVmax, liver SUVmean and TL GSA; p < 0.05) in the models both with and without feature selection, again regardless of whether PCA was applied. The RF model with 5 selected features and PCA achieved the best performance for predicting high liver damage (AUC = 0.956, sensitivity = 96.3%, specificity = 90.0%, accuracy = 91.4%).

Discussion

Although the LHL15 and HH15 are the most common parameters used in liver function assessments, these indices are calculated from planar images and do not provide sufficient anatomical information to enable an accurate assessment [7]. SPECT/CT generates image voxels in units of radioactivity per volume (i.e., kBq/ml) by applying CT-based attenuation correction and scatter correction algorithms [31]. This technique differs from previous nuclear imaging techniques such as planar scintigraphy, SPECT or non-quantitative SPECT/CT, which generate imaging units in counts per second. Quantitative SPECT/CT allows the normalization of lesion radioactivity with respect to the injected radioactivity and thus yields quantitative parameter values such as the SUV [32, 33]. In this context, we evaluated the utility of an SUV-related 99mTc-GSA SPECT/CT analysis of liver function. To our knowledge, only one other report has discussed the use of the SUV obtained during 99mTc-GSA SPECT/CT to evaluate liver fibrosis [14], and the authors reported that the SUVmean was a highly accurate predictor of severe liver fibrosis.
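
The normalization that turns a voxel value in kBq/ml into an SUV can be sketched in Python as follows. This assumes the commonly used body-weight SUV definition, a tissue density of 1 g/ml and activities that are decay-corrected to the same time point; the study does not state which normalization the software applied, and the function name, body weight and voxel concentration below are hypothetical. Only the injected activity of 185 MBq comes from the imaging protocol.

```python
def suv_body_weight(concentration_kbq_per_ml, injected_activity_mbq, body_weight_kg):
    """Body-weight SUV: tissue concentration / (injected activity / body weight).

    Assumes decay-corrected activities and 1 g of tissue per ml, so the
    result is dimensionless.
    """
    if injected_activity_mbq <= 0 or body_weight_kg <= 0:
        raise ValueError("injected activity and body weight must be positive")
    injected_kbq = injected_activity_mbq * 1000.0  # MBq -> kBq
    weight_g = body_weight_kg * 1000.0             # kg -> g
    return concentration_kbq_per_ml / (injected_kbq / weight_g)


# 185 MBq injected (as in the protocol); hypothetical 60-kg patient and a
# hypothetical liver voxel concentration of 100 kBq/ml.
print(round(suv_body_weight(100.0, 185.0, 60.0), 2))  # 32.43
```

An SUV of 1 would correspond to the tracer being distributed uniformly throughout the body, so liver values far above 1 reflect the strong hepatic accumulation of 99mTc-GSA.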

In our study, patients with high liver damage had a significantly lower LHL15, liver SUVmax, liver SUVmean, TL GSA and SUVmax ratio and a significantly higher HH15 and heart SUVmax than patients with low liver damage. Three indices (HH15, LHL15 and SUVmax ratio) yielded a high AUC (> 0.90), while the heart SUVmax most accurately differentiated between low and high liver damage. These results suggest that in addition to the previously reported quantitative parameters HH15 and LHL15, SUV-related parameters derived via 99mTc-GSA SPECT/CT can be used to evaluate liver function. One advantage of the SUVmax over previously reported quantitative analyses is the ease of calculation, as this value can be determined by simply placing a VOI within the liver or heart without introducing inter- or intra-observer variability.

Some recent studies have proposed classification methods based on machine learning approaches such as SVM and RF [17,18,19]. SVM is commonly considered superior for classification, as it provides optimal results based on existing information [34]. RF is a very effective model-free approach in terms of feature screening and classification [26]. Sakai et al. examined the usefulness of a machine learning approach by using the radiomics features of digital breast tomosynthesis to automatically classify benign and malignant breast lesions. In that study, the adoption of SVM as a classifier yielded the highest classification performance (AUC = 0.798) [35]. Ahn et al. used a machine learning approach to examine the prognostic value of FDG-PET-based radiomics in patients with non-small cell lung cancer; in that study, the RF model best predicted disease recurrence (AUC = 0.956) [18]. Rajula et al. [36] reviewed the differences between machine learning and conventional statistical methods in healthcare and reported that one main difference lies in their purposes: machine learning focuses on making predictions as accurate as possible, whereas conventional statistical models aim to infer relationships between variables. They also reported that, compared with conventional statistical approaches, machine learning offers flexibility and scalability, which make it deployable for several tasks, such as diagnosis and classification. To our knowledge, however, no study has previously investigated the efficacy of a quantitative analysis of 99mTc-GSA SPECT/CT data via a machine learning approach for the assessment of liver function.

In our study, the RF model with 5 selected features and PCA yielded the best liver damage classification performance, although no significant differences were observed between the AUCs of each classifier and four quantitative 99mTc-GSA parameters (HH15, LHL15, heart SUVmax and SUVmax ratio; p ≥ 0.05 for all). Even though these four quantitative 99mTc-GSA parameters showed high diagnostic performance for discriminating between low and high liver damage, it might be difficult to correctly classify low and high liver damage by using any single parameter alone. Machine learning algorithms have the ability to analyze various data types (for instance, imaging data and laboratory findings) and integrate them into predictions for diagnosis [37], and our findings indicate that a machine learning approach that combines clinical and quantitative 99mTc-GSA features has the potential to correctly predict liver function.

This study had some limitations that warrant discussion. First, this was a retrospective study with a relatively small sample, and case-selection bias was unavoidable. Our patient population included only a few patients with high liver damage because those with severe dysfunction do not usually receive 99mTc-GSA SPECT/CT evaluations when therapeutic surgeries or liver tumor interventions are considered. Stratified sampling was used to address this data imbalance, and cross-validation, feature selection and PCA were performed to minimize the negative influence of overfitting. Although internal validation showed high diagnostic performance for the RF model with selected features and PCA, the lack of external validation limits the generalizability of our results. Thus, a training/test scheme, which would require a large number of samples, would be preferable for the validation of classifiers, and a prospective study of a much larger population with external validation is needed to confirm our findings. Second, we did not examine the relationship between the post-therapeutic liver function and pretreatment 99mTc-GSA SPECT/CT parameters due to differences in the treatment procedures. A follow-up study is needed to assess the predictive value of the machine learning approach based on 99mTc-GSA SPECT/CT with respect to clinical outcomes.

In conclusion, a machine learning approach based on clinical and quantitative 99mTc-GSA SPECT/CT parameters might be useful for predicting liver function.