Introduction

Hepatocellular carcinoma (HCC) is the second most common cause of cancer-related death in the world and a major global public health problem, with a high incidence in Asia and a growing incidence in Europe and North America [1–3]. Curative therapies include tumour resection, liver transplantation and loco-regional tumour ablation. However, more than 70 % of patients with HCC present with intermediate to advanced stage disease at initial diagnosis [4, 5]. In those cases, image-guided intra-arterial therapies such as transarterial chemoembolization (TACE) have become the mainstay of palliative therapy and are included in all treatment guidelines and staging systems [6, 7].

Commonly used HCC staging algorithms such as the Barcelona Clinic Liver Cancer (BCLC) system and the recently introduced Hong Kong Liver Cancer (HKLC) system rely on 1-dimensional (1D) tumour size and number of tumours to stratify patients into therapies with curative or palliative intent [1, 8–10]. Although 1D, diameter-based measurements are workflow-efficient, such simplistic tumour assessments are inherently inaccurate, especially when applied to tumours that are frequently inhomogeneous in shape. This clinically underappreciated limitation has recently been recognized in the framework of tumour response assessment, which resulted in the development of 3D quantitative tumour and whole liver assessment techniques [11–16]. However, the ability to assess liver lesions more accurately using 3D methods has only recently been applied for staging purposes [17]. Early 3D concepts for baseline imaging are technically promising, yet further validation is needed to create a reliable and standardized assessment approach [17]. This includes evaluating the number of lesions that need to be assessed in 3D for sufficient tumour staging [8, 18, 19]. The goal of our study was therefore to apply 3D approaches to tumour staging in the framework of tumour burden assessment. Specifically, we compared single- vs. multi-lesion assessment on baseline MRI using 1D- and 3D-based measurements to predict overall survival (OS) in patients with HCC before TACE.

Methods and materials

Study cohort

This retrospective single-institution study was compliant with the Health Insurance Portability and Accountability Act and approved by the institutional review board. A total of 300 patients with HCC treated using TACE at our institution between January 2009 and January 2012 were identified from our prospectively collected database [20]. All patients underwent their first TACE procedure at our institution. Twenty-eight patients were excluded because of missing or insufficient MRI (no contrast agent delivered, motion artefacts, missing phases). Another 142 patients were excluded because of portal vein invasion (either intra- or extrahepatic branch invasion), which is widely recognized as a negative prognostic factor and a relative contraindication for intra-arterial therapies. Eight additional patients with infiltrative HCC were excluded due to ill-defined tumour borders. Figure 1 illustrates the process of patient selection.

Fig. 1

Exclusion criteria flowchart. A total of 142 patients were excluded because of portal vein invasion; 28 patients were excluded because of missing or inadequate MRI results. In addition, a total of 8 patients were excluded because of infiltrative HCC

Evaluation and staging

Diagnostic criteria for HCC included biopsy or characteristic radiographic findings [1]. All patients underwent a full clinical examination including laboratory diagnostics at baseline. The Eastern Cooperative Oncology Group (ECOG) performance status was recorded in all patients. Liver function and disease stage were assessed using the Child–Pugh classification, and the BCLC and HKLC staging systems. The HKLC staging was done retrospectively. Treatment decisions were made on a case-by-case basis by a multidisciplinary liver tumour board.

MRI protocol

All patients underwent a standardized MRI liver protocol which included axial T2-weighted fast spin-echo images, axial single-shot breath-hold gradient-echo diffusion-weighted echo-planar images, and axial breath-hold unenhanced and contrast-enhanced (0.1 mmol/kg intravenous gadodiamide [Omniscan; Amersham, Princeton, NJ]) T1-weighted 3D fat-suppressed spoiled gradient-echo images in the arterial, portal venous and delayed phases (20, 70 and 180 s after intravenous contrast administration, respectively) [21]. MRI was performed on a 1.5-Tesla MRI scanner (Magnetom Avanto, Siemens Medical Solutions, Erlangen, Germany) using a phased array torso coil (repetition time/echo time, 5.77/2.77 ms; field of view, 320–400 mm; matrix, 192 × 160; slice thickness, 2.5 mm; receiver bandwidth, 64 kHz; flip angle, 10°).

Image analysis

1D measurements of enhancing and overall tumour diameter were done independently by two board-certified radiologists (R.E.S. and R.D., with 8 years of experience in abdominal MRI, respectively), both blinded to all clinical data. All measurements were done using standardized electronic calipers on Digital Imaging and Communications in Medicine (DICOM) files and included overall tumour diameter and enhancing tumour diameter as described in the literature [22–24]. The measurements of both readers were averaged for the survival analysis. Figure 2 provides an overview of all anatomic and enhancement-based methods (A and B).

Fig. 2

MRI assessment techniques. a 1D measurements of the overall tumour diameter as shown by a fine red line. b 1D measurement of the enhancing tumour diameter as shown by a fine red line. Enhancement was defined as areas with hyperintense MR signal in the arterial phase of the T1-weighted MRI sequence that were not seen in the pre-contrast phase. c Segmentation masks (3D rendering in red) representing total tumour volume (TTV). A fine yellow line as created by semiautomatic liver segmentation illustrates the liver contour. d Quantification of the enhancing tumour volume (ETV). Red represents maximum enhancement and blue represents no enhancement, normalized by the ROI. Green box 3D ROI used as the reference background of image intensity

3D analysis (overall tumour volume and enhancing volume) was done by a radiological reader (F.N.F.) with 1 year of training in the 3D image analysis tool, supervised by a board-certified radiologist (R.E.S.) with 8 years of experience in abdominal MRI. The reader was blinded to clinical data and the results of the 1D measurements. The 3D quantitative image analysis included measurement of the overall tumour volume and the volume of enhancing tissue. The overall tumour volume was measured using a semiautomatic tumour segmentation software prototype (Medisys; Philips Research, Suresnes, France) previously described in the literature [25, 26]. A maximum of three tumours per patient (based on BCLC) were individually segmented on the arterial phase of the contrast-enhanced baseline MRI [8]. The summation of the individual three largest tumour volumes was calculated for the total tumour volume (TTV) (Fig. 2c).

The enhancing tumour volume (ETV) measurement was performed as in previous studies [11, 12]. Briefly, this was done as follows: The native T1 images were subtracted from the arterial-phase T1 images to remove the background signal. In the next step, a region of interest (ROI) formed by a 1 cm3 cube was placed in a region of extra-tumoural liver parenchyma of the subtracted data set as a reference [27]. The ROI intensity value was used as a cut-off to identify the volume of hyperenhancing voxels within the segmented tumour volume. For the calculation of ETV the three largest tumour lesions were summed up and expressed in centimetres cubed. The software automatically generated a colour map to visualize the enhancement pattern (blue representing non-enhancing, necrotic tissue and red representing viable enhancing tumour tissue; Fig. 2d). Reader-independent reproducibility of semiautomatic tumour segmentation as well as the radiological-pathological validation of 3D quantitative tumour enhancement analysis has been reported previously [25, 27].
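The subtraction and thresholding steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the software prototype used in the study. The function name, the flat voxel lists and the toy intensities are hypothetical, and the sketch assumes that the cut-off is the mean subtracted intensity inside the reference ROI, which the text does not specify.

```python
from statistics import mean


def enhancing_volume_cm3(pre, arterial, tumour_mask, roi_mask, voxel_volume_cm3):
    """Estimate the enhancing volume from flat voxel lists of equal length."""
    # Subtract the native (pre-contrast) T1 signal from the arterial-phase signal.
    subtracted = [a - p for a, p in zip(arterial, pre)]
    # Assumption: the cut-off is the mean subtracted intensity within the ROI
    # placed in extra-tumoural liver parenchyma.
    cutoff = mean(s for s, in_roi in zip(subtracted, roi_mask) if in_roi)
    # Count voxels inside the segmented tumour that exceed the reference cut-off.
    n_enhancing = sum(
        1 for s, in_tumour in zip(subtracted, tumour_mask) if in_tumour and s > cutoff
    )
    return n_enhancing * voxel_volume_cm3


# Hypothetical toy data: six voxels, two in the reference ROI, four in the tumour.
pre = [100, 100, 100, 100, 100, 100]
arterial = [110, 112, 180, 200, 105, 150]
tumour = [False, False, True, True, True, True]
roi = [True, True, False, False, False, False]

etv = enhancing_volume_cm3(pre, arterial, tumour, roi, voxel_volume_cm3=0.5)
print(etv)
```

In this toy case three of the four tumour voxels exceed the reference cut-off, so the enhancing volume is three voxel volumes. For a patient with several lesions, the per-lesion results for the three largest tumours would be summed.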

The total liver volume was calculated using a 3D liver segmentation software prototype (Medisys; Philips Research, Suresnes, France). Briefly, the software first automatically created a 3D liver mask that readers could then adjust using semiautomatic 3D segmentation. This included the ability to expand or contract the 3D mask around control points or to define points along the liver contour. The total liver volume was calculated automatically on the basis of the segmentation and expressed in cubic centimetres (Fig. 2c).

Enhancing tumour burden (ETB [%]) was defined as the ratio of ETV [cm3] and the total liver volume [cm3] calculated using the following formula:

$$ \mathrm{ETB}\ [\%] = \frac{\mathrm{ETV}\ [\mathrm{cm}^3]}{\mathrm{Total\ liver\ volume}\ [\mathrm{cm}^3]} \times 100 $$
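As a worked illustration of the formula, the following Python sketch computes ETB [%] and assigns a tumour burden group using the 4 % cut-off applied in this study. The function names and the example volumes are hypothetical, and treating a value exactly at the cut-off as high tumour burden is an assumption.

```python
def enhancing_tumour_burden(etv_cm3, liver_volume_cm3):
    """Return ETB [%] = ETV / total liver volume x 100."""
    if liver_volume_cm3 <= 0:
        raise ValueError("liver volume must be positive")
    return etv_cm3 / liver_volume_cm3 * 100.0


def etb_group(etb_percent, threshold_percent=4.0):
    """Stratify by ETB; a value at the cut-off counts as high (assumption)."""
    return "high" if etb_percent >= threshold_percent else "low"


# Hypothetical patient: 90 cm3 of enhancing tumour in an 1800 cm3 liver.
etb = enhancing_tumour_burden(90.0, 1800.0)
print(round(etb, 1), etb_group(etb))
```

Because ETB relates viable tumour to the individual liver volume, the same ETV yields a lower burden in a larger liver.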

In order to evaluate and compare each method, specific thresholds were defined to stratify patients into two groups: high tumour burden and low tumour burden. For diameter-based measurements, the commonly used thresholds from the Milan criteria as well as the BCLC and HKLC staging systems were adopted as follows: Unifocal tumours were classified as high tumour burden when more than 5 cm in diameter. Patients with multifocal disease were classified as having high tumour burden when they had more than three tumours or at least one tumour greater than 3 cm in diameter [28]. The definition of thresholds for TTV [cm3] and ETV [cm3] was based on the same criteria extrapolated from 1D to a volumetric threshold of ≅ 65 cm3 for unifocal HCC and 45 cm3 for multifocal HCC (Fig. 3) [17]. Because ETB [%] is a relative value, a receiver operating characteristic (ROC) curve was calculated, and a threshold of 4 % was chosen for stratification between high and low tumour burden.
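The diameter-based stratification rule can be written as a short Python function. This is a sketch of the rule as stated in the text, with a hypothetical function name and hypothetical example diameters. It does not reproduce the study's actual analysis code.

```python
def diameter_burden_group(diameters_cm):
    """Classify tumour burden from a list of tumour diameters in cm."""
    n_tumours = len(diameters_cm)
    if n_tumours == 0:
        raise ValueError("at least one tumour diameter is required")
    if n_tumours == 1:
        # Unifocal disease: high burden when the tumour exceeds 5 cm.
        return "high" if diameters_cm[0] > 5.0 else "low"
    # Multifocal disease: high burden with more than three tumours
    # or with any single tumour larger than 3 cm.
    if n_tumours > 3 or max(diameters_cm) > 3.0:
        return "high"
    return "low"


print(diameter_burden_group([4.0]))        # single 4 cm tumour
print(diameter_burden_group([2.0, 3.5]))   # two tumours, one above 3 cm
```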

Fig. 3

Definition of thresholds. Cut-offs derive from unidimensional values that were extrapolated using the equation for the calculation of spheroid volumes (see figure). For single lesions, the cut-off was 5 cm for diameter and 65 cm3 for volume. For multiple lesions, the cut-off was 3 lesions of 3 cm in diameter each and 45 cm3 for volume
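The extrapolation from diameter to volume can be checked with a short calculation. The equation shown in the figure is not reproduced in the text, so this sketch assumes an ideal sphere, V = πd³/6. The function name is hypothetical.

```python
import math


def sphere_volume_cm3(diameter_cm):
    """Volume of an ideal sphere with the given diameter, V = pi * d^3 / 6."""
    return math.pi * diameter_cm ** 3 / 6.0


# Single lesion of 5 cm diameter.
single = sphere_volume_cm3(5.0)
# Three lesions of 3 cm diameter each.
multi = 3 * sphere_volume_cm3(3.0)

print(round(single, 1))  # 65.4
print(round(multi, 1))   # 42.4
```

Under this assumption, the single-lesion value agrees with the reported 65 cm3 cut-off. The multifocal value of about 42 cm3 is close to, but not identical with, the reported 45 cm3 cut-off.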

Statistical analysis

Descriptive statistics were used to summarize the data in absolute numbers as well as in percentages. Mean and range were used for continuous variables. Overall survival was measured from the time of baseline MRI until the date of death. Patients who were lost to follow-up, alive at the end-of-observation date (16 December 2014) or later treated surgically were censored. The inter-reader agreement for non-3D measurements was assessed using the intra-class correlation coefficient (ICC). The predictive value of each parameter (TTV [cm3], ETV [cm3] and ETB [%]) was assessed by a Cox proportional hazards model using the thresholds mentioned above. Kaplan–Meier survival curves were generated and plotted for each method. Survival curves were compared using the log-rank test. P values of 0.05 or less were defined as statistically significant. The prognostic value of each model was furthermore assessed by calculating the concordances. Most importantly, concordances were calculated for the one, two or three largest tumours, respectively, in order to identify the number of tumour lesions needed for an accurate assessment. All statistical analyses were performed using the statistical software R (R Foundation for Statistical Computing, Version 3.1.2, Vienna, Austria, 2014) and SPSS (IBM, Version 22, Armonk, NY, USA).
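To illustrate what a concordance measures, the following Python sketch computes a simplified version of Harrell's concordance index for right-censored survival data. It is not the R or SPSS routine used in the study, and it handles tied survival times more crudely than standard statistical packages. The function name and the toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C: fraction of comparable pairs ranked correctly.

    times: follow-up time per patient; events: 1 if death observed, 0 if censored;
    risk_scores: higher values mean a worse predicted outcome.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i died before patient j's time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier death had the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: risk score 1 = high tumour burden, 0 = low.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [1, 1, 0, 0]

c_index = concordance_index(times, events, risk)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of survival times by the predictor.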

Results

Patient characteristics and clinical outcome

Table 1 summarizes baseline patient characteristics. A total of 122 patients were included in the final data analysis. Mean patient age was 62.6 ± 10.0 years. A majority of patients (N = 70, 57 %) had multifocal disease. Median OS of the study cohort was 22.4 months (95 % CI 19.8–25.0) and by the end-of-observation date (16 December 2014), a total of 75 patients (61 %) were deceased.

Table 1 Baseline patient and tumour characteristics

Image analysis

Tumour characteristics and results of 1D measurements are summarized in Table 1. A total of 296 tumour lesions were assessed using the previously mentioned 1D and 3D techniques. Of note, the agreement between the radiological readers was good for both the diameter and enhancing diameter measurements (ICC 0.829 [95 % CI 0.704–0.896] and 0.827 [95 % CI 0.731–0.886], respectively).

The 3D measurements afforded a mean TTV [cm3] of 170.39 ± 309.27 cm3 (range 1.15–2089.06 cm3) and mean ETV [cm3] of 77.60 ± 122.38 cm3 (range 0.58–732.33 cm3). Mean liver volume was 1748.66 cm3 (standard deviation [SD] 552.7 cm3) leading to a mean ETB [%] of 4.1 ± 6.1 % (range 0.04–36.4 %).

Survival analysis

The univariate analysis of baseline clinical parameters identified significant correlations of Child–Pugh classification (Child–Pugh class B, hazard ratio [HR] of 2.7 [95 % CI 2.0–3.6], p < 0.001) and ECOG score (ECOG 1, HR 1.8 [95 % CI 1.1–2.9], p = 0.011) with OS. The remaining baseline parameters included in the univariate analysis (age, sex, ethnicity, tumour multiplicity, and extrahepatic tumour spread) did not demonstrate a significant correlation with OS.

For the diameter-based thresholds, the log-rank test showed a good separation of the survival curves for high and low tumour burden groups in overall (p = 0.015) and enhancing (p = 0.01) tumour diameters, respectively (Fig. 4a, b). The majority of patients were classified into the high tumour burden group (81 % for overall tumour diameter and 79 % for enhancing diameter, respectively; Fig. 5a, b) with a median OS of 21.7 months (95 % CI 20.4–27.4) when measured with overall tumour diameter and 21.6 months (95 % CI 20.2–27.3) with enhancing tumour diameter.

Fig. 4

OS curves for the respective thresholds for utilized image assessment techniques. Kaplan–Meier analysis results are shown for each subgroup and technique on the basis of cut-off values. According to the analysis, all radiological methods provided a good separation of the survival curves (P < 0.05)

Fig. 5

Boxplots showing overall survival for high and low tumour burden groups stratified according to each method used

As for the 3D quantitative techniques, all parameters (TTV, ETV and ETB) achieved separation of the high and low tumour burden groups, each with a p value of less than 0.001, as illustrated in Fig. 4c–e, respectively. As shown in Table 2, stratification based on TTV classified approximately half of the cohort (52 %) as high tumour burden with a median OS of 12.7 months (95 % CI 14.8–22.8). When stratified according to enhancing volumetric thresholds, only a minority of patients were classified into the high tumour burden group (39 % for ETV [cm3] and 30 % for ETB [%]). Figure 5 illustrates OS for each group and method as boxplots. Compared with the diameter-based methods, the 3D quantitative analysis overall achieved a better separation of low and high tumour burden patients with regard to survival prediction. For ETB in particular, the 25th percentile of the low tumour burden group does not overlap with the 75th percentile of the high tumour burden group, indicating a very good separation with this method.

Table 2 Statistical analysis of the median overall survival

Multivariate analysis

After adjustment for Child–Pugh score and ECOG performance status, patients in the high tumour burden group according to the non-3D thresholds showed a hazard ratio of 2.6 (95 % CI 1.2–5.6, p = 0.012) for overall diameter thresholds and an HR of 3.0 (95 % CI 1.5–6.3, p = 0.003) for enhancing tumour diameter. The HRs for TTV [cm3] and ETV [cm3] were 5.5 (95 % CI 3.1–9.6, p < 0.001) and 5.2 (95 % CI 3.1–8.8, p < 0.001), respectively. The highest HR, 6.6 (95 % CI 3.7–11.5, p < 0.001), was shown for the high tumour burden group stratified according to ETB [%]. With the other methods excluded from the equation, stepwise forward selection of the radiological measurements identified ETB [%] as the single most predictive parameter for OS.

Predictive value of radiological methods and number of tumour lesions

The assessment of the predictive value of ETV [cm3] and ETB [%] showed good concordances of 0.796 (standard error [SE] 0.0379, 95 % CI 0.721–0.870), respectively. In contrast, 1D evaluation only reached concordances of 0.672 (SE 0.0379, 95 % CI 0.598–0.746) for overall and 0.694 (SE 0.0379, 95 % CI 0.619–0.768) for enhancing diameter.

Of note, the differences between the concordances of ETB [%] when assessed for only the 1 (0.759), 2 (0.786) or 3 (0.793) largest tumour lesions are below two times the SE and therefore not significant.
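The comparison against twice the standard error can be reproduced with simple arithmetic. The sketch below assumes that the SE of 0.0379 reported for the concordances above also applies here, and it applies the same rule of thumb rather than a formal statistical test. The variable names are hypothetical.

```python
# Concordances of ETB [%] by number of largest lesions assessed (from the text).
concordance_by_lesions = {1: 0.759, 2: 0.786, 3: 0.793}
# Assumption: the SE reported for the concordances above applies to these values.
standard_error = 0.0379

# Largest difference: three lesions versus the dominant lesion alone.
difference = concordance_by_lesions[3] - concordance_by_lesions[1]
within_two_se = abs(difference) < 2 * standard_error

print(round(difference, 3), round(2 * standard_error, 4), within_two_se)
```

The largest difference (0.034) is less than half of the two-SE margin (0.0758), which is consistent with the statement that adding the second and third lesions does not significantly improve prediction.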

Discussion

Our study has two main findings: First, the concept of enhancing tumour burden is a highly predictive prognostic factor for patient survival, which methodologically outperformed the commonly used 1D measurements. Second, 3D quantitative evaluation of the dominant liver tumour lesion reliably predicts survival in patients with multifocal HCC who will undergo chemoembolization, thus rendering multi-tumour assessment clinically dispensable.

The results of our study confirm three hypotheses: First, a high volume of viable tumour as represented by enhancing tissue has a negative impact on survival. In addition, the concept of enhancing tumour burden, which puts viable tumour volume in proportion to liver volume, is capable of reflecting the true and individual extent of the disease. Lastly, patient survival is highly dependent on the extent of the dominant liver lesion. Overall, the role of radiological assessment of HCC for diagnosis and staging cannot be overestimated and radiologically determined factors, such as size and number of tumour lesions, are central components prior to initial therapeutic assessment [29]. With regard to the BCLC staging system, many patients are situated on the edge between early and intermediate stages and one must therefore rely on imaging findings for treatment decisions between curative or palliative treatment options. Several new staging systems are currently being developed with the goal of addressing this practical issue in a more reliable way. On the one hand, an ideal HCC staging system should include state-of-the-art image assessment techniques in order to offer a more explicit and clear decision between curative and palliative therapy algorithms. On the other hand, the application of any additional image analysis instrument should be workflow-efficient and compatible with clinical routine of diagnostic radiology. The 3D quantitative evaluation of the dominant HCC lesion as described and validated in our work may do just that by fulfilling both aforementioned requirements.

The results presented here demonstrate a statistically significant separation of survival curves for both diameter and volumetric measurements on baseline MRI. With regard to the diameter-based measurements, these results are consistent with existing data from other centres that showed the rationale behind diameter-based tumour evaluation in the past [8, 18, 30]. However, our data show that stratification based on volumetric thresholds, and especially ETB [%], is superior to any other tested radiologic method in terms of separation of survival curves, concordances and hazard ratios. An implication of this result is that diameter-based methods classified more patients as having a high tumour burden than the 3D quantitative techniques did, thus suggesting a palliative approach for a comparably larger part of the patient cohort. This finding might be of particular clinical interest beyond interventional oncology and confirms the recent trend towards a wider indication for liver transplantation, as suggested by the recently introduced HKLC staging system [9]. It can thus be assumed that the commonly used imaging cut-offs as propagated within the BCLC staging system may very well lead to a detriment for patients with HCC through an overestimation of the stage of their disease. This assumption is supported by data from other studies that evaluated diameter-based measurements in large tumour patient cohorts [18, 31, 32].

Volumetric tumour assessment is based on fairly new and clinically underrepresented technologies. A variety of fundamentally different approaches have been recently tested in order to shape the growing role of volumetric image analysis for specific clinical settings. One such study assessed the predictive value of volumetrically measured tumour burden for HCC recurrence after liver transplantation. Based on the assumption that HCC lesions are ellipsoid rather than spherical, this retrospective study calculated tumour burden on the basis of the overall tumour diameter by extrapolating the volumes using the ellipsoid formula [32]. However, this approach is limited because it neglects pathological heterogeneity in tumour shape and structure.

Another recently published study acknowledged the role of tumour biology for the assessment of tumour burden and expanded tumour assessment beyond diameter-based tumour size in the setting of intra-arterial therapies. The authors applied a volumetric assessment of tumour burden using apparent diffusion coefficient (ADC) mapping and concluded that changes in post-embolization ADC values reliably reflect patients’ tumour burden [33]. However, a fundamental weakness of this study is the absence of conventional, diameter-based measurements as a reference. Moreover, the authors did not explore clinically applicable cut-offs and applied a segmentation technique that lacks radiological-pathological validation while using a less reliable, fully automated algorithm.

Our previously published attempts to validate the use of 3D quantitative imaging for HCC staging identified the 65 cm3 threshold as a reliable cut-off value to stratify patient survival according to the enhancing tumour volume of the dominant HCC lesion. However, that study did not explore the emerging role of the liver volume and focused on the single dominant index lesion. The current study addresses both issues by showing that 3D assessment of all tumour lesions achieves the highest predictive value. However, the difference in concordances between assessing the dominant lesion only and assessing multiple lesions remained below statistical significance, so multi-lesion assessment might be clinically dispensable. This result has broad relevance and may increase the clinical practicability of 3D tumour assessment. The methodology described here introduced staggered thresholds, thus allowing for the prediction of survival in patients with smaller tumours. Moreover, the holistic concept of enhancing tumour burden that includes liver volume proved to be a superior model for the prediction of OS when compared to any other assessed radiological method.

There were several limitations to our study. First, it is a single-institutional retrospective analysis in patients with HCC treated with chemoembolization only. This may have caused selection bias and furthermore limits the applicability of the data with regard to early-stage disease. However, because TACE is the most commonly used treatment for unresectable HCC, our study may provide a more standardized approach for an interventional oncology setting. The choice of thresholds in this study is an additional limitation. An ideal staging system would separate very small single tumours of less than 2 cm in order to stratify patients into very early tumour stage as a separate group [8]. Because of the medium-sized cohort with mainly intermediate and advanced stages and tumour sizes, our study lacks this step and thus does not offer a sufficient subgroup analysis. The recruitment of patients with advanced-stage HCC (BCLC C) for loco-regional therapies and subsequent image analysis is in agreement with the clinical reality, which frequently deviates from the BCLC treatment recommendations. In light of the recently introduced HKLC staging system which propagates intra-arterial therapies in patients with greater tumour burden (mostly BCLC C), the herein proposed 3D imaging markers can be used across tumour stage and were thus validated for both BCLC B and C patients.

In summary, 3D quantitative assessment of the enhancing tumour burden as represented by the largest HCC lesion is a stronger predictor of survival than diameter-based measurements. Assessing multiple tumour lesions provides no substantial improvement in predicting OS over evaluating the dominant lesion alone.