Introduction

In the latest version of the Liver Imaging Reporting and Data System (LI-RADS version 2018 [v2018]) [1], the LR-M category is used in at-risk patients to describe hepatic lesions that are probably or definitely malignant, but not specific for hepatocellular carcinoma (HCC). Accounting for approximately 36% and 30% of LR-M lesions, respectively [2,3,4], HCC and intrahepatic cholangiocarcinoma (ICC) are the two leading histologic tumor types within this category; less common etiologies include combined hepatocellular-cholangiocarcinoma (c-HCC-CCA, about 10% of LR-M lesions), metastatic tumors, and others [2,3,4,5].

Differentiation between these primary liver malignancies is paramount in terms of treatment, prognosis, and survival. Specifically, the appropriateness of liver transplantation, optimal locoregional therapies, and chemotherapy choices differs substantially between cholangiocarcinoma-containing tumors (M-CC) and non-cholangiocarcinoma-containing tumors (non-M-CC) [6,7,8,9,10]. However, because of the overlapping risk factors, imaging characteristics, and clinical features between these entities within the LR-M category [6, 9,10,11,12], the diagnosis of LR-M lesions relies on histopathologic confirmation before treatment, which adds to costs of care and may give rise to undesired biopsy-related complications and treatment delays [1, 13].

To address this diagnostic dilemma, investigations have assessed the use of specific imaging features for distinguishing between different types of primary hepatic malignancies [14,15,16,17,18,19,20,21]. However, most of these studies are retrospective case-control in design without exclusive focus on LR-M lesions, instead including hepatic lesions representing a heterogeneous spectrum of diagnostic categories including LR-4, LR-5, LR-M, and LR-TIV. In addition, on review of the literature, we have found no study to date exploring the combined utility of imaging features and serum tumor markers in the assessment of patients with LR-M lesions, which could provide valuable complementary information [6, 9, 10, 22, 23]. In real-world settings, tumor markers are readily available and much less costly than imaging, but their interpretation in patients with LR-M lesions is not well established.

Therefore, the purpose of this study was to develop diagnostic models for M-CC incorporating predictive serum tumor markers and LI-RADS v2018 imaging features on gadoxetate disodium–enhanced magnetic resonance imaging (EOB-MRI) in at-risk patients with LR-M lesions.

Materials and methods

This single-center study was approved by our institutional review board, and the requirement for informed consent was waived because we retrospectively used data from a prospectively collected observational cohort (Clinical trial registration No. ChiCTR1900026668). All authors had access to the study data and reviewed and approved the final manuscript.

Patient sample

Between July 2015 and September 2018, consecutive adult at-risk patients with cirrhosis and/or chronic hepatitis B virus infection who underwent 3.0-T EOB-MRI for suspicious liver lesions detected by ultrasound and/or computed tomography were prospectively enrolled at our tertiary care hospital. According to the routine clinical practice of our center, EOB-MRI is usually performed in at-risk patients only when at least one suspicious liver lesion is detected by surveillance ultrasound and/or by computed tomography performed for another indication. The diagnosis of cirrhosis was confirmed according to Chinese guidelines on the management of liver cirrhosis [24]. A study coordinator with 6 years of experience in liver imaging retrieved and de-identified the images, then reviewed and marked all measurable hepatic lesions with reference to the clinical reports. Exclusion criteria applied by the coordinator were as follows: any previous treatment for the hepatic lesion and MR imaging of insufficient quality for diagnosis.

All MR images were then independently reviewed by three abdominal radiologists (H.J., Y.Q., and X.L.) blinded to the final diagnosis with 5, 5, and 10 years of experience in liver imaging, respectively, using LI-RADS v2018. All patients in whom at least 2/3 radiologists identified LR-M lesions were included for further analyses. Patients were subsequently excluded at this level for any of the following: co-existing LR-4 and/or LR-5 lesions, because there was no way to separate the contributions to serum tumor marker levels from the LR-M lesions vs. the co-existing LR-4 and/or LR-5 lesions; no conclusive histopathologic result for the LR-M lesion within 1 month after initial MRI examination; or no serum tumor marker results of interest within 14 days prior to histopathologic examination. Of note, patients assigned as LR-TIV contiguous with LR-M lesions were included. Tumor markers of interest included α-fetoprotein (AFP), carcinoembryonic antigen (CEA), and carbohydrate antigen (CA) 19-9.

Of 308 patients who met the initial inclusion criteria, 263 were ultimately excluded (Fig. 1). The final study cohort included 45 patients (37 males; mean age 52.6 ± 9.6 years).

Fig. 1

Flow diagram of study sample selection

MRI protocols

All patients underwent EOB-MR examinations on a MAGNETOM Skyra 3.0-T MR scanner (Siemens Healthineers). MR sequences included the following: T2-weighted imaging; in- and opposed-phase T1-weighted sequence; diffusion-weighted imaging; and T1-weighted sequences before and after intravenous injection of EOB in the arterial phase, portal venous phase, transitional phase, and hepatobiliary phase. EOB (Primovist; Bayer AG) was injected at a dose of 0.025 mmol/kg and a rate of 2 mL/s. Detailed MR protocols are presented in Supplementary Material 1.

Image analysis

For each included liver lesion, the same three reviewers as above independently evaluated the presence or absence of all LI-RADS v2018 major, ancillary, LR-M, and LR-TIV features (except those related to growth or US visibility, since those data were not available in the originating study database). For analyses requiring a single assessment, consensus was established by 2/3 majority based on the independent assessments.
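The 2/3 majority rule can be illustrated with a minimal Python sketch. The function name `consensus` and the example readings are hypothetical and are not taken from the study database; the sketch only shows how three independent binary assessments collapse into one consensus value.

```python
from collections import Counter

def consensus(ratings):
    """Return the value chosen by a strict majority of readers for one feature."""
    value, votes = Counter(ratings).most_common(1)[0]
    if votes * 2 <= len(ratings):
        raise ValueError("no strict majority among readers")
    return value

# Hypothetical readings by three readers (True = feature present).
readings = {
    "blood products in mass": [True, True, False],
    "rim APHE": [False, True, False],
}
consensus_by_feature = {name: consensus(r) for name, r in readings.items()}
```

With three readers and a binary feature a strict majority always exists, so the error branch only matters if the number of readers is even or the feature has more than two levels.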

All subsequent analyses were performed on a per-patient basis, and in patients with multiple LR-M lesions, the largest lesion was selected for feature-related analyses.

Reference standard

Histopathologic data from routine reports, including hepatectomy and biopsy results, were used as the reference standard for determining tumor type. As a standard practice procedure, all specimens were reviewed by two independent pathologists (with 4 years and over 20 years of experience in liver oncology, respectively) who were aware of the clinical and imaging data, with disagreements resolved by discussion and consensus. All histopathologic diagnoses of the liver lesions were established according to the World Health Organization Classification [25].

For our analysis, as reported by Park and colleagues [18], ICCs and c-HCC-CCAs were classified as M-CCs, while the remaining LR-M histologic types were classified as non-M-CCs.

Statistical analysis

Differences were compared with Student’s t test or the Mann-Whitney U test for continuous variables, and with the χ² test or Fisher’s exact test for categorical variables, where applicable.

Feature selection and diagnostic models

Serum tumor markers and LI-RADS v2018 imaging features were selected according to stepwise algorithms to generate diagnostic models for M-CC on a per-patient basis as detailed below.

First, LI-RADS v2018 imaging features were selected based on the following: inter-rater agreement with Fleiss kappa ≥ 0.2; and prevalence of the feature in the dataset of at least 10% but no greater than 90%. Of note, features either too rare or too common were excluded considering their limited statistical utilities in differentiating between different LR-M entities to minimize potential model over-fitting. Along with four controlling factors (patient age, sex, underlying cirrhosis [present vs. absent], and lesion size), the remaining imaging features were entered into a multivariate logistic regression analysis using the stepwise method to construct a radiological model (Model R) for M-CC.
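The two screening criteria in this step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the helper names `fleiss_kappa` and `keep_feature` are hypothetical, ratings are coded 0/1 per reader, and prevalence is assumed to be computed on the majority (consensus) reading.

```python
def fleiss_kappa(ratings):
    """Fleiss kappa for binary ratings; ratings[i] holds every reader's 0/1 call for lesion i."""
    n_lesions, n_readers = len(ratings), len(ratings[0])
    # Number of readers choosing "absent" and "present" for each lesion.
    counts = [(r.count(0), r.count(1)) for r in ratings]
    # Mean observed agreement across lesions.
    p_obs = sum(
        (sum(c * c for c in row) - n_readers) / (n_readers * (n_readers - 1))
        for row in counts
    ) / n_lesions
    # Agreement expected by chance from the overall category shares.
    shares = [sum(row[j] for row in counts) / (n_lesions * n_readers) for j in (0, 1)]
    p_exp = sum(s * s for s in shares)
    if p_exp == 1.0:
        return float("nan")  # undefined when every rating is identical
    return (p_obs - p_exp) / (1 - p_exp)

def keep_feature(ratings, kappa_min=0.2, prev_min=0.10, prev_max=0.90):
    """Apply the agreement and prevalence criteria described in the text."""
    present = [sum(r) * 2 > len(r) for r in ratings]  # majority reading per lesion
    prevalence = sum(present) / len(present)
    return fleiss_kappa(ratings) >= kappa_min and prev_min <= prevalence <= prev_max

# Hypothetical ratings of one feature: four lesions, three readers.
example = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
```

For the hypothetical `example`, the kappa is 1/3 and the consensus prevalence is 50%, so the feature would pass both criteria. A feature rated identically in every lesion yields an undefined kappa and is rejected.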

Second, for individual serum tumor markers, we used receiver operating characteristic curve analysis to evaluate their respective diagnostic performances and optimal threshold values for M-CC.
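A small Python sketch of this kind of analysis is given below. The text does not state how the optimal threshold was chosen, so maximizing the Youden index (sensitivity + specificity - 1) is an assumption made here for illustration, and the marker values are invented, not study data.

```python
def auc(values, labels):
    """AUC as the probability that a positive case has a higher value than a negative case."""
    pos = [v for v, y in zip(values, labels) if y]
    neg = [v for v, y in zip(values, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_threshold(values, labels):
    """Return (threshold, sensitivity, specificity) for the rule value > threshold
    that maximizes the Youden index; ties keep the lowest threshold."""
    pos = [v for v, y in zip(values, labels) if y]
    neg = [v for v, y in zip(values, labels) if not y]
    best = None
    for t in sorted(set(values)):
        sens = sum(v > t for v in pos) / len(pos)
        spec = sum(v <= t for v in neg) / len(neg)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1:]

# Hypothetical marker values; label 1 = M-CC, 0 = non-M-CC.
marker = [10, 15, 20, 40, 30, 50, 70, 90]
is_mcc = [0, 0, 0, 0, 1, 1, 1, 1]
```

Here `auc(marker, is_mcc)` is 0.9375 and `youden_threshold(marker, is_mcc)` returns a threshold of 20 with sensitivity 1.0 and specificity 0.75.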

Finally, all three serum tumor markers and the imaging features meeting the inter-rater agreement and prevalence criteria described previously were entered into a multivariate logistic regression analysis to generate a fusion model (Model F) with adjustment for the same controlling factors.

Performances of the diagnostic models

Performances of the diagnostic models for M-CC were evaluated using area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value, and accuracy. McNemar’s test and the DeLong test were used to compare pairwise sensitivities, specificities, and AUCs, where applicable.
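For paired comparisons of sensitivity or specificity, McNemar's test depends only on the discordant pairs, that is, patients classified correctly by one model but not the other. The sketch below uses the exact binomial form of the test; whether the authors used the exact or the chi-square version is not stated, so this is an assumption, and the function name `mcnemar_exact` is hypothetical.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value.

    b: pairs where only the first model is correct.
    c: pairs where only the second model is correct.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) probability of a split at least as extreme as the one observed.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, nine discordant pairs that all favor one model give `mcnemar_exact(0, 9)` = 2/512, whereas an even split such as `mcnemar_exact(3, 3)` gives 1.0.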

All statistical analyses were performed with statistical software (R version 3.3.1, R Foundation for Statistical Computing; or SPSS version 25, IBM). A p value of less than 0.05 was considered statistically significant.

Results

Patients

Key characteristics of the included patients and liver lesions are summarized in Table 1 and Supplementary Material 2. In total, 88.9% (40/45) of the included patients were chronic hepatitis B virus carriers and 26.7% (12/45) had established cirrhosis. Respectively, 46.7% (21/45) and 53.3% (24/45) patients were diagnosed with M-CC and non-M-CC.

Table 1 Clinical-pathologic characteristics of patients and liver lesions

Among all included patients, 60.0% (27/45) had single LR-M lesions, while 40.0% (18/45) had multiple LR-M lesions (number of lesions, 2–8 per patient). Of 113 total observed LR-M lesions (median size, 17 mm; interquartile range, 10–53 mm), the 45 largest lesions were included in subsequent analyses (median size, 59 mm; interquartile range, 42–83 mm). No significant difference in size of the dominant lesions between patients with M-CC and non-M-CC (p = 0.55) was observed.

M-CCs were associated with significantly higher levels of CA19-9 (39.3 U/mL vs. 15.9 U/mL, p = 0.043) compared with non-M-CCs. However, there were no significant differences in AFP (9.2 ng/mL vs. 4.3 ng/mL, p = 0.553) or CEA levels (2.3 ng/mL vs. 2.2 ng/mL, p = 0.776, Table 1). Among M-CCs, significantly higher levels of AFP were observed among patients with c-HCC-CCA than those with ICC (504.3 ng/mL vs. 4.9 ng/mL, p < 0.001, Supplementary Material 2).

Diagnostic models and performance

Radiological model

According to the selection algorithm detailed in Fig. 2 and Table 2, LI-RADS v2018 imaging features were removed based on low inter-rater agreement (n = 22) and prevalence less than 10% or greater than 90% (n = 3). Three features were left after these two steps, but “mosaic architecture” and “tumor in vein” dropped out of the model at multivariate logistic regression analysis. Therefore, only the feature “blood products in mass” was eventually included in Model R, which demonstrated substantial inter-rater agreement (Fleiss kappa = 0.71), was present in 36% (16/45) of lesions, and had an odds ratio of 0.053 (p = 0.006) for M-CC. Model R (no blood products in mass) had an AUC of 0.655, sensitivity of 81% (17/21), specificity of 50% (12/24), and accuracy of 64% (29/45).

Fig. 2

Workflow of generating the diagnostic models for LR-M

Table 2 Results and distributions of LI-RADS v2018 imaging features of the largest lesion per patient using consensus data

Fusion model

Ultimately, Model F was constructed with three predictors: no blood products in mass; AFP > 4.8 ng/mL; and CA19-9 > 38 U/mL (Fig. 2). A total of 18% (8/45), 42% (19/45), 96% (43/45), and 4% (2/45) of patients had all three, two or more, one or more, and none of the above predictors present, respectively. The corresponding risks of M-CC were 98%, 86%, 49%, and 1% in these categories, respectively (Table 3).
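The predictor count underlying these categories can be written as a short Python sketch. The cutoffs are those given in the text; the function names and the default of two or more predictors as the positive cutoff are illustrative assumptions, and the sketch is a simplification of the fitted logistic regression model, not a replacement for it.

```python
AFP_CUTOFF = 4.8     # ng/mL, from the text
CA199_CUTOFF = 38.0  # U/mL, from the text

def count_predictors(blood_products_in_mass, afp, ca199):
    """Count how many of the three Model F predictors are present."""
    return sum([
        not blood_products_in_mass,  # absence of the feature favors M-CC
        afp > AFP_CUTOFF,
        ca199 > CA199_CUTOFF,
    ])

def classify(blood_products_in_mass, afp, ca199, min_predictors=2):
    """Label a patient by predictor count (assumed default cutoff: two or more)."""
    n = count_predictors(blood_products_in_mass, afp, ca199)
    return "M-CC" if n >= min_predictors else "non-M-CC"

# Values reported for the representative cases in Figs. 3 and 4.
fig3 = dict(blood_products_in_mass=False, afp=3.44, ca199=8.46)
fig4 = dict(blood_products_in_mass=False, afp=46.07, ca199=42.65)
```

Applied to the two representative cases, `count_predictors(**fig3)` is 1 and `classify(**fig3)` is "non-M-CC", while `count_predictors(**fig4)` is 3 and `classify(**fig4)` is "M-CC", matching the figure legends.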

Table 3 Risk of M-CC according to Model F

The overall AUC, sensitivity, specificity, and accuracy of Model F for M-CC were 0.862, 76% (16/21), 88% (21/24), and 82% (37/45), respectively. Detailed performance characteristics for all models are summarized in Table 4.
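The reported percentages follow directly from the patient counts given in the text, as the Python sketch below shows. The function name `diagnostic_metrics` is hypothetical; the counts are derived from the reported fractions (for Model F, 16/21 M-CCs and 21/24 non-M-CCs correctly classified; for Model R, 17/21 and 12/24). The predictive values computed here are simply what those counts imply, and Table 4 remains the reference.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard per-patient diagnostic metrics from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Counts implied by the fractions reported in the text.
model_f = diagnostic_metrics(tp=16, fn=5, tn=21, fp=3)
model_r = diagnostic_metrics(tp=17, fn=4, tn=12, fp=12)
```

Rounded to whole percentages, `model_f` reproduces the reported 76% sensitivity, 88% specificity, and 82% accuracy, and `model_r` reproduces 81%, 50%, and 64%.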

Table 4 Per-patient performances of the diagnostic models for M-CC using consensus data

Comparisons between model performances

Model F demonstrated significantly superior AUC (0.862 vs. 0.655, p = 0.003) and specificity (88% vs. 50%, p = 0.002), with similar sensitivity (76% vs. 81%, p = 0.706), compared with Model R (Table 4).

The performances of Model F variants using specific numbers of present predictors are summarized in Table 4. The variant using two or more predictors present performed significantly better than the variants using three predictors present (p = 0.047) or one or more predictors present (p < 0.001) based on AUC.

Representative cases of non-M-CC and M-CC are shown in Figs. 3 and 4, respectively.

Fig. 3

Gadoxetate disodium–enhanced MR images of a 53-year-old male with chronic hepatitis B. Precontrast images (a) showed a 7.2-cm hypointense LR-M mass predominantly in segment IV with no “blood products in mass.” The mass demonstrated rim arterial phase hyperenhancement (b), mild central portal venous phase enhancement (c), hepatobiliary phase hypointensity (d), and non-targetoid restriction on diffusion-weighted images (e, b = 1200 s/mm²). The patient’s serum CA19-9 and AFP levels were 8.46 U/mL and 3.44 ng/mL, respectively (one predictor present), so a non-M-CC diagnosis was established by Model F. The mass was histopathologically proven as hepatocellular carcinoma, with a representative section with hematoxylin-eosin staining at × 100 magnification shown (f)

Fig. 4

Gadoxetate disodium–enhanced MR images of a 46-year-old male with chronic hepatitis B. Precontrast images (a) showed a 5.2-cm hypointense LR-M mass in segments II and IV with no “blood products in mass.” The mass demonstrated rim arterial phase hyperenhancement (b), central portal venous phase enhancement (c), hepatobiliary phase hypointensity (d), and non-targetoid restriction on diffusion-weighted images (e, b = 1200 s/mm²). The patient’s serum CA19-9 and AFP levels were 42.65 U/mL and 46.07 ng/mL, respectively (three predictors present), so an M-CC diagnosis was established by Model F. The mass was histopathologically proven as intrahepatic cholangiocarcinoma, with a representative section with hematoxylin-eosin staining at × 100 magnification shown (f)

Discussion

Although it remains a challenging area in liver imaging, the prediction of histological tumor type for LR-M lesions may be enhanced by the assessment of serum tumor markers. By analyzing 45 at-risk patients with LR-M lesions, we found that combining CA19-9 > 38 U/mL and AFP > 4.8 ng/mL with the absence of the LI-RADS v2018 feature “blood products in mass” on EOB-MRI was the best available model for detecting M-CC lesions (sensitivity of 76%, specificity of 88%). When the presence of all three predictors was used as the cutoff, the model provided excellent specificity (100%) for M-CC lesions, though with reduced sensitivity (38%).

Although substantial overlap was found between the imaging features of M-CC and non-M-CC tumors, “blood products in mass” was identified as a significant predictor favoring non-M-CC in our study. Defined as an ancillary feature “favoring HCC in particular” by LI-RADS v2018 [1], presence of “blood products in mass” is suggestive of rapid growth of hypervascular tumors which are prone to hemorrhage [26]. This feature has been shown useful in differentiating between HCC and non-HCC malignancies in at-risk patients in other studies [19, 27]. In contrast, most M-CCs are characterized by prominent desmoplastic and hypovascularized tumor stroma [7, 9], therefore rarely presenting with internal hemorrhage [26].

Previous works have reported promising performances of LR-M imaging features, in particular targetoid appearance, for distinguishing between HCC and other liver malignancies in general at-risk patients, though not necessarily those with LR-M lesions specifically [14,15,16,17,18,19,20,21]. Other imaging features, including capsule appearance, intralesional fat, peritumoral bile duct dilatation, surface retraction, septum, and T2 hyperintense foci, have been variably found useful in some publications but not others [15, 18, 20, 21]. However, without being framed specifically in the context of LR-M lesions, these findings can be difficult to apply. Our results demonstrated very limited incremental value of any of the individual LR-M imaging features in discriminating between different tumor types within the LR-M category. It was noteworthy that, to avoid substantial model over-fitting, a large proportion of features (79%) were excluded due to poor to slight inter-rater agreement. Nevertheless, agreement levels on LI-RADS imaging features were generally higher in previous studies [14,15,16,17,18,19,20,21]. This discrepancy could have been a result of the limited sample size of the current study. Additionally, rather than evaluating agreement between two reviewers [14,15,16,17,18,19], agreement was measured between three independent reviewers, and this difference in design might also have introduced more prominent inter-rater variability.

While most literature attempting to differentiate histological subtypes of LR-M lesions has focused on imaging features exclusively, we found that integrating the serum tumor markers CA19-9 and AFP into a diagnostic model can significantly aid in this task. Similar to our results, CA19-9 has been reported to be associated with cholangiocarcinoma in previous studies. Prior work conducted on healthy populations and those with chronic liver diseases and/or primary sclerosing cholangitis has found CA19-9 helpful in diagnosing cholangiocarcinoma, and the most commonly reported threshold values have been approximately 37 U/mL or 100 U/mL [22, 23, 28]. The CA19-9 threshold value derived in our study was 38 U/mL, at the lower end of the thresholds reported in the published literature. This could be in part attributable to our specific patient population (LR-M patients with chronic hepatitis B virus infection). Another possible explanation is that other publications include perihilar and extrahepatic cholangiocarcinomas while LR-M lesions are typically peripheral, and the severe biliary obstruction often associated with central lesions typically elevates CA19-9 values. Notably, the diagnostic role of CA19-9 in isolation remains a matter of debate due to overlap with other benign and malignant diseases [9]. However, because patients with LR-M lesions have a very high risk of malignancy relative to other populations [4], the use of CA19-9 in differentiating between M-CC and non-M-CC may be more reliable in this setting.

AFP is the most widely utilized tumor marker in patients with HCC, and an elevated AFP level is a risk factor for HCC [13, 29, 30]. Interestingly, although AFP values greater than 8–200 ng/mL have been reported as indicative of HCC in at-risk patients [18, 29, 31], we found that AFP > 4.8 ng/mL was a significant predictor for M-CC, rather than non-M-CC. One potential cause for this counterintuitive result was that c-HCC-CCA, a biphenotypic tumor frequently associated with elevation of both AFP and CA19-9 [6, 20, 32], was considered as M-CC in this study due to similar treatment strategy and prognosis to ICC. In the current study, patients with c-HCC-CCA had markedly higher AFP values compared with those with ICCs (Supplementary Material 2). However, this might have skewed our model and reduced the sensitivity for the ICC M-CCs. Therefore, the utility of AFP in discriminating between different LR-M entities remains unclear, and efforts in larger-scale populations to confirm or refute our finding should continue to be actively pursued.

In current treatment paradigms, optimal management of patients with LR-M lesions is often dictated by the presence or absence of a cholangiocarcinoma element within the tumor [6, 8, 9, 13, 29, 30]. Therefore, we focused our models on the detection of lesions with cholangiocarcinoma components (including c-HCC-CCA) to refine the diagnosis of these challenging lesions. According to Model F, LR-M patients with all three criteria fulfilled should not undergo transplantation given the 98% risk of M-CC; otherwise, they may require biopsy to establish a definite diagnosis. When all three criteria are fulfilled, biopsy could be obviated in approximately 18% of LR-M patients with excellent specificity. However, it should be noted that most included patients were beyond the Milan criteria [29]. Nevertheless, liver transplantation could still have been feasible after successful downstaging; hence, M-CC detection is clinically relevant for proper downstream management in this context.

Our study had several limitations. First, only 45 patients were included in this study. While this limited the statistical power of our study, LR-M lesions are relatively unusual, and it is challenging to collect larger cohorts in a single-center study. Additionally, although the sample was drawn from a consecutively enrolled prospective cohort, and feature reduction was performed before logistic regression analysis, there remains the possibility of selection bias and overfitting. Also, the relatively low prevalence of cirrhosis and high rate of hepatitis B virus infection of the enrolled patients might limit the application of our findings in a Western population. Second, all LI-RADS v2018 imaging features were evaluated based on EOB-MRI, but LI-RADS has been only minimally adapted for EOB-MRI because it was originally designed for MRI using extracellular contrast agents [33, 34]. Additionally, EOB is much less frequently used than extracellular contrast agents in Western countries. This might limit the generalizability of our model and highlights the need for further assessment based on extracellular contrast agent–enhanced MRI. Third, patients with concomitant LR-4 and/or LR-5 lesions were not included in the current study so as to concentrate our scope on patients who had LR-M lesions exclusively and to eliminate the confounding effects of LR-4/5 lesions on serum tumor markers. However, excluding these patients could have restricted the generalizability of our model. Fourth, in patients with multiple LR-M lesions, the largest lesion for each patient was selected for analysis and assumed to be the major driver of the tumor markers. However, we did not have a rational way to account for the effect of multiple lesions on these lab results. Finally, due to the single-center nature of this study, no external validation was available to test and refine our models. Therefore, although we found serum tumor markers valuable in our clinical setting, the findings still mandate further validation.

In conclusion, a model integrating CA19-9, AFP, and the LI-RADS feature “blood products in mass” achieved high diagnostic performance for cholangiocarcinoma-containing tumors (M-CCs) in at-risk patients with LR-M lesions. When all three criteria were fulfilled, the specificity for M-CC was 100%, which might reduce or eliminate the need for biopsy confirmation.