Introduction

To standardize the performance, interpretation, and reporting of liver imaging in patients at risk for hepatocellular carcinoma (HCC), the Liver Imaging Reporting and Data System (LI-RADS) was introduced in 2011 [1], updated in 2018 [2], and fully integrated into the American Association for the Study of Liver Diseases (AASLD) 2018 HCC clinical practice guidance [3]. LI-RADS assigns categories to liver observations based on the presence of major and ancillary imaging features [2]. These categories reflect the relative likelihood of benignity and HCC (i.e., LR-1 to LR-5), as well as the differentiation from other malignancies (LR-M). The major features (MFs) include size, non-rim arterial-phase hyperenhancement, non-peripheral washout, enhancing capsule, and threshold growth. They are used to assign the LR-3, LR-4, and LR-5 categories, and the diagnosis of HCC currently relies on MFs alone [4]. In contrast, ancillary features (AFs) may optionally be used to improve detection, increase confidence, or modify the category after the MFs have been applied; AFs can upgrade the category by one step up to LR-4, but cannot be used to upgrade it to LR-5 [5].

Several rules for AF application are provided in LI-RADS v2018 [2]. However, the decision on whether to apply AFs when assigning a category is left to the radiologist’s discretion [2]. Although this flexibility may encourage wide adoption of LI-RADS by minimizing its complexity [6], it may also lead to widely varying changes in LI-RADS category assignments, with the reported proportion of LR-3 observations upgraded to LR-4 ranging from 20.5 to 97.1% [5, 7, 8, 9]. Recently, a few studies have reported that the use of AFs in combination with the MFs could increase the diagnostic performance for HCC, but they did not elucidate how to apply the AFs when assigning a category [5].

Therefore, the purpose of this study was to determine both the frequency of occurrence of each AF in the LI-RADS v2018 and its strength of association with HCCs, and to develop an appropriate strategy for applying AFs to improve the diagnosis of HCC ≤ 3 cm on gadoxetate disodium–enhanced magnetic resonance imaging (MRI) through the use of a historical cohort study.

Materials and methods

Our institutional review board approved this retrospective study, and the requirement for informed consent was waived.

Study subjects

From January to December 2016, 3854 patients at risk for HCC underwent surveillance ultrasound (Fig. 1). According to the AASLD practice guidance [3], patients who had hepatic nodules ≥ 1 cm in diameter detected on ultrasound were referred for further evaluation, including dynamic computed tomography or MRI. A total of 746 patients underwent gadoxetate disodium–enhanced MRI for the evaluation of suspicious nodules detected during ultrasound surveillance. Patients were included according to the following criteria: (a) focal hepatic solid nodules on MRI, (b) nodule size ≤ 3.0 cm, (c) number of nodules ≤ 5, and (d) the nodule not being definitely or probably benign (LR-1 or LR-2) on MRI, such as a cyst, hemangioma, perfusion alteration, hepatic fat deposition or sparing, hypertrophic pseudomass, confluent fibrosis, or focal scar [7]. Of the 569 available nodules in 396 patients that met these criteria, 184 nodules were excluded because of a lack of final diagnosis due to a follow-up period of less than 24 months (38 nodules in 18 patients), or immediate locoregional treatment such as transcatheter arterial chemoembolization (TACE) or radiofrequency ablation (RFA) without biopsy or marginal tumor recurrence (146 nodules in 112 patients). Finally, 385 nodules in 266 patients were analyzed in this study.

Fig. 1 Flowchart of study population. US, ultrasound; HCC, hepatocellular carcinoma; MRI, magnetic resonance imaging

MRI techniques

MRI was performed on 1.5-T or 3-T scanners. Detailed techniques are given in the Supplementary Methods and Supplementary Table 1.

Image analysis

Two board-certified abdominal radiologists (each with > 7 years of experience in hepatic imaging), who were blinded to clinical history and final diagnosis, independently reviewed the MRI. In the case of any discrepancy between the two readers, they re-evaluated the MRI together to reach a consensus.

The readers analyzed nodule size and location, and the presence or absence of MFs (non-rim arterial-phase hyperenhancement, non-peripheral washout, or enhancing capsule), targetoid mass features, and AFs, according to the LI-RADS v2018 [2]. AFs included findings favoring malignancy in general, favoring HCC in particular, and favoring benignity (Supplementary Table 2). Subthreshold growth, size stability over 2 or more years, and size reduction were not included, because hepatic nodules ≥ 1 cm in diameter initially detected on ultrasound were included, and no previous examination was available for comparison.

Reference standard

The final diagnoses for the 385 nodules in 266 patients were as follows: (a) 283 nodules in 225 patients were determined as HCC on the basis of pathological evidence (184 nodules by resection or explantation, and 11 by biopsy), marginal recurrence after RFA or TACE (79 nodules), or interval growth of the lesion on follow-up images (nine nodules; mean, 4.2 months [range, 1–9]); (b) 18 nodules in 18 patients were determined as non-HCC malignancies by pathological diagnosis (17 nodules by resection or explantation, and one by biopsy), including 11 combined HCC and cholangiocarcinomas, six cholangiocarcinomas, and one metastatic adenocarcinoma; and (c) 84 nodules in 56 patients were determined to be benign by pathological specimens (29 nodules, including 13 nodules by surgical resection, three nodules by biopsy, and 13 nodules by explantation for liver transplantation) or a stable or regressed nodule size for at least 24 months (55 nodules; mean, 29.6 months [range, 24–37]). The 29 benign nodules determined by histopathologic proof were eight high-grade dysplastic nodules, 16 low-grade dysplastic nodules, one biliary adenoma, one hemangioma, one angiomyolipoma, one hyalinized nodule, and one chronic granulomatous inflammation. Of the 266 patients, 29 had both HCC (35 nodules) and benign lesions (39 nodules), three had both HCC (three nodules) and non-HCC malignancy (three nodules), and one had both non-HCC malignancy (one nodule) and benign lesions (two nodules).

Statistical analysis

All analyses were performed on a per-lesion basis. The frequency of occurrence of each AF was recorded for both the HCC and non-HCC groups. To determine the strength of association between HCC diagnosis and imaging features, the diagnostic odds ratio (DOR) of each MF and AF was calculated using a logistic regression model with generalized estimating equations to adjust for clustering effects [10]. Inter-observer agreement on the presence of AFs was assessed using the overall proportion of agreement and kappa statistics.
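
As a rough illustration of what a DOR expresses, the sketch below computes a crude (unadjusted) DOR from a 2 × 2 table with a Woolf log-scale confidence interval. This is not the clustered logistic regression described above: it ignores within-patient correlation, and the 0.5 continuity correction for empty cells is our assumption. The function name and the counts are hypothetical and are not taken from Table 3.

```python
import math

def diagnostic_odds_ratio(tp, fp, fn, tn, z=1.96):
    """Crude DOR with a Woolf (log-scale) 95% CI.

    tp / fn: HCC nodules with / without the feature.
    fp / tn: non-HCC nodules with / without the feature.
    If any cell is zero, 0.5 is added to every cell (Haldane-Anscombe
    correction) so that the ratio stays finite.
    """
    cells = [tp, fp, fn, tn]
    if any(c == 0 for c in cells):
        tp, fp, fn, tn = (c + 0.5 for c in cells)
    dor = (tp * tn) / (fp * fn)
    se = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lower = math.exp(math.log(dor) - z * se)
    upper = math.exp(math.log(dor) + z * se)
    return dor, lower, upper

# Hypothetical counts: feature seen in 20 of 283 HCCs and in 0 of 102 non-HCCs.
dor, lower, upper = diagnostic_odds_ratio(tp=20, fp=0, fn=263, tn=102)
print(f"DOR {dor:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

A feature observed only in the HCC group leaves an empty cell, which is why some correction or model-based estimate is needed to obtain a finite DOR; the resulting interval is typically very wide.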

To develop the best strategy for applying AFs to improve diagnostic performance for HCC ≤ 3 cm, we counted the total number of observed AFs favoring malignancy in general or HCC in particular. When AFs favoring malignancy in general or HCC in particular and AFs favoring benignity were present simultaneously, we did not adjust the category determined using MFs only, in accordance with LI-RADS v2018 [2]. When only AFs favoring malignancy in general or HCC in particular were present, we developed several criteria using cutoff points based on the number of such AFs, regardless of their DORs, and adjusted the MF-only category according to each criterion.
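
The count-based rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, only the upgrade of LR-3 to LR-4 is modeled, and downgrades or any handling of LR-4 and LR-5 observations are deliberately left out because the text does not spell them out.

```python
def adjust_category(mf_category, n_af_malignant, n_af_benign, cutoff):
    """Return the category after applying a count-based AF criterion.

    mf_category:    category from major features only, e.g. "LR-3".
    n_af_malignant: number of observed AFs favoring malignancy in general
                    or HCC in particular.
    n_af_benign:    number of observed AFs favoring benignity.
    cutoff:         minimum n_af_malignant required for an upgrade.
    """
    # Conflicting AFs: keep the MF-only category unchanged.
    if n_af_malignant > 0 and n_af_benign > 0:
        return mf_category
    # Only AFs favoring malignancy/HCC are present and the cutoff is met.
    # AFs may upgrade by one category up to LR-4, never to LR-5.
    if mf_category == "LR-3" and n_af_malignant >= cutoff:
        return "LR-4"
    return mf_category

# An LR-3 observation with four AFs favoring malignancy/HCC and none favoring benignity.
print(adjust_category("LR-3", n_af_malignant=4, n_af_benign=0, cutoff=4))
```

Evaluating the same observations with cutoffs from 1 to 5 gives the five criteria compared in this study.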

The diagnostic performance for HCC ≤ 3 cm, i.e., sensitivity, specificity, positive predictive value, negative predictive value, and accuracy, was first calculated when only MFs were considered. Thereafter, the diagnostic performance of each criterion was calculated when AFs were considered in addition to MFs. To evaluate each criterion, its sensitivity and specificity were compared with the values obtained before the addition of the AFs using generalized estimating equation models.
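
For reference, the five performance measures can be computed from per-lesion counts as in the sketch below. The counts are those reported in the Results for the combined LR-4 and LR-5 categories (MFs only: 198/283 and 92/102; AFs ≥ 4: 228/283 and 87/102); the function name is hypothetical, and the predictive values and accuracy it prints follow arithmetically from those counts rather than being quoted from Table 4.

```python
def diagnostic_performance(tp, fp, fn, tn):
    """Per-lesion diagnostic performance from a 2 x 2 table of counts."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
    }

# Test-positive = LR-4 or LR-5; 283 HCCs and 102 non-HCCs in total.
mf_only = diagnostic_performance(tp=198, fp=10, fn=85, tn=92)
af_ge_4 = diagnostic_performance(tp=228, fp=15, fn=55, tn=87)

for name, perf in [("MFs only", mf_only), ("MFs + AFs >= 4", af_ge_4)]:
    print(name, {k: f"{100 * v:.1f}%" for k, v in perf.items()})
```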

Subgroup analysis was performed in each LI-RADS category group (i.e., LR-3, LR-4, and LR-5) to determine the LI-RADS categories in which AFs can improve diagnostic accuracy. The change in diagnostic performance after applying AFs was also evaluated using McNemar’s test and Bonferroni’s correction to compare the sensitivity and specificity before and after application of the AFs.

Statistical analyses were performed using SAS version 9.4 (SAS Institute). A p value < 0.05 was considered to indicate a significant difference. For multiple comparisons, Bonferroni’s correction was applied, giving an adjusted significance threshold of p < 0.01.

Results

Patients

The baseline characteristics of the 266 patients and 385 nodules are summarized in Table 1. There were 216 men (mean age, 61.2 years; range, 37–86 years) and 50 women (mean age, 61.7 years; range, 33–84 years). Hepatitis B was the most common cause of chronic liver disease (n = 210, 78.9%), followed by alcoholic liver disease (n = 24, 9.0%). The sizes of the 385 nodules ranged from 5 to 30 mm (mean, 18.3 ± 7.7 mm). The median value of total bilirubin in the 266 patients was 0.7 mg/dL (range, 0.2–6.0).

Table 1 Baseline characteristics of the 266 patients and 385 nodules

Major and ancillary imaging features

The LI-RADS category assignments using MFs only were as follows: LR-3, 154 nodules (40.0%); LR-4, 44 nodules (11.4%); LR-5, 164 nodules (42.6%); LR-TIV, 4 nodules (1.0%); and LR-M, 19 nodules (4.9%). Of the 283 HCCs, 159 (56.2%) were categorized as LR-5, while 82.4% (84/102) of the non-HCCs were categorized as LR-3. The three MFs of non-rim arterial-phase hyperenhancement, non-peripheral washout, and enhancing capsule were more common in the HCC group than in the non-HCC group (Table 2). Of these three MFs, non-rim arterial-phase hyperenhancement had the strongest association with HCC (DOR 21.51; 95% confidence interval [CI] 10.18–45.47), followed by enhancing capsule (DOR 10.04; 95% CI 1.22–82.31) (Table 2).

Table 2 Frequencies and diagnostic odds ratios of the major features

Differences in the frequencies and DORs of AFs between the two groups are summarized in Table 3. Of the AFs favoring HCC in particular, three features (non-enhancing capsule, nodule-in-nodule architecture, and mosaic architecture) were noted only in the HCC group. Mosaic architecture had the strongest association with HCC (DOR 15.95; 95% CI 0.89–285.37) among the five AFs favoring HCC in particular. All AFs favoring malignancy in general, not HCC in particular, were noted more frequently in the HCC group than in the non-HCC group. Of these, hepatobiliary-phase hypointensity had the strongest association with HCC (DOR 21.82; 95% CI 5.59–85.20), followed by restricted diffusion (DOR 16.45; 95% CI 8.85–30.57). The frequency of each AF favoring benignity in the HCC group was 1% or less, which was lower than that in the non-HCC group.

Table 3 Frequencies and diagnostic odds ratios of the ancillary features

The proportions of inter-observer agreement and kappa values for each AF ranged from 85.5% (329/385) to 99.2% (382/385) and from 0.0 to 0.75, respectively (Supplementary Table 3). Of the 17 AFs, iron sparing in solid mass showed the highest proportion of agreement (99.2%), and corona enhancement and transitional-phase hypointensity showed the lowest proportion of agreement (85.5%). In addition, mild-moderate T2 hyperintensity showed the highest kappa value (κ = 0.75), and iron sparing in solid mass showed the lowest kappa value (κ = 0.0). This discrepancy between kappa values and proportions of agreement reflects the well-known kappa paradox, which arises when the prevalence of a feature is very low, as was the case for some ancillary features (Table 3) [11].
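
The kappa paradox can be illustrated with a short sketch. The agreement of 382/385 for iron sparing in solid mass is taken from the text; how the three discordant readings were split between the two readers is not reported, so the split used here (reader A positive in three nodules, reader B in none) is a hypothetical choice made only to show how near-perfect agreement can coexist with κ = 0.

```python
def agreement_and_kappa(both_present, only_a, only_b, both_absent):
    """Overall proportion of agreement and Cohen's kappa for two readers."""
    n = both_present + only_a + only_b + both_absent
    p_observed = (both_present + both_absent) / n
    a_pos = both_present + only_a  # nodules rated positive by reader A
    b_pos = both_present + only_b  # nodules rated positive by reader B
    # Agreement expected by chance from each reader's marginal rates.
    p_expected = (a_pos * b_pos + (n - a_pos) * (n - b_pos)) / (n * n)
    if p_expected == 1:
        return p_observed, float("nan")  # kappa is undefined
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa

# Hypothetical split of the 3 discordant nodules out of 385.
p_obs, kappa = agreement_and_kappa(both_present=0, only_a=3, only_b=0, both_absent=382)
print(f"agreement {100 * p_obs:.1f}%, kappa {kappa:.2f}")
```

Because almost every nodule is rated negative by both readers, the agreement expected by chance is nearly as high as the observed agreement, which leaves kappa at or near zero.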

Diagnostic performance of criteria using ancillary features

The diagnostic performances of the LI-RADS categorizations for HCC ≤ 3 cm before and after adding the AFs to the MFs are summarized in Table 4. When we used MFs only, the sensitivity and specificity of the LR-5 category for diagnosis of HCC ≤ 3 cm were 56.2% (159/283) and 95.1% (97/102), respectively, while those of the LR-4 and LR-5 categories combined were 70.0% (198/283) and 90.2% (92/102), respectively.

Table 4 Diagnostic performance of LI-RADS for HCC in 385 nodules before and after the additional application of ancillary features to major features

We developed five criteria using the number of AFs favoring malignancy in general or HCC in particular as cutoff points, regardless of their DORs, i.e., a number of AFs ≥ 1 to ≥ 5. As the required number of AFs increased, the sensitivity decreased but the specificity increased. The sensitivities of the criteria with AFs ≥ 1 to AFs ≥ 3 were significantly higher than that of MFs only (86.6–93.6% [245/283–265/283] vs. 70.0% [198/283]; p < 0.001), while the specificities were significantly lower than that of MFs only (22.5–72.5% [23/102–74/102] vs. 90.2% [92/102]; p < 0.001). The sensitivity and specificity of the criterion of AFs ≥ 5 were not significantly different from those of MFs only (p ≥ 0.120). However, the criterion of AFs ≥ 4 had significantly higher sensitivity than that of MFs only (80.6% [228/283] vs. 70.0% [198/283]; p < 0.001), while the specificity was not significantly different (85.3% [87/102] vs. 90.2% [92/102]; p = 0.060; Fig. 2). With this criterion (AFs ≥ 4), the combination of hepatobiliary-phase hypointensity, transitional-phase hypointensity, mild-moderate T2 hyperintensity, and restricted diffusion was the most common one (84.4%, 205/243).

Fig. 2 A 60-year-old man with chronic hepatitis B and a surgically confirmed hepatocellular carcinoma. a A 15-mm nodule in hepatic segment VI shows non-rim arterial-phase hyperenhancement (arrow), without non-peripheral washout on (b) the portal venous phase and without an enhancing capsule on (b) the portal venous phase or (c) the transitional phase. The nodule was categorized as LR-3 when only major features were considered. The nodule (arrow) shows (c) transitional-phase hypointensity, (d) hepatobiliary-phase hypointensity, (e) mild T2 hyperintensity, and (f) restricted diffusion (b = 900 s/mm²). After application of these four ancillary features in addition to the major features, this nodule was upgraded to LR-4. This nodule was confirmed as hepatocellular carcinoma after surgical resection

Subgroup analyses for validating the effects of ancillary features

In the LR-3 category subgroup (Supplementary Table 4), significant increases in diagnostic accuracy were noted with the criteria of AFs ≥ 3 and AFs ≥ 4 (p ≤ 0.001). The criterion of AFs ≥ 4 had significantly higher sensitivity than that of MFs only (42.9% [30/70] vs. 0.0% [0/70]; p < 0.001), although the specificity was not significantly different (94.0% [79/84] vs. 100.0% [84/84]; p = 0.025, which did not meet the Bonferroni-adjusted threshold of p < 0.01). Among the LR-3 observations upgraded to LR-4, the most common combination of AFs was hepatobiliary-phase hypointensity, transitional-phase hypointensity, mild-moderate T2 hyperintensity, and restricted diffusion (80.0%, 28/35).
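
The paired comparison in the LR-3 subgroup can be sketched with McNemar’s test. Because no LR-3 observation is test-positive with MFs only, every reclassified nodule is a discordant pair in one direction: 30 HCCs for sensitivity and 5 non-HCCs for specificity. The asymptotic form without continuity correction is our assumption, the function name is hypothetical, and the sketch ignores clustering of nodules within patients, so it should be read as an illustration rather than as the authors’ exact computation.

```python
import math

def mcnemar_chi2(b, c):
    """Asymptotic McNemar test without continuity correction.

    b, c: numbers of discordant pairs in each direction.
    Returns the chi-square statistic and its two-sided p value (1 df).
    """
    if b + c == 0:
        return 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

ALPHA_ADJUSTED = 0.01  # Bonferroni-adjusted threshold stated in the Methods

# LR-3 subgroup, AFs >= 4: 30 of 70 HCCs and 5 of 84 non-HCCs moved to LR-4.
for label, b, c in [("sensitivity", 30, 0), ("specificity", 5, 0)]:
    stat, p = mcnemar_chi2(b, c)
    print(f"{label}: chi2 = {stat:.1f}, p = {p:.3g}, significant = {p < ALPHA_ADJUSTED}")
```

Under these assumptions the specificity comparison gives p of about 0.025, in line with the reported value, which is below 0.05 but above the adjusted threshold of 0.01.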

However, in both the LR-4 (Supplementary Table 5) and LR-5 (Supplementary Table 6) category subgroups, the sensitivities decreased when AFs were used in addition to MFs, while the specificities increased, in comparison with values determined using MFs only. The diagnostic accuracy did not increase when AFs were applied in addition to MFs.

Discussion

This study demonstrated that AFs favoring HCC in particular and malignancy in general were more common in the HCC group than in the non-HCC group, with variable frequencies. Of these AFs, hepatobiliary-phase hypointensity had the strongest association with HCC ≤ 3 cm. When we applied the AFs in addition to the MFs, the criterion of four or more AFs favoring malignancy in general or HCC in particular significantly increased the sensitivity for diagnosis of probable HCC ≤ 3 cm from 70.0 to 80.6% (p < 0.001), without a significant decrease in specificity (90.2% vs. 85.3%; p = 0.060), especially in LR-3 observations.

In LI-RADS v2018, there are nine AFs favoring malignancy in general, five AFs favoring HCC in particular, and seven AFs favoring benignity [2]. As these AFs are equally weighted in significance, the use of AFs might be unclear (e.g., whether an LR-3 nodule should be upgraded to LR-4 when only one rare AF favoring malignancy is present). A few recent studies noted the different frequencies of AFs and their associations with HCC and revealed that corona enhancement and restricted diffusion showed high DORs on gadobenate dimeglumine–enhanced MRI [5, 12]. However, in this study, using gadoxetate disodium–enhanced MRI, hepatobiliary-phase hypointensity had the highest frequency and strongest association with HCC, followed by restricted diffusion and mild-moderate T2 hyperintensity. Considering the different frequencies of AFs and their different associations with HCC, we need to stratify the various AFs according to their strength of association, and in doing so, the use of AFs may be made clearer.

Previous studies have reported on the importance of hepatobiliary-phase hypointensity for diagnosing HCC, including increasing sensitivity and discriminating well-differentiated HCC from benign precursor nodules [13,14,15]. However, in this study, hepatobiliary-phase hypointensity was frequent in both the non-HCC group (84.3%) and the HCC group (98.2%). Because of its high frequency in the non-HCC group, the criterion of AFs ≥ 1 decreased the specificity for diagnosing HCC from 90.2 to 22.5%. Therefore, in the case of gadoxetate disodium–enhanced MRI, a number of AFs ≥ 1 may be an insufficient criterion for a one-category upgrade up to LR-4. In other words, AFs need to be applied more conservatively and strictly.

Several recent studies reported that AFs could improve the diagnostic performance for HCC, showing 10.0–12.9% increases in sensitivity [5, 16, 17]. However, although many previous studies have reported on the LI-RADS category assignment and its diagnostic performance using AFs, the changes in the LI-RADS category assignments before and after the use of AFs were unclear [12, 18,19,20] or variable, i.e., 20.5–97.1% of LR-3 upgraded to LR-4 [5, 7, 9]. In the LI-RADS v2018, the LI-RADS category can be upgraded by one category up to LR-4 when one or more AFs favoring malignancy are present, although the use of AFs is at the radiologist’s discretion [2]. This ambiguity might result in the variable range of change in LI-RADS category assignments and the conflicting results about whether AFs can increase the sensitivity of the LR-4 category for diagnosing HCC without a significant decrease in specificity [5, 17]. In this study, we found that diagnostic performance varied according to the number of AFs favoring malignancy in general or HCC in particular, and that the criterion of AFs ≥ 4 was the best criterion, as it increased sensitivity without significantly decreasing specificity. Therefore, the criterion of AFs ≥ 4 may help clarify the use of AFs when applying the LI-RADS for the diagnosis of HCC ≤ 3 cm.

According to the subgroup analyses, in both LR-5 and LR-4 nodules, the added application of AFs showed no benefit for HCC diagnosis in comparison with the use of MFs only. This result supports the current guideline, which does not allow LR-4 to be upgraded to LR-5, even though hepatic observations may have AFs favoring malignancy [2]. In contrast to both LR-4 and LR-5 subgroup analyses, we found a significant increase in diagnostic accuracy when the criteria of AFs ≥ 3 or ≥ 4 were applied to the LR-3 subgroup. Furthermore, the criterion of AFs ≥ 4 in the LR-3 subgroup significantly increased sensitivity (42.9% vs. 0.0%; p < 0.001), without significantly decreasing specificity (94.0% vs. 100.0%; p = 0.025). This suggests that the added application of AFs is important in LR-3 nodules, which have the potential to be upgraded to LR-4.

In this study, the LI-RADS category assignments using MFs only showed a bimodal distribution, with LR-3 (40%) and LR-5 (43%) accounting for most observations and LR-4 for only a small proportion (11%). Our result is similar to those of previous studies regarding the LI-RADS category assignment determined by MFs only, i.e., 30–35% in LR-3, 14–20% in LR-4, and 21–44% in LR-5 [5, 7]. Considering these results, the use of AFs might lessen the bimodal distribution of LI-RADS categories, increasing confidence in category assignment.

This study has a few limitations. First, the retrospective design may have led to selection bias. However, we tried to mitigate this limitation by including a large number of subjects and using a historical cohort design that resembles real clinical practice. Second, the use of clinical diagnoses, including marginal tumor recurrence after TACE or RFA or interval growth of the lesion on follow-up imaging, means that histological confirmation was not universally available as a reference standard. However, according to the AASLD practice guidance, observations that meet the typical imaging findings can be diagnosed as HCC without histopathological evidence [3]. Therefore, our final diagnoses correspond to clinical practice. Third, the predominance of patients with hepatitis B and preserved liver function (Child-Pugh class A) among our study subjects may limit the generalizability of our results.

In conclusion, the various AFs in LI-RADS v2018 showed variable frequencies of occurrence and strengths of association with HCC ≤ 3 cm. Stricter application of AFs in addition to MFs in LR-3 observations, i.e., a criterion of four or more AFs favoring malignancy in general or HCC in particular, is needed to improve the diagnostic performance for probable HCC ≤ 3 cm on gadoxetate disodium–enhanced MRI.