Background

Hepatocellular carcinoma (HCC), which accounts for about 90% of all primary liver cancer cases, is the fifth most common cancer worldwide, the second leading cause of cancer-related death, and a major global health problem [1, 2]. Contrast-enhanced ultrasound (CEUS) offers high diagnostic accuracy for HCC owing to its safety and its superior spatial and temporal resolution. Nevertheless, the American Association for the Study of Liver Diseases (AASLD) [3] and the European Association for the Study of the Liver (EASL) [4] excluded CEUS as a diagnostic tool for HCC in their guidelines, because studies have concluded that arterial phase hyperenhancement (APHE) followed by washout at CEUS is not unique to HCC: the pattern occurs in about 50% of intrahepatic cholangiocarcinomas (ICC) in cirrhosis, causing about 1% of nodules to be misdiagnosed [5, 6]. To improve the diagnostic accuracy for HCC, the American College of Radiology (ACR) released the contrast-enhanced ultrasound liver imaging reporting and data system (CEUS LI-RADS), which standardizes the reporting and data collection of CEUS and describes the risk of HCC in people with a chronic hepatitis background [7]. The CEUS LI-RADS system also contains a category for lesions with malignant characteristics, named LR-M, which does not specifically refer to HCC but suggests a non-HCC malignant tumor.

Previous studies have shown that the LR-5 category of the 2017 version of CEUS LI-RADS can effectively predict the risk of HCC [8, 9]. However, some studies showed that 35%-48% of HCCs were misclassified as LR-M, which led to a high NPV and low sensitivity for HCC [9, 10]. This suggests that the diagnostic efficiency for HCC could be improved by reducing the misclassification of HCC as LR-M.

Some studies have verified the diagnostic performance of the LI-RADS system in clinical practice and offered suggestions for refinement [8, 11, 12, 13]. Zheng et al. showed that the sensitivity for HCC may be raised by regrouping LR-M nodules with no punched-out appearance within 5 min into LR-5 [8]; Fei Li et al. proposed that if the onset of early washout was adjusted to < 45 s, the specificity could be further improved without decreasing the sensitivity [11]. Their research suggests that the efficacy of LI-RADS might be improved by adjusting the criteria of the signs. How such adjustments can be applied clinically needs further evaluation.

This study aimed to improve the diagnostic efficiency of CEUS LI-RADS version 2017 in distinguishing HCC from non-HCC malignancies. In our population, we evaluated two ways of adjusting the criteria of the signs and tested their contributions to the improvement of HCC diagnosis.

Methods

Written informed consent from patients was waived by the ICE for Clinical Research and Animal Trials of the First Affiliated Hospital of Sun Yat-sen University, because of the retrospective nature of the study.

Patients

From Jan. 2015 to Dec. 2015, we retrospectively analyzed 315 focal liver lesions in 289 patients at high risk for HCC (chronic hepatitis or alcoholic cirrhosis) who underwent CEUS at our institution. When a patient had multiple lesions, the largest lesions were selected for the study (Fig. 1).

Fig. 1 Flowchart of the study population inclusion and exclusion. CEUS: contrast-enhanced ultrasound. HCC: hepatocellular carcinoma

All CEUS imaging data were saved as digital movies for review, so that accurate images of the three phases (arterial phase, portal venous phase, and delayed phase), the enhancement patterns, the onset and degree of washout, and the pathological diagnosis results could be retrieved. The exclusion criteria were as follows: (1) missing CEUS data; (2) previously treated lesions or local relapse of previously treated lesions, for example after transarterial chemoembolization (TACE), chemotherapy, or radiotherapy; (3) cirrhosis due to a vascular disorder, such as Budd-Chiari syndrome, chronic portal vein occlusion, cardiac congestion, or diffuse nodular regenerative hyperplasia; (4) diffuse HCC (Table 1).

Table 1 The inclusion and exclusion criteria

Reference standard

All nodules were diagnosed by means of either histologic evaluation or a combination of clinical follow-up and imaging as the reference standard. Definitely benign nodules, such as cysts, hemangiomas, and hepatic fat deposition/sparing, were classified as CEUS LR-1. Distinct iso-enhancing solid nodules < 10 mm, or non-masslike iso-enhancing observations of any size that were not typical hepatic fat deposition/sparing, were classified as CEUS LR-2. Nodules were categorized as CEUS LR-3 if they were (1) iso-enhancing and ≥ 10 mm, (2) < 10 mm with APHE but no washout, (3) without APHE or washout regardless of their size, or (4) < 20 mm without APHE but with late and mild washout. Nodules were categorized as LR-4 if they were (1) ≥ 20 mm without APHE but with late and mild washout, (2) < 10 mm with APHE followed by late and mild washout, or (3) ≥ 10 mm with APHE but without washout of any type. LR-5 was assigned to nodules ≥ 10 mm with APHE (not rim, not peripheral discontinuous globular) followed by late and mild washout. A rim enhancement pattern (not peripheral globular) in the arterial phase indicated LR-M. In addition, nodules with marked washout or early washout (< 60 s) were classified as LR-M regardless of the arterial appearance. Nodules with unequivocal enhancing soft tissue in a vein were classified as LR-TIV, regardless of whether the enhancement pattern was similar to LR-4 or LR-5.
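The category rules described above can be written down as a small decision function. The following is only a minimal sketch under stated assumptions, not the software used in this study: the `Nodule` fields and the `categorize` function are hypothetical names, peripheral globular enhancement is folded into `definitely_benign`, a nodule ≥ 10 mm with APHE but no washout is treated as LR-4, and marked washout counts toward LR-M only when seen within 120 s.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Nodule:
    """Hypothetical per-nodule record; field names are illustrative only."""
    size_mm: float
    aphe: bool = False                        # arterial phase hyperenhancement
    rim_aphe: bool = False                    # rim-like APHE
    washout_onset_s: Optional[float] = None   # None means no washout
    marked_washout_s: Optional[float] = None  # time marked washout was seen
    iso_all_phases: bool = False              # iso-enhancing in every phase
    tumor_in_vein: bool = False
    definitely_benign: bool = False           # cyst, hemangioma, fat deposition


def categorize(n: Nodule, early_s: float = 60.0, marked_s: float = 120.0) -> str:
    """Assign a simplified CEUS LI-RADS v2017 category (assumed rule order)."""
    if n.tumor_in_vein:
        return "LR-TIV"
    if n.definitely_benign:
        return "LR-1"
    washout = n.washout_onset_s is not None
    early = washout and n.washout_onset_s < early_s
    marked = n.marked_washout_s is not None and n.marked_washout_s <= marked_s
    if n.rim_aphe or early or marked:
        return "LR-M"
    if n.iso_all_phases:
        return "LR-2" if n.size_mm < 10 else "LR-3"
    if n.aphe:
        if washout:  # late and mild washout
            return "LR-5" if n.size_mm >= 10 else "LR-4"
        # Assumption: APHE without washout is LR-4 from 10 mm upward.
        return "LR-4" if n.size_mm >= 10 else "LR-3"
    if washout:      # no APHE, late and mild washout
        return "LR-4" if n.size_mm >= 20 else "LR-3"
    return "LR-3"    # no APHE and no washout, any size


# The two lesions described in the legends of Fig. 2 and Fig. 3.
fig2 = Nodule(size_mm=68, aphe=True, washout_onset_s=53)
fig3 = Nodule(size_mm=57, aphe=True, washout_onset_s=62, marked_washout_s=126)
print(categorize(fig2), categorize(fig3))  # LR-M LR-5
```

Keeping the two time thresholds as parameters (`early_s`, `marked_s`) makes it straightforward to explore adjusted criteria without rewriting the rules.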

Ultrasound imaging acquisition

B-mode ultrasound (BUS) and CEUS examinations were performed with an Aplio 500 (Toshiba Medical Systems, Tokyo, Japan) with a 375BT convex transducer (frequency range, 1.9–6.0 MHz) and an Aixplorer Ultrasound system (SuperSonic Imagine, Aix-en-Provence, France) equipped with an SC6-1 convex probe (frequency range, 1.0–6.0 MHz). The number, size, location, and echogenicity of lesions and the liver background were described on BUS. The CEUS examinations were performed with a low mechanical index after a bolus injection of 2.4 mL of SonoVue (Bracco, Milan, Italy) into the antecubital vein followed by a 5-mL saline flush. The timer was started when the contrast injection was completed. Then, the target lesions and surrounding liver parenchyma were observed continuously for at least 90 s. After 90 s, the lesions were scanned intermittently and recorded for 5 min or more until no washout features could be observed. All imaging data were saved for later evaluation.

CEUS imaging analysis

Two radiologists, with 5 and 10 years of experience in liver CEUS diagnosis, were blinded to the histopathologic results and the results of other imaging examinations. They reviewed the CEUS images independently and assigned categories according to CEUS LI-RADS (2017 version). Disagreements were arbitrated by a radiologist with 15 years of experience in liver CEUS diagnosis.

The following diagnostic features were used to characterize each nodule according to CEUS LI-RADS version 2017: the presence of tumor in vein; the size of the nodule; APHE and its pattern (homogeneous hyperenhancement, heterogeneous hyperenhancement, peripheral discontinuous globular hyperenhancement, peripheral rim-like hyperenhancement, iso-enhancement, hypo-enhancement); the time of washout onset; and the degree of washout (whenever washout occurred). If washout appeared, its onset was recorded as < 60 s or ≥ 60 s after contrast injection. Furthermore, to evaluate the adjusted diagnostic criteria, the onset of washout was also divided into < 45 s and ≥ 45 s after contrast injection. Washout was defined as marked when the nodule became markedly hypo-enhanced (appeared black) within 2 min or, for the adjusted criterion, within 3 min.
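The two adjusted criteria amount to simple threshold rules applied to nodules that were LR-M under version 2017. The sketch below is illustrative only: the function names, the example timings, and the choice of inclusive or exclusive comparison at the boundary are assumptions, not details taken from the study protocol.

```python
from typing import Optional


def regroup_by_onset(washout_onset_s: float, cutoff_s: float = 45.0) -> str:
    """An LR-M nodule stays LR-M only if washout starts before the cutoff."""
    return "LR-M" if washout_onset_s < cutoff_s else "LR-5"


def regroup_by_marked_washout(marked_washout_s: Optional[float],
                              window_s: float = 180.0) -> str:
    """An LR-M nodule stays LR-M only if marked washout appears in the window."""
    within = marked_washout_s is not None and marked_washout_s <= window_s
    return "LR-M" if within else "LR-5"


# Hypothetical LR-M nodules: (washout onset in s, time of marked washout or None)
examples = [(30.0, 100.0), (50.0, None), (55.0, 240.0)]
for onset, marked in examples:
    print(onset, marked,
          regroup_by_onset(onset),
          regroup_by_marked_washout(marked))
```

Each rule is applied separately to the original LR-M group, so a nodule can be regrouped under one adjustment and not under the other.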

Statistical analysis

All statistical analyses were performed using the SPSS version 20.0 software package. Descriptive data were reported as absolute values and percentages. Continuous variables are expressed as medians and ranges. The overall diagnostic capability of LI-RADS was assessed in terms of accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Categorical variables were compared by using the paired χ² test. A two-sided P value of < 0.05 was considered statistically significant.
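For readers who wish to reproduce this kind of analysis outside SPSS, the five performance measures follow directly from a 2 × 2 table. The sketch below is a hedged illustration: the counts are hypothetical, the normal-approximation (Wald) interval is an assumption because the interval method is not stated here, and the paired χ² test is assumed to be McNemar's test without continuity correction.

```python
import math


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int, z: float = 1.96):
    """Return {name: (estimate, ci_low, ci_high)} using Wald intervals.

    Assumes every denominator is non-zero.
    """
    def prop_ci(k: int, n: int):
        p = k / n
        half = z * math.sqrt(p * (1 - p) / n)
        return p, max(0.0, p - half), min(1.0, p + half)

    return {
        "accuracy": prop_ci(tp + tn, tp + fp + fn + tn),
        "sensitivity": prop_ci(tp, tp + fn),
        "specificity": prop_ci(tn, tn + fp),
        "ppv": prop_ci(tp, tp + fp),
        "npv": prop_ci(tn, tn + fn),
    }


def mcnemar_p(b: int, c: int) -> float:
    """Two-sided P value of McNemar's test from the discordant pair counts."""
    if b + c == 0:
        return 1.0
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts, not data from this study.
for name, (p, lo, hi) in diagnostic_metrics(tp=80, fp=5, fn=20, tn=45).items():
    print(f"{name}: {100 * p:.1f}% (95% CI: {lo:.3f}-{hi:.3f})")
print(f"McNemar P = {mcnemar_p(15, 5):.3f}")
```

The Wald interval is the simplest choice; for proportions close to 0 or 1, or for small groups, a Wilson or exact interval would usually be preferred.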

Results

Patients and nodule characteristics

The basic characteristics of the patients and nodules in our study sample are shown in Table 2. A total of 315 nodules in 289 patients were included in this study. Of the 315 lesions, 286 (90.8%) were HCC; 12 (3.8%) were ICC; 4 (1.3%) were combined hepatocellular-cholangiocarcinoma; 2 (0.6%) were metastases; and 4 (1.3%) were other malignancies. There were 7 (2.2%) benign nodules: 2 (0.6%) were dysplastic nodules, 1 (0.3%) was a cirrhotic nodule, 2 (0.6%) were intrahepatic bile duct adenomas, and 2 (0.6%) were hemangiomas. These nodules were confirmed by histologic assessment through surgery or biopsy.

Table 2 Patient and nodule characteristics

Diagnostic performance of CEUS LI-RADS version 2017

Of the 315 nodules, 1 (0.3%) was LR-1, 7 (2.2%) were LR-3, 8 (2.5%) were LR-4, 152 (48.3%) were LR-5, 113 (35.8%) were LR-M, and 34 (10.8%) were LR-TIV (Table 3). Neither observer assigned any nodule to LR-2. No malignant lesions were incorrectly classified as LR-1; the single LR-1 nodule was a hemangioma. As expected, the risk of HCC increased gradually from the LR-3 to the LR-5 category. The incidence rates of HCC within the LR-3, LR-4, and LR-5 categories were 42.9% (3 of 7), 87.5% (7 of 8), and 96.1% (146 of 152), respectively (Table 4).

Table 3 The number of nodules in each LI-RADS category
Table 4 The incidence rates of HCC within the LI-RADS categories

Of the 7 LR-3 nodules, 4 were non-HCC lesions: one metastasis, two hemangiomas, and one dysplastic nodule. Of the 8 LR-4 nodules, only one was a non-HCC lesion, a dysplastic nodule. The LR-TIV category comprised 34 (10.8%) of the 315 nodules: 32 HCCs and 2 ICCs.

Of the 315 nodules, 152 (48.3%) were in the LR-5 category, of which 146 (96.1%) were HCC. Of the remaining six LR-5 nodules, two were metastases, one was a combined hepatocellular-cholangiocarcinoma, one was a hepatic sarcoma, one was a cirrhotic nodule, and one was an unspecified benign lesion. The accuracy, sensitivity, specificity, PPV, and NPV of the LR-5 category as a predictor of HCC were 53.3% (95% CI: 0.478–0.588), 56.1% (95% CI: 0.451–0.567), 78.5% (95% CI: 0.634–0.938), 96.1% (95% CI: 0.93–0.991), and 14.5% (95% CI: 0.089–0.201), respectively.

Among the 315 nodules, 113 (35.8%) were in the LR-M category, of which 98 (86.7%) were HCC. Most of the non-HCC LR-M nodules were ICC (n = 10, 8.8%). The remainder were either combined hepatocellular-cholangiocarcinoma (n = 3, 2.7%) or neoplasms of other cellular origin (n = 2, 1.8%). The accuracy, sensitivity, specificity, PPV, and NPV of the LR-M category as a predictor of non-HCC malignancies were 69.5% (95% CI: 0.644–0.746), 68.4% (95% CI: 0.475–0.893), 66.2% (95% CI: 0.608–0.716), 11.5% (95% CI: 0.056–0.174), and 92.5% (95% CI: 0.889–0.960), respectively.
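The per-category HCC counts reported in this section can be cross-checked with a few lines of arithmetic. The snippet below is a sketch for the reader; the dictionary layout and variable names are illustrative, and only counts quoted in the text are used.

```python
# (HCC nodules, all nodules) per category, as quoted in the text.
counts = {
    "LR-3": (3, 7),
    "LR-4": (7, 8),
    "LR-5": (146, 152),
    "LR-M": (98, 113),
    "LR-TIV": (32, 34),
}

# Share of HCC within each category, in percent with one decimal.
rates = {cat: round(100 * hcc / total, 1) for cat, (hcc, total) in counts.items()}

total_hcc = sum(hcc for hcc, _ in counts.values())
total_nodules = sum(total for _, total in counts.values()) + 1  # plus one LR-1

for cat, rate in rates.items():
    print(f"{cat}: {rate}% HCC")
print(total_hcc, "HCC among", total_nodules, "nodules")
```

A tally of this kind is a quick way to confirm that category-level counts add up to the cohort totals before any performance measure is derived from them.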

LR-5 and LR-M nodules reclassified by the time of washout

Of the 113 LR-M nodules, 38 (33.6%) showed APHE with washout onset < 45 s, and the remaining 75 nodules also showed APHE but with washout onset ≥ 45 s (Fig. 2). These 75 nodules were regrouped as LR-5 to reassess the diagnostic performance, which changed after reclassification. The accuracy, sensitivity, specificity, PPV, and NPV of the LR-5 category after versus before reclassification were 71.7% (95% CI: 0.668–0.767, P < 0.001) versus 53.3%, 74.1% (95% CI: 0.691–0.792, P < 0.001) versus 56.1%, 48.3% (95% CI: 0.301–0.665, P = 0.018) versus 78.5%, 93.3% (95% CI: 0.902–0.966, P < 0.001) versus 96.1%, and 15.9% (95% CI: 0.083–0.236) versus 14.5%, respectively. The accuracy, sensitivity, specificity, PPV, and NPV of the LR-M category after versus before reclassification were 95.2% (95% CI: 0.929–0.976, P < 0.001) versus 69.5%, 31.5% (95% CI: 0.107–0.525, P = 0.023) versus 68.4%, 89.2% (95% CI: 0.857–0.927, P < 0.001) versus 66.2%, 15.4% (95% CI: 0.041–0.267) versus 11.5%, and 95.3% (95% CI: 0.928–0.978) versus 92.5%, respectively (Table 5).

Fig. 2 A 68-mm HCC lesion (white arrowheads) in a 71-year-old man with chronic HBV infection (a). The lesion (white arrowheads) showed APHE 19 s after SonoVue injection (b), followed by early washout at 53 s (c); mild washout was seen at 129 s (d). The lesion was categorized as LR-M according to CEUS Liver Imaging Reporting and Data System version 2017

Table 5 Diagnostic performance of categories LR-5 and LR-M before (< 60 s) and after (< 45 s) recategorization according to the time of washout

LR-5 and LR-M nodules reclassified by the marked washout time

Of the 113 LR-M nodules, 46 (40.7%) showed marked washout within 3 min, and the remaining 67 nodules were regrouped as LR-5 to reappraise the diagnostic performance (Fig. 3). The diagnostic performance again changed after reclassification. The accuracy, sensitivity, specificity, PPV, and NPV of the LR-5 category after versus before reclassification were 80% (95% CI: 0.756–0.844, P < 0.001) versus 53.3%, 80% (95% CI: 0.751–0.849, P < 0.001) versus 56.1%, 80% (95% CI: 0.694–0.901) versus 78.5%, 94.9% (95% CI: 0.921–0.979) versus 96.1%, and 45.8% (95% CI: 0.359–0.558, P < 0.001) versus 14.5%, respectively. The accuracy, sensitivity, specificity, PPV, and NPV of the LR-M category after versus before reclassification were 85.3% (95% CI: 0.815–0.893, P < 0.001) versus 69.5%, 47.3% (95% CI: 0.249–0.698) versus 68.4%, 87.5% (95% CI: 0.837–0.913, P < 0.001) versus 66.2%, 19.6% (95% CI: 0.081–0.31) versus 11.5%, and 96.3% (95% CI: 0.940–0.985) versus 92.5%, respectively (Table 6).

Fig. 3 A 57-mm ICC lesion (white arrowheads) in a 73-year-old woman (a). The lesion (white arrowheads) showed APHE 28 s after SonoVue injection (b), followed by mild washout at 62 s (c); marked washout was seen at 126 s (d). The lesion was categorized as LR-5 according to CEUS Liver Imaging Reporting and Data System version 2017

Table 6 Diagnostic performance of categories LR-5 and LR-M before (within 120 s) and after (within 180 s) recategorization according to the marked washout time

Interobserver consistency in CEUS LI-RADS Classification

The interobserver agreement for CEUS LI-RADS was almost perfect, with a κ value of 0.803 (95% CI: 0.753–0.854). Forty-five of the 315 lesions (14.3%) required reclassification to reach a consensus (Table 7).

Table 7 The classification of LI-RADS by two observers
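The κ statistic summarizes agreement between two readers beyond what chance alone would produce. As a hedged illustration, an unweighted Cohen's κ can be computed from two lists of category labels as sketched below; the function name and the example ratings are hypothetical, and whether the reported value was weighted or unweighted is not stated here.

```python
from collections import Counter


def cohen_kappa(reader1, reader2) -> float:
    """Unweighted Cohen's kappa for two equally long lists of labels."""
    n = len(reader1)
    # Observed agreement: share of nodules given the same category.
    p_obs = sum(a == b for a, b in zip(reader1, reader2)) / n
    # Chance agreement from each reader's marginal category frequencies.
    c1, c2 = Counter(reader1), Counter(reader2)
    p_exp = sum(c1[cat] * c2[cat] for cat in c1) / n ** 2
    if p_exp == 1.0:  # both readers used one identical category throughout
        return 1.0
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical ratings for four nodules, not data from this study.
r1 = ["LR-5", "LR-5", "LR-M", "LR-M"]
r2 = ["LR-5", "LR-M", "LR-M", "LR-M"]
print(cohen_kappa(r1, r2))  # 0.5
```

In this toy example the readers agree on three of four nodules, but half of that agreement is expected by chance, which is why κ is well below the raw agreement of 75%.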

Discussion

In this retrospective study, we assessed the diagnostic efficacy of CEUS LI-RADS version 2017 by analyzing 315 untreated liver nodules in patients at high risk of HCC. Our study showed that the LR-5 category had a high PPV (96.1%) for HCC, similar to previous studies [9]. However, the LR-M category provided a high NPV (92.5%) but a low PPV (11.5%) for the diagnosis of non-HCC malignancies, because a high proportion of its nodules were HCC (98 of 113 nodules), suggesting that the LR-M category requires refinement [14].

After we reclassified the LR-M nodules with a washout onset ≥ 45 s into LR-5, the diagnostic accuracy of both LR-M and LR-5 improved. LR-5 showed higher accuracy and sensitivity with a slight decline in PPV, but a significant decline in specificity. Because the primary purpose of LI-RADS is to provide high specificity for the diagnosis of HCC, this modification would not be helpful. Meanwhile, LR-M showed notable improvements in accuracy and specificity, but its sensitivity decreased and its PPV improved little. In Li's study [11], which used the LR-M criteria to distinguish ICC from HCC, adjusting the early washout onset to < 45 s markedly improved specificity without loss of sensitivity, which was not completely consistent with our results. We hypothesize that the discrepancy arose because we included 16% of non-HCC and non-ICC malignant tumors (18 of 113), which usually showed washout at ≥ 45 s and were reclassified as LR-5, whereas all the LR-M nodules in Li's study were ICC.

We then modified the LR-5 and LR-M criteria in a second way and reclassified the LR-M nodules without marked washout within 180 s into LR-5. After this modification, LR-5 showed higher accuracy, sensitivity, and NPV, similar to previous reports [8]. Despite reclassification, the change in specificity was negligible and PPV remained high. At the same time, LR-M showed a considerable improvement in accuracy and specificity and a non-significant improvement in PPV, which was not exactly consistent with Zheng's report [8]. We speculate that there were two reasons for this inconsistency. First, we modified their standard: instead of reclassifying LR-M nodules without a punched-out appearance within 5 min into LR-5, we reclassified LR-M nodules without a punched-out appearance within 3 min into LR-5. We adjusted the standard because, in our LR-M nodules, most marked washout appeared within 3 min, a few cases appeared after 3 min, and none appeared later than 5 min. Second, the difference might also be due to patients with different high-risk backgrounds. In the study of Zheng et al., all the patients had HBV infection, whereas in our study the high-risk background consisted of hepatitis B (94.1%) and alcoholic cirrhosis (5.1%), and two patients had both HBV and HCV (0.69%). These differences may explain why we could not reproduce their results in our study.

Our study has several limitations, given its retrospective, single-center design. First, the sample size was relatively small, which may explain the absence of LR-2 nodules in our cohort. Prospective multi-center studies with larger sample sizes are needed to verify our findings. Second, according to CEUS LI-RADS version 2017, we classified 34 of the 315 nodules as LR-TIV based on the presence of vascular invasion. APHE and subsequent washout were found in most of these nodules, but we did not classify them into the LR-5 or LR-M categories, so it is unknown whether these cases would have had a significant influence on our statistical data. This may be another topic worth studying. Third, limited by the sample size, we did not conduct a detailed subgroup analysis, such as the difference in the efficacy of LI-RADS among tumor sizes or the effect of a cirrhosis background on efficacy. Fourth, the incidence of HCC in our study was significantly higher than that of non-HCC malignancies (90.8% vs. 7.1%); this imbalance, rather than the test itself, potentially affected the outcomes, because it may have produced a high PPV but low NPV of LR-5 for diagnosing HCC and a high NPV but low PPV of LR-M for diagnosing non-HCC malignancy. Future studies may be required to correct this imbalance. Finally, CEUS LI-RADS is less intuitive and less easy to apply than other reporting systems, such as the thyroid imaging reporting and data system (TI-RADS) and the breast imaging reporting and data system (BI-RADS), because of its large number of variables. In addition, some parameters of the system, such as the washout time, are relatively subjective in practice. We believe that, with the rapid development of artificial intelligence (AI) in medical imaging, using AI to automatically identify CEUS LI-RADS features and present them to readers for a final decision is expected to reduce the complexity of the system, improve its practicability, and thus overcome this obstacle.

Conclusion

In conclusion, the LR-5 category of CEUS LI-RADS version 2017 is an effective diagnostic tool to predict the risk of HCC. Altering the algorithm by reclassifying LR-M nodules that show arterial phase hyperenhancement, early washout, and no punched-out appearance within 3 min into LR-5 could improve the diagnostic performance. This may be a potential way to distinguish HCC from non-HCC malignancies in patients with a high-risk background.