Introduction

Hepatocellular carcinoma (HCC) can be diagnosed non-invasively in high-risk patients owing to typical features on contrast-enhanced imaging. Hallmarks of HCC include arterial phase hyperenhancement followed by a gradual “washout” of the contrast agent during the portal venous and late phases. However, this characteristic contrast enhancement pattern is not found in all cases [1,2,3,4,5,6,7,8].

Attempts to improve standardization in the interpretation, documentation and reporting of contrast-enhanced imaging led to the development of LI-RADS (Liver Imaging Reporting and Data System) by the American College of Radiology (ACR) in 2011 [9, 10]. With LI-RADS, focal liver lesions in high-risk patients are categorized according to “major features” (lesion diameter, arterial phase hyperenhancement, washout appearance, capsule appearance, threshold growth) and optionally using ancillary features. The probability of a lesion being an HCC is expressed by assigning a LI-RADS category between LR-1 (definitely benign) and LR-5 (definitely HCC). LI-RADS is only defined for observations in contrast-enhanced computed tomography (CE-CT) and magnetic resonance imaging (CE-MRI).

German National Guidelines regard contrast-enhanced ultrasound (CEUS) as an imaging modality equivalent to CE-MRI and CE-CT in the non-invasive diagnosis of HCC in high-risk patients [11]. CEUS provides a unique real-time assessment of contrast enhancement patterns using contrast agents that remain strictly intravascular, allowing for very sensitive assessment of tumour vascularity. Several multicentre studies and meta-analyses have demonstrated the excellent diagnostic accuracy of CEUS in the differential diagnosis of focal liver lesions [12,13,14,15,16,17,18,19,20,21].

Standardized CEUS-based diagnostic algorithms for HCC, such as ACR-CEUS-LI-RADS, were developed only very recently [22,23,24,25].

To date, there are no studies evaluating these CEUS-based algorithms in a clinical setting. Only a few studies have assessed inter-reader agreement with LI-RADS in CT and MRI [26,27,28,29,30,31,32]. Thus, for the first time, this pilot study aimed to directly compare the diagnostic accuracy and interobserver agreement between ACR-CEUS-LI-RADS and MRI-LI-RADS.

Materials and methods

Study design

Figure 1 illustrates the design of this retrospective study. The risk population for HCC was defined according to national guidelines as patients with cirrhosis of any origin, chronic hepatitis B infection, chronic hepatitis C infection with advanced fibrotic changes, and histologically proven non-alcoholic steatohepatitis [11]. Inclusion criteria were at least one focal liver lesion visible on conventional ultrasound and availability of both CEUS and CE-MRI of the liver. All patients had both CEUS and CE-MRI within 3 months after initial detection of the lesion. Patients with prior systemic or local ablative treatment for HCC were excluded. All patients provided written informed consent for de-identified data evaluation. The study was conducted in accordance with the guidelines of the Declaration of Helsinki and approved by the local ethics committee.

Fig. 1 Study design and patient selection

A systematic, retrospective search of the interdisciplinary liver cancer board was performed to identify high-risk patients. Patients’ electronic records were assessed to select those with CEUS examinations of the liver. Additionally, patients were identified when presenting for conventional liver ultrasound or CEUS, including patients undergoing HCC surveillance, symptomatic patients and patients with incidental focal liver lesions.

CEUS

CEUS examinations were performed according to European Federation of Societies for Ultrasound in Medicine and Biology (EFSUMB) guidelines for the characterization of focal liver lesions following a standardized protocol with low mechanical index and intravenous bolus injection of 1.5 mL SonoVue® (Bracco Imaging, Konstanz, Germany) followed by a saline flush [12]. Video clips of the examinations were recorded over a time period of 3–5 min, beginning prior to the first arrival of detectable microbubbles and continuing until the beginning of clearance of microbubbles from parenchymal tissue in the late phase. Contrast enhancement patterns during arterial, portal venous and late phase were assessed. Vascular phases were defined according to EFSUMB guidelines (arterial phase, beginning within 20 s after injection of the contrast agent, duration until 30–45 s after injection, depending on the cardiocirculatory situation; portal venous phase, beginning 30–45 s after injection, ending about 120 s after injection; late phase, beginning 2 min after injection and lasting until the clearance of microbubbles from liver tissue) [12].
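The phase definitions above can be sketched as a small lookup. This is only an illustration and not part of the study protocol: the function name `vascular_phase` and the parameter `arterial_end_s` are hypothetical. Because the arterial/portal venous boundary (30–45 s) depends on the cardiocirculatory situation, it is left as a parameter, and time points before microbubble arrival are simply counted towards the arterial window.

```python
def vascular_phase(t_s: float, arterial_end_s: float = 45.0) -> str:
    """Map a time point (seconds after injection) to a vascular phase.

    Assumption: arterial_end_s is chosen per patient within 30-45 s;
    the default of 45 s is arbitrary and only for illustration.
    """
    if t_s < 0:
        raise ValueError("time after injection must be non-negative")
    if not 30.0 <= arterial_end_s <= 45.0:
        raise ValueError("arterial phase is described as ending at 30-45 s")
    if t_s < arterial_end_s:
        return "arterial"
    if t_s < 120.0:  # portal venous phase ends about 120 s after injection
        return "portal venous"
    return "late"  # lasts until clearance of microbubbles from liver tissue


if __name__ == "__main__":
    for t in (15, 40, 90, 180):
        print(t, vascular_phase(t, arterial_end_s=35.0))
```

The hard 120 s cut-off is a simplification: the text describes the portal venous phase as ending "about" 120 s after injection.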

CEUS was performed using three different ultrasound devices (Siemens Acuson S2000, GE Logiq E9, Toshiba Aplio 500) by three physicians with 5–10 years of experience in liver ultrasound. All patients underwent conventional B-mode of the liver prior to CEUS. In cases with more than one lesion, all lesions were recorded. Only the lesion most accessible to ultrasound examination was chosen for CEUS and assessment by two independent observers (not involved in the CEUS examination) using the standardized diagnostic algorithm CEUS LI-RADS.

MRI

MRI was performed following standardized protocols with two different 1.5-T MR scanners: one Magnetom Aera and one Magnetom Avanto (both Siemens Healthineers, Erlangen, Germany). A dedicated HCC protocol was used for each scanner. All images had a slice thickness of 5 mm. The spacing was 6 mm for two-dimensional sequences and 5 mm for three-dimensional acquisition. After the pre-contrast sequences, weight-adapted gadobutrol 1.0 mmol/mL (Gadovist®, Bayer Pharma AG, Berlin, Germany) was injected intravenously as a non-liver-specific contrast agent. T1w sequences were acquired in the arterial, venous and several post-CE phases. The matrix was roughly 260 × 320 pixels for all sequences except the diffusion-weighted images, for which it was 130 × 160 pixels.

The default protocols are presented in detail in Table 1.

Table 1 Magnetic resonance imaging protocols

The target lesion assessed with CEUS-LI-RADS was identified on MRI scans and evaluated by two observers using the standardized diagnostic algorithm MRI-LI-RADS. Each lesion was measured on the axial T1w sequence on which it was best visible.

Standardized algorithms

CEUS-LI-RADS

CEUS examinations were evaluated according to ACR-CEUS-LI-RADSv.2016 (Supplemental Fig. 1) developed by the American College of Radiology. Prior to the study, observers received theoretical training (2 h) to become familiarized with the use of the CEUS-LI-RADS algorithm. The algorithm and its features were explained and example CEUS clips of all categories were reviewed. In a subsequent practical training phase, classification according to CEUS-LI-RADS was taught using five example lesions.

Two observers with 2 and 5 years of experience, respectively, in CEUS examinations of the liver, blinded to patients’ clinical data and final diagnosis (except for knowledge of a high-risk constellation), independently reviewed the CEUS examinations and assigned a CEUS-LI-RADS category to the target observation. Observers assessed uptake of contrast agent in the target observation relative to the surrounding parenchyma. They decided on hyper-, iso- or hypoenhancement of the target observation in the arterial, portal venous and late phase and, if available, very late phase (> 240 s). In cases of “washout”, observers were asked to distinguish between early “washout” (starting < 60 s) and late “washout” (≥ 60 s).
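The early/late distinction used by the observers can be written as a single threshold rule. The sketch below is illustrative only: the names `classify_washout` and `EARLY_WASHOUT_LIMIT_S` are hypothetical, and it assumes that the washout onset time has already been read off the video clip by the observer.

```python
from typing import Optional

EARLY_WASHOUT_LIMIT_S = 60.0  # onset before 60 s counts as early "washout"


def classify_washout(onset_s: Optional[float]) -> str:
    """Classify washout by its onset time in seconds after injection.

    onset_s is None when no washout was perceived during the examination.
    """
    if onset_s is None:
        return "none"
    if onset_s < 0:
        raise ValueError("onset time must be non-negative")
    return "early" if onset_s < EARLY_WASHOUT_LIMIT_S else "late"


if __name__ == "__main__":
    for onset in (None, 45.0, 60.0, 150.0):
        print(onset, classify_washout(onset))
```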

MRI-LI-RADS

MRI examinations were evaluated according to LI-RADSv.2014 (Supplemental Fig. 2). One certified radiologist with 7 years and one resident with 4 years of experience in hepatobiliary imaging, blinded to patients’ clinical data and final diagnosis (except for knowledge of a high-risk constellation, location and size of the target observation), independently reviewed the MRI examinations and assigned a LI-RADS category to the target observation.

MRI observers did not receive a particular training session; MRI-LI-RADS has been implemented for routine diagnosis in suspected HCC lesions in our department for 2 years.

Ancillary features and tie-breaking rules were not used. As MRI scans were available for one examination per patient, the major feature of “threshold growth” could not be assessed.

Reference standard

Final diagnosis was based on histology or, if histological findings were not available, on characteristic findings upon contrast-enhanced imaging and, in cases of benign lesions, constant appearance and lack of interval growth during follow-up imaging. Histological findings were obtained via ultrasound-guided biopsy in 30 cases (core biopsy in 29 cases, fine needle biopsy in one case), CT-guided biopsy in two cases, and surgical resection in four cases. For biopsy, a mean of 2.7 separate biopsies (range, 1–6) was taken at the decision of the examiner. Mean length of total tissue samples available for one patient was 38 mm (range, 4–97 mm). All histological diagnoses were made by two expert pathologists in consensus.

Statistical analysis

Quantitative variables are expressed as mean ± standard deviation. Categorical variables are expressed as frequencies. Groups were compared using Fisher’s exact test. Cohen’s κ statistic was used for the evaluation of interobserver agreement. Results were interpreted as follows: κ = 0.81–1.00, (almost) perfect agreement; κ = 0.61–0.80, substantial agreement; κ = 0.41–0.60, moderate agreement; κ = 0.21–0.40, fair agreement; κ ≤ 0.20, slight agreement. SPSS-21 (IBM Corporation, Armonk, NY, USA) and Excel 2010 (Microsoft Corporation, Redmond, Washington, USA) were used for statistical analyses. Differences were considered statistically significant for p < 0.05.
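For readers unfamiliar with the statistic, the following Python sketch shows how unweighted Cohen's κ for two observers can be computed and mapped to the interpretation bands listed above. It is not the SPSS procedure used in this study. The function names and the example ratings are hypothetical, and the handling of values that fall between two bands (for example 0.605) is an assumption.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(ratings_a: Sequence[Hashable], ratings_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same lesions."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must score the same non-empty set of lesions")
    n = len(ratings_a)
    # Observed proportion of lesions on which both raters agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


def interpret_kappa(kappa: float) -> str:
    """Map kappa to the bands above (boundary handling is an assumption)."""
    if kappa > 0.80:
        return "(almost) perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "slight"


if __name__ == "__main__":
    # Hypothetical categories for four lesions, not data from this study
    observer_1 = ["LR-5", "LR-5", "LR-4", "LR-4"]
    observer_2 = ["LR-5", "LR-4", "LR-4", "LR-4"]
    k = cohens_kappa(observer_1, observer_2)
    print(round(k, 3), interpret_kappa(k))
```

Because κ corrects the observed agreement for the agreement expected by chance, two observers can agree on most lesions and still obtain only a moderate κ when one category dominates.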

Results

Patient and tumour characteristics

Patient and tumour characteristics are shown in Tables 2 and 3. Of 50 lesions, 43 were HCCs (86%), two were intrahepatic cholangiocellular carcinomas (ICCs) and five were benign lesions. Histological findings were available in 36/50 lesions (72%; 32 HCCs, two ICCs, two regenerative/dysplastic nodules).

Table 2 Patient characteristics (n = 50)
Table 3 Tumour characteristics

Of the benign lesions, three were regenerative/dysplastic nodules, one was a cyst and one was an area of focal fatty sparing. Three HCCs and two benign lesions were less than 20 mm in diameter. Mean lesion size on conventional ultrasound was 30 ± 16.6 mm (range, 16–69 mm) for non-HCC lesions, versus 42.9 ± 29.2 mm (range, 14–150 mm) for HCC lesions. In 31 cases, lesion size was larger on ultrasound than on MRI; in 19 cases, it was larger on MRI. However, there were no cases in which the different measurements on ultrasound and MRI would have led to a discrepancy in size category (< 20 mm or ≥ 20 mm). Thus, slight differences in size measurements between imaging modalities did not affect LI-RADS categorization.
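The size-category check described above amounts to comparing each lesion's two measurements against the 20 mm threshold. A minimal sketch follows; the function names and the example measurements are hypothetical and are not data from this study.

```python
from typing import List, Sequence, Tuple

SIZE_THRESHOLD_MM = 20.0  # size categories used in the text: < 20 mm and >= 20 mm


def size_category(diameter_mm: float) -> str:
    """Return the size category for one diameter measurement."""
    return "<20 mm" if diameter_mm < SIZE_THRESHOLD_MM else ">=20 mm"


def discordant_pairs(pairs: Sequence[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return (ultrasound_mm, mri_mm) pairs that fall into different categories."""
    return [(us, mri) for us, mri in pairs if size_category(us) != size_category(mri)]


if __name__ == "__main__":
    # Hypothetical (ultrasound, MRI) diameters in mm
    measurements = [(18.0, 16.0), (42.0, 39.0), (21.0, 19.0)]
    print(discordant_pairs(measurements))
```

An empty result corresponds to the situation reported here, in which no lesion changed size category between modalities.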

Concordant/discordant findings for CEUS-LI-RADS versus MRI-LI-RADS

Perception of major features differed between CEUS and MRI, although statistical significance was reached only for arterial phase hyperenhancement and not for washout appearance. Arterial phase hyperenhancement was observed in 76% of lesions (n = 38/50) with CEUS and 90% of lesions (n = 45/50) with MRI (p = 0.038; mean values from both observers); intermodality agreement between CEUS and MRI for the perception of arterial phase hyperenhancement was fair (κ = 0.329). “Washout” was seen in 54% of lesions (n = 27/50) in CEUS and 62% of lesions (n = 31/50) in MRI (p = 0.420); intermodality agreement for washout appearance was slight to fair (κ = 0.202).

With CEUS-LI-RADS, a considerable proportion of the 32 histologically proven HCCs were classified as LR-4 (8/32 versus 16/32 for observer 1 versus observer 2) and LR-3 (3/32 versus 1/32). With MRI-LI-RADS, 4/32 (12.5%) (observer 1) versus 6/32 (18.8%) (observer 2) histologically proven HCCs were categorized as LR-3, LR-4 or LR-M. In detail, one HCC was categorized as LR-3, two as LR-4 and one as LR-M by observer 1; one HCC was categorized as LR-3, four as LR-4 and one as LR-M by observer 2.

Intermodality agreement between CEUS and MRI for LI-RADS category was slight to fair (κ = 0.218). A direct comparison of LI-RADS classification with CEUS and MRI is presented in Table 4.

Table 4 Comparison of CEUS-LI-RADS and MRI-LI-RADS category (combined values from both observers of every modality)

Examples of LI-RADS categories in CEUS and MRI are shown in Figs. 2, 3 and 4.

Fig. 2 Exemplary LI-RADS-3 lesion in both CEUS and MRI. Atypical HCC (18 mm; white arrowhead). a B-mode ultrasound shows a hypoechoic lesion. CEUS: b arterial isoenhancement, c portal-venous isoenhancement and d no “washout” in the delayed phase. T1w VIBE fat-sat: e non-contrast phase, f arterial phase with weak hyperenhancement, g portal-venous phase with weak hyperenhancement and h no measurable washout in the delayed phase

Fig. 3 Exemplary LI-RADS-4 lesion in CEUS. For MRI, tie-breaking rules have to be applied between LR-4 and LR-5. In this study no follow-up examination or ultrasound was available, therefore the lesion was classified as LR-4. With threshold growth (LR-5g) or “washout” and visibility in ultrasound (LR-5us) it would have been upgraded. Small HCC (14 mm; white arrowhead). a Hypoechoic lesion in B-mode ultrasound. CEUS: b homogeneous arterial hyperenhancement, c sustained portal-venous hyperenhancement and d no “washout” in the delayed phase. T1w VIBE fat-sat: e hypointense lesion in the non-contrast phase, f arterial hyperenhancement, g portal-venous hyperenhancement and h “washout” in the delayed phase

Fig. 4 Exemplary LI-RADS-5 lesion. Large HCC (46 mm; white arrowhead). a Hypoechoic lesion in B-mode ultrasound. CEUS: b arterial hyperenhancement with an adjacent, now clearly visible lesion (black arrowhead), c slight “washout” in the portal-venous phase and d clear “washout” in the delayed phase. T1w VIBE fat-sat: e hypointense lesion in the non-contrast phase, f strong arterial hyperenhancement, g “washout” and hyperenhancing capsule in the portal-venous and h delayed phase

Interobserver agreement

CEUS-LI-RADS

There was no discordance between the two observers for the distinction between early “washout” (< 60 s) and late “washout” (≥ 60 s). Early “washout” (< 60 s) was not perceived in any of the lesions in the study collective by either observer. The two observers perceived arterial phase hyperenhancement in 74%/76% of cases (37/50 versus 38/50 cases; p = 0.640) and “washout” in 68%/40% (34/50 versus 20/50 cases; p = 0.005).

Interobserver agreement according to Cohen’s kappa was moderate for arterial phase hyperenhancement (κ = 0.511) and washout appearance (κ = 0.490), and fair for the CEUS-LI-RADS category (κ = 0.309) (Table 5).

Table 5 Interobserver agreement for major features and lesion categories

MRI-LI-RADS

The two observers perceived capsule appearance in 47/50 versus 40/50 cases (94%/80%; p = 0.037), arterial phase hyperenhancement in 46/50 versus 44/50 cases (92%/88%; p = 0.505) and “washout” in 31/50 versus 31/50 cases (62%/62%; p = 1.000).

Interobserver agreement for MRI-LI-RADS was moderate for capsule appearance (κ = 0.449), arterial phase hyperenhancement (κ = 0.565) and washout appearance (κ = 0.582), and substantial for the LI-RADS category (κ = 0.609, Table 5).

Discussion

Our study is the first to assess interobserver agreement for MRI-LI-RADS and CEUS-LI-RADS in direct comparison. We found moderate interobserver agreement for MRI-LI-RADS for all three major features (arterial phase hyperenhancement, washout appearance, capsule appearance) and substantial agreement for MRI-LI-RADS category. With CEUS-LI-RADS we found moderate interobserver agreement for arterial phase hyperenhancement (κ = 0.511) and washout appearance (κ = 0.490) and only fair agreement concerning CEUS-LI-RADS category (κ = 0.309). Importantly, intermodality agreement between CEUS and MRI was only fair for arterial phase hyperenhancement (κ = 0.329), slight to fair for “washout” (κ = 0.202) and slight to fair for LI-RADS category (κ = 0.218). The fact that interobserver agreement for the final category was substantial for MRI but only fair for CEUS, although interobserver agreement for major features was moderate for both modalities, might be explained by “washout” playing a more important role in MRI than in CEUS.

The results of this study demonstrate considerable discrepancy between MRI and CEUS in terms of major feature assessment and the final LI-RADS category assignment. We found that this is mostly because perception of arterial phase hyperenhancement differs between CEUS and MRI. With the current LR-5 category definition in CEUS-LI-RADS, a relevant proportion of HCCs is categorized as LR-4 or LR-3. With CEUS-LI-RADS, a lesion cannot be categorized as definite HCC if it lacks either arterial phase hyperenhancement or contrast “washout”. As a result, a substantial proportion of HCCs is categorized as LR-4. These findings raise the question of whether “washout” should be a necessary prerequisite for the categorization of a lesion as HCC in CEUS. Contrast agents in CEUS differ from those used in MRI in that they remain strictly intravascular. Therefore, “washout” in CEUS cannot be equated with “washout” in MRI. There is evidence from the literature that “washout” should not be mandatory for the non-invasive diagnosis of HCC with CEUS in cirrhotic patients [8, 33]. In our study, detection of “washout” differed significantly between observers with CEUS, but not with MRI, indicating the limitation of “washout” as a major criterion in CEUS.

Another reason for the higher agreement of MRI-LI-RADS in comparison to CEUS-LI-RADS in our study may lie in learning curves. CEUS-LI-RADS is a very recent development, whereas MRI-LI-RADS has been widely adopted since its first release in 2011. In our department, MRI-LI-RADS has been routinely used for several years by the MRI observers, whereas the CEUS observers needed a special training session prior to the study to become familiar with the use of CEUS-LI-RADS. Correspondingly, Quaia et al. showed that interobserver agreement for the assessment of hyper- or hypoenhancement of focal liver lesions in CEUS was moderate (κ = 0.47–0.63), but was better in experienced readers [34]. It might thus be expected that interobserver agreement will improve with increasing application of the algorithm, as has been shown for other “RADS” algorithms such as the Breast Imaging Reporting and Data System [30, 35].

To date, there are no studies assessing interobserver agreement for standardized CEUS-based algorithms, and only a few studies addressing this issue for LI-RADS in MRI or CT. Thus, our results are not directly comparable to the literature. Studies assessing interobserver agreement for MRI-LI-RADS found values for Cohen’s kappa between 0.35 and 0.44 [30, 32], with strongest interobserver agreement for arterial phase hyperenhancement [30]. For LI-RADS-CT, κ values between 0.56 and 0.69 are reported [36,37,38].

However, our results suggest best intermodality agreement between CEUS and MRI for arterial phase hyperenhancement. This emphasizes the point that arterial phase hyperenhancement should be regarded as the CEUS key imaging feature of HCC in high-risk patients, whereas the diagnostic value of “washout” in CEUS deserves further investigation.

Our study has some limitations. These are the relatively small sample size, the retrospective nature and the single-centre design. However, the work was intended as a pilot study, which (for the first time) evaluates interobserver and intermodality agreement of CEUS-LI-RADS and MRI-LI-RADS in direct comparison. Another major limitation is the design with different observers for CEUS and MRI. The agreement might have been stronger if the same observers had interpreted both CEUS and MRI.

In conclusion, the interobserver agreement for major features was moderate for both CEUS and MRI. The interobserver agreement for the final LI-RADS category was substantial for MRI and only fair for CEUS. Furthermore, intermodality agreement for the final LI-RADS category between MRI and CEUS was only slight to fair. Further refinement of the LI-RADS algorithms and increasing education and practice may be necessary to improve the concordance between CEUS and MRI for the final LI-RADS categorization.