Introduction

Hepatocellular carcinoma (HCC) is the sixth most common cancer and second most common cause of cancer mortality worldwide, with a 5-year survival rate of 18% [1]. It is one of the leading causes of death in patients with cirrhosis, and early tumor detection in the form of biannual screening/surveillance has improved survival in these high-risk patients [2,3,4,5,6]. Ultrasound (US) is the most widely used modality for HCC screening/surveillance (with or without serum alpha-fetoprotein (AFP) testing) and is recommended as the preferred imaging test by all major liver societies worldwide [7,8,9].

The American College of Radiology (ACR) has developed the US Liver Imaging Reporting and Data System (US LI-RADS) algorithm to streamline such examinations, improve communication among physicians, and guide patient management utilizing a distinct visualization score and an assigned US category. The visualization score is a subjective assessment and informs the expected sensitivity of the study with three possibilities: score A (no or minimal limitations), score B (moderate limitations), and score C (severe limitations). The US category guides management and has three possibilities: US LI-RADS 1 (negative; no US evidence of HCC), US LI-RADS 2 (subthreshold; observation(s) < 10 mm in diameter, not definitely benign), and US LI-RADS 3 (positive; observation(s) ≥ 10 mm in diameter, not definitely benign, or new thrombus in vein) [7, 10]. These scores and categories apply to the patient level rather than observation level.
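The category rules described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the ACR algorithm itself: the function name `us_lirads_category` and its parameters are hypothetical, observations that are definitely benign are assumed to have been excluded beforehand, and the visualization score (a subjective assessment) is not computed here.

```python
from typing import Iterable


def us_lirads_category(observation_sizes_mm: Iterable[float],
                       new_venous_thrombus: bool = False) -> int:
    """Return the patient-level US category (1, 2, or 3).

    observation_sizes_mm: diameters in mm of observations that are NOT
    definitely benign (hypothetical input format, assumed for illustration).
    """
    sizes = list(observation_sizes_mm)
    # US-3 (positive): any observation >= 10 mm, or new thrombus in vein.
    if new_venous_thrombus or any(size >= 10 for size in sizes):
        return 3
    # US-2 (subthreshold): only observations < 10 mm are present.
    if sizes:
        return 2
    # US-1 (negative): no US evidence of HCC.
    return 1


print(us_lirads_category([]))           # no observations
print(us_lirads_category([6.0]))        # single subthreshold observation
print(us_lirads_category([6.0, 39.0]))  # one observation at or above 10 mm
```

Because the category applies at the patient level, the sketch returns one value per examination: a single observation of 10 mm or more is enough to make the whole examination positive.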

A variety of technical and patient-related factors may affect the quality of the US examination, including suboptimal beam penetration related to the patient’s body habitus or background liver heterogeneity, inconsistent or unsatisfactory scanning technique by the sonographer (when applicable), and inability of the patient to follow breathing instructions, among many others. Thus, patients undergoing HCC screening/surveillance with US may be at risk of having an early-stage HCC missed, potentially leading to increased morbidity and mortality.

Two studies have evaluated the clinical performance of US LI-RADS and assessed various predictors of visualization scores, including a large study of over 2000 patients [11, 12]. To complement these studies, we tested the effect of demographics and clinical factors on US visualization score. The purpose of our study is to describe US quality for HCC screening/surveillance using the US LI-RADS scoring system, and to assess predictive factors of worse US quality scores.

Materials and methods

Patients

This retrospective, Health Insurance Portability and Accountability Act (HIPAA)-compliant single-center study was approved by our institutional review board with exemption for patient consent. The Department of Radiology imaging database was queried for abdominal US examinations performed at our major urban academic tertiary transplant center between January 1, 2017 and December 31, 2017. Included patients were adults (age ≥ 18 years) deemed at risk for HCC (those with non-cirrhotic chronic hepatitis B (HBV) or cirrhosis not due to congenital hepatic fibrosis or a vascular disorder) who underwent US for routine HCC screening/surveillance. The patient population consisted of 470 patients (M/F 264/206, median age 59.0 years, IQR 49.8–66.0, range 23–87); this population is a subset of the cohort published in a recent study assessing patterns of HCC screening at our institution [13].

Demographic and clinical data were recorded for each patient from the electronic medical record, including age, gender, race/ethnicity, BMI, presence/absence of cirrhosis, etiology of underlying liver disease, and Child–Pugh score (where applicable). Characteristics of our study cohort are summarized in Table 1.

Table 1 Characteristics of our study cohort (n = 470)

Image acquisition

Abdominal US examinations were performed at outpatient imaging facilities at our hospital utilizing Philips EPIQ or Philips iU22 US systems (Philips Healthcare, Andover, MA). The US examinations were performed by one of 15 technologists over the course of the year according to a standardized institutional protocol.

Image analysis

Visualization scores were assigned to the abdominal US examinations independently by two board-certified, fellowship-trained abdominal radiologists (KL, SR), with 16 and 20 years of experience, respectively, using PACS (Centricity 3.0, General Electric Medical Systems) [14]. Examples of score A include studies where the liver is homogeneous or minimally heterogeneous, there is minimal beam attenuation or shadowing, or where the liver is visualized in near entirety. Examples of score B include studies where the liver is moderately heterogeneous, there is moderate beam attenuation or shadowing, or some portions of the liver or diaphragm are not visualized. Examples of score C include studies where the liver is severely heterogeneous, there is severe beam attenuation or shadowing, or most of the liver and/or diaphragm are not visualized (Fig. 1) [10].

Fig. 1

Illustration of ultrasound (US) LI-RADS visualization scores in patients undergoing US for HCC screening. A US LI-RADS visualization score A (no or minimal limitations): 42-year-old male with BMI of 28.6 kg/m2 and noncirrhotic chronic HBV. Longitudinal grayscale US image of right hepatic lobe demonstrates noncirrhotic, homogeneous liver parenchyma with no beam attenuation or shadowing. Portion of liver imaged is visualized in its entirety with clearly visible right hemidiaphragm and liver border. No observation identified. B US LI-RADS visualization score B (moderate limitations): 55-year-old female with BMI of 29.6 kg/m2 and HCV cirrhosis. Longitudinal grayscale image of right hepatic lobe demonstrates moderately heterogeneous, cirrhotic liver parenchyma with some beam attenuation and shadowing, with a portion of liver and right hemidiaphragm not visualized. No focal observation clearly identified. C US LI-RADS visualization score C (severe limitations): 62-year-old male with BMI of 32.5 kg/m2 and NASH cirrhosis. Longitudinal grayscale image of right hepatic lobe demonstrates severely heterogeneous, cirrhotic liver parenchyma with significant beam attenuation and shadowing, with majority of right lobe and right hemidiaphragm not visualized. No focal observation clearly identified

Statistical analysis

Demographic, clinical, and imaging data were summarized using descriptive statistics. Differences in baseline characteristics between cirrhotic and noncirrhotic patients were assessed using the Wilcoxon rank-sum and Fisher’s exact tests. A generalized linear mixed model, accounting for the correlation of measurements from the same patients and readers, was used to test the effects of cirrhosis and obesity (defined as BMI ≥ 30 kg/m2) on worse visualization score (B or C) after adjusting for the covariates of age, gender, and race/ethnicity, with effects reported as odds ratios (OR). Inter-reader agreement for visualization score and diagnostic category was assessed with the kappa coefficient (K), interpreted as follows: 0 to 0.20 as slight, 0.21 to 0.40 as fair, 0.41 to 0.60 as moderate, 0.61 to 0.80 as substantial, and 0.81 to 1 as almost perfect agreement [15]. A p-value ≤ 0.05 was considered statistically significant. All statistical tests were conducted using R statistical software (version 4.1.3, Vienna, Austria).
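To make the agreement statistic concrete, the following is a minimal sketch of a kappa computation for two readers, together with the interpretation bands listed above. It assumes the unweighted Cohen's kappa, which the text does not specify; the function names and the toy scores are hypothetical and are not study data. The analysis itself was run in R, so this Python version is purely illustrative.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(reader1: Sequence[str], reader2: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two readers scoring the same cases."""
    if len(reader1) != len(reader2) or not reader1:
        raise ValueError("readers must score the same non-empty set of cases")
    n = len(reader1)
    # Observed agreement: fraction of cases with identical scores.
    p_observed = sum(a == b for a, b in zip(reader1, reader2)) / n
    # Chance agreement: product of each reader's marginal frequencies.
    c1, c2 = Counter(reader1), Counter(reader2)
    p_chance = sum(c1[k] * c2[k] for k in c1) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both readers used one identical category throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


def agreement_label(kappa: float) -> str:
    """Map kappa to the agreement categories listed in the text."""
    if kappa < 0:
        return "less than chance"  # not covered by the listed ranges
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Toy visualization scores for six examinations (illustrative only).
toy_reader1 = ["A", "A", "B", "B", "C", "A"]
toy_reader2 = ["A", "B", "B", "B", "C", "A"]
k = cohen_kappa(toy_reader1, toy_reader2)
print(f"kappa = {k:.3f} ({agreement_label(k)})")
```

Kappa corrects raw percent agreement for the agreement expected by chance, which matters here because most examinations fall into a single category (score A).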

Results

US findings

For reader 1, 295 (62.8%) cases were scored as visualization score A, 153 (32.6%) as score B, and 22 (4.6%) as score C. For reader 2, 320 (68.1%) cases were scored as visualization score A, 134 (28.5%) as score B, and 16 (3.4%) as score C (Fig. 2). There was moderate inter-reader agreement for US LI-RADS visualization score (K = 0.478) and 100% concordance between readers for assigned category (K = 1), with 430 (91.5%) cases scored as US LI-RADS 1, 10 (2.1%) as US LI-RADS 2, and 30 (6.4%) as US LI-RADS 3. Of the 30 cases scored US LI-RADS 3 by both readers, 9 were ultimately characterized as probable (LI-RADS 4) or definite (LI-RADS 5) HCC on follow-up confirmatory CT or MRI, yielding a positive predictive value (PPV) of 30% (9/30). The remaining 21 were false-positives, ultimately characterized as LI-RADS 1 (n = 12), LI-RADS 2 (n = 3), and LI-RADS 3 (n = 6) on follow-up cross-sectional imaging (Figs. 3 and 4, Table 2). No case had more than one lesion.
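The arithmetic behind these figures can be reproduced directly from the reported counts. The short sketch below recomputes the PPV and the share of examinations scored B or C for each reader; the helper name `ppv` and the variable names are hypothetical, and all numbers are taken from the paragraph above.

```python
def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value = TP / (TP + FP)."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no positive examinations")
    return true_positives / total


# Counts reported in the text for the 30 US LI-RADS 3 cases.
confirmed_hcc = 9  # LI-RADS 4 or 5 on follow-up CT or MRI
false_positive_breakdown = {"LI-RADS 1": 12, "LI-RADS 2": 3, "LI-RADS 3": 6}
false_positives = sum(false_positive_breakdown.values())

# Visualization score counts per reader (A, B, C), as reported.
reader_counts = {"reader 1": (295, 153, 22), "reader 2": (320, 134, 16)}

print(f"PPV = {ppv(confirmed_hcc, false_positives):.0%}")
for reader, (a, b, c) in reader_counts.items():
    share_b_or_c = (b + c) / (a + b + c)
    print(f"{reader}: score B or C in {share_b_or_c:.1%} of {a + b + c} cases")
```

Sensitivity and specificity cannot be derived the same way, because they would require follow-up of the US LI-RADS 1 and 2 cases to count false negatives and true negatives.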

Fig. 2

Ultrasound LI-RADS visualization score breakdown per reader in our cohort (n = 470)

Fig. 3

Illustration of a US LI-RADS 3 true-positive in a patient with chronic HCV cirrhosis. Transverse grayscale image of left hepatic lobe demonstrates a 3.9 cm homogeneously echogenic subcapsular observation in segment 2 (A; arrow). This was confirmed to be hepatocellular carcinoma on subsequent MRI (LI-RADS 5) (B, C; arrows), characterized by arterial phase hyperenhancement, portal venous washout, and enhancing capsule

Fig. 4

Illustration of US LI-RADS 3 false-positive. A US-3 (false-positive): 50-year-old female with noncirrhotic chronic HBV. Longitudinal grayscale image of right hepatic lobe demonstrates a 2.2 cm homogeneously echogenic subcapsular observation near the hepatic dome (arrow). B, C. Subsequent confirmatory MRI demonstrates a focal region of signal intensity loss on opposed-phase T1 weighted image compared to in-phase image (arrow) without abnormal enhancement (not shown), consistent with focal fat deposition

Table 2 Details of the 40 cases scored as either US LI-RADS 2 or US LI-RADS 3 by both readers

Predictive factors of LI-RADS visualization score

The interaction between cirrhosis and BMI was not significant (OR 0.96 [0.83–1.10], p = 0.53). There were significant differences in the baseline characteristics of age, race/ethnicity, BMI, and etiology of liver disease between cirrhotic and noncirrhotic patients (all p-values < 0.001). Cirrhosis (OR 10.4 [4.25–25.48], p < 0.001) and obesity (OR 3.61 [2.11–6.20], p < 0.001) predicted worse visualization scores (B or C) after adjusting for the covariates of age, gender, and race/ethnicity.
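As a reading aid for the odds ratios above, the sketch below checks how each bracketed range relates to its point estimate on the log-odds scale. It assumes the ranges are 95% Wald-type confidence intervals, symmetric around log(OR); the text does not state the interval type or level, so both the 1.96 multiplier and the helper name `log_scale_check` are assumptions made for illustration.

```python
import math


def log_scale_check(ci_low: float, ci_high: float, z: float = 1.96):
    """Return (implied point estimate, implied SE of the log-odds).

    Assumes a symmetric Wald interval on the log-odds scale:
    log(OR) +/- z * SE. The 95% level (z = 1.96) is an assumption.
    """
    log_low, log_high = math.log(ci_low), math.log(ci_high)
    implied_or = math.exp((log_low + log_high) / 2)  # geometric midpoint
    implied_se = (log_high - log_low) / (2 * z)
    return implied_or, implied_se


# (reported OR, lower bound, upper bound), as given in the text.
reported = {
    "cirrhosis": (10.4, 4.25, 25.48),
    "obesity": (3.61, 2.11, 6.20),
    "cirrhosis x BMI interaction": (0.96, 0.83, 1.10),
}

for name, (or_point, low, high) in reported.items():
    implied_or, se = log_scale_check(low, high)
    print(f"{name}: reported OR {or_point}, implied OR {implied_or:.2f}, "
          f"SE(log OR) {se:.3f}")
```

Under these assumptions the geometric midpoints fall within about 0.01 of the reported odds ratios, which is what intervals constructed on the log scale would produce. The much wider interval for cirrhosis reflects a larger standard error, not a weaker association.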

Discussion

In this study of patients undergoing HCC screening/surveillance, we demonstrated moderate inter-reader agreement for US LI-RADS visualization score (K = 0.478), with approximately one-third of cases being moderately diagnostic (visualization score B) and approximately 5% of cases being essentially nondiagnostic (visualization score C). Of the 30 US LI-RADS 3 cases requiring confirmatory cross-sectional imaging, 9 were ultimately characterized as probable or definite HCC on follow-up CT or MRI, yielding a PPV of 30% (9/30). We also demonstrated that cirrhosis and obesity are both independent predictors of worse US quality. Our 100% concordance between readers in assigning US LI-RADS category is not surprising and is likely related to sonographer annotations; however, this may not be the case elsewhere.

These results are closely aligned with the largest study published to date assessing US LI-RADS clinical performance in over 2000 patients, where the authors demonstrated a visualization score breakdown (76.8% A, 18.9% B, 4.2% C) and US category breakdown (90.4% US LI-RADS 1, 4.6% US LI-RADS 2, 4.9% US LI-RADS 3) similar to our study [11]. Our moderate inter-reader agreement for visualization score aligns closely with a recent Canadian study by Kiri et al. assessing US LI-RADS visualization score in 237 US examinations that also demonstrated moderate inter-reader agreement (K = 0.51) [12]. To complement these published studies, we tested additional factors that may affect US quality, including race/ethnicity and BMI. Similar to our results, the study by Millet et al. demonstrated a PPV of 35% for US LI-RADS; however, it demonstrated a high sensitivity of 82% for HCC detection, whereas sensitivities as low as 45% were reported in a recent meta-analysis [3, 11]. Our results are also broadly consistent with a study assessing outcomes of 267 US LI-RADS 3 observations, in which the authors demonstrated a PPV of a US LI-RADS 3 observation for probable or definite HCC of 18.8% [16]. No follow-up was sought on our patients with US LI-RADS 1 or 2 scores, preventing us from calculating sensitivity and specificity.

Increased BMI is widely associated with poorer quality US, both in the literature and in clinical practice. In the aforementioned Canadian study, the authors demonstrated similar results in that patients with BMI ≥ 25 kg/m2 were more likely to have visualization scores B or C at uni- and multivariable analysis [12]. In another recent study of 352 patients assessing predictors of US failure to detect HCC, the authors demonstrated significantly reduced US sensitivity relative to CT or MRI in patients with BMI ≥ 30 kg/m2 or NASH [17]. A study of 941 patients undergoing US HCC screening demonstrated that US was inadequate for excluding HCC in over one-third of patients with BMI > 35 kg/m2, Child–Pugh C cirrhosis, or NASH cirrhosis [18]. Anecdotally, a published survey of 14 sonographers revealed a general consensus that the best abdominal US images are obtained in patients with normal BMI, whereas images obtained from patients with BMI 30.0 to 34.9 kg/m2 were limited in quality [19]. These findings are explained by the greater mean depth of insonation in obese patients: the US waves must travel a greater distance, resulting in greater absorption and dispersion in the surrounding tissues, with increased refraction backscatter and decreased signal-to-noise ratio (SNR) [20]. Decreased SNR impairs US visualization of liver lesions and reduces the sensitivity of US screening/surveillance. Combined with recently published data, these findings suggest that overweight or obese patients undergoing HCC screening/surveillance with US may benefit from other modalities to best detect potentially curable HCC, to allow early application of curative therapies, and to reduce morbidity and/or mortality.

HCC screening is performed with CT or MRI in many centers given the improved sensitivity and ability to detect early-stage tumors associated with these modalities [13, 21]. In a prospective Korean study directly comparing the performance of US and MRI for HCC screening/surveillance, the PPVs for HCC detection with US and MRI were 17% and 54%, respectively, and sensitivities for HCC detection were 28% and 86%, respectively [22]. The higher cost of these modalities, along with increased radiation exposure (CT) and relatively long exam times (MRI), raises questions about which imaging modality is optimal for HCC screening/surveillance. These drawbacks must be reconciled with the fact that US screening generally demonstrates higher false-positive rates than CT or MRI, often revealing indeterminate findings that require additional imaging and/or biopsy, adding potential cost and harm [21, 23, 24].

In our study, a substantial proportion of patients (37% and 32%, for readers 1 and 2, respectively) were assigned visualization scores of B or C, highlighting the suboptimal US image quality frequently encountered in patients typically undergoing HCC screening/surveillance. In a recent study of 3169 patients who underwent multiple HCC surveillance US examinations, the probability of a repeat visualization score C on subsequent US increased with a higher number of preceding examinations with score C [25]. Potential future screening directions for this population include a patient-centered quality scoring system incorporating individual patient factors (such as presence of cirrhosis and BMI) to adequately stratify risk and identify those in whom US sensitivity may be reduced; in these patients, and especially in those with previous US examinations with score C, screening with CT or MRI would likely be beneficial. Abbreviated MRI techniques could also be used to mitigate the financial and time cost of a complete MRI, while providing more comprehensive information than US about the liver and other abdominal organs potentially relevant to transplant candidates [26,27,28,29].

Our study has several limitations, most notably its retrospective, single-center design. Our patient population, drawn from a densely populated urban setting, is not necessarily representative of national or global populations. Although patients were scanned according to a standardized institutional protocol, obtaining images from 15 different sonographers and 2 different US systems from the same vendor may have introduced variability in image quality. Our limited sample size, limited number of studies categorized as US LI-RADS 3, and suboptimal distribution of clinical variables prevented us from controlling for additional covariates (including etiology of underlying liver disease and Child–Pugh score) in the statistical analysis, and from providing more robust descriptive statistics. Finally, no follow-up was sought on any patients in our cohort receiving US LI-RADS 1 or US LI-RADS 2 scores, preventing us from calculating sensitivity and specificity.

In conclusion, we report moderate inter-reader agreement when assigning a US LI-RADS visualization score and 100% concordance in assigning a diagnostic category. Moderate to severe limitations in the quality of US performed for HCC screening/surveillance were observed in approximately one-third of patients in our cohort. Patients with cirrhosis and/or obesity are more likely to have poorer quality US studies and may benefit from other screening modalities such as CT or MRI. Future research should consider stratifying at-risk patients to the most appropriate screening test.