Introduction

Hepatocellular carcinoma (HCC) is the most common primary hepatic malignant tumor and the fourth most common cause of cancer-related death worldwide [1, 2]. Because the diagnosis of HCC can be noninvasively made on the basis of specific imaging features without pathologic confirmation [3, 4], the accurate imaging diagnosis of HCC is an important clinical issue in the management of patients at risk for HCC. To improve the performance and standardization of the imaging diagnosis of HCC in at-risk patients, the Liver Imaging Reporting and Data System (LI-RADS) was developed in 2011 [5], and fully integrated into the American Association for the Study of Liver Diseases (AASLD) 2018 HCC clinical practice guidance [6]. LI-RADS assigns categories to liver observations (i.e., LR-1 to LR-5) on the basis of the presence of major and ancillary imaging features, with the major features including observation size, nonrim arterial-phase hyperenhancement, nonperipheral washout, enhancing capsule, and threshold growth.

Of the five major features, two (observation size and threshold growth) are related to the size of the observation. Because the probability of HCC in a cirrhosis-associated nodule is positively correlated with the size of the observation (i.e., observations < 2.0 cm are more likely to be benign or well-differentiated malignancy, whereas the likelihood of malignancy increases in larger observations) [7, 8], observation size is considered an important major feature. In addition, treatment allocation of patients with HCC depends on lesion size. Thus, accurate size measurement of observations is essential for providing proper management. Although LI-RADS states that size measurement should not be performed on arterial-phase (AP) or diffusion-weighted imaging (DWI) [6], there is a lack of clear guidance on which image sequence is optimal for size measurement.

Although a few studies have investigated correlations in HCC size between pathologic measurements and those on CT and MRI [9, 10], the results of these studies were limited because they did not report results according to the specific imaging phase [9] or included only HCC, without benign lesions or non-HCC malignancies [10]. In addition, gadoxetate disodium has pharmacokinetic characteristics that differ from those of other extracellular contrast agents, and there is insufficient evidence as to which image sequence is the most suitable for size measurement on gadoxetate disodium-enhanced MRI.

Therefore, we aimed to determine the optimal image sequence for gadoxetate disodium-enhanced MRI in terms of size measurement of hepatic observations in comparison with pathologic measurement, and to evaluate its clinical impact on the LI-RADS v2018 classification.

Materials and methods

This single-center study was approved by the institutional review board of our center, which waived the need for informed patient consent because of the study’s retrospective nature.

Patients

From our institution’s computerized databases, 595 patients who underwent gadoxetate disodium-enhanced MRI and subsequent hepatectomy within 1 month between January 2017 and December 2017 (Fig. 1) were retrospectively identified. Of these 595 patients, 157 were excluded because they had no risk factor for HCC, 79 because they had a hepatic observation larger than 5 cm, 102 because they had undergone locoregional treatment for HCC before surgery, and four because no focal lesion was found on pathology. Because 5 cm is the generally accepted cutoff for small intrahepatic tumors and is used in patient selection for liver transplantation [11, 12], we limited the lesion size accordingly (≤ 5 cm). Finally, 253 patients with 279 hepatic observations were analyzed in the present study.
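As an arithmetic check of the selection described above, the following minimal Python sketch recomputes the final cohort size from the reported counts. The variable names are illustrative only; the numbers are those given in the text.

```python
# Counts are taken from the patient-selection paragraph above.
initially_identified = 595
exclusions = {
    "no risk factor for HCC": 157,
    "hepatic observation larger than 5 cm": 79,
    "locoregional treatment before surgery": 102,
    "no focal lesion on pathology": 4,
}

excluded_total = sum(exclusions.values())
final_cohort = initially_identified - excluded_total

print(f"Excluded: {excluded_total}")
print(f"Analyzed: {final_cohort}")
```

The result agrees with the 253 patients reported as analyzed.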

Fig. 1 Flowchart of the study population. HCC, hepatocellular carcinoma; MRI, magnetic resonance imaging

MRI technique

MRI examinations were performed using 1.5-T (Magnetom Avanto; Siemens Healthineers, Erlangen, Germany) or 3.0-T (Magnetom Skyra; Siemens Healthineers) scanners. The MRI protocol consisted of non-enhanced MRI using breath-hold dual gradient-echo T1-weighted imaging (T1WI), respiratory-triggered turbo spin echo T2-weighted imaging (T2WI), DWI using a respiratory-triggered single-shot echo-planar imaging sequence with b values of 0, 50, 500, and 900 s/mm², and contrast-enhanced MRI. The contrast-enhanced MRI used a fat-suppressed three-dimensional spoiled gradient-echo T1-weighted sequence. After intravenous injection of 0.1 mL/kg body weight of gadoxetate disodium at 1 mL/s followed by a 20 mL saline flush, the following four image sequences were obtained at different phases: AP (determined using a test-bolus method); portal venous phase (PVP, 25 s after completion of the AP images); transitional phase (TP, 3 min after contrast injection); and hepatobiliary phase (HBP, 20 min after contrast injection). Further details of the MRI techniques and sequence parameters are provided in Online Resource 1.

Image analysis

Images were independently reviewed by two abdominal radiologists (J.H. and J.Y.C., with 4 and 2 years of experience in hepatic imaging, respectively). Because this study was mainly focused on the size measurement of hepatic observations, the readers were informed of the location of the target observations to be analyzed, but were blinded to the pathologically measured size of the target observations. The list of target hepatic observations correlating with pathologic findings was prepared by a third investigator (S.H.C.) who was not involved in the image analysis.

The readers evaluated the visibility of hepatic observations on the following six MRI sequences: T1WI, T2WI, AP, PVP, TP, and HBP. The visibility of hepatic observations on each sequence was scored using a 5-point scale: 1 = non-visible; 2 = visible, but faint; 3 = equivocal; 4 = mostly clear margin, but partly indistinct; and 5 = perfectly demarcated margin. For visible hepatic observations, size measurement was performed using the largest outer-edge-to-outer-edge dimension on both axial and coronal reconstructed images of each sequence according to LI-RADS v2018 [6]. The largest value in the three dimensions of axial and coronal images was determined as the size of the hepatic observation. In addition, the readers analyzed the presence or absence of major features (nonrim arterial-phase hyperenhancement, washout, or enhancing capsule), ancillary features, and targetoid mass features according to LI-RADS v2018 [6]. The LI-RADS category was assigned based on the observation size measured on each MRI sequence. When LI-RADS classification was not possible because the hepatic observation was not visible on the analyzed MRI sequence, the observation was categorized as non-visible.
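The per-sequence measurement rule described above can be sketched as follows. This is a minimal illustration, not the software used in the study: the function name, the argument layout (two in-plane dimensions per reconstruction), and the example values are assumptions made for the sketch.

```python
from typing import Optional, Sequence


def observation_size_mm(
    axial_dims_mm: Sequence[float],
    coronal_dims_mm: Sequence[float],
    visibility_score: int,
) -> Optional[float]:
    """Return the largest measured dimension on one MRI sequence.

    Returns None when the observation is non-visible (score 1), in which
    case no size-based LI-RADS category can be assigned on that sequence.
    """
    if not 1 <= visibility_score <= 5:
        raise ValueError("visibility score must be between 1 and 5")
    if visibility_score == 1:
        return None
    # Largest value across the axial and coronal measurements.
    return max(list(axial_dims_mm) + list(coronal_dims_mm))


# Hypothetical measurements (mm) for a single observation on one sequence.
size = observation_size_mm([18.0, 15.5], [17.0, 21.0], visibility_score=4)
print(size)
```

In this hypothetical case the craniocaudal dimension on the coronal image is the largest and would be recorded as the observation size.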

In the case of any discrepancies between the two readers, the average value between the two readers was used for both visualization score and observation size, and re-evaluation with a third reader was made to reach a consensus on LI-RADS category assignment.

Reference standard

Gross liver specimens were thoroughly examined and evaluated by experienced pathologists. The resected liver was regularly cut at 5 mm intervals to reveal the largest cross-section centered on the hepatic mass. The pathologists identified the hepatic mass in each sliced section and analyzed its characteristics including location, size, and resection margin. Mass size was expressed as a three-dimensional measurement of the longest axes (width × length × height). For each mass, at least three formalin-fixed paraffin-embedded blocks containing the tumor and adjacent non-neoplastic liver tissue were fabricated, and all hematoxylin–eosin-stained slides were reviewed by expert hepatobiliary pathologists. If necessary, immunohistochemical staining was performed to determine the final diagnosis.

Statistical analysis

Continuous data are expressed as mean and standard deviation, and proportions as number and percentage. Statistical analysis was performed using SPSS Statistics for Windows version 23.0 (SPSS Inc., Chicago, IL, USA) and R version 4.0.3 (R Foundation for Statistical Computing, Vienna, Austria). For post hoc pairwise comparisons, we used the Bonferroni correction method to adjust for multiplicity, and p values < 0.05 were considered statistically significant.
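For readers unfamiliar with the Bonferroni method, a minimal sketch is given below. It assumes that all pairwise comparisons among the six MRI sequences are made (15 pairs), which the text does not state explicitly; the raw p values are made-up numbers for illustration, not study results.

```python
from math import comb


def bonferroni_adjust(p_values, n_comparisons=None):
    """Multiply each raw p value by the number of comparisons, capped at 1."""
    m = n_comparisons if n_comparisons is not None else len(p_values)
    return [min(1.0, p * m) for p in p_values]


n_sequences = 6
n_pairs = comb(n_sequences, 2)  # number of pairwise comparisons among six sequences

raw_p = [0.0001, 0.004, 0.03]  # hypothetical raw p values
adjusted_p = bonferroni_adjust(raw_p, n_pairs)

for raw, adj in zip(raw_p, adjusted_p):
    verdict = "significant" if adj < 0.05 else "not significant"
    print(f"raw p = {raw:.4f}, adjusted p = {adj:.4f} ({verdict})")
```

Because adjusted values are capped at 1, a correction of this kind is one way in which values reported as p > 0.999 can arise.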

The visualization scores for the visibility of hepatic observations were compared between the six MRI sequences using repeated measures analysis of variance (RM ANOVA) with post hoc comparison tests.

We compared the largest value in the three dimensions on MRI with the largest value in the three dimensions of the pathologic specimen. To compare the observation size on the six MRI sequences with the size determined by pathologic evaluation, correlations between MRI and pathologic measurements were calculated using the Pearson correlation coefficient. The correlation coefficients (r) from these comparisons were interpreted as follows: 0.90‒1.00 = very strong correlation; 0.70‒0.89 = strong correlation; 0.40‒0.69 = moderate correlation; 0.10‒0.39 = weak correlation; 0.00‒0.09 = negligible correlation [13]. In addition, an RM ANOVA with post hoc comparison tests was used to compare MRI sequences in terms of the absolute difference between MRI and pathologic measurements, which was defined as the magnitude of the difference between the two measurements. Bland–Altman plots were also used to assess agreement between MRI and pathologic measurements.
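The three agreement measures described above (Pearson correlation, mean absolute difference, and Bland–Altman limits) can be sketched in plain Python as follows. The paired sizes are made-up values, not study data, and the limits are computed as the mean difference ± 1.96 standard deviations, which is the conventional definition and is assumed here rather than taken from the text.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_absolute_difference(x, y):
    """Mean magnitude of the paired differences."""
    return mean([abs(a - b) for a, b in zip(x, y)])


def bland_altman_limits(x, y):
    """Return (lower limit, bias, upper limit) of the differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias - half_width, bias, bias + half_width


def interpret_r(r):
    """Map |r| to the verbal categories listed in the text [13]."""
    a = abs(r)
    if a >= 0.90:
        return "very strong"
    if a >= 0.70:
        return "strong"
    if a >= 0.40:
        return "moderate"
    if a >= 0.10:
        return "weak"
    return "negligible"


# Hypothetical paired sizes in mm (MRI vs. pathology).
mri_mm = [12.0, 25.0, 31.0, 18.0, 44.0]
path_mm = [11.0, 27.0, 30.0, 20.0, 45.0]

r = pearson_r(mri_mm, path_mm)
mad = mean_absolute_difference(mri_mm, path_mm)
lower, bias, upper = bland_altman_limits(mri_mm, path_mm)
print(f"r = {r:.3f} ({interpret_r(r)}), mean absolute difference = {mad:.1f} mm")
print(f"bias = {bias:.1f} mm, limits of agreement = {lower:.1f} to {upper:.1f} mm")
```

The correlation coefficient and the absolute difference answer different questions: two measurements can correlate very strongly while still differing by a constant offset, which is why both are reported alongside the Bland–Altman analysis.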

To evaluate the clinical impact of size measurement using the different MRI acquisitions, the LI-RADS classification according to the size measurement of each MRI sequence was assessed, and compared between the six MRI sequences using a Cochran’s Q test with post hoc McNemar’s test.
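A minimal sketch of the two tests named above is given below. It computes the Cochran’s Q statistic for a binary outcome (for example, whether an observation received a given category on each sequence) and an exact two-sided McNemar p value for one pair of sequences. Whether the exact or the asymptotic McNemar test was used is not stated in the text, so the exact form is an assumption; the small table is made-up data.

```python
from math import comb


def cochran_q(table):
    """Cochran's Q for a table with rows = observations, columns = sequences.

    Entries are 0 or 1. Returns (Q, degrees of freedom); the p value would be
    obtained from a chi-square distribution with k - 1 degrees of freedom.
    """
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    denom = k * n - sum(r * r for r in row_totals)
    if denom == 0:
        raise ValueError("Q is undefined when every row is all 0s or all 1s")
    q = (k - 1) * (k * sum(c * c for c in col_totals) - n * n) / denom
    return q, k - 1


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p value from the discordant pair counts b and c."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical outcomes for six observations on three sequences.
toy_table = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
]
q_stat, dof = cochran_q(toy_table)
print(f"Q = {q_stat:.2f} with {dof} degrees of freedom")
print(f"exact McNemar p for 9 vs. 1 discordant pairs = {mcnemar_exact_p(9, 1):.4f}")
```

Only the discordant pairs enter the McNemar test, so observations categorized identically on both sequences do not affect the pairwise comparison.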

Inter-reader reliability was evaluated using weighted kappa statistics for visualization score (0.00‒0.20 = slight; 0.21‒0.40 = fair; 0.41‒0.60 = moderate; 0.61‒0.80 = substantial; and 0.81‒1.00 = almost perfect) and using intraclass correlation coefficients (ICC) for size measurement (0.00‒0.39 = poor; 0.40‒0.59 = fair; 0.60‒0.74 = good; and 0.75‒1.00 = excellent) [14].
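The interpretation bands listed above can be encoded as a simple lookup, sketched below. The function names are illustrative, and two choices are assumptions of the sketch: values are rounded to two decimals before banding, and values outside 0 to 1 raise an error because the text gives no band for them.

```python
KAPPA_BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]
ICC_BANDS = [
    (0.39, "poor"),
    (0.59, "fair"),
    (0.74, "good"),
    (1.00, "excellent"),
]


def _label(value, bands):
    """Return the label of the first band whose upper bound covers the value."""
    v = round(value, 2)
    if not 0.0 <= v <= 1.0:
        raise ValueError("value must lie between 0 and 1")
    for upper, label in bands:
        if v <= upper:
            return label


def interpret_kappa(kappa):
    """Verbal category for a weighted kappa value."""
    return _label(kappa, KAPPA_BANDS)


def interpret_icc(icc):
    """Verbal category for an intraclass correlation coefficient."""
    return _label(icc, ICC_BANDS)


print(interpret_kappa(0.45), interpret_icc(0.92))
```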

Results

Patient characteristics

The clinical characteristics of the 253 included patients with 279 observations are summarized in Table 1. The patients included 187 men (mean age = 63 years; range = 39–83) and 66 women (mean age = 65 years; range = 43–84). Of the 253 patients, 114 (45.1%, 114/253) had liver cirrhosis. The most common risk factor was hepatitis B (80.2%, 203/253), followed by hepatitis C (8.3%, 21/253). The mean size of the 279 included observations on pathologic measurement was 26.4 ± 11.2 mm (range = 4.0–50.0 mm). Of the 279 observations, 247 were confirmed as HCC, 21 as non-HCC malignancies (11 cholangiocarcinomas, nine combined hepatocellular cholangiocarcinomas, and one metastasis), and 11 as benign lesions (five dysplastic nodules, three hemangiomas, two regenerative nodules, and one bile ductular proliferative lesion).

Table 1 Baseline characteristics of the patients and lesions

Visualization scores on the six MRI sequences

Of the six MRI sequences, HBP showed the highest visualization score (4.1 ± 0.6), which was significantly higher than that on the other five MRI sequences (all p < 0.001; Table 2). TP had the second-highest visualization score (3.6 ± 1.1), which was similar to that of T2WI, and both were significantly higher than the other three MRI sequences (p ≤ 0.001). In the assessment of inter-reader reliability of visualization score, T1WI, T2WI, and PVP showed moderate agreement, whereas AP, TP, and HBP showed fair agreement (Online Resource 2).

Table 2 Comparison of visualization scores between the six MRI sequences

Correlations between MRI and pathologic measurements

Table 3 summarizes the correlations and comparisons of absolute differences in observation size between pathologic assessment and each MRI sequence. Observation size on all six MRI sequences showed a very strong and statistically significant correlation with pathologic assessment (all r > 0.9 and p < 0.001). Of the six MRI sequences, HBP showed the highest correlation coefficient (r = 0.965), followed by TP (r = 0.960). The absolute difference between MRI and pathologic measurement was lowest on TP (2.3 ± 2.2 mm), followed by HBP (2.4 ± 2.1 mm), with no significant difference between TP and HBP (p > 0.999). Both TP and HBP had significantly lower absolute differences than AP (2.3 mm vs. 2.8 mm, p = 0.002 for TP; 2.4 mm vs. 2.8 mm, p = 0.002 for HBP) or PVP (2.3 mm vs. 2.7 mm, p = 0.006 for TP; 2.4 mm vs. 2.7 mm, p = 0.028 for HBP). Bland–Altman plots demonstrating the agreement between each MRI sequence and pathologic measurement are shown in Online Resource 3. The 95% limits of agreement between HBP and pathologic measurement were −5.2 and 6.8 mm, whereas those between TP and pathologic measurement were −6.2 and 6.4 mm.

Table 3 Correlation coefficients and absolute differences in observation size between imaging and pathologic assessments

In the assessment of inter-reader reliability of size measurement, all six MRI sequences showed excellent agreement (ICC > 0.9; Online Resource 2). For both readers, HBP showed the highest correlation coefficient (r = 0.967 for reader 1 and r = 0.958 for reader 2; Table 4), and both TP and HBP demonstrated the lowest absolute difference between MRI and pathologic measurement (2.2 mm on both TP and HBP for reader 1 and 2.6 mm on both TP and HBP for reader 2; Table 4). The inter-reader absolute difference in size measurement was 0.9 mm on TP and 0.8 mm on HBP.

Table 4 Correlation coefficients and absolute differences in observation size between imaging and pathologic assessments in the two readers

LI-RADS classification

The results of LI-RADS classification according to the size measurement of each MRI sequence are illustrated in Fig. 2. Of the six MRI sequences, only HBP had no non-visible observations, that is, observations for which size measurement was not possible. HBP (0.0%, 0/279) showed a significantly lower percentage of non-visible observations than T1WI (3.6%, 10/279, p = 0.024), T2WI (3.9%, 11/279, p = 0.014), AP (3.2%, 9/279, p = 0.040), and PVP (3.2%, 9/279, p = 0.040). The difference in the percentage of non-visible observations between HBP and TP was not statistically significant (0.0% vs. 1.8%, p = 0.380). Compared with HBP, the LI-RADS classification was different in 4.3% (12/279) of observations on T1WI, 4.3% (12/279) on T2WI, 5.7% (16/279) on AP, 3.9% (11/279) on PVP, and 2.2% (6/279) on TP. Regarding LR-3, LR-4, and LR-5, there was no significant difference in LI-RADS classification among the six MRI sequences (p ≥ 0.122).

Fig. 2 LI-RADS classification according to size measurement of the six MRI sequences. T1WI, T1-weighted image; T2WI, T2-weighted image; AP, arterial phase; PVP, portal venous phase; TP, transitional phase; HBP, hepatobiliary phase

Discussion

In the present study, we found that hepatobiliary-phase imaging was superior to all other sequences with respect to both visualization score (mean of 4.1 on a 5-point scale) and correlation with pathologic measurement (r = 0.965 and mean absolute difference = 2.4 mm). In addition, of the six MRI sequences evaluated, hepatobiliary-phase imaging provided clear LI-RADS classifications without any non-visible observations. Therefore, hepatobiliary-phase imaging may be the optimal MRI acquisition for measuring hepatic observations on gadoxetate disodium-enhanced MRI.

In our study, HBP showed the highest visualization score for hepatic observations. Unlike the result of Seuss et al. [10], who suggested PVP as a potential standard approach for size measurement and reported that AP had the highest percentage of visible HCC (96–98%) on extracellular contrast-enhanced MRI, AP did not have a high visualization score in our study (fifth highest of the six MRI sequences). Considering the smaller administered volume and lower gadolinium content of gadoxetate disodium-enhanced MRI in comparison with extracellular contrast-enhanced MRI [15], the weak arterial hyperenhancement with gadoxetate disodium could cause a relatively low visualization score. By contrast, we found that HBP had the highest visualization score, a finding similar to previous studies that reported improved detection and localization of focal hepatic observations, including HCC, using gadoxetate disodium-enhanced MRI [16, 17, 18] (e.g., the sensitivity for hypovascular HCC was significantly increased from 59 to 95% using HBP imaging with gadoxetate disodium-enhanced MRI [19]). This good performance can be explained by high lesion-to-liver contrast and high conspicuity during the HBP of gadoxetate disodium-enhanced MRI [20, 21].

The observation sizes on all six MRI sequences showed significant, very strong correlations with those on pathologic assessment (r > 0.9; p < 0.001). This indicates that size measurement on gadoxetate disodium-enhanced MRI is reliable. In particular, the mean absolute difference between MRI and pathologic measurement was 3 mm or smaller on all MRI sequences. Our results showed smaller absolute differences than those of Seuss et al. (2.3–3.0 mm vs. 4.3–6.8 mm). We think this difference may be because we measured the maximal diameter of hepatic observations using both axial and coronal plane MRI, whereas Seuss et al. used only axial plane MRI [10].

Both TP and HBP showed lower absolute differences than PVP, which had the lowest absolute difference in a previous study [10]. The lesion-to-liver contrast on PVP is determined by the true enhancement or washout of lesions, whereas that on the TP of gadoxetate disodium-enhanced MRI is derived not only from true enhancement and washout of lesions, but also from enhancement of surrounding hepatic parenchyma, because the uptake of gadoxetate disodium by hepatocytes starts approximately 90 s after contrast injection [22]. In other words, TP imaging has characteristics of both PVP and HBP imaging, which may be the reason for the better performance of size measurement on TP in comparison with that on PVP. In addition, given the differences in pharmacokinetics between gadoxetate disodium and other extracellular contrast agents, the results for TP in our study should be interpreted differently from those for the equilibrium phase in the previous study [10]. By contrast, AP and T2WI showed relatively large absolute differences in our study (mean absolute difference = 2.8 mm on AP and 3.0 mm on T2WI). Because reactive or perfusion-related change in peritumoral liver parenchyma is often highly visible on AP imaging, and because artifacts caused by respiratory motion or other patient motion on T2WI can produce image blurring, ghosting, or loss of signal intensity [23, 24], size measurement on these sequences may not be reliable. Likewise, LI-RADS v2018 guidance does not recommend measuring the size of observations on AP or DWI if the margins are clearly visible on a different phase [6].

Regarding the LI-RADS classification, there was no significant difference in LR-3, LR-4, and LR-5 categorizations among the six MRI sequences. The mean absolute difference between MRI and pathologic measurement was 3 mm or smaller on all MRI sequences, and size measurement changes the LI-RADS category only when the observation size lies near a size threshold (e.g., an arterial-phase hyperenhancing observation with washout is categorized as LR-4 when measured as 9 mm but as LR-5 when measured as 11 mm; Fig. 3). The effect of observation size measurement on LI-RADS classification may therefore not be substantial. However, HBP did not have any non-visible observations, which implies that HBP enables size measurement, and thus LI-RADS classification, in all cases. Our findings suggest that HBP provides not only accurate size measurement, but also clear LI-RADS classification.
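The threshold effect described above can be illustrated with a deliberately simplified sketch. It covers only the single case discussed in the text (an observation with nonrim arterial-phase hyperenhancement and washout, for which the 10 mm size threshold separates LR-4 from LR-5) and is not an implementation of the full LI-RADS v2018 algorithm. The function name is hypothetical; the sizes are those reported for the lesion in Fig. 3.

```python
from typing import Optional

SIZE_THRESHOLD_MM = 10.0  # size threshold relevant to this single case


def category_for_aphe_with_washout(size_mm: Optional[float]) -> str:
    """Category for an observation with nonrim APHE and washout only.

    Simplified: no other major or ancillary features are considered.
    None stands for an observation that is non-visible on that sequence.
    """
    if size_mm is None:
        return "non-visible"
    return "LR-5" if size_mm >= SIZE_THRESHOLD_MM else "LR-4"


# Sizes (mm) reported for the Fig. 3 lesion on each sequence.
fig3_sizes_mm = {"T1WI": 9, "T2WI": 8, "AP": 10, "PVP": 8.5, "TP": 9, "HBP": 11}

categories = {
    seq: category_for_aphe_with_washout(size) for seq, size in fig3_sizes_mm.items()
}
for seq, cat in categories.items():
    print(f"{seq}: {fig3_sizes_mm[seq]} mm -> {cat}")
```

A difference of 1 to 2 mm between sequences changes the category only because this lesion happens to lie close to the threshold; for observations well above or below it, the same measurement variation would leave the category unchanged.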

Fig. 3 A 67-year-old woman with HCC. (A–F) Gadoxetate disodium-enhanced MRI shows an arterial-phase hyperenhancing lesion (arrow) in segment VI (C) with washout on PVP (D) and HBP hypointensity (F). The lesion was measured as 9, 8, 10, 8.5, 9, and 11 mm on T1WI (A), T2WI (B), AP (C), PVP (D), TP (E), and HBP (F), and was assigned LR-4, LR-4, LR-5, LR-4, LR-4, and LR-5, respectively. It was confirmed as HCC and measured as 11 mm on pathologic examination. Arrows indicate the hepatic observation on each phase

Our study has several limitations. First, a selection bias may have been introduced by the inclusion of only surgically resected observations, and this may have resulted in the high proportion of HCC in our study. However, to minimize this limitation, we consecutively included all surgically resected hepatic observations, whether non-HCC malignancies, benign lesions, or HCCs, rather than exclusively including HCCs diagnosed from liver explant, as was performed in a previous study [10]. Second, although LI-RADS suggests that observation size be measured on the MRI sequence in which margins are clearest, we measured observation size on all six MRI sequences to determine the optimal MRI sequence in terms of size measurement in comparison with pathologic measurement. Third, size measurement on T2WI may be limited because of the suboptimal image quality of coronal T2WI reconstructed from two-dimensional images.

In conclusion, hepatobiliary-phase images had the best lesion visibility and the highest correlation of measured observation size with pathologic assessment among the six sequences of gadoxetate disodium-enhanced MRI. Therefore, hepatobiliary-phase images can be considered clinically useful for measuring hepatic observations on gadoxetate disodium-enhanced MRI.