Introduction

Hepatocellular carcinoma (HCC) is the most common primary liver cancer and is associated with high mortality worldwide [1]. Gadoxetic acid-enhanced magnetic resonance imaging (MRI) is widely employed for the noninvasive identification of HCC in high-risk individuals, as advocated by leading guidelines [2,3,4,5]. Gadoxetic acid-enhanced MRI has proven particularly effective in identifying small HCCs (< 2 cm) because of the strong contrast between the lesion and the liver background during the hepatobiliary phase (HBP), resulting in high sensitivity [6]. However, factors such as transient respiratory motion during the arterial phase (AP), low signal-to-noise ratio (SNR), and a limited temporal window of dynamic phases may compromise the image quality and focal lesion detectability of gadoxetic acid-enhanced MRI [7, 8]. Over the years, numerous rapid imaging techniques, such as parallel imaging and compressed sensing, have been developed to enhance the temporal resolution of the dynamic phases [9, 10]. Despite these advancements, the utility of these techniques remains restricted by the intrinsic trade-off between scan duration and image quality parameters, such as SNR and spatial resolution.

Recently, deep learning (DL) algorithms have been applied to MRI, including image acquisition and reconstruction [11,12,13]. Specifically, numerous DL reconstruction algorithms (DLRAs) have been developed to enhance image quality by reducing noise and mitigating artifacts in two-dimensional (2D) T2-weighted fast spin-echo, T1-weighted gradient-echo (GRE), and diffusion-weighted images acquired using clinical scanners [14,15,16,17]. DL reconstruction improves image sharpness and clarity by suppressing artifacts such as Gibbs ringing and by increasing the SNR [18, 19]. More recently, a vendor-specific DL reconstruction (AIR™ Recon DL, GE Healthcare, Waukesha, Wisconsin, USA) was introduced for three-dimensional (3D) T1-weighted GRE sequences, which are commonly used in dynamic abdominal MRI. Because previous studies have demonstrated the efficacy of DL reconstruction for 2D T1-weighted GRE, 3D T2-weighted sequences in other organs, and 3D T1-weighted MR enterography [20,21,22], we hypothesized that DL reconstruction may enhance the image quality of 3D T1-weighted GRE sequences in gadoxetic acid-enhanced liver MRI. This, in turn, may improve focal lesion detectability, with significant clinical implications for the early identification and management of HCC.

Therefore, this retrospective study aimed to investigate the effectiveness of a vendor-specific DLRA for 3D T1-weighted GRE sequences in improving the image quality and focal lesion detectability of gadoxetic acid-enhanced liver MRI in patients at high risk of HCC.

Materials and methods

This retrospective study was approved by the Institutional Review Board of Seoul National University Hospital (IRB No. H-2206-176-1336), which waived the requirement for informed consent.

Patients

Between September 2021 and May 2022, 106 consecutive patients underwent gadoxetic acid-enhanced dynamic liver MRI using a 3-T scanner, and DLRA was applied to the acquired images. Patients were eligible if they (a) were at risk of developing HCC and (b) were aged ≥ 18 years. Patients were excluded if (a) five or more focal liver lesions (FLLs) were found or (b) the liver MRI did not follow the routine protocol for any reason. Accordingly, 23 patients were excluded because they had no risk factor for HCC (n = 3) or five or more FLLs (n = 20). Finally, 83 patients (49 men; median age, 65.8 years [interquartile range, 60.0–72.0 years]) were included in this study (Fig. 1; Table 1).

Fig. 1

Flow diagram of the study showing which patients were included or excluded

Table 1 Patient characteristics

MRI acquisition

All examinations were performed using a 3-T scanner (SIGNA Premier; GE Healthcare). The liver MRI protocol consisted of the following sequences: heavily T2-weighted imaging (T2WI), fat-suppressed T2WI, precontrast T1-weighted imaging (T1WI), postcontrast T1WI (arterial, portal venous, transitional, and hepatobiliary phases), dual-echo imaging, and diffusion-weighted imaging with three b-values (50, 400, and 800 s/mm²; Table 2). A standard dose of gadoxetic acid (0.025 mmol/kg; Primovist/Eovist; Bayer Healthcare, Berlin, Germany) was administered at 1 mL/s with a 25-mL saline flush. After injection, triple AP, portal venous phase (PVP; 60 s after contrast injection), transitional phase (3 min), and HBP (20 min) images were acquired at standard resolution using a fat-suppressed 3D T1-weighted GRE sequence (LAVA, GE Healthcare). In addition, during the PVP and HBP, high-resolution axial images with a 2-mm slice thickness and 1-mm reconstruction interval were routinely acquired with the 3D T1-weighted GRE sequence immediately after the standard-resolution images.
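As a worked example of the weight-based dosing above, assuming the labeled Primovist/Eovist concentration of 0.25 mmol/mL and a hypothetical 70-kg patient, the injected contrast volume is:

```latex
$$
V = \frac{0.025\ \text{mmol/kg} \times 70\ \text{kg}}{0.25\ \text{mmol/mL}} = 7\ \text{mL}
$$
```

In other words, 0.025 mmol/kg corresponds to 0.1 mL/kg, so at 1 mL/s the contrast bolus itself lasts roughly 7 s in such a patient before the saline flush.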

Table 2 Imaging parameters of gadoxetic acid-enhanced liver MRI

Deep learning reconstruction algorithm

The DLRA used in this study was a vendor-provided prototype of AIR™ Recon DL 3D (GE Healthcare) [14], which is now commercially available. This reconstruction technique uses a deep convolutional neural network to reduce noise and Gibbs ringing and to improve image sharpness in all three directions. MR images were reconstructed offline from the raw k-space data at a user-specified denoising level. Of the liver MR images, the triple AP, high-resolution PVP, and high-resolution HBP images were additionally reconstructed with DL reconstruction at a 75% denoising level [22] and used to compare conventional and DL reconstruction.

Image analysis

All 83 pairs of MRI datasets (conventional and DL reconstructions) were evaluated by three abdominal radiologists with 5–7 years of experience in interpreting abdominal MR images. Triple AP, high-resolution PVP, and high-resolution HBP images were reconstructed using both the conventional reconstruction technique and the DLRA. Precontrast T1W and heavily T2W images reconstructed with the conventional method were added to both sets to assess lesion enhancement and to exclude cysts. Therefore, the conventional and DL image sets each included axial precontrast T1W, heavily T2W, and postcontrast triple AP, high-resolution PVP, and high-resolution HBP images, the postcontrast images being reconstructed with conventional or DL reconstruction, respectively.

Paired sets of MR images generated with conventional and DL reconstruction were presented in random order. The image sets were anonymized and randomly assigned to folder A or B to minimize bias. The reviewers were blinded to the reconstruction method and to clinical information, except that all patients were at risk of developing HCC. A washout period of at least 4 weeks separated the evaluations of folders A and B.
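The folder randomization described above can be sketched as follows. This is a minimal Python illustration, not the authors' procedure: it assumes that each folder holds exactly one reconstruction per patient and that reading order is shuffled within each folder; the function name, seed, and case-code format are hypothetical.

```python
import random


def assign_folders(patient_ids, seed=0):
    """Randomly split each patient's conventional/DL pair between folders A and B.

    Returns the anonymized reading list for each folder and an unblinding key
    (case code -> (patient ID, reconstruction)) that is kept from the readers.
    """
    rng = random.Random(seed)  # hypothetical fixed seed for reproducibility
    folders = {"A": [], "B": []}
    for pid in patient_ids:
        recons = ["conventional", "DL"]
        rng.shuffle(recons)  # decides which reconstruction of this patient goes to A
        folders["A"].append((pid, recons[0]))
        folders["B"].append((pid, recons[1]))

    reading_lists, key = {}, {}
    for name, cases in folders.items():
        rng.shuffle(cases)  # randomize the reading order within each folder
        codes = []
        for i, case in enumerate(cases, start=1):
            code = f"{name}{i:03d}"  # anonymous case code shown to the readers
            key[code] = case
            codes.append(code)
        reading_lists[name] = codes
    return reading_lists, key


# Usage with 83 hypothetical patient IDs
lists, key = assign_folders([f"P{i:02d}" for i in range(1, 84)], seed=42)
print(len(lists["A"]), len(lists["B"]))  # 83 83
```

Splitting every pair across the two folders means a reader never sees both versions of the same examination in one session, and the washout period separates the two readings of each patient.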

Image quality

For each image set, the degree of contrast enhancement of the hepatic vessels (right hepatic artery on AP, right portal vein on PVP, and right hepatic vein on HBP images), conspicuity of the hepatic vessels, liver edge sharpness, ringing artifacts, susceptibility artifacts, motion artifacts, subjective image noise levels, and overall image quality were evaluated for each phase. The detailed scale information used for image evaluation is provided in Table E1.

Detectability of focal liver lesions

Reviewers were asked to detect up to four FLLs per patient, excluding arterioportal shunts, post-treatment changes, and cysts, and to record the conspicuity score of each suspected lesion in each of the three phases (AP, PVP, and HBP) on a 4-point scale (Table E1). They were also asked to record the location and image slice of each suspected lesion for localization.

Reference standards

Two experienced abdominal radiologists (J.H.Y. and J.H.K., with 15 and 8 years of experience in interpreting abdominal MRI, respectively) who did not participate in the review sessions reviewed all MRI sequences and follow-up images in consensus to identify and characterize FLLs. A total of 89 FLLs were identified and categorized according to the Liver Imaging Reporting and Data System (LI-RADS) version 2018 [23] and the LI-RADS 2017 Treatment Response algorithm [24].

Statistical analysis

The Mann–Whitney U test was used to compare image quality scores, averaged across the three readers, between the conventional and DL methods. Each reader's performance in detecting FLLs in the AP, PVP, and HBP was evaluated on a per-lesion basis using figures of merit (FOMs) from jackknife alternative free-response receiver operating characteristic (JAFROC) analysis (version 4.2.1). FOMs were compared using the Hillis improvement [25] of the method described by Dorfman et al. [26] under the random-reader, random-case modeling assumption. Per-lesion sensitivity was calculated as the number of correctly localized lesions divided by the total number of lesions. The false-positive interpretation rate was defined as the number of false-positive interpretations divided by the total number of MRI scans. McNemar’s test was used to compare the sensitivities of the conventional and DL methods for individual readers. Generalized estimating equations were used to compare pooled sensitivities, lesion conspicuity scores, and false-positive interpretation rates between the conventional and DL methods.
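The per-lesion sensitivity, false-positive rate, and per-reader McNemar comparison defined above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the sketch uses the exact (binomial) form of McNemar's test, whereas the study does not state which variant was applied. Only discordant lesions, detected with one reconstruction but not the other, enter the test.

```python
from math import comb


def per_lesion_sensitivity(n_correctly_localized, n_lesions):
    """Correctly localized lesions divided by all reference-standard lesions."""
    return n_correctly_localized / n_lesions


def false_positive_rate(n_false_positives, n_scans):
    """False-positive interpretations per MRI scan (may exceed 1)."""
    return n_false_positives / n_scans


def mcnemar_exact(b, c):
    """Two-sided exact McNemar test from the two discordant counts.

    b: lesions detected only with DL; c: lesions detected only with conventional.
    Under H0, each discordant lesion is equally likely to fall in either cell,
    so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant lesions: no evidence of a difference
    k = min(b, c)
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)
```

For example, if a reader detected six lesions only with DL and none only with conventional reconstruction (hypothetical counts), `mcnemar_exact(6, 0)` returns 0.03125; concordant lesions, detected or missed with both methods, do not affect the result.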

Interreader agreement was evaluated using Gwet’s AC1 [27] because the kappa statistic is affected by trait prevalence and rater bias. For example, when positive and negative ratings are imbalanced (i.e., one category is highly prevalent), kappa decreases even when observed agreement is high. Conversely, when raters differ systematically in their rating tendencies, kappa may be artificially high. Gwet’s AC1 is much less sensitive to trait prevalence and rater bias and therefore provides more stable agreement estimates [27]. To interpret chance-corrected agreement, we used the criteria suggested by Landis and Koch [28]: 0.00–0.20, slight; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, substantial; and 0.81–1.00, almost perfect agreement.
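To illustrate the prevalence effect described above, the following Python sketch computes Cohen's kappa and Gwet's AC1 for two raters on a nominal scale and maps each value to the Landis and Koch categories (including their "poor" label for values below 0). It is a simplified two-rater illustration; the study had three readers, for which multi-rater generalizations of AC1 exist, and the function names and example ratings are hypothetical.

```python
from collections import Counter


def _agreement_terms(r1, r2, categories):
    """Observed agreement and each rater's marginal proportion per category."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(r1)
    c1, c2 = Counter(r1), Counter(r2)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    return p_obs, [c1[k] / n for k in categories], [c2[k] / n for k in categories]


def cohen_kappa(r1, r2, categories):
    # Chance agreement: sum over categories of the product of the two marginals
    p_obs, m1, m2 = _agreement_terms(r1, r2, categories)
    p_chance = sum(a * b for a, b in zip(m1, m2))
    return (p_obs - p_chance) / (1 - p_chance)


def gwet_ac1(r1, r2, categories):
    # Chance agreement: 1/(Q - 1) * sum_k pi_k * (1 - pi_k), pi_k = mean marginal
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    p_obs, m1, m2 = _agreement_terms(r1, r2, categories)
    pi = [(a + b) / 2 for a, b in zip(m1, m2)]
    p_chance = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_obs - p_chance) / (1 - p_chance)


def landis_koch(value):
    if value < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if value <= upper:
            return label
    return "almost perfect"


# Hypothetical binary ratings: 90 joint "yes", 5 + 5 disagreements, 0 joint "no"
rater1 = ["yes"] * 95 + ["no"] * 5
rater2 = ["yes"] * 90 + ["no"] * 5 + ["yes"] * 5
cats = ["yes", "no"]
kappa = cohen_kappa(rater1, rater2, cats)
ac1 = gwet_ac1(rater1, rater2, cats)
print(f"kappa = {kappa:.3f} ({landis_koch(kappa)}), AC1 = {ac1:.3f} ({landis_koch(ac1)})")
# kappa = -0.053 (poor), AC1 = 0.890 (almost perfect)
```

With 90% observed agreement but 95% of ratings in one category, kappa's chance-agreement term (0.905) nearly equals the observed agreement, so kappa falls below zero, whereas AC1's chance term (0.095) leaves the high agreement visible.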

Results

Patient demographics and FLL characteristics

The clinical characteristics of the patients are summarized in Table 1. In total, 89 FLLs (median diameter, 1.0 cm; interquartile range, 0.8–1.2 cm) were included in this study. Detailed information on the FLLs is presented in Table 1.

Image quality parameter comparisons

Compared with conventionally reconstructed images, DL-reconstructed AP, PVP, and HBP images showed significantly better overall image quality, image contrast, vessel conspicuity, and liver edge sharpness, as well as significantly less subjective image noise and fewer ringing and motion artifacts (P < 0.05 for all; Table 3; Fig. 2). For susceptibility artifacts, no difference was found between the conventional and DL methods in AP and PVP images (3.02 ± 0.46 vs. 3.01 ± 0.40 and 3.02 ± 0.47 vs. 3.02 ± 0.48; P = 0.505 and 0.886, respectively). However, in the HBP, the DL method resulted in significantly fewer susceptibility artifacts than the conventional method (3.21 ± 0.53 vs. 3.11 ± 0.46; P = 0.037; Table 3). In addition, DL-reconstructed AP, PVP, and HBP images showed higher FLL conspicuity scores than conventionally reconstructed images (P < 0.05; Table 3; Fig. 3).

Table 3 Comparison of image quality between conventional and deep learning reconstructions
Fig. 2

Gadoxetic acid-enhanced MRI of a 66-year-old male patient with chronic hepatitis B. The conventional arterial (A), portal venous (B), and hepatobiliary (C) phases in the top row show poorer vessel conspicuity (arrows), image contrast, and liver edge sharpness, and more ringing artifacts (arrows), than the DL-reconstructed arterial (D), portal venous (E), and hepatobiliary (F) phases in the bottom row

Comparison of FLL detectability between conventional and DL reconstruction

For all three phases (AP, PVP, and HBP), JAFROC FOMs did not differ significantly between the conventional and DL methods (0.603 vs. 0.611, 0.603 vs. 0.635, and 0.648 vs. 0.682; P = 0.847, 0.467, and 0.617, respectively; Table 4). However, DL-reconstructed AP, PVP, and HBP images yielded significantly higher pooled sensitivity than conventionally reconstructed images (24.3% [65/267] vs. 21.7% [58/267], 30.7% [82/267] vs. 24.7% [66/267], and 41.9% [112/267] vs. 36.3% [97/267], respectively; P < 0.05 for all). Table 4 summarizes the FLL detection performance of each reviewer.
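The pooled sensitivities above use a denominator of 267 because each of the 89 FLLs was read by three reviewers (89 × 3 = 267 lesion-reader pairs). Because the same lesion contributes up to three correlated observations, a cluster-aware method such as the generalized estimating equation is appropriate for the comparison. The arithmetic can be checked with a short Python snippet; the counts are taken from the text, and the function name is illustrative.

```python
N_LESIONS = 89  # FLLs in the reference standard
N_READERS = 3   # each lesion is read once per reader

# (conventional, DL) correctly localized lesion-reader pairs, from the text
DETECTED = {"AP": (58, 65), "PVP": (66, 82), "HBP": (97, 112)}


def pooled_sensitivity(n_detected, n_lesions=N_LESIONS, n_readers=N_READERS):
    """Pooled per-lesion sensitivity in percent, rounded to one decimal."""
    return round(100 * n_detected / (n_lesions * n_readers), 1)


for phase, (conv, dl) in DETECTED.items():
    print(f"{phase}: conventional {pooled_sensitivity(conv)}% vs. DL {pooled_sensitivity(dl)}%")
# AP: conventional 21.7% vs. DL 24.3%
# PVP: conventional 24.7% vs. DL 30.7%
# HBP: conventional 36.3% vs. DL 41.9%
```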

Table 4 Comparison of observation detectability between conventional and deep learning reconstructions

Comparison of HCC diagnosis between conventional and DL reconstruction

The detection rates of LR-TR viable lesions in DL-reconstructed AP and HBP images were significantly higher than those in the corresponding conventionally reconstructed images (37.5% [18/48] vs. 22.9% [11/48] and 52.1% [25/48] vs. 39.6% [19/48], respectively; P < 0.05; Table 5; Fig. 3). In addition, DL-reconstructed PVP images showed a higher lesion detection rate for LR-3 than conventionally reconstructed images (24.0% [18/75] vs. 16.0% [12/75]; P = 0.031). In all other LI-RADS categories, no significant difference in lesion detection rates was observed between the conventional and DL methods (Table 5).

Table 5 Subgroup analyses of observation detectability of conventional and deep learning reconstructions
Fig. 3

Gadoxetic acid-enhanced MRI of a 64-year-old female patient with chronic hepatitis B. The conventional arterial (A) and hepatobiliary (B) phases in the top row show worse LR-TR viable lesion (arrows) conspicuity compared with the DL-reconstructed arterial (C) and hepatobiliary (D) phases in the bottom row

Interreader agreement for image quality evaluation

Interreader agreement for image contrast, vessel conspicuity, liver edge sharpness, subjective image noise, ringing artifacts, susceptibility artifacts, motion artifacts, and overall image quality in the AP, PVP, and HBP ranged from moderate to almost perfect (Gwet’s AC1 range, 0.516–0.969). For lesion conspicuity, agreement was substantial to almost perfect in the AP and PVP (Gwet’s AC1 range, 0.626–0.861), whereas it was slight to moderate in the HBP (Gwet’s AC1 range, 0.150–0.593; Table E2).

Discussion

In this study, we demonstrated that applying a vendor-specific DLRA to 3D T1-weighted GRE sequences of gadoxetic acid-enhanced liver MRI in patients at high risk of HCC notably improved image quality and the pooled sensitivity for detecting non-cystic FLLs and LR-TR viable lesions. DLRA substantially improved overall image quality, image contrast, vessel conspicuity, and liver edge sharpness while reducing subjective image noise, ringing artifacts, and motion artifacts. This improvement in image quality is likely attributable to the ability of DL algorithms to suppress the noise and artifacts present in the imaging data. By operating on raw, complex-valued k-space data, these algorithms can suppress ringing artifacts, reduce noise, and enhance image sharpness and clarity [18]. These improvements are consistent with prior studies showing the efficacy of DLRA in improving image quality for a range of MRI sequences, including 2D T1-weighted GRE sequences, 3D T1-weighted MR enterography, and T2-weighted sequences of various organs [15,16,17, 20,21,22].

Notably, in our study, DLRA significantly improved the detection of LR-TR viable lesions in the AP and HBP. This finding is clinically relevant because locoregional treatments are frequently used for early- and intermediate-stage HCC. Furthermore, DL-reconstructed 3D T1-weighted GRE images yielded higher conspicuity scores for FLLs, which we attribute to increased contrast and liver edge sharpness and reduced noise and artifacts; these same factors likely contributed to the better identification of LR-TR viable lesions. Because early detection of LR-TR viable lesions may enable additional treatment to achieve complete necrosis of HCC lesions and improve survival, this finding has substantial clinical value. Our results suggest that DLRA can enhance the diagnostic performance of gadoxetic acid-enhanced liver MRI for early HCC detection in high-risk individuals, potentially improving patient management and outcomes. Future multicenter prospective studies with larger cohorts are required to validate and generalize these findings.

In our study, we used triple AP imaging, which acquires three independent 3D datasets during a single breath-hold, with high acceleration factors to mitigate transient respiratory motion in gadoxetic acid-enhanced liver MRI. This is important because such motion is not negligible in gadoxetic acid-enhanced liver MRI, with reported incidence rates ranging from 3.2% to 26.7% [10, 29]. Because the depiction of AP hyperenhancement is crucial for HCC diagnosis [2], acquiring high-quality 3D GRE images with high temporal and spatial resolution during the AP may benefit patients at high risk of HCC. Various techniques, such as parallel imaging, view sharing, and compressed sensing, have been used to achieve rapid AP acquisition while preserving spatial resolution [8,9,10]. Nonetheless, these techniques may be limited by low SNR, artifacts, and motion-related problems. We found that DLRA significantly reduced image noise and artifacts and improved image sharpness, which may reflect the super-resolution component of the algorithm [18, 19, 22]. By permitting higher acceleration factors in 3D T1-weighted GRE sequences, DLRA might shorten breath-hold times without sacrificing image quality. Recent research has also demonstrated that DLRA methods can shorten the acquisition time of 2D T1-weighted GRE sequences while improving image quality. Our findings indicate that triple AP imaging using 3D T1-weighted GRE sequences with DLRA and high acceleration factors can assist in the diagnosis of HCC on gadoxetic acid-enhanced MRI.

Regarding interreader agreement, all items demonstrated moderate to almost perfect agreement, except for the lesion conspicuity score in the HBP, which ranged from slight to moderate. The HBP is known to be the most sensitive phase for lesion detection among the dynamic phases of liver MRI. We therefore speculate that reader preferences (favoring sensitivity vs. specificity) may have had a more pronounced impact on lesion conspicuity scores in the HBP than in the other phases.

Our study has some limitations. First, the retrospective, single-center design may limit the generalizability of our findings. Second, we used a vendor-specific DLRA, which may restrict the applicability of our results to other MRI systems and DL algorithms. Third, the relatively small numbers of patients and lesions may have limited the statistical power of our analyses. Fourth, subtraction images were not included in our datasets. Because subtraction images may be beneficial for evaluating non-cystic FLLs, future studies incorporating them would be valuable.

In conclusion, utilizing vendor-specific DLRA for 3D T1-weighted GRE sequences in gadoxetic acid-enhanced liver MRI can greatly enhance the image quality and improve the detection of FLLs, especially LR-TR viable lesions. The clinical adoption of DLRA for 3D T1-weighted GRE in liver MRI could potentially aid in the diagnosis and treatment of HCC.