Introduction

Magnetic resonance elastography (MRE) of the liver estimates parenchymal stiffness, which is positively correlated with the histopathologic stage of liver fibrosis induced by viral hepatitis and non-alcoholic fatty liver disease (NAFLD) [1,2,3,4,5]. Hepatic MRE uses a phase-contrast sequence with motion-encoding gradients to capture the propagation of shear waves generated by external low-frequency vibration [6,7,8]. Wave images of the hepatic shear wave in motion are then processed into elastograms, or stiffness maps, to enable quantitative measurement of hepatic parenchymal stiffness [8]. By reflecting liver fibrogenesis and the ensuing collagen proliferation with crosslinking, MRE can predict the severity of fibrosis noninvasively with high accuracy [9]. Hence, liver stiffness measured with MRE is a promising quantitative imaging biomarker (QIB) that could potentially replace biopsy [10, 11]. In recent clinical trials, liver stiffness obtained with MRE was shown to be a feasible QIB and was adopted as one of the main outcomes in assessing the efficacy of drugs that reduce hepatic steatosis in NAFLD [12,13,14,15].

Since scientific evidence increasingly supports liver stiffness measured with MRE as a QIB, studies are needed to establish the variability of measurement, or precision, under a range of study conditions [16]. Variability in the measurement of a QIB is expressed in terms of repeatability, the measurement variability under identical or nearly identical conditions, and reproducibility, the measurement variability under a variety of conditions [16, 17]. Reproducibility of a QIB is more readily transferable to the actual clinical setting, wherein the measurement value is acquired under inherently inconsistent conditions [17]. Specifically, in a longitudinal study requiring comparisons of repeated measurements over time, a robust QIB should be consistently reproducible to ensure the reliability of the study. In this setting, it is important to distinguish, based on the reproducibility of the QIB, between change that represents a true response to intervention and variability in measurement [17].

Prior studies have assessed the repeatability and reproducibility of hepatic MRE using the same or different parameters of field strength, MRE pulse sequence including gradient echo (GRE) and spin echo–echo planar (SE-EPI) sequences, imager manufacturer, scan interval, and driver [18,19,20,21,22]. These studies and a recent inclusive meta-analysis reported good repeatability/reproducibility of hepatic MRE and provided a pooled repeatability coefficient (RC) of 22% [23]. While the variability of the measurement value under a single scan variant has been discussed in depth, the effect of combining relevant factors still needs to be validated. Our study aimed to demonstrate the reproducibility of liver stiffness measured with MRE under various combinations of field strength, pulse sequence, scan interval, and reader, expressed as the reproducibility coefficient (RDC).

Materials and methods

This prospective study was approved by our Institutional Review Board (IRB) and is compliant with Health Insurance Portability and Accountability Act (HIPAA). Written informed consent was obtained from all subjects.

Study design

The current study population was included in the prior study which reported the repeatability and reproducibility of proton density fat fraction measurement [24]. On a sample size calculation, 19 subjects were required to detect a correlation of 0.6 or greater when statistical power was set at 0.8 under a significance level of 0.05. With a dropout rate of 20%, 24 subjects were planned to be recruited.
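The sample size arithmetic above can be sketched as follows. The text does not state which formula or software option was used, so the Fisher z approximation here is an assumption, as are the function names. Under that assumption the raw estimate is about 19.3, which is consistent with the reported 19 if rounded to the nearest integer, and inflating 19 for a 20% dropout rate gives 24.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size to detect a correlation r with a two-sided test.

    Assumption: Fisher z approximation, n = ((z_alpha + z_beta) / C)^2 + 3,
    where C = atanh(r). The source does not name the method it used.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    c = math.atanh(r)                              # 0.5 * ln((1 + r) / (1 - r))
    return ((z_alpha + z_beta) / c) ** 2 + 3


def inflate_for_dropout(n, dropout_rate):
    """Number to recruit so that about n subjects remain after dropout."""
    return math.ceil(n / (1 - dropout_rate))


n_raw = n_for_correlation(0.6)             # unrounded estimate
n_planned = inflate_for_dropout(19, 0.20)  # recruitment target from the reported 19
print(round(n_raw, 1), n_planned)
```

Whether the raw estimate is rounded to the nearest integer or rounded up is a design choice; rounding up would give 20 rather than the reported 19.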

From August 2017 to December 2017, newly diagnosed NAFLD patients were prospectively enrolled from our outpatient clinic for liver disease. The diagnosis of NAFLD was based on radiologic evidence, clinical history, and biochemical profiles [25]. The inclusion criteria for the participants were (1) age ≥ 18 years, (2) hepatic parenchymal echogenicity brighter than that of the renal cortex on screening ultrasound, (3) no significant alcohol consumption, (4) no coexisting chronic liver disease, and (5) no exposure to known hepatotoxins [25]. Subjects with contraindications to MRI were excluded from the study.

In all subjects, serial MRE was performed with 1.5 T (Signa HDxt; GE Healthcare, Milwaukee, WI, USA) and 3.0 T (Discovery 750 W; GE Healthcare) MR scanners on the same day less than 15 min apart, and after 2 weeks. MRE was performed at each field strength once per visit; thus, a total of four MRE examinations were performed per patient included in the study.

MRE setup and protocol

MRE was obtained using a commercially available MRE technique (MR-Touch; GE Healthcare). Subjects were instructed to fast for at least 4 h before the examination and were positioned with a phased-array coil centered over the liver. A passive driver was securely placed against the right anterior chest wall over the liver to transmit the 60-Hz acoustic waves generated by an active driver located outside the MRI suite. Six experienced MRI technologists performed the MRE, and we did not specifically match technologists to patients. After the 1st scan, the patient was transported to the other room for the 2nd scan.

Hepatic shear waves were imaged using motion-sensitized 2D GRE and 2D SE-EPI MRE pulse sequences with 1.5 T and 3.0 T MR systems, respectively [26, 27]. Four contiguous axial image slices through the mid liver were acquired during four cycles of end-expiratory breath-holds with 2D GRE sequence and two cycles of end-expiratory breath-holds with 2D SE-EPI sequence. Detailed parameters of MRE pulse sequences are shown in Table 1.

Table 1 MR elastography pulse sequence parameters

Quantitative analysis of MRE

Wave images and elastograms (stiffness maps) were automatically generated for each image slice on the console, and these were submitted to a post processing software (READY View version 12.3; GE Healthcare). A 95% confidence map was superimposed on each elastogram to indicate the area of high-quality data to be included in the stiffness measurement [28].

Three readers with different levels of expertise in MRE technique and knowledge of liver anatomy independently measured the liver stiffness. Readers 1 and 2 (R1 and R2; K.B. and K.H.J.) were abdominal radiologists with more than 5 and 10 years of experience in abdominal MRI, respectively, and Reader 3 (R3; Y.H.J.) was an image analyst with 6 months of experience in processing body MR images. R1 had 3 years of experience in obtaining liver stiffness with MRE and trained R2 and R3 with 20 clinical cases (4 image slices × 20 cases = 80 elastograms) before formal measurement was conducted. Regions of interest (ROIs) were manually drawn to include the largest possible area of liver parenchyma bounded by the confidence maps while avoiding the major vessels, biliary tree, and areas immediately under the liver capsule in the right liver [22, 29]. The readers were blinded to the clinical data of patients and to the measurement values of the other readers. Four separate review sessions were held for each reader, with a 1-week washout period between reviews of MRE examinations from the same patient. Liver stiffness in MRE was defined as the mean measurement value obtained from four image slices, expressed in kilopascals (kPa), according to the method considered reproducible in prior studies [22, 30]. To assess intrareader agreement, repeat measurement values were obtained by R1 and R3 in the same manner. ROI sizes were recorded by R1.

Statistical analysis

Liver stiffness values are presented as means and SD in kPa. A repeated-measures analysis of variance (ANOVA) was used to assess differences in liver stiffness values among readers. A post-hoc Bonferroni test was used to identify significant differences in pairwise comparisons. A Bonferroni-corrected P value of < .05 was considered statistically significant.

Reproducibility of measurements across field strengths on the same day and at the 2-week interscan interval was assessed using the within-subject coefficient of variation (wCV) and RDC [16, 17]. In cases of failed MRE acquisition, reproducibility analysis was conducted with the corresponding pairs excluded, and their values were used only for descriptive statistics. The bias of measurement was evaluated with Bland–Altman plots. The correlation of measurements was assessed by linear regression analysis. Interobserver agreement was assessed using two-way random-effects model intraclass correlation coefficients (ICCs). Intraobserver agreement for R1 and R3 was analyzed, and the 95% confidence interval (CI) was calculated for each ICC. A paired Student’s t test was used to compare the size of ROIs between 1.5 and 3.0 T. A P value of < .05 was considered statistically significant.
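As a rough illustration of how wCV and RDC relate, the sketch below computes both from paired measurements. The estimator shown (root mean square of per-subject coefficients of variation, with RDC = 1.96 × √2 × wCV) is one common convention and is an assumption here, since the text does not spell out the formula; the function name and input layout are hypothetical. The ranges reported in the Results are consistent with a factor of about 2.77 between the two (for example, a wCV of 9.4% alongside an RDC of 26%).

```python
import math


def wcv_and_rdc(pairs):
    """Return (wCV, RDC) in percent from paired replicate measurements.

    pairs: list of (x1, x2) liver stiffness values in kPa, one pair per subject.
    Assumption: wCV is the root mean square of per-subject CVs, and
    RDC = 1.96 * sqrt(2) * wCV (about 2.77 * wCV).
    """
    if not pairs:
        raise ValueError("at least one pair is required")
    squared_cvs = []
    for x1, x2 in pairs:
        subject_mean = (x1 + x2) / 2
        within_var = (x1 - x2) ** 2 / 2  # sample variance of two values
        squared_cvs.append(within_var / subject_mean ** 2)
    wcv = math.sqrt(sum(squared_cvs) / len(squared_cvs))
    rdc = 1.96 * math.sqrt(2) * wcv
    return 100 * wcv, 100 * rdc


# Hypothetical values for illustration only, not data from the study.
example_pairs = [(2.0, 2.2), (2.5, 2.4), (3.1, 2.9), (2.2, 2.2)]
wcv_pct, rdc_pct = wcv_and_rdc(example_pairs)
print(f"wCV = {wcv_pct:.1f}%, RDC = {rdc_pct:.1f}%")
```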

Statistical analysis was performed using MedCalc version 19.0.3 (MedCalc, Ostend, Belgium) and Statistical Package for the Social Sciences (SPSS) version 25.0 (IBM, Armonk, NY, USA).

Results

Study population

Of 24 initially enrolled patients, four dropped out of the study: one patient was lost to follow-up and did not attend the 2nd MRE session, and the other three withdrew consent before the 1st MRE session. Finally, a total of 20 patients who completed the scheduled imaging sessions were included: 13 male and seven female individuals, with a mean age ± standard deviation (SD) of 43.2 years ± 15.4 (range 18–72) and a mean body mass index ± SD of 27.5 kg/m² ± 5.2 (range 19.2–38.2).

MRE failures

MRE was considered unsuccessful if the total number of valid pixels bounded by the 95% confidence map was less than 500 across the four image slices [1]. MRE failure occurred in four examinations (5%) of three patients. Of these, two failed examinations at 3.0 T in two patients at the 1st visit, in the early phase of the study, were due to the absence of waves on the wave images, which was probably associated with technical failure in the driver setup [1]. Subsequently, the function of the active and passive drivers was double-checked throughout the study. In one patient, two consecutive MRE examinations at 1.5 T failed because the patient's abdominal girth did not fit within the magnet bore.
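The failure criterion and the reported failure rate can be expressed as a short sketch. The function names and the per-slice pixel counts are hypothetical; only the 500-pixel threshold, the four slices, and the counts of 4 failed examinations among 20 patients × 4 examinations come from the text.

```python
def is_failed_exam(valid_pixels_per_slice, min_total_pixels=500):
    """True if the valid pixels summed over all slices fall below the threshold."""
    return sum(valid_pixels_per_slice) < min_total_pixels


def failure_rate_percent(n_failed, n_subjects, exams_per_subject):
    """Failed examinations as a percentage of all scheduled examinations."""
    return 100 * n_failed / (n_subjects * exams_per_subject)


# Hypothetical pixel counts within the 95% confidence map for four slices.
print(is_failed_exam([90, 120, 110, 95]))     # 415 valid pixels in total
print(is_failed_exam([400, 380, 420, 390]))   # 1590 valid pixels in total
print(failure_rate_percent(4, 20, 4))         # 4 failures among 80 examinations
```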

Liver stiffness values

The mean liver stiffness values measured by each reader at 1.5 T and 3.0 T at the two visits are shown in Table 2. No significant intrareader difference was found among the measurement values from the four MRE examinations, whereas a significant interreader difference was found between R3 and the other two readers, although the mean difference was small (< 0.3 kPa).

Table 2 Mean liver stiffness measurement in MR elastography

Reproducibility of the value of liver stiffness

The reproducibility of liver stiffness measurement at different field strengths/MRE sequences, and visits of each reader is shown in Table 3. At 2-week interscan interval, under the same field strength, the range of wCV and RDC in each reader was 5.6–9.4% and 15.4–26%, respectively, and for pooled values of all readers, the RDC in both 1.5 T and 3.0 T was between 21.6 and 22%. Reproducibility tended to be lower for MRE performed at different field strengths/MRE sequences: largest values of wCV and RDC were obtained when stiffness was measured across the field strengths/MRE sequences over 2 weeks of interval (wCV range 4.9–10.3%; RDC range 13.5–28.5%). The range of pooled values of RDC was between 19.7 and 26%.

Table 3 Reproducibility of liver stiffness measurement in MR elastography

The mean biases and 95% limits of agreement (LOA) of the liver stiffness measurement are shown in Fig. 1. At the same field strength over the 2-week interval, the mean bias ± LOA was 0.04 kPa ± 0.49 at 1.5 T and −0.03 kPa ± 0.53 at 3.0 T. Across field strengths/MRE sequences, the mean bias ± LOA was −0.03 kPa ± 0.46 on the same day and −0.02 kPa ± 0.55 at the 2-week interscan interval.
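For readers unfamiliar with the quantities reported here, a minimal Bland–Altman sketch follows. It assumes the conventional definition (bias is the mean of the paired differences, and the 95% limits are bias ± 1.96 × SD of the differences); the function name and the example values are hypothetical and are not study data.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (bias, lower_limit, upper_limit) for paired measurements in kPa."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)                   # mean of paired differences
    half_width = 1.96 * stdev(diffs)     # the "± LOA" term around the bias
    return bias, bias - half_width, bias + half_width


# Hypothetical paired stiffness values (kPa) from two scan conditions.
scan_a = [2.0, 2.5, 3.0, 2.2]
scan_b = [2.1, 2.4, 3.1, 2.0]
bias, lower, upper = bland_altman(scan_a, scan_b)
print(f"bias = {bias:.3f} kPa, limits = [{lower:.3f}, {upper:.3f}] kPa")
```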

Fig. 1

Bland–Altman plots for the mean biases and limits of agreement (LOAs) of pooled liver stiffness measured under various scan conditions. a measurement at 2-week interscan interval with 1.5 T; b measurement at 2-week interscan interval with 3.0 T; c measurement performed on the same day across field strengths/MRE sequences; d measurement at 2-week interscan interval across field strengths/MRE sequences. Mean biases range from −0.03 to 0.04 kPa, and LOAs are within 0.55 kPa

Linear regression of liver stiffness measurement

Linear correlations of measurement values under various scanning conditions and their coefficients of determination (r²) are shown in Fig. 2. For 114 measurements at 1.5 T (19 subjects × two scans × three readers) and 108 measurements at 3.0 T (18 subjects × two scans × three readers), moderate to strong linear correlation of the pooled values of liver stiffness at the same field strength over the 2-week interscan interval was observed (r² = 0.684 for 1.5 T; r² = 0.515 for 3.0 T). For 216 measurements [(17 subjects × two scans × three readers) + (19 subjects × two visits × three readers)], strong linear correlation of the measurement values across field strengths/MRE sequences on the same day (r² = 0.701) and moderate linear correlation (r² = 0.586) of those at the 2-week interscan interval were observed.

Fig. 2

Linear correlations of pooled measurement values of liver stiffness. Moderate to strong linear correlations are observed among the measurement values from a 1.5 T at 2-week interscan interval, b 3.0 T at 2-week interscan interval, c across field strengths/MRE sequences in the same day, and d across field strengths/MRE sequences at 2-week interscan interval

Intra- and interobserver agreements

The intrareader agreement for both R1 and R3 was excellent (ICC = 0.944; 95% CI 0.850–0.975 for R1; ICC = 0.817; 95% CI 0.677–0.900 for R3). Interreader agreement among all three readers was excellent (ICC = 0.958; 95% CI 0.939–0.972).
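The ICC values above come from a two-way random-effects model. As an illustration only, the sketch below computes the single-measurement, absolute-agreement form, often written ICC(2,1), from the ANOVA mean squares. The text does not state whether single or average measures, or absolute agreement or consistency, were used, so this choice is an assumption, and the function name and inputs are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: list of rows, one row per subject, one value per reader (kPa).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 subjects and 2 readers, no missing values")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between readers
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical stiffness values (kPa): rows are subjects, columns are readers.
perfect = [[2.0, 2.0, 2.0], [3.0, 3.0, 3.0], [4.0, 4.0, 4.0]]
offset = [[2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]  # second reader always 1 kPa higher
print(icc_2_1(perfect), icc_2_1(offset))
```

The second example shows why the absolute-agreement form matters: a constant offset between readers lowers the ICC even though the readers rank subjects identically.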

Mean biases of the measurement values by readers are shown in Fig. 3. The range of mean biases was 0.05–0.19 kPa, and LOAs were within 0.53 kPa.

Fig. 3

Bland–Altman plots for the mean biases and limits of agreement (LOAs) of liver stiffness between a Reader 1 and 2, b Reader 1 and 3, and c Reader 2 and 3. Mean biases range from 0.05 to 0.19 kPa, and LOAs are within 0.53 kPa

ROI size comparisons

The mean size of ROIs was significantly larger in 2D SE-EPI with the 3.0 T MR system than in 2D GRE with the 1.5 T MR system (80.2 cm² ± 21.5 vs. 44.9 cm² ± 16.0; P < .001) (Fig. 4).

Fig. 4

Representative 2D GRE and 2D SE-EPI MR elastography at 1.5 T and 3 T, respectively, in a 28-year-old female patient on her 1st visit. Mean liver stiffness was 3.3 kPa ± 0.3 at 1.5 T 2D GRE and 3.0 kPa ± 0.2 at 3 T 2D SE-EPI MR elastography. The mean size of ROI was 53.7 cm² ± 8.7 at 1.5 T 2D GRE and 90.9 cm² ± 9.9 at 3 T 2D SE-EPI MR elastography

Discussion

In our study, MRE-measured liver stiffness showed an RDC of 13.5–28.5% in three readers under various combinations of scanning conditions, with a trend toward lower reproducibility for measurement values acquired under different field strengths/MRE sequences at the longer (2-week) interscan interval, a trend that was not specific to any one reader. For measurements across field strengths/MRE sequences over the 2-week interval, the largest RDC in pooled measurements from all readers was 26% with an upper 95% CI of 30.6%, which corresponded to a change of 0.55 kPa. Based on the liver stiffness of approximately 2.0–2.5 kPa reported in previous studies to differentiate between advanced fibrosis (F3 and F4 stages) and no, mild, or moderate fibrosis (F0, F1, and F2 stages), the reproducibility of the liver stiffness values obtained in our study is acceptable, and MRE-measured liver stiffness could be used as a reproducible QIB [1, 11].

A recent meta-analysis reported that the summary RC of hepatic MRE was 22% (95% CI 16.1%, 28.2%), and the RC estimate tended to be higher in studies using untrained operators, 1.5 T rather than 3.0 T, and longer interscan periods [23]. In our study, scans at the longer interscan interval of 2 weeks resulted in a higher RDC than those acquired on the same day across field strengths/MRE sequences. In prior studies at the same field strength (1.5 T), RC ranged from 14 to 37% at scan intervals of 2 to 4 weeks [19, 31, 32], which agrees with the similar magnitude of RDC in our results at the 2-week interscan interval (18% to 23.4% at 1.5 T; 15.4% to 26% at 3.0 T). The impact of the longer interscan period is also reflected in the linear regression analysis across visits, where the slopes were 0.81 for 1.5 T and 0.73 for 3.0 T. Though these are close to 1, the discrepancies and higher RDC may be attributed to variables including but not limited to operators, technologists, reviewers, patient motion, and other patient characteristics [16]. Nevertheless, the percentage RDCs in pooled measurements correspond to absolute values within 0.53 kPa, which is considered a relatively narrow range of measurement variability.

Few studies have assessed the reproducibility of hepatic MRE across field strengths. Trout et al. analyzed scans acquired on the same day and reported good to excellent agreement of the values (ICC of 0.62–0.84) obtained across field strengths using 2D GRE, 2D SE-EPI, and 3D SE-EPI MRE sequences [22]. Yasar et al. reported an RDC of 25.4% to 29.7% for scans acquired on the same day with 1.5 T (Siemens) and 3.0 T (GE) platforms using 2D GRE, which reflects the combined effect of field strength and vendor differences [33]. In our study, the RDC for scans acquired on the same day across field strengths/MRE sequences ranged from 15.4 to 21% in each reader, and the pooled value of RDC ranged from 19.8 to 22.1%. These values are similar to those obtained under the same field strength at the 2-week interscan interval, which suggests that switching the field strength and MRE sequence has an impact on the reproducibility of hepatic MRE comparable to that of extending the scan interval to 2 weeks.

In our study, RDC under different field strengths tended to increase, up to 28.5%, when the scan interval was extended to 2 weeks. Although we used a distinct MRE sequence for each field strength, we obtained better reproducibility than that of prior studies, which may be explained by the fact that imaging was performed using scanners from the same manufacturer. Indeed, in a comparison of measurements from two imager manufacturers under fixed field strength and MRE sequence, a statistically significant difference was noted across manufacturers at 3.0 T [22]. Thus, the largest pooled RDC of 26%, with a corresponding absolute difference within 0.55 kPa, for measurements across field strengths/MRE sequences and a 2-week interscan interval in our study may be applicable only to platforms from the same vendor. Current reports have indicated cross-vendor validation of hepatic MRE for only a few of the imager manufacturers, and further verification is needed [22, 33, 34]. Our study provides the margin of MRE-measured liver stiffness beyond which a difference reflects true change across field strengths/MRE sequences over a 2-week interscan interval, and highlights that hepatic MRE is a robust QIB.
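One way such a margin might be applied in a longitudinal setting is sketched below: a change between two examinations is flagged as exceeding measurement variability only if its relative magnitude is larger than the RDC. The function name is hypothetical, and expressing the change relative to the baseline value (rather than, say, the mean of the two values) is an assumption. The default of 26% is the largest pooled RDC reported in this study and, as noted above, may hold only for scanners from the same vendor.

```python
def exceeds_rdc(baseline_kpa, followup_kpa, rdc_percent=26.0):
    """True if the relative change between two stiffness values exceeds the RDC.

    Assumption: percent change is computed relative to the baseline value.
    """
    if baseline_kpa <= 0:
        raise ValueError("baseline stiffness must be positive")
    change_percent = 100 * abs(followup_kpa - baseline_kpa) / baseline_kpa
    return change_percent > rdc_percent


# Hypothetical follow-up scenarios, not data from the study.
print(exceeds_rdc(2.5, 3.3))  # 32% change, larger than the 26% margin
print(exceeds_rdc(2.5, 2.9))  # 16% change, within the 26% margin
```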

The newly introduced 2D SE-EPI MRE sequence has a shorter scan time and higher signal-to-noise ratio than the 2D GRE MRE sequence [35, 36]. Studies have shown that 2D SE-EPI MRE is advantageous since it achieves a single breath-hold scan and high interobserver agreement while providing a larger measurable ROI, lower failure rate, and comparable liver stiffness values as compared to 2D GRE [35,36,37,38,39]. In line with prior publications, the mean size of ROIs was nearly two-fold larger in 2D SE-EPI at 3.0 T than in 2D GRE at 1.5 T in our study. Currently, the 2D GRE MRE sequence is the most widely available, and more up-to-date MRE sequences including the 2D SE-EPI MRE sequence are being developed; hence, reproducibility among sequences might become an important issue.

ROI drawn by a trained operator can substantially improve the reproducibility [22]. Therefore, before the start of our study, the readers underwent training for drawing a single free-hand ROI [29]. As a result, we obtained small variation in RDC with satisfactory interreader LOA (± 0.53 kPa), and achieved excellent agreement among three independent readers (ICC = 0.958) with various levels of background knowledge in MRE technique and liver anatomy. Moreover, we achieved high intrareader agreements between the readers with the most and least experience in hepatic MRE measurement. However, the number of cases considered sufficient for training remains unclear, and evaluation of the actual impact of training is needed since we did not collect measurement values of readers before they underwent training. Operators should be trained to acquire ROIs consistently in repeat examinations.

Our study has several limitations. First, we were unable to assess the reproducibility across imager manufacturers because of the limited MR systems available at our institution. Second, although we prospectively enrolled patients with chronic liver disease (NAFLD), our cohort did not include a wide spectrum of liver stiffness/fibrosis stages, and the number of participants was small. Using the threshold values of Hsu et al. for assessing NAFLD, the majority of the patients belonged to F0 (< 2.61 kPa), and some were in F1 (< 2.97 kPa), F2 (< 3.62 kPa), and F3 (< 4.7 kPa), but none were in F4 [40]. Therefore, reproducibility at higher liver stiffness/fibrosis stages still needs further validation. Third, we were not able to assess the influence of MRE sequence on reproducibility separately from that of field strength, though the overall reproducibility was within an acceptable clinical range. One study performed direct assessment of the pairwise reproducibility of hepatic MRE under different pulse sequences [22]. Finally, we did not evaluate the repeatability of measurement values acquired on the same day under the same field strength; therefore, our analysis of the influence of scanning interval on reproducibility may be limited. Previous studies assessing hepatic MRE performed on the same day at the same field strength have reported high repeatability [20, 22, 23].

In conclusion, liver stiffness through MRE showed pooled RDC of 26% (upper 95% CI 30.6%) and corresponding LOA within 0.55 kPa across field strengths, MRE sequences, and 2-week interscan interval for trained readers with various levels of experience in MRE. The measurement values were within acceptable clinical range of the magnitude of change, rendering liver stiffness through MRE a reproducible QIB.