Introduction

In patient care, staging of liver fibrosis has important implications for disease prognosis and management decisions. Liver biopsy remains the reference standard for assessing the severity of liver fibrosis, the grade of steatosis, and the inflammatory activity, and for determining which patients might benefit from pharmacological therapy [1]. However, limitations of liver biopsy have been highlighted in recent years [2]. In response to these shortcomings, alternative noninvasive methods have been developed to detect liver fibrosis in patients with chronic liver disease (CLD) [3]. Among these noninvasive techniques, elastography is generally considered to provide the highest diagnostic performance for staging fibrosis [4, 5].

Elastography methods measure mechanical properties, namely stiffness quantified using the elasticity modulus or acoustic shear wave propagation speed, which represents a surrogate biomarker of liver fibrosis. These techniques rely on the concept that stiffness tends to be low in normal liver and increases as fibrosis progresses. Clinically available elastography techniques include transient elastography (TE), point shear wave elastography (pSWE), and MR elastography (MRE). Although meta-analyses have reported a higher diagnostic accuracy for MRE than for US-based elastography techniques [5,6,7], these comparisons are prone to selection biases due to different eligibility criteria, patient populations, and referral patterns. Hence, there is a need to perform paired comparisons of the most commonly used elastography techniques in the same patient population.
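
Because the three techniques report stiffness in different units (Young's modulus, shear wave speed, and shear modulus), a rough conversion can help when reading values side by side. The sketch below uses the textbook relations for a purely elastic, isotropic, incompressible medium, mu = rho * v^2 and E = 3 * mu. The tissue density of 1000 kg/m3 and the function names are illustrative assumptions, not parameters taken from this study.

```python
# Idealized unit conversions between shear wave speed, shear modulus, and
# Young's modulus. Assumes a purely elastic, isotropic, incompressible medium.

ASSUMED_DENSITY_KG_M3 = 1000.0  # assumption: soft tissue is close to water


def shear_modulus_kpa(speed_m_s: float, density: float = ASSUMED_DENSITY_KG_M3) -> float:
    """Shear modulus mu = rho * v^2, converted from Pa to kPa."""
    return density * speed_m_s ** 2 / 1000.0


def youngs_modulus_kpa(speed_m_s: float, density: float = ASSUMED_DENSITY_KG_M3) -> float:
    """Young's modulus E = 3 * mu for an incompressible medium, in kPa."""
    return 3.0 * shear_modulus_kpa(speed_m_s, density)


if __name__ == "__main__":
    v = 2.0  # hypothetical shear wave speed in m/s
    print(f"mu = {shear_modulus_kpa(v):.2f} kPa, E = {youngs_modulus_kpa(v):.2f} kPa")
```

Values measured on different devices and at different excitation frequencies are not expected to match these idealized conversions exactly, since liver tissue is viscoelastic.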

The purpose of this study was to perform a head-to-head comparison of the feasibility and diagnostic performance of TE, pSWE, and MRE for detecting histology-determined fibrosis in patients with CLD. The secondary objective was to evaluate the influence of potential confounders (i.e., inflammation and steatosis) on the association between fibrosis and stiffness measurements.

Materials and methods

Study design and subjects

This cross-sectional imaging trial was approved by the Institutional Review Board of the two participating institutions, Centre hospitalier de l’Université de Montréal and McGill University Health Centre (ClinicalTrials.gov Identifier No. NCT02044523). All subjects provided written informed consent. TE, pSWE, and MRE examinations were performed as research procedures within 6 weeks of the liver biopsy for all patients, and if done after the liver biopsy, a minimum delay of 48 h was observed. Histopathology was used as the reference standard. Feasibility and fibrosis-staging accuracy of the three index tests were compared.

The hepatology clinics of the two participating institutions recruited consecutive patients between January 2014 and September 2018. Adult participants were enrolled in this study if (a) they underwent a liver biopsy as part of their clinical standard of care for suspected or known chronic liver disease caused by hepatitis B virus (HBV) infection, hepatitis C virus (HCV) infection, nonalcoholic fatty liver disease (NAFLD), nonalcoholic steatohepatitis (NASH), or autoimmune hepatitis (AIH); or (b) they underwent a liver biopsy to resolve an unexplained discrepancy between the fibrosis stages inferred by TE results and by noninvasive scoring systems based on laboratory tests. Participants were excluded if they had any contraindication to MRI.

TE examination

TE using the FibroScan (Echosens) was used to measure the median Young elasticity modulus (in kPa) as a surrogate of liver fibrosis. The M probe was used by default. The XL probe was used in case of high BMI or failure to measure the liver stiffness with the M probe [8]. Experienced hepatologists or nurses positioned the transducer on the skin at an intercostal space over the right liver lobe. At this site, TE was repeated up to 20 times, until at least 10 valid measurements were collected. Failure was defined as the impossibility of obtaining 10 valid measurements [9]. Reliability was defined according to the clinically used criteria proposed by Boursier et al based on the ratio of the interquartile range to the median (IQR/M) [9].
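
The summary statistics described above can be sketched as follows. This is a minimal illustration, assuming a list of valid measurements in kPa. The function names are hypothetical, and the `max_iqr_m` default is a placeholder rather than the published reliability criterion, which should be taken from the cited reference [9].

```python
from statistics import median, quantiles


def iqr_over_median(values):
    """Return IQR/M, the interquartile range divided by the median."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / median(values)


def summarize_exam(valid_kpa, min_valid=10, max_iqr_m=0.30):
    """Summarize one examination from its valid measurements (in kPa).

    min_valid follows the text above; max_iqr_m is an illustrative
    placeholder, not the published reliability criterion.
    """
    if len(valid_kpa) < min_valid:
        return {"failed": True}  # fewer than 10 valid measurements
    ratio = iqr_over_median(valid_kpa)
    return {
        "failed": False,
        "median_kpa": median(valid_kpa),
        "iqr_m": ratio,
        "low_variability": ratio <= max_iqr_m,
    }
```

The quartile convention (`method="inclusive"`) is one of several in common use; a different convention would shift IQR/M slightly for small samples.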

pSWE examination

Conventional ultrasound images in B-mode and acoustic radiation force impulse imaging (ACUSON S2000 or S3000, Siemens Healthineers) were acquired with the same convex probe (4C1, Siemens Healthineers), tissue harmonic imaging (4 MHz), and a mechanical index of 1.7. pSWE examination was performed according to the clinical guidelines by experienced hepatologists or nurses [10]. The median shear wave velocity (in m/s) was considered representative of liver stiffness, and the IQR/M was used as an indicator of variability. Technical success was achieved if 10 valid measurements were obtained in 20 repetitions or less. Reliability was defined according to clinical guidelines using the success rate and the IQR/M [10].

MRE examination

All MR examinations were performed on a 3.0-T clinical scanner (Achieva TX; Philips Healthcare) in accordance with a previously described method [11]. A transducer (Resoundant) positioned on the right side of the patient in supine position induced a mechanical vibration at 60 Hz synchronized with the acquisition of a motion-sensitized gradient-echo (GRE) sequence. The MRE image analysis technique included elastogram images with parametric maps of goodness-of-fit to exclude areas of unreliable measurements from the region of interest (ROI). Further details of MR acquisition and post-processing parameters to compute the shear modulus (in kPa) are provided in the Supplemental Materials. R2* was measured as an indicator of liver iron content. MRE measurements were considered reliable if R2* was in the normal range (lower than 126 s−1 at 3.0 T) [12, 13].

Histopathological analysis

Liver biopsies were performed with 16-G or 18-G core needles according to the clinical standard of care. Hematoxylin and eosin (H&E) slides were centrally scored by an expert liver pathologist. Fibrosis stages, inflammation grades, and steatosis grades were assessed. Details of fibrosis scoring are provided in the Supplemental Materials.

Blinding

Technologists, sonographers, physicians, and image analysts participating in the analysis of the index tests were blinded to histopathological results. The pathologist was blinded to elastography results.

Statistical analysis

Statistical analyses were performed by a senior-level biostatistician with SAS 9.4 (SAS Institute) and the free software R 3.4.2 (R Foundation).

Feasibility and reliability

The technical feasibility and reliability rates were calculated. Pairwise comparisons of the rates were performed using the McNemar test reflecting the fact that index tests were performed on the same subjects. Technically unfeasible or unreliable measurements were excluded from further analyses.
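
A paired comparison of two rates depends only on the discordant pairs, that is, the subjects in whom one technique succeeded and the other did not. The sketch below computes an exact two-sided McNemar p-value from those two counts. Whether the exact or the chi-square version of the test was used in this study is not stated, so the exact form is an assumption, and the function name is hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts.

    b: pairs where only technique A gave the outcome of interest
    c: pairs where only technique B gave the outcome of interest
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) probability of a split at least as uneven as observed.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

For example, with hypothetical counts of 1 and 9 discordant pairs, `mcnemar_exact(1, 9)` evaluates to 22/1024, or about 0.021.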

Staging comparison

Comparison of index tests’ measurements between all fibrosis stages was performed using the nonparametric Kruskal-Wallis rank-sum test with Bonferroni’s correction. Pairwise comparisons were performed between fibrosis stages with a post hoc Mann-Whitney U test in each index test.
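
The omnibus comparison across fibrosis stages can be sketched in pure Python as below. This is an illustrative implementation of the Kruskal-Wallis H statistic with tie correction, not the SAS or R routine used in the study. The p-value uses the chi-square approximation and, to keep the sketch short, is implemented only for an even number of degrees of freedom (five stages give df = 4). The post hoc pairwise tests and Bonferroni adjustment are not shown.

```python
from math import exp


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, p); p uses the chi-square approximation, even df only."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])  # rank sum of this group
        h += r * r / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    h /= 1.0 - ties / (n ** 3 - n)
    df = len(groups) - 1
    if df % 2:
        raise ValueError("closed-form p-value below needs an even df")
    # Chi-square survival function for even df = 2m:
    # exp(-h/2) * sum_{i<m} (h/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= (h / 2) / i
        total += term
    return h, exp(-h / 2) * total
```

Each inner list holds the stiffness values of one fibrosis stage, for example `kruskal_wallis([f0, f1, f2, f3, f4])` with hypothetical per-stage lists.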

Diagnostic performance

The diagnostic accuracy of stiffness measurements by TE, pSWE, and MRE for predicting histology-determined fibrosis stage was assessed by the Obuchowski score, a multinomial version of the area under the receiver operating characteristic (ROC) curve [14], and the area under the ROC curve (AUC). Estimates of diagnostic performance (including sensitivity, specificity, accuracy, positive predictive value, and negative predictive value) were calculated for the threshold that provided at least 90% sensitivity for differentiation of F0 vs. ≥ F1, ≤ F1 vs. ≥ F2, ≤ F2 vs. ≥ F3, and ≤ F3 vs. F4. Measures of AUC of TE, pSWE, and MRE were compared using the DeLong method.
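
For one dichotomization (for example ≤ F1 vs. ≥ F2), the AUC and the sensitivity-driven threshold can be sketched as below. The AUC is computed as the Mann-Whitney probability that a diseased case has a higher stiffness than a non-diseased case. The cutoff rule (highest threshold that still reaches the target sensitivity, with values at or above the cutoff called positive) is one plausible reading of the text, not a confirmed detail of the study. Function names are hypothetical, and the Obuchowski score and DeLong comparison are not shown.

```python
def auc(pos, neg):
    """Area under the ROC curve via pairwise comparison; ties count 0.5."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cutoff_for_sensitivity(pos, neg, target=0.90):
    """Return (cutoff, sensitivity, specificity) for the highest cutoff
    whose sensitivity is at least `target`.

    A measurement is called positive when it is >= cutoff.
    """
    # Scan candidate cutoffs from high to low; sensitivity can only rise.
    for c in sorted(set(pos) | set(neg), reverse=True):
        sens = sum(p >= c for p in pos) / len(pos)
        if sens >= target:
            spec = sum(n < c for n in neg) / len(neg)
            return c, sens, spec
    raise ValueError("no positive cases supplied")
```

Choosing the highest qualifying cutoff keeps specificity as large as possible while meeting the sensitivity constraint.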

Confounding variables

Spearman’s rank correlation and multiple regression analysis of TE, pSWE, and MRE measurements as a function of fibrosis, inflammation, and steatosis were performed to evaluate the confounding effect of these histological features on liver stiffness. Spearman’s ρ, regression coefficient estimates, normalized regression coefficient estimates, standard deviation, and adjusted R2 were reported for each technique.
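
The univariate part of this analysis can be sketched as Spearman's ρ computed as the Pearson correlation of average ranks, which handles the many ties that arise with ordinal histological scores. This is an illustrative pure-Python version with hypothetical function names. The multiple regression step and p-values are not shown.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)
```

Here `x` could hold a histological score per patient (for example fibrosis stage 0 to 4) and `y` the corresponding stiffness measurement.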

Results

Population

Our cohort included 100 eligible adult patients who underwent TE, pSWE, MRE, and liver biopsy between January 2014 and July 2018 (Fig. 1). Seventy-nine patients were recruited at Centre hospitalier de l’Université de Montréal and 21 at McGill University Health Centre. Mean age was 55 years (22–78) (Table 1). Forty-seven were women (47%), and all patients had suspected or known liver fibrosis or cirrhosis induced by either HBV (n = 3), HCV (n = 21), NAFLD (n = 7), NASH (n = 45), AIH (n = 17), or mixed causes (n = 7).

Fig. 1 Flowchart of patient selection

Table 1 Characteristics in 100 patients

Histopathological findings had the following distribution in our study cohort: for fibrosis stage, 15 patients had F0, 16 F1, 24 F2, 18 F3, and 27 F4; for inflammation activity grade, 9 patients had A0, 49 A1, 32 A2, and 10 A3; for steatosis grade, 35 patients had grade S0, 27 grade S1, 18 grade S2, and 20 grade S3. Some patients were overweight (n = 27), obese (n = 48), or severely obese (n = 5). The mean body mass index (BMI) of the cohort was 30.1 ± 5.9 kg/m2. The median time interval between TE, pSWE, and MRE and liver biopsy was 5 days (0–31 days), 11 days (0–31 days), and 11 days (0–31 days), respectively. Average H&E slide length was 20.5 mm (10–30 mm) and included 2 fragments on average (1–13). Fifty-eight TE examinations were performed with the M probe and 42 with the XL probe.

Feasibility and reliability

The technical failure rate was 0% for TE, 1% for pSWE, and 6% for MRE. Differences in technical failure rates between the elastographic techniques were not significant. The rate of unreliable examinations was 8% for TE, 19% for pSWE, and 3% for MRE. Reliability differences were significant only between TE and pSWE (p < 0.05) and between pSWE and MRE (p < 0.001). For MR examinations, mean R2* was 50.6 ± 16.5 s−1 (13.1–112.8 s−1) in the population with measurements deemed reliable (n = 91) and 147.4 ± 39.8 s−1 (127.4–220.0 s−1) in the population with measurements deemed unreliable (n = 3).

Stiffness measurements

Mean stiffness measurements with standard deviations for each fibrosis stage in all techniques are reported in Table 2. Examples of TE, pSWE, and MRE measurements are shown in Fig. 2, and boxplots are shown in Fig. 3.

Table 2 TE, pSWE, and MRE mean values as a function of fibrosis stage

Fig. 2 Schematic diagrams of (a) TE, (b) pSWE, and (c) MRE liver stiffness measurements. (d) TE stiffness measurement, (e) pSWE stiffness measurement, and (f) MRE elastogram in a cirrhotic obese male patient with hepatitis C virus, BMI = 30.8 kg/m2. Liver stiffness: TE Young’s modulus = 20.90 kPa; pSWE shear wave speed = 1.75 m/s; and MRE shear modulus = 5.48 kPa. Liver biopsy: percutaneous biopsy; sample length = 22 mm; fibrosis stage = F4; inflammation activity grade = A3; and steatosis grade = S3

Fig. 3 Median liver stiffness with interquartile ranges measured with index tests vs. histology-determined fibrosis stages. (a) Young’s modulus measured with TE. (b) Shear wave speed measured with pSWE. (c) Shear modulus measured with MRE. Stiffness properties with each index test were significantly different between fibrosis stages (p < 0.0001). The band inside the box indicates the median, the box indicates the first and third quartiles, whiskers indicate the minimum and maximum values, and red crosses (+) indicate outliers

Staging comparison

Stiffness measurements with TE, pSWE, and MRE differed significantly between histology-determined fibrosis stages (p < 0.0001). Post hoc tests revealed that stiffness measurements differed significantly between ≤ F1 vs. ≥ F2 for pSWE (p < 0.05), ≤ F2 vs. ≥ F3 for MRE (p < 0.05), and ≤ F3 vs. F4 for all techniques (p < 0.001, p < 0.05, and p < 0.05, respectively).

Diagnostic performance

Estimates of diagnostic performance are shown in Table 3 and ROC analysis in Fig. 4. AUCs were similar or higher for detecting any dichotomized fibrosis stages with MRE than with TE or pSWE. For differentiating F0 vs. ≥ F1, the AUC was significantly higher for MRE than that for TE (0.88 vs. 0.71; p < 0.05) or pSWE (0.88 vs. 0.73; p < 0.05). For differentiating ≤ F1 vs. ≥ F2, the AUC was significantly higher for MRE than that for TE (0.85 vs. 0.75; p < 0.05). For differentiating ≤ F2 vs. ≥ F3 and ≤ F3 vs. F4, there were no significant differences in AUCs between the elastographic techniques. Also, there were no significant differences between the AUCs of TE and pSWE for differentiation of fibrosis stages.

Table 3 Diagnostic accuracy of Young’s modulus by TE, shear wave speed by pSWE, and shear modulus measured by MRE for staging liver fibrosis (95% CI in parenthesis)

Fig. 4 Receiver operating characteristic curves for distinguishing dichotomized fibrosis stages with (a) TE, (b) pSWE, and (c) MRE. AUCs were similar or higher for detecting any dichotomized fibrosis stages with MRE than with TE or pSWE

Confounding variables

Spearman’s rank correlation and multiple regression analysis of the effects of fibrosis, inflammation, and steatosis on elastographic measurements are shown in Table 4. Scatter plots with linear regression of stiffness measurements with inflammation and steatosis are shown in Figs. 5 and 6, respectively. Univariate correlation coefficients demonstrated that liver stiffness measured with TE, pSWE, and MRE increased significantly with fibrosis stage (0.57 [p < 0.001], 0.62 [p < 0.0001], and 0.72 [p < 0.0001], respectively), increased significantly with inflammation grade (0.20 [p < 0.05], 0.23 [p < 0.05], and 0.21 [p < 0.05], respectively), and decreased with steatosis grade (− 0.11 [p = 0.14], − 0.23 [p < 0.05], and − 0.22 [p < 0.05], respectively). However, multiple regression coefficient estimates demonstrated that liver stiffness measured with TE, pSWE, and MRE increased significantly with fibrosis stage (3.07 [p < 0.0001], 0.24 [p < 0.0001], and 0.55 [p < 0.0001], respectively), but did not increase significantly with inflammation grade (0.65 [p = 0.56], 0.08 [p = 0.41], and 0.13 [p = 0.32], respectively), nor decrease significantly with steatosis grade except when measured with pSWE (− 0.05 [p = 0.94], − 0.13 [p < 0.05], and 0.08 [p = 0.35], respectively).

Table 4 Univariate analysis and multiple regression analysis of liver fibrosis, inflammation, and steatosis impact on stiffness measured by index tests

Fig. 5 Scatter plots of stiffness measurements compared with inflammation grades with (a) TE, (b) pSWE, and (c) MRE. Spearman’s rho values were ρ = 0.20, ρ = 0.23, and ρ = 0.21, respectively (p < 0.05 for all)

Fig. 6 Scatter plots of stiffness measurements compared with steatosis grades with (a) TE, (b) pSWE, and (c) MRE. Spearman’s rho values were ρ = − 0.11 (p = 0.14), ρ = − 0.23 (p < 0.05), and ρ = − 0.22 (p < 0.05), respectively

Discussion

In this cohort, the technical failure rates were not significantly different between elastography techniques. At univariate analysis, liver stiffness measured by all techniques increased with fibrosis stages and inflammation and decreased with steatosis. In multiple regression analysis, only fibrosis significantly correlated with stiffness measurements across all techniques, except pSWE which also correlated with steatosis. Diagnostic accuracy for distinguishing early stages of fibrosis was higher with MRE than with TE or pSWE.

Higher rates of unreliable examinations were observed for pSWE than for TE or MRE. The technical failure rates of elastography techniques were lower than previously reported for TE [15], similar to or higher than those previously reported for pSWE [16,17,18], and lower than those reported for MRE [19,20,21]. Reliability assessed with similar criteria was similar to or higher than that reported for TE [15, 22], lower than that previously reported for pSWE [22], and higher than that reported for MRE [19].

As anticipated, stiffness measurements increased with higher fibrosis stages regardless of the technique used [23]. All elastography techniques had a higher accuracy for differentiating fibrosis stages ≤ F3 vs. F4 than for ≤ F2 vs. ≥ F3. This is consistent with prior meta-analyses that have shown higher AUCs for differentiation of higher fibrosis stages [5, 7]. Of note, the AUC obtained in our study using MRE (0.88) for differentiating F0 from F1 and higher was higher than that reported in previous literature.

Overall, MRE provided either a similar or higher accuracy than TE and pSWE for staging liver fibrosis. The accuracy of MRE was significantly higher than that of TE or pSWE for differentiating F0 from F1 or higher, and significantly higher than that of TE for differentiating F1 or lower from F2 or higher. The MRE technique produces shear waves distributed throughout the liver and includes larger ROIs on four acquired slices whereas pSWE and TE samples include smaller regions of interest (with lengths of 10 mm and 40 mm, respectively).

In prior meta-analyses, higher diagnostic accuracies have been reported for MRE [24] than for pSWE [25] or TE [7]. However, these studies were performed in different patient populations. Hence, there was a need to compare their diagnostic accuracy head-to-head in the same patient population. Some prior studies have performed paired comparisons of diagnostic accuracy between two of the three elastography techniques. A prior study by Cui et al has reported a higher fibrosis-staging accuracy for MRE than for pSWE in patients with NAFLD [26]. Similarly, prior studies comparing the diagnostic performance of MRE and TE have found a higher accuracy for MRE than for TE in patients with CLD [27], chronic HBV [7], or NAFLD [28]. A study by Bohte et al reported a similar diagnostic accuracy for MRE and TE in patients with HBV and HCV [29]. Most studies comparing US-based elastography techniques have found a similar diagnostic accuracy between pSWE and TE in patients with CLD [30], NAFLD [22], and viral hepatitis [31]. One cross-sectional study by Rizzo has found that pSWE was more accurate than TE for the detection of significant fibrosis ≥ F2, severe fibrosis ≥ F3, and cirrhosis F4 [32].

In univariate analysis, we found that liver shear stiffness measured with TE, pSWE, and MRE increased with fibrosis, increased to a lesser degree with inflammation, and decreased with steatosis. These findings were consistent across all the elastography techniques. Inflammation, often accompanied by hepatocyte ballooning and edema, may increase liver stiffness by increased cellularity, cell size, or hydrostatic pressure [33]. The increase in liver stiffness observed with higher inflammation grades is consistent with prior findings in animal studies [34, 35] and in patients with CLD [19, 36, 37]. In a cohort of patients with HBV, Shi et al obtained similar results for MRE measurements and argued that advanced inflammatory activity (≥ A2) induced higher stiffness measurements especially in lower fibrosis stages (≤ F2) [38].

The impact of steatosis on liver stiffness remains controversial and may depend on the elastography technique used: while some studies have found that steatosis decreases stiffness [39], others have found the opposite [40, 41], and some have found no significant influence [42, 43]. Frequency-dependent viscosity of fat and the use of lower frequencies by TE (50 Hz) and MRE (60 Hz) compared with those used by pSWE (range of 100 to 500 Hz) may explain these discrepancies [44]. Future prospective cross-sectional studies would need to quantify the shear loss modulus (viscosity) to confirm the impact of this confounder. Our results are consistent with those of Yoneda et al, who reported a negative correlation of pSWE stiffness measurements with steatosis. When only considering cases with NAFLD or NASH, in which fibrosis and steatosis coexist, multivariate analyses also found that steatosis remained a significant confounder only for pSWE. Considering the coexistence of several histopathological changes and the emergence of quantitative techniques for the detection of inflammation using MRE [34], and of fat using MR imaging [45, 46] or ultrasound-based attenuation [47, 48], future multiparametric techniques could improve fibrosis-staging accuracy.

We acknowledge the following limitations. Our study included patients with a variety of CLD etiologies, whereas recent studies tend to select homogeneous patient populations with a single etiology. However, our inclusion criteria reflect clinical reality because patients with suspected or multiple coexisting causes of CLD may still require fibrosis staging by noninvasive techniques. Diagnostic accuracy of elastography techniques was lower than previously reported in meta-analyses [17, 24, 49]. The evolution of clinical practice over time has introduced a selection bias toward challenging cases. Nowadays, unambiguous cases are often not biopsied, whereas cases with unreliable elastography results or discrepancies between elastography techniques and blood markers or biological scoring indexes are more likely to undergo liver biopsy. These difficult cases reflect underlying heterogeneity of fibrosis distribution, possibly contributing to a lower diagnostic performance of elastography techniques. Finally, we did not perform colocalization of the ROIs sampled by the three elastography techniques, and intra- and interobserver variability was not evaluated. Instead, we performed elastography examinations according to clinical standards of care by operators blinded to each other and to the reference standard. For TE and pSWE, 10 valid measurements were obtained, and IQR/M was used as a surrogate of variability. In addition, because shear wave velocities differ between ultrasound elastography machines from different vendors [50], the cutoffs provided here might not be applicable to other machines.

Even though MRE at 3.0 T has been shown to be feasible [51, 52], higher failure rates compared with those for MRE at 1.5 T have been reported, especially when using a GRE sequence [20]. MRE using a GRE sequence has higher failure rates than MRE using a spin-echo sequence because of its sensitivity to susceptibility effects [21, 53]. At our center, mild iron overload is characterized by R2* between 130 and 200 s−1 and moderate iron overload by R2* between 200 and 320 s−1 at 3.0 T, as extrapolated from calibration curves from Wood et al [13]. In this study, two subjects had mild and one had moderate iron overload. Because the average iron concentration was relatively low, good results were obtained using MRE with a GRE sequence at 3.0 T. However, it might be appropriate to perform MRE examinations at 1.5 T and/or with a spin-echo sequence, when available, for populations with a high incidence of iron overload.

In a context where the fibrosis stage may lead to enrollment in systematic surveillance programs for hepatocellular carcinoma or to prescription of expensive medication, high accuracy and cost-effectiveness are required for noninvasive techniques. Indeed, there is a need for tools that can accurately distinguish ≤ F1 from ≥ F2, since significant fibrosis is a major criterion for initiation of long-term treatments such as antiviral therapy in HBV or antifibrotic drugs in NASH [54]. The significantly higher AUC obtained for distinguishing ≤ F1 from ≥ F2 with MRE compared with that of TE suggests that MRE examinations should preferentially be performed to confirm disease in patients with suspected CLD. A cost-utility analysis of NASH annual noninvasive screening strategies previously showed that MRE was more cost-effective than biopsy [55]. Moreover, accurate differentiation of F0 from ≥ F1 will be required for early diagnosis and prevention in CLD, as early fibrosis is reversible [56, 57]. Significantly higher AUCs obtained for distinguishing F0 from ≥ F1 with MRE compared with those of TE or pSWE suggest that MRE could also be employed for the early screening of fibrosis. If combined with MRI in the setting of a hepatocellular carcinoma surveillance program, MRE may also have an added value, as MRE-determined liver stiffness has been shown to be a significant predictor of hepatocellular carcinoma occurrence in compensated CLD [58].

Multiparametric quantitative MR imaging or ultrasound techniques accounting for the coexistence of inflammation and steatosis will be required to improve fibrosis-staging accuracy and further reduce the need for liver biopsy. Tang et al showed that quantitative ultrasound and shear wave elastography provided improved classification accuracy for grading steatosis, inflammation, and fibrosis compared with elastography alone in an animal model [59]. Yin et al proposed a multiparametric quantitative MR imaging model whose parameters could accurately predict NAFLD activity score in an animal model [60]. Distinguishing inflammation and steatosis from fibrosis is critical for noninvasive diagnosis and prognosis of CLD with elastography, especially in inflammatory disease and fatty liver disease.

All these endpoints should be achieved with user-friendly screening tools for clinicians that are sensitive enough to diagnose cirrhosis regardless of its cause. Future work should include a cost-effectiveness analysis of strategies combining portable ultrasound-based elastographic techniques for point-of-care screening and comprehensive magnetic resonance–based examinations that also permit grading of steatosis, iron, and inflammation [34] in addition to staging of fibrosis in CLD.

In this prospective cross-sectional study, liver stiffness measured by MRE and US-based elastography techniques increased with fibrosis stages and inflammation and decreased with steatosis. MRE provided a diagnostic accuracy higher than US-based elastography techniques for staging of early stages of histology-determined liver fibrosis.