Introduction

Liver fibrosis is characterised by excessive deposition of extracellular matrix, which distorts the liver architecture. As progressive liver fibrosis leads to cirrhosis, patients are at increased risk for hepatic decompensation and hepatocellular carcinoma [1, 2]. Consequently, identifying fibrosis across the spectrum from early fibrosis to cirrhosis is of great clinical importance for the management of patients with chronic liver disease (CLD) [3]. Although liver biopsy is considered the gold standard for assessing liver fibrosis, it has several limitations, including invasiveness, risk of complications, sampling error, and interobserver variation in interpretation [4]. Therefore, several noninvasive imaging-based methods for the assessment of liver fibrosis have been developed [5].

Recent advances in magnetic resonance (MR) imaging have led to a growing interest in multiparametric MR imaging to assess liver fibrosis, including MR spectroscopy, diffusion-weighted imaging (DWI), perfusion-weighted imaging, and MR elastography (MRE) [6, 7]. Of these, MRE is gaining acceptance as the most accurate noninvasive tool for the assessment of liver fibrosis [8,9,10,11,12]. Gradient recalled echo-based MRE (GRE-MRE) has been well studied and is available in many specialised centres. Although MRE is not limited by obesity, most failures of GRE-MRE are caused by the short T2* transverse relaxation time of iron-overloaded liver [4, 11, 13]. Because spin-echo echo-planar imaging-based MRE (SE-EPI-MRE) is less sensitive to transverse relaxation signal decay than GRE-MRE, SE-EPI-MRE has been reported to provide stiffness maps with greater spatial coverage within a similar acquisition time, larger areas of stiffness measurement, and a lower overall failure rate [13,14,15,16]. However, a direct comparison of GRE and SE-EPI-MRE for the assessment of liver fibrosis with histopathology as the reference standard has rarely been performed [17, 18].

Several recent meta-analyses have characterised the diagnostic performance of GRE-MRE in various CLDs [19,20,21]. However, since many studies on MRE were published after 2016, the data need to be updated. In addition, there are no meta-analyses comparing GRE and SE-EPI-MRE. Therefore, the purpose of this study is to compare the overall diagnostic value of GRE and SE-EPI-MRE for the detection and staging of liver fibrosis by performing a meta-analysis, using histopathology as the reference standard.

Materials and methods

The overall process of this meta-analysis was carried out according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [22,23,24].

Literature search

We searched MEDLINE, EMBASE, and Cochrane library databases for relevant studies by using the following keywords and medical subject heading (MeSH) terms: “liver fibrosis” OR “hepatic fibrosis” AND “magnetic resonance elastography” OR “MR elastography” OR “MRE”. The last search date was 31 January 2017. The search included all articles, without time limits or language restrictions. References cited within potentially eligible articles and related citations suggested by PubMed were assessed to find other relevant studies. Details of the literature search are given in Table E1 (online).

Study selection

All search results were screened for eligibility on the basis of the title and abstract by two independent reviewers. Animal studies, review articles, and case reports were excluded. Subsequently, the same two reviewers evaluated the full text of articles for final inclusion. Studies were selected if they met all of the following criteria: (1) the study appraised the performance of MRE for the diagnosis of liver fibrosis; (2) histopathology was used as the reference standard; (3) the study used a comparable liver fibrosis staging system; (4) available data could be used to compute the true-positive, false-positive, true-negative, and false-negative results of MRE for the diagnosis of the fibrosis stage. Studies were excluded if any of the following exclusion criteria were present: (1) the study was reported only as a conference abstract; (2) the study was not written in English; (3) the study involved fewer than 20 patients; (4) the study evaluated only children or transplant recipients. When overlapping samples were recruited in more than one study, the study with the larger sample size was included. All discrepancies in study selection between the two reviewers were resolved by consensus following discussion.

Data extraction

Data were extracted by two independent reviewers using a predefined form. For each study, we retrieved the following study design information: the names of the first and corresponding authors; hospital or medical school; the country where the study was conducted; year of publication; study design; time range of study; number of patients; reference test; scoring system for histopathologic staging of liver fibrosis; time period between reference standard and index test; mean or median patient age; gender ratio; patient spectrum. The following technical characteristics of MRE were recorded: magnetic field strength; MR scanner; MRE dimension; wave frequency of MRE; cut-off value used for each stage of liver fibrosis. To compare fibrosis staging acquired with various scoring systems, we transformed each reported fibrosis stage into the simplified five-stage fibrosis scoring system shown in Table E2 (online).

Study quality assessment

The quality of studies included in the meta-analysis was assessed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool by two independent reviewers. This tool enables reviewers to evaluate the risk of bias in patient selection, index test, reference standard, patient flow and timing, and a study’s applicability to clinical practice in terms of patient selection, index tests, and reference standards [25]. We considered an acceptable interval of time between MRE and biopsy to be less than or equal to 1 year, given the relatively slow progression of liver fibrosis.

Statistical analysis

For each study, two-by-two contingency tables were extracted or reconstructed for the classification of F0 versus F ≥ 1, F0 and F1 versus F ≥ 2, F0–F2 versus F ≥ 3, and F0–F3 versus F4, respectively. The primary analysis was the diagnostic performance of GRE and SE-EPI-MRE for detection of significant fibrosis (F ≥ 2). The secondary analysis was the performance of each MRE sequence for detection of any fibrosis (F ≥ 1), advanced fibrosis (F ≥ 3), and cirrhosis (F = 4), respectively. The pooled estimates of sensitivity, specificity, positive and negative likelihood ratios, and the diagnostic odds ratio, along with the 95% confidence interval (CI), were calculated by using the bivariate random effects model of Reitsma et al. [26]. A summary receiver operating characteristic (ROC) curve was generated using the same bivariate model, which is equivalent to a hierarchical summary ROC. Heterogeneity among the included studies was assessed with I² and χ² statistics for each of the pooled estimates [23]. Heterogeneity was judged to be substantial if I² was more than 50% and if the P value for the χ² statistic was less than 0.10 [27]. To investigate the cause of heterogeneity, threshold effects, influence and sensitivity analysis, and subgroup analysis were performed. Threshold effects were explored by assessing the correlation between sensitivity and the false-positive rate and were regarded as substantial if the Spearman’s correlation coefficient was 0.6 or higher [28]. In the influence analysis, the presence of outlying and influential studies was assessed by using Cook’s distance, with 4/n as a cut-off, where n is the number of studies [29]. Sensitivity analysis was performed to quantify the effects of each study on the pooled estimates. Subgroup analysis for sensitivity was performed by using a random effects model with several covariates that could have accounted for heterogeneity.
Publication bias was assessed by using the funnel plot asymmetry test, with a P value of less than 0.10 considered to indicate significant small-study bias [30]. Statistical analysis was performed by using R software (R Foundation for Statistical Computing, Vienna, Austria; version 3.3.2; http://www.R-project.org ) with the “metafor” and “mada” packages.
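As a minimal sketch of the per-study quantities that underlie the two-by-two tables described above, the following Python snippet derives sensitivity, specificity, the likelihood ratios, and the diagnostic odds ratio from a single table. The function name and the counts are hypothetical and are not taken from any included study. The pooled estimates in this meta-analysis come from the bivariate random effects model, which this sketch does not reproduce.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Per-study accuracy measures from one 2x2 table.

    Assumes no cell is zero; tables with zero cells would need a
    continuity correction, which is outside this sketch.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio
    dor = lr_pos / lr_neg                      # equals (tp * tn) / (fp * fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": dor,
    }


# Hypothetical counts for the F0-F1 versus F >= 2 classification.
m = diagnostic_metrics(tp=45, fp=4, fn=5, tn=46)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Note that a pooled likelihood ratio or diagnostic odds ratio from the bivariate model need not equal the value obtained by plugging pooled sensitivity and specificity into these formulas.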

Results

Study selection and characteristics of the included studies

A total of 360 relevant studies were identified in the initial search after the removal of duplicates, and 233 articles were excluded on the basis of the screening of titles or abstracts. Among 127 potentially relevant studies, 26 studies with a total of 3,200 cases were finally included in the meta-analysis [9,10,11,12, 17, 31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50] (Fig. 1).

Fig. 1 Flowchart of study selection

Important characteristics of the 26 included studies are summarised in Tables 1, 2, and E3 (online). The number of patients ranged from 20 to 416. In five studies the disease spectrum was restricted to non-alcoholic fatty liver disease (NAFLD) [36, 41, 45, 49, 50], in four studies to chronic hepatitis B (CHB) [37, 38, 40, 43], in three studies to chronic hepatitis B or C [17, 39, 48], and in one study to primary sclerosing cholangitis [42]. In the study by Chang et al. [40], the CHB group and the non-CHB group were evaluated independently. Accordingly, we separately considered each group as a unit of analysis. The reference standard was liver resection for all patients in 1 study [44], a combination of resection, biopsy, and/or explant in 7 studies [9, 12, 17, 35, 38,39,40], and biopsy for all patients in the remaining 18 studies.

Table 1 Characteristics of studies that assessed the diagnostic accuracy of GRE-MRE
Table 2 Characteristics of studies that assessed the diagnostic accuracy of SE-EPI-MRE

Quality assessment

Figure 2 demonstrates the results of study quality assessment with the QUADAS-2 tool. The included studies satisfied most of the QUADAS-2 questions. In 15 studies, however, the risk of bias in patient selection was high because of a restricted disease spectrum or non-consecutive patient enrolment. The details of quality assessment for each study are provided in Table E4 (online).

Fig. 2 Grouped bar charts show the results of study quality assessment with the QUADAS-2 tool. Each chart shows the cumulative results of included studies in terms of the risk of bias (left) and concerns regarding applicability (right) according to each QUADAS-2 domain

Diagnostic accuracy of MRE

Primary analysis

Nineteen and six units of data were included for detection of significant fibrosis (F ≥ 2) on GRE and SE-EPI sequences, respectively. The pooled sensitivity, specificity, positive and negative likelihood ratios, and diagnostic odds ratio for detection of significant fibrosis with GRE-MRE were 0.92 (95% CI: 0.89, 0.94), 0.92 (95% CI: 0.90, 0.95), 9.44 (95% CI: 5.99, 14.87), 0.12 (95% CI: 0.08, 0.17), and 93.28 (95% CI: 49.21, 176.81), respectively (Fig. 3, A, Table 3). The pooled sensitivity, specificity, positive and negative likelihood ratios, and diagnostic odds ratio for detection of significant fibrosis with SE-EPI-MRE were 0.87 (95% CI: 0.77, 0.96), 0.93 (95% CI: 0.89, 0.97), 9.09 (95% CI: 6.22, 13.28), 0.14 (95% CI: 0.07, 0.30), and 90.32 (95% CI: 39.38, 207.19), respectively (Fig. 3, B, Table 3). There was no significant difference in the pooled sensitivity of the GRE and SE-EPI sequences (0.92 vs. 0.87, P = 0.336). The area under the summary ROC curve was 0.95 and 0.94 for the GRE and SE-EPI sequences, respectively, which suggested high diagnostic accuracy (Fig. 4).

Fig. 3 Forest plots of sensitivity and specificity for the detection of significant fibrosis (F ≥ 2) with A GRE sequence and B SE-EPI sequence. Squares: individual study point estimates. Error bars: 95% CIs. Dashed line and rhombus: summarised estimate and its 95% CI

Table 3 Pooled analysis of the diagnostic performance for diagnosis and staging of liver fibrosis
Fig. 4 Summary ROC curve of the accuracy of detection of significant fibrosis (F ≥ 2) with A GRE sequence and B SE-EPI sequence. Each circle (GRE) and square (SE-EPI) indicates the data of each included study, while the black circles (GRE) and squares (SE-EPI) indicate summary estimates of each sequence. Continuous lines represent a 95% confidence region and dotted lines represent a 95% prediction region

Secondary analysis

The areas under the summary ROC curve for stage diagnosis of any fibrosis, advanced fibrosis, and cirrhosis on GRE and SE-EPI sequences were 0.93 versus 0.94, 0.94 versus 0.95, and 0.92 versus 0.93, respectively, suggesting high diagnostic accuracy (Table 3). The summary ROC plot is shown in Figure E1-4 (online).

Assessment of heterogeneity

In the primary analysis, GRE-MRE exhibited substantial heterogeneity (P < 0.001 for χ² and I² > 60%) for both sensitivity and specificity. SE-EPI-MRE, in contrast, exhibited substantial heterogeneity for sensitivity (P < 0.001 for χ² and I² = 92.1%) but not for specificity (P = 0.15 for χ² and I² = 40.2%).
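The decision rule applied here can be sketched in a few lines of Python. The snippet assumes the usual definition of I² from Cochran's Q statistic, I² = max(0, (Q − df)/Q) × 100 with df equal to the number of studies minus one; the function names and the Q value in the example are hypothetical, and the P value is taken as an input rather than computed.

```python
def i_squared(q, n_studies):
    """I^2 (in percent) from Cochran's Q and the number of studies."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def is_substantial(i2_percent, p_value):
    """Rule used in this study: I^2 above 50% and chi-square P below 0.10."""
    return i2_percent > 50.0 and p_value < 0.10


# Hypothetical Q statistic for six units of data (df = 5).
print(i_squared(20.0, 6))

# Values reported for SE-EPI-MRE; 0.0005 stands in for "P < 0.001".
print(is_substantial(92.1, 0.0005))  # sensitivity
print(is_substantial(40.2, 0.15))    # specificity
```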

Diagnostic threshold effects

In the primary analysis, the Spearman’s correlation coefficient between sensitivity and the false-positive rate for GRE and SE-EPI-MRE was calculated to be -0.391 (P = 0.098) and 0.543 (P = 0.266), respectively. Therefore, a weak positive correlation between sensitivity and the false-positive rate was demonstrated only for SE-EPI-MRE. No substantial threshold effect (correlation coefficient of 0.6 or higher) was observed in the secondary analysis. Heterogeneity statistics and threshold effects are summarised in Table E5 (online).
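For readers who want to see how such a threshold check works, the following Python sketch computes Spearman's correlation coefficient as the Pearson correlation of average ranks and compares it with the 0.6 cut-off. The per-study sensitivities and false-positive rates are hypothetical illustration values, not data from the included studies, and the helper names are our own.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-study values.
sens = [0.80, 0.85, 0.90, 0.95, 0.88, 0.92]
fpr = [0.05, 0.10, 0.08, 0.15, 0.04, 0.12]

rho = spearman(sens, fpr)
print(f"rho = {rho:.3f}, substantial threshold effect: {rho >= 0.6}")
```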

Influence and sensitivity analysis

In the influence analysis for GRE-MRE, the studies by Godfrey et al. [33] and Cui et al. [41] showed larger values of Cook’s distance (0.48 and 0.58, respectively) than the other studies and were therefore considered more influential. In the sensitivity analysis, the amount of heterogeneity (I²) was reduced from 60.4% to 52.0% and 53.3% when the studies by Godfrey et al. and Cui et al., respectively, were excluded. However, the pooled sensitivities were almost unchanged. In the influence analysis for SE-EPI-MRE, the study by Bohte et al. [48] was judged to show greater influence (Cook’s distance, 0.65). In the sensitivity analysis, however, the amount of heterogeneity was minimally altered after removal of the study by Bohte et al.
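The 4/n cut-off described in the statistical analysis can be illustrated with a short Python sketch. It only applies the cut-off to Cook's distances that have already been estimated; the distances themselves come from the fitted model and are not recomputed here. The function name is hypothetical, and taking n = 19 assumes that n corresponds to the 19 units of data in the primary GRE-MRE analysis.

```python
def flag_influential(cooks_distances, n_studies):
    """Return the cut-off and the studies whose Cook's distance exceeds 4/n."""
    cutoff = 4 / n_studies
    flagged = {name: d for name, d in cooks_distances.items() if d > cutoff}
    return cutoff, flagged


# Cook's distances reported for GRE-MRE; n = 19 is an assumption (see above).
gre = {"Godfrey et al.": 0.48, "Cui et al.": 0.58}
cutoff, flagged = flag_influential(gre, n_studies=19)

print(f"cut-off = {cutoff:.3f}")
for name, d in flagged.items():
    print(f"{name}: {d}")
```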

Subgroup analysis

For GRE-MRE, sensitivity was significantly higher in studies originating from Asian countries compared to studies originating from Western countries (94% vs. 82%, P = 0.003), in studies with an aetiology of CLD that was specified to be viral in comparison to studies in which aetiologies were not specified (95% vs. 88%, P = 0.004) and in studies in which the METAVIR system was used in all patients in comparison to studies where the METAVIR system was not used in all patients (94% vs. 85%, P = 0.007). Other subgroup factors did not show statistically significant differences (P > 0.166). For SE-EPI-MRE, significantly higher sensitivity was reported in studies that used 3D MRE than in studies that used 2D MRE (98% vs. 81%, P = 0.039). There was no significant difference in sensitivity for the remaining subgroups (P > 0.078) (Table 4).

Table 4 Sensitivity estimates for each subgroup for detection of significant fibrosis

Assessment of publication bias

In the primary analysis, low likelihoods of publication bias for GRE and SE-EPI sequences (P = 0.136 and P = 0.119) were observed with funnel plot asymmetry tests. However, high likelihoods of publication bias were observed for GRE-MRE for the staging of advanced fibrosis and cirrhosis (P = 0.041 and P = 0.002) and for SE-EPI-MRE for the staging of advanced fibrosis (P < 0.001).

Discussion

Early and accurate detection of liver fibrosis is important because liver fibrosis is now considered a dynamic process, with significant potential for reversal [51]. Significant fibrosis (≥ F2) is typically considered a hallmark of the progressive form of liver disease, and the ultimate aim of treatment for this fibrosis stage is to cure the patient by resolving the underlying cause of liver disease [51]. Furthermore, discernment of advanced fibrosis (F3) or cirrhosis (F4) is essential because patients with advanced fibrosis or cirrhosis should be screened for portal hypertension and hepatocellular carcinoma [5]. On the basis of our study results, both GRE and SE-EPI-MRE were highly accurate for detecting all stages of liver fibrosis and could be acquired with various MR systems and sequences. MRE also has the advantage of being performed as part of an MR examination, which is a better method for detecting hepatocellular carcinoma than other modalities.

Primary and secondary analysis of our meta-analysis on GRE-MRE and SE-EPI-MRE demonstrated an excellent diagnostic performance with area under the summary ROC curve values of 0.92-0.95. Previous studies comparing GRE and SE-EPI-MRE were without histopathologic data [13,14,15, 52] or were limited to specific diseases [17, 50]. The current meta-analysis included only studies with pathologic fibrosis staging as the reference standard, directly compared two MRE sequences, included various aetiologies of CLD, and showed that SE-EPI-MRE is highly accurate for detecting each stage of fibrosis without significant differences in pooled sensitivity, specificity, and area under the summary ROC curve values in comparison to GRE-MRE.

According to the findings of a recent study on the agreement and repeatability of MRE in 24 adult volunteers, GRE-MRE showed better agreement and repeatability than SE-EPI-MRE [52]. However, SE-EPI-MRE requires a shorter acquisition time than GRE-MRE, and this may be beneficial in CLD patients or in children and young adults who cannot hold their breath adequately [14]. In addition, owing to its higher wave signal-to-noise ratio, SE-EPI-MRE enables wave tracking through larger and deeper regions of the liver, which results in larger measurable regions of interest in obese patients. Covering as much of the liver volume as possible is important for both the accuracy and reproducibility of liver stiffness measurement, as well as for overcoming limitations related to sampling errors with biopsy or US elastography. Because GRE-MRE has better agreement and repeatability, GRE sequences should be used first. However, in patients with conditions of iron overload in the liver (e.g., cirrhosis, thalassemia, and hemochromatosis), or in obese patients who are likely to fail on GRE sequences, SE-EPI sequences should be used.

In a subgroup analysis of GRE-MRE on fibrosis staging systems, significant differences in sensitivity between METAVIR versus other systems were present (94% vs. 85%, P = 0.007). This may be in line with the findings of studies from Western countries and in studies in which the CLD aetiology was not specified (which included NAFLD), which also demonstrated significantly lower sensitivity in subgroup analysis. NAFLD is the most common liver disease in the USA, and a recent meta-analysis of MRE on NAFLD demonstrated a pooled sensitivity and specificity of 75-88% and 77-87%, respectively [19]. We assume that the different histopathologic findings of viral hepatitis and NAFLD together with the different fibrosis staging systems used may have affected the results.

Three-dimensional (3D) SE-EPI is advantageous in that all three-directional motion encoding gradients are applied and enable true 3D volumetric acquisition of MRE, making the measurement of liver stiffness more accurate and generalisable. In a subgroup analysis, significantly higher sensitivity was reported in 3D SE-EPI than in 2D SE-EPI. However, all of the studies with 3D SE-EPI included large numbers of normal patients, which may have caused the diagnostic performance to be overestimated because of spectrum bias. Future studies comparing 2D and 3D SE-EPI-MRE with histopathology as a reference standard are necessary.

This study has both strengths and limitations. One major strength is that this is the first meta-analysis to compare GRE and SE-EPI-MRE with histopathology as the reference standard. Another strength is the inclusion of some of the most recently published studies, with 12 of the 26 included studies having been published after January 2016, as well as subgroup and sensitivity analyses to evaluate the stability of the findings and to identify potential factors responsible for heterogeneity. The main limitation of this study is that it was a study-level diagnostic accuracy meta-analysis, as opposed to being based on individual participant data. Second, we could not compare the failure rate of each MRE sequence because most of the included studies lacked sufficient data to calculate the failure rate. However, several recent studies have shown that SE-EPI-MRE has a lower failure rate, particularly in patients with iron overload in the liver [13, 15]. Third, considerable heterogeneity was observed for both GRE and SE-EPI sequences. Fourth, we considered an acceptable interval of time between MRE and biopsy to be less than or equal to 1 year. This could have resulted in some bias if the disease progressed during that time interval.

In conclusion, this systematic review and meta-analysis showed that MRE with both GRE and SE-EPI is a highly accurate, noninvasive technique for staging liver fibrosis in CLD. Since GRE-MRE has better agreement and repeatability, GRE-MRE should be used first. However, in patients who are likely to fail on GRE-MRE, SE-EPI-MRE should be used. Future research comparing SE-EPI-MRE and other noninvasive methods is warranted.