Introduction

Advanced liver fibrosis and cirrhosis are significantly associated with mortality in patients with chronic liver disease, as well as with an increased risk of hepatocellular carcinoma (HCC) [1, 2]. Although untreated fibrosis is known to progress to irreversible cirrhosis, recent studies have shown that hepatic fibrosis (HF) is potentially reversible with antifibrotic drugs in patients with hepatitis B or C virus infection [1, 3]. Therefore, reliable methods for measuring HF are essential for monitoring the response to antifibrotic agents and for detecting HF progression early [1]. Because liver biopsy has limitations, including invasiveness, sampling error and interobserver variability, there has been increasing interest in non-invasive tools for HF staging [4,5,6].

There are several imaging methods for the non-invasive diagnosis of HF, including ultrasound (US) elastography, magnetic resonance (MR) elastography and diffusion-weighted MR imaging [4,5,6,7,8,9]. Diffusion-weighted imaging (DWI) is one of the potential MR techniques for the diagnosis and staging of HF [9, 10]. Recently, various DWI models beyond the conventional apparent diffusion coefficient (ADC) have been introduced. Intravoxel incoherent motion (IVIM) imaging, a bi-exponential model of DWI, separates the contributions of pure molecular diffusion and perfusion-related pseudo-diffusion to the DWI signal [11]. Several recent studies have reported that IVIM imaging shows better diagnostic performance than ADC for identifying significant HF [12, 13]. In addition, a stretched exponential DWI model has been developed to reflect the physiological characteristics of in vivo tissue, such as heterogeneity, more realistically [14]. Unlike the bi-exponential model, which assumes separate true diffusion and perfusion compartments, the stretched exponential model requires no assumptions about tissue compartmentalisation [14, 15].

The stretched exponential DWI model has recently been used in glioblastoma and cervical tumours to differentiate tumour types and grades and to evaluate early tumour response to treatment [14, 16,17,18]. For liver disease, a previous animal study showed that the stretched exponential model could potentially help assess HF [15]. However, to our knowledge, no study has explored the feasibility of stretched exponential DWI, or compared the diagnostic performance of different DWI models, for the evaluation of HF in humans.

Therefore, the purpose of our study was to retrospectively compare the performance of DWI parameters derived from mono-exponential, bi-exponential and stretched exponential models in the evaluation of HF.

Materials and methods

Two of the authors (E.K. and J.H.) are employees of Philips Healthcare and provided technical support for the DWI post-processing software. These authors were not involved in data analysis, and the other authors, who were not affiliated with Philips Healthcare, maintained full control of the data throughout the study.

Study population

This retrospective study was approved by our institutional review board, and the requirement for informed consent was waived because of the retrospective design. A study coordinator (Y.E.C.) reviewed patients' clinical information and pathological results to identify the eligible study population (Fig. 1). From November 2015 to January 2017, patients who fulfilled the following criteria were initially recruited: (1) adults (≥18 years old) who underwent liver MRI including DWI with multiple b values at 3 T, (2) histologically confirmed HF stage, and (3) an interval of less than 1 year between histological confirmation and MRI. Among the 120 patients who met these criteria, 25 were excluded for the following reasons: (1) right hemihepatectomy (n = 12), which precluded region-of-interest (ROI) measurement of the diffusion parameters, or histological findings of (2) moderate or severe fatty liver (n = 4), (3) iron deposition (n = 3) or (4) chemotherapy-related sinusoidal obstruction (n = 6). Patients with these histological features were excluded because these features could affect the diffusion parameters [19,20,21,22]. Finally, 95 patients were included in this study (men, 70; women, 25; mean age, 59.5 ± 9.5 years).

Fig. 1 Study flow diagram

Histopathological analysis

Pathological assessment of HF served as the reference standard. The mean interval between MRI and pathological evaluation was 81.2 days (SD, 87.8 days). Liver specimens were obtained by hepatic resection (n = 86), liver transplantation (n = 5) or percutaneous liver biopsy (n = 4). An experienced hepatic pathologist (Y.N.P.), who was blinded to clinical and imaging findings, reviewed the liver specimens and staged HF according to the Batts-Ludwig scoring system [23, 24] as follows: F0, no fibrosis; F1, portal fibrosis without septa; F2, periportal fibrosis with few septa; F3, septal fibrosis; F4, cirrhosis. Hepatic steatosis was graded by visual estimation of the percentage of hepatocytes with fatty accumulation as follows: S0 (no fatty liver), <5%; S1 (mild fatty liver), 5–33%; S2 (moderate fatty liver), 34–66%; S3 (severe fatty liver), >66%.
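The steatosis grading rule, together with the exclusion of patients with moderate or severe steatosis, can be written as a small lookup. This is a hypothetical helper for illustration only, not software used by the pathologist; treating any value above the S2 upper limit of 66% as S3, and assigning non-integer estimates that fall between the stated integer ranges to the higher grade, are our assumptions.

```python
# Hypothetical helpers encoding the grading rule stated above; not study software.

def steatosis_grade(percent_fatty_hepatocytes: float) -> str:
    """Map the % of hepatocytes with fatty accumulation to S0-S3."""
    p = percent_fatty_hepatocytes
    if p < 5:
        return "S0"  # no fatty liver
    if p <= 33:
        return "S1"  # mild
    if p <= 66:
        return "S2"  # moderate
    return "S3"      # severe (above the S2 upper limit)


def excluded_for_steatosis(grade: str) -> bool:
    """Moderate (S2) or severe (S3) steatosis led to exclusion from the study."""
    return grade in ("S2", "S3")


for pct in (0, 4, 20, 50, 80):
    g = steatosis_grade(pct)
    print(pct, g, "excluded" if excluded_for_steatosis(g) else "included")
```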

MR imaging acquisition

Imaging was performed on a 3-T MR scanner (Achieva TX or Ingenia; Philips Healthcare, Andover, MA, USA) with anterior and posterior coils. The liver MRI protocol included non-enhanced T1-weighted and T2-weighted images, in- and opposed-phase T1-weighted images, DWI and contrast-enhanced T1-weighted images. DWI was obtained using free-breathing, fat-suppressed, single-shot echo-planar imaging with the following parameters: echo time, 50.2 ms; repetition time, 5,000 ms; echo train length, 27; receiver bandwidth, 2,877 Hz per pixel; field of view, 400 mm; matrix size, 90 × 92; number of excitations, 3; section thickness, 5 mm; spectral adiabatic inversion-recovery (SPAIR) fat suppression; and acquisition time, approximately 4 min 15 s. Nine b values (0, 10, 25, 50, 75, 100, 200, 500 and 800 s/mm²) were applied in three orthogonal directions.

Post-processing and image analysis

Post-processing was performed using IVIM analysis software (EXPRESS; Philips Healthcare) to obtain diffusion parameters and parametric liver maps.

The ADC was calculated from all nine b values by using a mono-exponential model according to the following equation:

$$ \mathrm{S}(b)/\mathrm{S}(0)=\exp \left(-b\cdot \mathrm{ADC}\right) $$

where S(b) represents the signal intensity at a given b value, S(0) represents the signal intensity in the absence of any diffusion weighting and b represents the b factor applied in diffusion weighting.
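As a minimal sketch, assuming noise-free synthetic data and NumPy, the mono-exponential model can be fitted by linear least squares on the logarithm of the signal, since ln S(b) = ln S(0) − b·ADC. The function name `fit_adc` and the synthetic ADC value are illustrative; here S(0) is estimated as a free intercept, and the vendor software (EXPRESS) may weight or constrain the fit differently.

```python
import numpy as np

# Nine b values (s/mm^2) of the protocol described in this study
B_NINE = np.array([0, 10, 25, 50, 75, 100, 200, 500, 800], dtype=float)


def fit_adc(b, signal):
    """Mono-exponential fit of ln S(b) = ln S(0) - b * ADC by linear least squares.

    S(0) is a free intercept (an assumption); all signals must be positive.
    Returns ADC in mm^2/s when b is given in s/mm^2.
    """
    b = np.asarray(b, dtype=float)
    log_s = np.log(np.asarray(signal, dtype=float))
    slope, _intercept = np.polyfit(b, log_s, 1)  # degree-1 least-squares line
    return -slope


# Synthetic, noise-free signal with an illustrative ADC of 1.2e-3 mm^2/s
s_mono = 1000.0 * np.exp(-B_NINE * 1.2e-3)
print(fit_adc(B_NINE, s_mono))
```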

Using bi-exponential IVIM analysis, the true diffusion coefficient (Dt) was first calculated from b values greater than 200 s/mm² with a simple linear fit. The pseudo-diffusion coefficient (Dp) and perfusion fraction (f) were then obtained from the following equation with a non-linear regression algorithm [11, 25]:

$$ \mathrm{S}(b)/\mathrm{S}(0)=\left[\left(1-f\right)\cdot \exp \left(-b\cdot {D}_t\right)\right]+\left[f\cdot \exp \left(-b\cdot {D}_p\right)\right] $$
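The two-step (segmented) IVIM procedure can be sketched with NumPy and SciPy. This is an illustration under stated assumptions, not the EXPRESS implementation: `fit_ivim_segmented` is a hypothetical name, b values strictly above 200 s/mm² are used for Dt as the text states, and the initial guesses for f and Dp are arbitrary. SciPy's `curve_fit` with `method="lm"` performs a Levenberg-Marquardt fit, the algorithm this study reports for the bi-exponential model.

```python
import numpy as np
from scipy.optimize import curve_fit

B_NINE = np.array([0, 10, 25, 50, 75, 100, 200, 500, 800], dtype=float)


def fit_ivim_segmented(b, signal, b_cut=200.0, p0=(0.2, 0.03)):
    """Segmented IVIM fit; returns (Dt, f, Dp).

    Step 1: Dt from a linear fit of ln S(b)/S(0) for b > b_cut, where the
            pseudo-diffusion term is assumed to have decayed away.
    Step 2: with Dt fixed, f and Dp are fitted to S(b)/S(0) over all b values
            by Levenberg-Marquardt. p0 is an illustrative initial guess.
    """
    b = np.asarray(b, dtype=float)
    s = np.asarray(signal, dtype=float)
    s_norm = s / s[b == 0][0]  # S(b)/S(0)

    high = b > b_cut
    slope, _ = np.polyfit(b[high], np.log(s_norm[high]), 1)
    dt = -slope

    def biexp(bb, f, dp):
        return (1.0 - f) * np.exp(-bb * dt) + f * np.exp(-bb * dp)

    (f, dp), _cov = curve_fit(biexp, b, s_norm, p0=p0, method="lm")
    return dt, f, dp


# Synthetic, noise-free signal (illustrative parameter values)
dt_true, f_true, dp_true = 1.0e-3, 0.25, 0.05
s_ivim = (1 - f_true) * np.exp(-B_NINE * dt_true) + f_true * np.exp(-B_NINE * dp_true)
print(fit_ivim_segmented(B_NINE, s_ivim))
```

Fixing Dt before fitting the remaining two parameters is what makes the procedure "segmented"; a full simultaneous fit of all three parameters is an alternative that is not shown here.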

In a stretched exponential DWI model, the distributed diffusion coefficient (DDC) and the water molecular diffusion heterogeneity index (α) were calculated using the following equation [14]:

$$ \mathrm{S}(b)/\mathrm{S}(0)=\exp \left[-{\left(b\cdot \mathrm{DDC}\right)}^{\alpha }\right] $$

where DDC represents the mean intravoxel diffusion rate and α, which is bounded between 0 and 1, reflects intravoxel water molecular diffusion heterogeneity [16]. A higher α indicates lower intravoxel diffusion heterogeneity; at α = 1 the model reduces to pure mono-exponential decay. Because low b values (below 100 s/mm²) are known to be less important for accurate fitting of the stretched exponential model [14, 26], we also tested a reduced set of six b values (0, 10, 50, 200, 500 and 800 s/mm²) that omitted three of the low b values (25, 75 and 100 s/mm²). Our post-processing software allowed the diffusion parameters to be calculated from only the selected b values, using the same ROIs as for the full b-value set. The parameters obtained from this six-b-value set were defined as DDC# and α#. We used a least-squares fit for the mono-exponential model and a Levenberg-Marquardt fit for the bi-exponential and stretched exponential models [16, 27].
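A sketch of the stretched exponential fit on the nine-b-value set and on the six-b-value subset (giving DDC# and α#) follows. Unlike the study, which used Levenberg-Marquardt, this example uses SciPy's bounded trust-region solver (what `curve_fit` selects when bounds are given) so that DDC stays positive and α stays within (0, 1]; the function names, bounds and starting values are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

# b values (s/mm^2): full nine-b-value set and the six-b-value subset
B_NINE = np.array([0, 10, 25, 50, 75, 100, 200, 500, 800], dtype=float)
B_SIX = np.array([0, 10, 50, 200, 500, 800], dtype=float)


def stretched_signal(b, ddc, alpha):
    """S(b)/S(0) = exp[-(b * DDC)^alpha]."""
    return np.exp(-np.power(b * ddc, alpha))


def fit_stretched(b, signal, p0=(1.0e-3, 0.8)):
    """Fit DDC and alpha to S(b)/S(0); returns (ddc, alpha).

    Bounded trust-region least squares (not Levenberg-Marquardt), keeping
    DDC > 0 and 0 < alpha <= 1. Bounds and p0 are illustrative assumptions.
    """
    b = np.asarray(b, dtype=float)
    s = np.asarray(signal, dtype=float)
    s_norm = s / s[b == 0][0]  # normalise by S(0)
    (ddc, alpha), _cov = curve_fit(
        stretched_signal, b, s_norm, p0=p0,
        bounds=([1e-6, 0.01], [1e-2, 1.0]), x_scale="jac",
    )
    return ddc, alpha


# Synthetic, noise-free signal with illustrative DDC and alpha
s_str = stretched_signal(B_NINE, 1.3e-3, 0.7)
ddc9, alpha9 = fit_stretched(B_NINE, s_str)               # DDC, alpha
keep = np.isin(B_NINE, B_SIX)
ddc6, alpha6 = fit_stretched(B_NINE[keep], s_str[keep])   # DDC#, alpha#
print(ddc9, alpha9, ddc6, alpha6)
```

Because the synthetic data are noise-free, both b-value sets recover the same parameters; this demonstrates only how the subset is selected, not the study's finding that DDC# and α# performed comparably on patient data. Setting α = 1 in `stretched_signal` reproduces the mono-exponential decay exp(−b·DDC).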

For image analysis, a board-certified abdominal radiologist (N.S.) with 5 years of experience in liver MRI, who was blinded to the histopathological results, placed ROIs on DWI images (b = 0 s/mm²) using the post-processing software, which calculated all diffusion parameters from these ROIs. Three circular ROIs (mean area ± SD, 275.1 ± 174.2 mm²) were drawn in the right hepatic lobe, avoiding large intrahepatic vessels and focal hepatic lesions. The left hepatic lobe was not included because cardiac motion artefacts could result in inaccurate measurement of the diffusion parameters [28]. The mean of the three ROI values was used for statistical analysis.
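Summarising the three ROIs per patient can be sketched as follows: the mean is the value carried into the statistical analysis, and the coefficient of variation (SD divided by mean, in percent) is one way to express intra-individual variability across ROIs. Whether the study used the sample or population SD is not stated; the sample SD (`ddof=1`), the hypothetical function name `roi_summary` and the toy values are assumptions.

```python
import numpy as np


def roi_summary(roi_values):
    """roi_values: array of shape (n_patients, 3) for one DWI parameter.

    Returns the per-patient mean over the three ROIs and the per-patient
    coefficient of variation in percent (sample SD / mean * 100).
    """
    v = np.asarray(roi_values, dtype=float)
    mean = v.mean(axis=1)
    cov_pct = 100.0 * v.std(axis=1, ddof=1) / mean  # sample SD is an assumption
    return mean, cov_pct


# Two hypothetical patients, three ROI values each (e.g. ADC, x10^-3 mm^2/s)
example = [[1.0, 1.1, 0.9],
           [1.2, 1.2, 1.2]]
means, covs = roi_summary(example)
print(means, covs)
```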

Statistical analysis

DWI parameters were compared among patients with HF stage 0–1, stage 2–3 and stage 4 using analysis of variance (ANOVA). Post hoc pairwise comparisons with Bonferroni correction were used to test for significant differences between groups. Spearman rank correlation analysis was performed to evaluate the correlation between DWI parameters and histological HF stage. The overall diagnostic performance of DWI parameters for HF staging was estimated with the Obuchowski measure using R and compared using the DeLong test [29]. The Obuchowski measure generalises the area under the receiver operating characteristic (ROC) curve to reference standards that are not binary, such as ordinal fibrosis stages [19]. Next, the diagnostic performance of DWI parameters for diagnosing significant fibrosis (F ≥ 2) and cirrhosis (F = 4) was assessed with ROC curve analysis, and areas under the ROC curve (AUCs) were compared using the DeLong test. The intra-individual variability of each parameter across the three ROIs was evaluated with the coefficient of variation (CoV), and differences in CoV between parameters were evaluated with a linear mixed model.
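Several of these analyses can be sketched with SciPy on hypothetical toy data. The AUC is computed empirically (Mann-Whitney form) for a marker that decreases with fibrosis. The multi-stage measure below is a simplified, equal-weight average of pairwise-stage AUCs; it is related to, but not identical with, the Obuchowski estimator, whose penalty weighting and R implementation are not reproduced. The post hoc step uses pairwise t-tests with Bonferroni adjustment, one common choice; the DeLong test and the linear mixed model are omitted. All function names are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def auc_lower_in_disease(values, diseased):
    """Empirical AUC for a marker that decreases with disease.

    AUC = P(non-diseased value > diseased value) + 0.5 * P(tie),
    i.e. the Mann-Whitney form of the area under the ROC curve.
    """
    v = np.asarray(values, dtype=float)
    d = np.asarray(diseased, dtype=bool)
    # Differences for every (non-diseased, diseased) pair of patients
    diff = v[~d][:, None] - v[d][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def pairwise_stage_auc(values, stages):
    """Equal-weight mean of AUCs over all pairs of distinct stages.

    Simplified stand-in for a multi-category (Obuchowski-type) measure;
    no penalty weights are applied.
    """
    v = np.asarray(values, dtype=float)
    st = np.asarray(stages)
    aucs = []
    for lower, higher in combinations(np.unique(st), 2):
        mask = (st == lower) | (st == higher)
        # Within each pair, the higher stage is treated as "diseased"
        aucs.append(auc_lower_in_disease(v[mask], st[mask] == higher))
    return float(np.mean(aucs))


def anova_bonferroni(groups):
    """One-way ANOVA, then pairwise t-tests with Bonferroni-adjusted p values."""
    _f_stat, p_overall = stats.f_oneway(*groups)
    pairs = list(combinations(range(len(groups)), 2))
    adjusted = {
        (i, j): min(1.0, stats.ttest_ind(groups[i], groups[j]).pvalue * len(pairs))
        for i, j in pairs
    }
    return p_overall, adjusted


# Hypothetical toy data: fibrosis stage (F0-F4) and one DWI parameter per patient
stage = np.array([0, 0, 1, 1, 2, 3, 3, 4, 4, 4])
param = np.array([1.40, 1.35, 1.30, 1.38, 1.20, 1.15, 1.22, 1.00, 1.05, 0.98])

rho, p_rho = stats.spearmanr(param, stage)
auc_f2 = auc_lower_in_disease(param, stage >= 2)  # significant fibrosis
auc_f4 = auc_lower_in_disease(param, stage == 4)  # cirrhosis
overall = pairwise_stage_auc(param, stage)
groups = [param[stage <= 1], param[(stage >= 2) & (stage <= 3)], param[stage == 4]]
p_anova, p_pairs = anova_bonferroni(groups)
print(rho, auc_f2, auc_f4, overall, p_anova, p_pairs)
```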

All statistical analyses were performed with SAS (version 9.4; SAS Institute, Cary, NC, USA) and R (version 3.3.2). P values less than 0.05 were considered statistically significant.

Results

Patient characteristics

The numbers of patients at each HF stage were as follows: F0, 30 (31.6%); F1, 14 (14.7%); F2, 4 (4.2%); F3, 18 (18.9%); and F4, 29 (30.5%). For hepatic steatosis, 79 patients had no fatty liver (S0) and 16 had mild fatty liver (S1). Of the 95 patients, 65 had underlying chronic liver disease, with the following aetiologies: hepatitis B virus (n = 44), alcoholic liver disease (n = 14) and hepatitis C virus (n = 7). Malignant focal hepatic lesions were identified in 84 patients: HCC in 46, metastasis in 37 (from colorectum, 33; stomach, 2; ovary, 2) and adult-type hepatoblastoma in one.

Staging hepatic fibrosis

Table 1 shows the values of the DWI parameters according to HF stage. All parameters tended to decrease as HF stage progressed (Fig. 2). All DWI parameters except ADC and Dt differed significantly among the F0–1, F2–3 and F4 groups (p < 0.001). Among these parameters, DDC# was the only one to differ significantly both between F0–1 and F2–3 and between F2–3 and F4 (p = 0.029 and p = 0.003, respectively). Dp, α and α# differed significantly between F0–1 and F2–3 (p < 0.05), but not between F2–3 and F4. Figures 3 and 4 show signal decay curves and DWI parametric maps of a non-fibrotic liver and a cirrhotic liver.

Table 1 Values of DWI parameters according to the hepatic fibrosis stage
Fig. 2 Box plots of DWI parameters according to fibrosis stage (F0–1, F2–3 and F4). All of the parameters except ADC and Dt showed a tendency to decrease as the fibrosis stage increased. DDC# was the only parameter to show significant differences both between F0–1 and F2–3 and between F2–3 and F4 (p = 0.029 and p = 0.003, respectively)

Fig. 3 Non-fibrotic liver (F = 0) in a 62-year-old man. (a) Logarithmic plots of signal decay obtained with the mono-exponential model, the bi-exponential model, and the six-b-value and nine-b-value sets of the stretched exponential model. The two stretched exponential curves (red and green) almost completely overlap. (b) Native image obtained with a b value of 0 s/mm². Parametric maps of (c) ADC, (d) Dt, (e) f, (f) Dp, (g) DDC and (h) α

Fig. 4 Liver cirrhosis (F = 4) in a 51-year-old woman. (a) Logarithmic plots of signal decay obtained with the mono-exponential model, the bi-exponential model, and the six-b-value and nine-b-value sets of the stretched exponential model. Again, the two stretched exponential curves (red and green) are nearly identical. (b) Native image acquired with a b value of 0 s/mm² shows hepatic cysts (arrows) in the left lobe. Parametric maps of (c) ADC, (d) Dt, (e) f, (f) Dp, (g) DDC and (h) α

All DWI parameters except Dt showed significant correlation with HF stage. The negative correlation was strongest for DDC# (ρ = –0.609, p < 0.001), followed by α# (ρ = –0.585, p < 0.001), DDC (ρ = –0.576, p < 0.001), Dp (ρ = –0.507, p < 0.001), α (ρ = –0.438, p < 0.001), f (ρ = –0.383, p < 0.001) and ADC (ρ = –0.222, p = 0.031).

Diagnostic performance of DWI parameters for staging hepatic fibrosis

The Obuchowski measure of the diffusion parameters for HF staging was highest for DDC# (0.770 ± 0.03), followed by α# (0.768 ± 0.04), DDC (0.748 ± 0.03), Dp (0.728 ± 0.04), α (0.692 ± 0.04), f (0.669 ± 0.04), ADC (0.597 ± 0.05) and Dt (0.575 ± 0.05) (Table 2). DDC# performed significantly better than ADC (p < 0.001), Dt (p < 0.001) and f (p = 0.035), but did not differ significantly from the other parameters (DDC# vs Dp, p = 0.375; DDC# vs DDC, p = 0.245; DDC# vs α, p = 0.106; DDC# vs α#, p = 0.955).

Table 2 Overall diagnostic accuracy of DWI parameters in hepatic fibrosis staging

Table 3 summarises the diagnostic performance of the DWI parameters for diagnosing significant fibrosis (F ≥ 2) and cirrhosis (F = 4) on ROC curve analysis. For significant fibrosis (≥F2), DDC# had a significantly higher AUC than ADC, Dt and f (p < 0.001, p < 0.001 and p = 0.012, respectively), but did not differ significantly from the other parameters (DDC# vs Dp, p = 0.659; DDC# vs DDC, p = 0.369; DDC# vs α, p = 0.562; DDC# vs α#, p = 0.571).

Table 3 Diagnostic performances of DWI parameters in diagnosing significant fibrosis (≥F2) and cirrhosis (=F4)

The AUC for differentiating F0–3 from F4 (cirrhosis) was highest for DDC# (0.837, 0.746–0.928), followed by DDC (0.801, 0.709–0.894), α# (0.770, 0.658–0.882) and Dp (0.766, 0.658–0.875). DDC# also showed a significantly higher AUC than ADC (p = 0.007) and Dt (p < 0.001), whereas its AUC did not differ significantly from those of f (p = 0.164), Dp (p = 0.351), DDC (p = 0.234), α# (p = 0.335) and α (p = 0.082).

The variability of the DWI parameters across the three ROI locations in the liver was evaluated using CoV. The CoV values were as follows: ADC, 9.4% (95% CI, 8.2–10.6); Dt, 19.0% (16.5–21.7); f, 23.8% (20.9–27.1); Dp, 74.6% (63.9–84.2); DDC, 16.1% (14.1–18.4); DDC#, 16.8% (14.6–19.3); α, 15.1% (13.3–16.9); α#, 18.4% (16.1–20.8). CoVs were compared among three parameters with good diagnostic performance for HF staging: DDC#, α# and Dp. CoV differed significantly between DDC# and Dp (p < 0.001) and between α# and Dp (p < 0.001), but not between DDC# and α# (p = 0.735).

Discussion

This study showed that DWI parameters from the stretched exponential model had greater diagnostic potential than ADC and Dt for diagnosing HF. The stretched exponential parameters and Dp showed comparable overall diagnostic performance for HF staging. However, DDC# and α# are more promising tools for assessing HF stage, because their intra-individual variability is lower than that of Dp and because they can be obtained accurately with fewer b-value acquisitions, which enables a shorter scan time.

Several previous studies have evaluated the diagnostic ability of DWI for HF [12, 13, 19, 30,31,32]. Because HF is associated with increased connective tissue, its progression restricts Brownian motion of water within the liver [33]. In addition, collagen deposition increases hepatic resistance to portal blood flow, which decreases portal perfusion [33]. These changes contribute to decreases in DWI parameter values from the mono-exponential and bi-exponential models. Previous IVIM studies have shown that Dp is significantly lower in fibrotic than in non-fibrotic liver and that Dp shows better diagnostic performance than Dt or ADC [12, 13, 34]. Our results agree with these findings: apart from the stretched exponential parameters, Dp showed the best diagnostic performance for diagnosing significant fibrosis and cirrhosis.

A stretched exponential model characterises the non-Gaussian behaviour of molecular diffusion and reflects the degree of intravoxel heterogeneity of biological tissue [14]. In our study, all parameters from the stretched exponential model showed significant negative correlation with HF stage. Overall diagnostic performance for HF staging was highest for DDC#, followed by α#, and DDC# performed significantly better than ADC and Dt. DDC, which reflects a continuous distribution of diffusion coefficients across diffusion compartments, also performed better than conventional ADC or Dt for HF staging in our study. The value of α decreased as HF progressed. This decrease can be explained by increased intravoxel diffusion heterogeneity in the fibrotic liver, possibly reflecting greater histological heterogeneity than in the non-fibrotic liver. Regarding the diagnosis of HF with stretched exponential DWI, one preliminary study using an ex vivo mouse model has been published [15]. In that study, DDC showed strong correlation with histopathological HF (ρ = –0.72; 95% CI, –0.83 to –0.53), whereas α did not correlate significantly with HF (ρ = –0.20; 95% CI, –0.55 to 0.19). This discordance regarding α may be attributable to the different study subjects, i.e. an ex vivo animal model versus in vivo human patients. A larger cohort study is needed to establish the relationship between α and HF.

Dp from the bi-exponential model showed overall diagnostic performance for HF staging comparable to that of the stretched exponential parameters. However, the considerable intra-individual measurement variability of Dp limits its clinical utility. Poor measurement reproducibility is a critical limitation of perfusion-sensitive IVIM parameters such as Dp and f [33, 35, 36]. In addition, IVIM-derived parameters depend substantially on the number of b values acquired [37, 38]. Because including more b values improves the accuracy of IVIM-derived parameters, some authors have recommended at least 11 b values for IVIM [38]; however, acquiring more b values lengthens scan time. For the stretched exponential model, parameters from the six-b-value set (DDC# and α#) performed comparably to those from the nine-b-value set (DDC and α). Therefore, stretched exponential DWI parameters may be superior to Dp for HF staging owing to better measurement reproducibility and potentially shorter acquisition time.

Besides DWI, US elastography and MR elastography are clinically available for diagnosing and staging HF [4, 6, 8, 39]. US elastography techniques, including transient elastography (TE), are the most widely used in clinical practice because they are rapid and easy to perform. However, US elastography is operator-dependent, and its measurements are affected by many factors, such as ascites, obesity and narrow intercostal spaces [4]. MR elastography can evaluate fibrosis across the whole liver and has been reported to be superior to DWI for HF staging [8, 32]. Despite these advantages, MR elastography is not widely available because it requires dedicated scanner equipment and commercially available software [6]. In contrast, DWI can easily be included in standard liver MR protocols. However, DWI parameters can be substantially influenced by technical factors such as b values and acquisition methods [6, 28]. Hence, poor reproducibility and the lack of clear cut-off values between HF stages limit the clinical use of DWI parameters for quantitative assessment and monitoring of HF. The main applications of non-invasive methods for HF are the follow-up of treated patients and the detection of HF in patients without a specific clinical history. However, whether any non-invasive method, including DWI, can replace biopsy in the initial work-up of HF remains uncertain, which is a potential limitation of all non-invasive methods.

This study has some limitations. First, the retrospective design introduces inherent selection bias, and our study population included relatively few patients with intermediate HF stages (only four patients with F2). Second, based on histological review, we excluded patients with moderate to severe fatty liver, iron deposition or chemotherapy-related sinusoidal obstruction; the possible confounding effects of these features on HF staging could not be assessed because too few patients were excluded for analysis. Third, the locations of ROI measurements and pathological sampling may not have matched. Because fibrosis may be heterogeneously distributed within the liver, this is a potential source of bias in interpreting our results. Fourth, the interval between MRI and pathological confirmation was relatively long (mean, 81.2 days). Fifth, the free-breathing technique used in this study could have affected the accuracy and measurement variability of the DWI values. Finally, we did not compare all possible b-value combinations for the stretched exponential model, because optimising b-value selection was not the primary purpose of this study.

In conclusion, stretched exponential DWI is a promising method for the diagnosis of HF. DDC and α obtained from fewer b-value acquisitions can be superior to Dp as biomarkers for HF staging because of their lower intra-individual measurement variability and shorter acquisition time.