Introduction

Fat and water signals can be separated with various magnetic resonance imaging (MRI) methods [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]. Generally, water and fat separation utilizes two physical effects: differences in relaxation [19] and differences in chemical shift [20]. Currently, the most widely used MRI methods for proton density fat fraction (PDFF [20, 21]) assessment are Dixon methods [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15], which exploit the chemical shift effect. Dixon methods do not require any special modifications of MRI sequences, and are fast and easy to implement on MR systems. However, the spectrum of fat is nontrivial [22,23,24]; therefore, to achieve highly accurate reconstruction and subsequent PDFF quantification, a multi-echo approach with a multi-peak signal model is required [12,13,14,15]. A further crucial requirement for correct water–fat signal separation during data reconstruction in these methods is the estimation of the B0 field map. The importance of both prerequisites for precise fat fraction quantification increases with decreasing fat fraction (FF). The clinical relevance of a rather low level of tissue fat is evidenced by the fact that a 5% value for an MR-based PDFF was suggested as the cut-off between histologically proven mild and moderate hepatic steatosis [25, 26]. Several studies have focused on the comparison of in vivo MRI and MR spectroscopy (MRS) measurements [27,28,29,30,31], predominantly for the range of higher proton density fat fractions (PDFF > 5%). Other studies have reported the linearity, bias, and precision of hepatic MRI–PDFF versus MRS–PDFF across field strengths, methods, and manufacturers [32]. Furthermore, possible sources of error in MRI–PDFF quantification based on multi-echo magnitude images (T1 and T2* relaxation, number of echoes, Gaussian decay, bi-exponential decay, number of double bonds (ndb), noise bias, and water frequency) have been investigated [33].

Currently, several software tools are available for Dixon MRI-based fat quantification. All main MRI system manufacturers provide their own embedded software tools for fat quantification; moreover, several toolboxes in the Matlab® environment have been proposed. The quantification of a low-level (< 5%) fat fraction requires an advanced signal model and optimized sequence parameters to reach maximal FF estimation accuracy.

In this work, we aimed to compare three available software tools, namely the non-commercial FatWater12 Matlab® toolbox [34] and Fatty Riot [35], and the commercial Jim software v8 (Xinapse Systems Ltd, West Bergholt, UK), and to evaluate the accuracy of low-level fat fraction estimation in phantoms and in healthy volunteers. The purpose of this study was to evaluate the MRI–PDFF estimation accuracy of the tested software packages and to determine a solution that is robust against water/fat swaps and computationally efficient.

Theory

Reliable water–fat separation necessitates the use of a multi-peak signal model. With prior knowledge of the multi-peak fat spectrum (spectral positions and relative amplitudes of individual fat peaks), a more precise water (w) and fat (f) separation can be achieved [12]. Thus, the signal s(TEn) of one image voxel for the nth echo time (TEn) can be described as follows [11]:

$$s\left( {{\text{TE}}_{n} } \right) = \left( {w + f \cdot \sum\limits_{{p = 1}}^{P} {\alpha _{p} } \cdot e^{{j\left( {2\pi \cdot f_{p} \cdot {\text{TE}}_{n} } \right)}} } \right) \cdot e^{{j\left( {2\pi \cdot {\text{TE}}_{n} \cdot \left( {\psi + j\frac{1}{{T_{2}^{*} }}} \right)} \right)}} ,$$

where \({\alpha }_{p}\) and \({f}_{p}\) [Hz] define the relative amplitude and frequency of the pth spectral fat component, respectively. The frequency ψ [Hz] represents the B0 field map. This general expression can be considered an advanced signal model that expresses the water–fat signal behavior during acquisition. Its generality is limited by the assumption that all water and fat peaks exhibit identical T2* relaxation (which is justified by the fact that, in the supposed tissues of interest, such as the liver, the coherence loss rate, r*, due to microscopic field inhomogeneity, dominates over the natural relaxation rate, r2, so 1/T2* = r2 + r* ≈ r*).
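To make the roles of the individual parameters concrete, the signal model can be simulated for a single voxel. The following is a minimal sketch, not the implementation used by any of the tested tools. The function name `multipeak_signal`, the two-peak fat spectrum and all numerical values are hypothetical placeholders (they are not the values of Table 1), and the decay is applied in the conventional form exp(−TEn/T2*), which is an assumption about how the complex field-map term of the equation is to be read.

```python
import numpy as np

def multipeak_signal(te_s, w, f, fat_freqs_hz, fat_amps, psi_hz, t2star_s):
    """Simulate one voxel's complex signal at the echo times te_s (seconds)."""
    te = np.asarray(te_s, dtype=float)
    freqs = np.asarray(fat_freqs_hz, dtype=float)
    amps = np.asarray(fat_amps, dtype=float)
    # Fat dephasing: sum over peaks of alpha_p * exp(j*2*pi*f_p*TE_n)
    fat_term = np.exp(2j * np.pi * np.outer(te, freqs)) @ amps
    # Field-map phase and T2* decay shared by water and all fat peaks
    # (decay written as exp(-TE/T2*): an assumption, see the lead-in)
    common = np.exp(2j * np.pi * psi_hz * te) * np.exp(-te / t2star_s)
    return (w + f * fat_term) * common

# Hypothetical two-peak fat spectrum; placeholder values, not Table 1
te = 1.23e-3 + 1.31e-3 * np.arange(6)  # six echoes, 1.31 ms spacing
s = multipeak_signal(te, w=0.95, f=0.05,
                     fat_freqs_hz=[-440.0, -330.0], fat_amps=[0.8, 0.2],
                     psi_hz=20.0, t2star_s=0.025)
```

With relative amplitudes that sum to one, the signal at TE = 0 reduces to w + f, so the fat fraction implied by the simulation is f/(w + f).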

The unknown parameters w, f, ψ and T2* can be estimated by several approaches. A common and important problem for most “Dixon-based” methods (e.g., [8, 9, 11, 12, 36]) is the estimation of the B0 field map, which is estimated from the acquired echo images simultaneously with other parameters. For a single-peak fat model, several approaches to the estimation of the field map have been applied [3, 5]. Regardless of the calculation details, the single-peak fat signal model does not provide the required accuracy of PDFF. To address this problem, An and Xiang [37] employed multi-frequency spectrum modeling and used nonlinear least squares estimation for water and fat decomposition. The field map estimation robustness can be improved by the region-growing (RG) algorithm [38,39,40], which utilizes the similarity of the static field B0 between neighboring voxels. Generally, RG may fail in low-SNR image regions, in which the phase unwrapping process is unsuccessful. The RG algorithm can also fail in discontinuous regions. Another approach imposes a smoothness constraint on the estimated field map. This can be accomplished by spatial low-pass filtering of the final field map [8], or by explicit smoothness terms within the optimization procedure [13,14,15]. The use of arbitrary echo times (phase shifts) has its own specifics: with an iterative, descent-based algorithm, finding the global minimum cannot be guaranteed, because the maximum-likelihood cost function has multiple local optima [13]. To overcome several limitations of descent-based algorithms, the variable-projection method (VARPRO) [13] was proposed, which provides a globally optimal solution to the nonconvex nonlinear least squares optimization problem. Later, the same authors introduced a more advanced graph-cut algorithm [14] that generally improves the joint estimation of the water/fat images and the field map. 
The last and most widely used approach is a multi-resolution method, which gradually seeks a solution in a coarse-to-fine manner [15, 40, 41]. The appeal of acceleration methods, such as compressed sensing, keeps growing; therefore, Sharma et al. developed a water–fat compressed sensing reconstruction using restricted subspace field map estimation (RSFE) [41]. In this method, the field map is directly estimated from undersampled k-space measurements.
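The variable-projection idea mentioned above can be illustrated for a single voxel: for every candidate field-map value, the linear parameters (w and f) are obtained by linear least squares, and the candidate with the smallest residual is retained, so the search does not depend on a starting point. The sketch below is a simplified illustration under stated assumptions and is not the algorithm of [13]: it ignores T2* decay and spatial regularization, uses an exhaustive grid over ψ, and the function name `varpro_voxel`, the single fat peak at −440 Hz and all numerical values are hypothetical.

```python
import numpy as np

def varpro_voxel(signal, te, fat_freqs, fat_amps, psi_grid):
    """Grid-search psi; for each candidate solve (w, f) by linear least squares."""
    # Fat dephasing term: sum over peaks of alpha_p * exp(j*2*pi*f_p*TE_n)
    fat_term = np.exp(2j * np.pi * np.outer(te, fat_freqs)) @ fat_amps
    A = np.column_stack([np.ones_like(fat_term), fat_term])  # columns: water, fat
    best = None
    for psi in psi_grid:
        # Remove the candidate field-map phase, then fit w and f linearly
        demod = signal * np.exp(-2j * np.pi * psi * te)
        coef, *_ = np.linalg.lstsq(A, demod, rcond=None)
        resid = np.linalg.norm(demod - A @ coef)
        if best is None or resid < best[0]:
            best = (resid, psi, coef)
    return best[1], best[2]

# Noise-free synthetic voxel (all values hypothetical)
te = 1.23e-3 + 1.31e-3 * np.arange(6)
fat_freqs = np.array([-440.0])  # single fat peak, Hz
fat_amps = np.array([1.0])
true_w, true_f, true_psi = 0.9, 0.1, 50.0
fat_term = np.exp(2j * np.pi * np.outer(te, fat_freqs)) @ fat_amps
signal = (true_w + true_f * fat_term) * np.exp(2j * np.pi * true_psi * te)

psi_hat, (w_hat, f_hat) = varpro_voxel(signal, te, fat_freqs, fat_amps,
                                       np.arange(-300.0, 301.0, 1.0))
pdff_hat = abs(f_hat) / (abs(w_hat) + abs(f_hat))
```

The search range is deliberately kept narrower than 1/ΔTE (about 763 Hz for the 1.31 ms spacing), because with equidistant echoes the residual is periodic in ψ with that period.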

The widely popular approach [9] introduced by Reeder et al. performs reconstruction of water, fat, and field maps independently at each voxel by an iterative nonlinear least squares fitting procedure. The iterative decomposition works for arbitrary echo times and results in the maximum-likelihood water/fat decomposition. The significant drawback of the original method [9] is an implicit assumption that the field inhomogeneity is moderate, which, in the presence of large field inhomogeneity, leads to inaccurate water/fat estimation. Several extensions of the original algorithm focused on this problem. One of the possible extensions has been published by Yu et al. [39], who performed field map estimation with an RG scheme and improved immunity to field map inhomogeneity. Another interesting solution is the multi-resolution method, Hierarchical IDEAL [36], which helps manage the selection of the correct decomposition at each voxel. This method was subsequently generalized for arbitrary echo times and two or more species (water, fat, silicone oil, etc.) [15]. The combination of two approaches, RG and multi-resolution, was introduced by Lu and Hargreaves [40]. The field map estimate is refined and propagated to increasingly finer resolutions in an efficient manner until the full-resolution field map is obtained for final water–fat separation.

Methods

Two different methodological approaches are compared: MRI vs MRS measurements. Because of many possible effects (partial volume effect, volume inhomogeneity, excitation profile, etc.), establishing a correlation between MRS and MRI measurements can be hampered in vivo, especially in the case of low fat fractions. Therefore, a dedicated phantom was used to minimize these in vivo effects. In this study, we performed (i) phantom and (ii) in vivo measurements.

The data for quantitative analysis were acquired on a 3 T MR system (Siemens Trio, Siemens Healthineers, Erlangen, Germany) with phased-array abdominal and spinal receiver coils in combination with a whole-body transmit coil.

Phantoms

Two multi-compartment phantoms were prepared (small and large), with compartments filled with various fat concentrations prepared by mixing Intralipid® 20% emulsion with saline (0.9% NaCl) in pre-defined ratios.

The small phantom (phantom 1) consisted of eight vials (20 ml syringes, ~ 20 mm inner diameter) tied together and submerged in saline. Two of the vials contained only saline and only the Intralipid® 20% emulsion (20% soybean oil, 1.2% egg yolk phospholipids, 2.25% glycerin, and water), respectively. The ground-truth (expected) PDFF value of 22.32% for the Intralipid® 20% emulsion was calculated from known chemical properties (no. of protons, molar mass, density) of the individual chemical components. The remaining six vials were filled with mixtures of saline and Intralipid® in defined ratios (v/v): 1/20, 2/20, 3/20, 4/20, 5/20 and 10/20 (PDFF ≈ 1.12, 2.23, 3.35, 4.46, 5.58, and 11.16% of fat).

The large phantom (phantom 2) contained several vials with different FF concentrations (mixtures of saline and the Intralipid® 20% emulsion). The purpose of the large phantom was to verify the robustness of the tested algorithms against strong B0 field inhomogeneity. The presence of several objects ensured that the B0 magnetic field was inhomogeneous and that the remaining air bubbles influenced the local magnetic field. The phantom was submerged completely in water to minimize the susceptibility effect on the air/water boundary.

MR image data were acquired with 3D- and 2D-SPGR (Spoiled Gradient Echo) [42] sequences with (a) monopolar/unipolar gradients (used for “small” phantom) and (b) bipolar readout gradients (used for “large” phantom), respectively. Parameters for SPGR were as follows.

(a) Field of view (FOV) = 31.5 cm × 31.5 cm, bandwidth (BW) = 1040 Hz/pixel, repetition time (TR) = 9.32 ms, flip angle (FA) = 3° (to minimize T1 effects), acquisition matrix size = 160 × 160 pixels, and 6 echo times (TE = 1.23, 2.54, 3.85, 5.16, 6.47, and 7.78 ms). The echo spacing of ΔTE = 1.31 ms was chosen to correspond to a phase shift θ of 7π/6 radians between water and the major fat-peak signal.

(b) FOV = 40 cm × 40 cm, BW = 1040 Hz/pixel, TR = 25 ms, FA = 5°, matrix size = 256 × 256 pixels, and 6 echo times (TE = 1.23, 2.39, 5.14, 9.07, 12.99, and 16.92 ms). The non-equidistant echo spacing and long echo times ensured difficult conditions for the tested reconstructions (occurrence of phase wraps for longer echo times).
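The relation between echo spacing and water–fat phase shift in protocol (a) can be checked with a short calculation. The sketch below only rearranges θ = 2π·Δf·ΔTE; the helper names are hypothetical, and the water–fat frequency offset Δf is not stated in the text, so it is derived here from the quoted ΔTE and θ rather than assumed.

```python
import math

def phase_shift_in_pi(delta_f_hz, delta_te_s):
    """Water-fat phase accumulated over one echo spacing, in units of pi."""
    return 2.0 * delta_f_hz * delta_te_s

def offset_for_phase(phase_rad, delta_te_s):
    """Frequency offset (Hz) that produces phase_rad over one echo spacing."""
    return phase_rad / (2.0 * math.pi * delta_te_s)

delta_te = 1.31e-3  # s, echo spacing of protocol (a)
implied_offset = offset_for_phase(7.0 * math.pi / 6.0, delta_te)  # about 445 Hz
check = phase_shift_in_pi(implied_offset, delta_te)               # 7/6
```

A phase shift of 7π/6 at ΔTE = 1.31 ms therefore corresponds to a water–fat offset of roughly 445 Hz for the major fat peak.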

In the case of the small phantom, spectroscopic data were acquired and evaluated by a HISTO (high-speed T2-corrected multi-echo) protocol [25, 43] (multi-echo single-voxel spectroscopy based on the STEAM sequence [44] including fat and R2 quantification supplied by the system manufacturer; TE = 12, 24, 36, 48, and 72 ms, TR = 2000 ms and a voxel size of 12 × 12 × 12 mm3). It should be noted that the fat percentage value provided by the HISTO protocol (Siemens) is only a part of the total fat fraction. The correct PDFF value can be obtained by multiplying the value provided by HISTO by a specific factor based on the signal model. This factor is determined as \(\sum_{p=1}^{P}{\alpha }_{p}/\sum_{h=1}^{H}{\alpha }_{h}\), where \({\alpha }_{h}\) is the relative amplitude of the hth spectral fat component in a frequency range from ≈ 0.6 ppm to ≈ 3 ppm.
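The correction factor described above is a ratio of summed relative amplitudes and can be sketched as follows. The peak list used here is a hypothetical four-peak example and does not reproduce the signal models of Table 1; the function name `histo_correction_factor` is likewise an illustrative assumption.

```python
def histo_correction_factor(amps, ppm, lo=0.6, hi=3.0):
    """Sum of all fat-peak amplitudes divided by the sum of those within lo..hi ppm."""
    total = sum(amps)
    in_range = sum(a for a, p in zip(amps, ppm) if lo <= p <= hi)
    return total / in_range

# Hypothetical relative amplitudes and chemical shifts (ppm)
amps = [0.1, 0.6, 0.2, 0.1]
ppm = [0.9, 1.3, 2.1, 5.3]
factor = histo_correction_factor(amps, ppm)  # 1 / 0.9 for this toy spectrum
```

In this toy spectrum, 90% of the fat signal lies between 0.6 and 3 ppm, so the value reported for that range would be scaled by about 1.11.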

The small phantom data were processed by all three tested software tools to evaluate the accuracy of the PDFF–MRI quantification. The data from the large phantom were processed in the FatWater12 Matlab® toolbox to find a solution that is robust against water/fat swaps in the presence of large field inhomogeneity. In principle, the algorithms exploited by the Fatty Riot toolbox (GC) and the Jim 8.0 software (VARPRO, but without GC) are contained in the FatWater12 Matlab® toolbox; therefore, the large phantom data were processed by all algorithms for multi-echo (3+) data in this toolbox.

Model of fat

In general, the use of Intralipid requires the application of a corresponding Intralipid spectral signal model in the calculation; application of the in vivo [34] fat signal model would lead to incorrect fat fraction estimation. For this purpose, the spectrum of Intralipid was acquired on 3 T and 9.4 T (Bruker Biospec 94/30 USR, Billerica, MA, USA) NMR systems by the STEAM-CHESS (12 × 12 × 12 mm3 voxel size, TE = 20, 30, 40, 50, 70, and 90 ms, TM = 10 ms, NA = 4, TR = 5000 ms, water suppression BW of 50 Hz) and STEAM-VAPOR (2 × 2 × 2 mm3 voxel size, TE = 5–95 ms with step of ΔTE = 5 ms, TM = 6.8 ms, NA = 100, TR = 5000 ms, water suppression BW of 120 Hz) sequences, respectively (Fig. 1). The Intralipid spectra were measured at room temperature. The measured 9.4 T spectra were fitted in jMRUI (AMARES), and by extrapolating the TE dependence of each individually estimated component to TE = 0 in Matlab® (Fig. 1b) the signal model of Intralipid® 20% emulsion was obtained.
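The extrapolation of a fitted component to TE = 0 can be sketched as a log-linear fit. This is a minimal illustration under the assumption of mono-exponential decay for each component; the text does not specify the decay model that was used, and the function name and synthetic values are hypothetical.

```python
import numpy as np

def amplitude_at_te0(te_ms, amplitudes):
    """Fit ln(A) = ln(A0) - TE/T2 and return (A0, T2 in ms)."""
    slope, intercept = np.polyfit(te_ms, np.log(amplitudes), 1)
    return float(np.exp(intercept)), float(-1.0 / slope)

# Synthetic component: A0 = 5 (arbitrary units), T2 = 60 ms
te_ms = np.arange(5.0, 100.0, 5.0)  # TE = 5 to 95 ms in steps of 5 ms
amps = 5.0 * np.exp(-te_ms / 60.0)
a0, t2 = amplitude_at_te0(te_ms, amps)
```

Normalizing the extrapolated amplitudes of all components by their sum would then give relative amplitudes of the kind used in a multi-peak signal model.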

Fig. 1

a The Intralipid spectra at 9.4 T (STEAM-VAPOR sequence, 2 × 2 × 2 mm3 voxel size, TE = 5–95 ms, ΔTE = 5 ms, TM = 6.8 ms, NA = 100, TR = 5000 ms) and b fitted components; c comparison of 3 T and 9.4 T spectra; d the Intralipid spectra at 3 T (STEAM-CHESS sequence, 12 × 12 × 12 mm3 voxel size, TE = 20, 30, 40, 50, 70 and 90 ms, TM = 10 ms, NA = 4, TR = 5000 ms)

Healthy subjects

Seven young healthy volunteers (3 female/4 male; age, 32.4 ± 3.1 years; body mass index, 22.6 ± 2.2 kg m−2; mean ± SD) participated in this study, which was approved by the local ethics committee. The volunteers were measured using a 3D-SPGR sequence with unipolar gradients and the following parameters: TR = 9.32 ms; TE = 1.23, 2.54, 3.85, 5.16, 6.47 and 7.78 ms; voxel size 1.2 × 1.2 × 3.5 mm3; FA = 3°; CAIPIRINHA with an acceleration factor of R = 2 × 2; 48 slices; and in-plane matrix size = 112 × 160 px (interpolated to 224 × 320 px). The acquisition time (TA) was 6.9 s. MRS was performed and evaluated by the HISTO protocol, with TR = 3000 ms, 5 echo times (TE = 12, 24, 36, 48 and 72 ms), and a voxel size of 20 × 20 × 20 mm3. Both in vivo MRI and MRS measurements were performed at exhalation.

In fact, there is no standardized way to quantify the severity of water/fat swap artifacts in the calculated maps. In phantoms, the w/f swap artifacts were assessed based on the total swap area as detected by segmentation (edge filter, threshold, etc.) in the calculated PDFF maps; this was possible thanks to the prior knowledge of the phantom. In the case of in vivo measurements, the number of slices containing any water/fat swaps was evaluated.
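One simple way to express a swap area as a percentage of the field of view is sketched below. It is only an illustration of the idea and not the segmentation procedure used in this study: it assumes that a mask of regions known to contain water is available from prior knowledge of the phantom and that a swapped voxel shows up there with a PDFF above a fixed threshold. The function name and the 50% threshold are hypothetical.

```python
import numpy as np

def swap_area_percent(pdff_map, water_mask, threshold=50.0):
    """Percentage of the full FOV where a known-water voxel exceeds the threshold."""
    pdff = np.asarray(pdff_map, dtype=float)
    mask = np.asarray(water_mask, dtype=bool)
    swapped = mask & (pdff > threshold)
    return 100.0 * swapped.sum() / pdff.size

# Toy 10 x 10 map: water everywhere, five voxels wrongly reconstructed as fat
pdff = np.zeros((10, 10))
pdff[0, :5] = 100.0
area = swap_area_percent(pdff, np.ones((10, 10), dtype=bool))  # 5% of the FOV
```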

Software

MRI data were processed in the FatWater12 Matlab® toolbox [34], Xinapse Systems’ Jim software v8 and the Fatty Riot Matlab® toolbox [35]. Calculations with the FatWater12 toolbox were performed using four different approaches: Hierarchical IDEAL [15]; Region-Growing method [39]; Graph-Cut (GC) [14]; and Restricted Subspace Field Map Estimation (RSFE) [41]. The Fatty Riot toolbox exploited the GC [14] approach in our case. The tested FatWater12 and Fatty Riot toolboxes use the same implementation of the GC algorithm. Jim software v8 uses the VARPRO method [13], and we used a hybrid complex/magnitude-fitting option as a final step [45]. In all cases, complex (real + imag.) data were used for reconstruction. All algorithms were applied with the default values of their parameters; only in the GC setup (in vivo case only) was the “maximum R2*” changed to 250 and the “optimization transfer descent flag” changed to 1. No R2* estimation is included in the RSFE and RG approaches. In the other tested algorithms, R2* estimation is provided. In the GC approach, both options (with or without R2* estimation) are available and the default option is without R2* (the minimum and maximum values are both “0”).

All data and statistical analyses were performed in MATLAB (MathWorks, Natick, MA, USA). Workstation configuration: Intel® Core™ i7-7700HQ CPU @ 2.80 GHz, 24 GB RAM 2400 MHz. Furthermore, the tested algorithms were also evaluated in terms of computational efficiency (expressed by computational time).

Results

Model of fat

Unlike the 3 T spectra (Fig. 1d), the acquired 9.4 T spectra (Fig. 1a) enabled differentiation of individual spectral components (frequencies and relative amplitudes). The acquired 3 T spectrum was compared with apodized (to reach the same FWHM of the main fat peak as in the 3 T spectrum) 9.4 T spectra (Fig. 1c). It is clear that spectral component no. 1 differed in amplitude for the 3 T and 9.4 T fields. Spectral components 2 and 3 are not visible in the 3 T spectra due to the presence of a residual water signal. The main fat peak (11) overlapped with peak 10. The signal model of the Intralipid® 20% emulsion for the 3 T field was derived from the 9.4 T signal model, taking these differences into account. For further calculations in the phantom and in vivo in the liver, we used an 11-peak and a 9-peak model, respectively. The details of both models are displayed in Table 1.

Table 1 The Intralipid (spectrum in Fig. 1) and in vivo [34] signal model parameters

In vitro (phantom) measurements

In all experiments, we obtained multi-echo MR images and MR spectra of a quality that was satisfactory for further analysis. The effect of echo time on the MR images (magnitude and phase) of phantom 1 and phantom 2 is depicted in Fig. 2. In phantom 2, more phase wraps were visible for longer echo times due to B0 field inhomogeneity, as we expected. The effect of echo time on the MR spectra of three vials of phantom 1 containing 11.16, 5.58, and 3.35% of PDFF (predominantly T2 relaxation) is depicted in Fig. 3.

Fig. 2

MRI data used in water/fat decomposition (magnitude and phase) of phantom 1 and phantom 2. Phantom 1 contains vials with different ratios of Intralipid emulsion and saline. The red labels in phantom 1 show the expected (ground truth) fat fraction for each vial. In phantom 2, fast changes of the phase within the large field-of-view can be seen for longer echo times

Fig. 3

Examples of MR spectrum dependence on TE (from bottom to top: TE = 12, 24, 36, 48, and 72 ms) from phantom 1 of three vials with ground-truth PDFFs of a 11.16, b 5.58, and c 3.35%. The spectra were visualized in jMRUI software [46,47,48]

The results of the PDFF analysis by the different MRI approaches and by MRS are shown in Fig. 4. The Bland–Altman (BA) plots comparing (A) the ground truth and the average of MRI- and MRS-based estimates of PDFF and (B) the MRI- and MRS-based estimates of PDFF are shown in Fig. 5. Table 2 lists the linear regression slopes and the biases of the BA analysis for each algorithm/method separately.
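The bias and limits of agreement in a Bland–Altman analysis follow directly from the paired differences. The sketch below shows the standard calculation (mean difference ± 1.96 SD); the function name and the toy input values are hypothetical and are not measured data.

```python
import numpy as np

def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Toy paired values in % PDFF (placeholders)
bias, lower, upper = bland_altman([1.0, 2.0, 3.0], [0.0, 2.0, 4.0])
```

For these toy values the differences are 1, 0 and −1, giving a bias of 0 and limits of agreement of ∓1.96.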

Fig. 4

Phantom: correlation between a ground-truth and PDFFs (MRI and MRS); b MRS and MRI measurements. The right panel is an enlarged portion of the left panel displaying the region of fat content below 7%. ** T2* estimate was included

Fig. 5

Bland–Altman plot for PDFF of ground-truth values, MRS, and MRI measurements, showing the limits of agreement (dotted lines) at – 1.96 SD and + 1.96 SD around the mean difference. a Comparison of the ground-truth and the average of MRI- and MRS-based estimates of PDFF. b Comparison of the MRI- and MRS-based estimates of PDFF. The red lines represent the mean of the differences (bias)

Table 2 Linear regression slopes and the mean of differences (biases) for BA analysis in phantom 1

The robustness of algorithms (included in the FatWater12 toolbox) to considerable field inhomogeneity was tested with the second phantom (Fig. 6). Reconstruction almost without water/fat swaps over the full FOV was achieved only by the GC algorithm (1.71% of full FOV); the other reconstructions were affected by water/fat swaps in relatively large image regions: 10.53, 12.18, and 36.64% of full FOV for Hierarchical IDEAL, RSFE, and RG, respectively. Moreover, the reconstructed PDFF maps show that the estimated fractions were inaccurate/incorrect, as can be seen in the water regions (Fig. 6).

Fig. 6

Fat fraction maps of the large phantom calculated in the FatWater12 toolbox. Test of robustness to field inhomogeneity: a Restricted Subspace Field Map Estimation [12.18%], b Region-Growing [36.64%], c Hierarchical IDEAL [10.53%], and d Graph-Cut [1.71%]. The red square indicates the position of vials containing an Intralipid emulsion. The color bar shows the scale of PDFF in percentage (0–100). The numbers in square brackets show the sizes of regions (in percentage) affected by water/fat swaps within the full field-of-view

In vivo measurements of hepatic fat fraction

Examples of spectra acquired from subjects with the highest and the lowest fat fraction are shown in Fig. 7. Several individual fat peaks were clearly visible in the subject with the highest liver fat content; in the subject with the lowest PDFF, the distinction of individual fat peaks was hampered by the lower SNR. The relative amplitudes of the fat peaks (CH=CH) at ≈ 5.19 and 5.29 ppm were low with respect to the main fat peak (CH2) at ≈ 1.3 ppm. In addition, the spectral broadening due to field inhomogeneity within the excited voxel contributed to the spectral overlap of this peak with that of water. The MRI-derived PDFF maps, calculated with different decomposition algorithms from one subject for slices that intersected the voxel excited in MRS measurements, are shown in Fig. 8. The position of the MRS voxel in the liver is marked by a small bold square in A16. The white-bordered PDFF maps in the top-right corner of each PDFF map of the liver represent the magnified VOI in slices that intersected the volume covered by MRS measurements. The size of the VOI evaluated in the MRI measurements was 21.6 × 21.6 × 25.2 mm3 (≈ 30% more than the MRS voxel volume). The full-width-at-half maximum (FWHM) of the water peak ranged from 17 to 25 Hz for all subjects. The zoomed VOIs in the calculated PDFF maps showed slight differences between the algorithms. The PDFF values within the VOI were in the range from 0 to ≈ 7.2%. The distribution of the MRI–PDFF within the VOI for each subject is clearly visible in the whisker diagram plotted in Fig. 9. The statistical characteristics (inter-quartile range, skewness, and kurtosis) of the measured MRI–PDFF distributions within the VOI are shown in Table A1 (Appendix section). The results of all in vivo MRI and MRS measurements are compared in Fig. 10. The bar chart (Fig. 10a) shows the means of the calculated PDFF within the VOI for each subject and reconstruction. The correlation analysis (Fig. 10b) shows agreement between MRS and MRI measurements; R squared ranged from 0.8069 (Jim 8.0) to 0.9552 (RG). In terms of computational efficiency, the most efficient solution was the Hierarchical IDEAL (2.31 s); however, a change of the hierarchical level (HL) parameter can alter the computational efficiency dramatically. We performed extra reconstructions for several HL values to show the effect of this crucial parameter on the reconstruction efficiency: for HL = 1, 4, 5, 6, 7, 10, and 15, the calculation times were 0.41, 0.62, 0.84, 1.27, 2.31, 29.29, and 653 s, respectively. In the other “open” solutions/algorithms tested, a change of the input parameters can also change the computational efficiency and robustness of the reconstruction; slightly different reconstruction parameters can be required for each dataset. The evaluation of the influence of parameter changes in the other approaches is beyond the scope of this work.
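The regression slope and the coefficient of determination used in the correlation analysis can be computed as sketched below. This is a generic least-squares example with hypothetical names and toy numbers; it is not the MATLAB code used in the study.

```python
import numpy as np

def slope_and_r2(x, y):
    """Least-squares line y = slope*x + intercept and the squared Pearson r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return float(slope), float(intercept), float(r ** 2)

# Toy data lying exactly on y = 2x + 1 (placeholders)
slope, intercept, r2 = slope_and_r2([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

For a simple linear fit with an intercept, the squared Pearson correlation coefficient equals the coefficient of determination, which is why `np.corrcoef` is sufficient here.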

Fig. 7

Examples of echo-time (TE) dependence of MRS spectra acquired by HISTO sequence (from bottom to top: TE = 12, 24, 36, 48, and 72 ms) from subjects with a ≈ 6.3% and b ≈ 2.9% of MRS–PDFF). Magnified spectra were visualized in jMRUI. Water and main fat peak are situated at 4.7 and 1.3 ppm, respectively

Fig. 8

Examples of percentage PDFF maps from subject S3 for slices that intersected the volume covered by the MRS measurement (position indicated in A16, shown zoomed in the top-right corner image inserts). Calculated by: a Hierarchical IDEAL {2.31 s}; b Graph-Cut {52.56 s}; c Region-Growing {25.39 s}; d RSFE {1278 s}; e Fatty Riot: Graph-Cut {75.74 s}; and f Jim software v8: VARPRO {69 s}. Water–fat swaps were only visible in the RG results (indicated by arrows in C16-21). The computational times of one slice (No. 21) for each approach are shown in the curly brackets

Fig. 9

Distributions of calculated MRI–PDFFs in percentages within the observed volume for individual subjects. The bottom and top of the box represent the lower (Q1) and upper (Q3) quartile (25th and 75th percentile), and the central red line indicates the median. The whiskers extending from the top and bottom of the box mark the upper and lower adjacent values (\(\pm\) 2.7σ ≈ 99.3% coverage), respectively. The red crosses denote outliers. The number of points (voxels in VOI) for each subject was 1944
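The quoted whisker extent of ±2.7σ with ≈ 99.3% coverage matches the common rule of placing whiskers 1.5 inter-quartile ranges beyond the quartiles, evaluated for normally distributed data. The short check below assumes that this rule was used for the plot, which the caption does not state explicitly.

```python
from statistics import NormalDist

nd = NormalDist()              # standard normal: mean 0, sigma 1
q3 = nd.inv_cdf(0.75)          # upper quartile, about 0.674 sigma
iqr = 2.0 * q3                 # inter-quartile range of a normal distribution
whisker = q3 + 1.5 * iqr       # upper whisker limit in units of sigma
coverage = nd.cdf(whisker) - nd.cdf(-whisker)  # fraction inside both whiskers
```

For a normal distribution, `whisker` evaluates to about 2.70σ and `coverage` to about 0.993.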

Fig. 10

In vivo measurements: a comparison of spectroscopic measurements with calculated fat fractions (mean values) for each subject; b correlation between MRS and individual MRI approaches (coefficient of determination); c Bland–Altman plot for PDFF of in vivo MRS and MRI measurements with limits of agreement (dotted lines) from − 1.96 SD to + 1.96 SD. The red line is the mean of the differences (bias)

Discussion

In our study, we compared three software tools (commercial and non-commercial) used for water/fat decomposition based on identically acquired multi-echo gradient-echo MR data. These software tools provide one or more sophisticated algorithms that can solve the field map estimation problem. The resulting MRI–PDFF data for each processing approach were compared to gold-standard MRS–PDFF results. We also assessed the computational efficiency of these approaches.

The results of phantom and in vivo measurements showed good correlation between MRI and MRS measurements.

The correlation analysis showed excellent agreement (R2 > 0.99) between the expected values and the MRI and MRS measurements for all tested solutions in phantom 1. The linear regression slopes were in agreement with the reference (ground truth); however, the Bland–Altman analysis showed that with Jim software v8 (VARPRO) the PDFF values were overestimated over the entire range. Most PDFF estimates were found to be within the 95% limits of agreement. In the case of Hierarchical IDEAL and Jim software v8, the PDFF value in the saline vial was overestimated, because T2* was included in the estimation process. In phantom 2, we tested the propensity of the algorithms included in the FatWater12 toolbox to water/fat swaps. The fat fraction map obtained with GC contained almost no water/fat swaps (1.71% of the full FOV); however, the fat fraction map obtained with RG contained several water/fat swaps spread over a large part of the field of view. In the other two approaches, the fat fraction maps also contained water/fat swaps.

The correlation between MRS and MRI in vivo measurements, especially in the abdominal region, can be influenced by several factors. The most critical are voluntary or involuntary movement of the patient and the partial volume effect. Therefore, the in vivo measurements were performed at exhalation, and the volume of interest evaluated in the MR images was compared with the MRS measurements. The fulfilment of these basic conditions led to a good match between the MRI and MRS water/fat methods. The whisker diagram shows that the medians and variances of the PDFF estimates were similar for the approaches employed in the FatWater12 toolbox, while the estimates provided by the remaining two tools differed; with Jim software v8 in particular, the fat fraction of several subjects was noticeably underestimated. As can be clearly seen in the bar chart (Fig. 10a), the MRS–PDFF values were substantially higher than the MRI–PDFF values in subjects S2, S4, S6, and S7 due to intra-voxel field inhomogeneity in the MRS voxel. The MRI approaches provided similar results overall, although tissue inhomogeneity and, more generally, in vivo measurement effects led to differences in the MRI–PDFF maps. As in the phantom measurements, the bias of the reconstruction algorithm in Jim software v8 was higher than that of the other tested algorithms. In addition, its linear regression slope was not in good agreement with the reference (MRS-HISTO). GC showed the fewest water/fat swaps in the phantom and in vivo tests and hence stands out as the most robust of the methods tested. It should be noted, however, that GC is a flexible solution that can be strongly influenced by the regularization parameter. Its choice, specific to each individual dataset, can help to identify the optimal solution that most efficiently avoids the water–fat swaps that may result from field inhomogeneity. The choice of the regularization parameter was described in the original paper [14]. 
The most computationally efficient solution, which also provided good robustness, was the Hierarchical IDEAL algorithm. However, Hierarchical IDEAL uses multi-resolution decomposition; therefore, if higher spatial resolution is required, a higher number of hierarchical levels must be chosen, which prolongs the computation time (see “Results” section).

As in the phantom measurements, the RG algorithm showed less robustness to water/fat swaps than the other tested algorithms for the acquired 3D MRI datasets. Water/fat swaps were observed in the data of all subjects with the RG algorithm and in the data of one subject with the Hierarchical IDEAL algorithm. All remaining tested algorithms reconstructed PDFF maps without water/fat swaps. Most of the tested MRI data-processing algorithms showed good reliability, accuracy, and robustness against field inhomogeneity. Jim software v8 (QFat package) provided the worst in vivo results; in this case, the fat fraction was noticeably underestimated in three of seven cases (subjects S1, S6, and S7) compared to the other MRI approaches (bar plot in Fig. 10a). In addition, the calculated PDFF value of one subject (S6) was outside the 95% limits of agreement, as can be seen in the Bland–Altman plot (Fig. 10c). The best agreement between MRI and MRS in vivo measurements was reached by the RG approach implemented in the FatWater12 toolbox. Although water–fat swaps were detected in this case as well, they were located outside the VOI (see Fig. 8). The Fatty Riot toolbox had a lower R squared than the algorithms in the FatWater12 toolbox, but in the case of subject 4, there was good agreement with the spectroscopic measurements (Fig. 10a). As in the phantom measurements, the linear regression slopes and the biases in the BA analysis were quantified (Table 3) for each algorithm. In this case, the MRS–PDFF values acquired by HISTO served as the reference.

Table 3 Linear regression slopes and the mean of differences (biases) for BA analysis in in vivo measurements

The multi-echo data acquisition in conjunction with complex evaluation, including the full signal model and fat spectral modelling (reference spectrum), made it possible to evaluate even a low fat fraction in the liver. The use of six echoes is a compromise between (A) the short data acquisition time required by breath-hold imaging and (B) accuracy.

Finally, we must be aware that the signal-to-noise ratio of the acquired data is a factor that influences the fat fraction quantification accuracy. In this respect, the combination of a whole-body transmit coil and a flexible multi-channel matrix, as well as multi-channel spine receive coils, and the employment of well-tuned data-acquisition protocols, yielded satisfactory results. All the tools under evaluation performed reconstruction offline.

The main limitation of this study is the fact that the tested software tools are not certified for clinical use, and therefore, the results cannot be used alone for diagnostic purposes. Furthermore, all processing tools worked offline, which forces data transfer and the use of external workstations. From the point of view of the measurement methods, coverage of the whole liver volume by MRS during one breath hold is currently impossible, unlike with conventional MRI (which requires the use of acceleration methods). However, the sensitivity of MRI approaches at low PDFF is limited by noise and by the T1 effect that occurs in gradient echo-based methods. Due to the different T1 relaxation of water (T1w) and fat (T1f), the two signals are weighted unequally, which leads to overestimation of PDFF. The quantification of PDFF < 1% in vivo is problematic and requires many echo measurements (\(\gg\) 6) and high SNR. An increasing number of acquired echo images leads to prolonged measurement times.
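The direction of the T1-related bias can be illustrated with the textbook steady-state signal equation of a spoiled gradient echo, S ∝ sin α (1 − E1)/(1 − E1 cos α) with E1 = exp(−TR/T1). The sketch below uses the TR and flip angle of the in vivo protocol, but the T1 values of water and fat are hypothetical placeholders chosen only for illustration, and the function names are assumptions.

```python
import math

def spgr_weight(t1_ms, tr_ms, flip_deg):
    """Steady-state SPGR signal per unit proton density (T2* decay ignored)."""
    e1 = math.exp(-tr_ms / t1_ms)
    a = math.radians(flip_deg)
    return math.sin(a) * (1.0 - e1) / (1.0 - e1 * math.cos(a))

def apparent_pdff(true_pdff, t1_water_ms, t1_fat_ms, tr_ms, flip_deg):
    """Fat fraction computed from T1-weighted water and fat signals."""
    sw = (1.0 - true_pdff) * spgr_weight(t1_water_ms, tr_ms, flip_deg)
    sf = true_pdff * spgr_weight(t1_fat_ms, tr_ms, flip_deg)
    return sf / (sw + sf)

# TR = 9.32 ms and FA = 3 deg as in the in vivo protocol; T1 values are hypothetical
biased = apparent_pdff(0.05, t1_water_ms=800.0, t1_fat_ms=380.0,
                       tr_ms=9.32, flip_deg=3.0)
```

With a shorter T1 for fat than for water, the fat signal is less saturated and the apparent fat fraction exceeds the true value of 5%; with equal T1 values the bias vanishes.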

The knowledge acquired during our study will be utilized in our future work, in which we will focus on the feasibility and reliability of MRI–PDFF quantification at the ultra-high field of 7 T. Another interesting application area of MRI–PDFF may be the pancreas, whose irregular shape is a problem for localized MRS (limitation of voxel size due to gradient field limitation). Unlike previously published studies [27,28,29,30,31], we focused on the accuracy and robustness of MRI–PDFF quantification of liver fat concentrations below about 5%. For this purpose, we intentionally selected a group of screened subjects with low liver-fat fractions. We found a strong correlation between MRI–PDFF and MRS–PDFF in in vivo measurements.

Conclusion

Our results showed that multi-echo MRI in combination with a robust algorithm (including multi-peak spectral modeling) for water/fat decomposition is an accurate and valuable method for low-level fat fraction quantification, as it provides a high-resolution PDFF map of the whole liver during one breath-hold. In general, the MRI approach provides good reliability and precision of quantification of fat content over a large FOV in high resolution much faster than MRS. Based on our results, the most flexible and complex solution is provided by the FatWater12 toolbox, which itself contains several algorithms. This makes room for optimization of reconstruction parameters. The joint estimation-based approach, Graph-Cut, was found to be the most robust and flexible non-commercial solution.