1 Introduction

Diffusion-weighted magnetic resonance imaging (DWI) is a noninvasive technique used to distinguish internal biological structures based on differences in the random motion of water molecules among tissue types. DWI has proven useful for detecting acute cerebral infarction and malignant tumors, in which the restricted diffusion of water molecules manifests as high signal intensity [1]. Quantitative assessment using DWI is also useful in some clinical applications. The apparent diffusion coefficient (ADC) is a quantitative value derived from DWI and has important applications in several clinical cases [2, 3]. Many factors can introduce variability in ADC measurements [4,5,6]. The signal-to-noise ratio (SNR) also influences the measured ADC. Saritas et al. found that a low SNR in high b value DW images led to an underestimation of ADC [7]. The precision of ADC measurement using the two-point estimation is theoretically improved by increasing the SNR at b = 0 [8]. Moreover, repeatability of ADC measurements is required for disease monitoring and comparative research [9, 10].

A higher SNR has been achieved through recent developments in magnetic resonance imaging (MRI) technology, such as scanners with high field strength and multichannel coils. The increased SNR is typically spent on high-resolution imaging, including thinner slices, or on faster imaging. In DWI, thin-slice imaging using 2- to 2.5-mm thickness improves the spatial resolution of lesions and reduces the partial volume effect (PVE) in ADC measurements [11, 12]. Moreover, thin-slice imaging using a 1-mm thickness is expected to have additional diagnostic benefits in clinical practice [13]. However, despite the potential of thin-slice DWI, measurement errors and poor repeatability of ADCs caused by its low SNR are major concerns.

Recently, the Quantitative Imaging Biomarkers Alliance (QIBA) reported the requirement values for measurement errors and repeatability of ADC using a quantitative DWI phantom, such as an ice-water phantom [14]. ADC errors using thin-slice imaging should be clarified by comparison with the quantitative values described in the QIBA profile for quality control. However, to the best of our knowledge, no study has quantitatively addressed the measurement error and repeatability of ADC using the requirement values reported in QIBA profiles. We hypothesized that a 3.0-T scanner could potentially achieve acceptable measurement errors and repeatability of ADC for 1-mm slice thickness within a reasonable scan time compared to a 1.5-T scanner. This study aimed to investigate the measurement error and repeatability of ADC obtained with thin-slice DWI in a phantom experiment.

2 Materials and methods

2.1 Ice-water phantom

We used an ice-water phantom, a widely recognized standard for DWI, in all experiments [4]. The ADC of water at 0 °C is 1.1 × 10−3 mm2/s [15]. Our phantom consisted of five plastic rods with a 20-mm inner diameter inside a 3.4-L plastic container of ice water. The rods were filled with water and arranged in the plastic container with one at the center and four at peripheral positions approximately 40 mm from the center (Fig. 1).

Fig. 1

Ice-water phantom used in this study. a Diagram of the phantom. Five water-filled rods of 20-mm diameter (dashed circles) are submerged in a plastic container filled with ice water. One rod is set at the magnet center, and the others are set at peripheral positions, all approximately 40 mm away from the magnet center. b Diffusion-weighted image showing the center of the phantom

2.2 DWI acquisition

Data acquisition was performed using 1.5-T and 3.0-T MR scanners (Philips Ingenia, Best, the Netherlands) with an 18-channel dS head-and-neck coil (Philips). The maximum slew rate and gradient strength for both scanners were 200 mT/m/ms and 45 mT/m, respectively. The phantom was positioned near the magnet center. One axial slice of the phantom at three thicknesses was acquired near the magnet center. The imaging parameters were as follows: sequence, echo-planar imaging; repetition time, 10,000 ms; echo time, shortest (range 81–85 ms); b values, 0 and 1000 s/mm2; slice thicknesses, 1, 3, and 5 mm; half-scan factor, 0.6; parallel imaging, SENSE; acceleration factor, 2; number of signal averages (NSA), 1; phase-encoding direction, anterior–posterior; field of view, 230 mm; acquisition matrix, 128 × 128; reconstruction matrix, 512 × 512; and scan time, 60 s. The receiver bandwidth was set to the maximum possible value (1403 Hz/pixel for the 1.5-T scanner and 1675 Hz/pixel for the 3.0-T scanner). Motion-probing gradients were applied in three orthogonal directions.

2.3 SNR assessment

All DW images were recorded in DICOM format and assessed using ImageJ software (version 1.45; National Institutes of Health, Bethesda, MD, USA). The SNRs at b = 0 and b = 1000 were calculated using the following equation [14]:

$$ \text{SNR} = \frac{\text{Spatial mean pixel value on signal image}}{\text{Spatial mean pixel value on temporal noise image}}. $$
(1)

The signal and temporal noise images consisted of the mean and standard deviation, respectively, of each pixel over five consecutive scans. Square regions of interest (ROIs) of 10 × 10 pixels were carefully set at the center of each water rod in both signal and temporal noise images (Fig. 2). Spatial mean values were calculated within the ROIs for both the signal and temporal noise images. According to the QIBA recommendation [14], the 95% confidence interval (CI) of the SNR was defined as follows:

$$ 95\% \;{\text{CI}} = \pm \; 1.96 \cdot \frac{{\sigma_{{{\text{SNR}}}} }}{\sqrt N }, $$
(2)

where N is the number of pixels in the ROI and σSNR is the standard deviation defined by

$$ \sigma_{\text{SNR}} = \text{SNR} \cdot \sqrt{\text{sCV}^{2} + \text{nCV}^{2}}, $$
(3)

where sCV and nCV are the coefficients of variation within the ROIs of the signal and noise images, respectively.
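As a minimal sketch of Eqs. (1)–(3), the calculation can be written in Python as follows, assuming the pixel values inside one ROI have already been extracted from each of the five repeated scans into flat lists. The function name `roi_snr` and the use of the sample standard deviation (n − 1 denominator) are illustrative assumptions; the text specifies neither.

```python
import math
import statistics

def roi_snr(roi_scans):
    """Estimate the SNR and its 95% CI half-width from repeated ROI samples.

    roi_scans: list of scans, each a flat list of the pixel values inside
    the same ROI (e.g., 5 scans x 100 pixels for a 10 x 10 ROI).
    """
    # Transpose so that each tuple holds one pixel's values across scans.
    per_pixel = list(zip(*roi_scans))
    # Signal image: temporal mean of each pixel.
    signal = [statistics.mean(p) for p in per_pixel]
    # Temporal noise image: temporal SD of each pixel
    # (sample SD is an assumption, not stated in the text).
    noise = [statistics.stdev(p) for p in per_pixel]

    snr = statistics.mean(signal) / statistics.mean(noise)   # Eq. (1)
    s_cv = statistics.stdev(signal) / statistics.mean(signal)
    n_cv = statistics.stdev(noise) / statistics.mean(noise)
    sigma_snr = snr * math.sqrt(s_cv ** 2 + n_cv ** 2)       # Eq. (3)
    ci95 = 1.96 * sigma_snr / math.sqrt(len(per_pixel))      # Eq. (2)
    return snr, ci95
```

Calling `roi_snr` once per rod and per b value, and then averaging the four peripheral rods, would mirror the procedure described above.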

Fig. 2

ROIs for SNR and ADC assessment. ROIs on the signal image (a), noise image (b), and ADC map (c) of the phantom middle acquired using the 1.5-T scanner. The signal image and temporal noise image were obtained by calculating the average and standard deviation, respectively, of each pixel from five consecutive scans. The ADC map was calculated using b = 0 and b = 1000 images. The 10 × 10-pixel ROIs (white squares) are set at identical positions for each image

2.4 ADC assessment

ADC assessment was performed according to the QIBA methodology [14]. ADC maps of the ice-water phantom were generated using ImageJ software. The ADCs of each pixel were calculated using the following equation:

$$ {\text{ADC}} = - \frac{1}{1000}\ln \left( {\frac{{S_{1000} }}{{S_{0} }}} \right), $$
(4)

where ADC is the apparent diffusion coefficient and S1000 and S0 are the signal intensities at b = 1000 and b = 0 s/mm2, respectively. Then, the ADC was measured within ROIs of the ADC map as described for the SNR assessment. We assessed the accuracy and precision of the ADC as follows:

$$ {\text{Accuracy}}\; (\% ) = 100 \cdot \frac{{{\text{ADC}}_{{{\text{measure}}}} - {\text{ADC}}_{{{\text{true}}}} }}{{{\text{ADC}}_{{{\text{true}}}} }} $$
(5)

and

$$ {\text{Precision}}\; (\% ) = 100 \cdot \frac{\sigma }{{{\text{ADC}}_{{{\text{measure}}}} }}, $$
(6)

where ADCmeasure is the mean ADC within the ROI, ADCtrue is the ADC of water at 0 °C (1.1 × 10−3 mm2/s), and σ is the standard deviation within the ROI. The ADCs at the center and peripheral positions were averaged over five repeated scans. The repeatability was assessed using the within-subject coefficient of variation (wCV), defined as follows:

$$ {\text{wCV}}\; (\% ) = 100 \cdot \frac{{\sigma_{{\text{w}}} }}{\mu }, $$
(7)

where µ and σw are the mean and standard deviation, respectively, calculated using each average ADC within the ROIs from the five measurements. Both the SNR and ADC at the periphery were averaged over the ROIs of all four rods.
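Eqs. (4)–(7) can be sketched in Python as shown below. This is an illustration under stated assumptions, not the ImageJ workflow used in the study: the function names are hypothetical, ROI values are passed as plain lists, and the sample standard deviation (n − 1 denominator) is assumed because the text does not state which estimator was used.

```python
import math
import statistics

ADC_TRUE = 1.1e-3  # mm^2/s, ADC of water at 0 °C (value given in the text)

def adc_pixel(s0, s1000, b=1000.0):
    """Eq. (4): two-point ADC for one pixel, with b in s/mm^2."""
    return -math.log(s1000 / s0) / b

def accuracy_percent(roi_adc):
    """Eq. (5): percent bias of the ROI mean ADC against ADC_TRUE."""
    return 100.0 * (statistics.mean(roi_adc) - ADC_TRUE) / ADC_TRUE

def precision_percent(roi_adc):
    """Eq. (6): within-ROI SD as a percentage of the ROI mean ADC."""
    return 100.0 * statistics.stdev(roi_adc) / statistics.mean(roi_adc)

def wcv_percent(roi_means):
    """Eq. (7): SD over mean of the per-scan ROI mean ADCs."""
    return 100.0 * statistics.stdev(roi_means) / statistics.mean(roi_means)
```

Here `accuracy_percent` and `precision_percent` take the ADC values of the pixels in one ROI, whereas `wcv_percent` takes the five ROI means, one per repeated scan.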

2.5 Statistical analysis

ADC obtained with each scanner and slice thickness was assessed using a one-sample t-test to compare the measured ADC with that of water at 0 °C (1.1 × 10−3 mm2/s). A P value < 0.05 was considered statistically significant. All data analyses were performed using R software (version 3.2.3, R Foundation for Statistical Computing, Vienna, Austria).
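For illustration only, the one-sample t statistic can be computed as in the Python sketch below. The study itself used R, and the text does not state which values formed the sample; this sketch assumes the five per-scan ROI mean ADCs, and the numbers in `adcs` are hypothetical, not study data. Instead of a P value, the statistic is compared with the two-sided 5% critical value of Student's t distribution for 4 degrees of freedom (2.776).

```python
import math
import statistics

ADC_TRUE = 1.1e-3    # mm^2/s, reference ADC of water at 0 °C
T_CRIT_DF4 = 2.776   # two-sided 5% critical value of Student's t, 4 d.f.

def one_sample_t(values, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / standard_error, n - 1

# Hypothetical ROI-mean ADCs from five scans (illustrative, not study data).
adcs = [1.118e-3, 1.121e-3, 1.119e-3, 1.122e-3, 1.120e-3]
t_stat, dof = one_sample_t(adcs, ADC_TRUE)
significant = abs(t_stat) > T_CRIT_DF4
```

With only five samples, a small but consistent offset from the reference value can be statistically significant even when the relative bias is within a few percent.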

3 Results

Table 1 presents the SNR values obtained by each scanner at both positions for each slice thickness. SNRs were roughly 2–3 times higher in the 3.0-T scanner than in the 1.5-T scanner. Peripheral SNRs were approximately 1.2–1.5 times higher than those at the center. Furthermore, SNRs at b = 0 were approximately three times higher than those at b = 1000. Figure 3a and b show the mean ADC derived from each slice thickness using the 1.5-T and 3.0-T scanners, respectively. The average ADCs at 1.5 T and 3.0 T were 1.092 × 10−3 mm2/s (range 1.075–1.101 × 10−3 mm2/s) and 1.120 × 10−3 mm2/s (range 1.113–1.127 × 10−3 mm2/s), respectively. On average, the ADC at 3.0 T was approximately 2.0% higher than that of water at 0 °C (P < 0.001 for all slice thicknesses).

Table 1 Signal-to-noise ratios obtained for the 1.5-T and 3.0-T MRI scanners
Fig. 3

Apparent diffusion coefficient (ADC) plots obtained using the 1.5-T scanner (a) and 3.0-T scanner (b). The black dots and error bars show the average and 95% confidence intervals, respectively, calculated from five consecutive image acquisitions. The ADCs at the peripheral positions are merged from all four rods. The transverse dashed and dotted lines in each graph represent 1.1 × 10−3 mm2/s (the ADC of water) and ± 3.0% deviations from 1.1 × 10−3 mm2/s, respectively. The solid line in each graph represents the average ADC for each scanner. The asterisk indicates a statistically significant difference between the measured ADC and the ADC of water (1.1 × 10−3 mm2/s). Previous research has indicated that the ADC deviations are within ± 3.0% near the magnet center using different scanners [4]. There was no obvious underestimation of the ADC values at either position on the 1.5-T and 3.0-T scanners. The ADCs obtained using the 3.0-T scanner show a systematic deviation from the ADC value of water (P < 0.001); however, all ADC values were still within ± 3.0%

Table 2 shows the accuracy and precision of the ADCs. The accuracy at the center was within ± 2.4% and ± 2.5% in the 1.5-T and 3.0-T scanners, respectively. Precision degraded as the slice thickness decreased. At 1-mm thickness, the precisions of the 1.5-T and 3.0-T scanners were 8.35% and 2.86% at the center and 6.34% and 3.34% at the peripheral positions, respectively.

Table 2 Measurement errors in the apparent diffusion coefficient (ADC) values obtained using the 1.5-T and 3.0-T scanners

Table 3 shows the repeatability of the ADC across five consecutive measurements. The wCVs improved with increasing slice thickness for both scanners. The wCVs at the center were ≤ 3.4% and ≤ 1.3%, and those at the periphery were ≤ 2.5% and ≤ 1.7% in the 1.5-T and 3.0-T scanners, respectively.

Table 3 Repeatability of the apparent diffusion coefficient (ADC) measurements using the 1.5-T and 3.0-T scanners

4 Discussion

This is the first study to examine the measurement errors and repeatability of ADCs derived from thin-slice DWI of a temperature-controlled phantom. ADCs in both scanners showed good accuracy and no underestimation due to low SNR. The precision and repeatability of ADCs were improved in the 3.0-T scanner. Moreover, the 3.0-T scanner showed moderate precision and repeatability even with 1-mm thickness compared to the value in previous literature [14]. Our results indicate that the 3.0-T scanner at a reasonable scan time can be used for reliable measurement of ADC at 1-mm thickness.

Both scanners showed good accuracy within ± 2.5% for all slice thicknesses and both positions, which was better than the requirement value (± 3.6%) in the QIBA profile [14].

This result indicates that thin-slice DWI in both scanners will not produce substantial ADC underestimation due to low SNR, even at 1-mm thickness. The average SNR of the 1.5-T images at 1-mm thickness was lower than that recommended by the QIBA (SNR ≥ 50 ± 5 for the b = 0 image) [14]. This discrepancy in SNR at b = 0 is attributed to the maximum b value used for ADC calculation. The QIBA profile assumes that a maximum b value of 2000 s/mm2 is used for the ADC calculation, which is higher than that in our study. An inherently high SNR at b = 0 is required to maintain the SNR at the maximum b value, which results in accurate ADC measurement. Paudyal et al. reported ADC values at 3.0 T ranging from 1.09 to 1.14 × 10−3 mm2/s, measured using the QIBA diffusion phantom, and concluded that the ADCs were excellently measured [9]. Our results were similar to those values, although a systematic bias was observed in the 3.0-T scanner. Therefore, we consider the ADC bias to be less important because it is within the values reported in previous studies [9, 14].

The 3.0-T scanner demonstrated better precision than the 1.5-T scanner at all slice thicknesses because of its higher SNR. QIBA recommends that the precision of ADC be < 2.0% near the isocenter [14]. Our results show that the precision of ADC at 1-mm thickness in the 1.5-T scanner was clearly worse than the QIBA recommendation [14] and the values in previous work by Malyarenko et al. [4]. The precision of the ADC is determined by the SNR at b = 0 and the interval of the b values [8]. In this study, the interval was 1000 s/mm2, which is the best value to estimate the ADC of the ice-water phantom according to Xing et al. [16]. This implies that precision depends only on the SNR at b = 0. A simple strategy for increasing the SNR is to increase the NSA. However, a considerably long scan time is required to achieve the QIBA profile requirement value of precision when using 1-mm thickness in a 1.5-T scanner. In contrast, the 3.0-T scanner showed only slightly worse precision than the recommendation at 1-mm thickness. As in the 1.5-T scanner, the precision of the ADC in the 3.0-T scanner improves as the NSA increases. However, the intrinsically high SNR of the 3.0-T scanner enables this improvement with a smaller NSA than that required for the 1.5-T scanner. This indicates that a 3.0-T scanner can achieve the QIBA profile requirement value of precision within an acceptable scan time even when using 1-mm thickness.
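The NSA argument above can be made concrete with a back-of-envelope Python sketch. It rests on an assumption that this study did not test: that precision is purely noise-limited, so it improves in proportion to 1/√NSA. The function name `nsa_for_target` is hypothetical, and the inputs are the center-position precisions at 1-mm thickness reported in the Results.

```python
import math

def nsa_for_target(precision_nsa1, target=2.0):
    """Smallest NSA whose predicted precision (%) is at or below the target.

    Assumes precision scales as 1/sqrt(NSA), i.e., noise-limited averaging.
    """
    return max(1, math.ceil((precision_nsa1 / target) ** 2))

# Center-position precision at 1-mm thickness with NSA = 1 (from the Results).
nsa_15t = nsa_for_target(8.35)  # 1.5-T scanner
nsa_30t = nsa_for_target(2.86)  # 3.0-T scanner
```

Under this assumption, the estimate is an NSA of 18 at 1.5 T and 3 at 3.0 T. If scan time scaled linearly with NSA from the 60 s used here, this would correspond to roughly 18 min versus 3 min.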

The 3.0-T scanner showed moderate repeatability of ADC at 1-mm thickness. The repeatability in both scanners improved with increased slice thickness owing to the higher SNR. Our results show that ADC using thin-slice imaging has inferior repeatability in both scanners compared to the requirement value in the QIBA profile [14]. The wCVs reported by Paudyal et al. were within 1.07% and were regarded as showing good repeatability [9]. Compared to those values, the wCVs at 1-mm thickness in the 3.0-T scanner can be considered moderately repeatable, whereas those in the 1.5-T scanner were considerably worse. Grech-Sollars et al. reported that the intra-scanner variations of ADC obtained using the same scanner were 1.0% for the white matter and 2.9% for the gray matter using 1.5-T and 3.0-T scanners [17]. In vivo, the intra-scanner variation is affected not only by image noise but also by physiological noise (respiration, pulsation, etc.). Conversely, the wCVs in our study were obtained under ideal phantom conditions, in which image noise was the dominant source of error in ADC repeatability. Therefore, our wCVs should be at least as good as the intra-scanner variation reported by Grech-Sollars et al. [17]. Compared to their results, the 1.5-T scanner has poor ADC repeatability at 1-mm thickness because of insufficient SNR.

This study has several limitations. First, the results were obtained using scanners from a single vendor, so the differences in ADC measurements between the 1.5-T and 3.0-T scanners may not accurately predict those obtained using scanners or coil systems from different vendors. Second, our phantom does not fully cover the ADCs observed in biological tissues. A reduction in SNR is likely in areas with ADCs > 1.1 × 10−3 mm2/s, so measurements using objects with higher ADCs could provide more realistic data for clinical applications. In addition, our phantom does not simulate the PVE that would occur in vivo. More useful information would be provided through further studies using a dedicated phantom reflecting the PVE. Third, the ADC measurement errors due to artifacts were not fully investigated as our phantom was homogeneous and static. Motion artifacts due to respiration or pulsation and chemical shift artifacts due to fat suppression failure could be additional sources of ADC measurement error in vivo. Finally, the repeatability, including the setup error and phantom preparation, was not assessed, which should be discussed in future studies.

5 Conclusion

In the phantom experiment, we revealed that the ADC measured using our 1.5-T and 3.0-T scanners showed good agreement with the requirement values in the QIBA profile. We also demonstrated that the 3.0-T scanner has better precision and repeatability than those of the 1.5-T scanner. In particular, the 3.0-T scanner has the potential to achieve acceptable precision and repeatability within acceptable scan times when using 1-mm thickness. Therefore, the 3.0-T scanner can be used for the reliable measurement of ADC using 1-mm thickness.