Introduction

Diffusion-weighted imaging (DWI) is widely used to detect and characterize liver lesions and to monitor and predict the treatment response of hepatic tumors [1,2,3]. In DWI, the apparent diffusion coefficient (ADC), which reflects the mobility of water molecules within tissue, is a quantitative parameter that serves as an imaging and therapeutic biomarker for patients with liver diseases [4]. Unfortunately, ADC values can be degraded by image artifacts from breathing and cardiac motion [5]. Respiratory triggering can reduce these artifacts in DWI protocols, but at the cost of long and unpredictable scan times, which in turn can markedly increase patient discomfort and decrease image quality [6]. Therefore, a DWI technique with shorter scan time and fewer artifacts is desirable to obtain good image quality and reliable ADC quantification.

Recently, simultaneous-multislice (SMS) imaging was introduced, which combines simultaneous multi-band radiofrequency excitation with an echo-planar imaging readout and the blipped-CAIPIRINHA technique to reduce scan time and increase patient comfort. In SMS, slice acceleration is achieved through the controlled aliasing introduced by blipped-CAIPIRINHA; the acceleration factor was 2 in our study [7,8,9]. Compared to standard DWI (STD), SMS-DWI has been shown to provide similar image quality with 40–70% shorter scan time under the same conditions [6, 10]. Currently, different breathing schemes are used in SMS-DWI, including breath-hold (BH-SMS), free breathing (FB-SMS) and respiratory-triggered (RT-SMS). Among them, an optimal SMS-DWI scheme, one that provides reliable ADC measurements, good image quality and sufficient SNR, would improve workflow in daily routine, especially under tight examination schedules. To date, only a few studies on STD-DWI have reported ADC repeatability with different breathing schemes [11,12,13,14]. Kandpal et al. and Kwee et al. found that RT-STD provided better image quality and SNR than BH-STD and FB-STD [11, 14]. Lee et al. demonstrated that echocardiography-triggered STD was more effective in improving the repeatability of ADC and IVIM measurements than FB-STD and RT-STD [12]. Chen et al. considered FB-STD to have better repeatability and shorter acquisition time than BH-, RT- and NT-STD [13]. However, DWI repeatability has so far been evaluated only for STD techniques.

With the growing clinical interest in hepatic SMS-DWI [15, 16], the aim of our study was to evaluate SMS-DWI with three different breathing schemes (BH, FB and RT) at 3.0 T and to identify the optimal respiration method for SMS sequences with respect to ADC repeatability, signal-to-noise ratio (SNR) and image quality, in comparison with STD and the other SMS sequences.

Materials and methods

Subjects

All volunteers gave written informed consent to this prospective study, which was approved by the local Institutional Review Board. Twenty-three healthy young volunteers of similar age (juniors in a medical college, mean 21 years, range 20–22 years) were enrolled (12 males, 11 females). The inclusion criteria were normal liver function tests and no history of alcohol or drug abuse, viral hepatitis, or prior abdominal surgery.

MR imaging protocol

This MRI study was carried out on a Siemens Prisma 3.0 T whole-body MRI system (80 mT/m maximum gradient strength, Siemens Healthcare, Erlangen, Germany) with an 18-channel anterior surface body coil in combination with 12 elements of a 32-channel spine coil. Each subject was scanned twice, with a short break (5 min) and repositioning between the two identical sessions, which allowed quantification of intersession variability. Within each session, each subject underwent SMS- and STD-DWI in the axial view with three different breathing schemes, in the order RT-STD, RT-SMS, BH-STD, BH-SMS, FB-STD and FB-SMS. Three b-values of 50, 300, and 600 s/mm2 were sampled in three orthogonal diffusion directions (three-scan trace). The BH technique required participants to hold their breath at end-expiration (BH-STD: two breath-holds; BH-SMS: one breath-hold), whereas FB and RT used an air-filled pressure sensor with a respiration belt around the subject to measure respiration-induced pressure changes. Scan parameters were kept as similar as possible across sequences, and the detailed parameters of all SMS and STD sequences are summarized in Table 1. In our study, SMS was a commercially available sequence based on STD-DWI with the same acceleration factor. In addition, an SMS acceleration factor of 2 was used in the three SMS sequences (SMS AF = 2). For SMS and STD sequences, the acquisition volumes were large enough to cover the whole liver. Fat suppression was achieved with spectral adiabatic inversion recovery, and the minimum echo time (TE) was applied in all DWI sequences without any filter. Additionally, a minimum TR was applied in BH-SMS to shorten the scan and keep it within breath-hold capability. A k-space-based parallel imaging technique, generalized autocalibrating partially parallel acquisition (GRAPPA), was used, and the scan time was recorded (Table 1).

Table 1 The summarized parameters of all DWI sequences

Image analysis

All SMS- and STD-DWI images were transferred to a workstation, and apparent diffusion coefficient (ADC) maps were calculated by image post-processing (syngo.via, VB10, Siemens Healthcare). In our study, ADC and SNR were independently measured by two radiologists (Y.P. and W.L., readers 1 and 2 with 5 and 10 years of clinical experience in liver MRI, respectively), and the image quality was assessed by another two radiologists (J.H. and S.X., readers 3 and 4 with 4 and 3 years of clinical experience in liver MRI, respectively). ADC maps were calculated by the log-linear fitting algorithm with three different b-values (b = 50, 300, 600 s/mm2) according to the following equation:

$$I_{\text{Trace}} = I_{0}\,e^{-ADC \cdot b} = I_{0}\,e^{-b\,(D_{1} + D_{2} + D_{3})/3} = \left( I_{1} \cdot I_{2} \cdot I_{3} \right)^{1/3}$$
(1)

where I1, I2, I3 are the diffusion-weighted images measured with three orthogonal gradient directions and D1, D2, D3 are the corresponding diffusion coefficients. Nine regions of interest (ROIs) were drawn in three representative sections (superior, central and inferior) on each ADC map. The central section was defined by the main stem of the right portal vein. The superior and inferior sections were six consecutive section levels above or below the central section. ROIs were positioned only in the right liver lobe because ADC and SNR values in the left lobe have been found unreliable due to cardiac motion artifacts [13, 17]. ROIs were placed with reference to anatomic landmarks, such as the portal and hepatic veins and their main branches, which were easy to discriminate on b = 50 s/mm2 images. Thus, nine circular ROIs were manually positioned in three representative slices on the b = 50 s/mm2 images and then copied to the ADC maps (ADC measurements) and the b = 600 s/mm2 images (SNR measurements) (Fig. 1). The nine ROIs were kept the same size (0.8 mm2) on 12 DWIs. All intrahepatic ROIs were positioned at least 5 mm from the organ capsule and kept away from macroscopically visible vessels and bile ducts. For each ROI, the maximum, minimum and mean values and their standard deviation were recorded. Thus, a total of 108 ADCs were collected for each volunteer (three ROIs per section, three sections, two repeated series and six techniques). Reader 1 measured ADCs twice within a 2-week period, and reader 2 measured them once within a 1-week period. b = 600 s/mm2 has been recommended as an optimal b-value in DWI for differentiation of abdominal lesions [18]. Therefore, the images with the highest b-value (b = 600 s/mm2) were used to calculate the estimated SNR of each representative section with the formula:

$${\text{SNR}} = {\text{SI}}/{\text{SD}}$$
(2)

where SI is the mean signal in the three ROIs of the same section, and SD is the standard deviation of the signal from ROIs of the same size placed in nearby background in the corresponding section. In all subjects, SNRs were measured once within a 1-week period by reader 1 and reader 2 and were recorded as the maximum, minimum and mean values and their standard deviation. Image quality was assessed on the first DWI series (RT-STD, RT-SMS, BH-STD, BH-SMS, FB-STD and FB-SMS) for each participant. Image quality was rated on a five-point Likert scale with respect to the following five aspects (L1–L5). L1: sharpness of the liver edge (5 = sharp and no blurring, 4 = mild blurring, 3 = moderate blurring, 2 = severe blurring, 1 = non-diagnostic), L2: severity of artifacts (5 = no artifacts, 4 = mild artifacts, 3 = moderate artifacts, 2 = severe artifacts, 1 = non-diagnostic), L3: image quality of the liver dome (5 = excellent, 4 = good and not affecting interpretation, 3 = moderate and potentially affecting interpretation, 2 = poor and definitely affecting interpretation, 1 = non-diagnostic), L4: image quality of segments caudal to the liver dome (5 = excellent, 4 = good and not affecting interpretation, 3 = moderate and potentially affecting interpretation, 2 = poor and definitely affecting interpretation, 1 = non-diagnostic), L5: overall image quality (5 = excellent, 4 = good and not affecting interpretation, 3 = moderate and potentially affecting interpretation, 2 = poor and definitely affecting interpretation, 1 = non-diagnostic) [19]. Image quality scores were recorded separately for the ADC maps and the three DWI images with b = 50, 300 and 600 s/mm2 for each volunteer.
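Equations 1 and 2 can be illustrated with a minimal Python sketch. It is not the vendor (syngo.via) implementation: the function names `trace_signal`, `fit_adc` and `roi_snr`, as well as the synthetic signal values, are assumptions introduced here for illustration only. The sketch forms the trace signal as the geometric mean of three directional signals, estimates ADC as the negative least-squares slope of ln(signal) against b, and computes SNR as mean ROI signal over background standard deviation.

```python
import math

B_VALUES = (50.0, 300.0, 600.0)  # s/mm^2, the three b-values of the protocol


def trace_signal(i1, i2, i3):
    """Eq. 1: geometric mean of three orthogonal diffusion-weighted signals."""
    return (i1 * i2 * i3) ** (1.0 / 3.0)


def fit_adc(b_values, signals):
    """Log-linear fit: ADC is minus the least-squares slope of ln(S) vs. b.

    Returns ADC in mm^2/s when b is given in s/mm^2.
    """
    n = len(b_values)
    logs = [math.log(s) for s in signals]
    mean_b = sum(b_values) / n
    mean_log = sum(logs) / n
    sxy = sum((b - mean_b) * (y - mean_log) for b, y in zip(b_values, logs))
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    return -sxy / sxx


def roi_snr(liver_roi_means, background_sd):
    """Eq. 2: mean signal of the liver ROIs divided by the background SD."""
    mean_signal = sum(liver_roi_means) / len(liver_roi_means)
    return mean_signal / background_sd


# Synthetic, noise-free voxel (assumed values): S0 = 1000, ADC = 1.0e-3 mm^2/s.
true_adc = 1.0e-3
signals = [1000.0 * math.exp(-b * true_adc) for b in B_VALUES]
estimated_adc = fit_adc(B_VALUES, signals)
```

With noise-free input the fit recovers the assumed ADC; with real data the three points scatter around the fitted line, which is one source of the measurement variability assessed in this study.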

Fig. 1

A diagram of ROIs manually drawn in superior, central and inferior representative slices in the first session with BH-SMS sequence

Statistical analysis

All statistical analyses were performed using SPSS (SPSS for Windows, version 13.0; SPSS, Chicago, IL). ADC and SNR were expressed as mean and SD (minimum, maximum). Ratings were expressed as median (minimum, maximum). In our study, P-values of less than 0.05 were considered statistically significant. The interobserver agreement was analyzed by calculating weighted kappa coefficients (quadratic weighting) with kappa values of 0.01–0.20 representing slight agreement, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect agreement.
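As an illustration of the agreement statistic described above, the following Python sketch computes a quadratic-weighted kappa for two readers' five-point ratings and maps it onto the agreement bands listed in the text. It is a hand-written sketch, not the SPSS procedure used in the study; the function names and the example ratings are hypothetical.

```python
def quadratic_weighted_kappa(ratings_a, ratings_b, categories=(1, 2, 3, 4, 5)):
    """Cohen's kappa with quadratic disagreement weights for ordinal ratings."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions of (reader A score, reader B score).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = ((i - j) ** 2) / ((k - 1) ** 2)  # 0 on the diagonal
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]

    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when both readers use one category")
    return 1.0 - weighted_observed / weighted_expected


def agreement_label(kappa):
    """Bands as listed in the text; values below 0.01 are not named there."""
    if kappa < 0.01:
        return "below slight"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical scores from two readers for five image series.
reader_a = [1, 2, 3, 4, 5]
reader_b = [1, 2, 3, 4, 4]
kappa = quadratic_weighted_kappa(reader_a, reader_b)
```

Quadratic weighting penalizes a two-point disagreement four times as heavily as a one-point disagreement, which suits ordinal Likert scores.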

Intraclass correlation coefficients (ICCs) were used to evaluate the intra- and interobserver agreement of ADC measurements [20, 21]. The intraobserver ICC was calculated between the two ADC measurements of reader 1, and the interobserver ICC was calculated between the first ADC measurements of reader 1 and the ADC measurements of reader 2. An ICC greater than 0.8 was considered indicative of good agreement. In addition, the repeatability of ADC measurements was evaluated with the Bland–Altman method [22]. The mean absolute difference (bias) and the 95% confidence interval of the mean difference (limits of agreement, LOA) between the first and second DWI series were compared [16]. Differences in SNR between the six DWI techniques were assessed using two-way analysis of variance. The Bonferroni method was used to adjust for multiple comparisons [23]. Image quality was compared between the six sequences using the Friedman test. If the Friedman test showed a statistically significant P-value, the Dunn-Bonferroni post-hoc method was performed for all pairwise comparisons [6, 24].
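The repeatability quantities described above can be sketched in Python as follows. This is an illustrative sketch rather than the analysis actually run in SPSS: the function name `repeatability_summary` and the example ADC values are hypothetical, and the 95% confidence interval uses the normal quantile 1.96 as a simplifying assumption (a t quantile would be more exact for small samples). Both the classical Bland–Altman limits (mean difference ± 1.96 SD) and the confidence-interval half-width of the mean difference are returned so that the two can be compared.

```python
import math
import statistics


def repeatability_summary(session1, session2):
    """Test-retest summary for paired ADC measurements from two sessions."""
    diffs = [a - b for a, b in zip(session1, session2)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    return {
        "mean_abs_diff": statistics.mean([abs(d) for d in diffs]),
        "mean_diff": mean_diff,
        # Classical Bland-Altman limits of agreement.
        "loa_lower": mean_diff - 1.96 * sd_diff,
        "loa_upper": mean_diff + 1.96 * sd_diff,
        # Half-width of the 95% CI of the mean difference (normal approx.).
        "ci95_half_width": 1.96 * sd_diff / math.sqrt(n),
    }


# Hypothetical ADC values (x 10^-3 mm^2/s) at one location in four subjects.
first_session = [1.00, 1.10, 1.20, 1.05]
second_session = [1.02, 1.06, 1.20, 1.09]
summary = repeatability_summary(first_session, second_session)
```

Because the confidence interval of the mean difference shrinks with the square root of the sample size, it is always narrower than the classical limits computed from the same differences.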

Results

SMS and STD with different breathing schemes were successfully carried out in all volunteers. Acquisition time reduction with the SMS techniques was achieved with a good image quality.

Scan time

To obtain DWI of 24 slices, accelerated SMS acquisition reduced scan time significantly in comparison to STD-DWI: 36% for BH scheme (27 vs. 42 s), 31% for FB (54 vs. 78 s), and 55% for RT (42 vs. 97 s) (Table 1). BH-SMS enabled the shortest scan time (27 s) to obtain whole-liver DWI among six DWIs.

The ADC measurements and their intra- and interobserver agreement

The average ADC values of the three representative sections were obtained with the six DWI sequences. Intraobserver ADC measurements were more reliable with BH (almost all P > 0.05) than with the RT and FB techniques (almost all P < 0.05). For example, in reader 1’s two measurements (first and second) with BH-SMS, the average ADC values for the superior section were (1.074 ± 0.140) × 10–3 mm2/s vs. (1.073 ± 0.114) × 10–3 mm2/s (P = 0.912), which showed less variation than those with RT-SMS ((1.043 ± 0.144) × 10–3 mm2/s vs. (1.077 ± 0.152) × 10–3 mm2/s, P = 0.022) and FB-SMS ((1.011 ± 0.138) × 10–3 mm2/s vs. (1.049 ± 0.132) × 10–3 mm2/s, P = 0.005) for the superior section (Table 2). Furthermore, BH-SMS had the highest intraobserver ICCs (0.920–0.941) among the six DWI sequences (Table 2). The interobserver average ADC values did not differ significantly between reader 1 and reader 2 in any of the six DWI sequences (all P > 0.05). For instance, the average ADC values with BH-SMS were (1.074 ± 0.140) × 10–3 mm2/s and (1.075 ± 0.134) × 10–3 mm2/s for the superior section (P = 0.562), (1.059 ± 0.122) × 10–3 mm2/s and (1.053 ± 0.125) × 10–3 mm2/s for the central section (P = 0.578), and (1.029 ± 0.111) × 10–3 mm2/s and (1.034 ± 0.093) × 10–3 mm2/s for the inferior section (P = 0.934) (Table 2). Furthermore, BH-SMS had good interobserver agreement (ICC 0.831–0.886) (Table 2).

Table 2 The ADC measurements in six DWI techniques and their intra- and interobserver agreement

Repeatability of ADC measurements

In the right liver lobe, the repeatability of ADC measurements varied across the nine anatomic locations for each technique. However, the mean absolute ADC differences (bias) with BH-SMS (0.046–0.058 × 10–3 mm2/s) were lower than those with the other five DWI sequences (FB-SMS: 0.063–0.101 × 10–3 mm2/s, RT-SMS: 0.060–0.109 × 10–3 mm2/s, BH-STD: 0.047–0.068 × 10–3 mm2/s, FB-STD: 0.064–0.090 × 10–3 mm2/s, RT-STD: 0.055–0.069 × 10–3 mm2/s) in the nine locations (Table 3). Furthermore, BH-SMS had the highest ADC measurement repeatability, with the lowest LOA (0.010–0.013 × 10–3 mm2/s) among all six sequences (Table 3, Fig. 2). In addition, the ADC repeatability in the central middle position was better than that in the other anatomic locations for all DWI techniques, yielding mean absolute ADC differences ± LOA of (0.056 ± 0.011) × 10–3 mm2/s for BH-SMS, (0.084 ± 0.025) × 10–3 mm2/s for FB-SMS, (0.074 ± 0.018) × 10–3 mm2/s for RT-SMS, (0.053 ± 0.010) × 10–3 mm2/s for BH-STD, (0.064 ± 0.018) × 10–3 mm2/s for FB-STD, and (0.055 ± 0.012) × 10–3 mm2/s for RT-STD (Table 3).

Table 3 The mean absolute differences of ADC measurements and their 95% confidence intervals in nine anatomic locations with six DWI techniques
Fig. 2

The repeatability of ADC measurements by the Bland–Altman method with different DWI sequences in nine different anatomic locations. For the BH technique, BH-SMS was slightly lower than BH-STD in mean ADC differences and LOA (first and second lanes). For the FB and RT schemes, the SMS sequence was higher than the corresponding STD sequence in mean ADC differences and LOA (third to sixth lanes). However, BH-SMS had lower mean ADC differences and LOA than FB-STD and RT-STD (first, fourth and sixth lanes). Together, these results suggest that BH-SMS had the highest ADC measurement repeatability among the six DWI sequences

SNR quantitative analysis

Among the protocols, BH-SMS had the highest surrogate SNR in the three representative sections, with the exception of RT-STD (Fig. 3). For example, in the superior representative section, the surrogate SNR measured by reader 1 was 29.81 ± 20.49 with BH-SMS, which was significantly higher than with FB-STD (9.57 ± 5.91, P < 0.0001) and BH-STD (11.26 ± 7.24, P < 0.0001), and higher, though not significantly, than with RT-SMS (20.18 ± 13.60, P = 0.239) and FB-SMS (19.19 ± 9.54, P = 0.120). In addition, excellent agreement between reader 1 and reader 2 was found for BH-SMS in the superior (r = 0.95), middle (r = 0.94) and inferior (r = 0.86) sections (Table 4).

Fig. 3

SNR measurements of three representative sections by reader 1 with six DWI sequences. The SNR measurements from different ROIs using BH-SMS were higher than those using the other sequences except RT-STD

Table 4 Signal-to-noise-ratio (SNR) of different representative sections on the right hepatic lobe

Qualitative image analysis

The interobserver agreement between the two readers was very good for all five image quality aspects (L1–L5), as shown in Table 5. For instance, the kappa value was 0.951 (P < 0.0001) for the overall image quality assessment between reader 3 and reader 4. A detailed summary of all image quality ratings by reader 3 is also presented in Table 5. There were significant differences in the severity of artifacts (L2) and overall image quality (L5) on ADC maps (L2: P < 0.0001, L5: P < 0.0001), b = 50 images (L2: P < 0.0001, L5: P = 0.027), b = 300 images (L2: P < 0.0001, L5: P = 0.012) and b = 600 images (L2: P < 0.0001, L5: P = 0.018), which further analysis attributed to differences between STD and SMS sequences (P < 0.05). However, image quality did not differ significantly between BH-SMS and the other five DWI sequences (P > 0.05) (Table 5, Figs. 4, 5).

Table 5 The evaluation of image quality by reader 3 and the agreement of image quality between reader 3 and reader 4
Fig. 4

Comparisons of image quality of STD- and SMS-DWI with b = 600 s/mm2 using different breathing schemes (BH breath-hold, FB free-breathing, RT respiratory-triggering). Image quality of BH-SMS did not differ significantly from that of the other five DWI acquisitions in the three representative sections (superior, central and inferior), including sharpness of the liver edge (BH-SMS: 5 (4,5), other five DWIs: 5 (4,5)–5 (5,5), P > 0.05), severity of artifacts (BH-SMS: 4 (4,5), other five DWIs: 4 (3,5)–5 (4,5), P > 0.05) and overall image quality (BH-SMS: 4 (4,5), other five DWIs: 5 (4,5), P > 0.05)

Fig. 5

Comparisons of image quality of STD-DWI (the upper three rows) and SMS-DWI (the lower three rows) using different breathing schemes (BH breath-hold, FB free-breathing, RT respiratory-triggering). Diffusion-weighted trace images with three different b-values (50, 300, 600 s/mm2) and the corresponding ADC maps (left) are shown. Image quality with BH-SMS was slightly superior to that of the other five DWI sequences in sharpness of the liver edge and severity of artifacts, but the differences between BH-SMS and the other five DWI sequences were not significant (P > 0.05)

Discussion

The SMS-DWI technique has been proposed for clinical detection and characterization of liver lesions, offering shorter examination time and similar image quality in comparison with the STD-DWI sequence [6, 19]. In this study, scan time was reduced by about 36% for BH-SMS, 31% for FB-SMS and 55% for RT-SMS, that is, by roughly one-third or more of the scan time of the corresponding STD-DWI, without noticeable artifacts [6]. BH-SMS required the shortest scan time to acquire whole-liver DWI with good image quality at 3.0 T, which helps to improve work efficiency and relieve workflow pressure.

The mean ADC values were 0.9–1.1 × 10–3 mm2/s for the three representative sections in all DWIs, which is near the low end of the literature values [6, 12, 15]. The mean intraobserver ADC values with the BH technique were more reliable than those with the RT and FB techniques. This might be associated with the mitigation of respiratory movement by the BH scheme in these young volunteers, who had adequate breath-holding capability, which allowed images with clear anatomic landmarks to be obtained and ROIs to be placed in consistent locations. Therefore, BH-SMS offered higher repeatability for ADC measurements than RT-SMS and FB-SMS and the highest ICC (0.920–0.941) in intraobserver ADC measurements for the three representative sections. For the interobserver average ADC measurements, BH-SMS showed no significant difference between reader 1 and reader 2, and the interobserver agreement was good (0.831–0.886), which suggests that BH-SMS had sufficient reliability and repeatability for ADC measurements.

In our results, the ADC repeatability varied across locations for each technique. However, it was highest in the central middle location for all six DWI techniques. This finding is in agreement with the result of Chen et al. [13]. In addition, BH-SMS had the greatest ADC measurement repeatability among the six sequences, yielding the smallest mean ADC differences and LOA. Taron et al. reported that breathing schemes could affect the absolute ADC values (bias) [6]. In our study, the participants were young students with a good ability to hold their breath, and they had fewer motion artifacts with the BH technique than with the RT and FB techniques. Thus, the absolute differences of ADC values with the BH scheme were lower than those with the RT and FB schemes. In addition, BH-SMS has a shorter scan time than BH-STD, which helps to decrease motion artifacts and achieve low mean ADC differences and LOA. Furthermore, we found that all the LOAs were around 20–30% of the mean ADC value. This finding is in agreement with the findings of Chen et al. and Kim et al. [13, 25]. Thus, for treatment response evaluation with ADC as a biomarker, we also recommend a confidence threshold of at least 20%, and the same DWI acquisition technique should be applied for all baseline and follow-up studies.
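The recommended threshold can be illustrated with a short Python sketch that compares the relative ADC change between a baseline and a follow-up examination against a 20% cut-off. The function name and the example ADC values are hypothetical, and the sketch assumes that both examinations used the same DWI acquisition technique.

```python
def adc_change_exceeds_threshold(baseline_adc, follow_up_adc, threshold=0.20):
    """Return (exceeds, relative_change) for a baseline/follow-up ADC pair.

    A change smaller than the threshold cannot be separated from
    test-retest variability under the recommendation in the text.
    """
    relative_change = (follow_up_adc - baseline_adc) / baseline_adc
    return abs(relative_change) >= threshold, relative_change


# Hypothetical lesion ADC values (x 10^-3 mm^2/s).
small_change = adc_change_exceeds_threshold(1.00, 1.15)  # +15%
large_change = adc_change_exceeds_threshold(1.00, 1.30)  # +30%
```

In this example only the 30% increase would be treated as a change beyond measurement variability.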

In our study, SNR was maximized for each DWI sequence by applying the minimum TE in all six sequences. SNR can also be affected by other factors such as voxel volume, bandwidth, and the number of signal acquisitions (NSA), which were kept as similar as possible between SMS and STD. In clinical work, NSA = 2, 2, 3 is usually used for b = 50, 300, 600 s/mm2, respectively, in FB- and RT-DWI. However, NSA = 1, 1, 1 was used in BH-DWI to minimize scan time and stay within breath-hold capability while still yielding good DW image quality. In the quantitative analysis, the mean SNR of the liver with BH-SMS was significantly higher than with almost all other DWI sequences, except for RT-STD, at b = 600 s/mm2. RT-STD had approximately twice the SNR of BH-SMS, which may be due to the higher NSA used to improve SNR and the 3 concatenations used to obtain good DW image quality for RT-STD. However, RT-STD takes at least twice the scan duration of BH-SMS, which is not preferred for clinical examinations. Interestingly, BH-SMS had distinctly higher SNR than FB-STD and BH-STD and slightly higher SNR than RT-SMS and FB-SMS, likely because of fewer motion artifacts with the BH technique, which integrates easily with SMS. Also, excellent interobserver agreement was found with BH-SMS for the three representative sections. Therefore, BH-SMS can achieve a sufficient, reliable and reproducible SNR and is a good choice for acquiring liver DWI to assess liver diseases at 3.0 T.

Visual assessment is helpful for disease detection and characterization by observing the different signal attenuation on DWI. In our study, all SMS- and STD-DWI methods obtained good image quality. However, RT- and FB-based SMS-DWI were rated significantly lower than their corresponding STD-DWI techniques in terms of severity of artifacts and overall image quality, for all images (b = 50, 300, 600 s/mm2) and ADC maps (P < 0.05). Previous work conducted at 1.5 T reported no significant differences in overall image quality between STD and SMS techniques with RT and FB acquisitions [6], which is inconsistent with our findings. This may be due to the higher sensitivity of the SMS technique to breathing motion at 3.0 T than at 1.5 T. In addition, good image quality was obtained with BH-SMS in our study, which was not significantly different from that of the STD-DWI sequences (all P > 0.05). This may be because BH-DWI was performed during a breath-hold, which attempts to freeze motion, so that image quality improves with fewer breathing motion artifacts, especially in young volunteers with good breath-holding ability. Kartalis et al. [26] reported that the image quality of the BH scheme was similar to that of the RT scheme on DWI for detecting pancreatic ductal adenocarcinoma. Therefore, BH-SMS is preferable for clinical routine when patients have excellent breath-holding ability.

There are several limitations in our study. First, the DWI assessments were performed only in young healthy volunteers. These subjects had good breath-holding ability and achieved the best ADC repeatability with BH-SMS, which suggests that BH-SMS can be an optimal hepatic DWI sequence. The diagnostic performance of the BH-SMS sequence for detecting hepatic lesions in clinical settings warrants further study. Second, the TRs differed among the DWI sequences, which may result in reduced signal due to T1 saturation effects when a shorter TR is used, especially at high b-values. Lastly, DWI was assessed only in the right lobe because the image quality of the left lobe could be affected by respiratory motion, heartbeat and gastrointestinal peristalsis. DWI studies of the left liver lobe should be carried out in the future.

In conclusion, liver DWI using the BH-SMS technique provided considerable scan time reduction, comparable image quality, sufficient SNR and the highest ADC repeatability compared with STD and the other two SMS techniques at 3.0 T. Based on the results presented here, BH-SMS is recommended as the optimal hepatic DWI sequence for subjects with adequate breath-holding capability, which could improve work efficiency, especially under busy examination schedules in daily routine.