Introduction

Diffusion tensor imaging (DTI), a magnetic resonance (MR) imaging technique, quantifies proton diffusion based on a 3-dimensional ellipsoid model [1, 2]. Since water diffusion is favored along nerve fiber tracts and hindered perpendicular to the nerve axis by cell membranes and myelin sheaths, peripheral nerve DTI has the potential to deliver specific biomarkers of microstructural nerve integrity and has been investigated in numerous recent studies in healthy volunteers and specific neuropathies [3,4,5,6,7,8,9,10,11].

The main readout parameters of DTI are fractional anisotropy (FA), mean diffusivity (MD), axial diffusivity (AD), and radial diffusivity (RD) [12]. The FA is a dimensionless value between 0 (equal diffusion in all directions) and 1 (diffusion in only one direction); it is considered a marker of nerve tissue damage and is reduced in various pathologies [3, 8, 10, 11]. While MD expresses the average diffusion over all directions, AD and RD describe diffusion along and orthogonal to the nerve course, respectively. They might offer specific information about the myelin sheath and axonal integrity but are also modulated by inflammation and edema [3, 12,13,14].
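The four scalar parameters can be written as functions of the three eigenvalues of the diffusion tensor. The following is a minimal sketch using the standard textbook definitions; it is not the code of any particular postprocessing software, and the eigenvalues in the example are hypothetical.

```python
import math

def dti_scalars(l1, l2, l3):
    """Return FA, MD, AD and RD from tensor eigenvalues sorted l1 >= l2 >= l3."""
    md = (l1 + l2 + l3) / 3.0          # mean diffusivity: average over all directions
    ad = l1                            # axial diffusivity: along the principal axis
    rd = (l2 + l3) / 2.0               # radial diffusivity: orthogonal to the principal axis
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    # FA is 0 for isotropic diffusion and 1 for diffusion along a single direction
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0
    return {"FA": fa, "MD": md, "AD": ad, "RD": rd}

# Hypothetical eigenvalues in units of 1e-6 mm^2/s (illustration only)
example = dti_scalars(2100.0, 1000.0, 800.0)
isotropic = dti_scalars(1000.0, 1000.0, 1000.0)
print(example)
print(isotropic["FA"])
```

FA is a ratio and therefore dimensionless, whereas MD, AD, and RD keep the unit of the eigenvalues.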

Peripheral nerve DTI is regarded as a promising technique to deliver biomarkers for differential diagnosis and follow-up of various neuropathies. Despite being investigated in various studies [3,4,5,6,7,8,9,10,11], a broad implementation of peripheral nerve DTI in the clinical routine is still lacking. A fundamental question that has to be addressed before implementation is how reliable the readout parameters are, both between different readers (interreader reliability) and between different scanning sessions (test-retest reliability).

The authors are aware of only a few studies that have addressed the reliability of peripheral nerve DTI, and among them there is great variety in terms of imaging technique, number of scans, anatomical regions, and postprocessing methods [15,16,17,18,19,20,21,22]. Most of these studies expressed the results by reporting the intraclass correlation coefficient (ICC), a frequently reported dimensionless parameter in reliability studies; however, care should be taken when interpreting absolute ICC values since the ICC depends not only on the precision of the test but also on the variance of the assessed parameter in the examined population and is therefore sensitive to data range [23, 24]. In simple terms, this means that a particular test has a high ICC (suggesting an accurate test) in a group where the measured parameter differs largely between individuals but a low ICC (suggesting an inaccurate test) in a group where the measured parameter occurs only in a narrow range, even if the test is exactly the same. A statistical parameter which describes the precision of the method independently of the variance in the population is the standard error of measurement (SEM) [25]. In addition, the SEM not only enables classification of reliability in qualitative terms, such as good or moderate, but also represents a quantitative measure of precision, expressed in the same physical unit as the measured quantity [25]. The concept of the SEM is easy to understand in clinical practice when imagining that a certain test measures values around a true value with a certain error. Assuming a normal distribution, approximately 68% of all measurements the test delivers lie within ±1 SEM of the true value and around 95% lie within ±2 SEM of the true value [25]. To our knowledge, a systematic assessment of both interobserver and test-retest SEM is still lacking for all major parameters of peripheral nerve diffusion tensor imaging.
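The dependence of the ICC on the data range can be illustrated with a short calculation. The sketch below assumes the simplest variance-ratio form of the ICC (between-subject variance divided by the sum of between-subject and error variance); the two between-subject standard deviations are hypothetical, and only the measurement error of 0.02 corresponds to the FA value reported in this study.

```python
def icc_from_variances(sd_between, sem):
    """ICC as the share of total variance that is due to true between-subject differences."""
    return sd_between ** 2 / (sd_between ** 2 + sem ** 2)

SEM_FA = 0.02  # identical measurement error in both scenarios

# Hypothetical cohorts: wide vs. narrow spread of true FA values between subjects
heterogeneous = icc_from_variances(sd_between=0.10, sem=SEM_FA)
homogeneous = icc_from_variances(sd_between=0.02, sem=SEM_FA)

print(f"ICC, heterogeneous cohort: {heterogeneous:.2f}")
print(f"ICC, homogeneous cohort:   {homogeneous:.2f}")
```

Although the test is equally precise in both scenarios, the ICC is much lower in the homogeneous cohort, which is why the SEM is reported alongside the ICC.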

Although advances in imaging techniques now enable the investigation of even small-caliber nerves, this study focused on the sciatic nerve as one of the most commonly investigated nerves in MR neurography (MRN), especially when describing systemic neuropathies [5, 26,27,28,29,30]. The relatively straight course and the large caliber of the sciatic nerve represent good technical conditions for MRN, especially for quantitative imaging.

The aim of this study was to determine the interreader and test-retest reliability of peripheral nerve DTI by providing the SEM and ICC of scalar DTI parameters. A cohort of 27 healthy participants was prospectively examined; each participant received 3 scans with a standardized peripheral nerve DTI protocol, which were analyzed by two independent readers.

Material and Methods

This study was approved by the institutional ethics committee. Written informed consent was obtained from all participants.

Study Subjects

For this study, 28 healthy young men were recruited by public announcement and prospectively enrolled between October 2016 and August 2018. One participant was excluded due to motion artifacts, so that 27 participants were finally analyzed. The mean age was 24.2 ± 3.2 years (range 19–30 years), mean weight was 76.6 ± 7.6 kg, mean height was 180 ± 5 cm, and mean body mass index (BMI) was 23.8 ± 2.2 kg/m². The dominant side was right in 20/27 and left in 7/27.

All participants underwent three MR scans on different days, all of which were analyzed by two independent readers (flowchart in Fig. 1). Inclusion criteria were male sex and age between 18 and 35 years. Exclusion criteria were any known diseases of the peripheral nervous system as well as general contraindications for MRI and severe motion artifacts in the MRI examination.

Fig. 1

Flowchart of study design. A total of 27 healthy volunteers underwent MR neurography on 3 separate days, each covering the sciatic nerve in the same anatomical region. Postprocessing and parameter readout were performed by two independent observers. Subsequently, statistical analyses were performed to calculate interreader agreement and test-retest reproducibility

MR Imaging

All participants were repeatedly examined at a 3.0 T MR scanner (Magnetom Prisma-FIT, Siemens Healthcare, Erlangen, Germany) on 3 separate days. The mean timespan between the scanning sessions was 5.7 ± 2.3 days between scans 1 and 2 and 4.93 ± 1.11 days between scans 2 and 3. All participants underwent MR neurography of the dominant leg and were examined in a supine position with feet first and legs extended. A 15-channel transmit-receive knee coil (Siemens Healthineers, Erlangen, Germany) was placed at mid-thigh to distal thigh level, and great care was taken to align the distal end of the coil approximately with the superior margin of the patella. This position not only allowed high reproducibility but also made it possible to perform high-resolution imaging of the sciatic nerve in subjects with larger thigh circumferences. To ensure a comfortable position and to avoid motion artifacts, the thigh was immobilized with pads. In this position, the following imaging protocol was carried out:

  i. Axial T2-weighted turbo spin echo (TSE) 2D sequence with spectral fat saturation, phase encoding in anteroposterior direction, TR/TE 8640/54 ms, matrix size 512 × 333, phase resolution 65%, field of view 160 × 160 mm², voxel size 0.3 × 0.3 × 3.5 mm³, slice thickness 3.5 mm (distance factor 10%), number of slices 41, number of averages 3, acquisition time 6 min 48 s.

  ii. Axial single-shot spin-echo echo planar imaging (EPI) DTI with diffusion gradients in 20 directions (b-value = 0 and 800 s/mm²) and fat suppression (spectrally adiabatic inversion recovery, SPAIR), phase encoding in anterior-posterior direction, TR/TE 7100/67 ms, matrix size 120 × 120, field of view 160 × 160 mm², voxel size 1.3 × 1.3 × 3.5 mm³, slice thickness 3.5 mm (distance factor 10%), number of slices 41, number of averages 2, acquisition time 5 min 42 s.
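The geometric protocol parameters are related by simple arithmetic, which the following sketch reproduces. It assumes that the distance factor denotes the interslice gap as a fraction of the slice thickness and that the in-plane voxel size in readout direction is the field of view divided by the matrix size; the function names are hypothetical.

```python
def in_plane_voxel_mm(fov_mm, matrix):
    """Nominal in-plane voxel edge length (readout direction)."""
    return fov_mm / matrix

def slab_coverage_mm(n_slices, thickness_mm, distance_factor):
    """Extent of n slices including the gaps between adjacent slices."""
    gap_mm = thickness_mm * distance_factor
    return n_slices * thickness_mm + (n_slices - 1) * gap_mm

t2_voxel = in_plane_voxel_mm(160.0, 512)         # about 0.3 mm
dti_voxel = in_plane_voxel_mm(160.0, 120)        # about 1.3 mm
full_coverage = slab_coverage_mm(41, 3.5, 0.10)  # all 41 slices
roi_coverage = slab_coverage_mm(7, 3.5, 0.10)    # seven analyzed slices

print(f"T2w voxel: {t2_voxel:.2f} mm, DTI voxel: {dti_voxel:.2f} mm")
print(f"Coverage: {full_coverage:.1f} mm (41 slices), {roi_coverage:.1f} mm (7 slices)")
```

Under these assumptions, seven slices span roughly 2.7 cm along the nerve.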

Image Postprocessing

Image analysis was performed by two independent readers (M.K. and F.P. with more than 5 and 3 years of experience in neuromuscular imaging, respectively) using the DICOM-viewer OsiriX (Pixmeo Sarl, Bernex, Switzerland). Image quality was visually assessed by both readers and one participant was excluded in consensus due to severe motion artifacts; thus, the data of 27 participants were analyzed in this study. All analyses in this study refer to these 27 participants. A total of seven slices of interest were defined at mid-thigh to distal-thigh level in each scan in identical anatomical positions by F.P. and then used for independent segmentation by both readers. Segmentation of the sciatic nerve was performed in the anatomical T2-weighted image by freehand regions of interest (ROIs) where nerve size and shape could be well-delineated (Fig. 2). Analysis of the sciatic nerve was restricted to the tibial portion to avoid inclusion of fat and connective tissue. The ROIs were then copied to the corresponding b0-image (b = 0) and slightly adjusted, mostly downsized, to avoid partial volume artifacts from surrounding fat tissue and to compensate for possible distortion artifacts [19]. The FA, MD, AD, and RD were obtained using the OsiriX plugin DTI map with a preset noise threshold of 14. For further analysis, DTI parameters were averaged from all 7 slices.

Fig. 2

Representative images of nerve segmentation. Segmentation of the tibial portion of the sciatic nerve was conducted independently by each reader using a free-hand ROI tool in OsiriX. ROIs were transferred onto co-registered DTI maps and slightly adjusted in the b0-image before DTI parameters were obtained. FA fractional anisotropy, MD mean diffusivity, RD radial diffusivity, AD axial diffusivity, T2w T2-weighted image

Statistical Analysis

All statistical analyses were performed using SPSS (Version 24; SPSS Inc, Chicago, IL, USA) or the computing environment R (Version 3.4.4; R Development Core Team, R Foundation for Statistical Computing, Vienna, Austria). Values are shown as mean ± standard deviation unless otherwise stated. Analyses of variance were conducted with the factors readers and scans (for each reader) to test for differences. Subjects were considered as random effects. Bonferroni correction was applied to correct for multiplicity. To assess interreader reliability, a single measurement, absolute agreement, two-way random effects model, ICC (2,1) according to Shrout and Fleiss [31], was used to calculate ICCs with 95% confidence intervals (CI). The ICCs and their 95% CIs for the test-retest reliability were estimated based on a single measurement, absolute agreement, two-way mixed effects model as suggested by Shrout and Fleiss [31]. Based on Koo and Li, ICC values between 0.5 and 0.75, between 0.75 and 0.9, and greater than 0.9 were considered as moderate, good, and excellent agreement, respectively [32]. Furthermore, a Bland-Altman analysis for repeated measurements was performed and illustrated in Bland-Altman plots [33]. To estimate the interreader and test-retest variability, the standard error of measurement (SEM) according to Popović and Thomas was calculated [25].
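For illustration, ICC (2,1) can be computed from the mean squares of a two-way analysis of variance. The sketch below is not the SPSS or R code used in this study; it follows the Shrout and Fleiss formula for a subjects × readers matrix. The SEM function uses one common ANOVA-based definition (square root of reader plus residual variance), and whether this matches the estimator of reference [25] in every detail is an assumption. The FA values are hypothetical.

```python
import numpy as np

def anova_mean_squares(ratings):
    """Mean squares of a two-way ANOVA for an (n subjects x k readers) matrix."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)   # between readers
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_err / ((n - 1) * (k - 1))
    return n, k, bms, jms, ems

def icc_2_1(ratings):
    """Single measurement, absolute agreement, two-way random effects ICC."""
    n, k, bms, jms, ems = anova_mean_squares(ratings)
    return float((bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n))

def sem_agreement(ratings):
    """Assumed SEM definition: sqrt(reader variance + residual variance)."""
    n, k, bms, jms, ems = anova_mean_squares(ratings)
    var_readers = max((jms - ems) / n, 0.0)  # clamp negative variance estimates
    return float(np.sqrt(var_readers + ems))

# Hypothetical FA values: rows are subjects, columns are reader 1 and reader 2
fa = [[0.48, 0.50], [0.55, 0.54], [0.51, 0.53], [0.44, 0.45], [0.58, 0.56]]
print(f"ICC(2,1) = {icc_2_1(fa):.2f}, SEM = {sem_agreement(fa):.3f}")
```

The same functions can be applied to a subjects × scans matrix of one reader; the mixed effects variant used for test-retest reliability shares this point estimate and differs in interpretation.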

Results

Averaged over all measurements, the mean ± standard deviation FA was 0.507 ± 0.05, MD was 1308.5 ± 162.4 × 10⁻⁶ mm²/s, RD was 905.6 ± 145.4 × 10⁻⁶ mm²/s, and AD was 2114.1 ± 219.2 × 10⁻⁶ mm²/s. Testing with ANOVA revealed no significant differences between mean values of different scans and different readers for all DTI parameters. Detailed descriptive statistics of DTI parameters for all three scans and both readers are presented in Table 1 and visualized as scatter dot plots in Supplementary Fig. 1.

Table 1 Descriptive statistics of DTI measurements

The ICCs for interreader and test-retest reliability are presented in Table 2. For interreader reliability they ranged from 0.81 to 0.92, implying good to excellent agreement according to Koo and Li [32]. The ICCs for test-retest reliability had only slightly lower values and ranged from 0.76 to 0.92, also implying good to excellent agreement.

Table 2 Intraclass correlation coefficients (ICCs) for interreader and test-retest reliability

The SEM values for interreader reliability and test-retest reliability are presented in Table 3. The SEM for FA was 0.02 for both interreader and test-retest agreement. The SEM for MD, RD, and AD ranged between 46.2 × 10⁻⁶ mm²/s and 70.1 × 10⁻⁶ mm²/s for interreader reliability and between 45.9 × 10⁻⁶ mm²/s and 70.1 × 10⁻⁶ mm²/s for test-retest reliability. These SEM values can also be put in relation to the overall mean values, which results in a measurement error of approximately 3.9% for FA, 3.8% for MD, 5.1% for RD, and 3.3% for AD for interreader reliability and 3.9% for FA, 3.7% for MD, 5.1% for RD, and 3.3% for AD for test-retest reliability.
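The relative measurement error is the SEM divided by the overall mean. The short sketch below reproduces this arithmetic for the interreader values. The SEM of FA is stated in the text; assigning the lower bound of the reported SEM range to RD and the upper bound to AD is an assumption made here because it is consistent with the stated percentages, and Table 3 remains the authoritative source.

```python
def relative_sem_percent(sem, mean):
    """SEM expressed as a percentage of the overall mean value."""
    return 100.0 * sem / mean

fa_pct = relative_sem_percent(0.02, 0.507)
# Assumed assignment of the SEM range bounds (units of 1e-6 mm^2/s), see lead-in
rd_pct = relative_sem_percent(46.2, 905.6)
ad_pct = relative_sem_percent(70.1, 2114.1)

print(f"FA {fa_pct:.1f}%, RD {rd_pct:.1f}%, AD {ad_pct:.1f}%")
```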

Table 3 Standard error of measurement (SEM) for interreader and test-retest variability

Bland-Altman plots for interreader reliability of all DTI parameters are shown in Fig. 3. Bland-Altman plots for test-retest reliability are shown in Fig. 4 (for FA) and in Supplementary Figs. 2–4 (for MD, RD and AD). As shown in the Bland-Altman plots, measurement error mainly occurs as a random fluctuation and not as a systematic error between scans or readers (bias). Moreover, visual assessment of Bland-Altman plots does not reveal a proportional bias (e.g. systematically higher measurement error for higher or lower FA, MD, or AD values).
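The quantities displayed in a Bland-Altman plot can be sketched as follows. This is the basic version for paired measurements and not the repeated-measurements variant [33] used in this study; the slope of the differences against the means is added here as one possible numerical check for proportional bias, and the FA values are hypothetical.

```python
import numpy as np

def bland_altman(a, b):
    """Bias, limits of agreement and proportional-bias slope for paired data."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a - b               # y-axis of the plot
    mean = (a + b) / 2.0       # x-axis of the plot
    bias = diff.mean()         # systematic difference between the two series
    sd = diff.std(ddof=1)      # spread of the random differences
    slope = np.polyfit(mean, diff, 1)[0]  # trend of the differences across the value range
    return {
        "bias": float(bias),
        "lower": float(bias - 1.96 * sd),
        "upper": float(bias + 1.96 * sd),
        "slope": float(slope),
    }

# Hypothetical FA values of two readers for the same four subjects
reader1 = [0.50, 0.52, 0.48, 0.55]
reader2 = [0.51, 0.50, 0.49, 0.53]
result = bland_altman(reader1, reader2)
print(result)
```

A bias close to zero indicates the absence of a systematic error, and a slope close to zero argues against a proportional bias.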

Fig. 3

Bland-Altman plots for assessment of interreader reliability of all DTI parameters with single values for bias (grey line) as well as for upper and lower limits of agreement (dotted lines). a FA fractional anisotropy, b MD mean diffusivity, c RD radial diffusivity, d AD axial diffusivity

Fig. 4

Bland-Altman plots for assessment of test-retest reliability of fractional anisotropy (FA) values for both observers and all three scans, respectively. Plots as typically illustrated with single values for bias (grey line) as well as for upper and lower limits of agreement (dotted lines). a, b Scan 1 vs. (versus) Scan 2; c, d Scan 2 vs. Scan 3; e, f Scan 1 vs. Scan 3 for Reader 1 and 2, respectively

Discussion

This study assessed the interreader and test-retest reliability of peripheral nerve DTI in a cohort of 27 healthy participants. The standard error of measurement (SEM) for all major DTI parameters is reported and thus orientation values are provided, which quantitatively describe the imprecision associated with repeated DTI measurements.

Until now, the precision of peripheral nerve DTI has been assessed in several studies [15,16,17,18,19,20,21,22], examining the median nerve [15, 18, 19], the brachial plexus [16, 17], the tibial and peroneal nerve [20], the sciatic nerve [21] as well as the lumbar and sacral nerves [22]. The design of these studies was heterogeneous and while most assessed interreader reliability, only three [16, 20, 22] assessed the test-retest reliability by repeated MR scans with identical parameters. All existing studies came to the conclusion that peripheral nerve DTI is reliable in principle, and this conclusion is mainly based on the analysis of intraclass correlation coefficients (ICCs). The findings of the present study are in line with this conclusion as ICCs between 0.81 and 0.92 are reported for interreader and between 0.76 and 0.91 for test-retest reliability, which can be interpreted as good to excellent agreement [32].

The ICC is a dimensionless value between 0 and 1 and its calculation is based on the relation of the variance of the examined parameter in the population and its variance due to measurement error [25]. While ICC allows a general classification of reliability in terms of broad categories, it always has to be considered in conjunction with the variance of the assessed parameter in the examined population and not as an inherent property of the test itself.

The SEM is a statistical parameter that describes measurement precision independently of the variance in the population [24, 25]. It represents the standard deviation of the measurement error and while some SEM values have been briefly reported for interreader FA and apparent diffusion coefficient (ADC) values of the brachial plexus in one study [17], a systematic assessment of SEM in peripheral nerve DTI was still lacking. This study reports SEM values for both interreader and test-retest reliability of all major DTI parameters. The results of the present study may serve as orientation values when interpreting DTI parameters on an individual patient level.

Assuming a normal distribution of the measurement error, the SEM may be used to calculate a CI around a measured value that covers the true value with 95% probability (95% CI: measured value ±1.96 × SEM) [25]. Moreover, the SEM allows calculation of the minimum detectable difference (MDD = 1.96 × √2 × SEM ≈ 2.8 × SEM), which might be particularly useful when DTI parameters are used as biomarkers for follow-up of neuropathies. When the change of the measured parameter is larger than the MDD, it is likely due to a true change in the measured parameter and thus suggestive of a true nerve pathology. Otherwise, the change would probably only indicate a fluctuation in the observed parameter due to an accumulation of errors during recording (e.g. bias due to signal-to-noise ratio, partial volume effect that increases as nerve size decreases, motion artifacts) and/or postprocessing (e.g. segmentation errors) [25].
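Both quantities follow directly from the SEM, as the sketch below shows. The factor 2.8 is the rounded product of 1.96 and √2, where √2 accounts for the fact that a difference between two measurements carries the error of both. The measured FA value in the example is hypothetical; the SEM of 0.02 is the FA value reported in this study.

```python
import math

def ci95(measured, sem):
    """95% confidence interval around a single measured value."""
    half_width = 1.96 * sem
    return measured - half_width, measured + half_width

def mdd(sem):
    """Minimum detectable difference between two measurements."""
    return 1.96 * math.sqrt(2.0) * sem

SEM_FA = 0.02
low, high = ci95(0.45, SEM_FA)   # hypothetical measured FA of 0.45
print(f"95% CI: {low:.3f} to {high:.3f}")
print(f"MDD for FA: {mdd(SEM_FA):.3f}")
```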

Among all DTI parameters, the FA is the most commonly reported parameter and regarded as a general measure of nerve tissue injury. In this study an SEM = 0.02 was found for both test-retest and interreader reliability of FA. This implies an MDD of 2.8 × 0.02 = 0.06. These values may be regarded in conjunction with mean values of peripheral nerve FA that have been reported for various peripheral nerve diseases in multiple studies (see Table 4; [3,4,5,6,7,8, 10, 14, 30, 34,35,36,37,38,39,40,41,42,43,44,45,46,47,48]). In these studies, mean FA differences between healthy and diseased nerves ranged between 0.06 and 0.25, thus they are at least in the range of the MDD reported here.

Table 4 Fractional anisotropy in various diseases as reported in previous studies of peripheral nerve DTI (nonexhaustive)

The authors are aware of at least two longitudinal studies of peripheral nerve DTI in humans. Simon et al. assessed DTI as a marker of disease progression in amyotrophic lateral sclerosis (ALS) and found tibial nerve FA to be reduced from 0.434 (baseline) to 0.415 (after 6 months) [49]. While this difference (0.019) was significant on a group comparison level, a similar difference in a single patient, in a hypothetical examination setting like the present study, would be below the MDD and might therefore not be reliably differentiated from pure measurement error. Lichtenstein et al., on the other hand, found that FA values in a clinically stable cohort of CIDP patients remained nearly unchanged over a follow-up period of 6 months and were still significantly lower compared to healthy controls (0.34 ± 0.01 at baseline; 0.35 ± 0.02 after 6 months; healthy controls 0.45 ± 0.01) [50]. To further elucidate the potential of DTI as a tool for individual patient follow-up, more longitudinal studies are needed that examine various pathologies and take into account the imprecision on an individual patient level.
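The comparison of an individual change with the MDD can be written as a simple check. The sketch below applies the FA SEM of the present sciatic nerve study to the group mean values reported by Simon et al. for the tibial nerve; as stated above, this transfer is hypothetical, and the second example uses invented values.

```python
def exceeds_mdd(baseline, follow_up, sem, factor=2.8):
    """True if the change between two measurements is larger than the MDD."""
    return abs(follow_up - baseline) > factor * sem

SEM_FA = 0.02

# Group mean FA values at baseline and after 6 months (Simon et al.)
als_change_detectable = exceeds_mdd(0.434, 0.415, SEM_FA)

# Invented single-patient example with a larger FA decrease
large_change_detectable = exceeds_mdd(0.50, 0.42, SEM_FA)

print(als_change_detectable, large_change_detectable)
```

A change that does not exceed the MDD is not necessarily absent; it only cannot be separated from measurement error in a single patient.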

This study has some limitations. First, the examined cohort was healthy and demographic variables were relatively homogeneous. It was intended to assess measurement error under optimal conditions, and it was decided to conduct this study in a young cohort since, in our experience, young subjects are easier to position and exhibit fewer motion artifacts than many older subjects. A lower variance of the DTI parameters in the cohort than in a less homogeneous group may have led to lower ICC values. A larger study including older subjects and also participants with peripheral nerve disease would be desirable. While this cohort consisted of male participants only, the results are not expected to depend on the subjects’ sex, as it has been shown that DTI parameters do not systematically differ between sexes [9]. Second, analysis was restricted to the sciatic nerve, which is well-suited for MRN. Analysis of other nerves with smaller caliber or oblique course, such as those in the upper extremities, may lead to higher measurement error. Therefore, the values given should be primarily used when performing MRN of the sciatic nerve. Moreover, patient positioning, MR examination and nerve segmentation were performed by experienced radiologists and the number of readers was two. Conditions in a clinical setting might differ and therefore the values of measurement error presented here should be considered as minimum values. Furthermore, the results are dependent on the segmentation method itself. It was chosen to perform the initial segmentation on high-resolution anatomical T2-weighted images where nerve morphology was accurately depicted and then the ROIs were transferred to the b0-image where they were adjusted. This method may use the advantages of high-resolution morphologic imaging while also compensating for possible distortion artifacts. A direct segmentation based on the diffusion maps, e.g. the FA map, would have the disadvantage that possible pathologies within the nerve could unintentionally be excluded due to their potentially reduced FA. Following segmentation, all DTI parameters were averaged from seven adjacent slices, similar to previous studies [3, 49, 50]. While averaging of multiple slices minimizes noise and segmentation errors, a relatively small coverage in the z-direction of approximately 2.7 cm reduces possible effects from B1-field inhomogeneities and from potential physiological alterations of DTI parameters along the course of the nerve [34, 51]. Also, despite being relatively small, the coverage is large enough for a robust re-identification of a nerve segment in relation to anatomical landmarks in follow-up examinations. The segmentation technique could be similarly applied in a clinical setting, e.g. in the follow-up of systemic neuropathies. In the future, autosegmentation methods will probably become more important since they have already been applied to the sciatic nerve with good accuracy [52]. The full potential of these methods remains to be elucidated. Finally, this study represents a single vendor study. Assessments of other manufacturers, studies with other sequence parameters, subject populations, segmentation methods and postprocessing software are desirable and the values of measurement precision presented here should be interpreted with caution.

In summary, this study assessed interreader and test-retest reliability of peripheral nerve DTI in a cohort of 27 healthy volunteers. This study confirms adequate reliability of peripheral nerve DTI in principle and by reporting SEM for all major DTI parameters it also quantitatively assesses the imprecision that is associated with different readers or multiple scans. The values presented here may serve as first orientation values of measurement precision when interpreting parameters of sciatic nerve DTI.