Introduction

White matter pathways of the brain can be non-invasively visualized and quantitatively characterized using diffusion tensor imaging (DTI) [1]. Diffusion can be characterized using scalar quantities such as fractional anisotropy (FA), which describes the degree of diffusion anisotropy, and mean diffusivity (MD), which measures the orientationally averaged diffusivity [1, 2]. Maturation of brain white matter is characterized by increasing FA and decreasing MD [3, 4]. Previous studies have reported that premature infants have lower anisotropy and higher MD in several brain areas compared with full-term infants at term-equivalent age [5–7]. Lower FA and higher MD compared with normal have been associated with neurological abnormalities and disabilities in later development [8, 9].
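As an illustration of the two scalar quantities described above, the following minimal sketch computes MD and FA from the three eigenvalues of a diffusion tensor using the standard textbook definitions. The function names and the example eigenvalues are hypothetical and are not taken from this study.

```python
import math


def mean_diffusivity(l1, l2, l3):
    """Orientationally averaged diffusivity: the mean of the three eigenvalues."""
    return (l1 + l2 + l3) / 3.0


def fractional_anisotropy(l1, l2, l3):
    """FA: 0 for isotropic diffusion, 1 when diffusion occurs along one axis only."""
    md = mean_diffusivity(l1, l2, l3)
    numerator = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    denominator = l1 ** 2 + l2 ** 2 + l3 ** 2
    if denominator == 0.0:
        return 0.0  # no diffusion at all: treat as isotropic
    return math.sqrt(1.5 * numerator / denominator)


# Hypothetical eigenvalues in units of 10^-3 mm^2/s (illustrative only).
isotropic = (1.5, 1.5, 1.5)
anisotropic = (1.8, 0.9, 0.9)

for name, eigenvalues in (("isotropic", isotropic), ("anisotropic", anisotropic)):
    print(name, mean_diffusivity(*eigenvalues), fractional_anisotropy(*eigenvalues))
```

With equal eigenvalues FA is zero, and FA rises as one eigenvalue dominates the other two, which is the pattern expected as white matter matures.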

The typical way to analyse DTI images is to manually fit ROIs (regions of interest) to selected brain regions and measure parameter values [10–12]. The location of ROIs is usually based on anatomical information from the b0 images, the direction-coded anisotropy maps or conventional sequences obtained at the same session as the DTI imaging [4, 13–15]. The accuracy of the ROI location is based on the anatomical knowledge of the researcher performing the measurements. In several studies, fixed-shape, fixed-size ROIs have been used. However, better intra-observer and inter-observer reproducibility of parameter value measurements has been reported using anatomically shaped ROIs [16]. The value of these measurements is highly influenced by the reproducibility and validity of the measurements. Another way to analyse DTI images is to use group-level methods such as TBSS (tract-based spatial statistics) [17].

Reproducibility studies of DTI parameter value measurements were first done to examine the variability of FA and MD values when imaging was done with MRI scanners from different manufacturers, and significant variation of mean values was found [18]. Reproducibility has also been examined between repeat imaging sessions with the same equipment, e.g. in hippocampal areas, and excellent reproducibility was found [19]. However, parameter value reproducibility may have significant regional variability [20–22]. Most of the reproducibility studies of DTI parameter value measurements have been performed on DTI images of adult brains [19, 22]. However, reproducibility may vary in infants compared with adults as FA and MD measurements are more susceptible to partial volume artefacts in infants. This is because brain structures are smaller and still maturing. Reproducibility of FA and MD value measurement of the pyramidal tracts has been reported in term infants [23]. However, regional variability between different white matter tracts has not been reported in infants.

The aim of this study is to measure intra-observer and inter-observer reproducibility of ROI-based FA and MD measurements of the brain areas in prematurely born infants imaged at term. One DTI sequence was obtained and all measurements were performed on the same dataset. Intra-observer reproducibility was based on repeated ROI measurements performed by the same observer. Inter-observer reproducibility was investigated by comparing ROI measurements between two observers. Measurements were performed in several brain regions. In this study, we used known anatomy to draw ROIs.

Materials and methods

Subjects

This study is a part of the Development and Functioning in Very Low Birth Weight Infants from Infancy to School Age (PIPARI) study at Turku University Hospital. Inclusion criteria for the PIPARI study were birth at gestational age 31 weeks and 6 days or below, or birth weight < 1,500 g. Imaging was performed with a 1.5-T MRI scanner using a DTI sequence. Exclusion criteria were death during the neonatal period, major congenital anomalies or recognized syndromes. The total number of infants included was 76. Gestational ages at birth ranged from 23 weeks and 3 days to 34 weeks and 3 days. MRI was performed once at term-equivalent age. The study protocol was approved by the Ethics Review Committee of the hospital. All families gave informed consent.

MRI

The MRI was performed with a 1.5-T MRI system (Gyroscan Intera CV Nova Dual; Philips Medical Systems, Best, The Netherlands) with a SENSE head coil. MRI was done during postprandial sleep without any pharmacological sedation. The infants were swaddled to calm them and to reduce movement artefacts. A pulse oximeter was routinely used during the scan. A physician attended the examination if necessary to monitor the infant. Ear protection was used (3M disposable ear plugs 1100; 3M, Brazil/Wurth hearing protector, no. 899 3000 232; Wurth, Böheimkirchen, Austria).

The sequence used for diffusion imaging was single-shot echo-planar imaging with SENSE (TR/TE, 2,264/68 ms). The axial slice thickness was 5 mm, with a gap between slices of 1 mm. A 200-mm-square field of view (FOV) was used. The imaging matrix was 112 × 89 and the reconstructed voxel size 0.78 mm × 0.78 mm. The number of signal averages was 2, the SENSE reduction factor 2 and the EPI factor 47. The b values used for diffusion weighting were 0, 600 and 1,200 s/mm², with 15 directions. However, only images with b values 0 and 600 s/mm² were used for the analysis. Fat suppression was done using spectral presaturation with inversion recovery (SPIR). In addition, the imaging protocol included conventional T1-weighted, T2-weighted and FLAIR sequences.
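The text does not state how diffusivity was derived from the images with b values 0 and 600 s/mm². As a minimal sketch, assuming the standard mono-exponential signal model S_b = S_0 · exp(−b · ADC), the apparent diffusivity along one gradient direction can be obtained from the two signal intensities as follows. The function name and the signal values are hypothetical.

```python
import math


def apparent_diffusivity(s0, sb, b):
    """Apparent diffusivity (mm^2/s) along one direction, assuming
    the mono-exponential model S_b = S_0 * exp(-b * ADC)."""
    if s0 <= 0 or sb <= 0 or b <= 0:
        raise ValueError("signal intensities and b value must be positive")
    return math.log(s0 / sb) / b


# Hypothetical signal intensities in arbitrary units (not data from this study).
adc = apparent_diffusivity(s0=1000.0, sb=450.0, b=600.0)
print(f"ADC = {adc:.3e} mm^2/s")
```

In a tensor fit the same relation is applied to all gradient directions at once, and MD and FA are then derived from the eigenvalues of the fitted tensor.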

Processing and regions of interest

Image analysis was done using PRIDE V4 Fiber Tracking 4.1 beta 4 (Philips Medical Systems, Best, The Netherlands). ROIs were drawn based on anatomical structures on the directionally encoded anisotropy colour maps. For anatomical reference, b0 images were used. ROI size and shape varied between different structures. The smallest ROIs were drawn to enclose the colliculus inferior and the largest to enclose the cortex calcarinus or the posterior white matter.

The ROIs were drawn to enclose the anterior and posterior part of corpus callosum, posterior part of capsula interna, corona radiata, putamen, thalamus, radiatio optica, colliculus inferior, cortex calcarinus and frontal and posterior white matter. Bilateral structures were measured individually. FA and MD were measured. ROIs were drawn twice by one observer (observer A, medical physicist supervised by neuroradiologist with 2 years’ experience in neuroradiology) to evaluate intra-observer reproducibility and once by a second observer (observer B, senior neuroradiologist with 13 years’ experience in neuroradiology) to evaluate inter-observer reproducibility. ROI placements are shown in Fig. 1. Observers scored the image quality independently. Only images that both observers agreed were of acceptable quality were used in this study. Images from seven infants were removed from this study due to movement artefact. Additionally, if image quality was insufficient in some slices of the whole DTI slice set, no measurements were made from those individual slices. Because of this, the number of ROIs varied among different structures.

Fig. 1

Position of ROIs demonstrated on colour-coded fractional anisotropy maps and b0 image. (a) Colliculus inferior, (b) anterior part of the corpus callosum (arrowhead), putamen (short arrow), internal capsule, thalamus (long arrow), optic radiation (short wide arrow) and calcarine cortex (long wide arrow), (c) posterior part of the corpus callosum, (d) corona radiata and (e) frontal white matter and posterior white matter (arrow)

Statistical analysis

Intra-observer and inter-observer reproducibility were assessed using the intra-class correlation coefficient (ICC) and the limits of agreement proposed by Bland and Altman [24]. We decided to use both methods since they can provide inconsistent results [25]. The ICC reflects relative homogeneity within the observers or measurements of one observer in relation to the total variation. ICCs were calculated between observers A and B and between the first and second measurements of observer A using repeated measurements ANOVA. Reproducibility was classified as “excellent” when ICC was greater than 0.75, “fair to good” when ICC was more than 0.4 but less than 0.75 and “poor” when ICC was less than 0.4 [26]. Variation in ROI size was taken into account in the analyses and the association between ROI size and measurements was also assessed. In addition, the differences in ROI size between the observers were tested using repeated measurements ANOVA. Residuals were checked for justification of the analysis and logarithmic or power transformations of the variables were used in the analyses when appropriate. P < 0.05 was considered statistically significant. Statistical analyses were performed using SAS System for Windows, version 9.1.3 (SAS Institute, Cary, NC, USA).
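The exact ICC form used in the SAS analysis is not specified above, and the adjustment for ROI size is not reproduced here. As a hedged illustration only, the sketch below computes a two-way random-effects, absolute-agreement, single-measurement ICC (often written ICC(2,1)) from ANOVA mean squares and applies the classification thresholds given in the text. The function names are hypothetical, and assigning the boundary values 0.4 and 0.75 to “fair to good” is an assumption, since the text leaves them unassigned.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per subject and one column per observer.

    Assumed form: two-way random effects, absolute agreement, single measurement.
    """
    n = len(ratings)      # number of subjects (ROIs measured)
    k = len(ratings[0])   # number of observers or repeated sessions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-subject mean square
    ms_cols = ss_cols / (k - 1)               # between-observer mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def classify_icc(icc):
    """Thresholds from the text; boundary handling is an assumption."""
    if icc > 0.75:
        return "excellent"
    if icc >= 0.4:
        return "fair to good"
    return "poor"


# Hypothetical FA values for five subjects measured by two observers.
fa_pairs = [[0.21, 0.23], [0.35, 0.33], [0.28, 0.30], [0.40, 0.37], [0.25, 0.27]]
value = icc_2_1(fa_pairs)
print(round(value, 3), classify_icc(value))
```

For intra-observer reproducibility the two columns would hold the first and second measurements of the same observer instead of two different observers.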

Results

MRI was performed at term-equivalent age. Gestational ages at MRI ranged from 39.1 to 44.1 (mean ± SD, 40.1 ± 0.63) weeks.

The intra-observer reproducibility of FA measured with ICC was excellent bilaterally in the calcarine cortex and in the frontal white matter on the left side. In other structures, intra-observer reproducibility of FA was fair to good. The intra-observer reproducibility of MD was excellent in the anterior part of the corpus callosum and bilaterally in the posterior limb of the internal capsule, corona radiata, putamen and frontal white matter. In the optic radiation, the reproducibility was excellent on the left and in the thalamus and calcarine cortex on the right. In other structures, the intra-observer reproducibility of MD was fair to good. Results of the ICC measurements are summarized in Table 1. Measured values for FA and MD are shown in Figs. 2 and 3.

Table 1 Intra-observer and inter-observer reproducibility for fractional anisotropy (FA) and mean diffusivity (MD)
Fig. 2

Fractional anisotropy (FA) of different brain areas in preterm infants at term-equivalent age (median; 25th and 75th centiles). White bar observer A (first measurement), light-grey bar observer A (second measurement), dark-grey bar observer B (cc corpus callosum, ic internal capsule, cr corona radiata, or optic radiation, coll inferior colliculus, caco calcarine cortex, WM white matter)

Fig. 3

Mean diffusivity (MD) in different brain areas in preterm infants at term-equivalent age (median; 25th and 75th centiles). White bar observer A (first measurement), light-grey bar observer A (second measurement), dark-grey bar observer B (cc corpus callosum, ic internal capsule, cr corona radiata, or optic radiation, coll inferior colliculus, caco calcarine cortex, WM white matter)

Inter-observer reproducibility of FA measured with ICC was excellent in the posterior part of the corpus callosum. In most of the structures, inter-observer reproducibility of FA was fair to good, but in the frontal and posterior white matter on the right, it was poor. The inter-observer reproducibility of MD was excellent in the posterior part of the corpus callosum, bilaterally in the posterior limb of the internal capsule and in the corona radiata on the right side. In most of the structures, inter-observer reproducibility of MD was fair to good. In the inferior colliculus on the right side the reproducibility was poor.

Bland-Altman analysis showed good results for the measured structures based on visual review of the Bland-Altman scatter plots. The scatter plots showed no considerable bias, and the variance was independent of the mean value. Limits of agreement varied between structures. The differences between measurements were smallest for FA in the calcarine cortex and for MD values in the internal capsule for both intra-observer and inter-observer comparisons. The largest differences between measurements were in the corpus callosum for FA for both intra-observer and inter-observer comparisons. For inter-observer comparisons, the difference in MD values was largest in the posterior white matter and for intra-observer comparisons in the posterior part of the corpus callosum. Scatter plots of the posterior limb of the internal capsule and the inferior colliculus are shown in Fig. 4. Limits of agreement for all measured structures are shown in Table 2.
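For reference, the quantities plotted in a Bland-Altman analysis can be sketched as follows: the bias is the mean of the paired differences, and the limits of agreement are conventionally taken as the bias ± 1.96 standard deviations of the differences. The function name and the example values are hypothetical and are not data from this study.

```python
import statistics


def bland_altman(first, second):
    """Bias and 95% limits of agreement for paired measurements.

    Returns the per-pair means (x-axis) and differences (y-axis) as well,
    so that a scatter plot can be drawn from the result.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2.0 for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return {
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
        "means": means,
        "diffs": diffs,
    }


# Hypothetical MD values (10^-3 mm^2/s) from two observers for the same ROIs.
observer_a = [1.00, 1.10, 1.05, 1.20, 0.95]
observer_b = [1.02, 1.07, 1.08, 1.18, 0.97]
result = bland_altman(observer_a, observer_b)
print(result["bias"], result["lower"], result["upper"])
```

Plotting the differences against the means shows whether the bias is close to zero and whether the spread of the differences changes with the magnitude of the measurement.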

Fig. 4

Bland-Altman scatter plots of the inter-observer mean diffusivity values in the right (a) and left (b) posterior limb of the internal capsule and the right (c) and left (d) inferior colliculus. On the y-axis is the difference between observers with limits of agreement (dotted lines) and bias (solid line); on the x-axis is the mean value of the two observers

Table 2 Limits of agreement for FA and MD

ROI size varied significantly and systematically between observers. Observer A’s ROIs were always smaller than those of observer B. Observer A’s ROIs were smaller on the second measurement compared with the first for most of the structures. When intra-observer reproducibility was assessed using ICC, ROI size had a significant effect on FA values in the inferior colliculus and in the frontal white matter on the left side. For MD values, the effect was significant in the inferior colliculus bilaterally and in the posterior limb of the internal capsule on the left side. When inter-observer reproducibility was assessed using ICC, ROI size had a significant effect on FA in the anterior and posterior parts of the corpus callosum, the posterior limb of the internal capsule on the left side, the optic radiation on the right side and bilaterally in the inferior colliculus, putamen and thalamus. ROI size had a significant effect on MD values in the anterior and posterior parts of the corpus callosum, in the posterior limb of the internal capsule on the right side and bilaterally in the inferior colliculus.

Discussion

Our results suggest that FA and MD measurements of different brain areas are adequately reproducible at term-equivalent age when ROI fitting is done using prior anatomical knowledge and when variation in the ROI size is taken into account. Our results showed no systematic difference in reproducibility between white and grey matter areas or between the hemispheres. Both intra-observer and inter-observer reproducibility assessed with ICC and the Bland-Altman methods varied among structures and between FA and MD values. This reflects the fact that some brain areas, e.g. the internal capsule, can be anatomically delineated precisely. In these areas, reliable measurements can be performed repeatedly. In frontal or posterior white matter, the most representative area for ROI measurements is more challenging to delineate repeatedly. Also, the vicinity of the ventricles can cause additional variation in measurements.

We found intra-observer reproducibility for MD generally better than for FA when assessed with ICC. Half of the reproducibility values for MD were classified as excellent and the rest as fair to good. Most values were classified as fair to good for FA. Different ROI size, shape and positioning relative to surrounding tissues may cause variation between measured FA and MD values. This may cause poorer reproducibility. The effect of ROI size is due in part to volume and spatial variation of parameter values within the brain. The effect of ROI size was quite systematic only in the inferior colliculus. One possible explanation is the small size of this structure.

Inter-observer reproducibility was similar for both parameters. In most structures reproducibility was fair to good when assessed with ICC. However, the inter-observer reproducibility was poor in the right frontal and posterior white matter for FA and in the right inferior colliculus for MD. The poor reproducibility values may be due to different ROI shapes or to positioning relative to white and grey matter, as the ROI size did not affect values of FA and MD in the frontal and posterior white matter. In addition to ROI positioning, the poor reproducibility of MD in the right inferior colliculus may be due to the small size of the target and to pulsatility. Pulsatile brain motion can artificially increase MD values or increase standard deviation in structures adjacent to ventricles and inferior to the corpus callosum [27–29]. This may cause spatial heterogeneity between voxels in these structures. However, it is unclear why the effect was not bilateral.

In most structures, ICC and limits of agreement gave similar results for intra-observer and inter-observer reproducibility. Results were inconsistent in the thalamus, cortex calcarinus and frontal and posterior white matter for FA values. For MD values, results were also inconsistent in the corpus callosum and in the optic radiation.

The first studies to evaluate intra-observer and inter-observer reproducibility in adults revealed high reproducibility in the hippocampal area [19]. However, regional distribution of reported reproducibility values varies among studies. Compared with our results, higher intra-observer (CV ≤ 2.7% and ICC ≥ 0.96) and inter-observer (CV ≤ 2.7% and ICC ≥ 0.90) reproducibility of FA and MD values was reported in the cerebral peduncle, anterior and posterior limb of the internal capsule, genu of the corpus callosum, superior corona radiata and cingulum [16]. However, a similar regional distribution to ours was observed in another study in the corpus callosum, corticospinal tract, internal capsules, basal ganglia and centrum semiovale [22]. Reproducibility varied from slight to substantial agreement [22]. Our results were similar to published reports in the internal capsule, but variation was larger in the corpus callosum [30]. In other structures, limits of agreement have not been published to our knowledge. Measured FA and MD values are similar to those reported in previous studies [15, 31, 32]. However, the deviation was greater. This is not unexpected as our patient population included infants with brain injury (including severe).

The main limitation of our study is image resolution, especially slice thickness. Decreasing slice thickness would benefit measurements in small structures in particular. Inter-slice gaps may have had a negative effect. Also, the ROI size may have caused variation between measured FA and MD values. This might have caused lower reproducibility. Another limitation is that we did not measure signal-to-noise. Decreasing signal-to-noise causes an upward bias in FA but has no significant effect on MD [33]. In addition to pulsatile brain motion, this could cause spatial heterogeneity of parameters, especially FA, which might have led to greater variation within measured values. Thus, different ROI positioning may have caused poorer reproducibility. It may be argued that stricter criteria for interpretation of correlations should have been used as the interpretation of ICC is not evidence-based. This fact and the possible effect of outliers might have led to overestimation of reproducibility.

Conclusion

Although parameter values are routinely reported in DTI studies, the reliability of these values needs to be evaluated. The reproducibility of anatomy-based ROI measurement of FA and MD was fair to good in this study. However, ranges for optimal ROI size in different brain regions and different patient groups might be useful for improving the intra-observer and inter-observer reproducibility. In future studies, the benefits and limitations of tractography-based ROI selection also need to be investigated in infants.