Keywords

1 Introduction

Tractography has significantly advanced clinical applications [5, 6, 25] and has enabled neuroscientists to study developmental and pathological effects on the human connectome [14]. Traditional tractography pipelines often use anatomical segmentations to obtain priors for reconstructing tracts from diffusion-weighted MRI (dMRI). This introduces a dependency on T1-weighted (T1w) images, which are required for anatomical segmentation by neuroimaging suites such as FreeSurfer [9]. For dMRI microstructural analyses, accurate segmentations of the grey/white matter (GM/WM) boundary are particularly important as different biophysical models have been proposed for each tissue type [22]. However, segmenting in T1w, rather than diffusion image space is problematic due to non-linear distortions between modalities, as well as potential registration inaccuracies and interpolation artifacts when mapping segmentation labels from anatomical to diffusion image space. Furthermore, enabled by acquisitions with high angular resolution and multiple b-values, dMRI-derived cytoarchitectonic boundaries may in the future complement or supersede T1w-derived segmentations for morphometric analyses. Addressing this need for fast, accurate, dMRI-based segmentation, we present a framework for segmenting 170 GM, WM, and subcortical regions in native diffusion space, without requiring high-quality T1w images.

Methods such as SLANT and FastSurfer [11, 12] introduce deep learning for neuromorphometry, yet still rely on T1w images. Traditional [13, 30, 31, 35] and deep learning-based [15, 19] methods extend segmentation to diffusion-weighted images (DWIs) for various acquisition protocols and dMRI representations. Applications of segmentation based on the inherently multi-channel dMRI signals include whole-brain GM/WM/Cerebrospinal fluid [31, 35], WM regions [19, 30], nuclei (cerebellar [15] and thalamic [13]), organs [3, 8, 26, 39], tumors [28] and stroke lesions [4, 20]. As an alternative approach to traditional tractography, neural networks can directly segment WM tracts based on dMRI [16, 17, 21, 23] or diffusion orientations [32, 33, 37, 38], from clinical [21] or high-quality [32] datasets. Unfortunately, segmenting WM tracts as volumetric labels does not provide an along-the-tract parameterization which is useful for point-wise analyses of microstructural and geometric features of the tracts. In contrast to direct, volumetric tract segmentation approaches, the present work aims to provide the information that guides traditional tractography methods. No work to date has addressed whole-brain segmentation, including subcortical structures, cortical regions, and WM regions underlying the cortex, directly from DWIs.

Here, we introduce deep learning-based segmentation of 170 distinct regions (cortical, subcortical, and WM) from DWIs (see Fig. 1). Thus, we provide a fast, deep learning alternative for a critical step in dMRI pre-processing streams, which can facilitate the use of classical tractography methods, without limiting their outputs and subsequent analysis options. Basing segmentations purely on dMRI data removes the dependency on high-quality T1w images, the potentially error-prone, cross-modal co-registration and interpolation, and avoids confounding non-linearities between anatomical and diffusion spaces. To develop effective segmentation of dMRI data, we compile a dataset of DWIs and reference segmentations generating the latter by mapping FreeSurfer segmentations [9] to diffusion space. We adopt the anatomy-targeted FastSurfer architecture [11], which already supports the segmentation of a large number of regions. Moreover, expanding on the work of Li et al. [17] for WM tract segmentation, we explore suitable dMRI data representations for learning-based segmentation. We compare inputs consisting of images without diffusion-weighting, diffusion tensor components, or DWIs and vary the number of DWIs from which tensors are generated.

When compared against FreeSurfer, our method achieves performance comparable to the state-of-the-art at orders of magnitude faster processing times. As a use case, we integrate dMRI-based segmentations into the tractography package TRACULA (TRActs Constrained by UnderLying Anatomy) [36], which performs global probabilistic tractography with anatomical priors. Differences of TRACULA tracts based on deep learning versus traditional anatomical initialization (see Fig. 4) are in the order of previously reported differences between automated TRACULA reconstructions and manual labels while reducing the TRACULA pre-processing run-time by 4 h. With our approach, the segmentation of 170 regions in dMRI space can be achieved in 32 s on a GPU.

Fig. 1
figure 1

Whole-brain segmentation of 170 regions (cortical, sub-cortical and white matter) directly from diffusion MRI (left), and probabilistic white matter tracts generated by TRACULA based on anatomical priors from dMRI-based segmentations (right)

2 Materials and Methods

The tract posterior probabilities computed by TRACULA involve anatomical priors that are calculated from an anatomical segmentation. We perform this segmentation directly in diffusion space and remove the requirement for T1w images by replacing the corresponding component of the TRACULA pipeline with a deep learning network.

2.1 Data

2.1.1 Diffusion MRI Data

We use pre-processed DWIs from the WU-Minn Human Connectome Project (HCP) [7, 18, 24, 27, 29, 34], which are already corrected for eddy-currents and subject motion. These images are acquired with a 3-shell protocol at b-values of 1000, 2000, and 3000 s/mm\(^2\). Each shell is composed of 90 diffusion-encoding gradient directions, approximately uniformly distributed on the sphere. In addition, 18 images without diffusion-weighting are interleaved with the DWIs. For training, validation, and test, we create gender-balanced non-intersecting subsets of 250, 50, and 100 subjects, respectively.

2.1.2 Segmentation Labels

Anatomical segmentations of T1-weighted images of the same subjects are obtained with FreeSurfer 6.0 [9]. We project cortical parcellations from the surface models up to 2 mm deep into the WM, as required by TRACULA. The registration to the diffusion space is performed with the boundary-based rigid registration method bbregister [10].

2.2 Data Representations

Since q-space sampling schemes may comprise anywhere from six to several hundred measurements, a general segmentation approach should be independent of the exact choice of diffusion-encoding directions and b-values. Instead, a suitable representation has to abstract from acquisition details yet contain sufficient relevant information. A parsimonious model that is often fitted to DWIs acquired on shells is the diffusion tensor [1], which models local diffusion as a single (uni-modal) Gaussian distribution. The symmetric \(3 \times 3\) diffusion tensor can be understood as a condensed summary of the local diffusion behavior at a given voxel and can be reconstructed from any q-shell acquisition scheme that includes at least 6 directions. To explore how the number of DWIs used to fit the tensor affects its performance for the segmentation task, we extract multiple subsets of gradient directions on the same shell (approximately uniformly distributed). For each of these subsets, the diffusion tensor is fitted to the data with FSL’s dtifit function and the six unique tensor components are stacked and used as input.

2.3 Architecture

FastSurferCNN [11] is a U-Net-based neuroimage segmentation network validated extensively on anatomical MRI datasets. Three fully convolutional networks are trained independently on axial, coronal, and sagittal slices of MR images. The predictions from the three views are then combined into the final prediction volume by means of a weighted average (view-aggregation). The network uses skip-connections between encoder- and decoder-blocks. In the decoder-blocks, information from the previous decoder-block and the corresponding encoder-block are combined. Instead of simply concatenating these feature maps, FastSurferCNN employs competitive dense blocks to reduce the number of parameters. Competitive dense blocks rely on max-out activations to encourage the network to learn which parts of the provided feature maps are relevant for the segmentation task.

All networks are trained with a combined loss-function containing a median frequency balanced logistic loss with edge-focus and a Dice loss:

$$\mathcal {L} = \underbrace{-\sum _x \omega (x)g_c(x)log(p_c(x))}_{\text {Logistic Loss}}-\underbrace{\frac{2\sum _x p_c(x) g_c(x)}{\sum _x p_c^2(x)+\sum _x g_c^2(x)}}_{\text {Dice Loss}}$$

with \(\omega (x) = \omega _F(x)+\omega _E(x)\), median frequency balanced weights \(\omega _F(x)\), edge-weighting \(\omega _E(x)\) at voxel x, references g, prediction p and class c. In order to provide 3D spatial context, FastSurferCNN’s input consists not only of the slice of interest but also a sequence of neighboring slices.

2.4 Training

We train all networks with an initial learning rate of 0.01, which is reduced every 10 epochs (multiplied by 0.2). Early stopping is applied when the loss on the validation set does not improve for 15 epochs.

2.5 Tracts

To calculate priors for tractography, TRACULA needs registrations between anatomical, diffusion and MNI spaces. Since we provide the segmentation natively in diffusion space, a registration to anatomical T1w space is obsolete and a single registration from diffusion to MNI space suffices. This mapping can easily be established without a T1w image, yet the two different registration-methods (one includes the anatomical image whereas the other does not) introduce an additional variance that prevents a consistent evaluation of tract similarity. We, therefore, re-use the mapping from diffusion to MNI space that was established to generate the reference tracts. Furthermore, to mitigate artifacts from nearest-neighbor interpolation to large voxel sizes, we instead predict and map probability distributions over classes (soft-labels) via tri-linear interpolation, taking the argmax over classes in the target space.

2.6 Evaluation Criteria

2.6.1 Segmentation Quality

We measure the similarity of the segmentation resulting from our method and the segmentation of FreeSurfer mapped to diffusion space, with two evaluation criteria.

The Dice score \(\mathcal {D}_c\) for region c measures the relative overlap between the binary labels of the prediction \(P_c = \{p_{ic} \mid i=1,\ldots ,N\}\) and the reference \(R_c=\{r_{ic} \mid i=1,\ldots ,N\}\) segmentation:

$$\mathcal {D}_c(P, R) = \frac{2 \sum _{i=1}^N p_{ic} r_{ic}}{\sum _{i=1}^N p_{ic} + \sum _{i=1}^N r_{ic}}.$$

The voxel-based mean Hausdorff distance \(\mathcal {H}_c\) for region c measures the difference between prediction \(P_c\) and reference labels \(R_c\) via

$$\mathcal {H}_c(P, R) = \frac{1}{|R_c|}\sum _{r\in R_c} \min _{p \in P_c} ||p-r||_2 + \frac{1}{|P_c|}\sum _{p \in P_c} \min _{r \in R_c} ||p-r||_2.$$

2.6.2 Tractography

Similarly to the segmentation case, we quantify the similarity between pairs of tracts reconstructed with anatomical priors from either segmentation via voxel-based mean Hausdorff distance. Since TRACULA relies on a Markov-Chain Monte-Carlo method, two different sets of tracts are not per se comparable. Thus, in accordance with previous tract evaluations [36, 40], we threshold the tracts at 20% of their maximum intensity.

3 Results and Discussion

To determine the impact on segmentation quality, we evaluate networks with respect to different data subsets and input representations, keeping the network architecture (e.g. number of filters) fixed.

Fig. 2
figure 2

Ablation of neural network input in comparison to FreeSurfer reference segmentation. Evaluation 1: Q-Space Sampling Density (top group of blue bars): 1. Only \(b=0\) image, 2.–5. \(b=0\) + diffusion tensors fitted with varying sampling density (2.–4. only on the first shell with \(b={1000\,\mathrm{\text {s}/\text {m}\text {m}^2}}\), 5. on three shells); Evaluation 2: dMRI data representation (bottom group of green bars) of a fixed set of 30 DWIs on the first shell: \(b=0\) plus 6. an FA map, 7. the diffusion tensor, and 8. DWIs directly

3.1 Evaluation 1: Q-Space Sampling Density

When q-space is sampled more densely, we expect the diffusion tensor to be more accurate due to the improved signal-to-noise ratio (SNR). To explore how different single-shell q-space samplings influence the segmentation quality, we compare FreeSurfer segmentations against our network’s prediction for several scenarios which are displayed in Fig. 2.

The Dice score seems to correlate with the compactness of the shape of anatomical regions. Scores are high for sub-cortical regions, where shapes feature large volume-to-surface ratios, lower in the folded cortical areas, and lowest for the thin cortical projections into the WM. The slim, 2 mm projection on a coarse voxel grid with 1.25 mm isotropic size could be another reason for the smaller Dice scores in WM regions. In fact, the ratio of voxels on the region boundary is significantly higher for white matter regions compared to cortical regions. The Hausdorff distances paint a very similar picture in terms of ranking methods while they are more consistent than Dice scores across WM- and cortical regions.

As expected, the segmentation quality increases when the diffusion tensor is fitted with more DWIs (1.–5.). The segmentation based solely on the image without diffusion-weighting (\(b=0\)) provides a strong baseline, potentially due to the higher SNR at \(b=0\). Yet, the inclusion of additional diffusion information increases segmentation performance further.

3.2 Evaluation 2: Input Representations

We also assess the effect of different input data on the segmentation quality (see Fig. 2, second, green bar group: 6.–8.). Fractional Anisotropy (FA) measures the coherence of water diffusion in a voxel and is frequently used in tract-based analyses. Since FA is a scalar measure computed from the eigenvalues of the tensor, it only contains a subset of the tensor information. As a result, the performance is worse with FA (6.) than with the full tensor (7.). More broadly, the diffusion tensor is a simple model that fails to accurately describe water diffusion in full detail. Thus, it is not surprising that a segmentation directly based on the DWIs (8.) yields better results than one based on the diffusion tensor. However, for the task of segmentation, the diffusion tensor seems to capture most of the relevant information present in the DWIs.

3.3 Evaluation 3: Generalization

In the previous evaluation, networks were trained separately for each set of inputs. This evaluation explores how a network trained on tensor components based on n DWIs performs when it is evaluated on tensor components based on m DWIs (with \(n \ne m\)). This kind of generalizability is a critical property when accommodating data with a variety of acquisition details at test time. Notably, while neither network generalizes perfectly to the different input format, the stable results (Fig. 3) suggest that generalizability can be asserted to a large extent.

Fig. 3
figure 3

Network generalization when tensors are fitted with differently many DWIs

3.4 Evaluation 4: Tract Similarity

Finally, we assess the stability of WM tract generation when switching from traditional T1w image segmentation to our dMRI-based, deep learning approach: For a gender-balanced subset of 20 subjects from the HCP, we reconstruct 18 different WM tracts with TRACULA, using either the proposed, dMRI-based segmentations or the FreeSurfer T1w-based segmentations (see Fig. 4). The deviation is within the margin of the deviation of TRACULA’s tracts from manually-annotated bundles [36], which is around 2 mm for most tracts.

Fig. 4
figure 4

Similarity of tracts based on DWI segmentations versus FreeSurfer’s segmentation. L. \(=\) Left, R. \(=\) Right, sup. \(=\) superior, ant. \(=\) anterior, inf. \(=\) inferior, TE \(=\) temporal endings, PE \(=\) parietal endings

We time both applications on five representative cases (Evaluation 1 and 4). Our method takes 32 s for the anatomical segmentation in diffusion space compared to 4 h with FreeSurfer (parallelization of hemispheres and 4 threads, 7 h sequentially). For the tractography pipeline, our work accelerates the total run time from 283 to 56 min.

4 Conclusion

Our work presents and analyzes the application of deep learning for anatomical segmentation directly on DWIs. Applied to probabilistic tractography with anatomical priors, our method enables processing without the requirement of T1w images and thus avoids errors from non-linear distortions, registration inaccuracies, or interpolation artifacts. As a consequence, dMRI-based anatomical segmentation achieves results similar to corresponding state-of-the-art T1w-based segmentation and speeds up pre-processing in the TRACULA pipeline by hours. Accelerating heavy processing pipelines is essential especially for large cohort studies such as HCP [29] or the Rhineland study [2], where large diffusion datasets from thousands of participants require efficient methods.

Furthermore, we analyze how the choice of input images for the neural network affects segmentation performance. We confirm that segmentation quality increases as more q-space samples are included when fitting diffusion tensors—likely due to increased SNR. Skipping the tensor fit and directly learning from DWIs increases segmentation performance further. However, simply increasing q-space samples is not an option due to memory limitations and reliance on the availability of the same set of q-space samples for future input cases. Tensor-based inputs, on the other hand, provide a widely applicable alternative [17] and yield results that remain relatively stable and close to the direct DWI performance. In our opinion, tensors fitted to 30 DWIs, a number that is feasible in clinical studies, offer a good balance. Future work will explore diffusion models other than the tensor as segmentation inputs and assess generalizability across a large variety of different dMRI datasets.

While we illustrate the use of our deep learning-based segmentation in a pipeline for probabilistic tractography with anatomical priors, it can be useful in a wide range of other applications. These include improved WM/GM and pallidum-putamen segmentation, seed-based tractography, network analysis, or ROI-based analysis of microstructural measures, to name a few.