1 Introduction

Diffusion MRI (dMRI) reveals a number of properties of white matter (WM) microstructure. Its sensitivity to water diffusion in living tissue allows us to compute numerous summary measures that relate to neural fiber integrity and architecture in the brain. Based on certain assumptions, each can quantify different aspects of WM microstructure. One of the most basic measures, fractional anisotropy (FA), is based on the diffusion tensor imaging (DTI) model [1] and continues to be popular despite its known limitations, which include its ambiguity at fiber crossings. Other models overcome some limitations of DTI, including multi-tensor models such as the tensor distribution function (TDF) [2], q-ball imaging and the orientation distribution function (ODF) [3], constrained spherical deconvolution [4], neurite orientation dispersion and density imaging (NODDI) [5], and the free-water index (FW) [13], among others. Each model leads to its own set of scalar microstructural measures, and many offer a richer understanding of WM microstructure than FA does. Which combination of measures best characterizes brain disease remains an open question, and depends on the disease examined and on the spatial and angular resolution of the available data. This question may have a different answer in different parts of the brain, depending on the underlying pathology (e.g., pathological changes in gray/white matter interfaces or more central white matter tracts).

At the time of writing, around 20 microstructural measures have been proposed for single-shell dMRI. Microstructural measures derived from new dMRI models may carry even more information on WM microstructure including the geometry of diffusion anisotropy, diffusivity, complexity, estimated number of distinguishable fiber compartments, number of crossing fibers and neurite dispersion. Combining these in a classification task is challenging, and requires proper regularization. Here, we use a Total Variation-Lasso or TV-L1 regularization as a prior term in a logistic regression framework. The channel-wise TV term leads to linear models that are approximately spatially piecewise constant, giving the weight maps descriptive power to suggest both the regions and measures that are helpful in a disease classification task, while considering multiple measures together. We build on prior work with TV-L1 regularizers in neuroimaging; they have been used successfully for fMRI decoding and in electrophysiological studies [6].

The classification task examined here is to discriminate Alzheimer’s disease patients (AD) and healthy aging controls (NC), based on their dMRI data, by merging information from a range of complementary indices. A discriminative model in this setting may be useful as a disease biomarker, for drug trial enrichment and to help identify those most likely to decline in the future. In view of this, many studies describe WM microstructural differences between AD and NC [7], and some exploit WM metrics for classification [8,9,10]. By combining several measures in a classification task, we hope to generate a biomarker of disease that is “greater than the sum of its parts.”

2 Methods

2.1 Data Acquisition and Preprocessing

Baseline MRI, dMRI, and clinical data were downloaded from the ADNI database (adni.loni.usc.edu). Here we analyzed dMRI data from 102 participants: 53 healthy controls (NC; mean age: 72.4 ± 6.0 years; 24 M/29 F) and 49 AD patients (mean age: 74.9 ± 8.7 years; 29 M/20 F).

All subjects underwent whole-brain MRI scanning on 3T GE Medical Systems scanners at 14 acquisition sites across North America. Anatomical T1-weighted SPGR (spoiled gradient echo) sequences (256 × 256 matrix; voxel size = 1.2 × 1.0 × 1.0 mm³; TI = 400 ms; TR = 6.98 ms; TE = 2.85 ms; flip angle = 11°) and dMRI scans (128 × 128 matrix; voxel size = 2.7 × 2.7 × 2.7 mm³; TR = 9000 ms; scan time = 9 min) were acquired. Each dMRI scan comprised 46 separate images: 5 images with no diffusion sensitization (b0 images) and 41 diffusion-weighted images (DWI; b = 1000 s/mm²).

Images were preprocessed as in [7]. To summarize, raw dMRI images were corrected for motion and eddy current distortions, and T1-weighted images underwent inhomogeneity normalization. Extra-cerebral tissue was removed from both scan types. Each T1-weighted anatomical image was linearly aligned to a standard brain template (the down-sampled Colin27 [11]: 110 × 110 × 110, with 2-mm isotropic voxels). The diffusion images were linearly and then elastically registered [12] to their respective T1-weighted structural scans to correct for echo-planar imaging induced susceptibility artifacts. The gradient tables were corrected to account for the linear registration of the DWI images to the structural T1-weighted scan.

2.2 dMRI Reconstruction Models, Scalar Maps, and Spatial Normalization

For each subject, dMRI microstructural measures were computed from four different reconstruction models: DTI, TDF, NODDI and FW. Five measures were extracted from these models: FA and mean diffusivity (MD) from DTI, fractional anisotropy from TDF (FA-TDF), the orientation dispersion index (OD) from NODDI, and the free-water index (FW). We will not describe the well-known DTI-based FA and MD here, but will briefly describe the other three models:

The Tensor Distribution Function (TDF) represents the diffusion profile as a probabilistic mixture of tensors [2] allowing the reconstruction of multiple underlying fibers per voxel, together with a distribution of weights. We compute the voxel-wise TDF as the probability distribution function P(D) defined on all feasible 3D Gaussian diffusion processes in tensor space D:

$$ S\left(\boldsymbol{\mathfrak{q}}\right)=\int P\left(\mathbf{D}\right){e}^{\left(-t{\boldsymbol{\mathfrak{q}}}^T\mathbf{D}\boldsymbol{\mathfrak{q}}\right)}d\mathbf{D}, $$
(1)

where S is the measured signal intensity, t is the diffusion time, and \( \boldsymbol{\mathfrak{q}}= r\delta G \), with r, δ, and G denoting the gyromagnetic ratio, the duration of the diffusion sensitization, and the applied magnetic gradient vector, respectively. The number of detected peaks is estimated by examining the local maxima of the tensor orientation distribution (TOD), defined on the unit sphere along directions θ:

$$ \mathrm{TOD}\left(\theta \right)={\int}_{\!\!\!\lambda }P\left(\mathbf{D}\left(\theta, \lambda \right)\right) d\lambda, $$
(2)

where λ are the eigenvalues. The TDF-averaged eigenvalues of each fiber were calculated by computing the expected values along the principal direction of the fiber. From these eigenvalues a scalar TDF anisotropy (FA-TDF) is calculated as an extension of the standard FA formula:

$$\begin{array}{l}\mathrm{FA\text{-}TDF}=\displaystyle\int \mathrm{TOD}\left(\theta \right)\,\mathrm{FA}\left(\theta \right)\, d\theta, \\[2ex] \mathrm{FA}\left(\theta \right)=\sqrt{\dfrac{{\left({\lambda}_1^{\prime}\left(\theta \right)-{\lambda}_2^{\prime}\left(\theta \right)\right)}^2+{\left({\lambda}_1^{\prime}\left(\theta \right)-{\lambda}_3^{\prime}\left(\theta \right)\right)}^2+{\left({\lambda}_2^{\prime}\left(\theta \right)-{\lambda}_3^{\prime}\left(\theta \right)\right)}^2}{2\left[{\lambda_1^{\prime}\left(\theta \right)}^2+{\lambda_2^{\prime}\left(\theta \right)}^2+{\lambda_3^{\prime}\left(\theta \right)}^2\right]}}, \\[2ex] {\lambda}_i^{\prime}\left(\theta \right)=\dfrac{\displaystyle\int P\left(\mathbf{D}\left(\theta, \lambda \right)\right){\lambda}_i\, d\lambda}{\displaystyle\int P\left(\mathbf{D}\left(\theta, \lambda \right)\right) d\lambda}\end{array}$$
(3)
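As a rough illustration of Eq. 3, the sketch below computes the standard FA from three eigenvalues and then a discretized FA-TDF as a TOD-weighted average over a handful of sampled directions. The function names, the discretization, and the renormalization of the TOD weights are assumptions made for this example and are not taken from the TDF implementation in [2].

```python
import math

def fractional_anisotropy(l1, l2, l3):
    """Standard FA from three eigenvalues (the square-root term of Eq. 3)."""
    num = (l1 - l2) ** 2 + (l1 - l3) ** 2 + (l2 - l3) ** 2
    den = 2.0 * (l1 ** 2 + l2 ** 2 + l3 ** 2)
    return math.sqrt(num / den) if den > 0 else 0.0

def fa_tdf(tod_weights, mean_eigenvalues):
    """Discrete FA-TDF: TOD-weighted average of FA over sampled directions.

    tod_weights[k] approximates TOD(theta_k) * d(theta), and
    mean_eigenvalues[k] holds (lambda'_1, lambda'_2, lambda'_3) at theta_k.
    The weights are renormalized here (an assumption of this sketch).
    """
    total = sum(tod_weights)
    if total <= 0:
        return 0.0
    weighted = sum(w * fractional_anisotropy(*ev)
                   for w, ev in zip(tod_weights, mean_eigenvalues))
    return weighted / total

# Hypothetical voxel with two equally weighted, strongly anisotropic fibers.
weights = [0.5, 0.5]
eigs = [(1.7e-3, 0.3e-3, 0.3e-3), (1.7e-3, 0.3e-3, 0.3e-3)]
print(round(fa_tdf(weights, eigs), 3))
```

Because FA is evaluated per fiber direction before averaging, two crossing but individually coherent fibers keep a high FA-TDF, whereas a single tensor fitted to the same voxel would appear less anisotropic.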

Neurite Orientation Dispersion and Density Imaging (NODDI) is a composite model that takes into account three compartments that affect water diffusion in the brain: the intracellular compartment, the extracellular compartment, and the cerebrospinal fluid (CSF) [5]. The intracellular compartment is modeled as cylinders with a radius of zero that represent the axons and dendrites of the brain tissue, which are jointly called neurites. The ODF of the intracellular compartment is modeled as a Watson distribution that can capture the orientation dispersion of coherent central white matter bundles as well as of the incoherent neurites of the gray matter. The normalized signal of the intracellular compartment, \( A_{ic} \), is modeled as:

$$ {\mathit{A}}_{\mathit{ic}}={\int}_{{\mathbb{S}}^2}{\mathit{f}\left(\boldsymbol{\mathfrak{n}}\right)\mathit{e}}^{-\mathit{b}{\mathit{d}}_{\parallel }{\left(\boldsymbol{\mathfrak{q}}\cdot \boldsymbol{\mathfrak{n}}\right)}^2}\mathit{d}\boldsymbol{\mathfrak{n}} $$
(4)

Here, \( \boldsymbol{\mathfrak{q}} \) represents the gradient directions, b the b-value of the diffusion weighting, \( \boldsymbol{\mathfrak{n}} \) are the orientations of the cylinders with parallel diffusivity \( d_{\parallel} \) along which the signal is attenuated, and \( f\left(\boldsymbol{\mathfrak{n}}\right) \) is the Watson distribution, which has two parameters (μ, \( \mathcal{K} \)) and is defined as:

$$ \mathit{f}\left(\boldsymbol{\mathfrak{n}}\right)=\mathit{M}{\left(\frac{1}{2},\frac{3}{2},\mathcal{K}\right)}^{-1}{\mathit{e}}^{\mathcal{K}{\left(\boldsymbol{\mu} \cdot \boldsymbol{\mathfrak{n}}\right)}^2} $$
(5)

Here, the distribution is symmetric around the mean orientation μ, and M is Kummer’s confluent hypergeometric function. \( \mathcal{K} \) is called the concentration parameter. For \( \mathcal{K}>0 \), the density concentrates more tightly around μ as \( \mathcal{K} \) increases. Once \( \mathcal{K} \) is estimated, the orientation dispersion index (OD) is calculated as:

$$ \mathrm{OD}=\frac{2}{\pi } \arctan \left(\frac{1}{\mathcal{K}}\right) $$
(6)

OD ranges from 0 to 1; the higher the value, the more dispersed the neurites in a particular voxel. In our analyses below we used only the OD maps. The intracellular and extracellular volume fractions as well as the isotropic CSF volume fraction are not taken into account in our analyses. Zhang et al. demonstrated that the latter measures require more than one shell in order to be reliable, whereas the OD can be computed reliably with single-shell data even with standard clinical acquisition b-values of b = 1000 s/mm² [5]. OD may be more informative than DTI measures in areas with less organized patterns, such as areas of multiple fiber crossings and near the gray/white matter boundaries.
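The mapping from the Watson concentration parameter to OD can be sketched in a few lines. This example uses the 2/π scaling of the NODDI definition [5], which is what yields the 0-to-1 range described above; the function name and the sample values of the concentration parameter are illustrative assumptions.

```python
import math

def orientation_dispersion(kappa):
    """Orientation dispersion index from the Watson concentration kappa > 0."""
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    return (2.0 / math.pi) * math.atan(1.0 / kappa)

# Larger kappa means a more concentrated distribution, hence lower OD.
for kappa in (0.1, 1.0, 10.0, 100.0):
    print(kappa, round(orientation_dispersion(kappa), 3))
```

Highly coherent bundles (large concentration) map to OD values near 0, while nearly isotropic neurite orientations (concentration near 0) map to values near 1.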

Free-Water Imaging (FW) estimates the contribution of freely diffusing water molecules to the diffusion signal with a bi-tensor model [13]. The first component of the model is the so-called tissue compartment that represents either grey matter or a bundle of the white matter. The second component reflects the free-water compartment, which is said to be proportional to the amount of CSF contamination, especially in areas of the white matter that are close to the ventricles. The free-water component is also expected to increase with neuroinflammation due to edema. The full model is defined as:

$$ {S}_{\boldsymbol{\mathfrak{q}}}\left(\mathbf{D},f\right)=f{e}^{\left(-b{\boldsymbol{\mathfrak{q}}}^T\mathbf{D}\boldsymbol{\mathfrak{q}}\right)}+\left(1-f\right){e}^{\left(-b{d}_w\right)}, $$
(7)

where S is the attenuated signal, \( \boldsymbol{\mathfrak{q}} \) are the applied diffusion gradient directions, b is the b-value of the diffusion weighting, D is the diffusion tensor and f is the fractional volume of the tissue compartment (0 < f ≤ 1). The second term is a fully isotropic tensor, where \( d_w \) is the bulk diffusivity of water, which is constant at body temperature (3 × 10⁻³ mm²/s).
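To make the two terms of Eq. 7 concrete, the following sketch evaluates the bi-tensor signal for a single gradient direction. It is a forward simulation only, not the fitting procedure of [13]; the function names and the example tensor are assumptions chosen for illustration.

```python
import math

D_WATER = 3e-3  # bulk diffusivity of free water at body temperature, mm^2/s

def quad_form(q, D):
    """Compute q^T D q for a 3-vector q and a 3x3 tensor D (nested lists)."""
    return sum(q[i] * D[i][j] * q[j] for i in range(3) for j in range(3))

def bitensor_signal(q, D, f, b=1000.0, d_w=D_WATER):
    """Normalized signal of Eq. 7: tissue fraction f plus free-water fraction 1 - f."""
    tissue = f * math.exp(-b * quad_form(q, D))
    free_water = (1.0 - f) * math.exp(-b * d_w)
    return tissue + free_water

# Hypothetical prolate tissue tensor aligned with the x axis (mm^2/s).
D_tissue = [[1.7e-3, 0.0, 0.0],
            [0.0, 0.3e-3, 0.0],
            [0.0, 0.0, 0.3e-3]]

# Signal perpendicular to the fiber, with and without 30% free water.
q_perp = (0.0, 1.0, 0.0)
print(bitensor_signal(q_perp, D_tissue, f=1.0))
print(bitensor_signal(q_perp, D_tissue, f=0.7))
```

Since free water attenuates faster than tissue at b = 1000 s/mm², lowering f reduces the signal perpendicular to the fiber, which a single-tensor fit would misread as increased diffusivity.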

Voxel-wise maps of all five measures (FA, MD, FA-TDF, OD, and FW) were created for all 102 subjects; all subjects’ maps were spatially normalized to a custom ADNI-derived minimal deformation template (MDT). Template creation and spatial normalization were performed according to previously published voxelwise ADNI-DTI analyses [7].

2.3 Regularized Logistic Regression Classification

In general, the linear logistic regression model has the following classification function

$$ y=f\left(\mathbf{X},\boldsymbol{w},b\right)=F\left(\mathbf{X}\boldsymbol{w}+b\right) $$
(8)

Here \( \mathbf{X}\in {\mathfrak{R}}^{n\times p} \), where n is the number of samples (subjects) and p is the number of features. As all the computations were performed within the MDT mask (193,586 voxels, i.e., ∼200,000), p is the number of voxels times the number of diffusion measures (five in this case). The parameters to be estimated are w and b, where \( \boldsymbol{w}\in {\mathfrak{R}}^p \) is a p-dimensional weight vector and \( b\in \mathfrak{R} \) is the intercept; y ∈ {−1, 1} is the class label, in our case the subject’s diagnosis. The regularized cost to be optimized is:

$$ \widehat{\boldsymbol{w}}=\mathit{\arg \;}\mathit{\min}\ \boldsymbol{\mathcal{L}}\left(y,F\left(\mathbf{X}\boldsymbol{w}+b\right)\right)+\lambda \boldsymbol{\mathfrak{J}}\left(\boldsymbol{w}\right),\kern0.5em \lambda \ge 0 $$
(9)

where \( \boldsymbol{\mathcal{L}} \) is the logistic loss function, \( \boldsymbol{\mathfrak{J}}\left(\boldsymbol{w}\right) \) is the regularization term and λ is the Lagrange multiplier. The intercept b is not regularized, and only depends on the loss function. We will simplify \( \boldsymbol{\mathcal{L}}\left(y,F\left(\mathbf{X}\boldsymbol{w}+b\right)\right) \) to \( \boldsymbol{\mathcal{L}}\left(\boldsymbol{w}\right) \). In our case, the standard TV-L1 norm cost becomes:

$$ \boldsymbol{\mathfrak{J}}\left(\boldsymbol{w}\right)=\left(1-\alpha \right){\left\Vert \boldsymbol{w}\right\Vert}_1+\alpha {\sum}_{j=1}^{N_m}\mathrm{TV}\left({\boldsymbol{w}}_{\boldsymbol{j}}\right),\kern0.5em \mathrm{TV}(y)= \left\Vert \nabla y\right\Vert, $$
(10)

where the first term is the LASSO or L1 cost, TV is the Total Variation penalty [6], \( \boldsymbol{w}_j \) is the weight map of microstructural measure j, \( N_m \) (= 5 here) is the number of measures used, and α is a constant that sets the desired tradeoff between the L1 and TV terms. The L1 penalty encourages sparsity in the model by setting most coefficients to zero. This penalty function suffers from some limitations when there is a large number of parameters p to fit and few observations n, as LASSO selects at most n variables before it saturates. Further, if there is a group of highly correlated variables, LASSO tends to select one variable from the group and ignore the others. The TV penalty, on the other hand, is defined as the L1 norm of the image gradient; it allows for sharp edges while encouraging the recovery of a smooth, piecewise constant weight map. This in turn makes the weight maps easier to interpret, as they may highlight clusters that resemble anatomical regions.
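A toy version of the penalty in Eq. 10 shows why TV favors contiguous patches. The sketch below uses small 2D maps (one per measure) and an anisotropic TV, that is, the sum of absolute forward differences, following the description of TV as the L1 norm of the image gradient; the actual analysis operates on 3D maps, and the implementation in [6] may define the gradient norm differently. All function names are assumptions for this example.

```python
def l1_norm(w_maps):
    """Sum of absolute weights over all channels (one 2D map per measure)."""
    return sum(abs(v) for m in w_maps for row in m for v in row)

def total_variation(m):
    """Anisotropic TV of one 2D map: sum of absolute forward differences."""
    rows, cols = len(m), len(m[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                tv += abs(m[i + 1][j] - m[i][j])
            if j + 1 < cols:
                tv += abs(m[i][j + 1] - m[i][j])
    return tv

def tv_l1_penalty(w_maps, alpha):
    """Channel-wise TV-L1 penalty: (1 - alpha) * L1 + alpha * sum_j TV(w_j)."""
    tv_sum = sum(total_variation(m) for m in w_maps)
    return (1.0 - alpha) * l1_norm(w_maps) + alpha * tv_sum

# Two maps with the same L1 norm: a contiguous patch and a checkerboard.
patch = [[1.0, 1.0], [0.0, 0.0]]
checker = [[1.0, 0.0], [0.0, 1.0]]
print(tv_l1_penalty([patch], alpha=0.5))
print(tv_l1_penalty([checker], alpha=0.5))
```

Both maps have identical L1 cost, but the checkerboard has more intensity jumps between neighbors and therefore a larger TV term, so the combined penalty prefers the spatially coherent patch.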

We used the FISTA procedure [6] to find \( \widehat{\boldsymbol{w}} \) (the estimated value for w). As the L1 terms are not smooth, a naïve gradient descent may not always converge to a good minimum. For this convex optimization, smooth and non-smooth terms are considered separately. The logistic loss and its gradient are the smooth terms:

$$ \mathcal{L}\left(\boldsymbol{w}\right)=\frac{1}{n}{\sum}_{i=1}^n\mathit{\log}\left(1+{e}^{-{y}_i\left({\mathbf{X}}_i^T\boldsymbol{w}\right)}\right) $$
(11)
$$ \nabla \mathcal{L}\left(\boldsymbol{w}\right)=-\frac{1}{n}{\sum}_{i=1}^n\frac{y_i{\mathbf{X}}_i}{1+{e}^{y_i\left({\mathbf{X}}_i^T\boldsymbol{w}\right)}} $$
(12)
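Eqs. 11 and 12 translate directly into code, and the split between smooth and non-smooth terms can be illustrated with one proximal gradient step. The sketch below is a simplification: it handles only the L1 part of the penalty through soft thresholding, omits the TV proximal operator and the momentum term that FISTA adds, and ignores the intercept b as in Eq. 11. The toy data, step size, and function names are assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def logistic_loss(w, X, y):
    """Eq. 11: mean logistic loss over n samples, labels in {-1, +1}."""
    return sum(math.log1p(math.exp(-yi * dot(xi, w)))
               for xi, yi in zip(X, y)) / len(X)

def logistic_grad(w, X, y):
    """Eq. 12: gradient of the mean logistic loss with respect to w."""
    n = len(X)
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        coef = -yi / (1.0 + math.exp(yi * dot(xi, w))) / n
        for k, xk in enumerate(xi):
            grad[k] += coef * xk
    return grad

def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1: shrink each weight toward zero by t."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in w]

def ista_step(w, X, y, step, lam):
    """One proximal gradient step: gradient move on the loss, then L1 shrinkage."""
    grad = logistic_grad(w, X, y)
    moved = [wi - step * gi for wi, gi in zip(w, grad)]
    return soft_threshold(moved, step * lam)

# Hypothetical toy data: four samples, two features.
X = [[1.0, 2.0], [-1.0, 0.5], [0.5, -1.5], [-2.0, -1.0]]
y = [1, -1, 1, -1]
w = [0.0, 0.0]
for _ in range(5):
    w = ista_step(w, X, y, step=0.5, lam=0.01)
    print(round(logistic_loss(w, X, y), 4))
```

The gradient step only ever touches the smooth loss, while the non-smooth penalty enters through its proximal operator, which is the separation referred to above.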

We used an eightfold nested cross-validation to tune the parameters α and λ.
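The structure of a nested cross-validation can be sketched as follows: an inner loop picks (α, λ) using only the training portion of each outer fold, and the outer loop scores the chosen pair on held-out subjects. The number of inner folds, the parameter grid, and the `fit_score` callback are hypothetical, since the text does not specify them; the folds here are neither shuffled nor stratified by diagnosis, which a real analysis would likely want.

```python
def k_fold_indices(indices, k):
    """Split a list of indices into k interleaved folds of near-equal size."""
    return [indices[i::k] for i in range(k)]

def nested_cv(n, grid, fit_score, k_outer=8, k_inner=5):
    """Return the mean outer-fold score and the parameters chosen per fold.

    fit_score(train_idx, test_idx, params) is a user-supplied callback that
    trains on train_idx and returns the accuracy on test_idx.
    """
    all_idx = list(range(n))
    outer_scores, chosen = [], []
    for test_idx in k_fold_indices(all_idx, k_outer):
        held_out = set(test_idx)
        train_idx = [i for i in all_idx if i not in held_out]
        inner_folds = k_fold_indices(train_idx, k_inner)

        def inner_score(params):
            # Mean validation score using training subjects only.
            scores = []
            for val_idx in inner_folds:
                val = set(val_idx)
                fit_idx = [i for i in train_idx if i not in val]
                scores.append(fit_score(fit_idx, val_idx, params))
            return sum(scores) / len(scores)

        best = max(grid, key=inner_score)
        chosen.append(best)
        outer_scores.append(fit_score(train_idx, test_idx, best))
    return sum(outer_scores) / len(outer_scores), chosen

# Stand-in scorer so the sketch runs without a classifier; it peaks at
# alpha = 0.5 and lambda = 0.1 by construction.
def toy_fit_score(train_idx, test_idx, params):
    alpha, lam = params
    return 1.0 - abs(alpha - 0.5) - abs(lam - 0.1)

grid = [(a, l) for a in (0.1, 0.5, 0.9) for l in (0.01, 0.1, 1.0)]
mean_acc, chosen = nested_cv(102, grid, toy_fit_score)
print(mean_acc, chosen[0])
```

Keeping the parameter search inside each outer training set prevents the held-out subjects from influencing the choice of α and λ, so the outer accuracy remains an unbiased estimate.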

3 Results

We were able to classify individuals into diagnostic groups (AD vs. NC) with an accuracy of 76.2%. We ran a parallel test by using only one measure (FA-DTI) and the prediction accuracy was 50%. As expected, the resulting maps of significant predictors showed cohesive regional patches of stable coefficients, a property that is favored by the TV regularization term. Figure 1 shows the resulting map for each of the five measures.

Fig. 1

Regularized maps of useful diagnostic predictors, based on measures computed from diffusion MRI. (a) FA-DTI, (b) FA-TDF, (c) MD-DTI, (d) OD, (e) FW. Color bars show the value of the coefficients, from negative (blue) to positive (red), with zero in green

FW and MD showed similar predictive properties, with large regions of negative coefficients in the frontal lobes (both hemispheres). FA-DTI and FA-TDF also showed a similar pattern, but FA-TDF showed larger and more cohesive regions in the frontal white matter, especially in areas with fiber crossings. OD showed some similarities with the MD map although the regions with the larger coefficients (both positive and negative) tended to be smaller and more widespread. Many of these observations are in line with what is expected for each measure. The direction of the coefficients is also important to note. It is expected that the anisotropy of the white matter tends to decrease in AD compared to healthy aging controls, but MD, FW and OD on the other hand tend to increase with white matter disruption.

4 Discussion

In this article, we evaluated the utility of the TV-L1 prior logistic regression to assess the ability of multiple dMRI reconstruction methods to simultaneously distinguish alterations in WM microstructure between people with AD and matched healthy controls. We computed five dMRI derived microstructural measures from four different reconstruction models that were used together in a regularized classification framework and we were able to successfully classify AD from healthy controls and to derive spatially coherent discrimination patterns across the entire brain for each measure.

AD pathology includes disturbances in the brain’s WM pathways, including loss of axons, myelin sheaths, and oligodendroglial cells, which may not all be detected by using DTI-based descriptors alone. Machine learning for classification based on dMRI features has focused mainly on DTI-derived measures, although HARDI-derived measures have also been explored [19, 20]. Volumetric measures, including hippocampal volume, gray matter volume from voxel-based morphometry, and cortical thickness [14,15,16, 18], have effectively classified AD patients, but few studies have used dMRI-derived biomarkers for classification purposes. Most of these studies have used DTI-based measures: several used voxel-wise features from DTI maps, with methods such as Pearson correlation and ReliefF for feature reduction [8,9,10], reporting classification accuracies of >90%. In [17], tractography-based connectivity metrics based on fiber count, FA-DTI, and diffusivity were used for SVM classification, reporting an accuracy of 88%. Clearly, these accuracies depend on the problem and dataset used, and are not directly comparable with one another. Spatial and anatomical regularization for classification has also been tested on the discrimination of AD from controls by Cuingnet et al. [18]. They achieved improved classification accuracies by applying this type of regularization to cortical features, producing discriminatory parcellated maps of the cortex that highlight the brain regions traditionally compromised in AD.

Here we evaluated 102 subjects and were able to reach a relatively high classification accuracy for a white matter study of AD. Although our approach did not necessarily “beat” prior classification results, our goal was to compare the relative utility of multiple metrics for classification, which leads to some insight on how the disease may affect different fiber properties. Moreover, it was important to see if these measures might complement and add to the information provided by DTI measures—particularly in regions outside the coherent WM. Many dMRI measures are correlated with each other to some extent, but each captures the microstructure slightly differently, and at the various spatial locations, there may be greater sensitivity to detecting subtle changes with one measure versus another.

In conclusion, different reconstruction models and their respective scalar descriptors provide distinct micro-anatomical features, which differ in classification value by brain region. Together these estimates may improve brain-wide classification and may overcome the need to compute localized statistically determined regions of interest, and allow us to observe microstructural changes in the entirety of the brain. We made use of the main functionality of the TV prior, namely its denoising and smoothing capabilities across the image. This is essential in this context since single voxels prove to be very noisy and neighboring anatomy is presumably similar. Future work should compare other classification methods and improve estimates by incorporating tissue volume differences. We will also test if dMRI metrics can contribute to leading classification approaches based on biomarkers such as hippocampal volume, amyloid deposition, and tensor-based morphometry.