Introduction

Diffusion MRI (dMRI) data is widely used to study several brain disorders, such as Alzheimer’s disease and schizophrenia. Several multi-center studies have acquired dMRI data at different sites. However, the inter-scanner variability poses a potential problem for joint analysis of these data sets (Matsui 2014; Magnotta et al. 2012; Dariya et al. 2016). This inter-site variability in the measurements can come from several sources, such as the number of head coils used (16, 20 or 32 channel head coil), sensitivity of the coils, the imaging gradient non-linearity, the magnetic field inhomogeneity, the differences in the algorithms used to reconstruct the data, and other scanner related factors (Kochunov et al. 2014). These factors lead to non-linear changes in the data as well as in the estimated diffusion measures such as fractional anisotropy (FA). Thus, aggregating data sets from different sites is challenging due to the inherent differences in the acquired images from different scanners (Giannelli et al. 2014). Although the inter-site variability can be minimized by acquiring data using scanners of a similar type (same vendor and version) with similar pulse sequence parameters (Cannon et al. 2014), many recent studies have shown that large differences still exist between diffusion measurements from different sites (Kochunov et al. 2014; Mirzaalian et al. 2015, 2016). Specifically, the inter-site variability in FA and mean diffusivity (MD) is not uniform over the entire brain, but is tissue specific as well as region specific (Mirzaalian et al. 2015, 2016). Thus, harmonizing dMRI data across sites is imperative for joint analysis of the data.

There are three major categories of methods that have addressed the issue of dMRI data pooling (not necessarily harmonization). The first category of methods uses Meta Analysis (Salimi-Khorshidi et al. 2009; Jahanshad et al. 2013; Kochunov et al. 2014), which involves combining z-scores of a given diffusion measure (say FA) from all sites to determine group differences. However, the subject population at each site may not be sufficient to capture the variance of the entire population, a critical requirement to ensure proper pooling and analysis of the z-scores (which depend on the variance and not just the mean). The second category of methods uses Statistical Covariates to account for site-specific differences (Forsyth and Cannon 2014; Venkatraman et al. 2015). For example, the method of Kochunov et al. (2014) uses z-scores from each site and then regresses out site-specific differences using statistical covariates. In general, a linear model is used to account for these site-specific differences, which may not be accurate enough if one wants to analyze fiber tracts that travel between distant brain regions, as regional variations are not taken into account in this framework. Finally, both of the methodologies mentioned above correct for scanner differences at the last stage of the analysis using specific dMRI measures based on a certain model of diffusion.

Recently, we proposed a third category of method (Mirzaalian et al. 2015, 2016), which harmonized the dMRI data in a model independent manner. We harmonized the raw dMRI signal appropriately in different brain regions and tissue types. In this work, we further build on this methodology, but significantly simplify it and address some of its limitations as discussed below.

Our contribution

In this paper, we provide an extension of our previous work (Mirzaalian et al. 2015, 2016) and address some of its limitations by using a registration-based framework. The main contributions of this work are: i) The current work does not depend on or require an a-priori computed Freesurfer (or any other) segmentation to obtain correspondence between different brain regions across subjects. In fact, this was one of the challenges of our earlier work, where a separate mechanism was devised to correct the errors in the Freesurfer segmentation, which were quite common. For example, several voxels at the boundary between white and gray matter would be easily misclassified once the Freesurfer segmentation was transformed to the diffusion MRI space of each subject, due to limited resolution. This correction was done separately for each site and had to be tuned for each region. These limitations do not exist in the current work; ii) In our earlier work, a single group-level difference between sites for each SH order was computed for each region separately, which was then uniformly used to update all voxels in that particular region. In comparison, the proposed framework is voxel-wise and thus allows the differences to be updated at a voxel level, albeit in a smooth manner. This can be clearly seen in Fig. 2, where both voxel-wise and region-wise differences in RISH features are displayed for all the sites; iii) In our earlier work, two levels of mapping were required: region-wise and voxel-wise. In the current framework, this is simplified and only a voxel-wise mapping is directly computed; iv) Using synthetic data, we also show that the proposed method is robust and preserves the changes in diffusion measures due to subtle pathology.

Thus, the proposed method is a considerably simpler version of our earlier work and is easy to use and implement, without the requirement for any kind of parameter tuning or dependence on Freesurfer or any other kind of segmentation algorithm. The code will be publicly available soon. Please contact the senior author on this paper for further details.

Method

Figure 1 shows an outline of our dMRI data harmonization procedure. Details about each of the steps in this procedure are given in the following sections.

Fig. 1

Outline of the proposed method for inter-site dMRI data harmonization. The reference and the target sites are shown in green and red, respectively. Given the RISH features, we start by performing multi-modal registration between all data sets. Next, we compute voxel-wise population average of the RISH features for each site and also the group differences (Δ) between the sites. By unwarping the computed differences to a given subject in a target site, we map the SH coefficients, which are then used to harmonize the dMRI signal

Multi-modal image registration

Any dMRI signal (at a single b-value) can be represented accurately in a basis of spherical harmonics. Rotation invariant features derived from this representation can be used to harmonize the diffusion signal as these features do not depend on the orientation of the white matter fibers or gray matter tissue, but only on the frequency content. Thus, the signal can be modified by changing the rotation invariant spherical harmonic (RISH) features, without changing the underlying fiber orientation (and hence the connectivity) of each subject. Consequently, we use the RISH features to harmonize dMRI data.

We start by computing a set of RISH features at each voxel from the dMRI signal \(S=[s_{1} \ldots s_{G}]^{T}\) along G unique gradient directions. This signal can be compactly represented in a basis of spherical harmonics (SH) with coefficients \(C_{ij}\) given by: \( S\approx {\sum }_{i}{\sum }_{j} C_{ij} Y_{ij}, \) where \(Y_{ij}\) are the SH basis functions of order i and degree j. Several RISH features \(\mathcal {F}\) at each voxel can be computed as follows (Mirzaalian et al. 2015; Descoteaux et al. 2007):

$$ \mathcal{F}=[\Vert C_{0}\Vert^{2},\ \Vert C_{2}\Vert^{2},\ \ldots,\ \Vert C_{8}\Vert^{2}] \text{~~~where:}\quad \Vert C_{i}\Vert^{2}=\sum\limits_{j=1}^{2 i+1}(C_{ij})^{2}. $$
(1)

These RISH feature images are then used within a registration framework with data from all subjects across all sites to create a single template. There are 5 RISH features for SH of up to order 8; each RISH feature can be represented as a scalar image (see Fig. 2). Thus, for each subject we have 5 image volumes (RISH feature images), which we term as multi-modal images as they capture different aspects (frequency) of the diffusion signal. Our multi-modal template (consisting of the 5 RISH feature images) is computed using the ANTs algorithm (Avants et al. 2014) (i.e., vector-image registration of RISH feature images).
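The computation in (1) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `rish_features` and the coefficient layout (even orders only, stored in increasing order with 2i+1 coefficients per order i, i.e. 45 coefficients up to order 8) are assumptions made for illustration.

```python
import numpy as np

def rish_features(coeffs, max_order=8):
    """Return ||C_i||^2 for the even orders i = 0, 2, ..., max_order.

    coeffs: array of shape (..., n_coeffs) with SH coefficients stored by
    increasing even order, 2*i + 1 coefficients per order i (assumed layout).
    """
    coeffs = np.asarray(coeffs, dtype=float)
    orders = range(0, max_order + 1, 2)
    expected = sum(2 * i + 1 for i in orders)
    if coeffs.shape[-1] != expected:
        raise ValueError(f"expected {expected} coefficients, got {coeffs.shape[-1]}")
    features = []
    start = 0
    for i in orders:
        stop = start + 2 * i + 1
        # Sum of squared coefficients of order i: one scalar per voxel.
        features.append(np.sum(coeffs[..., start:stop] ** 2, axis=-1))
        start = stop
    # The last axis indexes the SH order, giving 5 feature images for max_order=8.
    return np.stack(features, axis=-1)
```

Applied to a whole volume of coefficients of shape (X, Y, Z, 45), the function returns an array of shape (X, Y, Z, 5), one scalar image per RISH feature.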

Fig. 2

Regionwise (R) and voxelwise (V) RISH features for different SH orders and sites

In the template space, given the registered RISH images, we approximate the expected value of the voxel-wise RISH features as the sample mean over the \(N_{t}\) subjects for the \(t\)-th site using:

$$\mathbb{E}_{t}([\Vert C_{i}\Vert^{2}])\approx\sum\limits_{n=1}^{N_{t}} [\Vert {C_{i}^{n}}\Vert^{2}]/N_{t}, $$

where \(\Vert {C_{i}^{n}}\Vert ^{2}\) is the RISH feature of order i for the \(n\)-th subject. Unlike the method of Mirzaalian et al. (2015, 2016), where two separate mappings are computed, one at the ROI level and another one at the voxel level, in this work we only compute one mapping at each voxel to harmonize the signal. In Fig. 2, we show examples of voxel-wise (using the registered images) and region-wise (using Freesurfer label maps) RISH features for different SH orders and different sites. As can be seen in these figures, the RISH features vary significantly across the brain as well as in different tissue types (white matter, neocortical, subcortical). Comparing the amount of energy of the signal at different orders in Fig. 2, it can be seen that the energy at order 8 is 500 times smaller than the one at order 0. Therefore, using SH order of up to 8 in our pipeline, we capture 99.9 % of the energy of the signal. This is also consistent with previous works that have used spherical harmonics and truncated the SH basis at order 6 or 8 (Descoteaux et al. 2007; Özarslan et al. 2006; Tournier et al. 2007).
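The voxel-wise site averages described above, and the difference between a reference and a target site, can be sketched as follows. This is an illustrative sketch only: the function names and the array layout (subjects along the first axis, all images already registered to the template) are assumptions.

```python
import numpy as np

def site_mean(rish_images):
    """Voxel-wise sample mean of registered RISH images over subjects.

    rish_images: array of shape (n_subjects, X, Y, Z, n_orders), all in
    template space (assumed layout).
    """
    rish_images = np.asarray(rish_images, dtype=float)
    return rish_images.mean(axis=0)

def site_difference(reference_images, target_images):
    """Voxel-wise difference E_r - E_t between reference and target sites."""
    return site_mean(reference_images) - site_mean(target_images)
```

The two sites may contain different numbers of subjects, since each mean is taken over its own first axis before the maps are subtracted.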

Mapping voxel-wise RISH features between sites

As part of the diffeomorphic registration procedure to compute a template, we also obtain the deformation field that maps each voxel location ν to \({\Pi }_{n}(\nu )\), where \({\Pi }_{n}\) is a diffeomorphism for subject n. Further, the voxel-wise expected value of each of the RISH features for the target and reference site \(\{\mathbb {E}_{r},\mathbb {E}_{t}\}\) are also computed in the template space. Using a procedure analogous to the one used in Mirzaalian et al. (2015, 2016), harmonization is achieved by scaling each of the SH coefficients based on the difference between the group means of the two sites (for a matched set of subjects). This ensures that only the shape of the signal changes, but not its orientation (a critical requirement to preserve subject-specific anatomical connectivity). This was also numerically verified in Mirzaalian et al. (2015), where it was shown that this scaling procedure does not change the orientation of the underlying fiber orientation distribution function (fODF).

Given the computed \(\{\mathbb {E}_{r},\mathbb {E}_{t}\}\), we harmonize the signal by scaling the SH coefficients of the signal at ν by:

$$ \pi(C_{ij}(\nu^{\prime}))= \sqrt{\frac{ \Vert C_{i}(\nu^{\prime}) \Vert^{2} +\mathbb{E}_{r}(\nu)-\mathbb{E}_{t}(\nu) }{\Vert C_{i} (\nu^{\prime}) \Vert^{2}}} C_{ij}(\nu^{\prime}) $$
(2)

where \(\nu ^{\prime }={\Pi }_{n}^{-1}(\nu )\). The final harmonized diffusion signal at \(\nu ^{\prime }\) is then computed using:

$$ \hat{S}(\nu^{\prime})=\sum\limits_{i}\sum\limits_{j} \pi(C_{ij}(\nu^{\prime})) Y_{ij}. $$
(3)

We applied this procedure to harmonize data from several sites as described in the next section.
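The scaling in (2) and the reconstruction in (3) can be sketched in NumPy as below. This is a sketch under stated assumptions rather than the released code: the function names, the coefficient layout (even orders 0 to 8, 2i+1 coefficients per order), the clipping of a negative numerator to zero, and the handling of zero-energy voxels are choices made for illustration that the text does not specify. The site difference `delta` is assumed to have already been warped to the subject space.

```python
import numpy as np

ORDERS = (0, 2, 4, 6, 8)  # even SH orders, 2*i + 1 coefficients each (assumed layout)

def harmonize_coeffs(coeffs, delta, orders=ORDERS, eps=1e-12):
    """Scale SH coefficients order by order following Eq. (2).

    coeffs: (..., 45) SH coefficients of one target-site subject.
    delta:  (..., 5) voxel-wise E_r - E_t per order, in subject space.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    delta = np.asarray(delta, dtype=float)
    out = np.zeros_like(coeffs)
    start = 0
    for k, i in enumerate(orders):
        stop = start + 2 * i + 1
        block = coeffs[..., start:stop]
        energy = np.sum(block ** 2, axis=-1)                  # ||C_i||^2
        # Assumption: a negative numerator is clipped to zero before the square root.
        numerator = np.clip(energy + delta[..., k], 0.0, None)
        scale = np.sqrt(numerator / np.maximum(energy, eps))
        # Assumption: voxels with (near) zero energy are left at zero.
        scale = np.where(energy > eps, scale, 0.0)
        out[..., start:stop] = block * scale[..., None]
        start = stop
    return out

def reconstruct_signal(coeffs, basis):
    """Eq. (3): signal along G directions from SH coefficients.

    basis: (G, n_coeffs) matrix of SH basis functions sampled at the
    gradient directions (assumed to be supplied by the caller).
    """
    return np.asarray(coeffs, dtype=float) @ np.asarray(basis, dtype=float).T
```

Because every coefficient of a given order is multiplied by the same non-negative factor, the relative weights within each order are unchanged, which is the property the text relies on to leave the fiber orientation untouched.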

Experiments and results

Diffusion MRI data was acquired at seven different sites (two GE, three Philips, and two Siemens scanners) on a group of matched healthy subjects. All the healthy subjects were matched for age and handedness to the best possible extent. Demographic and scanner details are given in Table 1. All data sets had a spatial resolution of 2 mm isotropic voxel size and a b-value of 1000 s/mm\(^{2}\). Since the subjects were matched across all the sites, we do not expect to see statistically significant biological differences at the group level. Therefore, it is reasonable to hypothesize that the differences in the RISH features and standard diffusion measures are only due to scanner related inconsistencies. To validate our hypothesis, we used a paired t-test to compute p-values of RISH features and standard diffusion measures (such as MD, FA, and generalized fractional anisotropy (GFA)) between an arbitrarily chosen reference site and all of the other (target) sites. Tables 2, 3 and 4 show that prior to harmonization, significant statistical differences exist between sites in Freesurfer defined ROIs for all diffusion measures. However, using the proposed multi-modal registration based harmonization procedure, all statistical site differences in each of the Freesurfer ROIs are removed. For the sake of comparison, we included the results from our earlier work (Mirzaalian et al. 2016) on the same set of subjects, but using the Freesurfer ROI based method, in the appendix section (see Table 7). As can be seen, our results are comparable to those presented in Mirzaalian et al. (2016).

Table 1 Scanner details and demographic information for each site (M - Male, F - Female, R - right handed, L - left handed)
Table 2 P-values in Freesurfer defined regions for MD before and after data harmonization

To check that our method is unbiased and robust, we used an independent voxel-wise method to test our harmonization procedure. We used tract-based spatial statistics (TBSS) (Smith et al. 2006) to demonstrate that scanner related differences in FA and GFA, which existed prior to data harmonization, are practically removed by using the proposed harmonization procedure; see Fig. 3. An interesting point to note is that significant FA differences in the centrum-semiovale region exist prior to harmonization between the two Siemens scanners (Site#6), but not in GFA, indicating that FA is a poor metric to use in crossing fiber regions. However, all differences in FA and GFA are removed after harmonization (Table 3).

Fig. 3

TBSS results using FA and GFA for different target sites before (a-f) and after (g) harmonization. The yellow-red colormap displays p-values less than 0.05

Table 3 p-values for FA before and after harmonization in Freesurfer ROIs

Validation on unseen subjects

In the above experiments, all subjects were used to obtain the site differences \(\{\mathbb {E}_{r} - \mathbb {E}_{t}\}\) in RISH features, which were then used for harmonization. To test the validity of our approach on a set of new subjects, we created two distinct data sets, one for training and one for testing, from two different sites. We used 70 % of the subjects in the reference and the target sites (Site#1) to learn the harmonization parameters using (2) and computed the p-values before and after harmonization for the remaining 30 % of the subjects, which were excluded from the training stage. Note that, in this experiment, the data in the training/testing groups at the two sites were age-matched. Computed p-values are reported in Table 5, and are very similar to the results shown in Tables 2, 3 and 4. Thus, the proposed method could be used in a true data harmonization scenario, at least when the acquisition protocol is similar across sites. Note that our numerical results in Tables 2 to 5 are comparable to those in our earlier work (Tables 7 and 8 in Appendix).
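The hold-out design described above can be sketched as follows. The text does not say how the 70/30 split was drawn, so the seeded random split, the function names, and the array layout (subjects along the first axis, images in template space) are assumptions. A random split also does not guarantee the age matching reported in the text, which would need to be checked separately.

```python
import numpy as np

def split_subjects(n_subjects, train_fraction=0.7, seed=0):
    """Randomly split subject indices into training and held-out test sets."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_subjects)
    n_train = int(round(train_fraction * n_subjects))
    return np.sort(shuffled[:n_train]), np.sort(shuffled[n_train:])

def learn_site_difference(reference_rish, target_rish, ref_train, tgt_train):
    """Estimate E_r - E_t from the training subjects only.

    reference_rish, target_rish: arrays with subjects along the first axis,
    registered to the template (assumed layout).
    """
    reference_rish = np.asarray(reference_rish, dtype=float)
    target_rish = np.asarray(target_rish, dtype=float)
    return reference_rish[ref_train].mean(axis=0) - target_rish[tgt_train].mean(axis=0)
```

The learned difference would then be applied to the held-out subjects only, so that the reported p-values are not influenced by data seen during training.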

Table 4 P-values in Freesurfer defined regions for GFA before and after data harmonization
Table 5 P-values before and after harmonization for MD, FA, GFA for different sites and ROIs using test data excluded from training

Validation on a traveling subject

Using the learnt parameters in our pipeline, we harmonized the images of a traveling subject for which there existed data acquired at six different sites (all sites except site #3). The computed region-wise p-values for {MD, FA, GFA} between voxels from each Freesurfer ROI before and after data harmonization are reported in Table 6. They are all above 0.05 after harmonization, indicating that scanner specific differences were removed in this traveling subject. We should point out that statistically significant differences in FA, MD and GFA existed prior to harmonization in this traveling subject, as seen in Table 6.

Table 6 P-values in Freesurfer defined regions for MD (top), FA (middle), and GFA (bottom) before and after data harmonization for the traveling subject

Synthetic validation in the presence of pathology

To demonstrate the robustness of the harmonization procedure in the presence of pathology, we performed synthetic experiments. We generated three synthetic images labeled as \(\{S^{r}, S^{t,1}, S^{t,2}\}\), where i) \(S^{r}\) is a control subject at the reference site; ii) \(S^{t,1}\) is a control subject at the target site; and iii) \(S^{t,2}\) is a synthetically generated subject with pathology in white matter at the target site. To generate \(S^{t,1}\), we first introduced a simple warping (rotation) to \(S^{r}\) and added some bias to the second order RISH features of \(S^{r}\); the bias was added to voxels within a mask denoted by Mask1 (Fig. 4). This is similar to a data set where the data acquired at the target and reference site are different, as is typically the case for in-vivo data. In particular, the FA in the simulated white matter region for \(S^{r}\) was 0.79, for \(S^{t,1}\) was 0.82, while for \(S^{t,2}\) it was 0.79 (lower FA due to pathology). Various levels of Rician noise (standard deviation of noise ranging from 0 to 0.2) were added to test the effect of noise on the harmonization procedure.
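Rician noise of the kind mentioned above is commonly simulated by adding independent Gaussian noise to the real and imaginary channels of the signal and taking the magnitude. The sketch below follows that standard construction; the function name `add_rician_noise` and the assumption that the signal is normalized (so that a standard deviation between 0 and 0.2 is meaningful) are illustrative and not stated in the text.

```python
import numpy as np

def add_rician_noise(signal, sigma, rng=None):
    """Return the magnitude of the signal after adding complex Gaussian noise.

    signal: non-negative array, assumed normalized (e.g. by the b=0 image).
    sigma:  standard deviation of the Gaussian noise in each channel.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    real = signal + rng.normal(0.0, sigma, signal.shape)   # noisy real channel
    imag = rng.normal(0.0, sigma, signal.shape)            # noise-only imaginary channel
    return np.sqrt(real ** 2 + imag ** 2)
```

Passing a seeded generator, for example `np.random.default_rng(42)`, makes the noise realizations reproducible across repeated runs.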

Fig. 4

The procedure to evaluate the effect of harmonization in the presence of pathology. Synthetic images \(\{S^{r}, S^{t,1}, S^{t,2}\}\) are generated and feature differences of the diseased and control subjects at the target site before (i.e. \(S^{t,1}\) vs \(S^{t,2}\)) and after harmonization (i.e. \(\hat {S}^{t,1}\) vs \(\hat {S}^{t,2}\)) are reported in Fig. 5, which indicate that our method would preserve the differences during harmonization of the data

The data \(S^{t,2}\) was generated by adding some bias to the second order RISH features of \(S^{t,1}\) within voxels given by another mask, Mask2; in fact, we assumed that the voxels within Mask2 were affected by disease. The second order RISH features of \(\{S^{r}, S^{t,1}, S^{t,2}\}\) and the masks are shown in Fig. 4. We first registered \(S^{t,1}\) to \(S^{r}\) to learn the spatial mapping \(\Pi \) (for different noise levels) followed by computing the region-wise mean of the RISH features, which were used to obtain the harmonized images \(\{\hat {S}^{t,1},\hat {S}^{t,2}\}\), respectively.

In Fig. 5, for the voxels within Mask2 with different levels of Rician noise, we report the differences between: i) \(S^{t,1}\) and \(S^{t,2}\); and ii) \(\hat {S}^{t,1}\) and \(\hat {S}^{t,2}\) for each of the following measures: \(\Vert C_{2}\Vert ^{2}\), FA, and GFA. Our goal is to demonstrate that the differences in FA and GFA due to disease are preserved during the harmonization procedure, i.e., \({S}^{t,1} - S^{t,2} \approx \hat {S}^{t,1} - \hat {S}^{t,2}\). From the plots in Fig. 5 with 100 different realizations for each noise level, we see that subtle differences in diffusion measures are preserved during the harmonization procedure. Thus, this method could potentially be used to harmonize dMRI data, by first “learning” the harmonization parameters from a group of matched control subjects followed by applying the same parameters to the “diseased” cases.
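As a brief worked step, the behaviour of the RISH feature itself can be read off from (2). Assume that both target-site images are mapped with the same site difference at corresponding voxels and that the term under the square root in (2) is non-negative. Squaring the scaled coefficients and summing over j then gives:

$$ \Vert \pi(C_{i})\Vert^{2}=\sum\limits_{j}\pi(C_{ij})^{2}=\frac{\Vert C_{i}\Vert^{2}+\mathbb{E}_{r}-\mathbb{E}_{t}}{\Vert C_{i}\Vert^{2}}\,\Vert C_{i}\Vert^{2}=\Vert C_{i}\Vert^{2}+\mathbb{E}_{r}-\mathbb{E}_{t}. $$

Writing \(\hat {C}_{2}^{t,1}\) and \(\hat {C}_{2}^{t,2}\) for the harmonized second order coefficients of the two target-site images (notation introduced here for illustration only), the common shift cancels in the difference:

$$ \Vert \hat{C}_{2}^{t,1}\Vert^{2}-\Vert \hat{C}_{2}^{t,2}\Vert^{2}=\Vert C_{2}^{t,1}\Vert^{2}-\Vert C_{2}^{t,2}\Vert^{2}. $$

Under these assumptions the difference in the RISH feature is preserved by construction. FA and GFA are nonlinear functions of the signal, so their differences can only be expected to be preserved approximately, which is what the experiment examines.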

Fig. 5

Difference of ∥C 22, FA, and GFA between: i) S t,2 and S t,2; (original data) and ii) \(\hat {S}^{t,2}\) and \(\hat {S}^{t,2}\) (harmonized data) for different levels of rician noise. It can be seen that the differences computed between the normal and patient cases at the target site are preserved after data harmonization

Conclusion and limitations

In this work, we proposed a registration based framework to harmonize the raw dMRI signal from different sites in a subject-dependent manner, by removing scanner specific differences from the signal.

In our earlier region-wise harmonization method, we required both the DWI images and a structural (T1-weighted) image of the subjects; the structural images were used to perform the region-wise segmentation task. Then, the label maps were registered to the DWI space to obtain inter-subject correspondences. Further, we automatically (albeit with some parameter tuning) had to sub-segment large regions into smaller regions to be able to properly harmonize the data. However, our new registration framework is free of: i) requirement of a structural image; ii) segmentation of the structural image; iii) registration between the structural and the DWI images; and iv) correction of Freesurfer segmentation errors. This significantly simplifies the entire processing pipeline while maintaining the accuracy of the method.

Using synthetic experiments, we demonstrated that, once the signal is harmonized using data from healthy subjects, it can then be used to map another cohort of diseased subjects without altering the signal due to disease or pathology. The proposed method is model independent and directly maps the signal to the reference site. We also validated our approach on a group of new subjects not used to “learn” the mapping transformation, demonstrating the robustness of the proposed approach on new unseen data sets. Thus, the method can be of great use to aggregate data from multiple sites for joint analysis of a large sample of data.

An ideal scenario in which the proposed method could be used is when a few traveling human phantoms are available from all sites, scanned within a very short period of time. In this case, the scanner specific differences can be obtained from these traveling subjects and subsequently used for data harmonization. A similar scheme, albeit using only the dMRI derived scalar measures of FA and MD obtained from a limited single tensor model, was used in Pohl et al. (2016). In contrast, our method allows the acquired dMRI data to be harmonized without any modeling assumptions.

Nevertheless, a comprehensive in-vivo validation study needs to be done to ensure that the dMRI signal due to disease is preserved during the harmonization procedure (we only did synthetic validation, which shows encouraging results). Further, the effect of using this procedure in the case of extreme pathologies such as brain tumors needs to be evaluated. The proposed method cannot be used for DSI data sets; however, it can be used to separately harmonize each b-value shell for multi-shell diffusion data.