HFPRM: Hierarchical Functional Principal Regression Model for Diffusion Tensor Image Bundle Statistics

Zhang, Jingwen; Huang, Chao; Ibrahim, Joseph G.; Jha, Shaili; Knickmeyer, Rebecca C.; Gilmore, John H.; Styner, Martin; Zhu, Hongtu

doi:10.1007/978-3-319-59050-9_38

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10265)

Included in the following conference series:

International Conference on Information Processing in Medical Imaging

Abstract

Diffusion-weighted magnetic resonance imaging (MRI) provides a unique approach to understand the geometric structure of brain fiber bundles and to delineate the diffusion properties across subjects and time. It can be used to identify structural connectivity abnormalities and helps to diagnose brain-related disorders. The aim of this paper is to develop a novel, robust, and efficient dimension reduction and regression framework, called the hierarchical functional principal regression model (HFPRM), to effectively correlate high-dimensional fiber bundle statistics with a set of predictors of interest, such as age, diagnosis status, and genetic markers. The three key novelties of HFPRM are the simultaneous analysis of a large number of fiber bundles, the disentanglement of global and individual latent factors that characterize between-tract correlation patterns, and a bi-level analysis of the predictor effects. Simulations are conducted to evaluate the finite sample performance of HFPRM. We have also applied HFPRM to a genome-wide association study to explore important genetic variants in neonatal white matter development.

Knickmeyer was partially supported by the National Institutes of Health grant MH083045.

Gilmore was partially supported by the National Institutes of Health grants MH064065, MH070890, and HD053000.

Styner was partially supported by the National Institutes of Health grant EB005149-01.

Zhu was partially supported by the National Institutes of Health grant MH086633, the National Science Foundation grants SES-1357666 and DMS-1407655, as well as a senior investigator grant from the Cancer Prevention Research Institute of Texas.

1 Introduction

Investigating the connectional organization of the human brain and its variation across subjects is a critical step toward understanding the pathology of many neuro-related disorders. Diffusion-weighted MRI offers a non-invasive approach to study the tissue structure of white matter fiber bundles in vivo, including both the geometric shape and the diffusion properties [2, 6, 9, 12, 17, 24, 27]. Delineating diffusion statistics along fiber bundles may help identify structural connectivity abnormalities across different spatial-temporal scales. It could eventually inspire new approaches for disease prevention, diagnosis, and clinical treatment.

Group analysis of fiber bundle statistics poses remarkable computational and mathematical challenges to existing statistical methods. The first challenge is to efficiently and simultaneously study multiple fiber bundles with heterogeneous geometric structures and variation patterns. The second challenge is to correlate fiber bundle statistics with a large number of covariates, such as millions of genetic markers. This challenge is motivated by the demand to carry out a genome-wide association study on fiber bundle statistics. Voxel-wise methods [21] and single tract analysis [8, 26, 28] suffer from performing massive multiple comparison adjustments, which would severely reduce detection power. The third challenge is to properly handle the potential correlation among multiple tracts and to disentangle tract-specific information from global information shared by a large portion of fiber bundles.

The aim of this paper is to develop a hierarchical functional principal regression model (HFPRM) framework to address the three challenges discussed above. HFPRM consists of three statistical models, including a varying coefficient model (VCM), a latent factor analysis (LFA) procedure, and a multivariate regression model (MRM). The path diagram of HFPRM is presented in Fig. 1. The VCM not only captures the functional structure of fiber bundle statistics for each single tract, but also maps the heterogeneous geometric structure of multiple fiber bundles onto a common coordinate system. The LFA is applied to characterize potential inter-tract correlation across multiple bundles. It allows us to explicitly identify both tract-specific and global latent signals. The integration of VCM and LFA dramatically reduces the dimension of fiber bundle statistics. Finally, using MRM, we are able to examine the effect of selected predictors on both global level and individual level.

In Sect. 2, we introduce the general framework of HFPRM and propose a two-stage estimation procedure to study both the global effect and the individual tract effect. In Sects. 3 and 4, we use numerical simulations and a real data example to examine the finite sample performance of HFPRM. Section 5 concludes with some remarks.

2 Methods

2.1 Data Structure

Suppose that we obtain a data set with clinical and genetic variables as well as DTI statistics along M fiber bundles from n subjects. For the m-th fiber bundle, $m=1,\cdots ,M$, we use $s_m\in [0,S_m]$ to denote the arc length of any point relative to a fixed end point, where $S_m$ is the longest arc length on the tract. For the i-th subject, where $i=1,\cdots ,n$, $y_{i,m}(s_m)$ denotes a specific diffusion statistic observed at arc length $s_m$ along the m-th tract, and ${{\varvec{x}}}_i$ is a $q \times 1$ vector of covariates.

2.2 HFPRM

HFPRM is proposed to study the association between diffusion properties (e.g., FA, MD or RD) along M fiber bundles and a set of covariates, such as age, gender, and genetic markers. It consists of three key components: a varying coefficient model (VCM), a latent factor analysis (LFA) procedure, and a multivariate regression model (MRM).

The VCM describes the functional association between $\{y_{i,m}(s_m): s_m\in [0, S_m]\}$ and ${{\varvec{x}}}_i$ for a single tract. It admits the following form,

$$\begin{aligned} y_{i,m}(s_m)=\mu _m(s_m)+\eta _{i,m}(s_m)+e_{i,m}(s_m), \end{aligned}$$

(1)

where $\mu _m(s_m)$ is the function of population mean, $\eta _{i,m}(s_m)$ is an individual function characterizing subject-specific spatial variations along the m-th tract, and $e_{i,m}(s_m) $ is the measurement error. Let $SP(0,\varSigma )$ represent a stochastic process with mean zero and covariance operator $\varSigma (s_m, s_m')$. It is assumed that $\eta _{i,m}(s_m)$ and $e_{i,m}(s_m)$ are mutually independent and identical copies of stochastic processes $SP(0,\varSigma _{\eta _m})$ and $SP(0,\varSigma _{e_m})$ respectively, in which $\varSigma _{e_m}(s_m,s_m')=\sigma ^2_{e_m}(s_m)\mathbf{1}(s_m=s_m')$ and $\mathbf{1}(\cdot )$ is an indicator function.

The major challenge in simultaneously studying M fiber bundles is the heterogeneity in their geometric structures. It is necessary to find a common coordinate system for $\{\eta _{i,m}(s_m)\}_{m=1}^M$. Specifically, we use functional principal component analysis (fPCA) to extract the key features in $\eta _{i,m}(s_m)$. Based on Mercer’s theorem, $\varSigma _{\eta _m}(s_m,s_m')$ admits a spectral decomposition as follows:

$$\begin{aligned} \varSigma _{\eta _m}(s_m,s_m') = \sum _{d=1}^{+\infty } \lambda _{md} \phi _{md}(s_m) \phi _{md}(s_m'), \end{aligned}$$

(2)

where $\{\lambda _{md} \ge 0\}$ are eigenvalues in descending order with $\sum _{d=1}^{\infty } \lambda _{md} < \infty $ and $\{\phi _{md}(s_m)\}$ are the corresponding orthonormal eigenfunctions. Using the Karhunen-Loève expansion [13, 16], $\eta _{i,m}(s_m)$ can be expressed as

$$\begin{aligned} \eta _{i,m}(s_m)= & {} \sum _{d=1}^{+\infty } z_{i,md}\phi _{md}(s_m)~~\text{ with }~~ z_{i,md}=\displaystyle \int _0^{S_m} \eta _{i,m}(s_m) \phi _{md}(s_m) ds_m. \end{aligned}$$

(3)

The individual function $\eta _{i,m}(s_m)$ can then be equivalently represented by a set of functional principal component (fPC) scores $\{z_{i, md}: d=1, \ldots , \infty \}$. In practice, a relatively small number of fPC scores accounts for the majority of variation in $\eta _{i,m}(s_m)$. Therefore, we can approximate $\eta _{i,m}(s_m)$ by a finite vector ${{\varvec{z}}}_{i,m}=(z_{i,m1},\ldots ,z_{i,mD})^T$ of dimension D. For notational simplicity, it is assumed that D is the same across all M bundles. We then use the vectors ${{\varvec{z}}}_{i,m}$ to integrate information across the M bundles and denote by ${{\varvec{z}}}_i$ the $p \times 1$ vector that concatenates ${{\varvec{z}}}_{i,1},\ldots ,{{\varvec{z}}}_{i,M}$, where $p=DM$.

An LFA is then proposed to account for potential inter-tract correlation across multiple bundles. Specifically, ${{\varvec{z}}}_i$ is assumed to have the following latent factor structure,

$$\begin{aligned} {{\varvec{z}}}_i={{\varvec{\varLambda }}}{{\varvec{f}}}_i+{{\varvec{u}}}_i, \end{aligned}$$

(4)

where ${\varvec{\varLambda }}$ is a $p\times L$ loading matrix and ${{\varvec{f}}}_i$ and ${{\varvec{u}}}_i$, respectively, represent global and individual latent factors. When there exist homogeneous signal patterns across multiple fiber bundles, L is expected to be much smaller than p. The global factor ${{\varvec{f}}}_i$ thus allows us to study the shared pattern in a low-dimensional space, while tract-specific patterns are captured by the corresponding components of ${{\varvec{u}}}_i=({{\varvec{u}}}_{i,1},\cdots ,{{\varvec{u}}}_{i,M})^T$.

Finally, an MRM is introduced to correlate the global and individual latent factors with the covariates ${{\varvec{x}}}_i$,

$$\begin{aligned} {{\varvec{f}}}_i= & {} {{\varvec{B}}}_f^T{{\varvec{x}}}_i+{{\varvec{\epsilon }}}_{f,i} ~~\text{ and }~~ {{\varvec{u}}}_{i,m} ={{\varvec{B}}}_{u_m}^T {{\varvec{x}}}_i+{{\varvec{\epsilon }}}_{u_m,i}, ~ \text {for } m =1 ,\cdots , M, \end{aligned}$$

(5)

where ${{\varvec{B}}}_f$ and ${{\varvec{B}}}_{u_m}$ are, respectively, $q\times L$ and $q\times D$ coefficient matrices and ${{\varvec{\epsilon }}}_{f,i}$ and ${{\varvec{\epsilon }}}_{u_m,i}$ are residual terms. Using (5), we are able to perform a hierarchical analysis on both global level and individual level.
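To make the bi-level structure concrete, substituting (5) into (4) gives the regression implied for the fPC scores of the m-th tract. This is a sketch of the algebra only. The symbol ${{\varvec{\varLambda }}}_m$, denoting the $D\times L$ block of rows of ${{\varvec{\varLambda }}}$ that corresponds to the m-th tract, is notation introduced here for illustration and is not part of the model statement above.

$$\begin{aligned} {{\varvec{z}}}_{i,m} = {{\varvec{\varLambda }}}_m{{\varvec{f}}}_i+{{\varvec{u}}}_{i,m} = \left( {{\varvec{\varLambda }}}_m{{\varvec{B}}}_f^T+{{\varvec{B}}}_{u_m}^T\right) {{\varvec{x}}}_i+{{\varvec{\varLambda }}}_m{{\varvec{\epsilon }}}_{f,i}+{{\varvec{\epsilon }}}_{u_m,i}. \end{aligned}$$

Read this way, a covariate can influence a tract through the shared term ${{\varvec{\varLambda }}}_m{{\varvec{B}}}_f^T{{\varvec{x}}}_i$, which enters every tract via the global factors, or through the tract-specific term ${{\varvec{B}}}_{u_m}^T{{\varvec{x}}}_i$.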

2.3 Estimation and Inference Procedure

In practice, diffusion statistics are observed on discrete grid points along each tract. For the m-th tract, assume that $y_{i,m}(s_m)$ is observed on the sample point set $\mathcal {S}_m=\{s_{m,1},\ldots ,s_{m,k},\ldots ,s_{m,K_m}\} \subset [0,S_m]$. We use the following two-stage procedure to estimate the fPC scores $\mathbf {Z}=\{{{\varvec{z}}}_i\}_{1\le i\le n}$, the global factors $\mathbf {F}=\{{{\varvec{f}}}_i\}_{1\le i\le n}$ and the individual factors $\mathbf {U}=\{{{\varvec{u}}}_i\}_{1\le i\le n}$.

Stage I: For each tract, $\mu _m(s_m)$ and $\eta _{i,m}(s_m)$ are estimated from (1), and functional principal component analysis is applied to calculate $\hat{\phi }_{md}(s_m)$ and $\hat{{{\varvec{z}}}}_i$.
Stage II: Factor analysis is performed on $\hat{{{\varvec{z}}}}_i$ to extract the global factor $\hat{{{\varvec{f}}}}_i$ and the individual factor $\hat{{{\varvec{u}}}}_i$. Regression and hypothesis testing can then be applied to $\hat{{{\varvec{f}}}}_i$ and $\hat{{{\varvec{u}}}}_i$ respectively.

Details of the two stages are given below.

In Stage I, we apply the local linear kernel smoothing technique to estimate the mean curve in model (1). $\mu _{m}(s_m)$ is first approximated by the following Taylor expansion,

$$\begin{aligned} \mu _{m}(s_{m,k})\approx \mu _m(s_m)+d\mu _m(s_m)(s_{m,k}-s_m). \end{aligned}$$

(6)

Let K(s) be a predetermined smoothing kernel and let $K_h(s)=\frac{1}{h}K(\frac{s}{h})$ denote the rescaled kernel with bandwidth h. Then $\hat{\mu }_m(s_m)$ and $d\hat{\mu }_m(s_m)$ are estimated as the minimizers of the following weighted least squares function,

$$\begin{aligned} \sum _{i=1}^n \sum _{k=1}^{K_m}[y_{i,m}(s_{m,k})-\mu _m(s_m)-d\mu _m(s_m)(s_{m,k}-s_m)]^2K_h(s_{m,k}-s_m), \end{aligned}$$

(7)

and the solution $\hat{\mu }_m(s_m)$ is a smooth curve with local linearity. A higher-order expansion can be used if a more complicated polynomial structure is needed.
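The minimization in (7) has a closed form at each evaluation point, which can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes all subjects share one grid on the tract, and the Epanechnikov kernel, the bandwidth value, and the function names are choices made here for the example (the text leaves K and h unspecified).

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1] (assumed choice of K)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_linear_mean(Y, grid, s_eval, h):
    """Local linear estimate of the mean curve and its first derivative.

    Y      : (n, K) array with Y[i, k] = y_i(s_k) for one tract
    grid   : (K,) arc-length grid shared by all subjects (assumption)
    s_eval : (J,) points where the estimate is evaluated
    h      : bandwidth; each window must contain at least two grid points
    """
    n = Y.shape[0]
    y_sum = Y.sum(axis=0)                      # sum over subjects at each grid point
    mu_hat = np.empty(len(s_eval))
    dmu_hat = np.empty(len(s_eval))
    for j, s in enumerate(s_eval):
        d = grid - s                           # s_k - s
        w = epanechnikov(d / h) / h            # K_h(s_k - s)
        # Normal equations of the weighted least squares criterion (7)
        A = n * np.array([[w.sum(), (w * d).sum()],
                          [(w * d).sum(), (w * d**2).sum()]])
        b = np.array([(w * y_sum).sum(), (w * d * y_sum).sum()])
        mu_hat[j], dmu_hat[j] = np.linalg.solve(A, b)
    return mu_hat, dmu_hat

# Toy data: a linear mean curve plus noise (hypothetical values).
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 51)
Y = 2.0 + 3.0 * grid + rng.normal(scale=0.05, size=(20, 51))
mu_hat, dmu_hat = local_linear_mean(Y, grid, grid, h=0.1)
```

Because the local model is linear, a noise-free linear curve is reproduced exactly, which gives a convenient sanity check for an implementation of this kind.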

Similarly, we expand the individual function $\eta _{i,m}(s_m)$ for subject i as follows,

$$\begin{aligned} \eta _{i,m}(s_{m,k})\approx \eta _{i,m}(s_m)+d\eta _{i,m}(s_m)(s_{m,k}-s_m). \end{aligned}$$

(8)

The corresponding weighted least squares function is given by

$$\begin{aligned} \sum _{k=1}^{K_m}[y_{i,m}(s_{m,k})-\hat{\mu }_{m}(s_{m,k})-\eta _{i,m}(s_m)-d\eta _{i,m}(s_m)(s_{m,k}-s_m)]^2K_h(s_{m,k}-s_m). \end{aligned}$$

(9)

Once the smoothed individual functions $\{\hat{\eta }_{i,m}(s_m)\}_{i=1}^n$ are obtained, we calculate the empirical covariance function $\hat{\varSigma }_{\eta _m}(s_m,s_m^{\prime })=\frac{1}{n}\sum _{i=1}^n \hat{\eta }_{i,m}(s_m)\hat{\eta }_{i,m}(s_m^{\prime })$. The eigenfunctions $\{\hat{\phi }_{md}(s_m)\}$ are then estimated from the spectral decomposition,

$$\begin{aligned} \hat{\varSigma }_{\eta _m}(s_m,s_m^{\prime }) = \sum _{d} \hat{\lambda }_{md} \hat{\phi }_{md}(s_m) \hat{\phi }_{md}(s_m^{\prime }). \end{aligned}$$

(10)

Then the individual random effect $\hat{\eta }_{i,m}(s_m)$ is projected onto the basis functions $\{\hat{\phi }_{md}(s_m)\}$ to obtain the functional PC scores,

$$\begin{aligned} \hat{z}_{i,md}=\sum _{k=1}^{K_m}\hat{\eta }_{i,m}(s_{m,k})\hat{\phi }_{md}(s_{m,k}). \end{aligned}$$

(11)
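On a discrete grid, the steps from the empirical covariance to the scores in (11) reduce to an eigendecomposition of a $K_m\times K_m$ matrix. The sketch below is one way to carry this out, under the assumptions that the curves are already smoothed and centered, that all subjects share the grid, and that eigenvectors are normalized in the Euclidean sense (no quadrature weight), which matches the plain sum in (11). The function name and the toy data are illustrative.

```python
import numpy as np

def fpca_scores(eta_hat, D):
    """Discretized fPCA for one tract.

    eta_hat : (n, K) smoothed, mean-zero individual curves on a common grid
    D       : number of functional principal components to keep
    Returns eigenvalues (D,), eigenvectors (K, D) and scores (n, D).
    """
    n = eta_hat.shape[0]
    cov = eta_hat.T @ eta_hat / n              # empirical covariance on the grid
    eigval, eigvec = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1][:D]       # indices of the D largest
    lam = eigval[order]
    phi = eigvec[:, order]                     # orthonormal columns
    z = eta_hat @ phi                          # Eq. (11): sum_k eta(s_k) * phi_d(s_k)
    return lam, phi, z

# Toy curves built from two smooth components (hypothetical data).
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 40)
basis = np.column_stack([np.sin(np.pi * grid), np.sin(2 * np.pi * grid)])
coef = rng.normal(size=(50, 2)) * np.array([3.0, 1.0])
eta = coef @ basis.T                           # (50, 40), rank 2
lam, phi, z = fpca_scores(eta, D=2)
```

With this normalization the mean squared score of each component equals its eigenvalue, and curves of rank D are recovered exactly from `z @ phi.T`.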

There are several strategies to determine the number of fPCs to extract. For example, model selection criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) [25], as well as cross-validation (CV) [20], have been adapted for this purpose. Alternatively, the percentage of explained variation is widely used in practice to choose a cut-off. Here, we choose D as the minimum number of fPCs that accounts for at least $V\%$ of the total variation in each tract. When the optimal $D=D_m$ differs across tracts, the largest $D_m$ is used for all tracts.
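The explained-variation rule can be sketched in a few lines. The helper names below are hypothetical, and the eigenvalues are toy numbers chosen so the cumulative fractions are easy to read.

```python
import numpy as np

def n_components_for_variance(eigvals, V):
    """Smallest number of components whose eigenvalues explain at least V percent."""
    lam = np.sort(np.clip(np.asarray(eigvals, dtype=float), 0.0, None))[::-1]
    frac = np.cumsum(lam) / lam.sum()          # cumulative explained fraction
    d = int(np.searchsorted(frac, V / 100.0)) + 1
    return min(d, len(lam))                    # guard against rounding at V = 100

def common_D(eigvals_by_tract, V):
    """Use the largest tract-wise D_m for all tracts, as described in the text."""
    return max(n_components_for_variance(ev, V) for ev in eigvals_by_tract)

# Toy eigenvalues for two tracts (hypothetical numbers).
tract_eigvals = [[5.0, 3.0, 1.0, 1.0], [9.0, 0.5, 0.5]]
D_each = [n_components_for_variance(ev, 85) for ev in tract_eigvals]
D = common_D(tract_eigvals, 85)
```

In the toy example the first tract needs three components to pass 85% and the second needs one, so the common value is three.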

In Stage II, a PCA-based factor analysis is performed. Let $\{\hat{{\varvec{\xi }}}_1,\ldots ,\hat{{\varvec{\xi }}}_L\}$ be the first L eigenvectors of the sample covariance matrix $\hat{\mathbf {\Sigma }}_\mathbf {z} =\frac{1}{n}\hat{\mathbf {Z}}^T\hat{\mathbf {Z}}$. The loading matrix, the global factors and the individual factors are estimated as

$$\begin{aligned} \hat{\mathbf {\Lambda }}=\sqrt{p}(\hat{{\varvec{\xi }}}_1,\ldots ,\hat{{\varvec{\xi }}}_L),~ \hat{\mathbf {F}}=\frac{1}{p}\hat{\mathbf {Z}}\hat{\mathbf {\Lambda }},~ \text {and}~\hat{\mathbf {U}} = \hat{\mathbf {Z}} - \hat{\mathbf {F}}\hat{\mathbf {\Lambda }}^T \end{aligned}$$

(12)
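The three estimates in (12) translate directly into matrix operations. The following is a minimal sketch, assuming the score matrix has one row per subject and has been column-centered; the function name and the simulated scores are illustrative only.

```python
import numpy as np

def pca_factor_analysis(Z, L):
    """PCA-based factor analysis following Eq. (12).

    Z : (n, p) matrix of concatenated fPC scores, assumed column-centered
    L : number of global factors
    Returns loadings (p, L), global factors (n, L), individual factors (n, p).
    """
    n, p = Z.shape
    cov = Z.T @ Z / n                              # sample covariance of the scores
    eigval, eigvec = np.linalg.eigh(cov)           # ascending eigenvalues
    xi = eigvec[:, np.argsort(eigval)[::-1][:L]]   # first L eigenvectors
    Lam = np.sqrt(p) * xi                          # loading matrix
    F = Z @ Lam / p                                # global factors
    U = Z - F @ Lam.T                              # individual factors
    return Lam, F, U

# Toy scores driven by one shared factor plus noise (hypothetical data).
rng = np.random.default_rng(2)
n, p, L = 100, 12, 1
f_true = rng.normal(size=(n, 1))
Z = 2.0 * (f_true @ np.ones((1, p))) + rng.normal(size=(n, p))
Z -= Z.mean(axis=0)
Lam, F, U = pca_factor_analysis(Z, L)
```

By construction the loadings satisfy $\hat{\mathbf {\Lambda }}^T\hat{\mathbf {\Lambda }}=p\,\mathbf {I}_L$, and the individual factors are orthogonal to the loading directions, so the global and individual parts do not overlap.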

Finally, the MRM (5) is used to estimate the regression coefficients. Standard test statistics, such as Wald and score statistics, can subsequently be applied for inference.
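As one illustration of this final step, the sketch below fits a multivariate regression by ordinary least squares and forms a Wald statistic for the hypothesis that a single covariate has no effect on any outcome column. It assumes independent subjects and homoscedastic residuals, so it does not cover correlated samples such as twin pairs; the function name and simulated data are hypothetical.

```python
import numpy as np

def wald_test_row(X, Y, j):
    """Wald statistic for H0: covariate j has no effect on any column of Y.

    X : (n, q) design matrix (include a column of ones for an intercept)
    Y : (n, L) outcomes, e.g. estimated global factors
    Returns (W, df); under H0, W is approximately chi-square with df = L.
    """
    n, q = X.shape
    L = Y.shape[1]
    XtX_inv = np.linalg.inv(X.T @ X)
    B = XtX_inv @ X.T @ Y                      # (q, L) OLS coefficients
    resid = Y - X @ B
    Sigma = resid.T @ resid / (n - q)          # residual covariance, (L, L)
    b = B[j]                                   # effects of covariate j
    # Cov(b) = [(X'X)^{-1}]_{jj} * Sigma for the j-th row of the OLS estimate
    W = b @ np.linalg.solve(XtX_inv[j, j] * Sigma, b)
    return float(W), L

# Toy example: covariate 1 has an effect, covariate 2 does not (hypothetical data).
rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
B_true = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
Y = X @ B_true + rng.normal(size=(n, 2))
W_effect, df = wald_test_row(X, Y, j=1)
W_null, _ = wald_test_row(X, Y, j=2)
```

The statistic is compared with a chi-square reference distribution with `df` degrees of freedom; for example, the 0.95 quantile is about 5.991 when `df` is 2.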

3 Simulations

In this section, numerical simulations are conducted to evaluate the proposed method. Particularly, we examine the performance of HFPRM to detect covariate effect in hypothesis testing.

3.1 Setup

Eleven fiber tracts with FA measures, listed in Table 1, were selected from diffusion tensor tractography in the UNC Early Human Brain Development Studies [7]. Functional responses were simulated from a varying coefficient model with fixed covariate effects,

$$\begin{aligned} y_{i,m}(s_m)=\mu _m(s_m)+{{\varvec{\beta }}}_m(s_m)^T{{\varvec{x}}}_i+\eta _{i,m}(s_m)+e_{i,m}(s_m), \end{aligned}$$

(13)

where $i=1,\cdots ,n$ and $m=1,\cdots ,11$, ${{\varvec{\beta }}}_m(s_m)$ was a $q \times 1$ vector of coefficient functions along the m-th tract, the covariates ${{\varvec{x}}}_i=(x_{i1},\cdots ,x_{iq})^T$ were generated from N(0, 1) for continuous variables or from a multinomial distribution with equal probabilities for categorical variables, $\eta _{i,m}(s_m)$ followed a Gaussian process $GP\{0,\varSigma _{\eta _m}\}$ and $e_{i,m}(s_m)$ followed $GP\{0,\varSigma _{e_m}\}$. Compared to model (1), the above equation directly specified the covariates as fixed effects. The sample size n was set to 100, and the true parameters $({{\varvec{\beta }}}_m(s_m), \varSigma _{\eta _m},\varSigma _{e_m})$ were estimated from real data using FADTTS [28].

To examine our method, the following two scenarios for ${{\varvec{\beta }}}_m(s_m)^T{{\varvec{x}}}_i$ were simulated. In case I, the aim was to study an effect shared by multiple tracts. Gender (G) and gestational age at birth (Gage) were included as covariates for all 11 tracts,

$$\begin{aligned} y_{i,m}(s_m)=\mu _m(s_m)+c\beta _{m,1}(s_m)\text {Gage}_i+\beta _{m,2}(s_m)\text {G}_i+\eta _{i,m}(s_m)+e_{i,m}(s_m),~ \forall m, \end{aligned}$$

in which we set $c=0,0.2,0.4,0.6$ and tested the Gage effect.

In case II, we examined a tract-specific effect. Birth weight (BW) was added as a covariate to one particular tract, the right uncinate fasciculus $(m=11)$, in addition to the covariates of case I,

$$\begin{aligned} y_{i,m}(s_m)= & {} \mu _m(s_m)+\beta _{m,1}(s_m)\text {Gage}_i+\beta _{m,2}(s_m)\text {G}_i+\eta _{i,m}(s_m)+e_{i,m}(s_m), ~m\le 10, \\ y_{i,11}(s_{11})= & {} \mu _{11}(s_{11})+\beta _{11,1}(s_{11})\text {Gage}_i+\beta _{11,2}(s_{11})\text {G}_i+c\beta _{11,3}(s_{11})\text {BW}_i\\+ & {} \eta _{i,11}(s_{11})+e_{i,11}(s_{11}), \end{aligned}$$

where the effect size c took the values 0, 0.5, 1, 1.5 and the effect of BW was tested.

We applied HFPRM to the simulated datasets. The varying coefficient model (1) was first fitted to estimate the individual functions. Functional principal components were then extracted such that at least 85% of the total variation was included for each tract. In the factor analysis, the first elbow point in the scree plot was taken as the cut-off to determine the number of global factors. In the testing step, type I error and statistical power were calculated at significance level $\alpha =0.05$ based on 1000 simulation replications. FADTTS was also applied to each single tract and the results were compared.
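The rejection rates reported below are Monte Carlo proportions: the fraction of replications in which the test rejects at level 0.05. The sketch that follows shows only this bookkeeping, with a deliberately simplified stand-in for the full pipeline: a scalar outcome regressed on one standard normal covariate with unit-variance noise, tested with a one-degree-of-freedom Wald statistic. The model, function name and seed are assumptions made for illustration and do not reproduce the simulation design above.

```python
import numpy as np

def rejection_rate(c, n=100, reps=1000, crit=3.841, seed=0):
    """Share of replications in which a Wald test of the slope rejects H0.

    c    : effect size; c = 0 gives the empirical type I error, c > 0 the power
    crit : 0.95 quantile of the chi-square distribution with 1 degree of freedom
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(reps, n))             # one covariate per replication
    y = c * x + rng.normal(size=(reps, n))     # toy outcome model (assumption)
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    sxx = (xc**2).sum(axis=1)
    beta = (xc * yc).sum(axis=1) / sxx         # OLS slope in each replication
    resid = yc - beta[:, None] * xc
    s2 = (resid**2).sum(axis=1) / (n - 2)      # residual variance estimate
    W = beta**2 * sxx / s2                     # Wald statistic, beta^2 / Var(beta)
    return float((W > crit).mean())

# Rejection rates over a grid of effect sizes.
rates = {c: rejection_rate(c) for c in (0.0, 0.2, 0.4, 0.6)}
```

The entry at `c = 0.0` estimates the type I error and should sit near the nominal level, while the remaining entries trace out an empirical power curve.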

3.2 Results

In case I, the first five functional principal components were extracted for each tract and the first factor was identified as the global factor. The rejection rates of the global factor analysis and FADTTS for testing the Gage effect are presented in Fig. 2(a). The global factor analysis is substantially more powerful than the single tract analysis in detecting a commonly shared effect. Such results are expected since a common effect tends to accumulate in the global factor.

In case II, the first five functional principal components were extracted for each tract and the first two factors were identified as global factors. Figure 2(b) shows the rejection rates of the global factor analysis, the individual factor analysis and FADTTS for testing the BW effect. As can be seen, the individual factor analysis in HFPRM achieves power comparable to that of the single tract analysis for detecting a tract-specific effect.

4 Early Human Brain Development Study

To investigate how genetic factors influence brain structure in prenatal and early postnatal stage, we conducted a genome-wide association study on the fiber bundle statistics in a unique cohort of infants. A total number of 662 neonatal twin subjects were taken from the UNC Early Brain Development Studies [7].

4.1 Data Acquisition and Preprocessing

MRI scans were acquired either on a 3T Siemens Allegra head-only scanner (N = 566) or on a 3T Siemens TIM Trio scanner (N = 96). For the Allegra model, 339 diffusion weighted images were acquired by a single shot EPI DTI sequence with the following parameters: TR/TE = 5200/73 ms, voxel resolution = $2 \times 2 \times 2\,\text {mm}^3$, 6 non-collinear directions with $b=1000\,\text {s}/\text {mm}^2$ and 1 baseline image with $b=0$. To improve the signal-to-noise ratio, five scans were repeated and averaged. For the remaining subjects scanned on the Allegra, DWIs were acquired with the following parameters: TR/TE = 7680/82 ms, voxel resolution = $2 \times 2 \times 2\,\text {mm}^3$, 42 non-collinear directions with diffusion gradients of $b=1000\,\text {s}/\text {mm}^2$ in addition to 7 baseline images. For the Trio model, DWIs were acquired using a protocol similar to that of the 42-direction Allegra acquisition, with TR/TE = 7200/83 ms. Quality control was applied to the raw DWIs using DTIPrep [18], and FSL [11, 22] was used for skull stripping and brain masking. We used a weighted least squares method [8] to estimate diffusion tensors and followed the UNC-Utah NA-MIC framework [23] to create a study-specific atlas. Subsequently, a total of 44 fiber tracts, listed in Table 1, were reconstructed in the atlas space using a streamline algorithm [5]. For each subject, four scalar diffusion properties, FA, MD, AD and RD, were then calculated at each location along each tract using neighboring diffusion tensors.

Genotyping of single nucleotide polymorphisms (SNPs) was conducted on Affymetrix Axiom genome-wide LAT Array. Samples with call rates less than $95\%$, outliers for homozygosity, ancestry outliers and unexpected relatedness were excluded from the study. We also removed genetic markers with Hardy-Weinberg equilibrium p-value less than $10^{-8}$, call rate less than $95\%$ and Mendelian error rate larger than $10\%$. Population stratification was assessed using PCA [19]. Imputation was performed with MaCH-Admix [15] using 1000G reference panel [3]. To evaluate the quality of imputed SNPs, we computed the mean $R^2$ under varying minor allele frequency (MAF) categories and selected $R^2$ cutoffs as described in [14]. SNPs with MAF less than 0.01 were excluded from imputed dataset. Eventually, 472 twin subjects (32 MZ pairs, 75 DZ pairs and 259 singletons or unpaired twin subjects) and 8,538,562 genetic markers were retained for further analysis.
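The marker-level exclusion criteria above amount to a set of threshold checks. The sketch below encodes them as a single predicate for illustration; in the text the Hardy-Weinberg, call rate and Mendelian error filters precede imputation and the MAF filter follows it, so collapsing them into one pass is a simplification. The record type, field names and toy markers are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MarkerStats:
    """Per-marker summary statistics (hypothetical record layout)."""
    marker_id: str
    hwe_p: float              # Hardy-Weinberg equilibrium p-value
    call_rate: float          # fraction of samples with a genotype call
    mendel_error_rate: float  # fraction of Mendelian inconsistencies
    maf: float                # minor allele frequency

def passes_marker_qc(m, hwe_min=1e-8, call_min=0.95, mendel_max=0.10, maf_min=0.01):
    """Keep a marker only if it clears every threshold quoted in the text."""
    return (m.hwe_p >= hwe_min
            and m.call_rate >= call_min
            and m.mendel_error_rate <= mendel_max
            and m.maf >= maf_min)

# Toy markers, one failing each criterion in turn (made-up values).
markers = [
    MarkerStats("toy_1", hwe_p=0.30, call_rate=0.99, mendel_error_rate=0.00, maf=0.20),
    MarkerStats("toy_2", hwe_p=1e-10, call_rate=0.99, mendel_error_rate=0.00, maf=0.20),
    MarkerStats("toy_3", hwe_p=0.30, call_rate=0.90, mendel_error_rate=0.00, maf=0.20),
    MarkerStats("toy_4", hwe_p=0.30, call_rate=0.99, mendel_error_rate=0.25, maf=0.20),
    MarkerStats("toy_5", hwe_p=0.30, call_rate=0.99, mendel_error_rate=0.00, maf=0.005),
]
kept = [m.marker_id for m in markers if passes_marker_qc(m)]
```

Only the first toy marker clears all four thresholds; each of the others fails exactly one of them.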

Table 1. List of fiber tracts in simulation and real data experiment

4.2 Data Analysis

In this experiment, we chose to focus on the fractional anisotropy (FA) measure. FA quantifies the extent of local directional water diffusion and partially reflects the degree of bundle maturation in premature brains [4]. To eliminate the heterogeneity in variance among different tracts, $y_{i,m}(s_m)$ was rescaled by the total standard deviation along the tract. For the twin study, an ACE model was fitted in (5) to account for correlation within twin pairs. Seven variables were added as covariates: gestational age at birth, gender, DTI direction, scanner type and the first three genetic principal components, the last three to adjust for population stratification.

4.3 Results

In functional PCA, the first 5 functional principal components were extracted for each tract to include at least 70% of the variation. Figure 3(a) shows the scree plot from the factor analysis, in which the elbow point is located at factor 2. Therefore, the first factor was identified as the global factor. We then performed a GWAS on the global factor. The result is shown in Fig. 3(b). In the Manhattan plot, we observed a significant region in the anaplastic lymphoma kinase (ALK) gene on chromosome 2. The ALK gene encodes a neuronal orphan receptor tyrosine kinase that plays an important role in nervous system development [1] and is highly expressed in the neonatal brain [10]. As a comparison, we also performed an association analysis for the top hit rs66556850 on each single tract. The result is presented in Fig. 3(c). A number of tracts have relatively small p-values, yet not small enough to be detected by a single tract GWAS. This indicates that the global factor analysis is more powerful than single tract analysis for detecting a commonly shared genetic effect.

5 Conclusion

We have developed a hierarchical functional principal regression model (HFPRM) to efficiently conduct joint analysis on diffusion statistics from multiple neurofiber bundles. A varying coefficient model is introduced and functional PCA is applied to capture major tract variation. Factor analysis is then adopted to extract key features at both global level and individual level. Finally, standard estimation and testing procedures can be applied to study global effect and tract-specific effect. Simulation results demonstrated that HFPRM is powerful to detect common effect shared by multiple tracts. HFPRM has also been successfully applied to a genome-wide association study on neonatal twins. We are able to identify some important genetic variants related to early childhood brain development that were ignored by single tract analysis.

References

1. National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/gene/238
2. Bach, M., Laun, F.B., Leemans, A., Tax, C.M., Biessels, G.J., Stieltjes, B., Maier-Hein, K.H.: Methodological considerations on tract-based spatial statistics (TBSS). Neuroimage 100, 358–369 (2014)
3. Genomes Project Consortium, et al.: An integrated map of genetic variation from 1,092 human genomes. Nature 491(7422), 56–65 (2012)
4. Dubois, J., Hertz-Pannier, L., Dehaene-Lambertz, G., Cointepas, Y., Le Bihan, D.: Assessment of the early organization and maturation of infants’ cerebral white matter fiber bundles: a feasibility study using quantitative diffusion tensor imaging and tractography. Neuroimage 30(4), 1121–1132 (2006)
5. Fedorov, A., Beichel, R., Kalpathy-Cramer, J., Finet, J., Fillion-Robin, J.C., Pujol, S., Bauer, C., Jennings, D., Fennessy, F., Sonka, M., et al.: 3D slicer as an image computing platform for the quantitative imaging network. Magn. Reson. Imaging 30(9), 1323–1341 (2012)
6. Garyfallidis, E., Ocegueda, O., Wassermann, D., Descoteaux, M.: Robust and efficient linear registration of white-matter fascicles in the space of streamlines. NeuroImage 117, 124–140 (2015)
7. Gilmore, J.H., Schmitt, J.E., Knickmeyer, R.C., Smith, J.K., Lin, W., Styner, M., Gerig, G., Neale, M.C.: Genetic and environmental contributions to neonatal brain structure: a twin study. Hum. Brain Mapp. 31(8), 1174–1182 (2010)
8. Goodlett, C.B., Fletcher, P.T., Gilmore, J.H., Gerig, G.: Group analysis of DTI fiber tract statistics with application to neurodevelopment. NeuroImage 45, S133–S142 (2009)
9. Guevara, P., Poupon, C., Rivière, D., Cointepas, Y., Descoteaux, M., Thirion, B., Mangin, J.: Robust clustering of massive tractography datasets. NeuroImage 54(3), 1975–1993 (2011)
10. Iwahara, T., Fujimoto, J., Wen, D., Cupples, R., Bucay, N., Arakawa, T., Mori, S., Ratzkin, B., Yamamoto, T.: Molecular characterization of ALK, a receptor tyrosine kinase expressed specifically in the nervous system. Oncogene 14(4), 439–449 (1997)
11. Jenkinson, M., Beckmann, C.F., Behrens, T.E., Woolrich, M.W., Smith, S.M.: FSL. Neuroimage 62(2), 782–790 (2012)
12. Jin, Y., Shi, Y., Zhan, L., Gutman, B.A., de Zubicaray, G.I., McMahon, K.L., Wright, M.J., Toga, A.W., Thompson, P.M.: Automatic clustering of white matter fibers in brain diffusion MRI with an application to genetics. NeuroImage 100, 75–90 (2014)
13. Karhunen, K.: Zur Spektraltheorie stochastischer Prozesse. Ann. Acad. Sci. Finnicae Ser. A 1, 34 (1946)
14. Liu, E.Y., Buyske, S., Aragaki, A.K., Peters, U., Boerwinkle, E., Carlson, C., Carty, C., Crawford, D.C., Haessler, J., Hindorff, L.A., et al.: Genotype imputation of Metabochip SNPs using a study-specific reference panel of 4,000 haplotypes in African Americans from the Women’s Health Initiative. Genet. Epidemiol. 36(2), 107–117 (2012)
15. Liu, E.Y., Li, M., Wang, W., Li, Y.: MaCH-Admix: genotype imputation for admixed populations. Genet. Epidemiol. 37(1), 25–37 (2013)
16. Loève, M.: Fonctions aléatoires à décomposition orthogonale exponentielle. La Revue Scientifique 84, 159–162 (1946)
17. O’Donnell, L.J., Westin, C.F., Golby, A.J.: Tract-based morphometry for white matter group analysis. NeuroImage 45, 832–844 (2009)
18. Oguz, I., Farzinfar, M., Matsui, J., Budin, F., Liu, Z., Gerig, G., Johnson, H.J., Styner, M.A.: DTIPrep: quality control of diffusion-weighted images. Front. Neuroinform. 8, 4 (2014)
19. Price, A.L., Patterson, N.J., Plenge, R.M., Weinblatt, M.E., Shadick, N.A., Reich, D.: Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38(8), 904–909 (2006)
20. Rice, J.A., Silverman, B.W.: Estimating the mean and covariance structure nonparametrically when the data are curves. J. R. Stat. Soc. Ser. B (Methodol.) 53, 233–243 (1991)
21. Smith, S.M., Jenkinson, M., Johansen-Berg, H., Rueckert, D., Nichols, T.E., Mackay, C.E., Watkins, K.E., Ciccarelli, O., Cader, M.Z., Matthews, P.M., et al.: Tract-based spatial statistics: voxelwise analysis of multi-subject diffusion data. Neuroimage 31(4), 1487–1505 (2006)
22. Smith, S.M., Jenkinson, M., Woolrich, M.W., Beckmann, C.F., Behrens, T.E., Johansen-Berg, H., Bannister, P.R., De Luca, M., Drobnjak, I., Flitney, D.E., et al.: Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 23, S208–S219 (2004)
23. Verde, A.R., Budin, F., Berger, J.B., Gupta, A., Farzinfar, M., Kaiser, A., Ahn, M., Johnson, H.J., Matsui, J., Hazlett, H.C., et al.: UNC-Utah NA-MIC framework for DTI fiber tract analysis. Front. Neuroinform. 7, 51 (2014)
24. Wedeen, V.J., Rosene, D.L., Wang, R., Dai, G., Mortazavi, F., Hagmann, P., Kaas, J.H., Tseng, W.Y.I.: The geometric structure of the brain fiber pathways. Science 335(6076), 1628–1634 (2012)
25. Yao, F., Müller, H.G., Wang, J.L.: Functional data analysis for sparse longitudinal data. J. Am. Stat. Assoc. 100(470), 577–590 (2005)
26. Yuan, Y., Gilmore, J.H., Geng, X., Martin, S., Chen, K., Wang, J.I., Zhu, H.: FMEM: functional mixed effects modeling for the analysis of longitudinal white matter tract data. NeuroImage 84, 753–764 (2014)
27. Yushkevich, P.A., Zhang, H., Simon, T.J., Gee, J.C.: Structure-specific statistical mapping of white matter tracts. NeuroImage 41, 448–461 (2008)
28. Zhu, H., Kong, L., Li, R., Styner, M., Gerig, G., Lin, W., Gilmore, J.H.: FADTTS: functional analysis of diffusion tensor tract statistics. NeuroImage 56(3), 1412–1425 (2011)

Author information

Authors and Affiliations

Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA
Jingwen Zhang, Chao Huang, Joseph G. Ibrahim & Hongtu Zhu
Curriculum in Neurobiology, University of North Carolina at Chapel Hill, Chapel Hill, USA
Shaili Jha
Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, USA
Rebecca C. Knickmeyer, John H. Gilmore & Martin Styner
Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, USA
Hongtu Zhu

Corresponding author

Correspondence to Hongtu Zhu.

Editor information

Editors and Affiliations

University of North Carolina, Chapel Hill, North Carolina, USA
Marc Niethammer
University of North Carolina, Chapel Hill, North Carolina, USA
Martin Styner
Kitware Inc., Carrboro, North Carolina, USA
Stephen Aylward
University of North Carolina, Chapel Hill, North Carolina, USA
Hongtu Zhu
University of Pennsylvania, Philadelphia, Pennsylvania, USA
Ipek Oguz
University of North Carolina, Chapel Hill, North Carolina, USA
Pew-Thian Yap
University of North Carolina, Chapel Hill, North Carolina, USA
Dinggang Shen

About this paper

Cite this paper

Zhang, J. et al. (2017). HFPRM: Hierarchical Functional Principal Regression Model for Diffusion Tensor Image Bundle Statistics. In: Niethammer, M., et al. Information Processing in Medical Imaging. IPMI 2017. Lecture Notes in Computer Science, vol 10265. Springer, Cham. https://doi.org/10.1007/978-3-319-59050-9_38

DOI: https://doi.org/10.1007/978-3-319-59050-9_38
Published: 23 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59049-3
Online ISBN: 978-3-319-59050-9
eBook Packages: Computer Science, Computer Science (R0)
