Abstract
The molecular determinants of tissue composition of the human brain remain largely unknown. Recent genome-wide association studies (GWAS) on this topic have had limited success due to methodological constraints. Here, we apply advanced whole-brain analyses on multi-shell diffusion imaging data and multivariate GWAS to two large-scale imaging genetic datasets (UK Biobank and the Adolescent Brain Cognitive Development study) to identify and validate genetic association signals. We discover 503 unique genetic loci that have an impact on multiple regions of the human brain. Among them, more than 79% are validated in either of two large-scale independent imaging datasets. Key molecular pathways involved in axonal growth, astrocyte-mediated neuroinflammation, and synaptogenesis during development are found to significantly impact the measured variations in tissue-specific imaging features. Our results shed new light on the biological determinants of brain tissue composition and their potential overlap with the genetic basis of neuropsychiatric disorders.
Introduction
The human brain develops through complex yet carefully orchestrated neurobiological processes, whereby cortical and subcortical circuitries are integrated for proper functioning1. Neural migration, axonal guidance, and synapse formation are coordinated through spatially distributed molecular gradients spanning across several brain regions2. Differences in tissue composition are the result of these developmental processes. We can gain substantial insight into how neural circuitries were formed and supported by investigating the genetic determinants of whole-brain patterning with respect to tissue composition.
Recent advances in multi-shell diffusion magnetic resonance imaging and diffusion signal modeling have created an opportunity to evaluate tissue composition in vivo3,4,5,6,7. Differences in signals between water molecules of intracellular, extracellular, and unhindered compartments are captured by higher-order diffusivity (multiple shells), allowing for the estimation of the relative proportions of cell bodies, axonal fibers, and interstitial fluids within a voxel3,4,5,8,9,10,11,12. This type of tissue modeling has been used to detect compositional changes driven by neurodegeneration8,11, development3, obesity4, and carcinogenesis9,10. However, there is currently no genome-wide association study (GWAS) on compositional features. This omission is critical, as traditional imaging measurements are insensitive to neurite density, short-range fibers, and cellular properties of cortical gray matter and subcortical nuclei7.
Moreover, GWAS of brain imaging measurements usually adopt a univariate approach, performing associations with one brain region at a time13,14,15,16,17,18. Patterns encompassing the whole brain have been mostly ignored or controlled away as global effects, potentially biasing the interpretations toward purely regional effects. This risks misattributing the nature of genetic effects on the brain, for example by concluding that cortical surface area is driven by local cortical expansion when it may instead reflect underlying axonal growth. The univariate region-of-interest approach may also be underpowered to detect the full extent of genetic variants associated with canonical neurodevelopmental pathways, especially when effects are spatially distributed19,20. A multivariate GWAS, focused on detecting loci that have effects across multiple brain regions, has been shown to be highly efficient in discovering many loci21,22,23. By expanding how we can analyze the discovered spatial patterning from the multivariate GWAS, we can reveal further biological insight into the molecular gradients that shape the human brain.
Here, we performed a multivariate GWAS on the metrics derived from multi-shell diffusion imaging to examine the genetic determinants of whole-brain patterning of cellular compartments. Using the two largest extant imaging genetic studies that have compatible multi-shell scans, the UK Biobank24 (UKB) and the Adolescent Brain Cognitive Development℠ Study (ABCD Study®)25,26, we identified and validated 503 unique loci for tissue-sensitive diffusion metrics. The discovered loci were enriched for neurogenesis, neuron differentiation, and axonal development. Among the validated loci, 136 have not been reported previously by GWAS of brain imaging phenotypes. By investigating the spatial distribution of the associated effects, we highlighted critical molecular pathways involved in neuroinflammation and axonal growth, and the corresponding regions that may be susceptible to these processes. Signal overlap, at both the locus level and genome-wide, with neuropsychiatric outcomes indicates the functional relevance of our GWAS results, providing a foundation for further understanding of the biological underpinnings of neuropsychiatric disorders.
Results
Multivariate GWAS on features of tissue composition across the whole brain
We processed multi-shell diffusion MRI data from UKB and ABCD with restriction spectrum imaging (RSI) to extract the tissue composition features of the human brain3,4,5,8,9,10,11,12,27. To ensure the validation test was robust against study variability due to time shift28, we selected the UKB samples that received MRI scans before 2019 as the discovery set, while all others were regarded as replication sets. The sample characteristics can be found in Supplementary Table 1. The images were harmonized and registered to a common atlas to ensure the alignment of voxels across subjects (see Methods for detailed imaging processing pipelines25,26 and Supplementary Figure 1 for quality control metrics). RSI decomposes the diffusion-weighted signals as emanating from three separable tissue compartments: intracellular, extracellular, and free water (Fig. 1a). Each compartment is characterized by its intrinsic diffusion properties. In this study, we consider the intracellular compartment, which is defined by restricted diffusion bounded by cellular membranes, and the free water compartment, which is characterized by the unimpeded diffusion of water molecules. RSI estimates the normalized isotropic restricted signal volume fraction, N0, which captures the relative amount of cell bodies within a voxel, such as the densities of neurons, astrocytes, and oligodendrocytes. The normalized directional restricted signal volume fraction, ND, captures the relative amount of tube-like structures within a voxel, such as axons and dendrites. The free water component, NF, captures the relative amount of free water outside of cell structures. N0, ND, and NF provide greater tissue specificity than the widely used diffusion tensor metrics, have been useful in understanding variation in cellular organization within the human brain, and are highly informative about human brain development3,10,11,27.
The spatial distributions of these three tissue-sensitive measures can be seen in Supplementary Figures 2–5.
Three separate voxel-wise multivariate GWAS on N0, ND, and NF were performed. For the discovery stage (UKB discovery set, imaging acquisition before 2019, n = 23,543), we used combined principal components (CPC) statistics22,29 (Fig. 1b) to identify associated loci from multivariate measurements. As a practical extension to other multivariate GWAS methods, such as MOSTest21, CPC combines statistics from associations with a finite number of principal components and has a closed-form expression for the null distribution, without the need for permutations22. Using the UKB discovery set, we calculated the principal components (PCs) from the tissue feature across all voxels. From the whole-brain images at 2 mm resolution per voxel, spanning 100 by 100 by 130 voxels, the first 5000 PCs were extracted and used in the subsequent analyses, explaining more than 70% of the total variance of the imaging data (Supplementary Figure 6). Since all PCs are orthogonal to each other, the statistical inference can be based on combining the associations between genetic variants and each of the derived PCs (Fig. 1b). Each of the PCs can be regarded as an orthogonal basis function with limited interpretability, yet a weighted combination of them can represent any spatial distribution (Supplementary Figures 7–9). CPC combines the association signals across PCs for a given genetic variant and detects the genetic loci that are shared across multiple PCs, thus reducing the burden of multiple testing and the false detection of nuisance effects. We tuned the hyper-parameters for the combination function to optimize the power for discovery19,22 by searching through four possible combination sets (see Methods). To account for hyper-parameter tuning and the three tissue features, we set the p-value threshold for genome-wide significance as 5e-8 divided by 12 = 4.2e-9.
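The corrected threshold quoted above follows from a Bonferroni adjustment over the tuning grid and the three tissue features. The arithmetic can be reproduced with the short sketch below; the constant names are illustrative and do not come from the study code.

```python
# Bonferroni-adjusted genome-wide threshold, as described in the text.
# Constant names are illustrative, not taken from the authors' code.
GENOME_WIDE_ALPHA = 5e-8   # conventional single-GWAS threshold
N_COMBINATION_SETS = 4     # hyper-parameter settings searched
N_TISSUE_FEATURES = 3      # N0, ND, and NF

n_tests = N_COMBINATION_SETS * N_TISSUE_FEATURES
corrected_alpha = GENOME_WIDE_ALPHA / n_tests

print(f"{n_tests} tests, corrected threshold = {corrected_alpha:.1e}")
```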
After linkage-disequilibrium pruning (LD R2 > 0.1) and positional clumping (distance < 250 kb), we found 432, 350, and 273 independent genetic loci associated with N0, ND, and NF, respectively (Fig. 2a; Supplementary Data 1–4). After merging loci with overlapping genomic ranges, there were 503 unique loci across all three tissue features (Supplementary Data 1).
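The merging of loci with overlapping genomic ranges can be illustrated with a simple interval merge. This is a minimal sketch, not the authors' pipeline: the function name `merge_loci`, the tuple layout, and all coordinates below are made-up toy values.

```python
from typing import List, Tuple

Locus = Tuple[str, int, int]  # (chromosome, start_bp, end_bp); hypothetical layout


def merge_loci(loci: List[Locus]) -> List[Locus]:
    """Merge loci on the same chromosome whose base-pair ranges overlap."""
    merged: List[Locus] = []
    for chrom, start, end in sorted(loci):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous locus on this chromosome: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Toy per-feature loci (coordinates are invented for illustration only).
n0_loci = [("chr5", 82_000_000, 82_400_000)]
nd_loci = [("chr5", 82_300_000, 82_600_000), ("chr2", 27_000_000, 27_100_000)]
nf_loci = [("chr20", 14_000_000, 14_200_000)]

unique_loci = merge_loci(n0_loci + nd_loci + nf_loci)
print(len(unique_loci), "unique loci")
```

In this toy input the two chr5 ranges overlap and collapse into one locus, so four per-feature loci become three unique loci.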
Loci validated in adults and adolescents
To validate the discovered loci in independent studies, we first calculated polyvoxel scores30,31,32,33 based on eigenvectors and association weights from the discovery set, and then performed the association tests between genetic variants and the derived scores (see Methods). This procedure is similar to confirmatory canonical correlation analysis23, except with only one variant involved in each regression. We repeated the same confirmatory analysis in the UKB validation set (n = 6396, scanned after 2019) and ABCD samples (n = 8189), except for including study-specific covariates and random effects controlling for family relatedness and diverse genetic background in ABCD (see Methods). Among the discovered loci, 335 (79%), 298 (85%), and 222 (81%) were found to be validated in the independent UKB validation set for N0, ND, and NF, after Bonferroni correction for the number of loci discovered. In ABCD, 106 (25%), 153 (43%), and 88 (32%) of the discovered loci were validated for N0, ND, and NF, despite the large differences in age and other sample characteristics between UKB and ABCD.
Characteristics of validated loci
To examine the overlap between our validated loci and previously reported loci in neuroimaging GWAS, we curated the reported loci lists from the NHGRI-EBI Catalog based on keywords in “brain”, “imaging”, “cortical”, “subcortical”, and “white matter”. The final list of reported loci included GWAS on brain connectivity15, cortical surface measures13,21,34, derived imaging instruments across all modalities35, subcortical volumes14,21,36, brain volumes16,34,37, white matter hyperintensities38, and white matter microstructure18. We queried if any of our validated loci were in linkage-disequilibrium (LD) with or located in 250 kb regions of previously reported neuroimaging loci. The results are summarized in Fig. 2b. Among the validated loci, 134 unique loci overlapped with previously published GWAS on cortical surface measurements and 108 unique loci were found to be associated across cortical and subcortical structures, indicating wide pleiotropic effects across brain regions (Fig. 2b, Supplementary Data 2–7). We also found 136 unique novel loci through our approach, demonstrating improved power in both discovery and replications.
On the other hand, the gene-set analyses39,40 of the identified loci show that each tissue feature has a distinct pattern of Gene Ontology enrichment. While all tissue features were highly enriched for the Gene Ontology term neurogenesis, N0 showed stronger enrichment in anatomical morphogenesis, whereas ND was more enriched in axon development, neuron projection guidance, and tangential neuronal migration (Fig. 2c). This suggests that, at the level of genomic loci, modeling tissue composition captures differential molecular effects associated with the human brain.
Loci showing differential effects on tissue compositions
Closer inspection of the effect size distributions of the loci provides a unique angle into the molecular processes shaping the human brain. For instance, the 5q14.3 locus at the gene body of VCAN, tagged by a common SNP rs12653308, was found to be strongly associated with N0 (Fig. 3a). It was reported to be associated with various diffusion metrics from white matter fiber tracts18 and cortical surface measurements21 (Fig. 3a). Instead of fiber tracts or cortical surface regions, we found that the association strength is particularly strong in the hippocampus bilaterally (Figs. 3b, c, Supplementary Data 5–8), based on the regional enrichment analysis with 50,000 bootstraps (see Methods). VCAN, which encodes versican, a lectican-binding chondroitin sulfate proteoglycan (CSPG), serves a critical role in astrocyte-mediated neuroinflammation41, and has potential interacting pharmacological targets42,43 (Fig. 3d; Supplementary Information; Supplementary Data 9–11). CSPGs were found to be associated with astrocyte-dependent synaptogenesis within the hippocampus44. When we examined the associations between genetic variants of genes encoding CSPGs (BCAN, NCAN, and VCAN) and tissue features, we found that N0 showed stronger association signals than ND and NF (Fig. 3e). Since the effects were validated in ABCD, our results support the early effects of astrocyte-mediated processes on the human hippocampus via CSPGs. Changes in the distribution of CSPGs in the hippocampal formation were observed among patients with schizophrenia and patients with bipolar disorder45,46, linking our findings to neuropsychiatric outcomes.
The locus located at 2p23.3, tagged by rs11126784, has strong signals associated with ND (Fig. 3f). This locus resides within the gene body of DPYSL5 and has been reported to be associated with cortical surface measures21. Instead of the cortical surface, our whole-brain multivariate GWAS indicates that the effect sizes were more diffusely distributed among white matter tracts, especially within cortico-striatal circuitry (Fig. 3g, h, Supplementary Data 5–8). DPYSL5 belongs to the collapsin response mediator protein (CRMP) family, including DPYSL2, DPYSL3, and DPYSL4, which are essential for axonal growth and neurite morphogenesis47,48,49 (Fig. 3i). Indeed, all tagged SNPs of the CRMP family proteins show stronger association signals with ND than with N0 and NF (Fig. 3j). Our results are concordant with CRMP involvement in neurodevelopment and show that these effects are observable among major white matter fiber bundles early on. Our findings are also relevant to neuropsychiatric outcomes, as CRMP has been implicated in schizophrenia and mood disorders50.
The 136 novel loci we discovered and validated in this study are relevant for neuropsychiatric phenotypes and warrant further investigation (Supplementary Data 2-4). An N0-specific novel locus at 5q14.3 is within the gene body of MEF2C, which can influence neural progenitor cell differentiation and regulation of synaptic densities51,52. This locus overlaps with GWAS findings of educational attainment and intelligence53. Another locus at 20p12.1, on the gene body of MACROD2, showed consistent signals among adults and adolescents (ND: UKB discovery p = 1e-29, UKB validation p = 1.7e-18, and ABCD validation p = 4.6e-8), and has previously been linked to autism54 and general cognitive ability55. The gene MACROD2 was also implicated in educational attainment53 and risk-taking behaviors56.
Cell-type enrichment analysis
Although N0, ND, and NF were designed to capture different properties of tissue compartments, the strongly overlapping signals across the three features indicate that similar cell processes and populations may shape all three microstructural features. To investigate this, we analyzed the heritability enrichment given cell type annotations using stratified LD score regression (S-LDSC)57. A dimensionally-corrected multivariate statistic, such as the scaled \({\chi }^{2}\), can be used in the context of LDSC for deriving the relative enrichment in the average heritability of the high-dimensional phenotypes23. Hence, we ran S-LDSC with tissue-specific chromatin annotations57 and cell type-specific annotations58 to obtain cell type-specific enrichment patterns for our RSI phenotypes (Fig. 4).
While the overall patterns of enrichment are similar across the three tissue features, ND has the strongest enrichment signals across all activating histone markers (H3K27ac, H3K36me3, H3K4me1, H3K4me3, and H3K9ac) and DNase hypersensitivity sites (Pbon < 0.05). All three features were enriched in the chromatin state of fetal brain and hippocampal tissues, whereas ND also shows enrichment in the cingulate cortex and substantia nigra (Fig. 4a). With respect to cell populations, using publicly available cell-type-specific chromatin state data from mouse samples that have been shown to be useful for prioritizing human GWAS results58, our analysis indicates that all three features have significant enrichment in embryonic dopaminergic interneurons and astrocytes (Pbon < 0.05; Fig. 4b). Moreover, ND shows stronger enrichment signals in oligodendrocytes, as expected for an imaging feature capturing the integrity of tubular structures such as the myelin sheath.
Genetic overlap with neuropsychiatric and immune-related phenotypes
We investigated the proportion of genome-wide signals of the three tissue features that overlap with neuropsychiatric phenotypes53,56,59,60,61,62,63,64,65 and immune disorders66. Based on a method tailored for unsigned multivariate statistics23, we evaluated the signal shared between each pair of traits using their summary statistics. The amount of shared signal was the Spearman correlation of the average SNP −log10 p values within each approximately independent LD block. All three tissue features consistently show significant overlap with immune disorders (ρ: 0.15–0.21, all P < 1.2e-9)66, schizophrenia (ρ: 0.15–0.17, all P < 5e-10)63, attention deficit hyperactivity disorder (ρ: 0.11–0.12, all P < 2e-6)67, bipolar disorder (ρ: 0.10–0.12, all P < 6e-6)61, and cross-psychiatric-disorders (ρ: 0.14–0.16, all P < 7e-9)59. The overlap with Alzheimer’s disease is less evident (ρ: 0.07–0.10)65. Educational attainment53 and risk-related behaviors56 are also significantly correlated (ρ: 0.10–0.20, all P < 5e-5; Supplementary Figure 11). The patterns of genome-wide signals shared with neuropsychiatric phenotypes were not evidently different across the three tissue features, despite the distinct patterns we observed at the locus level and in cell-type-specific enrichments. While the limited resolution of LD blocks may contribute to this null finding, the evident similarities in the genome-wide level results may mean that pleiotropic effects, either horizontal or vertical, on neurodevelopmental traits are highly polygenic, sharing multiple loci but with different functional outputs.
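The shared-signal measure described above (Spearman correlation of block-averaged −log10 p values) can be sketched in plain Python. This is an illustration under stated assumptions: both traits are assumed to have p values for the same SNPs in the same order, the LD-block labels are given, and all function names and toy values are hypothetical rather than taken from the published method's code.

```python
import math
from collections import defaultdict


def mean_neglog10_by_block(pvalues, blocks):
    """Average -log10(p) of the SNPs falling in each LD block."""
    sums, counts = defaultdict(float), defaultdict(int)
    for p, block in zip(pvalues, blocks):
        sums[block] += -math.log10(p)
        counts[block] += 1
    return {block: sums[block] / counts[block] for block in sums}


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def block_spearman(p_trait_a, p_trait_b, blocks):
    """Spearman correlation of block-averaged -log10(p) between two traits."""
    a = mean_neglog10_by_block(p_trait_a, blocks)
    b = mean_neglog10_by_block(p_trait_b, blocks)
    keys = sorted(a)
    return pearson(average_ranks([a[k] for k in keys]),
                   average_ranks([b[k] for k in keys]))


# Toy example: eight SNPs in four LD blocks (all values invented).
blocks = [0, 0, 1, 1, 2, 2, 3, 3]
p_imaging = [1e-2, 1e-2, 1e-4, 1e-4, 1e-6, 1e-6, 1e-8, 1e-8]
p_disorder = [0.5, 0.4, 0.05, 0.01, 1e-3, 1e-4, 1e-7, 1e-9]
rho = block_spearman(p_imaging, p_disorder, blocks)
print(round(rho, 3))
```

Because the statistic uses only −log10 p values, it ignores effect direction, which is what makes it applicable to unsigned multivariate test statistics.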
Discussion
Using imaging features of whole brain tissue compositions, a multivariate GWAS discovered and validated 503 loci, of which 136 had not been reported in previous GWAS of neuroimaging phenotypes. Through in-depth examination of effect size distributions, we demonstrated the specific impact of molecular pathways, including CSPGs and CRMP, on the tissue composition underlying the human brain in vivo. Our findings are relevant for neuropsychiatric outcomes, including cognitive functions and psychiatric disorders. By identifying the key protein families and highlighting the susceptible brain regions through enrichment analyses, these results indicate a path to further investigate molecular mechanisms of brain regional development and specialization.
Our results indicate widespread pleiotropy between the development of cortical surfaces and cerebral white matter. As the patterning of the mature brain is the end result of multiple molecular processes, ranging from differentiation of neuroprogenitor cells and migration of neurons to synaptic pruning68, the relevant genes are unlikely to confine their effects to one single anatomically defined region. This is in line with findings from malformations of cortical development, in which germline mutations of genes involved in cell migration lead to global malformations instead of localized lesions69. Many of the loci we discovered and replicated were also found to be associated with other imaging modalities across cortical and subcortical regions. We are not claiming that focal effects do not exist; indeed, the variants we highlighted do have locally enriched signals. Instead, we suggest that regional specification of the human brain is more likely the ultimate result of a complex coordination of multiple distributed molecular processes70 than of single genetic variants affecting single anatomical regions. For instance, the group of genes belonging to CSPG had consistent associations with N0 metrics. The association signals were enriched in multiple brain regions beyond previously reported ROIs18,21, especially the bilateral hippocampus. This astrocyte-dependent molecular process may have more direct effects on synaptic pruning in the hippocampal regions, which then cascade downstream to the associated fiber tracts.
Our findings showcase the need for novel analytic approaches in brain imaging genetics. Multivariate GWAS on whole-brain phenotypes circumvents the potential “spotlight bias” that region-of-interest approaches are susceptible to71. Diffuse effects across brain regions and neurobiological pathways are more easily detected with this approach, as the inference is based on the total sum of the effects. Moving beyond the metrics of structural volumes or fiber orientation enabled us to detect molecular effects on brain tissue properties, identifying relevant biological pathways important for human brain development and neuropsychiatric outcomes.
Because our multivariate GWAS was optimized for detecting signals shared across PCs, the statistical power may be less than ideal for detecting extremely sparse genetic effects, i.e., effects limited to only one or two PCs19,21,22. Although it is possible to have regionally specific genetic effects, our approach will be less sensitive to such effects since our PCs captured information across the whole brain and were anatomically agnostic. Instead of having one PC to represent one particular anatomical structure, it was the weighted combinations of several PCs that highlighted certain anatomical structures. This is the benefit of using a multivariate GWAS, as it implicitly picks up the patterning signals without pre-defining the region of interest. However, these statistical properties can also make it difficult to interpret which anatomical regions are most relevant for a given discovered locus. To facilitate the interpretation, we implemented regional enrichment analyses, examining which anatomical structures have higher average signals compared to other regions.
Our results highlight the pleiotropic nature of genes involved in synaptic pruning, neuroinflammation, and axonal growth. The microglia-related molecular processes were implicated in multiple brain regions across cortical and subcortical structures. The significant locus overlap between tissue-sensitive imaging metrics and psychiatric disorders implicates etiological mechanisms beyond neuronal growth, such as microglia-mediated synaptic pruning. Our identified genes may aid experimental studies investigating interventions for neuropsychiatric outcomes.
Methods
UK Biobank samples
The inclusion criteria for the UKB sample were as follows: individuals who had valid consent at the time the analyses were performed (Dec 2020), were genetically inferred as having European ancestry, and completed the neuroimaging protocols. Among individuals who were included in the analyses, we further divided samples into two groups based on when the neuroimaging was performed (before or after 2019). We decided to use this naturally occurring temporal cut-point instead of randomized allotment of the groups because of best practice considerations20,28,72,73, avoiding potential systematic biases driven by temporally related imaging confounds. In particular, the potential time shift of the study design can lead to an over-optimistic evaluation of generalizability if a random data split instead of a time split is used28. Given that our purpose is to discover and validate the biologically relevant effects, we used a conservative approach by selecting a naturally occurring time point as the selection criterion for discovery and replication sets in UKB. Individuals who had valid imaging data before 2019 were assigned to the discovery set (n = 23,543) and those who had valid imaging data only after 2019 were assigned to the validation set (n = 6396). The demographic information of the final selected UKB samples can be found in Supplementary Table 1. Data from UKB were obtained under accession number 27412.
Adolescent Brain Cognitive Development study (ABCD) samples
For validation of the results, we selected the full baseline data of the ABCD Study from public data release 3.0 (NDA DOI: 10.15154/1524729). Since ABCD was designed to recruit individuals with diverse ancestral backgrounds that reflect the racial/ethnic composition of the United States, we did not exclude individuals based on their genetic ancestries, and instead used linear mixed-effects models to control for family relatedness and heterogeneous ancestral background. We only excluded those who did not have valid imaging and genetic data from release 3.0, resulting in 8189 individuals in the analyses. The demographic characteristics of the ABCD samples can be found in Supplementary Table 1.
Imaging data processing
Both UKB and ABCD have diffusion imaging protocols that were compatible with RSI models. The MRI scans of UKB were performed at three scanning sites in the United Kingdom, all on identically configured Siemens Skyra 3 T scanners, with 32-channel receive head coils. The MRI scans of ABCD were collected by 21 study sites throughout the United States, on Siemens Prisma, GE 750, and Philips 3 T scanners. To harmonize the imaging data across the two studies, we processed the dMRI data from UKB and ABCD using the ABCD-consistent imaging processing pipeline implemented by the ABCD Data Analysis, Informatics, and Resource Center (ABCD DAIRC). The detailed processing procedures have been published elsewhere25. In short, multi-shell diffusion MRI data of ABCD were acquired with seven b = 0 s/mm2 frames and 96 noncollinear gradient directions, with 6 directions at b = 500 s/mm2, 15 directions at b = 1000 s/mm2, 15 directions at b = 2000 s/mm2, and 60 directions at b = 3000 s/mm2. Multi-shell diffusion MRI data of UKB were acquired with five b = 0 s/mm2 frames and 100 noncollinear gradient directions, with 50 directions at b = 1000 s/mm2 and 50 directions at b = 2000 s/mm2. Preprocessing imaging quality control involved automatic motion detection and expert rating of the imaging quality25. Multi-shell diffusion data that passed preprocessing imaging quality control were processed through forward-reverse gradient warping, gradient nonlinearity distortion correction, eddy current correction, and motion correction to reduce the spatial distortion and signal heterogeneities driven by scanner differences. The corrected images were then aligned to a common atlas using rigid-body registration, adjusting the diffusion gradient directions to account for head rotation relative to the atlas25.
Fiber orientation density (FOD) functions were calculated for each voxel, and the derived tensor information together with T1 structural information was fed into multi-channel nonlinear smoothing spline registration, resulting in positional and orientational aligned voxel-wise diffusion data in 2 mm resolution. Post-processing quality measures were calculated based on the voxelwise correlations between registered images and synthesized imaging metrics given the common atlas. Images with average correlations to the atlas below 0.8 were excluded.
Restriction spectrum imaging (RSI) models the diffusion signals as mixtures of spherical harmonic basis functions5,12. Based on the intrinsic diffusion characteristics of separable pools of water in the human brain (i.e., intracellular, extracellular, and unhindered free water), RSI estimates the signal volume fractions of each compartment and their corresponding spherical harmonic coefficients. The measure of restricted isotropic diffusion (N0) is the zeroth-order spherical harmonic coefficient of the restricted compartment, normalized by the Euclidean norm of all model coefficients. This feature is most sensitive to isotropically diffusing water in the restricted compartment, within cell bodies. The measure of restricted directional diffusion (ND) is the sum of second- and fourth-order spherical harmonic coefficients, normalized by the norm of all model coefficients. This feature is sensitive to anisotropically diffusing water in the restricted compartment, within oriented structures such as axons and dendrites. The normalized free water diffusion (NF) measure is calculated as the zeroth-order spherical harmonic coefficient for the unhindered water compartment. NF is also normalized by the Euclidean norm of all spherical harmonic coefficients. This normalization makes the RSI features unitless and in the range of 0 to 1.
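The normalization described above can be illustrated for a single voxel. The sketch below is not the RSI implementation: the function name and argument layout are hypothetical, and it assumes that the second- and fourth-order restricted coefficients are pooled by their root sum of squares, which is one way to keep ND within 0 to 1 (the text describes this pooling only as a sum).

```python
import math


def rsi_features(restricted_l0, restricted_l2, restricted_l4, hindered, free_l0):
    """Return (N0, ND, NF) for one voxel from fitted model coefficients.

    Assumption: higher-order restricted coefficients are pooled by their
    Euclidean norm; the argument layout is hypothetical.
    """
    all_coeffs = [restricted_l0, *restricted_l2, *restricted_l4, *hindered, free_l0]
    norm = math.sqrt(sum(c * c for c in all_coeffs))
    if norm == 0.0:
        return 0.0, 0.0, 0.0  # empty voxel: no signal to apportion
    directional = math.sqrt(sum(c * c for c in [*restricted_l2, *restricted_l4]))
    return restricted_l0 / norm, directional / norm, free_l0 / norm


# Toy voxel: order 2 has 5 coefficients and order 4 has 9 (values invented).
n0, nd, nf = rsi_features(
    restricted_l0=3.0,
    restricted_l2=[0.0] * 5,
    restricted_l4=[0.0] * 9,
    hindered=[0.0],
    free_l0=4.0,
)
print(n0, nd, nf)
```

Dividing by the norm of all coefficients removes the dependence on overall signal scale, which is why the resulting features are unitless.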
Genotype data processing
For UKB, we used the released v3 imputed genotype data. For ABCD, we used the public release 3.0 imputed genotype data. Both datasets were imputed with the HRC reference panel74. We performed post-imputation quality control to restrict the GWAS to common bi-allelic SNPs. We filtered out SNPs with minor allele frequencies below 0.5 percent, Hardy-Weinberg disequilibrium (p < 1e-10), or missingness greater than 5 percent. Genetic principal components and ancestral factors were derived using well-called independent SNPs for both datasets and were used for controlling population stratification in our analyses.
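The post-imputation filters listed above can be expressed as a single predicate per SNP. This is a minimal sketch assuming per-SNP summary values are already available; the `SnpQc` record, its field names, and the toy SNP identifiers are hypothetical, and in practice such filtering is typically done with dedicated genetics software.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP quality-control summary (hypothetical record layout)."""
    snp_id: str
    n_alleles: int
    maf: float           # minor allele frequency
    hwe_p: float         # Hardy-Weinberg test p value
    missing_rate: float  # fraction of samples with a missing call


def passes_qc(snp: SnpQc, min_maf: float = 0.005,
              min_hwe_p: float = 1e-10, max_missing: float = 0.05) -> bool:
    """Keep common, bi-allelic, well-called SNPs using the thresholds in the text."""
    return (snp.n_alleles == 2
            and snp.maf >= min_maf
            and snp.hwe_p >= min_hwe_p
            and snp.missing_rate <= max_missing)


# Toy records; identifiers and values are invented for illustration.
snps = [
    SnpQc("snp_a", 2, 0.20, 0.50, 0.01),    # passes every filter
    SnpQc("snp_b", 2, 0.001, 0.50, 0.01),   # too rare
    SnpQc("snp_c", 2, 0.20, 1e-12, 0.01),   # Hardy-Weinberg disequilibrium
    SnpQc("snp_d", 2, 0.20, 0.50, 0.10),    # too much missingness
    SnpQc("snp_e", 3, 0.20, 0.50, 0.01),    # not bi-allelic
]
kept = [s.snp_id for s in snps if passes_qc(s)]
print(kept)
```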
Combined principal component GWAS (CPC)
In the present multivariate GWAS of RSI measures, we implemented the CPC method22 in the MOSTest package21. When the covariance among input measures is the identity, the CPC test statistic is mathematically equivalent to the MOSTest statistic. Therefore, the codebase needed for performing CPC on ultra-high dimensional imaging data is compatible with our MOSTest, except for the following two components. First, the imaging measures underwent eigen-decomposition to derive PCs. Second, the test statistics were based on a closed-form solution instead of the permutation scheme. As we were working with an identity covariance matrix and a finite number of PCs instead of millions of voxels, CPC is a practical alternative to the original MOSTest.
CPC has been shown to be a robust multivariate GWAS method that is well powered to detect loci across different scenarios19,22,29. In our case, we optimized our power to detect genetic variants that shape the brain development, leaving traces in multiple brain regions. CPC enables the identification of loci that have association signals across multiple PCs, without the caveats of focusing on single brain regions. The procedures were as follows. First, the PCs and their corresponding eigenvectors were derived given the voxel-wise imaging data (Supplementary Figure 2-9). Each SNP was regressed on each of the derived PC scores, controlling for age, sex, 20 genetic PCs, genotyping batches, and intracranial volume. For a given SNP, the Wald statistics for each PC were combined as a simple linear sum (Fig. 1b). Given that PCs are orthonormal, the sum of the squared Wald statistics follows the \({\chi }^{2}\) distribution with k degrees of freedom for k PCs combined19,22. Although several different combination functions can be used19, we found that the global-local combination with Fisher’s method proposed in the original CPC paper has greatest power in detecting genetic loci22. Therefore, we experimented with four different global-local cut points (50, 100, 500, and 1000 PCs) to see which combinations yield the most discoveries. To reflect this experiment, we lowered the significance threshold to p < 4.2e-9 (corrected for 12 multiple comparisons, as 4 thresholds and 3 features were used in the current study).
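The combination step can be sketched as follows. This is an illustration under explicit assumptions, not the MOSTest or CPC code: it assumes the global-local scheme splits the PCs at a cut point, sums the squared Wald statistics within each group, and combines the two group p values with Fisher's method. To remain dependency-free, the chi-square tail is computed with the closed form that holds only for even degrees of freedom; a general implementation would normally call a statistics library instead. All function names are hypothetical.

```python
import math
import sys


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable, even df only.

    Uses sf(x; 2m) = exp(-x/2) * sum_{i<m} (x/2)^i / i!, summed in log space.
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch supports positive even df only")
    if x <= 0:
        return 1.0
    half = x / 2.0
    log_terms = [-half + i * math.log(half) - math.lgamma(i + 1)
                 for i in range(df // 2)]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))


def cpc_fisher_pvalue(wald_z, cut: int) -> float:
    """Combine per-PC Wald z statistics for one SNP (assumed global-local split)."""
    stat_global = sum(z * z for z in wald_z[:cut])  # chi-square with `cut` df
    stat_local = sum(z * z for z in wald_z[cut:])   # chi-square with k - cut df
    tiny = sys.float_info.min                       # guard against log(0)
    p_global = max(chi2_sf_even_df(stat_global, cut), tiny)
    p_local = max(chi2_sf_even_df(stat_local, len(wald_z) - cut), tiny)
    fisher = -2.0 * (math.log(p_global) + math.log(p_local))
    return chi2_sf_even_df(fisher, 4)               # Fisher's method, 2 p values


# Toy example with 8 PCs split at 4 (values invented for illustration).
null_like = [0.0] * 8
shared_signal = [3.0] * 8
print(cpc_fisher_pvalue(null_like, cut=4), cpc_fisher_pvalue(shared_signal, cut=4))
```

A variant with modest effects on many PCs accumulates a large summed statistic in both groups, which is the scenario this kind of combination is designed to detect.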
Validation with confirmatory polyvoxel scoring
To validate the discovered loci, we used confirmatory polyvoxel scoring instead of repeating the GWAS in the independent cohorts. The eigenvectors (\({v}_{k}\)) and the regression coefficients (\({\beta }_{k}\)) obtained from the discovery set were used to calculate the imaging score for all subjects in the validation sets.
Here, x stands for the raw imaging data. Given that the PCs are independent of one another, it can be shown that regressing the SNP on the polyvoxel score is equivalent to comparing the consistency of the regression coefficients between the discovery set and the validation set.
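The scoring step can be illustrated with a short Python sketch. The displayed equation is not reproduced in this text, so the form used here is an assumption consistent with the description: the score for one subject is the sum over PCs of the discovery-set coefficient multiplied by the projection of that subject's imaging vector onto the discovery-set eigenvector. The function name is hypothetical, and the sketch assumes x has already been centered in the same way as the discovery data.

```python
def polyvoxel_score(x, eigenvectors, betas):
    """Return sum_k beta_k * (v_k . x) for one subject's voxel vector x.

    x            : list of voxel values for one validation subject
    eigenvectors : list of discovery-set eigenvectors v_k (each as long as x)
    betas        : list of discovery-set SNP coefficients beta_k, one per PC
    """
    score = 0.0
    for v_k, beta_k in zip(eigenvectors, betas):
        pc_score = sum(v * xi for v, xi in zip(v_k, x))  # projection onto PC k
        score += beta_k * pc_score
    return score
```

With `x = [1.0, 2.0]`, `eigenvectors = [[1.0, 0.0], [0.0, 1.0]]`, and `betas = [0.5, -1.0]`, the score is `0.5 * 1.0 + (-1.0) * 2.0 = -1.5`. The validation test would then relate this score to the SNP genotype in the independent cohort.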
Regional enrichment for spatial distribution across voxels
To make the multivariate GWAS results more interpretable, we developed a regional enrichment analysis to show which brain regions have relatively stronger signals. Most previous imaging studies relied on repeating the voxel-wise association tests to show the effect distributions of the discovered loci14,16,17,18,36. Given the distributed nature of the effect sizes among imaging measurements, voxel-wise associations are not an ideal way of localizing effects20. Instead, we examined the overlap between association patterns and regions of interest in the co-registered anatomical atlas. The enrichment score is the probability-weighted regression coefficients from CPC.
The variance of the enrichment score was estimated by bootstrapping the association patterns from SNPs that did not surpass the significance threshold. We then calculated the corresponding enrichment z-scores and p-values. In the current study, we obtained 130 probability maps of brain regions defined in the common atlas (Supplementary Data 9). We applied the regional enrichment analyses to the loci that showed robust signals across the adult and adolescent data.
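A rough Python sketch of this enrichment procedure follows. The exact formula is not reproduced in this text, so several choices here are assumptions made for illustration: the score is taken as the sum of absolute voxel-wise coefficients weighted by the region's probability map, the null distribution is built by resampling with replacement from the association patterns of non-significant SNPs, and the p-value is one-sided under a normal approximation. The function names and parameters are hypothetical.

```python
import random
import statistics


def enrichment_score(prob_map, voxel_effects):
    """Probability-weighted sum of voxel-wise effect magnitudes for one region."""
    return sum(p * abs(b) for p, b in zip(prob_map, voxel_effects))


def enrichment_z(prob_map, voxel_effects, null_patterns, n_boot=1000, seed=0):
    """Return (z, one-sided p) for one locus and one region.

    null_patterns : list of voxel-wise effect patterns from SNPs that did not
                    reach significance (assumed to be available upstream)
    """
    rng = random.Random(seed)
    observed = enrichment_score(prob_map, voxel_effects)
    # Bootstrap: resample null association patterns with replacement.
    null_scores = [
        enrichment_score(prob_map, rng.choice(null_patterns))
        for _ in range(n_boot)
    ]
    sd = statistics.stdev(null_scores)
    if sd == 0:
        raise ValueError("null scores have zero variance")
    z = (observed - statistics.fmean(null_scores)) / sd
    p = 1.0 - statistics.NormalDist().cdf(z)
    return z, p
```

In practice, the voxel-wise effects for a locus would be reconstructed from the PC-level coefficients and eigenvectors, and the function would be applied once per region in the atlas.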
Loci annotations, overlaps, and gene-set enrichment analyses
To annotate the identified genetic loci, we used FUMA39 and the GRanges function in R. SNPs with LD of r2 < 0.1 that lay within 250 kb of one another were considered a single locus. MAGMA40 was used to calculate gene-set enrichment. To map candidate genes onto the identified loci, we used FUMA with Hi-C mapping and eQTL information from PsychENCODE75.
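The distance-based part of the locus definition can be sketched as a simple interval merge. This Python example is illustrative only and does not reproduce FUMA or GRanges. It assumes that LD-based selection of lead SNPs has already been done upstream and that lead SNPs are chained into one locus whenever consecutive positions on the same chromosome are at most 250 kb apart. The function name and the dictionary layout are hypothetical.

```python
def merge_into_loci(lead_snps, max_gap=250_000):
    """Group (chromosome, position) lead SNPs lying within max_gap bp of each other.

    lead_snps : iterable of (chromosome, base-pair position) tuples, with
                chromosomes encoded consistently (for example, as integers)
    """
    loci = []
    for chrom, pos in sorted(lead_snps):
        last = loci[-1] if loci else None
        if last and last["chrom"] == chrom and pos - last["end"] <= max_gap:
            # Close enough to the previous SNP: extend the current locus.
            last["end"] = pos
            last["n_snps"] += 1
        else:
            loci.append({"chrom": chrom, "start": pos, "end": pos, "n_snps": 1})
    return loci
```

Given `[(1, 100000), (1, 300000), (1, 700000), (2, 150000)]`, the first two SNPs are merged into one locus and the other two form separate loci, yielding three loci in total.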
Calculation of high dimensional heritability
Previous studies on the heritability of high-dimensional phenotypes indicated that the average heritability is a valid way of estimating the genetic architecture of human traits23,76. It is equivalent to the weighted average of the heritabilities of the individual PCs. We applied LD score regression to each PC and then weighted the estimates by their eigenvalues to derive the average heritabilities across RSI features.
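The weighting step amounts to a short calculation, sketched below in Python. The sketch assumes that per-PC heritability estimates from LD score regression are already available and that the weights are the raw eigenvalues normalized by their sum. The function name is hypothetical.

```python
def weighted_mean_heritability(h2_per_pc, eigenvalues):
    """Eigenvalue-weighted average of per-PC heritability estimates."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    return sum(h2 * lam for h2, lam in zip(h2_per_pc, eigenvalues)) / total
```

For two PCs with heritabilities 0.4 and 0.2 and eigenvalues 3 and 1, the weighted average is (0.4 × 3 + 0.2 × 1) / 4 = 0.35, so PCs that explain more phenotypic variance contribute more to the summary.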
Stratified LD score regression for heritability enrichment analyses
As the prior literature on multivariate GWAS has demonstrated23, the multivariate \({\chi }^{2}\) can be rescaled and then used with stratified LDSC (S-LDSC) to examine the relative enrichment of heritability for given annotations. Here, we examined tissue-specific enrichment through histone mark annotations of human tissues, given that the regulatory landscape has more tissue specificity than gene expression57. For cell-type-specific analyses, we used mouse single-cell ATAC-seq data because it is a comprehensive resource with established utility in prioritizing human risk variants58. The scaled genome-wide multivariate \({\chi }^{2}\) for each imaging metric, i.e., N0, ND, and NF, was regressed against the tissue-specific or cell-type-specific annotations, while controlling for the baseline annotations as recommended by S-LDSC57. We reported the signed enrichment Z statistics, as well as the corresponding p values adjusted for multiple comparisons.
Calculation of shared genome-wide signals between two phenotypes
As proposed in other multivariate GWAS efforts23, for the summary statistics of a given phenotype, we first calculated the average magnitude of association in each of the approximately independent LD blocks77, deriving the unsigned polygenic signal profile of that trait. Spearman correlations were then computed for each pair of GWAS results to evaluate the degree of overlap in the genome-wide signals.
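The two steps, block-wise averaging and rank correlation, can be sketched in plain Python. This is an illustrative sketch under stated assumptions: each SNP has already been assigned to an LD block, the "magnitude" is any non-negative association measure such as a chi-square statistic, and ties receive average ranks. All function names are hypothetical.

```python
from collections import defaultdict


def block_means(block_ids, magnitudes):
    """Average association magnitude within each LD block."""
    sums, counts = defaultdict(float), defaultdict(int)
    for block, value in zip(block_ids, magnitudes):
        sums[block] += value
        counts[block] += 1
    return {block: sums[block] / counts[block] for block in sums}


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def shared_signal(profile_a, profile_b):
    """Spearman correlation of two block-level profiles over their shared blocks."""
    blocks = sorted(set(profile_a) & set(profile_b))
    ranks_a = average_ranks([profile_a[b] for b in blocks])
    ranks_b = average_ranks([profile_b[b] for b in blocks])
    return pearson(ranks_a, ranks_b)
```

A typical use would call `block_means` once per trait and then `shared_signal` on each pair of resulting profiles. Because the profiles are unsigned, a high correlation indicates that the two traits concentrate signal in the same genomic regions, not that effects share direction.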
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
UKB data are available by application to UK Biobank (https://www.ukbiobank.ac.uk). The research was conducted using the UK Biobank Resource under Application Number 27412. Adolescent data used in the preparation of this article were obtained from the Adolescent Brain Cognitive Development℠ Study (ABCD Study®) (https://abcdstudy.org), held in the NIMH Data Archive (NDA). The ABCD data used here are registered as an NDA study at https://doi.org/10.15154/1524729. Genomic locus and gene-set results can be found in the Supplementary Data. Full summary statistics can be found in LocusZoom.js78 (N0: https://my.locuszoom.org/gwas/575925/; ND: https://my.locuszoom.org/gwas/611203/; NF: https://my.locuszoom.org/gwas/644492/).
Code availability
ABCD processing code can be found in a series of GitHub repositories (https://github.com/ABCD-STUDY). Code used specifically for this study, including code for obtaining restricted spectrum imaging metrics, combined principal components GWAS, polyvoxel scores, and spatial regional enrichment analyses, can be found on the publicly accessible GitHub page (https://github.com/cmig-research-group/RSIGWAS). The code version used in this study is registered79. The main codebase was written in MATLAB version 2017b.
References
Molnár, Z., Luhmann, H. J. & Kanold, P. O. Transient cortical circuits match spontaneous and sensory-driven activity during development. Science 370, abb2153 (2020).
Geschwind, D. H. & Rakic, P. Cortical evolution: judge the brain by its cover. Neuron 80, 633–647 (2013).
Beck, D. et al. White matter microstructure across the adult lifespan: A mixed longitudinal and cross-sectional study using advanced diffusion models and brain-age prediction. Neuroimage 224, 117441 (2021).
Rapuano, K. M. et al. Nucleus accumbens cytoarchitecture predicts weight gain in children. Proc. Natl Acad. Sci. USA 117, 26977–26984 (2020).
White, N. S., Leergaard, T. B., D’Arceuil, H., Bjaalie, J. G. & Dale, A. M. Probing tissue microstructure with restriction spectrum imaging: Histological and theoretical validation. Hum. Brain Mapp. 34, 327–346 (2013).
Pines, A. R. et al. Leveraging multi-shell diffusion for studies of brain development in youth and young adulthood. Dev. Cogn. Neurosci. 43, 100788 (2020).
Tournier, J. D., Mori, S. & Leemans, A. Diffusion tensor imaging and beyond. Magn. Reson Med. 65, 1532–1556 (2011).
Hope, T. R. et al. Diffusion tensor and restriction spectrum imaging reflect different aspects of neurodegeneration in Parkinson’s disease. PLoS One 14, e0217922 (2019).
Khan, U. A. et al. Diagnostic utility of restriction spectrum imaging (RSI) in glioblastoma patients after concurrent radiation-temozolomide treatment: A pilot study. J. Clin. Neurosci. 58, 136–141 (2018).
McDonald, C. R. et al. Restriction spectrum imaging predicts response to bevacizumab in patients with high-grade glioma. Neuro Oncol. 18, 1579–1590 (2016).
Reas, E. T. et al. Sensitivity of restriction spectrum imaging to memory and neuropathology in Alzheimer’s disease. Alzheimers Res. Ther. 9, 55 (2017).
White, N. S. et al. Improved conspicuity and delineation of high-grade primary and metastatic brain tumors using “restriction spectrum imaging”: quantitative comparison with high B-value DWI and ADC. AJNR Am. J. Neuroradiol. 34, 958–964 (2013).
Grasby, K. L. et al. The genetic architecture of the human cerebral cortex. Science 367, aay6690 (2020).
Hibar, D. P. et al. Common genetic variants influence human subcortical brain structures. Nature 520, 224–229 (2015).
Jahanshad, N. et al. Genome-wide scan of healthy human connectome discovers SPON1 gene variant influencing dementia severity. Proc. Natl Acad. Sci. USA 110, 4768–4773 (2013).
Stein, J. L. et al. Identification of common variants associated with human hippocampal and intracranial volumes. Nat. Genet 44, 552–561 (2012).
Zhao, B. et al. Genome-wide association analysis of 19,629 individuals identifies variants influencing regional brain volumes and refines their genetic co-architecture with cognitive and mental health traits. Nat. Genet 51, 1637–1644 (2019).
Zhao, B. et al. Large-scale GWAS reveals genetic architecture of brain white matter microstructure and genetic overlap with cognitive and mental health traits (n = 17,706). Mol. Psychiatry 26, 3943–3955 (2019).
Liu, Z. & Lin, X. A Geometric Perspective on the Power of Principal Component Association Tests in Multiple Phenotype Studies. J. Am. Stat. Assoc. 114, 975–990 (2019).
Dick, A. S. et al. Meaningful associations in the adolescent brain cognitive development study. Neuroimage 239, 118262 (2021).
van der Meer, D. et al. Understanding the genetic determinants of the brain with MOSTest. Nat. Commun. 11, 3512 (2020).
Aschard, H. et al. Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies. Am. J. Hum. Genet 94, 662–676 (2014).
Naqvi, S. et al. Shared heritability of human face and brain shape. Nat. Genet 53, 830–839 (2021).
Miller, K. L. et al. Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nat. Neurosci. 19, 1523–1536 (2016).
Hagler, D. J. et al. Image processing and analysis methods for the Adolescent Brain Cognitive Development Study. Neuroimage 202, 116091 (2019).
Casey, B. J. et al. The Adolescent Brain Cognitive Development (ABCD) study: Imaging acquisition across 21 sites. Dev. Cogn. Neurosci. 32, 43–54 (2018).
McDonald, C. R. et al. Recovery of white matter tracts in regions of peritumoral FLAIR hyperintensity with use of restriction spectrum imaging. AJNR Am. J. Neuroradiol. 34, 1157–1163 (2013).
Efron, B. Prediction, Estimation, and Attribution. J. Am. Stat. Assoc. 115, 636–655 (2020).
Porter, H. F. & O’Reilly, P. F. Multivariate simulation framework reveals performance of multi-trait GWAS methods. Sci. Rep. 7, 38837 (2017).
Loughnan, R. J. et al. Generalization of Cortical Multivariate Genome-Wide Associations Within and Across Samples. bioRxiv, 2021.04.23.441215, https://doi.org/10.1101/2021.04.23.441215 (2021).
Zhao, W. et al. Individual Differences in Cognitive Performance Are Better Predicted by Global Rather Than Localized BOLD Activity Patterns Across the Cortex. Cereb. Cortex 31, 1478–1488 (2021).
Fan, C. C. et al. Williams syndrome-specific neuroanatomical profile and its associations with behavioral features. Neuroimage Clin. 15, 343–347 (2017).
Fan, C. C. et al. Williams Syndrome neuroanatomical score associates with GTF2IRD1 in large-scale magnetic resonance imaging cohorts: a proof of concept for multivariate endophenotypes. Transl. Psychiatry 8, 114 (2018).
Hofer, E. et al. Genetic correlations and genome-wide associations of cortical structure in general population samples of 22,824 adults. Nat. Commun. 11, 4796 (2020).
Elliott, L. T. et al. Genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature 562, 210–216 (2018).
Satizabal, C. L. et al. Genetic architecture of subcortical brain structures in 38,851 individuals. Nat. Genet 51, 1624–1636 (2019).
Zhao, Q. et al. Adolescent alcohol use disrupts functional neurodevelopment in sensation seeking girls. Addict Biol, e12914, https://doi.org/10.1111/adb.12914 (2020).
Persyn, E. et al. Genome-wide association study of MRI markers of cerebral small vessel disease in 42,310 participants. Nat. Commun. 11, 2175 (2020).
Watanabe, K., Taskesen, E., van Bochoven, A. & Posthuma, D. Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 8, 1826 (2017).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput Biol. 11, e1004219 (2015).
Stephenson, E. L. et al. Chondroitin sulfate proteoglycans as novel drivers of leucocyte infiltration in multiple sclerosis. Brain 141, 1094–1110 (2018).
Finan, C. et al. The druggable genome and support for target identification and validation in drug development. Sci Transl Med 9, https://doi.org/10.1126/scitranslmed.aag1166 (2017).
Freshour, S. L. et al. Integration of the Drug-Gene Interaction Database (DGIdb 4.0) with open crowdsource efforts. Nucleic Acids Res. 49, D1144–D1151 (2021).
Pyka, M. et al. Chondroitin sulfate proteoglycans regulate astrocyte-dependent synaptogenesis and modulate synaptic activity in primary embryonic hippocampal neurons. Eur. J. Neurosci. 33, 2187–2202 (2011).
Pantazopoulos, H., Woo, T. U., Lim, M. P., Lange, N. & Berretta, S. Extracellular matrix-glial abnormalities in the amygdala and entorhinal cortex of subjects diagnosed with schizophrenia. Arch. Gen. Psychiatry 67, 155–166 (2010).
Shah, A. & Lodge, D. J. A loss of hippocampal perineuronal nets produces deficits in dopamine system function: relevance to the positive symptoms of schizophrenia. Transl. Psychiatry 3, e215 (2013).
Jeanne, M. et al. Missense variants in DPYSL5 cause a neurodevelopmental disorder with corpus callosum agenesis and cerebellar abnormalities. Am. J. Hum. Genet 108, 951–961 (2021).
Hamdan, H. et al. Mapping axon initial segment structure and function by multiplexed proximity biotinylation. Nat. Commun. 11, 100 (2020).
Brot, S. et al. CRMP5 interacts with tubulin to inhibit neurite outgrowth, thereby modulating the function of CRMP2. J. Neurosci. 30, 10639–10654 (2010).
Quach, T. T., Honnorat, J., Kolattukudy, P. E., Khanna, R. & Duchemin, A. M. CRMPs: critical molecules for neurite morphogenesis and neuropsychiatric diseases. Mol. Psychiatry 20, 1037–1045 (2015).
Harrington, A. J. et al. MEF2C regulates cortical inhibitory and excitatory synapses and behaviors relevant to neurodevelopmental disorders. Elife 5, e20059 (2016).
Li, H. et al. Transcription factor MEF2C influences neural stem/progenitor cell differentiation and maturation in vivo. Proc. Natl Acad. Sci. USA 105, 9397–9402 (2008).
Lee, J. J. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet 50, 1112–1121 (2018).
Anney, R. et al. A genome-wide scan for common alleles affecting risk for autism. Hum. Mol. Genet 19, 4072–4082 (2010).
Davies, G. et al. Study of 300,486 individuals identifies 148 independent genetic loci influencing general cognitive function. Nat. Commun. 9, 2098 (2018).
Karlsson Linnér, R. et al. Genome-wide association analyses of risk tolerance and risky behaviors in over 1 million individuals identify hundreds of loci and shared genetic influences. Nat. Genet 51, 245–257 (2019).
Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet 50, 621–629 (2018).
Hook, P. W. & McCallion, A. S. Leveraging mouse chromatin data for heritability enrichment informs common disease architecture and reveals cortical layer contributions to schizophrenia. Genome Res. 30, 528–539 (2020).
Cross-Disorder Group of the Psychiatric Genomics Consortium. Novel Loci and Pleiotropic Mechanisms across Eight Psychiatric Disorders. Cell 179, 1469–1482.e1411 (2019).
Grove, J. et al. Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet 51, 431–444 (2019).
Stahl, E. A. et al. Genome-wide association study identifies 30 loci associated with bipolar disorder. Nat. Genet 51, 793–803 (2019).
Nagel, M. et al. Meta-analysis of genome-wide association studies for neuroticism in 449,484 individuals identifies novel genetic loci and pathways. Nat. Genet 50, 920–927 (2018).
Pardiñas, A. F. et al. Common schizophrenia alleles are enriched in mutation-intolerant genes and in regions under strong background selection. Nat. Genet 50, 381–389 (2018).
Wray, N. R. et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet 50, 668–681 (2018).
Wightman, D. P. et al. Largest GWAS (N=1,126,563) of Alzheimer’s Disease Implicates Microglia and Immune Cells. medRxiv, 2020.11.20.20235275, https://doi.org/10.1101/2020.11.20.20235275 (2020).
de Lange, K. M. et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat. Genet 49, 256–261 (2017).
Demontis, D. et al. Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat. Genet 51, 63–75 (2019).
Stiles, J. & Jernigan, T. L. The basics of brain development. Neuropsychol. Rev. 20, 327–348 (2010).
Castello, M. & Gleeson, J. G. Insight into developmental mechanisms of global and focal migration disorders of cortical development. Curr. Opin. Neurobiol. 66, 77–84 (2021).
Norbom, L. B. et al. New insights into the dynamic development of the cerebral cortex in childhood and adolescence: Integrating macro- and microstructural MRI findings. Prog. Neurobiol. 204, 102109 (2021).
Gentili, G. et al. The case for preregistering all region of interest (ROI) analyses in neuroimaging research. Eur. J. Neurosci. 53, 357–361 (2020).
Smith, S. M. & Nichols, T. E. Statistical Challenges in “Big Data” Human Neuroimaging. Neuron 97, 263–268 (2018).
Nichols, T. E. et al. Best practices in data analysis and sharing in neuroimaging using MRI. Nat. Neurosci. 20, 299–303 (2017).
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet 48, 1279–1283 (2016).
Gandal, M. J. et al. Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder. Science 362, eaat8127 (2018).
Ge, T. et al. Multidimensional heritability analysis of neuroanatomical shape. Nat. Commun. 7, 13291 (2016).
Berisa, T. & Pickrell, J. K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283–285 (2016).
Boughton, A. P. et al. LocusZoom.js: Interactive and embeddable visualization of genetic association study results. Bioinformatics 37, 3017–3018 (2021).
Fan, C. et al. Multivariate genome-wide association study on tissue-sensitive diffusion metrics highlights pathways that shape the human brain. RSIGWAS: release 1.0. https://doi.org/10.5281/zenodo.6289762 (2022).
Acknowledgements
This work was supported by grants R01MH122688, RF1MH120025, and R01MH118281 from the National Institute of Mental Health (NIMH). Data used in the preparation of this article were obtained from the Adolescent Brain Cognitive Development℠ (ABCD) Study (https://abcdstudy.org), held in the NIMH Data Archive (NDA). The ABCD Study® is supported by the National Institutes of Health and additional federal partners under award numbers U01DA041048, U01DA050989, U01DA051016, U01DA041022, U01DA051018, U01DA051037, U01DA050987, U01DA041174, U01DA041106, U01DA041117, U01DA041028, U01DA041134, U01DA050988, U01DA051039, U01DA041156, U01DA041025, U01DA041120, U01DA051038, U01DA041148, U01DA041093, U01DA041089, U24DA041123, U24DA041147. A full list of supporters is available at https://abcdstudy.org/federal-partners.html. A listing of participating sites and a complete listing of the study investigators can be found at https://abcdstudy.org/consortium_members/. ABCD consortium investigators designed and implemented the study and/or provided data but did not necessarily participate in the analysis or writing of this report. This manuscript reflects the views of the authors and may not reflect the opinions or views of the NIH or ABCD consortium investigators. The ABCD data repository grows and changes over time. The ABCD data used in this report came from https://doi.org/10.15154/1524729. The fast track data release used in this report is available at https://nda.nih.gov/edit_collection.html?id=2573. Instructions on how to create an NDA study are available at https://nda.nih.gov/training/modules/study.html. We especially thank Megan Chang for her assistance in organizing and curating the genomics-related documents.
Author information
Contributions
C.C.F., A.M.D., and O.A.A. conceptualized and designed the study. C.C.F. and R.L. performed the analyses. D.J.H. and O.F. processed the data. C.C.F. interpreted the results and wrote the draft of the manuscript. R.L., C.M., D.P., C.H.C., D.J.H., W.K.T., N.P., D.M., O.F., O.A.A., and A.M.D. provided critical input for the revision of the manuscript.
Ethics declarations
Competing interests
Dr. Andreassen has received speaker’s honorarium from Lundbeck and Sunovion, and is a consultant to HealthLytix. Dr. Dale is a Founder of and holds equity in CorTechs Labs, Inc, and serves on its Scientific Advisory Board. He is a member of the Scientific Advisory Board of Human Longevity, Inc. and receives funding through research agreements with General Electric Healthcare and Medtronic, Inc. The terms of these arrangements have been reviewed and approved by UCSD in accordance with its conflict of interest policies. The other authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Max Lam and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Fan, C.C., Loughnan, R., Makowski, C. et al. Multivariate genome-wide association study on tissue-sensitive diffusion metrics highlights pathways that shape the human brain. Nat Commun 13, 2423 (2022). https://doi.org/10.1038/s41467-022-30110-3
DOI: https://doi.org/10.1038/s41467-022-30110-3