Abstract
Genome-wide association studies of brain imaging phenotypes have mainly been performed in European populations, leaving other populations severely under-represented. Here, we conducted Chinese-alone and cross-ancestry genome-wide association studies of 3,414 brain imaging phenotypes in 7,058 Chinese Han and 33,224 white British participants. We identified 38 new associations in Chinese-alone analyses and 486 additional new associations in cross-ancestry meta-analyses at P < 1.46 × 10⁻¹¹ for discovery and P < 0.05 for replication. We pooled significant autosomal associations identified by single- or cross-ancestry analyses into 6,443 independent associations, which showed uneven distribution in the genome and the phenotype subgroups. We further divided them into 44 associations with different effect sizes and 3,557 associations with similar effect sizes between ancestries. Loci of these associations were shared with 15 brain-related non-imaging traits including cognition and neuropsychiatric disorders. Our results provide a valuable catalog of genetic associations for brain imaging phenotypes in more diverse populations.
Data availability
All summary statistics of CH-GWASs based on the 7,058 CHIMGEN participants and cross-ancestry GWAS meta-analyses based on 7,058 CHIMGEN and 33,224 UKBB participants are freely available on the website http://www.mulinlab.org/pheweb/ and at the NHGRI-EBI GWAS Catalog (https://www.ebi.ac.uk/gwas/home) with the accession numbers GCST90383981–GCST90390808. All WB-GWAS results based on UKBB participants are available at the Oxford Brain Imaging Genetics web browser (http://big.stats.ox.ac.uk/). The LD reference panels of the 1KGP can be found at https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.html. SG10K whole-genome sequencing data are available at https://ega-archive.org with accession number EGAS00001003875. The Reactome Pathway Database is available at https://reactome.org/. For the genetic colocalization analyses, we used summary statistical data from several GWASs of non-imaging traits, and the corresponding accessible links for these data are provided in Supplementary Table 30.
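For readers who want to fetch the deposited summary statistics in bulk, the accession range quoted above can be expanded into individual identifiers. The following is a minimal sketch in Python; the helper name `expand_accessions` is hypothetical and not part of any GWAS Catalog tooling, and the sketch assumes that the range is contiguous and that every identifier shares the `GCST` prefix and the same digit width.

```python
def expand_accessions(first: str, last: str, prefix: str = "GCST") -> list[str]:
    """Expand an inclusive accession range into individual identifiers.

    Assumption: the range is contiguous and both ends share the same
    prefix and the same number of digits.
    """
    width = len(first) - len(prefix)      # number of digits after the prefix
    start = int(first[len(prefix):])
    end = int(last[len(prefix):])
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


# Range quoted in the Data availability statement.
accessions = expand_accessions("GCST90383981", "GCST90390808")
print(len(accessions), accessions[0], accessions[-1])
```

Under these assumptions the range spans 6,828 identifiers, which equals twice the 3,414 phenotypes analysed. That would be consistent with one accession per phenotype for each of the two deposited analyses, although the statement above does not spell out the mapping.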
Code availability
We made use of publicly available software, tools and algorithms. Code used to generate results reported in this paper is available on Zenodo at https://doi.org/10.5281/zenodo.10920937 (ref. 46). Software used includes AutoPtx in FSL version 5.0.10 (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FIRST), BGENIE version 1.3 (https://jmarchini.org/bgenie/), the BBR algorithm in FSL version 5.0.10 (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FIRST), CAT12 r1364 (http://dbm.neuro.uni-jena.de/cat), coloc version 5.1.0 (https://chr1swallace.github.io/coloc/), ComBat harmonization (https://github.com/precision-medicine-um/ComBatHarmonization), the DARTEL algorithm in SPM12 r7771 (https://www.fil.ion.ucl.ac.uk/spm/software/spm12/), FINEMAP version 1.4.1 (http://www.christianbenner.com/), FIRST in FSL version 5.0.10 (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FIRST), FSL version 5.0.10 (http://www.fmrib.ox.ac.uk/fsl), FreeSurfer version 6.0.0 (http://surfer.nmr.mgh.harvard.edu/), GCTA version 1.92.1 (https://yanglab.westlake.edu.cn/software/gcta/), IMPUTE2 version 2.3.2 (http://mathgen.stats.ox.ac.uk/impute/impute_v2.html), KING version 2.2.5 (https://www.kingrelatedness.com/), LocusFocus version 1.6.0 (https://github.com/naim-panjwani/LocusFocus/), METASOFT version 2.0.0 (http://genetics.cs.ucla.edu/meta/), PANTHER 18.0 (http://pantherdb.org/), PheWeb version 1.3.16 (https://github.com/statgen/pheweb/), PLINK version 2.0.0a (http://zzz.bwh.harvard.edu/plink/), PROBTRACKX in FSL version 5.0.10 (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FIRST), Popcorn version 1.0 (https://github.com/brielin/Popcorn), QCTOOL version 2.0.8 (https://www.well.ox.ac.uk/~gav/qctool_v2/), SAIGE version 0.39.2 (https://saigegit.github.io/), SHAPEIT2 version 2.r904 (https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html), SPM12 r7771 (https://www.fil.ion.ucl.ac.uk/spm/software/spm12/) and Winner’s Curse Adjustment version 0.1.1 (https://amandaforde.github.io/winnerscurse/index.html).
References
Deco, G. & Kringelbach, M. L. Great expectations: using whole-brain computational connectomics for understanding neuropsychiatric disorders. Neuron 84, 892–905 (2014).
Jansen, A. G., Mous, S. E., White, T., Posthuma, D. & Polderman, T. J. C. What twin studies tell us about the heritability of brain development, morphology, and function: a review. Neuropsychol. Rev. 25, 27–46 (2015).
Makowski, C. et al. Discovery of genomic loci of the human cerebral cortex using genetically informed brain atlases. Science 375, 522–528 (2022).
Elliott, L. T. et al. Genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature 562, 210–216 (2018).
Zhao, B. X. et al. Common genetic variation influencing human white matter microstructure. Science 372, eabf3736 (2021).
Hibar, D. P. et al. Common genetic variants influence human subcortical brain structures. Nature 520, 224–229 (2015).
Grasby, K. L. et al. The genetic architecture of the human cerebral cortex. Science 367, eaay6690 (2020).
Kang, D. W. et al. Differences in cortical structure between cognitively normal East Asian and Caucasian older adults: a surface-based morphometry study. Sci. Rep. 10, 20905 (2020).
Tang, Y. C. et al. Brain structure differences between Chinese and Caucasian cohorts: a comprehensive morphometry study. Hum. Brain Mapp. 39, 2147–2155 (2018).
Tang, Y. C. et al. The construction of a Chinese MRI brain atlas: a morphometric comparison study between Chinese and Caucasian cohorts. Neuroimage 51, 33–41 (2010).
Langer-Gould, A., Brara, S. M., Beaber, B. E. & Zhang, J. L. Incidence of multiple sclerosis in multiple racial and ethnic groups. Neurology 80, 1734–1739 (2013).
Papp, V. et al. Worldwide incidence and prevalence of neuromyelitis optica: a systematic review. Neurology 96, 59–77 (2021).
Meeker, K. L. et al. Socioeconomic status mediates racial differences seen using the AT(N) framework. Ann. Neurol. 89, 254–265 (2021).
Peterson, R. E. et al. Genome-wide association studies in ancestrally diverse populations: opportunities, methods, pitfalls, and recommendations. Cell 179, 589–603 (2019).
Smith, S. M. et al. An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank. Nat. Neurosci. 24, 737–745 (2021).
Graham, S. E. et al. The power of genetic diversity in genome-wide association studies of lipids. Nature 600, 675–679 (2021).
Chen, M. H. et al. Trans-ethnic and ancestry-specific blood-cell genetics in 746,667 individuals from 5 global populations. Cell 182, 1198–1213 (2020).
Liu, N. A. et al. Cross-ancestry genome-wide association meta-analyses of hippocampal and subfield volumes. Nat. Genet. 55, 1126–1137 (2023).
Xu, Q. et al. CHIMGEN: a Chinese imaging genetics cohort to enhance cross-ethnic and cross-geographic brain research. Mol. Psychiatry 25, 517–529 (2020).
Fortin, J. P. et al. Harmonization of multi-site diffusion tensor imaging data. Neuroimage 161, 149–170 (2017).
Faye, L. L., Sun, L., Dimitromanolakis, A. & Bull, S. B. A flexible genome-wide bootstrap method that accounts for ranking- and threshold-selection bias in GWAS interpretation and replication study design. Stat. Med. 30, 1898–1912 (2011).
Mela, A. & Goldman, J. E. The tetraspanin KAI1/CD82 is expressed by late-lineage oligodendrocyte precursors and may function to restrict precursor migration and promote oligodendrocyte differentiation and myelination. J. Neurosci. 29, 11172–11181 (2009).
Zhao, Y. et al. CD82–TRPM7–Numb signaling mediates age-related cognitive impairment. Geroscience 42, 595–611 (2020).
Suzuki, T. et al. Genome-wide association study of epilepsy in a Japanese population identified an associated region at chromosome 12q24. Epilepsia 62, 1391–1400 (2021).
Hautakangas, H. et al. Genome-wide analysis of 102,084 migraine cases identifies 123 risk loci and subtype-specific risk alleles. Nat. Genet. 54, 152–160 (2022).
Zhou, W. et al. Efficiently controlling for case–control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).
Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).
Hong, S. E. et al. Autosomal recessive lissencephaly with cerebellar hypoplasia is associated with human RELN mutations. Nat. Genet. 26, 93–96 (2000).
Mao, D. X. et al. De novo EIF2AK1 and EIF2AK2 variants are associated with developmental delay, leukoencephalopathy, and neurologic decompensation. Am. J. Hum. Genet. 106, 570–583 (2020).
Yang, J. A. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).
Brown, B. C., Asian Genetic Epidemiology Network Type 2 Diabetes Consortium, Ye, C. J., Price, A. L. & Zaitlen, N. Transethnic genetic-correlation estimates from summary statistics. Am. J. Hum. Genet. 99, 76–88 (2016).
Gay, N. R. et al. Impact of admixture and ancestry on eQTL analysis and GWAS colocalization in GTEx. Genome Biol. 21, 233 (2020).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
Gong, J. F. et al. Genetic association and transcriptome integration identify contributing genes and tissues at cystic fibrosis modifier loci. PLoS Genet. 15, e1008007 (2019).
Buch, A. M. & Liston, C. Dissecting diagnostic heterogeneity in depression by integrating neuroimaging and genetics. Neuropsychopharmacology 46, 156–175 (2021).
Abi-Dargham, A. & Horga, G. The search for imaging biomarkers in psychiatric disorders. Nat. Med. 22, 1248–1255 (2016).
Song, W. C., Qian, W., Wang, W. D., Yu, S. Y. & Lin, G. N. Mendelian randomization studies of brain MRI yield insights into the pathogenesis of neuropsychiatric disorders. BMC Genomics 22, 342 (2021).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Alfaro-Almagro, F. et al. Image processing and quality control for the first 10,000 brain imaging datasets from UK Biobank. Neuroimage 166, 400–424 (2018).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Wu, D. et al. Large-scale whole-genome sequencing of three diverse Asian populations in Singapore. Cell 179, 736–749 (2019).
Delaneau, O., Zagury, J. F. & Marchini, J. Improved whole-chromosome phasing for disease and population genetic studies. Nat. Methods 10, 5–6 (2013).
Howie, B., Fuchsberger, C., Stephens, M., Marchini, J. & Abecasis, G. R. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat. Genet. 44, 955–959 (2012).
Han, B. & Eskin, E. Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am. J. Hum. Genet. 88, 586–598 (2011).
Fu, J. Cross-ancestry genome-wide association studies of brain imaging phenotypes. Zenodo https://doi.org/10.5281/zenodo.10920937 (2024).
Acknowledgements
We are grateful to participants and researchers of CHIMGEN, who generously donated their time to make this resource available. We acknowledge funding from the National Key Research and Development Program of China (2018YFC1314300) and the National Natural Science Foundation of China (82030053, 81425013) to C. Yu. We thank the SG10K Consortium for collecting and sharing the high-coverage whole-genome sequencing data of Asian populations. We acknowledge the UKBB for providing GWAS summary statistics of brain IDPs. For the genetic colocalization analyses, we used summary statistical data from several GWASs of non-imaging traits. We thank groups (BioBank Japan, the Complex Traits Genetics laboratory, the Early Genetics and Lifecourse Epidemiology consortium, GIANT, the Genetics of Personality Consortium, the GWAS and Sequencing Consortium of Alcohol and Nicotine use, the International Multiple Sclerosis Genetics Consortium, the METASTROKE collaboration, the Psychiatric Genomics Consortium, the Social Science Genetic Association Consortium, the UKBB) and authors (J.M. Lane, J.N. Foo, K. Ishigaki, M.A. Nalls, S.L. Pulit and W. van Rheenen) for making these data publicly available and all the participants and researchers in these studies.
Author information
Contributions
C. Yu and J.F. designed the study and wrote the manuscript. J.F., Jianhua Wang and Quan Zhang analyzed data. C. Yu, M.J.L. and J. Cheng supervised this work. J.F., Quan Zhang, Jianhua Wang, M.W., B.Z., Wenzhen Zhu, S.Q., Z.G., G.C., Y. Yu, W. Liao, Hui Zhang, B.G., X.X., T.H., Z. Yao, W.Q., F.L., M. Liang, S. Wang, Q.X., Jiayuan Xu, P.Z., W. Li, D.S., C. Wang, S.L., Z. Yan, F.C., Jing Zhang, Jiance Li, W.S., Y.M., D.W., J. Xian, J.-H.G., Xiaochu Zhang, K. Xu, X.-N.Z., L. Zhang, Z. Ye, J. Cheng, M.J.L. and C. Yu acquired data. All authors critically reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Jason Stein and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Consistency of approaches for reducing the scanner effect.
ComBat harmonization is applied to brain IDPs to reduce the scanner effect, and the harmonized IDPs yield 763 genome-wide significant associations (linear regression model, two-sided P < 5 × 10⁻⁸, uncorrected) in the discovery stage of CH-GWASs. Based on these IDPs without ComBat harmonization, the association analyses are repeated by including scanner as a covariate. Scatter plots show the relationships of effect sizes (763 associations) derived from the two approaches, in which effect sizes are estimated by beta values from both discovery (n = 5,025) (a) and replication (n = 2,033) (b) stages of CH-GWASs. Data are presented as beta values ± SE.
Extended Data Fig. 2 Ancestry-specific effect sizes of reliable associations.
The effect sizes are estimated for the 77 reliable variant-IDP associations (discovery: two-sided P < 1.46 × 10⁻¹¹, Bonferroni corrected, n = 5,025; replication: two-sided P < 3.24 × 10⁻⁴, Bonferroni corrected, n = 2,033) identified by CH-GWASs (linear regression model) that are also included in WB-GWASs (n = 22,138). Each dot represents an association, colored by EAF differences. The effect sizes of the 77 associations are estimated before (a) and after (b) the winner’s curse correction. Data are presented as beta values ± SE.
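The discovery threshold quoted in this caption and in the abstract appears consistent with dividing the conventional genome-wide significance level (5 × 10⁻⁸, as used in Extended Data Fig. 1) by the 3,414 phenotypes tested. The short Python sketch below reproduces that arithmetic. It assumes this is how the threshold was obtained, which the text here does not state explicitly, and it does not attempt to reconstruct the replication threshold.

```python
# Assumption: discovery threshold = genome-wide alpha / number of phenotypes.
GENOME_WIDE_ALPHA = 5e-8   # conventional genome-wide significance level
N_PHENOTYPES = 3414        # brain imaging phenotypes reported in the abstract


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


discovery_threshold = bonferroni_threshold(GENOME_WIDE_ALPHA, N_PHENOTYPES)
print(f"{discovery_threshold:.2e}")
```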
Extended Data Fig. 3 MAF and novel discovery in CH-GWASs.
a, Inter-ancestry MAF comparison between CHIMGEN and UKBB in seven LD-independent variants from novel associations identified by CH-GWASs. Two-sided P-value is calculated by paired sample Wilcoxon test (n = 7). b, Intra-ancestry MAF comparison in UKBB between seven LD-independent novel variants identified by CH-GWASs and 246 LD-independent variants identified by WB-GWASs. Two-sided P-value is calculated by Wilcoxon rank sum test (pink, n = 7; red, n = 246). c, Pair-wise inter-ancestry MAF comparison between CHIMGEN and UKBB in the seven LD-independent variants from novel CH-GWAS associations. In (a) and (b), the box bounds the interquartile ranges (IQRs) divided by the median, and whiskers represent minima and maxima of the data points.
Extended Data Fig. 4 Spatial distribution of the pooled associations in each brain atlas or parcellation.
a, Diedrichsen cerebellar atlas (SUIT); b, Harvard-Oxford subcortical atlas; c, Harvard-Oxford cortical atlas; d, Subcortical structures from FIRST; e, Subcortical structures from aseg; f, BA exvivo parcellation; g, Desikan-Killiany (DK) parcellation; h, Desikan-Killiany-Tourville (DKT) parcellation; i, Parcellation based on the pial surface using Desikan-Killiany parcellation (pial); j, JHU fibers; k, UKBB 100-component group-ICA. In (a-j), colors represent the numbers of significant associations for each brain region. In (k), circle colors reflect the numbers of significant associations for functional activity amplitude of each brain RSN and a link indicates the existence of a significant association for functional connectivity between every two RSNs.
Extended Data Fig. 5 SNP-based heritability of brain IDPs.
a, SNP-based heritability of brain structural (top), functional (middle), and diffusion (bottom) IDPs in CHIMGEN using GCTA. b, SNP-based heritability of these IDPs in CHIMGEN (top, GCTA) and UKBB (bottom, LDSC). Points are colored according to the phenotype subgroups, and circles and stars show significant (mixed linear model, two-sided P < 0.05, uncorrected) and non-significant (mixed linear model, two-sided P > 0.05, uncorrected) heritability, respectively.
Extended Data Fig. 6 Colocalization between schizophrenia and brain IDPs.
Ideogram shows 543 colocalizations between schizophrenia and 263 brain IDPs, involving 31 independent colocalization loci on 14 chromosomes.
Extended Data Fig. 7 Performance of ComBat harmonization.
a, b, Correlation matrices show inter-scanner correlations of surface areas across the 74 left cerebral cortical subregions derived from the Destrieux (a2009s) parcellation before (a) and after (b) ComBat harmonization in a representative participant who travels to and is scanned at 28 MR scanners. c, d, Correlation matrices show inter-scanner correlations of L1 across the 48 white matter fiber tracts derived from the JHU atlas before (c) and after (d) harmonization in the same participant. e, Distributions of surface areas of the 74 subregions in 7,058 CHIMGEN participants among 30 MR scanners before (upper row) and after (lower row) harmonization. f, Distributions of L1 of the 48 white matter fiber tracts in 7,058 CHIMGEN participants among 30 MR scanners before (upper row) and after (lower row) harmonization. In violin plots of (e) and (f), white dots represent the median, thick bars represent the interquartile range, and thin bars represent the entire data range. Note: MRI data of center 15 are acquired at center 1 and MRI data of center 20 are acquired at center 2.
Extended Data Fig. 8 Quality assessments of CHIMGEN genetic data.
a, Genotype concordance (>99.79%) of non-missing calls between 86 pairs of duplicate samples indicates high genotyping reproducibility. b, Genotype heterozygosity and missing rates of 549,309 variants in 7,195 CHIMGEN participants. Missing rates of 4 participants are greater than 3% and heterozygosity rates of 15 participants deviate from the mean by more than five standard deviations. c, Correlation of allele frequency across 524,924 overlapping variants between CHIMGEN (n = 7,163) and SG10K (n = 4,810). Each bin is colored according to the log10-scaled number of variants within the bin. d, MAF distribution of 704,555 variants in 7,191 CHIMGEN participants. The inset figure shows variant counts (n = 142,828) with MAF < 0.001. e, f, Genetic population stratification tested by PCA in 549,309 variants of the 7,191 CHIMGEN participants with seven (CDX, CHB, CHS, KHV, JPT, CEU, and YRI; e) and five (CDX, CHB, CHS, KHV, and JPT; f) ancestral populations from 1KGP. We detect one outlier participant that deviates from the CHIMGEN population. MAF, minor allele frequency; PCA, principal component analysis.
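The two sample-level criteria in panel b (missing rate above 3%, heterozygosity more than five standard deviations from the mean) can be illustrated with a small Python sketch. This is not the authors' pipeline. The function name `flag_samples` and the toy data are hypothetical, and the sketch assumes that deviation is measured in both directions and that the mean and standard deviation are computed over all samples, including the outliers.

```python
from statistics import mean, stdev

MAX_MISSING_RATE = 0.03  # caption: missing rate greater than 3% is flagged
MAX_HET_SD = 5.0         # caption: more than five SDs from the mean is flagged


def flag_samples(missing_rates, het_rates):
    """Return indices of samples that fail either QC criterion.

    Assumption: heterozygosity outliers are flagged in both directions,
    with mean and SD computed over all samples.
    """
    het_mean = mean(het_rates)
    het_sd = stdev(het_rates)
    flagged = []
    for i, (miss, het) in enumerate(zip(missing_rates, het_rates)):
        too_missing = miss > MAX_MISSING_RATE
        het_outlier = abs(het - het_mean) > MAX_HET_SD * het_sd
        if too_missing or het_outlier:
            flagged.append(i)
    return flagged


# Toy data (not from the study): 100 samples, one with a high missing rate
# and one with extreme heterozygosity.
missing = [0.01] * 100
missing[3] = 0.05
het = [0.30] * 50 + [0.32] * 49 + [0.60]
flagged = flag_samples(missing, het)
print(flagged)
```

A five-SD rule only bites when the sample is large enough: with n samples, no single value can lie more than (n − 1)/√n sample standard deviations from the mean, which is why the toy data use 100 samples rather than a handful.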
Extended Data Fig. 9 Imputation performance of CHIMGEN genetic data.
a, b, Imputation performance for the reference panels of 1KGP (blue), SG10K (orange), and 1KGP + SG10K (green). The performance is assessed by the number of high-quality (INFO > 0.8) imputed variants in the four MAF bins (0.005–0.01, 0.01–0.05, 0.05–0.2, and 0.2–0.5) on chromosome 2 (a) and is evaluated by the imputation error rates for the 4,373 masked variants on chromosome 2 in 90 participants (b). c, Imputation performance of the assumed genotyping data from four genotyping arrays. The consistency of imputed genotypes based on variants included in the ASA-750K array with directly sequenced genotypes on chromosome 20 is higher than that obtained from the other arrays. The x-axis represents each array, and the y-axis shows imputation performance assessed by the concordance rate in 90 participants. d, Distribution of INFO values of autosomal variants in the imputed dataset. The top left graph shows the INFO distribution of all variants. The remaining graphs show INFO distributions in the bins with different MAF ranges. e, Distributions of MAF and INFO of the finally included 6,830,145 autosomal variants. In (b) and (c), the box bounds the interquartile ranges (IQRs) divided by the median, and whiskers extend to a maximum of 1.5 × IQR beyond the box. Illumina GSA, Illumina Global Screening Array; INFO, imputation information score; MAF, minor allele frequency.
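The summary in panel a, counts of high-quality imputed variants per MAF bin, can be sketched in Python as follows. The function name `count_high_quality` and the example variants are hypothetical. The caption does not say how bin edges are handled, so the sketch assumes half-open bins [lower, upper), with the last bin also including MAF = 0.5.

```python
MAF_BINS = [(0.005, 0.01), (0.01, 0.05), (0.05, 0.2), (0.2, 0.5)]  # from the caption
INFO_CUTOFF = 0.8                                                  # INFO > 0.8


def count_high_quality(variants):
    """Count variants with INFO above the cutoff in each MAF bin.

    `variants` is an iterable of (maf, info) pairs.
    Assumption: bins are [lower, upper), except that the last bin
    also includes its upper edge (MAF = 0.5).
    """
    counts = {bin_: 0 for bin_ in MAF_BINS}
    last_upper = MAF_BINS[-1][1]
    for maf, info in variants:
        if info <= INFO_CUTOFF:
            continue  # not high quality
        for lower, upper in MAF_BINS:
            in_bin = lower <= maf < upper or (upper == last_upper and maf == upper)
            if in_bin:
                counts[(lower, upper)] += 1
                break
    return counts


# Toy (maf, info) pairs, not taken from the study.
example = [(0.007, 0.9), (0.007, 0.5), (0.03, 0.85),
           (0.1, 0.95), (0.5, 0.99), (0.001, 0.99)]
bin_counts = count_high_quality(example)
print(bin_counts)
```

Running the same count once per reference panel (1KGP, SG10K, and the combined panel) would give the per-bin comparison that panel a describes.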
Extended Data Fig. 10 Selecting principal components as covariates.
Principal components (PCs) are generated based on genomic data after quality control (549,309 variants of the 7,163 CHIMGEN participants). a, Eigenvalues against the numbers of principal components. b, Scatter plots show the relationships between PC1 and PC2 (left panel) and between PC3 and PC4 (right panel). Each point represents a participant and is placed according to their corresponding principal component scores.
Supplementary information
Supplementary Information
Supplementary Note and Figs. 1–10
Supplementary Tables 1–40
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fu, J., Zhang, Q., Wang, J. et al. Cross-ancestry genome-wide association studies of brain imaging phenotypes. Nat Genet 56, 1110–1120 (2024). https://doi.org/10.1038/s41588-024-01766-y
DOI: https://doi.org/10.1038/s41588-024-01766-y
- Springer Nature America, Inc.