Abstract
Admixed populations are routinely excluded from genomic studies due to concerns over population structure. Here, we present a statistical framework and software package, Tractor, to facilitate the inclusion of admixed individuals in association studies by leveraging local ancestry. We test Tractor with simulated and empirical two-way admixed African–European cohorts. Tractor generates accurate ancestry-specific effect-size estimates and P values, can boost genome-wide association study (GWAS) power and improves the resolution of association signals. Using a local ancestry-aware regression model, we replicate known hits for blood lipids, discover novel hits missed by standard GWAS and localize signals closer to putative causal variants.
Data availability
All summary statistics described here for total and LDL cholesterol in ~4,300 admixed UK Biobank individuals can be found at https://github.com/eatkinson/Tractor_ms_results and have been uploaded to the GWAS catalog under accession numbers GCST90012868–GCST90012873 (https://www.ebi.ac.uk/gwas/deposition/bodyofwork/GCP000093). The UK Biobank raw data can be obtained through a data access application available at https://www.ukbiobank.ac.uk. PGC-PTSD data can be obtained through a data access application at https://pgc-ptsd.com/data-samples/access-data/. BioBank Japan summary statistics are available at http://jenger.riken.jp/en/. The 1000 Genomes reference panel is available at ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. The Human Genome Diversity Project dataset is available at https://www.internationalgenome.org/data-portal/data-collection/hgdp.
Code availability
All code is freely available. The automated quality control pipeline to prepare datasets for Tractor and run LAI is located at https://github.com/eatkinson/Post-QC. We provide Tractor code in Python and Hail, along with example implementations in a Jupyter notebook, at https://github.com/eatkinson/Tractor alongside a detailed wiki. The scripts used to produce the simulated data and results are provided at https://github.com/eatkinson/Tractor_ms_results.
Acknowledgements
We thank the PGC-PTSD working group, P. Natarajan, S. Gagliano Taliun and many other scientists within and beyond Boston for their intellectual contributions to this work. This project was supported by the National Institute of Mental Health (K01 MH121659 and T32 MH017119 to E.G.A., K99MH117229 to A.R.M., R37 MH107649 to B.M.N., and 2R01MH106595 to C.M.N. and K.C.K.). M.K. was supported by a Nakajima Foundation Fellowship and the Masason Foundation. M.L.S. was supported by the Fundação de Amparo à Pesquisa do Estado de São Paulo (#2018/09328-2). The BioBank Japan Project was supported by the Tailor-Made Medical Treatment Program of the Ministry of Education, Culture, Sports, Science and Technology (MEXT) and the Japan Agency for Medical Research and Development (AMED). This research has been conducted using the UK Biobank Resource under application number 31063.
Author information
Contributions
E.G.A. designed and implemented the pipeline, ran the analyses and drafted the manuscript. A.X.M. designed and ran the analyses. M.K. designed and ran the analyses with the aid of J.C.U., Y.K., Y.O. and H.K.F. A.R.M. contributed code and aided in writing the manuscript. K.J.K. and M.L.S. aided in code implementation. K.C.K., C.M.N., B.M.N. and M.J.D. supervised and advised on the project. All authors reviewed and approved the final draft.
Ethics declarations
Competing interests
M.J.D. is a founder of Maze Therapeutics. A.R.M. serves as a consultant for 23andMe and is a member of the Precise.ly Scientific Advisory Board. B.M.N. is a member of the Deep Genomics Scientific Advisory Board and serves as a consultant for the CAMP4 Therapeutics Corporation, Takeda Pharmaceutical and Biogen. The remaining authors declare no competing interests.
Additional information
Peer review information: Nature Genetics thanks Michelle Daya and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Painted karyograms of a simulated AA individual showing EUR (red) and AFR (blue) ancestral tracts across demographic models.
The left column shows the results for the demographic model of one pulse of admixture 3 generations ago, the middle column shows the realistic model of one pulse 9 generations ago, and the right column shows a pulse 20 generations ago. In all cases the model involved 84% AFR ancestry and 16% EUR ancestry. The rows show the data at successive steps of the Tractor pipeline. The top row shows the truth results from our simulations. The second row shows painted karyograms after statistical phasing of this truth cohort. The third row illustrates the recovery, by unkinking, of tracts broken by switch errors in phasing. The bottom row shows the smoothing and further improvement of tracts obtained through an additional round of LAI.
Extended Data Fig. 2 Tractor recovers disrupted tracts, improving tract distributions.
The top row (A-C) shows the improvements to the distributions of the number of discrete EUR tracts observed in simulated AA individuals under demographic models of one pulse of admixture at 3, 9 (realistic for AA population history) and 20 generations ago. The bottom row (D,E) shows the results from different initial admixture fractions, of 70% and 50% AFR, respectively, at 9 generations since admixture. These can be compared to the inferred realistic demographic model shown in B. In all panels, the simulated truth dataset is shown in black, the dataset after statistical phasing in purple, the dataset immediately after tract recovery in orange, and the dataset after one additional round of LAI following tract recovery in yellow.
Extended Data Fig. 3 The contribution of absolute MAF and effect size to Tractor power.
All cases assume an 80/20 AFR/EUR admixture ratio, 10% disease prevalence, 12k cases/30k controls with an effect only in the AFR genetic background. In all panels, the solid line uses a traditional GWAS model while the dashed line is our LAI-incorporating Tractor model. (A,B): Equal effect in EUR and AFR with shifted absolute MAF. (C,D): effect only in AFR background. (A,C): MAF is set to 10% in both AFR and EUR. (B,D): MAF is set to 40% in both AFR and EUR. Panels E and F illustrate the heterogeneity in effect sizes required to observe gains in Tractor power over traditional GWAS assuming 20% MAF in both ancestries and an effect that is stronger in AFR with varying difference to the EUR effect.
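The scenario in which an effect exists only on the AFR background can be illustrated with a small simulation. The following is a minimal sketch, not the Tractor implementation: it assumes a single locus, a quantitative trait rather than the case-control design described above, and an ordinary least-squares fit. The function names, sample size and effect size are hypothetical choices made for illustration. It contrasts a pooled-dosage regression with a regression that splits the allele dosage by the local ancestry of the haplotype carrying it.

```python
import numpy as np

def simulate_admixed_cohort(n, afr_frac=0.8, maf_afr=0.2, maf_eur=0.2,
                            beta_afr=0.3, beta_eur=0.0, seed=0):
    """Simulate one locus in a two-way admixed cohort (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Local ancestry of each of the two haplotypes (True = AFR, False = EUR).
    is_afr = rng.random((n, 2)) < afr_frac
    # Each haplotype carries the minor allele at the MAF of its own ancestry.
    maf = np.where(is_afr, maf_afr, maf_eur)
    allele = rng.random((n, 2)) < maf
    afr_count = is_afr.sum(axis=1)               # AFR haplotypes: 0, 1 or 2
    dose_afr = (allele & is_afr).sum(axis=1)     # minor alleles on AFR tracts
    dose_eur = (allele & ~is_afr).sum(axis=1)    # minor alleles on EUR tracts
    # Quantitative trait with ancestry-specific effects plus unit-variance noise.
    y = beta_afr * dose_afr + beta_eur * dose_eur + rng.normal(size=n)
    return afr_count, dose_afr, dose_eur, y

def ols(columns, y):
    """Least-squares coefficients; the first entry is the intercept."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

afr_count, dose_afr, dose_eur, y = simulate_admixed_cohort(n=50_000)

# Pooled model: a single effect for the total minor-allele dosage.
pooled_beta = ols([dose_afr + dose_eur], y)[1]

# Local ancestry-aware model: ancestry count plus one dosage term per ancestry.
_, anc_term, est_beta_afr, est_beta_eur = ols([afr_count, dose_afr, dose_eur], y)

print(f"pooled effect estimate:       {pooled_beta:.3f}")
print(f"AFR-specific effect estimate: {est_beta_afr:.3f}")
print(f"EUR-specific effect estimate: {est_beta_eur:.3f}")
```

Under these assumptions the pooled estimate is diluted towards zero because alleles on EUR haplotypes carry no effect, whereas the ancestry-split model recovers an AFR-specific estimate near the simulated value and an EUR-specific estimate near zero.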
Extended Data Fig. 4 The interaction of between-ancestry MAF differences and effect sizes on Tractor power.
In all cases, the grey solid line uses a traditional GWAS model while the black dashed line is our LAI-incorporating model, admixture proportions are 80/20 AFR/EUR, disease prevalence is 10%, and the AFR MAF is fixed at 20%. A and E model the same effect size between EUR and AFR while varying the EUR MAF. B,D,F model the case when there is no effect in the EUR background while varying EUR MAF. C models an effect size difference of 30% with the effect being stronger in the EUR background. For comparison, Fig. 2f shows the same effect at matched 20% MAF.
Extended Data Fig. 5 The impact of LAI accuracy on Tractor’s performance as compared to standard GWAS and asaMap.
We modeled perfect accuracy, realistic accuracy as derived from simulations of our AA demographic model (98%), and a lower bound of 90% LAI accuracy. All black lines indicate Tractor runs: the solid black line is Tractor’s performance with perfect LAI accuracy, the dashed line is at 98% accuracy, and the dotted line is at 90% accuracy. The red line represents the power obtained from standard GWAS, and the blue line represents the power of the asaMap model for the ancestry in which the effect was modeled (AFR for A, B and C; EUR for D). In all cases we included 10 PCs as covariates and ran 1,000 replicates.
Supplementary information
Supplementary Information
Supplementary Discussion, Tables 1–3 and Figs. 1–7.
About this article
Cite this article
Atkinson, E.G., Maihofer, A.X., Kanai, M. et al. Tractor uses local ancestry to enable the inclusion of admixed individuals in GWAS and to boost power. Nat Genet 53, 195–204 (2021). https://doi.org/10.1038/s41588-020-00766-y
DOI: https://doi.org/10.1038/s41588-020-00766-y
- Springer Nature America, Inc.