Identification of Disease-Related Genes Using a Genome-Wide Association Study Approach

  • Protocol
Disease Gene Identification

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1706))

Abstract

Genome-wide association studies (GWAS) provide a hypothesis-free approach to discover genetic variants contributing to the risk of a certain disease or disease-related trait. Ongoing efforts to annotate the human genome have helped to localize disease-causing variants and point to mechanisms by which genetic variants might exert functional effects. By integrating bioinformatics approaches with in vivo and in vitro genomic strategies to predict and subsequently validate the functional roles of GWAS-identified variants, disease-related pathways can be characterized, providing new possibilities for therapeutic intervention. Here, we describe a basic workflow, from sample preparation to data analysis, for performing a GWAS to identify disease genes. We also discuss resources for the annotation and interpretation of GWAS results.

Acknowledgments

We would like to cordially thank Peter Kovacs, head of the research group Genetics of Obesity and Diabetes, and our colleagues for their everlasting scientific and personal support.

Funding: Tobias Wohland is funded by the IFB AdiposityDiseases (AD2-6E95). Dorit Schleinitz is funded by the Boehringer Ingelheim Foundation and by a Collaborative Research Center (C1, CRC1052).

Author information

Corresponding author

Correspondence to Dorit Schleinitz.

Copyright information

© 2018 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Wohland, T., Schleinitz, D. (2018). Identification of Disease-Related Genes Using a Genome-Wide Association Study Approach. In: DiStefano, J. (ed) Disease Gene Identification. Methods in Molecular Biology, vol 1706. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7471-9_7

  • DOI: https://doi.org/10.1007/978-1-4939-7471-9_7

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-7470-2

  • Online ISBN: 978-1-4939-7471-9

  • eBook Packages: Springer Protocols
