Abstract
The aim of expression quantitative trait locus (eQTL) mapping is to identify DNA sequence variants that explain variation in gene expression. Given the recent yield of trait-associated genetic variants identified by large-scale genome-wide association studies (GWAS), eQTL mapping has become a useful tool to understand the functional context in which these variants operate and, eventually, to narrow down functional gene targets for disease. Despite its extensive application to complex (polygenic) traits and disease, the majority of eQTL studies still rely on univariate data modeling strategies, i.e., testing each transcript-marker pair for association separately. However, these "one-at-a-time" strategies are (1) unable to control the number of false positives when an intricate linkage disequilibrium (LD) structure is present and (2) often underpowered to detect the full spectrum of trans-acting regulatory effects. Here we present our viewpoint on the most recent advances in eQTL mapping approaches, with a focus on Bayesian methodology. We review the advantages of the Bayesian approach over frequentist methods and provide an empirical example of polygenic eQTL mapping to illustrate the different properties of frequentist and Bayesian methods. Finally, we discuss how multivariate eQTL mapping approaches have distinctive features with respect to detection of polygenic effects, accuracy, and interpretability of the results.
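To make the univariate "one-at-a-time" strategy mentioned in the abstract concrete, the following is a minimal sketch in Python, not the authors' implementation. It tests every transcript-marker pair separately with a Pearson correlation. The simulated data, the names `pearson_r`, `approx_p_value`, and `univariate_scan`, the normal approximation to the t statistic, and the Bonferroni threshold are all assumptions made for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r (assumption: normal approximation to the t statistic)."""
    r = max(min(r, 0.999999), -0.999999)  # clamp to avoid division by zero
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))


def univariate_scan(genotypes, expression):
    """Test each (transcript, marker) pair on its own, ignoring all other markers."""
    n = len(next(iter(genotypes.values())))
    return {
        (gene, snp): approx_p_value(pearson_r(geno, expr), n)
        for gene, expr in expression.items()
        for snp, geno in genotypes.items()
    }


# Hypothetical simulated data: 5 markers coded 0/1/2 and 3 transcripts.
random.seed(1)
n_samples = 100
genotypes = {
    f"snp{j}": [random.choice([0, 1, 2]) for _ in range(n_samples)]
    for j in range(5)
}
expression = {
    f"gene{k}": [random.gauss(0, 1) for _ in range(n_samples)]
    for k in range(3)
}
# Plant a single strong association: gene0 depends on snp2.
expression["gene0"] = [2.0 * g + random.gauss(0, 0.5) for g in genotypes["snp2"]]

p_values = univariate_scan(genotypes, expression)
threshold = 0.05 / len(p_values)  # Bonferroni correction over all pairs tested
hits = sorted(pair for pair, p in p_values.items() if p < threshold)
print(hits)
```

Because each pair is tested in isolation, markers in strong LD with a causal variant would also tend to pass the threshold in real data, which is the limitation the abstract attributes to this strategy.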
Acknowledgments
We acknowledge funding from Medical Research Council Grant G 1002319 (L.B.), MR/M013138/1 (L.B.), MR/M004716/1 (M.I. and E.P.) and Duke-NUS Graduate Medical School Singapore (E.P.).
Copyright information
© 2017 Springer Science+Business Media New York
About this protocol
Cite this protocol
Imprialou, M., Petretto, E., Bottolo, L. (2017). Expression QTLs Mapping and Analysis: A Bayesian Perspective. In: Schughart, K., Williams, R. (eds) Systems Genetics. Methods in Molecular Biology, vol 1488. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6427-7_8
DOI: https://doi.org/10.1007/978-1-4939-6427-7_8
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6425-3
Online ISBN: 978-1-4939-6427-7
eBook Packages: Springer Protocols