Abstract
Immense observable diversity exists for all traits and across global populations. In the post-genomic era, with efficient sequencing and improved genotyping methods, we can more fully appreciate how the regulation of gene expression depends on an individual's genotype in coding and non-coding DNA. Identifying the genetic loci that contribute to quantifiable variation in gene expression is critical to improving our understanding of the biological regulation of complex traits. Expression quantitative trait loci (eQTL) mapping studies provide a powerful suite of techniques for genome-wide detection of these regulatory effects. However, a typical eQTL analysis relies on a large number of samples with many genetic variants to achieve adequate statistical power and significance. eQTL analysis therefore poses distinct computational and statistical challenges that require advanced methodological development to overcome. In recent years, many statistical and machine learning methods for eQTL analysis have been developed that can capture more complex relationships between genetic variation and gene expression. In this chapter, we provide a comprehensive review of these statistical and machine learning methods. We present various machine learning methods based on regularization terms, along with several other statistical analysis methods. Finally, we discuss prior knowledge integration and hyperparameter optimization.
Copyright information
© 2020 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Chen, J., Nodzak, C. (2020). Statistical and Machine Learning Methods for eQTL Analysis. In: Shi, X. (eds) eQTL Analysis. Methods in Molecular Biology, vol 2082. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0026-9_7
DOI: https://doi.org/10.1007/978-1-0716-0026-9_7
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-0025-2
Online ISBN: 978-1-0716-0026-9
eBook Packages: Springer Protocols