Abstract
Gene–environment (G × E) interactions have important implications for elucidating the genetic basis of complex diseases beyond the joint function of multiple genetic factors and their interactions (or epistasis). In the past, G × E interaction studies have mainly been conducted within the framework of genetic association studies. The high dimensionality of G × E interactions, due to the complicated form of environmental effects and the presence of a large number of genetic factors including gene expressions and SNPs, has motivated the recent development of penalized variable selection methods for dissecting G × E interactions, a development that has been ignored in the majority of published reviews on genetic interaction studies. In this article, we first survey existing studies on both gene–environment and gene–gene interactions. Then, after a brief introduction to variable selection methods, we review penalization and relevant variable selection methods in marginal and joint paradigms, respectively, under a variety of conceptual models. Strengths and limitations, as well as computational aspects of the variable selection methods tailored for G × E studies, are also discussed.
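To make the joint paradigm concrete, the following is a minimal sketch of penalized selection of main effects and G × E interactions. It is not a method taken from the chapter: it uses a plain lasso fitted by coordinate descent as a stand-in for the penalization approaches reviewed, the data are simulated, and the function names (`soft_threshold`, `lasso_cd`), the tuning value `lam=0.3`, and the effect sizes are all assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()            # residual y - Xb (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]           # remove column j's current contribution
            z = X[:, j] @ r / n
            b[j] = soft_threshold(z, lam) / col_sq[j]
            r -= X[:, j] * b[j]           # add the updated contribution back
    return b

# Simulated data (assumption): 20 genetic factors, one environmental factor.
rng = np.random.default_rng(0)
n, p = 200, 20
G = rng.normal(size=(n, p))
e = rng.normal(size=n)
# True signal: E main effect, G0 main effect, and a G1 x E interaction.
y = 1.0 * e + 2.0 * G[:, 0] + 1.5 * G[:, 1] * e + rng.normal(scale=0.5, size=n)

# Joint design: E, all G main effects, and all G x E products in one model.
X = np.column_stack([e, G, G * e[:, None]])
names = ["E"] + [f"G{j}" for j in range(p)] + [f"G{j}xE" for j in range(p)]
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize columns
y = y - y.mean()                          # center the response

b_hat = lasso_cd(X, y, lam=0.3)
selected = [names[j] for j in np.flatnonzero(b_hat)]
print(selected)
```

A plain lasso of this kind treats main effects and interactions as exchangeable columns, so it can select an interaction such as `G1xE` without the corresponding main effect. A marginal analysis would instead fit one genetic factor at a time together with the environmental factor.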
Acknowledgements
We thank the editor and reviewers for their invitation, careful review, and insightful comments, which led to a significant improvement of this article. This study has been partly supported by the National Institutes of Health (CA191383, CA204120), the VA Cooperative Studies Program of the Department of Veterans Affairs, Office of Research and Development, an innovative research award from KSU Johnson Cancer Research Center, and a KSU Faculty Enhancement Award.
© 2021 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Zhou, F., Ren, J., Lu, X., Ma, S., Wu, C. (2021). Gene–Environment Interaction: A Variable Selection Perspective. In: Wong, KC. (eds) Epistasis. Methods in Molecular Biology, vol 2212. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0947-7_13
DOI: https://doi.org/10.1007/978-1-0716-0947-7_13
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-0946-0
Online ISBN: 978-1-0716-0947-7
eBook Packages: Springer Protocols