Abstract
In biology, the term “epistasis” denotes the effect of the interaction between one gene and another. A gene can interact with an independently sorted gene, located far away on the same chromosome or on an entirely different chromosome, and this interaction can strongly affect the function of both genes. Such changes can in turn alter the outcome of biological processes and influence the organism’s phenotype. Machine learning is an area of computer science that develops statistical methods able to recognize patterns in data. A typical machine learning algorithm consists of a training phase, in which the model learns to recognize specific trends in the data, and a test phase, in which the trained model applies what it has learned to recognize trends in external data. Scientists have applied machine learning to epistasis problems multiple times, especially to identify gene–gene interactions from genome-wide association study (GWAS) data. In this brief survey, we report and describe the main scientific articles published on data mining and epistasis. Our survey confirms the effectiveness of machine learning in this genetics subfield.
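As a rough illustration of the training and test phases described in the abstract, the sketch below simulates a toy dataset and fits a simple genotype lookup table. It is not a method from any surveyed article. The binary markers, the XOR-style phenotype, the 10% label noise, and the function names `simulate`, `train`, and `accuracy` are all assumptions made for this example; real GWAS data typically encode each SNP as 0, 1, or 2 and involve far more markers.

```python
import random
from collections import Counter, defaultdict


def simulate(n, rng):
    """Simulate n individuals with two binary markers and a purely
    epistatic (XOR) phenotype: neither marker is informative alone.
    This is a toy assumption, not real genotype data."""
    data = []
    for _ in range(n):
        snp_a = rng.randint(0, 1)
        snp_b = rng.randint(0, 1)
        phenotype = snp_a ^ snp_b
        # Flip 10% of the labels to mimic noise (assumed rate).
        if rng.random() < 0.10:
            phenotype = 1 - phenotype
        data.append(((snp_a, snp_b), phenotype))
    return data


def train(samples, features):
    """Training phase: learn the majority phenotype for each genotype
    combination over the chosen marker indices."""
    counts = defaultdict(Counter)
    for genotype, phenotype in samples:
        key = tuple(genotype[i] for i in features)
        counts[key][phenotype] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def accuracy(model, samples, features):
    """Test phase: apply the learned table to held-out individuals
    and return the fraction classified correctly."""
    correct = 0
    for genotype, phenotype in samples:
        key = tuple(genotype[i] for i in features)
        correct += model.get(key, 0) == phenotype
    return correct / len(samples)


rng = random.Random(0)
data = simulate(2000, rng)
train_set, test_set = data[:1500], data[1500:]

# A model that sees one marker versus a model that sees the pair.
single = train(train_set, features=(0,))
pair = train(train_set, features=(0, 1))
acc_single = accuracy(single, test_set, (0,))
acc_pair = accuracy(pair, test_set, (0, 1))
print(f"single-locus accuracy: {acc_single:.2f}")
print(f"two-locus accuracy:    {acc_pair:.2f}")
```

Under these assumptions the single-marker model should perform near chance on the held-out individuals, while the model that sees both markers recovers the interaction. This is the kind of signal that gene–gene interaction methods aim to detect.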
Acknowledgements
The authors thank Ka Chun Wong (City University of Hong Kong) for having curated this book.
Copyright information
© 2021 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Chicco, D., Faultless, T. (2021). Brief Survey on Machine Learning in Epistasis. In: Wong, KC. (eds) Epistasis. Methods in Molecular Biology, vol 2212. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0947-7_11
DOI: https://doi.org/10.1007/978-1-0716-0947-7_11
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-0946-0
Online ISBN: 978-1-0716-0947-7
eBook Packages: Springer Protocols