Abstract
As databases of genome data continue to grow, so does our understanding of the functional elements of the genome. Many genetic changes have now been discovered and characterized, including both disease-causing mutations and neutral polymorphisms. Alongside experimental approaches that characterize specific variants, the past decade has seen intense bioinformatic research into the molecular effects of these genetic changes. These bioinformatic efforts, which complement genomic experimental assays, have focused on two general areas. First, researchers have annotated genetic variation data with molecular features that are likely to affect function. Second, statistical methods have been developed to predict which mutations are likely to have a molecular effect. This protocol reviews and describes methods for understanding the molecular function of single nucleotide polymorphisms (SNPs) and mutations. The intent of this chapter is to provide an introduction to online tools that are both easy to use and useful.
Acknowledgments
We gratefully acknowledge support from K22LM009135 (PI: Mooney), R01LM009722 (PI: Mooney), P01AG018397 (PI: Econs), U01GM061373 (PI: Flockhart), and the Indiana Genomics Initiative (INGEN), which is supported in part by the Lilly Endowment.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Mooney, S.D., Krishnan, V.G., Evani, U.S. (2010). Bioinformatic Tools for Identifying Disease Gene and SNP Candidates. In: Barnes, M., Breen, G. (eds) Genetic Variation. Methods in Molecular Biology, vol 628. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60327-367-1_17
DOI: https://doi.org/10.1007/978-1-60327-367-1_17
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-60327-366-4
Online ISBN: 978-1-60327-367-1
eBook Packages: Springer Protocols