Abstract
The sequences of many eukaryotic genomes are now available from a personal computer to any researcher in the worldwide scientific community. However, the sequences are worthless without adequate annotation of the biologically meaningful elements. The annotation of genes, in particular, is a challenging task that cannot be tackled without the aid of specific bioinformatics tools. In this chapter we present a simple protocol, based mainly on the combination of the program GeneID with other computational tools, for locating in the recently assembled genome of D. yakuba a gene that was previously annotated in D. melanogaster.
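As a rough illustration of what "annotating the location of a gene" amounts to in practice, the sketch below collapses exon-level predictions into one genomic span per gene. It assumes the predictions are available as tab-separated, GFF-style records (sequence, source, feature, start, end, score, strand, frame, group); the function name `gene_spans`, the contig name, and every coordinate in the sample are hypothetical and are not taken from the chapter or from real GeneID output.

```python
from collections import defaultdict

# Hypothetical GFF-style exon predictions (tab-separated). All names and
# coordinates are invented for illustration only.
SAMPLE_GFF = "\n".join([
    "# comment lines are skipped",
    "contig_demo\tpredictor\tFirst\t1200\t1350\t5.10\t+\t0\tgene_1",
    "contig_demo\tpredictor\tInternal\t1600\t1820\t8.40\t+\t0\tgene_1",
    "contig_demo\tpredictor\tTerminal\t2100\t2400\t6.00\t+\t2\tgene_1",
    "contig_demo\tpredictor\tSingle\t5000\t5600\t9.20\t-\t0\tgene_2",
])


def gene_spans(gff_text):
    """Group exon records by gene and return the overall span of each gene."""
    exons = defaultdict(list)
    for line in gff_text.splitlines():
        # Skip blank lines and comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete nine-column record
        seq, _source, _feature, start, end, _score, strand, _frame, group = fields[:9]
        exons[(seq, strand, group)].append((int(start), int(end)))

    spans = {}
    for (seq, strand, group), coords in exons.items():
        spans[group] = {
            "seq": seq,
            "strand": strand,
            "start": min(s for s, _ in coords),  # leftmost exon start
            "end": max(e for _, e in coords),    # rightmost exon end
            "exons": len(coords),
        }
    return spans


if __name__ == "__main__":
    for name, span in gene_spans(SAMPLE_GFF).items():
        print(name, span)
```

The span obtained this way is only a summary of a prediction; comparing it against the known gene in the reference species would still require the alignment steps of a full protocol.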
© 2009 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Blanco, E., Abril, J.F. (2009). Computational Gene Annotation in New Genome Assemblies Using GeneID. In: Posada, D. (eds) Bioinformatics for DNA Sequence Analysis. Methods in Molecular Biology, vol 537. Humana Press. https://doi.org/10.1007/978-1-59745-251-9_12
DOI: https://doi.org/10.1007/978-1-59745-251-9_12
Publisher Name: Humana Press
Print ISBN: 978-1-58829-910-9
Online ISBN: 978-1-59745-251-9
eBook Packages: Springer Protocols