Abstract
A common task in EST projects is the conversion of sequence chromatograms originating from gel-based or capillary sequencers into annotated sequence objects. Here we describe the use of a software pipeline (available from http://www.nematodes.org/bioinformatics/) that was developed to extract as much information as possible from EST datasets. The pipeline is modular, comprises a series of Perl scripts, and is targeted toward small- to medium-sized EST projects. Its design is based on our experience with EST projects for parasitic nematodes and other species. The trace2dbest module processes sequence trace files and prepares the text files necessary for submitting the sequences to the public repository dbEST. PartiGene provides facilities for clustering and assembling the ESTs into putative gene objects, or unigenes, and organizes the data in a relational database. Additional tools are available for annotation and for making the data accessible via the World Wide Web.
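The clustering step can be illustrated with a minimal sketch. It assumes that pairwise similarity between ESTs has already been established by some sequence comparison, and it groups reads by single linkage using a union-find structure. The function name `cluster_ests`, the EST identifiers, and the link list are hypothetical and chosen for illustration only. This is not PartiGene's actual algorithm, and it does not perform the assembly of each cluster into a consensus sequence.

```python
from collections import defaultdict


def cluster_ests(est_ids, similar_pairs):
    """Group ESTs into single-linkage clusters from pairwise similarity links.

    est_ids: iterable of EST identifiers (hypothetical names).
    similar_pairs: iterable of (id_a, id_b) pairs judged similar by an
        upstream comparison step that is assumed here, not implemented.
    """
    parent = {est: est for est in est_ids}

    def find(est):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[est] != est:
            parent[est] = parent[parent[est]]
            est = parent[est]
        return est

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Merge the two clusters under one representative.
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for est in est_ids:
        clusters[find(est)].append(est)

    # Sort members and clusters so the output order is deterministic.
    return sorted(sorted(members) for members in clusters.values())


# Illustrative input: five reads, three of which are linked transitively.
ests = ["EST1", "EST2", "EST3", "EST4", "EST5"]
links = [("EST1", "EST2"), ("EST2", "EST3"), ("EST4", "EST5")]
clusters = cluster_ests(ests, links)
```

In this sketch each resulting group stands in for one putative gene object. A read with no similarity links would form a cluster of its own, which corresponds to a singleton.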
Acknowledgements
The authors would like to thank all contributors and users of trace2dbest, PartiGene, and other tools of the Edinburgh EST pipeline, in particular Alasdair Anthony, John Parkinson, James Wasmuth, and Ann Hedley. Funding was in part from the NERC Environmental Genomics Thematic Data Program.
Copyright information
© 2009 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Schmid, R., Blaxter, M. (2009). EST Processing: From Trace to Sequence. In: Parkinson, J. (eds) Expressed Sequence Tags (ESTs). Methods in Molecular Biology, vol 533. Humana Press. https://doi.org/10.1007/978-1-60327-136-3_9
DOI: https://doi.org/10.1007/978-1-60327-136-3_9
Publisher Name: Humana Press
Print ISBN: 978-1-58829-759-4
Online ISBN: 978-1-60327-136-3
eBook Packages: Springer Protocols