Abstract
This paper describes the UIcluster software tool, which partitions Expressed Sequence Tag (EST) sequences and other genetic sequences into “clusters” based on sequence similarity. Ideally, each cluster contains only sequences that represent the same gene. A naive approach, such as an N×N comparison of every sequence against every other (where N is the number of input sequences), is feasible only for very small data sets. UIcluster has been developed over the course of four years to solve this problem efficiently and accurately for large data sets consisting of tens or hundreds of thousands of EST sequences. The latest version of the application has been parallelized using the MPI (Message Passing Interface) standard. Both the computation and the memory requirements of the program can be distributed among multiple UNIX processes, possibly running on different machines.
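To make the cost of the naive approach concrete, the sketch below clusters sequences by comparing every pair and counts the comparisons it performs. It is only an illustration of the all-pairs baseline, not UIcluster's algorithm: the function names, the shared k-mer similarity test, and the `k` and `min_shared` parameters are all assumptions introduced here. Comparing every pair takes N(N-1)/2 comparisons, which for N = 100,000 is roughly 5 × 10^9.

```python
from itertools import combinations


def shared_kmers(a, b, k=8):
    """Count distinct length-k substrings present in both sequences.

    Hypothetical similarity measure, used only for illustration.
    """
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(kmers_a & kmers_b)


def naive_cluster(seqs, k=8, min_shared=1):
    """Cluster by comparing all pairs; return (clusters, comparison count)."""
    parent = list(range(len(seqs)))  # union-find forest over sequence indices

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    comparisons = 0
    for i, j in combinations(range(len(seqs)), 2):
        comparisons += 1
        if shared_kmers(seqs[i], seqs[j], k) >= min_shared:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values()), comparisons
```

Calling `naive_cluster(["AAAACCCCGG", "CCCCGGTTTT", "GATTACAGAT"], k=4)` groups the first two sequences together and leaves the third alone, using three comparisons. The comparison count grows quadratically with the number of sequences, which is the scaling problem the abstract refers to.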
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pedretti, K., Scheetz, T., Braun, T., Roberts, C., Robinson, N., Casavant, T. (2001). A Parallel Expressed Sequence Tag (EST) Clustering Program. In: Malyshkin, V. (eds) Parallel Computing Technologies. PaCT 2001. Lecture Notes in Computer Science, vol 2127. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44743-1_51
DOI: https://doi.org/10.1007/3-540-44743-1_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42522-9
Online ISBN: 978-3-540-44743-6
eBook Packages: Springer Book Archive