Abstract
We present a framework for improving local protein alignment algorithms. Specifically, we discuss how to extend local protein aligners to use a collection of vector seeds [3] to reduce noise hits. We model picking a set of vector seeds as an integer programming problem, and give algorithms to choose such a set of seeds. A good set of vector seeds we have chosen allows four times fewer false positive hits, while preserving essentially identical sensitivity as BLASTP.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. Journal of Molecular Biology 215(3), 403–410 (1990)
Bairoch, A., Apweiler, R.: The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research 28(1), 45–48 (2000)
Brejova, B., Brown, D., Vinar, T.: Vector seeds: an extension to spaced seeds allows substantial improvements in sensitivity and specificity. In: Benson, G., Page, R.D.M. (eds.) WABI 2003. LNCS (LNBI), vol. 2812, pp. 39–54. Springer, Heidelberg (2003)
Brejova, B., Brown, D., Vinar, T.: Optimal spaced seeds for homologous coding regions. J. Bioinf. and Comp. Biol. 1, 595–610 (2004)
Buhler, J., Keich, U., Sun, Y.: Designing seeds for similarity search in genomic DNA. In: Proceedings of the 7th Annual International Conference on Computational Biology (RECOMB), pp. 67–75 (2003)
Choi, K.P., Zhang, L.: Sensitive analysis and efficient method for identifying optimal spaced seeds. J. Comp and Sys. Sci. 68, 22–40 (2004)
Hochbaum, D.: Approximating covering and packing problems. In: Hochbaum, D. (ed.) Approximation algorithms for NP-hard problems, pp. 94–143. PWS (1997)
Keich, U., Li, M., Ma, B., Tromp, J.: On spaced seeds for similarity search. Discrete Appl. Math. 138, 253–263 (2004)
Li, M., Ma, B., Kisman, D., Tromp, J.: Patternhunter II: Highly sensitive and fast homology search. Journal of Bioinformatics and Computational Biology (2004)
Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics 18(3), 440–445 (2002)
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981)
Sun, Y., Buhler, J.: Designing multiple simultaneous seeds for DNA similarity search. In: Proceedings of the 8th Annual International Conference on Computational Biology (RECOMB), pp. 76–84 (2004)
Xu, J., Brown, D., Li, M., Ma, B.: Optimizing multiple spaced seeds for homology search. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 47–58. Springer, Heidelberg (2004)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brown, D.G. (2004). Multiple Vector Seeds for Protein Alignment. In: Jonassen, I., Kim, J. (eds) Algorithms in Bioinformatics. WABI 2004. Lecture Notes in Computer Science(), vol 3240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30219-3_15
Download citation
DOI: https://doi.org/10.1007/978-3-540-30219-3_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23018-2
Online ISBN: 978-3-540-30219-3
eBook Packages: Springer Book Archive