Abstract
Optimized spaced seeds improve sensitivity and specificity in local homology search [1]. Recently, several authors [2-4] have shown that multiple seeds can have better sensitivity and specificity than single seeds. We describe a linear programming-based algorithm to optimize a set of seeds. Our algorithm offers a performance guarantee: the sensitivity of a chosen seed set is at least 70% of what can be achieved, in most reasonable models of homologous sequences. Our method achieves performance comparable to that of a greedy algorithm, but our work gives this area a mathematical foundation.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Ma, B., Tromp, J., Li, M.: Patternhunter: Faster and more sensitive homology search. Bioinformatics 18, 440–445 (2002)
Buhler, J., Keich, U., Sun, Y.: Designing seeds for similarity search in genomic DNA. In: Proceedings of the Seventh Annual International Conference on Computational Molecular Biology (RECOMB 2003), Berlin, Germany, April 2003, pp. 67–75 (2003)
Li, M., Ma, B., Kisman, D., Tromp, J.: Patternhunter II: Highly sensitive and fast homology search. Journal of Bioinformatics and Computational Biology (2004) (in Press)
Brejová, B., Brown, D.G., Vinař, T.: Vector seeds: An extension to spaced seeds allows substantial improvements in sensitivity and specificity. In: Benson, G., Page, R.D.M. (eds.) WABI 2003. LNCS (LNBI), vol. 2812, pp. 39–54. Springer, Heidelberg (2003)
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. Journal of Molecular Biology 215, 403–410 (1990)
Keich, U., Li, M., Ma, B., Tromp, J.: On spaced seeds for similarity search. Discrete Applied Mathematics (2004) (in Press)
Choi, K.P., Zhang, L.: Sensitivity analysis and efficient method for identifying optimal spaced seeds. Journal of Computer and System Sciences 68, 22–40 (2004)
Brejová, B., Brown, D.G., Vinař, T.: Optimal spaced seeds for hidden markov models, with application to homologous coding regions. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 42–54. Springer, Heidelberg (2003)
Smith, T.F., Waterman, M.S.: Identification common molecular subsequences. Journal of. Molecular Biology 147, 195–197 (1981)
Hochbaum, D.S.: Approximating covering and packing problems:set cover, vertex cover, independent set, and related problems. In: Hochbaum, D.S. (ed.) Approximation Algorithms for NP-hard Problems, ch. 3, pp. 135–137. PWS (1997)
Feige, U.: A threshold of ln n for approximating set cover. Journal of the ACM 45(4), 634–652 (1998)
Motwani, R., Raghavan, P.: Randomized Algorithm. Cambridge University Press, New York (1995)
Raghavan, P.: Probabilistic construction of deterministic algorithms: Approximating packing integer programs. Journal of Computer and System Sciences 37, 130–143 (1988)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xu, J., Brown, D.G., Li, M., Ma, B. (2004). Optimizing Multiple Spaced Seeds for Homology Search. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_4
Download citation
DOI: https://doi.org/10.1007/978-3-540-27801-6_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22341-2
Online ISBN: 978-3-540-27801-6
eBook Packages: Springer Book Archive