Monotone Scoring of Patterns with Mismatches

  • Conference paper
Algorithms in Bioinformatics (WABI 2004)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3240)

Abstract

We study the problem of extracting, from given source x and error threshold k, substrings of x that occur unusually often in x within k substitutions or mismatches. Specifically, we assume that the input textstring x of n characters is produced by an i.i.d. source, and design efficient methods for computing the probability and expected number of occurrences for substrings of x with (either exactly or up to) k mismatches. Two related schemes are presented. In the first one, an O(nk) time preprocessing of x is developed that supports the following subsequent queries: for any substring w of x arbitrarily specified as input, the probability of occurrence of w in x within (either exactly or up to) k mismatches is reported in O(k^2) time. In the second scheme, a length or length range is arbitrarily specified, and the above probabilities are computed for all substrings of x having length in that range, in overall O(nk) time. Further, monotonicity conditions are introduced and studied for probabilities and expected occurrences of a substring under unit increases in its length, allowed number of errors, or both. Over intervals of constant frequency count, these monotonicities translate to some of the scores in use, thereby reducing the size of tables at the outset and enhancing the process of discovery. These latter derivations extend to patterns with mismatches an analysis previously devoted to exact patterns.
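To make the quantities in the abstract concrete, the following is a minimal sketch of a direct dynamic program for a single word w under an i.i.d. source. It is not the paper's O(nk) preprocessing or its O(k^2) query scheme, which the abstract does not spell out; it is a plain O(|w| k) computation. The function names and the `probs` dictionary of symbol probabilities are assumptions made for illustration. The expected count uses linearity of expectation over the n - |w| + 1 starting positions.

```python
def mismatch_probabilities(w, probs, k):
    """Return [P(exactly 0 mismatches), ..., P(exactly k mismatches)]
    for a random string of length len(w) drawn from an i.i.d. source
    with symbol probabilities `probs` (hypothetical parameter name),
    compared position by position against the word w.
    """
    # row[e] = probability that the prefix processed so far
    # has exactly e mismatches against the same prefix of w.
    row = [1.0] + [0.0] * k
    for ch in w:
        p = probs[ch]              # probability the source emits ch (a match)
        new = [0.0] * (k + 1)
        for e in range(k + 1):
            new[e] = row[e] * p    # this position matches
            if e > 0:
                new[e] += row[e - 1] * (1.0 - p)  # this position mismatches
        row = new
    return row


def expected_occurrences(w, probs, k, n):
    """Expected number of positions in a random text of length n where
    w occurs with up to k mismatches (linearity of expectation)."""
    positions = n - len(w) + 1
    if positions <= 0:
        return 0.0
    return positions * sum(mismatch_probabilities(w, probs, k))
```

For example, with a uniform DNA source (`{"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}`), `mismatch_probabilities("ACGT", probs, 1)` gives the probabilities of exactly 0 and exactly 1 mismatch; summing the list gives the "up to k" probability. Because the terms are non-negative, the "up to k" probability can only grow as k increases, which is the simplest instance of the kind of monotonicity the abstract refers to.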

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Apostolico, A., Pizzi, C. (2004). Monotone Scoring of Patterns with Mismatches. In: Jonassen, I., Kim, J. (eds) Algorithms in Bioinformatics. WABI 2004. Lecture Notes in Computer Science, vol 3240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30219-3_8

  • DOI: https://doi.org/10.1007/978-3-540-30219-3_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23018-2

  • Online ISBN: 978-3-540-30219-3

  • eBook Packages: Springer Book Archive
