Matching among multiple random sequences

Naus, Joseph I.; Sheng, Ke-Ning

doi:10.1007/BF02459461

Published: May 1997

Volume 59, pages 483–496 (1997)

Bulletin of Mathematical Biology

Joseph I. Naus¹ & Ke-Ning Sheng²

Abstract

In searching for strong homologies between multiple nucleic acid or protein sequences, researchers commonly look at fixed-length segments common to the sequences. Such homologies form the foundation of segment-based algorithms for multiple alignment of protein sequences. The researcher uses settings of “unusualness of multiple matches” to calibrate the algorithms. In applications where a researcher has found a multiple matching word, statistical significance helps gauge the unusualness of the observed match. Previous approximations for the unusualness of multiple matches are based on large sample theory, and are sometimes quite inaccurate. Section 2 illustrates this inaccuracy, and provides accurate approximations for the probability of a common word in R out of R sequences. Section 3 generalizes the approximation to multiple matching in R out of S sequences. Section 4 describes a more complex approximation that incorporates exact probabilities and yields excellent accuracy; this approximation is useful for checking the simpler approximations over a range of values.
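
The quantity the abstract refers to, the probability that some fixed-length word is shared by R out of S random sequences, can be illustrated with a brute-force simulation. The sketch below is not the approximation developed in the paper. It is a minimal Monte Carlo estimate under assumptions that are not stated in the abstract: letters are independent and equally likely, sequences all have the same length n, and a word counts as matching wherever it occurs in each sequence (no alignment of positions is required). The function names and parameters (has_common_word, estimate_match_probability, n, m, r, s) are hypothetical and chosen only for this illustration.

```python
import random
from collections import Counter


def has_common_word(seqs, m, r):
    """Return True if some length-m word occurs in at least r of the sequences."""
    counts = Counter()
    for seq in seqs:
        # A set ensures each sequence contributes a given word at most once.
        words = {seq[i:i + m] for i in range(len(seq) - m + 1)}
        counts.update(words)
    return any(c >= r for c in counts.values())


def estimate_match_probability(n, m, r, s, alphabet="ACGT", trials=2000, seed=0):
    """Monte Carlo estimate of P(some length-m word is shared by >= r of s sequences).

    Assumption: letters are i.i.d. and uniform over `alphabet`; each of the
    s sequences has length n. Setting r == s gives the "R out of R" case.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seqs = ["".join(rng.choice(alphabet) for _ in range(n)) for _ in range(s)]
        hits += has_common_word(seqs, m, r)
    return hits / trials


if __name__ == "__main__":
    # Example: three DNA-like sequences of length 100, word length 6.
    print(estimate_match_probability(n=100, m=6, r=3, s=3))
    # Relaxed requirement: the word need only appear in 3 out of 5 sequences.
    print(estimate_match_probability(n=100, m=6, r=3, s=5))
```

A simulation of this kind is slow and noisy for small probabilities, which is the regime where significance levels matter; it is useful mainly as a rough check on a closed-form approximation.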

Author information

Authors and Affiliations

¹ Department of Statistics, Rutgers, The State University, Piscataway, NJ 08855, U.S.A. (Joseph I. Naus)
² Roberts Pharmaceutical Corporation, Eatontown, NJ 07724, U.S.A. (Ke-Ning Sheng)

About this article

Cite this article

Naus, J.I., Sheng, K.-N. Matching among multiple random sequences. Bull. Math. Biol. 59, 483–496 (1997). https://doi.org/10.1007/BF02459461

Received: 19 August 1996
Accepted: 21 October 1996
Issue Date: May 1997
DOI: https://doi.org/10.1007/BF02459461
