Similarity landscapes: A way to detect many structural and sequence motifs in both introns and exons

Hultner, Michael; Smith, Douglas W.; Wills, Christopher

doi:10.1007/BF00166165

Published: February 1994

Volume 38, pages 188–203 (1994)
Journal of Molecular Evolution

Abstract

When investigators undertake searches of DNA databases, they normally discard large numbers of alignments that demonstrate very weak resemblances to each other, retaining only those that show statistically significant levels of resemblance. We show here that a great deal of information can be extracted from these weak alignments by examining them en masse. This is done by building three-dimensional similarity landscapes from the alignments, landscapes that reveal whether an unusual number of individually nonsignificant alignments tend to match up to a particular region of the query sequence being searched. The power of the search is increased by the use of libraries consisting entirely of introns or of exons. We show that (1) similarity landscapes with a variety of features can be generated from both intron and exon libraries, using introns or exons as query sequences; (2) the landscape features are real and not a statistical artifact; (3) well-known protein motifs used as query sequences can generate various landscape features; and (4) there is some evidence for resemblances between short regions of sequence carried by introns and exons. One possible interpretation of these results is that both introns and exons may have been built up during their evolution from short regions of sequence that, as a result, are now widely distributed throughout eukaryotic genomes. Such an interpretation would imply that these short regions have common ancestry. Alternatively, the wide sharing of short pieces of DNA may reflect regions with particular structural properties that have arisen through convergent evolution. The similarity-landscape approach can be used to detect widespread structural and sequence motifs in the genome that less global searches might miss. It can also be used in conjunction with algorithms for detecting significant multiple alignments, by isolating promising subsets of the databases for more detailed examination.
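The abstract describes pooling many individually nonsignificant alignments and asking whether they pile up on one region of the query. The following Python sketch illustrates that idea under stated assumptions; it is not the authors' procedure, and the paper's axes and statistics may differ. Assumed here: each weak hit is given as a hypothetical `(start, end, score)` interval on the query, the three landscape axes are query position, score bin, and hit count, and a length-preserving random-placement permutation serves as the null model for checking that a peak is not a statistical artifact. The function names `similarity_landscape`, `coverage`, and `peak_positions` are hypothetical.

```python
import bisect
import random
from statistics import mean, pstdev

# Hypothetical input format: each weak (individually nonsignificant) hit is
# (start, end, score), a half-open interval [start, end) on the query.


def similarity_landscape(alignments, query_len, score_bins):
    """Return grid[b][pos]: number of hits in score bin b covering query pos.

    score_bins holds sorted lower bin edges; hits scoring below the first
    edge are ignored. The axes (query position x score bin x count) are an
    assumption for illustration.
    """
    grid = [[0] * query_len for _ in score_bins]
    for start, end, score in alignments:
        b = bisect.bisect_right(score_bins, score) - 1  # highest edge <= score
        if b < 0:
            continue
        for pos in range(max(0, start), min(end, query_len)):
            grid[b][pos] += 1
    return grid


def coverage(alignments, query_len):
    """Collapse the landscape over score: number of hits covering each position."""
    cov = [0] * query_len
    for start, end, _ in alignments:
        for pos in range(max(0, start), min(end, query_len)):
            cov[pos] += 1
    return cov


def peak_positions(alignments, query_len, n_perm=200, z_cut=3.0, seed=0):
    """Flag query positions covered by unusually many weak hits.

    Null model (an assumption): place every hit at a uniformly random start
    on the query, keeping its length, and recompute coverage n_perm times.
    A position is a peak if its observed coverage lies z_cut or more null
    standard deviations above the null mean.
    """
    rng = random.Random(seed)
    observed = coverage(alignments, query_len)
    null = [[] for _ in range(query_len)]
    for _ in range(n_perm):
        shuffled = []
        for start, end, score in alignments:
            length = min(end - start, query_len)
            new_start = rng.randrange(query_len - length + 1)
            shuffled.append((new_start, new_start + length, score))
        for pos, c in enumerate(coverage(shuffled, query_len)):
            null[pos].append(c)
    peaks = []
    for pos in range(query_len):
        mu, sd = mean(null[pos]), pstdev(null[pos])
        if sd > 0 and (observed[pos] - mu) / sd >= z_cut:
            peaks.append(pos)
    return peaks
```

For example, 30 weak hits stacked on positions 40 to 49 of a 200-nt query stand out against 20 hits spread evenly over positions 100 to 185. The permutation keeps each hit's length and the total number of hits, so only their positional clustering is being tested; positions whose null coverage never varies are skipped rather than assigned an infinite score.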

Author information

Michael Hultner
Present address: Department of Anatomy S-1334, School of Medicine, University of California, San Francisco, CA 94143, USA

Authors and Affiliations

Department of Biology 0116 and Center for Molecular Genetics, University of California, San Diego, La Jolla, CA 92093, USA
Michael Hultner, Douglas W. Smith & Christopher Wills

Additional information

Correspondence to: C. Wills

About this article

Cite this article

Hultner, M., Smith, D.W. & Wills, C. Similarity landscapes: A way to detect many structural and sequence motifs in both introns and exons. J Mol Evol 38, 188–203 (1994). https://doi.org/10.1007/BF00166165

Received: 22 July 1992
Revised: 27 January 1993
Accepted: 31 March 1993
Issue Date: February 1994
DOI: https://doi.org/10.1007/BF00166165
