Abstract
When investigators undertake searches of DNA databases, they normally discard large numbers of alignments that demonstrate very weak resemblances to each other, retaining only those that show statistically significant levels of resemblance. We show here that a great deal of information can be extracted from these weak alignments by examining them en masse. This is done by building three-dimensional similarity landscapes from the alignments, landscapes that reveal whether an unusual number of individually nonsignificant alignments tend to match up to a particular region of the query sequence being searched. The power of the search is increased by the use of libraries consisting entirely of introns or of exons. We show that (1) similarity landscapes with a variety of features can be generated from both intron and exon libraries, using introns or exons as query sequences; (2) the landscape features are real and not a statistical artifact; (3) well-known protein motifs used as query sequences can generate various landscape features; and (4) there is some evidence for resemblances between short regions of sequence carried by introns and exons. One possible interpretation of these results is that both introns and exons may have been built up during their evolution from short regions of sequence that as a result are now widely distributed throughout eukaryotic genomes. Such an interpretation would imply that these short regions have common ancestry. Alternatively, the wide sharing of short pieces of DNA may reflect regions with particular structural properties that have arisen through convergent evolution. The similarity-landscape approach can be used to detect such widespread structural motifs and sequence motifs in the genome that might be missed by less-global searches. It can also be used in conjunction with algorithms developed for detecting significant multiple alignments by isolating promising subsets of the databases that can be examined in more detail.
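The idea of pooling weak alignments into a landscape can be illustrated with a minimal sketch. The abstract does not specify the axes or the scoring used, so everything here is an assumption made for illustration: the `Alignment` record, the functions `build_landscape` and `position_totals`, and the choice of axes (query position, alignment-score bin, and number of alignments as the height). It is not the authors' implementation.

```python
from collections import namedtuple

# Hypothetical record for one weak local alignment against the query.
# query_start is inclusive, query_end is exclusive (0-based positions).
Alignment = namedtuple("Alignment", ["query_start", "query_end", "score"])


def build_landscape(query_length, alignments, score_bins):
    """Return grid[b][p]: number of alignments in score bin b covering position p.

    score_bins is a list of half-open (low, high) score intervals.
    Alignments whose score falls in no bin are ignored.
    """
    grid = [[0] * query_length for _ in score_bins]
    for aln in alignments:
        for b, (low, high) in enumerate(score_bins):
            if low <= aln.score < high:
                # Clip the alignment to the bounds of the query.
                start = max(aln.query_start, 0)
                end = min(aln.query_end, query_length)
                for p in range(start, end):
                    grid[b][p] += 1
                break
    return grid


def position_totals(grid):
    """Collapse the landscape over score bins: total alignments per query position."""
    return [sum(column) for column in zip(*grid)]


# Toy data: four individually weak alignments against a 10-residue query.
alignments = [
    Alignment(2, 6, 12),
    Alignment(3, 7, 14),
    Alignment(4, 8, 22),
    Alignment(0, 2, 11),
]
score_bins = [(10, 20), (20, 30)]

landscape = build_landscape(10, alignments, score_bins)
totals = position_totals(landscape)
```

In this toy example the totals peak at query positions 4 and 5, where three alignments overlap. A real analysis would also need a control, such as repeating the search with a shuffled query, to judge whether such a peak exceeds what chance would produce.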
Correspondence to: C. Wills
Cite this article
Hultner, M., Smith, D.W. & Wills, C. Similarity landscapes: A way to detect many structural and sequence motifs in both introns and exons. J Mol Evol 38, 188–203 (1994). https://doi.org/10.1007/BF00166165