Introduction

RNA aptamers targeting small molecules serve as useful model systems for the study of the evolution and biophysics of macromolecular binding interactions. Because of their small sizes, the structures of several such complexes have been determined to atomic resolution by NMR spectroscopy or X-ray crystallography (reviewed by Hermann and Patel 2000). Moreover, aptamers can be subjected to mutational and evolutionary pressures for which survival is based entirely on ligand binding, without the complicating effects of simultaneous selection pressures for bioactivity, thus allowing the relative contributions of each activity to be evaluated separately.

In vitro selection can be used to explore the topography of fitness landscapes for the sequence space surrounding a given RNA, either for its original activity or for acquisition of a new activity. Lehman et al. (2000) observed relatively smooth phenotypic progress (phenotypic buffering) for a population of ribozymes evolved in vitro to function in an altered ionic environment, despite their trajectory through a rugged genotypic landscape. Reselection for new activities can also suggest sequence and structural features contributing to ligand recognition. For example, aptamers to the amino acid citrulline were evolved through only three mutations to recognize the closely related amino acid arginine (Famulok 1994). In these aptamers, the overall folded structure is very similar for the arginine- and citrulline-binding sequences. These mutations not only introduced codons for arginine—touching off debate as to the origin of the genetic code (Yarus 2000; Knight and Landweber 2000; Ellington et al. 2000)—but were later shown by NMR to make close contact with the target (Burgstaller et al. 1995; Yang et al. 1996). In another example, dopamine-binding RNAs were evolved to recognize the related amino acid tyrosine (Mannironi et al. 2000). In contrast to the arginine/citrulline aptamers, many more nucleotide position changes were required to effect the specificity switch, resulting in apparently unrelated RNA structures. Both of these studies focused primarily on the end-products of the reselection, without addressing evolutionary intermediates. A methodical examination of a defined sequence space can reveal the local accessibility of new functions between related aptamers with distinct functions.

Theoretical and experimental analyses have offered complementary perspectives on neutral evolutionary models by demonstrating that sequences differing by only one or two positions (Hamming distance = 1 or 2) can be structurally equivalent, thereby forming neutral networks in sequence space. Schuster et al. (1994) showed that for the secondary structures more prevalently encountered among RNAs of a fixed length, the set of all structurally equivalent sequences will form a vast neutral network in sequence space. Completely unrelated regions of sequence space can thus be linked through maintenance of a shared folded structure. Evolutionarily, the accumulation of specific mutations permits migration to any point in sequence space along a neutral path for a common phenotype (shared folded structure in this case). A single sequence capable of two distinct functions is said to represent the intersection between two neutral networks. Because many RNA molecules can adopt multiple conformations, intersection sequences are abundant when the only phenotypic criterion is secondary structure. Schultes and Bartel (2000) recently revealed the existence of dual catalytic functions (albeit greatly reduced relative to the wild-type activities) for a single RNA sequence as an intersection point for neutral paths leading to two evolutionarily unrelated and functionally distinct ribozymes. As a general feature of RNA evolution, even if robust functional intersections in sequence space are rare, close approaches between neutral networks for differing functions may abound.

Here we describe the in vitro evolutionary conversion of flavin-binding aptamers to a diverse collection of GMP-binding sequences. The fitness landscapes between two FAD aptamers (differing at 4 of 35 positions) and between an FAD aptamer and a GMP aptamer (differing at 5 of 35 positions) are detailed by assessing ligand binding by all evolutionary intermediates. This work provides the first detailed examination of the complete intervening sequence space between related but functionally distinct aptamer RNAs. Our results suggest that RNA sequence space contains many unrelated solutions to the problem of GMP binding and that some GMP aptamers are very near in sequence space to neutral networks for flavin-binding function. Furthermore, structural analysis of closely related flavin-binding and GMP-binding aptamers reveals overall structural differences accompanying the switch in ligand recognition. These findings correlate well with neutral evolutionary theory (Kimura 1968, 1983) by demonstrating how a flavin-binding phenotype could be maintained in a rugged landscape through mutational drift along a neutral network until the appropriate few off-path mutations force a punctuated transition to the nearby GMP-binding phenotype.

Experimental

Materials

DNA oligos, including mutagenized single-stranded DNA pools, were synthesized by Integrated DNA Technologies (Coralville, IA). Radiolabeled nucleotides were obtained from ICN (Costa Mesa, CA), and other chemicals and affinity resins were obtained from Sigma (St. Louis, MO).

RNA Pools

The three mutagenized pools were synthesized as “top strands” and PCR amplified with the appropriate primers to yield the following sequences: Fmn, 5′-GCTTAATACGACTCACTATAGGGTTAAGTTAGATCGAATggcgtgtaggatatgcttcggcagaaggacacgccATACAGTCAGTCGAACCCATA-3′; Fam, 5′-GCTTAATACGACTCACTATAGGGTTAAGTTAGATCGAATtggtgaggcagtcgaaaggaagtgtaggttgcttcaccaATACAGTCAGTCGAACCCATA-3′; and Fad, 5′-GCTTAATACGACTCACTATAGGGTTAAGTTAGATCGAATgggcataaggtatttaattgagaagtttacaagaaagatgcaATACAGTCAGTCGAACCCATA-3′. Lowercase letters represent the original aptamer sequences (Fig. 1), which were mutagenized to 12% per position during synthesis (original nucleotide at a given position encountered in 88% of synthesized strands). Capital letters represent fixed 5′ and 3′ sequences used for binding primers 5F2 and 3F2, respectively, during amplification steps. The T7 promoter region is underlined. Fmn (35-nt parent) is based on sequence “35FMN-2” from a selection for FMN binding (Burgstaller and Famulok 1994); Fam (39-nt parent) is based on sequence “27FAD-1,” from a selection for FAD binding (Burgstaller and Famulok 1994); and Fad (42-nt parent) is based on sequence F8.27 from a selection for FAD binding (Roychowdhury-Saha et al. 2002).

Figure 1

Structures for flavin nucleotide cofactors and GMP, and sequences and proposed secondary structures of the three flavin-binding aptamers from which random pools were generated for FAD and GMP selections. A Flavin mononucleotide (boxed) and flavin-adenine dinucleotide and B guanosine mononucleotide (GMP) present similar hydrogen-bonding faces (upper right of each structure). C Fad (42 nt) is based on an isoalloxazine-specific aptamer (Roychowdhury-Saha et al. 2002), Fam (39 nt) is based on an aptamer selected for FAD binding (Burgstaller and Famulok 1994), and Fmn (35 nt) is based on an FMN-binding aptamer (Burgstaller and Famulok 1994).

In Vitro Selection

RNA was transcribed in vitro and purified by denaturing gel electrophoresis following standard procedures (Roychowdhury-Saha et al. 2002). Internally radiolabeled RNA from each of the three pools was combined equally for an estimated initial complexity of 10^14 unique sequences. From a simple combinatorial model, these pools should contain all possible variants containing up to nine mutations from the parental sequences, along with sparse sampling of higher-order mutations. Selections were performed by incubating 5 µM body-labeled RNA (in 100 µl binding buffer: 50 mM Bis-Tris, pH 6.4, 200 mM NaCl, 10 mM MgCl2) with 100 µl resin-immobilized FAD or GMP (linked through the ribose hydroxyls; Sigma, St. Louis, MO) at room temperature for 5 min with occasional mixing. Resin-nucleotide-RNA mixtures were loaded onto a column (1-ml syringe packed with glass wool) and washed six times with 100 µl binding buffer before eluting specifically bound species in six fractions with 100 µl binding buffer containing 5 mM FAD or GMP. The amount of RNA in each fraction was monitored using Cerenkov counting. Excess FAD or GMP was removed by filtering the RNA through Centricon YM-30 molecular weight cutoff membrane spin filters (Millipore, Bedford, MA) prior to reverse transcription and PCR amplification. RNA from the three pools was combined equally to a total of 2 nmol and loaded onto the resin during the first round of each selection. Five hundred picomoles of RNA was used for all subsequent rounds.
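The pool-coverage estimate can be illustrated with a short calculation. The sketch below is one possible "simple combinatorial model" and not necessarily the one used here: it assumes independent substitutions at 12% per position, equal likelihood of the three alternative nucleotides, and (as a simplification) the full estimated complexity assigned to each pool. The function names are hypothetical. It reports the largest number of substitutions for which any one specific variant is still expected at least once; the result shifts with the assumed pool size and parent length.

```python
from math import comb


def variants_at_distance(length: int, k: int) -> int:
    """Number of distinct sequences exactly k substitutions from a parent."""
    return comb(length, k) * 3 ** k


def expected_copies(length: int, k: int, rate: float, pool_size: float) -> float:
    """Expected copies of ONE specific k-substitution variant in the pool.

    Assumes each position mutates independently with probability `rate`
    and that the three alternative nucleotides are equally likely.
    """
    per_variant = (rate / 3) ** k * (1 - rate) ** (length - k)
    return pool_size * per_variant


def max_covered_distance(length: int, rate: float, pool_size: float) -> int:
    """Largest k for which each specific k-mutant is expected at least once."""
    k = 0
    # expected_copies falls monotonically with k, so stop at the first k below 1
    while k < length and expected_copies(length, k + 1, rate, pool_size) >= 1.0:
        k += 1
    return k


if __name__ == "__main__":
    # Parent lengths from the text; pool_size per parent is an assumption.
    for name, length in {"Fmn": 35, "Fam": 39, "Fad": 42}.items():
        k = max_covered_distance(length, rate=0.12, pool_size=1e14)
        print(name, length, k, variants_at_distance(length, k))
```

An expectation of one copy does not guarantee that every variant is physically present, so an estimate of this kind is best read as an order-of-magnitude guide to how deeply the pools sample the neighborhood of each parent.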

Sequencing and Analysis

PCR products from rounds 3 and 6 of the FAD selection, and from round 7 of the GMP selection, were cloned into plasmids for sequencing. All sequenced isolates from this work are available at http://aptos.chem.indiana.edu/dhburke/all_seqs.html. Sequence analysis was performed using manual alignment and the PileUp multiple alignment algorithm (default settings) in the Genetics Computer Group (GCG) software package (http://www.accelrys.com/products/gcg_wisconsin_package/).

Mutant Construction and Binding Assays

Mutant sequences were generated by PCR amplification from a common template (G7-24 or F6-26) using primers carrying the prescribed mutations. Fifty attomoles of single-stranded template DNA were mixed with 200 pmol of each primer and amplified for 30 cycles in 200-µl reactions. PCR products were transcribed in vitro, and the RNA was assayed for FAD or GMP binding as described above.

Structural Probing

In vitro-transcribed RNA was 5′ dephosphorylated using calf intestinal alkaline phosphatase (NEB) and end-labeled using T4 polynucleotide kinase (Epicentre) and γ-32P-ATP. RNA was then incubated at room temperature for 15 min with 0.0001 to 0.01 U V1 RNase (Ambion, Austin, TX) or for up to 1 h with 1 U S1 nuclease (Promega). Reactions were quenched with the addition of an equal volume 95% formamide loading dye and flash-frozen on dry ice. Digestion products were electrophoresed alongside size markers (not shown) and alkaline hydrolysis ladders on 12% denaturing (8 M urea) acrylamide gels. Gels were dried and exposed to phosphor screens for analysis using ImageQuant version 1.2 software (Macintosh).

Results

In Vitro Reselection of FAD Aptamers

Three RNA aptamers, designated Fad, Fam, and Fmn, previously selected to bind flavin nucleotide cofactor targets, were mutated to 12% per position to generate libraries of variants that saturate local sequence space in the vicinity of each parental aptamer (Fig. 1). Pool RNAs were transcribed separately, mixed equally for a combined total of 2 nmol, and applied to resin-immobilized FAD to initiate in vitro selection. FAD-binding activity returned in the second round of selection, with little further increase in pool retention for rounds 3 through 6 (Fig. 2, circles). Nucleotide sequences were determined for isolates from rounds 3 and 6. Differences in the nucleotide lengths of parental aptamer sequences facilitated assignment of the parent that gave rise to each selected isolate. Most isolates are derived from the Fam parent (34 of 45 isolates from round 3, 21 of 36 from round 6) and the Fmn parent (7 from round 3, 14 from round 6) (Table 1). The 81 total sequences from this reselection thus represent a diverse collection of flavin-binding RNAs derived from all three parent aptamers.
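The length-based parent assignment can be sketched as follows. This is an illustration only, not the procedure actually used: it takes the fixed flanking sequences from the pool designs, measures the region between them, and assigns the parent whose core length (35, 39, or 42 nt) is nearest. The function names are hypothetical, and ties are left unassigned.

```python
# Fixed regions flanking the mutagenized core in the transcripts (from the pool designs)
FIVE_PRIME = "GGGTTAAGTTAGATCGAAT"
THREE_PRIME = "ATACAGTCAGTCGAACCCATA"
PARENT_LENGTHS = {"Fmn": 35, "Fam": 39, "Fad": 42}


def core_region(isolate: str) -> str:
    """Return the sequence between the fixed 5' and 3' regions."""
    seq = isolate.upper().replace("U", "T")
    start = seq.find(FIVE_PRIME)
    end = seq.rfind(THREE_PRIME)
    if start == -1 or end == -1 or end < start + len(FIVE_PRIME):
        raise ValueError("fixed flanking sequences not found")
    return seq[start + len(FIVE_PRIME):end]


def assign_parent(isolate: str):
    """Assign the parent with the nearest core length; None if two parents tie."""
    n = len(core_region(isolate))
    ranked = sorted(PARENT_LENGTHS, key=lambda p: abs(PARENT_LENGTHS[p] - n))
    best, runner_up = ranked[0], ranked[1]
    if abs(PARENT_LENGTHS[best] - n) == abs(PARENT_LENGTHS[runner_up] - n):
        return None
    return best
```

A nearest-length rule can misassign an isolate that carries several insertions or deletions, so in practice an assignment of this kind would be confirmed by alignment against the parent sequence.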

Table 1 Distribution of selection isolates derived from three flavin-binding parent aptamers
Figure 2

Progress of the FAD aptamer and GMP aptamer selections. Percentage RNA recovered during each round of selection for FAD-binding (circles) and GMP-binding (squares) activities.

GMP-Binding Progeny of Flavin-Binding Parents

When the same mixture of starting libraries was applied to a GMP affinity resin, GMP-binding activity became well established after five rounds, with moderate improvement during two additional rounds (Fig. 2, squares). Among the 40 clones sequenced from round 7, none derive from the Fam aptamer, only 1 derives from the Fmn aptamer, and the rest derive from the Fad aptamer. Furthermore, in contrast to the five Fad aptamer-derived isolates from the FAD reselection and the set of sequences cloned from the Fad starting pool, all but one of the Fad aptamer-derived GMP selection isolates included insertions of one to three nucleotides in the region between the primer-binding sequences. The lone Fmn aptamer-derived sequence (G7-24) is of identical length to the Fmn parent sequence and differs at 7 of 35 positions (20% dissimilarity).

GMP selection isolates were assayed for recognition of both GMP and FAD. In all of the isolates assayed, acquisition of GMP recognition coincided with a loss of FAD recognition (Fig. 3). The loss of flavin-binding activity suggests that the selected GMP-binding aptamers are structurally unrelated to their respective FAD-binding parent. While there seems to be no intrinsic physicochemical reason why such radical restructuring should be required, this interpretation is consistent with the observation that the population of GMP-binding aptamers as a whole exhibits far more mutations than the reselected FAD-binding aptamers. Structural rearrangement is directly supported for the one case examined in detail (see below).

Figure 3

Activity assays for GMP selection isolates. Individual selection isolates were assayed for FAD and GMP binding (white bars and black bars, respectively) using the affinity column elution assay. Elution percentages for parent aptamers (left) are averaged values from four independent assays. Selection isolates were assayed once each. Nucleotide lengths between the fixed primer-binding sequences are listed beneath each clone. Transcripts for all assayed selection isolates include primer-binding regions. Parent aptamer sequences were assayed both with (columns) and without (squares) primer-binding regions. Isolate G7-24 (gray bar)—derived from the Fmn parent—is the object of analysis below.

A Rugged Adaptive Landscape Between an FAD Aptamer and a GMP Aptamer

The evolution from flavin-binding to GMP-binding described above is saltatory; the mutations observed in the final selected isolates were introduced simultaneously at the beginning of the selection. Alternatively, gradual evolution through the accumulation of point mutations requires relatively smooth pathways through the adaptive landscape for at least one of the activities to allow their close approach in sequence space. We therefore set out to evaluate binding of FAD and GMP for a number of intermediates in an evolutionary walk between an FAD aptamer and a GMP aptamer.

Isolates F6-26 and G7-24 are both derived from the Fmn parent aptamer, they contain no insertions or deletions, they differ at only 9 of 35 positions (26% dissimilarity), and they exhibit similar recoveries from their respective affinity resins. These circumstances singled out F6-26 and G7-24 as optimal candidates for further analysis. The original flavin-binding activity of the parental Fmn aptamer resides within its 35-nucleotide core. To assure that only the core aptamer portions were compared, F6-26 and G7-24 were synthesized without the primer-binding sequences used during their selections. The 35-nucleotide versions of these isolates retained ligand-binding activity at moderately reduced levels relative to their 75-nucleotide selection clones (22 vs. 47% elution for G7-24, 35 vs. 48% for F6-26). With nine mismatches separating them, 510 possible intermediate sequences (2^N − 2, where N = total mismatched positions) occupy the complete sequence space between aptamers F6-26 and G7-24. Rather than survey the entire set of possible mutants, we tested a series of intermediate sequences in search of an FAD-binding evolutionary intermediate more proximal to G7-24. Construct F26m4, which is four mutations from F6-26 and five mutations from G7-24, displayed an FAD binding profile similar to clone F6-26 and did not bind the GMP resin (Fig. 4). With only 30 intermediate sequences (2^5 − 2) occupying the complete sequence space between F26m4 and G7-24, F26m4 was designated as a more practical starting point for analysis of the sequence space leading to clone G7-24 (the exploration of sequence space between F6-26 and F26m4 is described below).
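The size of the intervening sequence space follows from letting each mismatched position carry the nucleotide of either endpoint and excluding the two endpoints themselves. A minimal sketch of that enumeration is given below; the function names are hypothetical, and the sequences in the usage note are placeholders rather than the actual isolates.

```python
from itertools import combinations


def mismatch_positions(a: str, b: str) -> list[int]:
    """Indices at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned and of equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def intermediates(a: str, b: str) -> list[str]:
    """All sequences strictly between a and b.

    Each mismatched position carries either a's or b's nucleotide;
    a and b themselves are excluded, giving 2**N - 2 sequences.
    """
    sites = mismatch_positions(a, b)
    found = []
    for k in range(1, len(sites)):  # k = number of positions switched to b
        for chosen in combinations(sites, k):
            seq = list(a)
            for i in chosen:
                seq[i] = b[i]
            found.append("".join(seq))
    return found
```

For two placeholder sequences that differ at five positions, for example `intermediates("AAAAAGG", "CCCCCGG")`, the list holds 30 entries; with nine mismatches it holds 510.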

Figure 4

Alignment and ligand-binding fitness of intermediate sequences between F6-26 and G7-24. A Dots indicate sequence identity to F6-26 for the 14 mutants between F6-26 and F26m4, and to F26m4 for the 30 mutants between F26m4 and G7-24. Nucleotide changes making an intermediate sequence more like F26m4 or G7-24 are indicated with text. Mutants are grouped according to their Hamming distances from G7-24 (numbers at right). FAD or GMP binding is indicated (F or G) to the right of active sequences. Isolates without such designations were inactive for both activities. Averaged results of FAD (gray bars) and GMP (black bars)-binding assays (B) for the 14 mutants between F6-26 and F26m4 and (C) for the 30 mutants between F26m4 and G7-24 are represented graphically. Error bars designate the deviation from the mean for two or more measurements per sequence. Hamming distances from G7-24 are shown below the mutant groups.

In the sequence space between F26m4 and G7-24, there exist 5 single mutants, 10 double mutants, 10 triple mutants, and 5 quadruple mutants. Affinity column assays were used to survey all 30 sequences (Fig. 4). The five single mutants are all active for FAD binding, albeit with varying degrees of decreased activity relative to F26m4. The double, triple, and quadruple mutants are all inactive for FAD binding. Moving from the other direction, only the single mutant G24B7 retains GMP-binding fitness comparable to that of G7-24. Thus, a minimum of three mutations separates the two activities.
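The class sizes and the three-mutation gap can be checked with a small model of the five mismatched sites. In the sketch below each sequence is a tuple of five bits (0 = F26m4 nucleotide, 1 = G7-24 nucleotide). The FAD-binding and GMP-binding sets encode only the outcomes stated above; which site distinguishes G24B7 from G7-24 is not specified here, so the choice made in the code is an assumption (it does not affect the result). The function names are hypothetical.

```python
from collections import Counter
from itertools import product

N_SITES = 5  # mismatched positions between F26m4 and G7-24


def hamming(u, v) -> int:
    """Number of positions at which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(u, v))


def class_sizes(n_sites: int = N_SITES) -> dict:
    """Count intermediates by number of mutations away from F26m4."""
    states = product((0, 1), repeat=n_sites)
    counts = Counter(sum(s) for s in states)
    return {k: counts[k] for k in range(1, n_sites)}  # endpoints excluded


def min_separation(fad_binders, gmp_binders) -> int:
    """Smallest Hamming distance between any FAD binder and any GMP binder."""
    return min(hamming(f, g) for f in fad_binders for g in gmp_binders)


all_states = list(product((0, 1), repeat=N_SITES))
# F26m4 and its five single mutants retain FAD binding
fad_binders = [s for s in all_states if sum(s) <= 1]
# G7-24 plus the single mutant G24B7 (reverted site chosen arbitrarily: assumption)
gmp_binders = [(1, 1, 1, 1, 1), (0, 1, 1, 1, 1)]

if __name__ == "__main__":
    print(class_sizes())
    print(min_separation(fad_binders, gmp_binders))
```

Under these assumptions the closest FAD binder and GMP binder are an F26m4 single mutant and G24B7, which differ at three of the five sites.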

Neutral Evolutionary Paths Between Related FAD Aptamers

Sequence F26m4 provided a convenient intermediate in our initial screen for FAD binders between isolates F6-26 and G7-24. In backtracking from F26m4 to F6-26, we surveyed the four possible single mutants, six possible double mutants, and four possible triple mutants. Despite similar FAD-binding activities for F6-26 and F26m4, only 4 of the 14 mutants in the intervening sequence space display comparable ligand-binding ability (Fig. 4). In this limited sequence space, FAD binding is either maintained at roughly equivalent levels or lost entirely, with no apparent middle ground. Uridine appears to be required at positions 12 and 14 unless both positions are changed simultaneously to adenine and the third nucleotide is simultaneously changed from A to C (mutant F26A11; Fig. 4A). The A-to-C mutation strengthens the stem by pairing with G33. Conservation of the uridines at positions 12 and 14 was observed among the originally selected Fmn aptamers (Burgstaller and Famulok 1994) and was also observed in all of our Fmn-derived FAD selection isolates (data not shown). As both of these positions are involved in hydrogen-bonding to other bases near the flavin-binding site (Fan et al. 1996), their conservation in these aptamers is not surprising. Specifically, U12 is part of a base triple that stacks directly upon the intercalated flavin, while U14 pairs with A23 to close the helix that flanks the flavin binding site. The U12A/U14A mutations have not been previously reported, and it is not immediately apparent how the binding site adjusts to accommodate these changes.

Structural Rearrangement Correlates with Acquisition of New Activity

Secondary structural probing of aptamers was performed using S1 and V1 nuclease digestion of 5′ end-labeled RNA. S1 nuclease preferentially cleaves unstructured regions, while V1 nuclease preferentially cleaves double-stranded regions. Digestion patterns for the FAD-binding aptamers show enhanced S1 cleavage at positions 1–3 and 17–19 and enhanced V1 cleavage at positions 4–7, 11–13, 16–17, 21–22, and 28–30 (Fig. 5A). This pattern is consistent with the secondary structure determined previously for the Fmn parent aptamer (Fig. 1) (Fan et al. 1996; Burgstaller and Famulok 1994). In contrast, the digestion patterns corresponding to G7-24 (Fig. 5) and G24B7 (not shown) are indicative of an entirely distinct folded structure. In these aptamers, S1 cleavage is enhanced at positions 11–14, 22–23, and 25–28, and V1 cleavage is enhanced at positions 12–14, 17–19, and 30–31. Acquisition of new ligand specificity in this case has become possible following the adoption of an entirely new secondary structure (Fig. 5B).

Figure 5

Structural probing of FAD and GMP aptamers. A Partial digestions of G7-24 and F26m4 RNAs with S1 and V1 nucleases reveal single- and double-stranded regions, respectively, in the aptamers. Alkaline hydrolysis ladders (AH) and undigested RNA (C) are used for orientation. B Proposed secondary structures for G7-24 and F26m4 corresponding to heavy (■) and light (□) digestion with S1 nuclease, and heavy (•) and light (○) digestion with V1 nuclease. Nucleotide numbering corresponds to alignment of each sequence with the Fmn parent in Fig. 1.

Discussion

Competitive Reselection of FAD Aptamers Among Three Regions of Sequence Space Rich in Flavin-Binding RNAs and the Evolutionary Diaspora from Flavin-Binding Aptamers to GMP-Binding Aptamers

The selections described above utilized mixed pools containing mutagenized variants from three different parental aptamers. By forcing these populations to compete against one another for survival, we were able to evaluate the relative flavin-binding fitness for three distinct regions of sequence space. The RNA pools were of similar lengths (75, 79, and 82 nucleotides), enabling a true functional competition without the preferential enrichment of shorter sequences that has been observed for selection from mixed pools with greatly differing lengths (Huang et al. 2000). Although our sequenced population is dominated by Fam-derived isolates, progeny from all three pools are detectable in both rounds surveyed. Furthermore, the average mutation levels in the randomized region of the selected isolates (~8, ~16, and ~15% per sequence for Fad, Fam, and Fmn sets, respectively) mirror those observed in the starting pools (~13, ~17, and ~14%). This concurrence suggests that the local sequence space surrounding each of the three parent flavin-binding aptamers is densely populated with FAD-binding sequences. Descendents of the Fad parent are underrepresented in the reselected populations, possibly as a result of unfavorable structural contributions from the appended primer-binding regions. When Fad RNA lacking the primer-binding sequences (see Experimental) was applied to the affinity resin under conditions used during the selection, 32% of the input RNA could be eluted, compared to only 5% of the input RNA when the primer-binding sequences were attached (Fmn and Fam improved from ~20 and ~18% with primer-binding sequences appended to ~35 and ~22%, respectively, when these regions were removed; Fig. 3). The Fad aptamer may thus have entered the FAD reselection at a competitive disadvantage relative to individuals from the Fam and Fmn pools.

While our reselection for FAD aptamers appeared to sample from all three starting pools, all but one of the GMP selection isolates derived from the Fad aptamer pool. The overall deviation from the Fad parent observed in these GMP-binding aptamers was of an average Hamming distance of 20 (or nearly 45% sequence dissimilarity). This value far exceeds the difference we observed between the original Fad aptamer sequence and randomly sampled individuals from the mutagenized pool (~13%) or between the Fad aptamer and the Fad-derived selection isolates from the FAD reselection (~8%). Such a result suggests that GMP-binding activity, sufficient to survive the selection conditions, is only accessible at great mutational distances from the three parental aptamers. Indeed, the extensive mutational diversity observed among the GMP selection isolates is indicative of a broad radiation (diaspora) into sequence space. Only four of our selected sequences were not unique; two of these appeared as pairs, one in triplicate, and one in quadruplicate.

Close Approaches and Intersections Along Neutral Walks to New Functions

Neutral networks for unrelated functions can make close approach in sequence space or even intersect at molecules with dual function. Theoretical models of evolutionary pathways for the formation of the tRNAPhe cloverleaf secondary structure generated using secondary structure-folding algorithms demonstrate accessibility to all common folds from any point in sequence space through relatively few mutations (Schuster et al. 1994). Correspondingly, a high level of accessibility to new phenotypes (new folded structures) is made possible through single- or double-mutational diffusion from points along neutral networks for any given fold (Huynen 1996). While these studies enable the survey of sequences in numbers far exceeding the practical limits of functional screening, they are as yet unable to incorporate realistic biochemical functions as selection criteria, such as the competence of these tRNA-like sequences for aminoacylation and their compatibility with the translational apparatus. The work presented here offers a glimpse at a functionally more realistic fitness landscape for molecular evolution by incorporating ligand binding as a selection criterion and suggests the presence of a neutral network for FAD-binding in close proximity to GMP-binding sequences within an otherwise rugged fitness landscape.

We speculate that there may be additional close approaches between FAD- and GMP-binding activities elsewhere in sequence space, as well as intersection sequences where the two activities reside in the same molecule. For example, Roychowdhury-Saha selected a pool of aptamers that could be eluted from an FAD resin with free GMP, suggesting dual recognition (unpublished). While most of our GMP selection isolates lie at great mutational distances from the three parent flavin aptamers, our systematic exploration of the evolutionary trajectory between the FAD aptamer F26m4 and the GMP aptamer G7-24 reveals a Hamming distance of only three mutations separating flavin binding from GMP binding in this region of sequence space.

Intersection sequences for other functions have been identified by Connell and Yarus (1994) in aptamers that bind both GMP and L-arginine (identified by alternating GMP and L-arginine affinity resins) and by White et al. (2001) in aptamers with dual specificity for human and porcine thrombin (identified by alternating the protein target during each round of selection). Pokrovskaya and Landweber described a modest ligase ribozyme that fortuitously also carried a small self-cleaving motif in an alternative conformation of the same active site. In this case, a switch in divalent metal ions controlled the direction of the reaction: Mn2+ for cleavage, and Mg2+ for ligation (Landweber and Pokrovskaya 1999). The work of Schultes and Bartel remains the only example of the evolutionary interconversion between two unrelated ribozymes positioned at great distances from each other along neutral networks that make close approach. While overlapping functions in a particular sequence are not always robust, our observations and the reports of other groups suggest that neutral paths through sequence space may overlap or make close approach for a very large number of functions.

Acquisition of New Functions Across Phenotypic Buffering Thresholds, Long Neutral Walks with Evolutionary Bridges, and Implications for Macromolecular Evolution

Theoretical analysis of secondary structures for evolving RNA populations has predicted extended periods of phenotypic stasis (or buffering) followed by punctuated incidents of structural transition despite continuous mutational pressure of uniform degree (Fontana and Schuster 1998). This scenario essentially describes neutral evolutionary theory, as developed by Kimura (1968, 1983) to explain the apparent phenotypic stasis observed in biological organisms in the face of seemingly overwhelming rates of random mutation. Applied to a set of in vitro-selected RNA aptamers, we show that neutral mutational drift (retention of FAD binding) can be maintained for some distance in sequence space away from the F6-26 sequence despite a highly rugged local fitness landscape. At one edge of this neutral network (two mutations along the evolutionary walk from F26m4 to G7-24), the population crosses a phenotypic buffering threshold, whereupon the FAD-binding secondary structure is destabilized. In this region of sequence space, the Hamming distance to GMP binding activity is short. Accumulation of additional mutations along this evolutionary path results in the preference for a new secondary structure, and the GMP-binding phenotype is realized.

RNA molecules with guanosine nucleotide-binding activity are widely distributed throughout sequence space. Our in vitro selection produced a tremendous diversity of sequences even among GMP-binding species derived from the Fad parent aptamer. GTP has also been a target for other aptamer selections (Davis and Szostak 2002), and guanosine nucleotides are even utilized for catalysis by the group I intron (Michel et al. 1989) and by in vitro-selected deoxyribozymes (Li and Breaker 1999). Each of these RNAs (or DNAs) is likely surrounded by a constellation of closely related sequences that comprise local neutral networks for guanosine binding. Indeed, neutral networks of varying sizes likely surround most functional RNAs. A simplified map of the functional landscape around F6-26 and G7-24 for FAD- and GMP-binding illustrates the proximity of these two activities in sequence space (Fig. 6). While we have not explored functions of relatives of the G7-24 sequence that are less similar to the flavin aptamers, it is likely that G7-24 and G24B7 lie on the edge of an extensive neutral network for GMP-binding sequences. Furthermore, neutral walks through regions of sequence space adapted to recognition of GMP or FAD may make close approach to, or even overlap with, many other functions, forming an evolutionary bridge between widely divergent sequences and functions. We propose that recognition of other targets could also be the basis for long walks through sequence space.

Figure 6

Hypothetical functional landscape for FAD-binding and GMP-binding phenotypes adjacent to surveyed sequence space. Sequence space is mapped in two dimensions onto a grid representing a small set of unique individuals within a landscape of all possible RNAs 35 nucleotides in length. Individual boxes symbolize unique sequences. Boxes bordering one another on a side or diagonally represent single mutational neighbors. FAD-binding (gray), GMP-binding (black), and inactive sequences (white) occupy the sequence space shown. Sequences surveyed in Fig. 4 are outlined. The surrounding, unexplored sequence space is anticipated to contain proximal neutral networks for FAD and GMP binding.