Introduction

A promoter is a DNA sequence at which RNA polymerase initiates transcription. Promoters are singularly important both in the expression of individual genes and in the global regulation of prokaryotic cell physiology. For example, during the rapid growth phase of E. coli, the σ70 RNA polymerase holoenzyme is the dominant RNA polymerase species, and it recognizes the promoters of a set of genes required for rapid growth. When cells enter stationary phase, the σS RNA polymerase holoenzyme, which recognizes the promoters of a set of genes for cell preservation, becomes predominant.

The presence of promoters raises two general questions: (1) what sequence composition constitutes a promoter and (2) how that sequence has evolved. A survey of native promoters from an organism shows that the relationship between promoter activity and sequence composition may be subtle. Historically, a bacterial promoter consensus was derived by compiling the sequences upstream of known genes (Harley and Reynolds 1987; Hawley and McClure 1983; Pribnow 1975; Raibaud and Schwartz 1984; Rosenberg and Court 1979; Seeburg et al. 1977). From a few hundred known E. coli σ70 promoters, the consensus was determined to be TTGACA for the −35 element and TATAAT for the −10 element, separated by about 17 base pairs. As information has accumulated, promoters deviating from the consensus type have been identified. Notably, in the extended −10 type of promoter, a TGn or TGTGn motif situated immediately upstream of the −10 element can substitute for a −35 element (Barne et al. 1997; Burr et al. 2000; Kumar et al. 1993). Furthermore, a weak −35 element can be compensated for by a UP element, an AT-rich sequence located between positions −38 and −60 (Estrem et al. 1998). In addition, when the core promoter is embedded within a complex promoter that requires transcription activators, its elements may deviate further from the consensus, which keeps the basal (activator-independent) activity low. For example, the well-known constitutive lac promoter mutant lacUV5 is a step closer to the consensus than the wild-type E. coli lac promoter, which requires an activator to be fully active.
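To make the element definitions above concrete, the sketch below scans a sequence for the best-matching σ70 −35/−10 pair. It is a minimal illustration under simplifying assumptions, not a published promoter-prediction method: each hexamer is scored by simple match counting against TTGACA and TATAAT (no position weight matrix), spacers of 15 to 19 bp around the typical 17 bp are allowed (an assumed range), and an extended −10 is flagged when a TG dinucleotide sits one base upstream of the −10 hexamer (the TGn motif). The function names are hypothetical.

```python
CONSENSUS_35 = "TTGACA"
CONSENSUS_10 = "TATAAT"


def count_matches(window: str, consensus: str) -> int:
    """Number of positions at which `window` agrees with `consensus`."""
    return sum(a == b for a, b in zip(window, consensus))


def best_sigma70_match(seq: str, spacers=range(15, 20)) -> dict:
    """Return the highest-scoring -35/-10 placement (score out of 12).

    Hypothetical helper: the 15-19 bp spacer range is an assumption, and
    ties are resolved in favour of the first placement found.  Sequences
    too short for any placement return {"score": -1}.
    """
    seq = seq.upper()
    best = {"score": -1}
    for spacer in spacers:
        for i in range(len(seq) - (12 + spacer) + 1):
            j = i + 6 + spacer  # start of the -10 hexamer
            score = count_matches(seq[i:i + 6], CONSENSUS_35)
            score += count_matches(seq[j:j + 6], CONSENSUS_10)
            if score > best["score"]:
                best = {
                    "score": score,
                    "minus35_start": i,
                    "minus10_start": j,
                    "spacer": spacer,
                    # TGn immediately upstream of the -10 hexamer
                    "extended_minus10": j >= 3 and seq[j - 3:j - 1] == "TG",
                }
    return best


if __name__ == "__main__":
    demo = "GC" + "TTGACA" + "CGCGCGCGCGCGCGCGC" + "TATAAT" + "GC"
    print(best_sigma70_match(demo))
```

Replacing the last three spacer bases of the demo sequence with TGA would set `extended_minus10` to True without changing the −35/−10 placement.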

The information gained by surveying native E. coli promoters, however, has some limitations. The native promoters of an organism evidently represent only a subset of possible promoters. For example, the consensus promoter is conspicuously absent among native E. coli promoters even though it is fully functional in vivo. Promoter sequences that appear different from the natural consensus have been found by selection from random sequence libraries (Horwitz and Loeb 1986; Horwitz and Loeb 1988; Oliphant and Struhl 1987, 1988). In a recent study that selected from a library randomized within the spacer region, a 10-bp AT-rich sequence was found to enhance the activity of −35-type promoters (Liu et al. 2004). Perhaps a skewed subset of functional σ70 promoters has been selected because of constraints of gene regulation, potential conflicts with other cellular functions such as overlapping specificity with σS RNA polymerase (Gaal et al. 2001), and, conceivably, particular events in evolutionary history. More significantly, surveying native promoters alone reveals little about how promoter sequences have evolved. What are the essential factors in evolution, and how do these factors influence it? What trajectory is a succession of sequences likely to follow? Experimental evolution may provide more direct answers to these questions. It does not retrace the exact course of evolution in nature; instead, it simulates it. By generating its own history under defined conditions and in replicate, an experiment may reveal the essential features of evolution.

In the experiments reported here, we study E. coli promoters by evolving them from a nonfunctional sequence. At the onset of each experiment, an arbitrarily chosen nonfunctional sequence is used to initiate evolution. To accelerate evolution, we use mutagenic PCR (also called error-prone PCR) (Cadwell and Joyce 1992) to generate variants. The mutagenized DNA is inserted into a designated promoter region upstream of the cat (chloramphenicol acetyltransferase) gene on a plasmid, which is then used to transform E. coli cells. The plasmid confers a degree of chloramphenicol resistance according to the promoter activity of the insert, so growing the transformants on agar medium containing chloramphenicol enriches functional promoter sequences. The plasmid DNA is then extracted and used for the next round of mutagenesis. The mutation frequency is tuned between 0.4% and 18% by adjusting the extent of amplification in mutagenic PCR, and the selection stringency is adjusted by varying the chloramphenicol concentration in the medium. We subject the promoter region to several cycles of mutagenesis and selection until the population consists mostly of active promoters. These evolved promoters are then sequenced, and their activities are assayed.
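The cycle described above (mutagenize, insert, select, recover, repeat) can be sketched as a toy simulation. This is a hypothetical model, not the experimental procedure: independent random substitutions at a fixed per-base rate stand in for mutagenic PCR, a consensus-match score stands in for chloramphenicol resistance, and keeping the top-scoring fraction stands in for plate selection. Deletions, recombination, and the biased Mn²⁺ mutation spectrum are ignored, and all names and parameter values are illustrative.

```python
import random

BASES = "ACGT"
MINUS35, MINUS10, SPACER = "TTGACA", "TATAAT", 17


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each base independently with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            base = rng.choice([b for b in BASES if b != base])
        out.append(base)
    return "".join(out)


def toy_activity(seq: str) -> int:
    """Hypothetical stand-in for promoter activity: best number of consensus
    matches (0-12) over all -35/-10 placements with a 17-bp spacer."""
    span = 6 + SPACER + 6
    best = 0
    for i in range(len(seq) - span + 1):
        box35, box10 = seq[i:i + 6], seq[i + 6 + SPACER:i + span]
        score = sum(a == b for a, b in zip(box35, MINUS35))
        score += sum(a == b for a, b in zip(box10, MINUS10))
        best = max(best, score)
    return best


def evolve(start: str, rate: float, pop_size: int, keep: float,
           cycles: int, seed: int = 0) -> list[str]:
    """Run `cycles` rounds of mutagenesis, selection, and regrowth."""
    rng = random.Random(seed)
    population = [start] * pop_size
    for _ in range(cycles):
        # Mutagenesis (stand-in for error-prone PCR on the pooled plasmids).
        mutants = [mutate(s, rate, rng) for s in population]
        # Selection (stand-in for growth on chloramphenicol plates).
        mutants.sort(key=toy_activity, reverse=True)
        survivors = mutants[:max(1, int(keep * pop_size))]
        # Regrowth: survivors are resampled back to the original size.
        population = [rng.choice(survivors) for _ in range(pop_size)]
    return population


if __name__ == "__main__":
    start = "GCATGCGCCGATCGGCTAGCCGCGGATCCGCGCGGCC"  # arbitrary 37-bp start
    final = evolve(start, rate=0.02, pop_size=500, keep=0.05, cycles=3)
    print(toy_activity(start), max(toy_activity(s) for s in final))
```

Because a large share of clones escape mutation at a few percent per base, the selected fraction never scores below the starting sequence in this toy model; it says nothing quantitative about the real fitness landscape.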

We primarily vary the mutation frequency and observe how it influences the speed of evolution and the diversity of the evolved population. Given the nature of the selection condition, it is not surprising that nearly all of the sequences are recognizable σ70 promoters; this result, in turn, indicates that the experimental procedure works as intended. Furthermore, analogous to the high frequency of extended −10 promoters found in nature, a large portion of the experimentally evolved promoters belongs to the extended −10 type. Interestingly, some dynamic properties observed in the experiments are unexpected. Promoter solutions emerge and populate the sequence pool very quickly, within just a few evolution cycles. The population converges into a small number of groups even at an extremely high mutation frequency. Short deletions, which occur rarely in PCR mutagenesis, are frequently found to bring the −35 and −10 elements closer to the optimal 17-bp spacing. These and other experimental observations may help us understand the process of evolution in nature.

Materials and Methods

Cell Culture and Plasmid Preparation

A homologous-recombination-deficient E. coli strain, TOP10 (Invitrogen, Carlsbad, CA), is used throughout the experiments. Cells are grown in 2xYT broth or on 2xYT agar (Q-Biogene, Carlsbad, CA) supplemented with antibiotics. Nonselection medium (nonselective with respect to promoter function) contains 50 μg/ml kanamycin. Selection medium contains 10 μg/ml kanamycin and various concentrations of chloramphenicol. For the selection process, 25 ml of agar medium is poured per petri dish. For the promoter activity assay, a rectangular plate (Omniplate, Nunc brand) containing 30 ml of agar medium is used. All cell cultures are grown at 37°C; liquid cultures are grown with vigorous shaking, and agar plates are incubated at 95% relative humidity.

Plasmid DNA is prepared from 3 ml of saturated cell culture with a QIAprep column kit (Qiagen, Valencia, CA). DNA samples are routinely stored in TlowE buffer (10 mM Tris–Cl, 0.1 mM EDTA, pH 8.0 to 8.5, at room temperature).

PCR and PCR Mutagenesis

A regular Taq polymerase (Roche, Indianapolis, IN) is used in PCR to amplify the promoter region, and Z-Taq (Takara, Madison, WI) is used to amplify and linearize the selection plasmid DNA. Except in Experiment 1, the PCR primer sequences for the promoter region are 5′AGTGCAAGUGCAGCUAGAGACAGCAGACCG3′ (Up) and 5′ATGGUGGCAGGUACCTATAUCTCCTACGAGAA3′ (Down). In Experiment 1, the Up primer is instead 5′AGTGCAAGUGCAGCUAGAGACAGCAGA3′. The primer sequences for the linear plasmid are 5′AGCTGCACUTGCACUGGGGACA3′ (vector<lic) and 5′ATAUAGGTACCUGCCACCAUGGAGAAA3′ (lic>vector). In these primers, some T's are replaced with U's for ligation-independent cloning. Regular PCR solution contains 200 mM Tris–Cl, pH 8.4, 2.5 mM MgCl2, 0.2 mM each of the four dNTPs, 0.25 μM each of the primers, and 1.5 units of Taq polymerase in 50 μl. The thermal cycle consists of three steps: (1) denaturation at 94°C for 6 s, (2) annealing for 6 s at 6°C below the lower melting temperature of the two primers, and (3) elongation at 74°C for 6 s for the promoter region or 40 s for the linear plasmid. The total number of cycles varies.
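The annealing rule above (6°C below the lower primer melting temperature) can be written as a small calculation. The text does not state how Tm was computed; this sketch assumes the basic GC-content approximation Tm = 64.9 + 41(nGC − 16.4)/N and treats U as T. Under that assumption, the Down primer has the lower Tm (about 63°C), giving an annealing temperature of about 57°C; a nearest-neighbor model would give different values. The function names are hypothetical.

```python
def basic_tm(primer: str) -> float:
    """Approximate Tm (deg C) from GC content; an assumed formula, not the authors'."""
    seq = primer.upper().replace("U", "T")  # U pairs like T
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(seq)


def annealing_temp(primer_a: str, primer_b: str, offset: float = 6.0) -> float:
    """Annealing temperature: `offset` deg C below the lower of the two primer Tms."""
    return min(basic_tm(primer_a), basic_tm(primer_b)) - offset


UP = "AGTGCAAGUGCAGCUAGAGACAGCAGACCG"      # Up primer (all experiments except 1)
DOWN = "ATGGUGGCAGGUACCTATAUCTCCTACGAGAA"  # Down primer

if __name__ == "__main__":
    print(f"Up {basic_tm(UP):.1f}, Down {basic_tm(DOWN):.1f}, "
          f"anneal {annealing_temp(UP, DOWN):.1f}")
```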

Mutagenic PCR follows the MnCl2 method of Cadwell and Joyce (1992) with slight modifications. Taq polymerase (Roche) is used, and the reaction solution contains 200 mM Tris–Cl, pH 8.4, 7 mM MgCl2, 0.2 mM dATP, 0.2 mM dCTP, 0.2 mM TTP, 0.7 mM dGTP, 0.25 μM each of the primers, 1.5 units of polymerase in 50 μl, and MnCl2 at an empirically determined concentration. MnCl2 is first prepared as a 30 mM stock solution in 100 mM HCl and stored at −20°C. Because too much MnCl2 inhibits PCR, the maximum permissible MnCl2 concentration is determined with several mutagenic PCR test reactions containing between 0.16 and 0.8 mM MnCl2. One half of this maximum, usually around 0.3 mM, is used in the mutagenic PCR.
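The titration rule above (use half of the highest MnCl2 concentration that still permits amplification) and the dilution from the 30 mM stock reduce to simple arithmetic. This sketch assumes each test reaction is scored simply as amplified or not; the titration outcomes in the example are hypothetical, not data from these experiments, and the function names are illustrative.

```python
def working_mncl2(titration: dict[float, bool]) -> float:
    """Half of the highest MnCl2 concentration (mM) that still gave a PCR product."""
    permissive = [conc for conc, amplified in titration.items() if amplified]
    if not permissive:
        raise ValueError("no test reaction amplified; lower the MnCl2 range")
    return max(permissive) / 2


def stock_volume_ul(final_mM: float, reaction_ul: float = 50.0,
                    stock_mM: float = 30.0) -> float:
    """Volume of stock to add, from C1 * V1 = C2 * V2."""
    return final_mM * reaction_ul / stock_mM


if __name__ == "__main__":
    # Hypothetical titration outcomes (True = product band seen).
    tests = {0.16: True, 0.32: True, 0.48: True, 0.64: True, 0.8: False}
    mn = working_mncl2(tests)
    print(f"{mn:.2f} mM MnCl2 -> {stock_volume_ul(mn):.2f} ul of 30 mM stock")
```

At the typical working concentration of 0.3 mM, this corresponds to 0.5 μl of the 30 mM stock per 50-μl reaction.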

Mutagenic PCR produces mostly heteroduplex DNA. Before cloning, a one-step extension is applied to convert the heteroduplexes to homoduplexes. In this step, 25 μl of mutagenic PCR product is mixed with 75 μl of regular PCR solution containing 1 μM each of the two primers. The reaction, in a 0.2-ml thin-wall tube, is heated to 92°C for several seconds to denature the DNA, and the tube is then inserted into a metal block to cool quickly to 50°C. The reaction is held at 50°C for 15 s and then at 74°C for 60 s. After the reaction cools to room temperature (22°C), 1.5 units of Klenow fragment of E. coli DNA polymerase I (New England Biolabs, Beverly, MA) are added to treat the DNA for 10 min. PCR products are routinely examined by 4% agarose gel electrophoresis. If an extra band appears, the band of the proper length is excised from the gel and purified with a kit (Zymo Research, Orange, CA).

Ligation Independent Cloning

The PCR-generated insert and vector fragments overlap at their ends (Fig. 1). The PCR primers contain two to three uracil (U) bases in place of thymine (T) within the overlap regions. A combined enzymatic digestion with UDG and Endo IV is used to sever the DNA backbone near the U's and thus expose the complementary strand as a long single-stranded DNA “sticky end” (Berninger 1993; Rashtchian and Berninger 1992). The digestion buffer contains 50 mM Tris–Cl, pH 7.9, and 50 mM KCl. One unit of UDG, 2 units of Endo IV (both from Epicentre, Madison, WI), 0.1 pmol of insert, and 0.1 pmol of vector DNA are added to 20 μl of reaction buffer. Digestion proceeds at 37°C for 1 h and then at room temperature for 10 min, after which the mixture is added to competent cells for transformation.
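Whether an insert and a vector fragment can anneal after UDG/Endo IV digestion depends on the 5′ end of each insert primer matching the reverse complement of the 5′ end of the corresponding vector primer. The sketch below checks this for the primer pairs listed under PCR and PCR Mutagenesis, treating U as T. For those primers it finds a 15-nt overlap between Up and vector<lic and a 20-nt overlap between Down and lic>vector; in each case the overlap ends exactly at the 3′-most U of both primers, consistent with the U's being placed within the overlap regions. The helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def as_dna(primer: str) -> str:
    """Treat U as T so that uracil-containing primers compare as DNA."""
    return primer.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    return as_dna(seq).translate(COMPLEMENT)[::-1]


def lic_overlap(insert_primer: str, vector_primer: str) -> int:
    """Length of the longest prefix of the insert primer that equals the
    3' end of the vector primer's reverse complement (0 if none)."""
    ins = as_dna(insert_primer)
    rc_vec = reverse_complement(vector_primer)
    for k in range(min(len(ins), len(rc_vec)), 0, -1):
        if ins[:k] == rc_vec[-k:]:
            return k
    return 0


def last_u_position(primer: str) -> int:
    """1-based position of the 3'-most U (0 if the primer has none)."""
    return primer.upper().rfind("U") + 1


UP = "AGTGCAAGUGCAGCUAGAGACAGCAGACCG"
DOWN = "ATGGUGGCAGGUACCTATAUCTCCTACGAGAA"
VECTOR_LIC = "AGCTGCACUTGCACUGGGGACA"       # vector<lic
LIC_VECTOR = "ATAUAGGTACCUGCCACCAUGGAGAAA"  # lic>vector

if __name__ == "__main__":
    for ins, vec in [(UP, VECTOR_LIC), (DOWN, LIC_VECTOR)]:
        print(lic_overlap(ins, vec), last_u_position(ins), last_u_position(vec))
```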

Figure 1

Schematic diagram of the selection plasmid pCatKp. The Up primer sequence shown is the one used in Experiment 1; in all other experiments, the Up primer carries three additional nucleotides, 5′CCG3′, at its 3′ end.

Promoter Sequencing

The promoter region is amplified by PCR using 0.2 μl of the liquid sample culture as the template DNA source in a 20-μl reaction. The PCR product is purified by ultrafiltration (Millipore, Billerica, MA), eluted in 20 μl of H2O, and labeled with DYEnamic ET terminator sequencing mix (Amersham, Piscataway, NJ). After purification by ethanol precipitation, the DNA sample is dissolved in 30 μl of H2O and analyzed on an ABI 310 (Applied Biosystems, Foster City, CA). Each sequenced sample is given an identification code of three or four fields separated by dots, for example, 2.1.8.3×. The first field is the number of the experiment to which the sample belongs, the second is the number of the evolution cycle, and the third is an arbitrary serial number. The fourth, assigned only to some samples, is a suffix marking the promoter activity, i.e., the chloramphenicol resistance level of the transformants. For example, the code “2.1.8.3×” denotes the 8th sample collected from the 1st cycle of Experiment 2, with a promoter activity of 3×.
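The identification scheme above is regular enough to parse mechanically, which can help when tabulating sequences by experiment and cycle. This is a minimal sketch with hypothetical names. The text shows only integer activity suffixes such as "3×"; splitting at most three times is an assumption that would also keep a decimal suffix such as "0.3×" intact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SampleId:
    experiment: int
    cycle: int
    serial: int
    activity: Optional[str] = None  # e.g. "3×"; absent for many samples


def parse_sample_id(code: str) -> SampleId:
    """Parse codes such as "2.1.8.3×" or "3.6.73".

    Splitting at most three times keeps a decimal activity suffix (an
    assumed form such as "0.3×") intact in the fourth field.
    """
    parts = code.strip().split(".", 3)
    if len(parts) < 3:
        raise ValueError(f"expected at least three fields: {code!r}")
    experiment, cycle, serial = (int(p) for p in parts[:3])
    activity = parts[3] if len(parts) == 4 else None
    return SampleId(experiment, cycle, serial, activity)


if __name__ == "__main__":
    for code in ["2.1.8.3×", "3.6.73", "4.3.17"]:
        print(parse_sample_id(code))
```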

Results

Molecular evolution has three components: mutation, selection, and replication. In the promoter evolution experiments, mutation is introduced in vitro by mutagenic PCR, selection occurs in vivo through the chloramphenicol resistance conferred by promoter activity, and replication occurs both in mutagenic PCR and during selection.

There are several significant parameters in evolution experiments: mutation frequency, selection stringency, clonal size (i.e., population size, the number of independent transformants), and the number of evolution cycles. Five experiments are reported here. The primary parameter we vary across the experiments is mutation frequency, which ranges from 0.4% to 18%. The clonal size, obtained by counting colonies after dilution plating on nonselection agar, is on the order of 10⁵ for the first cycle; for subsequent cycles, it may be as low as about 10⁴. Mutation and selection are described in detail below.

In preliminary studies, we noticed that a very high level of cat expression inhibits cell growth. Consequently, we set the highest chloramphenicol concentration at 330 μg/ml. In most cases we terminate the experiment after three evolution cycles, at which point the population consists mostly of strong promoters.

Sample clones are collected both before and after selection in each evolution cycle. We evaluate the progress of evolution by assaying the promoter activity of the sample clones and by sequencing their promoter regions. At each sampling point along the evolution course, 96 colonies are collected for the promoter activity assay, and a subset of them is sequenced.

Mutagenesis

PCR mutagenesis can provide a very high mutation frequency within the designated promoter region. To control the overall amplification and, in turn, the mutation frequency, we dilute the template to a desired concentration and let PCR proceed to saturation. For example, for 100-fold amplification, the initial template concentration is 1/100 of the product concentration at saturation. The amplification required to achieve a given mutation frequency is estimated empirically for each experiment, and the mutation frequency actually achieved is monitored by DNA sequencing.
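The dilution scheme above links fold amplification to mutation frequency. A common approximation for error-prone PCR, assumed here rather than stated in the text, is that the mutation frequency is roughly the per-base error rate per doubling multiplied by the number of doublings, log2(fold). Applied to Experiment 3 (13 mutations in 3198 bp, about 0.41%, after 10-fold amplification, i.e., about 3.3 doublings), it implies roughly 0.12% per base per doubling. The approximation ignores saturation effects and differences in reaction conditions between experiments; the function names are hypothetical.

```python
import math


def template_fraction(fold: float) -> float:
    """Starting template as a fraction of the saturating product amount."""
    return 1.0 / fold


def effective_doublings(fold: float) -> float:
    """Number of template doublings implied by an overall fold amplification."""
    return math.log2(fold)


def per_doubling_error_rate(mutations: int, bases_read: int, fold: float) -> float:
    """Error rate per base per doubling, assuming frequency ~ rate x doublings."""
    frequency = mutations / bases_read
    return frequency / effective_doublings(fold)


if __name__ == "__main__":
    # Experiment 3: 13 mutations in 3198 bp after ~10-fold amplification.
    print(f"template fraction {template_fraction(10):.2f}")
    print(f"frequency {13 / 3198:.3%}, doublings {effective_doublings(10):.2f}")
    print(f"~{per_doubling_error_rate(13, 3198, 10):.3%} per base per doubling")
```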

Among several methods of mutagenic PCR, we use the one with Mn²⁺ (Cadwell and Joyce 1992), which has a known biased mutation spectrum. Using the initial sequence of Experiment 3 (see below) as the template, 132 point mutations are identified from the sequencing results of five separate mutagenic PCRs. The occurrences of each type are: AT→GC, 93; AT→TA, 10; AT→CG, 0; CG→TA, 13; CG→GC, 1; CG→AT, 2; deletion, 3; and insertion, 0. The biased spectrum can be only partially attributed to the higher AT content of the initial template, which has 23 AT pairs and 18 GC pairs.
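Normalizing the substitution counts above by template composition shows why AT content explains the bias only in part. In this short sketch, deletions are left out, and the grouping into transitions and transversions is an added, standard classification rather than one made in the text. Substitutions at AT pairs come to about 4.5 per AT pair versus about 0.9 per GC pair, roughly a fivefold difference per pair, and transitions (106) far outnumber transversions (13).

```python
# Substitution counts from the text (template: initial sequence of Experiment 3).
SPECTRUM = {
    ("AT", "GC"): 93, ("AT", "TA"): 10, ("AT", "CG"): 0,
    ("CG", "TA"): 13, ("CG", "GC"): 1, ("CG", "AT"): 2,
}
PAIRS_IN_TEMPLATE = {"AT": 23, "CG": 18}  # 23 AT and 18 GC pairs
TRANSITIONS = {("AT", "GC"), ("CG", "TA")}  # purine<->purine / pyrimidine<->pyrimidine


def mutations_per_pair(source: str) -> float:
    """Substitutions originating at `source` pairs, divided by the number of such pairs."""
    total = sum(n for (src, _dst), n in SPECTRUM.items() if src == source)
    return total / PAIRS_IN_TEMPLATE[source]


def transitions_and_transversions() -> tuple[int, int]:
    ts = sum(n for kind, n in SPECTRUM.items() if kind in TRANSITIONS)
    tv = sum(n for kind, n in SPECTRUM.items() if kind not in TRANSITIONS)
    return ts, tv


if __name__ == "__main__":
    at, gc = mutations_per_pair("AT"), mutations_per_pair("CG")
    print(f"AT: {at:.2f}/pair, GC: {gc:.2f}/pair, ratio {at / gc:.1f}")
    print("transitions, transversions:", transitions_and_transversions())
```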

Selection Plasmid Construction

A selection plasmid, pCatKp, is constructed with the following features (Fig. 1). It has an LIC (ligation-independent cloning) site upstream of the cat coding sequence for promoter insertion. Between the promoter insertion site and cat, there is a Shine-Dalgarno sequence for proper translation. Downstream of cat there is a transcription terminator. The plasmid also has a p15A replication origin and a kanamycin resistance marker, kanR.

The plasmid is derived from pCAT3basic (Promega, Madison, WI). The ColE1-derived replication origin of pCAT3basic is replaced with the p15A origin to reduce the plasmid copy number per cell. The original drug resistance marker, bla, is replaced with a more stringent kanamycin resistance marker, kanR, to deter the growth of nontransformants. The f1 origin is also deleted, and the multiple cloning site is replaced with an LIC site made of synthetic oligonucleotides. The plasmid is constructed by stepwise ligation of PCR-amplified fragments. We confirmed that the plasmid, with or without a starting sequence for evolution inserted, does not confer noticeable chloramphenicol resistance: after 2 days of incubation, no colonies are seen on agar plates containing 1 μg/ml chloramphenicol.

Selection

We use selection on the agar surface primarily to avoid the possibility that a few clones quickly take over the whole population, as is likely to happen in liquid culture. In addition, agar plates make it convenient to monitor evolution in progress by observing the frequency and growth rate of colonies on chloramphenicol plates during selection.

The PCR-mutagenized promoter region is inserted upstream of the cat gene on the selection plasmid, pCatKp, and the plasmid is used to transform E. coli cells. The transformants first grow on a nonselection plate for 12 h until tiny colonies are visible (about 0.1 mm in diameter). The colonies on this plate represent the population before selection. They are transferred onto five selection plates by replica plating (Lederberg and Lederberg 1952). Unless indicated otherwise, the chloramphenicol concentrations of the nonselection source plate and the five selection plates are 0, 3.3, 10, 33, 100, and 330 μg/ml, abbreviated as 0×, 0.1×, 0.3×, 1×, 3×, and 10×, respectively. The colonies grow on the selection plates for 36 h in the first cycle or 18 h in subsequent cycles. An example of colonies on selection plates is shown in Fig. 2. After growth, the cells on each plate are scraped off the surface and stored in individual tubes. To allow some weaker promoters to evolve further, cells are collected according to the following schedule: all of the cells are collected from the plate with the highest chloramphenicol concentration on which colonies appear, 1/10 of the cells from the plate with the second highest concentration, and 1/100 from the next. Plasmid DNA is extracted from the collected cells for further evolution and for analysis. By adjusting the duration of colony growth and the chloramphenicol concentration, we were always able to find an appropriate selection stringency for each evolution cycle.
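The selection ladder and harvesting schedule above can be summarized in a few lines. In this sketch the × labels follow the text (1× = 33 μg/ml, in roughly half-log steps); the input is the list of selection plates that show colonies, and plates below the third-highest are assumed not to be harvested, which the text does not specify. The names are hypothetical.

```python
LEVELS = {0: "0×", 3.3: "0.1×", 10: "0.3×", 33: "1×", 100: "3×", 330: "10×"}  # μg/ml


def harvest_fractions(plates_with_colonies: list[float]) -> dict[float, float]:
    """Fraction of cells harvested per selection plate (concentrations in μg/ml):
    all from the highest plate with colonies, 1/10 from the next, 1/100 from the third."""
    ranked = sorted(plates_with_colonies, reverse=True)[:3]
    return {conc: 10.0 ** -rank for rank, conc in enumerate(ranked)}


if __name__ == "__main__":
    grown = [3.3, 10, 33, 100]  # hypothetical: colonies up to 3× (100 μg/ml)
    for conc, frac in harvest_fractions(grown).items():
        print(f"{LEVELS[conc]:>5} ({conc} μg/ml): harvest {frac:g} of cells")
```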

Figure 2

Example of colonies on selection plates. The plates are from the second cycle of Experiment 4. The chloramphenicol concentrations in the top row, from the left, are 10×, 3×, and 1×; those in the bottom row, from the left, are 0.3×, 0.1×, and 0×. The bottom right plate is the source plate for replica plating, and all other plates are replicas. The photograph was taken after 18 h of incubation of the replicas; during this period, the source plate was kept at room temperature. Colony growth tapers off as the chloramphenicol concentration increases.

The chloramphenicol resistance of a culture shows noticeable hysteresis: cells transferred from a large, saturated colony tend to show more chloramphenicol resistance than cells from a small, growing one. This hysteresis may be largely due to the accumulation of cat gene product before chloramphenicol exposure, which effectively reduces the intracellular concentration of the drug. Such hysteresis is undesirable because it introduces phenotypic variability. Furthermore, σS promoters may gain a selective advantage if the cells are allowed to grow to saturation. For these reasons, the duration of growth on a nonselection plate is precisely controlled in both the selection process and the promoter activity assay.

Promoter Activity Assay

In this section we describe the procedure used to assess the activity of the evolved promoters after all the cycles of an experiment have been completed. The procedure is analogous to the selection, with one difference: the clones assayed here are samples taken from the evolving population, whereas selection acts on the evolving population itself. Plasmid DNA samples are collected before and after selection in each evolution cycle. Each sample plasmid preparation is used to transform fresh cells, and the transformants grow on a nonselection agar plate overnight until colonies reach about 0.2 mm in diameter. Isolated colonies are picked at random and placed individually in the wells of a 96-well culture plate, each well containing 100 μl of nonselection liquid medium. After 4 h of incubation at 37°C, a 96-pin replicating tool (Model 140500; Boekel, Feasterville, PA) is used to transfer about 5 μl of culture from each well onto each of six assay agar plates. (After spotting, the liquid cultures continue to grow for several hours to saturation and are stored at −20°C with 10% glycerol for DNA sequencing.) The chloramphenicol concentrations in the six assay plates are 0, 3.3, 10, 33, 100, and 330 μg/ml; for convenience, these are abbreviated 0×, 0.1×, 0.3×, 1×, 3×, and 10×, respectively. The 0× plate serves as a nonselection control. After 18 h of incubation, the cell spots on the plates are evaluated. Examples of cell spots on assay plates are shown in Fig. 3.

Figure 3

Example of chloramphenicol resistance assay plates. In each panel, the chloramphenicol concentrations in the top row, from the left, are 10×, 3×, and 1×; those in the bottom row, from the left, are 0.3×, 0.1×, and 0×. The sample colony clones are replicated as culture spots on each of the assay plates as described in the text. The plates were photographed after 14 h of incubation. A shows sample clones before selection of the third evolution cycle of Experiment 2, and B shows sample clones after selection of the same cycle. Within each panel, a sample clone is spotted on the same coordinate on each of the six assay plates, and the spot density can be evaluated to determine the chloramphenicol resistance.

A chloramphenicol resistance value is assigned to each sample clone according to its growth on the assay plates. Specifically, the value is the highest chloramphenicol concentration on which the cell spot density approaches saturation. For example, if the spot density is near saturation on 0×, 0.1×, 0.3×, and 1× chloramphenicol but decreases sharply at 3× and above, the promoter activity is assigned as 1×. In most cases, the chloramphenicol resistance can be clearly scored. However, for about 15% of the clones, the spot density drops more gradually at higher chloramphenicol concentrations. In such cases, half of these clones are arbitrarily assigned to the highest concentration with full spot density, and the rest to the next higher concentration, on which the spots are only about half-grown.
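The scoring rule above can be sketched as follows. Spot densities are assumed to be recorded as relative values between 0 and 1, ordered from the 0× to the 10× plate; the 0.8 "near saturation" and 0.4 "about half-grown" cut-offs are hypothetical, and alternating clones by index is just one way to implement the arbitrary half-and-half assignment for gradual drop-offs. The function names are illustrative.

```python
LEVELS = ["0×", "0.1×", "0.3×", "1×", "3×", "10×"]


def score_resistance(densities: list[float], saturated: float = 0.8) -> str:
    """Highest level, scanning upward from 0×, whose spot is near saturation.

    `densities` are relative spot densities (0 to 1) on the six assay plates,
    ordered from 0× to 10×; the 0.8 threshold is an assumed cut-off.
    """
    score = LEVELS[0]
    for level, density in zip(LEVELS, densities):
        if density < saturated:
            break
        score = level
    return score


def score_gradual(densities: list[float], clone_index: int,
                  saturated: float = 0.8, half_grown: float = 0.4) -> str:
    """For gradual drop-offs, assign alternate clones to the last fully grown
    level or to the next level if its spots are about half-grown."""
    full = score_resistance(densities, saturated)
    nxt = LEVELS.index(full) + 1
    if clone_index % 2 == 1 and nxt < len(LEVELS) and densities[nxt] >= half_grown:
        return LEVELS[nxt]
    return full


if __name__ == "__main__":
    print(score_resistance([1.0, 1.0, 0.95, 0.9, 0.1, 0.0]))
    gradual = [1.0, 1.0, 1.0, 0.9, 0.5, 0.1]
    print([score_gradual(gradual, i) for i in range(4)])
```

The first call reproduces the worked example in the text (near saturation up to 1×, sharp drop at 3×).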

This assay is not as quantitative as growth-rate measurement in liquid culture but is much simpler, and the chloramphenicol resistance of a clone can be easily scored. We use the measured chloramphenicol resistance as the value of promoter activity (see Materials and Methods). As reported previously, promoters that confer stronger drug resistance also show higher activity in vitro and in vivo (Horwitz and Loeb 1986, 1988). We have verified that stronger promoter activity confers higher chloramphenicol resistance in liquid culture and that a 10× activity is approximately the level of a fully induced lac promoter (data not shown).

Results of Individual Evolution Experiments

Experiment 1, Selecting Promoters from a Random Sequence Library

In this experiment, promoters are selected from a library of random sequences. This experiment is essentially the same as those reported previously (Horwitz and Loeb 1986; Horwitz and Loeb 1988; Oliphant and Struhl 1987, 1988); we use it to verify the selection procedure. The random library is made of synthetic oligonucleotides consisting of a 40-nucleotide random region flanked by two primer sites (Fig. 4). The single-stranded DNA is converted to double-stranded DNA by one round of replication using a primer that anneals to the oligonucleotide (one-step extension; see Materials and Methods). After insertion and transformation, about 1.4 × 10⁴ transformants are subjected to selection. Several clones from each of the 0.3×, 1×, and 3× chloramphenicol plates are sequenced (Fig. 4). Based on sequence similarity, −10 and −35 elements can be identified in most of the clones. In contrast, no promoters are identified among several samples taken before selection (data not shown). These results indicate that the selection is effective.

Figure 4

Promoter sequences from Experiment 1, found by one round of selection from a random sequence pool. The sequences are divided into three groups according to the promoter activity (i.e., chloramphenicol resistance) of the corresponding clones; for example, the top 15 sequences have 3× promoter activity. The groups are separated by a line space. Within each group, a missing nucleotide position in a sequence is marked by a hyphen. The short sequences at the top of each group lie within the primer sites. Where duplicate sequences were found, the number in parentheses gives the number of duplicates. In this and subsequent figures, identified promoter elements are in boldface. If more than one promoter can be identified, the elements of the second promoter are underlined.

Experiment 2, from a Single Initial Sequence to Functional Promoters, Evolution with Changing Mutation Frequencies

This is our pilot experiment to evolve E. coli promoters from a single starting sequence. The mutable region is 37 bp long (Fig. 5). Before this experiment, we estimated that a clonal size on the order of 10⁷ to 10⁸ and many cycles would be necessary to evolve a weak promoter, so we applied a relatively high mutation frequency in the first cycle. In fact, promoter solutions emerge much faster than expected: with a mutation frequency of 7%, promoters already emerge from 5.8 × 10⁴ transformant clones in the first cycle. In the two subsequent cycles, the mutation frequency is reduced to 2%, and the clonal sizes are 4.8 × 10⁴ and 3.5 × 10⁶. The chloramphenicol concentrations in the selection agar plates range from 0.03× (1 μg/ml) to 3×. In this experiment, the population consists mostly of promoters after three cycles (Fig. 6).

Figure 5

Promoter sequences from Experiment 2. The short sequence at the very top is the Up primer. The next line is the initial sequence for the experiment. The sequence identification code is explained under Materials and Methods. The identified promoter elements are in boldface. Mutated positions are highlighted, and deletions are marked by hyphens. N is an undetermined base. In many cases, several sequences are evidently closely related because they share the same composition at the mutated positions. In such a case, only one representative sequence is shown, and the number in parentheses is the number of these related sequences. The sequences from each evolution cycle are grouped together and are separated by a line space from another group.

Figure 6

Promoter activity distribution of Experiment 2. There were three evolution cycles in this experiment. Samples from before and after selection for each cycle were collected; therefore, there were six sampling points along the evolution process. About 96 clones at each sampling point were assayed. Promoter activity was evaluated at six discrete levels: 0×, 0.1×, 0.3×, 1×, 3×, and 10×, as described under Results. The promoter activity distributions are represented by a group of ribbon lines made using the Microsoft Excel program.

A majority of the clones from the third cycle are resistant to 3× chloramphenicol. There are two major types of sequences (Fig. 5). Twenty-eight of 37 sequence samples are similar to 2.3.12 or 2.3.9. In these clones, the identified −35 element lies within the upstream primer site, and the −10 element lies in the mutable region. Between the −35 and −10 elements there is an 18-bp AT-rich region. This group of solutions already appears after the first cycle, though only as 1 sequence among 12, and it becomes the most frequent group after the second cycle. This solution is puzzling because it is very different from the starting sequence and is therefore unlikely to have been generated by stepwise single-base substitutions.

Seven of the 37 sequences are related to another group represented by 2.3.26. This group has an E. coli consensus −35 element in the mutable region, followed after a 17-bp spacer by an easily identifiable −10 element. Only three base-pair alterations are needed to obtain this promoter from the starting sequence. A precursor of this solution, carrying two of the three specific mutations, is already seen after the second cycle. Besides the two major groups, two sequences have deletion mutations.

In this and the next three experiments, the initial sequence is confirmed to be nonfunctional (data not shown, but this can be inferred from the activity distribution before the first selection). This experiment indicates that a functional promoter is not far, in sequence space, from an arbitrary initial sequence. At a mutation frequency of a few percent, it is unnecessary to search for solutions in a large population.
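A rough estimate illustrates why a population of this size samples nearby sequences thoroughly. Under a deliberately crude assumption of independent, unbiased substitutions (the Mn²⁺ spectrum is in fact strongly biased), a population of N clones mutagenized at frequency p contains about N(p/3)^k clones carrying any one particular set of k substitutions, regardless of what happens at other positions. For the first cycle of this experiment (p = 7%, N = 5.8 × 10⁴), that is roughly 1,350 carriers of a given single substitution, about 30 of a given double substitution, and fewer than one (about 0.7) of a given triple substitution. The function name is hypothetical.

```python
def expected_carriers(pop_size: float, rate: float, k: int) -> float:
    """Expected number of clones carrying one particular set of k substitutions
    (other positions unconstrained), assuming each base mutates independently
    with probability `rate` and each of the three alternative bases is equally likely."""
    return pop_size * (rate / 3) ** k


if __name__ == "__main__":
    # First cycle of Experiment 2: ~7% mutation frequency, 5.8e4 transformants.
    for k in range(1, 4):
        print(k, f"{expected_carriers(5.8e4, 0.07, k):.3g}")
```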

Experiment 3, Evolution at a Low Mutation Frequency

In this experiment, the initial DNA is a single sequence of 41 bp (Fig. 7). The mutation frequency is 0.4% (13 mutations among 3198 bp examined), achieved by a 10-fold mutagenic PCR amplification. We carried out nine evolution cycles in this experiment. The clonal sizes for cycles 1 through 9 are 3.5 × 10⁴, 2.5 × 10⁴, 3.4 × 10⁴, 9.0 × 10⁴, 9.0 × 10³, 1.9 × 10⁵, 2.4 × 10⁵, 4.7 × 10³, and 3.8 × 10⁴. Chloramphenicol resistance increases gradually over the cycles and reaches a steady level after six cycles (Fig. 8). In cycles 6, 7, 8, and 9, the mean chloramphenicol resistance of preselection samples becomes high; in other words, the population retains more of its chloramphenicol resistance after mutagenesis.

Figure 7

Promoter sequences from Experiment 3. The sequences are organized and presented in the same way as for Fig. 5. Only the sequences of cycles 3, 6, and 9 are shown.

Figure 8

Promoter activity distribution of Experiment 3. Data collection and ribbon presentation are the same as in Fig. 6 except that there were nine cycles in this experiment. For clarity, the ribbon plots are grouped into two panels: (A) before selection of each cycle and (B) after selection. The samples were assayed in two batches. Samples from cycles 1 through 3 were assayed at the same time using one batch of plates. The samples from cycles 4 through 9 were assayed in another batch.

The sample sequences after three, six, and nine cycles of evolution are shown in Fig. 7. After cycle 3, two major types of solutions appear (3.3.6 and 3.3.9; Fig. 7). There is also a single-base-pair deletion type (3.3.17), which becomes more frequent over the next three cycles.

After cycle 6, a 2-bp deletion type (3.6.73) appears; it becomes dominant after cycle 9. In cycle 6, one clone appears to be a recombinant of two earlier sequences, and this recombinant type becomes more frequent by cycle 9. It is remarkable that nearly all the final solutions have the TGTG motif of the extended −10 promoter.

One conclusion from this experiment is that deletion is an important means of adjusting the relative positions of the two promoter elements and that recombination is also exploited to create promoter sequences.

Experiment 4, Evolution at a High Mutation Frequency

In a way, evolving sequences are analogous to physical particles in an energy landscape: high mutation frequencies could prevent DNA sequences from becoming trapped in a potential well. It is plausible that, at high mutation frequencies, solutions very different from those in Experiment 3 would appear and that the solutions would be more diverse. The quantitative aspects of this notion, however, are uncertain; for example, how high must the mutation frequency be for the effect to be seen? In this experiment, the arbitrary starting sequence is identical to that in Experiment 3, but the mutation frequency is increased to 3.6%. The clonal size of the initial cycle is estimated to be between 1 × 10⁴ and 1 × 10⁵; in the subsequent cycles, the sizes are 1.3 × 10⁴ and 1.9 × 10⁴. After three cycles, the population is highly chloramphenicol resistant (Fig. 10), with the majority of clones resistant to 3× and 10× chloramphenicol.

Two major types of sequences are found after three cycles (Fig. 9). Among 26 samples, 13 clones are similar to 4.3.17; they have a 5-bp deletion. Eleven other clones are related to 4.3.7 and have a 1-bp deletion. These two types are not seen among the 20 samples after the first cycle. There are several other substitution mutations among these sequences, but we do not know how they contribute to the promoter function.

Figure 9

Promoter sequences from Experiment 4. Only the sequences from cycles 1 and 3 are shown. The sequences are organized and presented in the same way as in Fig. 5. Base position W is a mixture of A and T.

Figure 10

Promoter activity distribution of Experiment 4. Data collection and ribbon presentation are the same as for Fig. 6.

In addition to the above two types, there is a minor type represented by clone 4.3.4. It is very similar to 3.3.9, but we are uncertain whether it evolved independently in this experiment or arose from cross-contamination with Experiment 3. Furthermore, the sequences related to 4.3.0 are puzzling: they are identical to the starting sequence. Why are they present in the third cycle? Perhaps these clones survive on the selection plate because of a titration effect from neighboring, strongly chloramphenicol-resistant colonies on a crowded plate with a low chloramphenicol concentration.

We do find diverse promoter solutions in this experiment. However, a very different solution is more likely created by a deletion than by a high level of base pair substitution. In other words, the escape from a trap is likely through “tunneling.” Further refinements, such as weeding out deletion mutants and possible contaminants, are needed before this experiment can address its initial questions.

Experiment 5, Evolution at an Extremely High Mutation Frequency Over a Longer Sequence Region

In this experiment, mutation frequency is elevated to a practical limit, beyond which PCR becomes difficult. The length of the mutable region is increased to 60 bp (Fig. 11). The logic is similar to that in Experiment 4: At a very high mutation frequency on a longer region, we expect many types of solutions to coexist.

Figure 11

Promoter sequences from Experiment 5. Only the sequences from cycles 1 and 3 are shown. The sequences are organized in the same way as for Fig. 5. The mutated positions are not highlighted as in previous sequence data figures.

The mutation frequency is about 18%, achieved by three rounds of mutagenic PCR in tandem. Three evolution cycles are carried out in this experiment. The clonal sizes of the three cycles are 1.5 × 10⁴, 2.4 × 10⁵, and 2.4 × 10⁴. The population becomes highly chloramphenicol resistant after only one evolution cycle; a majority of the clones are resistant to 3× chloramphenicol. The phenotype distributions of the three evolution cycles are similar (Fig. 12): after each mutagenesis, the clones become mostly nonfunctional, and after selection, they are mostly resistant to chloramphenicol again.
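The observation that most clones lose function after each mutagenesis is consistent with a back-of-envelope binomial estimate. The Python sketch below is our own illustration, not the authors' analysis. It assumes that each of the 60 mutable positions is substituted independently with probability 0.18 per cycle (the real mutagenic PCR spectrum is skewed). It also borrows the rough assumption, stated in the Discussion, that about six consensus-matching positions are needed for detectable activity. The function names are hypothetical.

```python
"""Back-of-envelope mutation load for Experiment 5 (illustrative sketch).

Assumes each of the 60 mutable positions is substituted independently with
probability 0.18 per evolution cycle; real mutagenic PCR has a skewed
spectrum, so these numbers are only a rough guide.
"""

REGION_LENGTH = 60    # bp, mutable region in Experiment 5
MUTATION_FREQ = 0.18  # per-position substitution frequency per cycle


def expected_substitutions(length: int, freq: float) -> float:
    """Mean number of substituted positions per clone (binomial mean)."""
    return length * freq


def prob_positions_untouched(k: int, freq: float) -> float:
    """Probability that k specified positions all escape mutation."""
    return (1.0 - freq) ** k


if __name__ == "__main__":
    print(f"expected substitutions per clone: "
          f"{expected_substitutions(REGION_LENGTH, MUTATION_FREQ):.1f}")  # 10.8
    # k = 6 follows the rough assumption that about six consensus-matching
    # positions are needed for detectable promoter activity.
    print(f"P(6 critical positions intact): "
          f"{prob_positions_untouched(6, MUTATION_FREQ):.2f}")  # 0.30
```

Under these assumptions a clone carries about 11 substitutions on average, and six specified positions survive a cycle intact only about 30% of the time.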

Figure 12

Promoter activity distribution of Experiment 5. Data collection and ribbon presentation are the same as for Fig. 6.

After the third cycle, among 23 sequence samples, 18 appear to be of the extended −10 type (Fig. 11). Four other clones lost the extended −10 character but appear to have gained a −35 and a −10 element. Finally, one clone does not have an obvious −10 element coupled to a −35. Among the 18 extended −10 type clones, 15 have the stronger TGTG motif (Burr et al. 2000) in place of the GGTG of the starting sequence. Four of the 18 clones have a single-base-pair deletion, which brings the spacing between the −10 and the possible −35 elements to 17 bp. The results after the first evolution cycle are similar, but with a smaller sample size (Fig. 11).

From this experiment we conclude that, whatever conditions we try, the set of solutions still converges to very few groups. This experiment also suggests that at a very high mutation frequency, single-base substitutions alone are sufficient to create effective promoters, and hence, deletion plays a less important role.

Discussion

There are two general questions in molecular evolution: (1) what is the end product of evolution and (2) how that product emerges. We explored these questions by experimentally evolving E. coli promoters from an arbitrarily chosen, nonfunctional sequence. The procedure used in the experiments was effective: within a few iterations of mutation and selection, each lasting less than 2 days, a nonfunctional sequence evolved into a population of promoters.

This selection method is essentially the same as those previously reported by others (Horwitz and Loeb 1986; Horwitz and Loeb 1988; Oliphant and Struhl 1987, 1988). It is based on actual in vivo promoter function, and it is conceptually different from the in vitro, RNA polymerase affinity-based SELEX method (Gaal et al. 2001; Xu et al. 2001). We are therefore confident that the evolved sequences are, by definition, functional promoters. Given that the selection relies on the E. coli σ70 RNA polymerase supplied by the growing cells, it was no surprise that nearly all of the experimentally evolved promoters bear sufficient similarity to the natural σ70 promoter; the exceptions are very few. Apparently, in sequence space, promoter solutions are largely distributed around the consensus sequence.

With the procedure described here, some parameters, such as mutation frequency, selection stringency, and population size, could be adjusted within a range. We primarily tested the effect of one parameter: the mutation frequency. We saw that at a higher mutation frequency, promoters emerged faster; this result, too, was anticipated. The knowledge gained from these experiments mostly concerns the dynamic properties of promoter evolution. Prior to the experiments, we expected very weak promoters to emerge slowly at first and then to improve gradually through base pair substitutions. We roughly assumed that at least six base pair positions must match the consensus sequence for a sequence to have any detectable promoter function. We predicted that with a moderate mutation frequency, the evolved sequences would be “trapped,” meaning that in each cycle the sequences would be of the same type as in the previous cycle. The experiments gave us an opportunity to examine some of these notions. Many dynamic properties observed in the experiments were in fact not anticipated, at least not by us, prior to the experiments.

For instance, promoters emerged faster than we originally imagined. At a mutation frequency of about 0.4%, with a population size of fewer than 10⁶ clones, the population shifted from a single nonfunctional sequence to functional promoters within three cycles of evolution. At higher mutation frequencies, promoters emerged even faster. Admittedly, about half of the promoters utilized an unintended potential −35 element within a PCR primer site and were therefore only partially evolved. However, even after discounting these promoters, we still found in every experiment promoters that had evolved entirely within the designated region between the mutagenic PCR primer sites.

Several factors may have accelerated the emergence of promoters, among them the selection scheme. By plating cells in parallel at several drug concentrations, the selection stringency was effectively adapted to each cycle. Very weak promoters, which would have been eliminated by stringent selection, were likely enriched in early cycles.

The occurrence of the extended −10 type among native E. coli promoters (Burr et al. 2000) is higher than expected by random chance. Because this high frequency could have been due to some unknown transcription regulatory function of extended −10 promoters, our early analysis overlooked the role of the extended −10 type as an alternative evolutionary path. The TG or TGTG motif of the extended −10 type appeared in most of the evolved promoters. This frequent appearance suggested to us that, because the TG or TGTG motif is very short, an extended −10 is highly likely to be created. An extended −10 can then serve as an intermediate from which a “full-fledged” consensus-type promoter can evolve. Apart from any possible transcription regulation advantages, the high frequency of the extended −10 type among native E. coli promoters (Burr et al. 2000) may be a remnant of evolutionary intermediates.
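The argument rests on how short the motif is. The sketch below uses hypothetical helper names and is not the authors' method. It assumes the −10 hexamer position is already known and treats the extended −10 as TGn or TGTGn immediately upstream of that hexamer, as described in the Introduction. It flags the motif and gives the chance that a random site matches it, assuming equal base frequencies.

```python
from typing import Optional


def extended_minus10_motif(seq: str, minus10_start: int) -> Optional[str]:
    """Return "TGTG", "TG", or None for the bases just upstream of a -10 hexamer.

    Assumed layout (TGn / TGTGn): the motif's last base sits two positions
    upstream of the hexamer's first base, leaving one arbitrary base (n)
    between them. minus10_start is the 0-based index of the hexamer.
    """
    s = seq.upper()
    if minus10_start >= 5 and s[minus10_start - 5:minus10_start - 1] == "TGTG":
        return "TGTG"
    if minus10_start >= 3 and s[minus10_start - 3:minus10_start - 1] == "TG":
        return "TG"
    return None


def chance_match(motif: str) -> float:
    """Probability that a random site matches `motif`, with equal base frequencies."""
    return 0.25 ** len(motif)


if __name__ == "__main__":
    # Hypothetical toy sequences: 4 filler bases, a 4-bp motif region, n = C, then TATAAT.
    print(extended_minus10_motif("AAAATGTGCTATAAT", 9))  # TGTG
    print(extended_minus10_motif("AAAAGGTGCTATAAT", 9))  # TG
    print(chance_match("TG"), chance_match("TGTG"))      # 0.0625 0.00390625
```

Under this simplification, a 2-bp TG matches a given random site with probability 1/16, whereas matching all 12 positions of the −35/−10 consensus at once has probability (1/4)^12.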

In addition to the above factors contributing to the quick emergence of promoters, statistics alone predict that 3 of the 12 base positions of the consensus promoter would match in a randomly chosen 29-bp sequence (a −35 and a −10 hexamer separated by a 17-bp spacer), since each position matches by chance with a probability of 1/4. Therefore, on average, one needs to alter only 3 bp to generate a weak promoter sequence (assuming that six correct positions are sufficient to form a promoter). Our results were in agreement with this analysis. In retrospect, the fast appearance of promoters is not surprising.
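The expectation above follows from a binomial model. The sketch below makes its assumptions explicit: a single fixed alignment, independent positions, equal base frequencies, and the threshold of six matches assumed in the text. It uses exact fractions so the numbers can be checked by hand. The function names are our own.

```python
"""Chance matches to the 12 consensus positions (TTGACA ... TATAAT).

Simplifying assumptions: one fixed alignment, independent positions and
equal base frequencies, so each position matches with probability 1/4.
"""
from fractions import Fraction
from math import comb

N_POSITIONS = 12          # 6 bp of -35 plus 6 bp of -10
P_MATCH = Fraction(1, 4)  # chance that a random base matches the consensus base


def pmf(k: int, n: int = N_POSITIONS, p: Fraction = P_MATCH) -> Fraction:
    """Binomial probability of exactly k consensus matches."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k: int, n: int = N_POSITIONS, p: Fraction = P_MATCH) -> Fraction:
    """Probability of k or more consensus matches."""
    return sum(pmf(i, n, p) for i in range(k, n + 1))


expected_matches = N_POSITIONS * P_MATCH       # binomial mean n * p
threshold = 6                                  # assumed minimum for detectable activity
changes_needed = threshold - expected_matches  # substitutions needed on average

if __name__ == "__main__":
    print(expected_matches, changes_needed)  # 3 3
    # Chance that a random sequence already meets the threshold at this alignment
    print(float(prob_at_least(threshold)))   # about 0.054
```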

Throughout the experiments, the solutions always converged to very few types. In hopes of capturing many functionally equivalent solutions from each experiment, we took some measures to preserve diversity. Colonies were selected on an agar surface to prevent a few clones from immediately taking over the population. The selection output of each cycle was “doped” with a small fraction of cells collected from less stringent selection plates. Using a multicopy plasmid also reduced the effective selection stringency and thus allowed moderate promoters to survive. Despite these arrangements, the evolved solutions still converged into very few groups, even at an extremely high mutation frequency and with a longer mutable region (Experiment 5). Seemingly, the converging “force” of selection was still greater than the dispersing “force” of base pair substitution mutations.

In a case where the evolution continued for nine cycles (Experiment 3), we noticed that, while the resistance of the postselection population remained essentially unchanged, the population after mutagenesis (i.e., before selection) became progressively more chloramphenicol resistant in late cycles. We speculate that this phenotypic change was partially due to an implicit selection for stability against the effect of mutation. First, it is possible that the sequence itself was less mutable at critical positions. For example, in the particular mutagenic PCR used in the experiments, a GC pair was less likely to mutate; therefore, the descendants of a sequence with GC pairs in critical positions should be less likely to vary. Second, a promoter in a promoter-dense region of sequence space was likely to evolve into a group of functional variants, whereas a sequence in a promoter-sparse region tended to lose its descendants. In other words, a promoter that had more positions to “spare” would better survive mutagenesis. As a special case, a region containing multiple promoters was more likely than a single-promoter region to retain promoter function after mutagenesis. More refined experiments may tease apart these contributing factors.
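The second factor, having positions to “spare,” can be illustrated with a simple threshold model. This is a hypothetical sketch, not a model fitted to the data. It assumes a promoter stays functional while at least six of its consensus-matching positions survive, that each position is hit independently with probability p, and that a hit always destroys a match. New matches created by mutation are ignored, which makes the estimate conservative. The 3.6% frequency of Experiment 4 is used only as an example value.

```python
"""Mutational robustness under a simple threshold model (illustrative sketch).

Hypothetical assumptions: a promoter stays functional while at least
`threshold` of its consensus-matching positions survive; each position is
hit independently with probability p per cycle; a hit always destroys a
match; new matches created by mutation are ignored (a conservative bound).
"""
from math import comb


def prob_still_functional(matches: int, threshold: int, p: float) -> float:
    """P(no more than matches - threshold of the matched positions are hit)."""
    spare = matches - threshold
    if spare < 0:
        return 0.0
    return sum(
        comb(matches, lost) * p**lost * (1 - p) ** (matches - lost)
        for lost in range(spare + 1)
    )


if __name__ == "__main__":
    # Compare a minimal promoter (no spare positions) with ones that have spares,
    # using the 3.6% mutation frequency of Experiment 4 as an example value.
    for m in (6, 8, 10):
        print(m, round(prob_still_functional(m, threshold=6, p=0.036), 3))
```

In this toy model, a promoter with no spare positions loses function whenever any of its six matches is hit. Each additional spare position sharply raises the chance of surviving a round of mutagenesis. The model does not capture the first factor, the lower mutability of GC pairs.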

Finally, we found it interesting that a large portion of evolved sequences contained small deletions of several base pairs. (Such deletions likely originated in the mutagenic PCR.) Given that deletion is rare (only a few percent of all mutations from mutagenic PCR), its frequent occurrence among evolved sequences suggests that it plays a significant role in promoter evolution. Often, deletion appeared to reduce the distance between potential −35 and −10 elements to the optimal 17 or 18 bp, for instance, in sequences 2.3.3, 2.3.4, 3.3.17, 3.6.73, 4.3.7, and 4.3.17. In less frequent instances, deletions improved −10 elements, as in sequences 2.3.4 and 4.1.2. Recombination between heterologous sequences was similarly rare: it was observed only once throughout the experiments, and this event brought the −35 and −10 elements from two separate sequences into one. It is plausible that −35 and −10 elements are created and improved largely through base pair substitutions, and that deletion and recombination bring existing elements into proper register.
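A toy scanner shows how a deletion can bring two existing elements into the optimal spacing. The sketch below scores putative −35/−10 pairs by their matches to TTGACA and TATAAT and accepts only spacers of 17 or 18 bp. The scoring scheme, the function names, and the test sequences (built around a 19-bp CG-repeat spacer) are hypothetical constructs. They are not taken from the evolved sequences in the figures, and real promoter strength depends on much more than hexamer identity and spacing.

```python
from typing import NamedTuple, Sequence

MINUS35 = "TTGACA"
MINUS10 = "TATAAT"


class Hit(NamedTuple):
    score: int    # matches to the two consensus hexamers (0-12)
    start35: int  # 0-based index of the putative -35 hexamer
    spacer: int   # bp between the -35 and -10 hexamers


def hexamer_matches(window: str, consensus: str) -> int:
    """Count positions where the window agrees with the consensus hexamer."""
    return sum(a == b for a, b in zip(window, consensus))


def best_pair(seq: str, spacers: Sequence[int] = (17, 18)) -> Hit:
    """Best-scoring -35/-10 pair whose spacer length is in `spacers`."""
    seq = seq.upper()
    hits = [
        Hit(hexamer_matches(seq[i:i + 6], MINUS35)
            + hexamer_matches(seq[i + 6 + sp:i + 12 + sp], MINUS10), i, sp)
        for i in range(len(seq) - 5)
        for sp in spacers
        if i + 12 + sp <= len(seq)
    ]
    if not hits:
        raise ValueError("sequence too short for the allowed spacers")
    return max(hits, key=lambda h: h.score)


if __name__ == "__main__":
    spacer19 = "CG" * 9 + "C"                      # hypothetical 19-bp spacer
    original = MINUS35 + spacer19 + MINUS10        # elements 19 bp apart
    one_bp_del = MINUS35 + spacer19[1:] + MINUS10  # 18-bp spacer
    two_bp_del = MINUS35 + spacer19[2:] + MINUS10  # 17-bp spacer
    for name, s in [("original", original), ("1-bp deletion", one_bp_del),
                    ("2-bp deletion", two_bp_del)]:
        print(name, best_pair(s))
```

Both hexamers are perfect in all three toy sequences, yet only after a 1-bp or 2-bp deletion do they score as a full 12-position pair at an allowed spacing.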

The specific experimental design worked effectively in our experiments and generated some interesting and unexpected results. However, there is plenty of room to improve the design. For instance, the specific mutagenesis method has a skewed mutation spectrum. We also observed that overexpression of the selection gene cat could be detrimental to the cell (results not shown); therefore, a promoter of only moderate activity could already reach the saturation level of drug resistance. In addition, the selection was for constitutive promoters only. These limitations are not, however, fundamental. The mutation spectrum could be altered almost at will by using different mutagenesis methods (Zaccolo and Gherardi 1999; Zaccolo et al. 1996). The dynamic range of selection can be further extended by using a single-copy plasmid, a weaker ribosome binding site, or another selection marker. To evolve more complex, regulated promoters, an elaborate selection scheme involving counterselection is needed. For that purpose, a single-copy plasmid can be highly desirable in order to avoid titration of transcription regulator proteins.

Because alteration of the RNA polymerases or introduction of another bacterial RNA polymerase may severely perturb the growing cells, this in vivo selection is limited to bacterial σ70 promoters. We do not know how to circumvent this limitation in order to study promoter-RNA polymerase coevolution, although such coevolution is certainly an important aspect of speciation. An exception to this limitation may be the evolution of certain phage promoters, such as the T7 promoter. A phage RNA polymerase is specific to the phage’s own promoters. By transforming cells with variants of the phage RNA polymerase gene in addition to the promoter selection plasmids, selection could be applied to the RNA polymerase-promoter pair, and experimental evolution might re-create a family of existing RNA polymerase-promoter pairs and create new ones.

Experimental evolution does not reproduce natural evolutionary history. In general, evolution in nature involves a large population over a long period of time and is constrained by certain historical conditions, many of which may forever remain obscure. In contrast, the course of experimental evolution is very short, the population size is small, and the mutation frequency is usually elevated to accelerate the process. Nevertheless, the experiments capture the essence of evolution: an iteration of mutation, selection, and replication. Experimental evolution permits us to adjust the parameters individually and to repeat the process under controlled conditions; thus, it lets us observe some essential features of the process and test certain hypotheses. In this way, experimental molecular evolution is a bridge between purely mathematical modeling and the “real world” in nature. Certainly, the molecular evolution procedure reported here could also serve as a tool to create promoters for practical applications.