1 Introduction

Promoters and terminators are indispensable for controlling gene expression in metabolic engineering and synthetic biology applications. These elements regulate both the strength of transcription and the longevity of the transcript. Together, these two forces dictate the overall abundance of mRNA within the cell and ultimately play a significant role in determining cellular protein content. At the same time, optimizing microorganisms for chemical production via metabolic engineering often requires the use of these elements to create highly regulated intracellular flux [1], often through high-strength promoters [2]. Fine-level control, inducibility, and expression range are all important in these endeavors, as has been seen with large strain engineering efforts such as rewiring the yeast Saccharomyces cerevisiae for industrial-level heterologous artemisinin production [3]. Fortunately, our understanding and cataloging of synthetic control elements such as promoters and terminators is continuously improving. In this chapter we consider the selection and engineering of both promoters and terminators for a variety of possible host organisms. First, we describe early strategies which mainly relied on genome mining and semi-rational mutagenesis techniques to improve sequence diversity and function. Next, we describe recent advances in the design of these parts using techniques such as hybrid engineering, high-throughput characterization, thermodynamic modeling, synthetic part development, and rational design. In each of these cases, both our understanding and the utility of these parts are enhanced, thus increasing the rate of design cycles within cells.

2 Early Efforts of Promoter Identification and Diversification

2.1 Native Promoter Mining

The initial set of catalogued promoters for synthetic use was derived from the genome of the host organism or a phage that targets the host organism [4–8]. These promoters were often uncovered as a result of genomic dissections. The advent of genome sequencing and annotation (especially of hosts such as Escherichia coli and S. cerevisiae) allowed for the rapid discovery of endogenous promoters, especially when coupled with mRNA quantification methods. In a similar fashion, promoters for more complex systems such as mammalian hosts have largely been discovered via high-throughput screening methods such as “promoter trapping” [9–11]. This approach typically involves random integration of a promoter-less vector containing GFP followed by fluorescence-based selection to determine adjacent, upstream regions of the genome that enable transcription. In similar fashion to other hosts, the sequencing of genomes (such as the CHO genome [12]) allowed for the discovery of novel, dynamic promoters such as pTXnip, which expresses proportionally to cell density [13].

Libraries of native promoters serve an important role as major synthetic parts and are among the most highly characterized [14, 15]; however, they remain limited in their ability to sample complete gene expression ranges. Although multiple gene overexpression techniques have been used in E. coli [16–18] and S. cerevisiae [19–22], among other organisms, this approach can be limited and can lead to the build-up of toxic intermediates that reduce productivity [23]. In some cases, including commonly-used native promoters in S. cerevisiae, dependencies such as carbon-source metabolism [24] can impact part performance. Such a conditional function is exacerbated in mammalian hosts, as commonly-used viral promoters vary widely in performance between cell lines and are often unstable after many cell generations [25–27]. As a result, further engineering of promoters is necessary to obtain desired fine-tuned expression, stability, and conditional performance.

2.2 Mutagenesis Techniques to Diversify Promoter Strength

Random mutagenesis is a powerful approach to augment promoter function without explicitly requiring extensive knowledge of sequence-to-function mapping. Specifically, because mutagenesis techniques such as error-prone PCR (Ep-PCR) indiscriminately target both consensus and non-consensus promoter regions, libraries with a large dynamic range of promoter function can be easily obtained. For instance, error-prone PCR was used to generate a mutant library of the prokaryotic PL-λ bacteriophage-derived promoter, enabling a 196-fold dynamic range of expression in E. coli [28]. The utility of this library was demonstrated by optimizing the expression of phosphoenolpyruvate carboxylase (ppc) for biomass yield and deoxy-xylulose-P-synthase (dxs) for maximal lycopene production. The importance of an expression continuum was highlighted by the fact that optimal dxs expression was dependent on strain genetic background. Similar mutagenesis of the strong constitutive S. cerevisiae TEF1 promoter yielded a library exhibiting a 15-fold dynamic range [28, 29]. Likewise, this library was used to optimize glycerol 3-phosphate dehydrogenase (GPD1) expression for glycerol overproduction in yeast.

As an alternative to Ep-PCR, serial deletion of promoter regions has been used to modulate expression, especially for mammalian hosts. Initially, serial deletion was used as a genetic tool to systematically remove portions of a promoter sequence to better understand function [30–32]. As these deletions often tend to dampen promoter activity, this approach has recently been used to generate libraries of weaker promoters [33, 34]. In this regard, serial deletion has been used to create knockdown libraries of glutamine synthetase (GS) expression for the GS-CHO expression system [35]. Moreover, serial deletion can also identify promoter variants that are cell-line specific. For example, the human cytomegalovirus (hCMV) promoter was optimized for transgene expression in both CHO-K1 and HEK-293 cells [36]. This study found that the full-length promoter gave the highest stable expression in CHO-K1 cells whereas the addition of the first exon to the minimal enhancer and core promoters was optimal for expression in HEK-293 cells.

Although Ep-PCR and serial deletion are effective at creating a large dynamic range of promoter strength, these approaches suffer from two major deficiencies: (1) higher level expression is hard to achieve and (2) large pools of inactive mutants are generated because of aberrant mutagenesis of elements critical for transcription [2]. Newer techniques (described in the sections below) are required to gain higher expression consistently. To address the second limitation of large inactive pools, more targeted approaches that make use of molecular understanding of promoter function can be employed. As an example, a saturation mutagenesis approach (Fig. 1a) was used to specifically modulate the sequence between consensus −35 “TTGACA” and −10 “TATAAT” motifs [37]. As these two motifs are both necessary and sufficient for the recruitment of the σ70 factor of bacterial RNA polymerase (RNAP) to initiate transcription [38], a randomized linker region was generated that resulted in a promoter library with a 400-fold dynamic range in Lactococcus lactis [39]. To improve the dynamic range further, a library including mutations of the −35 and −10 motifs exhibited another three orders of magnitude in range, thus demonstrating the importance of the entire promoter sequence [39].
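The spacer-randomization strategy can be sketched in a few lines of Python. This is only an illustrative in silico library generator, not the protocol used in the cited studies: the function name spacer_library, the fixed random seed, and the use of a uniform base distribution are assumptions, and the 17-bp spacer length is taken from the consensus spacing noted in the Fig. 1a caption.

```python
import random

MINUS_35 = "TTGACA"   # consensus -35 motif quoted in the text
MINUS_10 = "TATAAT"   # consensus -10 motif quoted in the text
SPACER_LEN = 17       # spacing taken from the Fig. 1a caption


def spacer_library(n_variants, seed=0):
    """Return n_variants promoter cores that differ only in the spacer.

    Hypothetical helper: each spacer base is drawn uniformly from ACGT.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    library = []
    for _ in range(n_variants):
        spacer = "".join(rng.choice("ACGT") for _ in range(SPACER_LEN))
        library.append(MINUS_35 + spacer + MINUS_10)
    return library


for variant in spacer_library(5):
    print(variant)
```

Keeping the two motifs fixed while varying only the linker mirrors the rationale above: the elements required for polymerase recruitment are preserved, so fewer library members should be inactive.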

Fig. 1

Saturation mutagenesis strategies used to diversify promoters and improve understanding of promoter design rules. (a) Prokaryotic promoters have a highly constrained architecture with consensus −35 and −10 motifs spaced by exactly 17 base pairs for optimal function. (b, c) Eukaryotic promoters lack a rigidly-defined consensus architecture. (b) In yeast, promoters can be broken down into an upstream activating sequence (UAS) containing transcription factor binding sites (TFBSs), such as those for GCR1p (CT-Box) and Rap1p (RPG-box), and a core promoter which serves to recruit RNA Polymerase II. (c) In mammalian hosts, promoters follow a similar general architecture but contain additional consensus motifs such as the initiator element (INR, shown above), transcription factor IIB recognition sequence (BRE), motif ten element (MTE), and downstream promoter element (DPE)

Eukaryotic promoters, although more complex and less rigidly defined than prokaryotic counterparts, can be broken down into a core promoter [40, 41] and upstream enhancer element(s) [42, 43] located 5′ of the core promoter. Efforts to engineer these distinct elements have been successful. For example, Jeppsson et al. created an ENO1-based promoter scaffold (Fig. 1b) containing two GCR1p TFBSs, two Rap1p TFBSs, and a TATA box coupled by spacers whose length was based on the architecture of native promoters [44, 45]. Randomization of these spacer regions afforded 37 synthetic promoters that spanned 3 orders of magnitude in strength. The utility of this library was demonstrated for the controlled knockdown of ZWF1 expression, resulting in a 16% increase in yeast ethanol production from xylose fermentation. Finally, this same approach of creating synthetic promoter scaffolds followed by saturation mutagenesis has been applied to mammalian promoters (Fig. 1c) in which mutagenesis of regions between TFBSs in the JeT promoter afforded a weakened synthetic promoter library with a tenfold range [46].

Collectively, these early mutagenesis techniques demonstrate that utilizing native promoters (prokaryotes) or constructing synthetic promoters (eukaryotes) followed by randomization of spacer regions can provide a promoter library marked by downregulation. Although efforts continue to use these approaches, a greater understanding of promoter architecture and high-throughput characterization techniques have yielded new methods to design promoters rationally with highly specific expression characteristics as described in the following sections.

3 Rational Construction of Promoters with Desired Characteristics

3.1 Hybrid Promoter Engineering

Once essential components of promoter architecture are defined, it is possible to combine disparate elements in a “hybrid promoter engineering” scheme. Importantly, in contrast to Ep-PCR and saturation mutagenesis, the construction of hybrid promoters often yields synthetic promoters which are stronger than the core scaffold [2]. Thus, this technique serves as a potent way to amplify the expression of promoters, an important goal of many engineering endeavors. The first instance of hybrid promoter engineering involved the fusion of the trp and lac promoters to create the tacI and tacII promoters [47]. Notably, this resulted in promoters that were between 7 and 11 times stronger than the derepressed lac promoter while maintaining the same regulation. Similar approaches in E. coli have been utilized to generate regulated promoters. For instance, a strong binding site for the FadR transcription factor was placed upstream of the strong phage promoters PL and PT7 to create a dynamic biosensor-regulator for acyl-CoA conversion to fatty acids in E. coli [48]. A similar concept was used to produce a malonyl-CoA responsive hybrid promoter that controlled flux from acyl-CoA to malonyl-CoA [49]. However, prokaryotic promoters may also be limited by promoter escape after transcript initiation, meaning that the addition of redundant hybrid elements is not guaranteed to improve transcription and can reduce transcription in some cases [50].

Unlike prokaryotic promoters, eukaryotic promoters are largely enhancer-limited, meaning that the addition of enhancer elements (by including additional binding sites) can both regulate and amplify promoter activity (Fig. 2a) [51]. Combining previously isolated Upstream Activating Sequences (UASs) from CYC1 [52, 53], CLB2 (UASCLB) [54], CIT1 (UASCIT) [55], GAL1-10 (UASGAL) [56], and TEF1 (UASTEF) [51] with core promoters such as GPD (PGPD) [24], TEF1 (PTEF) [4], LEU2 (PLEUM) [52], and CYC1 (PCYC) [57] can result in a predictable increase in transcriptional activity [51]. Ultimately, the strongest constitutive promoter in yeast was generated which had mRNA levels 2.5-fold higher than the GPD promoter [24]. Hybrid yeast promoters can also be designed for altered regulation. For example, linking various elements of UASGAL to a constitutive core results in a functional, galactose inducible promoter [51]. A similar approach has been conducted with regulated regions of the ARO9 UAS [58]. Collectively, these approaches resulted in a library of galactose-inducible promoters with a 40-fold range in induced expression strength, and a tryptophan-inducible promoter with a 29-fold range in induced expression strength. This hybrid promoter approach has been extended to non-conventional yeasts such as the host Yarrowia lipolytica. For example, hybrid engineering on the LEU2 core promoter resulted in a constitutive promoter library with 400-fold range in expression [49]. Most importantly, this work demonstrated the generalizability of the hybrid promoter approach to multiple core promoters and alternative UAS elements [59]. Such strong promoters were used in the rewiring of Y. lipolytica, in which constitutive overexpression of DGA1 using the UAS1B16-TEF1 hybrid promoter (among other genetic changes) resulted in a 60-fold improvement in lipogenesis [60].
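The combinatorial logic of hybrid promoter construction, in which one or more tandem UAS copies are placed 5′ of a core promoter, can be sketched as a simple enumeration. The sequences below are short hypothetical placeholders rather than the real UAS or core elements, and the naming scheme and hybrid_promoters helper are likewise assumptions made for illustration.

```python
from itertools import product

# Placeholder sequences: hypothetical stand-ins, not the real elements.
UAS_ELEMENTS = {"UAS_TEF": "ACGTACGTAC", "UAS_CIT": "GGATCCGGAT"}
CORE_PROMOTERS = {"P_CYC": "TATAAAAGGC", "P_LEUM": "TATATAAGCA"}


def hybrid_promoters(uas_elements, cores, max_copies=3):
    """Enumerate tandem-UAS + core fusions, with the UAS 5' of the core."""
    designs = {}
    for (uas_name, uas_seq), (core_name, core_seq) in product(
        uas_elements.items(), cores.items()
    ):
        for copies in range(1, max_copies + 1):
            name = f"{uas_name}x{copies}-{core_name}"
            designs[name] = uas_seq * copies + core_seq
    return designs


designs = hybrid_promoters(UAS_ELEMENTS, CORE_PROMOTERS)
print(len(designs))  # 2 UAS x 2 cores x 3 copy numbers = 12 designs
```

In practice each design would still need to be built and measured; the enumeration only makes explicit how quickly the design space grows with the number of UAS elements, cores, and tandem copies.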

Fig. 2

Promoter engineering strategies. (a) Hybrid promoter engineering uses combinations of sequence motifs to modify expression and regulation. (b) Synthetic promoter scaffolds may be constructed based on native promoters with desired characteristics. These scaffolds can then be diversified using saturation mutagenesis and modified via hybrid promoter engineering. (c) Minimal synthetic core and enhancer elements may be selected using randomization followed by FACS. (d) Promoter elements have been fully characterized via high-throughput oligo library synthesis followed by FACS sorting into different expression bins. (e) Expression can be tuned by altering nucleosome occupancy using a nucleosome prediction model or via addition of nucleosome-disfavoring poly(dA:dT) tracts. Chromatin regulators (CRs) can program a diverse range of transcriptional logic when targeted to synthetic promoters, thus creating more efficient synthetic circuits

Finally, the hybrid promoter approach has been further generalized to mammalian systems. For instance, the binding site of repressor PDX1 in the hCMV promoter was removed, enhancing expression fourfold in transient luciferase experiments [61]. The traditional additive hybrid approach has also been generalized to mammalian hosts to increase expression [62], improve transgene expression in specific hosts [63, 64], and impart novel regulation on promoters. As an example, a strong, cold-inducible promoter was created by combining a mild-cold responsive enhancer (MCRE) with the hCMV promoter [65]. Using this promoter and shifting temperature from 37°C to 32°C afforded sixfold higher erythropoietin production. Collectively, these results indicate that hybrid promoter approaches are useful in both increasing net expression and imparting unique regulation.

3.2 Synthetic Promoter Scaffolds and Libraries

More recently, efforts have been made to establish synthetic and/or orthogonal [66, 67] promoters. Certainly bacterial systems can take advantage of the T7 RNA polymerase system [68] to generate short, synthetic, and orthogonal promoters for usage in logic gates [69–71]. However, the diversity of synthetic prokaryotic promoters is limited by the strict consensus promoter architecture not found in eukaryotes. To create a library of orthogonal core promoters in S. cerevisiae, native promoters were screened over a wide range of growth conditions to find a promoter scaffold that would exhibit the least amount of natural regulation [67]. The resulting candidate promoter, PFY1 (PPFY1), was then de-constructed to produce a minimal promoter scaffold (Fig. 2b) containing the ~100-bp core promoter, a Reb1p binding site, and a poly-dT element that maintained nucleosome depletion and constant DNA bending for constitutive RNA polymerase II access. By randomizing the spacer regions within this core promoter, a library of 36 minimally-regulated promoters with a tenfold dynamic range in expression was created. This same methodology has been generalized to other organisms including Pichia pastoris, where four natively regulated promoters were sequence aligned to create a set of minimal core promoters from which sequence elements were transferred to modify the native AOX1 promoter [72]. This same approach has been applied to human liver cells where a synthetic promoter scaffold with enhanced TF binding was created via the alignment of the hCMV and HEF1α promoters [64].

In an effort to generate more minimal, synthetic promoters, Redden and Alper [73] developed an S. cerevisiae minimal core promoter scaffold (Fig. 2c) by dissecting both the core element and the UAS element and identifying functional, minimal units using a library-based approach involving FACS analysis and a series of robustness tests. Ultimately, nine generic core elements with limited homology to the genome were isolated. The same methodical workflow was used to isolate six synthetic 10-bp UAS sequences that activated these synthetic core promoters. Finally, these elements were combined to generate a minimal promoter with 70% of the activity of GPD and an 80% reduction in size. Importantly, these promoters represent a minimal scaffold with highly defined consensus regions similar to those of prokaryotic promoters and thus these elements may be further rationally engineered for desired characteristics. Finally, in HeLa cells, synthetic 100-bp enhancers were created via construction of a library containing tandem repeats of random, micro-array printed 10-bp oligonucleotides [74]. This approach resulted in an enhancer with twice the strength of the hCMV enhancer. Thus, rationally constructing purely synthetic libraries can result in novel promoters with prescribed function across multiple hosts.

4 Sequence-Level Prediction and Specification of Promoters

Most of the methods described above rely heavily on repeated iterations of the synthetic biology design-build-test cycle [75, 76]. In contrast, the ability to specify promoter function at the DNA level would rapidly accelerate the field of synthetic biology by reducing the number of design cycles. This section describes many of the efforts that have been made toward this end.

4.1 Promoter Characterization and Standardization

Promoters, composed of a vast array of distinct regulatory elements, behave as a system that integrates an input from the host to produce an output: gene expression. As high-throughput oligo synthesis [77] and quantification of DNA, mRNA, and protein levels have improved, large combinatorial libraries may be generated to measure promoter performance across a wide range of contexts (Fig. 2d). For instance, in prokaryotes, the Ribosome Binding Site (RBS) controls the binding of the ribosome to the mRNA transcript, thus regulating gene expression at the translational level whereas the promoter regulates expression at the transcriptional level. The independent function of these two regulatory elements has been thoroughly characterized and modeled via the construction of a library containing combinations of 114 promoters and 111 RBSs [78]. Although the model could explain 96% of RNA levels, it explained only 82% of protein levels, demonstrating the complex regulation of prokaryotic gene expression at the translational level. Thus, it is important to consider RBS performance when designing expression cassettes in pathways.
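The notion that promoter and RBS act independently can be made concrete with a simple additive model on the log scale, log(expression) = μ + p_i + r_j, where p_i and r_j are per-promoter and per-RBS effects. The sketch below is not the model of [78]; the function name, the toy strengths, and the use of row and column means as estimators (valid for a complete table with one measurement per pair) are assumptions made for illustration.

```python
import math


def fit_additive_log_model(expr):
    """Fit log(expr[i][j]) = mu + p_i + r_j and return the R^2 of the fit.

    expr[i][j] is the (positive) expression of promoter i paired with RBS j.
    For a complete two-way table, row and column means of the log values
    give the least-squares estimates of the part effects.
    """
    logs = [[math.log(v) for v in row] for row in expr]
    n_p, n_r = len(logs), len(logs[0])
    mu = sum(sum(row) for row in logs) / (n_p * n_r)
    p = [sum(row) / n_r - mu for row in logs]
    r = [sum(logs[i][j] for i in range(n_p)) / n_p - mu for j in range(n_r)]
    ss_res = sum(
        (logs[i][j] - (mu + p[i] + r[j])) ** 2
        for i in range(n_p) for j in range(n_r)
    )
    ss_tot = sum(
        (logs[i][j] - mu) ** 2 for i in range(n_p) for j in range(n_r)
    )
    return 1.0 - ss_res / ss_tot


promoter_strength = [1.0, 5.0, 20.0]   # hypothetical relative strengths
rbs_strength = [1.0, 3.0]              # hypothetical relative strengths
expr = [[p * r for r in rbs_strength] for p in promoter_strength]
print(fit_additive_log_model(expr))    # close to 1: parts act independently
```

An R² well below 1 for such a fit would indicate context effects, that is, promoter and RBS combinations whose output is not predicted by the individual part strengths.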

Eukaryotic transcription is regulated by a complex “program” of TF binding and RNA polymerase II (RNAP II) recruitment, and thus underlying “design rules” can be extracted that determine how the orientation, copy number, and context of TFBSs affect transcription. To parse these design rules, Sharon et al. [79] created a combinatorial library varying these parameters for 75 transcription factors. Fluorescence-activated cell sorting (FACS) coupled with high-throughput sequencing of 6,500 barcoded promoters generated a large dataset that uncovered regulatory design rules for TFs. For instance, in promoters that contained a Gcn4p binding site, expression and binding site location were related via a periodic function. Using a similar high-throughput characterization technique in mouse liver cells, it was possible to rapidly screen thousands of rationally designed enhancer haplotype variants [80]. This study found that enhancers are highly robust to single nucleotide variation (SNV), but that combinations of SNVs have an additive negative effect on function. This study also determined novel expression-enhancing motifs and characterized predicted TFBSs, thus laying the foundation for future enhancer design rules. In mammalian hosts, a similar predictive model has been used to identify k-mers that denote enhancers recognized by certain TFs [81, 82]. This model can be trained on ChIP-seq data [83] to predict enhancers throughout the genome.

Whereas TFBSs with a well-characterized function may be added to tune expression rationally, sequence-function mapping for core promoters is less understood. The core promoter sequence determines how RNAP II binds in the TATA region, forms the pre-initiation complex to unwind the DNA directly downstream, scans for a TSS, and initiates transcription [84–86]. Moving towards rational design, 859 native S. cerevisiae promoters were characterized using flow cytometry to generate a model relating maximal expression to short oligo motifs (k-mers) which impact these steps [86]. Although this model only accounted for 25% of the variance in an aggregate test promoter set, it nonetheless mapped expression-enhancing and repressing characteristics to short motifs in the core promoter to allow prediction of novel synthetic promoters. These results were improved upon via construction and high-throughput characterization of 13,000 specifically designed synthetic core promoters [87], leading to a model relating expression to the presence and orientation of consensus core promoter regions. However, despite analysis of thousands of systematically designed core promoters, the design rules for sequence level specification of core promoter activity are much less understood than those for UAS manipulation.
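Models that relate expression to short motifs typically start from k-mer counts. The following sketch shows that feature-extraction step and a linear score built on top of it. The weights are invented for illustration and carry no biological meaning; the function names and the single-strand, overlapping-window counting convention are also assumptions, not details taken from [86].

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers on one strand of a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def linear_score(seq, weights, k):
    """Sum of weight x count over the k-mers listed in `weights`."""
    counts = kmer_counts(seq, k)
    return sum(w * counts[kmer] for kmer, w in weights.items())


# Hypothetical weights: positive = expression-enhancing, negative = repressing.
toy_weights = {"TATA": 1.0, "GCGC": -0.5}
print(linear_score("TATATAAGCGC", toy_weights, k=4))  # 2 x 1.0 + 1 x (-0.5)
```

In a real model the weights would be fitted to measured expression data, and the fraction of variance explained (25% in the study above) would indicate how much of promoter activity such short motifs capture.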

4.2 Thermodynamic Modeling and Prediction of Promoters

To fully expedite the synthetic biology design cycle, it is desirable to develop methods to design entire promoters de novo for predictable expression. In prokaryotes, thermodynamic models of ribosome interaction with mRNA secondary structure have been constructed to calculate the proportion of bound RBS-mRNA complexes, and thus translation rate [88, 89]. A thermodynamics-based RBS calculator was able to predict expression levels within a factor of 2.3 over an expression range of five orders of magnitude. Most importantly, this RBS calculator takes into account variations in translation rate depending on the genetic context of the RBS, thus allowing a “forward engineering” approach for novel applications.
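The core idea of such a calculator can be illustrated with a Boltzmann-type relation in which the relative translation initiation rate scales as exp(−β·ΔG_total), where ΔG_total is the predicted free energy change of ribosome binding. The exponential form, the value β = 0.45 mol/kcal, and the function names below are assumptions made for this sketch and are not stated in this chapter; computing ΔG_total itself requires an RNA folding model that is not shown.

```python
import math

BETA = 0.45  # apparent Boltzmann factor in mol/kcal; an assumed value


def relative_initiation_rate(delta_g_total, beta=BETA):
    """Relative initiation rate, proportional to exp(-beta * dG_total).

    delta_g_total is in kcal/mol; more negative means tighter binding.
    """
    return math.exp(-beta * delta_g_total)


def fold_change(dg_a, dg_b, beta=BETA):
    """Predicted expression ratio of design a over design b."""
    return (relative_initiation_rate(dg_a, beta)
            / relative_initiation_rate(dg_b, beta))


# A design that binds 5 kcal/mol more favourably is predicted to be
# exp(0.45 * 5), roughly 9.5-fold, more highly translated.
print(fold_change(-10.0, -5.0))
```

Because the relation is exponential, small errors in the predicted free energy translate into multiplicative errors in expression, which is consistent with prediction accuracy being reported as a fold factor rather than an absolute difference.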

Although eukaryotic transcriptional regulation involves countless protein factor binding events prior to transcription initiation, it is nonetheless possible to thermodynamically model individual steps as a surrogate for transcription initiation rate. A thermodynamic model incorporating both TF-DNA and TF-TF interactions was trained upon a promoter library containing different TFBS combinations using “effective TF concentration” as a floating parameter to fit the data [90]. Overall, the model predicted 56% of the variance in expression across a wide variety of TFBS arrangements, thus laying a foundation for de novo design of regulatory logic at the DNA sequence level.
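A minimal sketch of this type of model, assuming just two binding sites for a single TF, is given below. Each promoter state receives a statistical weight from the TF concentration, the site affinities, and a TF-TF cooperativity factor, and state probabilities follow from the partition function. The parameter names and the two-site simplification are assumptions for illustration and do not reproduce the model of [90].

```python
def two_site_occupancy(tf_conc, k1, k2, omega=1.0):
    """Equilibrium state probabilities for a promoter with two TF sites.

    tf_conc: effective TF concentration (the fitted "floating" parameter)
    k1, k2:  association constants of the two sites (TF-DNA interaction)
    omega:   cooperativity factor (TF-TF interaction); omega > 1 means
             two bound TFs stabilise each other
    """
    q1, q2 = tf_conc * k1, tf_conc * k2       # single-site weights
    weights = {
        "empty": 1.0,
        "site1": q1,
        "site2": q2,
        "both": omega * q1 * q2,
    }
    z = sum(weights.values())                 # partition function
    return {state: w / z for state, w in weights.items()}


print(two_site_occupancy(1.0, 1.0, 1.0))              # all four states equal
print(two_site_occupancy(1.0, 1.0, 1.0, omega=10.0))  # "both" dominates
```

To predict expression, each state would additionally be assigned a transcription rate (or a probability of recruiting the polymerase), and the parameters would be fitted to library measurements.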

To generalize this model further, other events in transcription initiation have been considered. Thermodynamic modeling of the TATA–TATA-binding protein (TBP) complex formed as a first step in the recruitment of RNAP II [91] and re-design of promoters with different consensus TATA boxes created a promoter library which predictably scaled with the thermodynamic affinity of TBP to each TATA Box [92]. Incorporating the thermodynamic model for the TBP–TATA complex with the previously developed model for TF-RNA Polymerase II and TF-TF binding [90] explained 75% of variance in promoter expression across a wide variety of genetic contexts. These examples demonstrate the utility of thermodynamically modeling transcription initiation steps as a means to predict expression. Since discovering promoters is highly important for uncharacterized mammalian hosts, thermodynamic sequence-level approaches have been used to predict novel promoters based on DNA structural properties such as duplex stability and bendability [93, 94]. In addition, mammalian promoter regions have been modeled at the sequence level using an “alpha score,” which describes the likelihood that a genomic region contains a promoter based on its nucleotide composition. Remodeling the X-linked gene cancer/testis antigen 1A promoter to have twice the alpha score improved expression in a non-quantitative manner [95]. Although predictive of high expression, these techniques are limited as they cannot design promoters de novo with prescribed expression. Nevertheless, they demonstrate the potential to use heuristic models for the design and prediction of DNA function.

4.3 Prediction and Rational Modulation of Promoter Nucleosome Occupancy

In eukaryotes, the secondary structure of promoter DNA wound around nucleosomes controls access to the transcription machinery [96]. As a result, the rational design of novel promoters must consider how primary sequence contributes to DNA secondary structure. Nucleosome occupancy at promoters strongly regulates gene expression because nucleosome binding can occlude TFBSs and RNAP II recruitment to the core promoter [97]. Accordingly, rational addition of a tunable nucleosome-disfavoring poly(dA:dT) element [91, 98, 99] upstream of the natural Gcn4p binding site in a synthetic His3-based promoter library afforded predictable control over nucleosome occupancy and thus expression [100]. Similarly, mutation of CpG islands known to be prone to methylation and silencing by histones eliminated promoter silencing during long-term transgene expression in embryonic stem cells [101]. Thus, nucleosome-disfavoring sequences may be considered part of the rational eukaryotic promoter engineering toolbox along with the addition of hybrid enhancers (Fig. 2e).

To map nucleosome occupancy to primary sequence for predictive engineering of promoters, a Hidden Markov Model (HMM) was trained on a genome-wide nucleosome map [102]. This model was utilized to investigate nucleosome occupancy of the previously mentioned TEF1 promoter library, demonstrating that expression correlated inversely with predicted cumulative nucleosome occupancy in a very robust manner. To create a predictive model, a greedy algorithm was developed which allowed re-design of native promoters for up to 16-fold greater strength [103]. Furthermore, this approach was used for the successful de novo design of synthetic yeast promoters. Importantly, sequence-level prediction of nucleosome occupancy affords a predictive method to optimize native promoters fully regardless of genetic context. As a result, future efforts in this area must consider the precise control of nucleosome occupancy to modulate expression.
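The greedy redesign idea can be sketched as follows: at each round, every allowed single-base substitution is scored and the one that lowers predicted occupancy the most is kept. The scoring function here is a deliberately crude, hypothetical stand-in (GC fraction) and not the HMM of [102]; the protected-position set, the function names, and the stopping rule are likewise assumptions.

```python
def toy_occupancy(seq):
    """Hypothetical stand-in score: GC fraction (higher = more occupied)."""
    return sum(base in "GC" for base in seq) / len(seq)


def greedy_redesign(seq, protected, score=toy_occupancy, max_mutations=10):
    """Repeatedly apply the single substitution that lowers `score` most.

    Positions in `protected` (e.g. TFBSs or the TATA box) are never mutated.
    """
    seq = list(seq)
    for _ in range(max_mutations):
        current = score("".join(seq))
        best = None  # (new_score, position, base)
        for pos in range(len(seq)):
            if pos in protected:
                continue
            for base in "ACGT":
                if base == seq[pos]:
                    continue
                trial = seq[:pos] + [base] + seq[pos + 1:]
                s = score("".join(trial))
                if s < current and (best is None or s < best[0]):
                    best = (s, pos, base)
        if best is None:      # local optimum: no substitution helps
            break
        seq[best[1]] = best[2]
    return "".join(seq)


original = "GCGCTATAGCGC"
redesigned = greedy_redesign(original, protected={4, 5, 6, 7}, max_mutations=3)
print(original, toy_occupancy(original))
print(redesigned, toy_occupancy(redesigned))
```

Swapping toy_occupancy for a sequence-based nucleosome occupancy predictor would turn the same loop into a redesign procedure of the kind described above, with the usual caveat that a greedy search can stop at a local optimum.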

4.4 Design of Synthetic Promoters with Controlled Chromatin Environment

Moving forward from nucleosome models, the context of eukaryotic DNA is important in considering promoter function. Specifically, eukaryotic DNA is wound around histone octamers in 147 base pair increments and packaged together tightly to create the “beads-on-a-string” backbone of the chromatin [104]. This structure is not composed randomly; in fact, the structure of chromatin surrounding genes has a direct impact on their regulation [105–111]. Thus, any endeavor to engineer promoters rationally as synthetic biology “parts” that exhibit defined functions in any genetic context must take into account the chromatin environment of the promoter.

The first step towards any rational bottom-up synthetic biology engineering approach is to parse design rules from the native system. To create design rules for chromatin-based control, a combinatorial library of zinc finger-based synthetic transcription factors was created with specific yeast chromatin regulators (CRs) tethered as the activation domain [112]. These CRs impact gene expression by regulating PIC formation, remodeling and assembly of nucleosomes, chromatin accessibility via histone modification, and transcriptional elongation. From this library screening approach, many different classes of CRs were delineated: activators and repressors, synergistic regulators, spatially encoded regulators that could repress transcription from a non-canonical position downstream of genes, and CRs that could activate or repress multiple genes simultaneously over a long range of genomic space. These minimal chromatin-based components can thus act as synthetic “parts” to create a diverse array of transcriptional logic and predictably tune expression by altering chromatin state. These initial efforts demonstrate the first work towards considering greater genetic context for promoters.

In closing, promoter discovery and characterization has progressed from genome mining to random mutagenesis to combinatorial and rational design. In some of these later cases, the use of computational models has been able to speed the design-build-test cycle. Although limitations still exist with respect to inducible promoters, pure synthetic design, and maximal expression levels, the field has progressed rapidly in recent years.

5 Terminator Discovery and Characterization

In addition to promoters, terminators serve as an important control point when tuning expression in circuits and pathways [113, 114]. Unlike promoters, terminator cataloguing has not been as extensive until recently. In fact, most commonly used terminators have been relics from past experiments and are not often the most efficient. As an example, commonly used terminators such as the native bacteriophage T7 terminator exhibit low termination efficiencies, meaning that transcriptional flux continues through the expression cassette and affects the regulation of downstream genes and limits polymerase recycling [113–115]. Furthermore, the collection of terminators available to researchers has traditionally been much smaller in breadth than promoters [116], thus limiting large-scale pathways and circuits because of the fear of genetic instability via homologous recombination [117, 118]. Terminators also serve as a control point to tune expression in eukaryotes via the stability of the 3′ end of the mRNA transcript [119–121]. Thus, the base of commonly used terminators must be diversified to meet pathway specifications via both discovery and engineering techniques. We highlight various approaches from terminator mining to synthetic design and models in the following sections.

5.1 Native Terminator Mining

To diversify initially from the commonly used terminator library in E. coli, an extensive library of 582 natural and synthetic terminators [122, 123] was constructed and analyzed for its termination efficiency [124]. To enable further terminator engineering, the study also delineated terminator design rules based on a mechanism where RNAP stalls at the U:A tract, allowing an RNA hairpin to form within the RNA exit channel and terminating transcription. It was shown that the composition of the terminator U-tract effectively controls polymerase dissociation and can thus be rationally designed to impact terminator strength. This work served as one of the more exhaustive studies for bacteria to determine alternative terminators for synthetic constructs.
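The hairpin-plus-U-tract architecture described above suggests a simple sequence scan. The sketch below looks for a perfect stem-loop immediately followed by a T-rich window on the coding-strand DNA (so the RNA U-tract appears as T). It is a toy pattern search with assumed thresholds (stem length, loop range, tract length, minimum T count) and makes no attempt to predict termination efficiency.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpin_t_tract(dna, stem=5, loop_range=(3, 8), tract=8, min_t=6):
    """Return (start, end) of the first stem-loop followed by a T-rich tract.

    Only perfect stems of fixed length are considered (a simplification);
    returns None when no such arrangement is found.
    """
    for i in range(len(dna) - stem + 1):
        left = dna[i:i + stem]
        for loop in range(loop_range[0], loop_range[1] + 1):
            j = i + stem + loop                  # start of the right stem arm
            right = dna[j:j + stem]
            downstream = dna[j + stem:j + stem + tract]
            if (len(right) == stem and len(downstream) == tract
                    and right == reverse_complement(left)
                    and downstream.count("T") >= min_t):
                return i, j + stem + tract
    return None


candidate = "CAC" + "GCCGC" + "TTCG" + "GCGGC" + "TTTTTTTT" + "CA"
print(find_hairpin_t_tract(candidate))
```

Varying the composition of the T-rich window in such a candidate is one way to explore, in silico, the U-tract design rule highlighted in the study; actual strengths would still have to be measured.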

In contrast to prokaryotic intrinsic termination, eukaryotic mRNA transcript stability is regulated by recruited protein factors such as the cleavage and polyadenylation specificity factor (CPSF) and cleavage stimulation factor (CstF) [125]. Thus, terminators must be characterized not only by their termination efficiency but also by their impact on mRNA and protein levels. Yamanishi et al. undertook the first genome-scale flow cytometry characterization of yeast terminators, determining that the majority of terminators enabling higher expression from a synthetic construct came from ribosomal protein genes [120]. A separate, high-capacity terminator library was constructed by selecting a subset of terminators originating from genes shown to have longer mRNA half-lives [121]. Characterization of this library established a direct relationship between terminator strength and mRNA half-life, thus laying the groundwork for terminator design rules. In addition, the utility of these alternative terminators was demonstrated by improved pathway flux when they were paired with promoters of similar or lower strength than those originally paired with a “traditional” terminator. Thus, terminators clearly serve as an important synthetic part that must be rationally specified to tune expression for metabolic engineering applications.
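One simple way to see why a longer mRNA half-life can raise expression without a stronger promoter is a first-order decay model. The sketch below is a textbook approximation, not the analysis used in the cited studies: it assumes constant synthesis, first-order degradation, and no dilution by cell growth, and the function and parameter names are hypothetical.

```python
import math


def steady_state_mrna(synthesis_rate, half_life_min):
    """Steady-state mRNA level for dm/dt = k_syn - k_deg * m.

    k_deg = ln(2) / half-life, so m_ss = k_syn * half-life / ln(2).
    Units: synthesis_rate in transcripts per minute, half-life in minutes.
    """
    if half_life_min <= 0:
        raise ValueError("half_life_min must be positive")
    k_deg = math.log(2) / half_life_min
    return synthesis_rate / k_deg
```

Under these assumptions, doubling the half-life doubles the steady-state transcript level at a fixed synthesis rate, which is the qualitative trend a stabilizing terminator would be expected to produce.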

6 Rational Construction of Terminators with Desired Characteristics

6.1 Hybrid Terminator Engineering

Similar to promoters, the hybrid engineering approach has yielded synthetic terminators with enhanced efficiencies. Multiple combinations of both native and synthetic termination signals were used to enhance the termination efficiency of the T7 terminator while retaining its orthogonality [126]. However, this hybrid approach faces limitations in eukaryotes because termination is a highly concerted process regulated by multiple disparate elements (Fig. 3a).

Fig. 3

Evolution of terminators in S. cerevisiae. (a) Unlike promoters, native terminators consist of many defined consensus motifs. (b) A minimal terminator scaffold was created by spacing these terminator motifs 10 bp apart. (c) This scaffold was engineered by rationally modifying the linkers between consensus motifs, adding upstream and downstream elements, and changing the length and sequence of consensus motifs. Licensing: Reprinted with permission from Curran KA, Morse NJ, Markham KA et al. (2015) Short, Synthetic Terminators for Improved Heterologous Gene Expression in Yeast. ACS Synth. Biol. 2015, 4, 824–832. doi:10.1021/sb5003357. Copyright © 2015 American Chemical Society

6.2 Synthetic Terminator Scaffolds and Libraries

To overcome the limitations of hybrid terminator engineering in yeast, a synthetic minimal terminator scaffold (TGuo) was constructed by stringing together defined consensus efficiency, positioning, and polyadenylation elements, which cooperate in the cleavage and 3′ polyadenylation of the mRNA transcript (Fig. 3b) [127]. This minimal scaffold was both diversified and enhanced using modified consensus termination elements and mRNA stability elements [128] to produce a library of rationally designed synthetic terminators (Fig. 3c) that were functional in multiple hosts and improved CAD1 expression for itaconic acid production [129]. Importantly, this technique allowed design rules to be delineated based on consensus element identity and spacing, enabling potential rational design of synthetic terminators. The resulting terminators were much shorter than native terminators, with the additional benefits of enhanced mRNA stability and increased protein production. Thus, as described for promoters above, once a fundamental understanding of molecular function is obtained, synthetic part design can proceed.
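The scaffold idea, consensus motifs joined by fixed-length linkers, can be sketched in a few lines. This is an illustration only: the motif sequences below are placeholders chosen for the example and are not taken from the published scaffold, the random linkers stand in for the rationally chosen linkers of the actual work, and `build_scaffold` is a hypothetical helper. The 10 bp spacing follows the Fig. 3b caption.

```python
import random

# Placeholder motifs (assumptions for illustration, not the published sequences).
MOTIFS = [
    ("efficiency_element", "TATATA"),
    ("positioning_element", "AATAAA"),
    ("polyA_site", "TTCAA"),
]


def build_scaffold(motifs, linker_len=10, seed=0):
    """Join motifs in order, separated by random linkers of linker_len bp.

    A seeded generator keeps the output reproducible between runs.
    """
    rng = random.Random(seed)
    parts = []
    for index, (_name, seq) in enumerate(motifs):
        if index > 0:
            linker = "".join(rng.choice("ACGT") for _ in range(linker_len))
            parts.append(linker)
        parts.append(seq)
    return "".join(parts)
```

Varying `linker_len`, the linker composition, or the motif list then gives a simple way to enumerate scaffold variants for screening.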

7 Sequence-Level Prediction and Specification of Terminators

Although the previously described methods of synthetic terminator design allow rational diversification of the terminator library, they are nonetheless limited by the natural sequence space. Pure de novo design of terminators requires a fundamental understanding of the constraints underlying terminator function. Very early studies have begun to elucidate underlying design principles for terminators; however, this area is lagging behind the progress made with promoters as described above.

7.1 Terminator Characterization and Standardization

To this end, high-throughput studies have been carried out to quantify terminator performance and identify predictive sequence features for design in both prokaryotes and eukaryotes. For instance, systematic variation of terminator U-tract and hairpin stem-loop sequences in the aforementioned E. coli terminator library [124] afforded optimal expression-enhancing consensus sequences for rational construction of synthetic terminators.

Both native and synthetic terminator libraries have been constructed and characterized to tease apart the functions of different terminator motifs [130] in regulating mRNA abundance in yeast [131, 132]. Characterization of these libraries showed that the AU-rich efficiency element upstream of the poly(A) site plays a major role in 3′ end processing and transcription termination. In addition, terminators were broken down into mono- and di-nucleotide k-mers, leading to the identification of dA:dT elements as a major determinant of terminator strength. From these studies, it appears that terminators can be broken down into a collection of tunable elements for rational design.
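The k-mer decomposition mentioned above can be sketched as follows. This is a generic feature-extraction example, not the pipeline of the cited studies; the function names are hypothetical, and k-mers containing characters other than A, C, G, or T are simply left out of the returned table.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Frequency of every A/C/G/T k-mer in seq, counted with overlapping windows."""
    seq = seq.upper()
    total = len(seq) - k + 1
    if total <= 0:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(total))
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: counts[kmer] / total for kmer in all_kmers}


def at_fraction(seq):
    """Fraction of positions that are A or T (the dA:dT content)."""
    freqs = kmer_frequencies(seq, 1)
    return freqs["A"] + freqs["T"]
```

Calling `kmer_frequencies(terminator, 1)` and `kmer_frequencies(terminator, 2)` gives the mono- and di-nucleotide feature vectors that could then be correlated with measured terminator strength.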

7.2 Thermodynamic Modeling and Prediction of Terminators

To generate a finer continuum of terminator function, it has become necessary to engineer entirely synthetic terminator sequences based on known design rules and thermodynamic prediction. In prokaryotes, multiple biophysical models have been developed to predict terminator strength based on elementary steps in termination, including U:A hybrid formation, hairpin formation, and mRNA transcript dissociation [122, 133, 134]. Training one of these models on a set of natural and synthetic terminators over a large dynamic range in termination efficiencies afforded a linear sequence-function model with a high coefficient of determination (R² = 0.81) [134].
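To make the notion of a linear sequence-function model and its coefficient of determination concrete, the sketch below fits a one-variable least-squares line and computes R². It is a generic illustration, not the cited model: the predictor `x` stands for a hypothetical model-derived score (for example, a summed free-energy term) and `y` for a measured terminator strength, and neither the feature definition nor any data are taken from the source.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

An R² of 0.81, as reported for the cited model, would mean that the fitted line accounts for about 81% of the variance in the measured response.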

In S. cerevisiae, however, terminator function is much less predictable from distinct sequence elements whose function is described by biophysical models. In fact, characterization of the aforementioned rationally designed synthetic library [129] demonstrated that consensus termination motifs were not entirely additive. This suggests there is a fundamental code underlying termination in yeast which remains to be uncovered before thermodynamic prediction becomes feasible. However, with a more rigidly defined architecture than promoters, yeast terminators are highly amenable to rational engineering for desired characteristics. Thus, fundamental models that describe eukaryotic termination and half-life stabilization are required to advance the field of terminator engineering.

8 Future Directions in Promoter and Terminator Engineering

Improved promoters and terminators help minimize the length of the design cycle. Optimal design of these elements must meet three criteria: robustness, orthogonality, and predictable tunability. Promoters and terminators must be robust in that they function consistently regardless of genetic background, genetic context, and cellular environment [135]. Unexpected deviation from the desired promoter or terminator function severely hinders the rapid development of circuits and pathways and leads to multiple iterations of the design cycle. To improve robustness, efforts have been made to create synthetic promoter scaffolds based on highly constitutive promoters which function consistently across many different cellular environments. However, to date, few significant efforts have been made to engineer eukaryotic promoters that are robust to differing genetic contexts. These efforts are also complicated by the fact that eukaryotic promoters are highly regulated by the chromatin environment in which they are placed. It is thus imperative to develop design rules that describe how the chromatin environment affects promoters and terminators, so that these factors can be predicted and controlled for optimal gene expression. Purely orthogonal elements promise to bypass some of these robustness issues, as such promoters and terminators seem to function more ubiquitously. Overall, many strides have been made in the past 5 years to provide novel expression capabilities to promoters and terminators. However, because of the regulatory complexity of microbial hosts, new techniques must be developed to predict and design promoters and terminators for desired function. Nevertheless, these new synthetic parts have greatly improved the ability to engineer strains for metabolic engineering and synthetic biology applications.