Introduction: Why recode organisms?

Human manipulation of the genetic code began in the 1960s as molecular geneticists isolated nonsense and missense suppressor mutations. There, tRNAs were altered to insert “incorrect” amino acids at certain positions in proteins, but such mutations lead to an ambiguous code generating variable products and inefficient protein production (Kaplan 1971; Rogers et al. 1992). Recently, genome-scale modification of the genetic code has become feasible, which could enable construction of organisms with unambiguous alternative genetic codes.

A specific codon can be replaced with a synonymous one in the degenerate 64-codon genetic code (Plotkin and Kudla 2011). Done globally, together with removal of the corresponding tRNA, this eliminates the codon from the genome and frees it for reassignment to another use (or to no use at all). Recoding, or changing a codon’s use in a genome, has been observed naturally in dozens of organisms, though most often for stop codons (Ivanova et al. 2014; Ling et al. 2015). By synthetically recoding organisms, we can gain several valuable features (Lajoie et al. 2016; Mukai et al. 2017).
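The synonymous replacement step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the design procedure of any cited study: the function names (`split_codons`, `translate`, `recode`) and the toy sequence are invented for the example, the standard genetic code is assumed, and the input is assumed to be a single coding sequence already in frame. It ignores everything beyond protein identity, including the codon usage effects on gene regulation discussed later in this review.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split an in-frame coding sequence into a list of codons."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate a coding sequence with the standard code ('*' = stop)."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))


def recode(cds, target, replacement):
    """Replace every in-frame `target` codon with a synonymous `replacement`."""
    if CODON_TABLE[target] != CODON_TABLE[replacement]:
        raise ValueError("replacement codon is not synonymous with target")
    return "".join(replacement if c == target else c for c in split_codons(cds))


# Toy gene (hypothetical): ATG TTA GCC TAG. The out-of-frame "TAG" that spans
# the TTA/GCC boundary is left alone; only the in-frame stop codon is changed.
original = "ATGTTAGCCTAG"
recoded = recode(original, "TAG", "TAA")
assert translate(recoded) == translate(original)
```

Working codon by codon rather than by plain string substitution is the point of the sketch: a text search for "TAG" would also hit out-of-frame matches and change the protein.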

Repurpose codons for non-natural amino acids

With a free codon and tRNA available, non-natural amino acids could be introduced at an unprecedented ~100% incorporation efficiency. Already, tRNA engineering has enabled incorporation of non-natural amino acids into proteins (Dumas et al. 2015; Wang et al. 2014; Young and Schultz 2010), but efficiency is limited due to competing natural translation processes. New amino acids may improve and even expand protein functions (Wang et al. 2006; Xiao et al. 2015), such as by fluorination (Marsh 2014). A novel proteomic signature would also help in identifying escaped engineered organisms.

Virus resistance

In industrial fermentation, virus contamination is a significant issue: entire production runs can be lost to a single bacteriophage (Jones et al. 2000), and phage infection is a longstanding concern for the lactic acid bacteria used in the dairy industry (Garneau and Moineau 2011; Samson and Moineau 2013). Recoded cells in which specific tRNAs have been removed or reassigned to a novel amino acid should be broadly unable to decode infective nucleic acid messages, such as those from viruses. A bacterial strain that cannot recognize a common sense codon should be unable to translate essentially any phage gene.
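A toy model makes the last claim concrete. The sketch below treats a host's genetic code as a lookup table with one sense codon removed and asks which incoming genes can still be decoded end to end. It rests on a strong simplifying assumption that is not taken from the cited work: any codon without an assignment is treated as a hard translation failure. The gene sequences and function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def host_code(removed_codons):
    """Return a copy of the standard code with some codons left unassigned."""
    return {c: aa for c, aa in STANDARD_CODE.items() if c not in removed_codons}


def can_translate(cds, code):
    """True if every in-frame codon of `cds` has a meaning in `code`."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    return all(codon in code for codon in codons)


def fraction_translatable(genes, code):
    """Fraction of genes that the host can decode from start to finish."""
    return sum(can_translate(g, code) for g in genes) / len(genes)


# Hypothetical "phage" genes; two of the three use the leucine codon TTA.
phage_genes = ["ATGTTAGCCTAA", "ATGCTGTTAAAATAA", "ATGGCCGCCTGA"]
recoded_host = host_code({"TTA"})

print(fraction_translatable(phage_genes, STANDARD_CODE))
print(fraction_translatable(phage_genes, recoded_host))
```

In this toy set, the unmodified code reads all three genes while the recoded host reads only the one that happens to avoid TTA. Real genes are hundreds of codons long, so the chance of avoiding a commonly used codon entirely becomes very small, which is the intuition behind the resistance argument.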

Resistance to horizontal gene transfer

A general problem for the release of engineered microbes into the wild is that, unlike higher animals and plants, microbes readily exchange DNA with each other across species barriers. Synthetic biologists have envisioned and constructed bacteria to decontaminate pesticide-contaminated fields (Mattozzi and Keasling 2010), non-invasively diagnose the presence of chemicals in the gut (Kotula et al. 2014; Riglar et al. 2017), or photosynthetically synthesize biofuels in open ponds (Savage et al. 2008). Such organisms could exchange DNA with other unengineered microbes, with unpredictable environmental consequences. Recoding can block functional horizontal gene transfer in both directions: reassigning stop codons as sense codons and inserting them throughout coding sequences would make recoded host genes unreadable by most other microbes, while removing sense codons would make foreign DNA unreadable in the recoded host.

Biocontainment

Repurposing codons for non-natural amino acids also allows for the development of improved auxotrophs. Synthetic amino acids not found in nature can be inserted into essential genes, so that the organism cannot survive without a feedstock of the non-natural amino acid. This could create a realistic version of Michael Crichton’s “lysine contingency” biocontainment in Jurassic Park (New York: Ballantine Books, 1990). Another potential strategy uses a toxin to prevent DNA transfer from the engineered organism to its environmental neighbors. If a non-recoded, broad-range toxin sequence (e.g., an endonuclease) is added to a transgenic cassette, the recoded host cannot express the lethal gene, while other organisms that acquire the cassette can. The toxin also selects against reacquisition of the native tRNA machinery that was repurposed during recoding.

New/improved functions and genome reduction

Since recoding methods involve entire genome synthesis (discussed below), new gene clusters can be concurrently inserted, along with deletions for genome reduction. Many genes needed to adapt in uncertain, changing environments (Hutchison et al. 2016) are unneeded in controlled settings such as bioreactors. For one-trick pony industrial strains, the compacted genome itself could improve stability (Csorgo et al. 2012).

Learn fundamental biology

Besides engineering applications, recoding can be a platform to address biological questions. Studying how cells respond to massive codon replacements may reveal new properties of global transcription and translation mechanisms. Recoding of viruses has already helped elucidate mechanisms: comparison with the native virus identifies key sequences and codon usage properties (Martinez et al. 2016). Additionally, non-natural amino acid labeling can selectively tag all proteins of a pathway, enabling systems-level mechanistic studies.

Assembly of recoded organisms: recent efforts

Only a few genome-wide recoding efforts have been published. While major advances have been presented for viruses (Coleman et al. 2008, 2011; Martinez et al. 2016) and yeast (discussed briefly), this section focuses on efforts to recode bacteria (Escherichia coli and Salmonella). Assembly methods can be classified into three categories: (1) editing the existing genome, (2) rebuilding by segments, and (3) complete de novo construction (Fig. 1).

Fig. 1

Recoding assembly strategies. Current recoding methods can be categorized as (a) editing the existing genome, (b) rebuilding by segments, and (c) complete de novo synthesis. (a) Site-specific point mutations are made throughout the native genome to change target codons, using oligonucleotides. (b) The native genome is rebuilt in the native host organism through an iterative stepwise procedure with synthetic DNA segments containing the recoding changes. Shown, segments are integrated by homologous recombination. As in (a), different sections can be built separately and combined in a single strain downstream. (c) An entire genome is made de novo from synthesized fragments and assembled in one pot, bypassing the need for the native genome (and maybe organism). Larger fragments are successively built up from smaller ones.

Editing existing: E. coli TAG recoding by MAGE and CAGE (2011–2013)

A first success came from George Church’s lab, which used multiplex automated genome engineering (MAGE) to change all 321 TAG stop codons to TAA in E. coli (Isaacs et al. 2011). In this method, short oligonucleotides make site-specific codon changes through recombination events (Fig. 1a). The strategy made about 10 changes per strain in parallel across 32 strains and then combined the results using bacterial conjugation, termed conjugative assembly genome engineering (CAGE). Completion of the TAG-to-TAA recoded E. coli (“rE.coli”) strain included deletion of release factor RF1, which recognizes the UAG stop codon (Lajoie et al. 2013b).
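The divide-and-combine arithmetic behind this strategy can be sketched as follows. The numbers 321 and 32 come from the text above; the pairwise merging schedule is an assumption made for illustration, since the text only states that strains were combined by conjugation, and the function names are hypothetical.

```python
import math


def split_sites(n_sites, n_strains):
    """Distribute n_sites edits as evenly as possible over n_strains."""
    base, extra = divmod(n_sites, n_strains)
    return [base + 1 if i < extra else base for i in range(n_strains)]


def pairwise_merge_rounds(n_strains):
    """Rounds needed if strains are merged two at a time until one remains."""
    return math.ceil(math.log2(n_strains))


loads = split_sites(321, 32)
print(min(loads), max(loads), sum(loads))  # edits per strain and total
print(pairwise_merge_rounds(32))           # assumed pairwise schedule
```

Under these assumptions each strain carries 10 or 11 edits, and halving the number of strains at every step would bring 32 strains down to one in five rounds. The appeal of the scheme is that the editing work is parallel while the merging work grows only logarithmically with the number of strains.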

rE.coli was shown to have the expected advantages of recoding. It is resistant to viruses (Ma and Isaacs 2016), and it was further engineered to incorporate a non-natural phenylalanine-derived amino acid, creating auxotrophic strains that depend on a supplied synthetic amino acid. These strains showed undetectable escape, at rates below 1 in 10^12, for effective biocontainment (Rovner et al. 2015).

MAGE-based approaches were also used to look at viability consequences of recoding essential genes (Lajoie et al. 2013a; Napolitano et al. 2016), because of the importance of codon usage bias in controlling aspects of gene regulation (Goodman et al. 2013; Quax et al. 2015; Tuller et al. 2010a, b).

Rebuilding by segments: integrase-based 50 kb fragments in E. coli (2016)

Moving on from MAGE, the Church Lab developed another method involving complete synthesis and lambda phage integrase recombination (Ostrov et al. 2016). They also created design software to automate the recoded genome blueprint. Entire 50-kb segments of recoded DNA were synthesized de novo as 2–4 kb fragments and combined in yeast with a plasmid backbone. This backbone carries an attP integrase site, which allows integration, in a multi-step process, into a strain modified with a corresponding attB target site. The method was used to reduce the code from 64 to 57 codons (over 62,000 replacements for “rE.coli-57”) across 87 strains, although final hierarchical assembly into a single strain remains a work in progress.
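A short sketch shows how such a hierarchy translates into fragment counts. The sizes (50-kb segments, 2–4 kb fragments, 87 strains) are taken from the text above; the `tile` helper, its `overlap` parameter, and the assumption that every segment is a full 50 kb are illustrative and not part of the published method.

```python
def tile(length, size, overlap=0):
    """Cover [0, length) with windows of at most `size`, stepping by size - overlap."""
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    tiles = []
    start = 0
    while True:
        end = min(start + size, length)
        tiles.append((start, end))
        if end == length:
            break
        start += step
    return tiles


SEGMENT_BP = 50_000   # segment size from the text
N_SEGMENTS = 87       # one segment per strain, as described

# Fragment counts per segment at the two ends of the 2-4 kb range.
fewest = len(tile(SEGMENT_BP, 4_000))
most = len(tile(SEGMENT_BP, 2_000))
print(fewest, most)
print(N_SEGMENTS * fewest, N_SEGMENTS * most)  # assumes full 50-kb segments
```

With no overlap, each segment needs between 13 and 25 fragments, so the whole design implies on the order of one to two thousand synthesized fragments under these assumptions. In practice, neighboring fragments need shared sequence for assembly in yeast, which the hypothetical `overlap` argument can model.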

Rebuilding by segments: testing replacement schemes, REXER in E. coli (2016)

An assembly method from Jason Chin and colleagues—named replicon excision for enhanced genome engineering through reprogrammed recombination (REXER)—uses the larger bacterial artificial chromosome (BAC) for 100-kb segment replacements in an iterative stepping process (Fig. 1b), also using yeast-assisted assembly of synthetic recoded DNA fragments (Wang et al. 2016). The recoded section is excised by Cas9 after transformation and integrated into the genome by lambda Red homologous recombination. In addition, Wang et al. demonstrate a troubleshooting technique for unviable or poorly growing strains due to recoding. Their efforts highlight the major difficulty that many designed synonymous replacement schemes will be unviable, even on a small scale.

Rebuilding by segments: Salmonella leucine recoding using SIRCAS (2017)

A method we (Pamela Silver Lab) developed also uses homologous recombination and tiled antibiotic resistance marker stepping, shown to make 1557 synonymous leucine replacements across 176 genes in Salmonella typhimurium (Lau et al. 2017). Named SIRCAS for stepwise integration of rolling circle amplification segments, the method uses 10–25 kb linear fragments of synthetic DNA obtained from rolling circle amplification of constructs assembled in yeast. This method requires only an initial genomic integration of inducible lambda Red recombination genes, allowing a rapid two-day turnaround for recoded segment integration.

Complete de novo synthesis: a minimal genome in Mycoplasma (2016)

Though not a codon reassignment effort, the major achievement of creating a minimal genome for the already efficient Mycoplasma mycoides (Hutchison et al. 2016) presents an alternative assembly method. Using massive construction from oligonucleotides to assemble increasingly larger fragments (Fig. 1c), the genome was reduced from 1079 to 531 kb. The herculean procedure combined a pragmatic approach of expansion and contraction with knowledge of essential genes and a Tn5 transposon disruption map. Remarkably, the newly synthesized genomes were introduced into Mycoplasma cells, which then replicated the genome to yield viable strains.

Rebuilding by segments: synthetic yeast chromosomes (2017)

The Synthetic Yeast Genome Project is a huge effort across many organizations to build complete yeast chromosomes from scratch. A set of seven papers published in Science (March 10, 2017) describes construction of five complete chromosomes, which included recoding TAG stop codons to TAA and deleting all tRNA genes, which are to be moved to a tRNA-only chromosome (Richardson et al. 2017). Their methods use yeast’s natural homologous recombination to integrate 30–60 kb segments of recoded DNA, similar to the iterative segmented rebuilds of bacteria. Notably, the yeast project also includes a troubleshooting strategy (Mitchell et al. 2017) that may be useful for bacterial efforts. Recoding methods for yeast can augment yeast genetics studies useful for industrial purposes (Cubillos 2016; Snoek et al. 2016).

Comparison of methods

The best recoding approach will likely incorporate aspects of several methods. All share a global strategy of evaluating partially recoded strains for viability before piecewise assembly into a single organism. Notably, methods are interchangeable in that recoded DNA can be taken from a viable strain and transferred to another, such as using REXER to combine 50 kb sections of rE.coli-57 precursors or 100 kb sections recoded by SIRCAS. The MAGE-based methods, which make a handful of changes in a single strain, may be useful in later stages of recoding or in adjusting unviable designs (Ostrov et al. 2016). Strain parallelization in each method gives the possibility of rapid construction.

Construction methods are in place, but troubleshooting methods all require a laborious process filled with trial-and-error. Though groups have tried identifying canonical “rules” for sense codon recoding, many of the empirically found guidelines might only apply to those specific sequences/organisms. A robust troubleshooting process would be a major lift to the field and is an essential part of the assembly process. In addition, improved speed (and cost) in high fidelity DNA synthesis would be a huge boost toward fully recoding organisms at the megabase scale.

The future

While several powerful assembly methods have been described, we have only had a glimpse of the properties so attractive in theory. Strains with greater instances of codon replacements are needed to truly attain these properties. For example, many infective messages may not contain the TAG stop targeted in recoding, or viruses may adapt (Ivanova et al. 2014). Promisingly, many partially recoded strains discussed have similar overall growth as wild-type versions or could have reduced fitness improved through evolution (Wannier et al. 2017).

Lessons from natural recoding?

A deeper understanding of evolutionary events in natural genome recoding may reveal new evolution-based strategies to complement the rebuild recoding methods developed to date. Recoding has been observed over twenty times throughout the tree of life (Knight et al. 2001; Ling et al. 2015). Many of these organisms are bacteria with reduced genome sizes and/or AT-rich composition, with theories that events leading to these properties resulted in recoding (McCutcheon et al. 2009; Osawa et al. 1992). Similar mechanisms have been proposed for mitochondria, where across species eight sense and all three stop codons are reassigned (Sengupta et al. 2007), often several together in the compact genomes (Adams and Palmer 2003). While this may suggest a role of using genome reduction in synthetic recoding, these evolutionary mechanisms based on altered global genome properties are likely not effective on the rapid time scales desired.

However, codon reassignments in large eukaryotic genomes, as in yeasts [4–8 K genes, 9–19 Mb (Riley et al. 2016)], likely required codon-specific selective pressure. In two separate yeast clades, leucine codon CTG is reassigned to translate as either alanine or serine. Species diverging prior to the predicted recoding event contain thousands of CTG positions in coding regions (Riley et al. 2016) that are not conserved in recoded species (Muhlhausen and Kollmar 2014). These CTGs were proposed to be systematically disfavored and driven to rarity by “mischarging” of tRNA(CAG), which happens in extant yeasts such as Candida albicans (Massey et al. 2003), or an inability to translate CTG efficiently due to the loss of tRNA(CAG) (Muhlhausen et al. 2016). A more thorough analysis and identification of recoded species lineages may uncover evolutionary paths that inspire synthetic efforts.

Even without new evolutionary insights, a pragmatic option may be to apply selection pressures against a specific codon in an experimental setup, to mirror natural evolution toward reassignment. Usage of a specific codon might be disfavored by introducing a competing tRNA isotype to increase missense errors (Ruan et al. 2008; Santos et al. 1999) or impairing translation by deleting tRNA genes (Bloom-Ackermann et al. 2014). Such pressure may allow recoding instances through non-synonymous routes while maintaining viability, potentially fixing unviable strains in whole-rebuild methods.

Other natural examples may inspire more original strategies. In select ciliates, all three stop codons have added sense meanings (Heaphy et al. 2016; Swart et al. 2016), another possible expansion strategy. Instead of being permanently encoded genome-wide, recoding might also operate in real time: Mycobacterium bovis uses a hypoxic stress-induced tRNA modification coupled with a distinctly codon-biased set of stress response genes to enter a state of dormancy (Chionh et al. 2016). The bacterium Acetohalobium arabaticum dynamically expands its code to incorporate pyrrolysine when grown with trimethylamine (Prat et al. 2012). Perhaps an inducible system can be designed where only some genes are recoded under certain conditions.

Economics of genome recoding and DNA synthesis

We estimate the total cost of recoding an E. coli-sized bacterial genome (~5 Mb) to be a few million US dollars. This includes raw DNA synthesis plus assembly into large pieces and incorporation by stepwise replacement. The price tag should be considered with the potential benefits in a multi-billion-dollar fermentation industry (Erickson and Winters 2012). Growing genome-scale recoding efforts could fundamentally change the economics of DNA synthesis. Large-scale orders for recoded genomes are easy to conceptualize, can be designed computationally based on annotated genome sequences, and can be ordered at scale, minimizing processing costs. In contrast, conceptualization and rational design of a multi-component genetic circuit of even 10 kb can still be intellectually prohibitive. We see completion of genome recoding efforts as playing a key role in driving down DNA synthesis costs by increasing demand.
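As a rough check on what this estimate implies per base pair, the arithmetic can be written out. The genome size (~5 Mb) is from the text; reading "a few million" as 2 to 4 million US dollars is an assumption made only for this illustration, and the function name is hypothetical.

```python
def implied_price_per_bp(total_cost_usd, genome_size_bp):
    """All-in cost per base pair implied by a total project estimate."""
    return total_cost_usd / genome_size_bp


GENOME_SIZE_BP = 5_000_000  # ~5 Mb, from the text

# Assumed readings of "a few million US dollars".
for total in (2_000_000, 3_000_000, 4_000_000):
    print(total, round(implied_price_per_bp(total, GENOME_SIZE_BP), 2))
```

Under these assumed totals, the implied figure is roughly 0.4 to 0.8 US dollars per base pair. Note that this is an all-in number covering synthesis, assembly into large pieces, and stepwise incorporation, so it should not be compared directly with a raw synthesis price.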

Final remarks

Along with recoding current industrial strains, the promise of synthetic genome recoding is to create versatile, genetically isolated base strains on which to build desired functions. Methods are now in place to fully recode a sense codon in bacteria; the major remaining hurdles are the lack of a robust troubleshooting method for non-viable designs and the still unknown effects of such large-scale codon replacements. Since a cost of millions of dollars may be prohibitive for one lab, academic recoding may benefit from large-scale collective efforts, such as the Synthetic Yeast Genome Project, as the process is so amenable to partitioning.