Significance statement

Allele replacement is an important application of genome editing that has the potential to overcome the genetic linkage of negative alleles in conventional breeding and to accelerate the introduction of novel desirable traits into plants. The article discusses diverse genome editing methods for improving different plant traits by allele replacement, including the recently developed base and prime editing. A comparative discussion of the gene editing literature in vertebrates provides the reader with the latest advances in the field. Remaining bottlenecks are highlighted and areas for future research are discussed.

Inez H. Slamet-Loedin, International Rice Research Institute, Los Baños, Philippines.

Introduction

The past decade has seen the advent and refinement of a wide range of technologies designed to edit nucleic acid sequences in vivo. CRISPR–Cas endonucleases (Zhang 2019), alongside other technologies such as zinc finger nucleases (ZFNs) (Urnov et al. 2010), TAL effector nucleases (TALENs) (Christian et al. 2010) and meganucleases (Epinat et al. 2013), have enabled the facile generation of targeted DNA double-strand breaks (DSBs), the first step in producing most engineered genetic edits to date. Without further intervention, the first responder to an otherwise lethal DSB in the plant genome is the error-prone non-homologous end joining (NHEJ) repair machinery. A frequent outcome of NHEJ is a frameshift insertion/deletion (indel) mutation, which typically results in a gene knockout. Gene knockouts have proven useful in elucidating the function of plant genes (Curtin et al. 2018; Yuan et al. 2019; Shu et al. 2020) and in engineering multiple traits of commercial value in crops including soy, potato, rice, tomato, citrus and wheat (Haun et al. 2014; Clasen et al. 2016; Wang et al. 2019; Gao et al. 2020b; El-Mounadi et al. 2020). In fact, the first product from a plant variety with engineered knockouts of two gene homologs, a high oleic soybean oil, entered the market in 2019 (Calyxt 2019).

However, in terms of the potential that gene editing offers to revolutionize plant science and crop improvement, these achievements represent only the tip of the iceberg. Many important agronomic traits arise from subtle changes to gene structure that fine-tune the biochemical function of the encoded product and/or the level of gene expression, rather than from loss-of-function mutations (Hua et al. 2019). Indeed, the analysis by Meyer and Purugganan (2013) showed that single nucleotide polymorphisms (SNPs), transposon insertions and other types of genetic modifications are the causal mutations in more than half of 62 analyzed domestication and diversification genes across 23 different crop species. Many of these alterations result in functional missense mutations or have cis-regulatory effects that would be difficult to engineer by introducing indels. Moreover, there is increasing evidence that the genetic complexity underlying domestication and crop improvement consists of a large number of minor-effect QTL alleles, likely organized into an epistatic network (Chen et al. 2020; Soyk et al. 2020). Ultimately, additional versatility in the gene editing toolset is needed to tackle this complexity.

In the presence of a homologous donor repair template (DRT), DSBs also increase the frequency of homology-directed repair (HDR) nearly 100-fold (Puchta et al. 1996). HDR-mediated generation of pre-designed edits, referred to as gene targeting (GT) (Paszkowski et al. 1988), had until recently been the only way to create targeted sequence variation other than indels. Although GT remains the most versatile method, capable of all types of precise editing (targeted sequence insertions, replacements and point mutations), successful sequence insertion and replacement have also been demonstrated via NHEJ (Li et al. 2016; Bonawitz et al. 2019; Lu et al. 2020; Dong et al. 2020b). Additionally, tools that are completely independent of endogenous DSB repair pathways, such as cytosine and adenine base editors (Komor et al. 2016; Gaudelli et al. 2017), prime editing (Anzalone et al. 2019) and site-specific recombinases (Kapusi et al. 2012; Gao et al. 2020a; Pathak and Srivastava 2020), have also been validated in plants. Each of these technologies achieves precise editing with a different level of success (Fig. 1). This review explores the constraints and benefits of these approaches, with the goal of providing a guide to selecting genome editing strategies for various applications in plants.

Fig. 1

Technologies for allele replacement in plants

Precise editing by HDR mediated allele replacement (gene targeting)

In the GT approach, a DRT is constructed by flanking the desired sequence modifications on each side with regions of homology to the target locus, often referred to as homology arms. When the DRT is delivered to a target cell while a DSB is simultaneously induced in the genome, the DRT can serve as a template for HDR that proceeds via synthesis-dependent strand annealing (for a review of the mechanism of HDR in plants, see Knoll et al. 2014), resulting in incorporation of the edits into the genome. However, more than 30 years after its first demonstration in plants (Paszkowski et al. 1988), and despite access to targeted DSB-inducing CRISPR–Cas nucleases and other tools, this approach has yet to become widely applied in functional genomics and crop improvement. The main reason is that the frequency of HDR in somatic cells is very low, making routine allele replacement at scale impractical in plants. GT is typically successful in only 0.1–1% of recovered plants, and although some of the improvements described below have increased these efficiencies, surpassing 10% can be considered exceptional. In stark contrast, NHEJ repair of targeted DSBs can generate indels with close to 100% efficiency in some species (Pan et al. 2016; Ueta et al. 2017; Lee et al. 2019b; Malzahn et al. 2019). This difference is partly due to the restriction of HDR to the late S and G2/M phases of the cell cycle, whereas NHEJ operates in both dividing and non-dividing cells (Mao et al. 2008; Charbonnel et al. 2011). Simply put, the main hurdle to effective GT is creating conditions under which HDR is favored over NHEJ. The strategies explored to this end are discussed below.
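The practical cost of such low frequencies can be made concrete with a simple binomial calculation. The sketch below assumes that each regenerated plant independently carries a GT event with a fixed frequency p (a simplification, since frequencies vary between experiments, loci and lines) and returns the smallest number of plants n for which at least one event is recovered with a chosen confidence c, i.e. n ≥ ln(1 − c)/ln(1 − p). The function name `plants_needed` is illustrative.

```python
import math


def plants_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest number of independent plants n such that the probability of
    recovering at least one GT event, 1 - (1 - p)**n, reaches `confidence`.

    Assumes every plant carries a GT event independently with frequency p.
    """
    if not (0 < p < 1 and 0 < confidence < 1):
        raise ValueError("p and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


if __name__ == "__main__":
    for p in (0.001, 0.01, 0.1):
        print(f"GT frequency {p:.1%}: screen {plants_needed(p)} plants")
```

Under these assumptions, 95% confidence requires 299 plants at p = 1% and 2,995 at p = 0.1%, but only 29 at p = 10%, which illustrates why even a few-fold gain in GT frequency matters in practice.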

Improving the availability of the DRT by in planta gene targeting

In theory, one way to enhance the use of exogenous DNA as a template for DNA repair is to make it easily accessible to the repair machinery. This can be done at quantitative, spatial and temporal levels. Quantitatively, either the copy number of the DRT in the cell or the number of cells that receive the DRT can be increased. Using the latter strategy, Fauser and colleagues were the first to address DRT availability at both the quantitative and temporal levels, in an approach named in planta gene targeting (ipGT) (Fauser et al. 2012). The two features of ipGT are (1) the stable integration of each component (nuclease, DRT and, optionally, the target gene) in the genome, such that every cell in the transgenic plant is competent for GT (a quantitative increase compared with direct delivery to a subset of cells), and (2) the flanking of the DRT with nuclease recognition sites to allow its controlled release from the genome by the same nuclease used to induce the DSB at the genomic target (semi-temporal control: the DRT is converted into its extrachromosomal, presumably more recombinogenic form in the cells that are also actively inducing DSBs in the genome). The components, namely the I-SceI meganuclease as the DSB inducer, the DRT flanked by I-SceI recognition sites and a truncated gus reporter gene as the target, were brought together by crossing the respective transgenic lines (introducing another level of temporal control). GT was observed in 0.83% of Arabidopsis F2 progeny in this pilot study, on par with the frequency of T-DNA transformation. In maize, antibiotic selection of the edited tissue and two rounds of tissue culture were necessary to recover 0.55% of ipGT events (Ayar et al. 2013), suggesting that species-specific differences in ipGT might exist. The approach was later simplified by delivering the nuclease and the DRT simultaneously in a single vector.
Circumventing the need for crossing to activate ipGT removed one layer of temporal control, and possibly the quantitative advantage (if the DRT is excised before it is integrated into the genome and passed on to new cells by cell division), but it also accelerated the recovery of edited plants by at least one generation. When Cas9 from Streptococcus pyogenes (SpCas9) was used to target an insertion into an endogenous locus in Arabidopsis, 0.14% of T2 plants were positive for GT (Schiml et al. 2014).

This simplified ipGT design has since been used by other groups in Arabidopsis with similar success rates (Zhao et al. 2016; Hahn et al. 2018), and the use of Cas9 from Staphylococcus aureus (SaCas9), a more efficient DSB inducer, expressed from the egg-cell-specific EC1.2 promoter further increased the average GT frequency in the T2 generation to ~1% (Wolter et al. 2018). ipGT with SpCas9-mediated DRT release has also enabled targeted gene insertion in 1.29% of tomato primary transformants (Danilo et al. 2018). A significant improvement of the one-vector ipGT method was recently reported in maize. Re-introducing the second level of temporal control by making SpCas9 expression heat-inducible, combined with selection for DRT excision events (see also the text below on improved selection of GT events), raised the efficiency of ipGT for targeted gene insertion to a notable 4.7% in the T0 generation (Barone et al. 2020).

However, similar or higher GT frequencies, up to 18%, have been observed without stable integration and release of the DRT, but with other modifications to the delivery method, target tissue, DRT architecture, or the type, efficiency and tissue specificity of nuclease expression and DSB induction, in many species including tobacco (Zhang et al. 2013), rice (Endo et al. 2016), maize (Svitashev et al. 2015, 2016; Shi et al. 2017; Gao et al. 2020a), soybean (Li et al. 2015), tomato (Yu et al. 2017) and cassava (Hummel et al. 2018; Veley et al. 2020). Delivery of a non-released DRT on a plasmid vector into transgenic Arabidopsis plants already expressing SpCas9 from the egg-cell-specific EC1.2 promoter (Miki et al. 2018) led to a GT frequency seemingly comparable to that of the ipGT experiments using the same promoter (Wolter et al. 2018), although a direct comparison is impossible because GT was calculated differently in the two reports. These findings suggest that factors other than DRT integration and release may have a greater impact on HDR frequency and must be considered when designing GT experiments. Most importantly, strong and timely expression of the DSB inducer appears to be crucial (Wolter et al. 2018). Some of these factors are addressed next.

Improving the availability of the DRT by increasing its copy number

Another way to modulate DRT availability quantitatively is to increase the number of DRT molecules in the target cell, either by delivering a large number of exogenous templates or by replicating the template in vivo. Although the development of novel DNA delivery methods has gained momentum in recent years (Demirer et al. 2019; Ma et al. 2020; Schwartz et al. 2020), most research labs are still limited to Agrobacterium-mediated transformation and particle bombardment (biolistics). Compared with Agrobacterium, biolistic transformation results in higher transgene copy numbers, presumably because it can deliver a larger payload of DNA molecules. Consistent with this assumption, GT in up to 4.10% of T0 regenerants was detected in maize after biolistic delivery, whereas no GT was observed when the same CRISPR–Cas9/DRT reagents were delivered by Agrobacterium (Svitashev et al. 2015). Biolistic delivery also enabled allele replacement at the maize ALS locus and recovery of chlorsulfuron-resistant plants at a frequency of 2% when the Cas9–gRNA complex was delivered as a ribonucleoprotein (RNP) together with an ssDNA DRT (Svitashev et al. 2016). More recently, a larger study targeting gene insertion at 74 different sites in the maize genome demonstrated that GT frequencies can reach 18% of T0 plants with particle bombardment in the absence of direct selection for editing, although GT frequencies varied between sites and averaged ~3.5% (Gao et al. 2020a). When ipGT with DRT release was combined with biolistic delivery in rice, point mutations in the ALS genes could be generated with relatively high efficiencies (Sun et al. 2016). Using a similar approach, an SNP was introduced into the rice NRT1.1B gene, with precise editing in up to 6.72% of T0 plants in the absence of selection (Li et al. 2018b).

Despite these promising results, the clear advantage of particle bombardment for HDR-mediated gene replacement is offset by an increased frequency of random DNA integration and genomic rearrangements. These result from additional DNA breaks and shearing caused by the physical force applied to the particles to penetrate the cell wall. Over 70% of the analyzed GT events in maize (Svitashev et al. 2015) and most of the T0 GT events in soy (Li et al. 2015) obtained by biolistic delivery carried additional, rearranged copies of the DRT randomly integrated in the genome. Although the rice GT events generated in the studies cited above were not analyzed for random DNA insertions, a separate study found such insertions in at least 14% of recovered rice NHEJ mutants (Banakar et al. 2019). Some of these undesired insertions can be segregated away, but doing so effectively reduces the advantage of this approach over Agrobacterium delivery. Considering these levels of byproducts and the potential of particle bombardment to induce more complex damage, including chromothripsis (Liu et al. 2019), alternative methods of increasing DRT quantity are highly desirable.
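A rough way to see how random integrations erode the biolistic advantage is to discount the GT frequency by the fraction of events carrying extra DRT copies. The sketch assumes that random insertions occur independently of GT and uses the maize figures quoted above purely as an illustration (the maximum GT frequency of 4.10% combined with the >70% of analyzed events carrying additional DRT copies); the helper name `clean_gt_frequency` is hypothetical.

```python
def clean_gt_frequency(gt_frequency: float,
                       fraction_with_random_insertions: float) -> float:
    """Frequency of GT events free of additional random DRT insertions.

    Assumes random insertions occur independently of the GT event itself.
    """
    for value in (gt_frequency, fraction_with_random_insertions):
        if not 0 <= value <= 1:
            raise ValueError("inputs must be fractions between 0 and 1")
    return gt_frequency * (1 - fraction_with_random_insertions)


# Illustration with the maize biolistic figures: up to 4.10% GT,
# with more than 70% of analyzed events carrying extra DRT copies.
print(f"{clean_gt_frequency(0.041, 0.70):.2%}")  # prints 1.23%
```

Because the 70% figure is a lower bound, the result is an upper bound on the frequency of clean events; the extra generations needed to segregate insertions away are not modelled.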

One such approach uses geminivirus replicons (GVRs) to produce hundreds to thousands of copies of the DRT and, optionally, the nuclease expression cassette within the target cell (Baltes et al. 2014). The viral components, including the short intergenic region (SIR), two copies of the long intergenic region (LIR) flanking the ends of the replicon cargo, and a single expression cassette for the Rep/RepA replicase proteins required for replicon formation and amplification, are delivered to the target tissue on a T-DNA vector using Agrobacterium. An advantage of this system is that the circular DNA replicons released from the T-DNA are expected to remain episomal, since geminiviral DNA rarely integrates into host DNA (Jeske 2009). In addition, RepA (and in some cases Rep) proteins sequester the plant retinoblastoma-related protein to promote cell cycle progression (reviewed by Gutierrez 2000) and cell division (Gordon-Kamm et al. 2002), and Baltes et al. (2014) showed that Rep/RepA expression alone stimulates DSB-induced HDR and GT. In tomato, replicons derived from the Bean Yellow Dwarf Virus (BeYDV) carrying TALENs or CRISPR–Cas9 together with a DRT induced tenfold higher levels of targeted sequence insertion than the same reagents delivered on a standard non-replicating T-DNA (Čermák et al. 2015). Up to ~10% of transformed explants developed callus events overproducing anthocyanin pigments owing to targeted insertion of the 35S promoter sequence upstream of the ANT1 gene. Visual selection for purple pigmentation, along with kanamycin selection enabled by the nptII gene included in the targeted insertion cassette, allowed regeneration of precisely edited plants without any additional replicon or T-DNA insertions in the genome. The utility of this system for GT in tomato was further demonstrated by two other research groups, who recovered GT events without direct selection for editing phenotypes.
Using BeYDV replicons designed to amplify the DRT but not the CRISPR–Cas9 cassette, Dahan-Meir et al. (2018) precisely restored a 281 bp deletion in a previously generated crtiso mutant. Remarkably, 25% of T0 transformants showed altered fruit color as a result of precise editing. In a separate study, five point mutations were introduced into the HKT1;2 gene at a much lower frequency of 0.66%, also without selection (Van Vu et al. 2020). The authors of this report performed a series of optimizations and improved the performance of the BeYDV system over the original reagent in the ANT1 GT assay (Čermák et al. 2015) by using LbCas12a for DSB induction, splitting the reagents into two smaller replicons and adjusting plant growth conditions. Consistent with Dahan-Meir et al. (2018), GT efficiencies were higher when the DRT was replicated on a separate replicon. However, this improved version of the system was not used to target HKT1;2, which may partly explain the lower editing frequency. In addition, target cleavage efficiency was lower at HKT1;2 (38%) (Van Vu et al. 2020) than at CRTISO (90%) (Dahan-Meir et al. 2018). GVRs were also applied successfully in potato, although in this case selection was required to obtain GT events, likely due to the low efficiency of DSB induction (Butler et al. 2016). GVRs derived from Wheat Dwarf Virus (WDV) also enabled targeted insertion of a GFP-nptII 2A fusion into the rice ACT1 and GST loci at frequencies of up to 19.4% under kanamycin selection (Wang et al. 2017).

Despite the successes in tomato, potato and rice, several issues were encountered when GVRs were tested in other species. Efficient in-frame insertion of fluorescent marker genes into all three wheat genomes was achieved with WDV replicons in both wheat cells and scutella, but regeneration of edited plants was not reported (Gil-Humanes et al. 2017). In cassava, plants with GT events could be obtained using BeYDV replicons, but most of these events involved imprecise repair and complex genotypes, likely connected to the stunted growth and chlorosis observed in all edited lines (Hummel et al. 2018). Finally, GVRs have been largely unsuccessful in Arabidopsis for a 10 bp sequence insertion (Hahn et al. 2018), introduction of point mutations (de Pater et al. 2018) and targeted marker gene insertion (Shan et al. 2018). Although somatic GT was detected at a frequency of nearly 50% in the latter report, only one heritable event (out of 160) was recovered.

Several conclusions can be drawn from the available data on GVR-mediated GT in plants. First, efficient DSB induction seems to be critical for effective HDR, as inferred from the variability of GT efficiencies in tomato, which correlated with target cleavage efficiency (Dahan-Meir et al. 2018; Van Vu et al. 2020). Second, increasing replicon cargo size negatively affects replication and GT frequencies for both BeYDV and WDV (Wang et al. 2017; Dahan-Meir et al. 2018; Van Vu et al. 2020). For WDV, the size limit supporting efficient replication was determined to be 3 kb (Wang et al. 2017), suggesting that limiting the cargo to the repair template and Rep/RepA, while delivering the nuclease on a non-replicating T-DNA or expressing it constitutively from the genome, should yield the best results. Third, host-specific variation in the response to the viral replicase machinery may negatively affect plant regeneration and fitness, as exemplified by the inability to recover healthy plants from edited tissue in wheat (Gil-Humanes et al. 2017) and cassava (Hummel et al. 2018). Additional geminiviruses or Rep/RepA mutants (Diamos and Mason 2019) could be explored for GVR construction to modulate the host defense response and enable GT applications in these and other species. Lastly, the lack of GT induction in germline cells, from which geminiviruses are naturally excluded (Jeske 2009), may restrict this method to tissue culture-based plant regeneration rather than the direct germline delivery (Shan et al. 2018) used in Arabidopsis transformation.
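The cargo-size guideline in the second point can be expressed as a simple design check. In the sketch below, the 3 kb limit is the WDV value cited above, while the component names and sizes are hypothetical placeholders, not measurements from the cited studies. Components flagged for replication (here the DRT and the Rep/RepA cassette) are placed on the replicon and the nuclease cassette on a non-replicating T-DNA; the function `plan_replicon` is illustrative.

```python
from typing import List, Tuple

# Size limit reported for efficient WDV replicon replication (Wang et al. 2017).
WDV_CARGO_LIMIT_BP = 3000

Component = Tuple[str, int, bool]  # (name, size in bp, place on replicon?)


def plan_replicon(components: List[Component],
                  limit_bp: int = WDV_CARGO_LIMIT_BP) -> Tuple[List[str], List[str]]:
    """Split components between a replicon and a non-replicating T-DNA.

    Raises ValueError if the replicon cargo would exceed `limit_bp`.
    """
    replicon = [name for name, _, on_replicon in components if on_replicon]
    tdna = [name for name, _, on_replicon in components if not on_replicon]
    cargo_bp = sum(size for _, size, on_replicon in components if on_replicon)
    if cargo_bp > limit_bp:
        raise ValueError(f"replicon cargo of {cargo_bp} bp exceeds {limit_bp} bp")
    return replicon, tdna


# Hypothetical sizes, for illustration only.
example = [
    ("DRT", 1500, True),
    ("Rep/RepA cassette", 1200, True),
    ("Cas9 + gRNA cassette", 5500, False),
]
print(plan_replicon(example))
```

Flagging the nuclease cassette for the replicon in this example would raise an error, mirroring the suggestion above to keep the nuclease off the replicon.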

Improving the availability of the DRT by tethering to the DSB

In addition to optimizing DRT quantity, positioning the DRT close to the DSB should increase its spatial availability and the chance of HDR. Indeed, enhanced GT was observed when the DRT was integrated into the same chromosome as the target in Arabidopsis (Fauser et al. 2012). In maize, HDR mediated by an interchromosomal endogenous template in close spatial contact with the target was detected (Liu et al. 2020); interestingly, such events accounted for 1.3% of all repair events triggered by DSBs in 63 different genes. Nevertheless, data on DRT tethering for GT in plants are limited. In one of the first examples (Butt et al. 2017), RNA templates were tethered to the DSB by delivering them to rice tissues as extensions of target-specific gRNAs, termed chimeric gRNAs (cgRNAs), and herbicide-resistant ALS mutants were recovered under selection at low frequency. However, it could not be excluded that the DNA vector used to express the cgRNA in vivo, rather than the cgRNA itself, served as the DRT, since in vitro transcribed cgRNAs produced only background levels of GT in protoplasts. More recently, the ssDNA-binding VirD2 protein from Agrobacterium was fused to SpCas9 to tether end-protected ssDNA DRTs to the targeted DSB in rice. T0 plants with GT events were recovered at two different loci at frequencies of up to 9.87% in the absence of selection, a fivefold increase over the non-tethered control (Ali et al. 2020). However, this type of DRT required biolistic delivery, which, combined with the chemical modification used to stabilize the DRT and the natural function of VirD2 as a mediator of DNA integration, is likely to result in off-target insertions of the DRT, a caveat that will need to be addressed before this approach can be widely used.

Substantial efforts have been made in yeast and mammalian cells to tether DRTs to the target site via conjugation of single-stranded DRTs to the gRNA by direct synthesis of RNA/DNA hybrids (Lee et al. 2017) or through the use of retrons (Sharon et al. 2018); fusion of DRTs directly to Cas9 or base-pairing with a DNA aptamer conjugated to Cas9 (Ling et al. 2020); fusion of DRT (ssDNA)-binding proteins to Cas9 (Aird et al. 2018; Savic et al. 2018) or to interacting partners of proteins that accumulate near DSBs (Roy et al. 2018); and interaction of biotinylated DRTs with avidin fused to Cas9 (Gu et al. 2018; Ma et al. 2017; Roche et al. 2018) or with gRNAs extended with streptavidin RNA aptamers (Carlson-Stevermer et al. 2017). These techniques increased HDR 3- to 30-fold compared with GT in the absence of tethering and could provide a similar advantage in plants.

Modified DRT architectures

The length and ratio of the homologous and non-homologous parts of the DRT, the type and strandedness of the nucleic acid, the presence of chemical modifications and the positioning of the DRT relative to the DSB are all important factors to consider when designing GT experiments. In mammalian cells, HDR efficiency increased with homology arm length up to about 1–2 kb for double-stranded DNA (dsDNA) DRTs (Baker et al. 2017; Zhang et al. 2017) and ~300 nt for single-stranded DNA (ssDNA) DRTs (Li et al. 2017b), although homology arms as short as 33 bp are sufficient for GT (Paix et al. 2017). ssDNA is superior to dsDNA as a template for HDR in zebrafish (Bai et al. 2020) and can serve as a template for GT triggered by two tandem nicks in the same target DNA strand, avoiding the unwanted indels that commonly result from DSB repair (Hyodo et al. 2020). RNA DRTs have also been shown to enable HDR in yeast (Keskin et al. 2014). DRT symmetry is another factor reported to have a significant impact on the efficiency of HDR induced specifically by SpCas9. Designing ssDNA DRTs to be complementary to the PAM-distal non-target strand released by SpCas9 after cleavage, and optimizing the ratio of DRT overlap with the 5′ and 3′ ends of the DSB, provided a significant advantage over completely symmetrical DRTs in human cells (Richardson et al. 2016). Chemical modification of linear DRT ends with phosphorothioate linkages (Renaud et al. 2016) or biotin (Gutierrez-Triana et al. 2018) enhanced HDR by preventing DRT degradation or its random and imprecise integration via NHEJ, respectively. Additionally, complexing the DRT DNA with histones to mimic chromatin improved GT by up to 7.4-fold compared with naked DNA (Cruz-Becerra and Kadonaga 2020).
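To make the asymmetric ssDNA design concrete, the sketch below builds a donor carrying a single-base substitution that is complementary to the non-target strand. It assumes a 20 nt SpCas9 protospacer followed by an NGG PAM on the given strand and a cut 3 bp upstream of the PAM. The function names are hypothetical, and the arm lengths (30 nt PAM-distal, 60 nt PAM-proximal) are illustrative defaults, not the values optimized by Richardson et al. (2016).

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def spcas9_cut_site(top: str, protospacer: str) -> int:
    """Cut index on `top` for a 20-nt protospacer followed by an NGG PAM on
    the same strand; SpCas9 cuts 3 bp 5' of the PAM."""
    i = top.find(protospacer)
    if len(protospacer) != 20 or i < 0:
        raise ValueError("20-nt protospacer not found on this strand")
    pam = top[i + 20:i + 23]
    if len(pam) != 3 or pam[1:] != "GG":
        raise ValueError("no NGG PAM directly 3' of the protospacer")
    return i + 17


def asymmetric_ssdna_donor(top: str, cut: int, edit_pos: int, new_base: str,
                           distal: int = 30, proximal: int = 60) -> str:
    """ssDNA donor with a single-base substitution at `edit_pos`.

    The protospacer lies on `top`, so `top` is the non-target strand and a
    donor complementary to it is the reverse complement of the edited window.
    The window extends `distal` nt on the PAM-distal (5') side of the cut and
    `proximal` nt on the PAM-proximal (3') side (illustrative lengths).
    """
    start, end = cut - distal, cut + proximal
    if start < 0 or end > len(top) or not (start <= edit_pos < end):
        raise ValueError("window exceeds the sequence or edit lies outside it")
    window = top[start:end]
    offset = edit_pos - start
    edited = window[:offset] + new_base + window[offset + 1:]
    return revcomp(edited)


# Toy locus: padding, protospacer, TGG PAM, padding.
proto = "GACTTACGATCGGTACCATG"
top = "A" * 40 + proto + "TGG" + "C" * 70
cut = spcas9_cut_site(top, proto)
donor = asymmetric_ssdna_donor(top, cut, edit_pos=58, new_base="C")
print(cut, len(donor), donor)
```

Changing `distal` and `proximal` shifts the donor's overlap with either side of the break, the ratio that the asymmetry studies above optimized; suitable values for plants would have to be determined empirically.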

A comprehensive study of homology arm length for GT in plants is still lacking. In most publications, dsDNA homology arms were 500–1000 bp long, but GT has been achieved with arms as short as 100 bp + 100 bp (Li et al. 2018b) and as long as 6.3 kb + 6.8 kb (Terada et al. 2002). With an ssDNA donor, 30 nt + 30 nt of homology was sufficient for GT (Shan et al. 2013). ssDNA and dsDNA DRTs have also not been extensively compared, except in one report showing similarly low (0.2–0.4%) GT efficiency in maize with both a short, biolistically delivered 127 nt ssDNA oligonucleotide (ssODN) and a long dsDNA template (Svitashev et al. 2015). Comparably low GT efficiency with 144 nt end-modified ssODNs was also seen in flax (Sauer et al. 2016), whereas unmodified 80 nt ssODNs failed to support GT in rice (Sun et al. 2016). It was later confirmed that modifying the ends with phosphorothioate linkages to protect the DRT from endogenous nucleases was necessary to achieve above-background levels of GT/insertion with oligonucleotide templates in rice (Ali et al. 2020; Lu et al. 2020). Long ssDNA and ssRNA templates were also evaluated in rice; RNA is an exciting alternative to DNA templates because large quantities of it could potentially be produced in vivo by transcription. However, GT levels with RNA templates, reaching up to 0.13% in bombarded callus, were 10–20-fold lower than with ssDNA templates (Li et al. 2019b).

Given the lack of more comprehensive datasets comparing different DRT architectures, it is not yet possible to establish optimal design guidelines for DRT usage in plants. Based on the success in animal models, more research into DNA repair using ssDNA, asymmetric and chemically modified templates is expected to provide additional solutions to optimize plant GT.

Increasing the success of GT by improving the selection of GT events

The ability to select for editing events based on the resulting phenotypes greatly simplifies the identification of precisely edited plants. For this reason, most proof-of-concept studies optimizing technology for precise editing use as targets a small number of well-known genes that confer resistance to a selective agent. Indeed, over 90% of the proof-of-concept articles cited in this review take advantage of some form of positive selection. Most frequently, point mutations were introduced into endogenous ALS or analogous genes, conferring herbicide resistance. Alternatively, targeted insertions of visual or antibiotic/herbicide selection genes, either as entire expression cassettes or as translational fusions to endogenous genes, are used, as is targeting of endogenous genes that enable visual selection. Still, edits in most genes of interest do not result in selectable phenotypes, and strategies to select or enrich for such edits would be of immense value for precise gene editing applications.

As one of the first solutions, positive–negative selection schemes were developed long before targeted DSB induction with CRISPR–Cas reagents became routine. In this strategy, a positive marker such as the hptII gene for hygromycin resistance is used to select GT events upon its insertion into the target locus, while a negative selection marker such as the diphtheria toxin gene, positioned to flank the positive marker on the same vector, eliminates random insertion and/or illegitimate recombination events. GT events were recovered at a frequency of 1% using positive–negative selection in rice even in the absence of target cleavage (Terada et al. 2002), and variations of this method have been explored mainly in rice (Shimatani et al. 2015). The obvious drawback of this approach is that the positive selection gene remains integrated in the target locus. An elegant solution was designed by Nishizawa-Yokoi et al. (2015), who used PiggyBac transposase to seamlessly remove the transgene after selecting rice GT events, which were obtained at a frequency of 0.12% without DSB induction. Alternatively, I-SceI cleavage and subsequent single-strand annealing (SSA) repair were recently used to excise the marker from the rice genome precisely, albeit less efficiently (Endo et al. 2020). While the selection scheme enabled recovery of plants with edits in otherwise unselectable loci, the HDR frequency was still low and a large number of transformation events had to be generated to allow for selection, which may not be feasible in species other than rice. Coupling positive–negative selection and seamless marker excision with DSB induction by transient nuclease expression should further increase the efficiency of this approach and extend its use to other species.
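The logic of positive–negative selection can be summarized as a filter on two marker states. In HDR-mediated GT, only the sequence between the homology arms, including the positive marker, is integrated, so the flanking negative marker on the vector is lost, whereas random integration of the whole construct usually retains it. The sketch below is a hypothetical illustration of that filter (the class and field names are assumptions); random integrations that happen to lose the negative marker can still escape, so the scheme enriches for GT rather than guaranteeing it.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    has_positive_marker: bool  # e.g. hptII, lying between the homology arms
    has_negative_marker: bool  # e.g. diphtheria toxin gene flanking them


def survives_selection(event: Event) -> bool:
    """Positive selection (e.g. hygromycin) removes cells lacking the positive
    marker; expression of the toxin gene kills cells that retained the
    negative marker, as expected for random whole-construct integration."""
    return event.has_positive_marker and not event.has_negative_marker


events = [
    Event("GT via HDR", True, False),
    Event("random integration", True, True),
    Event("untransformed", False, False),
]
print([e.name for e in events if survives_selection(e)])
```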

In contrast to direct selection for precise edits, which often requires complex multi-step procedures, modifications in a surrogate reporter can be used to enrich for selectable marker-free GT at the primary locus of interest. For example, GT events were found in 4.7% of maize T0 transformants whose T-DNA insertions had undergone DRT excision, a prerequisite for ipGT (Barone et al. 2020). These transformants were selected using a herbicide resistance gene that was separated from its promoter by the DRT and activated by DRT excision. Even though cells with GT were not directly selected, enrichment for cells competent in SpCas9-mediated DSB induction and DRT release led to the highest GT efficiency in maize achieved by Agrobacterium-mediated transformation to date. Two additional improvements to the ipGT approach likely contributed to this result: heat-inducible expression of SpCas9, which gave additional control over the synchronization of genomic DSB induction and DRT release (see also the text above on ipGT), and the use of morphogenic genes to improve transformation.
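The excision-activated marker described above can be sketched as an operation on an ordered list of T-DNA elements. The element names, the single nuclease site label and the `junction` scar left by NHEJ re-ligation are hypothetical simplifications and do not reproduce the actual vector of Barone et al. (2020); the point is that the resistance gene becomes expressible only when DRT excision places it next to its promoter.

```python
from typing import List, Tuple


def excise_between_sites(tdna: List[str], site: str = "cut_site",
                         scar: str = "junction") -> Tuple[List[str], List[str]]:
    """Cut at the first two nuclease sites, release the segment between them
    and rejoin the flanks, leaving a junction as NHEJ re-ligation would.
    Returns (rejoined T-DNA, released fragment)."""
    first = tdna.index(site)
    second = tdna.index(site, first + 1)
    released = tdna[first + 1:second]
    rejoined = tdna[:first] + [scar] + tdna[second + 1:]
    return rejoined, released


def marker_active(tdna: List[str], promoter: str = "promoter",
                  marker: str = "herbicide_resistance",
                  scar: str = "junction") -> bool:
    """True if only junction scars separate the promoter from the marker."""
    p, m = tdna.index(promoter), tdna.index(marker)
    return p < m and all(element == scar for element in tdna[p + 1:m])


construct = ["LB", "promoter", "cut_site", "DRT", "cut_site",
             "herbicide_resistance", "RB"]
rejoined, released = excise_between_sites(construct)
print(marker_active(construct), marker_active(rejoined), released)
```

Selecting on the activated marker therefore picks out cells in which the nuclease cut and the DRT was released, without directly selecting for GT at the target.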

While this approach enriched for cells competent in NHEJ-mediated re-ligation of the transgene after DRT release, an alternative co-conversion approach was designed in worms, in which the selectable marker was activated by HDR. When animals pre-selected for precise editing of a scorable endogenous marker gene were screened for precise editing at a second, non-selected locus, a significantly higher GT frequency was observed than in individuals without successful conversion of the marker gene (Arribere et al. 2014). Importantly, the mutated marker gene could be segregated away in the next generation, and clean lines with mutations only at the unselected target locus could be recovered. This study serves as a proof of principle for developing similar GT systems in plants. While a visual phenotype was used for selection in roundworms, herbicide resistance resulting from point mutations in the widely conserved ALS genes (Darqui et al. 2020) could be used in plants to enrich for GT at the locus of interest. Although examples of co-targeted HDR at two loci in plants are scarce, its previous demonstration in wheat cells (Gil-Humanes et al. 2017) suggests feasibility. Nevertheless, the functionality of this approach in plants remains to be tested.
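Why pre-selection for a converted marker should enrich for GT at a second locus can be illustrated with a toy model, which is an assumption and not the analysis of Arribere et al. (2014): only a fraction f of individuals is competent (reagents delivered, HDR active), and within that fraction the marker and the target convert independently with probabilities m and t. Selecting marker-converted individuals removes the non-competent ones, raising the target GT frequency from f·t to t, a 1/f enrichment. The function name and parameter values are hypothetical.

```python
def co_conversion(f: float, m: float, t: float) -> dict:
    """Toy model of co-conversion enrichment.

    f: fraction of competent individuals; m, t: conversion probabilities of
    the marker and the target within competent individuals (independent).
    Non-competent individuals convert neither locus.
    """
    p_marker = f * m
    return {
        "target_overall": f * t,
        "target_given_marker": (f * m * t) / p_marker,  # reduces to t
        "target_given_no_marker": (f * t * (1 - m)) / (1 - p_marker),
    }


# Hypothetical values: 10% competent, 20% marker and 5% target conversion.
result = co_conversion(f=0.1, m=0.2, t=0.05)
for key, value in result.items():
    print(f"{key}: {value:.4f}")
```

In this model the enrichment depends only on how rare competent individuals are, which is why co-selection is expected to help most when delivery or HDR competence is limiting.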

Alternative DSB inducers for increased efficiency of target cleavage

Since DSBs enhance HDR, increasing the efficiency of DNA cleavage at the target locus should increase GT levels. This hypothesis is supported by the finding that SaCas9, which generates DSB-mediated indels with higher efficiency than SpCas9, induces more GT in Arabidopsis (Wolter et al. 2018). It was later shown that, at least in vitro, SaCas9 is a multiple-turnover enzyme that releases cleaved DNA faster than SpCas9 (Yourik et al. 2019). Faster turnover may increase the chance of HDR, because a precisely restored target can be cleaved repeatedly. However, its more restrictive PAM (NNGRRT) makes SaCas9 a less attractive alternative because of constraints on target site selection. In contrast, type V CRISPR systems such as Cas12a (Cpf1) have recently gained attention as a complementary alternative to SpCas9 thanks to their AT-rich PAMs (Zetsche et al. 2015). After target DNA cleavage, new Cas12a crRNAs can displace the R-loop and initiate re-targeting of DNA (Stella et al. 2018). As with SaCas9, this means that Cas12a can re-cut the target, possibly more frequently than SpCas9. In addition, Cas12a cleaves target DNA at the PAM-distal end of the protospacer, away from the seed sequence crucial for target recognition. Repair of the first DSB may introduce minor changes in this distal region of the target sequence, but these often do not halt repeated cleavage (Kim et al. 2017). Contrary to expectation, this does not translate into higher efficiency of DNA cleavage and indel formation compared to SpCas9. This is likely due to greater sensitivity of Cas12a to temperature and to secondary structures in the crRNA (Malzahn et al. 2019). However, despite a twofold lower frequency of indel formation, LbCas12a increased the frequency of ipGT in Arabidopsis to 1.47% in the T2 generation, a 1.5-fold improvement over SaCas9 (Wolter and Puchta 2019). A similar improvement over SpCas9 was seen in tomato (Van Vu et al. 2020).
A temperature-tolerant mutant designated ttLbCas12a (Schindele and Puchta 2020) offered a further 2.4-fold improvement over LbCas12a at 22 °C (Merker et al. 2020). FnCas12a was not directly compared to SpCas9, but it induced high levels of GT (up to 8% of transformed events) in rice (Begemann et al. 2017). LbCas12a was slightly less efficient in producing GT events (Begemann et al. 2017; Li et al. 2020e). However, it supported HDR with DRTs containing a single homology arm (Li et al. 2018c) and biallelic precise editing in 100% of the identified T0 GT lines (Li et al. 2020e). Such biallelic editing has rarely been achieved with Cas9 (Li et al. 2015, 2018b; Wang et al. 2017; Yu et al. 2017; Dahan-Meir et al. 2018; Hummel et al. 2018; Barone et al. 2020). It has been speculated that the nature of the DSB ends might also contribute to the increased frequency of HDR triggered by Cas12a enzymes (Begemann et al. 2017). Unlike Cas9, which mostly creates blunt ends (Jinek et al. 2012), Cas12a generates staggered DSB ends with 5′ overhangs (Zetsche et al. 2015). In human cells, DSBs with 5′ overhangs have been found to induce the highest frequencies of HDR (Bothmer et al. 2017). This was not observed for ipGT in Arabidopsis (Wolter et al. 2018), but data from a transient GT assay in tobacco were consistent with the observations in human cells (Čermák et al. 2017). Further comparative analyses of Cas12a and similar enzymes (Ming et al. 2020) in additional plant species should shed more light on two open questions. The first is the molecular basis of this enhancing effect. The second is the general effectiveness of type V CRISPR systems for GT in plants.
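The sketch below makes the PAM constraint more concrete. It counts candidate PAM occurrences on both strands of a DNA sequence for the three enzyme types discussed above. It is a minimal illustration and not a guide-design tool. The NGG (SpCas9) and NNGRRT (SaCas9) PAMs are taken from the text. TTTV for LbCas12a is an assumption, used as one common way of writing its AT-rich PAM. The function names are hypothetical. The count considers PAM presence only and ignores spacer quality, off-targets and chromatin context.

```python
import re

# IUPAC nucleotide codes needed for the PAMs used below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "V": "[ACG]", "N": "[ACGT]"}

# NGG and NNGRRT are from the text; TTTV for LbCas12a is an assumption.
PAMS = {"SpCas9": "NGG", "SaCas9": "NNGRRT", "LbCas12a": "TTTV"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def pam_regex(pam: str) -> "re.Pattern":
    """Compile an IUPAC PAM as a zero-width lookahead so overlapping hits count."""
    return re.compile("(?=" + "".join(IUPAC[b] for b in pam) + ")")


def count_pam_sites(seq: str, pam: str) -> int:
    """Count PAM occurrences on the top strand plus its reverse complement."""
    seq = seq.upper()
    pattern = pam_regex(pam)
    return sum(len(pattern.findall(strand)) for strand in (seq, revcomp(seq)))


if __name__ == "__main__":
    demo = "TTTAGGCATTTGCCAATTTCAGGAAGTGCTTTA"  # arbitrary example sequence
    for enzyme, pam in PAMS.items():
        print(enzyme, pam, count_pam_sites(demo, pam))
```

Running the script on a sequence of interest gives a rough per-enzyme count of usable PAMs. This count is the first filter that makes SaCas9 more restrictive and Cas12a attractive in AT-rich regions.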

Modulating DNA repair to favor HDR

Because DSB-mediated gene editing outcomes depend on the choice of DNA repair pathway, desired sequence modifications can be achieved by shifting the balance between different types of DNA repair. HDR and precise gene editing can be promoted in two ways. Indirectly, NHEJ can be prevented by inactivating key NHEJ factors. Directly, HDR can be enhanced by overexpressing HDR factors. Both approaches have been applied to plants. In most eukaryotes, including plants, DSB detection and repair is initiated by the KU70/KU80 heterodimer. This heterodimer prevents the dissociation and degradation of the free DNA ends. By recruiting additional proteins involved in NHEJ, it effectively determines the choice of DNA repair pathway (Manova and Gruszka 2015). KU70/KU80 binding is followed by end resection and annealing, and repair terminates with ligation by the XRCC4 and DNA ligase 4 (LIG4) complex. During the S and G2 phases of the cell cycle, the SMC6/5 complex can initiate DNA repair by HDR using sister chromatids as templates (Watanabe et al. 2009). This type of HDR can be surprisingly frequent in post-replicative cells, accounting for over 80% of DNA repair events in barley (Vu et al. 2014). Consistent with the hypotheses outlined above, Qi et al. (2013) used protoplasts derived from Arabidopsis ku70, lig4 and smc6b mutants. They showed that disrupting the respective genes led to a significant three- to 16-fold increase in GT, with the highest increase observed for the ku70 mutant. This is supported by results from rice, where transient knockdown of KU70 gave the largest DSB-induced increase in the expression of several HDR-related genes and enhanced HDR (Nishizawa-Yokoi et al. 2012). Knockdown of KU80 and LIG4 resulted in smaller but significant increases. Knockout of LIG4 by CRISPR–Cas9 followed by the delivery of a DRT significantly increased GT in rice and facilitated recovery of biallelic als mutants (Endo et al. 2016).
In the absence of classical NHEJ (C-NHEJ), end joining is not completely abolished. Instead, it switches to alternative, even more error-prone, KU70/LIG4-independent microhomology-mediated pathways (A-NHEJ/MMEJ) (Qi et al. 2013). One key factor involved in MMEJ is DNA polymerase theta (POLQ), recently shown to be essential for T-DNA integration in Arabidopsis (van Kregten et al. 2016). POLQ deficiency shifted the balance of DNA repair in favor of HDR by severely reducing the number of end-joining events in mouse ES cells (Zelensky et al. 2017) and in the moss Physcomitrella patens (Mara et al. 2019). Thus, even if the overall efficiency of editing is unchanged, polq mutants could increase the precision of HDR approaches by reducing unwanted mutations resulting from other types of repair.

The complementary approach of increasing GT by overexpressing HDR proteins has not been explored as extensively in plants as in mammalian systems and yeast. Overexpression of yeast RAD54 in Arabidopsis egg cells enhanced HDR in the absence of targeted DSBs, but the overall frequency of GT was extremely low (Even-Faitelson et al. 2011). In contrast, overexpression of SlRAD54 in tomato decreased the efficiency of HDR at a targeted DSB, and overexpression of SlRAD51 had no effect (Van Vu et al. 2020). Similarly, even when combined with DSB induction, overexpression of AtRAD52 had a negligible effect on GT frequency in Arabidopsis. This small effect was only observed when the gene of interest was also targeted by RNAi (Samach et al. 2018). The absence of the HDR suppressors RTEL1, RMI2 and FANCM1 also did not enhance GT efficiency in Arabidopsis (Wolter and Puchta 2019). Most recently, Barakate et al. (2020) identified several heterologous HDR proteins with a positive effect on intrachromosomal recombination (ICR) in tobacco pollen. These included human RAD51 and DMC1 as well as bacterial RecA and RuvC. Unfortunately, this might not be predictive of extrachromosomal HDR. Bacterial RecA was previously found to stimulate ICR but not GT (Reiss et al. 2000), and the same might be inferred for RAD51, as discussed above. In poplar explants, NHEJ suppression was combined with HDR enhancement by knocking out XRCC4 and overexpressing the HDR effector proteins CtIP and MRE11 via their fusion to SpCas9. This combination yielded promising results, with GT efficiency reaching 18% (Movahedi et al. 2020). Larger data sets and testing in additional species are needed to confirm this result. Nevertheless, the observation that recruiting the HDR factors MRN and CtIP to SpCas9 increases the efficiency of GT is supported by results from human cell lines (Charpentier et al. 2018; Tran et al. 2019; Reuven et al. 2019).

Finally, one strategy to promote HDR that has not yet been evaluated in plants is cell cycle-specific regulation of Cas9 activity. Fusing the human Geminin protein to SpCas9 causes the fusion protein to be proteolytically degraded in the late M/G1 phases and to accumulate in the S/G2/M phases, when HDR is active. This improved GT approximately twofold in human cells (Gutschner et al. 2016; Huang et al. 2017). Identifying plant-specific regulators of cell cycle-controlled protein activation or expression could provide additional means to stimulate GT.

The biggest caveat of manipulating DNA repair outcomes is that DNA repair is an essential process, and any modification of its components may result in genome instability. For example, homozygous ku70 mutants in rice suffer from severe developmental defects and sterility (Hong et al. 2010). Arabidopsis ku70, ku80 and lig4 mutants are fertile, but they are hypersensitive to DNA-damaging agents and show signs of telomere instability (West et al. 2002; Gallego et al. 2003; Furukawa et al. 2015). Such side effects can compromise the precision of this GT approach. To minimize them, it might be necessary to restrict the approach to transient suppression or overexpression of repair factors. For example, small-molecule inhibitors of NHEJ have provided a means for timed control of DNA repair during gene editing experiments in mammalian cells (Chu et al. 2015; Maruyama et al. 2015; Li et al. 2017a). Alternatively, CRISPR interference and activation (CRISPRi/a) can be used for transient regulation of DNA repair factors, as was shown in human, pig and fungal cells (Schwartz et al. 2017; Ye et al. 2018; Li et al. 2019a). Both strategies still need to be evaluated in plants.

NHEJ-mediated sequence replacement

Most approaches for allele replacement take advantage of HDR because it is expected to be precise and to result in seamless editing. However, in addition to its low efficiency, HDR can produce undesired outcomes, and most events are not free of additional rearrangements (see considerations for selecting the right editing approach below). Given these limitations, an obvious question is whether seamless or almost seamless editing could also be achieved through the dominant, albeit error-prone, NHEJ pathway. NHEJ repair is significantly more efficient than HDR and is active in both dividing and non-dividing cells. Using it would also greatly simplify donor design and delivery by eliminating the need for homology arms. In addition, without homology between the exogenous DRT and the genome, junction PCR products would be shorter. This would facilitate the use of short-read amplicon sequencing to quantify positive events.

Several groups tested whether targeted sequence insertions can be achieved by providing donors with free ends while simultaneously inducing a single DSB at the site of interest. The free ends were produced either by nuclease-mediated linearization or release from a circular vector, or by delivering the donors as linear fragments. This has been achieved with the aid of selectable markers in tobacco (Salomon 1998; Chilton and Que 2003; Tzfira et al. 2003), potato (Forsyth et al. 2016), rice (Lee et al. 2019a) and soybean (Bonawitz et al. 2019) plants with variable efficiencies. ZFNs were used to target the same loci with homology-flanked HDR donors and with armless NHEJ donors. The NHEJ donors gave similar or higher efficiencies than the HDR donors, both for a 7.1 kb targeted insertion in soybean plants (Bonawitz et al. 2019) and for insertions of up to 20 kb in tobacco cells (Schiermeyer et al. 2019). Moreover, two groups have used SpCas9 to target a 5.2 kb insertion into the rice genome via NHEJ, with an efficiency of up to 6.25% even without direct selection (Dong et al. 2020b; Xu et al. 2020c).

Unlike sequence insertion, sequence replacement via NHEJ requires two simultaneous DSBs flanking the region of interest. Weinthal et al. (2013) showed that this is indeed possible by using ZFNs to replace a pre-integrated GFP transgene with a hygromycin resistance gene in both Arabidopsis and tobacco. However, the frequency was so low that only a single edited plant was recovered in each experiment. Error-prone repair is expected at one or both ends of the replaced sequence, as was seen in the tobacco plant, so DSBs in coding sequences need to be avoided to prevent loss of gene function. With this in mind, Li et al. (2016) designed a strategy to replace whole exons by inducing DSBs in the flanking introns, where short indels are not predicted to be deleterious. Using the more potent SpCas9 nuclease, they replaced a 245 bp exon in the EPSPS gene with a version containing five nucleotide substitutions. In the absence of direct selection, 2% of the transformed plants were edited.

Perhaps the most intriguing finding for NHEJ-mediated allele replacement is that short dsDNA oligonucleotides protected with phosphorothioate linkages can integrate into targeted DSBs with surprisingly high frequency. In rice, dsDNA of 26 to 130 bp integrated into SpCas9-induced DSBs in up to 47.3% of recovered T0 plants, with an average of 25% across 14 different loci (Lu et al. 2020). Moreover, partially protected PCR products (protected on the 5′ but not the 3′ ends) of 526 bp and 2049 bp integrated with efficiencies of up to 16.4% and 6.6%, respectively. Given this high efficiency of targeted insertion, the authors designed an approach that uses a single DSB and SSA repair to achieve seamless base substitutions. The oligo is designed so that, after insertion, it introduces the desired point mutation(s) and creates a tandem repeat of the genomic sequence present at the cut site. The insertion also re-creates the gRNA target. Repeated cleavage by SpCas9 then removes the redundant sequence by SSA while preserving the installed edit(s). This strategy, referred to as TR-HDR (tandem repeat-HDR), enabled seamless base substitutions and insertions with efficiencies of up to 5.4% and 11.4%, respectively. These values are on par with or exceed the frequencies seen with the classical HDR approach in the absence of selection (Li et al. 2018b). Targeted insertion of 65–75 nt ssDNA had previously also been observed in wheat (Wang et al. 2014). Its efficiency was low (1.4–2.6%), consistent with the results obtained using ssDNA in rice (Lu et al. 2020). Nevertheless, this finding suggests that NHEJ capture of short sequences is not specific to rice, and that TR-HDR could be extended to other species with proper oligo design.
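The TR-HDR logic can be followed at the level of plain strings. The sketch below is a toy model under simplifying assumptions and is not Lu et al.'s design procedure. It assumes that the edit is a same-length substitution immediately upstream of the cut and that NHEJ captures the oligo in forward orientation. SSA is modeled as deleting the first repeat copy plus the intervening sequence. Mismatch handling, and the re-creation and re-cleavage of the gRNA target, are ignored. All function names and the example sequence are hypothetical.

```python
from typing import Tuple


def design_tr_oligo(genome: str, cut: int, edit_len: int,
                    new_bases: str, homology_len: int) -> Tuple[str, str]:
    """Toy oligo: a copy of the homology_len nt preceding the edited bases,
    followed by the substituted bases; the edited bases sit right before the cut."""
    if len(new_bases) != edit_len:
        raise ValueError("toy model only handles same-length substitutions")
    h_start = cut - edit_len - homology_len
    if h_start < 0:
        raise ValueError("not enough sequence upstream of the cut")
    homology = genome[h_start:cut - edit_len]
    return homology + new_bases, homology


def insert_at_cut(genome: str, cut: int, oligo: str) -> str:
    """NHEJ capture of the oligo into the DSB, forward orientation only."""
    return genome[:cut] + oligo + genome[cut:]


def ssa_collapse(seq: str, repeat: str, first: int) -> str:
    """Toy SSA between two direct repeats: delete the first copy and the
    sequence between the copies, keeping a single repeat."""
    if seq[first:first + len(repeat)] != repeat:
        raise ValueError("first repeat copy not found at the given position")
    second = seq.find(repeat, first + len(repeat))
    if second == -1:
        raise ValueError("second repeat copy not found")
    return seq[:first] + seq[second:]


genome = "GATTACACCGTGACTTGGCA"      # hypothetical target, DSB at index 12
oligo, homology = design_tr_oligo(genome, cut=12, edit_len=1,
                                  new_bases="A", homology_len=6)
intermediate = insert_at_cut(genome, 12, oligo)
edited = ssa_collapse(intermediate, homology, first=12 - 1 - 6)
```

In this example, the inserted oligo creates two copies of the same 6-nt stretch that flank the original base. Collapsing the repeat leaves a single G-to-A substitution and no other change to the sequence.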

One common limitation of NHEJ-mediated approaches is off-target integration of linear templates. This is prevalent when particle bombardment is used for delivery, with up to 10 additional copies of the insert per plant integrated randomly in the genome (Lu et al. 2020). In addition, the insert can form concatemers and integrate in both forward and reverse orientations. It also often carries mutations at the ends resulting from imprecise end joining, which reduces the number of clean events. To some extent, these outcomes can be modulated by the choice of DSB inducer. Sticky ends created by ZFNs, TALENs and Cas12a nucleases may provide the advantage of directionality by allowing ligation of complementary overhangs on the donor and the genome (Maresca et al. 2013). On the other hand, it has been suggested that DSBs with overhangs are preferentially repaired by error-prone NHEJ. In contrast, blunt DSBs created by SpCas9 can be directly re-ligated, which more often results in precise NHEJ (Geisinger et al. 2016). In mammalian cells, blunt DNA ends from two SpCas9 DSBs are precisely re-ligated in up to 100% of NHEJ events. NHEJ-mediated targeted sequence integration and replacement is indel-free in up to ~60% of events (Geisinger et al. 2016; Danner et al. 2020). Similarly efficient precise re-ligation of SpCas9-cleaved DNA was observed for deletions in tomato (Čermák et al. 2017; Hashimoto et al. 2018) and for both deletions and inversions in Arabidopsis (Schmidt et al. 2019). This suggests that precise replacement should also be feasible in plants, even within coding sequences. To ensure directionality, the replacement construct can be designed so that the gRNA target sites are restored if the sequence is inserted in the undesired orientation. Integration in the desired orientation then prevents further cleavage (Danner et al. 2020).
Alternatively, both incorrect orientation and additional mutations can be avoided by flanking the donor sequence with short, 5–25 bp homology arms to induce MMEJ, shown to facilitate efficient and precise sequence integration in animals (Nakade et al. 2014; Sakuma et al. 2016).
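The difference between blunt and staggered ends, and why sticky ends can confer directionality, can be expressed with a simple coordinate convention. In the sketch below, positions index the top strand 5' to 3', and a cut at position k falls between nucleotides k-1 and k. The bottom strand runs 3' to 5' beneath the top strand and is cut at its own position in the same coordinates. The SpCas9 example uses a cut 3 bp upstream of the PAM on both strands. The Cas12a example assumes a 4-nt 5' PAM, with cuts after protospacer nucleotide 18 on the non-target strand and after nucleotide 23 on the target strand. These are commonly reported values and should be treated as approximate. The helper names are hypothetical. A second helper checks whether two 5' overhangs can anneal, which is the basis of directional ligation.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def end_structure(top_cut: int, bottom_cut: int) -> Tuple[str, int]:
    """Classify the ends left by a DSB (bottom strand runs 3'->5' under the top)."""
    offset = bottom_cut - top_cut
    if offset == 0:
        return ("blunt", 0)
    if offset > 0:
        # The bottom strand of the left fragment runs past the top strand,
        # and its exposed terminus is a 5' end.
        return ("5' overhang", offset)
    return ("3' overhang", -offset)


def overhangs_anneal(left: str, right: str) -> bool:
    """Two 5' overhangs, each written 5'->3', pair if they are reverse complements."""
    return right == revcomp(left)


# SpCas9: protospacer at 0..19, PAM at 20..22, both strands cut at 17.
spcas9_ends = end_structure(17, 17)

# Cas12a (assumed offsets): 4-nt PAM at 0..3, protospacer starts at 4.
PAM_LEN = 4
cas12a_ends = end_structure(PAM_LEN + 18, PAM_LEN + 23)
```

In this model, SpCas9 leaves blunt ends and Cas12a leaves 5-nt 5' overhangs. A non-palindromic overhang pairs only with its reverse complement. This is why a donor carrying matching overhangs can ligate in only one orientation.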

With careful consideration to the choice of the DSB inducing agent, insertion sequence design, delivery method and potential off-target integrations, NHEJ-mediated allele replacement can provide a surprisingly effective alternative to precise editing by GT.

Tools for DSB-independent allele replacement

Creating edits independently of DSB repair pathways gives greater control over editing outcomes. Moreover, tools that can autonomously induce different types of edits may enable multiplex precise editing. They could offer higher editing efficiencies while avoiding the genome rearrangements caused by multiple DSBs. The first such tools have been developed only recently, and I will summarize their application in plants.

Base editing

Base editing is an approach in which target bases are enzymatically converted to create point mutations without the need for a DSB or a repair template (Rees and Liu 2018). Base editors (BEs) take advantage of naturally occurring or engineered ssDNA deaminases. They also exploit the fact that Cas enzymes form a transient ssDNA R-loop when the gRNA hybridizes with the target genomic sequence. Deaminases fused to nuclease-deactivated Cas proteins mediate targeted deamination of bases in this stretch of ssDNA. Cytosine and adenine base editors (CBE/ABE) have been developed to induce two kinds of transitions. CBEs convert cytosine to uracil (read as thymine), producing C to T (or complementary G to A) changes. ABEs convert adenine to inosine (read as guanine), producing A to G (or complementary T to C) changes (Komor et al. 2016; Gaudelli et al. 2017). Cas nickases are often used in base editors to increase editing efficiency. Nicking the non-deaminated strand induces its repair using the edited strand as a template. Uracil DNA glycosylase inhibitors (UGIs) are used to enhance cytosine conversion. They prevent uracil excision repair, which would revert the mutation to the original state. Since the first versions of BEs were developed, an extraordinary amount of research has aimed to increase their efficiency and specificity. This work has also sought to alleviate the target sequence restrictions imposed by the size of the editing window and by the availability of PAM sequences for CRISPR–Cas binding. The latest developments include CBEs capable of C to G editing (Kurt et al. 2020; Zhao et al. 2020). They also include TAL-effector-based C to T editors employing a dsDNA cytosine deaminase for mitochondrial genome editing (Mok et al. 2020). Consequently, many BE variants are currently available, adapted to a range of sequence preferences and editing functions.
Extensive reviews of the BE types and modes of action have been published that can be used as a guide to selecting the right enzyme for each application (Rees and Liu 2018; Anzalone et al. 2020).
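The core idea of base editing can be sketched as a string operation: every substrate base within a narrow window of the exposed strand is converted. This is an idealized model with several assumptions. Positions are numbered 1 to 20 from the PAM-distal end of an SpCas9 protospacer written 5' to 3' on the non-target strand, which is the strand displaced as ssDNA in the R-loop. The default window of positions 4 to 8 is an assumption, roughly what is usually cited for rAPOBEC1-based CBEs. Every substrate base in the window is converted. Nickase activity, UGI and downstream repair are not modeled. The function name is hypothetical.

```python
from typing import Tuple

# Idealized substrate -> product conversions read on the non-target strand.
CONVERSIONS = {"CBE": ("C", "T"),   # cytosine -> uracil, read as thymine
               "ABE": ("A", "G")}   # adenine -> inosine, read as guanine


def apply_base_editor(protospacer: str, editor: str = "CBE",
                      window: Tuple[int, int] = (4, 8)) -> str:
    """Convert every substrate base inside the 1-based, inclusive window."""
    source, product = CONVERSIONS[editor]
    start, end = window
    bases = list(protospacer.upper())
    for i in range(start - 1, min(end, len(bases))):
        if bases[i] == source:
            bases[i] = product
    return "".join(bases)


if __name__ == "__main__":
    proto = "GACCCTAGCTACGATCGATC"  # hypothetical protospacer
    print(apply_base_editor(proto, "CBE"))
    print(apply_base_editor(proto, "ABE"))
```

The complementary changes (G to A for CBEs, T to C for ABEs) arise when the protospacer lies on the opposite strand. They can be modeled by applying the same function to the reverse complement.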

Base editing has been shown to work efficiently across plant species, including Arabidopsis, cotton, maize, potato, rapeseed, rice, soybean, strawberry, tobacco, tomato, watermelon and wheat (Shimatani et al. 2017; Zong et al. 2017, 2018; Hua et al. 2018; Kang et al. 2018; Tian et al. 2018; Endo et al. 2019; Li et al. 2019c; Zhang et al. 2019; Qin et al. 2020; Cheng et al. 2020; Veillet et al. 2020a; Ariga et al. 2020; Cai et al. 2020; Xing et al. 2020). On average, 20–30% of plants exposed to a CBE or ABE were edited at the target locus. However, efficiencies ranged from as high as 100% (Veillet et al. 2019; Xu et al. 2019) to as low as 0% (Dong et al. 2020a), depending on the target locus, BE architecture, species and type of selection. This is a clear improvement over most HDR-based precise editing methods, whose efficiencies are generally below 10% and mostly below 5% of recovered plants. Moreover, two different selection strategies were designed and applied in rice and wheat to ensure consistently high base editing efficiency. In the first, hygromycin phosphotransferase was fused to an ABE via a self-cleaving 2A peptide. Hygromycin selection was then used to enrich for rice lines strongly expressing the fusion protein, nearly 100% of which carried edits across four different target sites (Li et al. 2020d). In the second, two gRNAs were used to simultaneously target the locus of interest and create herbicide tolerance mutations in the ACC1 or ALS genes. Herbicide selection dramatically increased the number of plants edited at the non-selected site (Zhang et al. 2019; Li et al. 2020d). Base editing was also achieved with BEs delivered as mRNA or RNP and in the complete absence of selection (Zong et al. 2018; Zhang et al. 2019). Such DNA-free approaches would be impossible with any donor template-based editing approach.

Several deaminases have been explored for use in plants. The CBEs use cytosine deaminases from Petromyzon marinus (PmCDA1), rat (rAPOBEC1) and human (hA3A, hAID). The ABEs are derived from ABE7.10 (Gaudelli et al. 2017) and contain an engineered E. coli tRNA adenine deaminase (TadA). A systematic comparison of all four cytosine deaminases is not available. However, a PmCDA1 CBE outperformed an rAPOBEC1 CBE sixfold in editing efficiency when the two were directly compared in rice callus (Xu et al. 2019). PmCDA1-based CBEs also showed high editing efficiencies in tomato and potato (Shimatani et al. 2017; Veillet et al. 2019). On the other hand, CBEs with PmCDA1 and A3A have larger editing windows than those using rAPOBEC1, reaching up to 17 nt for A3A (Zong et al. 2018; Xu et al. 2019). A larger window can be an advantage for efficient editing, or an issue if a single specific edit is desired. Most CBEs produce a significant amount of undesired by-products, such as C-to-G conversions and indels arising from base-excision repair (BER) (Li et al. 2017c; Shimatani et al. 2017; Ren et al. 2018; Bastet et al. 2019). hA3A CBEs are the exception (Zong et al. 2018). In contrast, precision is the main benefit of ABEs. They induce less mutagenic BER while still achieving high levels of A-to-G editing (Hua et al. 2018, 2020b; Kang et al. 2018; Li et al. 2018a). Moreover, CBEs, but not ABEs, were shown to induce elevated levels of random base conversions throughout the whole genome in rice (Jin et al. 2019). Both CBEs and ABEs are also known to cause widespread off-target deamination of cellular RNA in mammalian cell culture (Grünewald et al. 2019; Rees et al. 2019). However, this effect may not be critical for plant genome engineering. It is transient and depends on active expression of the BE transgene, which can be controlled and/or segregated away.
Most importantly, undesirable on- and off-target properties of BEs can be reduced or eliminated by protein engineering. The newest CBE variants, A3Bctd-VHM-BE3 and A3Bctd-KKR-BE3, were developed for plants by rational design of the C-terminal catalytic domain of the human APOBEC3B (A3B) deaminase. They have almost undetectable levels of gRNA-independent genomic off-target editing and produce mainly single and double C edits, with only marginally reduced on-target activity in rice (Jin et al. 2020).

An obvious shortcoming of base editing is that each type of BE can edit only one type of DNA base. To increase the flexibility of BEs, dual cytosine and adenine BEs were created in two ways. In the first, both deaminases were fused directly to a single Cas enzyme (Li et al. 2020a). In the second, each deaminase was fused to an RNA-binding protein and recruited to a different gRNA extended with RNA aptamers (Li et al. 2020b). The first system, named STEME, was useful for saturating mutagenesis of a single locus and for in vivo protein evolution in rice. Only the second system, referred to as SWISS (simultaneous wide-editing induced by a single system), is orthogonal and can be used to edit As and Cs at separate sites. However, ABE function was compromised in SWISS. It yielded lower editing than the standard deaminase–Cas ABE fusion and was up to tenfold less efficient than the CBE function. Nevertheless, rice plants with simultaneous A and C edits at two different sites were still obtained at low frequency. With the advent of the newest C-to-G and C-to-A BEs, any type of base conversion can now be achieved in a two- or three-step process using a combination of different BEs (Zhao et al. 2020). However, this strategy is currently limited to microorganisms, because C-to-A conversion is not yet feasible and sequential editing is inefficient in eukaryotic cells. In addition, the need for a library of BEs with different modalities might present a hurdle for most labs.
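The claim that combining editors enables any base conversion in two or three rounds can be checked with a small graph search. The sketch below treats each editor as a direct conversion that can act on either strand, so a C-to-T editor also provides G-to-A. It then uses breadth-first search to find the fewest sequential editing rounds between two bases. The model is idealized. It ignores whether a suitable PAM and editing window exist for each intermediate, as well as efficiencies and bystander edits. The editor lists mirror the conversions named in the text, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set, Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Conversions named in the text; each editor can act on either strand.
ALL_EDITORS = [("C", "T"), ("A", "G"), ("C", "G"), ("C", "A")]
WITHOUT_C_TO_A = [("C", "T"), ("A", "G"), ("C", "G")]


def conversion_graph(editors: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """An X->Y edit on one strand is also a comp(X)->comp(Y) edit on the other."""
    edges: Dict[str, Set[str]] = {base: set() for base in "ACGT"}
    for src, dst in editors:
        edges[src].add(dst)
        edges[COMPLEMENT[src]].add(COMPLEMENT[dst])
    return edges


def min_steps(src: str, dst: str,
              editors: Iterable[Tuple[str, str]]) -> Optional[int]:
    """Breadth-first search for the fewest sequential editing rounds."""
    edges = conversion_graph(editors)
    dist = {src: 0}
    queue = deque([src])
    while queue:
        base = queue.popleft()
        if base == dst:
            return dist[base]
        for nxt in sorted(edges[base]):
            if nxt not in dist:
                dist[nxt] = dist[base] + 1
                queue.append(nxt)
    return None  # unreachable with the given editors
```

With all four editors, the longest path between any two bases is two rounds. Without C-to-A, some conversions need three rounds. For example, A to T then requires A to G, G to C and C to T. This matches the two- or three-step range given in the text.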

Current BEs cannot induce many types of mutations, including most transversions. Another important limitation is that a BE can potentially edit all bases of the given type within its editing window. Cas binding also requires a PAM sequence at a specific distance from the edited base. Combined with this requirement, such bystander mutations may significantly limit the number of targets that can be precisely edited using BEs. So far, most base editing studies in plants have focused on evaluating BE functionality rather than on creating specific traits. Consequently, targets were chosen because the resulting phenotype could conveniently be selected, or because of their sequence context. These targets were then characterized molecularly, disregarding the effect of bystander mutations on the target trait. Herbicide resistance could be readily achieved (Tian et al. 2018; Veillet et al. 2019; Zhang et al. 2019; Li et al. 2020d; Cheng et al. 2020). Beyond herbicide resistance, exceptions include a semi-dwarf rice mutant engineered by a single C-to-T substitution in the SLR1 gene (Lu and Zhu 2017). They also include strawberry plants with increased sugar content carrying base edits in ebZIPs1.1 (Xing et al. 2020). Rice plants with single base substitution alleles of three further genes were also successfully created: NRT1.1B (Lu and Zhu 2017) for improved nitrogen use efficiency, SPL14 (Hua et al. 2018) for enhanced grain yield and BZR1 (Ren et al. 2019) for resistance to thrips feeding. However, these plants were not phenotyped for the respective traits. It remains to be seen whether additional traits can be engineered at reasonable frequencies with existing and improved BE variants (Jin et al. 2020), despite the target restrictions and potential bystander mutations. For targets that meet the sequence criteria, BEs are currently the most efficient method for precise allele replacement.
Further engineering of BEs could minimize the editing windows and undesired side effects and introduce novel base specificities. This might extend their use to other targets and to multiplexing applications.
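The interplay between PAM position, editing window and bystanders can be made explicit by enumerating candidate sites for a single target base. The sketch below scans only the top strand for SpCas9 NGG PAMs and takes the 20 nt upstream of each PAM as the protospacer. It reports every site whose window covers the target base, together with the positions of bystander bases of the same type. Window positions are counted from the PAM-distal end, and the window of positions 4 to 8 is an assumption. Reverse-strand sites, alternative PAMs and position-dependent editing preferences are deliberately left out. The function name and example sequence are hypothetical.

```python
import re
from typing import List, Tuple

PROTOSPACER_LEN = 20  # SpCas9


def candidate_sites(seq: str, target_index: int, source: str = "C",
                    window: Tuple[int, int] = (4, 8)) -> List[Tuple[int, List[int]]]:
    """Return (protospacer_start, bystander_indices) for top-strand NGG sites.

    target_index and the returned indices are 0-based positions in seq;
    window positions are 1-based from the PAM-distal end of the protospacer.
    """
    seq = seq.upper()
    if seq[target_index] != source:
        raise ValueError("target base is not a substrate of this editor")
    first, last = window
    sites = []
    for match in re.finditer(r"(?=[ACGT]GG)", seq):  # overlapping NGG PAMs
        start = match.start() - PROTOSPACER_LEN
        if start < 0:
            continue
        position = target_index - start + 1
        if first <= position <= last:
            bystanders = [start + p - 1 for p in range(first, last + 1)
                          if p != position and seq[start + p - 1] == source]
            sites.append((start, bystanders))
    return sites


example = "GACCCTAGCTACGATCGATC" + "AGG" + "T"  # protospacer + PAM + 1 nt
print(candidate_sites(example, 3))
```

An empty result means that no site places the target base in the window. A result in which every site lists bystanders flags a target that cannot be edited cleanly with that editor, which is the restriction discussed above.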

Prime editing

Prime editing is the latest addition to the gene editing toolbox. It was designed to combine the versatility of HDR-based methods with the efficiency of base editing, without relying on DSB repair. In this method, an editing template carried by a modified gRNA (pegRNA) is reverse-transcribed directly into the target locus. The locus is nicked and extended by the prime editor (PE), a fusion of the SpCas9(H840A) nickase to a reverse transcriptase (RT) (Anzalone et al. 2019). The pegRNA guides the SpCas9–RT complex to the target. It also encodes the RT template carrying the desired edit and a primer binding site (PBS) complementary to the 3′ end of the nicked DNA strand. The length of the RT template is variable, which provides flexibility for creating single or multiple base substitutions, short deletions and insertions. Three prime editing strategies have been described. PE2 uses the PE and a pegRNA. PE3 adds a second gRNA that nicks the non-edited DNA strand, triggering its repair using the edited strand as a template. PE3b also adds a second gRNA nicking the non-edited strand, but its spacer matches the edited sequence. In human cell lines, PE3 installed a range of substitutions and insertions with higher efficiency than GT and induced fewer unwanted indels. PE3b achieved similar efficiencies with even fewer indels (Anzalone et al. 2019).
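The relationship between the nick, the PBS and the RT template can be sketched as follows, with all sequences written in the DNA alphabet for simplicity. The protospacer is assumed to be the 20-nt SpCas9 spacer on the PAM-containing strand. The H840A nickase is assumed to nick that strand between protospacer positions 17 and 18, 3 nt upstream of the PAM. The PBS pairs with the 3' end released by the nick, and the RT template encodes the desired sequence downstream of the nick. The default PBS and RT template lengths are illustrative only; as noted in the text, no clear correlation between efficiency and these lengths has been found in plants. Function names and sequences are hypothetical, and the second nick of PE3/PE3b is not modeled.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
NICK = 17  # nick between protospacer positions 17 and 18 (0-based index 17)


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_pegrna_extension(protospacer: str, edited_downstream: str,
                            pbs_len: int = 13,
                            rtt_len: int = 16) -> Tuple[str, str, str]:
    """Return (rt_template, pbs, 3' extension), each written 5'->3'.

    edited_downstream is the desired PAM-strand sequence starting right
    after the nick (protospacer[17:] + PAM + ...), already carrying the edit.
    """
    if len(protospacer) != 20:
        raise ValueError("expected a 20-nt SpCas9 protospacer")
    if not 0 < pbs_len <= NICK or not 0 < rtt_len <= len(edited_downstream):
        raise ValueError("PBS or RT template length out of range")
    pbs = revcomp(protospacer[NICK - pbs_len:NICK])  # anneals to the nicked 3' end
    rtt = revcomp(edited_downstream[:rtt_len])       # template for the new 3' flap
    return rtt, pbs, rtt + pbs                       # pegRNA: spacer-scaffold-RTT-PBS


def synthesized_flap(rtt: str) -> str:
    """DNA written by the RT when it extends the primed 3' end."""
    return revcomp(rtt)


protospacer = "GACGTCAGTCCATGCTAGCA"
edited = "GCATCGATCGATCGATCG"  # original "GCATGGATCG...": PAM TGG -> TCG
rtt, pbs, extension = design_pegrna_extension(protospacer, edited)
```

The newly synthesized flap equals the edited downstream sequence. Displacing the original strand with this flap is what installs the edit, here including a PAM-disrupting substitution.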

However, the efficiency of PEs appears to be limited in plants. Six groups have used PEs to generate precisely edited rice plants (Li et al. 2020c; Lin et al. 2020; Xu et al. 2020a, b; Butt et al. 2020; Hua et al. 2020a). Prime editing has also been demonstrated in maize (Jiang et al. 2020) and potato (Veillet et al. 2020b) plants and in rice, wheat and Arabidopsis protoplasts (Lin et al. 2020; Tang et al. 2020; Wang et al. 2020). The edits created ranged from combinations of base substitutions to a 66 bp insertion (Wang et al. 2020). Nonetheless, all studies reported similar findings. Average efficiencies barely reached those of HDR-based methods, and efficiency varied extremely between targets and pegRNA designs. For example, one locus was targeted using six pegRNAs that shared the same spacer and PBS but differed in the RT template. The frequency of recovered edited plants ranged from zero to 31.25%, suggesting that pegRNA architecture is a key factor for efficient editing (Xu et al. 2020a). However, editing efficiency was not correlated with PBS/RT template length (up to 25 nt). Nor was it correlated with the activity of the same gRNA spacer in indel induction (Lin et al. 2020; Tang et al. 2020). This makes it difficult to establish effective pegRNA design guidelines for small edits. Longer insertions and deletions of > 20 nt were generally inefficient (Lin et al. 2020). In addition, editing by-products were frequently observed, accounting for up to 30% of all PE-derived edits. They included pegRNA scaffold insertions, random insertions at the second nicking site, indels and unwanted SNPs (Li et al. 2020c; Lin et al. 2020; Tang et al. 2020; Butt et al. 2020; Wang et al. 2020; Jiang et al. 2020). Moreover, PE3 was not found to be more efficient than PE2 in plants (Lin et al. 2020; Tang et al. 2020; Xu et al. 2020a; Butt et al. 2020; Hua et al. 2020a; Veillet et al. 2020b).
Omitting the second nick thus offers a way to avoid some of these by-products without compromising editing efficiency.

Some promising attempts to improve the efficiency of prime editing have been made in rice and maize. One strategy was previously used for BEs: a 2A fusion of the hptII gene to the PE allowed selection of high expressors. Combined with the enhanced esgRNA, it increased prime editing efficiency in rice up to 22-fold (Xu et al. 2020b). Increasing pegRNA expression with a composite CmYLCV/U6 promoter improved prime editing efficiency in maize plants from zero to 71.7% (Jiang et al. 2020), the highest efficiency reported in plants. Nevertheless, neither approach had the same effect on all pegRNAs tested, and other sites were edited with much lower efficiency. Neither strategy addressed the induction of undesired mutations. Results on the effect of raising the temperature to 37 °C conflicted: it enhanced editing efficiency in one study (Lin et al. 2020) but had no effect in another (Tang et al. 2020). Prime editing clearly has potential for allele replacement in plants. However, more research is needed to determine the design rules for optimal pegRNA architecture, so as to increase editing efficiency and reduce undesired mutations. Only then can prime editing be widely adopted by the plant research community.

Site-specific recombinases

While BEs and PEs are independent of DSB repair, they are not fully autonomous, since they still rely on base-excision repair (BER) and mismatch repair (MMR), which need to be manipulated to stimulate the desired editing outcomes. Unlike all other technologies reviewed above, site-specific recombinases (SSRs) can catalyze every step of DNA editing, from target recognition and cleavage to edit incorporation and restoration of the original dsDNA state, independently of the cellular machinery. Unpredictable byproducts resulting from erroneous DNA repair are therefore generally not expected in SSR-mediated editing. Thanks to their specificity and to efficiencies that often approach those of transformation (Srivastava and Thomson 2016), SSRs have been widely used for marker excision and site-specific integration (SSI) of transgenes in many plant species (Kapusi et al. 2012; De Paepe et al. 2013; Nandy et al. 2015; Pathak and Srivastava 2020). Recombinase-mediated cassette exchange (RMCE) for precise allele replacement has also been developed (Louwerse et al. 2007; Nanto et al. 2009; Ebinuma et al. 2015; Anand et al. 2019) and deployed for trait development in crop species (Roesler et al. 2016).

Despite these obvious benefits, SSRs have evolved single-nucleotide specificity for their cognate target sites, which presents a barrier to re-engineering them for new targets. Currently, SSI and RMCE are limited to landing pads carrying SSR recognition sites that have been pre-integrated into the genome at random positions by Agrobacterium-mediated or biolistic transformation. To direct recombination to specific sites, landing pads can instead be integrated into the genome by CRISPR–Cas9-mediated GT. Efficient replacement of a selection marker in such a pre-inserted landing pad with the gene of interest by subsequent RMCE was recently reported in maize (Gao et al. 2020a). In this case, a single landing pad cassette containing the recombinase sites and the selection marker was used, but in theory two recombinase sites could be inserted to flank an endogenous gene of interest to enable allele replacement. Such a complex approach does not eliminate the risks associated with DNA repair, since GT is still required, but it can be useful for building allelic libraries of an individual gene in the same genetic background, which would be difficult to do directly via GT.
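The net outcome of RMCE can be illustrated with a toy string model, assuming two heterospecific recombination sites that recombine only with an identical partner site. `SITE_A` and `SITE_B` are hypothetical placeholders rather than real recombinase target sequences, and the model captures only the end result of the double exchange, not the recombination mechanism itself.

```python
# Toy model of the net outcome of recombinase-mediated cassette exchange (RMCE).
# Assumption: SITE_A and SITE_B are hypothetical placeholders for two
# heterospecific recombinase sites that only recombine with an identical site.

SITE_A = "<siteA>"
SITE_B = "<siteB>"


def rmce(genome: str, donor: str) -> str:
    """Swap the segment between SITE_A and SITE_B in genome for the donor's."""
    # str.index raises ValueError when a site is absent, i.e. exchange is only
    # possible at a pre-integrated landing pad carrying both sites.
    g_start = genome.index(SITE_A) + len(SITE_A)
    g_end = genome.index(SITE_B, g_start)
    d_start = donor.index(SITE_A) + len(SITE_A)
    d_end = donor.index(SITE_B, d_start)
    return genome[:g_start] + donor[d_start:d_end] + genome[g_end:]


landing_pad = "chrL" + SITE_A + "MARKER" + SITE_B + "chrR"
donor = "vec" + SITE_A + "GOI" + SITE_B + "vec"
print(rmce(landing_pad, donor))  # chrL<siteA>GOI<siteB>chrR
```

Because the two sites are heterospecific, the model yields a swap of the intervening cassette rather than an excision. The same logic would apply if the two sites flanked an endogenous gene, as suggested above for allele replacement.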

Re-engineering the specificity of SSRs to make them fully programmable is not yet possible, but partial success in re-targeting recombination activity to new sites has been reported in human cell lines and in prokaryotes. Fusion of the modular Gin recombinase to catalytically inactive SpCas9 (dCas9) enabled excision of a sequence from the human genome, albeit at low efficiency (Chaikind et al. 2016). Directed evolution has been used to engineer several serine and tyrosine SSRs to act on a limited number of native host sequences (Bogdanove et al. 2018). In a complementary approach, native sequences that can serve as SSR targets can be identified using molecular methods (Bessen et al. 2019) or machine learning (Nivina et al. 2020). In the latter study, recombinase target sites sharing little sequence homology with the original site could be engineered based on structure, enabling recombination of completely novel sequences. Continued progress, together with the application of modern techniques such as machine learning to the design of programmable SSRs, may lead to new editing tools with unprecedented efficiency, precision and versatility.

Considerations for selecting the right editing approach

Precision, efficiency and versatility are the most important attributes of any allele replacement approach. Surprisingly, the limitation common to most existing precise gene editing strategies is incomplete precision. GT frequently results in ectopic recombination, where the DRT is repaired using the genome as template and inserted at a random site via NHEJ. So-called one-sided events, in which the DSB is repaired precisely via HDR on one side and imprecisely via NHEJ on the other, are also commonly found (Ayar et al. 2013; Schiml et al. 2014; Čermák et al. 2015; Hummel et al. 2018; Shan et al. 2018; Van Vu et al. 2020). The room for error arises from the nature of the HDR mechanism prevalent in somatic plant cells: synthesis-dependent strand annealing (SDSA). In SDSA, one 3′ single-stranded end generated at the DSB invades the homologous DRT duplex and is extended while copying information from the DRT. It can then reanneal to the other end of the original break based on homology, completing precise repair. However, homology at this other end of the DRT appears to be dispensable, and the duplex can instead be restored via NHEJ, leading to imprecise repair. The mechanisms of aberrant GT outcomes have been reviewed in great detail by Huang and Puchta (2019). Because NHEJ is error-prone, imprecision in the form of indels at insert–genome junctions is perhaps most apparent in NHEJ-mediated replacement (Weinthal et al. 2013; Bonawitz et al. 2019; Xu et al. 2020c; Dong et al. 2020b). Base editing and prime editing suffer from a variety of editing byproducts, as discussed above. In addition, BEs can introduce bystander mutations at other editable bases within the activity window, which may need to be avoided. While SSR technology is considered the most precise, imperfect events have also been observed (Anand et al. 2019; Pathak and Srivastava 2020). These limitations effectively reduce the frequency of precise editing, and special attention needs to be paid to identifying clean events free of editing artifacts within or outside the target locus.

In terms of efficiency, the available data indicate that BEs install edits much more robustly than the other strategies. Despite the high efficiencies seen with some specific combinations of edits and genomic targets, prime editing cannot yet be recommended as an effective approach for allele replacement because of its lack of consistency. Although the efficiency of GT can be improved by a number of methods, these tend to be target- or species-specific. Perhaps the best example is the strong enhancement of GT by GVRs at some tomato targets, contrasted with the complete lack of success in achieving GT with GVRs in Arabidopsis. Conversely, ku70/ku80 mutants have been used successfully to enhance GT in Arabidopsis, but should be used with caution in rice and other species because of their pleiotropic effects on genome stability and fertility.

GT and NHEJ-mediated replacement remain the most versatile techniques, able to generate all types of variation, including short and long insertions, full-length gene replacements, point mutations and combinations of different edits. SSRs support a comparably wide range of edits but can only act at sites carrying their recognition sequences, while prime editing seems to be restricted to small but diverse edits without severe limitations on target selection. Base editing lies at the other end of the spectrum in terms of flexibility, being restricted mostly to transition mutations at a finite number of accessible targets.
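The restriction of base editing to accessible targets can be illustrated with a short scan that asks which protospacers place a desired C in the editing window of a cytosine base editor, and which other Cs would become potential bystanders. This is a simplified sketch under explicit assumptions: an SpCas9 NGG PAM, only the top strand scanned, and an activity window at protospacer positions 4 to 8 (PAM-distal base = position 1), roughly as reported for BE3-like editors. Real windows differ between editors, and the function name is hypothetical; an adenine base editor scan would look for A instead of C.

```python
# Sketch: find SpCas9 cytosine base editor (CBE) protospacers that place a
# desired C in the editing window, and flag bystander Cs in the same window.
# Assumptions (illustrative only): NGG PAM, top strand only, 20-nt protospacer,
# activity window at positions 4-8 counting the PAM-distal base as 1, roughly
# as reported for BE3-like editors; actual windows differ between editors.

def cbe_targets(seq: str, target_idx: int, window: tuple[int, int] = (4, 8)):
    """Return (protospacer_start, target_position, bystander_positions) tuples."""
    if seq[target_idx] != "C":
        raise ValueError("a CBE converts C to T; the target base must be C")
    lo, hi = window
    hits = []
    for start in range(len(seq) - 22):
        if seq[start + 21:start + 23] != "GG":  # require NGG PAM after protospacer
            continue
        pos = target_idx - start + 1  # 1-based position within the protospacer
        if lo <= pos <= hi:
            bystanders = [p for p in range(lo, hi + 1)
                          if p != pos and seq[start + p - 1] == "C"]
            hits.append((start, pos, bystanders))
    return hits


seq = "AAAAACACAAAAAAAAAAAA" + "TGG" + "TTTT"
print(cbe_targets(seq, target_idx=5))  # [(0, 6, [8])]
```

An empty result means no NGG PAM places the target base within the window, which is exactly the situation in which base editing is not an option for that edit.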

All of these factors need to be considered together when selecting the right approach for each type of edit and target species. For single-base substitutions, BEs seem to provide the best results in most species, provided the restrictions on the target sequence and the type of base change can be met. Targeted insertions, both of short oligonucleotides (Lu et al. 2020) and of very long DNA cassettes (Bonawitz et al. 2019), have been achieved most robustly via NHEJ. Although not seamless, this approach holds promise for applications such as whole gene/exon knock-ins or insertions of regulatory elements, where the insertion takes place in non-coding sequences. For all other types of edits, the optimal approach may vary across species. Based on the currently available data reviewed here, I compiled the most efficient methods for precise sequence replacement in the main model and crop species (Fig. 2). Given the pace of progress in the gene editing field, these guidelines are expected to change as existing methods are optimized and new tools are developed.

Fig. 2

Most efficient allele replacement methods for major plant model and crop species. Citations and application of selection refer to the highest editing efficiency achieved for the given approach and species. *Direct selection for the gene editing event. DRT, donor repair template.

Prospects for development and application of precise editing as a common tool in plant biology

Advances in the precise manipulation of plant genomes have enabled functional genomics and trait development applications that were never before possible. For example, long shelf-life (Yan et al. 2018) and salt-tolerant (Van Vu et al. 2020) tomatoes were developed by installing single amino-acid substitutions into the respective genes, increased grain yield in maize has been engineered by a promoter swap (Shi et al. 2017), and a set of new alleles controlling sugar content in strawberry has been generated by combinations of base edits and indels (Xing et al. 2020). An allelic series with single or double amino-acid substitutions was also generated to analyze the genetic requirements for resistance to potyviruses in Arabidopsis (Bastet et al. 2019). However, apart from these and a few other highlights, the transition from proof of concept to application has been slow, and precise editing has mostly been demonstrated using marker genes as targets. As the field matures and different types of edits become increasingly feasible, more studies analyzing genotype–phenotype relationships are expected to validate the practical value of each technology for developing precisely engineered plants. One novel application that has been little explored so far, and that will be facilitated by the ability to modify the genome seamlessly, is non-transgenic fine-tuning of gene expression. Subtle changes in gene expression at both the transcriptional and translational levels represent an important source of new phenotypic variation (Rodríguez-Leal et al. 2017; Xing et al. 2020). A number of precise editing techniques (reviewed in Hua et al. 2019), including targeted insertion or modification of transcription factor binding sites or translational enhancers, engineering of uORFs, miRNA binding sites and intron splice sites, and cis-genic promoter replacement, will allow us to harness this resource.

Additionally, multiplexed precise editing may enable complex editing tasks such as engineering quantitative trait variation or de novo domestication of wild species (Zsögön et al. 2017; Fernie and Yan 2019; Chen et al. 2020).

Multiplexed generation of custom edits has been hampered by low editing efficiencies. At most, two targets have been modified simultaneously by GT in wheat cells (Gil-Humanes et al. 2017), and three sites have been edited by BEs in rice (Wu et al. 2019). In human cell lines, increasing the efficiency of HDR to very high levels by mutating or inhibiting the DNA-dependent protein kinase catalytic subunit (DNA-PKcs) allowed simultaneous homozygous GT at four sites in a single cell (Riesenberg et al. 2019). Unfortunately, this approach is not directly applicable to plants, because plants lack DNA-PKcs genes (Manova and Gruszka 2015). However, several innovative strategies have been developed in yeast for parallel editing of multiple sites based on DRT tethering (Roy et al. 2018; Sharon et al. 2018) and on annealing of synthetic oligonucleotides to ssDNA targets during DNA replication (Barbieri et al. 2017), some of which may be applicable in plants. While base editing seems best suited for multiplexing, successful editing of more than a few targets across the genome has yet to be reported and may require optimization of BE and gRNA expression.

Continued improvement will clearly be necessary for scarless editing to become routine in plants. One way to further increase the efficiency of GT is to test whether combinations of different improvement strategies have an additive effect when applied in parallel. For example, improving the availability of the DRT at the quantitative, spatial and temporal levels while modulating DNA repair to promote HDR may be beneficial. With the advent of powerful directed evolution and machine learning protocols (Thuronyi et al. 2019; Biswas et al. 2020), novel specificities and functions are expected to be engineered into BEs and SSRs in the near future, greatly expanding the gene editing toolbox. Computational tools have already been developed to predict the precision and efficiency of base editing (Arbab et al. 2020; Marquart et al. 2020) and prime editing (Kim et al. 2020) outcomes in human cell lines. These algorithms may also facilitate the use and further improvement of BEs and PEs in plants. Gene editing studies in non-plant model systems will continue to be a rich source of potentially universal techniques, many of which can be adapted to plants, as exemplified in this review.

Conclusion

The potential of gene editing in plant science and agriculture cannot be fully realized without the ability to create diverse types of edits. As such, seamless editing and allele replacement will be critical in transforming the plant genome engineering field. The number of plant species in which precise editing has been successfully applied has increased significantly in the last few years, largely as a result of the development of base editing and the improvement of alternative strategies. The rapidly growing number of novel tools and protocols for accurate sequence conversion supports cautious optimism that this trend will persist, accelerating progress toward the flexibility to efficiently generate any type of genetic variation desired for the development of new and improved crops.