
Transcription Activator-Like Effectors

Transcription activator-like effectors (TALEs) are proteins originally identified in Xanthomonas, a genus of proteobacteria that includes a large number of bacterial plant pathogens. During infection, a mixture of bacterial proteins, including TALEs, is translocated into the cytoplasm of the plant host cells via a type III secretion system. After translocation into the nucleus, TALEs mimic the function of eukaryotic transcription factors and bind to cis-regulatory elements of the host genome to control and manipulate cellular pathways, with the final goal of promoting bacterial replication [1].

TALEs are composed of N-terminal secretion and translocation signals, a central domain with DNA binding capability, and a C-terminal acidic activation domain coupled to nuclear localization signals that enable translocation into the nucleus [2] (Fig. 1).

Fig. 1

Schematic of a TAL effector. A TAL effector (TALE) contains an N-terminal translocation domain, and eukaryotic-like nuclear localization signals and an activation domain within the C-terminal portion of the protein. The central part of the protein contains the DNA binding domain, which consists of a variable number of modules with DNA binding capacity. With the exception of the repeat-like structures 0 and −1, the protein sequence of the modules is highly conserved. The repeat variable di-residues (RVDs) within each module dictate the DNA binding specificity.

Protein engineering strategies have been especially focused on the identification of N- or C-terminal truncations aimed at creating artificial TALE-based DNA binding domains that combine minimal size with efficient DNA binding activity. In the next paragraphs we will describe and summarize the development of different TALE scaffolds that have been engineered as versatile carriers for various effector domains.

DNA Binding Domain

The central DNA binding domain consists of a variable number of tandem repeats, generally between 15.5 and 19.5, in which the last repeat is shorter and usually referred to as a “half-repeat”. Each module constituting the DNA recognition domain is composed of 33–35 highly conserved amino acids, with the exception of those in positions 12 and 13, which are hypervariable and referred to as repeat variable di-residues (RVDs) [2]. These two amino acids play a key role in defining nucleotide specificity, which follows a simple ‘one-to-one code’ in which a single RVD contacts a single nucleotide. Cracking this interaction code allowed researchers to infer unknown target sites of natural TALEs from their protein sequence and, conversely, set the stage for targeting engineered TALEs to chosen genes by assembling the TALE tandem repeat modules in the appropriate order [3, 4]. The crystal structure of a natural TALE protein in complex with its target DNA was resolved in early 2012 and highlighted some interesting aspects of how TALEs recognize their cognate DNA [5, 6]. Each module is arranged in two α-helices connected by a loop that contains the RVD. The modules within an array are connected to form a right-handed superhelix with the RVD residues pointing inwards. This protein structure wraps around the DNA double helix with the RVD residues directly contacting the major groove, remarkably without altering the structure of the DNA double helix. Moreover, the two amino acids of the RVD appear to have different roles: while the residue in position 12 stabilizes the loop, the residue in position 13 makes the specific contact with the nucleotide of the target DNA.

Although the RVD–nucleotide association code was originally described for 15 different naturally occurring RVDs [3, 4], researchers soon focused on the four most prominent RVDs present in Xanthomonas TALEs (HD, NN, NI and NG, which specify binding to C, G or A, A, and T, respectively; Table 1). This straightforward code (i.e. four different RVDs to target the four nucleotides) has allowed the creation of a multitude of molecular tools that have been successfully used in various organisms, making TALEs one of the most promising platforms for DNA targeting. Nevertheless, a key issue when designing DNA targeting tools is specificity. Researchers have tried to improve the targeting specificity of engineered TALE arrays by using less frequent G-specific RVDs, such as NH or NK. However, despite the expected gain in specificity towards guanine, the use of the ‘NK’ RVD (instead of NN) in TALE-based designer nucleases or transcription activators often compromised the activity of the final protein [7, 8]. This effect was particularly evident when the ‘NK’ RVDs were located at the N-terminal end of the array [7, 8]. Molecular modeling simulations using the available high-resolution structure of PthXo1 bound to its cognate target DNA [6] revealed a higher affinity of NN-containing arrays compared to NK-based arrays [7]. In addition, a potential context dependence of these less frequent RVDs cannot be excluded. By screening the specificity of 23 natural RVDs, Zhang and colleagues reported that the ‘NH’ RVD can be a valuable alternative to NN, as it showed improved specificity for guanine while retaining similar levels of biological activity [9]. This feature was described at the same time in an independent study by Boch and colleagues [8]. However, because of the low number of NH-containing arrays tested thus far, it is too early to advise scientists to switch from ‘NN’ to ‘NH’ RVDs when aiming to target a guanine.
Hence, despite its dual preference for guanine and adenine, the ‘NN’ RVD is still used extensively to target guanine because of its high overall affinity for purines [10].
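The one-to-one code can be made concrete with a short design helper. The sketch below is a minimal illustration, assuming only the four canonical RVDs discussed above (HD, NI, NG and NN, with NH as an optional guanine module); the function and constant names are hypothetical, and the sketch ignores the position and context effects described in the text. It also assumes that the input excludes the 5′ position-0 thymidine, which is contacted by the N-terminal region rather than by an RVD.

```python
from typing import List

# Canonical RVD assignments discussed in the text (hypothetical helper names).
# NN also binds adenine; NH is the more G-selective alternative.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "T": "NG"}


def design_rvd_array(target: str, g_rvd: str = "NN") -> List[str]:
    """Translate a 5'->3' target sequence into one RVD per nucleotide.

    The target is assumed to exclude the position-0 thymidine.
    The last RVD is carried by the C-terminal half-repeat.
    """
    if g_rvd not in ("NN", "NH"):
        raise ValueError("g_rvd must be 'NN' or 'NH'")
    rvds = []
    for base in target.upper():
        if base == "G":
            rvds.append(g_rvd)
        elif base in RVD_FOR_BASE:
            rvds.append(RVD_FOR_BASE[base])
        else:
            raise ValueError(f"unsupported base: {base!r}")
    return rvds


def repeat_count(rvds: List[str]) -> float:
    """Conventional repeat count: full repeats plus the final half-repeat."""
    return len(rvds) - 0.5


array = design_rvd_array("TCAGG")
print(array)                # ['NG', 'HD', 'NI', 'NN', 'NN']
print(repeat_count(array))  # 4.5
```

An 18-nucleotide target therefore corresponds to a 17.5-repeat array, within the typical range quoted above.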

Table 1 RVD specificities

Recently, the use of alternative RVDs has been reported. We have highlighted that targeting specificity can be improved through the informed use of non-conventional RVDs, based on their ability to exclude non-target nucleotides [11]. Miller and colleagues have explored the potential of novel RVDs to improve the activity and specificity of previously characterized TALENs [12]. This study revealed a remarkable effect of position and sequence context on TALE–DNA recognition and provided novel non-canonical but binding-competent RVDs that can be used to generate highly active and specific TALENs.

Particularly relevant in this context is the report that TALE-based transcriptional activators were unable to activate the silenced Oct4 promoter in neural stem cells [13]. In the same study, a TALE-based transcriptional activator failed to activate an in vitro methylated reporter construct transfected into HEK293T cells, whereas chemical inhibition of DNA methyltransferases using 5-aza-2′-deoxycytidine restored its activity. These results highlighted the impact of cytosine methylation on TALE-based molecular tools. Structural studies of the T-specific RVD ‘NG’ have demonstrated that NG can accommodate interactions with 5-methylcytosine (5-mC), suggesting that TALEs could potentially be designed to recognize methylated CpG dinucleotides [14]. Valton and colleagues [15] have reported the use of ‘N*’ and ‘H*’ RVDs (the asterisk indicates the lack of the 13th residue) in TALE arrays to efficiently target 5-mC (Table 1). While this report confirmed the incompatibility of the ‘HD’ RVD with 5-methylcytosine, it emphasized the superiority of ‘N*’ and ‘H*’ over ‘NG’ for targeting this methylated base. In conclusion, avoiding CpG dinucleotides or pre-screening the target site for the presence of methylated cytosines should improve the success rate in generating functional TALE-based molecular tools.
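The pre-screening advice above can be sketched as a simple sequence check. The hypothetical helpers below only flag cytosines in a CpG context on the strand read by the TALE; whether such a cytosine is actually methylated in a given cell type has to be determined experimentally, and substituting ‘N*’ for HD at those positions follows the report by Valton and colleagues rather than a validated design rule.

```python
from typing import List

RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def cpg_cytosines(target: str) -> List[int]:
    """0-based indices of cytosines followed by a guanine in the target.

    A C at the last position is not flagged because the next base is unknown here.
    """
    seq = target.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def methylation_aware_rvds(target: str, methyl_rvd: str = "N*") -> List[str]:
    """Use `methyl_rvd` instead of HD at cytosines that sit in a CpG."""
    seq = target.upper()
    flagged = set(cpg_cytosines(seq))
    return [methyl_rvd if i in flagged else RVD_FOR_BASE[base]
            for i, base in enumerate(seq)]


print(cpg_cytosines("ACGTCG"))           # [1, 4]
print(methylation_aware_rvds("ACGTCC"))  # ['NI', 'N*', 'NN', 'NG', 'HD', 'HD']
```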

Exploration of the genomes of different plant pathogens allowed the identification of additional TALE-like proteins in Ralstonia solanacearum and Burkholderia rhizoxinica. The characterized Ralstonia proteins share many similarities with Xanthomonas TALEs, including their nuclear localization and the presence of an acidic activation domain at their C-terminus [16]. The DNA binding domain of Ralstonia TALE-like effectors (RTL proteins) is modular, with each repeat unit composed of 35 moderately conserved residues that contain RVDs not previously observed in Xanthomonas TALEs. RTLs hence offer a novel set of DNA binding modules with different nucleotide affinities and specificities. Moreover, in contrast to Xanthomonas TALEs, RTLs preferentially recognize a guanosine in position 0 of the target site. On the other hand, the generation of TALE arrays that target nucleotides other than the ‘invariant’ 5′-thymidine has recently been reported [17]. TALE-like proteins from Burkholderia (Bat proteins) bind DNA with the same code as Xanthomonas TALEs [18]. However, Bat proteins are usually shorter and consist almost exclusively of the repeat-based DNA binding domain. Interestingly, these repeats are highly polymorphic, sharing less than 40% sequence identity, and their overall affinity seems to be lower than that of their Xanthomonas counterparts. We have reported the use of such TALE-like scaffolds from Burkholderia to create designer nucleases [19], and access to the high-resolution structure of such TALE-like proteins also highlighted interesting new DNA targeting features [20]. In summary, since the discovery of the ‘one-to-one’ code of TALEs in 2009, several technical advances have allowed scientists to considerably expand the targeting range and the biological application portfolio of molecular tools based on the TALE platform.

N-Terminal Domain

Early work on engineering TAL effector proteins demonstrated that the first 152 amino acids could be deleted from the N-terminus without affecting protein activity, likely because this region is mainly responsible for translocation into the plant cell [21, 22]. However, artificial TALEs with even shorter N-terminal portions failed to bind DNA [23, 24], and the N∆152 truncation rapidly became established as the reference scaffold. The crystal structure revealed that the portion spanning residue 152 to the first regular repeat module is arranged in two repeat-like structures (usually named repeats 0 and −1), in which a tryptophan in repeat −1 directly contacts a 5′-thymidine in the DNA target site; this thymidine is found in nearly all natural Xanthomonas TALE target sites [2]. The structure suggests that the first steps of TALE binding to the DNA target are mediated by this portion of the protein, likely contributing the high binding energy that enables subsequent target recognition.
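Because the N-terminal region reads a 5′-thymidine at position 0, candidate TALE binding sites in a sequence can be enumerated by looking for a T immediately upstream of each window. The following is a minimal sketch under that single constraint; it scans only the given strand (the other strand can be handled by passing its reverse complement), and the helper name is hypothetical.

```python
from typing import List, Tuple


def candidate_sites(sequence: str, length: int) -> List[Tuple[int, str]]:
    """Return (start, target) pairs for windows preceded by a T at position 0.

    `start` is the 0-based index of the first nucleotide read by an RVD;
    the thymidine contacted by repeat -1 sits at `start - 1`.
    """
    seq = sequence.upper()
    sites = []
    for start in range(1, len(seq) - length + 1):
        if seq[start - 1] == "T":
            sites.append((start, seq[start:start + length]))
    return sites


print(candidate_sites("TTAC", 2))  # [(1, 'TA'), (2, 'AC')]
```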

Various domains can be fused to the N-terminal portion of TALE proteins without affecting the binding capability of the final molecule; indeed, detection or purification tags (e.g. FLAG, HA, S) and localization signals (nuclear, mitochondrial) have been successfully fused to native TALEs and to variants with truncated N-terminal domains. Interestingly, Yang and colleagues have provided evidence that functional domains, such as the FokI catalytic domain, can also be fused to the N-terminus of native TALEs to generate moderately active nuclease pairs [25]. Based on the N∆152 variant, Beurdeley and colleagues fused the catalytic domain of the I-TevI homing endonuclease to the N-terminus of an engineered TALE. This “compact” TALE nuclease (cTALEN) combines the advantage of a partially selective catalytic domain with the programmable DNA targeting specificity of the TALE protein [26]. The impact of alternative N-terminal variants was further investigated by Barbas and colleagues, who used an incremental truncation-based library screening strategy to demonstrate that N∆120 or N∆128 TALE variants are advantageous for creating different designer enzymes, such as chimeric TALE recombinases [27].

C-Terminal Domain

Most of the studies that introduced alterations in the C-terminal portion of TALE-based designer proteins showed that this domain is less critical for the DNA binding function of the protein. Indeed, replacing the natural transcriptional activation domain with heterologous activation domains fused to C-terminally truncated TALEs resulted in functional proteins, regardless of the extent of the truncation [23, 24, 28, 29]. However, in the context of designer nucleases, the length of the C-terminal ‘linker’ that connects the DNA binding domain to the FokI endonuclease domain affects both nuclease activity and the range of spacer lengths tolerated between the two target half-sites. A TALEN scaffold retaining 10–70 amino acids of the C-terminal domain showed higher nuclease activity than TALENs harboring the entire native C-terminal domain [7, 23, 29]. In addition, while variants with longer C-terminal ‘linkers’ (>40 residues) showed cleavage activity on a broad range of spacers (12–30 bp), TALENs harboring shorter ‘linkers’, or lacking the C-terminal domain entirely, were active over a more restricted range of spacers (13–16 bp) [7, 23, 29]. Thus, a suitable C-terminal scaffold should be chosen according to the spacer length of the DNA target site, bearing in mind that shorter C-terminal variants, which tolerate a narrower range of spacer lengths, may help increase specificity by limiting off-target cleavage. In conclusion, the versatility of the TALE-based DNA binding scaffold allows the fusion of various effector domains to its C-terminus, such as transcriptional repression [30] and activation domains [31], chromatin modifiers [32] and nucleases, which have thus far been successfully used to modify the transcriptome, the epigenome and the genome of mammalian and plant cells (Fig. 2).
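The spacer-length guidance above amounts to a simple lookup. The sketch below encodes only the two approximate ranges quoted in the text (12–30 bp for C-terminal linkers longer than 40 residues, 13–16 bp for short or absent linkers) and, following the specificity argument, lists the scaffold with the narrowest compatible range first. The labels and function name are hypothetical, and real scaffolds should be validated experimentally.

```python
from typing import Dict, List, Tuple

# Approximate spacer ranges (bp) quoted in the text; illustrative only.
SPACER_RANGES: Dict[str, Tuple[int, int]] = {
    "long C-terminal linker (>40 aa)": (12, 30),
    "short or no C-terminal linker": (13, 16),
}


def compatible_scaffolds(spacer_bp: int) -> List[str]:
    """Scaffolds whose tolerated spacer range contains `spacer_bp`,
    narrowest range first (narrower tolerance may limit off-target cleavage)."""
    hits = [(hi - lo, name) for name, (lo, hi) in SPACER_RANGES.items()
            if lo <= spacer_bp <= hi]
    return [name for _, name in sorted(hits)]


print(compatible_scaffolds(14))  # short linker listed first, then long linker
print(compatible_scaffolds(24))  # only the long linker
```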

Fig. 2

TALE-based effectors. The fusion of various effector domains to a TALE DNA binding array allows the generation of tailored effectors. (a) Designer nucleases are used to modify the genome by introducing targeted DNA double-strand breaks. (b, c) Targeted regulation of the transcriptome is achieved using artificial transcription factors that enable transcriptional activation or repression. (d) Targeted epigenetic changes are induced using chromatin modifiers.

Assembly of TALE Arrays

While the modular structure of DNA recognition by TALEs permits the easy design of specific DNA targeting arrays, the physical assembly of nearly identical modules of ~100 bp turned out to be challenging using traditional cloning strategies. Since the advent of TALE-based molecular tools, several platforms enabling the rapid assembly of such targeting modules have been reported [33–42]. These platforms differ in several key parameters, such as the number and preparation of the starting building blocks, the flexibility of the final array length that can be assembled and, finally, their throughput. In the following sections we summarize the TALE array assembly strategies developed in recent years, which can be divided into four categories based on their assembly method (summarized in Fig. 3 and Table 2): (1) standard cloning, (2) Golden Gate based cloning, (3) solid phase assembly, and (4) ligation independent cloning. The vast majority of these protocols rely on type IIS restriction enzymes, which cleave DNA at a defined distance from their recognition sites, typically leaving a 4-nt overhang. Hence, if type IIS recognition sites are placed at the 5′- and 3′-ends of each DNA fragment in inverse orientation, different four-nucleotide overhangs can be created using a single restriction enzyme. Upon restriction and ligation, the final construct is devoid of the original recognition sites (“seamless cloning”).
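The seamless-cloning principle can be illustrated by simulating how a type IIS enzyme releases a part flanked by two inward-facing sites. The sketch assumes BsaI as the example enzyme (recognition site GGTCTC, cutting 1 nt downstream on the top strand and 5 nt downstream on the bottom strand), which the text does not name; overhangs are reported as top-strand sequence, and the helper is hypothetical.

```python
from typing import Tuple

SITE = "GGTCTC"     # BsaI recognition site (top strand, cuts downstream)
SITE_RC = "GAGACC"  # same site in inverse orientation (cuts upstream)


def excise_part(seq: str) -> Tuple[str, str, str]:
    """Return (left overhang, core, right overhang) of the fragment released
    between a forward site and a downstream inverse site.

    Cutting 1 nt (top strand) and 5 nt (bottom strand) beyond each site leaves
    4-nt overhangs; the recognition sites stay on the discarded flanks.
    """
    seq = seq.upper()
    j = seq.find(SITE)
    k = seq.rfind(SITE_RC)
    if j < 0 or k < 0 or j + 11 > k - 5:
        raise ValueError("need a forward site followed by an inverse site")
    left_oh = seq[j + 7:j + 11]
    right_oh = seq[k - 5:k - 1]
    core = seq[j + 11:k - 5]
    return left_oh, core, right_oh


construct = "AA" + "GGTCTC" + "A" + "TGCC" + "GATTACA" + "GCTT" + "A" + "GAGACC" + "AA"
print(excise_part(construct))  # ('TGCC', 'GATTACA', 'GCTT')
```

The released fragment carries two different overhangs and no copy of the recognition site, which is what makes the final ligation product seamless.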

Fig. 3

Methods available to assemble TALE arrays. Sequences of single or multi-repeat units are encoded in a collection of starting plasmids. (a) Illustration of the standard cloning assembly using parallel hierarchical reactions (left) and of the solid phase assembly that is based on iterative enzymatic elongation (right). (b) Illustration of the Golden Gate “one pot–one step reaction” process (left) and of the ligation independent cloning (LIC) strategy (right)

Table 2 TALE array assembly methods

Alternatively, TALE arrays can be synthesized de novo, or validated constructs can be purchased from commercial suppliers.

Standard Cloning Assembly

This strategy is conceptually the easiest way to assemble TALE arrays and relies on collections of plasmids encoding single or multiple building blocks, standard restriction/ligation steps and amplification in E. coli to create intermediate arrays in a parallel hierarchical manner (Fig. 3a, left). The design of the starting constructs involves the incorporation of recognition sites for either isocaudomers (e.g. unit assembly) or type IIS restriction enzymes (e.g. REAL, REAL-Fast). Depending on the method, the number of starting plasmids can vary strongly, from fewer than ten [43] to a few dozen [35, 37, 44] or even several hundred [30, 34, 38, 45]. The size of the collection is related to the number of repeats incorporated in each starting building block (from one repeat up to four for the largest collections). The starting blocks are prepared either by PCR amplification or by direct digestion of the plasmid collection. At each round of the assembly cycle, the intermediate products can be characterized by colony PCR or restriction digestion to validate the process. Depending on the design of the starting material, 2–8 individual building blocks can be coupled in an ordered fashion in a single cloning reaction [25]. Nevertheless, only some of these assembly methods [37, 38, 43, 45] offer flexibility in the final array length beyond one or two possible sizes. The numerous and tedious molecular biology steps (restriction, ligation, plasmid DNA isolation and DNA fragment gel purification) clearly limit these methods to low-throughput production. Assembling TALE arrays of a standard size for genome engineering tools (10–24 repeat units) requires up to 2 weeks, depending on the final array length. However, these methods have the advantage of relying on basic molecular biology techniques that are already established in many laboratories.
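The throughput limitation of hierarchical assembly follows from the number of sequential cloning rounds. As an idealized sketch, assuming that every reaction joins up to a fixed number of blocks and ignoring the fixed-size constraints of real kits, the number of rounds grows roughly logarithmically with array length; the function name and parameters are hypothetical.

```python
import math


def assembly_rounds(n_repeats: int, repeats_per_block: int, blocks_per_reaction: int) -> int:
    """Sequential cloning rounds needed to join all starting blocks into one array."""
    if blocks_per_reaction < 2:
        raise ValueError("each reaction must join at least two blocks")
    units = math.ceil(n_repeats / repeats_per_block)  # starting building blocks
    rounds = 0
    while units > 1:
        units = math.ceil(units / blocks_per_reaction)  # one parallel round
        rounds += 1
    return rounds


print(assembly_rounds(16, 1, 2))  # 4
print(assembly_rounds(18, 2, 4))  # 2
```

Larger starting collections (more repeats per block) trade library size for fewer rounds, which mirrors the range of collection sizes described above.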

‘Golden Gate’ Assembly

The Golden Gate cloning technology was primarily developed to allow enzymatic cloning of multiple DNA fragments in a defined linear order [41, 46]. The strength of this method lies in the fact that the whole cloning process (restriction and ligation) can be performed with multiple DNA fragments (e.g. PCR products or plasmids) in a single ‘one step–one pot’ reaction by cycling the experimental conditions (e.g. temperature) for both enzymatic steps (Fig. 3b, left). However, the cloning efficiency, i.e. the total number of positive clones obtained after E. coli transformation and plating, decreases drastically when more than nine fragments are assembled [47]. Thus, to generate a typical TALE array containing more than ten repeats, multiple parallel Golden Gate reactions are required to preassemble sub-arrays, which must be further amplified, either by PCR [24] or by transformation and plasmid amplification in E. coli [36, 47–50], prior to their fusion in an additional Golden Gate reaction. While the original protocols are based on type IIS restriction enzymes, Yang and colleagues [42] developed an alternative PCR-based method for preparing the building blocks. Their strategy relies on uracil-containing primers for amplification of the building blocks, followed by assembly of the array in a single reaction after USER (Uracil-Specific Excision Reagent; [51]) digestion of the PCR products. Another interesting PCR-based variant of the Golden Gate strategy was developed by Sanjana and colleagues [52]; it relies on circularization of the intermediate array followed by removal of unreacted and non-circular products by exonuclease treatment. The circular arrays are then amplified by PCR and combined to give the final array. While the absence of an amplification step in E. coli and the possibility of eliminating side products are valuable features for improving throughput, the need to purify intermediate PCR products might temper these advantages.
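Because each junction in a Golden Gate reaction is defined by a unique four-nucleotide overhang, the order of the fragments in the product is determined by overhang matching rather than by the order in which they are added. The sketch below models this logic for parts described as (left overhang, core, right overhang) in top-strand sequence. The nine-fragment limit reflects the efficiency drop cited above; everything else (names, error handling) is a hypothetical illustration that ignores mis-ligation and palindromic overhangs.

```python
from typing import Dict, List, Tuple

Part = Tuple[str, str, str]  # (left overhang, core, right overhang), 5'->3' top strand


def golden_gate_assemble(parts: List[Part], start_oh: str, end_oh: str,
                         max_parts: int = 9) -> str:
    """Chain parts by matching overhangs, from the acceptor's start overhang
    to its end overhang, and return the assembled top-strand sequence."""
    if len(parts) > max_parts:
        raise ValueError("too many fragments for one reaction; pre-assemble sub-arrays")
    by_left: Dict[str, Part] = {}
    for part in parts:
        if part[0] in by_left:
            raise ValueError(f"overhang {part[0]} is not unique")
        by_left[part[0]] = part
    product, current, used = start_oh, start_oh, 0
    while current != end_oh:
        part = by_left.get(current)
        if part is None:
            raise ValueError(f"no part carries left overhang {current}")
        product += part[1] + part[2]  # the shared overhang is written once
        current = part[2]
        used += 1
        if used > len(parts):
            raise ValueError("overhangs form a cycle")
    if used != len(parts):
        raise ValueError("some parts were not incorporated")
    return product


shuffled = [("GGTT", "CAG", "TTAC"), ("AAAA", "GAT", "CCCC"), ("CCCC", "TAC", "GGTT")]
print(golden_gate_assemble(shuffled, "AAAA", "TTAC"))  # AAAAGATCCCCTACGGTTCAGTTAC
```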

All Golden Gate based strategies used to date to assemble TALE arrays rely on collections of a few dozen plasmids (24–78) or PCR-amplified fragments that can be handled by most laboratories without specialized instrumentation. They allow the production of TALE arrays within one day to one week, depending on the array length (0.5–30.5 repeats) and the type of intermediate step (PCR or plasmid amplification in E. coli). While such ‘one step–one pot’ assembly methods clearly expand the possibilities beyond classical molecular biology techniques, the numerous intermediate steps, such as plating, colony picking, PCR screening and DNA isolation, hamper their potential for high-throughput, automated production.

Solid Phase Assembly

The solid phase assembly of TALE repeats was developed as a high-throughput alternative to Golden Gate cloning strategies (Fig. 3a, right). By analogy to chemical solid phase peptide or DNA/RNA synthesis, this method uses magnetic beads or coated wells as a solid support, which allows easy removal of excess material and exchange of reactant solutions. The arrays are bound to the support via an initial building block (or initiator) and then elongated enzymatically, step by step, in a reactant buffer solution. Building blocks are considered protected when undigested and activated when the desired cohesive end has been created by enzymatic digestion. Coupling to the solid phase is brought about by a biotin–streptavidin interaction, which is easy to implement owing to the commercial availability of the solid support and the ease of obtaining biotinylated oligonucleotides.

As for the two previously described assembly methods, the solid support strategies rely on either type IIS restriction enzymes [33, 38, 45] or isocaudomers [38, 40]. Collections of building blocks comprise initiators that are biotinylated (at the 5′-end of the coding strand), extension blocks and terminators. The Iterative Capped Assembly (ICA) method developed by Briggs and colleagues [33] introduced an important additional step that prevents incomplete ligation from reducing the yield of correct products, by blocking unreacted products with dedicated blockers. Extension blocks are typically prepared by plasmid digestion or PCR amplification from a collection of single repeats [33] or of pre-assembled multi-repeat modules [38, 40, 45]. Depending on the array length and the size of the starting blocks, full-size arrays can be assembled within 3 days. Because the process can be easily automated using liquid handling workstations and requires no intermediate cloning steps, these methods are amenable to medium- and high-throughput production.
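The benefit of capping can be reasoned about with a simple probabilistic model. Assume, for illustration only, that each elongation cycle succeeds independently with probability p, and that an uncapped strand that misses one cycle can still accept blocks in later cycles (a hypothetical worst case, not a property stated for any specific protocol). The full-length fraction is then p^n with or without capping, whereas only uncapped strands can end up with internal single-block deletions that are nearly full length and therefore difficult to distinguish from the correct product by size.

```python
def full_length_fraction(p: float, n: int) -> float:
    """Fraction of strands that received all n blocks (independent of capping)."""
    return p ** n


def internal_deletion_fraction(p: float, n: int, capped: bool) -> float:
    """Fraction of strands missing exactly one internal block (cycles 1..n-1).

    With capping, a strand that misses a cycle is blocked and never extended,
    so it stays truncated instead of forming an internal deletion.
    """
    if capped:
        return 0.0
    return (n - 1) * p ** (n - 1) * (1 - p)


for capped in (False, True):
    print(capped, round(full_length_fraction(0.95, 18), 3),
          round(internal_deletion_fraction(0.95, 18, capped), 3))
```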

Ligation Independent Cloning (LIC)

The strategy developed by Schmid-Burgk and colleagues [39] is the only procedure that does not involve restriction and ligation steps for coupling the building blocks (Fig. 3b, right). Ligation independent cloning relies on long (up to 30 nt) non-palindromic overhangs, which are generated by exploiting the 3′→5′ exonuclease activity of T4 DNA polymerase in the presence of only one of the four dNTPs. To increase the throughput of the assembly process, a collection of 3072 plasmids encoding penta-repeats was created, which can replace or be combined with the original collection of di-repeats. Using these collections, a one-step LIC reaction enables the assembly of arrays of various sizes within 3 days. Additionally, bacterial growth at limiting dilution was implemented so that the procedure relies only on liquid handling steps, further improving the throughput of the LIC strategy. Interestingly, this improvement could also be implemented in most of the assembly methods described above to increase their production throughput.
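How a single dNTP limits T4 DNA polymerase chew-back can be modeled in a few lines. In this simplified sketch (a hypothetical helper, not part of the published protocol), the 3′→5′ exonuclease removes nucleotides from the 3′ end of one strand until it reaches the first base matching the supplied dNTP, where incorporation of that dNTP counteracts further degradation; the removed stretch becomes a single-stranded overhang on the complementary strand. The overhang region is therefore designed to lack the chosen base.

```python
from typing import Tuple


def t4_chewback(strand: str, dntp: str) -> Tuple[str, int]:
    """Model 3'->5' exonuclease trimming of `strand` (written 5'->3')
    in the presence of a single dNTP.

    Returns the retained strand and the length of the single-stranded
    overhang exposed on the complementary strand.
    """
    seq, base = strand.upper(), dntp.upper()
    end = len(seq)
    while end > 0 and seq[end - 1] != base:
        end -= 1  # nucleotide removed from the 3' end
    return seq[:end], len(seq) - end


print(t4_chewback("GCCATTCTTT", "A"))  # ('GCCA', 6)
```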

Designer TALEs and Their Use

The discovery of TALEs and their unique way of recognizing DNA has had an extraordinary impact on the life sciences. The modularity of the TALE DNA binding domain and its simple interaction code with DNA have boosted the development of customizable DNA binding domains for a variety of applications, ranging from basic research to human gene therapy. Since 2009, when the TALE DNA recognition code was uncovered [3, 4], the number of published manuscripts on TALE research, technical improvements of the technology and/or their applications reached 231 in 2013, over ten times more than in 2009. As discussed above, different toolboxes are available to assemble an array of TALE modules to target a chosen DNA sequence. Naturally occurring TALEs contain from as few as 1.5 to 33.5 repeats in their DNA binding domain, with a median of 17.5 repeat modules [2]; however, a minimum of 6.5–10.5 repeats has been reported to be required for measurable biological activity [3]. Once assembled, the tailored DNA binding domain can be fused to different types of effector domains to create designer enzymes for a large variety of applications. The most successful class of artificial enzymes harboring a user-defined DNA binding domain is certainly represented by designer nucleases. These enzymes combine sequence specificity and cleavage activity by fusing a designer DNA binding domain to a nuclease domain, usually derived from the FokI restriction enzyme. Correct dimerization at a given site allows the introduction of a targeted double-strand break (DSB) in the genome of interest. Genome editing is the field that has benefited most from the introduction of designer nucleases as a means to introduce targeted genomic modifications. Once genomic DNA is naturally or artificially damaged, the cell relies on conserved repair mechanisms to promptly repair the insult and avoid apoptosis.
In mammals, one of two major DNA repair pathways is harnessed upon the introduction of a DSB to ensure DNA integrity: (1) non-homologous end joining (NHEJ) or (2) homology-directed repair based on homologous recombination (HR). NHEJ is active throughout the cell cycle and is the fastest way for the cell to repair a DSB; however, it is an error-prone mechanism that can lead to small insertions/deletions (indels) at the break site with serious consequences, including loss of gene function if the DSB occurs in a protein-coding region. Conversely, HR-based repair allows precise correction of a DSB since, in the natural situation, it relies on the genetic information contained in the sister chromatid and is thus restricted to the S/G2 phases of the cell cycle. HR-based DNA repair is rare in mammalian cells, with one event occurring in every 10⁴–10⁷ cells [53]. However, pioneering studies in Dr. Jasin’s lab [54, 55] provided evidence that HR frequency at a given genomic position can be increased by several orders of magnitude upon generation of a targeted DSB and concomitant delivery of a donor DNA template homologous to the target site. Under these conditions, the genetic information is conveyed from the donor DNA to the target locus, allowing precise genomic modifications. Thus, by harnessing the NHEJ or HR repair pathways at specific genomic locations, one can aim for diverse outcomes such as gene disruption, gene correction or gene addition. With these tools in hand, scientists have expanded their knowledge of gene function by extending reverse genetics to a huge variety of organisms [53]. Besides basic research, genome editing has found broad applicability in other fields such as biotechnology, systems biology and human gene therapy, where it has been used to engineer crop species with novel traits, isogenic cell lines that model human diseases, and human cells with a corrected genetic defect [56].
For more than 15 years, Cys2-His2 zinc finger-based DNA binding domains have been used to generate designer nucleases with remarkable success. Zinc finger nucleases (ZFNs) have represented a milestone in the genome engineering field, allowing genomic manipulation in new species for gene function studies [57–62] and, in therapeutic contexts, correction of genetic defects underlying human disorders [63–65]. The progress achieved using ZFNs is epitomized by a phase I clinical trial for the treatment of HIV [66], which demonstrated that gene disruption can be used to create resistance to HIV infection [67, 68]. However, the limited targeting capacity, context-dependent effects on DNA binding specificity between the units within a zinc finger array (which make the generation of tailored DNA binding domains time-consuming), and a certain degree of nonspecific targeting (so-called off-target cleavage events) have been major impediments to their widespread use [69–72].

The discovery of TALE-based DNA binding domains has provided a new customizable platform for the generation of designer nucleases. TALE-based nucleases (TALENs) combine high versatility with superior specificity compared to well-established ZFN pairs [23, 73]. TALENs with novel specificities can be designed in a reasonable time [48] to target any given DNA sequence, with an average of 3 TALEN pairs per base pair of DNA [74]. The targeting range of ZFNs is much lower, with an average of one available pair every 50–500 bp [75, 76]. It is therefore easy to understand why TALENs have propelled a remarkable expansion of genome editing strategies in recent years, with many academic labs adopting this technology.
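The large targeting range of TALENs follows from the few constraints on a pair. As a minimal sketch, assume each monomer requires only a thymidine at position 0, so that on the top strand a pair reads as T, left half-site, spacer, right half-site, A (the A being the complement of the position-0 T of the monomer bound to the bottom strand). The default 18-nt half-sites and 13–16 bp spacer range below are illustrative assumptions, and the helper name is hypothetical.

```python
from typing import List, Tuple


def talen_pairs(sequence: str, half_site: int = 18,
                spacer_range: Tuple[int, int] = (13, 16)) -> List[Tuple[int, int]]:
    """Return (left_start, spacer) for every TALEN pair layout in `sequence`.

    Layout on the top strand: T | left half-site | spacer | right half-site | A.
    """
    seq = sequence.upper()
    lo, hi = spacer_range
    pairs = []
    for left_start in range(1, len(seq)):
        if seq[left_start - 1] != "T":  # position 0 of the left monomer
            continue
        for spacer in range(lo, hi + 1):
            right_end = left_start + half_site + spacer + half_site
            if right_end >= len(seq):  # no room for the flanking A
                break
            if seq[right_end] == "A":  # position 0 of the right monomer
                pairs.append((left_start, spacer))
    return pairs


print(talen_pairs("TCCGGGCCA", half_site=2, spacer_range=(3, 3)))  # [(1, 3)]
```

Because every T on the top strand can start a left half-site and several spacer lengths are tolerated, many overlapping pairs are typically available around any given position.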

The first report of TALE-based designer nucleases dates back to 2010 [28], and subsequent improvements of the TALEN scaffold and its efficacy [23, 29, 77] have led to their use in a variety of cell lines and organisms, including zebrafish [78], mouse [79], rat [73], non-human primates [80] and human induced pluripotent stem cells (iPSCs) [81]. TALENs have also been successfully applied in human gene therapy models, including the targeted correction of the sickle cell disease mutation in human cells [82], the restoration of gp91phox expression in granulocytes derived from iPSCs of chronic granulomatous disease patients [83], the restoration of dystrophin expression in Duchenne muscular dystrophy patient-derived cells [84], and the treatment of recessive dystrophic epidermolysis bullosa [85]. Interestingly, designer nucleases can cleave not only nuclear genomic DNA: when equipped with suitable localization signals, they can be targeted to destroy mitochondrial DNA [86, 87]. Although preliminary, this approach opens new opportunities for the treatment of mitochondrial disorders.

In addition to nucleases, other effector domains can be fused to a tailored DNA binding domain to extend the application portfolio from genome editing to the targeted modification of the transcriptome and epigenome. The concept of modulating gene expression at the transcriptional level using designer transcription factors was successfully addressed using zinc finger-based DNA binding domains. Expression levels of endogenous genes were effectively modulated in murine models of human disorders, highlighting the feasibility of using tailored transcription factors as new therapeutics [88, 89]. With the introduction of TALEs, the availability of an easily customizable DNA binding domain has boosted the use of tailored transcriptional activators and repressors to modulate endogenous gene expression [24, 30]. The use of high-throughput methods to generate large numbers of TALE-based transcription factors [90] may further expand applications in systems biology, such as the transcriptional control of entire pathways and the modeling of novel gene networks. To further broaden the use of tailored enzymes, customized DNA binding domains can be fused to histone deacetylases (HDACs) and histone methyltransferases (HMTs) to achieve targeted epigenetic modifications [91, 92].

Specificity of Tailored TALE-Based Effectors

The outlook for researchers planning to use designer effectors for permanent modifications of the genome, epigenome or transcriptome appears bright. Yet one caveat associated with the use of TALE-based designer enzymes is their genome-wide specificity. Lack of specificity may lead to unwanted side effects through binding of the effectors to off-target sites that share a certain degree of nucleotide identity with the intended target site. This issue was a major obstacle for first-generation dimeric ZFNs [93] and was subsequently overcome by redesigning the FokI dimerization interface to prevent homodimerization [94, 95]. Most efforts to assess the specificity of TALE-based enzymes have relied either on microarray analysis after delivery of TALE-based repressors [30] or on high-throughput approaches originally developed to dissect the specificity profiles of ZFNs. For example, screening of libraries cleaved in vitro, and exploiting the ability of integrase-defective lentiviral vectors (IDLVs) to be trapped in DNA double-strand breaks, made it possible to profile the specificity of CCR5-specific ZFNs [70, 72]. Both studies revealed a non-trivial degree of off-target cleavage by these ZFNs [71]. The development of alternative platforms for the generation of designer nucleases, such as RNA-guided nucleases (RGNs) and TALENs, has given fresh impetus to the genome engineering field. However, while RGNs can show a high degree of off-target activity [96–98], second-generation TALEN scaffolds [99] and variant CRISPR/Cas9 designs [100, 101] turned out to be less cytotoxic than second- and third-generation ZFNs [23, 102]. Importantly, we recently demonstrated that higher specificity is directly linked to lower cytotoxicity [103]. Particularly for approaches aimed at clinical translation, these results underline the importance of working with a highly specific nuclease platform, such as TALENs.
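The notion of off-target sites that "share a certain degree of nucleotide identity" with the target can be made semi-quantitative with a back-of-the-envelope estimate. The Python sketch below computes the expected number of chance matches within k mismatches of an n-bp site, assuming an idealized genome of independent, equiprobable bases; real genomes contain repeats and compositional bias, so the results are rough orders of magnitude at best. The function name, the 3.1 Gb genome size and the 18 and 36 bp site lengths are illustrative assumptions, not values from the cited studies.

```python
from math import comb


def expected_random_sites(site_len: int, max_mismatches: int,
                          genome_size: int, both_strands: bool = True) -> float:
    """Expected number of chance matches with <= max_mismatches mismatches.

    Assumes an i.i.d. genome with equal base frequencies (a simplification).
    """
    # Number of distinct sequences within Hamming distance k of the target:
    # choose i mismatched positions, each with 3 alternative bases.
    neighbourhood = sum(comb(site_len, i) * 3 ** i
                        for i in range(max_mismatches + 1))
    positions = genome_size - site_len + 1
    strands = 2 if both_strands else 1
    # Each window equals any given sequence with probability 1 / 4**site_len.
    return strands * positions * neighbourhood / 4 ** site_len


if __name__ == "__main__":
    genome = 3_100_000_000  # illustrative, roughly the haploid human genome size
    for n in (18, 36):      # illustrative: one half-site vs. two half-sites combined
        for k in (0, 2, 4):
            print(f"n={n:2d} k={k}: {expected_random_sites(n, k, genome):.3g}")
```

Doubling the recognized length (for example, requiring two half-sites to be bound, as for dimeric nucleases) lowers the expected number of chance sites by many orders of magnitude, whereas each additional tolerated mismatch raises it steeply. The model ignores spacer flexibility and the homodimer configurations that obligate heterodimeric FokI variants are designed to exclude.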

However, the intrinsic ability of some TALE repeats to recognize more than one nucleotide raises concerns regarding their specificity. While the NG, NI and HD modules show a strong preference for a single nucleotide [9], the most commonly used G-specific 'NN' module can also bind adenine. As discussed above, systematic studies have identified novel and potentially more stringent TALE modules, which may further improve the already high specificity of TALENs [104]. An open question is whether the risk of genotoxicity can be reduced by using more specific cleavage domains. Along these lines, the fusion of a TALE-based DNA binding domain to the cleavage domain of the I-TevI homing endonuclease, forming a monomeric compact TALEN (cTALEN), has been explored [26]. In this design, a second level of safety is intrinsically provided by the I-TevI cleavage domain, which is active only when a degenerate CNNNG sequence is present in the target site [105]. Although this partial sequence preference of the I-TevI domain reduces the number of potential target sites within a genome, cTALENs simplify the generation of catalytically active TALENs by removing the need to generate two monomers per target site. Additionally, as recently shown by Lin and colleagues [106], TALE-based DNA binding domains can be linked to re-engineered meganucleases to specifically target the human genome. With this approach, an engineered TALE-I-SceI fusion protein targeted to the β-globin gene induced HDR at the target locus at levels comparable to those of a conventional TALEN, but with significantly lower toxicity. As has been accomplished for ZFNs, adopting the obligate heterodimeric FokI cleavage domain may provide additional benefit in terms of specificity [107], and implementing improved or hyperactive FokI domains could help to generate highly specific and highly efficient designer TALENs [108, 109].
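The two specificity layers described in this paragraph can be illustrated with a deliberately simplified model. In the Python sketch below, each RVD is mapped to the bases it is assumed to accept (an all-or-nothing simplification of the graded preferences measured experimentally), so every NN module doubles the number of sequences an array accepts. A separate check requires the degenerate I-TevI motif CNNNG for a cTALEN. The names `RVD_CODE` and `ctalen_candidate` and the `cleavage_window` argument are hypothetical; in particular, the real position of the I-TevI cleavage site relative to the TALE binding site depends on the construct and is not modeled.

```python
from math import prod

# Simplified, all-or-nothing RVD preferences (assumption; real modules show
# graded preferences). NN is G-specific but can also bind adenine.
RVD_CODE = {"NI": "A", "HD": "C", "NG": "T", "NN": "GA"}


def accepted_sequences(rvds: list[str]) -> int:
    """Number of distinct DNA sequences the RVD array would accept."""
    return prod(len(RVD_CODE[rvd]) for rvd in rvds)


def site_matches(rvds: list[str], site: str) -> bool:
    """True if every base of `site` is accepted by the RVD at that position."""
    site = site.upper()
    return len(rvds) == len(site) and all(
        base in RVD_CODE[rvd] for rvd, base in zip(rvds, site))


def has_cnnng(seq: str) -> bool:
    """True if `seq` contains the degenerate I-TevI motif C-N-N-N-G."""
    seq = seq.upper()
    return any(seq[i] == "C" and seq[i + 4] == "G" for i in range(len(seq) - 4))


def ctalen_candidate(rvds: list[str], tale_site: str, cleavage_window: str) -> bool:
    """Hypothetical two-level check for a cTALEN target.

    `cleavage_window` stands for the stretch of DNA the I-TevI domain is assumed
    to reach; its actual position depends on the construct and is not modeled.
    """
    return site_matches(rvds, tale_site) and has_cnnng(cleavage_window)


if __name__ == "__main__":
    array = ["NN", "HD", "NI", "NG", "NN"]
    print(accepted_sequences(array))        # two NN modules -> 2 * 2 sequences
    print(site_matches(array, "GCATG"))     # intended site
    print(site_matches(array, "ACATA"))     # adenines accepted by both NN modules
    print(ctalen_candidate(array, "GCATG", "TCGTAGA"))
```

Because the reverse complement of C-N-N-N-G is again C-N-N-N-G, scanning one strand is sufficient for the motif check.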
Additionally, a rational choice of target sites that avoids sequences sharing a high degree of identity with other sites in the genome is probably the simplest way to minimize unwanted off-target cleavage [104], and a number of web-based tools assist researchers with this task [110].
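Rational target site choice can be sketched as a brute-force scan: for a candidate RVD array, list every window on either strand that the array would accept with at most k mismatches. The Python example below is a hypothetical illustration (the function names, the mismatch threshold and the simplified RVD code with NN accepting G or A are our assumptions); practical web-based tools rely on genome indexes and more refined scoring, and would be the right choice for whole genomes.

```python
# Simplified RVD code (assumption): NN accepts G or A, others a single base.
RVD_CODE = {"NI": "A", "HD": "C", "NG": "T", "NN": "GA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(rvds: list[str], window: str) -> int:
    """Number of positions whose base is not accepted by the RVD there."""
    return sum(base not in RVD_CODE[rvd] for rvd, base in zip(rvds, window))


def scan_sites(rvds: list[str], genome: str,
               max_mismatches: int) -> list[tuple[int, str, int]]:
    """Return sorted (forward start, strand, mismatches) for accepted windows."""
    genome = genome.upper()
    n = len(rvds)
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for i in range(len(seq) - n + 1):
            mm = count_mismatches(rvds, seq[i:i + n])
            if mm <= max_mismatches:
                # Map a reverse-strand window back to forward-strand coordinates.
                start = i if strand == "+" else len(seq) - n - i
                hits.append((start, strand, mm))
    return sorted(hits)


if __name__ == "__main__":
    array = ["NI", "HD", "NG"]  # toy array recognizing A-C-T
    for hit in scan_sites(array, "ACTGGAGT", max_mismatches=1):
        print(hit)
```

Any hit other than the intended site is a potential off-target site; under this model, a candidate array with fewer hits at low mismatch counts would be preferred.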

Conclusions

The use of designer nucleases to induce permanent genomic modifications is increasing exponentially. Since the first reports of chimeric ZFNs envisioned to work as customizable restriction enzymes [111], remarkable progress has boosted their use in human gene therapy. With the introduction of TALENs, the use of these enzymes has spread steadily because of a combination of favorable features, such as their ease of design, their efficacy and their specificity as genome editing tools. The advent of RGNs, which are highly efficient in inducing targeted DSBs and even easier to engineer than TALENs, has further accelerated this trend [112]. Although impressive gene editing efficiencies have been reported in primary T cells [113], one limiting factor for researchers interested in modifying the genome of primary cells is the size of the designer nuclease. TALENs are rather large proteins and their delivery still represents a challenge, in particular when using viral vectors. Their size as well as their repetitive nature can be limiting for viral vector systems, such as adeno-associated viral vectors and retroviral vectors. However, Holkers and colleagues have recently reported that adenoviral vectors are able to transfer intact TALEN DNA into human cells [114], while Yang and colleagues have packaged TALEN genes into lentiviral vectors by diversifying the nucleotide sequence of the TALE repeat modules [115]. Although the efficiency of the re-coded TALENs was lower than that of their canonical counterparts, alternative methods have been explored in the meantime. Non-viral delivery methods, such as the transfection of mRNA molecules encoding the complete TALEN, have proven exceptionally efficient in inducing gene knockout in primary T cells [99, 113], thereby setting the stage for the translation of TALEN-mediated genome editing to various clinically relevant cell types in the near future.
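The size constraint can be illustrated with simple arithmetic. The Python sketch below estimates the coding-sequence length of one TALEN monomer from its domains and compares a TALEN pair with the commonly cited packaging limit of roughly 4.7 kb for adeno-associated viral vectors. All domain lengths are assumed values loosely modeled on truncated TALEN scaffolds, not figures from the text, and should be replaced with those of the scaffold actually used.

```python
# Illustrative, assumed domain lengths (amino acids) for one TALEN monomer.
# These are not taken from the text; adjust them to the scaffold actually used.
ASSUMED_DOMAINS_AA = {
    "N-terminal region (truncated)": 136,
    "C-terminal region (truncated)": 63,
    "FokI cleavage domain": 196,
}
REPEAT_AA = 34           # typical full repeat length (text: 33-35 aa)
HALF_REPEAT_AA = 20      # assumed length of the final half-repeat
AAV_CAPACITY_BP = 4_700  # commonly cited approximate AAV packaging limit


def talen_cds_bp(full_repeats: int) -> int:
    """Approximate coding-sequence length (bp, incl. stop codon) of one monomer."""
    aa = (sum(ASSUMED_DOMAINS_AA.values())
          + full_repeats * REPEAT_AA + HALF_REPEAT_AA)
    return 3 * aa + 3  # three nucleotides per codon, plus a stop codon


if __name__ == "__main__":
    monomer = talen_cds_bp(17)  # 17.5 repeats
    pair = 2 * monomer
    print(f"monomer ~{monomer} bp, pair ~{pair} bp, AAV limit ~{AAV_CAPACITY_BP} bp")
    print("pair coding sequence fits in one AAV genome:", pair <= AAV_CAPACITY_BP)
```

Under these assumptions a single monomer is about 3 kb and the pair roughly 6 kb of coding sequence, before promoters, polyadenylation signals and other vector elements are added, which illustrates why a TALEN pair is difficult to fit into a single AAV genome.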