Introduction

Clustered regularly interspaced short palindromic repeat (CRISPR) sequences and Cas (CRISPR-associated) proteins are the two elements of an ancient prokaryotic adaptive restriction system conserved in archaeal and bacterial genomes (Jansen et al. 2002; Makarova et al. 2011; Jinek et al. 2012). CRISPRs represent the memory of the system; they form a repository of short, directly repeating nucleotide sequences that alternate with small unique DNA fragments (Ishino et al. 1987) acquired from previous infections (Bolotin et al. 2005). Cas proteins are the actual effectors (Haft et al. 2005). They are able to process CRISPR sequences into small RNAs (Haurwitz et al. 2010) and to cleave the infectious DNA molecules that match the CRISPR-derived RNA (Marraffini and Sontheimer 2008; Garneau et al. 2010; Gasiunas et al. 2012).

To translate a complex prokaryotic system into a simple genome editing tool, the crRNA (CRISPR RNA) and the tracrRNA (trans-activating crRNA) were fused into a synthetic, small guide RNA (sgRNA) composed of a hairpin RNA structure that resembles the tracrRNA linked to a 20 nt sequence homologous to the target DNA (Jinek et al. 2012). Of all the Cas proteins (Chylinski et al. 2013), Cas9 is the final effector, able to form a complex with the sgRNA (Nishimasu et al. 2014; Jinek et al. 2014) and to cleave both strands of a DNA molecule after detecting a typical Watson–Crick base pair match with the sgRNA. Genome engineering with this RNA-programmable Cas9 nuclease has broad applications in biology, biomedicine and biotechnology (Charpentier and Doudna 2013). With their colleagues, microbiologist Emmanuelle Charpentier and structural biologist Jennifer Doudna co-authored the seminal publications on this subject and are to be credited for studying and bringing this prokaryotic tool to the attention and the benefit of the eukaryotic world (Jinek et al. 2012, 2014; Charpentier and Doudna 2013).

Different varieties of Cas9 protein are now available for transduction and expression in distinct organisms, and its prototype structure was recently reported (Jinek et al. 2014). Many additional Cas9 proteins from diverse bacteria are being characterized and will soon help to expand the CRISPR–Cas system toolkit (Fonfara et al. 2014).

To begin using the CRISPR–Cas genome editing system, the researcher needs only to identify a 20 bp sequence from the target DNA that is followed by the protospacer adjacent motif (PAM) NGG (Fig. 1a), and clone it into the sgRNA expression vector appropriate for the organism of interest (Ran et al. 2013a, b).
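The target-site search described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the toy input sequence are hypothetical, and the code simply reports every 20-base stretch followed by an NGG motif on either strand, without any of the specificity checks that a real design tool would apply.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_target_sites(seq, guide_len=20):
    """List candidate sites as (strand, start, protospacer, pam).

    'start' is the 0-based position, on the given (forward) sequence,
    of the leftmost base of the whole protospacer + PAM site.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    site_len = guide_len + 3
    sites = []
    for m in pattern.finditer(seq):
        sites.append(("+", m.start(), m.group(1), m.group(2)))
    for m in pattern.finditer(reverse_complement(seq)):
        # Convert reverse-strand coordinates back to the forward sequence.
        start = len(seq) - m.start() - site_len
        sites.append(("-", start, m.group(1), m.group(2)))
    return sites


if __name__ == "__main__":
    # Toy sequence invented for illustration; not a real genomic region.
    toy = "TTACGATCGATCGGATCCATGCATGCAAGGTT"
    for site in find_target_sites(toy):
        print(site)
```

The protospacer returned for each site is the sequence that would be cloned into the sgRNA expression vector; the PAM itself is not part of the guide.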

Fig. 1

The principle of CRISPR–Cas genome editing. a Example of a CRISPR-binding site in mouse Tyr exon 1 (Giraldo and Montoliu 2002). The N(20) sequence is shown in bold, the PAM motif is underlined. b Typically, a DSB is produced 3 bp upstream of the PAM motif. The cell response to DNA damage includes an error-prone DNA repair mechanism (NHEJ), a HDR mechanism and might trigger apoptosis if the DNA damage cannot be repaired. The editing applications are listed beneath each repair route

As in the case of the zinc-finger nucleases (ZFN; Remy et al. 2010) and transcription activator-like effector nucleases (TALEN; Joung and Sander 2013), genome editing with CRISPR–Cas depends largely on the cell processes triggered by the DNA double strand break (DSB) at the targeted locus (Fig. 1b). The non-homologous end-joining (NHEJ) pathway repairs DNA damage in the absence of template DNA. Small insertions or deletions (indels) are introduced through NHEJ-mediated repair, rendering the resulting genetic modification unpredictable (Barnes 2001). When a DNA donor template is provided, a DSB can also be repaired through homology-driven or homology-directed repair (HDR); a DSB increases the frequency of homologous recombination (Liang et al. 1996; Urnov et al. 2005).

NHEJ-based CRISPR–Cas applications

NHEJ-mediated DSB repair leaves a footprint at the target site; depending on the nature of the two DNA ends, this can range from a single nucleotide exchange, deletion, or insertion to deletion of a few hundred base pairs. This error-prone DNA repair mechanism appears to be based on microhomologies between the two DNA ends. If the DSB overhangs do not allow immediate base pairing, cell exonucleases and polymerases will remove or add nucleotides until microhomology is suitable for DNA end joining (Lieber et al. 2003). This DNA scar, if located in the first coding exon of a gene, will probably result in a frame-shift mutation and almost complete loss of the protein encoded, hence a knockout (KO) allele will be produced simply by targeting the CRISPR–Cas mix to the desired DNA sequence.

Huang and colleagues first showed disruption of an EGFP transgene in mouse and in zebrafish transgenic lines (Shen et al. 2013). This breakthrough report was the first successful proof-of-principle of in vivo CRISPR–Cas use, although no information was provided regarding germ-line transmission of the mutations induced (often found as mosaics). In addition, the mice and zebrafish lines used in this study bore multiple copies of a GFP transgene, further complicating analysis of germ-line transmission.

One of the unique advantages of the CRISPR–Cas system is that NHEJ can be triggered simultaneously at various endogenous loci by co-injection of several sgRNA molecules with Cas9 mRNA. The first of these multiple-targeting experiments in mice, which included as many as five different sgRNA molecules, was reported just 1 year ago (Wang et al. 2013). CRISPR–Cas appears to be the editing tool of choice in mice compared to TALEN or ZFN, not only for its greater activity, but also because of the smaller RNA load needed to achieve such multiple gene targeting events (Wang et al. 2013). Multiplex genome editing using TALEN or ZFN is technically demanding, as the total amount of RNA involved would easily reach toxic concentrations before each individual nuclease reaches its working concentration. This might explain why no experiments involving multiple simultaneous gene targeting events have been reported for ZFN or TALEN.

CRISPR–Cas can be used to target two genes simultaneously on the same chromosome, as was shown for mice (Zhou et al. 2014); this would otherwise require several breeding rounds of two independent KO mouse lines, or two sequential rounds of gene targeting in embryonic stem (ES) cells using distinct, compatible selection approaches. Results were similar in rats, although it was observed that targeting two nearby sequences could lead to deletion of the intervening DNA (Li et al. 2013a). Ma et al. reported the simultaneous disruption of four genes in rats, using one or two sgRNA for each targeted gene (Ma et al. 2014a). Although illustrative of the unprecedented genome editing ability of CRISPR–Cas, these experiments also underline the fact that this approach can easily produce genetically difficult-to-handle animal models, due to the obvious segregation of the many simultaneous mutations in the founder animal.

Generation of full KO mouse lines by CRISPR–Cas embryo injection appears robust and reproducible, although key methodological details remain to be standardized, such as correct RNA concentration (or range) and recommended microinjection route (pronuclear-only, cytoplasmic-only, or a combination of the two). A recent publication provided a detailed technical note to guide microinjectionists in this task; after testing different microinjection routes, the authors recommended cytoplasmic-only microinjections (Horii et al. 2014); microinjection of some material into the pronucleus can be used to confirm positive delivery.

Genome-editing nucleases first allowed generation of targeted zebrafish KO lines (Meng et al. 2008; Bedell et al. 2012), and time-consuming, random ENU mutagenesis was no longer necessary (Kettleborough et al. 2011). Using CRISPR, biallelic KO of multiple targets can now be generated efficiently in zebrafish (Jao et al. 2013).

To mimic large chromosomal rearrangements, distal sequences on the same chromosome can be targeted with different sgRNA, to promote inversion or deletion of the intervening DNA region. Deletions as large as 10 kb were obtained using two distal sgRNA in mouse zygotes (Fujii et al. 2013) and haploid ES cells (Horii et al. 2013). In zebrafish, a 40 kb DNA region was deleted and inverted (Xiao et al. 2013). This strategy can be applied to study large non-coding regulatory elements that might need full deletion, rather than point inactivation, to observe the underlying phenotype. With the CRISPR–Cas system, functional analysis of intergenic/non-coding regulatory DNA sequences, recently annotated by the ENCODE project (Bernstein et al. 2012), is becoming a reality.

Although it is apparently counterintuitive, the error-prone NHEJ repair pathway can promote efficiently targeted transgene integration, with no need for a targeting vector with large homology arms (Auer et al. 2014). The knock-in (KI) vector must bear the same target sequence as the endogenous target region, and must be provided as a circular plasmid. This new method greatly simplifies the complex genotyping strategies, based on laborious Southern blot and long-range PCR assays, needed to identify correctly targeted alleles in the presence of large homology arms (Lay et al. 1998). A homology-independent KI approach was tested in cultured cells with TALEN and ZFN (Maresca et al. 2013), but has not been reported in vivo. Although efficient and straightforward, these KI are thus still largely based on error-prone NHEJ-mediated DNA repair, and the targeted insertion will be accompanied by small indels at the insertion site. For gene-replacement or in-frame reporter gene insertion, one should thus recall that only one-third of the integrants will retain the insertion site open reading frame in the inserted cassette.
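The one-third figure follows from simple reading-frame arithmetic: a junction keeps the frame only when the net number of inserted or deleted base pairs is a multiple of three. The short sketch below makes this explicit. It assumes, purely for illustration, that every net indel size in a symmetric range is equally likely; real NHEJ junctions do not follow such a uniform distribution, and the function names are hypothetical.

```python
def keeps_reading_frame(net_indel_bp):
    """True if a net insertion (+) or deletion (-) preserves the frame."""
    return net_indel_bp % 3 == 0


def in_frame_fraction(net_indel_sizes):
    """Fraction of junctions that remain in frame."""
    sizes = list(net_indel_sizes)
    return sum(keeps_reading_frame(n) for n in sizes) / len(sizes)


if __name__ == "__main__":
    # Assumption: net indel sizes from -15 to +14 bp, all equally likely.
    print(in_frame_fraction(range(-15, 15)))
```

Under this simplifying assumption, the remaining two-thirds of the integrants carry a frame-shift at the junction and would need to be discarded by sequencing the insertion site.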

Generation and genetic modification of haploid ES cells is a promising tool that will expedite the generation of biallelic complex modified alleles (Li et al. 2012). CRISPR–Cas was used successfully to engineer mouse (Horii et al. 2013) and rat haploid stem cells, generating triple-mutant clones in a single step (Li et al. 2013b).

The CRISPR–Cas system was recently used to generate a double knockout in the Cynomolgus monkey (Niu et al. 2014), a technically challenging species. Cas9 mRNA and an sgRNA mix [directed against Nr0b1 (two sgRNA), Ppar-γ (two sgRNA) and Rag1 (one sgRNA)] were co-injected in intracytoplasmic sperm injection-fertilized eggs. Although some pregnancies were still in gestation at the time of publication, the first two newborn monkeys analysed showed genomic modification of two of the three target genes. Several mutated genotypes were found in the same individual, suggesting distinct CRISPR–Cas cleavage events at different embryonic stages (Niu et al. 2014).

HDR-based CRISPR–Cas applications

Standard ubiquitous targeted gene disruption, although valuable, might not satisfy all the geneticist’s wishes. Conditional mutagenesis allows interrogation of specific tissues, conditions or developmental stages, and is often used to rescue mutated alleles that would otherwise be lethal (e.g., Mastracci et al. 2013). In mice, conditional mutants are generated in a routine but complex process based on traditional gene targeting in ES cells (e.g., using the Cre-loxP system); the resulting mice bearing the genetically-altered loxP-tagged alleles are then bred with the Cre-driver line of choice (Rossant and McMahon 1999). The expression pattern of the Cre lines must be compatible with the organ, tissue or developmental stage in which gene inactivation is to be studied, and requires considerable expertise and resources. For this reason, the task of systematic generation of conditional KOs across all mouse genes has been delegated to specialized large-scale consortia such as the International Knockout Mouse Consortium (IKMC) (Skarnes et al. 2011). Even so, these large consortia often cannot fulfil all researcher interests for inactivation/deletion of a specific exon or introduction of a specific mutation. Targeted nucleases, and particularly the CRISPR–Cas system, might fill this gap, providing a simpler strategy for conditional mutagenesis that does not require ES cell work (Shen et al. 2014).

As commented above, a targeted DSB facilitates homologous recombination by activating the HDR pathway (Rouet et al. 1994). Wang et al. used a combination of an sgRNA and a single-stranded (ss)DNA donor to introduce precise point mutations by homologous recombination into mouse zygotes (Wang et al. 2013). Using two oligonucleotides bearing loxP sites and two sgRNA that targeted the desired insertion sites around a critical exon, they generated a conditional allele directly in a single step by embryo injection (Yang et al. 2013a). This impressive result unequivocally illustrates the power of CRISPR–Cas technology, as this new approach can be used to obtain mouse founders genotypically confirmed to carry a floxed allele in only about 2 months. In contrast, the traditional process (ES cell route) for producing a conditional (floxed) mouse mutant allele might require a month to build the targeting construct, 3 months for transfection, screening and validation of positive recombinant ES cell clones, and 4 to 6 additional months to obtain and confirm a germline-transmitting chimera.

Conditional alleles have also been generated in the rat by embryo injection of one or two sgRNA and a circular targeting vector bearing a floxed exon and short homology arms (Ma et al. 2014b). In this case, the use of two sgRNA appeared to be the most efficient strategy to target loxP sites at the correct location.

In biomedicine, animal disease modelling using sophisticated genome engineering techniques enables customised editing of the same mutations found in patients, rather than full disruption of the causative gene, which is rarely observed in man. The CRISPR–Cas technology could provide innovative, more efficient ways to generate precise animal models of human disease. Use of long, single-stranded oligonucleotides as DNA donor templates to repair the DSB created by these nucleases is the most desirable simplification of the process. Current in vitro studies nonetheless indicate that many parameters require additional optimisation (Yang et al. 2013b), including length of the DNA oligonucleotide, relative position between the cleavage site and the mutation and finally, which DNA strand (or strands) to provide. Quality, purity and concentration of the oligonucleotide DNA donor still lack the necessary standardisation.

In zebrafish, CRISPR–Cas and oligonucleotides were used to introduce an HA tag in two different genes (Hruscha et al. 2013). Accurate small insertions and single nucleotide substitutions were introduced with ssDNA donors; in some cases, additional mutations were found (Hwang et al. 2013), indicating a second NHEJ-repair event subsequent to the first homologous recombination event. Introduction of mutations in the PAM sequence, if compatible with the result desired, will of course prevent re-cutting the edited allele. In mice, correction of a mutation in the Crygc gene rescued a cataract phenotype (Wu et al. 2013). This correction could be driven by the endogenous wild type allele or by an oligonucleotide donor DNA; in this example, dominant-negative mutations with a pathological phenotype were reversed in the absence of a DNA donor, by simply designing an sgRNA specific for the mutated allele (Wu et al. 2013).

The CRISPR–Cas technology has also been welcomed in the field of large animal/livestock transgenesis. The first successful experiments for genetic alteration of pig and cow genomes using CRISPR–Cas systems were reported recently (Tan et al. 2013). In livestock, selective breeding (classical genetics) is traditionally used to transfer valuable traits to the production strain. This process is time-consuming, however, and desired traits are often genetically linked to undesired ones, resulting in lower quality breeds. Targeted nucleases and oligonucleotide donors can be used to introduce allelic variants of agricultural interest directly into the breed of choice. For example, bulls could be genetically dehorned by introducing the Angus POLLED allelic variant (Tan et al. 2013); interestingly, in this report CRISPR–Cas yielded fewer recombinant clones than TALEN.

The CRISPR–Cas system compared with other targeted nuclease platforms

Whereas ZFN design requires complex algorithms and extensive experimental validation to match the target site with a ZF array (Maeder et al. 2008), the TALEN platform benefits from direct correspondence between a set of protein modules (repeat variable di-residue; RVD) and each base pair of the target site (Moscou and Bogdanove 2009) with fewer sequence constraints (Doyle et al. 2012). This modular design allows application of simple, iterative assembly techniques such as Golden Gate (GG) assembly (Engler et al. 2009), which can be completed within a week (Fig. 2a). Solid protocols are reported using TALEN for mouse genome engineering (Hermann et al. 2014). In the case of ZFN, although there are open platforms (Hermann et al. 2012), most molecules are produced commercially and cannot be prepared easily by the researcher.

Fig. 2

Comparison of steps for assembly of TALEN and CRISPR reagents in the laboratory. a TALEN assembly needs two consecutive cloning steps, which involve pipetting of more than 40 different compounds. The full process, including assembly and validation in cultured cells, requires at least 7 days. b CRISPR nucleases can be assembled and tested in cultured cells within 5 days. Short annealed oligonucleotides are cloned in an sgRNA plasmid and co-transfected with a Cas9 expression vector. RE restriction enzyme, GG Golden Gate cloning system (Engler et al. 2009), NLS nuclear localisation signal

With CRISPR–Cas, the nuclease recognises its target by Watson–Crick base pairing with the RNA guide molecule. The only sequence constraint is the presence at the 3′ end of the target site of the PAM, an NGG trinucleotide, which occurs on average once every 8 bp in the mammalian genome (Cong et al. 2013).
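A back-of-the-envelope calculation is consistent with this spacing. The sketch below assumes equal frequencies of the four bases and independence between neighbouring positions, which real mammalian genomes do not satisfy, so it should be read only as a rough consistency check on the cited figure. A GG dinucleotide on the forward strand marks a PAM on that strand, and a CC dinucleotide marks a PAM on the reverse strand.

```python
from itertools import product


def expected_pam_spacing():
    """Expected distance (bp) between NGG PAMs, counting both strands.

    Assumes equal base frequencies and independent positions.
    """
    dinucleotides = ["".join(p) for p in product("ACGT", repeat=2)]
    # GG gives a PAM on the forward strand; CC is GG on the reverse strand.
    hits = [d for d in dinucleotides if d in ("GG", "CC")]
    probability = len(hits) / len(dinucleotides)  # 2 of 16 dinucleotides
    return 1 / probability


if __name__ == "__main__":
    print(expected_pam_spacing())
```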

In a comparative study using human pluripotent stem cells, TALEN mutagenesis efficiency ranged from 0 to 34 %, whereas CRISPR–Cas reached 51–79 % when used to target the same set of genes (Ding et al. 2013).

TALEN assembly requires a week’s work by an experienced molecular biologist (Cermak et al. 2011), and the cell must translate and correctly fold two 110 kDa proteins to scan and bind the correct DNA target site. In contrast, the laboratory workload for CRISPR–Cas production is minimal (Fig. 2b) and from the cellular point of view, a 100 nt-long sgRNA is delivered and Cas9 mRNA is translated and folded into a single 150 kDa protein.

The rapid optimisation and wide application of the CRISPR–Cas tools in genome editing are also due to uncomplicated access to inexpensive reagents for academic use from plasmid repositories such as Addgene (Baker 2014). These distribution portals explain the rapid universalisation of CRISPR–Cas reagents.

Cas9 and sgRNA variants (activity and off-targeting)

After codon optimisation and addition of a suitable nuclear localisation signal, Streptococcus pyogenes Cas9 is able to mediate RNA-guided dsDNA cleavage in vertebrates. Last year, four independent groups nonetheless reported that the RNA-guiding process tolerates several mismatches between the RNA guide and the target DNA sequence (Fu et al. 2013; Mali et al. 2013; Pattanayak et al. 2013; Hsu et al. 2013), questioning the specificity of the CRISPR–Cas system. These studies were conducted mainly using cell lines, in which undesired genetic modifications cannot be easily segregated by breeding, as they can in animal models such as mice. These reports nevertheless pushed the scientific community into a search for new, safer strategies. Zhang and co-workers proposed the use of two offset sgRNA together with a new Cas9 variant, D10A, a mutant version that lacks one of the two nuclease domains (Jinek et al. 2012; Fig. 3a). This procedure generates two nearby single-strand breaks (nicks) (Fig. 3b) that are still able to trigger both NHEJ and HDR responses. With this approach, two sgRNA–DNA interactions are necessary to mediate a DSB, while undesired contacts of a single sgRNA with off-target sequences will result in non-mutagenic, easily repairable nicks on just one DNA strand (Ran et al. 2013a, b). Combined use of D10A, known as “nickase”, with duplicated sets of sgRNA boosts the efficiency and nearly abolishes the off-target mutations observed (Ran et al. 2013a, b). This double-nicking strategy is effective in mice; Skarnes and colleagues obtained gene KO in the near-absence of off-target mutations (Shen et al. 2014), and Fujii et al. produced a 1 kb deletion by combining four sgRNA (that produced two distal DSB) and Cas9/nickase (Fujii et al. 2014). Use of the Cas9/nickase variant will greatly benefit multiplex editing, as simultaneous targeting of several genes by standard Cas9 protein exponentially increases the risk of off-targeting.

Fig. 3

DNA cleavage using wild type and mutant D10A Cas9 proteins. a Above using original Cas9, a double-stranded blunt DNA cleavage is typically observed 3 bp upstream of the PAM motif. Below using the mutant D10A Cas9 variant (nickase), a single-stranded DNA nick is produced in the DNA strand homologous to the sgRNA used. b Two DNA nicks, correctly ordered and spaced, can trigger the same DNA repair pathway and associated mutagenesis as a DSB

Alternative design of guide RNA increased specificity in genome editing; a recent report showed that truncated sgRNA can be more specific on-target than full-length sgRNA (Fu et al. 2014). In the presence of mismatches, shorter sgRNA will interact weakly with DNA. Reducing mismatch tolerance increases specificity. It is tempting to speculate that combined double-nicking strategy and truncated sgRNA design could further improve the specificity of genome engineering.

The relationship between binding and cleavage efficiencies was further studied using a non-functional Cas9 variant combined with sgRNA targeted for a specific locus (Wu et al. 2014). The authors propose a two-state model for Cas9 binding and cleavage, in which a seed match would trigger binding, but extensive additional pairing with target DNA would be necessary for effective cleavage (Wu et al. 2014).
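The logic of such a two-state model can be illustrated with a toy classifier. The sketch below is not the model of Wu et al.: the seed length, the mismatch threshold, the all-or-nothing rules and the function names are hypothetical choices made only to show the idea. It assumes the seed lies at the PAM-proximal (3′) end of the guide, and that guide and candidate site are written as DNA strings of equal length in the same 5′ to 3′ orientation.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def predict_outcome(guide, site, seed_len=5, max_total_mismatches=2):
    """Toy two-state call for one guide/site pair.

    seed_len and max_total_mismatches are illustrative assumptions,
    not measured values.
    """
    seed_mismatches = count_mismatches(guide[-seed_len:], site[-seed_len:])
    total_mismatches = count_mismatches(guide, site)
    if seed_mismatches > 0:
        return "no binding"
    if total_mismatches > max_total_mismatches:
        return "binding only"
    return "binding and cleavage"


if __name__ == "__main__":
    guide = "ACGTACGTACGTACGTACGT"  # invented 20-base guide
    print(predict_outcome(guide, guide))
    print(predict_outcome(guide, "TTTT" + guide[4:]))
    print(predict_outcome(guide, guide[:-1] + "A"))
```

In this simplified picture, a site that matches the seed but diverges elsewhere is bound without being cut, whereas a site with a seed mismatch is not bound at all.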

In summary, the widespread success of the CRISPR–Cas technology in less than a year suggests that we are witnessing the birth of a new era in genome engineering. Similar to the revolution in genetic engineering caused by restriction enzymes in the 1970s, the CRISPR–Cas approach (another restriction system from bacteria) appears to offer a new paradigm for genome manipulation. This will enable the production of improved animal models of human disease, the generation of safer and more precise biotechnological products, and possibly a new way to devise innovative gene therapy approaches.