Introduction

In the past decade, precise and efficient genome-targeting technologies have emerged that enable systematic reverse engineering of causal genetic variations by allowing selective perturbation of individual genetic elements. RNA interference (RNAi), zinc finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs) have provided valuable technologies for targeted gene regulation in a diverse range of cell types and model organisms (Joung and Sander 2012; Zhang et al. 2010, 2013b; Li et al. 2012; Gaj et al. 2013; Mussolino and Cathomen 2013; Streubel et al. 2012; Chang et al. 2013; Hockemeyer et al. 2011; Shan et al. 2013a; Hannon 2002). RNAi offers a relatively rapid, inexpensive, powerful, reliable, and high-throughput method for genome-wide loss-of-function screening (Berns et al. 2004; Boutros et al. 2004); however, RNAi is imperfect because it provides only temporary inhibition of gene function and can exhibit unpredictable off-target effects on other mRNAs (Echeverri et al. 2006; Kaelin 2012). ZFNs and TALENs, in contrast, are custom DNA-binding proteins: hybrid proteins created by fusing a ZF or TALE DNA-binding domain to the nonspecific cleavage domain of the FokI endonuclease (Nekrasov et al. 2013; Gaj et al. 2013; Bogdanove and Voytas 2011). The FokI cleavage domain must dimerize to cleave the DNA target (Klug 2010; Moscou and Bogdanove 2009). ZFNs and TALENs can be programmed to cleave genomes at specific locations; however, these technologies demand elaborate design and assembly of an individual DNA-binding protein for each DNA target sequence. These chimeric nucleases have been successful in genome modification by generating DNA double-strand breaks (DSBs) that stimulate the standard cellular DNA repair mechanisms, including error-prone non-homologous end joining (NHEJ) and homology-directed repair (HDR) (Weinthal et al. 2010; Gaj et al. 2013; Charpentier and Doudna 2013).
NHEJ-mediated repair typically introduces small insertions or deletions (indels) at the site of the break, resulting in knockout of gene function via frameshift mutations (Gaj et al. 2013; Chang et al. 2013). HDR, however, requires a homologous DNA segment as a template to correct or replace existing genes (Weinthal et al. 2010; Charpentier and Doudna 2013; Gaj et al. 2013). Although HDR is inefficient in a variety of cell types and organisms, it can be used to generate precise, defined modifications at the target site.

Recently, another genome editing technology was developed that is markedly simple, versatile and efficient: the clustered regularly interspaced short palindromic repeats–CRISPR-associated protein (CRISPR–Cas) system.

Mechanisms of the CRISPR–Cas defense system

In 1987, a research team observed an unusual repetitive segment adjacent to a bacterial gene (Ishino et al. 1987). For years, many researchers assumed that these odd sequences were junk; in 2005, however, three groups independently reported that the segments often matched the sequences of phages or plasmids, indicating a possible role for CRISPR in immunity against transmissible genetic elements (Bolotin et al. 2005; Mojica et al. 2005; Pourcel et al. 2005). CRISPR–Cas systems constitute a widespread class of immunity systems that protect bacteria and archaea from invading viruses and plasmids via RNA-guided DNA cleavage in three steps (Wiedenheft et al. 2012; Gaj et al. 2013; Marx 2007) (Fig. 1). During the acquisition phase, spacers derived from viral or plasmid DNA are recognized and subsequently integrated between two adjacent repeat units within the CRISPR locus (Barrangou et al. 2007; Garneau et al. 2010; Yosef et al. 2012; Deveau et al. 2008; Swarts et al. 2012; Datsenko et al. 2012; Cady et al. 2012; Lopez-Sanchez et al. 2012; Deltcheva et al. 2011) (Fig. 1). During the expression phase, the CRISPR locus is transcribed from the leader region as a precursor CRISPR RNA (pre-crRNA) containing the full set of CRISPR repeats and embedded invader-derived sequences (Deltcheva et al. 2011). Next, specific endoribonucleases cleave the pre-crRNA into short guide CRISPR RNAs (crRNAs), each consisting of a unique single repeat-spacer element (Deltcheva et al. 2011; Brouns et al. 2008; Carte et al. 2008; Haurwitz et al. 2010; Hatoum-Aslan et al. 2011; Garside et al. 2012; Gesner et al. 2011; Sashital et al. 2011; Charpentier and Doudna 2013) (Fig. 1).
During the interference phase, the mature crRNA is incorporated into a large multiprotein complex, called the CRISPR-associated complex for antiviral defense (CASCADE), which recognizes and base-pairs specifically with perfectly complementary regions of incoming cognate nucleic acids, triggering degradation or silencing of the foreign sequences (Garneau et al. 2010; Wiedenheft et al. 2012; Deveau et al. 2010; Horvath and Barrangou 2010; Koonin and Makarova 2009; Marraffini and Sontheimer 2008, 2010a, b; Sorek et al. 2008; van der Oost et al. 2009; Waters and Storz 2009; Hale et al. 2009; Beloglazova et al. 2011; Jore et al. 2011; Mulepati and Bailey 2011; Wiedenheft et al. 2011a, b; Makarova et al. 2011) (Fig. 1).

Fig. 1

Diversity of CRISPR-mediated adaptive immune systems. CRISPR–Cas systems act in three stages: acquisition, expression and interference. Specific protospacers (with an adjacent PAM) of double-stranded DNA from an invading virus or plasmid are acquired at the leader end of a CRISPR array on host DNA. Each CRISPR locus consists of a series of direct repeats separated by unique spacer sequences acquired from protospacers (Marraffini and Sontheimer 2008, 2010b). After the initial recognition step, Cas1 and Cas2, which are usually encoded in the vicinity of the CRISPR array, most probably incorporate the protospacers into the CRISPR locus to form spacers. Pre-crRNA is transcribed from the leader region by RNA polymerase and further processed into short mature crRNAs. The interference process differs between the Type I, II, and III systems. In Type I and III, the CASCADE complex binds pre-crRNA, which is cleaved by a CRISPR-specific endoribonuclease, resulting in crRNAs with a typical 8-nt repeat fragment upstream of each spacer sequence (Gesner et al. 2011; Haurwitz et al. 2010; Carte et al. 2010). In Type III, Cas6 is responsible for the processing step, but the crRNAs seem to be transferred to a specific Cas complex (Csm in subtype III-A and Cmr in subtype III-B) (Carte et al. 2008). In Type II, a tracrRNA pairs with the repeat region of the pre-crRNA, followed by cleavage within the repeats by the host RNase III in the presence of Cas9 (Deltcheva et al. 2011). The final step results in cleavage of the invading nucleic acid and proceeds differently in each system. In Type I, crRNA bound to the CASCADE complex can target invaders that contain complementary DNA, and the Cas3 subunit is probably responsible for cleavage of the invading DNA (Sontheimer and Marraffini 2010; Jore et al. 2011; Wiedenheft et al. 2011b; Sinkunas et al. 2011). The two subtypes of CRISPR–Cas Type III systems target either DNA (subtype III-A; Marraffini and Sontheimer 2008) or RNA (subtype III-B; Hale et al. 2009), and a PAM does not appear to be required for the activity of Type III. In Type II, Cas9 loaded with crRNA can probably target invading DNA for cleavage (open orange triangle) in a process that requires the PAM (Haurwitz et al. 2010). Modified from Makarova et al. (2011).

Architecture and characters of CRISPR systems

CRISPR loci typically consist of several noncontiguous, highly conserved direct repeats separated by stretches of variable sequences called spacers, which mostly correspond to captured viral and plasmid sequences. The loci are often adjacent to groups of conserved protein-encoding genes, named cas genes (Horvath and Barrangou 2010). Bioinformatic analyses indicate that cas genes encode a large and heterogeneous family of proteins carrying identifiable functional domains typical of nucleases, helicases, polymerases, and polynucleotide-binding proteins, which led to the initial speculation that they may be part of a novel DNA repair system. CRISPR–Cas systems can be divided into two partially independent subsystems: the highly conserved ‘information processing’ subsystem, which is involved in the adaptation phase and requires the universally present core proteins Cas1 and Cas2, and the ‘executive’ subsystem, which is involved in crRNA processing and interference with invading foreign nucleic acid and is quite diverse (Bhaya et al. 2011; Makarova et al. 2011; Horvath and Barrangou 2010; van der Oost et al. 2009). Repeat-associated mysterious proteins (RAMPs or Cmr), which constitute a large superfamily of Cas proteins, contain at least one RNA recognition motif (RRM; also called the ferredoxin-fold domain). They are somewhat functionally analogous to CASCADE and have been shown to be involved in the processing of pre-crRNA transcripts (Makarova et al. 2011; Hale et al. 2009; Horvath and Barrangou 2010).
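The repeat-spacer architecture described above can be made concrete with a minimal Python sketch. It assumes a perfectly conserved repeat and uses invented leader, repeat and spacer sequences purely for illustration; real arrays contain imperfect repeats and would need a tolerant search. The function name `extract_spacers` is hypothetical.

```python
def extract_spacers(array_seq, repeat):
    """Return the sequences lying between consecutive copies of `repeat`.

    Simplifying assumption: every repeat copy is identical to `repeat`.
    """
    parts = array_seq.split(repeat)
    # parts[0] is whatever precedes the first repeat (for example the leader);
    # parts[-1] is whatever follows the last repeat. Neither is a spacer.
    return [p for p in parts[1:-1] if p]


# Invented toy sequences, not taken from any real CRISPR locus.
LEADER = "ATATAT"
REPEAT = "GTTTTAGAGC"
SPACERS = ["AAACCCGGGTTT", "TTGACCATGCAA"]

toy_array = LEADER + REPEAT + SPACERS[0] + REPEAT + SPACERS[1] + REPEAT
recovered = extract_spacers(toy_array, REPEAT)
```

Each recovered spacer stands for one invader-derived sequence, in the order in which it appears along the array.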

Based on a classification that integrates phylogeny, gene conservation, locus organization, and content, CRISPR–Cas systems have recently been divided into three distinct types: type I, type II, and type III (Bhaya et al. 2011; Wiedenheft et al. 2012; Makarova et al. 2011) (Fig. 1). The classification reflects an evolution of the defense system into subtype-specific molecular mechanisms for expression and maturation of crRNAs and interference with invaders (Makarova et al. 2011). The type I and III systems share some biochemical and structural features: multiple specialized Cas proteins that form CASCADE-like complexes with demonstrated RNase activity are present in several copies in both type I and III systems. Cas endonucleases process the pre-crRNA into mature crRNAs, and each crRNA assembles into a large Cas effector complex that uses it to recognize and cleave cognate invading nucleic acids (Haurwitz et al. 2010; Jinek et al. 2012; Makarova et al. 2006, 2011; Wiedenheft et al. 2012) (Fig. 1). In contrast, type II systems have evolved distinct pre-crRNA processing and interference mechanisms in which a trans-activating crRNA (tracrRNA) binds to the repeat sequences of the pre-crRNA, forming a dual-RNA. The double-stranded (ds) RNA-specific ribonuclease RNase III then cleaves the RNA duplex formed by the CRISPR repeat and the tracrRNA (Jinek et al. 2012; Deltcheva et al. 2011; Bhaya et al. 2011; Chylinski et al. 2013; Wiedenheft et al. 2012; Makarova et al. 2011) (Fig. 1). In addition, the three types of CRISPR–Cas system show a distinctly non-uniform distribution: type I systems have been found in both bacteria and archaea, whereas type III systems appear more commonly in archaea. Type II systems have so far been found exclusively in bacteria (Makarova et al. 2011; Terns and Terns 2011; Bhaya et al. 2011).

Cas9 as an RNA-guided nuclease for genome editing

The best-studied Type II systems are the simplest of the three CRISPR–Cas types, with only four cas genes, one of which is always Cas9 (formerly Csn1) (Chang et al. 2013; Jinek et al. 2013). Cas9 is a single protein, a crRNA-guided double-stranded DNA endonuclease with two nuclease domains: an HNH (McrA-like) nuclease domain that cleaves the complementary DNA strand and a RuvC-like nuclease domain that cleaves the noncomplementary DNA strand (Jinek et al. 2012, 2014; Bikard et al. 2013; Chylinski et al. 2013; Fonfara et al. 2013; Jiang et al. 2013a) (Fig. 2). To form a functional DNA-targeting complex, Cas9 requires a guide RNA, which can be supplied as a chimeric single-guide RNA (sgRNA) that fuses the crRNA (each crRNA unit contains a 20-nt guide sequence and a partial direct repeat) to the tracrRNA. Target recognition and cleavage additionally require a short conserved sequence motif immediately downstream of the crRNA-binding region, called the CRISPR motif or protospacer adjacent motif (PAM) (Jinek et al. 2012; Garneau et al. 2010; Jiang et al. 2013a; Feng et al. 2013; Fu et al. 2013; Hsu et al. 2013; Carroll 2012) (Fig. 2). In the CRISPR–Cas system derived from the bacterium Streptococcus pyogenes, the target DNA must immediately precede a 5′-NGG PAM (Jinek et al. 2012), whereas other type II systems have been shown to have differing PAM requirements, which may constrain their ease of targeting (Mali et al. 2013b; Cong et al. 2013; Garneau et al. 2010; Gasiunas et al. 2012; Sapranauskas et al. 2011; Zhang et al. 2013a). RNA-guided Cas9 activity creates site-specific DSBs, which are then repaired by either NHEJ or HDR; during repair, the sequence at the break site can be modified or new genetic information inserted (Cong et al. 2013; Mali et al. 2013c; Cho et al. 2013) (Fig. 2). More intriguingly, the Cas9 protein and the sgRNA constitute the minimal set of two molecules necessary for targeted DNA cleavage.
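The targeting rule for S. pyogenes Cas9 (a 20-nt target immediately followed by a 5′-NGG PAM) lends itself to a short illustration. The following Python sketch is not a published design tool; the function names and the toy sequence are assumptions made for illustration. It scans both strands of a DNA string for candidate sites and reports the predicted blunt cut position 3 bp upstream of the PAM, as described in the legend of Fig. 2.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq, guide_len=20):
    """List candidate target sites: guide_len nt followed by an NGG PAM.

    Each hit is (strand, start, protospacer, pam, cut_index). Coordinates
    are 0-based and refer to the strand that was scanned, so '-' hits are
    positions on the reverse complement. cut_index is the index of the
    first base after the predicted blunt cut, 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # The lookahead makes overlapping candidate sites visible too.
        for m in pattern.finditer(s):
            start = m.start()
            hits.append((strand, start, m.group(1), m.group(2),
                         start + guide_len - 3))
    return hits


# Invented toy sequence containing exactly one NGG-flanked 20-mer.
toy = "TTT" + "GATTACAGATTACAGATTAC" + "AGG" + "TTT"
sites = find_spcas9_sites(toy)
```

Supporting a type II system with a different PAM would amount to changing the PAM part of the pattern, which is why differing PAM requirements directly affect which sites are targetable.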

Fig. 2

Targeted genome editing with RNA-guided Cas9. In a type II CRISPR–Cas system, Cas9 generates a blunt-ended double-stranded break 3 bp upstream of the PAM through a process mediated by two catalytic domains in the protein, an HNH domain and a RuvC-like domain, each of which cleaves one strand within the target DNA (Mali et al. 2013a; Jinek et al. 2012). Cas9 nucleases carry out strand-specific cleavage (Jinek et al. 2012; Ran et al. 2013b). Nuclease-induced DSBs can be repaired by NHEJ, which disrupts the genome, or by HDR, which enables defined modification of the genome (Mali et al. 2013a, b, c; Cong et al. 2013).

What makes the CRISPR–Cas9 system even more attractive is the ease, high efficiency, and versatility of the technology. Jinek and colleagues engineered the dual tracrRNA:crRNA as a single RNA molecule (sgRNA), combined it with Cas9, and showed that the synthetic complexes could target and cleave any dsDNA sequence of interest (Jinek et al. 2012). The type II CRISPR system from bacteria has been rapidly applied to achieve efficient and robust RNA-guided genome editing in different species (Horvath and Barrangou 2010; Jiang et al. 2013a; Jinek et al. 2012; Makarova et al. 2011; Marraffini and Sontheimer 2010a; Sorek et al. 2008; Wiedenheft et al. 2012). Significantly, recent studies demonstrate that the CRISPR–Cas system can function in human cells. Several groups showed that a synthetic sgRNA consisting of a fusion of crRNA and tracrRNA can direct a ‘humanized’ Cas9 endonuclease in various human cell lines, including induced pluripotent stem cells, and they observed the expected alterations to the target DNA (Cong et al. 2013; Jinek et al. 2013; Mali et al. 2013c; Fu et al. 2013; Cho et al. 2013). Cas9 endonucleases have also been shown to act as nickases, enabling an additional level of control over the mechanism of DNA repair (Cong et al. 2013; Mali et al. 2013a). To date, in addition to human cells, the CRISPR–Cas system has been successfully applied to achieve efficient genome editing in many eukaryotic organisms, including Saccharomyces cerevisiae (DiCarlo et al. 2013), Caenorhabditis elegans (Dickinson et al. 2013; Friedland et al. 2013), Drosophila (Yu et al. 2013), zebrafish (Chang et al. 2013; Hwang et al. 2013; Jao et al. 2013), mouse (Shen et al. 2013; Li et al. 2013a; Wang et al. 2013a; Wu et al. 2013; Yang et al. 2013), and rat (Li et al. 2013a, c). At the same time, the feasibility and efficacy of the CRISPR–Cas system has been demonstrated in the plants Arabidopsis thaliana (Feng et al. 2013; Li et al. 2013b; Jiang et al. 2013b) and Nicotiana benthamiana (Li et al. 2013b; Nekrasov et al. 2013), and in the cultivated food crops rice (Oryza sativa) (Feng et al. 2013; Miao et al. 2013; Shan et al. 2013b; Jiang et al. 2013b), wheat (Triticum aestivum) (Shan et al. 2013b) and sorghum (Jiang et al. 2013b) (Fig. 3). Indeed, these findings hint that RNA-guided Cas9 might be useful for engineering other multicellular organisms, including animals and plants. Recently, two research groups further demonstrated the functionality of the RNA-guided CRISPR–Cas9 system in both human and mouse cells and showed that multiplex editing of target genes is feasible upon introduction of multiple sgRNAs at the same time (Cong et al. 2013; Mali et al. 2013c; Pennisi 2013). Using the Cas9 system, Li and colleagues successfully targeted five genes in Arabidopsis or N. benthamiana and achieved efficient targeted mutagenesis in all cases (Li et al. 2013b). Subsequently, Gao’s and Zhu’s teams achieved highly efficient targeted mutagenesis of multiple genes in rice (Shan et al. 2013b; Feng et al. 2013). Importantly, stable expression of the Cas9 system in transgenic animals and plants led to mutations in target genes. Impressively, the system was modified to make it more efficient and better suited to editing multiple endogenous genes, by programming Cas9 to edit several sites in a genome simultaneously simply through the use of multiple guide RNAs (Shan et al. 2013b; Li et al. 2013a, b; Wang et al. 2013a). These pioneering experiments provide dramatic evidence that the technique could be used to engineer these model plant systems and crucial crop species.

Fig. 3

Potential applications of CRISPR–Cas9 systems. Beyond their natural role as immunity systems that protect bacteria and archaea from invading viruses and plasmids, the diverse potential applications of Cas9 range from targeted genome editing to targeted genome regulation, and possibly to the introduction of custom changes in the complex epigenome.

In addition to genome editing, CRISPR interference (CRISPRi) can efficiently and selectively repress or activate transcription of targeted genes using a modified Cas9 protein lacking endonucleolytic activity (Qi et al. 2013; Gilbert et al. 2013; Larson et al. 2013) (Fig. 3). Thus, CRISPRi has the potential to be utilized as an efficient and flexible platform for engineering and controlling transcriptional regulatory networks without altering the target DNA sequence. Furthermore, nuclease-null Cas9 has been used to target proteins with specific functions to edit the epigenome (Rusk 2014). Further regulation could occur through histone modifications (acetylation and methylation) that change chromatin states, and through DNA methylation (Fig. 3).

Limitation and expansion of the Cas9 system

Although CRISPR–Cas systems show great promise and flexibility for genetic engineering, sequence requirements within the PAM may constrain some applications. In addition to the targeting range, another key question concerning the specificity of CRISPR–Cas RNA-guided endonucleases is the extent of off-target cleavage, which needs to be evaluated. The issue of specificity is paramount for all targetable nucleases. Off-target cleavage by ZFNs and TALENs has been reduced by modifying the cleavage domain to require the formation of heterodimers (Carroll 2013). Present early-phase versions of the Cas9 system may also suffer to some degree from the same problem. In the CRISPR–Cas system, earlier studies have demonstrated that, although each base within the 20-nt guide sequence contributes to overall specificity, some base mismatches between the guide RNA and target DNA are tolerated depending on the quantity, position, and base identity of the mismatches, leading to potential off-target DSB formation (Cong et al. 2013; Fu et al. 2013; Hsu et al. 2013; Jiang et al. 2013a). A high frequency of off-target CRISPR–Cas-induced mutagenesis has been reported in human cells (Pattanayak et al. 2013; Fu et al. 2013), with lower off-target effects in mice and zebrafish (Yang et al. 2013; Hruscha et al. 2013). In addition, studies using genome-wide sequencing found no detectable off-target genome modifications in Arabidopsis and N. benthamiana (Feng et al. 2014; Nekrasov et al. 2013). Nevertheless, more comprehensive studies are required to thoroughly address the off-target issue for the CRISPR–Cas system in other plant species or for other target genes. For routine application of Cas9, it is important to consider ways to reduce the frequency of unexpected mutations from off-target genome modification and to be able to detect the presence of off-target cleavage (Hsu et al. 2013; Fu et al. 2013; Jiang et al. 2013a).
Although imperfect Cas9 specificity is a major reason for concern, there are several potential ways to improve it. The challenges will be to analyse and address possible off-target effects and to improve the efficiency and specificity of the system. Attractive strategies for minimizing off-target mutagenesis include exploiting different Cas9 homologs identified through bioinformatics and directed evolution of these nucleases toward higher specificity. Alternatively, the range of targetable sites could be expanded through the use of homologs with different cognate PAM sequences. Additionally, a previous report showed that a Cas9 nickase mutant (Cas9n), which cuts only one DNA strand, facilitates HDR at on-target sites and can potentially increase the specificity of target recognition (Cong et al. 2013). More recently, a double-nicking strategy that combines Cas9n with paired guide RNAs was shown to maintain high on-target efficiencies while drastically reducing off-target modifications to background levels (Ran et al. 2013a). Finally, a more thorough sequencing analysis for a large number of sgRNAs will provide more information about the potential off-target cleavage of the CRISPR–Cas system and lead to better prediction of potential off-target sites.
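As a rough illustration of how potential off-target sites might be enumerated, the Python sketch below scans a sequence for NGG-flanked windows that differ from a guide by at most a chosen number of mismatches. It is a deliberately naive model resting on stated assumptions: only the forward strand is scanned, every mismatch is weighted equally (although, as noted above, the position and base identity of mismatches also matter), and the function names, threshold and toy sequences are invented for illustration.

```python
def count_mismatches(guide, site):
    """Number of positions at which two equal-length sequences differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide.upper(), site.upper()))


def scan_for_guide_matches(guide, genome, max_mismatches=3):
    """Find NGG-flanked windows within max_mismatches of the guide.

    Returns (start, site, mismatches) tuples with 0-based starts on the
    forward strand. A hit with 0 mismatches is an on-target site; hits
    with more are candidate off-target sites under this naive model.
    """
    genome = genome.upper()
    n = len(guide)
    hits = []
    for i in range(len(genome) - n - 3 + 1):
        pam = genome[i + n:i + n + 3]
        if pam[1:] != "GG":  # require an NGG PAM directly after the window
            continue
        site = genome[i:i + n]
        mismatches = count_mismatches(guide, site)
        if mismatches <= max_mismatches:
            hits.append((i, site, mismatches))
    return hits


# Invented toy data: one perfect site and one site with two mismatches.
guide = "GATTACAGATTACAGATTAC"
near_match = "CATTACAGATAACAGATTAC"
toy_genome = guide + "TGG" + "AAAA" + near_match + "CGG"
hits = scan_for_guide_matches(guide, toy_genome)
```

A realistic predictor would also scan the reverse strand and weight mismatches by position, which is the kind of refinement that larger sequencing data sets for many sgRNAs could inform.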

Comparison with other genome editing technologies

ZFNs, TALENs, and RNA-guided DNA endonucleases are transformative tools that have broad implications for synthetic biology, the direct and multiplexed perturbation of gene networks, and targeted in vitro and in vivo gene therapy (Gaj et al. 2013). The CRISPR–Cas9 system, however, offers several potential advantages over ZFNs and TALENs. ZFNs and TALENs require a complex design for each target gene, their targeting efficiency may vary substantially, and no multiplexed gene targeting with them has been reported to date. In comparison, CRISPR–Cas9 systems not only offer a simpler means of attaining specificity and demonstrate equal or greater cleavage efficacy, but also provide a gene editing tool that can more easily be targeted to one or more genomic loci. Furthermore, ZFNs and TALENs locate target sequences using proteins that are often difficult and costly to produce, whereas the sgRNAs of the CRISPR–Cas9 system are much easier to make. CRISPR systems have stormed onto the scene, promising even to out-compete ZFNs and TALENs. Ultimately, CRISPR may take a place beside ZFNs and TALENs, with the choice of editing tool depending on the particular application.

Future directions

The discovery and application of bacterial systems have revolutionized molecular biology in the past. Although significant progress has been made in the last few years, many central aspects of CRISPR–Cas remain obscure. An important open question is how safe, effective and specific the CRISPR–Cas9 system is. In addition, the optimal RNA scaffold for powerful application of CRISPR–Cas9 in multiple eukaryotic systems is still unclear. Furthermore, how effectively Cas9 systems can serve as a basis for generating versatile and heritable modifications at specific target genes in animals and plants also awaits elucidation. Further research will raise new questions and highlight the areas with the greatest potential for future work.

Given the dizzying rate at which CRISPR-targeting publications are appearing, researchers are clearly eager to capitalize on these advantages. Ideally, research teams aim to build libraries of CRISPR guides that can be harnessed to target any sequence in an organism’s entire genome, including promoters, enhancers, introns, and intergenic regions, which are inaccessible by means of RNAi (Shalem et al. 2013). In particular, minimizing off-target mutagenesis with the CRISPR–Cas9 system could overcome one of the major limitations of RNAi and allow access to an entirely new repertoire of regulation of gene function (Wang et al. 2013b). Recently, two studies were published that used CRISPRs for genome-scale loss-of-function screens in human cells. Moreover, relative to other methods of plant genome engineering and editing, the CRISPR–Cas9 system should be applicable to a wide range of higher plants. Notably, because the CRISPR–Cas9 system facilitates HDR, the technique may in the future be applied to insert sequences precisely at specific locations in other cereals with more complicated genomes, although this requires further investigation. This valuable new tool therefore holds significant promise for plant biologists and breeders. Looking forward, the versatility and ease of use afforded by RNA-guided Cas9 enzymes, coupled with their singular ability to bring together RNA and DNA in a fully programmable fashion, will form the basis of a versatile tool for rewriting genomic sequence information, with the potential to explore and reshape any genome and to constitute a new and promising paradigm for understanding gene function.