Main

Our previous analysis of the intracellular human pathogen Francisella novicida U112 by small RNA (sRNA) sequencing identified sRNAs expressed from two CRISPR–Cas loci13,16 (Extended Data Fig. 1a). As well as for the type II-B locus13, we detected sRNAs from a CRISPR–Cas locus that resembled the minimal architecture of type II systems but lacked a cas9 gene. Upstream of the cas1, cas2 and cas4 genes17, FTN_1397 was identified as a cas gene encoding a protein distinct in sequence from known Cas proteins; this was later named cpf1 (cas gene of Pasteurella, Francisella)17. This system was recently classified as a type V-A system belonging to the class 2 CRISPR–Cas systems18,19. The CRISPR array contains a series of nine spacer sequences separated by 36-nucleotide (nt) repeat sequences. The mature RNAs are composed of a repeat sequence in 5′ and spacer sequence in 3′, similar to the repeat-spacer composition of types I and III systems but distinct from the spacer-repeat composition of type II systems2,14,20 (Extended Data Fig. 1b). As in type I, the repeat forms a hairpin structure at its 3′ end20. Neither the presence of a Cas6 homologue nor the expression of a tracrRNA-like sRNA could be detected in the vicinity of the F. novicida type V-A locus, indicating that Cpf1 uses a distinct mode of crRNA biogenesis compared to the mechanisms that have been described thus far2,4,14.

We investigated whether Cpf1 acts as the single effector enzyme in pre-crRNA processing in type V-A systems. Recombinant F. novicida Cpf1 protein was overexpressed, purified and biochemically characterized. In contrast to the recently reported formation of Cpf1 dimers in solution16, our data reveal a molecular weight of 187 kDa (Extended Data Fig. 2), indicating that Cpf1 is a monomer. This result is corroborated by another study showing the crystal structure of Cpf1 from Lachnospiraceae bacterium (LbCpf1). No oligomerization of Cpf1 was observed in the crystals, analytical ultracentrifugation experiments or electron microscopy21. The monomeric nature is consistent with Cpf1 forming a complex with the guide crRNA to bind and cleave target DNA because if the active protein was a dimer16, it would probably require a tandem DNA target site, or alternatively, two different crRNAs targeting the top and bottom strand of the DNA.

In vitro cleavage assays show that Cpf1 processes a pre-crRNA consisting of a full-length repeat-spacer, yielding a 19-nt repeat fragment, and a 50-nt repeat-spacer crRNA intermediate (Fig. 1). Only RNAs with full-length repeat sequences were processed, indicating that the RNA cleavage activity is repeat-dependent (Extended Data Fig. 3a). The observed cleavage site is in good agreement with the data obtained by RNA-seq (Extended Data Fig. 1b) and a recent study16. The crRNAs produced in vitro represent intermediate forms that undergo further processing at the 5′ and 3′ ends by a nonspecific mechanism in vivo. Cpf1 cleaves pre-crRNA four nucleotides upstream of the stem-loop (Fig. 1). The cleavage site is reminiscent of many Cas6 enzymes and Cas5d, which recognize the hairpin of their respective repeats2,4,5,20. Cpf1, however, does not cleave directly at the base of the stem-loop, suggesting that the structure is not the only requirement for processing of pre-crRNA. Northern blot analysis using an inducible Escherichia coli heterologous system also demonstrates processing of pre-crRNA upon Cpf1 expression (Extended Data Fig. 3b), resulting in the expected RNA fragments.
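The fragment sizes reported above can be checked with simple arithmetic. The following is a minimal sketch, assuming that the 69-nt substrate is one full-length 36-nt repeat followed directly by the spacer, with no additional nucleotides. The spacer length of 33 nt is inferred from that assumption and is not stated in the text, and the function name is purely illustrative.

```python
# Lengths taken from the text: 36-nt repeat, 69-nt pre-crRNA substrate,
# 19-nt repeat fragment released by Cpf1.
REPEAT_LENGTH = 36
PRE_CRRNA_LENGTH = 69
REPEAT_FRAGMENT_LENGTH = 19


def cleavage_products(total_length, cut_after):
    """Return (5' fragment, 3' fragment) lengths for a single cut.

    cut_after is the number of nucleotides 5' of the scissile bond.
    """
    if not 0 < cut_after < total_length:
        raise ValueError("cut site must lie inside the substrate")
    return cut_after, total_length - cut_after


five_prime, three_prime = cleavage_products(PRE_CRRNA_LENGTH, REPEAT_FRAGMENT_LENGTH)

# Inferred (assumption): the substrate is repeat + spacer only.
spacer_length = PRE_CRRNA_LENGTH - REPEAT_LENGTH
repeat_retained = REPEAT_LENGTH - REPEAT_FRAGMENT_LENGTH

print(five_prime, three_prime)          # 19 50
print(repeat_retained + spacer_length)  # 50
```

Under this assumption, the 50-nt intermediate consists of the 17 repeat nucleotides downstream of the cut plus the 33-nt spacer.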

Figure 1: Cpf1 processes pre-crRNA upstream of the repeat stem-loop structure.

a, Denaturing polyacrylamide gel showing the processing of internally labelled 69-nt pre-crRNA (200 nM) by Cpf1 (1 μM) in the presence of 10 mM MgCl2 over 10 min. T1, RNase T1 ladder; OH, alkaline hydrolysis ladder; C, control reaction without Cpf1. Shown is a representative of three independent experiments. b, Schematic representation of pre-crRNA repeat structure. The Cpf1 cleavage site is indicated by a black triangle.

To investigate the importance of the repeat and its hairpin structure in successful Cpf1 processing, we designed RNAs with mutations that yield either an altered repeat sequence keeping the stem-loop structure or an unstructured repeat. In contrast to the wild-type RNA substrate containing an intact repeat, none of the mutated RNAs was cleaved by Cpf1 (Extended Data Fig. 4a, b). We further designed repeat variants with either single nucleotide mutations between the cleavage site and the stem-loop (a region referred to as repeat recognition sequence (RRS)) or different sizes of the loop and stem regions (Extended Data Fig. 4a). Single nucleotide mutations in the RRS yielded repeat variants that were not, or only poorly, cleaved by Cpf1 (Extended Data Fig. 4c), indicating that these residues between the stem and the cleavage site have a role in processing of the substrate. This can be explained by the distinct secondary structure of crRNA in complex with Cpf1, where the RRS folds back to make contacts with the stem-loop21. Changes in the loop region of the repeat structure resulted in reduced cleavage activity for a shorter loop, whereas an increased loop length did not influence cleavage (Extended Data Fig. 4d). Extensive contacts of Cpf1 to the stem-loop of the crRNA21 explain why alterations of the stem structure yielded non-cleavable substrates. These results highlight the requirement of a stem-loop structure specific in length and sequence for recognition by Cpf1. Thus, the repeat cleavage reaction is highly sequence- and structure-dependent.

To determine the ion dependency of Cpf1 processing activity, we tested a variety of divalent metal ions in RNA cleavage assays. The activity of Cpf1 in pre-crRNA processing was highest when Mg2+ was added to the reaction (Extended Data Fig. 5a). Addition of Ca2+, Mn2+ or Co2+ also mediated cleavage, although not to the level of specificity observed with Mg2+. Equimolar addition of EDTA markedly reduced Cpf1 processing activity. The dependency on Mg2+ is in contrast to the ion-independent reaction of Cas6 (types I and III)2,20 or Cas5d (type I-C)5. A Mg2+ ion is coordinated in the structure of the crRNA21. Whether this ion is required for catalysis or only for stabilization of the tertiary structure has not yet been determined. Thus, our study highlights a novel crRNA biogenesis mechanism in which Cpf1 is a metal-dependent endoribonuclease that cleaves pre-crRNA in a sequence- and structure-specific manner. Similarities in the pre-crRNA processing mechanisms of Cpf1 and Cas6 enzymes of type I and type III systems indicate potential evolution of these ancestral CRISPR–Cas systems through transposition events18. This hypothesis is supported by our finding that Cpf1 functions as the endoribonuclease of type V-A systems together with the repeat-spacer composition of mature crRNAs and the requirement for a hairpin structure in the repeat. Bioinformatic analyses indicate that type V systems may be ancestral versions of type II systems. Type V may be considered as a link between class 1 and class 2 systems, which is supported by the recent discovery of a subtype V-B that encodes tracrRNA18,19.

It was previously shown that Cpf1 acts as the DNA endonuclease guided by crRNA to cleave double-stranded (ds)DNA site-specifically16. In accordance with that study, we show that only crRNA containing an intact stem-loop and a sequence complementary to the target DNA mediated Cpf1 DNA cleavage that resulted in a staggered cut producing a 5-nt 5′ overhang (Fig. 2a, b; processed crRNAs (RNA1–3), full-length pre-crRNAs (RNA4–6), mutated crRNAs (RNA7 and 8), Extended Data Figs 6 and 7). Surprisingly, a crRNA with a spacer-repeat arrangement also mediated cleavage by Cpf1, albeit with less efficiency than the wild type. Although the RNA processing activity of Cpf1 is highly dependent on the repeat sequence (sequence mutant, Extended Data Fig. 4a, b), a similar RNA resulted in residual DNA cleavage activity (RNA7, Extended Data Fig. 6). This might be due to the 3′ end nucleotide of the repeat, which was not mutated and was recently reported to be crucial for DNA targeting16 and for maintaining the specific tertiary structure of crRNA21.

Figure 2: Cpf1 cleaves target DNA specifically at the 5′-YTN-3′ PAM-distal end to generate 5-nt 5′ overhangs in the presence of Ca2+.

a, b, Cpf1-mediated target plasmid DNA cleavage (a) and Cpf1-mediated oligonucleotide duplex cleavage (b), dependent on the crRNA containing spacer 4 or 5 (crRNA-sp4 or crRNA-sp5), in the absence or presence of Ca2+. c, Schematic representation of the protospacer 5 sequence in the DNA (top), and the structure of crRNA-sp5 used in a, b, d and e (bottom). Cleavage sites corresponding to fragments obtained in b and confirmed by sequencing (Extended Data Fig. 7) are indicated by blue triangles. The PAM is marked in grey. d, Plasmid DNA containing the PAMs 1–6, or 5′-radiolabelled double-stranded oligonucleotide containing PAMs 1, 7–9 were cleaved by Cpf1 in the presence of 10 mM CaCl2 (upper and lower panel, respectively). e, Plasmids containing protospacer 5 and single or quadruple mismatches (mut_1-4 and mut_19-22) along the target strand were tested for cleavage by Cpf1 programmed with crRNA-sp5 in the presence of 10 mM MgCl2. Quantification of three independent experiments is shown in Extended Data Table 1a. li, linear; sc, supercoiled; M, 1 kb ladder. Data in a, b, d and e are representatives of at least three independent experiments.

Given that Cpf1 can process pre-crRNA, it is not surprising that RNAs with the full-length repeat-spacer (RNA4 and RNA6, Extended Data Fig. 6) mediate cleavage activities similar to those of the mature crRNA form. RNA containing the full-length repeat-spacer led to the most efficient DNA binding and nuclease activity of Cpf1 (compare RNA4 to RNA3 and RNA6, Extended Data Figs 8a and 6a, b). The processed form of crRNA (RNA3, Extended Data Fig. 6) was constructed on the basis of sRNA sequencing results (Extended Data Fig. 1) before the exact RNA processing of Cpf1 was known (Fig. 1), which resulted in a 3-nt shorter 5′ end. Binding to and processing of pre-crRNA induces conformational changes in Cpf1, causing the enzyme to change into an active endonucleolytic state21. Similarly, an induced-fit mechanism is used by Cas9, which undergoes large conformational rearrangements upon binding to tracrRNA–crRNA22.

A seed sequence of 3–5 nt at the PAM-proximal side of the protospacer has been reported for Cpf1 (ref. 16). Using plasmids with single mismatches between spacer and protospacer along the target sequence, we observed that Cpf1 was sensitive to mismatches within the first eight PAM proximal nucleotides, and would not tolerate four consecutive mismatches. Furthermore, Cpf1 was sensitive to mismatches around the cleavage site (position 1–4 on the PAM-distal site), but to a lesser extent (Fig. 2e, Extended Data Table 1a). In the co-crystal structure of Cpf1 and crRNA, the targeting region of crRNA was not resolved, indicating that Cpf1 does not bind and stabilize this part of crRNA21. This is in contrast to Cas9, which makes extensive contact with the guide portion of the RNA, possibly explaining its longer seed region22,23. Together with the recent Cpf1 characterization16, our results indicate that there may be additional factors influencing the specificity, such as the base content of the target sequence. The results highlight similarities between Cpf1 and Cas9 (refs 24, 25), which first recognizes the PAM and subsequently probes the crRNA complementary to the target DNA. Mismatches around the target site might disturb correct positioning of the catalytic residues and therefore reduce cleavage activity.
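The positional bookkeeping behind such a mismatch analysis can be sketched as follows. This is an illustrative example only: the sequences are hypothetical, both are written as DNA in the same sense, and position 1 is taken to be the PAM-proximal nucleotide. The seed length of eight reflects the sensitivity window observed above, and the function names are assumptions, not part of the study.

```python
def mismatch_positions(spacer, protospacer):
    """Return 1-based positions, counted from the PAM-proximal end,
    at which the two equal-length sequences differ."""
    if len(spacer) != len(protospacer):
        raise ValueError("sequences must have the same length")
    pairs = zip(spacer.upper(), protospacer.upper())
    return [i for i, (a, b) in enumerate(pairs, start=1) if a != b]


def within_seed(positions, seed_length=8):
    """Keep only mismatches that fall within the PAM-proximal window."""
    return [p for p in positions if p <= seed_length]


# Hypothetical 10-nt sequences, for illustration only.
spacer = "ACGTACGTAC"
protospacer = "ACGAACGTAT"

positions = mismatch_positions(spacer, protospacer)
print(positions)               # [4, 10]
print(within_seed(positions))  # [4]
```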

Aligning the two predicted protospacer sequences of the F. novicida U112 type V-A CRISPR–Cas revealed a conserved 5′-TTA-3′ sequence located on the non-target strand upstream of the protospacer. To verify the potential PAM, protospacer 5 was cloned without its flanking region yielding a 5′-CTG-3′ sequence. Both plasmids were cleaved equally well by Cpf1, indicating that the second position in this sequence is critical (Fig. 2d, Extended Data Fig. 7d). Mutagenesis of all three nucleotides followed by DNA cleavage analysis shows that Cpf1 recognizes a PAM, defined as 5′-YTN-3′, upstream of the crRNA-complementary DNA sequence on the non-target strand. This result expands on the already reported 5′-TTN-3′ PAM16. To analyse strand specificity of PAM recognition, we designed oligonucleotide substrates with either AAN or TTN on both strands. These substrates were not cleaved by Cpf1, indicating that the PAM needs to be double-stranded and is probably recognized on both strands (Fig. 2d, lower panel).
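The 5′-YTN-3′ definition uses IUPAC degenerate codes, in which Y stands for a pyrimidine (C or T) and N for any base. A minimal sketch of how a candidate trinucleotide on the non-target strand could be checked against this pattern is given below; the function name and the reduced code table are illustrative assumptions, not part of the study.

```python
# Subset of the IUPAC nucleotide codes needed for this example.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "N": "ACGT",  # any base
}


def matches_pam(sequence, pattern="YTN"):
    """Return True if a 5'->3' DNA sequence fits the degenerate pattern."""
    sequence = sequence.upper()
    if len(sequence) != len(pattern):
        return False
    return all(base in IUPAC[code] for base, code in zip(sequence, pattern))


# The two flanking sequences discussed in the text both fit 5'-YTN-3'.
print(matches_pam("TTA"))  # True
print(matches_pam("CTG"))  # True
print(matches_pam("ATA"))  # False: A is not a pyrimidine
```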

We next investigated the metal ion dependency of DNA cleavage by Cpf1. Notably, we observed that in addition to Mg2+ and Mn2+, which were shown to mediate activity in Cas9 (ref. 15), Cpf1 also cleaves DNA in the presence of Ca2+ (Extended Data Fig. 5b, Extended Data Table 1b). To investigate potential differences in cleavage with Mg2+ or Ca2+, we carried out DNA cleavage reactions in the presence of either of these ions (Fig. 2, Extended Data Fig. 7). In Cas9, two active motifs, HNH and RuvC, are responsible for cleavage of the target and non-target strand, respectively15. The HNH motif of Cas9 from Neisseria meningitidis is Ca2+-dependent26. If there were two active sites in Cpf1, each coordinating one of the metal ions and cleaving one of the DNA strands, we would expect a difference in cleavage of target and non-target strands depending on the ion used. In contrast, we did not observe differences in the efficiency of target or non-target strand cleavage by Cpf1 in the presence of Ca2+ or Mg2+ (Fig. 2b, Extended Data Fig. 7b). This finding indicates the presence of only one catalytic motif in Cpf1 that is responsible for cleaving both DNA strands and can coordinate Mg2+ as well as Ca2+ ions.

Our experiments show for the first time that Cpf1 exhibits dual (RNA and DNA) cleavage activity. To determine the respective cleavage motifs, we performed mutagenesis of conserved residues along the Cpf1 amino acid sequence (Supplementary Fig. 2). Alanine substitution of residues H843, K852, K869 and F873 had no effect on DNA cleavage activity (Fig. 3a, upper panel), but resulted in decreased in vitro RNA cleavage activity (Fig. 3a, middle panel). To further confirm the involvement of these residues in RNA processing in vivo, a heterologous E. coli assay co-expressing pre-crRNA (repeat–spacer–repeat) and Cpf1, or a variant thereof, was established. Northern blot analysis was performed with total RNA extracted following induced expression of the Cpf1 variant (Fig. 3a, lower panel, Extended Data Fig. 3b). Pre-crRNA is more abundant in the presence of Cpf1, indicating possible protection from degradation by Cpf1. Expression of wild-type Cpf1 results in the production of a distinct band of around 65 nt, which corresponds to a mature crRNA formed by two cleavage events within the repeats. In the presence of Cpf1(H843A), this band is absent; however, two additional longer RNAs appear due to changed processing by this mutant, as observed in vitro (Fig. 3a, middle panel). Mutants K852A and K869A also resulted in the production of the 65-nt fragment, but with less intensity compared to the wild type, and two additional products with longer sizes. In vitro, these mutants have almost no RNA processing activity. RNA-binding experiments with Cpf1(K852A) and Cpf1(K869A) (Extended Data Fig. 8b) indicated a slightly higher affinity for RNA than wild-type Cpf1, which may explain the cleavage products observed in vivo. The residual activity of these Cpf1 mutants produces processed RNA, which is likely to be bound tighter to the protein and therefore better protected from degradation. Cpf1(F873A) had reduced RNA cleavage activity in vitro, which could not be detected in vivo. 
Mutation of the aforementioned residues did not negatively affect RNA binding (Extended Data Fig. 8b), indicating that the identified residues of Cpf1 are potentially responsible for RNA cleavage. Analysis of the co-crystal structure of Lachnospiraceae bacterium Cpf1 revealed that the identified residues are located in close proximity to the 5′ of the processed crRNA21.

Figure 3: Cpf1 contains active centres for RNA and DNA cleavage.

a, RNase motif mutants were tested for DNA plasmid cleavage activity (agarose gel, upper panel), in vitro pre-crRNA cleavage activity (denaturing polyacrylamide gel, middle panel) and in vivo pre-crRNA processing activity (northern blot, lower panel). In vitro cleavage was performed in the presence of 10 mM MgCl2. b, DNase motif mutants were tested for plasmid DNA cleavage activity (agarose gel, upper panel) and in vitro pre-crRNA cleavage activity (denaturing polyacrylamide gel, lower panel). c, Additional RuvC motif mutants were tested for DNA cleavage of double-stranded oligonucleotide substrates in 10 mM CaCl2 (upper two panels) or MgCl2 (lower two panels). Target or non-target strand was 5′ radiolabelled before annealing to the non-labelled complementary strand. d, Schematic representation of Cpf1 amino acid sequence (N terminus not shown for clearer visualization) with the active domains for RNA and DNA cleavage highlighted in orange and blue, respectively. Mutated amino acids are indicated with the DNase motif shown in red. e, Summary of recognized substrates, metal ion dependency and crRNA requirements for both RNase and DNase motifs of Cpf1. −, no activity; +, residual activity; +++, full activity. Data in a–c are representatives of at least three independent experiments.

Mutagenesis of D917, E1006 and D1255 in the split RuvC motif resulted in loss of DNA cleavage activity16 (Fig. 3b, upper panel), but did not influence the RNA processing activity of Cpf1 (Fig. 3b, lower panel), nor did it affect binding affinity to the DNA target (Extended Data Fig. 8c).

While screening for active site residues, we observed differences in DNA cleavage for some mutants depending on the metal ion present. Mutants E920A, Y1024A and D1227A showed no DNA cleavage in the presence of Ca2+, but wild-type activity when Mg2+ was present (Fig. 3c). These residues are located in close proximity to the three identified catalytic residues and may be responsible for coordination of the Ca2+ ion. Mutating residue E1028 also led to loss of Ca2+-promoted DNA cleavage and additionally reduced cleavage of the non-target strand in the presence of Mg2+, indicative of its involvement in non-target strand cleavage. In contrast, mutation of residues H922 and Y925 resulted in markedly reduced cleavage of the target strand in the presence of Ca2+, whereas these mutants showed wild-type levels of DNA cleavage activity in the presence of Mg2+. These findings suggest that H922 and Y925 are involved in Ca2+ coordination and target-strand cleavage.

We show that two aspartates (D917, D1255) and one glutamate (E1006) form the catalytic site, which is in good agreement with the recent characterization of Cpf1 and other RuvC/RNaseH motifs16. Catalytic motifs of this kind generally use a two-metal-ion mechanism for DNA cleavage, as shown for Cas9 from Streptococcus pyogenes23. Enzymes with a two-metal-ion mechanism have more specificity for metal ions, Mg2+ in particular27. In contrast, enzymes using a one-metal-ion mechanism for cleavage (for example, HNH nucleases) are more flexible in their specificity for metal ions. For example, KpnI cleaves DNA with high fidelity in the presence of Ca2+, but less specifically in the presence of Mg2+ (ref. 28). As mentioned before, the HNH motif of Cas9 from N. meningitidis is active in the presence of Ca2+ (ref. 26). In addition to the identified RNA processing activity of Cpf1, this enzyme may also represent a new type of DNA nuclease using two-metal-ion catalysis, with the ability to utilize Mg2+ or Ca2+ ions. The physiological relevance of Cpf1 using both ions for DNA cleavage remains undetermined and requires further investigation.

In summary, Cpf1 is an enzyme with two separate catalytic moieties that cleave RNA or DNA (Fig. 3d). The RNase motif is specific for the ribose and unable to cleave DNA. This specificity can be explained by specific interactions of Cpf1 with the 2′-OH groups of crRNA21. The DNase motif cleaves only double-stranded and single-stranded target DNA, and shows no activity against single-stranded RNA, double-stranded RNA or RNA–DNA heteroduplexes (Fig. 3e, Extended Data Fig. 9). Other nucleases have been reported to show a certain promiscuity towards RNA and DNA cleavage, but one of the two activities is usually highly unspecific29,30. To our knowledge, Cpf1 is the first enzyme with two specificities, cleaving RNA in a sequence- and structure-dependent manner, and also performing DNA cleavage in the presence of the RNA that is produced in the first reaction. In the context of CRISPR immunity, type V-A appears to be the most minimalistic system described thus far, using only one enzyme, Cpf1, to process pre-crRNA and then using this RNA to specifically target and cut invading DNA. Evolution of one protein to perform these two specific reactions leads to a more effective mechanism, and also makes this system ideal for horizontal gene transfer. Finally, this mechanism opens new avenues for sequence-specific genome engineering and silencing, and facilitates multiplexing.

Methods

Data reporting

No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.

Small RNA sequencing

Small RNA sequencing data of Francisella novicida U112 (Supplementary Table 1a) used in this study were obtained previously13. Briefly, a cDNA library of RNAs (treated with tobacco acid pyrophosphatase) of F. novicida U112 grown to mid-logarithmic phase was prepared using the ScriptMiner Small RNA-Seq Library Preparation Kit (Multiplex, Illumina compatible) and sequenced at the Campus Science Support Facilities GmbH (CSF) Next Generation Sequencing (NGS) Unit of the Vienna Biocentre. After adaptor removal and quality trimming, the reads were mapped to the F. novicida U112 genome (GenBank: NC_008601, 48205 mapped reads) using Bowtie. The read coverage was calculated using BEDTools (version 2.15.0)31 and a normalized wiggle file was created and visualized using the Integrative Genomics Viewer32,33.

Production and purification of recombinant Cpf1

The cpf1 (FTN_1397) gene was amplified from genomic DNA of F. novicida U112 and cloned into the expression vector pET-16b to facilitate expression of Cpf1 with an N-terminal 6× His-tag (Supplementary Table 1b, c). For the production of the protein in E. coli (NiCo21 (DE3)), the cells containing the overexpression plasmid were grown at 37 °C to reach an optical density (OD)600 of 0.6–0.8. Expression was induced by addition of 0.5 mM isopropylthio-β-d-galactoside (IPTG) and the cultures were further incubated overnight at 18 °C. After collection, the cell pellet was resuspended in lysis buffer (20 mM HEPES (pH 7.5), 500 mM KCl, 25 mM imidazole, 0.1% Triton X-100) followed by 6 min of sonication (0.5 s pulses) for cell disruption. The lysate was cleared by centrifugation (47,800g, 30 min, 4 °C) and the supernatant was applied to Ni2+-NTA-Sepharose resin in a drop column. After washing steps with 10 ml of lysis buffer followed by 10 ml wash buffer (20 mM HEPES (pH 7.5), 300 mM KCl, 25 mM imidazole), the protein was eluted with elution buffer (20 mM HEPES (pH 7.5), 150 mM KCl, 250 mM imidazole, 0.1 mM DTT, 1 mM EDTA). The eluates were analysed by SDS–PAGE followed by Coomassie blue staining. Fractions containing Cpf1 were pooled for cation-exchange chromatography (HiTrap Heparin; GE-Healthcare) using a FPLC Äkta-Purification system (GE-Healthcare) and Cpf1 was eluted with a linear gradient of 100–1000 mM KCl. Peak fractions were analysed by SDS–PAGE and Coomassie blue staining. Cpf1-containing fractions were pooled and directly applied to an equilibrated (20 mM HEPES (pH 7.5), 150 mM KCl) prepgrade Superdex 200 size-exclusion column (GE-Healthcare) and purified via fast protein liquid chromatography (FPLC), followed by analysis by SDS–PAGE and Coomassie blue staining. Molecular weight calibration of the column was performed using molecular weight markers, as described in the manufacturer’s protocol (Kit for Molecular Weights, Sigma-Aldrich). 
The protein was dialysed against dialysis buffer (20 mM HEPES (pH 7.5), 150 mM KCl, 50% glycerol) and stored at −20 °C until use.

Site-directed mutagenesis of Cpf1

Oligonucleotides for the site-directed mutagenesis of Cpf1 (Supplementary Table 1c) were designed using the QuikChange Primer Design tool of Agilent and produced by Sigma-Aldrich. A series of PCRs was performed to obtain the desired mutation. Briefly, the overexpression vector containing wild-type cpf1 was amplified in two reactions with either the forward or reverse QuikChange primer. After an initial amplification, the two reactions were mixed and a second PCR was performed. Following PCR, the template plasmid was degraded with DpnI (3 h, 37 °C) and the reaction product was introduced by transformation into chemically competent DH5α cells. Plasmids were prepared using a plasmid Miniprep kit (Qiagen) according to the manufacturer’s instructions. Successful mutagenesis was confirmed by sequencing analysis of the plasmids (SeqLab).

Generation of RNAs used in this study

The sRNAs tested in this study were generated by in vitro transcription using the AmpliScribe T7-Flash kit (Biozym) according to the manufacturer’s protocol. In brief, oligonucleotides containing the desired sequence (Supplementary Table 1c) and a T7-promoter sequence were hybridized to an oligonucleotide containing the complementary T7-promoter sequence. The hybridization product was then used as a template for the transcription reaction according to the AmpliScribe T7-Flash kit (Biozym). To obtain internally labelled RNAs, [α-32P]ATP (5000 Ci mmol−1, Hartman Analytic) was added to the in vitro transcription reaction34. In order to generate end-labelled RNAs, the unlabelled transcripts were dephosphorylated with Fast-AP phosphatase (Fermentas) for 30 min at 37 °C followed by a purification using Illustra Microspin G-25 columns (GE-Healthcare). The dephosphorylated RNAs were then labelled using T4 polynucleotide kinase (Fermentas) and [γ-32P]ATP (5000 Ci mmol−1) according to the manufacturer’s instructions, and separated using denaturing polyacrylamide gel electrophoresis (8 M urea; 1× TBE; 10% polyacrylamide). Subsequent to short exposure to an autoradiography screen (for radioactively labelled RNAs) or ethidium bromide (EtBr) staining (for unlabelled RNAs), the respective bands of the RNAs were excised. Elution of the RNAs was achieved by incubation of the gel pieces in 500 μl RNA elution buffer (250 mM NaOAc; 20 mM Tris-HCl (pH 7.5); 1 mM EDTA (pH 8.0); 0.25% SDS) and overnight incubation on ice. Following elution, RNA was precipitated with 2 vol ice-cold ethanol (100% EtOH) and 1/100 glycogen for 1 h at −20 °C. After washing with 70% EtOH, the air-dried pellets were resuspended in H2O.

In vitro RNA cleavage assay

RNA cleavage assays using indicated concentrations of Cpf1 and various RNA substrates were conducted in KGB buffer35 (100 mM potassium glutamate, 25 mM Tris-acetate (pH 7.5), 500 μM 2-mercaptoethanol, 10 μg ml−1 BSA) supplemented with 10 mM MgCl2 at 37 °C in a final volume of 10 μl. If not indicated otherwise, the reaction was stopped after 10 min by the addition of 2 μl proteinase K (20 mg ml−1) followed by 10 min incubation at 37 °C to achieve protein degradation. After adding 2× loading dye (10 M urea, 1.5 mM EDTA (pH 8.0)), the samples were loaded on 12% denaturing polyacrylamide gels run in 1× TBE for 3 h at 12.5 V cm−1. For the sequencing gels, the samples were precipitated before loading on 10% denaturing polyacrylamide gels. The gel electrophoresis was carried out at 40 W for 3.5 h. Visualization was achieved by phosphorimaging (Typhoon FLA 9000 Fuji). For RNA size determination, a 5′-end-labelled 69-nt long transcript consisting of a short form of pre-crRNA (repeat-spacer 5, full-length) was subjected to alkaline hydrolysis generating a single nucleotide resolution ladder and to RNase T1-specific cleavage. Each individual experiment was performed in three replicates.

In vivo RNA processing

To investigate in vivo RNA processing by Cpf1, a heterologous system was designed in E. coli. A DNA fragment encoding a pre-crRNA containing a repeat-spacer-repeat structure under the control of a T7-promoter and T7-terminator was synthesized by Integrated DNA Technologies and cloned into pACYC184 using HindIII and EagI yielding pEC1690. E. coli BL21(DE3) was co-transformed with this plasmid and the overexpression vector of wild-type or mutant Cpf1. The empty expression vector pET-16b served as a negative control. The bacterial cells were grown in the presence or absence of 0.1 mM IPTG at 37 °C to reach early exponential phase (OD600 = 0.4). RNA was extracted using TRIzol (Sigma-Aldrich) according to the manufacturer’s protocol followed by northern blot analysis as described previously36,37,38. In brief, RNA was separated on denaturing 10% polyacrylamide gels (8 M urea, 1× TBE) and transferred by semi-dry blotting on a nylon membrane (Hybond TM N+, GE Healthcare). Chemical crosslinking was performed for 1 h at 60 °C with 1-ethyl-3-(3-dimethylaminopropyl) carbodiimide hydrochloride. Oligonucleotides were radioactively labelled with [γ-32P]ATP (5000 Ci mmol−1) and T4 polynucleotide kinase (Fermentas) as described above and purified using Illustra Microspin G-25 columns (GE Healthcare). The hybridization of the probe against the spacer in the pre-crRNA (Supplementary Table 1c) was performed in Rapid-hyb buffer (GE-Healthcare) by incubation overnight at 42 °C. The radioactive signal was visualized using phosphorimaging. Each individual experiment was performed in three replicates.

Generation of DNA substrates

To determine the target cleavage site of Cpf1, spacer sequences of the F. novicida U112 type V-A CRISPR array were analysed by BLAST39. Potential targets for spacer 4 and spacer 5 were identified in F. novicida 3523, located in the intergenic region between coding sequence AEE26308.1 and AEE26307.1, and in AEE26301.1, respectively. Target protospacer containing a sequence complementary to spacer 5 including 42 bp up- and downstream sequences was synthesized as oligonucleotides containing HindIII overhangs. Following hybridization of the oligonucleotides, the fragments were cloned into pUC19 using HindIII yielding plasmid pEC1664 (protospacer 5 and flanking region). The same protospacer sequence without flanking regions was cloned into pUC19, yielding pEC1688 (protospacer 5). In order to identify the PAM, mutagenesis was performed by applying the described protocol for site-directed mutagenesis on pEC1688. Plasmid preparation was done using Miniprep kit (Qiagen) according to the manufacturer’s instructions and DNA integrity was confirmed by sequencing analysis (SeqLab). Oligonucleotides containing the protospacer (Supplementary Table 1c) were ordered at Sigma and hybridized before radioactive labelling. Alternatively, a single-stranded oligonucleotide was labelled and hybridized with the complementary non-labelled oligonucleotide. 5′-end-labelling reactions were performed using [γ-32P]ATP (5000 Ci mmol−1) and T4 polynucleotide kinase (Fermentas) according to the manufacturer’s instructions. The labelled oligonucleotides were purified using Illustra Microspin G-25 columns (GE healthcare).

In vitro DNA cleavage assay

Plasmid DNA cleavage assays were performed by pre-incubating 100 nM Cpf1 with 200 nM RNA in KGB buffer supplemented with either 10 mM MgCl2 or 10 mM CaCl2 for 15 min at 37 °C. Plasmid DNA (10 nM) was added to the reaction to yield a final volume of 10 μl and further incubated for 1 h at 37 °C. Reactions were stopped by the addition of 1 μl proteinase K (20 mg ml−1) and 5 min incubation at 37 °C. Before separation of the reaction, 3 μl of 5× DNA loading buffer (250 mM EDTA, 1.2% SDS, 25% glycerol, 0.01% bromophenol blue) were added and the samples were loaded on 0.8% agarose gels (1× TAE buffer). Cleavage products were visualized by EtBr staining. In cleavage assays using radioactively labelled substrates, 5 nM of 5′-labelled double-stranded oligonucleotides were added to the pre-formed complex of Cpf1 and RNA, and incubated at 37 °C for 1 h. After proteinase K treatment, 10 μl of 2× denaturing loading buffer (95% formamide, 0.025% SDS, 0.5 mM EDTA, 0.025% bromophenol blue) were added. Oligonucleotides of the size of the expected cleavage products were 5′-radiolabelled as described above and mixed with an equal volume of 2× denaturing loading buffer to serve as size markers. After 5 min incubation at 95 °C, the samples were loaded on 12% denaturing polyacrylamide gels and run in 1× TBE for 70 min at 14 V cm−1. Cleavage was visualized using phosphorimaging. Each individual experiment was performed in three replicates.
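The molar amounts and ratios implied by the plasmid cleavage reaction can be worked out as follows. This is a sketch under one stated assumption: the concentrations given for Cpf1, RNA and plasmid DNA are treated as final concentrations in the 10 μl reaction, which the protocol does not state explicitly. The helper name is illustrative.

```python
def amount_fmol(concentration_nm, volume_ul):
    """Convert a concentration (nM) and a volume (µl) into an amount (fmol).

    1 nM x 1 µl = 1e-9 mol/l x 1e-6 l = 1e-15 mol = 1 fmol.
    """
    return concentration_nm * volume_ul


FINAL_VOLUME_UL = 10  # assumed to apply to all three components

cpf1 = amount_fmol(100, FINAL_VOLUME_UL)    # 100 nM Cpf1
crrna = amount_fmol(200, FINAL_VOLUME_UL)   # 200 nM RNA
plasmid = amount_fmol(10, FINAL_VOLUME_UL)  # 10 nM plasmid DNA

print(cpf1, crrna, plasmid)  # 1000 2000 100
print(crrna / cpf1)          # 2.0  (RNA in twofold excess over protein)
print(cpf1 / plasmid)        # 10.0 (protein in tenfold excess over target)
```

Under this assumption, each reaction contains 1 pmol Cpf1, 2 pmol RNA and 0.1 pmol plasmid DNA.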

Electrophoretic mobility shift assays

Substrates for electrophoretic mobility shift assays (EMSAs) were generated as described above. For DNA binding reactions, Cpf1 was pre-incubated in binding buffer (20 mM Tris-HCl (pH 7.4), 100 mM KCl, 1 mM DTT, 5% glycerol) containing a twofold molar excess of crRNA. After 15 min at 37 °C, 1 nM labelled DNA substrate was added. The reaction was then carried out at 37 °C for 1 h before the samples were loaded on a native 5% polyacrylamide gel running at 10 V cm−1 for 50 min in 0.5× TBE to separate protein–DNA complexes from unbound DNA. For RNA binding reactions, the crRNA was dephosphorylated using Fast AP (Fermentas) and 5′-radiolabelled with [γ-32P]ATP (5000 Ci mmol−1) and T4 polynucleotide kinase (Fermentas) according to the manufacturer’s instructions. A total of 0.5 nM radiolabelled RNA was incubated with Cpf1 in binding buffer (20 mM Tris (pH 7.5), 150 mM KCl, 10 mM CaCl2, 1 mM DTT, 5% glycerol, 0.01% Triton X-100, 10 μg ml−1 BSA) for 1 h at 37 °C and loaded on 4% native polyacrylamide gels running at 10 V cm−1 for 30 min in 0.5× TBE. The gels were exposed on an autoradiography film overnight and visualized by phosphorimaging. Fractions of bound and unbound nucleic acids were determined densitometrically and the percentage of bound nucleic acid was plotted against the protein concentrations. The dissociation constant, Kd, was determined using a nonlinear regression analysis.
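The text states only that Kd was obtained by nonlinear regression; neither the binding model nor the software is specified. The following is a minimal sketch of one common approach, assuming a one-site binding isotherm (fraction bound = [P] / (Kd + [P])) and a least-squares fit by golden-section search on log10(Kd). The function names, search bounds and the data points are hypothetical; the example data are generated from the model itself and are not measurements from this study.

```python
import math


def fraction_bound(protein_nm, kd_nm):
    """One-site binding isotherm (assumed model, not stated in the text)."""
    return protein_nm / (kd_nm + protein_nm)


def fit_kd(protein_nm, bound_fraction, lo=1e-3, hi=1e5, iterations=200):
    """Estimate Kd (nM) by minimizing the sum of squared residuals.

    The one-parameter minimization is done by golden-section search
    over log10(Kd) between the hypothetical bounds lo and hi.
    """
    def sse(log_kd):
        kd = 10 ** log_kd
        return sum((fraction_bound(p, kd) - f) ** 2
                   for p, f in zip(protein_nm, bound_fraction))

    a, b = math.log10(lo), math.log10(hi)
    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(iterations):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10 ** ((a + b) / 2)


# Synthetic, noise-free data generated with Kd = 50 nM (illustration only).
concentrations = [1, 5, 10, 50, 100, 500]
fractions = [fraction_bound(p, 50.0) for p in concentrations]
print(round(fit_kd(concentrations, fractions), 3))  # 50.0
```

With real densitometry data, a dedicated fitting routine that also reports confidence intervals would normally be preferable; the search above is only meant to make the regression step concrete.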

Multiple sequence alignment of Cpf1 orthologues

Cpf1 orthologous sequences were derived by BLAST39 search of the NCBI database using Cpf1 of F. novicida U112 as a query. A multiple sequence alignment of 52 orthologous sequences was generated using MUSCLE40. The alignment of nine of the sequences was visualized with Jalview41.