Introduction

As a class of versatile living systems, bacteria are useful in many fields of synthetic biology. In bacteria, the genetic information carried on the single-copy genome determines the characteristics of a specific strain. To understand bacterial characteristics and exploit them for research and human needs, researchers frequently use genome engineering to reprogram the genetic information of bacteria. Through DNA editing, researchers can add desired exogenous genetic information to, or delete unwanted endogenous genetic information from, the bacterial genome. Long fragment editing techniques are of great importance for accelerating bacterial genome engineering and obtaining genetically stable strains. For example, long fragment deletion can help simplify the bacterial genome to explore the minimal genome of a specific strain (Kato and Hashimoto 2007, 2008), and long fragment insertion can help expand the bacterial genome to archive the ever-growing volume of information generated by human society (Shipman et al. 2017). In metabolic engineering, plasmid maintenance requires continuous antibiotic use, which raises biosafety concerns and increases industrial costs (Mignon et al. 2015). Long fragment editing is therefore an ideal tool for constructing plasmid-independent, high-production strains.

To accelerate genome engineering, researchers have developed many methods for generating insertions and deletions in the bacterial genome. Recombineering is a classical method that has been popular in the synthetic biology community for 20 years (Datsenko and Wanner 2000; Sharan et al. 2009; Jeong et al. 2013; Pines et al. 2015). Although recombineering can handle the insertion and deletion of short DNA fragments (Wang et al. 2009; Warner et al. 2010; Isaacs et al. 2011), its editing efficiency decreases dramatically for long fragments (Jeong et al. 2013). On the one hand, the transformation efficiency of linear donor DNA decreases steeply with its length; on the other hand, recombination relies on the replication fork, and the region of the fork available for donor DNA annealing is limited (Mosberg et al. 2010). Generating a double-strand break (DSB) in the target DNA is an effective strategy for improving genome editing efficiency because DSBs strongly stimulate DNA repair pathways. For cleaving double-stranded DNA (dsDNA), CRISPR/Cas9 is much more flexible than the homing endonuclease I-SceI, which requires an 18-bp recognition site to be integrated into the target DNA before cleavage (Tischer et al. 2006; Yu et al. 2008; Yang et al. 2014). Cas9 endonuclease complexed with a designed single-guide RNA (sgRNA) can generate a DSB in a specific protospacer sequence followed by a suitable protospacer-adjacent motif (PAM) (Gasiunas et al. 2012; Jinek et al. 2012; Jiang et al. 2013). CRISPR/Cas9 technology relies on sgRNA-directed cleavage at the target site to kill unedited wild-type cells, thus circumventing the need for selectable markers or counter-selection systems. In addition, changing the 20-bp spacer sequence reprograms the specificity of the Cas9-sgRNA complex. In the past 7 years, many genome editing methods and protocols based on CRISPR/Cas9 technology have been reported (Garst et al. 2017).
These methods are time-saving and easy to use, but they still do not fundamentally solve the problems of long fragment editing. In many methods, the CRISPR/Cas9 system functions merely as a selection tool, and its potential has not been fully exploited (Li et al. 2019a; Wu et al. 2019). CRISPR/Cas9-assisted non-homologous end-joining (NHEJ) makes fuller use of this potential and is useful for long fragment deletion (Su et al. 2016, 2019; Zheng et al. 2017; Huang et al. 2019). However, it inevitably generates stochastic indels and may lead to DNA rearrangements. In addition, long fragment insertion is impossible with this strategy.

Among CRISPR/Cas9-based genome editing methods, CRISPR/Cas9-assisted recombineering methods perform best. Existing CRISPR/Cas9-assisted recombineering methods use circular DNA (plasmid-borne dsDNA) (Jiang et al. 2015; Zhao et al. 2016; Feng et al. 2018) or linear DNA (PCR-amplified dsDNA or synthesized ssDNA) (Jiang et al. 2013, 2015; Li et al. 2015; Zhang et al. 2017; Zhao et al. 2017) as the editing template, and both kinds of templates have advantages and disadvantages. A circular editing template is protected from DNA exonucleases and replicates along with the plasmid, resulting in much higher homologous recombination and editing efficiencies. However, there is a substantial chance that the entire plasmid will be integrated into the genome, and such integration events are difficult to distinguish from the desired recombination events by conventional PCR verification, leading to high false-positive rates. This phenomenon appears to have gone largely unnoticed. A linear editing template avoids the problems caused by plasmid integration, resulting in a higher positive rate. However, its sensitivity to DNA exonucleases lowers homologous recombination and editing efficiencies. To resolve this trade-off, we systematically optimized existing CRISPR/Cas9-assisted recombineering methods and developed an efficient long fragment editing method for large-scale, scarless genome engineering in Escherichia coli. This method enabled us to insert large DNA fragments into, and delete them from, the genome with high positive rates and editing efficiencies. Notably, the high performance of the method did not depend on high transformation efficiency, making the method much easier to put into practice.
Furthermore, the method was successfully applied in genome simplification and metabolic engineering, demonstrating its value as a genome engineering tool for constructing genetically stable E. coli strains. We believe that this method has the potential to be widely used in other sequenced organisms in addition to E. coli.

Materials and methods

Strains and culture conditions

E. coli strain DH5α (American Type Culture Collection, ATCC 68233) served as the host strain for molecular cloning and plasmid manipulation. MG1655 (ATCC 47076) served as the parental strain in editing experiments unless otherwise stated. The strains used in this study are listed in Table S1. The verification primers used in the genome editing experiments are listed in Table S2. Luria-Bertani (LB) medium (10 g/L tryptone, 5 g/L yeast extract, and 10 g/L NaCl) was used for cell growth unless otherwise noted. The solid medium contained 20 g/L agar. Super optimal broth with catabolite repression (SOC) medium (20 g/L tryptone, 5 g/L yeast extract, 0.5 g/L NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM MgSO4, and 20 mM glucose) was used for cell recovery. M9 medium (6 g/L Na2HPO4, 3 g/L KH2PO4, 0.5 g/L NaCl, 1 g/L NH4Cl, 1 mM MgSO4, 0.1 mM CaCl2, 10 mg/L vitamin B1 (VB1), 40 g/L glucose, and 4 g/L yeast extract) was used for shake-flask fermentation. The working concentrations of ampicillin (Amp) and kanamycin (Kan) were 0.1 g/L and 0.025 g/L, respectively. The working concentrations of isopropyl-β-D-thiogalactopyranoside (IPTG), X-gal, glucose, and sucrose in media or cultures were 1 mM, 0.1 g/L, 10 g/L, and 20 g/L, respectively. The working concentration of L-arabinose was 20 mM in liquid media and 5 mM in solid media. Details of the reagents and media used in this study are listed in Table S3.

Plasmid construction

The plasmids used in this study are listed in Table S4. The complete sequences of plasmids p15A-PBAD-Cas9-PT5-Redγβα, pSC101-PBAD-sgRNA-Donor-T1, pSC101-PBAD-sgRNA-Donor-T2, and pSC101-PBAD-sgRNA-Donor-T3 are presented in Notes S1–S4. The CRISPR target sequences designed in this study are listed in Table S5. Construction of the pSC101-PBAD-sgRNA-Donor plasmid was the key step in the genome editing experiments. To construct a pSC101-PBAD-sgRNA-Donor plasmid containing one sgRNA expression chimera, pSC101-PBAD-sgRNA-Donor-T1 served as the parental plasmid. First, a specifically designed donor DNA was integrated into pSC101-PBAD-sgRNA-Donor-T1 to construct an intermediate plasmid; the donor DNA contained two homologous arms of approximately 500 bp. Then, a specific 20-bp spacer was inserted into the intermediate plasmid between the araB promoter and the gRNA scaffold via a single PCR and a single Gibson Assembly reaction (Gibson et al. 2009), with the spacer introduced by PCR serving as the overlap in Gibson Assembly. To construct a pSC101-PBAD-sgRNA-Donor plasmid containing two sgRNA expression chimeras, pSC101-PBAD-sgRNA-Donor-T2 and pSC101-PBAD-sgRNA-Donor-T3 served as the parental plasmids. First, a specifically designed donor DNA was integrated into pSC101-PBAD-sgRNA-Donor-T2 to construct an intermediate plasmid. Then, the intermediate plasmid and pSC101-PBAD-sgRNA-Donor-T3 were combined by PCR and Gibson Assembly to construct the pSC101-PBAD-sgRNA-Donor plasmid, with the two spacers introduced by PCR serving as the overlaps. The detailed construction procedures of the pSC101-PBAD-sgRNA-Donor plasmid are illustrated in Fig. S1. To reduce the number of construction steps, the pSC101-PBAD-sgRNA-Donor plasmid can also be obtained by a single Gibson Assembly of multiple fragments.
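The single-PCR spacer insertion described above can be sketched in Python. This is an illustrative sketch, not the authors' design tool: the helper names (`revcomp`, `spacer_insertion_primers`), the 20-nt annealing length, the spacer, and the promoter sequence are hypothetical placeholders; only the scaffold fragment follows the start of the commonly used SpCas9 sgRNA scaffold. Both primers carry the spacer (one as its reverse complement) as 5′ tails, so the linear PCR product begins and ends with the same 20-bp spacer, which then acts as the Gibson Assembly overlap that recircularizes the plasmid as promoter, spacer, scaffold.

```python
# Sketch: primers for one inverse PCR that inserts a 20-nt spacer between the
# promoter and the gRNA scaffold. The spacer at both ends of the PCR product is
# the Gibson Assembly overlap. All input sequences are hypothetical placeholders.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def spacer_insertion_primers(spacer: str, scaffold_start: str,
                             promoter_end: str, anneal_len: int = 20):
    """Return (forward, reverse) primers, both written 5'->3'.

    forward: spacer tail + region annealing to the start of the scaffold.
    reverse: revcomp(spacer) tail + region annealing (on the bottom strand)
             to the 3' end of the promoter.
    """
    spacer = spacer.upper()
    if len(spacer) != 20 or set(spacer) - set("ACGT"):
        raise ValueError("spacer must be 20 nt of A/C/G/T")
    fwd = spacer + scaffold_start[:anneal_len].upper()
    rev = revcomp(spacer) + revcomp(promoter_end[-anneal_len:].upper())
    return fwd, rev


# Hypothetical inputs for illustration only.
SPACER = "GATTACAGATTACAGATTAC"
SCAFFOLD_START = "GTTTTAGAGCTAGAAATAGCAAGTT"   # start of the common SpCas9 scaffold
PROMOTER_END = "ACGTTCGATCGGATCCTAGCTACAT"     # placeholder, not the real PBAD sequence

fwd, rev = spacer_insertion_primers(SPACER, SCAFFOLD_START, PROMOTER_END)
print(fwd)
print(rev)
```

Reading the reverse primer as its reverse complement gives promoter end followed by spacer, i.e. the top-strand sequence that the product's right end will carry.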

Procedures for genome editing, plasmid curing, and iterative editing

First, the Kan-resistant (KanR) plasmid p15A-PBAD-Cas9-PT5-Redγβα (plasmid#1) was transformed into the target strain (e.g., MG1655) to obtain the corresponding transformant (e.g., MG1655/plasmid#1). A series of temperature-sensitive Amp-resistant (AmpR) plasmids were constructed to express specific sgRNAs and generate specific donor DNAs; these plasmids are collectively named pSC101-PBAD-sgRNA-Donor (plasmid#2). Then, the appropriate plasmid#2 was transformed into the MG1655/plasmid#1 strain, and MG1655/plasmid#1/plasmid#2 transformants were selected on an LB plate containing Amp, Kan, and glucose at 30 °C. One or several single colonies were inoculated into 2 mL of LB medium, and the culture was grown at 30 °C for 2 h (Time 1). Then, 2 μL Amp, 2 μL Kan, and 20 μL IPTG were added to the culture. After 1 h (Time 2), 20 μL L-arabinose was added, and the culture was grown for another 3 h (Time 3) before plating. A 1-μL or 0.1-μL aliquot of the culture was plated onto an LB plate containing Amp, Kan, and L-arabinose, and the plate was incubated overnight at 30 °C. Positive mutants were verified by colony PCR and sequencing. The flowchart of genome editing is shown in Fig. 1 and Fig. S2. The positive mutant was grown in LB medium containing only Kan at 40 °C for 12 h to remove the temperature-sensitive AmpR plasmid#2 (Fig. S3a). The resulting edited strain, containing only plasmid#1, was then used as the starting strain for the next round of genome editing. Because the KanR plasmid#1 is not stably maintained in the host strain in the absence of Kan, after the final round of genome editing the edited strain was grown in antibiotic-free LB medium at 40 °C for 12 h to remove both the AmpR plasmid#2 and the sucrose-sensitive KanR plasmid#1 (Fig. S3b). The overnight culture was diluted and plated on an LB plate containing sucrose. In principle, colonies grown on this plate are plasmid-free.
For further verification, single colonies were inoculated into LB medium with or without the corresponding antibiotics. The flowchart of plasmid curing and iterative editing is shown in Fig. S2.
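The protocol specifies the volumes added to the 2-mL culture but not the stock concentrations. Assuming the working concentrations listed under Strains and culture conditions apply to this culture, the implied stocks follow from C_stock = C_final × V_culture / V_added. The sketch below ignores the few microliters added and any volume change during growth; the resulting stock values are inferences, not values reported in this study, and the names are hypothetical.

```python
# Sketch: stock concentrations implied by the additions in the editing protocol,
# using C_stock = C_final * V_culture / V_added. Assumes a 2-mL culture and
# ignores the volume added; the study does not state its stock concentrations.

CULTURE_UL = 2000.0  # 2 mL culture (from the protocol)

# name: (working concentration from the Materials section, unit, volume added in uL)
ADDITIONS = {
    "Amp": (0.1, "g/L", 2.0),
    "Kan": (0.025, "g/L", 2.0),
    "IPTG": (1.0, "mM", 20.0),
    "L-arabinose": (20.0, "mM", 20.0),  # liquid-medium working concentration
}


def implied_stock(final_conc: float, added_ul: float,
                  culture_ul: float = CULTURE_UL) -> float:
    """Stock concentration such that added_ul into culture_ul gives final_conc."""
    return final_conc * culture_ul / added_ul


stocks = {name: (implied_stock(c, v), unit) for name, (c, unit, v) in ADDITIONS.items()}
for name, (conc, unit) in stocks.items():
    fold = CULTURE_UL / ADDITIONS[name][2]
    print(f"{name}: ~{conc:g} {unit} stock ({fold:g}x)")
```

Under these assumptions, Amp and Kan would be 1000× stocks and IPTG and L-arabinose 100× stocks.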

Fig. 1

Components of the genome editing system and schematic of genome editing. LHA left homologous arm, RHA right homologous arm

Calculation of positive rate and editing efficiency

One hundred colonies on the LB plate containing Amp, Kan, and L-arabinose were tested by colony PCR to screen for positive mutants, and 20 of the positive mutants were further verified by sequencing. The positive rate was calculated as the proportion of positive colonies among the colonies examined. In blue-white selection experiments, positive colonies were also identified by color: white colonies were positive, and blue colonies were negative. A control group was set up alongside each experimental group to calculate editing efficiency. In the control group, L-arabinose was not added, so Cas9 and sgRNA were not expressed; all other conditions and procedures were the same as for the experimental group. The editing efficiency was calculated as the number of positive colonies in the experimental group divided by the total number of colonies in the control group.
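A minimal sketch of the two metrics, assuming that the number of positive colonies on the experimental plate is estimated by multiplying its colony count by the PCR positive rate (only 100 colonies were tested), and that colony counts are normalized per microliter of culture plated when the experimental and control plates received different volumes. Neither assumption is spelled out in the text; the function names and counts are hypothetical.

```python
def positive_rate(n_positive: int, n_tested: int) -> float:
    """Fraction of PCR-tested colonies that carry the desired edit."""
    if n_tested <= 0 or not 0 <= n_positive <= n_tested:
        raise ValueError("need 0 <= n_positive <= n_tested and n_tested > 0")
    return n_positive / n_tested


def editing_efficiency(n_exp_colonies: int, n_positive: int, n_tested: int,
                       n_ctrl_colonies: int, exp_plated_ul: float = 1.0,
                       ctrl_plated_ul: float = 1.0) -> float:
    """Positive colonies (experimental, + L-arabinose) / total colonies (control).

    Assumptions: positives on the experimental plate are estimated as
    n_exp_colonies * positive_rate, and both counts are normalized to
    colonies per microliter of culture plated.
    """
    if n_ctrl_colonies <= 0:
        raise ValueError("control plate must have colonies")
    est_positive = n_exp_colonies * positive_rate(n_positive, n_tested)
    return (est_positive / exp_plated_ul) / (n_ctrl_colonies / ctrl_plated_ul)


# Hypothetical counts: 200 colonies from 1 uL (+ L-arabinose), 98/100 PCR-positive;
# the control (no L-arabinose), diluted to an effective 0.001 uL, gives 196 colonies.
rate = positive_rate(98, 100)
eff = editing_efficiency(200, 98, 100, 196, exp_plated_ul=1.0, ctrl_plated_ul=0.001)
print(f"positive rate = {rate:.1%}, editing efficiency = {eff:.1e}")
```

The control group serves as the denominator because, without Cas9 cleavage, its colony count approximates the total number of viable cells available for editing.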

Measurement of growth curve and transformation efficiency

For measuring the growth curve, a single colony was inoculated into 5 mL of LB medium, and the culture was grown at 37 °C for 12 h. Then, 1 mL of this seed culture was inoculated into 100 mL of fresh LB medium and grown at 37 °C in a shaker at 220 rpm. During the 12-h cultivation, samples were taken every hour to measure the optical density at 600 nm (OD600) using an ultraviolet spectrophotometer (V-5100, Shanghai Metash Instruments Co., Ltd). For measuring transformation efficiency, purified pUC19 was used as the supercoiled plasmid DNA. First, 1 μL pUC19 (1 ng/μL) was added to one tube of competent cells (100 μL). Next, the mixture was incubated for 30 min before heat shock for 1 min in a 42 °C water bath. Then, the tube was placed on ice for 2 min before 900 μL of SOC medium prewarmed to 37 °C was added, and the tube was shaken at 200–230 rpm (37 °C) for 40 min. Finally, 100 μL of the culture was plated on an LB plate containing Amp, and the plate was incubated overnight at 37 °C. The transformation efficiency was calculated as N × 10⁴ CFU/μg pUC19, where N is the number of transformants on the plate.
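The factor 10⁴ follows from the protocol: 1 ng of pUC19 ends up in roughly 1 mL after recovery (100 μL cells plus 900 μL SOC; the 1 μL of DNA is neglected), and 100 μL of that is plated, so each plate receives about 0.1 ng = 10⁻⁴ μg of DNA. A small sketch of this arithmetic, with a hypothetical function name and colony count:

```python
def transformation_efficiency(n_colonies: int, dna_ng: float = 1.0,
                              recovery_ul: float = 1000.0,
                              plated_ul: float = 100.0) -> float:
    """CFU per microgram of plasmid DNA actually plated.

    Defaults follow the protocol: 1 ng pUC19, ~1 mL after SOC recovery
    (100 uL cells + 900 uL SOC; the 1 uL of DNA is ignored), 100 uL plated.
    """
    dna_plated_ug = dna_ng * 1e-3 * plated_ul / recovery_ul  # 1e-4 ug by default
    return n_colonies / dna_plated_ug


print(f"{transformation_efficiency(250):.2e} CFU/ug")  # 250 colonies -> 2.50e+06
```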

Shake-flask fermentation and product detection

For testing isobutanol production, single colonies of engineered strains were inoculated into 5 mL of LB medium containing the appropriate antibiotics, and the cultures were grown at 37 °C for 12 h. Then, 200 μL of seed culture was transferred into airtight shake flasks containing 20 mL of antibiotic-free M9 medium for micro-aerobic fermentation. During the 72-h fermentation, samples were taken every 12 h to measure biomass and isobutanol titer. Biomass was evaluated by measuring the OD600 of the fermentation broth with an ultraviolet spectrophotometer (V-5100, Shanghai Metash Instruments Co., Ltd). For measuring isobutanol concentration, the fermentation broth was centrifuged at 1400 × g for 10 min, and the supernatant was analyzed using a gas chromatograph (PANNA GCA91, Shanghai Wangxu Electric Co., Ltd) with high-purity isobutanol as the calibration standard and high-purity n-pentanol as the internal standard.
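The text does not describe how peak areas were converted to titers. A common internal-standard approach, shown here only as an assumption, is to fit a linear calibration of the isobutanol/n-pentanol peak-area ratio against standard concentrations and invert it for each sample. The function names, calibration points, and peak areas below are hypothetical, and the sketch assumes the same amount of n-pentanol is added to standards and samples.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need >= 2 paired points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def quantify(area_analyte, area_istd, slope, intercept, dilution=1.0):
    """Concentration from the analyte / internal-standard peak-area ratio."""
    ratio = area_analyte / area_istd
    return (ratio - intercept) / slope * dilution


# Hypothetical calibration: isobutanol standards (g/L) vs. area ratio to n-pentanol.
std_conc = [0.5, 1.0, 2.0, 4.0]
std_ratio = [0.25, 0.50, 1.00, 2.00]
slope, intercept = fit_line(std_conc, std_ratio)
sample = quantify(3000.0, 2000.0, slope, intercept)
print(f"isobutanol ~ {sample:.2f} g/L")
```

Normalizing to the internal standard cancels run-to-run variation in injection volume, which is the usual reason for adding n-pentanol to every sample.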

Results

Optimization of CRISPR/Cas9-assisted recombineering method

The CRISPR/Cas9-assisted recombineering system developed by Li et al. is popular in the metabolic engineering community (Li et al. 2015), so we used it as the basis for optimization. After four rounds of optimization and testing, we obtained the final version. The detailed optimization process is illustrated and described in Fig. S4. The genome editing system we constructed is a two-plasmid system consisting of five elements: a Cas9-expressing cassette, an sgRNA-expressing cassette, a λ-Red recombination system, a donor DNA-generation system, and a plasmid curing system (Fig. 1). A two-plasmid system makes plasmid construction more convenient than a one-plasmid system (Li et al. 2015; Zhao et al. 2016). For convenience, the two plasmids are referred to as plasmid#1 and plasmid#2 in the following text. Specifically, Cas9 and the λ-Red recombinases (Gam, Beta, and Exo) were expressed from plasmid#1, which contained a p15A replication origin and a KanR gene. The targeting sgRNA was expressed from plasmid#2, which contained a pSC101 replication origin and an AmpR gene (Fig. 1). There were two types of plasmid#2: one containing two sgRNA-expressing cassettes and the other containing one. The type used depended on the kind of genome editing. Donor DNA, which served as the editing template for introducing deletions, insertions, or replacements, was integrated into plasmid#2 so that it was protected from exonuclease attack and replicated along with the plasmid. The genomic target site(s) (N20 + PAM) were added to plasmid#2 on both flanks of the donor DNA, so that the donor DNA was released from plasmid#2 by Cas9 cleavage during genome editing. The resulting linear editing template then recombined with the cleaved genomic DNA (Fig. 1).
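The release of the linear donor by Cas9 can be illustrated with a toy sequence model. This sketch assumes SpCas9 behavior (an NGG PAM and a blunt cut 3 bp upstream of the PAM), considers only the top strand, and places both target sites in the same orientation. The function names, spacer, and donor stand-in are hypothetical; a real plasmid#2 would carry ~500-bp arms and the insert in place of the short donor.

```python
from typing import List


def cas9_cut_sites(seq: str, spacer: str) -> List[int]:
    """Cut positions for SpCas9 on the top strand only.

    A site is a 20-nt spacer match followed by an NGG PAM. SpCas9 cuts
    3 bp upstream of the PAM, so index i means a blunt cut between
    seq[i-1] and seq[i].
    """
    seq, spacer = seq.upper(), spacer.upper()
    sites = []
    start = seq.find(spacer)
    while start != -1:
        pam = seq[start + len(spacer): start + len(spacer) + 3]
        if len(pam) == 3 and pam[1:] == "GG":
            sites.append(start + len(spacer) - 3)
        start = seq.find(spacer, start + 1)
    return sites


def release_donor(region: str, spacer: str) -> str:
    """Linear fragment between the first and last cut in a linearized region."""
    sites = cas9_cut_sites(region, spacer)
    if len(sites) < 2:
        raise ValueError("need target sites on both flanks of the donor")
    return region[sites[0]:sites[-1]]


# Toy plasmid#2 region: target site (spacer + PAM) | donor | target site.
SPACER = "GATTACAGATTACAGATTAC"   # hypothetical spacer
TARGET_SITE = SPACER + "TGG"      # N20 + NGG PAM
DONOR = "CCCCCCCC"                # stands in for LHA-insert-RHA
REGION = "AAAA" + TARGET_SITE + DONOR + TARGET_SITE + "TTTT"

print(cas9_cut_sites(REGION, SPACER))  # [21, 52]
print(release_donor(REGION, SPACER))
```

The released fragment keeps target-site remnants at both ends (here the last 3 nt of the spacer plus the PAM on the left, and the first 17 nt of the spacer on the right).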

An inducible promoter was used to control Cas9 expression, so that the CRISPR/Cas9 system functioned only when an inducer was added. To select a sufficiently tight inducible promoter, the L-arabinose-inducible PBAD promoter and two IPTG-inducible promoters, PT5 and PLlacO1, were tested individually (Table S6). The PBAD promoter was tighter than both PT5 and PLlacO1 (Fig. S5a), so it was used for Cas9 expression. Because obvious leaky expression remained, we also replaced the PJ23100 promoter driving sgRNA expression with the PBAD promoter. In addition, glucose was added to the solid medium to further reduce leaky expression of Cas9 and sgRNA (Fig. S5a). Similarly, an inducible promoter was used to control expression of the λ-Red recombinases, so that the λ-Red recombination system functioned only when needed, because constitutive expression of λ-Red recombinases may lead to genome instability. The PBAD, PT5, and PLlacO1 promoters were tested individually (Table S6), and the PT5 promoter produced the highest recombination frequency (Fig. S5b). We also optimized other parameters, including the L-arabinose and IPTG concentrations, the culture time, and the medium, to further improve the system's performance (Table S6). At an appropriate L-arabinose concentration, Cas9 and sgRNA were expressed at levels sufficient to cleave the single-copy genome but insufficient to cleave all copies of plasmid#2 (about five copies per cell; Thompson et al. 2018). Therefore, edited cells retained Amp resistance. To construct the plasmid curing system, we used the temperature-sensitive pSC101 replication origin for plasmid#2 and added the sucrose-sensitivity gene sacB to plasmid#1 as a counter-selection marker (Fig. 1).

Each editing cycle started with transformation of plasmid#2 into cells containing plasmid#1 (Fig. 1 and Fig. S2). Correct transformants containing both plasmids were then grown before inducers were added to trigger DNA cleavage and DSB repair. In principle, the sgRNA guides Cas9 to recognize and cleave the target DNA, generating DSBs in the genome and in plasmid#2. The λ-Red recombinases then mediate homologous recombination between the broken genome and the linear donor DNA. This transfers the desired mutation from the donor DNA to the genome and destroys the target site (Fig. 1 and Fig. S2). Cells that acquire the desired mutation survive, whereas those with an unrepaired genome die. Thus, plating liquid cultures on agar medium containing Kan and Amp allowed selection of the desired clones. Colonies growing on the plates were further verified by PCR and sequencing. Correct mutants were then grown at 40 °C in medium containing only Kan to eliminate plasmid#2 (Fig. S3a), and the cultures were inoculated into fresh medium to prepare competent cells for a new round of editing (Fig. S2). Each editing cycle required 3 days. After the final round of editing, plasmid#1 and plasmid#2 were eliminated together by growing the correct clones at 40 °C in antibiotic-free medium and plating the cultures on agar medium containing sucrose (Fig. S2 and S3).
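The logic of the plasmid curing system can be captured in a deliberately simplified model. The sketch assumes that a temperature-sensitive plasmid is always lost at 40 °C and that a plasmid whose marker is not selected is lost during the 12-h outgrowth (the text states the latter only for plasmid#1); the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Plasmid:
    name: str
    marker: str            # antibiotic resistance carried by the plasmid
    temp_sensitive: bool   # replication fails at 40 °C (pSC101 origin here)
    sacB: bool             # confers sucrose sensitivity (counter-selection)


PLASMID1 = Plasmid("plasmid#1", "Kan", temp_sensitive=False, sacB=True)
PLASMID2 = Plasmid("plasmid#2", "Amp", temp_sensitive=True, sacB=False)


def outgrow(plasmids: Iterable[Plasmid], temp_c: float,
            antibiotics: Iterable[str]) -> FrozenSet[Plasmid]:
    """Plasmids assumed to remain after a 12-h outgrowth.

    Simplification: a plasmid is kept only if it can replicate at temp_c
    and its marker is selected; unselected plasmids are treated as lost.
    """
    drugs = set(antibiotics)
    return frozenset(p for p in plasmids
                     if not (p.temp_sensitive and temp_c >= 40)
                     and p.marker in drugs)


def survives_on_sucrose(plasmids: Iterable[Plasmid]) -> bool:
    """Cells carrying sacB are killed on sucrose plates."""
    return not any(p.sacB for p in plasmids)


edited = frozenset({PLASMID1, PLASMID2})
between_rounds = outgrow(edited, 40, {"Kan"})  # keeps only plasmid#1
final = outgrow(edited, 40, set())             # plasmid-free
print(sorted(p.name for p in between_rounds), survives_on_sucrose(final))
```

The sucrose plate acts as a safety net: any cell that still carries plasmid#1 after the final outgrowth is counter-selected by sacB.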

Long fragment insertion

To evaluate the ability of the method to mediate long fragment insertion, we inserted fragments of different lengths (3, 6, 9, and 12 kb) into the lacZ gene of E. coli MG1655 (Fig. 2a). We constructed four versions of plasmid#2 harboring the corresponding donor DNAs and expressing the same lacZ-targeting sgRNA. The four inserted fragments came from the F plasmid of E. coli XL1-Blue and had no homology with the MG1655 genome. Insertion of these fragments inactivates the lacZ gene encoding β-galactosidase, so edited and unedited colonies could be distinguished by blue-white selection: on an LB plate containing IPTG and X-gal, edited colonies were white and unedited colonies were blue. We also identified edited clones by PCR. One primer pair (F1/R1) was designed to verify the 3-kb insertion (Fig. 2a), and correct clones yielded much larger PCR products than the control (Fig. S6a). Two primer pairs were designed to verify the 6-, 9-, and 12-kb insertions (Fig. 2a). Correct clones yielded the expected PCR products with both F1/R2 and F2-X/R1 (X = 1, 2, 3), whereas the control did not (Fig. S6b–d). The PCR products were further verified by sequencing. Based on the results of blue-white selection, PCR, and sequencing, we determined the editing efficiencies and positive rates. The editing efficiencies in the four insertion experiments were 1.2 × 10⁻³, 1.2 × 10⁻³, 9.6 × 10⁻⁴, and 7.2 × 10⁻⁴, respectively, and the positive rates were 97.3, 98.3, 96.7, and 98.3%, respectively (Fig. 2b). These results indicate that both Cas9-mediated DNA cleavage and λ-Red-mediated DSB repair were efficient in our experiments. We found that the small proportion of negative colonies (< 5%), commonly called “escapers” (Jiang et al. 2013; Li et al. 2015), came from two sources.
More than half of the escapers were not cleaved by Cas9, probably because of the limited induction time and L-arabinose concentration. The remaining escapers carried deletions of unknown length at the target site, likely due to alternative end-joining (A-EJ) repair (Chayot et al. 2010; Huang et al. 2019). A 12-kb insertion is sufficient for most applications in synthetic biology and metabolic engineering. We compared our method with three existing CRISPR/Cas9-based methods that perform relatively well in long fragment insertion (Fig. 2c). Method 1 enabled the insertion of 8-kb exogenous DNA with a positive rate of 15% (Li et al. 2015). Method 2 enabled the insertion of 7-kb exogenous DNA with a positive rate of 61% in the presence of a selectable marker (Chung et al. 2017). Method 3 enabled the insertion of 7-kb exogenous DNA with a positive rate of 10% (Li et al. 2019b). Our method performed better, enabling the insertion of 12-kb exogenous DNA with a positive rate of 98.3%.
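The two verification schemes rely on which primer sites exist on the template. A toy in-silico PCR, with hypothetical primer sequences and a 70-bp stand-in for the insert, shows why F1/R1 gives a larger product after insertion and why the junction pairs F1/R2 and F2/R1 amplify only the edited locus. The sketch ignores mismatches, melting temperature, and extension-length limits; all names are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, fwd: str, rev: str) -> Optional[int]:
    """Product length if fwd matches the top strand and revcomp(rev) matches
    downstream of it; None if either primer site is absent."""
    start = template.find(fwd)
    if start == -1:
        return None
    site = revcomp(rev)
    end = template.find(site, start + len(fwd))
    if end == -1:
        return None
    return end + len(site) - start


# Toy loci (all sequences hypothetical). R2 and F2 sites lie inside the insert.
F1, R1_SITE = "GCTAGCTAGG", "TTCGAACCTT"
R2_SITE, F2 = "GGATCCAAGT", "CATGCATGCC"
R1, R2 = revcomp(R1_SITE), revcomp(R2_SITE)

INSERT = "A" * 5 + R2_SITE + "A" * 40 + F2 + "A" * 5   # 70 bp
WILD_TYPE = F1 + "A" * 30 + R1_SITE                     # 50 bp
EDITED = F1 + "A" * 15 + INSERT + "A" * 15 + R1_SITE    # 120 bp

for name, locus in (("wild type", WILD_TYPE), ("edited", EDITED)):
    print(name, amplicon_length(locus, F1, R1),
          amplicon_length(locus, F1, R2), amplicon_length(locus, F2, R1))
```

For multi-kilobase insertions the F1/R1 product can become too long to amplify reliably, which is one reason junction primers inside the insert are convenient.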

Fig. 2

Long fragment insertion. (a) Schematic of fragment insertion of different lengths. TS target site, LHA left homologous arm, RHA right homologous arm, F forward primer, R reverse primer. (b) Editing efficiencies and positive rates in the four editing experiments. (c) Comparison of largest insertion length and positive rate between three reported methods and our method. Data are expressed as means ± s.d. from three independent experiments

Long fragment deletion

First, we deleted a 99.9-kb fragment spanning positions 565,156 to 665,088 of the MG1655 genome (Fig. 3a). To determine how editing performance depends on the length of the deleted fragment, we selected seven fragments of different lengths within this 99.9-kb region for individual deletion: 9.1, 21.5, 30.6, 39.4, 59.8, 79.8, and 99.9 kb (Fig. 3a). To delete these fragments, we constructed seven versions of plasmid#2, each harboring two sgRNA-expressing cassettes. One sgRNA targets a common genomic site (TS1), and the other targets one of seven different sites (TS2–1 to TS2–7) (Fig. 3a). Based on the results of PCR and sequencing, we determined the editing efficiencies and positive rates (Fig. 3b). All positive rates were over 95%, similar to the results of the long fragment insertion experiments. Deletion of the 9.1-, 21.5-, 30.6-, 39.4-, 59.8-, and 79.8-kb fragments gave similar editing efficiencies, whereas deletion of the 99.9-kb fragment gave a lower editing efficiency (Fig. 3b). The 99.9-kb knockout strain grew much more slowly than wild-type MG1655, whereas the 79.8-kb knockout strain grew at a rate similar to wild-type MG1655 (Fig. S7a and S7d). This suggests that the terminal region of the 99.9-kb fragment contains genetic information important for cell growth. The lower editing efficiency in the 99.9-kb deletion experiment might be due to the lower viability of edited cells, as editing efficiency was calculated from the number of colonies formed on the plate. In this study, we also deleted other long fragments from the genome (Fig. 4d), which are described in the next section. We compared our method with four existing CRISPR/Cas9-based methods that perform relatively well in long fragment deletion (Fig. 3c).
Method 1 enabled the deletion of 12 kb of genomic DNA with a positive rate of 90% (Li et al. 2015). Method 2 enabled the deletion of 17 kb of genomic DNA with a positive rate of 17% (Su et al. 2016). Method 3 enabled the deletion of 100 kb of genomic DNA with a positive rate of 75% (Zhao et al. 2017). Method 4 enabled the deletion of 123 kb of genomic DNA with a positive rate of 36% (Zheng et al. 2017). Our method performed better, enabling the deletion of 186.7 kb of genomic DNA with a positive rate of 96.6%.
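The reported coordinates give the fragment length directly, assuming 1-based inclusive coordinates (the usual GenBank convention; the text does not state which convention is used). With half-open coordinates the result would be one base shorter. A minimal sketch with a hypothetical helper:

```python
def inclusive_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive genome interval [start, end]."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


n_bp = inclusive_length(565_156, 665_088)
print(f"{n_bp:,} bp = {n_bp / 1000:.1f} kb")  # 99,933 bp = 99.9 kb
```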

Fig. 3

Long fragment deletion. (a) Schematic of fragment deletion of different lengths. TS target site. (b) Editing efficiencies and positive rates in the seven editing experiments. (c) Comparison of largest deletion length and positive rate between four reported methods and our method. Data are expressed as means ± s.d. from three independent experiments

Fig. 4

Deletion of nonessential sequences and genome simplification. (a) Deletion of long fragments containing no essential gene. (b) Deletion of long fragments containing one essential gene. (c) Deletion of long fragments containing two essential genes. (d) Schematic of the deletion of fragment No.1. LHA left homologous arm, RHA right homologous arm, F forward primer, R reverse primer. (e) Representative results of PCR verification in the deletion experiment of fragment No.1. (f) Results in the deletion experiments of 12 nonessential fragments. (g) Summary of cumulative deletion. Data are expressed as means ± s.d. from three independent experiments

Identification of nonessential sequences and chromosomal simplification

According to previous reports, the MG1655 chromosome harbors 4497 genes, including 4296 protein-coding genes and 201 RNA-coding genes (Keseler et al. 2013, 2017). Researchers at Keio University determined the essentiality of all protein-coding genes in E. coli K-12 by single-gene deletion, generating the Keio collection (Baba et al. 2006; Yamamoto et al. 2009). This provided important information for identifying potentially nonessential long fragments in the MG1655 genome. To delete a long fragment, we constructed a plasmid#2 that expressed a pair of sgRNAs targeting the two flanks of the fragment and harbored the corresponding donor DNA (Fig. 4a). To delete a long fragment harboring a limited number of essential genes, we placed these genes in the corresponding plasmid#2 between the two homologous arms, so that the essential genes remained in the genome after editing and the edited cells survived (Fig. 4b and c). For each long fragment deletion, we designed two primer pairs for PCR verification. The first primer pair targets sequences within the long fragment, and the second targets sequences adjacent to, and outside of, the two homologous arms (Fig. 4d and Fig. S8). Correct clones yielded no PCR product with the first primer pair but yielded the expected product with the second. Conversely, the unedited control yielded the expected product with the first primer pair but no product with the second (Fig. 4e and Fig. S9).
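The expected band pattern for the two primer pairs can be sketched as follows. The 5-kb colony-PCR limit, the arm and primer offsets, and the 800-bp inner amplicon are assumptions chosen for illustration, and the helper name is hypothetical. The point is that the outer pair spans the whole fragment on an unedited genome, which is far too long to amplify, whereas on an edited genome it spans only the arms (plus any essential genes carried back in by the donor).

```python
from typing import Optional, Tuple

MAX_COLONY_PCR_BP = 5_000  # assumed practical limit; not stated in the source


def expected_products(edited: bool, inner_bp: int, outer_span_bp: int,
                      deleted_bp: int, retained_bp: int = 0,
                      max_bp: int = MAX_COLONY_PCR_BP
                      ) -> Tuple[Optional[int], Optional[int]]:
    """Predicted (inner, outer) colony-PCR products; None means no band.

    inner_bp: amplicon of the primer pair inside the long fragment.
    outer_span_bp: distance between the outer primers on the unedited genome.
    deleted_bp: length removed by the deletion.
    retained_bp: essential genes carried back in by the donor, if any.
    """
    inner = None if edited else inner_bp
    outer_len = outer_span_bp - deleted_bp + retained_bp if edited else outer_span_bp
    outer = outer_len if outer_len <= max_bp else None
    return inner, outer


# Hypothetical geometry: a 100-kb deletion with no essential genes, ~500-bp arms,
# and outer primers 100 bp beyond each arm; inner amplicon 800 bp.
span = 100_000 + 2 * 500 + 2 * 100
print("unedited:", expected_products(False, 800, span, 100_000))  # (800, None)
print("edited:  ", expected_products(True, 800, span, 100_000))   # (None, 1200)
```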

Altogether, we deleted 12 long nonessential fragments from the MG1655 genome (Table 1), including the 99.9-kb fragment (No. 3) mentioned in the previous section. These fragments are located in different regions of the genome, and their lengths range from 52.0 to 186.7 kb. Among the 12 fragments, No. 3, No. 8, and No. 11 harbor one essential gene; No. 1 and No. 4 harbor two essential genes; and No. 9 harbors three essential genes (Table 1). Based on the results of PCR and sequencing, we determined the editing efficiencies and positive rates (Fig. 4f). All positive rates were over 95%, and the editing efficiencies ranged from 2.3 × 10⁻⁴ to 1.3 × 10⁻³. Deletion of fragments No. 3, No. 4, and No. 7 gave much lower editing efficiencies than deletion of the other fragments. Growth curves of the 12 knockout strains showed that the No. 3, No. 4, and No. 7 knockout strains grew much more slowly than the other knockout strains, with the No. 4 knockout strain growing slowest (Fig. S7). This suggests that these fragments are important for cell growth and that the lower viability of the edited cells may have caused the lower editing efficiencies. Differences in DNA loci and sgRNA sequences are also possible factors influencing editing efficiency.

Table 1 Long fragments deleted in the MG1655 genome

After deleting the 12 long fragments individually, we constructed cumulative deletion mutants. Here, MG1655-ΔNo. X denotes the MG1655 mutant lacking fragment No. X (X = 1, 2, 3, …, 12). Because No. 1 was the longest fragment deleted in this study (Table 1), we constructed cumulative deletion mutants based on strain MG1655-ΔNo. 1. Through iterative editing, we deleted fragment No. 9 from MG1655-ΔNo. 1, generating strain MG1655-ΔNo. 1/ΔNo. 9, which lacks a total of 270.7 kb of DNA containing 268 open reading frames (ORFs) (Fig. 4g). We then attempted to delete a third fragment from MG1655-ΔNo. 1/ΔNo. 9. According to the growth curves of the single deletion mutants, knockout of fragment No. 2, No. 5, No. 6, No. 8, No. 10, or No. 12 had no apparent effect on cell growth (Fig. S7). We therefore attempted to delete each of these fragments individually from MG1655-ΔNo. 1/ΔNo. 9. We obtained strains MG1655-ΔNo. 1/ΔNo. 9/ΔNo. 2, MG1655-ΔNo. 1/ΔNo. 9/ΔNo. 5, and MG1655-ΔNo. 1/ΔNo. 9/ΔNo. 6, which lack a total of 324.1, 370.6, and 368.7 kb of DNA containing 315, 364, and 368 ORFs, respectively (Fig. 4g). We failed to delete fragments No. 8, No. 10, and No. 12 from MG1655-ΔNo. 1/ΔNo. 9 despite repeating the experiments several times, suggesting that each of these fragments is essential for the survival of MG1655-ΔNo. 1/ΔNo. 9.
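Assuming the deleted fragments do not overlap, the cumulative totals reported here imply the length of each added fragment. The sketch below uses only numbers from the text (No. 1 is 186.7 kb, the longest fragment); the helper name is hypothetical.

```python
# Lengths (kb) taken from the text: fragment No. 1 and the cumulative totals.
NO1_KB = 186.7
CUMULATIVE_KB = {
    ("No. 1", "No. 9"): 270.7,
    ("No. 1", "No. 9", "No. 2"): 324.1,
    ("No. 1", "No. 9", "No. 5"): 370.6,
    ("No. 1", "No. 9", "No. 6"): 368.7,
}


def implied_lengths(no1_kb, cumulative):
    """Back out each added fragment's length, assuming deletions do not overlap."""
    base = cumulative[("No. 1", "No. 9")]
    lengths = {"No. 9": round(base - no1_kb, 1)}
    for key, total in cumulative.items():
        if len(key) == 3:
            lengths[key[-1]] = round(total - base, 1)
    return lengths


print(implied_lengths(NO1_KB, CUMULATIVE_KB))
```

The implied values (84.0, 53.4, 99.9, and 98.0 kb) fall within the reported 52.0–186.7 kb range, and the 99.9 kb for No. 5 agrees with the length given for that fragment in the isobutanol section.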

Metabolic engineering of E. coli for isobutanol production

Higher alcohols such as isobutanol and n-butanol are promising next-generation biofuels because of their higher energy density, lower vapor pressure, and relatively low hygroscopicity (Saini et al. 2017; Liang et al. 2018). To illustrate the potential of the genome editing method in metabolic engineering, we used it to modify the E. coli genome for isobutanol production. First, we constructed a chassis strain, JW74, from MG1655 through six rounds of genome editing (Fig. 5a). The competence (transformation efficiency) of JW74 was 170-fold that of MG1655, making it much easier to introduce exogenous DNA. We then built a 7.9-kb operon and integrated it into the JW74 chromosome in place of fragment No. 5 (Fig. 5a), generating strain SH258. Fragment No. 5 is 99.9 kb long, and the corresponding knockout strain grew slightly faster than its parental strain (Fig. S7f). The operon consists of five structural genes flanked by 5′ and 3′ untranslated regions (UTRs). The 5′ UTR contains a strong bacterial ribosome-binding site (Elowitz and Leibler 2000) and a T7 promoter, which is specifically recognized by bacteriophage T7 RNA polymerase (Rong et al. 1998); the 3′ UTR contains a T7 terminator. The five structural genes are alsS, ilvC, ilvD, kivD, and adhA (Fig. 5a). Among them, ilvC and ilvD came from E. coli, alsS came from Bacillus subtilis (Atsumi et al. 2009), and kivD and adhA came from Lactococcus lactis (Atsumi et al. 2008) (Fig. 5b). To initiate transcription of the operon, we introduced the T7 RNA polymerase gene under the control of the T5 promoter (Bujard et al. 1987) into the SH258 genome, generating strain SH274 (Fig. 5a). Although the T5 promoter is an inducible promoter repressed by LacI, it acts here as a strong constitutive promoter because SH274 is a lacI-defective strain.
In conventional metabolic engineering, a high-copy-number fermentation plasmid is commonly introduced to overexpress the enzymes of the target pathway. For comparison, we constructed plasmid pColE1-PT5-alsS-ilvC-ilvD-kivD-adhA and transformed it into JW74, generating strain SH279.

Fig. 5

Metabolic engineering of E. coli for isobutanol production. (a) Construction of the SH274 strain based on the JW74 strain. (b) The synthetic pathway of isobutanol. (c) Results of isobutanol fermentation of the SH274 strain. (d) Results of isobutanol fermentation of the SH279 strain. Data are expressed as means ± s.d. from three independent experiments.

We used strains SH274 and SH279 for micro-aerobic fermentation in shake flasks containing 20 mL of M9 medium. In the synthetic pathway, acetolactate synthase (AlsS) converts pyruvate, an intermediate of glycolysis, into 2-acetolactate, which ketol-acid reductoisomerase (IlvC) converts into 2,3-dihydroxyisovalerate. Dihydroxyacid dehydratase (IlvD) then converts 2,3-dihydroxyisovalerate into 2-ketoisovalerate, which 2-ketoisovalerate decarboxylase (KivD) converts into isobutyraldehyde. Finally, alcohol dehydrogenase (AdhA) reduces isobutyraldehyde to isobutanol (Fig. 5b). During fermentation, samples were taken every 12 h to measure OD600 and the isobutanol titer (Fig. 5c, d). Isobutanol reached a maximum titer of 1.3 g/L after 48 h of SH274 fermentation (Fig. 5c). This was the first attempt to produce isobutanol without introducing a high-copy-number fermentation plasmid, and the titer was higher than in many reports that used such a plasmid (Lan and Liao 2013; Chen and Liao 2016). For strain SH279, isobutanol reached a maximum titer of 5.5 g/L after 48 h (Fig. 5d), 4.2-fold that of SH274, indicating that SH274 still has much room for improvement. In future work, we therefore plan to increase the copy number of the PT7-alsS-ilvC-ilvD-kivD-adhA-TT7 operon in the SH274 genome to strengthen expression of the pathway enzymes.
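The five enzymatic steps form a linear chain in which each product is the next enzyme's substrate, and the gene order in the operon follows the same order. The sketch below encodes the steps as data, checks that the chain is continuous, and recomputes the SH279/SH274 titer ratio from the reported maxima. Stoichiometry (AlsS condenses two pyruvate molecules) and cofactors are deliberately not modeled, and the `Step` structure is an illustrative choice rather than anything from the study.

```python
from typing import NamedTuple


class Step(NamedTuple):
    enzyme: str
    gene: str
    substrate: str
    product: str


# Pathway order as described in the text (Fig. 5b); stoichiometry omitted.
PATHWAY = [
    Step("acetolactate synthase", "alsS", "pyruvate", "2-acetolactate"),
    Step("ketol-acid reductoisomerase", "ilvC", "2-acetolactate", "2,3-dihydroxyisovalerate"),
    Step("dihydroxyacid dehydratase", "ilvD", "2,3-dihydroxyisovalerate", "2-ketoisovalerate"),
    Step("2-ketoisovalerate decarboxylase", "kivD", "2-ketoisovalerate", "isobutyraldehyde"),
    Step("alcohol dehydrogenase", "adhA", "isobutyraldehyde", "isobutanol"),
]


def check_chain(steps):
    """Verify each step consumes the previous product; return (start, end)."""
    for prev, nxt in zip(steps, steps[1:]):
        if prev.product != nxt.substrate:
            raise ValueError(f"{prev.gene} -> {nxt.gene}: {prev.product} != {nxt.substrate}")
    return steps[0].substrate, steps[-1].product


def fold_change(titer_g_per_l: float, reference_g_per_l: float) -> float:
    """Ratio of two maximum titers (same units)."""
    return titer_g_per_l / reference_g_per_l


start, end = check_chain(PATHWAY)
operon_order = [step.gene for step in PATHWAY]
print(f"{start} -> {end} via {'-'.join(operon_order)}")
print(f"SH279 vs SH274: {fold_change(5.5, 1.3):.1f}-fold")
```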

Discussion

Many CRISPR/Cas9-based methods have been developed for genome engineering in E. coli. Generally, the CRISPR/Cas9 system is combined with heterologous DSB repair systems, either homologous recombination systems (Jiang et al. 2013, 2015; Li et al. 2015, 2019b; Zhao et al. 2016, 2017; Chung et al. 2017; Zhang et al. 2017) or NHEJ systems (Su et al. 2016; Zheng et al. 2017). NHEJ-mediated methods generate stochastic indels in the target region, which makes the editing outcome imprecise. In contrast, homologous recombination-mediated methods achieve precise genome editing with higher editing efficiency. The method developed in this study is based on CRISPR/Cas9-assisted recombineering, a class of methods that combine the CRISPR/Cas9 system with the λ-Red system and require an artificial donor DNA as the editing template. Both circular DNA (plasmid-borne dsDNA (Jiang et al. 2015; Zhao et al. 2016; Feng et al. 2018)) and linear DNA (PCR-amplified dsDNA (Jiang et al. 2015; Li et al. 2015, 2019b; Chung et al. 2017; Zhang et al. 2017; Zhao et al. 2017) or synthesized ssDNA (Jiang et al. 2013; Li et al. 2015)) have been used as editing templates in existing methods. An editing template carried on a plasmid is protected from DNA exonucleases and is copied along with the plasmid, which greatly increases the frequency of homologous recombination and thus the editing efficiency. However, homologous recombination between a cleaved genome and a plasmid yields either a non-crossover or a crossover product (Fig. S10a). There is therefore a significant chance that the entire plasmid is integrated into the genome at the target site, producing undesired recombinants (Fig. S10b).
Although genome editing with linear editing templates avoids the problem of plasmid integration, it gives much lower editing efficiency because linear templates are sensitive to DNA exonucleases. To resolve this trade-off, we cloned the editing template into a plasmid and flanked it with the target sequence, so that Cas9 cleavage releases the editing template from the plasmid. This strategy combines the advantages of circular and linear editing templates in one method. In our genome editing system, two independently inducible promoters, an arabinose-inducible (PBAD) system and a lactose-inducible (Plac) system, control the expression levels of CRISPR/Cas9 and λ-Red. Because cross-talk between the two promoters could compromise their independent control, we used an evolved AraC (Lee et al. 2007) to construct the PBAD system. The evolved PBAD system is ten times more sensitive to L-arabinose and tolerates IPTG significantly better than the wild type (Lee et al. 2007), eliminating potential interference between the inducers. Controlling CRISPR/Cas9 expression with a tightly regulated inducible promoter also improves the performance of the editing system: transformants have enough time to recover and proliferate before DNA cleavage, which increases cell viability and the initial number of cells.
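How a flanked template is released can be illustrated by locating the Cas9 cut sites on a plasmid sequence. This sketch assumes Streptococcus pyogenes Cas9 with an NGG PAM, which cuts bluntly 3 bp upstream of the PAM; it searches only the top strand, treats the sequence as linear, and uses a hypothetical 20-nt spacer and toy sequences rather than the constructs used in this study.

```python
import re


def cut_sites(seq: str, spacer: str) -> list:
    """Return blunt-cut positions for spacer+NGG matches on the top strand.

    SpCas9 cuts 3 bp upstream (5') of the PAM, so the returned index is the
    position in `seq` immediately after the cut. Only this strand is searched.
    """
    # Lookahead so that overlapping matches are not skipped.
    pattern = re.compile("(?=" + re.escape(spacer) + "[ACGT]GG)")
    return [m.start() + len(spacer) - 3 for m in pattern.finditer(seq)]


def release_template(plasmid: str, spacer: str) -> str:
    """Excise the linear fragment between two Cas9 sites flanking the template."""
    sites = cut_sites(plasmid.upper(), spacer.upper())
    if len(sites) != 2:
        raise ValueError(f"expected 2 target sites, found {len(sites)}")
    left, right = sites
    return plasmid[left:right]


# Hypothetical toy construct: backbone | target site | template | target site | backbone
spacer = "GACGCATAAAGATGAGACGC"   # illustrative 20-nt spacer
site = spacer + "TGG"             # protospacer followed by an NGG PAM
template = "AAAACCCCGGGGTTTT"     # stands in for homology arms plus insert
plasmid = "TTTT" + site + template + site + "CCCC"

released = release_template(plasmid, spacer)
print(cut_sites(plasmid, spacer), released)
```

The released fragment carries a few bases of each target site at its ends in addition to the template; a real design would also check the reverse strand and the circular junction of the plasmid.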

The genome editing method developed in this study has proven efficient for large fragment editing, enabling us to insert DNA fragments of up to 12 kb and delete DNA fragments of up to 186.7 kb. Compared with existing methods, our method achieves higher editing efficiencies and positive rates, and its high performance does not depend on highly efficient competent cells. To demonstrate its potential as a genome engineering tool, we applied the method to genome simplification and metabolic engineering. E. coli has been the predominant prokaryotic organism in research laboratories since the origin of molecular biology and is arguably the most completely characterized single-cell life form (Blattner et al. 1997). Functional analyses have shown that E. coli cells grown under given conditions use only a small fraction of their genes (Tao et al. 1999). As Koob et al. proposed, deleting genes that are nonessential under a given set of growth conditions could identify a minimal set of essential E. coli genes and DNA sequences (Koob et al. 1994). Over the past decades, researchers have identified nonessential sequences and removed them from the E. coli genome individually or cumulatively in attempts to construct a minimized genome (Yu et al. 2002; Hashimoto et al. 2005; Kato and Hashimoto 2007, 2008). The methods used in those studies to delete nonessential sequences were complicated and time-consuming: to remove a long fragment from the genome, researchers applied many recombination techniques, either alone or in combination, including Flp/FRT, Cre/loxP, λ-Red, Tn5 transposon, and phage P1 transduction (Yu et al. 2002; Kang et al. 2004; Kato and Hashimoto 2007, 2008). Compared with these methods, our method is faster and easier to handle.
Using this method, we constructed 12 individual deletion strains and four cumulative deletion strains based on MG1655; the most reduced genome lacks a total of 370.6 kb of DNA sequence containing 364 ORFs. Although some of the deletions could coexist in a single strain, several deletions that were viable individually were not viable in combination with other deletions, which clearly indicates that some genes are dispensable individually but not simultaneously. Such genes may be involved in alternative metabolic pathways. This observation also suggests that the number of essential genes is greater than estimated from single deletions and further illustrates the utility of our combinatorial deletion approach for functional studies of the E. coli genome. Our selection of large fragments for deletion was based on the Keio collection created by researchers at Keio University. According to their research, genes included in the Keio collection had no apparent effect on cell growth when deleted individually (Baba et al. 2006; Yamamoto et al. 2009). However, in our study, deletion of three of the twelve large DNA fragments significantly affected cell growth, which might be due to the following reasons. First, some genes within a fragment may be involved in alternative metabolic pathways, so deleting them simultaneously blocks the synthesis of important substances. Second, these fragments contain RNA-encoding genes that are important for cell metabolism and regulation; because the Keio collection covers only protein-encoding genes and not RNA-encoding genes, it was hard for us to determine whether an RNA-encoding gene was essential. Third, deletions of nonessential genes may have a cumulative effect on cell growth.
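The first explanation, that genes in alternative pathways are dispensable one at a time but not together, can be made concrete with a toy redundancy model in which an essential function survives as long as at least one of its alternative genes remains. All gene and metabolite names are hypothetical placeholders, and the deletions in this study removed whole fragments rather than single genes.

```python
from itertools import combinations

# Hypothetical toy model: each essential function is satisfied if at least one
# of its redundant (alternative) genes remains in the genome.
ESSENTIAL_FUNCTIONS = {
    "metabolite_X": {"geneA", "geneB"},  # two alternative pathways
    "metabolite_Y": {"geneC"},           # single, non-redundant route
}


def viable(deleted: set) -> bool:
    """A strain survives if every essential function keeps at least one gene."""
    return all(genes - deleted for genes in ESSENTIAL_FUNCTIONS.values())


def synthetic_lethal_pairs(candidates):
    """Pairs whose members are individually dispensable but jointly lethal."""
    singles = [g for g in candidates if viable({g})]
    return [(a, b) for a, b in combinations(singles, 2) if not viable({a, b})]


print(synthetic_lethal_pairs(["geneA", "geneB", "geneC", "geneD"]))
```

Screening single deletions alone would classify geneA and geneB as nonessential; only the pairwise test exposes their joint requirement, mirroring why cumulative deletions can fail where single ones succeed.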

Microorganisms are versatile living systems for the biosynthesis of valuable molecules used in chemical, energy, and pharmaceutical processes (Huo et al. 2011, 2018; Paddon and Keasling 2014; Fang et al. 2018; Yu et al. 2019). Because genetic manipulation with plasmids is simple, plasmids have commonly been used to engineer microorganisms with desired cellular functions, and antibiotics have been widely used to minimize phenotypic variation in plasmid-containing strains. However, antibiotic use may give rise to multidrug-resistant species through horizontal gene transfer and impose a metabolic burden that leads to suboptimal production of target compounds (Mignon et al. 2015). In industrial settings, adding antibiotics not only increases cost but also contaminates final products. Genome integration is a good alternative to plasmids and provides greater stability for artificially introduced genetic information. In this study, we integrated the isobutanol synthetic pathway into a chassis strain derived from MG1655, generating a genetically stable metabolic engineering strain that produced 1.3 g/L isobutanol in shake flasks. This case demonstrates the application potential of our method in genome integration and metabolic engineering.