Introduction

Genome shuffling is a powerful method for the rapid evolution of strains toward desirable phenotypes using recursive multi-parental protoplast fusion. The method was first applied to the improvement of tylosin production in Streptomyces fradiae, where only two rounds of shuffling yielded a phenotype improvement equivalent to that obtained previously using 20 rounds of mutagenesis and screening (an achievement requiring about 20 years of effort) (Zhang et al. 2002). Another major advantage of this technology is that it allows genetic breeding of microbes whose genetic basis and network information are poorly understood (Biot-Pelletier and Martin 2014). To date, the method has been used successfully in many microorganisms to improve important phenotypes, such as the yield of tylosin in S. fradiae (Zhang et al. 2002) and pristinamycin in Streptomyces pristinaespiralis (Xu et al. 2008, 2009), resistance to pentachlorophenol in Sphingobium chlorophenolicum (Dai and Copley 2004), and tolerance to acids in Saccharomyces cerevisiae (Cheng et al. 2015; Pinel et al. 2011; Wang et al. 2007; Wei et al. 2012, 2008; Zheng et al. 2011). Despite the effectiveness of genome shuffling, there is still a practical need to make the method simpler, faster, and more efficient.

During genome shuffling, protoplast fusion is performed by subjecting protoplasts to an electric pulse or, more often, by incubating them in the presence of PEG, which alters membrane fluidity (Biot-Pelletier and Martin 2014). Upon fusion, transfer of genetic material between the different parents takes place, allowing recombination of the genomes of the starting populations (Petri and Schmidt-Dannert 2004). Highly efficient protoplast fusion is therefore essential for successful genome shuffling: the higher the rate of useful fusions between different parents, the more rapidly the desired improved strains are obtained. Theoretically, any number of protoplasts can merge into a single fusant, but how many different kinds of parents would be optimal for achieving the best fusion efficiency? In other words, when designing a genome shuffling experiment, how many kinds of parental strains should be used to achieve the maximal frequency of heteroplasmic fusions? Is more always better? Or is there a peak range, beyond which the efficiency of heteroplasmic fusion drops even as the number of different parental strains increases? Little attention has been given to this problem in previous studies, and the number of parents currently used in genome shuffling largely follows the earliest protocol reported by Zhang et al. (2002). In this study, we attempted to obtain such information using a powerful tool, Monte Carlo simulation, which can be used to estimate the significance level of any test statistic (Danilov et al. 2013; Reed et al. 2015; Sham and Curtis 1995). Monte Carlo simulation methods have been successfully applied to the spatial directionality of surface reactions to determine reaction probabilities (Kerr et al. 2008), which was the primary inspiration for our approach.

Herein, we describe our finding of the optimal number of parental strains in genome shuffling and the application of this finding to the yield improvement of a bioactive natural product from an endophytic fungus. Specifically, we used the optimized method to increase the yield of deacetylmycoepoxydiene (DAM, a derivative of mycoepoxydiene, MED, Fig. 1) by over 200-fold in just two runs of genome shuffling. DAM is an antitumor natural product isolated from Phomopsis sp. A123, a fungal endophyte of mangrove plants (Shen et al. 2007; Sommart et al. 2009; Trisuwan et al. 2011; Zhu et al. 2015). DAM has received a great deal of attention in recent years because of its structural novelty and potent activity (Prachya et al. 2007; Sommart et al. 2009). However, the low yield of DAM produced by the original Phomopsis sp. A123 has become a bottleneck for studies of its structure-activity relationship and its potential as a new antitumor drug lead. Therefore, the development of a rapid and reliable method of genome shuffling and the demonstration of its utility are of both theoretical and practical value.

Fig. 1 Chemical structure of mycoepoxydiene (MED) and deacetylmycoepoxydiene (DAM)

Materials and methods

Microorganism strains, media, and screening method

Phomopsis sp. A123 (CCTCC M206060) was maintained on PDA (potato dextrose agar) slants containing 20 % (v/v) sea water and allowed to develop for 7 days at 28 °C. A bacterial indicator strain, Bacillus subtilis [CMCC (B) 63501], was used in the screening of DAM yield-improved mutants and was incubated at 37 °C in LB medium. A previously developed procedure, antimicrobial-TLC–HPLC (ATH) (Zhang et al. 2011), was used for high-throughput screening of mutants and fusants (see Supplementary Information). Briefly, the culture of each Phomopsis sp. strain was assayed for B. subtilis inhibitory activity by the plug diffusion method. Those with a clear inhibition zone were selected and extracted using methanol (5 mL). The methanol extracts were further tested by TLC to confirm the production of DAM. Depending on the TLC results, the strains with the highest yield were then analyzed by high-performance liquid chromatography (HPLC).

Simulation of cell fusion

Monte Carlo simulation describes a simulation in which a parameter of a system is estimated using Monte Carlo techniques. Monte Carlo estimation is the process of estimating the value of a parameter by performing an underlying stochastic or random experiment. In this paper, the Monte Carlo simulation method is adopted to estimate the percentage of cell fusions. The basic idea is that a cell will fuse with cells of other kinds if the minimum distance between it and the cells of another kind is smaller than that between it and the cells of the same kind. The assumptions are as follows:

  A. Each cell is uniformly distributed in the solution.

  B. Cells of the same kind have the same characteristics.

  C. The total number of cells is \( N_t \).

  D. The number of cells of each kind is N, and n is the number of cell kinds, satisfying the constraint \( N_t = n \cdot N \).

In the Monte Carlo simulation, we assume that a uniform random variable (RV) within the unit area represents a cell. The n kinds of cells are represented by n independently generated groups of uniform random variables within the unit area. The procedure of the Monte Carlo simulation is as follows.

  1. Generate the random variables that represent the \( N_t \) cells. The n groups of uniform random variables within the unit area are generated independently, and each group has N random variables, satisfying the constraint \( N_t = n \cdot N \). The process of cell fusion is simplified as shown in Fig. 2a, which shows three kinds of parent cells, namely kinds A, B, and C, coexisting in a solution. The RVs \( a_i \) (\( a_j \)), \( b_j \), and \( c_j \) represent specific cells of kinds A, B, and C, respectively, and \( d_{\min} \) is the minimum distance between any two cells. For example, \( d_{\min(a_i,a_j)} \) describes the minimum distance between cell \( a_i \) and cell \( a_j \), both from A, and \( d_{\min(a_i,b_j)} \) represents the minimum distance between cell \( a_i \) from A and cell \( b_j \) from B. Taking cell \( a_i \) as the acceptor, a valid fusion takes place when the minimum distance to cells of a different kind is smaller than that to cells of the same kind. For example, if \( d_{\min(a_i,b_j)} < d_{\min(a_i,a_j)} \) or \( d_{\min(a_i,c_j)} < d_{\min(a_i,a_j)} \), cell \( a_i \) will fuse with cell \( b_j \) or cell \( c_j \). If \( d_{\min(a_i,b_j)} < d_{\min(a_i,a_j)} \) and \( d_{\min(a_i,c_j)} < d_{\min(a_i,a_j)} \), the cells \( a_i \), \( b_j \), and \( c_j \) will fuse together.

  2. Calculate the frequencies of valid fusions for the given number of different parent cells. As illustrated in Fig. 2a, if a cell of kind A is taken as the acceptor, the following procedure calculates the number of fusions of three kinds of cells:

$$ \begin{array}{l} \mathrm{count}_3 = 0; \\ \mathrm{for}\ i = 1:N \\ \quad \mathrm{if}\ \left( d_{\min(a_i,b_j)} < d_{\min(a_i,a_j)} \ \ \&\&\ \ d_{\min(a_i,c_j)} < d_{\min(a_i,a_j)} \right) \\ \qquad \mathrm{count}_3{+}{+}; \\ \quad \mathrm{endif} \\ \mathrm{endfor} \end{array} $$
Fig. 2 Monte Carlo simulation of protoplast fusion. a Schematic diagram of invalid and valid protoplast fusion in a simulation using three types of protoplasts, a, b, and c. \( a_i \) and \( a_j \) represent two different individual cells of type-a protoplasts, and \( b_j \) and \( c_j \) represent a cell of type b and type c, respectively. The fusion between cells of the same type (\( a_i \) and \( a_j \)) is invalid, whereas the fusion between cells of different types (\( a_i \) and \( b_j \); \( a_i \) and \( c_j \); \( a_j \) and \( b_j \); \( a_j \) and \( c_j \); \( a_i \), \( b_j \), and \( c_j \); \( a_j \), \( b_j \), and \( c_j \); and so on) is valid. The symbol \( d_{\min} \) represents the minimum distance between two protoplasts. For example, \( d_{\min(a_i,a_j)} \) represents the distance between two individuals of the same type (type a); \( d_{\min(a_i,b_j)} \) and \( d_{\min(a_i,c_j)} \) represent the distance between two individuals of different types (type a and type b; type a and type c). b The frequency (%) of heteroplasmic fusions plotted against the number of different parents submitted. The frequency for every group was calculated after 100 trials of fusion

where \( \mathrm{count}_3 \) is the number of fusions of three kinds of cells. In general, based on this method, we can obtain \( [\mathrm{count}_M, \mathrm{count}_{M-1}, \ldots, \mathrm{count}_2] \), with \( M \le n \), which denote the numbers of fusions of M kinds of cells, M-1 kinds of cells, ..., and 2 kinds of cells, respectively. Based on these calculations, the percentage of total fusions for \( N_t \) cells is calculated by

$$ f=\frac{\sum_{i=2}^{M}\mathrm{count}_i}{N_t}\times 100\% $$
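The procedure above can be sketched in a few lines of Python. This is only an illustrative reading of the description, not the code used in this study: the function names `simulate_fusions` and `heteroplasmic_percentage` are hypothetical, the "unit area" is assumed to be a two-dimensional unit square with Euclidean distances, and every cell is treated in turn as an acceptor. For each acceptor, the sketch counts how many other kinds have a member closer than the nearest cell of the same kind, tallies the result as a fusion of k kinds, and then applies the percentage formula. Because details such as the values of N and \( N_t \) are not specified here, the printed numbers should not be expected to reproduce Table 1.

```python
import math
import random
from collections import Counter


def simulate_fusions(n_kinds, cells_per_kind, rng):
    """Run one Monte Carlo trial and return a Counter {k: count_k}.

    count_k is the number of acceptor cells that fuse into a group of
    k kinds (k = 1 means no partner of another kind was closer).
    """
    # Assumption: each kind is N points drawn uniformly in the unit square.
    cells = [
        [(rng.random(), rng.random()) for _ in range(cells_per_kind)]
        for _ in range(n_kinds)
    ]
    counts = Counter()
    for kind, group in enumerate(cells):
        for idx, acceptor in enumerate(group):
            # Minimum distance to a cell of the same kind (excluding itself).
            d_same = min(
                (math.dist(acceptor, other)
                 for j, other in enumerate(group) if j != idx),
                default=math.inf,
            )
            # Number of other kinds whose nearest member beats d_same.
            closer_kinds = sum(
                1
                for other_kind, other_group in enumerate(cells)
                if other_kind != kind
                and min(math.dist(acceptor, c) for c in other_group) < d_same
            )
            counts[closer_kinds + 1] += 1
    return counts


def heteroplasmic_percentage(counts, total_cells):
    """Percentage of cells involved in fusions of at least two kinds."""
    fused = sum(c for k, c in counts.items() if k >= 2)
    return 100.0 * fused / total_cells


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the trial is repeatable
    n_kinds, cells_per_kind = 3, 50  # illustrative values only
    counts = simulate_fusions(n_kinds, cells_per_kind, rng)
    print(dict(counts))
    print(heteroplasmic_percentage(counts, n_kinds * cells_per_kind))
```

Averaging `heteroplasmic_percentage` over repeated trials for each value of `n_kinds` would give a curve of the kind plotted in Fig. 2b, under the assumptions stated above.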

Protoplast preparation and mutagenesis

The parent strains were cultured in PDA broth containing 20 % (v/v) sea water, and the mycelia were collected and washed by filtration. To prepare protoplasts, the mycelia were resuspended in a mixture of 1 % lywallzyme (Guangdong Institute of Microbiology, Guangzhou, China), 0.5 % lysozyme (Sigma, St. Louis, USA), and 1.0 M sodium chloride as an osmotic stabilizer. After 2 h of incubation with gentle shaking (50 rpm), the protoplasts formed were collected, purified, and resuspended in stabilizer solution for further study. Mutagenesis was carried out as described by Khattab and Bazaraa (2005). UV irradiation was performed by exposing the protoplasts directly to UV light (15 W) at a distance of 20 cm for 2 min (Supplementary Table S1), and NTG mutagenesis was performed by treating the protoplasts with 140 μg/mL NTG (nitrosoguanidine) for 60 min. The mutated protoplast suspension was then diluted with 1.0 M sodium chloride and plated on regeneration PDA. The colonies arising from surviving cells were evaluated for DAM yield using the ATH method (Zhang et al. 2011). The mutants (M1 and M2, Supplementary Table S2 and S3) that showed higher levels of DAM than the wild type were preserved for parental library construction.

Genome shuffling of Phomopsis sp. A123

Genome shuffling was carried out according to previously reported methods with a few modifications (Dai and Copley 2004; Hatvani et al. 2006; Patnaik et al. 2002; Zhang et al. 2002). Eight starter strains (M1 and M2 strains in Supplementary Information) obtained from the NTG mutagenesis and the UV mutagenesis were individually inactivated before fusion (Supplementary Table S1). Specifically, four of the eight starter protoplasts were treated with UV (irradiation at a distance of 20 cm for 5 min), and the remaining four starter protoplasts were each treated with one of the following conditions: heat (51 °C, 8 min), ethanol (30 %, 30 min), nystatin (250 μg/mL, 30 min), or iodoacetic acid (0.5 %, 30 min). Equal amounts of the eight inactivated protoplasts were mixed, and the mixture was centrifuged at 4000g for 5 min and resuspended in 5 mL stabilizer solution containing 40 % PEG (MW 6000) and 25 mM CaCl2. The mixture was then incubated at 30 °C for 15 min and spread on regeneration plates containing 1.0 M NaCl. After incubation at 28 °C for 4 days, colonies that appeared on the plates were collected, and the DAM yield was evaluated using the ATH method (Zhang et al. 2011) to obtain the G1 strains (Supplementary Table S4). The high-yield mutants were collected, and eight of these were chosen for the second round of genome shuffling to obtain the G2 strains (Supplementary Table S5 and S6).

Results

Simulation model for genome shuffling

Protoplast fusion, a crucial step in genome shuffling, can occur when cells adhere to each other tightly after breakdown of the cell membranes (Chen and Olson 2005). In theory, heteroplasmic fusion is favored over homoplasmic fusion when the minimum distance between a given protoplast and the surrounding protoplasts from different parents is smaller than that between the protoplast and its siblings. Based on this principle, we employed Monte Carlo simulation to decide how many kinds of parental strains we should adopt to achieve the maximal number of useful fusions in genome shuffling. The basic idea is described in the section "Simulation of cell fusion." The percentage of fusions was estimated for the given total number of cells \( N_t \) and number of kinds n, and the results are shown in Table 1 and Fig. 2. As expected, the simulation confirmed that the frequency of heteroplasmic fusions is very low when a small number of cell types is submitted, and that the rate of fusion rises gradually as the number of different parental kinds increases. However, more is not always better. The simulation predicted that the ideal number of parental kinds is between 8 and 12, a range within which the frequency of heteroplasmic fusions is as high as 8 %. Multi-parental protoplast fusion is a time-consuming and laborious process, and the more kinds of parental protoplasts are employed, the more difficult the process becomes. We therefore adopted eight as the optimal number of parents for protoplast fusion in the following experiments.

Table 1 Frequencies of heteroplasmic fusions among different types of parental strains obtained through Monte Carlo simulation

Generation of starter strains for genome shuffling

Genome shuffling is a relatively new tool for amplifying the genetic diversity within a selected mutant population through recursive genetic recombination. The success of this approach largely depends on a genetically diverse parental library with a favorable phenotype compared with the original starter strain. In this study, we used NTG and UV mutagenesis to generate the initial pool of Phomopsis sp. strains with an enhanced DAM yield (Supplementary Table S2 and S3). We screened a large number of mutants (1970 strains) using the ATH method and obtained a pool of DAM-enhanced strains after two rounds of random mutagenesis. Based on the simulation described above, eight of these strains were selected as the starter strains for the first run of genome shuffling: three strains (642, 770, and 823) from NTG mutation (M1 series) and five strains (37, 268, 274, 465, and 819) from NTG + UV mutation (M2 series) (Supplementary Table S2 and S3). In addition, even with a rate of heteroplasmic fusion as high as 8 % under the optimal condition predicted by the simulation (Table 1 and Fig. 2b), about 92 % of the parental protoplasts would still remain unfused or undergo homoplasmic fusion, which would certainly increase the difficulty of the subsequent screening process. To avoid this problem, we inactivated the starter protoplasts prior to shuffling using a variety of methods (Supplementary Table S1). Specifically, the eight selected starter strains (M1 series and M2 series) were treated with UV, heat, ethanol, nystatin, or iodoacetic acid. Each of the treatments was fatal to the protoplasts, as shown in Supplementary Table S1, and the protoplasts could regenerate only when fusion took place between protoplasts with different kinds of mutations, which presumably resulted in functional complementation upon fusion.

Meanwhile, because the conditions that prevented regeneration of the starter protoplasts were different, the genetic mutations in the eight starter strains were presumably different. Thus, any colony that could grow on the regeneration medium after genome shuffling would not only result from a heteroplasmic fusion (as homoplasmic fusion would be fatal) but would also have an increased genetic diversity, which is key to the high efficiency of this genome shuffling method for breeding the desired phenotype.

Genome shuffling of the endophytic fungus Phomopsis sp. A123

The starter protoplasts were prepared from mycelia of various strains of Phomopsis sp. Based on the simulation above, the eight parental protoplasts (strains 642, 770, and 823 from the M1 series and strains 37, 268, 274, 465, and 819 from the M2 series) were subjected to recursive protoplast fusion. After the first round of protoplast fusion, 664 colonies (G1 series) were obtained and screened for DAM production (Supplementary Table S4). Approximately a dozen of the colonies exhibited a clearly improved DAM yield, among which eight strains, G1-11, G1-55, G1-138, G1-145, G1-146, G1-295, G1-313, and G1-555, produced DAM at yields of 143, 86, 85, 120, 89, 91, 137, and 106 mg/L, respectively (Fig. 3a, Supplementary Table S6). These strains were collected and used as the parental pool for the second round of genome shuffling. About 1400 colonies (G2 series) were obtained from the second shuffled library and screened for DAM yield (Supplementary Table S5). Approximately two dozen colonies showed a further improved DAM yield, among which eight strains, G2-102, G2-119, G2-127, G2-448, G2-650, G2-866, G2-919, and G2-1008, gave yields of 141, 195, 147, 193, 113, 180, 219, and 120 mg/L, respectively (Fig. 3a, Supplementary Table S6). Overall, the yield of DAM in these strains was significantly improved after two rounds of genome shuffling compared with the starting wild-type strain A123 (0.8 mg/L) (Fig. 3a). The HPLC profiles of metabolites from a selected group of strains, including the initial strain (A123), an NTG + UV mutant (M2-37), and two genome shuffling strains (G1-11 and G2-919), are presented in Supplementary Fig. S1. The results showed that the DAM peak increased markedly in the genome shuffling strains, while the peaks for other metabolites changed little among the various strains.

To show the power of this optimized genome shuffling method, we compiled the DAM yields into a single diagram that included the yield-improved strains from three generations of Phomopsis sp. (Fig. 3b). While the classic NTG mutation could achieve some yield improvement, genome shuffling clearly produced a much higher number of high-yield strains and a greater extent of DAM improvement.
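The fold improvements implied by these yields can be checked with a short calculation. The sketch below is only a convenience for the reader: it uses the yields reported above for the eight selected G1 and G2 strains and the wild-type yield of 0.8 mg/L, and the names `fold_improvement` and `best_strain` are hypothetical helpers introduced for illustration.

```python
# Yields in mg/L as reported in the text for the selected strains.
WILD_TYPE_YIELD = 0.8  # strain A123

G1_YIELDS = {
    "G1-11": 143, "G1-55": 86, "G1-138": 85, "G1-145": 120,
    "G1-146": 89, "G1-295": 91, "G1-313": 137, "G1-555": 106,
}
G2_YIELDS = {
    "G2-102": 141, "G2-119": 195, "G2-127": 147, "G2-448": 193,
    "G2-650": 113, "G2-866": 180, "G2-919": 219, "G2-1008": 120,
}


def fold_improvement(yield_mg_per_l, baseline=WILD_TYPE_YIELD):
    """Ratio of a strain's DAM yield to the wild-type yield."""
    return yield_mg_per_l / baseline


def best_strain(yields):
    """Return (name, fold improvement) of the highest-yielding strain."""
    name = max(yields, key=yields.get)
    return name, fold_improvement(yields[name])


if __name__ == "__main__":
    for label, yields in (("G1", G1_YIELDS), ("G2", G2_YIELDS)):
        name, fold = best_strain(yields)
        print(f"{label}: best strain {name}, about {fold:.0f}-fold over wild type")
```

For the best G2 strain, G2-919, the ratio is 219 / 0.8, or roughly 274-fold over the wild type.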

Fig. 3 Validation of the simulation-based genome shuffling method using the endophytic fungus Phomopsis sp. as the test organism. a Yield improvement of the antitumor natural product deacetylmycoepoxydiene (DAM) through chemical-physical mutagenesis and genome shuffling. Strain A123 (wild type) was used as the starter strain for DAM yield improvement, the M1 series and M2 series were the first and second generation of NTG/UV mutants, and the G1 series and G2 series were the first and second generation of genome shuffling mutants. b An overview of the DAM productivity in three pools, including the NTG mutants, the first and the second genome shuffling strains which contained 1200 individuals

Discussion

Genome shuffling is a powerful tool for the rapid breeding of microbial strains toward desirable phenotypes. However, no theoretical study had been carried out to predict the optimal number of parental strains in the starting library for highly efficient genome shuffling. In this study, we used Monte Carlo simulation to predict the optimal range of parental protoplasts for maximal heteroplasmic fusion. The procedure of genome shuffling contains two steps: the first is to obtain starter strains with the desired phenotype through random mutagenesis and rapid screening, and the second is to shuffle the genomes in the pooled population by homologous recombination using recursive protoplast fusion. Genome shuffling thus accelerates directed evolution through recursive recombination of improved progeny. The role of the first step is to produce genetic diversity by random mutation, while the protoplast fusion in the second step recombines this beneficial diversity into one whole genome to accumulate the useful mutations (Biot-Pelletier and Martin 2014). The number of parental strains used for shuffling is therefore a crucial factor that can dramatically affect the result of genome shuffling. Previously, Zhang and coworkers introduced 11 parents for protoplast fusion and obtained a desirable result (Zhang et al. 2002). Hida et al. (2007) adopted four parents and saw no improvement during the first three rounds of genome shuffling, but obtained a 30 % improvement of the desired phenotype after the fourth round of shuffling. The difference between these two previous studies could be explained by our simulation model (Table 1 and Fig. 2). Our simulation showed that the ideal number of parents for genome shuffling is in the range of 8–12 different kinds of protoplasts. In the study of Zhang et al. (2002), 11 parents were used, which is within the optimal range of our simulation, and indeed, a rapid improvement of the desired property was observed.

On the other hand, only four parental strains were used in the study of Hida et al. (2007), which is outside the optimal range, and indeed, the desired improvement was not obtained initially. However, the effect of shuffling could be cumulative, so once three rounds of shuffling had been completed (3 × 4 = 12 parents), Hida et al. (2007) obtained improvement of the target product.

To validate the simulation, we chose Phomopsis sp. A123 as a test system and demonstrated the effectiveness of the optimized genome shuffling in rapidly increasing the yield of the antitumor natural product DAM. Phomopsis sp. A123 is an endophytic fungus associated with mangrove plants. Fungal endophytes represent a largely untapped resource for drug discovery. However, many of these fungi are refractory to direct genetic engineering using the tools of molecular biology. For example, our attempts to study DAM biosynthesis by directly manipulating the putative DAM biosynthetic genes in Phomopsis sp. A123 have failed so far (Xie et al. 2011). The optimized genome shuffling described here provides a powerful alternative that can dramatically increase DAM production in this endophytic fungus. It took only two runs of shuffling to drive the DAM yield from 0.8 up to 219 mg/L, an increase of over 270-fold. This result is significant for two reasons. First, DAM is an antitumor natural product with a chemical structure distinct from existing anticancer drugs; second, the drastic increase in DAM was achieved in an endophytic fungus for which no other effective way of manipulating DAM biosynthesis is currently available. We believe our findings from the simulation and the optimized genome shuffling of Phomopsis sp. will have a broad impact on the exploitation of the vast resource of endophytic fungi.

One distinct aspect of genome shuffling, compared to traditional breeding methods, is its efficiency. The clear superiority in DAM yield improvement of the strains resulting from genome shuffling over the strains resulting from NTG mutation attests to the power of genome shuffling (Fig. 3b). The DAM yield of the most desirable strains in each generation increased progressively while the by-products decreased at the same time. This suggests that the shuffling process might also have affected the metabolic network across the whole genome, so that the flow to the desired compound increased while the flow to the undesired ones decreased accordingly. The initial A123 strain appeared to produce many different kinds of metabolites, and DAM was only a minor component of the total metabolites, probably because of competition among many metabolic pathways (Fig. 4a). The NTG/UV mutations and genome shuffling gradually drove the metabolic flows toward the DAM biosynthetic pathway, while the other pathways might have been weakened or blocked accordingly after the processes of mutation and shuffling (Fig. 4b–d).

Fig. 4 Schematic illustration of a possible process that led to DAM yield improvement in Phomopsis sp. through NTG/UV mutations followed by genome shuffling. a, b, c, and d depict four possible stages involving the initial strain (A123), the NTG/UV mutants, and the fusants of the first and the second genome shuffling, respectively. The arrows represent hypothetical metabolic flows, and P1, P2, P3, and P4 represent different by-products