Introduction

Point mutation has historically been regarded as a dominant factor in the evolution of the DNA sequences of clonal organisms such as bacteria. However, there is growing evidence that local recombination, which occurs within a short stretch of a gene, can play an important role in maintaining bacterial genetic variability (Smith et al. 1991). Such recombination events can take place following the lateral transfer of gene segments via transduction, transformation, or conjugation, events that can occur not only within species but also between closely or even distantly related species (Smith et al. 1991; Milkman and McKane 1995; Ochman et al. 2000). Given that homologous recombination is highly dependent on sequence similarity (Vulic et al. 1999), recombination is expected to occur within species much more frequently than between species. Previous sequence analyses of bacterial genes suggest that the recombination rate varies with gene function. In particular, the highest recombination rates have been observed in genes implicated in pathogenicity or virulence, probably owing to diversifying selection (Li et al. 1994; McGraw et al. 1999). However, several studies have indicated that in some cases housekeeping genes can also display relatively high recombination rates (Nelson and Selander 1994; Fell et al. 1996; Zhou et al. 1997).

Despite numerous examples of intragenic recombination in bacteria, few cases of recombination have been documented for cyanobacterial genes. For multicellular cyanobacteria, recombination has been suggested to have occurred within a part of the phycocyanin operon (PC-IGS) of Arthrospira (Manen and Falquet 2002), Anabaena, Aphanizomenon, and Nodularia (Janson and Granéli 2002). Possible recombination tracts have also been found in Nostoc within the locus coding two subunits of Rubisco (rbcLX) (Rudi et al. 1998). However, the occurrence of intragenic recombination between different strains has never been substantiated in unicellular cyanobacteria such as Microcystis.

The cyanobacterial genus Microcystis is commonly associated with water blooms all over the world. Five Microcystis species (M. aeruginosa, M. ichthyoblabe, M. novacekii, M. viridis, and M. wesenbergii) have generally been recognized as the dominant species. These species were defined solely on the basis of morphological characters (e.g., cell size, colony form, sheath characteristics); that is, all five of these Microcystis species are “morphospecies,” whose distinction has no ecological or genealogical basis. Morphological characters of Microcystis, however, are highly variable and sometimes overlap, depending on the culture conditions (Otsuka et al. 2000). The low sequence divergence of 16S rDNA within and between morphospecies (<0.7% [Otsuka et al. 1998]) and the lack of correspondence between the morphospecies and sequence clusters in the 16S–23S rDNA intergenic spacer (Otsuka et al. 1999) suggest that the species definitions of Microcystis are invalid. Based on these results and the high DNA–DNA reassociation values between the five morphospecies of Microcystis (>70%, which is high enough for them to be classified within a single bacterial species [Wayne et al. 1987]), Otsuka et al. (2001) consolidated the five Microcystis species into a single species, M. aeruginosa.

In the present study, we analyze the microcystin synthetase (mcy) gene cluster of Microcystis spp., including four morphospecies and one unidentified species of Microcystis. The products of the mcy gene cluster have a combinational structure of polyketide synthase (PKS) and nonribosomal polypeptide synthetase (NRPS) (Nishizawa et al. 1999, 2000; Tillett et al. 2000), and catalyze the nonribosomal synthesis of microcystins, a variety of cyclic heptapeptides whose biological function is unknown but which are known to be toxic to humans and other animals. This gene cluster is located on the chromosome and contains 10 genes (mcyA–J; Fig. 1A); mcyA–G and J encode “modules” of PKS/NRPS (involved in polyketide/polypeptide chain elongation [reviewed by Cane et al. 1998]) and additional accessory domains (catalyzing modification of polyketide/polypeptide chains), whereas mcyH encodes a putative ABC transporter that is speculated to function in the extracellular or subcellular transportation of microcystins. Preliminary sequence analyses have indicated that there is a certain level of genetic variation among mcy genes of Microcystis spp. Here, we investigate the genetic basis for the maintenance of mcy gene variation, focusing on the possibility of recombination. We have sequenced and analyzed four selected regions located at known distances from each other within the mcy gene cluster. Our results strongly suggest the occurrence of interstrain recombination in the mcy gene cluster within Microcystis spp.

Figure 1

A Organization of the microcystin synthetase (mcy) gene cluster based on Tillett et al. (2000). The 5′ upstream region of mcyJ was identified as dnaN, a putative homologue of the DNA polymerase III β subunit of Synechocystis sp., whereas the 3′ downstream region contains six open reading frames (uma1–6) with unknown function. B Multidomain structure of the four selected mcy genes. Arrows indicate the positions of the PCR and sequencing primers. Abbreviations for each domain are as follows: A, adenylate-forming domain; ACP, acyl carrier protein; AT, acyltransferase; C, condensation domain; CM, C-methyltransferase; DH, dehydratase; KR, ketoreductase; KS, ketosynthase; NMT, N-methyltransferase; OM, O-methyltransferase; PCP, peptidyl carrier protein.

Materials and Methods

Strains, Culture, and DNA Extraction

The bacterial strains used in this study are listed in Table 1. All strains of Microcystis obtained from the NIES (National Institute for Environmental Studies, Japan) were cultured according to the recommendations described in the NIES strain list (Watanabe et al. 2000). Other strains were cultured in MA medium (as described in the NIES strain list) under the same conditions as for the NIES strains. DNA extractions were performed following a published protocol (Neilan et al. 1995) with two slight modifications: harvested cells were initially incubated for 30 min in a saturated sodium iodide solution to disrupt the cell wall, and ethyl alcohol was used in the final step of the procedure to precipitate the genomic DNA.

Table 1 Strains and determined sequences used in this study

PCR Amplification and Sequencing

All of the PCR primers used in this study are shown in Fig. 1B. Three sets of primers, which are directed to partial regions of the first dehydratase domain of mcyD (the mcyD gene contains two dehydratase domains; see Fig. 1B), the adenylate-forming domain of mcyG, and the O-methyltransferase of mcyJ, were designed based on the published mcy gene sequence of Microcystis aeruginosa PCC7806 (Tillett et al. 2000). All three sets of primers are expected to amplify fragments of ∼550 bp, a length that easily accommodates full-length sequencing in both directions. In addition, published primers for mcyA (Tillett et al. 2001) were used to amplify this gene. Additional internal primers were designed for the sequencing of mcyA (Fig. 1B). The PCR reactions (25 μl) included 2.5 μg of genomic DNA, 2.5 μl of 10× PCR buffer, a 0.2 mM concentration of each dNTP, a 2.5 μM concentration of each primer, 4% dimethyl sulfoxide, and 0.3 μl of ExTaq DNA polymerase (Takara, Shiga, Japan). All PCR reactions included an initial denaturation step of 3 min at 94°C, followed by 40 cycles at 94°C for 1 min, 50°C for 1 min, and 72°C for 1 min. When minor amplified bands (which might complicate sequencing) occurred, annealing temperatures were raised to a maximum of 63°C. Amplicons were purified using a Suprec-02 spin column (Takara) and were sequenced directly without cloning. Sequencing reactions were performed using ABI 310 automated sequencers and DYEnamic ET terminator cycle sequencing (Amersham Biosciences, Piscataway, NJ) or BigDye terminator (Perkin-Elmer, Foster City, CA) kits. Sequences have been deposited in the DDBJ (DNA Data Bank of Japan) nucleotide libraries (http://www.ddbj.nig.ac.jp/) under accession numbers AB110103–110146 (Table 1).

Sequence and Phylogenetic Analyses

The determined sequences of mcyA were aligned with those of strains with mcyA genes described by Tillett et al. (2001). Our mcyD, G, and J sequences were aligned with those of Microcystis aeruginosa PCC7806 (Tillett et al. 2000) by using CLUSTAL W version 1.84 (Thompson et al. 1994). Independent gene genealogies were inferred by split decomposition analysis (Huson 1998) using the online version of SplitsTree version 2 (http://bibiserv.techfak.uni-bielefeld.de/splits/) based on Hamming distance. (Note that split decomposition analysis can also detect possible recombination tracts when a recombination event breaks linkage associations among sites.) Using PAML version 3.13a (Yang 1997), rates of synonymous (dS) and nonsynonymous (dN) substitutions were estimated from pairwise comparisons based on the maximum likelihood (ML) method of Goldman and Yang (1994), in which transition–transversion (TS–TV) bias, unequal base frequencies, and codon bias were also taken into account, whereas one dS/dN ratio was assumed for all branches. Nucleotide diversities (Nei 1987) were calculated using DnaSP version 3.53 (Rozas and Rozas 1999).
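
As an illustration of the two simplest quantities mentioned above, the following minimal sketch computes a Hamming distance between aligned sequences and the nucleotide diversity of an alignment. It assumes that the Hamming distance is expressed as the proportion of differing sites and that nucleotide diversity is taken as the mean of all pairwise distances; the function names and the toy sequences are hypothetical and are not taken from SplitsTree or DnaSP, whose implementations may differ in detail (e.g., in the handling of gaps or ambiguous bases).

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean pairwise proportion of differing sites over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    return sum(hamming_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment (not data from this study).
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
print(hamming_distance(toy[0], toy[1]))
print(nucleotide_diversity(toy))
```

Multiplying either value by 100 gives a percentage difference of the kind reported for pairwise comparisons.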

Statistical Analyses for Detecting Recombination

Several statistical tests have been developed to detect intragenic recombination in a given alignment of genes, and the relative performance of these methods has been assessed using both simulated (Posada and Crandall 2001) and empirical data (Posada 2002). We used two tests that were highlighted by these authors as more effective in detecting recombination when sequence divergence is relatively high (>1%): the runs test implemented in the GENECONV program version 1.81 (Sawyer 1999) and RDP (Martin and Rybicki 2000). These two methods employ different strategies. The former test is based on substitution, whereas the latter test is based on phylogenetic discordance. The GENECONV program was used with all the default settings except that the mismatch penalty was set to 1 (Gscale = 1) and only silent sites were analyzed (-seqtype = SILENT). The “Global P-value” calculated from 10,000 random permutations of the alignment was used to assess the significance of any unusually long fragments detected, because this value is more conservative than the “pairwise P-value.” The RDP analysis was carried out with the window size set to 10 nucleotides and the user-defined cutoff value set to 0.05.

To assess the degree of congruency among the four gene genealogies, the incongruence length difference (ILD) test (Farris et al. 1994) was performed using PAUP* version 4.0b10 (Swofford 1999). In this test, we randomized the polymorphic sites over loci without replacement and summed the resulting tree lengths. This process was repeated 10,000 times, and then the sum of the four original MP tree lengths was compared to those of the 10,000 sets of trees that were generated from the randomized data. If the four genealogies are incongruent, the observed tree length should be much shorter than those of the simulated trees, because the swapping of sites among incongruent datasets yields homoplasy. To avoid biased sampling from the genes with more informative sites, analyses in which sites were weighted reciprocally to the total number of informative sites were also performed (hereafter designated “weighted ILD tests”).
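
The logic of the randomization described above can be sketched in a few lines. The example below is only a toy illustration under strong assumptions and is not the PAUP* implementation: it is restricted to four taxa so that the most parsimonious tree can be found by scoring all three unrooted topologies with Fitch parsimony, it uses unweighted sites, and it computes the P value as (hits + 1)/(replicates + 1), a common permutation-test convention. All names (`QUARTETS`, `mp_length`, `ild_test`) and the site patterns are hypothetical.

```python
import random

# The three unrooted topologies for four taxa, written as two sister pairs.
QUARTETS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def fitch_join(x, y):
    """Fitch step: return (state set, cost) for the parent of two nodes."""
    common = x & y
    return (common, 0) if common else (x | y, 1)


def site_length(site, topology):
    """Parsimony length of one site (a 4-character string) on one quartet."""
    (a, b), (c, d) = topology
    left, cost_l = fitch_join({site[a]}, {site[b]})
    right, cost_r = fitch_join({site[c]}, {site[d]})
    _, cost_root = fitch_join(left, right)
    return cost_l + cost_r + cost_root


def mp_length(sites):
    """Length of the most parsimonious quartet tree for a list of sites."""
    return min(sum(site_length(s, t) for s in sites) for t in QUARTETS)


def ild_test(partitions, n_reps=999, seed=0):
    """Return (observed summed MP length, permutation P value)."""
    rng = random.Random(seed)
    observed = sum(mp_length(p) for p in partitions)
    pooled = [s for p in partitions for s in p]
    sizes = [len(p) for p in partitions]
    hits = 0
    for _ in range(n_reps):
        rng.shuffle(pooled)  # reassign sites to loci without replacement
        total, start = 0, 0
        for size in sizes:
            total += mp_length(pooled[start:start + size])
            start += size
        if total <= observed:  # randomized data as short as the original
            hits += 1
    return observed, (hits + 1) / (n_reps + 1)


# Two hypothetical loci supporting conflicting groupings of four strains.
conflicting = [["AACC"] * 5, ["ACAC"] * 5]
print(ild_test(conflicting))
```

When the loci conflict, shuffling sites between them forces extra steps onto every tree, so the randomized sums are rarely as short as the observed sum and the P value is small; when the loci agree, shuffling changes nothing and the P value approaches 1.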

Results

Sequence Analysis

The published primers for the N-methyltransferase domain of mcyA (Tillett et al. 2001) and our newly designed primers for the segments of three genes (mcyD, G, and J) led to the successful amplification of the corresponding gene fragments from all strains tested. There were no insertions or deletions in the amplified segments, so we could align these sequences easily and without ambiguity. The average pairwise percentage nucleotide differences of the mcyA, D, G, and J segments were 2.79, 1.70, 1.66, and 1.32%, respectively (only one sequence variant was included), with maxima of 5.87, 3.09, 4.73, and 2.53%, respectively.

Detecting Recombination Within the Microcystin Synthetase (mcy) Gene Cluster

The result of the split decomposition analysis of mcyA genes (Fig. 2) indicated a reticulate phylogeny, itself suggesting a history of recombination. The split decomposition analyses of the remaining three mcy genes are shown in Fig. 3. The mcyD and J phylogenies yielded tree-like structures. Conversely, the mcyG phylogeny produced a networked structure, suggestive of recombination. However, a networked phylogeny revealed in a splitgraph does not necessarily imply the presence of recombination, because homoplasy can also produce a similarly networked structure. In addition, recombination that makes sequences identical at sites within the recombined genes does not manifest itself in the splitgraph. We therefore used the GENECONV and RDP programs to further examine the possibility of recombination within these genes. These programs can also identify the possible breakpoints of a recombinant gene. The runs test implemented in GENECONV identified several possible recombination tracts within mcyA (Table 2) but not within the mcyD, G, and J genes. The RDP analysis also identified three possible recombinational events within mcyA (Table 3), whereas no recombination was detected in the other three mcy genes. The recombined regions identified by the two different programs span 150–710 bp (Tables 2 and 3). Given that the RDP analysis, which is based on phylogenetic discordance, also failed to detect any tract of recombination within mcyG, the networked phylogeny of mcyG recovered by the split decomposition analysis is most probably attributable to homoplasy. It should also be noted that no correlation between morphospecies and phylogeny is observed in any of the four splitgraphs (Figs. 2 and 3).

Table 2 Results of GENECONV analysis of mcyA
Table 3 Results of RDP analysis of mcyA
Figure 2

Split decomposition analysis of mcyA based on Hamming distance. Morphospecies are indicated as follows: open triangle, M. aeruginosa; filled diamond, M. novacekii; filled circle, M. viridis; filled triangle, M. wesenbergii. Superscript indicates unidentified strains (M. sp.).

Figure 3

Split decomposition analysis of mcyD, G, and J based on hamming distance. Asterisks indicate strain T20-3, which exhibits apparent phylogenetic conflicts among these three mcy gene genealogies. Morphospecies indications are as in Fig. 2.

The results of the independent split decomposition analyses indicated phylogenetic discordance (Fig. 3). One of the most prominent cases is strain T20-3, which is more closely related to PCC7806 and PCC7941 in the mcyD and mcyJ splitgraphs but more distantly related to these two strains in the mcyG splitgraph. To further assess the phylogenetic discordance among the mcy genes, we performed an ILD test using the four mcy genes from 12 strains. Of the 550, 549, and 552 sites determined for mcyD, mcyG, and mcyJ, respectively, 27, 27, and 19 sites are polymorphic, and of these, 18, 25, and 14 sites are parsimony-informative. The mcyA alignment restricted to the 12 strains for which mcyD, G, and J were available contained 80 polymorphic sites, 43 of which are parsimony-informative. Before conducting the ILD test, we used each MP tree to exclude all homoplastic sites from all four genes; this process left 20, 8, 16, and 6 informative sites for mcyA, D, G, and J, respectively. This procedure ensured that the incongruity indicated by the ILD tests did not result from biased sampling of homoplastic sites. The unweighted ILD test significantly rejected the null hypothesis of the congruency of the four mcy gene genealogies (P < 0.0001; Fig. 4). The weighted ILD test also rejected the clonal hypothesis of mcy genes (P < 0.0001). Both weighted and unweighted tests including all polymorphic characters (but not homoplastic sites) gave the same results. These results provide strong evidence for recombination within the mcy gene cluster. To investigate whether the detected recombination was simply attributable to the presence of the single anomalous sequence of strain T20-3, we performed an ILD test excluding those data. The weighted ILD test included 38 informative sites for 11 strains. The results still showed a significant deviation from the clonal hypothesis (P < 0.0001), indicating that there are further recombination tracts within the mcy gene cluster other than that of strain T20-3.
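
The distinction between polymorphic and parsimony-informative sites used in the counts above can be made concrete with a short sketch. It assumes the usual definitions (a polymorphic site shows at least two different states; a parsimony-informative site shows at least two states that each occur in at least two sequences) and an alignment without gaps; the function name and the toy alignment are hypothetical and are not data from this study.

```python
from collections import Counter


def classify_sites(alignment):
    """Return (number of polymorphic sites, number of parsimony-informative sites)."""
    n_polymorphic = n_informative = 0
    for column in zip(*alignment):  # iterate over alignment columns
        counts = Counter(column)
        if len(counts) > 1:
            n_polymorphic += 1
            # Informative: at least two states, each seen in >= 2 sequences.
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                n_informative += 1
    return n_polymorphic, n_informative


# Hypothetical toy alignment of four sequences and five sites.
toy = ["ACGTA", "ACGTA", "ATGTC", "ATGAC"]
print(classify_sites(toy))  # (3, 2)
```

In the toy alignment, the fourth site is polymorphic but not informative because its variant state occurs in a single sequence only.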

Figure 4

Result of incongruence length difference (ILD) test of the four mcy gene segments.

Discussion

The results of the split decomposition, GENECONV, and RDP analyses of the partial segment of the mcyA gene suggest that it has undergone recombination. The putative recombined regions of the mcyA genes span no more than 1000 nucleotides (Tables 2 and 3), which is consistent with the previous suggestion that bacterial intragenic recombination can occur within a few hundred base pairs (Smith 1995). In contrast, we could not detect any evidence of intragenic recombination within the mcyD, G, or J loci. This could be due to the small number of samples and the short sequence lengths included in this study. Natural Microcystis populations have probably undergone more intragenic recombination events than were detected in the present study, since recombination cannot be detected in the absence of genetic variation.

The incongruence of the four mcy gene genealogies indicated by the independent split decomposition analyses (Figs. 2 and 3) and ILD tests (Fig. 4) strongly suggests that recombination occurred not only within the mcyA gene, but also within the entire mcy gene cluster. Because Microcystis harbors nonribosomally synthesized polypeptides other than microcystins (Dittmann et al. 1997; Neilan et al. 1999), one hypothesis is that the observed discrepancy among the four mcy gene genealogies is explained by gene conversion between other polypeptide synthetase genes encoded in the same genome. However, we consider this to be unlikely, because the results of our PCR experiments and the subsequent direct sequencing identified only a single relevant allele for all four mcy loci. We can also exclude the possibility of gene conversion between the other dehydratase domains of the mcy gene cluster and our sequenced segment of mcyD, or between another adenylate domain within the mcy gene cluster and the segment of mcyG that we investigated, as both corresponding domains are highly divergent from each other. In addition, no Microcystis strains have yet been documented that have more than one copy of the mcy gene cluster. Therefore, the possibility of gene conversion among two or more copies of the mcy gene in the same genome is also unlikely. Taken together, the discordance among the four mcy gene phylogenies is explained most convincingly by past interstrain recombination within the entire mcy gene cluster.

Interstrain recombination probably occurs following the lateral transfer of mcy genes between different strains of Microcystis. Although the relatively frequent occurrence of the lateral transfer of mcy genes has previously been suggested (Tillett et al. 2001), the underlying genetic mechanism is not currently known. Interestingly, pMa025, a plasmid recently identified in M. aeruginosa, was shown to possess a sequence similar to bacterial PKS, although it lacked sequences similar to mcy genes (Wallace et al. 2002). We speculate that some undiscovered plasmids or phages might mediate genetic exchange between different strains of Microcystis. If transformation is really responsible for the uptake of foreign genes in Microcystis, as it is in some other bacteria, the occurrence of recombination within mcy genes might not be so frequent, but rather more incidental or episodic in the evolutionary history of Microcystis. This is because most strains of Microcystis appear to be either naturally untransformable or difficult to transform due to the presence of extracellular nuclease activity that could disturb the uptake of foreign DNA (Takahashi et al. 1996; Dittmann et al. 1997). This view is consistent with the relatively low number of apparent recombination tracts identified within the mcy genes in the present study (Tables 2 and 3). Clearly, the genetic mechanism underlying the natural competence of Microcystis should be clarified to address these issues.

It may be of interest to review the taxonomic treatment of Microcystis by Otsuka et al. (2001) in light of the presence of sex, that is, under the biological species concept. The nonmonophyletic nature of each morphospecies and their discordant placement in the four splitgraphs (Figs. 2 and 3) strongly suggest that recombination has occurred between different morphospecies. The presence of genetic exchange among different morphospecies of Microcystis is consistent with Microcystis being a single “biological” species. However, it is possible that recombination has occurred exclusively within the mcy gene cluster. Linkage disequilibrium analyses of multiple housekeeping loci should further illuminate the species status of Microcystis spp.

To date, more than 60 variants of microcystins in which two amino acids are replaced have been identified (Sivonen 1996). (Variants of microcystins are designated microcystin-“XZ,” where X and Z are the variable amino acids at the second and fourth positions of the cyclic heptapeptides.) Several studies have suggested that the production of such variation within the microcystins is attributable to the catalytic plasticity of the module rather than to differences in the primary structure of the module itself (Dittmann et al. 1997; Nishizawa et al. 1999, 2000; Tillett et al. 2000; Kurmayer et al. 2002). However, the conserved nature of each domain of the mcy genes undoubtedly increases the chance of recombination even between modules with differing substrate specificity, which could lead to the production of a novel microcystin variant. In fact, Mikalsen et al. (2003) recently reported that the gene conversion of the first adenylate domain of mcyB for the mcyC-like adenylate domain led to the exclusive production of microcystin-RR and its derived variants (an adenylate domain is involved in the recognition and activation of a specific amino acid). Mikalsen et al. (2003) studied only a limited number of strains, so it seems likely that other strains that carry the non-mcyC-like mcyB adenylate domain and produce microcystin-RR are also present. In any event, further investigation with many more strains is needed to assess the contribution of recombination to the production of specific microcystin variants.

Recombination in bacterial genes has most frequently been detected when selection has subsequently caused the resultant strains to diverge. However, there is no evidence that variation within mcy genes or in microcystin content confers selective advantages in Microcystis, suggesting that mcy genes are not under diversifying selection. In support of this view, the average numbers of pairwise synonymous substitutions per synonymous site (dS) of mcy genes are somewhat higher than those of nonsynonymous substitutions (dN) (Table 4; P < 0.001, F-test), suggesting that mcy gene products are subjected to purifying selection. Interestingly, mcy gene knockout mutants have been shown to be viable and to display no obvious phenotypic defects compared to wild strains (Dittmann et al. 1997; Nishizawa et al. 1999, 2000; Tillett et al. 2000). These two superficially incongruent observations suggest that the possession of mcy genes might be favorable under certain environmental conditions in a strain-specific manner.

Table 4 Substitutional patterns of mcy genes

In conclusion, our phylogenetic and statistical analyses suggest that the mcy gene cluster shows a mosaic structure that is probably due to intraspecific recombination, and that even small segments (∼1000 bp) of a single mcy gene could have undergone recombinational replacement. This study, for the first time, substantiates the occurrence of recombination between different strains of Microcystis spp. and of interstrain recombination of bacterial NRPS/PKS genes. Our study also indicates that the evolutionary history of Microcystis genes follows a reticulate rather than a tree-like bifurcating pattern, suggesting that phylogenetic and taxonomic studies of Microcystis spp. should take into account the mosaic nature of Microcystis genes.