Introduction

The endosymbiotic theory of the origin of plastids (dating from Mereschkovsky 1905) is now widely accepted, mainly on the basis of phylogenetic analyses of plastid and cyanobacterial genomes (for a historical overview and discussion, see Sapp 1994; Archibald 2015; Sato 2016). Various traits shared by plastids and cyanobacteria have been taken as evidence for the endosymbiotic theory, such as oxygenic photosynthesis, prokaryotic ribosomes, and prokaryotic RNA polymerase (for an overview, see Sato 2001; Moriyama and Sato 2014). In addition, the fact that the galactolipids monogalactosyl diacylglycerol and digalactosyl diacylglycerol are shared by plastids and cyanobacteria was also recognized as important support for the endosymbiotic origin of plastids. The complete elucidation of the biosynthetic pathways of galactolipids in both plastids and cyanobacteria, however, showed conclusively that this similarity of membrane lipids is only apparent, because entirely unrelated sets of enzymes operate in the two systems (Awai et al. 2014; Sato and Awai 2016).

Peptidoglycan is a complex polymer shared by cyanobacteria and the plastids of certain plants and algae (Vollmer and Seligman 2009; Takano and Takechi 2010). Cyanobacteria possess a clearly defined peptidoglycan layer, which appears as an electron-dense layer in electron microscopy. Historically, the plastids of Cyanophora paradoxa (glaucophyte) and the chromatophores of Paulinella chromatophora (rhizopod) have been regarded as true endosymbiotic cyanobacteria, and conventionally named “cyanelles” (see, for example, the classical textbook by Stanier et al. 1967), because they possess a clearly identifiable peptidoglycan layer (Kies 1974; Iino and Hashimoto 2003; Sato et al. 2009; Nowack and Grossman 2012). Homologs of the genes for the peptidoglycan synthesis enzymes, which we call “PG enzymes” in the present study, were identified in the moss Physcomitrella patens and the lycophyte Selaginella moellendorffii (Machida et al. 2006; Takano and Takechi 2010; Rensing et al. 2008; Banks et al. 2011). A recent genomic analysis identified a complete set of PG enzymes in some green algae such as Micromonas sp. CCMP1545 (van Baren et al. 2016), as well as in the charophyte Klebsormidium flaccidum (Hori et al. 2014). Quite recently, the presence of the complete set of PG enzymes was also suggested in the draft genome sequences of some gymnosperms, although the results were based on homology searches that detected only partial sequences for most components (Lin et al. 2017). In P. patens, the lack of putative peptidoglycan was found to result in defects in plastid division (Takano and Takechi 2010, and the references therein). In angiosperms such as Arabidopsis thaliana, only a limited number of PG enzymes have been found, and these components are believed to have functions other than the synthesis of peptidoglycan (Takano and Takechi 2010).
The existence of putative peptidoglycan in the moss, which was not visible by electron microscopy, was detected recently by a new technology called ‘click chemistry’, in which a modified exogenous substrate, ethynyl d-alanyl-d-alanine, was incorporated and converted to a fluorescent derivative in situ after integration into putative peptidoglycan (Hirano et al. 2016). The existence of “chloroplast peptidoglycan” is, therefore, becoming a reality, beyond the presence of homologs of PG enzymes.

Is the plastid peptidoglycan further evidence for the endosymbiotic origin of plastids? In other words, are the PG enzymes a heritage from the cyanobacterial endosymbiont? van Baren et al. (2016) provided an extensive table of homologs of PG enzymes in plants and algae. Researchers instinctively believe that the plastid peptidoglycan or PG enzymes originate from the endosymbiont and that the PG enzymes in P. chromatophora, C. paradoxa, and P. patens are orthologs. We questioned this naive belief, and started to analyze the origin of ten PG enzymes (Table 1). The phylogenetic tree of MurE in Lin et al. (2017), for example, indeed indicates the relationship of MurE of plants and algae with bacterial MurE, but to understand the origin(s) of PG enzymes, we need a large number of bacterial sequences that cover the whole spectrum of bacterial phyla having PG enzymes. The comparative genomic database Gclust (Sato 2009) provides a good platform for the search for the origins of PG enzymes. Exploiting the Gclust database, we performed detailed phylogenetic analysis, and found curious results. In the present article, we show that the origins of the PG enzymes are quite diverse. Some enzymes are certainly of endosymbiotic origin, but most others are definitely not. We present several hypotheses on the origins of PG enzymes in plants and green algae.

Table 1 List of peptidoglycan synthesis enzymes used in the present study

Methods

Sequence data

Representative enzymes for the peptidoglycan synthesis (“PG enzymes”) have been listed in Takano and Takechi (2010). We used the preformed homolog clusters in the Gclust database (Sato 2009), as available on the website http://gclust.c.u-tokyo.ac.jp (Dataset 2012-42), to find clusters including proteins corresponding to the PG enzymes. This dataset contained protein sequences of various photosynthetic organisms (cyanobacteria, photosynthetic bacteria, algae, and plants) as well as non-photosynthetic organisms (various bacteria, archaea, fungi and animals). We did not detect PG enzymes in archaea, fungi and animals, with a few exceptions. Many PG enzymes were easily detected in a single cluster (e.g., MurA in Cluster 971, MurB in Cluster 1,062, MurG in Cluster 1,033, Ddl in Cluster 485). The enzymes MurC and MurD were found mixed in Clusters 572, 2,176 and 19,584, and the enzymes MurE and MurF were found in a single cluster, Cluster 210. The enzyme MraY was found in Clusters 1,104 and 22,022. The PBPs (penicillin-binding proteins) of Class A (Sauvage et al. 2008) were found in a large cluster, Cluster 114. When an enzyme was found in multiple clusters, the cluster number was prefixed to the gene identifiers. The PG enzymes encoded by the Paulinella chromatophore genome (Nowack et al. 2008) were obtained from the CyanoClust database (Sasaki and Sato 2010) at http://cyanoclust.c.u-tokyo.ac.jp/ (Dataset Cyanoclust4). The original sources of the sequences are described on the respective websites.

Additional sequences were retrieved from the websites of genome projects: two Micromonas species (MicpuN3v2_GeneCatalog_proteins_20160404 for M. commoda RCC299, Worden et al. 2009, and MicpuC3v2_GeneCatalog_proteins_20160125 for M. pusilla CCMP1545; van Baren et al. 2016) from the Joint Genome Institute (http://jgi.gov/); hypothetical protein sequences (022111) of C. paradoxa from the Cyanophora Genome Project (Price et al. 2012; http://cyanophora.rutgers.edu/cyanophora/home.php).

The Ddl proteins of plants and certain algae have a duplicate structure. Except for the transit sequences, the N-terminal half and the C-terminal half were both homologous to the bacterial Ddl proteins. The C-terminal part was cleaved after the initial alignment, and both N-terminal and C-terminal domains were used for the phylogenetic analysis.
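The domain splitting described above can be illustrated with a minimal sketch. It assumes that the boundary between the two Ddl domains and the length of the transit sequence have already been read off the initial alignment; the function name, its parameters and the toy sequence are hypothetical and are not taken from the tools used in this study.

```python
from typing import Dict


def split_duplicated_domains(name: str, seq: str, split_at: int,
                             transit_len: int = 0) -> Dict[str, str]:
    """Split a two-domain protein into N- and C-terminal domain sequences.

    split_at    -- index of the first residue of the C-terminal domain
                   (assumed to be chosen by inspecting the alignment)
    transit_len -- number of N-terminal transit-sequence residues to drop
    """
    if not 0 <= transit_len < split_at < len(seq):
        raise ValueError("need 0 <= transit_len < split_at < len(seq)")
    return {
        f"{name}_N": seq[transit_len:split_at],  # N-terminal Ddl domain
        f"{name}_C": seq[split_at:],             # C-terminal Ddl domain
    }


# Toy example: 2 transit residues, then two 4-residue "domains".
domains = split_duplicated_domains("Ddl_toy", "TTAAAACCCC", split_at=6, transit_len=2)
print(domains)
```

Each returned entry can then be treated as an independent sequence in the subsequent alignment and tree calculation.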

Phylogenetic analysis

All protein sequences for each PG enzyme were aligned by the software Muscle version 3.8.31 (Edgar 2004). The alignment was visualized by the software Clustal X version 2 (Larkin et al. 2007). In most cases, distant sequences were removed, and ill-aligned N- and C-termini were trimmed by the “getclu” command of the software SISEQ (Sato 2000). Only the sites having gaps in less than 20% of the total sequences were used for the calculation (this was also done with the “gap 0.2” option of the “getclu” command). An initial phylogenetic tree was constructed by the maximum likelihood (ML) method using the software PhyML version 3 (Guindon et al. 2010) (options were: -d aa -m LG -s BEST -b -5). Then, very distant sequences were removed. MurC and MurD were split from the initial large combined tree (Clusters 572, 2,176 and 19,584), and MurE and MurF were also separated based on the initial large tree of Cluster 210. The splitting point in each tree was used as the root in the subsequent phylogenetic analysis. All other trees were unrooted.
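The gap criterion above (keeping only sites with gaps in less than 20% of the sequences) can be sketched as follows. This is an illustration of the rule only, not the SISEQ implementation; the function name, the dictionary representation of the alignment and the toy sequences are assumptions made for this example.

```python
from typing import Dict, List

GAP_CHARS = {"-", "."}


def filter_gappy_columns(alignment: Dict[str, str],
                         max_gap_fraction: float = 0.2) -> Dict[str, str]:
    """Keep only columns whose gap fraction is strictly below max_gap_fraction."""
    if not alignment:
        return {}
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(alignment)
    n_cols = lengths.pop()

    keep: List[int] = []
    for col in range(n_cols):
        gaps = sum(1 for seq in alignment.values() if seq[col] in GAP_CHARS)
        if gaps / n_seqs < max_gap_fraction:
            keep.append(col)

    # Rebuild every sequence from the retained columns only.
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


# Toy alignment: column 2 has 1 gap in 6 sequences (kept),
# column 4 has 2 gaps in 6 sequences (removed).
toy = {
    "s1": "MK-LA", "s2": "MKTLA", "s3": "MKTLA",
    "s4": "MKTLA", "s5": "MKTL-", "s6": "MKTL-",
}
print(filter_gappy_columns(toy))
```

Whether the threshold is treated as strict or inclusive at exactly 20% is an assumption here; the text specifies only "less than 20%".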

Bayesian Inference (BI) analysis was then performed using the software MrBayes version 3.2.6 (Ronquist et al. 2012). Parallel processing was performed on Linux workstations and a Mac Pro in the UNIX environment. WAG and LG models were used in both PhyML and MrBayes, but the results were not very different. Only the results with the LG model are presented in the current study. Other parameters in MrBayes were: rates = invgamma (in some cases, gamma), ratepr = variable, ngen = 2,000,000 (up to 12,000,000). samplefreq and burnin were appropriately set depending on the value of ngen.
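As an illustration of how the parameters listed above fit together, the following sketch assembles a MrBayes command block as text. It is not the script used in this study. The values of samplefreq and burnin are not given in the text, so they are left as required arguments, and the numbers in the usage example are placeholders. The use of `prset aamodelpr=fixed(lg)` for the LG model and of `sumt relburnin=no burnin=...` is an assumption about how the settings would be written.

```python
def mrbayes_block(ngen: int, samplefreq: int, burnin: int,
                  rates: str = "invgamma") -> str:
    """Return a MrBayes command block for a fixed-LG protein analysis.

    samplefreq and burnin are not specified in the text; the caller
    must choose them according to ngen.
    """
    if rates not in ("invgamma", "gamma"):
        raise ValueError("rates must be 'invgamma' or 'gamma'")
    if ngen <= 0 or samplefreq <= 0 or burnin < 0:
        raise ValueError("ngen and samplefreq must be positive, burnin non-negative")
    lines = [
        "begin mrbayes;",
        "  prset aamodelpr=fixed(lg);",   # LG amino-acid model (assumed syntax)
        f"  lset rates={rates};",          # invgamma, or gamma in some cases
        "  prset ratepr=variable;",
        f"  mcmc ngen={ngen} samplefreq={samplefreq};",
        f"  sumt relburnin=no burnin={burnin};",
        "end;",
    ]
    return "\n".join(lines)


# Placeholder values for samplefreq and burnin (hypothetical).
print(mrbayes_block(ngen=2_000_000, samplefreq=1000, burnin=500))
```

Keeping the run settings in one function makes it easy to repeat the analysis with a larger ngen (up to 12,000,000 in this study) while adjusting samplefreq and burnin together.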

In many cases, the initial BI analysis did not resolve all branches. Then, the ill-aligned N- and C-termini were trimmed. Distant sequences were again removed. Trimming, alignment and BI analysis were repeated to obtain reasonably branched trees. The resulting alignment was used to obtain a ML tree to compare with the BI tree. The graphical representation of the final trees was constructed by the software FigTree version 1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/), followed by decoration by Adobe Illustrator version CS6. Some metrics of phylogenetic calculations are given in Table 2.

Table 2 Metrics of phylogenetic analysis

In the main text, we will show only the collapsed version of phylogenetic trees for simplicity. The full versions of both ML and BI trees are presented in a Supplementary Figures package. The multiple alignments that were used for the phylogenetic calculations (after gap removal and trimming) are also available as a Supplementary Data package.

Results

Conservation of PG enzymes in plants and algae

Table 1 is a list of ten PG enzymes, with their conservation profiles in plants and algae analyzed in the present study. The occurrence of PG enzymes in all the species included in the Gclust database as well as Micromonas sp. CCMP1545 is shown. Although information on many other species was shown in recent publications (van Baren et al. 2016; Lin et al. 2017), not all genomes listed therein were available at the start of the study. We used A. thaliana and Vitis vinifera (as well as some additional plants included in the Gclust database) as representatives of angiosperms. The nuclear sequences of P. chromatophora are not available, but its plastid genome encodes nine out of ten PG enzymes. MurF might be encoded by the nuclear genome of P. chromatophora, but we were not able to analyze it. It was very difficult to estimate the presence of PG enzymes in C. paradoxa, because many of the homologs that were detected by the TBLASTN search were partial sequences. We used as many of them as possible, but we had to remove some very short or distant sequences. In the case of MurB, partial sequences covering different parts were detected in the protein sequences of C. paradoxa. We assembled them into a single (still partial) hypothetical sequence for use in phylogenetic analysis. In general, we were not able to obtain reliable results with the sequences of C. paradoxa, and they are shown as tentative results with the currently available data.

The list in Table 1 shows that, among the genomes analyzed, the complete set of PG enzymes is conserved in K. flaccidum, Micromonas sp., P. patens, and S. moellendorffii. As indicated by asterisks, various angiosperms including A. thaliana and V. vinifera contain MurE, MraY, MurG, and Ddl. PBP was not encoded by the nuclear genomes of plants and algae, other than C. paradoxa and the species in which all PG enzymes are conserved.

PG enzymes of probable endosymbiont origin

Figure 1 shows the phylogenetic tree of the first component of the peptidoglycan synthesis system, MurA. The MurA proteins of various bacterial phyla were clearly separated in the current phylogenetic analysis. The MurA proteins of plants and algae were all included within the clade of cyanobacterial MurA. MurA of P. chromatophora was closely related to the marine species of Synechococcus, namely, Synechococcus sp. PCC307 and Synechococcus sp. WH5701. In fact, the closest homologs of all the PG enzymes of P. chromatophora were those of these two cyanobacteria (except MurF, which was not encoded in the chromatophore genome). The MurA proteins of Viridiplantae (plants and green algae) were monophyletic, whereas the two MurA homologs in C. paradoxa seemed paraphyletic but closely related to the MurA of Viridiplantae.
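The notion of monophyly used in this and the following sections can be made concrete with a small sketch: a group of sequences is monophyletic in a rooted tree if some subtree contains exactly those sequences and no others. The nested-tuple tree representation, the function names and the taxon labels below are hypothetical and serve only as a toy example; they are not the trees or tools of this study.

```python
from typing import FrozenSet, Iterable, List, Tuple, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees


def clade_leaf_sets(tree: Tree) -> Tuple[FrozenSet[str], List[FrozenSet[str]]]:
    """Return the leaf set of `tree` and the leaf sets of all its subtrees."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves: FrozenSet[str] = frozenset()
    all_sets: List[FrozenSet[str]] = []
    for child in tree:
        child_leaves, child_sets = clade_leaf_sets(child)
        leaves |= child_leaves
        all_sets.extend(child_sets)
    all_sets.append(leaves)
    return leaves, all_sets


def is_monophyletic(tree: Tree, taxa: Iterable[str]) -> bool:
    """True if some subtree contains exactly the given taxa."""
    _, sets = clade_leaf_sets(tree)
    return frozenset(taxa) in sets


# Toy rooted tree with hypothetical labels.
toy_tree = ((("plant_A", "alga_B"), "cyano_1"), ("cyano_2", "proteo_1"))
print(is_monophyletic(toy_tree, {"plant_A", "alga_B"}))   # plant/alga pair
print(is_monophyletic(toy_tree, {"cyano_1", "cyano_2"}))  # the two cyano leaves
```

In the toy tree, the plant and algal leaves form a clade nested next to one cyanobacterial leaf, whereas the two cyanobacterial leaves do not form a clade by themselves. For unrooted trees, the corresponding test is whether the group is separated from all other sequences by a single branch.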

Fig. 1

Phylogenetic tree of MurA. Phyla are color-coded according to the color scheme in the FigTree software (RGB intensities are shown in brackets): green (or Spring) [0,255,0], land plants; SeaFoam [0,255,128], algae; red (or Maraschino) [255,0,0], α-proteobacteria; Tangerine [255,128,0], β-proteobacteria; grape [128,0,255], γ-proteobacteria; cyan (or Turquoise) [0,255,255], cyanobacteria; clover [0,128,0], green bacteria; midnight [0,0,128], Firmicutes; Aluminum [153,153,153], Archaea (very rare occurrence); mocha [128,64,0], actinobacteria; black [0,0,0], other bacteria and fungi (very rare occurrence). The BI tree is shown, and the values on each branch indicate confidence levels of the branching (BI/ML). For simplicity, 1.00 is shown as “1”. The confidence values are indicated only on large branches that are important for judging the origin(s) of plastid PG enzymes. Many branches that are not related to plastid PG enzymes are collapsed for visibility. All the original BI and ML trees are available in Supplementary Figures

The MraY proteins of Viridiplantae were also a sister group of cyanobacterial MraY (Fig. 2a). All MraY proteins of plants and algae were monophyletic. We still do not have a MraY sequence in C. paradoxa.

Fig. 2

Phylogenetic trees of MraY (a) and Class A PBP (b). Taxon names are not shown explicitly in cyanobacteria. There are three major clades in b, which are numbered serially as indicated. For other explanations, see the legend of Fig. 1. The number at the top of each taxon name indicates the cluster number in the Gclust database

Penicillin-binding proteins

Penicillin-binding proteins (PBPs) and their homologs (called by different names: PbpA, PbpB, PbpC, MrcA, MrcB, PonA) belong to a large family of proteins consisting of three major clades (Fig. 2b). There was no simple relationship between the gene names and the clades. α-Proteobacteria had PBPs in all three major clades. β-Proteobacteria had PBPs of Clades 1 and 2. Cyanobacterial PBPs were found in Clades 2 and 3. In Clade 2, only the cyanobacteria called β-cyanobacteria (those cyanobacteria other than marine Synechococcus and Prochlorococcus, which are called α-cyanobacteria) had PBPs, in two sub-clades. In Clade 3, there were two sub-clades of cyanobacterial PBPs, one containing all cyanobacteria, and the other containing only β-cyanobacteria.

PBP of P. chromatophora was found within the main cyanobacterial sub-clade of Clade 3. Plant and algal homologs of PBPs were found in Clade 2, in which the closest members were MrcB of γ-proteobacteria. They were clearly separated from any cyanobacterial PBPs. The PBP homologs detected in the genome sequences of C. paradoxa (C55323_11373 and C11577_4081) were short and distant. They were not analyzed, because it was necessary to remove them during the iterative refinement needed to obtain a phylogenetic tree of high confidence.

Other PG enzymes of non-endosymbiont origin

None of the other PG enzymes of Viridiplantae were found within the clade of cyanobacterial homologs (Figs. 3, 4). It is interesting to note that the homologs of Chlamydia were the closest relatives of green plant enzymes in MurB, MurD and MurF (see the ML tree for MurF on page 12 of the Supplementary Figures package). In other PG enzymes, no consistent close relationship to a particular group of organisms was apparent. We explain some interesting points below.

Fig. 3

Phylogenetic trees of MurB (a), MurG (b) and Ddl (c). See the legend of Fig. 1 for explanations. In plants and Micromonas sp. CCMP1545, duplicated domains were separately analyzed. Taxon names of cyanobacteria and some bacteria are not shown for visibility of the whole trees. Instead, the branches were color-coded

Fig. 4

Phylogenetic trees of MurC (a), MurD (b), MurE (c) and MurF (d). See the legend of Fig. 1 for explanations. Taxon names of cyanobacteria and some bacteria are not shown for visibility of the whole trees. Instead, the branches were color-coded

In MurE (Fig. 4c), homologs were also found in Chlamydomonas reinhardtii, Ostreococcus tauri and O. lucimarinus. All the MurE proteins of plants and algae, including MurE of C. paradoxa, were monophyletic, and most closely related to MurE of Firmicutes, rather than cyanobacterial MurE.

The Ddl proteins of land plants and Micromonas sp. CCMP1545 contain duplicated Ddl domains. The N- and C-terminal domains were sister groups in the phylogenetic tree (Fig. 3c). Curiously, the single-domain Ddl proteins of the prasinophytes O. tauri and O. lucimarinus were associated with the clade of C-terminal domains. This suggests that the duplication of the Ddl domain occurred during the evolution of green algae. These Ddl proteins, however, were not closely related to the cyanobacterial Ddl proteins. The Ddl protein of Micromonas sp. RCC299 (protein 9,464, having only a single domain), as well as Ddl sequences of the fungus Neurospora crassa (NCU0517) and the archaeon Methanosarcina acetivorans (MA0153 and MA4352), were distant, and were removed during the refining process.

PG enzymes of C. paradoxa did not show consistent affiliation to homologs of cyanobacteria or Viridiplantae. MurB of C. paradoxa was found within the clade of MurB of α-proteobacteria (Supplementary Figs. 3 and 4). Sister relationship with cyanobacterial homologs was found in MurE and MurF, whereas sister relationship with Viridiplantae homologs was found in MurD and MurE (a second copy). No clear relationship was found in MurD (a second copy) and MurG. Preliminary analysis suggested an association of the Ddl sequence (C7167_728) of C. paradoxa with γ- or β- proteobacteria, but it was necessary to remove this short Ddl during the refining process of phylogenetic analysis.

Various additional sequences of PG enzymes were found, such as MurG of C. reinhardtii, Ddl of P. patens and Populus trichocarpa, MurD and MurF of P. patens. It is not clear if some of these resulted from contamination in genome sequencing, but the closely related Ddl proteins of the two plants within the clade of Ddl of β-proteobacteria are not likely due to contamination. This could point to several independent events of gene transfer.

Discussion

The present study was started to confirm that the plastid peptidoglycan was a heritage of the endosymbiont, but this attempt was not successful in the sense that we could not confirm the expected postulate. Instead, we obtained curious results that are rich in suggestions. We performed extensive phylogenetic analysis of ten enzymes involved in the peptidoglycan synthesis, including various bacterial sequences. The results are rather complicated. To simplify the situation, we distinguish three cases, namely, plants and green algae (Viridiplantae), Cyanophora paradoxa, and Paulinella chromatophora. The results of phylogenetic analysis clearly indicate that all the PG enzymes in P. chromatophora are of cyanobacterial origin, even though MurF has not been identified yet. This rhizopod is considered to have acquired the chromatophore quite recently (maybe 60 million years ago, Nowack et al. 2008) in the history of photosynthetic organisms, and this is likely the reason why many genes of endosymbiont origin are conserved in the chromatophore genome. Among them, we already found enzymes involved in the synthesis of galactolipids (Awai et al. 2014; Sato and Awai 2016).

The situation in Viridiplantae is complex in two ways. First, not all plants and green algae retain the peptidoglycan biosynthesis enzymes. K. flaccidum, P. patens and S. moellendorffii are representative plants that conserve the complete set of PG enzymes (Table 1). Partial homologous sequences for all PG enzymes were detected in the gymnosperms Picea abies and Pinus taeda (Lin et al. 2017). Angiosperms, in general, do not have the complete set, although four or five enzymes are still encoded by the nuclear genome. Most known green algae such as Chlamydomonas reinhardtii do not conserve the complete set of PG enzymes, which is present only in Micromonas sp. CCMP1545 and some other green algae as shown by van Baren et al. (2016). Second, many PG enzymes in Viridiplantae seem to originate from bacteria other than cyanobacteria. Some enzymes are likely of chlamydial origin, but the origins are diverse for many other enzymes. Curiously, however, all the PG enzymes of Viridiplantae are monophyletic. If multiple copies are found, at least one copy belongs to this orthologous clade. In Viridiplantae, MurA and MraY could be the only bona fide endosymbiotic PG enzymes, namely, those that originate from the cyanobacterial endosymbiont. The results suggest that many PG enzymes in Viridiplantae were acquired early during the evolution of green algae from various different bacteria. In this sense, we should be cautious about the enzymes of likely cyanobacterial origin, because even the cyanobacteria-related enzymes could have resulted from horizontal gene transfer at this stage, rather than from the original endosymbiosis. We still need studies on the various other enzymes acquired during the evolution of Viridiplantae to obtain an overview of the full variety of donors of the transferred genes.

As we stated earlier, the phylogenetic analysis of PG enzymes of C. paradoxa is limited by the scarcity of complete sequences. Curiously, a book chapter on the genome analysis of this organism (Bhattacharya et al. 2014) stated that all PG enzymes including MurC and MraY were detected in the contigs of draft genome sequences. We were not able to detect these enzymes in either the protein or the nucleic acid sequences in the publicly available data. All we could do was to detect a very weak homology in contigs different from the listed ones. All available data again suggest diverse origins of PG enzymes of C. paradoxa, but this alga seems to retain more enzymes related to cyanobacterial homologs. Because of the limitation in the quality of data in C. paradoxa, we cannot draw a conclusion on the history of PG enzymes of Glaucophyta and Viridiplantae, which will be an interesting topic in the future.

Finally, we summarize a possible evolutionary history of PG enzymes in Viridiplantae. The simplest scenario assumes that all the PG enzymes were inherited from the cyanobacterial endosymbiont, and then they were replaced, one by one, by exogenous enzymes through horizontal gene transfer. The fact that some unrelated paralogs are present in plants and algae suggests that such horizontal gene transfer events occurred many times during the evolution of Viridiplantae, not just in green algae. We can then simply suppose that PG enzymes were lost in most green algae and also in red algae. Gene transfer might have been in dynamic equilibrium with gene loss, but gene loss was predominant in most green algae. PG enzymes are kept in mosses and pteridophytes (and perhaps in gymnosperms), but were lost in angiosperms. Because PG enzymes (and presumably also peptidoglycan) are involved in chloroplast division at least in mosses (Takano and Takechi 2010), the loss of PG enzymes must be compensated by an alternative mechanism, but we do not have information to discuss this point.

In another extreme scenario, biosynthetic capability of peptidoglycan was completely lost after the primary endosymbiosis. At some point in the evolution of green algae (before and/or after the separation from Glaucophyta and Rhodophyta), the ability of peptidoglycan biosynthesis was introduced by several events of horizontal gene transfers. The retention of MurA and MraY in many plants and algae might disfavor this hypothesis. We need to consider horizontal gene transfers for these enzymes. In this scenario, we will also have to consider simultaneous gene transfer from many bacteria, and this might be unlikely, because only the complete set of PG enzymes can synthesize peptidoglycan. Presence of only several enzymes might have no selective value. The presence of peptidoglycan in C. paradoxa might be difficult to explain by this scenario. We still have to use more complete genome data of this alga to discuss if the origins of PG enzymes are really different in C. paradoxa and green plants.

What is clear in both scenarios is that extensive horizontal gene transfers must have occurred in early evolution of Viridiplantae. The monophyly of all the PG enzymes analyzed in plants and green algae is remarkable. The replacement of genes for at least eight PG enzymes must have occurred within a short period of green algal evolution before the diversification of the lineage leading to the land plants. Because the putative origins of these enzymes were quite diverse as shown in the Results, the horizontal gene transfers must have involved various different bacteria, not just Chlamydia.

In considering such scenarios, we will have to keep in mind the following points. First, the complete set of PG enzymes is necessary for the synthesis of peptidoglycan, but some enzymes could function in different biochemical processes. Second, the role of peptidoglycan (or PG enzymes) in chloroplast division may be important in some plants and algae, while in other species, it might not be necessary. This difference could reflect the changes in the selection pressure during the evolution of Viridiplantae. The evolution of plastid peptidoglycan, and hence, the evolution of plastids must be more complicated than we thought before. This will be an interesting field of research, reflecting the whole history of plastid or plant evolution.