Introduction

Marine bacteria, particularly many species in Gammaproteobacteria, Bacteroidetes, and Planctomycetes, are key players in the global ocean carbon cycle (Michel et al. 2006). These marine bacteria produce diverse carbohydrate-active enzymes (CAZymes) such as glycoside hydrolases, carbohydrate esterases, and polysaccharide lyases, many of which are secreted out of the cell. The specific biochemical function of many of these enzymes remains to be elucidated but their characterization holds the potential for understanding complex polysaccharide bioconversion processes (Hehemann et al. 2014). The role of CAZyme-producing bacteria in the marine environment is particularly important given the diversity and quantity of polysaccharides produced by marine macroalgae (especially phaeophytes and rhodophytes) coupled with their large biomass and subsequent ease of harvestability—providing an important carbon source for several marine bacteria (Rhein-Knudsen et al. 2015). CAZyme-producing bacteria can therefore be opportunistic pathogens or saprophytes rather than commensal or mutualistic macroalgal symbionts (Egan et al. 2013).

Agar and carrageenan are among the major polysaccharides in the marine environment, being the main structural materials that make up the cell wall and intercellular matrix in many species of red algae (De Ruiter and Rudolph 1997; Hehemann et al. 2014). Various subtypes of these polysaccharides exist, and it is likely that agarases and carrageenases specific to these subtypes have evolved. In the marine environment, bacteria use these enzymes to degrade the polysaccharides that make up the cell wall of their seaweed hosts; hydrolysis of polysaccharides can either provide the bacteria with carbon sources or serve as an important mechanism for the bacteria to infect their hosts. A number of these enzymes have already been isolated and at least partially characterized (Michel et al. 2001; Chauhan and Saxena 2016). However, many carrageenases have not been purified to homogeneity and are not very well characterized (De Ruiter and Rudolph 1997), a situation that remains true today. Proteobacteria and Bacteroidetes are the two bacterial phyla under which most carrageenan-degrading isolates are classified (Chauhan and Saxena 2016).

The growing biotechnological interest in oligosaccharides as potential therapeutic products has also provided impetus to search for CAZyme-producing marine bacteria. Agarases and carrageenases are logical targets considering the amounts of agar and carrageenan that are produced globally via industrial-scale extraction of these polysaccharides from seaweeds (farmed or harvested from the wild) (Rhein-Knudsen et al. 2015). Oligocarrageenans, for example, have been reported to exhibit diverse bioactivities (Yuan et al. 2005; Hu et al. 2006; Yuan et al. 2006a, b; Kalitnik et al. 2012; Wang et al. 2012). Isozymes with slightly different enzymatic specificities may hold the potential for converting carrageenans into various types of oligosaccharides with different bioactivities or applications. For instance, κ-carrageenases exhibit diverse physicochemical properties (Potin et al. 1991; Khambhaty et al. 2007; Zhou et al. 2008; Sun et al. 2010; Liu et al. 2011; Li et al. 2013; Chauhan and Saxena 2016; Liu et al. 2016; Zhu and Ning 2016). The structural and functional diversity of the oligosaccharides resulting from enzymatic hydrolysis by these carrageenases is therefore potentially high. Equally important are further applications of CAZymes in cellulosic biomass conversion for bioethanol production and protoplast isolation for production of improved algal strains (Hehemann et al. 2014; Chauhan and Saxena 2016).

Traditionally, bioprospecting for carrageenases involves isolation and purification of enzymes from bacterial culture. In this study, whole genome sequencing of a carrageenase-positive rhodophyte-associated taxonomically novel isolate was used to discover CAZymes, particularly carrageenases. The genomic approach not only allows the discovery of the entire complement of putative CAZymes that a species can potentially produce (whether the genes are expressed or not in culture), but also provides important insights into the biology and ecology of the species, and hence the contexts in which the CAZymes play their roles, e.g., host-bacterial interactions. To our knowledge, this report is among the first to characterize the genome of a carrageenolytic marine bacterium.

Materials and Methods

Provenance of the Bacterium

The Gram-negative, aerobic, mesophilic, pigmented, rod-shaped, motile, and carrageenolytic marine bacterium designated as isolate AOL6 was isolated from a thallus of a diseased seaweed, Kappaphycus alvarezii. The seaweed was obtained from a seaweed farm in Calatagan, Batangas, Philippines (13° 50′ 0.5″ N, 120° 37′ 6.8″ E). Carrageenolytic marine bacteria were selectively grown in marine broth (40 g/L) with 1.5% κ-carrageenan. The semi-refined κ-carrageenan powder was obtained from Shemberg in Mandaue City, Cebu, 6014, Philippines. Clearing zones and liquefaction of the κ-carrageenan substrate indicated enzymatic activity. Preliminary molecular identification based on the 27F and 1492R universal bacterial primers (Lane 1991) for 16S/SSU rRNA was performed to confirm the identity and purity of the isolate. The morphological characteristics were observed using a Hitachi S-510 scanning electron microscope. Gram staining, capsule staining, and motility tests were also performed. For biochemical characterization, API 20E and BIOLOG Gen III were used. The isolate was deposited at the University of the Philippines Los Baños Philippine National Collection of Microorganisms with Accession/Collection No. BIOTECH 10559 and at the University of Santo Tomas Collection of Microbial Strains in Manila, Philippines, with Accession/Collection No. UST CMS 5002.

DNA Extraction, Genome Sequencing and Assembly

Genomic DNA was extracted from an isolate AOL6 culture (mid-exponential phase) using the Vivantis GF-1 Bacterial DNA Extraction Kit following the manufacturer’s instructions. The quality of the extracted DNA was assessed by agarose gel electrophoresis and UV-Vis spectrophotometry. High-quality DNA was subjected to pyrosequencing using a Roche 454 GS Junior Sequencer. An appropriate amount (800 ng) of DNA was used to construct the library following the manufacturer’s protocol for fragmentation, purification, blunt-ending, adapter ligation, and size selection. The quality and quantity of the library were assessed by 1% agarose gel electrophoresis stained using Invitrogen’s SYBR Gold DNA stain and by fluorescence spectroscopy (molecules/μL) using a BIOTEK microplate reader, respectively. Then, a $10^{-7}$ dilution (TE buffer) of the DNA library was prepared. Clonal amplification of the DNA library was performed by emulsion PCR with a copy-to-bead ratio of 1.5 cpb (copy per bead). Clonally amplified DNA beads were recovered, enriched, loaded to the PicoTiter Plate, and subjected to 200 cycles of pyrosequencing. De novo genome assembly including genome size estimation was performed using the Roche GS De Novo assembler (Newbler 2.7). Assembly metrics were assessed using PRINSEQ (Schmieder and Edwards 2011). The k-mer depth ($k = 11$, $D' = 18$) was estimated by k-mer counting methods such as Jellyfish (Marçais and Kingsford 2011) and KMC 2 (Deorowicz et al. 2015). Subsequently, the read depth ($D$) was calculated from the k-mer depth as $D = D' \, l / (l - k + 1)$, where $l$ is the average read length. The genome size was then estimated by dividing the total number of k-mers by the k-mer depth, $G = N_{\mathrm{base}} / D = N_{\mathrm{read}} \, (l - k + 1) / D'$, where $G$ is the estimated genome size, $N_{\mathrm{base}}$ is the number of bases, $N_{\mathrm{read}}$ is the number of reads, $D'$ is the k-mer depth at the peak of the k-mer histogram, and $N_{k\text{-mer}} = l - k + 1$ is the number of k-mers in a read (Sohn and Nam 2018).
This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession NOXC00000000. The version described in this paper is version NOXC01000000. The SSU (16S) rRNA and LSU (23S) rRNA sequences were deposited in GenBank with accession numbers MF928089 and MF928709, respectively. Likewise, the rpoB (DNA-dependent RNA polymerase subunit beta) sequence was deposited in GenBank with accession number MG934211. Unless stated otherwise, data handling and analysis were performed using custom-made Python and R scripts.
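The k-mer-based genome size estimate described in this section can be illustrated with a minimal Python sketch. The function names are hypothetical and are not taken from the scripts used in the study; the inputs are the read count and mean read length reported in the Results, together with k = 11 and D' = 18.

```python
def read_depth(kmer_depth: float, read_len: float, k: int) -> float:
    """Convert k-mer depth D' into read depth: D = D' * l / (l - k + 1)."""
    return kmer_depth * read_len / (read_len - k + 1)


def estimate_genome_size(n_reads: int, read_len: float, k: int,
                         kmer_depth: float) -> float:
    """Estimate genome size: G = N_read * (l - k + 1) / D'."""
    return n_reads * (read_len - k + 1) / kmer_depth


# Values reported in the text: 166,418 reads with a mean length of 446.95 bases.
genome_size = estimate_genome_size(166_418, 446.95, k=11, kmer_depth=18)
depth = read_depth(18, 446.95, k=11)
print(f"estimated genome size: {genome_size / 1e6:.2f} Mb")
print(f"read depth: {depth:.1f}x")
```

With these inputs the sketch gives an estimate of about 4.04 Mb, in line with the k-mer-based value reported in the Results.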

Genome Annotation

Genome annotation was carried out using automated and manual workflows. Automated genome annotation was performed using Prokka 1.12 (Seemann 2014). Structural and functional annotations were performed manually following the prediction of coding sequences (CDSs). CDSs were predicted from the genome assembly using Glimmer 3.02 (Salzberg et al. 1998; Delcher et al. 1999), then indexed and retrieved from the assembly. CDSs were translated using EMBOSS TRANSEQ (Rice et al. 2000) in frame 1 using NCBI Codon Table 11 (Bacterial, Archaeal and Plastid Genetic Code). Translated CDSs were annotated by BLASTp (Altschul et al. 1990) search against the GenBank NR, trEMBL, and SwissProt protein databases with an e-value threshold of 1E-10. Enzyme Commission (EC) numbers were assigned to CDSs using the Protein Sequence Analysis Toolkit (PSAT) (Leung et al. 2016). Identification of clusters of orthologous groups (COGs) and gene ontology term enrichment (GO enrichment) were performed using the EggNOG 4.5 database (Huerta-Cepas et al. 2016). KEGG pathway maps (map03070 and map02040 with Copyright Permission Ref. No. 180121) were reconstructed using BlastKOALA (Kanehisa et al. 2016). Structural features such as rRNA and tRNA were predicted using RNAmmer (Lagesen et al. 2007) and ARAGORN (Laslett and Canback 2004), respectively. CAZymes were identified and annotated using dbCAN (Yin et al. 2012) and SignalP 4.1 (Petersen et al. 2011). The BLASTp-NR results were analyzed using MEGAN (MEtaGenome ANalyzer) (Huson et al. 2007), highlighting notable features such as microbial attributes. Prokaryotic two-component systems (TCS) were identified using P2RP (Barakat et al. 2013) or retrieved from the P2CS database (Ortet et al. 2015). Finally, synonymous codon usage order (SCUO) for each CDS was computed using CodonO (Angellotti et al. 2007). Studies have shown that codon usage bias correlates with protein and transcript levels in different species (Gupta et al. 2004; Sahoo and Das 2014; Zhou et al. 2016; Das et al. 2017; Uddin 2017). Hence, in this analysis, the SCUO values were used as proxies to estimate gene expression levels. The CDSs with higher SCUO based on statistical analysis were thus assumed to be potentially highly expressed. The terms CDS, protein-coding sequence, ORF (open-reading frame), and protein (translated CDS) were used interchangeably for brevity and clarity, while genes refer to both protein-coding sequences and other features such as transfer and ribosomal RNA genes.
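To make the SCUO idea concrete, the following Python sketch computes an entropy-based codon usage score for a single CDS. It is an illustration of the general approach (normalized difference between maximum and observed codon entropy, weighted by amino acid composition) and is not CodonO's implementation; the function name `scuo`, the exclusion of single-codon amino acids, and the weighting over the remaining amino acids are assumptions made here.

```python
import math
from collections import Counter, defaultdict

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (codon-to-amino-acid
# assignments are the same in NCBI translation tables 1 and 11).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Number of synonymous codons per amino acid (stop codons ignored).
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def scuo(cds: str) -> float:
    """Entropy-based codon usage score: 0 = uniform usage, 1 = maximal bias."""
    cds = cds.upper()
    usage = defaultdict(Counter)
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE.get(cds[pos:pos + 3])
        # Assumption: skip stops, ambiguous codons, and single-codon amino acids.
        if amino_acid is None or amino_acid == "*" or DEGENERACY[amino_acid] < 2:
            continue
        usage[amino_acid][cds[pos:pos + 3]] += 1

    total = sum(sum(counts.values()) for counts in usage.values())
    if total == 0:
        return 0.0

    score = 0.0
    for amino_acid, counts in usage.items():
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        max_entropy = math.log2(DEGENERACY[amino_acid])
        # Weight the normalized entropy difference by amino acid composition.
        score += (n / total) * (max_entropy - entropy) / max_entropy
    return score
```

A CDS that always uses the same codon for each amino acid scores 1, whereas one that uses all synonymous codons equally scores 0.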

Phylogenetics and Comparative Genomics

A small subunit (SSU/16S) rRNA gene was identified from the assembly using RNAmmer (Lagesen et al. 2007). Homologous sequences (n = 12) were retrieved from the Ribosomal Database Project (RDP) (Cole et al. 2009). Phylogenetic analysis was performed using MEGA (Kumar et al. 2016) and MrBayes (Huelsenbeck and Ronquist 2001) following multiple sequence alignment (MSA), removal of flanking 5′ and 3′ ends by manual trimming, model testing, and tree building. The phylogenetic tree was inferred from the multiple sequence alignment using the Bayesian Markov Chain Monte Carlo (MCMC) method with 1,000,000 generations and the Kimura 2-parameter (K2P) G + I model, which was selected as the best nucleotide substitution model. The same workflow was employed for the RNA polymerase beta subunit (rpoB) protein, using the Bayesian MCMC method with model jumping between fixed-rate amino acid models. The analysis was run iteratively until convergence at 100,000 generations. The operational taxonomic units (OTUs) included in the rpoB analysis were based on the list of Spring et al. (2015). Complete or draft genomes of taxa related to AOL6 in the 16S rRNA gene tree were searched at the genus or species level. As not all taxa included in the analysis have complete or draft genomes available, only those genera or species with available genome data were included in the phylogenomic analysis. Gene (CDS) features were extracted from the respective GenBank files as multi-FASTA amino acid (.faa) files. The species identification tool specI (Mende et al. 2013) was used to extract all universal single-copy genes (40 in the specI COG database, Supplementary Table 2) from the list of identified taxa. A common set of marker genes was then identified, leaving 38 genes for comparison (Supplementary Table 2). The phylogenomic analysis using concatenated marker genes was performed following the same workflow as described above for SSU rRNA.
The evolutionary history was inferred using the neighbor-joining (NJ) method, where the evolutionary distances were computed using the JTT matrix-based method with 1000 bootstrap resamplings. In silico DNA-DNA hybridization was simulated based on pairwise genome comparisons of AOL6 with Teredinibacter turnerae T7901 and with Saccharophagus degradans 2-40 using JSpecies (Richter and Rosselló-Móra 2009). In this analysis, the genome of AOL6 was randomly divided into 1020-bp fragments, which were used as queries for reference mapping.
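The fragmentation step of the in silico DNA-DNA hybridization can be sketched as follows. This is a simplified illustration, not the JSpecies implementation: the function name is hypothetical, and cutting the sequence into consecutive, non-overlapping pieces while dropping a shorter tail is an assumption made here.

```python
def fragment_genome(sequence: str, size: int = 1020) -> list[str]:
    """Cut a sequence into consecutive fragments of `size` bp.

    Assumption: fragments do not overlap and a trailing piece shorter
    than `size` is discarded.
    """
    return [sequence[i:i + size]
            for i in range(0, len(sequence) - size + 1, size)]


# Toy example: a 2,500-bp sequence yields two full 1020-bp query fragments.
fragments = fragment_genome("ACGT" * 625)
print(len(fragments), {len(f) for f in fragments})
```

Each fragment would then serve as a query against a reference genome, and the identities of the matching fragments would be averaged to obtain the ANI.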

Comparative genomics with respect to the phylogenetically related carbohydrate degraders Teredinibacter turnerae T7901 (NC_012997.1) and Saccharophagus degradans 2-40 (NC_007912.1) was performed based on COGs using OrthoVenn (Wang et al. 2015), on CAZyme and carbohydrate-binding module profiles, and on metabolic pathways.

CAZymes—κ-Carrageenases

Computational analysis was performed to search for κ-carrageenases (κCa). An approach combining similarity- and profile-based methods was implemented: a BLASTp search against the GenBank NR, trEMBL, and Swiss-Prot databases; an HMM search (Eddy 1998; Finn et al. 2011) based on 18 annotated κCa protein sequences from UniProt (The UniProt Consortium 2014); and an HMM search based on 120 SEED sequences for the GH16 superfamily. Potential κCa-encoding sequences were identified as the intersection of the consolidated results.
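The intersection step can be sketched in a few lines of Python. The function name and the ORF identifiers below are hypothetical placeholders; the sketch only assumes that each search yields a list of ORF identifiers that passed its own threshold.

```python
def candidate_kca(blastp_hits, hmm_kca_hits, hmm_gh16_hits):
    """Return ORF identifiers supported by all three searches, sorted by name."""
    return sorted(set(blastp_hits) & set(hmm_kca_hits) & set(hmm_gh16_hits))


# Hypothetical identifiers, for illustration only.
blastp = ["orf_0001", "orf_0002", "orf_0003"]
hmm_kca = ["orf_0002", "orf_0003", "orf_0004"]
hmm_gh16 = ["orf_0003", "orf_0002", "orf_0009"]
print(candidate_kca(blastp, hmm_kca, hmm_gh16))
```

Requiring agreement among all methods trades sensitivity for specificity: an ORF is retained only if both the similarity-based and the profile-based searches support it.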

Detection and Measurement of κ-Carrageenase Activity

Protein isoforms were detected in the culture medium of AOL6 (marine broth supplemented with 0.2% κ-carrageenan) using a zymography-based assay as described by Smith et al. (2005). The enzyme activity in the culture medium was measured using the 3,5-dinitrosalicylic acid reducing sugar assay following the protocol of Tayco et al. (2013) against agar, alginate, ι-carrageenan, λ-carrageenan, and starch. This assay was performed in triplicate. The carrageenan (κ, ι, λ), alginate, and starch powders used in this assay were obtained from Sigma-Aldrich. Agar was obtained from Conda Pronadisa.

Results

Morphological and Biochemical Characteristics of the Isolate

Using standard microbiological procedures, isolate AOL6 was found to be Gram-negative, aerobic, mesophilic, pigmented, rod-shaped, motile (Supplementary Fig. 1), and carrageenolytic. Details of the phenotypic and biochemical characteristics are described in a separate paper (David et al., manuscript in preparation).

Genome Assembly Metrics and Features

The sequencing run generated 166,418 single-end reads with a mean length of 446.95 bases per read. These reads are equivalent to a total of 74,380,448 bases. The Newbler-generated assembly consisted of 55 contigs covering a total of 4,129,655 bases, with an N50 of 193,405 bases and a mean coverage of 18×. The genome size of the bacterium was estimated to be 4.4 Mb. Thus, the assembly covered 93.86% of the expected genome size. The k-mer counting methods, on the other hand, estimated the genome size to be 4.04 Mb. The assembly metrics, detailed in Table 1, indicated that the majority of the sequence reads were used in the assembly, leaving only a minuscule portion (0.0012% of reads, 0.093% of total bases) as singletons. The general features of the genome included a GC composition of 52%, 1 copy of the small subunit (SSU/16S) rRNA gene, 1 copy of the large subunit (LSU/23S) rRNA gene, 1 copy of the 5S rRNA gene, 40 tRNA genes (Supplementary Table 1), and 1 transfer-messenger RNA (tmRNA). The genome was predicted to encode 3659 protein-coding genes (CDS).
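The summary figures above follow from simple ratios, which the Python sketch below reproduces from the reported totals. The function names are hypothetical, and the contig lengths passed to `n50` are a toy example, since the individual contig lengths are not listed in the text.

```python
def n50(contig_lengths):
    """Length of the contig at which the sorted cumulative sum reaches 50% of the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


def mean_depth(total_read_bases: int, assembly_bases: int) -> float:
    """Average number of read bases per assembled base."""
    return total_read_bases / assembly_bases


def percent_of_expected(assembly_bases: int, expected_bases: float) -> float:
    """Assembled length as a percentage of the estimated genome size."""
    return 100 * assembly_bases / expected_bases


# Totals reported in the text.
print(f"{mean_depth(74_380_448, 4_129_655):.1f}x")
print(f"{percent_of_expected(4_129_655, 4.4e6):.2f}%")
# Toy contig lengths, for illustration only.
print(n50([10, 20, 30, 40]))
```

The first two values correspond to the 18× mean coverage and the 93.86% figure given above.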

Table 1 Genome assembly metrics (GS De Novo Assembler/Newbler 2.7)

Phylogenetic Affiliation

Phylogenetic analysis of 16S rRNA gene sequences and rpoB protein sequences extracted from the assembly allowed the identification of species closely related to isolate AOL6 (Fig. 1a, b). Based on the topology of the trees (Fig. 1a, b, and c), AOL6 was found to be most closely related to the known carbohydrate-active bacteria Teredinibacter turnerae T7901 (Ttu T7901) and Saccharophagus degradans 2-40 (Sde 2-40) and to the free-living marine bacterium Eionea nigra 17X/A02/237 (Fig. 1a). Together, they form a distinct clade under family Cellvibrionaceae, affiliated to phylum Proteobacteria and class Gammaproteobacteria. Of these three related taxa, only Ttu T7901 and Sde 2-40 have genome data available for comparison. Hence, whole genome comparisons were limited to these two related species. Phylogenomics based on 38 universal single-copy marker genes (Supplementary Table 2) produced a tree with similar topology (Fig. 1c), placing AOL6 adjacent to but distinct from Ttu T7901 and Sde 2-40. The phylogenetic affiliation of AOL6 with respect to these two carbohydrate degraders was congruent with the in silico DNA-DNA hybridization results (Fig. 1d). The average nucleotide identity (ANI, computed with BLAST and MUMmer as ANIb and ANIm) and tetranucleotide signature (tetra) values between the taxa were lower than the thresholds required for species circumscription. Traditional DNA-DNA hybridization values recommend the species borders to be set at 60–70%, which can be equated to an ANI range of 95–96% and a tetra value of ≥ 0.99 (Richter and Rosselló-Móra 2009).

Fig. 1

a Phylogenetic affiliation of AOL6 with other marine bacteria based on 16S (SSU) rDNA gene sequences and b rpoB protein sequences reconstructed using Bayesian MCMC method. Numbers at the nodes indicate posterior probabilities based on 1,000,000 (16S) and 100,000 (rpoB) generations. Bar (0.05) indicates nucleotide/amino acid substitutions per site. Accession numbers are indicated before the name of each taxon. AOL6 and closely related bacteria are grouped as a clade under Cellvibrionaceae. c Neighbor-joining (NJ) tree based on 38 single-copy marker genes (proteins, Supplementary Table 2) where bar (0.05) and numbers at the nodes indicate amino acid substitutions per site and bootstrap support, respectively. Accession numbers are indicated before the name of each taxon. d In silico DNA-DNA hybridization with respect to T. turnerae T7901 (Ttu T7901) and S. degradans (Sde 2-40) genomes

Comparative Genome Analysis

Comparative analysis of AOL6 with respect to Ttu T7901 and Sde 2-40 highlighted key similarities and differences between the species, including differences in metabolic pathways (Supplementary Tables 4 and 5). Table 2 shows a comparison of the general features of the genomes of AOL6 and the closely related Cellvibrionaceae species Ttu T7901 (Yang et al. 2009), Sde 2-40 (Weiner et al. 2008), and Cellvibrio japonicus Ueda107 (DeBoy et al. 2008; GenBank Accession No. CP000934). The genome of AOL6 is the smallest in this group (about 20% smaller than those of Ttu T7901 and Sde 2-40), and it had the lowest ribosomal RNA operon (rrn) copy number (1 vs. 2 or 3) and fewer CDS regions (3659 vs. 4105 and 4065 in Ttu T7901 and Sde 2-40, respectively). It is important to note, however, that the exact rrn copy number of AOL6 cannot be definitively determined using the current genome assembly.

Table 2 Comparison of general genome features

COG category assignment of AOL6, Ttu T7901, and Sde 2-40 achieved annotation coverage of 86, 88, and 89% of their respective CDSs (Table 3). Because a CDS may be assigned to multiple COG categories, each category assignment of such a CDS was counted separately to obtain an accurate representation of COG categories. Hence, the number of COG category assignments does not necessarily reflect the number of CDSs/proteins annotated. The “S, unknown function” category was excluded from downstream analyses involving the total number of COG category assignments, as its sheer overrepresentation could mask important recognizable features presented in Fig. 2 and Table 9.
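The counting rule described above can be sketched in Python as follows. The function name and the example assignments are hypothetical; the sketch assumes that each CDS carries a string of one-letter COG category codes, as in EggNOG-style annotations.

```python
from collections import Counter


def count_cog_categories(assignments, exclude=("S",)):
    """Count every COG category letter separately, skipping excluded categories.

    `assignments` maps a CDS identifier to its category string, e.g. "KT"
    for a CDS assigned to both K and T.
    """
    counts = Counter()
    for categories in assignments.values():
        for letter in categories:
            if letter not in exclude:
                counts[letter] += 1
    return counts


# Hypothetical assignments, for illustration only.
example = {"cds1": "K", "cds2": "KT", "cds3": "S", "cds4": "GM"}
counts = count_cog_categories(example)
print(dict(counts), sum(counts.values()))
```

In this toy example, four CDSs produce five category assignments after the S category is dropped, which shows why the total number of assignments need not equal the number of annotated CDSs.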

Table 3 Summary of orthology assignments of AOL6, Ttu T7901, and Sde 2-40
Fig. 2

Comparative analysis of COGs of AOL6, T. turnerae T7901, and S. degradans 2-40. COG annotation and classification is based on bacterial EggNOG 4.5

Functional classification of COGs appears to be similar among the three bacteria (Fig. 2). A genome-wide comparative clustering analysis of homologous proteins among AOL6, Ttu T7901, and Sde 2-40 indicated that they have 2432, 2497, and 2584 clusters, respectively (Fig. 3). Together, these species formed 2912 clusters, of which 2759 were orthologous clusters (containing proteins from at least two species) and 1782 were single-copy clusters. Among these clusters, there are 34 paralogous clusters in AOL6. Gene ontology term enrichment (GO enrichment) assigned these paralogous clusters to 12 GO terms comprising 48 proteins, which represent specific biological processes, molecular functions, and cellular components (Table 4). Notably, some of these proteins are putatively involved in carbohydrate processing, specifically κ-carrageenan hydrolysis and other carbohydrate modifications.

Fig. 3

Genome-wide comparison of homologous (orthologous and paralogous) protein clusters and summaries of the disjunction and intersection shared among AOL6, Ttu T7901, and Sde 2-40

Table 4 GO enrichment of the 34 paralogous clusters of AOL6 with respect to Ttu T7901 and Sde 2-40

The proportion of CAZymes in the genome (CAZome) of AOL6 was analyzed and compared to the CAZomes of known carbohydrate-degrading marine bacteria, as shown in Table 5. CAZymes account for 5.82% of the genome of AOL6 and comprise glycoside hydrolases, polysaccharide lyases, carbohydrate esterases, glycosyl transferases, and auxiliary redox enzymes. This proportion exceeded that of previously identified carbohydrate degraders except Saccharophagus degradans 2-40 (Mann et al. 2013). In-depth analysis of species-specific CAZymes and CBMs with respect to Ttu T7901 and Sde 2-40 revealed distinct patterns that differentiate the species from one another (Supplementary Fig. 2), such as the number and relative abundance of each CAZyme and CBM in the genome. The difference in the composition and abundance of carbohydrate-binding modules (CBMs) in the genomes of AOL6, Ttu T7901, and Sde 2-40 was apparent, with each genome having a unique CBM profile (Fig. 4). Proteins containing a CBM but no catalytic domain were also observed in the genome of AOL6 (Supplementary Table 3). For instance, CBM8 and CBM61 were found only in AOL6 (Fig. 4). Interestingly, these two CBMs were not observed to co-occur with any catalytic domain of the CAZymes identified in AOL6.

Table 5 CAZymes in the genomes of various carbohydrate-active bacteria
Fig. 4

Comparative analysis of the carbohydrate-binding modules (CBM) of AOL6 including T. turnerae T7901 and S. degradans 2-40 using dbCAN (database for carbohydrate-active enzyme annotation) as a reference/database. CBMs marked with asterisk (*) are those not found to be in the same protein with CAZyme catalytic domains (in AOL6 only) (left panel). Top 10 CBMs per species are shaded in gray (right panel)

κ-Carrageenases and Polysaccharide-Degrading Activities

Three potential κCa-encoding ORFs, designated κCa02330, κCa02332, and κCa10164, were identified in the genome of AOL6 (Table 6). κCa02330 has a predicted molecular weight of 51.15 kDa, similar to previously identified κ-carrageenases (Potin et al. 1991; Zhou et al. 2008; Sun et al. 2010; Liu et al. 2011, 2013; Zhu and Ning 2016). κCa02332 and κCa10164 were predicted to be larger, with molecular weights of 86.98 and 95.98 kDa, respectively, but these are smaller than the 128-kDa κ-carrageenase identified by Khambhaty et al. (2007) from Pseudomonas elongata. A zymography assay (which would reveal the combined effect of a mixture of active enzymes secreted by the bacteria) using κ-carrageenan as substrate detected at least 3 isozymes (Fig. 5). This result was consistent with the computational analysis, which identified 3 potential κ-carrageenase-encoding genes (Table 6). Assays were also carried out to determine the enzymatic activity of the crude extract of AOL6 (grown in κ-carrageenan-supplemented marine broth) against κ-carrageenan and other polysaccharide substrates. Figure 6 shows high enzyme activity against κ-carrageenan (80.22 U/ml), where 1 unit of enzyme is defined as the amount of enzyme that produces 1 μg of D-galactose per minute. Significantly lower activities, ranging from 16 to 20 U/ml, were detected against the other polysaccharides, including the other carrageenans.

Table 6 Predicted κ-carrageenase in the genome of AOL6
Fig. 5

Zymogram showing at least 3 active isoforms of κ-carrageenase in the culture medium of AOL6

Fig. 6

Enzymatic activities of the crude enzyme extract (from cultures grown in liquid medium supplemented with κ-carrageenan) on various polysaccharide substrates

Bacterial Secretion System

Homologs of most genes/proteins involved in the bacterial type II secretion system (T2SS), including the Sec and Tat general secretion systems, were found in the genome of isolate AOL6 (Fig. 7). These proteins included GspD, GspC, GspF, GspG, GspI, GspJ, GspK, GspL, GspM, GspE, SecD/F, SecE, SecG, SecY, YajC, YidC, SecA, FtsY, SecB, ffh, TatA, TatB, and TatC. Approximately 12% (434 of 3659) of the total predicted proteins from the genome of AOL6 (Table 7) carry an N-terminal signal peptide sequence, and 16% (71 of 434) of these are putative CAZymes comprising GH (glycoside hydrolases), PL (polysaccharide lyases), CE (carbohydrate esterases), GT (glycosyl transferases), and AA (auxiliary enzymes or redox enzymes). Based on the number of predicted CAZymes including CBMs (carbohydrate-binding modules), about 34% (81 of 239) are therefore predicted to be possible substrates of, or to be secreted extracellularly through, a T2SS.

Fig. 7

Bacterial secretion systems. Boxes highlighted in green are found in the genome of AOL6

Table 7 Statistics of modular and non-modular CAZymes and signal peptides

Two-Component Signal Transduction Systems

Bioinformatics analysis based on P2RP (Barakat et al. 2013) identified 101 putative TCS proteins, which were classified into 49 histidine kinases (HKs), 49 response regulators (RRs), and 3 phosphotransferases (PPs) (Table 8). Based on a 200-bp distance cut-off (Williams and Whitworth 2010) for functionally linked TCS proteins, the proportion of paired TCS genes in the genome of AOL6 was 54% (54/101), while the remaining fraction was classified as either complex (8%, 8/101) or orphaned (39%, 39/101). This proportion of paired TCS genes is much higher than in Teredinibacter turnerae T7901 and Saccharophagus degradans 2-40, which have 20% paired and 50% orphaned, and 24% paired and 46% orphaned TCS proteins, respectively (Ortet et al. 2015). On the other hand, in E. coli, a distant relative in Gammaproteobacteria, a large majority (71.5%) of TCS proteins are organized in pairs, while orphaned and complex TCS account for only 14.5 and 14.5%, respectively (Mizuno 1997). While the majority of prokaryotic TCS proteins (62%, e.g., Firmicutes) are encoded in pairs at loci within genomes, some organisms appear to favor encoding TCS proteins as orphan genes (e.g., Xanthomonas) (Williams and Whitworth 2010). Comparison of the number of TCS proteins among carbohydrate-degrading representatives of phyla Proteobacteria and Bacteroidetes (Table 8 and Fig. 8) revealed interesting patterns in the composition and abundance of RRs; the distribution of RRs appeared to be clade-specific regardless of the habitat of the bacteria.
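The distance-based classification of TCS genes can be illustrated with the following Python sketch. It is a simplification under stated assumptions and not the P2RP procedure: genes are taken to lie on a single contig, strand is ignored, and the class names, coordinates, and rule that any cluster other than one HK plus one RR counts as complex are hypothetical choices made here.

```python
from dataclasses import dataclass


@dataclass
class TcsGene:
    name: str
    kind: str   # "HK", "RR", or "PP"
    start: int  # 1-based start coordinate on the contig
    end: int    # end coordinate on the contig


def classify_tcs(genes, max_gap=200):
    """Label each TCS gene as 'paired', 'complex', or 'orphan'."""
    clusters = []
    for gene in sorted(genes, key=lambda g: g.start):
        # Extend the current cluster if the intergenic gap is within the cut-off.
        if clusters and gene.start - clusters[-1][-1].end <= max_gap:
            clusters[-1].append(gene)
        else:
            clusters.append([gene])

    labels = {}
    for cluster in clusters:
        kinds = sorted(g.kind for g in cluster)
        if len(cluster) == 1:
            label = "orphan"
        elif kinds == ["HK", "RR"]:
            label = "paired"
        else:
            label = "complex"  # assumption: any other linked arrangement
        for g in cluster:
            labels[g.name] = label
    return labels


# Hypothetical coordinates, for illustration only.
genes = [
    TcsGene("hk1", "HK", 100, 1000), TcsGene("rr1", "RR", 1100, 1800),
    TcsGene("hk2", "HK", 5000, 6000),
    TcsGene("hk3", "HK", 10000, 11000), TcsGene("rr3a", "RR", 11050, 11700),
    TcsGene("rr3b", "RR", 11800, 12500),
]
print(classify_tcs(genes))
```

The proportions of paired, complex, and orphaned genes then follow by dividing the count for each label by the total number of TCS genes.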

Table 8 TCS proteins identified from the bacterial genomes of carbohydrate-degrading marine bacteria
Fig. 8

Comparison of the composition and proportion of response regulators (RRs) from the genomes of carbohydrate-degrading marine bacteria. Left panel shows the count while the right panel shows the proportions of the RRs per species

Discussion

AOL6 Represents a Novel Copiotrophic, Carrageenolytic Marine Bacterium

Molecular phylogenetic analysis based on 16S rRNA gene sequences and rpoB protein sequences identified the bacterial species most closely related to isolate AOL6 (Fig. 1a, b): Teredinibacter turnerae T7901 (Ttu T7901) and Saccharophagus degradans 2-40 (Sde 2-40). However, the 16S rRNA and rpoB sequences of AOL6 were only 93 and 90% identical to the corresponding sequences in Ttu T7901 and Sde 2-40, respectively, indicating that AOL6 is a distinct species. Phylogenomic and in silico DNA-DNA hybridization analyses (Fig. 1c, d) supported the same conclusion. The results of the analyses in fact suggest that AOL6 represents not only a novel species but also a new genus in the family Cellvibrionaceae. AOL6 is a carrageenolytic marine bacterium isolated from a diseased thallus of the farmed rhodophyte Kappaphycus alvarezii (Gigartinales, Rhodophyta). Ttu T7901, a closely related species, is a cellulolytic, diazotrophic, and Gram-negative marine bacterium isolated from the gills of a broad range of wood-boring marine bivalves of the family Teredinidae, while Sde 2-40 is a free-living Gram-negative bacterium isolated from decaying seagrass (Weiner et al. 2008; Yang et al. 2009). Eionea nigra 17X/A02/237, a member of the same group, is a Gram-negative bacterium isolated from the coastal waters of the north-western Mediterranean Sea (Urios et al. 2011). The carbohydrate-degrading activity of Eionea nigra remains to be confirmed. The phylogenetic affinity of AOL6 places it as a potential member of the emerging Microbulbifer/Teredinibacter/Saccharophagus group of marine carbohydrate degraders (Weiner et al. 2008). Spring et al. (2015) identified the members of the families Cellvibrionaceae and Microbulbiferaceae as a copiotrophic group of marine bacteria with a special ability to utilize complex polysaccharides as substrates. Based on rpoB sequences (Fig. 1b), AOL6 belongs to a group of bacteria known as r-strategists, characterized by a preference for nutrient-rich environments (Spring et al. 2015). Furthermore, prokaryotic K-strategists and r-strategists can be defined based on the copy number of rRNA operons, rpoB sequence delineation (Fig. 1b), and the number of COG category assignments (Table 9) (Spring et al. 2015). The ecological adaptation inferred from the rpoB protein sequence (r-selection) was also consistent with the overrepresentation of the COG categories motility, defense mechanisms, signal transduction mechanisms, and transcription (Table 9). Finally, on the basis of genome size and number of protein-coding sequences (Giovannoni et al. 2005), AOL6 appeared unlikely to be an obligate symbiont and is therefore likely free-living or host-associated. On the other hand, the ribosomal RNA operon (rrn) copy number in a bacterial genome can be used to infer the ecological and reproductive strategies of an organism in response to a specific ecological niche, where copiotrophs generally have more than 1 copy of rrn while oligotrophs have only 1 copy (Klappenbach et al. 2000; Lauro et al. 2009; Roller et al. 2016). In AOL6, the exact rrn copy number (Table 2) cannot be conclusively determined, as the average read length of pyrosequencing imposes an inherent limitation in resolving tandemly repeated copies of ribosomal RNA genes. The sequencing depth spanning the predicted 16S rRNA gene (170×) far exceeds the average sequencing depth for the draft genome (18×). This raises the possibility of a mis-assembly leading to a generalized repeat collapse and expansion (Phillippy et al. 2008), which can only be addressed if the reads are long enough to span these tandemly repetitive elements (Sohn and Nam 2018).
Notwithstanding the general trend of rrn copy number with respect to trophic strategies, it has been shown that counterintuitively large genomes of Planctomycetes (a feature of copiotrophs) possess only a single copy of rrn (a feature of oligotrophs) (Lauro et al. 2009).
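The depth argument above can be illustrated with a short calculation. The sketch below is only a rough heuristic and not part of the reported analysis: it assumes roughly uniform coverage across the genome, so that the excess depth over a collapsed repeat scales with the number of copies merged into it. The function name is hypothetical, and the result does not resolve the actual rrn copy number.

```python
# Reported depths from the text (fold coverage).
depth_16s = 170.0     # depth over the predicted 16S rRNA gene
depth_genome = 18.0   # average depth of the draft genome


def collapsed_copy_estimate(region_depth: float, genome_depth: float) -> float:
    """Ratio of local to genome-wide depth (hypothetical helper).

    Under the assumption of uniform coverage, this ratio approximates how
    many repeat copies may have been merged into one assembled locus.
    """
    if genome_depth <= 0:
        raise ValueError("genome_depth must be positive")
    return region_depth / genome_depth


estimate = collapsed_copy_estimate(depth_16s, depth_genome)
print(f"depth ratio ~ {estimate:.1f}")
```

A ratio well above 1 is consistent with several rrn copies having been collapsed into a single locus, but coverage bias can inflate or deflate it, which is why longer reads are needed for a definitive count.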

Table 9 Proportion of COG category assignments of Ttu T7901, AOL6, and Sde 2-40

Carrageenan-degrading or carrageenolytic marine bacteria belong to two distantly related phyla, Proteobacteria and Bacteroidetes, although most of the isolates belong to the former group (Chauhan and Saxena 2016). Most of these isolates are Gram-negative, including Pseudoalteromonas, Cellulophaga, Pseudomonas, Cytophaga, Tamlana, Vibrio, Catenovulum, Microbulbifer, Zobellia, and Alteromonas (Michel et al. 2001; Mou et al. 2004; Sun et al. 2010; Hatada et al. 2011; Li et al. 2013; Liu et al. 2013; Yao et al. 2013; Ziayoddin et al. 2014; Zhu and Ning 2016). Like most carrageenolytic marine bacteria, AOL6 is a Gram-negative bacterium that belongs to the phylum Proteobacteria. Some Gram-positive bacteria such as Bacillus sp. (Kang and Kim 2015) have also been reported to produce carrageenases. While several carrageenolytic bacteria have already been identified and characterized, AOL6 is among the relatively few whose genomes have been sequenced and characterized.

Comparative Genomics Reveals Shared and Distinct Features with Respect to Closely Related Carbohydrate-Degrading Species T. turnerae T7901 and S. degradans 2-40

Comparative genomic analysis with respect to Ttu T7901 and Sde 2-40 identified the key features that differentiate AOL6 from the two species. GO enrichment analysis (Fig. 2) identified 34 paralogous clusters in AOL6 that are involved in carbohydrate processing, and these included κ-carrageenase activity. Ttu T7901 and Sde 2-40 are carbohydrate-active bacteria that degrade cellulosic materials (Weiner et al. 2008; Yang et al. 2009). This indicated that the three bacteria can be differentiated based on carbohydrate metabolism, e.g., carbohydrate specificity (carbohydrate-binding modules, CBMs) and utilization (CAZymes). Analysis of species-specific CAZymes and CBMs revealed distinct patterns that differentiate one species from another (Fig. 4 and Supplementary Fig. 2), with a unique profile for each species in terms of the number and relative abundance of each CAZyme and CBM in the genome. Specifically, CBM8 and CBM61 were found only in AOL6 (Fig. 4 and Supplementary Table 3). In the AOL6 genome, these two CBMs were not observed to be associated or to co-occur with any CAZyme catalytic domain. CBM8 is a cellulose-binding module from a cellulase of the slime mold Dictyostelium discoideum, while CBM61 is a binding module for β-1,4-galactan that has been found to co-occur with catalytic domains in GH16, GH30, GH31, GH43, GH53, and GH66 (Lombard et al. 2014). CBMs are non-catalytic and are usually found in association with the catalytic domains of CAZymes. A CBM is defined as a contiguous amino acid sequence within a CAZyme that has a discrete fold and carbohydrate-binding activity. The presence of a CBM in an enzyme confers substrate specificity on the corresponding CAZyme (Boraston et al. 2004; Lombard et al. 2014). In the case of cellulases, CBMs form part of a scaffolding subunit that organizes the catalytic subunits into a cohesive multienzyme complex known as the cellulosome (Shoseyov et al. 2006).

Critical to the assembly of cellulosomes are two groups of structural modules, namely, cohesins and dockerins, which mediate the interaction of different enzymes to form molecular assemblies that degrade plant biomass (Bayer et al. 2004). Interestingly, a search for these modules revealed that neither cohesins nor dockerins are encoded in the genome of AOL6. A similar search in the genomes of Teredinibacter turnerae T7901 and Saccharophagus degradans 2-40 found that either cohesins or dockerins, but not both, are encoded in the genome of each of these two species. The implications of these findings remain to be investigated.

CAZome Analysis Reveals CAZyme Abundance Comparable to Other Carbohydrate-Degrading Marine Bacteria: a Fraction of Cell Wall-Degrading Enzymes Is Predicted to Be Secreted Through the T2SS Pathway

An organism may be described by the number of CAZymes in its genome. Barbeyron et al. (2016) showed that carbohydrate catabolism of marine heterotrophic bacteria (MHB) is influenced by phylogeny as well as habitat; consequently, CAZymes can be used as a predictive tool to assess a bacterial species’ ecological niche, in particular its role in the carbon cycle. The proportion of CAZymes in the genome can therefore be used as an indication of bacterial behavior or lifestyle. The CAZome of free-living organisms typically corresponds to 1–5% of the predicted coding sequences, whereas extremely reduced CAZomes are characteristic of species with a strict intracellular parasitic lifestyle (Lombard et al. 2014). In contrast, microorganisms that degrade plant cell walls synthesize an extensive portfolio of hydrolytic enzymes that display complex molecular structures (DeBoy et al. 2008). In AOL6, the proportion of CAZymes in the genome is 5.82%, possibly indicating a specialized capacity to degrade a range of polysaccharides. This value exceeds that of most previously identified carbohydrate degraders (Table 5) (Mann et al. 2013). The number of CAZymes in the genome of AOL6 suggests that it not only lives and thrives in symbiosis with carrageenan-rich algae but is also capable of degrading and utilizing other polysaccharide substrates, suggesting a promiscuous lifestyle.

Protein secretion plays a central role in the bacterial response to environmental conditions or stimuli. In particular, symbiotic bacteria, whether pathogenic, commensal, or mutualistic, rely on a variety of secreted proteins in their interactions with their host organisms (Tseng et al. 2009a). The presence of genes involved in the bacterial T2SS in the genome of AOL6 (Fig. 7), but not of other types of secretion systems, was an interesting finding. T2SS, a protein secretion machinery found in various Gram-negative bacteria and one of the six known protein secretory systems (Tseng et al. 2009b), enables the transport of cytoplasmic proteins that are generally associated with the breakdown of host tissues and are therefore important in causing the symptoms associated with bacterial infection. The secreted proteins have a range of biological functions, but they are generally enzymes. They include proteases, lipases, phosphatases, and CAZymes whose site of action is primarily extracellular (Korotkov et al. 2012). In contrast to AOL6, the genome of the endosymbiont Teredinibacter turnerae T7901 apparently encodes type III, IV, and VI secretion systems (T3SS, T4SS, and T6SS) in addition to T2SS (Yang et al. 2009). The presence of such secretion systems is believed to be a hallmark of intracellular symbionts (Lee and Schneewind 2001); therefore, their absence from the genome of AOL6 suggests that this species is unlikely to be an intracellular pathogen or symbiont. The genome of Saccharophagus degradans 2-40, on the other hand, encodes T1SS and T2SS (Weiner et al. 2008).

Genome analysis (Table 7) shows that approximately 12% (432/3659) of the total predicted proteins from the genome of AOL6 carry an N-terminal signal peptide sequence, and 16% (71/434) of these are CAZymes comprising GH (glycoside hydrolases), PL (polysaccharide lyases), CE (carbohydrate esterases), GT (glycosyl transferases), and AA (auxiliary activities or redox enzymes). Based on the number of predicted CAZymes including CBMs (carbohydrate-binding modules), it is proposed that about 34% (81/239) of these are possible substrates of the type II secretion system and are secreted extracellularly through it.
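The proportions quoted above follow directly from the reported counts. The minimal sketch below recomputes them; the counts are copied verbatim from the text, while the dictionary keys and the helper name are hypothetical labels chosen for illustration.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# (numerator, denominator) pairs exactly as reported in the text.
reported = {
    "signal_peptide_proteins": (432, 3659),
    "cazymes_among_signal_peptide_proteins": (71, 434),
    "cazymes_cbms_predicted_t2ss_substrates": (81, 239),
}

percentages = {name: percent(p, w) for name, (p, w) in reported.items()}

for name, value in percentages.items():
    print(f"{name}: {value:.1f}%")
```

Rounded to whole numbers, these values correspond to the approximately 12%, 16%, and 34% stated in the text.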

Aside from the apparently exclusive use of T2SS for CAZyme secretion, analysis of the AOL6 genome also revealed how CAZymes (and perhaps other hydrolytic or degradative enzymes) may be delivered efficiently to their targets by this species. Codon usage preference analysis can be used to predict the highly expressed genes in a genome (Gupta et al. 2004; Angellotti et al. 2007; Sahoo and Das 2014; Zhou et al. 2016; Das et al. 2017; Uddin 2017). Using this strategy, the pilZ gene was identified as potentially highly expressed. The product of this gene is believed to be involved in type IV fimbrial biogenesis, which promotes adhesion to host cells during colonization and is associated with a form of surface translocation called twitching motility (Alm et al. 1996; Han et al. 2007; Yamagata et al. 2012; Nivaskumar and Francetic 2014; Green and Mecsas 2016). Conceivably, the putatively highly expressed pilZ gene may play a role in the efficient secretion of CAZymes during colonization of algal hosts.
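To make the idea of codon usage preference concrete, the sketch below computes relative synonymous codon usage (RSCU), one commonly used codon-bias measure. This is an illustrative assumption: the text does not state which codon usage metric was applied to AOL6, and the toy sequence is hypothetical. RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally, so values above 1 indicate preferred codons.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds: str) -> dict:
    """Relative synonymous codon usage for one coding sequence.

    Amino acids absent from the sequence and stop codons are skipped.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue
        expected = total / len(family)  # count under equal codon usage
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


# Hypothetical toy sequence: four leucine codons, three CTG and one CTT.
result = rscu("CTGCTGCTGCTT")
print(result["CTG"], result["CTT"])
```

In practice, the codon preferences of a reference set of highly expressed genes (for example, ribosomal protein genes) are compared against each gene in the genome, and genes whose usage resembles the reference set are predicted to be highly expressed.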

The Collection of TCS Proteins in AOL6 Genome Putatively Reflects Role in Coordinating Complex and Dynamic Interactions Between the CAZyme-Rich Bacterium and Its Polysaccharide-Rich Host

TCSs are one of the primary means by which bacteria regulate their behavior in response to external or environmental stimuli (Borland et al. 2015). They play important roles in a broad range of physiological mechanisms such as virulence, chemotaxis, motility, and metabolism, which are essential for plant-associated bacteria (Gao and Stock 2009; Leonard et al. 2017). Coordination of virulence mechanisms involves several TCSs (Leonard et al. 2017). Therefore, identification of TCSs may provide valuable insights into the behavior and virulence mechanism of a bacterium. Table 8 shows that the majority of the predicted TCSs in AOL6 are involved in bacterial chemotaxis and surface colonization, which could mediate the infection process. Diseases in seaweeds such as the “ice-ice” disease observed in Kappaphycus species result from infection by opportunistic pathogens promoted by unfavorable environmental conditions (Largo et al. 1999). Bacteria with polysaccharide-degrading abilities tend to be opportunistic pathogens or saprophytes rather than commensal or mutualistic macroalgal symbionts (Egan et al. 2013). Initial events in most plant-microbe interactions include bacterial chemotaxis toward plant exudates and bacterial attachment to the plant surface (Yao and Allen 2006; Antúnez-Lamas et al. 2009; Heindl et al. 2014 cited in Leonard et al. 2017). It is surmised that the TCS proteins, by coordinating complex and dynamic host-bacterial interactions, regulate the expression of chemotaxis and surface colonization proteins as well as the synthesis of extracellular enzymes. In the case of AOL6, colonization of its algal hosts would conceivably involve its rich repertoire of predicted response regulators putatively involved in bacterial chemotaxis and surface colonization, in concert with its T2SS, which enables type IV fimbrial biogenesis followed by extrusion of cell wall-degrading enzymes, including κ-carrageenases. Investigation of this hypothesis would be a step toward a deeper mechanistic understanding of how bacteria cause diseases in seaweeds.

Comparative investigation reveals a pattern of TCS protein composition and abundance among carbohydrate-degrading marine bacteria from different taxonomic lineages and habitats. The repertoire of TCS proteins appears to be phylogeny-dependent rather than habitat-dependent (Table 8 and Fig. 8), suggesting that bacteria belonging to a specific taxonomic rank could switch from one lifestyle to another in response to habitat conditions. T. turnerae is an endosymbiont from the gills of a broad range of wood-boring bivalves of the family Teredinidae (shipworms) (Distel et al. 2002; Yang et al. 2009). S. degradans 2-40 was isolated from dead, decaying seagrass. S. degradans 2-40 has by far the largest repertoire of CAZymes and has been demonstrated to degrade an array of complex polysaccharides (Ekborg et al. 2005; Weiner et al. 2008). Microbulbifer agarilyticus GP101 (genome accession no. NZ_CP019650.1), isolated from ocean sediments, possesses both agarolytic and carrageenolytic activities (Miyazaki et al. 2008). Alteromonas macleodii ATCC 27126 (genome accession no. NC_018632.1) is a copiotrophic and planktonic marine bacterium (Ivars-Martinez et al. 2008). Zobellia galactanivorans DSIJ (genome accession no. NC_015844.1) is a seaweed-associated (rhodophyte) marine bacterium that possesses both agarolytic and carrageenolytic activities and is currently a model for studying algal polysaccharide bioconversions (Barbeyron et al. 2001; Barbeyron et al. 2016).

Conclusion

In this study, the genome of a newly discovered seaweed-associated, carrageenolytic, and culturable marine bacterium designated as isolate AOL6 was sequenced and analyzed. Phylogenetic and phylogenomic analyses suggest that this bacterium represents not only a novel species but also a novel genus in the family Cellvibrionaceae. Analysis of its genome revealed several interesting features. Similar to its relatives in the family Cellvibrionaceae, such as T. turnerae T7901 and S. degradans 2-40, AOL6 has an abundance of CAZyme-encoding genes in its genome. Unlike Ttu T7901 and Sde 2-40, however, it includes carrageenases in its CAZyme repertoire and apparently relies exclusively on T2SS for secreting proteins, presumably including the cell wall-degrading polysaccharide lyases, carbohydrate esterases, and glycoside hydrolases. In addition, the majority of its putative TCSs are predicted to be involved in bacterial chemotaxis and surface colonization. Its pilZ gene, whose product is presumably involved in cell adhesion and CAZyme secretion, also appears to be highly expressed based on codon usage bias analysis. These characteristics suggest that AOL6 is an opportunistic, potentially pathogenic r-strategist adapted to colonizing polysaccharide-rich hosts such as seaweeds.