Introduction

The microbial diversity of marine habitats offers considerable scope for exploration. This untapped potential has spurred research interest in marine microflora for biotechnological applications [21]. Recently, new strains have been found to carry phenotypes that are ideal for recombinant microorganisms in biorefineries, such as the ability to utilize different substrates, fast sugar transport, inhibitor and product tolerance, and new pathways [2]. Novel enzymes from such marine isolates were involved either in the breakdown of polysaccharides or in new metabolic pathways [11, 12, 15, 16, 24, 28]. These new enzymes and pathways have often been used in the development of engineered microbial strains.

For instance, Vibrio sp. EJY3 was shown to metabolize 3,6-anhydro-l-galactose (l-AHG) through a new pathway which was successfully assembled in Escherichia coli [28]. A novel enzyme, Amy63, was identified from Vibrio alginolyticus 63, and exhibited multifunctional hydrolytic activity against amylose, agar, and carrageenan [13]. Other marine isolates such as Algibacter alginolytica sp. nov. carry diverse gene clusters for the utilization of different polysaccharides [24]. Falsirhodobacter sp. Alg1 was also reported to efficiently degrade alginate from brown seaweeds [19]. These studies are examples of the ongoing search for new and unique phenotypes from marine microorganisms for biotechnological applications.

In this report, a new bacterium isolated from the marine ecosystem is described. The criterion used for selection of the novel strain was the capability to utilize different natural polysaccharides such as agar, alginate, carrageenan, and chitin. This criterion was chosen for two reasons: (1) the new isolate could become a promising candidate for development of a recombinant industrial strain for the utilization of polysaccharides, and (2) the genome of the new isolate would contain a wealth of genes to explore for biotechnology applications. Hence, the genomic contents of the new isolate are also analyzed and described in this report.

Materials and Methods

Sample Collection and Bacteria Isolation

Seawater and seaweed samples were collected from Yeosu, South Korea. Bacteria from seawater samples were isolated by preparing serial dilutions and plating on Marine Agar or MA (BD Difco™ Marine Agar 2216, New Jersey, USA). Bacteria bound to the surface of seaweeds were isolated by washing the samples with sterile deionized water, from which 100 µL aliquots were plated on MA. Plates were incubated at 28 °C for 7 days. Colony phenotype was observed by visual inspection, while motility was determined by stabbing into semi-solid MA and by scanning electron microscopy (SEM). Cellular fatty acid composition was analyzed by gas chromatography–mass spectrometry as previously described [18]. Physiological characterization was done using the BioMerieux API® test kits (Fisher Scientific, South Korea).

Phylogenetic Analysis, Genome Sequencing, and Comparative Genomics

The 16S rRNA genes of selected isolates were PCR-amplified using the universal primers (27F and 1492R). Pairwise 16S rRNA sequence identities were calculated by BLASTn. The phylogram was generated using CLC Sequence Viewer 7 (Qiagen, Korea). The draft genome sequence was constructed de novo using Illumina MiSeq sequencing data (Chunlab Inc., South Korea). The reads were assembled and analyzed with CLC Genomic Workbench 7.5.1 (CLCbio, Denmark). Resulting contigs were scaffolded using GS Assembler 2.3 (Roche Diagnostics, CT). The coding DNA sequences (CDS) were predicted using PRODIGAL ver.2.6.2. CRISPRs were identified using Piler-CR ver.1.06 and CRISPR Recognition Tool ver.1.2. tRNAs were searched using tRNAscan-SE ver.1.3.1. Meanwhile, rRNA and ncRNA were identified using INFERNAL ver.1.0.2 with the Rfam ver.12.0 database. Each CDS was annotated by homology search against the Swiss-prot, EggNOG ver.4.1, SEED, and KEGG databases. All carbohydrate-active enzymes (CAZymes) were predicted by dbCAN HMMs ver.4.5 [27]. Comparative genomics studies were conducted using the Orthologous Average Nucleotide Identity Tool (OAT) [10], and were verified using the ANI-independent Genome-to-Genome Distance Calculator 2.1 (GGDC) online software [17].
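As an illustration of what a pairwise sequence identity measures, the following minimal sketch computes percent identity over the ungapped columns of two pre-aligned sequences. It is a simplification for orientation only: the function name `percent_identity` and the toy sequences are hypothetical, and BLASTn derives its reported identity from its own local alignment, so its values may differ from this calculation.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs must already be aligned (equal length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are skipped in this simplified measure
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from this study.
    print(percent_identity("ACGT-A", "ACGTTA"))
    print(percent_identity("ACGT", "ACGA"))
```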

Accession and Type Culture numbers

All partial 16S rRNA sequences of isolates were deposited in GenBank with accession numbers KX685644 to KX685657 (Table S1). The Whole Genome Shotgun project of strain W5C has been deposited at DDBJ/ENA/GenBank under the accession MDDP00000000 (version MDDP01000000 is described in this paper). The strain has been deposited in the Korea Collection for Type Cultures (KCTC 13157BPT) and the Japan Collection of Microorganisms (JCM 32108T). The corresponding DPD Taxon Number is TA00166.

Results and Discussion

Isolation and Characterization of W5C

Fourteen isolates with high agarolytic activities were selected from the purified colonies. Phylogenetic analysis of the 16S rRNA sequence (Fig. 1a) revealed that the isolates belong to the Pseudoalteromonas, Alteromonas, Glaciecola, or Cellulophaga genera. Strain W5C displayed the highest agarase activity as visualized by the halo formation after staining the plate with Lugol’s iodine solution (Fig. 1b). Alginate degradation was indicated by the zone of clearing around the streak of W5C, which was also observed in the presence of chitin (Fig. 1b). W5C also grew on carrageenan as indicated by the liquefaction surrounding the colony (Fig. 1b). Hence, W5C was selected for phenotype tests as shown in Table 1. Motility tests by agar stab and SEM imaging showed that W5C was capable of gliding motility (Fig. S1). Furthermore, its fatty acid profile consisted mainly of pentadecanoic acids (branched, unsaturated, or saturated) and hexadecanoic acids (palmitic acid and its unsaturated derivative) (Table S2). This is characteristic of Cellulophaga type strains. There are currently five validated species in this genus: C. lytica DSM 7489T, C. algicola DSM 14237T, C. baltica LMG 18535T, C. fucicola LMG 18536T, and C. pacifica KMM 3664T [5, 7, 26]. On the other hand, the List of Prokaryotic names with Standing in Nomenclature (LPNSN) database (http://www.bacterio.net) reports eight, which additionally includes C. uliginosa ATCC 14397T, C. tyrosinoxydans DSM 21164T, and C. geojensis KCTC 23498T [5, 8, 20, 26]. A summary of the phenotypic properties of W5C compared to selected Cellulophaga type strains is shown in Table 1.

Fig. 1

Characterization of isolated strains. a Phylogenetic tree of agarolytic isolates. The phylogram was constructed using the neighbor-joining method. GenBank accession numbers for 16S rRNA sequences are shown in parentheses. The evolutionary distances were computed using the Kimura 2-parameter method. Bar, 0.100 substitutions per nucleotide position. Distance calculations were based on 1000 replicates. Bacteroides fragilis ATCC 25285, Vibrio sp. EJY3, and Saccharophagus degradans 2–40 were used as outgroups. b Visualization of strain W5C’s growth on different marine polysaccharides. The base solid medium contains artificial sea water [g per liter; 24.53 NaCl, 5.20 MgCl2, 4.09 Na2SO4, 1.16 CaCl2, 0.70 KCl, 0.20 NaHCO3, 0.10 KBr, 0.027 H3BO3, 0.0025 SrCl2, 0.0030 NaF, 0.10 FeC6H6O7, 0.00010 NH4NO3]. The Na-alginate agar plate contains 1% (w/v) Na-alginate; the unhydrolyzed alginate was precipitated by flooding with saturated CaCl2 aqueous solution

Table 1 Physiological characteristics of strain W5C and other Cellulophaga type strains

Genome Properties of W5C

Sequencing of the W5C genome yielded more than 3.6 million reads, with an N50 value of 201,560 and a coverage of 229. The reads assembled into 51 contigs with 3,803,581 nucleotides and a GC content of 32.03% (Fig. 2a). Plasmid sequences were not found in W5C. The initial annotation resulted in 3334 open reading frames (ORFs), 34 tRNA-coding genes, and 5 rRNA-coding genes. A total of 3128 genes were assigned to the Clusters of Orthologous Groups (COGs), 55.6% of which have designated putative functions while the rest were annotated as hypothetical proteins (Table S3). The genome of W5C is slightly larger than that of its closest relative DSM 7489T (3.76 Mbp); however, it is smaller than the genome of DSM 14237T (4.89 Mbp) [1, 22].
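The assembly statistics quoted above follow standard definitions, sketched here for readers unfamiliar with them. N50 is taken as the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the assembly size. GC content is the share of G and C among unambiguous bases. The function names and input values are hypothetical and are not data from the W5C assembly.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length of the contig at which the running total reaches half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable for non-empty input")


def gc_percent(sequence: str) -> float:
    """GC content in percent, counting only A, C, G and T."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * gc / total


if __name__ == "__main__":
    # Hypothetical contig lengths and sequence, for illustration only.
    print(n50([10, 8, 6, 4, 2]))
    print(gc_percent("GGCCAATT"))
```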

Fig. 2

Genome map of strain W5C (a) and comparative genomics OrthoANI analysis of strain W5C with related Cellulophaga species (b). A complete description of COG categories is supplied as supplementary Table S3. The OrthoANI values with strain W5C were calculated using the OAT software. Orthologous high-scoring pairs between two genome sequences were selected and included in subsequent calculations

W5C Belongs to a Novel Species of Cellulophaga

Species identification of W5C was based on genome relatedness indices such as OrthoANI, GGDC, digital DDH (dDDH) [3], and Maximal-Unique Match (MUMmer) alignment. The OrthoANI value for W5C against DSM 7489T is 90.5%, which is below the species boundary cut-off of around 95% (Fig. 2b). This indicates that W5C belongs to a distinct species. OrthoANI values of W5C against other Cellulophaga strains were also < 90.5% (Fig. 2b), while the traditional ANI values were < 90.3% (Fig. S2A). GGDC analysis for W5C showed values of > 0.096 against other Cellulophaga strains (Fig. S2B). Furthermore, dDDH simulations indicated that W5C belongs to a distinct species, as its estimated dDDH was 41.0% with DSM 7489T, 41.1% with HI1, 41.1% with KL-A, 18.9% with NN016038, 18.6% with strain 18, and 18.8% with DSM 14237T (threshold is at 79% for delineating subspecies) [17]. Additionally, MUMmer alignment (Fig. S3) showed that only selected regions of the W5C genome have high similarity with DSM 7489T. Based on the differences between the genome sequence of W5C and those of other known Cellulophaga strains, it is proposed that W5C belongs to a different species, named Cellulophaga omnivescoria sp. nov. W5C.
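The threshold comparison in this paragraph can be restated as a short check. The sketch below uses the OrthoANI and dDDH values quoted above together with the cut-offs given in the text (about 95% OrthoANI for species, 79% dDDH for subspecies). The 70% dDDH species cut-off is an added assumption reflecting the conventional boundary; it is not stated in the text. All names are hypothetical.

```python
ANI_SPECIES_CUTOFF = 95.0     # % OrthoANI, approximate species boundary quoted in the text
DDH_SUBSPECIES_CUTOFF = 79.0  # % dDDH, subspecies threshold quoted in the text
DDH_SPECIES_CUTOFF = 70.0     # % dDDH, ASSUMPTION: conventional species boundary, not from the text

# Values for W5C against reference strains, as quoted in the paragraph above.
ORTHOANI_VS_DSM_7489T = 90.5
DDDH_VS_STRAINS = {
    "DSM 7489T": 41.0,
    "HI1": 41.1,
    "KL-A": 41.1,
    "NN016038": 18.9,
    "strain 18": 18.6,
    "DSM 14237T": 18.8,
}


def below_cutoff(value: float, cutoff: float) -> bool:
    """True when a relatedness value falls under the given boundary."""
    return value < cutoff


def all_below(values: dict, cutoff: float) -> bool:
    """True when every pairwise value falls under the given boundary."""
    return all(below_cutoff(v, cutoff) for v in values.values())


if __name__ == "__main__":
    print("below ANI species cut-off:", below_cutoff(ORTHOANI_VS_DSM_7489T, ANI_SPECIES_CUTOFF))
    print("below dDDH subspecies cut-off:", all_below(DDDH_VS_STRAINS, DDH_SUBSPECIES_CUTOFF))
    print("below dDDH species cut-off (assumed):", all_below(DDDH_VS_STRAINS, DDH_SPECIES_CUTOFF))
```

Under these inputs every comparison falls below its cut-off, which is consistent with the proposal that W5C represents a distinct species.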

Metabolism of C. omnivescoria sp. nov. W5C

The genome content analysis reveals that C. omnivescoria sp. nov. W5C has an intact EMP pathway. The genes for the pentose phosphate pathway are present except for the glucose-6-phosphate 1-dehydrogenase. The Entner–Doudoroff pathway is incomplete, with the glucose 6-phosphate dehydrogenase missing. This pathway could be involved in alginate assimilation as observed in other alginolytic bacteria [24]. The genome has genes encoding the pyruvate dehydrogenase complex (E1, E2, and E3). The tricarboxylic acid cycle genes are complete but none was found for the glyoxylate shunt, indicating that the isolate is incapable of assimilating acetate [4]. Genes for oxidative phosphorylation were found, while there is no genetic evidence conferring capability for anaerobic or fermentative respiration. W5C most likely stores its energy and phosphorus in the form of polyphosphates, given the presence of polyphosphate kinases and exopolyphosphatase. Genes were found for a nitrate/nitrite transport system, ferredoxin–nitrate reductase, and dissimilatory nitrate reduction.

The genome also contains genes for the d-xylose isomerase pathway while none was found for l-arabinose assimilation. The Leloir pathway genes are present, while the DeLey–Doudoroff pathway is incomplete since galactose dehydrogenase is missing. This incomplete pathway could be involved in carrageenan degradation as proposed by Lee et al. [12]. The genome sequence also reveals the presence of 31 CDSs for sulfatases, 23 of which are located within predicted polysaccharide utilization loci (PULs). This suggests that W5C is capable of assimilating sulfated polysaccharides such as fucoidan or carrageenan, and indicates its potential for the utilization of marine polysaccharides. Unfortunately, no report has described or elucidated the genetic regions for polysaccharide utilization in related strains.

Abundance of Polysaccharide Degradation ORFs

The genome sequence data of C. omnivescoria sp. nov. W5C were searched for both hydrolytic (CAZymes) and metabolic enzymes for polysaccharide degradation. The W5C genome has 149 ORFs annotated as CAZymes: 64 glycoside hydrolases (GH), 39 glycosyltransferases (GT), 4 polysaccharide lyases (PL), 20 carbohydrate esterases (CE), 19 carbohydrate-binding modules (CBM), and 3 auxiliary activities (AA) (Table S4). These correspond to 4.8% of protein-coding genes, which is typical for bacteria with several PULs [16]. This is higher than in DSM 7489T and DSM 14237T, which only have 3.1 and 3.4%, respectively [1, 14, 22]. The presence of GHs and PLs in the genome indicates the potential of W5C to assimilate a wide range of polysaccharides such as agar, alginate, carrageenan, arabinoxylan, and fucoidan.
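The class counts above can be tallied to confirm the total and to show how a percentage of this kind is obtained. The text does not state which gene count serves as the denominator, so the sketch below computes the share against both the 3334 ORFs and the 3128 COG-assigned genes reported in the genome properties. Treating either as the denominator is an assumption, and the variable names are hypothetical.

```python
# CAZyme class counts as listed in the text (Table S4 is not reproduced here).
CAZYME_COUNTS = {
    "GH": 64,   # glycoside hydrolases
    "GT": 39,   # glycosyltransferases
    "PL": 4,    # polysaccharide lyases
    "CE": 20,   # carbohydrate esterases
    "CBM": 19,  # carbohydrate-binding modules
    "AA": 3,    # auxiliary activities
}

TOTAL_ORFS = 3334          # ORFs from the initial annotation
COG_ASSIGNED_GENES = 3128  # genes assigned to COGs


def share_percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, in percent."""
    return 100.0 * part / whole


total_cazymes = sum(CAZYME_COUNTS.values())
share_of_orfs = share_percent(total_cazymes, TOTAL_ORFS)          # ASSUMED denominator
share_of_cog = share_percent(total_cazymes, COG_ASSIGNED_GENES)   # ASSUMED denominator

if __name__ == "__main__":
    print(total_cazymes)
    print(round(share_of_orfs, 1), round(share_of_cog, 1))
```

With these inputs the counts sum to 149, and the share rounds to 4.5% of all ORFs or 4.8% of COG-assigned genes.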

Agar degradation is attributed to four annotated β-agarases (GH16) and two neoagarobiose hydrolases (GH117) (Fig. S4). The presence of alginate lyases (PL6, PL7, and PL17) in the genome is also indicative of its capability to assimilate alginate (Fig. S5). Furthermore, carrageenan degradation capability is evidenced by the presence of two ι-carrageenases (GH82), two λ-carrageenases (GHNC), and one κ-carrageenase (GH16) (Fig. S6). Arabinoxylan degradation is attributed to an α-l-arabinofuranosidase (GH3), whereas degradation of fucoidan and fucosides is attributed to α-l-fucosidase (GH29). Glucan degradation is facilitated by three α-glucosidases (GH13), an endo-β-1,3-glucanase (GH16), and two β-glucosidases (GH3). Porphyran is also likely degraded by two β-porphyranases (GH16). Interestingly, GH74 was annotated as endo-xyloglucanase responsible for the degradation of xyloglucans, which are structural polysaccharides in green macroalgae [9]. GH109 was also annotated as α-N-acetylgalactosaminidase, and GH110 as α-d-galactosidase.

Marine PUL of W5C

A closer look at the genetic structure of C. omnivescoria sp. nov. W5C for the utilization of several marine polysaccharides reveals the presence of several PULs (Fig. 3). The criterion for distinguishing these PULs was based on the proximity of CAZymes and transporters homologous to SusC/SusD-like protein, or TonB-dependent receptor/transporter proteins [24].
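The proximity criterion described above can be pictured as a window scan over genes in genomic order. The sketch below is a hypothetical illustration and not the procedure used in this study. Each gene is reduced to a label ("cazyme", "transporter" for SusC/SusD-like or TonB-dependent proteins, or "other"), and the window size of five genes is an arbitrary assumption.

```python
from typing import List, Tuple


def find_candidate_puls(kinds: List[str], window: int = 5) -> List[Tuple[int, int]]:
    """Return index ranges where a transporter has a CAZyme within `window` genes.

    `kinds` lists one label per gene in genomic order. Overlapping or
    adjacent ranges are merged into a single candidate locus.
    """
    hits = []
    for i, kind in enumerate(kinds):
        if kind != "transporter":
            continue
        lo = max(0, i - window)
        hi = min(len(kinds) - 1, i + window)
        nearby = [j for j in range(lo, hi + 1) if kinds[j] == "cazyme"]
        if nearby:
            members = nearby + [i]
            hits.append((min(members), max(members)))

    merged: List[Tuple[int, int]] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Hypothetical gene order, for illustration only.
    genes = ["other", "cazyme", "transporter", "transporter", "cazyme",
             "other", "other", "other", "other", "other",
             "other", "other", "transporter", "other"]
    print(find_candidate_puls(genes, window=2))
```

In practice, a locus flagged this way would still need manual inspection of the annotated functions before being assigned to a substrate.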

Fig. 3

Predicted polysaccharide utilization loci in W5C for agar, alginate, and carrageenan

The agar PUL in the genome includes four putative β-agarases, nine ORFs for putative sulfatases, three ORFs for putative β-galactosidases, and several genes for the metabolism of carbohydrates (Fig. 3). This PUL contains putative genes coding for enzymes involved in the metabolism of l-AHG, which have > 80 and > 65% identity with l-AHG cycloisomerase and l-AHG dehydrogenase from Postechiella marina M091 and Vibrio sp. EJY3, respectively [11, 28]. The proximity of these metabolic gene clusters to the agar PUL suggests the evolutionary development of a robust system for the complete utilization of agar in W5C. Moreover, this agar PUL is also present in other Cellulophaga strains, with only a few regions differing from the W5C genome (Fig. S4). This arrangement offers several advantages for the fitness and survival of W5C, including stable genetic regions, co-transcription and co-expression of related enzyme functions, and less complex gene regulation [6]. To date, the genomes of other known agarolytic marine bacteria do not contain a similar PUL, in which the ORFs for the hydrolysis of agar and for the complete metabolism of its hydrolysis products are present together.

The PUL for alginate degradation contains three CDSs which were annotated as CAZymes belonging to the PL family (Fig. 3). One of these CDSs is unique to W5C, since it has no homologue in other strains (Fig. S5). The PULs for carrageenan are in different regions of the genome (Fig. 3). Two ι-carrageenase CDSs were found in one PUL. Meanwhile, two λ-carrageenase CDSs were in another PUL which also contains other types of oligosaccharide transporters such as the ABC transporter, and the CDSs for the first two enzymes involved in the Leloir pathway for galactose metabolism [23]. A putative κ-carrageenase was also found in the genome, but is located far from any PUL. Interestingly, three of these carrageenases have no homologues in other Cellulophaga strains according to the comparative genomics results (Online Resource 2 and Fig. S6). The hydrolysis of chitin in C. omnivescoria sp. nov. W5C is presumed to be carried out by the enzymes annotated as GH23.

Other PULs are also shown (Fig. 3). PUL A contains an ORF for the rare enzyme α-1,4-polygalactosaminidase, which is involved in the degradation of galactosaminogalactans [25]. PUL B contains ORFs annotated as arabinofuranosidase as well as genes for the DeLey–Doudoroff pathway. This PUL may be involved in the degradation of porphyran, given the presence of ORFs annotated as sulfatases (Fig. 3). PULs C and D are most likely involved in the utilization of cellulose/carboxymethylcellulose. These two PULs carry ORFs for enzymes with homology to the GH5 and GH13 families, which are associated with cellulase and amylase, respectively. The function of PUL E is unknown. However, BLASTp analysis reveals that its two CDSs show homology with enzymes belonging to the GH1 family for glucosidase and endoglucanase. Interestingly, these were not classified as GH1 enzymes during CAZyme identification. PUL F is most likely involved in laminarin degradation. This PUL contains enzyme homologues of laminarinase (GH16) and glucosidase (GH3) (Fig. 3).

In summary, the newly isolated strain C. omnivescoria sp. nov. W5C shows a capability to degrade a wide range of natural polysaccharides such as agar, alginate, chitin, carrageenan, laminarin, and cellulose/carboxymethylcellulose. This is evidenced by the presence of several annotated CAZymes, as well as by the occurrence of multiple PULs in the genome. These PULs are excellent sources for the discovery of novel hydrolytic enzymes with unique characteristics useful for biorefining of marine renewable biomass. Currently, a few of the ORFs in these PULs are under investigation for their hydrolytic activities and their prospects in improving the existing technologies for marine polysaccharides depolymerization.

Description of Cellulophaga omnivescoria sp. nov.

Cellulophaga omnivescoria (om.ni.vis.kor.yɑ. N.L. adj. omnis, all, everything; N.L. v. vescor, eat, devour; N.L. part. adj.; omnivescoria pertaining to the ability to consume several types of polysaccharides).

Cells are Gram-negative rods, 0.23–0.30 µm wide and 1.0–2.5 µm long. Cells lack flagella but are capable of gliding motility. Colonies on MA plates have yellow to yellow-orange pigmentation, are of low convexity, and have a flame-like edge. Colonies also have traces of opaque green iridescence. Growth is strictly aerobic with an oxidative type of metabolism. Growth is optimal at 28 °C, pH 7.0–7.5, and 8% salinity, with tolerance up to 15% NaCl. Cells are capable of hydrolyzing agar, alginate, esculin, gelatin, and carrageenan, but not starch; nitrate reductase- and oxidase-negative, but catalase-positive; capable of producing esterases (C4 and C8), lipase, galactosidase, glucosidase, leucine and valine arylamidases, trypsin, α-chymotrypsin, acid and alkaline phosphatase, α-mannosidase, and α-fucosidase; and positive for metabolism of glucose, mannose, mannitol, capric acid, and maltose, but not starch, arabinose, gluconate, adipic acid, malate, and citrate. The fatty acid profile mainly consists of penta- and hexadecanoic acids (20.8% i15:1ω10c, 34.8% i15:0, 15.7% 15:0, 10.7% 16:1ω7c, and 9.7% 16:0). G + C content of the genome is 32% (genome sequence).

The type strain is JCM 32108T (= KCTC 13157BPT). The strain was isolated from the surface of red algae from the east coast (Yeosu City) of the Republic of South Korea [34°46′39.1″N 127°44′41.5″E].