Introduction

The genus Vibrio consists of over 140 species (www.bacterio.net/vibrio.html). However, within this genus there are closely related species that are difficult to identify based on the 16S rRNA gene sequence. The genus is divided into many clades according to phylogenetic relationships established by Multilocus Sequence Analysis (MLSA) (Rosselló-Mora and Amann 2001; Saitou and Nei 1987). In particular, the Mediterranei clade consists of five highly similar species: V. mediterranei, V. maritimus, V. variabilis, V. thalassae, and the recently described V. barjaei (Sawabe et al. 2013; Pujalte and Garay 1986; Chimetto et al. 2011; Tarazona et al. 2014; Dubert et al. 2016).

The first species of the clade, V. mediterranei, was described in 1986 (Pujalte and Garay 1986) and was isolated from sea sediments in Valencia, Spain. Then, in 2001, V. shiloi (Vibrio shilonii corrig.) was isolated from diseased corals off the coast of Israel (Kushmaro et al. 2001). However, that same year, Thompson et al. (2001) proved that V. shiloi was a later heterotypic synonym of V. mediterranei. In 2011, V. variabilis and V. maritimus were described (Chimetto et al. 2011); these species were isolated from the zoanthid coral Palythoa caribaeorum in Sao Paulo, Brazil. In 2014, another two species were proposed: V. thalassae (Tarazona et al. 2014), isolated from seawater in Valencia, Spain, and V. madracius (Moreira et al. 2014), isolated from the scleractinian coral Madracis decactis. However, the next year, González-Castillo et al. (2015) proposed that V. madracius should be considered a later heterotypic synonym of V. thalassae. Finally, Dubert et al. (2016) described V. barjaei, which was isolated from the grooved carpet shell clam (Ruditapes decussatus) in Galicia, Spain.

All species of the Mediterranei clade were described using polyphasic taxonomy, which integrates the analysis of phenotypic, genotypic and phylogenetic characters (Colwell 1970). Because the results of phenotypic tests are variable, polyphasic taxonomy has also incorporated tools with a high resolution power, such as whole genome analysis (WGA).

In the era of genomic analyses, DNA–DNA hybridization (DDH) seems to be an outdated method that needs to be replaced (Richter and Rosselló-Móra 2009; Chun et al. 2018).

One of the most promising methods is the average nucleotide identity (ANI), because it provides results equivalent to those of DDH (Rosselló-Mora and Amann 2001). An important feature of this method is that only about 20% of the genome is sufficient to classify and identify bacteria (Richter and Rosselló-Móra 2009); the ANI threshold used to differentiate between species is 95–96%. Another technique is in silico DNA–DNA hybridisation (in silico DDH). Its threshold is the same as that of traditional DDH (70%) and, like ANI, it needs only about 20% of the genome to give the same result as the full genome. Both techniques are the de facto candidates to replace traditional DDH (Meier-Kolthoff et al. 2013).

The aim of the present study was to perform a detailed polyphasic, genome-based analysis of the available strains of the Mediterranei clade of Vibrio.

Materials and methods

Genome sequence data

The genomic sequences of two type strains (V. thalassae and V. barjaei) and of thirteen reference strains were obtained from the National Center for Biotechnology Information (NCBI) and the Sequence Read Archive (SRA). We sequenced the genomes of the type strains V. maritimus CAIM 1455T, V. variabilis CAIM 1454T, and V. mediterranei CAIM 316T on the Ion Torrent platform as described earlier (Quail et al. 2012; Moreira et al. 2014), with minor modifications as follows. Library preparation was carried out using the Ion Plus Fragment Library Kit with 1 μg DNA (in Low TE, 50 μL). DNA was fragmented using the BioRuptor® Sonication System as described in the Ion Plus Fragment Library Kit protocol. End repair, adapter ligation, nick repair, and amplification (10 cycles) were also performed as described in that protocol. Library fragments (300–350 bp) were selected by agarose gel (2% m/v) electrophoresis. The concentration of the libraries was determined with the Ion Library Quantitation Kit using TaqMan® in a CFX96™ Real-Time PCR System (BIO-RAD®). The amount of library required for template preparation was calculated using the Template Dilution Factor calculation described in the manufacturer's protocol. Emulsion PCR and enrichment steps were carried out with the Ion OneTouch™ 200 Template Kit v2, and Ion Sphere Particle quality assessment was carried out as outlined in its protocol. Sequencing was performed on a 318 chip with barcoding, using the Ion PGM™ 200 Sequencing Kit according to the recommended protocol, and Torrent Suite 1.5 was used for analyses.

The reads were assembled de novo with Newbler (RunAssembly ver. 2.3). These genomes were annotated using the RAST server (Rapid Annotation using Subsystem Technology) (Aziz et al. 2008) and have been deposited at DDBJ/EMBL/GenBank under the project accession numbers [GenBank: JYJJ00000000 for Vibrio maritimus CAIM 1455T], [GenBank: JYJK00000000 for Vibrio variabilis CAIM 1454T] and [GenBank: JYJL00000000 for Vibrio mediterranei CAIM 316T]. On the basis of the genome sequences, the G + C percentage can be calculated directly (Table 1).
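As a minimal illustration of how a G + C percentage can be derived directly from an assembled sequence, the following sketch counts G and C over all unambiguous bases of a set of contigs. The function name `gc_percent` and the toy contigs are hypothetical and are not part of the pipeline used in this study; excluding N and other ambiguity codes from the denominator is an assumption of the sketch.

```python
def gc_percent(contigs):
    """Return the G + C percentage over all contigs of a draft genome.

    Only unambiguous bases (A, C, G, T) are counted; excluding N and other
    ambiguity codes from the denominator is an assumption of this sketch.
    """
    gc = 0
    total = 0
    for seq in contigs:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        total += sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / total


# Toy contigs (hypothetical, not real Vibrio sequence).
example_contigs = ["ATGCGC", "NNATAT", "ggcc"]
print(round(gc_percent(example_contigs), 1))
```

In practice the contigs would be read from the assembled FASTA file of each strain, and the value would be compared with the one listed for that strain in Table 1.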

Table 1 Genome information of the strains used in this study

Phylogenetic analysis

The sequences of the 16S rRNA gene, of eight housekeeping genes (ftsZ, gapA, gyrB, mreB, pyrH, recA, rpoA, and topA; MLSA) and of 139 single-copy genes (SCG) were extracted from the draft genome sequences. Sequence data analyses were performed with Geneious ver. 7.1.8. Sequence similarity of the 16S rRNA gene was determined using the EzTaxon-e server (Kim et al. 2012). Phylogenetic trees were reconstructed using the neighbour-joining, maximum-likelihood, and maximum-parsimony algorithms with MEGA ver. 5.05 (Tamura et al. 2011). In addition, a phylogenetic tree was constructed with 20 species of the genus Vibrio to locate the position of the JCM strains (JCM 19235, JCM 19239 and JCM 19240) and of strain T01 within the genus. The 16S rRNA gene was analyzed for recombination events with the program SplitsTree v4 (Huson and Bryant 2006).

Genomic analysis

The ANI was calculated according to Richter and Rosselló-Móra (2009). The OrthoANI was calculated according to Lee et al. (2015); the software first calculates the original ANI values and then calculates the OrthoANI values with the OrthoANI algorithm.

Genomic comparisons between all the available genomes of the Mediterranei clade were done with the CMG-Biotools package (ver. 2.2) (Vesth et al. 2013), which relies on several tools: RNAmmer (Lagesen et al. 2007) to locate rRNA sequences in genomic DNA, Prodigal (Hyatt et al. 2010) for gene finding, and its own Perl scripts to calculate the proteomic relationships. The amino acid and codon usage was calculated using the CMG-Biotools BioPerl modules as the fraction of each amino acid or codon count relative to the total count of amino acids or codons.
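The fraction-based calculation described above can be sketched in a few lines. This is not the CMG-Biotools implementation (which uses BioPerl modules); it is a minimal Python illustration in which the function name `codon_fractions`, the toy coding sequences, and the choice to skip ambiguous or partial codons are all assumptions.

```python
from collections import Counter


def codon_fractions(cds_sequences):
    """Return {codon: fraction of all codons} for a list of in-frame CDSs.

    Codons containing ambiguous bases and trailing partial codons are
    skipped; this handling is an assumption of the sketch.
    """
    counts = Counter()
    for cds in cds_sequences:
        cds = cds.upper()
        # Walk the sequence codon by codon, ignoring an incomplete last codon.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no complete codons found")
    return {codon: n / total for codon, n in counts.items()}


# Toy coding sequences (hypothetical).
usage = codon_fractions(["ATGAAAAAATTTTAA", "ATGCAAAAATTGTAA"])
for codon, fraction in sorted(usage.items(), key=lambda kv: -kv[1]):
    print(codon, round(fraction, 2))
```

The same counting scheme applies to amino acids if the translated protein sequences are used in place of codons.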

The core and pangenome of the Mediterranei clade were obtained with Anvi’o, an open-source, extensible software platform that also provides an interactive interface (Eren et al. 2015). The phylogenomics workflow (http://merenlab.org/2017/06/07/phylogenomics/) was followed; the HMM profile for 139 single-copy genes from Campbell et al. (2011) was used.

In silico phenotyping was performed with two programs that predict probable phenotypes on the basis of curated databases. The genomes of the Mediterranei clade were annotated with Traitar (Weimann et al. 2016); this program obtains its phenotypic data from the GIDEON database (Berger 2005). In silico phenotyping was also performed with Vibriophenotyping (ver. 1.1) (Amaral et al. 2014).

Results and discussion

Phylogenetic analysis

The phylogenetic trees based on sequences of the 16S rRNA gene (Fig. S1) and of 139 single-copy genes (SCG) (Fig. 1) of 20 species of the genus Vibrio showed that strain T01, classified in GenBank as V. variabilis, does not correspond to the species to which it was assigned. Instead, strain T01 forms an independent branch outside the Mediterranei clade and could represent a new species of Vibrio. The phylogenetic tree showed that the Mediterranei clade is composed of five species (V. mediterranei, V. maritimus, V. variabilis, V. thalassae, and V. barjaei) and that the clade could be divided into two clusters, one composed of V. maritimus and V. variabilis, and the other composed of V. mediterranei, V. thalassae, and V. barjaei. In addition, the similarity of the MLSA genes and the 16S rRNA gene between different species of the Mediterranei clade varied considerably (between 89.5 and 97.2%), whereas the intraspecies similarity was at least 98.4% using nine genes. For comparison, Pascual et al. (2010) reported an intraspecies similarity of at least 93% and interspecies similarities between 86 and 93% for species of the Harveyi clade using seven genes.

Fig. 1

Phylogenetic tree based on concatenated sequences of 139 single-copy genes (SCG) of type and reference strains of the Mediterranei clade, inferred by the maximum-likelihood method based on the Jones–Taylor–Thornton model. The Vibrio cholerae type strain was used as out-group. Scale bar, base substitutions. Sequences were aligned with MAFFT v7.215, uninformative regions were deleted with GBlocks v0.91b, and the tree was constructed with Geneious v10.0.2 (Jaccard-Neighbor joining, 1000 bootstrap support) and rendered with FigTree v1.4.

Codon usage

The codon and amino acid usage was calculated for the 12 complete genomes and visualized as heatmaps (Figs. S2, S3). The Mediterranei clade was clearly divided into two clusters, the first comprising V. mediterranei, V. thalassae, and V. barjaei, and the second comprising V. maritimus and V. variabilis.

The codons most used by members of the Mediterranei clade were CAA, TTG, AAA, and TTT, and the least used codons were CCC, GGG, CCT, AGG, CCG, CGG, TCC, and GGA. The most used amino acids were arginine, serine, and leucine, and the least used were tryptophan and methionine. These results are similar to those reported for the genus Veillonella (Vesth et al. 2013): the two groups share one of the most used codons (AAA) and six of the least used codons (CCC, GGG, CCT, AGG, CGG, and TCC), and they also coincide in the most used amino acid, leucine, and in the least used amino acid, methionine.

The ANI between different species of the Mediterranei clade varied considerably, between 76 and 95% with ANIb (Fig. 2) and between 84 and 95% with ANIm (Fig. S4), and the intraspecies similarity was at least 95% (ANIb) and 96% (ANIm). For the pair V. thalassae MD16T and V. thalassae A-354, the ANI was 95% (ANIb) and 97% (ANIm); for the pair V. variabilis CAIM 1454T and JCM 19239, it was 81% (ANIb) and 87% (ANIm). Among V. mediterranei strains, the ANI reached a maximum of 99% with both methods (ANIb and ANIm). The ANI values obtained for the Mediterranei clade species are consistent with the minimal standards proposed by Chun et al. (2018).
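To make the use of the species threshold explicit, the following sketch applies a 95% ANI cut-off (the lower bound of the 95–96% range cited in the Introduction) to the two ANIb values reported above. The function name `same_species` and the data layout are hypothetical, and the reported values are rounded, so the sketch only illustrates the decision rule and is not a re-analysis.

```python
ANI_SPECIES_THRESHOLD = 95.0  # lower bound of the 95-96% range cited in the text


def same_species(ani_percent, threshold=ANI_SPECIES_THRESHOLD):
    """Return True when an ANI value meets the species threshold."""
    return ani_percent >= threshold


# ANIb values reported in the text for two strain pairs.
pairs = {
    ("V. thalassae MD16T", "V. thalassae A-354"): 95.0,
    ("V. variabilis CAIM 1454T", "JCM 19239"): 81.0,
}

for (a, b), ani in pairs.items():
    verdict = "same species" if same_species(ani) else "different species"
    print(f"{a} vs {b}: ANIb {ani:.0f}% -> {verdict}")
```

Under this rule the V. thalassae pair falls within one species, whereas JCM 19239 falls well outside the species boundary of V. variabilis.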

Fig. 2

Average nucleotide identity (ANIb) heatmap. ANIb was calculated for all 18 genomes based on their genome sequences. The ANIb percentages were plotted in one heatmap using R. The heatmap was clustered in 2D, reordering the organisms and the ANIb values to show the shortest distance between them. Dendrograms were drawn for ANIb and can be used to visualize the differences between organisms. (Color figure online)

Furthermore, we performed analyses for only the type strains with OrthoANI (Fig. S5), which is a more complete variant of the ANI (Lee et al. 2015). The results obtained by OrthoANI were highly similar to those obtained with the ANI, with the advantage that a heatmap and dendrogram showing the phylogenetic relationships of the Mediterranei clade were obtained (Fig. S5). Considering the threshold, the JCM strains (JCM 19235, JCM 19239, and JCM 19240) do not belong to the species to which the strains were originally assigned, coinciding with the results shown by the AAI.

Proteome comparisons

The proteomes were constructed for the 18 genomes with CMG-Biotools. The BLAST matrix (Fig. S6) illustrates that the conservation between genomes of the Mediterranei clade is generally low (29.6–99.1%). The BLAST matrix value for the pair V. thalassae MD16T and V. thalassae A-354 was 38.8%, and that for the pair V. variabilis CAIM 1454T and JCM 19239 was 40.9%. Among V. mediterranei strains, the maximum value was 99.1%. These results are similar to those for other vibrios: V. cholerae strains share between 70 and 80% of their proteins, while the similarity with organisms outside the species ranged between 30 and 45% (Vesth et al. 2010). The homology within proteomes (paralogues; red squares in Fig. S6) did not vary considerably between different species of the Mediterranei clade (between 2.6 and 4.9%), coinciding with the values reported for V. cholerae strains (1.3–5.3%) (Vesth et al. 2010).

The core and pangenome of the Mediterranei clade

The pangenome of the 18 strains of the Mediterranei clade (Fig. 3) revealed a total of 13,094 gene clusters, which include 2057 core gene clusters (47,283 genes in all 18 genomes, average of 2626.8 genes per genome), 1488 persistent gene clusters (28,108 genes, average 1561.5 genes per genome) and 9549 gene clusters of accessory genes (24,645 genes, average 530.5 genes per genome). Of these accessory gene clusters, 5462 (5921 genes) belong to genes associated with a single genome (singletons). It should be highlighted that these results are specific to the type strains and the reference strains of the Mediterranei clade included in this study. None of the accessory genes is found specifically in all strains of the Mediterranei clade: V. mediterranei (n = 10) showed 3935 gene clusters (24,645 genes), V. maritimus (n = 3) 3261 gene clusters (19,602 genes), V. variabilis (n = 2) 2908 gene clusters (17,604 genes), V. thalassae (n = 2) 3070 gene clusters (19,162 genes) and V. barjaei (n = 1) 1930 gene clusters (18,036 genes). The core genome of the Mediterranei clade is relatively large compared with that of the class Negativicutes, which has a core genome of 134 gene families; however, it is consistent with previous pangenomic investigations, such as that reported for V. cholerae (core of 2500 gene families) (Vesth et al. 2010). The genomic groups that emerged from this analysis (378 SCGs selected from the core) coincided with the phylogenetic clusters of the Mediterranei clade (Fig. 1). The distinction of these groups was also supported by differences in average nucleotide identity values between genomes (Fig. 3). The genetic content shared between genomes was an effective predictor of their phylogenetic relationships (Delmont et al. 2018).
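The way gene clusters are assigned to pangenome categories can be illustrated with a small sketch. It is not the Anvi’o implementation: the function name `partition_gene_clusters`, the toy presence data, and the three-way split are assumptions. In particular, the "persistent" category used above is not defined in the text, so the sketch distinguishes only core clusters (present in every genome), singletons (present in exactly one genome), and the remaining accessory clusters.

```python
def partition_gene_clusters(presence, n_genomes):
    """Split gene clusters into core, singleton and other accessory sets.

    `presence` maps a gene-cluster id to the set of genomes that carry it.
    Core = present in every genome; singleton = present in exactly one.
    """
    core, singletons, accessory = set(), set(), set()
    for cluster, genomes in presence.items():
        k = len(genomes)
        if k == n_genomes:
            core.add(cluster)
        elif k == 1:
            singletons.add(cluster)
        elif k > 1:
            accessory.add(cluster)
    return core, singletons, accessory


# Toy presence data for three hypothetical genomes A, B and C.
toy_presence = {
    "GC1": {"A", "B", "C"},
    "GC2": {"A", "B"},
    "GC3": {"C"},
    "GC4": {"A", "B", "C"},
}
core, singletons, accessory = partition_gene_clusters(toy_presence, n_genomes=3)
print(sorted(core), sorted(singletons), sorted(accessory))
```

With 18 genomes, the same rule would place a gene cluster in the core only if it is detected in all 18 strains.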

Fig. 3

The pangenome for the 18 genomes of the Mediterranei clade was calculated based on genome sequences. The values were calculated with Anvi’o (ver. 5.5). (Color figure online)

In silico phenotyping

The genomes of the Mediterranei clade were predicted to be positive for the following growth-related phenotypic features: growth at 6.5% NaCl, at 42 °C, on MacConkey agar, and on ordinary blood agar, as well as bile susceptibility (Fig. S7). Positive results were predicted for alkaline phosphatase, lipase, and nitrate reduction to nitrite, and negative results for coagulase production and pyrrolidonyl-beta-naphthylamide. With respect to carboxylic acids, the genomes were negative for malonate and tartrate use. Genomes were negative for glucose oxidizer, Voges–Proskauer, esculin hydrolysis, l-arabinose, and salicin, and positive for glucose fermentation, methyl red, d-mannitol, d-mannose, maltose, sucrose, and trehalose. They were positive for indole, proteolysis, and catalase, and negative for melibiose, spore formation, and hydrogen sulfide. However, the most interesting result was that the phylogenetic tree generated with Traitar was able to separate the Mediterranei clade into two clusters. Therefore, this method demonstrates a high resolution power based on phenotypic features predicted from genomes.

With the Vibriophenotyping program (Table S1), all species of the Mediterranei clade were negative for l-arabinose, ornithine, and Voges–Proskauer; only d-mannose was positive for all species of the clade. The species of the Mediterranei clade thus share four (30.7%) of the thirteen phenotypic features analysed by the Vibriophenotyping program. These four shared features were also compared with the Traitar predictions: three were identical (l-arabinose, d-mannose, and Voges–Proskauer), whereas the ornithine results did not coincide. For the type strains, the results of the five tests obtained with both methods were compared with data from the literature; the results coincided in most cases, except for the d-mannose test (Table S1). The indole phenotype was positive for all species with Traitar, but the Vibriophenotyping program gave a negative result for one strain, V. thalassae A-354. The program found the enzyme related to this phenotype (tryptophanase, tnaA) with an identity of 91.1%; however, the match did not cover more than 70% of the sequence length, and the program therefore scored the phenotype as negative.
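The tnaA case shows why a high identity alone does not yield a positive call. The decision rule can be sketched as follows; this is not the Vibriophenotyping code. The function name `gene_present`, the identity cut-off, and the example lengths are hypothetical, and only the requirement that a hit cover more than 70% of the sequence length is taken from the text.

```python
def gene_present(identity, aligned_length, reference_length,
                 min_identity=0.70, min_coverage=0.70):
    """Call a phenotype marker gene present only if both cut-offs are met.

    min_coverage follows the 70% length requirement described in the text;
    min_identity is a hypothetical value chosen for illustration.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    coverage = aligned_length / reference_length
    # The hit must cover MORE than min_coverage of the reference sequence.
    return identity >= min_identity and coverage > min_coverage


# A hit with 91.1% identity that spans only 60% of a hypothetical
# 1000-residue reference is rejected because of its coverage.
print(gene_present(identity=0.911, aligned_length=600, reference_length=1000))
```

A fragmented draft assembly can therefore produce a negative prediction even when the gene is largely intact, which is one reason such calls are worth checking against conventional tests.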

In the in silico phenotyping, the V. mediterranei strains shared seven phenotypic features (53.8%), and the pair V. thalassae MD16T and V. thalassae A-354 shared eleven phenotypic features (84.6%). These results are similar to those reported by González-Castillo et al. (2015) using conventional phenotyping methods. Traitar, the microbial trait analyzer, is positioned as a useful tool for fast primary identification. Conventional phenotypic identification is laborious, time-consuming and costly; moreover, most commercial identification kits were designed for clinical use (Noguerola and Blanch 2008) and are not readily suitable for environmental strains. The power of the in silico methods lies in searching genomes for enzymes related to phenotypes, which avoids the aforementioned problems. These bioinformatics tools will become powerful aids for researchers, saving time and cost and providing a more reliable and standardized bacterial phenotyping.

Conclusions

The Mediterranei clade is composed of five valid species (V. mediterranei, V. maritimus, V. variabilis, V. thalassae, and V. barjaei) arranged in two subclades, and two potential new species, JCM 19235 and JCM 19239 forming one species and JCM 19240 another.

Strain T01, originally identified as V. variabilis, is very distant from the strains of the Mediterranei clade and is most probably a new species of Vibrio.

The genomic comparison method with the highest resolution power was ANI (ANI and OrthoANI), followed by MLSA. The MLSA based on the 139-gene scheme also showed a high resolution power, separating the Mediterranei clade into two clusters: one with V. maritimus, V. variabilis and the two potential new species, and another with V. mediterranei, V. thalassae, and V. barjaei. This separation was also supported by genomic methods (OrthoANI, codon and amino acid usage). In silico phenotyping, especially with Traitar, also showed a strong discriminating power. These results are important since it is now possible to replace many conventional wet-lab methods, although more thorough analyses are needed to confirm the results.

Finally, based on the results of this polyphasic approach, we provide further evidence that supports the substitution of DDH with ANI (ANI and OrthoANI) and suggests that, in the near future, a genomic sequence (or sequences) will be all that is needed to describe a new bacterial species.