Introduction

The genus Pseudoalteromonas denotes a group of Gram-negative, aerobic marine bacteria belonging to the class Gammaproteobacteria, firstly established in 1995 (Gauthier et al., 1995).

In these two last decades, several Pseudoalteromonas strains have been isolated from Polar Regions, inshore waters or surfaces of marine organisms, and were shown to synthesize a wide range of bioactive molecules (Kobayashi, 2003; Feller, 2013; Yu et al., 2013). In this context, it has been recently shown that strains belonging to this genus and isolated from different ecological niches in Antarctica possess the ability to completely inhibit the growth of human opportunistic pathogens belonging to the Burkholderia cepacia complex (Bcc) via the synthesis of a plethora of different antimicrobial compounds (Papaleo et al., 2012; 2013); moreover, some of them are also able to synthesize antibiofilm molecules (Papa et al., 2013; Parrilli et al., 2015).

Another intriguing feature of some Pseudoalteromonas representatives is the association with marine eukaryotic hosts. Indeed, complex communities with biofouling activities have been shown to be beneficial to the eukaryotic host, effectively playing a role in host defence (Holmström et al., 1992; Egan et al., 2002). The study of the molecular mechanisms of antifouling activity from Pseudoalteromonas species is promising for multiple applications, such as biofouling control in aquaculture and novel drug discovery. In particular, the isolation of novel antibiotics is strategically important, considering the global threat for human’s health posed by the emergence of multi-drug resistant (MDR) pathogens, mainly due to antibiotic over usage (Paitan & Ron, 2014). Finally, the most studied Pseudoalteromonas strain, namely Pseudoalteromonas haloplanktis TAC125 (Médigue et al., 2005), has been suggested as an alternative host for the soluble overproduction of heterologous proteins, given its ability to grow fast at low temperatures (Wilmes et al., 2010; Rippa et al., 2012; Corchero et al., 2013; Giuliani et al., 2014). All these features make the representatives of Pseudoalteromonas genus interesting biological subjects, with a great and still under-exploited biotechnological potential.

From a taxonomical viewpoint, the Pseudoalteromonas genus, together with Alteromonas, Glaciecola, Thalassomonas, Colwellia, Idiomarina, Shewanella, Moritella, Ferrimonas and Psychromonas, forms a group referred to as Alteromonas-like bacteria. Historically, the genus was first described by Gauthier et al. in 1995 when, according to a broad-scale phylogenetic analysis based on small subunit ribosomal DNA sequences, it was observed that some Alteromonas representatives did not belong to the monophyletic taxon, which included the other Alteromonas species (Gauthier et al., 1995). Based on this phylogenetic analysis and phenotypical evidences, a total of 12 Alteromonas species, together with Pseudomonas piscicida, were assigned to the genus Pseudoalteromonas.

From a methodological viewpoint, this taxonomical revision highlighted the central role of the 16S rRNA gene sequence to assess the evolutionary relationships among bacterial species. However, being based on a single gene, this method suffers some limitations, such as the limited ability to resolve closely related species (Stackebrandt and Ebers 2006), the fact that it does not represent the whole gene collection, and the poor correlation with the genome-scale method of DNA–DNA Hybridization (DDH), which is an experimental method measuring the overall similarity between two genome sequences (Schildkraut et al., 1961; McCarthy & Bolton, 1963). A higher resolution can be obtained using the sequences of multiple conserved genes with a method known as Multi-Locus Sequence Analysis (MLSA) (Stackebrandt et al., 2002).

At the early stage of the sequencing era, it was theorized how the whole-genome sequence would be the standard to determine the taxonomy (Wayne et al., 1987). Nowadays, thanks to ever decreasing costs and running time, genome sequencing for prokaryotes has become a routine, to the point that a considerable number of sequenced genomes are available in biological repositories such as GenBank.

This wealth of data can be exploited to infer phylogenetic relationships using genome-scale computational methods to provide quantitative estimation of the genomic similarity analogously equivalent to the experimental method DDH (Goris et al., 2007; Kim et al., 2014). One of the advantages provided by a computational method is that it can be used to replace DDH, thus allowing performing significant taxonomic studies at a genome scale for a vast number of species.

Although it may be erroneously considered as marginal, a correct and robust taxonomic classification of microbes plays a central role in describing the extent of microbial diversity in relation with different environments and/or eukaryotic hosts. Also, a reliable phylogenetic reconstruction is crucial in guiding the choice of which other novel organisms should be introduced in a Next Generation Sequencing (NGS) pipeline, in order to avoid oversampling a narrow taxonomic space (the “where to add taxa” problem) (Eddy, 2005; McAuliffe et al., 2005; Pardi & Goldman, 2005; Geuten et al., 2007).

In this work, we performed a comprehensive and multi-level study on the taxonomy of a panel consisting of 25 currently available Pseudoalteromonas representatives and 13 de novo sequenced genomes from Antarctic strains, by integrating different genome-scale methods, namely, genome-scale phylogeny, Average Nucleotide Identity (ANI) and Tetra Nucleotide Frequency (TNF). The benefits and pitfalls of phylogenetic analyses for taxonomic purposes using a genome-scale level set of shared genes in prokaryotes have been discussed by Rossello-Mora (2012). Concerning the ANI, it has been demonstrated how it can be used to discriminate whether two genomes belong either to the same or different species, using a fixed threshold (Goris et al., 2007). Lastly, it has been also shown that oligonucleotide frequencies exhibit species-specific patterns (Karlin & Burge, 1995; Karlin, 1998) and how, in particular, frequencies of tetranucleotides harbour a phylogenetic signal (Pride et al., 2003). This integrated approach revealed that (i) different methods produce consistent results, and (ii) incoherence is observed between genome-scale driven taxonomic annotation and current affiliations of members of the genus Pseudoalteromonas.

Materials and methods

Pseudoalteromonas dataset

The available genomic sequences of 25 Pseudoalteromonas strains were downloaded from GenBank. Additionally, the genomes of 13 Pseudoalteromonas strains isolated from different Antarctic ecological niches (marine sponges, water column, sediments) were sequenced. The genomic DNA was purified according to the protocol described by Papaleo et al. (2013) and sequenced using an Illumina HiSeq 2000 platform (Cock et al., 2010). The resulting reads were trimmed using SolexaQ to obtain a high-quality reads library, which were assembled with the tool AbySS 1.4, choosing the k-mer value leading to the best assembly, defined as the ratio between assembled nucleotides and number of contigs obtained.

Genome coherence measures

To measure the genome coherence between Pseudoalteromonas representatives, the Jspecies tool was used (Richter & Rosselló-Móra, 2009). This software implements different methods to provide a quantitative measure of genomic similarity, which can be used to estimate whether two genomes belong to the same species. The metrics implemented in Jspecies are the Average Nucleotide Identity (ANI) and the TetraNucleotide Frequencies (TNF) (Teeling et al., 2004a, b; Goris et al., 2007).

Average nucleotide identity (ANI)

The ANI provides a quantitative measure of the genomic similarity between two organisms. The Jspecies tool implements the ANI computation method described by Goris (Goris et al., 2007), in which a genome is divided into 1024 nucleotide fragments, which are mapped onto the other genome to measure the ANI. To map the sequences, we computed the ANI values using the MUMmer tool (Delcher et al., 2003). According to Goris et al. (2007), a pair of genomes with an ANI value greater than 0.96 is considered to belong to the same species.

Tetra Nucleotide Frequencies (TNF)

The TNF can be used to measure the similarity of two genomes, by computing the significance of the frequency of each oligonucleotide as Z-scores (Schbath, 1997), and by measuring the Pearson’s correlation of these values. In particular, it has been demonstrated that there is a good correlation with the similarity measures obtained using the DDH method (Teeling et al., 2004a, b). Specifically, two genomes with TNF value greater than 0.99 should be considered as belonging to the same species (Teeling et al., 2004a, b). To have a clearer signal and better visualize the species relationships, the TNF values have been transformed into binary values, considering as 1 the TNF values greater than the species threshold, and as 0 the values lower than the threshold.

Conserved genes phylogeny

In this work, a set of 1537 highly conserved genes has been found using the DuctApe suite (Galardini et al., 2014). The amino acid sequences of the proteins encoded by the 1537 conserved genes (i.e. shared by all the 38 Pseudoalteromonas genomes) have been aligned using the ClustalW software (Larkin et al., 2007) with default parameters and concatenated in a single sequence spanning 597,947 residues. Neighbour-Joining (NJ) phylogenetic tree was obtained with Mega 6 software (Tamura et al., 2013) with the following parameters: Poisson model, uniform rates among sites and 500 bootstrap replicates.

Hierarchical clustering

Hierarchical clustering was performed using the Unweighted Pair Group Method with Arithmetic mean (UPGMA).

Orthologous genes dendrogram

Using the DuctApe suite (Galardini et al., 2014), it was possible to identify 2901 groups of orthologs, which were differentially present in the Pseudoalteromonas genomes. This information has been used to produce a matrix of presence/absence. The dendrogram of this matrix has been obtained by performing a hierarchical clustering (UPGMA) using Jaccard distance.

Results

Pseudoalteromonas dataset

To gain insights into the genomic similarities between representatives of the Pseudoalteromonas genus, a dataset of 38 genomes was used. Thirteen genomes were obtained in this work from Pseudoalteromonas strains isolated from different Antarctic ecological niches (sponges, sediments, and seawater). The 38 genomes belong to bacterial strains representative of the 13 currently described Pseudoalteromonas species. The genomic features of the strains considered are given in Table 1.

Table 1 Main features of the 38 genomes from the Pseudoalteromonas strains analysed in this work

Conserved genes phylogeny

The NJ phylogenetic tree computed using the genes shared by all the 38 Pseudoalteromonas strains is given in Fig. 1. The visual inspection of this tree revealed the presence of a main group consisting of 29 sequences (corresponding to 28 strains, since the genome sequence of strain TAC125 was determined twice using two different methodologies). Since this clade comprises all the P. haloplanktis strains, this group was referred to as P. haloplanktis-like group. However, the P. haloplanktis species appeared not to be monophyletic in the clade, since other species (namely P. undina, P. marina and P. arctica) were embedded in this group.

Fig. 1
figure 1

Pseudoalteromonas phylogenetic tree based on a concatenated sequence consisting of 597,947 amino acids from 1537 conserved proteins shared by the 38 genomes. The azure area represents the P. haloplanktis-like group, while the red dots stand near to strains not assigned to the P. haloplanktis species. Unless specified, bootstrap support is 100. The scale below represents the substitution rate

Analysis of ANI

The results of the hierarchical clustering of the ANI matrix, embedding the ANI values computed for each pair of genomes (reported in Additional File 1) revealed the presence of clusters formed by groups of strains sharing highly similar genomes (ANI value > 96%), which might be considered as belonging to the same species (see Fig. 2). Twenty-five strains were split into 8 clusters of variable size (ranging from 2 to 5 strains). The remaining 13 strains were not grouped with any other representatives, and they were referred to as singletons. Interestingly, among the singletons, we found a relatively high number of previously defined Pseudoalteromonas species (9 out of the 13 species represented in the dataset).

Fig. 2
figure 2

Heatmap representation of the ANI matrix. The clusters reported representative species according to the ANI method. The cell colours represent the ANI values, i.e. a dark red colour stands for a ANI value of 0, whereas a white colour stands for ANI value of 1

Notably, the Pseudoalteromonas species are found to be consistent with the clusters observed, in that the different Pseudoalteromonas species join different clusters; the only exception is represented by P. haloplanktis, since the four strains previously affiliated to P. haloplanktis species are split into three groups (one of them being a singleton, Fig. 2).

Analysis of TNF

The visual inspection of the hierarchical clustering of the TNF binary matrix (whose data are reported in Additional File 2) shown in Fig. 3 revealed a different number of clusters with respect to those identified with the ANI. We detected seven singleton Pseudoalteromonas strains and two major clusters, one of which including most of Pseudoalteromonas strains. A deeper analysis of the clusters composition revealed a consistency between the clusters found with TNF and those observed in the MLSA-based phylogenetic tree. As an example, the largest cluster found with TNF contains the same strains, forming the P. haloplanktis-like group.

Fig. 3
figure 3

Heatmap representation of the TNF matrix. White cells represent pair of strains with TNF value greater than the species threshold. Clustered strains represent species found according to the TNF method

To have a clearer picture of the similarity of the dendrograms produced with different methods, each pair is reported in Additional File 3, 4 and 5, while the number and composition of cluster obtained with the two methods are given in Table 2.

Table 2 Cluster composition (number and composition of clusters) of dendrograms obtained with ANI and TNF

Orthologous genes analysis

In order to fully exploit the information embedded in the genomic sequences, the gene content information has been used to make phylogenetic inferences. The dendrogram of the matrix of gene presence/absence clustering, embedding a total of 2901 genes differentially present in the Pseudoalteromonas strains reported in Additional File 6, has a topology that is very similar to that of the phylogenetic tree obtained with the concatamer of the conserved genes.

Discussion

In this work, we used the genomic sequences of 38 Pseudoalteromonas representatives to depict their taxonomic relationships, using two different (genome-scale) approaches. To the best of our knowledge, this dataset constitutes the most comprehensive and recent source of information for this genus. The first approach used was a phylogenetic analyses based on a comprehensive set of common proteins (genome-scale phylogeny). The topology of the phylogenetic tree showed no monophyly for the type species of the Pseudoalteromonas genus, i.e. P. haloplanktis, even though it was possible to detect a well-defined clade comprising all the P. haloplanktis representatives, which was named P. haloplanktis-like group. This group contains 28 strains, i.e. the majority of the Pseudoalteromonas strains considered in this work, representative of four Pseudoalteromonas species, namely P. haloplanktis, P. undina, P. marina and P. arctica. This result might be explained as follows:

  1. (1)

    The topology of the phylogenetic tree is misleading. In fact, it has been argued how by concatenating genes, the information about individual loci may be hidden, leading to loss of resolution power and, potentially, to misleading results (Rosselló-Móra, 2012 and references therein).

  2. (2)

    The taxonomic assignments are not consistent with the tree topology, which might be due to assignments based on a not full taxonomic characterization of Pseudoalteromonas isolates.

The other approach used was the genome coherence approach, which relies on the idea that members of the same species share the same genomic features such as nucleotide identity and/or similar tetranucleotide composition. Therefore, to explore the likelihood of the two different scenarios, we analysed the genome coherence of the Pseudoalteromonas strains using the ANI and TNF. It has been demonstrated (Pride et al., 2003; Goris et al., 2007) how both these methods can be effectively used to discriminate whether two genomes belong to the same or different species. Hence, we used these approaches to investigate whether the taxon and/or the branching order obtained are consistent with each other and to test the reliability of the results obtained with the MLSA phylogeny. Data obtained revealed that i) the results obtained with these methods are consistent with the genome-scale phylogeny and that ii) the two methods have a different resolution; in particular, the ANI identifies more (putative) species that are found to be clustered together when using the TNF, where the topologies of the dendrograms obtained with these methods show substantial agreement. The fact that the three different methods used in this work provided such similar conclusions is essentially a strong confirmation of the results obtained that can be summarized as a catalogue of inconsistencies with the current taxonomic annotation. In order to fully exploit the information embedded in the genomic sequences, the gene content information has been used to make phylogenetic inferences, which again were found to be coherent with the previous methods. Overall, this suggests that the information embedded in the pattern of orthologs presence can be used to capture the actual phylogenetic relationships.

Although the golden standard for species definition in microbes relies on a polyphasic approach (Vandamme et al., 1996), i.e. requires a combination of molecular and phenotypic tests, these analyses may not always be possible, due to complications in cultivability or due to the experimental efforts required by these taxonomic methods.

By contrast, in the landscape of the post-genomic era, approaches based on the genomic sequences retain several advantages. These methods are faster and cheaper than traditional taxonomic methods and, most importantly, can be easily replicated and applied to uncultivable organisms for which the genomic sequence can be retrieved with technologies like single-cell genomics and/or metagenomics.

In conclusion, the major findings of this work are that (i) a group of three Pseudoalteromonas representatives assigned to different species (P. flavipulchra JG1, Pseudoalteromonas sp. NJ631 and P. piscicida JCM 20779) has been consistently found to belong to the same species according to the three methods used. For these reasons, we propose that these strains might be assigned to the same species; (ii) the presence of a group of similar strains probably belonging to the species P. haloplanktis (P. haloplanktis-like group). Interestingly, most of these strains share a common isolation site (Antarctica) and similar environment/lifestyle (marine environment/association with marine sponges). On the basis of these evidences, we propose to include these strains in the species P. haloplanktis; (iii) a group of three strains belonging to different species (P. undina NCIMB 2128, P. marina mano4 and P. arctica A37) is found in the P. haloplanktis-like group, possibly meaning that they might be included in the species P. haloplanktis.