Introduction

A recent study on the International Space Station (ISS) using an environmental metagenome analysis revealed the presence of 115 validly described microbial genera and 318 species dominated by Rhodotorula (35%) and Pantoea (10%) that were putatively viable and thriving (Singh et al. 2018b). It has been reported that dead cells can be differentiated from the intact cells before analyzing the samples via state-of-the-art molecular technologies (Singh et al. 2018b; Vaishampayan et al. 2013). In these reports, propidium monoazide (PMA) was used as a viability marker to measure intact microorganisms before DNA was extracted and subjected to downstream molecular analyses (Bonetta et al. 2017; Kibbee and Örmeci 2017; Vesper et al. 2008; Weinmaier et al. 2015). The metagenomic sequences generated after treatment with PMA were mined and assembled during this study for the retrieval of genomes that could have originated from intact cells.

A draft metagenome-resolved genome (MRG) obtained via binning and assembly can validate read-based findings and help identify uncultivable species (Barnum et al. 2018). A previous report identified the uncultivable organism Candidatus Amarolinea aalborgensis gen. nov., sp. nov., associated with a wastewater treatment plant, via fluorescence in situ hybridization microscopy, 16S rRNA microbiome analysis, and shotgun metagenomic sequence analysis, but the authors could not cultivate it to validly describe the species (Andersen et al. 2019). Another study, based on 16S rRNA microbiome analysis of a marine saltern pond in southwest Spain, revealed the presence of a dominant halophilic bacterium, enabling the researchers to define a suitable culture medium for successfully isolating and describing the novel genus and species Spiribacter salinus (Leon et al. 2014).

The ISS-MRG analyses during this study provided near-complete draft assemblages of seven genomes. Subsequently, based on the average nucleotide identity (ANI; < 95%) and digital DNA-DNA hybridization (dDDH; < 70%) characterizations (Goris et al. 2007), four of these seven MRGs were phylogenetically affiliated to a novel genus. Furthermore, seven strains belonging to this novel genus, originally isolated from various locations of the ISS and archived, were retrieved from the ISS culture collection (Venkateswaran 2017). 16S rRNA gene sequence analyses were not able to resolve the taxonomy of these seven ISS strains because they exhibited high similarities with 25 validly described Pantoea species (> 97.5%). However, comparative whole genome sequence (WGS) analyses demonstrated that these seven ISS strains were not phylogenetically affiliated with any validly described genus belonging to the family Erwiniaceae. This is the first study that utilizes the “metagenome to phenome” approach to describe a novel genus from the ISS samples. The genomic and polyphasic taxonomic characterizations further support the proposal of Kalamiella as a novel genus and K. piersonii as its type species (type strain IIIF1SW-P2T).

The objectives of this study were to retrieve novel genomes from metagenome sequences of ISS environmental samples and to use them to narrow the search for cultured isolates within the archived ISS culture collection (~ 500 strains) in order to generate phenomes. In addition to the taxonomic characterization, comparative genomic analyses of seven K. piersonii strains and four MRGs were carried out to elucidate phenotypic antimicrobial resistance properties, multidrug resistance gene profiles, and genes related to virulence and pathogenic potential.

Materials and methods

Sample collection and isolation of bacteria

The sampling of ISS surfaces performed for this study took place within the US on-orbit segments. Samples collected during this study were: Node 3 (Location Nos. 1, 2, and 3), Node 1 (Location Nos. 4 and 5), Permanent Multipurpose Module (Location No. 6), US Laboratory (Location No. 7), and Node 2 (Location No. 8 and control). A detailed description of the various locations sampled was published elsewhere (Singh et al. 2018b). Sample collection from ISS environmental surfaces, processing, and cultivation of bacteria were previously reported (Urbaniak et al. 2018; Venkateswaran 2017).

Molecular characterization of pure culture

A loopful of purified microbial culture was subjected to DNA extraction with the UltraClean DNA kit (MO Bio, Carlsbad, CA) or Maxwell Automated System (Promega, Madison, WI) as per manufacturer instructions. The extracted DNA was eluted in 50 μL of molecular-grade water and stored at − 20 °C until further analysis. The 16S rRNA gene was amplified using the forward primer, 27F (5′-AGA GTT TGA TCC TGG CTC AG-3′) and the reverse primer, 1492R (5′-GGT TAC CTT GTT ACG ACT T-3′) (Checinska et al. 2015; Suzuki et al. 2001). PCR was performed with the following conditions: denaturation at 95 °C for 5 min, followed by 35 cycles consisting of denaturation at 95 °C for 50 s, annealing at 55 °C for 50 s, and extension at 72 °C for 1.5 min, and finalized by extension at 72 °C for 10 min. The amplified products were treated with Antarctic phosphatase and exonuclease (New England Biolabs, Ipswich, MA) to remove 5′- and 3′-phosphates from unused dNTPs before sequencing. Sequencing was performed by Macrogen (Rockville, MD, USA) using 27F and 1492R primers for Bacteria. The resulting sequences were assembled using SeqMan Pro from the DNAStar Lasergene package (DNASTAR Inc., Madison, WI). Bacterial sequences were searched against the EzTaxon-e database (Kim et al. 2012) and identified based on the closest percentage similarity (> 97%) to previously identified microbial type strains.

Whole genome sequencing of ISS strains

Whole genome sequencing was carried out on an Illumina HiSeq 2500 instrument in paired-end mode. The NGS QC Toolkit v2.3 (Patel and Jain 2012) was used to filter the data for high-quality vector- and adaptor-free reads for genome assembly (cutoff read length, 80%; cutoff quality score, 20). High-quality vector-filtered reads were assembled with the SPAdes genome assembler (Nurk et al. 2013) using default parameters. Each genome was subsequently annotated using Rapid Annotations using Subsystems Technology (RAST) (Aziz et al. 2008). The quality of the final assemblies was checked using the Quast package (Gurevich et al. 2013).
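The read-filtering rule quoted above (cutoff read length, 80%; cutoff quality score, 20) can be sketched as follows. This is a minimal illustration and not the NGS QC Toolkit code: it assumes the rule means that a read is kept when at least 80% of its bases have a Phred score of 20 or higher, and that quality strings use the Phred+33 encoding. The function names are hypothetical.

```python
def phred33_to_scores(qual_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in qual_string]


def is_high_quality(scores, min_score=20, min_fraction=0.80):
    """Return True when at least min_fraction of the bases reach min_score.

    Assumed interpretation of "cutoff read length, 80%; cutoff quality
    score, 20"; an empty read is rejected.
    """
    if not scores:
        return False
    good_bases = sum(1 for q in scores if q >= min_score)
    return good_bases / len(scores) >= min_fraction


if __name__ == "__main__":
    # Toy read: 8 of 10 bases at Q30, 2 bases at Q10 (made-up values).
    toy_scores = [30] * 8 + [10] * 2
    print(is_high_quality(toy_scores))
```

A per-read fraction test of this kind keeps reads with a few poor bases while discarding reads that are poor along most of their length.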

Metagenome sequences and assembly of ISS samples

Metagenomic sequences were generated from PMA-treated samples to enable analyses of the intact portion of microbial cells during this study. The metagenome sequences generated from eight different locations of the ISS during three flights spanning a 15 month period were deposited in NCBI GenBank (Singh et al. 2018b). As already reported, the HiSeq 2500 platform (Illumina) was used for shotgun sequencing, resulting in 100-bp paired-end reads (Singh et al. 2018b). The paired-end 100-bp metagenomic reads were processed with Trimmomatic (Bolger et al. 2014) to trim adapter sequences and low-quality ends, with a minimum Phred score of 20 across the entire length of the read used as a quality cutoff. Reads shorter than 80 bp were removed after trimming. The remaining high-quality reads were subsequently assembled using metaSPAdes (Nurk et al. 2017). Contigs were binned using Metabat2 version 2.11.3 (Kang et al. 2015). Recovered genomes were evaluated with CheckM (Parks et al. 2015), and a recovered genome was considered good with at least 90% completeness and at most 10% contamination. All results were reconfirmed using the MetaWRAP pipeline (Uritskiy et al. 2018).
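The acceptance criterion for recovered genomes (at least 90% completeness and at most 10% contamination) can be written as a small filter. The sketch below is illustrative only: the `BinQuality` record, the function name, and the example bin values are hypothetical and do not come from CheckM output in this study.

```python
from dataclasses import dataclass


@dataclass
class BinQuality:
    """Completeness and contamination estimates (percent) for one genome bin."""
    name: str
    completeness: float
    contamination: float


def is_good_genome(bin_quality, min_completeness=90.0, max_contamination=10.0):
    """Apply the thresholds stated in the text: >= 90% complete, <= 10% contaminated."""
    return (bin_quality.completeness >= min_completeness
            and bin_quality.contamination <= max_contamination)


if __name__ == "__main__":
    # Made-up bins used only to exercise the filter.
    bins = [
        BinQuality("bin_A", 98.5, 1.2),
        BinQuality("bin_B", 72.0, 0.5),
        BinQuality("bin_C", 95.0, 14.0),
    ]
    kept = [b.name for b in bins if is_good_genome(b)]
    print(kept)
```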

Phylogenetic analysis

Multilocus sequence analysis (MLSA)-based phylogenetic affiliation was performed as reported elsewhere to interpret the tree positions of the Erwiniaceae members considered in this study (Palmer et al. 2018). Representative genomes of Erwinia, Tatumella, Mixta, and Pantoea were used to determine the correct phylogenetic position of all ten isolated strains and seven MRGs. Representatives of other families of the order Enterobacterales, such as Enterobacteriaceae, Pectobacteriaceae, and Yersiniaceae, were included in the phylogeny, with Serratia marcescens subsp. marcescens Db11 selected as the outgroup. Members of the genera Buchnera and Wigglesworthia were not selected due to the absence of full-length genes in the publicly available database (NCBI). Full-length DNA sequences for the genes atpD, gyrB, infB, and rpoB (Adeolu et al. 2016) were retrieved from 62 genomes, and an MLSA phylogenetic tree was generated.

NCBI BLASTn (version 2.2.31+) was used to compare the gene similarities (Camacho et al. 2009). Each gene was individually aligned using Clustal-Omega (version 1.0.3) (Sievers et al. 2011), and alignments were visualized for accuracy in SeaView (version 4.5.4) (Galtier et al. 1996). The four gene sequences (atpD, gyrB, infB, and rpoB) were concatenated together using a custom Perl script (https://github.com/sandain/pigeon/blob/master/scripts/concat.pl) in the order listed. A maximum-likelihood phylogeny was generated using FastTree (version 2.1.10) (Price et al. 2010) from the concatenated nucleotide alignment. Ecotype Simulation 2 (https://github.com/sandain/ecosim) was used to root the phylogeny using Serratia marcescens subsp. marcescens Db11 as the outgroup and to generate the image of the phylogenetic tree. Bootstrapping for the phylogeny was generated by PHYLIP’s SEQBOOT (Felsenstein 2004), and the CompareToBootstrap.pl script provided by Price et al. (2010) was used to calculate the bootstrap values.
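The concatenation step can be illustrated with a short sketch. It is not the Perl script cited above; it only shows the idea of joining per-gene alignments in the fixed order atpD, gyrB, infB, rpoB, assuming each alignment is held as a dictionary from taxon name to aligned sequence. The function name and the toy sequences are hypothetical.

```python
GENE_ORDER = ("atpD", "gyrB", "infB", "rpoB")


def concatenate_alignments(alignments, gene_order=GENE_ORDER):
    """Join per-gene alignments into one sequence per taxon, in gene_order.

    alignments maps gene name -> {taxon: aligned sequence}. Every gene must
    cover the same taxa, and all sequences of a gene must share one length.
    """
    taxa = set(alignments[gene_order[0]])
    for gene in gene_order:
        gene_alignment = alignments[gene]
        if set(gene_alignment) != taxa:
            raise ValueError(f"taxon sets differ for gene {gene}")
        if len({len(seq) for seq in gene_alignment.values()}) != 1:
            raise ValueError(f"unequal aligned lengths for gene {gene}")
    return {
        taxon: "".join(alignments[gene][taxon] for gene in gene_order)
        for taxon in sorted(taxa)
    }


if __name__ == "__main__":
    # Toy alignments with made-up sequences for two taxa.
    toy = {
        "atpD": {"strain1": "ATG-", "strain2": "ATGC"},
        "gyrB": {"strain1": "GG", "strain2": "GA"},
        "infB": {"strain1": "TTT", "strain2": "T-T"},
        "rpoB": {"strain1": "C", "strain2": "C"},
    }
    for taxon, seq in concatenate_alignments(toy).items():
        print(taxon, seq)
```

Checking that every gene covers the same taxa before joining prevents columns from different genomes being silently shifted against each other.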

Phylogenomics and SNP analyses

Pairwise ANI was calculated with the algorithm of Goris et al. (2007) via EzTaxon-e (Kim et al. 2012). The dDDH analysis was performed using the Genome-to-Genome Distance Calculator 2.0 (GGDC 2.0) (Meier-Kolthoff et al. 2013). The single nucleotide polymorphism (SNP)-based phylogenetic tree was generated using CSIPhylogeny (Larsen et al. 2012) version 1.4. Using genome sequences of multiple isolates, CSIPhylogeny calls SNPs, filters the SNPs, performs site validation, and infers a phylogeny based on the concatenated alignment of high-quality SNPs.
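The way ANI and dDDH values are read against the species cutoffs used in this paper (> 95% ANI and > 70% dDDH; Goris et al. 2007) can be sketched as a simple decision rule. The function name is hypothetical, and requiring both values to exceed their cutoffs is an assumption made for illustration; the example values are the P. septica comparison and the isolate-versus-MRG ANI reported in the Results (the 95.0% dDDH in the second call is made up).

```python
ANI_SPECIES_CUTOFF = 95.0   # percent, as cited in the text
DDDH_SPECIES_CUTOFF = 70.0  # percent, as cited in the text


def same_species(ani, dddh,
                 ani_cutoff=ANI_SPECIES_CUTOFF,
                 dddh_cutoff=DDDH_SPECIES_CUTOFF):
    """Return True only when both ANI and dDDH exceed their species cutoffs.

    Demanding agreement of both metrics is an illustrative assumption.
    """
    return ani > ani_cutoff and dddh > dddh_cutoff


if __name__ == "__main__":
    # Novel strains versus Pantoea septica: 90% ANI and 41.2% dDDH.
    print(same_species(90.0, 41.2))
    # Isolate versus matching MRG: > 99.9% ANI; the dDDH value here is hypothetical.
    print(same_species(99.9, 95.0))
```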

Differential gene analysis of WGS and MRGs

Genes identified by Palmer et al. (2018) as differentially present in the genomes of Erwinia, Mixta, Pantoea, and Tatumella were compared with the WGS and MRGs presented in this study using NCBI BLASTn (version 2.2.31+). A custom reference database created from nucleotide sequences was used to find genes from various Pantoea and Tatumella genomes used to construct the MLSA phylogeny. Sequences that aligned to the entire reference sequence (or nearly so, > 90%) were considered to be present in these newly sequenced genomes.
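The presence criterion (alignment over more than 90% of the reference sequence) can be sketched as a coverage calculation. The code below is an assumption-laden illustration: it takes alignment intervals on the reference as 1-based inclusive coordinates, such as those reported in BLAST tabular output, and merges overlapping hits before computing coverage. Whether several hits were merged in the original analysis is not stated, and the function names are hypothetical.

```python
def covered_fraction(intervals, ref_length):
    """Fraction of the reference covered by the union of alignment intervals.

    intervals holds (start, end) pairs, 1-based and inclusive; reversed
    coordinates (start > end, as for minus-strand hits) are normalized.
    """
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted((min(a, b), max(a, b)) for a, b in intervals):
        if cur_end is None:
            cur_start, cur_end = start, end
        elif start <= cur_end:
            # Overlapping hit: extend the current merged interval.
            cur_end = max(cur_end, end)
        else:
            covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / ref_length


def gene_present(intervals, ref_length, min_fraction=0.90):
    """Call a gene present when coverage exceeds min_fraction (> 90% in the text)."""
    return covered_fraction(intervals, ref_length) > min_fraction


if __name__ == "__main__":
    # Two overlapping hits on a hypothetical 1000-bp reference gene.
    print(gene_present([(1, 500), (400, 950)], 1000))
```

A gene split across two contigs can fail a single-hit test yet pass a merged-coverage test, which is relevant to the assembly-gap caveat raised later for the WGS data.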

Nucleotide sequence deposition

The draft genome sequences of type strain IIIF1SW-P2T (ISS isolate) and ISS-IIIF5SW (ISS metagenome) were deposited in NCBI under accession numbers SAMN10096957 and SAMN09635154, respectively. The accession numbers of all WGS and MRGs of ISS strains are given in Table 1. This whole-genome shotgun project has been deposited in GenBank under the BioSample No. SAMN10096957 and SUB ID No. SUB4539219. The version described in this paper is the first version, and the accession number for the isolate K. piersonii IIIF1SW-P2T is RARB00000000.

Table 1 Digital DDH (DNA-DNA Hybridization) and ANI (Average Nucleotide Identity) values of ISS strains WG and MRG comparison with members of Erwiniaceae family. (WG: Whole Genome of isolated strains; MG: Metagenome Resolved Genome)

Phenotypic characterization of ISS strains

Phenotypic characterizations of all ISS strains were performed according to standard protocols (Jones 1981). Morphology, size, and pigmentation were observed on trypticase soy agar (TSA; BD Difco) plates after 24 h of incubation. Salt tolerance was determined using 1% peptone water supplemented with 2.0, 5.0, 7.0, and 10.0% (w/v) NaCl. The pH range and pH optimum were tested at pH 5.0, 6.5, 7.0, 8.0, 9.0, and 10.0 by adjusting pH with biological buffers (Xu et al. 2005). Growth at different temperatures was carried out in TSA by incubating the agar plates at 4, 10, 15, 20, 25, 30, 37, and 44 °C for 24 to 48 h. The Gram reaction was determined using the commercial kit (BD Difco), according to the manufacturer’s instructions. Motility was checked using the method described by Skerman (Skerman 1967).

All seven isolates were biochemically identified using BioLog (Hayward, CA) carbon substrate utilization profile characterization (Wragg et al. 2014). For cellular fatty acid analysis, all strains were grown on nutrient agar medium at 30 °C for 24 h before collecting biomass. Cellular fatty acids were extracted, methylated, and analyzed by gas chromatography according to the instructions of the Sherlock Microbial Identification System (MIDI version 4.0), as described previously (Müller et al. 1998; Pandey et al. 2002).

Matrix-assisted laser desorption ionization time of flight (MALDI-TOF) mass spectrometry protein analysis was carried out using freshly grown isolates on TSA media (Schumann and Maier 2014). Sample processing for MALDI-TOF, spectral analysis, and comparative analyses followed the steps described recently (Seuylemezian et al. 2018). The Microflex LT bench-top mass spectrometer (Bruker Daltonics, Billerica, MA, USA) was used to generate spectra (n = 3), and FlexAnalysis software (Bruker Daltonics, Billerica, MA, USA) was used to process the raw spectral data (Seuylemezian et al. 2018).

Phenotypic antibiotic resistance testing was carried out as described before (Urbaniak et al. 2018). Briefly, the isolates were streaked from glycerol stocks onto TSA plates. A single colony was inoculated into 5 mL of tryptic soy broth and grown overnight at 30 °C. Aliquots of 100 μL were plated on TSA. Agar diffusion discs (BD BBL™ Sensi-Disc™, Franklin Lakes, NJ) were placed aseptically on each plate, and the plates were incubated at 30 °C for 24 h. The tested antibiotics included 30-μg cefazolin, 30-μg cefoxitin, 5-μg ciprofloxacin, 15-μg erythromycin, 10-μg gentamicin, 1-μg oxacillin, 10-μg penicillin, 5-μg rifampin, and 10-μg tobramycin. The diameter of the inhibition zone was measured for each antibiotic disc and recorded in millimeters. The resistance results were compared with the zone diameter interpretive charts provided by the manufacturer.

Results

Metagenome-resolved genomes

Analyses of metagenomic sequences from eight different locations of the ISS enabled the recovery of seven nearly full-length MRGs related to members of the family Erwiniaceae. The ANI analysis (Table 1) showed high similarity of one MRG with Pantoea dispersa (99%) and of two MRGs with Pantoea brenneri (99%), whereas four MRGs were not phylogenetically affiliated with any of the known species of Erwiniaceae because of their lower ANI values (< 81%) and are proposed as novel in this study. Among the 22 validly described Pantoea species, Pantoea septica was the closest to the four novel MRGs, with 90% ANI and 41.2% dDDH values, which are too low to place them in the same genus (Goris et al. 2007). The four novel MRGs were recovered from ISS location Nos. 1 (Port panel of Cupola), 5 (Overhead-4-Zero-G Stowage Rack), 7 (Overhead-3 panel surface; LAB103), and 8 (Crew quarter bump-out exterior aft wall). After quality filtering of the shotgun metagenome sequences, the draft genome sizes of these novel MRGs were ~ 4.8 × 10⁶ bp. Table 2 summarizes assembly statistics and GenBank accession numbers of all MRGs assembled in this study that meet the minimum information about a metagenome-assembled genome standard (Bowers et al. 2017).

Table 2 Genome characteristics of the assembled metagenome-resolved genome (MRG) from ISS metagenome

Metagenomes to phenomes

The 16S rRNA gene sequences of ~ 500 strains archived from the ISS were queried against the four novel MRGs. In total, ten strains isolated from Flight No. 3 that exhibited high 16S rRNA gene sequence similarities (> 98%) were further sequenced for their whole genomes. The WGS of these ten ISS strains were compared with the four novel MRGs, and the WGS of seven strains showed > 99.9% ANI values. The ANI and dDDH analyses of the isolated strains against the MRGs and across the various members of the family Erwiniaceae confirm the novel identity of seven of these ten strains (Table 1). For the seven isolated strains that exhibited > 99.5% ANI values with MRGs, the ANI values against members of the family Erwiniaceae ranged from 79 to 81%, except for Pantoea septica (90%), which is still below the species cutoff (> 95%). This identity range indicates that the seven ISS strains do not belong to any of the closely related genera of the family Erwiniaceae. The ANI results were further corroborated by the dDDH values, which ranged from 19.1 to 23.4%, except for Pantoea septica (41.2%), which is also below the species cutoff value of > 70%. The low ANI and dDDH values reflect the genomic distance of the isolated strains from members of other genera of the family Erwiniaceae. Table 1 summarizes ANI and dDDH values of the isolated strains against the closely related members of the family Erwiniaceae. In addition, the WGS of two isolated strains belonged to the validly described P. dispersa (> 99% ANI), and one isolate was identified as P. brenneri (98% ANI). The seven strains belonging to the novel genus were isolated from ISS location Nos. 1 (Port panel of Cupola), 2 (Forward side panel wall of the Waste and Hygiene Compartment), 5 (Overhead-4-Zero-G Stowage Rack), 6 (Port-2 Rack wall), and 7 (Overhead-3 panel surface; LAB103). At least one strain was isolated from each of these locations, and location No. 5 yielded three strains of the novel genus.

Genome sequence characteristics of novel Kalamiella piersonii strains

The Illumina HiSeq 2500 platform yielded 1.30 × 10⁷ paired-end reads for the sequencing of isolated strains. Subsequently, the draft genome was assembled from a total of 1.27 × 10⁷ reads after trimming and quality filtering of the paired-end sequences. The assembled draft genome consisted of 27 contigs with a genome size of ~ 4.85 × 10⁶ bp. The N50 contig length was 5.2 × 10⁵ bp, the mean coverage was 100×, the G + C content was 57.07 mol%, and the number of coding sequences identified after RAST annotation was 4830. Table 3 summarizes assembly statistics of all seven novel ISS strains isolated and sequenced in this study.
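N50 is a length-weighted statistic rather than a plain median of contig lengths: it is the length of the shortest contig in the smallest set of longest contigs that together contain at least half of the assembled bases. A minimal sketch of the calculation follows; the function name and the toy contig lengths are hypothetical and unrelated to the assemblies in Table 3.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    Contigs are sorted from longest to shortest and accumulated until the
    running total reaches at least half of the assembly size; the length of
    the contig that crosses this point is the N50.
    """
    if not contig_lengths:
        raise ValueError("at least one contig is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Toy assembly of six contigs (lengths in bp, made up).
    print(n50([80, 70, 50, 40, 30, 20]))
```

For the toy lengths above the N50 is 70 while the median contig length is 45, which shows why the two terms should not be used interchangeably.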

Table 3 Genome characteristics of the assembled whole genomes from ISS

MLSA analysis of MRGs and novel ISS strains

The MLSA-based phylogenetic analysis was based on the housekeeping genes identified in the whole genomes (atpD, gyrB, infB, and rpoB). Sixty-two genomes of members of the family Erwiniaceae and related species, including the seven MRGs, each containing all four MLSA genes, were used to construct the phylogenetic tree. Concatenated sequences of all genes gave a phylogenetic tree with five clades, excluding the members of Enterobacteriaceae, Pectobacteriaceae, and Yersiniaceae (Fig. 1). These clades can be summarized as the Pantoea clade (22 species + 3 MRGs), Kalamiella clade (n = 7 strains + 4 MRGs + Pantoea septica strain LMG 5345), Mixta clade (4 species), Tatumella clade (4 species), and Erwinia clade (6 species + Pantoea coffeiphila strain 342). Differential nesting was seen in the phylogenetic tree for the following organisms: Pantoea coffeiphila strain 342 is nested in the Erwinia clade, while Pantoea septica strain LMG 5345 is aligned more with the new clade Kalamiella proposed in this study.

Fig. 1

Maximum-likelihood phylogeny for members of Kalamiella gen. nov., Erwinia, Mixta, Pantoea, Tatumella, and other genera of the family Enterobacteriaceae reconstructed from concatenated, full-length nucleotide sequences of the genes atpD, gyrB, infB, and rpoB. The concatenated sequence from Serratia marcescens subsp. marcescens Db11 was used as the out-group. Newly sequenced isolates and metagenome-resolved genomes are highlighted in red. The thick black bars on the right indicate the genus of each clade, with Kalamiella gen. nov. highlighted in red. The scale bar indicates 5% nucleotide sequence divergence. Bootstrapping values are included on internal nodes in the phylogeny and represent the number of trials (out of 100) that included that particular branching pattern

Differential gene analysis of MRGs and novel ISS strains

A set of genes similar to those reported as differentially present in the genomes of Erwinia, Mixta, Pantoea, and Tatumella (Palmer et al. 2018) was found in the WGS and MRGs presented in this manuscript. Genomes identified as members of Kalamiella gen. nov. shared many genes with Pantoea and Mixta spp. (Table S4). Most of the genomes identified as Kalamiella gen. nov. lack the genes related to purine and pyrimidine metabolism (PRHOXNB or XDH, and rutD or rutC), whereas most of the genomes identified as Pantoea or Mixta include these genes, perhaps representing a metabolic difference between these genera. Differences among the genomes identified as Kalamiella gen. nov. were found mainly in the WGS (e.g., bioF, garR, hutI, and nadA); these are likely the result of gaps in the assembly and the inability to align an entire sequence to the reference, and may not represent biological differences.

Antimicrobial and multidrug-resistant characteristics of ISS strain genome

A detailed genome analysis of type strain IIIF1SW-P2T was carried out to understand its genetic makeup. Annotated features were classified as carbohydrate metabolism (348 genes), amino acid and derivatives (384 genes), protein metabolism (214 genes), cofactors, vitamins, prosthetic groups, pigments (170 genes), membrane transport (108 genes), and RNA metabolism (59 genes) (Fig. 2). To test antimicrobial resistance at the genomic level, the ISS strain was further compared with the nosocomial isolate Pantoea septica, which has 90% ANI with the ISS strain. Features pertaining to virulence, disease, and defense were very similar in both genomes, accounting for 50 genes in IIIF1SW-P2T and 49 genes in P. septica. Both organisms show the presence of cobalt-zinc-cadmium resistance, copper homeostasis, and the Mycobacterium virulence operon as major virulence factors.

Fig. 2

Metabolic functional profiles and subsystem categories distribution of strain IIIF1SW-P2T (ISS isolate)

Phenotypic characterization of novel Kalamiella piersonii strains

All Kalamiella strains are Gram-stain-negative, motile, and rod-shaped, and do not form endospores. Colonies are beige in color, circular, and have smooth edges. All strains grow at temperatures between 12 and 37 °C (with optimum growth at 30 °C) and at a pH range between 6.0 and 9.0 (with optimum at 8.0), except for strains IIIF1SW-P2T and IIIF2SW-P2, which grew at pH values as high as 10. All strains showed growth in the presence of 0–5% NaCl. The phenotypic differentiation of Kalamiella strains from other closely related genera belonging to the family Erwiniaceae is shown in supplementary Table S1.

The BioLog-based carbon substrate profiles of all members of the family Erwiniaceae are depicted in supplementary Table S1. All the species belonging to Kalamiella, Erwinia, Mixta, Pantoea, and Tatumella did not exhibit differential biochemical characteristics except for their utilization of d-cellobiose and gentiobiose. Both of these sugars were not assimilated by Kalamiella and Erwinia members, whereas the majority of Mixta, Pantoea, and Tatumella species utilized them as the sole carbon source.

Fatty acid methyl ester (FAME) profiles of Kalamiella strains and related genera are shown in supplementary Table S2. All genera of the family Erwiniaceae produced palmitic acid (C16:0; ~ 31 to 39%) in high amounts, whereas Erwinia species can be differentiated from other genera of Erwiniaceae by their high C17:0 cyclo production (~ 30%), as reported elsewhere (Rezzonico et al. 2016). Kalamiella strains produced ~ 13% of C17:0 cyclo fatty acids, whereas Mixta (11%), Pantoea (7%), and Tatumella (9%) species produced less. Kalamiella strains produced C19:0 cyclo ω8c 9-(2-heptylcyclopropyl) nonanoic acid (2.5%); among the other genera, only Pantoea showed the presence of this fatty acid (0.8%). Similarly, Kalamiella and Mixta strains were differentiated from the other genera by not producing vaccenic acid (C18:1 ω7c), whereas Pantoea, Erwinia, and Tatumella strains showed the presence of this unsaturated fatty acid (Tracz et al. 2015). Kalamiella and Mixta species exhibited similar FAME profiles overall, but Mixta species produced iso-C19:0, which could be used to differentiate the two genera from each other (Chen et al. 2017; Palmer et al. 2018).

The MALDI-TOF profiles of the K. piersonii strains showed a score value of 1.8 with P. septica as the nearest match, while all other known species of Pantoea had a MALDI-TOF score < 1.5. This supports the proposal that Kalamiella strains should be considered a novel genus.

Antibiotic characteristics of novel Kalamiella piersonii strains

All seven K. piersonii isolates were resistant to cefoxitin, erythromycin, oxacillin, penicillin, and rifampin, while all strains were susceptible to cefazolin, ciprofloxacin, and tobramycin (Table S3). When compared with the other three Pantoea species tested, K. piersonii strains did not differ much in their resistance profiles for the antibiotics tested via disc diffusion methods.

Discussion

The integration of the “classical to current” concept in microbiology has given researchers tools that were never available before. Classical microbiology depends on culturing the organism and studying it as a live entity under a “polyphasic” analysis (Checinska Sielaff et al. 2017). On the other side is the MRG approach, where the organism is uncultured but genetic tools provide a predicted genome for downstream analysis (Sangwan et al. 2016). The MRG approach has led to an understanding of genetic mobility and metabolic interactions (Barnum et al. 2018), to expanding the tree of life (Parks et al. 2017), and to decoding the mammalian gut (Stewart et al. 2018). However reliable these tools and approaches are, live organisms are helpful to validate their existence. The fine line of intermediate understanding is where phenomics lies and plays a crucial role in microbiological studies. Microbial phenomics can be defined as “use of classical knowledge in synchrony with omics data science in deciphering the genetic instructions of the genome, leading to the microbial understanding and cultivation.” Here, we present the very first study utilizing samples collected from the ISS in which a metagenome to phenome approach was undertaken and which led to the culturing and description of a novel bacterial genus.

Since the description of novel microbial species requires a cultured organism, an attempt was made to find strains among the archived bacterial isolates (~ 500 strains) from ISS samples that could match the MRGs. It has been well established that strains that show < 97.5% 16S rRNA gene sequence similarities would not exhibit > 70% DDH values, a gold standard to describe novel species; thus, the 16S rRNA gene was used to screen a large number of strains (Stackebrandt and Ebers 2006). The 16S rRNA gene sequences of 30 out of 500 strains archived from the ISS exhibited higher similarities (> 97%) with MRGs, and among them 10 strains (> 98%) were selected for WGS. In addition to the 16S rRNA gene (100%), the gyrB gene (> 99%) that was reported to be the phylogenetic discriminator was amplified (La Duc et al. 2004). Comparative analyses of the 16S rRNA gene (> 98%) and the gyrB gene (> 95%) similarities yielded seven strains (Fig. S1) that were similar to the four MRGs. The WGS of these seven strains showed > 99.9% ANI values with the novel MRGs, confirming that the metagenomes to phenomes approach can yield the cultured isolates required for describing novel species. The established threshold values to delineate bacterial species are > 95% for ANI and > 70% for dDDH (Goris et al. 2007). Such an approach, which enables researchers to isolate microbes, also allows them to characterize functional pathways and potential virulence properties that can directly affect human health. The SNP analyses of cultivated strains and PMA-treated MRGs showed that strains isolated from ISS location Nos. 1 (Port panel of Cupola), 2 (Forward side panel wall of the Waste and Hygiene Compartment), 5 (Overhead-4-Zero-G Stowage Rack), 6 (Port-2 Rack wall), and 7 (Overhead-3 panel surface; LAB103) were identical. The absence of K. piersonii strains from location No. 8, where a novel MRG was recovered, might be due to their low abundance at this site and warrants further study. A quantitative PCR approach that was reported to be successful in measuring specific microbes of interest (Bargoma et al. 2013) should be implemented to selectively monitor for the presence of K. piersonii, when necessary.

Redox-active metals such as Co²⁺, Zn²⁺, and Ni²⁺ are essential cofactors in bacterial metabolism, but become toxic at higher concentrations (Nies 1992). Hosts to bacterial infection have adapted to this by adjusting their metal homeostasis as a defense mechanism against infection (Braymer and Giedroc 2014; Troxell and Hassan 2013). In an evolutionary arms-race, bacteria have evolved metal resistance mechanisms to overcome these defenses (Barber and Elde 2015). Recent incidences of infection on the ISS (Crucian et al. 2016) in addition to evidence for changes to human gene expression related to immune system function during extended stays in microgravity (unpublished) could explain the high incidence of cobalt-zinc-cadmium resistance genes detected in metagenomes sampled from various locations of the ISS (Singh et al. 2018b). Genes associated with metal resistance have been detected in the MRG and WGS sequenced in this study, including cadmium transporting ATPase (EC 3.6.3.3), cation efflux system proteins CusC and CusF, cobalt-zinc-cadmium resistance proteins CzcA, CzcB, and CzcD, copper sensory histidine kinase CusS, heavy metal-resistant transcriptional regulator HmrR, heavy metal RND efflux CzcC, and zinc transporter ZitB.

Variation lay in the presence of beta-lactamase (EC 3.5.2.6), a DNA-binding heavy metal response regulator, and a transcriptional regulator of the MerR family in P. septica, while strain IIIF1SW-P2T uniquely carried a cobalt-zinc-cadmium resistance protein, mercuric ion reductase (EC 1.16.1.1), and an FAD-dependent NAD(P)-disulphide oxidoreductase (Table S4). Although the two organisms have very similar genomic resistance profiles, it should not be concluded that strain IIIF1SW-P2T is a human pathogen similar to P. septica. The K. piersonii IIIF1SW-P2T strain is a biofilm-forming organism, which is supported by the presence of genes involved in biofilm formation and quorum sensing. The ISS isolate had more stress resistance genes, especially those helping with osmotic stress and detoxification. These genes include betaine aldehyde dehydrogenase (EC 1.2.1.8), choline dehydrogenase (EC 1.1.99.1), high-affinity choline uptake protein BetT, OpgC protein, cell wall endopeptidase, family M23/M37, GST-like protein yncG, RNA-binding protein Hfq, anti-sigma B factor antagonist RsbV, serine phosphatase RsbU, alkanesulfonates ABC transporter ATP-binding protein, and FrmR, a negative transcriptional regulator of the formaldehyde detoxification operon. The above-mentioned genes were not present in P. septica, the nearest Earth analog, and may reflect an adaptation of the organism to the microgravity environment on the ISS or to the industrial-level cleaning regime.

The PathogenFinder (Cosentino et al. 2013) algorithm predicted that the strain IIIF1SW-P2T had ~ 42% probability of being a human pathogen, and hence K. piersonii strains should be rated as non-human pathogens. This conclusion was based on matches of the type strain with genes from 15 pathogenic families and 21 nonpathogenic families in the database.

With the inclusion of genomes as prokaryotic taxonomic descriptors (Chun and Rainey 2014), higher-level taxa, such as the class Gammaproteobacteria, underwent prominent reclassification. Subsequently, a new order, “Enterobacterales”, was defined in the Gammaproteobacteria, and seven families were described, including Erwiniaceae (Adeolu et al. 2016). The taxonomic description of the family Erwiniaceae is being updated with new advancements in genomics and phylogeny (Palmer et al. 2018). The Erwiniaceae family, consisting of phytopathogens (Hauben et al. 1998), has undergone multiple taxonomic revisions (Gardan et al. 2004; Mergaert et al. 1999; Rezzonico et al. 2016). Seven genera have been described in this family: Buchnera, Erwinia, Mixta, Pantoea, Phaseolibacter, Tatumella, and Wigglesworthia (Adeolu et al. 2016; Palmer et al. 2018). The genus Pantoea is ubiquitous and currently consists of 25 validly described species that are pathogenic to plants, insects, and animals (Walterson and Stavrinides 2015). Buchnera species were reported to be symbionts of aphids (Munson et al. 1991), Tatumella consists of human pathogens (Hollis et al. 1981), Phaseolibacter members are plant pathogens (Halpern et al. 2013), and Wigglesworthia species are insect endosymbionts (Aksoy 1995).

To discriminate genera and species within the family Erwiniaceae, a recent report utilized an MLSA strategy to reassign five Pantoea species into a new genus, Mixta (Palmer et al. 2018). Gene sequences of atpD, gyrB, infB, and rpoB were successfully used to differentiate members of the order Enterobacterales (Adeolu et al. 2016). The MLSA tree of 62 genomes, which included the ISS genomes, clearly shows that the 4 MRGs and 7 ISS strains form a separate cluster together with P. septica and belong to a new clade with K. piersonii. This opens the possibility of reclassifying P. septica into the genus Kalamiella, but that is beyond the scope of this study. The phylogenetic distance and clade placement of Kalamiella are also supported by the ANI and dDDH values, in which Kalamiella associates closely with P. septica and is distinctly different from all other members of the family Erwiniaceae. These phylogenomic characterizations support the proposal of Kalamiella as a novel genus and K. piersonii as a novel species, with the designated type strain IIIF1SW-P2T, which was isolated from the ISS Port panel of the Cupola (location No. 1, which is the observation deck for the crew).
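
An MLSA analysis of this kind typically starts from per-gene alignments that are concatenated into one sequence per genome before tree building. The sketch below illustrates only that concatenation step for the four genes named above, assuming each gene has already been aligned separately; the function name `concatenate_mlsa`, the input layout, and the toy sequences are hypothetical and do not reproduce the tools or data used in this study.

```python
MLSA_GENES = ("atpD", "gyrB", "infB", "rpoB")


def concatenate_mlsa(alignments, genes=MLSA_GENES):
    """Concatenate per-gene alignments into one sequence per genome.

    alignments maps gene name -> {genome name -> aligned sequence}.
    Only genomes present in every gene alignment are kept, and all
    sequences of a gene must have the same aligned length.
    """
    shared = set.intersection(*(set(alignments[g]) for g in genes))
    for g in genes:
        lengths = {len(alignments[g][name]) for name in shared}
        if len(lengths) > 1:
            raise ValueError(f"sequences for {g} are not aligned to equal length")
    return {
        name: "".join(alignments[g][name] for g in genes)
        for name in sorted(shared)
    }


# Toy alignments (made-up, very short sequences for illustration only).
toy = {
    "atpD": {"genome_A": "ATG-CA", "genome_B": "ATGGCA"},
    "gyrB": {"genome_A": "GGTT", "genome_B": "GATT"},
    "infB": {"genome_A": "CCA", "genome_B": "CCA"},
    "rpoB": {"genome_A": "TTGA", "genome_B": "TT-A"},
}
for name, seq in concatenate_mlsa(toy).items():
    print(name, seq)
```

The concatenated alignment would then be passed to a phylogenetic inference program; genomes missing any of the four genes are dropped here, which is one possible design choice rather than a requirement of MLSA.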

Phenomics also helps to close the gap known as the “great plate count anomaly” (Staley and Konopka 1985). A recent descriptive ISS metagenome analysis identified a total of 318 defined microbial species, but only 32 (~ 10%) of them could be cultured (Singh et al. 2018b). The description of the novel species Solibacillus kalamii (Checinska Sielaff et al. 2017), which was found in ISS HEPA filter particulates and is very closely related to Solibacillus silvestris (Krishnamurthi et al. 2009), was possible due to the WGS approach (Seuylemezian et al. 2017). Similarly, thorough genomic characterization of several ISS Enterobacter strains revealed their close association with clinical strains that caused diseases in neonatal patients (Doijad et al. 2016; Roach et al. 2015) and were also resistant to carbapenem and colistin (Norgan et al. 2016). Isolation of E. bugandensis strains helped to characterize their multidrug resistance; otherwise, metagenome and genome sequences could only predict such metabolic pathways (Singh et al. 2018a). Furthermore, the availability of bacterial isolates might allow researchers to analyze them in vivo to discern the influence of microgravity on their pathogenicity, and thorough “omics” characterization of isolates could help to understand their pathogenic potential (Singh et al. 2018a). Microbial phenomics has broad applications in space microbiology, not only because of concerns for astronaut health but also for applied and basic microbiology studies. Isolation of the novel exosporium (rich in lipoproteins)-producing, spore-forming bacterium S. kalamii might have great biotechnological relevance (Checinska Sielaff et al. 2017). Isolation of enhanced pathogenic variants of known microbes (Singh et al. 2016) will help to improve our understanding of the complex metabolic networks that control fundamental life processes under microgravity and in deep space.

Kalamiella piersonii gen. nov., sp. nov. description

Etymology: (N.L. fem. dim. n. Kalamiella, named after APJ Abdul Kalam (1934–2015), a well-known scientist who advanced space research in India. pierson.i.i N.L. gen. n. piersonii, referring to Duane Pierson, an accomplished American space microbiologist).

Cells are Gram-stain-negative, aerobic, motile short rods (1–1.2 × 2.8 μm), occurring singly or in pairs. Colonies are circular and convex, with a diameter of approximately 0.6–1.0 mm, and beige in color after 24 h of incubation on TSA medium at 30 °C. Cell growth occurs at 12 and 37 °C but not at 4 or 44 °C. Optimum growth was observed at 30 °C. The pH tolerance is between 6.0 and 10.0, with a pH optimum at 8.0. All of the strains displayed positive growth at 0–5% NaCl. The strains were positive for carbon source utilization of dextrin, d-maltose, N-acetyl-d-glucosamine, N-acetyl-β-d-mannosamine, α-d-glucose, d-mannose, d-fructose, d-galactose, l-rhamnose, inosine, 1% sodium lactate, d-mannitol, myo-inositol, glycerol, d-glucose-6-PO4, d-fructose-6-PO4, troleandomycin, rifamycin SV, glycyl-l-proline, l-alanine, l-arginine, l-aspartic acid, l-glutamic acid, l-histidine, lincomycin, guanidine HCl, niaproof 4, d-galacturonic acid, l-galactonic acid lactone, d-gluconic acid, d-glucuronic acid, glucuronamide, mucic acid, d-saccharic acid, vancomycin, tetrazolium violet, tetrazolium blue, citric acid, d-malic acid, l-malic acid, lithium chloride, and γ-amino-butyric acid.
The strains were negative for carbon source utilization of d-trehalose, d-cellobiose, gentiobiose, sucrose, d-turanose, stachyose, d-raffinose, α-d-lactose, d-melibiose, β-methyl-d-glucoside, d-salicin, N-acetyl-d-galactosamine, N-acetyl neuraminic acid, 3-methyl glucose, d-fucose, l-fucose, fusidic acid, d-serine, d-sorbitol, d-arabitol, d-aspartic acid, d-serine, minocycline, gelatin, l-pyroglutamic acid, l-serine, pectin, quinic acid, p-hydroxy-phenylacetic acid, methyl pyruvate, d-lactic acid methyl ester, l-lactic acid, α-keto-glutaric acid, bromo-succinic acid, nalidixic acid, potassium tellurite, Tween 40, α-hydroxy-butyric acid, β-hydroxy-d,l-butyric acid, α-keto-butyric acid, acetoacetic acid, propionic acid, acetic acid, formic acid, aztreonam, sodium butyrate, and sodium bromate. Major cellular fatty acids (> 10%) are C16:0, C17:0 cyclo, Summed Feature 3, and Summed Feature 8. Lesser fatty acids are C12:0, C14:0, C14:0 2-OH, C19:0 cyclo ω8c, and Summed Feature 2.

The type strain, IIIF1SW-P2T (=DSM 108198=NRRL B-65522T), was isolated from the ISS Port panel of the Cupola, which is the observation deck for the crew. The DNA G + C content of the type strain is 57.07 mol% (whole genome).
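
A whole-genome G + C value such as the one reported above is the percentage of G and C bases among all called bases in the assembly. The following is a minimal sketch of that calculation, assuming the draft genome is available as FASTA-formatted text; the function names `read_fasta` and `gc_content_percent` are hypothetical, and ignoring ambiguous bases (such as N) is an assumption of this sketch, not a procedure stated by the authors.

```python
def read_fasta(lines):
    """Yield one sequence string per FASTA record from an iterable of lines."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)


def gc_content_percent(sequences):
    """Return G+C as a percentage of A, C, G, and T bases over all contigs.

    Ambiguous bases (for example N) are ignored in both numerator and
    denominator; this is an assumption of the sketch.
    """
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / (gc + at)


# Toy FASTA text (made-up contigs for illustration only).
toy_fasta = [">contig_1", "ACGTGG", "CCNN", ">contig_2", "ATAT"]
print(round(gc_content_percent(read_fasta(toy_fasta)), 2))
```

For a real assembly, the lines would come from an open file handle, for example `with open(path) as handle: gc_content_percent(read_fasta(handle))`, where `path` is a placeholder for the assembly file.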