Introduction

Rothia is a genus of gram-positive, aerobic, coccoid or bacillary, nonmotile, non-spore-forming bacteria of the family Micrococcaceae, phylum Actinobacteria (Austin 2015). To date, 15 Rothia species have been identified. As part of the normal microflora of the gastrointestinal tract (especially the oral cavity and stomach) of humans and animals, it can also cause gastric atrophy and intestinal metaplasia and induce opportunistic infections of the upper respiratory tract in immunocompromised people (Nardone and Compare 2015; Odeberg et al. 2023). Various Rothia have been recovered from soil, water sources, benthos, rocks, atmosphere, and other sources (Austin 2015).

Endophytic Rothia strains of different species have been isolated from the rhizosphere and tissues of many plants, viz. Sphagnum magellanicum (Opelt et al. 2007), Dysophylla stellata (Xiong et al. 2013), Hedysarum perrauderianum (Torche et al. 2014), H. naudinianum (Torche et al. 2014), Musa acuminata (Sekhar and Thomas 2015), Oryza sativa (Evangelista et al. 2017), Seidlitzia rosmarinus (Shurigin et al. 2020), Camellia sinensis (Borah and Thakur 2020), Alnus glutinosa (Davis et al. 2020), A. incana (Mercurio et al. 2022), Zea mays (Elbahnasawy et al. 2021; Pisarska and Pietr 2015), Vaccinium myrtillus (Mažeikienė et al. 2021), Santalum album (Tuikhar et al. 2022), Miscanthus floridulus (Xiao et al. 2023), Arabidopsis thaliana (Sokolov et al. 2021), Vitis vinifera (Vaghari Souran et al. 2023), and Beta vulgaris (Petrović et al. 2024). Rothia endophytes are inhibitory to several pathogenic fungi, bacteria, parasitic nematodes, and insect larvae, and they can be used as biofertilizers (Asadu et al. 2020; Bano and Muqarab 2017; da Silva et al. 2018; Nuaima 2022). Some Actinobacteria, e.g. Rothia mucilaginosa, have been reported to carry a nifH gene sequence, which suggests an ability to fix atmospheric nitrogen that has yet to be demonstrated (Gtari et al. 2012). Additionally, Rothia endophytica can be used as a biofertilizer without adverse effects: it has shown biocompatibility during cocultivation with crops, generates sufficient indole-3-acetic acid, tryptophan, and biomass, and can solubilize phosphate (Asyakina et al. 2023; Sokolova et al. 2024).

Many Rothia genomes isolated from humans, animals, and the environment have biosynthetic gene clusters that supposedly produce antibiotic nonribosomal peptides, siderophores, and other secondary metabolites that modulate microbe–microbe (quorum sensing) and, potentially, microbe–host interactions (de Oliveira et al. 2022). Certain strains produce natural substances with antimicrobial activity, which are planned to be used to treat human and animal infections (Fatahi-Bafghi 2021). In particular, peptidoglycan hydrolases produced by Rothia dentocariosa can be used as highly specific therapeutic agents against nasal pathogens (Stubbendieck et al. 2023). Finally, a number of Rothia soil bacteria can degrade xenobiotics, specifically phthalate esters and aromatic hydrocarbons (Jia et al. 2022; Yastrebova and Plotnikova 2020).

In a suspension culture of Arabidopsis thaliana (L.) Heynh (All-Russian Collection of Higher Plant Cell Cultures, Timiryazev Institute of Plant Physiology of the Russian Academy of Sciences, Moscow, Russia), a bacterial microflora was found that did not inhibit plant growth (Sokolov et al. 2021). A number of microbiological characteristics of the isolated strain are presented in Methods. The isolate’s 16S rRNA gene (GenBank accession number OQ702765.1) was sequenced by the Research and Production Company SYNTOL (2023). The results obtained by the 16S rRNA test (Sokolov et al. 2021; Shchyogolev et al. 2023) showed that this isolate (hereafter Isolate SG) is taxonomically close to members of the genus Rothia.

Analysis of the systematic position of Isolate SG (Sokolov et al. 2021) and phylogenetic studies of Rothia members in the 16S rRNA-based test (de Oliveira et al. 2022; Elbahnasawy et al. 2021; Fan et al. 2002; Fatahi-Bafghi 2021; Ko et al. 2009; Stubbendieck et al. 2023; Tuikhar et al. 2022; Xiong et al. 2013; Yastrebova and Plotnikova 2020) have shown their taxonomic proximity to other Micrococcaceae, including Kocuria, Arthrobacter, Micrococcus, and other genera.

This article reports the results for the quantitative intra- and intergeneric taxonomic relationships between the Micrococcaceae strains closely related to Isolate SG (Sokolov et al. 2021). The historically given assignments (names) of the strains were validated by using (1) the known results from whole-genome sequencing of the strains’ DNA and (2) a set of five whole-genome phylogenetic tests.

Methods

Microbiological characteristics of Isolate SG

Isolate SG was recovered from a suspension culture of Ar. thaliana (L.) Heynh (Sokolov et al. 2021) and was deposited in the Collection of Rhizosphere Microorganisms (Collection of Rhizosphere Microorganisms 2023), Institute of Biochemistry and Physiology of Plants and Microorganisms, Russian Academy of Sciences (IBPPM RAS) under accession number IBPPM 684. Bacteria were grown on tryptic soy broth or on tryptic soy agar (Becton Dickinson, NJ, USA).

Morphologically, Isolate SG is a gram-positive, facultatively anaerobic, coccoid bacterium that is non-acid-resistant and nearly spherical (diameter, about 1 µm). The optimal growth temperature is 30–37 °C. When growing in liquid tryptic soy broth, the cells spread by gliding motility. Incubation on tryptic soy agar at 30 °C gives rise to creamy, rounded colonies with an even edge and a smooth surface. The cells utilize sucrose and lactose as their sole sources of carbon and energy; they are sensitive to rifamycin, tetracycline, and cefoxitin and are resistant to laevomycetin (chloramphenicol), penicillin, and amoxycillin.

16S rRNA test

Isolate SG was taxonomically identified by its 16S rRNA sequence. The sequence was obtained by the Research and Production Company SYNTOL (SYNTOL 2023) by polymerase chain reaction with a set of universal primers, by using a cell suspension of Isolate SG provided by us. Bacteria were isolated from a 10-day-old suspension culture of Ar. thaliana (L.) Heynh by fractional centrifugation. The first centrifugation of the Arabidopsis cell suspension was done with a Jouan BR4i centrifuge (Jouan, France) at 300g for 30 min. The supernatant liquid was passed through a fine-pore paper filter to remove plant cell debris and was centrifuged again with a 5810R centrifuge (Eppendorf, USA) at 15,000g for 30 min. The bacterial cell sediment was resuspended in phosphate-buffered saline. The obtained 16S rRNA gene sequence of Isolate SG (GenBank accession number OQ702765) covers V2–V8 hypervariable regions and can be used for genus and species identification of the strain (Shchyogolev et al. 2023).

Sequence alignment of the 16S rRNA test

The results generated by the 16S rRNA-based tool (Search EzBioCloud Database 2023; Table 1) with the 16S rRNA gene sequence of Isolate SG were used as the initial sets of strains, 16S rRNA gene sequences, and genomes. A 16S rRNA phylogenetic tree was constructed with the “Advanced” option of the NGPhylogeny web service (Lemoine et al. 2019; NGPhylogeny.fr 2023) and with the FastME 2.0 program (FastME 2.0 2023). This method is distance based (Lefort et al. 2015) and was selected to facilitate comparison of its results with those of the ANI and AAI tests. The 16S rRNA genes of Isolate SG and of the strains listed in Table 1 were subjected to multiple sequence alignment (MSA). The MAFFT MSA tool used is part of the NGPhylogeny toolkit, can be combined with FastME, PhyML, and other tools, and is commonly used in phylogenetic studies.
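The distance-based idea can be illustrated with a minimal sketch that computes pairwise percent identity and the corresponding uncorrected p-distance from sequences that are already aligned. The toy sequences, the function names, and the gap handling (columns with a gap in either sequence are skipped) are assumptions made for illustration only; FastME and the EzBioCloud tool apply their own substitution models and alignment rules.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def p_distance_matrix(aligned: dict) -> dict:
    """Uncorrected p-distance (1 - identity fraction) for every ordered pair."""
    names = list(aligned)
    return {
        (m, n): 1.0 - percent_identity(aligned[m], aligned[n]) / 100.0
        for m in names
        for n in names
    }


# Hypothetical toy alignment, not data from this study.
toy_alignment = {
    "query": "ACGT-ACGTA",
    "ref_1": "ACGTTACGTA",
    "ref_2": "ACGA-ACCTA",
}
dist = p_distance_matrix(toy_alignment)
```

A matrix of this kind is the input that a distance-based tree-building program expects; in practice a corrected distance is normally preferred over the raw p-distance.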

Table 1 Characteristics of the type strains related to Isolate SG in the 16S rRNA-based test (Search EzBioCloud Database 2023)

Evolutionary distance matrices of average nucleotide and amino acid identities (ANI and AAI)

The average nucleotide identity (ANI) and average amino acid identity (AAI) matrices (ANI/AAI-Matrix 2023; Goris et al. 2007; Jain et al. 2018; Rodriguez-R and Konstantinidis 2014) and their corresponding evolutionary distance matrices were used to obtain quantitative data on the assignment of strains to a particular taxonomic category on the basis of whole-genome sequencing data. The distance matrices were generated by linear transformation of the ANI/AAI values. References to the genomes of 20 strains marked with asterisks in Table 1 are given in the output data of the program (Search EzBioCloud Database 2023). Additionally, when this study was in progress, only six more references to the genomes of strains of the species listed in Table 1 and marked with a double asterisk could be found on the website (Genome Information by Organism 2023). Thus, it became possible to take into consideration 26 whole genomes [references for 26 of the 33 strains in Table 1 (column 5)] across the range of I values from Table 1.
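A linear transformation of this kind can be sketched as follows. The exact formula used by the ANI/AAI-Matrix tool is not stated here, so the form D = 1 − identity/100, the averaging of the two directions of each pair, and the toy values are assumptions for illustration.

```python
def identity_to_distance(identity):
    """Convert a square matrix of percent identities (0-100) into distances in [0, 1].

    Assumed transformation: D = 1 - identity / 100, after averaging the two
    directions of each pair so that the result is symmetric.
    """
    n = len(identity)
    if any(len(row) != n for row in identity):
        raise ValueError("identity matrix must be square")
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            mean_identity = (identity[i][j] + identity[j][i]) / 2.0
            dist[i][j] = dist[j][i] = 1.0 - mean_identity / 100.0
    return dist


# Hypothetical identities for three genomes, not values from this study.
toy_identity = [
    [100.0, 99.0, 70.0],
    [99.0, 100.0, 66.0],
    [70.0, 66.0, 100.0],
]
toy_distance = identity_to_distance(toy_identity)
```

Because the transformation is linear, the ranking of genome pairs by relatedness is the same in the identity matrix and in the distance matrix.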

Tests based on ANI and AAI form the basis of the Microbial Genomes Atlas (MiGA) genomic and metagenomic data management and processing system (Rodriguez-R et al. 2020) for the phylogenetic classification and cataloging of microbial genomes and analysis of their gene content. In this study, the TypeMat version (New query datasets 2023) was used, which determines the taxonomic position, novelty rank, and content of the query genome and finds the closest genomes of the type strains of officially named species from the TypeMat specialized database (Rodriguez-R et al. 2020; 20,809 reference genomes as of summer 2023).

In the output generated by the programs (ANI/AAI-Matrix 2023), the phylogenetic trees built from evolutionary distance matrices by the BioNJ method (and some other methods) do not include data to characterize branch significance. In MSA-based taxonomic studies, this function is fulfilled by bootstrap analysis (Lemoine et al. 2018). This gap was addressed by applying the rate of elementary quartets (REQ) program (REQ 2023), which is designed to estimate branch significance for phylogenetic trees based on distance matrices. The closer the value in the REQ output (Guénoche and Garreta 2000) is to 100, the more fully the corresponding branch is supported by the pairwise evolutionary distances.
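The principle behind a quartet-based support value can be sketched in a simplified form. For an internal branch, one leaf is taken from each of the four subtrees attached to it, and the quartet counts as supporting the branch when the four-point condition on the distances agrees with the split. This is an illustrative reading of the approach, not the REQ program's actual code; the function name, the data layout, and the toy distances are assumptions.

```python
from itertools import product


def quartet_support(dist, sub_a, sub_b, sub_c, sub_d):
    """Percentage of quartets (a, b | c, d) supported by the distances.

    dist is a dict of dicts of pairwise distances. sub_a and sub_b are the
    leaf sets on one side of the internal branch, sub_c and sub_d those on
    the other side.
    """
    supported = 0
    total = 0
    for a, b, c, d in product(sub_a, sub_b, sub_c, sub_d):
        total += 1
        within = dist[a][b] + dist[c][d]
        across_1 = dist[a][c] + dist[b][d]
        across_2 = dist[a][d] + dist[b][c]
        # Four-point condition: the split is supported when the within-side
        # sum is strictly the smallest of the three sums.
        if within < min(across_1, across_2):
            supported += 1
    if total == 0:
        raise ValueError("each subtree must contain at least one leaf")
    return 100.0 * supported / total


# Hypothetical additive distances for four leaves with true split (w, x | y, z).
pairs = {
    ("w", "x"): 2.0, ("y", "z"): 2.0,
    ("w", "y"): 6.0, ("w", "z"): 6.0,
    ("x", "y"): 6.0, ("x", "z"): 6.0,
}
toy = {}
for (u, v), value in pairs.items():
    toy.setdefault(u, {})[v] = value
    toy.setdefault(v, {})[u] = value

good_split = quartet_support(toy, ["w"], ["x"], ["y"], ["z"])
bad_split = quartet_support(toy, ["w"], ["y"], ["x"], ["z"])
```

With these toy distances the correct split is supported by every quartet and the incorrect one by none, which mirrors the interpretation of values close to 100 as strong branch support.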

The systematic position of Isolate SG at the whole-genome level with respect to the entries from the UniProt protein structure database (UniProtKB 2023) (250 million records) was evaluated with the AAI-profiler program (AAI-profiler 2023; Medlar et al. 2018).

GTDB-based taxonomic classification

The taxonomic position of Isolate SG was also determined on the basis of the phylogenetic system described by Parks et al. (2017, 2018, 2020). This system is aimed at constructing a Genome Taxonomy Database (GTDB; Genome Taxonomy Database 2023) by using sets of 120 (bacteria) or 122 (archaea) concatenated amino acid sequences of ubiquitous single-copy proteins as phylogenetic markers. The online version of the GTDB-Tk taxonomic classification toolkit (Chaumeil et al. 2022; GTDB-Tk Classify-v1.6.0 2023) was used.

Phylogenetic trees

Phylogenetic structures with relatively small numbers of operational taxonomic units (OTUs) (tens and hundreds) were visualized and analyzed with the MEGA11 program (MEGA 2023). For trees with hundreds of thousands of OTUs, the Archaeopteryx program (Archaeopteryx 2023) was used. Table S1 (Supplementary Material) summarizes the phylogenetic tests used, with quantitative criteria for taxon demarcation.

Results and discussion

16S rRNA-based test

Table 1 summarizes the data obtained by the 16S rRNA-based tool (Search EzBioCloud Database 2023) for 33 type (T) strains of the genera Rothia, Kocuria, and Arthrobacter in the “Valid names only” category. On the basis of pairwise alignment of the 16S rRNA gene sequences, these strains were identified as related to Isolate SG, with values of pairwise genetic sequence identity I > 95%. Following the commonly accepted convention and as stated on the web page (Understanding Results 2023), I values of 95–98.6% correspond to strains belonging to the same genus and those of 92–95% correspond to strains belonging to the same family. According to Yarza et al. (2014), the taxonomic threshold for genera is I > 94.5% and that for families is I > 86.5%. For species, the condition I > 98.6% is now accepted (Kim et al. 2014; Stackebrandt and Ebers 2006), in contrast to the previously used I > 97% (Tindall et al. 2010).
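The two sets of thresholds quoted above can be compared with a short sketch. The cutoffs are those cited in the text; the function name, the scheme labels, and the treatment of boundary values (strict "greater than") are assumptions for illustration, and the species cutoff of 98.6% (Kim et al. 2014) is reused in both schemes.

```python
# Lower bounds (percent 16S rRNA identity) for sharing a rank, as quoted in the text.
THRESHOLDS = {
    "ezbiocloud": (("species", 98.6), ("genus", 95.0), ("family", 92.0)),
    "yarza_2014": (("species", 98.6), ("genus", 94.5), ("family", 86.5)),
}


def lowest_shared_rank(identity_percent: float, scheme: str = "ezbiocloud") -> str:
    """Return the lowest rank two strains are expected to share at this identity."""
    for rank, cutoff in THRESHOLDS[scheme]:
        if identity_percent > cutoff:  # assumption: boundaries are exclusive
            return rank
    return "above family"


# Hypothetical identity values, not entries from Table 1.
examples = {
    value: (lowest_shared_rank(value), lowest_shared_rank(value, "yarza_2014"))
    for value in (99.0, 96.0, 94.8, 85.0)
}
```

An identity of 94.8%, for instance, falls at the family level under the first scheme and at the genus level under the second, which shows how sensitive the assignment is to the convention chosen.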

The I value (Table 1, line 2, column 6, red color) indicates that Isolate SG and R. amarae strain JCM 11375 belong to the same species. The > 95% pairwise I values obtained for Isolate SG with all strains (Table 1, column 6) are the basis for assigning Isolate SG to the same genus as each of the indicated 33 type strains (16S rRNA-based test). In keeping with traditional taxonomic practice, however, these strains have been assigned by their authors to three genera: Rothia, Kocuria, and Arthrobacter. Such traditional prokaryotic classifications need revision on the basis of newly emerging methods, and the necessary clarifications should come from phylogenetic studies of the strains that use whole-genome data (Genome Taxonomy Database 2023; Parks et al. 2017, 2018, 2020).

Figure 1 shows the phylogram of 16S rRNA gene sequences obtained with FastME 2.0 (FastME 2.0 2023) for the strains listed in Table 1. The numbers near the nodes indicate statistical support for the branches (absolute units) in 1000 bootstrapping cycles (Lemoine et al. 2018). The red dots mark the nodes of four clusters (monophyletic groups a–d) comprising sets of 5 to 12 Rothia, Kocuria, and Arthrobacter strains. As follows from the percent identity matrix (Fig. S1, Supplementary Material), the pairwise values of I between strains within these monophyletic groups mostly satisfy the condition I > 95%. This indicates that by the criteria used (Kim et al. 2014; Stackebrandt and Ebers 2006; Understanding Results 2023; Yarza et al. 2014), these strains belong to the same genus according to 16S rRNA technology. Meanwhile, between clusters, I values generally satisfy the 87–92% < I < 95% condition, which demarcates genera within a family. However, both within and outside the clusters, there are a notable number of deviations of I values from these quantitative criteria of taxon demarcation. This fact illustrates their conventionality as averaged characteristics within fairly broad distributions of the corresponding statistical data with 16S rRNA sequences (Luo et al. 2014).

Fig. 1

Phylogram (FastME) of the 16S rRNA gene sequences of the Rothia, Kocuria, and Arthrobacter type strains related to Isolate SG (Table 1). The branch length reflects the FastME distance. The numbers near the nodes indicate the bootstrapping results

Whole-genome GTDB, ANI, and AAI tests

More unambiguous taxonomic estimates can be expected from the use of the ANI and AAI matrices on the basis of whole-genome DNA sequencing of the strains (ANI/AAI-Matrix 2023; Goris et al. 2007; Jain et al. 2018; Rodriguez-R and Konstantinidis 2014). In Table 1, GCA_007666515.1 was initially specified as the genome corresponding to R. amarae strain JCM 11375T, which is the most closely related to Isolate SG in the 16S rRNA test. However, the GCA_007666515.1 genome is described in the NCBI database record as an “anomalous assembly” owing to the “unverified source organism.” We examined it by using the GTDB-Tk software package (GTDB-Tk Classify-v1.6.0 2023) to determine its most probable taxonomic classification and to clarify the species identity of the strain under study on the basis of ANI. The resulting taxonomic classification of this genome was as follows: domain, Bacteria; phylum, Actinobacteriota; class, Actinomycetia; order, Actinomycetales; family, Micrococcaceae; genus, Rothia; and species, Rothia sp001683935 (reference genome ID), when assigning to it the taxonomy of the reference genome GCF_001683935.1 of Rothia sp. ND6WE1A. The resulting pairwise value of ANI = 98.87% and the value of AF = 96% (the percentage of orthologous regions) indicate high intraspecies relatedness between R. amarae JCM 11375T (GCA_007666515.1) and Rothia sp. ND6WE1A (GCF_001683935.1). The program output contains the item “note,” which states that “topological placement and ANI have congruent species assignments.”
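The logic of a combined ANI and AF check can be sketched as below. The default cutoffs (ANI ≥ 95% and AF ≥ 0.65) are assumptions based on commonly cited species-level criteria and are not taken from this article; they are exposed as parameters so that other values can be substituted. The function name is hypothetical.

```python
def same_species(ani_percent: float, aligned_fraction: float,
                 ani_min: float = 95.0, af_min: float = 0.65) -> bool:
    """Return True when both the ANI and the alignment fraction pass their cutoffs.

    ani_min and af_min are assumed defaults, not values stated in the text.
    """
    if not 0.0 <= aligned_fraction <= 1.0:
        raise ValueError("aligned_fraction must be given as a fraction in [0, 1]")
    return ani_percent >= ani_min and aligned_fraction >= af_min


# Values reported in the text for GCA_007666515.1 versus GCF_001683935.1.
reported_pair = same_species(98.87, 0.96)
```

Requiring both conditions guards against a high ANI that is computed over only a small shared portion of two genomes.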

In view of the results given just above, the GCF_001683935.1 genome [the reference genome for GCA_007666515.1 in the GTDB (Genome Taxonomy Database 2023)] was accepted in our laboratory as a verified reference genome for Isolate SG. Thus, the GTDB test served as the whole-genome method by which the reference genome was refined relative to GCA_007666515.1 (the reference genome for Isolate SG in the 16S rRNA test).

Like the aforementioned 16S rRNA-, ANI-, and AAI-based tests, the GTDB technology remains within the paradigm of vertically inherited genotypic traits of prokaryotes, because it uses the markers from the core part of the pangenome without consideration of the effects of horizontal gene transfer (HGT) in its accessory and strain-specific parts (Koonin 2012). This probably explains, in particular, the need to verify prokaryote classification and nomenclature with the corresponding classification changes, which turned out relevant for about 60% of the GTDB entries (Genome Taxonomy Database 2023) analyzed in Parks et al. (2018).

Figure 2 shows the phylogram of the ANI of the 26 strains being considered. The phylogram is presented in the output data of the software package (ANI/AAI-Matrix 2023) and was constructed by the BioNJ method by using a matrix of ANI values. The red dots mark the nodes of five monophyletic groups (a–e) comprising sets of three to seven Rothia, Kocuria, and Arthrobacter strains. The results reflected in the matrix of pairwise ANI values (Fig. S2, Supplementary Material) show that there is no significant nucleotide identity between the genome of Rothia sp. ND6WE1A (the reference strain for Isolate SG) and the genomes of a number of strains. This explains the greatest evolutionary distance for Rothia sp. ND6WE1A among all the 26 strains taken into consideration (Fig. 2).

Fig. 2

Phylogram obtained by the BioNJ method by using the distance matrix in the ANI-based test with the genomes of the strains listed in Table 1. The branch length reflects the ANI distance. The numbers near the nodes are branch significance estimates made with REQ (2023)

There are ambiguous taxonomic % values (Figs. S1 and S2, Supplementary Material) in the 16S rRNA test (based on sequence similarity) and in the pairwise values of the ANI matrix (based on the evolutionary distance) at the genus and species levels. ANI values ≤ 80% were considered within and between the monophyletic groups in Fig. S2. This is associated with the limited sensitivity of the ANI-based test at ANI values ≤ 80% (Rodriguez-R and Konstantinidis 2014). For more reliable quantitative demarcation of strains at the family level, the developers of the resource (ANI/AAI-Matrix 2023) recommend the use of the AAI-based test.

Following this recommendation, we used the ANI/AAI-Matrix software package (ANI/AAI-Matrix 2023) to generate a matrix of AAI values and the corresponding phylogenetic constructs from the whole proteomes of the same 26 strains of Rothia, Kocuria, and Arthrobacter. The red dots (Fig. 3) indicate 5 nodes of monophyletic groups a–e, comprising 2–13 strains in the phylogram. The pairwise AAI matrix values showed a distribution of strains between the five groups, marked with dashed rectangles (Fig. 4), with intrageneric (AAI ≥ 65%, red) and intraspecies (AAI > 90%, yellow) AAI values (Understanding Results 2023). For demarcating genera at the family level, the condition 45% < AAI < 65% (colorless cells) is given on the web page (Understanding Results 2023). These inequalities are based on averaged characteristics and are generally consistent with the statistical analysis results given in Luo et al. (2014). Note the higher branch support in the AAI-based test (Fig. 3), as compared with that in the ANI-based (Fig. 2) and 16S rRNA-based (Fig. 1) tests.

Fig. 3

Phylogram obtained by the BioNJ method by using the distance matrix in the AAI-based test with the genomes of the strains listed in Table 1. The branch length reflects the AAI distance. The numbers near the nodes are branch significance estimates made with REQ (2023)

Fig. 4

AAI value matrix obtained from the distance matrix presented in the output data of the software package (ANI/AAI-Matrix 2023). The dashed rectangles mark the fragments corresponding to monophyletic groups a–e, whose nodes are marked with red dots in Fig. 3. The cells with AAI ≥ 65% (red, same genus) and AAI > 90% (yellow, same species) are highlighted in color

In the AAI-based test, the strains collected within separate monophyletic groups a–e in Figs. 3 and 4 should be considered belonging to the same genus (AAI ≥ 65%). By the AAI < 65% criterion, however, the evolutionary distances between these groups correspond to the intergeneric level within the family Micrococcaceae. Consequently, the genus affiliation of the strains in monophyletic groups a, b, and c, originally assigned by their authors to the genus Rothia, and d and e, originally assigned to the genus Kocuria, requires clarification (with possible renaming). Rothia sp. ND6WE1A, the reference strain for Isolate SG at the whole-genome level, and R. dentocariosa ATCC 17931T, the type species for the genus Rothia, are part of groups a and b (Figs. 3, 4), separated by intergeneric AAI values. In the AAI-based test, Art. crystallopoietes strain DSM 20117T and partly R. mucilaginosa ATCC 25296T are found outside the genera Rothia and Kocuria. It is noteworthy that in the 16S rRNA-based test (Table 1, column 5), these strains show I > 95% (an intrageneric level of sequence identity to Isolate SG) (see also Shchyogolev et al. 2023). This fact illustrates limitations of the 16S rRNA test in the taxonomic identification of bacterial isolates, which have been repeatedly considered in the past two decades. For example, Konstantinidis and Tiedje (2007) pointed out that the frequent result of the high sequence conservation of prokaryotic 16S rRNA genes is that strains differing in their phenotype, ecology, or complete genome may have very similar 16S rRNA gene sequences.

The AAI-based test results (Figs. 3 and 4) indicate that Rothia sp. ND6WE1A, the reference strain for Isolate SG at the level of the genomes studied, belongs to the genus Rothia and forms a monophyletic group with R. aerolata CCM 8669 (AAI = 71%), R. terrae KJZ-14 (AAI = 67%), and R. nasimurium E1706032 (AAI = 67%) in cluster a (Figs. 3 and 4). The sources of the strains are highly diverse in type and geography. Rothia sp. ND6WE1A is a member of a microbial consortium at the cathode of a solar microbial fuel cell originally enriched in seawater, New Jersey, USA. R. aerolata CCM 8669 is a type strain from the microbial culture collection in Beijing, China. R. terrae KJZ-14 and R. amarae KJZ-9 were isolated from soil dirt in Minhang District of Shanghai, China, and R. nasimurium E1706032 was isolated from a bird brain at a duck farm in Jining, Shandong Province, China. The adaptation of these strains to life in such contrasting ecological niches is reflected in their species diversity (AAI < 90%) within the genus Rothia, to which they were originally assigned by their authors.

These observations on Rothia sp. ND6WE1A are consistent with the results from the use of the MiGA webserver (New query datasets 2023; Rodriguez-R et al. 2020). These results indicate that this genome most probably belongs to the order Micrococcales (p = 0.0024) and probably belongs to the genus Rothia (p = 0.31). However, the “Taxonomic novelty” section states that it most probably belongs to a species not included in the TypeMat database (New query datasets 2023; p = 0.0021, the highest taxonomic rank with p ≤ 0.01), which comprises genomes of type strains (including Rothia dentocariosa ATCC 17931T). The data presented in MyTaxa Scan indicate that the genome of Rothia sp. ND6WE1A contains regions with an unusual taxonomic distribution. These regions are shown and enumerated in the MyTaxa Scan diagram (Supplementary file MyTaxa_Scan_diagram_Rothia_sp_ND6WE1A.pdf). According to Luo et al. (2014) and Rodriguez-R et al. (2018, 2020), they can be interpreted as the result of HGT, because the estimates presented in the “Quality (essential genes)” section show zero contamination of the genome core. These regions most probably lie in the accessory and strain-specific parts of the pangenome (Koonin 2012; Tettelin and Medini 2020) and are not accounted for in the other tests we used.

Rothia sp. ND6WE1A, the reference strain for Isolate SG, is part of the GTDB R06-RS202 reference tree, included in the file bac120_r202.tree, with 254,091 OTUs (GTDB Data 2023). Figure 5 shows a Micrococcaceae subtree fragment that is a monophyletic group comprising strains assigned to the genus Rothia in GTDB R06-RS202. This fragment was extracted from the file bac120_r202.tree with Archaeopteryx (Archaeopteryx 2023). In the phylogenetic tree of Fig. 5, including 22 OTUs, the genome designations in the file bac120_r202.tree (GTDB Data 2023) [access codes to the NCBI genome assemblies (Genome Information by Organism 2023)] were replaced by the strain names given in the corresponding entries in the NCBI database. The red dots denote cluster nodes—three isolated monophyletic groups a–c within the g_Rothia fragment of the file bac120_r202.tree (GTDB Data 2023).

Fig. 5

Fragment of the reference phylogenetic tree bac120_r202.tree (GTDB Data 2023) for the monophyletic group assigned to the genus Rothia in GTDB R06-RS202. The branch length reflects the similarity between the sets of 120 concatenated amino acid sequences of ubiquitous single-copy proteins. The numbers near the nodes indicate the bootstrapping results

As an example of the abovementioned need for detailing of the taxonomic affiliations detected by GTDB technology (Parks et al. 2018), one can note that the entries attributed by their authors to the genera Curtobacterium and Kocuria are found in the monophyletic group c of the g_Rothia fragment of the file bac120_r202.tree (GTDB Data 2023) (Fig. 5). A relevant positive finding here is the clarification of the systematic position of the metagenomic entries Micrococcaceae bacterium UBA7136 (rat intestinal metagenome) and M. bacterium UBA5788 (urban ground metagenome), having the status “unclassified Micrococcaceae” and found as part of one cluster in the monophyletic group a of the g_Rothia fragment (Fig. 5).

For estimating the validity of the generic assignment of the strains and metagenomic entries belonging to monophyletic groups a–c of the Fig. 5 phylogram, whose nodes are marked with red dots, the AAI test (ANI/AAI-Matrix 2023) was used as the most reliable means currently available to distinguish between strains within the family (Goris et al. 2007; Rodriguez-R and Konstantinidis 2014; Rodriguez-R et al. 2018). Among the 22 strains in Fig. 5, whose genomes are used in GTDB R06-RS202 (GTDB Data 2023), the proteomes of M. bacterium UBA7136, M. bacterium UBA5788, and Rothia sp. MGYG-HGUT-01258 were not found in the NCBI database (Genome Information by Organism 2023). The results obtained for the remaining 19 strains are shown in Figs. 6 and 7.

Fig. 6

Phylogram obtained by the BioNJ method by using the distance matrix in the AAI-based test with the proteomes of the strains from the monophyletic group assigned to the genus Rothia in GTDB R06-RS202 (Fig. 5). The branch length reflects the AAI distance. The numbers near the nodes are branch significance estimates made with REQ (2023)

Fig. 7

AAI value matrix obtained from the distance matrix presented in the output data of the software package (ANI/AAI-Matrix 2023). The dashed rectangles mark the fragments corresponding to monophyletic groups a–c, whose nodes are marked with red dots in Fig. 6. The cells with AAI ≥ 65% (red, same genus) and AAI ≥ 90% (yellow, same species) are highlighted in color

The red dots in Fig. 6 mark the nodes of three clusters corresponding to those in Fig. 5, with the same designations (a–c). The species (strain-level) and genus (cluster-level) demarcation of the OTUs in Figs. 5 and 6 is quantitatively ensured by their comparison with the matrix of AAI values in Fig. 7, in which the areas corresponding to clusters a–c in Figs. 5 and 6 are marked with dashed lines. The coloring shows cells satisfying the AAI ≥ 90% (yellow, same species) and AAI ≥ 65% (red, same genus) conditions (Luo et al. 2014; Understanding Results 2023). The colorless cells correspond to intergeneric pairwise values of 45% < AAI < 65%, joining clusters a–c together (Figs. 5 and 6) within the same family (Luo et al. 2014; Understanding Results 2023). Membership in the same species in the AAI-based test (Fig. 7, yellow cells) is confirmed for R. nasimurium irhom31 and R. nasimurium E1706032; for Rothia sp. HMSC078H08 together with the pair R. mucilaginosa ATCC 25296T and R. mucilaginosa NUM-Rm6536; and for Kocuria sp. TGY1127_2 and Kocuria sp. 36.

The most substantial result of using the AAI-based test in the analysis of the phylograms in Figs. 5 and 6 for the strains assigned to Rothia in GTDB R06-RS202 (GTDB Data 2023) is the < 65% intergeneric level of AAI values (Luo et al. 2014; Understanding Results 2023) for clusters a, b, and c (Fig. 7). In accordance with the conditions (Luo et al. 2014; Understanding Results 2023), these clusters should be assigned to different genera within the same family (AAI > 45%).
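The reasoning applied to the clusters can be expressed as a small check on an AAI matrix: within-cluster values should reach the genus cutoff, whereas between-cluster values should stay below it but above the family bound. The cutoffs are those quoted in the text; the function name, the data layout, and the toy AAI values are assumptions for illustration.

```python
from itertools import combinations

GENUS_AAI = 65.0   # same-genus cutoff quoted in the text
FAMILY_AAI = 45.0  # same-family lower bound quoted in the text


def summarize_clusters(aai, clusters):
    """Return (minimum within-cluster AAI, maximum between-cluster AAI).

    aai maps frozenset({strain_1, strain_2}) to a percent AAI value;
    clusters maps a cluster label to the list of its strains.
    """
    cluster_of = {s: name for name, members in clusters.items() for s in members}
    within, between = [], []
    for s, t in combinations(sorted(cluster_of), 2):
        value = aai[frozenset((s, t))]
        if cluster_of[s] == cluster_of[t]:
            within.append(value)
        else:
            between.append(value)
    if not within or not between:
        raise ValueError("need at least one within- and one between-cluster pair")
    return min(within), max(between)


# Hypothetical strains and AAI values, not data from Fig. 7.
toy_clusters = {"a": ["a1", "a2"], "b": ["b1", "b2"]}
toy_aai = {
    frozenset(("a1", "a2")): 71.0,
    frozenset(("b1", "b2")): 68.0,
    frozenset(("a1", "b1")): 60.0,
    frozenset(("a1", "b2")): 58.0,
    frozenset(("a2", "b1")): 61.0,
    frozenset(("a2", "b2")): 59.0,
}
min_within, max_between = summarize_clusters(toy_aai, toy_clusters)
separate_genera = min_within >= GENUS_AAI and FAMILY_AAI < max_between < GENUS_AAI
```

In this toy case the two clusters would be read as distinct genera of one family, the same pattern that the text describes for clusters a, b, and c.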

AAI-profiler

Figure 8 shows the AAI distribution diagram for the proteome of the reference strain Rothia sp. ND6WE1A, used as a query in the AAI-profiler test (AAI-profiler 2023; Medlar et al. 2018). On the horizontal axis are plotted AAI values between the query and the species members in the UniProt database (UniProtKB 2023). The vertical axis shows the matched fraction (MF, coverage), which is the fraction of query proteins that have a match in the species. MF values are averaged characteristics over a large dataset in the UniProt database, including proteins from partially sequenced genomes. The icons in the diagram correspond to the species that have scored the highest marks with allowance for AAI and MF, i.e., the sum of the sequence identity values for all query proteins with established matches. Related species, grouped and colored on the basis of genus, form a characteristic “cloud” in the diagram, with AAI values reflecting the evolutionary proximity of the UniProt strains to the query strain. The horizontal axis has icons for the species for which DNA sequencing results have been obtained for individual proteins only.
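The relationship between AAI, MF, and the score can be made concrete with a minimal sketch. The definitions follow the description above; the function name, the input layout (one identity value per matched query protein), the use of percent units, and the toy numbers are assumptions for illustration and do not reproduce the AAI-profiler implementation.

```python
def aai_profile(best_hits: dict, n_query_proteins: int):
    """Return (AAI, matched fraction, score) for one species.

    best_hits maps a query protein ID to the percent identity of its match
    in the species; query proteins without a match are simply absent.
    """
    if n_query_proteins <= 0:
        raise ValueError("the query proteome must contain at least one protein")
    if len(best_hits) > n_query_proteins:
        raise ValueError("more matches than query proteins")
    if not best_hits:
        return 0.0, 0.0, 0.0
    score = sum(best_hits.values())            # sum of identities over matched proteins
    aai = score / len(best_hits)               # mean identity over matched proteins
    matched_fraction = len(best_hits) / n_query_proteins
    return aai, matched_fraction, score


# Hypothetical query of four proteins, three of which have a match.
toy_hits = {"p1": 90.0, "p2": 70.0, "p3": 80.0}
toy_aai, toy_mf, toy_score = aai_profile(toy_hits, n_query_proteins=4)
```

A species represented by a single highly similar protein would show a high AAI with an MF close to zero, which is why such entries sit on the horizontal axis of the diagram.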

Fig. 8

AAI distribution diagram for Rothia sp. ND6WE1A, as found in the AAI-profiler output data. The vertical dashed lines in Fig. 8 correspond to the AAI cutoff values for grouping strains into genera (AAI > 0.65) and species (AAI > 0.9). The icons are colored according to genus (bacteria) or order (eukaryotes). Eukaryotic species are marked with rhombuses; bacteria, with circles; archaea, with crosses; and everything else (viruses, metagenomes, unclassified samples), with squares

The vertical dashed lines in Fig. 8 correspond to the AAI threshold values for grouping strains into genera (AAI > 0.65) and species (AAI > 0.9) according to the conditions specified on the web page (Understanding Results 2023). The strain most closely related to Rothia sp. ND6WE1A is R. amarae KJZ-9, with AAI = 98.9%, ANI = 98.6%, and MF = 0.946. Thus, all three genomes [GCA_007666515.1, R. amarae (mucilaginosa) DE0531 (originally listed in Table 1 as corresponding to R. amarae JCM 11375T but excluded from our consideration; see above); GCA_001683935.1, Rothia sp. ND6WE1A; and GCA_014705945.2, R. amarae KJZ-9] represent the same species with high interstrain evolutionary relatedness, characterized by ANI and AAI values of about 99%.

Another strain closely related to Rothia sp. ND6WE1A is Arthrobacter sp. SMCC G919, whose AAI of 94% exceeds the threshold for grouping strains at the species level (AAI > 0.9) and whose icon is located on the horizontal axis of the diagram (Fig. 8). Next come members of Thermocrinis, Planobispora, Mycobacterium, Glutamicibacter, Micrococcus, and other genera, which are also located on the horizontal axis, in descending order of AAI, with values of about 0.9. The appearance of such “noncongruent” species with a small (almost zero) MF in a diagram that groups mostly Micrococcaceae members may be associated with HGT (Medlar et al. 2018; Treangen and Rocha 2011).

On the other hand, Arthrobacter occupies a definite position in the main part of the diagram, with an MF in the range 0.03–0.25 and AAI = 62–67%, i.e., within the 60–80% region that groups strains at the genus level (Luo et al. 2014). Members of Rothia and Kocuria also fall into the same area in the diagram (Fig. 8), with MFs ranging from 0.28 to 0.95. Such “incorporation” of Kocuria and Arthrobacter members (and other bacterial species/strains marked with pink, green, and light green dots in Fig. 8) into the main cluster of Rothia, comprising species related to the query strain Rothia sp. ND6WE1A, should be considered a signal for eventual taxonomic reclassification of these members, as was also shown above by the pairwise whole-genome tests based on ANI and AAI. According to Medlar et al. (2018), at high MFs, such overlapping is due to misclassified or mislabeled samples, whereas at lower MFs, it may indicate contamination or, possibly, HGT.

As found with AAI-profiler, the strain most closely related to Rothia sp. ND6WE1A (after R. amarae KJZ-9) is R. nasimurium PT-32 (Fig. 8; AAI = 71.7%, MF = 0.69). These data show a noticeable gap between Rothia sp. ND6WE1A (together with the closely related R. amarae KJZ-9) and the main group of species in Fig. 8 (MF of 25% and greater), which indicates that its genome differs strongly from those of the related species found by AAI-profiler in the UniProt database. This observation is consistent with the results of the phylogenetic studies using ANI (Fig. 2), which showed a considerable evolutionary deviation of Rothia sp. ND6WE1A from the 26 Rothia, Kocuria, and Arthrobacter strains considered.

According to the results in Figs. 3, 4, 5, 6 and 7, Rothia sp. ND6WE1A, which is the reference strain for Isolate SG at the reference genome level, and R. dentocariosa ATCC17931T, which represents the type species of Rothia (Austin 2015), may be members of different monophyletic groups separated by intergeneric AAI values < 65%. Thus, when the strains belonging to such groups are appropriately verified (Figs. 3, 4, 5, 6, 7), priority in the genus name Rothia should be given to the cluster with a member of the type species R. dentocariosa. It follows that the strains comprising cluster a in Figs. 3, 4, 5, 6 and 7 (including Rothia sp. ND6WE1A) could be assigned an appropriate genus name after additional genotypic and phenotypic data are obtained (Oren and Garrity 2014). This is consistent with the results of the MiGA-based test for Rothia sp. ND6WE1A, which showed its statistically valid taxonomic novelty in relation to the TypeMat reference database, containing type whole-genome material. Additional physiological–biochemical and genetic studies are needed to resolve this issue with respect to Isolate SG.

In the AAI-profiler test (Fig. 8), Rothia sp. ND6WE1A was found to belong to the same species as R. amarae KJZ-9, with ANI and AAI values of about 99%. That the two strains live in largely different ecological niches (see above) is most probably explained by the phenotypic traits controlled by the genes in the accessory and strain-specific parts of the pangenome (Koonin 2012; Tettelin and Medini 2020). These genes, introduced most commonly by HGT, and their products are not accounted for in the tests based on 16S rRNA, ANI, AAI, GTDB-Tk, and MiGA (outside the “MyTaxa Scan” section), which handle “housekeeping” genes.

Conclusions

The results obtained show that taking into account the ANI and AF, which ensure quantitative identification of OTUs at the species level for the GTDB bacterial and archaeal genomes, solves the problem of constructing a taxonomic structure from domain to species only in part. This is supported by the contradictions that we have identified between our results and the genus names of the strains (metagenomic entries) in GTDB R06-RS202. We have shown that AAI-based tests can be used as a first step toward eliminating these contradictions. However, additional genotypic and phenotypic characteristics need to be considered before an appropriate reclassification of the strains/metagenomic entries can be made.

Thus, the presented results should be considered as identifying the problem and providing a signal for the subsequent reclassification of the strains in question. Such reclassification could be done by the authors of the strains (or any other interested researchers), by using appropriate additional genotypic and phenotypic criteria within a polyphasic approach.