Introduction

Smokeless tobacco (ST) is a proven carrier of carcinogenic compounds and is part of the daily routine of more than 356 million people across 140 countries (Sinha et al. 2018). India is among the top consumers of ST products, with 21.4% of adults, both men and women, consuming tobacco routinely. Approximately 0.35 million Indians die every year from consuming tobacco in any form (Siddiqi et al. 2020). Ample information is available on the chemical constituents of tobacco products (Rodgman and Perfetti 2009; Vishwakarma and Verma 2020, 2021); however, limited information is available on the bacterial diversity of indigenous ST products. Smokeless tobacco harbors a high bacterial load of more than one million bacterial cells per gram of product (Han et al. 2016; Smyth et al. 2017). The resident bacteria of tobacco may exhibit a dynamic metabolism that participates in several physiological reactions to cope with the environment of tobacco products (Tyx et al. 2016; Sami et al. 2021; Srivastava et al. 2021). During the storage and aging of tobacco products, nitrite accumulation may occur due to bacterial metabolic activity (Law et al. 2016; Han et al. 2016), although this remains to be demonstrated. The available nitrate is converted into nitrite by bacterial nitrate reductases under oxygen-deprived or anaerobic conditions (Di Giacomo et al. 2007; Fisher et al. 2012). Microbes can convert nitrate into nitrite by two pathways, i.e., the dissimilatory nitrate reduction and periplasmic nitrate reduction pathways (González et al. 2006; Srivastava et al. 2022a). The nitrite further reacts with tobacco alkaloids to form tobacco-specific nitrosamines (TSNAs), which are well-known carcinogens (Tyx et al. 2016; Rivera et al. 2021; Vishwakarma and Verma 2021).
Moreover, Rothia, Streptococcus, Neisseria, and Corynebacterium-like bacteria also form part of the ST bacterial microbiome and have been implicated in acetaldehyde formation during fermentation (Mason et al. 2015; Halboub et al. 2020; Vishwakarma and Verma 2021). In addition, tobacco-inhabiting bacteria contribute bacterial toxins, pro-inflammatory molecules, lipopolysaccharides (LPS), flagellins, and peptidoglycan or peptidoglycan fragments (Tyx et al. 2016; Smyth et al. 2017; Sajid et al. 2021a), which elevate the health risks of tobacco chewing for the human oral cavity. Therefore, it is crucial to study the bacterial diversity of ST products along with its role as a risk factor for human oral health. At present, only one report (Sajid et al. 2021a) is available on the bacterial diversity of Indian smokeless tobacco products; however, it is limited to a sample size of seven and includes only loose tobacco leaves. In the present study, taxonomic and functional profiling of indigenous smokeless tobacco products was carried out to understand their bacterial composition and putative health risks. An attempt was also made to compare the bacterial diversity and predicted functional genes between loose and commercial ST products.

Materials and methods

Sample collection, library preparation and FastQC

Popular brands of commercial smokeless tobacco (ST) and loose tobacco leaves were collected from local markets in various regions of northern India during June and July 2021 (Table 1). The samples were stored at low temperature (4 °C) to arrest microbial growth until processing. A 0.5 g portion of each ST sample was vortexed vigorously in sterile phosphate-buffered saline (0.1 M, pH 8.0) to loosen the associated bacterial communities. The samples were centrifuged at a low speed of 1000 rpm to separate the bacterial suspension from the tobacco leaves, and the supernatant was centrifuged at a high speed of 10,000 rpm to collect the bacterial pellet. The bacterial cells thus obtained were treated with extraction buffer (N, N, N, N-cetyltrimethylammonium bromide 1%, polyvinylpolypyrrolidone (PVPP) 2%, 1.5 M NaCl, 100 mM EDTA, 0.1 M TE buffer (0.1 M, pH 8.0), 0.1 M sodium phosphate buffer (pH 8.0), and 100 μL RNase A; Verma and Satyanarayana 2011) along with 2.5 µL each of lysozyme (10 mg/mL) and proteinase K (10 mg/mL) and kept at 37 °C for 1 h, followed by the addition of 50 µL of 10% (w/v) SDS solution. The cells were then kept at 60 °C for 1 h to complete lysis. The lysate was recovered by high-speed centrifugation and loaded onto the silica-based columns of the Qiagen DNeasy Blood and Tissue Kit (Qiagen, USA), which was used with slight modifications. The metagenomic DNA was eluted in 50 µL of 0.1 M TE buffer (pH 8.0) and stored at −20 °C until use. The 16S rRNA amplicon library was prepared from 5 ng of extracted metagenomic DNA (quantified with Qubit) by amplifying the V3–V4 hypervariable region (primers V3F_CCTACGGGNBGCASCAG and V4R_GACTACNVGGGTATCTAATCC) using the NEBNext Ultra DNA library preparation kit. The libraries were processed for paired-end sequencing at AgriGenome Labs Pvt. Ltd, Kochi, India, on the Illumina HiSeq 2500 platform. The sequences obtained were subjected to quality control (QC) analysis, including average base content per read and GC distribution (Supplementary Table 1).

Table 1 Sample information, pH and moisture content of all ST samples

Data processing, taxonomy assignment

Raw reads obtained from the Illumina platform were processed in the QIIME2 (Quantitative Insights into Microbial Ecology) pipeline (ver. 2020.11). The q2-dada2 plugin available in QIIME2 was used for denoising, chimera removal, and quality filtering of sequences to generate good-quality reads (QC value > 30) and amplicon sequence variants (ASVs). The good-quality reads were then used for taxonomy assignment with the qiime feature-classifier command against the SILVA 16S database (Quast et al. 2013).

Diversity analysis, and functional prediction

Various alpha and beta diversity measures, namely Chao1, Shannon, Simpson, PD whole_tree, Good's coverage estimator, the Bray–Curtis dissimilarity index, and weighted UniFrac, were calculated with the QIIME2 command-line pipeline to determine bacterial diversity and abundance in each ST sample. The putative functional annotation of the bacterial population of the ST samples was analysed using the Phylogenetic Investigation of Communities by Reconstruction of Unobserved States tool (PICRUSt2; v 2.4.1) (Douglas et al. 2020), which places the ASVs generated by the denoising step into its reference tree for functional prediction.
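
The non-phylogenetic measures named above have simple closed forms. The following minimal Python sketch illustrates them for a vector of ASV counts; it is not the QIIME2 implementation used in this study. The function names, the base-2 logarithm for Shannon, the 1 − Σp² form of Simpson, and the bias-corrected form of Chao1 are assumptions made for illustration. Faith's PD and weighted UniFrac are omitted because they require a phylogenetic tree.

```python
import math


def shannon(counts, base=2.0):
    """Shannon index H = -sum(p_i * log(p_i)); the log base is an assumption."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n, base) for c in counts if c > 0)


def simpson(counts):
    """Simpson index in the 1 - sum(p_i^2) form (assumed form)."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))


def goods_coverage(counts):
    """Good's coverage: 1 - (singletons / total sequences)."""
    n = sum(counts)
    f1 = sum(1 for c in counts if c == 1)
    return 1.0 - f1 / n


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


# Hypothetical ASV counts for one sample (not data from this study).
example = [5, 3, 1, 1, 2]
print(shannon(example), simpson(example), chao1(example), goods_coverage(example))
```

A Good's coverage of 0.99, for instance, corresponds to one singleton per 100 sequences under this definition.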

Statistical and network analysis

On the basis of packaging, the twenty ST samples were categorized into two groups, i.e., CT (commercial tobacco) and LT (loose-leaf tobacco). The Mann–Whitney U test was performed to identify genera that differed significantly between the CT and LT groups. Prism software (ver. 9.2.0; http://www.graphpad.com/) was used for box-plot analysis of the alpha diversity indices (Chao1, Shannon, Simpson) of the CT and LT groups. The principal coordinate analysis (PCoA) plot and weighted UniFrac distances were generated with Past software (ver. 4.03; https://folk.uio.no/ohammer/past) to observe the heterogeneity of the bacterial community between the two groups. Rarefaction curves were generated at a sampling depth of 1000 for each sample of the respective ST products (Supplementary Fig. 1). The PICRUSt-derived information on the genes involved in nitrogen metabolism, biofilm formation, and proinflammatory biomolecules was compared statistically between the two groups (CT and LT) using a two-sided G-test (w/Yates) + Fisher's test at a 95% confidence interval in STAMP software (ver. 2.1.3; http://kiwi.cs.dal.ca/Software/STAMP). Further, the Spearman correlation (rs) between genera (> 0.1% abundance) and functional genes involved in the respective pathways (nitrogen metabolism, biofilm formation, and proinflammatory biomolecules) was assessed using the otu.association command in MOTHUR (Schloss et al. 2009). Cytoscape (ver. 3.9.1; Shannon et al. 2003; https://cytoscape.org/) was used to visualize each correlation; networks were built at Spearman correlation (rs) cut-offs of 0.5, 0.1, and 0.3 for nitrogen metabolism, biofilm formation, and proinflammatory biomolecules, respectively.
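
The correlation step can be illustrated with a minimal Python sketch that computes the Spearman coefficient between genus abundances and gene abundances across samples and keeps only the pairs that pass a cut-off as network edges. It is a sketch under stated assumptions, not the MOTHUR or Cytoscape procedure used here: the function names and the input layout are hypothetical, and applying the cut-off to the absolute value of rs is an assumption.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning the average rank to ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rs = Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return math.nan  # undefined when one variable is constant
    return cov / math.sqrt(var_x * var_y)


def correlation_edges(genera, genes, cutoff):
    """Return (genus, gene, rs) for pairs with |rs| >= cutoff (assumed rule)."""
    edges = []
    for genus, abundance in genera.items():
        for gene, gene_abundance in genes.items():
            rs = spearman(abundance, gene_abundance)
            if abs(rs) >= cutoff:
                edges.append((genus, gene, rs))
    return edges


# Hypothetical abundances across five samples (not data from this study).
genera = {"GenusA": [1, 2, 3, 4, 5], "GenusB": [5, 1, 4, 2, 3]}
genes = {"geneX": [2, 4, 6, 8, 10]}
print(correlation_edges(genera, genes, cutoff=0.5))
```

The resulting edge list has the shape that a network viewer expects as an import table, with the coefficient available as an edge attribute.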

Results

OTUs generation and taxonomic assignment of ASVs

Amplicon libraries were successfully prepared from the respective metagenomes and sequenced on the Illumina platform. The V3–V4 amplicon sequencing generated a total of 7,787,704 raw reads, which were reduced to 3,844,595 high-quality reads (ASVs) after processing through DADA2 in QIIME2 (Supplementary Table 2). The generated ASVs were processed for taxonomic assignment using Quantitative Insights into Microbial Ecology (QIIME2) software (https://qiime2.org). A total of 34 phyla, 82 classes, 176 orders, 256 families, 356 genera, and 154 species were assigned using the SILVA database. A significant portion of the reads, more than 50% at various taxonomic levels, remained unassigned.

Overall dominant phyla, genera, and species

Overall analysis of the twenty ST samples showed the dominance of three phyla with a relative abundance ≥ 1%: Firmicutes (28.99%), Proteobacteria (18.69%), and Actinobacteria (1.36%). Bacteroidota (0.26%), Cyanobacteria (0.14%), Planctomycetota (0.028%), Acidobacteriota (0.010%), Verrucomicrobiota (0.006%), Patescibacteria (0.004%), and Gemmatimonadota (0.003%) were minor contributors to the ST microbiome (Supplementary file 1; Fig. 1A). At the inter-individual level, Firmicutes was the most dominant phylum in thirteen samples (ST-1, ST-4, ST-5, ST-9, ST-11, ST-12, ST-13, ST-14, ST-15, ST-16, ST-18, ST-19, and ST-20), whereas Proteobacteria was the dominant phylum in the remaining seven samples, followed by Firmicutes. Actinobacteria ranked second in abundance in only four ST samples (ST-11, ST-12, ST-15, and ST-20). Moreover, Bacteroidota and Cyanobacteria ranked third or fourth in all STs.

Fig. 1

Relative abundance of bacterial taxa at the phylum level (> 0.001) (A), at the genus level (> 0.1%) (B), and at the species level (> 0.1%) (C). Shared bacterial microbiome of ST products at the phylum and genus levels (D)

Of the 358 assigned genera, Lysinibacillus (9.87%), Dickeya (8.37%), Terribacillus (5.67%), Bacillus (5.61%), Oceanobacillus (2.57%), Halomonas (1.84%), Staphylococcus (1.39%), Aerococcus (1.38%), and Desemzia (1.13%) showed an abundance of more than 1%, while Corynebacterium (0.87%), Facklamia (0.84%), Solibacillus (0.67%), and Alkalibacterium (0.057%) were observed in the range of 1 to 0.5% (Supplementary file 2; Fig. 1B). Inter-individual differences at the genus level followed a very similar pattern, with genera of Firmicutes shared among the majority of the tobacco samples. Six ST samples (ST-6, ST-8, ST-10, ST-18, ST-19, ST-20) were dominated by Lysinibacillus, while Terribacillus and Bacillus each dominated four STs (ST-1, ST-7, ST-9, ST-16 and ST-4, ST-5, ST-11, ST-14, respectively). Two STs (ST-2, ST-17) were dominated by Dickeya, which was also the second most dominant genus in two STs (ST-16, ST-18). Oceanobacillus, Halomonas, Aerococcus, and Corynebacterium were each the dominant genus in a single ST (ST-15, ST-3, ST-13, and ST-12, respectively). Oceanobacillus and Bacillus were also the second most dominant genera in five STs (ST-4, ST-5, ST-8, ST-11, ST-14) and six STs (ST-1, ST-2, ST-7, ST-9, ST-15, ST-17), respectively. Lysinibacillus was the second most dominant genus in two STs (ST-12, ST-13), whereas Staphylococcus, Aerococcus, Desemzia, Facklamia, and Solibacillus were each the second most dominant genus in a single ST sample (ST-3, ST-19, ST-6, ST-20, and ST-10, respectively).

Similarly, only four species, Dickeya phage (7.47%), Lysinibacillus xylanilyticus (6.30%), Staphylococcus carnosus (1.23%), and Bacterium B16S (1.23%), showed an abundance of more than 1%, whereas Oceanobacillus caeni (0.95%) and Bacterium CC30 (0.65%) accounted for more than 0.5% of the total share of sequences (Supplementary file 3; Fig. 1C). At the inter-individual level, eleven STs (ST-4, ST-5, ST-6, ST-8, ST-10, ST-12, ST-13, ST-14, ST-18, ST-19, and ST-20) were dominated by Lysinibacillus xylanilyticus. Dickeya phage was prevalent in four STs (ST-2, ST-7, ST-16, and ST-17), and Oceanobacillus caeni was dominant in three STs (ST-9, ST-11, and ST-15). Staphylococcus carnosus and Daucus carota were each dominant in only one ST product (ST-6 and ST-1, respectively).

Shared bacterial phyla, genera and species among the STs

Of the five phyla with an abundance greater than 0.1%, only three (Firmicutes, Proteobacteria, and Actinobacteriota) were shared among all the ST products (Fig. 1D). Bacteroidota were present in 95% of the ST samples, Planctomycetota in 70%, and Cyanobacteria and Acidobacteriota in 65%.

Similarly, of the twenty-two genera with an abundance of more than 0.1%, seven (Lysinibacillus, Bacillus, Terribacillus, Dickeya, Staphylococcus, Corynebacterium, and Stenotrophomonas) were shared among all the ST samples (Fig. 1D). Pseudomonas, Aerococcus, Oceanobacillus, and Sphingobacterium were shared among 85–95% of the ST samples. Alkalibacterium, Brachybacterium, Halomonas, Solibacillus, and Acinetobacter were shared among 75–85% of the STs, and Desemzia, Glutamicibacter, and Gracilibacillus among 65–75%. Paenibacillus was shared among 60% of the STs, and Facklamia and Lentibacillus among 53% (Supplementary file 4).
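
The sharing percentages above are prevalence values, i.e., the share of samples in which a taxon is detected. A minimal Python sketch of that calculation follows. The function names, the input layout, and the rule that a genus counts as present when its abundance exceeds `min_abundance` (zero by default) are assumptions for illustration; the presence threshold actually applied is not stated in the text.

```python
def prevalence(table, min_abundance=0.0):
    """Percentage of samples in which each genus is present.

    table maps a genus name to a list of relative abundances, one per sample.
    A genus counts as present when its abundance exceeds min_abundance
    (assumed rule).
    """
    result = {}
    for genus, abundances in table.items():
        present = sum(1 for a in abundances if a > min_abundance)
        result[genus] = 100.0 * present / len(abundances)
    return result


def shared_in_all(table, min_abundance=0.0):
    """Genera detected in every sample (100% prevalence)."""
    pct = prevalence(table, min_abundance)
    return sorted(g for g, p in pct.items() if p == 100.0)


# Hypothetical relative abundances across four samples (not study data).
table = {
    "GenusA": [0.1, 0.2, 0.3, 0.4],
    "GenusB": [0.0, 0.2, 0.0, 0.4],
}
print(prevalence(table))
print(shared_in_all(table))
```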

Diversity indices

Of the twenty ST samples, maximum richness (Chao1; species richness including rare taxa) was observed in ST-3 (1658.723), followed by ST-9 (1272.33) and ST-4 (1116.97). The minimum Chao1 was observed in ST-13 (417.38). Faith's PD is an indicator of phylogenetic diversity, where the PD value reflects the phylogenetic relatedness of the taxa present. ST-19, ST-9, and ST-7 showed Faith's PD values above 400, whereas ST-14 (59.84) showed the minimum Faith's PD value. The Simpson's index of the majority of the samples was > 0.9, which indicates higher diversity and lower evenness of the bacterial community among the ST samples. The maximum Simpson's index (> 0.98) was observed in three samples (ST-11, ST-15, and ST-9). The Shannon index followed a similar profile, with ST-11, ST-15, and ST-9 again showing the maximum values, which indicates higher bacterial diversity in these samples. The Good's coverage estimator of the majority of the samples was > 99%, suggesting good coverage, i.e., only about one in every 100 additional sequences would be expected to contribute a new OTU/ASV. Comparative analysis revealed that the CT group exhibits higher species richness than the LT group (Table 2; Fig. 2A). Beta diversity, assessed through the Bray–Curtis dissimilarity index and weighted UniFrac, showed that the LT and CT groups are similar with respect to relatedness and abundance at the genus level (Fig. 2B).

Table 2 Various alpha diversity indices and estimators obtained among the ST samples
Fig. 2

Box plots showing the alpha diversity indices of the CT and LT groups (A). PCoA plot showing the beta diversity between the two groups (B). Box plots showing the genera that differ significantly between the CT and LT groups (C)

Comparative analysis of commercial and loose smokeless tobacco products

The twenty smokeless tobacco samples were categorised into two groups, commercial smokeless tobacco (CT) and loose smokeless tobacco (LT), each comprising ten samples. The CT group represents the commercial products ST-1, ST-2, ST-3, ST-4, ST-5, ST-11, ST-15, ST-17, ST-18, and ST-20, whereas the other ten samples (ST-6, ST-7, ST-8, ST-9, ST-10, ST-12, ST-13, ST-14, ST-16, ST-19) are loose smokeless tobacco leaves. The CT group (68.5%) was richer in Firmicutes than the LT group (56.8%), whereas Proteobacteria were more prevalent in the LT group (38.03%) than in the CT group (26.4%). Actinobacteria, Bacteroidota, and Cyanobacteria were also observed in both groups.

Comparative analysis at the genus level revealed that Bacillus (25.14%) and Dickeya (17.2%) were the dominant genera in the CT group, followed by Lysinibacillus (16.17%), Oceanobacillus (10.94%), Terribacillus (8.57%), and Halomonas (6.11%). In the LT group, Lysinibacillus (25.9%) and Terribacillus (19.6%) were the dominant genera, followed by Bacillus (9.67%) and Aerococcus (9.31%).

The species-level analysis followed the genus-level profile. In the LT group, Lysinibacillus xylanilyticus (43.4%) was the most abundant species, followed by Bacterium B16S (22.5%), Dickeya phage (15.09%), Staphylococcus carnosus (6.75%), and Oceanobacillus caeni (6.36%). In the CT group, Dickeya phage was the most abundant (43.5%), followed by Lysinibacillus xylanilyticus (28.12%), Oceanobacillus caeni (12.10%), and Staphylococcus carnosus (7.27%).

The Mann–Whitney comparison between the CT and LT groups showed that the two groups are not statistically different at the phylum level, whereas at the genus level three genera differed significantly: Bacillus (p-value: 0.02), Halomonas (p-value: 0.04), and Aerococcus (p-value: 0.002) (Table 3; Fig. 2C).
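
For readers unfamiliar with the test, the following minimal Python sketch computes the Mann–Whitney U statistic for one genus across two groups, with a two-sided p-value from the normal approximation. It is an illustration under stated assumptions, not the procedure used in this study: the function name is hypothetical, and no tie or continuity correction is applied, so its p-values may differ from those of an exact test for groups of ten samples.

```python
import math


def mann_whitney_u(x, y):
    """Return (U, two-sided p) for two independent samples.

    U is counted pairwise: each (a, b) pair scores 1 if a > b and 0.5 if tied.
    The p-value uses the normal approximation without tie or continuity
    correction (a simplifying assumption).
    """
    n1, n2 = len(x), len(y)
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = n1 * n2 - u_x
    u = min(u_x, u_y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Hypothetical relative abundances of one genus (not data from this study).
ct = [25.1, 30.4, 18.9, 22.0, 27.5]
lt = [9.7, 12.3, 8.1, 15.0, 10.2]
print(mann_whitney_u(ct, lt))
```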

Table 3 Analysis to determine the statistically significant genera between CT and LT group

The PICRUSt based analysis for functional annotation

Functional prediction

The PICRUSt analysis was carried out to predict the functional annotation of the 16S rRNA sequences, which generated putative functions of the associated genes at four tier levels. To summarize the information, only the tier I and tier IV analyses are discussed here. Tier I analysis showed that the majority of genes were involved in environmental information processing (28.6%), followed by metabolism (22.3%), genetic information processing (19.1%), cellular processes (16.5%), human diseases (9.2%), and organismal systems (3.9%) (Supplementary Fig. 2A). Tier IV analysis expands the gene clusters of tier I and thus provides extensive information on the relevant genes of the respective groups. Here, only genes involved in nitrogen metabolism, biofilm formation, and pro-inflammatory molecules were considered for extensive analysis at the tier IV level. These genes are the ones primarily discussed in the majority of ST microbiome-based investigations in the context of bacteria-derived risk factors (Supplementary Fig. 2B).

Genes involved in nitrogen metabolism

A total of 48 genes and their subunits were observed among the twenty ST samples (Supplementary file 5). These genes were grouped into five broad categories: (i) nitrogen fixation genes (nifH, nifK, and nifD), (ii) nitrate reductase genes (nar and nxr groups of genes), (iii) nitrite reductase genes (nirB, nirS, nirK, nirA, nirD), (iv) denitrification genes (norB, norC, and norZ), and (v) nitrate/nitrite transporters (nrt, nas, and cyn groups of genes). Of these groups, nitrate reductase genes had the maximum share (15.57%) among the ST samples, followed by genes for nitrate/nitrite transporters (11.49%) and nitrite reductase (8.95%). Genes involved in denitrification (1.06%) and nitrogen fixation (0.007%) had the smallest share (Supplementary file 5), and four samples (ST-6, ST-15, ST-18, and ST-19) lacked nif genes (Fig. 3A).

Fig. 3

Comparative analysis of the five major groups of tier IV nitrogen metabolism genes between the CT and LT groups (A). Genes of nitrogen metabolism that differ significantly between the CT and LT groups (B)

In the comparison between the two groups, the LT group showed a higher abundance of the respective genes across the five groups of nitrogen metabolism. Further, the STAMP analysis revealed that the LT group has a significantly higher abundance of genes (Benjamini–Hochberg FDR-adjusted value < 0.05, q-value, − 1e − 15) of three groups (nitrogen fixation, nitrate reductase, and nitrate/nitrite transporters), while genes of the other two groups (nitrite reductase and denitrification) were relatively higher in the CT samples (Supplementary file 5; Fig. 3B). The tier IV analysis showed that, of the nineteen relevant genes involved in nitrogen metabolism, the LT group was enriched with genes for nitrate and nitrite reductases and their transporter proteins, whereas the CT group showed the dominance of nitric oxide reductase and nitrous oxide reductase genes.
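
The Benjamini–Hochberg adjustment referred to above can be sketched in a few lines of Python. This is a generic illustration of the step-up procedure, not STAMP's code; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg FDR-adjusted values (q-values).

    The i-th smallest p-value is multiplied by m / i, and a running minimum
    taken from the largest rank downwards keeps the adjusted values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four gene groups (not data from this study).
raw = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(raw)
significant = [q < 0.05 for q in q_values]
print(q_values, significant)
```

A gene group is then reported as significantly different when its adjusted value falls below the chosen threshold (0.05 in this study).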

Genes involved in biofilm formation

Biofilm formation genes were studied under four major groups: (i) quorum sensing, (ii) biofilm formation_Vibrio cholerae, (iii) biofilm formation_Pseudomonas aeruginosa, and (iv) biofilm formation_Escherichia coli. Quorum sensing genes had the maximum share (53%), followed by biofilm formation_Escherichia coli (13.4%), biofilm formation_Vibrio cholerae (11.5%), and biofilm formation_Pseudomonas aeruginosa (8.47%) (Supplementary file 6). The comparative analysis showed that the LT group exhibited a higher abundance (Benjamini–Hochberg FDR-adjusted value < 0.05, q-value, − 1e − 15) of all the respective genes. The STAMP analysis at the tier I level showed a higher abundance of three gene groups (biofilm formation_Vibrio cholerae, biofilm formation_Pseudomonas aeruginosa, and biofilm formation_Escherichia coli), whereas quorum sensing genes were abundant in the CT group (Supplementary file 6; Fig. 4A). Tier IV analysis of the thirty-four relevant genes associated with biofilm formation showed that each group accounts for 50% of the genes (Fig. 4B). The CT group showed the prevalence of the peptide/nickel transport system permease protein along with the oligopeptide transport system ATP-binding protein and the anthranilate synthase component, whereas the LT group exhibited a significant count of genes for type VI secretion system proteins, the LysR family transcriptional regulator, and genes of the NarL family (Fig. 4B).

Fig. 4

Comparative analysis of the four major groups of tier IV biofilm-forming genes between the CT and LT groups (A). Pattern of the various genes involved in biofilm formation in the CT and LT groups (B). Comparative analysis of the five major groups of proinflammatory biomolecules (tier IV) between the CT and LT groups (C). Pattern of the various genes involved in pro-inflammation in the CT and LT groups (D)

Genes involved in pro-inflammation

The proinflammatory genes were studied under six major groups (lipopolysaccharide, peptidoglycan, bacterial toxins, teichoic acid, flagellin, and ABC transporters). Of the total 619 genes, genes encoding ABC transporters had the maximum share (61%), followed by flagellin (17.6%), lipopolysaccharide (5.45%), and peptidoglycan (1.44%) genes. Genes for bacterial toxins (0.84%) and teichoic acid synthesis (0.57%) were also observed (Supplementary file 7). The comparative analysis using STAMP (Benjamini–Hochberg FDR-adjusted value < 0.05, q value, − 1e − 15) showed that the CT group was dominated by peptidoglycan, bacterial toxin, teichoic acid, and flagellin genes, whereas genes of the ABC transporter and lipopolysaccharide types were prevalent in the LT group (Supplementary file 7; Fig. 4C). Further tier IV analysis of the thirty relevant genes involved in pro-inflammation showed that the majority of the proinflammatory genes (60%) were associated with the CT group (Fig. 4D). Genes for the osmoprotectant transport system permease protein, Kdo2-lipid IVA lauroyltransferase, and D-alanine ligase were significantly more abundant in the LT group than in the CT group (Supplementary file 7; Fig. 4D).

Correlational and network analysis

Network analysis was performed to understand the correlation between genera (relative abundance > 0.1%) and the genes involved in biofilm formation, nitrogen metabolism, and proinflammatory biomolecules. The network of genera (> 0.1%) with the respective nitrogen metabolism genes showed 37 nodes and 101 edges. The average number of neighbours was 3.081, and the characteristic path length was 3.152. The network density and clustering coefficient were 0.086 and 0.269, respectively. At the level of nitrogen metabolism, twelve bacteria showed strong correlation (rs > 0.5; p-value < 0.05) with the respective genes involved in nitrogen metabolism. Pseudomonas (rs: 0.67), Lentibacillus (rs: 0.57), Dickeya (rs: 0.55), Gracilibacillus (rs: 0.52), Halomonas (rs: 0.73), Sphingobacterium (rs: 0.47), Stenotrophomonas (rs: 0.45), Corynebacterium (rs: 0.59), Glutamicibacter (rs: 0.58), Aerococcus (rs: 0.52), Brachybacterium (rs: 0.64), and Gracilibacillus (rs: 0.52) showed moderate to strong correlation with nitrite reductase genes, while genes for nitrate/nitrite transporters were moderately correlated with Dickeya (rs: 0.56) and Desemzia (rs: 0.51). Nitric oxide reductase showed strong correlation with Pseudomonas (rs: 0.64), Halomonas (rs: 0.73), and Sphingobacterium (rs: 0.51). Nitric oxide reductase and nitrous oxide reductase genes showed moderate correlation with Pseudomonas (rs: 0.57), Halomonas (rs: 0.65), Sphingobacterium (rs: 0.51), and Stenotrophomonas (rs: 0.48), whereas Lysinibacillus (rs: −0.55) and Pausibacillus (rs: −0.47) showed negative correlation (Fig. 5A).

Fig. 5

Correlation network analysis of the bacterial community (> 0.1%) showing significant correlations with nitrogen metabolism genes (A), biofilm-forming genes (B), and proinflammatory biomolecules (C)

The network of genera (> 0.1%) with the respective biofilm-forming genes showed 26 nodes and 60 edges. The average number of neighbours was 4.538, and the characteristic path length was 2.375. The network density and clustering coefficient were 0.182 and 0.377, respectively. In the biofilm network, Desemzia showed strong correlation with quorum sensing (rs: 0.62) and with the biofilm formation genes of E. coli (rs: 0.53), P. aeruginosa (rs: 0.61), and V. cholerae (rs: 0.60). In addition, quorum sensing showed moderate correlation with Facklamia (rs: 0.45) (Fig. 5B).

The network of genes for pro-inflammatory biomolecules showed 29 nodes, 70 edges, and an average of 4.828 neighbours. The characteristic path length was 2.451, and the network density and clustering coefficient were 0.172 and 0.406, respectively. The genes for teichoic acid exhibited strong correlation with Alkalibacterium (rs: 0.71), Brachybacterium (rs: 0.68), Corynebacterium (rs: 0.61), and Gracilibacillus (rs: 0.54), and moderate correlation with Lentibacillus (rs: 0.49). The ABC transporters showed strong correlation with Desemzia (rs: 0.66) and moderate correlation with Dickeya (rs: 0.44). Peptidoglycan and flagellin genes showed strong correlation with Desemzia, with rs values of 0.64 and 0.56, respectively. Genes for bacterial toxins showed strong correlation with Desemzia (rs: 0.63) and Facklamia (rs: 0.56) and moderate correlation with Staphylococcus (rs: 0.49), while they were negatively correlated with Bacillus (rs: −0.49). The genes involved in lipopolysaccharide synthesis showed strong correlation with Dickeya (rs: 0.61) and Desemzia (rs: 0.55) (Fig. 5C).
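
Two of the network statistics reported above follow directly from the node and edge counts. The minimal Python sketch below assumes an undirected network without self-loops or duplicate edges; the function names are illustrative only. Under that assumption, the average number of neighbours is 2E/N and the density is 2E/(N(N − 1)), and the 29 nodes and 70 edges of the pro-inflammatory network give the reported values of 4.828 and 0.172.

```python
def average_neighbours(nodes, edges):
    """Mean degree of an undirected simple graph: 2E / N."""
    return 2.0 * edges / nodes


def network_density(nodes, edges):
    """Fraction of possible node pairs that are connected: 2E / (N * (N - 1))."""
    return 2.0 * edges / (nodes * (nodes - 1))


# Node and edge counts reported for the pro-inflammatory network.
n_nodes, n_edges = 29, 70
print(round(average_neighbours(n_nodes, n_edges), 3))
print(round(network_density(n_nodes, n_edges), 3))
```

The characteristic path length and the clustering coefficient depend on the full edge list and cannot be derived from the counts alone.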

Discussion

At present, there are approximately 356 million smokeless tobacco consumers of all categories worldwide (Sinha et al. 2018). This burden is concentrated in South and Southeast Asia (> 85%), which therefore also account for the highest share of oral diseases, including oral cancer (Al-hebshi et al. 2017). Several other diseases, such as periodontal disease, tooth calculus, peritonitis, and bacteremia, have also been linked with the regular consumption of ST products in any form (Tomar et al. 2019; Vishwakarma and Verma 2021). India accounts for ~ 70% of the DALYs (disability-adjusted life years) lost due to ST consumption (Siddiqi et al. 2020). Microbe-derived health risks are one of the reasons that make ST products a carrier of health risks (Al-hebshi et al. 2017; Tyx et al. 2020; Saxena et al. 2022). A handful of reports are available on ST products worldwide (Table 4); however, information on the microbial assessment of indigenous ST products from India is limited. The current study was therefore carried out to better understand the potential bacterial risks associated with ST product consumption and to help define a baseline microbiological profile of indigenous varieties of ST.

Table 4 Comparative analysis of various characteristics analyzed during this study with previous reports of ST samples

Overall analysis revealed that Firmicutes was the dominant phylum among the smokeless tobacco samples, followed by Proteobacteria, Actinobacteria, Bacteroidota, and Cyanobacteria. A similar pattern of bacterial phyla has been reported among various smokeless tobacco products (Smyth et al. 2017; Tyx et al. 2020). American moist snuff and Yemeni shammah were also dominated by Firmicutes (> 99.7%), whereas Swedish snus was dominated by Proteobacteria (Al-hebshi et al. 2017). A recent investigation of Indian moist ST products found that Proteobacteria dominate, followed by Firmicutes (Sajid et al. 2021a; Table 4), which may be due to the type of ST sample used for bacteriome analysis, i.e., moist tobacco products.

The genus-level analysis (> 1%) follows the phylum-level profile: Lysinibacillus (a member of Firmicutes) was the dominant genus among the ST products, followed by Dickeya, Terribacillus, Bacillus, Oceanobacillus, Halomonas, and Aerococcus. Han et al. (2016) also observed the dominance of Bacillus spp. in ST samples using both cultivation and metagenomic approaches (Table 4). This indicates that ST products worldwide share a largely common bacteriome at the phylum and genus levels. Bacillus spp. are capable of nitrate reduction and thereby generate the precursor for TSNA formation through the nitrosation of nicotine and derived alkaloids (Law et al. 2016; Tyx et al. 2016). In addition, Bacillus spp. have been reported in pulmonary inflammation and food poisoning (McKillip 2000; Ozdemir and Arslan 2019). A handful of studies have reported an elevated count of Bacillus spp. during packaging of ST products (Fisher et al. 2012; Srivastav et al. 2020). Corynebacterium, Pseudomonas, Acinetobacter, Stenotrophomonas, Alkalibacterium, and Staphylococcus were also among the top twenty genera of the ST products studied here. Several of these genera have been previously reported in ST products (Tyx et al. 2016; Sajid et al. 2021a; Han et al. 2016; Vishwakarma and Verma 2021). The prevalence of these genera may mark ST products as carriers of potential bacterial pathogens, which needs to be confirmed at the species level. Corynebacterium and Staphylococcus have been identified as significantly different genera in the oral cavity of oral squamous cell carcinoma (OSCC) patients (Srivastava et al. 2022b). Mukherjee et al. (2017) also observed an elevated count of Corynebacterium among patients with squamous cell carcinoma of the oral tongue (OMTC). The occurrence of Pseudomonas in ST products may be due to large-scale contamination during their processing (Wang et al. 2011; Li et al. 2020).
However, it was observed among the top 10 genera in various investigations (Tyx et al. 2016, 2020; Sajid et al. 2021a). Our previous investigation of the effect of ST products on oral bacteriomes did not identify Pseudomonas as a significantly different genus; however, it was more associated with the oral cavity of OSCC patients (Srivastava et al. 2022b). The role of the Staphylococcus spp. prevalent in a few ST products in oral pathogenesis needs to be confirmed at the species level. These saccharolytic, biofilm-forming bacteria can attach to inert oral surfaces and may assist in the adherence of several Gram-negative oral pathogens such as Fusobacterium and Porphyromonas (Yong et al. 2019). The co-occurrence of Staphylococcus, Fusobacterium, and Corynebacterium has previously been predicted to play a role in oral tumorigenesis (Srivastava et al. 2022a), as these bacteria assist in TSNA formation (Okamoto et al. 2017; Di Giacomo et al. 2007; Fisher et al. 2012; Hyde et al. 2014). Stenotrophomonas is a rare genus in ST products and plays a role in degrading polycyclic aromatic hydrocarbons (PAHs) (Mangwani et al. 2014), which are abundant in tobacco, whereas Alkalibacterium is capable of cellulose hydrolysis and carbohydrate fermentation (Yumoto et al. 2008). Several species of Acinetobacter are Gram-negative, biofilm-forming oral pathogens that cause pneumonia and chronic obstructive pulmonary disease (Richards et al. 2015). However, the presence of this genus in ST products needs to be explored at the species level.

We observed a few bacterial species that may be grouped as rare species, about which very little information is available. Lysinibacillus xylanilyticus has been reported as a rhizosphere bacterium with the potential to promote plant growth (Ahsan et al. 2021). Staphylococcus carnosus is a harmless food-grade bacterium that is used as a starter culture in the fermentation of food products (Corbiere Morot-Bizot et al. 2007). Besides, S. carnosus contributes to flavor, color stabilization, peroxide decomposition, and food safety (Muller et al. 2016; Lofblom et al. 2017). The occurrence of the Dickeya phage may be compared with that of Dickeya solani, a plant pathogen that exhibits resistance to bacteriophages (Bartnik et al. 2022). Extensive research is required to determine the role of these bacteria in ST products.

Comparative analysis between the CT and LT groups identified no significant differences at the phylum level, and only three genera (Aerococcus, Halomonas, and Bacillus) were significantly different at the genus level (p-value < 0.05). Aerococcus was more associated with the LT group, whereas the CT group was enriched with Halomonas and Bacillus. Thus, in the context of bacterial diversity, Indian commercial tobacco products may pose risks similar to those of loose tobacco products. The presence of Aerococcus and Halomonas in ST products is quite inconsistent; however, Aerococcus was identified as the dominant genus in American moist snuff (Tyx et al. 2016; Smyth et al. 2017). Halomonas was likewise identified as a less shared genus among ST products, with its prevalence reported in moist snuff (Han et al. 2016), cigar tobacco (Smyth et al. 2017), and shisha tobacco (Hani et al. 2018). Both Aerococcus (Rasmussen 2016) and Halomonas (Chattopadhyay et al. 2021) have been identified as opportunistic pathogens but remain unexplored in the context of oral diseases. The prevalence of Bacillus spp. in the commercial group, in turn, may make these products more prone to promoting oral diseases, including oral tumorigenesis, owing to the ability of these bacteria to reduce nitrate. The majority of the prevalent genera, as well as the significantly different genera identified here, have also been reported to exhibit multiple antibiotic resistance (Savini et al. 2009; Sajid et al. 2021b).
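
To illustrate how a genus may be flagged as significantly different between two groups at p < 0.05, a minimal sketch is given below. The statistical test applied in this study is not named in this section, so an exact two-sided permutation test on the difference in group means is used here purely as a stand-in. The function name and all abundance values are hypothetical and are not data from this work.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_diff(indices):
        # Mean of the samples labelled "a" minus mean of the remaining samples.
        sum_a = sum(pooled[i] for i in indices)
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_diff(range(n_a)))
    # Enumerate every way of relabelling the pooled samples into the two groups.
    splits = list(combinations(range(len(pooled)), n_a))
    # Small tolerance guards against floating-point noise in equal differences.
    extreme = sum(1 for s in splits if abs(mean_diff(s)) >= observed - 1e-12)
    return extreme / len(splits)


# Hypothetical relative abundances (%) of one genus; NOT data from this study.
lt_abundance = [6.1, 5.4, 7.0, 6.6, 5.9]
ct_abundance = [1.2, 0.8, 1.9, 1.5, 1.1]

p_value = permutation_p_value(lt_abundance, ct_abundance)
print(f"p = {p_value:.4f}; significant at 0.05: {p_value < 0.05}")
```

Exhaustive enumeration is feasible only for small groups; with larger sample sizes, a random subset of relabellings or a rank-based test would normally be preferred, and p-values from many genera would need correction for multiple testing.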

To understand the functional characteristics of the bacterial diversity of the ST products, a PICRUSt-based analysis was performed with a focus on three important aspects of loose and commercial ST products, namely (i) nitrogen metabolism, (ii) biofilm formation activity, and (iii) pro-inflammatory activity. The LT group showed a significantly higher abundance of genes involved in nitrate and nitrite reductase pathways and their transporters. This corroborates the findings of previous investigations in which a significantly higher abundance of nitrate reductase genes was observed (Tyx et al. 2016; Rivera et al. 2020; Srivastava et al. 2022a). Cytoplasmic (nas), periplasmic (nap), and membrane-bound (nar) nitrate reductases reduce nitrate to nitrite, which reacts with alkaloids to form carcinogenic nitrosamines (Wahlberg et al. 1999; Wang et al. 2017). Various bacteria adapt to use nitrate as an electron acceptor under anaerobic and/or oxygen-deprived conditions, which usually occur during aging, fermentation, and long storage of tobacco leaves (Nishimura et al. 2007; Di Giacomo et al. 2007; Fisher et al. 2012). The LT and CT groups shared very similar profiles of nitrate/nitrite reductase genes, with a slightly higher abundance in the LT group, which may be due to the less monitored conditions of loose tobacco in terms of quality assessment. However, continuous nitrate reduction leads to the accumulation of nitrite inside the cell, which is deleterious to bacterial growth; bacteria therefore release nitrite outside the cell through nitrate/nitrite antiporters or nitrite extrusion transporters (Bush et al. 2001; Law et al. 2016). 
The released nitrite is either metabolized through assimilatory or dissimilatory (denitrifying) pathways or reacts with tobacco alkaloids through N-nitrosation to form various TSNA compounds [N-nitrosonornicotine (NNN), N-nitrosoanatabine (NAT), N-nitrosoanabasine (NAB), and 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone (NNK)] (Wang et al. 2017; Sajid et al. 2021a). An abundance of nitrate as well as nitrite reductase genes has been identified in both dry and moist snuff tobacco (Tyx et al. 2016; Rivera et al. 2020; Srivastava et al. 2022a). Thus, the inhabitant bacteria of smokeless tobacco may play a crucial role in oral carcinogenesis.
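
The arithmetic behind a PICRUSt-style functional prediction can be sketched as follows: marker-gene counts of each taxon are divided by its 16S rRNA gene copy number and then weighted by the predicted copies of each functional gene per genome. This is only a simplified illustration of the principle, not the pipeline or parameters used in this study. The function name, taxon labels, copy numbers, and counts are hypothetical placeholders, and the gene symbols are used for illustration only.

```python
def predict_gene_abundance(taxon_counts, copies_16s, gene_content):
    """Sum copy-number-normalized taxon abundances weighted by gene copies."""
    predicted = {}
    for taxon, count in taxon_counts.items():
        # Dividing by the 16S copy number approximates the number of genomes.
        normalized = count / copies_16s[taxon]
        for gene, copies in gene_content[taxon].items():
            predicted[gene] = predicted.get(gene, 0.0) + normalized * copies
    return predicted


# All values are hypothetical placeholders, NOT study data or database values.
taxon_counts = {"taxon_A": 1200, "taxon_B": 300}   # 16S read counts per taxon
copies_16s = {"taxon_A": 10, "taxon_B": 5}         # assumed 16S copies per genome
gene_content = {
    "taxon_A": {"narG": 1, "nirB": 1},             # assumed gene copies per genome
    "taxon_B": {"narG": 2},
}

print(predict_gene_abundance(taxon_counts, copies_16s, gene_content))
```

Because gene content is inferred from reference genomes rather than measured, functions carried by taxa that are poorly represented in the reference database cannot be captured by such a calculation.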

Similarly, in terms of biofilm formation attributes, the LT group was more associated with genes involved in biofilm formation of the E. coli, Pseudomonas aeruginosa, and Vibrio cholerae types at the tier I level, whereas quorum-sensing genes were prevalent in the CT group. Several periodontal diseases are linked with the biofilm formation activity of oral pathogens in the presence of nicotine (Wu et al. 2018; El-Ezmerli and Gregory 2019). Exposure to nicotine has been reported to promote the growth as well as the biofilm formation activity of several pathogens (Huang et al. 2012; Nasrin et al. 2020; Vishwakarma and Verma, 2021). However, nicotine concentrations of up to 8 to 16 mg/ml support the growth as well as biofilm formation of the majority of bacteria, whereas higher concentrations inhibit nicotine-induced biofilm formation (DuBois et al. 2014). The effect of nicotine and tobacco extracts on tobacco-inhabitant bacteria is almost unexplored and deserves attention in this line of research. Both the LT and CT groups were abundant in type VI secretion system (T6SS) proteins, which confer antibacterial activity through the release of toxic secretory proteins (Russell et al. 2014; Chen et al. 2019). Such proteins are frequently produced by Gram-negative bacteria (E. coli, V. cholerae, and P. aeruginosa) either to kill other bacterial species or to protect themselves (Russell et al. 2014). The T6SS proteins attack bacterial cell walls and membranes through amidase, glycohydrolase, and lipase activities (Russell 2011; Alcoforado Diniz et al., 2015). The higher abundance of T6SS proteins in the LT group may be due to environmental exposure and other anthropogenic activities. The CT group, in contrast, showed a prevalence of the nickel transport system, which indicates the engagement of ST-inhabitant bacteria in nickel efflux. Tobacco contains ~ 2.20–4.91 mg/kg of nickel (Stojanaovic et al. 2005), and the bacterial population may therefore have to cope with an elevated concentration of nickel.

Various pro-inflammatory gene categories were further examined in the context of the risks associated with ST products. The LT group exhibited a prevalence of genes involved in ABC transporters and lipopolysaccharides (LPS), whereas the CT group harbored a greater number of genes related to teichoic acid, flagellin, and bacterial toxins. The members of the ABC transporter superfamily are well known for their ATP-driven transport of chemically diverse substrates, such as toxins, xenobiotic compounds, bile acids, lipophilic drugs, and several macromolecules, across biological membranes (Leslie et al. 2005; Mercado-Lubo et al. 2010; Muriithi et al. 2020). Tobacco harbors more than 4000 types of toxic compounds, including carcinogens (Vishwakarma and Verma 2021), and inhabitant bacteria may use such energy-driven transport for the efflux of these compounds. The role of ABC transporters among tobacco-inhabitant bacteria needs to be studied extensively. Lipopolysaccharides were also more abundant in the LT group, which corroborates the findings of Sajid et al. (2021a) on Indian ST products. LPS derived from Gram-negative bacteria has been reported to play a role in pro-inflammation (Karpinski 2019) and OSCC (He et al. 2015). LPS of Pseudomonas has been reported in cigarette tobacco and smoke (Hasday et al. 1999). Here, we obtained a higher count of Pseudomonas in the LT group; LPS acts as a pathogen-associated molecular pattern (PAMP) that stimulates a strong innate immune response (Needham et al. 2013; Scott et al. 2017). One report indicates that LPS-mediated chronic inflammation produces an immunosuppressive microenvironment that enhances the effect of tobacco-specific carcinogens (Liu et al. 2021). Kurago et al. (2008) observed that LPS moieties have the potential to promote the progression of OSCC. 
A few clinical studies have also illustrated the role of LPS and flagellins in colorectal cancer and hepatocellular carcinoma (Fedirko et al. 2017; Kong et al. 2016). Other pro-inflammatory molecules such as teichoic acid, flagellin, and bacterial toxins were more abundant in the CT group. However, Sajid et al. (2021a) did not observe such genes in moist Indian ST products. Several Gram-positive bacteria produce wall teichoic acid for protection against antibacterial activity (Kohler et al. 2009). Thus, the prevalence of teichoic acid genes may favor a higher Gram-positive bacterial load in ST products. Bacterial toxins have also been reported in various ST products (Pauly 2011; Tyx et al. 2016; Han et al. 2016) and assist in pathogenesis. Besides LPS (endotoxins), secretory bacterial toxins such as cytolethal distending toxins, dentilisin, ExoY, cytotoxic necrotizing factors, and pneumolysin play a role in oncogenesis (Silbergleit et al. 2020). These bacterial toxins either interact directly with DNA/chromosomes or facilitate oncogenic pathways (Lemichez et al. 2013; Joossens et al. 2011; Silbergleit et al. 2020). Besides, bacterial toxins also exhibit the potential to control tumor-promoting inflammation (Kuper et al. 2000; Merchant et al. 2005). However, the limitations of PICRUSt, which predicts functional potential from marker genes, cannot be ignored. Its predictions may be biased by the limited number of existing reference genes and/or genomes, so that rare, environment-specific functions are more likely to be missed (Douglas et al. 2020).

Correlation and network analysis identified strong correlations between the inhabitant bacteria of ST products and several genes involved in nitrate/nitrite reduction, biofilm formation, and pro-inflammation (Fig. 5A, B, C). This uncovers several bacteria-derived risk attributes of tobacco chewing (Sajid et al. 2021a; Saxena et al. 2022; Srivastava et al. 2022a). Moreover, the prevalence of nitrate/nitrite reductase genes may stimulate TSNA accumulation in ST products, which can alter DNA and lead to oral tumorigenesis (Khariwala et al. 2017). A similar hypothesis was proposed for moist tobacco, where the bacteriome of moist ST products exhibited a strong correlation with colorectal cancer (Srivastava et al. 2022a). A previous investigation showed that Fusobacterium nucleatum promotes colorectal carcinogenesis through the FadA adhesin, which modulates E-cadherin/β-catenin signaling (Rubinstein et al. 2013).
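
A correlation network of this kind can be sketched by correlating per-sample genus abundances with per-sample predicted gene abundances and keeping only the strong pairs as edges. The sketch below is a minimal illustration under stated assumptions: the correlation measure and cut-off used for Fig. 5 are not given in this section, so Spearman rank correlation and a threshold of |rho| >= 0.8 are assumed here, and all names and values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_edges(genera, genes, threshold=0.8):
    """Return (genus, gene, rho) for pairs with |rho| at or above the threshold."""
    edges = []
    for genus, abundance in genera.items():
        for gene, predicted in genes.items():
            rho = spearman(abundance, predicted)
            if abs(rho) >= threshold:
                edges.append((genus, gene, round(rho, 3)))
    return edges


# Hypothetical per-sample values (five samples); NOT data from this study.
genera = {
    "genus_X": [0.5, 1.2, 2.0, 3.1, 4.4],
    "genus_Y": [3.0, 2.8, 0.4, 2.9, 1.0],
}
genes = {
    "nitrate_reductase": [10, 14, 19, 25, 40],
    "biofilm_gene": [7, 3, 9, 1, 5],
}

print(correlation_edges(genera, genes))
```

A correlation between a genus and a predicted gene does not by itself show that the genus carries or expresses that gene, particularly when the gene abundances are themselves predicted from the same taxonomic profile.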

Conclusion

Overall, bacteriome analysis showed that Indian smokeless tobacco products are populated with Firmicutes followed by Proteobacteria, Actinobacteria, Bacteroidota, and Cyanobacteria, with Bacillus spp. dominating among the samples. Therefore, chewing tobacco exhibits potential for oral dysbiosis. The functional prediction identified several bacteria-associated risk attributes, including the prevalence of genes involved in nitrate reduction, biofilm formation, and pro-inflammation, which may contribute to the onset of several oral diseases including oral cancer. The comparative analysis revealed that the LT and CT groups exhibit no significant difference at the phylum level, whereas at the genus level only three genera (Aerococcus, Halomonas, and Bacillus) were identified as significantly different. Therefore, commercial tobacco products pose risks comparable to those of loose tobacco products in the context of their bacteria-derived health risks. This further indicates a lack of quality checks on tobacco products during commercial processing. Deeper insight is required to understand the transfer of resident bacteria of tobacco products into the oral cavity of tobacco chewers with oral cancer. However, several potential bacterial markers of oral cancer such as Prevotella, Fusobacterium, and Porphyromonas were not identified as dominant genera among ST products during this investigation. Therefore, it may be hypothesized that these bacteria are not transferred from ST products; however, chewing tobacco may support the proliferation of these oral pathogens, which needs to be further confirmed. The study has a limitation, as it focuses only on the surface bacteria of the ST products. In-situ lysis of the tobacco could provide a better representation of the bacterial communities of ST products along with their functional attributes. 
Although India is among the top consumers of tobacco, information on the bacterial diversity of indigenous tobacco products is severely lacking. The present investigation paves the way to understanding bacterial diversity and its correlation with associated health risk factors. In addition, the findings may assist companies involved in tobacco processing in monitoring the microbial status of their products.