Introduction

Tobacco (Nicotiana tabacum L.), particularly flue-cured tobacco, is one of the most economically important non-food crops in the world. Fresh flue-cured tobacco is not suitable for cigarette products due to its pungent and irritating smoke. Thus, an additional process called aging (or fermentation) is applied to improve the aroma and qualities of tobacco leaves (TLs) (Su et al. 2011). The aging process is very complicated and has been linked to the enzymatic action of microorganisms and chemical interactions within the TLs (Jensen and Parmele 1950). There are two types of aging methods: artificial and natural aging. Artificial aging involves controlling the moisture and temperature, which significantly shortens the fermentation period (Su et al. 2011).

Artificial aging is related to the microbial theory of fermentation. According to previous reports, appropriate levels of certain microorganisms and specific aging conditions improve the characteristics of TLs, including aroma. Tamayo and Cancho (1953) first reported on the microbiology of tobacco fermentation and indicated that microorganisms improve TL aroma during the fermentation process. In a subsequent study, several Bacillus species were shown to contribute to the pleasing aroma of TLs (English et al. 1967). Likewise, Pseudomonas spp., members of another important bacterial genus found on TLs, degrade nicotine and improve the qualities of TLs (Ruan et al. 2005; Wang et al. 2009; Wang et al. 2007; Yu et al. 2015; Zhong et al. 2010). In recent years, an increasing number of studies have revealed important roles for microorganisms during the aging process, including acceleration of the aging process and the improvement of TL aroma (Chen et al. 2008; Zhao et al. 2007).

Using culture-independent methods, the composition of TL bacterial communities has been elucidated for both unaged and aging flue-cured TLs. Zhao et al. (2007) analyzed the bacterial communities of aging TLs using 16S ribosomal RNA (16S rRNA) PCR-denaturing gradient gel electrophoresis (DGGE) technology. Several dominant bacteria were identified, and different patterns of bacterial communities during different aging periods were also observed. In another study, bacterial community diversity in flue-cured TLs was examined by performing 16S rRNA sequence analysis on several types of TLs. The numbers and types of bacteria decreased gradually during the aging process in tobacco variety K326 (Huang et al. 2010). According to a study by Su et al. (2011), the distribution patterns of certain dominant microorganisms differed between unaged and aging flue-cured tobacco. Specifically, Pseudomonas sp., Pantoea sp., and other dominant bacteria changed significantly in Zimbabwe TLs after aging. Based on these reports, the indigenous microorganisms found in TLs may be key factors in the aging process. Thus, understanding the microbiology of bacteria in TLs is an important step towards effectively controlling the artificial aging process.

Raw tobacco must be subjected to threshing and redrying prior to aging. Thus, there are two major groups of TLs: raw tobacco (before threshing and redrying) and redried tobacco (after threshing and redrying) (Davis and Nielsen 1999). Subjecting TLs to threshing and redrying is an effective strategy to prevent mildew. Redrying involves removing moisture from the leaves by applying heat and then treating the leaves with steam until a predetermined moisture level is obtained. During redrying, the temperature may reach 70–150 °C for bundled leaves and 55–85 °C for strips. Concomitantly, strong physical shearing, called threshing, is also carried out. Because of these physical processes, communities of microorganisms may be disturbed, and many indigenous microorganisms may be inactivated. However, how the threshing and redrying processes affect bacterial communities remains largely unknown, and how these changes affect the subsequent aging process has yet to be elucidated.

In the present study, Illumina MiSeq sequencing was applied to analyze the diversity of bacterial communities inhabiting raw TLs and redried TLs. These bacterial communities were further analyzed after 1 year of aging.

Materials and methods

TL sampling

TLs (Zhongyan100, grade C3F) were sampled from Xuchang City, Henan Province, and collected by the China Tobacco Henan Industrial Co., Ltd. (Zhengzhou, China). Four types of TLs were obtained. Specifically, unaged raw TLs and redried TLs (TLs after threshing and redrying) were collected and stored at −20 °C until use. After 1 year of fermentation at ~4–25 °C, aging raw TLs and redried TLs were again sampled and stored at −20 °C. These samples were designated sample 1 (unaged raw TLs), sample 2 (unaged redried TLs), sample 3 (aging raw TLs), and sample 4 (aging redried TLs).

DNA extraction and PCR amplification

Microbial DNA was extracted from TL samples using an E.Z.N.A. Soil DNA Kit (Omega Bio-tek, Norcross, GA, USA) according to the manufacturer’s instructions. The V3–V4 region of the bacterial 16S ribosomal RNA gene was amplified by PCR (95 °C for 3 min, followed by 27 cycles of 95 °C for 30 s, 55 °C for 30 s, and 72 °C for 45 s, and a final extension at 72 °C for 10 min) using primers 338F 5′-barcode-ACTCCTACGGGAGGCAGCAG-3′ and 806R 5′-GGACTACHVGGGTWTCTAAT-3′, where the barcode is an eight-base sequence unique to each sample. PCR reactions were performed in triplicate. Each reaction consisted of a 20-μL mixture containing 4 μL of 5× FastPfu buffer, 2 μL of 2.5 mM dNTPs, 0.8 μL of each primer (5 μM), 0.4 μL (1 unit) of FastPfu polymerase (TransGen AP221-02: TransStart™ FastPfu DNA polymerase, TransGen Biotech, Beijing, China), and 10 ng of template DNA.
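The 806R primer contains IUPAC degenerate bases (H, V, and W), so it stands for a small family of concrete sequences rather than a single one. As a minimal sketch, and not part of the workflow described in this study, the following Python shows one way such a primer could be expanded into a regular expression for locating it in a read. The function names primer_to_regex and degeneracy are hypothetical helpers introduced only for illustration.

```python
import re

# IUPAC nucleotide codes mapped to the concrete bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences as given in the text (barcode omitted).
PRIMER_338F = "ACTCCTACGGGAGGCAGCAG"
PRIMER_806R = "GGACTACHVGGGTWTCTAAT"


def primer_to_regex(primer):
    """Turn a (possibly degenerate) primer into a regex pattern string."""
    parts = []
    for base in primer:
        options = IUPAC[base]
        # Single bases are kept literal; degenerate codes become character classes.
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return "".join(parts)


def degeneracy(primer):
    """Count the distinct concrete sequences the primer represents."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


pattern_806r = re.compile(primer_to_regex(PRIMER_806R))
print(primer_to_regex(PRIMER_806R))
print(degeneracy(PRIMER_338F), degeneracy(PRIMER_806R))
```

This sketch matches exactly; a real demultiplexing step would usually also tolerate a small number of mismatches, which is not handled here.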

Illumina MiSeq sequencing

Amplicons were extracted from 2% agarose gels, purified using an AxyPrep DNA Gel Extraction Kit (Axygen Biosciences, Union City, CA, USA) according to the manufacturer’s instructions, and quantified using QuantiFluor™-ST (Promega, Beijing, China). Purified amplicons were pooled in equimolar quantities and paired-end sequenced (2 × 250) on an Illumina MiSeq platform according to standard protocols. Sequencing was performed by a commercial company (Majorbio Bio-Pharm Technology Co., Ltd., Shanghai, China). Libraries were constructed according to the TruSeq™ DNA Sample Prep Kit protocol (Illumina, USA). The raw reads were deposited into the NCBI Sequence Read Archive (SRA) database (accession number: SRP082278).

Processing of sequencing data

Raw FASTQ files were demultiplexed and quality-filtered using QIIME (version 1.17) under the following criteria: (i) 300-bp reads were truncated at any site receiving an average quality score of <20 over a 50-bp sliding window, and truncated reads shorter than 50 bp were discarded; (ii) barcodes had to match exactly, up to two nucleotide mismatches were allowed in primer matching, and reads containing ambiguous characters were removed; and (iii) only sequences that overlapped by more than 10 bp were merged according to their overlap sequence. Reads that could not be merged were discarded. Operational taxonomic units (OTUs) were clustered with a 97% similarity cutoff using UPARSE (version 7.1 http://drive5.com/uparse/), and chimeric sequences were identified and removed using UCHIME. The taxonomy of each 16S rRNA gene sequence was analyzed using the Ribosomal Database Project (RDP) Classifier (http://rdp.cme.msu.edu/) against the SILVA (SSU115) 16S rRNA database with a confidence threshold of 70%.
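The sliding-window truncation rule can be sketched in a few lines of Python. This is an illustration of the stated criterion only, not the QIIME implementation: it assumes the read is cut at the first base of the first 50-bp window whose mean quality falls below 20, and the function names are hypothetical.

```python
def truncation_point(qualities, window=50, min_mean_q=20):
    """Return how many leading bases survive sliding-window truncation.

    Assumption: the read is cut at the first base of the first window whose
    mean quality score is below `min_mean_q`.
    """
    n = len(qualities)
    if n < window:
        return n  # too short to fill one window; the length filter decides later
    window_sum = sum(qualities[:window])
    for start in range(n - window + 1):
        if start > 0:
            # Slide by one base: add the entering score, drop the leaving one.
            window_sum += qualities[start + window - 1] - qualities[start - 1]
        if window_sum / window < min_mean_q:
            return start
    return n


def quality_filter(seq, qualities, window=50, min_mean_q=20, min_len=50):
    """Truncate a read and return it, or None if it ends up shorter than min_len."""
    keep = truncation_point(qualities, window, min_mean_q)
    if keep < min_len:
        return None
    return seq[:keep]


# Toy read: 80 good bases (Q30) followed by 40 poor bases (Q5).
toy_seq = "A" * 120
toy_qual = [30] * 80 + [5] * 40
trimmed = quality_filter(toy_seq, toy_qual)
print(len(trimmed) if trimmed else "discarded")
```

Other tools place the cut elsewhere in the failing window, so the exact truncated length depends on that convention.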

Results

Sequence statistical analysis

A total of 197,799 reads were obtained through Illumina sequence analysis, comprising 163,315 reads corresponding to Cyanobacteria and 7189 reads corresponding to mitochondria. The Cyanobacteria and mitochondria sequences detected in the leaf samples were abundant, as expected. However, these reads did not represent actual bacterial amounts. After removing inauthentic data, 201,151 valid bacterial reads were obtained from the four samples, ranging from 48,235 to 51,925 sequences per sample (Table 1). The average lengths were 428.07 bp for sample 1, 429.02 bp for sample 2, 434.00 bp for sample 3, and 428.31 bp for sample 4. A total of 2488 sequencing reads from each sample were randomly selected for bacterial community structure analysis, because communities should be compared using equal numbers of sequence reads (Schloss et al. 2011). Figure 1 shows the Shannon-Wiener diversity index values for the four samples. The rarefaction curves reached saturation plateaus for all four samples, indicating that the selected sequence data adequately reflected the bacterial abundance of these samples. Overall, 283 OTUs were obtained from the four samples using UPARSE, with 77 OTUs from sample 1, 72 OTUs from sample 2, 66 OTUs from sample 3, and 68 OTUs from sample 4. Good’s coverage was 99.64% for sample 1, 99.48% for sample 2, 99.38% for sample 3, and 99.24% for sample 4.
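Two of the steps above, subsampling every sample to the same number of reads and estimating Good’s coverage, are simple enough to sketch. The Python below is a hedged illustration on made-up OTU counts, not the software used in this study. It takes Good’s coverage as 1 minus the number of singleton OTUs divided by the total number of reads, and the function names and toy table are hypothetical.

```python
import random
from collections import Counter


def subsample(otu_counts, depth, seed=0):
    """Randomly draw `depth` reads without replacement from an OTU count table."""
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of available reads")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return Counter(rng.sample(reads, depth))


def goods_coverage(otu_counts):
    """Good's coverage: 1 - (singleton OTUs / total reads)."""
    total = sum(otu_counts.values())
    singletons = sum(1 for n in otu_counts.values() if n == 1)
    return 1.0 - singletons / total


# Hypothetical OTU table for one sample (not data from this study).
toy_sample = {"OTU_a": 500, "OTU_b": 300, "OTU_c": 198, "OTU_d": 1, "OTU_e": 1}

rarefied = subsample(toy_sample, depth=200)
print(sum(rarefied.values()))
print(round(goods_coverage(toy_sample), 4))
```

Coverage values close to 1, like those reported for the four samples, mean that few OTUs were seen only once and that further sequencing would be expected to uncover few additional OTUs.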

Table 1 Sequence data analysis of four TL samples
Fig. 1 Shannon-Wiener index values for all samples. Rarefaction curves for the Shannon index were calculated using mothur, with reads normalized to 2484 for each sample using OTUs clustered at 97% sequence identity. The rarefaction curves reached a plateau, indicating sequencing depth was reasonable. 1 unaged raw TLs, 2 unaged redried TLs, 3 aging raw TLs, 4 aging redried TLs

As shown in Table 1, bacterial diversity was compared between the four samples. Sample 1 had the highest Shannon value (3.38), which was indicative of the greatest bacterial diversity. The Shannon value decreased after threshing and redrying (sample 2, 2.52) as well as after 1 year of aging (sample 3, 2.33). Additionally, the Shannon value changed slightly (2.53 to 2.33) for redried TLs after 1 year of aging. The Simpson values for these samples also exhibited the same trend.
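For readers less familiar with these indices, the sketch below computes a Shannon index and a Simpson index from OTU counts. It is illustrative only. It assumes the natural-log form of Shannon and the finite-sample dominance form of Simpson, in which lower values mean higher diversity; the exact variants behind Table 1 are not stated in the text, and the toy counts are hypothetical.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def simpson(counts):
    """Simpson dominance index D = sum(n_i * (n_i - 1)) / (N * (N - 1))."""
    total = sum(counts)
    return sum(n * (n - 1) for n in counts) / (total * (total - 1))


# Two hypothetical communities with the same number of reads and OTUs.
even_community = [25, 25, 25, 25]
skewed_community = [85, 5, 5, 5]

for name, counts in [("even", even_community), ("skewed", skewed_community)]:
    print(name, round(shannon(counts), 3), round(simpson(counts), 3))
```

The even community yields the higher Shannon value and the lower Simpson value, which is the pattern read as greater diversity.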

Relative abundance of microbial communities

All selected sequences were classified from phylum to species using the RDP Classifier and QIIME software (Table S1). For sample 1, Proteobacteria was the most dominant phylum (56.15%), followed by Firmicutes (38.99%). For sample 2, Firmicutes (76.49%) was the most dominant phylum, followed by Proteobacteria (21.30%). Similar results were observed for sample 3 (16.92% for Proteobacteria vs. 80.43% for Firmicutes) and sample 4 (17.64% for Proteobacteria vs. 79.10% for Firmicutes). Bacteria with relative abundances greater than 1% were selected as representative genera for community comparison. Genera with relative abundances lower than 1% were classified as “others” (Table S2). Figure 2 shows a bar plot depicting microbial communities at the genus level for all four samples. Sample 1 contains 20 representative genera, accounting for 91.16% of reads. Sample 2 contains 11 representative genera, accounting for 94.21% of reads. Sample 3 contains 10 representative genera, accounting for 93.05% of reads, and sample 4 contains 10 representative genera, accounting for 93.13% of reads (Table S2). According to the RDP Classifier results, sample 1 comprised the greatest number of genera, which agrees with the Shannon-Wiener index results, in which sample 1 showed the highest level of bacterial diversity among the four samples.
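The 1% cutoff used to separate representative genera from “others” can be expressed as a small helper. The sketch below is an assumption-laden illustration, not the authors’ script. It uses only the handful of sample 2 genus percentages quoted elsewhere in this article, so it is a partial profile, and it treats a genus at exactly 1% as rare, which the text does not specify.

```python
def collapse_rare(abundances, threshold=1.0, label="others"):
    """Keep genera above `threshold` percent; pool the rest under `label`."""
    kept = {genus: pct for genus, pct in abundances.items() if pct > threshold}
    rare_total = sum(pct for pct in abundances.values() if pct <= threshold)
    if rare_total > 0:
        kept[label] = kept.get(label, 0.0) + rare_total
    return kept


# Partial genus profile for sample 2 (unaged redried TLs), in percent of reads.
sample2_partial = {
    "Bacillus": 44.86,
    "Lactococcus": 17.77,
    "Pseudomonas": 11.17,
    "Lactobacillus": 3.90,
    "Stenotrophomonas": 2.25,
    "Sphingomonas": 1.81,
    "Pantoea": 0.08,
}

collapsed = collapse_rare(sample2_partial)
for genus, pct in sorted(collapsed.items(), key=lambda item: -item[1]):
    print(genus, pct)
```

In this partial profile only Pantoea falls below the cutoff and is pooled into “others”; the total percentage is unchanged by the collapse.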

Fig. 2 Bacterial community bar plot showing the relative abundances of bacterial genera in all four samples. Genera with relative abundances of less than 1% were assigned as “others.” 1 unaged raw TLs, 2 unaged redried TLs, 3 aging raw TLs, 4 aging redried TLs

At the genus level, the relative abundance of Bacillus was 22.39% for sample 1, which increased to 44.86% after threshing and redrying (sample 2) and to 51.61% after 1 year of aging (sample 3). The relative abundance of another Firmicutes genus, Lactococcus, was 8.60% for sample 1, 17.77% for sample 2, and 19.25% for sample 3. The relative abundances of other major genera in sample 1, including Sphingomonas (6.51%), Stenotrophomonas (5.71%), Pantoea (4.98%), and Enterobacter (3.22%), decreased after threshing and redrying as well as after 1 year of aging (Table S2). Notably, most of these genera belong to the Proteobacteria phylum, while most of the genera whose relative abundances increased belong to the Firmicutes phylum. However, only slight changes in redried TL bacterial communities were observed after 1 year of aging. For example, Firmicutes changed from 76.49% (sample 2) to 79.10% (sample 4), and Proteobacteria changed from 21.30% (sample 2) to 17.64% (sample 4).

The heatmap results further illustrate genus distributions in the four samples (Fig. 3). According to the hierarchical cluster analysis, sample 1 was significantly different from the other three samples and was characterized by one main cluster. Samples 2 and 4 primarily grouped together, then clustered with sample 3, and finally clustered with sample 1. Based on these results, the bacterial community in sample 1 was vastly different from those in samples 2 and 4, implying there were significant changes in the raw TL bacterial community after threshing and redrying as well as aging. However, a smaller degree of change occurred in the redried TL bacterial community after 1 year of aging.

Fig. 3 A heatmap showing bacterial distribution at the genus level in the four samples. Phylogenetic analysis was performed using the complete clustering method, and the relationship among the four samples was calculated using the Bray-Curtis distance. The relative abundance of each genus is indicated by the color intensity. 1 unaged raw TLs, 2 unaged redried TLs, 3 aging raw TLs, 4 aging redried TLs
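The Bray-Curtis distance named in the caption can be illustrated with a short, self-contained sketch. Because the genus-level table (Table S2) is not reproduced here, the example uses only the two phylum-level percentages reported in the Results for each sample. It therefore shows how the distance is computed rather than reproducing the genus-level clustering in Fig. 3, and the variable and function names are hypothetical.

```python
from itertools import combinations

# Phylum-level relative abundances (percent of reads) quoted in the Results.
PHYLA = {
    1: {"Proteobacteria": 56.15, "Firmicutes": 38.99},
    2: {"Proteobacteria": 21.30, "Firmicutes": 76.49},
    3: {"Proteobacteria": 16.92, "Firmicutes": 80.43},
    4: {"Proteobacteria": 17.64, "Firmicutes": 79.10},
}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return numerator / denominator if denominator else 0.0


distances = {
    (i, j): bray_curtis(PHYLA[i], PHYLA[j])
    for i, j in combinations(sorted(PHYLA), 2)
}

# Print pairs from most similar to least similar.
for pair, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(dist, 4))
```

With these coarse inputs, every pair that includes sample 1 is farther apart than any pair drawn from samples 2, 3, and 4, in line with sample 1 separating from the other three. The ordering among samples 2, 3, and 4 at phylum level need not match the genus-level dendrogram.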

Unique and shared OTU analysis

Figure 4 shows the unique and shared OTUs in the four samples. Across the four samples, there were 11 unique OTUs in sample 1, 2 in sample 2, none in sample 3, and 6 in sample 4. Samples 1 and 2 shared 57 OTUs, which covered 88.34 and 98.23% of the reads in samples 1 and 2, respectively (Table S1). Similarly, 55 shared OTUs accounted for 87.90 and 98.71% of the reads in samples 1 and 3, respectively. Notably, 58 shared OTUs accounted for 98.99 and 98.59% of the reads in samples 2 and 4, respectively, indicating similarities between the bacterial communities in samples 2 and 4. A total of 44 shared OTUs were found in all four samples, accounting for 81.35, 97.19, 97.91, and 97.71% of the reads in samples 1 through 4, respectively. Considering the shared OTUs in these samples, sample 1 was significantly different from the other samples.

Fig. 4 Unique and shared OTUs (97% sequence identity) in the four samples. 1 unaged raw TLs, 2 unaged redried TLs, 3 aging raw TLs, 4 aging redried TLs

Unique OTUs further highlight the different bacterial community in sample 1 compared to the other samples. There were 20 unique OTUs accounting for 11.65% of the reads in sample 1, while 15 unique OTUs accounted for only 1.77% of the reads in sample 2. Analysis of these 20 unique OTUs at the phylum level revealed that 10/20 of them belonged to Proteobacteria, 4/20 belonged to Firmicutes, 4/20 belonged to Bacteroidetes, and 2/20 belonged to Actinobacteria (Table 2). Based on these results, these unique bacterial OTUs were reduced during the threshing and redrying processes. Comparing samples 1 and 3, 22 unique OTUs accounted for 12.10% of the reads in sample 1, while 11 unique OTUs accounted for only 1.21% of the reads in sample 3. The majority of the unique OTUs belonged to Proteobacteria at the phylum level. In addition, the relative abundance of shared OTUs shifted between sample 1 and sample 3. However, 14 unique OTUs accounted for 1.00% of the reads in sample 2, while 10 unique OTUs accounted for 1.41% of the reads in sample 4. According to our analysis of the unique OTU profiles at the phylum level, bacterial diversity appeared to change slightly, as evident in sample 2 and sample 4 (Table 2). However, the low percentages of these unique bacteria in both samples (2 and 4) indicate the major bacterial communities (relative abundance >1.0%) remained stable in redried TLs after 1 year of aging (Table S2). Thus, different bacterial community changes were observed in raw TLs versus redried TLs after 1 year of aging.
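The pairwise unique-OTU counts above follow directly from the per-sample OTU totals and the shared-OTU counts, since the OTUs unique to a sample in a pair are its total minus the shared ones. The sketch below checks that arithmetic with the numbers reported in the Results, and adds a generic set-based helper for the case where OTU identifiers are available. The identifiers in the toy sets are hypothetical.

```python
# OTUs per sample and shared OTUs per pair, as reported in the Results.
OTU_TOTALS = {1: 77, 2: 72, 3: 66, 4: 68}
SHARED = {(1, 2): 57, (1, 3): 55, (2, 4): 58}


def unique_counts(pair):
    """Unique OTUs for each sample of a pair: total minus shared."""
    first, second = pair
    shared = SHARED[pair]
    return OTU_TOTALS[first] - shared, OTU_TOTALS[second] - shared


def compare_sets(otus_a, otus_b):
    """Shared and unique OTU counts when the OTU identifiers are known."""
    return {
        "shared": len(otus_a & otus_b),
        "unique_a": len(otus_a - otus_b),
        "unique_b": len(otus_b - otus_a),
    }


for pair in SHARED:
    print(pair, unique_counts(pair))

# Hypothetical identifiers, only to show the set-based variant.
print(compare_sets({"OTU1", "OTU2", "OTU3"}, {"OTU2", "OTU3", "OTU4", "OTU5"}))
```

The derived pairs (20 and 15, 22 and 11, 14 and 10) agree with the unique-OTU counts quoted in the text for samples 1 vs. 2, 1 vs. 3, and 2 vs. 4.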

Table 2 Profile of unique OTUs at the phylum level for each compared group

Discussion

This is the first report to describe bacterial communities on the TL surface based on Illumina MiSeq sequencing. The rarefaction curves approached the saturation plateau, indicating representation of the majority of dominant bacteria on the TLs. OTU numbers decreased after 1 year of aging, both on raw and redried TLs, with 66 OTUs in aging raw TLs, 77 OTUs in unaged raw TLs, 68 OTUs in aging redried TLs, and 72 OTUs in unaged redried TLs. These results are similar to those obtained in previous studies. Huang et al. reported 32 OTUs and 54 OTUs in aging and unaged flue-cured tobacco leaves (FCTLs) (K326), respectively (Huang et al. 2010). In another report, 65 OTUs and 84 OTUs were obtained from aging and unaged Zimbabwe TLs, respectively (Su et al. 2011). The different OTU numbers identified in these studies may be attributed to sample collection from different areas and different sequencing techniques. Nevertheless, the results from our study corroborate a previous report (Zhao et al. 2007) in which the amount and number of species of bacteria inhabiting TLs were reduced during the aging process.

Because TLs must be subjected to threshing and redrying prior to aging, bacterial communities on both raw TLs and redried TLs were systematically analyzed in the present study. Bacterial diversity decreased after the threshing and redrying processes. The higher Shannon and lower Simpson indices revealed higher bacterial diversity on the surface of raw TLs. This is consistent with the microbial community bar plot results. High temperature and high osmotic pressure during the threshing and redrying processes potentially affect microbial communities, causing a decrease in bacterial diversity on redried TLs. Further analysis of the bacterial communities at the phylum level revealed a change in the relative abundances of the dominant phyla after the threshing and redrying processes. Proteobacteria was the most dominant phylum on raw TLs (covering 56.15% of all representative sequences), followed by Firmicutes (covering 38.99% of all representative sequences). Similar distributions of these two phyla were reported on Zimbabwe FCTLs and K326 FCTLs (Huang et al. 2010; Su et al. 2011). Interestingly, after the threshing and redrying processes, Firmicutes became the most dominant phylum (accounting for 76.49% of all representative sequences), followed by Proteobacteria (accounting for 21.30% of all representative sequences). Firmicutes includes the genera Lysinibacillus, Lactobacillus, Lactococcus, and Bacillus. These bacteria utilize glucose to produce lactic acid or produce spores to adapt to adverse environments, such as high osmotic pressure, low pH, and high temperature (Balcazar et al. 2006). Thus, they are able to survive the threshing and redrying processes. For example, the relative abundance of Bacillus (spore-forming bacteria) increased from 22.39 to 44.86% after threshing and redrying. Lactobacillus (1.69 to 3.90%) and Lactococcus (8.60 to 17.77%) are lactic acid bacteria that utilize sugars, such as fructose and glucose, to produce lactic acid and adapt to low-pH environments. Sugar is the primary compound in TLs and thus may be a factor in the observed increase in the population of sugar-fermenting bacteria. In contrast, several dominant Proteobacteria genera, including Sphingomonas (6.51 to 1.81%), Stenotrophomonas (5.71 to 2.25%), and Pantoea (4.98 to 0.08%), decreased in relative abundance after the threshing and redrying processes.

The effects of threshing and redrying on bacterial communities may continue after 1 year of aging. Small changes in bacterial communities were observed on redried TLs after 1 year of aging. The top three dominant genera (Bacillus, Lactococcus, and Pseudomonas) remained consistent between unaged and aging redried TLs (Table S1). The relative abundances of Bacillus, Lactococcus, and Pseudomonas on unaged and aging redried TLs changed slightly, from 44.86 to 46.10%, 17.77 to 17.00%, and 11.17 to 8.60%, respectively. In contrast, large changes in bacterial communities were observed on raw TLs before and after aging. Although Bacillus was the most dominant genus in both unaged and aging raw TLs, the relative abundance of Bacillus increased dramatically from 22.39% (unaged) to 51.61% (aging) on raw TLs. Similarly, the relative abundance of Lactococcus increased from 8.60% (unaged) to 19.25% (aging), while Pseudomonas decreased from 16.40% (unaged) to 8.80% (aging). Based on these results, populations of dominant bacteria remained relatively stable on redried TLs during aging but changed dramatically on raw TLs during aging. Raw TLs that have not undergone threshing and redrying may contain greater amounts of macromolecular compounds (e.g., starch, cellulose, and protein) than redried TLs. After threshing and redrying, surviving Bacillus and Lactococcus may utilize these compounds to rebuild larger populations during aging. Cellulose and lignin degradation by Bacillus megaterium in tobacco soil fields has previously been reported (Su et al. 2015). In a study from 1967, Bacillus subtilis and Bacillus circulans degraded macromolecular compounds to produce aromatic chemicals in tobacco (English et al. 1967). Lactococcus is commonly used in the food industry for processes including cheese making and milk fermentation. The primary purposes for using this bacterium are to achieve the rapid acidification of foods, causing a drop in the pH of a fermentation product, and to prevent the growth of other spoilage-causing bacteria (Cavanagh et al. 2015). Thus, an increase in Lactococcus may partially explain the drop in pH in TLs after aging (Zhang et al. 2014). However, as previously reported, Pseudomonas shows low tolerance to extreme environments (e.g., low pH and low moisture) during the aging process, although it is one of the more dominant bacteria inhabiting the TL surface (Huang et al. 2010). Thus, the relationship between bacterial communities and chemical compounds should be further investigated.

Different bacterial communities that inhabit raw TLs and redried TLs potentially affect TL quality. Increases in Bacillus and Lactococcus populations help bioconvert macromolecular compounds and enhance the production of a desirable aroma during aging (English et al. 1967). Several Bacillus species have been isolated from tobacco material and have been shown to improve TL and tobacco waste extract (TWE) quality (Liu et al. 2015). Lactococcus utilizes sugars to produce lactic acid (Balcazar et al. 2006) and may largely affect TL sugar content. Pseudomonas is one of the most interesting genera due to its ability to degrade nicotine in TLs, which has been described in several reports (Chen et al. 2008; Wang et al. 2007; Wang et al. 2008; Yang et al. 2011); therefore, Pseudomonas has been widely employed to improve TL quality. Other bacteria, such as Pantoea and Sphingomonas, degrade compounds in TLs, affecting TL quality as well (Ma et al. 2016; Zhao et al. 2015).

In summary, different bacterial diversities were observed in raw and redried TLs and contributed to differences in bacterial communities after 1 year of aging in addition to affecting TL characteristics and quality during the aging process. This study reveals the need for further examination of the different chemical compounds found in raw and redried TLs as well as the chemical changes that occur during aging.