Introduction

Tobacco wastes have been classified as “toxic and hazardous wastes” by European Union Regulations (Novotny and Zhao 1999). Various technologies for tobacco waste treatment, such as biomethanation (Meher et al. 1995), composting (Piotrowska-Cyplik et al. 2009), and sequencing batch reactor technology (Wang et al. 2009), have been explored. During the past decade, the production of “reconstituted tobacco” was gradually introduced as a way to utilize tobacco waste (Ashok and Joao 2001). This technology can reduce the manufacturing costs of cigars and cigarettes and has thus gained worldwide acceptance in the tobacco industry (Wang et al. 2005b). The raw materials of reconstituted tobacco are tobacco wastes, such as tobacco plant leaf scraps, stems, and dry tobacco dust, together with adhesives, reinforcing fibers, mineral ash modifiers, and humectants. In China, several specialized reconstituted tobacco factories have been built during the past decade. For example, Hangzhou Liqun Environment Protecting Paper Co., Ltd. processes more than 10,000 t of tobacco wastes per year (Zhong et al. 2010) to produce reconstituted tobacco sheets via a papermaking process. The first step of this process is immersing tobacco leaf scrap wastes and tobacco plant stem wastes in water to make the aqueous extract (tobacco waste extract (TWE)). The TWE is then applied back onto the reconstituted tobacco sheets. To control or improve the final quality of reconstituted tobacco sheets, TWE can be treated using biochemical, physical, and microbial technologies.

Biochemical techniques have been widely used in the tobacco industry to improve tobacco quality and to produce flavor materials and biofilters. For example, Bernasek et al. (1989) used protease to decompose protein into smaller water-soluble molecular components and provide a reconstituted tobacco material with low protein content, which is useful as a smokable material for the manufacture of cigarettes and other smoking articles. From broadleaf tobacco, English et al. (1967) selected 20 thermophilic members of the genus Bacillus, some of which could give a pleasing aroma in filler tobacco after either single-strain or multiple-strain treatment. To improve the quality of final tobacco products, Gravely and Geiss (1984) developed a process for the simultaneous degradation of the pectin and cellulose components of tobacco materials using mixed cultures of microorganisms.

Another application of biochemical techniques focuses on nicotine degradation because nicotine is one of the key harmful components in tobacco and cigarettes. The development of low-nicotine cigarettes is becoming increasingly important in the market (Zhao et al. 2012). An increasing number of nicotine-degrading strains, such as Pseudomonas putida S16 (Wang et al. 2004, 2005a, 2007), Pseudomonas sp. ZUTSKD (Zhong et al. 2010), Pseudomonas stutzeri ZCJ (Zhao et al. 2012), and Ochrobactrum intermedium DN2 (Yuan et al. 2007), have been reported to be effective and to have the potential to control the nicotine level in TWE or aging tobacco leaves. The nicotine content of TWE is one of the key factors that affect the final quality of reconstituted tobacco. To improve or control the quality of reconstituted tobacco, native and more effective microorganisms in TWE need to be isolated and utilized. Knowledge of the bacterial community in TWE is therefore also necessary.

Information on the bacterial communities of tobacco leaves has been reported by Huang et al. (2010) and Su et al. (2011) using the culture-independent 16S ribosomal RNA (rRNA) gene library method. Huang et al. (2010) found that 27 species of bacteria exist in both unaged and aging flue-cured tobacco leaves (FCTL), in which Bacillus spp. and Pseudomonas spp. were the two dominant genera. Su et al. (2011) found that the dominant genera were Pseudomonas and Pantoea based on 16S rRNA sequence analysis; Pseudomonas occupied 21.29 and 24.04 %, while Pantoea occupied 19.55 and 18.47 %, of the 16S rRNA gene libraries from unaged and aging FCTL from Zimbabwe, respectively. However, few reports could be found on the bacterial community in TWE. Changes in the bacterial communities of the original solution and the concentrated solution should have an important impact on the quality of the final reconstituted tobacco. Thus, we developed a culture-independent method to analyze the bacterial diversity in TWE and to obtain information on the whole bacterial community, including species that cannot be cultured with conventional media and culture conditions, under which more than 85 % of microbes cannot be obtained in pure culture (Amann et al. 1995).

In this study, the total microbial genomic DNA of the original solution and the concentrated solution of TWE was successfully isolated via a culture-independent method. Moreover, the 16S rRNA sequences were amplified and directly sequenced through Roche 454 bar-coded pyrosequencing. The objectives of the present study are as follows: (i) to make a thorough investigation of the composition, diversity, and phylogeny of the bacterial community in the original and concentrated solutions of TWE during the industrial process of reconstituted tobacco sheet manufacturing via the papermaking method; (ii) to provide a basis for clarifying the roles of bacteria in the reconstituted tobacco process; and (iii) to guide the isolation of native functional microorganisms for application in process control and optimization.

Materials and methods

Sample collection

The TWE samples were obtained from Hangzhou Liqun Environment Protecting Paper Co., Ltd., Zhejiang Province, China. During the industrial process of reconstituted tobacco, the waste leaf scraps and plant stems (denominated as M and G, respectively) were separately immersed in water to make the aqueous extracts (TWEs). After the original TWE (collected as M1 or G1) was harvested via filtration, the filter residue was collected to make paper after pulping. The original TWE was then concentrated and stored in a 10-m³ tank (collected as M2 or G2). After 5-day storage at 45 °C in the tank (aging process), the aged TWE (collected as M3 or G3) was applied back onto the reconstituted tobacco paper. To clarify the change in diversity during the process, the original, concentrated, and aged TWEs from waste leaf scraps and plant stems, respectively, were sampled for bacterial diversity analysis. Thus, a total of six kinds of samples were collected and denominated as follows: M1, the original TWE from tobacco leaf scrap wastes; M2, the concentrated M1; M3, the aged M2; G1, the original TWE from tobacco plant stem wastes; G2, the concentrated G1; and G3, the aged G2.

Genomic DNA extraction

To harvest all possible bacterial cells in the samples, 6-ml samples were centrifuged at 12,000 × g for 10 min at 4 °C (Biofuge Stratos, Thermo Scientific, USA). After the supernatant was removed, genomic DNA (gDNA) in the pellet was extracted using an Ezup Soil Genomic DNA Preparation Kit with Spin Column (Sangon Biotech (Shanghai) Co., Ltd., Shanghai, China).

Amplification, purification, and quantification of 16S rRNA genes

The bar-coded conserved primers 27F and 533R (sequencing end), containing the A and B sequencing adaptors (454 Life Sciences, CT, USA), were used to amplify the V1–V3 region of the 16S rRNA genes. The A and B sequencing adaptors are needed for purification, emPCR, and pyrosequencing. The forward primer (B-27F) was 5′-cctatcccctgtgtgccttggcagtctcagagagtttgatcctggctcag-3′, where the first 30 nucleotides (cctatcccctgtgtgccttggcagtctcag) are the B adaptor. The reverse primer (A-533R) was 5′-ccatctcatccctgcgtgtctccgactcagnnnnnnnnnnttaccgcggctgctggcac-3′, where the first 30 nucleotides (ccatctcatccctgcgtgtctccgactcag) are the A adaptor and the Ns represent a sample-specific bar-code sequence. The PCR mixture (final volume, 20 μl) contained 4 μl 5× reaction buffer (TransStart™ FastPfu Buffer, TransGen Biotech, Beijing, China), 2 μl dNTPs (2.5 mM), 0.4 μl forward primer (5 μM), 0.4 μl reverse primer (5 μM), 0.4 μl FastPfu polymerase (TransGen AP221-02: TransStart™ FastPfu DNA Polymerase, TransGen Biotech, Beijing, China), 10 ng template DNA, and ddH2O. The PCR amplification program was as follows: (a) 1× (2 min at 95 °C); (b) 25× (30 s at 95 °C, 30 s at 55 °C, 30 s at 72 °C); and (c) 1× (5 min at 72 °C), followed by a hold at 10 °C until halted by the user. For each sample, three replicate PCRs were performed using a PCR instrument (GeneAmp® 9700, ABI, USA). PCR products of the same sample were pooled, visualized on 2 % agarose gels, and purified with an AxyPrepDNA gel extraction kit (Axygen, USA). The purified PCR products were then quantified using QuantiFluor™-ST (Promega, USA).
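The layout of the two fusion primers can be illustrated with a short Python sketch. It assumes that each primer consists of a 30-nt 454 adaptor, an optional 10-nt bar-code, and the template-specific primer (27F or 533R); the helper names and the example bar-codes are hypothetical and are not taken from the study.

```python
# Fusion primers as given in the text (5'->3').
FORWARD_B_27F = "cctatcccctgtgtgccttggcagtctcagagagtttgatcctggctcag"
REVERSE_A_533R = "ccatctcatccctgcgtgtctccgactcagnnnnnnnnnnttaccgcggctgctggcac"

# Assumed building blocks: adaptor first, template-specific primer last.
ADAPTOR_B = "cctatcccctgtgtgccttggcagtctcag"
ADAPTOR_A = "ccatctcatccctgcgtgtctccgactcag"
PRIMER_27F = "agagtttgatcctggctcag"
PRIMER_533R = "ttaccgcggctgctggcac"


def split_fusion_primer(seq, adaptor, primer):
    """Return (adaptor, barcode, primer) for a fusion primer sequence."""
    if not seq.startswith(adaptor):
        raise ValueError("sequence does not start with the adaptor")
    if not seq.endswith(primer):
        raise ValueError("sequence does not end with the primer")
    barcode = seq[len(adaptor):len(seq) - len(primer)]
    return adaptor, barcode, primer


def assign_sample(read, barcodes, primer=PRIMER_533R):
    """Assign a read that starts at the bar-code to a sample, or return None."""
    for sample, barcode in barcodes.items():
        if read.startswith(barcode + primer):
            return sample
    return None


if __name__ == "__main__":
    print(split_fusion_primer(FORWARD_B_27F, ADAPTOR_B, PRIMER_27F))
    print(split_fusion_primer(REVERSE_A_533R, ADAPTOR_A, PRIMER_533R))
    # Hypothetical bar-codes, for illustration only.
    example_barcodes = {"M1": "acgagtgcgt", "G1": "acgctcgaca"}
    print(assign_sample("acgagtgcgt" + PRIMER_533R + "ggctac", example_barcodes))
```

Exact prefix matching is the simplest possible demultiplexing rule; production pipelines usually tolerate a small number of mismatches in the bar-code and primer.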

EmPCR and pyrosequencing

To improve the accuracy of pyrosequencing, the amplified, purified, and quantified 16S rRNA genes (PCR products) from each reaction mixture were pooled in equimolar ratios based on concentration and subjected to emPCR for parallel amplification to generate amplicon libraries. Amplicon pyrosequencing was carried out according to the Sequencing Method Manual_XLR70 kit on a Roche 454 Genome Sequencer FLX+ at Majorbio Bio-Pharm Technology Co., Ltd., Shanghai, China. These sequences are available through the NCBI BioProject database with ID 238859 (http://www.ncbi.nlm.nih.gov/bioproject/238859).
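Equimolar pooling from measured concentrations can be sketched as follows. This is only an illustration: the amplicon length, the target amount per sample, and the concentrations are hypothetical values, and the conversion assumes an average mass of 660 g/mol per base pair of double-stranded DNA.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of double-stranded DNA (approximation)


def fmol_per_ul(conc_ng_per_ul, amplicon_bp):
    """Convert a mass concentration (ng/ul) to a molar one (fmol/ul)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * amplicon_bp)


def pooling_volumes(concs_ng_per_ul, amplicon_bp, target_fmol):
    """Volume (ul) of each PCR product that contributes target_fmol to the pool."""
    volumes = {}
    for sample, conc in concs_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {sample}")
        volumes[sample] = target_fmol / fmol_per_ul(conc, amplicon_bp)
    return volumes


if __name__ == "__main__":
    # Hypothetical concentrations; the measured values are not given in the text.
    concs = {"M1": 12.0, "M2": 8.5, "M3": 15.2, "G1": 9.8, "G2": 11.1, "G3": 7.4}
    for sample, vol in pooling_volumes(concs, amplicon_bp=500, target_fmol=100).items():
        print(f"{sample}: {vol:.2f} ul")
```

Because all amplicons have nearly the same length here, equimolar pooling is close to equal-mass pooling, so the less concentrated a product is, the larger the volume taken from it.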

Statistical and bioinformatic analysis

After pyrosequencing, valid sequences were obtained. To achieve higher quality and more accurate statistical and bioinformatic analysis, these valid sequences were optimized using the SEQCLN and MOTHUR programs (http://sourceforge.net/projects/seqclean/ and http://www.mothur.org/wiki/Main_Page), yielding the trimmed sequences. The procedure for clustering operational taxonomic units (OTUs) was as follows: Unique (merging identical sequences to find the distinct sequences), Align (alignment against the SILVA database (http://www.arb-silva.de/, version silva SSU111) using k-mer searching (http://www.mothur.org/wiki/Align.seqs)), Chimera removal (using UCHIME, http://drive5.com/uchime), Distance calculation (uncorrected pairwise distances), and OTU clustering (using the furthest neighbor method (http://www.mothur.org/wiki/Cluster) at 97 % similarity). Taxonomy analysis was performed using Mothur taxonomy (http://www.mothur.org/wiki/Classify.seqs) and a naive Bayesian classifier. Rarefaction curves, the microbial community barplot, OTU Venn analysis, the microbial community heatmap, the multiple sample taxonomy analysis tree, and the similarity tree were generated using Mothur and R.
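The last two steps of this procedure (uncorrected pairwise distances followed by furthest neighbor clustering at 97 % similarity, i.e., a distance cutoff of 0.03) can be illustrated with a minimal pure-Python sketch. It is not the Mothur implementation: it assumes pre-aligned sequences of equal length, counts every differing column as a mismatch, and uses a naive agglomerative loop that is only practical for small inputs.

```python
from itertools import combinations


def uncorrected_distance(a, b):
    """Fraction of differing columns between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def furthest_neighbor_otus(seqs, cutoff=0.03):
    """Complete-linkage clustering: merge two clusters only while the
    largest distance between any pair of their members is <= cutoff."""
    names = list(seqs)
    dist = {
        frozenset((p, q)): uncorrected_distance(seqs[p], seqs[q])
        for p, q in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        return max(dist[frozenset((p, q))] for p in c1 for q in c2)

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical): s1-s2 and s2-s3 differ by 2 %,
    # s1-s3 by 4 %, so s3 cannot join the OTU that contains s1.
    toy = {"s1": "a" * 100, "s2": "a" * 98 + "cc", "s3": "a" * 96 + "cccc"}
    print(furthest_neighbor_otus(toy))
```

The toy input shows why the linkage rule matters: a nearest neighbor rule would chain all three sequences into one OTU, whereas furthest neighbor keeps every pair within an OTU at or below the cutoff.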

Results

Genomic DNA extraction and amplification of 16S rRNA genes

Genomic DNA was difficult to extract from the TWE samples because of their high osmotic pressure, stickiness, and complex composition. The concentration of the final genomic DNA extract was not high enough to be visible on an agarose gel (Fig. 1a). However, the DNA from all samples was sufficient to amplify the 16S rRNA gene sequence with a pair of universal primers (Fig. 1b). The PCR products were suitable for quantification and sequencing.

Fig. 1

Electrophoresis of genomic DNAs extracted from TWE samples on a 1 % agarose gel (a) and of their 16S rRNA PCR products on a 2 % agarose gel (b) (2-μl loading samples). M1, the original TWE from tobacco leaf scrap wastes; M2, the concentrated M1; M3, the aged M2; G1, the original TWE from tobacco plant stem wastes; G2, the concentrated G1; G3, the aged G2

Sequence statistical analysis of the samples

A total of 33,151 valid reads (longer than 200 bp) and 2,253 OTUs were obtained from the six samples through Roche 454 FLX+ pyrosequencing analysis. These sequences/OTUs are available through the NCBI BioProject database with ID 238859 (http://www.ncbi.nlm.nih.gov/bioproject/238859).

As shown in Fig. 2, the six samples contained between 3,659 and 7,148 reads, with maximal OTUs ranging from 398 to 746. The rarefaction curves tended to approach the saturation plateau. Good’s coverage estimations revealed that 94 % (G1), 95.7 % (G2), 91 % (G3), 93 % (M1), 95.9 % (M2), and 96 % (M3) of the species were captured in the respective samples. Some bacteria might still be missed owing to the limitations of the method. However, the number of reads is sufficient to reveal most of the dominant bacterial species in the TWEs.
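Good’s coverage is commonly computed as C = 1 − n1/N, where n1 is the number of OTUs represented by a single read and N is the total number of reads in the sample. A minimal sketch, assuming this standard definition and using hypothetical OTU counts rather than the study data:

```python
def goods_coverage(otu_counts):
    """Good's coverage: 1 - (number of singleton OTUs / total reads)."""
    total_reads = sum(otu_counts)
    if total_reads == 0:
        raise ValueError("sample contains no reads")
    singletons = sum(1 for count in otu_counts if count == 1)
    return 1.0 - singletons / total_reads


if __name__ == "__main__":
    # Hypothetical read counts per OTU for one sample.
    counts = [120, 45, 30, 3, 1, 1]
    print(f"Good's coverage: {goods_coverage(counts):.3f}")
```

A value close to 1 means that few reads belong to OTUs seen only once, so additional sequencing would be expected to reveal few new OTUs.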

Fig. 2

Rarefaction curves of OTUs clustered at 97 % sequence identity across different samples. A rarefaction curve indicates whether the sequencing depth was sufficient to detect most of the species. When a rarefaction curve approaches a plateau, the sampling depth is reasonable
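How a rarefaction curve is obtained can be sketched with the analytical expectation E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N_i is the number of reads in OTU i, N the total number of reads, and n the subsample size. This is one common way to compute such curves and is not necessarily the procedure used by Mothur, which by default relies on random resampling; the counts below are hypothetical.

```python
from math import comb


def expected_otus(otu_counts, depth):
    """Expected number of OTUs observed in a random subsample of `depth` reads."""
    total = sum(otu_counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must lie between 0 and the total read count")
    denom = comb(total, depth)
    # comb(k, depth) is 0 when depth > k, so an OTU is certain to be seen then.
    return sum(1 - comb(total - count, depth) / denom for count in otu_counts)


def rarefaction_curve(otu_counts, step):
    """List of (depth, expected OTUs) pairs from 0 up to the full sample."""
    total = sum(otu_counts)
    depths = list(range(0, total + 1, step))
    if depths[-1] != total:
        depths.append(total)
    return [(depth, expected_otus(otu_counts, depth)) for depth in depths]


if __name__ == "__main__":
    counts = [120, 45, 30, 3, 1, 1]  # hypothetical reads per OTU
    for depth, otus in rarefaction_curve(counts, step=50):
        print(depth, round(otus, 2))
```

The curve flattens once most of the abundant OTUs have been found, which is the plateau referred to in the caption.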

Microbial community barplot

All sequences were classified from phylum to genus using the program Mothur with the default settings. From all the genera, 14 different genera or groups were selected as representative genera for the community comparison among the six samples, based on the genus abundance in all samples (above 0.1 %, Table S1). In addition, the top 4 abundant genera or groups in G1 and the genus Pseudomonas were selected for comparison purposes. As shown in Fig. 3 and Table 1, the six samples showed dissimilar 16S rRNA profiles in genus-level distributions.

Fig. 3

Bacterial community barplot showing the relative read abundance of different bacterial genera within the different communities. Sequences that could not be classified into any known group were assigned as “No_Rank.” M1, the original TWE from tobacco leaf scrap wastes; M2, the concentrated M1; M3, the aged M2; G1, the original TWE from tobacco plant stem wastes; G2, the concentrated G1; G3, the aged G2

Table 1 Shared genera among the G and M libraries

Sample G3 contains ten representative genera, of which Lysinibacillus, Lactobacillus, Azospirillum, Lactococcus, Acinetobacter, Hydrocarboniphaga, Thermus, Comamonas, and Pseudomonas were the main genera, accounting for 74.82 % of the reads of the G3 sample. Sample G2 contains eight representative genera, of which Lactobacillus, Lysinibacillus, Acinetobacter, Comamonas, and Acetobacter accounted for 91.46 % of the reads. Sample G1 contains four representative genera, of which Lactobacillus accounted for 99.86 % of the reads while Lysinibacillus, No_Ranks, Veillonella, and Curvibacter accounted for only 0.14 % of the reads. In general, the proportion of Lactobacillus decreased from G1 to G3 (99.86, 71.14, and 10.27 %) while the proportion of Lysinibacillus increased from G1 to G3 (0.05, 18.92, and 57.36 %).

As in the G samples, Lactobacillus was also the dominant genus in the M samples and accounted for 92.21, 80.33, and 47.18 % of the reads in the M1, M2, and M3 samples, respectively. The proportion of Lactobacillus also decreased from M1 to M3 while that of Lysinibacillus increased from M1 to M3 (0.16, 0.53, and 46.14 %). Samples M1, M2, and M3 contained eight, nine, and six representative genera, respectively, and showed diversity similar to that of the G samples.

Unique and shared OTU analysis in multiple samples via Venn diagram

The Venn diagram shows the number of unique and shared OTUs in multiple samples (Fig. 4). The G1, G2, and G3 samples shared 40 OTUs while the M1, M2, and M3 samples shared 96 OTUs. Further statistical analysis at the read level showed that the 40 shared OTUs in the G samples covered 99.86, 71.14, and 10.27 % of all reads in the G1, G2, and G3 samples, respectively (Table 1). Although all 40 shared OTUs in the G1–3 samples belonged to the genus Lactobacillus, the most abundant genus common to the three samples, its read abundance decreased from G1 to G3. For the genus Lysinibacillus, there were no shared OTUs in the G samples. However, its read abundance increased from G1 (0.05 %) to G3 (57.36 %), consistent with the results deduced from Fig. 3.
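The counts behind such a Venn diagram, and the share of reads covered by the shared OTUs, reduce to simple set operations. The sketch below uses hypothetical OTU identifiers and read counts; only the logic, not the data, reflects the analysis described above.

```python
def shared_otus(libraries):
    """OTUs present in every library (the center of the Venn diagram)."""
    return set.intersection(*(set(otus) for otus in libraries.values()))


def unique_otus(libraries):
    """OTUs found in exactly one library, keyed by library name."""
    unique = {}
    for name, otus in libraries.items():
        others = set()
        for other_name, other_otus in libraries.items():
            if other_name != name:
                others |= set(other_otus)
        unique[name] = set(otus) - others
    return unique


def shared_read_percentage(otu_reads, shared):
    """Percentage of a library's reads that fall into the shared OTUs."""
    total = sum(otu_reads.values())
    covered = sum(n for otu, n in otu_reads.items() if otu in shared)
    return 100.0 * covered / total


if __name__ == "__main__":
    # Hypothetical libraries: OTU identifier -> read count.
    libraries = {
        "G1": {"otu1": 90, "otu2": 10},
        "G2": {"otu1": 60, "otu3": 40},
        "G3": {"otu1": 10, "otu3": 50, "otu4": 40},
    }
    shared = shared_otus(libraries)
    print("shared:", shared)
    print("unique:", unique_otus(libraries))
    for name, reads in libraries.items():
        print(name, shared_read_percentage(reads, shared))
```

The toy data mimic the pattern reported for the G samples: the same OTUs stay shared across stages while the fraction of reads they cover shrinks.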

Fig. 4

Shared OTU analysis of the different libraries via Venn diagram showing the unique and shared OTUs (3 % distance level) in the different OTU libraries. a G1, G2, and G3 OTU libraries; b M1, M2, and M3 OTU libraries

Statistical analysis (Table 1) also revealed that the 96 shared OTUs in the M samples covered 97.2, 93.98, and 98.76 % of the reads in the M1, M2, and M3 samples, respectively. Most of the shared OTUs belong to the genus Lactobacillus (67 of 96 OTUs) and covered 92.21, 80.33, and 47.18 % of the M1, M2, and M3 reads, respectively. Only one OTU belongs to the genus Lysinibacillus; it covers 0.15, 0.53, and 46 % of the M1, M2, and M3 reads, respectively. The change in read abundance of these two genera in the M samples was thus similar to that in the G samples. Two OTUs belonging to the genus Pseudomonas covered 0.27, 0.57, and 0.11 % of the M1, M2, and M3 reads, respectively. Five OTUs belonging to the genus Comamonas covered 0.88, 2.38, and 0.46 % of the M1, M2, and M3 reads, respectively. One OTU belonging to the genus Acinetobacter covered 0.039, 0.35, and 0.22 % of the M1, M2, and M3 reads, respectively. Reads belonging to the genus Lactococcus covered 0.1, 0.017, and 0.018 % of the M1, M2, and M3 reads, respectively.

It was also found that the number of OTUs shared between the G and M libraries was 15 (Table 1). The most abundant OTUs shared by the two groups belong to the genus Lactobacillus (27.75 and 23.39 % of the G and M reads, respectively).

Heatmap results showing bacterial distribution among the six samples

The heatmap result is shown in Fig. 5, in which the 100 most abundant genera in the six samples were selected as the representative genera. Hierarchically clustered heatmap analysis based on the bacterial community profiles at the genus level showed that the M1 and G1 samples grouped together first, then clustered with the M2 and G2 samples in order, and finally clustered with the M3 and G3 samples (Fig. 5). In addition, the heatmap analysis indicated that the G3 and M3 samples were clearly different from the other samples (the cluster formed by G1, M1, G2, and M2).

Fig. 5

Double hierarchical dendrogram showing the bacterial distribution among the six samples. The bacterial phylogenetic tree was calculated using the neighbor-joining method, and the relationship among samples was determined by Bray-Curtis distance and the complete clustering method. The heatmap plot depicts the relative percentage of each bacterial family (variables clustering on the y-axis) within each sample (x-axis clustering). The relative values for each bacterial family are depicted by color intensity, with the legend indicated at the bottom of the figure. Clusters based on the distance of the six samples along the x-axis and the bacterial families along the y-axis are indicated at the top and left of the figure, respectively
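The Bray-Curtis distance used to relate the samples can be written as BC(a, b) = Σ|a_i − b_i| / Σ(a_i + b_i). The sketch below applies it to a deliberately simplified profile of the G samples: only the Lactobacillus and Lysinibacillus percentages reported in the Results are used, and all remaining reads are lumped into a single "other" category, which is an assumption made here for illustration and understates the true dissimilarity.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (dicts)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator


def simplified_profile(lactobacillus, lysinibacillus):
    """Two-genus profile in percent; the remainder is lumped as 'other'."""
    return {
        "Lactobacillus": lactobacillus,
        "Lysinibacillus": lysinibacillus,
        "other": 100.0 - lactobacillus - lysinibacillus,
    }


if __name__ == "__main__":
    # Percentages of reads reported in the Results for the G samples.
    profiles = {
        "G1": simplified_profile(99.86, 0.05),
        "G2": simplified_profile(71.14, 18.92),
        "G3": simplified_profile(10.27, 57.36),
    }
    names = list(profiles)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            d = bray_curtis(profiles[first], profiles[second])
            print(first, second, round(d, 4))
```

A full analysis would feed the complete genus-by-sample table into the distance calculation and then apply complete-linkage clustering to the resulting matrix.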

Multisample taxonomy analysis tree

All sequences that covered more than 0.01 % of all reads in each sample were selected as representative sequences. At the genus level (Table S1), G1, G2, and G3 contain 5, 73, and 43 genera, while M1, M2, and M3 contain 65, 117, and 61 genera. At the phylum level (Table S2), G1, G2, and G3 contain 2, 9, and 18 phyla, while M1, M2, and M3 contain 8, 10, and 9 phyla. Their taxonomy information is shown in Fig. S1, in which the community composition of the representative bacteria in all samples was calculated and shown in pie plots after the lowest taxonomic unit.

It was found that phyla Firmicutes and Proteobacteria were present in all samples and covered 89.42 and 6.59 % of all representative sequences (Fig. S1, Table S2). In detail, Firmicutes and Proteobacteria covered 99.96 and 0.01, 92.30 and 4.10, and 71.714 and 16.22 % of the reads in the G1, G2, and G3 samples, respectively (Table S2). They also covered 93.81 and 4.20, 84.50 and 10.73, and 95.47 and 2.51 % of the reads in the M1, M2, and M3 samples, respectively.

Discussion

This is the first report to reveal the bacterial diversity of TWE samples from the reconstituted tobacco sheet process using the Roche 454 bar-coded pyrosequencing method. The results indicated that all TWE samples harbored abundant bacterial communities. The dominant phyla of all samples were Firmicutes and Proteobacteria, covering 89.42 and 6.59 % of all representative sequences, respectively. Firmicutes and Proteobacteria were also found to be the dominant phyla in tobacco leaf samples, especially during the aging and fermentation process (Huang et al. 2010). However, in Huang’s report, Proteobacteria was the first dominant phylum (53.33 % of the clone library of aged tobacco leaf), while Firmicutes was the second dominant phylum in tobacco leaves (45.33 % of the clone library of aged tobacco leaf) (Huang et al. 2010; Su et al. 2011). The difference may result from differences in the source material and in the diversity analysis method. Huang et al. (2010) and Su et al. (2011) constructed 16S rRNA clone libraries for sequencing, while we used pyrosequencing to directly sequence the PCR products of 16S rRNA. The pyrosequencing method should cover more sequences. Furthermore, in Huang’s report, total bacterial genomic DNA was isolated from the surface of tobacco leaves. In our study, however, the tobacco leaves were crushed to make TWE, and the total genomic DNA was isolated from both the surface and the interior of the tobacco leaves. Thus, more species belonging to Firmicutes, which usually exist inside tobacco leaves as endophytic species (Chen et al. 2011), might have been sequenced in our study than in Huang’s study.

As the first dominant phylum in TWE, Firmicutes contains the genera Lysinibacillus, Lactobacillus, Lactococcus, and Bacillus. Lactobacillus and Lysinibacillus were the most dominant genera and covered 39.83 and 59.19 % of the reads of the G samples and 60.17 and 40.81 % of the reads of the M samples. As reported in previous studies (Balcáza et al. 2006), lactic acid bacteria (Lactobacillus and Lactococcus in TWE) and spore-producing bacteria (Bacillus and Lysinibacillus in TWE) can utilize sugar to produce lactic acid or produce spores to adapt to an environment of high sugar content, high osmotic pressure, and low pH. Thus, it is reasonable that Lactobacillus and Lysinibacillus were the most dominant bacterial genera in the G and M samples (Fig. 3). However, their distribution varied across the different stages of the G and M samples. The relative abundance of Lactobacillus decreased from 99.86 % (G1) to 10.27 % (G3) in the G samples and from 92.21 % (M1) to 47.18 % (M3) in the M samples. The relative abundance of Lysinibacillus increased from 0.05 % (G1) to 57.36 % (G3) in the G samples and from 0.16 % (M1) to 46.14 % (M3) in the M samples. These results suggest that bacteria belonging to the genera Lactobacillus and Lysinibacillus differ in their tolerance of the same environment. In our view, the genus Lactobacillus, which can utilize sugar to produce lactic acid without a significant pH reduction in the samples from G1 to G3 and M1 to M3, may be suppressed by product inhibition. In contrast, strains belonging to the genus Lysinibacillus can successfully adapt to the changing and extreme environment and can thus establish large populations in the G3 and M3 samples.

Meanwhile, some bacteria belonging to the above genera have interesting functions in tobacco or cigarettes. For example, Bacillus spp. can enhance the production of a desirable aroma and improve the smoking qualities of tobacco. English et al. (1967) reported that either Bacillus subtilis or Bacillus circulans contributed to a pleasing aroma in Pennsylvania “Wrapper B” filler tobacco. Kaelin et al. (1994) reported that Bacillus thuringiensis is part of the natural microflora in the stored-tobacco environment and might be involved in controlling pests in stored products. In our study, several strains belonging to Bacillus were also isolated from TWE, such as B. subtilis SM and B. pumilus M3-1-B, which exhibited a significant effect on aroma improvement in TWE.

As the second dominant phylum in TWE, Proteobacteria contained a number of genera such as Pseudomonas, Acinetobacter, Comamonas, and Achromobacter. Among these genera, Pseudomonas was the most interesting. An increasing number of reports (Chen et al. 2008; Li et al. 2010; Ruan et al. 2005) have shown that Pseudomonas can degrade nicotine. However, the role of other dominant bacteria in the aging process of tobacco remains unknown. In our study, several strains belonging to Proteobacteria were also isolated. For example, Pseudomonas sp. JY-Q is capable of degrading nicotine, and Microbacteria sp. M2-2-H exhibits a significant effect on aroma improvement in TWE.

In general, the change in the bacterial community of TWEs during the reconstituted tobacco process is complex and dynamic. There might exist further factors, such as the enzymatic actions of bacteria and the physical and chemical interactions within the samples, that influence the community change (Wang et al. 2005b; Jensen and Parmele 1950) and require further research in our future work.