Introduction

The global demand for renewable energy sources such as wind, solar, and lignocellulosic biofuels to replace fossil fuels is growing rapidly [1]. Among the greatest challenges for the utilization of lignocellulosic biomass as a biofuel resource is the production of efficient and inexpensive enzymes that act on the cellulose (around 35–50%), hemicellulose (around 20–35%), and lignin (around 10–20%) polymers of the plant cell wall [2, 3]. For this reason, microorganisms from the guts of animals (including insects) that feed on lignocellulose could represent an interesting source of new enzymes specialized in lignocellulose breakdown. The gut of termites is considered the most efficient mini bioreactor in nature: it promotes lignocellulose degradation (more than 90% cellulose degradation) through enzymes produced by the termites and their intestinal microorganisms. Considering the close phylogenetic relationship between termites and cockroaches [4, 5], cockroaches could be an interesting source of new enzymes involved in the digestion of lignocellulosic compounds. Cockroaches, like termites, digest lignocellulosic compounds through the cooperation of enzymes derived from both the cockroach itself and its gut microbiota [3, 4]. Wharton & Wharton [5] and Elpidina et al. [6] reported enzymatic activities against proteins, starch, cellulose, and cellobiose in the midgut of the omnivorous cockroach Nauphoeta cinerea. The degradation of lignocellulosic material inside the cockroach gut is carried out mainly by microbial consortia [2].

Microbial consortia (natural or artificial) show several potential advantages over pure cultures for valuable product synthesis, lignocellulose utilization, bioremediation, and other biotechnological applications. The synergies between the different metabolic pathways of the microorganisms in these consortia can result in more efficient degradation of substrates, with a broader spectrum of useful products for different sectors of biotechnology [7, 8]. Several research groups have assembled microbial consortia from different sources, such as soil, intestine, rumen, and feces [9, 10], that are able to degrade lignocellulosic biomass. Thus, the development of microbial consortia on specific biomass provides a rich arsenal of genes/enzymes for supplementing existing enzyme cocktails, as well as for transforming fungi with enzymes of metagenomic origin [9,10,11].

Metagenomic approaches allow the identification of several new carbohydrate-active enzymes (CAZymes) originating from these different consortia. The organization of several CAZymes in a specific arrangement named PUL (polysaccharide utilization loci) has also been described, mainly in Bacteroidetes genomes [12]. The Bacteroidetes isolates from Periplaneta americana digestive tracts show the presence of several PUL and non-PUL-associated CAZyme-coding genes [12]. These genes could target starch, pectin, and/or hemicellulose. However, no specific locus involved in lignin processing has been described so far.

In this study, an aerobic, lignocellulolytic microbial consortium derived from the intestinal microbiota of the cockroach Nauphoeta cinerea (NaLC) was developed, with the aim of identifying potential lignocellulolytic microorganisms and novel CAZymes. With pretreated sugarcane bagasse as the exclusive carbon source, changes were observed in the morphology and composition of the sugarcane bagasse after cultivation with the microbial consortium. The microbial community structure was investigated by high-throughput sequencing of the 16S rRNA gene for prokaryotic diversity analysis and of the Internal Transcribed Spacer 2 (ITS2) for fungal analysis. Furthermore, shotgun metagenomic and binning analyses were performed, revealing several CAZymes and microorganisms probably involved in the deconstruction of the pretreated sugarcane bagasse.

Materials and Methods

Enrichment of Nauphoeta Lignocellulolytic Consortium

The N. cinerea cockroach colony was maintained in glass aquariums exposed to natural temperature and photoperiod conditions. For at least 14 days, the insects were fed exclusively with sugarcane bagasse pretreated by steam explosion.

The sugarcane bagasse used in this study, kindly provided by the Brazilian Biorenewables National Laboratory (LNBR), was pretreated by steam explosion followed by an alkaline delignification reaction [11]. The bagasse was ground in a Wiley-type mill (Pulverisette® 19, Fritsch) to a particle size of 2 mm and is named here BED. Bushnell Haas Broth commercial medium (BHB) [9] was supplemented with 0.5% (w/v) sugarcane bagasse as the only carbon source.

The initial inoculum used to establish the microbial consortium consisted of a mixture of three intestines of the cockroach Nauphoeta cinerea in 100 mL of BHB supplemented with 0.5% (w/v) BED. The culture was grown in Erlenmeyer flasks (250 mL), stirred at 120 rpm at 30 °C. The NaLC was maintained by continuous subculture, with 1 mL of supernatant transferred every 7 days into fresh BHB enrichment medium, for at least 6 months; the experiments were performed after that period. The control group consisted of culture medium incubated in the absence of NaLC.

Determination of BED Consumption by NaLC

After the 1st, 3rd, 5th, and 7th days of incubation, the culture medium was filtered through filter paper previously weighed. This material was placed in an oven at 50 °C for 48 h to eliminate water from the filter. Subsequently, the material was weighed, and the dry weight of the remaining sugarcane bagasse was compared to the control group. The control group consisted of the BED incubated in the absence of NaLC.
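The comparison with the control described above amounts to a relative dry mass loss. The following is a minimal sketch of that arithmetic, assuming the filter tare has already been subtracted; the function name and the example masses are hypothetical and are not taken from the study.

```python
def percent_mass_loss(control_dry_mass_g, treated_dry_mass_g):
    """Return the dry mass lost relative to the non-inoculated control, in percent."""
    if control_dry_mass_g <= 0:
        raise ValueError("control dry mass must be positive")
    return 100.0 * (control_dry_mass_g - treated_dry_mass_g) / control_dry_mass_g


# Hypothetical dry masses (g) of residual bagasse; not measured values.
example_loss = percent_mass_loss(control_dry_mass_g=0.50, treated_dry_mass_g=0.25)
print(f"{example_loss:.1f}% of the bagasse consumed relative to the control")
```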

Characterization of BED Degradation by NaLC

The chemical composition of the BED was determined on samples collected on the 7th day of growth. The control consisted of non-inoculated medium incubated under the same conditions as the experimental groups. All experiments were performed in triplicate. Before analysis, the biomass was washed with sterile water to remove excess microbial biomass. The cellulose, hemicellulose, and lignin contents of the control and consortium samples were determined by an acid hydrolysis method and reported as a solid dry mass fraction [10]. The hydrolysate was characterized by high-performance anion-exchange chromatography with pulsed amperometric detection (HPAEC-PAD) on a Dionex ICS-3000 system, using a CarboPac PA1 column (4 × 250 mm) and a CarboPac PA1 guard column (4 × 50 mm), to determine the concentration of cellulose and hemicellulose. Soluble lignin was determined from the ultraviolet absorption (215 and 280 nm) of the hydrolysate. Insoluble lignin was determined as the solid fraction remaining after hydrolysis, after subtraction of its ash content.

Enzyme Activity Analysis from NaLC Enrichment

For analysis of enzymatic activity, 1 mL of the culture supernatant was collected on the 3rd and 7th days. The final reaction volume was 100 μL: 50 μL of the supernatant was mixed with 50 μL of a pre-prepared 1% (w/v) solution of one of several substrates, such as carboxymethylcellulose (CMC, Sigma), beta-glucan (Megazyme), beechwood xylan (Sigma), microcrystalline cellulose (Avicel®, Sigma), and xyloglucan from tamarind (Megazyme), among others, each diluted in 0.1 M phosphate buffer (pH 7.0). The reactions containing the consortium supernatant and the substrates were incubated at 30 °C for 1 h. The amount of reducing sugars was determined using the 3,5-dinitrosalicylic acid (DNS) method [13]. As negative controls, the same procedure was performed with supernatant from BED culture incubated in the absence of NaLC, and the residual sugar values were subtracted from the results found for each substrate. Blank reactions were also made by incubating the substrates in the absence of supernatants (both experimental and control groups), and the residual sugar values were subtracted from the results found for each substrate.
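The background correction described above can be sketched as a simple subtraction. This is an illustrative reading of the text only: the function name, the units, and the readings are hypothetical, and the text does not specify exactly how the two background values were combined, so subtracting both from the sample reading is an assumption.

```python
def net_reducing_sugar(sample, culture_control, substrate_blank):
    """Subtract both background readings from the sample reading.

    All three values are reducing-sugar readings in the same (arbitrary) unit.
    Assumption: both backgrounds are subtracted from the sample, as the text reads.
    """
    return sample - culture_control - substrate_blank


# Hypothetical DNS readings for one substrate; not data from the study.
readings = {"sample": 1.0, "culture_control": 0.25, "substrate_blank": 0.125}
net = net_reducing_sugar(**readings)
print(f"net reducing sugar: {net}")
```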

Morphological Analyses by Electron Microscopy

The analyses of dried BED by scanning electron microscopy (FEI Quanta 250) were performed on samples taken on the 7th day of growth. The control consisted of BED incubated for 7 days in non-inoculated medium. ImageJ 1.33f, which is available online, was used to measure the areas of sugarcane bagasse fibers of different lengths [14]. Environmental scanning electron microscopy micrographs were obtained with a non-commercial FEI Quanta 250 equipped with a Bruker QUANTAX EDS XFlash 6 detector. The samples were observed with an acceleration voltage of 15 kV and a probe current of 80 pA, and the environmental distance between the sample surface and the second pressure-limiting opening was 10 mm.

DNA Extraction and 16S rDNA and ITS2 Sequencing

Samples from the microbial consortium collected on the 7th day of cultivation were filtered using a nylon membrane (12 μm). The retained material was washed by centrifugation in 10 mL PBS at 14,000 × g and 4 °C for 10 min. DNA extraction was performed using the FastDNA® SPIN Kit for Soil (MP Biomedicals), according to the manufacturer’s instructions.

The diversity and composition of the bacterial and fungal communities were determined using the protocols of Caporaso et al. [15] and White et al. [16]. The pairs of oligonucleotide primers used in the amplification of the V4 region of the 16S rDNA gene and of ITS2 were modified by the insertion of the Illumina adapter sequence at the 5′ end of each oligonucleotide, resulting in the following sequences, respectively: 515F_V4 5′-tcgtcggcagcgtcagatgtgtataagagacaggtgccagcmgccgcggtaa-3′, 806R_V4 5′-gtctcgtgggctcggagatgtgtataagagacagggactachvgggtwtctaat-3′ [15] and ITS3 (5′-tcgtcggcagcgtcagatgtgtataagagacagggcatcgatgaagaac-3′), ITS4 (5′-gtctcgtgggctcggagatgtgtataagagacagtcctccgcttattgatatg-3′) [16]. PCR was carried out with high-fidelity Phusion polymerase (Thermo Scientific) using the following program: 98 °C for 2 min (initial denaturation), followed by 25 cycles of 98 °C for 30 s (denaturation), 57 °C for 40 s (annealing), and 72 °C for 2 min (extension), with a final elongation step at 72 °C for 10 min. The amplicon libraries were then prepared using the Nextera DNA Sample Preparation Kit, following the manufacturer’s recommended protocol (Illumina). All amplicon libraries were prepared in duplicate and finally pooled in equimolar quantities. The sequencing run was performed with a MiSeq Reagent Kit v2 (300 cycles) mixed with 15% PhiX Control v3 on a MiSeq instrument available at the NGS sequencing facility at LNBR/CNPEM.
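Some of the primers above contain IUPAC ambiguity codes (m, h, v, w), meaning each is synthesized as a mixture of sequences. As a small illustration, the sketch below counts how many distinct sequences each primer string represents. The helper name `degeneracy` is hypothetical; the primer strings are copied from the text.

```python
from math import prod

# Number of bases represented by each IUPAC nucleotide code.
IUPAC_OPTIONS = {
    "a": 1, "c": 1, "g": 1, "t": 1,
    "r": 2, "y": 2, "s": 2, "w": 2, "k": 2, "m": 2,
    "b": 3, "d": 3, "h": 3, "v": 3,
    "n": 4,
}


def degeneracy(primer):
    """Return the number of distinct sequences encoded by a degenerate primer."""
    return prod(IUPAC_OPTIONS[base] for base in primer.lower())


PRIMERS = {
    "515F_V4": "tcgtcggcagcgtcagatgtgtataagagacaggtgccagcmgccgcggtaa",
    "806R_V4": "gtctcgtgggctcggagatgtgtataagagacagggactachvgggtwtctaat",
    "ITS3": "tcgtcggcagcgtcagatgtgtataagagacagggcatcgatgaagaac",
    "ITS4": "gtctcgtgggctcggagatgtgtataagagacagtcctccgcttattgatatg",
}

for name, sequence in PRIMERS.items():
    print(name, len(sequence), degeneracy(sequence))
```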

Diversity Analysis

Raw 16S rRNA reads from Illumina sequencing were quality checked using FASTQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/); the USEARCH merge pairs function was used to assemble paired-end reads and to remove low-quality sequences. Merged reads were further analyzed using the UPARSE pipeline [17]. Sequences were clustered into operational taxonomic units (OTUs) at 97% identity using the USEARCH method. Chimeric sequences were identified and removed using the UCHIME method implemented in the cluster OTU function. The representative sequence from each OTU was assigned to taxonomic ranks using the Ribosomal Database Project (RDP) Classifier [18] and the UNITE fungal ITS training sets [19]. The Chao1, ACE, Shannon, and Simpson indices, rarefaction curves, and sample coverage (Good’s coverage) were calculated with the Phyloseq and Vegan packages in R statistical software version 3.2. Nucleotide sequences were deposited in NCBI under the accession number SRR12005466.
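For readers unfamiliar with these indices, the sketch below shows the standard definitions of the Shannon index, the Simpson index (in the 1 − D form), and Good's coverage, computed from a vector of OTU counts. It is a plain-Python illustration of the formulas, not the Phyloseq/Vegan calls used in the study, and the example counts are hypothetical.

```python
from math import log


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson index in the 1 - D form, where D = sum(p^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def goods_coverage(counts):
    """Good's coverage = 1 - (number of singleton OTUs / total reads)."""
    total = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / total


# Hypothetical OTU counts for a single sample; not data from the study.
otu_counts = [5, 1, 1, 3]
print(shannon(otu_counts), simpson(otu_counts), goods_coverage(otu_counts))
```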

Metagenomic Analysis

An Illumina Nextera DNA library was prepared using total DNA from the 7th day of enrichment, following the manufacturer’s specifications. Sequencing was performed in paired-end mode (2 × 100-bp reads) on an Illumina HiSeq 2500 instrument (Illumina, San Diego, CA, USA) available at the NGS sequencing facility at LNBR/CNPEM. The de novo metagenome assembly was performed using the IDBA-UD software [20]. Gene prediction and automatic annotation were performed using Prokka [21]. Contigs shorter than 1000 nucleotides were excluded from downstream analysis. Nucleotide sequences were deposited in NCBI under the accession number SRR11869874.
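The contig length cutoff mentioned above can be illustrated with a short FASTA filter. This is a minimal sketch under the assumption that contigs are available as FASTA text; the function names and the toy records are hypothetical, and the tool actually used for this step is not stated in the text.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records, min_length=1000):
    """Keep contigs of at least min_length nucleotides (shorter ones are excluded)."""
    return [(header, seq) for header, seq in records if len(seq) >= min_length]


# Toy contigs of 999, 1000 (split over two lines), and 1500 nt; not real data.
toy_fasta = [">a", "A" * 999, ">b", "A" * 600, "C" * 400, ">c", "G" * 1500]
kept = [header for header, _ in filter_contigs(read_fasta(toy_fasta))]
print(kept)
```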

The metagenome de novo assembly was binned using CONCOCT [22] with default parameters to recover metagenome-assembled genomes (MAGs). The completeness and contamination ratios of the recovered genomes were assessed using CheckM [23] and the most likely taxonomic ranks of the recovered genomes were predicted by GTDB-Tk [24].

Functional annotation of the MAGs was done using Prokka. Further analysis for carbohydrate-active enzyme annotation was performed using the dbCAN2 HMM profiles [25], the carbohydrate-active enzymes database [26], and the PFAM database [27]. Clusters of CAZyme genes were analyzed using CGC clusters from the dbCAN-seq database [28].

Statistical Analysis

The statistical tools of the GraphPad Prism 6 program (GraphPad Software, San Diego, CA) were used for data and image analysis. The results were expressed as the average of 3 independent repetitions. The data from microscopic images and enzyme activity assays were submitted to ANOVA, and the means were compared with the Tukey test using a 95% confidence interval. For metagenomic data, the statistical analysis was performed using specific bioinformatic programs.
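As a reminder of what the ANOVA step computes, the sketch below derives the one-way ANOVA F statistic from group data in plain Python. It is an illustration of the formula only: the study used GraphPad Prism, the Tukey post hoc test is not implemented here, and the example groups are hypothetical triplicates.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical triplicate measurements for three conditions; not data from the study.
f_value, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_value, df_b, df_w)
```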

Results and Discussion

The Mobilization of Sugarcane Bagasse by NaLC

The pretreatment of lignocellulosic biomass is considered an essential step in increasing the accessibility of lignocellulose to enzymatic hydrolysis [1]. The initial chemical composition of raw sugarcane bagasse was 41.3% cellulose, 29.3% hemicellulose, and 23.9% lignin. The steam explosion pretreatment promoted partial solubilization and consequent detachment of hemicellulose, reducing its content from 29.3% to just 7.9% (in the remaining solids) and causing a relative increase in the cellulose content (by percentage) of the treated sugarcane bagasse (Table 1). The chemical composition of sugarcane bagasse (BED) incubated without the bacterial consortium (control groups) showed few modifications when compared to non-incubated BED. These differences may be explained by the release into the culture medium of microspheres that were not collected for chemical analysis.

Table 1 Chemical composition of sugarcane bagasse

Due to the delignification step, the lignin content of BED also decreased to 18.2%. The pretreatment of sugarcane (with steam explosion followed by delignification) results in partial hydrolysis of hemicelluloses and concomitant lignin oxidation. This process also generates defibrillation of cellulose, allowing access to hydrolytic enzymes for subsequent fermentation [1].

The analysis of BED degradation by the NaLC was performed over 7 days and showed a high consumption rate from the 1st to the 3rd day of cultivation, during which around 35% of the biomass was consumed compared to the control (Fig. 1A). From the 3rd to the 7th day, however, the rate of degradation decelerated. Overall, NaLC degraded 55% of the lignocellulosic content (Fig. 1A), mainly the cellulosic fraction, which decreased from 75.7 to 47.7% after 1 week of incubation (Table 1).

Fig. 1

Lignocellulose degradation profile of NaLC. A Determination of the dry mass loss of the sugarcane bagasse. B Extracellular enzymatic activities of the supernatant of the lignocellulolytic microbial consortium (NaLC); gray bars show activity after BED incubation for 3 days and black bars after 7 days. Values are the mean ± SD (standard deviation) from three independent experiments; NS indicates a non-significant difference at p = 0.05. Xyg, xyloglucanase; AVCase, avicelase; BG, β-glucanase; Xn, xylanase

Wongwilaiwalin et al. [2] showed efficient degradation of filter paper (54.7%), sugarcane bagasse (59.4%), and rice straw (75.3%) by a thermophilic microbial consortium bred from sugarcane bagasse compost. It is very difficult to compare the results of different studies, however, because several factors, such as the source of the microorganisms, the substrate composition, and the pretreatment and culture conditions, influence the rates of fiber degradation by different consortia.

In order to correlate biomass degradation with the enzymatic activities, cellulases and hemicellulases of NaLC were evaluated. Enzymatic activities were detected in the supernatant of 3rd day cultivation samples for β-glucanase and xylanase (Fig. 1B). The NaLC supernatant showed low activity against β-glucan, xylan, avicel, and CMC, and showed no activity against pectin, xyloglucan or arabinoxylan.

The breakdown of insoluble and complex polysaccharides requires a close interaction between the enzyme and the substrate as well as the cooperation of multiple enzymes to improve hydrolysis [3]. Anaerobic microorganisms (such as those of intestinal microbiota) degrade the lignocellulolytic material through a multienzymatic complex, called a cellulosome. This structure has great spatial proximity to and high affinity with the lignocellulolytic surface, facilitating the synergistic action of hydrolytic enzymes upon substrates [29]. In contrast, the aerobic microorganisms degrade lignocellulose by secreting several soluble cellulases and hemicellulases [29].

Some morphological changes in BED were also demonstrated after the 7th day of cultivation with NaLC. Scanning electron microscopy showed a significant decrease in the thickness of the BED fibers (Fig. 2A–C). In addition, microspheres of varied sizes were observed in both the control and experimental groups; these structures were generally 3× larger in BED incubated with NaLC, although less common (Fig. 2D, E). The distribution, size, and abundance of these microspheres vary depending on the type of pretreatment [30]. The enlargement of microspheres formed during incubation with NaLC may be caused by the fusion of smaller spheres.

Fig. 2

Morphological analysis of sugarcane bagasse treated with NaLC. General view of the sample showing fibers and pseudo-lignin after incubation in control (no bacteria, CTL) (A) and NaLC (B) at 7 days. The insets show a magnified view of pseudo-lignin in both groups. Measurements of fiber areas in control (black bars) and bacterial consortium (NaLC) (gray bars) at 7 days (C). Measurements of the diameter and frequency of fibrous microspheres, by scanning electron microscopy, in sugarcane bagasse incubated in control (black bars) and NaLC (gray bars) at 7 days (D and E, respectively). Values are the mean ± SD (standard deviation) from three independent experiments. Asterisk (*) indicates a significant difference where *p < 0.05 and ****p < 0.0001

According to Shinde et al. [30], these microspheres show properties similar to those of lignin and are named pseudo-lignin. Several studies have reported the formation of pseudo-lignin due to pretreatment at low pH and/or high temperatures, such as pretreatment with steam explosion [30, 31]. This is the first time that the formation of these structures has been observed due to microbial consortia. The composition and quantity of pseudo-lignin vary significantly depending on the type of biomass and the pretreatment conditions used. One mechanism involved in the formation of pseudo-lignin is the condensation between lignin and furfural/hydroxymethyl furfural (HMF) derived from the dehydration of pentoses and hexoses. HMF and furfural can also produce other aromatic compounds, which may give rise to pseudo-lignin via polymerization/polycondensation reactions [30, 31]. Another factor that probably contributes to the formation of pseudo-lignin is the cultivation of NaLC under high aeration. Hu et al. [31] studied the role of oxygen in the formation of pseudo-lignin and showed that an oxygen-enriched atmosphere significantly increases the formation of these microspheres. An important aspect is that pseudo-lignin could bind lignocellulolytic enzymes/microbes in vitro, negatively affecting the enzymatic hydrolysis of polysaccharides [30, 31], which might explain the low enzymatic activity of NaLC shown in Fig. 1B.

Taxonomic Analysis of NaLC

Large-scale sequencing is considered a powerful tool for the study of complex microbial consortia, allowing the identification of genes, pathways, and genomes involved in plant biomass conversion [3]. In this study, the microbial diversity and composition of the NaLC were evaluated using 16S rDNA and ITS2. The analysis of the eukaryotic profile based on ITS2 did not reveal sequences related to fungal species. In addition, the visualization of bagasse by optical and electron microscopy after NaLC incubation did not reveal the presence of yeasts and/or hyphae (data not shown). The low nutritional quality of the medium, with sugarcane bagasse as the sole carbon source, probably explains the absence of fungi.

Regarding the 16S rDNA sequence analysis, 62.04 quality sequences were obtained and used for diversity analysis (see Supplementary Figure 1), which resulted in the identification of 22 OTUs. The rarefaction curve formed a plateau, indicating that the depth of sequencing was sufficient to estimate the diversity and richness of this bacterial community (Fig. 3A). The NaLC community was composed only of bacteria. The most abundant phyla were Bacteroidetes (50.6%) and Proteobacteria (47.7%); less represented phyla were Firmicutes (1.5%) and Acidobacteria (<1%) (Fig. 3B). This predominance of Bacteroidetes and Proteobacteria is similar to data from other aerobic bacterial communities from environments that degrade lignocellulosic compounds from different biomasses, including sugarcane bagasse [32]. Under anaerobic conditions, however, several studies on lignocellulolytic microbial enrichment have shown that Firmicutes is the most abundant phylum [32].

Fig. 3

Microbial diversity of NaLC. A Rarefaction curve analysis of 16S rDNA sequences from NaLC; B Relative abundance (%) of bacterial taxa based on 16S rDNA targeted sequencing at the phylum level; C at the genus level. OTUs were defined at a 97% identity threshold using the UPARSE pipeline

The Bacteroidetes phylum consists of gram-negative bacteria whose members colonize different habitats, including the gastrointestinal tract of animals [14, 32]. This phylum produces a great variety of hydrolytic enzymes involved in the processing of complex polysaccharides [32], with CAZyme genes organized in polysaccharide utilization loci (PUL) [12]. Among the Bacteroidetes of NaLC, the genus Flavobacterium, which comprises several aerobic cellulolytic bacteria [33], was strongly represented (41.8%) (Fig. 3B).

The phylum Proteobacteria had a representation of 47.7% in the NaLC and is considered the most extensive and varied among the bacterial phyla. This phylum has great morphological and metabolic diversity, including autotrophic and heterotrophic as well as aerobic and anaerobic members [34]. This group is considered to be efficient in the degradation of several aromatic compounds, aliphatic components, and others [34]. In insects, several metagenomic studies have shown that Bacteroidetes, Proteobacteria, Firmicutes, and Actinobacteria are the predominant phyla in the gut of diverse insect orders [35, 36].

Among the Proteobacteria, Sphingomonas sp. and Sphingopyxis sp. had a relative abundance of 18.2% and 4.5% respectively (Fig. 3C). Sphingomonas and Sphingopyxis are widespread in nature and can be obtained from various sources such as water, soil, living organisms, and even human skin. According to Mnich et al. [37] and Shin et al. [38], many Sphingomonas strains produce lignin-degrading enzymes and some species of Sphingopyxis are capable of expressing lignocellulolytic hydrolases.

The phylum Firmicutes is found in several environments, such as the gastrointestinal tract of animals, especially under anaerobic and/or microaerobic conditions [35]. Its members are able to degrade recalcitrant components [32], but in this study Firmicutes represented only a small percentage of the analyzed sequences (Fig. 3B). This may be related to the aerobic conditions under which the consortium was cultivated.

The Acidobacteria phylum was also found; it has already been observed in several environments, such as soils, hot springs, activated sludge, and lake sediments [39]. These microorganisms are considered important lignocellulose-degrading agents. The genomic analysis of this group showed the presence of several genes involved in the degradation of complex plant polysaccharides. These microorganisms can be cultivated on nitrogen-poor substrates and withstand considerable levels of anti-nutritional compounds such as those found in sugarcane [39].

The Profile of CAZyme Genes in NaLC

To evaluate the composition of genes encoding CAZymes in the NaLC, the metagenomic DNA from the 7th day of cultivation with pretreated sugarcane bagasse was sequenced. The de novo metagenome assembly was performed using the software IDBA-UD, and 148.50 open reading frames (ORFs) were identified, most of which (about 87%) were predicted to encode proteins of more than 100 amino acids [see Supplementary Fig. 2]. In order to identify putative genes encoding enzymes involved in breaking down and modifying carbohydrates, the predicted ORFs were annotated using a dbCAN HMM-based search.

Among the identified CAZyme classes, genes encoding glycoside hydrolases (GHs) represented the most abundant class, followed by glycosyl transferases (GTs) and carbohydrate esterases (CEs), while the less prevalent classes were carbohydrate-binding modules (CBMs), auxiliary activities (AAs), and polysaccharide lyases (PLs) (Fig. 4A).
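A class-level summary like the one above can be obtained by grouping family labels (for example GH5 or CBM50) by their class prefix. The sketch below illustrates that grouping; the function name and the input labels are hypothetical, and it does not parse actual dbCAN output files.

```python
import re
from collections import Counter

# CAZyme class prefix followed by a family number, e.g. "GH43" or "GH43_12".
CLASS_PATTERN = re.compile(r"^(GH|GT|CE|CBM|AA|PL)\d+")


def tally_classes(family_labels):
    """Count annotations per CAZyme class; labels that do not match are skipped."""
    counts = Counter()
    for label in family_labels:
        match = CLASS_PATTERN.match(label)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical family labels; not annotations from the study.
labels = ["GH5", "GH43_12", "GT2", "CE1", "CBM50", "AA3", "PL1", "unknown"]
class_counts = tally_classes(labels)
print(class_counts.most_common())
```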

Fig. 4

Functional classification of identified CAZymes. A The open reading frames (ORFs) of genes encoding proteins classified according to their functions using the dbCAN HMM profiles for automated carbohydrate-active enzyme annotation. B The taxonomic annotation of the identified CAZymes (18,954 proteins) assigned by the GhostKOALA annotation software. C Glycoside hydrolases (GH), represented by the 20 families with the highest number of annotations. D Carbohydrate esterases (CE). E Carbohydrate-binding modules (CBM), represented by the 20 families with the highest number of annotations. F Auxiliary activities (AA)

The taxonomic profile of CAZymes revealed the predominance of sequences belonging to the Bacteroidetes and Proteobacteria phyla, indicating that these taxa are probably the primary agents in the deconstruction of sugarcane bagasse by NaLC (Fig. 4B), which appears to correlate with the results of the 16S rDNA sequence analysis. The specific participation of these NaLC genes during sugarcane bagasse degradation is under investigation using a metatranscriptomic approach (data not shown).

The polysaccharide lyases (EC 4.2.2.-) are a group of enzymes that cleave uronic acid-containing polysaccharides through an elimination mechanism. The number of PLs is generally low and always less than that of GHs; the most likely explanation for this observation is that the substrates of PLs represent only a small proportion of all carbohydrate polymers [40]. Therefore, the present study focuses only on the most abundant CAZyme classes involved in the breakdown and modification of carbohydrates, which are GHs, CEs, CBMs, and AAs.

Glycoside hydrolases (EC 3.2.1.-) are a group of enzymes that hydrolyze the glycosidic bond between two or more carbohydrates or between a carbohydrate and a non-carbohydrate moiety [41]. Several GH families comprising classical cellulolytic enzymes, such as endoglucanases (GH5 and GH9) and β-glucosidases (GH1 and GH3), were identified in NaLC and are among the most abundant (Fig. 4C). However, genes encoding cellobiohydrolases of family GH7 were not identified, which was expected because this family has not yet been found in bacteria or archaea, and only one gene was found for GH6. The family GH23 was highly represented among the total GHs found in NaLC, followed by GH13 and GH43 (Fig. 4C). The glycoside hydrolases of the GH23 family are known as lysozyme type G, peptidoglycan lyases, and chitinases, which represent a group of enzymes that hydrolyze polysaccharides [42]. The enzymes of the GH23 family have been classified as debranching enzymes that could act in polysaccharide mobilization [43]. Bras et al. [42] identified in Clostridium thermocellum an endoglucanase whose active site exhibited a structure conserved with enzymes of the GH23 family. The GH13 family, known as the α-amylase family, contains enzymes with different activities and substrate specificities. According to Li et al. [44], these proteins were identified in several metagenomes from different environments.

Genes from the GH43 family that encode hemicellulases were also found in NaLC, including xylanases (GH43 and also families GH8 and GH5), which randomly cleave the main chain of xylan, and β-xylosidases (GH43 and also families GH3 and GH1), which cleave non-reducing ends, releasing xylose monomers, as well as mannosidases (GH2) and α-L-arabinofuranosidases (GH43) (Fig. 4C). Genes from this family, as well as from GH5, are often identified in microbial communities and are divided into 37 (GH43) and 51 (GH5) subfamilies, which differ in structure and function [45, 46].

The fourth most abundant family among the GHs in the NaLC consortium was the GH3 family (Fig. 4C); this family also appeared among the most abundant in several metagenomes analyzed, including the human gut, mouse, termite, and bovine rumen [44]. In the termite gut, 69 modules corresponding to the catalytic domain of GH3 were identified, suggesting its participation in the final metabolism of oligosaccharides into simple sugars [47].

The second most abundant class of CAZymes found in NaLC was the CEs, a group of enzymes that remove ester substituents from glycan chains, helping GHs in the conversion of biomass [48]. Among the most abundant families found in NaLC were CE1 and CE4, described as acetyl xylan esterases, which are directly involved in hemicellulose modification [29] (Fig. 4D). These data are similar to those observed by Liu et al. [49], where CE1 and CE4 were among the most abundant families in the gut microbiota of the higher termite Globitermes brachycerastes. Another well-represented family was CE10 (carboxyl esterase), but these enzymes are no longer considered carbohydrate esterases and their information is no longer updated in the CAZy database.

In addition to the GHs and CEs, non-catalytic modules (CBMs) were also abundant in the NaLC community. CBMs recognize and bind to carbohydrates, improving the catalytic efficiency of GHs in the degradation of polysaccharides [29]. The most abundant family was CBM50, which binds to peptidoglycan and chitin (Fig. 4E) [48]. CBM50 can be found associated with 6 families of GHs, including GH23 (the most abundant among the GHs of NaLC), which could explain the predominance of these CBMs in NaLC.

The class of auxiliary activities (AAs) is involved in the degradation of lignocellulose through oxidative pathways (Fig. 4F) [50]. AA genes were also found in the NaLC, such as genes encoding enzymes from the AA2 (peroxidases), AA4 (vanillyl alcohol oxidase), and AA6 (1,4-benzoquinone reductase) families. Among the AAs involved in the degradation of glucose-based polymers, genes encoding AA3 (cellobiose dehydrogenases/glucose oxidase) and AA7 (glucooligosaccharide oxidase) were found. Genes from families AA3 and AA7 appear to be associated with both lignin and cellulose degradation [50].

The enzymes from the AA3, AA4, AA6, and AA7 families have been associated with eukaryotic cells, mainly fungal species [50]. However, lignin degradation by AA families produced by some bacteria has also been recently described [51], although the role of these AA enzymes in bacteria is not well understood.

Several CAZyme Genes in NaLC Are Organized as PUL

Microbial genomes were recovered from the metagenome assembly through a database-assisted contig binning strategy. This analysis resulted in the recovery of 22 MAGs, most of them of high quality (more than 70% completeness and < 5% contamination), which is in agreement with the total number of OTUs found in the 16S rDNA sequence analysis. Among the MAGs, 16 genomes were recovered with more than 90% completeness and 14 genomes were classified at the genus level [see Supplementary Fig. 3]. The MAGs were analyzed using the dbCAN database for the identification of CAZymes and CAZyme gene clusters. One of these genomes was assembled in 30 contigs with a genome size of 5.78 Mb and was classified as belonging to the Gram-negative genus Niabella, family Chitinophagaceae, phylum Bacteroidetes.
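The quality criteria quoted above (more than 70% completeness, less than 5% contamination, and the stricter 90% completeness tier) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Mag` record and the `passes_quality` helper are hypothetical names, and every record except MAG 34 (whose values are reported in this section) is invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mag:
    """Hypothetical record holding the quality estimates of one MAG."""
    name: str
    completeness: float   # percent
    contamination: float  # percent


def passes_quality(mag, min_completeness=70.0, max_contamination=5.0):
    """Thresholds quoted in the text: >70% completeness and <5% contamination."""
    return (mag.completeness > min_completeness
            and mag.contamination < max_contamination)


# Only the MAG 34 values come from the text; the other records are hypothetical.
mags = [
    Mag("MAG 34", 99.5, 0.5),
    Mag("hypothetical_A", 82.0, 3.1),
    Mag("hypothetical_B", 65.0, 1.0),   # fails on completeness
    Mag("hypothetical_C", 95.0, 7.5),   # fails on contamination
]

kept = [m.name for m in mags if passes_quality(m)]
near_complete = [m.name for m in mags
                 if passes_quality(m) and m.completeness > 90.0]

print(kept)
print(near_complete)
```

In practice the completeness and contamination estimates would come from a dedicated genome quality tool rather than from hand-entered values.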

Bacteria from Bacteroidetes are found in many ecosystems and usually carry several CAZymes involved in the degradation of polysaccharides. They degrade complex glycans outside the cell and fully convert them to monosaccharides in the periplasmic space, avoiding competition for the substrate with other bacteria [52]. Bacteroidetes organize the genes involved in the metabolism of a specific polysaccharide in discrete loci, so that different proteins such as transporters, hydrolases, and regulators are encoded in the same portion of the genome [12]. In the gut of the cockroach Periplaneta americana, the genomes of some Bacteroidetes isolates contained PUL with typical markers such as SusD cell surface glycan-binding proteins, SusC-like/TonB-dependent transporters, and transcriptional regulators that include extracytoplasmic function (ECF) sigma/anti-sigma factor systems [12].

This specific arrangement of genes is called a polysaccharide utilization locus (PUL) and is widely disseminated in bacteria from bovine rumen, human distal intestine, and ocean environments [53]. Although PUL have been identified in the intestinal microbiota of termites and cockroaches, their participation in the deconstruction of lignocellulose remains poorly investigated. Therefore, a bioinformatics protocol was run for the identification of PUL in the genome of Niabella sp., using the presence of SusC and SusD gene pairs (transporter genes) as markers [54], and several PUL were identified in the genome of Niabella sp. from NaLC (see Supplementary Fig. 4). Supplementary Fig. 5 shows the CAZyme prediction of the MAGs at the phylum level, revealing that the majority of the CAZymes identified were GHs, for which 451 genes were found in Bacteroidetes MAGs, 210 in Proteobacteria, 121 in Verrucomicrobia, 79 in Acidobacteria, and 16 in Planctomycetes. Genes encoding CEs, PLs, and AAs were also found in the MAGs, with the majority of them belonging to the Bacteroidetes phylum. Among the recovered genomes, MAG 34 showed the highest number of annotated CAZymes, a total of 193 genes. It is a high-quality MAG with 99.5% genome completeness and 0.5% contamination [see Supplementary Fig. 3].
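The marker-based idea described above (using adjacent SusC/SusD gene pairs to flag candidate PUL) can be sketched in a few lines of Python. This is a simplified illustration, not the protocol of reference [54] or the dbCAN cluster finder: the `Gene` record, the `find_candidate_puls` function, the `flank` window, and the toy annotations are all assumptions introduced here.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical annotation record for one predicted gene."""
    contig: str
    index: int        # gene order along the contig
    annotation: str   # e.g. "SusC", "SusD", "GH43", "regulator"


# CAZy class prefixes used to recognise CAZyme family labels such as "GH43".
CAZY_PREFIXES = ("GH", "CE", "PL", "AA", "CBM", "GT")


def is_cazyme(annotation):
    return annotation.startswith(CAZY_PREFIXES)


def find_candidate_puls(genes, flank=5):
    """Return (contig, index of first marker gene, neighbouring CAZymes)
    for every adjacent SusC/SusD pair, looking `flank` genes to each side."""
    by_contig = {}
    for gene in genes:
        by_contig.setdefault(gene.contig, []).append(gene)

    candidates = []
    for contig, contig_genes in by_contig.items():
        contig_genes.sort(key=lambda g: g.index)
        for i in range(len(contig_genes) - 1):
            pair = {contig_genes[i].annotation, contig_genes[i + 1].annotation}
            if pair == {"SusC", "SusD"}:
                lo = max(0, i - flank)
                hi = min(len(contig_genes), i + 2 + flank)
                neighbours = [g.annotation for g in contig_genes[lo:hi]
                              if is_cazyme(g.annotation)]
                candidates.append((contig, contig_genes[i].index, neighbours))
    return candidates


# Toy annotations (hypothetical): contig_1 has an adjacent SusC/SusD pair,
# contig_2 has both genes but separated by another gene.
genes = [
    Gene("contig_1", 0, "hypothetical"),
    Gene("contig_1", 1, "GH43"),
    Gene("contig_1", 2, "SusC"),
    Gene("contig_1", 3, "SusD"),
    Gene("contig_1", 4, "CE1"),
    Gene("contig_1", 5, "regulator"),
    Gene("contig_2", 0, "SusC"),
    Gene("contig_2", 1, "transposase"),
    Gene("contig_2", 2, "SusD"),
]

candidates = find_candidate_puls(genes, flank=2)
print(candidates)
```

Requiring strict adjacency keeps the sketch simple; a real analysis would also consider gene orientation, intergenic distance, and the regulators flanking each pair.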

However, unlike PUL for carbohydrates, gene clusters involved in lignin mobilization have not been previously described. Supplementary Fig. 6 shows a specific arrangement of genes that suggests the presence of a putative lignin utilization locus (LUL). In this case, the SusC and SusD gene pairs were surrounded by genes encoding an AA2 peroxidase, an NADH dehydrogenase, and a MnSOD (manganese superoxide dismutase). The catalytic cycle of lignin-modifying peroxidases (AA2) is initiated by hydrogen peroxide (H2O2) provided by the AA3 family [55], the most abundant AA family in NaLC (Fig. 4F). In addition to the peroxidase-based mechanism, some oxidases and reductases can also promote a redox cycle for lignin modification [56]. The detection of an NADH dehydrogenase in the genome of Niabella sp. suggests that NADH participates in the redox balance during lignin mobilization. Lignin degradation by the Fenton reaction, in which Fe2+ reacts with H2O2 to generate hydroxyl radicals with high redox potential [57] that can attack lignin, cannot be ruled out. Moreover, an acyl carrier gene was identified next to the AA2 gene, suggesting involvement in the transport of hydrophobic intermediates during lignin oxidation. In addition, this LUL includes a MnSOD gene, which was recently described as a ligninase in the genome of Sphingobacterium sp. T2, also from the Bacteroidetes phylum; this would be a new function, since the enzyme was believed to act only against oxidative stress [57].
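For reference, the Fenton reaction mentioned above is usually written with the following textbook stoichiometry. It is given here as general chemistry, not as a result of this study:

```latex
\mathrm{Fe^{2+}} + \mathrm{H_2O_2} \longrightarrow \mathrm{Fe^{3+}} + \mathrm{OH^{-}} + {}^{\bullet}\mathrm{OH}
```

The hydroxyl radical on the right-hand side is the species with high redox potential that could attack lignin.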

This putative new locus (LUL) in Bacteroidetes genomes raises several important questions about co-localization, co-regulation, genomic architecture, sensing/sequestration of monomers/products, enzymatic oxidation, and transport of lignin oligomers and monomers (or their derivatives) during lignin mobilization. The organization of these genes in LULs could also give new insights into biomass mobilization by microorganisms.

Conclusion

This study showed that the lignocellulolytic microbial consortium derived from the gut microbiota of the cockroach Nauphoeta cinerea was able to consume 55% of the sugarcane bagasse supplied as the only carbon source. The 16S and ITS2 analyses showed that the microbial community of NaLC was composed only of bacteria, mainly from the phyla Bacteroidetes and Proteobacteria. The shotgun metagenomic analysis showed that this enrichment consortium had a diversified reservoir of genes encoding CAZymes (mainly GHs and CEs). The presence of a novel locus, here named LUL, was also observed; it encodes several enzymes probably involved in lignin mobilization (mainly an AA2 peroxidase, an NADH dehydrogenase, and a MnSOD). Finally, the enzymatic repertoire detailed here shows the potential of microbial communities from the insect gut as a reservoir of novel genes and microorganisms that efficiently decompose pretreated lignocellulosic biomass.