Introduction

The use of lignocellulosic biomass as a renewable feedstock holds great potential for the biotechnology, biofuel and biomaterial industries (Guerriero et al. 2016). This kind of feedstock provides an interesting alternative to fossil sources since it is a cheap, abundant, recyclable, renewable and widespread form of biomass, and its use is compatible with food production (Tirado-Acevedo et al. 2010; Sheldon and Woodley 2018). Several innovative strategies have been developed for the conversion of lignocellulose. In this context, more than 200 high value-added compounds have been reported (Isikgor and Becer 2015; Arevalo-Gallegos et al. 2017; Ali et al. 2020; Haldar and Purkait 2020; Ning et al. 2021; Patel and Shah 2021). Various fungi and bacteria able to enzymatically decompose lignocellulose have previously been identified. However, the complete enzymatic hydrolysis of this feedstock is still a challenge due to the recalcitrant structure of these materials (Batista-García et al. 2016). Therefore, it is essential to continue the search for novel microbial enzymes with improved ability to fully convert lignocellulose (Montella et al. 2016; Wang et al. 2019).

Metagenomic approaches have opened a window to the unknown diversity of microorganisms present in the environment (Schmeisser et al. 2007). Functional metagenomics is the state-of-the-art technique to identify new enzymes and biochemical mechanisms with desired biological activities (Mirete et al. 2016; Berini et al. 2017; Madhavan et al. 2017). An environment where lignocellulosic biomass is naturally processed represents a promising source of lignocellulose-degrading enzymes. Compost habitats are recognized as important bioreactors for the renewable bioenergy of the planet, sustaining an immense diversity of microorganisms specialized in the degradation of lignocellulosic residues (Wang et al. 2016). Therefore, this type of biomass represents a favourable source of biocatalysts with high stability and efficiency. Isolation of high molecular weight (HMW) metagenomic DNA is the critical step in constructing a metagenomic library for functional screening. Since the 1980s, several methods for DNA extraction have been published and continue to be revised (Lear et al. 2018). Numerous studies have tried to establish the most suitable protocol for the extraction of total soil DNA (Philippot et al. 2010; Petric et al. 2011; Tanase et al. 2015). Nevertheless, a standard method that can be universally applied to all soil types to obtain DNA with the desired purity, quantity and HMW for metagenomic approaches is still lacking.

DNA extraction from compost samples is a challenging procedure since these samples are rich in organic matter and humic material. Humic substances have a similar charge and size to DNA, generally resulting in co-precipitation during the extraction process, as evidenced by the frequent brown colour of some DNA extracts (Harry et al. 1999; Lakay et al. 2007). The presence of humic material in DNA extracts also inhibits the activity of some enzymes, thus limiting downstream applications (Harry et al. 1999; Dong et al. 2006). Some studies highlighted the difficulty of recovering pure DNA and reported the development of several specific methodologies to increase purity (Harry et al. 1999; Rajendhran and Gunasekaran 2008; Young et al. 2014). Accordingly, time-consuming methods such as caesium chloride density gradients, extensive and repetitive precipitation steps and expensive chromatographic procedures have been described (Lakay et al. 2007; Rajendhran and Gunasekaran 2008; Sar et al. 2018). Nevertheless, these methodologies are either not cost-effective or require additional steps for the purification of DNA, namely the removal of humic acids and size selection.

In this work, a direct extraction method for metagenomic DNA isolation from compost or soil samples containing up to 50% of lignocellulosic material is described. This alternative protocol allows a rapid, efficient and cost-effective extraction of total DNA with the quantity, purity and HMW suitable for further metagenomic studies (provisional patent application PT116634). The methodology relies on the mechanical and chemical/enzymatic lysis of cellular material combined with the action of powdered activated charcoal (PAC). Owing to its pore volume and large surface area, which result in an exceptional adsorption capacity, PAC is commonly used to eliminate several contaminants (Marsh and Rodríguez-Reinoso 2006).

Material and methods

Reagents

All chemicals and buffer components were of analytical grade and purchased from Sigma-Aldrich (St. Louis, USA) unless specified otherwise.

Compost samples

The compost samples were collected from two Portuguese composting units, namely Terra Fértil (Chamusca) and Lipor (Porto). The Terra Fértil unit generally handles agroforestry residues (around 50%) and municipal sludge (around 50%). The sample from Terra Fértil (approximately 1 kg) was obtained from a pile with approximately 4 weeks of composting, corresponding to the thermophilic phase of the process. The Lipor unit generally combines food wastes (50–70%) and agroforestry residues (30–50%). The sample from Lipor (approximately 1 kg) was obtained from a pile at the mesophilic phase of the process. Until being used for physicochemical characterization and DNA extraction, the compost samples were stored at 4 °C.

Compost sample characterization

Moisture content was determined gravimetrically after drying the samples in an oven at 105 °C until constant weight. The pH of the compost samples was determined from a mixture of water/compost in the ratio of 4:1 (w/w) using a pH meter with a calibrated glass electrode (Kalra 1995). Total organic carbon, nitrogen, hydrogen and sulphur content was determined through elemental analysis by automated dry combustion (Requimte/LAQV, Faculty of Sciences and Technology, Nova University of Lisbon, Portugal).

Metagenomic DNA isolation

The methodology was established based on previous studies from Zhou et al. (1996), Bergmann et al. (2014), Devi et al. (2015) and Verma et al. (2017) with key modifications and optimizations. Total DNA was extracted from compost samples using mechanical and chemical/enzymatic methods simultaneously. The compost sample (1 g) was mixed with 5 ml of a previously optimized lysis buffer (100 mM Tris–HCl, 100 mM Na EDTA, 1.5 M NaCl, 100 mM Na2HPO4, 100 mM CaCl2·2H2O, 1 mg/ml of proteinase K (NZYTech, Lisbon, Portugal), 1 mg/ml of lysozyme, 0.2 mg/ml RNase A (NZYTech, Lisbon, Portugal)), 1% (w/v) PAC (4–8 mesh) and 1 g of glass beads (425–600 µm) in a 50-ml tube. Depending on the tested conditions, the mixture was further homogenized in a vortex: (i) briefly, (ii) at maximum speed for 2.5 min or (iii) at maximum speed for 5 min. The suspension was then incubated at 37 °C, 150 rpm, for 30 min. After that, 1 ml of 20% (w/v) sodium dodecyl sulphate (SDS) was added and the sample was incubated for 30 min at 65 °C (the tube was inverted every 10 min to mix properly). The obtained lysate was centrifuged at 3200 × g, 4 °C, for 10 min, and the supernatant was transferred to a clean 15-ml tube and gently mixed with 1 volume of chloroform:isopropanol (C:I) (24:1 v/v). The mixture was then centrifuged at 7000 × g, 4 °C, for 5 min. The aqueous phase (top phase) was transferred to fresh tubes, and the DNA was precipitated by adding 1 volume of 3 M sodium acetate (C2H3NaO2, pH 5.2) and 0.4 volumes of 30% (w/v) polyethylene glycol (PEG, MW-8000) and mixing by inversion. The tubes were kept at −20 °C for 20 min. The tubes were slowly thawed on ice, and then the DNA was pelleted by centrifugation at 12,000 × g, 4 °C, for 10 min, and resuspended in 500 μl of Tris–EDTA buffer (TE) (10 mM Tris, 1 mM Na EDTA, pH 8.0). After that, 1 volume of C:I (24:1 v/v) was added. The tubes were gently inverted a few times and further centrifuged at 12,000 × g, 4 °C, for 5 min. The aqueous phase was transferred to fresh tubes and 1 volume of cold isopropanol was added. After gentle mixing by inverting the tubes, the mixture was allowed to precipitate for 5 min at 4 °C. The precipitated DNA was pelleted by centrifugation at 12,000 × g, 4 °C, for 10 min, and washed twice with 500 μl of 70% (v/v) ethanol. After that, the DNA sample was centrifuged at 12,000 × g, 4 °C, for 2 min. The supernatant was discarded and the pellet was air-dried for 10 min at room temperature (RT). Metagenomic DNA was dissolved in 100 μl of TE buffer and stored at 4 °C. Three replicates of each compost sample were used to evaluate the reproducibility of the methodology.
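The reagent quantities in the lysis step follow directly from the stated concentrations. The sketch below, in Python with hypothetical helper names, computes the PAC mass needed for 5 ml of buffer at 1% (w/v) and the approximate SDS concentration after adding 1 ml of a 20% (w/v) stock. It assumes that the compost, beads and PAC contribute negligible liquid volume, which is a simplification and not a statement from the protocol.

```python
def mass_for_w_v_percent(percent_w_v: float, volume_ml: float) -> float:
    """Mass in grams needed for a % (w/v) preparation (1% w/v = 1 g per 100 ml)."""
    return percent_w_v / 100.0 * volume_ml


def diluted_concentration(stock_conc: float, stock_volume_ml: float,
                          initial_volume_ml: float) -> float:
    """Concentration after adding a stock to an existing volume (C1*V1 = C2*V2)."""
    return stock_conc * stock_volume_ml / (stock_volume_ml + initial_volume_ml)


# 1% (w/v) PAC in 5 ml of lysis buffer
pac_mass_g = mass_for_w_v_percent(1.0, 5.0)

# 1 ml of 20% (w/v) SDS added to 5 ml of lysate (solid volume ignored: assumption)
sds_final_percent = diluted_concentration(20.0, 1.0, 5.0)

print(f"PAC: {pac_mass_g * 1000:.0f} mg, SDS: {sds_final_percent:.2f}% (w/v)")
```

Under these assumptions, the mixture contains 50 mg of PAC and roughly 3.3% (w/v) SDS during the 65 °C incubation.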

Additional experiments to evaluate the effectiveness of the method and validate its potential application under different conditions were performed. For the Terra Fértil sample, 10 mg of commercial humic acids was added to the sample. For the Lipor sample, the vortex time used for cell lysis was reduced to 2.5 min (Lipor 2.5) and a condition without mechanical lysis (Lipor 0) was also evaluated, as described above.

Yield and purity of the metagenomic DNA

DNA was quantified using the Nanodrop One (Thermo Scientific™, Waltham, USA), and the purity was determined from the A260/280 and A260/230 absorbance ratios. DNA recovery yield was estimated as an average value with standard deviation (μg DNA g−1 compost). To evaluate the quality of the extracted DNA, a 0.5% agarose (Grisp, Porto, Portugal) gel electrophoresis was run in 1× Tris–acetate EDTA buffer (TAE), at 25 V, 4 °C, for 24 h. GeneRuler High Range DNA Ladder (ThermoFisher, Waltham, USA) was used as a molecular marker according to the manufacturer’s instructions. The gel was stained with 1 mg/ml thiazole orange and visualized using the ChemiDoc™ XRS + (Bio-Rad, Hercules, USA).
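The yield calculation can be illustrated with a short Python sketch. It converts a spectrophotometric concentration (ng µl−1) into µg of DNA per g of compost, using the 100 µl elution volume and 1 g of compost stated in the protocol, and then reports the mean and sample standard deviation over replicates. The function names and the three concentration readings are hypothetical and are not data from this work.

```python
from statistics import mean, stdev


def dna_yield_ug_per_g(conc_ng_per_ul: float, elution_volume_ul: float,
                       compost_mass_g: float) -> float:
    """Total DNA (µg) recovered per gram of compost."""
    total_ng = conc_ng_per_ul * elution_volume_ul
    return total_ng / 1000.0 / compost_mass_g  # ng -> µg


# Hypothetical readings for three replicates (ng/µl); illustrative only
replicate_conc = [120.0, 125.0, 130.0]

yields = [dna_yield_ug_per_g(c, elution_volume_ul=100.0, compost_mass_g=1.0)
          for c in replicate_conc]
avg, sd = mean(yields), stdev(yields)  # stdev is the sample standard deviation

print(f"Yield: {avg:.1f} ± {sd:.2f} µg DNA per g compost")
```

The sample standard deviation (n − 1 denominator) is used here as an assumption; the text does not state which estimator was applied.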

Humic acid determination

To estimate the concentration of the humic acids in both the original composting sample and the extracted DNA, a standard curve (0–500 ng μl−1) was prepared using commercial humic acids (Alfa Aesar, Kandel, Germany) and the absorbance was measured at 340 nm (A340) on a spectrophotometer reader (Bio-Tek from Izasa, Carnaxide, Portugal). Humic acids from both compost samples were extracted by acid precipitation as previously reported (Sar et al. 2018). Briefly, 1 g of compost sample was mixed with 9 ml of 0.1 M NaOH and stirred for 3 h at RT. Then, the mixture was centrifuged at 2500 × g, RT, for 10 min. The supernatant was transferred to a fresh tube and acidified to pH 1.0 using concentrated HCl and incubated overnight at RT in the dark. The precipitated humic acids were pelleted by centrifugation at 2500 × g, RT, for 10 min and air-dried for about 15 min. The pellet was resuspended in 1 ml of TE. Suitable dilutions were prepared for both the humic acids extracted from the original compost samples and the humic acids present in the purified DNA solutions. In all cases, the concentration of humic acids was similarly determined by measuring A340 and using the previously prepared standard curve. All the analyses were performed in triplicate.
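The standard-curve calculation described above can be sketched as follows. The Python example fits a straight line to A340 readings of humic acid standards by ordinary least squares, interpolates an unknown sample and expresses the humic acid removal as a percentage. All absorbance values and masses are hypothetical placeholders, and linearity of A340 over 0–500 ng µl−1 is assumed here for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def humic_conc_ng_per_ul(a340, slope, intercept, dilution_factor=1.0):
    """Interpolate a concentration from the standard curve, corrected for dilution."""
    return (a340 - intercept) / slope * dilution_factor


def removal_percent(initial_amount, remaining_amount):
    """Percentage of humic acids removed relative to the starting amount."""
    return 100.0 * (1.0 - remaining_amount / initial_amount)


# Hypothetical standards (ng/µl) and A340 readings; illustrative only
std_conc = [0, 100, 200, 300, 400, 500]
std_a340 = [0.00, 0.10, 0.20, 0.30, 0.40, 0.50]
slope, intercept = fit_line(std_conc, std_a340)

# Hypothetical sample: A340 = 0.25 measured on a 10-fold dilution
sample_conc = humic_conc_ng_per_ul(0.25, slope, intercept, dilution_factor=10)

# Hypothetical amounts (µg) in the compost and in the final DNA solution
removed = removal_percent(initial_amount=10000.0, remaining_amount=5.0)
print(f"{sample_conc:.0f} ng/µl, {removed:.2f}% removed")
```

Readings that fall outside the range of the standards should be re-measured at a different dilution rather than extrapolated.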

PCR amplification

The purity of the DNA samples was evaluated by PCR amplification of a region of the prokaryotic 16S rRNA gene (Wu et al. 2009). For amplification of 1500 bp of the 16S rRNA gene, the universal primer pair 27F (5′-AGAGTTTGATCCTGGCTCAG-3′) and 1492R (5′-TACCTTGTTACGACTT-3′) was used. The PCR reaction mixture (total volume of 25 µl) was prepared using 0.5 µl of each primer from a 10 µM working solution, 1 µl of template DNA (approximately 20 ng/µl), 12.5 µl of Speedy Supreme NZYtaq 2 × Green Master Mix (NZYTech, Lisbon, Portugal) and 10.5 µl of water for each sample. Amplification was performed using the following programme: 1 cycle at 95 °C for 2 min, followed by 35 cycles of 30 s at 95 °C, 40 s at 52 °C and 1 min at 72 °C, and a final 5-min extension step at 72 °C. PCR was performed using a Life ECO Thermo Cycler (BIOER, Hangzhou, China). The Escherichia coli strain EPI300-T1R (Epicentre Biotechnology, Madison, USA) and distilled water were used as positive and negative controls, respectively. The amplified products were separated by electrophoresis in a 0.8% agarose gel at 80 V for 45 min. NZYDNA Ladder III (NZYTech, Lisbon, Portugal, range size 200–10,000 bp) was used as a molecular weight standard. The gel was stained with 1 mg/ml thiazole orange and visualized using the ChemiDoc™ XRS + (Bio-Rad, Hercules, USA).
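As a consistency check on the reaction set-up, the per-reaction volumes listed above can be summed and scaled to a master mix. The Python sketch below uses the volumes from the text; the helper name, the number of reactions and the 10% pipetting overage are assumptions made for illustration.

```python
# Per-reaction volumes (µl) as listed in the text
PER_REACTION_UL = {
    "primer 27F (10 µM)": 0.5,
    "primer 1492R (10 µM)": 0.5,
    "template DNA": 1.0,
    "2x Green Master Mix": 12.5,
    "water": 10.5,
}


def master_mix(n_reactions: int, overage: float = 0.10) -> dict:
    """Scale all components except the template, adding a pipetting overage."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(volume * factor, 2)
            for name, volume in PER_REACTION_UL.items()
            if name != "template DNA"}


total_volume_ul = sum(PER_REACTION_UL.values())

# Final primer concentration: 10 µM stock, 0.5 µl in the total reaction volume
primer_final_um = 10.0 * PER_REACTION_UL["primer 27F (10 µM)"] / total_volume_ul

print(f"Total: {total_volume_ul} µl, primer: {primer_final_um} µM each")
print(master_mix(8))  # e.g. 6 samples plus positive and negative controls
```

The components add up to the stated 25 µl, and each primer is present at 0.2 µM in the final reaction.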

Construction of a metagenomic library

To evaluate the suitability of the extracted HMW DNA for metagenomic approaches, a metagenomic library was constructed for both compost samples. An aliquot of the metagenomic DNA isolated from the compost sample was end-repaired to generate blunt-ended, 5′-phosphorylated DNA, according to the manufacturer’s instructions of the CopyControl™ fosmid library kit (Epicentre Biotechnology, Madison, USA). The reaction mixture was scaled up to a final volume of 20 µl. Since about 90% of the end-repaired metagenomic DNA was approximately 40 kb in size and a relatively tight band was observed (data not shown), it was directly used for the construction of the fosmid library. This library was constructed using the pCC1FOS™ vector with E. coli strain EPI300-T1R as a host, according to the manufacturer’s instructions of the CopyControl™ fosmid library kit. One fosmid clone was randomly picked, and the fosmid DNA was isolated using the FosmidMAX™ DNA Purification Kit (Epicentre Biotechnology, Madison, USA) according to the manufacturer’s instructions. The fosmid DNA insert (around 10 µg) was verified by restriction digestion using the Fast Digest enzymes NotI (Thermo Fisher Scientific, Waltham, USA) and HindIII (Thermo Fisher Scientific, Waltham, USA), according to the manufacturer’s instructions. The restriction pattern of the fosmid was evaluated by 0.5% agarose (Grisp, Porto, Portugal) gel electrophoresis in 1× TAE, at 25 V, 4 °C for 24 h. λ DNA-Mono Cut Mix (NEB, Ipswich, USA) was used as a molecular marker. The gel was stained with 1 mg/ml thiazole orange and visualized using the ChemiDoc™ XRS + (Bio-Rad, Hercules, USA).

DNA sequencing and taxonomic annotation

Metagenome sequencing, de novo assembly, open reading frame (ORF) prediction and taxonomic annotation were performed by Novogene (Cambridge, UK). For the construction of the sequencing library, the NEBNext® Ultra™ DNA Library Prep Kit was used following the manufacturer’s recommendations (New England Biolabs, Ipswich, USA) and the shotgun sequencing was performed using the Illumina NovaSeq6000 Platform. High-quality short reads of the DNA sample were initially assembled using an optimized SOAPdenovo protocol (Li et al. 2010) and the prediction of the ORFs was performed using MetaGeneMark (Gene Probe, Inc, San Diego, USA). The redundant ORFs were removed by CD-HIT (Li and Godzik 2006). The taxonomic annotation was performed by BLASTX using the DIAMOND program (Buchfink et al. 2015) against the NCBI protein non-redundant (NR) database.

Results

Compost characterization

Two lignocellulose-rich compost samples with high humic acid content were used to optimize an efficient methodology for the isolation of HMW DNA suitable for metagenomic approaches. The results obtained for the physicochemical characterization of the compost samples are described in Table 1. The compost sample from Terra Fértil was collected at the thermophilic phase of the process at a temperature around 62.7 °C, presenting a pH of 6.96 and a moisture content of 69%. For each gramme of compost, approximately 10.6 mg of humic acids was extracted. The compost sample from Lipor was collected at the mesophilic phase of the process (20–40 °C), presenting a pH of 8.47 and a moisture content of 87.4%. For each gramme of compost, approximately 10.9 mg of humic acids was extracted.

Table 1 Physicochemical properties of the compost samples used in the DNA extraction procedure

Optimization of the DNA extraction method

The experimental procedure herein presented results from the combination of different strategies of DNA extraction previously reported (Zhou et al. 1996; Bergmann et al. 2014; Devi et al. 2015; Verma et al. 2017) with key modifications and improvements. Several experimental conditions (temperature, incubation and agitation time, buffer composition and addition of reagents) were previously tested (data not shown) to select the best combination towards a purer and HMW DNA while reducing the time and costs associated with the protocol. As an example, in Table 2, three preliminary tested protocols are presented comparing the main experimental conditions and results obtained. Additionally, the visual aspect of the DNA solutions after extraction with these preliminary protocols is shown in Fig. 1.

Table 2 Comparison of the main experimental conditions and results obtained for some preliminary designed protocols for DNA extraction from compost samples and the optimized protocol

Fig. 1 Visual aspect of the DNA solutions after extraction with the protocols A, B, C and optimized method as described in Table 2

The optimized method successfully uses mechanical and chemical/enzymatic lysis to access the HMW DNA, and PAC to effectively remove contaminants, namely the humic acids (Table 2). A schematic representation of the improved methodology for metagenomic DNA isolation from compost samples is shown in Fig. 2. As illustrated in the figure, the optimized methodology comprises three main steps, namely cell lysis and humic acid removal, DNA recovery and DNA purification. The most relevant improvement in this methodology resulted from the optimization of the cell lysis (procedure and buffer) and the addition of an effective agent to remove the undesired contaminants. In the second step, after the humic materials and other contaminants had been eliminated by centrifugation, the metagenomic DNA was retrieved. In the last step, the crude DNA was purified. The visual aspect of the DNA solution obtained after extraction with the optimized method is presented in Fig. 1.

Fig. 2 Overview of the improved DNA extraction method comprising 3 main steps. PAC, powdered activated charcoal; SDS, sodium dodecyl sulphate; C:I, chloroform:isopropanol; PEG, polyethylene glycol (MW-8000); TE, Tris–EDTA buffer

The efficiency of the isolation procedure and quality of the DNA

Total DNA was successfully extracted from both compost samples using the optimized method described above. With 5 min of mechanical lysis, an average yield of 10.5 ± 0.22 and 23.9 ± 0.82 μg of DNA per g of compost was obtained for Terra Fértil and Lipor, respectively. Furthermore, the 260/280 absorbance ratio was 1.80 ± 0.02 for Terra Fértil and 1.91 ± 0.01 for Lipor. Regarding the 260/230 absorbance ratio, 1.6 ± 0.03 and 2.0 ± 0.01 were obtained for Terra Fértil and Lipor, respectively. All these parameters indicate DNA with potential for further use in metagenomic studies (Table 3). However, the absorbance ratios were clearly affected by the addition of commercial humic acids (10 mg) to the Terra Fértil sample (Table 3).

Table 3 Yield and purity of the isolated metagenomic DNA

The size and fragmentation of the isolated metagenomic DNA were assessed by agarose gel electrophoresis (Fig. 3). Little DNA fragmentation was observed for both samples from Terra Fértil (lanes 1 and 2). However, the DNA extracted from the Lipor sample with 5 min of vortex agitation for mechanical lysis was highly fragmented (lane 5). When DNA was extracted from the Lipor sample without mechanical lysis (lane 3), only 2.5 ± 0.03 μg g−1 of compost was recovered (Table 3). Nevertheless, when the mechanical lysis time was reduced from 5 to 2.5 min, a yield similar to that of Terra Fértil was obtained (13.8 ± 0.04 μg g−1 of compost) and DNA fragmentation was visibly reduced (lane 4). With the addition of 10 mg of humic acids to the Terra Fértil sample, the amount of extracted DNA was reduced by about half (5.6 ± 0.18 μg g−1 of compost). Additionally, for all the tested conditions, the humic acids were almost completely removed (> 99.9%) using this optimized methodology (Table 3).
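The relative effect of each condition on the yield can be made explicit with a short calculation. The Python sketch below uses only the mean yields quoted in the text (standard deviations are ignored, which is a simplification) and a hypothetical helper to express each change as a percentage of the reference condition.

```python
def percent_change(reference: float, value: float) -> float:
    """Relative change of `value` with respect to `reference`, in percent."""
    return 100.0 * (value - reference) / reference


# Mean yields (µg DNA per g compost) as quoted in the text
yields = {
    "Terra Fértil, 5 min": 10.5,
    "Terra Fértil + humic acids, 5 min": 5.6,
    "Lipor, no mechanical lysis": 2.5,
    "Lipor, 2.5 min": 13.8,
    "Lipor, 5 min": 23.9,
}

humic_effect = percent_change(yields["Terra Fértil, 5 min"],
                              yields["Terra Fértil + humic acids, 5 min"])
shorter_lysis = percent_change(yields["Lipor, 5 min"], yields["Lipor, 2.5 min"])
no_lysis = percent_change(yields["Lipor, 5 min"],
                          yields["Lipor, no mechanical lysis"])

print(f"Added humic acids: {humic_effect:.1f}%")
print(f"2.5 min vs 5 min vortex: {shorter_lysis:.1f}%")
print(f"No mechanical lysis vs 5 min vortex: {no_lysis:.1f}%")
```

With these values, adding humic acids lowers the Terra Fértil yield by about 47%, and halving the vortex time lowers the Lipor yield by about 42%.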

Fig. 3 Molecular weight characterization of the extracted DNA by gel electrophoresis. DNA was electrophoresed on 0.5% agarose gel, 4 °C for 24 h. Lane M—GeneRuler High Range DNA Ladder; lane 1—Terra Fértil sample with 5-min agitation in the vortex for mechanical lysis; lane 2—Terra Fértil sample + 10 mg of humic acids with 5-min agitation in the vortex for mechanical lysis; lane 3—Lipor sample briefly mixed in the vortex (no mechanical lysis); lane 4—Lipor sample with 2.5-min agitation in the vortex for mechanical lysis; lane 5—Lipor sample with 5-min agitation in the vortex for mechanical lysis

Validation of metagenomic DNA in downstream applications

The purity of the isolated metagenomic DNA and its suitability for downstream applications were validated by amplification of the conserved 16S rRNA gene and through the construction of a metagenomic library. For all the replicates studied, a product of 1500 bp was successfully amplified and no PCR product was observed in the negative control (Fig. 4). The metagenomic library was successfully constructed with approximately 6.5 × 10−3 colony-forming unit (cfu) ml−1 for both the Terra Fértil and Lipor samples. One randomly selected clone was purified and verified. The fosmid was subjected to restriction digestion using the enzyme NotI, which cuts the pCC1FOS™ vector twice, releasing the insert, and the enzyme HindIII, which cuts the pCC1FOS™ vector once. The restriction pattern obtained is shown in Fig. S1 (supplementary material). The insert size was about 40 kb, i.e. the ideal size for the construction of fosmid libraries.

Fig. 4 PCR efficacy of the isolated metagenomic DNA by amplification of the 16S rRNA gene. Lane M—NZYDNA Ladder III. Lane 1—negative control (no template); lane 2—Terra Fértil sample with 5-min agitation in the vortex for mechanical lysis; lane 3—Terra Fértil sample + 10 mg of humic acids with 5-min agitation in the vortex for mechanical lysis; lane 4—Lipor sample mixed briefly in the vortex (no mechanical lysis); lane 5—Lipor sample with 2.5-min agitation in the vortex for mechanical lysis; lane 6—Lipor sample with 5-min agitation in the vortex for mechanical lysis

Taxonomic annotation

The results of the taxonomic annotation at the kingdom level indicated the dominance of the bacterial community in both samples. At the phylum level within the bacterial community, the compost sample from Lipor presented the following composition: Proteobacteria > Bacteroidetes > Actinobacteria > Firmicutes (Fig. 5). The compost sample from Terra Fértil presented a similar composition, but an additional phylum was identified, namely the Balneolaeota (Fig. 5). For both samples, the genes annotated to other phyla with low abundance (< 1%) and those that were not assigned in the NR database were included in the group Others.
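The grouping rule described above (phyla below 1% and unassigned genes pooled into Others) can be sketched in Python as follows. The function name, the use of `None` for unassigned genes and the gene counts are hypothetical; the counts are illustrative and do not reproduce the composition of either sample.

```python
from collections import Counter


def phylum_profile(gene_phyla, threshold=0.01, other_label="Others"):
    """Relative abundance per phylum; rare and unassigned entries are pooled."""
    counts = Counter(p if p is not None else other_label for p in gene_phyla)
    total = sum(counts.values())
    profile, pooled = {}, 0
    for phylum, n in counts.items():
        if phylum == other_label or n / total < threshold:
            pooled += n  # low-abundance phylum or unassigned gene
        else:
            profile[phylum] = n / total
    if pooled:
        profile[other_label] = pooled / total
    # Sort from most to least abundant
    return dict(sorted(profile.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical per-gene phylum assignments (None = not assigned in the database)
genes = (["Proteobacteria"] * 500 + ["Bacteroidetes"] * 300 +
         ["Actinobacteria"] * 150 + ["Firmicutes"] * 40 +
         ["Chloroflexi"] * 5 + [None] * 5)

profile = phylum_profile(genes)
for phylum, fraction in profile.items():
    print(f"{phylum}: {fraction:.1%}")
```

In this example, the phylum represented by 0.5% of the genes is merged with the unassigned genes, so Others accounts for 1% of the total.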

Fig. 5 Microbial composition (at the phylum level) obtained for the Terra Fértil and Lipor samples and also for similar samples containing (A) chipped wood (Montella et al. 2017), (B) sugarcane bagasse (Soares et al. 2018), (C) corn stover (Zhu et al. 2016), (D) apple pomace (Zhou et al. 2017) or (E) wood detritus (Oh et al. 2017)

Discussion

DNA isolation represents the basic and probably the most important step in molecular biology. The technology for DNA isolation from highly contaminated composting or soil-like samples is still a limiting factor in meta-omic analysis. For the effective construction of metagenomic libraries, it is important to ensure that a sufficient amount of highly pure, HMW DNA is isolated. Extracting DNA from soil-like samples requires a prior understanding of the remarkable complexity of the original samples and the identification of the multiple factors that may affect the performance of the extraction method (Miller et al. 1999). Commercially available DNA extraction kits have significant limitations in recovering high amounts of HMW DNA since they have not been designed for this purpose. Additionally, the commercial kits are usually expensive and the UltraClean™ Soil DNA Isolation Kit, widely considered one of the most effective for metagenomic DNA extraction from soil-like samples (Sharma et al. 2014), was discontinued. In general, the direct lysis method is more widely used than the indirect methods based on preliminary cell extraction. These indirect methods specifically target prokaryotic DNA, almost excluding eukaryotic DNA (Gabor et al. 2003); thus, the resulting DNA is not totally representative of the global environment. The major disadvantage of the direct lysis method is the potential co-extraction of humic acids, which can significantly compromise the success of soil metagenomic projects (Rajendhran and Gunasekaran 2008).

Environments where lignocellulosic biomass is naturally degraded, such as composting units, represent an abundant source of novel enzymes and interesting samples for functional metagenomic studies (Gudiña et al. 2020). In this work, two composting samples rich in lignocellulosic material and presenting a high content of humic acids were used to develop and optimize an isolation procedure for metagenomic DNA. The samples were obtained from two Portuguese compost units and presented around 10 mg of humic acids per g of compost, as shown in Table 1. Sar and co-workers (2018) studied the DNA extraction from other lignocellulosic samples, using indirect methods, and found a similar humic acid content in salt lake wood (9.7 mg g−1). However, for other samples such as forest soil or rice straw compost, a lower humic acid content (5.4 and 6.5 mg g−1, respectively) was reported (Sar et al. 2018). Additionally, higher levels of humic acids were found in forest arenosols (27.32 mg g−1) and meadow histosol (34.66 mg g−1) (Wnuk et al. 2020). Variations in the humic acid content are strongly dependent on the origin and composition of the sample. Since the compost samples analysed in this work are from different origins, they also present different elemental contents and physicochemical properties (Table 1). The Terra Fértil sample was collected at the thermophilic phase of the process (around 62.7 °C) and a C:N ratio of 11.5 was observed. This C:N ratio is considered typical for this phase of the decomposition due to some losses of carbon, mainly as carbon dioxide, and the increase of N content per unit of material (Goyal et al. 2005). The thermophilic phase was selected since it is the phase in which more robust enzymes potentially involved in the degradation of lignocellulose, such as cellulases, hemicellulases or xylanases, are expected to be found (Berini et al. 2017; Wang et al. 2019). The Lipor sample was collected at the mesophilic phase of the process and, as expected, a higher percentage of C was obtained since the composting process was just starting (Bernal et al. 1998). Nevertheless, the visible heterogeneity and variable particle size of the Lipor sample resulted in higher standard deviations associated with its elemental composition. These two compost samples were selected to cover the main steps (mesophilic and thermophilic phases) of the composting process.

Some conventional methodologies described in the literature (Zhou et al. 1996; Yeates et al. 1998; Pang et al. 2008) were initially followed to extract high-quality DNA from the compost samples. Nevertheless, using those protocols, the extracted DNA presented low quality, quantity and molecular weight (< 20 kb, highly fragmented, data not shown). Thus, some improvements were considered in the development of our methodology for DNA extraction from compost samples. Several modifications in crucial steps, such as the use or omission of mechanical lysis, temperature and incubation time, buffer composition and addition of reagents and enzymes, were established and evaluated based on previous reports from Bergmann et al. (2014), Devi et al. (2015) and Verma et al. (2017). For the majority of the preliminary tested conditions, unsatisfactory results were obtained regarding reproducibility, DNA quality or molecular weight. The high content of organic carbon and humic acids in the compost samples may explain these unsatisfactory results. The three preliminary tested protocols presented in Table 2 allow a comparison of the main conditions and the obtained results. In protocol A, neither a removal agent for humic acids nor mechanical lysis was used, and the lysis buffer was prepared according to Pang et al. (2008). Using these conditions, a brown solution with no genomic DNA was obtained (Fig. 1). The presence of contaminants, such as humic acids, can explain the visual appearance of the solution. To remove these contaminants, cetrimonium bromide (CTAB) and polyvinylpolypyrrolidone (PVPP), already described in the literature as effective removal agents (Zhou et al. 1996; Rajendhran and Gunasekaran 2008), were added to the lysis buffer used in protocol B, but again, a brown solution (Fig. 1) was obtained with a very low amount of genomic DNA. In protocol C, Na2HPO4 (used to keep DNA integrity) and CaCl2 (flocculation agent) were added to the lysis buffer (Rajendhran and Gunasekaran 2008; Verma et al. 2017). With this protocol, great improvements were achieved and a highly clear solution was observed at the end of the DNA extraction procedure (Fig. 1). Nevertheless, the DNA was extremely fragmented. In addition, this procedure (protocol C) was extremely time-consuming (approximately 3 days, mainly due to the slow resuspension of DNA performed at 4 °C to avoid fragmentation) and exhibited poor reproducibility. All these previous attempts allowed us to acquire the necessary expertise to design the current optimized methodology (Table 2), which resulted in an unusual but effective sequence of steps. In this case, a highly clear solution of metagenomic DNA (Fig. 1) of > 40 kb was obtained (Fig. 3).

The methodology herein proposed for DNA extraction is a three-step adjusted procedure (Fig. 2). The crucial approach of our methodology relied on the optimization of both the cell lysis method and buffer, and the addition of PAC, which effectively removed more than 99% of the humic acids (Table 3). In the first step, which corresponds to cell lysis, mechanical and chemical/enzymatic techniques were carefully combined using an optimized lysis buffer. This buffer resulted from the suitable combination of chemicals and enzymatic reagents, namely Na2HPO4, CaCl2, proteinase K, lysozyme and RNase A, in addition to those compounds traditionally used in lysis buffers. The second step comprises the DNA recovery by precipitation after the elimination through centrifugation of the humic acids and other contaminants which can adsorb on PAC and/or flocculate with CaCl2. The use of PEG to precipitate the DNA also helps to increase its purity since it does not co-precipitate humic acids with the DNA as occurs with isopropanol or ethanol precipitation (LaMontagne et al. 2002). The third and final step corresponds to the purification of the crude DNA. This latter step quantitatively concentrates the DNA and also helps to eliminate additional contaminants that remain in the solution while DNA is selectively precipitated.

PAC is not the most frequently used removal agent for soluble humic components and other impurities in DNA extraction. However, it has already proved effective in some previous works (Baker et al. 1992; Sharma et al. 2014; Devi et al. 2015), possibly due to its unique adsorption capacity. Other agents more often used to remove humic acids, like CTAB and PVPP, were also tested in this work. Although DNA with high purity was also extracted with these agents, much lower amounts of DNA were obtained (not enough for further metagenomic studies), meaning that the DNA was probably lost during the extraction process.

The yield is an important factor in a DNA extraction process, since DNA isolation from soil-like samples is often incomplete, suggesting that some DNA is inaccessible and therefore cannot be accounted for in subsequent analysis. Recently, Sar et al. (2018) published an optimized protocol for soils contaminated with humic acids based on indirect methods of DNA extraction. However, with indirect methods only bacteria were selected, thus excluding the remaining microbial diversity present in the soil. Consequently, the authors reported much lower DNA yields and purity (0.33 ± 0.04 µg g−1 with A260/280 1.45 ± 0.02 for salt lake wood; 0.55 ± 0.05 µg g−1 with A260/280 1.46 ± 0.04 for forest soil; and 0.81 ± 0.04 µg g−1 with A260/280 1.74 ± 0.02 for rice straw compost) than those obtained with our optimized protocol (Table 3). Fragment size is also a key aspect, since small fragments (less than 30 kb) are not suitable either for the analysis of complete genomes/metagenomes or for the study of complete or virtually complete metabolic pathways. High DNA yields are generally obtained with severe mechanical treatments, such as sonication and mechanical bead beating, which effectively lyse microbial cells. However, such treatments can significantly shear DNA to sizes of 5–10 kb or less, and are frequently reported as unsuitable for extracting DNA for subsequent metagenomic approaches (Liesack et al. 1991). Nevertheless, mechanical treatment can be used and, as can be seen especially for the Terra Fértil sample, DNA fragmentation remains low (Fig. 3, lanes 1 and 2) when suitable conditions are applied. This result suggests that the mechanical lysis conditions used in our optimized extraction protocol (glass beads and 5 min vortex) did not cause severe DNA shearing in the Terra Fértil sample. However, depending on the origin and composition of the sample, some DNA fragmentation can occur, as observed for the Lipor sample (Fig. 3, lane 5). This compost sample exhibited high heterogeneity, including small branches and pieces of wood that may themselves act as a kind of “beads” during mechanical lysis, thus promoting greater fragmentation of the DNA. Nevertheless, when the agitation time of the mechanical lysis was halved (2.5 min), DNA fragmentation decreased markedly (Fig. 3, lane 4). This evidence confirms the importance of a skilled evaluation of the natural characteristics of the compost samples (e.g. moisture, microbial composition, humic acid concentration or heterogeneity/homogeneity of the sample) in order to strategically modify and adapt crucial steps of the extraction methodology to each type of sample.

To evaluate the purity of the extracted DNA, the ratio of absorbance at 260/280 nm (Table 3) and the absorbance at 340 nm (to indirectly assess the humic acid content) were determined. The results indicated that the extracted DNA presented adequate purity for metagenomic studies. The presence of humic acids greatly impacts the success of the DNA extraction, as observed when commercial humic acids were added to the Terra Fértil sample. This addition affected the absorbance ratios (Table 3), but not the DNA size and quality (Figs. 3 and 4). The final concentration of DNA was also affected when commercial humic acids were added, even though the protocol proved to be efficient in the removal of the humic acids (> 99%). The detrimental effect observed at a higher concentration of humic acids may have resulted from the ability of these compounds to interact with both DNA and enzymes. Therefore, some DNA may have been lost after binding to the humic acids adsorbed on the PAC (Saeki et al. 2011; Wnuk et al. 2020), and/or enzymatic lysis may have been inefficient due to the formation of enzyme-humic acid complexes (Li et al. 2013). However, despite the lower concentration of DNA, a strong and clean band of more than 48 kb was observed in the electrophoresis gel (Fig. 3, lane 2), and the DNA could be used for downstream applications (Fig. 4). This means that the protocol has a wide range of applicability, including samples with different humic acid concentrations.
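The two purity indicators discussed above reduce to simple arithmetic, which can be sketched in Python as follows. This is only an illustrative sketch: it assumes that humic acid content is proportional to the absorbance at 340 nm and that the readings before and after treatment are taken at the same dilution. The function names and the example readings are hypothetical and are not taken from Table 3.

```python
def purity_ratio(a260: float, a280: float) -> float:
    """Return the A260/A280 ratio used as a DNA purity indicator."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def humic_acid_removal_percent(a340_before: float, a340_after: float) -> float:
    """Estimate the percentage of humic acids removed.

    Assumption (not stated in the source): humic acid content is
    proportional to A340 and both readings share the same dilution.
    """
    if a340_before <= 0:
        raise ValueError("A340 before treatment must be positive")
    return 100.0 * (1.0 - a340_after / a340_before)


# Hypothetical absorbance readings, for illustration only.
ratio = purity_ratio(a260=0.90, a280=0.50)
removal = humic_acid_removal_percent(a340_before=2.0, a340_after=0.01)
print(f"A260/A280 = {ratio:.2f}")
print(f"Humic acid removal = {removal:.1f}%")
```

Under these assumptions, a removal value above 99% corresponds to a residual A340 below 1% of the initial reading.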

The purity of the extracted DNA is an important aspect for further applications, since the presence of impurities, especially co-extracted humic acids, strongly inhibits Taq polymerases (Tsai and Olson 1992). Therefore, the purity and quality of the metagenomic DNA from both compost samples were evaluated by PCR amplification of the bacterial 16S rRNA gene using universal primers. A bacterial fragment of 1400 bp was successfully amplified under all the tested conditions (Fig. 4). Additionally, the purity and suitability of the extracted metagenomic DNA were investigated through the construction of fosmid libraries. The libraries were constructed for both the Terra Fértil and Lipor (2.5 min of mechanical lysis) samples, and a fragment of approximately 40 kb was successfully inserted (Supplementary Fig. S1). The preparation of metagenomic libraries with large inserts increases the chances of discovering novel biocatalysts, biosynthetic pathways and genes.

The microbial composition obtained for the isolated DNA is strongly dependent on both the sample origin/composition and the efficiency of the lysis/extraction methodologies. To evaluate whether the optimized protocol provides access to DNA representative of the microbial communities commonly present in soil-like samples, a shotgun sequencing analysis was performed. After taxonomic annotation, the dominant assigned phyla were identified. A comparable microbial composition was found for the Terra Fértil and Lipor samples (Fig. 5). In both cases, the dominant phylum was Proteobacteria, followed by Bacteroidetes. The phyla Actinobacteria and Firmicutes were also present, but at lower percentages. For the Terra Fértil sample, an additional phylum was detected, namely the recently proposed phylum Balneolaeota (Hahnke et al. 2016), identified in extreme environments (Sorokin et al. 2018). Furthermore, the microbial compositions obtained for the Terra Fértil and Lipor samples are comparable with those reported in the literature for similar samples containing chipped wood (Montella et al. 2017), sugarcane bagasse (Soares et al. 2018), corn stover (Zhu et al. 2016), apple pomace (Zhou et al. 2017) or wood detritus (Oh et al. 2017) (Fig. 5). Despite the detected differences in abundance, the dominant assigned phyla are the same, which indicates that the optimized protocol, the classical methodologies (Zhou et al. 2017; Soares et al. 2018) and the commercial kits (Zhu et al. 2016; Montella et al. 2017; Oh et al. 2017) perform equivalently when accessing DNA from the microbial communities.
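The comparison of dominant phyla described above can be illustrated with a short Python sketch that converts per-phylum read counts into relative abundances and ranks them. The read counts and sample labels below are hypothetical placeholders chosen for illustration; they are not the values behind Fig. 5, and the function names are assumptions rather than part of any annotation pipeline used in this work.

```python
from typing import Dict, List


def relative_abundance(read_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-phylum read counts into percentages of assigned reads."""
    total = sum(read_counts.values())
    if total <= 0:
        raise ValueError("at least one assigned read is required")
    return {phylum: 100.0 * n / total for phylum, n in read_counts.items()}


def dominant_phyla(read_counts: Dict[str, int], top_n: int = 2) -> List[str]:
    """Return the top_n phyla ranked by decreasing read count."""
    ranked = sorted(read_counts, key=read_counts.get, reverse=True)
    return ranked[:top_n]


# Hypothetical read counts for two samples (illustration only).
sample_a = {"Proteobacteria": 5000, "Bacteroidetes": 3000,
            "Actinobacteria": 1500, "Firmicutes": 500}
sample_b = {"Proteobacteria": 4000, "Bacteroidetes": 3500,
            "Actinobacteria": 1000, "Firmicutes": 1500}

for name, counts in (("sample A", sample_a), ("sample B", sample_b)):
    shares = relative_abundance(counts)
    top = dominant_phyla(counts)
    print(name, top, {p: round(shares[p], 1) for p in top})

# Two samples share the same dominant phyla if their rankings agree,
# even when the underlying percentages differ.
print(dominant_phyla(sample_a) == dominant_phyla(sample_b))
```

In this sketch, two samples are considered comparable at the phylum level when the ranked lists agree, which mirrors the qualitative comparison made in the text without implying any statistical test.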

Another important feature of our optimized DNA extraction method is the considerable reduction in processing time. The method proved to be highly efficient and fast (only 4 h of bench work) and does not include additional, expensive purification steps. The whole procedure can be performed using standard molecular biology equipment and reagents, thus reducing the associated costs. Furthermore, it was demonstrated that, under suitable conditions of mechanical lysis, DNA fragmentation can be minimized and the recovery yield increased.

In conclusion, the humic acids and phenolic compounds naturally present in soils are hard to remove, making DNA extraction and purification a critical step for the success of several metagenomic approaches. Our optimized methodology was developed to provide a rapid, efficient and cost-effective procedure for total DNA extraction, resulting in DNA of high purity, yield and molecular weight. This novel protocol can be adapted and successfully applied to different compost samples rich in humic acids to obtain DNA that is representative of the microbial composition and suitable for functional metagenomic studies.