1 Introduction

Open-pit coal mining drastically affects terrestrial ecosystem functioning worldwide (Singh and Singh 2006; Maiti 2013). During open-cast coal mining, large amounts of spoil material overlying coal layers are excavated and deposited aboveground. This process adversely affects existing ecosystems either by destroying or burying them under spoil material deposits. Restoration of soil biological activity requires some human intervention for returning disturbed soil to “nearly-natural” ecosystems; however, this is a long and expensive process (Doong and Lei 2003; Vázquez et al. 2013). To implement good soil restoration practices, an initial approach will require comparing complex biotic and abiotic soil interactions between undisturbed and disturbed ecosystems.

Coal mining companies in the dry Colombian Caribbean region produce about 90% of the coal extracted in the country (Agencia Nacional de Mineria 2017; UPME 2016). Díaz et al. (2013) determined that coal mines significantly decreased soil quality, causing physical degradation and low fertility. Plant populations and the agricultural value of these soils are also negatively affected by soil degradation and contamination (Betancur and Escobar 2013; Holguin 2011). For example, soil pollution by heavy metals represents an environmental risk because metals are not biodegradable and bioaccumulate, ultimately deteriorating soil quality (Bhattacharyya et al. 2003; Kahn et al. 2010; Ali et al. 2013). Minimizing environmental damage and reconstructing soils after coal mining are of utmost importance, but an initial biotic and abiotic soil characterization is always required. This study provides a baseline for sustainable soil management after coal mining in the dry Colombian Caribbean region. We used physical, chemical, and microbial characterizations of two contrasting soil ecosystems: native dry vegetation and spoil material deposits under recovery.

Soil microbial communities are essential, as they are responsible for physiological and metabolic processes of great importance for soil quality (Bundy et al. 2009; Lauber et al. 2009; Ranjard et al. 2003; Rutgers et al. 2016), while supporting key ecosystem functions such as nutrient cycling, water infiltration, decomposition, soil respiration, and plant growth, among others (Guerra et al. 2020). When soil degradation happens, it is often irreversible, causing a substantial loss of plant and microbial biodiversity and of their ecosystem functions. After mining, surface soil layers are rich in pyrite (FeS2), which later oxidizes and produces sulfuric acid when exposed to oxygen and water. This process results in high levels of soil acidification (Fanning and Fanning 1989), loss of soil structure, a decrease in soil organic matter (SOM) content, and a low rate of water infiltration, ultimately leading to increased nutrient leaching and soil erosion (Guebert and Gardner 2001; Kampf et al. 1997). Also, the level of stability, functional compatibility, and/or redundancy of microbial communities will affect the availability of soil nutrients, SOM, and soil physical stability (Griffiths and Philippot 2013; McGuire and Treseder 2010; Shade et al. 2012; Tardy et al. 2014). SOM is critical for microbial communities; at the same time, these communities help increase SOM content, which is usually deficient in post-mined soils. Assessing the ecosystem functions of soil microorganisms at the site of mining helps to understand soil rehabilitation processes and informs the management of these areas (Gomez et al. 2004; Mishra and Nautiyal 2009; Peng et al. 2015).

Although at a global scale the effects of post-mining rehabilitation treatments in soil biodiversity are not well understood, regional- and local-scale studies show some patterns. Andrés and Mateos (2006) found that even after 12 years, four different post-mining treatments did not lead to the recovery of soil mesofauna biodiversity. In contrast, Madej et al. (2011) found that a soil quality index, based on the biodiversity of soil microarthropods, significantly increased on a post-mining chrono-sequence after abandonment. Kneller et al. (2018) found organic amendments either of plant material or topsoil increased important ecosystem functions of soil microorganisms and their microbial activity. For example, N and C contents increased resulting in plant growth and survivorship after soil was treated with these amendments. Co-introductions of native soil biota and vegetation seem to have the best results when restoring diverse native vegetation on post-mining landscapes (Vahter et al. 2020). However, responses also seem dependent upon the restoration strategy used, time that elapsed after mining, and intensity and type of mining, among other local factors. More baseline studies on the effects of different post-mining restoration strategies on soil biodiversity and ecosystem functions are needed.

Here we ask: are there any differences between the microbiomes in a coal mine and adjacent natural areas? And can we evaluate the effects of post-mining treatments on soil biodiversity and ecosystem functionality? The aim of this study was to identify changes in soil microbial community composition, and in taxonomic and functional diversity, resulting from land rehabilitation after coal mining, and to compare them to predisturbance levels from a reference site covered by dry forest vegetation. We sampled two study sites 7 years after rehabilitation efforts, and a reference dry forest site associated with a coal mine located in La Guajira, northern Colombia. We predict that the dry forest site will have a higher diversity of fungi and bacteria related to soil formation, organic matter decomposition, and symbiosis, while the rehabilitation site will have stress-tolerant microbiomes. Environmental rehabilitation comprises topographic reformulation after the removal of the coal and revegetation with mesquite trees.

2 Materials and Methods

2.1 Study Site

The study site lies within an open-pit coal mine (300 ha) located at Palmito, Barrancas municipality in La Guajira Department, northeastern Colombia (11°02′N, 72°40′W). The average altitude of spoil heaps is 240 m.a.s.l., and mean annual precipitation varies between 500 and 1000 mm, with bimodal seasons of 50–100 rainy days per year. The mean annual temperature is 26 °C. The coal mine is divided into seven main zones: administrative (1), active extracting (2), backfill area (3), southern spoil dump (active) (4), northern spoil dump top (inactive, revegetated) (5), northern spoil dump terraces with a slope (inactive, revegetated) (6), and a native dry forest (7) next to the mine (Fig. 1).

Fig. 1 Main study areas delimited in red. 1, Administrative zone; 2, active extraction zone; 3, backfill area; 4, southern spoil dump (active); 5, BiT, northern spoil dump tops; 6, Bi, northern spoil dump terraces (Bi represents a different study treatment to BiT in our study because of the slopes); 7, B, native forest. Red lines indicate inactive dumps undergoing recovery and native forest

This study was performed on three of these seven sites: the top northern inactive spoil dump (zone 5, hereafter BiT, initials refer to Spanish translation “Botadero inactivo Top”), its revegetated terraces (zone 6, hereafter Bi, initials refer to Spanish translation “Botadero inactivo”), and a control site of native dry forest (zone 7, hereafter B, initials refer to Spanish translation “Bosque”) (Fig. 1).

This region is part of the Perijá mountain range, an area with high anthropogenic intervention due to coal mining. A total of 16 tree species have been identified around coal mines and in native dry forests, such as Spondias mombin, Ficus glabrata, Casearia corymbosa, and Guazuma ulmifolia (Vargas 2011). The Bi dump was closed in 2010 and its six terraces were revegetated with mesquite trees (Prosopis juliflora), but some other vegetation has been naturally colonizing the area ever since.

2.2 Experimental Design

A total of 18 soil samples were collected in February 2018 from three sites: northern spoil dump (BiT), terraces of the northern spoil dump (Bi), and a control site of native dry forest (B) (Fig. 1). BiT and Bi were considered different treatments because Bi, unlike BiT, had a 45–65° slope, and the resulting water erosion could affect the physicochemical and microbiome analyses. Six composite samples were collected at each site, choosing six random points and trying to cover as much area as possible. Each of the 18 individual soil samples (3 sites × 6 samples each) included four pooled soil subsamples. These subsamples were collected in a zigzag random walk, always removing top organic debris and sampling up to 10 cm soil depth; this depth concentrates most of the roots of the revegetated plant species. The four soil subsamples were combined, sieved through a 2-mm mesh, and thoroughly homogenized. Sterile 50-mL conical centrifuge tubes were filled with the homogenized soil and immediately stored on ice. The remaining soil from each sample was used for soil physicochemical analyses. Later in the day, samples were flash-frozen in liquid nitrogen for transportation to the laboratory. Once in the laboratory, tubes were stored at −20 °C until DNA extraction and sequencing were performed.

2.3 Soil Physicochemical Analyses

For physicochemical analysis, three replicates per treatment were analyzed for a total of nine samples. Physicochemical characterization of soil samples was carried out in accordance with the Colombian technical standards (NTC 5526:2007, NTC 5596:2008, NTC 5349:2016, NTC 5350:2020; Instituto Colombiano de Normas Técnicas y Certificación – ICONTEC 2007, 2008, 2016, 2020) at the Laboratorio de Química de Suelos, Aguas y Plantas, of Agrosavia, Tibaitata, Colombia. The following physicochemical parameters were determined on each soil sample: texture (Bouyoucos 1936), pH in a 1:5 ratio of soil:water using a Corning pH-meter 340, electrical conductivity with a conductivimeter (720 WTW), soil organic carbon (Walkley–Black chromic acid wet oxidation method; Nelson and Summers 1982), available P (Bray II method: extraction with HCl + NH4F and ascorbic acid, measured in a spectrophotometer at 800 nm), Fe, Cu, Zn, Al, and Mn (quantification with atomic absorption spectroscopy—AAS 240/280, Agilent, USA); S and B (extraction with calcium phosphate monobasic, quantified in a spectrophotometer UV–vis) and Na, K, Ca, and Mg (extraction with 1 M ammonium acetate at pH 7, quantification with AAS). Cation exchange capacity was calculated as the sum of cations (Al, Na, K, Ca, Mg). For metal concentration in the soil, only two replicates per treatment could be analyzed. Cd determination was carried out by the acid digestion method (Díaz et al. 2013) and flame atomic absorption spectroscopy (FAAS, AA-7000, SHIMADZU, Japan) and Pb by the closed digestion method (with nitric:hydrochloric acid; Dolgopolova et al. 2004) and inductively coupled plasma emission spectrometry (ICP-OES, iCAP 6500, Thermo Scientific, USA).

Due to sampling and analysis limitations, soil chemical analyses were done only for the samples shown in Tables S1 and S2. Thus, these analyses should be taken as a general characterization of the sites and cannot be used for further analyses (e.g., to relate them with microbial community parameters).

2.4 DNA Extraction, PCR Amplification, and Sequencing

DNA was extracted for each sample from 0.25 g of soil using the Power Soil DNA isolation kit (MoBio Laboratories, Inc., Carlsbad, CA), following the manufacturer’s instructions. All genomic DNA concentrations were quantified in a Qubit 2.0 Fluorometer (Invitrogen, NY, USA). Ribosomal RNA (rRNA) marker regions were amplified from the total DNA in each sample using PCR with barcoded primers unique to each sample. The 16S rRNA gene V4 region was amplified from all 18 samples, while the ITS1 sequence was amplified from 9 out of the 18 samples (three samples per site). To obtain PCR products, jagged ends of DNA fragments were converted into blunt ends using T4 DNA polymerase, Klenow Fragment, and T4 Polynucleotide Kinase. Then, an “A” base was added to each 3′ end to make it easier to add adapters. Fragments that were too short were removed with AMPure beads (Thermo Fisher Scientific Inc. Madrid, Spain). Fused primers with dual indexes and adapters were used for PCR, and fragments that were too short were also removed with AMPure beads. In both cases, qualified libraries were used for sequencing. Fragments were sequenced with Illumina HiSeq 2500 technology, producing 250-bp paired-end reads. Amplification and sequence analysis were completed by the BGI Life Tech company, Hong Kong.

2.5 Bioinformatic Analyses

Raw data were preprocessed to obtain clean reads. All sequenced reads with an average Phred quality score ≤ 20 over a 25 bp sliding window were truncated. Trimmed reads with less than 75% of their original length, reads contaminated by adapters, reads with ambiguous bases (N base), and reads with 10 consecutive identical bases were removed.
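The read-cleaning rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used by the sequencing provider: the function names are hypothetical, the read is assumed to be cut at the start of the first low-quality window, and the adapter-contamination check is omitted.

```python
def truncate_by_window(quals, window=25, min_mean_q=20):
    """Return how many leading bases are kept.

    Assumption: the read is cut at the start of the first window whose
    mean Phred score is <= min_mean_q.
    """
    if len(quals) < window:
        return len(quals)
    window_sum = sum(quals[:window])
    for start in range(len(quals) - window + 1):
        if start > 0:
            # Slide the window one base to the right.
            window_sum += quals[start + window - 1] - quals[start - 1]
        if window_sum / window <= min_mean_q:
            return start
    return len(quals)


def keep_read(seq, quals, window=25, min_mean_q=20, min_fraction=0.75, max_run=10):
    """Return the trimmed sequence, or None if the read is discarded."""
    kept = truncate_by_window(quals, window, min_mean_q)
    trimmed = seq[:kept]
    if kept < min_fraction * len(seq):
        return None  # less than 75% of the original length survives
    if "N" in trimmed:
        return None  # ambiguous base
    run = 1
    for previous, current in zip(trimmed, trimmed[1:]):
        run = run + 1 if current == previous else 1
        if run >= max_run:
            return None  # 10 consecutive identical bases
    return trimmed
```

For instance, a 150-bp read whose last 50 bases have a Phred score of 10 is cut at position 88 by this sketch and then discarded, because fewer than 75% of its bases remain.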

Clean sequences were processed using DADA2 (Callahan et al. 2016), which is designed to resolve exact biological sequences from Illumina sequence data without sequence clustering. All pipelines used in this study are available at: https://github.com/TheAriasLab. Clean paired sequences were processed using the following DADA2 parameters: filterAndTrim based on quality plots; the forward and reverse sequences were trimmed to lengths between 200 and 240 bp. Filtering of sequences was set to maxN = 0, maxEE = 2 (for both reads), and truncQ = 11, where maxN was the maximum number of “N” bases, and maxEE corresponded to the maximum expected error calculated from the quality scores (EE = sum(10^(-Q/10))). The truncQ parameter truncated reads at the first instance of a quality score ≤ 11. Other parameters were set to default. Error rates were estimated by learnErrors, where the nbases parameter was set to 10^8.
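To make the maxEE and truncQ criteria concrete, the following Python sketch computes the expected number of errors of a read from its Phred scores and applies the thresholds quoted above. It is not DADA2 code: the function names are hypothetical, the fixed-length truncation step is left out, and the order of operations (truncate first, then test maxN and maxEE) is an assumption.

```python
def expected_errors(quals):
    """EE = sum(10^(-Q/10)) over the Phred scores Q of a read."""
    return sum(10 ** (-q / 10) for q in quals)


def truncate_at_quality(quals, trunc_q=11):
    """Number of bases kept when cutting at the first score <= trunc_q."""
    for position, q in enumerate(quals):
        if q <= trunc_q:
            return position
    return len(quals)


def passes_filter(seq, quals, max_n=0, max_ee=2.0, trunc_q=11):
    """True if the truncated read satisfies the maxN and maxEE thresholds."""
    kept = truncate_at_quality(quals, trunc_q)
    if kept == 0:
        return False  # nothing left after truncation
    seq, quals = seq[:kept], quals[:kept]
    return seq.count("N") <= max_n and expected_errors(quals) <= max_ee
```

As a worked example, a read of 100 bases that all have Q = 20 has EE = 100 × 0.01 = 1 and passes maxEE = 2, whereas 40 bases at Q = 12 give EE ≈ 2.52 and fail.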

Sequences were de-replicated using the DADA2 command derepFastq with default parameters. Exact sequence variants were resolved using the DADA2 command dada. Next, chimeras were removed with the DADA2 (Callahan et al. 2016) command removeBimeraDenovo. By applying a consensus method, taxonomy was assigned against version 132 of the Silva database for bacteria released in July 2017 (Glöckner et al. 2017), and against the UNITE database for fungi released in February 2019 (Nilsson et al. 2019) using the command assignTaxonomy from DADA2. Taxa abundances were transformed into percentages.

2.6 Alpha and Beta Diversity Analyses

Species accumulation curves (Ugland et al. 2003) were produced using Vegan (Oksanen et al. 2010) in R Studio (RStudio Team 2021). Summaries of taxonomic composition were produced with Phyloseq v 1.16.2 (McMurdie and Holmes 2012) and plotted in R Studio. Alpha diversity was estimated for each sample with the Phyloseq estimate_richness command. Beta diversity was evaluated using Phyloseq v 1.16.2 as unweighted UniFrac distance (Lozupone et al. 2007), using plot_ordination commands. Weighted UniFrac distance measures were also calculated; these weight phylogenetic lineages that are not shared between pairs of samples by their abundance. Additionally, Adonis in Vegan was used to test whether the structure of microbial communities was influenced by environmental variables.
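For readers unfamiliar with the alpha diversity indexes used here, a minimal Python sketch of their textbook definitions is given below. It is not the Phyloseq or Vegan implementation; the function names are hypothetical, Simpson is taken as 1 minus the sum of squared proportions, and Chao1 is written in its bias-corrected form, both of which are assumptions about the variants used.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson index as 1 - sum(p^2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of singletons and doubletons."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))
```

For a perfectly even community of four taxa, for example counts of 10, 10, 10, and 10, this gives H = ln 4, a Simpson index of 0.75, and a Chao1 estimate equal to the observed richness of 4.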

Normality for each biodiversity index (Chao1, Shannon, and Simpson; calculated using Vegan) was assessed using the Shapiro–Wilk test with the R base command shapiro.test(x), while homoscedasticity for each index was examined with the Levene test using the R car package (Fox and Weisberg 2019) v 3.0–11 (using the command leveneTest).

2.7 GeoChip 5.0S Hybridization and Data Preprocessing

Two samples, B (native dry forest) and Bi (northern spoil dump terraces), were analyzed using GeoChip 5.0S (Tu et al. 2014). Each sample was the product of combining all six replicates for each treatment. GeoChip 5.0S gene micro-arrays contain 167,044 probes targeting functional genes assigned to several gene categories: carbon (degradation, fixation, methane), nitrogen, sulfur, and phosphorus cycling, energy metabolism, metal homeostasis, organic remediation, secondary metabolism (e.g., antibiotic metabolism, pigments), stress responses, viruses (both bacteriophages and eukaryotic viruses), virulence, and others (phylogenetic genes and CRISPR system). All hybridizations were performed at 42 °C with 40% formamide for 16 h on a MAUI hybridization station (BioMicro, Salt Lake City, UT, USA). After hybridization, the arrays were scanned (NimbleGen MS 200, Madison, WI USA) at 100% laser power. The numbers were normalized by signal intensity, and all showed positive probes detected on each sample. Probes were removed as negative if the SNR < 2 or the signal < 200 or < 1.3 times the background. No further data transformation was performed (He et al. 2010; Lu et al. 2012). Singletons or positive probes detected only in one of the samples were discarded prior to statistical analyses to remove noise from the dataset. All procedures were performed at Glomics Inc. (Norman, OK, USA).
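The probe-calling rules described above can be illustrated with a short Python sketch. The function names and the numeric values in the usage example are hypothetical and are not data from this study; the sketch only encodes the three thresholds and the removal of probes detected in a single sample.

```python
from collections import Counter


def is_positive(signal, background, snr, min_snr=2.0, min_signal=200.0, min_ratio=1.3):
    """A probe is called negative if SNR < 2, signal < 200,
    or signal < 1.3 times the background."""
    return (
        snr >= min_snr
        and signal >= min_signal
        and signal >= min_ratio * background
    )


def shared_positive_probes(samples):
    """Keep probes that are positive in at least two samples.

    `samples` maps a sample name to {probe_id: (signal, background, snr)}.
    """
    detections = Counter()
    for probes in samples.values():
        for probe_id, (signal, background, snr) in probes.items():
            if is_positive(signal, background, snr):
                detections[probe_id] += 1
    return {probe_id for probe_id, n in detections.items() if n >= 2}


# Hypothetical intensities, for illustration only.
example = {
    "B": {"probe_1": (850.0, 120.0, 6.1), "probe_2": (150.0, 100.0, 3.0)},
    "Bi": {"probe_1": (620.0, 110.0, 4.8), "probe_2": (900.0, 100.0, 7.0)},
}
print(shared_positive_probes(example))
```

With only two pooled samples, as in this study, keeping probes detected in at least two samples is the same as keeping the probes that are positive in both.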

2.8 Differential Abundance Analysis

EdgeR version 4.1 (Robinson et al. 2010) was used to find taxa and genes with differential abundance between pairs of locations: either treatment (BiT or Bi) vs. the native dry forest control (B). Only taxa with a logFC > 3 or a logFC < -3 and with FDR ≤ 0.01 were considered significantly differentially abundant.
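The thresholding step can be sketched as follows. This Python example does not reproduce EdgeR, which fits negative binomial models to obtain the log fold changes and p-values; it assumes those values are already available and that the FDR is the Benjamini–Hochberg adjusted p-value. The function names and taxon labels are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest to enforce monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def select_features(results, min_abs_logfc=3.0, max_fdr=0.01):
    """Keep features with |logFC| > 3 and FDR <= 0.01.

    `results` maps a feature name to a (logFC, p-value) pair.
    """
    names = list(results)
    fdr = benjamini_hochberg([results[name][1] for name in names])
    return [
        name
        for name, q in zip(names, fdr)
        if abs(results[name][0]) > min_abs_logfc and q <= max_fdr
    ]
```

Note that a feature must satisfy both conditions: a large fold change with a weak FDR, or a strong FDR with a small fold change, is not retained.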

2.9 Association Analysis

Before the association analyses, ASVs not assigned to a genus and those with fewer than 10 counts in all samples were removed. All ASVs with a prevalence of less than 15% across samples were also removed. Counts were normalized using a centered log-ratio (CLR) transformation. Spearman’s rank correlation was performed using distances, and only significant associations with a coefficient > 0.6 (FDR ≤ 0.01) were considered.
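A minimal Python sketch of the two numerical steps named above, the CLR transformation and Spearman's rank correlation, is shown below. The function names are hypothetical, and the pseudocount of 1 added before taking logarithms is an assumption, since the handling of zero counts is not stated in the text.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log of each count minus the mean log of the sample."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks.

    Raises ZeroDivisionError if either vector is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)
```

In this sketch, clr is applied to the count vector of one sample at a time, and spearman is then applied to the transformed values of one ASV across samples against a second variable of the same length.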

2.10 Metabolic Potential Analysis for Bacterial Microbiomes

Metabolic potential for microbial communities was calculated using the software Picrust v2.0 (Douglas et al. 2020). This analysis shows the abundance of predicted genes in each sample. Genes with fewer than 10 counts across samples, genes present in less than 10% of the samples, and genes with low variance in abundance among samples (less than 10%; based on the interquartile range) were filtered out. All other data were normalized using a centered log-ratio (CLR) normalization. Grouping analyses were also performed using the Bray–Curtis distance. Predicted genes were identified with KEGG release 93.0 (Kanehisa et al. 2016) and grouped according to metabolic function and relative abundance at each site. To search for key genes of interest, a differential abundance analysis between Bi vs B and B vs BiT was completed using EdgeR version 4.1 (Robinson et al. 2010). Correlation analyses were implemented using Spearman’s rank correlation. The top significant genes are reported here: the first 10 are positive and the last 10 are negative.
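The Bray–Curtis dissimilarity used for grouping can be written compactly. The Python sketch below assumes non-negative abundance vectors of equal length (the measure is not defined for negative values) and uses hypothetical function names; it is an illustration of the formula, not the code used in this study.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0


def distance_matrix(profiles):
    """Pairwise dissimilarities for a {sample_name: abundance_vector} mapping."""
    names = list(profiles)
    return {
        (first, second): bray_curtis(profiles[first], profiles[second])
        for first in names
        for second in names
    }
```

The value is 0 for identical profiles and 1 for profiles that share no genes, and the resulting matrix can be passed to any hierarchical clustering routine.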

3 Results

3.1 Soil Physicochemical Analyses

Soil texture classes ranged from loam-clay in BiT3 and Bi3 and loam in Bi1 to loam-silty in B (native dry forest) (Table S1). The amount of Pb in the soil for different treatments ranged between 23.31 and 23.81 mg/kg, and Cd between 1.32 and 3.01 mg/kg, with the lowest amounts of both metals found in the native dry forest (B) (Table S2). Most sites had an alkaline pH (7.26–8.15), high values of S, Zn, and Mg content, and high cation exchange capacity (27.93–48.55 cmol(+)/kg), as well as low Ca content (21.66–42.92 cmol(+)/kg) (Table S1). Soil chemical properties in Bi replicates were similar, while those of BiT were heterogeneous. Electrical conductivity was five times greater in disturbed soil samples BiT1, Bi1, and Bi3 than in native forest samples B3 and B6 (Table S1). For heavy metals, boron had high values in all sampled sites, while zinc had high values only in BiT. Potassium was low in all samples (Table S2).

3.2 Soil Microbial Diversity and Community Structure

3.2.1 Sequencing

Four million paired-end reads were obtained with read lengths around 250 bp for the 16S rRNA V4 region of Bacteria and Archaea (six replicates per treatment), and the ITS1 region of fungi (three replicates per treatment for Bi and B). However, in fungi, one of the replicates in BiT (top dump) could not be sequenced; this treatment had only two replicates to work with. After filtering singletons, the total number of 16S reads ranged between 1,202,013 in sample B1 and 1,770,139 in sample B5, while the number of ITS1 reads ranged between 1,618,254 in Bi6 and 2,158,687 in BiT5. A total of 62,756 16S amplicon sequence variants (ASVs) were found, and taxonomic assignments to kingdom include Archaea 516; Bacteria 61,384; Eukaryote 402; and non-assigned 454. For ITS, a total of 6084 fungal ASVs were identified.

3.2.2 Rarefaction Curves

Species saturation was reached for both fungi and bacteria at a sequencing depth of 500,000 sequences (Fig. S1). Species number at saturation was lower in Bi than in BiT and B for both 16S- and ITS-based curves. For 16S, the species number at saturation was lower in BiT than in B (Fig. S1A).

3.2.3 Taxa Relative Abundance

The fungal and bacterial taxa reported here each represented more than 0.1% of the total reads. For fungi, the most abundant phyla included Ascomycota and Basidiomycota. Dominant fungal genera included Cladosporium, Nothophoma, Mycocentrospora, Ceratobasidium, Fusarium, and Volutella (Fig. 2A). Cladosporium was well represented in all sampled sites, while Volutella and Strelitziana were quite abundant in B1 (Fig. 2A). For bacteria, the most represented phyla included Acidobacteria, Actinobacteria, Bacteroidetes, Chloroflexi, Firmicutes, Gemmatimonadetes, Planctomycetes, Proteobacteria, and Thaumarchaeota. The genus Microvirga was the most abundant, and Actinomadura, Actinoplanes, Azohydromonas, Byssovorax, Candidatus, Couchioplanes, and Cryptosporangium were also well represented (Fig. 2B).

Fig. 2 Relative abundance for genera (relative abundance > 0.1%) expressed as the percentage of recovered sequences in each sampled site. Tropical dry forest (B) sequenced replicates (B1, B3, B5); coal mine dump terraces (Bi) sequenced replicates, for this treatment only two replicates were recovered (Bi5, Bi6); top dump (BiT) sequenced replicates (BiT2, BiT3, BiT5). Panel B shows results for bacteria, and their six replicates in each location (subindexes 1–6)

Analysis of relative abundance for dominant taxa gave a general and qualitative description of the microbial profile in all sampled sites. Next, a differential abundance analysis was performed to test if the qualitative differences observed were statistically significant. The abundance of fungal and bacterial ASVs in BiT or Bi was compared to that in B (Table 1). Bacterial ASVs with differential abundance between BiT or Bi and B did not represent dominant genera and included Ureibacillus, Paraburkholderia, Pelagibius, and Kallotenue, among others. The top fungal ASV displaying differential abundance between Bi and B belonged to Fusariella, which decreased in Bi compared to B (FC = 8.5, FDR = 0.008). ASVs from non-dominant genera included Podospora, Candida, Alternaria, and Zygosporium, among others (Table S3).

Table 1 Significantly different abundance of bacteria and fungi identified to genera between locations. FC, fold change; FDR, false discovery rate; significant results imply abs(FC) ≥ 3, FDR ≤ 0.01. B, native dry tropical forest; BiT, top dump; Bi, dump terraces

3.2.4 Alpha Diversity Indexes

For bacterial communities, the indexes differed significantly between treatments when samples were grouped by sampling site (Chao1 p-value 0.0001, Simpson p-value 0.004, and Shannon p-value 0.001). There were significant differences in the Chao1, Simpson, and Shannon indexes between samples BiT and Bi (p-value 0.0004, p-value 0.009, p-value 0.004, respectively), and between BiT and B (p-value 0.005, p-value 0.007, p-value 0.01) (Fig. 3A, B). Fungal communities showed significant differences in the Chao1 index (p-value 0.025) when samples were grouped by sampling site, but not in pairwise comparisons.

Fig. 3 Alpha diversity calculated for fungi and bacteria communities. Shannon, Simpson, and Chao1 indexes were calculated for each treatment: tropical dry forest (B), coal mine dump terraces (Bi), and top dump (BiT), and their replicates (black dots). Panel A: alpha diversity distribution in bacterial communities for six replicates. Panel B: alpha diversity values in fungal communities for three replicates in B and BiT, but only two in Bi, since for this treatment data for only two replicates was recovered during sequencing. Values of the Shannon index for fungal communities in the Bi replicates overlap. **Significantly different values (p < 0.01); ns, non-significant

3.2.5 Beta Diversity for Bacterial and Fungal Communities

Weighted UniFrac distance shows that bacterial (Adonis R2 = 0.5, p-value = 0.001) and fungal (Adonis R2 = 0.4, p-value = 0.006) communities at each site were different when compared to other sampled sites (Fig. 4). Overall, most samples were grouped within their site, except for BiT6, which grouped among Bi samples. BiT and Bi microbial communities were more similar to each other than either was to B.

Fig. 4 Beta diversity calculated for fungal and bacterial communities using weighted UniFrac distance for tropical dry forest (B), coal mine dump terraces (Bi), and top dump (BiT). Panel A: beta diversity values in fungal communities for three replicates in B and BiT, but only two in Bi, since for this treatment data for only two replicates was recovered during sequencing. Panel B: beta diversity for bacterial communities and their six replicates in each location

3.2.6 Association Analysis

Comparing bacteria from B and BiT samples, 285 ASVs from Romboutsia, Pseudenhygromyxa, and Parasegetibacter, among other genera, had a negative association (FDR ≤ 0.01) with BiT. Additionally, 280 ASVs from bacterial genera Marisediminicola, Spirosoma, and Methylocella, among others, had a positive association (FDR ≤ 0.01) with BiT (Fig. 5A). When analyzing the correlation of bacteria between Bi and B samples, 111 ASVs were negatively associated with Bi, like Romboutsia and Acidicaldus, and 125 ASVs were positively associated with Bi, including Tetrasphaera, Sulfurirhabdus, and Methylocella (Fig. 5B) (Table S4). No significant associations were found for fungi when BiT, Bi, and B samples were compared with an FDR ≤ 0.01. However, when the threshold was relaxed to FDR ≤ 0.05, 24 ASVs from Zygosporium, Podospora, and Candida were negatively associated with BiT, while Alternaria was the only fungal genus positively associated (Fig. 5C).

Fig. 5 Abundance association analysis of amplicon sequence variants (ASVs) using Spearman’s rank correlation. The 25 most significant values (with the lowest p-values) are shown for bacteria. Panel A: comparing the top dump (BiT) vs tropical dry forest (B) sites; and panel B: the coal mine dump terraces (Bi) vs tropical dry forest (B). For fungi, panel C compares top dump (BiT) vs tropical dry forest (B)

4 Functional Diversity of Soils Assessed by GeoChip 5.0S Analysis

The total number of genes detected in the native dry forest (B) was higher than in revegetated dump terraces (Bi) for all gene categories (Table 2). The number of bacterial and fungal-related genes in B was higher than in Bi. In contrast, Archaeal-related genes were more abundant in the dumping site (Table S5). Most abundant genes for forest and dump were related to carbon cycling, followed by organic remediation, while less abundant genes in both forest and dump were related to secondary metabolism and phylogenetic informative genes (Table 2).

Table 2 Number of detected genes using GeoChip 5.0S in a native dry tropical forest (B) and an inactive dump (Bi) undergoing recovery in a coal mine of La Guajira, Colombia

4.1 Metabolic Potential Analysis for Bacterial Microbiomes

Using Bray–Curtis distance for gene abundance, samples were grouped by sampled sites (Bi and BiT: dumps, and B: forest). Bi and BiT clustered together, but each treatment separated into a distinct group, and both differed from samples collected in the forest (B) (Fig. 6A). There was no qualitative difference in the relative abundance of metabolic categories, indicating that differences could be related to individual genes rather than to broad categories (Fig. 6B). Therefore, we performed a statistical differential abundance analysis and compared individual gene abundance between BiT or Bi and B. Top genes with significant differential abundance (p < 0.01) between BiT or Bi and B included K12673 N2-(2-carboxyethyl) arginine synthase and K16133 microcystin synthetase protein McyI, which increased in treatments, and K10842 CDK-activating kinase assembly factor MAT1 and K19658 peroxisomal enoyl-CoA hydratase 2, which decreased in treatments (Table S6). Top genes that had a statistically significant positive association (p < 0.01 and correlation > 0.6) with BiT or B included K19428 sugar transferase EpsL and K19576 MFS transporter, DHA1 family, quinolone resistance protein. Those with a negative association with treatments included K06292 spore germination protein BB and K15076 elongin-A, among others (Table S7).

Fig. 6 Metabolic potential of bacterial communities calculated using the software Picrust v2.0 (Douglas et al. 2020) for tropical dry forest (B), coal mine dump terraces (Bi), and top dump (BiT) and their replicates. Panel A: dissimilarity of gene abundances using hierarchical clustering of Bray–Curtis. Panel B: KEGG metabolic term relative abundance

5 Discussion

Here we provide an analysis of soil microbiomes using metabarcoding in the largest coal mining operation in Colombia. We compared two contrasting sites: native dry forest vegetation and disturbed coal mining ecosystems undergoing recovery. This work uses alpha and beta diversity of microbial communities and relates diversity to the potential functional roles of microbial communities.

One of the most surprising findings in our results was how bacterial and fungal alpha diversity were differentially affected. Bacterial alpha diversity was higher in the inactive spoil dump (BiT), while it was similar between the revegetated dump terraces (Bi) and the native dry forest (B), despite soil organic matter and available phosphorus being the lowest at the spoil dump (BiT) (Table S1). Although this might sound counterintuitive (Li et al. 2014), previous studies have shown that bacterial diversity was not affected even in the areas most contaminated by metals (Benidire et al. 2020). It has been reported that there is no linear relationship between the level of metal contamination and bacterial biodiversity (Radeva et al. 2013). Also, it is possible that bacterial taxa inhabiting this system thrive under low organic matter conditions, or that cations, which were similar across treatments, had a greater effect on bacterial alpha diversity than macronutrients. Thus, this is an aspect for future exploration in the specific context of our study. It could also mean that, 7–8 years after mining activities ceased, levels of metal pollution are no longer high or different from those in the revegetated and native dry forest treatments, as reflected in our results. However, replicates for this study only allowed a general characterization of soil chemistry, and no statistical analyses were possible as sampling for soil chemistry was unbalanced (Tables S1, S2). Also, it could mean that the native forest is highly disturbed by mining activities taking place nearby.

Fungal alpha biodiversity did not differ between the native dry forest and the inactive spoil dump. Some studies have shown heavily metal–polluted mine soils reduce fungal biodiversity while increasing the tolerance of some fungal taxa, like Aspergillus spp., to some of those metals (Iram et al., 2009). Also, phytoremediation of metal-polluted soils seems to increase fungal biodiversity, especially ectomycorrhizal and saprophytic fungi (Gil-Martínez et al. 2021). While also showing some fungal tolerance to metals, other studies did show a relatively high fungal biodiversity in mine soils (Taleski et al. 2020). For example, Vahter et al. (2020) showed arbuscular mycorrhizal fungi (AMF) were resilient to very polluted mining conditions in Estonia. In contrast to our results, Guo et al. (2020) have shown that for AMF, alpha biodiversity is higher in undisturbed areas compared to mining dumps. It is also possible that, as the cation contents were similar across treatments, and some fungal guilds are more influenced by cations than C and N contents (Marín and Kohout 2021), this might result in no differences in fungal alpha diversity.

Overall, the most abundant phyla included Acidobacteria, Actinobacteria, Bacteroidetes, Chloroflexi, Firmicutes, Gemmatimonadetes, Planctomycetes, Proteobacteria, and the archaeal phylum Thaumarchaeota, which are also among the most abundant and ubiquitous phyla in diverse soils (Fierer and Ladau 2012; Lauber et al. 2009). Proteobacteria, Actinobacteria, and Acidobacteria are the three most common bacterial phyla at a global scale (Delgado-Baquerizo et al. 2018). The distribution of bacterial genera across the three sampled sites was generally even (i.e., major bacterial genera had similar abundances across treatments), with two or three phyla dominating across the sampled sites. Other studies have shown that Acidobacteria abundance in soils is mainly influenced by pH, with subdivisions 1 and 3 being more abundant in acidic soils and subdivisions 4 and 6 being more abundant in neutral soils (Jones et al. 2009; Kishimoto et al. 1991; Sait et al. 2006). Members of Acidobacteria have been reported to be dominant in neutral soils of semiarid regions, including cyanobacterial crusts (Gundlapally and Garcia-Pichel 2006; Moquin et al. 2012). In our study, Acidobacteria were highly abundant, a pattern that could be common across the northeastern Colombian region. Lastly, Actinobacteria were also highly abundant in our samples; members of this phylum have been isolated from abandoned mining areas and have shown resistance to metals such as Pb, Cr, Zn, and Cu (El Baz et al. 2015).

Although there were no significant differences in fungal alpha diversity between the native dry forest and the spoil dump, samples grouped together within each site, and both beta diversity and the distribution of fungal genera differed between treatments. Genera such as Cladosporium, Mycocentrospora, Nothophoma, Ceratobasidium, Fusarium, Preussia, and Volutella showed different abundances among the sampled sites. Intriguingly, none of these genera appeared in the association analysis for fungi.

The fungal plant pathogen Cladosporium has displayed resistance to drugs (Talbot et al. 1988) and metals (Shao and Sun 2007), and it is currently being used to remove such metals from water bodies (Mota et al. 2020). This genus was more abundant in one replicate of the native dry forest and in one replicate of the revegetated dump terraces. Fusarium is also widely known for its ability to degrade pollutants such as polycyclic aromatic hydrocarbons (Wei et al. 2017; Zhang et al. 2012; Zhao et al. 2017). Ceratobasidium, whose members can be plant pathogens, saprotrophs, or orchid symbionts, has been associated with mining soils elsewhere (Ouanphanivanh et al. 2013). Some saprophytic (and often hallucinogenic) genera, such as Psilocybe and Conocybe, stand out in the association analysis for fungi; these genera are often indicative of advanced decomposition processes and/or decomposition associated with feces. Goats are often found in this area and may be responsible for the abundance of these fungal genera. Overall, the fungal genera spanned a wide range of functional guilds, including plant pathogens (Zygosporium, Thecaphora), animal pathogens (Podospora, Candida), and coprophilous fungi (Ascobolus, Polytolypa, Psilocybe, and Conocybe), among others.

A surprising result was that metabolic potential did not differ among treatments, whereas functional diversity did. Environmental constraints (e.g., metal concentration, harsh weather conditions) that suppress the development of metabolic functions could be responsible for this observation. It is worth noting that PICRUSt v. 2.0 is a purely predictive tool: it estimates the “potential” of soil microbial communities and does not represent what is actually happening in them. In addition, important critiques of possible flaws in this technique have emerged recently (Louca et al. 2018; Sevigny et al. 2019). Our PICRUSt v. 2.0 analyses showed no differences in bacterial metabolic potential among the treatments: untreated spoil dumps, revegetated dump terraces, and native dry forests. KEGG predictions for the more abundant bacterial phyla found in soil (Delgado-Baquerizo et al. 2018), such as Acidobacteria (Lladó et al. 2016), showed a prevalence of amino acid, carbohydrate, energy, and cofactor and vitamin metabolism, which are commonly dominant (Wrighton et al. 2012). Furthermore, because these major metabolic categories are crucial for all soil microorganisms, an additional exploration of specific metabolic and physiological processes related to bacterial resistance to heavy metals is necessary. Overall, the three treatments did not affect the distribution of bacterial genera, so the lack of differences in predicted metabolic potential is not surprising, since metabolism and physiology are usually phylogenetically constrained.

In contrast with the results for bacterial metabolic potential and bacterial alpha diversity, the GeoChip 5.0S analysis showed that the native dry forest had a higher number of genes related to carbon cycling, metal homeostasis, phosphorus and sulfur cycling, organic remediation, and N cycling, among other processes, than the revegetated dump terraces. This result shows the importance of analyzing biodiversity with different metrics of taxonomic, phylogenetic, and, as in this case, functional diversity. To properly address nutrient cycling and other ecosystem processes, as well as ecological restoration targets, it is crucial to establish causal paths between those processes and soil microbial community attributes and ecosystem functions/services (Xu et al. 2020). On a global scale, soil biodiversity and ecosystem functions are rarely addressed at the same time (Guerra et al. 2020; but solutions have been proposed, see Guerra et al. 2021). Hall et al. (2018) propose that microbial processes (e.g., nitrogen fixation) are more directly related to a nutrient pool or cycle than are microbial community properties (e.g., biomass C:N ratio, functional gene abundance) and microbial membership (e.g., taxonomic and phylogenetic diversity, community structure, co-occurrence networks), with the last two affecting nutrient cycling via their effect on microbial processes. Our study corroborates this view, as differences among treatments were greater when microbial processes/properties were quantified, directly or indirectly, than when only alpha/beta diversity measurements and metabolic prediction were used.

6 Conclusions

Microbial diversity in degraded soils is usually lower than in fertile, healthy soils that support plant growth. In this regard, we found that (i) bacterial alpha diversity was surprisingly higher in the spoil dump, while fungal alpha diversity was similar between that treatment and the native dry forest; (ii) some fungal genera might be more sensitive to the different treatments, as they were more unequally distributed than bacterial taxa; (iii) some bacterial and fungal genera that were positively or negatively correlated with particular treatments could be good bioindicators of mining effects in soil; (iv) there were no significant differences in the metabolic potential of bacterial communities across treatments, although the technique used to predict it has many flaws; and, more importantly, (v) the native dry forest had a higher number of genes related to carbon cycling, metal homeostasis, phosphorus and sulfur cycling, organic remediation, and N cycling, among other processes, than the revegetated dump terraces. This last finding shows that post-mining revegetation strategies in our study system still need more time to reach the target condition of native vegetation. These results are intricate and show that a single measure of soil microbial biodiversity and/or community complexity does not tell the whole story of the effects of different treatments. Measures of alpha taxonomic diversity can tell a very different story than functional diversity measures (such as those obtained with GeoChip 5.0S). This is a pioneering study of the effects of revegetation strategies on ecosystem functioning in post-mining scenarios in Colombia.