Introduction

The Indo-Burma region is one of the richest biodiversity hotspot zones and represents the transition between the Indian and Indochinese subregions (Myers et al. 2000). This region contains an estimated 9.7 and 8.3 % of the world’s known endemic plant and vertebrate species, respectively (Brook et al. 2003). To date, very little work has been done on the subsurface microbial diversity in the Indo-Burma region.

Caves represent subsurface habitats and are less explored in terms of microbial biodiversity and community composition due to environmental and geographical constraints. The lack of photosynthesis and limited nutrients make caves an extreme environment in which to sustain life. However, alternative energy in the form of allochthonous organic materials transported from the surface through bats, rodents, and human activities, or by percolating water, is utilized by certain groups of microorganisms (Barton 2006). Caves also act as long-term reservoirs for endemic as well as allochthonous microorganisms (Engel et al. 2010). Earlier studies have reported diverse groups of microorganisms associated with different geological and environmental factors (Adetutu et al. 2011, 2012). Cave microorganisms have been implicated in astrobiology, drug discovery, and cave conservation studies (Northup et al. 2011; Saiz-Jimenez 2012). These microbial communities influence the formation and preservation of cave deposits by constructive and destructive processes. In addition, cave microbes act as primary producers which sustain populations of more complex organisms (Barton and Northup 2007).

The majority of cave microbial diversity studies have been done using culture-dependent techniques, which revealed only 1 % of the total microorganisms (Ward et al. 2009). Molecular microbial ecology tools such as denaturing gradient gel electrophoresis (DGGE) and clone library analysis are being used by many researchers to characterize these uncultured microbes, but these techniques are not sufficient to analyze the entire population in the community (Adetutu et al. 2012). With the advancement of next-generation sequencing, cave microbial ecology research has expanded further, since culture-independent techniques can now be used to reveal the microbial biodiversity in caves (Gray and Engel 2013). Most of the caves present in Mizoram are of tectonic origin caused by tension cleavage of the compact host rock (Gebauer et al. 2001). Since these caves are present in unexplored environments, it is assumed that microorganisms living in these caves would be mostly new and unknown. Studying this unique habitat provides an opportunity to understand global microbial diversity, novel population assemblages, energy dynamics, and metabolism (Ortiz et al. 2013).

The objective of this study was to use high-throughput Illumina sequencing (1) to survey the microbiota of cave sediment communities near the Indo-Myanmar border in Mizoram, Northeast India, and (2) to understand the influence of the cave physico-chemical properties on the bacterial diversity. These analyses were based on the hypothesis that the undisturbed habitats in the Indo-Burma region will host novel bacterial species and that the cave ecological parameters might favor species diversity and richness.

Materials and methods

Three caves were selected for the present study based on the criterion that they had not yet been studied or explored. Sediment samples were collected from different locations of the caves Murapuk (CMP), Lamsialpuk (CLP), and Farpuk (CFP), and upon collection, the samples were sieved and preserved at 4 °C (Table 1; Fig. 1). CFP is undisturbed and is not influenced by any anthropogenic activities, whereas CLP and CMP are near human habitation and have been under some exogenous influence. Ten samples (100 g) were collected at different locations and mixed thoroughly to make a composite sample for each cave. Sediment samples were analyzed for carbon and nitrogen content with a CHNS/O analyzer (Perkin Elmer, MA, USA), and pH of the samples was measured using a pH meter (UTECH). No specific permit was required for sampling since it did not involve any endangered species or sampling in protected areas.

Table 1 Details of the cave samples analyzed in the present study
Fig. 1 Geographical location of the sampling sites in Champhai district of Mizoram, Northeast India

Bacterial DNA was isolated from 0.5-g sediment samples using the Fast DNA spin kit (MP Biomedical, Solon, OH, USA), and the DNA concentration was quantified using a microplate reader (SpectraMax 2E, Molecular Devices, CA, USA). The V4 hypervariable region of the 16S rRNA gene was amplified using 10 pmol/μl of each forward 515F (5′-3′) and reverse 806R (5′-3′) primer. The amplification mix contained 40 mM dNTPs (NEB, MA, USA); 5X Phusion HF reaction buffers (NEB, MA, USA); 2 U/μl F-540 Special Phusion HS DNA Polymerase (NEB, MA, USA); and 5 ng template DNA and Milli-Q water to make up 30 μl total volume. The PCR running conditions consisted of an initial 98 °C for 30 s followed by 30 cycles of 98 °C for 10 s, 72 °C for 30 s, and a final extension at 72 °C for 5 s.

High-throughput paired end Illumina MiSeq sequencing (2 × 250 bp) was performed at SciGenom Lab, Cochin, India. Raw fastq sequences were processed and analyzed using the QIIME software package v.1.8.0 according to base quality score distribution, average base content per read, and guanine-cytosine (GC) content in the reads (Caporaso et al. 2010a, b). Sequences with poor quality (i.e., with a quality score <25 and read length <200 bp) were filtered using the split_libraries command. We then used USEARCH to remove chimeric sequences (Edgar et al. 2011), followed by deletion of singletons since we relied on filtering rather than denoising the data (Jones et al. 2009). The sequences from the CFP sample had more than 50 % singletons in the consensus reads; these are believed to carry no taxonomic information and were hence deleted from further sequence analysis. The preprocessed consensus V4 sequences were clustered into operational taxonomic units (OTUs) based on their sequence similarity using the Uclust program (similarity cutoff = 0.97) (Edgar 2010).
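The filtering steps described above can be illustrated with a minimal Python sketch. It is not the QIIME split_libraries or USEARCH implementation: the function names are hypothetical, the quality criterion is assumed to be the mean Phred score of a read, and a read is assumed to be discarded when it fails either threshold.

```python
from collections import Counter

MIN_QUALITY = 25   # quality threshold stated in the text
MIN_LENGTH = 200   # read length threshold stated in the text (bp)


def passes_filter(seq, phred_scores, min_quality=MIN_QUALITY, min_length=MIN_LENGTH):
    """Return True if the read is long enough and its mean Phred score is high enough.

    Assumption: quality is judged on the mean score of the whole read.
    """
    if len(seq) < min_length or not phred_scores:
        return False
    mean_quality = sum(phred_scores) / len(phred_scores)
    return mean_quality >= min_quality


def drop_singletons(sequences):
    """Remove every sequence that occurs exactly once in the dataset."""
    counts = Counter(sequences)
    return [s for s in sequences if counts[s] > 1]
```

In this sketch a singleton is an exact sequence seen once; a real pipeline may define singletons at the level of clusters instead.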

A representative sequence for each OTU was aligned to the Greengenes core set reference database using the PyNAST program (DeSantis et al. 2006). The representative sequence for each OTU was then classified using the RDP classifier and Greengenes OTU database. Sequences that could not be classified were categorized as unknown. The Shannon diversity and observed species indices were calculated with the QIIME software. The Shannon index accounts for OTU abundance and reflects both richness and evenness, whereas the observed species metric counts the unique OTUs present in a sample. In this study, beta diversity between the three bacterial cave communities (CLP, CMP, and CFP) was determined by weighted and unweighted UniFrac (Lozupone and Knight 2005). A weighted UPGMA tree was constructed by performing a jackknife test with ten replicates, each subsample containing 100,000 random reads selected from each sample. In addition, a heatmap was generated to show the relative abundance of the top 200 OTUs classified at the genus level using the CIMminer tool for clustered image maps (http://discover.nci.nih.gov/). Principal component analysis was performed using PAST statistics software v.2.17c to determine the correlation between microbial diversity and physico-chemical factors (Hammer et al. 2001). Pearson correlations between soil characteristics and bacterial phyla were calculated using PASW Statistics 18 (SPSS Inc., Chicago, IL, USA).
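The two alpha diversity metrics named above can be sketched in a few lines of Python. This is an illustration only, not the QIIME code; the function names are hypothetical. The logarithm base is assumed to be 2, which is consistent with the values reported in this study: a Shannon index of 9.30 for a sample with 6555 observed OTUs could not arise with natural logarithms, whose maximum would be ln(6555) ≈ 8.79.

```python
import math


def observed_otus(counts):
    """Number of OTUs with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts, base=2.0):
    """Shannon diversity H = -sum(p_i * log(p_i)) over OTUs with non-zero counts.

    Assumption: base-2 logarithm.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h
```

For a perfectly even community of four OTUs the index equals log2(4) = 2, and it falls toward 0 as a single OTU comes to dominate.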

Results and discussion

In the present study, high-throughput Illumina sequencing was performed to survey the bacterial community present in the cave sediments of Mizoram, Northeast India, and to determine whether ecological factors are responsible for changes in bacterial community composition between the different caves. Paired end Illumina sequencing generated raw fastq sequences with 90 % of bases having a Phred score greater than 30. A total of 1,135,543 preprocessed reads were clustered into 10,643 OTUs. Sample libraries ranged from 259,895 (CFP) to 470,260 (CLP) sequence reads (Table 2).

Table 2 Summary statistics of Illumina paired end reads (V4 region of 16S rRNA gene)

Alpha and beta diversity

The number of OTUs and the Shannon diversity index were calculated to estimate the alpha diversity. A total of 14,631 OTUs were found in the complete dataset of processed sequence reads. Based on the observed OTUs, CMP had the highest bacterial diversity (6555), followed by CLP (4968) and CFP (3108). The Shannon index also showed a higher bacterial diversity at CMP (9.30) than at CLP (9.05) and CFP (4.75). This analysis shows that the sample from CMP is more diverse than the other two samples (CLP and CFP). Rarefaction curves for observed species and the Shannon metric are shown in Fig. 2. Beta diversity represents the explicit comparison of microbial communities based on their composition. Unweighted UniFrac reveals a close relationship between these three cave bacterial communities (Table 3). A consensus UPGMA tree with a weighted UniFrac approach clustered the CLP and CMP samples together, suggesting that they have similar bacterial communities. Conversely, CFP clustered separately, suggesting that its bacterial community differed from those of the two other samples (Fig. 3). This variation may be due to the remote location of Farpuk (CFP), which, unlike the other two caves, is free of anthropogenic disturbance. Identification of large numbers of microbial species is a common phenomenon for underground microbial communities compared to surface environments (Moss et al. 2011; Epure et al. 2014).

Fig. 2 Rarefaction analysis of alpha diversity among the CMP, CLP, and CFP samples. Two different diversity metrics were used. a Observed number of species. b Shannon diversity index

Table 3 Unweighted UniFrac distance matrix among the cave samples
Fig. 3 Phylogenetic tree based on the distances between samples CLP, CMP, and CFP with weighted UniFrac approach

Composition of cave bacterial communities

A total of 21 bacterial phyla and 21 candidate phyla were identified from the three cave sediments. Communities were dominated by Actinobacteria, Planctomycetes, Chloroflexi, Acidobacteria, and Proteobacteria. Relative abundances of the top ten dominant phyla are shown in Figs. 4 and 5 and varied among the cave sediments. A detailed comparison of dominant bacterial genera is presented as a heatmap in Fig. 6. The heatmap analysis shows that low-abundance bacterial genera were mostly present in the CFP sample, whereas both the CLP and CMP samples contained a high number of dominant bacterial genera.

Fig. 4 Taxonomy classifications of reads at phylum level for the cave samples. Only the top ten enriched class categories are shown in the figure. Classification is performed using the RDP classifier and Greengenes OTUs database

Fig. 5 Taxonomy classifications of OTUs at phylum level for the cave samples. Only the top ten enriched class categories are shown in the figure. Classification is performed using the RDP classifier and Greengenes OTUs database

Fig. 6 Heatmap analysis of the bacterial community based on the top 200 OTUs classified at the genus level. The color intensity in each panel shows the percentage of a genus in a sample, referring to the color key at the bottom

Our study detected Actinobacteria as the most dominant phylum (35.97 % of total sequences), with the majority of the OTUs falling under the order Actinomycetales, followed by Solirubrobacterales and Acidimicrobiales. Actinobacteria are important inhabitants of caves since they are actively involved in the formation of colored crystals on cave walls and thus in constructive biomineralization processes (Barton et al. 2001).

Other identified classes include Thermoleophilia, Rubrobacteria, and MMB-A2-108. In our study, twelve Actinobacteria were identified up to species level, including Streptomyces radiopugnans, Virgisporangium ochraceum, Actinomadura vinacea, Streptomyces lanatus, Rhodococcus fascians, Saccharopolyspora hirsuta, Streptomyces mirabilis, and Mycobacterium celatum. In all the cave samples, OTU 3283 was the most dominant phylotype, and BLAST results showed 100 % similarity with the genus Mycobacterium. Another dominant phylotype was OTU 5355, which was closely related to Arthrobacter, a GC-rich actinomycete capable of utilizing a diverse range of organic substances as carbon and energy sources, such as nicotine, nucleic acids, and various herbicides and pesticides (Li et al. 2005). The predominance of diverse groups of Actinobacteria in the present study is in accordance with the findings of earlier studies in cave ecosystems (Cuezva et al. 2012).

The phylum Chloroflexi had the second largest number of sequences (13.96 %) and was dominated by the class Ktedonobacteria. Genera identified under this phylum included Chloroflexus, FFCH10602, Caldilinea, Ardenscatena, Chloronema, and Oscillochloris. The most dominant OTUs in this phylum were OTU 1827 and OTU 9830, which were classified under the orders Thermogemmatisporales and TK10, respectively. Members of these groups are commonly found in most caves and act as aerobic and anaerobic thermophiles, filamentous anoxygenic phototrophs, and anaerobic organohalide respirers (Hanada et al. 1995; Hugenholtz and Stackebrandt 2004; Maymó-Gatell et al. 1997; Sekiguchi et al. 2003).

In this study, an average of 13.76 % of all sequences corresponded to Planctomycetes, a distinct phylum of the domain Bacteria characterized by intracellular compartmentalization and a lack of peptidoglycan in the cell wall (Fuerst and Sagulenko 2011). Most of the dominant OTUs in this phylum were classified under the orders WD2101 and Gemmatales. This phylum was the main group present in the CMP (22.82 % of all reads) and CLP (18.43 % of all reads) samples, whereas it constituted only 0.03 % of the CFP community. Although this phylum is a common member of cave bacterial communities, its role in cave habitats is unclear due to the limited number of culturable representatives. A few studies have shown their involvement in the metabolism of sulfated polysaccharides as well as in the oxidation of ammonia (Schmid et al. 2000; Jetten et al. 2003).
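The 13.76 % average quoted for Planctomycetes appears to be the unweighted mean of the three per-sample percentages rather than a share of the pooled reads. The short check below uses only the values given in the text; the variable and function names are illustrative.

```python
# Per-sample Planctomycetes shares (% of all reads) as reported in the text.
planctomycetes_pct = {"CMP": 22.82, "CLP": 18.43, "CFP": 0.03}


def unweighted_mean(values):
    """Arithmetic mean that gives each sample equal weight, regardless of library size."""
    values = list(values)
    return sum(values) / len(values)


mean_pct = unweighted_mean(planctomycetes_pct.values())
print(f"{mean_pct:.2f} %")  # prints "13.76 %"
```

Because the libraries differ in size (259,895 to 470,260 reads), a read-weighted average would generally give a different figure.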

Proteobacteria was found to be diverse in all the three cave bacterial communities. A total of 46,154 sequences with 497 OTUs were found under the class Alphaproteobacteria. Major OTUs in Alphaproteobacteria were classified under the order Rhizobiales. Dominant taxa within the Alphaproteobacteria included Sphingomonadaceae, Kaistobacter, Bradyrhizobiaceae, Bradyrhizobium, Hyphomicrobiaceae, and Rhodoplanes. The family Sphingomonadaceae represents the aromatic hydrocarbon-degrading bacteria (Kersters et al. 2006).

Betaproteobacteria was less diverse, with only 19 OTUs (2792 sequences) detected in all the samples. The most dominant OTUs within the Betaproteobacteria were OTU 3602 and OTU 8071, both of which were identified as Burkholderia. Gammaproteobacteria was also less diverse and was dominated by the genus Dyella. A total of 323 sequences clustering into 37 OTUs were classified under the class Deltaproteobacteria. Conventional studies on caves across the world revealed heterotrophic interactions and carbon turnover by Alphaproteobacteria, Betaproteobacteria, Firmicutes, and Actinobacteria (Barton 2014), whereas application of next-generation sequencing technology to cave environments revealed higher diversity and metabolic activities in microbial communities (Tetu et al. 2013; Ortiz et al. 2014).

The phylum Acidobacteria was moderately abundant in the cave samples and represented 11.44 % of the total sequences obtained. This phylum consisted of the families Solibacteraceae, Koribacteraceae, and Acidobacteriaceae. The most dominant OTUs in this phylum were OTU 7994 and OTU 9544, which belonged to the classes Chloracidobacteria and Acidobacteria-6, respectively. OTUs 6901 and 5227 demonstrated close sequence similarity with Candidatus Solibacter usitatus Ellin6076, which is adapted to survive under low-nutrient conditions (Ward et al. 2009).

A total of 33,411 reads comprising 361 OTUs were classified within the phylum Armatimonadetes (formerly known as “candidate division OP10”), a dominant and globally distributed lineage within the “uncultured majority.” All the OTUs were classified under the genus Fimbriimonas, except OTU 1709, which was classified under the genus Chthonomonas. One OTU (OTU 4733) from the CMP sample was classified under the genus Gemmatimonas. Seven OTUs containing 89 reads (CFP, 53; CLP, 2; CMP, 34) were affiliated with the genus Nitrospira. Members of this group are obligate chemolithoautotrophs and obtain energy by oxidizing nitrite (Watson et al. 1986; Ehrich et al. 1995). They have been detected in the Mexican anchialine caves, Tito Bustillo caves, and Movile caves in Romania (Pohlman et al. 1997; Schabereiter-Gurtner et al. 2002). Thus, chemolithotrophy might be an important survival factor for cave-dominating bacterial species (Wu et al. 2015).

Our analysis revealed 45 OTUs under the phylum Bacteroidetes. Taxa classified up to species level included Cytophaga xylanolytica, Flavobacterium succinicans, Bacteroides plebeius, Sphingobacterium multivorum, and Fontibacter flavus. The most dominant OTUs were classified under Flavisolibacter (OTU 1336 and OTU 3516) and Adhaeribacter (OTU 8478). Members of the genus Bacteroides are known to act as opportunistic pathogens, mostly causing post-operative infections of the peritoneal cavity and bacteraemia (Wexler 2007). However, the role of Bacteroidetes in cave systems is not yet clear and needs to be further investigated.

Composition of cave archaeal communities

There were 334 sequences comprising 19 OTUs classified under the phylum Euryarchaeota, divided into four classes: Methanomicrobia, Thermoplasmata, Halobacteria, and Methanobacteria. Within the Euryarchaeota, 92 reads comprising 9 OTUs were affiliated with methanogenic Archaea of the genera Methanocella, Methanocorpusculum, Methanoculleus, Methanosarcina, Methanobacterium, Methanosaeta, and Methanoplanus, and 13 reads were identified as halophilic Archaea. Members of the genus Methanocella play a key role in methane emissions in paddy fields and are responsible for the final step of the anoxic degradation of organic substances (Conrad et al. 2006; Sakai et al. 2011; Lü and Lu 2012). Methanocorpusculum species are commonly found in subsurface environments and were previously detected as the most prominent genus in a coal bed and shale (Strapoc et al. 2008; Waldron et al. 2007). One OTU was classified under the genus Methanosaeta, frequently detected in both anaerobic methane-producing bioreactors and shallow marine sediments (MacLeod et al. 1990; Thomsen et al. 2001). A total of 87 reads were identified under the genus Methanosarcina, the only known anaerobic methanogens that produce methane using all three known metabolic pathways for methanogenesis (Galagan et al. 2002). On the other hand, four reads each were identified as Methanoplanus and Methanoculleus, whereas five reads were classified under the genus Methanobacterium. However, within the phylum Euryarchaeota, none of the OTUs could be classified up to the species level in any of the three cave communities, and all were present in rare numbers (<0.01 % of total bacterial community). The phylum Crenarchaeota was also detected in very low numbers and was clustered into 21 OTUs (1698 reads in total). All the Crenarchaeota were assigned to two classes: MBGA and Thaumarchaeota. The families identified under Crenarchaeota were Nitrososphaeraceae and SAGMA-X.

Analysis of candidate phyla

In the present study, 21 candidate phyla or bacterial lineages were identified, which form part of the rare biosphere. The most dominant OTU among the candidate phyla was OTU 1407 (3094 reads), classified under the phylum AD3 and having close sequence similarity with the environmental clone LuqGS470001. This clone was originally isolated from deep saprolite and saprock, which are believed to play a role in weathered minerals (Minyard et al. 2012). Other candidate phyla identified in the present study included LD1, MPV, NKB, OD, OD1, OD3, TM6, TM7, WS1, WS2, WS3, WWE1, ZB3, BH1, BRC1, FCPU, GAL, GN, and Kazan3b. The top ten bacterial genera based on OTU number and the top ten OTUs based on total read count among the three cave samples are listed in Supplementary file Tables S1 and S2, respectively. The relative abundance of bacterial diversity from phylum to species is shown in Supplementary Figs S1 to S3. Commonly identified species among the cave samples were members of the phyla Actinobacteria, Firmicutes, and Proteobacteria (Table 4).

Table 4 Shared and unique taxa (identified up to species level) in the cave samples

Rare and abundant species

Illumina sequencing revealed a large number of phylotypes among the cave samples that belong to the rare biosphere, that is, microorganisms with extremely low abundance (Reid and Buckley 2011). In the present study, the rare species constituted between 75.72 and 83.01 % of the OTUs in each sample, while abundant species comprised between 16.98 and 24.27 % (Fig. 7). The selection criteria for rare (<0.01 % of total community) and abundant (other than rare) species were based on a previous study (Aravindraja et al. 2013). Ratios of rare to abundant OTUs were similar among the three samples and within a range of 3.11–4.88. The most abundant phylotype was OTU 6722, classified under Actinobacteria and present in all three cave samples. Figure 8 shows the unique and shared species among the rare biosphere in all three samples. According to the Venn diagram, only 171 rare OTUs (1.78 %) are shared among the three communities; the majority of the rare species in CFP are unique, whereas many common OTUs were observed between the CLP and CMP samples. Among the abundant species, 3.94 % are shared by all three cave samples. Many OTUs were rare in one cave community but abundant in other cave samples, which might be due to different ecological factors forcing some groups to become dormant and thus members of the rare biosphere. These members can become active and abundant under favorable conditions.
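The rare/abundant partition described above can be sketched as follows. This is a minimal illustration rather than the procedure of Aravindraja et al. (2013); the function names and the toy counts in the usage note are hypothetical, and the 0.01 % threshold is taken from the text.

```python
RARE_THRESHOLD_PCT = 0.01  # an OTU is rare if it holds less than this share of all reads


def split_rare_abundant(otu_counts, threshold_pct=RARE_THRESHOLD_PCT):
    """Partition OTU ids of one sample into (rare, abundant) by relative abundance."""
    total = sum(otu_counts.values())
    rare, abundant = [], []
    for otu, count in otu_counts.items():
        if count == 0:
            continue  # OTU absent from this sample
        share_pct = 100.0 * count / total
        (rare if share_pct < threshold_pct else abundant).append(otu)
    return rare, abundant


def rare_to_abundant_ratio(rare_pct, abundant_pct):
    """Ratio of the rare to the abundant percentage of OTUs in one sample."""
    return rare_pct / abundant_pct
```

Assuming that the reported extremes pair up as 75.72 % rare with 24.27 % abundant and 83.01 % rare with 16.98 % abundant, `rare_to_abundant_ratio` gives about 3.12 and 4.89, which agrees with the reported range of 3.11–4.88 up to rounding.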

Fig. 7 Percentage of abundant and rare OTUs among the cave samples

Fig. 8 Venn diagram showing the unique and shared species among the a rare and b abundant OTUs of the cave samples

Relationship between bacterial phyla and the soil physico-chemical factors

Identifying the ecological factors that control microbial community structure in caves is a key question in microbial ecology. In the present study, a correlation matrix based on the Pearson correlation coefficient was computed to determine the association of each physico-chemical parameter with the relative abundance of the bacterial phyla identified from all sites. Correlations of Bacteroidetes abundance with total nitrogen and pH were observed but were not statistically significant. Among the abundant bacterial phyla, Thermi was negatively correlated with elevation (p < 0.05) and temperature (p < 0.05). Similarly, Euryarchaeota was negatively correlated with carbon (p < 0.05) and Firmicutes with nitrogen (p < 0.05). The abundance of Crenarchaeota was positively correlated with humidity (p < 0.005). Among the candidate phyla, statistically significant correlations with humidity were observed for BRC1 and WS2 (p < 0.005). A PCA plot was generated to examine the association between microbial community structure and sediment physico-chemical properties (Fig. 9). In the complete dataset, there were 20 significant correlations between cave physico-chemical properties and the abundance of specific bacterial phyla (Table S2), but the full exploration of these correlations is beyond the scope of this report. The present study detected only a few significant correlations, which may be ascribed to the small sample size; alternatively, the identified taxa might be influenced by ecological parameters not measured in the present study. Other important parameters such as K, C, Ca, moisture, and dissolved oxygen have a strong influence on bacterial community composition (Stomeo et al. 2012).
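For readers who wish to reproduce this kind of analysis, the Pearson coefficient between a phylum's relative abundance and a physico-chemical parameter can be computed as in the sketch below. It is an illustration, not the PASW procedure used in the study; the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

With one composite sample per cave, each correlation rests on three points, so the associated t test has a single degree of freedom and only coefficients very close to ±1 can reach significance. This is in line with the small-sample caveat stated above.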

Fig. 9 Principal component analysis showing correlation between physico-chemical factors and the dominant bacterial phyla. Abbreviations: Arm (Armatimonadetes), Par (Parvarchaeota), TM6, Cya (Cyanobacteria), NKB19, Cre (Crenarchaeota), WPS-2, Gem (Gemmatimonadetes), PRO (Proteobacteria), MVP-21, WS2, Syn (Synergistetes), WS3, OP3, Unk (unknown), Ver (Verrucomicrobia), The (Thermi), GAL (GAL15), BAC (Bacteroidetes), FBP, AD3, BHI (BHI80-139), TM7, Ten (Tenericutes), Aci (Acidobacteria), Ther (Thermotogae), OP11, BRC1, OD1, ELU (Elusimicrobia), Eur (Euryarchaeota), Nit (Nitrospirae), Act (Actinobacteria), ECP (FCPU426), Plan (Planctomycetes), Fir (Firmicutes), Chl (Chloroflexi), ZB3, Chlo (Chlorobi). C Carbon, N Nitrogen, HUM Humidity, TEMP Temperature, ELE Elevation, PH pH

Conclusion

This study provides an in-depth analysis of unexplored bacterial diversity in cave samples of Mizoram, Northeast India, with 21 major and candidate phyla, as well as a large portion of unclassified bacteria indicating the possible presence of novel species. This study is in agreement with previous reports on unexplored environments (Aravindraja et al. 2013). The number of classified reads decreased from the phylum to the species level, leaving a number of species unidentified. The most dominant phylotypes were OTU 6722 (11.72 %) and OTU 4035 (5.72 %), belonging to Actinobacteria and Verrucomicrobia, respectively. The remaining phyla found in the cave communities had low (<4 %) abundance. The present study revealed a unique bacterial community in Farpuk, which was mostly classified under uncultured Actinobacteria. The study also revealed that the bacterial diversity was higher in the CMP and CLP samples than in the CFP sample. This might be because CFP is inaccessible compared to the other caves, so that its diversity is not influenced by any exogenous source, and because it might have different ecological conditions. Further analysis with more samples and whole genome sequencing will help reveal the actual role of the rare and abundant phyla present in the cave samples.